Someday, you will read reviews saying things like, “2017, that was a great vintage for Machine Learning Data” or “2025 was a very bad data vintage.”

For grapes, what makes a “great vintage” is weather bad enough to make grapevines struggle but not bad enough to kill the vines. When the vines struggle a little, they often create superior grapes.

You want plenty of sunshine and enough rain. But you do not want a lot of rain; even if the vines flourish, the grapes may rot, get fungal diseases, or just be lower quality. Of course, this varies by region and grape type. Wonderful weather for a Riesling could be terrible for a Cabernet Sauvignon.

The same kinds of factors apply to data for Machine Learning. You want enough variation for distinct factors and trends to emerge, but not disruptions so large that they wipe out the patterns you want to learn.

For example, if you are tracking consumer data, you might want to know whether consumers spend more money immediately after payday and less several days later. That pattern would suggest they have limited discretionary income. Or you might want to see how gasoline purchases define the summer driving season, when many Americans “hit the road.” For this, you need patterns that repeat over weeks, months, and years.
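One way to look for the payday pattern is to group spending by the number of days since the most recent payday. The sketch below is only an illustration: the function name `spend_by_days_since_payday`, the 14-day pay cycle, and every transaction amount are assumptions made up for the example, not real consumer data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def spend_by_days_since_payday(transactions, paydays):
    """Average spend grouped by days elapsed since the most recent payday.

    transactions: iterable of (date, amount) pairs
    paydays: iterable of payday dates
    """
    paydays = sorted(paydays)
    buckets = defaultdict(list)
    for day, amount in transactions:
        previous = [p for p in paydays if p <= day]
        if not previous:
            continue  # transaction predates the first known payday
        buckets[(day - previous[-1]).days].append(amount)
    return {gap: mean(amounts) for gap, amounts in sorted(buckets.items())}


# Made-up data: a payday every 14 days, with spending that falls off
# by 5 units for each day that passes after payday.
paydays = [date(2019, 1, 4) + timedelta(days=14 * k) for k in range(4)]
transactions = [
    (p + timedelta(days=d), 100.0 - 5.0 * d)
    for p in paydays
    for d in range(14)
]

profile = spend_by_days_since_payday(transactions, paydays)
print(profile[0], profile[13])  # average spend on payday vs. 13 days later
```

In a good vintage, this profile looks much the same from one pay cycle to the next. In a disrupted year, it may flatten or change shape entirely.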

2020 US Airline Travel

[Figure: 2020 US airline travel graphic]

As with wine vintages, some years create terrible data for Machine Learning. 2020 was a pretty terrible year. Covid disrupted everything. The number of people going through TSA checkpoints tells part of the story. In 2025, tariffs and the threat of trade wars are shifting global economic activity, so 2025 will not be a great vintage year for data either.
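A rough way to judge a data vintage is to compare each year's seasonal shape with the typical shape of the other years. The sketch below scores every year by its correlation with the month-by-month median of the remaining years and flags low scorers. Everything in it is an assumption for illustration: the function names, the 0.8 threshold, and the monthly index values, which are invented and are not TSA figures.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def vintage_scores(monthly_by_year):
    """Correlate each year's 12-month profile with the median of the others."""
    scores = {}
    for year, profile in monthly_by_year.items():
        others = [p for y, p in monthly_by_year.items() if y != year]
        baseline = [median(values) for values in zip(*others)]
        scores[year] = pearson(profile, baseline)
    return scores


def bad_vintages(monthly_by_year, threshold=0.8):
    """Years whose seasonal shape does not match the rest (arbitrary cutoff)."""
    scores = vintage_scores(monthly_by_year)
    return sorted(year for year, score in scores.items() if score < threshold)


# Hypothetical monthly activity index: three ordinary years with a summer
# peak, and one year where activity collapses in the spring.
monthly_index = {
    "year_1": [80, 82, 95, 100, 108, 118, 125, 122, 105, 100, 96, 104],
    "year_2": [83, 84, 98, 103, 111, 121, 129, 126, 108, 104, 99, 107],
    "year_3": [85, 87, 100, 106, 114, 124, 132, 128, 111, 106, 101, 110],
    "disrupted_year": [79, 81, 45, 5, 10, 20, 28, 30, 32, 35, 36, 38],
}

scores = vintage_scores(monthly_index)
print(bad_vintages(monthly_index))
```

The median baseline is a deliberate choice: a single disrupted year cannot drag the "typical" shape toward itself the way it would with a mean. A check like this only compares shape, so a year that is uniformly higher or lower than the others would still score well.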

You might object that 2020 was a “black swan,” but that misses the point. We ALWAYS have “rare events.” Here are some “Black Swans” that might impact the quality of your data vintage.

“Black Swans” Impacting Data Quality

| Year | Event | Scale |
| 1990 | Saddam Hussein orders the invasion of Kuwait, leading to the first Gulf War | Global |
| 1991 | Mount Pinatubo Eruption causes widespread weather, travel, and economic disruption | Global |
| 1992 | 45 US Disasters – New Record for US FEMA | National |
| 1993 | Largest floods in US History (Missouri, Mississippi Rivers) | National |
| 1994 | Most Expensive Earthquake in US History, Northridge, California | Regional |
| 1995 | Chicago Heat Wave kills over 700 people | Regional |
| 1996 | New York City Blizzard – Worst since 1888 – Drifts 8 feet high | Global |
| 1997 | Asian Financial Crisis | National |
| 1998 | Worst US Drought and Heatwave since the Great Depression disrupts crops and meat production | National |
| 1998 | Yangtze flood in China leaves 15 million homeless | National |
| 1999 | December rainstorms cause thousands of landslides in Venezuela | National |
| 2000 | Dotcom Bubble Bursts, A Trillion Dollars of Market Capital Lost | Global |
| 2001 | 9/11 Terror Attack; Twin Towers, Pentagon | Global |
| 2002 | Worst European flooding in centuries disrupts barge traffic on the Danube, Elbe, and Moldau (Vltava) Rivers | Global |
| 2004 | Tsunami impacts 17 nations, 230,000 dead | Global |
| 2005 | The Most Expensive Hurricane in US History, Katrina had over $200 Billion in damage | National |
| 2008 | Housing Market Crash, Financial Crisis (S&P down 57% by March 2009) | Global |
| 2008 | Magnitude 8 Earthquake leaves millions of people homeless in China | National |
| 2010 | Floods in China affect over 60 million people | National |
| 2011 | Tsunami Devastates Japanese Coast – Most Expensive Natural Disaster in History | National |
| 2012 | Superstorm Sandy causes about $90 Billion in East Coast Damage; NYSE closes | National |
| 2013 | Typhoon Haiyan (the strongest typhoon in history?) displaces nearly 4 million people | National |
| 2014 | Russia Invades Ukraine (Chapter 1) | Global |
| 2015 | Coordinated Terror Attacks in Paris | Global |
| 2015 | A magnitude 7.8 Earthquake leaves a million people homeless in Nepal | Regional |
| 2017 | Three of the Most Expensive Hurricanes in US History: Harvey, Irma, and Maria | National |
| 2020 | Covid-19 Pandemic disrupts economy, travel… | Global |
| 2022 | Russia Invades Ukraine (again) | Global |
| 2023 | Hamas invades Israel, Takes 251 hostages | National |
| 2023 | Houthis’ Red Sea attacks begin, Disrupting world shipping | Global |
| 2024 | Drought Disrupts Panama Canal and World Shipping | Global |
| 2025 | Los Angeles Wildfires | Regional |
| 2025 | New Administration Trade/Tariff Policy | Global |

Totals: Global Disruptions 15, National Disruptions 14, Regional Disruptions 4

Some disruptions have a dramatic impact on data sets. Others may not damage the data you care about. A cool year might be particularly good for Riesling and terrible for Cabernet Sauvignon. So, if your AI depends on high-quality data for machine learning, you should ask two questions: is your data “good enough” for your purposes, and is the world tomorrow “close enough” to the yesterday your AI and ML are learning from?

At Lone Star, our Evolved AI® incorporates methods to cope with bad vintage data and even no data at all. If your AI cannot do this, perhaps we should talk.
