Why 2025 is a Bad Vintage for Machine Learning
Someday, you will read reviews saying things like, “2017, that was a great vintage for Machine Learning Data” or “2025 was a very bad data vintage.”
For grapes, what makes a “great vintage” is weather bad enough to make grapevines struggle but not bad enough to kill the vines. When the vines struggle a little, they often create superior grapes.
You want plenty of sunshine and enough rain. But you do not want too much rain; even if the vines flourish, the grapes may rot, develop fungal diseases, or simply be lower quality. Of course, this varies by region and grape type. Wonderful weather for a Riesling could be terrible for a Cabernet Sauvignon.
The same kinds of factors apply to the best data for Machine Learning. You want enough variation for distinct factors and trends to emerge, but not disruptions so large that they wipe out the patterns you want to learn.
For example, if you are tracking consumer data, you might want to know whether consumers spend more money immediately after payday and less several days later. That pattern would suggest they have limited discretionary income to spend. Or you might want to see how gasoline purchases define the summer driving season when many Americans “hit the road.” For this, we need repeatable patterns over weeks, months, and years.
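As a rough illustration of the payday example, the sketch below compares average spending just after payday with spending later in the pay cycle. Everything in it is an assumption made for illustration: the function name `payday_lift`, the fixed 14-day pay cycle starting on day 0, the 3-day "just after payday" window, and the synthetic numbers. A ratio well above 1 that repeats year after year is the kind of pattern a model can learn; a ratio near 1 suggests the pattern has been wiped out.

```python
from statistics import mean

def payday_lift(daily_spend, payday_interval=14, window=3):
    """Ratio of mean spend in the first `window` days after payday
    to mean spend on the remaining days of each pay cycle.

    Assumes (hypothetically) that day 0 is a payday and that paydays
    recur every `payday_interval` days.
    """
    early, late = [], []
    for day, amount in enumerate(daily_spend):
        days_since_payday = day % payday_interval
        if days_since_payday < window:
            early.append(amount)
        else:
            late.append(amount)
    return mean(early) / mean(late)

# Synthetic data: 26 pay cycles per "year".
# Normal year: higher spend for 3 days after payday, then lower spend.
normal_year = ([150.0] * 3 + [100.0] * 11) * 26
# Disrupted year: spending is flat, so the payday pattern is gone.
disrupted_year = [100.0] * (14 * 26)

print(payday_lift(normal_year))     # 1.5
print(payday_lift(disrupted_year))  # 1.0
```

Running the same check on each year of history is one simple way to see which "vintages" preserve the pattern and which do not.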
2020 US Airline Travel
Like wine vintages, some years create terrible data for Machine Learning. 2020 was a pretty terrible year. Covid disrupted everything. The number of people going through TSA checkpoints tells part of the story. In 2025, tariffs and the threat of trade wars are shifting global economic activity, so 2025 will not be a great vintage for data either.
You might object that 2020 was a “black swan,” but that misses the point. We ALWAYS have “rare events.” Here are some “black swans” that might affect the quality of your data vintage.
“Black Swans” Impacting Data Quality
Year | Event | Scale |
1990 | Saddam Hussein orders the invasion of Kuwait, leading to the first Gulf War | Global |
1991 | Mount Pinatubo Eruption causes widespread weather, travel, and economic disruption | Global |
1992 | 45 US Disasters – New Record for US FEMA | National |
1993 | Largest floods in US History (Missouri, Mississippi Rivers) | National |
1994 | Most Expensive Earthquake in US History, Northridge, California | Regional |
1995 | Chicago Heat Wave kills over 700 people | Regional |
1996 | New York City Blizzard – Worst since 1888 – Drifts 8 feet high | Global |
1997 | Asian Financial Crisis | National |
1998 | Worst US Drought and Heatwave since the Great Depression disrupts crops and meat production | National |
1998 | Yangtze flood in China leaves 15 million homeless | National |
1999 | December rainstorms cause thousands of landslides in Venezuela | National |
2000 | Dotcom Bubble Bursts, A Trillion Dollars of Market Capital Lost | Global |
2001 | 9/11 Terror Attack; Twin Towers, Pentagon | Global |
2002 | Worst European flooding in centuries disrupts barge traffic on the Danube, Elbe, and Vltava (Moldau) Rivers | Global |
2004 | Tsunami impacts 17 nations, 230,000 dead | Global |
2005 | The Most Expensive Hurricane in US History, Katrina had over $200 Billion in damage | National |
2008 | Housing Market Crash, Financial Crisis (S&P down 57% by March 2009) | Global |
2008 | Magnitude 8 Earthquake leaves millions of people homeless in China | National |
2010 | Floods in China affect over 60 million people | National |
2011 | Tsunami Devastates Japanese Coast – Most Expensive Natural Disaster in History | National |
2012 | Superstorm Sandy causes about $90 Billion in East Coast Damage; NYSE closes | National |
2013 | Typhoon Haiyan (the strongest typhoon in history?) displaces nearly 4 million people | National |
2014 | Russia Invades Ukraine (Chapter 1) | Global |
2015 | Coordinated Terror Attacks in Paris | Global |
2015 | A magnitude 7.8 Earthquake leaves a million people homeless in Nepal | Regional |
2017 | Three of the Most Expensive Hurricanes in US History: Harvey, Irma, and Maria | National |
2020 | Covid-19 Epidemic disrupts economy, travel… | Global |
2022 | Russia Invades Ukraine (again) | Global |
2023 | Hamas invades Israel, Takes 251 hostages | National |
2023 | Houthis’ Red Sea attacks begin, Disrupting world shipping | Global |
2024 | Drought Disrupts Panama Canal and World Shipping | Global |
2025 | Los Angeles Wildfires | Regional |
2025 | New Administration Trade/Tariff Policy | Global |
Totals | Global Disruptions | 15 |
Totals | National Disruptions | 14 |
Totals | Regional Disruptions | 4 |
Some disruptions have a dramatic impact on data sets. Others may not damage the data you care about. A cool year might be particularly good for Riesling and terrible for Cabernet Sauvignon.
So, if your AI depends on high-quality data for machine learning, you should probably be concerned about whether your data is “good enough” for your purposes and whether tomorrow’s world will be “close enough” to yesterday’s data that your AI and ML are learning from. At Lone Star, our Evolved AI® incorporates methods to cope with bad vintage data and even no data at all. If your AI cannot do this, perhaps we should talk.
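One generic way to ask whether recent data is “close enough” to the data a model learned from is a Population Stability Index (PSI), which compares how two samples spread across the same bins. The sketch below is only an illustration of that general technique, not a description of Evolved AI®. The function name, the bin count, the synthetic data, and the 0.25 threshold (a commonly quoted rule of thumb, not a hard rule) are all assumptions.

```python
import math

def population_stability_index(baseline, recent, bins=10):
    """PSI between two numeric samples, using equal-width bins
    spanning the range of the baseline sample."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, r = shares(baseline), shares(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))

# Synthetic illustration: a "normal" year versus a year in which
# every value collapses to 40% of its usual level.
normal_year = [float(v) for v in range(100)]
disrupted_year = [v * 0.4 for v in normal_year]

same = population_stability_index(normal_year, normal_year)
shifted = population_stability_index(normal_year, disrupted_year)
print(same, shifted)
print("retrain or investigate" if shifted > 0.25 else "looks stable")
```

A PSI of zero means the two samples fill the bins identically; larger values mean the recent sample no longer looks like the vintage the model was trained on.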