Sixth Unsolved Problem in Data Science and Analytics
Our sixth unsolved problem is Avoiding Mathematical Malpractice.
Mathematical Malpractice is fun to say. It really feels good if you put the emphasis on “Mal.” The prefix traces back to the Latin “male,” meaning “badly.” In Old French, “mal” means “evil” or “wrong.” All those meanings come into play here. The malpractice in analytics and AI is bad, evil and wrong.
We recently completed a 3-year study on best practices in analytics, modeling and simulation, including AI. A common error we saw was assuming that uncertainty has a Normal distribution. There seem to be two reasons people do this. The first reason gets blamed on math professors. People think their professor told them nearly all uncertainty is normally distributed. And it’s even CALLED “normal”; can’t we expect that’s what we normally find? Newsflash: that’s not true. Here are real-world examples. The data on the left is from a study on happiness and its distribution around the world. There are 11 distributions displayed. None are Gaussian, or “Normal.”
Well, you say, that’s a bunch of soft social science guys with made-up subjective data. OK, look at distribution 2. That’s what AMA doctors report as wait times for insurance companies to approve the drugs or procedures they want to perform. It’s not normal, it’s not Poisson, and it’s not lognormal either.
Well, you say, there are still messy humans involved. So, look at distribution 3. That’s an autopilot’s altitude history. It also doesn’t match any named distribution, not even one named for a dead mathematician.
Distribution 4 is the recorded lengths of MLB games in one season. Distribution 5 is what the fans thought the game lengths were. Again: not normal, and not normal.
So, whatever your math professor said, or you thought they said, the normal assumption is dangerous.
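One way to stop assuming and start checking is to test a sample for normality before using it. The sketch below is a minimal illustration, not the method used in our study: it implements the Jarque-Bera test with only the Python standard library, and the two samples (`wait_times` and `bell_shaped`) are synthetic stand-ins made up for this example. Statistics libraries such as SciPy offer ready-made normality tests if you prefer not to roll your own.

```python
import math
import random


def jarque_bera(sample):
    """Return (statistic, approximate p-value) for the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments in population form, as the test statistic uses them.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality the statistic is asymptotically chi-square with
    # 2 degrees of freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


rng = random.Random(0)
# Synthetic data for illustration only (assumed shapes, not real measurements).
wait_times = [rng.expovariate(1.0 / 30.0) for _ in range(2000)]  # right-skewed
bell_shaped = [rng.gauss(30.0, 5.0) for _ in range(2000)]        # Gaussian

for name, data in [("wait_times", wait_times), ("bell_shaped", bell_shaped)]:
    stat, p = jarque_bera(data)
    print(f"{name}: JB statistic = {stat:.1f}, p-value = {p:.3g}")
```

A tiny p-value says the normal assumption is hard to defend for that sample. A large p-value does not prove the data are normal; it only means this particular test found no evidence against it.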
The second reason for this mistake is subtler.
People are using tools and methods that are valid only for normally distributed data, and they don’t even know it. They just dump data into a process and hope for the best, sort of like a doctor hoping that bleeding you will make you well. It seems like malpractice.
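A familiar example of a normal-only tool is the "mean plus or minus 1.96 standard deviations" rule for a 95% range. The sketch below is a hypothetical illustration using synthetic, right-skewed wait times (the exponential shape and the 30-day mean are assumptions made for this example). It compares the normal-theory range with percentiles read directly off the data.

```python
import random
import statistics

rng = random.Random(42)
# Synthetic, right-skewed "wait times" in days (illustrative only, not real data).
wait_times = [rng.expovariate(1.0 / 30.0) for _ in range(5000)]

mean = statistics.fmean(wait_times)
stdev = statistics.stdev(wait_times)

# Normal-theory 95% range: meaningful only if the data really are Gaussian.
normal_low = mean - 1.96 * stdev
normal_high = mean + 1.96 * stdev

# Distribution-free alternative: with n=40, the first and last cut points
# are the 2.5th and 97.5th percentiles of the sample.
cuts = statistics.quantiles(wait_times, n=40)
empirical_low, empirical_high = cuts[0], cuts[-1]

print(f"normal-theory range: {normal_low:.1f} to {normal_high:.1f} days")
print(f"empirical range:     {empirical_low:.1f} to {empirical_high:.1f} days")
```

On skewed data like this, the normal-theory range dips below zero, which is impossible for a wait time, and it understates the long upper tail. Nothing in the arithmetic warns you that the assumption failed.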
In our benchmarking, we did not find any self-labeled AI practitioners who were exemplars of best practice. We looked at several of them, and we tested their methods to see if we could repeat the results they claimed they got. Repeatable results are a hallmark of real science, as you might recall. None of the work we tested proved to be repeatable.
It’s possible that we just didn’t find the right practitioners to benchmark. But we did come up with a checklist for AI/ML risks. The checklist asks 10 questions, 4 of them dealing with mathematical malpractice.
Most sciences have agreed on what good practice looks like, so you know malpractice when you see it. We don’t have that for analytics, data science or AI. That’s our sixth unsolved problem.
About Lone Star Analysis
Lone Star Analysis enables customers to make insightful decisions faster than their competitors. We are a predictive guide bridging the gap between data and action. Prescient insights support confident decisions for customers in Oil & Gas, Transportation & Logistics, Industrial Products & Services, Aerospace & Defense, and the Public Sector.
Lone Star delivers fast time to value supporting customers’ planning and ongoing management needs. Utilizing our TruNavigator® software platform, Lone Star brings proven modeling tools and analysis that improve customers’ top line, by winning more business, and improve the bottom line, by quickly enabling operational efficiency, cost reduction, and performance improvement. Our trusted AnalyticsOS℠ software solutions support our customers’ real-time predictive analytics needs when continuous operational performance optimization, cost minimization, safety improvement, and risk reduction are important.
Headquartered in Dallas, Texas, Lone Star is found on the web at http://www.Lone-Star.com.