What if I Don’t Have Historical Data? – Advanced Analytics When Your Data Lake is Dry
What if I Don’t Have Historical Data? Here are three ways Lone Star does advanced analytics even when your data lake is dry
The rise of Business Intelligence (BI) and Artificial Intelligence (AI) has been powerful. But even good things can bring unintended negative consequences. One of the worst ideas from the rise of AI/BI is linking “data” and “analytics.”
Your immediate reaction might be confusion. How can we talk about analytics without historical data? The answer lies in your definition of data. AI/BI needs rich, full data lakes. But terabytes and zettabytes are not the only measures of datasets. What do you do when your data lake is dry or barely damp?
Lone Star has recently had discussions with multiple impressive “pure play AI” firms. We discussed cases where industrial data drought challenged mainstream, big data-driven AI. The other firms were nonchalant about their inability to solve important classes of problems. They seemed to believe that the world would soon have all the data they needed, and that there would be plenty of historical data for analytics.
We disagree. There are many reasons we will NEVER have the data to train “deep” AI (also known as brute force AI) for some problems. We are happy to debate that point. More importantly, many customers across industrial verticals have been forced to the same viewpoint. They spent millions of dollars and years trying to build a dataset they could use for advanced analytics. Their scars are perhaps the most convincing evidence supporting our skepticism.
So, what can we do? Lone Star has three answers.
Our first method: talk to domain experts. They know a lot, and generally, they know more than they think they do. We had a customer who needed a predictive/prescriptive analytics solution for maintenance staffing yet had little or no usable maintenance-related data. They owned a fleet of Beechcraft King Air twin-turboprops. We recruited eight experienced maintenance professionals and went through the Beech maintenance manuals line by line, asking them, “How long does THIS maintenance task take?”
About a week (and dozens of pizzas) later, we had a dataset with accurate statistical spreads for nearly everything maintainers do on a King Air. Building the data from “real life” would have required logs from tens of millions of flight hours and tens of millions of maintenance hours. These didn’t exist in a usable form (and maybe not at all). But now, Lone Star has this data for one of the most ubiquitous aircraft in the world.
Extracting information from experts is tricky. It is MUCH easier to do wrong than right. Lone Star uses training and technology to ensure we do it correctly. In this arsenal is TruCast®, a web-based solution that enables efficient polling of a panel of experts without introducing bias. During the COVID-19 travel bans, it was invaluable.
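To make the idea of a “statistical spread” from expert answers concrete, here is a minimal sketch. It assumes each expert gives a low, most likely, and high estimate per task, pools them by simple averaging, and treats each task as a triangular distribution. The task names, the numbers, and the pooling rule are all hypothetical; this is not a description of TruCast® or of the King Air dataset.

```python
import random
import statistics

# Hypothetical expert answers in hours: (low, most likely, high) per expert.
# Task names and values are invented for illustration only.
EXPERT_ESTIMATES = {
    "inspect propeller": [(1.0, 1.5, 3.0), (0.8, 1.2, 2.5), (1.0, 2.0, 4.0)],
    "replace tire": [(0.5, 1.0, 2.0), (0.7, 1.0, 1.5), (0.5, 0.8, 2.5)],
}


def pool(estimates):
    """Average the experts' low, most likely, and high values into one triple."""
    lows, modes, highs = zip(*estimates)
    return statistics.mean(lows), statistics.mean(modes), statistics.mean(highs)


def simulate_total_hours(task_estimates, trials=10_000, seed=0):
    """Monte Carlo draw of total labor hours across all tasks."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pooled = [pool(e) for e in task_estimates.values()]
    totals = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode), in that order.
        totals.append(sum(rng.triangular(lo, hi, mode) for lo, mode, hi in pooled))
    return totals


totals = simulate_total_hours(EXPERT_ESTIMATES)
deciles = statistics.quantiles(totals, n=10)  # nine cut points: p10 ... p90
print(f"p10={deciles[0]:.2f} h, p50={deciles[4]:.2f} h, p90={deciles[8]:.2f} h")
```

The output is a range rather than a single number, which is what a staffing decision needs. Simple averaging is only one way to pool a panel; a real elicitation would also have to deal with calibration and bias between experts.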
The second method is to leverage BI and AI on the data we have. Very often, clients claim they “have all the data you’ll need.” Someday, this will be true, but after hundreds of successful engagements, we haven’t seen it yet. However, falling short of “all” needed data isn’t the same thing as “no” data.
For brute-force methods of AI, all or nothing is a fatal problem. We aren’t alone in this view. Hubbard Decision Research promotes a sophisticated perspective on Information Economics. The basic idea is simple. It costs money to collect, store, and process data. Is all that trouble worth what it costs? Information Economics will lead you to reject the all-or-nothing view and realize there are many shades of grey between “no data” and “all the data you ever dreamed of.”
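One common way to put a number on “is the data worth what it costs?” is the expected value of perfect information: how much better your decision would be, on average, if the uncertainty were removed. The sketch below is a textbook-style illustration with invented probabilities and payoffs; it is an assumption on our part about how to illustrate the idea, not a reproduction of any particular firm’s procedure.

```python
# Hypothetical decision: proceed with a project or not (not proceeding pays 0).
# Each scenario is (probability, net payoff in dollars if we proceed).
SCENARIOS = [
    (0.6, 500_000),
    (0.4, -400_000),
]


def value_without_information(scenarios):
    """Best expected payoff when we must decide before knowing the scenario."""
    proceed = sum(p * payoff for p, payoff in scenarios)
    return max(proceed, 0.0)


def value_with_perfect_information(scenarios):
    """Expected payoff if we learn the scenario first, then pick the best action."""
    return sum(p * max(payoff, 0.0) for p, payoff in scenarios)


def expected_value_of_perfect_information(scenarios):
    """Upper bound on what any data collection about this decision is worth."""
    return value_with_perfect_information(scenarios) - value_without_information(scenarios)


evpi = expected_value_of_perfect_information(SCENARIOS)
print(f"Most we should pay for more data: ${evpi:,.0f}")
```

In this made-up case, the ceiling is about $160,000. Spending more than that to fill the data lake for this one decision would not pay off, and partial data that only narrows the uncertainty is worth something in between.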
So, how do you use limited historical data for analytics when it’s “not enough” to train an AI or create a clear BI visualization?
You add known cause-effect relationships. We’ve done this for several customers in rail transportation, oil and gas production, telecom networks, factory operations, and electrical power generation. One customer was able to find solutions to complex problems created by COVID-19 shocks in their production system. Clearly, there was no data for COVID-19 in that data lake, yet they were still able to navigate through uncertainty. We can’t disclose the specifics here, but all of these engagements succeeded in spite of nearly dry data lakes.
An example we can talk about involves an exceptionally reliable system in U.S. Navy ejection seats. On rare occasions, it blows up. Literally, it explodes. There is no data logger on the component that fails. Adding a sensor would cost many millions and require years of data collection, and the sensor would probably make the system less reliable, so it is not an option. But the Navy did have eight reasonably good data points. That information was augmented with some secondary data collection about temperatures around the seats. Most importantly, the Navy had experts who had promising ideas of cause-effect relationships. The result was a highly effective digital twin for each device installed in the jets.
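Why can eight data points be enough? When experts supply the shape of the cause-effect relationship, the data only has to pin down the few numbers the experts can’t. The sketch below assumes, purely for illustration, that wear grows exponentially with temperature at a rate the experts already know, leaving a single scale factor to fit. The relationship, the constant, and the eight observations are all synthetic; none of this is the Navy’s model or data.

```python
import math

# Hypothetical expert knowledge: wear = scale * exp(TEMP_SENSITIVITY * temp_c).
# The sensitivity comes from experts, so it is fixed rather than fitted.
TEMP_SENSITIVITY = 0.05


def expert_shape(temperature_c):
    """Expert-supplied shape of the temperature effect (assumed, not measured)."""
    return math.exp(TEMP_SENSITIVITY * temperature_c)


def fit_scale(observations):
    """Closed-form least-squares fit of the one unknown parameter, `scale`."""
    numerator = sum(wear * expert_shape(t) for t, wear in observations)
    denominator = sum(expert_shape(t) ** 2 for t, _ in observations)
    return numerator / denominator


def predict_wear(scale, temperature_c):
    """Predict wear at a temperature, including ones never observed."""
    return scale * expert_shape(temperature_c)


# Eight synthetic observations: (temperature in C, measured wear).
OBSERVATIONS = [
    (10, 0.33), (15, 0.43), (20, 0.54), (25, 0.70),
    (30, 0.90), (35, 1.15), (40, 1.48), (45, 1.90),
]

scale = fit_scale(OBSERVATIONS)
print(f"fitted scale: {scale:.3f}")
print(f"predicted wear at 60 C: {predict_wear(scale, 60):.2f}")
```

A black-box model would need to learn the whole curve from data. Here the experts carry the curve and the data carries one number, which is why a nearly dry data lake can still be useful.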
Third, there are new AI methods, some of which are called “hybrid AI.” We prefer the term “Evolved AI.” The new methods address the criticism that mainstream AI is “greedy, brittle, opaque, and shallow” and the realization, as The Economist says, that AI’s limitations are starting to sink in. With mainstream AI, it takes a lot of historical data to get to the analytics you need.
A full description of these methods can’t be presented in a short blog. But here are a few principles:
- We must segregate intelligence functions and orchestrate them, much the same as natural brains work
- We must free ourselves from the constraints of the Universal Approximation Theorem, which we don’t fully understand (in particular, we need to be free to deal with non-linear and discontinuous systems)
- We need to transparently control the hierarchy of AI actions based on rules we can control and not risk whatever the algorithm decides (what Asimov hoped for)
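The last principle, a transparent hierarchy of rules sitting above whatever the algorithm proposes, can be sketched in a few lines. Everything here is hypothetical: the rule names, the thresholds, the state fields, and the stand-in model are invented for illustration, and this is a generic pattern rather than Lone Star’s implementation.

```python
from typing import Callable, Optional

# A rule looks at the state and the proposed action, and returns a
# replacement action, or None to let the proposal pass through.
Rule = Callable[[dict, str], Optional[str]]


def safety_rule(state: dict, action: str) -> Optional[str]:
    # Hypothetical hard limit: never speed up when the unit is running hot.
    if state["temperature_c"] > 90 and action == "increase_speed":
        return "hold_speed"
    return None


def maintenance_rule(state: dict, action: str) -> Optional[str]:
    # Hypothetical policy: overdue service takes precedence over the model.
    if state["hours_since_service"] > 500:
        return "schedule_service"
    return None


RULES = [safety_rule, maintenance_rule]  # ordered, highest priority first


def opaque_model(state: dict) -> str:
    """Stand-in for any learned model; this one always wants more speed."""
    return "increase_speed"


def decide(state: dict, model=opaque_model, rules=RULES):
    """Return (action, source) so every decision can be traced to its origin."""
    proposed = model(state)
    for rule in rules:
        override = rule(state, proposed)
        if override is not None:
            return override, rule.__name__
    return proposed, "model"


print(decide({"temperature_c": 95, "hours_since_service": 10}))
print(decide({"temperature_c": 50, "hours_since_service": 10}))
```

Because the rules are ordinary, ordered code, anyone can read which rule won and why, regardless of how opaque the model underneath may be.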
Evolved AI by Lone Star meets all these goals and may be the most credible path to something that is:
- Frugal with data economics rather than greedy
- Resilient to unexpected shocks, like the COVID-19 pandemic, rather than brittle
- Transparent and ethical in operation, rather than an opaque black box
- Rich in features rather than shallow
These three methods are not theories. Lone Star customers see them working every day. These aren’t the only weapons in our arsenal, but they illustrate some ways to avoid dying of thirst in your dry data lake bed.