In nearly every case, your big data has gaping holes in it. That may sound counterintuitive; after all, many data science practices involve dimensionality reduction, making Big Data smaller rather than filling it in.

So, how can it be that your terabytes or zettabytes are incomplete? The full answer is too long for this short blog, but here are some questions worth asking:

Are we forgetting the weather data?  We see this pattern reappear across vastly different clients and applications. We stare at the data trying to figure out what’s causing a trend or a sudden spike in failures. Then, someone finally mentions the hail storm or heat wave or hurricane that shut down the factory for a week. A problem is not confined to a single dataset; we must continually ask ourselves what other explanatory data is available.
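A minimal sketch of the idea, using made-up failure counts and a hypothetical weather-event log: joining the two sources by week is often all it takes for the spike to explain itself.

```python
from datetime import date

# Hypothetical weekly failure counts from a factory dataset
failures = {
    date(2023, 6, 5): 12,
    date(2023, 6, 12): 11,
    date(2023, 6, 19): 47,   # unexplained spike
    date(2023, 6, 26): 13,
}

# External weather-event log: an assumed second data source
weather_events = {
    date(2023, 6, 19): "hail storm",
}

# Join the two sources so the spike gets its explanation
for week, count in sorted(failures.items()):
    note = weather_events.get(week, "")
    print(f"{week}  failures={count:3d}  {note}")
```

In practice the "join key" may be a time window or a region rather than an exact date, but the principle is the same: the explanation lives outside the dataset you started with.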

This problem is made more complex by interconnected global supply chains. A change in weather or an earthquake on the other side of the world could be affecting your own operations without your knowing it.

Are we trying to rediscover physics?  We often see eager data scientists reinventing the wheel. In the rush to apply the newest algorithm, it can be easy to forget that many applications have very firm fundamentals at their core. If you’re serious about data analysis, a good working knowledge of statistical principles is key.

Of course, if you’re dealing with consumer preferences in the beverage market, there is no coherent underlying physics. However, when we see that the default approach to analyzing turbine engine data is to throw a neural net at it, we get squeamish. Just think how much better our analytics could be if we took the time to consider some thermodynamics.
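As a hedged illustration (not a description of any specific Lone Star method), a physics-grounded feature can do more work than a raw sensor column. The Carnot limit, 1 − T_cold/T_hot, bounds the efficiency of any heat engine, so the shortfall from that limit is often a more informative model input than the temperatures themselves. The turbine readings below are invented:

```python
# A physics-grounded feature: the Carnot limit bounds any heat engine's
# efficiency, so deviation from it is more informative than raw readings.
def carnot_efficiency(t_hot_k: float, t_cold_k: float) -> float:
    """Maximum theoretical efficiency of a heat engine (temperatures in Kelvin)."""
    if t_hot_k <= t_cold_k or t_cold_k <= 0:
        raise ValueError("require t_hot_k > t_cold_k > 0")
    return 1.0 - t_cold_k / t_hot_k

# Hypothetical turbine reading: compare measured efficiency to the physical limit
t_hot, t_cold = 1500.0, 300.0              # Kelvin (assumed values)
measured = 0.38                            # observed efficiency (assumed)
limit = carnot_efficiency(t_hot, t_cold)   # theoretical ceiling: 0.8
shortfall = limit - measured               # engineered feature, not raw sensor data
```

Feeding `shortfall` to a model bakes in what two centuries of thermodynamics already established, instead of asking a neural net to rediscover it from scratch.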

Are we creating a divide between those who create analytics and those who consume them?   This is the largest hole of all. The Big Data analytics creator is usually not familiar with the decision maker’s domain.  Likewise, someone consuming the analytics may not understand the limitations and assumptions they rest on. This creates a gap at the handoff between creation and consumption, preventing us from fully understanding our data at all. Anything we can do to meld these two communities is a step in the right direction toward having complete data.

Do you have a time machine?  Unless you’re Marty McFly, I would assume the answer is no. Without a way to foresee the uncertainty and advancements the future will bring, how can you accurately account for future needs, features, and changes? You can’t, so it’s safe to say your data reflects only the past. It includes only the customers, products, regulations, and competitors you have today, and that view probably has some holes in it, too.

So, what can we do? Lone Star’s CTO, Eric Haney, has three suggestions:

  1. Lay out an Analytics Roadmap before starting Analytics: Clearly identify what problem you are trying to solve, what business metric a solution is going to improve, and what options you have for data sources. And do it early.
  2. Use the Data You Have before Investing in the Data You Don’t: Be inventive with sourcing outside data. Fill in gaps with inference, estimation, and domain expertise. Only when these approaches fail should resources be spent collecting a wider dataset.
  3. Don’t rely on Brute Force when Newton is in Your Corner: Remember that we are thousands of years into civilization. The Babylonians were doing trigonometry more than 3,000 years ago. You are not starting from scratch, or at least you shouldn’t be.
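The second suggestion can be sketched in a few lines: first infer missing values from the data you already have, then let domain expertise override the inference where it knows better. The sensor readings and the engineer's override below are hypothetical:

```python
from statistics import median

# Hypothetical sensor series with gaps (None = missing reading)
readings = [21.0, 22.5, None, 23.0, None, 22.0]

# Step 1: infer missing values from the data you already have
estimate = median(r for r in readings if r is not None)
filled = [r if r is not None else estimate for r in readings]

# Step 2: let domain expertise override inference where it knows better
# (e.g. an engineer knows the third reading fell during a shutdown)
domain_overrides = {2: 0.0}
for i, v in domain_overrides.items():
    filled[i] = v
```

Only if a simple fill like this demonstrably fails does it make sense to pay for a wider data collection effort.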

About Lone Star Analysis

Lone Star Analysis enables customers to make insightful decisions faster than their competitors.  We are a predictive guide bridging the gap between data and action.  Prescient insights support confident decisions for customers in Oil & Gas, Transportation & Logistics, Industrial Products & Services, Aerospace & Defense, and Military & Intelligence.

Lone Star® delivers fast time to value supporting customers’ planning and ongoing management needs.  Utilizing our TruNavigator® software platform, Lone Star® brings proven modeling tools and analysis that improve customers’ top line, by winning more business, and their bottom line, by quickly enabling operational efficiency, cost reduction, and performance improvement. Our trusted AnalyticsOS™ software solutions support our customers’ real-time predictive analytics needs when continuous operational performance optimization, cost minimization, safety improvement, and risk reduction are important.

Headquartered in Dallas, Texas, Lone Star is found on the web at

