So… you want to be a Data Scientist…
Lone Star provides software for analytics, along with transparent analytic solutions. We’ve done extensive benchmarking to understand the Age of Algorithms.
So we are frequently asked questions like, “What makes a great data scientist?”
It’s a big subject. For a start, here are six things we believe you need to be a data scientist:
- Understand the scientific method. Science is about testing an idea. In data science we either have ideas we want to test, or we are looking in data for patterns which might lead to ideas we can test. Failure to grasp this idea of testing (and falsification) is the biggest problem we see in our work. The scientific method is NOT dumping data into some code you found on GitHub, and blindly reporting what the mystery code generated. Look up Karl Popper or watch this little 2-minute video.
- Have a good working knowledge of statistical principles. Feynman said it’s more important to understand things than to know their names. While it’s nice to know the names of distributions like Poisson or Gamma, it is much more important to grasp some basic ideas about probability and statistics. Memorizing a stats book is useless if you don’t grasp the ideas. On the other hand, if you understand the ideas, you can always look up the names later, should you need them.
- Be comfortable with some coding. Today’s code is pretty gentle, and a lot of it exists in code repositories. You don’t need to be a master coder, but it helps to be competent when it comes to the basics. Lone Star uses no-code tools such as TruNavigator and AnalyticsOS for much of our work, and we use R along with other code sets, too.
- Be ready to spend most of your time cleaning data. Nearly all data sets are corrupted, polluted, distorted, and out of date. Dirty data is inevitable, so it is imperative not to kid yourself about it. Cleaning will be about 80% of your job.
- Be creative about how to fill in the gaps in the data you need. This is one of Lone Star’s great strengths, as we’re proficient in finding ways to quantify what’s missing from our clients’ data. Although there are several ways to collect missing data, from SMEs to audits, finding the missing pieces sometimes involves outside-the-box solutions and creating Little Questions® to get your answer.
- Understand cognition. Most data science involves uncertainty. Humans are bad at this, as it’s not how our brains are wired. Therefore, there are all kinds of crazy behavioral biases in data which reflect human activity. These are compounded by biases (and fibs) in what people SAY about their activities. These are then further compounded by our biases in how we analyze our data. In the end, our findings will be distorted by the biases of the decision maker we report to. It’s more common than we think to distort perceptions, memories, and thinking in relation to data. Reading Kahneman or watching a TED talk, like this one, can give further insight into why these distortions are so common.
Alone, these six things won’t make you a great data scientist. However, you can’t really be any kind of data scientist without these six things.
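To make the first point concrete, here is a minimal sketch of what “testing an idea” can look like in code. It is one possible approach, not Lone Star’s method: a simple permutation test, written in plain Python, that asks how often a difference between two groups as large as the one observed would show up if the group labels were meaningless. The function name `permutation_test`, its parameters, and the `before` and `after` numbers are all hypothetical and invented for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: it shuffles the pooled values
    many times and counts how often a random split produces a difference
    at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements, not real data.
before = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
after = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

p_value = permutation_test(before, after)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value says the idea “these two groups are the same” did not survive the test. A large one says the data gave you no reason to reject it. Either way, you stated an idea first and then tried to knock it down, which is the opposite of blindly reporting what mystery code generated.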