Author: William Vorhies
Summary: A little history lesson about all the different names by which the field of data science has been called, and why, whatever you call it, it’s all the same thing.
A little reminiscence, or for those of you who are only recently data scientists, a little history lesson.
Our profession of finding the signal in the data, whether supervised or unsupervised, got underway in the 90s. In the last 20+ years we’ve been called by a variety of names, and it’s not at all clear that any clarity was added as those names changed.
In fact, for a profession as concerned with accuracy as ours, we’ve done a pretty poor job of naming things. Take ‘Big Data’ for instance. It’s not really about ‘big’ at all; it’s just as much about velocity and variety as it is about volume. Or NoSQL, which has pretty much lost its meaning now that all those NoSQL DBs run SQL just fine. Or ‘artificial intelligence’, a term that has been hijacked by the press and by developers to put the sheen of AI on pretty much everything we do.
So just for fun, here’s a brief recap of all the names that have been used to describe what we do.
KDD (Knowledge Discovery in Databases): This is the oldest title I can personally remember for what we do, coined by Gregory Piatetsky-Shapiro in 1989. We still had both feet firmly planted in BI and its retrospective point of view, but there were already inklings that the same data could also tell us many things about the future.
Data Mining: DM was originally intended to describe what went on inside KDD, but, like ‘AI’, the term was widely adopted in the business press and became the more popular descriptor up through the mid-2000s.
Predictive Modeling: When I first got involved in data science in 2001, Predictive Modeling was the preferred term. It more accurately described what we were doing and the tools we were using to predict future behavior and future values.
It caught me by surprise that Gartner and the other review agencies changed that name almost immediately to ‘Predictive Analytics’. This was at the time when data viz began to play a more important role, and if you read the reports from that period, it looks like Gartner and the others generalized the name to ‘Analytics’ to give the data viz platforms a place at the table.
Prescriptive Analytics: In 2014 Gartner once again got involved in changing definitions by introducing Prescriptive Analytics as separate from Predictive Analytics. Gartner’s definition says we should differentiate between what ‘could happen’ (predictive) and what ‘should happen’ (prescriptive). I admit that I still see this as a distinction without a difference, since ‘prescriptive’ is merely ‘predictive’ with some optimization math applied. That’s something we had been doing all along, as the sketch below illustrates.
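To make that point concrete, here’s a minimal sketch of ‘predictive plus optimization’. Everything in it (the toy data, the demand model, and the price grid) is my own hypothetical illustration, not anything from Gartner’s definition:

```python
# A minimal sketch of "prescriptive = predictive + optimization".
# All data and names here are hypothetical illustrations.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy historical data: price charged and units sold (demand falls as price rises).
prices = rng.uniform(5, 20, size=500).reshape(-1, 1)
demand = 100 - 4 * prices.ravel() + rng.normal(0, 5, size=500)

# Predictive step: learn expected demand as a function of price.
model = GradientBoostingRegressor().fit(prices, demand)

# Prescriptive step: optimization math applied on top of the predictions,
# here a simple grid search for the revenue-maximizing price.
candidates = np.linspace(5, 20, 100).reshape(-1, 1)
predicted_demand = model.predict(candidates)
revenue = candidates.ravel() * predicted_demand
best_price = candidates.ravel()[np.argmax(revenue)]
print(f"Revenue-maximizing price: {best_price:.2f}")
```

The ‘predictive’ half is an ordinary regression; the ‘prescriptive’ half is just a search over its outputs, which is why the two feel like one activity in practice.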
Machine Learning: The term ML actually predates KDD, but it only came into common use in the last half of the 2000s. As our techniques for supervised and unsupervised learning became more diverse with the adoption of SVMs, ensemble methods, and the rebirth of ANNs, there was renewed focus on the fact that many new techniques belonged in the tent, provided they met the original criterion of discovering patterns in the data without being explicitly programmed to do so.
2006 marks the beginning of the NoSQL age, when open source Hadoop allowed us to begin applying ML techniques to unstructured and semi-structured text and image data. ML still describes the most commonly adopted business applications of data science (scoring models and forecasting) and remains the source of most of the value data science currently creates.
Deep Learning: The term Deep Learning, or its more explicit title Deep Neural Nets (DNNs), was an outgrowth of the introduction of NoSQL DBs and the rapidly increasing compute capacity of advanced chips and the cloud. To data scientists, DL/DNNs were, and are, the tool set that enabled what we came to see as artificial intelligence.
It took from the advent of open source Hadoop in 2006 until about 2016 or 2017 to reach human-capable levels of speech, text, and image recognition, the cornerstones of AI.
It’s also worth noting that the original field of ML continued to innovate better and better algorithms through about 2016. Very little in the way of major breakthroughs has occurred in ML since then, and for the last several years both ML and AI have been in mature implementation and value-harvesting phases.
Artificial Intelligence (AI): By 2017 the term “AI” had been fully appropriated by the press, the public, and even by developers. It evolved into a generic phrase literally defined as:
Anything that makes a decision or takes an action that a human used to take, or helps a human make a decision or take an action.
So as you have conversations with potential users today, you need to have that up-front qualifying conversation: ‘What do you really mean when you say you want an AI solution?’ The great majority of implementations continue to be Machine Learning, and recently, at least within the data science profession, there’s been a return to more accurately labeling this as “ML/AI”.
Is This Really All the Same Thing?
There is a wonderful set of just five questions that describes everything we do with our models in data science.
- Is this A or B?
- Is this weird?
- How much – or – How many?
- How is this organized?
- What should I do next?
I apologize that I’m unable to find the author’s name to credit. This simple summary drives home the fact that no matter what techniques you’re using, whether you’re deep in DNNs or in equally sophisticated ML techniques like XGBoost, what we do has a very common and easy-to-understand purpose, regardless of what you call it. A rough mapping of those questions to familiar techniques is sketched below.
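Purely as an illustration, here’s one way those five questions line up with familiar model families. The pairing with specific scikit-learn estimators is my own, not the original author’s:

```python
# A rough, hypothetical mapping of the five questions to familiar
# scikit-learn techniques; the pairing is illustrative, not canonical.
from sklearn.ensemble import (RandomForestClassifier, IsolationForest,
                              GradientBoostingRegressor)
from sklearn.cluster import KMeans

five_questions = {
    "Is this A or B?": RandomForestClassifier(),               # classification
    "Is this weird?": IsolationForest(),                       # anomaly detection
    "How much - or - How many?": GradientBoostingRegressor(),  # regression
    "How is this organized?": KMeans(n_clusters=3),            # clustering
    # The fifth question maps to a family, not a single estimator.
    "What should I do next?": "recommendation / reinforcement learning",
}

for question, approach in five_questions.items():
    name = approach if isinstance(approach, str) else type(approach).__name__
    print(f"{question} -> {name}")
```

Swap in DNNs or XGBoost for any of these and the questions stay the same, which is the point: the names change, the purpose doesn’t.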
Other articles by Bill Vorhies
About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2 million times.
He can be reached at:
Bill@DataScienceCentral.com or Bill@Data-Magnum.com