Author: Stephanie Glen
“Data Scientist” is 2020’s equivalent of the rocket scientist of the 1950s: mysterious, sexy, and well-paid. But are you actually a “scientist”? While “data science” isn’t fully defined yet as an academic subject (National Academies of Sciences, Engineering, and Medicine, 2018), more and more evidence seems to point to it being more of an art than a science.
So if the essence of data science isn’t yet solidified, how can I make the bold statement that you’re an artist, not a scientist? If we can’t yet agree on what exactly the core tenets of data science are, we can take a look at its main components: programming and statistics–neither of which is actually a science either.
So, Programming Isn’t a Science?
Renowned Stanford computer scientist Donald Knuth, whom the NY Times calls “The Yoda of Silicon Valley,” eloquently lays any argument to rest (as cited on SNHU):
“Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.”
Consider a different perspective, this time from artist Warren Sack, chair and professor of the Film and Digital Media Department at UC Santa Cruz. Professor Sack studied computer science as an undergraduate and had this to say about his experiences with programming (as cited in a UC Santa Cruz article):
“Ever since I was an undergraduate computer science major taking art courses, it seemed obvious to me that writing software is an art.”
Surely Statistics is a Science?
This one is a little trickier. Statistics has gone through many transformations since its Biblical-era inception as a tally-keeper for governments or states. The argument about whether statistics is a science isn’t anything new. Back in 1978, M. Healy noted in the Journal of the Royal Statistical Society article Is Statistics a Science? that statistics “…may itself be best considered as a technology rather than a science.” Statistics has transmogrified over time into a behemoth of a subject, filled with “…a diverse set of methods that contradict each other” (Mark van der Laan, Professor in Biostatistics and Statistics at UC Berkeley).
Putting aside the notion that statistics may be an art form itself, what is clear is that it has been applied to numerous “-metrics” fields (econometrics, biometrics, psychometrics…), all of which, without the application of statistical methods, clearly fall into the category of Arts. The addition of statistics to the mix–including estimation, testing, and prediction–muddies the waters a little, but the application of algorithms doesn’t magically turn an Art into a Science. So, if you view statistics as a set of scientific tools that can be applied to artistic fields (as well as technological ones, like computing), that still does not make statistics “Science.” It would be like calling a ruler, protractor, and slide rule “Science.” It’s not the tools that define science, but rather the rules, laws, and procedures that govern what you do with those tools. Yes, statistics contains rules, procedures, and algorithms. But it also requires a hefty amount of guesswork and creativity. To say that statistics is a science because it contains rules, procedures, and algorithms would be like stating that the visual arts are also a science because of their use of techniques, perspective, and proportions.
Consider the quote “Art is the skill or the power of performing certain actions…the practical application of set-up rules of principles to practice” (Alagar, 2009, p. 4). Change the word “art” to “statistics” and the sentence still makes sense, in that it describes the role of the statistician perfectly:
“[Statistics] is the skill or the power of performing certain actions…the practical application of set-up rules of principles to practice.”
The fact that the scientific method and statistics share a love of hypotheses doesn’t make statistics a science either. Hypotheses are the backbone of the scientific method, without which science would not exist. The same isn’t true of statistics: take out hypothesis testing and you’re still left with an abundance of creative and exploratory tools that don’t rely on the detested p-value: Bayesian methods, exploratory data analysis, trend analysis, and descriptive statistics, to name but a few.
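To make that last point concrete, here is a minimal sketch of two such p-value-free tools: a descriptive summary and a simple Bayesian update. The numbers are made up for illustration, and the function names `describe` and `beta_posterior_mean` are hypothetical helpers, not anything from the sources cited here. The uniform Beta(1, 1) prior is likewise just an assumption.

```python
import statistics

# Hypothetical sample: daily sign-ups over ten days (made-up numbers).
signups = [12, 15, 11, 18, 14, 13, 20, 16, 15, 17]


def describe(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def beta_posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a rate under an assumed Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


summary = describe(signups)

# Hypothetical outcome: 7 successes and 3 failures in 10 trials.
rate = beta_posterior_mean(successes=7, failures=3)

print(summary)
print(round(rate, 3))
```

Neither step involves a null hypothesis or a significance threshold. Deciding which summaries to report and which prior to assume is left to the analyst's judgment.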
But What About Modeling?
One of the most important contributions statistics gives to data science is modeling. At its best, modeling is an impressive, powerful technique for understanding data and making predictions. Statistical modeling is based on a set of precise rules that allow you to transform a glut of data into a comprehensible, workable model. If “Science is the body of systematic knowledge…the observation of certain facts” (Alagar, 2009, p. 4), then statistical modeling is science at its best. However, there’s a problem. In an ideal world, the mathematics (and thus, the science) behind modeling would be followed to the letter. But if you’ve spent any time with statistical modeling at all (which you probably have, if you’re a data scientist), you know that adhering to the strict rules and assumptions is challenging, and it’s all too easy to deviate from the unwieldy rules. Statisticians don’t always play by the rules either: “These statisticians do not respect the definition of a statistical model” (Mark van der Laan, Professor in Biostatistics and Statistics at UC Berkeley).
If statisticians don’t always get it right, the odds are that you, as a data “scientist,” probably aren’t getting it right either–at least, in the technical sense of the word. So if you aren’t following a set of rigid rules, then you’re getting creative–and hence, you’re probably an artist.
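As a rough illustration of how easy it is to bend a model's assumptions, the sketch below fits a straight line by ordinary least squares to data that are plainly curved. The data are made up, and `fit_line` and `residuals` are hypothetical helper names. The fit runs without complaint; only a look at the residuals hints that the linearity assumption has been ignored.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data that follow a curve (y = x squared), not a line.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 36]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

# Residuals are positive at both ends and negative in the middle,
# a pattern that suggests a straight line is the wrong shape here.
print(slope, intercept)
print([round(r, 2) for r in res])
```

Whether to keep the line, transform the data, or switch models is a judgment call that the arithmetic alone does not settle.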
Statistics as a Liberal Art
So, if statistics isn’t a science, then what is it? You could argue that statistics is a blend:
“Thus, we can conclude that statistics is both science and art” (Alagar, 2009, p. 4).
In his article Statistics Among the Liberal Arts, David Moore argues that statistics should fall under the umbrella of the liberal arts:
“The liberal arts are usually understood to be general and flexible modes of reasoning. By this definition, statistics qualifies as a liberal art, and it is important to the health of the discipline that it be recognized as such.”
Data Sci-Art, Anyone?
If a data point labeled “data science” were fed into a classification algorithm (based on the above “rules”), it would undoubtedly be classified as an art. Or, if you’re still on the fence and want to argue that there is still a little science in “data science,” perhaps you can call yourself a Data Sci-Artist? Not quite as sexy as “Data Scientist,” but perhaps a little more truthful.
References
Agresti, A. The Art and Science of Learning from Data. 2013. Pearson.
Alagar, K. Business Statistics, 2nd ed. 2009. McGraw Hill India.
van der Laan, M. Statistics as a Science, Not an Art: The Way to Survive in Data Science. 2015.
Healy, M. Is Statistics a Science? In Journal of the Royal Statistical Society. Series A (General) Vol. 141, No. 3 (1978), pp. 385-393 (9 pages)
Moore, D. S. Statistics Among the Liberal Arts. Journal of the American Statistical Association, Vol. 93, No. 444 (1998).
National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.