Author: Bill Schmarzo
I’ve been reading Cathy O’Neil’s interesting and (from a data scientist’s perspective) soul-searching book “Weapons of Math Destruction”, or WMD as the book abbreviates it. The book provides several real-world examples of how Big Data and Data Science – when not properly structured – can lead to ethically-wrong unintended consequences.
Chapter 3, “Arms Race: Going to College”, describes how the college ranking system developed by “US News & World Report” in 1983 has created its own self-fulfilling, misaligned ecosystem. Because of the influence the “US News & World Report” ranking has on the multi-billion-dollar college recruiting business, a few key metrics – SAT scores, student-teacher ratios, acceptance rates, alumni donations, freshman retention – get over-valued in colleges’ investment strategies.
The unintended consequence is that many colleges focus their investments on overly-opulent facilities and over-paid research faculty programs in an effort to increase their ranking, sometimes at the expense of a more holistic “quality education and enlightening personal experience” for their students.
But the “US News & World Report” ranking is greatly flawed by the omission of several critical metrics. For example, the ranking doesn’t consider price. If cost is not an issue for someone deciding to go to college, then that’s okay. But for the other 99% of us, cost is an important factor in determining a “quality” educational experience.
And that’s the challenge with AI model biases: if you don’t carefully consider the different variables and metrics against which you need to measure model progress and success, you may end up with AI models that deliver ethically-wrong unintended consequences.
So, how does one mitigate the negative impacts of models that are supposed to represent the real world, but actually provide a dangerously biased and skewed perspective on that world? Here are a couple of things that every organization can do to reduce the ethically-wrong unintended consequences caused by AI models that turn into “Weapons of Math Destruction”:
- Brainstorm a “diverse, sometimes conflicting set of metrics” across a diverse group of stakeholders that drives the AI Utility Function
- Thoroughly explore and quantify the costs of the AI model’s False Positives and False Negatives
#1. Brainstorm a Diverse Set of Metrics to Power the AI Utility Function
One way to avoid AI models that deliver unintended consequences is to invest the time upfront to brainstorm a “diverse, sometimes conflicting set of metrics” against which the AI model will seek to optimize. This means embracing a diverse set of stakeholders (a stakeholder map can help to identify the different stakeholders who either impact or are impacted by the AI model) who can provide a diverse set of perspectives on how best to measure the AI model’s progress and success.
To understand why it’s important to capture a diverse and sometimes conflicting set of metrics against which the AI model must seek to optimize, one needs to understand how an AI model (AI Agent) works (see Figure 1):
- The AI model relies upon the creation of an “AI Agent” that interacts with the environment to learn, where learning is guided by the definition of the rewards and penalties associated with actions taken by the AI Agent.
- The rewards and penalties against which the “AI Agent” seeks to take the “right” or optimal actions are framed by the definition of value as represented in the AI Utility Function.
- In order to create an “AI Agent” that makes the “right” decision, the AI Utility Function must be comprised of a holistic definition of “value” including financial/economic, operational, customer, societal, environmental and spiritual value (a simple illustrative sketch follows Figure 1).
Figure 1: Role of AI Agents in Continuously Learning and Adapting
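To make this reward-and-penalty loop concrete, here is a minimal, purely illustrative Python sketch (my own example, not from the original post): a toy agent learns the value of two hypothetical actions, and everything it learns about “right versus wrong” comes from whatever the utility function rewards and penalizes. The action names, reward and penalty values, and one-step learning loop are assumptions for illustration; a real AI Agent would also track environment state.

```python
import random

ACTIONS = ["aggressive_recruiting", "invest_in_teaching"]  # hypothetical actions

def utility(action):
    """Hypothetical AI Utility Function: the rewards and penalties that
    define "right versus wrong" for the agent."""
    rewards = {
        "aggressive_recruiting": 1.0,   # short-term boost to ranking metrics
        "invest_in_teaching": 0.2,      # smaller immediate reward
    }
    penalties = {
        "aggressive_recruiting": -1.5,  # long-term penalty (e.g., student debt, poor outcomes)
        "invest_in_teaching": 0.0,
    }
    return rewards[action] + penalties[action]

def train(episodes=1000, epsilon=0.1, alpha=0.1):
    """Simple bandit-style learner: estimates the value of each action
    purely from the utility signal it receives."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # Explore occasionally, otherwise exploit the current best estimate
        action = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
        q[action] += alpha * (utility(action) - q[action])
    return q

if __name__ == "__main__":
    print(train())  # the agent converges to whatever the utility function values most
```

If the utility function omits the long-term penalty, the agent happily learns to recruit aggressively; change the rewards and penalties and the “right” action changes with them.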
Bottom-line: the AI Agent determines or learns “right versus wrong” based upon the definition of value as articulated in the AI Utility Function. The AI Utility Function provides the metrics against which the AI model will learn the right actions to take in which situations (see Figure 2).
Figure 2: AI Utility Function
To avoid the unintended consequences of a poorly constructed AI Utility Function, collaboration with a diverse set of stakeholders is required to identify the short-term and long-term metrics and KPIs against which AI model progress and success will be measured. The careful weighing of the short-term and long-term metrics associated with the financial/economic, operational, customer, societal, environmental and spiritual dimensions must be taken into consideration if we are to make AI work to the benefit of all stakeholders (and maybe avoid those pesky Terminators in the process).
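As a thought experiment on what that collaboration might produce, the sketch below (the dimension names, stakeholder groups and weights are all illustrative assumptions) shows different stakeholder groups weighing the value dimensions differently, with the final AI Utility Function weights reconciled across them rather than set by the data science team alone.

```python
DIMENSIONS = ["financial", "operational", "customer",
              "societal", "environmental", "spiritual"]

# Hypothetical stakeholder groups and how each might weigh the dimensions
STAKEHOLDER_WEIGHTS = {
    "CFO":             [0.50, 0.25, 0.10, 0.05, 0.05, 0.05],
    "Operations":      [0.15, 0.45, 0.15, 0.10, 0.10, 0.05],
    "Customer Org":    [0.10, 0.10, 0.50, 0.15, 0.10, 0.05],
    "Community Board": [0.05, 0.05, 0.15, 0.40, 0.25, 0.10],
}

def blended_weights(stakeholder_weights):
    """Naive reconciliation: average each dimension's weight across stakeholders."""
    n = len(stakeholder_weights)
    return {
        dim: round(sum(w[i] for w in stakeholder_weights.values()) / n, 3)
        for i, dim in enumerate(DIMENSIONS)
    }

print(blended_weights(STAKEHOLDER_WEIGHTS))
# In practice the reconciliation is a negotiation (and includes short-term versus
# long-term trade-offs), not a simple average.
```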
To help brainstorm this diverse set of metrics, embrace the “Thinking Like a Data Scientist” methodology, which is designed to drive the cross-organizational collaboration necessary to root out and brainstorm these different metrics. The “Thinking Like a Data Scientist” process guides the identification of “diverse, sometimes conflicting metrics” for the data science modeling work, because the real world is full of “diverse, sometimes conflicting metrics” against which the world must try to optimize (see Figure 3).
Figure 3: The Art of Thinking Like a Data Scientist
A key deliverable from the “Thinking Like a Data Scientist” process is the Hypothesis Development Canvas. The Hypothesis Development Canvas helps in the identification of the variables and metrics against which one is going to measure the targeted use case’s progress and success: for example, increase financial value, while reducing operational costs and risks, while improving customer satisfaction and likelihood to recommend, while improving societal value and quality of life, while reducing environmental impact and carbon footprint (see Figure 4).
Figure 4: Hypothesis Development Canvas
The AI modeling requirements captured in the Hypothesis Development Canvas then need to be translated into the AI Utility Function that guides the metrics and variables against which the AI model will seek to optimize. Shortcutting the process to define the measures against which to monitor any complicated business initiative is naïve…and could ultimately be dangerous depending upon the costs associated with False Positives and False Negatives.
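As one possible (and deliberately simplified) way to picture that translation, the sketch below turns a set of canvas-style metrics, each with a direction and a stakeholder-agreed weight, into a single utility score. The metric names, directions and weights are illustrative assumptions, not the contents of an actual Hypothesis Development Canvas.

```python
# Hypothetical canvas metrics: (metric, direction, weight)
CANVAS_METRICS = [
    ("financial_value",          "maximize", 0.25),
    ("operational_costs",        "minimize", 0.15),
    ("operational_risk",         "minimize", 0.10),
    ("customer_satisfaction",    "maximize", 0.20),
    ("societal_quality_of_life", "maximize", 0.15),
    ("carbon_footprint",         "minimize", 0.15),
]

def ai_utility(measurements):
    """Score a candidate decision given metric measurements normalized to 0..1."""
    total = 0.0
    for metric, direction, weight in CANVAS_METRICS:
        value = measurements[metric]
        if direction == "minimize":
            value = 1.0 - value  # flip so that lower costs, risks and footprints score higher
        total += weight * value
    return total

print(ai_utility({
    "financial_value": 0.7, "operational_costs": 0.4, "operational_risk": 0.3,
    "customer_satisfaction": 0.8, "societal_quality_of_life": 0.6,
    "carbon_footprint": 0.5,
}))
```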
#2. Codify the Costs Associated with False Positives and False Negatives
Unintended consequences can easily occur with an AI model if a thorough, comprehensive exploration of “what could go wrong” isn’t conducted prior to building the AI models, and those costs aren’t then integrated into the AI Utility Function. And that brings us into the realm of Type I and Type II errors, or False Positives and False Negatives.
- A Type I Error, or False Positive, occurs when asserting something as true when it is actually false. This false positive error is basically a “false alarm” – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., a positive result has been erroneously assumed).
- A Type II Error, or False Negative, occurs when a test result indicates that a condition does not hold when in reality it does. In other words, a Type II Error or False Negative occurs when we fail to believe that something (like someone being sick, or a part being about to break) is a true condition.
I think most folks struggle to understand Type I (False Positive) and Type II (False Negative) errors, which is why I think Figure 5 summarizes Type I and Type II errors very nicely (he-he-he).
Figure 5: Understanding Type I (False Positive) and Type II (False Negative) Errors
In Figure 5, a Type I Error (False Positive) occurs when the doctor tells the man that he is pregnant, when obviously he can’t be. The Type II Error (False Negative) occurs when the doctor tells the woman that she is NOT pregnant when visual inspection confirms that she is pregnant.
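For readers who prefer code to cartoons, here is a small sketch (the labels and example data are made up) of how Type I and Type II errors fall out of a confusion matrix when predictions are compared against actual conditions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = condition present)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1  # Type I error: a "false alarm"
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1  # Type II error: a missed true condition
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
predicted = [1, 1, 0, 1, 0, 0, 1, 0]   # made-up model predictions
print(confusion_counts(actual, predicted))
# {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```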
Let’s look at understanding the costs of False Positives and False Negatives using a real-world COVID-19 example. With respect to COVID-19, when one has incomplete data and is trying to buy time in order to get more complete, accurate and trusted data through testing, the best thing that one can do is to make decisions based upon the costs of the False Positives and False Negatives. In the case of COVID-19, that means:
- The cost of a False Positive is that a healthy person will be quarantined and will be one of the first to receive the vaccine when it is available. The costs of being wrong in this case are those associated with being quarantined, such as lost wages and inconvenience. The cost of the False Positive in this case is very low.
- On the other hand, the cost of a False Negative is that an infected person is classified as healthy and continues to mingle in public, infecting others and potentially even leading to the deaths of others. The cost of the False Negative in this case is very high (a simple cost comparison is sketched below).
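Here is a rough sketch of that asymmetry, using made-up costs and infection probabilities (the numbers are assumptions for illustration, not epidemiological estimates): when a False Negative costs far more than a False Positive, the threshold for quarantining should shift toward quarantining more people.

```python
COST_FP = 1.0    # quarantine a healthy person: lost wages, inconvenience
COST_FN = 100.0  # release an infected person: onward infections, possible deaths

def total_cost(threshold, cases):
    """cases: list of (estimated_infection_probability, actually_infected) pairs."""
    cost = 0.0
    for p_infected, infected in cases:
        quarantine = p_infected >= threshold
        if quarantine and not infected:
            cost += COST_FP   # False Positive
        elif not quarantine and infected:
            cost += COST_FN   # False Negative
    return cost

cases = [(0.9, True), (0.6, True), (0.4, False), (0.3, True), (0.1, False)]
for t in (0.2, 0.5, 0.8):
    print(f"threshold={t}: total cost={total_cost(t, cases)}")
# A low threshold pays for a few extra quarantines but avoids the expensive misses.
```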
See the blog “Using Confusion Matrices to Quantify the Cost of Being Wrong” for more homework on understanding the costs associated with False Positives and False Negatives. Maybe some of you can share this blog with some of our elected officials…
Summary
Any time you see a very complex, multi-faceted decision that has been boiled down to a single number…WATCH OUT! Creating a single number against which to monitor any complicated business initiative is naïve. Baseball, for example, leverages a bevy of numbers and metrics to determine the value of a particular player, and many of those numbers and metrics – such as Wins above Replacement, Offensive Wins above Replacement, Offensive Runs above Average and W-L Percentage of Offensive Wins above Average – are complex, composite metrics that are comprised of additional data and metrics.
In a world more and more driven by AI models, Data Scientists cannot effectively ascertain on their own the costs associated with the unintended consequences of False Positives and False Negatives. Mitigating unintended consequences requires collaboration across a diverse set of stakeholders to identify the metrics against which the AI Utility Function will seek to optimize. And these metrics need to represent multiple, sometimes conflicting objectives, including financial/economic, operational, customer, societal, environmental and spiritual objectives. And again, determining the metrics that comprise the AI Utility Function is not solely the Data Scientist’s job, unless, of course, you don’t mind herds of Terminators roaming the local mall (I hear that they like sunglasses).