Author: William Vorhies
Summary: Digital Twins is a concept rooted in IoT but requiring the skills of machine learning and potentially AI. It's not completely new, but it is integral to Gartner's vision of the digital enterprise and makes the Hype Cycle for 2017. It's a major enabler of event processing as opposed to traditional request processing.
If the concept of Digital Twins is new to you, you need to be looking way over to the left on Gartner's 2017 Hype Cycle for Emerging Technologies. There, between Quantum Computing and Serverless PaaS, you'll find Digital Twins with a time to acceptance of 5 to 10 years; more specifically, Gartner expects that by 2021 one-half of companies will be using Digital Twins. In fact, Digital Twins is one of Gartner's Top Ten Strategic Technology Trends for 2017.
In many respects this is old wine in new bottles, one of those exercises in renaming an existing practice to draw attention, much like separating prescriptive analytics out from predictive analytics a few years back. I also think Gartner is way off on its timeline, but first things first with some definitions.
A digital twin is intended to be a digital replica of physical assets, processes, or systems; in other words, a model. It is most often referenced as an outcome of IoT (the Internet of Things), where the exponentially expanding world of sensor-equipped devices provides an equally fast expanding body of data about those devices that can be analyzed and assessed for efficiency, design, maintenance, and many other factors.
Since the data continues to flow, the model can be continuously updated, 'learning' in near real time from any change that occurs.
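To make that concrete, here is a minimal sketch of continuous updating using scikit-learn's SGDRegressor and a simulated sensor stream; the sensor names, batch size, and learning rate are illustrative assumptions, not a prescription.

```python
# A minimal sketch of a twin that keeps learning from streaming sensor data.
# The sensor names and data source are simulated; any incremental learner
# with a partial_fit method would serve the same role.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

def sensor_batch(n=32):
    """Simulate one batch of readings (temperature, vibration, rpm) and output."""
    X = rng.normal(size=(n, 3))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n)
    return X, y

# As each batch arrives, the twin's model is updated in near real time.
for _ in range(1000):
    X, y = sensor_batch()
    model.partial_fit(X, y)

print("learned coefficients:", model.coef_)
```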
While the definition mentions the ability to model or digitally twin processes and systems, the folks who have most enthusiastically embraced DT are the IIoT community (Industrial Internet of Things) with their focus on large, complex, and capital intensive machines.
A Little History
Michael Grieves at the University of Michigan is credited with first formulating the terminology of digital twins in 2002. However, as many of you will immediately observe, data scientists and engineers were building computer models of complex machines and even manufacturing processes well before this date. NASA in particular is credited with pioneering this field in the 80s as a way of managing and monitoring spacecraft with which it had no physical connection.
Nor was it only NASA with its large teams of engineers that labored at these problems. Large and complex industrial processes were equipped with SCADA systems that were the precursors of IoT. There was and continues to be great demand for adaptive real time control for these machines and processes. But before MPP and NoSQL we were challenged by both available algorithms and compute power.
To illustrate how far back this goes, I studied a project completed in 2000 to model and subsequently optimize the operation of a very large scale nuclear waste incinerator run by the government in South Carolina. The project was a collaboration between a data scientist friend, Frank Francone, and engineers at SAIC. The problem had failed to yield to any number of algorithms, including neural nets, but was finally solved using Francone's proprietary genetic algorithm, which achieved an R^2 of .96 but required over 600 CPU hours to compute. Things are easier now.
What was farsighted in 2002 was that Grieves was foretelling the volume of applications that would become possible once stream processing of NoSQL data arrived and morphed into the rapid growth of IoT.
Economic Importance
Modeling to predict preventive maintenance or to optimize the output of complex machines or industrial processes may not be new, but the news of its economic importance is catching everyone's attention.
GE is a leader in IIoT and the use of that data to improve performance. Their goal for the digital twin they have created for their wind farms is a 20% increase in efficiency. This includes real time maintenance and configuration changes during operation, but it also extends to new product design, configuration, and the construction of new wind farms.
Tesla is the poster child for using real time IoT data directly from customers' cars and their driving experiences to enhance the performance of not only its existing fleet but also future models. Reportedly this can be as granular as resolving a customer's rattling door by updating on-board software to adjust hydraulic pressure in that specific door.
Passenger jets and Formula 1 racers are just two other examples of complex mechanical systems with extremely large numbers of sensors gathering and transmitting data in real time to their digital twins, where increased performance, efficiency, and safety and reduced unscheduled maintenance are the goals.
IDC forecasts that by this year, 2018, companies investing in digital twins will see improvements of 30% in cycle times of critical processes.
Machine Learning and AI in Digital Twins
The fact is that digital twins can produce value without machine learning and AI if the system is simple. If, for example, there are limited variables and an easily discoverable linear relation between inputs and outputs, then no data science may be required.
However, the vast majority of target systems have multiple variables and multiple streams of data, and these do require the talents of data science to make sense of what's going on.
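To make the contrast concrete, here is a small sketch on simulated data: a one-variable, near-linear system that an ordinary least-squares fit handles, and a multi-sensor, nonlinear system where a learning algorithm is needed. The variables and the choice of a random forest are illustrative assumptions.

```python
# A simple system: one input, one output, roughly linear. An ordinary
# least-squares fit is all the "modeling" required.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
rpm = rng.uniform(500, 3000, size=200)
power = 0.8 * rpm + rng.normal(scale=50, size=200)
slope, intercept = np.polyfit(rpm, power, deg=1)
print(f"simple twin: power ~ {slope:.2f} * rpm + {intercept:.1f}")

# A more typical system: many interacting sensor streams with nonlinear
# effects. This is where machine learning earns its keep.
X = rng.normal(size=(200, 8))                 # eight simulated sensor streams
y = np.sin(X[:, 0]) * X[:, 1] + X[:, 2] ** 2 + rng.normal(scale=0.1, size=200)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("complex twin R^2 (training data):", round(model.score(X, y), 3))
```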
Unfortunately the popular press tends to equate all of this with AI. Actually, the great majority of the benefit of modeling can be achieved with traditional machine learning algorithms. Let me be clear that I am using 'machine learning' in the traditional sense of any computer-enabled algorithm applied to a body of data to discover a pattern.
It is possible, though, to see how the AI represented by deep learning, specifically image and video processing (CNNs) and text and speech processing (RNNs), can also be incorporated as input into these models alongside traditional numerical sensor readings.
For example, video feeds of components during manufacture can already be used to detect defective items and reject them. Similarly, audio from large generators can carry signals of impending malfunctions, such as vibration, even before traditional sensors can detect the problem.
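As a rough sketch of the idea, the example below folds acoustic features into the same model as ordinary sensor readings. In practice the audio branch would typically be a CNN producing learned features; simple FFT band energies stand in here so the example stays self-contained, and all data, labels, and feature names are simulated assumptions.

```python
# Sketch: combine acoustic features with numeric sensor readings in one model.
# A production system would likely use a CNN on spectrograms; FFT band
# energies stand in here. All data and labels are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

def audio_features(signal, n_bands=8):
    """Summarize an audio/vibration window as energy in a few frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

labels, rows = rng.integers(0, 2, size=300), []       # 1 = impending fault
for fault in labels:
    t = np.linspace(0, 1, 1024)
    tone = 120 if fault else 60                       # faulty units hum differently
    audio = np.sin(2 * np.pi * tone * t) + rng.normal(scale=0.5, size=t.size)
    sensors = rng.normal(size=3) + (0.5 if fault else 0.0)  # temp, load, rpm drift
    rows.append(np.concatenate([audio_features(audio), sensors]))

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(np.vstack(rows), labels)
print("training accuracy:", round(clf.score(np.vstack(rows), labels), 3))
```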
Mind the Cost
If you already operate with IoT, especially sensors connected to industrial machines and processes, you are probably in the sweet spot for Digital Twins. However, mind the cost. Any predictive model is potentially subject to drift over time and needs to be maintained. IoT sensors, for example, are notoriously noisy, and as you upgrade sensors, or even the mathematical techniques you use to isolate signal from noise, your models will undoubtedly need to be updated.
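One common way to keep an eye on that drift is to compare the deployed model's recent error against the error it showed at training time and flag it for retraining when the gap grows. The sketch below is one minimal version of that check; the RMSE metric and the 1.5x tolerance are illustrative assumptions.

```python
# Minimal drift check for a deployed twin: compare the model's recent error
# to the error seen at training time and flag it for retraining when it
# degrades. The RMSE metric and the 1.5x tolerance are illustrative.
import numpy as np

def needs_retraining(recent_errors, baseline_rmse, tolerance=1.5):
    """True when the rolling RMSE has drifted well past the training baseline."""
    recent_rmse = float(np.sqrt(np.mean(np.square(recent_errors))))
    return recent_rmse > tolerance * baseline_rmse

# Example: RMSE of 0.10 at deployment, noticeably noisier residuals this week.
this_week = np.random.default_rng(7).normal(scale=0.18, size=500)
print("retrain?", needs_retraining(this_week, baseline_rmse=0.10))
```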
The business message here is simple. Be sure to do your cost benefit analysis before launching into DTs, where cost is the incremental cost of the data science staff needed to maintain these models.
What About Business Processes?
Although the definition of digital twins often includes specific reference to 'processes', examples of processes modeled with digital twins, other than mechanical factory processes, are difficult to find. Since not many of us have complex or capital intensive machinery and industrial processes, what is the role of digital twins in ordinary business processes like order-to-cash, order-to-inventory-to-fulfillment, or even first-sales-contact-to-completed-order?
Where you have to look for these types of examples is outside of the digital twin literature, in business process automation (BPA) or business process management (BPM). Although certainly valuable, both these overlapping fields have been slow to find opportunities to incorporate machine learning or AI. But the time may be close if it’s not here already.
Not all data that streams is IoT. Technically, IoT is about data streamed from sensors, but there are plenty of other types of streaming data that do not originate from sensors, for example the data captured in the web logs of ecommerce applications.
There are not many current examples, but one is Rocket Mortgage. The mortgage originator seeks to interface with the borrower exclusively online. The efficiency of each step, from initial application through funding, is closely monitored for both cycle time (efficiency) and accuracy. There are BPA applications available today that can automatically detect the beginning and end points of each step in the transaction from web logs, providing the same sort of data stream for mortgage origination as sensors might for a wind turbine. Rocket's time and accuracy goals for each of these steps constitute a digital twin of the process.
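A minimal sketch of that idea: treat web-log events as the 'sensor stream' and recover the cycle time of each step per application. The step names, event layout, and pandas treatment below are hypothetical, not a description of Rocket's actual system.

```python
# Hypothetical web-log events marking the start and end of each step of a
# mortgage application; a BPA tool would extract these automatically.
# Step names and timestamps are invented for the example.
import pandas as pd

log = pd.DataFrame({
    "application_id": [101, 101, 101, 101],
    "step":      ["application", "application", "appraisal", "appraisal"],
    "event":     ["start", "end", "start", "end"],
    "timestamp": pd.to_datetime([
        "2018-03-01 09:00", "2018-03-01 09:40",
        "2018-03-02 10:00", "2018-03-05 15:30",
    ]),
})

# Pivot start/end into columns and compute each step's cycle time,
# the business-process equivalent of a sensor reading.
wide = log.pivot_table(index=["application_id", "step"],
                       columns="event", values="timestamp", aggfunc="first")
wide["cycle_time"] = wide["end"] - wide["start"]
print(wide["cycle_time"])
```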
This may not be common today but it’s a natural extension of the digital twin concept into human-initiated processes.
Mind the Error Rate
Here's a fundamental rule of data science: the more that human activity is included in the data being modeled, the less accurate the model will be. So for those of us who have modeled machine-based or factory-process data, where very little human intervention occurs, we can regularly achieve accuracy in the high 9s.
However, if we are modeling a business process such as customer-views-to-order in ecommerce, or something as mundane as order-to-cash, the complexity of human action means that our best models may be limited to accuracy in the 7s and 8s.
But even in industrial applications the error rate still exists. Models have error rates. So, for example, when we use Digital Twin models to predict preventive maintenance or equipment failure, in some percentage of cases we will perform the maintenance too early, and in some we will fail to foresee an unexpected failure. We may continue to improve the model as new data and techniques become available, but it will always be a model, not a one-to-one identity with reality.
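In classification terms, 'too early' maintenance shows up as false positives and missed failures as false negatives. The sketch below simulates a roughly 90% accurate model purely to show where those two kinds of error land in a confusion matrix; the numbers are made up.

```python
# "Too early" maintenance = false positives; failures we did not foresee =
# false negatives. Predictions here come from a simulated ~90% accurate model.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(3)
y_true = rng.integers(0, 2, size=1000)                          # 1 = unit actually failed
y_pred = np.where(rng.random(1000) < 0.9, y_true, 1 - y_true)   # ~90% of labels correct

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("maintenance performed too early (false positives):", fp)
print("failures we did not foresee (false negatives):", fn)
```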
Another major use of Digital Twins is in optimization. The same impact of the error rate applies, except that when solutions based on DT modeling involve significant capital spending, some of those decisions may be wrong.
The third, and perhaps most concerning, area is where Digital Twins are used as a representation of current reality and new machines, processes, or components are designed and built from scratch on those assumptions about operating reality. Here, particular care must be exercised to understand how the error rate in the underlying model might mislead designers into serious errors about how the newly designed machine or process will perform in the current reality.
Stand by for the Event Driven Future
The great majority of our interaction with digital systems is still request driven; that is, once a condition is observed, we instruct or request the system to take action. This is rapidly being supplanted by event driven processing.
What used to be called prescriptive analytics, the machine learning extension from the model to the decision about what should happen next, is being rebranded as 'Event Driven Digital Business'. The modeling of machines, systems, and processes is a precondition for the optimization work that determines when specific actions and decisions are needed. As the digital twin movement expands, more streaming applications will be enabled with automated event driven decision making.
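The contrast is easy to see in code. In the sketch below, each incoming event is scored by the twin's model and an action fires on its own when the predicted risk crosses a threshold; the scoring function, asset name, and threshold are placeholders, not any particular vendor's API.

```python
# Request driven: the system acts only when asked. Event driven: each
# incoming reading is scored by the twin's model and the action fires on
# its own when predicted risk crosses a threshold.
import random

def predicted_failure_risk(event):
    """Stand-in for the digital twin's model scoring one sensor event."""
    return min(1.0, event["vibration"] / 10.0)

def handle(event, threshold=0.8):
    risk = predicted_failure_risk(event)
    if risk > threshold:
        # In production this might publish a work order to a message queue.
        print(f"{event['asset']}: risk {risk:.2f} -> maintenance scheduled")

random.seed(4)
for _ in range(5):                          # simulated event stream
    handle({"asset": "turbine-7", "vibration": random.uniform(0, 12)})
```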
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001. He can be reached at: