People of ACM - Wil van der Aalst

March 2, 2021

Will you give a brief description of what process mining is?

Process mining is an up-and-coming research discipline that combines data science and process science. When I started to work on the first process mining algorithms in the late 1990s, processes and data were completely disconnected. Most of the people working on process management were focusing on process models and workflow automation. Most of the people working on data mining and machine learning did not consider operational processes. Process mining provided the missing link between model-based process analysis and data-oriented analysis techniques.

The starting point for process mining is formed by the event data one can find in any information system. Activities performed by customers, machines, employees, patients, doctors, robots, vehicles, etc. leave digital traces in today's information systems. Process discovery techniques use event data to create process models describing operational processes in terms of their key activities. These process models reveal the actual processes and can be extended to show bottlenecks and outlier behavior. Conformance checking techniques compare observed behavior (i.e., event data) with modeled behavior (i.e., process models). These techniques can be used to show deviations, i.e., behaviors different from what is expected or desired.
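The two ideas described above, discovery and conformance checking, can be illustrated with a very small sketch. It is not one of the Petri net discovery algorithms mentioned in this interview. It assumes a much simpler representation, a directly-follows graph, and a naive conformance check that flags steps not present in that graph. The event log, the activity names, the function names, and the frequency threshold of 2 are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical event log: each case id maps to its ordered list of activities.
event_log = {
    "case1": ["register", "check", "pay", "ship"],
    "case2": ["register", "check", "pay", "ship"],
    "case3": ["register", "pay", "check", "ship"],
}

def discover_dfg(log):
    """Discovery: count how often activity a is directly followed by b."""
    dfg = Counter()
    for trace in log.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def deviations(trace, allowed):
    """Conformance: list the directly-follows steps the model does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]

dfg = discover_dfg(event_log)

# Treat relations observed at least twice as the "model" (assumed threshold).
model = {pair for pair, count in dfg.items() if count >= 2}

for case_id, trace in event_log.items():
    print(case_id, deviations(trace, model))
```

In this toy log, case1 and case2 follow the frequent path, while case3 performs payment before the check and is reported as deviating. Real process mining tools work with timestamps, resources, and richer model notations, so this should be read as a sketch of the principle only.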

Process models may also include probabilities, time distributions, and business rules. Therefore, process mining also includes a range of techniques enabling predictive and prescriptive analytics. Many of the larger organizations (especially in Europe) have adopted this technology. For example, within Siemens, over 6,000 employees use Celonis Process Mining to improve hundreds of core processes. Seemingly simple processes like the SAP Order-to-Cash process have over 900,000 variants, which illustrates the improvement potential that process mining can uncover.

What were the conditions two decades ago, both in terms of your own research, as well as what was going on in the field, that led you to develop the process mining approach?

The idea of process mining emerged from my experiences applying workflow management systems and discrete event simulation in real organizations in the 1990s. The majority of workflow management projects failed because of the gap between idealized process diagrams and the actual processes that were much more complex. I also conducted over 50 simulation projects that all showed that it is very difficult to capture real processes in a simulation model. Driven by my experiences in these workflow management and simulation projects, I realized that processes need to be analyzed in a more rigorous, objective and evidence-based manner.

Therefore, I got interested in the use of event data to automatically learn process models. The idea was that data in systems like SAP should tell what the real processes are (and not some consultant). At the end of the 1990s, I had developed some baseline algorithms to discover Petri nets from event logs. When I established a new research group at Eindhoven University of Technology in 2000, I made this a key priority, and this led to the open source process mining framework ProM. For a long time, our group was the only group working on process mining in a systematic manner. Also, from industry there was little interest, yet we continued working on dozens of process mining algorithms covering the whole spectrum, including topics such as conformance checking, predictive analytics, and decision mining.

Some of my students started process mining companies, and around 2010, there were several mature commercial offerings based on our ideas. Currently, there are over 35 commercial process mining tools (e.g., Celonis, Disco, UiPath/ProcessGold, myInvenio, PAFnow, Minit, QPR, Mehrwerk, Puzzledata, LanaLabs, Process Diamond, Apromore, Everflow, TimelinePI, Signavio, and Logpickr). The large-scale adoption in industry happened only in the last five years. Many organizations are using process mining today, but we are still only at the start of this development.

How might process mining assist in the distribution of the COVID-19 vaccine?

The COVID-19 pandemic shows that accurate data are vital to managing operational processes. Global supply chains were taken by surprise, and vulnerabilities were exposed. Process mining can be used to create full transparency of what is happening in a supply chain and recommend actions. Several process mining vendors provide COVID-19 help programs to optimize cashflows, manage supply chains, and keep mission-critical operations up and running. Process mining can also help to improve management of COVID-19 vaccine distribution, which governments have difficulty planning, monitoring, and controlling. Process mining can be used to address compliance and performance problems, e.g., tracking whether people get a second dose, ensuring that the right people get the vaccines, and detecting counterfeit or improperly handled vaccines.

In the ICU4Covid project, supported by the European Commission, we are using process mining to improve the treatment of COVID-19 patients in intensive care units. Different intensive care units in the Aachen region operate as one unit, and process mining is used to learn best practices.

How does process mining relate to machine learning (ML) and artificial intelligence (AI)?

Process mining techniques are very different from mainstream ML and AI techniques. Currently, ML and AI are dominated by deep learning approaches using various kinds of neural networks. Progress in speech recognition, computer vision, and machine translation has been spectacular. However, to train a neural network, large amounts of labeled data are needed. Moreover, deep learning cannot be used to discover a process model represented in terms of a Petri net, BPMN diagram, or UML activity diagram.

Process mining techniques are closer to optimization and concurrency theory. Many researchers seem to suffer from tunnel vision, assuming that everything can be solved using neural networks. Process discovery and conformance checking require different approaches and need to use process models that people can interpret. However, after discovering a process model and aligning the event data, it is possible to generate more standard machine learning problems, e.g., predicting the remaining processing time of a running case or the likelihood of a particular deviation. Despite the attention being given to ML and AI, their adoption in industry is limited. Process mining is getting much less attention, but is probably more readily applicable in industry at this stage.
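The remaining-time prediction mentioned above can be made concrete with a small sketch. It assumes the simplest possible predictor: for each activity, the average time that completed cases still needed after reaching it. The historical cases, the timestamps (hours since case start), and the function names are hypothetical. A practical predictor would use richer features and a proper learning method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed cases: lists of (activity, hours since case start).
history = [
    [("register", 0.0), ("check", 2.0), ("pay", 5.0), ("ship", 9.0)],
    [("register", 0.0), ("check", 4.0), ("pay", 6.0), ("ship", 11.0)],
]

def fit_remaining_time(cases):
    """Average remaining time to case completion, per activity."""
    samples = defaultdict(list)
    for case in cases:
        end_time = case[-1][1]  # timestamp of the final event in the case
        for activity, timestamp in case:
            samples[activity].append(end_time - timestamp)
    return {activity: mean(values) for activity, values in samples.items()}

def predict_remaining(estimates, last_activity):
    """Estimate for a running case; None if the activity was never observed."""
    return estimates.get(last_activity)

estimates = fit_remaining_time(history)
print(predict_remaining(estimates, "check"))
```

A running case whose latest event is "check" gets the average of the remaining times observed after "check" in the historical cases. Activities never seen in the history yield no estimate instead of a guess.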

Where is the intersection of process management and data science headed in the next five to 10 years?

Data science has become an important discipline and is here to stay. Just as computer science emerged from mathematics and electrical engineering, data science emerged from computer science and statistics. Data science includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, various types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects.

Data science is much broader than ML and AI. Many young people want to study data science because they see that this will change the way we work, do business, and socialize. Many jobs will simply disappear because of advances in data science. At the same time, there is a continued need to manage and improve operational processes. The Business Process Management (BPM) discipline was considered to be dead a few years ago, but is currently thriving because of advances in process mining and Robotic Process Automation (RPA). RPA makes it possible to automate repetitive administrative tasks in a cost-effective manner. Process mining supports RPA initiatives by uncovering automation opportunities and ensuring the correct operation of software robots. The process mining discipline connects process management and data science, and will continue to grow in size and scope. Moreover, the redistribution of work between humans and machines will be an ongoing topic in the coming decades.

Wil van der Aalst is a Professor and head of the Process and Data Science (PADS) group at RWTH Aachen University in Aachen, Germany. He has often been called “the Godfather of Process Mining,” an emerging field that bridges the gap between traditional business management techniques and modern data science techniques, such as machine learning and data mining. He was instrumental in starting the field two decades ago and is the author of the definitive textbook Process Mining. His research interests include process mining, Petri nets, business process management, workflow management, process modeling, and process analysis. He has published over 250 journal papers, 22 books (as author or editor), 550 refereed conference/workshop publications, and 80 book chapters.

At RWTH Aachen, van der Aalst is the recipient of a highly prestigious Alexander von Humboldt Professorship, which is funded by the German Federal Ministry of Education and Research and comes with a grant of 5 million euros. He is an IEEE Fellow and an IFIP Fellow, and was recently named an ACM Fellow for contributions to process mining, process management, and data science.