Applying Process Mining to the Test Process of ASML
In contrast to the traditional, manual business process modeling approach, process mining technology enables the analysis of existing processes based on historical data collected by the supporting IT systems.
I thought it might be nice to give a few more application examples to illustrate where and how exactly process mining can be used.
To make a start, I chose a case study that I conducted during my time as a PhD student at TU/e. Together with Dr. Ivo de Jong, Dr. Christian Günther, and Prof. Wil van der Aalst, I used process mining techniques to analyze the test process of ASML.
It is not a very typical case study but I still like it because it shows the broad applicability of process mining techniques.
You can find more details about it in the scientific articles¹, but I summarize the main points here in this post.
ASML’s test process
ASML is the world's leading manufacturer of chip-making equipment and a key supplier to the semiconductor industry. It makes so-called wafer scanners, which are used to manufacture processors for devices ranging from mobile phones to desktop computers.
Wafer scanners are highly complex machines that consist of many building blocks and use a photographic process to image nanometric circuit patterns onto a silicon wafer, much like a camera prints an image on film. There is an ongoing effort to reduce the line widths on silicon wafers to enhance the performance of the manufactured semiconductors. Every new generation of wafer scanners balances on the edge of what is technologically possible.
As a result, the testing of manufactured wafer scanners is an important but also time-consuming process.
Every wafer scanner is tested in the factory of ASML.
When it passes all tests, the wafer scanner is disassembled and shipped to the customer, where the system is re-assembled.
At the customer's site, the wafer scanner is tested again.
The test process is a complex, knowledge-intensive process, and testing takes several weeks at both sites.
Goal of the analysis
Generally, the main strength of process mining is that it makes existing processes visible and shows what is actually happening. This visibility is the first step towards low-risk process improvements, but the actual improvement goals may vary.
Because ASML operates in a market where the time-to-market of system enhancements and new system types is critical, the main goal was to reduce the test period rather than, for example, cutting costs.
The event log
Because the wafer scanners are continuously enhanced, the number of manufactured wafer scanners of a single type is typically fewer than 50. And with each new type, parts of the calibration and test phase are adjusted. Therefore, we selected 24 machines of the same family to be analyzed.
Each wafer scanner in the ASML factory produces a log of the software tests that are executed. The wafer scanner is calibrated and tested using calibration and performance software, and each test is indicated in the logging as a four-letter test code². The logging contains the start and stop moment of each test (see picture above on the left side). Furthermore, the machine number at the beginning indicates which particular wafer scanner is tested.
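To give an idea of what such a log could look like, here is a minimal parsing sketch. The line layout, machine numbers, test codes, and timestamps below are invented for illustration (the real ASML log format is not public), but the structure — machine number, four-letter test code, start/complete marker, timestamp — follows the description above.

```python
from datetime import datetime

# Hypothetical raw log lines (invented data; the real format differs).
RAW_LOG = """\
M01 ABCD start    2004-03-01T08:00:00
M01 ABCD complete 2004-03-01T09:30:00
M01 EFGH start    2004-03-01T09:45:00
M01 EFGH complete 2004-03-01T11:00:00
"""

def parse_log(text):
    """Turn raw log lines into (machine, test_code, event_type, timestamp) tuples."""
    events = []
    for line in text.strip().splitlines():
        machine, code, event_type, ts = line.split()
        events.append((machine, code, event_type, datetime.fromisoformat(ts)))
    return events

events = parse_log(RAW_LOG)
print(len(events))    # 4
print(events[0][:3])  # ('M01', 'ABCD', 'start')
```

From such tuples it is a small step to group events by machine number into per-case traces, which is essentially what the MXML conversion mentioned below does.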
Back then Christian wrote a conversion plug-in to translate the event log into the MXML format, which can be read by ProM (see picture above on the right side). Today, we would use Nitro to directly convert the original log in a couple of minutes.
What is remarkable about this log is the ratio of process instances (i.e., cases) to events per case. In most domains, we see a large number of relatively short traces. For example, in processes related to patient flows, call centers, or traffic fines, there are typically thousands of cases, each containing fewer than 50 events.
The log of the ASML test process has very different characteristics: There are just a few cases (i.e., machines), but for each machine there may be thousands of log events. In the initial data set we had process instances that contained more than 50,000 log events (each indicating either the start or the completion of a specific test). In the final log, the longest trace was still 16,250 events long. There were also 360 different test codes.
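Checking these log characteristics is a simple counting exercise once the events are in memory. A sketch with a toy event list (the real log had 24 machines and up to thousands of events per machine):

```python
from collections import Counter

# Toy event list of (case_id, test_code, event_type) tuples -- invented data.
events = [
    ("M01", "ABCD", "start"), ("M01", "ABCD", "complete"),
    ("M01", "EFGH", "start"), ("M01", "EFGH", "complete"),
    ("M02", "ABCD", "start"), ("M02", "ABCD", "complete"),
]

# Events per case, number of cases, longest trace, distinct test codes.
events_per_case = Counter(case for case, _, _ in events)
num_cases = len(events_per_case)
longest = max(events_per_case.values())
codes = {code for _, code, _ in events}

print(num_cases, longest, len(codes))  # 2 4 2
```

The same three numbers for the real log were 24 cases, 16,250 events in the longest trace, and 360 test codes.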
Process mining results
We analyzed the test process based on these historical data to find bottlenecks and ideas for improvement. Because it was the goal to shorten the test process, we particularly watched out for idle times and re-executions in the log.
To understand what this means, you have to imagine the test process as follows: Sequences of tests are scheduled on the machine in batch mode. To do this, test engineers follow a reference process that describes all the tests a machine has to pass successfully to get through to the end. As soon as a test fails, one of the following happens:
A hardware fix is needed, which puts the test process on hold (idle time) unless new tests are scheduled that can already run in the meantime.
Sometimes the system can fix itself by recalibrating parameters in the software.
Whether a part had to be replaced or parameters were changed, such a fix often results in the re-execution of tests that had already been passed earlier. So, valuable time is lost.
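Because the log records start and stop moments, idle times can be found by looking at the gap between the completion of one test and the start of the next on the same machine. A minimal sketch, with invented timestamps and an assumed 30-minute threshold for what counts as "idle":

```python
from datetime import datetime, timedelta

# Toy trace for one machine: (test_code, start, complete) -- invented data.
trace = [
    ("ABCD", "2004-03-01T08:00:00", "2004-03-01T09:00:00"),
    ("EFGH", "2004-03-01T09:05:00", "2004-03-01T10:00:00"),
    ("IJKL", "2004-03-01T14:00:00", "2004-03-01T15:00:00"),  # long gap before
]

def idle_periods(trace, threshold=timedelta(minutes=30)):
    """Report gaps between the completion of a test and the start of the next."""
    gaps = []
    for (_, _, prev_end), (code, next_start, _) in zip(trace, trace[1:]):
        gap = datetime.fromisoformat(next_start) - datetime.fromisoformat(prev_end)
        if gap > threshold:
            gaps.append((code, gap))
    return gaps

print(idle_periods(trace))  # flags the 4-hour gap before IJKL
```

In the real analysis, such gaps would then be cross-checked with the engineers to see whether they correspond to hardware fixes or to avoidable waiting.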
We did a performance analysis to find unnecessary idle times. Furthermore, we used process discovery techniques to visualize the re-executions in the actual flow of the test process: Based on the logged test sequences we could automatically construct a process model that showed how the test process had been executed for these 24 machines.
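At the core of most discovery techniques is the directly-follows relation: how often one test was immediately followed by another across all traces. The sketch below computes it for two invented toy traces; re-executions show up as backward edges in this relation.

```python
from collections import Counter

# Toy traces of completed test codes per machine -- invented data.
# In the first trace, EFGH and IJKL are redone after a fix.
traces = [
    ["ABCD", "EFGH", "IJKL", "EFGH", "IJKL"],
    ["ABCD", "EFGH", "IJKL"],
]

def directly_follows(traces):
    """Count how often one test directly follows another across all traces."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(traces)
print(dfg[("EFGH", "IJKL")])  # 3
print(dfg[("IJKL", "EFGH")])  # 1 -- a loop-back edge
```

The actual discovery we did in ProM is far more sophisticated than this counting step, but loop-back edges like the one above are exactly what becomes visible in the discovered model.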
You can see the difference between the idealized and the actual test process in the 2 high-level process models above:
On the left is the reference process that is used by ASML’s test engineers to schedule the lower-level test sequences. This process describes the ideal test process as it is followed if nothing goes wrong.
On the right you can see the discovered process model, which shows the actual process flow of the different test phases. There is much more repetition and you can see the loop-backs (e.g., in the framed area).
We also measured the conformance of the reference process and the discovered process by comparing the models with the actual log data: The reference process had a fitness of only 38%, which indicates a lot of deviation. The discovered process did not have 100% fitness either (because we had to simplify the model to make it readable), but at over 70% fitness it is a much better representation of reality.
Now, this is not surprising, and the test engineers expect these feedback loops. But they do not know in advance where the loops will occur. Based on an analysis of the underlying tests that caused the loop-backs, we could make concrete improvement suggestions. For example, some of these tests could be duplicated and executed in earlier test phases, which would make them fail earlier and thus avoid the re-execution of later tests in the sequence.
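Finding the tests that cause loop-backs starts with counting which test codes were executed more than once per machine. A sketch over invented toy traces:

```python
from collections import Counter

# Toy traces of completed test codes per machine -- invented data.
traces = [
    ["ABCD", "EFGH", "IJKL", "EFGH", "IJKL"],
    ["ABCD", "EFGH", "IJKL", "IJKL"],
]

def re_executions(traces):
    """Count, per test code, how often it was run beyond its first execution."""
    extra = Counter()
    for trace in traces:
        for code, n in Counter(trace).items():
            if n > 1:
                extra[code] += n - 1
    return extra

print(re_executions(traces).most_common())  # [('IJKL', 2), ('EFGH', 1)]
```

Tests that top such a ranking are natural candidates for being duplicated into earlier phases, as suggested above.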
What has happened since then?
Unfortunately, the data that we analyzed were already too old when we presented the results, so our improvement suggestions were not directly applicable to the new systems being tested at the time. Because the machines are innovated so rapidly, the test process changes as well, and the process mining analysis would need to be repeated incrementally by an ASML engineer to be useful for them.
For us, it was still a useful case study. The data set was much bigger and the underlying process was much more complex than other logs that we had seen before. In fact, the ASML test logs were one of those data sets that inspired Christian to develop the Fuzzy Miner, which interactively simplifies complex process models to make them more readable.
Did you find this case study interesting? Let me know in the comments.
The case study was first described in 2007 in this technical report. Later, we published a part of the results in this IEEE note and another part in this workshop paper. We also used the data for a case study in this activity mining paper. ↩︎
Both the machine numbers and the test codes have been anonymized for confidentiality reasons. ↩︎