A Process Mining Data Quality Study at the Swiss National Bank
This is a guest article by Dr. Stefan Michel, Audit Manager Internal Audit at the Swiss National Bank. Stefan performed this case study within the context of his doctoral thesis and wrote this summary article to share his findings with the process mining community. You can download the full case study here. If you have a guest article or process mining case study that you would like to share as well, please contact us via email@example.com.
The Swiss National Bank (SNB) is automating more and more of its processes to reduce costs and operational risks. By reconstructing and analyzing process models based on log data, process mining can help to improve processes and to identify tasks that can be automated in the future.
A key factor for the successful mining of reliable process models is a high level of data quality. This study therefore investigates the data quality of a concrete SNB payment process. The results of the analysis show that the data available via the core banking platform, the Avaloq Banking System (ABS), is appropriate for process mining. Process mining can thus be applied to operational processes that are run via ABS.
At the same time, we did identify a number of data-specific problems, for which we have developed practical solutions. These data quality problems and their solutions are described in more detail in the remainder of this article.
Issue #1: Timestamp recorded in the event log does not reflect the actual time of the activity
ABS ensures that order numbers remain stable over time, even in the event of order cancellations. Furthermore, neither system users nor the system itself can manipulate these numbers.
During the analysis of the quality of the timestamps, a similarly positive picture emerged. The timestamps in the workflow log are uniformly formatted. Furthermore, there is no mixing of activity start and end times. There are also no missing or incomplete timestamps or dummy values (such as 01.01.1900).
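The timestamp checks described above (uniform format, no missing values, no dummy values) can be sketched in a few lines of Python. This is a minimal illustration and not the tooling used in the study: the field name `timestamp`, the format string `TIMESTAMP_FORMAT`, and the sample events are all assumptions, since the article does not specify how the ABS workflow log is laid out.

```python
from datetime import datetime

# Assumed timestamp layout; the article does not state the actual ABS format.
TIMESTAMP_FORMAT = "%d.%m.%Y %H:%M:%S"
# Placeholder dates treated as dummy values (01.01.1900 is the article's example).
DUMMY_DATES = {datetime(1900, 1, 1).date()}


def check_timestamps(events):
    """Return (row_index, problem) tuples for missing, malformed, or dummy timestamps."""
    problems = []
    for i, event in enumerate(events):
        raw = (event.get("timestamp") or "").strip()
        if not raw:
            problems.append((i, "missing"))
            continue
        try:
            parsed = datetime.strptime(raw, TIMESTAMP_FORMAT)
        except ValueError:
            # The value does not follow the one format we expect everywhere.
            problems.append((i, "unexpected format"))
            continue
        if parsed.date() in DUMMY_DATES:
            problems.append((i, "dummy value"))
    return problems


# Hypothetical sample events, invented purely for illustration.
sample_log = [
    {"case_id": "A1", "activity": "Release", "timestamp": "03.02.2020 09:15:00"},
    {"case_id": "A1", "activity": "Approve", "timestamp": ""},
    {"case_id": "A2", "activity": "Release", "timestamp": "2020-02-03T09:15:00"},
    {"case_id": "A3", "activity": "Release", "timestamp": "01.01.1900 00:00:00"},
]

for row, problem in check_timestamps(sample_log):
    print(f"row {row}: {problem}")
```

An empty result for a real log would correspond to the positive picture described above. A check like this cannot tell whether a recorded time matches the moment the work was actually carried out.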
To validate the data with domain experts, animation can be used as a means of communication. An animation shows visually how cases move through the process, which enables process experts to compare the dynamic flow with their intuition.
For example, we analyzed the processing of outgoing payments on behalf of a third party (from the receipt of the payment message in ABS to when it is technically ready to be processed in the payment system). The screenshot below illustrates the queue and wait times of payments with the ‘wait for trade date’ status. These wait times are expected for these payments, because the instructed payment execution date has not yet been reached and the workflow activity ‘release’ can, therefore, not yet be executed (see highlighted area).
However, what became clear is that the timestamp stored in the event log for each activity shows when a field in the ABS input mask was saved. It does not provide any information about when an employee actually carried out an activity in the underlying process.
This means that before relying on these timestamps in a process mining analysis, it needs to be further investigated how close they are to the actual activities.
Issue #2: Attributes in free-text format
We had decided to include the ‘Post-it’ functionality in the data extraction process. Although this field is not essential for the process in this case study, it can provide process-related context information for potential future analyses, thereby enhancing the settlement of a payment order with additional information. For example, it could be used to electronically document queries that are made in connection with a pending payment order due to incorrect debit or credit account details.
Because the ‘Post-it’ field is an optional free-text field, there are too many variations to analyze the field directly. For example, if one and the same property is logged once with spaces and once without spaces, this results in two different values. This is especially problematic if a free-text field is used as an activity name (and not just as an additional data field for context), because the number of activities in the process map explodes.
To be useful, the free-text data needs to be pre-processed in some form to map all the values that belong to the same activity to a common value. Another approach could be to provide standardized selection options in the system in the future.
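One possible form of such pre-processing is to reduce each free-text entry to a normalized key and look that key up in a mapping table. The sketch below is only an illustration under assumed inputs: the entries in `CANONICAL_VALUES` and the helper names `normalize_key` and `map_free_text` are hypothetical and do not come from the case study.

```python
import re

# Hypothetical mapping from normalized keys to a common value.
# In practice this table would be built together with domain experts.
CANONICAL_VALUES = {
    "accountnumberincorrect": "Account details incorrect",
    "wrongaccount": "Account details incorrect",
    "callbackpending": "Callback pending",
}


def normalize_key(text):
    """Lowercase the text and drop everything except ASCII letters and digits.

    This removes differences in spacing, case, and punctuation. Note that it
    also drops non-ASCII letters, which may be too aggressive for some logs.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def map_free_text(text, default="Other"):
    """Map a free-text entry to its common value, or to `default` if unknown."""
    return CANONICAL_VALUES.get(normalize_key(text), default)


for note in ["Account number incorrect", "accountnumber  incorrect!", "Wrong account", "see email"]:
    print(repr(note), "->", map_free_text(note))
```

Entries that fall into the default bucket can be reviewed periodically and added to the mapping table where they turn out to be relevant.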
Issue #3: Too fine-grained activity names
We also found that workflow activities with very similar names were recorded slightly differently in the workflow log. For example, the workflow activity ‘Approve (4-Eyes)’ appeared both with the label ‘Approve (4-Eyes) (222180)’ and ‘Approve (4-Eyes) (222380)’. The database administrators explained that the additional technical descriptions vary depending on the processing stage of a payment order.
The problem was that these variations complicated the process map, because they increased the overall number of activities. On a technical level, different system commands were executed. However, from a business process analysis perspective these distinctions were not relevant. Therefore, all the names of thematically related activities were standardized, which simplified the process map. It brought the process back to the level of detail that was useful for a domain expert-based process mining analysis.
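For labels like the two shown above, the standardization can be as simple as removing the trailing numeric code. The following is a minimal sketch, assuming that the technical description always appears as a parenthesized number at the end of the label; the article only shows two examples, so other label shapes may need their own rules. The function name `standardize_activity` is hypothetical.

```python
import re

# Matches a trailing parenthesized numeric code such as " (222180)".
# "(4-Eyes)" is left alone because it does not consist of digits only.
TECHNICAL_SUFFIX = re.compile(r"\s*\(\d+\)\s*$")


def standardize_activity(name):
    """Strip a trailing technical code from an activity label."""
    return TECHNICAL_SUFFIX.sub("", name).strip()


labels = [
    "Approve (4-Eyes) (222180)",
    "Approve (4-Eyes) (222380)",
    "Approve (4-Eyes)",
]
print({standardize_activity(label) for label in labels})
```

All three labels collapse to the single activity name 'Approve (4-Eyes)', which is the level of detail a domain expert works with.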
Issue #4: Events occurring in reality, but not recorded in the event log
We realized that the events in the workflow log only cover the status changes for the technical workflows that are stored in ABS. This means that other technical process activities that occur outside of ABS are not included.
As a result, there is no visibility into the technical process steps that are carried out by members of the Payments unit. Despite this limitation, the partial process map still provided a valuable basis for discussion and allowed the process owner to improve their understanding of the process. If the missing part of the process needs to be analyzed in more detail, conventional process discovery techniques can be applied.
For further details, read the full case study here.