Case Study: Auditing With Process Mining — Part VI: Data Transformation
This is the 6th article in our case study series on auditing with process mining. The series is written by Jasmine Handler and Andreas Preslmayr from the City of Vienna. You can find an overview of all the articles in the series here.
The goal of the next step was to bring the raw data into a format that we could load into the process mining software. We filtered the relevant information from the raw data files and linked the data tables based on the previously defined connections. The output data was formatted as an event log, with a unique case ID, an activity name, a timestamp, a resource, and further attributes for each event.
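To make the target format more concrete, here is a minimal Python sketch of how two linked tables can be flattened into an event log. It is an illustration only and not the KNIME workflow used in the project: the table contents, the column names (`order_id`, `invoice_id`, `created`, `received`, `user`, `amount`), the activity labels, and the function `build_event_log` are all assumptions made up for this example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw tables; the real tables and columns are not shown in the article.
orders = [
    {"order_id": "O1", "created": "2020-01-03 09:15", "user": "alice", "amount": "500.00"},
    {"order_id": "O2", "created": "2020-01-04 11:00", "user": "bob", "amount": "120.00"},
]
invoices = [
    {"invoice_id": "I1", "order_id": "O1", "received": "2020-01-10 08:30", "user": "carol"},
    {"invoice_id": "I2", "order_id": "O2", "received": "2020-01-09 14:45", "user": "carol"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def build_event_log(orders, invoices):
    """Turn one row per document into one row per event."""
    amount_by_order = {o["order_id"]: o["amount"] for o in orders}
    events = []
    for o in orders:
        events.append({
            "case_id": o["order_id"],
            "activity": "Create Order",
            "timestamp": datetime.strptime(o["created"], TIME_FORMAT),
            "resource": o["user"],
            "amount": o["amount"],
        })
    for inv in invoices:
        # The invoice is linked to its case through the order it refers to.
        events.append({
            "case_id": inv["order_id"],
            "activity": "Receive Invoice",
            "timestamp": datetime.strptime(inv["received"], TIME_FORMAT),
            "resource": inv["user"],
            "amount": amount_by_order.get(inv["order_id"], ""),
        })
    # Sort by case and time so each case reads as a chronological trace.
    events.sort(key=lambda e: (e["case_id"], e["timestamp"]))
    return events


event_log = build_event_log(orders, invoices)

# Write the event log as CSV text, a format most process mining tools can import.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["case_id", "activity", "timestamp", "resource", "amount"]
)
writer.writeheader()
for event in event_log:
    writer.writerow({**event, "timestamp": event["timestamp"].isoformat(sep=" ")})
csv_text = buffer.getvalue()
```

The sketch assumes a simple one-to-one link between orders and invoices. Real purchasing data rarely behaves that way, which is exactly where the difficulties described further down come from.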
We performed the data transformation using the open-source software KNIME. To validate the transformed data, we crosschecked it against the production system whenever we changed the data transformation workflow. These validation steps revealed considerable room for improvement, and we adapted the workflow several times until the output data finally matched the data in the production system (see Figure 7 below).
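One simple form such a crosscheck can take is comparing the number of events per case in the transformed data with figures looked up in the source system. The Python sketch below illustrates the idea under stated assumptions: the function `crosscheck`, the sample event log, and the reference counts are hypothetical, and the article does not describe which checks were actually performed.

```python
from collections import Counter


def crosscheck(event_log, reference_counts):
    """Return {case_id: (expected, actual)} for every case whose event count differs."""
    actual = Counter(event["case_id"] for event in event_log)
    mismatches = {}
    for case_id in sorted(set(actual) | set(reference_counts)):
        expected = reference_counts.get(case_id, 0)
        got = actual.get(case_id, 0)
        if got != expected:
            mismatches[case_id] = (expected, got)
    return mismatches


# Hypothetical transformed data: case O2 is missing an event, case O3 is missing entirely.
sample_log = [
    {"case_id": "O1", "activity": "Create Order"},
    {"case_id": "O1", "activity": "Receive Invoice"},
    {"case_id": "O2", "activity": "Create Order"},
]
# Hypothetical figures reported back from the source system.
reference = {"O1": 2, "O2": 2, "O3": 1}

mismatches = crosscheck(sample_log, reference)
```

Running a check like this after every workflow change makes regressions visible immediately, as long as fresh reference figures are available.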
Figure 7: The first (left) and last (right) data transformation workflow version
The data transformation was the most time-consuming step of the process mining project. One reason was that we had no direct access to the production system. Therefore, the audited party had to support the data validation process and help with crosschecking. This led to waiting times and delays within the project.
Another factor was that we had initially not given enough consideration to the 1:n and n:m relationships when tracing the case IDs. For example, one order can lead to several invoices and payments. Conversely, one invoice can address multiple orders, one payment can cover more than one invoice, and so on. These many-to-many relationships had to be handled adequately during data transformation.
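The article does not say which strategy was chosen for these relationships. One common option, shown here purely as an illustration, is to treat orders, invoices, and payments as nodes in a graph and to give every connected group of documents a shared case ID. The Python sketch below does this with a small union-find structure; the function `group_linked_documents`, the document IDs, and the `CASE-n` naming are assumptions for the example. An alternative would be to keep one case per order and duplicate shared invoice and payment events across cases, which inflates event counts and needs to be kept in mind during analysis.

```python
def group_linked_documents(links):
    """Assign one case ID to every set of documents connected through links."""
    parent = {}

    def find(doc):
        parent.setdefault(doc, doc)
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving keeps lookups short
            doc = parent[doc]
        return doc

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    groups = {}
    for doc in list(parent):
        groups.setdefault(find(doc), set()).add(doc)

    # Number the groups in a stable order, sorted by their smallest document ID.
    ordered = sorted(groups.values(), key=min)
    return {
        doc: f"CASE-{number}"
        for number, members in enumerate(ordered, start=1)
        for doc in members
    }


# Hypothetical links: O = order, I = invoice, P = payment.
links = [
    ("O1", "I1"), ("O1", "I2"),  # one order leads to two invoices
    ("O2", "I2"),                # invoice I2 also addresses a second order
    ("I1", "P1"), ("I2", "P1"),  # one payment covers both invoices
    ("O3", "I3"), ("I3", "P2"),  # an unrelated, simple chain
]
case_of = group_linked_documents(links)
```

In this example, O1, O2, I1, I2, and P1 end up in the same case, while O3, I3, and P2 form a second one. The trade-off is that heavily interlinked documents can merge into very large cases, so the result should be checked against what the audit questions require.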
After several adaptations to the transformation workflow, we passed all the validation steps and generated a data set that we were confident working with.
New parts in this auditing series will appear on this blog every week. Simply come back or sign up to be notified about new blog entries here.