Data Quality Problems In Process Mining And What To Do About Them — Missing Complete Timestamps for Ongoing Activities
This is the 13th article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.
If you have ‘start’ and ‘complete’ timestamps in your data set, then you can sometimes encounter situations, where the ‘complete’ timestamp is missing for those activities that are currently still running.
For example, take a look at the data snippet below (click on the image to see a larger version). Two process steps were performed for case ID 1938. The second activity that was recorded for this case is ‘Analyze Purchase Requisition’. It has a ‘start’ timestamp but the ‘complete’ timestamp is empty, because the activity has not yet completed (it is ongoing).
In principle, this is not a problem. After importing the data set, you can simply analyze the process map and the variants, etc., as you would usually do. When you look at a concrete case, then the activity duration for the activities that have not completed yet is shown as “instant” (see the history for case ID 1938 in the screenshot below).
However, where this does become a problem is when you analyze the activity duration statistics (see screenshot below). The “instant” activity durations influence the mean and the median duration of the activity. So, you want to remove those activities that are still ongoing from the calculation of the activity duration statistics.
How to fix:
Leave a Comment
You must be logged in to post a comment.