Data Quality Problems In Process Mining And What To Do About Them — Part 5: Same Timestamps for Different Activities
This is the fifth article in our series on data quality problems for process mining. Make sure you also read the previous articles on formatting errors, missing data, Zero timestamps, and wrong timestamp configurations.
In the previous article on wrong timestamp configurations we have seen how timestamp problems can influence the process flows and the process variants. One reason for why timestamps can cause problems is that they are not sufficiently different. For example, if you only have a date (and no time) then it may easily happen that two activities within the same case happen on the same day. As a result you don’t know in which order they actually happened!
Take a look at the following example: We can see a simple document signing process with four activities and three cases.
The order of the rows in each case is arbitrary. When importing this data set, the sequence of events is determined based on the timestamps. For example, the sequence of the steps ‘Created’ and ‘Sent to Customer’ for case 1 is reversed (compared to the original file), because the dates reflect that the two steps have happened in the opposite order (see screenshot below).
However, if two activities happen at the same time (on the same day in this example), then Disco does not know in which order they actually occurred. So, it keeps the order in which they appear in the …