Data Quality Problems in Process Mining and What To Do About Them — Part 6: Different Timestamp Granularities
This is the sixth article in our series on data quality problems for process mining. Make sure you also read the previous articles on formatting errors, missing data, Zero timestamps, wrong timestamp configurations, and same timestamp activities. You can find an overview of all articles in the series here.
In the previous article on same timestamp activities we have seen how timestamps that do not have enough granularity can cause problems. For example, if multiple activities happen at the same day for the same case then they cannot be brought in the right order, because we don’t know in which order they have been performed. Another timestamp-related problem you might encounter is that your dataset has timestamps of different granularities.
Let’s take a look at the example below. The file snippet shows a data set with six different activities. However, only activity ‘Order received’ contains a time (hour and minutes).
Note that in this particular example there is no issue with fundamentally different timestamp patterns. However, a typical reason for different timestamp granularities is that these timestamps come from different IT systems. Therefore, they will also often have different timestamp patterns. You can refer to the article How To Deal With Data Sets That Have Different Timestamp Formats to address this problem.
In this article, we focus on the problems that different timestamp granularities can bring. So, why would this be a problem? After all, it is good that we have some more…