Data Quality Problems In Process Mining And What To Do About Them — Part 3: Zero Timestamps
This week, we move on to timestamp problems. Timestamps are really the Achilles heel of data quality in process mining. Everything is based on the timestamps: not just the performance measurements but also the process flows and variant sequences themselves. So, over the coming weeks we will look at the most typical timestamp-related issues.
Zero timestamps (or future timestamps)
One data problem that you will almost certainly encounter at some point is so-called zero timestamps, or other kinds of default timestamps that are given by the system. Often, the programmer of the information system initially set the zero timestamp as a stand-in for an empty value. Zero timestamps can either be a mistake or indicate that the real timestamp has not yet been provided (for example, because an expected process step has not happened yet). Another reason can be typos in manually entered data.
These zero timestamps typically take the form of 1 January 1900, the Unix epoch timestamp of 1 January 1970, or some future timestamp (such as the year 2100).
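If you want to check a raw data extract for these typical forms before importing it, a few lines of Python can do it. The following is only a minimal sketch under stated assumptions: the event layout (case ID, activity, timestamp), the function name find_zero_timestamps, the sample events, and the list of suspicious dates are all hypothetical and should be adapted to the defaults your own source system uses.

```python
from datetime import date, datetime

# Assumed sentinel dates: the two typical defaults named above.
# Extend this set with whatever default values your system produces.
SUSPICIOUS_DATES = {date(1900, 1, 1), date(1970, 1, 1)}


def find_zero_timestamps(events, now):
    """Return events whose timestamp looks like a default or future value.

    events: iterable of (case_id, activity, timestamp) tuples (hypothetical layout)
    now:    reference time; any later timestamp is treated as a future timestamp
    """
    flagged = []
    for case_id, activity, ts in events:
        if ts.date() in SUSPICIOUS_DATES or ts > now:
            flagged.append((case_id, activity, ts))
    return flagged


# Made-up sample events, for illustration only.
events = [
    ("case1", "Create order", datetime(2012, 3, 5, 9, 30)),
    ("case1", "Approve order", datetime(1900, 1, 1, 0, 0)),
    ("case2", "Create order", datetime(1970, 1, 1, 0, 0)),
    ("case2", "Ship order", datetime(2100, 1, 1, 0, 0)),
]

flagged = find_zero_timestamps(events, now=datetime(2024, 1, 1))
for row in flagged:
    print(row)
```

Note that a Unix epoch value converted to a time zone west of UTC shows up as 31 December 1969 rather than 1 January 1970, so it can be worth adding neighboring dates to the set of suspicious dates.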
To find out whether you have zero timestamps in your data, go to the Overview statistics and look at the earliest and the latest timestamps in the data set. For example, in the screenshot below we can see that there is at…