Data Quality Problems in Process Mining and What To Do About Them — Part 12: Missing History
This is the twelfth article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.
When you get a data set and assess whether it is suitable for process mining, you start by looking for the three required elements: case ID, activity name, and timestamp.
For example, when you look for the case ID, you inspect the candidate columns to see whether multiple rows in the data set refer to the same ID (see image below). If you don’t have multiple rows with the same case ID, then the field that you thought could be your case ID is most likely just an event ID and does not help you to correlate the steps that belong to the same process instance [1].
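The check described above can be sketched in a few lines of Python. This is only a minimal illustration: the column names `event_id` and `order_no`, the sample rows, and the helper functions are hypothetical and not taken from any particular data set or tool.

```python
from collections import Counter

def rows_per_value(rows, column):
    """Count how many rows share each value of the candidate column."""
    return Counter(row[column] for row in rows)

def looks_like_case_id(rows, column):
    """Return True if at least one value of the column occurs in more than one row."""
    counts = rows_per_value(rows, column)
    return any(n > 1 for n in counts.values())

# Hypothetical sample data; the column names are made up for illustration.
rows = [
    {"event_id": "E1", "order_no": "A-100", "status": "Created"},
    {"event_id": "E2", "order_no": "A-100", "status": "Approved"},
    {"event_id": "E3", "order_no": "A-101", "status": "Created"},
]

for column in ("event_id", "order_no"):
    print(column, looks_like_case_id(rows, column))
```

In this sample, `event_id` is unique per row and therefore behaves like an event ID, while `order_no` repeats across rows and is a plausible case ID candidate. Passing this check does not prove that a column is the right case ID; it only rules out columns that cannot correlate events at all.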
When you continue looking for the other fields, it sometimes seems at first as if you have everything you need. But then you find out that the history information is actually missing from these fields. Read on to learn about four situations in which this can happen.
Missing Activity History
When you look for a field that can be your activity name, you may encounter a situation like the one shown in the picture below: The status is the same for each event in the case.
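One way to spot this symptom across a whole data set is to list the cases that have several rows but only a single distinct status value. The following is a rough sketch under assumed names: the columns `order_no` and `status`, the sample rows, and the helper function are hypothetical.

```python
from collections import defaultdict

def cases_without_activity_history(rows, case_column, activity_column):
    """Return the case IDs that have several rows but only one distinct activity value."""
    values = defaultdict(set)   # distinct activity values seen per case
    counts = defaultdict(int)   # number of rows per case
    for row in rows:
        case = row[case_column]
        values[case].add(row[activity_column])
        counts[case] += 1
    return sorted(c for c in values if counts[c] > 1 and len(values[c]) == 1)

# Hypothetical sample data; the column names are made up for illustration.
rows = [
    {"order_no": "A-100", "status": "Closed"},
    {"order_no": "A-100", "status": "Closed"},
    {"order_no": "A-100", "status": "Closed"},
    {"order_no": "A-101", "status": "Open"},
    {"order_no": "A-101", "status": "Closed"},
]

suspicious = cases_without_activity_history(rows, "order_no", "status")
print(suspicious)
```

A few flagged cases are not necessarily a problem, because a process step can legitimately repeat. If nearly every case is flagged, however, the column probably does not record how the status changed over time.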
In this situation, you do have a column that tells you something about the process step, or the status, for…