How To Deal With ‘Old Value / New Value’ Data Sets
Take a look at the following example. Instead of one Activity or Status column, you have two columns showing the “old” and the “new” status. For example, in line no. 2 the status is changed from ‘New’ to ‘Opened’ in the first step of case 1.
This is a pattern that you will encounter in some situations, for example, in some database histories or CRM audit trail tables.
The question is how to deal with log data in this format.
Should you use both the ‘Old value’ and the ‘New value’ column as the activity column and join them together?
This would be solution no. 1 and leads to the following process picture.
All combinations of old and new statuses are considered here. This makes sense, but it can quickly lead to inflated process maps with a separate activity node for every combination that occurs.
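To make the effect of joining the two columns concrete, here is a minimal sketch in plain Python. The rows, timestamps, and the `combined_activities` helper are made up for illustration and are not taken from the example data set. Each distinct old/new pair becomes its own activity name, so with n different statuses there can be up to n × (n − 1) activity nodes.

```python
# Hypothetical audit-trail rows: (case id, timestamp, old value, new value).
rows = [
    ("1", "2024-01-01 09:00", "New", "Opened"),
    ("1", "2024-01-02 10:00", "Opened", "Closed"),
    ("2", "2024-01-01 11:00", "New", "Opened"),
    ("2", "2024-01-03 12:00", "Opened", "Aborted"),
]


def combined_activities(rows):
    """Return the distinct activity names obtained by joining old and new value."""
    return sorted({f"{old} -> {new}" for _, _, old, new in rows})


activities = combined_activities(rows)
for name in activities:
    print(name)
```

Even in this tiny sample, each transition becomes a node of its own instead of the process map showing the statuses themselves.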
Normally, you would like to see the process map as a flow between the different statuses. So, what happens if you just choose the ‘Old value’ column as the activity when importing your data set (solution no. 2)?
You would get the following process map.
The process map shows the process flow through the different status changes as expected, but there is one problem: You miss the very last status in every case (which is recorded in the ‘New value’ column).
For example, for case 2 the process flow goes from ‘Opened’ directly to the end point (omitting the ‘Aborted’ status it changed into in the last event).
Alternatively, you can import just the ‘New value’ column as the activity column (solution no. 3) and get the following picture.
This way, you see all the different end points of the process. For example, some cases end with the status ‘Closed’ while others end as ‘Aborted’. But now you miss the very first status of each case (the ‘New’ status).
In this example, all cases change from ‘New’ to ‘Opened’. So, missing the ‘New’ in the beginning is less of a problem compared to missing the different end statuses. Therefore, solution 3 would be the preferred solution in this case. But in other situations, the opposite might be the case.
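The trade-off between the two single-column choices can be sketched as follows. The data and the `traces` helper are hypothetical. The sketch only mimics what happens when one column is picked as the activity, and it assumes that the timestamps sort correctly as text.

```python
from collections import defaultdict

# Hypothetical audit-trail rows: (case id, timestamp, old value, new value).
rows = [
    ("1", "2024-01-01 09:00", "New", "Opened"),
    ("1", "2024-01-02 10:00", "Opened", "Closed"),
    ("2", "2024-01-01 11:00", "New", "Opened"),
    ("2", "2024-01-03 12:00", "Opened", "Aborted"),
]


def traces(rows, column):
    """Build the activity sequence per case from one column ('old' or 'new')."""
    index = {"old": 2, "new": 3}[column]
    result = defaultdict(list)
    # Sort by case and timestamp so that each sequence is in event order.
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        result[row[0]].append(row[index])
    return dict(result)


old_traces = traces(rows, "old")  # last status of each case is missing
new_traces = traces(rows, "new")  # first status of each case is missing
print(old_traces)
print(new_traces)
```

With the ‘Old value’ column, both sample cases look identical and the difference between ‘Closed’ and ‘Aborted’ disappears. With the ‘New value’ column, the end points differ but the initial ‘New’ status is not part of the sequence.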
Filtering Based on Endpoints
Note that you can still use the values of the column that you did not use as the activity name to filter incomplete cases with the ‘Endpoints’ filter.
For example, if you used solution 2 (the ‘Old value’ column as the activity) but wanted to remove all cases that ended with ‘New value’ = ‘Aborted’, you can configure the desired end status based on the ‘New value’ attribute in the Endpoints filter.
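Outside the tool, the same idea can be approximated in a few lines of Python. This is only a sketch of the filtering logic under assumed data, not how the Endpoints filter is implemented. The rows and the `remove_cases_ending_in` helper are hypothetical.

```python
# Hypothetical audit-trail rows: (case id, timestamp, old value, new value).
rows = [
    ("1", "2024-01-01 09:00", "New", "Opened"),
    ("1", "2024-01-02 10:00", "Opened", "Closed"),
    ("2", "2024-01-01 11:00", "New", "Opened"),
    ("2", "2024-01-03 12:00", "Opened", "Aborted"),
]


def remove_cases_ending_in(rows, end_value):
    """Drop every case whose last event has the given 'New value'."""
    last_new = {}
    for case, _, _, new in sorted(rows, key=lambda r: (r[0], r[1])):
        last_new[case] = new  # overwritten until the last event of the case
    return [row for row in rows if last_new[row[0]] != end_value]


remaining = remove_cases_ending_in(rows, "Aborted")
remaining_cases = sorted({row[0] for row in remaining})
print(remaining_cases)
```

The activity column (here the old value) is not touched at all. Only the attribute that was not used as the activity decides which cases are kept.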
In summary, what you can take away from this is the following:
If you encounter the ‘Old value / New value’ situation, often just using one of the two columns is preferred to get the expected view of status changes in the process map.
If you choose the ‘Old value’ column, you will miss the very last status in each case.
If you choose the ‘New value’ column, you will miss the very first status in each case.
You can still filter start and end points based on the attribute column that you did not use for the activity name.
In most situations, this is enough and you can use your ‘Old value / New value’ data just as it is. If, however, you really need to see the very first and the very last status in your process flow, then you would need to reformat your source data into the standard process mining format and add the missing start or end status as an extra row.
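One possible way to do this reformatting is sketched below. It keeps the ‘New value’ as the status of each row and adds one extra row per case for the initial status. The data and the `to_standard_format` helper are hypothetical. The sketch also assumes that the real time of the first status is unknown, so the added row simply reuses the timestamp of the first change and relies on the row order.

```python
# Hypothetical audit-trail rows: (case id, timestamp, old value, new value).
rows = [
    ("1", "2024-01-01 09:00", "New", "Opened"),
    ("1", "2024-01-02 10:00", "Opened", "Closed"),
    ("2", "2024-01-01 11:00", "New", "Opened"),
    ("2", "2024-01-03 12:00", "Opened", "Aborted"),
]


def to_standard_format(rows):
    """Return (case id, timestamp, status) rows including the initial status."""
    result = []
    seen_cases = set()
    for case, timestamp, old, new in sorted(rows, key=lambda r: (r[0], r[1])):
        if case not in seen_cases:
            seen_cases.add(case)
            # Assumption: the first status gets the first change's timestamp.
            result.append((case, timestamp, old))
        result.append((case, timestamp, new))
    return result


standard_rows = to_standard_format(rows)
for row in standard_rows:
    print(row)
```

If the source system records when a case was created, that timestamp would be a better choice for the added row than the reused one.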
(This article previously appeared in the Process Mining News.)