Managing Complexity in Process Mining Part II: Remove Incomplete Cases
Part II: Remove Incomplete Cases
Removing incomplete cases may look like a mere clean-up step before the analysis, but read on to learn why it is also relevant as a simplification strategy.
Strategy 3) Remove Incomplete Cases
Imagine you have just received a new data set and simply want to make a first process map. You typically do not want to get into a detailed analysis right away. For example, you may first want to validate that the data was extracted correctly, or you may need to quickly show the process owner a first picture of what the discovered process looks like.
Obviously, a complex process map gets in the way of doing that.
Now, while filtering out incomplete cases is a typical preparation step for your actual analysis, you might also want to check for incomplete cases simply to get a simpler process map. Here is why.
Often, the data freshly extracted from the IT system contains cases that are not yet finished. They are in some intermediate state, and if we waited longer, further process steps would appear. The same can happen at the start of the process: some steps may have taken place before the data extraction window began, so those cases seem to start in the middle.
For the analysis of process durations, for example, it is very important to remove incomplete cases. Otherwise, half-finished cases appear particularly fast and artificially lower the average process duration. But incomplete cases can also inflate your process map layout by adding many additional paths to the process end point.
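To make the duration effect concrete, here is a small Python sketch with made-up numbers. The case IDs, dates, and the is_complete flag are all hypothetical: three cases are finished, and two were cut off by the end of the extraction window after only one or two days.

```python
from datetime import datetime
from statistics import mean

# Hypothetical cases: (case_id, first timestamp, last timestamp, is_complete)
cases = [
    ("c1", datetime(2024, 1, 1), datetime(2024, 1, 11), True),    # 10 days
    ("c2", datetime(2024, 1, 2), datetime(2024, 1, 14), True),    # 12 days
    ("c3", datetime(2024, 1, 3), datetime(2024, 1, 11), True),    # 8 days
    ("c4", datetime(2024, 1, 28), datetime(2024, 1, 30), False),  # cut off after 2 days
    ("c5", datetime(2024, 1, 29), datetime(2024, 1, 30), False),  # cut off after 1 day
]

def duration_days(start, end):
    """Case duration in (fractional) days."""
    return (end - start).total_seconds() / 86400

# Average over all cases vs. only the finished ones.
all_avg = mean(duration_days(s, e) for _, s, e, _ in cases)
complete_avg = mean(duration_days(s, e) for _, s, e, done in cases if done)

print(f"all cases:      {all_avg:.1f} days")
print(f"complete cases: {complete_avg:.1f} days")
```

In this toy data, including the two cut-off cases pulls the average from 10 days down to 6.6 days, although no finished case was anywhere near that fast.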
To understand why incomplete cases inflate the map, take a look at the process map below. Besides the regular end activity Order completed, several other activities were performed as the last step in the process; they show up as dashed lines leading to the end point at the bottom of the map. For example, Invoice modified was the last step in the process for 20 cases (see below). This does not sound like a real end activity for the process, does it?
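Outside of Disco, the same end-activity statistic can be computed from a raw event log. The sketch below assumes a hypothetical log format of (case ID, activity, timestamp) tuples with ISO 8601 timestamps and uses a few invented cases; only the activity names Order completed and Invoice modified come from the example above.

```python
from collections import Counter

# Hypothetical event log rows: (case_id, activity, ISO 8601 timestamp).
event_log = [
    ("1", "Order created", "2024-03-01T09:00"),
    ("1", "Invoice sent", "2024-03-02T10:00"),
    ("1", "Order completed", "2024-03-05T16:00"),
    ("2", "Order created", "2024-03-01T11:00"),
    ("2", "Invoice sent", "2024-03-03T08:00"),
    ("2", "Invoice modified", "2024-03-04T12:00"),
    ("3", "Order created", "2024-03-02T13:00"),
    ("3", "Order completed", "2024-03-06T09:00"),
    ("4", "Invoice sent", "2024-03-02T14:00"),
    ("4", "Invoice modified", "2024-03-03T15:00"),
]

def end_activity_counts(log):
    """Count how many cases end with each activity (the case's latest event)."""
    last = {}  # case_id -> (timestamp, activity) of the latest event seen so far
    for case_id, activity, ts in log:
        # ISO strings in the same format compare chronologically.
        if case_id not in last or ts > last[case_id][0]:
            last[case_id] = (ts, activity)
    return Counter(activity for _, activity in last.values())

print(end_activity_counts(event_log))
```

Any activity other than the expected end activity that shows up with a noticeable count, like Invoice modified here, hints that some cases are still running.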
To remove incomplete cases, you can simply add an Endpoints filter in Disco and select the activities that are valid start and end points in your process (see below).
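Conceptually, filtering on endpoints keeps a case only if its first activity is a valid start and its last activity is a valid end. The following Python sketch illustrates that idea on a hypothetical event log. It is not how Disco implements its Endpoints filter, and the valid start and end sets are assumptions you would choose for your own process.

```python
from collections import defaultdict

# Hypothetical event log rows: (case_id, activity, ISO 8601 timestamp).
event_log = [
    ("1", "Order created", "2024-03-01T09:00"),
    ("1", "Invoice sent", "2024-03-02T10:00"),
    ("1", "Order completed", "2024-03-05T16:00"),
    ("2", "Order created", "2024-03-01T11:00"),
    ("2", "Invoice sent", "2024-03-03T08:00"),
    ("2", "Invoice modified", "2024-03-04T12:00"),
    ("3", "Order created", "2024-03-02T13:00"),
    ("3", "Order completed", "2024-03-06T09:00"),
    ("4", "Invoice sent", "2024-03-02T14:00"),
    ("4", "Invoice modified", "2024-03-03T15:00"),
]

# Assumed valid endpoints for this example process.
VALID_START = {"Order created"}
VALID_END = {"Order completed"}

def filter_by_endpoints(log, valid_start, valid_end):
    """Return only the events of cases that start and end with a valid activity."""
    traces = defaultdict(list)
    for case_id, activity, ts in log:
        traces[case_id].append((ts, activity))
    keep = set()
    for case_id, events in traces.items():
        # Order each case's events by timestamp; ISO strings sort chronologically.
        events.sort(key=lambda event: event[0])
        first_activity, last_activity = events[0][1], events[-1][1]
        if first_activity in valid_start and last_activity in valid_end:
            keep.add(case_id)
    # Drop whole cases, not individual events.
    return [row for row in log if row[0] in keep]

filtered = filter_by_endpoints(event_log, VALID_START, VALID_END)
print(sorted({case_id for case_id, _, _ in filtered}))
```

Case 2 is removed because it ends with Invoice modified, and case 4 because it starts with Invoice sent, so its beginning presumably lies before the extraction window.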
The resulting process map is simpler: without the extra paths from unfinished cases to the end point, the graph layout becomes less cluttered (see below).
So, even if you are in a hurry and not really in the analysis phase yet, it is worth trying to remove incomplete cases when you are faced with too much complexity in your process map.
That was strategy No. 3. Watch out for Part III, where we explain how dividing up your data can help simplify your process maps.