Why Process Mining is better than Excel for Process Analysis
I keep meeting people who tell me that process mining is so much easier than Excel for process analysis.
“Process analysis with Excel?” some of you may ask.
You can do a lot of things that you wouldn’t think are possible. For example, the picture above is entirely made from packaging tape. I had no idea that you could do this with a roll of packaging tape. That’s why it’s art.
Excel is so prevalent that there must be quite a few Excel spreadsheets out there that are close to art, at least in terms of the dedication and pain that it took to get there.
We tend to use the tools that we know for any task at hand, whether they are the best tools for the job or not. And often that makes sense, because the time that we need to learn a new tool must be factored in as well.
But with a process mining tool as easy to learn as Disco, it’s time to revisit your typical process analysis tasks again and to ask yourself whether you could not solve some of them much faster and better with process mining. Chances are you can. Here is why.
1. The Assumed Process is Not Your Real Process
When I see people do process analysis with Excel, they invariably gravitate to a data format like the following:
One row per case (see case 1 highlighted)
Activities in columns with the dates or timestamps recorded in the cell content
This is often done to make things easier (how would you otherwise measure the time it takes to get from A to E?).
The problem with this format is that it assumes that the process goes through the activities A-E in an orderly fashion. But real processes are complex and messy. And forcing your data into such a column-based format loses information about the real process.
Look at the following event log, which has been transformed into a row-based data format by:
duplicating the rows for each activity (again, case 1 is highlighted)
adding an activity and timestamp column to capture the time for each activity
This is the format you need to transform your data to if you want to import it into Disco. But it’s not a pure formatting issue. The column-based format is not suitable for capturing event data about your process, because there is only one cell per case and activity, so it inherently loses information about repetitions.
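To make the transformation concrete, here is a minimal sketch in plain Python. The sample rows, the column names, and the `to_event_log` helper are made up for illustration and are not taken from any tool; the sketch only shows the idea of turning one row per case into one row per event.

```python
from datetime import datetime

# Hypothetical column-based data: one row per case, one column per activity.
wide_rows = [
    {"case": "1", "A": "2024-01-01", "B": "2024-01-02", "C": "2024-01-04"},
    {"case": "2", "A": "2024-01-03", "B": "2024-01-05", "C": ""},
]

def to_event_log(rows, case_column="case"):
    """Turn one-row-per-case data into one-row-per-event data."""
    events = []
    for row in rows:
        for activity, cell in row.items():
            if activity == case_column or not cell:
                continue  # skip the case ID column and empty cells
            events.append({
                "case": row[case_column],
                "activity": activity,
                "timestamp": datetime.strptime(cell, "%Y-%m-%d"),
            })
    # Order the events by case and then by time.
    events.sort(key=lambda e: (e["case"], e["timestamp"]))
    return events

event_log = to_event_log(wide_rows)
for event in event_log:
    print(event["case"], event["activity"], event["timestamp"].date())
```

Note that the result can contain at most one event per case and activity, which is exactly the limitation discussed here.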
Look at the following data set, which shows the real process log as it happened:
Only case 2 followed the expected path
In case 1 and in case 3 rework occurred (see blue mark-up), and this rework is simply lost in the first event log
Now, if you import a data set that was transformed from a column-based format to a row-based format, you can analyze it with process mining, but you might get some distortions (see discovered process map below). For example, the direct transition between Activity B and Activity D never actually happened.
In reality, the process looks like this:
If you are curious: these were simplified versions of the process. Here are the full pictures for both the column-based and transformed (left) and the real data set (right).
This shows that just by capturing your data in an Excel-friendly format you already lose information about the real process. It’s much better to take the actual data and analyze the real processes, which a process mining tool like Disco makes very easy to do.
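The distortion can be reproduced with a few lines of Python. This is only a sketch with a made-up trace (it is not the data from the pictures), and it assumes that a repeated activity overwrites its earlier timestamp in the column-based format.

```python
# Hypothetical real trace for one case: C is repeated after D (rework).
real_trace = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("C", 5), ("E", 6)]

def directly_follows(trace):
    """Return the set of (from, to) activity pairs in time order."""
    ordered = [activity for activity, _ in sorted(trace, key=lambda e: e[1])]
    return set(zip(ordered, ordered[1:]))

# Column-based format: one cell per activity, so a repeated activity
# overwrites its earlier timestamp (assumption: the last value is kept).
wide_row = {}
for activity, timestamp in real_trace:
    wide_row[activity] = timestamp

# Transform the column-based row back into events.
transformed_trace = list(wide_row.items())

real_pairs = directly_follows(real_trace)
transformed_pairs = directly_follows(transformed_trace)

print("never happened:", sorted(transformed_pairs - real_pairs))
print("lost:", sorted(real_pairs - transformed_pairs))
```

In this made-up example the transformed data suggests a direct step from B to D that never took place, while the real steps from B to C and from C to D disappear.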
2. The Case Context is Preserved
One of the advantages of process mining is that, because the analysis is based on the raw transactional data, you can always – at any point in time – look up individual examples (see screenshot of Cases view in Disco below) for patterns that you find in your analysis.
It’s important to be able to look at concrete examples to really understand what is going on and to derive actionable information:
This is the normal process path? Let me look at some example cases.
Some cases take more than five months? Which teams are handling them? I’ll filter them and look at them in more detail.
This path is impossible! Let me drill into that and look at an example case to see what is happening.
If you do your case duration analysis in Excel and, for example, have found the case IDs for the 10 longest-running process instances, then you have to look up the case history in the source system (for example, your CRM, ITSM, Workflow or ERP system) to understand the context of the case and derive actionable information out of your analysis. This is slow and painful and can only be done for a few cases before it becomes impractical.
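The difference is easy to see in a small sketch. Because a row-based event log keeps every event, the history of a long-running case is available right next to its duration. The log below and the `duration_days` helper are hypothetical and only illustrate the principle.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case ID, activity, date)
event_log = [
    ("1", "A", "2024-01-01"), ("1", "B", "2024-01-03"), ("1", "C", "2024-06-20"),
    ("2", "A", "2024-01-02"), ("2", "C", "2024-01-05"),
    ("3", "A", "2024-02-01"), ("3", "B", "2024-02-10"), ("3", "C", "2024-03-01"),
]

# Group the events per case, keeping the full history.
cases = defaultdict(list)
for case_id, activity, day in event_log:
    cases[case_id].append((datetime.strptime(day, "%Y-%m-%d"), activity))

def duration_days(history):
    """Days between the first and the last event of a case."""
    times = [time for time, _ in history]
    return (max(times) - min(times)).days

# Rank the cases by duration, longest first.
ranked = sorted(cases, key=lambda c: duration_days(cases[c]), reverse=True)

# Show the two longest-running cases together with their history.
for case_id in ranked[:2]:
    history = sorted(cases[case_id])
    print(case_id, duration_days(history), "days:", [a for _, a in history])
```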
3. Easy Filtering and Variant Analysis Possibilities
Filtering is an important part of process analysis. Sometimes, you want to remove cases that were done at a specific location, because there “they do things differently”. You might need to focus on those long-running cases. You want to drill down into this path that you thought should not be possible. And you need to be able to do it fast.
Disco has very powerful filtering capabilities (see example screenshot of the Performance filter in Disco above) and lets you answer almost any question very quickly. This is the advantage of a specialized process mining tool that – unlike Excel – focuses on the process perspective. Filtering becomes process-oriented and interactive.
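To illustrate what process-oriented filtering means, here is a rough sketch in plain Python. It has nothing to do with how Disco implements its filters; the cases, the locations, and the `follows_path` helper are invented for the example. The point is that the filter always keeps or removes whole cases, not individual rows.

```python
# Hypothetical cases: an attribute plus the ordered activity sequence.
cases = {
    "1": {"location": "Berlin", "trace": ["A", "B", "C", "D", "E"]},
    "2": {"location": "Madrid", "trace": ["A", "B", "D", "E"]},
    "3": {"location": "Berlin", "trace": ["A", "B", "D", "C", "E"]},
}

def follows_path(trace, source, target):
    """True if target occurs directly after source anywhere in the trace."""
    return (source, target) in zip(trace, trace[1:])

# Remove all cases from one location ("they do things differently" there).
without_madrid = {c: v for c, v in cases.items() if v["location"] != "Madrid"}

# Drill down into a path that should not be possible: B directly followed by D.
b_then_d = [c for c, v in cases.items() if follows_path(v["trace"], "B", "D")]

print(sorted(without_madrid))
print(b_then_d)
```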
Another example of a process-oriented analysis that is not possible in Excel is variant analysis. You can read a detailed article about variant analysis in this previous blog post about How to understand the variants in your process.
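In short, a variant is one specific sequence of activities, and variant analysis counts how many cases follow each sequence. A minimal sketch of that idea, using a made-up event log with simple integer timestamps, could look like this:

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case ID, activity, timestamp)
event_log = [
    ("1", "A", 1), ("1", "B", 2), ("1", "C", 3), ("1", "B", 4), ("1", "C", 5),
    ("2", "A", 1), ("2", "B", 2), ("2", "C", 3),
    ("3", "A", 2), ("3", "B", 3), ("3", "C", 4),
]

# Build the ordered activity sequence (trace) for each case.
traces = defaultdict(list)
for case_id, activity, _ in sorted(event_log, key=lambda e: (e[0], e[2])):
    traces[case_id].append(activity)

# A variant is one distinct sequence; count the cases per variant.
variants = Counter(tuple(trace) for trace in traces.values())

for variant, count in variants.most_common():
    print(count, "case(s):", " -> ".join(variant))
```

Here cases 2 and 3 share one variant, while case 1 forms its own variant because of the repeated B and C.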
4. Visualization is King
Processes need to be visualized to understand them. This is why every traditional process discovery and analysis activity includes drawing process maps. Process mining is inherently visual because it provides factual and graphical representations of your discovered process.
You can even replay the actual behavior from the event log and visualize the process that took place over time (see above).
In Excel you always need to imagine the process alongside your calculations, and this only works for very simple processes.
5. The Power of Exploration
Excel is a very powerful tool and I am sure you can answer almost any question with it if you tweak the data and perhaps start programming around your data in Visual Basic. Just like you can answer almost any question with SQL queries based on your data in a database.
But this leaves out one very important element in process mining: The possibility to discover your process beyond questions that you already had. Process mining is inherently explorative. It shows you what really happened in your process and then gives you the possibility to easily interact, filter, and visualize your data from a process perspective.
The visualization, the interactivity, and the process-orientation together give you the power to see and further explore things that you did not see before.
As I work through the review I am always comparing this experience to a past project I did with an order to cash process. The objective was to take a few million dollars out in cost savings and I worked with an event log, although I did not have a process mining tool like Disco. I was totally reliant on custom analysis with Excel.
I did define a perfect process based on specific client and process conditions and used it in my analysis. After spending about a week defining and extracting data, I spent a couple of months on the analysis.
In my estimation, with a tool like Disco I could have done a better job of the analysis in a couple of weeks.
What did you try to do with Excel that you later found to be much easier with a process mining tool? Let us know in the comments!