ProM Tips — Which Mining Algorithm Should You Use?
Probably the most well-known and popular process mining tool available is ProM, an open source toolkit developed at Eindhoven University of Technology. ProM is a good choice to explore process mining, because it has consistently been at the forefront of that technology [1].
If you start up ProM for the first time to try out Process Mining, the number of available plugins (almost 300) can be daunting. Just look at the plugins that discover a process model, and you end up counting at least 16.
What not to do
Many people have read about the alpha-algorithm in some paper, or in the ProM tutorial, and just keep using that one. Don’t do this. The alpha-algorithm is beautiful from a scientific perspective, because it can be formalized in 8 lines (see page 83 in this presentation) and because interesting properties can be proven about it.
For real-life logs, the alpha-algorithm is almost never the right choice. It won’t work. Well, it will give you a result, of course (it always does), but the result won’t be good. So, don’t use it.
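To see why noise hurts, it helps to look at the ordering relations the alpha-algorithm starts from: b directly follows a somewhere (a > b), causality (a -> b, meaning a > b but never b > a), parallelism (a || b, both directions observed), and unrelated (a # b). The following is a minimal sketch of that first step only, not the full algorithm. The activity names and the example log are made up for illustration.

```python
from itertools import product


def directly_follows(log):
    """Return the set of pairs (a, b) where b directly follows a in some trace."""
    pairs = set()
    for trace in log:
        pairs.update(zip(trace, trace[1:]))
    return pairs


def footprint(log):
    """Classify every ordered activity pair using the alpha-algorithm's relations."""
    df = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in df and (b, a) not in df:
            relations[(a, b)] = "->"  # causality
        elif (a, b) in df and (b, a) in df:
            relations[(a, b)] = "||"  # parallel
        elif (b, a) in df:
            relations[(a, b)] = "<-"  # reverse causality
        else:
            relations[(a, b)] = "#"   # never directly follow each other
    return relations


# Hypothetical log: 99 clean traces plus one trace with two steps swapped.
clean_log = [["register", "check", "pay"]] * 99
noisy_log = clean_log + [["register", "pay", "check"]]

clean = footprint(clean_log)
noisy = footprint(noisy_log)
print(clean[("check", "pay")])  # ->
print(noisy[("check", "pay")])  # ||
```

The relations are purely yes/no, so frequencies play no role. A single swapped trace out of 100 is enough to turn a clear sequence into apparent parallelism, and the discovered model changes accordingly.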
The 3 recommended Mining Algorithms
So, which algorithm should you use? I recommend the following three process discovery plugins in ProM.
1. Heuristic miner
The Heuristic Miner was the second process mining algorithm, closely following the alpha-algorithm. It was developed by Dr. Ton Weijters, who used a heuristic approach to address many problems with the alpha-algorithm, making the Heuristic Miner much more suitable in practice.
- Output: Heuristic net
- When to use: When you have real-life data with not too many different events, or when you need a Petri net model for further analysis in ProM
The Heuristic miner (previously Little Thumb) derives XOR and AND connectors from dependency relations. It can abstract from exceptional behavior and noise (by leaving out edges) and, therefore, is also suitable for many real-life logs.
One of the advantages is that a Heuristic net can be converted to other types of process models, such as a Petri net for further analysis in ProM.
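The core of the dependency relations can be sketched in a few lines. The Heuristic miner's published dependency measure for two different activities a and b is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often b directly follows a. The sketch below computes only that measure and keeps edges above a threshold. The threshold value, the function names, and the example log are assumptions for illustration, and the real plugin applies further thresholds and handles loops separately.

```python
from collections import Counter


def directly_follows_counts(log):
    """Count how often each activity pair (a, b) occurs as 'a directly followed by b'."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Dependency measure for two different activities; the value lies between -1 and 1."""
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(log, threshold=0.9):
    """Keep only edges whose dependency measure reaches the (assumed) threshold."""
    counts = directly_follows_counts(log)
    return {
        (a, b): round(dependency(counts, a, b), 3)
        for (a, b) in counts
        if a != b and dependency(counts, a, b) >= threshold  # self-loops are skipped here
    }


# Hypothetical log: 99 clean traces plus one trace with two steps swapped.
log = [["register", "check", "pay"]] * 99 + [["register", "pay", "check"]]
print(dependency_graph(log))
# {('register', 'check'): 0.99, ('check', 'pay'): 0.97}
```

Because the measure is frequency-based, the single swapped trace barely lowers the score of check -> pay, and the rare edge register -> pay (score 0.5) is left out. This is what "leaving out edges" means in practice.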
2. Fuzzy miner
The Fuzzy miner is one of the younger process discovery algorithms, and was developed by Fluxicon co-founder Christian W. Günther in 2007. It is the first algorithm to directly address the problems of large numbers of activities and highly unstructured behavior.
- Output: Fuzzy Model
- When to use: When you have complex and unstructured log data, or when you want to simplify the model in an interactive manner
The Fuzzy miner uses significance/correlation metrics to interactively simplify the process model at the desired level of abstraction. Unlike the Heuristic miner, it can also leave out less important activities (or hide them in clusters) if you have hundreds of them.
The fuzzy model cannot be converted to other types of process modeling languages, but you can use it to animate the event log on top of the created model to get a feeling for the dynamic process behavior.
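The idea of simplifying by significance can be illustrated with a heavily reduced sketch. It is not the Fuzzy miner's actual metric set: it assumes relative frequency as the only significance metric, and it simply drops low-significance activities instead of grouping them into clusters. The function names, cutoff values, and example log are hypothetical.

```python
from collections import Counter


def node_significance(log):
    """Frequency of each activity, scaled so the most frequent activity scores 1.0."""
    freq = Counter(a for trace in log for a in trace)
    top = max(freq.values())
    return {a: n / top for a, n in freq.items()}


def simplify(log, cutoff):
    """Keep activities at or above the cutoff and rebuild the edges between them."""
    sig = node_significance(log)
    kept = {a for a, s in sig.items() if s >= cutoff}
    edges = Counter()
    for trace in log:
        visible = [a for a in trace if a in kept]  # project the trace onto kept activities
        edges.update(zip(visible, visible[1:]))
    return kept, edges


# Hypothetical log with two rare activities.
log = (
    [["register", "check", "pay"]] * 90
    + [["register", "check", "call customer", "pay"]] * 8
    + [["register", "fax form", "check", "pay"]] * 2
)

# Raising the cutoff plays the role of moving a simplification slider.
for cutoff in (0.0, 0.05, 0.5):
    kept, edges = simplify(log, cutoff)
    print(cutoff, sorted(kept))
```

With a cutoff of 0.0 all five activities are shown, at 0.05 "fax form" disappears, and at 0.5 only the three main activities remain. The edges are recomputed each time, so the remaining activities stay connected.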
[Update: The process mining algorithm in Disco is based on the Fuzzy miner. You can read more about how the Disco miner has been further developed based on the Fuzzy miner in the Disco Tour here.]
3. Multi-phase miner
The Multi-phase miner was the first algorithm to explicitly use the OR split/join semantics, as found in EPCs, enabling it to express complex behavior in relatively well-structured models. It was developed by Dr. Boudewijn van Dongen, a process mining veteran and longtime leading developer of ProM.
- Output: Event-driven Process Chain (EPC)
- When to use: When you have simple and structured log data and you want to export the mining result to Aris
The Multi-phase miner folds XOR, AND, and OR connectors from so-called runs and displays the resulting model as an EPC. The EPC can then be exported to Aris (e.g., in Aris graph format) and further processed from there.
One of the advantages of the Multi-phase miner is that it constructs a model that always fits the complete event log (more on that in a later post). However, it is seldom useful for more complex processes because the model becomes unreadable.
What do you think?
In this post, I have tried to give you a pragmatic recommendation for which mining algorithm you should use, and when. So, while there may be other plugins that are fascinating from a scientific standpoint, I have focused here on what works in practice.
Let me know if you disagree, and please share your experiences in the comments!
[1] Another reason that many people like ProM, obviously, is that it’s free. But that is a topic for an entirely different discussion.