BPI Challenge 2012 — An Interview with Boudewijn van Dongen
As you may have heard, this year’s BPM Conference will take place 3–7 September in Tallinn, Estonia. The conference program itself is always very high-quality and features a number of talks on process mining. However, for me the most interesting and inspiring part of the conference has always been the BPI Workshop, which is the meeting place for process mining researchers.
This year, the BPI Workshop features the Second International Business Process Intelligence Challenge (BPIC12), which is also aimed at practitioners. The basic idea is simple: The BPI team has provided an event log, with some background information and points of interest. This is a great opportunity for everybody, researchers and practitioners, to show off their process mining skills and analyze this log, with the chance for eternal fame and a prize.
I had the chance to ask Boudewijn van Dongen, a former colleague of ours from Eindhoven University of Technology and one of the BPI Workshop’s organizers, some questions about the BPI Challenge. Read our interview below, or head straight to the BPI Challenge website!
Interview with Boudewijn van Dongen
Christian: You have been involved in process mining research since the very beginning, and you are also organizing the BPI workshop, one of the major events where process mining researchers meet. Last year, you initiated the first BPI Challenge alongside the workshop, and I was happy to see that you are continuing this tradition in 2012. How would you explain to people what the BPI Challenge is about?
Boudewijn: From the start of our process mining research, we have been looking for real-life data to validate our algorithms, approaches and techniques. Some 10 years ago, finding organizations that had event data available and were at the same time willing to share it was a great challenge. Through a number of successful projects (and probably just as many unsuccessful ones), however, our research gained momentum, while organizations became more willing to share their data as they grew increasingly aware of the benefits of applying process mining techniques.
Meanwhile, researchers all over the world also gained interest in the domain of process mining and started to develop process mining techniques of their own. Many of these researchers faced the same challenge we encountered when looking for real-life data that could be used for validation purposes. Over the years, this has resulted in many requests from fellow researchers asking whether we could share our real-life data so they could validate their work. On many occasions, visiting researchers used our collected case studies, but we were almost never allowed or able to share our datasets publicly.
In 2010, the three universities of technology in The Netherlands joined forces to establish the 3TU Datacenter. This initiative aims to share datasets publicly so that other researchers can benefit from whatever data can be collected (in many domains, not just process mining). This sparked an idea within our group to make a real-life dataset available to the community. However, we needed a way to make the research community aware of the existence of this dataset (as well as the entire collection), and this is where the BPI challenge comes in.
For the BPI challenge, both researchers and practitioners are asked to test, apply or validate whatever technique or tool they have developed on real-life data. The datasets we use for this challenge are, of course, completely anonymized, but other than that they are not cleaned or altered in any way, and such datasets indeed pose a challenge for many tools and techniques. This year, for example, we have obtained a dataset from a loan application process of a financial institution. Every case in this log is an actual request by an actual person for a loan. We expect this dataset to pose a challenge for many process mining techniques (the “alpha-algorithm”, for example, does not produce sensible results on it), but at the same time we expect that process mining techniques are mature enough to provide insightful results.
Last year, the jurors of the challenge were indeed pleasantly surprised by the maturity of the submitted results. One participant even wrote in his conclusions that the dataset looked complicated at first sight but that the underlying process was actually straightforward, and this was a dataset from the most complex of organizations, namely a hospital.
Christian: Yes, I still remember the times when researchers were desperately looking for real-life event logs to try their techniques and software on. In fact, people still sometimes contact me with requests for logs.
In that sense, I think it is great that the challenge not only addresses the demand for real-life data but also provides a sort of benchmark for the variety of process mining and other BPI approaches out there. It forces researchers out of their tightly controlled comfort zone to prove their approaches in the wild.
While we do have a lot of researchers and students reading our blog, I think there is an even larger number of practitioners, consultants and others working on process analysis in industry. In your opinion, why should these people take part in the BPI Challenge? And maybe you have some tips on how to get started?
Boudewijn: While researchers are often looking for logs to validate their work on, practitioners face the opposite problem. Business analysts and consultants, but also process owners, often have a good sense of potential improvements to their processes. When looking for software solutions to analyze their processes and confirm their ideas, they run into the heterogeneity of the input data these tools require. By participating in this challenge, I think that they can benefit in two ways.
One of the goals of the IEEE Task Force on Process Mining was to standardize the format in which event logs are recorded and stored and, as you know, the XES format in which the logs for the challenge are presented does exactly that. Currently, different process analysis solutions require different input. In some cases, a simple CSV file or database dump may be sufficient to start analysis, while in other cases complete adapters have to be developed from scratch. However, more and more tools support the XES format and participating in the challenge will help practitioners to get used to this event logging format.
However, a greater benefit for practitioners is in the experience of seeing many different analysis results on the same dataset. By getting logs in the XES format from their own processes, they could simply repeat the analysis of other participants. Moreover, most researchers would be happy to help in doing such an analysis, especially if the results can be published. In our academic community, we see that there is a real requirement for any new technique to be validated on real life data and what better way is there to validate results than to do this together with the process owner who has something to gain?
For a practitioner getting started with process mining, I think that there are a few good places to begin. First, I would recommend looking at the website www.processmining.org and the ProM toolset, for which a tutorial is available. Also, last year’s winner, JC Bose, has written a section in his PhD thesis on the analysis of the hospital log that nicely explains how he tackled it. For a real introduction to the field of process mining, the recent book by Wil van der Aalst is also a good starting point.
Christian: These are some great pointers. Thanks for taking the time for this interview, Boudewijn!
Do you accept the challenge?
If you accept the BPI Challenge, you will find that the organizers have made it very easy to get started this year. On the website for the BPI Challenge, you can download the challenge event log in the XES and MXML formats.
And if you would like to use Disco for the challenge, we have prepared a handy project containing this log, which you can download here [1].
The deadline for submitting your analysis results is Monday, 30 July 2012.
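If you would like a quick look at how an XES log is organized before loading it into a process mining tool, the following minimal Python sketch may help. It assumes only the basic XES layout (a `log` element containing `trace` elements, which in turn contain `event` elements carrying a `string` attribute with key `concept:name`) and uses nothing beyond the standard library. The function names `read_xes_traces` and `_local` are hypothetical helpers made up for this example, and the gzip branch is only an assumption for the case that your downloaded copy is compressed.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import IO, List, Union


def _local(tag: str) -> str:
    """Strip an optional XML namespace, e.g. '{http://www.xes-standard.org/}trace' -> 'trace'."""
    return tag.rsplit("}", 1)[-1]


def _activity_name(elem: ET.Element) -> str:
    """Return the value of the element's direct concept:name string attribute, or '' if absent."""
    for child in elem:
        if _local(child.tag) == "string" and child.get("key") == "concept:name":
            return child.get("value", "")
    return ""


def read_xes_traces(source: Union[str, IO[bytes]]) -> List[List[str]]:
    """Return one list of activity names per trace, in file order (hypothetical helper)."""
    if isinstance(source, str) and source.endswith(".gz"):
        # Assumption: a gzip-compressed XES file is opened transparently.
        with gzip.open(source, "rb") as f:
            root = ET.parse(f).getroot()
    else:
        root = ET.parse(source).getroot()
    traces = []
    for trace in root:
        # Skip extension, global and classifier declarations; keep only traces.
        if _local(trace.tag) != "trace":
            continue
        # Trace-level attributes (such as the case ID) are ignored; only events are collected.
        traces.append([_activity_name(ev) for ev in trace if _local(ev.tag) == "event"])
    return traces


# Usage with a tiny, made-up log (not taken from the challenge data):
sample = b"""<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="submit"/></event>
    <event><string key="concept:name" value="review"/></event>
  </trace>
</log>"""
print(read_xes_traces(io.BytesIO(sample)))  # [['submit', 'review']]
```

Passing the path to your own downloaded copy instead of the in-memory sample should work the same way; a dedicated process mining tool will, of course, give you far more than a list of activity sequences.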
[1] As modern process mining software, Disco of course has no problem loading the XES file provided by the challenge organizers directly. However, if you use Disco in demo mode, this project file ensures that you can analyze the full data set.