
Data Suitability Checklist for Process Mining

Once you start looking for process mining data within your organization, you will be faced with data sets for which you need to determine whether they are suitable for process mining or not.

Perhaps you have found an existing report and want to see if that data extract is usable for your process mining project. Or you have requested a data set from your IT department and now need to judge whether it fulfills the requirements for a process mining analysis.

What exactly do you need to look for? Here is a checklist with the questions that you can go through to assess the suitability of your data. You can also download this PDF version to print it out and check off each point.

Checklist Data Suitability

  1. Structured data? Do you have data with columns and rows?

  2. Case ID, Activity, and Timestamp columns available? Do you have at least one column each for your case ID, your activity name, and your timestamp? A timestamp is not strictly needed if your data is already sorted (see point 5).

  3. Same case ID in multiple rows? Does the same case ID show up in more than one row at least sometimes? If each row has a unique case ID, your data is either not usable or you may need to reformat it.

  4. Different activities in the same case? Does the activity name change at least sometimes within the same case? If the activity field does not change over time, it does not contain the history and you need to look for another activity column.

  5. Different timestamps in the same case? Does the timestamp change at least sometimes within the same case? If the timestamp field does not change over time, it does not contain the history and cannot be used as your timestamp column. You can import your data without timestamps if it is already sorted.

  6. Date and time in one column? Are the date and the time portion of your timestamp placed in the same column? Because a data set can contain multiple timestamp columns, each timestamp needs its date and time together in a single column.

  7. Data in one file? If your data was distributed across multiple files (for example, because it comes from different IT systems), have you combined it into one file?

  8. Different timestamp patterns in separate columns? If you have timestamps with different timestamp patterns, are they placed in different columns?

  9. Activity names human-readable? Are your activity names understandable (not just a numeric value like an action code, or a transaction number)?

  10. Activity names generalized enough? Does the same activity in another case have the same activity label (not just a free-text field that is filled differently every time)?
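Points 2 to 5 can also be checked mechanically before you open the file in a process mining tool. The following is a minimal sketch using only the Python standard library. The column names `case_id`, `activity`, and `timestamp`, the sample data, and the function name `check_event_log` are illustrative assumptions, not part of Disco or of any particular data export.

```python
import csv
import io
from collections import defaultdict


def check_event_log(rows, case_col, activity_col, timestamp_col):
    """Answer checklist points 2 to 5 for an iterable of dict-like rows.

    The column names are passed in by the caller; nothing here is
    specific to a particular tool or export format.
    """
    rows = list(rows)
    columns = set(rows[0].keys()) if rows else set()

    # Point 2: are all three columns present?
    result = {"columns_present": {case_col, activity_col, timestamp_col} <= columns}
    if not result["columns_present"]:
        return result

    row_counts = defaultdict(int)
    activities = defaultdict(set)
    timestamps = defaultdict(set)
    for row in rows:
        case = row[case_col]
        row_counts[case] += 1
        activities[case].add(row[activity_col])
        timestamps[case].add(row[timestamp_col])

    # Point 3: does at least one case ID appear in more than one row?
    result["case_id_repeats"] = any(n > 1 for n in row_counts.values())
    # Point 4: does the activity change within at least one case?
    result["activity_changes_within_case"] = any(len(a) > 1 for a in activities.values())
    # Point 5: does the timestamp change within at least one case?
    result["timestamp_changes_within_case"] = any(len(t) > 1 for t in timestamps.values())
    return result


# Hypothetical sample data, used only to demonstrate the checks.
SAMPLE = """case_id,activity,timestamp
1,Create order,2024-01-05 09:00
1,Approve order,2024-01-05 11:30
2,Create order,2024-01-06 08:15
"""

report = check_event_log(
    csv.DictReader(io.StringIO(SAMPLE)), "case_id", "activity", "timestamp"
)
for question, answer in report.items():
    print(f"{question}: {'Yes' if answer else 'No'}")
```

A 'No' for any of these keys points to the corresponding checklist item. The sketch only tests whether values change "at least sometimes", as the checklist asks; it does not judge whether the chosen columns are the right ones for your process.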

Can you answer ‘Yes’ to all of the points above? Then you can import your data into Disco and continue by checking the quality of your data before starting the actual process mining analysis.
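If point 6 is the one you cannot answer with 'Yes' because date and time sit in separate columns, the fix is a small preprocessing step. Here is a sketch under stated assumptions: the column names `date` and `time`, the target column `timestamp`, and the helper `merge_date_time` are hypothetical and chosen only for illustration.

```python
import csv
import io


def merge_date_time(rows, date_col, time_col, target_col="timestamp"):
    """Return new rows in which date_col and time_col are joined into target_col."""
    merged = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        date_part = row.pop(date_col)
        time_part = row.pop(time_col)
        row[target_col] = f"{date_part} {time_part}"
        merged.append(row)
    return merged


# Hypothetical sample with the date and time in separate columns.
SAMPLE = """case_id,activity,date,time
1,Create order,2024-01-05,09:00
1,Approve order,2024-01-05,11:30
"""

rows = merge_date_time(csv.DictReader(io.StringIO(SAMPLE)), "date", "time")

# Write the merged rows back out as CSV text, ready to save to a file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The merged column keeps the original date and time text unchanged and simply joins them with a space, so the resulting timestamp pattern is the date pattern followed by the time pattern.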
