Data Quality Problems in Process Mining and What To Do About Them — Part 8: Different Clocks

This is the eighth article in our series on data quality problems for process mining. You can find an overview of all articles in the series here.

In previous articles we have seen how wrong timestamps can mess up everything in process mining: The process flows, the variants, and time measurements like case durations and waiting times in the process map.

One particularly tricky reason for timestamp errors is that the timestamps in your data set may have been recorded by multiple computers that run on different clocks. For example, in a case study at a security services company, operators used hand-held devices to log their actions when they arrived on-site, identified the problem, and so on. The clocks on these mobile devices sometimes differed from the server clock as well as from each other.

If you look at the scenario below you can see why that is a problem: Let’s say a new incident is reported at the headquarters at 1:30 PM. Five minutes later, at 1:35 PM, a mobile operator responds to the request and indicates that they will go to the location to fix it. However, because the clock on their mobile device is running 10 minutes behind, the recorded timestamp indicates 1:25 PM.

When you then combine all the different timestamps in your data set to perform a process mining analysis, you will actually see the response of the operator show up before the initial incident report. Not only does this create incorrect flows in your process map and variants, but when you try to measure the time between the raising of the incident and the first response it will actually give you a negative time.
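The arithmetic behind this negative duration can be sketched in a few lines of Python. This is only an illustration of the scenario: the calendar date and the variable names are made up, and only the clock times come from the example above.

```python
from datetime import datetime, timedelta

# Hypothetical date; only the clock times come from the scenario.
reported_at = datetime(2024, 1, 15, 13, 30)            # server clock: 1:30 PM
true_response_at = reported_at + timedelta(minutes=5)  # really happened at 1:35 PM
device_clock_error = timedelta(minutes=-10)            # device runs 10 minutes behind

# What ends up in the data set: the response as seen by the device clock.
recorded_response_at = true_response_at + device_clock_error  # 1:25 PM

measured = recorded_response_at - reported_at  # what the analysis sees
actual = true_response_at - reported_at        # what really happened

print(measured.total_seconds() / 60)  # -5.0
print(actual.total_seconds() / 60)    # 5.0
```

The measured response time is off by exactly the clock error of the device, which is why a consistent error can be corrected after the fact.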

Process mining scenario with different clocks

So, what can you do when you have data that has this problem?

First, investigate the problem to see whether the clock drift is consistent over time and which activities are affected. Then, you have the following options.

How to fix:

  1. If the clock difference is consistent enough, you can correct it in your source data. For example, in the scenario above you could add 10 minutes to the timestamps recorded by the mobile operator's device.

  2. If an overall correction is not possible, you can try to clean your data by removing cases that show up in the wrong order. Note that the Follower filter in Disco also allows you to remove cases where more or less than a specified amount of time has passed between two activities. This way, you can separate minor clock drift glitches (typically the differences are just a few seconds) from cases where two activities were indeed recorded with a significant time difference. Make sure that the remaining data set is still representative after the cleaning.

  3. If nothing helps, you might have to go back to your data collection system and set up a clock synchronization mechanism that constantly measures the time differences between the networked devices, so that correct timestamps are recorded as the data is collected.
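If you prepare your data in a script before importing it, the first two options could look roughly like the sketch below. This is only an illustration: the event log layout, the activity names, the device names, and the helper functions are all hypothetical, and the 10-minute correction for `device-A` is assumed to be known from your investigation. It is not how Disco implements its filters.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (case_id, activity, source, timestamp).
LOG = [
    ("c1", "Incident reported", "server", datetime(2024, 1, 15, 13, 30)),
    ("c1", "Operator responds", "device-A", datetime(2024, 1, 15, 13, 25)),
    ("c2", "Incident reported", "server", datetime(2024, 1, 15, 14, 0)),
    ("c2", "Operator responds", "device-B", datetime(2024, 1, 15, 14, 7)),
]

# Assumption: device-A is known to run a consistent 10 minutes behind.
CLOCK_CORRECTION = {"device-A": timedelta(minutes=10)}


def correct_timestamps(log, corrections):
    """Option 1: shift every timestamp by the known correction of its source."""
    return [
        (case, activity, source, ts + corrections.get(source, timedelta(0)))
        for case, activity, source, ts in log
    ]


def gap_per_case(log, first, second):
    """Time from `first` to `second` per case (first occurrence in log order)."""
    seen = {}
    for case, activity, _source, ts in log:
        if activity in (first, second):
            seen.setdefault(case, {}).setdefault(activity, ts)
    return {
        case: acts[second] - acts[first]
        for case, acts in seen.items()
        if first in acts and second in acts
    }


def drop_out_of_order(log, first, second):
    """Option 2: remove whole cases in which `second` is recorded before `first`."""
    gaps = gap_per_case(log, first, second)
    bad_cases = {case for case, gap in gaps.items() if gap < timedelta(0)}
    return [event for event in log if event[0] not in bad_cases]


A, B = "Incident reported", "Operator responds"
gaps_before = gap_per_case(LOG, A, B)
gaps_after = gap_per_case(correct_timestamps(LOG, CLOCK_CORRECTION), A, B)
cleaned = drop_out_of_order(LOG, A, B)

print(gaps_before["c1"], "->", gaps_after["c1"])
print(len({e[0] for e in cleaned}), "of", len({e[0] for e in LOG}), "cases kept")
```

Looking at `gaps_before` per device is also a simple way to check whether the clock difference is consistent over time. The count of remaining cases after `drop_out_of_order` tells you how much data the cleaning costs, which helps you judge whether the remaining data set is still representative.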
