Privacy, Security and Ethics in Process Mining — Part 3: Anonymization
This is the third article in our series on privacy, security and ethics in process mining. You can find an overview of all articles in the series here.
If your data set contains sensitive information, you can consider anonymization techniques instead of removing that information. When you anonymize a set of values, the actual values (for example, the employee names “Mary Jones”, “Fred Smith”, etc.) are replaced by other values (for example, “Resource 1”, “Resource 2”, etc.).
If the same original value appears multiple times in the data set, it is always replaced with the same replacement value (“Mary Jones” will always be replaced by “Resource 1”). This way, anonymization obfuscates the original data while preserving the patterns in the data set for your analysis. For example, you will still be able to analyze the workload distribution across all employees without seeing the actual names.
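To make the idea of a consistent replacement concrete, here is a minimal sketch in Python. It is only an illustration of the principle and not how any particular process mining tool implements it. The function name `anonymize` and the sample values are assumptions made up for this example.

```python
from collections import Counter


def anonymize(values, prefix="Resource"):
    """Replace each distinct value with a numbered placeholder.

    Hypothetical helper for illustration: the same original value
    always receives the same replacement value.
    """
    mapping = {}  # original value -> replacement value
    result = []
    for value in values:
        if value not in mapping:
            # First time we see this value: assign the next free number.
            mapping[value] = f"{prefix} {len(mapping) + 1}"
        result.append(mapping[value])
    return result, mapping


# Invented sample data: one resource entry per event.
resources = ["Mary Jones", "Fred Smith", "Mary Jones", "Mary Jones", "Fred Smith"]
anonymized, mapping = anonymize(resources)

# The workload distribution is preserved, only the names are gone.
workload = Counter(anonymized)
print(anonymized)
print(workload)
```

Note that the `mapping` dictionary allows anyone who holds it to translate the replacement values back to the original names, so in this sketch it would have to be discarded or stored securely.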
Some process mining tools (Disco and ProM) include anonymization functionality. This means that you can import your data into the process mining tool and select which data fields should be anonymized. For example, you can choose to anonymize just the Case IDs, the resource names, attribute values, or the timestamps. You can then export the anonymized data set and distribute it among your team for further analysis.
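If you want to see what selecting individual fields for anonymization could look like outside of a tool, the following Python sketch applies a consistent replacement to chosen columns of a small CSV event log and writes the result back out. This is a simplified stand-in and not the Disco or ProM implementation. The column names, the sample rows and the helper `anonymize_columns` are all assumptions for this example, and timestamps are left untouched here.

```python
import csv
import io


def anonymize_columns(rows, columns):
    """Anonymize only the selected columns of an event log.

    rows:    list of dicts, one per event
    columns: dict mapping a column name to the placeholder prefix to use
    Hypothetical helper for illustration only.
    """
    # One separate mapping per column, so "Case 1" and "Resource 1" are independent.
    mappings = {column: {} for column in columns}
    anonymized = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        for column, prefix in columns.items():
            mapping = mappings[column]
            original = row[column]
            if original not in mapping:
                mapping[original] = f"{prefix} {len(mapping) + 1}"
            new_row[column] = mapping[original]
        anonymized.append(new_row)
    return anonymized


# Invented sample event log.
RAW_CSV = """case_id,activity,resource,timestamp
4711,Register claim,Mary Jones,2024-01-03 09:15
4711,Check claim,Fred Smith,2024-01-03 11:40
4712,Register claim,Mary Jones,2024-01-04 08:05
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Select which fields to anonymize: here the case IDs and the resource names.
anonymized_rows = anonymize_columns(rows, {"case_id": "Case", "resource": "Resource"})

# Export the anonymized data set as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(anonymized_rows)
exported_csv = buffer.getvalue()
print(exported_csv)
```

Keeping a separate mapping per column means that activities and timestamps remain readable for the analysis, while the case IDs and resource names can no longer be read directly from the exported file.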
Determine which data fields are sensitive and need to be anonymized (see also the list of common process mining