Business Management Presentations Process Analysis Process Modeling

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Transcript

On the Role of Fitness,Precision, Generalization andSimplicity in ProcessDiscoveryJoos BuijsBoudewijn van DongenWil van der Aalst http://www.win.tue.nl/coselog/ Advances in Process Mining• Many process discovery and conformance checking algorithms and tools are available (cf. the various ProM packages).• Also commercial software based on these ideas: Disco (Fluxicon), Reflect (Futura/Perceptive), BPMOne (Pallas Athena/Perceptive), ARIS Process Performance Manager (Software AG), Interstage Automated Process Discovery (Fujitsu), QPR ProcessAnalyzer/Analysis (QPR Software), flow (fourspark), Discovery Analyst (StereoLOGIC), etc.• We applied process mining in over 100 organizations. More than 75 people involving more than 50 organizations created the Process Mining Manifesto in the context of the IEEE Task Force on Process Mining. Available in 13 languages PAGE 1 Example Process Discovery(Vestia, Dutch housing agency, 208 cases, 5987 events) PAGE 2 Example: Conformance Checking(WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988) PAGE 3 Challenge:Four Competing Quality Criteria “able to replay event log” “Occam’s razor”replay fitness simplicity process discoverygeneralization precision “not overfitting the log” “not underfitting the log” PAGE 4 Example: one log four models b examine thoroughly g pay c compensation a examine e start register casually decide end # trace request h 455 acdeh d reject check ticket request 191 abdeg f reinitiate request 177 adceh N1 : fitness = +, precision = +, generalization = +, simplicity = + 144 abdeh 111 acdeg a c d e h 82 adceg start register examine check decide reject end request casually ticket request 56 adbeh N2 : fitness = -, precision = +, generalization = -, simplicity = + 47 acdefdbeh “able to replay event log” “Occam’s razor” 38 adbeg examine check thoroughly b d ticket g 33 acdefbdehreplay fitness simplicity pay compensation a 14 acdefbdeg start register examine c end 11 acdefdbeg request casually e f reinitiate h process decide request reject request 9 adcefcdeh discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh 5 adcefbdeg a d c e g 3 acdefbdefdbeggeneralization precision register request check ticket examine casually decide pay compensation 2 adcefdbeg a c d e g 2 adcefbdefbdeg “not overfitting the log” “not underfitting the log” register examine check decide pay request casually ticket compensation 1 adcefdbefbdeh a d c e h 1 adbefbdefdbeg register check examine decide reject request ticket casually request 1 adcefdbefcdefdbeg a c d e h 1391 start end register examine check decide reject request casually ticket request … (all 21 variants seen in the log) a b d e g register examine check decide pay request thoroughly ticket compensation a d b e h register check examine decide reject request ticket thoroughly request a b d e h register examine check decide reject request thoroughly ticket request PAGE 5 N4 : fitness = +, precision = +, generalization = -, simplicity = – # trace 455 acdeh Model N1 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh b 47 acdefdbeh examine thoroughly 38 adbeg g 33 acdefbdeh pay c compensation 14 acdefbdeg a examine e 11 acdefdbegstart register casually decide end request 9 adcefcdeh h d reject 8 adcefdbeh check ticket request 5 adcefbdeg f reinitiate 3 acdefbdefdbeg requestN1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 6 1391 # trace 455 acdeh Model N2 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh 38 adbeg a c d e h 33 acdefbdehstart register examine check decide reject end 14 acdefbdeg request casually ticket request N2 : fitness = -, precision = +, generalization = -, simplicity = + 11 acdefdbeg 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 7 1391 # trace 455 acdeh Model N3 191 abdeg 177 adceh 144 abdeh 111 acdeg 82 adceg 56 adbeh 47 acdefdbeh examine check thoroughly b d ticket g 38 adbeg pay 33 acdefbdeh compensation a 14 acdefbdegstart register examine end 11 acdefdbeg request casually c e f reinitiate reject h 9 adcefcdeh decide request request 8 adcefdbeh N3 : fitness = +, precision = -, generalization = +, simplicity = + 5 adcefbdeg 3 acdefbdefdbeg 2 adcefdbeg 2 adcefbdefbdeg 1 adcefdbefbdeh 1 adbefbdefdbeg 1 adcefdbefcdefdbeg PAGE 8 1391 # trace 455 acdehModel N4 191 abdeg 177 adceh 144 abdeh a d c e g 111 acdeg register check examine decide pay request ticket casually compensation 82 adceg a c d e g 56 adbeh register examine check decide pay request casually ticket compensation 47 acdefdbeh a d c e h 38 adbeg register check examine decide reject request ticket casually request 33 acdefbdeh a c d e h 14 acdefbdegstart end register examine check decide reject request casually ticket request 11 acdefdbeg … (all 21 variants seen in the log) 9 adcefcdeh 8 adcefdbeh 5 adcefbdeg a b d e g register examine check decide pay 3 acdefbdefdbeg request thoroughly ticket compensation 2 adcefdbeg a d b e h register check examine decide reject 2 adcefbdefbdeg request ticket thoroughly request 1 adcefdbefbdeh a b d e h register examine check decide reject 1 adbefbdefdbeg request thoroughly ticket request 1 adcefdbefcdefdbeg N4 : fitness = +, precision = +, generalization = -, simplicity = – PAGE 9 1391 Another challenge: Huge search spaceM M M M M M M M M M M M M M M M M M M M MM M M M M M M M M M MM M M M M M M MM M M M M M M M M MM M M M M M M M M M M M M M M M M M M MMM M MM M M MM M M M M M M MM M MM M MM M MM MMMM M M M M M M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M MM M M M M M M M M M M M M M M M M M MM M M M M M M M M MM M M M MM M M M M M M M MM M M M M MM M M M M M M MM M M M M M M M M M M M MM M MM M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM MM M M M M M M M M M M M M M M MM M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M MM M M M M M M MM M M M MM M MM M M M M MM M M MM M MM M M M M MM M M M M PAGE 10 … with just a few interesting candidatesM M M M M M M M M M M M M M M M M M M M MM M M M M M M M M M M MM M M M M M MM M M M M M M M M MM M M M M M M M M M M M M M M MM M M M M M M MMM M MM M M M M M M MM M M M M MM M MM M MM MM M M M MMM M M M MM M M M M M M M M MM M M MM M M M M M M M M M M M M M M M M M MM M M M M M M MM M M M M M M M M M M M MM MM M M M M M MM M M M M M M M MM M M M M MM M M M M M MM M M M M M M M M M M M M MM M M M M M M M MM M M M MM M M MM M M M M MM M MM MM M M M M M M M M MM M M M M M M M M MM M M M M M M M M M M M M M MM M M M M M M M MM M M M M MM M M M M M M M M M M MM M M M M M M M M MM M M M M M MM M M M M M M M M M MM M M M MM M M M M M MM M MM M M M M M PAGE 11 Two requirements1. It should be possible to “able to replay event log” “Occam’s razor” replay fitness simplicity seamlessly balance the different quality criteria process discovery based on user-defined generalization precision preferences. “not overfitting the log” “not underfitting the log”2. The algorithm should always return a “correct” process model and not waste time on model having deadlocks and other anomalies. PAGE 12 Proposal: Evolutionary Tree Miner (ETM)• Process trees as representation (= limit search space to “good” models).• Genetic approach (= very flexible)• Fitness function uses all four criteria (= seamlessly balance the different “forces”) PAGE 13 Representational Bias: Process Trees → • Always sound because of the block structure A A → • Also Loop and OR operator ∧ E B C D X D A B B C A (BA)* E PAGE 14 Petri Net Semantics(used for comparison and conformance checking only) A A BSequence Parallellism B A B AExclusive Choice A B B Or ChoiceLoop PAGE 15 Steps of the Genetic ETM Algorithm Change Population No Create Return Measure Stop Initial Best Quality ? YesPopulation Individual PAGE 16 Population Change ElitePopulation i Population i+1 Selection Replace Crossover Mutation PAGE 17 Four Metrics (see paper) “able to replay event log” “Occam’s razor” replay fitness simplicity process discovery generalization precision “not overfitting the log” “not underfitting the log” 1 = optimal 0 = very bad PAGE 18 Example B E A C G D F A = send e-mail, B = check credit, C = calculate capacity, D = check system, E = accept, F = reject, G = send e-mail PAGE 19 Conventional Algorithms (1/3) (“best effort” mapping to process trees to allow for comparison)alpha miner low fitnessILP miner low precisionlanguage-based region miner low fitness PAGE 20 Conventional Algorithms (2/3)heuristic minermulti-phase miner PAGE 21 Conventional Algorithms (3/3)genetic minerstate-based region miner PAGE 22 Often unsound result and no mechanismto seamlessly balance the four criteria PAGE 23 Genetic Mining (ETM) While Considering Only One Criterion best value possible for this logETM with weight zero to three out of four perspectives. PAGE 24 Considering Replay Fitness and One Other CriterionETM with weight zero to two out of four perspectives. PAGE 25 Considering 3 of 4 Criteria replay fitness needs to have a larger weight PAGE 26 Considering All Four Criteria withEmphasis on Fitness fitness has weight 10 PAGE 27 Initial Model Versus Discovered Model simulated discovered by ETM B EA C G D F PAGE 28 Real-Life Event Logs• Event log L0 is the event log used before. L0 contains 100 traces, 590 events and 7 activities.• Event Log L1 contains 105 traces, 743 events in total, with 6 different activities.• Event Log L2 contains 444 traces, 3.269 events in total, with 6 different activities.• Event Log L3 contains 274 traces, 1:582 events in total, with 6 different activities.Event logs L1, L2 and L3 are extracted fromthe information systems of municipalitiesparticipating in the CoSeLoG project(http://www.win.tue.nl/coselog/). PAGE 29 Results If unsound , the sound behavior is approximated when creating the process tree. Equal weights for all criteria. PAGE 30 Conclusion “able to replay event log” “Occam’s razor”replay fitness simplicity process discoverygeneralization precision “not overfitting the log” “not underfitting the log” • First algorithm that allows for balancing all four perspectives. • Genetic algorithm is very flexible, but also very slow. • Process trees only used internally (choose your favorite representation) • Future work: − Improve speed − Distribute PM tasks − Discover configurable process trees PAGE 34 www.processmining.org www.win.tue.nl/ieeetfpm/ PAGE 35

Leave a Comment

Get the BPI Web Feed

Using the HTML code below, you can display this Business Process Incubator page content with the current filter and sorting inside your web site for FREE.

Copy/Paste this code in your website html code:

<iframe src="https://www.businessprocessincubator.com/content/on-the-role-of-fitness-precision-generalization-and-simplicity-in-process-discovery/?feed=html" frameborder="0" scrolling="auto" width="100%" height="700">

Customizing your BPI Web Feed

You can click on the Get the BPI Web Feed link on any of our page to create the best possible feed for your site. Here are a few tips to customize your BPI Web Feed.

Customizing the Content Filter
On any page, you can add filter criteria using the MORE FILTERS interface:

Customizing the Content Filter

Customizing the Content Sorting
Clicking on the sorting options will also change the way your BPI Web Feed will be ordered on your site:

Get the BPI Web Feed

Some integration examples

BPMN.org

XPDL.org

×