Slice, Mine, Dice: Complexity-Aware Automated Discovery of Business Process Models
Description
Presentation at the 11th International Conference on Business Process Management (BPM’2013), Beijing, China, 27 August 2013. Winner of the best paper award.
Transcript
1
Event log
Discovering “all-in-one” process models…
Process discovery
algorithms
2
a b a b c d e c a f g h
a b a b k d e c h f g h
b c p q p r a k q r s
b c p h p r a k q r s
a x p h y z t t u
Trace
clustering
a b a b c d e c a f g h
a b a b k d e c h f g h
b c p q p r a k q r s
b c p h p r a k q r s
a x p h y z t t u
Cluster 1
Cluster 2
Noise
Event log
Process variant 1
Process variant 2
Trace clustering
3
Process
discovery
Trace
clustering
Event log
Complexity
threshold
e.g. Size ≤ 30
Process model
Slice, Mine and Dice (SMD)
>?
4
1) Slice
2) Mine
3) Dice
Process
discovery
Trace
clustering
Event log
Complexity
threshold
e.g. Size ≤ 30
Slice, Mine and Dice (SMD)
>?
Process models
5
Process
discovery
Trace
clustering
Event log
Complexity
threshold
e.g. Size ≤ 30
Slice, Mine and Dice (SMD)
>?
Process models
6
Process model M3
Process model M2
A closer look…
7
F10
F11 F13
F14F12
M2 RPST of M2
F20
F21 F22
F24F23
F25
RPST of M3
Refined Process Structure Tree (RPST)
J. Vanhatalo, H. Volzer, J. Koehler: The Refined Process Structure Tree. Data Knowl. Eng., 2009
M3
8
M2
M3
F14 F25
M. Dumas, L. García-Bañuelos, M. La Rosa, and R. Uba. Fast Detection of Exact Clones in Business Process Model
Repositories. Information Systems, 2013
RPSDAG
F10
F11 F13
F14F12
RPST of M2
F20
F21 F22
F24F23
F25
RPST of M3
9
F14
M2
M3
S3
+
S3
+
Extracting exact clones
S3
Exact clones:
• SESE
• Non-trivial
• Identical
10
F12
F22M3
+
S3
+
S3
?
?
Extracting approximate clones
C. C. Ekanayake, M. Dumas, L. García-Bañuelos, M. La Rosa, and A. H. M. ter Hofstede. Approximate Clone
Detection in Repositories of Business Process Models, BPM 2012
Appr. clones:
• SESE
• Non-trivial
• Similar
• Unrelated
11
+
+
M2
Merging
algorithm
Fragment F12 of model M2
Fragment F22 of model M3
Configurable
gateway
Configurable
label
M. La Rosa, M. Dumas, R. Uba, and R. M. Dijkman. Business Process Model Merging: An Approach to Business
Process Consolidation. ACM TOSEM, 2013.
Merging approximate clones
S4
12
S4S3
+
The result…
13
+
+
+
Trace clustering
• M. Song, C.W. Gunther, and W.M.P. van der Aalst, Improving Process Mining
with Trace Clustering, J. Korean Inst. of Industrial Engineers 34(4), 2008
• R.P.J.C. Bose, W.M.P. van der Aalst, Trace Clustering Based on Conserved
Patterns: Towards Achieving Better Process Models, BPM 2009 Workshops
• A.K.A. de Medeiros, A. Guzzo, G. Greco, W.M.P. van der Aalst, A.J.M.M.
Weijters, B.F. van Dongen, D. Saccà. Process Mining Based on Clustering: A
Quest for Precision, BPM Workshops 2007
Discovery
• A.J.M.M. Weijters, J.T.S. Ribeiro. Flexible Heuristics Miner (FHM), CIDM,
2011.
Evaluation setup
Log Traces Events
Event
classes
Duplication
ratio
Motor 4,293 33,202 292 114
Commercial 4,852 54,134 81 668
BPI 2012 5,312 91,949 36 2,554
14
Evaluation – repository size and models number
S: Song et al.
B: Bose et al.
M: Medeiros et al.
• up to 64% reduction in repository size
• up to 66% reduction in # of top level process models
• up to 120 sub-processes extracted
Motor Comm BPI Motor Comm BPI
14%
22%
66%
15
64%
Evaluation – individual model complexity
Log Method
Size
CFC ACD Density CNC
Avg Min Max Savings (%)
Motor
S 22.75 4 37
22.8
12.07 2.71 0.07 1.26
S + SMD 17.57 4 37 10.07 2.34 0.11 1.21
B 20.01 4 37
9.8
9.97 2.51 0.08 1.2
B + SMD 18.04 4 37 10.05 2.38 0.11 1.2
M 15.73 3 49
-1.1
7.36 2.14 0.11 1.12
M + SMD 15.9 4 49 8.34 2.12 0.12 1.14
Commercial
S 24.07 6 34
22.4
13.65 2.96 0.06 1.32
S + SMD 18.67 2 34 11.34 2.49 0.1 1.24
B 21.11 2 34
20.3
11.04 2.65 0.07 1.23
B+ SMD 16.82 2 34 9.73 2.29 0.12 1.18
M 18.86 2 40
11.1
10.18 2.47 0.09 1.22
M + SMD 16.76 2 34 9.71 2.38 0.11 1.21
BPI
S 47.32 15 56
29.7
20.77 2.34 0.03 1.24
S + SMD 33.27 4 56 20.18 2.41 0.07 1.28
B 46.54 13 56
30.6
20.48 2.35 0.03 1.23
B + SMD 32.3 4 56 19.29 2.33 0.07 1.27
M 46.48 21 61
18.9
21.16 2.34 0.03 1.24
M + SMD 37.71 7 56 25.29 2.38 0.04 1.3
16
+ Seeks to discover process models that meet user-specified
complexity thresholds
+ Reduces repository size and top level models number
compared to trace clustering techniques
+ Preserves fitness, appropriateness & generalization of
models discovered from trace clusters
+ Little impact on structural complexity of process models
– Performance overhead
The SMD technique
17
Optimizations
• Parallelization of divisive trace clustering, discovery and
GED computation
• Incremental divisive trace clustering
• Sub-polygons and sub-bonds extraction
Complexity thresholds tuning
In the future…
18
Leave a Comment
You must be logged in to post a comment.