Presentations Process Modeling

Mining Branch-Time Scenarios From Execution Logs

Description

This presentation was given at the International Conference on Automated Software Engineering (ASE 2013) in Palo Alto, November 2013.

We describe a technique for automatically extracting specifications from execution traces of an application. The particular specification that we extract are scenarios in the form of conditional existential Live-Sequence Charts (LSC), which are similar to UML Sequence Diagrams.

The technique is implemented in a tool and was evaluated on two real-life event logs.

Transcript

Dirk Fahland
David Lo
Shahar Maoz

Mining
Branching-Time Scenarios

Eindhoven University of Technology
Singapore
Management
University

Tel Aviv University

Dirk Fahland
David Lo
Shahar Maoz

Mining
Branching-Time Scenarios

Dirk Fahland
David Lo
Shahar Maoz

Mining
Branching-Time Scenarios

Understanding Existing Applications

fix bugs, add
features, test,
document, …

specification

understanding of objects and
object interplay

3

Understanding Existing Applications

fix bugs, add
features, test,
document, …

specification

Usually, there is no specification
or worse, if it exists, it is
outdated…
understanding of objects and
object interplay

4

Understanding Existing Applications

fix bugs, add
features, test,
document, …

specification

?
?

?
?

source code

understanding of objects and
object interplay

5

Understanding Existing Applications
Understanding applications from source
code is
• laborious
• fix bugs, add
time-consuming
specification
•features, test,
error prone
document, …

?
?

?
?

source code

understanding of objects and
object interplay

6

Specification Mining

automatically
extract

specification

?
?

?
?

source code

understanding of objects and
object interplay

7

Specification Mining from Event Logs

log

automatically
extract
specification

?
?

?
?

source code

understanding of objects and
object interplay

8

This talk

log

automatically
extract
specification

?
?

?
?

source code

understanding of objects and
object interplay

9

This talk

log

automatically
extract
specification

?
What is a good specification
language to get an overview of
?
?
how an application works?
?
source code

understanding of objects and
object interplay

10

Understanding Object Interplay
FTP server: How does Login/Logout work?
In object-oriented applications,
the hard part is to understand
how the different objects relate
to and interact with each other.
Here are some classes of an
FTP server. How does
login/logout work?

11

Understanding Object Interplay
An object of class A invokes
method onConnect() on an Scenario: Login/Logout
object of class B.
A

UserCmd

B

C

C
onConnect()

UserCmd

A

B

12

Understanding Object Interplay
Scenario: Login/Logout
A

UserCmd

B

C

C
onConnect()
onLogin()

setUser()
UserCmd
setLogout()

A

B

13

Understanding Object Interplay
Scenario: Login/Logout
A

UserCmd

B

C

C
onConnect()
onLogin()

setUser()
UserCmd
setLogout()

A

• non-local behavior:
multiple objects
• logically related
B

14

When does it happen?
Scenario: Login/Logout

This scenario tells us
which objects and
methods are involved in
login/logout.

A

UserCmd

B

C

onConnect()
onLogin()

setUser()

But it does not tell when
this scenario occurs in
the application.

setLogout()

15

When does it happen?
Scenario: Login/Logout
A

whenever the prechart
happens…
eventually the mainchart
will happen

UserCmd

B

C

onConnect()
onLogin()

setUser()
setLogout()

16

Linear-Time LSCs – Invariants
pre
Login
Login

There can be other behaviors
between the behaviors shown in
the LSC.
2 runs

17

Linear-Time LSCs – Invariants
pre
Login
Login

This run does not continue with
the complete main chart of the
LSC.
2 runs

18

Understanding Everything

FTP
download

FTP
delete

not everything is an invariant
 alternative behaviors

FTP
rename

2 alternative runs

19

Linear-Time is Insufficient
scenario for FTP delete command

This LSC for the delete command
does not hold in every run.
2 runs

20

Branching Time

We merge all runs on their joint
prefixes into an execution tree.
execution tree

21

Branching-Time LSCs
[Sibay, Uchitel, Braberman ICSE 2008]

whenever the prechart happens

there exists a branch
where the mainchart happens
execution tree

22

Describing Alternatives
We can define an LSC for the download
command, that is alternative to delete.

execution tree

23

Describing Alternatives
… and also an LSC for the Rename
command.

execution tree

24

Describing Alternatives

execution tree

25

Describing Alternatives

execution tree

26

LSC Mining from Event Logs

log

automatically
extract
complete set of LSCs
(linear / branching)
We want to discover a set of
LSCs that can describe all
behaviors (or as much as
possible of the behaviors).
understanding of objects and
object interplay

27

Logs

log

automatically
extract
log method calls

complete set of LSCs
(linear / branching)

caller1, callee1, method1(…)
caller2, callee2, method2(…)

Each execution of the application gives
one trace. Run application multiple
times for a log.
28

Desired Outcome
log
log
=
tree

automatically
extract
complete set of LSCs
(occuring at least s times
= support)

29

Mining Algorithm
github.com/scenario-based-tools/sam/

tree

variant of [Lo, Maoz, Khoo ASE 2007]

30

Mining Algorithm
github.com/scenario-based-tools/sam/

tree

1. enumerate all sequences of events
occurring ≥ s times

onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words

Starting from sequences of
length 1, recursively append
events and check if it occurs
often enough.
Efficient implementation
uses branch and bound and
some heuristics.
31

Mining Algorithm
github.com/scenario-based-tools/sam/

tree

1. enumerate all sequences of events
occurring ≥ s times

onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words

From LSC with pre-chart
length 1 to LSC with mainchart length 1.

2. generate all
candidate LSCs

onConnect()
onLogin()
setLogin()
setLogout()

onConnect()
onLogin()
setLogin()
setLogout()
32

Mining Algorithm
github.com/scenario-based-tools/sam/

tree

1. enumerate all sequences of events
occurring ≥ s times

onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words

3. test for each LSC
if satisfied ≥ c%

2. generate all
candidate LSCs

onConnect()
onLogin()
setLogin()
setLogout()

onConnect()
onLogin()
setLogin()
setLogout()
33

Mining Algorithm
github.com/scenario-based-tools/sam/

tree

1. enumerate all sequences of events
occurring ≥ s times

onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words

3. test for each LSC
if satisfied ≥ c%

2. generate all
candidate LSCs

onConnect()
onLogin()
setLogin()
setLogout()

onConnect()
onLogin()
setLogin()
setLogout()
34

LSC Mining from Event Logs

log

automatically
extract
complete set of LSCs
(linear and branching)
What do branching
scenarios add to
specification mining?
understanding of objects and
object interplay

35

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

Branching LSC contain the linear LSC and some more
strictly branching LSC that were not found before.

Branching LSC are less frequent (lower support
threshold).
36

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

Branching LSC can explore more events of the log than
just Linear LSC.

37

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

Branching LSC are longer than Linear LSC. In other
words, they show more details for a particular behavior.

38

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

Running times for extraction are feasible.
Note that LSCs shown here are the LSCs left after
removing subsumed ones. Originally, the algorithm finds
around 6 million branching LSC in 685 seconds.
39

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

 Columba mail client: 104 traces, 79 event types
s/c

Linear
LSC

covered
events

avg.
length

time
[s]

20

57

70%

4

159

10

205

72%

6

10/.5

163

78%

6

Branching
LSC

covered
events

avg.
length

time
[s]

57+1

71%

9

154

2191

205+53

75%

9

2055

2256

163+44

84%

6

2125

full data sets and results:
http://dx.doi.org/10.4121/uuid:aa7db920-aae6-4750-8975-cb739262f432

40

Experiments
 CrossFTP server: 54 traces, 50 event types
s

Linear
LSC

covered
events

avg.
length

time
[s]

Branching
LSC

covered
events

avg.
length

time
[s]

20

7

90%

7

3

7+0

90%

7

3

14

9

90%

5

31

9+12

95%

13

26

10

9

90%

5

1008

9+18

95%

18

685

 Columba mail client: 104 traces, 79 event types
s/c

Linear
LSC

covered
events

avg.
length

time
[s]

20

57

70%

4

159

10

205

72%

6

10/.5

163

78%

6

Branching
LSC

covered
events

avg.
length

time
[s]

57+1

71%

9

154

2191

205+53

75%

9

2055

2256

163+44

84%

6

2125

full data sets and results:
http://dx.doi.org/10.4121/uuid:aa7db920-aae6-4750-8975-cb739262f432

41

Linear vs. Branching: CrossFTP
What is the qualitative contribution
of branching LSC to specification
mining?
application life-cycle
from end to end

connect

login
logout
clean up

42

Linear vs. Branching: CrossFTP

application life-cycle
from end to end

short invariants of
individual FTP
commands

invariant of RENAME

43

Linear vs. Branching: CrossFTP

application life-cycle
from end to end FTP command
+
where triggered
short invariants of
individual FTP
commands

The branching LSC fills
the gap between large
and small invariants.

login
rename command

44

Linear vs. Branching: CrossFTP

application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation
short invariants of
individual FTP
commands

We found all ftp commands
supported by the server, as
alternative LSC.
45

Linear vs. Branching: CrossFTP

application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation

… and we could discover
cyclic behavior: after
rename, there could be
another delete command

short invariants of
individual FTP
commands

cycles: rename  delete
46

Take Home Points

http://github.com/scenario-based-tools/sam/

log

• mining branching scenarios
 alternatives, cycles
• combined with linear:
comprehensive specification

• future work:
visualizing results
distributed scenarios

complete set of LSCs

understanding of objects and
object interplay

47

about.me/dirk.fahland
@dfahland

Mining
Branching-Time Scenarios

Q&A …is branching time really necessary?
Yes, here is a linear LSC showing a disjunction
for continuing after the pre-chart.
if
then
delete
or

download

49

The full execution tree satisfies this Linear LSC with disjunction
Branching Time vs. Disjunction
and two branching LSCs describing the two alternatives in
separate LSCs.

50

Branching Time vs. Disjunction

Removing one branch
from the tree (the
execution of the
download command),
violates the branching
LSCs, but still satisfies
the disjunctive linear
LSCs (because only
one of them has to
hold).
51

Leave a Comment

Get the BPI Web Feed

Using the HTML code below, you can display this Business Process Incubator page content with the current filter and sorting inside your web site for FREE.

Copy/Paste this code in your website html code:

<iframe src="https://www.businessprocessincubator.com/content/mining-branch-time-scenarios-from-execution-logs/?feed=html" frameborder="0" scrolling="auto" width="100%" height="700">

Customizing your BPI Web Feed

You can click on the Get the BPI Web Feed link on any of our page to create the best possible feed for your site. Here are a few tips to customize your BPI Web Feed.

Customizing the Content Filter
On any page, you can add filter criteria using the MORE FILTERS interface:

Customizing the Content Filter

Customizing the Content Sorting
Clicking on the sorting options will also change the way your BPI Web Feed will be ordered on your site:

Get the BPI Web Feed

Some integration examples

BPMN.org

XPDL.org

×