Mining Branch-Time Scenarios From Execution Logs
Description
This presentation was given at the International Conference on Automated Software Engineering (ASE 2013) in Palo Alto, November 2013.
We describe a technique for automatically extracting specifications from execution traces of an application. The particular specification that we extract are scenarios in the form of conditional existential Live-Sequence Charts (LSC), which are similar to UML Sequence Diagrams.
The technique is implemented in a tool and was evaluated on two real-life event logs.
Transcript
Dirk Fahland
David Lo
Shahar Maoz
Mining
Branching-Time Scenarios
Eindhoven University of Technology
Singapore
Management
University
Tel Aviv University
Dirk Fahland
David Lo
Shahar Maoz
Mining
Branching-Time Scenarios
Dirk Fahland
David Lo
Shahar Maoz
Mining
Branching-Time Scenarios
Understanding Existing Applications
fix bugs, add
features, test,
document, …
specification
understanding of objects and
object interplay
3
Understanding Existing Applications
fix bugs, add
features, test,
document, …
specification
Usually, there is no specification
or worse, if it exists, it is
outdated…
understanding of objects and
object interplay
4
Understanding Existing Applications
fix bugs, add
features, test,
document, …
specification
?
?
?
?
source code
understanding of objects and
object interplay
5
Understanding Existing Applications
Understanding applications from source
code is
• laborious
• fix bugs, add
time-consuming
specification
•features, test,
error prone
document, …
?
?
?
?
source code
understanding of objects and
object interplay
6
Specification Mining
automatically
extract
specification
?
?
?
?
source code
understanding of objects and
object interplay
7
Specification Mining from Event Logs
log
automatically
extract
specification
?
?
?
?
source code
understanding of objects and
object interplay
8
This talk
log
automatically
extract
specification
?
?
?
?
source code
understanding of objects and
object interplay
9
This talk
log
automatically
extract
specification
?
What is a good specification
language to get an overview of
?
?
how an application works?
?
source code
understanding of objects and
object interplay
10
Understanding Object Interplay
FTP server: How does Login/Logout work?
In object-oriented applications,
the hard part is to understand
how the different objects relate
to and interact with each other.
Here are some classes of an
FTP server. How does
login/logout work?
11
Understanding Object Interplay
An object of class A invokes
method onConnect() on an Scenario: Login/Logout
object of class B.
A
UserCmd
B
C
C
onConnect()
UserCmd
A
B
12
Understanding Object Interplay
Scenario: Login/Logout
A
UserCmd
B
C
C
onConnect()
onLogin()
setUser()
UserCmd
setLogout()
A
B
13
Understanding Object Interplay
Scenario: Login/Logout
A
UserCmd
B
C
C
onConnect()
onLogin()
setUser()
UserCmd
setLogout()
A
• non-local behavior:
multiple objects
• logically related
B
14
When does it happen?
Scenario: Login/Logout
This scenario tells us
which objects and
methods are involved in
login/logout.
A
UserCmd
B
C
onConnect()
onLogin()
setUser()
But it does not tell when
this scenario occurs in
the application.
setLogout()
15
When does it happen?
Scenario: Login/Logout
A
whenever the prechart
happens…
eventually the mainchart
will happen
UserCmd
B
C
onConnect()
onLogin()
setUser()
setLogout()
16
Linear-Time LSCs – Invariants
pre
Login
Login
There can be other behaviors
between the behaviors shown in
the LSC.
2 runs
17
Linear-Time LSCs – Invariants
pre
Login
Login
This run does not continue with
the complete main chart of the
LSC.
2 runs
18
Understanding Everything
FTP
download
FTP
delete
not everything is an invariant
alternative behaviors
FTP
rename
2 alternative runs
19
Linear-Time is Insufficient
scenario for FTP delete command
This LSC for the delete command
does not hold in every run.
2 runs
20
Branching Time
We merge all runs on their joint
prefixes into an execution tree.
execution tree
21
Branching-Time LSCs
[Sibay, Uchitel, Braberman ICSE 2008]
whenever the prechart happens
there exists a branch
where the mainchart happens
execution tree
22
Describing Alternatives
We can define an LSC for the download
command, that is alternative to delete.
execution tree
23
Describing Alternatives
… and also an LSC for the Rename
command.
execution tree
24
Describing Alternatives
execution tree
25
Describing Alternatives
execution tree
26
LSC Mining from Event Logs
log
automatically
extract
complete set of LSCs
(linear / branching)
We want to discover a set of
LSCs that can describe all
behaviors (or as much as
possible of the behaviors).
understanding of objects and
object interplay
27
Logs
log
automatically
extract
log method calls
complete set of LSCs
(linear / branching)
caller1, callee1, method1(…)
caller2, callee2, method2(…)
…
Each execution of the application gives
one trace. Run application multiple
times for a log.
28
Desired Outcome
log
log
=
tree
automatically
extract
complete set of LSCs
(occuring at least s times
= support)
29
Mining Algorithm
github.com/scenario-based-tools/sam/
tree
variant of [Lo, Maoz, Khoo ASE 2007]
30
Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
Starting from sequences of
length 1, recursively append
events and check if it occurs
often enough.
Efficient implementation
uses branch and bound and
some heuristics.
31
Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
From LSC with pre-chart
length 1 to LSC with mainchart length 1.
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
32
Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
3. test for each LSC
if satisfied ≥ c%
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
33
Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
3. test for each LSC
if satisfied ≥ c%
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
34
LSC Mining from Event Logs
log
automatically
extract
complete set of LSCs
(linear and branching)
What do branching
scenarios add to
specification mining?
understanding of objects and
object interplay
35
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC contain the linear LSC and some more
strictly branching LSC that were not found before.
Branching LSC are less frequent (lower support
threshold).
36
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC can explore more events of the log than
just Linear LSC.
37
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC are longer than Linear LSC. In other
words, they show more details for a particular behavior.
38
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Running times for extraction are feasible.
Note that LSCs shown here are the LSCs left after
removing subsumed ones. Originally, the algorithm finds
around 6 million branching LSC in 685 seconds.
39
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Columba mail client: 104 traces, 79 event types
s/c
Linear
LSC
covered
events
avg.
length
time
[s]
20
57
70%
4
159
10
205
72%
6
10/.5
163
78%
6
Branching
LSC
covered
events
avg.
length
time
[s]
57+1
71%
9
154
2191
205+53
75%
9
2055
2256
163+44
84%
6
2125
full data sets and results:
http://dx.doi.org/10.4121/uuid:aa7db920-aae6-4750-8975-cb739262f432
40
Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Columba mail client: 104 traces, 79 event types
s/c
Linear
LSC
covered
events
avg.
length
time
[s]
20
57
70%
4
159
10
205
72%
6
10/.5
163
78%
6
Branching
LSC
covered
events
avg.
length
time
[s]
57+1
71%
9
154
2191
205+53
75%
9
2055
2256
163+44
84%
6
2125
full data sets and results:
http://dx.doi.org/10.4121/uuid:aa7db920-aae6-4750-8975-cb739262f432
41
Linear vs. Branching: CrossFTP
What is the qualitative contribution
of branching LSC to specification
mining?
application life-cycle
from end to end
connect
login
logout
clean up
42
Linear vs. Branching: CrossFTP
application life-cycle
from end to end
short invariants of
individual FTP
commands
invariant of RENAME
43
Linear vs. Branching: CrossFTP
application life-cycle
from end to end FTP command
+
where triggered
short invariants of
individual FTP
commands
The branching LSC fills
the gap between large
and small invariants.
login
rename command
44
Linear vs. Branching: CrossFTP
application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation
short invariants of
individual FTP
commands
We found all ftp commands
supported by the server, as
alternative LSC.
45
Linear vs. Branching: CrossFTP
application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation
… and we could discover
cyclic behavior: after
rename, there could be
another delete command
short invariants of
individual FTP
commands
cycles: rename delete
46
Take Home Points
http://github.com/scenario-based-tools/sam/
log
• mining branching scenarios
alternatives, cycles
• combined with linear:
comprehensive specification
• future work:
visualizing results
distributed scenarios
complete set of LSCs
understanding of objects and
object interplay
47
about.me/dirk.fahland
@dfahland
Mining
Branching-Time Scenarios
Q&A …is branching time really necessary?
Yes, here is a linear LSC showing a disjunction
for continuing after the pre-chart.
if
then
delete
or
download
49
The full execution tree satisfies this Linear LSC with disjunction
Branching Time vs. Disjunction
and two branching LSCs describing the two alternatives in
separate LSCs.
50
Branching Time vs. Disjunction
Removing one branch
from the tree (the
execution of the
download command),
violates the branching
LSCs, but still satisfies
the disjunctive linear
LSCs (because only
one of them has to
hold).
51
Leave a Comment
You must be logged in to post a comment.