Process Mining for ERP Systems

Description

Presentation held at the 1st Workshop for Data- and Artifact-Centric Processes, co-located with BPM 2012, September 2012.

Transcript

Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland
Process Mining for ERP Systems

PAGE 1: Process Discovery
[Figure: process, event log, process discovery algorithm, process model]
event log:
c1: A B C D E
c2: A C B D E
c3: A F D E
…
assumptions
• case = sequence of events of this case
• cases are isolated: event A in c1 happens only in c1 (and not in c2)
• cases of the same process
• one unique case id
• each event associated to exactly one case id

PAGE 2: Typical Process in an ERP System
[Figure: build-to-order scenario. Alice orders product X and Bob orders product Y from the Manufacturer; the Manufacturer orders materials (Material A, Material B, Material C) from ACME Inc. and Mega Corp.]

PAGE 3: n-to-m relations → database
[Figure: process, database, process discovery algorithm, process model]
Annotations on the slide: id attributes, data attributes, time-stamp attributes, relations.

ProductOrder
poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

Customer
cust.  address  …
Alice  …        …
Bob    …        …

OrderedMaterial
poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

MaterialOrder
moID  suppl.  …  completed    sent         received
mo3   ACME    …  30-08 13:15  30-08 14:15  01-09 9:05
mo4   MEGA    …  30-08 13:17  30-08 16:12  01-09 10:13

PAGE 4-5: Process Discovery for ERP Systems
[Figure: process, process discovery algorithm, process model]
Schema shown on the slide:
• Customer: cust, …
• ProductOrder: poID, cust, created, processed, built, shipped
• OrderedMat.: poID, moID, type, added
• MaterialOrder: moID, supplier, completed, sent, received
• relations: Customer 1 to 0..* ProductOrder; ProductOrder 1 to 1..* OrderedMat.; OrderedMat. 1..* to 1 MaterialOrder
reality: data in a relational DB
• events stored as time-stamped attributes in tables
• multiple primary keys → multiple notions of case
• tables are related → one event related to multiple cases

PAGE 6: Outline
[Figure: approach overview. Decompose by primary keys into a log for PO and a log for MO; discovery yields a model for PO and a model for MO; the process model consists of these models related by foreign-key relations.]

PAGE 7: Find Artifact Schemas
[Figure: approach overview as on PAGE 6]

PAGE 8: Step 0: discover database schema
documented schema vs. actual schema → identify
• column types (esp. time-stamped columns)
• primary keys
• foreign keys
various (non-trivial) techniques available
key discovery is NP-complete in the size of the table(s)
result: [Figure: discovered database schema]

PAGE 9: Step 1: decompose schema into processes
= schema summarization
find:
1. sets of corresponding tables
2. links between those
[Figure: ProductOrder, MaterialOrder]

PAGE 10: Automatic Schema Summarization
= group similar tables through clustering
define a distance between any 2 tables
• by relations
• by information content
tables that are close to each other → same cluster
# of clusters: user input

PAGE 11-12: Automatic Schema Summarization
1. structural distance between tables
fanout ~ avg. # of child records related to the same parent record
matched fraction (m.fr) ~ 1 / (fraction of records in parent with matching child record)
Example: parent table A with records 1 and 2, and three child tables with columns A, B:

child records (A,B)        fanout        m.fr
(1,X) (2,Y)                1             1
(1,X) (1,Y)                1 = (2+0)/2   2 = 1/(1/2)
(1,X) (1,Y) (2,Z) (2,U)    2             1

PAGE 13: Grouping by Clustering
1. structural distance
2. information distance
   importance of each table = entropy (is maximal if all records are different)
   distance: 2 tables with high entropies → large distance
3. weighted distance by structure + information
4. k-means clustering: k clusters based on weighted distance
   most important table of cluster = table with least distance to all → key attribute of the cluster

PAGE 14: Artifact Schema → Artifact Log
[Figure: approach overview as on PAGE 6]

PAGE 15-19: Log Extraction
cluster = set of related tables + primary key of most important table
• primary key (poID) = case id
• time-stamped attribute → event
• related attributes → event attributes

poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

log for PO:
po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
     (processed, poID=po1, time=30-08 13:12, …)
     (added, poID=po1, time=30-08 13:13, moID=mo3, …)  refers to artifact “MaterialOrder”
po2:

PAGE 20: Outline
[Figure: approach overview. Decompose by primary keys into a log for order and a log for quote; discovery yields a model for order and a model for quote; compose the process model by foreign-key relations.]

PAGE 21: Resulting Model(s)
[Figure: two artifact lifecycle models in a 1..* to 1..* relation. Product Order: create, processed, added, built, shipped. Material Order: added, completed, sent, received. Example of the linking event: (added, poID=po1, …, moID=mo3)]

PAGE 22: Implementation & Evaluation
prototype tool
• input: relational database (via JDBC), .csv tables
• steps
  − discover database schema (types, keys, relations)
  − discover artifact schema
    − by k-means clustering
    − by user picking tables
  − extract logs → ProM

PAGE 23: Evaluation: SAP System of Sligro
> 300 tables, > 40 GiB of data
schema extraction
• time-stamp attributes: 15 hrs
• primary keys: 4 hrs
• foreign keys: 5 hrs (single col.) / 6 days (double col.)
clustering
• entropies: 17 hrs
• table distances: 5 hrs
• clustering: a few seconds
• ~20 different artifacts found
• largest: 47 tables, 869 columns
log extraction
• extract 1000 traces of > 246,000 events
• query database: 1 hr
• write log file: 32 hrs

PAGE 24: Sligro: Artikel lifecycle model
[Figure: discovered lifecycle model of the Artikel artifact]

PAGE 25: Open issues
performance
• key discovery: NP-complete in R (# of records)
• foreign key discovery: NP-complete in R²
• problem is in the “hard part” of NP
• → sampling of data, domain knowledge, semi-automatic
requires good database structure
• proper relations, proper keys
• otherwise wrong clusters are formed
• events don’t get right attributes
• → semi-automatic approach
events shared by multiple cases… working on it…
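
The log extraction rule from PAGE 15 to PAGE 19 (primary key of the most important table as case id, each time-stamped attribute as an event, related attributes as event attributes) can be illustrated with a minimal Python sketch. It is not the authors' prototype tool: the function names (`extract_events`, `extract_log`, `sort_key`) and the dictionary layout are assumptions made for illustration, and only the example rows are taken from the slides.

```python
from collections import defaultdict

# Example rows copied from the slides; time stamps use the slide format "DD-MM H:MM".
PRODUCT_ORDER = [
    {"poID": "po1", "cust.": "Alice", "created": "30-08 9:22",
     "processed": "30-08 13:12", "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust.": "Bob", "created": "30-08 10:15",
     "processed": "30-08 13:14", "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ORDERED_MATERIAL = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]


def sort_key(ts):
    """Hypothetical helper: order "DD-MM H:MM" stamps (the slides give no year)."""
    date, clock = ts.split()
    day, month = date.split("-")
    hour, minute = clock.split(":")
    return (int(month), int(day), int(hour), int(minute))


def extract_events(rows, case_key, timestamp_cols):
    """Yield (case id, event) for every time-stamped attribute of every row."""
    for row in rows:
        # All non-time-stamp columns of the row become event attributes.
        attrs = {k: v for k, v in row.items() if k not in timestamp_cols}
        for col in timestamp_cols:
            if row.get(col):
                yield row[case_key], {"activity": col, "time": row[col], **attrs}


def extract_log(tables, case_key):
    """Group events of all tables in the cluster by case id and sort each trace."""
    log = defaultdict(list)
    for rows, timestamp_cols in tables:
        for case_id, event in extract_events(rows, case_key, timestamp_cols):
            log[case_id].append(event)
    for trace in log.values():
        trace.sort(key=lambda e: sort_key(e["time"]))
    return dict(log)


log = extract_log(
    [(PRODUCT_ORDER, ["created", "processed", "built", "shipped"]),
     (ORDERED_MATERIAL, ["added"])],
    case_key="poID",
)
for case_id, trace in log.items():
    print(case_id, [e["activity"] for e in trace])
```

The `added` events keep their `moID` attribute, which is what lets an event of the Product Order log refer to the MaterialOrder artifact, as noted on PAGE 19.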
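
The structural measures from PAGE 11 and PAGE 12 and the entropy from PAGE 13 can be checked with a small Python sketch. The fanout and matched-fraction functions follow the definitions on the slides; the entropy function assumes Shannon entropy over whole records, which the slides do not spell out. All function names are hypothetical, and how the measures are weighted into one distance is not given in the slides, so it is left out here.

```python
from collections import Counter
from math import log2


def fanout(parent_keys, child_fks):
    """Average number of child records per parent record (parents without children count as 0)."""
    counts = Counter(child_fks)
    return sum(counts[k] for k in parent_keys) / len(parent_keys)


def matched_fraction(parent_keys, child_fks):
    """1 / (fraction of parent records that have at least one matching child record)."""
    referenced = set(child_fks)
    matched = sum(1 for k in parent_keys if k in referenced)
    if matched == 0:
        return float("inf")  # assumption: no matching child at all means "infinitely far"
    return len(parent_keys) / matched


def entropy(records):
    """Shannon entropy of a table's records (assumed reading of "entropy" on PAGE 13)."""
    n = len(records)
    return sum((c / n) * log2(n / c) for c in Counter(records).values())


# Parent table A has records 1 and 2; each list holds the A column of one child table.
parent = [1, 2]
examples = {
    "(1,X) (2,Y)": [1, 2],
    "(1,X) (1,Y)": [1, 1],
    "(1,X) (1,Y) (2,Z) (2,U)": [1, 1, 2, 2],
}
for label, fks in examples.items():
    print(label, "fanout:", fanout(parent, fks), "m.fr:", matched_fraction(parent, fks))

# Entropy is maximal when all records differ and zero when all records are equal.
print(entropy([(1, "X"), (1, "Y"), (2, "Z"), (2, "U")]), entropy([(1, "X")] * 4))
```

The three child tables reproduce the values on the slides: fanout 1, 1 and 2, and matched fraction 1, 2 and 1.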
