
More On DMN Data Validation

Blog: Method & Style (Bruce Silver)

This month we return to a topic I’ve written about twice before: data validation in DMN models.  This post, which describes a third method, is hopefully the last word.

Beginning decision modelers generally assume that the input data supplied at execution time is complete and valid.  But that is not always the case, and when input data is missing or invalid the invoked decision service returns either an error or an incorrect result.  When the service returns an error, processing typically stops at the first problem, and the error message generated deep within the runtime is too cryptic to be helpful to the modeler.  So it is important to precede the main decision logic with a data validation service, either as part of the same decision model or as a separate one.  That service should report all validation errors, not stop at the first one, and should allow more helpful, modeler-defined error messages.  There is more than one way to do that, and it turns out that the right design for the validation service depends on details of the use case.

The first method, which I wrote about in April 2021, uses a Collect decision table with generalized unary tests to find null or invalid input values, as you see below.  When I introduced my DMN training, I thought this was the best way to do it, but it’s really ideal only for the simple models I was using in that training.  That is because the method assumes that the values used in the logic are easily extracted from the input data, and that the rule logic is readily expressed as a generalized unary test.  Moreover, because an error in the decision table will usually make the whole table fail without indicating which rule had the problem, the method assumes a modest number of rules with fairly simple validation expressions.  As a consequence, this method is best used when:

- The values tested by the rules are easily extracted from the input data.
- Each rule’s logic is readily expressed as a generalized unary test.
- The number of rules is modest and the validation expressions are fairly simple.
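The original figure isn’t reproduced here, but a minimal sketch of such a Collect decision table, using hypothetical input elements Credit Score and Annual Income, might look like this:

    Validation Errors (hit policy C)
    #  Credit Score                         Annual Income   Message (output)
    1  null                                 -               "Credit Score is missing"
    2  ? != null and not(? in [300..850])   -               "Credit Score must be between 300 and 850"
    3  -                                    null            "Annual Income is missing"
    4  -                                    ? < 0           "Annual Income must not be negative"

Each input entry is a unary test, and rule 2 is a generalized unary test that flags a value that is present but out of range.  With the Collect hit policy, every matching rule contributes its message, so all errors are reported in a single pass.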

The second method, which I wrote about in March 2023, takes advantage of enhanced type checking against the item definition, a new feature of DMN 1.5.  Unlike the first method, this one returns an error result when validation errors are present, but it returns all of them, not just the first, each with a modeler-defined error message.  Below you see the enhanced type definition, using generalized unary tests, and the modeler-defined error messages when testing in the Trisotech Decision Modeler.  Those same error messages are returned in the fault message when the model is executed as a decision service.  On the Trisotech platform, this enhanced type checking can be disabled, enabled only for input data, or enabled for both input data and decisions.
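As a rough sketch of the idea, with the same hypothetical element names as above, the item definitions might constrain their base types with generalized unary tests, where ? stands for the value being tested:

    tCreditScore  : number, allowed values: ? >= 300 and ? <= 850
    tAnnualIncome : number, allowed values: ? >= 0

An input value violating one of these constraints then produces a fault listing every violation, each with its modeler-defined message.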

This method of data validation avoids many of the limitations of the first method, but it cannot be used if you want the decision service to return a normal response, not a fault, when validation errors are present.  Thus it is applicable when:

- Returning a fault instead of a normal response is acceptable when validation errors are present.
- Each validation rule constrains a single element, so it can be expressed as a constraint in that element’s item definition.

More recently I have been involved in a large data validation project in which neither of these methods is ideal.  Here the input data is a massive structure containing several hundred elements to be validated, and we want validation errors to generate a normal response, not a fault, with helpful error messages.  Moreover, the data values used in the rules are buried deep within the structure, and many of them are recurring, so simply extracting them correctly is non-trivial.  Think of a tax return or loan application.  Also, even with properly extracted values, the validation rules themselves may be complex conditions involving many variables.

For these reasons, neither of the methods described in my previous posts fits the bill here.  The fact that an element’s validation rule can be a complex expression involving multiple elements rules out the type-checking method and is also a problem for the Collect decision table.  Decision tables also add the problem of testing.  When you have many rules, some of them are going to be coded incorrectly the first time, and if a rule returns an error the whole decision table fails, so debugging is extremely difficult.  When a rule fails to return the expected result, you need to be able to tell whether you have incorrectly extracted the data element value or incorrectly defined the rule logic.  Your validation method needs to separate those concerns.

This defines a new set of requirements:

- Validation errors must produce a normal service response, not a fault, with helpful, modeler-defined error messages.
- All errors must be reported, not just the first one encountered.
- Extraction of data values from a deeply nested, recurring structure must be separated from evaluation of the rule logic, so that each can be tested independently.
- Rules may be complex conditions involving multiple variables.
- The method must scale to hundreds of elements and rules without making debugging impractical.

The third method thus has a completely different architecture:

- An Extraction service (ExtractAll), with one branch for each non-repeating component and one for each repeating component, whose decisions extract all the variables referenced in the validation rules.  Repeating-component branches iterate a BKM over the instances of the component.
- A Rules model that invokes ExtractAll and, again with one branch per component, evaluates each validation rule as a context entry of a common type, tRuleData.  Repeating-component branches iterate a BKM that reports the errors for a single instance.
- A final decision, ErrorTable, that collects the outputs of all the rules into a single table.

While possibly overkill for simple validation services, in complex validation services this method has a number of distinct advantages over the other two:

- It returns a normal response, not a fault, reporting all validation errors with modeler-defined messages.
- It separates extraction of the data values from the rule logic, so when a rule fails to return the expected result you can tell which of the two is at fault.
- It accommodates rules that are complex conditions involving multiple variables.
- It scales to deeply nested, recurring input data containing hundreds of elements.
- Its output is a single table covering every rule, which simplifies testing and is easily filtered down to just the errors.

Let’s walk through this third data validation method.  We start with the Extraction service.  The input data Complex Input has the structure shown here:

In this case there is only one non-repeating component, containing just two child elements, and one repeating component, also containing just two child elements.  In the project I am working on, there are around 10 non-repeating components and 50 repeating components, many containing 10 or more child elements.  So this model is much simpler than the one in my engagement.

The Extraction DRD has a separate branch for each non-repeating and each repeating component.  Repeating-component branches must iterate a BKM that extracts the individual elements for each instance.

The decisions ending in “Elements” extract all the variables referenced in the validation rules.  These are not identical to the elements contained in Complex Input.  For example, element A1 is just the value of the input data element A1, but element A1Other is the input data element A2 if the value of A1 is “Other”, and null otherwise.
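In FEEL, that A1Other logic might look like the following, assuming (since the figure isn’t reproduced here) that the non-repeating component is named A:

    A1Other =
        if Complex Input.A.A1 = "Other"
        then Complex Input.A.A2
        else null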

Repeating component branches must iterate a BKM that extracts the variable from a single instance of the branch.

In this case we are extracting three variables – C1, AllCn, and Ctotal – although AllCn is used only in the calculation of Ctotal, not directly in a rule.  The goal of Extraction is simply to obtain the values of the variables used in the validation rules.
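As a sketch, assuming the repeating component is named C and each instance contains element C1 and a list Cn (the actual names are in the omitted figures), the branch decision iterates a BKM over the instances:

    C Elements = for c in Complex Input.C return Extract C(c)

and the BKM builds the extracted variables for one instance:

    Extract C(instance) =
        {
            C1    : instance.C1,
            AllCn : instance.Cn,
            Ctotal: sum(instance.Cn)
        }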

The ExtractAll service will be invoked by the Rules, and again the model has one branch for each non-repeating component and one for each repeating component.  Encapsulating ExtractAll as a separate service is not necessary in a model this simple, but when there are dozens of branches it helps.

Let’s focus on Repeating Component Errors, which iterates a BKM that reports errors for a single instance of that branch.

In this example we have just two validation rules.  One reports an error if element C1 is null, i.e. missing in the input.  The other reports an error if element Ctotal is not greater than 0.  The BKM here is a context, with one context entry per rule, and all context entries have the same type, tRuleData, with the four components shown here.  We could have added a fifth component containing the error message text, but here we assume that is looked up from a separate table based on the RuleID.
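The figure showing tRuleData’s four components isn’t reproduced here; RuleID and isError appear later in the post, and Element and Value are my guesses for the other two.  With those assumptions, the BKM for one instance might look like:

    Report C Errors(instance) =
    {
        C1 Rule: {
            RuleID : "C-001",
            Element: "C1",
            Value  : string(instance.C1),
            isError: instance.C1 = null
        },
        Ctotal Rule: {
            RuleID : "C-002",
            Element: "Ctotal",
            Value  : string(instance.Ctotal),
            isError: instance.Ctotal = null or instance.Ctotal <= 0
        }
    }

The second rule’s isError spells out the null case explicitly, because not(instance.Ctotal > 0) would return null, not true, when Ctotal is missing.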

So the datatype tRepeatingComponentError is a context containing a context, and the decision Repeating Component Errors is a collection of a context containing a context.  And to collect all the errors, we have one of these for each branch in the model.
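Continuing the sketch, with a single instance of the repeating component the value of Repeating Component Errors would look like:

    [
        {
            C1 Rule    : { RuleID: "C-001", Element: "C1", Value: "12", isError: false },
            Ctotal Rule: { RuleID: "C-002", Element: "Ctotal", Value: "0", isError: true }
        }
    ]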

That is an unwieldy format.  We’d really like to collect the output for all the rules – with isError either true or false – in a single table.  The decision ErrorTable provides that, using the little-known FEEL function get entries().  This function converts a context into a table of key-value pairs, and we want to apply it to the inner context, i.e. a single instance of Repeating Component Errors, in which each context entry is one rule.
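As a minimal illustration of the built-in function (not from the post):

    get entries({RuleID: "C-001", isError: false})

returns

    [ { key: "RuleID", value: "C-001" }, { key: "isError", value: false } ]

Applied to one instance context, key is the rule name – C1 Rule, Ctotal Rule – and value is that rule’s tRuleData context.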

It might take a minute to wrap your head around this logic.  Here fRow is a function definition – basically a BKM as a context entry – that converts the output of get entries() into a table row containing the key as a column.  For non-repeating branches, we iterate over each error, calling get entries() on each one.  This generates a table with one row per error and five columns.  For repeating branches, we need to iterate over both the branch instances and the errors within each instance, an iteration nested inside another iteration.  That creates a list of lists, so we need the flatten() function to turn it into a simple list, again with one row per error (across all instances of the branch) and five columns.  In the final result box, we just concatenate the tables to make one table for all errors in the model.
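Since the boxed expression itself is in the omitted figure, here is one plausible reconstruction of ErrorTable as a FEEL context, reusing the names assumed above (including a Non-repeating Component Errors decision for the A branch):

    fRow              : function(ctx)
                            for e in get entries(ctx) return {
                                Rule   : e.key,
                                RuleID : e.value.RuleID,
                                Element: e.value.Element,
                                Value  : e.value.Value,
                                isError: e.value.isError
                            }
    Non-repeating Rows: fRow(Non-repeating Component Errors)
    Repeating Rows    : flatten(for i in Repeating Component Errors return fRow(i))
    <result>          : concatenate(Non-repeating Rows, Repeating Rows)

In a model with dozens of branches there would be one row-building context entry per branch, all concatenated in the final result box.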

Here is the output of ErrorTable when run with the inputs below:

ErrorTable as shown here lists all the rules, whether an error or not.  This is good for testing your logic.  Once tested, you can easily filter this table to list only rules for which isError is true.
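That filter is a one-line FEEL expression:

    ErrorTable[isError = true]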

Bottom Line: Validating input data is always important in real-world decision services.  We’ve now seen three different ways to do it, with different features and applicable in different use cases.
