DMN: Validating Data Input
Blog: Method & Style (Bruce Silver)
Last month we discussed a valuable low-code implementation pattern using Business Automation Services to create event-driven database apps in the cloud. Upon receipt of a business event – a REST API call – the pattern involves a straight-through process that retrieves various database records, calls a decision service to generate additional records, and then inserts the new records in the database. The beauty of this pattern is that by using BPMN, DMN, and OData, subject matter experts can create executable implementations themselves without developer resources, greatly increasing business agility. But there is a gotcha: Debugging errors in a BPMN process is not as easy as debugging a DMN model. In this month’s post, we’ll discuss how to minimize the errors.
The most common problems I’ve encountered are “Bad Request” errors on the database inserts and occasional runtime errors in the decision task. The error message tells you which task had the error but little more. Typically the cause is one of the following problems in the data sent to the task:
- It does not match the schema of the operation input parameter.
- It contains null values where a non-null is expected.
- It contains an empty list where a non-empty list is expected.
The usual source of these problems, although it is not the only one, is the original business event. It could omit a required element, contain an unsupported value or datatype for that element, provide a list where an item is expected (or vice versa), or fail to find expected matching records in the database. Since the event typically originates from an external client, you have no control over it. The Business Automation service specifies that the business event cannot be null, but it does not specify which components of the event must be populated, nor specific constraints on component values. So to avoid these runtime errors, you need to validate the business event up front.
The diagram above is a slight tweak of the one I used last month. In the new variant, we immediately follow receipt of the business event with a DMN-based decision service that checks whether all required elements are populated and that all business event components – whether required or not – satisfy the data requirements of the subsequent logic. If any validation errors are found, the process exits with error messages that let you find and fix the problems. Following that, we need one or more service tasks that query the existing tables to ensure that identifiers in the business event can be matched to those in the existing tables. Again, the process exits if expected matching records cannot be found. If both validation steps are successful, the process proceeds with its normal logic.
Data Validation Service
Let’s start with the data validation decision service, implemented as a single DMN model. DMN models can do data validation, but creating them takes you into the lesser-known corners of the standard: the get entries() context function, generalized unary tests, the instance of operator, and the matches() string function. Data validation typically involves two types of decisions: An iteration literal expression that tests each component in a specified list, and a decision table with Collect hit policy with rules that test individual components. Both types of decision return a list of error messages detailing the problem.
Testing for Missing Values
A basic check ensures that all components of the business event that are expected by the service logic or required by the database schema are provided, i.e., not null in the business event. If the component is not a collection and is omitted from the event, its FEEL value is null. We can create a variable Required Elements that lists the names of all required components that are not collections, and use it in the following expression on the data input Event:
for x in get entries(Event)[list contains(Required Elements, x.key)][item.value=null] return "Event component " + x.key + " may not be null."
There’s a lot to upack in that expression. The function get entries() operates on a context, or data structure, and turns it into a two-column table. The first column, key, holds the name of the component. The second column, value, holds its value. This allows you to iterate over the components of the structure, handy if the test is the same for all of them. Here we first filter get entries(Event) with the constraint that the name of the component (x.key) is contained in the list Required Elements. On top of that we add the filter that the required item value is null in the business event. Ideally no components of the business event survive this double filter and the iteration returns an empty list, but if any member of Required Elements is null, it generates an error message.
If any components of the business event are collections, the logic is nearly the same:
for x in get entries(Event)[list contains(Required Collections, x.key)][item.value=] return "Event component " + x.key + " may not be an empty list."
Testing for Incorrect Type
We use the FEEL operator instance of to test whether the value of a business event component conforms to the expected FEEL base type: string, number, boolean, date, or date and time. But that just tests the base type. Testing whether the datatype of a business event component matches all the requirements of either the decision service or the database schema is a little harder, because you need to avoid runtime errors in the validation logic itself. For example, applying a string function to a non-string value can give a runtime error, as can applying a number function to a string value, etc.
If you have multiple components of the same base type, you can test conformance to the expected base type with iteration over get entries() as shown earlier:
for x in get entries(Event)[list contains(String Elements, x.key) and not(x.value instance of string)] return "Event component " + x.key + "must be a string."
To check for datatype errors beyond base type conformance, more often you will use a Collect decision table with a Boolean test of the type in the input entry. Each rule returns an error message if its input entry evaluates to true. But you need to be careful that this input entry itself does not give a runtime error. For example, the following Boolean test of the component Event.comp1, which is expected to be a non-empty string, is incorrect:
not(Event.comp1 instance of string) or (Event.comp1 instance of string and string length(Event.comp1)=0)
The first part tests the type and the second part tests some constraint on the type. If Event.comp1 is a number, you might expect the expression to return true, but actually it gives a runtime error, because the argument of string length() may not be a number. Instead you need to write it so that the string length() function is executed only in the case that Event.comp1 is a string, thus avoiding the runtime error:
if not(Event.comp1 instance of string) then true else string length(Event.comp1)=0
Alternatively, if you have a separate test that tests whether Event.comp1 is a string, the string length() test can be written as
if Event.comp1 instance of string and string length(Event.comp1)=0 then true else false
In practice, I am more likely to do it this way.
Generalized Unary Tests
In the validation service, Boolean expressions like this are used as input entries in a decision table where the output entries are the error messages. The decision table has hit policy Collect, so its output is a list of error messages for which the input entry evaluates to true. The ability to use complete FEEL expressions in a decision table input entry requires the special generalized unary test syntax, in which the column heading (input expression) is replaced in the input entry by the ? character. In our case, the column heading is Event, and each decision rule tests a specific component. For example, the test above might be written as follows:
if not(?.comp1 instance of string) then true else string length(?.comp1)=0
with the associated output entry
"Event.comp1 must be a non-empty string."
In a similar vein, here are some additional examples:
- The database or decision model requires a positive integer.
if not(?.comp2 instance of number) then true else ?.comp2 != floor(?.comp2) or ?.comp2<0
- The database or decision model requires a string with min length of 1 and max length of 10.
if not(?.comp3 instance of string) then true else not(string length(Event.comp3) in [1..10])
- The database or decision model requires a string of 9 numeric digits, such as a Tax ID.
To test for specific string formats, you can use the matches() function, which compares the string value to a regular expression. Regular expressions are beyond the experience of most subject matter experts, but there are many good resources available online. Google them.
Testing for Item Definition Constraint Violations
The instance of operator can only test conformance to FEEL base types. It cannot check whether the value conforms to the constraints of an item definition, such as an enumeration of allowed values or defined numeric range.
- The item definition specifies enumerated values.
Unfortunately, there is no FEEL function or operator that can introspect the item definition to obtain the list of enumerated values; in the decision rule, you need to enter them manually as a list and use the list contains function. For example, if the allowed values are a, b, or c, you can use
list contains(["a", "b", "c"], ?.comp6) = false
- The item definition specifies a numeric range.
If the item definition specifies a number or date in a certain range, you can validate using the in operator:
if ?.component instance of number = false then true else ?.comp6 in [1..10] = false
- The service requires particular values.
Sometimes a value allowed by the item definition is not allowed in this particular service. For example, while Event.TransactionType allows the values “Buy”, “Sell”, “Interest”, “Dividend”, or “Journal”, the service processing a Buy event only allows the value “Buy”. In that case, the input entry becomes
?.TransactionType != "Buy"
Data Consistency Errors
Another source of errors is inconsistent numerical data. For example, you might have Event.Quantity, Event.UnitPrice, and Event.Amount, where Event.Amount should equal the product of the other two. If they don’t, the data is not consistent, and best to trap it up front. You need to make sure that all three components are numbers and not null. The test is:
if (every x in [?.Quantity, ?.UnitPrice, ?.Amount] satisfies x instance of number) then abs(?.Quantity*?.UnitPrice - ?.Amount) > tolerance else false
where tolerance is the allowed amount of discrepancy.
Avoiding Runtime Errors in the Error Message
Consider the following error message when Event.component is not an allowed value:
"Event.component " + string(Event.component) + " is not an allowed value."
It looks ok, but actually if Event.component is null, this string concatenation will give you a runtime error! To solve this, I use a BKM stringn(val), defined as follows:
if val != null then string(val) else "null"
So the correct error message is
"Event.component " + stringn(Event.component) + " is not an allowed value."
Testing for Matching Database Records Not Found
Even if the value of Event.component is allowed by the database and decision model, it could still be incorrect if it fails to match some existing database record. For this, you cannot use the validation decisions described above. Instead, you need a service task to query the database for the matching record or records. If it is not found, you could receive either an error or an empty list, depending on the query service employed. If an error is returned, you can use an Error boundary event on the service task. If an empty list, you can test the result in a gateway with the condition
count(MatchingRecords) = 0
Even though all these validation tests seem like a bit of work – and they are – the goal is to avoid runtime errors, especially “Bad Request”, which – trust me – is usually a larger bit of work to find and fix. With that in mind, let’s take a look at a simple example.
We’re building a Stock Trading app that receives various types of Business Event (Buy, Sell, etc.) and updates a variety of tables as a result. All types of Business Event share the same item definition tEvent, but each type triggers a different service that updates the tables in a particular way. The diagram above illustrates the service for a Buy Event. The main steps in the service involve logging the event, retrieving the position for that account and security, generating a new position record, and inserting it in the database. But to avoid runtime errors, first we need to validate Buy Event and make sure the security it references is one we support. We focus here on the decision task Validation Errors, which looks for errors in the data input Buy Event. It invokes the decision service, also called Validation Errors, created in DMN, which outputs a list of validation errors… ideally an empty list.
Buy Event is FEEL type tEvent, shown above. For this service, certain components are required, others optional. Some have allowed values enumerated by the item definition. The validation DMN model is shown below:
I’ve broken out the validation into three separate decisions, looking for Missing Required Elements, Base Type Errors, and Element Validation Errors, respectively. The decision Validation Errors merely concatenates the error lists of those three.
Missing Required Elements iterates over the get entries() list, filtered by Required Elements that are null, as discussed earlier.
Alternatively, we can use Trisotech’s iteration boxed expression, which for some users makes the logic easier to understand:
Base Type Errors similarly iterates over get entries() to look for mismatch of component values to the expected base type for string, number, and date components, and concatenates the errors into a single list.
Again, we could use the iteration boxed expression instead:
Element Validation Errors is a Collect decision table that tests the values of event components individually. It does not return an error if the base type is incorrect, as we have another decision that does that, but it does need to make sure that the type is consistent with any FEEL functions and operators used in the logic. Note the generalized unary test syntax of the input entries and the if-then-else syntax of certain rules in order to avoid runtime errors. Also note the use of the BKM stringn() in some of the output entries, in case the component value is null.
When we run this model in Execution/Test with the Business Event components shown on the left below, we get the error messages shown on the right.
Using the Trisotech decision service wizard we define the service Validation Errors and publish it to the cloud.
The DMN expressions used in data validation are a little different from those you normally encounter, but once you get in the habit of validating business automation service inputs, you will quickly get used to them. Validation is a bit of extra work, but avoiding the headache of runtime errors makes the effort worthwhile.