Business Intelligence Built on Metrics – Part II.

Blog: IMIXS-WORKFLOW Blog

In Part I of this series, we explored how metrics can bridge the gap between dynamic business processes (BPM) and business intelligence (BI). Now, in Part II, we dive deeper into the practical implementation: how to implement a metric service, how to select the right metrics, how to integrate them into your BI tools, and how to ensure they deliver actionable insights.

The Technical Stack

Our solution combines several powerful open source technologies, each chosen for its unique strengths in building a scalable and efficient Business Intelligence system.

  • Eclipse MicroProfile Metrics: Serves as the application component that translates business data into metrics. This framework makes it easy to implement metrics based on the widely used Prometheus metric format.
  • Prometheus: As our time-series database, Prometheus excels at storing and querying large volumes of metric data in real time, making it ideal for monitoring and analysis.
  • Grafana: For visualization, Grafana transforms raw data into intuitive dashboards, enabling teams to quickly interpret trends and make data-driven decisions.

In the following sections, we’ll dive deeper into how each piece of this stack works and how you can leverage it to build your own metrics-driven BI solution.

These components are also seamlessly integrated into Imixs-Office-Workflow, an open source business process management platform.

Metric Definition Layer in Practice

In Part I, we discussed the importance of defining metrics that align with your business processes. Now, let’s apply this concept to a practical example: tracking customer balances in an invoicing and payment workflow. This scenario requires monitoring outstanding payments across different currencies, and our metric definitions need to reflect these dynamics.

In a typical invoicing process, events like “invoice created,” “payment received,” and “invoice canceled” directly impact a customer’s balance. To capture these changes, we can define a service that builds and updates the corresponding business metrics. But how do we implement this in practice? Let’s dive into the technical details and explore how to collect and process these metrics using Jakarta EE.

Collecting Metrics with Jakarta EE

In Jakarta EE, we can easily build a metric service for customer balances using the Eclipse MicroProfile Metrics API. Assuming we have some kind of event that signals a change in our customer's invoice workflow, we can build a metric on the fly.
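The core of such a service can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Imixs implementation: the BalanceEvent and BalanceMetricService classes and their fields are hypothetical, and the wiring to CDI and to the MicroProfile Metrics registry is only indicated in comments so that the example stays self-contained.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical event fired whenever an invoice or payment changes a customer's balance. */
final class BalanceEvent {
    final String customerId;
    final String currency;
    final String country;
    final BigDecimal amount; // positive for a new invoice, negative for a payment or cancellation

    BalanceEvent(String customerId, String currency, String country, BigDecimal amount) {
        this.customerId = customerId;
        this.currency = currency;
        this.country = country;
        this.amount = amount;
    }
}

/** Hypothetical service that keeps one balance per customer/currency/country combination. */
final class BalanceMetricService {
    private final Map<String, BigDecimal> balances = new ConcurrentHashMap<>();

    // Assumption: in a Jakarta EE application this would be a CDI observer method,
    // and the balance would be exposed as a MicroProfile Metrics gauge tagged with
    // currency and country instead of living only in this map.
    void onEvent(BalanceEvent event) {
        balances.merge(key(event.customerId, event.currency, event.country),
                event.amount, BigDecimal::add);
    }

    /** Returns the current balance, or zero if no event was seen for this combination. */
    BigDecimal balance(String customerId, String currency, String country) {
        return balances.getOrDefault(key(customerId, currency, country), BigDecimal.ZERO);
    }

    // The customer ID is the most granular part of the key; the tags add context.
    private static String key(String customerId, String currency, String country) {
        return customerId + "|" + currency + "|" + country;
    }
}
```

Feeding the service an "invoice created" event of 100.00 EUR followed by a "payment received" event of -40.00 EUR for the same customer leaves a balance of 60.00 EUR for that customer, currency, and country.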

Tagging Metrics

In the simplified example above, we demonstrated how to measure a customer’s balance using Eclipse MicroProfile Metrics. The metric is bound to the customer ID and enriched with tags like country and currency. Tags are a powerful way to add context to your metrics, enabling flexible grouping and filtering during analysis.

When defining tags, a good rule of thumb is to use the most granular key (e.g., customerId) and tag additional attributes that provide meaningful context. For example, when tracking a customer’s balance, you might include tags like:

  • currency
  • department
  • product group

This approach allows you to not only track individual customer balances but also aggregate metrics by currency, department, or product group. For instance, you could analyze the total outstanding balance for all customers in the “Marketing” department or filter by a specific product group.

Example in Prometheus Format

Here’s how the metric might look in Prometheus format:
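The following lines are an illustrative sample only. The customerid label and all label values and balances are assumptions made up for this illustration, not data from a real system.

```text
# TYPE application_dbtr_balance gauge
application_dbtr_balance{customerid="c-1001",currency="EUR",department="Marketing",productgroup="software"} 1250.0
application_dbtr_balance{customerid="c-1002",currency="USD",department="Sales",productgroup="hardware"} 830.5
```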

In this example:

  • The metric application_dbtr_balance represents the customer’s balance.
  • Tags like currency, department, and productgroup provide additional context.
  • The metric can be queried and aggregated in various ways, such as filtering by currency="EUR" or grouping by department.

Data Collection with Imixs-Metrics

The Imixs-Workflow engine already provides a powerful extension called Imixs-Metrics, which leverages the Eclipse MicroProfile Metrics API. This integration allows us to collect process metrics in real time as workflows execute. The MicroProfile Metrics API provides a standardized way to expose business process metrics from our application.

Storage and Integration with Prometheus

Prometheus acts as our time-series database, perfectly suited for storing metric data. It regularly scrapes metrics from our application’s endpoints and stores them efficiently for later analysis. Its powerful query language allows us to analyze trends and patterns in our process metrics.

Prometheus can easily be integrated as a Docker container, for example through a docker-compose.yaml file.
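A minimal docker-compose.yaml could look like the sketch below. The local path ./prometheus.yml and the volume name prometheus-data are assumptions; the container paths are the defaults used by the official prom/prometheus image.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"            # Prometheus web UI
    volumes:
      # scrape configuration (assumed to sit next to this compose file)
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # named volume that keeps the metric data across container restarts
      - prometheus-data:/prometheus

volumes:
  prometheus-data:
```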

This kind of integration binds a data volume to store the metrics and mounts the Prometheus configuration used to scrape the metric data.

From the Prometheus dashboard, you can test the data in your web browser.

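Prometheus stores each metric as a time series identified by its name and labels. This is not a relational star schema, but the model can be read in a similar way:

  • Metric values recorded for process instances act as facts
  • Time (the sample timestamp) and labels such as resource and business context act as dimensions
  • Recording rules can pre-calculate metrics for common analyses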

Using long Scrape Intervals

Business metrics do not change as often as technical metrics in a server system. For example, it doesn't make sense to check the balance of a customer account every 10 seconds; an interval of 1 hour or more is sufficient. This also keeps the Prometheus database small.

If your metrics change rarely, you can increase the scrape_interval to an hour or even further, e.g., to 7200s (2 hours) or 14400s (4 hours). This further reduces the number of stored data points.

A Prometheus scrape job that collects the metrics once every 2 hours stores 12 data points per day for each time series.
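Such a scrape job could be sketched as follows in prometheus.yml. The job name, the target host and port, and the metrics path are assumptions and depend on how your application server exposes the MicroProfile Metrics endpoint.

```yaml
scrape_configs:
  - job_name: "business-metrics"        # hypothetical job name
    scrape_interval: 7200s              # one scrape every 2 hours = 12 samples per day
    metrics_path: /metrics              # assumed endpoint path
    static_configs:
      - targets: ["imixs-app:8080"]     # hypothetical host and port
```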

Optimize Storage in Prometheus

  • Since your metrics change infrequently and produce only a few data points, you can afford a long retention time without using much storage space. Prometheus keeps data for 15 days by default; the retention time can be set to 2 years with the command-line flag --storage.tsdb.retention.time=2y.

Prometheus can also be set up in Kubernetes with a custom config map and a custom retention time.
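A minimal sketch of such a setup is shown below. All resource names, the scrape target, and the persistent volume claim are assumptions; the claim prometheus-data is expected to exist already.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config                 # hypothetical name
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: "business-metrics"      # hypothetical job name
        scrape_interval: 7200s
        static_configs:
          - targets: ["imixs-app:8080"]   # hypothetical service and port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          args:
            # args replace the image defaults, so config and storage paths are repeated
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention.time=2y"   # keep data for 2 years
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
            - name: data
              mountPath: /prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data    # hypothetical, must already exist
```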

Visualization Using Grafana

The last part of our architecture is the visualization layer. Grafana is a great open source tool for visualizing metrics in custom ways. We can connect Grafana to our Prometheus instance and use its rich visualization capabilities to create dashboards that show:

  • Real-time customer balance overviews
  • Trend analysis of payment behaviors
  • Currency distribution of outstanding invoices
  • Alert triggers for overdue payments

Queries with long Scrape Intervals

If your scrape intervals are very long (more than 5 minutes), as in the setup explained above, you need to be careful with instant queries. It is recommended to align the ‘Scrape Interval’ of the Prometheus data source configured in Grafana with the scrape interval configured in your Prometheus job. You can do this in the advanced settings of the data source in Grafana.

Using Instant Queries

To display the current data in your BI board, use the query type called ‘Instant’. An instant query is evaluated against a single point in time; for this query, the ‘to’ time of the selected time range is used.

Using last_over_time() or max_over_time() is the correct solution, as these functions evaluate all samples within a time range and therefore still return a value when the last scrape is older than 5 minutes. Whether you should use max_over_time() or last_over_time() depends on what you want to achieve with the data. Here is a short summary to help you make the best choice for your case:

  • last_over_time():
    • Returns the last known value within the specified time period.
    • This is useful if you want to see the most current value, whether it’s the highest or lowest value.
    • For example, if you want to view the current balance, last_over_time() is the right choice.
  • max_over_time():
    • Returns the maximum value within the specified time period.
    • This is useful if you want to see the highest value in a specific time period.
    • For example, if you want to see the highest balance in the last 24 hours, max_over_time() is the right choice.

So if your metrics change infrequently (for example, once a week), you will most likely want to see the current status. In this case, last_over_time() is the better choice because it returns the last known value, which corresponds to the current balance.

max_over_time() would only make sense if you want to show the highest balance in a specific time period (e.g. the last 7 days). That’s probably not the goal in all cases.
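The difference between the two functions can be illustrated with a small plain-Java simulation. This is a simplified model written for this article, not Prometheus code: the RangeFunctions class and its Sample type are hypothetical, and the window is modeled as the samples from the last rangeSeconds up to the evaluation time.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Simplified, hypothetical model of two PromQL range functions. */
final class RangeFunctions {

    /** One scraped sample: Unix timestamp in seconds and the metric value. */
    static final class Sample {
        final long timestamp;
        final double value;

        Sample(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Value of the newest sample inside the window, in the spirit of last_over_time(). */
    static OptionalDouble lastOverTime(List<Sample> samples, long now, long rangeSeconds) {
        Sample newest = null;
        for (Sample s : samples) {
            if (inWindow(s, now, rangeSeconds) && (newest == null || s.timestamp > newest.timestamp)) {
                newest = s;
            }
        }
        return newest == null ? OptionalDouble.empty() : OptionalDouble.of(newest.value);
    }

    /** Largest value inside the window, in the spirit of max_over_time(). */
    static OptionalDouble maxOverTime(List<Sample> samples, long now, long rangeSeconds) {
        return samples.stream()
                .filter(s -> inWindow(s, now, rangeSeconds))
                .mapToDouble(s -> s.value)
                .max();
    }

    // A sample counts if it is newer than (now - rangeSeconds) and not in the future.
    private static boolean inWindow(Sample s, long now, long rangeSeconds) {
        return s.timestamp > now - rangeSeconds && s.timestamp <= now;
    }
}
```

With samples of 500.0 at second 0 and 420.0 at second 3600, evaluated at second 5400 over a 7200-second range, lastOverTime() yields 420.0 (the current balance) and maxOverTime() yields 500.0 (the highest balance). With a range of only 300 seconds, no sample falls into the window and the result is empty, which mirrors what happens when only the last 5 minutes are considered.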

Example to query a vendor balance:

To view the current balance, which changes only once a week, you can use last_over_time() with a range of [2h].
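A query of this kind could look as follows. The metric name is reused from the customer balance example earlier in this article, and the currency matcher is a hypothetical filter.

```promql
last_over_time(application_dbtr_balance{currency="EUR"}[2h])
```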

[2h]: The time range should be at least as large as your scrape_interval (1 hour in this example) to ensure that a value is always returned.

That means:

  • last_over_time(): Use this to view the current state of the balances.
  • max_over_time(): Use this only if you want to see the highest balance in a given time period.

Optimizing Prometheus Queries

For business metrics, the following tips can help to optimize your Prometheus queries.

Be Aware of staleness in Prometheus

  • Prometheus treats a time series as “stale” if it has not received a new sample for more than 5 minutes. Since your scrape_interval is 3600s, the metrics will therefore be considered “stale” between scrapes.
  • This is normal and not a problem, as long as you use functions like last_over_time() that can handle such data.

Use for Clause in Alerting Rules

  • If you create alerts based on these metrics, use the for clause to avoid triggering alerts due to missing data.
  • Example:
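A rule along these lines could be used. The group name, alert name, threshold, and durations are assumptions chosen for illustration; the [2h] range assumes a scrape_interval of one hour.

```yaml
groups:
  - name: business-metrics                 # hypothetical group name
    rules:
      - alert: HighOutstandingBalance      # hypothetical alert name
        # last_over_time() keeps a value available between the hourly scrapes
        expr: last_over_time(application_dbtr_balance[2h]) > 10000
        # the condition must hold for 2 hours before the alert fires
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "Outstanding balance above threshold ({{ $labels.currency }})"
```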

Optimize Grafana Queries

  • Use last_over_time() with a time range that is at least as large as your scrape_interval.
    Example:
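For instance, the following query sums the current balances per currency. The metric name is the one used earlier in this article, and the [2h] range assumes a scrape_interval of one hour.

```promql
sum by (currency) (last_over_time(application_dbtr_balance[2h]))
```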

Adjust Grafana Dashboards

  • When creating dashboards that display these metrics, use visualizations that work well with infrequently updated data, such as Stat Panels or Tables.
  • Avoid visualizations like Graphs, which expect continuous data.

Putting It All Together

The beauty of this architecture lies in its modularity and scalability. Each component plays a specific role while working seamlessly together. The Imixs-Metrics component generates standardized metrics, Prometheus efficiently stores them, and Grafana makes them actionable through powerful visualizations.

By combining these powerful tools, we’ve created a robust platform for business intelligence that works harmoniously with process-oriented systems. The solution maintains the flexibility needed for business processes while providing the structured analysis capabilities of traditional BI systems.