
The Open Business Data Lake Standard, Part VII

Blog: Capgemini CTO Blog

In my previous blog posts (Part I, Part II, and Part III) about the ‘Open Business Data Lake Conceptual Framework’ (O-BDL), I introduced its background, concept, characteristics and platform capabilities. In Part IV, Part V and Part VI, I compared a Data Lake with other data processing platforms, described how an O-BDL should work and defined possible business scenarios that can make use of an O-BDL. In this part I’ll describe the O-BDL data and data ingestion concepts in more detail.

In Part V I introduced the following process diagram (applying the ArchiMate® Enterprise Architecture Modeling Language) to describe how an O-BDL should work.

Based on this process the O-BDL data and data ingestion concepts are described in more detail.

The Data concept within an O-BDL
The data concept within an O-BDL is shown in the following diagram:


As the diagram shows, data can be structured, semi-structured, or unstructured (database tables, binary raw data from sensors, images and videos from cameras, tweets, documents/files etc.). Metadata is data that describes data, and it is a key input for information governance, especially for data quality, data confidentiality and discovery. Examples of metadata are:

An event is a specific structured data piece that has a date and time of occurrence. An event can contain additional data pieces, especially semi-structured or unstructured data. A stream represents a flow or succession of ordered events. The order of “recorded” events in the stream does not necessarily reflect the order of occurrence in real life. When considering or processing a stream, events shall be immutable.
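The event and stream definitions above can be illustrated with a minimal Python sketch. It is not part of the O-BDL framework: the `Event` class, its field names and the sample values are assumptions made for illustration only. It shows an immutable event and a stream whose recorded order differs from the order of occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Event:
    """Hypothetical event: a time of occurrence plus an optional payload."""
    occurred_at: datetime  # date and time of occurrence
    payload: str = ""      # additional semi-structured or unstructured data


def in_occurrence_order(stream):
    """Return the events of a recorded stream sorted by time of occurrence."""
    return sorted(stream, key=lambda e: e.occurred_at)


# Illustrative stream: the event recorded second actually occurred first.
recorded = [
    Event(datetime(2016, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "door closed"),
    Event(datetime(2016, 1, 1, 12, 0, 1, tzinfo=timezone.utc), "door opened"),
]
ordered = in_occurrence_order(recorded)
```

Sorting produces a new list and leaves the recorded stream untouched, so both orders remain available to later analyses.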

An O-BDL favors two types of analysis streams:

  • Batch streams can consume very large data sets but can potentially take time (hours).
  • Real-time streams can deliver insights very quickly (sub-second latency) but they can’t leverage all kinds of analytics.

Insights are data items that typically represent the added value of an O-BDL. They are produced by successive distillation steps executing analytics in an O-BDL. Real-time insights are particular insights that are produced with a very low latency by real-time analyses consuming events or streams of data augmented by data stored in an O-BDL.

The Data Ingestion concept within an O-BDL
The data ingestion concept is shown in the following diagram:

Batch ingestion is the most common way of acquiring data within an O-BDL, that is, of creating new data sets. It consists of acquiring a large number of data items that already exist elsewhere in the IT landscape. Loading 30 years of customers’ orders in a few hours is an example of batch ingestion. Implementations of an O-BDL should be designed to execute multiple sustained batch ingestions at the same time.

Real-time ingestion is dedicated to processing streams or events, which are structured and generally small data. An O-BDL is designed to be able to execute multiple sustained real-time ingestions at high velocity (thousands of values/events per second).

Micro-batch ingestion implements a “bridge” between real-time and batch analyses. It turns streams of events into data sets that can be analyzed as historical data for very long timeframes.
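One common way to build such a bridge is to group events into fixed time windows. The sketch below assumes that approach; the function name `micro_batches`, the one-minute window and the sample events are hypothetical and not prescribed by the O-BDL.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micro_batches(events, window=timedelta(minutes=1)):
    """Group (timestamp, value) pairs into fixed windows keyed by window start."""
    batches = defaultdict(list)
    for ts, value in events:
        # Floor the timestamp to the start of the window it falls into.
        start = ts - (ts - EPOCH) % window
        batches[start].append(value)
    return dict(sorted(batches.items()))


# Illustrative stream of three events spread over two one-minute windows.
t0 = datetime(2016, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [
    (t0 + timedelta(seconds=10), 1),
    (t0 + timedelta(seconds=50), 2),
    (t0 + timedelta(seconds=70), 3),
]
batches = micro_batches(events)
```

Each resulting batch is an ordinary data set that can be stored and analyzed later together with other historical data.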

Metadata can be ingested in multiple ways, depending on the nature of the data and on whether the ingestion is automated. The simplest way is to extract metadata from the data automatically and create it at the same time the data is ingested. In some cases, metadata extraction consists of several processing steps, some of which are performed asynchronously to the ingestion of the data itself, following a metadata enrichment process that is implemented as a distillation step. Metadata enrichment can also happen through the action components and real-time analysis.
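As a rough sketch of these two paths, the Python example below creates basic metadata while a data item is ingested and adds further metadata in a separate, later step. The in-memory `catalog` dictionary, the function names and the metadata fields are assumptions for illustration; a real O-BDL implementation would use its own metadata store.

```python
import hashlib
from datetime import datetime, timezone


def ingest(name, content, catalog):
    """Record metadata extracted automatically while the item is ingested."""
    catalog[name] = {
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def enrich(name, catalog, **extra):
    """Later enrichment step: add metadata without touching the data itself."""
    catalog[name].update(extra)


catalog = {}  # hypothetical in-memory metadata store
ingest("orders.csv", b"id,amount\n1,9.99\n", catalog)
enrich("orders.csv", catalog, confidentiality="internal")
```

Keeping enrichment separate from ingestion means it can run asynchronously, for example as a distillation step, without delaying the arrival of the data.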

In the next (eighth) blog post I’ll elaborate on the O-BDL data processing concept.
