The Open Business Data Lake Standard, Part IX
Blog: Capgemini CTO Blog
In my previous blog posts (Part I, Part II, and Part III) about the ‘Open Business Data Lake Conceptual Framework’ (O-BDL), I introduced its background, concept, characteristics and platform capabilities. In Part IV, Part V and Part VI I compared a Data Lake with other data processing platforms, described how an O-BDL should work and defined possible business scenarios which can make use of an O-BDL. In Part VII and Part VIII I elaborated on the O-BDL data, data ingestion and data processing concepts. In this final part I’ll describe the O-BDL data management and operations concepts in more detail.
In Part V I introduced the following process diagram (applying The ArchiMate® Enterprise Architecture Modeling Language) to describe how an O-BDL should work.
Based on this process, the O-BDL data management and operations concepts are described in more detail below.
The Data Management concept
Data management exploits metadata to manage the data lifecycle, data quality, access policies, and services for Master Data Management (MDM) and Reference Data Management (RDM).
Master Data Management (MDM)
Master data represents a single source of common, basic business data objects that can be used by O-BDL distillation and real-time analysis processing to verify, enrich, and correlate data. If MDM practices and tools are already deployed in the enterprise, an O-BDL should initially be built alongside them.
Reference Data Management (RDM)
Reference data contains authoritative lists of values or entities. These lists are generally massively re-used and widely “referenced” by other data or metadata. Country codes or calendars constitute typical examples of reference data.
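To make the idea of "referenced" lists a bit more concrete, here is a minimal sketch of how a processing step might verify and enrich records against a reference list of country codes. The `COUNTRY_CODES` table, the `enrich_with_country` function and the record layout are assumptions made up for this illustration; they are not part of the O-BDL framework.

```python
# Hypothetical reference data: a tiny excerpt of ISO 3166-1 alpha-2 country codes.
COUNTRY_CODES = {"NL": "Netherlands", "FR": "France", "DE": "Germany"}


def enrich_with_country(record, reference=COUNTRY_CODES):
    """Return a copy of the record with a country name and a validity flag."""
    code = record.get("country_code", "").upper()
    enriched = dict(record)
    # Unknown codes get a name of None and are flagged as invalid.
    enriched["country_name"] = reference.get(code)
    enriched["country_valid"] = code in reference
    return enriched


# Illustrative input records (assumed layout).
records = [
    {"customer": "A", "country_code": "nl"},
    {"customer": "B", "country_code": "XX"},
]
enriched = [enrich_with_country(r) for r in records]
```

In this sketch the second record is kept but flagged, so a later data quality step can decide what to do with it.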
Audit and Policy Management
An O-BDL should be implemented to accommodate the audit controls (e.g., COBIT 5.0) and the centralized application of information policies for security and information governance including provisioning, de-provisioning, access logs, data quality actions, authentication, authorization, encryption, filtering, log-ins, and single sign-on.
Privacy and Protection
Data in an O-BDL implementation may come from numerous sources that reside in different jurisdictions, each with different privacy, retention, and appropriate use legislation. This is especially true in large multi-national companies. Architects have to be aware of the legislation and ensure that the appropriate controls can be implemented in an O-BDL.
Information security shall be architected from the beginning, including the labeling, handling, and access to data over time. The sensitivity of data can vary over time: for example, a report to shareholders becomes common knowledge after release, and overall sensitivity may increase when different groups of data are aggregated over time.
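The two examples of changing sensitivity can be sketched as follows. This is only an illustration under assumptions of my own: the three level names, the `SensitivityLabel` class and the rule that an aggregate is at least as sensitive as its most sensitive input are hypothetical, and a real policy may well rate an aggregate higher than any of its parts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical ordering of sensitivity levels, from least to most sensitive.
LEVEL_ORDER = ["public", "internal", "confidential"]


@dataclass(frozen=True)
class SensitivityLabel:
    level: str                            # level that applies before release
    release_date: Optional[date] = None   # date the data becomes public, if any

    def effective_level(self, on: date) -> str:
        """Return the level that applies on the given date."""
        if self.release_date is not None and on >= self.release_date:
            return "public"
        return self.level


def aggregate_level(levels):
    """Lower bound for an aggregate: the most sensitive level among its inputs."""
    return max(levels, key=LEVEL_ORDER.index)


# A shareholder report that is confidential until its (assumed) release date.
report = SensitivityLabel("confidential", release_date=date(2024, 3, 1))
```

Calling `report.effective_level(...)` with a date before and after the release date shows the label changing without the data itself being touched.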
The Operations concept
Operations concern the ability to provision, configure, monitor, and manage the whole O-BDL from a single, unified environment that abstracts the distributed infrastructures and the multiple integrated services.
System monitoring shall consolidate information from multiple levels, at least:
- Infrastructures (disk, memory, and network usage)
- Operating system
- Data storage
- Processing workflows
An O-BDL itself can be used to get insights from the logging data extracted from all the layers and services of an O-BDL.
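As a rough sketch of what consolidating monitoring information from these levels could look like, the snippet below groups metric samples by level and metric and then asks a simple question across nodes. The sample tuples, node names, metric names and the threshold are all invented for this example.

```python
from collections import defaultdict

# Hypothetical samples: (level, node, metric, value)
samples = [
    ("infrastructure", "node-1", "disk_used_pct", 71.0),
    ("infrastructure", "node-2", "disk_used_pct", 93.0),
    ("operating_system", "node-1", "load_avg", 1.2),
    ("data_storage", "node-2", "under_replicated_blocks", 4.0),
    ("processing_workflows", "node-1", "failed_tasks", 0.0),
]


def consolidate(samples):
    """Group samples by level, then by metric, keeping the per-node values."""
    view = defaultdict(lambda: defaultdict(dict))
    for level, node, metric, value in samples:
        view[level][metric][node] = value
    return view


def over_threshold(view, level, metric, limit):
    """Return the sorted nodes whose value for a metric exceeds the limit."""
    return sorted(n for n, v in view[level][metric].items() if v > limit)


view = consolidate(samples)
print(over_threshold(view, "infrastructure", "disk_used_pct", 90.0))  # ['node-2']
```

In a real O-BDL these samples would more likely be ingested like any other data source, which is exactly the point made above about using the lake on its own logging data.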
System management mainly consists of:
- A resource manager for the provisioning of O-BDL elastic infrastructures. It also takes care of failures that can happen among the cluster nodes.
- A workflow manager that executes batch processing workflows.
The resource manager generally has control over the processing engines, so that an O-BDL is as scalable as possible.
System management must take into account the diversity of business compartments, especially regarding the elasticity (or not) of the underlying infrastructures and priorities for processing workflows.
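One simple way to picture priorities for processing workflows is a priority queue in front of the workflow manager. The sketch below is an assumption of mine, not something the framework prescribes: the `WorkflowQueue` class, the workflow names and the convention that a lower number means a higher priority are all hypothetical.

```python
import heapq
import itertools


class WorkflowQueue:
    """Hands out workflows by priority; equal priorities keep submission order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def submit(self, name, priority):
        # Assumption for this sketch: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_workflow(self):
        """Return the name of the next workflow to run, or None if empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


queue = WorkflowQueue()
queue.submit("nightly-distillation", priority=5)
queue.submit("fraud-scoring", priority=1)
queue.submit("weekly-report", priority=5)
order = [queue.next_workflow() for _ in range(3)]
# order == ["fraud-scoring", "nightly-distillation", "weekly-report"]
```

A production scheduler would also have to weigh the elasticity of each business compartment's infrastructure, which this sketch leaves out.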
Relationship to Other Open Group Standards
The Open Platform 3.0 Standard
The objective of the Open Platform 3.0 standard is to enable agile, secure, reliable, interoperable, and manageable multiple technology solutions within and across enterprises. It is an interoperability standard for platforms that support integration of cloud computing, mobile computing, social computing, big data analytics, and the Internet of Things (IoT) computing paradigms, technologies, infrastructures, and applications across enterprises. An Open Business Data Lake is a particularly relevant solution for the big data analytics services addressed by the ongoing development of standards in the Open Platform 3.0 Forum, a Forum of The Open Group.
The TOGAF® Standard
An O-BDL is an instantiation of parts of an enterprise information architecture designed to handle big data in real time and to provide analytics and self-service data access and sharing for enterprise use; it is complementary to the TOGAF standard. This O-BDL specifies instantiations of parts of the generic Information Sharing Environment (ISE) concept introduced in The Open Group White Paper: An Information Architecture Vision: Moving from Data Rich to Information Smart.
The ArchiMate® Standard
O-BDL concepts are represented using the ArchiMate modeling conventions and metamodel.
The IT4IT Standard
This O-BDL Conceptual Framework can be used as part of a solution to create an IT4IT implementation.
The O-DEF Standard
The Open Data Element Framework (O-DEF) semantic interoperability concepts can be used in an O-BDL implementation.