Get Ready for the Big Shift to the Data Mesh
Blog: Jim Sinur
Today many organizations are stuck in data muck. At the same time, they are drowning in vast amounts of new data, while the business needs more speed in digesting data that are often aggregated with other data. In today’s outcome-driven world, management doesn’t need more dashboards. They need fast boards that tell the data story by combining in-the-moment operational data with historical data to create the business context needed for decision-making. One of the main obstacles organizations face is data infrastructure sprawl. Organizations need to simplify in order to accelerate, reorienting their data landscape around the data mesh architectural concept.
The Data Iceberg Needs Better Management
Like it or not, our data management efforts are getting overwhelmed as the volume, complexity, and contexts of data grow rapidly. Meanwhile, slow legacy data sources are a hidden danger just under the surface. When someone has to traverse the end-to-end data journey from sources to outcomes, the complexity is almost overwhelming even if nothing changes. The problem is that things are changing, and changing fast, creating more data to manage. Now, imagine a good share of that data migrating to the cloud as data sources become further distributed and include new signals, events, patterns, and contexts. Figure 1 attempts to identify all the sources, emergent or not, that create or contain data that must be managed to serve outcome-driven organizations.
Figure 1 The Data Iceberg
We need a dynamic data-view maker that can draw on all the sources and be configured to fit changing needs and outcomes. Today, organizations have to over-specify and build the data view ahead of time; what is needed instead is a truly unified, dynamic data experience.
Today’s approach is to think about the data lifecycle in terms of the functional role of each datastore. That is, new data is created by applications and stored in each application’s OLTP database. Then, for analytical purposes, that data is copied and moved to an OLAP database for reporting. These days, more application types and systems generate more types of structured, semi-structured, and unstructured data, and they store and process that data in a greater variety of single-purpose datastore types, for instance, a document database for product catalog information. This introduces unnecessary latency where real-time needs matter, and it raises management costs, because organizations must maintain the variety of datastores along with the skill sets needed to design and operate them.
A data mesh approach reorients data management efforts around the consumers of “data products” instead of around many data pipelines and datastore types. A data mesh provides a unified view and architecture for organizational outcomes supported by applications, processes, or dashboards. It combines data inside the cloud with data outside the cloud, and it hides the complexity and variety of data from the end user. A data mesh manages both fast and slow data, whether that data is organized in a centralized or distributed fashion. Within a data mesh, you have what’s known as “nodes”. Each node corresponds to a data product and defines all the data, metadata, consumers, and providers of that data product. When realizing these nodes, there’s an opportunity to gain efficiency by selecting fewer datastores and using them for a broader range of workloads.
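To make the idea of a node more concrete, here is a minimal Python sketch. It assumes a node can be modeled as a plain record holding the four things listed above (data, metadata, consumers, and providers), plus a small registry that gives consumers a single entry point. The class names `DataProductNode` and `DataMesh`, their methods, and the sample values are hypothetical illustrations, not part of any data mesh product or standard.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataProductNode:
    """One hypothetical data mesh node: a data product and what describes it."""
    name: str
    data: list[dict[str, Any]] = field(default_factory=list)  # records the product serves
    metadata: dict[str, str] = field(default_factory=dict)    # e.g. owner, freshness
    providers: set[str] = field(default_factory=set)          # systems that feed the product
    consumers: set[str] = field(default_factory=set)          # apps, processes, dashboards

    def serve(self, consumer: str) -> list[dict[str, Any]]:
        # Only registered consumers get the data; they never see which
        # datastore(s) the records actually came from.
        if consumer not in self.consumers:
            raise PermissionError(f"{consumer!r} is not a consumer of {self.name!r}")
        return list(self.data)


class DataMesh:
    """Hypothetical registry giving consumers one entry point to all nodes."""

    def __init__(self) -> None:
        self._nodes: dict[str, DataProductNode] = {}

    def register(self, node: DataProductNode) -> None:
        self._nodes[node.name] = node

    def products_for(self, consumer: str) -> list[str]:
        # Discover which data products a given consumer may use.
        return sorted(n.name for n in self._nodes.values() if consumer in n.consumers)

    def query(self, product: str, consumer: str) -> list[dict[str, Any]]:
        # Consumers ask for a product by name, not for a datastore.
        return self._nodes[product].serve(consumer)


# Usage: register a product-catalog node and read it as a dashboard would.
mesh = DataMesh()
mesh.register(DataProductNode(
    name="product_catalog",
    data=[{"sku": "A-100", "price": 19.99}],
    metadata={"owner": "merchandising", "freshness": "real-time"},
    providers={"catalog_service"},
    consumers={"pricing_dashboard"},
))
print(mesh.products_for("pricing_dashboard"))  # ['product_catalog']
print(mesh.query("product_catalog", "pricing_dashboard"))
```

The point of the design is that a consumer asks the mesh for a data product by name and never learns which datastore holds it, which is one way to read the claim that a data mesh hides complexity from the end user. A real implementation would need much more, such as governance, lineage, and proper access control.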
For instance, there are now modern, cloud-native, distributed SQL databases that support what I call “Monster Data”, storing and processing real-time streaming data and historical data simultaneously. These can be used in a data mesh to reduce the number of skill sets required and to curb data infrastructure sprawl, ultimately resulting in a simpler and less costly data landscape. In essence, a data mesh unifies and simplifies data management, and when coupled with a modern, cloud-native distributed SQL database, it can lower the cost of handling larger monster data sources that move at lightning speed. See Figure 2 for a depiction of a data mesh.
Figure 2 Sample Data Mesh
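The idea that one SQL store can serve both in-the-moment operational writes and historical analysis can be illustrated with a small Python sketch. It assumes the standard-library `sqlite3` module as a stand-in: SQLite is neither distributed nor cloud-native, so this only shows the pattern, where fresh operational rows and historical rows live in the same store and one query reads both, with no copy step into a separate OLAP database. The `orders` table, its columns, and the `ingest` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a distributed SQL database; it is NOT
# distributed and is used only to show one store serving both workloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ts TEXT, region TEXT, amount REAL)")

# Historical rows, already loaded.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2023-01-15", "east", 120.0), ("2023-02-03", "west", 80.0)],
)
conn.commit()


def ingest(event: tuple[str, str, float]) -> None:
    # Hypothetical helper: an operational write lands in the same table,
    # with no separate copy into an analytical database.
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", event)
    conn.commit()


ingest(("2024-06-01", "east", 45.5))

# One analytical query sees historical and just-arrived data together.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 165.5), ('west', 80.0)]
```

In a distributed SQL database the same pattern would run across many machines; the takeaway here is only that removing the OLTP-to-OLAP copy removes one pipeline and one datastore type to operate.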
The data inventory to manage has grown unwieldy and remains a monster to tame. Organizations will need new architectures, including distributed data cells that carry enough intelligence to self-manage and to play well with other data sources from various contexts. As organizations try to leverage cloud data economically, they have to watch out for the pitfalls of hidden cloud costs. All of this must happen while simplifying access to new and emergent data combined with legacy data types and sources. This shift is a significant accelerator of digital business transformation.