Organizations need to give unstructured data its rightful place if they want to get value out of data

Blog: Capgemini CTO Blog

According to IDC, the total volume of data will reach 163 zettabytes by 2025, and 80% of this is expected to be unstructured data. That’s a mind-boggling number, though what is even more amazing is that companies have only marginally changed how they handle their unstructured data.

Traditionally, companies mainly used data that was structured (it fits well within the rows and columns of a database) and internal (it is created within the organization). Nowadays, unstructured and external data is the fastest-growing part. Sources of external data include social media platforms such as Facebook, Twitter, and WhatsApp, but also search phrases in Google, data streams from smart devices (IoT), video streams from security cameras, and geo info used by Uber or Lyft. All these sources, as well as many others, are adding to the enormous pile of unstructured data that is available to be used and analyzed.

Obviously, unstructured data has always been part of the data used by companies, consisting of text documents, presentations, notes, and to a lesser degree, photos, videos, and images. Traditionally, this has been addressed by either storing this information in a database (as BLOB or CLOB data), or by using an enterprise content management system (ECM). The drawback of storing a contract in a database, for instance, is that it can only be stored and retrieved. It cannot be searched or edited. In this context, an ECM can be seen as the next step. It provides the ability to not only store and edit the data, but also to share it, work on it simultaneously with other people, understand changes between versions, etc. Indeed, this is already a big improvement from the standpoint of handling and leveraging unstructured data.

Big data

Unfortunately, this won’t be enough for all the unstructured data that is out there on the internet. The emergence of what is called big data has led to a landslide of new products aimed at handling unstructured data quickly, even at large data volumes. Hadoop, HDFS, and MapReduce can now almost be considered household names, but many more products have emerged as solutions for situations where traditional databases fall short. Document stores, key-value stores, column-family stores, and graph databases are all examples of new categories of databases that help manage the large amounts of unstructured data we are seeing today. Semi-structured data, such as documents, can best be handled by using a document store. Any combination of data can be stored as is and does not have to comply with a uniform format, something unheard of in a relational database.
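To make the "stored as is" idea concrete, here is a minimal sketch in Python. It is a toy in-memory stand-in, not the API of any real document store; the DocumentStore class, its insert and find methods, and the sample records are all hypothetical names invented for illustration.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}      # document id -> stored document
        self._next_id = 1

    def insert(self, doc):
        """Store a document as is; no schema is checked or enforced."""
        doc_id = self._next_id
        # A JSON round trip keeps an independent copy and confirms the
        # document is serializable, as a real store would require.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        """Return documents whose fields equal all given criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentStore()
# Three documents with three different shapes, accepted without any
# table definition or migration.
store.insert({"type": "contract", "customer": "Acme", "pages": 12})
store.insert({"type": "tweet", "text": "Great service!", "hashtags": ["support"]})
store.insert({"type": "contract", "customer": "Globex", "signed": True})

contracts = store.find(type="contract")
```

Both contracts come back from the same query even though they do not share the same fields. In a relational table, each new field would first require a change to the table definition.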

There seems to be a gap between the potential business value that unstructured data holds and day-to-day practices. Some of these challenges include:

The high cost of maintenance and of finding relevant data, as well as the low probability of actually finding the information you want, are some of the effects of the situation described above. Finding contradictory data, and then having to work out which set of data is correct, are other disadvantages. In short, there is still much to be gained by organizing unstructured data better.

Organizing for value out of data

Organizations that want to get value out of data need to have a solid data foundation that covers both structured and unstructured data, but achieving such a foundation requires remedying the challenges stated above. Several capabilities are needed to better manage unstructured data:

Analyzing all processes where unstructured data is involved and understanding how it is used will provide an integrated view of the unstructured data in the organization. This makes it possible to understand how this data can best be supported. The list mentioned above can help in understanding to what degree a certain application supports the required functionalities.

For all systems that store unstructured data, it can then be determined whether the system is a system of reference, a system of entry (input), or a system of use. While data can be entered and used in many different systems, there can only be one system of reference for the same data. Working in this way ensures that it is clear what constitutes the correct data at any point in time. The resulting simplification and alignment support the data foundation mentioned earlier and make it possible to get value out of unstructured data, whether or not it is combined with structured data.
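The rule that many systems may enter or use data, but only one may be the system of reference, can be sketched as a small registry. This is an illustration under stated assumptions, not a prescribed implementation: the Role and SystemRegistry names, the system names, and the simplification that each system has exactly one role per dataset are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    REFERENCE = "reference"  # holds the correct version of the data
    ENTRY = "entry"          # where data is entered (input)
    USE = "use"              # where data is consumed


class SystemRegistry:
    """Hypothetical registry of which system plays which role per dataset."""

    def __init__(self):
        self._roles = {}  # dataset -> {system name: Role}

    def register(self, dataset, system, role):
        systems = self._roles.setdefault(dataset, {})
        if role is Role.REFERENCE:
            # Enforce the rule: at most one system of reference per dataset.
            for other, other_role in systems.items():
                if other_role is Role.REFERENCE and other != system:
                    raise ValueError(
                        f"{dataset!r} already has system of reference {other!r}"
                    )
        systems[system] = role

    def reference_system(self, dataset):
        """Return the system of reference for a dataset, or None."""
        for system, role in self._roles.get(dataset, {}).items():
            if role is Role.REFERENCE:
                return system
        return None


registry = SystemRegistry()
registry.register("contracts", "ECM", Role.REFERENCE)
registry.register("contracts", "CRM", Role.ENTRY)
registry.register("contracts", "Reporting", Role.USE)

# A second system of reference for the same data is rejected.
try:
    registry.register("contracts", "FileShare", Role.REFERENCE)
    conflict = None
except ValueError as err:
    conflict = str(err)
```

Keeping this mapping explicit means that when two systems disagree, the registry settles which copy counts as correct.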

So for organizations there is a big opportunity to get more value out of data by reorganizing the unstructured data they already have. Let’s wait no longer and build that data foundation!
