Unveiling Emerging Data-centric Storage Architectures

Blog: NASSCOM Official Blog

Digital interaction is becoming the norm, and nearly all of our everyday activities now produce data. This is why enterprises are adopting data-centric architectures. In one article, Tony Bishop estimates that by 2024, Global 2000 enterprises will create data at a rate of 1.1 million gigabytes per second and will require 15,635 exabytes of additional data storage annually. In fact, at the recent SNIA Storage Developer Conference (SDC) USA 2020, emerging storage architectures were a hot topic among the speakers.
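To put the two quoted figures on the same scale, here is a quick back-of-the-envelope conversion. It is a sketch that assumes decimal (SI) units and a 365-day year. The comparison between the two numbers is ours, not the article's.

```python
# Assumptions: decimal (SI) units, so 1 EB = 10**9 GB, and a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s
GB_PER_EB = 10**9


def annual_volume_eb(rate_gb_per_s: float) -> float:
    """Convert a sustained data-creation rate in GB/s to exabytes per year."""
    return rate_gb_per_s * SECONDS_PER_YEAR / GB_PER_EB


created_eb = annual_volume_eb(1.1e6)  # 1.1 million GB/s, as quoted
additional_storage_eb = 15_635        # additional storage per year, as quoted
ratio = additional_storage_eb / created_eb

print(f"Data created per year: {created_eb:,.0f} EB")
print(f"Quoted additional storage relative to data created: {ratio:.0%}")
```

Under these assumptions, 1.1 million GB/s works out to roughly 34,700 EB (about 35 zettabytes) created per year. The quoted 15,635 EB of new storage would therefore correspond to a bit under half of that. The estimate does not say what fraction of created data is retained, so this ratio is only a rough consistency check.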

In one of the keynote sessions at SDC 2020, Pankaj Mehra, VP of Storage Pathfinding at Samsung Electronics, discussed “Emerging Data-centric Storage Architectures” at length. In his keynote, he covered Samsung's advanced workload-optimized SSDs for data at large scale. Let’s dig deeper into the takeaways from his session.

Challenges of data at a large scale

Pankaj described the current state of data at large scale, along with the bottlenecks and inefficiencies that enterprises face:

Processing power and processing bandwidth must not become the bottleneck. To handle data at scale, both the ability to process data and the ability to move data to the processing have to scale with the data itself.
Large data sets inevitably mean a very large number of objects. If the metadata is too fine-grained or too coarse-grained, the result is metadata inefficiency.
Among current infrastructure trends, storage is being disaggregated from the compute node, which is increasingly meant to be stateless. We are moving toward an architecture in which storage is connected to the data center fabric, so the choice of protocol inevitably revolves around NVMe over Fabrics (NVMe-oF), NVMe over TCP, and others. The question then is how many times, and where, that wire protocol should be terminated. In current architectures, repeated protocol terminations, buffering, and re-buffering can create bottlenecks, which lead to added latency and pipeline bubbles.
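The cost of repeated protocol termination can be sketched with a toy latency model. It assumes, hypothetically, that every hop that terminates the wire protocol copies the full payload into a new buffer before forwarding it (store-and-forward), so each copy serializes the pipeline. The hop names, per-hop overheads, copy bandwidth, wire latency, and payload size are all illustrative assumptions, not figures from the keynote.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One stage on a hypothetical I/O path (names and costs are illustrative)."""
    name: str
    terminates_protocol: bool  # parses the wire protocol and re-buffers the payload
    fixed_overhead_us: float   # per-request processing cost at this stage


def path_latency_us(hops, payload_bytes, copy_gb_per_s=10.0, wire_us=5.0):
    """Return (estimated latency in microseconds, number of payload copies).

    Assumes store-and-forward: a terminating hop must receive and copy the
    whole payload before passing it on, so copy time adds to latency rather
    than overlapping with the next stage.
    """
    copy_us = payload_bytes / (copy_gb_per_s * 1e9) * 1e6
    total_us = wire_us
    copies = 0
    for hop in hops:
        total_us += hop.fixed_overhead_us
        if hop.terminates_protocol:
            total_us += copy_us
            copies += 1
    return total_us, copies


PAYLOAD = 128 * 1024  # one 128 KiB read (assumed)

# Hypothetical path that terminates and re-buffers the protocol three times.
repeated = [
    Hop("server NIC", True, 2.0),
    Hop("target software stack", True, 8.0),
    Hop("drive", True, 3.0),
]
# Hypothetical path that passes the NIC through and terminates only once.
single = [
    Hop("server NIC (pass-through)", False, 1.0),
    Hop("drive", True, 3.0),
]

for label, path in (("repeated termination", repeated), ("single termination", single)):
    latency, copies = path_latency_us(path, PAYLOAD)
    print(f"{label}: ~{latency:.1f} us, {copies} payload copies")
```

Under these assumptions, most of the gap between the two paths comes from the extra payload copies rather than the fixed per-hop costs, which is the intuition behind asking how many times, and where, the protocol should be terminated.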

These bottlenecks lead to inefficiencies such as:

Inability to deliver both performance and scale, due to bottlenecks in processing power and processing bandwidth
Wasted endurance and wasted memory bandwidth, due to metadata inefficiency in object storage and retrieval
CPU overhead for I/O and for I/O virtualization, due to wire-protocol termination for disaggregated flash
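The metadata trade-off behind the "wasted endurance, wasted memory bandwidth" item can be illustrated with a generic model of a mapping table; this is not Samsung's design. The sketch assumes a hypothetical 32-byte metadata entry per mapped unit ("granule"), 1 TiB of capacity, and aligned 4 KiB updates that force a read-modify-write of every granule they touch. Finer granules inflate the metadata that must be held and moved through memory; coarser granules multiply the bytes rewritten per small update, which consumes flash endurance.

```python
ENTRY_BYTES = 32       # hypothetical size of one mapping-table entry
CAPACITY = 2**40       # 1 TiB of mapped capacity (assumed)
UPDATE = 4 * 1024      # a small, aligned 4 KiB application update (assumed)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def metadata_bytes(capacity: int, granule: int) -> int:
    """Total metadata needed to map `capacity` bytes at one entry per granule."""
    return ceil_div(capacity, granule) * ENTRY_BYTES


def write_amplification(update: int, granule: int) -> float:
    """Bytes rewritten per byte changed, assuming read-modify-write of whole
    granules and an update aligned to a granule boundary."""
    return ceil_div(update, granule) * granule / update


for granule in (4 * 1024, 64 * 1024, 1024 * 1024):
    meta_mib = metadata_bytes(CAPACITY, granule) / 2**20
    wa = write_amplification(UPDATE, granule)
    print(f"granule {granule // 1024:>5} KiB: metadata {meta_mib:>7,.0f} MiB, "
          f"write amplification {wa:>5.1f}x")
```

With these assumptions, 4 KiB granules need 8 GiB of metadata but rewrite nothing extra, while 1 MiB granules need only 32 MiB of metadata but rewrite 256 bytes for every byte changed. Neither extreme is efficient, which is the granularity problem described above.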

Trending solutions