
Real-Time Analytics and Ad Hoc Queries: What Hadoop Can and Can’t Do for You

I recently ran across an article in TechTarget that talks extensively about Hadoop’s limitations around real-time analytics applications. The article, authored by Ed Burns, emphasizes that while Hadoop was designed to process large sets of structured, unstructured and semi-structured data, it was built as a batch processing system, which imposes significant limitations on real-time analysis. In the article, Burns features excerpts from an interview with Forrester analyst Mike Gualtieri, who mentions that there are plenty of vendors and end users asking, “Why can’t we execute real-time data analytics and ad hoc queries using Hadoop?” It’s a valid question, and Mike cites a key obstacle Hadoop faces with respect to real-time analytics.

Mike states that queries run through most of the new Hadoop query engines remain slower and more cumbersome than queries posed against mainstream relational databases. Various tools include interfaces that let users write queries in SQL, which are then translated into MapReduce jobs for execution on a Hadoop cluster.
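To make that translation step concrete, here is a minimal, single-process sketch of how a query such as `SELECT region, COUNT(*) FROM sales GROUP BY region` could be broken into map, shuffle and reduce phases. The `sales` table, its columns and every function name below are hypothetical and chosen only for illustration; real SQL-on-Hadoop engines generate far more elaborate job plans and run them across a cluster.

```python
from collections import defaultdict

# Hypothetical rows standing in for a "sales" table (illustrative only).
ROWS = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]


def map_phase(rows):
    """Emit one (key, 1) pair per input row, as a mapper would."""
    for row in rows:
        yield row["region"], 1


def shuffle(pairs):
    """Group values by key; on a cluster this step moves data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single count, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_region(rows):
    """Run all three phases in order over the full input set."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_region(ROWS))
```

Even in this toy form, every phase has to finish over the whole input before a result appears, which hints at why such queries feel slow next to an indexed relational lookup.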

While Hadoop’s scalability and affordability are appealing to some organizations, it’s important to recognize its place in the market. And if real-time analytics and ad hoc query capabilities are important, experts agree it’s better not to cobble Hadoop into a systems architecture where it doesn’t fit or make sense.

Nonetheless, Hadoop vendors are complementing their batch processing capabilities by partnering with stream processing technology providers. Vitria is one such provider, able to continuously process streaming data in real time. By contrast, because Hadoop is a batch processing system, achieving streaming data analytics with it is a piecemeal approach that requires multiple technologies.

If effectively managing your business requires immediate and continuous data analysis – down to the fraction of a second – batch processing just won’t cut the mustard. Companies that operate within the retail, energy, financial services or telecom industries, for example, need this level and speed of analysis. And while the value of continuous real-time data analytics is widely recognized within these industries, it’s critically important to make the distinction between “continuous, real-time” and “on-demand, near real-time” and understand the potential pitfalls and drawbacks associated with Hadoop-based data analytics solutions.
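The difference between the two styles can be sketched in a few lines. This is an illustrative toy, not how Hadoop or any streaming product is implemented: `batch_average` and `SlidingAverage` are hypothetical names, and the window size is an arbitrary assumption.

```python
from collections import deque


def batch_average(events):
    """Batch style: an answer exists only after the whole data set is scanned."""
    return sum(events) / len(events)


class SlidingAverage:
    """Continuous style: the result is refreshed as each event arrives."""

    def __init__(self, window_size):
        self.window_size = window_size  # number of recent events to keep
        self.window = deque()
        self.total = 0.0

    def update(self, value):
        """Add one event, evict the oldest if the window is full, return the average."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.window_size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


if __name__ == "__main__":
    readings = [10, 20, 30]

    # On-demand: one answer, available once the job over all readings completes.
    print(batch_average(readings))

    # Continuous: a fresh answer after every single reading.
    monitor = SlidingAverage(window_size=2)
    for reading in readings:
        print(monitor.update(reading))
```

The batch function answers a question when asked, over data that has already landed; the sliding window keeps an answer current with constant work per event, which is the property the “continuous, real-time” label refers to.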
