Comments on some of the ambiguity about the notion of big data

A number of tech markets, including enterprise computing, cloud, SaaS, PaaS, IaaS and IoT have demonstrated a voracious appetite for data management and analysis. Anyone following data management technology may get lost in the notion of “big data”.

I say lost, as an enormous amount of hype has been built up around the “theme” of “big data.” But a lot of long standing data management methods — relational databases management systems (RDBMS) with a columnar architecture built to provide structure to data — work really well for, ostensibly, enormous amounts of information (meaning data). Readers may want to consider efforts like the Port Authority of New York and New Jersey, and the toll road system it manages. How many millions of vehicle transactions occur on a monthly basis? In turn, how many billions of bits of data does the history of vehicle transactions through toll machines represent? Has this enormous amount of data proven to be unmanageable?

The answers to each of the questions, just presented, all support an argument for RDBMS and Structured Query Language (SQL) as a useful method of working with enormous amounts of data. These questions and answers echo across a very wide of applications; for example, the purview of the U.S. National Weather Service; or the universe of drugs managed by the U.S. Food and Drug Administration.

So there is nothing inherently radical about the notion of “big data”, at least if the notion is correctly understood as merely the set of methods commonly in use to manage data. In fact, and this is where, in my opinion. commentator hyperbole has clouded the whole question of just what is changing — in a truly radical way — about data management methods, the notion of big data is NOT correctly understood as I’ve just presented it. The “big” piece of “big data” appears to have been meant to represent a scalable data management architecture (best typified by Apache Hadoop (http://hadoop NULL.apache NULL.org)). Anyone reading the presentation on the Hadoop web site can’t help but understand the role of clusters of servers for Hadoop as a solution. Clusters of servers, in turn, provide a perfect rationale for the Apache project to provide the foundation for Hadoop.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved