The Apache Hadoop project “develops open-source software for reliable, scalable, distributed computing” (quoted from the “What is Apache Hadoop?” section of the site). So it makes sense for Microsoft and Intel to enthusiastically support the project. Microsoft is deeply committed to Azure, its cloud IaaS effort, and one of the prime revenue generators for Intel is its Data Center business. Azure and Intel’s Data Center business are both all about lots and lots of computer servers. The former consumes servers, while the latter provides the CPUs driving them.
As I wrote in the previous post to this blog, a majority of the enterprise consumer segment of the tech reader community likely maintains a questionable understanding of the notion of “big data”. But once the notion is correctly understood, it should not be a stretch for readers to see why the Apache Hadoop project (or its OpenStack competitor) is positioned at the very core of this technology trend.
Microsoft and Intel are not the only mature ISVs looking to benefit from big data. IBM and EMC are two other champions with solutions on the market to add value for enterprises looking to implement Hadoop.
Intel ostensibly understands the ambiguity of the notion of “big data”, and the imperative of providing the enterprise business consumer with a clearer understanding of just what this buzzword is really all about. A section of the Intel web site, titled “Big Data, What It Is, Why You Should Care, and How Companies Gain Competitive Advantage”, is an attempt to provide this information.
But Intel’s effort to educate the consumer, in my opinion, falls into the same swamp as a lot of the other hype before it can deliver on its promise. The amount of data may be growing exponentially, as the opening of the short Intel animation on the topic contends, but growth alone does not make the case. A lot of mature ISVs (Oracle, IBM, Microsoft, etc.) offer relational database management systems, designed for pricey big server hardware, which are capable of providing a columnar structure for the data.
Even when “unstructured data” is mentioned, the argument is shaky. There are solutions for enterprise consumers, like Microsoft SharePoint (specifically, the Term Store service), which are designed to provide a method of effectively pouring text data into an RDBMS such as SQL Server (the terms are added to SQL Server and are used to tag the text strings identified in unstructured data).
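To make the tagging idea concrete, here is a minimal sketch in Python. It is my own illustration of the general pattern, not SharePoint's implementation or API: it uses SQLite in place of SQL Server, and the table names (`terms`, `documents`, `document_terms`) and function names are hypothetical. A small table of managed terms is stored relationally, each piece of free text is scanned for those terms with a simple case-insensitive substring match, and the matches are recorded as tags that ordinary SQL can query.

```python
import sqlite3


def build_term_store(conn, terms):
    """Create hypothetical tables for terms, documents, and their tags."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS terms ("
        "id INTEGER PRIMARY KEY, label TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document_terms ("
        "document_id INTEGER, term_id INTEGER, "
        "PRIMARY KEY (document_id, term_id))"
    )
    # Store terms in lower case so matching can be case-insensitive.
    conn.executemany(
        "INSERT OR IGNORE INTO terms (label) VALUES (?)",
        [(term.lower(),) for term in terms],
    )
    conn.commit()


def tag_document(conn, body):
    """Store a piece of free text and tag it with every term it contains."""
    doc_id = conn.execute(
        "INSERT INTO documents (body) VALUES (?)", (body,)
    ).lastrowid
    lowered = body.lower()
    known_terms = conn.execute("SELECT id, label FROM terms").fetchall()
    for term_id, label in known_terms:
        # Naive substring match; a real service would be far more careful.
        if label in lowered:
            conn.execute(
                "INSERT OR IGNORE INTO document_terms VALUES (?, ?)",
                (doc_id, term_id),
            )
    conn.commit()
    return doc_id


def documents_tagged(conn, label):
    """Return the ids of documents tagged with the given term."""
    rows = conn.execute(
        "SELECT d.id FROM documents d "
        "JOIN document_terms dt ON dt.document_id = d.id "
        "JOIN terms t ON t.id = dt.term_id "
        "WHERE t.label = ? ORDER BY d.id",
        (label.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_term_store(conn, ["Hadoop", "Azure"])
    tag_document(conn, "We moved the Hadoop cluster to Azure last quarter.")
    tag_document(conn, "Quarterly sales notes, nothing about clusters.")
    print(documents_tagged(conn, "Hadoop"))
```

Once the tags sit in a join table, the free text can be found with plain SQL, which is the sense in which unstructured data gets poured into an RDBMS.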
I am not arguing for the sole use of traditional RDBMSs with SQL tools to manage a data universe experiencing exponential growth. Rather, I think big data proponents (and Hadoop champions) need to study more closely what the real benefits of clustering servers are, and then articulate that message for their enterprise computing audience.
Ira Michael Blonder
© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved