8 Dec

Comments on some of the ambiguity about the notion of big data

A number of tech markets, including enterprise computing, cloud, SaaS, PaaS, IaaS, and IoT, have demonstrated a voracious appetite for data management and analysis. Anyone following data management technology may get lost in the notion of “big data”.

I say lost, as an enormous amount of hype has been built up around the “theme” of “big data.” But long-standing data management methods — relational database management systems (RDBMS), with their row-and-column table structure built to impose order on data — handle enormous amounts of information very well. Readers may want to consider efforts like the Port Authority of New York and New Jersey, and the toll road system it manages. How many millions of vehicle transactions occur on a monthly basis? In turn, how many billions of bits of data does the history of vehicle transactions through toll machines represent? Has this enormous amount of data proven to be unmanageable?

The answers to each of these questions support an argument for RDBMS and Structured Query Language (SQL) as a useful method of working with enormous amounts of data. These questions and answers echo across a very wide range of applications; for example, the purview of the U.S. National Weather Service, or the universe of drugs managed by the U.S. Food and Drug Administration.
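The toll-road example above can be made concrete with a few lines of SQL. The sketch below uses Python's built-in sqlite3 module; the table and column names are purely illustrative, not drawn from any real Port Authority system, but the aggregate query is exactly the kind of question RDBMS engines answer efficiently even over billions of rows.

```python
import sqlite3

# Hypothetical toll-transaction table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE toll_transactions (
        plate      TEXT,
        crossing   TEXT,
        charged_at TEXT,   -- ISO-8601 timestamp
        amount     REAL
    )
""")
conn.executemany(
    "INSERT INTO toll_transactions VALUES (?, ?, ?, ?)",
    [
        ("ABC123", "GWB", "2014-10-01T08:15:00", 13.00),
        ("ABC123", "Lincoln", "2014-10-02T17:40:00", 13.00),
        ("XYZ789", "GWB", "2014-10-01T09:05:00", 13.00),
    ],
)

# Trip volume and revenue per crossing: a routine SQL aggregate.
rows = conn.execute("""
    SELECT crossing, COUNT(*) AS trips, SUM(amount) AS revenue
    FROM toll_transactions
    GROUP BY crossing
    ORDER BY crossing
""").fetchall()
print(rows)  # [('GWB', 2, 26.0), ('Lincoln', 1, 13.0)]
```

The same GROUP BY query scales from three rows to billions; the engine, not the application, worries about indexes and I/O, which is precisely the argument for structured data management made above.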

So there is nothing inherently radical about the notion of “big data”, at least if the notion is correctly understood as merely the set of methods commonly in use to manage data. In fact — and this is where, in my opinion, commentator hyperbole has clouded the whole question of just what is changing, in a truly radical way, about data management methods — the notion of big data is NOT correctly understood as I’ve just presented it. The “big” piece of “big data” appears to have been meant to represent a scalable data management architecture, best typified by Apache Hadoop (http://hadoop.apache.org). Anyone reading the presentation on the Hadoop web site can’t help but understand the central role clusters of servers play in the Hadoop solution. That cluster-based architecture, in turn, provides a perfect rationale for the Apache project to serve as the foundation for Hadoop.
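The scale-out architecture described above rests on the map/shuffle/reduce pattern. The following is a single-process imitation of that pattern, for illustration only: in a real Hadoop deployment each chunk would be an HDFS block on a different node, and the map and reduce steps would run in parallel across the cluster.

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit (key, 1) pairs, like a mapper in a word-count job.
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Sum counts per key, like a reducer after the shuffle.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Stand-ins for data blocks distributed across a cluster.
chunks = ["big data big clusters", "data on clusters"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(pairs)
print(counts)  # {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```

The point of the pattern is that map_phase needs no knowledge of any chunk but its own, so adding servers adds capacity almost linearly — the property the “big” in “big data” actually refers to.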

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

15 Oct

ISVs debut cloud, SaaS solutions to satisfy consumer appetite for Analytics and Data

On Monday, October 13, 2014, Salesforce.com announced the debut of a new cloud, SaaS solution named “Wave” (https://www.salesforce.com/company/news-press/press-releases/2014/10/141013.jsp). Back on September 16, 2014, IBM announced “Watson Analytics”, also a cloud, SaaS solution, but in that case a freemium offer. So it’s safe to say analytics for the masses has become new competitive ground on which big, mature ISVs contend for market share.

A couple of points are worth noting about the Salesforce.com press release:

  1. GE Capital is mentioned as already using Wave. Given GE’s own recent PR campaign around its own data and analytics effort, one must wonder why the business finance component of the company opted not to use the home-grown solution ostensibly available to it.
  2. Informatica is mentioned as an “ecosystem” partner for Wave and released its own press release, titled Informatica Cloud Powers Wave, the Salesforce Analytics Cloud, to Break Down Big Data Challenges and Deliver Insights (http://www.marketwatch.com/story/informatica-cloud-powers-wave-the-salesforce-analytics-cloud-to-break-down-big-data-challenges-and-deliver-insights-2014-10-13).

The Wave announcement follows, by less than a month, IBM’s announcement of a freemium offer for “Watson Analytics”, and Oracle’s “Analytics Cloud”. Both of those offers are delivered via a cloud, SaaS model. So it’s likely safe to say enterprise technology consumers have demonstrated a significant appetite for analytics. The decision by Salesforce.com, IBM, and Oracle to all deliver their solutions via a cloud, SaaS offer speaks to the new enterprise computing topology (a heterogeneous computing environment) and the need to treat browsers as the ideal thin clients for users working with their data online.

An ample supply of structured and unstructured data is likely motivating these enterprise tech consumers to look for methods of producing the kind of dashboards and graphs each of these analytics offers is capable of producing. With data collection methods advancing, particularly for big data (unstructured data), this appetite doesn’t look likely to abate anytime soon.

ISVs with solutions already available, principally Microsoft with its suite of Power tools for Excel (Power BI, Power Pivot, etc.), may also be participating in this “feeding frenzy”. It will be interesting to see how each of the ISVs with offers for this market fares over the next few business quarters.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

17 Jun

As More Enterprise Businesses Embrace Hadoop, Intel Stands to Benefit

Of the 42 members of Hadoop’s Project Management Committee (http://hadoop.apache.org/who.html), 8 are directly affiliated with Cloudera®, and another with Intel®. Patrick Hunt, an engineer at Cloudera, appears to have played a key role in the development of a keyword search feature for Hadoop, which is not a trivial achievement for a platform like Hadoop, designed as it is for unstructured data. Intel has an investment in Cloudera. Therefore, Intel should benefit as more organizations choose to work with unstructured data, with Hadoop as its repository.
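To see why keyword search over unstructured data is hard enough to count as an achievement, consider the core data structure such a feature needs: an inverted index mapping each term to the documents containing it. The toy sketch below illustrates only the concept; Cloudera’s actual search feature is built on far more capable technology (Apache Solr), and the document names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical unstructured documents; no schema, just free text.
docs = {
    "doc1": "Hadoop stores unstructured data across a cluster",
    "doc2": "keyword search over unstructured text needs an index",
}

# Build an inverted index: term -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term):
    """Return the sorted list of documents containing the term."""
    return sorted(index.get(term.lower(), set()))

print(search("unstructured"))  # ['doc1', 'doc2']
print(search("cluster"))       # ['doc1']
```

Unlike a SQL WHERE clause over typed columns, this index has to be built from raw text with no schema to lean on — which is the essential difficulty of search over unstructured repositories.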

Some prominent online businesses, including:

  • Amazon
  • eBay
  • Facebook
  • Twitter
  • Spotify

have made major commitments to Hadoop.

Readers may want to review Who uses Hadoop? (http://wiki.apache.org/hadoop/PoweredBy) to familiarize themselves with the size of an average Hadoop implementation. Of course, very large repositories of data like these require a lot of CPU resources for processing. As the leading manufacturer of server CPUs, Intel benefits from all of this need for computing power, regardless of whether an organization implementing Hadoop runs it on Apple OS X, Ubuntu, or another Linux flavor. For each of these operating systems, the recommended hardware is Intel.

The tools offered by Cloudera for managing Hadoop data repositories (http://www.cloudera.com/content/cloudera/en/solutions/enterprise-solutions.html) are designed to provide enterprise businesses with familiar features and procedures. Since most of these enterprise data centers are already full of Intel hardware, Cloudera can be seen, perhaps, as another method Intel can leverage to maintain its position in those same installations.

What bearing does all of the above have on discussions about large data centers, the need for better power management, and the likelihood of hardware OEMs building solutions on the ARM architecture capturing substantial share? Given the importance of Hadoop to the leading cloud, IaaS vendor — Amazon — as well as to Microsoft Azure (http://azure.microsoft.com/en-us/solutions/big-data/?WT.mc_id=azurebg_us_sem_bing_br_solutions_nontest_bigdata&WT.srch=1), it doesn’t appear likely servers built on the ARM architecture will become the standard in these environments any time soon.

Further, Intel is certainly not standing still; it is working very actively to produce more power-efficient hardware in very small form factors. One can argue Microsoft’s Surface Pro 3, powered by an Intel Core i3, i5, or even i7, is a tangible example of how much progress Intel has made toward satisfying consumer appetite for power-thrifty, extremely thin computing devices.

Ira Michael Blonder (https://plus.google.com/108970003169613491972/posts?tab=XX?rel=author)

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved