19
Dec

Success Stories and Case Studies do serve a purpose for enterprise technology consumers

If ISVs with offerings targeted to enterprise computing markets needed any more indication of the importance of case studies and success stories, they likely got it in an article written by Elizabeth Dwoskin, published on December 16, 2014 on The Wall Street Journal’s web site.

The title of Dwoskin’s article is The Joys and Hype of Software Called Hadoop (http://www.wsj.com/articles/the-joys-and-hype-of-software-called-hadoop-1418777627?mod=LS1). The reason her article should alert any ISVs still in the dark about why they absolutely require a marketing communications effort that produces success stories and case studies can be found in the following quote:

  • “Yet companies that have tried to use Hadoop have met with frustration. Bank of New York Mellon used it to locate glitches in a trading system. It worked well enough on a small scale, but it slowed to a crawl when many employees tried to access it at once, and few of the company’s 13,000 information-technology workers had the expertise to troubleshoot it. David Gleason, the bank’s chief data officer at the time, said that while he was a proponent of Hadoop, ‘it wasn’t ready for prime time.’” (quoted in its entirety from Dwoskin’s article in the WSJ; I have provided a link to the full article above and encourage readers to spend some time with it)

This less-than-positive comment from a large enterprise consumer, BNY Mellon, can (and likely will) do a lot to encourage peers to look a lot closer at Hadoop before moving forward on an implementation.

Bottom line: enterprise businesses do not like to proceed where their peers have hit obstacles like the one Gleason recounts in his comment. Peer comparison is, arguably, a very important activity for enterprise business consumers. So ISVs working with Hadoop on big data offerings, or with NoSQL databases and related analytics, need to make the effort to queue up positive comments about customer experiences with their products.

I recently wrote a set of posts to this blog on Big Data, NoSQL, and JSON, and must admit to experiencing some difficulty finding the case studies and success stories I needed to gain a perspective on just how enterprise consumers have been using products presented as solutions for these computing trends. Hortonworks (http://www.hortonworks.com), on the other hand, is an exception. So I would encourage any readers after the same type of testimonial content about customer experience with products to visit Hortonworks on the web.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

18
Dec

A Microsoft Perspective on NoSQL and Document Databases

In November 2011, Julie Lerman wrote a post for Microsoft’s MSDN Magazine on document databases. The title of her post is What the Heck Are Document Databases? (http://msdn.microsoft.com/en-us/magazine/hh547103.aspx) Her post may provide business sponsors of NoSQL database projects with useful information about the notion of NoSQL, and, therefore, is recommended reading material.

What prompts me to recommend this post for business stakeholders in NoSQL projects (aka Gartner’s “Citizen Developers”) is the comparative lack of abstraction characterizing Lerman’s presentation. She quickly identifies document databases as one of several types of NoSQL databases (she also presents “key-value pair” databases and points to Azure Table Storage as an example). Here’s a great example of the simplicity of Lerman’s presentation of the notion of NoSQL: “The term is used to encompass data storage mechanisms that aren’t relational and therefore don’t require using SQL for accessing their data.”

For some business readers, even this short definition may be challenging. Just what does she mean by “data storage mechanisms that aren’t relational”? It would, perhaps, have been helpful for the audience I have targeted to add a sentence illustrating how rows and columns in tables, which are, de facto, “relational” components (or structure), actually offer users a method of storing information. Kind of like “I know where you are, therefore, dear data, you have been stored SOMEWHERE”.
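To make the contrast concrete, here is a minimal sketch, using only Python’s standard library, of the same customer record stored relationally (rows and columns declared up front) and as a schema-free document. The table, field, and value names are invented for illustration:

```python
import json
import sqlite3

# Relational storage: the "address" of every value is fixed by the table's
# rows and columns, so the schema has to be declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (1, "Acme Corp", "New York"))

# Document storage: the same record kept as a self-describing JSON document,
# with no table structure required up front, and with room for nesting.
document = {
    "_id": "customer-1",
    "name": "Acme Corp",
    "city": "New York",
    "contacts": [{"name": "J. Smith", "role": "CIO"}],  # no extra table needed
}
print(json.dumps(document, indent=2))
```

In the relational version the structure (the table) tells you where the data lives; in the document version the data carries its own structure along with it.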

But the business user is likely not Lerman’s intended audience. This post appears in Microsoft’s MSDN (Microsoft Developer Network) Magazine, so the intended audience, I would assume, is coders working with Microsoft tools (.NET, C#) via Visual Studio. Nevertheless, sections of the post (like the ones I’ve quoted above) are certainly worth a read by the audience I have in mind, as well.

Here’s more useful information. As I wrote last week, the definition of NoSQL, “Not Only Structured Query Language”, is a useful text string to keep in mind when grappling with hype about “radically different” approaches to managing data, or “getting rid of” relational databases. Back in November 2011, when Lerman published her post, she drilled down into defining the NoSQL acronym, too, by pointing her readers to a post by Brad Holt of the CouchDB (http://couchdb.apache.org/) project. The title of Holt’s post is Addressing the NoSQL Criticism (http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/), which he handles by noting “First, NoSQL is a horrible name. It implies that there’s something wrong with SQL and it needs to be replaced with a newer and better technology. If you have structured data that needs to be queried, you should probably use a database that enforces a schema and implements Structured Query Language. I’ve heard people start redefining NoSQL as “not only SQL”. This is a much better definition and doesn’t antagonize those who use existing SQL databases. An SQL database isn’t always the right tool for the job and NoSQL databases give us some other options.” (this quote is excerpted from Brad Holt’s post; I’ve provided a link to the complete post and encourage readers to read it in its entirety.)

So if you need to get a good understanding about the Document Database type of NoSQL structure, I recommend reading Lerman and Holt’s posts.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

17
Dec

Google Debuts Cloud Dataflow at Google I/O 2014

At the end of a keynote webcast from Google I/O 2014 running well over two and a half hours (https://www.google.com/events/io#wtLJPvx7-ys) can be found the debut of Google Cloud Dataflow, Google’s replacement for MapReduce. Readers unfamiliar with MapReduce, but avidly interested in the big data enterprise computing trend, need to understand MapReduce as the data processing model at the foundation of today’s Apache Hadoop project. Without MapReduce, the Apache Hadoop project would not exist. So Google MapReduce is worth some study, as is Cloud Dataflow.
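For readers who have never seen the model, here is a minimal, single-process sketch of the map/reduce idea that MapReduce popularized and Hadoop implements at cluster scale; on a real cluster the map and reduce phases run in parallel across many servers, and the word-count example below is only the conventional teaching case:

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in a document."""
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["big data needs big clusters", "hadoop grew out of mapreduce"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 1, 'needs': 1, ...}
```

The value of the model is that the map and reduce steps are independent enough to be spread across a cluster of commodity servers, which is exactly the architecture Hadoop provides.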

But wait, there’s more. As Urs Hölzle, Senior Vice President, Technical Infrastructure, introduces Google Cloud Dataflow, his audience is also informed about Google’s role in the creation of another of today’s biggest enterprise data analytics approaches: NoSQL (“Not only SQL”). He casually informs his audience (the segue is a simple “by the way”) that Google invented NoSQL.

I hope readers will get a feel for where I’m headed with these comments on Google’s historical role in the creation of two of the very big trends in enterprise computing in late 2014. I’m perplexed as to why Google would bury this presentation at the very end of the keynote. Why would Google prefer to cover its pioneering role in these very hot computing trends with a thick fog? Few business decision-makers, if any, will be likely to pierce this veil of obscurity as they search for best-in-class methods of incorporating clusters of servers in a parallel processing role (in other words, “big data”) to better address the task of analyzing text data scraped from the web pages of corporate sites (“NoSQL”).

On the other hand, I’m also impressed by the potential upside Google can realize by removing this fog. Are they likely to move in this direction? I think they are, based upon some of the information they reported to the U.S. SEC in their most recent 10-Q filing, for Q3 2014. Year over year, the “Other Revenues” segment of Google’s revenue stream grew by roughly 50%, from $1,230 (in millions) in 2013 to $1,841 in 2014. Any and all revenue Google realizes from Google Cloud and its related components (which, by the way, include Cloud Dataflow) is included in this “Other Revenues” segment of the report. For the nine months ending September 30, 2014, the same revenue segment increased from $3,325 in 2013 to $4,991 in 2014. Pretty impressive stuff, and not likely to diminish with a revamped market message powering “Google for Work”, and Amit Singh (late of Oracle) at the head of the effort.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

15
Dec

Who’s losing sleep over NoSQL?

One of the biggest challenges facing product marketing within any business is successfully identifying a market segment. I would argue more businesses fail because they either:

  1. don’t understand their market niche, or
  2. can’t articulate a message intelligible to their market niche.

The next step is to put together a portrait of an ideal prospect within this segment. Over time, if a business is lucky enough to succeed, this portrait will likely change (perhaps scale is a better word). After all, early adopters will spread the word to more established prospects. The latter are more conservative, and proceed at a different pace, based upon different triggers.

The three steps I’ve just identified (understand the niche, articulate a message for it, and build the prospect portrait) are no less a mandatory path forward for early stage ISVs than they are for restaurants, convenience stores, or any other early stage business.

But a lot of the marketing collateral produced by early stage ISVs offering NoSQL products and solutions, in my opinion, doesn’t signal a successful traverse of this path. In an interview published on December 12, 2014, Bob Wiederhold, CEO of Couchbase, presents the first and second phases of what he refers to as “NoSQL database adoption” by businesses. Wiederhold’s comments are recorded in an article titled Why 2015 will be big for NoSQL databases: Couchbase CEO (http://www.zdnet.com/article/why-2015-will-be-big-for-nosql-databases-couchbase-ceo/).

My issue is with Wiederhold’s depiction of the first adopters of NoSQL Databases: “Phase one started in 2008-ish, when you first started to see commercial NoSQL products being available. Phase one is all about grassroots developer adoption. Developers would go home one weekend, and they’ll have heard about NoSQL, they download the free software, install it, start to use it, like it, and bring it into their companies”.

But it’s not likely these developers would have brought the software to their companies unless somebody was losing sleep over some problem. Nobody wants to waste time trying something new simply because it’s new. No insomnia, no burning need for a remedy. What I needed to hear about was just what was causing these early adopters to lose sleep.

I’m familiar with the group of developers Wiederhold portrays in the above quote. I’ve referred to them differently for other software products I’ve marketed. These people are the evangelists who spread the word about a new way of doing something. They are the champions. Any adoption campaign has to target this type of person.

But what’s missing is a portrait of the tough, mission-critical problem driving these people to make their effort with a new and largely unknown piece of software.

It’s incumbent on Couchbase and its peers to do a better job, in their marketing communications and public relations efforts, of depicting the type of organization with a desperate need for a NoSQL solution.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

12
Dec

The job of classifying large amounts of text data becomes easier with JSON

The final cloud-like computing theme contributing to the unfortunate fog around the notion of “big data” is JSON (http://www.json.org/). In my opinion, enterprise consumers of big data solutions built with NoSQL databases aren’t going to be able to connect the dots from the presentation on the JSON open-source project homepage.

More intelligible information about JSON for the non-programmer can be found on the web site of the Apache CouchDB project (http://couchdb.apache.org/). “CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents and query your indexes with your web browser, via HTTP” (quoted from the opening editorial content published on the site). Querying indexes with your web browser, hmmm . . . might this have something to do with Chrome’s Omnibox (https://www.google.com/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=omnibox%20json)? In fact, as any reader following the link just provided will note, it does.
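To make the “store JSON documents, access them via HTTP” idea concrete, here is a minimal sketch assuming Python with the requests library and a CouchDB instance running locally on its default port; the database and document names are hypothetical:

```python
import requests

COUCH = "http://localhost:5984"   # assumes a local CouchDB instance
DB = "marketing_notes"            # hypothetical database name

# Create a database, then store a JSON document in it -- both via plain HTTP.
requests.put(f"{COUCH}/{DB}")
requests.put(
    f"{COUCH}/{DB}/note-001",
    json={"author": "LoB analyst", "tags": ["hadoop", "nosql"], "body": "Q3 notes"},
)

# Read the document back. The same URL works from a browser's address bar,
# which is the point the CouchDB home page is making.
doc = requests.get(f"{COUCH}/{DB}/note-001").json()
print(doc["tags"])
```

Everything here is ordinary HTTP traffic, which is why a web browser, or any line-of-business tool that can issue a web request, can get at the data.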

With this flexibility in mind, enterprise computing consumers may find more of a rationale for calling for the implementation of databases that store JSON, which lend themselves to analytics built with NoSQL tools. If the process of collecting data on some aspect of a business process can be reduced to little more than punching some keywords into Chrome’s Omnibox (a version of which is now available for Firefox and Internet Explorer), then Lines of Business (LoBs) can count on their personnel getting to the data they need, when they need it, from any device (mobile, desktop, or laptop), without the need for a proprietary solution.

Pretty cool. The cool factor increases when one reads more about the CouchDB project. JSON represents an alternative to XML, which requires substantially more verbosity (meaning more markup) to represent the same information. Heavier payloads contribute to a slower web, where pages can take forever to load. So the comparatively lighter weight promised by using JSON to exchange data makes a lot of sense. The intention of JSON and XML is the same, namely to provide a method of data exchange (http://www.idealware.org/articles/data_exchange_alpha_soup.php).
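A small illustration of the verbosity difference, using only Python’s standard library; the record itself is invented for the example:

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "Acme Corp", "city": "New York", "tickets_open": 3}

# The same record serialized as JSON...
as_json = json.dumps(record)

# ...and as XML, where every field needs an opening and a closing tag.
root = ET.Element("customer")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
print(len(as_json), len(as_xml))  # the XML payload comes out noticeably longer
```

The difference looks trivial for one record, but multiplied across millions of documents, or across every page a browser loads, the lighter format adds up.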

JSON data is stored as “JSON documents”. Here’s an example of what IBM is doing with JSON: Search JSON documents with Informix (https://www.ibm.com/developerworks/community/blogs/idsdoc/entry/search_json_documents_with_informix?lang=en).

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

11
Dec

The NoSQL notion suffers from some of the same ambiguity plaguing the notion of big data

Readers interested in finding out what NoSQL is all about will benefit from simply developing some familiarity with the definition of the acronym. NoSQL stands for “not only SQL”. I found this definition very helpful, as it corrected my first misunderstanding about the notion. I thought NoSQL referred to a set of software tools designed to work with text (document) databases lacking the columnar table structure their Structured Query Language (SQL) siblings thrive upon.

But my understanding was wrong, and, unfortunately for businesses championing a NoSQL approach, the same misunderstanding may apply to a lot of the enterprise user segment of the enterprise computing market for NoSQL analytics and the tools required for their delivery. MongoDB (http://www.mongodb.com/nosql-explained) is an example of a database built on the NoSQL approach.

But, as the cliché goes, “the best of intentions” can go astray, as is the case, in my opinion, for the MongoDB definition. The average consumer of enterprise computing solutions built to work with social media conversations culled from lots of web pages, likely a chief marketing officer for a popular consumer brand name, isn’t likely to be able to understand how “Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents” (quoted from the MongoDB web page presentation).
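For that reader, a concrete example may do more than the definition. Here is a minimal sketch of a single document of the kind MongoDB stores, written as a plain Python dictionary; the field names and values are invented for illustration:

```python
# One "document" in the MongoDB sense, expressed as an ordinary Python dict.
campaign_document = {
    "brand": "Acme Snacks",                        # a key-value pair
    "channels": ["twitter", "facebook", "blogs"],  # a key-array pair
    "sentiment": {                                 # a nested document
        "positive": 0.62,
        "negative": 0.11,
        "sample_size": 14500,
    },
}

# Reading a value out of the nested structure.
print(campaign_document["sentiment"]["positive"])
```

The entire record about one marketing campaign travels together as a single document, instead of being split across several relational tables that have to be joined back together at query time.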

Further, characterizing the choice facing the enterprise consumer as either “RDBMS” or “non-RDBMS” isn’t going to be helpful if the literal definition of the NoSQL acronym is applied. As MapR points out on its web site, an optimum approach to implementing NoSQL analytics is to combine SQL with text query tools built on JSON components to digest the same data, which may, admittedly, be incorporated into a MongoDB database, but which came, originally, from an RDBMS.

What’s even more surprising about the page on the MongoDB website is the light it sheds on a programming effort by a much larger, and much more mature, ISV, namely Microsoft: “Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB”. Hmmm . . . Now “Office Graph”, the technology underpinning “Delve”, makes a lot more sense.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

10
Dec

Use Hadoop to collect and analyze data with clusters of computer servers

Customers with large amounts of data, who are capable of supporting a distributed server architecture built as clusters, can benefit from a decision to implement Apache Hadoop® as the solution. The key operative principle is the notion of clusters. Readers eager to learn more about this benefit may want to take a few moments to review a short animation, titled Hadoop* Server Clusters with 10 Gigabit Intel® Ethernet, which is available for public viewing on a web site published by Intel.

I’m not recommending the video for the presentation of Intel’s high-speed gigabit networking hardware; that segment takes up approximately the last one to two minutes of the animation. But the opening section does more than most of the hype otherwise available online about “big data” to show viewers how Apache Hadoop is uniquely capable of adding value to data management and analytics architectures built over comparatively lower-cost server hardware.

For readers looking for even more help drilling down to just what the value-add may amount to should a decision be made to implement Hadoop, a quick visit to a page on the MapR web site titled What is Apache™ Hadoop®? (https://www.mapr.com/products/apache-hadoop) will likely be worth the effort. The short presentation on the page, in my opinion, provides useful information about why clusters of servers are uniquely capable of serving as the repository for an enormous number of web pages filled with information.

Certainly market consumers have opted to implement Hadoop for many purposes beyond its original raison d’être as an evolution of “a new style of data processing known as MapReduce” (which was developed by Google), as the MapR presentation points out. These implementations provide a lot of the support for arguments for the notion of “big data”, at least the ones short on hype and long on sensibility.

What’s missing from the MapR presentation are customer success stories and case studies. Fortunately, anyone looking for this type of descriptive content on just how real-life businesses can benefit from an implementation of Hadoop can simply visit a page of the Hortonworks web site titled They Do Hadoop (http://hortonworks.com/customers/) and watch some of the videos.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

9
Dec

Hadoop attracts support from Microsoft and Intel

The Apache Hadoop project (http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F) “develops open-source software for reliable, scalable, distributed computing” (quoted from the “What is Apache Hadoop?” section of the site). So it makes sense for Microsoft and Intel to enthusiastically support the project. Microsoft is deeply committed to its cloud IaaS effort, Azure (http://www.azure.com), and one of the prime revenue generators for Intel is its Data Center business (http://www.intel.com/content/www/us/en/search.html?keyword=data%20center). Azure and Intel’s Data Center business are both all about lots and lots of computer servers. The former consumes servers, while the latter provides the CPUs driving them.

As I wrote in the previous post to this blog, it’s likely a majority of the enterprise consumer segment of the tech reader community maintains a questionable understanding of the notion of “big data”. But, when the notion is correctly understood, it should not be a stretch for readers to see why the Apache Hadoop project (or its OpenStack (http://www.openstack.org) competitor) is positioned at the very core of this technology trend.

Microsoft and Intel are not the only mature ISVs looking to benefit from big data. IBM and EMC are two other champions with solutions on the market to add value for enterprises looking to implement Hadoop.

Intel ostensibly understands the ambiguity of the notion of “big data”, and the imperative of providing the enterprise business consumer with a clearer understanding of just what this buzzword is really all about. A section of the Intel web site, titled Big Data, What It Is, Why You Should Care, and How Companies Gain Competitive Advantage (http://www.intel.com/content/www/us/en/big-data/big-data-101-animation.html), is an attempt to provide this information.

But Intel’s effort to educate the consumer, in my opinion, falls into the same swamp as a lot of the other hype before it can deliver on its promise. The amount of data may be growing exponentially, as the opening of the short Intel animation on the topic contends, but there are a lot of mature ISVs (Oracle, IBM, Microsoft, etc.) with relational database management systems, designed for pricey, big server hardware, which are capable of providing a columnar structure for the data.

Even when “unstructured data” is mentioned, the argument is shaky. There are solutions for enterprise consumers, like Microsoft SharePoint (specifically, the Term Store service), which are designed to provide a method of effectively pouring text data into an RDBMS, for example SQL Server (the terms are held in SQL Server and are used to tag the text strings identified in unstructured data).
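As a conceptual sketch only (this is not how SharePoint or the Term Store is implemented; the table and term names are invented, and sqlite3 stands in for SQL Server), term-based tagging of unstructured text looks roughly like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (term TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE tagged_text (snippet TEXT, term TEXT)")
conn.executemany("INSERT INTO terms VALUES (?)", [("hadoop",), ("nosql",), ("json",)])

# "Unstructured" text is scanned for managed terms, and each hit becomes a row
# in a relational table, giving the text a queryable structure after the fact.
snippets = ["Our team piloted Hadoop last quarter", "The vendor pitched a NoSQL store"]
terms = [row[0] for row in conn.execute("SELECT term FROM terms")]
for snippet in snippets:
    for term in terms:
        if term in snippet.lower():
            conn.execute("INSERT INTO tagged_text VALUES (?, ?)", (snippet, term))

print(conn.execute("SELECT snippet, term FROM tagged_text").fetchall())
```

Once the tags exist, ordinary SQL queries can answer questions about text that started out with no structure at all, which is the point the paragraph above is making.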

I am not arguing for the sole use of traditional RDBMSs with SQL tools to manage a data universe experiencing exponential growth. Rather, I think big data proponents (and Hadoop champions) need to perform a closer study of what the real benefits of clustering servers are, and then articulate the message for their enterprise computing audience.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

8
Dec

Comments on some of the ambiguity about the notion of big data

A number of tech markets, including enterprise computing, cloud, SaaS, PaaS, IaaS, and IoT, have demonstrated a voracious appetite for data management and analysis. Anyone following data management technology may get lost in the notion of “big data”.

I say lost, as an enormous amount of hype has been built up around the “theme” of “big data.” But a lot of long-standing data management methods, namely relational database management systems (RDBMSs) with a columnar architecture built to provide structure to data, work really well for ostensibly enormous amounts of information (meaning data). Readers may want to consider efforts like the Port Authority of New York and New Jersey and the toll road system it manages. How many millions of vehicle transactions occur on a monthly basis? In turn, how many billions of bits of data does the history of vehicle transactions through toll machines represent? Has this enormous amount of data proven to be unmanageable?

The answers to each of the questions just presented all support an argument for RDBMSs and Structured Query Language (SQL) as a useful method of working with enormous amounts of data. These questions and answers echo across a very wide range of applications: for example, the purview of the U.S. National Weather Service, or the universe of drugs managed by the U.S. Food and Drug Administration.

So there is nothing inherently radical about the notion of “big data”, at least if the notion is correctly understood as merely the set of methods commonly in use to manage data. In fact, and this is where, in my opinion, commentator hyperbole has clouded the whole question of just what is changing, in a truly radical way, about data management methods: the notion of big data is NOT correctly understood as I’ve just presented it. The “big” piece of “big data” appears to have been meant to represent a scalable data management architecture (best typified by Apache Hadoop (http://hadoop.apache.org)). Anyone reading the presentation on the Hadoop web site can’t help but understand the role clusters of servers play in Hadoop as a solution. Clusters of servers, in turn, provide the foundation upon which the Apache project has built Hadoop.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved