7
Feb

NoSQL is, for better or worse, inevitable

The following comments are based on a literal definition of the NoSQL acronym, “Not Only SQL”. Readers are advised not to interpret my comments as an endorsement of “NoSQL databases” (MongoDB, DocumentDB, etc.).

A lot has been written over the last few months on the promise (or the illusion of one) represented by NoSQL databases. Much of this commentary focuses on the experience of enterprise consumers who have failed to obtain the results they expected from their efforts to implement a new approach to addressing data and working with it. The consistent thread running through these presentations is an assessment of the quality of the technology: not yet ready for prime time. For readers not familiar with this debate, a recent research report from Forrester claimed 42% of enterprise consumers of off-the-shelf “NoSQL” databases are challenged by them. The Forrester report is referenced in an article titled Database drama: Relational or NoSQL? How to find the best choice for you.

Perhaps this assessment is accurate. But what if it really doesn’t matter? What if these consumers have no choice but to use approaches other than SQL alone to get at the results they require? In 2015, for prominent consumer brands, this is the case. Just 20 years ago Procter and Gamble, Clorox, Church & Dwight and their peers all looked to television, radio and print advertising as their promotional playgrounds. Nielsen, Harris and other polling organizations could service this big business market segment with periodic reports, data visualizations, and even predictions produced by algorithms.

But in 2015 retail customers find their entertainment content online. Over-the-top video does not look to be leaving the scene anytime soon. Cloud SaaS social media options continue to attract their interest and speak to their needs with greater accuracy, based on personalization technology already in use almost everywhere.

So how does Procter and Gamble crunch these numbers? Do they collect online chatter into columnar database structures for processing via SQL queries? Not likely. In fact, it is highly unlikely the Procter and Gambles of the world are even touching online chatter any more. It makes more sense for them to simply consume the predictive product offered by Facebook and/or another social media ISV. Sure, they will likely look to Oracle, Microsoft, SAP and IBM to run the operation, because those vendors have the on-premises infrastructure and RDBMS repositories big consumer brands still need to combine with the massive volume of unstructured data their promotional efforts are producing in the cloud. But without NoSQL methods of addressing so-called “dark data”, it is not likely we would be seeing Twitter, Facebook and LinkedIn reporting the kind of increases in revenue, and even profit, announced over the last couple of weeks.

Here is another important point to consider when evaluating whether NoSQL data structures make sense as a long-term solution for big business: Twitter, Facebook, LinkedIn, Google, Amazon and Microsoft have all developed their own version of big data solutions, meaning clusters of servers in a peer computing architecture. Google claims to have invented NoSQL as a method of addressing lots of data. Microsoft has DocumentDB. They are all using analytics developed for unstructured data, along with SQL, to produce the business intelligence the brands need to survive.

Until another medium emerges to challenge online content publishing over Ethernet networks with variants of hypertext, NoSQL is simply inevitable.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2015 All Rights Reserved

23
Jan

SQL remains a useful foundation for building tools to analyze data

A lot of editorial content about data, and tools built for data analysis, includes a Pavlovian association between “big data” and “modern” computing. Relational database approaches, which address data in a form built to accommodate analysis with Structured Query Language (SQL) tools, are treated as dated and somehow behind the times.

But much of this content fails to inform readers about just how these “modern” computing systems actually work. For better or worse, relational databases, which provide a structure (perhaps backbone would be a better word) for information, are indispensable at some point in the process of analyzing electronic information (data).

As a rule of thumb, the best examples of editorial content written on topics relevant to this subject incorporate some respect for SQL. David Chappell of Chappell & Associates has written a white paper for Microsoft, titled Introducing DocumentDB: A NoSQL Database for Microsoft Azure, which follows this route. Chappell writes: “To support applications with lots of users and lots of data, DocumentDB is designed to scale: a single database can be spread across many different machines . . . . DocumentDB also provides a query language based on SQL, along with the ability to run JavaScript code directly in the database as stored procedures and triggers with atomic transactions.”

From Chappell’s description it should be clear DocumentDB has been built to replicate some of the core planks of Relational Database Management System (RDBMS) best practices. These certainly include SQL tools, along with stored procedures and triggers. Enterprise consumers of RDBMS and/or NoSQL collections of data will approve of the end of Chappell’s sentence: “atomic transactions”. This phrase provides these readers with an important assurance: DocumentDB has been built with ACID (“Atomicity, Consistency, Isolation and Durability”) transaction processing in mind. ACID data communications are the floor supporting today’s commercial-quality electronic transactions. Without an ACID-compliant structure on both sides of a commerce transaction, businesses are not likely to exchange information. The negative ramifications of such a condition are great, so “modern” best practices have been built with ACID compliance as a given.
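To make the “atomic” piece of ACID concrete, here is a minimal sketch using Python’s built-in sqlite3 module rather than DocumentDB itself. The accounts table, the account names and the transfer helper are all hypothetical, invented only for illustration. The point is simply that both updates in a transfer take effect together, or neither does.

```python
import sqlite3

# Hypothetical two-account ledger, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("buyer", 100), ("seller", 0)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back along with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "buyer", "seller", 40)       # succeeds
failed = transfer(conn, "buyer", "seller", 500)  # overdraws, so it is undone
print(ok, failed, balances(conn))
```

The second transfer credits the seller before the debit fails, yet the seller’s balance is unchanged afterwards. That all-or-nothing behavior is what a business on either side of a transaction is counting on.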

Unfortunately, non-relational database systems are challenged to demonstrate ACID compliance. This fact is not lost on Chappell. The white paper he has written for Microsoft balances big data, NoSQL, SQL and RDBMS concepts in a coherent presentation. In my opinion other technical writers would benefit from his approach. I suspect Chappell’s success is a direct result of his technical understanding of how these systems actually work.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2015 All Rights Reserved

18
Dec

A Microsoft Perspective on NoSQL and Document Databases

In November, 2011, Julie Lerman wrote a post for Microsoft’s MSDN Magazine on Document Databases. The title of her post is What the Heck Are Document Databases? Her post may provide business sponsors of NoSQL database projects with useful information about the notion of NoSQL, and is therefore recommended reading material.

What prompts me to recommend this post for business stakeholders in NoSQL projects (aka Gartner’s “Citizen Developers”) is the comparative lack of abstraction characterizing Lerman’s presentation. She quickly identifies document databases as one of several types of NoSQL databases (she also presents “key-value pair” databases and points to Azure Table Storage as an example). Here’s a great example of the simplicity of Lerman’s presentation of the notion of NoSQL: “The term is used to encompass data storage mechanisms that aren’t relational and therefore don’t require using SQL for accessing their data.”

For some business readers even this short definition may be challenging. Just what does she mean by “data storage mechanisms that aren’t relational”? It would, perhaps, have helped the audience I have in mind to add a sentence illustrating how rows and columns in tables, which are the de facto “relational” components (or structure), actually offer users a method of storing information. Kind of like “I know where you are, therefore, dear data, you have been stored SOMEWHERE”.
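For readers who want to see what “rows and columns in tables” looks like in practice, here is a small sketch using Python’s built-in sqlite3 module. The customers and orders tables, and the names in them, are made up for illustration. Each fact lives in a known row and column, and the “relational” part is the shared key that lets SQL connect one table to another.

```python
import sqlite3

# Hypothetical tables, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 8.5);
    """
)

# The join follows customer_id from each order row back to its customer row.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

for name, total in rows:
    print(name, total)
```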

But the business user is likely not Lerman’s intended audience. This post appears in Microsoft’s MSDN (Microsoft Developer Network) Magazine, so the intended audience, I would assume, is coders working with Microsoft tools (.NET, C#) via Visual Studio. Nevertheless, sections of the post (like the ones I’ve quoted above) are certainly worth a read by the audience I have in mind as well.

Here’s more useful information. As I wrote last week, the definition of NoSQL, “Not Only Structured Query Language”, is a useful text string to keep in mind when grappling with hype about “radically different” approaches to managing data, or “getting rid of” relational databases. Back in November, 2011, when Lerman published her post, she drilled down into defining the NoSQL acronym, too, by pointing her readers to a post by Brad Holt of the CouchDB project. The title of Holt’s post is Addressing the NoSQL Criticism, which he handles by noting “First, NoSQL is horrible name. It implies that there’s something wrong with SQL and it needs to be replaced with a newer and better technology. If you have structured data that needs to be queried, you should probably use a database that enforces a schema and implements Structured Query Language. I’ve heard people start redefining NoSQL as “not only SQL”. This is a much better definition and doesn’t antagonize those who use existing SQL databases. An SQL database isn’t always the right tool for the job and NoSQL databases give us some other options.” (This quote is excerpted, in its entirety, from Brad Holt’s post. I’ve provided a link here to the complete post and encourage readers to read the post in its entirety.)

So if you need a good understanding of the Document Database type of NoSQL structure, I recommend reading Lerman’s and Holt’s posts.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

17
Dec

Google Debuts Cloud Dataflow at Google I/O 2014

The debut of Google Cloud Dataflow, the replacement for Google MapReduce, can be found at the end of the 2.5-hour-plus webcast of the keynote presentation from Google I/O 2014. Readers unfamiliar with MapReduce, but avidly interested in the big data enterprise computing trend, need to understand MapReduce as the application at the foundation of today’s Apache Hadoop project. Without MapReduce, the Apache Hadoop project would not exist. So Google MapReduce is a software package worth some study, as is Cloud Dataflow.

But wait, there’s more. As Urs Hölzle, Senior Vice President, Technical Infrastructure, introduces Google Cloud Dataflow, his audience is also informed about Google’s role in the creation of another of today’s biggest enterprise data analytics approaches: NoSQL (“Not only SQL”). He casually informs his audience (the segue is a simple “by the way”) that Google invented NoSQL.

I hope readers will get a feel for where I’m headed with these comments on Google’s historical role in the creation of two of the very big trends in enterprise computing in late 2014. I’m perplexed at why Google would, literally, bury this presentation at the very end of the keynote. Why would Google prefer to cover its pioneering role in these very hot computing trends with a thick fog? Few business decision-makers, if any, will be likely to pierce this veil of obscurity as they search for best-in-class methods of incorporating clusters of servers in a parallel processing role (in other words “big data”) to better address the task of analyzing text data scraped from web pages for corporate sites (“NoSQL”).

On the other hand, I’m also impressed by the potential upside Google can realize by removing this fog. Are they likely to move in this direction? I think they are, based upon some of the information they reported to the U.S. SEC in their most recent 10-Q filing for Q3 2014. Year-over-year, the “Other Revenues” segment of Google’s revenue stream grew by roughly 50%, from $1,230 (in millions) in 2013 to $1,841 in 2014. Any and all revenue Google realizes from Google Cloud and its related components (which, by the way, include Cloud Dataflow) is included in this “Other Revenues” segment of the report. For the nine months ending September 30, 2014, the same revenue segment increased from $3,325 in 2013 to $4,991 in 2014. Pretty impressive stuff, and not likely to diminish with a revamped market message powering “Google at Work”, and Amit Singh (late of Oracle) at the head of the effort.
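For readers who want to check the arithmetic, the growth rates can be worked out from the four figures quoted above. The helper name yoy_growth is my own, and the inputs are simply the numbers as cited in this post.

```python
def yoy_growth(prior, current):
    """Return year-over-year growth as a percentage of the prior figure."""
    return (current - prior) / prior * 100


# "Other Revenues" figures as quoted above from the Q3 2014 filing.
q3_growth = yoy_growth(1230, 1841)           # third quarter, 2013 vs 2014
nine_month_growth = yoy_growth(3325, 4991)   # nine months ending September 30

print(f"Q3: {q3_growth:.1f}%  nine months: {nine_month_growth:.1f}%")
```

Both periods come out at right around 50%, so the growth was consistent through the first nine months of the year and not a one-quarter spike.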

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

15
Dec

Who’s losing sleep over NoSQL?

One of the biggest challenges facing product marketing within any business is successfully identifying a market segment. I would argue many businesses fail because they either:

  1. don’t understand their market niche, or
  2. can’t articulate a message intelligible to their market niche.

The next step is to put together a portrait of an ideal prospect within this segment. Over time, if a business is lucky enough to succeed, this portrait will likely change (perhaps scale is a better word). After all, early adopters will spread the word to more established prospects. The latter are more conservative, and proceed at a different pace, based upon different triggers.

The 3 steps I’ve just identified are no less a mandatory path forward for early stage ISVs than they are for restaurants, convenience stores, or any other early stage business.

But a lot of the marketing collateral produced by early stage ISVs offering NoSQL products and solutions, in my opinion, doesn’t signal a successful traverse of this path. In an interview published on December 12, 2014, Bob Wiederhold, CEO of Couchbase, presents the first and second phases of what he refers to as “NoSQL database adoption” by businesses. Wiederhold’s comments are recorded in an article titled Why 2015 will be big for NoSQL databases: Couchbase CEO.

My issue is with Wiederhold’s depiction of the first adopters of NoSQL Databases: “Phase one started in 2008-ish, when you first started to see commercial NoSQL products being available. Phase one is all about grassroots developer adoption. Developers would go home one weekend, and they’ll have heard about NoSQL, they download the free software, install it, start to use it, like it, and bring it into their companies”.

But it’s not likely these developers would have brought the software to their companies unless somebody was losing sleep over some problem. Nobody wants to waste time trying something new simply because it’s new. No insomnia, no burning need to get a good night’s rest. What I needed to hear about was just what was causing these early adopters to lose sleep.

I’m familiar with the group of developers Wiederhold portrays in the above quote. I’ve referred to them differently for other software products I’ve marketed. These people are the evangelists who spread the word about a new way of doing something. They are the champions. Any adoption campaign has to target this type of person.

But what’s missing is a portrait of the tough, mission-critical problem driving these people to make their effort with a new, and largely unknown piece of software.

It’s incumbent on Couchbase and its peers to do a better job, in their marketing communications and public relations efforts, of depicting the type of organization with a desperate need for a NoSQL solution.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

11
Dec

The NoSQL notion suffers from some of the same ambiguity plaguing the notion of big data

Readers interested in finding out what NoSQL is all about will benefit from simply developing some familiarity with the definition of this acronym. NoSQL stands for “not only SQL”. I found this definition very helpful, as it corrected my first misunderstanding about this notion. I thought NoSQL referred to a set of software tools designed to work with text or document databases lacking the columnar table structure their Structured Query Language (SQL) siblings thrive upon.

But my understanding was wrong. Unfortunately for businesses championing a NoSQL approach, the same may be true of much of the enterprise user segment of the market for NoSQL analytics and the tools required for their delivery. MongoDB is an example of a database built to conform to NoSQL.

But as the cliche goes, “the best of all intentions” can go astray, as is the case, in my opinion, for the MongoDB definition. The average consumer of enterprise computing solutions built to work with social media conversations culled from lots of web pages, likely a chief marketing officer for a popular consumer brand name, isn’t likely to be able to understand how “Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents” (quoted from the MongoDB web page presentation).
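A picture may help with that quoted definition. The sketch below uses plain Python and JSON, not MongoDB itself, and every field name and value in it is invented for illustration. It shows one “document” holding simple key-value pairs, a key-array pair, and nested documents.

```python
import json

# A hypothetical social media record; all names and values are made up.
document = {
    "_id": "post-001",                      # key-value pair
    "author": "example_user",               # key-value pair
    "tags": ["coffee", "review"],           # key-array pair
    "brand": {                              # nested document
        "name": "ExampleBrand",
        "category": "beverages",
    },
    "replies": [                            # array of nested documents
        {"author": "another_user", "text": "Agreed"},
    ],
}

# Documents of this kind are commonly exchanged as JSON text.
serialized = json.dumps(document)
restored = json.loads(serialized)

brand_name = restored["brand"]["name"]
first_reply_author = restored["replies"][0]["author"]
print(brand_name, first_reply_author)
```

Notice there is no table definition anywhere. The structure travels with each record, which is the practical difference from rows and columns.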

Further, characterizing the choice facing the enterprise consumer as either “RDBMS” or “non RDBMS” isn’t going to be helpful if the literal definition of the NoSQL acronym is applied. As MapR points out on its web site, an optimum approach to implementing NoSQL analytics is to combine SQL and text query tools built with JSON components to digest the same data, which, admittedly, may be incorporated into a MongoDB database, but came originally from an RDBMS.
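Here is one way to picture the “not only SQL” combination. This is a rough sketch using Python’s built-in sqlite3 and json modules as stand-ins, not MapR’s or MongoDB’s actual tooling. The products table, the SKUs and the mention records are all hypothetical. Product names come from a SQL query, sentiment comes from JSON documents, and the two are joined on a shared key.

```python
import json
import sqlite3

# Relational side: a hypothetical product catalog queried with SQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT);
    INSERT INTO products VALUES ('A1', 'Detergent'), ('B2', 'Toothpaste');
    """
)

# Document side: hypothetical social media mentions stored as JSON text.
raw_mentions = [
    '{"sku": "A1", "text": "works great", "sentiment": 1}',
    '{"sku": "A1", "text": "too pricey", "sentiment": -1}',
    '{"sku": "B2", "text": "love it", "sentiment": 1}',
    '{"sku": "A1", "text": "smells nice", "sentiment": 1}',
    '{"sku": "B2", "text": "fresh taste", "sentiment": 1}',
]

# Total the sentiment per SKU from the documents.
scores = {}
for line in raw_mentions:
    doc = json.loads(line)
    scores[doc["sku"]] = scores.get(doc["sku"], 0) + doc["sentiment"]

# Combine: SQL supplies the product names, the documents supply the scores.
report = [
    (name, scores.get(sku, 0))
    for sku, name in conn.execute("SELECT sku, name FROM products ORDER BY sku")
]
print(report)
```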

What’s even more surprising about the page on the MongoDB website is the light it sheds on a programming effort by a much larger, and much more mature, ISV, namely Microsoft: “Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB”. Hmmm . . . Now “Office Graph”, which is the predecessor of “Delve”, makes a lot more sense.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved

8
Dec

Comments on some of the ambiguity about the notion of big data

A number of tech markets, including enterprise computing, cloud, SaaS, PaaS, IaaS and IoT have demonstrated a voracious appetite for data management and analysis. Anyone following data management technology may get lost in the notion of “big data”.

I say lost, as an enormous amount of hype has been built up around the “theme” of “big data.” But a lot of long-standing data management methods, such as relational database management systems (RDBMS) with a columnar architecture built to provide structure to data, work really well for, ostensibly, enormous amounts of information (meaning data). Readers may want to consider efforts like the Port Authority of New York and New Jersey, and the toll road system it manages. How many millions of vehicle transactions occur on a monthly basis? In turn, how many billions of bits of data does the history of vehicle transactions through toll machines represent? Has this enormous amount of data proven to be unmanageable?

The answers to the questions just presented all support an argument for RDBMS and Structured Query Language (SQL) as a useful method of working with enormous amounts of data. These questions and answers echo across a very wide range of applications: for example, the purview of the U.S. National Weather Service, or the universe of drugs managed by the U.S. Food and Drug Administration.

So there is nothing inherently radical about the notion of “big data”, at least if the notion is correctly understood as merely the set of methods commonly in use to manage data. In fact, the notion of big data is NOT correctly understood as I’ve just presented it, and this is where, in my opinion, commentator hyperbole has clouded the whole question of just what is changing, in a truly radical way, about data management methods. The “big” piece of “big data” appears to have been meant to represent a scalable data management architecture (best typified by Apache Hadoop). Anyone reading the presentation on the Hadoop web site can’t help but understand the role of clusters of servers for Hadoop as a solution. Clusters of servers, in turn, provide a perfect rationale for the Apache project to provide the foundation for Hadoop.

Ira Michael Blonder

© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved