Ninety percent or more of the data produced by enterprise-class businesses, and by organizations in the public and not-for-profit sectors, is entirely structured. But content published on web pages is unstructured text data and is, therefore, a difficult challenge. So should enterprise consumers think differently about how to process this information? According to an article titled "Test shows big data text analysis inconsistent, inaccurate", published in Computerworld on January 30, 2015, they should. The article was written by Kevin Fogarty.
The point of Fogarty’s article is to expose the inaccuracy of a key component of most “modern” analytical tools for text data: a “modeling technique” called Latent Dirichlet Allocation (LDA). Fogarty writes that LDA has recently been shown to be highly inaccurate, at least according to research attributed to Luis Amaral, a physicist “from Northwestern University”.
Fogarty quotes Amaral admonishing independent software vendors (ISVs) that sell text analytics tools to enterprise consumers to come clean about just how useful (or useless) their tools may prove to be, before money changes hands and the buyer is ultimately left dissatisfied.
But, at another level, are enterprise consumers already thinking differently about text data? Microsoft SharePoint and SharePoint Online both offer the Managed Metadata Service (MMS), which provides a term store and taxonomy support. OpenText and Microsoft’s own circle of “Managed Partners” (meaning ISVs who work closely with Microsoft to fill in the gaps with high-value solutions for enterprise consumers) have already come to market with complete solutions for extracting useful data from content published on SharePoint sites. These platforms are ubiquitous among enterprise consumers; perhaps as many as 80% of Fortune 500 businesses run an instance of one or the other.
If the points Fogarty presents in his article prove true, then it should not be much of a stretch for stakeholders in a serious effort to mine high-value business intelligence (BI) from websites and social media to choose Microsoft’s solutions as the best available option.
ISVs looking to challenge Microsoft in this space may want to think seriously about offering a cloud platform-as-a-service (PaaS). After all, if these ISVs are already succeeding at this game, why shouldn’t consumers do better by simply “hitching a ride” on their platforms? As Fogarty points out, DIY isn’t cutting it. At least, not yet.
Ira Michael Blonder
© IMB Enterprises, Inc. & Ira Michael Blonder, 2015 All Rights Reserved