In June 2011, Michael O’Connor posted an article to the blog on schema.org announcing a joint effort by Bing, Google, and Yahoo “to create and support a standard set of schemas for structured data markup on web pages”. At the time, O’Connor worked for Microsoft in the Bing product group. This post has no title. Nevertheless, in this writer’s opinion the announcement of this partnership in the summer of 2011 marks an important moment, one which has led, in 2014, to a tool called the Google Structured Data Markup Helper.
For readers who work from the assumption that there are two separate notions of data:
- Structured Data, or “Data that resides in fixed fields within a record or file” (this definition has been published by PCMag on its web site)
- and Unstructured Data, “(or unstructured information). [Unstructured data] refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.” (this definition has been published on Wikipedia)
the rationale behind Google’s involvement with the Schema.org project, as a foundation-level participant, may be difficult to understand. After all, why would Google, which maintains the largest compendium of web pages anywhere, want to develop and distribute a tool people can use to aggregate otherwise unstructured data (ASCII text published on HTML web pages) into morsels of information that search engines can use to package and produce something called “rich content snippets”?
Apparently the “fixed fields within a record or file” (from the PCMag definition of structured data quoted above) are a very valuable feature, and one which Google, Yahoo, Bing (and probably Facebook, LinkedIn, Twitter and any other web site with a great many web pages filled with text content) really need in order to produce highly relevant results from natural language queries. The key point I hope readers will keep in mind about “fixed fields” is the role columns and rows play as the best method of expressing this characteristic of data. Let’s take it one step further: data repositories with lots of columns and rows are usually relational databases, managed by Relational Database Management Systems (RDBMSs).
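To make the “fixed fields” idea concrete, here is a minimal Python sketch that contrasts a sentence of free text with the same facts held as a row of named columns. The `Announcement` record, its field names, and the `select` helper are hypothetical names chosen for illustration; they are not part of Schema.org or of any search engine's tooling.

```python
from collections import namedtuple

# Unstructured: the facts are present, but only a human reader
# (or a parser that guesses) can pull them apart.
unstructured = "In June 2011, Michael O'Connor of Microsoft announced schema.org."

# Structured: each fact sits in a fixed, named field (a "column").
# The field names below are illustrative assumptions.
Announcement = namedtuple("Announcement", ["author", "employer", "year", "topic"])

rows = [
    Announcement("Michael O'Connor", "Microsoft", 2011, "schema.org"),
]

# The column names are known up front, before any row is read.
columns = list(Announcement._fields)


def select(records, column, value):
    """Return every row whose named column equals the given value."""
    return [r for r in records if getattr(r, column) == value]


matches = select(rows, "year", 2011)
```

Because the fields are fixed, a query such as `select(rows, "year", 2011)` needs no interpretation of the text; that is the property a search engine gains when a page's content is marked up with a shared schema.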
But in 2014, it’s possible to produce much the same result with at least two of the options available to any organization that needs a means of exchanging data (for web services, SOA, etc.). One of these options is XML. The other is JSON. The important point for any reader with a stake in an enterprise content management or enterprise document management (ECM, EDM) system (for example, Microsoft SharePoint on-premises, or SharePoint Online in Office 365) is the opportunity presented by XML, JSON and the entire Schema.org project to move beyond proprietary systems and towards ostensibly lower cost, custom versions of the same functionality built with open source components.
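As a rough illustration of how XML and JSON can each carry the same “fixed fields”, the sketch below serializes one record both ways using only Python's standard library. The record contents, the `announcement` root tag, and the helper names are assumptions made for this example, not a format defined by Schema.org.

```python
import json
import xml.etree.ElementTree as ET

# A single record with fixed, named fields (illustrative values).
record = {
    "author": "Michael O'Connor",
    "employer": "Microsoft",
    "year": 2011,
    "topic": "schema.org",
}


def to_json(rec):
    """Serialize the record as a JSON object; field types are preserved."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, root_tag="announcement"):
    """Serialize the record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for name, value in rec.items():
        child = ET.SubElement(root, name)
        child.text = str(value)  # XML element text is always a string
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the fields back out of the XML form as a dict of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


json_text = to_json(record)
xml_text = to_xml(record)
```

Reading either form back recovers the same field names. One difference worth noting: JSON returns `year` as a number, while this plain XML form returns it as the string `"2011"` unless a schema says otherwise.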
Ira Michael Blonder
© IMB Enterprises, Inc. & Ira Michael Blonder, 2014 All Rights Reserved