Google and RDFa: what and why

Surprise—to make more money!

May 15, 2009

After the initial burst of discussion about Google putting their toe into the standardized metadata water, I started wondering about the corner of the pool they had chosen. They’re not ready to start parsing any old RDFa; they’ll be looking for RDFa that uses the vocabulary they somewhat hastily defined for the purpose. Why does the vocabulary define the properties that it defines?

The People properties sound basic enough, although as all the semweb geeks have already tweeted, Google should have leveraged the extensive existing work done on the FOAF vocabulary for that. The other three categories of properties they define are Reviews, Products, and Businesses and organizations. Of all the knowledge domains to represent, why these?


Comparing a given Google project to the big picture of all their projects can be overwhelming, but you don't need to do that once you remember what their core business is: putting ads next to search results and charging for the ads when they get clicked. The more relevant the ads are to the content next to them, the more likely they are to get clicked, and the more money Google makes.

In a blog post titled “The future of RDFa” in February of last year, I wrote that “Pricing is… a huge area where people would be happy to give away data in the form of extra embedded metadata in their web pages, because it can drive new paying customers to the source of that data”. Google wants that data so that they can help people sell more stuff and make more money themselves in the process. The kind of metadata that would be embedded in reviews and information about products and companies (especially the category, brand, and price properties, and the detailed metadata that can be included in reviews) can make it much easier for Google to find users who are using their search engine to research things they’re interested in buying.

It will be interesting to see how the big hustling SEO world adapts to this. In the words of Drupal project lead Dries Buytaert, “Structured data is the new search engine optimization.” When he writes “Every webmaster wanting to improve click-through rates, reduce bounce rates, and improve conversation rates, can no longer ignore RDFa or Microformats”, it reminds me that when the SEO world eventually gravitates more in the RDFa direction or the microformats direction, these very quantitative, results-driven people will have some real data to explain why. I’ll have to start searching their voluminous discussions out there to see what people are saying.

Some other miscellaneous notes on Google and RDFa:

For now, Google isn’t going to look for this markup in all the data they crawl. As far as I can tell, they want you to nominate your own site to be crawled and parsed for the extra metadata.
It’s nice that Google encourages people to add a proper namespace declaration of xmlns:v="http://rdf.data-vocabulary.org/" to a web page before adding properties such as v:reviewer and v:description. They even make this their number one “important property”. But when they parse a document that may contain this metadata, will they check for xmlns:v="http://rdf.data-vocabulary.org/" and only then look for v:reviewer and the other properties? Or, if they see xmlns:foo="http://rdf.data-vocabulary.org/", will they look for foo:reviewer and other properties from their namespace even though the document doesn’t use the prefix from Google’s demo?
They point to the “official” W3C RDFa Primer. (It was a pleasant surprise to be reminded that the Primer’s acknowledgments mention me for “reviewing the work and providing useful commentary”.) Even if Google’s implementation of this will only deal with a limited vocabulary, from what I can see they’re not subsetting the standard itself, like Adobe did with their XMP “profile” of RDF.
Google does see the semantic web world beyond what’s defined in their ontology. According to the Reviews page, “You can use the additional expressiveness of RDFa to provide more information about the subject of your review. Google does not currently use the about property in search results, but it may be used in the future”. Building on this, they reassure the reader about an issue that often confuses those who are new to the use of URIs as identifiers instead of just being URLs: “If the object you’re referring to does not have an obvious URL to include, you could use the URL of pages on Wikipedia or similar web sources”.
It was nice to see how quickly a community effort led by Kingsley Idehen put together an ontology (explore it here) defining relationships between Google’s properties and more well-established ones, complete with owl:equivalentProperty properties defined to help clean up the potential mess of the vaguely defined delimiters between the http://rdf.data-vocabulary.org URI and each property name. (See here, near the bottom, for an example.) This could become a canonical example of the value of ontologies.
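To make the namespace question and the delimiter cleanup above a little more concrete: in RDFa as specified for XHTML, a CURIE such as v:reviewer is resolved through whatever xmlns declaration binds its prefix, so the prefix string itself shouldn’t matter. The Python below is a minimal sketch under that assumption of how an app consuming this markup might expand CURIEs and then use owl:equivalentProperty pairs to collapse variant property URIs into one. It says nothing about what Google actually implemented. The function names, the choice of representative URI, and the foaf:name mapping in the demo are all hypothetical.

```python
# A minimal sketch, NOT Google's implementation: how an app consuming
# RDFa might (1) resolve CURIEs through the page's namespace declarations
# and (2) collapse variant property URIs linked by owl:equivalentProperty.
# All function names and the foaf:name mapping below are hypothetical.

GOOGLE_NS = "http://rdf.data-vocabulary.org/"


def expand_curie(curie: str, namespaces: dict[str, str]) -> str:
    """Expand 'prefix:local' using the page's xmlns declarations.

    The prefix is only a local alias: what matters is the namespace URI
    it is bound to, so v:reviewer and foo:reviewer expand identically
    when both prefixes are bound to the same URI.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"undeclared prefix in {curie!r}")
    return namespaces[prefix] + local


def build_canonical_map(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each property URI to one representative of its equivalence class.

    Each (a, b) pair means "a owl:equivalentProperty b". A small union-find
    merges chains such as a=b, b=c into one class; the lexicographically
    smallest URI in each class is kept as its representative.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            low, high = sorted((ra, rb))
            parent[high] = low  # the smaller URI stays the root
    return {uri: find(uri) for uri in list(parent)}


def normalize(triples, canonical):
    """Rewrite each triple's predicate to its canonical URI, if it has one."""
    return [(s, canonical.get(p, p), o) for s, p, o in triples]


if __name__ == "__main__":
    # Two pages that bind different prefixes to the same namespace URI.
    print(expand_curie("v:reviewer", {"v": GOOGLE_NS}))
    print(expand_curie("foo:reviewer", {"foo": GOOGLE_NS}))

    # Hypothetical equivalences: one property written with two different
    # delimiters, plus a hypothetical link to the FOAF vocabulary.
    canonical = build_canonical_map([
        (GOOGLE_NS + "name", GOOGLE_NS + "#name"),
        (GOOGLE_NS + "#name", "http://xmlns.com/foaf/0.1/name"),
    ])
    triples = [("http://example.org/people/1", GOOGLE_NS + "name", "Bob")]
    print(normalize(triples, canonical))
```

Picking the lexicographically smallest URI as each class’s representative is arbitrary but deterministic; a real app would more likely prefer the property from the better-established vocabulary, which is the kind of relationship the community ontology spells out.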

It will be a lot of fun to build apps that use RDFa found by Google…

6 Comments

By Daniel O’Connor on May 15, 2009 10:48 PM

I only wish that I could make Blogger output XHTML Strict - but I can’t, because of how they throw in some iframes and what have you.

This means I can’t swap my doctype over to XHTML+RDFa and weave in their new information properly.

Annoying.

By Mark Birbeck on May 16, 2009 2:45 AM

Daniel,

The doctype is optional.

Mark

By Michael Hausenblas on May 16, 2009 3:20 AM

Bob,

Good post, I by and large agree (esp. re semantic SEO) - see also my 2c at [1].

Cheers,
Michael

[1] http://lists.w3.org/Archives/Public/public-lod/2009May/0095.html

By Tony Hammond on May 16, 2009 8:33 AM

Nice post, Bob.

Re your 2nd bullet, this is really encouraging news. A shame that Google Scholar persists in not making a namespace available for its vocabulary. For an example, see this post on Nascent about Nature’s inclusion of META tags, and compare the DC and PRISM vocabularies, which have declared schemas, with the Google Scholar tags, which have no declared schema. In fact, I couldn’t find any web page for this vocabulary other than “contact us” type links.

This new approach to including namespaces is refreshing.

Tony

By Eric Hellman on May 20, 2009 9:57 AM

I’m also disturbed by all the careless mistakes that Google has left in their help documentation at http://google.com/support/webmasters/bin/answer.py?hl=en&answer=146898

I have also commented at http://www.google.com/support/forum/p/Webmasters/thread?tid=165a6bebc77f2217&hl=en

Who knows what they’ve actually implemented.

By Bob DuCharme on May 20, 2009 10:50 AM

Eric: gluejar?

Maybe they’re going with a “release early, release often” strategy and crowdsourcing the QA of the design to those who show an interest, like us…

Bob
