Some people complain when an RDF dataset lacks a documented data model. A great thing about RDF and SPARQL is that if you want to know what kind of modeling might have been done for a dataset, you just look, even if its publishers are using non-(W3C-)standard modeling structures. They’re still using triples, so you look at the triples.
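To make "you just look" a bit more concrete, here is a minimal sketch in plain Python. It assumes a handful of made-up triples (the `ex:` names are hypothetical, not from any real dataset) and simply counts which predicates and which `rdf:type` classes show up. Against a real endpoint you would ask the same questions with SPARQL, for example `SELECT DISTINCT ?p WHERE { ?s ?p ?o }` to list the predicates in use.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical sample triples, invented purely for illustration.
triples = [
    ("ex:item1", "rdf:type", "ex:Book"),
    ("ex:item1", "ex:title", "Title A"),
    ("ex:item1", "ex:author", "ex:person1"),
    ("ex:person1", "rdf:type", "ex:Person"),
    ("ex:person1", "ex:name", "Name A"),
    ("ex:item2", "rdf:type", "ex:Book"),
    ("ex:item2", "ex:title", "Title B"),
]


def summarize(triples):
    """Count how often each predicate and each rdf:type class is used."""
    predicates = Counter(p for _, p, _ in triples)
    classes = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return predicates, classes


predicates, classes = summarize(triples)

# Print predicates and classes, most frequently used first.
for name, count in predicates.most_common():
    print("predicate", name, count)
for name, count in classes.most_common():
    print("class", name, count)
```

Even with no documentation at all, a summary like this tells you which classes exist and which properties tend to be used, which is usually enough to start writing more targeted queries.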
I found all kinds of interesting things in the article “Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph” (pdf) by Stanislav Malyshev of the Wikimedia Foundation and four co-authors from the Technical University of Dresden. I want to highlight two things that I expect to find useful in the future, and then I’ll list a few more.
Last month I promised that I would dig further into the Wikidata data model, its mapping to RDF, and how we can take advantage of this with SPARQL queries. I had been trying to understand the structure of the data based on the RDF classes and properties I saw and the documentation that I could find, and some of the vocabulary discussing these issues confused me. For example, RDF is about describing resources, but I was seeing lots of references to entities, which can mean slightly different…
I’ve written so often about DBpedia here that a few times I considered writing a book about it. As I saw Wikidata get bigger and bigger, I kept postponing the day when I would dig in and learn more about this Wikipedia sibling project. I’ve finally done this, starting with a few basic steps and one extra fun one: