Querying for labels
The normal way and the wikibase:label service way
In my last blog entry I discussed various ways that different RDF datasets assign human-readable labels to resources, with the rdfs:label property at the center of them all. I mentioned how schema.org doesn’t use rdfs:label but its own equivalent, schema:name, which its schema declares as a subproperty of rdfs:label. Since I wrote that, Fan Li pointed out that Facebook’s Open Graph protocol also has its own equivalent: og:title, which you can see used in the HTML source of IMDb, Instagram, and Yelp. (I tried pointing each of those three links to the view-source version of the pages, and that didn’t work, so you’ll have to take the extra step with each to view their source and see each one’s og:title value.) This is also defined as a subproperty of rdfs:label in the OGP schema, so a serious RDFS application could parse that schema and then treat og:title values as rdfs:label values.
Treating those rdfs:label variations as rdfs:label values
Querying for rdfs:label values is simple enough. To demonstrate how a query for rdfs:label values will retrieve og:title and schema:name values when a query engine that can do inferencing has access to the Open Graph Protocol and schema.org schemas, I added some of those values to the following document with comments about where I found each. (Where I found them they were not in Turtle syntax like they are here, but they were in machine-readable formats that could easily be converted to Turtle.)
Sample data:
@prefix og: <http://ogp.me/ns#> .
@prefix schema: <https://schema.org/> .
# og:title examples
<https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt>
og:title "Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music" .
<https://www.instagram.com/bobdcofficial/>
og:title " (@bobdcofficial) • Instagram photos and videos" .
<https://www.yelp.com/biz/peter-changs-china-grill-charlottesville>
og:title "Peter Chang's China Grill - Charlottesville, VA" .
# schema:name examples
## (added by Hugo as a default with no special configuration from me)
<https://www.bobdc.com/blog/rdflabels/>
schema:name "Human-readable names in RDF" .
<https://www.newyorker.com/best-books-2023>
schema:name "The Best Books We Read This Week" .
<https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670>
schema:name "Men's Super-T Long Sleeve T-Shirt" .
I downloaded the schema.org and OGP schema files and combined them into a single schema file:
cat ogp.me.ttl schemaorg-current-https.ttl > comboschema.ttl
Then, as I described in Hidden gems included with Jena’s command line utilities, I used the Jena riot tool to do RDFS inferencing with the data above and the combined schemas. It produced a lot of triples, so I used grep to show only the ones that mentioned the rdfs:label property:
riot --rdfs comboschema.ttl labeldata.ttl | grep "#label"
It produced these results:
<https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt> <http://www.w3.org/2000/01/rdf-schema#label> "Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music" .
<https://www.instagram.com/bobdcofficial/> <http://www.w3.org/2000/01/rdf-schema#label> " (@bobdcofficial) • Instagram photos and videos" .
<https://www.yelp.com/biz/peter-changs-china-grill-charlottesville> <http://www.w3.org/2000/01/rdf-schema#label> "Peter Chang's China Grill - Charlottesville, VA" .
<https://www.bobdc.com/blog/rdflabels/> <http://www.w3.org/2000/01/rdf-schema#label> "Human-readable names in RDF" .
<https://www.newyorker.com/best-books-2023> <http://www.w3.org/2000/01/rdf-schema#label> "The Best Books We Read This Week" .
<https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670> <http://www.w3.org/2000/01/rdf-schema#label> "Men's Super-T Long Sleeve T-Shirt" .
So, asking for the rdfs:label values when the schemas were available retrieved the schema:name and og:title values because they were subproperties of rdfs:label and because I used a query engine that could do inferencing. (When I created a repo that would do RDFS inferencing with the free version of GraphDB, the same thing happened. Standards!)
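To make the inferencing step less magical, here is a minimal Python sketch of the one RDFS rule doing the work above: if a triple uses property p, and p is declared a subproperty of q, then the same subject and object are also related by q. This is not how riot or GraphDB are implemented; the function name infer_superproperty_triples is made up for illustration, the "schema" holds only the two subproperty declarations discussed above, and the data is two of the sample triples. It makes a single pass, so it ignores chains of subproperties.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
OG_TITLE = "http://ogp.me/ns#title"
SCHEMA_NAME = "https://schema.org/name"

# Stand-in for the combined schema file: just the two declarations
# that matter for this example.
schema = [
    (OG_TITLE, RDFS_SUBPROPERTY_OF, RDFS_LABEL),
    (SCHEMA_NAME, RDFS_SUBPROPERTY_OF, RDFS_LABEL),
]

# Two of the sample triples from the data file above.
data = [
    ("https://www.yelp.com/biz/peter-changs-china-grill-charlottesville",
     OG_TITLE, "Peter Chang's China Grill - Charlottesville, VA"),
    ("https://www.bobdc.com/blog/rdflabels/",
     SCHEMA_NAME, "Human-readable names in RDF"),
]


def infer_superproperty_triples(schema, data):
    """One pass of the subPropertyOf rule: (s p o) + (p subPropertyOf q) => (s q o)."""
    superproperties = {}
    for sub, pred, sup in schema:
        if pred == RDFS_SUBPROPERTY_OF:
            superproperties.setdefault(sub, set()).add(sup)
    inferred = []
    for s, p, o in data:
        for q in sorted(superproperties.get(p, ())):
            inferred.append((s, q, o))
    return inferred


# Keep only the inferred rdfs:label triples, like the grep step above.
labels = [(s, o) for s, p, o in infer_superproperty_triples(schema, data)
          if p == RDFS_LABEL]
for s, o in labels:
    print(f'<{s}> rdfs:label "{o}"')
```

Neither input triple mentions rdfs:label, yet both show up as rdfs:label values in the output, which is the same effect that the riot command produced with the real schemas.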
Some extra help from the Wikidata Query Service
Querying for an rdfs:label value in Wikidata can be simple enough:
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT * WHERE {
wd:Q144 rdfs:label ?name
}
This query, though, returns about 300 results (and the number has gone up since I first drafted this blog entry) because Wikidata knows the word for “dog” in so many languages. We could FILTER it down to one or just a few languages like this:
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?label WHERE {
wd:Q144 rdfs:label ?label
FILTER (lang(?label) IN ("en","es"))
}
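For readers who think more easily in a general-purpose language, the effect of that FILTER can be sketched in a few lines of Python. This is only an illustration: the filter_by_language function is a made-up name, and the three (label, language tag) pairs are a tiny hand-picked sample of the labels for wd:Q144 rather than real query output.

```python
# (label, language tag) pairs standing in for the rdfs:label values of wd:Q144.
labels = [("dog", "en"), ("perro", "es"), ("Hund", "de")]


def filter_by_language(labels, wanted):
    """Keep only the labels whose language tag is exactly one of the wanted tags."""
    wanted = set(wanted)
    return [(value, lang) for value, lang in labels if lang in wanted]


selected = filter_by_language(labels, ["en", "es"])
print(selected)
```

Like the IN test in the query, this compares each tag for exact equality with the listed values, so the German label is dropped and the English and Spanish ones are kept.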
Wikidata has a special service to make this easier. To demonstrate it, let’s say I’m wondering about the topics of the Wikiquote pages https://en.wikiquote.org/wiki/Dogs and https://en.wikiquote.org/wiki/Cats (although it’s pretty clear from the URLs). The following query, which you can try on the Wikidata Query Service, will show me a ?foo value of wd:Q144 and a ?bar value of wd:Q146, which are not very informative:
SELECT ?foo ?bar
WHERE {
{ <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
UNION
{ <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
}
I could ask for rdfs:label values of ?foo and ?bar, but instead I’ll use the wikibase:label service built into the Wikidata Query Service. This not only looks up the labels but even creates variables for them by adding “Label” to the names of the variables representing the resources that I’m querying about:
SELECT ?fooLabel ?barLabel
WHERE {
{ <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
UNION
{ <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
}
Running that query gives us the following results:
fooLabel  barLabel
--------  ---------
dog       house cat
I could name a specific language if I wanted; running the next one shows a ?fooLabel value of “Hund” and a ?barLabel value of “Hauskatze”.
SELECT ?fooLabel ?barLabel
WHERE {
{ <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
UNION
{ <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}
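If you would rather send a query like this from a script than paste it into the web form, a rough Python sketch follows. It assumes the public SPARQL endpoint at https://query.wikidata.org/sparql, which takes the query text in a query parameter, and it assumes that the schema:, wikibase:, and bd: prefixes are predefined there as they are in the web form. The build_request helper and the User-Agent string are made up for illustration. The sketch only builds the request; the lines that would actually send it are commented out because they need network access.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?fooLabel ?barLabel
WHERE {
  { <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
  UNION
  { <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}
"""


def build_request(query, user_agent="label-demo/0.1 (example script)"):
    """Build, but do not send, a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )


request = build_request(QUERY)
print(request.full_url)

# To actually run the query (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```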
A neat Wikidata Query Service trick that I only recently learned about is how the web interface lets you reset the default language. If I click on “English” in the upper right of the query screen I get a drop-down, searchable list of languages. If I pick “español” from this list, the query screen’s “Examples” button gets renamed as “Ejemplos”, “Help” becomes “Ayuda”, and so forth with the rest of the UI. When I run the [AUTO_LANGUAGE] query from above after doing this, it shows a ?fooLabel value of “perro” and a ?barLabel value of “gato doméstico”.
With a made-up language code of “xyz” that it doesn’t recognize, it gives me the Q names from the ?foo and ?bar values as ?fooLabel and ?barLabel values:
fooLabel  barLabel
--------  --------
Q144      Q146
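The two behaviors shown so far, creating a “Label” variable for each resource variable and falling back to the Q name when no label exists in the requested language, can be imitated with a small Python sketch. This is a toy model of what the results look like, not a description of how the service is implemented: the LABELS dictionary is a hypothetical stand-in holding only the labels quoted in this post, add_labels is a made-up function name, and the row is a simplified result row.

```python
# Hypothetical stand-in for Wikidata's labels, limited to values quoted above.
LABELS = {
    "Q144": {"en": "dog", "de": "Hund", "es": "perro"},
    "Q146": {"en": "house cat", "de": "Hauskatze", "es": "gato doméstico"},
}


def add_labels(row, language):
    """Return a copy of a result row with a <variable>Label entry per variable.

    If there is no label in the requested language, the Q name itself
    is used as the label.
    """
    labeled = dict(row)
    for variable, qname in row.items():
        labeled[variable + "Label"] = LABELS.get(qname, {}).get(language, qname)
    return labeled


row = {"foo": "Q144", "bar": "Q146"}
print(add_labels(row, "de"))
print(add_labels(row, "xyz"))
```

With "de" the new fooLabel and barLabel entries hold “Hund” and “Hauskatze”; with the unrecognized "xyz" they hold Q144 and Q146, matching the table above.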
The wikibase:label service is not standard SPARQL, but with the tremendous amount of multilingual data available in Wikidata, it adds a lot of convenience and can shorten your Wikidata queries.
Comments? Reply to my tweet (or even better, my Mastodon message) announcing this blog entry.