Querying for audio on Wikidata
Music and more.
For a long time I’ve thought that it would be fun to use SPARQL queries of Wikidata to create music playlists that can be played back. While researching last month’s blog entry, “Use SPARQL to query for movies, then watch them,” I learned about the P724 Internet Archive ID property, and that turned out to be an excellent hook for finding Wikidata audio recordings that we can listen to.
In that last entry, my query for the films of Frank Capra searched for resources that have a P31 value of Q11424 (that is, they are instances of film) and have a P724 value of some movie that we can watch. This brought up the question: what other types besides film have P724 values? We can answer this with a simple query:
SELECT DISTINCT ?name WHERE {
  ?s wdt:P724 ?internetArchiveID ;
     wdt:P31 ?type .
  ?type rdfs:label ?name .
  FILTER ( lang(?name) = "en" )
}
LIMIT 100
(Instead of showing query results here, I will provide links so you can run each yourself.) The list of types that have Internet Archive IDs includes many interesting possibilities. (I limited it to 100 results because it was threatening to time out.)
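If you would rather run these queries from a script than from the web form, here is a minimal sketch using only the Python standard library. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its JSON output; the function names and the User-Agent string are placeholders of mine, not anything from the original queries.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed reachable from your network).
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="audio-query-sketch/0.1 (example)"):
    """Return a urllib Request asking the endpoint for JSON results.

    The User-Agent value is a placeholder; replace it with something
    that identifies your own script.
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )


def rows_from_json(text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


def run_query(query, timeout=60):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return rows_from_json(resp.read().decode("utf-8"))
```

Calling `run_query()` with the text of the query above should give one dictionary per result row, each with a `name` key. The client-side timeout does not change the service's own limits, so the `LIMIT 100` is still worth keeping.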
While I encourage you to explore the various values that this retrieves, I decided to focus on Q105543609, musical work/composition. I found that many had a genre property, so I listed all the possible values that came up for that:
SELECT DISTINCT ?genre ?genreName WHERE {
  ?s wdt:P31 wd:Q105543609 ;  # a musical composition
     wdt:P51 ?recording ;     # where a recording exists
     wdt:P136 ?genre .        # that is tagged with a genre
  ?genre rdfs:label ?genreName .
  FILTER( lang(?genreName) = "en" )
}
ORDER BY ?genreName
When I ran this I found 150 genres.
I got a little too excited until I rediscovered one of the common issues with metadata: just because people can tag something with structured metadata doesn’t mean that they do, so there are very few recordings associated with many of these tags. For example, the following asks for spirituals,
SELECT * WHERE {
  ?s wdt:P31 wd:Q105543609 ;  # a musical composition
     wdt:P51 ?recording ;     # with a recording
     wdt:P136 wd:Q212024 ;    # that has a genre of spiritual
     rdfs:label ?name .
  ?wppage schema:about ?s .
  FILTER(contains(str(?wppage),"//en.")) # Only the English Wikipedia pages
  FILTER( lang(?name) = "en" )
}
but if you run it you’ll find only two recordings. (Of the two, the 1915 Tuskegee Institute Singers 78 rpm record of “The Old Time Religion” is a wonderful Wikimedia find.)
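To see how thinly a genre is populated before building a query around it, you can tally recordings per genre. The sketch below assumes a variant of the genre-listing query that also selects `?recording`, with the result rows already flattened into dictionaries; the `genreName` and `recording` keys and the sample rows are made up for illustration.

```python
from collections import Counter


def recordings_per_genre(rows):
    """Count distinct recordings for each genre label in the result rows."""
    seen = set()
    counts = Counter()
    for row in rows:
        pair = (row["genreName"], row["recording"])
        if pair not in seen:  # the same pair can come back more than once
            seen.add(pair)
            counts[row["genreName"]] += 1
    return counts


# Made-up rows standing in for real query results.
sample_rows = [
    {"genreName": "spiritual", "recording": "http://example.org/a.ogg"},
    {"genreName": "spiritual", "recording": "http://example.org/b.ogg"},
    {"genreName": "spiritual", "recording": "http://example.org/b.ogg"},
    {"genreName": "symphony", "recording": "http://example.org/c.ogg"},
]
counts = recordings_per_genre(sample_rows)
```

Sorting the counts with `counts.most_common()` puts the well-populated genres first, which makes it easier to pick ones worth a closer look.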
That last query and several of the remaining ones also ask for the associated Wikipedia page, which was handy when one of my queries would turn up a recording that made me think “wait, what IS this?”
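One caveat about that filter: the substring `"//en."` also matches other English-language Wikimedia sites, such as en.wikiquote.org, because their pages are linked to items with `schema:about` as well. If you post-process the results in a script, a stricter check is easy to add. This is a small sketch; the function name is my own.

```python
from urllib.parse import urlsplit


def is_english_wikipedia(url):
    """True only when the URL's host is exactly en.wikipedia.org."""
    return urlsplit(url).hostname == "en.wikipedia.org"


# Example URLs for illustration.
pages = [
    "https://en.wikipedia.org/wiki/John_Cage",
    "https://en.wikiquote.org/wiki/John_Cage",
    "https://de.wikipedia.org/wiki/John_Cage",
]
english_only = [p for p in pages if is_english_wikipedia(p)]
```

Within SPARQL itself, a similar tightening should be possible with `STRSTARTS(STR(?wppage), "https://en.wikipedia.org/")` in place of `contains`, though the result counts may then differ from the ones reported here.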
Replacing the genre value in that query with others gave me some interesting results. Using wd:Q102932 for “avant-garde” got me a MIDI file and an Ogg Vorbis “recording” of John Cage’s famous silent piece 4'33". A genre of wd:Q9734 for “symphony” found two movements of Beethoven’s 7th and no other recordings. The one recording tagged as wd:Q7749 for “rock and roll” was the U.S. Air Force band playing “When the Saints Go Marching In”, which reminds me of the old semantic web saying “anyone can say anything about anything”. (Considering their arrangement, I was tempted to change it to the wd:Q906647 category for “dixieland jazz”, but because the Wikidata page lists the song as a “gospel hymn”, I changed “rock and roll” to that.)
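Swapping genre values by hand gets tedious, so here is a rough sketch of a helper that drops a genre's Q-identifier into the same query text. The template name and function name are hypothetical, and the check on the identifier is just a precaution against pasting in something that is not a plain QID.

```python
import re

# The spirituals query from above, with a %s placeholder for the genre QID.
QUERY_TEMPLATE = """SELECT * WHERE {
  ?s wdt:P31 wd:Q105543609 ;  # a musical composition
     wdt:P51 ?recording ;     # with a recording
     wdt:P136 wd:%s ;         # that has the requested genre
     rdfs:label ?name .
  ?wppage schema:about ?s .
  FILTER(contains(str(?wppage),"//en."))
  FILTER( lang(?name) = "en" )
}"""


def genre_query(genre_qid):
    """Return the query text for one genre, e.g. genre_query("Q9734")."""
    if not re.fullmatch(r"Q[1-9][0-9]*", genre_qid):
        raise ValueError(f"not a Wikidata item ID: {genre_qid!r}")
    return QUERY_TEMPLATE % genre_qid


# The genres mentioned above: avant-garde, symphony, rock and roll.
queries = {qid: genre_query(qid) for qid in ("Q102932", "Q9734", "Q7749")}
```

Each value in `queries` is ready to paste into the query service or hand to whatever client you use.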
Another genre is “national anthem”. A query for that only gave one result, but there are other ways to query for specific types of recordings than by the genre value. Instead of looking for recordings that are instances of musical composition, I can just look for those that are instances of national anthem. This turned up 369 recordings:
SELECT ?anthemName ?wppage ?recording WHERE {
  ?anthem wdt:P31 wd:Q23691 ;  # is a national anthem
          wdt:P51 ?recording ;
          rdfs:label ?anthemName .
  ?wppage schema:about ?anthem .
  FILTER( lang(?anthemName) = "en" )
  FILTER(contains(str(?wppage),"//en.")) # Only the English Wikipedia pages
}
Running that can let you create a pretty crazy playlist.
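To turn results like these into something a media player can open, one option is an extended M3U playlist. The following is a minimal sketch that assumes the rows are already available as dictionaries keyed by the query's variable names (`anthemName` and `recording` here); the function name and the sample URLs are made up for illustration.

```python
def to_m3u(rows, title_key="anthemName", url_key="recording"):
    """Build extended M3U text from result rows, skipping repeated URLs."""
    lines = ["#EXTM3U"]
    seen = set()
    for row in rows:
        url = row[url_key]
        if url in seen:  # the same recording can appear in several rows
            continue
        seen.add(url)
        title = row.get(title_key, url).replace("\n", " ")
        # -1 means the track length is unknown.
        lines.append(f"#EXTINF:-1,{title}")
        lines.append(url)
    return "\n".join(lines) + "\n"


# Made-up rows standing in for real query results.
sample_rows = [
    {"anthemName": "Anthem A", "recording": "http://example.org/a.ogg"},
    {"anthemName": "Anthem A", "recording": "http://example.org/a.ogg"},
    {"anthemName": "Anthem B", "recording": "http://example.org/b.ogg"},
]
playlist = to_m3u(sample_rows)
```

Saving `playlist` to a file with an `.m3u` extension should be enough for players that can stream from URLs, although whether a given player handles every audio format in the results is something you would have to try.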
Instrumentation was another interesting property to use when searching for music. I started by asking, for all the recordings, which instrumentation values were used:
SELECT DISTINCT ?instrumentation WHERE {
  ?s wdt:P51 ?recording ;
     wdt:P870 ?instrumentationURI .
  ?instrumentationURI rdfs:label ?instrumentation .
  FILTER ( lang(?instrumentation) = "en" )
}
ORDER BY ?instrumentation
Running it showed 44 results. One was viola, so I wondered how many have that as their instrumentation value:
SELECT ?recording WHERE {
  ?s wdt:P51 ?recording ;
     wdt:P870 ?instrumentationURI .
  ?instrumentationURI rdfs:label "viola"@en .
}
ORDER BY ?recording
Running this one showed nine pieces, each of which includes a viola among its instruments. For example, Mozart’s “A Little Night Music” has four instrumentation values:
SELECT DISTINCT ?instrumentation WHERE {
  wd:Q12025 wdt:P51 ?recording ;
            wdt:P870 ?instrumentationURI .
  ?instrumentationURI rdfs:label ?instrumentation .
  FILTER ( lang(?instrumentation) = "en" )
}
You will see that it was written for a string orchestra.
As you can see, for many of these I first checked which properties were used with resources that had recordings and then ran more queries using those properties. There are bird calls, historic speeches from the early days of audio recording, and all kinds of things to explore. I’m sure I’ll be doing more.
I will leave you with one of my more successful queries:
SELECT ?name ?wppage ?recording WHERE {
  ?composerURL rdfs:label "Johann Sebastian Bach"@en .
  ?instrumentationURI rdfs:label "harpsichord"@en .
  ?s wdt:P51 ?recording ;
     wdt:P870 ?instrumentationURI ;
     rdfs:label ?name ;
     wdt:P86 ?composerURL .
  ?wppage schema:about ?s .
  FILTER( lang(?name) = "en" )
  FILTER(contains(str(?wppage),"//en.")) # Only the English Wikipedia pages
}
Running this will give you recordings of 14 J.S. Bach harpsichord pieces.
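Because this query finds the composer and instrument by their English labels, it is easy to adapt to other combinations. Here is a sketch of one way to do that from a script; the function names are my own, and the escaping covers only backslashes, double quotes, and newlines, which I am assuming is enough for ordinary names.

```python
def sparql_literal(text, lang="en"):
    """Quote text as a language-tagged SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def works_query(composer, instrument):
    """Return the query above with the two labels swapped in."""
    return (
        "SELECT ?name ?wppage ?recording WHERE {\n"
        f"  ?composerURL rdfs:label {sparql_literal(composer)} .\n"
        f"  ?instrumentationURI rdfs:label {sparql_literal(instrument)} .\n"
        "  ?s wdt:P51 ?recording ;\n"
        "     wdt:P870 ?instrumentationURI ;\n"
        "     rdfs:label ?name ;\n"
        "     wdt:P86 ?composerURL .\n"
        "  ?wppage schema:about ?s .\n"
        '  FILTER( lang(?name) = "en" )\n'
        '  FILTER(contains(str(?wppage),"//en."))\n'
        "}"
    )


bach_harpsichord = works_query("Johann Sebastian Bach", "harpsichord")
```

Labels must match Wikidata's English labels exactly, so a combination that returns nothing may just mean the spelling differs from what is stored, or that nobody has tagged those works yet.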
Comments? Reply to my tweet (or even better, my Mastodon message) announcing this blog entry.