[still from His Girl Friday]

I recently learned about WikiFlix, which lets you search for streamable movies on the Internet. It was assembled by Sandra Fauconnier and Magnus Manske. (Magnus played a major role in developing MediaWiki, which I’ve blogged about several times.) Sandra has provided some good background on the history and goals of WikiFlix on Wikimedia.

When I sent her some geeky questions about the role of Wikidata in WikiFlix, she told me about a great Wikidata property that I hadn’t known about: P1651, or “YouTube video ID”. Its value usually identifies a video of an entire movie. Once I started playing with it, it didn’t take me long to come up with this query for the titles and YouTube links of films with Cary Grant:

SELECT ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label "Cary Grant"@en . 
  ?film wdt:P161 ?castMember ;
        rdfs:label ?filmTitle ;
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = "en")
  BIND(URI(CONCAT('https://www.youtube.com/watch?v=',?youtubeID)) AS ?youTubeURL)
}

(I didn’t include prefix declarations in the query because I only used prefixes that are predeclared by the Wikidata Query Service.) The BIND line just appends the P1651 value to the usual YouTube stub and converts the result to a URI (or, for our purposes, a URL, because it’s locating something). As you’ll see if you run this query on Wikidata, the ?youTubeURL links will take you to the listed movies on YouTube.

Some of the links actually lead to a page saying that the video has been taken down because of a copyright claim. The earlier the film was made, the more likely it is to be available on YouTube, so let’s list them sorted by date:

SELECT ?releaseDate ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label "Cary Grant"@en . 
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ; 
        rdfs:label ?filmTitle ;
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = "en")
  BIND(URI(CONCAT('https://www.youtube.com/watch?v=',?youtubeID)) AS ?youTubeURL)
}
ORDER BY ?releaseDate

If you run that on Wikidata you’ll see them listed by date. The links for the first few movies in the result worked fine for me. Some films have multiple YouTube URLs, which I left in, in case the first one I try doesn’t work.
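
A film with several P577 release dates or several P1651 values shows up on several rows of that result. If you would rather have one row per film, here is a rough sketch of one way to do it, as a variation on the query above: group by film, keep the earliest release date, and join the URLs into a single space-separated string. The ?firstRelease and ?youTubeURLs variable names are my own inventions, and I am assuming that the earliest P577 value is the date worth sorting on.

```sparql
SELECT (MIN(?releaseDate) AS ?firstRelease) ?filmTitle
       (GROUP_CONCAT(DISTINCT STR(?youTubeURL); separator=" ") AS ?youTubeURLs)
WHERE {
  ?castMember rdfs:label "Cary Grant"@en .
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ;
        rdfs:label ?filmTitle ;
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = "en")
  BIND(URI(CONCAT('https://www.youtube.com/watch?v=',?youtubeID)) AS ?youTubeURL)
}
GROUP BY ?film ?filmTitle    # one result row per film
ORDER BY ?firstRelease
```

The trade-off is that ?youTubeURLs is a plain string rather than a URI, so the query service won’t necessarily render it as a clickable link.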

Of course movies have all kinds of metadata to query for, which adds to the fun. For example, I could query for Cary Grant movies directed by Howard Hawks:

SELECT ?releaseDate ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label "Cary Grant"@en . 
  ?director rdfs:label "Howard Hawks"@en . 
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ; 
        rdfs:label ?filmTitle ;
        wdt:P57 ?director ; 
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = "en")
  BIND(URI(CONCAT('https://www.youtube.com/watch?v=',?youtubeID)) AS ?youTubeURL)
}
ORDER BY ?releaseDate

You’ll see three movies in the results. (I have certainly seen “Bringing Up Baby” and “His Girl Friday”, but I had never heard of Hawks and Grant doing a film called “Monkey Business”, which I will need to check out.) The P136 genre property is another that can make movie query results more interesting.
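
As a rough sketch of how P136 could fit in, this variation on the date-sorted Cary Grant query adds each film’s genre label to the output. The ?genre and ?genreLabel variable names are my own, and I am assuming that the genre items have English rdfs:label values like the films do. A film with several genres will appear once per genre.

```sparql
SELECT ?releaseDate ?filmTitle ?genreLabel ?youTubeURL WHERE {
  ?castMember rdfs:label "Cary Grant"@en .
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ;
        rdfs:label ?filmTitle ;
        wdt:P136 ?genre ;          # genre
        wdt:P1651 ?youtubeID .
  ?genre rdfs:label ?genreLabel .
  FILTER(lang(?filmTitle) = "en")
  FILTER(lang(?genreLabel) = "en")
  BIND(URI(CONCAT('https://www.youtube.com/watch?v=',?youtubeID)) AS ?youTubeURL)
}
ORDER BY ?releaseDate
```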

Sandra also told me about two more properties that point at other video collections: P10 and P724.

P10 points to video content stored on Wikimedia Commons. For example, the Wikidata page for Dziga Vertov’s Man with a Movie Camera, which I knew was an important early Soviet silent but have never seen, includes this triple, whose object links to a version of the film that we can watch:

wd:Q829250 wdt:P10 <http://commons.wikimedia.org/wiki/Special:FilePath/Man%20With%20A%20Movie%20Camera%20%28Dziga%20Vertov%2C%201929%29.webm> .
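
Because a P10 value is already a URI, a query for it needs no BIND step to build a URL. Here is a minimal sketch along the same lines as the earlier queries. It assumes that the director’s English label is exactly "Dziga Vertov" and that his films use wdt:P57 to point at him; the ?video variable name is my own. I haven’t checked how many films it returns.

```sparql
SELECT ?filmTitle ?video WHERE {
  ?director rdfs:label "Dziga Vertov"@en .
  ?film wdt:P57 ?director ;
        rdfs:label ?filmTitle ;
        wdt:P10 ?video .           # video file on Wikimedia Commons
  FILTER(lang(?filmTitle) = "en")
}
```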

P724 is a resource’s Internet Archive ID. This may be a film, or it may be an emulated video game or even software such as VisiCalc. The Wikidata page for the 1944 Frank Capra film Arsenic and Old Lace has two wdt:P724 values: “1944-arsenic-and-old-lace-arsenico-por-compasion-frank-capra-vose” and “1944-arsenic-and-old-lace-este-mundo-e-um-hospicio-frank-capra-legendado”. Add either one to the stub https://archive.org/details/ and you’ll have a URL that lets you watch the whole movie.

Because P724 gets applied to so many different media, when looking for movies it’s a good idea to have your query specify that you want an instance of film. For example:

SELECT ?title ?internetArchiveURL WHERE {
  ?film wdt:P31 wd:Q11424 ;   # it's a film
        wdt:P724 ?internetArchiveID ;
        wdt:P57 ?director ;
        rdfs:label ?title . 
  ?director rdfs:label "Frank Capra"@en .
  FILTER(lang(?title) = "en")
  BIND(URI(CONCAT('https://archive.org/details/',?internetArchiveID)) AS ?internetArchiveURL)
}

(When I ran that one it was interesting to see how many of the World War II propaganda films that Capra directed are available for viewing.)

So the next time you’re looking for a film to watch, instead of Netflix or Apple TV, let some SPARQL queries of Wikidata point you to classic films that you can watch for free!

