More Picasso paintings in one year than all the Vermeer paintings?
Answering an art history question with SPARQL.
Sometimes a question pops into my head that, although unrelated to computers, could likely be answered with a SPARQL query. I don’t necessarily know the query off the top of my head and have to work it out. I’m going to discuss an example of one that I worked out and the steps that I took, because I wanted to show how I navigated the Wikidata data model to get what I wanted.
On a recent trip to Dublin my wife and I went to Dublin’s wonderful National Gallery of Ireland. Among other paintings we saw Vermeer’s Woman Writing a Letter, with her Maid and Picasso’s Still Life with a Mandolin.
Seeing any Vermeer is a treat because there are so few of them around, and the way he depicts light makes for a huge difference between seeing a picture of the painting and seeing the real thing in front of you. (Remember, when you see these dumb discussions about AI-generated “paintings”: we can discuss whether they’re art or not, but they’re not paintings if there is no paint. They’re PNG and JPG files. If you compare the image above with the Vermeer hanging on the wall at the National Gallery of Ireland you’ll see what a tremendous difference that can be.) The Picasso was also great to see live because it was from his more colorful late cubist period; while some of his related collages included bits of wallpaper, for this one he painted wallpaper-like patterns onto the canvas.
We know that Picasso was very prolific for many decades. This led me to wonder: was there any single year of Picasso’s career where he produced more paintings than Vermeer produced in his whole life? (Judging, in both cases, by surviving paintings that we have record of.)
The Wikipedia page for Vermeer tells us that “only 34 paintings are universally attributed to him today”, so I didn’t need SPARQL for that. The question for me to answer was this: were there any years where Picasso painted more than 34 paintings?
What triples say “Picasso made this painting”?
First I had to identify how Wikidata tells us that Picasso painted a given painting. I started with one of his most famous ones and clicked Wikidata item on the left side of the Guernica (Picasso) Wikipedia page. This showed me that Q175036 is the Wikidata identifier for this painting. I knew that the Wikidata triples with subjects that build on this ID would provide some good clues about developing a query that could count up his paintings per year.
What triples say “It’s a painting”?
I didn’t want to count up all his artworks per year, but just his paintings, so I entered the following query and executed it to see what class Guernica was an instance of. (Note that instead of using rdf:type or a as a property meaning “is an instance of”, Wikidata uses wdt:P31. Being reminded of this was part of my navigation around the Wikidata data model that I mentioned above.)
SELECT * WHERE {
  wd:Q175036 wdt:P31 ?class .
  ?class rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
This showed that it is an instance of wd:Q3305213, or “painting”.
What triples say “It’s by Picasso”?
I went to the Wikipedia page for Picasso, picked Wikidata item, and saw that Picasso’s Wikidata identifier is Q5593.
Next, I did a very simple query for all the data about the painting Guernica:
SELECT * WHERE {
  wd:Q175036 ?p ?o
}
The result of this query included “wdt:P170 wd:Q5593”. If wd:Q5593 is Picasso, what is wdt:P170? This is easy enough to find out when executing the query with the Wikidata SPARQL endpoint HTML form: I just clicked on this name in the query result and it showed me that wdt:P170 means “creator”.
What triples say what year a painting was created?
The Wikipedia page for Guernica says that it was created in 1937. The earlier result of asking for all the triples about the painting showed that it has a wdt:P571 value of “1 January 1937”, where wdt:P571 means “inception.”
What paintings in what years?
Next, I used this query to list all the paintings by Picasso and the dates they were created:
SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # it's a painting
            wdt:P170 wd:Q5593 ;     # by Picasso
            rdfs:label ?title ;
            wdt:P571 ?inceptionDate .
  FILTER (lang(?title) = "en")
}
This listed them, but the Wikidata endpoint interface was displaying dates like 1913-01-01 as “1 January 1913” (with a suspicious number of them having that “1 January”, so that may be a default when the month and day were unavailable). I just wanted the year if I was going to look for total paintings per year. I eventually realized that the date values were in ISO 8601 format, so I tried pulling out the year values with this query:
SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # it's a painting
            wdt:P170 wd:Q5593 ;     # by Picasso
            rdfs:label ?title ;
            wdt:P571 ?inceptionDate .
  BIND(substr(?inceptionDate,1,4) AS ?year)
  FILTER (lang(?title) = "en")
}
The dates still looked inconsistent, so I stored that query in the file pquery1.rq and used curl to run the query from my shell command line so that I could see the raw result:
curl --data-urlencode "query@pquery1.rq" https://query.wikidata.org/sparql
That showed me that the dates weren’t just strings arranged in ISO 8601 format; they were actually typed as dates (xsd:dateTime literals). I revised the query above to convert those to regular strings with str() before pulling out the year value, and the ?year values came back as the four-digit numbers I wanted to see:
SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # it's a painting
            wdt:P170 wd:Q5593 ;     # by Picasso
            rdfs:label ?title ;
            wdt:P571 ?inceptionDate .
  # added str() call to following
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
  FILTER (lang(?title) = "en")
}
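For anyone who would rather inspect the raw results from a script than from curl, here is a minimal Python sketch of the same check. It is only an illustration under some assumptions: the function names, the placeholder User-Agent string, and the choice of a GET request with a JSON Accept header are mine, not part of the original workflow. The sample binding uses Guernica's inception date in the form the raw results typically take.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # same endpoint as the curl command


def build_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Assumption: placeholder User-Agent; replace it with your own details.
        "User-Agent": "picasso-year-sketch/0.1 (example only)",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_bindings(query: str) -> list:
    """Send the query and return the raw result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]


def year_of(binding: dict) -> str:
    """Mirror substr(str(?inceptionDate),1,4).

    SPARQL's substr counts characters from 1; Python slices count from 0.
    """
    return binding["value"][:4]


# Shape of one ?inceptionDate binding in the raw JSON: a typed literal,
# not a plain string. The value here is Guernica's inception date.
sample = {
    "type": "literal",
    "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
    "value": "1937-01-01T00:00:00Z",
}

print(year_of(sample))  # prints 1937
```

Calling fetch_bindings() with the text of pquery1.rq should return one dictionary per result row, each with an inceptionDate entry shaped like the sample. SPARQL 1.1 also defines a YEAR() function for xsd:dateTime values, which may be a tidier alternative to the string slicing, though I stuck with substr() here.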
How many Picasso paintings per year?
I wasn’t really interested in the painting titles or their month and day of inception. I had everything I needed to answer my original question: how many paintings did Picasso do each year?
SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # it's a painting
            wdt:P170 wd:Q5593 ;     # by Picasso
            wdt:P571 ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
}
GROUP BY ?year
ORDER BY DESC(?paintingsInYear)
Here are the first few rows of the results:
year paintingsInYear
1901 52
1906 33
1908 31
1909 30
1905 25
1914 24
1903 23
So there’s the answer: we know of more Picasso paintings from 1901 than we know of Vermeer paintings from his whole life, and in 1906 Picasso came close to the Vermeer total. The first decade of the twentieth century was a very busy decade for Picasso. (I then found a website showing his paintings by year; the 1901 page is interesting.)
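The comparison itself is simple enough to do by eye, but as a small sketch, here is how the rows shown above could be checked against the Vermeer total in Python. The dictionary holds only the first few result rows quoted in this post, and the function name is just something I made up for the illustration.

```python
VERMEER_TOTAL = 34  # paintings universally attributed to Vermeer, per Wikipedia

# First few rows of the query result shown above: year -> paintings that year.
picasso_counts = {
    "1901": 52,
    "1906": 33,
    "1908": 31,
    "1909": 30,
    "1905": 25,
    "1914": 24,
    "1903": 23,
}


def years_exceeding(counts: dict, threshold: int) -> list:
    """Return the years whose painting count is strictly above the threshold."""
    return sorted(year for year, total in counts.items() if total > threshold)


print(years_exceeding(picasso_counts, VERMEER_TOTAL))  # prints ['1901']
```

Lowering the threshold by a couple of paintings would pull in 1906 as well, which shows how close that year came.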
The eye icon dropdown “Display result as” menu on the left side of the Wikidata Query Service page offers other ways to visualize the data. I changed the ORDER BY line in the last query to sort by the ?year value, ran the query, and then picked “line chart” from the dropdown and got this graph of the number of Picasso’s paintings per year:
This makes it even clearer how busy he was in the first decade of that century.
There are other display types, and of course, many other painters. There is a lot more fun to have here!
The most difficult part of creating such a query is the cryptic nature of the entity and property IDs: a single letter followed by a few digits. If the resources and properties used more readable names such as “Guernica (painting)” and “creator” instead, it would be more intuitive and easier to write queries, at least for those of us who speak English. But Wikidata is designed to be usable by everyone in the world, not just the English speakers, and that’s a good thing. I won’t complain.
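One way to soften the cryptic IDs in your own scripts is to keep a small lookup of readable names and generate the query from it. The following Python sketch does that for the identifiers worked out in this post. The dictionaries and helper functions are hypothetical conveniences of mine, not anything Wikidata provides, and the readable keys are just labels I chose.

```python
# Readable names for the identifiers worked out in this post.
ITEMS = {
    "painting": "Q3305213",
    "Picasso": "Q5593",
    "Guernica": "Q175036",
}
PROPERTIES = {
    "instance of": "P31",
    "creator": "P170",
    "inception": "P571",
}


def wd(name: str) -> str:
    """Prefixed entity ID for a readable item name."""
    return "wd:" + ITEMS[name]


def wdt(name: str) -> str:
    """Prefixed property ID for a readable property name."""
    return "wdt:" + PROPERTIES[name]


def paintings_per_year_query(artist: str) -> str:
    """Build the paintings-per-year query for an artist listed in ITEMS."""
    return f"""SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {{
  ?painting {wdt('instance of')} {wd('painting')} ;
            {wdt('creator')} {wd(artist)} ;
            {wdt('inception')} ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
}}
GROUP BY ?year
ORDER BY DESC(?paintingsInYear)"""


print(paintings_per_year_query("Picasso"))
```

Trying another painter would mean looking up that painter's Wikidata identifier the same way I did for Picasso and adding it to ITEMS.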
One more note: I included a digital-humanities tag with this post because it’s about using technology to answer an art history question. The field is often about accumulating data from different sources so that people can identify new patterns, but as the data in Wikidata accumulates more and more, there are more and more great things we can do with this wonderful source.
Comments? Reply to my tweet announcing this blog entry.