SPARQL in a Jupyter Notebook

For real this time.

A few years ago I wrote a blog post titled SPARQL in a Jupyter (a.k.a. IPython) notebook: With just a bit of Python to frame it all. It described how Jupyter notebooks, which have become increasingly popular in the data science world, are an excellent way to share executable code and the results and documentation of that code. Not only do these notebooks make it easy to package all of this in a very presentable way; they also make it easy for your reader to tweak the code in a local copy of your…
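The core of that notebook trick is just a little Python to send a SELECT query to an endpoint and flatten the JSON that comes back. A minimal sketch, using only the standard library (the function names are mine, not from the original post; any SPARQL 1.1 endpoint that returns JSON results would work):

```python
import json
import urllib.parse
import urllib.request

def rows_from_results(data):
    """Flatten the SPARQL 1.1 JSON results format into a list of plain dicts."""
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in data["results"]["bindings"]
    ]

def sparql_select(endpoint, query):
    """Run a SELECT query against a SPARQL endpoint and return the result rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return rows_from_results(json.load(response))
```

In a notebook cell, the list of dicts that `sparql_select()` returns drops straight into a pandas DataFrame for a nicely formatted table.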

Last month in Populating a Schema.org dataset from Wikidata I talked about pulling data out of Wikidata and using it to create Schema.org triples, and I hinted at the possibility of updating Wikidata data directly. The SPARQL fun of this is then querying Wikidata and seeing your edits reflected within a few minutes. I was pleasantly surprised at how quickly edits showed up in query results, so I thought I would demo it with a little video.

One-click replacement of an IMDb page with the corresponding Wikipedia page

With some Python, JavaScript, and of course, SPARQL.

I recently tweeted “I find that @imdb is so crowded with ads that it’s easier to use Wikipedia to look up movies and actors and directors and their careers. And then there’s that Wikidata SPARQL endpoint!” Instead of just cursing the darkness, I decided to light a little SPARQL-Python-JavaScript candle, and it was remarkably easy.
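The key step in such a candle is mapping an IMDb identifier to the corresponding Wikipedia article via Wikidata. This sketch only builds the query string; `wdt:P345` is Wikidata’s IMDb ID property, and the sample identifier and site URL are just illustrative:

```python
def imdb_to_wikipedia_query(imdb_id, site="https://en.wikipedia.org/"):
    """Build a query mapping an IMDb ID to a Wikipedia article URL.

    wdt:P345 is Wikidata's IMDb ID property; schema:about and
    schema:isPartOf connect a Wikidata item to its sitelinked articles.
    """
    return f"""SELECT ?article WHERE {{
  ?item wdt:P345 "{imdb_id}" .
  ?article schema:about ?item ;
           schema:isPartOf <{site}> .
}}"""
```

Sending the result to the Wikidata endpoint returns the Wikipedia URL to redirect to, which is where the bit of JavaScript comes in.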

In an October 14th article in the New Yorker about the use of Artificial Intelligence to generate prose, John Seabrook wrote: “A recent exhibition on the written word at the British Library dates the emergence of cuneiform writing to the fourth millennium B.C.E., in Mesopotamia”. That got me thinking about some notes I once took on the early history of metadata, and I wondered if there was any scholarship to show that the earliest metadata is as old as the earliest writing. Not…

Avoiding accidental cross products in SPARQL queries

Because one can sneak into your query when you didn't want it.

Have you ever written a SPARQL query that returned a suspiciously large number of results, especially one stuffed with every possible combination of certain values? You may have accidentally requested a cross product. I have spent too much time debugging queries where this turned out to be the problem, so I wanted to talk about avoiding it.
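The underlying mechanics: when two graph patterns in a query share no variable, SPARQL has nothing to join them on, so it pairs every solution of one with every solution of the other. A pure-Python illustration of the arithmetic, with made-up data standing in for the two patterns’ matches:

```python
from itertools import product

# Solutions matched by two triple patterns that share no variable.
directors = ["Kurosawa", "Varda", "Hitchcock"]   # matches of pattern 1
languages = ["en", "fr", "ja", "de"]             # matches of pattern 2

# With no shared variable to join on, the result is the Cartesian
# product: every director paired with every language.
cross = list(product(directors, languages))
print(len(cross))  # 3 * 4 = 12 rows, few of them meaningful pairings
```

With realistic pattern sizes the multiplication gets expensive fast, which is why the symptom is usually “way too many results” rather than an error.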

I’ve been thinking about which machine learning tools can contribute the most to the field of digital humanities, and an obvious candidate is document embeddings. I’ll describe what these are below but I’ll start with the fun part: after using some document embedding Python scripts to compare the roughly 560 Wikibooks recipes to each other, I created an If you liked… web page that shows, for each recipe, what other recipes were calculated to be most similar to that…
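Once each document is reduced to an embedding vector, “most similar” typically means highest cosine similarity. A minimal sketch of that comparison step (the function names and two-dimensional vectors are my own simplification; real document embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(target, others):
    """Rank (name, vector) pairs by similarity to the target embedding."""
    return sorted(
        others,
        key=lambda pair: cosine_similarity(target, pair[1]),
        reverse=True,
    )
```

An “If you liked…” page then just takes the top few entries of `most_similar()` for each recipe’s vector.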