I recently worked on a project where we had a huge amount of RDF and no clue what was in there apart from what we saw by looking at random triples. I developed a few SPARQL queries to give us a better idea of the dataset’s content and structure and these queries are generic enough that I thought that they could be useful to other people.
In my last posting I described Carnegie Mellon University’s Index of Digital Humanities Conferences project, which makes over 60 years of Digital Humanities research abstracts and relevant metadata available on both the project’s website and as a file of zipped CSV that they update often. I also described how I developed scripts to convert all that CSV to some pretty nice RDF and made the scripts available on github. I finished with a promise to follow up by showing some of the…
I think that RDF has been very helpful in the field of Digital Humanities for two reasons: first, because so much of that work involves gaining insight from adding new data sources to a given collection, and second, because a large part of this data is metadata about manuscripts and other artifacts. RDF’s flexibility supports both of these very well, and several standard schemas and ontologies have matured in the Digital Humanities community to help coordinate the different data sets.
There is a reasonable chance that you’ve never heard of SQLite and are unaware that this database management program and many database files in its format may be stored on all of your computing devices. Firefox and Chrome in particular use it to keep track of your cookies and, as I’ve recently learned, many other things. Of course I want to query all that data with SPARQL, so I wrote some short simple scripts to convert these tables of data to Turtle.
When I wrote the blog posting My SQL quick reference last month, I showed how you can pass an SQL query to MySQL from the operating system command line when starting up MySQL, and also how adding a -B
switch requests a tab-separated version of the data. I did not mention that -X
requests it in XML, and that this XML is simple enough that a fifteen-line XSLT 1.0 spreadsheet can convert any such output to RDF.