In my book Learning SPARQL I often use a query that lists all the triples in a dataset (that is, all the triples in the default graph and all the triples in any named graphs). I now realize that this query needs some revision to be more accurate.

To see the issue that I ran into, first imagine running the update request in example 338 from the book on an empty dataset. It inserts two triples into the default graph and two each into two named graphs:

# filename: ex338.ru

PREFIX d:  <http://learningsparql.com/ns/data#>
PREFIX dm: <http://learningsparql.com/ns/demo#>

INSERT DATA
{
  d:x dm:tag "one" . 
  d:x dm:tag "two" . 

  GRAPH d:g1
  { 
    d:x dm:tag "three" . 
    d:x dm:tag "four" . 
  }

  GRAPH d:g2
  { 
    d:x dm:tag "five" . 
    d:x dm:tag "six" . 
  }
}

Next we run example 332:

# filename: ex332.rq

SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}

Contrasting this query with a SELECT * WHERE {?s ?p ?o} query earlier in the book, I wrote “This really is the List All Triples query, because it lists a union of all triples in the default graph and all the triples in any named graph along with the associated graph names”. When run with the Jena Fuseki triplestore, it lists the six triples shown in example 338 above with the associated graph names next to the last four.

I had assumed that a triple is either in a named graph or in the default graph, but I have recently learned that it’s not always that simple. For example, according to the SPARQL query specification’s Examples of RDF Datasets section, “One possible arrangement of graphs in an RDF Dataset is to have the default graph be the RDF merge of some or all of the information in the named graphs”. According to my experiments, the GraphDB, Blazegraph, and RDFLib query engines each assume that named graph triples are also in the default graph. With these query engines, running the query above with the data above gets me ten query results instead of six: the “three”, “four”, “five”, and “six” triples appear with their graph names, and because of the {?s ?p ?o} before the UNION keyword, they also show up a second time, without graph names, as part of the default graph.
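To make the arithmetic concrete, here is a small Python sketch of the two dataset arrangements. It is a toy model and not a SPARQL engine: the `union_default` flag and the function names are my own inventions for illustration, and the triples are just the ex338 data written as tuples.

```python
# Toy model of the ex338 data as (subject, predicate, object) tuples.
# This only mimics how the two dataset arrangements affect the row count.
DEFAULT = [("d:x", "dm:tag", "one"), ("d:x", "dm:tag", "two")]
NAMED = {
    "d:g1": [("d:x", "dm:tag", "three"), ("d:x", "dm:tag", "four")],
    "d:g2": [("d:x", "dm:tag", "five"), ("d:x", "dm:tag", "six")],
}


def default_graph_view(union_default):
    """Triples that a pattern outside any GRAPH clause would match."""
    triples = list(DEFAULT)
    if union_default:
        # Hypothetical flag: the engine also exposes named graph triples
        # through the default graph (a merge, so no duplicate triples).
        for graph_triples in NAMED.values():
            for triple in graph_triples:
                if triple not in triples:
                    triples.append(triple)
    return triples


def ex332_rows(union_default):
    """Mimic { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } }."""
    # Rows from the first branch leave ?g unbound (None here).
    rows = [(None, s, p, o) for (s, p, o) in default_graph_view(union_default)]
    for g, graph_triples in NAMED.items():
        rows.extend((g, s, p, o) for (s, p, o) in graph_triples)
    return rows


print(len(ex332_rows(union_default=False)))  # 6: separate default graph
print(len(ex332_rows(union_default=True)))   # 10: default graph as a merge
```

With the flag off, the model gives the six rows that I saw with Fuseki. With it on, the four named graph triples are counted twice, which accounts for the ten rows.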

As I learned in a conversation with Andy Seaborne on the Jena mailing list, you can configure Fuseki to treat named graph triples as part of the default graph as well. From now on, though, when I want to list all of a dataset’s triples by first listing those that aren’t in a named graph and then listing the ones that are, along with their graph names, I’ll use the new query below. It uses the MINUS keyword to explicitly exclude named graph triples from the set of default graph triples being retrieved by the clause before the UNION keyword:

SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o
    MINUS { GRAPH ?g { ?s ?p ?o } }
  }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}

Using the data above, this query returns the same six rows with Fuseki, GraphDB, and RDFLib. (Blazegraph returns six rows, but with bd:nullGraph as the ?g value for the “one” and “two” triples.)
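The effect of the MINUS clause can be sketched with a small Python model of the same data. Again, this is only an illustration and not a SPARQL engine: the `union_default` flag and the function names are hypothetical, standing in for whether an engine exposes named graph triples through the default graph.

```python
# Toy model of the ex338 data as (subject, predicate, object) tuples.
DEFAULT = [("d:x", "dm:tag", "one"), ("d:x", "dm:tag", "two")]
NAMED = {
    "d:g1": [("d:x", "dm:tag", "three"), ("d:x", "dm:tag", "four")],
    "d:g2": [("d:x", "dm:tag", "five"), ("d:x", "dm:tag", "six")],
}


def default_graph_view(union_default):
    """Triples that a pattern outside any GRAPH clause would match."""
    triples = list(DEFAULT)
    if union_default:
        # Hypothetical flag: named graph triples are merged into the view.
        for graph_triples in NAMED.values():
            for triple in graph_triples:
                if triple not in triples:
                    triples.append(triple)
    return triples


def minus_query_rows(union_default):
    """Mimic { ?s ?p ?o MINUS { GRAPH ?g { ?s ?p ?o } } }
    UNION { GRAPH ?g { ?s ?p ?o } }."""
    # Every triple that occurs in at least one named graph.
    in_named = {t for ts in NAMED.values() for t in ts}
    # First branch: default graph matches, minus anything in a named graph.
    rows = [(None, *t) for t in default_graph_view(union_default)
            if t not in in_named]
    # Second branch: named graph triples with their graph names.
    for g, ts in NAMED.items():
        rows.extend((g, *t) for t in ts)
    return rows


# Both arrangements now give the same answer.
assert minus_query_rows(False) == minus_query_rows(True)
print(len(minus_query_rows(True)))  # 6
```

One consequence of this approach, visible in the model: a triple that is stored in both the default graph and a named graph is listed only with its graph name, because MINUS removes it from the first branch.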

It’s not a super efficient query, but asking for absolutely all the triples in a dataset rarely is. With small datasets it’s a quick way to answer the question “what do we have here”, so I use it often when showing the effect of various keywords and syntax in a SPARQL query. I’ll be using this new query often enough that I already have it as one of the Saved queries that GraphDB lets you keep handy regardless of what dataset or project you’re working on.

