Appreciating SPARQL CONSTRUCT more
Another way to get more out of your data.
As with SQL, SPARQL’s most popular verb is SELECT. It lets you request the data you want from a collection, whether you’re asking for a single phone number or for a list of the first names, last names, and phone numbers of all employees hired after January 1st, sorted by last name.
In SPARQL, SELECT is one of several query forms, and CONSTRUCT is another. According to the W3C Recommendation SPARQL Query Language for RDF, CONSTRUCT returns a graph: a set of triples. I had thought of CONSTRUCT as a way of pulling a set of triples out of a triplestore, especially a remote one, but while reviewing some TopQuadrant training material I realized how handy CONSTRUCT can be for creating useful new triples.
For example, let’s say you have the following triples written in Turtle syntax to identify the gender and parent/child relationships of a few people:
@prefix : <http://www.snee.com/ns/demo#> .
:jane :hasParent :gene .
:gene :hasParent :pat ;
      :gender :female .
:joan :hasParent :pat ;
      :gender :female .
:pat  :gender :male .
:mike :hasParent :joan .
The following CONSTRUCT statement creates new triples based on the ones above to specify who is whose grandfather:
PREFIX : <http://www.snee.com/ns/demo#>
CONSTRUCT { ?p :hasGrandfather ?g . }
WHERE {
  ?p      :hasParent ?parent .   # ?parent is a parent of ?p
  ?parent :hasParent ?g .        # ?g is a parent of that parent
  ?g      :gender    :male .     # and ?g is male
}
When I ran this query with the data above, ARQ returned the newly constructed triples in Turtle format:
@prefix : <http://www.snee.com/ns/demo#> .
:jane
      :hasGrandfather :pat .
:mike
      :hasGrandfather :pat .
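To see what a SPARQL engine is doing with this query, here is a minimal Python sketch; it is not ARQ's actual implementation. It assumes the data above is transcribed as a set of (subject, predicate, object) string tuples, finds every set of variable bindings that satisfies the WHERE patterns, and fills in the CONSTRUCT template once per binding. The helpers `match` and `construct` are hypothetical names written for this illustration, and the sketch ignores details a real engine handles, such as blank nodes in templates.

```python
# A minimal sketch (not ARQ's implementation) of CONSTRUCT evaluation:
# match the WHERE patterns against the data, then fill in the template
# once per solution. Terms are plain strings; "?x" marks a variable.

DATA = {
    (":jane", ":hasParent", ":gene"),
    (":gene", ":hasParent", ":pat"),
    (":gene", ":gender", ":female"),
    (":joan", ":hasParent", ":pat"),
    (":joan", ":gender", ":female"),
    (":pat", ":gender", ":male"),
    (":mike", ":hasParent", ":joan"),
}


def is_var(term):
    """Treat terms that start with '?' as SPARQL variables."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for pattern_term, data_term in zip(first, triple):
            if is_var(pattern_term):
                # A variable must agree with any value it is already bound to.
                if candidate.setdefault(pattern_term, data_term) != data_term:
                    ok = False
                    break
            elif pattern_term != data_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


def construct(template, where, graph):
    """Instantiate the template for each solution and union the results."""
    result = set()
    for binding in match(where, graph):
        for pattern in template:
            terms = [binding.get(t) if is_var(t) else t for t in pattern]
            # As in SPARQL, a template triple with an unbound variable is skipped.
            if None not in terms:
                result.add(tuple(terms))
    return result


grandfathers = construct(
    template=[("?p", ":hasGrandfather", "?g")],
    where=[
        ("?p", ":hasParent", "?parent"),
        ("?parent", ":hasParent", "?g"),
        ("?g", ":gender", ":male"),
    ],
    graph=DATA,
)

for triple in sorted(grandfathers):
    print(*triple, ".")
```

Collecting the results in a Python set mirrors the spec's description of the result as a single graph formed by set union: if two different bindings produced the same triple, it would appear only once.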
From the same little data file, we can generate triples about who is whose aunt:
PREFIX : <http://www.snee.com/ns/demo#>
CONSTRUCT { ?p :hasAunt ?aunt . }
WHERE {
  ?p      :hasParent ?parent .
  ?parent :hasParent ?g .
  ?aunt   :hasParent ?g ;         # ?aunt shares a parent with ?parent
          :gender    :female .
  FILTER (?parent != ?aunt)       # but is not ?p's own parent
}
With this query, ARQ constructs these triples:
@prefix : <http://www.snee.com/ns/demo#> .
:jane
      :hasAunt :joan .
:mike
      :hasAunt :gene .
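The same kind of sketch works for the aunt query. This version is again hypothetical Python rather than a description of how ARQ works: it assumes the same tuple transcription of the data, writes the WHERE clause as nested loops, and turns the FILTER into an ordinary condition. The helper names `objects`, `subjects`, and `construct_aunts` are made up for this illustration.

```python
# A minimal sketch, not how ARQ evaluates queries: the aunt query's WHERE
# clause as nested loops over (subject, predicate, object) tuples, with
# the FILTER written as an ordinary condition.

DATA = {
    (":jane", ":hasParent", ":gene"),
    (":gene", ":hasParent", ":pat"),
    (":gene", ":gender", ":female"),
    (":joan", ":hasParent", ":pat"),
    (":joan", ":gender", ":female"),
    (":pat", ":gender", ":male"),
    (":mike", ":hasParent", ":joan"),
}


def objects(graph, subject, predicate):
    """Every ?o such that (subject, predicate, ?o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Every ?s such that (?s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def construct_aunts(graph):
    """Build the triples that CONSTRUCT { ?p :hasAunt ?aunt } would return."""
    result = set()
    for person, predicate, parent in graph:
        if predicate != ":hasParent":                          # ?p :hasParent ?parent .
            continue
        for g in objects(graph, parent, ":hasParent"):         # ?parent :hasParent ?g .
            for aunt in subjects(graph, ":hasParent", g):      # ?aunt :hasParent ?g ;
                if (aunt, ":gender", ":female") not in graph:  #   :gender :female .
                    continue
                if parent != aunt:                             # FILTER (?parent != ?aunt)
                    result.add((person, ":hasAunt", aunt))
    return result


aunts = construct_aunts(DATA)
for triple in sorted(aunts):
    print(*triple, ".")
```

Without the `parent != aunt` check, each female parent would also match as her own child's aunt (:gene for :jane and :joan for :mike), which is exactly what the FILTER in the query rules out.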
This isn’t really creating new information, but the ability to make implicit information explicit can certainly add value to a system, especially when the rules necessary to assemble the pieces are more complicated than the ones shown above for identifying grandfathers and aunts.
How you use your newly constructed triples depends on how your SPARQL engine gives them to you. As we saw above, ARQ writes them out in Turtle syntax. TopQuadrant’s TopBraid Composer displays them in the window used for SPARQL query output, and after you select one or more of them, the “Assert selected constructed triples” menu choice adds them to the graph of triples that you’re currently working with. (This works in the free edition as well.)
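Asserting constructed triples matters most when one rule builds on another rule's output. The following Python sketch assumes that asserting simply means adding triples to the working graph, as described above. It splits the grandfather rule into two steps using :hasGrandparent, a hypothetical property that is not in the original data, and the function names are likewise made up for this illustration. Because the second rule finds nothing until the first rule's triples have been asserted, the sketch keeps applying both rules until no new triples appear.

```python
# A minimal sketch, assuming that "asserting" constructed triples just
# adds them to the working graph. :hasGrandparent is a hypothetical
# property made up for this illustration; it is not in the original data.

DATA = {
    (":jane", ":hasParent", ":gene"),
    (":gene", ":hasParent", ":pat"),
    (":gene", ":gender", ":female"),
    (":joan", ":hasParent", ":pat"),
    (":joan", ":gender", ":female"),
    (":pat", ":gender", ":male"),
    (":mike", ":hasParent", ":joan"),
}


def grandparent_rule(graph):
    """CONSTRUCT { ?p :hasGrandparent ?g }
    WHERE { ?p :hasParent ?x . ?x :hasParent ?g }"""
    return {
        (p, ":hasGrandparent", g)
        for p, pred1, x in graph if pred1 == ":hasParent"
        for x2, pred2, g in graph if pred2 == ":hasParent" and x2 == x
    }


def grandfather_rule(graph):
    """CONSTRUCT { ?p :hasGrandfather ?g }
    WHERE { ?p :hasGrandparent ?g . ?g :gender :male }"""
    return {
        (p, ":hasGrandfather", g)
        for p, pred, g in graph
        if pred == ":hasGrandparent" and (g, ":gender", ":male") in graph
    }


def assert_until_stable(graph, rules):
    """Apply every rule, assert the new triples, and repeat until none are new."""
    graph = set(graph)  # work on a copy so the caller's data is untouched
    while True:
        constructed = set()
        for rule in rules:
            constructed |= rule(graph)
        new = constructed - graph  # re-asserting an existing triple adds nothing
        if not new:
            return graph
        graph |= new


# On the original data alone, the second rule has nothing to work with.
print(len(grandfather_rule(DATA)))

final = assert_until_stable(DATA, [grandparent_rule, grandfather_rule])
for triple in sorted(final - DATA):
    print(*triple, ".")
```

Because a graph is a set, asserting a triple that is already there changes nothing, which is what lets the loop stop.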
CONSTRUCT provides a nice example of how SPARQL is more than a query language; along with extracting data using queries, you can create useful new data as well.
4 Comments
By Keith Fahlgren on September 10, 2009 1:04 AM
I also started turning to CONSTRUCT recently as a performance optimization. Rather than having to ask the server to build the “normal” huge serialization that the libraries expect, I just plucked out a tiny subset that I needed (and didn’t cross too many internal graph storage boundaries) and asked for a CONSTRUCT of that. The speedup wasn’t as huge as I’d hoped, but it was still a fruitful exercise.
By Simon Reinhardt on September 10, 2009 4:12 AM
You can see some more examples for the usefulness of CONSTRUCT for things like rules and views at http://spinrdf.org/spin.html and at http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/NetworkedGraphs .
By Bob DuCharme on September 10, 2009 9:28 AM
Believe me, I’ve been studying spinrdf.org for a few weeks now; it’s part of my job!
By Daniel Mekonnen on September 13, 2009 5:27 PM
Congratulations Bob, I think you are now seeing “the stars in the obelisk” to use a 2001 analogy (in the Arthur C. Clarke sense, not the actual year :).
In my own experience with SPARQL, I actually use CONSTRUCT, INSERT, and DELETE more than SELECT. This is very much a part of the process of semanticizing unlinked data sets from raw sources like Excel files, XML, XSD, CSV, RDBMS sources, and text dumps from PDF files. TopBraid Composer can import almost anything that you can point a URL at and bring it into a semantic representation.
But that’s just the starting point. The semantic representation that you get from the many import features is not necessarily going to be in the vocabulary that you are required to work with on a given project. This is where CONSTRUCT and friends come to the rescue, transforming triple patterns from one form into another. SPARQLMotion takes the process to another level, allowing you to pipeline a series of transformations together, and even merge multiple sources, into the representation that you need.
Very powerful stuff. I view the process as “data shaping” and the equivalent of “s/pattern A/pattern B/” from the regex world. I think people who enjoy writing regular expressions will find SPARQL very enjoyable, kind of like going from 1D to 2D pattern matching.