My command line OWL processor
With most of the credit going to Ivan Herman.
I recently asked on Twitter about the availability of command line OWL processors. I got some leads, but most would have required a little coding or integration work on my part. I decided that a small project that I did with the OWL-RL Python library a few years ago gave me a head start on just creating my own OWL command line processor in Python. It was pretty easy.
My goal was something that would read RDF files, do inferencing, and output any triples created by the inferencing. The heavy lifting is done by the OWL-RL library, which builds on the classic RDFLib Python library. The OWL-RL library was originally written by Ivan Herman and is now maintained by Ashley Sommer and Nicholas Car. (As you would guess from its name, this library implements the rule-based OWL profile known as OWL RL.) My script is short and simple enough that instead of putting it on GitHub I’ve just pasted it below.
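To try the script yourself, you’ll need the library installed; it’s available from the Python Package Index as owlrl, and installing it should pull in RDFLib as well:

pip install owlrl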
Testing it
In my recent blog posting You probably don’t need OWL, I wrote about an inferencing use case:
For example, in Trying Out Blazegraph (which only supports bits of OWL), I showed a dataset that had triples about various chairs and desks being located in various rooms, as well as triples about which rooms were in which buildings, but nothing about which furniture was in which buildings (or for that matter, what counted as furniture). I then used the rdfs:subClassOf property to declare that dm:Chair and dm:Desk were subclasses of dm:Furniture, and I also declared that my dm:locatedIn property was an owl:TransitiveProperty. With these additional modeling triples, a SPARQL query to an OWL processor that understood rdfs:subClassOf and owl:TransitiveProperty could then list which furniture was in which building. This little bit of OWL actually added some semantics to the model as well, because it tells us—and OWL processors—a little about the “meaning” of dm:locatedIn.
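Spelled out in Turtle, the modeling triples that the quoted passage describes look something like this (a sketch that assumes the dm: prefix and the standard rdf:, rdfs:, and owl: prefix declarations from that post):

dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk rdfs:subClassOf dm:Furniture .
dm:locatedIn rdf:type owl:TransitiveProperty .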
To try this example with my new command line processor, I didn’t even need to use SPARQL. I just stored the “Trying Out Blazegraph” sample data in a file called chairsAndTables.ttl and fed it to my script like this:
owl-rl-inferencing.py chairsAndTables.ttl
Here are the first three triples of the output:
<http://learningsparql.com/ns/data#chair15> a ns2:Furniture, ns1:Thing ;
    ns2:locatedIn <http://learningsparql.com/ns/data#building100> .
It inferred that chair 15 is an instance of the Furniture class (and of the Thing class) and that it’s in building 100. (The ns1 and ns2 prefixes are shorthands that RDFLib generated when serializing the output.) It also output triples about what buildings all the other chairs and tables were in, so I counted this as a successful test.
For another test, I was especially happy to see the script do the inferencing I expected from one particular example in my book Learning SPARQL. Example dataset ex424.ttl lists the name, instrument played, and birth state of six musicians without saying that any of them is a member of any class. Here are two examples:
d:m2 rdfs:label "Charlie Christian" ;
    dm:plays d:Guitar ;
    dm:stateOfBirth d:TX .

d:m4 rdfs:label "Kim Gordon" ;
    dm:plays d:Bass ;
    dm:stateOfBirth d:NY .
It also includes the following restriction class definitions, which specify conditions that qualify an instance as a member of the classes Guitarist, Texan, and TexasGuitarPlayer:
dm:Guitarist owl:equivalentClass
    [ rdf:type owl:Restriction ;
      owl:hasValue d:Guitar ;
      owl:onProperty dm:plays
    ] .

dm:Texan owl:equivalentClass
    [ rdf:type owl:Restriction ;
      owl:hasValue d:TX ;
      owl:onProperty dm:stateOfBirth
    ] .

dm:TexasGuitarPlayer owl:equivalentClass
    [ rdf:type owl:Class ;
      owl:intersectionOf (dm:Texan dm:Guitarist)
    ] .
To test my script’s ability to read different serializations, I split up ex424.ttl into ex424a.ttl, ex424b.nt, and ex424c.rdf before feeding them to the script like this:
owl-rl-inferencing.py ex424a.ttl ex424b.nt ex424c.rdf
The output included the following triples, so we know that it inferred that Charlie Christian was an instance of all three classes:
<http://learningsparql.com/ns/data#m2> a
    <http://learningsparql.com/ns/demo#Guitarist>,
    <http://learningsparql.com/ns/demo#Texan>,
    <http://learningsparql.com/ns/demo#TexasGuitarPlayer> .
It did not infer that resource m4, New York bassist Kim Gordon, was in any of these classes. It did infer that Texas piano player Red Garland was a Texan, but not a Guitarist or a TexasGuitarPlayer, and it inferred that native Californian Bonnie Raitt was a Guitarist but not a member of the other two classes.
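If you’d rather check results like these programmatically than scan the Turtle output, a few lines of RDFLib will do it. Here’s a minimal sketch, assuming that the original ex424.ttl is in the working directory and using the data# and demo# namespaces shown above:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF
import owlrl

D = Namespace("http://learningsparql.com/ns/data#")    # instance data
DM = Namespace("http://learningsparql.com/ns/demo#")   # model vocabulary

g = Graph()
g.parse("ex424.ttl", format="turtle")
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# Membership tests against the expanded graph:
print((D.m2, RDF.type, DM.TexasGuitarPlayer) in g)  # True: Charlie Christian
print((D.m4, RDF.type, DM.Guitarist) in g)          # False: Kim Gordon plays bass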
Combining this with other tools
The inferred triples may need some management after they’re materialized. If chair 15 gets moved from room 101 in building 100 to room 201 in building 200, we don’t want that inferred triple about it being in building 100 hanging around any more. Named graphs can help here, as I described in Living in a materialized world: Managing inferenced triples with named graphs. That post shows how RDFLib lets you pipeline a series of queries and updates, letting you combine simple and complex operations into sophisticated applications. The ability to do OWL inferencing can contribute a lot to these pipelines.
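Here’s a minimal sketch of that named graph approach using RDFLib’s Dataset class (the graph names are made up for the example): keep the asserted and the inferred triples in separate named graphs, and when the source data changes, drop and rebuild the inferred graph instead of hunting down individual stale triples.

from rdflib import Dataset, Graph, URIRef
import owlrl

# Hypothetical graph names for this example
ASSERTED = URIRef("http://example.com/graphs/asserted")
INFERRED = URIRef("http://example.com/graphs/inferred")

ds = Dataset()
asserted = ds.graph(ASSERTED)
asserted.parse("chairsAndTables.ttl", format="turtle")

# Expand a copy of the asserted triples, then keep only what's new.
expanded = Graph()
for triple in asserted:
    expanded.add(triple)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(expanded)

inferred = ds.graph(INFERRED)
for triple in expanded - asserted:
    inferred.add(triple)

# After chair 15 moves, update the asserted graph and then rebuild
# the inferred graph from scratch:
ds.remove_graph(inferred)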
Even without taking advantage of RDFLib’s pipelining ability at the Python code level, you can do some pipelining right from your operating system command line by sending the output of my owl-rl-inferencing.py script to an Apache Jena tool such as riot.
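For example, something like this should convert the inferred triples to N-Triples (an untested sketch; it assumes that riot is on your path and will read Turtle from standard input when given the --syntax argument and no filenames):

owl-rl-inferencing.py chairsAndTables.ttl | riot --syntax=turtle --output=ntriples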
Either way, I hope the script is useful to someone. Let me know!
The code
#!/usr/bin/env python3

# owl-rl-inferencing.py: read RDF files provided as command line
# arguments, do OWL RL inferencing, and output any new triples
# resulting from that.

import sys

import rdflib
import owlrl

if len(sys.argv) < 2:  # print directions
    print("Read RDF files, perform inferencing, and output the new triples.")
    print("Enter one or more .ttl, .nt, and .rdf filenames as arguments.")
    sys.exit()

inputGraph = rdflib.Graph()
graphToExpand = rdflib.Graph()

# Read the files. arg 0 is the script name, so don't parse that as RDF.
for filename in sys.argv[1:]:
    if filename.endswith(".ttl"):
        inputGraph.parse(filename, format="turtle")
    elif filename.endswith(".nt"):
        inputGraph.parse(filename, format="nt")
    elif filename.endswith(".rdf"):
        inputGraph.parse(filename, format="xml")
    else:
        print("# Filename " + filename + " doesn't end with .ttl, .nt, or .rdf.")

# Copy the input graph so that we can diff to identify new triples later.
for s, p, o in inputGraph:
    graphToExpand.add((s, p, o))

# Do the inferencing. See
# https://owl-rl.readthedocs.io/en/latest/stubs/owlrl.DeductiveClosure.html#owlrl.DeductiveClosure
# for other owlrl.* choices.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(graphToExpand)

newTriples = graphToExpand - inputGraph  # How cool is that?

# Output Turtle comments reporting on graph sizes.
print(f"# inputGraph: {len(inputGraph)} triples")
print(f"# graphToExpand: {len(graphToExpand)} triples")
print(f"# newTriples: {len(newTriples)} triples")

# Output the new triples. (decode() omits the "b''" wrapper; RDFLib
# versions before 6 return bytes from serialize().)
print(newTriples.serialize(format="turtle").decode())
Comments? Reply to my tweet announcing this blog entry.