Calling your own JavaScript functions from SPARQL queries
More Jena arq fun.
When I saw “Add support for scripting languages other than JavaScript” in the Jena 4.0.0 release notes, my first reaction was “What? I can run the arq command line SPARQL processor and call my own functions that I wrote in JavaScript?”
The ARQ - JavaScript SPARQL Functions page of the Jena documentation shows how to do this. I had some fun playing with this capability, and as you’ll see, it offers some easy opportunities to clean up and improve your data.
First, let’s see how it looks on the command line to run arq with a SPARQL query that calls external JavaScript functions. It’s basically a typical invocation of arq with an additional --set parameter to point at a file of JavaScript functions, which in this example is called myjs.js:
arq --set arq:js-library=myjs.js --query jstest.rq --data phoneNumbers.ttl
The data file that I used for my experiments simply lists a few people and their phone numbers. The v:homeTel values use several different conventions for notating US phone numbers:
@prefix v: <http://www.w3.org/2006/vcard/ns#> .
@prefix d: <http://learningsparql.com/ns/data#> .
d:i9771 v:given-name "Cindy" ;
        v:homeTel "1 (203) 446-5478" .

d:i0432 v:given-name "Richard" ;
        v:homeTel " (729)556-5135 " .

d:i8301 v:given-name "Craig" ;
        v:homeTel "9232765135" .

d:i8309 v:given-name "Leigh" ;
        v:homeTel "843-5544" .
The query in jstest.rq copies the triples and also does the following:
- Passes the v:homeTel value to a normalizeUSPhoneNumber() function that I wrote in the myjs.js file.
- Calls the createRating() function in the same JavaScript file and passes the result to the CONSTRUCT clause, which puts the generated value in a d:rating triple.
- Calls a JavaScript Date() function directly (as opposed to calling it via something in myjs.js) and assigns the returned value to an ?updateDate variable that also gets used in the CONSTRUCT clause.
Notice how all of the JavaScript function calls in the SPARQL query have a js: prefix that is declared at the top like any other prefix. This is how arq knows that these are external JavaScript functions.
# jstest.rq
PREFIX js: <http://jena.apache.org/ARQ/jsFunction#>
PREFIX d: <http://learningsparql.com/ns/data#>
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
CONSTRUCT {
  ?s v:given-name ?name ;
     v:homeTel ?normalizedUSPhoneNumber ;
     d:rating ?starRating ;
     d:as-of ?updateDate;
}
WHERE {
  ?s v:given-name ?name ;
     v:homeTel ?phoneNum .
  BIND (js:normalizeUSPhoneNumber(?phoneNum) AS ?normalizedUSPhoneNumber)
  BIND (js:createRating() AS ?starRating)
  BIND (js:Date() AS ?updateDate) # calling JavaScript function directly
}
The JavaScript file defines two functions, both mentioned above:
- normalizeUSPhoneNumber() uses regular expressions to convert the phone number to an nnn-nnn-nnnn format if it has an area code and nnn-nnnn if it doesn’t. SPARQL’s regex() function only returns a Boolean value to use in a FILTER expression, and while SPARQL 1.1’s REPLACE() function can do regex-based substitution, chaining several replacements with a length check is easier to read in JavaScript, so I wanted to write a function that would demonstrate that.
- createRating() generates a random integer between one and five to demonstrate how we can call the random() function to generate a number and then use other functions to massage that number into something we want.
// myjs.js
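function normalizeUSPhoneNumber(phoneNumber) {
    // Strip spaces, a leading country code of 1, hyphens, and parentheses,
    // then put a hyphen before the last four digits.
    phoneNumber = phoneNumber.replace(/ /g, "")
                             .replace(/^1/g,"")
                             .replace(/-/g,"")
                             .replace(/\(/g,"")
                             .replace(/\)/g,"")
                             .replace(/(\d\d\d\d$)/, "-$1");
    // With that hyphen in place, a number with an area code is 11 characters long.
    if (phoneNumber.length > 10) {
        phoneNumber = phoneNumber.replace(/^(\d\d\d)/,"$1-");
    }
    return phoneNumber;
}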
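function createRating() {
    // floor() + 1 always yields 1 to 5; ceil() alone would yield 0 if Math.random() returned exactly 0.
    return Math.floor(Math.random()*5) + 1;
}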
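Because myjs.js is plain JavaScript, a function like this can be tried out before arq is involved at all. The following is a minimal sketch, assuming Node.js (or another JavaScript runtime that provides console.log) is available. normalizeUSPhoneNumberCompact() is a hypothetical variation that is not part of the original myjs.js: it strips every non-digit character in one step and then formats by digit count. The loop at the end runs it against the four v:homeTel values from phoneNumbers.ttl.

```javascript
// Hypothetical alternative to normalizeUSPhoneNumber(); not in the original myjs.js.
function normalizeUSPhoneNumberCompact(phoneNumber) {
    // Keep only the digits.
    var digits = String(phoneNumber).replace(/\D/g, "");
    // Drop a leading country code of 1 from 11-digit numbers.
    if (digits.length === 11 && digits.charAt(0) === "1") {
        digits = digits.substring(1);
    }
    if (digits.length === 10) {
        return digits.replace(/^(\d{3})(\d{3})(\d{4})$/, "$1-$2-$3");
    }
    if (digits.length === 7) {
        return digits.replace(/^(\d{3})(\d{4})$/, "$1-$2");
    }
    // Anything else is returned unchanged rather than guessed at.
    return phoneNumber;
}

// The four v:homeTel values from phoneNumbers.ttl.
var samples = ["1 (203) 446-5478", " (729)556-5135 ", "9232765135", "843-5544"];
samples.forEach(function (s) {
    console.log(JSON.stringify(s) + " -> " + normalizeUSPhoneNumberCompact(s));
});
```

Returning unrecognized input unchanged is a choice made for this sketch; a cleanup job might prefer to flag such values instead.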
Running the command line shown with these files gives us this output:
@prefix d: <http://learningsparql.com/ns/data#> .
@prefix v: <http://www.w3.org/2006/vcard/ns#> .
@prefix js: <http://jena.apache.org/ARQ/jsFunction#> .
d:i9771 d:as-of       "Mon May 10 2021 08:02:35 GMT-0400 (EDT)" ;
        d:rating      3 ;
        v:given-name  "Cindy" ;
        v:homeTel     "203-446-5478" .

d:i8309 d:as-of       "Mon May 10 2021 08:02:35 GMT-0400 (EDT)" ;
        d:rating      5 ;
        v:given-name  "Leigh" ;
        v:homeTel     "843-5544" .

d:i0432 d:as-of       "Mon May 10 2021 08:02:35 GMT-0400 (EDT)" ;
        d:rating      4 ;
        v:given-name  "Richard" ;
        v:homeTel     "729-556-5135" .

d:i8301 d:as-of       "Mon May 10 2021 08:02:35 GMT-0400 (EDT)" ;
        d:rating      2 ;
        v:given-name  "Craig" ;
        v:homeTel     "923-276-5135" .
Running it more than once gives different values for d:rating each time, as I had hoped. (You always want to double-check that with random functions.)
I also wanted to demonstrate a filter condition with a function that takes multiple arguments and returns true or false. That’s easy enough to do, but I couldn’t think of a good one that would do something that I couldn’t do in SPARQL. The SPARQL version might take up multiple lines of the query and be more verbose, but comparing values in multiple variables to set a Boolean to true or false is straightforward in standard SPARQL without calling an external function.
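For reference, the general shape of such a function is easy to sketch even if this particular one could also be written in SPARQL. The names areaCode() and sameAreaCode() below are hypothetical and not part of the original myjs.js. The sketch assumes that arq hands string literals to JavaScript as native strings and turns a returned JavaScript boolean into an xsd:boolean, as the Jena documentation describes, so that a query could use it as FILTER (js:sameAreaCode(?phoneA, ?phoneB)).

```javascript
// Hypothetical additions to myjs.js; neither function is in the original file.

// Returns the three-digit area code, or null if the number has none.
function areaCode(phoneNumber) {
    var digits = String(phoneNumber).replace(/\D/g, "");
    // Drop a leading country code of 1 from 11-digit numbers.
    if (digits.length === 11 && digits.charAt(0) === "1") {
        digits = digits.substring(1);
    }
    return digits.length === 10 ? digits.substring(0, 3) : null;
}

// True only when both numbers have an area code and the codes match.
function sameAreaCode(phoneA, phoneB) {
    var a = areaCode(phoneA);
    return a !== null && a === areaCode(phoneB);
}
```

Two numbers without area codes compare as false here, on the theory that a missing area code says nothing about whether they match.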
Since writing this little demo I have already used this ability to call external JavaScript functions to clean up some data in another project the way I did with the phone numbers above. I had the SPARQL query above call js:Date() directly to show that we can call JavaScript functions directly from such queries. If I hadn’t wanted to show that, I would have had the query call a new function in the myjs.js file that called Date() and then used regular expressions or some other string manipulation tools to trim the returned date value down or convert it to ISO 8601 format. It would be another good example of how this ability to call external JavaScript functions from a SPARQL query makes the excellent library of native JavaScript functions available to a SPARQL developer.
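The ISO 8601 conversion mentioned above needs very little code, because JavaScript's Date object has a built-in toISOString() method. This is a minimal sketch; the function names isoNow() and isoToday() are hypothetical and not part of the original myjs.js.

```javascript
// Hypothetical additions to myjs.js; neither function is in the original file.

// Current date and time in ISO 8601 format, always in UTC,
// for example 2021-05-10T12:02:35.000Z.
function isoNow() {
    return new Date().toISOString();
}

// Just the date portion (YYYY-MM-DD, UTC) of the same value.
function isoToday() {
    return isoNow().substring(0, 10);
}
```

Assuming arq passes the returned string through as a plain literal, the query would use BIND (js:isoNow() AS ?updateDate) in place of the js:Date() call, and the d:as-of values would then sort and compare more predictably than the default Date() string.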