Running Spark GraphX algorithms on Library of Congress subject heading SKOS
Well, one algorithm, but a very cool one.
(This blog entry has also been published on the Databricks company blog.)
Some interesting possibilities for working together.
In Spark Is the New Black in IBM Data Magazine, I recently wrote about how popular the Apache Spark framework is for both Hadoop and non-Hadoop projects these days, and how for many people it goes so far as to replace one of Hadoop’s fundamental components: MapReduce. (I still have trouble writing “Spar” without writing “ql” after it.) While waiting for that piece to be copyedited, I came across 5 Reasons Why Spark Matters to Business by my old XML.com editor Edd…
Note: I wrote this blog entry to accompany the IBM Data Magazine piece mentioned in the first paragraph, so for people following the link from there, it goes into a little more detail on what RDF, triples, and SPARQL are than I normally would on this blog. I hope that readers already familiar with these standards will find the parts about doing the inferencing on a Hadoop cluster interesting.
Retrieve data from a SPARQL endpoint, graph it and more, then automate it.
In part 1 of this series, I discussed the history of R, the programming language and environment for statistical computing and graph generation, and why it’s become so popular lately. The many libraries that people have contributed to it are a key reason for its popularity, and the SPARQL one inspired me to learn some R to try it out. Part 1 showed how to load this library, retrieve a SPARQL result set, and perform some basic statistical analysis of the numbers in the result set. After I…
Or, R for RDF people.
R is a programming language and environment for statistical computing and graph generation that, despite being over 20 years old, has gotten hot lately because it’s an open-source, cross-platform tool that brings a lot to the world of Data Science, a recently popular field often associated with the analytics aspect of the drive towards Big Data. The large, active community around R has developed many add-on libraries, including one for working with data retrieved from SPARQL endpoints, so…