Parsing JSON with Python
My personal quick reference
It seems like every few months I have a project where I need to parse some JSON and pull out certain parts. Maybe the JSON came in JSON files, or maybe I retrieved it from an API. The gaps between these occasions are long enough that I’ve had to relearn some basics each time, so a year or two ago I made a sample JSON file that demonstrates a few data structures and features, and then I wrote a Python demo script that parses them. Now I look at that script to review the basics each time I need to do this.
I usually need to pull out a subset of that JSON and convert it to RDF triples. If it’s JSON-LD, I don’t need any Python parsing because it’s already an RDF serialization format, so I can feed it to any proper RDF parser as-is, but it’s rarely JSON-LD.
Another option is AtomGraph’s JSON2RDF. This converts any JSON at all to RDF, but if I only need a small subset of the data, I then need to write a SPARQL query to run against the JSON2RDF output so that I can pull out the parts that I want and convert them to the RDF classes and properties that I need. I would also have to build and install JSON2RDF on the platform where I’m running this, which was not an option on the server where I recently had to work with some JSON.
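For a small subset, plain Python can write out the triples directly. Here is a rough sketch of that idea using only the standard library: it turns the string-valued keys of a dictionary (the kind that json.load() returns) into N-Triples lines. The http://example.org/ namespace, the subject URI, and the escape_literal and to_ntriples helpers are all made up for illustration, and the escaping is simplified: it handles backslashes and double quotes but not newlines or other control characters.

```python
# Hypothetical namespace and subject URI, chosen only for this example.
NS = "http://example.org/vocab/"
SUBJECT = "http://example.org/person/1"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(subject, record):
    """Turn the string-valued keys of a flat dict into N-Triples lines."""
    lines = []
    for key, value in record.items():
        if isinstance(value, str):  # skip nested objects, numbers, nulls
            lines.append(f'<{subject}> <{NS}{key}> "{escape_literal(value)}" .')
    return lines


# The kind of dictionary json.load() returns for a JSON object.
entry = {
    "givenName": "Richard",
    "familyName": "Mutt",
    "address": {"city": "Springfield"},
}

for line in to_ntriples(SUBJECT, entry):
    print(line)
```

The nested address object is skipped here; a real conversion would need to decide how to model it, for example as a second subject with its own properties.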
My sample demo data to parse is pretty close to the test input that I used when I wrote about JSON2RDF:
{
  "mydata": {
    "color": "red",
    "amount": 3,
    "arrayTest": [
      "north",
      "south",
      "east",
      "escaped \"test\" string",
      "west"
    ],
    "boolTest": true,
    "nullTest": null,
    "addressBookEntry": {
      "givenName": "Richard",
      "familyName": "Mutt",
      "address": {
        "street": "1 Main St",
        "city": "Springfield",
        "zip": "10045"
      }
    }
  }
}
I read and output it with this Python:
#!/usr/bin/env python3
import json

# The with block closes the file automatically once the data is loaded.
with open('jsondemo.js') as f:
    data = json.load(f)

print(data["mydata"]["color"])
print(data["mydata"]["amount"])

# Pull something out of the middle of an array
print(data["mydata"]["arrayTest"][3])

# JSON true and null become Python True and None
print(data["mydata"]["boolTest"])
print(data["mydata"]["nullTest"])

# Use a boolean value
if data["mydata"]["boolTest"]:
    print("So boolean!")

# Dig down into a data structure
print(data["mydata"]["addressBookEntry"]["address"]["city"])

print("-- mydata properties: --")
for p in data["mydata"]:
    print(p)

print("-- list addressBookEntry property names and values: --")
for p in data["mydata"]["addressBookEntry"]:
    print(p + ': ' + str(data["mydata"]["addressBookEntry"][p]))

# Testing whether values are present.
if "familyName" in data["mydata"]["addressBookEntry"]:
    print("There is a family name value.")
else:
    print("There is no family name value.")

if "phone" in data["mydata"]["addressBookEntry"]:
    print("There is a phone value.")
else:
    print("There is no phone value.")
It has print statements and comments describing the demonstrated tasks, so I don’t need to describe them here. Here is the output:
red
3
escaped "test" string
True
None
So boolean!
Springfield
-- mydata properties: --
color
amount
arrayTest
boolTest
nullTest
addressBookEntry
-- list addressBookEntry property names and values: --
givenName: Richard
familyName: Mutt
address: {'street': '1 Main St', 'city': 'Springfield', 'zip': '10045'}
There is a family name value.
There is no phone value.
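Two related cases that the script does not cover are JSON that arrives as a string (for example, the body of an API response) instead of a file, and reading optional keys without an if/else. The following is a minimal sketch using json.loads() and dict.get(); the trimmed inline sample string and the "unknown" default value are assumptions for illustration.

```python
import json

# A trimmed-down copy of the sample data, as a string instead of a file.
raw = (
    '{"mydata": {"color": "red", "nullTest": null, '
    '"addressBookEntry": {"givenName": "Richard"}}}'
)

# loads() parses a str; load() reads from an open file object.
data = json.loads(raw)
entry = data["mydata"]["addressBookEntry"]

# get() returns None (or a supplied default) instead of raising KeyError.
phone = entry.get("phone")
phone_or_default = entry.get("phone", "unknown")

# A key that is present with a JSON null also comes back as None,
# so use "in" to tell a missing key from a null value.
null_value = data["mydata"].get("nullTest")
has_null_key = "nullTest" in data["mydata"]

print(phone, phone_or_default, null_value, has_null_key)
```

This prints None unknown None True. The last two values are the reason the "in" test from the demo script is still worth keeping around: get() alone cannot distinguish a missing key from one whose value is null.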
I hope that someday, when someone asks themselves, as I have asked myself every few months, “how do I deal with that little bit of JSON in Python again?”, this demo can save them a few minutes.
Comments? Reply to my Mastodon message or Bluesky post announcing this blog entry.