Parsing JSON with Python

My personal quick reference

It seems like every few months I have a project where I need to parse some JSON and pull out certain parts. Maybe the JSON came in JSON files, or maybe I retrieved it from an API. These occasions are far enough apart that I’ve had to relearn some basics each time, so a year or two ago I made a sample JSON file that demonstrates a few data structures and features, and then I wrote a Python demo script that parses them. Now I look at that script to review the basics each time I need to do this.
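
For those two sources, the standard library has two entry points: json.load() reads from an open file object, and json.loads() parses a string such as the body of an API response. Here is a minimal sketch of the difference. The sample text and the io.StringIO stand-in for a real file are assumptions made for illustration.

```python
import io
import json

# Illustrative sample text; a real project would get this from a file or an API.
sample_text = '{"color": "red", "amount": 3}'

# json.loads() parses a string (for example, an API response body).
from_string = json.loads(sample_text)

# json.load() reads from a file-like object; io.StringIO stands in for an
# open file here so that the example runs without anything on disk.
with io.StringIO(sample_text) as f:
    from_file = json.load(f)

print(from_string["color"])
print(from_file["amount"])
print(from_string == from_file)
```

Either way the result is the same ordinary Python data structure, so everything after the parsing step works identically.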

I usually need to pull out a subset of that JSON and convert it to RDF triples. If it’s JSON-LD, I don’t need any Python parsing because it’s already an RDF serialization format, so I can feed it to any proper RDF parser as-is, but it’s rarely JSON-LD.

Another option is AtomGraph’s JSON2RDF. This converts any JSON at all to RDF, but if I only need a small subset of the data, I then need to write a SPARQL query to run against the JSON2RDF output so that I can pull out the parts that I want and convert them to the RDF classes and properties that I need. I would also have to build and install JSON2RDF on the platform where I’m running this, which was not an option on the server where I recently had to work with some JSON.

My sample demo data to parse is pretty close to the test input that I used when I wrote about JSON2RDF:

{
    "mydata": {
        "color": "red",
        "amount": 3,
        "arrayTest": [
            "north",
            "south",
            "east",
            "escaped \"test\" string",
            "west"
        ],
        "boolTest": true,
        "nullTest": null,
        "addressBookEntry": {
            "givenName": "Richard",
            "familyName": "Mutt",
            "address": {
                "street": "1 Main St",
                "city": "Springfield",
                "zip": "10045"
            }
        }
    }
}
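
It may help to see which Python type each of these JSON values becomes once parsed. The following sketch inlines a trimmed copy of the sample (an assumption made so that it runs without the file) and prints the type name of each top-level value.

```python
import json

# A trimmed copy of the sample data above, inlined so the sketch is self-contained.
sample = """
{
    "mydata": {
        "color": "red",
        "amount": 3,
        "arrayTest": ["north", "south"],
        "boolTest": true,
        "nullTest": null,
        "addressBookEntry": {"givenName": "Richard"}
    }
}
"""

mydata = json.loads(sample)["mydata"]

# Collect the name of the Python type that each JSON value was converted to.
type_names = {name: type(value).__name__ for name, value in mydata.items()}

for name, type_name in type_names.items():
    print(name + ": " + type_name)
```

This prints str, int, list, bool, NoneType, and dict for the six properties, in that order: JSON objects become dictionaries, arrays become lists, true and false become True and False, and null becomes None.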

I read and output it with this Python:

#!/usr/bin/env python3
import json

# The with block closes the file automatically once the data is loaded.
with open('jsondemo.js') as f:
    data = json.load(f)

print(data["mydata"]["color"])
print(data["mydata"]["amount"])
# Pull something out of the middle of an array
print(data["mydata"]["arrayTest"][3])
print(data["mydata"]["boolTest"])
print(data["mydata"]["nullTest"])

# Use a boolean value
if data["mydata"]["boolTest"]:
    print("So boolean!")

# Dig down into a data structure
print(data["mydata"]["addressBookEntry"]["address"]["city"])

print("-- mydata properties: --")
for p in data["mydata"]:
    print(p)

print("-- list addressBookEntry property names and values: --")
for p in data["mydata"]["addressBookEntry"]:
    print(p + ': ' + str(data["mydata"]["addressBookEntry"][p]))

# Testing whether values are present.
if "familyName" in data["mydata"]["addressBookEntry"]:
    print("There is a family name value.")
else:
    print("There is no family name value.")

if "phone" in data["mydata"]["addressBookEntry"]:
    print("There is a phone value.")
else:
    print("There is no phone value.")

It has print statements and comments describing the demonstrated tasks, so I don’t need to describe them here. Here is the output:

red
3
escaped "test" string
True
None
So boolean!
Springfield
-- mydata properties: --
color
amount
arrayTest
boolTest
nullTest
addressBookEntry
-- list addressBookEntry property names and values: --
givenName: Richard
familyName: Mutt
address: {'street': '1 Main St', 'city': 'Springfield', 'zip': '10045'}
There is a family name value.
There is no phone value.
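
The RDF conversion step mentioned earlier could then be sketched along these lines. This is only a minimal illustration that writes N-Triples by hand: the example.org namespace, the subject URI, and the helper names literal and triple are all hypothetical, and the escaping covers only backslashes and double quotes. It also uses the dictionary .get() method as an alternative to the "in" test for properties that may be missing.

```python
import json

# Inlined copy of part of the sample data so the sketch runs on its own.
sample = """
{
    "mydata": {
        "addressBookEntry": {
            "givenName": "Richard",
            "familyName": "Mutt",
            "address": {"city": "Springfield"}
        }
    }
}
"""

# Hypothetical namespace and subject URI, for illustration only.
NS = "http://example.org/ns#"
SUBJECT = "http://example.org/id/entry1"


def literal(value):
    """Quote a value as a plain literal (minimal escaping only)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def triple(prop, value):
    """Build one N-Triples line about SUBJECT."""
    return "<" + SUBJECT + "> <" + NS + prop + "> " + literal(value) + " ."


entry = json.loads(sample)["mydata"]["addressBookEntry"]

triples = []
for prop in ("givenName", "familyName", "phone"):
    # .get() returns None instead of raising KeyError for a missing key.
    value = entry.get(prop)
    if value is not None:
        triples.append(triple(prop, value))

# Reach into the nested address object for one more value.
triples.append(triple("city", entry["address"]["city"]))

print("\n".join(triples))
```

Because the sample has no phone value, the sketch emits three triples (given name, family name, and city) and silently skips the phone. For anything beyond a quick extraction, a real RDF library would be a safer way to handle escaping and datatypes.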

I hope that someday, when someone asks themselves, as I have asked myself every few months, “how do I deal with that little bit of JSON in Python again?”, this demo can save them a few minutes.


Comments? Reply to my Mastodon message or Bluesky post announcing this blog entry.