Python & JSON: Working with large datasets using Pandas

Working with large JSON datasets can be a pain, particularly when they are too large to fit into memory. In cases like this, a combination of command line tools and Python can make for an efficient way to explore and analyze the data. In this post, we’ll look at how to leverage tools like Pandas to explore and map out police activity in Montgomery County, Maryland. We’ll start with a look at the JSON data, then segue into exploration and analysis of the JSON with Python.

When data is stored in SQL databases, it tends to follow a rigid structure that looks like a table. Here’s an example from a SQLite database:

id|code|name|area|area_land|area_water|population|population_growth|birth_rate|death_rate|migration_rate|created_at|updated_at 1|af|Afghanistan|652230|652230|0|32564342|2.32|38.57|13.89|1.51|2015-11-01 13:19:49.461734|2015-11-01 13:19:49.461734 2|al|Albania|28748|27398|1350|3029278|0.3|12.92|6.58|3.3|2015-11-01 13:19:54.431082|2015-11-01 13:19:54.431082 3|ag|Algeria|2381741|2381741|0|39542166|1.84|23.67|4.31|0.92|2015-11-01 13:19:59.961286|2015-11-01 13:19:59.961286

As you can see, the data consists of rows and columns, where each column maps to a defined property, like id, or code. In the dataset above, each row represents a country, and each column represents some fact about that country.

But as the amount of data we capture increases, we often don’t know the exact structure of the data at the time we store it. This is called unstructured data. A…