CSV to JSON stream converter in Python

I’ve been working with some governmental data that is available as huge (>50G) CSV files.
While there are workarounds for working with large files, I wanted to keep the stream processing I do with JSON files.
However, this was not a JSON file. Stream processing with CSV is hard. jq is so much easier.

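#!/usr/bin/env python3

import csv
import json
import sys

# Raise the csv module's per-field size limit (default 131072) so very large fields don't raise an error.
csv.field_size_limit(sys.maxsize)

# DictReader takes the first row as field names and yields one dict per following row.
for row in csv.DictReader(sys.stdin):
    sys.stdout.write(json.dumps(row) + "\n")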

This tiny Python script reads STDIN row by row, taking the field names from the header row, converting each following row from CSV to a JSON object and printing it on its own line.
I can then use my standard tooling to continue chewing on the file:

python /tmp/convert.py </tmp/big_file.csv \
| jq 'select(.type=="049" or .type=="048") | .url' -r \
| head -n20 \
| xargs wget
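
To see what the converter emits for each row, here is a minimal sketch of the same loop wrapped in a function and fed from an in-memory string instead of STDIN. The function name csv_to_json_lines and the sample rows are made up for illustration; only the type and url column names come from the jq filter above.

```python
import csv
import io
import json


def csv_to_json_lines(src, dst):
    """Write one JSON object per CSV row from src to dst; return the row count."""
    count = 0
    # DictReader consumes the first row as the header and uses it for the keys.
    for row in csv.DictReader(src):
        dst.write(json.dumps(row) + "\n")
        count += 1
    return count


# Hypothetical sample data, made up for illustration.
sample = io.StringIO(
    'type,url\n'
    '049,http://example.com/a\n'
    '048,"http://example.com/b,c"\n'
)
out = io.StringIO()
rows_written = csv_to_json_lines(sample, out)
print(out.getvalue(), end="")
```

Every value comes out as a JSON string, because the csv module does no type conversion. That is why the jq filter compares .type against "049" rather than a number, and it also keeps leading zeros intact.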

Nitzan
