CSV to JSON stream converter in Python


I’ve been working with some governmental data that is published as huge (>50 GB) CSV files.
While there are workarounds for handling files that large, I wanted to keep the stream processing I already do with JSON files.
These files were CSV, though, not JSON. Stream processing CSV with shell tools is hard; with JSON and jq it is so much easier.

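#!/usr/bin/env python3
# Convert CSV on STDIN to one JSON object per line on STDOUT.

import csv
import json
import sys

# Raise the per-field size limit so very large fields do not abort parsing.
csv.field_size_limit(sys.maxsize)

# The header row supplies the keys; each following record becomes one JSON object.
for row in csv.DictReader(sys.stdin):
    sys.stdout.write(json.dumps(row) + "\n")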

This tiny Python script reads STDIN record by record, takes the keys from the header row, converts each record from CSV to a JSON object, and prints it on its own line.
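
To make that concrete, here is a minimal sketch that runs the same loop over an in-memory sample instead of STDIN. The `convert` helper, the sample rows, and the `note` column are assumptions for illustration only; `type` and `url` simply mirror the field names used in the jq filter further down.

```python
import csv
import io
import json


def convert(src, dst):
    """Write one JSON object per CSV record read from src (hypothetical helper)."""
    for row in csv.DictReader(src):
        dst.write(json.dumps(row) + "\n")


# Made-up sample data; the first record contains a quoted line break.
sample = io.StringIO(
    'type,url,note\n'
    '049,http://example.com/a,"first line\nsecond line"\n'
    '050,http://example.com/b,plain\n'
)
out = io.StringIO()
convert(sample, out)

# Each CSV record ends up on exactly one output line, because json.dumps
# escapes the embedded line break.
lines = out.getvalue().splitlines()
```

Note that `csv.DictReader` does no type conversion, so every value arrives as a string and a code like `049` keeps its leading zero. That is why the jq filter compares `.type` against quoted strings.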
I can then use my standard tooling to continue chewing on the file:

python3 /tmp/convert.py </tmp/big_file.csv \
| jq 'select(.type=="049" or .type=="048") | .url' -r \
| head -n20 \
| xargs wget
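
One side effect of `head -n20` is that it exits long before the CSV has been fully read. The closed pipe then propagates back upstream, and Python reports it as a `BrokenPipeError` on write. The following is only a sketch of how the conversion loop could stop quietly in that case; `convert` and the `ClosedAfter` test double are hypothetical names, not part of the script above.

```python
import csv
import io
import json


def convert(src, dst):
    """Stream CSV records from src to dst as JSON lines.

    Returns False if the consumer went away mid-stream (hypothetical helper).
    """
    try:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
        dst.flush()
    except BrokenPipeError:
        return False
    return True


class ClosedAfter(io.StringIO):
    """Stand-in for a pipe whose reader exits after `limit` writes."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.count = 0

    def write(self, text):
        if self.count >= self.limit:
            raise BrokenPipeError
        self.count += 1
        return super().write(text)


# 100 made-up records, but the consumer only accepts 20 of them.
rows = io.StringIO("type,url\n" + "049,http://example.com/x\n" * 100)
sink = ClosedAfter(limit=20)
finished = convert(rows, sink)
```

In a real script this would be called as `convert(sys.stdin, sys.stdout)`. When it returns `False`, the Python documentation's note on SIGPIPE suggests pointing STDOUT at `os.devnull` before exiting, so the interpreter's final flush does not raise again.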
