Skip to content

Parquet and DuckDB

Export SNOMED CT to Apache Parquet for analytics with DuckDB, pandas, Polars, R, or Spark.


Parquet for Analytics

Docs: sct parquet

sct parquet --input snomed-uk-20250301.ndjson --output snomed.parquet

# ~5 s for 831k concepts → 824 MB

Query with DuckDB

Install DuckDB: https://duckdb.org/install/

Then run queries directly on the Parquet file:

duckdb -c "
  SELECT hierarchy, COUNT(*) AS n
  FROM 'snomed.parquet'
  GROUP BY hierarchy
  ORDER BY n DESC
  LIMIT 10"

Docs: For more DuckDB examples, see the sct parquet documentation