
Benchmarks

Timing measurements for sct commands run against two SNOMED CT editions:

- UK Monolith: SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z (831,132 active concepts)
- UK Clinical: SnomedCT_UKClinicalRF2_PRODUCTION_20260311T000001Z (34,553 active concepts)

Machine: Lenovo Yoga 9i Pro — Intel Core Ultra 9 185H (16 cores), 64 GB RAM, NVMe SSD.


Methodology

Each command was timed with time (wall clock) on a warm filesystem cache: the second run, after the OS page cache is populated. Disk is NVMe SSD. NB: a first, cold run will be slower due to filesystem and page-cache effects.

time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z/
time sct sqlite  --input snomed.ndjson
time sct parquet --input snomed.ndjson
time sct markdown --input snomed.ndjson
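The warm-run methodology can be sketched in a few lines: run the command twice and keep the second measurement. A minimal sketch, using /bin/sleep as a stand-in for sct (which is assumed installed in the real runs above):

```python
import subprocess
import time

def wall_time(cmd):
    """Run cmd to completion and return wall-clock seconds, like `time`."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Run twice and keep the second (warm-cache) measurement,
# mirroring the methodology above. `sleep` stands in for sct.
cmd = ["sleep", "0.05"]
cold = wall_time(cmd)
warm = wall_time(cmd)
print(f"cold={cold:.3f}s warm={warm:.3f}s")
```

In the real benchmarks, the gap between the two runs is the page-cache effect noted above.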

Results — UK Monolith Edition (831,132 concepts)

| Command | Concepts | Output size | Wall time | Notes |
|---|---|---|---|---|
| sct ndjson | 831,132 | 990 MB | 29.6 s | RF2 parsing + join + sort + serialise |
| sct sqlite | 831,132 | 1.3 GB | 11.3 s | Stream NDJSON → WAL SQLite + FTS5 rebuild |
| sct parquet | 831,132 | 824 MB | 5.2 s | Batched Arrow writes (50k rows/batch) |
| sct markdown | 831,132 | 3.2 GB | 14.5 s | One file per concept (831k files) |

Results — UK Clinical Edition (34,553 concepts)

| Command | Concepts | Output size | Wall time | Notes |
|---|---|---|---|---|
| sct ndjson | 34,553 | 20 MB | 0.78 s | RF2 parsing + join + sort + serialise |
| sct sqlite | 34,553 | 24 MB | 0.27 s | Stream NDJSON → WAL SQLite + FTS5 rebuild |
| sct parquet | 34,553 | 12 MB | 0.11 s | Batched Arrow writes (50k rows/batch) |
| sct markdown | 34,553 | 137 MB | 0.49 s | One file per concept (34k files) |
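To put the Monolith figures in per-concept terms, the table values can be converted directly (a back-of-envelope sketch, treating MB as 10^6 bytes and GB as 10^9; sizes and times copied from the Monolith table above):

```python
concepts = 831_132
sizes_mb = {"ndjson": 990, "sqlite": 1300, "parquet": 824, "markdown": 3200}
times_s = {"ndjson": 29.6, "sqlite": 11.3, "parquet": 5.2, "markdown": 14.5}

for cmd in sizes_mb:
    # Output bytes per concept and concepts processed per second.
    bytes_per_concept = sizes_mb[cmd] * 1e6 / concepts
    rate = concepts / times_s[cmd]
    print(f"sct {cmd}: {bytes_per_concept:.0f} B/concept, {rate:,.0f} concepts/s")
```

By this measure parquet is the densest format at roughly 1 KB/concept, while markdown trades size for one browsable file per concept.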

MCP server startup time

The sct mcp server must start in under 100 ms to be usable in Claude Desktop without a perceptible delay.

time echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  | (stdbuf -o0 sct mcp --db snomed.db & sleep 0.3; kill %1) 2>/dev/null

Result on the Monolith database (1.3 GB SQLite):

{"id":1,"jsonrpc":"2.0","result":{"capabilities":{"tools":{}},"protocolVersion":"2024-11-05","serverInfo":{"name":"sct-mcp","version":"0.2.0"}}}

The response appears in under 5 ms, well within the 100 ms budget. The sleep 0.3 in the timing harness dominates the wall-clock total; the actual server response latency is sub-millisecond once the process is up.
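The same round-trip measurement can be done without shell job control by talking to the server over stdio and timing the first response. A minimal sketch, using cat as a stand-in for sct mcp (cat echoes the request back, which is enough to exercise the timing path):

```python
import json
import subprocess
import time

# `cat` stands in for `sct mcp --db snomed.db` as a stdio JSON-RPC
# endpoint: it echoes the request line back, closing the timing loop.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
req = json.dumps({"jsonrpc": "2.0", "id": 1,
                  "method": "initialize", "params": {}})

start = time.perf_counter()
proc.stdin.write(req + "\n")
proc.stdin.flush()
resp = json.loads(proc.stdout.readline())
elapsed_ms = (time.perf_counter() - start) * 1000
proc.terminate()

print(f"round trip {elapsed_ms:.2f} ms, id={resp['id']}")
```

Against the real server, the time from writing the initialize request to reading the result line is the startup latency that matters to Claude Desktop.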


How to benchmark yourself

sct ndjson

--rf2 accepts either an RF2 directory or a .zip file directly:

# Using a zip file
time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip

# Using a pre-extracted directory (warm the page cache first for a fair comparison)
find ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z -type f -exec cat {} + > /dev/null 2>&1
time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z/

sct sqlite

time sct sqlite --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed.db
ls -lh snomed.db

Verify FTS works:

sqlite3 snomed.db "SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"
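The same MATCH query can be tried without building the full database, using an in-memory FTS5 table. A minimal sketch: the concepts_fts schema (id plus preferred_term) is assumed from the query above, and the three sample concepts are real SNOMED CT codes used here only as illustration:

```python
import sqlite3

# In-memory stand-in for snomed.db with the assumed concepts_fts layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE concepts_fts USING fts5(id, preferred_term)")
db.executemany("INSERT INTO concepts_fts VALUES (?, ?)", [
    ("22298006", "Myocardial infarction"),
    ("57054005", "Acute myocardial infarction"),
    ("80891009", "Heart structure"),
])

# FTS5 MATCH tokenizes and lowercases by default, so this finds both
# infarction concepts but not "Heart structure".
rows = db.execute(
    "SELECT id, preferred_term FROM concepts_fts "
    "WHERE concepts_fts MATCH 'myocardial' LIMIT 5").fetchall()
print(rows)
```

This requires a Python build whose bundled SQLite includes FTS5, which is the norm for current CPython releases.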

sct parquet

time sct parquet --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed.parquet
ls -lh snomed.parquet

Verify DuckDB can read it:

duckdb -c "SELECT hierarchy, COUNT(*) n FROM 'snomed.parquet' GROUP BY hierarchy ORDER BY n DESC LIMIT 5"
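If DuckDB isn't to hand, the same GROUP BY can be run over the NDJSON with the standard library alone. A sketch under one assumption: that each NDJSON line carries a hierarchy field, as the DuckDB query above implies (the inline sample lines are fabricated for illustration; in practice, iterate over the real file):

```python
import json
from collections import Counter

# Stand-in for reading snomed.ndjson line by line; the `hierarchy`
# field is assumed from the DuckDB query above.
lines = [
    '{"id": "22298006", "hierarchy": "disorder"}',
    '{"id": "80891009", "hierarchy": "body structure"}',
    '{"id": "57054005", "hierarchy": "disorder"}',
]

counts = Counter(json.loads(line)["hierarchy"] for line in lines)
for hierarchy, n in counts.most_common(5):
    print(hierarchy, n)
```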

sct markdown

time sct markdown --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed-concepts/
du -sh snomed-concepts/
find snomed-concepts/ -name "*.md" | wc -l
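The one-file-per-concept layout checked above can be sketched in a few lines. Assumptions to flag: the <id>.md naming and the file contents here are illustrative, not sct's actual output format, and the two sample concepts are hypothetical stand-ins for the NDJSON input:

```python
import pathlib
import tempfile

# Stand-in for the parsed NDJSON stream.
concepts = [
    {"id": "22298006", "preferred_term": "Myocardial infarction"},
    {"id": "80891009", "preferred_term": "Heart structure"},
]

# One markdown file per concept, named by SCTID.
out = pathlib.Path(tempfile.mkdtemp())
for c in concepts:
    path = out / f"{c['id']}.md"
    path.write_text(f"# {c['preferred_term']}\n\nSCTID: {c['id']}\n")

print(sorted(p.name for p in out.glob("*.md")))
```

At Monolith scale this is 831k small files, which is why the markdown output is by far the largest on disk and why file-count checks like the find pipeline above are the quickest sanity test.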