
Benchmarks

Timing measurements for sct commands run against two SNOMED CT editions:

- UK Monolith: SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z (831,132 active concepts)
- UK Clinical: SnomedCT_UKClinicalRF2_PRODUCTION_20260311T000001Z (34,553 active concepts)

Machine: Lenovo Yoga 9i Pro — Intel Core Ultra 9 185H (16 cores), 64 GB RAM, NVMe SSD.


Methodology

Each command was timed with time (wall clock) on a warm filesystem cache: the second run, after the OS page cache is populated. Disk is NVMe SSD. NB: a first, cold run will be slower due to filesystem and page-cache effects.

time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z/
time sct sqlite  --input snomed.ndjson
time sct parquet --input snomed.ndjson
time sct markdown --input snomed.ndjson
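The warm-run methodology can be sketched in a few lines: run the command twice and keep the second measurement. A minimal sketch, using /bin/sleep as a stand-in for sct (which is assumed installed in the real runs above):

```python
import subprocess
import time

def wall_time(cmd):
    """Run cmd to completion and return wall-clock seconds, like `time`."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Run twice and keep the second (warm-cache) measurement,
# mirroring the methodology above. `sleep` stands in for sct.
cmd = ["sleep", "0.05"]
cold = wall_time(cmd)
warm = wall_time(cmd)
print(f"cold={cold:.3f}s warm={warm:.3f}s")
```

In the real benchmarks, the gap between the two runs is the page-cache effect noted above.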

Results — UK Monolith Edition (831,132 concepts)

| Command | Concepts | Output size | Wall time | Notes |
|---|---|---|---|---|
| sct ndjson | 831,132 | 990 MB | 29.6 s | RF2 parsing + join + sort + serialise |
| sct sqlite | 831,132 | 1.3 GB | 11.3 s | Stream NDJSON → WAL SQLite + FTS5 rebuild |
| sct parquet | 831,132 | 824 MB | 5.2 s | Batched Arrow writes (50k rows/batch) |
| sct markdown | 831,132 | 3.2 GB | 14.5 s | One file per concept (831k files) |

Results — UK Clinical Edition (34,553 concepts)

| Command | Concepts | Output size | Wall time | Notes |
|---|---|---|---|---|
| sct ndjson | 34,553 | 20 MB | 0.78 s | RF2 parsing + join + sort + serialise |
| sct sqlite | 34,553 | 24 MB | 0.27 s | Stream NDJSON → WAL SQLite + FTS5 rebuild |
| sct parquet | 34,553 | 12 MB | 0.11 s | Batched Arrow writes (50k rows/batch) |
| sct markdown | 34,553 | 137 MB | 0.49 s | One file per concept (34k files) |
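To put the Monolith figures in per-concept terms, the table values can be converted directly (a back-of-envelope sketch, treating MB as 10^6 bytes and GB as 10^9; sizes and times copied from the Monolith table above):

```python
concepts = 831_132
sizes_mb = {"ndjson": 990, "sqlite": 1300, "parquet": 824, "markdown": 3200}
times_s = {"ndjson": 29.6, "sqlite": 11.3, "parquet": 5.2, "markdown": 14.5}

for cmd in sizes_mb:
    # Output bytes per concept and concepts processed per second.
    bytes_per_concept = sizes_mb[cmd] * 1e6 / concepts
    rate = concepts / times_s[cmd]
    print(f"sct {cmd}: {bytes_per_concept:.0f} B/concept, {rate:,.0f} concepts/s")
```

By this measure parquet is the densest format at roughly 1 KB/concept, while markdown trades size for one browsable file per concept.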

MCP server startup time

The sct mcp server must start in under 100 ms to be usable in Claude Desktop without a perceptible delay.

time echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  | (stdbuf -o0 sct mcp --db snomed.db & sleep 0.3; kill %1) 2>/dev/null

Result on the Monolith database (1.3 GB SQLite):

{"id":1,"jsonrpc":"2.0","result":{"capabilities":{"tools":{}},"protocolVersion":"2024-11-05","serverInfo":{"name":"sct-mcp","version":"0.2.0"}}}

The response appears in under 5 ms, well within the 100 ms budget. The sleep 0.3 in the timing harness dominates the wall-clock total; the actual server response latency is sub-millisecond once the process is up.
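The same round-trip measurement can be done without shell job control by talking to the server over stdio and timing the first response. A minimal sketch, using cat as a stand-in for sct mcp (cat echoes the request back, which is enough to exercise the timing path):

```python
import json
import subprocess
import time

# `cat` stands in for `sct mcp --db snomed.db` as a stdio JSON-RPC
# endpoint: it echoes the request line back, closing the timing loop.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
req = json.dumps({"jsonrpc": "2.0", "id": 1,
                  "method": "initialize", "params": {}})

start = time.perf_counter()
proc.stdin.write(req + "\n")
proc.stdin.flush()
resp = json.loads(proc.stdout.readline())
elapsed_ms = (time.perf_counter() - start) * 1000
proc.terminate()

print(f"round trip {elapsed_ms:.2f} ms, id={resp['id']}")
```

Against the real server, the time from writing the initialize request to reading the result line is the startup latency that matters to Claude Desktop.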


How to benchmark yourself

sct ndjson

--rf2 accepts either an RF2 directory or a .zip file directly:

# Using a zip file
time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip

# Using a pre-extracted directory (warm the page cache first for a fair comparison)
find ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z -type f -exec cat {} + > /dev/null 2>&1
time sct ndjson --rf2 ~/downloads/SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z/

sct sqlite

time sct sqlite --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed.db
ls -lh snomed.db

Verify FTS works:

sqlite3 snomed.db "SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"
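The same MATCH query can be tried without building the full database, using an in-memory FTS5 table. A minimal sketch: the concepts_fts schema (id plus preferred_term) is assumed from the query above, and the three sample concepts are real SNOMED CT codes used here only as illustration:

```python
import sqlite3

# In-memory stand-in for snomed.db with the assumed concepts_fts layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE concepts_fts USING fts5(id, preferred_term)")
db.executemany("INSERT INTO concepts_fts VALUES (?, ?)", [
    ("22298006", "Myocardial infarction"),
    ("57054005", "Acute myocardial infarction"),
    ("80891009", "Heart structure"),
])

# FTS5 MATCH tokenizes and lowercases by default, so this finds both
# infarction concepts but not "Heart structure".
rows = db.execute(
    "SELECT id, preferred_term FROM concepts_fts "
    "WHERE concepts_fts MATCH 'myocardial' LIMIT 5").fetchall()
print(rows)
```

This requires a Python build whose bundled SQLite includes FTS5, which is the norm for current CPython releases.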

sct parquet

time sct parquet --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed.parquet
ls -lh snomed.parquet

Verify DuckDB can read it:

duckdb -c "SELECT hierarchy, COUNT(*) n FROM 'snomed.parquet' GROUP BY hierarchy ORDER BY n DESC LIMIT 5"
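If DuckDB isn't to hand, the same GROUP BY can be run over the NDJSON with the standard library alone. A sketch under one assumption: that each NDJSON line carries a hierarchy field, as the DuckDB query above implies (the inline sample lines are fabricated for illustration; in practice, iterate over the real file):

```python
import json
from collections import Counter

# Stand-in for reading snomed.ndjson line by line; the `hierarchy`
# field is assumed from the DuckDB query above.
lines = [
    '{"id": "22298006", "hierarchy": "disorder"}',
    '{"id": "80891009", "hierarchy": "body structure"}',
    '{"id": "57054005", "hierarchy": "disorder"}',
]

counts = Counter(json.loads(line)["hierarchy"] for line in lines)
for hierarchy, n in counts.most_common(5):
    print(hierarchy, n)
```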

sct markdown

time sct markdown --input snomedct-monolithrf2-production-20260311t120000z.ndjson --output snomed-concepts/
du -sh snomed-concepts/
find snomed-concepts/ -name "*.md" | wc -l
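The one-file-per-concept layout checked above can be sketched in a few lines. Assumptions to flag: the <id>.md naming and the file contents here are illustrative, not sct's actual output format, and the two sample concepts are hypothetical stand-ins for the NDJSON input:

```python
import pathlib
import tempfile

# Stand-in for the parsed NDJSON stream.
concepts = [
    {"id": "22298006", "preferred_term": "Myocardial infarction"},
    {"id": "80891009", "preferred_term": "Heart structure"},
]

# One markdown file per concept, named by SCTID.
out = pathlib.Path(tempfile.mkdtemp())
for c in concepts:
    path = out / f"{c['id']}.md"
    path.write_text(f"# {c['preferred_term']}\n\nSCTID: {c['id']}\n")

print(sorted(p.name for p in out.glob("*.md")))
```

At Monolith scale this is 831k small files, which is why the markdown output is by far the largest on disk and why file-count checks like the find pipeline above are the quickest sanity test.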