sct Walkthrough
A hands-on tour of the sct SNOMED-CT local-first toolchain.
0 — What is sct?
sct is a single Rust binary that transforms a SNOMED CT RF2 release into a set of
queryable, offline-first artefacts. No server required. No bloody Java.
It was initially created as an experiment in file-based data handling, offline-first tooling, and learning about the structure of SNOMED, but it turns out it's pretty fast and useful too, so I'm gradually adding features with the aim of creating something genuinely useful for practitioners, informaticians, and researchers working with SNOMED CT.
Data map:
SNOMED RF2 release
       │
       ▼
  sct ndjson            ← build once per release (~30 s for 831k concepts)
       │
       ├──▶ sct sqlite    → snomed.db                 SQL + full-text search
       │         │
       │         └──▶ sct tct → snomed.db (+TCT)      O(1) subsumption queries (optional)
       ├──▶ sct parquet   → snomed.parquet            analytics with DuckDB / pandas
       ├──▶ sct markdown  → snomed-concepts/          one file per concept (RAG)
       └──▶ sct embed     → snomed-embeddings.arrow   semantic vector search
                 │
            sct mcp                                   AI tool use via Claude
Key sct design principles
- Offline - everything happens on your local machine, no network calls or external servers required
- Deterministic - same RF2 + locale always produces identical output files, which can be version-controlled, diffed, and audited.
- File based - each artefact is a single portable file (or directory of files) that can be copied, versioned, and used with standard tools. No custom server or API needed at query time.
- No special tools required - query the SQLite database with sqlite3, do analytics with DuckDB or pandas, search with jq or ripgrep, read concept details in VSCode or a Markdown viewer, or use the MCP server to integrate with LLMs like Claude.
1 — Installation
git clone https://github.com/pacharanero/sct.git
cd sct
cargo install --path . --features "tui gui"
We're working on packaging binaries for the usual distribution channels (Homebrew, PyPI, etc.) but for now you need Rust and Cargo to build from source. Feedback in Issues will help us decide which platforms and formats to prioritise for pre-built binaries.
Verify installation:
sct --version
# sct 0.3.7
Optionally, you can generate shell completions for your shell at this point.
2 — Get SNOMED RF2 Data
SNOMED CT is distributed as RF2 (Release Format 2) — a set of TSV files.
- UK edition (recommended): Download the UK Monolith from NHS TRUD
- Includes: International release + UK clinical extension + dm+d drugs extension
- Monolith (item 105) is preferred over the Clinical Edition (item 101) — it's a single zip with everything pre-merged
- International edition: Download from SNOMED MLDS
- IPS Free Set: Available without affiliate membership from SNOMED MLDS
sct accepts ZIP files or extracted directories.
Confused by the NHS TRUD download options? See UK Edition structure for a plain-English guide to the different release types, what's in each zip, and how to decode the filenames.
3 — Build the NDJSON Artefact
The first step is always sct ndjson. This joins the RF2 tables and produces the
canonical intermediate artefact that everything else is built from.
Docs: sct ndjson
sct ndjson --rf2 .downloads/uk_sct2mo_41.6.0_20260311000001Z.zip \
--output snomed.ndjson
# ~30 s for 831k concepts → snomed.ndjson (1.1 GB)
If you pass it a .zip it will automatically extract and parse the RF2 files within. If you pass it a directory containing extracted RF2 files, it will parse them directly.
The output is a single .ndjson file — one JSON object per line, each representing a SNOMED concept with all its details (ID, preferred term, synonyms, hierarchy, relationships, attributes, etc.)
Testing on my laptop, this takes about 30 seconds for the UK Monolith release with 831k active concepts. The resulting NDJSON file is about 1.1 GB. Incredibly, because NDJSON is easier to handle in memory than JSON, you can load the whole 1.1 GB file into VSCode (takes less than 5 seconds) and play around with it there, great for getting to understand what data is available and how it's structured.
You can now query the NDJSON file with jq or any tool that can handle line-delimited JSON. For example, to get the full details of Myocardial infarction (disorder):
jq 'select(.id == "22298006")' snomed.ndjson
Which should return something similar to the below:
{
"id": "22298006",
"fsn": "Myocardial infarction (disorder)",
"preferred_term": "Myocardial infarction",
"synonyms": [
"Infarction of heart",
"Cardiac infarction",
"Heart attack",
"Myocardial infarct",
"MI - myocardial infarction"
],
"hierarchy": "Clinical finding",
"hierarchy_path": [
"SNOMED CT Concept",
"Clinical finding",
"Finding of trunk structure",
"Finding of upper trunk",
"Finding of thoracic region",
"Disorder of thorax",
"Disorder of mediastinum",
"Heart disease",
"Structural disorder of heart",
"Myocardial lesion",
"Myocardial necrosis",
"Myocardial infarction"
],
"parents": [
{
"id": "251061000",
"fsn": "Myocardial necrosis (disorder)"
},
{
"id": "414545008",
"fsn": "Ischemic heart disease (disorder)"
}
],
"children_count": 14,
"active": true,
"module": "900000000000207008",
"effective_time": "20020131",
"attributes": {
"associated_morphology": [
{
"id": "55641003",
"fsn": "Infarct (morphologic abnormality)"
}
],
"finding_site": [
{
"id": "74281007",
"fsn": "Myocardium structure (body structure)"
}
]
},
"ctv3_codes": [
"X200E"
],
"read2_codes": [],
"schema_version": 2
}
The NDJSON artefact is the stable interface. Version-controlled. Copyable. Diffable. (Remember though, it is still copyright SNOMED International and subject to the SNOMED CT licence terms, so don't share it publicly.)
Here are some examples of mini-queries you can run directly on the NDJSON file with jq or grep:
Count all Procedures:
grep '"hierarchy":"Procedure"' snomed.ndjson | wc -l
Get all concepts with "heart" in the preferred term:
jq 'select(.preferred_term | test("heart"; "i")) | {id, preferred_term}' snomed.ndjson
Get a concept via CTV3 code (UK edition only):
jq 'select(.ctv3_codes | index("X200E"))' snomed.ndjson
Programmatic access
You can read the NDJSON file from any language that can parse JSON, e.g. Python.
This example prints all concepts that have CTV3 codes, with their preferred term and list of CTV3 codes:
import json

with open('snomed.ndjson') as f:
    for line in f:
        rec = json.loads(line)
        if rec['ctv3_codes']:
            print(f"{rec['id']}\t{rec['preferred_term']}\t{rec['ctv3_codes']}")
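If you need repeated lookups rather than a single pass, one option is to build a small in-memory index first. The sketch below is only an illustration: `index_by_ctv3` is a hypothetical helper, and the two records are toy stand-ins for lines of snomed.ndjson with most fields trimmed (the second record is a made-up placeholder).

```python
import io
import json

# Toy stand-ins for lines of snomed.ndjson (most fields trimmed).
# The second record is a made-up placeholder, not a real concept.
SAMPLE = io.StringIO(
    '{"id": "22298006", "preferred_term": "Myocardial infarction", "ctv3_codes": ["X200E"]}\n'
    '{"id": "0000000", "preferred_term": "Placeholder without CTV3 mapping", "ctv3_codes": []}\n'
)

def index_by_ctv3(lines):
    """Map each CTV3 code to the list of concept ids that carry it."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        for code in rec.get("ctv3_codes", []):
            index.setdefault(code, []).append(rec["id"])
    return index

ctv3_index = index_by_ctv3(SAMPLE)
print(ctv3_index.get("X200E"))  # ['22298006']
```

To run it against the real artefact, pass an open file handle for snomed.ndjson instead of `SAMPLE`.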
NDJSON is great for quick exploration and ad-hoc queries, but for more complex querying and analytics, the next step is to load it into SQLite or export to Parquet.
4 — SQLite + Full-Text Search
Load the NDJSON artefact into a SQLite database with FTS5 full-text search.
sct sqlite --input snomed.ndjson --output snomed.db
Docs: sct sqlite
On my machine this takes about 45 seconds for the UK Monolith release with 831k active concepts. The resulting snomed.db file is about 2 GB.
Now you can query SNOMED CT with standard sqlite3. The following examples should all work out of the box on the resulting database, running in the terminal.
LLMs are excellent at generating SQL queries, so you can also use any LLM to generate custom SQL queries for you on demand. sct includes an MCP server that exposes the database as 'tools' to LLMs in a standard way for interactive querying — see below.
Free-text search (FTS5)
sqlite3 snomed.db \
"SELECT id, preferred_term FROM concepts_fts WHERE concepts_fts MATCH 'heart attack' LIMIT 5"
Direct concept lookup
sqlite3 snomed.db \
"SELECT preferred_term, json(attributes) FROM concepts WHERE id = '22298006'"
Browse by hierarchy
sqlite3 snomed.db \
"SELECT id, preferred_term FROM concepts WHERE hierarchy = 'Procedure' LIMIT 10"
Recursive subsumption via IS-A table
sqlite3 snomed.db \
"WITH RECURSIVE descendants(id) AS (
SELECT DISTINCT child_id FROM concept_isa WHERE parent_id = '22298006'
UNION
SELECT ci.child_id FROM concept_isa ci JOIN descendants d ON ci.parent_id = d.id
)
SELECT DISTINCT c.preferred_term FROM concepts c JOIN descendants d ON c.id = d.id LIMIT 20"
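The same recursive query can be run from Python's built-in sqlite3 module. The sketch below uses an in-memory database with a toy `concept_isa` table (placeholder ids A to E, not real SCTIDs) so that it runs without snomed.db; only the `child_id` and `parent_id` column names are taken from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept_isa (child_id TEXT, parent_id TEXT)")
# Toy hierarchy: A is the root, D has two parents (B and C), E is under D.
conn.executemany(
    "INSERT INTO concept_isa VALUES (?, ?)",
    [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D")],
)

DESCENDANTS_SQL = """
WITH RECURSIVE descendants(id) AS (
    SELECT child_id FROM concept_isa WHERE parent_id = ?
    UNION
    SELECT ci.child_id FROM concept_isa ci
    JOIN descendants d ON ci.parent_id = d.id
)
SELECT id FROM descendants ORDER BY id
"""

def descendants(conn, concept_id):
    """Return every concept reachable from concept_id via IS-A edges."""
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (concept_id,))]

print(descendants(conn, "A"))  # ['B', 'C', 'D', 'E']
print(descendants(conn, "D"))  # ['E']
```

Note that UNION (rather than UNION ALL) is what stops D being returned twice, even though it is reachable through both B and C.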
For simple lexical searches, I added a sct lexical subcommand that generates SQL queries for you, so you don't have to write the raw SQL yourself. It supports free-text search, hierarchy filtering, and prefix search:
sct lexical --db snomed.db "heart attack" --limit 10
sct lexical --db snomed.db "diabetes" --hierarchy "Clinical finding"
sct lexical --db snomed.db "amox*" # prefix search
Docs: sct lexical
For more advanced and interesting SQL queries, see the sct sqlite documentation.
4a — UK Crossmaps: CTV3
UK edition only. The CTV3 (Clinical Terms Version 3) crossmaps are available when building from a
UK NHS SNOMED CT release (UK Monolith or UK Clinical Edition from NHS TRUD).
They are parsed automatically from the der2_sRefset_SimpleMap reference set (refset ID 900000000000497000).
CTV3 is the legacy NHS terminology used in GP and secondary care systems before SNOMED CT. Having SNOMED → CTV3 mappings is useful for:
- Migrating data from legacy systems that recorded CTV3 codes
- Interoperability with older clinical records
- Reporting to systems that still consume CTV3
- Learning and exploration — see how concepts were mapped from CTV3 to SNOMED CT
Over 524,000 concepts have CTV3 mappings in the UK Monolith release. Read v2 codes are not distributed as a separate refset in current UK releases.
Data structure:
The SQLite database includes:
- concepts.ctv3_codes - JSON array of CTV3 codes for each concept
- concept_maps table - reverse index for fast CTV3 code → SNOMED lookup
Example queries:
Forward: SNOMED → CTV3 code
sqlite3 snomed.db "SELECT id, preferred_term, ctv3_codes FROM concepts WHERE id = '22298006'"
# 22298006|Myocardial infarction|["X200E"]
Reverse: CTV3 code → SNOMED concept
sqlite3 snomed.db "
SELECT c.id, c.preferred_term, c.hierarchy
FROM concepts c
JOIN concept_maps m ON c.id = m.concept_id
WHERE m.code = 'X200E' AND m.terminology = 'ctv3'"
# 22298006|Myocardial infarction|Clinical finding
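Both directions can be wrapped in small Python helpers. This is a sketch against an in-memory database that mimics only the columns used in the queries above; the real snomed.db has more columns, and the helper names `snomed_to_ctv3` and `ctv3_to_snomed` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concepts (id TEXT PRIMARY KEY, preferred_term TEXT, ctv3_codes TEXT);
CREATE TABLE concept_maps (concept_id TEXT, code TEXT, terminology TEXT);
""")
# One row taken from the example output above.
conn.execute(
    "INSERT INTO concepts VALUES (?, ?, ?)",
    ("22298006", "Myocardial infarction", json.dumps(["X200E"])),
)
conn.execute("INSERT INTO concept_maps VALUES (?, ?, ?)", ("22298006", "X200E", "ctv3"))

def snomed_to_ctv3(conn, sctid):
    """Forward lookup: decode the JSON array stored in concepts.ctv3_codes."""
    row = conn.execute("SELECT ctv3_codes FROM concepts WHERE id = ?", (sctid,)).fetchone()
    return json.loads(row[0]) if row else []

def ctv3_to_snomed(conn, code):
    """Reverse lookup via the concept_maps table."""
    rows = conn.execute(
        "SELECT c.id, c.preferred_term FROM concepts c "
        "JOIN concept_maps m ON c.id = m.concept_id "
        "WHERE m.code = ? AND m.terminology = 'ctv3'",
        (code,),
    )
    return rows.fetchall()

print(snomed_to_ctv3(conn, "22298006"))  # ['X200E']
print(ctv3_to_snomed(conn, "X200E"))     # [('22298006', 'Myocardial infarction')]
```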
4b — Transitive Closure Table (TCT)
Docs:
sct tct
By default, sct sqlite stores only direct IS-A parent-child pairs in concept_isa. Subsumption queries ("give me all descendants of X") require a recursive CTE at query time. The transitive closure table (TCT) precomputes every ancestor-descendant pair in the hierarchy so these queries become a single indexed JOIN.
The TCT is entirely optional. Because it is derived from concept_isa — which is already in every sct sqlite output — it can be added to any existing database at any time without re-reading the original NDJSON artefact.
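To make the idea concrete, here is a minimal sketch of how ancestor-descendant pairs can be derived from direct IS-A edges. It is not sct's actual algorithm, just one straightforward way to do it; the function name is hypothetical, the ids are placeholders, and treating depth as the shortest path length is an assumption.

```python
from collections import defaultdict

def transitive_closure(isa_edges):
    """Return {(ancestor, descendant): depth} for every ancestor-descendant pair.

    isa_edges is an iterable of (child_id, parent_id) pairs. Depth is taken
    to be the shortest IS-A path length (an assumption for this sketch).
    """
    parents = defaultdict(set)
    for child, parent in isa_edges:
        parents[child].add(parent)

    closure = {}
    for concept in list(parents):
        # Breadth-first walk upwards: the first visit gives the shortest depth.
        frontier, depth, seen = {concept}, 0, set()
        while frontier:
            depth += 1
            frontier = {p for c in frontier for p in parents.get(c, ())} - seen
            for ancestor in frontier:
                closure[(ancestor, concept)] = depth
            seen |= frontier
    return closure

edges = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "D")]
closure = transitive_closure(edges)
print(len(closure))         # 9
print(closure[("A", "E")])  # 3
```

Five direct edges expand to nine pairs even in this tiny graph, which is the same effect that turns the IS-A edges of a full release into a much larger closure table.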
Build the TCT
Apply to an existing database:
sct tct --db snomed.db
# spinner: Building TCT for 831,132 concepts (5000/831132)...
# Done. 18,432,601 ancestor-descendant pairs in concept_ancestors.
Or build it in a single step alongside the main load:
sct sqlite --input snomed.ndjson --output snomed.db --transitive-closure
Both call the same underlying algorithm and produce identical output. The --transitive-closure flag is a convenience shorthand for pipelines that want everything in one command.
To include self-referential rows (depth = 0, ancestor_id = descendant_id) — useful if your queries always want "descendants including self":
sct tct --db snomed.db --include-self
Verify with sct info
sct info snomed.db
Without TCT:
IS-A edges: 504,216
TCT: not present (run `sct tct --db <file>` to build)
After sct tct:
IS-A edges: 504,216
TCT rows: 18,432,601
Performance comparison
The queries below are equivalent — both return all descendants of Myocardial infarction (22298006) in the IS-A hierarchy. The TCT version replaces a full recursive tree-walk with a single index seek.
Without TCT — recursive CTE (~4 ms on UK Monolith):
sqlite3 snomed.db <<EOF
.timer on
WITH RECURSIVE descendants(id) AS (
SELECT child_id FROM concept_isa WHERE parent_id = '22298006'
UNION
SELECT ci.child_id FROM concept_isa ci
JOIN descendants d ON ci.parent_id = d.id
)
SELECT COUNT(*) FROM descendants;
EOF
With TCT — indexed lookup (<1 ms on UK Monolith):
sqlite3 snomed.db <<EOF
.timer on
SELECT COUNT(*) FROM concept_ancestors WHERE ancestor_id = '22298006';
EOF
Both return the same count. The TCT version is faster because the index on ancestor_id gives SQLite a direct range scan over a single column, with no recursion.
The performance gap grows sharply with hierarchy depth and fanout. For large ancestors (e.g. Clinical finding with ~300k descendants), recursive CTEs can take hundreds of milliseconds; the TCT lookup stays under 1 ms regardless of hierarchy size.
Full subsumption query with preferred terms
sqlite3 snomed.db <<EOF
.timer on
SELECT c.preferred_term
FROM concepts c
JOIN concept_ancestors a ON c.id = a.descendant_id
WHERE a.ancestor_id = '22298006'
ORDER BY c.preferred_term;
EOF
Subsumption test (is A a descendant of B?)
sqlite3 snomed.db <<EOF
.timer on
SELECT CASE WHEN EXISTS (
SELECT 1 FROM concept_ancestors
WHERE ancestor_id = '22298006'
AND descendant_id = '57054005'
) THEN 'yes — is a descendant' ELSE 'no' END;
EOF
With the unique composite index this is a single B-tree lookup (strictly O(log n), effectively constant time in practice), and it is the core operation of any SNOMED subsumption check.
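The check above can be wrapped in a small helper when calling from Python. The sketch below builds a toy `concept_ancestors` table in memory with placeholder ids; declaring the pair as the primary key is an assumption standing in for the unique composite index, and `is_descendant` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE concept_ancestors (
    ancestor_id   TEXT NOT NULL,
    descendant_id TEXT NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)  -- assumed composite unique index
)""")
# Toy closure for the chain A > B > C.
conn.executemany(
    "INSERT INTO concept_ancestors VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "C")],
)

def is_descendant(conn, descendant_id, ancestor_id):
    """True if descendant_id sits anywhere below ancestor_id in the hierarchy."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM concept_ancestors "
        "WHERE ancestor_id = ? AND descendant_id = ?)",
        (ancestor_id, descendant_id),
    ).fetchone()
    return bool(row[0])

print(is_descendant(conn, "C", "A"))  # True
print(is_descendant(conn, "A", "C"))  # False
```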
5 — Parquet for Analytics
Export to Apache Parquet for use with DuckDB, pandas, Polars, R, or Spark.
Docs:
sct parquet
sct parquet --input snomed.ndjson --output snomed.parquet
# ~5 s for 831k concepts → 824 MB
Query with DuckDB
Install DuckDB: https://duckdb.org/install/
Then run queries directly on the Parquet file:
duckdb -c "
SELECT hierarchy, COUNT(*) AS n
FROM 'snomed.parquet'
GROUP BY hierarchy
ORDER BY n DESC
LIMIT 10"
Docs: For more DuckDB examples, see the sct parquet documentation.
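If DuckDB isn't to hand, the same hierarchy count can be computed from the NDJSON artefact with nothing but the Python standard library. This is a sketch on toy records; the `hierarchy` field name comes from the NDJSON example earlier, and `hierarchy_counts` is a hypothetical helper.

```python
import io
import json
from collections import Counter

def hierarchy_counts(lines, top=10):
    """Count concepts per top-level hierarchy, most common first."""
    counts = Counter(
        json.loads(line)["hierarchy"] for line in lines if line.strip()
    )
    return counts.most_common(top)

# Toy stand-in for snomed.ndjson; pass an open file handle for real data.
sample = io.StringIO(
    '{"id": "1", "hierarchy": "Clinical finding"}\n'
    '{"id": "2", "hierarchy": "Procedure"}\n'
    '{"id": "3", "hierarchy": "Clinical finding"}\n'
)
print(hierarchy_counts(sample))  # [('Clinical finding', 2), ('Procedure', 1)]
```

This streams the file line by line, so memory use stays flat, but on the full release it will be much slower than querying the Parquet file.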
6 — Markdown Export for RAG
Export SNOMED CT as a directory of Markdown files — one per concept. Ideal for retrieval-augmented generation (RAG), Claude Code file reading, or filesystem MCP.
!!! danger "CRASH WARNING"
Use with caution: the resulting directory is about 3.2 GB with 831k files (nested in subdirectories), which can be unwieldy to manage and version-control. If you try to open the directory in a text editor, it may crash. Consider using .gitignore or a separate branch if you want to keep it in the same repository.
Docs:
sct markdown
sct markdown --input snomed.ndjson --output ./snomed-concepts/
# ~14.5 s for ~831k .md files, ~1 GB total
Example output (cat snomed-concepts/clinical-finding/22298006.md):
# Myocardial infarction
**SCTID:** 22298006
**FSN:** Myocardial infarction (disorder)
**Hierarchy:** SNOMED CT Concept > Clinical finding > Finding of trunk structure > Finding of upper trunk > Finding of thoracic region > Disorder of thorax > Disorder of mediastinum > Heart disease > Structural disorder of heart > Myocardial lesion > Myocardial necrosis
## Synonyms
- Infarction of heart
- Cardiac infarction
- Heart attack
- Myocardial infarct
- MI - myocardial infarction
## Relationships
- **Associated morphology:** Infarct [55641003]
- **Finding site:** Myocardium structure [74281007]
## Hierarchy
- SNOMED CT Concept
- Clinical finding
- Finding of trunk structure
- Finding of upper trunk
- Finding of thoracic region
- Disorder of thorax
- Disorder of mediastinum
- Heart disease
- Structural disorder of heart
- Myocardial lesion
- Myocardial necrosis
- **Myocardial infarction** *(this concept)*
## Parents
- Myocardial necrosis (disorder) `251061000`
- Ischemic heart disease (disorder) `414545008`
Hierarchy-mode (one file per top-level hierarchy, ~19 files):
sct markdown --input snomed.ndjson --output ./snomed-hierarchies/ --mode hierarchy
# ~ 3 s for ~ 20 .md files, total ~ 380 MB
These human-readable files can be quite helpful for just getting an understanding of how concepts are structured, what their preferred terms and synonyms are, and what relationships they have. They can be used as context documents for retrieval-augmented generation (RAG) with LLMs, or simply for browsing in a Markdown viewer or VSCode.
7 — Vector Embeddings
Generate dense vector embeddings for semantic (nearest-neighbour) search.
!!! tip "Local AI required"
Requires Ollama running locally.
The embeddings take quite a while to generate for the whole release (about 40 minutes for the UK Monolith with 831k concepts), and the resulting Arrow IPC file is about 2.7 GB, but the resulting semantic search capabilities are pretty impressive — you can find relevant concepts even when there are no shared keywords between the query and the concept text.
Docs:
sct embed
Pull the embedding model
ollama pull nomic-embed-text
# ~
Generate embeddings (streams SNOMED into Arrow IPC file)
sct embed --input snomed.ndjson \
--output snomed-embeddings.arrow \
--model nomic-embed-text
# ~65 mins for ~831k concepts → snomed-embeddings.arrow (2.7 GB)
Each concept is embedded using a rich text template:
"Heart attack. Myocardial infarction (disorder).
Synonyms: Cardiac infarction, Infarction of heart, MI.
Hierarchy: SNOMED CT concept > Clinical finding > ... > Myocardial infarction"
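A text of roughly that shape can be assembled from an NDJSON record as sketched below. This only illustrates the idea and is not the template sct actually uses: the function name, field order, and punctuation are assumptions, while the field names come from the NDJSON example earlier.

```python
def embedding_text(rec):
    """Assemble one text string per concept (hypothetical template)."""
    parts = [f"{rec['preferred_term']}. {rec['fsn']}."]
    if rec.get("synonyms"):
        parts.append("Synonyms: " + ", ".join(rec["synonyms"]) + ".")
    if rec.get("hierarchy_path"):
        parts.append("Hierarchy: " + " > ".join(rec["hierarchy_path"]))
    return "\n".join(parts)

# Trimmed-down record based on the NDJSON example.
rec = {
    "preferred_term": "Myocardial infarction",
    "fsn": "Myocardial infarction (disorder)",
    "synonyms": ["Heart attack", "Cardiac infarction"],
    "hierarchy_path": ["SNOMED CT Concept", "Clinical finding", "Myocardial infarction"],
}
print(embedding_text(rec))
```

Putting synonyms and the hierarchy path into the embedded text is what lets a lay query like "heart attack" land near the formal concept.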
The Arrow IPC file can be queried in DuckDB or PyArrow, and is the input for
sct semantic.
8 — Semantic Search experimental!
Find conceptually similar concepts using cosine similarity over embeddings. No keyword match needed.
Docs:
sct semantic
sct semantic --embeddings snomed-embeddings.arrow \
"blocked coronary artery" \
--limit 5
Example output:
5 closest concepts to "blocked coronary artery":
0.9340 [22298006] Myocardial infarction
0.9210 [44771008] Coronary artery occlusion
0.9080 [394659003] Acute coronary syndrome
0.8970 [414795007] Ischaemic heart disease
0.8810 [53741008] Coronary artery atherosclerosis
The first column is the cosine similarity between the query vector and the concept embedding. Mathematically it ranges from -1 to 1, where 1 means identical direction in vector space; for text embeddings like these it almost always falls between 0 and 1. In practice, scores above ~0.85 indicate strong semantic relevance; scores below ~0.70 are usually noise. There is no hard threshold: results are always returned ranked, so the top few are what matter.
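For reference, the scoring and ranking step can be sketched in a few lines of plain Python. The vectors here are two-dimensional toys rather than real embeddings, and `nearest` is a hypothetical helper, not sct's implementation.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query, embeddings, limit=5):
    """Rank concept ids by similarity to the query vector, best first."""
    scored = [(cosine_similarity(query, vec), cid) for cid, vec in embeddings.items()]
    return sorted(scored, reverse=True)[:limit]

# Toy 2-D vectors standing in for real concept embeddings.
toy = {
    "same_direction": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}
for score, cid in nearest([1.0, 0.0], toy):
    print(f"{score:.4f} [{cid}]")
# 1.0000 [same_direction]
# 0.7071 [diagonal]
# 0.0000 [orthogonal]
```

Vector length does not affect the score: `[2.0, 0.0]` still scores 1.0 against `[1.0, 0.0]` because only direction matters.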
Semantic search finds concepts even when the exact terms don't match — useful for natural-language queries, typos, and synonym gaps.
The same search is also available to Claude via the snomed_semantic_search MCP tool
when sct mcp is started with --embeddings.
9 — MCP Server for LLMs
Expose SNOMED CT as a set of tools in Claude Code, Claude Desktop, or any other LLM harness or tool that supports the MCP (Model Context Protocol) standard.
Docs:
sct mcp
Start stdio MCP server; add to Claude Desktop config
sct mcp --db snomed.db
With semantic search enabled:
sct mcp --db snomed.db --embeddings snomed-embeddings.arrow
Claude Desktop configuration
Depending on your platform, the configuration file is located at ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows, and ~/.config/claude/claude_desktop_config.json on Linux.
{
"mcpServers": {
"snomed": {
"command": "sct",
"args": ["mcp", "--db", "/path/to/snomed.db"]
}
}
}
With semantic search:
{
"mcpServers": {
"snomed": {
"command": "sct",
"args": ["mcp", "--db", "/path/to/snomed.db",
"--embeddings", "/path/to/snomed-embeddings.arrow"]
}
}
}
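If you would rather generate the config than edit it by hand, a few lines of Python will do. `mcp_config` is a hypothetical helper and the paths are the same placeholders as above; it only builds the JSON and does not write to the Claude Desktop config file.

```python
import json

def mcp_config(db_path, embeddings_path=None, name="snomed"):
    """Build the mcpServers entry shown above as a Python dict."""
    args = ["mcp", "--db", db_path]
    if embeddings_path:
        args += ["--embeddings", embeddings_path]
    return {"mcpServers": {name: {"command": "sct", "args": args}}}

print(json.dumps(mcp_config("/path/to/snomed.db"), indent=2))
print(json.dumps(
    mcp_config("/path/to/snomed.db", "/path/to/snomed-embeddings.arrow"),
    indent=2,
))
```

If your config file already lists other servers, merge the `snomed` entry into the existing `mcpServers` object rather than overwriting the file.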
Tools available in the MCP server
| Tool | Description |
|---|---|
| snomed_search | Free-text search - returns top matching concepts |
| snomed_concept | Full concept detail by SCTID |
| snomed_children | Immediate IS-A children of a concept |
| snomed_ancestors | Full ancestor chain to SNOMED root |
| snomed_hierarchy | All concepts within a top-level hierarchy |
| snomed_map | Cross-map between SNOMED CT and CTV3 (UK only) |
| snomed_semantic_search | Nearest-neighbour semantic search (requires --embeddings) |
Example MCP interaction:
"What are the subtypes of type 2 diabetes mellitus?"
LLM calls snomed_children with SCTID 44054006, receives the list, and answers
with accurate SNOMED-grounded terminology.
UK edition: CTV3 cross-mapping
If your database was built from a UK NHS SNOMED CT release, the MCP server also has access to
snomed_map — a bidirectional lookup tool for CTV3 legacy codes.
Example MCP interaction:
"What's the CTV3 code for myocardial infarction?"
LLM calls snomed_map with SCTID 22298006 and terminology snomed, receives:
{
"snomed_id": "22298006",
"ctv3_codes": ["X200E"],
"read2_codes": []
}
Or in reverse:
"I have a legacy CTV3 code X200E. What's the current SNOMED concept?"
LLM calls snomed_map with code X200E and terminology ctv3, receives full
SNOMED concept details and provides context with the modern terminology.
MCP server properties:
- Startup time < 5 ms (well under the 100 ms MCP budget)
- Read-only and stateless
- Dual-mode transport: supports both Claude Desktop (Content-Length framing) and Claude Code 2.1.86+ (newline-delimited JSON)
- Schema version validation on startup
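The two transport framings mentioned above differ only in how each JSON-RPC message is delimited on the stream. The sketch below shows the general shape of each; the helper names are hypothetical and this is not sct's code. `tools/list` is a standard MCP method, used here only as a sample payload.

```python
import json

def frame_content_length(message):
    """Header-framed: a Content-Length header, a blank line, then the JSON body."""
    body = json.dumps(message).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

def frame_ndjson(message):
    """Newline-delimited: one compact JSON document per line."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"

def read_content_length(data):
    """Parse a single header-framed message back into a dict."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1])
    return json.loads(rest[:length])

msg = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(frame_content_length(msg))
print(frame_ndjson(msg))
print(read_content_length(frame_content_length(msg)) == msg)  # True
```

A server that supports both can tell them apart from the first bytes it reads: a header-framed stream starts with `Content-Length:`, a newline-delimited one starts with `{`.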
10 — Interactive UIs
Terminal UI experimental!
To reduce the size of the default sct binary, the interactive terminal UI is an optional feature that needs to be enabled at build time with the tui feature flag. If you built sct without it, you can rebuild with: cargo install --path . --features tui
Docs:
sct tui
sct tui --db snomed.db
Three-panel layout:
- Top-left: Hierarchy browser
- Bottom-left: Search box + results
- Right: Full concept detail
Keybindings: / search, Tab switch panels, ↑↓ navigate, Enter select, q quit.
Browser UI experimental!
Docs:
sct gui
The browser-based UI is another optional feature that needs to be enabled at build time with the gui feature flag. If you built sct without it, you can rebuild with: cargo install --path . --features gui
sct gui --db snomed.db
# Opens http://127.0.0.1:8420 in your browser
sct gui # --db defaults to ./snomed.db or $SCT_DB
sct gui --port 9000 # custom port
sct gui --no-open # start server but don't open browser
Single-page app with three tabs:
- Detail — full concept view: preferred term, FSN, synonyms, attributes, parents, children count
- Graph — D3 force-directed graph showing the focal concept (centre), its parents (above), and up to 50 children (below). Draggable nodes, zoom/pan, click any node to navigate.
- Hierarchy — browse the 19 top-level SNOMED hierarchies
Bound to localhost only — never accessible from the network.
11 — Release Comparison experimental!
Compare two NDJSON artefacts to see what changed between SNOMED releases.
sct diff --old snomed-uk-20240901.ndjson \
--new snomed-uk-20250301.ndjson \
--format summary
Reports:
- Concepts added
- Concepts inactivated
- Terms changed (preferred term or FSN updated)
- Hierarchy changed (concept moved in IS-A tree)
# Machine-readable NDJSON output for scripting
sct diff --old old.ndjson --new new.ndjson --format ndjson | \
jq 'select(.change_type == "term_changed")'
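Conceptually, the comparison is a keyed diff over two sets of records. The sketch below is not how sct diff is implemented. It assumes each artefact has been loaded into a dict keyed by concept id, and that a concept missing from the new release has been inactivated. Only the `term_changed` value appears in the example above; the other `change_type` values are made up for illustration.

```python
def diff_releases(old, new):
    """Compare two {concept_id: record} dicts and list the changes."""
    changes = []
    for cid, rec in new.items():
        if cid not in old:
            changes.append({"id": cid, "change_type": "added"})  # hypothetical value
        elif (rec["preferred_term"], rec["fsn"]) != (
            old[cid]["preferred_term"], old[cid]["fsn"]
        ):
            changes.append({"id": cid, "change_type": "term_changed"})
    for cid in old:
        if cid not in new:
            changes.append({"id": cid, "change_type": "inactivated"})  # hypothetical value
    return changes

# Toy records with placeholder ids.
old = {
    "1": {"preferred_term": "Alpha", "fsn": "Alpha (finding)"},
    "2": {"preferred_term": "Beta", "fsn": "Beta (finding)"},
}
new = {
    "1": {"preferred_term": "Alpha renamed", "fsn": "Alpha (finding)"},
    "3": {"preferred_term": "Gamma", "fsn": "Gamma (finding)"},
}
for change in diff_releases(old, new):
    print(change)
```

Detecting hierarchy moves would follow the same pattern, comparing the `parents` lists of each record.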
12 — Artefact Inspection experimental!
Inspect any sct-produced file without needing to know its internals.
sct info snomed.ndjson
sct info snomed.db
sct info snomed-embeddings.arrow
Output includes:
- Concept count
- Schema version
- Hierarchy breakdown (concept counts per top-level hierarchy)
- File size
- Release date (if present)
13 — Performance
All timings below are for the UK Monolith (831k active concepts) on NVMe SSD.
| Operation | Time | Output size |
|---|---|---|
| RF2 → NDJSON | ~30 s | ~1.1 GB |
| NDJSON → SQLite | ~11 s | 1.3 GB |
| NDJSON → Parquet | ~5 s | 824 MB |
| NDJSON → Markdown | ~15 s | 3.2 GB (831k files) |
| MCP server startup | < 5 ms | — |
vs. remote FHIR terminology server (benchmark results):
Local SQLite queries are 50–2700× faster than equivalent FHIR R4 operations over the
network. See benchmarks.md for full methodology and results.
Run the benchmarking suite yourself:
bench/bench.sh \
--server https://your-fhir-server/fhir \
--db snomed.db \
--runs 10 \
--format table
14 — UK Clinical Edition: Layered Builds
The UK SNOMED CT Clinical Edition is built by layering three RF2 releases:
sct ndjson \
--rf2 SnomedCT_InternationalRF2_PRODUCTION_20250101T120000Z.zip \
--rf2 SnomedCT_UKClinicalRF2_PRODUCTION_20250401T000001Z.zip \
--rf2 SnomedCT_UKDrugRF2_PRODUCTION_20250401T000001Z.zip \
--locale en-GB \
--output snomed-uk-20250401.ndjson
Later --rf2 flags override earlier ones for the same concept. The --locale en-GB
flag selects GB English preferred terms from the UK language reference set.
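The override rule behaves like merging dictionaries keyed by concept id, with later layers winning. The sketch below illustrates that rule on toy records with placeholder ids; `layer_releases` is a hypothetical helper and says nothing about how sct merges RF2 tables internally.

```python
def layer_releases(*releases):
    """Merge {concept_id: record} dicts; later layers win on conflicts."""
    merged = {}
    for release in releases:
        merged.update(release)
    return merged

# Toy layers with placeholder ids and terms.
international = {"1": {"preferred_term": "Anemia"}, "2": {"preferred_term": "Fever"}}
uk_clinical = {"1": {"preferred_term": "Anaemia"}}
uk_drug = {"3": {"preferred_term": "Example drug product"}}

merged = layer_releases(international, uk_clinical, uk_drug)
print(merged["1"]["preferred_term"])  # Anaemia
print(sorted(merged))                 # ['1', '2', '3']
```

Order matters: passing the UK layers before the International release would let the International record for concept 1 win instead.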
15 — Code Lists
Manage curated collections of clinical codes as plain-text .codelist files with YAML
front-matter — designed to live in version control and be reviewed like source code.
Also accessible as sct refset and sct valueset.
Scaffold a new codelist
sct codelist new codelists/asthma-diagnosis.codelist \
--title "Asthma diagnosis" \
--author "Marcus Baw" \
--terminology "SNOMED CT"
Creates the file with full YAML front-matter (id, title, description, licence, warnings, etc.)
and opens it in $EDITOR. Pass --no-edit to skip the editor.
Add concepts
# Add single concepts by SCTID (resolved against the database)
sct codelist add codelists/asthma-diagnosis.codelist 195967001 389145006 --db snomed.db
# Add a concept plus all its active descendants
sct codelist add codelists/asthma-diagnosis.codelist 195967001 \
--db snomed.db \
--include-descendants
Remove (exclude) a concept
sct codelist remove codelists/asthma-diagnosis.codelist 41553006 \
--comment "occupational asthma — separate pathway"
Moves the line to a commented exclusion record, preserving the audit trail:
# 41553006 Occupational asthma # occupational asthma — separate pathway
Validate (CI-ready)
sct codelist validate codelists/asthma-diagnosis.codelist --db snomed.db
Checks: all SCTIDs exist and are active, preferred terms match the database (warns on drift), pending review items, required fields, duplicate SCTIDs.
Exit code 0 means no errors were found (warnings may still be printed); exit code 1 means at least one error. Suitable for CI.
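The existence, activity, and duplicate checks can be pictured as below. This is a sketch over a plain list of ids and a toy in-memory database, not the real validator: it does not parse .codelist files, the ids are placeholders, and the `active` column on `concepts` is an assumption.

```python
import sqlite3

def validate(sctids, conn):
    """Return error messages for duplicate, missing, or inactive ids."""
    errors, seen = [], set()
    for sctid in sctids:
        if sctid in seen:
            errors.append(f"{sctid}: duplicate")
            continue
        seen.add(sctid)
        row = conn.execute(
            "SELECT active FROM concepts WHERE id = ?", (sctid,)
        ).fetchone()
        if row is None:
            errors.append(f"{sctid}: not found")
        elif not row[0]:
            errors.append(f"{sctid}: inactive")
    return errors

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real concepts table has more columns.
conn.execute("CREATE TABLE concepts (id TEXT PRIMARY KEY, active INTEGER)")
conn.executemany("INSERT INTO concepts VALUES (?, ?)", [("111", 1), ("222", 0)])

errors = validate(["111", "222", "333", "111"], conn)
exit_code = 1 if errors else 0
print(errors)     # ['222: inactive', '333: not found', '111: duplicate']
print(exit_code)  # 1
```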
Stats
sct codelist stats codelists/asthma-diagnosis.codelist --db snomed.db
Prints concept count, hierarchy breakdown, leaf vs. intermediate ratio, excluded count, and SNOMED release age.
Diff two codelists
sct codelist diff codelists/asthma-v1.codelist codelists/asthma-v2.codelist
Reports added, removed, moved-to-excluded, and preferred-term-changed concepts.
Export
sct codelist export codelists/asthma-diagnosis.codelist --format csv
sct codelist export codelists/asthma-diagnosis.codelist --format opencodelists-csv
sct codelist export codelists/asthma-diagnosis.codelist --format markdown --output asthma.md
Typical git workflow
sct codelist new codelists/asthma-diagnosis.codelist
git add codelists/asthma-diagnosis.codelist
git commit -m "codelist: scaffold asthma-diagnosis"
sct codelist add codelists/asthma-diagnosis.codelist 195967001 266361008 389145006 --db snomed.db
git commit -m "codelist: add core asthma concepts"
sct codelist validate codelists/asthma-diagnosis.codelist --db snomed.db
git tag codelist/asthma-diagnosis/v1
16 — Command Reference Summary
| Command | Description |
|---|---|
| sct ndjson | RF2 → canonical NDJSON (build once per release) |
| sct sqlite | NDJSON → SQLite + FTS5 (SQL + full-text search) |
| sct tct | Add transitive closure table to an existing SQLite database |
| sct parquet | NDJSON → Parquet (DuckDB / analytics) |
| sct markdown | NDJSON → Markdown files (RAG / file reading) |
| sct embed | NDJSON → Arrow embeddings (requires Ollama) |
| sct mcp | Stdio MCP server for Claude (wraps SQLite) |
| sct lexical | Keyword search via FTS5 |
| sct semantic | Semantic search via cosine similarity |
| sct diff | Compare two NDJSON releases |
| sct info | Inspect any sct-produced artefact |
| sct tui | Terminal UI (requires --features tui) |
| sct gui | Browser UI (requires --features gui) |
| sct completions | Generate shell completion scripts |
| sct codelist | Build, validate, publish code lists (also: sct refset, sct valueset) |
Next Steps
- sct trud - automated download from NHS TRUD API
- sct serve - drop-in FHIR R4/R5 terminology server backed by SQLite
- sct codelist search - interactive FTS5 search → include/exclude (coming)
- sct codelist import / sct codelist publish - import from OpenCodelists, publish back (coming)
See specs/roadmap.md for the full list of planned features.