sct ndjson
Convert an RF2 Snapshot directory into the canonical SNOMED CT NDJSON artefact.
This is the required first step — all other sct subcommands consume this output. It joins the RF2 files once, deterministically, and writes each active concept as a single line of JSON.
Usage
sct ndjson --rf2 <DIR|ZIP> [--rf2 <DIR|ZIP>...] [OPTIONS]
Options
| Flag | Default | Description |
|---|---|---|
--rf2 <DIR\|ZIP> |
(required) | RF2 Snapshot directory or a .zip release archive. Repeat to layer extensions. |
--locale <LOCALE> |
en-GB |
BCP-47 locale for preferred term selection. |
--output <FILE> |
(derived from RF2 dir name) | Output NDJSON path. Use -o - for stdout. |
--include-inactive |
off | Include inactive concepts (omitted by default). |
Examples
UK Monolith from a downloaded zip (no manual extraction needed)
sct ndjson --rf2 SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z.zip
# Output: snomedct-monolithrf2-production-20260311t120000z.ndjson
UK Monolith from an already-extracted directory
sct ndjson --rf2 SnomedCT_MonolithRF2_PRODUCTION_20260311T120000Z/
International release with explicit output name
sct ndjson \
--rf2 SnomedCT_InternationalRF2_PRODUCTION_20250101T120000Z.zip \
--locale en-US \
--output snomed-international-20250101.ndjson
Two-release UK edition (clinical + drug extension)
sct ndjson \
--rf2 SnomedCT_UKClinicalRF2_PRODUCTION_20250401T000001Z.zip \
--rf2 SnomedCT_UKDrugRF2_PRODUCTION_20250401T000001Z.zip \
--locale en-GB \
--output snomed-uk-full-20250401.ndjson
Write to stdout (pipe into another tool)
sct ndjson --rf2 ./SnomedCT_Release/ -o - | jq 'select(.id == "22298006")'
Output format
One JSON object per line, sorted by concept SCTID. Every line is a standalone JSON object — the file is valid NDJSON.
{
"id": "22298006",
"fsn": "Myocardial infarction (disorder)",
"preferred_term": "Heart attack",
"synonyms": ["Cardiac infarction", "Infarction of heart", "MI - Myocardial infarction"],
"hierarchy": "Clinical finding",
"hierarchy_path": [
"SNOMED CT Concept",
"Clinical finding",
"Disorder of cardiovascular system",
"Ischemic heart disease",
"Myocardial infarction"
],
"parents": [{"id": "414795007", "fsn": "Ischemic heart disease (disorder)"}],
"children_count": 47,
"active": true,
"module": "900000000000207008",
"effective_time": "20020131",
"attributes": {
"finding_site": [{"id": "302509004", "fsn": "Entire heart (body structure)"}],
"associated_morphology": [{"id": "55641003", "fsn": "Infarct (morphologic abnormality)"}]
},
"ctv3_codes": ["X200E"],
"read2_codes": [],
"schema_version": 2
}
Fields
| Field | Type | Description |
|---|---|---|
id |
string | SNOMED CT concept identifier (SCTID) |
fsn |
string | Fully Specified Name — unique, includes semantic tag in parentheses |
preferred_term |
string | Preferred synonym for the requested locale |
synonyms |
string[] | All other active synonyms (preferred term excluded) |
hierarchy |
string | Top-level hierarchy label (e.g. Clinical finding, Procedure) |
hierarchy_path |
string[] | Ancestor chain from root to this concept (semantic tags stripped) |
parents |
{id, fsn}[] |
Direct IS-A parents, sorted by SCTID |
children_count |
integer | Number of direct IS-A children in this release |
active |
boolean | Always true unless --include-inactive is used |
module |
string | SNOMED module identifier |
effective_time |
string | Date this concept last changed, YYYYMMDD |
attributes |
object | Named attribute groups with {id, fsn}[] values |
ctv3_codes |
string[] | CTV3 crossmap codes (UK edition only; empty array otherwise) |
read2_codes |
string[] | Read v2 codes (UK edition only; empty array otherwise) |
schema_version |
integer | Artefact schema version (currently 2) |
Artefact properties
- One line per active concept (inactive omitted unless
--include-inactive) - Stable ordering by concept ID
- Locale-aware preferred terms
- Self-contained: each line is independently interpretable
- Greppable:
grep "22298006" snomed.ndjson
Querying with standard tools
The artefact is designed to be queried with jq without any custom tooling.
# Look up a concept by SCTID
jq 'select(.id == "22298006")' snomed.ndjson
# Search by preferred term (case-insensitive)
jq 'select(.preferred_term | test("myocardial infarction"; "i"))' snomed.ndjson \
| head -1 | jq '{id, preferred_term, hierarchy}'
# Count concepts by top-level hierarchy
jq -r '.hierarchy' snomed.ndjson | sort | uniq -c | sort -rn | head -10
# Find concepts with a specific attribute
jq 'select(.attributes.finding_site != null) | {id, preferred_term}' snomed.ndjson
# All concepts with CTV3 mappings
jq 'select(.ctv3_codes | length > 0) | {id, preferred_term, ctv3_codes}' snomed.ndjson
# Concepts modified in a specific release
jq 'select(.effective_time == "20260301") | .preferred_term' snomed.ndjson
Which TRUD download to use
| TRUD item | Use it? | Notes |
|---|---|---|
| Monolith Edition, RF2: Snapshot | ✅ Recommended | International + UK clinical + dm+d in one directory. Single --rf2 argument. |
| Clinical Edition, RF2: Full, Snapshot & Delta | ✅ Works | Snapshot files are used; Full and Delta ignored. |
| Drug Extension, RF2: Full, Snapshot & Delta | ⚠️ Supplement | Use as a second --rf2 alongside Clinical Edition. |
| Clinical Edition, RF2: Delta | ❌ Won't work | No Snapshot files. |
| Cross-map Historical Files | ❌ Not needed | Ignored by sct. |
Determinism
Given the same RF2 Snapshot directory and --locale, sct ndjson always produces byte-for-byte identical output:
sha256sum snomed-uk-20260311.ndjson
The file can be checksummed, committed to git-lfs, and used as a pinned dependency.
RF2 file patterns recognised
sct scans the supplied directory recursively for:
| Pattern | Content |
|---|---|
sct2_Concept_Snapshot_*.txt |
Concept identifiers and status |
sct2_Description_Snapshot_*.txt |
Terms and synonyms |
sct2_Relationship_Snapshot_*.txt |
IS-A and attribute relationships (inferred) |
der2_cRefset_Language_*.txt |
Language reference sets (preferred term acceptability) |
der2_sRefset_SimpleMap_*.txt |
Simple map reference sets (CTV3/Read v2 crossmaps) |
Stated relationship files (sct2_StatedRelationship_*) are intentionally skipped — the inferred release is used for hierarchy and attributes. Full and Delta files are ignored.
Next: load into SQLite with sct sqlite, export to Parquet with sct parquet, or generate embeddings with sct embed.