sct markdown

Export a SNOMED CT NDJSON artefact to per-concept Markdown files, organised by hierarchy.

Designed for RAG (retrieval-augmented generation) indexing, filesystem MCP tools, and direct LLM file reading.

Usage

sct markdown --input <NDJSON> [--output <DIR>] [--mode <MODE>]

Options

| Flag | Default | Description |
|------|---------|-------------|
| `--input <FILE>` | (required) | NDJSON file produced by `sct ndjson`. Use `-` for stdin. |
| `--output <DIR>` | `snomed-concepts` | Output directory. |
| `--mode <MODE>` | `concept` | Output grouping: `concept` (one file per concept) or `hierarchy` (one file per top-level hierarchy). |

Modes

`--mode concept` (default)

One `.md` file per SNOMED CT concept, named by SCTID. The output directory is partitioned by top-level hierarchy:

snomed-concepts/
  clinical-finding/
    22298006.md
    57054005.md
    ...
  procedure/
    173171007.md
    ...
  substance/
    ...

Best for:

- Fine-grained RAG indexing (one chunk per concept)
- `grep` / `ripgrep` / `fzf` searching
- Filesystem MCP tools that can browse individual files
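
Because the output directory is partitioned by top-level hierarchy, a quick per-directory count shows how concepts are distributed before an indexer or MCP tool is pointed at it. The sketch below uses only standard tools; the function name `count_by_hierarchy` is hypothetical and not part of `sct`.

```bash
# Hypothetical helper (not part of sct): print "<hierarchy>\t<file count>"
# for each hierarchy directory produced by --mode concept.
count_by_hierarchy() {
  # $1 = output directory, e.g. snomed-concepts
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob if there are no directories
    n=$(find "$dir" -type f -name '*.md' | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$(basename "$dir")" "$n"
  done
}

# Usage:
#   count_by_hierarchy snomed-concepts
```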

`--mode hierarchy`

One .md file per top-level hierarchy (~19 files), each containing all concepts in that hierarchy.

snomed-concepts/
  clinical-finding.md
  procedure.md
  substance.md
  ...

Best for:

- Upload to LLM context windows
- RAG pipelines that struggle with very large numbers of small files
- Quick browsing of an entire hierarchy
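
Each hierarchy file holds every concept in that hierarchy, so it can help to check sizes before uploading one to a context window. A minimal sketch, assuming the flat `--mode hierarchy` layout shown above; `hierarchy_sizes` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of sct): list hierarchy files as
# "<bytes>\t<file name>", largest first.
hierarchy_sizes() {
  # $1 = output directory written with --mode hierarchy
  for f in "$1"/*.md; do
    [ -f "$f" ] || continue     # skip the unexpanded glob if there are no files
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$(basename "$f")"
  done | sort -rn
}

# Usage:
#   hierarchy_sizes snomed-by-hierarchy
```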

Examples

# One file per concept (default)
sct markdown \
  --input snomed.ndjson \
  --output snomed-concepts/

# One file per hierarchy
sct markdown \
  --input snomed.ndjson \
  --output snomed-by-hierarchy/ \
  --mode hierarchy

Per-concept file format (`--mode concept`)

# Heart attack
**SCTID:** 22298006
**FSN:** Myocardial infarction (disorder)
**Hierarchy:** SNOMED CT Concept > Clinical finding > ... > Ischemic heart disease

## Synonyms
- Cardiac infarction
- Infarction of heart
- MI - Myocardial infarction

## Relationships
- **Finding site:** Entire heart [302509004]
- **Associated morphology:** Infarct [55641003]

## Hierarchy
- SNOMED CT Concept
  - Clinical finding
    - ...
      - **Myocardial infarction** *(this concept)*

## Parents
- Ischemic heart disease (disorder) `414795007`
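
The layout above is regular enough to pull fields out with `awk`. The following is a sketch that assumes exactly the format shown: a `# <term>` title line, `**SCTID:**` and `**FSN:**` lines, and a `## Synonyms` bullet list. The function names `concept_summary` and `concept_synonyms` are hypothetical.

```bash
# Hypothetical helpers (not part of sct), assuming the file layout shown above.

# Print "<SCTID>\t<title term>\t<FSN>" for one concept file.
concept_summary() {
  awk '
    { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace
    /^# / && term == ""  { term  = substr($0, 3) }
    /^\*\*SCTID:\*\* /   { sctid = substr($0, 12) }
    /^\*\*FSN:\*\* /     { fsn   = substr($0, 10) }
    END { printf "%s\t%s\t%s\n", sctid, term, fsn }
  ' "$1"
}

# Print one synonym per line from the "## Synonyms" section.
concept_synonyms() {
  awk '
    { sub(/[ \t\r]+$/, "") }
    /^## /           { in_syn = ($0 == "## Synonyms"); next }
    in_syn && /^- /  { print substr($0, 3) }
  ' "$1"
}

# Usage:
#   concept_summary  snomed-concepts/clinical-finding/22298006.md
#   concept_synonyms snomed-concepts/clinical-finding/22298006.md
```

Each call starts one `awk` process, so this suits spot checks on single files rather than a pass over the whole output directory.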

Searching concept files

# Find files mentioning a term (case-insensitive, so titles like "Heart attack" match)
grep -ril "heart attack" snomed-concepts/

# Full-text search with ripgrep (case-insensitive)
rg -i "myocardial" snomed-concepts/

# Fuzzy-find concept files by SCTID
find snomed-concepts/ -name "*.md" | fzf

# Count concept files that mention a specific attribute
grep -rl "Finding site" snomed-concepts/clinical-finding/ | wc -l
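
Since each file in `--mode concept` is named by SCTID, a known identifier can be resolved by filename alone, without reading any file contents. A minimal sketch; `find_concept` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of sct): print the path of the concept file
# for a given SCTID, searching every hierarchy directory.
find_concept() {
  # $1 = output directory, $2 = SCTID
  find "$1" -type f -name "$2.md"
}

# Usage:
#   find_concept snomed-concepts 22298006
```

The function prints nothing when no file with that SCTID exists.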

Use with filesystem MCP

The Markdown output pairs well with a filesystem MCP server such as `@modelcontextprotocol/server-filesystem`:

{
  "mcpServers": {
    "snomed-files": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/snomed-concepts"]
    }
  }
}

This allows an LLM to browse and read individual concept files directly.

Notes on scale

The full UK Monolith produces ~831,000 files in `--mode concept`. This is handled fine by:

- `ripgrep`, `grep`, `find`
- Standard filesystem tools on Linux/macOS (ext4, APFS)
- Most RAG indexing pipelines

Some tools may struggle with 800k+ files:

- Windows Explorer
- Certain filesystem MCP servers with directory listing limits
- Git (do not commit the output directory; add it to `.gitignore`)
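
One way to keep the output directory out of Git is to append it to `.gitignore` once, without duplicating the entry on repeated runs. The sketch below assumes it is run from the repository root; `ignore_output_dir` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of sct): add the output directory to
# .gitignore in the current directory unless it is already listed.
ignore_output_dir() {
  # $1 = output directory name relative to the repository root
  entry="${1%/}/"               # normalise to a single trailing slash
  touch .gitignore
  grep -qxF -- "$entry" .gitignore || printf '%s\n' "$entry" >> .gitignore
}

# Usage (from the repository root):
#   ignore_output_dir snomed-concepts
```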