Transitive Closure Table (TCT)
Precompute every ancestor-descendant pair in the SNOMED hierarchy for O(1) subsumption queries.
Docs:
sct tct
By default, sct sqlite stores only direct IS-A parent-child pairs in concept_isa. Subsumption queries ("give me all descendants of X") require a recursive CTE at query time. The transitive closure table (TCT) precomputes every ancestor-descendant pair in the hierarchy so these queries become a single indexed JOIN.
The TCT is entirely optional. Because it is derived from concept_isa — which is already in every sct sqlite output — it can be added to any existing database at any time without re-reading the original NDJSON artefact.
Build the TCT
Apply to an existing database:
sct tct --db snomed.db
# spinner: Building TCT for 831,132 concepts (5000/831132)...
# Done. 18,432,601 ancestor-descendant pairs in concept_ancestors.
Or build it in a single step alongside the main load:
sct sqlite --input snomed.ndjson --output snomed.db --transitive-closure
Both call the same underlying algorithm and produce identical output. The --transitive-closure flag is a convenience shorthand for pipelines that want everything in one command.
To include self-referential rows (depth = 0, ancestor_id = descendant_id) — useful if your queries always want "descendants including self":
sct tct --db snomed.db --include-self
Verify with sct info
sct info snomed.db
Without TCT:
IS-A edges: 504,216
TCT: not present (run `sct tct --db <file>` to build)
After sct tct:
IS-A edges: 504,216
TCT rows: 18,432,601
Performance comparison
The queries below are equivalent — both return all descendants of Myocardial infarction (22298006) in the IS-A hierarchy. The TCT version replaces a full recursive tree-walk with a single index seek.
Without TCT — recursive CTE (~4 ms on UK Monolith):
sqlite3 snomed.db <<EOF
.timer on
WITH RECURSIVE descendants(id) AS (
SELECT child_id FROM concept_isa WHERE parent_id = '22298006'
UNION
SELECT ci.child_id FROM concept_isa ci
JOIN descendants d ON ci.parent_id = d.id
)
SELECT COUNT(*) FROM descendants;
EOF
With TCT — indexed lookup (<1 ms on UK Monolith):
sqlite3 snomed.db <<EOF
.timer on
SELECT COUNT(*) FROM concept_ancestors WHERE ancestor_id = '22298006';
EOF
Both return the same count. The TCT version is faster because the index on ancestor_id gives SQLite a direct range scan over a single column, with no recursion.
The performance gap grows sharply with hierarchy depth and fanout. For large ancestors (e.g. Clinical finding with ~300k descendants), recursive CTEs can take hundreds of milliseconds; the TCT lookup stays under 1 ms regardless of hierarchy size.
Full subsumption query with preferred terms
sqlite3 snomed.db <<EOF
.timer on
SELECT c.preferred_term
FROM concepts c
JOIN concept_ancestors a ON c.id = a.descendant_id
WHERE a.ancestor_id = '22298006'
ORDER BY c.preferred_term;
EOF
Subsumption test (is A a descendant of B?)
sqlite3 snomed.db <<EOF
.timer on
SELECT CASE WHEN EXISTS (
SELECT 1 FROM concept_ancestors
WHERE ancestor_id = '22298006'
AND descendant_id = '57054005'
) THEN 'yes — is a descendant' ELSE 'no' END;
EOF
This is O(1) with the unique composite index — the core operation of any SNOMED subsumption check.