sct semantic
Semantic similarity search over a SNOMED CT Arrow IPC embeddings file.
When to use: you want to search by meaning rather than exact words. sct semantic "sticky blood" returns hypercoagulable state concepts; sct semantic "water tablets" returns diuretics — even though neither phrase appears in SNOMED. For exact keyword search, sct lexical is faster and requires no Ollama.
Embeds your query text via Ollama and performs cosine similarity against all concept embeddings in the .arrow file produced by sct embed. Returns the concepts whose meaning is closest to your query — including concepts that don't share any keywords.
Usage
sct semantic <QUERY> [--embeddings <FILE>] [--model <MODEL>] [--ollama-url <URL>] [--limit <N>]
Options
| Flag | Default | Description |
|---|---|---|
| `<QUERY>` | (required) | Natural-language search query. |
| `--embeddings <FILE>` | `snomed-embeddings.arrow` | Arrow IPC file produced by `sct embed`. |
| `--model <MODEL>` | `nomic-embed-text` | Ollama model; must match the model used when building the embeddings. |
| `--ollama-url <URL>` | `http://localhost:11434` | Ollama base URL. |
| `--limit <N>` | `10` | Maximum number of results. |
Prerequisites
Ollama must be running with the same model that was used to build the embeddings:
ollama serve
ollama pull nomic-embed-text # if not already pulled
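
To confirm the prerequisite before running a search, a small script can ask Ollama which models it has pulled. The sketch below is not part of sct; it assumes Ollama's `GET /api/tags` endpoint, which returns a JSON object with a `models` list whose entries carry a `name` such as `nomic-embed-text:latest`. The helper names `fetch_tags` and `model_available` are hypothetical.

```python
import json
import urllib.request


def fetch_tags(ollama_url="http://localhost:11434"):
    """Return the parsed JSON listing from Ollama's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_available(tags, model):
    """True if `model` appears in the listing.

    A bare name such as "nomic-embed-text" matches any tag of that model;
    a name with an explicit ":tag" suffix must match exactly.
    """
    names = [entry.get("name", "") for entry in tags.get("models", [])]
    if ":" in model:
        return model in names
    return any(name.split(":", 1)[0] == model for name in names)
```

With Ollama running, `model_available(fetch_tags(), "nomic-embed-text")` should return `True` once the model has been pulled.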
Examples
# Basic semantic search
sct semantic "heart attack"
# Finds concepts by meaning even if the words differ
sct semantic "difficulty breathing"
sct semantic "water tablets" # → diuretic concepts
sct semantic "sticky blood" # → hypercoagulable state concepts
# Return more results
sct semantic "chest pain" --limit 20
# Use embeddings built with a different model
sct semantic "fracture" \
--embeddings snomed-embeddings-small.arrow \
--model mxbai-embed-large
# Use an Ollama instance running on a remote host
sct semantic "epilepsy" --ollama-url http://192.168.1.100:11434
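
If you call sct semantic from a script, the command line can be assembled from the documented flags. This is a minimal sketch, assuming `sct` is on your `PATH`; `build_command` and `run_semantic` are hypothetical helper names, and only the flags listed under Options are used.

```python
import subprocess


def build_command(query, embeddings=None, model=None, ollama_url=None, limit=None):
    """Build the argv list for `sct semantic`, adding only the flags that were set."""
    argv = ["sct", "semantic", query]
    if embeddings is not None:
        argv += ["--embeddings", embeddings]
    if model is not None:
        argv += ["--model", model]
    if ollama_url is not None:
        argv += ["--ollama-url", ollama_url]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def run_semantic(query, **options):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        build_command(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sct exits with a non-zero status
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with multi-word queries such as "chest pain".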
Output
10 closest concepts to "heart attack":
0.9821 [22298006] Heart attack
0.9734 [57054005] Acute myocardial infarction
0.9701 [233843008] Silent myocardial infarction
0.9688 [194828000] Angina pectoris
...
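The first column is the cosine similarity score, which measures how closely the concept's meaning aligns with your query in vector space. Cosine similarity can range from -1 to 1: 1 means the two vectors point in the same direction, 0 means they are orthogonal (unrelated), and negative values mean they point in opposing directions. For text embeddings, scores usually fall between 0 and 1. In practice: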
| Score | Interpretation |
|---|---|
| > 0.90 | Very strong match — almost certainly relevant |
| 0.80 – 0.90 | Good match — worth reviewing |
| 0.70 – 0.80 | Weak match — may be tangentially related |
| < 0.70 | Usually noise |
Results are always returned ranked, so relative ordering matters more than absolute values: if the top result scores 0.82, it is still the closest match available for that query, even though it falls below the 0.90 band.
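
The output format and the score bands above can be combined in a short post-processing script. This is a sketch under stated assumptions: result lines are taken to look exactly like the sample output (score, concept ID in square brackets, term), and scores that sit exactly on a band boundary (0.80, 0.70) are assigned to the higher band, which the table does not specify. `parse_results` and `band` are hypothetical names.

```python
import re

# One result line: "<score> [<concept id>] <term>", as in the sample output.
LINE_RE = re.compile(r"^\s*(-?\d+\.\d+)\s+\[(\d+)\]\s+(.+?)\s*$")


def parse_results(text):
    """Return (score, concept_id, term) tuples; non-result lines are skipped."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            results.append((float(match.group(1)), match.group(2), match.group(3)))
    return results


def band(score):
    """Map a score to the interpretation table (boundary handling is an assumption)."""
    if score > 0.90:
        return "very strong"
    if score >= 0.80:
        return "good"
    if score >= 0.70:
        return "weak"
    return "noise"
```

The header line and the trailing `...` do not match the pattern, so only concept rows are returned.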
How it works
- Your query text is sent to Ollama, which returns a float32 vector (768-dimensional with the default `nomic-embed-text` model).
- The `.arrow` file is scanned; cosine similarity is computed between the query vector and each concept's embedding.
- The top-N concepts by score are printed.
The query is embedded using the same text template as sct embed, so the query vector lives in the same embedding space as the concept vectors. The similarity search itself runs locally against the .arrow file; the only network call is the embedding request to Ollama, which stays on your machine unless --ollama-url points to a remote host.
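
The scan-and-rank step can be illustrated in plain Python. This is a sketch of the technique, not sct's actual implementation: it works on in-memory lists instead of an Arrow file, skips the Ollama call, and uses tiny invented 3-dimensional vectors in place of real embeddings. `cosine_similarity` and `top_n` are hypothetical names.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # Typically means the query and the embeddings came from different models.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_n(query_vec, concepts, limit=10):
    """Score every (concept_id, term, vector) entry and keep the best `limit`."""
    scored = (
        (cosine_similarity(query_vec, vec), concept_id, term)
        for concept_id, term, vec in concepts
    )
    return heapq.nlargest(limit, scored, key=lambda row: row[0])


# Toy data: the vectors are invented for illustration, not real embeddings.
concepts = [
    ("22298006", "Heart attack", [1.0, 0.0, 0.0]),
    ("194828000", "Angina pectoris", [1.0, 1.0, 0.0]),
    ("hypothetical-id", "Unrelated concept", [0.0, 1.0, 0.0]),
]
for score, concept_id, term in top_n([1.0, 0.0, 0.0], concepts, limit=2):
    print(f"{score:.4f} [{concept_id}] {term}")
```

The dimension check shows why `--model` must match the model used by sct embed: vectors from models with different output sizes cannot be compared at all, and vectors from different models of the same size would not share an embedding space.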
Comparison with sct lexical
| | `sct lexical` | `sct semantic` |
|---|---|---|
| Basis | Keyword matching (FTS5) | Meaning / vector similarity |
| Input | SQLite `.db` | Arrow `.arrow` + Ollama |
| Speed | Instant | ~1–2 s (embedding the query) |
| Finds synonyms | Only if indexed | Yes |
| Finds related concepts without shared words | No | Yes |
| Works offline | Yes | Requires local Ollama |
Use sct lexical when you know the SNOMED term. Use sct semantic when you're describing a concept in plain language or exploring related concepts.
See also
- `sct lexical`: keyword search (faster, no Ollama required)
- `sct embed`: build the embeddings file
- `sct mcp`: the same search exposed as `snomed_semantic_search` for AI clients