Recall & Search
Recall is the core operation — everything else (storing, connecting, consolidating) exists to make recall better. Engram uses a hybrid retrieval pipeline that combines semantic understanding with keyword matching, then ranks results using multiple signals.
The Recall Pipeline
```mermaid
flowchart TD
    Q[Query text] --> E[Embed query]
    Q --> F[Extract content words]
    E --> V[Vector search\nsqlite-vec]
    F --> T[FTS5 keyword\nsearch]
    V --> M[Merge & deduplicate]
    T --> M
    M --> R[Rank by combined score]
    R --> C{Distance\ncutoff}
    C -- Pass --> D[Graph traversal\ndepth 1-3]
    C -- Fail --> X[Filtered out]
    D --> O[Return results]
```
1. Vector Search
The query text is embedded using the same sentence transformer model used at storage time, then compared against all stored memory vectors using cosine distance via sqlite-vec.
A critical design detail: the raw natural-language query is embedded, not a stripped keyword version. Embedding models are trained on natural language — function words like “on top of” and “we will want to” provide structural context that improves match quality. Empirical testing confirmed that stripping function words before embedding degraded recall precision.
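The comparison itself is straightforward. A minimal sketch of the vector-search step — in Engram the comparison happens inside sqlite-vec, but the math is the same; the `nearest` helper and its dict shape are illustrative, not Engram's API:

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0 for identical direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query_vec, memories, k):
    """Rank stored memory vectors by distance to the query, most similar first."""
    ranked = sorted(memories, key=lambda m: cosine_distance(query_vec, m["vector"]))
    return [m["id"] for m in ranked[:k]]
```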
2. Full-Text Search
In parallel, content words are extracted from the query (stripping stop words using NLTagger part-of-speech analysis) and run through SQLite FTS5. This catches memories that share exact terminology but might rank lower in vector space.
The two search paths intentionally use different inputs from the same query:
- Vector search — raw full query (preserves structural meaning)
- FTS5 — extracted content words only (pure keyword matching)
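The split can be sketched as follows. Engram derives content words from NLTagger part-of-speech analysis; a plain stop-word list stands in here to keep the example self-contained, and the word list itself is illustrative:

```python
import re

# Stand-in stop-word list (Engram uses POS tagging, not a fixed list).
STOP_WORDS = {"the", "a", "an", "on", "top", "of", "we", "will", "want",
              "to", "how", "do", "i", "in", "for"}

def content_words(query):
    """Keep only content-bearing words for keyword search."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]

# The two search paths consume different inputs from the same query:
query = "How do we want to handle sync conflicts in Engram"
vector_input = query                  # raw query, embedded as-is
fts_input = content_words(query)      # keywords only, for FTS5
```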
3. Scoring and Ranking
Results from both searches are merged and ranked by a weighted combination of signals:
| Signal | Weight | Description |
|---|---|---|
| Semantic similarity | Primary | Cosine distance between query and memory embeddings |
| Full-text relevance | Secondary | FTS5 keyword match quality |
| Project boost | Contextual | Same-project and global memories rank higher |
| Importance | Up to +20% | Memories rated 4-5 get a ranking boost |
| Recency | Tiebreaker | Recently accessed memories edge out older ones |
A dynamic distance cutoff filters results that are too dissimilar. This prevents noise — when nothing relevant exists, you get fewer results rather than irrelevant ones.
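The signals in the table combine roughly like this. The weights below are hypothetical — the real values are internal to Engram — but the shape of the computation (cutoff first, then weighted combination, then multiplicative boosts) matches the description above:

```python
from datetime import datetime, timezone

def combined_score(distance, fts_rank, same_project, importance, last_accessed,
                   cutoff=0.8):
    """Combine ranking signals; weights and cutoff are illustrative."""
    if distance > cutoff:               # dynamic distance cutoff
        return None                     # too dissimilar: filtered out entirely
    score = 1.0 - distance              # semantic similarity (primary)
    score += 0.3 * fts_rank             # full-text relevance (secondary)
    if same_project:
        score *= 1.15                   # project boost (contextual)
    if importance >= 4:
        score *= 1.2                    # importance 4-5: up to +20%
    score += last_accessed.timestamp() * 1e-12  # recency as a tiny tiebreaker
    return score
```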
4. Graph Traversal
When depth is set (1-3), the pipeline follows edges from the initial results to surface connected memories. This is where the knowledge graph pays off:
- A hub memory about “sync architecture” pulls in its `part_of` children — specific design decisions and implementation details
- A `supersedes` chain surfaces the latest version of an evolving decision
- A `contradicts` edge surfaces the opposing viewpoint alongside the matching one
Traversal adds context that keyword or vector search alone would miss.
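Depth-limited traversal is a breadth-first walk from the initial hits. A minimal sketch, where the adjacency dict stands in for the edges stored in the database (`part_of`, `supersedes`, `contradicts`, ...):

```python
def traverse(seeds, edges, depth):
    """Follow edges outward from the initial recall hits, up to `depth` hops."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        nxt = set()
        for node in frontier:
            for neighbor in edges.get(node, []):
                if neighbor not in visited:
                    nxt.add(neighbor)
        visited |= nxt
        frontier = nxt
    return visited
```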
Context Injection
In practice, most recalls happen automatically through the advise hook — a pre-prompt integration that fires before each user message in Claude Code. The hook:
- Takes the user’s message as a recall query
- Runs the full hybrid pipeline
- Injects the most relevant memories into the model’s context window
This means the AI agent starts every response with relevant knowledge from past sessions — preferences, patterns, architecture decisions, debugging insights — without anyone explicitly asking for it.
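The hook's flow can be sketched as below. All names here are illustrative, not Engram's actual API; `recall` is assumed to return memories ranked most relevant first, and `budget` is a simple character budget standing in for context-window limits:

```python
def advise(user_message, recall, budget):
    """Pack the top-ranked memories into a context block, within a size budget."""
    memories = recall(user_message)          # run the full hybrid pipeline
    lines, used = [], 0
    for memory in memories:                  # most relevant first
        if used + len(memory) > budget:
            break                            # stop once the budget is spent
        lines.append("- " + memory)
        used += len(memory)
    return "Relevant memories:\n" + "\n".join(lines) if lines else ""
```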
Project Scoping
The `project` parameter on recall is a soft ranking signal, not a hard filter. When you recall with `project: "MyApp"`:
- Memories scoped to `MyApp` get a ranking boost
- Global memories (project: `"global"`) also get a boost — cross-project preferences should always surface
- Memories from other projects can still appear if they’re semantically relevant
This means a memory about a Swift concurrency pattern stored under project “Lattice” will still surface when working in “Engram” if the query is relevant — it just ranks slightly lower than an Engram-specific memory about the same topic.
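Expressed as a ranking multiplier rather than a filter (the boost values are hypothetical):

```python
def project_boost(memory_project, query_project):
    """Soft project scoping: boosts, never excludes."""
    if memory_project == query_project:
        return 1.15       # same-project memories rank highest
    if memory_project == "global":
        return 1.10       # cross-project preferences always get a boost
    return 1.0            # other projects: no penalty, just no boost
```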
Scaling Characteristics
The hybrid pipeline is designed to stay fast as knowledge bases grow:
- Vector search uses sqlite-vec’s approximate nearest neighbor index — sub-millisecond even with thousands of memories
- FTS5 is SQLite’s built-in full-text engine, highly optimized for text search
- Graph traversal follows edges in the database, not in-memory — memory footprint stays constant
- All queries run against a local SQLite file — no network round-trips
The practical bottleneck is context window size, not search speed. With thousands of memories, the challenge shifts from “finding relevant ones” to “fitting the best ones into the model’s context.”