
Recall & Search

Recall is the core operation — everything else (storing, connecting, consolidating) exists to make recall better. Engram uses a hybrid retrieval pipeline that combines semantic understanding with keyword matching, then ranks results using multiple signals.

```mermaid
flowchart TD
    Q[Query text] --> E[Embed query]
    Q --> F[Extract content words]
    E --> V[Vector search\nsqlite-vec]
    F --> T[FTS5 keyword\nsearch]
    V --> M[Merge & deduplicate]
    T --> M
    M --> R[Rank by combined score]
    R --> C{Distance\ncutoff}
    C -- Pass --> D[Graph traversal\ndepth 1-3]
    C -- Fail --> X[Filtered out]
    D --> O[Return results]
```

The query text is embedded using the same sentence transformer model used at storage time, then compared against all stored memory vectors using cosine distance via sqlite-vec.
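The comparison itself is just cosine distance between the query vector and each stored vector. A minimal sketch, using plain Python lists in place of sqlite-vec's packed vectors (the vectors and memory IDs here are made up for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# The query vector is compared against every stored memory vector;
# a smaller distance means a closer semantic match.
query = [0.9, 0.1, 0.0]
memories = {"mem1": [0.8, 0.2, 0.1], "mem2": [0.0, 0.1, 0.9]}
ranked = sorted(memories, key=lambda m: cosine_distance(query, memories[m]))
```

In practice sqlite-vec performs this comparison inside SQLite, so the vectors never leave the database file.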

A critical design detail: the raw natural-language query is embedded, not a stripped keyword version. Embedding models are trained on natural language — function words like “on top of” and “we will want to” provide structural context that improves match quality. Empirical testing confirmed that stripping function words before embedding degraded recall precision.

In parallel, content words are extracted from the query (stripping stop words using NLTagger part-of-speech analysis) and run through SQLite FTS5. This catches memories that share exact terminology but might rank lower in vector space.
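Engram does this extraction with NLTagger's part-of-speech analysis; as a rough stand-in, a hard-coded stop-word list shows the shape of the transformation (the word list below is illustrative, not Engram's actual filter):

```python
# Illustrative stop-word list approximating "keep only content words".
STOP_WORDS = {"we", "will", "want", "to", "on", "top", "of", "the", "a",
              "an", "how", "do", "i", "in", "and", "or", "for", "with"}

def extract_content_words(query: str) -> list[str]:
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

# The raw query goes to the embedder untouched; only these words go to FTS5.
print(extract_content_words("How do we want to handle sync conflicts?"))
# ['handle', 'sync', 'conflicts']
```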

The two search paths intentionally use different inputs from the same query:

  • Vector search — raw full query (preserves structural meaning)
  • FTS5 — extracted content words only (pure keyword matching)

Results from both searches are merged and ranked by a weighted combination of signals:

| Signal | Weight | Description |
| --- | --- | --- |
| Semantic similarity | Primary | Cosine distance between query and memory embeddings |
| Full-text relevance | Secondary | FTS5 keyword match quality |
| Project boost | Contextual | Same-project and global memories rank higher |
| Importance | Up to +20% | Memories rated 4-5 get a ranking boost |
| Recency | Tiebreaker | Recently accessed memories edge out older ones |
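These signals might blend roughly as follows. The specific weights, and treating the importance boost as a flat 1.2x multiplier, are illustrative assumptions, not Engram's actual values:

```python
def combined_score(mem: dict) -> float:
    """Blend the ranking signals; weights here are illustrative stand-ins."""
    score = 1.0 - mem["distance"]              # semantic similarity (primary)
    score += 0.3 * mem.get("fts_rank", 0.0)    # full-text relevance (secondary)
    if mem.get("project_match"):               # same-project / global boost
        score += 0.15
    if mem.get("importance", 3) >= 4:          # importance: up to +20%
        score *= 1.2
    score += mem.get("recency", 0.0) * 1e-6    # recency as a tiebreaker only
    return score

results = [
    {"id": "a", "distance": 0.20, "fts_rank": 0.5, "importance": 3},
    {"id": "b", "distance": 0.25, "fts_rank": 0.5, "importance": 5},
]
ranked = sorted(results, key=combined_score, reverse=True)
```

Note how the importance multiplier lets a slightly less similar but highly rated memory overtake a closer but unrated one.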

A dynamic distance cutoff filters results that are too dissimilar. This prevents noise — when nothing relevant exists, you get fewer results rather than irrelevant ones.
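One plausible shape for such a cutoff is relative to the best match, with an absolute ceiling; the `slack` and `floor` values below are illustrative, not Engram's tuned thresholds:

```python
def apply_cutoff(results: list[dict], slack: float = 0.15, floor: float = 0.6) -> list[dict]:
    """Hypothetical dynamic cutoff: keep results within `slack` of the best
    distance, and never past an absolute ceiling."""
    if not results:
        return []
    best = min(r["distance"] for r in results)
    return [r for r in results if r["distance"] <= min(best + slack, floor)]

hits = [{"id": "a", "distance": 0.10}, {"id": "b", "distance": 0.18},
        {"id": "c", "distance": 0.70}]
print([r["id"] for r in apply_cutoff(hits)])  # ['a', 'b'] -- 'c' is noise
```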

When depth is set (1-3), the pipeline follows edges from the initial results to surface connected memories. This is where the knowledge graph pays off:

  • A hub memory about “sync architecture” pulls in its part_of children — specific design decisions and implementation details
  • A supersedes chain surfaces the latest version of an evolving decision
  • A contradicts edge surfaces the opposing viewpoint alongside the matching one

Traversal adds context that keyword or vector search alone would miss.
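The traversal itself is a bounded breadth-first walk over the edge table. A sketch, with a made-up edge list whose relation names mirror the ones above:

```python
from collections import deque

# Illustrative edges: (source, relation, target).
EDGES = [
    ("sync-arch", "part_of", "delta-format"),
    ("sync-arch", "part_of", "conflict-rules"),
    ("conflict-rules", "supersedes", "old-conflict-rules"),
]

def traverse(seed_ids: set[str], depth: int = 1) -> set[str]:
    """Breadth-first walk over edges, up to `depth` hops from the seeds."""
    seen = set(seed_ids)
    frontier = deque((i, 0) for i in seed_ids)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # hop budget spent for this branch
        for src, _rel, dst in EDGES:
            # Edges are followed in both directions.
            for nxt in ((dst,) if src == node else (src,) if dst == node else ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return seen
```

In Engram the equivalent walk runs as database queries against the edge table rather than over an in-memory list, but the hop-bounded expansion is the same idea.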

In practice, most recalls happen automatically through the advise hook — a pre-prompt integration that fires before each user message in Claude Code. The hook:

  1. Takes the user’s message as a recall query
  2. Runs the full hybrid pipeline
  3. Injects the most relevant memories into the model’s context window

This means the AI agent starts every response with relevant knowledge from past sessions — preferences, patterns, architecture decisions, debugging insights — without anyone explicitly asking for it.
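The three steps above can be sketched as a single function; `recall` stands in for the hybrid pipeline, and the output formatting is a guess at the general shape, not Claude Code's actual context format:

```python
def advise(user_message: str, recall, max_memories: int = 5) -> str:
    """Hypothetical sketch of the advise hook: run recall on the incoming
    message and format the hits as a context block for the model."""
    memories = recall(user_message)[:max_memories]
    if not memories:
        return ""  # nothing relevant: inject nothing rather than noise
    lines = [f"- {m['content']}" for m in memories]
    return "Relevant memories from past sessions:\n" + "\n".join(lines)

# `fake_recall` is a stand-in for the real pipeline.
fake_recall = lambda q: [{"content": "Prefer async/await over callbacks"}]
print(advise("How should I structure this network call?", fake_recall))
```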

The project parameter on recall is a soft ranking signal, not a hard filter. When you recall with project: "MyApp":

  • Memories scoped to MyApp get a ranking boost
  • Global memories (project: "global") also get a boost — cross-project preferences should always surface
  • Memories from other projects can still appear if they’re semantically relevant

This means a memory about a Swift concurrency pattern stored under project “Lattice” will still surface when working in “Engram” if the query is relevant — it just ranks slightly lower than an Engram-specific memory about the same topic.
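The soft-filter behavior reduces to a small boost function; the specific boost values are illustrative:

```python
def project_boost(memory_project: str, query_project: str) -> float:
    """Soft ranking signal, not a hard filter. Boost values are illustrative."""
    if memory_project == query_project:
        return 0.15   # same-project memories rank highest
    if memory_project == "global":
        return 0.10   # cross-project preferences always get a lift
    return 0.0        # other projects: no boost, but never excluded

# A "Lattice" memory queried from "Engram" stays eligible, just unboosted.
print(project_boost("Lattice", "Engram"))  # 0.0
```

The key property is the last branch: a zero boost still leaves the memory in the candidate set, so semantic relevance alone can carry it into the results.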

The hybrid pipeline is designed to stay fast as knowledge bases grow:

  • Vector search uses sqlite-vec’s vector index — sub-millisecond even with thousands of memories
  • FTS5 is SQLite’s built-in full-text engine, highly optimized for text search
  • Graph traversal follows edges in the database, not in-memory — memory footprint stays constant
  • All queries run against a local SQLite file — no network round-trips

The practical bottleneck is context window size, not search speed. With thousands of memories, the challenge shifts from “finding relevant ones” to “fitting the best ones into the model’s context.”
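A simple way to frame that shift: once results are ranked, fitting them is a budgeted packing problem. A greedy sketch, using word counts as a crude stand-in for real token counts (Engram's actual budgeting may differ):

```python
def pack_context(ranked_memories: list[dict], budget_tokens: int = 2000) -> list[dict]:
    """Illustrative greedy packer: take memories in rank order until the
    token budget is spent; skip any that would overflow it."""
    picked, used = [], 0
    for mem in ranked_memories:
        cost = len(mem["content"].split())  # crude token estimate
        if used + cost > budget_tokens:
            continue
        picked.append(mem)
        used += cost
    return picked

mems = [{"content": "prefer async/await"},
        {"content": "sqlite file lives in app support"}]
print(len(pack_context(mems, budget_tokens=3)))  # only the first fits
```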