
Recall & Search

Recall is the core operation — everything else (storing, connecting, consolidating) exists to make recall better. Engram uses a hybrid retrieval pipeline that combines semantic understanding with keyword matching, then ranks results using multiple signals.

```mermaid
flowchart TD
    Q[Query text] --> E[Embed query]
    Q --> F[Extract content words]
    E --> V[Vector search\nsqlite-vec]
    F --> T[FTS5 keyword\nsearch]
    V --> M[Merge & deduplicate]
    T --> M
    M --> R[Rank by combined score]
    R --> C{Distance\ncutoff}
    C -- Pass --> D[Graph traversal\ndepth 1-3]
    C -- Fail --> X[Filtered out]
    D --> O[Return results]
```

The query text is embedded using the same sentence transformer model used at storage time, then compared against all stored memory vectors using cosine distance via sqlite-vec.
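The comparison itself is just cosine distance between the query vector and each stored vector. A minimal sketch, using plain Python lists in place of sqlite-vec's packed vectors (the vectors and memory IDs here are made up for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# The query vector is compared against every stored memory vector;
# a smaller distance means a closer semantic match.
query = [0.9, 0.1, 0.0]
memories = {"mem1": [0.8, 0.2, 0.1], "mem2": [0.0, 0.1, 0.9]}
ranked = sorted(memories, key=lambda m: cosine_distance(query, memories[m]))
```

In practice sqlite-vec performs this comparison inside SQLite, so the vectors never leave the database file.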

A critical design detail: the raw natural-language query is embedded, not a stripped keyword version. Embedding models are trained on natural language — function words like “on top of” and “we will want to” provide structural context that improves match quality. Empirical testing confirmed that stripping function words before embedding degraded recall precision.

In parallel, content words are extracted from the query (stripping stop words using NLTagger part-of-speech analysis) and run through SQLite FTS5. This catches memories that share exact terminology but might rank lower in vector space.
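Engram does this extraction with NLTagger's part-of-speech analysis; as a rough stand-in, a hard-coded stop-word list shows the shape of the transformation (the word list below is illustrative, not Engram's actual filter):

```python
# Illustrative stop-word list approximating "keep only content words".
STOP_WORDS = {"we", "will", "want", "to", "on", "top", "of", "the", "a",
              "an", "how", "do", "i", "in", "and", "or", "for", "with"}

def extract_content_words(query: str) -> list[str]:
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

# The raw query goes to the embedder untouched; only these words go to FTS5.
print(extract_content_words("How do we want to handle sync conflicts?"))
# ['handle', 'sync', 'conflicts']
```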

The two search paths intentionally use different inputs from the same query:

  • Vector search — raw full query (preserves structural meaning)
  • FTS5 — extracted content words only (pure keyword matching)

Results from both searches are merged and ranked by a weighted combination of signals:

| Signal | Weight | Description |
| --- | --- | --- |
| Semantic similarity | Primary | Cosine distance between query and memory embeddings |
| Full-text relevance | Secondary | FTS5 keyword match quality |
| Project boost | Contextual | Same-project and global memories rank higher |
| Importance | Up to +20% | Memories rated 4-5 get a ranking boost |
| Recency | Tiebreaker | Recently accessed memories edge out older ones |
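These signals might blend roughly as follows. The specific weights, and treating the importance boost as a flat 1.2x multiplier, are illustrative assumptions, not Engram's actual values:

```python
def combined_score(mem: dict) -> float:
    """Blend the ranking signals; weights here are illustrative stand-ins."""
    score = 1.0 - mem["distance"]              # semantic similarity (primary)
    score += 0.3 * mem.get("fts_rank", 0.0)    # full-text relevance (secondary)
    if mem.get("project_match"):               # same-project / global boost
        score += 0.15
    if mem.get("importance", 3) >= 4:          # importance: up to +20%
        score *= 1.2
    score += mem.get("recency", 0.0) * 1e-6    # recency as a tiebreaker only
    return score

results = [
    {"id": "a", "distance": 0.20, "fts_rank": 0.5, "importance": 3},
    {"id": "b", "distance": 0.25, "fts_rank": 0.5, "importance": 5},
]
ranked = sorted(results, key=combined_score, reverse=True)
```

Note how the importance multiplier lets a slightly less similar but highly rated memory overtake a closer but unrated one.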

A dynamic distance cutoff filters results that are too dissimilar. This prevents noise — when nothing relevant exists, you get fewer results rather than irrelevant ones.
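One plausible shape for such a cutoff is relative to the best match, with an absolute ceiling; the `slack` and `floor` values below are illustrative, not Engram's tuned thresholds:

```python
def apply_cutoff(results: list[dict], slack: float = 0.15, floor: float = 0.6) -> list[dict]:
    """Hypothetical dynamic cutoff: keep results within `slack` of the best
    distance, and never past an absolute ceiling."""
    if not results:
        return []
    best = min(r["distance"] for r in results)
    return [r for r in results if r["distance"] <= min(best + slack, floor)]

hits = [{"id": "a", "distance": 0.10}, {"id": "b", "distance": 0.18},
        {"id": "c", "distance": 0.70}]
print([r["id"] for r in apply_cutoff(hits)])  # ['a', 'b'] -- 'c' is noise
```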

When depth is set (1-3), the pipeline follows edges from the initial results to surface connected memories. This is where the knowledge graph pays off:

  • A hub memory about “sync architecture” pulls in its part_of children — specific design decisions and implementation details
  • A supersedes chain surfaces the latest version of an evolving decision
  • A contradicts edge surfaces the opposing viewpoint alongside the matching one

Traversal adds context that keyword or vector search alone would miss.
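The traversal itself is a bounded breadth-first walk over the edge table. A sketch, with a made-up edge list whose relation names mirror the ones above:

```python
from collections import deque

# Illustrative edges: (source, relation, target).
EDGES = [
    ("sync-arch", "part_of", "delta-format"),
    ("sync-arch", "part_of", "conflict-rules"),
    ("conflict-rules", "supersedes", "old-conflict-rules"),
]

def traverse(seed_ids: set[str], depth: int = 1) -> set[str]:
    """Breadth-first walk over edges, up to `depth` hops from the seeds."""
    seen = set(seed_ids)
    frontier = deque((i, 0) for i in seed_ids)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # hop budget spent for this branch
        for src, _rel, dst in EDGES:
            # Edges are followed in both directions.
            for nxt in ((dst,) if src == node else (src,) if dst == node else ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
    return seen
```

In Engram the equivalent walk runs as database queries against the edge table rather than over an in-memory list, but the hop-bounded expansion is the same idea.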

In practice, most recalls happen automatically through the advise hook — a pre-prompt integration that fires before each user message in Claude Code. The hook:

  1. Takes the user’s message as a recall query
  2. Runs the full hybrid pipeline
  3. Injects the most relevant memories into the model’s context window

This means the AI agent starts every response with relevant knowledge from past sessions — preferences, patterns, architecture decisions, debugging insights — without anyone explicitly asking for it.
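The three steps above can be sketched as a single function; `recall` stands in for the hybrid pipeline, and the output formatting is a guess at the general shape, not Claude Code's actual context format:

```python
def advise(user_message: str, recall, max_memories: int = 5) -> str:
    """Hypothetical sketch of the advise hook: run recall on the incoming
    message and format the hits as a context block for the model."""
    memories = recall(user_message)[:max_memories]
    if not memories:
        return ""  # nothing relevant: inject nothing rather than noise
    lines = [f"- {m['content']}" for m in memories]
    return "Relevant memories from past sessions:\n" + "\n".join(lines)

# `fake_recall` is a stand-in for the real pipeline.
fake_recall = lambda q: [{"content": "Prefer async/await over callbacks"}]
print(advise("How should I structure this network call?", fake_recall))
```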

The project parameter on recall is a soft ranking signal, not a hard filter. When you recall with project: "MyApp":

  • Memories scoped to MyApp get a ranking boost
  • Global memories (project: "global") also get a boost — cross-project preferences should always surface
  • Memories from other projects can still appear if they’re semantically relevant

This means a memory about a Swift concurrency pattern stored under project “Lattice” will still surface when working in “Engram” if the query is relevant — it just ranks slightly lower than an Engram-specific memory about the same topic.
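The soft-filter behavior reduces to a small boost function; the specific boost values are illustrative:

```python
def project_boost(memory_project: str, query_project: str) -> float:
    """Soft ranking signal, not a hard filter. Boost values are illustrative."""
    if memory_project == query_project:
        return 0.15   # same-project memories rank highest
    if memory_project == "global":
        return 0.10   # cross-project preferences always get a lift
    return 0.0        # other projects: no boost, but never excluded

# A "Lattice" memory queried from "Engram" stays eligible, just unboosted.
print(project_boost("Lattice", "Engram"))  # 0.0
```

The key property is the last branch: a zero boost still leaves the memory in the candidate set, so semantic relevance alone can carry it into the results.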

The hybrid pipeline is designed to stay fast as knowledge bases grow:

  • Vector search uses sqlite-vec’s vector index — sub-millisecond even with thousands of memories
  • FTS5 is SQLite’s built-in full-text engine, highly optimized for text search
  • Graph traversal follows edges in the database, not in-memory — memory footprint stays constant
  • All queries run against a local SQLite file — no network round-trips

The practical bottleneck is context window size, not search speed. With thousands of memories, the challenge shifts from “finding relevant ones” to “fitting the best ones into the model’s context.”
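A simple way to frame that shift: once results are ranked, fitting them is a budgeted packing problem. A greedy sketch, using word counts as a crude stand-in for real token counts (Engram's actual budgeting may differ):

```python
def pack_context(ranked_memories: list[dict], budget_tokens: int = 2000) -> list[dict]:
    """Illustrative greedy packer: take memories in rank order until the
    token budget is spent; skip any that would overflow it."""
    picked, used = [], 0
    for mem in ranked_memories:
        cost = len(mem["content"].split())  # crude token estimate
        if used + cost > budget_tokens:
            continue
        picked.append(mem)
        used += cost
    return picked

mems = [{"content": "prefer async/await"},
        {"content": "sqlite file lives in app support"}]
print(len(pack_context(mems, budget_tokens=3)))  # only the first fits
```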