How Memory Works

Engram gives AI agents persistent memory by storing knowledge as atomic memories — small, focused pieces of information that persist across sessions and can be recalled semantically.

Each memory is a single concept: a preference, a pattern, a decision, a debugging insight. This matters because:

  • Precise recall — a focused memory about “Lattice uses WAL mode” matches better than a sprawling paragraph that mentions it in passing
  • Targeted updates — you can update or expire one fact without touching others
  • Clean consolidation — when memories overlap, the system can merge them without losing unrelated information

When a memory covers multiple topics, Engram encourages decomposition. A hub memory captures the overview, and detail memories link to it via part_of edges (see Knowledge Graph). This creates a tree where recall can surface either the summary or the specifics depending on what the query needs.
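The hub-and-detail shape can be sketched as plain data. This is an illustrative model, not Engram's actual API; the `Memory`, `Edge`, and `hub_of` names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    id: str
    content: str

@dataclass
class Edge:
    src: str   # detail memory id
    dst: str   # hub memory id
    kind: str  # e.g. "part_of"

# One hub memory holding the overview...
hub = Memory("m1", "Auth system overview: token sessions, refresh rotation, rate limiting")

# ...and focused detail memories linked back to it via part_of edges.
details = [
    Memory("m2", "Access tokens expire after 15 minutes"),
    Memory("m3", "Refresh tokens rotate on every use"),
]
edges = [Edge(d.id, hub.id, "part_of") for d in details]

def hub_of(memory_id, edges):
    """Follow a part_of edge from a detail back to its hub, if any."""
    return next((e.dst for e in edges
                 if e.src == memory_id and e.kind == "part_of"), None)

assert hub_of("m2", edges) == "m1"
```

A query about token expiry can surface `m2` directly, while a broad "how does auth work" query lands on the hub; the `part_of` edge connects the two.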

When a memory is stored, its text is converted into a 384-dimensional vector using a sentence transformer model that runs entirely on-device — no API calls, no data leaves your machine.

This vector captures the meaning of the text, not just its keywords. “The app crashes on login” and “authentication flow throws an exception” produce similar vectors even though they share no words.
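Similarity between two embeddings is typically measured with cosine similarity. The sketch below uses toy 4-dimensional vectors with invented values as stand-ins for the real 384-dimensional model output, purely to show the computation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (values invented for illustration).
crash_login = [0.8, 0.1, 0.5, 0.2]  # "The app crashes on login"
auth_throws = [0.7, 0.2, 0.6, 0.1]  # "authentication flow throws an exception"
unrelated   = [0.1, 0.9, 0.0, 0.4]  # "the deploy script uses rsync"

# Paraphrases land closer together than unrelated text.
assert cosine_similarity(crash_login, auth_throws) > cosine_similarity(crash_login, unrelated)
```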

The embedding model (paraphrase-MiniLM-L6-v2) is optimized for:

  • Fast inference on consumer hardware (CoreML-accelerated on Apple Silicon)
  • Strong performance on short technical text
  • Semantic similarity between paraphrases

Every memory must have a valid embedding. If the embedding model fails to load, the store operation fails loudly rather than saving a memory that can never be found.
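The fail-loud contract might look like the following sketch. The function name and error message are hypothetical; the point is that a store without a valid 384-dimensional embedding raises instead of silently persisting.

```python
def store_memory(content, embed):
    """Store a memory only if a valid embedding can be produced (sketch)."""
    embedding = embed(content)
    if embedding is None or len(embedding) != 384:
        # Fail loudly: a memory without an embedding can never be recalled.
        raise RuntimeError("embedding unavailable; refusing to store an unfindable memory")
    return {"content": content, "embedding": embedding}

working_model = lambda text: [0.0] * 384  # placeholder for the real model
broken_model = lambda text: None          # simulates a model that failed to load

stored = store_memory("Lattice uses WAL mode", working_model)
assert stored["content"] == "Lattice uses WAL mode"

try:
    store_memory("anything", broken_model)
    raised = False
except RuntimeError:
    raised = True
assert raised
```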

Engram stores everything in SQLite via Lattice, a Swift ORM designed for local-first applications.

Each user gets their own database file. Inside it:

| Component | Purpose |
| --- | --- |
| Memories table | All fields — content, topic, project, importance, timestamps, embedding, etc. |
| FTS5 index | SQLite full-text search over memory content for keyword matching |
| vec0 virtual table | sqlite-vec for vector similarity search |
| Edges table | Directed relationships between memories |
| Checkpoints & Episodes | Work-in-progress state for session continuity |

SQLite was chosen deliberately over a dedicated vector database:

  • Single file — the entire knowledge base is one .sqlite file, easy to back up or sync
  • ACID guarantees — no partial writes, no corruption on crash
  • WAL mode — concurrent reads don’t block writes
  • No server — runs in-process, zero network latency
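A minimal schema in this shape can be sketched with Python's built-in sqlite3 module. This is an illustration inspired by the component table above, not Engram's real DDL (which Lattice generates); the vec0 table requires the sqlite-vec loadable extension, so it appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode = WAL")  # concurrent reads don't block writes

conn.executescript("""
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    topic TEXT,
    project TEXT,
    importance REAL,
    created_at TEXT,
    expires_at TEXT,
    embedding BLOB NOT NULL        -- 384 float32 values
);
CREATE VIRTUAL TABLE memories_fts USING fts5(content);
CREATE TABLE edges (
    src INTEGER REFERENCES memories(id),
    dst INTEGER REFERENCES memories(id),
    kind TEXT                      -- e.g. 'part_of'
);
-- Requires the sqlite-vec extension, omitted here:
-- CREATE VIRTUAL TABLE memories_vec USING vec0(embedding float[384]);
""")

conn.execute("INSERT INTO memories (content, embedding) VALUES (?, ?)",
             ("Lattice uses WAL mode", b"\x00" * 4 * 384))
conn.execute("INSERT INTO memories_fts (content) VALUES (?)",
             ("Lattice uses WAL mode",))

# Keyword matching goes through the FTS5 index.
hits = conn.execute("SELECT content FROM memories_fts WHERE memories_fts MATCH 'WAL'").fetchall()
assert hits == [("Lattice uses WAL mode",)]
```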

When storing a new memory, Engram checks for near-duplicates using a two-gate system:

```mermaid
flowchart TD
    A[New memory arrives] --> B[Compute embedding]
    B --> C{Semantic gate: embedding\ndistance below threshold?}
    C -- No --> D[Store normally]
    C -- Yes --> E{Lexical gate: Jaccard\nterm overlap above threshold?}
    E -- No --> D
    E -- Yes --> F[Flag as conflict]
    F --> G[Return existing memory\n+ resolution options]
```

Both gates must trigger for a conflict to be flagged. Two memories can be about the same topic (similar embeddings) while covering different aspects (different terms), and they won’t be flagged as duplicates.
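The two gates can be sketched as cosine distance plus Jaccard term overlap. The threshold values below are invented for illustration; Engram's actual thresholds are not documented here.

```python
import math
import re

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def jaccard(text_a, text_b):
    """Term-set overlap: |A ∩ B| / |A ∪ B| over lowercased word tokens."""
    terms_a = set(re.findall(r"\w+", text_a.lower()))
    terms_b = set(re.findall(r"\w+", text_b.lower()))
    return len(terms_a & terms_b) / len(terms_a | terms_b)

# Hypothetical thresholds, for illustration only.
SEMANTIC_THRESHOLD = 0.2
LEXICAL_THRESHOLD = 0.5

def is_conflict(new_text, new_vec, old_text, old_vec):
    semantic = cosine_distance(new_vec, old_vec) < SEMANTIC_THRESHOLD  # gate 1
    lexical = jaccard(new_text, old_text) > LEXICAL_THRESHOLD          # gate 2
    return semantic and lexical  # both gates must trigger

# Toy 3-d vectors stand in for the real 384-d embeddings.
old     = ("Lattice uses WAL mode for concurrency",    [0.90, 0.10, 0.30])
dup     = ("Lattice uses WAL mode",                    [0.88, 0.12, 0.31])
related = ("Lattice batches writes in transactions",   [0.85, 0.15, 0.35])

assert is_conflict(dup[0], dup[1], old[0], old[1])              # near-duplicate: flagged
assert not is_conflict(related[0], related[1], old[0], old[1])  # same topic, different aspect
```

The `related` memory passes the semantic gate (similar vector) but fails the lexical gate (almost no shared terms), so it is stored normally rather than flagged.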

When a conflict is detected, the system returns the existing memory and suggests resolution: update the existing one, force-store to keep both, or forget the old one.

Memories can have an expiration date set via expires_in_days. Expired memories are automatically filtered from recall results but aren’t deleted — they remain in the database for timeline queries or direct ID lookup.

This is designed for temporal context: “currently working on the auth refactor,” “PR #42 needs review,” or “blocked on API migration.” Valuable now, noise in a few weeks.
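Expiry-aware filtering amounts to a timestamp comparison at recall time. A minimal sketch, with illustrative field names:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# expires_at of None means the memory never expires (field names are illustrative).
memories = [
    {"content": "PR #42 needs review",      "expires_at": now - timedelta(days=1)},  # expired
    {"content": "Lattice uses WAL mode",    "expires_at": None},                     # permanent
    {"content": "blocked on API migration", "expires_at": now + timedelta(days=7)},
]

def recallable(memory, at):
    # Expired memories are filtered from recall but never deleted, so
    # timeline queries and direct ID lookups can still see them.
    return memory["expires_at"] is None or memory["expires_at"] > at

active = [m["content"] for m in memories if recallable(m, now)]
assert active == ["Lattice uses WAL mode", "blocked on API migration"]
```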

All of this — embedding, storage, search, conflict detection — runs on your machine. The embedding model is bundled with the application. The database is a local file. No memory content is sent to any external service unless you explicitly enable team sync.