How Memory Works

Engram gives AI agents persistent memory by storing knowledge as atomic memories — small, focused pieces of information that persist across sessions and can be recalled semantically.

Each memory is a single concept: a preference, a pattern, a decision, a debugging insight. This matters because:

  • Precise recall — a focused memory about “Lattice uses WAL mode” matches better than a sprawling paragraph that mentions it in passing
  • Targeted updates — you can update or expire one fact without touching others
  • Clean consolidation — when memories overlap, the system can merge them without losing unrelated information

When a memory covers multiple topics, Engram encourages decomposition. A hub memory captures the overview, and detail memories link to it via part_of edges (see Knowledge Graph). This creates a tree where recall can surface either the summary or the specifics depending on what the query needs.
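The hub-and-detail shape can be sketched as plain data. This is an illustrative model, not Engram's actual API; the `Memory`, `Edge`, and `hub_of` names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    id: str
    content: str

@dataclass
class Edge:
    src: str   # detail memory id
    dst: str   # hub memory id
    kind: str  # e.g. "part_of"

# One hub memory holding the overview...
hub = Memory("m1", "Auth system overview: token sessions, refresh rotation, rate limiting")

# ...and focused detail memories linked back to it via part_of edges.
details = [
    Memory("m2", "Access tokens expire after 15 minutes"),
    Memory("m3", "Refresh tokens rotate on every use"),
]
edges = [Edge(d.id, hub.id, "part_of") for d in details]

def hub_of(memory_id, edges):
    """Follow a part_of edge from a detail back to its hub, if any."""
    return next((e.dst for e in edges
                 if e.src == memory_id and e.kind == "part_of"), None)

assert hub_of("m2", edges) == "m1"
```

A query about token expiry can surface `m2` directly, while a broad "how does auth work" query lands on the hub; the `part_of` edge connects the two.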

When a memory is stored, its text is converted into a 384-dimensional vector using a sentence transformer model that runs entirely on-device — no API calls, no data leaves your machine.

This vector captures the meaning of the text, not just its keywords. “The app crashes on login” and “authentication flow throws an exception” produce similar vectors even though they share no words.
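Similarity between two embeddings is typically measured with cosine similarity. The sketch below uses toy 4-dimensional vectors with invented values as stand-ins for the real 384-dimensional model output, purely to show the computation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (values invented for illustration).
crash_login = [0.8, 0.1, 0.5, 0.2]  # "The app crashes on login"
auth_throws = [0.7, 0.2, 0.6, 0.1]  # "authentication flow throws an exception"
unrelated   = [0.1, 0.9, 0.0, 0.4]  # "the deploy script uses rsync"

# Paraphrases land closer together than unrelated text.
assert cosine_similarity(crash_login, auth_throws) > cosine_similarity(crash_login, unrelated)
```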

The embedding model (paraphrase-MiniLM-L6-v2) is optimized for:

  • Fast inference on consumer hardware (CoreML-accelerated on Apple Silicon)
  • Strong performance on short technical text
  • Semantic similarity between paraphrases

Every memory must have a valid embedding. If the embedding model fails to load, the store operation fails loudly rather than saving a memory that can never be found.
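The fail-loud contract might look like the following sketch. The function name and error message are hypothetical; the point is that a store without a valid 384-dimensional embedding raises instead of silently persisting.

```python
def store_memory(content, embed):
    """Store a memory only if a valid embedding can be produced (sketch)."""
    embedding = embed(content)
    if embedding is None or len(embedding) != 384:
        # Fail loudly: a memory without an embedding can never be recalled.
        raise RuntimeError("embedding unavailable; refusing to store an unfindable memory")
    return {"content": content, "embedding": embedding}

working_model = lambda text: [0.0] * 384  # placeholder for the real model
broken_model = lambda text: None          # simulates a model that failed to load

stored = store_memory("Lattice uses WAL mode", working_model)
assert stored["content"] == "Lattice uses WAL mode"

try:
    store_memory("anything", broken_model)
    raised = False
except RuntimeError:
    raised = True
assert raised
```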

Engram stores everything in SQLite via Lattice, a Swift ORM designed for local-first applications.

Each user gets their own database file. Inside it:

| Component | Purpose |
| --- | --- |
| Memories table | All fields — content, topic, project, importance, timestamps, embedding, etc. |
| FTS5 index | SQLite full-text search over memory content for keyword matching |
| vec0 virtual table | sqlite-vec for vector similarity search |
| Edges table | Directed relationships between memories |
| Checkpoints & Episodes | Work-in-progress state for session continuity |

SQLite was chosen deliberately over a dedicated vector database:

  • Single file — the entire knowledge base is one .sqlite file, easy to back up or sync
  • ACID guarantees — no partial writes, no corruption on crash
  • WAL mode — concurrent reads don’t block writes
  • No server — runs in-process, zero network latency
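A minimal schema in this shape can be sketched with Python's built-in sqlite3 module. This is an illustration inspired by the component table above, not Engram's real DDL (which Lattice generates); the vec0 table requires the sqlite-vec loadable extension, so it appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode = WAL")  # concurrent reads don't block writes

conn.executescript("""
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    topic TEXT,
    project TEXT,
    importance REAL,
    created_at TEXT,
    expires_at TEXT,
    embedding BLOB NOT NULL        -- 384 float32 values
);
CREATE VIRTUAL TABLE memories_fts USING fts5(content);
CREATE TABLE edges (
    src INTEGER REFERENCES memories(id),
    dst INTEGER REFERENCES memories(id),
    kind TEXT                      -- e.g. 'part_of'
);
-- Requires the sqlite-vec extension, omitted here:
-- CREATE VIRTUAL TABLE memories_vec USING vec0(embedding float[384]);
""")

conn.execute("INSERT INTO memories (content, embedding) VALUES (?, ?)",
             ("Lattice uses WAL mode", b"\x00" * 4 * 384))
conn.execute("INSERT INTO memories_fts (content) VALUES (?)",
             ("Lattice uses WAL mode",))

# Keyword matching goes through the FTS5 index.
hits = conn.execute("SELECT content FROM memories_fts WHERE memories_fts MATCH 'WAL'").fetchall()
assert hits == [("Lattice uses WAL mode",)]
```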

When storing a new memory, Engram checks for near-duplicates using a two-gate system:

```mermaid
flowchart TD
    A[New memory arrives] --> B[Compute embedding]
    B --> C{Semantic gate: embedding\ndistance below threshold?}
    C -- No --> D[Store normally]
    C -- Yes --> E{Lexical gate: Jaccard\nterm overlap above threshold?}
    E -- No --> D
    E -- Yes --> F[Flag as conflict]
    F --> G[Return existing memory\n+ resolution options]
```

Both gates must trigger for a conflict to be flagged. Two memories can be about the same topic (similar embeddings) while covering different aspects (different terms), and they won’t be flagged as duplicates.
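The two gates can be sketched as cosine distance plus Jaccard term overlap. The threshold values below are invented for illustration; Engram's actual thresholds are not documented here.

```python
import math
import re

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def jaccard(text_a, text_b):
    """Term-set overlap: |A ∩ B| / |A ∪ B| over lowercased word tokens."""
    terms_a = set(re.findall(r"\w+", text_a.lower()))
    terms_b = set(re.findall(r"\w+", text_b.lower()))
    return len(terms_a & terms_b) / len(terms_a | terms_b)

# Hypothetical thresholds, for illustration only.
SEMANTIC_THRESHOLD = 0.2
LEXICAL_THRESHOLD = 0.5

def is_conflict(new_text, new_vec, old_text, old_vec):
    semantic = cosine_distance(new_vec, old_vec) < SEMANTIC_THRESHOLD  # gate 1
    lexical = jaccard(new_text, old_text) > LEXICAL_THRESHOLD          # gate 2
    return semantic and lexical  # both gates must trigger

# Toy 3-d vectors stand in for the real 384-d embeddings.
old     = ("Lattice uses WAL mode for concurrency",    [0.90, 0.10, 0.30])
dup     = ("Lattice uses WAL mode",                    [0.88, 0.12, 0.31])
related = ("Lattice batches writes in transactions",   [0.85, 0.15, 0.35])

assert is_conflict(dup[0], dup[1], old[0], old[1])              # near-duplicate: flagged
assert not is_conflict(related[0], related[1], old[0], old[1])  # same topic, different aspect
```

The `related` memory passes the semantic gate (similar vector) but fails the lexical gate (almost no shared terms), so it is stored normally rather than flagged.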

When a conflict is detected, the system returns the existing memory and suggests resolution: update the existing one, force-store to keep both, or forget the old one.

Memories can have an expiration date set via expires_in_days. Expired memories are automatically filtered from recall results but aren’t deleted — they remain in the database for timeline queries or direct ID lookup.

This is designed for temporal context: “currently working on the auth refactor,” “PR #42 needs review,” or “blocked on API migration.” Valuable now, noise in a few weeks.
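Expiry-aware filtering amounts to a timestamp comparison at recall time. A minimal sketch, with illustrative field names:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# expires_at of None means the memory never expires (field names are illustrative).
memories = [
    {"content": "PR #42 needs review",      "expires_at": now - timedelta(days=1)},  # expired
    {"content": "Lattice uses WAL mode",    "expires_at": None},                     # permanent
    {"content": "blocked on API migration", "expires_at": now + timedelta(days=7)},
]

def recallable(memory, at):
    # Expired memories are filtered from recall but never deleted, so
    # timeline queries and direct ID lookups can still see them.
    return memory["expires_at"] is None or memory["expires_at"] > at

active = [m["content"] for m in memories if recallable(m, now)]
assert active == ["Lattice uses WAL mode", "blocked on API migration"]
```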

All of this — embedding, storage, search, conflict detection — runs on your machine. The embedding model is bundled with the application. The database is a local file. No memory content is sent to any external service unless you explicitly enable team sync.