How Memory Works
Engram gives AI agents persistent memory by storing knowledge as atomic memories — small, focused pieces of information that persist across sessions and can be recalled semantically.
Atomic Memories
Each memory is a single concept: a preference, a pattern, a decision, a debugging insight. This matters because:
- Precise recall — a focused memory about “Lattice uses WAL mode” matches better than a sprawling paragraph that mentions it in passing
- Targeted updates — you can update or expire one fact without touching others
- Clean consolidation — when memories overlap, the system can merge them without losing unrelated information
When a memory covers multiple topics, Engram encourages decomposition. A hub memory captures the overview, and detail memories link to it via part_of edges (see Knowledge Graph). This creates a tree where recall can surface either the summary or the specifics depending on what the query needs.
Embeddings
When a memory is stored, its text is converted into a 384-dimensional vector using a sentence transformer model that runs entirely on-device — no API calls, no data leaves your machine.
This vector captures the meaning of the text, not just its keywords. “The app crashes on login” and “authentication flow throws an exception” produce similar vectors even though they share no words.
The embedding model (paraphrase-MiniLM-L6-v2) is optimized for:
- Fast inference on consumer hardware (CoreML-accelerated on Apple Silicon)
- Strong performance on short technical text
- Semantic similarity between paraphrases
Every memory must have a valid embedding. If the embedding model fails to load, the store operation fails loudly rather than saving a memory that can never be found.
Storage
Engram stores everything in SQLite via Lattice, a Swift ORM designed for local-first applications.
Each user gets their own database file. Inside it:
| Component | Purpose |
|---|---|
| Memories table | All fields — content, topic, project, importance, timestamps, embedding, etc. |
| FTS5 index | SQLite full-text search over memory content for keyword matching |
| vec0 virtual table | sqlite-vec for vector similarity search |
| Edges table | Directed relationships between memories |
| Checkpoints & Episodes | Work-in-progress state for session continuity |
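The components in the table map onto a SQLite schema along these lines. This is a rough sketch using Python's stdlib `sqlite3`; the table and column names are illustrative, and the `vec0` table requires the sqlite-vec extension, so it is shown as a string rather than executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Memories table: all fields plus the raw embedding bytes.
CREATE TABLE memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    topic TEXT,
    importance REAL,
    created_at TEXT,
    expires_at TEXT,
    embedding BLOB
);
-- FTS5 index for keyword matching over memory content.
CREATE VIRTUAL TABLE memories_fts USING fts5(content);
-- Edges table: directed relationships between memories.
CREATE TABLE edges (
    source_id INTEGER,
    target_id INTEGER,
    kind TEXT  -- e.g. 'part_of'
);
""")

# With the sqlite-vec extension loaded, the vector index would be a
# vec0 virtual table holding the 384-dimensional embeddings:
VEC_TABLE_SQL = "CREATE VIRTUAL TABLE memories_vec USING vec0(embedding float[384]);"

conn.execute("INSERT INTO memories_fts (content) VALUES (?)",
             ("Lattice uses WAL mode",))
hit = conn.execute(
    "SELECT content FROM memories_fts WHERE memories_fts MATCH 'WAL'"
).fetchone()
print(hit[0])  # keyword recall via the FTS5 index
```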
SQLite was chosen deliberately over a dedicated vector database:
- Single file — the entire knowledge base is one .sqlite file, easy to back up or sync
- ACID guarantees — no partial writes, no corruption on crash
- WAL mode — concurrent reads don’t block writes
- No server — runs in-process, zero network latency
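The WAL-mode bullet above corresponds to a single pragma on the connection. A minimal sketch, using a temporary file path since WAL requires an on-disk database:

```python
import os
import sqlite3
import tempfile

# WAL mode is set per-database via a pragma; once enabled, readers
# no longer block the writer.
path = os.path.join(tempfile.mkdtemp(), "engram.sqlite")  # hypothetical filename
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # → wal
```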
Conflict Detection
When storing a new memory, Engram checks for near-duplicates using a two-gate system:
```mermaid
flowchart TD
    A[New memory arrives] --> B[Compute embedding]
    B --> C{Semantic gate: embedding\ndistance below threshold?}
    C -- No --> D[Store normally]
    C -- Yes --> E{Lexical gate: Jaccard\nterm overlap above threshold?}
    E -- No --> D
    E -- Yes --> F[Flag as conflict]
    F --> G[Return existing memory\n+ resolution options]
```
Both gates must trigger for a conflict to be flagged. Two memories can be about the same topic (similar embeddings) while covering different aspects (different terms), and they won’t be flagged as duplicates.
When a conflict is detected, the system returns the existing memory and suggests resolution: update the existing one, force-store to keep both, or forget the old one.
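The two-gate check can be sketched as follows. The thresholds, helper names, and toy 2-dimensional vectors are illustrative; Engram's actual values are not documented here.

```python
import math

SEMANTIC_THRESHOLD = 0.25  # max cosine distance to pass the semantic gate (assumed)
LEXICAL_THRESHOLD = 0.5    # min Jaccard overlap to pass the lexical gate (assumed)

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def is_conflict(new_vec, new_text, old_vec, old_text) -> bool:
    # Gate 1: semantically close?
    if cosine_distance(new_vec, old_vec) >= SEMANTIC_THRESHOLD:
        return False  # different topics: store normally
    # Gate 2: lexically overlapping too?
    return jaccard(new_text, old_text) >= LEXICAL_THRESHOLD

# Same topic, same terms → both gates fire → flagged as conflict.
print(is_conflict([1.0, 0.0], "lattice uses wal mode",
                  [0.99, 0.1], "lattice uses wal mode"))        # True
# Same topic, different aspect → only the semantic gate fires.
print(is_conflict([1.0, 0.0], "lattice uses wal mode",
                  [0.99, 0.1], "one sqlite file per user"))     # False
```

Requiring both gates is what lets two memories share a topic while covering different aspects without being collapsed into one.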
Expiration
Memories can have an expiration date set via expires_in_days. Expired memories are automatically filtered from recall results but aren’t deleted — they remain in the database for timeline queries or direct ID lookup.
This is designed for temporal context: “currently working on the auth refactor,” “PR #42 needs review,” or “blocked on API migration.” Valuable now, noise in a few weeks.
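The filter-but-keep behavior can be sketched in a few lines. Field names and the `recall` helper are illustrative:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
memories = {
    1: {"content": "blocked on API migration",
        "expires_at": now - timedelta(days=1)},  # already expired
    2: {"content": "Lattice uses WAL mode",
        "expires_at": None},                     # never expires
}

def recall(store: dict, at: datetime) -> list[str]:
    # Expired memories are filtered out of recall results...
    return [m["content"] for m in store.values()
            if m["expires_at"] is None or m["expires_at"] > at]

print(recall(memories, now))      # only the unexpired memory survives
# ...but the expired row is still in the store for direct ID lookup.
print(memories[1]["content"])
```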
What Stays Local
All of this — embedding, storage, search, conflict detection — runs on your machine. The embedding model is bundled with the application. The database is a local file. No memory content is sent to any external service unless you explicitly enable team sync.