Memory & knowledge
Retrieval, recall, and durable context assembly.
Agents don't have long-term recall by default — the providers forget the moment the response ends. Koda's memory and knowledge layers keep that forgetting from mattering. Memory extracts what's worth remembering from conversations; knowledge retrieves what the operator has already approved as grounding context. Both feed into the runtime's context-assembly step on every task.
The memory map
This is a snapshot of what a populated memory store looks like: typed nodes, semantic edges, and clusters that emerge from how agents actually interact with the system. Hover a node to trace the connected path.
Memory lifecycle
Memory is intentionally simple at the edges and nuanced in the middle. Two points of contact with the runtime, one background pipeline.
- Recall (pre-query). Before the main provider runs, the queue manager asks the memory manager for context. Recall is bounded by `MEMORY_RECALL_TIMEOUT` (default 3.0 seconds) and is non-fatal: if it times out, the task proceeds with whatever landed in time.
- Extraction (post-query). Once the provider has finished, an extraction provider/model reads the query and the response, emits candidate memories with confidence scores, and the memory manager persists the ones that pass the quality gate.
- Background pipelines. Digest, maintenance, embedding repair, deduplication, and clustering jobs run independently of the main interaction path. A user-facing task never waits on them.
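The bounded, non-fatal recall step can be sketched with a plain asyncio timeout. The function names below are illustrative, not Koda's actual API:

```python
import asyncio

MEMORY_RECALL_TIMEOUT = 3.0  # documented default, in seconds

async def fetch_memories(query: str) -> list[str]:
    # Stand-in for the memory manager's recall call.
    await asyncio.sleep(0.01)
    return [f"memory relevant to {query!r}"]

async def recall(query: str) -> list[str]:
    # Bounded and non-fatal: a timeout yields an empty context,
    # and the task proceeds instead of failing.
    try:
        return await asyncio.wait_for(fetch_memories(query), timeout=MEMORY_RECALL_TIMEOUT)
    except asyncio.TimeoutError:
        return []

memories = asyncio.run(recall("deploy failure"))
```

The key design point is the empty-list fallback: a slow memory store degrades recall quality, never task latency beyond the bound.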
Memory types
Every memory is typed. The type drives the default TTL, the recall ranking bonus, and how the curation UI groups it. All types share the same schema; only the semantics change.
| Type | Default TTL | Used for |
|---|---|---|
| FACT | 730 days | Static information about users, systems, or the world |
| EVENT | 365 days | Time-bound occurrences (incidents, deploys, meetings) |
| PREFERENCE | 730 days | How this operator or user likes things done |
| DECISION | 730 days | Choices that have been made and their rationale |
| PROBLEM | 365 days | Issues, bugs, and debugging context |
| COMMIT | 365 days | Summary and intent of a code change |
| RELATIONSHIP | 730 days | Connections between entities in the domain |
| TASK | 90 days | Open work items with short-horizon relevance |
| PROCEDURE | 365 days | How-to steps, processes, repeatable patterns |
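Since all types share one schema, the table above reduces to a lookup. A minimal sketch of TTL-driven staleness, with an illustrative helper that is not Koda's actual code:

```python
from datetime import datetime, timedelta

# Default TTLs from the table above (illustrative representation).
DEFAULT_TTL_DAYS = {
    "FACT": 730, "EVENT": 365, "PREFERENCE": 730,
    "DECISION": 730, "PROBLEM": 365, "COMMIT": 365,
    "RELATIONSHIP": 730, "TASK": 90, "PROCEDURE": 365,
}

def is_past_ttl(memory_type: str, created_at: datetime, now: datetime) -> bool:
    # A memory past its type's default TTL becomes a candidate for STALE.
    return now - created_at > timedelta(days=DEFAULT_TTL_DAYS[memory_type])

now = datetime(2025, 6, 1)
assert is_past_ttl("TASK", datetime(2025, 1, 1), now)      # 151 days > 90
assert not is_past_ttl("FACT", datetime(2025, 1, 1), now)  # 151 days < 730
```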
Memory status
Every memory carries a status that controls whether it's eligible for recall. Status transitions are the authoritative record of how context evolves over time.
- ACTIVE — current; in use for recall.
- STALE — past its TTL but not yet replaced.
- SUPERSEDED — explicitly replaced by a newer memory with the same `conflict_key`.
- INVALIDATED — known to be factually incorrect.
- REJECTED — failed the extraction-time quality gate.
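Superseding can be sketched as a write-time check on `conflict_key`. The dataclass and in-memory store below are illustrative, not Koda's schema:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    conflict_key: str
    status: str = "ACTIVE"

def persist(store: list[Memory], new: Memory) -> None:
    # Any ACTIVE memory sharing the conflict_key is explicitly replaced,
    # leaving a SUPERSEDED record as the audit trail.
    for existing in store:
        if existing.status == "ACTIVE" and existing.conflict_key == new.conflict_key:
            existing.status = "SUPERSEDED"
    store.append(new)

store = [Memory("prefers dark mode", conflict_key="pref:theme")]
persist(store, Memory("prefers light mode", conflict_key="pref:theme"))
```

The superseded row is kept rather than deleted, which is what makes status transitions an authoritative record of how context evolved.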
Memory layers
When the recall step assembles context for the provider, memories are organised into four layers. The layer determines composition priority and display in the trace view.
- EPISODIC — the most recent conversation turns. Most time-sensitive.
- PROCEDURAL — learned procedures and reusable how-tos.
- CONVERSATIONAL — prior exchanges with the same user.
- PROACTIVE — context the agent chose to pre-stage. Least time-sensitive.
Configuration knobs
The defaults are sane for most installs. These are the environment variables you're most likely to reach for.
- `MEMORY_MAX_RECALL` (25) — how many memories can land in a recall result.
- `MEMORY_RECALL_THRESHOLD` (0.25) — minimum similarity for a memory to be recalled.
- `MEMORY_RECENCY_HALF_LIFE_DAYS` (120) — decay rate for time-based ranking.
- `MEMORY_MAX_CONTEXT_TOKENS` (3500) — token budget for context assembly.
- `MEMORY_MAX_PER_USER` (2000) — retention cap before maintenance prunes least-important records.
- `MEMORY_SIMILARITY_DEDUP_THRESHOLD` (0.92) — cosine similarity threshold for deduplication.
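The half-life knob implies an exponential recency decay; Koda's exact ranking formula is not shown here, so treat this as an assumption about the shape, not the implementation:

```python
MEMORY_RECENCY_HALF_LIFE_DAYS = 120  # documented default

def recency_weight(age_days: float,
                   half_life: float = MEMORY_RECENCY_HALF_LIFE_DAYS) -> float:
    # Exponential decay: a memory loses half its recency
    # weight every `half_life` days.
    return 0.5 ** (age_days / half_life)

assert recency_weight(0) == 1.0
assert abs(recency_weight(120) - 0.5) < 1e-9   # one half-life
assert abs(recency_weight(240) - 0.25) < 1e-9  # two half-lives
```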
Embeddings default to `paraphrase-multilingual-MiniLM-L12-v2` via sentence-transformers. Swap the model via `MEMORY_EMBEDDING_MODEL` if you need a different language tier or size.
Knowledge retrieval
Knowledge is the operator-approved counterpart to memory. Where memory learns from interactions, knowledge is curated: documents, runbooks, policies, and evidence that operators have explicitly marked as authoritative.
The retrieval engine combines three signals into one ranked result:
- Lexical — keyword match on the document chunks.
- Dense — cosine similarity over the chunk embedding.
- Graph — entity and relation proximity via the canonical-entity graph.
Scores are fused with Reciprocal Rank Fusion (RRF, K=60). Graph edges carry typed weights — strong ones like governs (1.0) and supersedes (0.95) dominate the ranking, weak ones like mentions (0.35) barely nudge it. A contradicts edge has weight 0, so a contradicting entity will never boost a hit.
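RRF itself is small: each signal contributes 1/(K + rank) for every document it ranks, and the sums are sorted. A minimal sketch, with illustrative signal names and documents:

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Reciprocal Rank Fusion: score(d) = sum over signals of 1 / (k + rank).
    scores: dict[str, float] = {}
    for ranked_docs in rankings.values():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

fused = rrf_fuse({
    "lexical": ["a", "b", "c"],
    "dense":   ["b", "a", "d"],
    "graph":   ["b", "c", "a"],
})
```

Here `"b"` wins because two first-place ranks outweigh one first plus two middling ranks; RRF rewards agreement across signals without needing the raw scores to be comparable.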
What retrieval returns
Every query returns a structured response you can show in the UI:
- Selected hits — the final ranked chunks the agent should use.
- Trace hits — every candidate, with its lexical, dense, and graph ranks for debugging.
- Authoritative vs supporting evidence — highest confidence sources vs corroborating ones.
- Linked entities and graph relations — the part of the graph the query touched.
- Grounding score — overall confidence the answer can be grounded.
- Answer plan — a recommended action mode (direct, scoped, defer) for the agent.
- Judge result — quality metrics for the answer: citation coverage, contradiction rate, policy compliance.
Knowledge storage
Knowledge lives in Postgres with pgvector:
- `knowledge_documents` — source documents.
- `knowledge_chunks` — chunked text, each with an embedding row.
- `knowledge_entities` and `knowledge_relations` — the canonical entity graph.
- `retrieval_traces`, `retrieval_trace_hits`, `retrieval_bundles` — per-query audit and ranking breakdown.
- `answer_traces` and `answer_judgements` — what the agent did with the retrieval result.
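As a sketch, the dense leg of retrieval is a single pgvector query over `knowledge_chunks`. The column names (`embedding`, `text`) are assumptions, not Koda's actual schema; `<=>` is pgvector's cosine-distance operator:

```python
# Illustrative dense-signal query. pgvector's `<=>` returns cosine distance,
# so 1 - distance gives a similarity score comparable to MEMORY_RECALL_THRESHOLD.
DENSE_LEG_SQL = """
SELECT id, text, 1 - (embedding <=> %(query_embedding)s::vector) AS dense_score
FROM knowledge_chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(limit)s;
"""
```

Ordering by the raw operator (rather than the derived score) lets Postgres use the vector index for the nearest-neighbour scan.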
Memory is learned — it comes out of conversations automatically and is scoped to a user or agent. Knowledge is curated — documents and policies operators explicitly trust, shared across agents. Both are consulted on every task; neither is authoritative on its own.
Go deeper
- Authoring a Skill — Skills combine a prompt contract with the recall and retrieval context above.
- Runtime API — the endpoints that surface memory hits and retrieval traces in the UI.
- Security — how the security service sanitises every memory write and retrieval result before it leaves Postgres.