Memory & knowledge
Retrieval, recall, and durable context assembly.
Agents don't have long-term recall by default — the providers forget the moment the response ends. Koda's memory and knowledge layers keep that forgetting from mattering. Memory extracts what's worth remembering from conversations; knowledge retrieves what the operator has already approved as grounding context. Both feed into the runtime's context-assembly step on every task.
The memory map
This is a snapshot of what a populated memory store looks like: typed nodes, semantic edges, and clusters that emerge from how agents actually interact with the system. Hover a node to trace the connected path.
Memory lifecycle
Memory is intentionally simple at the edges and nuanced in the middle. Two points of contact with the runtime, one background pipeline.
- Recall (pre-query). Before the main provider runs, the queue manager asks the memory manager for context. Recall is bounded by `MEMORY_RECALL_TIMEOUT` (default 3.0 seconds) and is non-fatal: if it times out, the task proceeds with whatever landed in time.
- Extraction (post-query). Once the provider has finished, an extraction provider/model reads the query and the response, emits candidate memories with confidence scores, and the memory manager persists the ones that pass the quality gate.
- Background pipelines. Digest, maintenance, embedding repair, deduplication, and clustering jobs run independently of the main interaction path. A user-facing task never waits on them.
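The bounded, non-fatal recall step can be sketched with a plain asyncio timeout. The function names below are illustrative, not Koda's actual API:

```python
import asyncio

MEMORY_RECALL_TIMEOUT = 3.0  # documented default, in seconds

async def fetch_memories(query: str) -> list[str]:
    # Stand-in for the memory manager's recall call.
    await asyncio.sleep(0.01)
    return [f"memory relevant to {query!r}"]

async def recall(query: str) -> list[str]:
    # Bounded and non-fatal: a timeout yields an empty context,
    # and the task proceeds instead of failing.
    try:
        return await asyncio.wait_for(fetch_memories(query), timeout=MEMORY_RECALL_TIMEOUT)
    except asyncio.TimeoutError:
        return []

memories = asyncio.run(recall("deploy failure"))
```

The key design point is the empty-list fallback: a slow memory store degrades recall quality, never task latency beyond the bound.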
Memory types
Every memory is typed. The type drives the default TTL, the recall ranking bonus, and how the curation UI groups it. All types share the same schema; only the semantics change.
| Type | Default TTL | Used for |
|---|---|---|
| FACT | 730 days | Static information about users, systems, or the world |
| EVENT | 365 days | Time-bound occurrences (incidents, deploys, meetings) |
| PREFERENCE | 730 days | How this operator or user likes things done |
| DECISION | 730 days | Choices that have been made and their rationale |
| PROBLEM | 365 days | Issues, bugs, and debugging context |
| COMMIT | 365 days | Summary and intent of a code change |
| RELATIONSHIP | 730 days | Connections between entities in the domain |
| TASK | 90 days | Open work items with short-horizon relevance |
| PROCEDURE | 365 days | How-to steps, processes, repeatable patterns |
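Since all types share one schema, the table above reduces to a lookup. A minimal sketch of TTL-driven staleness, with an illustrative helper that is not Koda's actual code:

```python
from datetime import datetime, timedelta

# Default TTLs from the table above (illustrative representation).
DEFAULT_TTL_DAYS = {
    "FACT": 730, "EVENT": 365, "PREFERENCE": 730,
    "DECISION": 730, "PROBLEM": 365, "COMMIT": 365,
    "RELATIONSHIP": 730, "TASK": 90, "PROCEDURE": 365,
}

def is_past_ttl(memory_type: str, created_at: datetime, now: datetime) -> bool:
    # A memory past its type's default TTL becomes a candidate for STALE.
    return now - created_at > timedelta(days=DEFAULT_TTL_DAYS[memory_type])

now = datetime(2025, 6, 1)
assert is_past_ttl("TASK", datetime(2025, 1, 1), now)      # 151 days > 90
assert not is_past_ttl("FACT", datetime(2025, 1, 1), now)  # 151 days < 730
```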
Memory status
Every memory carries a status that controls whether it's eligible for recall. Status transitions are the authoritative record of how context evolves over time.
- ACTIVE — current; in use for recall.
- STALE — past its TTL but not yet replaced.
- SUPERSEDED — explicitly replaced by a newer memory with the same `conflict_key`.
- INVALIDATED — known to be factually incorrect.
- REJECTED — failed the extraction-time quality gate.
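Superseding can be sketched as a write-time check on `conflict_key`. The dataclass and in-memory store below are illustrative, not Koda's schema:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    conflict_key: str
    status: str = "ACTIVE"

def persist(store: list[Memory], new: Memory) -> None:
    # Any ACTIVE memory sharing the conflict_key is explicitly replaced,
    # leaving a SUPERSEDED record as the audit trail.
    for existing in store:
        if existing.status == "ACTIVE" and existing.conflict_key == new.conflict_key:
            existing.status = "SUPERSEDED"
    store.append(new)

store = [Memory("prefers dark mode", conflict_key="pref:theme")]
persist(store, Memory("prefers light mode", conflict_key="pref:theme"))
```

The superseded row is kept rather than deleted, which is what makes status transitions an authoritative record of how context evolved.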
Memory layers
When the recall step assembles context for the provider, memories are organised into four layers. The layer determines composition priority and display in the trace view.
- EPISODIC — the most recent conversation turns. Most time-sensitive.
- PROCEDURAL — learned procedures and reusable how-tos.
- CONVERSATIONAL — prior exchanges with the same user.
- PROACTIVE — context the agent chose to pre-stage. Least time-sensitive.
Configuration knobs
The defaults are sane for most installs. These are the environment variables you're most likely to reach for.
- `MEMORY_MAX_RECALL` (25) — how many memories can land in a recall result.
- `MEMORY_RECALL_THRESHOLD` (0.25) — minimum similarity for a memory to be recalled.
- `MEMORY_RECENCY_HALF_LIFE_DAYS` (120) — decay rate for time-based ranking.
- `MEMORY_MAX_CONTEXT_TOKENS` (3500) — token budget for context assembly.
- `MEMORY_MAX_PER_USER` (2000) — retention cap before maintenance prunes least-important records.
- `MEMORY_SIMILARITY_DEDUP_THRESHOLD` (0.92) — cosine similarity threshold for deduplication.
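The half-life knob implies an exponential recency decay; Koda's exact ranking formula is not shown here, so treat this as an assumption about the shape, not the implementation:

```python
MEMORY_RECENCY_HALF_LIFE_DAYS = 120  # documented default

def recency_weight(age_days: float,
                   half_life: float = MEMORY_RECENCY_HALF_LIFE_DAYS) -> float:
    # Exponential decay: a memory loses half its recency
    # weight every `half_life` days.
    return 0.5 ** (age_days / half_life)

assert recency_weight(0) == 1.0
assert abs(recency_weight(120) - 0.5) < 1e-9   # one half-life
assert abs(recency_weight(240) - 0.25) < 1e-9  # two half-lives
```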
Embeddings default to `paraphrase-multilingual-MiniLM-L12-v2` via sentence-transformers. Swap the model via `MEMORY_EMBEDDING_MODEL` if you need a different language tier or size.
Knowledge retrieval
Knowledge is the operator-approved counterpart to memory. Where memory learns from interactions, knowledge is curated: documents, runbooks, policies, and evidence that operators have explicitly marked as authoritative.
The retrieval engine combines three signals into one ranked result:
- Lexical — keyword match on the document chunks.
- Dense — cosine similarity over the chunk embedding.
- Graph — entity and relation proximity via the canonical-entity graph.
Scores are fused with Reciprocal Rank Fusion (RRF, K=60). Graph edges carry typed weights — strong ones like governs (1.0) and supersedes (0.95) dominate the ranking, weak ones like mentions (0.35) barely nudge it. A contradicts edge has weight 0, so a contradicting entity will never boost a hit.
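RRF itself is small: each signal contributes 1/(K + rank) for every document it ranks, and the sums are sorted. A minimal sketch, with illustrative signal names and documents:

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Reciprocal Rank Fusion: score(d) = sum over signals of 1 / (k + rank).
    scores: dict[str, float] = {}
    for ranked_docs in rankings.values():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

fused = rrf_fuse({
    "lexical": ["a", "b", "c"],
    "dense":   ["b", "a", "d"],
    "graph":   ["b", "c", "a"],
})
```

Here `"b"` wins because two first-place ranks outweigh one first plus two middling ranks; RRF rewards agreement across signals without needing the raw scores to be comparable.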
What retrieval returns
Every query returns a structured response you can show in the UI:
- Selected hits — the final ranked chunks the agent should use.
- Trace hits — every candidate, with its lexical, dense, and graph ranks for debugging.
- Authoritative vs supporting evidence — highest confidence sources vs corroborating ones.
- Linked entities and graph relations — the part of the graph the query touched.
- Grounding score — overall confidence the answer can be grounded.
- Answer plan — a recommended action mode (direct, scoped, defer) for the agent.
- Judge result — quality metrics for the answer: citation coverage, contradiction rate, policy compliance.
Knowledge storage
Knowledge lives in Postgres with pgvector:
- `knowledge_documents` — source documents.
- `knowledge_chunks` — chunked text, each with an embedding row.
- `knowledge_entities` and `knowledge_relations` — the canonical entity graph.
- `retrieval_traces`, `retrieval_trace_hits`, `retrieval_bundles` — per-query audit and ranking breakdown.
- `answer_traces` and `answer_judgements` — what the agent did with the retrieval result.
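As a sketch, the dense leg of retrieval is a single pgvector query over `knowledge_chunks`. The column names (`embedding`, `text`) are assumptions, not Koda's actual schema; `<=>` is pgvector's cosine-distance operator:

```python
# Illustrative dense-signal query. pgvector's `<=>` returns cosine distance,
# so 1 - distance gives a similarity score comparable to MEMORY_RECALL_THRESHOLD.
DENSE_LEG_SQL = """
SELECT id, text, 1 - (embedding <=> %(query_embedding)s::vector) AS dense_score
FROM knowledge_chunks
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(limit)s;
"""
```

Ordering by the raw operator (rather than the derived score) lets Postgres use the vector index for the nearest-neighbour scan.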
Memory is learned — it comes out of conversations automatically and is scoped to a user or agent. Knowledge is curated — documents and policies operators explicitly trust, shared across agents. Both are consulted on every task; neither is authoritative on its own.
Go deeper
- Authoring a Skill — Skills combine a prompt contract with the recall and retrieval context above.
- Runtime API — the endpoints that surface memory hits and retrieval traces in the UI.
- Security — how the security service sanitises every memory write and retrieval result before it leaves Postgres.