
Memory & knowledge

Retrieval, recall, and durable context assembly.

Agents don't have long-term recall by default — the providers forget the moment the response ends. Koda's memory and knowledge layers keep that forgetting from mattering. Memory extracts what's worth remembering from conversations; knowledge retrieves what the operator has already approved as grounding context. Both feed into the runtime's context-assembly step on every task.

The memory map

This is a snapshot of what a populated memory store looks like: typed nodes, semantic edges, and clusters that emerge from how agents actually interact with the system. Hover a node to trace the connected path.

[Interactive memory map: 21 memories across 5 clusters — auth/session, runtime execution, memory lifecycle, knowledge retrieval, operations — with nodes typed as fact, event, preference, decision, problem, procedure, task, relationship, or commit.]

Memory lifecycle

Memory is intentionally simple at the edges and nuanced in the middle. Two points of contact with the runtime, plus background pipelines that never block the interaction path.

  1. Recall (pre-query). Before the main provider runs, the queue manager asks the memory manager for context. Recall is bounded by MEMORY_RECALL_TIMEOUT (default 3.0 seconds) and is non-fatal: if it times out, the task proceeds with whatever landed in time.
  2. Extraction (post-query). Once the provider has finished, an extraction provider/model reads the query and the response, emits candidate memories with confidence scores, and the memory manager persists the ones that pass the quality gate.
  3. Background pipelines. Digest, maintenance, embedding repair, deduplication, and clustering jobs run independently of the main interaction path. A user-facing task never waits on them.
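The bounded, non-fatal recall step can be sketched as follows. This is a minimal illustration, not the runtime's implementation — `recall_memories` and `run_task` are hypothetical stand-ins; only the `MEMORY_RECALL_TIMEOUT` default of 3.0 seconds comes from the docs.

```python
import asyncio

MEMORY_RECALL_TIMEOUT = 3.0  # seconds; documented default

async def recall_memories(query: str) -> list[str]:
    # Hypothetical stand-in for the memory manager's recall call.
    await asyncio.sleep(0.01)
    return ["operator prefers staging deploys on Fridays"]

async def run_task(query: str) -> list[str]:
    # Recall is bounded and non-fatal: on timeout the task proceeds
    # with whatever context landed in time (here, none).
    try:
        return await asyncio.wait_for(recall_memories(query), MEMORY_RECALL_TIMEOUT)
    except asyncio.TimeoutError:
        return []

context = asyncio.run(run_task("when do we deploy?"))
```

The key property is that a slow memory store degrades context quality but never fails the user-facing task.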

Memory types

Every memory is typed. The type drives the default TTL, the recall ranking bonus, and how the curation UI groups it. All types share the same schema; only the semantics change.

Type          Default TTL   Used for
FACT          730 days      Static information about users, systems, or the world
EVENT         365 days      Time-bound occurrences (incidents, deploys, meetings)
PREFERENCE    730 days      How this operator or user likes things done
DECISION      730 days      Choices that have been made and their rationale
PROBLEM       365 days      Issues, bugs, and debugging context
COMMIT        365 days      Summary and intent of a code change
RELATIONSHIP  730 days      Connections between entities in the domain
TASK          90 days       Open work items with short-horizon relevance
PROCEDURE     365 days      How-to steps, processes, repeatable patterns
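The type-to-TTL mapping above is just data; a sketch of how expiry might be derived from it (the `expiry` helper is illustrative — only the TTL values come from the table):

```python
from datetime import datetime, timedelta, timezone

# Default TTL per memory type, in days (values from the table above).
DEFAULT_TTL_DAYS = {
    "FACT": 730, "EVENT": 365, "PREFERENCE": 730, "DECISION": 730,
    "PROBLEM": 365, "COMMIT": 365, "RELATIONSHIP": 730, "TASK": 90,
    "PROCEDURE": 365,
}

def expiry(memory_type: str, created_at: datetime) -> datetime:
    """When a memory of this type goes STALE, absent an explicit TTL."""
    return created_at + timedelta(days=DEFAULT_TTL_DAYS[memory_type])

when = expiry("TASK", datetime(2025, 1, 1, tzinfo=timezone.utc))
```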

Memory status

Every memory carries a status that controls whether it's eligible for recall. Status transitions are the authoritative record of how context evolves over time.

  • ACTIVE — current; in use for recall.
  • STALE — past its TTL but not yet replaced.
  • SUPERSEDED — explicitly replaced by a newer memory with the same conflict_key.
  • INVALIDATED — known to be factually incorrect.
  • REJECTED — failed the extraction-time quality gate.

Memory layers

When the recall step assembles context for the provider, memories are organised into four layers. The layer determines composition priority and display in the trace view.

  • EPISODIC — the most recent conversation turns. Most time-sensitive.
  • PROCEDURAL — learned procedures and reusable how-tos.
  • CONVERSATIONAL — prior exchanges with the same user.
  • PROACTIVE — context the agent chose to pre-stage. Least time-sensitive.
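A sketch of layer-ordered composition, assuming priority follows the listed order from most to least time-sensitive (the actual weights live in the runtime):

```python
# Assumed priority order: lower number composes first.
LAYER_PRIORITY = {"EPISODIC": 0, "PROCEDURAL": 1, "CONVERSATIONAL": 2, "PROACTIVE": 3}

def compose(memories: list[tuple[str, str]]) -> list[str]:
    """Order (layer, text) pairs by layer priority for the context window."""
    return [text for _, text in sorted(memories, key=lambda m: LAYER_PRIORITY[m[0]])]

ordered = compose([("PROACTIVE", "release calendar"), ("EPISODIC", "last turn")])
```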

Configuration knobs

The defaults are sane for most installs. These are the environment variables you're most likely to reach for.

  • MEMORY_MAX_RECALL (25) — how many memories can land in a recall result.
  • MEMORY_RECALL_THRESHOLD (0.25) — minimum similarity for a memory to be recalled.
  • MEMORY_RECENCY_HALF_LIFE_DAYS (120) — decay rate for time-based ranking.
  • MEMORY_MAX_CONTEXT_TOKENS (3500) — token budget for context assembly.
  • MEMORY_MAX_PER_USER (2000) — retention cap before maintenance prunes least-important records.
  • MEMORY_SIMILARITY_DEDUP_THRESHOLD (0.92) — cosine similarity threshold for deduplication.

Embedding model
Memory vectors default to paraphrase-multilingual-MiniLM-L12-v2 via sentence-transformers. Swap it via MEMORY_EMBEDDING_MODEL if you need a different language tier or size.
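The knobs above are plain environment variables. A minimal sketch of reading them with the documented defaults (the variable names and default values come from this page; the helper functions are illustrative):

```python
import os

def env_float(name: str, default: float) -> float:
    return float(os.environ.get(name, default))

def env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, default))

# Defaults as documented above.
MAX_RECALL = env_int("MEMORY_MAX_RECALL", 25)
RECALL_THRESHOLD = env_float("MEMORY_RECALL_THRESHOLD", 0.25)
RECENCY_HALF_LIFE_DAYS = env_float("MEMORY_RECENCY_HALF_LIFE_DAYS", 120)
MAX_CONTEXT_TOKENS = env_int("MEMORY_MAX_CONTEXT_TOKENS", 3500)
MAX_PER_USER = env_int("MEMORY_MAX_PER_USER", 2000)
DEDUP_THRESHOLD = env_float("MEMORY_SIMILARITY_DEDUP_THRESHOLD", 0.92)
EMBEDDING_MODEL = os.environ.get(
    "MEMORY_EMBEDDING_MODEL", "paraphrase-multilingual-MiniLM-L12-v2")
```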

Knowledge retrieval

Knowledge is the operator-approved counterpart to memory. Where memory learns from interactions, knowledge is curated: documents, runbooks, policies, and evidence that operators have explicitly marked as authoritative.

The retrieval engine combines three signals into one ranked result:

  • Lexical — keyword match on the document chunks.
  • Dense — cosine similarity over the chunk embedding.
  • Graph — entity and relation proximity via the canonical-entity graph.

Scores are fused with Reciprocal Rank Fusion (RRF, K=60). Graph edges carry typed weights — strong ones like governs (1.0) and supersedes (0.95) dominate the ranking, weak ones like mentions (0.35) barely nudge it. A contradicts edge has weight 0, so a contradicting entity will never boost a hit.
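Reciprocal Rank Fusion itself is simple: each signal contributes `1 / (K + rank)` per document, and the sums are re-ranked. A sketch with the documented K=60 and edge weights (how graph weights feed into the graph ranking beyond this is not specified here and is omitted):

```python
RRF_K = 60

# Typed edge weights from the docs; contradicts (0.0) never boosts a hit.
EDGE_WEIGHTS = {"governs": 1.0, "supersedes": 0.95, "mentions": 0.35, "contradicts": 0.0}

def rrf_fuse(rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Fuse per-signal rankings: score(doc) = sum over signals of 1 / (K + rank)."""
    scores: dict[str, float] = {}
    for ranked in rankings.values():
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (RRF_K + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

fused = rrf_fuse({
    "lexical": ["chunk-a", "chunk-b"],
    "dense":   ["chunk-b", "chunk-a"],
    "graph":   ["chunk-b"],
})
# chunk-b appears in all three rankings, so it fuses ahead of chunk-a
```

Because RRF works on ranks rather than raw scores, the three signals never need to be calibrated against each other.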

What retrieval returns

Every query returns a structured response you can show in the UI:

  • Selected hits — the final ranked chunks the agent should use.
  • Trace hits — every candidate, with its lexical, dense, and graph ranks for debugging.
  • Authoritative vs supporting evidence — highest confidence sources vs corroborating ones.
  • Linked entities and graph relations — the part of the graph the query touched.
  • Grounding score — overall confidence the answer can be grounded.
  • Answer plan — a recommended action mode (direct, scoped, defer) for the agent.
  • Judge result — quality metrics: citation coverage, contradiction rate, policy compliance.
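As a sketch, the response shape might look like the dataclass below. The field names are plausible guesses mirroring the list above, not the runtime's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalResponse:
    # Illustrative field names only; see the runtime API for the real schema.
    selected_hits: list[str]       # final ranked chunks the agent should use
    trace_hits: list[dict]         # every candidate, with per-signal ranks
    authoritative: list[str]       # highest-confidence sources
    supporting: list[str]          # corroborating evidence
    linked_entities: list[str]     # the part of the graph the query touched
    grounding_score: float         # overall grounding confidence
    answer_plan: str               # "direct" | "scoped" | "defer"
    judge: dict = field(default_factory=dict)  # coverage, contradiction, policy
```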

Knowledge storage

Knowledge lives in Postgres with pgvector:

  • knowledge_documents — source documents.
  • knowledge_chunks — chunked text, each with an embedding row.
  • knowledge_entities and knowledge_relations — the canonical entity graph.
  • retrieval_traces, retrieval_trace_hits, retrieval_bundles — per-query audit and ranking breakdown.
  • answer_traces and answer_judgements — what the agent did with the retrieval result.

Memory vs knowledge

Memory is learned — it comes out of conversations automatically and is scoped to a user or agent. Knowledge is curated — documents and policies operators explicitly trust, shared across agents. Both are consulted on every task; neither is authoritative on its own.

Go deeper

  • Authoring a Skill — Skills combine a prompt contract with the recall and retrieval context above.
  • Runtime API — the endpoints that surface memory hits and retrieval traces in the UI.
  • Security — how the security service sanitises every memory write and retrieval result before it leaves Postgres.