
Runtime

How agents are supervised and executed.

The runtime is the execution layer. It receives requests, resolves which agent should handle them, assembles context from memory and knowledge, dispatches to a provider, runs the tool loop, and writes every result, artifact, and audit record to durable storage. Everything you ever see in the dashboard's trace view originated here.

Runtime execution lifecycle
  1. Intake (Handler). Request reaches the platform through a supported interface (Telegram, dashboard, API).
  2. Normalize (Handler). Handlers normalize input and enforce user access, then route into orchestration.
  3. Resolve (Queue). Queue manager resolves the active agent, provider, prompt contract, and runtime context.
  4. Assemble (Context). Memory recall, knowledge retrieval, and artifact context are gathered in parallel.
  5. Execute (Provider). The selected provider runs the task, streaming responses and optionally entering a tool loop.
  6. Tool loop (Runtime). <agent_cmd> tags are parsed, validated, executed, and surfaced back as <tool_result> tags.
  7. Persist (State). Results, artifacts, memory writes, and audit records land in Postgres and object storage.

Execution lifecycle

Every task follows the same seven-step pipeline. Each step has a clear owner, a timeout, and a failure mode that degrades gracefully rather than failing silently.

  1. Intake. A supported interface — Telegram, the dashboard chat, the runtime HTTP API — receives a user query. The handler normalises it into the canonical runtime request shape.
  2. Access enforcement. The handler applies policy-defined user access controls before the request enters the queue.
  3. Resolution. The queue manager identifies the active agent, pulls the compiled prompt contract, resolves provider + model, and builds the execution context.
  4. Context assembly. Memory recall, knowledge retrieval, and artifact analysis are invoked in parallel. All three are best-effort and time-bounded — the task never blocks on them.
  5. Provider execution. The selected provider runs the task. Results stream back through the runner (Claude, Codex, Ollama, etc.). Streaming is preserved where the provider supports it.
  6. Tool loop. If the response contains <agent_cmd> tags, they are parsed, validated, executed, and the structured <tool_result> is fed back into the provider. The loop continues until no more commands appear, a cycle is detected, or the iteration cap is reached.
  7. Persistence. Results, metadata, artifacts, memory writes, and audit records land in Postgres and the S3-compatible store. The final output is returned through the original calling interface.
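The seven steps above can be sketched as one orchestration function. Every name below is an illustrative stand-in, not the real koda API; the stubs exist only to show the ordering and the data handed between steps.

```python
# Illustrative stubs -- the real implementations live in the layers
# described in "Runtime layers in code" below.
def normalize(raw):
    return {"query": raw.strip()}          # canonical runtime request shape

def enforce_access(req):
    pass                                   # would raise on a policy violation

def resolve(req):
    return ("default-agent", run_provider, "prompt-contract")

def assemble_context(req):
    return {"memory": [], "knowledge": [], "artifacts": []}

def run_provider(contract, context, req):
    return "answer to: " + req["query"]

def run_tool_loop(response):
    return response                        # no <agent_cmd> tags in this stub

def persist(req, response):
    pass                                   # Postgres + object-store writes

def handle_task(raw_request):
    request = normalize(raw_request)                 # 1. Intake
    enforce_access(request)                          # 2. Access enforcement
    agent, provider, contract = resolve(request)     # 3. Resolution
    context = assemble_context(request)              # 4. Context assembly (best-effort)
    response = provider(contract, context, request)  # 5. Provider execution
    response = run_tool_loop(response)               # 6. Tool loop
    persist(request, response)                       # 7. Persistence
    return response
```

The point of the skeleton is the shape: steps 1-3 are synchronous and fail-closed, step 4 is best-effort, and steps 5-6 may iterate before step 7 commits the result.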
Best-effort context
Memory and knowledge are best-effort. A timeout or error never blocks the provider call — the agent proceeds with whatever context landed in time. Hard security boundaries, on the other hand, are always fail-closed.
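A minimal sketch of that best-effort contract, using `asyncio.wait_for` (the wrapper name and the 0.1-second demo budget are illustrative; the doc cites a 3-second budget for memory recall in production):

```python
import asyncio

async def best_effort(coro, timeout, default):
    """Return the coroutine's result, or `default` on timeout or error.

    A slow or failing memory/knowledge call therefore never blocks
    the provider call -- the task proceeds with whatever landed in time.
    """
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        return default

async def slow_recall():
    await asyncio.sleep(10)        # simulated overloaded memory service
    return ["memory hit"]

async def fast_search():
    return ["knowledge hit"]

async def main():
    # Both context sources are gathered in parallel, each time-bounded.
    return await asyncio.gather(
        best_effort(slow_recall(), timeout=0.1, default=[]),
        best_effort(fast_search(), timeout=0.1, default=[]),
    )

memory, knowledge = asyncio.run(main())
# memory is [] (timed out); knowledge is ["knowledge hit"]
```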

Runtime layers in code

The pipeline above maps onto five source-code layers. If you're debugging, this is the right order to read in.

  • Handlers and adapters (koda/handlers) — interface-specific input normalisation and access control.
  • Queue and orchestration (koda/services/queue_manager.py) — resolves context, coordinates context assembly, dispatches to providers, handles fallback.
  • Provider runners (koda/services/llm_runner.py, claude_runner.py, codex_runner.py) — provider-specific CLI and adapter invocation; session continuity.
  • Tool dispatcher (koda/services/tool_dispatcher.py, tool_prompt.py) — parses <agent_cmd>, enforces policy, runs bounded tools, returns <tool_result>.
  • Persistence (koda/state, koda/services/runtime) — state, metadata, audit rows, artifact references.

Internal gRPC services

Behind the HTTP surface, five specialised gRPC services own the hard parts. They run in the same compose network, are bound to 127.0.0.1 only, and are reached through their *_GRPC_TARGET environment variables.

  • runtime-kernel:50061 · environment & task lifecycle. Creates isolated work environments (git worktrees), launches background tasks, runs commands with timeouts, opens and streams interactive terminals, drives browser automation sessions, snapshots and restores workspaces.
  • memory:50063 · recall + extraction. Recalls context pre-query with a 3-second budget; extracts candidate memories post-query; runs clustering, deduplication, curation, and the memory map view.
  • retrieval:50062 · knowledge search. Hybrid lexical + dense + graph retrieval over operator-approved knowledge, returning ranked hits, canonical entities, graph relations, authoritative evidence, and a grounding score.
  • artifact:50064 · artifact ingest & evidence. Streams artifact uploads into the S3-compatible store, extracts evidence (text, OCR, transcription, media analysis), tracks metadata.
  • security:50065 · validation & redaction. Shell-command validation, environment sanitisation, runtime-path validation, S3 object-key validation, credential redaction for logs, filesystem policy.
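A client resolves each service through its `*_GRPC_TARGET` environment variable, falling back to the localhost ports listed above. The exact variable names below are assumptions following that pattern, not confirmed koda identifiers:

```python
import os

# Defaults mirror the compose-network bindings listed above; each can be
# overridden by its *_GRPC_TARGET environment variable. Variable names
# are assumed from the pattern, not taken from the koda source.
GRPC_DEFAULTS = {
    "RUNTIME_KERNEL_GRPC_TARGET": "127.0.0.1:50061",
    "RETRIEVAL_GRPC_TARGET":      "127.0.0.1:50062",
    "MEMORY_GRPC_TARGET":         "127.0.0.1:50063",
    "ARTIFACT_GRPC_TARGET":       "127.0.0.1:50064",
    "SECURITY_GRPC_TARGET":       "127.0.0.1:50065",
}

def grpc_target(name):
    """Environment override first, compose-network default second."""
    return os.environ.get(name, GRPC_DEFAULTS[name])
```

Because the defaults are loopback-only, nothing is reachable from outside the host unless an operator explicitly overrides a target.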

The agent tool loop

When an agent wants to take an action, it emits an <agent_cmd> tag in its textual response. The dispatcher walks the response, turns every tag into a tool call, and feeds structured results back.

<agent_cmd tool="runtime.run" timeout="30">
pnpm test --filter=web
</agent_cmd>
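Extracting such tags from a response can be sketched with a regular expression. This is a hypothetical parser: the attribute set (`tool`, `timeout`) follows the example above, and the real dispatcher's grammar may be richer.

```python
import re

# Matches <agent_cmd tool="..." timeout="..."> ... </agent_cmd>.
# The timeout attribute is optional; 30 is an assumed default.
AGENT_CMD = re.compile(
    r'<agent_cmd\s+tool="(?P<tool>[^"]+)"'
    r'(?:\s+timeout="(?P<timeout>\d+)")?\s*>'
    r'(?P<body>.*?)</agent_cmd>',
    re.DOTALL,
)

def parse_agent_cmds(response):
    """Return every command embedded in a provider response, in order."""
    return [
        {"tool": m["tool"],
         "timeout": int(m["timeout"] or 30),
         "body": m["body"].strip()}
        for m in AGENT_CMD.finditer(response)
    ]

cmds = parse_agent_cmds(
    'Running the tests now.\n'
    '<agent_cmd tool="runtime.run" timeout="30">\n'
    'pnpm test --filter=web\n'
    '</agent_cmd>'
)
# cmds[0] -> {"tool": "runtime.run", "timeout": 30, "body": "pnpm test --filter=web"}
```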

Behind that tag the dispatcher runs the following loop:

  1. Parse every <agent_cmd> from the response.
  2. Check policy: in supervised mode, write operations surface as a confirmation requirement instead of executing blindly.
  3. Run each tool handler with its timeout and feature gates.
  4. Wrap the output in a <tool_result> tag and resume the active provider session with the new context.
  5. If the provider cannot resume, the queue manager can bootstrap a new session with the transcript and tool-loop context attached, and can downgrade to a smaller model to keep the task moving.
  6. Loop until no more <agent_cmd> appear, a cycle is detected, or the iteration cap is reached.
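The loop's two exit guards, the iteration cap and cycle detection, can be sketched as follows. All callables and the cap value are illustrative stand-ins; cycle detection here fingerprints the command batch, which is one plausible strategy, not necessarily koda's.

```python
MAX_ITERATIONS = 8    # assumed cap, not the real runtime value

def tool_loop(provider, response, parse, run_tool):
    """Run tools until no commands remain, a cycle repeats, or the cap hits."""
    seen = set()
    for _ in range(MAX_ITERATIONS):
        cmds = parse(response)
        if not cmds:                      # no more <agent_cmd>: done
            return response
        fingerprint = tuple((c["tool"], c["body"]) for c in cmds)
        if fingerprint in seen:           # identical batch repeated: cycle
            return response
        seen.add(fingerprint)
        results = "".join(
            f'<tool_result tool="{c["tool"]}">{run_tool(c)}</tool_result>'
            for c in cmds
        )
        response = provider(results)      # resume the session with results
    return response                       # iteration cap reached
```

Step 2 (policy) and step 5 (session re-bootstrap with model downgrade) would slot in around `run_tool` and `provider` respectively; they are elided here to keep the control flow visible.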
Why structured tag passing
Treating tools as structured text (not a separate channel) lets any provider — one with or without native tool-call support — participate. Every tag is also recorded in the trace view so you can replay the exact sequence of prompts, commands, and results.

The runtime HTTP surface

The /api/runtime/* routes expose inspection and control. They are guarded by the same operator session as the control plane (or the RUNTIME_LOCAL_UI_TOKEN for the dashboard-to-runtime path).

  • GET /api/runtime/ready — health probe.
  • GET /api/runtime/agents, GET /api/runtime/agents/:id — catalogue and detail.
  • POST /api/runtime/tasks, GET /api/runtime/tasks/:id — submit and inspect.
  • GET /api/runtime/tasks/:id/trace — the full step trace (provider calls, tool calls, memory hits, retrieval results, audit).
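A client for these routes only needs standard HTTP. The sketch below builds the requests with the stdlib; the base address, port, bearer-style header, and request-body shape are all assumptions, while the paths are the ones listed above.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8080"   # assumed local runtime address and port

def build_request(method, path, token, body=None):
    """Assemble a runtime API request; auth header shape is an assumption."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        BASE + path,
        data=data,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method=method,
    )

# Submit a task, then pull its full step trace (task id is hypothetical):
submit = build_request("POST", "/api/runtime/tasks", "t0ken",
                       {"query": "run the tests"})
trace = build_request("GET", "/api/runtime/tasks/task-123/trace", "t0ken")
# urllib.request.urlopen(submit) would send it against a running runtime.
```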

Operational characteristics

  • Fail closed when required bootstrap infrastructure is unavailable — the runtime won't start tasks with a missing database, missing object store, or unvalidated secrets.
  • Memory and retrieval are best-effort — time-bounded with explicit timeouts, never silently bypassing a hard security boundary when they fail.
  • Health, doctor, and OpenAPI surfaces are first-class — operators can always tell whether the runtime is healthy without SSH'ing into a container.
  • Production-like topology in quickstart — the local install uses the same services, ports, and internal targets as a VPS deployment, so drift between environments stays minimal.
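The fail-closed rule from the first bullet amounts to a bootstrap gate like the following sketch (names and check inputs are illustrative):

```python
class BootstrapError(RuntimeError):
    """Raised when a hard dependency is missing; task intake never starts."""

def check_bootstrap(db_ok, object_store_ok, secrets_valid):
    # Fail closed: any missing hard dependency aborts startup outright,
    # rather than degrading the way best-effort context sources do.
    missing = [name for name, ok in [
        ("database", db_ok),
        ("object store", object_store_ok),
        ("validated secrets", secrets_valid),
    ] if not ok]
    if missing:
        raise BootstrapError("refusing to start: missing " + ", ".join(missing))
```

The contrast with memory and retrieval is deliberate: soft dependencies time out and yield, hard dependencies stop the runtime before it accepts work.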

Go deeper