Runtime
How agents are supervised and executed.
The runtime is the execution layer. It receives requests, resolves which agent should handle them, assembles context from memory and knowledge, dispatches to a provider, runs the tool loop, and writes every result, artifact, and audit record to durable storage. Everything you see in the dashboard's trace view originates here.
- 01 · Handler — Intake. Request reaches the platform through a supported interface (Telegram, dashboard, API).
- 02 · Handler — Normalise. Handlers normalise input and enforce user access, then route into orchestration.
- 03 · Queue — Resolve. Queue manager resolves the active agent, provider, prompt contract, and runtime context.
- 04 · Context — Assemble. Memory recall, knowledge retrieval, and artifact context are gathered in parallel.
- 05 · Provider — Execute. The selected provider runs the task, streaming responses and optionally entering a tool loop.
- 06 · Runtime — Tool loop. `<agent_cmd>` tags are parsed, validated, executed, and surfaced back as `<tool_result>` tags.
- 07 · State — Persist. Results, artifacts, memory writes, and audit records land in Postgres and object storage.
Execution lifecycle
Every task follows the same seven-step pipeline. Each step has a clear owner, a timeout, and a failure mode that degrades gracefully rather than failing silently.
- Intake. A supported interface — Telegram, the dashboard chat, the runtime HTTP API — receives a user query. The handler normalises it into the canonical runtime request shape.
- Access enforcement. The handler applies policy-defined user access controls before the request enters the queue.
- Resolution. The queue manager identifies the active agent, pulls the compiled prompt contract, resolves provider + model, and builds the execution context.
- Context assembly. Memory recall, knowledge retrieval, and artifact analysis are invoked in parallel. All three are best-effort and time-bounded — the task never blocks on them.
- Provider execution. The selected provider runs the task. Results stream back through the runner (Claude, Codex, Ollama, etc.). Streaming is preserved where the provider supports it.
- Tool loop. If the response contains `<agent_cmd>` tags, they are parsed, validated, executed, and the structured `<tool_result>` is fed back into the provider. The loop continues until no more commands appear, a cycle is detected, or the iteration cap is reached.
- Persistence. Results, metadata, artifacts, memory writes, and audit records land in Postgres and the S3-compatible store. The final output is returned through the original calling interface.
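The "best-effort and time-bounded" property of context assembly can be illustrated with parallel tasks that each carry their own timeout. This is a sketch, not the queue manager's code: the callable names are hypothetical, and only the 3-second recall budget comes from the document (the other budgets are made up for illustration).

```python
import asyncio

async def _best_effort(coro, timeout_s: float, default):
    """Run one context source; on timeout or error, fall back to a
    default instead of blocking the task."""
    try:
        return await asyncio.wait_for(coro, timeout_s)
    except Exception:
        return default

async def assemble_context(recall, retrieve, analyze, query: str,
                           budgets=(3.0, 5.0, 5.0)) -> dict:
    """Invoke memory recall, knowledge retrieval, and artifact analysis
    in parallel, each time-bounded. A slow or failing source degrades to
    an empty result; the task itself never blocks on any of them."""
    memory, knowledge, artifacts = await asyncio.gather(
        _best_effort(recall(query), budgets[0], []),
        _best_effort(retrieve(query), budgets[1], []),
        _best_effort(analyze(query), budgets[2], []),
    )
    return {"memory": memory, "knowledge": knowledge, "artifacts": artifacts}
```

A hung retrieval backend then costs at most its budget, and the task proceeds with whatever context did arrive in time.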
Runtime layers in code
The pipeline above maps onto five source-code layers. If you're debugging, this is the right order to read in.
- Handlers and adapters (`koda/handlers`) — interface-specific input normalisation and access control.
- Queue and orchestration (`koda/services/queue_manager.py`) — resolves context, coordinates context assembly, dispatches to providers, handles fallback.
- Provider runners (`koda/services/llm_runner.py`, `claude_runner.py`, `codex_runner.py`) — provider-specific CLI and adapter invocation; session continuity.
- Tool dispatcher (`koda/services/tool_dispatcher.py`, `tool_prompt.py`) — parses `<agent_cmd>`, enforces policy, runs bounded tools, returns `<tool_result>`.
- Persistence (`koda/state`, `koda/services/runtime`) — state, metadata, audit rows, artifact references.
Internal gRPC services
Behind the HTTP surface, five specialised gRPC services own the hard parts. They run in the same compose network, are bound to `127.0.0.1` only, and are reached through their `*_GRPC_TARGET` environment variables.
- `runtime-kernel:50061` — environment & task lifecycle. Creates isolated work environments (git worktrees), launches background tasks, runs commands with timeouts, opens and streams interactive terminals, drives browser automation sessions, snapshots and restores workspaces.
- `memory:50063` — recall + extraction. Recalls context pre-query with a 3-second budget; extracts candidate memories post-query; runs clustering, deduplication, curation, and the memory map view.
- `retrieval:50062` — knowledge search. Hybrid lexical + dense + graph retrieval over operator-approved knowledge, returning ranked hits, canonical entities, graph relations, authoritative evidence, and a grounding score.
- `artifact:50064` — artifact ingest & evidence. Streams artifact uploads into the S3-compatible store, extracts evidence (text, OCR, transcription, media analysis), tracks metadata.
- `security:50065` — validation & redaction. Shell-command validation, environment sanitisation, runtime-path validation, S3 object-key validation, credential redaction for logs, filesystem policy.
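Resolving a service's dial target from its `*_GRPC_TARGET` variable, with the loopback port as the fallback, might look like the sketch below. The `SERVICE_GRPC_TARGET` naming pattern and the helper itself are assumptions for illustration; only the service names and ports come from the list above.

```python
import os

# Default loopback targets for the five internal services; each can be
# overridden by its corresponding *_GRPC_TARGET environment variable.
_DEFAULT_TARGETS = {
    "runtime_kernel": "127.0.0.1:50061",
    "retrieval": "127.0.0.1:50062",
    "memory": "127.0.0.1:50063",
    "artifact": "127.0.0.1:50064",
    "security": "127.0.0.1:50065",
}

def grpc_target(service: str) -> str:
    """Resolve a service name to host:port, preferring the environment.
    E.g. memory -> MEMORY_GRPC_TARGET, else the loopback default."""
    env_var = f"{service.upper()}_GRPC_TARGET"
    return os.environ.get(env_var, _DEFAULT_TARGETS[service])
```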
The agent tool loop
When an agent wants to take an action, it emits an <agent_cmd> tag in its textual response. The dispatcher walks the response, turns every tag into a tool call, and feeds structured results back.
```xml
<agent_cmd tool="runtime.run" timeout="30"> pnpm test --filter=web</agent_cmd>
```

Behind that tag the dispatcher runs the following loop:
- Parse every `<agent_cmd>` from the response.
- Check policy: in supervised mode, write operations surface as a confirmation requirement instead of executing blindly.
- Run each tool handler with its timeout and feature gates.
- Wrap the output in a `<tool_result>` tag and resume the active provider session with the new context.
- If the provider cannot resume, the queue manager can bootstrap a new session with the transcript and tool-loop context attached, and can downgrade to a smaller model to keep the task moving.
- Loop until no more `<agent_cmd>` tags appear, a cycle is detected, or the iteration cap is reached.
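The parse-and-loop contract can be sketched end to end. This is not the code in `koda/services/tool_dispatcher.py`; every name below is illustrative, `run_tool` stands in for the policy-checked tool handlers, and `resume_session` stands in for the provider session.

```python
import re

# Hypothetical regex parser for <agent_cmd> tags in a model response.
_CMD = re.compile(r"<agent_cmd\s+(?P<attrs>[^>]*)>(?P<body>.*?)</agent_cmd>",
                  re.DOTALL)
_ATTR = re.compile(r'([\w.]+)="([^"]*)"')

def parse_agent_cmds(response: str) -> list[dict]:
    """Extract every command: tool name, remaining attributes, and body."""
    cmds = []
    for m in _CMD.finditer(response):
        attrs = dict(_ATTR.findall(m.group("attrs")))
        cmds.append({"tool": attrs.pop("tool", None),
                     "attrs": attrs,
                     "body": m.group("body").strip()})
    return cmds

def tool_loop(resume_session, run_tool, first_response: str,
              max_iterations: int = 10) -> str:
    """Execute commands, feed <tool_result> back, and stop on completion,
    a detected cycle (here: an exactly repeated response), or the cap."""
    response, seen = first_response, set()
    for _ in range(max_iterations):
        cmds = parse_agent_cmds(response)
        if not cmds:
            return response                # terminal answer: no commands left
        if response in seen:
            return response                # cycle detected: bail out
        seen.add(response)
        results = "".join(
            f"<tool_result tool=\"{c['tool']}\">{run_tool(c)}</tool_result>"
            for c in cmds)
        response = resume_session(results)  # resume provider with results
    return response                         # iteration cap reached
```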
The runtime HTTP surface
The `/api/runtime/*` routes expose inspection and control. They are guarded by the same operator session as the control plane (or `RUNTIME_LOCAL_UI_TOKEN` for the dashboard-to-runtime path).
- `GET /api/runtime/ready` — health probe.
- `GET /api/runtime/agents`, `GET /api/runtime/agents/:id` — catalogue and detail.
- `POST /api/runtime/tasks`, `GET /api/runtime/tasks/:id` — submit and inspect.
- `GET /api/runtime/tasks/:id/trace` — the full step trace (provider calls, tool calls, memory hits, retrieval results, audit).
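A task submission against this surface might be built as below. The route and the bearer-token guard come from the text; the JSON field names (`agent`, `query`) are assumptions, so check the Runtime API reference for the authoritative request shape.

```python
import json
import urllib.request

def submit_task_request(base_url: str, token: str,
                        agent_id: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/runtime/tasks request,
    authorised with an operator session or RUNTIME_LOCAL_UI_TOKEN."""
    body = json.dumps({"agent": agent_id, "query": query}).encode()
    return urllib.request.Request(
        f"{base_url}/api/runtime/tasks",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` returns the created task, which can then be polled at `GET /api/runtime/tasks/:id`.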
Operational characteristics
- Fail closed when required bootstrap infrastructure is unavailable — the runtime won't start tasks with a missing database, missing object store, or unvalidated secrets.
- Memory and retrieval are best-effort — time-bounded with explicit timeouts, never silently bypassing a hard security boundary when they fail.
- Health, doctor, and OpenAPI surfaces are first-class — operators can always tell whether the runtime is healthy without SSH'ing into a container.
- Production-like topology in quickstart — the local install uses the same services, ports, and internal targets as a VPS deployment, so drift between environments stays minimal.
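The fail-closed rule in the first bullet can be reduced to a small startup guard; a minimal sketch, with a hypothetical function name and caller-supplied health flags standing in for the runtime's real bootstrap checks:

```python
def check_bootstrap(db_ok: bool, object_store_ok: bool,
                    secrets_validated: bool) -> None:
    """Fail closed: refuse to start tasks when required bootstrap
    infrastructure is missing, naming everything that is unavailable."""
    missing = [name for name, ok in [
        ("database", db_ok),
        ("object store", object_store_ok),
        ("validated secrets", secrets_validated),
    ] if not ok]
    if missing:
        raise RuntimeError(f"refusing to start tasks: missing {', '.join(missing)}")
```

Raising before any task is accepted is what distinguishes failing closed from degrading: memory and retrieval may be skipped, but a missing database never is.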
Go deeper
- Memory & knowledge — how recall and retrieval feed the runtime's context assembly step.
- Runtime API reference — every route, request, response, and status code.
- Troubleshooting — common runtime failure modes and where to look first.