Runtime
How agents are supervised and executed.
The runtime is the execution layer. It receives requests, resolves which agent should handle them, assembles context from memory and knowledge, dispatches to a provider, runs the tool loop, and writes every result, artifact, and audit record to durable storage. Everything you see in the dashboard's trace view originates here.
- 01 · Handler — Intake. Request reaches the platform through a supported interface (Telegram, dashboard, API).
- 02 · Handler — Normalise. Handlers normalise input and enforce user access, then route into orchestration.
- 03 · Queue — Resolve. Queue manager resolves the active agent, provider, prompt contract, and runtime context.
- 04 · Context — Assemble. Memory recall, knowledge retrieval, and artifact context are gathered in parallel.
- 05 · Provider — Execute. The selected provider runs the task, streaming responses and optionally entering a tool loop.
- 06 · Runtime — Tool loop. `<agent_cmd>` tags are parsed, validated, executed, and surfaced back as `<tool_result>` tags.
- 07 · State — Persist. Results, artifacts, memory writes, and audit records land in Postgres and object storage.
Execution lifecycle
Every task follows the same seven-step pipeline. Each step has a clear owner, a timeout, and a failure mode that degrades gracefully rather than failing silently.
- Intake. A supported interface — Telegram, the dashboard chat, the runtime HTTP API — receives a user query. The handler normalises it into the canonical runtime request shape.
- Access enforcement. The handler applies policy-defined user access controls before the request enters the queue.
- Resolution. The queue manager identifies the active agent, pulls the compiled prompt contract, resolves provider + model, and builds the execution context.
- Context assembly. Memory recall, knowledge retrieval, and artifact analysis are invoked in parallel. All three are best-effort and time-bounded — the task never blocks on them.
- Provider execution. The selected provider runs the task. Results stream back through the runner (Claude, Codex, Ollama, etc.). Streaming is preserved where the provider supports it.
- Tool loop. If the response contains `<agent_cmd>` tags, they are parsed, validated, executed, and the structured `<tool_result>` is fed back into the provider. The loop continues until no more commands appear, a cycle is detected, or the iteration cap is reached.
- Persistence. Results, metadata, artifacts, memory writes, and audit records land in Postgres and the S3-compatible store. The final output is returned through the original calling interface.
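The "best-effort and time-bounded" property of context assembly can be illustrated with parallel tasks that each carry their own timeout. This is a sketch, not the queue manager's code: the callable names are hypothetical, and only the 3-second recall budget comes from the document (the other budgets are made up for illustration).

```python
import asyncio

async def _best_effort(coro, timeout_s: float, default):
    """Run one context source; on timeout or error, fall back to a
    default instead of blocking the task."""
    try:
        return await asyncio.wait_for(coro, timeout_s)
    except Exception:
        return default

async def assemble_context(recall, retrieve, analyze, query: str,
                           budgets=(3.0, 5.0, 5.0)) -> dict:
    """Invoke memory recall, knowledge retrieval, and artifact analysis
    in parallel, each time-bounded. A slow or failing source degrades to
    an empty result; the task itself never blocks on any of them."""
    memory, knowledge, artifacts = await asyncio.gather(
        _best_effort(recall(query), budgets[0], []),
        _best_effort(retrieve(query), budgets[1], []),
        _best_effort(analyze(query), budgets[2], []),
    )
    return {"memory": memory, "knowledge": knowledge, "artifacts": artifacts}
```

A hung retrieval backend then costs at most its budget, and the task proceeds with whatever context did arrive in time.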
Runtime layers in code
The pipeline above maps onto five source-code layers. If you're debugging, this is the right order to read in.
- Handlers and adapters (`koda/handlers`) — interface-specific input normalisation and access control.
- Queue and orchestration (`koda/services/queue_manager.py`) — resolves context, coordinates context assembly, dispatches to providers, handles fallback.
- Provider runners (`koda/services/llm_runner.py`, `claude_runner.py`, `codex_runner.py`) — provider-specific CLI and adapter invocation; session continuity.
- Tool dispatcher (`koda/services/tool_dispatcher.py`, `tool_prompt.py`) — parses `<agent_cmd>`, enforces policy, runs bounded tools, returns `<tool_result>`.
- Persistence (`koda/state`, `koda/services/runtime`) — state, metadata, audit rows, artifact references.
Internal gRPC services
Behind the HTTP surface, five specialised gRPC services own the hard parts. They run in the same compose network, are bound to `127.0.0.1` only, and are reached through their `*_GRPC_TARGET` environment variables.
- `runtime-kernel:50061` — environment & task lifecycle. Creates isolated work environments (git worktrees), launches background tasks, runs commands with timeouts, opens and streams interactive terminals, drives browser automation sessions, snapshots and restores workspaces.
- `memory:50063` — recall + extraction. Recalls context pre-query with a 3-second budget; extracts candidate memories post-query; runs clustering, deduplication, curation, and the memory map view.
- `retrieval:50062` — knowledge search. Hybrid lexical + dense + graph retrieval over operator-approved knowledge, returning ranked hits, canonical entities, graph relations, authoritative evidence, and a grounding score.
- `artifact:50064` — artifact ingest & evidence. Streams artifact uploads into the S3-compatible store, extracts evidence (text, OCR, transcription, media analysis), tracks metadata.
- `security:50065` — validation & redaction. Shell-command validation, environment sanitisation, runtime-path validation, S3 object-key validation, credential redaction for logs, filesystem policy.
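Resolving a service's dial target from its `*_GRPC_TARGET` variable, with the loopback port as the fallback, might look like the sketch below. The `SERVICE_GRPC_TARGET` naming pattern and the helper itself are assumptions for illustration; only the service names and ports come from the list above.

```python
import os

# Default loopback targets for the five internal services; each can be
# overridden by its corresponding *_GRPC_TARGET environment variable.
_DEFAULT_TARGETS = {
    "runtime_kernel": "127.0.0.1:50061",
    "retrieval": "127.0.0.1:50062",
    "memory": "127.0.0.1:50063",
    "artifact": "127.0.0.1:50064",
    "security": "127.0.0.1:50065",
}

def grpc_target(service: str) -> str:
    """Resolve a service name to host:port, preferring the environment.
    E.g. memory -> MEMORY_GRPC_TARGET, else the loopback default."""
    env_var = f"{service.upper()}_GRPC_TARGET"
    return os.environ.get(env_var, _DEFAULT_TARGETS[service])
```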
The agent tool loop
When an agent wants to take an action, it emits an <agent_cmd> tag in its textual response. The dispatcher walks the response, turns every tag into a tool call, and feeds structured results back.
```xml
<agent_cmd tool="runtime.run" timeout="30"> pnpm test --filter=web</agent_cmd>
```

Behind that tag the dispatcher runs the following loop:
- Parse every `<agent_cmd>` from the response.
- Check policy: in supervised mode, write operations surface as a confirmation requirement instead of executing blindly.
- Run each tool handler with its timeout and feature gates.
- Wrap the output in a `<tool_result>` tag and resume the active provider session with the new context.
- If the provider cannot resume, the queue manager can bootstrap a new session with the transcript and tool-loop context attached, and can downgrade to a smaller model to keep the task moving.
- Loop until no more `<agent_cmd>` tags appear, a cycle is detected, or the iteration cap is reached.
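The parse-and-loop contract can be sketched end to end. This is not the code in `koda/services/tool_dispatcher.py`; every name below is illustrative, `run_tool` stands in for the policy-checked tool handlers, and `resume_session` stands in for the provider session.

```python
import re

# Hypothetical regex parser for <agent_cmd> tags in a model response.
_CMD = re.compile(r"<agent_cmd\s+(?P<attrs>[^>]*)>(?P<body>.*?)</agent_cmd>",
                  re.DOTALL)
_ATTR = re.compile(r'([\w.]+)="([^"]*)"')

def parse_agent_cmds(response: str) -> list[dict]:
    """Extract every command: tool name, remaining attributes, and body."""
    cmds = []
    for m in _CMD.finditer(response):
        attrs = dict(_ATTR.findall(m.group("attrs")))
        cmds.append({"tool": attrs.pop("tool", None),
                     "attrs": attrs,
                     "body": m.group("body").strip()})
    return cmds

def tool_loop(resume_session, run_tool, first_response: str,
              max_iterations: int = 10) -> str:
    """Execute commands, feed <tool_result> back, and stop on completion,
    a detected cycle (here: an exactly repeated response), or the cap."""
    response, seen = first_response, set()
    for _ in range(max_iterations):
        cmds = parse_agent_cmds(response)
        if not cmds:
            return response                # terminal answer: no commands left
        if response in seen:
            return response                # cycle detected: bail out
        seen.add(response)
        results = "".join(
            f"<tool_result tool=\"{c['tool']}\">{run_tool(c)}</tool_result>"
            for c in cmds)
        response = resume_session(results)  # resume provider with results
    return response                         # iteration cap reached
```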
The runtime HTTP surface
The `/api/runtime/*` routes expose inspection and control. They are guarded by the same operator session as the control plane (or `RUNTIME_LOCAL_UI_TOKEN` for the dashboard-to-runtime path).
- `GET /api/runtime/ready` — health probe.
- `GET /api/runtime/agents`, `GET /api/runtime/agents/:id` — catalogue and detail.
- `POST /api/runtime/tasks`, `GET /api/runtime/tasks/:id` — submit and inspect.
- `GET /api/runtime/tasks/:id/trace` — the full step trace (provider calls, tool calls, memory hits, retrieval results, audit).
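A task submission against this surface might be built as below. The route and the bearer-token guard come from the text; the JSON field names (`agent`, `query`) are assumptions, so check the Runtime API reference for the authoritative request shape.

```python
import json
import urllib.request

def submit_task_request(base_url: str, token: str,
                        agent_id: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/runtime/tasks request,
    authorised with an operator session or RUNTIME_LOCAL_UI_TOKEN."""
    body = json.dumps({"agent": agent_id, "query": query}).encode()
    return urllib.request.Request(
        f"{base_url}/api/runtime/tasks",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` returns the created task, which can then be polled at `GET /api/runtime/tasks/:id`.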
Operational characteristics
- Fail closed when required bootstrap infrastructure is unavailable — the runtime won't start tasks with a missing database, missing object store, or unvalidated secrets.
- Memory and retrieval are best-effort — time-bounded with explicit timeouts, never silently bypassing a hard security boundary when they fail.
- Health, doctor, and OpenAPI surfaces are first-class — operators can always tell whether the runtime is healthy without SSH'ing into a container.
- Production-like topology in quickstart — the local install uses the same services, ports, and internal targets as a VPS deployment, so drift between environments stays minimal.
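The fail-closed rule in the first bullet can be reduced to a small startup guard; a minimal sketch, with a hypothetical function name and caller-supplied health flags standing in for the runtime's real bootstrap checks:

```python
def check_bootstrap(db_ok: bool, object_store_ok: bool,
                    secrets_validated: bool) -> None:
    """Fail closed: refuse to start tasks when required bootstrap
    infrastructure is missing, naming everything that is unavailable."""
    missing = [name for name, ok in [
        ("database", db_ok),
        ("object store", object_store_ok),
        ("validated secrets", secrets_validated),
    ] if not ok]
    if missing:
        raise RuntimeError(f"refusing to start tasks: missing {', '.join(missing)}")
```

Raising before any task is accepted is what distinguishes failing closed from degrading: memory and retrieval may be skipped, but a missing database never is.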
Go deeper
- Memory & knowledge — how recall and retrieval feed the runtime's context assembly step.
- Runtime API reference — every route, request, response, and status code.
- Troubleshooting — common runtime failure modes and where to look first.