
Best practices

Patterns for skills that age well and scale across agents.

The existing skill corpus in koda/skills/ is the best source of truth for how to write a good skill. The patterns below are extracted from reading the curated library (architecture.md, code-review.md, deep-research.md, best-practices.md, among others). They're conventions, not hard rules — but skills that follow them age well and compose cleanly.

Scope the methodology

The `<when_to_use>` block should state both applicability and non-applicability. Skills that only describe when to apply themselves drift into over-use; skills that also tell the model when to skip stay sharp.

```markdown
<when_to_use>
Apply this methodology for new system design, significant architectural
changes, or evaluating existing architectures. For small features or bug
fixes, this level of analysis is unnecessary — focus on the code instead.
</when_to_use>
```

Be honest about the edges

A skill that says "apply when relevant" is a skill that never deactivates. Anchor the scope to concrete task shapes and explicitly name the situations where a simpler approach wins.

Keep the instruction terse

The instruction frontmatter field drives model behaviour. One or two sentences with concrete verbs ("analyse", "prioritise", "emit") outperform a paragraph of exhortation. Save the nuance for the body.

```yaml
instruction: "Analyze code against SOLID principles and common anti-patterns. Prioritize findings by structural impact and provide concrete refactored code, not abstract advice."
```

Enforce output shape

output_format_enforcement is where reproducibility lives. Models follow specific structural requirements more reliably than vague ones. "Structure as X, Y, Z in that order" consistently beats "format your answer clearly".

```yaml
output_format_enforcement: "Structure as: **Context** (problem + constraints), **Decision** (chosen approach), **Trade-offs** (what was gained vs lost), **Risks** (what could go wrong + mitigation). Include a component diagram if applicable."
```

Write the approach as numbered steps

Four to seven numbered steps is the sweet spot. Each step names a concrete action, not an aspiration. Steps with sub-bullets work well for checklists.

  • "Identify key Non-Functional Requirements (NFRs): Performance, Scalability, Availability, Consistency vs Availability trade-offs, Security and compliance" ← good.
  • "Understand what matters" ← bad. No action, no criteria.
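As an illustration (the step wording here is invented, not lifted from the corpus), an approach section in this shape might read:

```markdown
Approach:
1. Identify key NFRs: performance, scalability, availability, consistency
   vs availability trade-offs, security and compliance.
2. Map the major components and their responsibilities.
3. Evaluate each component against the NFRs; note where they conflict.
4. Propose the smallest structural change that resolves the top conflict.
5. Record the trade-offs and risks of the chosen design.
```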

Signal confidence and limits

Skills that ask the model to label its confidence produce more trustworthy output than skills that don't. deep-research.md asks for confidence levels per finding; best-practices.md uses High / Medium / Low priority; architecture.md requires explicit risk + mitigation. Borrow the pattern.
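One way to bake the pattern in is to demand a label on every finding via the output format. The wording below is illustrative, not copied from those skills:

```yaml
output_format_enforcement: "For each finding, state confidence (High / Medium / Low) and the evidence behind it. Flag any conclusion that rests on a single source or on inference rather than observed code."
```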

Trigger patterns

Triggers gate auto-selection. A few patterns make them reliable:

  • Always case-insensitive (`(?i)`) and word-bounded (`\b`).
  • Cover synonyms, including language variants — skills in the existing corpus tend to span English and Portuguese.
  • Match noun phrases, not whole sentences. `\brefactor(ing)?\b` is better than `refactor.*code`.
  • Avoid being too greedy. A skill fired by "any" query is rarely a skill that's actually helpful.
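Assuming triggers are ordinary regular expressions (as the `(?i)` and `\b` notation above suggests), a minimal sketch of a pattern that follows these rules — the Portuguese variants are our own illustrative additions:

```python
import re

# Hypothetical trigger for a refactoring skill: case-insensitive,
# word-bounded, covering an English and a Portuguese synonym.
TRIGGER = re.compile(r"(?i)\b(refactor(ing)?|refatora(r|cao|ção)?)\b")

# Fires on noun-phrase mentions in either language...
assert TRIGGER.search("Can you help refactoring this module?")
assert TRIGGER.search("Refatorar o serviço de pagamentos")
# ...but the \b boundaries keep near-misses from matching.
assert not TRIGGER.search("factor out the common code")
```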

Priority and token budgets

The selector uses priority as a tie-breaker when multiple skills match. Curated skills cluster around 40-50. Push higher (55-60) only when the skill should win against peers in a specific domain — and document why in a comment or the tags.

max_tokens sits between 2000 and 3000 for most skills. Longer methodologies (research, architecture) can justify 3000; quick-hit skills (TDD, code smell checks) do fine at 2000.
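Put together, a frontmatter excerpt following this guidance might look like the sketch below (field values and the comment are hypothetical):

```yaml
priority: 55       # above the 40-50 cluster: must outrank the general
                   # code-review skill on security-specific queries
max_tokens: 2500   # mid-length methodology; not a quick-hit skill
```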

Use aliases generously

Three to four aliases per skill is typical. The ones that matter most: the natural English short name, a Portuguese variant (the codebase leans bilingual), and a hyphenated URL-safe form. They make the /skill command usable without memorising the canonical ID.

```yaml
aliases: [boas-praticas, quality, standards]
```

Compose sparingly

requires and conflicts exist, but most skills stand alone. Reach for composition when a skill genuinely depends on another's methodology — a specialised code-review skill might require the general code-review skill, for example. Avoid chains that run four skills deep: each added skill eats into the shared token budget and blurs the methodology.
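Sketched as frontmatter, the code-review example from the paragraph above might look like this (the skill IDs are hypothetical):

```yaml
requires: [code-review]        # inherit the general review methodology
conflicts: [quick-triage]      # mutually exclusive depth of analysis
```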

Versioning and change

Bump the version field on every material change to methodology or output shape. It's the only signal downstream consumers (dashboards, API clients) have that a skill's behaviour shifted. Mechanical edits (typo, alias addition) don't need a version bump.

Smoke-test before shipping

Two checks before a new skill lands in a deployment:

  • Manual invocation. Run /skill <name> with a representative query. The composed prompt should read like something an expert would ask, not a wall of metadata.
  • Trigger cross-check. Paste three candidate queries you'd expect to fire the skill, and three you wouldn't. Confirm the selector agrees in both directions.
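The trigger cross-check is easy to script. A minimal sketch, assuming the trigger is a plain regex (the pattern and queries here are invented for illustration):

```python
import re

# Hypothetical trigger for an architecture skill, bilingual as above.
trigger = re.compile(r"(?i)\b(architecture|arquitetura|system design)\b")

should_fire = [
    "Review the architecture of our billing service",
    "Proposta de arquitetura para o novo gateway",
    "Help me with the system design for a URL shortener",
]
should_not_fire = [
    "Fix this off-by-one bug",
    "Rename the helper function",
    "Write a unit test for the parser",
]

# Confirm the selector would agree in both directions.
for q in should_fire:
    assert trigger.search(q), f"expected trigger to fire: {q}"
for q in should_not_fire:
    assert not trigger.search(q), f"expected trigger to stay silent: {q}"
```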
Don't smuggle tools in

Skills describe methodology. Tool invocations belong in the agent contract, not the skill body. A skill that tries to call tools directly will confuse the runtime tool loop and bloat the token budget.

Next steps

  • Authoring a Skill — the file format and frontmatter reference.
  • Runtime — how the composer assembles your skill into the active prompt.