โ† Blog

June 1, 2026 ยท Caio Pizzol

Your AGENTS.md and CLAUDE.md, under the same test

Pickled is primarily about public product context: can an outside agent reach the right answer about your product through your README, docs, llms.txt, or MCP surface? But the same source / context / question / fact model also works on the files you write to steer your own coding agents, like AGENTS.md and CLAUDE.md. This is the secondary case: the same machinery pointed inward.

Steering files are untested prompt surface

You wrote an AGENTS.md: the package manager, the test command, the directories not to touch, the patterns you never want to see again. Then you trusted it. Your code has tests; that file has none. It can rot for months, and the only signal is an agent quietly running npm in a Bun repo or reintroducing the exact pattern you wrote a line to forbid.

A stale instruction is worse than a missing one. A missing one leaves the agent to its priors; a stale one is a confident wrong answer the next agent follows as policy.

Put it under test

Register the file as a source, ask the question it is supposed to answer, and pin the contract. No model grades another model: the answer is checked deterministically against the facts it must cover and the stale claims it must reject.

product:
  name: my-repo
  description: the repo your agents work in

sources:
  agents_md: ./AGENTS.md

agents:
  local:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  given_agents_md: { mode: inject, source: agents_md }

facts:
  bun_install:
    statement: The repo installs dependencies with bun install.
    match:
      allOf: ["bun install"]

misstatements:
  npm_or_yarn:
    statement: The answer recommends npm or yarn for dependency installation.
    match:
      anyOf: ["npm install", "yarn add"]

questions:
  - id: package-manager
    question: What package manager does this repo use, and what installs dependencies?
    agents: [local]
    contexts: [given_agents_md]
    expects: [bun_install]
    rejects: [npm_or_yarn]
    examples:
      pass:
        - "Use Bun. Run `bun install` to install dependencies."
      fail:
        - "Run `npm install` or `yarn add`."

thresholds:
  questions: 100

With mode: inject, AGENTS.md is the only content injected. The bun_install fact asserts the answer you want; the npm_or_yarn misstatement catches the stale-but-plausible npm or yarn a thin instruction file lets through. Wire it into CI and a drifting AGENTS.md fails the build before it ships, instead of surfacing as a confused agent three weeks later.

Same machinery, two directions

Whether you are testing how the outside world's agents read your published product, or whether your own steering files still steer your own agents, it is one model: register the source, name the context, ask the question, check the answer by contract. Public context is the main event. Internal steering is the same test, pointed at your own repo.