โ† Blog

June 1, 2026 · Caio Pizzol

The same agent gives three different answers about your product

Ask an agent the same question about your product three ways and you can get three different answers. Hand it your docs directly and it nails the install command. Tell it to search the web and it never finds your site, so it guesses. Give it an MCP server and it returns a section that is six months stale.

Same agent, same question, same product. The variable is how the agent reached the information. If you only test one path, you are blind to the other two, and your users' agents run across all of them.

Three paths, three different signals

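Pickled models the path as a named context, and a run includes one cell per context. Each one tests something different: inject hands the agent your docs directly and tests the content itself, web makes the agent search and tests whether it can discover your site, and mcp routes the question through an MCP server and tests what that server returns.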

A pass on inject and a fail on web is a specific, useful diagnosis: your content is good, but agents cannot discover it. A pass on web and a fail on mcp says your published docs work but the MCP path is not returning the same answer. One blended score would have hidden both.
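To make the diagnosis concrete, here is a minimal Python sketch of how per-context results could be read. It is not Pickled's code: `blended_score` and `diagnose` are hypothetical helpers, and the "content" rule for a failed inject cell is an assumption drawn from the idea that a failure with the docs in hand points at the docs themselves.

```python
def blended_score(cells: dict[str, bool]) -> int:
    """Collapse all cells into one percentage (what a single score would report)."""
    return round(100 * sum(cells.values()) / len(cells))


def diagnose(cells: dict[str, bool]) -> list[str]:
    """Read the cells one path at a time instead of averaging them."""
    findings = []
    if not cells["inject"]:
        # Assumption: failing with the docs in hand points at the docs themselves.
        findings.append("content")
    if cells["inject"] and not cells["web"]:
        # Content is good, but agents cannot discover it.
        findings.append("discovery")
    if cells["web"] and not cells["mcp"]:
        # Published docs work, but the MCP path returns a different answer.
        findings.append("mcp")
    return findings


undiscoverable = {"inject": True, "web": False, "mcp": True}
stale_mcp = {"inject": True, "web": True, "mcp": False}

# Both runs blend to the same number, yet the per-path findings differ.
print(blended_score(undiscoverable), diagnose(undiscoverable))
print(blended_score(stale_mcp), diagnose(stale_mcp))
```

The two example runs have different problems but the same blended score, which is why the cells are kept separate.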

What it looks like

product:
  name: my-product
  description: short one-liner

sources:
  docs_bundle: { url: https://docs.my-product.dev/llms-full.txt }

agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5

contexts:
  given_docs_bundle: { mode: inject, source: docs_bundle }
  web_docs_bundle: { mode: web, source: docs_bundle }
  mcp_context7:
    mode: mcp
    servers:
      context7:
        url: https://mcp.context7.com/mcp
        headers:
          CONTEXT7_API_KEY: ${CONTEXT7_API_KEY}

facts:
  auth_api_key:
    statement: Authentication uses an API key.
    match:
      allOf: ["api key"]

questions:
  - id: auth-setup
    question: How do I authenticate with my-product?
    agents: [quick]
    contexts: [given_docs_bundle, web_docs_bundle, mcp_context7]
    expects: [auth_api_key]

thresholds:
  questions: 80

One question, three cells, scored independently. The same fact is checked in all three, so the only thing that varies between them is the context path. That is the comparison you want.
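As a rough illustration of how one question becomes three cells, the Python sketch below expands a question into one cell per agent and context pair and checks a fact against an answer. The function names are hypothetical, and reading `allOf` as "every listed phrase appears in the answer, case-insensitively" is an assumption about the matcher, not documented behavior.

```python
from itertools import product


def expand_cells(question: dict) -> list[tuple[str, str, str]]:
    """One cell per (agent, context) pair for a single question."""
    return [
        (question["id"], agent, context)
        for agent, context in product(question["agents"], question["contexts"])
    ]


def matches_all_of(answer: str, phrases: list[str]) -> bool:
    """Assumed allOf semantics: every phrase must appear, ignoring case."""
    text = answer.lower()
    return all(phrase.lower() in text for phrase in phrases)


question = {
    "id": "auth-setup",
    "agents": ["quick"],
    "contexts": ["given_docs_bundle", "web_docs_bundle", "mcp_context7"],
}

for cell in expand_cells(question):
    print(cell)

print(matches_all_of("Pass your API key in the request header.", ["api key"]))
```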

The label has to be honest

There is a failure a softer test would miss: an agent can answer from memory without ever using the tool the cell is named for, and happen to be right. Pickled vetoes that. For a web or mcp cell, if the agent never invokes the configured tool path, the cell is set to NO regardless of whether the words match. A cell labeled web has to have actually used the web. Otherwise "passes on web" would be a lie, and the whole point is to know which path works.
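The veto can be sketched in a few lines of Python. This is an illustration of the rule as described here, not Pickled's implementation: `CellRun`, its `tool_paths_used` field, and `score_cell` are hypothetical names, and how tool use is actually detected is left out.

```python
from dataclasses import dataclass, field


@dataclass
class CellRun:
    mode: str  # "inject", "web", or "mcp"
    # Hypothetical record of which tool paths the agent actually invoked.
    tool_paths_used: set[str] = field(default_factory=set)


def score_cell(run: CellRun, words_match: bool) -> bool:
    """Apply the tool-use veto before trusting a text match."""
    if run.mode in ("web", "mcp") and run.mode not in run.tool_paths_used:
        # Right answer from memory: the label would be dishonest, so the cell fails.
        return False
    return words_match


# The agent answered correctly but never searched the web: vetoed.
print(score_cell(CellRun(mode="web"), words_match=True))
# The agent searched the web and the answer matched: passes.
print(score_cell(CellRun(mode="web", tool_paths_used={"web"}), words_match=True))
```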

One consequence worth stating: in web and mcp cells, your sources are not injected into the prompt. URL sources become canonical hints; local-file sources become readable names. If the same answer does not come back through the tool path, the cell fails, which is exactly the signal you are looking for.
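One way to picture that difference is a Python sketch of what each mode contributes to the prompt. Everything here is assumed for illustration: the function name, the `path` key for local-file sources, and the exact hint wording are hypothetical, and only the overall behavior (full content for inject, a pointer for web and mcp) comes from the description above.

```python
from pathlib import PurePosixPath


def source_text_for_prompt(mode: str, source: dict, content: str) -> str:
    """Return what a source contributes to the prompt in a given mode."""
    if mode == "inject":
        # Inject cells put the source content straight into the prompt.
        return content
    if "url" in source:
        # Web and mcp cells get only a canonical hint (hypothetical wording).
        return f"Canonical docs: {source['url']}"
    # Local files are reduced to a readable name; the "path" key is assumed.
    return f"Local source: {PurePosixPath(source['path']).name}"


docs = {"url": "https://docs.my-product.dev/llms-full.txt"}
print(source_text_for_prompt("inject", docs, "Authentication uses an API key."))
print(source_text_for_prompt("web", docs, "Authentication uses an API key."))
```

In the web case the answer text never reaches the prompt, so the agent has to retrieve it through the tool path for the cell to pass.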

Legibility is per-surface

There is no single number for "does an agent understand my product," because it depends on how the agent got there. Contexts turn that into something you can act on: a grid showing which retrieval paths you are legible on and which ones you are invisible on. That tells you where to spend, whether that is making your content discoverable, fixing what your MCP server returns, or tightening the docs themselves.
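A grid like that is simple to render. The Python sketch below is a hypothetical formatter, not Pickled's actual report output; the YES and NO labels and the column layout are assumptions.

```python
def render_grid(results: dict[str, dict[str, bool]], contexts: list[str]) -> str:
    """Render one row per question and one column per context."""
    width = max(len(question) for question in results)
    header = " " * width + "  " + "  ".join(f"{c:<6}" for c in contexts)
    lines = [header.rstrip()]
    for question, cells in results.items():
        marks = "  ".join(f"{('YES' if cells[c] else 'NO'):<6}" for c in contexts)
        lines.append(f"{question:<{width}}  {marks}".rstrip())
    return "\n".join(lines)


results = {
    "auth-setup": {"inject": True, "web": False, "mcp": True},
}
print(render_grid(results, ["inject", "web", "mcp"]))
```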

Try it

Declare an inject, a web, and one mcp context, point a question at all three, and read the row. The gaps are the work.