Ask an agent the same question about your product three ways and you can get three different answers. Hand it your docs directly and it nails the install command. Tell it to search the web and it never finds your site, so it guesses. Give it an MCP server and it returns a section that is six months stale.
Same agent, same question, same product. The variable is how the agent reached the information. If you only test one path, you are blind to the other two, and your users' agents run across all of them.
Three paths, three different signals
Pickled models the path as a named context, and the run includes one cell per context. Each one tests something different:
mode: inject places your source straight into the prompt. This is the content test: given the material, is the answer right? It isolates one thing, the quality of what you wrote, by removing retrieval from the equation.
mode: web injects nothing. If the context names a URL source, Pickled gives the agent that canonical URL as the target and requires a web tool call. This is the discovery test: can the agent reach the content through the web path?
mode: mcp wires up an MCP server and requires the agent to call it. This is the surface test: does the tool you expose actually return the right answer?
A pass on inject and a fail on web is a specific, useful diagnosis: your content is good, but agents cannot discover it. A pass on web and a fail on mcp says your published docs work but the MCP path is not returning the same answer. One blended score would have hidden both.
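The mapping from a failing context to the signal it carries can be written as a small lookup. This is a minimal sketch for illustration, not Pickled's implementation: the function name `failing_signals` and the dictionary of booleans keyed by mode are assumptions.

```python
# Hypothetical sketch: map each context mode to the signal it tests.
# The mode names come from the article; the data shapes are assumed.
SIGNALS = {
    "inject": "content",    # given the material, is the answer right?
    "web": "discovery",     # can the agent reach the content via the web?
    "mcp": "surface",       # does the exposed tool return the right answer?
}


def failing_signals(results: dict[str, bool]) -> list[str]:
    """Return the signal name for every mode whose cell did not pass."""
    return [SIGNALS[mode] for mode, passed in results.items() if not passed]


# A pass on inject and a fail on web points at discovery, not content.
row = {"inject": True, "web": False, "mcp": True}
print(failing_signals(row))
```

Keeping the three results separate is what makes the diagnosis possible; averaging them into one score would discard which path failed.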
What it looks like
product:
  name: my-product
  description: short one-liner
sources:
  docs_bundle: { url: https://docs.my-product.dev/llms-full.txt }
agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5
contexts:
  given_docs_bundle: { mode: inject, source: docs_bundle }
  web_docs_bundle: { mode: web, source: docs_bundle }
  mcp_context7:
    mode: mcp
    servers:
      context7:
        url: https://mcp.context7.com/mcp
        headers:
          CONTEXT7_API_KEY: ${CONTEXT7_API_KEY}
facts:
  auth_api_key:
    statement: Authentication uses an API key.
    match:
      allOf: ["api key"]
questions:
  - id: auth-setup
    question: How do I authenticate with my-product?
    agents: [quick]
    contexts: [given_docs_bundle, web_docs_bundle, mcp_context7]
    expects: [auth_api_key]
thresholds:
  questions: 80
One question, three cells, scored independently. The same fact runs on all three, so the only thing that varies between them is the context path. That is the comparison you want.
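To make the "one question, three cells" arithmetic concrete, here is a minimal sketch of how a question could expand into cells. It assumes cells are the cross product of the question's agents and contexts, which is an inference from the config rather than documented Pickled behavior; `expand_cells` is a hypothetical helper.

```python
from itertools import product


def expand_cells(question: dict) -> list[tuple[str, str, str]]:
    """Return one (question id, agent, context) cell per agent/context pair.

    Assumption: a run scores every agent against every listed context.
    """
    return [
        (question["id"], agent, context)
        for agent, context in product(question["agents"], question["contexts"])
    ]


# The question from the config above, reduced to the fields that matter here.
question = {
    "id": "auth-setup",
    "agents": ["quick"],
    "contexts": ["given_docs_bundle", "web_docs_bundle", "mcp_context7"],
}

cells = expand_cells(question)
for cell in cells:
    print(cell)
```

With one agent and three contexts this yields three cells that share the same question and expected fact, so the context is the only thing that differs between them.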
The label has to be honest
There is a failure a softer test would miss: an agent can answer from memory without ever using the tool the cell is named for, and happen to be right. Pickled vetoes that. For a web or mcp cell, if the agent never invokes the configured tool path, the cell is set to NO regardless of whether the words match. A cell labeled web has to have actually used the web. Otherwise "passes on web" would be a lie, and the whole point is to know which path works.
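The veto described above amounts to one extra condition in front of the text match. The sketch below is illustrative only: the function name, the `tool_calls` list of tool kinds, and the "YES" label for a passing cell are assumptions, while the "NO" outcome for a skipped tool path comes from the rule as stated.

```python
def cell_verdict(mode: str, answer_matches: bool, tool_calls: list[str]) -> str:
    """Score one cell, vetoing web/mcp cells that never used their tool path.

    `tool_calls` is a hypothetical record of the kinds of tools the agent
    invoked during the run, e.g. ["web"] or ["mcp"].
    """
    requires_tool = mode in ("web", "mcp")
    if requires_tool and mode not in tool_calls:
        # The agent answered from memory: NO, even if the words match.
        return "NO"
    return "YES" if answer_matches else "NO"


# Right answer, but no web tool call: the cell still fails.
print(cell_verdict("web", answer_matches=True, tool_calls=[]))
# Right answer reached through the web tool: the cell passes.
print(cell_verdict("web", answer_matches=True, tool_calls=["web"]))
```

Checking the tool path before the match is the design choice that keeps the label honest: a correct answer cannot rescue a cell that never exercised the path it is named for.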
One consequence worth stating: in web and mcp cells, your sources are not injected into the prompt. URL sources become canonical hints; local-file sources become readable names. If the same answer does not come back through the tool path, the cell fails, which is exactly the signal you are looking for.
Legibility is per-surface
There is no single number for "does an agent understand my product," because it depends on how the agent got there. Contexts turn that into something you can act on: a grid showing which retrieval paths you are legible on and which ones you are invisible on. That tells you where to spend your effort, whether that is making your content discoverable, fixing what your MCP server returns, or tightening the docs themselves.
Try it
Declare an inject, a web, and an mcp context, point a question at all three, and read the row. The gaps are the work.