You added an llms.txt. It felt like the right move: a clean, agent-readable map of your product, sitting at a well-known path. You shipped it and moved on.
But adding the file is not the same as the file working. llms.txt is a promise to agents. The only way to know the promise pays off is to ask an agent a question your llms.txt is supposed to answer, and check what comes back.
Two ways an llms.txt quietly fails
The first is content. You give an agent your llms.txt and it still answers wrong: the file is thin, ambiguous, or out of date, so the agent fills the gap with its training-data guess about how products like yours usually work. The file is present and useless.
The second is discovery. The agent never reads your llms.txt at all. It answers from memory, or from whatever it scraped months ago, because nothing made it fetch the current file. The file is fine and unread.
These are different failures and they need different tests. One asks "is the content good enough?" The other asks "can the agent even get to it?"
Test both with the same question
Register your llms.txt as a source and run one question across two contexts: an injected cell that hands the agent the file directly, and a web cell that makes the agent reach it through a tool.
product:
  name: my-product
  description: short one-liner
sources:
  llms: { url: https://my-product.dev/llms.txt }
agents:
  quick:
    provider: claude-code
    model: claude-haiku-4-5
contexts:
  given_llms: { mode: inject, source: llms }
  web_llms: { mode: web, source: llms }
facts:
  pricing:
    statement: my-product is free and open source.
    match:
      allOf: ["free", "open source"]
questions:
  - id: pricing
    question: How is my-product priced?
    agents: [quick]
    contexts: [given_llms, web_llms]
    expects: [pricing]
thresholds:
  questions: 80
The given_llms cell injects your llms.txt into the prompt. That is the content test: if it fails here, your llms.txt does not actually answer the question.
The web_llms cell does not inject the file. It gives the agent the canonical llms.txt URL and requires a real web tool call. That is the discovery test: if it fails here while given_llms passes, your content is fine but the agent did not use it through the configured web path. If both cells fail, fix the content first, because a discovery result means little until the injected cell passes.
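The way the two cells combine into a diagnosis can be sketched as a small decision function. This only illustrates the reasoning above and is not part of pickled; the function name and the returned labels are hypothetical.

```python
def diagnose(given_passed: bool, web_passed: bool) -> str:
    """Map the two cell results to the likely failure mode (hypothetical labels)."""
    if not given_passed:
        # The agent had the file in its prompt and still answered wrong,
        # so the file itself does not answer the question.
        return "content"
    if not web_passed:
        # The content is good enough, but the agent did not get the answer
        # from the file through the web path.
        return "discovery"
    return "ok"


print(diagnose(given_passed=True, web_passed=False))   # discovery
print(diagnose(given_passed=False, web_passed=False))  # content
```

The injected cell is checked first on purpose: a failing web cell only points to discovery once the content is known to be sufficient.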
Both cells use the same facts contract for the answer you want. The web tool calls are real. No model grades another model. The score is the score.
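A check like allOf can be fully deterministic, which is why no grader model is needed. The sketch below assumes allOf means that every listed phrase must appear in the answer, and that matching is a case-insensitive substring search. pickled's actual matching rules may differ, and the function name is hypothetical.

```python
def matches_all_of(answer: str, phrases: list[str]) -> bool:
    """Return True only if every phrase occurs in the answer, ignoring case."""
    normalized = answer.casefold()
    return all(phrase.casefold() in normalized for phrase in phrases)


required = ["free", "open source"]
print(matches_all_of("my-product is free and open source.", required))  # True
print(matches_all_of("Plans start at $10 per month.", required))        # False
```

A substring check is blunt: an answer containing "not free" would also satisfy the "free" phrase, so under this assumption the phrases are worth choosing to be as specific as the fact allows.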
Why this beats reading it yourself
You can open your own llms.txt and nod at it. That tells you it looks complete. It does not tell you whether an agent, given a real question, produces the right answer from it, or whether it bothers to read it under conditions you do not control. The file is written for a reader you are not. Test it with that reader.
Run this on the questions that matter most for your product, the ones where a wrong answer costs you a user, and put it in CI so a regression in your llms.txt fails the build instead of surfacing as a confidently wrong agent.
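In CI, the gate comes down to comparing a pass rate against a threshold and turning the result into an exit code. The sketch below assumes the 80 under thresholds is a minimum percentage of passing questions. The results dictionary, the extra question ids, and the function names are hypothetical stand-ins, not pickled's output format.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Percentage of questions that passed; an empty run counts as 0."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)


def gate(results: dict[str, bool], threshold: float = 80.0) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    return 0 if pass_rate(results) >= threshold else 1


# Hypothetical results keyed by question id.
results = {"pricing": True, "install": True, "license": True,
           "support": True, "limits": False}
exit_code = gate(results)  # 4 of 5 pass, which is exactly 80%
print(exit_code)           # 0
```

A CI script would hand the returned code to the shell, for example with sys.exit, so a run below the threshold fails the build.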
Try it
Point pickled at your llms.txt and write one question it is supposed to answer. Run it once with the file injected and once with the agent reaching the URL through web tools, and see which promise holds.