Blog
- Use build tasks when answers are not enough
A build task lets an agent edit a workspace, then proves success by running your verifier commands.
- Does your public product context answer real questions?
Docs, README files, llms.txt, and MCP servers are the context agents use before they build with your product. Test the path that gives them the right answer.
- You added llms.txt. Can an agent actually use it?
An llms.txt is a promise to agents. Adding the file is not the same as the promise paying off. Here is how to put it under test.
- The same agent gives three different answers about your product
How an agent reaches your information (injected context, web search, or an MCP server) changes the answer. Test all three paths, not one.
- Stop letting one model grade another
LLM-as-judge is the default for agent evals. For testing whether an agent understands your product, a deterministic contract is the honest measure. Here is why.
- Your public product context has no CI gate. Give it one.
Your code fails the build when it breaks. Your docs, README, llms.txt, and MCP surfaces drift silently. Here is how to gate agent understanding in CI.
- Your AGENTS.md and CLAUDE.md, under the same test
Pickled is mostly about public product context. The same source, context, question, and fact model also puts your own internal steering files under test.