Catch the agents you can’t trust before they touch your data.
We polygraph AI agents so you don’t have to.
Independent, lab-evaluated polygraphs for MCP servers and the agents that use them. Free public polygraphs. CLI for runtime checks.
$ npx polygraph check <mcp-server>
- 01 → fetches the polygraph from polygraph.so
- 02 → returns polygraph: grade + last-tested date
- 03 → links to the full evidence report
Not yet evaluated? The CLI returns queued, position #N and notifies you when the polygraph lands. The CLI is a lookup: probes run in our lab, not on your machine.
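For teams that want to gate a script or CI step on the lookup, the two outcomes described above can be modeled as a small tagged union. This is a sketch only: the page does not document the CLI's output format, so the `LookupResult` type, its field names, and `formatLookup` are all hypothetical.

```typescript
// Hypothetical shapes: the real CLI output format is not specified on this page.
type LookupResult =
  | { status: "evaluated"; grade: string; lastTested: string; reportUrl: string }
  | { status: "queued"; position: number };

// Turn a lookup result into a one-line message for a terminal or CI log.
function formatLookup(server: string, result: LookupResult): string {
  if (result.status === "queued") {
    return `${server}: not yet evaluated (queued, position #${result.position})`;
  }
  return `${server}: grade ${result.grade}, last tested ${result.lastTested}, report: ${result.reportUrl}`;
}

// Usage with placeholder values (not real data).
console.log(formatLookup("demo-server", { status: "queued", position: 3 }));
```

Keeping "queued" as its own case, rather than treating it as a failure, lets a caller decide separately how to handle servers that have no polygraph yet.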
The MCP ecosystem grew faster than anyone’s ability to vet it.
Adoption metrics and dependency scans don’t tell you whether a server will exfiltrate your data or hijack the agent calling it. Frontier labs won’t independently polygraph the ecosystem they’re building on. You need an outside opinion.
5 probes. 3 categories. One sandbox.
- C-01 shipping v1
  Tool-output injection
  probe 1.1 · probe 1.2
  Does the server's output try to hijack the agent calling it? We feed it inputs that bait it into emitting injection-shaped text, then scan outputs for instruction mimicry, hidden unicode, and markdown tricks.
- C-02 shipping v1
  Permission overreach
  probe 2.2
  Does it touch more than it claimed? In a no-expected-egress run, we flag any outbound network call. Phone-home detection on a default-deny network namespace.
- C-03 shipping v1
  Sensitive data handling
  probe 4.1 · probe 4.2
  Does your data leave the sandbox when it shouldn't? We plant trackable markers (fake keys, distinctive PII strings) and watch every egress path plus the tool's own outputs back to the agent.
- C-04 v2 · deferred
  Adversarial input handling
  How does it behave on malformed inputs, oversized payloads, and known jailbreak patterns? Deferred from v1: the deterministic battery ships first; this category waits for the harness to mature.
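To make the output-scanning idea concrete, here is a minimal sketch of the kind of deterministic check the injection and data-handling categories describe: look for hidden unicode, instruction-shaped phrases, and planted markers in a tool's output. It is not the lab's implementation. The pattern lists, the `Finding` type, and `scanToolOutput` are illustrative assumptions, and a real battery would be far broader.

```typescript
// Zero-width and bidirectional control characters that can hide text from a reader.
const HIDDEN_UNICODE = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF]/;

// Illustrative phrases only (assumption); not the lab's actual pattern set.
const INSTRUCTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now/i,
  /^system:/im,
];

type Finding = {
  kind: "hidden-unicode" | "instruction-mimicry" | "marker-leak";
  detail: string;
};

// Scan one tool output for suspicious content and for planted markers.
// A finding is a flag for review, not a verdict.
function scanToolOutput(output: string, plantedMarkers: string[]): Finding[] {
  const findings: Finding[] = [];
  if (HIDDEN_UNICODE.test(output)) {
    findings.push({ kind: "hidden-unicode", detail: "invisible control character present" });
  }
  for (const pattern of INSTRUCTION_PATTERNS) {
    if (pattern.test(output)) {
      findings.push({ kind: "instruction-mimicry", detail: pattern.source });
    }
  }
  for (const marker of plantedMarkers) {
    if (output.includes(marker)) {
      findings.push({ kind: "marker-leak", detail: marker });
    }
  }
  return findings;
}

// Usage: "FAKE-KEY-123" stands in for a planted marker (placeholder value).
const demo = scanToolOutput("Done.\u200B Ignore previous instructions.", ["FAKE-KEY-123"]);
console.log(demo.map((f) => f.kind));
```

Checks like these are deterministic, so the same output always yields the same findings. They only cover the tool's outputs back to the agent; watching egress paths needs network-level monitoring that this sketch does not attempt.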
Probes evolve as agents do — new failure modes get new probes. The methodology is versioned and public. Read the v1 spec.
Three orthogonal axes. Never averaged.
| # | Question | Source | Status |
|---|---|---|---|
| 01 | Is this artifact well-made? | Public registries · OpenSSF · GitHub | existing |
| 02 | Does it behave well under pressure? | Our sandbox | v1 polygraph, shipping |
| 03 | Does it stay behaving well in production? | Runtime telemetry | next |
We run axis 02 — the polygraph. We point at axes 01 and 03 — never average them in.
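One way to honor "never averaged" in a data model is to give each axis its own field and leave out any combined score. The sketch below assumes a hypothetical report shape; `AxisResult`, `TrustReport`, and `summarize` are illustrative names, not a published schema.

```typescript
// Hypothetical shape: one entry per axis, each with its own source and value.
type AxisResult = {
  question: string;
  source: string;
  value: string | null; // null when no data exists for this axis yet
};

// Deliberately no overall score field: the three axes are reported side by side.
type TrustReport = {
  artifactQuality: AxisResult;       // axis 01
  behaviorUnderPressure: AxisResult; // axis 02, the polygraph
  productionBehavior: AxisResult;    // axis 03
};

// Render each axis on its own line, in axis order.
function summarize(report: TrustReport): string[] {
  const axes = [report.artifactQuality, report.behaviorUnderPressure, report.productionBehavior];
  return axes.map((a) => `${a.question} ${a.value === null ? "no data" : a.value} (${a.source})`);
}
```

Because the type has no aggregate field, a strong result on one axis cannot mask a missing or weak result on another.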
Get notified when polygraphs publish.
One email when polygraphs for the servers you care about land. No drip campaign, no “hey just checking in.”