Catch the silently-wrong step
Software for Agents · Paris Builds
Your agent's cheap model just confidently produced a wrong number — and nothing errored.
Plumbline checks every step against reality, not against another AI's opinion: it runs the test, it does
the arithmetic. It catches the silently-wrong step, pinpoints it, and fixes only that step: first it retries the
cheap model with the reason for the failure, then, if that still fails, it escalates to a stronger model on
another provider. You keep your model. You keep your price. You stop shipping wrong answers.
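To make the verify, retry, escalate flow concrete, here is a minimal sketch for a single step. It is an illustration under stated assumptions, not Plumbline's actual code or API: `run_step`, `sum_check`, `StepResult`, and the stub models are all hypothetical names. Models are plain functions from prompt to answer, and the check is deterministic arithmetic rather than a second model's judgment.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical types for illustration only; not Plumbline's API.
Model = Callable[[str], str]             # prompt -> answer
Check = Callable[[str], Optional[str]]   # answer -> None if correct, else a reason


@dataclass
class StepResult:
    answer: str
    source: str     # "cheap", "cheap-retry", or "strong"
    verified: bool  # True if the final answer passed the check


def run_step(prompt: str, check: Check, cheap: Model, strong: Model) -> StepResult:
    """Verify one step; retry with the failure reason, then escalate."""
    answer = cheap(prompt)
    reason = check(answer)
    if reason is None:
        return StepResult(answer, "cheap", True)

    # Retry the same cheap model, telling it why the first answer failed.
    answer = cheap(f"{prompt}\nPrevious answer was rejected: {reason}")
    if check(answer) is None:
        return StepResult(answer, "cheap-retry", True)

    # Escalate only this step to the stronger model, and verify it too.
    answer = strong(prompt)
    return StepResult(answer, "strong", check(answer) is None)


def sum_check(terms: Sequence[int]) -> Check:
    """Build a deterministic check that recomputes the sum itself."""
    total = sum(terms)

    def check(answer: str) -> Optional[str]:
        try:
            value = int(answer.strip())
        except ValueError:
            return f"not an integer: {answer!r}"
        return None if value == total else f"expected sum {total}, got {value}"

    return check


if __name__ == "__main__":
    # Stub models: the cheap one is always wrong, the strong one is right.
    result = run_step(
        "Add 400, 650 and 250.",
        sum_check([400, 650, 250]),
        cheap=lambda prompt: "1250",
        strong=lambda prompt: "1300",
    )
    print(result)
```

In this sketch the stronger model is called only after the cheap model has failed twice on the same step, and its answer goes through the same check, so an unverified answer is flagged rather than shipped.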
👇 A cheap agent runs 7 steps. Watch the left ship a wrong wire — and Plumbline catch it.
● Cheap model alone
one cheap model, no safety net — what most agents do today
● Cheap model + Plumbline
same cheap model — Plumbline verifies each step against reality and recovers the failures
How you connect — one line, keep your own key
# You pick your model. Plumbline verifies it. Any provider.
import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8129/v1",   # <- the only change (runs today)
    api_key=os.environ["OPENAI_API_KEY"],  # your key, passes through
)
Not a router. Routers pick a model up front and lock it in. Plumbline keeps the model YOU
chose, checks each step against reality, and escalates only the failures, reaching across providers where a single lab's lineup cannot.