What is a critic gate in an AI agent system?

A critic gate is an evaluation step between a generation agent and the live output. It judges each drafted change against an explicit, page-specific brief and decides whether it ships, needs revision, or is rejected. It holds a real veto — nothing publishes without its approval.

Why does the critic matter more than the writer?

Because generation is cheap and judgment is scarce. Models can draft many plausible variants instantly; the hard part is consistently choosing the correct one against a defensible standard. Left alone, writers optimize for plausible — the critic is what enforces correct, on-brand, and defensible.

Should the same model write and evaluate?

No. A single agent grading its own work marks its own homework and shares the blind spots of the draft it produced. Separating writer and critic into different agents with different prompts gives the evaluation genuine independence — the same reason an editor is not the author.

How do you know if a critic gate is working?

Watch the rejection rate. A healthy critic blocks a meaningful share of drafts. A near-zero rate means it is rubber-stamping; an extremely high rate means it is miscalibrated or being fed poor drafts. The rejection rate is a core system-health metric worth publishing.

The Critic Gate: Why Evaluation Beats Generation in AI Agents

Everyone obsesses over the writer — which model drafts the copy, how clever the prompt is, how good the prose reads. We have come to believe that is the wrong focus. In an agentic content system, the layer that decides what is allowed to ship matters far more than the layer that drafts it. The critic, not the writer, is where quality is actually determined.

This is counterintuitive because generation is visible and evaluation is not. You see the article the writer produced; you do not see the twenty worse versions the critic rejected. But the rejected versions are the point. A mediocre writer behind a strict, well-designed critic produces a reliable system. A brilliant writer with no critic produces a confident, ungoverned mess. This piece is about building the gate.

What is a critic gate in an agentic system?

A critic gate is an evaluation step that sits between a generation agent and the live site, judging each drafted change against an explicit standard and deciding whether it ships, gets revised, or is rejected. It is the system's editorial conscience: the place where "could we write this" is separated from "should we publish this."

The gate only works if it has real authority. A critic that can comment but not block is a suggestion box, not a gate. The critic in a serious system holds a veto — it can stop a change cold, and nothing ships without its approval. That authority is what turns evaluation from decoration into governance.

It is not the model that writes that makes a system good. It is the model that decides what is allowed to ship.

Why does evaluation matter more than generation?

Evaluation matters more because generation is cheap and getting cheaper, while judgment is the scarce resource. Modern models can draft a dozen plausible variants of anything in seconds. The bottleneck is no longer producing options — it is choosing correctly among them, consistently, against a standard you can defend. That choosing is the critic's entire job.

Generation regresses to plausible; evaluation enforces correct

Left alone, a generation agent optimizes for plausible — text that reads well and sounds right. Plausible is not the same as true, on-brand, or defensible. The critic is what pulls the system from plausible toward correct, by holding each draft against criteria the writer is not even trying to satisfy.

The critic encodes taste the writer lacks

A writer agent knows how to write. A critic agent knows what this page is for, what claims are allowed, what voice is required, and what failure looks like. That contextual judgment — taste, encoded as explicit criteria — is the actual product. It is also the part competitors cannot copy by swapping in a better base model.

How do you design a critic that earns trust?

A trustworthy critic judges against an explicit, page-specific brief — not a vague sense of quality — and can demand revision or reject outright. Three design choices make the difference between a real gate and theater.

Give it an explicit brief. The critic evaluates against written criteria for this specific page: allowed claims, required voice, performance budgets, the core message it must not dilute. Vague critics make inconsistent decisions.
Give it teeth. The critic must be able to reject and to require revisions, not only approve. A gate that always says yes is not a gate.
Separate it from the writer. Different agent, different prompt, ideally a different vantage point — so the system is not grading its own homework.

Why separate the writer and the critic?

You separate them to defeat self-grading bias. A single agent asked to write and then judge its own work grades generously — it is, in effect, marking its own homework, and it shares every blind spot of the draft it just produced. Splitting generation and evaluation into distinct agents with distinct instructions gives the critic genuine independence from the writer's assumptions.

This mirrors how good human teams work: the editor is not the author, and the code reviewer is not the author of the pull request. The value of review comes precisely from the reviewer not being the person who fell in love with the draft. We build the same separation into the engine on purpose.

How do you measure a critic gate?

Track the rejection rate — the share of drafts the critic blocks or sends back. A healthy critic rejects a meaningful fraction; a critic with a near-zero rejection rate is rubber-stamping, and a critic rejecting almost everything is miscalibrated or fed bad drafts. The number is a core health metric for the whole system, and we watch it the way you would watch error rates on a service.

In our own [weekly engine logs](/blog/14-changes-shipped-this-week) we publish what the critic rejected alongside what shipped, because the rejections reveal more about the system's judgment than the approvals do. A system you can trust is one whose gate you can inspect. If you want an agentic content engine with a critic gate you can audit, [book a discovery](/contact.html).

The critic gate matters more than the writer