Is a rollback a sign the AI made a mistake?

Usually not. Of three rollbacks this period, two were client-initiated (and both re-shipped a week later once data favored them) and one was the critic gate pulling a change on live data. Zero were forced by a change that broke a page — which is the count that would actually indicate a mistake.

Why did changes get re-shipped after being rolled back?

Because the system defers to human judgment first and lets evidence settle it second. When a client asked to revert two changes, we did so immediately, then kept a small traffic slice running each variant. A week later the data favored both, the client agreed, and they re-shipped.

Can the engine roll back its own changes automatically?

Yes. The critic gate keeps watching after a change ships, not only before. When a previously approved change tripped an engagement guardrail on live data, the engine rolled it back on its own before a human flagged it.

Why does reversibility matter so much?

It decouples speed from risk. A complete one-step undo drops the cost of a wrong change to a few minutes live, so the system can try more, measure more, and keep only what wins. A self-healing site requires a self-undoing system underneath it.

Three Rollbacks, None Ours: Why Reversibility Wins

Engine log, reversibility edition. Three changes were rolled back this period. Two were client-initiated — and both re-shipped a week later once the data came in. One was a critic-gate rollback, caught after the fact. And the number we actually care about: zero rollbacks were forced by a change that broke something. That distinction is the whole story.

We publish this because rollbacks sound like failures and mostly are not. A system that can undo any change in one step is a system that can afford to move fast — the undo button is what makes the speed safe. Here is the breakdown, and why we treat reversibility as a headline feature rather than a quiet embarrassment.

What got rolled back, and why?

Three changes, three different causes — and crucially, none of them was "the engine shipped something that broke a page." Here is each one.

Two client-initiated rollbacks (both re-shipped)

A client looked at two changes, did not like them on first read, and asked us to revert. We did — immediately, in one step, no debate. Then we let a small traffic slice keep running the variant in the background. A week later the data favored the change on both, we walked the client through the numbers, and both re-shipped with their blessing. The system deferred to human judgment first and let evidence settle it second. That is exactly the order we want.

One critic-gate rollback

One change passed the [critic gate](/blog/the-critic-gate-matters-most) at draft time but tripped a threshold once live data arrived — engagement on the page dipped below the guardrail. The engine rolled it back on its own, before a human flagged it. The critic does not only judge before shipping; it keeps watching after, and it can pull a change it previously approved.

The rollback we never want to log is the one forced by a broken page. This period, like most, that count was zero. The other three are just the system working.

Why is "none of them ours" the important part?

Because the failure mode that matters is shipping a change that breaks something — and that is the count we drive to zero. Client-initiated rollbacks are the system correctly deferring to human preference. Critic-gate rollbacks are the system correctly policing itself on live data. Neither is the engine causing harm; both are the safety architecture doing its job.

If our logs ever showed rollbacks forced by breakage, that would be the real story, and we would lead with it. The honest version of an engine log includes the bad numbers when they happen. This period they did not, and that is precisely because every change is small, gated, and reversible by design.

Why does reversibility make fast iteration safe?

Reversibility decouples speed from risk. Normally, moving fast on a live site means accepting a higher chance of shipping something you regret. A one-step, complete rollback breaks that link: the cost of a wrong change drops to "a few minutes live, then undone," so the system can afford to try more, measure more, and keep only what wins.

Every change is reversible in one step — no manual cleanup, no partial state.
Traffic-slice testing means most weak changes are caught before full rollout, not after.
The critic keeps watching post-ship and can pull its own prior approval.
Clients can always revert, and can always keep what they like — control stays with them.

This is the substrate under the phrase "self-healing site." A system that heals itself needs a system that can un-do itself just as cleanly — healing is only safe when every step is reversible. We wrote about the philosophy of small, gated changes in [The case for boring agents](/blog/the-case-for-boring-agents). If you want an engine that iterates fast because it can always undo, [book a discovery](/contact.html).

Pick how you want to work together

Pros

Cons

Pros

Cons

Pros

Cons

Three rollbacks. None of them ours

What got rolled back, and why?

Two client-initiated rollbacks (both re-shipped)

One critic-gate rollback

Why is "none of them ours" the important part?

Why does reversibility make fast iteration safe?

Frequently asked questions

Related reading