If you are hiring an in-house AI lead, the job description you found online is probably wrong. Most read like a research-scientist posting bolted onto a buzzword, and they hire for the wrong thing: knowledge of models, rather than the judgment to deploy them safely and the rigor to prove they help. Here is a short, opinionated checklist of what the role actually does, and what to insist on.
This is written from the operator's seat — what we have learned running an agentic system in production, where the difference between a good AI hire and a bad one shows up not in what they can build but in what they can measure, govern, and undo.
What does an in-house AI lead actually do?
An effective AI lead is an operator and systems-builder: they turn AI capability into reliable, measured, reversible changes to how the business runs. Day to day, that means choosing where AI belongs, building the guardrails and rollback paths around it, instrumenting everything so its impact is provable, and saying no to the flashy applications that would not survive contact with production.
They are neither a research scientist pushing the frontier of models nor a prompt hobbyist with a collection of clever tricks. The frontier is being handled by the labs; the tricks are commoditizing weekly. The durable value is in the unglamorous middle: judgment about where to apply AI, and the engineering discipline to apply it without breaking things.
Hire for the judgment to deploy AI safely and the rigor to prove it worked — not for the longest list of models someone can name.
What is the single most important skill to insist on?
Insist on evaluation skill above all: the ability to measure whether an AI change actually helped. This is rarer and more valuable than the ability to build the change, because building is increasingly easy and measuring is still hard. A lead who can ship a model but cannot prove it improved anything will fill your roadmap with impressive-sounding work of unknown value.
Building is cheap; knowing if it worked is not
Anyone can wire up an AI feature in a weekend now. The expensive, scarce skill is designing the test that tells you whether the feature helped, hurt, or did nothing — the traffic slice, the control group, the metric that is not gameable. As we argued in [The critic gate matters more than the writer](/blog/the-critic-gate-matters-most), evaluation is where quality actually lives. The same is true of the person you hire.
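To make "the test that tells you whether the feature helped" concrete, here is a minimal sketch of one common approach: a two-proportion z-test comparing a control slice with a slice that received the AI feature. The `Arm` structure, the function name, and the traffic numbers are illustrative assumptions, not figures from a real experiment, and the success metric is whatever hard-to-game outcome you chose up front.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """One side of the experiment (hypothetical structure for illustration)."""
    users: int
    successes: int

    @property
    def rate(self) -> float:
        return self.successes / self.users


def two_proportion_z_test(control: Arm, treatment: Arm) -> tuple[float, float]:
    """Return (z, two-sided p-value) for treatment rate minus control rate."""
    total_users = control.users + treatment.users
    pooled = (control.successes + treatment.successes) / total_users
    # Standard error of the difference under the "no effect" hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (treatment.rate - control.rate) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 1,000 users per slice, success = the metric chosen in advance.
control = Arm(users=1000, successes=100)
treatment = Arm(users=1000, successes=130)
z, p = two_proportion_z_test(control, treatment)
print(f"lift={treatment.rate - control.rate:.3f} z={z:.2f} p={p:.3f}")
```

The arithmetic is the easy part. The judgment you are hiring for lies in choosing the slice, fixing the metric and time horizon before looking at results, and resisting the urge to stop the test the moment the numbers look good.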
The role owns safety, not just capability
A real AI lead owns the guardrails: rollback paths, observability, the ability to shut a system down cleanly when it misbehaves. This is the [reversibility discipline](/blog/three-rollbacks-none-ours) that makes fast iteration safe. A candidate who talks only about what they will build, and never about how they will contain it, is telling you they have not run AI in production.
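As a rough illustration of what a rollback path can look like in code, the sketch below wraps an AI call in a kill switch that watches a rolling window of outcomes and reverts to a fallback once the error rate crosses a threshold. Every name here (`KillSwitch`, `flaky_model`, `rule_based_fallback`) and both thresholds are hypothetical; a production system would also want alerting and a human-controlled override.

```python
from collections import deque
from typing import Callable


class KillSwitch:
    """Routes requests to an AI path until it fails too often, then to a fallback."""

    def __init__(
        self,
        ai_path: Callable[[str], str],
        fallback: Callable[[str], str],
        window: int = 20,
        max_error_rate: float = 0.25,
    ) -> None:
        self.ai_path = ai_path
        self.fallback = fallback
        self.max_error_rate = max_error_rate
        self.outcomes: deque[bool] = deque(maxlen=window)  # True = success
        self.tripped = False

    def call(self, request: str) -> str:
        if self.tripped:
            return self.fallback(request)
        try:
            result = self.ai_path(request)
            self.outcomes.append(True)
            return result
        except Exception:
            # Record the failure, check the window, and serve the fallback.
            self.outcomes.append(False)
            self._maybe_trip()
            return self.fallback(request)

    def _maybe_trip(self) -> None:
        # Only judge once the window is full, to avoid tripping on one early error.
        if len(self.outcomes) == self.outcomes.maxlen:
            error_rate = self.outcomes.count(False) / len(self.outcomes)
            if error_rate > self.max_error_rate:
                self.tripped = True

    def reset(self) -> None:
        """Manual re-enable after a human has reviewed the failure."""
        self.outcomes.clear()
        self.tripped = False


def flaky_model(request: str) -> str:
    raise RuntimeError("model unavailable")  # stand-in for a misbehaving system


def rule_based_fallback(request: str) -> str:
    return f"fallback answer for: {request}"


switch = KillSwitch(flaky_model, rule_based_fallback, window=4, max_error_rate=0.5)
for i in range(6):
    switch.call(f"request {i}")
print("tripped:", switch.tripped)
```

The design choice worth noticing is that the fallback is served on every failure, not only after the switch trips, so users never see the error; the trip simply stops the system from retrying a path that is known to be broken.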
What do most AI job descriptions get wrong?
Most over-index on model knowledge and under-index on judgment, change management, and measurement. They list frameworks and model names as requirements and omit the things that actually predict success: the ability to choose what not to automate, to manage the organizational change AI causes, and to prove impact with clean experiments.
- They demand model/framework familiarity that will be obsolete in a year, and ignore durable judgment.
- They ask for things to be built, but not for evidence that what was built actually worked.
- They forget the role is half change-management — AI reshapes how people work, and someone has to lead that.
- They confuse enthusiasm for AI with the discipline to deploy it responsibly.
What questions actually reveal a good candidate?
Two questions separate operators from hobbyists. Ask them, and listen for whether the answer is concrete or hand-wavy.
"How would you prove this AI system is working?" A strong candidate immediately reaches for measurement: a control, a metric that cannot be gamed, a time horizon, a way to attribute the effect. A weak candidate describes how impressive the output looks. The first is thinking about value; the second is thinking about demos.
"How would you shut it down?" A strong candidate has a rollback story ready — how to detect failure, how to revert cleanly, what the blast radius is. A weak candidate is surprised by the question. Anyone who has run AI in production has been burned and has built the off switch; anyone who has not, has not.
When should you hire a lead vs. buy an engine?
Hire a lead when AI is becoming core to how your product or operations work and you need someone owning that capability full-time. Buy an engine — or a managed system — when you need the outcomes of agentic AI without building and staffing the whole apparatus yourself. Many teams do both: a lead to own strategy, and a managed engine to handle the high-volume execution underneath them.
That is, in fact, the shape we see most often with clients: a small internal team setting direction and governing the brief, with the Avakata engine doing the relentless [volume work behind a critic](/blog/five-person-team-one-engine). If you are weighing the build-vs-buy line for your own AI roadmap, [book a discovery call](/contact.html) and we will talk through where the line sits for your team.