Verification Is the Bottleneck

Pick any workflow in your business that AI has measurably compressed in the last two years. Code review. Document extraction. Customer ticket triage. First-draft contract markup. Look at what they share. It is not the model. It is not the vendor. It is not how much training the team got.

What they share is that somebody, often years before any AI conversation, made it cheap to check whether the output was correct.

This is the most useful predictor I can give you for which parts of your business are about to compress and which are going to feel stuck no matter how much you spend. It is also the explanation for the pattern your champions have been showing you and the pattern your laggards keep blaming on the tool.

Where AI eats, and why

The list of domains AI has visibly conquered is shorter than the marketing suggests and longer than the skeptics admit. Coding. Frontend work in particular. Translation between known formats. Document extraction against a schema. Triage. Forecasting against ground truth. On the research side, chess, Go, protein folding, and math olympiad problems.

Each of these has the same hidden property. There is a fast, mechanical, cheap answer to “is this correct.” The compiler accepts the code or it doesn’t. The schema validates or it doesn’t. The customer comes back angry or they don’t. The forecast either matches the eventual number or it embarrasses you. Every one of these workflows had a verifier sitting in the environment before AI showed up, and the verifier is what the model attaches to.

The technical name for this is the asymmetry of verification. Some tasks take far less effort to check than to solve. Sudoku is the canonical case. A LeetCode problem with a comprehensive test suite is the same shape. So is “find a smaller layout that fits these constraints” when you have a script that scores the layout. Where the asymmetry holds, models get sharp quickly, because they can be wrong a million times overnight and converge on the right answer by morning. Where it doesn’t, they get fluent. Different thing.[1]
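To make the asymmetry concrete, here is a minimal Python sketch of a Sudoku checker. The function name and the demo grids are illustrative and are not taken from the cited sources. Checking a filled grid takes 27 set comparisons, while producing a valid grid from a partial puzzle generally requires search.

```python
def is_valid_sudoku_solution(grid):
    """Return True if grid is a completed 9x9 Sudoku with no rule violations."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    return all(group == digits for group in rows + cols + boxes)


# A valid grid built from a standard shifting pattern (demo data only).
SOLVED = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Swap two cells in the first row: the row still looks fine, the columns do not.
BROKEN = [row[:] for row in SOLVED]
BROKEN[0][0], BROKEN[0][1] = BROKEN[0][1], BROKEN[0][0]

print(is_valid_sudoku_solution(SOLVED))  # True
print(is_valid_sudoku_solution(BROKEN))  # False
```

The checker never needs to know how the grid was produced, which is what lets it sit behind any number of attempts.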

The reader version of this is even simpler. Frontend code dominates AI coding success stories because verification is literally looking at the page. If it’s wrong, you see it.[2] The check costs a glance.

Once you have the lens, the pattern across your own organization stops being mysterious. The places AI feels magical are places where verification was already solved or fell out for free. The places it feels like a toy are places where the only verifier is a senior person reading carefully on a Tuesday.

Where you get vibes instead

The flip side is uglier and more common, and it is most of what an executive sees in their inbox.

Strategy decks. Marketing copy. Performance reviews. Threat models. Most consulting deliverables. Anything whose correctness lives in a reader’s judgment, hours or days after the work was produced. The model writes these confidently and quickly. You cannot tell, in the moment, whether what you’re reading is right or merely plausible, because the verifier is “an experienced human reading carefully,” and that verifier doesn’t scale and isn’t always available.

This is also where long-horizon agents quietly fall apart. Each step looks fine in isolation. The composite is subtly off, and nobody catches it until much later, when the cost of catching it is much higher. The loop didn’t fail because the model is dumb. It failed because nothing in the environment was capable of saying “no” at step three. The wrong-answer failure mode in Managing Risk is the operational version of the same point.
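One way to picture an environment that can say “no” at step three is a pipeline in which every step carries its own check. The sketch below is a toy under stated assumptions: the step names, the `run_with_checks` helper, and the deliberately buggy third step are all hypothetical, and a real agent loop would be far messier.

```python
class StepRejected(Exception):
    """Raised when a step's output fails its check."""


def run_with_checks(state, steps):
    """Run (name, act, check) steps in order, stopping at the first failed check."""
    for name, act, check in steps:
        state = act(state)
        if not check(state):
            raise StepRejected(f"step '{name}' failed its check")
    return state


def parse(state):
    return {**state, "amounts": [int(x) for x in state["raw"]]}


def keep_positive(state):
    return {**state, "amounts": [a for a in state["amounts"] if a > 0]}


def buggy_total(state):
    # Deliberate bug for the demo: silently drops the last amount.
    return {**state, "total": sum(state["amounts"][:-1])}


BUGGY_STEPS = [
    ("parse", parse, lambda s: len(s["amounts"]) == len(s["raw"])),
    ("filter", keep_positive, lambda s: all(a > 0 for a in s["amounts"])),
    ("total", buggy_total, lambda s: s["total"] == sum(s["amounts"])),
]

try:
    run_with_checks({"raw": ["10", "-2", "5"]}, BUGGY_STEPS)
except StepRejected as err:
    print(err)  # step 'total' failed its check
```

Without the third check, the run would return a total of 10 instead of 15, and nothing downstream would have a reason to question it.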

A useful heuristic. If you and a competent colleague would disagree on whether the output is correct, and the disagreement would take a meeting to resolve, you do not have a verifier. You have taste. Taste is real and it matters and it will not run a million times overnight.

Why your champions found the workflows they found

Look back at the people in your organization who are operating at AI leverage. These are the ones from Recognizing Leverage: the engineer shipping at the pace of a small team, the finance lead who collapsed the close, the marketer who runs the campaign without the agency.

They didn’t pick those workflows by accident. They picked them, often without being able to articulate why, because each one had a verifier sitting at the end of it. The pull request runs in CI. The reconciliation either ties out or it doesn’t. The campaign metric reports back in a week. Your champion intuited the asymmetry. They walked toward the work where being wrong was cheap and stayed away from the work where it wasn’t.

This is also why the broad-middle adoption problem in Driving Adoption is harder than it looks. The “workflow fit” gap that keeps half your seats idle isn’t really about training. It’s about the fact that your champions found the verifiable workflows first, and the workflows left over for the broad middle are the ones where verification is expensive. Telling those people to “use AI more” without redesigning the work to add a verifier is asking them to ship into a void. They correctly conclude the tool doesn’t help and stop opening it.

The fix isn’t a curriculum. The fix is to design the verifier into the workflow before you ask the team to use AI on it.

The work is designing the verifier

Once verification is the bottleneck, the interesting question stops being “can AI do this task.” It becomes “can I design a cheap, trustworthy way to check this task.” Get that right and the automation question answers itself.

Designing a verifier is rarely a technical exercise. It is a definitional one. Underwriting has a verifier: did the loan perform. Sales has a verifier: did the deal close. Customer support has several: was the ticket resolved, did the customer come back, did the refund get issued. Manufacturing has a verifier in the literal sense, sitting at the end of the line. The processes in your business that already have one of these are the processes AI is about to compress. The processes that don’t, the ones whose quality is whatever a senior person says it is on a given Tuesday, are the ones that are going to feel stuck.

The instinct will be that the model isn’t smart enough yet. The actual problem is that the work has never been defined sharply enough for anything, human or otherwise, to be measured against it.

This is a tractable problem. Most of the time, the verifier exists implicitly and just hasn’t been made explicit. What does a good version of this output look like? What would make a reviewer reject it? How would we know, six weeks later, whether this decision was the right one? Answering those questions is unglamorous work, and it is the work that determines whether AI does anything for you. Tim Ariyeh’s piece on what we automate runs the same argument back to double-entry bookkeeping in 1494, which is roughly when the modern version of this principle was first written down.

Something to carry

Pick the workflow in your organization that you most want AI to compress. The one you’ve been gesturing at in leadership meetings. The one a vendor demo made you optimistic about.

Before you spend another dollar on tools or training for it, write down two sentences. The first is what a correct output of this workflow looks like, in terms specific enough that two people on your team would agree on whether a given output meets the bar. The second is how you would know, mechanically and within hours rather than weeks, whether a given output was correct.

If you can write both sentences, the workflow is ready to compress and your champion can have it on a thirty-day clock. If you can’t write the second one, the work in front of you isn’t an AI project. It is a verifier project.
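As a rough illustration of what the second sentence can become once it is written down, take the reconciliation that either ties out or doesn’t. The sketch below assumes a hypothetical `ties_out` function, made-up ledger figures, and a tolerance of zero. It is one possible shape for a mechanical check, not a prescription for how a finance team should build one.

```python
from decimal import Decimal


def ties_out(ledger_entries, bank_balance, tolerance=Decimal("0.00")):
    """Return True if the ledger entries sum to the bank balance within tolerance."""
    # Decimal avoids the rounding surprises of binary floats on currency amounts.
    ledger_total = sum((Decimal(x) for x in ledger_entries), Decimal("0"))
    return abs(ledger_total - Decimal(bank_balance)) <= tolerance


entries = ["100.10", "200.20", "-50.30"]  # hypothetical ledger lines

print(ties_out(entries, "250.00"))  # True
print(ties_out(entries, "250.01"))  # False
```

The check runs in milliseconds and two people cannot disagree about its answer, which is the bar the two sentences are meant to set.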

Footnotes

  1. Jason Wei, “Asymmetry of verification and verifier’s law”.

  2. Alperen Keles, “Verifiability is the limit”.