Verification Is the Bottleneck

Pick any workflow in your business that AI has measurably compressed in the last two years. Code review. Document extraction. Customer ticket triage. First-draft contract markup. Look at what they share. It is not the model. It is not the vendor. It is not how much training the team got.

What they share is that someone, often years before any AI conversation, made it cheap to find out whether the output was correct.

This is the most useful predictor I can give you for which parts of your business are about to compress and which are going to feel stuck no matter how much you spend. It is also the explanation for the pattern your champions have been showing you and the pattern your laggards keep blaming on the tool.

Where AI is winning, and why

The list of work AI has visibly taken over is shorter than the marketing suggests and longer than the skeptics admit. Coding, especially anything with a UI you can look at. Translation between known formats. Extracting structured data from messy inputs. First-pass triage. Forecasting, where the answer eventually comes back and tells you whether you were right. On the research side: chess, Go, protein folding, math olympiad problems.

Run your finger down the list and the pattern stops looking like a vendor story and starts looking like a property of the work itself. Each one of these has a fast, mechanical answer to “did the model get it right.” A compiler accepts the code or rejects it. The schema validates or it doesn’t. The customer comes back angry or they don’t. The forecast either matches the eventual number or it embarrasses you.

That property has a name. Researchers call it the asymmetry of verification: tasks where checking the answer is dramatically cheaper than producing it.¹ Sudoku is the canonical case, but the executive version is more useful. A reconciliation that ties out is a verifier. A test suite that goes red is a verifier. A customer satisfaction signal a week later is a verifier. Anywhere this property holds, models get sharp fast, because they can be wrong a thousand times before lunch and still converge on a working answer by the end of the day. Anywhere it doesn’t, they get fluent. Polished output, with no machinery to tell you whether it’s right.
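To make the idea concrete: a verifier is nothing more than a cheap, mechanical pass/fail check. The sketch below, using an invented invoice schema purely for illustration, shows how little machinery one requires when the work is extracting structured data from messy inputs.

```python
# A verifier is any cheap, mechanical pass/fail check. This sketch uses
# an invented invoice schema (illustration only) to check whether a
# model's extraction is well-formed -- the kind of check that can run
# a thousand times before lunch.

REQUIRED_FIELDS = {"invoice_id": str, "total": float, "currency": str}

def verify_invoice(extracted: dict) -> list[str]:
    """Return a list of failures; an empty list means the output passes."""
    failures = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in extracted:
            failures.append(f"missing field: {field}")
        elif not isinstance(extracted[field], expected_type):
            failures.append(f"wrong type for {field}")
    if not failures and extracted["total"] < 0:
        failures.append("total must be non-negative")
    return failures

print(verify_invoice({"invoice_id": "A-17", "total": 99.5, "currency": "USD"}))  # []
print(verify_invoice({"invoice_id": "A-18", "total": -4.0}))  # failures listed
```

The point is not this particular code; it is that the check is unambiguous, runs in milliseconds, and can sit as a gate inside a loop, which is exactly what the fluent-but-unverified work lacks.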

The version of this principle that is most visible in your org sits in the engineering team. Frontend work dominates the AI coding success stories because verification is literally looking at the page.² A glance is the entire check. That is why your engineers got there first, and why some of them are now doing the work of three.

Where you get vibes instead

The flip side is uglier and more common, and it is most of what hits an executive’s inbox.

Strategy memos. Marketing copy. Performance reviews. Vendor RFP responses. Most consulting deliverables. Board narratives. Anything whose correctness lives in the reader’s head, days or weeks after the work was produced. The model writes this work quickly and with conviction. The problem is that you cannot, sitting at your desk on a Wednesday afternoon, tell whether the document in front of you is right or merely well-arranged. The check requires a senior person reading slowly with the right context loaded. That check doesn’t scale and it isn’t always available, which is why the work piles up at the top of the org and bottlenecks there.

This is also where the long-horizon agent demos quietly fall apart in production. Each step looks reasonable in isolation. The chain ends up subtly off, and the error doesn’t surface until much later, when undoing it is expensive. The agent didn’t fail because the model was dumb. It failed because nothing in the loop was capable of saying “no” at step three, and the loop had no way to feel the wrongness as it accumulated. The wrong-answer failure mode in Managing Risk is the operational version of the same point.

A working heuristic: if a meeting about a decision would get heated, you don’t have a verification flow; you have a judgement call. Judgement calls exist, and they matter, but you can’t automate them.

Why your champions found the workflows they found

Look back at the people in your organization who are operating at AI leverage. The ones from Recognizing Leverage. The engineer shipping at the pace of a small team. The finance lead who collapsed the close. The marketer who runs the campaign without the agency.

They didn’t pick those workflows by accident. They picked them, often without being able to articulate why, because each one had a mechanical check sitting at the end of it. The pull request runs in CI. The reconciliation either ties out or it doesn’t. The campaign metric reports back inside a week. Your champion intuited the asymmetry. They walked toward the work where being wrong was cheap and stayed away from the work where it wasn’t.

This is also why the broad-middle adoption problem in Driving Adoption is harder than the org chart makes it look. The “workflow fit” gap that keeps half your seats idle isn’t really about training. It’s about who got the verifiable workflows first. Your champions took those, and what’s left for the broad middle is the work where checking the answer is itself the hard part. Telling those people to “use AI more” without redesigning the work to add a check is asking them to swing at air. They correctly conclude the tool doesn’t help and stop opening it.

The fix isn’t a curriculum. The fix is to put a verifier into the workflow before you ask the team to use AI on it.

The work is designing the check

Once verification is the bottleneck, the interesting executive question stops being “can AI do this.” It becomes “can we define this work sharply enough that something, model or human, can be measured against it.” Get that right and the automation question collapses into an implementation detail.

Designing the check is rarely a technical exercise. It’s a definitional one, and most of the time it has nothing to do with AI. Underwriting already has a check: did the loan perform. Sales has one: did the deal close. Customer support has several: did the ticket get resolved, did the customer come back, did the refund go out. Manufacturing has one in the literal sense, sitting at the end of the line. The work in your business that already carries a check like this is the work AI is about to compress. The work that doesn’t, the work whose quality is whatever a senior person says it is on a given Tuesday, is the work that’s going to feel stuck.
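The checks listed above share one shape: a recorded decision joined against an outcome that arrives later. A minimal sketch of that shape, with invented field names standing in for whatever your systems actually record:

```python
# The checks above ("did the loan perform", "did the deal close") share
# one shape: a recorded decision joined against the outcome that
# eventually arrives. Names and data here are invented for illustration.

def score_decisions(decisions: dict[str, str], outcomes: dict[str, str]) -> float:
    """Fraction of settled decisions whose recorded call matched the outcome."""
    settled = [d for d in decisions if d in outcomes]
    if not settled:
        return 0.0
    correct = sum(1 for d in settled if decisions[d] == outcomes[d])
    return correct / len(settled)

loans = {"L1": "approve", "L2": "approve", "L3": "decline"}
performed = {"L1": "approve", "L2": "decline"}  # L3 has not settled yet
print(score_decisions(loans, performed))  # 0.5
```

If your business already records both sides of this join, the verifier exists and the work is wiring it up. If it records only the decision, the definitional work described below is what is actually missing.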

The instinct will be to wait for a smarter model. That is almost never the actual constraint. The actual constraint is that the work has never been pinned down clearly enough for anything to be measured against it.

This is fixable, and it is much closer to a leadership exercise than a technical one. In most cases the check exists implicitly, in the head of the person who would reject the work in review, and the job is to get it written down. What does an acceptable output look like, specifically enough that two people on the team would agree? What would make a reviewer hand it back? How would we know, six weeks later, whether this was the right call?

Something to carry

Pick the workflow in your organization that you most want AI to compress. The one you’ve been gesturing at in leadership meetings. The one a vendor demo made you optimistic about.

Before you spend another dollar on tools or training for it, write two sentences. The first is what a correct output looks like, in terms specific enough that two of your people would agree on whether a given output meets the bar. The second is how you would know, mechanically and within hours rather than weeks, whether a given output was correct.

If you can write both, the workflow is ready to compress and your champion can have it on a thirty-day clock. If you can’t write the second one, the work in front of you isn’t an AI project. It’s a verifier project.

Footnotes

  1. Jason Wei, “Asymmetry of verification and verifier’s law”.

  2. Alperen Keles, “Verifiability is the limit”.