TL;DR. AI tool value is concentrated in the 5 to 15 percent of your people who actually extract it. Procurement-led tool selection optimizes for the median user, who is exactly where the leverage is not. The right move is the opposite of consolidation. Let your leveraged users pick. Fund what they already open. Standardize only after the market inside your walls has converged. This guide is the framework.
You have an AI tool. Probably more than one. There is a renewal coming up. There is a vendor on your calendar. There is an internal champion arguing for the platform you don’t have. There is a peer at another company who told you, in a tone that implied you were behind, that they “standardized on” something.
None of those conversations are improved by another vendor demo. The decision you actually have to make is older and simpler than the marketing makes it sound. Which tool, for which roles, at which tier. The reason it feels hard is that almost every method commonly used to make this decision is wrong.
How most orgs choose AI tools
The default playbook. Run an RFP. Build a feature matrix. Negotiate with the vendor that has the best partnership program. Standardize on one. Push it to everyone. Train everyone. Measure adoption. Move on.
This works for payroll. It does not work here. Payroll software is bought for the median user, who has to be paid the same way as every other user. AI tools are not used by the median user. They are used hard by a small fraction of your workforce, ignored by most of the rest, and the productivity gain shows up in the work the leveraged few are now able to do. Procurement logic optimizes for the wrong end of that distribution.
You can see the consequences in your own org. Your most senior leveraged user is paying for ChatGPT Plus on a personal card and pasting work into it because the approved tool is worse than the free one. Your engineers built their own internal Claude Code workflow because waiting for the procurement decision was going to cost them a quarter. Your marketing director runs a personal Claude account on the side and uses it for the work she actually wants to ship. The tools your org bought are not the tools your org’s leverage runs on.
This is not a discipline problem. It is a selection problem. The procurement process selected for the wrong qualities, in the wrong vendor, against the wrong success metric.
The common pattern
There are six versions of it and you have probably tried at least three.
“We already pay for Microsoft, so Copilot.” The most common one. About 65 percent of enterprises default to the incumbent on AI procurement. The data does not support this default. M365 Copilot has 16 million paid seats, which sounds large until you notice it is 3.9 percent of the M365 commercial base after two years on the market.1 The chat experience is the worst-rated of the four major vendors. When users at the same company are given a choice between ChatGPT, Copilot, and Gemini, they choose ChatGPT 76 percent of the time.2 About 44 percent of lapsed Copilot users say they stopped because they distrust the answers.3 You are buying the chat tool with the worst reputation in the market because the line item is already on the invoice.
“We ran an RFP.” RFPs select for the vendor that maps best to a fixed feature list. AI tool value is in qualities that do not fit RFP rows. How the model reasons. How it handles being interrupted and corrected. What its refusal behavior looks like. What it is like to keep the tool open all day for a month. None of these fit a checkbox. The RFP finalist is almost never the tool your power users will actually open.
“Let’s standardize on one vendor.” Standardization is a late-stage move. It assumes you already know which tool has won. In Q2 2026, you do not, because the market is stratifying, not consolidating. Anthropic leads coding. OpenAI leads chat preference. Microsoft leads embedded gains in spreadsheets and documents. Google leads Workspace-native shops with infrastructure ties. Standardizing on one means accepting the worst experience in three of the four categories your people work in.
“Let’s pick an agent platform.” Every major vendor shipped an enterprise agent platform on or around April 22.4 All of them want to be the layer your company builds custom agents on for the next decade. None of them are mature. Today’s agents fail 40 to 50 percent of the time on complex multi-application workflows.5 Picking a platform now is locking your company into one vendor’s failure modes for two years. Anyone selling you platform standardization in 2026 is selling you something.
“We hired a consultant to choose.” The consultant runs the RFP. See above.
“We’re letting it grow organically.” “Organically” usually means your people are pasting client data into free ChatGPT because the approved tool is worse than the free one. Shadow AI is not bottom-up success. It is top-down failure of provisioning. The fix is not to lock down the tools. It is to provide better tools than the ones people are sneaking around to use.
The thing all six have in common is that they treat AI tool selection as a procurement exercise to be done by the people who do procurement, on behalf of the people who use the tools. This is exactly the inversion that produces the wrong outcome.
Why this fails
Here is the underlying picture, in one paragraph.
Your AI productivity is bimodal. A small group of users (call it 5 to 15 percent, depending on the role mix) are getting 6 times the engagement and a meaningful multiple of the output. The rest of your seat pool produces something between zero and a small marginal gain. The aggregate productivity number is the average of those two populations. Procurement decisions made on the aggregate are, in effect, decisions made on behalf of the larger population, which is the population producing the smaller fraction of the value. The smaller population, the one producing most of the value, has very specific tool preferences. Override their preferences and you have not “standardized.” You have downgraded the only people whose output your AI program is actually moving.
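If the shape of that argument is easier to see in numbers, here is a toy version. Every figure is an assumption chosen to match the ranges above, not a measurement from any real deployment:

```python
# Toy model of a bimodal seat pool. Every number here is an
# illustrative assumption matched to the ranges in the text.
seats = 1000
power_share = 0.10    # assume 10% of seats are leveraged users
power_gain = 6.0      # assumed output multiple for the leveraged group
median_gain = 1.05    # assumed marginal gain for everyone else

power_seats = int(seats * power_share)
median_seats = seats - power_seats

# Gain above baseline, in arbitrary output units.
power_value = power_seats * (power_gain - 1.0)      # 100 seats * 5.0 = 500
median_value = median_seats * (median_gain - 1.0)   # 900 seats * 0.05 = 45
total = power_value + median_value

avg = (power_seats * power_gain + median_seats * median_gain) / seats
print(f"Aggregate multiplier: {avg:.2f}x")                            # ~1.5x
print(f"Share of gain from the top 10%: {power_value / total:.0%}")   # 92%
```

Under these assumptions, a decision keyed to the 1.5x aggregate is effectively a decision about the 900 seats contributing 8 percent of the gain.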
Recognizing Leverage covers the disposition that produces this distribution and the signals you can use to find the people in it. The corollary, and the subject of this guide: now that you know who produces the value, do not make purchasing decisions that ignore them.
Three concentric tiers
The framework is not “give everyone the same tool.” It is three rings, deliberately differentiated by who is in them and what they are getting.
Tier 1: what everyone gets
A baseline chat tool that everyone in the org has access to. Not whatever you already pay for. The one your most leveraged users actually open.
In most orgs in Q2 2026, that is ChatGPT Business or Claude Team. Both are around $25 to $30 per user per month.6 Both are dramatically better daily-driver chat tools than the M365 Copilot chat experience. Pick one. If you cannot pick one, give people both. The cost of running two baseline chat subscriptions is rounding error against the cost of the wrong one.
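If the two-subscription line draws a CFO objection, the arithmetic is short. A sketch with an assumed headcount; the per-seat price is the list pricing cited above:

```python
# Incremental cost of provisioning both baseline chat tools.
# Headcount is an assumption; the price is the cited Q2 2026 list price.
seats = 500
claude_team_annual = 25 * 12   # ~$25/seat/month, annualized

second_tool = seats * claude_team_annual
print(f"Incremental cost of the second tool: ${second_tool:,}/year")  # $150,000
```

Set that against what a single wrong standardization costs the leveraged users described below, and "rounding error" is not hyperbole.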
What this tier is for: drafting, brainstorming, analysis, the daily knowledge-work tasks where chat is the interface. It is not for embedded document gains. It is not for coding. Those are different tiers.
A note on disposition. Most of the people in Tier 1 will never become heavy users. The point of provisioning a good baseline chat tool is not to convert the broad middle into power users. It is to get the broad middle out of free ChatGPT, and to make sure the dispositionally inclined who have not yet been spotted have a real tool waiting when they start to use it.
Tier 2: what specific roles get
Beyond the baseline, specific roles get specific tools, chosen by the people doing the role, not by procurement.
Engineering gets a coding agent. Claude Code, Codex, Cursor, GitHub Copilot. The choice belongs to the engineers, not to the CIO. The tools are stratifying by preference and the right answer is to fund whichever the team converges on, often more than one. Coding agent budget is the single highest-ROI line item in any AI program in Q2 2026. Do not ration it. Claude Code alone is at $2.5 billion in annualized revenue because it works.7
Doc-heavy roles (finance modeling in Excel, contract drafting in Word, ops in spreadsheets) get M365 Copilot, or Gemini in Workspace, depending on which suite they live in. The embedded features are real. Financial modeling is 30 to 40 percent faster. Document drafting is 50 to 60 percent faster.8 This is what Copilot is actually for. The chat experience is bad. That is what Tier 1 exists for.
Customer support gets the support-specific AI built into their platform, not a chat tool retrofitted into a help desk. Zendesk AI, Intercom Fin, Salesforce Agentforce as it matures. The integration with ticket history and knowledge base is the value. A bare chat tool does not have either.
Sales gets call-prep and account-research workflows built on top of Tier 1, plus whatever their CRM ships. Most “sales AI” is repackaged chat with a vector DB of LinkedIn profiles attached. Tier 1 plus a thirty-minute prompt-template session does most of the same job for none of the markup.
Legal gets the baseline plus a contract-review specialist (Harvey, Spellbook, others) once the volume justifies the cost. Below a certain document throughput, the specialist is overpriced for the use; above it, the specialist pays for itself in associate hours saved.
The pattern across roles is the same. The embedded or specialty tool wins where the workflow is fixed and the integration is the value. The chat tool wins where the work is open-ended.
Tier 3: what power users get
Stop rationing. The 5 to 15 percent producing the leverage should have whatever they ask for.
Multiple chat subscriptions, because Claude and ChatGPT are good at different things and a power user who knows both will use both. Direct API budget for whatever they are building. Cursor or Codex Pro at $200 a month for the engineers who actually live in those tools. The specialty tool they identified at the last bake-off. A second seat of something for the prototype they want to run in parallel.
The arithmetic is not subtle. A leveraged user is producing somewhere between $200,000 and $2 million of marginal output per year, depending on role and seniority. Their tool budget is at most a few thousand dollars. If you are negotiating with them about whether they can have Cursor and Claude Code, you have misallocated leadership attention by an order of magnitude.
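Worked through, using the output range above and an assumed ceiling on the tool budget:

```python
# Ratio of a leveraged user's marginal output to their tool budget.
# The output range is from the text; the $5,000/year budget is an
# assumed ceiling (several subscriptions plus direct API spend).
output_low, output_high = 200_000, 2_000_000   # $/year
tool_budget = 5_000                            # $/year, assumed

print(f"Output-to-budget ratio: {output_low // tool_budget}x "
      f"to {output_high // tool_budget}x")     # 40x to 400x
```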
This tier is also where you find out which tool is winning inside your walls, which is the input to standardization later. Power users will gravitate toward what works. Watch the gravitation. Do not interfere with it.
The four serious vendors
One paragraph each. No matrix. The market has stratified enough that “which one” is no longer the right question. “Which one for what” is.
Anthropic (Claude). Buy for coding agents (Claude Code is the proven force multiplier in the market right now), analytical depth, and any work where reasoning quality matters more than breadth. Best technical reputation. Most enterprise-friendly procurement, especially after the safety story. The April 2026 enterprise pricing shift to $20 a seat plus usage-based consumption changes the math.9 Heavy users now cost more, light users almost nothing; the breakeven is sketched after these four writeups. If your usage skews heavy, Anthropic is more expensive than the headline number; if it skews light, the opposite. Do not skip for image generation, multimodal breadth, or “the default tool people will pick without being told.” Those are not Anthropic’s strengths.
OpenAI (ChatGPT, Codex). Buy for the default chat tool your people will open without prompting. The most comfortable interface for non-technical users. Image generation. Multimodal breadth. Codex for the engineering teams that prefer it, mostly for the desktop integration and Chronicle context. Do not buy as your sole tool for the deepest reasoning work; Claude still slightly leads the relevant benchmarks as of Q2 2026. Workspace Agents, OpenAI’s answer to Cowork, runs in the cloud and keeps working when nobody is watching. Pilot it on low-stakes async work. Do not commit to it as a platform.
Microsoft (Copilot). Buy for the embedded features in Excel, Word, and Outlook, where the gains are large and measurable. Buy GitHub Copilot for engineering teams that have not picked something else (many have, and the ones that have are right). Do not buy for chat. The Copilot chat experience is the worst of the four majors and your users will tell you so if you ask. Do not make M365 Copilot the only AI provision in your org. It is a complement to a chat tool, not a substitute for one.
Google (Gemini). Buy for Workspace-native shops where everyone already lives in Docs, Sheets, and Gmail. Buy for infrastructure-heavy organizations with Google Cloud as their primary platform. The new Gemini Enterprise Agent Platform supports Anthropic’s Claude models alongside Gemini, which is a useful signal about where Google thinks the market is going. Do not buy as your coding tool; Gemini is a distant third or fourth behind Claude Code, Codex, and Cursor. Do not buy as the consumer-facing chat your users will pick on their own; they will pick ChatGPT.
The four are stratifying, not consolidating. You do not have to pick one. You do have to know which tier each one is in, and why.
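On the Anthropic pricing shift flagged above, the breakeven is worth computing before the renewal call. A minimal sketch using the footnoted numbers; the usage profiles are assumptions to replace with figures from your own logs:

```python
# Old flat Anthropic pricing vs. the April 2026 model (footnote 9).
# The three usage profiles below are assumptions, not Anthropic data.
OLD_FLAT = 200   # $/seat/month under the old flat model
NEW_BASE = 20    # $/seat/month base under the new model

print(f"A seat costs more than before once metered usage passes "
      f"${OLD_FLAT - NEW_BASE}/month")

for profile, usage in [("light", 5), ("median", 60), ("heavy", 400)]:
    cost = NEW_BASE + usage
    verdict = "cheaper" if cost < OLD_FLAT else "pricier"
    print(f"{profile} user: ${cost}/month ({verdict} than the old flat rate)")
```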
The Microsoft special case
Almost every reader of this guide has the Microsoft problem.
You bought M365 Copilot for everyone. Most of the seats are idle. The renewal is coming up. Your Microsoft rep is pitching you Copilot Studio and trying to get you to standardize on the agent platform.
The framework, in five steps, with a sketch of the seat arithmetic after the list:
- Pull the seat-level usage report. Identify the bottom 30 percent.
- Of the active users, distinguish the embedded-feature users (Excel, Word, Outlook) from the chat users.
- Keep Copilot for the embedded-feature users. Cancel for everyone else.
- Redirect the saved budget to ChatGPT Business or Claude Team for the chat use case.
- Tell your Microsoft rep your renewal will be smaller this year, and that you would be open to a re-pitch when chat NPS goes positive.
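The seat math behind steps one through four, sketched. The export filename and column names are hypothetical; a real M365 usage report is shaped differently, so treat this as the shape of the analysis rather than a drop-in script:

```python
import csv
from collections import Counter

COPILOT_SEAT_COST = 30 * 12   # assumed $/seat/year for M365 Copilot

buckets = Counter()
with open("copilot_usage.csv") as f:   # hypothetical per-seat export
    for row in csv.DictReader(f):
        embedded = int(row["excel_word_outlook_actions"])  # hypothetical column
        chat = int(row["copilot_chat_sessions"])           # hypothetical column
        if embedded == 0 and chat == 0:
            buckets["idle (cancel)"] += 1
        elif embedded >= chat:
            buckets["embedded (keep Copilot)"] += 1
        else:
            buckets["chat-first (move to the Tier 1 tool)"] += 1

for bucket, n in buckets.most_common():
    print(f"{bucket}: {n} seats")

# Cancelled seats are pure savings; chat-first seats swap one
# subscription for another at roughly comparable list price.
savings = buckets["idle (cancel)"] * COPILOT_SEAT_COST
print(f"Annual savings from cancellations: ${savings:,}")
```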
This is a defensible move. The data supports it. Most CFOs will accept it once they see the usage report. Your Microsoft rep will not be happy. That is information about whose interest the rep was representing. The mechanics of the audit itself — pulling the seat report, what to look at, how to defend the cuts — are in Evaluating Spend.
The harder version of this conversation is at the executive level, where someone made the original Copilot decision and may not enjoy revisiting it. The right framing there is not “we got it wrong.” It is “the data we did not have at the time of the decision is now available, and the cost of not acting on it is X dollars per quarter.” That is a defensible move at any altitude.
The agent platform question
Every vendor wants you to commit to their agent platform. OpenAI’s Workspace Agents. Google’s Gemini Enterprise Agent Platform. Microsoft’s expanded Copilot Studio. Salesforce Agentforce in partnership with Google. The April 22 launches were a coordinated land grab.
The right move in Q2 2026 is to pilot, not to pick.
Three reasons. First, today’s agents fail on 40 to 50 percent of complex cross-application workflows. Picking a platform locks you into one vendor’s failure modes when you do not yet know which failure modes are which. Second, the platforms are all pre-1.0 in maturity. Whichever one you pick now, you will probably rebuild on a different one within 18 months. Third, and most importantly, the valuable artifact is not the platform choice. It is the muscle of specifying agent workflows: what you want the agent to do, what the success criteria are, what the escalation path looks like when it fails. That muscle transfers across platforms. The platform itself is a commodity that has not yet commoditized.
What to do instead. Pick two or three low-stakes workflows. Status reports. Expense categorization. Internal data lookups. Run each on a different vendor’s agent platform. Measure failure rates. Keep notes. Re-evaluate quarterly.
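One way to keep the quarterly notes honest is a flat ledger of runs that the failure rate falls out of. A minimal sketch; the platform and workflow names simply echo the examples above:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Run:
    platform: str    # which vendor's agent platform ran it
    workflow: str    # e.g. "status report", "expense categorization"
    succeeded: bool
    notes: str = ""  # what went wrong, for the quarterly review

runs = [
    Run("Copilot Studio", "status report", True),
    Run("Copilot Studio", "status report", False, "pulled a stale data source"),
    Run("Workspace Agents", "expense categorization", True),
    Run("Gemini Enterprise", "internal data lookup", False, "queried the wrong table"),
]

tally = defaultdict(lambda: [0, 0])   # platform -> [failures, total runs]
for r in runs:
    tally[r.platform][1] += 1
    tally[r.platform][0] += 0 if r.succeeded else 1

for platform, (failed, total) in tally.items():
    print(f"{platform}: {failed}/{total} runs failed ({failed / total:.0%})")
```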
If your CIO is being told to “pick a platform” by Q3, the answer is no, and the reason to give is that the platform race is open and the cost of being wrong is higher than the cost of waiting two more quarters.
When to standardize
Standardization is not bad. Premature standardization is bad.
The signal that says you are ready to standardize: power users from different starting tools have converged on the same one without being told to. Three engineers picked Cursor on their own. Two product managers are quoting Claude in Slack. The marketing team’s leveraged user is forwarding ChatGPT outputs to her director. When two or three of your highest-leverage users in a role are independently picking the same tool, that tool has won inside your walls. Then standardize. Negotiate the enterprise contract. Roll it out broadly. Train the broad middle on a tool that has already proved itself.
The reverse, picking the tool first and trying to make it the one everyone converges on, is the procurement default. It does not work in this market. The market is moving too fast and the user preferences are too strong.
Until you see convergence, allowing two or three tools is not chaos. It is information gathering at low cost. The information you are gathering is which tool wins inside your walls, which is the only basis for a defensible standardization decision later.
The heuristic
Five rules a director can screenshot.
- Buy the tool your leveraged users already open. Not the one already in your contract.
- Run two or three tools, not one. Standardize only after you see convergence.
- Stop rationing your top 15 percent. Their tool budget is rounding error against their output.
- Cut idle seats every quarter. Redirect the savings to tools people actually use.
- Pilot agent platforms, do not pick one. The platforms are not mature. The workflow muscle is what carries.
If your AI tool spend is defensible against these five rules, the rest of the program has space to work. If it is not, the renewal is the moment to fix it.
What to do Monday morning
One thing.
Before your next AI vendor renewal, ask the three most leveraged AI users you can name which two tools they would pick if it were their decision. Not “which tool do you like.” Which two would you pick. The framing matters, because you are looking for the tools that survive an honest comparison, not the tools that have a small advantage in one demo.
Buy those two. Cancel the seats for the tool they did not pick. Do not negotiate. Do not pilot it for another quarter. The data is in your power users, and you just collected it.
If you cannot name three leveraged users, you have a different problem. Recognizing Leverage is where to start.
Footnotes
1. Microsoft Q3 FY26 earnings disclosure, last verified April 2026.
2. Internal study of mixed-tool deployments at Fortune 500 companies, last verified April 2026.
3. Redress Compliance enterprise Copilot survey, January 2026.
4. OpenAI Workspace Agents, Google Gemini Enterprise Agent Platform, and Salesforce Agentforce announcements, April 22, 2026.
5. MacStories independent testing of Cowork; comparable failure rates reported across other platforms in early-deployment data, April 2026.
6. ChatGPT Business and Claude Team list pricing, last verified April 2026. Subject to vendor changes.
7. Anthropic public revenue disclosure, April 2026.
8. Microsoft customer case studies on financial modeling and document drafting, validated against independent enterprise pilot data, last verified April 2026.
9. Anthropic enterprise pricing change, effective April 2026. The model moved from a flat $200 per seat to $20 per seat plus usage at standard API rates.