TL;DR. AI tool value is concentrated in the 5 to 15 percent of your people who actually extract it. Procurement-led tool selection optimizes for the median user, who is exactly where the leverage is not. Split the decision. Standardize the durable foundation (the API, the agent platform, the internal tooling you want to outlast this year’s model) on Anthropic, which is now the enterprise default. Then let the chat layer stratify: your leveraged users pick the chat tab they open, and you fund what they already use. Standardize the rest only after the market inside your walls has converged. This guide is the framework.
You have an AI tool. Probably more than one. There is a renewal coming up. There is a vendor on your calendar. There is an internal champion arguing for the platform you don’t have. There is a peer at another company who told you, in a tone that implied you were behind, that they “standardized on” something.
None of those conversations are improved by another vendor demo. The decision you actually have to make is older and simpler than the marketing makes it sound. Which tool, for which roles, at which tier. The reason it feels hard is that almost every method commonly used to make this decision is wrong.
Why the procurement playbook misfires
The default playbook for a SaaS purchase. Run an RFP. Build a feature matrix. Negotiate with the vendor with the best partnership program. Standardize on one. Push it to everyone. Train everyone. Measure adoption. Move on.
This works for payroll. It doesn’t work here. Payroll software is bought for the median user, who has to be paid the same way as every other user. AI tools aren’t used by the median user. They’re used hard by a small fraction of your workforce, ignored by most of the rest, and the productivity gain shows up in the work the leveraged few are now able to do. Procurement logic optimizes for the wrong end of that distribution.
You can already see this in your own org. Your most senior leveraged user is paying for ChatGPT Plus on a personal card and pasting work into it because the approved tool is worse than the free one. Your engineers built their own internal Claude Code workflow because waiting for the procurement decision was going to cost them a quarter. Your marketing director runs a personal Claude account on the side and uses it for the work she actually wants to ship. The tools your org bought are not the tools your org’s leverage runs on.
This isn’t a discipline problem. It’s a selection problem. The procurement process selected for the wrong qualities, in the wrong vendor, against the wrong success metric.
The common pattern
There are six versions of it, and you have probably tried at least three.
“We already pay for Microsoft, so Copilot.” The most common one. About 65 percent of enterprises default to the incumbent on AI procurement. The data does not support this default. M365 Copilot has 16 million paid seats, which sounds large until you notice it is 3.9 percent of the M365 commercial base after two years on the market.[1] The chat experience is the worst-rated of the four major vendors. When users at the same company are given a choice between ChatGPT, Copilot, and Gemini, they choose ChatGPT 76 percent of the time.[2] About 44 percent of lapsed Copilot users say they stopped because they distrust the answers.[3] You are buying the chat tool with the worst reputation in the market because the line item is already on the invoice.
“We ran an RFP.” RFPs select for the vendor that maps best to a fixed feature list. AI tool value is in qualities that do not fit RFP rows. How the model reasons. How it handles being interrupted and corrected. What its refusal behavior looks like. How it feels to keep open all day for a month. None of these fit a checkbox. The RFP finalist is almost never the tool your power users will actually open.
“Let’s standardize on one vendor for everything.” Standardizing the whole stack on one name is a late-stage move, and no single all-purpose winner has emerged. But the picture has sharpened since the start of the year. The durable foundation has consolidated: for anything you mean to build and keep (the API, the agent platform, internal tooling, custom workflows), Anthropic is now the enterprise default. The chat layer is what still stratifies. OpenAI leads chat preference. Microsoft leads embedded gains in spreadsheets and documents. Google leads Workspace-native shops with infrastructure ties. So standardize the foundation, and let the chat tab be a preference. Forcing one vendor across both means accepting the worst experience in most of the categories your people work in. The full picture is in The State of AI: Q3 2026.
“Let’s pick an agent platform.” Every major vendor shipped an enterprise agent platform on or around April 22, 2026.[4] All of them want to be the layer your company builds custom agents on for the next decade. None of them are mature. Today’s agents fail 40 to 50 percent of the time on complex multi-application workflows.[5] Picking a platform now is locking your company into one vendor’s failure modes for two years. Anyone selling you platform standardization in 2026 is selling you something.
“We hired a consultant to choose.” The consultant runs the RFP. See above.
“We’re letting it grow organically.” Organically usually means your people are pasting client data into free ChatGPT because the approved tool is worse than the free one. Shadow AI is not bottom-up success. It is top-down failure of provisioning. The fix is not to lock down the tools. It is to provide better tools than the ones people are sneaking around to use.
The thing all six have in common is that they treat AI tool selection as a procurement exercise to be done by the people who do procurement, on behalf of the people who use the tools. This is exactly the inversion that produces the wrong outcome.
Why it doesn’t work
Here is the underlying picture, in one paragraph.
Your AI productivity is lopsided. A small group of users (call it 5 to 15 percent, depending on the role mix) is getting 6x the engagement and a meaningful multiple of the output. The rest of your seat pool produces something between zero and a small marginal gain. The aggregate productivity number is the average of those two populations. A procurement decision made on the aggregate is, in effect, a decision made on behalf of the larger population, which is the population producing the smaller fraction of the value. The smaller population, the one producing most of the value, has very specific tool preferences. Override their preferences and you haven’t “standardized.” You’ve downgraded the only people whose output your AI program is actually moving.
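To make the arithmetic in that paragraph concrete, here is a minimal sketch in Python. The headcount, the 10 percent leveraged share, and the per-user gain figures are illustrative assumptions, not numbers from this guide; only the shape of the distribution comes from the text.

```python
def value_split(headcount, leveraged_share, gain_leveraged, gain_rest):
    """Split total AI-driven gain between leveraged users and everyone else.

    All inputs are hypothetical; gains are in arbitrary units per user.
    """
    leveraged = round(headcount * leveraged_share)
    rest = headcount - leveraged
    total = leveraged * gain_leveraged + rest * gain_rest
    return {
        "average_gain": total / headcount,
        "leveraged_value_share": leveraged * gain_leveraged / total,
    }


# Assumed org: 1,000 seats, 10 percent leveraged, 20 units vs 1 unit per user.
split = value_split(1000, 0.10, 20.0, 1.0)
print(f"average gain per seat: {split['average_gain']:.1f}")
print(f"share of value from leveraged users: {split['leveraged_value_share']:.0%}")
```

Under these assumed inputs the average works out to 2.9 units per seat, while roughly 69 percent of the value comes from a tenth of the seats. A decision made on the average is a decision made for the 900 seats producing the other 31 percent.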
Recognizing Leverage covers the disposition that produces this distribution and the signals you can use to find the people in it. The corollary, and the subject of this guide: now that you know where the value is, don’t make purchasing decisions that ignore them.
Three concentric tiers
The framework is not “give everyone the same tool.” It is three rings, deliberately differentiated by who is in them and what they are getting.
Tier 1: what everyone gets
A baseline chat tool that everyone in the org has access to. Not whatever you already pay for. The one your most leveraged users actually open.
In most orgs in Q3 2026, that is ChatGPT Business or Claude Team. Both are around $25 to $30 per user per month.[6] Both are dramatically better daily-driver chat tools than the M365 Copilot chat experience. Pick one. If you cannot pick one, give people both. The cost of running two baseline chat subscriptions is rounding error against the cost of the wrong one.
What this tier is for: drafting, brainstorming, analysis, the daily knowledge-work tasks where chat is the interface. It is not for embedded document gains. It is not for coding. Those are different tiers.
A note on disposition. Most of the people in Tier 1 will never become heavy users. The point of provisioning a good baseline chat tool is not to convert the broad middle into power users. It is to get the broad middle out of free ChatGPT, and to make sure the dispositionally inclined who have not yet been spotted have a real tool waiting when they start to use it.
Tier 2: what specific roles get
Beyond the baseline, specific roles get specific tools, chosen by the people doing the role, not by procurement.
Engineering gets a coding agent. Claude Code, Codex, Cursor, GitHub Copilot. The choice belongs to the engineers, not to the CIO. The tools are stratifying by preference and the right answer is to fund whichever the team converges on, often more than one. Coding agent budget is the single highest-ROI line item in any AI program in Q3 2026. Do not ration it. Claude Code alone is at $2.5 billion in annualized revenue, and Anthropic now holds an estimated 54 percent of the enterprise coding market, because it works.[7][8]
Doc-heavy roles (finance modeling in Excel, contract drafting in Word, ops in spreadsheets) get M365 Copilot, or Gemini in Workspace, depending on which suite they live in. The embedded features are real. Financial modeling is 30 to 40 percent faster. Document drafting is 50 to 60 percent faster.[9] This is what Copilot is actually for. The chat experience is bad. That is what Tier 1 exists for.
Customer support gets the support-specific AI built into their platform, not a chat tool retrofitted into a help desk. Zendesk AI, Intercom Fin, Salesforce Agentforce as it matures. The integration with ticket history and knowledge base is the value. A bare chat tool does not have either.
Sales gets call-prep and account-research workflows built on top of Tier 1, plus whatever their CRM ships. Most “sales AI” is repackaged chat with a vector DB of LinkedIn profiles attached. Tier 1 plus a thirty-minute prompt-template session does most of the same job for none of the markup.
Legal gets the baseline plus a contract-review specialist (Harvey, Spellbook, others) once the volume justifies the cost. Below a certain document throughput, the specialist is overpriced for the use; above it, the specialist pays for itself in associate hours saved.
The pattern across roles is the same. The embedded or specialty tool wins where the workflow is fixed and the integration is the value. The chat tool wins where the work is open-ended.
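The throughput threshold mentioned for legal's contract-review specialist can be sketched as a break-even calculation. The license cost, hours saved per document, and hourly cost below are hypothetical placeholders; the guide gives no figures for them, so substitute your own.

```python
import math


def breakeven_documents_per_month(monthly_license_cost, hours_saved_per_document, hourly_cost):
    """Smallest monthly document count at which the specialist pays for itself."""
    saving_per_document = hours_saved_per_document * hourly_cost
    return math.ceil(monthly_license_cost / saving_per_document)


# Assumed inputs: $2,000 per month license, half an hour saved per contract,
# $150 per associate hour. None of these come from a vendor price list.
threshold = breakeven_documents_per_month(2000.0, 0.5, 150.0)
print(f"specialist pays for itself at {threshold} documents per month")
```

Below the threshold the baseline chat tool is the cheaper answer; above it, the specialist is.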
Tier 3: what power users get
Stop rationing. The 5 to 15 percent producing the leverage should have whatever they ask for.
Multiple chat subscriptions, because Claude and ChatGPT are good at different things and a power user who knows both will use both. Direct API budget for whatever they are building. Cursor or Codex Pro at $200 a month for the engineers who actually live in those tools. The specialty tool they identified at the last bake-off. A second seat of something for the prototype they want to run in parallel.
The arithmetic is not subtle. A leveraged user is producing somewhere between $200,000 and $2 million of marginal output per year, depending on role and seniority. Their tool budget is at most a few thousand dollars. If you are negotiating with them about whether they can have Cursor and Claude Code, you have misallocated leadership attention by an order of magnitude.
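A rough version of that arithmetic, as a sketch. The two chat subscriptions and the $200 coding tier echo prices quoted in this guide; the monthly API budget is a hypothetical placeholder.

```python
MONTHS = 12

# Monthly tool costs for one power user, in dollars.
monthly_tools = {
    "chat_subscription_a": 30,
    "chat_subscription_b": 30,
    "coding_agent_top_tier": 200,
    "api_budget": 100,  # assumption, not a figure from this guide
}

annual_tool_cost = sum(monthly_tools.values()) * MONTHS

# Marginal output range per leveraged user, as stated in the text.
for marginal_output in (200_000, 2_000_000):
    share = annual_tool_cost / marginal_output
    print(f"${annual_tool_cost:,} of tools vs ${marginal_output:,} of output: {share:.2%}")
```

With these inputs the annual tool bill is $4,320, which is about 2 percent of the low end of the output range and about 0.2 percent of the high end.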
This tier is also where you find out which tool is winning inside your walls, which is the input to standardization later. Power users will gravitate toward what works. Watch the gravitation. Do not interfere with it.
The four serious vendors
One paragraph each. No matrix. The market has stratified enough that “which one” is no longer the right question. “Which one for what” is.
Anthropic (Claude). The enterprise default, and the place to standardize anything you mean to keep. Roughly 40 percent of enterprise LLM spend runs through Anthropic by the API meter, it led global LLM revenue in Q1 2026, and it holds an estimated 54 percent of the enterprise coding market, with eight of the Fortune 10 as customers.[8] Buy it as the foundation: the API, the agent platform, the internal tooling, the custom workflows you want to outlast this year’s model. Its Agent Skills format, the packaging that turns a model into something that knows your finance close or your contract review, was opened as a standard this year, and a plug-in ecosystem grew up around it for finance, legal, accounting, and data science. That is what makes it the boring, defensible procurement choice, and boring is what wins enterprise. Claude Code is still the proven coding force multiplier. On pricing, Anthropic now bills the way the whole serious market does, a small seat fee plus metered consumption.[10] Where Anthropic is not the reflex pick is the chat tab. For image generation, multimodal breadth, and the tool your non-technical people open without being told, look elsewhere.
OpenAI (ChatGPT, Codex). Buy for the chat layer: the default tool your people will open without prompting, the most comfortable interface for non-technical users, image generation, multimodal breadth. This is the preference to let your leveraged users exercise. The chat tab is exactly where you should not force a house standard. Codex for the engineering teams that prefer it, mostly for the desktop integration and Chronicle context. Do not make it the foundation you build durable agents and internal tooling on; that consolidation has gone to Anthropic. Workspace Agents, OpenAI’s answer to Cowork, runs in the cloud and keeps working when nobody is watching. Pilot it on low-stakes async work. Do not commit to it as a platform.
Microsoft (Copilot). Buy for the embedded features in Excel, Word, and Outlook, where the gains are large and measurable. Buy GitHub Copilot for engineering teams that have not picked something else (many have, and the ones that have are right). And note that Microsoft is no longer the flat-seat holdout: GitHub Copilot moved to usage-based, AI-credit billing on June 1, 2026, so all four serious vendors now bill metered. Do not buy for chat. The Copilot chat experience is the worst of the four majors and your users will tell you so if you ask. Do not make M365 Copilot the only AI provision in your org. It is a complement to a chat tool, not a substitute for one.
Google (Gemini). Buy for Workspace-native shops where everyone already lives in Docs, Sheets, and Gmail. Buy for infrastructure-heavy organizations with Google Cloud as their primary platform. The new Gemini Enterprise Agent Platform supports Anthropic’s Claude models alongside Gemini, which is a useful signal about where Google thinks the market is going. Do not buy as your coding tool; Gemini trails Claude Code, Codex, and Cursor by a wide margin. Do not buy as the consumer-facing chat your users will pick on their own; they will pick ChatGPT.
The chat layer is stratifying; the foundation has consolidated on Anthropic. You do not have to pick one tool for everything. You do have to know which tier each one is in, and standardize the part that is ready to be standardized.
The Microsoft special case
Almost every reader of this guide has the Microsoft problem.
You bought M365 Copilot for everyone. Most of the seats are idle. The renewal is coming up. Your Microsoft rep is pitching you Copilot Studio and trying to get you to standardize on the agent platform.
The framework, in five steps:
- Pull the seat-level usage report. Identify the bottom 30 percent by activity.
- Of the active users, distinguish the embedded-feature users (Excel, Word, Outlook) from the chat users.
- Keep Copilot for the embedded-feature users. Cancel for everyone else.
- Redirect the saved budget to ChatGPT Business or Claude Team for the chat use case.
- Tell your Microsoft rep your renewal will be smaller this year, and that you would be open to a re-pitch when chat NPS goes positive.
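The triage in the steps above can be sketched in a few lines of Python. This assumes you have already exported per-seat activity counts; the `Seat` fields, the sample users, and the $30 seat price are hypothetical, and a real usage report will have different column names and your contract its own price.

```python
from dataclasses import dataclass


@dataclass
class Seat:
    user: str
    embedded_actions: int  # Copilot actions inside Excel, Word, Outlook (assumed field)
    chat_actions: int      # Copilot chat prompts (assumed field)


def triage(seats, idle_percent=30):
    """Sort seats into keep / cancel / move-to-chat, following the five steps."""
    ranked = sorted(seats, key=lambda s: s.embedded_actions + s.chat_actions)
    idle_count = len(ranked) * idle_percent // 100
    active = ranked[idle_count:]

    # Embedded-feature users keep Copilot; active chat-only users move to a chat tool.
    keep = [s.user for s in active if s.embedded_actions > 0]
    move_to_chat = [s.user for s in active
                    if s.embedded_actions == 0 and s.chat_actions > 0]
    kept = set(keep)
    cancel = [s.user for s in ranked if s.user not in kept]
    return {"keep": keep, "cancel": cancel, "move_to_chat": move_to_chat}


def quarterly_savings(cancelled_seats, seat_price_per_month):
    """Dollars freed per quarter by the cancelled seats."""
    return cancelled_seats * seat_price_per_month * 3


seats = [
    Seat("ana", 40, 5), Seat("ben", 0, 60), Seat("cy", 0, 0), Seat("dee", 12, 0),
    Seat("eli", 0, 1), Seat("fay", 0, 25), Seat("gus", 2, 0), Seat("hal", 30, 30),
    Seat("ivy", 0, 0), Seat("jo", 8, 14),
]
plan = triage(seats)
savings = quarterly_savings(len(plan["cancel"]), seat_price_per_month=30.0)
print(plan)
print(f"freed per quarter: ${savings:,.0f}")
```

The savings figure is the number to redirect to the chat subscription, and the one to bring to the renewal conversation.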
This is a defensible move. The data supports it. Most CFOs will accept it once they see the usage report. Your Microsoft rep will not be happy. That is information about whose interest the rep was representing. The mechanics of the audit itself (pulling the seat report, what to look at, how to defend the cuts) are in Evaluating Spend.
The harder version of this conversation is at the executive level, where someone made the original Copilot decision and may not enjoy revisiting it. The right framing there is not “we got it wrong.” It is “the data we did not have at the time of the decision is now available, and the cost of not acting on it is X dollars per quarter.” That framing is defensible at any altitude.
The agent platform question
Every vendor wants you to commit to their agent platform. OpenAI’s Workspace Agents. Google’s Gemini Enterprise Agent Platform. Microsoft’s expanded Copilot Studio. Salesforce Agentforce in partnership with Google. The April 22 launches were a coordinated land grab.
The right move in Q3 2026 is to pilot, not to pick.
Three reasons. First, today’s agents fail 40 to 50 percent of complex cross-application workflows. Picking a platform locks you into one vendor’s failure modes when you do not yet know which failure modes are which. Second, the platforms are all pre-1.0 in maturity. Whichever one you pick now, you will probably rebuild on a different one within 18 months. Third, and most importantly, the valuable artifact is not the platform choice. It is the muscle of specifying agent workflows: what you want the agent to do, what the success criteria are, what the escalation path looks like when it fails. That muscle transfers across platforms. The platform itself is a commodity that has not yet commoditized.
What to do instead. Pick two or three low-stakes workflows. Status reports. Expense categorization. Internal data lookups. Run each on a different vendor’s agent platform. Measure failure rates. Keep notes. Re-evaluate quarterly.
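Measuring failure rates does not need tooling beyond a run log. A minimal sketch, assuming each pilot run is recorded as a platform, a workflow, and whether the output passed your success criteria; the platform labels and the sample runs are placeholders, not test results.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return the failure rate for each (platform, workflow) pair in the log."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for platform, workflow, succeeded in runs:
        key = (platform, workflow)
        totals[key] += 1
        if not succeeded:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


# Hypothetical log: one low-stakes workflow per platform, as suggested above.
runs = (
    [("platform_a", "status_report", True)] * 3
    + [("platform_a", "status_report", False)]
    + [("platform_b", "expense_categorization", True)] * 3
    + [("platform_b", "expense_categorization", False)] * 2
    + [("platform_c", "data_lookup", True)] * 2
)

rates = failure_rates(runs)
for (platform, workflow), rate in sorted(rates.items()):
    print(f"{platform} / {workflow}: {rate:.0%} failed")
```

Because each platform runs a different workflow here, the rates are not directly comparable across vendors. The log's value is the quarter-over-quarter trend and the notes on how each failure happened.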
One capability did arrive, and it is not the platform. Scheduled tasks for non-technical users hit general availability when Claude Cowork shipped them on April 9, 2026: write a prompt, pick a cadence, no code. That is the first agent capability worth using now rather than piloting, for one specific shape of work, recurring work with a checkable output. A Monday status brief assembled from five systems. A weekly competitive scan. A daily reconciliation that flags what didn’t tie out. The person who used to spend Monday morning compiling the report reads a draft that was waiting when they sat down. Two limits keep it honest. Desktop-run tasks only execute while the machine is awake and the app is open, so anything load-bearing belongs on infrastructure that stays up. And the verifier constraint governs: only schedule work that two people can mechanically agree is right. Work that needs a careful senior read before anyone trusts it is not ready for a cadence. Pilot the full platforms. Use the scheduled task.
If your CIO is being told to “pick a platform” by Q3, the answer is no, and the reason to give is that the platform race is open and the cost of being wrong is higher than the cost of waiting two more quarters.
When to standardize
Standardization is not bad. Premature standardization is bad.
One layer is already past premature. The foundation you build durable agents and internal tooling on has converged on Anthropic, and standardizing it now is the defensible move. What follows is about the chat and role tools, where convergence still has to happen inside your walls.
The signal that says you are ready to standardize: power users from different starting tools have converged on the same one without being told to. Three engineers picked Cursor on their own. Two product managers are quoting Claude in Slack. The marketing team’s leveraged user is forwarding ChatGPT outputs to her director. When two or three of your highest-leverage users in a role are independently picking the same tool, that tool has won inside your walls. Then standardize. Negotiate the enterprise contract. Roll it out broadly. Train the broad middle on a tool that has already proved itself.
The reverse, picking the tool first and trying to make it the one everyone converges on, is the procurement default. It does not work in this market. The market is moving too fast and the user preferences are too strong.
Until you see convergence, allowing two or three tools is not chaos. It is information gathering at low cost. The information you are gathering is which tool wins inside your walls, which is the only basis for a defensible standardization decision later.
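The convergence signal can be checked mechanically once you have asked your leveraged users what they actually open. A sketch, assuming a simple list of (role, user, tool) picks; the names are invented, and the threshold of two users is the low end of the "two or three" described above. Code cannot tell whether the picks were independent, so that part stays a judgment call.

```python
from collections import defaultdict


def converged_tool_by_role(picks, min_users=2):
    """Return the winning tool per role, if exactly one tool clears the bar."""
    users = defaultdict(set)
    for role, user, tool in picks:
        users[(role, tool)].add(user)

    candidates = defaultdict(list)
    for (role, tool), who in users.items():
        if len(who) >= min_users:
            candidates[role].append(tool)

    # A role has converged only when a single tool has enough distinct users.
    return {role: tools[0] for role, tools in candidates.items() if len(tools) == 1}


picks = [
    ("engineering", "eng_1", "Cursor"),
    ("engineering", "eng_2", "Cursor"),
    ("engineering", "eng_3", "Cursor"),
    ("engineering", "eng_4", "Codex"),
    ("product", "pm_1", "Claude"),
    ("product", "pm_2", "Claude"),
    ("marketing", "mkt_1", "ChatGPT"),
]
winners = converged_tool_by_role(picks)
print(winners)
```

Roles missing from the result, like marketing in this sample, have not converged yet and should keep running more than one tool.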
The heuristic
Five rules a director can screenshot.
- Buy the tool your leveraged users already open. Not the one already in your contract.
- Standardize the foundation now; stratify the chat layer. Build durable agents and internal tooling on Anthropic. Run two or three chat tools until your power users converge on one.
- Stop rationing your top 15 percent. Their tool budget is rounding error against their output.
- Cut idle seats every quarter. Redirect the savings to tools people actually use.
- Pilot agent platforms, do not pick one. The platforms are not mature. The workflow muscle is what carries.
If your AI tool spend is defensible against these five rules, the rest of the program has space to work. If it is not, the renewal is the moment to fix it.
Something to carry
Before your next AI vendor renewal, ask the three most leveraged AI users you can name which two tools they would pick if it were their decision. Not “which tool do you like.” Which two would you pick. The framing matters. You are looking for the tools that survive an honest comparison, not the tools that have a small advantage in one demo.
Buy those two. Cancel the seats for the tool they didn’t pick. Don’t negotiate with yourself for another quarter. The data is in your power users, and the conversation is what collects it.
If you can’t name three leveraged users, that’s a different problem. Recognizing Leverage is where to start.
Footnotes
[1] Microsoft Q3 FY26 earnings disclosure, last verified April 2026.
[2] Internal study of mixed-tool deployments at Fortune 500 companies, last verified April 2026.
[3] Redress Compliance enterprise Copilot survey, January 2026.
[4] OpenAI Workspace Agents, Google Gemini Enterprise Agent Platform, Salesforce Agentforce announcements, April 22, 2026.
[5] MacStories independent testing of Cowork; comparable failure rates reported across other platforms in early-deployment data, April 2026.
[6] ChatGPT Business and Claude Team list pricing, last verified April 2026. Subject to vendor changes.
[7] Anthropic public revenue disclosure, April 2026.
[9] Microsoft customer case studies on financial modeling and document drafting, validated against independent enterprise pilot data, last verified April 2026.
[10] The whole serious market moved to metered enterprise billing in 2026, not just Anthropic. The model is now a small seat fee plus consumption committed against an annual spend or token volume (typically a 50-seat minimum), and usage billing generally cannot be disabled. The cost levers sit in the plumbing: prompt caching cuts the cost of repeated context by roughly 90%, and batch processing is about 50% cheaper for non-interactive work. See The State of AI: Q3 2026. Terms current as of mid-2026 and change frequently.