Evaluating Spend

TL;DR. Most AI budgets are wrong in the same direction: a flat per-seat license sits in the inboxes of people who never open it, while the three people producing real leverage put forty dollars of API spend on a personal card every month. The Anthropic pricing pivot in April made the underlying problem explicit: idle seats were always subsidizing power users. Now you pay in proportion to leverage. This guide shows how to fix the allocation in five minutes.

Your finance team is asking what the AI line item is producing. A vendor wants forty-five minutes to walk you through a higher tier. Three department heads want to expense Claude Team seats independently. A board member sent you a podcast. Somewhere in your expense system, a senior engineer is reimbursing themselves $200 a month for a Claude Max account because the approved tool is worse than the free tier of the unapproved one.

The dollars aren’t the problem. The allocation is.

What your AI line item misses

The default AI budget reads like this. A bundled Copilot SKU rolled into your existing Microsoft 365 contract at $30 per seat per month. A pilot of ChatGPT Business at $25 per seat for the marketing team. A handful of Claude Team seats that your data lead bought after a conference. Maybe an API budget if you have engineers, usually buried in the cloud line and untracked. Add it up and you’re spending somewhere between $40 and $150 per employee per month on AI, depending on how generous your seat distribution is and how much your engineers are quietly burning on tokens.

That number, by itself, tells you almost nothing. The relevant questions aren’t “how much” but “to whom” and “for what.” A $90 per-employee average can be a leveraged spend if it’s concentrated where the leverage lives. The same $90 can be pure shelfware if it’s sprayed evenly across a workforce that mostly doesn’t open the tool.

Most AI budgets are the second case. Your CFO is asking the wrong question because you handed them the wrong instrument. The honest answer isn’t “we’re spending the right amount.” It’s “we’re spending in the wrong shape, and here’s the audit that proves it.”

Three common spending mistakes

Almost every AI budget I’ve seen is some combination of these three mistakes. Each looks defensible in isolation. Together they produce a line item with no leverage to point at.

The bundled-with-the-suite mistake. You added Copilot to your E5 license because the rep made it easy and procurement preferred a single contract. Two years in, paid penetration of M365 Copilot across Microsoft’s enterprise base sits at 3.9%. When users at the same company get a free choice between Copilot, ChatGPT, and Gemini, they pick ChatGPT seventy-six percent of the time. Copilot’s accuracy NPS has been negative for nine straight months. You bought it because it was easy. Your people don’t open it because it isn’t the tool they would choose.

The flat per-seat democracy mistake. Everyone gets a seat. Marketing, finance, legal, ops, the loading dock, the regional VP who hasn’t opened a chat interface in his life. The logic is fairness, or future-proofing, or “we want to give everyone the chance to learn.” The result is a usage curve where the top ten percent of seats produce ninety percent of the value and the bottom thirty percent never log in. You’re paying full price for the median, and the median is zero.

The training-budget-as-AI-budget mistake. You allocated $40,000 to AI training this year. A vendor delivered six lunch-and-learns. Attendance was strong. Usage didn’t move. The 6x engagement gap between power users and everyone else isn’t a knowledge gap, it’s a disposition gap, and a disposition gap doesn’t close with curriculum. You spent your AI budget on the symptom and got the symptom back.

These three mistakes share a pattern. They optimize for procurement convenience, for political fairness, and for the appearance of action. None of them are anchored to where the leverage actually lives in your org.

Why usage-based billing changes the math

For two years, the enterprise pricing model for AI was a flat seat fee. Two hundred dollars per month per Claude Enterprise seat, thirty dollars per Copilot seat, twenty-five for ChatGPT Business. The pricing was simple and the procurement story was familiar. It also lied to you about who was paying for what.

Under flat-seat pricing, the seat that never logged in cost the same as the seat that consumed millions of tokens a week. The vendor priced for an average user and made margin on the idle ones. Your power users were a loss leader. Your CFO loved the predictability. Nobody had any incentive to look at usage.

In April 2026, Anthropic ended that arrangement. The flat $200 enterprise SKU went away. The replacement is $20 per seat plus consumption at standard API rates, with a monthly minimum commit. For a heavy user pool, total cost can double or triple. For a lightly used pool, it falls toward zero. OpenAI’s Workspace Agents pricing is credit-based on the same logic. Google’s Gemini Enterprise Agent Platform is consumption-priced. Microsoft is the holdout, still selling flat seats, which is consistent with their position as the vendor whose users use the product least.

The implication for your budget is simple and unfamiliar. You now pay roughly in proportion to the leverage your org extracts. A pool of mostly idle seats becomes cheap. A pool of intense users becomes expensive, and that expense is correlated with output. Idle seats stop subsidizing the work. Power users stop being free.

The seat-count mental model that has run enterprise software procurement for thirty years is now actively misleading you. The right unit isn’t seats. It’s dollars per active user, and dollars per unit of observable output.

What good spend looks like

Here’s the operational answer. Concrete, opinionated, and adjustable for your role mix.

The defensible per-employee figure. For a knowledge-work organization in 2026, total AI spend per employee per month should land somewhere between $40 and $120, with the average closer to $80. Below $40, you’re almost certainly underfunding the people producing leverage. Above $120, you’re almost certainly buying seats for people who don’t use them. The number itself matters less than the shape underneath it.

The 70/20/10 allocation. Roughly seventy percent of your AI dollars should sit on the tools your power users actually choose to open. In practice in 2026, that’s Claude or ChatGPT for chat, and Claude Code, Codex, or Cursor for engineering. Roughly twenty percent goes to workspace AI for the broad middle of your org, where the value is real but modest: drafting in Word, summaries in Excel, meeting recaps. Roughly ten percent is a discretionary API budget for the engineers and operators who turn personal leverage into infrastructure the rest of the org consumes. Adjust the ratios for your role mix. An engineering-heavy org pushes more into category three. A sales-and-marketing org pushes more into category one.
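
As a rough illustration of how the split translates into dollars, here is a minimal sketch. The baseline ratios come from the guideline above; the ten-point tilt toward the API bucket for an engineering-heavy org is an illustrative assumption, not a rule from this guide.

```python
# Sketch: apply the 70/20/10 split to a total monthly AI budget.
# The engineering-heavy adjustment (a 10-point shift into the API bucket)
# is an assumption for illustration; tune it to your own role mix.

def allocate(total_monthly: float, engineering_heavy: bool = False) -> dict:
    split = {"power-user tools": 0.70, "workspace AI": 0.20, "discretionary API": 0.10}
    if engineering_heavy:
        split = {"power-user tools": 0.60, "workspace AI": 0.20, "discretionary API": 0.20}
    return {bucket: round(total_monthly * share) for bucket, share in split.items()}

print(allocate(16_000))
# -> {'power-user tools': 11200, 'workspace AI': 3200, 'discretionary API': 1600}
```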

Plan tiers by org size. Under twenty people, buy Team plans on the tools your power users prefer and skip enterprise procurement entirely. Twenty to five hundred people, move to Enterprise on the one or two tools your power users have already converged on, and add a metered API budget for engineers. Five hundred and up, you’ll need vendor-negotiated commits and an SSO story, but resist the urge to consolidate to a single vendor for procurement convenience. Single-vendor lock-in is exactly the trap the bundled-Copilot org fell into.

Seats versus API. A seat is the right unit for someone who opens a chat tab a few times a week. API access is the right unit for the engineer building a script that runs ten thousand times a month, and for the operator whose workflow now has a model in the loop on every record. The top decile of any role should usually have both. The cost of giving them both is small. The cost of forcing them through the seat interface is invisible and large, because they’ll go around you and you’ll lose visibility entirely.

The shape of a working budget. A defensible AI budget for a 200-person mid-market firm in Q2 2026 looks something like this. Twenty to forty Enterprise or Team seats on the tool your power users have chosen, at sixty to one hundred dollars each. A hundred and fifty Workspace Copilot or Gemini seats at thirty dollars, with the explicit expectation that half of them will produce modest gains and the other half will be killed in the next audit. A two to four thousand dollar monthly API budget, owned by a named engineer, with usage broken out by project. Total: roughly $7,700 to $12,500 a month, or about $40 to $60 per employee. That's a budget you can defend line by line.
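
As a quick sanity check on the arithmetic, a minimal sketch using the hypothetical line items above (these are illustrative figures, not real quotes):

```python
# Sketch: sanity-check the example budget arithmetic for a 200-person firm.
# Line items are the hypothetical figures from the paragraph above.

HEADCOUNT = 200

line_items = {
    # bucket: (low monthly $, high monthly $)
    "power-user seats (20-40 @ $60-$100)": (20 * 60, 40 * 100),
    "workspace seats (150 @ $30)": (150 * 30, 150 * 30),
    "metered API budget": (2_000, 4_000),
}

low = sum(lo for lo, _ in line_items.values())
high = sum(hi for _, hi in line_items.values())

print(f"total monthly spend: ${low:,} to ${high:,}")
print(f"per employee: ${low / HEADCOUNT:.0f} to ${high / HEADCOUNT:.0f}")
# -> total monthly spend: $7,700 to $12,500
# -> per employee: $38 to $62
```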

Shadow AI as a budget signal

Your security team treats shadow AI as a policy problem. It is, but it’s also the cleanest signal you’ll ever get about the quality of your approved tooling.

When your senior salesperson is paying twenty dollars a month out of pocket for ChatGPT Plus while a Copilot seat sits idle in their Microsoft license, they’re telling you something specific. The approved tool is worse than the free tier of the unapproved one. They’ve chosen, with their own money, to route around you.

Run a query against your expense reports for “ChatGPT,” “Claude,” “Anthropic,” “OpenAI,” “Cursor,” and “Perplexity.” Whatever you find is your shadow AI tax, paid in dollars you aren’t capturing and in data exposure you aren’t governing. It’s also the highest-quality user research in your org. Those people did the evaluation for you. They picked the tool. They’re using it to do the work you pay them for.
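One way to run that search, sketched in Python against an exported expense report. The file name and the "description" and "amount" column names are assumptions about your expense system's export; adjust them to match yours.

```python
# Sketch: flag shadow AI spend in an exported expense report.
# "expenses.csv" and its column names are assumptions about your export format.
import csv

AI_VENDORS = ["chatgpt", "claude", "anthropic", "openai", "cursor", "perplexity"]

shadow_total = 0.0
with open("expenses.csv", newline="") as f:
    for row in csv.DictReader(f):
        if any(vendor in row["description"].lower() for vendor in AI_VENDORS):
            amount = float(row["amount"])
            shadow_total += amount
            print(f'{row["description"]}: ${amount:,.2f}')

print(f"shadow AI spend: ${shadow_total:,.2f}")
```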

The fix isn’t a memo. The fix is to make the shadow tool the approved tool, on a plan you can govern, before legal writes the forty-page framework that no one will read.

The five-minute spend audit

If you do nothing else from this guide, do this. It fits on one page and is the only artifact you need to bring to the next budget meeting.

  1. Pull the seat-level usage report for every AI tool with more than ninety days of seats deployed. Pull it yourself. Don’t let the vendor pull it for you. Vendor-supplied dashboards average over the seat pool by design.
  2. Sort by activity. Three buckets. Top ten percent (power), middle sixty percent (occasional), bottom thirty percent (idle). The cutoffs don’t need to be precise. The shape will be obvious within sixty seconds.
  3. Compute cost per active seat, not cost per licensed seat. Take total spend on the tool, divide by the count of seats with meaningful weekly use. If the result is two to three times the headline per-seat price, you’re funding shelfware.
  4. Search expense reports for the names of the major AI vendors. Whatever shows up is your shadow AI spend. Add it to the total. It’s real spend. Your finance team just isn’t seeing it.
  5. Check your figure against the benchmark. Total AI spend, divided by total employees, compared against the $40-to-$120-per-employee range. Note where you're concentrated and where you're sprayed thin. (A short scripting sketch of steps 2 through 5 follows this list.)
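
A minimal sketch of steps 2 through 5, assuming you've exported the seat-level usage report to a CSV with per-seat weekly activity. The file name, the "user" and "weekly_sessions" columns, and the spend figures are all assumptions; substitute your own export and invoice numbers.

```python
# Sketch of audit steps 2-5: bucket seats by activity, compute cost per active
# seat, and land total spend against the per-employee benchmark.
# All constants and the CSV layout are assumptions for illustration.
import csv

TOTAL_MONTHLY_SPEND = 9_000   # spend on this tool, from invoices
HEADLINE_SEAT_PRICE = 30      # list price per seat
HEADCOUNT = 200               # total employees
SHADOW_AI_SPEND = 1_200       # from the expense-report search above

with open("usage.csv", newline="") as f:
    seats = sorted(csv.DictReader(f),
                   key=lambda r: int(r["weekly_sessions"]), reverse=True)

n = len(seats)
power = seats[: n // 10]              # top 10% by activity
idle = seats[-(3 * n // 10):]         # bottom 30%: the reallocation pool
active = [s for s in seats if int(s["weekly_sessions"]) > 0]

cost_per_active_seat = TOTAL_MONTHLY_SPEND / max(len(active), 1)
per_employee = (TOTAL_MONTHLY_SPEND + SHADOW_AI_SPEND) / HEADCOUNT

print(f"power seats: {len(power)}, idle seats: {len(idle)} of {n}")
print(f"cost per active seat: ${cost_per_active_seat:,.0f} "
      f"(headline price: ${HEADLINE_SEAT_PRICE})")
print(f"AI spend per employee: ${per_employee:,.0f} (benchmark: $40-$120)")
print("reallocation pool:", [s["user"] for s in idle])
```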

The output of this audit is three numbers and a short list. Total monthly AI spend. Cost per active seat. Shadow AI spend. The list is the bottom thirty percent of seats by activity. That’s your reallocation pool.

The checklist takes five minutes to read; working through it takes between twenty minutes and an afternoon, depending on how clean your usage data is. The first time you run it, it will be uncomfortable. By the second quarter, it will be routine.

What to tell your CFO

Three sentences. Edit lightly.

Our total AI spend is roughly $X per employee per month, in line with mid-market benchmarks. Roughly seventy percent of that spend is concentrated on the tools our highest-output people use every day, with the remainder split between broad-deployment workspace tools and a metered API budget. Each quarter we audit seat-level usage, kill the bottom thirty percent of inactive seats, and redirect the savings to the people producing measurable leverage.

That paragraph does several things at once. It puts a defensible number on the table. It signals you understand the difference between spend and allocation. It sets the expectation of a recurring audit, which is the single most important governance habit a finance team can hear from a budget owner. It closes the conversation.

If your CFO pushes for a traditional ROI model, the honest answer is that the instrument doesn’t measure what AI actually does. AI spend produces cycle-time compression, capability expansion, and the elimination of work that previously sat in a backlog. None of that lands cleanly on a P&L line designed for cost takeout. Offer to build a different instrument: a quarterly review of the audit numbers above, plus three named workflows that have measurably changed since the last review. That’s a defensible scoreboard. The McKinsey-style ROI model isn’t.

Traps to avoid

A few specific things that will cost you money in Q2 2026 if you don’t watch for them.

Don’t lock in a multi-year usage commit on a vendor whose pricing model you haven’t run for a full quarter. Anthropic’s new model rewards heavy use and punishes overcommit. If you commit to $50,000 a month and use $20,000, you’re paying the difference and the vendor knows it. Ramp into commits, not out of them.
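
The arithmetic on an underused commit is worth seeing plainly. A minimal sketch, using the illustrative commit and usage figures from the paragraph above rather than real contract terms:

```python
# Sketch: what an underused monthly minimum commit actually costs.
# The $50,000 commit and $20,000 of metered usage are the illustrative
# figures from the paragraph above, not real contract terms.
monthly_commit = 50_000
metered_usage = 20_000

billed = max(monthly_commit, metered_usage)   # you pay the floor either way
overcommit_waste = billed - metered_usage

print(f"billed: ${billed:,}, paid for unused capacity: ${overcommit_waste:,}")
# -> billed: $50,000, paid for unused capacity: $30,000
```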

Don’t let workspace AI sales reps tell you the tool is already paid for. It isn’t. It’s a thirty-dollar per-seat add-on that consistently underperforms the free tier of competing chat products. The sunk-cost framing is a vendor tactic. Audit the seats and act on the data.

Don’t buy an enterprise agent platform yet. All four major vendors shipped one in April. None of them are mature. Pilots, not platforms, until you’ve run something real to completion in production. The platform you pick under pressure in Q2 will be the platform you regret in Q4.

Don’t consolidate to a single vendor for procurement convenience. The market is stratifying, not consolidating. The right posture is to know which tier each vendor is selling you and to keep at least two of them honest with each other.

What to do Monday morning

One thing.

Pull the seat-level usage report for whichever AI tool you have the most seats of. If that’s Microsoft 365 Copilot, you’re about to have an uncomfortable hour. Sort the report by last-thirty-day activity. Identify the bottom thirty percent of seats. Send a note to those users. Two weeks to demonstrate use, or the seat goes back into the pool.

Take the dollars you free up and offer them, in the same week, to the three people in your org you suspect are getting the most leverage from AI right now. Ask what tool they would actually choose. Buy that tool. Track what changes.

That single audit, followed by that single reallocation, will tell you more about the shape of your AI spend than the next vendor pitch deck you sit through. It’s also the most defensible thing you can put in front of your CFO for the rest of the quarter.