TL;DR. The AI risk you are afraid of is not the AI risk that is going to hit you. The fear is regulator action and front-page lawsuits. The reality is your VP of Sales pasted a deal sheet into a free chat tool last Tuesday and you have no record of it. There are four failure modes that actually land in enterprises right now: data leakage, wrong answers, over-reliance, and shadow AI. Each has a small set of boring controls that work. The rest is theater.
Somebody on your leadership team has been emailed a slide deck about AI risk. It probably has a 4x4 matrix. It enumerates seventeen risk categories with red, yellow, and green dots. It mentions the EU AI Act. It includes a recommendation to establish an AI ethics committee. It is on the agenda for next month’s risk review.
In the same week, three things happened that the deck did not catch. A junior analyst pasted a draft client roster into a free chat tool to clean up the formatting. A customer service bot quoted a refund policy that the company does not have. And the VP whose enterprise seat had sat idle for nine months started a personal Claude account because the approved tool was worse than the consumer version.
None of those will appear on the risk register. All of them are the actual risk.
Why the risk register misses
The default executive playbook for AI risk is the playbook for every other emerging risk. Convene a working group. Inventory the categories. Score each one on likelihood and impact. Color-code. Present quarterly. Track movement.
This catches nothing because risk registers operate at the wrong layer. Registers track categories. Incidents happen at surfaces. A category like “data privacy violation” tells you nothing about whether your people are pasting customer records into ChatGPT today. A category like “model bias” tells you nothing about whether the chatbot you deployed last quarter is making refund commitments you cannot honor. The register is at the altitude of an ethics committee. The incident is at the altitude of a browser tab.
The deeper problem is that the register lets the executive feel governed without doing anything operational. The slide is the artifact. The artifact substitutes for the control. Six quarters later there is a thick binder of governance documents and a security team that still has no idea which AI tools their employees opened this morning.
Risk that gets managed has telemetry, an owner, and a kill switch. Risk that gets registered has a color.
The four failure modes
Almost every real AI incident at a normal enterprise falls into one of four categories. None of them are exotic. All of them have a control that costs less than the AI program.
Data leakage
Your people put things into AI tools that should not be there. Source code. Customer rosters. Salary spreadsheets. Draft contracts. Acquisition targets. Sometimes they use the consumer tier of a tool whose data terms permit training on inputs. Sometimes they use the enterprise tier but exfiltrate to a personal account on a weekend. Either way, the data is now somewhere it was not yesterday, and you cannot get it back.
Samsung remains the canonical example of this failure. In 2023, engineers pasted source code and internal meeting notes into a public chat tool to debug and summarize them; the leak surfaced, and the company banned generative AI on company devices.1 The reflex response was a perimeter ban. The actual lesson was that no approved equivalent existed when the work showed up.
The control is paired and unglamorous. First, an enterprise tier of an approved chat tool with a signed data agreement that prohibits training on your inputs. Second, an endpoint or browser-level data loss prevention tool that flags paste events into known AI domains, including the long tail. The first removes the reason to use a consumer tool. The second tells you when somebody used one anyway. You need both. Neither alone is a control. The IBM 2025 breach data is blunt on this point: 97 percent of organizations that reported an AI-related security incident lacked proper AI access controls.2
If you have neither, your data leakage program is a wish.
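If you want to see what the detection half looks like at the level of an afternoon's work, here is a minimal sketch in Python, assuming your endpoint or DLP stack can export paste and upload events with a destination URL. The domain list and column names are illustrative, not any vendor's real schema.

```python
import csv
from urllib.parse import urlparse

# Illustrative watchlist only; your security team's enumerated list will be longer.
AI_DOMAIN_WATCHLIST = {
    "chatgpt.com", "chat.openai.com", "claude.ai", "gemini.google.com",
    "copilot.microsoft.com", "perplexity.ai",
}

def domain_of(url: str) -> str:
    """Reduce a destination URL to a bare hostname for matching."""
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    return host.lower().removeprefix("www.")

def flag_ai_paste_events(log_path: str) -> list[dict]:
    """Return paste and upload events whose destination is a known AI domain.

    Assumes a CSV export with columns: timestamp, user, event_type, destination_url.
    The column names are hypothetical; map them to whatever your DLP tool emits.
    """
    flagged = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("event_type") not in {"paste", "file_upload"}:
                continue
            dest = domain_of(row.get("destination_url", ""))
            if any(dest == d or dest.endswith("." + d) for d in AI_DOMAIN_WATCHLIST):
                flagged.append(row)
    return flagged

if __name__ == "__main__":
    events = flag_ai_paste_events("dlp_export_last_30_days.csv")
    print(f"{len(events)} paste or upload events into known AI domains in this export")
```

Nothing here is novel. The watchlist and the event stream already exist in most security stacks; joining them is an afternoon, not a program. The same loop, pointed at consumer domains as well as enterprise ones, is the detection half of the shadow AI control below.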
Wrong answers
Models confabulate. They produce text that looks correct, in a confident register, with no marker that anything is invented. In a workflow with a competent human reviewer this is annoying. In a workflow without one it is a lawsuit.
The two cases worth remembering are public. In Mata v. Avianca, lawyers submitted a brief citing six judicial decisions that did not exist. ChatGPT had fabricated them and the lawyers had not checked. The court sanctioned them.3 In Moffatt v. Air Canada, a customer service chatbot invented a bereavement fare policy. A small claims tribunal held the airline to it.4 The pattern is the same: a model produced confident output, no human verified it, the output bound the institution, the institution paid.
Hallucination rates have fallen, but they have not gone to zero, and they will not. A frontier model in 2026 is wrong on factual claims at a low single-digit rate in the best benchmarks and substantially higher in long-tail domains. Treat any quoted “hallucination is solved” claim as marketing. The control is not a better model. It is a discipline rule about which workflows are allowed to ship model output without a verifying human.
The rule is binary. If model output reaches a customer, a court, a regulator, a counterparty, or a system of record without a human in the loop, the workflow is high-risk. High-risk workflows get a named reviewer, a logged sign-off, and a pre-mortem on what happens when the model is wrong. Everything else is low-risk and can ship faster. This is a one-page document, not a framework. Most orgs have neither.
Over-reliance
The third failure is slower and costs more. Your people get good at producing AI output and worse at producing the underlying judgment. The senior analyst who used to spot the discrepancy in a model now accepts the chatbot’s summary. The associate who used to write the memo now edits the draft. The engineer who used to read the code now reviews the diff. After a year, the muscle is softer. After three, the team cannot operate unaided.
The 2025 Microsoft and Carnegie Mellon study on knowledge workers using generative AI found a measurable shift away from independent critical evaluation toward verifying AI output, with the largest reduction in critical thinking exactly in the workers most confident in the model.5 That is the shape of the problem. Confidence rises faster than competence falls, so the decay is invisible until the moment you need the unaided skill and it is gone.
The control is cultural and small. Two practices catch most of it. First, a standing rule that any junior team member presenting AI-assisted work must be able to defend the underlying reasoning without the tool open. Not always. Periodically, in normal review. Second, at least one recurring task per role that is done unaided, by policy, because the skill is load-bearing for the rest of the work. A lawyer who cannot draft an argument from scratch is not a lawyer with leverage. A lawyer who cannot draft an argument from scratch is a paralegal with a subscription.
You will not see this risk on a dashboard. You will see it the first time a leveraged person leaves and the team behind them cannot reproduce the work.
Shadow AI
The policy guide treated shadow AI as a procurement failure: people use unapproved tools because the approved tool does not exist, is worse, or was never communicated. That framing is still right and it is still where most of the prevention work lives. But there is a residual risk after policy that is worth naming as its own failure mode.
Even with a good policy, an approved tool, and a two-week request SLA, some fraction of your people will continue to use consumer accounts. Some because the consumer model is genuinely better at their task. Some because they have a personal habit. Some because they are working from a phone and the enterprise tool is not installed. The IBM 2025 figure is again the relevant one: 63 percent of organizations had no governance to prevent shadow AI proliferation.2 Even at the organizations that did, residual use was non-zero.
The control is detection plus disclosure. Detection is the same DLP and endpoint stack that catches data leakage, configured to alert on traffic to consumer AI domains, not just enterprise ones. Disclosure is a written rule that a fast self-report of an incident is met with documentation and no penalty, while a hidden one is treated as a security incident. The point of the disclosure rule is not forgiveness. It is incentive design. You cannot fix what you cannot see, and you cannot see what your people are punished for showing you. This is the same logic every mature security organization applies to phishing-click rates and vulnerability disclosure. It is not new. It is just newly applied to a surface most companies are still pretending does not exist.
The heuristic
If you remember nothing else.
- Four failure modes, in order: data leakage, wrong answers, over-reliance, shadow AI. If your AI risk program names other categories before these four, it is not your AI risk program. It is somebody’s slide.
- Each failure mode has a paired control. Approved tier plus DLP. High-risk-workflow rule plus named reviewer. Unaided-work practice plus defend-the-reasoning culture. Detection plus rewarded disclosure. One half of any pair is not a control.
- Telemetry beats policy. A control you cannot measure is a wish. The minimum telemetry is who is using which AI tool, on what surface, against which data class; a minimal event shape is sketched after this list.
- Reward fast disclosure. Treat concealment as the incident. This is the only sentence on culture you need.
- Kill switches matter more than approval gates. You will deploy AI things that fail. You need to be able to turn them off in an afternoon. Procurement contracts should include the off ramp.
- The risk you can name is not the risk that is coming. Reserve a small budget and a standing meeting for the failure mode you have not seen yet. The first agentic-workflow incident, the first model-update regression, the first vendor that disappears mid-quarter. Something on this list will happen this year and it is not in your register.
- No ethics committee. A named risk owner with a quarterly review and the authority to pull a tool off the approved list does the entire job an ethics committee was supposed to do, and actually does it.
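To make the telemetry bullet concrete, here is one minimal shape for the event record, sketched in Python with illustrative field names. The substance is the questions it answers, not the schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class DataClass(Enum):
    """Whatever classification scheme you already use; these tiers are placeholders."""
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"  # customer records, deal data, source code

@dataclass
class AIUsageEvent:
    """One observed interaction with an AI tool. Field names are illustrative."""
    timestamp: datetime
    user: str               # who
    tool: str               # which AI tool or domain, e.g. "claude.ai"
    surface: str            # e.g. "approved-enterprise", "consumer-web", "ide-plugin"
    data_class: DataClass   # highest classification touched in the interaction
```

If your logs cannot populate those five fields, that is the gap to close before any dashboard.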
What to do Monday morning
One thing.
Pull the last thirty days of endpoint or DLP logs filtered for traffic to known AI domains. The list is short and widely published. Every major chat tool, every major coding tool, plus the long tail of consumer wrappers your security team has probably already enumerated. Count distinct users.
That number is the only honest denominator you have. Compare it to the number of approved AI seats you are paying for. The gap is your shadow AI footprint. Pull the prompt or paste content where your stack captures it, sample twenty events, and read them. You will find one of three things. You will find people doing routine work in the wrong tool, which is a procurement problem you can fix in two weeks. You will find people moving sensitive data into a consumer tool, which is a control problem you can fix in a quarter. Or you will find nothing concerning, which means your detection is misconfigured and you have no controls at all.
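As a sketch of that pull, assuming your proxy or DLP export is a CSV with per-event timestamp, user, and destination fields (the column names and domain list are assumptions, not any particular vendor's schema), the counting and sampling steps fit in a few lines of Python:

```python
import csv
import random
from collections import Counter
from datetime import datetime, timedelta

# Illustrative shortlist; extend with the long tail your security team has enumerated.
AI_DOMAINS = ("chatgpt.com", "claude.ai", "gemini.google.com", "copilot.microsoft.com")

def shadow_ai_footprint(log_path: str, approved_seats: int, sample_size: int = 20):
    """Count distinct users hitting AI domains over the last 30 days and sample events to read.

    Assumes a CSV with columns: timestamp (ISO 8601, naive), user, destination_domain.
    Column names are hypothetical; adjust them to match your export.
    """
    cutoff = datetime.now() - timedelta(days=30)
    hits = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if datetime.fromisoformat(row["timestamp"]) < cutoff:
                continue
            if any(row["destination_domain"].endswith(d) for d in AI_DOMAINS):
                hits.append(row)

    distinct_users = {row["user"] for row in hits}
    print(f"Distinct users touching AI domains (30 days): {len(distinct_users)}")
    print(f"Approved seats paid for:                      {approved_seats}")
    print(f"Gap (rough shadow AI footprint):              {len(distinct_users) - approved_seats}")
    print("Top destinations:", Counter(r["destination_domain"] for r in hits).most_common(5))

    # The twenty events the section tells you to actually read.
    return random.sample(hits, min(sample_size, len(hits)))

# Example: events_to_read = shadow_ai_footprint("proxy_export_30d.csv", approved_seats=400)
```

The gap is a rough proxy, since some of those users hold approved seats, but it is a number produced from real traffic rather than a color on a matrix.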
In any of the three cases, you now have a real starting point. The risk register did not give you one. It was never going to.
The whole nine-guide arc, read end to end, makes a single argument. The leverage is real, the spend is misallocated, the tools are choosable, the talent is identifiable, the returns are observable, and the risks are containable with controls that already exist in every other part of your business. None of it requires a transformation. It requires a working week, a named owner, and the willingness to make the call.
Footnotes
1. Mark Gurman, “Samsung Bans Staff’s AI Use After Spotting ChatGPT Data Leak,” Bloomberg, May 2023.
2. IBM and Ponemon Institute, Cost of a Data Breach Report 2025. The 97 percent and 63 percent figures refer to organizations reporting AI-related security incidents in the survey population. Last verified 2026-04.
3. Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. 2023). Sanctions order issued June 2023.
4. Moffatt v. Air Canada, 2024 BCCRT 149 (Civil Resolution Tribunal of British Columbia). Decision issued February 2024.
5. Hao-Ping (Hank) Lee et al., “The Impact of Generative AI on Critical Thinking,” Microsoft Research and Carnegie Mellon University, CHI 2025. The finding most often cited is the negative correlation between confidence in the AI tool and engagement in independent critical evaluation among knowledge workers.