What's Actually Working in Enterprise AI: A Reality Check from the Fortune 50 Floor

David Lanstein

Co-founder and CEO at Atolio

Adapted from a spring 2026 fireside chat at HumanX with David Lanstein (CEO, Atolio) and Doug Shean (Former Fortune 50 Global Head of Innovation). Full video below.

I recently sat down with Doug Shean for an unfiltered conversation about enterprise AI. Doug spent nearly a decade as Global Head of IT Innovation at a Fortune 50 CPG, where his job was effectively to filter the AI startup pitches that hit the company. He has since left to build his own startup in the agentic AI space. He has talked to OpenAI in their headquarters, met with Microsoft Research on Copilot tuning, sat down with Google on their internal agentic deployments, and watched the industry's first two waves of enterprise AI experimentation up close.

His estimate of how much of what large-enterprise vendors claim about agentic AI is actually running in production: about 3%.

That number is the headline, but it is not the most useful part of the conversation. The useful part is what the other 97% looks like, why it stays stuck, and which patterns actually work. This post pulls out the seven takeaways that I think are worth a CIO's attention.

(The full conversation runs about 40 minutes and is embedded above. I'd recommend watching it. The post below is a structured summary.)

1. The marquee agentic use case is coding, and even coding is fragile

When Doug asked Google how much of their internal agentic activity was real, the answer was: mostly coding. That tracks with what every other hyperscaler will tell you. Coding is the use case with the cleanest evaluation loop (does it compile, do the tests pass), the most enthusiastic users, and the most ambient demand.

And yet, in the past year, GitHub, AWS, and Cloudflare have all seen their reliability degrade from five nines to two or three nines. Case in point: GitHub is unusable on a plane now. Two nines of reliability means roughly three and a half days of outage per year (1% of 365 days), and three nines is still close to nine hours. That is the marquee agentic use case, the one that is supposed to be working best.
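The arithmetic behind "nines" is easy to check directly. A minimal sketch, assuming a 365-day year; the function name is illustrative and not from the conversation:

```python
# Yearly downtime budget implied by an availability target of "N nines".
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(nines: int) -> float:
    """Return the hours of downtime per year allowed by N nines of availability."""
    unavailability = 10 ** (-nines)  # 2 nines -> 0.01, 5 nines -> 0.00001
    return HOURS_PER_YEAR * unavailability


for n in (2, 3, 5):
    hours = downtime_hours_per_year(n)
    print(f"{n} nines: {hours:.4f} hours/year ({hours * 60:.1f} minutes)")
```

Two nines works out to about 87.6 hours a year, three nines to about 8.8 hours, and five nines to a little over five minutes.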

The point is not that coding agents are broken. The point is that the most mature, best-evaluated, most-resourced agentic use case in the world is still surfacing reliability problems at scale. If that is the state of the leading edge, the appropriate prior for "we have 6,000 agentic use cases in production" is skepticism.

2. Most enterprise agentic backlogs are dreamt up top-down, by people who don't do the work

A CEO I recently spoke with told me about a company claiming to have 6,000 agentic use cases. But when he asked where they were, the answer was "in a desk drawer, waiting for us to build them."

This is the dominant failure mode. Business leaders, looking at the technology, generate use case lists. Each item on the list is a pain point in isolation. None of them connects to a coherent process. If you deploy them, you've created 6,000 disconnected workflows on top of the workflows that already exist, which makes the toil problem worse, not better.

The shift Doug argues for, and that I see in our own customer conversations, is bottom-up. Go to the person who has been at the company five years. Ask what they spend their time on that they wish they did not have to do. That is the use case backlog. It is short, it is specific, and it tends to be invisible to the executives funding the AI program, because those executives have admins and VPs who absorb the toil before it reaches them.

The reframe is simple. Stop asking "what AI use cases would grow revenue." Start asking "what would my best people stop doing tomorrow if they could." The answers are different.

3. The SaaS UI is collapsing into the place people actually work

ServiceNow has dozens of buttons. Workday has hundreds of fields. Most enterprise approval workflows take three hours of PowerPoint and produce no decision. None of this is going to be defended by users.

What is emerging instead is a UI-less layer. The work moves to Teams, Slack, Outlook, or whatever conversational surface the user already lives in. The agent handles the API calls back to the system of record. The user is pinged asynchronously with the context they need ("here is the access request, here is what XYZ means, here is who the requester is, you need to approve or deny") and makes a single decision. The 30-field UI is bypassed.
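The single-decision pattern can be sketched roughly as follows. This is an illustration only: `ApprovalRequest`, `render_prompt`, `handle_reply`, and the `update_record` callback are hypothetical names, and the callback stands in for whatever API call the agent would make back to the system of record.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ApprovalRequest:
    request_id: str
    requester: str
    resource: str
    explanation: str  # plain-language note on what the resource is


def render_prompt(req: ApprovalRequest) -> str:
    """Build the one message the approver sees in their chat surface."""
    return (
        f"Access request {req.request_id}\n"
        f"Requester: {req.requester}\n"
        f"Resource: {req.resource} ({req.explanation})\n"
        "Reply 'approve' or 'deny'."
    )


def handle_reply(
    req: ApprovalRequest,
    reply: str,
    update_record: Callable[[str, str], None],  # hypothetical system-of-record call
) -> str:
    """Turn a chat reply into a single write against the system of record."""
    decision = reply.strip().lower()
    if decision not in {"approve", "deny"}:
        return "Please reply 'approve' or 'deny'."
    update_record(req.request_id, decision)
    return f"Recorded '{decision}' for {req.request_id}."
```

The approver never opens the underlying application. Everything they need is in the message, and the only input is the decision itself.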

This shift has two implications most enterprise CIOs are underweighting. First, the multi-year investment large enterprises have made in standardizing the look and feel of internal applications is becoming a stranded asset. Users do not want consistent internal UIs. They want fewer UIs. Second, the SaaS vendors who built their moats around UI complexity are exposed. The market has already started to price this in. Doug pointed out that the CEOs of Coca-Cola and Walmart specifically said that leadership is not currently equipped for where things are going with AI. That kind of public statement from CEOs at that scale does not happen unless the displacement risk is real.

The new HTTP agentic tags (the recently announced standard from Google, Microsoft, and others that extends HTTP for agent-native interaction) accelerate this. No more screen-scraping. Agents talk to systems of record directly, render dynamic visual answers (charts, mermaid diagrams, structured forms) into the chat surface, and the underlying UI becomes increasingly optional.

Five years out, the enterprise UI as we currently know it is going to look very different. Possibly mostly gone.

4. Data redundancy is the silent tax on every AI rollout

Every enterprise AI deployment runs into the same wall. The corpus is full of duplicates. Final_v12, Final_final, Final_final_v14, Final_FINAL_v46. Doug's framing from his P&G days: 30 or 40 versions of the same finance sheet, SharePoint compliance at maybe 20%, the rest scattered in personal copies. A CTO of one of our larger customers calls his Box tenant "a digital wasteland."

This is partly an incentives problem. Every SaaS vendor in the stack is paid to ingest more of your data, not to help you manage it down. Splunk's business model is volume. Box's business model is volume. The same logic applies to most knowledge platforms.

It is also a regulatory problem. In financial services and healthcare, the lawyers will not sign off on bulk deletion. Record retention rules cut against deduplication. The same memo that says "consolidate your corpus" is signed by people who cannot give permission to delete anything.

The opportunity is not "AI-driven data janitor," though that work exists and is valuable. The opportunity is to design enterprise search so that the act of using it nudges users toward the canonical version. Show people the most recently updated, most-collaborated-on copy. Surface authorship and history. Make the freshest version the easiest one to find. That changes the marginal incentive at the moment of creation, which is the only point in the lifecycle where you can prevent the next duplicate from being made.
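One way to picture "make the canonical version the easiest one to find" is a ranking over the duplicates of a single document. The sketch below is an assumption, not a description of any product's relevance model: it ranks by number of collaborators first and uses recency as the tie-breaker, and `DocVersion` and `canonical_first` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class DocVersion:
    path: str
    last_modified: datetime
    collaborators: int  # distinct people who have edited this copy


def canonical_first(versions: List[DocVersion]) -> List[DocVersion]:
    """Order duplicates so the most collaborated-on, freshest copy comes first."""
    return sorted(
        versions,
        key=lambda v: (v.collaborators, v.last_modified),
        reverse=True,
    )


copies = [
    DocVersion("personal/Final_final_v14.xlsx", datetime(2026, 3, 2), 1),
    DocVersion("sharepoint/finance/Forecast.xlsx", datetime(2026, 2, 27), 6),
    DocVersion("personal/Final_v12.xlsx", datetime(2026, 1, 15), 1),
]
for version in canonical_first(copies):
    print(version.path)
```

Putting collaboration ahead of recency is a deliberate choice here: a personal copy touched yesterday by one person should not outrank the shared copy six people maintain.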

This is why Atolio leans hard on the collaboration graph (who works with whom, who touched this document recently, who is the authority on this topic) rather than treating search as a pure text-relevance problem. The deduplication is downstream of the relevance model.

5. Inside regulated enterprises, the first wave of AI adoption already failed

The pattern repeats across the financial services, life sciences, defense, and healthcare CIOs I talk to. It looks like this:

Wave one: Get API access to a frontier model. Drop a chatbot in front of customers or employees. Discover that with too much autonomy, the chatbot will offer to sell a car for a dollar, or hallucinate a regulatory commitment, or leak sensitive data through an unexpected vector. Pull it back. Strip autonomy. End up with a glorified deterministic FAQ bot that is now technically running on an LLM but is not doing anything an LLM is good at.

Wave two: Realize the answer is in the middle. Give the model room to reason, give it the right context (the company's actual knowledge, not the open internet), give it the right memory, and put it inside a multi-agent harness where a coordinator agent breaks tasks into pieces and specialist agents handle them. This is the architecture that is actually shipping in regulated environments now.
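The coordinator-and-specialists shape described in wave two can be sketched in a few lines. Everything here is a stand-in: the specialists are plain functions where a real harness would call models and tools, the compliance stub always passes, and the names (`run_harness`, `SPECIALISTS`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Specialist = Callable[[str], str]


def retrieve(task: str) -> str:
    # Stub: a real specialist would query the company's knowledge layer.
    return f"context for '{task}'"


def draft(context: str) -> str:
    # Stub: a real specialist would prompt a model with the retrieved context.
    return f"draft based on {context}"


def compliance(text: str) -> str:
    # Stub: always passes; a real check would apply policy rules.
    return f"{text} [compliance: passed]"


SPECIALISTS: Dict[str, Specialist] = {
    "retrieve": retrieve,
    "draft": draft,
    "compliance": compliance,
}


def run_harness(
    task: str, specialists: Dict[str, Specialist], plan: List[str]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run each planned step in order and keep an audit trail of every output."""
    trail: List[Tuple[str, str]] = []
    payload = task
    for step in plan:
        payload = specialists[step](payload)
        trail.append((step, payload))
    return payload, trail


final, trail = run_harness(
    "summarize the Q3 vendor policy", SPECIALISTS, ["retrieve", "draft", "compliance"]
)
for step, output in trail:
    print(f"{step}: {output}")
```

The audit trail is the part that matters in a regulated setting: each step's output is recorded separately, so each step can be reviewed on its own.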

But notice what it requires. It requires the model to have access to internal knowledge. That means whoever runs that retrieval layer (the enterprise search and RAG layer underneath the agent) is, structurally, the company's AI access-control layer. The IBM 2025 Cost of a Data Breach Report found that 97% of organizations that experienced AI-related breaches lacked proper AI access controls. The retrieval layer is where those access controls have to live.

This is the architectural choice we made when we built Atolio. The platform deploys inside the customer's own cloud. The data does not leave. Document-level permissions from every source system are mirrored in real time so the model only ever retrieves what the asking user is already cleared to see. Customers bring their own LLM keys, which means token spend goes directly to Anthropic, OpenAI, Bedrock, or a self-hosted open model at provider rate-card, with the customer in control of which model handles which query.
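The core idea of permission-aware retrieval fits in a short sketch. This is not Atolio's implementation; it is a minimal illustration with hypothetical names (`Document`, `retrieve_for_user`) and a naive keyword match standing in for real relevance ranking. The point it shows is ordering: the permission filter runs before matching, so documents the user cannot see never become candidates.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_principals: FrozenSet[str]  # users and groups mirrored from the source system


def retrieve_for_user(
    query: str, user_principals: FrozenSet[str], corpus: List[Document]
) -> List[Document]:
    """Return matching documents the asking user is already cleared to see."""
    # Filter on permissions first; unreadable documents are never scored.
    visible = [d for d in corpus if d.allowed_principals & user_principals]
    terms = query.lower().split()
    return [d for d in visible if any(t in d.text.lower() for t in terms)]


corpus = [
    Document("d1", "Q3 revenue forecast", frozenset({"group:finance"})),
    Document("d2", "Q3 hiring plan", frozenset({"group:hr"})),
    Document("d3", "Q3 all-hands agenda", frozenset({"group:everyone"})),
]
alice = frozenset({"user:alice", "group:finance", "group:everyone"})
print([d.doc_id for d in retrieve_for_user("Q3", alice, corpus)])
```

Here a user in the finance and everyone groups gets d1 and d3 back, and the HR document is excluded before the query is evaluated against it.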

We did not build it this way to be different. We built it this way because every CIO we talked to (762 of them before we wrote a line of code) said the same thing: data cannot leave, model selection needs to stay with us, and the architecture has to enforce that rather than the contract promising it.

6. The buying advice for the next 12 months: short pilots, no long contracts

Doug's view, which matches what I tell CIOs as well: do a lot of POCs. Three or four months at most. Do not sign multi-year contracts in this category right now.

The reason is not that the vendors are bad. The reason is that the underlying capability is moving fast enough that anything you lock into for three years is likely to be commoditized inside that window. Anthropic ships a new model. The thing you paid a vendor to wrap around the old model is now available as a feature for free. Or close to free.

There is a second reason. The vendor landscape is partly populated with what Doug calls "pretty prompts": products that are essentially a prompt with a nice UI and a sales motion around them. They will demo well. They will not produce a case study because there is nothing in production. Asking for a named, in-production case study (not a logo wall, not a "we worked with") is the cheapest filter available.

Three contract terms worth fighting for if you do sign anything longer than 12 months: escalator caps at 5 to 7% (against an industry baseline of about 12% annual SaaS inflation per Vertice’s Inflation Index), rate-card stability for credits if the pricing uses credits, and exit-data SLAs defined in writing before signing.
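The gap between a capped escalator and the uncapped baseline compounds quickly. A minimal sketch, assuming annual compounding and a hypothetical 100,000 starting price; the function name is illustrative:

```python
def price_after(base: float, annual_increase: float, years: int) -> float:
    """Price after compounding a fixed annual increase for a number of years."""
    return base * (1 + annual_increase) ** years


BASE_PRICE = 100_000.0  # hypothetical year-zero contract value

for rate in (0.05, 0.07, 0.12):
    final_price = price_after(BASE_PRICE, rate, years=3)
    print(f"{rate:.0%} escalator: {final_price:,.0f} after 3 years")
```

After three renewals, a 5% cap leaves the price about 16% above where it started, while 12% annual increases leave it about 40% above.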

7. Watch equivariant encryption (and what comes after BYO model keys)

The closing topic of the conversation was forward-looking. Today, BYO model keys solve part of the data-leaving-perimeter problem: the customer's account talks to Anthropic's account, the customer can audit the traffic, the customer can rotate the keys. But the tokens still go over the wire.

The next layer is equivariant encryption (or its cousins, fully homomorphic encryption applied to inference). The idea is that the customer encrypts the token space, sends encrypted tokens to the model, the model produces encrypted outputs, and the customer decrypts. An interceptor cannot read either the prompt or the completion. Carnegie Mellon and Stanford both have active research lines here. It is not production-ready in 2026, but it is close enough that buyers should know it exists, because the moment it ships, the data-residency objection to commercial LLMs becomes much smaller.
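The algebraic property behind the name can be shown with a toy. This is emphatically not a secure scheme or a description of the research systems mentioned above: a secret permutation of token IDs plays the role of the key, a lookup table plays the role of the model, and all names are hypothetical. It only demonstrates equivariance, meaning that running the transformed model on encrypted tokens and then decrypting gives the same result as running the original model on plaintext.

```python
import random
from typing import List, Tuple

VOCAB_SIZE = 8  # toy vocabulary of token IDs 0..7


def make_key(seed: int) -> Tuple[List[int], List[int]]:
    """Return a secret permutation of token IDs and its inverse."""
    rng = random.Random(seed)
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    inverse = [0] * VOCAB_SIZE
    for plain, cipher in enumerate(perm):
        inverse[cipher] = plain
    return perm, inverse


# Toy "model": a fixed next-token table over plaintext token IDs.
PLAIN_MODEL = [(3 * t + 1) % VOCAB_SIZE for t in range(VOCAB_SIZE)]


def encrypt_model(model: List[int], perm: List[int], inverse: List[int]) -> List[int]:
    """Conjugate the model with the key so it maps ciphertext IDs to ciphertext IDs."""
    return [perm[model[inverse[c]]] for c in range(VOCAB_SIZE)]


def run(model: List[int], tokens: List[int]) -> List[int]:
    return [model[t] for t in tokens]


perm, inverse = make_key(seed=7)
cipher_model = encrypt_model(PLAIN_MODEL, perm, inverse)

prompt = [0, 5, 2]
encrypted_prompt = [perm[t] for t in prompt]           # customer side
encrypted_output = run(cipher_model, encrypted_prompt)  # provider side, ciphertext only
decrypted_output = [inverse[c] for c in encrypted_output]  # customer side

print(decrypted_output == run(PLAIN_MODEL, prompt))
```

A bare permutation like this would fall to frequency analysis immediately, which is part of why the real techniques are still research rather than product.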

Combine this with the fact that open models (MiniMax, Llama, others) running on four H100s are now hitting roughly the same performance as the frontier API models on most enterprise tasks, and the architectural picture in 18 to 36 months looks different. More enterprises will run their own GPUs. More inference will happen inside the customer's perimeter. The market for "we host your AI on multi-tenant infrastructure and charge you a premium for it" gets compressed from both directions: open models from below, encrypted inference from the side.

This is, again, why we built Atolio the way we did. The infrastructure pattern we are betting on (data stays in the customer's cloud, BYO model, customer-owned keys and prompts) is the one the rest of the market is starting to converge toward.

Where this leaves a Fortune 50 CIO

The honest summary of the conversation is this. About 3% of what gets claimed about enterprise agentic AI is actually running. The 97% that is not running is not running for predictable reasons: it was scoped top-down rather than bottom-up, it sits on top of duplicated data and disconnected processes, it was bought on long contracts that locked in yesterday's capability, and it sat in a regulatory or architectural posture that the security team would never approve.

The CIOs who are getting traction are doing the inverse on every dimension. Bottom-up use cases sourced from the people doing the work. Short pilots, not long contracts. Retrieval layers that respect document-level permissions and stay inside the customer's perimeter. Multi-agent harnesses with real context, not deterministic chatbots dressed up as AI. And a willingness to admit publicly that wave one mostly did not work, because that is the only way to design wave two honestly.

The full conversation with Doug is worth the 40 minutes. We get into the simulation research happening at Aru and similar startups, the population-modeling implications of Anthropic's emotional-vectors paper, and where Doug thinks the next inflection point lands. Watch it above, or find the chapter timestamps in the YouTube description.

If you are building or buying enterprise AI in a regulated environment and want to talk about what a permission-aware, in-your-cloud retrieval layer looks like, we are happy to show you Atolio in action. Book a demo.

Frequently asked questions

1. How much of enterprise agentic AI is actually running in production?

Per Doug Shean, former Global Head of IT Innovation at a Fortune 50 CPG, an informed estimate is roughly 3%. Most of what large vendors describe as "agentic use cases" are top-down concept lists rather than deployed workflows. Even coding, the most mature agentic use case in the world, is fragile: GitHub, AWS, and Cloudflare have all seen their reliability degrade in the past 12 months. The realistic prior for enterprise-wide agentic deployment claims should be skepticism, paired with a request for named, in-production case studies.

2. What is the difference between top-down and bottom-up enterprise AI use cases?

Top-down use cases come from executives or business leaders identifying pain points in isolation and asking AI to solve them. They tend to multiply into disconnected workflows and rarely make it past the pilot. Bottom-up use cases come from individual contributors and middle managers who can identify the specific work they wish they did not have to do. Bottom-up use cases tend to be specific, time-bounded, and well-suited to agentic automation, but they are harder to surface because the executives funding AI programs have support staff who absorb the toil.

3. Why is the SaaS UI going away in enterprise AI?

Enterprise SaaS applications (ServiceNow, Workday, Salesforce) accumulated UI complexity over a decade because each customer demanded new fields and workflows. With agentic AI, the work shifts to where users already are (Slack, Teams, Outlook). The agent handles API calls to the system of record. The user receives a summary and makes a single decision, bypassing the multi-field UI. New standards like agentic HTTP tags from Google and Microsoft accelerate this by giving agents structured access to systems of record without screen-scraping.

4. What is data redundancy and why does it block enterprise AI?

Data redundancy is the proliferation of duplicate, near-duplicate, and slightly outdated versions of the same content across enterprise systems (the "Final_v14" problem). It blocks enterprise AI because retrieval systems return inconsistent answers, models cite the wrong version, and users lose trust. The structural fix is to design search and AI systems so the canonical (most recent, most collaborated-on) version is the easiest to find, which changes the incentive at the moment of document creation rather than trying to clean up after the fact.

5. What is a multi-agent harness?

A multi-agent harness is an architecture where a coordinator agent decomposes a task into sub-tasks and dispatches them to specialist agents (one for retrieval, one for compliance checking, one for drafting). Each specialist has its own tools, memory, and constraints. The coordinator aggregates outputs and produces a final response or recommendation. Multi-agent harnesses are the dominant architecture for production enterprise AI in regulated environments because they allow auditable, decomposable workflows where each step can be evaluated independently.

6. How should a CIO buy AI in 2026?

The pattern emerging from successful enterprise buyers: run three- to four-month proof-of-concept pilots rather than signing multi-year contracts; require named, in-production case studies before purchase, not just logo walls; negotiate escalator caps at 5 to 7% against the industry baseline of 12% annual SaaS inflation; for credit-based pricing, insist on contractual rate-card stability; and define exit-data SLAs in writing before signing. The underlying capability is moving fast enough that anything locked in for three years is likely to be commoditized inside the contract window.

7. What is equivariant encryption and why does it matter for enterprise AI?

Equivariant encryption (and related techniques like fully homomorphic encryption applied to inference) allows a customer to send encrypted tokens to a language model, receive encrypted outputs, and decrypt locally. An interceptor cannot read either the prompt or the completion. Active research at Carnegie Mellon, Stanford, and elsewhere is bringing this toward production-readiness. When it ships, the data-residency objection to commercial LLM APIs becomes much smaller, because the model provider never sees the actual content of customer queries or completions.

8. What is Atolio's architectural approach to enterprise AI search?

Atolio is deployed entirely inside the customer's own cloud (AWS, Azure, GCP, or GovCloud) or on-premises. Data never leaves the customer perimeter, processing happens inside the customer's tenant, and document-level permissions from every source system are mirrored in real time. Customers bring their own LLM API keys, so token spend goes directly to Anthropic, OpenAI, Bedrock, or a self-hosted open model at provider rate-card. This is architecturally enforced rather than contractually promised, which matters for regulated industries, where the IBM 2025 finding that 97% of organizations that experienced AI-related breaches lacked proper AI access controls is operationally relevant.