The Future of Agent Building: How Enterprises Will Build, Govern, and Trust AI Agents in 2026

David Lanstein

Co-founder and CEO at Atolio

Summary

Agent deployment has outrun the controls that make agents safe, useful, and affordable. The gap is less about model capability, and more about the operating layer underneath: identity, retrieval, observability, action enforcement, and lifecycle. Here’s a snapshot of what’s happening as companies scramble to deploy AI, and agents specifically, within their organizations:

Sprawl is here: Gartner projects more than 150,000 AI agents per Fortune 500 by 2028, up from fewer than fifteen in 2025, yet only 13% of organizations feel prepared.
Pilots aren't producing returns: MIT famously claims that 95% of enterprise GenAI pilots deliver no measurable financial impact (MIT NANDA, 2025). A Fortune 50 Global Head of IT Innovation estimates only 3% of enterprise agentic AI is actually in production.
Access controls are missing: 97% of organizations breached via AI lacked proper access controls (IBM Cost of a Data Breach, 2025). 71% of enterprises use AI to access core systems, only 16% effectively govern that access (Saviynt CISO AI Risk Report, 2026).
Costs are spiraling: 73% of enterprises overshot AI cost projections in 2026 (FinOps Foundation, 2026), and enterprise token consumption is up 13x since January 2025 (Elvex / FinOps Foundation, 2026).

This guide breaks down the nine structural failure modes deciding whether your agent program scales or stalls, spanning the ones every company already feels to those gaining critical mass and the blind spots most companies aren't yet aware of yet. For each, we define the failure mode, why it matters, what to demand from your stack, and where a permission-aware knowledge layer fits in solving the issue / preventing the risk.

‍

Failure mode	What's actually happening	What to demand from your vendor
1. Security and data exposure	Agents connect to email, chat, CRM, code, and the web with the credentials of whoever built them. Prompt injection, tool poisoning, indirect injection, and exfiltration are mainstream risks.	Query-time policy enforcement (not a UI-level filter); scoped data access per agent; an architecture where no data leaves your perimeter to be governed.
2. Reliability and quality failures	Agents that demo well loop, hallucinate tool calls, and fail silently in production. The benchmarks teams buy on don't predict real behavior.	Per-agent quality and latency telemetry; retrieval grounded in a single high-quality index; fallback and timeout patterns engineered for silent failure.
3. Knowledge fragmentation and stale context	Enterprise knowledge sits in disconnected silos with poor native search. Fan-out retrieval is slow, expensive, and often grounded on stale data.	Real-time updates (not nightly batch); cross-system ranking in one query; per-agent retrieval scoping.
4. The pilot-to-production gap	Pilots succeed on clean data and small scope. Production exposes connector, permission, and quality gaps that weren't designed for.	Production-grade evaluation tooling; reference architectures for workflow redesign; case studies that document time-to-production.
5. Identity and access for AI agents	Agents inherit a developer's full credential surface, break when that developer leaves, and expose far more than they need.	Native agent identity provisioning via your IdP; document-level ACL inheritance; clean lifecycle for provisioning, rotation, and decommissioning.
6. Observability, audit trail, and traceability	Telemetry is scattered across four-plus vendor dashboards.	One audit log across all agents, tools, and data sources; per-agent kill switches; queryable logs for compliance evidence.
7. Multi-agent orchestration and the agent control plane	Vendor "control towers" are stack-bound, cloud-bound, and action-bound. They govern their own platform and don't see the data the agent retrieved.	Stack-neutral governance across Copilot, Gemini, Claude, Agentforce, Bedrock, and custom agents; visibility into both data plane and action plane; native MCP and A2A support.
8. Runaway cost and per-agent finances	Token spend compounds with retrieval volume, not user count. Few enterprises can attribute spend per agent or cap usage at the right level.	Per-agent cost telemetry; usage caps and budget enforcement; architecture that minimizes redundant tool calls rather than billing per call; model and cloud flexibility (no vendor compute markup).
9. Sprawl, lifecycle, and decommissioning	Agents spin up in four lines of code and persist past their creators. Most enterprises run far more agents than they can name.	A registry every agent flows through; per-agent ownership and lineage tracking; clean revocation when an agent retires.

‍

Setting the scene: agents are racing ahead of the operating layer underneath

Three forces are pulling in incompatible directions.

Adoption is accelerating. Gartner predicts 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025. LangChain's State of AI Agents finds 51% of organizations already run agents in production and 78% plan to. MCP and A2A have moved from experimental to default protocols in under twelve months.
Trust is eroding. Gartner separately predicts more than 40% of agentic AI projects will be canceled by the end of 2027, citing immature governance and unclear ROI. The 2026 CISO AI Risk Report finds 71% of organizations use AI tools to access core business systems like Salesforce and SAP, but only 16% effectively govern that access.
Economics are breaking. Token spend has risen 13x since January 2025. One healthcare enterprise consumed over a trillion tokens in six months without a budget cap. Seat-based pricing models designed for a world of named users collapse the moment power users generate 80% of agent traffic, which recent enterprise usage data shows is now typical.

The disconnect isn't about whether to deploy agents. It's about what to deploy them on top of. The agents your teams are spinning up today are inheriting whatever access, permissions, retrieval, and observability you have, which in most enterprises means fragile OAuth connections, fragmented dashboards, ungoverned tool calls, and no audit trail. That's not an agent problem. It's a foundation problem, and it's solvable.

Helpful terms and definitions

The vocabulary in this space is still settling. Use this as your decoder.

AI agent vs. agentic AI: An AI agent is a runtime instance: an LLM plus memory, tools, and a control loop. Agentic AI is the umbrella discipline (architectures, protocols, governance). Gartner, IBM, Salesforce, and Microsoft now use these distinctions consistently. For simplicity, we use “AI agent“ and “agent“ interchangeably for the runtime, and “agentic AI“ when referring to the broader category.

Agentic AI vs. autonomous AI: Both describe goal-directed behavior. “Autonomous“ implies less human-in-the-loop; “agentic“ is the broader term most analysts have settled on.

MCP (Model Context Protocol): Anthropic's open protocol, now adopted across Claude, ChatGPT, Copilot, Gemini, and Cursor, for connecting agents to data sources and tools.

A2A (Agent-to-Agent Protocol): Google-originated, now stewarded by the Linux Foundation, for inter-agent communication.

Agent control plane / agent gateway: The governance and orchestration layer sitting between agents and the systems they touch, handling auth, policy, audit, and routing.

Non-human identity (NHI) vs. agentic identity: NHI is the umbrella term (service accounts, API keys, machine tokens). Agentic identity is the emerging subset for ephemeral, intent-scoped agent credentials. Identity vendors disagree on framing; both terms appear in this guide.

Context engineering vs. RAG: RAG (retrieval-augmented generation) is one technique. Context engineering is the broader practice of orchestrating retrieval, memory, tool descriptions, and permissions – the discipline that determines whether an agent grounds its answers correctly. We use it as the broader term.

Nine structural failure modes, and what good looks like

Most published content treats agent failure as a list of risks to manage. This guide groups failures by how widespread buyer awareness of the topic is, what's starting to get more executive attention in enterprises , and what almost nobody is talking about yet, but should be.

Each section follows the same structure:

what's actually happening
why it matters
what addressing it requires
what to demand from any vendor selling you a piece of the answer

Most common failure modes

These four failure modes are no secret to today’s security and business leaders evaluating and adopting AI. They're also where most damage is done in 2026. Addressing them well is table stakes.

Failure mode 1: Security and data exposure

What's happening? Agents now connect to email, chat, CRM, code, document stores, and browsers, often with the credentials of the user who built them. The attack surface has expanded faster than any control plane has been deployed to govern it. Prompt injection, tool poisoning, indirect injection via untrusted content, and data exfiltration through agent-driven web requests are no longer theoretical.

Why it matters: IBM's 2025 Cost of a Data Breach Report found shadow-AI breaches cost an average of $670,000 more than other breaches, and 20% of all breaches now involve shadow AI. OWASP's 2025 LLM Top 10 lists prompt injection as the #1 risk for the second year running.

How to address this: Limit blast radius before you scale. The most reliable mitigations:

Constrain what data each agent can reach to the minimum required for its task
Intermediate all tool calls through a policy enforcement point that can apply rule sets at query time
Treat agents as identities subject to the same least-privilege scrutiny as employees
Instrument agent outputs for data leakage patterns, not just inputs for jailbreaks

We've written a fuller breakdown of the security risks MCP introduces at enterprise scale, and what it leaves unsolved.

What to demand from your vendor:

A query-time policy enforcement layer (not a UI-level filter)
Detection and prevention of indirect prompt injection at the data layer
Documented network-level controls for what an agent can reach
An architecture where untrusted content is handled differently from trusted content

Prompt injection and exfiltration are not fundamentally model problems, they are blast-radius problems. The most reliable defense is a smaller surface area: tight data scopes per agent, rule sets and policy enforcement at every tool call, network-level controls on what an agent can reach, and an architecture where no data has to travel out of your perimeter to be governed. Atolio is built to keep both the retrieval layer and the policy enforcement layer inside the same security boundary as the data indexed. Because the index is yours, inherits your existing IAM, encryption, and EDR controls, and runs inside your perimeter, even a successful injection still has no external destination it can reach.

Failure mode 2: Reliability and quality failures

What's happening? Agents that demo perfectly fall over in production. Hallucinated tool calls, repetitive loops when stuck, silent failures that look like success, and inconsistent results on complex tasks are the operational reality once volume scales.

Why it matters: LangChain's State of AI Agents finds quality is the single largest blocker to production, cited by one-third of respondents. Reliability vendor analysis pegs the demo-to-production failure rate at 70-88%. Recent academic work has found 8 of 10 widely-used agent benchmarks have severe validity issues, with do-nothing agents passing just 38% of airline-task evaluations, meaning the benchmarks teams buy on don't predict production behavior.

How to address this: Build an eval harness that runs continuously against production traffic, not just CI. Engineer for silent failure: timeouts, retries, fallbacks, and human-in-the-loop checkpoints on high-stakes actions. Treat quality as an operational metric, not a launch metric.

What to demand from your vendor:

Per-agent quality and latency telemetry exposed via API
Native support for fallback and timeout patterns
Production-grade eval frameworks that go beyond benchmark scores

Agents fall over in production for the same reason every distributed system does: demo conditions don't survive contact with real traffic, or under the load of the full organization’s systems or demands. What survives is engineered: continuous evaluation against production data, retrieval grounded in a single high-quality index rather than fanned out across native search, and explicit fallbacks for silent failure. Atolio's Collaboration Graph returns a single well-filtered result based on who the user works with, what they work on, and what’s currently happening in the business. Unconstrained MCP search typically requires multiple verbose queries, which is the difference between an agent that grounds confidently and one that loops or returns generic, unhelpful results.

Failure mode 3: Knowledge fragmentation and stale context

What's happening? Enterprise knowledge lives in dozens of silos – email, chat, ticketing, CRM, code, wikis, file shares – none of which were designed to be queried by an agent. Native search inside those systems is poor; agents work around it with repeated queries that are slow, expensive, and often miss the right answer. Worse, the answer that does come back may be stale, contradicted by a more recent document the agent never saw.

Why it matters: This is the silent killer of agent ROI. McKinsey's State of AI 2025 finds only 5.5% of organizations are realizing real financial returns from AI. The single biggest predictor of who falls into that 5.5% is access to high-quality, real-time, permission-aware context. As VentureBeat put it, context architecture is replacing simple RAG as the new center of gravity in enterprise AI. We've documented the architectural choices and failure modes in more depth in our 2026 Enterprise RAG Guide.

How to address this: Treat context as a first-class architectural concern. Centralize on a single, permission-aware knowledge layer that is updated in real time and ranks across all source systems. Decouple retrieval quality from any single source system's native search.

What to demand from your vendor:

Real-time updates (not nightly batch)
Cross-system ranking (not federated search that just concatenates results)
A collaboration graph that understands who is asking
The ability to scope retrieval to a curated subset for any given agent

Stale or fragmented context is the silent killer of agent ROI. The fix isn't a better model or a faster vector store; it is a single permission-aware retrieval layer updated in real time, ranking across every source system in one query rather than concatenating results from each. Atolio is that layer: a unified index across 35+ enterprise systems that respects source-system ACLs and refreshes continuously as content changes, with the ability to scope what any given agent can see via collections. Where most agents fan out across native source-system search, agents grounded on Atolio retrieve once and ground confidently.

Failure mode 4: The pilot-to-production gap

What's happening? Most agent projects look great in pilot and die in production. The fail rate isn't a model problem, it's an organizational problem. Workflows don't get redesigned. Benchmarks don't predict real-world performance. The team that built the pilot can't hand it off. Compliance and security get involved too late. The pilot satisfies a champion but doesn't satisfy procurement.

Why it matters: MIT's NANDA report found 95% of GenAI pilots deliver no measurable financial impact. LangChain's State of AI Agents found 32% of agent pilots stall after pilot and never reach production, and 62% lack a clear starting point. McKinsey's State of AI 2025 found “AI high performers“ – the 6% of companies attributing real financial returns to AI – are 3.6x more likely to be redesigning workflows rather than dropping agents into existing processes. Recent academic research on enterprise evaluations identifies an agentic disconnect, popular benchmarks fail to predict real-world utility.

How to address this: Scope pilots to validate the hardest production conditions, not the easiest. Build evaluation harnesses on real production data, not benchmark datasets. Plan workflow redesign as part of the deployment, not as a future phase. Get security, compliance, and procurement involved at week one, not week twelve. Define ROI metrics – and kill criteria – before you start.

What to demand from your vendor:

Production-grade evaluation tooling (not benchmark scores)
Reference architectures for workflow redesign
Explicit deployment patterns that account for security and compliance
Case studies that document time-to-production, not just time-to-pilot

The dominant reason agent pilots stall is that the foundation under them does not scale: connectors built for one team do not survive enterprise rollout, permissions built for one identity do not survive turnover, and quality that held on clean data does not survive production messiness. What works is building on a layer that is already enterprise-grade from the start. Atolio is built for that, with connectors for all of enterprises’ most-used tools and platforms, real-time ACL inheritance from source systems, and air-gapped deployment available on day one.

Emerging failure modes: gaining critical mass

These failure modes are where agent builders are increasingly getting stuck. CISOs and business leaders are aware of the dynamics, but mature playbooks for triaging them don't exist yet. Enterprises who solve these in 2026 will be months ahead of their peers.

Failure mode 5: Identity and access for AI agents

What's happening? Agents act on behalf of users, but the identity model was designed for humans. When an agent gets connected to SharePoint or Salesforce using a developer's OAuth token, it inherits everything that developer can see, which is usually far more than the agent needs. Worse, that inheritance often extends to destructive operations the agent was never scoped to perform. CISOs in 2026 are finding agents with permission to delete virtual servers, drop database tables, or revoke user accounts, not because anyone authorized those actions but because the developer who built the agent happened to hold those rights.

When the developer leaves, the agent breaks. When the token expires, the agent dies silently. When the agent gets prompt-injected, the attacker inherits the developer's full access footprint.

Why it matters: Non-human identities now outnumber human identities by a factor of 90 to 144 in typical enterprises. The Cloud Security Alliance reports 92% of organizations say legacy IAM cannot effectively manage AI and NHI risks, and 78% have no formal policy for AI agent identity creation or removal. The field is only now formalizing, though the OpenID Foundation published its first framework for agentic AI identity in October 2025.

How to address this: Treat every agent as a first-class identity. Provision dedicated agent identities through your existing OIDC-based single sign-on (Okta, Microsoft Entra ID, Ping, Google) rather than reusing employee tokens. Apply least privilege at the agent level, scoped to the minimum data set the agent needs. Inherit document-level ACLs from source systems automatically. Build lifecycle management – provisioning, rotation, decommissioning – into the platform from day one.

What to demand from your vendor:

Native agent identity provisioning that integrates with your existing OIDC single sign-on (Okta, Entra, Ping), not a parallel identity store
Document-level permission inheritance from source systems (not just system-level access)
Ephemeral, intent-scoped credentials
Explicit separation between read scope and write/action scope, so an agent can retrieve from a source without inheriting destructive operations against it
Clean decommissioning that revokes all derived access

Agents inherit the identity model they were built on, which means most of them inherit a developer's full credential surface and break the moment that developer leaves. What works at scale is a dedicated identity per agent, provisioned through your existing OIDC SSO (Okta, Microsoft Entra ID, or Ping), scoped to a collection of content the agent is allowed to see, and inheriting document-level ACLs from source systems and enforced at the moment of every query. Atolio's permission model does this with collections-based scoping that survives staff turnover and reflects ACL changes from source systems in real time. Identity is decoupled from the agent's creator.

Failure mode 6: Observability, audit trail, and traceability

What's happening? Once something goes wrong with an agent, teams need to reconstruct what it did, what data it touched, and what actions it took. Today that's nearly impossible. Telemetry is scattered across Application Insights, Copilot Studio dashboards, Dataverse, model-provider logs, and the source systems the agent connected to, none of which speak the same schema. Silent failures don't trigger alerts. Compliance teams can't produce evidence for audits.

Why it matters: A 2026 EY/AIUC-1 Consortium study found only 38% of organizations monitor AI traffic end-to-end, and just 17% continuously monitor agent-to-agent interactions. The EU AI Act's high-risk provisions activate August 2, 2026, requiring traceability that most current deployments cannot produce.

How to address this: Centralize agent telemetry through a single control plane. Capture every retrieval, every tool call, and every action with a unified schema. Make the audit log queryable in natural language (“show me every agent that touched this customer record in the last 24 hours”). Build in kill switches and anomaly detection from the start.

What to demand from your vendor:

A single audit log spanning all agents, all tools, and all data sources
Real-time anomaly detection; queryable logs for compliance evidence
Kill-switch capability per agent
Documented log retention aligned to your regulatory requirements

Forensic visibility into agent behavior is hard because telemetry is fragmented across platforms that were never designed to share a schema. Closing the gap requires a single layer that records every retrieval and every action with consistent identity, time, and policy context, so a compliance investigator can reconstruct what happened without correlating four dashboards. Because Atolio sits between agents and the systems they touch, every retrieval and every proxied action lands in one audit log, with native kill switches and anomaly detection per agent. IT can ask Atolio “which agents fired today and what did they do” and get a real answer.

Failure mode 7: Multi-agent orchestration and the agent control plane

What's happening? Single-agent deployments are giving way to multi-agent workflows: one agent handing off to another, agents calling each other via A2A, MCP servers proliferating across the enterprise. Handoff visibility, coordination, and trust between agents are the new operational frontier. Every major vendor is rushing to launch a “control tower,” “control plane,” “orchestration layer”, or some variation of the phrase, but most are stack-bound (governing only what runs inside their own platform), cloud-bound (proxying through their own infrastructure), and action-bound (seeing what agents do, not what they retrieved).

Why it matters: Gartner's 2026 Hype Cycle for Agentic AI named the “agent control plane” a top emerging category. VentureBeat called it the next enterprise battleground. But every vendor's control tower assumes their stack: Microsoft's assumes Entra and Foundry; Salesforce's assumes the Data Cloud; ServiceNow's assumes Now Assist. We've broken down what ServiceNow's Otto actually does (and doesn't) in a CIO guide; similar questions apply to every other stack-bound control tower. Enterprises that aren't single-vendor end up with multiple control towers, but no consolidated view.

How to address this: Choose a control plane that is stack-neutral – one that can govern agents regardless of where they were built or which platform they run on. Demand visibility into the data layer (not just the action layer) so you can see what each agent fetched, not just what it did. Plan for the protocol war: pick a vendor that implements MCP, A2A, and emerging successors rather than locking you into one.

What to demand from your vendor:

Stack-neutral agent governance (works with Copilot, Gemini, Claude, Agentforce, Bedrock, custom agents, not just one)
Both data plane visibility and action plane control
Native MCP and A2A support
An audit log that spans agent-to-agent handoffs

Most enterprise “control planes” available today are stack-bound (governing only agents built on the vendor's platform), cloud-bound (proxying through the vendor's infrastructure), and action-bound (seeing what agents do, not the data they retrieved). A governable multi-agent estate needs the opposite: stack-neutral coverage, visibility into the data plane as well as the action plane, and deployment inside the same security perimeter as the data being indexed. Atolio is architected differently: both the data plane (the knowledge layer agents read from) and the control plane (the policy enforcement and in-platform governance layer) run inside your perimeter. The result is the only architecture that gives you both planes, and keeps everything inside your security boundary, so your governance investment isn't rebuilt the next time the protocol war shifts.

Failure mode 8: Runaway cost and per-agent costs

What's happening? Agents consume tokens at rates orders of magnitude higher than human users, and the cost curve compounds with longer contexts, multi-step reasoning, and verbose MCP responses. Most enterprises can't tell you what any single agent costs, because billing attribution across cloud SKUs, model providers, and vector stores doesn't roll up.

Why it matters: FinOps Foundation's 2026 State of FinOps reports 73% of enterprises exceeded their AI cost projections. Enterprise token consumption is up 13x since January 2025 (Elvex / FinOps Foundation, 2026), Uber publicly burned its annual AI budget in four months after rolling out Claude Code at scale, and our own clients report ServiceNow's own AI Control Tower demos have used 10,000 tokens for a single HR query. Knowledge workers lose 23% of every week to searching for and gathering information (McKinsey). We've quantified the broader cost of fragmented retrieval in our framework and ROI calculator.

How to address this: Attribute spend per agent from day one. Cap usage at the agent level. Use federated retrieval (one ranked query across systems) instead of fan-out MCP calls. Route lower-stakes work to smaller, cheaper models. Eliminate redundant tool calls by improving the quality of the first response.

What to demand from your vendor:

Per-agent cost telemetry
Usage caps and budget enforcement
Transparent token accounting
Architecture that minimizes tool-call volume rather than billing per call
Model and cloud flexibility: BYO LLM keys across providers, BYO cloud credits, so lower-stakes work can route to smaller, cheaper models without re-platforming

Agent token spend compounds with retrieval volume, not user count, which is why most enterprise financial models miss the curve until they are weeks past projection. Predictable cost requires per-agent attribution, a sharp reduction in redundant tool calls per task, and an architecture that does not stack a vendor compute markup on top of existing cloud spend. Atolio replaces fan-out MCP calls with federated retrieval, runs on your own cloud infrastructure and credits with no vendor compute markup, and supports BYO LLM keys across OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure-hosted OpenAI, watsonx.ai, and self-hosted Llama. The same agent can route premium reasoning to a frontier model and routine retrieval to an open-weight model running on your own GPUs. The cost lever isn't only fewer tokens, it's the right model for each token – while also eliminating redundant tool calls due to improved quality of the first response. Per-agent costs roll up natively without a custom tagging pipeline, and are managed from the start.

Failure mode 9: Sprawl, lifecycle, and decommissioning

What's happening? Agents are easier to spin up than to retire. A developer can stand up a useful agent in four lines of Python or one CLI command, and IT has no idea. When that developer leaves, the agent persists. When the team rotates, ownership disappears. By the time anyone audits the inventory, the typical enterprise has dozens to hundreds of agents running with no documented owner, no compliance evidence, no decommissioning plan, and no clear understanding of which ones are still load-bearing.

The friction shows up before sprawl does. Builders go to IT with one-off requests (”Can we connect this agent to that system? Are we allowed to give it access to this data?”) and IT becomes a bottleneck on every new connector, every new permission scope, every new identity. The same pressure that creates sprawl, smart people working faster than the approval pipeline, also stalls the agents that should have been built. Administrative overhead is the silent half of the sprawl story.

Why it matters: Gartner projects a typical Fortune 500 will manage more than 150,000 AI agents by 2028, up from fewer than fifteen in 2025, and only 13% of organizations feel adequately prepared. Recent identity research found 82% of organizations have discovered previously unknown AI agents on their network in the past year, despite 68% claiming high confidence in visibility. The discovery gap is enormous.

How to address this: Build an agent inventory before you need one. Tag every agent with an owner, a purpose, a data scope, and a sunset date. Enforce a registration step: agents that aren't registered don't run. Builders will route around any system that makes their work harder than going direct, so the registration path has to be the path of least resistance: faster connector approval, default-on identity provisioning, a native agent builder that ships permission-aware agents in minutes. Friction is the real failure mode here, not policy. Treat decommissioning as a first-class lifecycle event, not an afterthought. Apply the same AI-BOM (AI Bill of Materials) discipline that's emerging in NIST and EU regulator guidance.

What to demand from your vendor:

A registry that every agent flows through
Per-agent ownership and lineage tracking
Native lifecycle management (provision, rotate, decommission)
Clean revocation when an agent is retired

Agent sprawl is a lifecycle problem dressed up as a discovery problem. Solving it requires that every agent in your environment route through a single layer that ties access to an owner, a purpose, and a sunset, so registration becomes the default path and decommissioning revokes all derived access at once. Atolio's data and action proxy makes that the default flow, which gives IT a real-time inventory of what exists and what each agent can reach without a separate tagging pipeline.

The architecture that closes the gap

Most of what fails in the nine failure modes above traces back to the same missing piece: the operating layer between agents and your enterprise data. Vendors are racing to occupy that layer, but most of them deliver only part of it.

If you're building enterprise agents from scratch in 2026, the dependable build order is:

identity first (provision a dedicated agent identity scoped to the minimum data it needs),
retrieval second (ground every prompt against a permission-aware index, not native source-system search),
action third (proxy every tool call through a single policy enforcement point), and
observability woven through every step.

Skipping any of the four pushes the costs into future years, when re-engineering is harder than scaffolding.

There are roughly three patterns being deployed today.

Pattern A: Stack-bound platforms. Microsoft, Salesforce, Google, AWS, and IBM each offer end-to-end agent platforms that work brilliantly inside their stacks. If you're a single-vendor enterprise, this can work. Most enterprises aren't.
Pattern B: Control-plane-only governance. A host of new “agent gateway” platforms govern other companies' agents: they see what agents do, but they don't feed them data. They're a visibility layer without a knowledge layer.
Pattern C: SaaS knowledge platforms. Productivity-focused SaaS solutions have built knowledge layers for agents, but their SaaS architecture means your data has to leave your perimeter to be indexed. For regulated industries, that's a non-starter. We've broken down the architectural trade-offs in Self-Hosted Enterprise Search vs. SaaS: A 2026 Guide for Regulated Industries.

Atolio is architected for the fourth pattern: both planes, in your perimeter.

‍

‍

That means:

A permission-aware knowledge layer. A single index that spans every system your teams work in – Slack, Salesforce, GitHub, Confluence, Jira, SharePoint, Drive, Notion, ServiceNow, and 30+ more – that respects source-system permissions at query time. Agents (and humans) only see what they're authorized to see. No data leaves your perimeter to be indexed.
An action proxy. Agents take actions (create tickets, send emails, update records) through Atolio. Atolio relays those MCP calls using its own credentials and maintains a complete audit log of every retrieval and every action. IT gets visibility, kill switches, anomaly detection, and a single audit log spanning every agent and every system.
A native agent builder. Employees describe what they want (“email me a daily summary of what I missed yesterday”), and Atolio builds it. The resulting agents are permission-aware by default, shareable across the organization, and visible to IT.
Sovereign deployment. The entire platform – connectors, search index, control plane, agent builder – runs inside your AWS, Azure, GCP, or on-premise environment. Air-gapped operation is supported. There's no vendor cloud in the path. There's no compute markup; you use your own cloud credits. This is what regulated industries require and what most SaaS alternatives architecturally cannot do.
A collaboration graph. Atolio understands who is asking, what they work on, and who they work with, so retrieval is weighted by relevance to the specific person (or agent acting on their behalf), not generic keyword match.

The combination matters more than any individual piece. A control plane without a knowledge layer can't ground answers. A knowledge layer without permission enforcement can't be trusted with sensitive data. A SaaS architecture can't survive regulated industries. And a stack-bound platform can't govern the seven other agent platforms already in your environment.

A note on governance

AI agent governance has become a procurement category in its own right in 2026, pulled forward by the EU AI Act's August 2 high-risk provisions, NIST's AI Risk Management Framework Agentic Profile, and the Cloud Security Alliance's Agentic AI Governance work. The enterprises moving fastest treat governance as architectural rather than procedural: instead of writing a policy document and asking teams to comply, they make the only sanctioned build path the one that's already governed. Sanctioned identity, sanctioned retrieval, sanctioned action proxy, sanctioned audit log. Teams building inside those paths are governed by construction; teams building outside them are visible to IT immediately.

A word on protocols

MCP and A2A have moved from experimental to default in under twelve months, but they are early standards with active security gaps. MCP's tool poisoning, unauthenticated server exposure, and credential aggregation issues are now documented; A2A's inter-agent trust model is still being formalized. Betting on a single protocol today is a year-two re-architecture risk. The defensible move is to pick an operating layer that implements both natively, treats whichever the buyer sends as table stakes, and isolates protocol logic so successor standards (Agora, ANP, or whatever emerges next) can be slotted in without re-engineering the governance layer underneath. We've written a fuller breakdown of where MCP leaves enterprise security unsolved.

What changes in 2027

Three shifts are already visible.

Identity becomes infrastructure. The OpenID Foundation, NIST, and CSA frameworks for agentic identity are converging. By late 2027, expect mature standards and procurement requirements that mirror them. Treat your agent identity model as a 24-month bet.

Pricing models split. Seat-based pricing for agent platforms will fade as power-user concentration breaks the economics. Expect a mix of consumption-based, outcome-based, and hybrid models. Smart enterprises are negotiating flexibility now, before lock-in.

Control planes consolidate. The 2026 explosion of “agent control towers“ will rationalize. Enterprises that bet on stack-bound control planes will face the same multi-vendor governance problem they have today. Stack-neutral control planes will win the long game.

The shape of 2027 isn't a different category of problem. It's the maturation of the categories above.

FAQ

1. How do I give an AI agent access to enterprise data without leaking it?

Treat the agent as a dedicated identity scoped to the minimum data it needs, not the credentials of the human who built it. Route retrieval through a single permission-aware knowledge layer that enforces source-system ACLs at query time. Ensure that the index, the query path, and the audit log all live inside your security perimeter, not a vendor's SaaS cloud. Where high-stakes actions are involved, require human approval. The goal is to make leakage architecturally hard, not procedurally discouraged. IBM's 2025 breach data shows 97% of AI-related breaches involved orgs lacking proper access controls: the controls are the difference.

2. What permissions should an AI agent have?

The least required to complete its task, and no more. Most agents today inherit the full permissions of whoever built them, which is usually 10–100x more than needed. Best practice is to provision a dedicated identity per agent, scope it to a curated collection of content (specific SharePoint sites, Confluence spaces, Drive folders), and inherit document-level ACLs from those sources automatically. High-risk actions (financial transfers, deletions, external communications) should require human approval. Reassess permissions every 90 days; agents that no longer need access should lose it.

3. Why do enterprise AI agents fail in production?

LangChain's State of AI Agents finds 32% of agent pilots stall before reaching production. The dominant causes aren't model quality. They're related to (1) data: the agent works in pilot on clean data but breaks on production's messiness; (2) permissions: the agent inherits an over-broad token that breaks when the human leaves; (3) silent quality degradation: the agent looks like it's working until someone audits the output; (4) cost: the agent runs in a loop nobody noticed; and (5) workflow: the agent was dropped into a process that should have been redesigned.

4. How do I prevent prompt injection in production agents?

There's no single fix, but a layered approach works. (1) Constrain the data each agent can reach – most prompt injection requires the agent to read untrusted content; if the agent doesn't need email or web search, don't give it access. (2) Run rule-set policy enforcement on every tool call, not just every prompt. (3) Treat content from untrusted sources differently from trusted sources at the data layer. (4) Monitor agent outputs for exfiltration patterns (anomalous URLs, encoded payloads). (5) Require human approval on high-stakes actions. OWASP's GenAI Top 10 lists prompt injection as the #1 LLM risk for the second year; assume it will happen, and limit blast radius.

5. How do I stop AI agents from exfiltrating data?

Two layers matter. Architecturally: keep the data inside your perimeter so that even a successful prompt injection has nothing external to leak it to – no vendor cloud in the data path. Operationally: monitor agent network egress (anomalous outbound URLs, unusual data volumes), enforce least privilege so each agent has the smallest possible attack surface, and instrument outputs for data leakage patterns. The combination of architectural containment and operational monitoring closes most exfiltration vectors. Air-gapped deployment is the strongest version of this control.

6. What's the safest way to let an agent search internal documents?

Route the search through a permission-aware knowledge layer that (1) respects the document-level ACLs of the source systems where the documents live, (2) enforces those permissions at query time (not at index time), (3) lives inside your security perimeter, and (4) returns one well-ranked result instead of dozens of verbose snippets. Don't give the agent direct MCP access to each source system; centralize through the layer. This gives you a single audit log of what was retrieved and prevents the agent from accidentally reading documents the requesting user shouldn't see.

7. Do AI agents need a control plane?

Yes, but the term has been overloaded. A useful control plane does four things: (1) it governs which agents can run, with what permissions, against what data; (2) it captures a unified audit log across all agents and all systems; (3) it provides kill switches, anomaly detection, and per-agent budget caps; (4) it works across multiple agent platforms (Copilot, Gemini, Claude, custom) rather than being bound to one vendor's stack. Without a control plane, you can't safely scale beyond a handful of pilot agents. With one, you can govern hundreds of agents from one place.

8. How long does it take to build an enterprise AI agent?

A scoped first agent – one workflow, one data set, defined ROI – takes most enterprises 10–16 weeks from kickoff to production, but most of that time is not model work. It's connectors, permissions, identity provisioning, evaluation harnesses, and workflow redesign. Teams that try to compress this often pay it back later in remediation. Teams that already have a centralized knowledge layer and a governance pattern can cut this to 4–6 weeks, because the foundation work has already been done.

9. How many AI agents are running in my organization right now?

Almost certainly more than you think. Recent research found 82% of organizations discovered previously unknown agents in the past year, despite 68% claiming high confidence in visibility. Most enterprises in the F500 range have 10–50x more agents in production than their CIO can name. The single most useful first move is an inventory amnesty: registration without penalty in exchange for ownership tagging.

10. Why are so many AI agent projects getting canceled?

Gartner predicts more than 40% of agentic AI projects will be canceled by end of 2027, citing immature governance and unclear ROI. The pattern is consistent: a champion drives a pilot, the pilot impresses, but production scale exposes data quality issues, permission gaps, cost overruns, and lack of clear value attribution. The fix isn't a better agent, it's a better foundation. Enterprises that solve identity, retrieval, observability, and FinOps before scaling agents see materially higher production success rates.

Closing

The nine failure modes in this guide share a structural root cause. Agents today inherit whatever access, retrieval, identity model, observability, and lifecycle their host enterprise already runs on, and most enterprises were never set up for non-human users that act across systems with their own credentials, on their own clocks, at scale. The result is a foundation problem dressed up as an agent problem.

What good looks like is consistent across all nine: a permission-aware retrieval layer that holds source-system ACLs and enforces them at query time; a single audit log spanning every agent and every action; per-agent identity, scoped to the minimum data set required; cost telemetry that rolls up by agent; and an operating layer that lives inside the same security perimeter as the data it indexes. None of this is exotic, it just hasn't been the default in how most enterprise AI gets bought.

If you're working through any of this – whether a shadow agent inventory you can't fully see, a token bill you can't fully attribute, a multi-agent rollout you're trying to keep auditable, an IT team buried under one-off agent permission and connector requests they can't keep up with, or even just the foundational groundwork to enable a more agentified organization – we'd be happy to trade notes.

Done right, the foundation work doesn't only reduce risk and administrative overhead. It also unblocks the people who were going to build agents anyway, by giving them a path that's faster and infinitely safer than the unsanctioned one. The architecture is the enablement.

Book a 30-minute conversation with our team.

David Lanstein

Co-founder and CEO at Atolio

The Future of Agent Building: How Enterprises Will Build, Govern, and Trust AI Agents in 2026

Summary

Setting the scene: agents are racing ahead of the operating layer underneath

Helpful terms and definitions

Nine structural failure modes, and what good looks like

Most common failure modes

Failure mode 1: Security and data exposure

Failure mode 2: Reliability and quality failures

Failure mode 3: Knowledge fragmentation and stale context

Failure mode 4: The pilot-to-production gap

Emerging failure modes: gaining critical mass

Failure mode 5: Identity and access for AI agents

Failure mode 6: Observability, audit trail, and traceability

Failure mode 7: Multi-agent orchestration and the agent control plane

Failure mode 8: Runaway cost and per-agent costs

Failure mode 9: Sprawl, lifecycle, and decommissioning

The architecture that closes the gap

A note on governance

A word on protocols

What changes in 2027

FAQ

1. How do I give an AI agent access to enterprise data without leaking it?

2. What permissions should an AI agent have?

3. Why do enterprise AI agents fail in production?

4. How do I prevent prompt injection in production agents?

5. How do I stop AI agents from exfiltrating data?

6. What's the safest way to let an agent search internal documents?

7. Do AI agents need a control plane?

8. How long does it take to build an enterprise AI agent?

9. How many AI agents are running in my organization right now?

10. Why are so many AI agent projects getting canceled?

Closing

Get the answers you need from your enterprise. Safely.