AI agents are entering their rebuild era as enterprises confront the reliability problem

May 29, 2026 - 19:17

As enterprise AI agents move into production, organizations are confronting a growing reliability problem. Many teams are discovering that LLM performance alone does not determine whether agents succeed in production. Long-running AI workflows must survive crashes, preserve state, recover from failures, manage inference costs, and coordinate across APIs, tools, and enterprise systems.

After a first wave focused on rapid deployment, organizations now need to revisit those first-generation implementations and redesign early agent architectures around workflow orchestration, observability, governance, and recovery, said Preeti Somal, Senior VP of Engineering at Temporal Technologies, during the latest AI Impact Series event in New York.

“We do have a lot of customers that come to us where they’re building version 2.0 of the same agent,” Somal said. “They had to move really fast, but they didn’t take care of the plumbing. Things crash and burn, and then they’re back to rebuilding with the reliable foundation.”

For workflow orchestration company Temporal, whose infrastructure predates the current wave of agentic AI, the shift reflects a broader enterprise realization: production AI systems require durable execution, state management, visibility into workflows, and mechanisms to recover when models or downstream systems fail.

Agentic AI has supercharged familiar engineering problems

“These patterns aren’t necessarily new,” Somal said. “AI just supercharges them.”

Agentic systems introduce additional complexity because they often involve long-running, multi-step processes spanning multiple services, models, APIs, and tools. A single workflow might call several large language models, access retrieval systems, trigger external applications, and manage state over hours or days. The engineering questions, Somal said, often emerge only after deployment.

“People will write agents but haven’t thought about what happens if the agent crashes,” she said. “Am I going to need to run the entire agent flow again?”

For enterprises operating under cost constraints, the answer matters. Restarting workflows after failures can multiply inference expenses, increase latency, and create poor customer experiences.

Somal compared the current moment to an earlier period in enterprise cloud adoption, when organizations rushed to migrate workloads before considering that they would need to redesign the underlying architectures for those workloads to hold up over the long term.

“This rush to do AI in a world where you haven’t even modernized your application reminds me a little bit of that lift-and-shift that happened in the cloud,” she said. “Everybody realized you’re spending more money on cloud and we haven’t gotten value there.”

Why long-running agents force a new architecture

Enterprise workflows increasingly involve agents executing over long windows, sometimes spanning many hours, while interacting with tools and systems. Reliability challenges compound when workflows persist over time, and they affect both state and memory, two ideas that are often treated interchangeably in AI conversations.

State concerns workflow execution. It includes where an agent is in a process, which actions have already completed, and where recovery should resume after failure. Memory, or context, captures the information an agent carries forward across interactions or tasks.

“The state of the agent is around what step and what actions have been performed, and if something crashes, where do you want to recover from, versus the context and memory piece,” Somal explained.
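The distinction can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not Temporal's data model: the class names `WorkflowState` and `AgentMemory`, the step names, and the stored fact are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkflowState:
    """Execution state: which steps have finished and where to resume."""
    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def mark_done(self, step: str) -> None:
        self.completed.append(step)

    def resume_point(self) -> Optional[str]:
        # The first step that has not completed, or None if the run is finished.
        for step in self.steps:
            if step not in self.completed:
                return step
        return None


@dataclass
class AgentMemory:
    """Context the agent carries forward, independent of its position in the flow."""
    facts: Dict[str, str] = field(default_factory=dict)


# Hypothetical three-step workflow.
state = WorkflowState(steps=["fetch", "summarize", "notify"])
memory = AgentMemory()

state.mark_done("fetch")
memory.facts["preferred_language"] = "Spanish"

print(state.resume_point())  # summarize
print(memory.facts)
```

After a crash, only `state` answers the question of where to pick up. `memory` would be reloaded as context for whichever step runs next.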

That distinction becomes increasingly important when enterprises begin moving beyond simple chatbot interactions toward longer-running business processes. Somal pointed to a healthcare example involving customer Abridge, where workflows process physician visits through multiple stages, including audio processing, summarization, model calls, and after-visit generation.

“There’s not just one piece to that flow,” Somal said. “Taking videos and slicing that, taking summaries, calling the LLMs, generating the after-visit summary, all of that is being orchestrated.”

The implication for enterprises is that successful agents increasingly depend on systems that can survive interruptions, coordinate across services, and maintain continuity over time.

The rise of the deterministic spine

A useful framework for enterprise AI design is the deterministic spine, Somal said, which is how Temporal thinks about its own role.

“It is denoting the path you want to take,” she said. “It is calling the brain, but if the brain doesn’t respond, it will call it again. If the brain responds but the next step is going to fail, it will pick up from where that failure happened.”

In this framing, the language model acts as a probabilistic system producing variable outputs, while orchestration software maintains execution reliability around it. The concept matters because enterprise systems increasingly require consistency even when models remain non-deterministic. A procurement workflow, healthcare summary, customer support escalation, or compliance process cannot simply fail silently because a model call timed out or an external dependency crashed.

“What you care most about is making sure that you can recover and that you’re not paying the token tax if something goes wrong,” Somal said.
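The two behaviors Somal describes, calling the model again when it does not respond and resuming from the failed step rather than the first one, can be sketched in a few lines of plain Python. This is a simplified illustration and not Temporal's API. The `run_workflow` function, the JSON checkpoint file, and the `FlakyModel` stand-in are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class FlakyModel:
    """Hypothetical stand-in for an LLM call that times out on its first attempt."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("model did not respond")
        return f"answer to: {prompt}"


def run_workflow(steps, checkpoint: Path, max_attempts=3):
    """Run steps in order, retrying each one and skipping steps already checkpointed."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # result already recorded, so it is not paid for again
        for attempt in range(1, max_attempts + 1):
            try:
                done[name] = fn()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; earlier steps stay checkpointed
        checkpoint.write_text(json.dumps(done))
    return done


model = FlakyModel()
downstream = {"up": False}  # simulated outage in a downstream system


def call_model():
    return model("summarize the ticket")


def file_ticket():
    if not downstream["up"]:
        raise RuntimeError("downstream system unavailable")
    return "ticket filed"


steps = [("call_model", call_model), ("file_ticket", file_ticket)]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_workflow(steps, checkpoint)
except RuntimeError:
    pass  # first run dies at file_ticket; call_model is already checkpointed

downstream["up"] = True
result = run_workflow(steps, checkpoint)  # resumes at file_ticket

print(model.calls)  # 2: one timeout and one success, none on resume
print(result["file_ticket"])  # ticket filed
```

The model is invoked twice in total, once for the timeout and once for the retry. The second run reads the checkpoint and goes straight to `file_ticket`, so the crash costs no additional model calls.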

Reliability, visibility, and the economics of token spend

As enterprise leaders evaluate AI ROI, cost visibility has become a growing concern. Long-running agents frequently make multiple model calls across complex workflows, which can create opaque spending patterns. Somal described one operational advantage of orchestration as visibility into where costs accumulate. Because workflows are observable step-by-step, teams can see where tokens are being consumed across an agent process.

“You’ve got visibility into that entire flow in a single pane of glass,” she said. “You can now see where you’re spending the tokens in an agent that is multiple steps and calling multiple different systems.”
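As a rough illustration of step-level cost visibility, the sketch below attributes token counts to named workflow steps and reports the most expensive first. The `TokenLedger` class, the step names, and the token figures are hypothetical and do not describe any specific product's reporting.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates token usage per workflow step (illustrative only)."""

    def __init__(self):
        self.by_step = defaultdict(int)

    def record(self, step, tokens):
        self.by_step[step] += tokens

    def report(self):
        # Steps sorted from most to least expensive.
        return sorted(self.by_step.items(), key=lambda kv: kv[1], reverse=True)


ledger = TokenLedger()
ledger.record("retrieve", 1200)
ledger.record("summarize", 5400)
ledger.record("summarize", 600)  # a second model call within the same step
ledger.record("draft_reply", 2100)

for step, tokens in ledger.report():
    print(f"{step:<12} {tokens}")
```

With these made-up numbers, `summarize` accounts for 6,000 of the 9,300 tokens, which is the kind of concentration a per-step view is meant to surface.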

Workflow recovery also shapes cost efficiency. Without durable orchestration, a late-stage failure can force organizations to rerun an entire process from the beginning, including all prior model calls. Somal said systems designed around recovery can resume execution from the point of interruption.

“You pick up from where the crash happened,” she said. “We save you the cost of running the agent from step one again.”
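A back-of-the-envelope comparison shows why the resume point matters. The per-step token counts below are invented for illustration, and the helper `tokens_to_finish` is hypothetical. It assumes the failed step has to be run again in either case.

```python
def tokens_to_finish(step_tokens, failed_index, resume):
    """Tokens needed to complete a run after a failure at step `failed_index`."""
    start = failed_index if resume else 0
    return sum(step_tokens[start:])


# Hypothetical token cost of each of four steps.
step_tokens = [1200, 6000, 2100, 800]

# The failure occurs at the last step (index 3).
rerun = tokens_to_finish(step_tokens, failed_index=3, resume=False)
resumed = tokens_to_finish(step_tokens, failed_index=3, resume=True)

print(rerun)    # 10100: every step is paid for a second time
print(resumed)  # 800: only the failed step is repeated
```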

Enterprises need to build paved paths and enlist partner expertise

Governance concerns are another emerging pattern as agentic AI takes hold. Rather than adopting fully managed agent systems wholesale, Somal said, enterprises increasingly want standardized internal frameworks that provide guardrails while preserving flexibility and that build in necessary features such as governance controls, model selection policies, identity systems, cost management, and observability.

“The enterprises are looking at building these paved paths,” she said. “Taking something off the shelf is maybe not going to work because there are all of these other requirements.”

As organizations revisit first-generation deployments, challenges like these increasingly look less like a model problem and more like a systems engineering problem. Temporal is positioned to help enterprises take this next step in part because, for many organizations, it was already in place as part of broader modernization programs before AI became a strategic priority.

“Temporal is already in the enterprise,” Somal said. “Taking that and extending that to AI and agent platforms feels very natural.”