This content originally appeared on DEV Community and was authored by Jeff Pegg
The promise of an autonomous AI agent that can take a high-level goal and execute it is arguably the holy grail of applied AI. Picture telling your computer “Plan and book a
vacation to Japan” and returning hours later to find flights booked, hotels reserved, and an itinerary crafted. The demos are compelling, the venture capital is flowing, and
every AI company now claims to be building “agentic AI.”
However, after building and deploying systems based on the current “agentic” stack, it’s clear we aren’t building agents. We’re building brittle, expensive, and high-latency
state machines managed by a stochastic parrot.
The Planning Fallacy
Most “agents” today use a simple ReAct (Reason+Act) loop: the model generates some reasoning, takes an action, observes the result, and repeats. This isn’t planning—it’s
myopic, iterative execution that breaks down with any complexity.
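To make that concrete, here is a minimal sketch of what a ReAct-style loop amounts to. This is illustrative only, not any particular framework's API; `call_llm`, the tool registry, and the step budget are assumed stand-ins:

```python
from typing import Callable, Any

def react_loop(
    goal: str,
    call_llm: Callable[[list[dict]], dict],   # returns {"action", "input"} or {"final"}
    tools: dict[str, Callable[[str], str]],   # tool name -> tool function
    max_steps: int = 10,
) -> Any:
    """Reason -> act -> observe -> repeat, with no lookahead beyond the next step."""
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = call_llm(history)                     # "Thought: ... Action: ..."
        if "final" in step:
            return step["final"]                     # the model decides it is done
        observation = tools[step["action"]](step["input"])
        # The only "planning" is whatever the model re-derives from this
        # ever-growing transcript on the next iteration.
        history.append({"role": "assistant", "content": str(step)})
        history.append({"role": "tool", "content": observation})
    raise RuntimeError("step budget exhausted")
```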
Real planning requires modeling the problem space, considering multiple paths, and reasoning about future states. Classical AI gave us formal planning languages like PDDL (Planning Domain Definition Language), whose planners can solve complex, multi-step problems optimally. What we call “agentic AI” today is closer to a chatbot with API access than a planning system.
Try asking your “agent” to plan a task with more than 5 interdependent steps. Watch it lose context, repeat actions, or get stuck in loops when the first approach doesn’t
work.
The State Management Catastrophe
A true agent needs robust memory and state management. Current systems “manage” state by stuffing conversation history and tool outputs into an ever-growing context window.
Here’s the math that VCs don’t want you to see:
- A moderately complex task (booking that Japan trip) might involve 20+ tool calls
- Each web search, API call, and reasoning step adds ~1,000 tokens
- Within hours, you’re at 50,000+ tokens per task
- At $0.06 per 1K tokens (GPT-4), that’s $3+ per task
- Most tasks fail and need restarts, multiplying costs
Scale this to enterprise workflows, and you’re looking at hundreds of dollars per successful automation. The economics simply don’t work for anything beyond high-value,
one-off tasks.
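For a rough sense of how this compounds, the sketch below prices a naive loop using the figures above (~1,000 tokens per step, $0.06 per 1K). One thing the list doesn't count, and an assumption of mine about how these loops are typically billed: the full history gets re-sent on every call, so the total grows quadratically with step count, which only makes the economics worse.

```python
# Back-of-the-envelope cost model for a naive agent loop.
# Assumptions (mine): ~1,000 tokens added per step, a ~2,000-token base prompt,
# full history re-sent on every call, flat $0.06 per 1K tokens.

TOKENS_PER_STEP = 1_000   # each search, API call, or reasoning step
BASE_PROMPT = 2_000       # system prompt + tool schemas
PRICE_PER_1K = 0.06       # GPT-4-era rate cited above

def estimate(steps: int) -> tuple[int, float]:
    """Return (total billed tokens, dollar cost) when history is re-sent each call."""
    billed, context = 0, BASE_PROMPT
    for _ in range(steps):
        billed += context              # the whole transcript goes back in
        context += TOKENS_PER_STEP     # plus the new observation next time
    return billed, billed / 1_000 * PRICE_PER_1K

for n in (5, 10, 20, 30):
    tokens, cost = estimate(n)
    print(f"{n:>2} steps: {tokens:>7,} tokens billed, about ${cost:.2f}")
```

Under these assumptions a 20-step run bills well over 200K tokens before counting retries and restarts.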
The Brittle Tool-Use Illusion
The magic trick of “agentic AI” is tool use—but strip away the marketing and it’s just LLM-generated API calls. The engineering reality is terrifying:
Schema drift kills everything. When Stripe updates their API (which they do monthly), your “agent” starts hallucinating deprecated endpoints. There’s no graceful degradation,
just silent failures.
Error handling is a joke. Most frameworks implement error handling as “retry with the error message included.” This works for typos, not fundamental misunderstandings of how
APIs work.
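Here is roughly what that “error handling” looks like in practice. A sketch, with `call_llm` and `execute_tool` as assumed stand-ins rather than any real framework's API:

```python
from typing import Callable, Any

def run_tool_call(
    task: str,
    call_llm: Callable[[list[dict]], dict],   # model proposes an API call
    execute_tool: Callable[[dict], Any],      # actually hits the API
    max_retries: int = 3,
) -> Any:
    """The 'retry with the error message included' loop described above."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_retries):
        proposed = call_llm(history)
        try:
            return execute_tool(proposed)     # execute it as-is
        except Exception as err:
            # The entire error-handling strategy: append the failure and
            # re-sample. Recovers from a typo'd field name; does nothing for a
            # fundamental misunderstanding of what the API does.
            history.append({"role": "tool", "content": f"Error: {err}"})
    raise RuntimeError("gave up after retries")
```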
Security is an afterthought. I’ve seen production “agents” hallucinate database deletion commands because the model confused a cleanup script with a backup procedure. The
industry’s answer? “Add better prompts.”
The Enterprise Reality Check
The hype doesn’t match enterprise adoption. Here’s what’s actually happening:
- Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to escalating costs and unclear business value
- IBM’s Marina Danilevsky is skeptical about 2025 being “the year of the agent,” noting we haven’t figured out ROI on LLMs generally
- Enterprise customers are in wait-and-see mode, with most saying they don’t want to be first to try agentic AI
The gap between demo and deployment is massive. What works in a controlled environment with cherry-picked examples falls apart when faced with real-world complexity, edge
cases, and cost constraints.
The “Agent Washing” Epidemic
The term “agent” is being slapped onto everything:
- Chatbots with function calling → “Conversational agents”
- Robotic process automation scripts → “RPA agents”
- Simple if-then workflows → “Intelligent agents”
This isn’t just marketing fluff—it’s actively harmful. It sets unrealistic expectations, leads to poor architectural decisions, and diverts resources from solving real
automation challenges.
What Real Agentic AI Would Actually Require
True autonomous agents need fundamental breakthroughs we don’t have yet:
New architectures beyond Transformers. Current models are fundamentally reactive, not goal-oriented. We need architectures designed for planning, not next-token prediction.
Native long-term memory. Not context windows or vector databases, but memory systems that can maintain state across sessions, learn from failures, and build genuine
understanding.
Robust self-correction. The ability to detect when something has gone wrong and recover gracefully, not just retry the same broken approach.
100x cost reduction. Until inference costs drop dramatically, “agents” will remain expensive toys for narrow use cases.
The Path Forward
This isn’t about being anti-AI or dismissing progress. The underlying capabilities—language understanding, code generation, reasoning—are genuinely impressive. But calling
current systems “agents” is like calling a calculator a mathematician.
If you’re building in this space:
- Be honest about current limitations
- Focus on narrow, well-defined tasks where brittleness is acceptable
- Build extensive error handling and human oversight
- Track actual costs and success rates, not just demo metrics (a minimal tracking sketch follows this list)
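Something as simple as the sketch below, with assumed field names and pricing rather than anything from a specific framework, is enough to start reporting cost-per-successful-task instead of demo numbers:

```python
# Minimal per-run logging so cost-per-successful-task can be computed later.
# Field names and the flat per-1K price are assumptions, not a standard schema.

import json, time

PRICE_PER_1K = 0.06  # swap in your model's actual rate

def log_run(path: str, task_id: str, succeeded: bool,
            tool_calls: int, tokens_billed: int, wall_seconds: float) -> None:
    """Append one agent run to a JSONL file."""
    record = {
        "ts": time.time(),
        "task_id": task_id,
        "succeeded": succeeded,
        "tool_calls": tool_calls,
        "tokens_billed": tokens_billed,
        "cost_usd": tokens_billed / 1_000 * PRICE_PER_1K,
        "wall_seconds": wall_seconds,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def cost_per_success(path: str) -> float:
    """Total spend divided by successful runs: the number that actually matters."""
    runs = [json.loads(line) for line in open(path)]
    wins = sum(r["succeeded"] for r in runs)
    return sum(r["cost_usd"] for r in runs) / max(wins, 1)
```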
The future of autonomous AI is exciting, but it won’t arrive by rebranding existing tools. It requires acknowledging that we’re still in the early innings, and the real work
is just beginning.
The author has built production systems using major agentic AI frameworks and currently works on AI infrastructure at Momentum3. Views are personal and based on hands-on
experience with current limitations.