AI · 8 min read · 8 September 2023

LLM Agents in 2023: Impressive Demos, Harder Reality

AutoGPT went viral in April 2023. Agents that could autonomously break down goals into tasks and execute them looked like the next leap. Six months later, the gap between demo and production was clear.

AI Agents · LLM · AutoGPT · AI

AutoGPT went viral in April 2023 and for a few weeks it seemed like autonomous AI agents were imminent. The demos showed impressive things: giving GPT-4 a goal, watching it break it into tasks, execute code, search the web, and iterate toward the objective. The GitHub repository accumulated 150,000 stars faster than almost anything before it.

I spent time that spring trying to use AutoGPT for real tasks, not demos. The gap between what the demos showed and what actually worked was large.

The fundamental problem was reliability, and it compounded multiplicatively. If each step in a ten-step task had a 90% success rate, the whole task succeeded only about 35% of the time (0.9 to the tenth power). In practice, individual agent steps often succeeded less than 90% of the time, and agent tasks often required more than ten steps. Long-horizon autonomous tasks failed most of the time.
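The compounding arithmetic is worth making concrete. A minimal sketch (the rates and step counts are illustrative, matching the 90%-per-step example above):

```python
def task_success_rate(step_success: float, num_steps: int) -> float:
    """Probability that every one of num_steps independent steps succeeds."""
    return step_success ** num_steps

# Ten steps at 90% each: the whole task succeeds only ~35% of the time.
print(f"{task_success_rate(0.90, 10):.0%}")

# Longer chains with flakier steps degrade much faster.
print(f"{task_success_rate(0.80, 20):.1%}")
```

The model assumes step failures are independent, which understates the problem: in practice one bad step often poisons the steps that follow.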

The failure modes were also unpredictable. An agent might interpret a subtask incorrectly and pursue something that looked reasonable from its perspective but was not what you wanted. Without human oversight, these errors compounded: the agent would make a wrong decision, build subsequent decisions on top of it, and produce outputs that were confidently wrong.

The cost was significant. Running an autonomous agent that made dozens of LLM calls to complete a task cost meaningful money in API fees. The tasks that worked reliably with agents were often tasks that could be done with a single well-crafted prompt at a fraction of the cost.
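A back-of-envelope comparison makes the cost point concrete. All numbers here are illustrative assumptions, not actual 2023 API prices:

```python
# Assumed blended price per 1K tokens and average tokens per call --
# both are hypothetical figures for illustration only.
COST_PER_1K_TOKENS = 0.06
TOKENS_PER_CALL = 4_000

def run_cost(num_calls: int) -> float:
    """Estimated dollar cost of a run making num_calls LLM calls."""
    return num_calls * TOKENS_PER_CALL / 1_000 * COST_PER_1K_TOKENS

agent_cost = run_cost(40)   # a long agent run with dozens of calls
single_cost = run_cost(1)   # one well-crafted prompt
print(f"agent run ≈ ${agent_cost:.2f}, single prompt ≈ ${single_cost:.2f}")
```

Under these assumptions the agent run costs roughly forty times the single prompt, and that multiplier grows with every retry and self-correction loop the agent performs.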

What did work in 2023 was agents with narrow, well-defined scopes. A code review agent that examined a pull request and identified issues. A data extraction agent that processed documents with a defined format. An agent that automated a specific, well-understood workflow. The constraint was essential: narrow scope, defined success criteria, human oversight for anything consequential.

The lesson from 2023's agent experiments was that the enabling technology was real but the engineering challenges of making agents reliable, cost-effective, and safe were not solved. The research investment in agentic AI was enormous and the 2024 and 2025 improvements were significant. But in 2023, the honest answer to "can we use agents for this?" was usually "not reliably, not yet".
