AI fails as autonomous employee in simulated company experiment

Despite recent progress, current AI struggles to carry out complex tasks in a simulated work environment


26 January 2026

Researchers at Carnegie Mellon University have conducted an intriguing experiment to explore the potential of artificial intelligence (AI) in a workplace setting. They created a simulated company staffed entirely by AI agents, each assigned a specific role such as financial analyst or software engineer. These agents were powered by popular large language models from various tech giants, including Anthropic’s Claude, OpenAI’s GPT-4o and Google’s Gemini.

To mimic real-world collaboration, the researchers introduced a separate platform representing human colleagues with whom the AI agents had to communicate for certain tasks. The results were telling: the AI agents struggled to complete most of their assigned tasks. Even Claude 3.5 Sonnet, the best-performing agent, managed to fully complete only 24% of its assignments, rising to 34.4% when partial completions were included. Other agents did even worse, with none achieving a completion rate of more than 10%.

The experiment exposed several key weaknesses in current AI technology. Many failures were due to the AI’s inability to understand nuanced instructions. A simple request to save a file with the .docx extension, for example, proved challenging because the AI did not recognise this as a Microsoft Word document format. Communication and social reasoning tasks also posed a significant hurdle for the AI agents.

On top of that, web navigation turned out to be particularly difficult, especially when pop-ups were involved. When confronted with complexities, the AI often resorted to shortcuts, skipping difficult steps and prematurely assuming that the task had been successfully completed.

The findings of this research highlight the limitations of current AI systems, despite their impressive speed and efficiency. While AI excels at narrowly defined tasks, it still lacks the autonomy and versatility needed for truly independent work. Human judgement, creativity and adaptability will remain indispensable in any workplace for the foreseeable future.

Business AM
