AI Agents Surge in Professional Tasks as New Model Nears 30% on Benchmark

By admin | Feb 06, 2026

Last month, I discussed Mercor's latest benchmark, which evaluates AI agents on professional tasks such as legal work and corporate analysis. At the time, scores were low: every major lab's model scored below 25%, which suggested that lawyers were safe from AI replacement for the time being. But AI capabilities can change significantly in just a few weeks.

This week's release of Opus 4.6 dramatically altered the rankings. Anthropic's new model scored just under 30% in one-shot trials and averaged 45% when allowed multiple attempts. The release introduced several new agentic features, including "agent swarms," which likely contributed to the improved performance on these multi-step tasks. Either way, the result is a substantial leap over the previous top scores and a sign that progress in foundation models is not slowing.

Mercor CEO Brendan Foody expressed particular astonishment, stating, "jumping from 18.4% to 29.8% in a few months is insane."
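
To put the quoted figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the two numbers from Foody's quote (18.4% and 29.8%); the function name `score_jump` is a hypothetical helper made up for illustration, not part of Mercor's benchmark tooling.

```python
def score_jump(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new_pct - old_pct            # absolute gain, in percentage points
    relative = points / old_pct * 100.0   # gain relative to the old score
    return points, relative


# Figures taken from the quote above.
points, relative = score_jump(18.4, 29.8)
print(f"Absolute gain: {points:.1f} percentage points")
print(f"Relative gain: {relative:.0f}%")
```

In other words, the jump is about 11.4 percentage points, or roughly a 62% improvement over the earlier score.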

At 30%, the model is still far from perfect, so lawyers aren't facing imminent replacement by machines. Still, they should feel considerably less secure than they did just a month ago.