OpenAI Launches GPT-5.4: Its Most Capable and Efficient Frontier Model for Professional Work
By admin | Mar 05, 2026 | 2 min read
OpenAI introduced GPT-5.4 on Thursday, describing it as "our most capable and efficient frontier model for professional work." Alongside the standard release, the model is offered in a reasoning-focused variant called GPT-5.4 Thinking and a high-performance version named GPT-5.4 Pro.
The API will support context windows of up to 1 million tokens, the largest context OpenAI has offered to date. The company also highlighted gains in token efficiency, noting that GPT-5.4 completes the same tasks with notably fewer tokens than earlier models.
Benchmark performance shows substantial improvements, with record scores on the OSWorld-Verified and WebArena Verified computer use benchmarks. Additionally, GPT-5.4 achieved 83% on OpenAI’s GDPval test for knowledge work. According to a statement from Mercor CEO Brendan Foody, the model also leads on Mercor’s APEX-Agents benchmark, which evaluates professional skills in law and finance.
“[GPT-5.4] excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis,” Foody stated, “delivering top performance while running faster and at a lower cost than competitive frontier models.”
The model advances OpenAI's ongoing work to reduce hallucinations and factual inaccuracies. Compared to GPT-5.2, GPT-5.4 is 33% less likely to make errors in individual claims, and its overall responses are 18% less likely to contain a mistake.
For the API launch, OpenAI redesigned how tool calling is handled, introducing a system called Tool Search. Previously, system prompts had to define every available tool upfront—a process that grew costly in tokens as tool libraries expanded. Now, models can fetch tool definitions only when needed, making requests faster and more economical in systems with many tools.
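To make the idea concrete, here is a minimal sketch of on-demand tool lookup. This is not OpenAI's actual Tool Search API; the names (`search_tools`, `build_request`) and the matching logic are illustrative assumptions. The point is the structural change: instead of attaching every tool definition to each request, a request carries only the definitions relevant to the user's message.

```python
# Hypothetical sketch of on-demand tool lookup; not the real OpenAI API.
# Tool specs, function names, and the keyword matcher are all assumptions
# chosen for illustration.

TOOLS = {
    "get_weather": {
        "description": "Look up the current weather for a city",
        "parameters": {"city": "string"},
    },
    "send_email": {
        "description": "Send an email message to a recipient",
        "parameters": {"to": "string", "body": "string"},
    },
    "query_db": {
        "description": "Run a read-only SQL query against the sales database",
        "parameters": {"sql": "string"},
    },
}

def search_tools(query: str, registry: dict) -> dict:
    """Return only the tool definitions whose description matches the query."""
    # Crude stopword filter: ignore very short terms like "the" or "in".
    terms = [t for t in query.lower().split() if len(t) > 3]
    return {
        name: spec
        for name, spec in registry.items()
        if any(term in spec["description"].lower() for term in terms)
    }

def build_request(user_message: str, registry: dict) -> dict:
    """Attach only the relevant tool definitions, not the full library."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "tools": search_tools(user_message, registry),
    }

request = build_request("What is the weather in Lisbon?", TOOLS)
# Only get_weather's definition rides along, keeping the prompt small
# no matter how large the full tool library grows.
```

The token savings scale with library size: a registry of hundreds of tools costs the same per request as a registry of three, since only the matched definitions are serialized into the prompt.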
A new safety evaluation has also been added to assess the model’s chain-of-thought—the internal commentary it provides during multi-step tasks. Researchers have expressed concern that reasoning models might misrepresent their reasoning process. Testing indicates this can occur under certain conditions, but OpenAI’s evaluation finds deception is less likely in the GPT-5.4 Thinking variant, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.”