Powered by Smartsupp

OpenAI Admits Persistent Prompt Injection Threat to AI Browsers



By admin | Dec 22, 2025 | 5 min read


OpenAI Admits Persistent Prompt Injection Threat to AI Browsers

While OpenAI continues to strengthen the security of its Atlas AI browser, the company acknowledges that prompt injection attacks—a method where AI agents are manipulated by malicious instructions concealed within web pages or emails—remain a persistent and enduring threat. This reality casts doubt on how safely such AI agents can navigate the open web. In a blog post published on Monday, OpenAI stated, “Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved’.” The post detailed the firm’s efforts to bolster Atlas’s defenses against these relentless attacks, conceding that the ‘agent mode’ in ChatGPT Atlas “expands the security threat surface.”

OpenAI introduced its ChatGPT Atlas browser in October, and security researchers quickly published demonstrations revealing its vulnerabilities. These demos showed that just a few words inserted into a Google Doc could alter the browser’s fundamental behavior. On the same day, Brave released a blog post explaining that indirect prompt injection presents a systemic challenge for all AI-powered browsers, including Perplexity’s Comet. OpenAI is not alone in recognizing the stubborn nature of these attacks. Earlier this month, the U.K.’s National Cyber Security Centre warned that prompt injection attacks targeting generative AI applications “may never be totally mitigated,” leaving websites exposed to potential data breaches. The agency advised cybersecurity professionals to focus on reducing the risk and impact of such attacks rather than attempting to eliminate them entirely.

Regarding its own approach, OpenAI stated: “We view prompt injection as a long-term AI security challenge, and we’ll need to continuously strengthen our defenses against it.” The company’s strategy involves a proactive, rapid-response cycle designed to identify novel attack methods internally before they can be exploited externally. This method is showing early promise in addressing what seems like a Sisyphean task.

This perspective aligns with that of competitors like Anthropic and Google, which also emphasize the need for layered, continuously tested defenses against persistent prompt-based attacks. Google’s recent efforts, for instance, concentrate on architectural and policy-level controls for agentic systems. However, OpenAI is distinguishing itself with its “LLM-based automated attacker.” This tool is essentially a bot trained using reinforcement learning to simulate a hacker searching for ways to deliver malicious instructions to an AI agent. The bot can test attacks in simulation, observing how the target AI would reason and respond, then refine its approach through repeated attempts. This internal access to the AI’s reasoning process theoretically allows OpenAI’s bot to identify vulnerabilities more swiftly than an external attacker could. It’s a standard practice in AI safety testing: creating an agent to rapidly uncover and test edge cases in a simulated environment. OpenAI noted, “Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” adding that it has “observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”

a screenshot showing a prompt injection attack in an OpenAI browser.
Image Credits:OpenAI

In a demonstration, OpenAI illustrated how its automated attacker placed a malicious email into a user’s inbox. When the AI agent later scanned the inbox, it followed the hidden instructions and sent a resignation message instead of drafting an out-of-office reply. However, following the security update, the company reports that “agent mode” successfully detected this prompt injection attempt and alerted the user. OpenAI emphasizes that while completely foolproof security against prompt injections is difficult, it is relying on large-scale testing and accelerated patch cycles to fortify its systems ahead of real-world attacks. An OpenAI spokesperson declined to specify whether the security updates have led to a measurable decrease in successful injections but confirmed that the firm has collaborated with third parties to harden Atlas against prompt injection since before its launch.

Rami McCarthy, principal security researcher at cybersecurity firm Wiz, notes that reinforcement learning is one method for adapting to attacker behavior, but it is only part of the solution. “Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access,” McCarthy explained. “Many current recommendations reflect that tradeoff. Limiting logged-in access primarily reduces exposure, while requiring review of confirmation requests constrains autonomy.”

These two principles are among OpenAI’s key recommendations for users to mitigate their own risk. A spokesperson added that Atlas is trained to seek user confirmation before sending messages or processing payments. The company also advises users to provide agents with specific instructions rather than granting broad access to sensitive areas like an inbox with open-ended directives.

“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” according to OpenAI. While the company states that protecting Atlas users from prompt injections is a top priority, McCarthy urges some skepticism regarding the return on investment for browsers operating in high-risk environments. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the tradeoffs are still very real.”




RELATED AI TOOLS CATEGORIES AND TAGS

Comments

Please log in to leave a comment.

No comments yet. Be the first to comment!