“`html

As OpenAI works to strengthen the security of its Atlas AI browser, the company openly acknowledges that a major risk will persist: prompt injection attacks. These cyberattacks manipulate AI agents to follow malicious instructions, often hidden in web pages or emails. This admission raises fundamental questions about the ability of artificial intelligence agents to operate safely on the open web.
Prompt injections: a permanent threat to AI browsers
“Prompt injection, much like scams and social engineering on the web, will probably never be completely ‘solved’,” OpenAI wrote in a blog post published Monday, detailing how the company is strengthening Atlas protection against these relentless attacks. The company admitted that “agent mode” in ChatGPT Atlas “expands the security threat surface.”
OpenAI launched its ChatGPT Atlas browser in October 2025, and security researchers immediately rushed to publish their demonstrations. They showed it was possible to write a few words in Google Docs capable of altering the behavior of the underlying browser. That same day, Brave published a blog post explaining that indirect prompt injection is a systemic challenge for AI-powered browsers, including Perplexity’s Comet.
International recognition of the security problem
OpenAI is not alone in recognizing that prompt-based attacks will not disappear. The UK National Cybersecurity Centre warned in early December that prompt injection attacks against generative AI applications “may never be fully mitigated,” exposing websites to risks of large-scale data breaches. The British government agency advised cybersecurity professionals to reduce the risk and impact of prompt injections, rather than thinking that attacks can be “stopped.”
For its part, OpenAI states: “We consider prompt injection a long-term AI security challenge, and we will need to continuously strengthen our defenses against it.”
OpenAI’s strategy: an AI-based automated attacker
The company’s response to this Sisyphean task? A proactive cycle and rapid response that the company claims shows early promise for discovering new attack strategies internally before they are exploited “in the wild.”
This is not entirely different from what rivals like Anthropic and Google have said: to combat the persistent risk of prompt-based attacks, defenses must be multi-layered and continuously tested under stress. Google’s recent work, for example, focuses on architectural and policy controls for agentic systems.
A bot trained to play the role of hacker
But where OpenAI takes a different approach is with its “LLM-based automated attacker“. This attacker is essentially a bot that OpenAI has trained, using reinforcement learning, to play the role of a hacker seeking ways to slip malicious instructions to an AI agent.
The bot can test the attack in simulation before using it for real, and the simulator shows how the target AI would think and what actions it would take if it saw the attack. The bot can then study this response, adjust the attack, and try again and again. This view of the target AI’s internal reasoning is something outsiders don’t have access to, so in theory, OpenAI’s bot should be able to find vulnerabilities faster than a real-world attacker.
“Our attacker trained with reinforcement learning can direct an agent toward executing sophisticated, long-horizon harmful workflows that deploy over dozens (or even hundreds) of steps,” OpenAI wrote. “We have also observed new attack strategies that did not appear in our human red teaming campaign or in external reports.”

Concrete demonstration of an injection attack
In a demonstration (partially illustrated above), OpenAI showed how its automated attacker slipped a malicious email into a user’s inbox. When the AI agent subsequently scanned the inbox, it followed the hidden instructions in the email and sent a resignation message instead of drafting an out-of-office response. But following the security update, “agent mode” was able to successfully detect the prompt injection attempt and report it to the user, according to the company.
The company claims that while prompt injection is difficult to secure infallibly, it relies on large-scale testing and faster patch cycles to strengthen its systems before they appear in real-world attacks.
Recommendations for limiting risks
An OpenAI spokesperson refused to share whether Atlas’s security update resulted in a measurable reduction in successful injections, but claims the company has been working with third parties to strengthen Atlas against prompt injection since before launch.
OpenAI also suggests that users give agents specific instructions, rather than providing them access to their inbox and telling them to “take any necessary action.”
“Broad latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” according to OpenAI. The company indicates that Atlas is also trained to obtain user confirmation before sending messages or making payments.
Necessary skepticism according to cybersecurity experts
Rami McCarthy, senior security researcher at cybersecurity firm Wiz, argues that reinforcement learning is a way to continuously adapt to attacker behavior, but it’s only part of the picture.
“A useful way to reason about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch. “Agentic browsers tend to sit in a difficult part of that space: moderate autonomy combined with very high access.”
While OpenAI claims that protecting Atlas users from prompt injections is a top priority, McCarthy calls for some skepticism about the return on investment of risky browsers.
“For most everyday use cases, agentic browsers don’t yet offer enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risk is high given their access to sensitive data like emails and payment information, even though that access is also what makes them powerful. This balance will evolve, but today the tradeoffs are still very real.”
Conclusion: a long-term AI security challenge
OpenAI’s position on prompt injection attacks marks a turning point in how the artificial intelligence industry approaches security. Rather than promising miracle solutions, the company is opting for a pragmatic approach based on continuous improvement and transparency. AI browsers like ChatGPT Atlas represent a new technological frontier, but their inherent vulnerabilities remind us that security must remain at the heart of developing these powerful tools.
For users and businesses, the message is clear: AI agents offer revolutionary possibilities, but their adoption must be accompanied by an understanding of the risks and rigorous application of security best practices.
Source: TechCrunch
“`
