LegalPwn: When AI Thinks Malware is Just Following Orders!
Researchers at Pangea Labs have discovered LegalPwn, a cyberattack that tricks AI into classifying malware as safe by hiding it in fake legal disclaimers. Even Google’s Gemini and GitHub Copilot were duped. The attack highlights a significant security gap in AI systems, stressing the need for human oversight in AI security.

Hot Take:
Well, it looks like hackers have found a way to make AI models legally blind! LegalPwn is like giving AI models a pair of bifocals with one lens missing—sure, they can read the legalese just fine, but they completely miss the malware in the fine print. I guess it’s time to remind these AI tools that the law isn’t always on their side!
Key Points:
- LegalPwn is a cyberattack that manipulates generative AI tools into misclassifying malware as safe code.
- The attack uses social engineering by embedding malicious code within legal-sounding text.
- Most AI models tested (12 in total) were susceptible to this manipulation.
- Some AI models, like Anthropic’s Claude 3.5 Sonnet, showed resistance to the attack.
- Human oversight is crucial in preventing these attacks, as AI models often fail to detect malicious code within legal contexts.
Already a member? Log in here