Security Comprehension and Awareness Measure (SCAM) Demo
What happens when a state-of-the-art AI assistant can read your email, browse the web, and fill in your passwords — but can’t reliably tell a scam from the real thing?
In this video, you’ll see real examples of frontier AI agents:
- Summarizing phishing emails without recognizing the threat
- Logging into fake websites hosted on lookalike URLs
- Forwarding sensitive information without reading it
- Entering personal and credit card details into fraudulent storefronts
These aren’t edge cases.
They’re the findings of 1Password’s new benchmark: SCAM, the Security Comprehension and Awareness Measure.
Unlike traditional AI safety tests that ask a model directly whether something is malicious, SCAM evaluates AI agents in realistic, task-based scenarios. Instead of asking, “Is this phishing?”, we let the AI carry out everyday tasks in which it might encounter:
- Fake login pages
- Fraudulent storefronts
- Malicious emails
- Lookalike domains
The results? Even the best-performing frontier models failed critical security scenarios. Safety scores ranged from 38% to 92%, and even the top model averaged multiple critical security failures across 30 scenarios.
But there’s good news.
When we added a short, general cybersecurity training “skill” to put the AI in a more security-aware mindset, performance improved dramatically across every model.
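As a rough illustration of the idea (the actual skill text and tooling aren’t shown here, so the wording and function names below are hypothetical), such a skill amounts to prepending a short security briefing to the agent’s instructions:

```python
# Hypothetical sketch: prepending a short cybersecurity "skill" to an
# agent's system prompt. The skill text and names are illustrative,
# not the actual SCAM evaluation tooling.

SECURITY_SKILL = (
    "Before acting, verify that URLs match the legitimate domain, "
    "treat unexpected requests for credentials or payment details as "
    "suspicious, and never submit sensitive data to unverified sites."
)

def build_system_prompt(task_instructions: str, skill: str = SECURITY_SKILL) -> str:
    """Combine the security-awareness skill with the task instructions."""
    return f"{skill}\n\n{task_instructions}"

prompt = build_system_prompt("Log in to the site and check for new invoices.")
```

The same agent, with and without the skill prepended, can then be run over identical scenarios to measure the change in safety score.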
We’re open-sourcing:
- The SCAM benchmark
- The full results
- The evaluation tooling
Our goal is to help developers build safer AI assistants — and to support the work 1Password is doing to enable AI agents to act securely on your behalf.
🔎 Learn more, explore the leaderboard, or contribute:
1Password.github.io/scam
#1Password #AI #CyberSecurity #Phishing #AIAgents #SecurityResearch