How to know if your agents are correct, with Dylan Williams
Join us for this week's Defender Fridays as we explore AI agent evaluation with Dylan Williams, Co-founder and Chief Research Officer of Spectrum Security.
At Defender Fridays, we delve into the dynamic world of information security, exploring its defensive side with seasoned professionals from across the industry. Our aim is simple yet ambitious: to foster a collaborative space where ideas flow freely, experiences are shared, and knowledge expands.
What We'll Discuss
In this episode, Dylan Williams breaks down one of the hardest problems in agentic AI: how do you actually know your agents are doing the right thing? From building expert rubrics to deploying agent judges in production, Dylan shares lessons from the front lines of building and evaluating AI-driven security workflows.
Key Topics:
- Why human expert review is the gold standard for agent QA -- and why it doesn't scale
- How to build and calibrate an agent judge using labeled production traces
- Why deterministic validation should always come before vibes-based evaluation
- How agent judges drift over time and why turning failures into tests is the fix
- The role of trajectory analysis in diagnosing what agents actually did -- and why
- What a self-improving agentic eval loop could look like in cybersecurity
About Our Guest
Dylan Williams is Co-founder and Chief Research Officer of Spectrum Security, a company building at the intersection of agentic AI and security. A longtime blue teamer with deep roots in detection engineering, Dylan has been working on the hard problem of AI agent correctness and evaluation since before most teams knew they needed to.
Register for Live Sessions
Join us every Friday at 10:30am PT for live, interactive discussions with industry experts. Whether you're a seasoned professional or just curious about the field, these sessions offer an engaging dialogue between our guests, hosts, and you -- our audience.
Register here: https://limacharlie.io/defender-fridays
Subscribe to our YouTube channel and hit the notification bell so you never miss a live session, or catch up on past episodes on our website.
Sponsored by LimaCharlie
This episode is brought to you by LimaCharlie, the Agentic SecOps Workspace (ASW) - where AI agents operate security infrastructure using the same controls and authority as human analysts, with every action visible, governed, and auditable.
Why LimaCharlie?
- Eliminate vendor sprawl and tool complexity
- Deploy and scale effortlessly on native multi-tenant architecture
- Reduce costs with intelligent data routing and free 1-year retention
- Build custom solutions with 100+ security capabilities on-demand
- Accelerate response with agentic AI that acts directly within predefined workflows
Try the Agentic SecOps Workspace free: https://limacharlie.io
Learn more: https://docs.limacharlie.io
Follow LimaCharlie
Sign up for free: https://limacharlie.io
LinkedIn: https://www.linkedin.com/company/limacharlieio/
X: https://x.com/limacharlieio
Community Discourse: https://community.limacharlie.com/
Host: Maxime Lamothe-Brassard - Founder at LimaCharlie
Guest: Dylan Williams - Co-founder and Chief Research Officer at Spectrum Security
#defenderfridays #limacharlie #cybersecurity #infosec #secops #aiagents #detectionengineering