AI-Driven Cloud Detection Engineering: Turning Security Telemetry Into Action


Amal Mammadov is a cloud security practitioner and detection engineering specialist whose work sits at the intersection of threat intelligence, cloud-native architecture, and security operations. In this interview, he outlines why most organisations are losing ground despite heavy security investments and what it actually takes to build detection programmes that produce outcomes.

Detection engineering has often been treated as a separate area of expertise rather than a core part of security operations. As organisations continue to grow into cloud-first, AI-enhanced environments where telemetry is abundant but signal is scarce, many now feel pressure to move from reactive monitoring to a more deliberate, "engineered" approach to detection.

For those leading SIEM-based operations day to day, what does cloud detection engineering really entail, and why is this capability becoming so important for today's security programmes?

The most important reframe is this: detection engineering means building detection like a product, not treating it like a one-time configuration.

Most organisations today are generating enormous volumes of cloud telemetry: logs, API events, permission changes, and access records, but they're not translating that data into outcomes. Detection engineering is the discipline that closes that gap. It takes "we have data" and turns it into "we can reliably identify abuse early and respond before damage spreads."

What distinguishes it from simply writing rules is the operational rigour it demands. You're designing detection logic, testing it against realistic scenarios, tuning it against your specific environment, and maintaining it as your systems evolve. Cloud infrastructure changes constantly - new services get deployed, access patterns shift, and threat actors adapt - so detection can't be a static artefact. It has to be treated as living software: versioned, validated, and continuously improved. Organisations that understand these principles are building real security capability. Those that don't are paying for an expensive false sense of coverage.

Many organisations have already invested heavily in security tooling that promises to solve these issues. Could you please help me understand why breaches continue to occur at scale?

Tooling can ingest data, but it cannot manufacture a detection strategy.

What I see repeatedly across the industry is a fundamental confusion between logging and security. Teams mistakenly believe that capturing everything guarantees their protection. Logging is necessary, but it's only the raw material. The actual work - defining what threats matter in your specific environment, modelling how attackers move through your systems, designing detection that maps to real attack progressions, and establishing response that can actually contain damage - that work has to be done by people who understand both the threat landscape and the operational context.

The other problem is that default detections are, by design, generic. They catch the obvious, low-sophistication activity. But the incidents that cause material harm tend to be contextual and deliberate: privilege misuse by a compromised identity, slow persistence established over days, and lateral movement through trusted internal paths. These are not "turn it on and forget it" problems. Detection logic has to be tailored to how your environment actually operates and how your most sensitive assets could be misused. The gap between expensive tooling and effective security is almost always a strategy and engineering gap, not a data gap.

As organisations migrate to cloud architectures, security teams gain unprecedented visibility but also face a dramatic increase in data complexity and volume. This shift raises a fundamental question about how telemetry itself has evolved. How is cloud telemetry fundamentally different from what security teams have worked with in on-premises environments?

The most significant difference is visibility into the control plane, and that changes everything.

In traditional on-premises environments, defenders were largely watching network traffic and endpoint activity. In the cloud, virtually every meaningful change happens through an API: identity events, permission assignments, resource creation, storage access, encryption changes, and network policy modifications. All of it is logged. That means defenders now have a detailed, structured record of administrative intent, not just execution.
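
To make that concrete, here is a rough illustration of what a single control-plane record looks like. The field names follow the AWS CloudTrail event schema; the record itself and all values are invented for the example.

```python
# Illustrative, trimmed-down control-plane record in the shape of an
# AWS CloudTrail event (the values are invented for this example).
event = {
    "eventTime": "2024-05-14T09:21:07Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateAccessKey",          # the administrative action itself
    "awsRegion": "eu-west-1",
    "sourceIPAddress": "203.0.113.42",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/build-automation",
    },
    "requestParameters": {"userName": "build-automation"},
}

# The record captures intent (who asked for what, from where), not just execution.
actor = event["userIdentity"]["arn"]
action = f'{event["eventSource"].split(".")[0]}:{event["eventName"]}'
print(f"{actor} performed {action} from {event['sourceIPAddress']}")
```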

That's an extraordinary advantage. But it comes with a corresponding challenge: the volume and variety of that telemetry are overwhelming if you treat every event as potentially significant. The defenders who extract real value from cloud telemetry are those who are deeply disciplined about signal selection and who have thought carefully about which actions, in which sequences, under which conditions, indicate genuine threat behaviour. The advantage of cloud visibility only materialises when you've done the work to know what you're actually looking for.

With the rapid adoption of AI in security tools, many organisations are trying to understand whether it truly provides a significant security advantage or simply adds another layer of complexity. For security leaders, the distinction between meaningful impact and unrealistic expectations is becoming increasingly important. In what areas does AI truly help, and where is its potential overstated?

AI delivers real value when it reduces the operational friction that slows human responders down at the worst possible moments.

Concretely, that means clustering related events so analysts aren't triaging thirty separate alerts for what is functionally one incident. It means automatically enriching alerts with asset ownership, permission levels, historical change context, and business criticality before a human ever looks at them. It means surfacing suspicious behavioural sequences that span identity activity and workload behaviour simultaneously, patterns that would take an analyst significant time to correlate manually. And it means translating complex event chains into plain-language summaries that allow faster, more confident triage decisions.
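
A minimal sketch of the clustering idea: alerts that concern the same identity and arrive within a short window are grouped so an analyst triages one incident instead of several alerts. The alert fields, window size, and grouping key are illustrative assumptions, not any particular product's behaviour.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative alerts; each carries the identity it concerns and an ISO timestamp.
alerts = [
    {"id": 1, "identity": "user/ci-bot", "time": "2024-05-14T09:21:07+00:00", "rule": "CreateAccessKey"},
    {"id": 2, "identity": "user/ci-bot", "time": "2024-05-14T09:23:51+00:00", "rule": "AttachUserPolicy"},
    {"id": 3, "identity": "user/ci-bot", "time": "2024-05-14T09:30:02+00:00", "rule": "GetSecretValue"},
]

WINDOW = timedelta(minutes=30)

def cluster_by_identity(alerts, window=WINDOW):
    """Group alerts for the same identity when they arrive within `window` of each other."""
    clusters = defaultdict(list)   # identity -> list of clusters (each a list of alerts)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        ts = datetime.fromisoformat(alert["time"])
        enriched = {**alert, "_ts": ts}
        groups = clusters[alert["identity"]]
        if groups and ts - groups[-1][-1]["_ts"] <= window:
            groups[-1].append(enriched)       # still the same incident
        else:
            groups.append([enriched])         # start a new incident
    return clusters

for identity, groups in cluster_by_identity(alerts).items():
    for group in groups:
        print(f"{identity}: one incident covering {len(group)} alerts ->",
              [a["rule"] for a in group])
```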

Where AI is consistently oversold is in the premise that it can substitute for detection strategy. If you haven't defined what you're hunting for, if you haven't modelled attacker behaviour in your environment and designed logic that maps to it, AI will simply accelerate noise. You'll receive faster alerts that lead nowhere, with a more sophisticated interface around them. The teams achieving strong outcomes with AI are those who've already built sound detection foundations and are using AI to operate those systems more efficiently. It's a force multiplier, not a foundation.

What is your approach to building detections that analysts actually trust and act on?

The core principle is to resist alerting on isolated events unless confidence is exceptionally high.

In cloud environments, a single event almost never tells you enough to act. A CreateAccessKey call could be routine automation or the opening move of a credential-based attack. The difference lies entirely in context: who performed the action, when, from where, what preceded it, and what happened next. Detections that fire on single events force analysts to do that contextual work under time pressure, and when the context turns out to be benign, trust erodes.
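
As a sketch of what that contextual work looks like when it is done up front rather than by an analyst under pressure: the same CreateAccessKey event is judged against the identity's own history. The profile fields, hour check, and example values are assumptions for illustration only.

```python
def score_create_access_key(event, profile):
    """Toy context scoring for a single CreateAccessKey event.

    `profile` is assumed to be built elsewhere from the identity's history:
    the networks it normally calls from, its usual active hours, and whether
    its permissions changed recently.
    """
    reasons = []
    if event["sourceIPAddress"] not in profile["known_networks"]:
        reasons.append("unfamiliar source address")
    if int(event["eventTime"][11:13]) not in profile["active_hours"]:
        reasons.append("outside the identity's normal hours")
    if profile["recent_privilege_change"]:
        reasons.append("follows a recent privilege change")
    return reasons   # an empty list reads as routine automation, nothing to alert on

profile = {
    "known_networks": {"198.51.100.7", "198.51.100.12"},
    "active_hours": set(range(6, 20)),
    "recent_privilege_change": True,
}
event = {"eventTime": "2024-05-14T02:13:40Z", "sourceIPAddress": "203.0.113.42"}
print(score_create_access_key(event, profile))   # -> three reasons, so this is not routine
```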

What I've found far more effective is detection logic designed to recognise attack progression - sequences of behaviour that, taken together, indicate malicious intent with high confidence. Privilege escalation followed by anomalous authentication and sensitive data access. Logging disabled in proximity to a new administrative role assignment. A workload suddenly accessing secrets it has never touched in its operational history. Unusual cross-account API activity tied to a privileged identity. These multi-stage patterns produce dramatically fewer false positives, and critically, when they fire, analysts respond with urgency rather than scepticism. That trust is not a soft benefit - it is operationally essential. A detection programme that isn't trusted is one that's being quietly ignored.
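
A minimal sketch of progression-based logic, assuming a simplified event stream: nothing fires on a single event, and an alert is raised only when a privilege change, an anomalous authentication, and sensitive data access occur for the same identity, in order, within a bounded window. The event names, window, and fields are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Ordered stages the detection looks for, expressed as predicates over events.
STAGES = [
    ("privilege_change", lambda e: e["eventName"] in {"AttachUserPolicy", "PutUserPolicy"}),
    ("anomalous_auth",   lambda e: e["eventName"] == "ConsoleLogin" and e.get("anomalous")),
    ("sensitive_access", lambda e: e["eventName"] in {"GetObject", "GetSecretValue"}),
]
WINDOW = timedelta(hours=2)

def detect_progression(events):
    """Return identities whose events match all stages, in order, within WINDOW."""
    hits, by_identity = [], {}
    for e in sorted(events, key=lambda e: e["eventTime"]):
        by_identity.setdefault(e["identity"], []).append(e)
    for identity, evs in by_identity.items():
        stage, first_ts = 0, None
        for e in evs:
            ts = datetime.fromisoformat(e["eventTime"])
            if first_ts and ts - first_ts > WINDOW:
                stage, first_ts = 0, None          # window expired, start over
            _, predicate = STAGES[stage]
            if predicate(e):
                first_ts = first_ts or ts
                stage += 1
                if stage == len(STAGES):
                    hits.append(identity)
                    break
    return hits

events = [
    {"identity": "user/ci-bot", "eventName": "AttachUserPolicy", "eventTime": "2024-05-14T09:02:00+00:00"},
    {"identity": "user/ci-bot", "eventName": "ConsoleLogin", "eventTime": "2024-05-14T09:40:00+00:00", "anomalous": True},
    {"identity": "user/ci-bot", "eventName": "GetSecretValue", "eventTime": "2024-05-14T10:15:00+00:00"},
]
print(detect_progression(events))   # ['user/ci-bot']
```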

How do you sustain detection quality in environments where engineering teams are shipping constantly?

By treating detection with the same engineering discipline applied to the software it's protecting.

Three practices are non-negotiable in environments moving at speed. First, detection-as-code: rules belong in version-controlled repositories, subject to peer review, with full change history and the ability to roll back. This eliminates the fragility of detection logic that lives only in tool-specific configurations nobody fully owns or understands. Second, standardised enrichment: every alert should arrive with built-in context - the identity involved, their permissions at the time of the event, the asset affected, and recent change history. Analysts shouldn't have to go hunting for this information during an active incident. Third, continuous validation: detections need to be regularly tested against known attack behaviours in realistic conditions. Waiting for a real attacker to discover that a detection is broken is not a validation strategy.
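
A sketch of what the detection-as-code and validation practices can look like at their simplest: the rule lives as data in a version-controlled file and unit tests replay a known attack sequence against it, so a broken rule fails in CI rather than in production. The file paths, rule format, and matching helper are assumptions, not a specific product's format.

```python
# detections/disable_logging_after_admin_grant.py  (hypothetical repo path)
from datetime import datetime, timedelta

RULE = {
    "id": "cloud-0042",
    "description": "CloudTrail logging stopped shortly after an admin policy grant",
    "sequence": ["AttachUserPolicy", "StopLogging"],
    "window_minutes": 60,
    "severity": "high",
}

def matches(rule, events):
    """True if the rule's event sequence appears in order within the window."""
    times, idx = [], 0
    for e in sorted(events, key=lambda e: e["time"]):
        if e["name"] == rule["sequence"][idx]:
            times.append(e["time"])
            idx += 1
            if idx == len(rule["sequence"]):
                break
    return (
        idx == len(rule["sequence"])
        and times[-1] - times[0] <= timedelta(minutes=rule["window_minutes"])
    )

# tests/test_disable_logging_after_admin_grant.py  (hypothetical, pytest-style)
def test_rule_fires_on_replayed_attack():
    t0 = datetime(2024, 5, 14, 9, 0)
    attack = [
        {"name": "AttachUserPolicy", "time": t0},
        {"name": "StopLogging", "time": t0 + timedelta(minutes=12)},
    ]
    assert matches(RULE, attack)

def test_rule_stays_quiet_on_routine_change():
    t0 = datetime(2024, 5, 14, 9, 0)
    routine = [{"name": "AttachUserPolicy", "time": t0}]
    assert not matches(RULE, routine)
```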

When teams operate this way, detection becomes resilient to the constant change that characterises cloud environments. Rules don't silently break when a new service is introduced or an access pattern shifts. The detection programme evolves alongside the infrastructure rather than lagging behind it.

False positives remain one of the most corrosive problems in security operations. Could you please share your approach to minimising noise while avoiding the creation of blind spots?

Noise is almost always a symptom of detection logic that was designed to generate signals rather than support decisions.

The discipline I apply starts with prioritisation: not every asset and every identity deserves the same detection investment. The highest-value targets - privileged identities, sensitive data repositories, and critical infrastructure components - warrant the most precise and aggressive detection coverage. Beyond that, baselines need to be contextual rather than global. "Unusual for this identity and this workload" is a fundamentally different and more powerful signal than "unusual in aggregate across all users". Cloud environments are too heterogeneous for global thresholds to be meaningful.
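
A sketch of that difference in practice: the same daily call count is judged against each identity's own history rather than a single global threshold. The history, the three-sigma cut-off, and the variance floor are illustrative assumptions.

```python
from statistics import mean, stdev

# Daily counts of a given API call, per identity (illustrative history).
history = {
    "ci-bot":      [410, 395, 420, 405, 415, 400, 412],   # busy automation
    "finance-app": [3, 2, 4, 3, 2, 3, 2],                 # quiet workload
}

def unusual_for_identity(identity, todays_count, history, sigmas=3.0):
    """Flag a count that sits well outside this identity's own baseline."""
    past = history[identity]
    mu, sd = mean(past), stdev(past)
    return todays_count > mu + sigmas * max(sd, 1.0)   # floor sd to avoid zero-variance noise

# Sixty calls from finance-app is a strong per-identity signal...
print(unusual_for_identity("finance-app", 60, history))   # True
# ...but would vanish under any global threshold sized for ci-bot's volume.
print(unusual_for_identity("ci-bot", 60, history))        # False
```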

Sequencing matters enormously, as I've described; alerting on behavioural chains rather than isolated events filters out the vast majority of benign noise. But there's also a harder operational discipline that many teams resist: you have to be willing to delete detections that aren't working. Organisations accumulate years of rules that generate alerts nobody acts on, and they're kept alive out of an assumption that they might catch something someday. In practice, they're consuming analyst attention and degrading the signal-to-noise ratio that the entire programme depends on. A smaller set of high-precision, well-maintained detections outperforms a large library of stale ones every time.
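
One way to make the "delete what isn't working" discipline measurable, sketched below under assumed field names and thresholds: for each rule, compare how many alerts it raised with how many were acted on over a review period, and flag the non-performers for retirement review.

```python
# Per-rule tallies over a review period (illustrative numbers).
rule_stats = {
    "logging-disabled-after-admin-grant": {"alerts": 6,   "actioned": 5},
    "any-failed-console-login":           {"alerts": 940, "actioned": 2},
    "new-region-resource-creation":       {"alerts": 55,  "actioned": 0},
}

def retirement_candidates(stats, min_action_rate=0.10):
    """Rules whose alerts almost never lead to a response are candidates for removal."""
    candidates = []
    for rule, s in stats.items():
        rate = s["actioned"] / s["alerts"] if s["alerts"] else 0.0
        if rate < min_action_rate:
            candidates.append((rule, rate, s["alerts"]))
    return sorted(candidates, key=lambda c: c[2], reverse=True)   # noisiest first

for rule, rate, volume in retirement_candidates(rule_stats):
    print(f"review for deletion: {rule} ({volume} alerts, {rate:.0%} acted on)")
```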

How do you measure whether a detection engineering programme is actually succeeding?

By measuring what leads to action, not what produces activity.

The metrics that matter are time-based and outcome-based. Mean Time to Detect is important, but it's only the beginning of the story. More telling is how quickly analysts can reach a high-confidence decision - not just identify that something happened but understand what it means and what response is appropriate. Equally important is how quickly containment actions can be taken to limit blast radius once a threat is confirmed. And at the programme level, what percentage of alerts are actually leading to meaningful responses, and which attacker paths can be reliably interrupted before damage is done?
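
These measures fall out naturally once each incident records a few timestamps. A minimal sketch, assuming the timestamps below are captured by the case-management workflow; the incident data is invented for illustration.

```python
from datetime import datetime
from statistics import median

# Per-incident timestamps assumed to be recorded by the case workflow.
incidents = [
    {
        "first_malicious_event": datetime(2024, 5, 14, 9, 21),
        "detected":              datetime(2024, 5, 14, 9, 29),
        "decision":              datetime(2024, 5, 14, 9, 44),   # high-confidence verdict
        "contained":             datetime(2024, 5, 14, 10, 2),
    },
    {
        "first_malicious_event": datetime(2024, 6, 3, 22, 5),
        "detected":              datetime(2024, 6, 3, 22, 40),
        "decision":              datetime(2024, 6, 3, 23, 10),
        "contained":             datetime(2024, 6, 3, 23, 55),
    },
]

def median_minutes(incidents, start, end):
    """Median elapsed minutes between two recorded timestamps across incidents."""
    return median((i[end] - i[start]).total_seconds() / 60 for i in incidents)

print("median time to detect:  ", median_minutes(incidents, "first_malicious_event", "detected"))
print("median time to decision:", median_minutes(incidents, "detected", "decision"))
print("median time to contain: ", median_minutes(incidents, "decision", "contained"))
```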

If detection activity isn't consistently leading to containment or prevention, then what the organisation has built is expensive logging infrastructure with a security interface on top of it. The standard I hold programmes to is whether they can answer three questions quickly and confidently under pressure: What happened? Why does it matter? What do we do right now? When a team can do that consistently, including during complex, multi-vector incidents, the programme is performing.

Where is this field heading over the coming years?

Towards automated response that is trusted enough to execute without human approval on high-confidence events, and that shift will redefine what security operations teams actually do.

The trajectory isn't towards AI systems running entire security operations centres autonomously. It's towards a clearer division of labour: well-engineered detection programmes will spot serious threats, and automated responses will take actions like revoking sessions, rotating compromised credentials, isolating affected workloads, and quarantining risky identities, all in seconds instead of minutes. The significance of this is difficult to overstate. Modern attackers operate in timeframes that human-paced response cannot match. Compressing attacker dwell time from hours to minutes is one of the highest-impact capabilities that a security programme can develop.
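
A minimal sketch of what high-confidence auto-containment can look like on AWS: the two API calls (iam.update_access_key and ec2.modify_instance_attribute) are real boto3 operations, but the alert shape, confidence gate, region, and quarantine security group are assumptions for illustration.

```python
import boto3

iam = boto3.client("iam", region_name="eu-west-1")
ec2 = boto3.client("ec2", region_name="eu-west-1")

QUARANTINE_SG = "sg-0123456789abcdef0"   # pre-built "deny everything" group, assumed to exist

def contain(alert):
    """Execute containment only for detections the programme has learned to trust."""
    if alert["confidence"] < 0.95:
        return "routed to analyst"        # lower confidence stays human-paced

    if alert["type"] == "compromised_access_key":
        # Deactivate the key rather than delete it, so investigators keep the artefact.
        iam.update_access_key(
            UserName=alert["user_name"],
            AccessKeyId=alert["access_key_id"],
            Status="Inactive",
        )
        return "access key deactivated"

    if alert["type"] == "compromised_workload":
        # Swap the instance onto an isolation-only security group.
        ec2.modify_instance_attribute(
            InstanceId=alert["instance_id"],
            Groups=[QUARANTINE_SG],
        )
        return "workload isolated"

    return "no automated action defined"
```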

The teams that will lead this transition are those investing now in detection precision, because safe automation is only possible when the detections driving it are trusted. An automated response triggered by a false positive is not a security win; it's an operational incident. The foundation has to be right before the automation can be trusted, which is why the engineering discipline being built today is the actual prerequisite for the future these teams are trying to reach.

Final question: what's your most important piece of advice for security leaders trying to modernise their cloud detection programmes?

Stop measuring success by how much data you're collecting and start measuring it by how quickly and confidently your team can act.

Telemetry is raw material, and raw material has no inherent value until it's processed into something useful. The leaders who build genuinely effective cloud security programmes are those who relentlessly orient every investment - tooling, engineering time, and process design - around one question: does this make us faster and more confident when something is happening?

That reorientation requires some things that are organisationally uncomfortable. It requires deleting detections that aren't working. It requires instrumenting for outcomes rather than coverage. It requires making explicit decisions about which threats matter most and building deeply for those rather than shallowly for everything. But the organisations willing to make those trade-offs end up with something rare: a security programme that performs under pressure, that their teams trust, and that actually reduces the probability and impact of the incidents that matter most.