Cloud threat hunting works when defenders stop treating cloud logs as isolated events and start reasoning across identity, control-plane, and data-plane activity.
From Mitiga Labs, Ucha Gobejishvili shows how incident responders turn cloud events into real attack chains they can investigate, validate, and contain before impact.
Most cloud threat hunting fails for the same reason: teams treat the cloud like a louder, more expensive data center. They bring their on-prem habits with them, write queries looking for "suspicious" event names, alert on logins from weird IPs, and call it a day. Then they watch real breaches hit the news months later and realize those exact events were sitting in their logs the whole time.
Mitiga Labs sees this kind of gap in cloud investigations all the time. The evidence exists, but the story is not reconstructed as an attack chain until it’s too late.
Static indicators die fast in the cloud. IPs, tools, and even classic malware TTPs barely last an engagement. The attacker’s staging bucket is usually gone before your IOC feed refreshes. What survives is behavior: how an identity moves through the APIs, how a role gets assumed, how permissions quietly expand over a few days, or how a workload suddenly touches data it has no business seeing.
Threat hunting in the cloud is a reasoning game first and a query language second. The real work comes from knowing what to ask, why it matters, and how a control plane action turns into a data plane consequence.
Cloud vs. On-Prem Threat Hunting
Identity is the new perimeter. On-prem, you hunt lateral movement between hosts. In the cloud, you hunt lateral movement between identities: IAM roles, managed identities, service accounts, and federated sessions. An attacker doesn’t need to pivot between machines. They assume a different role or impersonate a service account, and they’re suddenly somewhere new.
Identity-driven cloud attacks use legitimate or compromised identities, tokens, service accounts, or federated sessions to move through APIs instead of hosts.
The classic kill chain falls apart when stolen credentials put the attacker inside the control plane from the start. Your first observable event is often just an API call from a new location using an existing identity.
Everything is API-driven, which is both a blessing and a trap. The control plane logs a lot, but those logs usually stop at the configuration action. They won’t show you the actual data being read or exfiltrated. That evidence lives in separate data plane logs that are often disabled by default.
Serverless functions, containers, and autoscaled instances disappear before you can even look at them. You have to hunt through logs and behavior because the artifacts you’d normally grab are already gone. Attackers love the gap between control plane and data plane. They grant themselves access in one, then steal data in the other. If your hunts stay in only one plane, you’ll miss half the story.
Hypothesis Engineering
Before you touch any data, you need a real hypothesis that is falsifiable, specific, and testable.
- Weak hypothesis: “An attacker might escalate privileges using identity misconfigurations.”
- Stronger hypothesis: “An attacker who compromised a developer’s federated session will try to chain role-passing primitives (PassRole + RunInstances, or equivalent) to spawn compute under a more privileged identity than they currently have.”
The second one gives you an actor, a technique, a scope, and clear observables.
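As a minimal sketch, the stronger hypothesis can be turned into a concrete check. It assumes AWS-style events that have already been flattened into dictionaries; the field names (`event_name`, `principal`, `instance_profile`) and the sample identities are hypothetical, not a real log schema. The idea is to flag compute launches where a developer identity attaches an instance profile it never used during a baseline window.

```python
from collections import defaultdict

def find_new_profile_launches(baseline_events, hunt_events, developer_principals):
    """Return launches where a developer principal attaches a profile it has not used before."""
    # Record which instance profiles each principal attached during the baseline window.
    seen = defaultdict(set)
    for ev in baseline_events:
        if ev["event_name"] == "RunInstances" and ev.get("instance_profile"):
            seen[ev["principal"]].add(ev["instance_profile"])

    findings = []
    for ev in hunt_events:
        if ev["event_name"] != "RunInstances":
            continue
        profile = ev.get("instance_profile")
        # Only launches that pass a role, made by the actors named in the hypothesis.
        if not profile or ev["principal"] not in developer_principals:
            continue
        if profile not in seen[ev["principal"]]:
            findings.append(ev)
    return findings

# Hypothetical sample data for illustration only.
baseline = [
    {"event_name": "RunInstances", "principal": "dev-alice", "instance_profile": "dev-sandbox"},
]
hunt = [
    {"event_name": "RunInstances", "principal": "dev-alice", "instance_profile": "dev-sandbox"},
    {"event_name": "RunInstances", "principal": "dev-alice", "instance_profile": "prod-admin"},
    {"event_name": "RunInstances", "principal": "ci-runner", "instance_profile": "prod-admin"},
]
findings = find_new_profile_launches(baseline, hunt, {"dev-alice"})
```

A hit here is a lead, not a verdict: the next question is whether the attached profile carries more privilege than the launching identity, which requires policy context this sketch does not model.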
Same with data access. Instead of “look for unusual S3 access,” try “a production workload identity that has only ever read from its own bucket suddenly starts listing or reading from other buckets.”
For every hypothesis, ask yourself, “If this were happening right now, what would I see?” If you don’t have a clear answer, keep sharpening it.
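The data access hypothesis above answers that question directly: you would see an identity touching storage outside its own history. A rough sketch of that comparison follows. The record fields (`identity`, `action`, `bucket`) and the action names in `READ_ACTIONS` are assumptions for illustration; real data plane logs need to be mapped into this shape first.

```python
from collections import defaultdict

# Assumed read/list action names; adjust to whatever your data plane logs record.
READ_ACTIONS = {"GetObject", "ListObjects"}

def new_bucket_access(baseline_events, hunt_events):
    """Map each identity to buckets it read in the hunt window but never in the baseline."""
    baseline = defaultdict(set)
    for ev in baseline_events:
        if ev["action"] in READ_ACTIONS:
            baseline[ev["identity"]].add(ev["bucket"])

    findings = defaultdict(set)
    for ev in hunt_events:
        if ev["action"] not in READ_ACTIONS:
            continue
        identity = ev["identity"]
        # Only judge identities with an established baseline; brand-new
        # identities have no history to deviate from and need a separate hunt.
        if identity in baseline and ev["bucket"] not in baseline[identity]:
            findings[identity].add(ev["bucket"])
    return {identity: sorted(buckets) for identity, buckets in findings.items()}

# Hypothetical sample data for illustration only.
baseline_reads = [
    {"identity": "orders-service", "action": "GetObject", "bucket": "orders-data"},
]
hunt_reads = [
    {"identity": "orders-service", "action": "GetObject", "bucket": "orders-data"},
    {"identity": "orders-service", "action": "ListObjects", "bucket": "hr-exports"},
    {"identity": "orders-service", "action": "PutObject", "bucket": "tmp-uploads"},
]
deviations = new_bucket_access(baseline_reads, hunt_reads)
```

The baseline window matters more than the code: too short and every monthly job looks new, too long and an attacker's early activity becomes part of "normal".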
Pre-Hunt Preparation: Validate Your Logs
This is the step most people skip, and it’s why most hunts come up empty.
Don’t just assume your logs are good. Validate them against your specific hypothesis. Are control plane logs enabled everywhere, across all regions and accounts? Are the important data plane logs actually turned on for your sensitive resources? Can you reliably join events across identities and sessions?
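The first of those questions can be checked mechanically. The sketch below compares the account and region pairs you expect to produce control plane events against the pairs that actually appear in your log store. The inventory and the `account` and `region` field names are hypothetical.

```python
def coverage_gaps(expected_scopes, observed_events):
    """Return expected (account, region) pairs with no events in the sample."""
    observed = {(ev["account"], ev["region"]) for ev in observed_events}
    return sorted(set(expected_scopes) - observed)

# Hypothetical inventory and event sample for illustration only.
expected = [
    ("111111111111", "us-east-1"),
    ("111111111111", "eu-west-1"),
    ("222222222222", "us-east-1"),
]
sample_events = [
    {"account": "111111111111", "region": "us-east-1"},
    {"account": "222222222222", "region": "us-east-1"},
]
gaps = coverage_gaps(expected, sample_events)
```

An empty pair is not proof that logging is off, since a quiet region also produces no events. It is a prompt to confirm the logging configuration for that scope before trusting a clean result.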
Map your blind spots honestly.
Serverless invocation logs, Kubernetes audit logs, deep session context: these are often off by default. Know where your gaps are before you start hunting.
Worry about log integrity, too.
If an attacker can turn off logging or change log destinations, your “clean” results might be meaningless. Hunt for logging disruptions as part of your process.
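A hunt for logging disruption can start as a simple filter. The sketch below uses AWS CloudTrail management actions as the example set; other providers use different names, and the flattened record fields (`event_name`, `principal`, `time`) and the allowlist are assumptions for illustration.

```python
# CloudTrail actions that stop, remove, or reconfigure trail logging.
LOGGING_TAMPER_EVENTS = {"StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"}

def logging_disruptions(events, approved_principals=frozenset()):
    """Return logging-configuration changes made by principals outside the allowlist."""
    hits = [
        ev for ev in events
        if ev["event_name"] in LOGGING_TAMPER_EVENTS
        and ev["principal"] not in approved_principals
    ]
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    return sorted(hits, key=lambda ev: ev["time"])

# Hypothetical sample data for illustration only.
control_plane_events = [
    {"event_name": "UpdateTrail", "principal": "platform-automation", "time": "2024-05-01T09:00:00Z"},
    {"event_name": "StopLogging", "principal": "deploy-session", "time": "2024-05-02T03:14:00Z"},
    {"event_name": "DescribeTrails", "principal": "deploy-session", "time": "2024-05-02T03:13:00Z"},
]
disruptions = logging_disruptions(control_plane_events, {"platform-automation"})
```

Any hit also tells you where your other hunts are blind: results from the affected scope after that timestamp should be treated as unverified.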
Enrich your identity data before you hunt.
Raw ARNs and object IDs are useless in the middle of an investigation. You need to know who or what originally assumed that identity, via what path, and what it’s supposed to be doing.
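One possible shape for that enrichment, as a sketch: walk assumption links back to the identity that started the chain, then attach a human-readable purpose. The `assumed_by` mapping, the `catalog`, and all identifiers are hypothetical; in practice both would be built from assumption events and an identity inventory.

```python
def resolve_origin(identity, assumed_by, max_depth=10):
    """Walk assumption links back to the root; return the path from root to identity."""
    path = [identity]
    current = identity
    while current in assumed_by and len(path) <= max_depth:
        current = assumed_by[current]
        if current in path:  # guard against cycles in bad data
            break
        path.append(current)
    return list(reversed(path))

def enrich(identity, assumed_by, catalog):
    """Attach the origin, full path, and documented purpose to a raw identifier."""
    path = resolve_origin(identity, assumed_by)
    return {
        "identity": identity,
        "origin": path[0],
        "path": path,
        "purpose": catalog.get(identity, "unknown"),
    }

# Hypothetical sample data: each key was assumed by its value.
assumed_by = {
    "session/deploy-123": "session/build-77",
    "session/build-77": "oidc/repo:example-org/app",
}
catalog = {"session/deploy-123": "deploys the app to staging"}
context = enrich("session/deploy-123", assumed_by, catalog)
```

An identity whose purpose resolves to "unknown" is itself a finding worth recording, because nobody can say what normal looks like for it.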
Execution Walkthrough: Hunting Suspicious Identity Chaining
Let’s make this concrete. Hypothesis: an attacker compromised CI/CD credentials and is using identity chaining to reach production data they shouldn’t touch.
- Step 1: Map the normal chains. Know what legitimate flows look like (OIDC token → Build identity → Deploy identity, etc.).
- Step 2: Pull every assumption event downstream of your CI/CD identities over the last 30 days. Build a simple graph of source → target identities. Legitimate chains are frequent and stable. Look for the sparse, new, or weird ones.
- Step 3: For each suspicious chain, pull all activity from that session. What was the first action after assumption? Recon calls (listing identities, getting caller identity, describing orgs) are huge red flags.
- Step 4: Cross the planes. Check data plane logs for the same session. Did they read new buckets, access unrelated data, or exfiltrate anything?
- Step 5: Backtrack to the root. Was the original OIDC trust too loose? Wildcarded repo/branch? That’s usually where the real problem lives.
- Step 6: Turn the finding into detection. Alert on unusual chaining targets or recon behavior in deployment sessions.
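Steps 2 and 3 can be sketched in a few lines. This is an illustration under stated assumptions, not a production detection: the record fields (`source`, `target`, `session_id`, `action`, `time`), the `max_count` threshold, and the sample identities are hypothetical, and `RECON_ACTIONS` uses a handful of AWS action names as examples.

```python
from collections import Counter

# Example discovery calls; extend for your provider and environment.
RECON_ACTIONS = {"GetCallerIdentity", "ListRoles", "ListUsers", "DescribeOrganization"}

def rare_chains(assumption_events, max_count=2):
    """Step 2: count source -> target edges and return the sparse ones."""
    counts = Counter((ev["source"], ev["target"]) for ev in assumption_events)
    return sorted(edge for edge, n in counts.items() if n <= max_count)

def first_actions(session_events, session_id, limit=3):
    """Step 3: return the first few actions taken in a session, in time order."""
    in_session = [ev for ev in session_events if ev["session_id"] == session_id]
    in_session.sort(key=lambda ev: ev["time"])
    return [ev["action"] for ev in in_session[:limit]]

def opens_with_recon(session_events, session_id, limit=3):
    """True if any of the session's first actions is a known recon call."""
    return any(a in RECON_ACTIONS for a in first_actions(session_events, session_id, limit))

# Hypothetical sample data: one stable chain and one new, sparse edge.
assumptions = [{"source": "build", "target": "deploy"} for _ in range(40)]
assumptions.append({"source": "build", "target": "prod-data-admin"})

session_log = [
    {"session_id": "s-9", "action": "ListRoles", "time": "2024-05-02T03:11:00Z"},
    {"session_id": "s-9", "action": "GetCallerIdentity", "time": "2024-05-02T03:10:00Z"},
    {"session_id": "s-1", "action": "UpdateService", "time": "2024-05-02T02:00:00Z"},
]

suspicious_edges = rare_chains(assumptions)
recon_first = opens_with_recon(session_log, "s-9")
```

A fixed count threshold is the simplest choice and will need tuning. Edges that first appeared inside the hunt window are often a stronger signal than edges that are merely infrequent.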
This pattern works for many scenarios: SSRF on workloads, supply chain compromise, cross-account abuse, you name it.
Common Cloud Threat Hunting Pitfalls
- Logging gaps you don’t realize exist.
- Trusting the SIEM too much (always keep a path to raw logs).
- Hunting without solid identity context.
- Over-relying on MITRE ATT&CK instead of your actual environment.
- Chasing volume instead of sharp hypotheses.
- Failing to close the loop: every real finding should become a remediation, a detection, or a conscious decision to accept the risk.
Closing Thoughts
Cloud threat hunting isn’t fundamentally a tooling problem. The tools are already mature enough to support good hunts. The difficult part is building the mental model: understanding how identities chain together, how permissions are truly evaluated, and how control plane activity translates into operational risk.
The adversaries targeting your cloud aren’t running checklists or vendor playbooks. They exploit the seams, the assumptions, and the complexity that most defenders never fully internalize. Understand the system at a deeper level than they do. Everything else flows from there.
FAQ
What is cloud threat hunting?
Cloud threat hunting is a hypothesis-driven investigation across cloud logs, identities, APIs, workloads, and data access. The goal is to find the behavior that connects disparate, seemingly isolated events and shows how an attacker is moving through the environment.
Why do cloud threat hunts miss real attacks?
Cloud hunts miss attacks when teams rely on static indicators, isolated event names, or incomplete logs. In cloud environments, the important evidence often lives across identity activity, control-plane changes, and data-plane access.
Why does identity matter so much in cloud threat hunting?
In the cloud, attackers often move through roles, service accounts, managed identities, tokens, and federated sessions instead of hosts. Hunting identity chains helps responders understand what the attacker could reach and what they really touched.
What is the difference between control-plane and data-plane evidence?
Control-plane evidence shows configuration and administrative actions, such as role assumptions or permission changes. Data-plane evidence shows access to the data itself, such as object reads, bucket access, or exfiltration paths.
How should a cloud threat hunt turn into action?
A good hunt should end in a decision: create a detection, fix a trust or logging gap, contain an active identity path, or consciously accept the risk.