Cloud incident response inherits the lifecycle of on-premise
incident response but operates in a substrate where the
control plane is the new physical access, evidence is
ephemeral, infrastructure can be re-created in seconds, and
the provider sometimes plays a participatory role. A
compromised long-lived static key reaches every region of the
account before a defender finishes reading the alert; a
terminated EC2 instance, deleted Azure VM, or destroyed
Compute Engine VM takes its ephemeral disk and memory with it
unless a snapshot has already been taken; a tenancy whose
break-glass account has expired multi-factor enrolment
discovers the gap only when it is too late to fix. The
principles on this page exist because each of these failure
modes has been observed, written up by an incident-response
firm, and turned into a CISA or NSA advisory — the cost of
re-learning them is paid in real customer data.
This page is organised around the canonical IR lifecycle —
preparation, detection and analysis, containment and
eradication and recovery, and post-incident activity —
adapted to cloud-specific containment patterns, forensics
primitives, communication chains, and recovery procedures.
Each section forward-links to the principles pages that own
the day-zero controls IR depends on: logging and detection
for the alert pipeline, IAM for the credential isolation
primitives, data protection for the immutable backups
recovery relies on, and the cloud threat model for the
attack chains containment must close.
IR lifecycle
The lifecycle terminology this corpus uses is the four-phase
model that appears in both NIST SP 800-61 Rev 2 (Computer
Security Incident Handling Guide, August 2012) and its
successor NIST SP 800-61 Rev 3 (Incident Response
Recommendations and Considerations for Cybersecurity Risk
Management: A CSF 2.0 Community Profile, April 2025).
Rev 3 reframes the same lifecycle through the lens of the
NIST Cybersecurity Framework 2.0 outcomes (Govern, Identify,
Protect, Detect, Respond, Recover) but preserves the
operational vocabulary that practitioners use: Preparation;
Detection and Analysis; Containment, Eradication, and
Recovery; and Post-Incident Activity. Where this page cites
the lifecycle, it cites Rev 3 as the current authoritative
publication; teams whose runbooks are still Rev 2-aligned
should read the Rev 3 transition note in the publication's
introduction.
Preparation is the work done before the
alert fires: break-glass accounts pre-provisioned, runbooks
rehearsed, communication channels exercised, forensic
evidence-collection accounts pre-warmed. The quality of the
incident response is largely determined here. In cloud, the
preparation surface includes identity (break-glass, MFA
enrolment, IdP independence), structure (security-dedicated
accounts, subscriptions, projects, compartments), and
process (runbooks specific to the cloud control plane).
Detection and analysis begins with the
alert pipeline documented in
the General logging and detection principles page.
Cloud detection is unusually rich — CloudTrail / Activity
Log / Cloud Audit Logs / OCI Audit capture the entire
control-plane history — and unusually fragile, because an
attacker who reaches the audit configuration can suppress
the very feed that would detect them. Analysis triages the
alert against the threat model on
the cloud threat model page, assigns
a severity (P0/P1/P2/P3/P4 in the convention this corpus
uses), and starts the IR clock.
Containment, eradication, and recovery are
three sub-phases that overlap in practice. Containment
isolates the compromise without destroying the evidence
(the misconfiguration callout in
§Containment patterns walks
through the canonical "do not delete the principal" rule).
Eradication removes the adversary's footholds — invalidated
tokens, removed back-door identities, rotated KMS material,
revoked OAuth consent grants. Recovery restores service from
a known-good state, drawing on the immutable backup
primitives documented in
the General data protection principles page.
Post-incident activity closes the loop:
root-cause analysis, control-gap identification, runbook
updates, posture re-baselining. The output of this phase is
a backlog of preventive and detective controls that feed
back into the hardening programme. An incident that does
not produce a backlog item — at minimum, "this should have
been detected sooner" or "this should have been impossible"
— has not finished its lifecycle.
[Diagram placeholder]
Figure 1 — NIST SP 800-61 incident response lifecycle:
Preparation feeds Detection and Analysis; Containment,
Eradication and Recovery proceeds in iterative loops with
the detection feed; Post-Incident Activity arrows back into
Preparation, closing the lifecycle. The cycle is
continuous, not linear: an incident in progress sharpens
preparation for the next.
Preparation
Preparation is where cloud incident response is won or lost.
The work is unglamorous: runbooks, accounts, contact lists,
rehearsal. The principles below set the floor beneath which
IR readiness should not fall.
Runbooks pre-positioned, per-scenario, and rehearsed.
A runbook is a step-by-step procedure for a specific class
of incident: compromised IAM user, compromised role with
AssumeRole privileges, ransomware on an EC2 fleet,
accidentally-public storage bucket, OAuth consent-phishing
across the tenant, exposed credentials in a public Git
repository. Each runbook names the responsible role
(incident commander, scribe, IAM analyst, network analyst,
legal liaison), the first ten minutes of actions, the
escalation criteria, and the success criteria for closure.
A runbook that has never been rehearsed in a tabletop is a
first-draft document, not a runbook.
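The runbook elements named above can be held as a small record and checked mechanically. The sketch below is illustrative only: the `Runbook` class, its field names, and `readiness_gaps` are hypothetical and not drawn from any provider tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Runbook:
    """Hypothetical record for one incident scenario (field names are illustrative)."""
    scenario: str
    roles: List[str] = field(default_factory=list)
    first_ten_minutes: List[str] = field(default_factory=list)
    escalation_criteria: List[str] = field(default_factory=list)
    success_criteria: List[str] = field(default_factory=list)
    last_tabletop: Optional[date] = None


def readiness_gaps(rb: Runbook) -> List[str]:
    """Return the elements the runbook is still missing."""
    gaps = []
    if not rb.roles:
        gaps.append("no responsible roles named")
    if not rb.first_ten_minutes:
        gaps.append("no first-ten-minutes actions")
    if not rb.escalation_criteria:
        gaps.append("no escalation criteria")
    if not rb.success_criteria:
        gaps.append("no success criteria for closure")
    if rb.last_tabletop is None:
        gaps.append("never rehearsed in a tabletop")
    return gaps
```

A runbook that returns an empty list has the named elements in place; whether the content is any good is still a question for the tabletop.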
Break-glass accounts maintained with hardware MFA.
Every cloud has scenarios in which the federated identity
provider, the conditional-access policy, or the IAM
control plane is the thing under attack. The break-glass
account is the local-to-the-cloud identity that bypasses
the federation path and admits the responder regardless.
It must exist before the incident; it must enrol a hardware
MFA token whose backup is stored in physical custody (a
safe, a sealed envelope in legal counsel's office); its
credentials must be tested quarterly; and any use must
generate a CRITICAL alert. The illustrative control in
§Illustrative control
formalises this in DS-05 markup.
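A minimal sketch of a periodic break-glass readiness check follows. It assumes "quarterly" can be approximated as 92 days, and the function name and finding strings are hypothetical; a real check would read enrolment and sign-in data from the provider.

```python
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "tested quarterly" is approximated as at most 92 days between tests.
MAX_TEST_AGE = timedelta(days=92)


def break_glass_findings(
    hardware_mfa_enrolled: bool,
    last_tested: Optional[date],
    today: date,
) -> List[str]:
    """Return readiness findings for one break-glass account."""
    findings = []
    if not hardware_mfa_enrolled:
        findings.append("hardware MFA not enrolled")
    if last_tested is None:
        findings.append("credentials never tested")
    elif today - last_tested > MAX_TEST_AGE:
        findings.append("credential test overdue")
    return findings
```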
Security-dedicated account, subscription, project,
and compartment. The detection pipeline, the
forensic-evidence retention bucket, the SIEM, the
incident-response runner — none of these should live in
the account under attack. AWS Organizations security-tooling
account, Azure landing-zone management subscription, GCP
security-foundations folder/project, OCI security
compartment: each provider has the construct, and the
structural separation pays off the first time an attacker
reaches a workload account and the SIEM keeps writing.
Out-of-band communication. The chat
platform, ticketing system, and pager rotation the
organisation uses every day are themselves cloud-hosted
and themselves potential incident scope. The IR plan names
a fallback channel (a managed Signal group, a Wickr
equivalent, a phone tree, a Matrix server outside the
affected provider) and exercises it. The middle of an
incident is the worst possible time to discover that the
fallback does not work.
Legal, regulatory, and external-party contact lists
maintained. The cyber-insurance carrier's incident
hotline, the FBI and CISA points of contact, the data
protection authority for every jurisdiction the organisation
operates in, outside counsel, and the public-relations
escalation chain are all required reading on day one of an
incident. Breach-notification clocks are unforgiving: the
GDPR Article 33 obligation is seventy-two hours from
awareness for personal-data breaches; CIRCIA (the Cyber
Incident Reporting for Critical Infrastructure Act, with
CISA's implementing rule in finalisation at writing time —
verify current status against the CISA CIRCIA page) imposes
a seventy-two-hour reporting window for covered cyber
incidents and a twenty-four-hour window for ransom
payments at critical-infrastructure entities once the rule
is in force. Sector regimes (HIPAA, PCI DSS, NYDFS,
MAS TRM) impose their own windows on top.
Containment patterns
Cloud containment differs from on-premise containment
because the unit of isolation is an IAM construct, a VPC
route, or a security-group rule rather than a network cable.
The patterns below are the canonical containment moves; the
per-provider IR pages document the exact CLI and IaC.
Credential isolation. When a human or
workload identity is suspected compromised, the goal is to
neutralise the credential while preserving everything an
investigation will need. The canonical move is to attach a
Deny-All inline policy to the principal, revoke active
sessions (on AWS, the IAM revoke-sessions action, which
attaches an AWSRevokeOlderSessions deny policy keyed on
aws:TokenIssueTime; on Microsoft Entra ID,
revokeSignInSessions; on Google Cloud, revocation of OAuth
grants; on OCI, session termination), rotate any static
credential the principal holds, and leave the principal
itself in place. Deleting the principal seems decisive but
deletes the evidence trail that maps actions to actors and
breaks the forensic timeline at exactly the moment it is
most needed.
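As an illustration of the credential-isolation move on AWS, the sketch below builds the two policy documents as plain dictionaries: a blanket deny, and a deny that applies only to sessions issued before a cutoff. It assumes AWS IAM policy grammar; the `Sid` values and function names are hypothetical, and attaching the policy to a principal is left to the provider tooling.

```python
import json
from datetime import datetime, timezone


def deny_all_policy() -> dict:
    """Inline policy that denies every action on every resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "IRDenyAll", "Effect": "Deny", "Action": "*", "Resource": "*"}
        ],
    }


def revoke_older_sessions_policy(cutoff: datetime) -> dict:
    """Deny requests made with session tokens issued before `cutoff`."""
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IRRevokeOlderSessions",  # hypothetical Sid
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy JSON so it can be reviewed before it is attached.
    print(json.dumps(deny_all_policy(), indent=2))
```

Neither document deletes anything: the principal, its history, and its attached policies stay in place for the investigation.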
Workload isolation. A compromised VM or
container is moved to a forensic isolation network — an
AWS forensic VPC with no egress, an Azure forensic VNet
with NSG deny-all, a GCP forensic VPC with firewall
deny-all egress, an OCI forensic VCN with Security List
deny-all — and snapshotted before any remediation runs.
The snapshot is the artefact the forensic investigation
works against; the running workload is held in isolation
long enough to capture volatile state (process tree,
network connections, memory image) where the tooling
permits, and then terminated rather than left in service.
Termination without a snapshot destroys the evidence;
live remediation contaminates it.
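The ordering constraint in workload isolation can be sketched as a check over a list of step names. The step labels and the `containment_problems` function are hypothetical; volatile capture is treated as optional because the text allows it only where the tooling permits.

```python
from typing import List

# Required relative order of workload containment steps.
ORDER = ["isolate", "snapshot", "capture_volatile", "terminate"]


def containment_problems(steps: List[str]) -> List[str]:
    """Return the ways a proposed step list would damage the evidence."""
    problems = []
    if "remediate_live" in steps:
        problems.append("live remediation contaminates the evidence")
    known = [s for s in steps if s in ORDER]
    ranks = [ORDER.index(s) for s in known]
    if ranks != sorted(ranks):
        problems.append("steps out of order")
    if "terminate" in known and "snapshot" not in known:
        problems.append("terminated without a snapshot")
    return problems
```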
Blast-radius assessment. Once the immediate
containment is in place, the question shifts to "what else
did this principal or workload touch?" The blast-radius
framework documented in
the cloud threat model page
— single resource, single account, organisation, cross-tenant
— drives the next containment moves. A compromised IAM user
with read-only privileges is one scope; a compromised role
with cross-account AssumeRole into a payments environment is
a vastly larger scope, and the containment work expands to
match.
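One way to reason about "what else did this principal touch" is a breadth-first walk over a trust graph. The sketch below assumes the responder has already extracted a mapping from each principal to the principals it can assume; the graph shape and the principal names in the usage example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable_principals(trust_edges: Dict[str, Iterable[str]], start: str) -> Set[str]:
    """Return every principal reachable from `start` through assume-role edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in trust_edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


if __name__ == "__main__":
    edges = {"dev-user": ["dev-role"], "dev-role": ["payments-role"]}
    print(sorted(reachable_principals(edges, "dev-user")))
```

Each principal in the result is a candidate for the same containment moves, which is how the scope expands to match.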
[Diagram placeholder]
Figure 2 — Containment decision tree: alert triages to
"identity compromise" or "workload compromise" branch;
identity branch routes to attach-deny-policy, revoke-sessions,
rotate-static-credentials; workload branch routes to
forensic-network-move, snapshot, volatile-capture,
terminate. Both branches converge at blast-radius
assessment, which iterates the containment moves until
the affected scope is bounded.
Forensics
Forensic evidence in cloud is built from three substrates:
block-storage snapshots (EBS snapshots, Azure managed-disk
snapshots, GCP persistent-disk snapshots, OCI block-volume
backups), the control-plane audit log (CloudTrail, Azure
Activity Log and Microsoft Entra audit logs, Google Cloud
Audit Logs, OCI Audit), and where supported, memory
acquisition from running workloads. NIST SP 800-86 (Guide to
Integrating Forensic Techniques into Incident Response,
August 2006) is the canonical reference for evidence
handling; its core principles — chain of custody, original
preservation, working-copy analysis — apply unchanged to
cloud artefacts.
Block-storage snapshots are the cheapest and most reliable
piece of cloud forensic evidence. Taken before any
remediation runs, a snapshot freezes the state of the
compromised volume at a point in time; the snapshot can be
attached read-only to an isolated forensic VM in the
security-dedicated account, hashed for chain-of-custody, and
copied to write-once forensic storage with object-lock
retention. The forensic environment is itself a piece of
preparation, not improvisation: an account or subscription
dedicated to incident response, pre-warmed with the analysis
tooling (Volatility, Autopsy, the SANS SIFT workstation
image), with retention configured so that an attacker who
later reaches the security account cannot tamper with the
evidence.
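Hashing for chain of custody can be sketched with the standard library alone. The example assumes the snapshot has been exported or mounted so that it is readable as a file; `custody_entry` and its field names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: str, handler: str, action: str) -> dict:
    """Build one chain-of-custody log entry for an evidence artefact."""
    return {
        "artefact": path,
        "sha256": sha256_file(path),
        "handler": handler,
        "action": action,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recomputing the hash on the working copy and comparing it with the recorded value is what shows the copy still matches the original.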
Memory acquisition is harder. Microsoft Defender for Servers
Plan 2 includes memory-dump capabilities for Windows VMs;
AWS Lambda-based memory acquisition with frameworks like
Margarita Shotgun or fmem works for Linux; GCP and OCI rely
on agent-based capture configured in advance. Memory is
where post-exploitation tooling lives (in-memory loaders,
unbacked code, decrypted secrets) and is the most volatile
evidence — captured live or lost.
The audit log is the timeline. CloudTrail Lake, Azure
Activity Log archived to a Log Analytics workspace and to a
tamper-evident storage account, Cloud Audit Logs piped to a
dedicated logs project and BigQuery dataset, OCI Audit
archived to Object Storage with retention rules — each
provides a tamper-evident, queryable record of every
control-plane action. Chain-of-custody for the audit log
means knowing that nothing in the path from API call to
archived record allows undetected modification, which is
why log-integrity validation (CloudTrail log file
validation, immutable storage accounts, Cloud Logging's
integrity guarantees, OCI Audit retention rules) is treated
as a foundational control on
the General logging and detection principles page.
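The idea behind log-integrity validation can be illustrated with a simple hash chain, in which each entry's hash covers both its content and the previous entry's hash. This is a simplified sketch of the general technique, not the digest format any provider actually uses; all names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _link(prev: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev.encode("ascii") + payload).hexdigest()


def chain_records(records: List[dict]) -> List[dict]:
    """Attach to each record a hash covering its content and the previous hash."""
    chained, prev = [], GENESIS
    for rec in records:
        link = _link(prev, rec)
        chained.append({"record": rec, "prev": prev, "hash": link})
        prev = link
    return chained


def first_broken_link(chained: List[dict]) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(chained):
        if entry["prev"] != prev or entry["hash"] != _link(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None
```

Editing or removing any archived record changes every later hash, so the tampering is detectable as long as the latest hash is stored somewhere the attacker cannot reach.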
Exporting evidence for legal or regulatory purposes follows
a documented procedure: hash before export, package with
metadata describing provenance and acquisition method,
transfer over an authenticated channel to legal counsel or
the receiving regulator, and retain the original copy in
forensic storage. The procedure is the same whether the
recipient is in-house counsel, outside counsel, a regulator,
or law enforcement.
Communication
Internal communication during an incident is driven by an
incident commander — a single accountable role with
decision-making authority — and a severity classification
that bounds the response (P0 = active business-critical
impact, all hands; P1 = serious but bounded impact, on-call
plus subject-matter experts; P2 = limited impact, normal
working hours; P3/P4 = informational, follow-up only). The
incident commander is not necessarily the most senior
technical responder; the role is coordination, not
investigation. Scribes maintain the running log; subject-matter
experts execute containment and remediation; the incident
commander makes the call to escalate, to communicate, and
to close.
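The severity convention above can be held as a lookup table so that paging and staffing decisions are consistent. The descriptions come from the text; the dictionary layout and `staffing_for` are illustrative.

```python
# Severity convention from the text; the data layout itself is illustrative.
SEVERITY_RESPONSE = {
    "P0": ("active business-critical impact", "all hands"),
    "P1": ("serious but bounded impact", "on-call plus subject-matter experts"),
    "P2": ("limited impact", "normal working hours"),
    "P3": ("informational", "follow-up only"),
    "P4": ("informational", "follow-up only"),
}


def staffing_for(severity: str) -> str:
    """Return the staffing posture for a severity label such as 'P1'."""
    try:
        return SEVERITY_RESPONSE[severity.upper()][1]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
```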
External communication runs through a legal-counsel approval
gate. Every statement to a regulator, a customer, a law
enforcement agency, or the public passes through counsel
before transmission, because every statement creates legal
exposure. The CSA Cloud Incident Response Framework
formalises this with explicit roles for legal counsel, the
data protection officer, and the communications lead, and
recommends pre-drafted templates for the common
notification scenarios (customer notification, regulatory
breach notice, public statement) so the drafting cycle is
short during the live incident.
Regulatory notification windows are unforgiving and
jurisdiction-specific. The seventy-two-hour GDPR Article 33
clock starts at organisational awareness of a personal-data
breach. CIRCIA's seventy-two-hour reporting window for
covered cyber incidents at covered entities starts at
reasonable belief that a covered incident has occurred,
with a twenty-four-hour clock for ransom payments. Sector
regulators stack their own clocks. The notification matrix
for the jurisdictions and sectors the organisation operates
in is part of preparation, not part of the incident.
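A notification matrix can be reduced to a table of windows plus the moment each clock started. The sketch below uses only the windows stated above; the obligation keys are hypothetical labels, and the figures should be confirmed against current law and the CIRCIA rule's status before being relied on.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Windows as stated in the text; verify against current law before use.
WINDOWS = {
    "gdpr_art33_personal_data_breach": timedelta(hours=72),
    "circia_covered_incident": timedelta(hours=72),
    "circia_ransom_payment": timedelta(hours=24),
}


def notification_deadlines(clock_starts: Dict[str, datetime]) -> Dict[str, datetime]:
    """Map each obligation to its deadline, given when its clock started."""
    return {name: start + WINDOWS[name] for name, start in clock_starts.items()}


if __name__ == "__main__":
    aware = datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc)
    for name, due in notification_deadlines({"gdpr_art33_personal_data_breach": aware}).items():
        print(name, due.isoformat())
```

Each obligation takes its own start time because the triggers differ: awareness for GDPR, reasonable belief for CIRCIA incidents.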
Recovery and post-incident
Recovery restores service from a state the organisation
trusts. The substrate is immutable backups whose canonical
treatment lives on
the General data protection principles page §Retention, backup, and recovery:
S3 Object Lock with compliance-mode retention, Azure Backup
immutable vaults, Cloud Storage object retention policies and
bucket lock, OCI Object Storage retention rules. The
ransomware-resistant property of these substrates is that
the credentials available to a compromised account cannot
shorten the retention window or delete the backup. Recovery
procedures restore from these backups, not from snapshots
that the attacker had time to encrypt or delete.
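Choosing a restore point can be sketched as a filter over backup metadata. The example assumes, beyond what the text states, that a trusted state is one captured before the earliest known adversary access; the `Backup` record and `pick_restore_point` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Backup:
    backup_id: str
    taken_at: datetime
    immutable: bool  # True when a retention lock prevents deletion or shortening


def pick_restore_point(
    backups: List[Backup], earliest_adversary_access: datetime
) -> Optional[Backup]:
    """Return the newest immutable backup taken before the adversary had access."""
    candidates = [
        b for b in backups
        if b.immutable and b.taken_at < earliest_adversary_access
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)
```

A `None` result means no backup meets both conditions, which is itself a finding for the post-incident backlog.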
Root-cause analysis follows recovery. The output is a
written incident report with a timeline (when did detection
fire, when did containment apply, when did the adversary
have access), a root-cause statement (the specific
misconfiguration, control gap, or process failure that
allowed the incident), and a remediation backlog
(preventive and detective controls that would have closed
the gap). The remediation backlog is not aspirational; each
item has an owner, a target date, and a tracking ticket.
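The three timeline questions in the incident report reduce to simple interval arithmetic. The sketch below assumes the three timestamps have been established by the investigation and occur in order; the function and key names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def timeline_metrics(
    first_access: datetime, detected: datetime, contained: datetime
) -> Dict[str, timedelta]:
    """Derive report intervals from three investigation timestamps."""
    # Assumption: access precedes detection, which precedes containment.
    if not (first_access <= detected <= contained):
        raise ValueError("timestamps must be ordered: access, detection, containment")
    return {
        "time_to_detect": detected - first_access,
        "time_to_contain": contained - detected,
        "adversary_access_window": contained - first_access,
    }
```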
Posture re-baselining closes the loop with the configuration
management substrate. Microsoft Secure Score, AWS Security
Hub security score, Google Security Command Center Premium
security posture, and OCI Cloud Guard problem counts each
provide a quantitative posture metric; the incident's
remediation backlog should produce a measurable improvement
in that metric. An incident whose backlog does not move the
posture metric was probably not analysed deeply enough.
Tabletop exercises and game days
Runbooks that have never been exercised are first drafts.
Quarterly tabletop exercises — facilitated scenarios where
the IR team walks a hypothetical incident through the
runbook without touching production — surface stale
contact lists, ambiguous escalation paths, and runbook
steps that do not survive contact with reality. Annual
purple-team exercises, in which a red team exercises real
attack chains against a production-equivalent environment
while the blue team responds against real runbooks, are the
higher-cost, higher-value complement. Both feed back into
preparation: every exercise produces a list of runbook
edits, contact-list corrections, and control gaps, and
every list closes before the next exercise.
Cross-provider equivalence
The principles above map to provider-specific products and
patterns. The table below is a navigation aid, not a
compliance crosswalk; per-provider depth lives in the IR
pages of each provider section.
GCP: Forensic VPC + firewall deny-all + IAM Deny policy
OCI: Forensic VCN + Security List deny-all + IAM Deny statement
Break-glass identity pattern
AWS: Root user with hardware MFA + IAM Identity Center emergency-access role
Azure: Global Administrator break-glass account excluded from Conditional Access
GCP: Organisation Admin break-glass group with hardware MFA
OCI: Tenancy Administrator break-glass user in Default identity domain
Illustrative control — Pre-positioned break-glass account
The control below illustrates the canonical
<article class="control-box"> markup with
a CRITICAL PREVENTIVE pairing. It is provider-neutral; each
provider's IR page restates the same intent with
provider-specific CLI and IaC. The control mitigates the
scenario in which an attacker — or, more commonly, a
misconfigured Conditional Access policy or expired federation
certificate — locks out the very responders who would
otherwise contain the incident.
gen-ir-ex-01
Maintain pre-positioned break-glass account with hardware MFA
⛔ CRITICAL PREVENTIVE
The control provisions one (preferably two, for
geographically separated custody) break-glass identity per
cloud tenant or organisation: AWS root user plus an IAM
Identity Center emergency-access role; Microsoft Entra
Global Administrator excluded from Conditional Access
policies that could lock it out; Google Cloud organisation
administrator in a dedicated break-glass group; OCI
Tenancy Administrator in the Default identity domain. Each
account enrols a hardware FIDO2 / WebAuthn authenticator
whose backup is stored in sealed physical custody (a safe,
outside counsel's office, an envelope in a bank deposit
box). The account's credentials are tested quarterly via
an in-rehearsal sign-in to a non-destructive read-only
scope, and every use generates a CRITICAL alert routed
through the security findings substrate and an
out-of-band channel. The account is excluded from the
federation path that day-to-day administrators traverse —
the entire point is that it works when federation does
not.
CIS AWS Foundations v3.0.0
CIS Microsoft Azure Foundations v3.0.0
CIS GCP Foundation v4.0.0
CIS OCI Foundation v2.0.0
NIST SP 800-53 rev5
ISO/IEC 27001:2022
ISO/IEC 27017:2015
Root user hardware MFA + IAM Identity Center separation recommendations (verify section number against pinned version)
Global Administrator role separation and break-glass exclusion recommendations (verify section number)