General Incident Response Principles

Overview

Cloud incident response inherits the lifecycle of on-premises incident response but operates in a substrate where the control plane is the new physical access, evidence is ephemeral, infrastructure can be re-created in seconds, and the provider sometimes plays a participatory role. A compromised long-lived static key reaches every region of the account before a defender finishes reading the alert; a terminated EC2 instance, deleted Azure VM, or destroyed Compute Engine VM takes its ephemeral disk and memory with it unless a snapshot has already been taken; a tenancy whose break-glass account has a lapsed multi-factor enrolment discovers the gap only when it is too late to fix. The principles on this page exist because each of these failure modes has been observed, written up by an incident-response firm, and turned into a CISA or NSA advisory. The cost of re-learning them is paid in real customer data.

This page is organised around the canonical IR lifecycle (preparation, detection and analysis, containment and eradication and recovery, and post-incident activity) adapted to cloud-specific containment patterns, forensics primitives, communication chains, and recovery procedures. Each section forward-links to the principles pages that own the day-zero controls IR depends on: logging and detection for the alert pipeline, IAM for the credential isolation primitives, data protection for the immutable backups recovery relies on, and the cloud threat model for the attack chains containment must close.

IR lifecycle

The lifecycle terminology this corpus uses is the four-phase model that appears in both NIST SP 800-61 Rev 2 (Computer Security Incident Handling Guide, August 2012) and its successor NIST SP 800-61 Rev 3 (Incident Response Recommendations and Considerations for Cybersecurity Risk Management: A CSF 2.0 Community Profile, April 2025). Rev 3 reframes the same lifecycle through the lens of the NIST Cybersecurity Framework 2.0 outcomes (Govern, Identify, Protect, Detect, Respond, Recover) but preserves the operational vocabulary that practitioners use: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. Where this page cites the lifecycle, it cites Rev 3 as the current authoritative publication; teams whose runbooks are still Rev 2-aligned should read the Rev 3 transition note in the publication's introduction.

Preparation is the work done before the alert fires: break-glass accounts pre-provisioned, runbooks rehearsed, communication channels exercised, forensic evidence-collection accounts pre-warmed. The quality of the incident response is largely determined here. In cloud, the preparation surface includes identity (break-glass, MFA enrolment, IdP independence), structure (security-dedicated accounts, subscriptions, projects, compartments), and process (runbooks specific to the cloud control plane).

Detection and analysis begins with the alert pipeline documented in the General logging and detection principles page. Cloud detection is unusually rich, since CloudTrail / Activity Log / Cloud Audit Logs / OCI Audit capture the control-plane history, and unusually fragile, because an attacker who reaches the audit configuration can suppress the very feed that would detect them. Analysis triages the alert against the cloud threat model page (threat-model.html), assigns a severity (P0/P1/P2/P3/P4 in the convention this corpus uses), and starts the IR clock.

Containment, eradication, and recovery are three sub-phases that overlap in practice. Containment isolates the compromise without destroying the evidence (the misconfiguration callout in §Containment patterns walks through the canonical "do not delete the principal" rule). Eradication removes the adversary's footholds: invalidated tokens, removed back-door identities, rotated KMS material, revoked OAuth consent grants. Recovery restores service from a known-good state, drawing on the immutable backup primitives documented in the General data protection principles page.

Post-incident activity closes the loop: root-cause analysis, control-gap identification, runbook updates, posture re-baselining. The output of this phase is a backlog of preventive and detective controls that feed back into the hardening programme. An incident that does not produce a backlog item, at minimum "this should have been detected sooner" or "this should have been impossible", has not finished its lifecycle.

Preparation

Preparation is where cloud incident response is won or lost. The work is unglamorous: runbooks, accounts, contact lists, rehearsal. The principles that follow set the floor beneath which IR readiness should not fall.

Runbooks pre-positioned, per-scenario, and rehearsed. A runbook is a step-by-step procedure for a specific class of incident: compromised IAM user, compromised role with AssumeRole privileges, ransomware on an EC2 fleet, accidentally-public storage bucket, OAuth consent-phishing across the tenant, exposed credentials in a public Git repository. Each runbook names the responsible role (incident commander, scribe, IAM analyst, network analyst, legal liaison), the first ten minutes of actions, the escalation criteria, and the success criteria for closure. A runbook that has never been rehearsed in a tabletop is a first-draft document, not a runbook.
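The runbook requirements above can be expressed as a small record with a completeness check. This is a minimal sketch, not a prescribed schema: the class name `Runbook`, its field names, and the `gaps` method are hypothetical and chosen only to mirror the elements the paragraph lists.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Runbook:
    """One runbook per incident class; field names are illustrative."""
    scenario: str
    roles: List[str]
    first_ten_minutes: List[str]
    escalation_criteria: List[str]
    closure_criteria: List[str]
    last_tabletop: Optional[date] = None  # None means never rehearsed

    def gaps(self) -> List[str]:
        """Return the reasons this document is still a first draft."""
        problems = []
        for name in ("roles", "first_ten_minutes",
                     "escalation_criteria", "closure_criteria"):
            if not getattr(self, name):
                problems.append(f"missing {name}")
        if self.last_tabletop is None:
            problems.append("never rehearsed in a tabletop")
        return problems


draft = Runbook(
    scenario="exposed credentials in a public Git repository",
    roles=["incident commander", "scribe", "IAM analyst"],
    first_ten_minutes=["attach deny-all policy", "revoke active sessions"],
    escalation_criteria=[],
    closure_criteria=["static credentials rotated"],
)
print(draft.gaps())
```

A check of this kind could run in the repository that stores the runbooks, so that an unrehearsed or incomplete runbook is flagged before an incident rather than during one.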

Break-glass accounts maintained with hardware MFA. Every cloud has scenarios in which the federated identity provider, the conditional-access policy, or the IAM control plane is the thing under attack. The break-glass account is the local-to-the-cloud identity that bypasses the federation path and admits the responder regardless. It must exist before the incident; it must enrol a hardware MFA token whose backup is stored in physical custody (a safe, a sealed envelope in legal counsel's office); its credentials must be tested quarterly; and any use must generate a CRITICAL alert. The illustrative control in §Illustrative control formalises this in DS-05 markup.

Security-dedicated account, subscription, project, and compartment. The detection pipeline, the forensic-evidence retention bucket, the SIEM, the incident-response runner: none of these should live in the account under attack. AWS Organizations security-tooling account, Azure landing-zone management subscription, GCP security-foundations folder/project, OCI security compartment: each provider has the construct, and the structural separation pays out the first time an attacker reaches a workload account and the SIEM keeps writing.

Out-of-band communication. The chat platform, ticketing system, and pager rotation the organisation uses every day are themselves cloud-hosted and themselves potential incident scope. The IR plan names a fallback channel (a managed Signal group, a Wickr equivalent, a phone tree, a Matrix server outside the affected provider) and exercises it. The middle of an incident is the worst possible time to discover that the fallback does not work.

Legal, regulatory, and external-party contact lists maintained. The cyber-insurance carrier's incident hotline, the FBI and CISA points of contact, the data protection authority for every jurisdiction the organisation operates in, outside counsel, and the public-relations escalation chain are all required reading on day one of an incident. Breach-notification clocks are unforgiving: the GDPR Article 33 obligation is seventy-two hours from awareness for personal-data breaches; CIRCIA (the Cyber Incident Reporting for Critical Infrastructure Act; CISA's implementing rule was still being finalised at the time of writing, so verify current status against the CISA CIRCIA page) imposes, once the rule is in force, a seventy-two-hour reporting window for covered cyber incidents and a twenty-four-hour window for ransom payments at critical-infrastructure entities. Sector regimes and regulators (HIPAA, PCI DSS, NYDFS, MAS-TRM) impose their own windows on top.

Containment patterns

Cloud containment differs from on-premises containment because the unit of isolation is an IAM construct, a VPC route, or a security-group rule rather than a network cable. The patterns below are the canonical containment moves; the per-provider IR pages document the exact CLI and IaC.

Credential isolation. When a human or workload identity is suspected compromised, the goal is to neutralise the credential while preserving everything an investigation will need. The canonical move is to attach a Deny-All inline policy to the principal, revoke active sessions (on AWS, the IAM "Revoke active sessions" action for roles, which attaches the AWSRevokeOlderSessions inline policy denying credentials issued before a cutoff via aws:TokenIssueTime; on Azure, the Microsoft Graph revokeSignInSessions action against Microsoft Entra ID; on Google Cloud, revocation of OAuth grants; on OCI, session termination), rotate any static credential the principal holds, and leave the principal itself in place. Deleting the principal seems decisive but deletes the evidence trail that maps actions to actors and breaks the forensic timeline at exactly the moment it is most needed.

Misconfiguration: "just delete the compromised principal." Deleting a compromised user, role, service principal, service account, or IAM resource principal during containment destroys forensic evidence. Audit-log entries that reference the principal lose their resolvable name; policy-evaluation traces lose their context; the timeline of "which actions did this identity actually perform" becomes harder to reconstruct. The correct move is to attach a Deny-All policy, revoke active sessions, rotate any static credentials, and leave the principal in place until the investigation is closed. Preserve first; eradicate second.
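To make the "deny, revoke, keep the principal" move concrete, the sketch below builds the two policy documents in AWS IAM JSON policy grammar. It is an illustration under stated assumptions: the function names and `Sid` values are hypothetical, the second document mirrors the time-based deny AWS applies for session revocation (an explicit deny conditioned on `aws:TokenIssueTime`), and attaching the documents to the principal (for example through the IAM `PutUserPolicy` or `PutRolePolicy` operations) is deliberately left to the per-provider runbook.

```python
import json
from datetime import datetime, timezone


def deny_all_policy() -> dict:
    """Explicit deny on every action and resource; an explicit deny wins over any allow."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "IRContainmentDenyAll", "Effect": "Deny",
             "Action": "*", "Resource": "*"}
        ],
    }


def revoke_older_sessions_policy(cutoff: datetime) -> dict:
    """Deny requests made with temporary credentials issued before `cutoff`."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IRRevokeOlderSessions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
            }
        ],
    }


print(json.dumps(deny_all_policy(), indent=2))
print(json.dumps(revoke_older_sessions_policy(datetime.now(timezone.utc)), indent=2))
```

Neither document deletes anything: the principal, its name, and its audit-log references all survive, which is the point of the callout above.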

Workload isolation. A compromised VM or container is moved to a forensic isolation network (an AWS forensic VPC with no egress, an Azure forensic VNet with NSG deny-all, a GCP forensic VPC with firewall deny-all egress, an OCI forensic VCN with Security List deny-all) and snapshotted before any remediation runs. The snapshot is the artefact the forensic investigation works against; the running workload is held in isolation long enough to capture volatile state (process tree, network connections, memory image) where the tooling permits, and then terminated rather than left in service. Termination without a snapshot destroys the evidence; live remediation contaminates it.
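The ordering constraint in this pattern (isolate, snapshot, optionally capture volatile state, and only then terminate) can be enforced mechanically. The following is a minimal sketch; `Step`, `PREREQUISITES`, and `IsolationLog` are hypothetical names, and the class only records steps, it does not call any provider API.

```python
from enum import Enum
from typing import List


class Step(Enum):
    ISOLATE = "move to forensic network"
    SNAPSHOT = "snapshot volumes"
    CAPTURE_VOLATILE = "capture volatile state"
    TERMINATE = "terminate workload"


# Steps that must already be recorded before each step is allowed.
PREREQUISITES = {
    Step.SNAPSHOT: {Step.ISOLATE},
    Step.CAPTURE_VOLATILE: {Step.ISOLATE},
    Step.TERMINATE: {Step.ISOLATE, Step.SNAPSHOT},
}


class IsolationLog:
    """Records containment steps for one workload and rejects unsafe orderings."""

    def __init__(self, workload_id: str) -> None:
        self.workload_id = workload_id
        self.completed: List[Step] = []

    def record(self, step: Step) -> None:
        missing = PREREQUISITES.get(step, set()) - set(self.completed)
        if missing:
            names = ", ".join(sorted(s.name for s in missing))
            raise RuntimeError(
                f"{step.name} blocked on {self.workload_id}: needs {names}"
            )
        self.completed.append(step)
```

Volatile capture is optional in this sketch because the text makes it conditional on tooling; the snapshot is not, so termination without one raises an error.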

Blast-radius assessment. Once the immediate containment is in place, the question shifts to "what else did this principal or workload touch?" The blast-radius framework documented in the cloud threat model page (single resource, single account, organisation, cross-tenant) drives the next containment moves. A compromised IAM user with read-only privileges is one scope; a compromised role with cross-account AssumeRole into a payments environment is a vastly larger scope, and the containment work expands to match.
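One way to answer "what else could this principal reach?" is a breadth-first walk over a graph of trust relationships. The sketch below assumes such a graph has already been extracted into a dictionary; the principal names and the `reachable_principals` function are hypothetical, and a real assessment would also weigh what each reachable identity is permitted to do.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_principals(trust_edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every principal reachable from `start` by chained role assumption."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in trust_edges.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)  # report only what lies beyond the compromised principal
    return seen


# Hypothetical edges: key may assume each role in its list.
edges = {
    "user/ci-deployer": ["role/build"],
    "role/build": ["role/payments-admin", "role/log-reader"],
    "role/log-reader": [],
    "role/unrelated": ["role/payments-admin"],
}
print(sorted(reachable_principals(edges, "user/ci-deployer")))
```

In this example a seemingly low-value deployer identity reaches the payments role in two hops, which is the kind of scope expansion the paragraph describes.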

Forensics

Forensic evidence in cloud is built from three substrates: block-storage snapshots (EBS snapshots, Azure managed-disk snapshots, GCP persistent-disk snapshots, OCI block-volume backups), the control-plane audit log (CloudTrail, Azure Activity Log and Microsoft Entra audit logs, Google Cloud Audit Logs, OCI Audit), and, where supported, memory acquisition from running workloads. NIST SP 800-86 (Guide to Integrating Forensic Techniques into Incident Response, August 2006) is the canonical reference for evidence handling; its core principles (chain of custody, preservation of the original, and analysis on a working copy) apply unchanged to cloud artefacts.

Block-storage snapshots are the cheapest and most reliable piece of cloud forensic evidence. Taken before any remediation runs, a snapshot freezes the state of the compromised volume at a point in time; the snapshot can be attached read-only to an isolated forensic VM in the security-dedicated account, hashed for chain-of-custody, and copied to write-once forensic storage with object-lock retention. The forensic environment is itself a piece of preparation, not improvisation: an account or subscription dedicated to incident response, pre-warmed with the analysis tooling (Volatility, Autopsy, the SANS SIFT workstation image), with retention configured so that an attacker who later reaches the security account cannot tamper with the evidence.

Memory acquisition is harder. Microsoft Defender for Servers Plan 2 includes memory-dump capabilities for Windows VMs; AWS Lambda-based memory acquisition with frameworks like Margarita Shotgun or fmem works for Linux; GCP and OCI rely on agent-based capture configured in advance. Memory is where post-exploitation tooling lives (in-memory loaders, unbacked code, decrypted secrets) and is the most volatile evidence: it is captured live or lost.

The audit log is the timeline. CloudTrail Lake, Azure Activity Log archived to a Log Analytics workspace and to a tamper-evident storage account, Cloud Audit Logs piped to a dedicated logs project and BigQuery dataset, OCI Audit archived to Object Storage with retention rules: each provides a tamper-evident, queryable record of every control-plane action. Chain-of-custody for the audit log means knowing that nothing in the path from API call to archived record allows undetected modification, which is why log-integrity validation (CloudTrail log file validation, immutable storage accounts, Cloud Logging's integrity guarantees, OCI Audit retention rules) is treated as a foundational control on the General logging and detection principles page.
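The idea behind "no undetected modification" can be illustrated with a generic hash chain, in which each digest covers the record and the previous digest. This is a conceptual sketch only: it is not the digest format of CloudTrail or any other provider, and `GENESIS`, `link`, `seal`, and `first_broken_link` are hypothetical names.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed starting value for the first link


def link(previous: str, record: dict) -> str:
    """Digest of a record bound to the digest that precedes it."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous + body).encode("utf-8")).hexdigest()


def seal(records: List[dict]) -> List[str]:
    """Produce one chained digest per record, in order."""
    digests, previous = [], GENESIS
    for record in records:
        previous = link(previous, record)
        digests.append(previous)
    return digests


def first_broken_link(records: List[dict], digests: List[str]) -> Optional[int]:
    """Index of the first record that fails verification, or None if intact."""
    previous = GENESIS
    for index, (record, expected) in enumerate(zip(records, digests)):
        if link(previous, record) != expected:
            return index
        previous = expected
    if len(records) != len(digests):
        return min(len(records), len(digests))  # log truncated or padded
    return None
```

Because every digest depends on its predecessor, editing or removing one record is detectable from that point forward, provided the digests themselves are stored where the attacker cannot rewrite them.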

Exporting evidence for legal or regulatory purposes follows a documented procedure: hash before export, package with metadata describing provenance and acquisition method, transfer over an authenticated channel to legal counsel or the receiving regulator, and retain the original copy in forensic storage. The procedure is the same whether the recipient is in-house counsel, outside counsel, a regulator, or law enforcement.
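The "hash before export, package with metadata" step might look like the sketch below. The manifest field names and the `build_manifest` function are assumptions made for illustration; the organisation's own evidence-handling procedure defines what a real manifest must contain.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(path: Path, source: str, method: str, collected_by: str) -> dict:
    """Describe one evidence file: what it is, where it came from, and its hash."""
    return {
        "file": path.name,
        "sha256": sha256_file(path),
        "size_bytes": path.stat().st_size,
        "source": source,                # e.g. the volume or log the copy came from
        "acquisition_method": method,    # e.g. "block-storage snapshot"
        "collected_by": collected_by,
        "hashed_at": datetime.now(timezone.utc).isoformat(),
    }
```

The recipient recomputes the SHA-256 on arrival and compares it with the manifest; a mismatch means the copy, not the original in forensic storage, is suspect.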

Communication

Internal communication during an incident is driven by an incident commander (a single accountable role with decision-making authority) and a severity classification that bounds the response (P0 = active business-critical impact, all hands; P1 = serious but bounded impact, on-call plus subject-matter experts; P2 = limited impact, normal working hours; P3/P4 = informational, follow-up only). The incident commander is not necessarily the most senior technical responder; the role is coordination, not investigation. Scribes maintain the running log; subject-matter experts execute containment and remediation; the incident commander makes the call to escalate, to communicate, and to close.
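The severity convention above lends itself to a small lookup table. The sketch below restates the P0 to P4 definitions from the paragraph; the `pages_on_call` rule (page for P0 and P1 only) is an assumption inferred from those definitions rather than something the text mandates.

```python
from enum import IntEnum


class Severity(IntEnum):
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


# Response posture per severity, paraphrasing the definitions above.
RESPONSE = {
    Severity.P0: "all hands",
    Severity.P1: "on-call plus subject-matter experts",
    Severity.P2: "normal working hours",
    Severity.P3: "follow-up only",
    Severity.P4: "follow-up only",
}


def pages_on_call(severity: Severity) -> bool:
    """Assumed rule: only P0 and P1 interrupt people outside working hours."""
    return severity <= Severity.P1


for level in Severity:
    print(level.name, RESPONSE[level], pages_on_call(level))
```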

External communication runs through a legal-counsel approval gate. Every statement to a regulator, a customer, a law enforcement agency, or the public passes through counsel before transmission, because every statement creates legal exposure. The CSA Cloud Incident Response Framework formalises this with explicit roles for legal counsel, the data protection officer, and the communications lead, and recommends pre-drafted templates for the common notification scenarios (customer notification, regulatory breach notice, public statement) so the drafting cycle is short during the live incident.

Regulatory notification windows are unforgiving and jurisdiction-specific. The seventy-two-hour GDPR Article 33 clock starts at organisational awareness of a personal-data breach. CIRCIA's seventy-two-hour reporting window for covered cyber incidents at covered entities starts at reasonable belief that a covered incident has occurred, with a twenty-four-hour clock for ransom payments. Sector regulators stack their own clocks. The notification matrix for the jurisdictions and sectors the organisation operates in is part of preparation, not part of the incident.
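A notification matrix can be reduced to a table of windows and a function that adds the window to the moment the clock starts. The sketch below encodes only the three windows named above; the dictionary keys and the function name are hypothetical, and whether a given obligation applies, and when its clock actually starts, remains a question for counsel.

```python
from datetime import datetime, timedelta, timezone

# Windows as stated in the text; the keys are illustrative labels.
WINDOWS = {
    "gdpr_art33_personal_data_breach": timedelta(hours=72),  # from awareness
    "circia_covered_incident": timedelta(hours=72),          # from reasonable belief
    "circia_ransom_payment": timedelta(hours=24),            # from payment
}


def notification_deadline(obligation: str, clock_start: datetime) -> datetime:
    """Return the latest notification time for one obligation."""
    if clock_start.tzinfo is None:
        raise ValueError("clock_start must be timezone-aware")
    return clock_start + WINDOWS[obligation]


start = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
for name in WINDOWS:
    print(name, notification_deadline(name, start).isoformat())
```

Requiring timezone-aware timestamps is a deliberate choice here: an incident that spans regions makes a naive local time ambiguous at exactly the moment the deadline matters.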

Recovery and post-incident

Recovery restores service from a state the organisation trusts. The substrate is immutable backups whose canonical treatment lives on the General data protection principles page §Retention, backup, and recovery: S3 Object Lock with compliance-mode retention, Azure Backup immutable vaults, Cloud Storage object retention policies and bucket lock, OCI Object Storage retention rules. The ransomware-resistant property of these substrates is that the credentials available to a compromised account cannot shorten the retention window or delete the backup. Recovery procedures restore from these backups, not from snapshots that the attacker had time to encrypt or delete.

Root-cause analysis follows recovery. The output is a written incident report with a timeline (when did detection fire, when did containment apply, when did the adversary have access), a root-cause statement (the specific misconfiguration, control gap, or process failure that allowed the incident), and a remediation backlog (preventive and detective controls that would have closed the gap). The remediation backlog is not aspirational; each item has an owner, a target date, and a tracking ticket.
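The three timeline questions in the report (when the adversary had access, when detection fired, when containment applied) yield three intervals worth stating explicitly. This is a minimal sketch with hypothetical names; the timestamps would come from the reconstructed audit-log timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    first_adversary_access: datetime
    detection_fired: datetime
    containment_applied: datetime

    def time_to_detect(self) -> timedelta:
        """How long the adversary operated before any alert fired."""
        return self.detection_fired - self.first_adversary_access

    def time_to_contain(self) -> timedelta:
        """How long the response took from alert to containment."""
        return self.containment_applied - self.detection_fired

    def access_window(self) -> timedelta:
        """Total time the adversary had access."""
        return self.containment_applied - self.first_adversary_access


timeline = IncidentTimeline(
    first_adversary_access=datetime(2025, 3, 1, 2, 0),
    detection_fired=datetime(2025, 3, 1, 8, 30),
    containment_applied=datetime(2025, 3, 1, 9, 15),
)
print(timeline.time_to_detect(), timeline.time_to_contain(), timeline.access_window())
```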

Posture re-baselining closes the loop with the configuration management substrate. Microsoft Secure Score, AWS Security Hub security score, Google Security Command Center Premium security posture, and OCI Cloud Guard problem counts each provide a quantitative posture metric; the incident's remediation backlog should produce a measurable improvement in that metric. An incident whose backlog does not move the posture metric was probably not analysed deeply enough.

Tabletop exercises and game days

Runbooks that have never been exercised are first drafts. Quarterly tabletop exercises (facilitated scenarios where the IR team walks a hypothetical incident through the runbook without touching production) surface stale contact lists, ambiguous escalation paths, and runbook steps that do not survive contact with reality. Annual purple-team exercises, in which a red team exercises real attack chains against a production-equivalent environment while the blue team responds against real runbooks, are the higher-cost, higher-value complement. Both feed back into preparation: every exercise produces a list of runbook edits, contact-list corrections, and control gaps, and every list closes before the next exercise.

Cross-provider equivalence

The principles above map to provider-specific products and patterns. The table below is a navigation aid, not a compliance crosswalk; per-provider depth lives in the IR pages of each provider section.

Capability	AWS	Azure	GCP	OCI
Detection aggregation	GuardDuty + Security Hub	Microsoft Defender for Cloud + Microsoft Sentinel	Security Command Center (Premium / Enterprise)	Oracle Cloud Guard
Block-storage snapshot	EBS snapshot	Managed Disk snapshot	Persistent Disk snapshot	Block Volume backup / clone
Forensic network isolation	Forensic VPC + Deny-all inline policy on principal (SCP at account or OU level)	Forensic VNet + NSG deny-all + Conditional Access block	Forensic VPC + firewall deny-all + IAM Deny policy	Forensic VCN + Security List deny-all + IAM Deny statement
Break-glass identity pattern	Root user with hardware MFA + IAM Identity Center emergency-access role	Global Administrator break-glass account excluded from Conditional Access	Organisation Admin break-glass group with hardware MFA	Tenancy Administrator break-glass user in Default identity domain

Illustrative control: pre-positioned break-glass account

The control below illustrates the canonical <article class="control-box"> markup with a CRITICAL PREVENTIVE pairing. It is provider-neutral; each provider's IR page restates the same intent with provider-specific CLI and IaC. The control mitigates the scenario in which an attacker (or, more commonly, a misconfigured Conditional Access policy or expired federation certificate) locks out the very responders who would otherwise contain the incident.

gen-ir-ex-01

Maintain pre-positioned break-glass account with hardware MFA

CRITICAL PREVENTIVE

MITIGATES Total IAM lockout during incident response: an attacker who reaches federated-identity infrastructure, a misconfigured Conditional Access policy that excludes every administrator, or an expired federation certificate makes the cloud control plane unreachable through the normal authentication path; the break-glass account is the local-to-the-cloud identity that admits responders regardless.

ATTACK VECTOR Adversary compromises the identity provider (Entra ID, Okta, Google Workspace, OCI federation source) and forces a malicious Conditional Access policy; or organisational error (expired SAML signing certificate, accidental administrator removal, MFA registration failure) locks every administrator out. Without a break-glass account whose authentication does not depend on the broken element, the responder has no path back into the tenancy.

BLAST RADIUS If the control is absent and the lockout scenario materialises, the entire cloud control plane is unreachable for the duration of the federation-recovery process (hours at best, days at worst), during which detection feeds may continue to fire on an actively progressing incident with no human able to act on them.

The control provisions one (preferably two, for geographically separated custody) break-glass identity per cloud tenant or organisation: AWS root user plus an IAM Identity Center emergency-access role; Microsoft Entra Global Administrator excluded from Conditional Access policies that could lock it out; Google Cloud organisation administrator in a dedicated break-glass group; OCI Tenancy Administrator in the Default identity domain. Each account enrols a hardware FIDO2 / WebAuthn authenticator whose backup is stored in sealed physical custody (a safe, outside counsel's office, an envelope in a bank deposit box). The account's credentials are tested quarterly via an in-rehearsal sign-in to a non-destructive read-only scope, and every use generates a CRITICAL alert routed through the security findings substrate and an out-of-band channel. The account is excluded from the federation path that day-to-day administrators traverse; the entire point is that it works when federation does not.

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
Root user hardware MFA + IAM Identity Center separation recommendations (verify section number against pinned version)	Global Administrator role separation and break-glass exclusion recommendations (verify section number)	Organisation-admin separation recommendations (verify section number)	Tenancy Administrator separation recommendations (verify section number)	IR-1 (Policy and Procedures), IR-4 (Incident Handling), CP-2 (Contingency Plan)	A.5.24 — Information security incident management planning and preparation; A.5.29 — Information security during disruption	CLD.6.3.1 — Shared roles and responsibilities within a cloud computing environment