AWS Incident Response Hardening

Overview

This page covers Amazon Web Services incident response (IR) hardening: the controls that decide whether the organisation can contain, investigate, and recover from an AWS-resident incident inside a defensible time window, and whether the resulting forensic record will hold up to subsequent regulatory or legal scrutiny. Scope is the AWS commercial regions; AWS GovCloud (US) and the China regions inherit the same controls but sit in separate partitions (aws-us-gov and aws-cn) with their own endpoints (for example IAM under iam.us-gov.amazonaws.com) and with CloudTrail Lake event data stores that cannot replicate cross-partition. Re-verify partition caveats before applying any of the IaC below to a non-commercial region.

Cross-cutting IR lifecycle principles — preparation, detection, containment, eradication, recovery, lessons learned, and evidence preservation — are documented on the General Incident Response page against NIST SP 800-61 rev 3 (April 2025 CSF 2.0 community profile). This page does not re-author the lifecycle; it maps the lifecycle to AWS primitives and to the specific posture controls that make each lifecycle phase executable in an AWS account. Severity assignments follow the rubric documented in methodology; equivalence callouts at the bottom of each control point to the matching control on the Azure, GCP, and OCI sibling pages so a reader can compare break-glass, automated-response, and forensic-retention models across providers.

AWS IR posture splits cleanly into two stacks. The detective stack — CloudTrail, GuardDuty, Security Hub, AWS Config, VPC Flow Logs, CloudWatch alarms — lives on the AWS Logging page; it is what tells you an incident is happening. The responsive stack — break-glass identities, EventBridge + Lambda containment automation, S3 Object Lock evidence buckets, CloudTrail Lake forensic event data stores, and documented runbooks — lives here; it is what you do once you know. The handoff between the two stacks is concrete and AWS-internal: a GuardDuty finding crossing the severity≥7 threshold fires an EventBridge rule that invokes a Lambda quarantine function (aws-ir-02); a forensic question about who-touched-what at 03:00 last Tuesday is answered by a CloudTrail Lake SQL query (aws-ir-04). Every IR control on this page assumes the corresponding logging control on the AWS Logging page is in place; if it is not, the IR control degrades to a manual playbook with insufficient telemetry to drive it.

Order matters. Control 01 is the preparation invariant that gates everything else: without a pre-provisioned break-glass identity, the very first incident that takes out the federated identity provider — exactly the scenario IR exists to handle — locks responders out of the AWS account at the moment they need it most. Control 02 is the automation layer that compresses time-to-contain from human-response-time minutes to seconds. Control 03 is the evidence invariant — without write-once-read-many evidence storage, an attacker with sufficient privileges can erase the very logs that would prove what happened. Controls 04–06 are the responsive playbooks themselves: CloudTrail Lake for retrospective forensic SQL, EC2 isolation for "we think this instance is compromised, take it off the network without destroying state", and credential rotation for "an access key landed in a public GitHub repo". Control 07 (optional) closes the lessons-learned loop with quarterly tabletop exercises so the playbooks above are tested before they are needed.

One housekeeping note on the compliance table that follows every control. Most IR controls are playbook-driven and process-bound rather than state-driven — CIS Foundations Benchmarks across all four providers are weighted toward configurable state (encryption, public access, logging enabled) and only lightly cover the IR domain. Expect the CIS columns on this page to read (best-practices) or n/a for most controls; NIST SP 800-53 rev5 IR family (IR-4 Incident Handling, IR-5 Incident Monitoring, IR-6 Incident Reporting, IR-8 Incident Response Plan, plus AU-9 / AU-11 for evidence) and ISO/IEC 27001:2022 (A.5.24 information-security incident management, A.5.26 response to incidents, A.5.28 collection of evidence) are the primary mappings. The compliance-frameworks page explains why each row still carries the same seven framework columns even when several read n/a — the column layout is corpus-wide for diff-grade reading across domains.

aws-ir-01-break-glass-account ! CRITICAL PREVENTIVE

Pre-provision at least two break-glass identities that are reachable when the organisation's primary federated identity provider (IAM Identity Center backed by Okta, Entra ID, Ping, or Google Workspace) is unavailable, compromised, or otherwise unusable. The canonical pattern is a dedicated IAM Identity Center permission set bound to a small number (two to four) of named human responders whose user records live inside Identity Center rather than being federated from the IdP, each protected by a hardware MFA device (YubiKey or equivalent) physically stored in two separate locked safes in two separate buildings. Every console sign-in or CLI assume-role with the break-glass permission set fires a CloudWatch alarm to SNS, PagerDuty, and the security on-call channel within seconds (AWS Security Incident Response Guide — Preparation (accessed 2026-05)). The same alarm fabric is documented in aws-log-07; this control adds the break-glass-specific filter.

The principle is reinforced in General IR — preparation: the very first incident that takes out the IdP is exactly the scenario IR exists to handle, and an IdP-only access model has zero recovery path in that scenario. Quarterly access tests — a named responder retrieves their YubiKey from the safe, signs into the AWS Organization management account, performs a single read-only API call, signs out — keep the credential, the MFA device, and the alarm pipeline all known-working. Tests that have not been performed in the last 90 days are tracked on the security team's drift dashboard.
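The 90-day test window can be checked mechanically. The sketch below is illustrative only: it assumes the security team keeps a simple record of each responder's last successful access test, and the function name, data shape, and responder names are hypothetical rather than part of any AWS API.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Assumption: 90 days mirrors the "last 90 days" window described above.
MAX_TEST_AGE = timedelta(days=90)


def overdue_responders(last_tested: Dict[str, Optional[date]], today: date) -> List[str]:
    """Return responders whose last access test is missing or older than 90 days."""
    overdue = []
    for responder, tested_on in last_tested.items():
        # A responder who has never tested is treated as overdue.
        if tested_on is None or today - tested_on > MAX_TEST_AGE:
            overdue.append(responder)
    return sorted(overdue)


if __name__ == "__main__":
    # Hypothetical test log keyed by break-glass user name.
    log = {
        "break-glass-responder-01": date(2026, 3, 1),
        "break-glass-responder-02": date(2025, 11, 15),
        "break-glass-responder-03": None,
    }
    print(overdue_responders(log, today=date(2026, 5, 1)))
```

A test performed exactly 90 days ago still counts as current in this sketch; tighten the comparison if the drift dashboard uses a stricter rule.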

Remediation — AWS CLI

# Create a dedicated break-glass permission set in IAM Identity Center.
# Session duration deliberately short (1 hour) so a forgotten session expires fast.
aws sso-admin create-permission-set \
  --instance-arn "$IDC_INSTANCE_ARN" \
  --name BreakGlassAdmin \
  --description "Emergency-only; every use alarms" \
  --session-duration PT1H

# Attach the AWS managed AdministratorAccess policy (break-glass needs full reach).
aws sso-admin attach-managed-policy-to-permission-set \
  --instance-arn "$IDC_INSTANCE_ARN" \
  --permission-set-arn "$PS_ARN" \
  --managed-policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Provision the named human responder as a non-federated Identity Center user.
aws identitystore create-user \
  --identity-store-id "$IDS_ID" \
  --user-name break-glass-responder-01 \
  --display-name "Break-Glass Responder 01" \
  --emails Value=ir+bg01@example.com,Type=Work,Primary=true \
  --name FamilyName=Responder,GivenName=BreakGlass

# Enrol the hardware MFA device. The TOTP/U2F enrolment uses the Identity Center
# console; CLI enrolment of FIDO2 hardware tokens is not currently supported and
# must be completed interactively by the human responder during initial setup.

# Metric filter on every assume-role with the break-glass permission set; pair it
# with a CloudWatch alarm on the Security/BreakGlassUse metric that notifies SNS.
aws logs put-metric-filter \
  --log-group-name aws-cloudtrail-logs \
  --filter-name BreakGlassAssumeRole \
  --filter-pattern '{ $.eventName = "AssumeRoleWithSAML" && $.requestParameters.roleArn = "*BreakGlassAdmin*" }' \
  --metric-transformations metricName=BreakGlassUse,metricNamespace=Security,metricValue=1

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS Security Incident Response Guide (accessed 2026-05)
# Break-glass permission set in IAM Identity Center.
resource "aws_ssoadmin_permission_set" "break_glass" {
  name             = "BreakGlassAdmin"
  description      = "Emergency-only; every use alarms"
  instance_arn     = local.idc_instance_arn
  session_duration = "PT1H"
}

resource "aws_ssoadmin_managed_policy_attachment" "break_glass_admin" {
  instance_arn       = local.idc_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
  permission_set_arn = aws_ssoadmin_permission_set.break_glass.arn
}

# Named non-federated responder identities.
resource "aws_identitystore_user" "break_glass" {
  for_each          = toset(["bg01", "bg02"])
  identity_store_id = local.identity_store_id
  user_name         = "break-glass-responder-${each.key}"
  display_name      = "Break-Glass Responder ${each.key}"

  name {
    family_name = "Responder"
    given_name  = "BreakGlass"
  }
  emails {
    value   = "ir+${each.key}@example.com"
    type    = "Work"
    primary = true
  }
}

# Alarm on every assume-role with the break-glass permission set.
resource "aws_cloudwatch_log_metric_filter" "break_glass_use" {
  name           = "BreakGlassAssumeRole"
  log_group_name = "aws-cloudtrail-logs"
  pattern        = "{ $.eventName = \"AssumeRoleWithSAML\" && $.requestParameters.roleArn = \"*BreakGlassAdmin*\" }"

  metric_transformation {
    name      = "BreakGlassUse"
    namespace = "Security"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "break_glass_use" {
  alarm_name          = "break-glass-use"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  threshold           = 1
  period              = 60
  metric_name         = "BreakGlassUse"
  namespace           = "Security"
  statistic           = "Sum"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.security_oncall.arn]
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Break-glass IAM role gated by MFA and audited via CloudTrail; assumable only from an emergency-responder principal.
Parameters:
  EmergencyResponderArn:
    Type: String
Resources:
  BreakGlassRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: break-glass-emergency
      MaxSessionDuration: 3600
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Ref EmergencyResponderArn
            Action: sts:AssumeRole
            Condition:
              Bool:
                aws:MultiFactorAuthPresent: 'true'
              NumericLessThan:
                aws:MultiFactorAuthAge: '900'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AdministratorAccess

Remediation — AWS CDK (TypeScript)

import * as cdk from 'aws-cdk-lib';
import { aws_iam as iam } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export interface BreakGlassProps extends cdk.StackProps {
  emergencyResponderArn: string;
}

export class BreakGlassRoleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: BreakGlassProps) {
    super(scope, id, props);

    new iam.Role(this, 'BreakGlassRole', {
      roleName: 'break-glass-emergency',
      maxSessionDuration: cdk.Duration.hours(1),
      assumedBy: new iam.ArnPrincipal(props.emergencyResponderArn).withConditions({
        Bool: { 'aws:MultiFactorAuthPresent': 'true' },
        NumericLessThan: { 'aws:MultiFactorAuthAge': '900' },
      }),
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('AdministratorAccess'),
      ],
    });
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
(best-practices) | n/a | n/a | n/a | IR-4; AC-2(8); AC-6 | A.5.24; A.5.26 | CLD.9.5.1

Log signals

  • CloudTrail console-sign-in events for the dedicated break-glass IAM user — the steady-state usage rate is exactly zero and any login is by definition a break-glass activation that must be reconciled against the on-call documented ticket within minutes.
  • CloudTrail sts:AssumeRole events whose roleArn matches the break-glass role's ARN and whose sourceIPAddress is outside the documented break-glass jump-host CIDR set.
  • CloudTrail iam:CreateAccessKey on the break-glass user — the documented operating model says the break-glass user has hardware-MFA console access only; programmatic-key creation indicates either a deliberate posture deviation or an attacker pivoting from a stolen session.

Query

fields @timestamp, eventName, userIdentity.arn, sourceIPAddress, requestParameters.roleArn, responseElements.ConsoleLogin
| filter (eventName = "ConsoleLogin" and userIdentity.arn like /:user\/break-glass/)
    or (eventName = "AssumeRole" and requestParameters.roleArn like /:role\/BreakGlass/)
    or (eventName = "CreateAccessKey" and requestParameters.userName = "break-glass")
| sort @timestamp desc
| limit 50

The CloudWatch Logs Insights query routes all three break-glass signals into a single result set; downstream PagerDuty integration should treat any row as severity-1 by default and rely on the documented-ticket reconciliation flow to downgrade false positives.

Alert threshold

  • Any console-sign-in by the break-glass user — page immediately and ring the on-call IC; the legitimate usage rate is so low that even a single login warrants out-of-band confirmation from the named hardware-MFA holder.
  • An AssumeRole on the break-glass role from outside the documented jump-host CIDR — page; treat as confirmed compromise of either the user's MFA hardware or the jump-host until proven otherwise.
  • Any CreateAccessKey on the break-glass user — page; this is the canonical signal of an attacker trying to establish persistence after a successful break-glass authentication.
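
The three thresholds above can be expressed as a small classifier over CloudTrail records, for example inside a log-processing function. This is a minimal sketch, not a reference implementation: the jump-host CIDR, the user name, and the role-name marker are placeholder assumptions, and the record fields used are the standard CloudTrail ones already referenced in the query above.

```python
import ipaddress
from typing import Optional

# Hypothetical values: substitute the documented jump-host CIDRs and identity names.
JUMP_HOST_CIDRS = [ipaddress.ip_network("198.51.100.0/28")]
BREAK_GLASS_USER = "break-glass"
BREAK_GLASS_ROLE_MARKER = ":role/BreakGlass"


def from_jump_host(source_ip: str) -> bool:
    """True only when source_ip parses as an address inside a jump-host CIDR."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # CloudTrail can record a service name instead of an IP; treat it as outside.
        return False
    return any(addr in net for net in JUMP_HOST_CIDRS)


def classify(record: dict) -> Optional[str]:
    """Map one CloudTrail record to a break-glass alert label, or None if it is benign."""
    name = record.get("eventName")
    params = record.get("requestParameters") or {}
    user_arn = (record.get("userIdentity") or {}).get("arn", "")

    if name == "ConsoleLogin" and ":user/" + BREAK_GLASS_USER in user_arn:
        return "break-glass-console-login"
    if name == "AssumeRole" and BREAK_GLASS_ROLE_MARKER in params.get("roleArn", ""):
        if not from_jump_host(record.get("sourceIPAddress", "")):
            return "break-glass-assume-role-outside-jump-host"
        return None
    if name == "CreateAccessKey" and params.get("userName") == BREAK_GLASS_USER:
        return "break-glass-access-key-created"
    return None
```

Every non-None label pages in this sketch; the downgrade path remains the documented-ticket reconciliation, not the classifier.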

Initial response

  1. Confirm the activation against the documented break-glass ticket via out-of-band channel (phone call to the named hardware-MFA holder, not Slack); if the holder did not authenticate, treat as confirmed compromise and disable the user immediately: block all API activity with aws iam attach-user-policy --user-name break-glass --policy-arn arn:aws:iam::aws:policy/AWSDenyAll, then remove console access with aws iam delete-login-profile --user-name break-glass.
  2. Pull the full CloudTrail trail of the break-glass session — every API call made under the user's session — and reconstruct intent against the documented break-glass scenario; any deviation from the documented scenario is a forensic data point.
  3. After the break-glass episode closes, rotate the user's password and MFA registration via the documented rotation playbook; the hardware MFA device may need re-enrolment depending on the org's break-glass post-use protocol described in general/ir.html.

References

Equivalent on: Azure · GCP · OCI

aws-ir-02-eventbridge-auto-containment ! HIGH RESPONSIVE

Wire EventBridge rules in the security-tooling account (the delegated GuardDuty administrator) so that every GuardDuty finding with severity >= 7, which covers the High (7.0–8.9) and Critical (9.0–10.0) bands of GuardDuty's 1–10 scale, invokes a Lambda quarantine function within seconds. The quarantine function performs three deterministic actions in order: replace the offending EC2 instance's security groups with a single deny-all SG, detach the IAM instance profile, and create an EBS snapshot of every attached volume tagged with the GuardDuty finding ID for forensic chain-of-custody (AWS Incident Response Playbooks repository (accessed 2026-05)). The detective half of this loop lives on the AWS Logging page as aws-log-04-guardduty-org; without org-wide GuardDuty with all data sources enabled, this control has nothing to fire on.

EventBridge is preferred over GuardDuty's built-in "auto-archive" or "auto-suppression" features because it gives the security team a programmable handoff: the rule can route by finding type, by resource type, by account ID, by severity, or by an arbitrary JSON-path expression on the finding payload, and the downstream target can be Lambda, Step Functions, SNS, SQS, or a partner SaaS via API Destinations. The 1-minute SLO on EventBridge delivery (Amazon EventBridge quotas (accessed 2026-05)) bounds time-to-contain at low single-digit minutes for the entire automation chain — finding emission to quarantine completion.
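
The three quarantine actions can be sketched as a Lambda handler. This is an illustrative outline under stated assumptions, not the function from the security tooling repo: it assumes a single-ENI instance, ignores pagination, takes the EC2 client as a parameter so the logic can be exercised without AWS access, and uses a hypothetical tag key (GuardDutyFindingId). The boto3 EC2 calls it relies on are modify_instance_attribute, describe_iam_instance_profile_associations, disassociate_iam_instance_profile, describe_volumes, and create_snapshot.

```python
import os


def quarantine(event: dict, ec2, quarantine_sg_id: str) -> dict:
    """Apply the three containment actions to the instance named in a GuardDuty finding."""
    detail = event["detail"]
    finding_id = detail["id"]
    resource = detail.get("resource", {})
    if resource.get("resourceType") != "Instance":
        # Findings on access keys, buckets, etc. need a different playbook.
        return {"status": "skipped", "finding_id": finding_id}
    instance_id = resource["instanceDetails"]["instanceId"]

    # 1. Replace every security group with the single deny-all group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])

    # 2. Detach the IAM instance profile so no new role credentials can be obtained.
    assoc = ec2.describe_iam_instance_profile_associations(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )
    for item in assoc.get("IamInstanceProfileAssociations", []):
        ec2.disassociate_iam_instance_profile(AssociationId=item["AssociationId"])

    # 3. Snapshot each attached volume, tagged with the finding ID for chain of custody.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    snapshot_ids = []
    for volume in volumes.get("Volumes", []):
        snap = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"IR evidence for GuardDuty finding {finding_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "GuardDutyFindingId", "Value": finding_id}],
            }],
        )
        snapshot_ids.append(snap["SnapshotId"])

    return {
        "status": "quarantined",
        "finding_id": finding_id,
        "instance_id": instance_id,
        "snapshot_ids": snapshot_ids,
    }


def handler(event, context):
    """Lambda entry point; boto3 is imported lazily because it ships with the runtime."""
    import boto3

    return quarantine(event, boto3.client("ec2"), os.environ["QUARANTINE_SG_ID"])
```

Passing the client in keeps the containment logic testable with a stub; the QUARANTINE_SG_ID environment variable is the one set on the function in the Terraform example.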

Remediation — AWS CLI

# EventBridge rule on GuardDuty severity>=7 findings.
aws events put-rule \
  --name guardduty-critical-findings \
  --event-pattern '{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": { "severity": [{ "numeric": [">=", 7] }] }
  }' \
  --state ENABLED

# Target: the quarantine Lambda. EventBridge passes the full finding as input.
aws events put-targets \
  --rule guardduty-critical-findings \
  --targets "Id=quarantine,Arn=arn:aws:lambda:eu-west-1:111111111111:function:gd-quarantine"

# Grant EventBridge permission to invoke the Lambda.
aws lambda add-permission \
  --function-name gd-quarantine \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:eu-west-1:111111111111:rule/guardduty-critical-findings

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS IR Playbooks repository (accessed 2026-05)
# Deny-all SG that quarantined instances are attached to.
resource "aws_security_group" "quarantine" {
  name        = "ir-quarantine"
  description = "Deny-all SG for quarantined instances; no ingress or egress rules"
  vpc_id      = aws_vpc.workload.id
  tags        = { Purpose = "ir-quarantine" }
}

# Quarantine Lambda — reads the finding, snapshots EBS, detaches instance
# profile, swaps SGs. Function source lives in the security tooling repo.
resource "aws_lambda_function" "gd_quarantine" {
  function_name = "gd-quarantine"
  role          = aws_iam_role.gd_quarantine.arn
  handler       = "index.handler"
  runtime       = "python3.12"
  filename      = "build/gd-quarantine.zip"
  timeout       = 60

  environment {
    variables = {
      QUARANTINE_SG_ID = aws_security_group.quarantine.id
    }
  }
}

# EventBridge rule on GuardDuty severity>=7.
resource "aws_cloudwatch_event_rule" "gd_critical" {
  name = "guardduty-critical-findings"
  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
    detail        = { severity = [{ numeric = [">=", 7] }] }
  })
}

resource "aws_cloudwatch_event_target" "gd_quarantine" {
  rule      = aws_cloudwatch_event_rule.gd_critical.name
  target_id = "quarantine"
  arn       = aws_lambda_function.gd_quarantine.arn
}

resource "aws_lambda_permission" "eventbridge_invoke" {
  statement_id  = "eventbridge-invoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.gd_quarantine.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.gd_critical.arn
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: EventBridge rule routing high-severity GuardDuty findings to an SSM Automation runbook.
Parameters:
  ContainmentRoleArn:
    Type: String
Resources:
  HighSevGuardDutyRule:
    Type: AWS::Events::Rule
    Properties:
      Name: guardduty-high-sev-containment
      EventPattern:
        source:
          - aws.guardduty
        detail-type:
          - GuardDuty Finding
        detail:
          severity:
            - numeric: ['>=', 7]
      Targets:
        - Id: ssm-containment
          Arn: !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:automation-definition/AWS-IsolateEC2Instance:$DEFAULT'
          RoleArn: !Ref ContainmentRoleArn

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
(best-practices) | n/a | n/a | n/a | IR-4(1); IR-4(7); SI-4(7) | A.5.26 | CLD.12.4.5

Log signals

  • CloudTrail events:DeleteRule targeting the canonical EventBridge rule that fans GuardDuty findings into the auto-containment Lambda — destroys the auto-response pipeline at the routing layer, leaving findings flowing to the Security Hub queue but not triggering any automated action.
  • CloudTrail events:DisableRule on the same rule — leaves the rule visible but inert; the absence-of-fire signal is harder to spot than a deletion because the rule still appears in the EventBridge console.
  • Lambda function-error CloudWatch metric on the auto-containment function exceeding zero over a rolling 5-minute window — the function may be receiving events but failing to execute its containment actions, producing a silent failure mode that the rule-disable detection misses.

Query

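# Lambda event names carry an API-version suffix in CloudTrail (for example DeleteFunction20150331), so they are matched by prefix.
# RemoveTargets records the rule name under requestParameters.rule rather than requestParameters.name.
fields @timestamp, eventName, requestParameters.name, requestParameters.rule, requestParameters.eventBusName, requestParameters.functionName, userIdentity.arn
| filter eventSource in ["events.amazonaws.com","lambda.amazonaws.com"]
| filter eventName in ["DeleteRule","DisableRule","RemoveTargets"] or eventName like /^(DeleteFunction|UpdateFunctionConfiguration)/
| filter requestParameters.name like /guardduty-/ or requestParameters.rule like /guardduty-/ or requestParameters.functionName like /^auto-containment-/
| sort @timestamp desc
| limit 100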

The CloudWatch Logs Insights query covers both EventBridge-side and Lambda-side mutations; for the silent-failure case, pair with a CloudWatch alarm on the Lambda's Errors metric with a threshold at zero so any error fires within one evaluation period.

Alert threshold

  • Any DeleteRule on the canonical GuardDuty fan-out rule — page immediately; the auto-response capability vanishes at the moment of the delete and every subsequent finding lacks the documented containment hook.
  • DisableRule on the same — page within 5 minutes; the rule is in the inert state and re-enable is a one-click action that the operator should perform after confirming the intent of the disable.
  • Lambda errors above zero for two consecutive 5-minute periods — high-priority ticket; the function is receiving events but its containment actions are failing, producing a partial-coverage state that may be worse than the all-or-nothing disable cases.

Initial response

  1. Restore the EventBridge rule from IaC with aws events put-rule and re-attach the Lambda target via put-targets; verify the chain end-to-end by injecting a synthetic GuardDuty finding via aws guardduty create-sample-findings and confirming the Lambda fires.
  2. For Lambda-error cases, retrieve the recent CloudWatch Logs entries from the function's log group and identify the root cause (commonly an IAM policy drift on the function's role, a missing target-resource ARN in the containment logic, or a downstream API throttle); fix and redeploy the function from IaC.
  3. Pull the GuardDuty findings backlog (aws guardduty list-findings) for the gap window and manually trigger the containment actions for each finding that the pipeline missed; document the gap-of-coverage as a finding for the next IR post-mortem per general/ir.html.
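
For step 3, the gap-window backlog can be pulled with a finding-criteria filter rather than by listing everything. The helper below is a sketch that only builds the criteria dictionary; it assumes the GuardDuty ListFindings criterion fields severity and updatedAt (the latter in Unix epoch milliseconds) and leaves the actual API call to the caller. The function name is hypothetical.

```python
from datetime import datetime, timezone


def gap_window_criteria(gap_start: datetime, gap_end: datetime, min_severity: int = 7) -> dict:
    """Build a GuardDuty FindingCriteria dict for findings updated inside the gap window."""
    if gap_start.tzinfo is None or gap_end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time ambiguity")
    if gap_end <= gap_start:
        raise ValueError("gap_end must be after gap_start")

    def to_ms(moment: datetime) -> int:
        # GuardDuty expects timestamps as Unix epoch milliseconds.
        return int(moment.timestamp() * 1000)

    return {
        "Criterion": {
            "severity": {"GreaterThanOrEqual": min_severity},
            "updatedAt": {
                "GreaterThanOrEqual": to_ms(gap_start),
                "LessThanOrEqual": to_ms(gap_end),
            },
        }
    }


if __name__ == "__main__":
    start = datetime(2026, 5, 1, 2, 0, tzinfo=timezone.utc)
    end = datetime(2026, 5, 1, 6, 0, tzinfo=timezone.utc)
    print(gap_window_criteria(start, end))
```

With boto3 the result would typically be passed as the FindingCriteria argument of list_findings together with the detector ID, and the returned finding IDs then fed to get_findings before replaying containment.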

References

Pair-control: aws-log-04-guardduty-org (detective half). Equivalent on: Azure · GCP · OCI

aws-ir-03-evidence-preservation ! CRITICAL RESPONSIVE

Stand up a dedicated forensic AWS account (separate Organization OU, no shared roles with workload accounts) that owns an S3 evidence bucket configured with Object Lock in Compliance mode and a default retention of at least one year — preferably seven years to align with the CloudTrail Lake retention pattern documented in aws-ir-04. Object Lock Compliance mode is write-once-read-many at the API level: not even the root user of the account can delete or shorten the retention of an object during its retention window (Amazon S3 Object Lock overview (accessed 2026-05)). When an incident is declared, cross-account replication is enabled (or, if pre-configured, simply unpaused) from the CloudTrail log bucket, the VPC Flow Logs bucket, and any GuardDuty / Macie / Security Hub findings export bucket into the forensic account's evidence bucket. The combination — separate account, separate trust, Object Lock Compliance — defeats the credential-compromise-leads-to-log-deletion attacker chain that any same-account log store is vulnerable to.

The principle is documented in General IR — forensics & evidence preservation and codified in the AWS Customer Playbook Framework (accessed 2026-05) evidence-collection playbook. Compliance mode is preferred over Governance mode for evidence: Governance allows users with the s3:BypassGovernanceRetention permission to override the lock, which directly contradicts the threat model — the very privileges an attacker is most likely to acquire are the ones that would let them disable Governance. Compliance mode has no such bypass.
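
The Compliance-versus-Governance requirement is easy to verify programmatically. The sketch below checks a response shaped like the output of the S3 GetObjectLockConfiguration API against the posture described above (Compliance mode, at least one year of default retention). The function names and the 365-day floor are assumptions for illustration; approximating a year as 365 days is a simplification.

```python
from typing import List


def retention_days(default_retention: dict) -> int:
    """Convert a DefaultRetention block (Days or Years) to an approximate day count."""
    if "Years" in default_retention:
        return default_retention["Years"] * 365
    return default_retention.get("Days", 0)


def evidence_lock_problems(response: dict, min_days: int = 365) -> List[str]:
    """Return a list of posture problems; an empty list means the bucket passes."""
    config = response.get("ObjectLockConfiguration", {})
    retention = config.get("Rule", {}).get("DefaultRetention", {})
    problems = []
    if config.get("ObjectLockEnabled") != "Enabled":
        problems.append("Object Lock is not enabled")
    if retention.get("Mode") != "COMPLIANCE":
        problems.append("default retention mode is not COMPLIANCE")
    if retention_days(retention) < min_days:
        problems.append(f"default retention is shorter than {min_days} days")
    return problems


if __name__ == "__main__":
    sample = {
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}},
        }
    }
    for problem in evidence_lock_problems(sample):
        print(problem)
```

A check like this fits a scheduled drift job in the forensic account; it reads configuration only and never needs write access to the evidence bucket.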

Remediation — AWS CLI

# Create the evidence bucket with Object Lock enabled at create time.
# Enabling at creation is the simplest path; an existing versioned bucket can also
# have Object Lock enabled later via put-object-lock-configuration.
aws s3api create-bucket \
  --bucket ir-evidence-prod-eu-west-1 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1 \
  --object-lock-enabled-for-bucket

# Enable versioning (required for Object Lock).
aws s3api put-bucket-versioning \
  --bucket ir-evidence-prod-eu-west-1 \
  --versioning-configuration Status=Enabled

# Apply Object Lock Compliance mode with 1-year default retention.
aws s3api put-object-lock-configuration \
  --bucket ir-evidence-prod-eu-west-1 \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": { "Mode": "COMPLIANCE", "Years": 1 }
    }
  }'

# Cross-account replication: CloudTrail log bucket (workload account) replicates
# into the evidence bucket (forensic account). Pre-configured but pausable so
# replication can be triggered explicitly at incident-declaration time.
aws s3api put-bucket-replication \
  --bucket workload-cloudtrail-logs \
  --replication-configuration file://replication.json

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS Customer Playbook Framework — Evidence collection (accessed 2026-05)
# Evidence bucket in the forensic account with Object Lock Compliance mode.
resource "aws_s3_bucket" "evidence" {
  bucket              = "ir-evidence-prod-eu-west-1"
  object_lock_enabled = true
  tags                = { Purpose = "ir-evidence", Account = "forensic" }
}

resource "aws_s3_bucket_versioning" "evidence" {
  bucket = aws_s3_bucket.evidence.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_object_lock_configuration" "evidence" {
  bucket = aws_s3_bucket.evidence.id

  rule {
    default_retention {
      mode  = "COMPLIANCE"
      years = 1
    }
  }
}

# Block all public access on the evidence bucket (defence in depth).
resource "aws_s3_bucket_public_access_block" "evidence" {
  bucket                  = aws_s3_bucket.evidence.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Cross-account replication from workload CloudTrail bucket into evidence.
resource "aws_s3_bucket_replication_configuration" "ct_to_evidence" {
  provider = aws.workload
  bucket   = aws_s3_bucket.workload_cloudtrail.id
  role     = aws_iam_role.replication.arn

  rule {
    id     = "ct-to-evidence"
    status = "Enabled"
    filter {}

    destination {
      bucket        = aws_s3_bucket.evidence.arn
      storage_class = "STANDARD_IA"
      account       = local.forensic_account_id

      access_control_translation { owner = "Destination" }
    }
    delete_marker_replication { status = "Disabled" }
  }
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: WORM-mode evidence S3 bucket with Object Lock (compliance), KMS-CMK encryption, and bucket-level Block Public Access.
Parameters:
  EvidenceBucketName:
    Type: String
  EvidenceKmsKeyArn:
    Type: String
Resources:
  EvidenceBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref EvidenceBucketName
      ObjectLockEnabled: true
      ObjectLockConfiguration:
        ObjectLockEnabled: Enabled
        Rule:
          DefaultRetention:
            Mode: COMPLIANCE
            Days: 2555
      VersioningConfiguration:
        Status: Enabled
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - BucketKeyEnabled: true
            ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: !Ref EvidenceKmsKeyArn
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true

Remediation — AWS CDK (TypeScript)

import * as cdk from 'aws-cdk-lib';
import { aws_s3 as s3, aws_kms as kms } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export interface EvidenceBucketProps extends cdk.StackProps {
  evidenceBucketName: string;
  evidenceKmsKeyArn: string;
}

export class EvidenceBucketStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: EvidenceBucketProps) {
    super(scope, id, props);

    const key = kms.Key.fromKeyArn(this, 'EvidenceKey', props.evidenceKmsKeyArn);

    new s3.Bucket(this, 'EvidenceBucket', {
      bucketName: props.evidenceBucketName,
      objectLockEnabled: true,
      objectLockDefaultRetention: s3.ObjectLockRetention.compliance(cdk.Duration.days(2555)),
      versioned: true,
      encryption: s3.BucketEncryption.KMS,
      encryptionKey: key,
      bucketKeyEnabled: true,
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
    });
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
(best-practices) | n/a | n/a | n/a | AU-11; IR-4(7); SI-7 | A.5.28; A.8.13 | CLD.12.4.5

Log signals

  • CloudTrail s3:PutBucketLifecycleConfiguration events on the evidence-preservation S3 bucket where the new lifecycle rule introduces a Transition to Glacier or an Expiration at an interval shorter than the org's documented retention floor (set by the org's legal and regulatory obligations; this page uses seven years as its working figure).
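  • CloudTrail s3:DeleteObject or s3:DeleteObjectVersion data events within the evidence bucket. The bucket should be Object-Lock-enabled in compliance mode, so a successful permanent delete of a specific version (rather than a denied one) indicates either that the lock has been compromised or that the operator is targeting non-locked objects. A DeleteObject call without a version ID still succeeds under Object Lock because it only adds a delete marker and leaves the locked version intact.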
  • CloudTrail s3:PutObjectLegalHold with legalHold.status=OFF on objects previously held: this removes the manual hold that supplements the Object Lock retention and is frequently used to ready objects for deletion outside the retention window.

Query

fields @timestamp, eventName, requestParameters.bucketName, requestParameters.key, requestParameters.lifecycleConfiguration, requestParameters.legalHold, userIdentity.arn
| filter eventSource = "s3.amazonaws.com" and eventName in ["PutBucketLifecycleConfiguration","DeleteObject","DeleteObjectVersion","PutObjectLegalHold","PutObjectRetention"]
| filter requestParameters.bucketName like /-evidence-/ or requestParameters.bucketName like /-forensic-/
| sort @timestamp desc
| limit 100

The CloudWatch Logs Insights query is bucket-scoped via name-pattern matching; maintain the evidence-bucket name pattern as a managed lookup so the filter does not drift when a new bucket joins the evidence-preservation set.

Alert threshold

  • Any lifecycle-rule change on an evidence bucket — page immediately; the rule shape is a compliance control and any change must trace to a documented retention-policy decision approved by Legal / Compliance.
  • Successful object-delete on an evidence bucket — page; the Object Lock should have prevented the delete and a successful one indicates either lock-mode regression or pre-existing non-locked objects that should not have been in the bucket to begin with.
  • Legal-hold removal — page; the hold is the legal-team's signal that the object must remain available and removal outside a Legal-approved release is a deliberate evidence-tampering attempt.

Initial response

  1. For lifecycle-rule changes, restore from IaC with aws s3api put-bucket-lifecycle-configuration --bucket {name} --lifecycle-configuration file://canonical-lifecycle.json; verify via get-bucket-lifecycle-configuration.
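  2. For successful object-deletes, list the key's history with aws s3api list-object-versions --bucket {name} --prefix {key}. If the delete only added a delete marker, remove the marker with aws s3api delete-object --bucket {name} --key {key} --version-id {delete-marker-version-id} so the previous version becomes current again. If the object version itself was permanently deleted (DeleteObjectVersion), or the bucket lacked versioning at the time of delete, the object is lost and the incident escalates to confirmed evidence destruction.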
  3. Open an incident via general/ir.html and engage Legal / Compliance immediately if any successful evidence-delete is confirmed; the org's regulatory disclosure obligations may require notification of the affected investigation's data subjects.
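For the object-delete recovery step, the triage of a version listing can be sketched as below. This is a minimal illustration that assumes the input dict has the shape of an S3 list-object-versions response (Versions and DeleteMarkers lists); both function names are hypothetical helpers.

```python
def delete_markers_to_remove(listing: dict) -> list[dict]:
    """Current delete markers; removing each one re-exposes the prior version."""
    return [
        {"Key": marker["Key"], "VersionId": marker["VersionId"]}
        for marker in listing.get("DeleteMarkers", [])
        if marker.get("IsLatest")
    ]


def permanently_lost_keys(listing: dict) -> list[str]:
    """Keys that carry a delete marker but have no surviving object version."""
    surviving = {version["Key"] for version in listing.get("Versions", [])}
    marked = {marker["Key"] for marker in listing.get("DeleteMarkers", [])}
    return sorted(marked - surviving)
```

Each pair returned by delete_markers_to_remove is a candidate for aws s3api delete-object with --version-id; any key returned by permanently_lost_keys belongs in the evidence-destruction escalation.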

References

Equivalent on: Azure · GCP · OCI

aws-ir-04-cloudtrail-lake-forensics ! HIGH RESPONSIVE

Stand up a CloudTrail Lake event data store at Organization scope with a seven-year (2557-day) retention period and pre-write a SQL query library that answers the most common forensic questions: which principal touched resource X between time A and time B, what API calls originated from suspicious source-IP S, when was KMS key K last disabled and by whom, which IAM role had its trust policy modified during the incident window. CloudTrail Lake stores events in the columnar Apache ORC format and lets responders run SQL SELECT queries against management events, S3 / Lambda data events, AWS Config configuration items, and Audit Manager evidence; query time grows with the time range scanned, so bound every saved query by eventTime to keep it usable on multi-billion-row datasets (AWS CloudTrail Lake documentation (accessed 2026-05)).

The base CloudTrail org-trail control (aws-log-08-cloudtrail-lake) is the detective half: it ensures the Lake store exists and is ingesting. This control is the responsive half: it makes the store usable under IR time pressure by pre-writing and version-controlling the saved-query library so a responder does not have to write SQL from scratch at 03:00 on a Saturday. The two-control split is deliberate — the AWS Logging page owns "the Lake is collecting"; this page owns "the Lake is queryable under stress".
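A version-controlled saved-query library can be as simple as named templates plus strict parameter validation. The sketch below is illustrative only: the template name, the regexes, and the render helper are assumptions, the timestamp literal format (YYYY-MM-DD HH:MM:SS, UTC) follows the style used in AWS sample queries, and the sessionissuer field path is an assumption about the Lake schema for role sessions that should be checked against the store before use.

```python
import re
from datetime import datetime, timezone
from string import Template

# Hypothetical library entry; real entries would live as .sql files in the repo.
SAVED_QUERIES = {
    "principal_activity": Template(
        "SELECT eventTime, eventName, sourceIPAddress, requestParameters "
        "FROM $eds_id "
        "WHERE userIdentity.sessioncontext.sessionissuer.arn = '$principal_arn' "
        "AND eventTime BETWEEN '$start' AND '$end' "
        "ORDER BY eventTime ASC"
    ),
}

_EDS_ID = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")
_ARN = re.compile(r"^arn:aws[a-z-]*:(iam|sts)::\d{12}:[A-Za-z0-9+=,.@_/:-]+$")


def _utc_literal(value: datetime) -> str:
    """Format an aware datetime as a UTC timestamp literal."""
    if value.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def render(name: str, eds_id: str, principal_arn: str,
           start: datetime, end: datetime) -> str:
    """Render a saved query, rejecting inputs that could alter the SQL."""
    if not _EDS_ID.match(eds_id):
        raise ValueError("event data store ID must be a UUID")
    if not _ARN.match(principal_arn):
        raise ValueError("principal ARN has an unexpected shape")
    if start >= end:
        raise ValueError("start must precede end")
    return SAVED_QUERIES[name].substitute(
        eds_id=eds_id, principal_arn=principal_arn,
        start=_utc_literal(start), end=_utc_literal(end),
    )
```

The rendered string would be passed to aws cloudtrail start-query --query-statement; validating the ARN and store ID up front keeps a mistyped or hostile value from changing the query under time pressure.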

Remediation — AWS CLI

# Create the Organization-wide CloudTrail Lake event data store.
# Retention 2557 days = 7 years; multi-region; org-enabled; termination protection on.
aws cloudtrail create-event-data-store \
  --name org-forensic-edst \
  --multi-region-enabled \
  --organization-enabled \
  --retention-period 2557 \
  --termination-protection-enabled \
  --advanced-event-selectors '[
    { "Name":"All management events",
      "FieldSelectors":[{ "Field":"eventCategory","Equals":["Management"] }] }
  ]'

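# Run a saved forensic query: principal activity by IAM role ARN within incident window.
# start-query is an "aws cloudtrail" command; the FROM clause takes the event data
# store ID (the UUID at the end of the store's ARN), looked up here by name.
# For role sessions, userIdentity.arn holds the sts assumed-role ARN, so the role
# itself is matched through the session issuer. Timestamps are UTC.
EDS_ID=$(aws cloudtrail list-event-data-stores \
  --query "EventDataStores[?Name=='org-forensic-edst'].EventDataStoreArn | [0]" \
  --output text | awk -F/ '{print $NF}')
aws cloudtrail start-query \
  --query-statement "SELECT eventTime, eventName, sourceIPAddress, requestParameters
                       FROM $EDS_ID
                      WHERE userIdentity.sessioncontext.sessionissuer.arn = 'arn:aws:iam::111111111111:role/compromised'
                        AND eventTime BETWEEN '2026-05-23 03:00:00' AND '2026-05-23 05:00:00'
                      ORDER BY eventTime ASC"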

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS CloudTrail Lake docs (accessed 2026-05)
resource "aws_cloudtrail_event_data_store" "forensic" {
  name                           = "org-forensic-edst"
  retention_period               = 2557
  multi_region_enabled           = true
  organization_enabled           = true
  termination_protection_enabled = true

  advanced_event_selector {
    name = "All management events"
    field_selector {
      field  = "eventCategory"
      equals = ["Management"]
    }
  }
}

# Saved-query library: principal activity by IAM role ARN within a time window.
# No resource block is declared here: to our knowledge the AWS provider (5.x)
# has no resource type for CloudTrail Lake saved queries, and referencing a
# non-existent type fails validation even with count = 0. Keep each query as a
# .sql file in the repo so the library is reviewable and version-controlled,
# and run it from the runbook via aws cloudtrail start-query.

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudTrail Lake event data store retaining org events for 7 years (forensic timeline).
Resources:
  ForensicEventDataStore:
    Type: AWS::CloudTrail::EventDataStore
    Properties:
      Name: forensic-eds
      MultiRegionEnabled: true
      OrganizationEnabled: true
      RetentionPeriod: 2557
      TerminationProtectionEnabled: true
      AdvancedEventSelectors:
        - Name: All management events
          FieldSelectors:
            - Field: eventCategory
              Equals: [Management]

Compliance mapping

  • CIS AWS Foundations v3.0.0: n/a (post-v3.0.0)
  • CIS Microsoft Azure Foundations v3.0.0: n/a
  • CIS GCP Foundation v4.0.0: n/a
  • CIS OCI Foundation v2.0.0: n/a
  • NIST SP 800-53 rev5: AU-11; IR-4(7)
  • ISO/IEC 27001:2022: A.5.28
  • ISO/IEC 27017:2015: CLD.12.4.5

Log signals

  • CloudTrail cloudtrail:StartQuery events whose queryStatement SQL touches sensitive tables (user-management, KMS-key-policy mutations, IAM-role assumptions) issued by a principal outside the SOC / IR allow-list — the Lake query is the forensic discovery mechanism and any out-of-band querier indicates either misconfigured RBAC or an attacker reconnaissance phase.
  • CloudTrail cloudtrail:UpdateEventDataStore shortening retentionPeriod — truncates the forensic window from years to days and is irreversible once committed.
  • Query-volume spike where the count of StartQuery calls in a 24-hour window exceeds the trailing-90-day p99 by more than 3× — passive signal that the Lake is being used at a much higher rate than usual, often the trace of an active investigation but worth confirming against an open incident ticket.
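The query-volume signal is a plain percentile comparison. A minimal sketch, assuming daily StartQuery counts for the trailing 90 days are already available as a list and using the nearest-rank definition of p99 (other percentile definitions would give slightly different thresholds); both function names are illustrative.

```python
import math


def p99(values: list[int]) -> int:
    """Nearest-rank 99th percentile of a non-empty list."""
    if not values:
        raise ValueError("need at least one daily count")
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def is_query_spike(today_count: int, trailing_daily_counts: list[int],
                   multiplier: float = 3.0) -> bool:
    """True when today's count exceeds multiplier x the trailing p99."""
    return today_count > multiplier * p99(trailing_daily_counts)
```

With 90 daily samples the nearest-rank p99 is the maximum of the window, so in practice the rule reads "more than three times the busiest day in the last quarter".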

Query

fields @timestamp, eventName, requestParameters.eventDataStore, requestParameters.queryStatement, requestParameters.retentionPeriod, userIdentity.arn, sourceIPAddress
          | filter eventSource = "cloudtrail.amazonaws.com" and eventName in ["StartQuery","UpdateEventDataStore","DeleteEventDataStore"]
          | filter eventName != "StartQuery" or not (userIdentity.arn like /:assumed-role\/SOCAnalyst\// or userIdentity.arn like /:assumed-role\/IRResponder\//)
          | sort @timestamp desc
          | limit 100

The CloudWatch Logs Insights query's negative-filter on the SOC / IR role-arn pattern surfaces out-of-band queriers; tune the role-arn pattern against the org's actual analyst-role naming convention rather than treating the example as canonical.

Alert threshold

  • Any StartQuery from a principal outside the SOC / IR allow-list — high-priority ticket within 15 minutes; route to the principal's manager with the query text attached so business justification is captured at query time rather than reconstructed later.
  • UpdateEventDataStore reducing retentionPeriod below the org's compliance floor — page immediately; the truncation is irreversible from the moment it commits and the data lost from the forensic window cannot be recovered from any other source.
  • Query-volume spike above p99 — informational; promote to incident if the queries target sensitive tables and there is no concurrent open IR ticket explaining the activity.
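The first two thresholds above can be expressed as a small routing function over a single CloudTrail record. This is a sketch under stated assumptions: the role names come from the example query and are not canonical, the 2557-day floor is assumed, and the return labels are hypothetical routing keys rather than any AWS construct.

```python
import re

# Hypothetical allow-list; role sessions appear as sts assumed-role ARNs.
ALLOWED_SESSION = re.compile(r":assumed-role/(SOCAnalyst|IRResponder)/")
RETENTION_FLOOR_DAYS = 2557  # assumed compliance floor (7 years)


def triage_lake_event(event: dict) -> str:
    """Map one CloudTrail record to 'page', 'ticket-15min', or 'none'."""
    name = event.get("eventName")
    params = event.get("requestParameters") or {}
    arn = (event.get("userIdentity") or {}).get("arn", "")
    if name == "UpdateEventDataStore":
        retention = params.get("retentionPeriod")
        if retention is not None and int(retention) < RETENTION_FLOOR_DAYS:
            return "page"  # retention cut below the floor
    if name == "StartQuery" and not ALLOWED_SESSION.search(arn):
        return "ticket-15min"  # out-of-band querier
    return "none"
```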

Initial response

  1. For unauthorized StartQuery, immediately disable the issuing principal's role with aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSDenyAll --role-name {role}; preserve the query text and any returned data via the Lake's query-result S3 export for forensic chain-of-custody.
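  2. For retention shortening, immediately revert via aws cloudtrail update-event-data-store --event-data-store {arn} --retention-period {prior-value}; events already aged out by the shorter period are unrecoverable, but restoring the prior value stops further loss, and the alarm-triggered reset pre-empts any repeat shortening.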
  3. Open an incident per general/ir.html; correlate the query text against the most-recent CloudTrail events the queried filter targeted — the principal is usually trying to scope out what was logged about their own recent activity or to gather intel for the next phase of an attack chain.

References

Pair-control: aws-log-08-cloudtrail-lake (detective half). Equivalent on: Azure · GCP · OCI

aws-ir-05-isolation-playbook-ec2 ! HIGH RESPONSIVE

Document and version-control a five-step EC2 isolation runbook that responders execute when an instance is suspected of compromise but the team needs to preserve in-memory and on-disk state for later investigation rather than terminate-and-replace. The runbook is the human-driven complement to the EventBridge automation in aws-ir-02 — used when the finding is below the auto-quarantine severity threshold, when responders want to take a slower deliberate path, or when the automation path failed and a manual fallback is needed (AWS Incident Response Playbooks repository (accessed 2026-05)).

The five steps in order are: (1) aws ec2 create-snapshot on every attached EBS volume, tagged with the incident ID, before any other action — snapshots are point-in-time and a wrongly-sequenced step 2 can mutate disk state; (2) aws ec2 disassociate-iam-instance-profile to cut off the instance's IAM role and the blast radius it grants (credentials the instance has already fetched stay valid until they expire, so also revoke the role's active sessions when credential theft is suspected); (3) aws ec2 modify-instance-attribute --groups <deny-all-sg> to swap the instance's security groups for the quarantine SG from aws-ir-02; (4) tag the instance with the incident ID and an ir-status=isolated tag for inventory tracking; (5) export the relevant VPC Flow Logs and CloudTrail entries for the instance's ENI and IAM role into the evidence bucket (aws-ir-03). Steps 1–3 are not commutative; document and enforce the order in the runbook.
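One way to enforce the ordering in code is a tiny executor that accepts the five actions as callables and always runs them in the documented sequence, halting at the first failure. The step names and the run_isolation helper are illustrative assumptions; the callables would wrap the real AWS calls.

```python
from typing import Callable

# Fixed order from the runbook; the labels themselves are hypothetical.
STEP_ORDER = ("snapshot", "detach_profile", "swap_sg", "tag", "export_logs")


def run_isolation(actions: dict[str, Callable[[], None]]) -> list[str]:
    """Run every step in STEP_ORDER; stop and report on the first failure."""
    missing = [step for step in STEP_ORDER if step not in actions]
    if missing:
        raise ValueError(f"missing runbook steps: {missing}")
    completed: list[str] = []
    for step in STEP_ORDER:
        try:
            actions[step]()
        except Exception as exc:
            # Later steps are skipped so a failed snapshot never precedes an SG swap.
            raise RuntimeError(
                f"step '{step}' failed; completed so far: {completed}"
            ) from exc
        completed.append(step)
    return completed
```

Because the order lives in STEP_ORDER rather than in the caller, a responder cannot accidentally swap the security group before the snapshots exist.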

Remediation — AWS CLI

# Five-step isolation runbook. Execute in order; do not re-order.
INSTANCE_ID=i-0abc123def4567890
INCIDENT_ID=ir-2026-05-23-001

# (1) Snapshot every attached EBS volume BEFORE touching the instance.
for vol in $(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
              --query 'Reservations[].Instances[].BlockDeviceMappings[].Ebs.VolumeId' \
              --output text); do
  aws ec2 create-snapshot --volume-id "$vol" \
    --description "IR snapshot for $INCIDENT_ID" \
    --tag-specifications "ResourceType=snapshot,Tags=[{Key=IncidentId,Value=$INCIDENT_ID}]"
done

# (2) Detach the IAM instance profile (skipped when the instance has none).
ASSOC=$(aws ec2 describe-iam-instance-profile-associations \
          --filters "Name=instance-id,Values=$INSTANCE_ID" \
          --query 'IamInstanceProfileAssociations[0].AssociationId' --output text)
if [ -n "$ASSOC" ] && [ "$ASSOC" != "None" ]; then
  aws ec2 disassociate-iam-instance-profile --association-id "$ASSOC"
fi

# (3) Swap security groups to deny-all quarantine SG.
aws ec2 modify-instance-attribute \
  --instance-id "$INSTANCE_ID" \
  --groups sg-quarantine0000000

# (4) Tag for inventory tracking.
aws ec2 create-tags --resources "$INSTANCE_ID" \
  --tags "Key=IncidentId,Value=$INCIDENT_ID" "Key=ir-status,Value=isolated"

# (5) Export VPC Flow Logs and CloudTrail entries for this instance into the evidence bucket.
# (Step 5 is environment-specific; see the IR runbooks repo for the canonical script.)

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS IR Playbooks repository (accessed 2026-05)
# Quarantine SG referenced by the isolation runbook (same SG used by aws-ir-02).
resource "aws_security_group" "quarantine_isolation" {
  name        = "ir-quarantine-isolation"
  description = "Deny-all SG used by manual EC2 isolation runbook"
  vpc_id      = aws_vpc.workload.id
  tags        = { Purpose = "ir-quarantine" }
}

# Lambda skeleton invokable by responders that performs the five-step runbook
# in a fixed order (snapshot, detach role, swap SG, tag, export logs). The run
# is not atomic, so the handler should stop at the first failed step. Codifying
# the runbook in Lambda removes the human-order-error risk from steps 1-3.
resource "aws_lambda_function" "ec2_isolation_runbook" {
  function_name = "ir-ec2-isolation-runbook"
  role          = aws_iam_role.ec2_isolation_runbook.arn
  handler       = "index.handler"
  runtime       = "python3.12"
  filename      = "build/ec2-isolation-runbook.zip"
  timeout       = 300

  environment {
    variables = {
      QUARANTINE_SG_ID = aws_security_group.quarantine_isolation.id
      EVIDENCE_BUCKET  = "ir-evidence-prod-eu-west-1"
    }
  }
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Quarantine security group with zero ingress/egress, used by the EC2 isolation playbook.
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
Resources:
  QuarantineSg:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: ir-quarantine-sg
      GroupDescription: Detaches instance from network during IR isolation; zero ingress/egress.
      VpcId: !Ref VpcId
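      # CloudFormation adds an allow-all egress rule when SecurityGroupEgress is
      # omitted. This loopback-only placeholder replaces that default, so no
      # useful outbound traffic is permitted.
      SecurityGroupEgress:
        - IpProtocol: icmp
          FromPort: -1
          ToPort: -1
          CidrIp: 127.0.0.1/32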

Compliance mapping

  • CIS AWS Foundations v3.0.0: (best-practices)
  • CIS Microsoft Azure Foundations v3.0.0: n/a
  • CIS GCP Foundation v4.0.0: n/a
  • CIS OCI Foundation v2.0.0: n/a
  • NIST SP 800-53 rev5: IR-4(2); IR-4(7)
  • ISO/IEC 27001:2022: A.5.26
  • ISO/IEC 27017:2015: CLD.9.5.1

Log signals

  • CloudTrail ec2:ModifyInstanceAttribute events that swap an instance's security-group set to the documented containment SG (commonly named sg-quarantine) — direct evidence that the IR playbook has executed against a real instance; the alert here is positive-of-execution rather than negative-of-failure.
  • CloudTrail ec2:CreateSnapshot events on the instance's EBS volumes whose requestParameters.description contains the documented forensic-evidence tag (e.g. incident=INC-{ticket}) — surfaces the evidence-preservation half of the playbook.
  • Absence-of-fire signal: a GuardDuty UnauthorizedAccess or Backdoor finding at HIGH severity on an instance without a corresponding containment SG swap within 15 minutes — indicates the IR auto-response chain (aws-ir-02-eventbridge-auto-containment) failed to fire end-to-end for this incident.

Query

fields @timestamp, eventName, requestParameters.instanceId, requestParameters.groups, requestParameters.description, userIdentity.arn
          | filter eventSource = "ec2.amazonaws.com" and eventName in ["ModifyInstanceAttribute","CreateSnapshot","DetachVolume"]
          | filter requestParameters.groups like /sg-quarantine/ or requestParameters.description like /^incident=INC-/
          | sort @timestamp desc
          | limit 100

The CloudWatch Logs Insights query surfaces playbook-execution events; pair with the negative-signal alarm (GuardDuty HIGH finding without playbook-execution within 15 minutes) implemented as an EventBridge rule that triggers a Lambda checking aws-ir-02 pipeline health.
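The negative-signal check reduces to a join between findings and containment events. A minimal sketch, assuming both have already been fetched and reduced to (instance ID, timestamp) pairs; the function name and input shape are hypothetical, and the Lambda wiring is omitted.

```python
from datetime import datetime, timedelta


def uncontained_findings(
    findings: list[tuple[str, datetime]],
    containments: list[tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(minutes=15),
) -> list[str]:
    """Instance IDs whose HIGH finding has no containment event inside the window."""
    gaps: list[str] = []
    for instance_id, found_at in findings:
        deadline = found_at + window
        contained = any(
            other_id == instance_id and found_at <= event_time <= deadline
            for other_id, event_time in containments
        )
        # Only report once the window has fully elapsed.
        if not contained and now > deadline:
            gaps.append(instance_id)
    return gaps
```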

Alert threshold

  • Playbook executed against an instance — informational ack to the IR ticket; the playbook firing is positive-of-execution and should be recorded but not paged.
  • GuardDuty HIGH finding without playbook-execution within 15 minutes — page; the auto-response chain has a coverage gap and the IR on-call must execute the playbook manually while the chain is debugged.
  • Containment SG modified outside the playbook context (e.g. its rules are widened) — page immediately; an attacker frequently targets the quarantine SG to re-establish reachability to instances they were previously isolated from.

Initial response

  1. For executed-playbook events, attach the CloudTrail event-ID and the snapshot-ID to the IR ticket as forensic chain-of-custody; the snapshot is the canonical evidence object for the instance's runtime state at containment time.
  2. For coverage-gap pages, manually execute the playbook in runbook order: aws ec2 create-snapshot --volume-id {vol} --description "incident=INC-{ticket}" for every attached volume first, then aws ec2 disassociate-iam-instance-profile --association-id {assoc}, then aws ec2 modify-instance-attribute --instance-id {id} --groups {quarantine-sg-id}; the playbook script in the IaC repository captures the exact sequence.
  3. For a tampered containment SG, immediately restore the SG rules from IaC and audit CloudTrail for any other rule modifications by the same principal; open an incident per general/ir.html with the tampering itself as the headline event, since it indicates an active adversary attempting to undo containment.

References

Equivalent on: Azure · GCP · OCI

aws-ir-06-credential-rotation-playbook ! HIGH RESPONSIVE

Document and version-control a four-step credential rotation runbook covering the canonical compromised-credential scenarios: an AWS access key landed in a public GitHub repo, a developer's laptop was lost or compromised, a third-party SaaS that holds AWS credentials was breached, or a SAML/OIDC token from the federated IdP is known or suspected to have been stolen. The runbook integrates the detective signals from CloudTrail Lake (aws-ir-04) with the responsive actions of revocation and dependent-secret rotation (AWS Security Incident Response Guide — Detection and Analysis (accessed 2026-05)).

The four steps in order: (1) Identify — call aws iam get-access-key-last-used on the suspect key (or aws iam list-access-keys --user-name {user} on the suspect user) and capture the last-used region and service for triage; (2) Revoke — the safest first move is aws iam update-access-key --status Inactive (reversible if the call turns out to be wrong) followed by aws iam delete-access-key once the team confirms; for stolen SAML/OIDC-federated sessions, attach an inline deny policy conditioned on aws:TokenIssueTime to the assumed role (the mechanism behind the IAM console's Revoke active sessions action) and end the user's sign-in session at the IdP or in IAM Identity Center; (3) Enumerate use — run the pre-saved CloudTrail Lake query that lists every API call made by the credential, scoped to the incident window (widen to the credential's creation date if the compromise time is unknown); (4) Rotate dependent secrets — anything the compromised principal could read from Secrets Manager, Parameter Store, or KMS-encrypted S3 objects must be rotated, not just revoked, because the attacker may already have exfiltrated it.
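For federated or role sessions in step (2), a common revocation approach is an inline deny policy keyed on aws:TokenIssueTime, which rejects every session issued before a cutoff. The sketch below only builds the policy document; the helper name is an assumption, and attaching the document (for example with aws iam put-role-policy) is left to the runbook.

```python
import json
from datetime import datetime, timezone


def revoke_sessions_policy(cutoff: datetime) -> str:
    """Build a deny-all policy for sessions issued before `cutoff` (aware datetime)."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            # Sessions whose token was issued before the cutoff are denied.
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": stamp}},
        }],
    }
    return json.dumps(policy, indent=2)
```

Sessions obtained after the cutoff are unaffected, so legitimate users recover by re-authenticating while any stolen token stops working.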

Remediation — AWS CLI

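# Four-step credential rotation runbook.
SUSPECT_KEY=AKIAEXAMPLE123456789
# get-access-key-last-used resolves the owning user for any key in the account;
# list-access-keys without --user-name only covers the calling user.
USER_NAME=$(aws iam get-access-key-last-used --access-key-id "$SUSPECT_KEY" \
              --query 'UserName' --output text)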

# (1) Identify: last-used region and service for the suspect key.
aws iam get-access-key-last-used --access-key-id "$SUSPECT_KEY"

# (2) Revoke (reversible first): set Inactive.
aws iam update-access-key --user-name "$USER_NAME" \
  --access-key-id "$SUSPECT_KEY" --status Inactive

# (2b) Once confirmed compromised: delete outright.
aws iam delete-access-key --user-name "$USER_NAME" --access-key-id "$SUSPECT_KEY"

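# (3) Enumerate API calls made by the credential during the incident window.
# start-query is an "aws cloudtrail" command. EDS_ID is the event data store ID
# (the UUID at the end of the store's ARN). Timestamps are UTC.
EDS_ID=00000000-0000-0000-0000-000000000000   # placeholder; substitute the real ID
aws cloudtrail start-query \
  --query-statement "SELECT eventTime, eventName, eventSource, requestParameters
                       FROM $EDS_ID
                      WHERE userIdentity.accessKeyId = '$SUSPECT_KEY'
                        AND eventTime BETWEEN '2026-05-22 00:00:00' AND '2026-05-23 12:00:00'
                      ORDER BY eventTime ASC"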

# (4) Rotate every secret the principal could read.
aws secretsmanager rotate-secret --secret-id prod/db/master --rotate-immediately
aws secretsmanager rotate-secret --secret-id prod/stripe/api-key --rotate-immediately

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS Security Incident Response Guide (accessed 2026-05)
# Secrets Manager rotation lambdas pre-provisioned so step (4) is one API call
# per secret rather than a from-scratch rotation function authoring exercise.
resource "aws_secretsmanager_secret_rotation" "db_master" {
  secret_id           = aws_secretsmanager_secret.db_master.id
  rotation_lambda_arn = aws_lambda_function.rds_rotator.arn

  rotation_rules {
    automatically_after_days = 30
  }
}

# Metric filter on iam:CreateAccessKey or iam:UpdateAccessKey for every user in
# the trail; surfaces credential-mutation events for proactive review. Attach an
# aws_cloudwatch_metric_alarm to the IamAccessKeyMutation metric to be alerted.
resource "aws_cloudwatch_log_metric_filter" "iam_access_key_mutation" {
  name           = "IamAccessKeyMutation"
  log_group_name = "aws-cloudtrail-logs"
  pattern        = "{ ($.eventName = \"CreateAccessKey\") || ($.eventName = \"UpdateAccessKey\") }"

  metric_transformation {
    name      = "IamAccessKeyMutation"
    namespace = "Security"
    value     = "1"
  }
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Step Functions state machine orchestrating the credential-rotation playbook (deactivate → create replacement key). Expects execution input with UserName and AccessKeyId.
Parameters:
  PlaybookRoleArn:
    Type: String
Resources:
  CredentialRotationStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: ir-credential-rotation
      RoleArn: !Ref PlaybookRoleArn
      DefinitionString: !Sub |
        {
          "Comment": "IR credential-rotation playbook",
          "StartAt": "DeactivateAccessKey",
          "States": {
            "DeactivateAccessKey": {
              "Type": "Task",
              "Resource": "arn:aws:states:::aws-sdk:iam:updateAccessKey",
              "Parameters": {
                "UserName.$": "$.UserName",
                "AccessKeyId.$": "$.AccessKeyId",
                "Status": "Inactive"
              },
              "ResultPath": null,
              "Next": "CreateNewKey"
            },
            "CreateNewKey": {
              "Type": "Task",
              "Resource": "arn:aws:states:::aws-sdk:iam:createAccessKey",
              "Parameters": { "UserName.$": "$.UserName" },
              "End": true
            }
          }
        }

Compliance mapping

  • CIS AWS Foundations v3.0.0: (best-practices)
  • CIS Microsoft Azure Foundations v3.0.0: n/a
  • CIS GCP Foundation v4.0.0: n/a
  • CIS OCI Foundation v2.0.0: n/a
  • NIST SP 800-53 rev5: IR-4; IA-5; AC-2(13)
  • ISO/IEC 27001:2022: A.5.26; A.5.17
  • ISO/IEC 27017:2015: CLD.9.5.1

Log signals

  • CloudTrail iam:UpdateAccessKey with Status=Inactive followed by DeleteAccessKey on the same key — the canonical two-step credential-rotation sequence; rapid succession (within minutes) of these two events on a user previously linked to an incident ticket is positive-of-execution of the rotation playbook.
  • CloudTrail secretsmanager:UpdateSecret or secretsmanager:RotateSecret events whose requestParameters.secretId matches the canonical incident-rotation tag pattern — surfaces secret-rotation execution against Secrets Manager-managed credentials.
  • Negative signal: an IR ticket has been open for more than 4 hours without any UpdateAccessKey / RotateSecret CloudTrail event referencing the affected principals — indicates the rotation playbook has not been executed and the credentials remain in their potentially-compromised state.

Query

fields @timestamp, eventName, requestParameters.userName, requestParameters.accessKeyId, requestParameters.secretId, requestParameters.status, userIdentity.arn
          | filter eventSource in ["iam.amazonaws.com","secretsmanager.amazonaws.com"] and eventName in ["UpdateAccessKey","DeleteAccessKey","RotateSecret","UpdateSecret","DeactivateMFADevice","ResyncMFADevice"]
          | sort @timestamp desc
          | limit 100

The CloudWatch Logs Insights query covers IAM and Secrets Manager mutations together; for the negative-signal case (rotation not executed), implement as a scheduled Lambda that walks open IR tickets in the IR ticketing system and queries for matching rotation events within the SLA window.
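The negative-signal check for unrotated credentials could look like the following. This is a sketch with an assumed input shape: tickets and rotation events are plain dicts that a scheduled job would populate from the ticketing system and CloudTrail; the field names are hypothetical.

```python
from datetime import datetime, timedelta


def overdue_rotations(
    tickets: list[dict],
    rotation_events: list[dict],
    now: datetime,
    sla: timedelta = timedelta(hours=4),
) -> dict[str, list[str]]:
    """Map ticket ID to principals still unrotated after the SLA has elapsed."""
    overdue: dict[str, list[str]] = {}
    for ticket in tickets:
        if now - ticket["opened_at"] <= sla:
            continue  # still inside the SLA window
        rotated = {
            event["principal"]
            for event in rotation_events
            if event["time"] >= ticket["opened_at"]
        }
        missing = sorted(set(ticket["principals"]) - rotated)
        if missing:
            overdue[ticket["id"]] = missing
    return overdue
```

A non-empty result corresponds to the "page immediately" threshold for a breached rotation SLA.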

Alert threshold

  • Rotation-sequence event correlated to an open IR ticket — informational ack; record the rotation execution timestamp to the ticket and proceed.
  • IR ticket open beyond the 4-hour rotation SLA without rotation events — page immediately; the SLA is the maximum tolerable exposure window for a credential under suspicion and breaching it is a deliberate posture deviation.
  • Rotation event on a credential without a corresponding open IR ticket — high-priority ticket within 30 minutes; rotation outside an incident may be a routine hygiene rotation, but should always trace to a documented change record.

Initial response

  1. Execute the rotation playbook from the IaC repository: for IAM access keys, aws iam update-access-key --user-name {user} --access-key-id {id} --status Inactive followed by aws iam delete-access-key --user-name {user} --access-key-id {id} after the 24-hour grace; for Secrets Manager-managed secrets, aws secretsmanager rotate-secret --secret-id {arn} --rotate-immediately.
  2. For each rotated credential, identify downstream consumers via the secret's VersionStage history and the IAM access-key's last-used service column; notify the consumer teams in parallel with the rotation so they can re-pull the new value within the credential's TTL.
  3. Post-rotation, audit CloudTrail for any use of the old credential after its rotation timestamp — any such use is a confirmed compromise event since the new credential was already in place; open an escalated incident per general/ir.html in that case.

References

Equivalent on: Azure · GCP · OCI

aws-ir-07-tabletop-exercises ! MEDIUM PREVENTIVE

Run quarterly tabletop exercises against three to five documented AWS-specific scenarios drawn from the threat-model corpus on the General threat-model page: an S3 bucket discovered to be public with PII inside, an IAM access key leaked to a public GitHub repository, a GuardDuty finding of unauthorised API calls from an IP geolocated outside the organisation's footprint, a CloudTrail trail abruptly disabled by a non-administrative principal, a Lambda function exfiltrating data to an unfamiliar HTTP destination. For each exercise, run the relevant playbook end-to-end against a non-production account, time the steps, and track time-to-contain as the primary metric quarter-over-quarter (AWS Security Incident Response Guide — Post-Incident Activity (accessed 2026-05)).

Exercises that surface a runbook step that is wrong, missing, or unexecutable are tracked as findings against the runbook repo and remediated before the next quarter. The control is typed PREVENTIVE because its value lies in preventing the runbook decay that always happens to documentation nobody runs — runbooks written and never tested are, in practice, runbooks that do not work when they are needed. The principle is reinforced in General IR lessons-learned guidance and in ISO/IEC 27001:2022 A.5.27 (learning from incidents).
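Tracking time-to-contain quarter-over-quarter needs only the two timestamps per exercise. A minimal sketch, assuming each exercise is recorded as a (finding emitted, contained) pair of datetimes; the function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def quarter(ts: datetime) -> str:
    """Label a timestamp with its calendar quarter, e.g. 2026Q2."""
    return f"{ts.year}Q{(ts.month - 1) // 3 + 1}"


def time_to_contain_by_quarter(
    exercises: list[tuple[datetime, datetime]],
) -> dict[str, float]:
    """Average seconds from finding emission to containment, keyed by quarter."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for emitted, contained in exercises:
        if contained < emitted:
            raise ValueError("containment cannot precede the finding")
        buckets[quarter(emitted)].append((contained - emitted).total_seconds())
    return {label: mean(values) for label, values in sorted(buckets.items())}
```

The per-quarter averages are the values a job could publish as the TimeToContainSeconds metric for the dashboard.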

Remediation — AWS CLI

# Tabletop exercises are run as facilitated workshops; the AWS surface they
# touch is the non-prod tabletop account. A representative driver command:
# stand up a deliberately misconfigured S3 bucket so the responder can practice
# the "public bucket with PII" scenario end-to-end against real AWS APIs.

# Create the tabletop S3 bucket in the tabletop account.
aws s3api create-bucket \
  --bucket ir-tabletop-public-pii-scenario-202605 \
  --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1

# Deliberately turn OFF block-public-access for the exercise. Production
# accounts must NEVER have this configuration; this is an exercise-only step.
aws s3api put-public-access-block \
  --bucket ir-tabletop-public-pii-scenario-202605 \
  --public-access-block-configuration \
    "BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false"

# Drop a synthetic PII file so the responder has something to remediate.
# printf is used because plain echo does not expand \n in bash.
printf 'name,ssn\nJane Doe,000-00-0000\n' > /tmp/tabletop-pii.csv
aws s3 cp /tmp/tabletop-pii.csv s3://ir-tabletop-public-pii-scenario-202605/

# Time-to-contain is measured from the GuardDuty finding emission to the
# moment the bucket is back to Block Public Access enabled with the file
# removed. Track the duration in the IR metrics dashboard.

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS Security IR Guide — Post-Incident Activity (accessed 2026-05)
# Tabletop-account-only resources. Apply only to the dedicated tabletop OU.
resource "aws_s3_bucket" "tabletop_scenario" {
  bucket = "ir-tabletop-public-pii-scenario-202605"
  tags   = { Purpose = "ir-tabletop", Quarter = "2026Q2" }
}

# Deliberately permissive PAB for the exercise. Apply via a tabletop-only
# Terraform workspace; never propagate this resource into a prod workspace.
resource "aws_s3_bucket_public_access_block" "tabletop_scenario" {
  bucket                  = aws_s3_bucket.tabletop_scenario.id
  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

# CloudWatch dashboard widget tracking time-to-contain across exercises.
resource "aws_cloudwatch_dashboard" "ir_tabletop_metrics" {
  dashboard_name = "ir-tabletop-metrics"
  dashboard_body = jsonencode({
    widgets = [{
      type   = "metric"
      width  = 12
      height = 6
      properties = {
        title  = "IR tabletop — time-to-contain by quarter"
        region = "eu-west-1"
        metrics = [["IR/Tabletop", "TimeToContainSeconds"]]
        view   = "timeSeries"
        stat   = "Average"
        period = 86400
      }
    }]
  })
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Calendar-driven EventBridge schedule firing quarterly tabletop reminders to the IR distribution list.
Parameters:
  IrTopicArn:
    Type: String
Resources:
  QuarterlyTabletopSchedule:
    Type: AWS::Scheduler::Schedule
    Properties:
      Name: ir-tabletop-quarterly
      ScheduleExpression: cron(0 14 1 1,4,7,10 ? *)
      FlexibleTimeWindow:
        Mode: 'OFF'
      Target:
        Arn: !Ref IrTopicArn
        RoleArn: !Sub 'arn:aws:iam::${AWS::AccountId}:role/EventBridgeSchedulerSnsRole'
        Input: '{"subject":"Quarterly IR tabletop reminder","detail":"Schedule the next IR tabletop exercise."}'

Compliance mapping

  • CIS AWS Foundations v3.0.0: (best-practices)
  • CIS Microsoft Azure Foundations v3.0.0: n/a
  • CIS GCP Foundation v4.0.0: n/a
  • CIS OCI Foundation v2.0.0: n/a
  • NIST SP 800-53 rev5: IR-2; IR-3; IR-3(2)
  • ISO/IEC 27001:2022: A.5.24; A.5.27
  • ISO/IEC 27017:2015: n/a

Log signals

  • Tabletop-exercise execution is a process control rather than an event-driven one; the canonical signal is the absence of an exercise-completion record in the IR-tracking system for more than the org's cadence floor (commonly quarterly for SOC 2, annually for ISO 27001).
  • CloudTrail guardduty:CreateSampleFindings events — the canonical mechanism for injecting synthetic findings during a tabletop exercise; their presence in the trail is positive-of-execution and the absence over the cadence window is the negative-signal indicator.
  • The IR ticketing system's exercise ticket category should produce at least one closed ticket per cadence window with the postmortem-uploaded=true tag — any open exercise ticket past its planned close-date is a deliberate-delay signal warranting management escalation.

Query

fields @timestamp, eventName, requestParameters.detectorId, requestParameters.findingTypes, userIdentity.arn
          | filter eventSource = "guardduty.amazonaws.com" and eventName = "CreateSampleFindings"
          | stats count() as findings, latest(@timestamp) as last_exercise by userIdentity.arn
          | sort last_exercise desc
          | limit 50

The CloudWatch Logs Insights query enumerates synthetic-finding injections; pair with a scheduled Lambda that runs the query on a weekly cadence, emits a CloudWatch metric for the days-since-last-exercise, and triggers an alarm if the metric crosses the org's documented cadence floor.

Alert threshold

  • Days-since-last-exercise above the cadence floor — high-priority hygiene ticket routed to the IR program owner; not a page, but a tracked overdue item that escalates to the executive sponsor at 2× the cadence.
  • Exercise ticket open past planned close-date — informational; the IR program owner is the natural escalation target and the delay should be documented with a root-cause statement in the next IR steering meeting.
  • Post-mortem document not uploaded within 7 days of exercise completion — high-priority ticket; the post-mortem captures the lessons-learned and skipping it nullifies the exercise's training value.
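The cadence thresholds above map onto a three-state check. A minimal sketch, assuming a quarterly cadence of 92 days and hypothetical status labels; the last-exercise date would come from the weekly job that queries for synthetic-finding injections.

```python
from datetime import date


def cadence_status(last_exercise: date, today: date, cadence_days: int = 92) -> str:
    """Return 'ok', 'overdue-ticket', or 'escalate-to-sponsor'."""
    days_since = (today - last_exercise).days
    if days_since > 2 * cadence_days:
        return "escalate-to-sponsor"  # beyond 2x the cadence floor
    if days_since > cadence_days:
        return "overdue-ticket"       # hygiene ticket to the IR program owner
    return "ok"
```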

Initial response

  1. Schedule the next tabletop exercise via the IR-program calendar; the exercise scenario should rotate across the playbook surface (containment, credential-rotation, evidence-preservation, restoration) so the team's response muscle covers the full IR breadth over the cadence window.
  2. For overdue exercises, the IR program owner runs an expedited dry-run within the SLA recovery window — typically a 2-hour scenario walkthrough rather than a full simulation — to close the cadence gap while the full exercise is being scheduled.
  3. Post-exercise, upload the post-mortem document to the IR knowledge-base referenced by general/ir.html; the post-mortem must include the timeline, the gap-of-coverage findings, the playbook updates triggered, and the owner of each follow-up item with a deadline.

References

Equivalent on: Azure · GCP · OCI

Sources