AWS Workloads Hardening

Overview

This page covers Amazon Web Services workload hardening across the compute surfaces that decide whether an attacker who lands code execution on a single instance can pivot to credentials, sibling workloads, or the AWS control plane. Scope is EC2 (instance metadata and remote access), Amazon ECR (container image supply chain), Amazon Inspector (vulnerability assessment), AWS Lambda (function-level least privilege and secrets handling), Amazon EKS (Kubernetes workload identity and control-plane posture), EC2 Image Builder (golden machine images), and AWS Systems Manager Patch Manager (post-deployment patch hygiene). Cross-cutting principles — image hardening, runtime protection, supply-chain integrity, secrets management — are explained in the General Workloads page sections on runtime security and supply chain; this page maps the principles to AWS primitives.

One canonical-content cross-link to flag at the top, because authoring this page in isolation would otherwise duplicate ~1500 words of canonical material: secrets management for AWS Lambda is documented in the secrets-management section of the General IAM page, not here. The Phase 4 canonical-content rule (one canonical treatment per cross-cutting topic) is applied in aws-work-05: the control covers Lambda execution-role least privilege and function-URL auth, and cross-links to general/iam.html for the Secrets Manager + KMS reference architecture rather than re-authoring it. The same pattern will recur on aws/data.html (encryption-in-transit cross-links to aws/network.html).

Two anti-conflation callouts up front, because both pairs get confused in design reviews. First: SSM Session Manager replaces SSH; bastion hosts are legacy. The default reflex of "stand up a bastion in a public subnet, allow port 22 from corporate IPs, jump from there" is obsolete (covered as aws-work-02). Session Manager exposes no public ports, requires no inbound network path, integrates with IAM for per-user authorisation, and can write a full session transcript to S3 and CloudWatch Logs with optional KMS encryption. Engineers who insist on bastions are reproducing 2014's threat model: pick Session Manager and retire the bastion. Second: EKS Pod Identity (GA November 2023) is the preferred workload-identity mechanism; IRSA is legacy but supported. Pod Identity uses a single, cluster-independent trust policy: the role trusts the pods.eks.amazonaws.com service principal, and the pod-to-role binding is made with an EKS Pod Identity association. That removes the per-cluster OIDC-provider trust-policy edits that IRSA required, scales to many clusters without OIDC-provider sprawl, and is the path AWS is investing in (covered as aws-work-06). IRSA continues to work, and existing IRSA deployments need not migrate urgently, but new clusters should default to Pod Identity. The same Pod Identity vs IRSA choice was made on the IAM page for aws-iam-06, and this page stays aligned with it.
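The sketch below builds both trust-policy shapes as plain objects to make the difference concrete. It assumes the documented Pod Identity trust (service principal pods.eks.amazonaws.com with sts:AssumeRole and sts:TagSession) and the usual IRSA shape (a federated OIDC principal with sub and aud conditions). The account ID, Region, OIDC provider IDs, namespace, and service-account names are hypothetical placeholders.

```typescript
// Sketch only: the two trust-policy shapes behind the Pod Identity vs IRSA choice.
// Account ID, Region, OIDC provider IDs, namespace and service account are hypothetical.

type TrustStatement = {
  Effect: "Allow";
  Principal: { Service?: string; Federated?: string };
  Action: string | string[];
  Condition?: Record<string, Record<string, string>>;
};

type TrustPolicy = { Version: "2012-10-17"; Statement: TrustStatement[] };

// EKS Pod Identity: the same trust policy works for every cluster; the
// pod-to-role binding lives in an EKS Pod Identity association, not in IAM.
function podIdentityTrustPolicy(): TrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "pods.eks.amazonaws.com" },
        Action: ["sts:AssumeRole", "sts:TagSession"],
      },
    ],
  };
}

// IRSA: the trust policy names one cluster's OIDC provider and pins the
// Kubernetes service account, so each additional cluster needs another edit.
function irsaTrustPolicy(
  accountId: string,
  region: string,
  oidcProviderId: string,
  namespace: string,
  serviceAccount: string,
): TrustPolicy {
  const issuer = `oidc.eks.${region}.amazonaws.com/id/${oidcProviderId}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${issuer}` },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            [`${issuer}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
            [`${issuer}:aud`]: "sts.amazonaws.com",
          },
        },
      },
    ],
  };
}

// Two clusters: the Pod Identity document never changes, the IRSA document does.
const clusterA = irsaTrustPolicy("111122223333", "eu-west-1", "EXAMPLEA", "app", "api");
const clusterB = irsaTrustPolicy("111122223333", "eu-west-1", "EXAMPLEB", "app", "api");
console.log(JSON.stringify(clusterA) === JSON.stringify(clusterB)); // false
```

The Pod Identity document is a constant, so one role can serve many clusters just by adding associations. The IRSA document embeds the cluster's issuer, so every new cluster means another trust-policy edit.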

Order matters. Controls 01–02 are foundational invariants for every EC2 instance: IMDSv2 mandatory (the SSRF-to-credentials kill-chain mitigation) and SSM as the remote-access plane. Controls 03–04 close the container and vulnerability-assessment loop: ECR scan-on-push at build time, and Amazon Inspector for continuous EC2 / ECR / Lambda assessment. Control 05 hardens Lambda functions. Control 06 hardens EKS. Control 07 establishes golden-AMI provenance via EC2 Image Builder. Control 08 handles ongoing patch hygiene via Systems Manager Patch Manager. The page is structured so a reader can skim 01–02 for the everyday EC2 baseline, then dip into 03–08 by service area as needed. At the bottom of each control, equivalence callouts point to the matching control on the Azure, GCP, and OCI sibling pages so a reader can compare modelling across providers. The compliance-frameworks page explains why each control row carries the same seven framework columns.

aws-work-01-imdsv2-mandatory ! CRITICAL PREVENTIVE

Configure every EC2 instance with IMDSv2 token-required and hop-limit = 1, and pin the requirement with an organisation-level SCP that denies ec2:RunInstances when ec2:MetadataHttpTokens is not required. IMDSv1 is the unauthenticated, GET-only mode of the Instance Metadata Service: any local process, including a web server reflected through an SSRF bug, can call it to retrieve the instance role's temporary credentials (Amazon EC2 — IMDSv2 enforcement and hop limit (accessed 2026-05)). IMDSv2 turns the call into a two-step session-token handshake. A PUT to /latest/api/token with an X-aws-ec2-metadata-token-ttl-seconds header returns a token, and each GET must then carry it in an X-aws-ec2-metadata-token header. An SSRF reflection cannot perform this handshake, because most SSRF primitives can only emit GETs and cannot add custom headers. Hop-limit = 1 means the PUT response that carries the token leaves IMDS with an IP TTL of 1 and is dropped at the first forwarding hop. Processes on the host, including containers in the host network namespace, receive it. A container behind a bridge network, or a request relayed through any extra network hop, does not. PITFALL 5: hop-limit must be 1 for non-container workloads. The only legitimate reason to raise it to 2 is a container host such as ECS-on-EC2, where the container bridge network adds one hop between the task and IMDS. Never raise hop-limit to 2 for general workloads "in case some app needs it", because that is exactly what the attacker wants.
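The hop-limit behaviour can be modelled in a few lines. This is a simplified sketch, not AWS code. It assumes a bridged container sits exactly one forwarding hop from IMDS and a host-network container sits zero hops away. The 1–64 range check mirrors the values EC2 accepts for the hop limit.

```typescript
// Simplified model (not AWS code) of why hop-limit = 1 keeps IMDSv2 tokens on the host.
// The PUT response carrying the token leaves IMDS with IP TTL = hop limit; every
// forwarding hop decrements it, and a packet whose TTL reaches 0 is dropped.

type Caller = { name: string; forwardingHops: number };

function tokenResponseDelivered(hopLimit: number, caller: Caller): boolean {
  // EC2 accepts hop limits from 1 to 64.
  if (!Number.isInteger(hopLimit) || hopLimit < 1 || hopLimit > 64) {
    throw new RangeError("hop limit must be an integer from 1 to 64");
  }
  let ttl = hopLimit;
  for (let hop = 0; hop < caller.forwardingHops; hop++) {
    ttl -= 1; // a forwarding step (e.g. the host routing into a container bridge) decrements TTL
    if (ttl === 0) return false; // ...and the packet is dropped instead of forwarded
  }
  return true;
}

const callers: Caller[] = [
  { name: "process on the host", forwardingHops: 0 },
  { name: "container with host networking", forwardingHops: 0 },
  { name: "container on a bridge network", forwardingHops: 1 },
];

for (const limit of [1, 2]) {
  for (const c of callers) {
    const outcome = tokenResponseDelivered(limit, c) ? "token delivered" : "dropped";
    console.log(`hop limit ${limit}, ${c.name}: ${outcome}`);
  }
}
```

Raising the limit to 2 lets the bridged container through. That is the intended effect on a container host and an unwanted one anywhere else.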

MITIGATES: SSRF-to-credentials kill chain — an attacker who lands an SSRF bug in any internet-facing or LAN-facing service on the instance steals the instance role's temporary credentials and pivots to whatever the role can do (read S3, assume other roles, call sts:GetCallerIdentity to enumerate the account).

ATTACK VECTOR: The Capital One 2019 breach is the canonical case. A misconfigured web application firewall allowed an SSRF that relayed GETs to http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> on the IMDSv1 endpoint. The response (temporary AWS credentials) flowed back through the SSRF response body to the attacker, who then used them to enumerate S3 and exfiltrate roughly 100 million customer records. IMDSv2 alone defeats this exact pattern because the SSRF cannot emit the PUT that obtains the session token.
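A toy model shows why a GET-only SSRF fails against IMDSv2. This is not the real service. ToyImds, ssrfGet, and the placeholder credential body are invented for illustration. Only the token rule is modelled: a PUT with a TTL header mints a token, and credential GETs need a valid token when httpTokens is required.

```typescript
// Toy model of the IMDS authorisation rule; not the real service.
type ImdsRequest = { method: "GET" | "PUT"; path: string; headers: Record<string, string | undefined> };
type ImdsResponse = { status: number; body: string };

const TOKEN_PATH = "/latest/api/token";
const CREDS_PREFIX = "/latest/meta-data/iam/security-credentials/";

class ToyImds {
  private readonly httpTokens: "optional" | "required";
  private readonly issued = new Set<string>();
  private counter = 0;

  constructor(httpTokens: "optional" | "required") {
    this.httpTokens = httpTokens;
  }

  handle(req: ImdsRequest): ImdsResponse {
    if (req.method === "PUT" && req.path === TOKEN_PATH) {
      // Step 1 of IMDSv2: a PUT carrying a TTL header mints a session token.
      if (req.headers["x-aws-ec2-metadata-token-ttl-seconds"] === undefined) {
        return { status: 400, body: "" };
      }
      const token = `token-${++this.counter}`;
      this.issued.add(token);
      return { status: 200, body: token };
    }
    if (req.method === "GET" && req.path.startsWith(CREDS_PREFIX)) {
      // Step 2: a GET carrying the token. Without one, only "optional" mode answers.
      const token = req.headers["x-aws-ec2-metadata-token"];
      const hasValidToken = token !== undefined && this.issued.has(token);
      if (token !== undefined && !hasValidToken) return { status: 401, body: "" };
      if (this.httpTokens === "required" && !hasValidToken) return { status: 401, body: "" };
      return { status: 200, body: '{"AccessKeyId":"ASIAEXAMPLE"}' };
    }
    return { status: 404, body: "" };
  }
}

// The attacker's SSRF primitive: it can relay a GET to a chosen path, nothing more.
function ssrfGet(imds: ToyImds, path: string): ImdsResponse {
  return imds.handle({ method: "GET", path, headers: {} });
}

const credsPath = `${CREDS_PREFIX}app-role`;
console.log(ssrfGet(new ToyImds("optional"), credsPath).status); // 200: IMDSv1 leaks credentials
console.log(ssrfGet(new ToyImds("required"), credsPath).status); // 401: IMDSv2 refuses
```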

BLAST RADIUS: Per instance: IMDSv2-mandatory blocks credential theft via SSRF reflection on that instance only. An organisation-level SCP turns the property into an organisation-wide invariant across every member account and Region (SCPs do not apply to the management account): any RunInstances call that does not require IMDSv2 is denied at create time.

Remediation — AWS CLI

# Enforce IMDSv2 on an existing instance (hop-limit=1 = non-container workload default).
aws ec2 modify-instance-metadata-options \
  --instance-id i-0abc123def4567890 \
  --http-tokens required \
  --http-put-response-hop-limit 1 \
  --http-endpoint enabled

# Per-Region account default: new launches in this Region that do not set metadata options explicitly get IMDSv2.
aws ec2 modify-instance-metadata-defaults \
  --http-tokens required \
  --http-put-response-hop-limit 1 \
  --http-endpoint enabled

# Audit: list instances still allowing IMDSv1.
aws ec2 describe-instances \
  --filters Name=metadata-options.http-tokens,Values=optional \
  --query 'Reservations[].Instances[].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
  --output table
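Pipelines that consume JSON rather than the table output can run the same audit in code. This is a minimal sketch that assumes the JSON shape of aws ec2 describe-instances (Reservations[].Instances[] with MetadataOptions.HttpTokens and Tags). The instance IDs and names in the sample are invented.

```typescript
// Sketch: list IMDSv1-capable instances from `aws ec2 describe-instances --output json`.
// Only the fields used here are typed; the sample below is invented.
type Ec2Tag = { Key: string; Value: string };
type Ec2Instance = {
  InstanceId: string;
  MetadataOptions?: { HttpTokens?: string; HttpPutResponseHopLimit?: number };
  Tags?: Ec2Tag[];
};
type DescribeInstancesOutput = { Reservations: { Instances: Ec2Instance[] }[] };

function imdsV1Instances(out: DescribeInstancesOutput): { id: string; name: string }[] {
  const found: { id: string; name: string }[] = [];
  for (const reservation of out.Reservations) {
    for (const inst of reservation.Instances) {
      // A missing MetadataOptions block is treated as "optional" so it gets reviewed.
      const tokens = inst.MetadataOptions?.HttpTokens ?? "optional";
      if (tokens !== "required") {
        const name = inst.Tags?.find((t) => t.Key === "Name")?.Value ?? "";
        found.push({ id: inst.InstanceId, name });
      }
    }
  }
  return found;
}

const sample: DescribeInstancesOutput = {
  Reservations: [
    {
      Instances: [
        { InstanceId: "i-0aaa", MetadataOptions: { HttpTokens: "required" }, Tags: [{ Key: "Name", Value: "app-prod-01" }] },
        { InstanceId: "i-0bbb", MetadataOptions: { HttpTokens: "optional" }, Tags: [{ Key: "Name", Value: "legacy-batch" }] },
      ],
    },
  ],
};
console.log(imdsV1Instances(sample)); // [ { id: 'i-0bbb', name: 'legacy-batch' } ]
```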

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_instance" "workload" {
  ami           = var.golden_ami_id
  instance_type = "m6i.large"
  subnet_id     = aws_subnet.private[0].id

  metadata_options {
    http_tokens = "required"  # IMDSv2 mandatory
    http_put_response_hop_limit = 1  # PITFALL 5: 1 for non-container; 2 ONLY for ECS-on-EC2
    http_endpoint = "enabled"
    instance_metadata_tags = "enabled"
  }

  tags = { Name = "app-prod-01" }
}

# Organisation-wide SCP: deny RunInstances unless IMDSv2 is required.
# Attach it (aws_organizations_policy_attachment) to the root or target OUs;
# SCPs never apply to the management account itself.
resource "aws_organizations_policy" "deny_imdsv1_launches" {
  name = "deny-imdsv1-launches"
  type = "SERVICE_CONTROL_POLICY"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyRunInstancesWithoutImdsV2"
      Effect   = "Deny"
      Action   = "ec2:RunInstances"
      Resource = "arn:aws:ec2:*:*:instance/*"
      Condition = {
        StringNotEquals = {
          "ec2:MetadataHttpTokens" = "required"
        }
      }
    }]
  })
}
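The statement can be modelled directly to reason about which launches the SCP blocks. This simplified evaluator covers only this one statement. A regular expression stands in for the wildcard Resource match. The evaluator relies on IAM's documented rule that a negated operator such as StringNotEquals evaluates to true when the condition key is missing from the request. It is a reasoning aid, not a replacement for the IAM policy simulator.

```typescript
// Simplified evaluator for the DenyRunInstancesWithoutImdsV2 statement above.
// Models only this statement: instance-resource match plus StringNotEquals.
type LaunchContext = {
  resourceArn: string;
  metadataHttpTokens?: string; // value of ec2:MetadataHttpTokens, if present
};

// Approximates the policy's Resource pattern arn:aws:ec2:*:*:instance/*.
const INSTANCE_ARN = /^arn:aws:ec2:[^:]*:[^:]*:instance\/.*$/;

function scpDenies(ctx: LaunchContext): boolean {
  if (!INSTANCE_ARN.test(ctx.resourceArn)) return false; // statement does not apply
  // IAM evaluates a negated operator such as StringNotEquals as true when the
  // key is absent from the request context, so a missing value is denied too.
  if (ctx.metadataHttpTokens === undefined) return true;
  return ctx.metadataHttpTokens !== "required";
}

const instanceArn = "arn:aws:ec2:eu-west-1:111122223333:instance/*";
console.log(scpDenies({ resourceArn: instanceArn, metadataHttpTokens: "required" })); // false
console.log(scpDenies({ resourceArn: instanceArn, metadataHttpTokens: "optional" })); // true
```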

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: EC2 launch template mandating IMDSv2 (HttpTokens=required) on every instance launched from it.
Resources:
  ImdsV2LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: imdsv2-mandatory
      LaunchTemplateData:
        MetadataOptions:
          HttpTokens: required
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 1  # raise to 2 only on container hosts such as ECS-on-EC2
          InstanceMetadataTags: enabled

Remediation — AWS CDK (TypeScript)

import * as cdk from 'aws-cdk-lib';
import { aws_ec2 as ec2 } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class ImdsV2LaunchTemplateStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new ec2.CfnLaunchTemplate(this, 'ImdsV2Lt', {
      launchTemplateName: 'imdsv2-mandatory',
      launchTemplateData: {
        metadataOptions: {
          httpTokens: 'required',
          httpEndpoint: 'enabled',
          httpPutResponseHopLimit: 1, // raise to 2 only on container hosts such as ECS-on-EC2
          instanceMetadataTags: 'enabled',
        },
      },
    });
  }
}

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
4.x (verify)	n/a	n/a	n/a	AC-3; CM-7; SC-8	A.8.20; A.8.25	CLD.9.5.1

Log signals

CloudTrail ec2:ModifyInstanceMetadataOptions events in either of two forms. The first is requestParameters.httpTokens resolving to optional, which re-enables IMDSv1 and reopens the SSRF pivot into the instance role. The second is requestParameters.httpPutResponseHopLimit raised above 1 while httpEndpoint stays enabled, which extends token delivery to containers one network hop away and so to any compromised process inside them.
CloudTrail ec2:RunInstances events whose requestParameters.metadataOptions.httpTokens is optional (default for older AMIs and launch-templates) — surfaces fleet drift at launch time rather than at modification time.
Config rule ec2-imdsv2-check evaluating NON_COMPLIANT against production-tagged instances: a backstop signal for instances whose launch or last metadata change predates the CloudTrail window being queried, so neither event above is in scope.

Query

fields @timestamp, eventName, requestParameters.instanceId, requestParameters.httpTokens, requestParameters.httpPutResponseHopLimit, requestParameters.metadataOptions.httpTokens, userIdentity.arn
| filter eventSource = "ec2.amazonaws.com" and eventName in ["ModifyInstanceMetadataOptions","RunInstances"]
| filter requestParameters.httpTokens = "optional" or requestParameters.metadataOptions.httpTokens = "optional" or requestParameters.httpPutResponseHopLimit > 1
| sort @timestamp desc
| limit 100

The CloudWatch Logs Insights query covers both the at-modification and the at-launch paths. For fleets that rely on Auto Scaling launch templates, also confirm that the launch template itself is hardened by diffing the output of aws ec2 describe-launch-template-versions against the IaC.

Alert threshold

Any httpTokens=optional on a production instance: page immediately. The instance role's credentials are now reachable with a single unauthenticated GET, which is exactly the request an SSRF reflection can issue.
RunInstances launching with httpTokens=optional from a launch-template not in the IaC allow-list — high-priority ticket; the new launch indicates either a manual launch outside the auto-scaling flow or a stale launch-template.
httpPutResponseHopLimit above 1 on an instance that is not a container host (ECS-on-EC2 or EKS node): page. The hop-limit increase has no legitimate use case outside container workloads and signals that someone wants containers one network hop away to reach IMDS.
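One way to read the three thresholds together is as a routing function that escalates to the most severe rule that matches. The sketch assumes that inventory attributes (production, containerHost) and an IaC allow-list of launch-template IDs are joined onto the CloudTrail event upstream. Those inputs and the function name are hypothetical.

```typescript
// One reading of the three IMDS alert thresholds; the inventory inputs are
// hypothetical attributes joined onto the CloudTrail event upstream.
type Severity = "none" | "ticket" | "page";
const RANK: Record<Severity, number> = { none: 0, ticket: 1, page: 2 };

type ImdsObservation = {
  production: boolean;
  httpTokens: "optional" | "required";
  hopLimit: number;
  containerHost: boolean; // e.g. ECS-on-EC2 host or EKS node
  launchTemplateId?: string; // set only for RunInstances events
};

function classifyImdsAlert(o: ImdsObservation, iacTemplates: ReadonlySet<string>): Severity {
  const hits: Severity[] = ["none"];
  // Threshold 1: IMDSv1 re-opened on a production instance.
  if (o.production && o.httpTokens === "optional") hits.push("page");
  // Threshold 2: IMDSv1 launch from a template outside the IaC allow-list.
  if (o.launchTemplateId !== undefined && o.httpTokens === "optional" && !iacTemplates.has(o.launchTemplateId)) {
    hits.push("ticket");
  }
  // Threshold 3: hop limit raised on something that is not a container host.
  if (o.hopLimit > 1 && !o.containerHost) hits.push("page");
  // Escalate to the most severe threshold that matched.
  return hits.reduce((worst, s) => (RANK[s] > RANK[worst] ? s : worst), "none" as Severity);
}

const allowList = new Set(["lt-0abc"]);
console.log(classifyImdsAlert({ production: false, httpTokens: "optional", hopLimit: 1, containerHost: false, launchTemplateId: "lt-0zzz" }, allowList)); // ticket
```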

Initial response

Re-enforce IMDSv2 with aws ec2 modify-instance-metadata-options --instance-id {id} --http-tokens required --http-put-response-hop-limit 1; for launch-templates, increment the version with the hardened metadata-options and set the new version as default.
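Pivot to CloudTrail events signed with the instance role's session credentials during the relaxation window. For EC2 instance roles, userIdentity.arn ends in assumed-role/<role>/<instance-id>. Flag any call whose sourceIPAddress is not an address the instance legitimately egresses from (its private IP via VPC endpoints, or its NAT or public IP). Pod or container processes reaching the role via IMDSv1 are the canonical abuse pattern, and credentials replayed from elsewhere are the exfiltration pattern.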
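Open an incident per general/ir.html if any unexpected role use is found. Revoke the instance role's outstanding session credentials with the IAM console's Revoke active sessions action. It attaches an inline Deny policy conditioned on aws:TokenIssueTime, so every credential issued before that moment stops working. Then follow the credential-rotation playbook for any downstream resources the role had access to.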
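The policy that the console attaches for the revocation step can also be produced in code. This sketch builds the inline Deny document conditioned on aws:TokenIssueTime. The function name is illustrative and the cutoff timestamp is a placeholder.

```typescript
// Sketch: build the inline deny policy that IAM's "Revoke active sessions"
// action attaches (the console names it AWSRevokeOlderSessions).
function revokeOlderSessionsPolicy(cutoff: Date): string {
  // IAM date conditions accept ISO 8601; drop milliseconds for readability.
  const issuedBefore = cutoff.toISOString().replace(/\.\d{3}Z$/, "Z");
  const policy = {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Deny",
        Action: ["*"],
        Resource: ["*"],
        // Any call signed with credentials issued before the cutoff is denied.
        Condition: { DateLessThan: { "aws:TokenIssueTime": issuedBefore } },
      },
    ],
  };
  return JSON.stringify(policy, null, 2);
}

console.log(revokeOlderSessionsPolicy(new Date("2026-05-01T12:00:00Z")));
```

Save the output as revoke.json and attach it with aws iam put-role-policy --role-name ec2-workload-role --policy-name AWSRevokeOlderSessions --policy-document file://revoke.json. The role name reuses the example role from aws-work-02. The instance's own workload may see denials until it picks up credentials issued after the cutoff.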

References

AWS EC2 — IMDSv2 configuration on existing instances (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI

Equivalent on: Azure · GCP · OCI

aws-work-02-ssm-session-manager ! HIGH PREVENTIVE

Replace SSH with AWS Systems Manager Session Manager on every EC2 instance, and delete the bastion fleet. Session Manager establishes interactive shells through the SSM control plane and requires no inbound network path to the instance: the SSM Agent only makes outbound HTTPS connections to the Systems Manager endpoints, or to VPC interface endpoints in private subnets. It authenticates via IAM and can write full session transcripts to S3 and CloudWatch Logs with optional KMS encryption (AWS Systems Manager — Session Manager (accessed 2026-05)). The architectural shift is the entire point: bastion hosts with public IPs and port 22 ingress are obsolete. Session Manager is zero-trust (no implicit network reachability) and has no public attack surface. It records every session start in CloudTrail by default, and transcript logging is enabled through the session preferences document below. Instances need only two things: the AmazonSSMManagedInstanceCore managed policy on their instance profile, and an SSM Agent (pre-installed on AWS-provided Amazon Linux 2, AL2023, Ubuntu 18.04+, and Windows Server 2016+ AMIs). Severity is HIGH PREVENTIVE because the control eliminates an entire category of internet-exposed SSH brute-force attacks and credential-stuffing campaigns against bastion fleets. CRITICAL is reserved for SSRF-class single-step exploitation paths (e.g. aws-work-01).

MITIGATES: Internet-exposed SSH brute force, bastion compromise pivots to internal fleet, lateral SSH movement after a single-instance compromise, unlogged shell sessions that defeat post-incident forensics.

ATTACK VECTOR: A bastion host is launched in a public subnet with port 22 open to corporate office IPs. An attacker compromises a developer laptop, steals an SSH private key, and connects to the bastion from the legitimate office IP range. From the bastion they SSH to internal instances using the same key (or a forwarded agent). No session log exists, so the incident responder cannot reconstruct which commands ran on which host. With Session Manager there is no public port 22, every session is IAM-authenticated, and a full session log (commands and output) lands in CloudWatch and in S3, retained under Object Lock where the bucket is configured for it.

BLAST RADIUS: Per fleet: removing the bastion, closing port 22, and launching without key pairs reduces the blast radius of a stolen SSH key to zero AWS instances. Pairs with aws-iam-02 (MFA) so the IAM principal calling StartSession is itself MFA-gated.
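Scoping who may open sessions is an IAM policy question. The sketch below builds one plausible identity policy. It allows ssm:StartSession only on instances carrying a given tag (via the ssm:resourceTag/<key> condition key) plus the default SSM-SessionManagerRunShell document, and requires aws:MultiFactorAuthPresent on both statements. The tag key and value, Region, and account ID are hypothetical. Check the statement layout against the Session Manager IAM policy examples before relying on it.

```typescript
// Sketch: identity policy that lets an operator start Session Manager sessions
// only on instances carrying a given tag, and only when signed in with MFA.
// Tag key/value, Region and account ID are hypothetical.
type PolicyStatement = {
  Sid: string;
  Effect: "Allow";
  Action: string;
  Resource: string;
  Condition: Record<string, Record<string, string>>;
};

function startSessionPolicy(region: string, accountId: string, tagKey: string, tagValue: string) {
  const mfa = { "aws:MultiFactorAuthPresent": "true" };
  const statements: PolicyStatement[] = [
    {
      Sid: "StartSessionOnTaggedInstances",
      Effect: "Allow",
      Action: "ssm:StartSession",
      Resource: `arn:aws:ec2:${region}:${accountId}:instance/*`,
      Condition: { StringEquals: { [`ssm:resourceTag/${tagKey}`]: tagValue }, Bool: mfa },
    },
    {
      // StartSession is also authorised against the session document; the tag
      // condition is kept off this statement because the document lacks the tag.
      Sid: "UseDefaultShellDocument",
      Effect: "Allow",
      Action: "ssm:StartSession",
      Resource: `arn:aws:ssm:${region}:${accountId}:document/SSM-SessionManagerRunShell`,
      Condition: { Bool: mfa },
    },
  ];
  return { Version: "2012-10-17", Statement: statements };
}

console.log(JSON.stringify(startSessionPolicy("eu-west-1", "111122223333", "ssm-access", "prod-ops"), null, 2));
```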

Remediation — AWS CLI

# Start an interactive session (replaces ssh ec2-user@host).
aws ssm start-session --target i-0abc123def4567890

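# Configure session preferences: session logging to S3 + CloudWatch and KMS-encrypted session data.
# update-document assumes the document already exists; if it does not, create it first
# with the same name and content via aws ssm create-document --document-type Session.
aws ssm update-document \
  --name SSM-SessionManagerRunShell \
  --content file://session-prefs.json \
  --document-version '$LATEST'

# session-prefs.json sets s3BucketName, s3KeyPrefix, s3EncryptionEnabled=true,
# cloudWatchLogGroupName, cloudWatchEncryptionEnabled=true, and kmsKeyId (a KMS key ARN).
# The *EncryptionEnabled flags only allow logging to an encrypted bucket / log group;
# kmsKeyId additionally encrypts the session data stream with that key.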

# Attach the SSM managed policy to the instance role.
aws iam attach-role-policy \
  --role-name ec2-workload-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_iam_role" "ec2_workload" {
  name = "ec2-workload-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = aws_iam_role.ec2_workload.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "ec2_workload" {
  name = "ec2-workload-profile"
  role = aws_iam_role.ec2_workload.name
}

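# Session Manager preferences document: encrypted S3 + CloudWatch session logs,
# KMS-encrypted session data. Assumes aws_s3_bucket.session_logs,
# aws_cloudwatch_log_group.sessions and aws_kms_key.sessions are defined elsewhere.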
resource "aws_ssm_document" "session_prefs" {
  name            = "SSM-SessionManagerRunShell"
  document_type   = "Session"
  document_format = "JSON"
  content = jsonencode({
    schemaVersion = "1.0"
    description   = "Session-Manager preferences with KMS-encrypted logging"
    sessionType   = "Standard_Stream"
    inputs = {
      s3BucketName               = aws_s3_bucket.session_logs.id
      s3KeyPrefix                = "sessions/"
      s3EncryptionEnabled        = true
      cloudWatchLogGroupName     = aws_cloudwatch_log_group.sessions.name
      cloudWatchEncryptionEnabled = true
      kmsKeyId                   = aws_kms_key.sessions.arn
      runAsEnabled               = false
    }
  })
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: SSM Session Manager preferences forcing KMS encryption and CloudWatch session logging.
Parameters:
  SessionLogGroupName:
    Type: String
  SessionKmsKeyArn:
    Type: String
Resources:
  SessionManagerPreferences:
    Type: AWS::SSM::Document
    Properties:
      Name: SSM-SessionManagerRunShell
      DocumentType: Session
      DocumentFormat: JSON
      Content:
        schemaVersion: '1.0'
        description: Session Manager hardened defaults.
        sessionType: Standard_Stream
        inputs:
          cloudWatchLogGroupName: !Ref SessionLogGroupName
          cloudWatchEncryptionEnabled: true
          kmsKeyId: !Ref SessionKmsKeyArn
          runAsEnabled: false
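These preferences drift quietly, so a small CI check against the baseline used in the Terraform and CloudFormation documents above can catch regressions. This sketch assumes the inputs keys shown in those documents. sessionPrefsProblems is a hypothetical helper and the KMS key ARN is a placeholder.

```typescript
// Sketch: check a Session Manager preferences `inputs` object against the
// baseline used in the Terraform and CloudFormation documents above.
type SessionPrefsInputs = {
  s3BucketName?: string;
  s3EncryptionEnabled?: boolean;
  cloudWatchLogGroupName?: string;
  cloudWatchEncryptionEnabled?: boolean;
  kmsKeyId?: string;
  runAsEnabled?: boolean;
};

function sessionPrefsProblems(inputs: SessionPrefsInputs): string[] {
  const problems: string[] = [];
  if (!inputs.s3BucketName && !inputs.cloudWatchLogGroupName) {
    problems.push("no session log destination configured");
  }
  if (inputs.s3BucketName && inputs.s3EncryptionEnabled !== true) {
    problems.push("S3 logging does not insist on an encrypted bucket");
  }
  if (inputs.cloudWatchLogGroupName && inputs.cloudWatchEncryptionEnabled !== true) {
    problems.push("CloudWatch logging does not insist on an encrypted log group");
  }
  if (!inputs.kmsKeyId) problems.push("session data is not KMS-encrypted");
  if (inputs.runAsEnabled === true) problems.push("runAsEnabled differs from the documented baseline (false)");
  return problems;
}

// Mirrors the CloudFormation template's inputs (placeholder values).
console.log(sessionPrefsProblems({
  cloudWatchLogGroupName: "/ssm/sessions",
  cloudWatchEncryptionEnabled: true,
  kmsKeyId: "arn:aws:kms:eu-west-1:111122223333:key/example",
  runAsEnabled: false,
})); // []
```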

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	AC-17; AC-17(3); AU-2	A.8.5; A.8.15	CLD.9.5.1

Log signals

CloudTrail ec2:RunInstances events whose requestParameters.keyName is non-empty on instances tagged for SSM-only access — attaching an SSH key-pair to an instance creates the SSH access path even if the security-group denies port 22 today (a later SG edit re-opens it without further alert).
VPC Flow Logs ACCEPT records on TCP 22 to any instance's primary ENI — the SSM-only posture means the steady-state SSH traffic to production instances is exactly zero, so any flow is high-signal.
SSM managed-node PingStatus of ConnectionLost (aws ssm describe-instance-information) for an instance that ec2:DescribeInstances still reports as running. This indicates that the SSM Agent has been disabled or that the instance role's AmazonSSMManagedInstanceCore attachment has been removed, breaking the documented access path.
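The VPC Flow Logs signal can be evaluated record by record. This sketch assumes the default version 2 record format (14 space-separated fields) and a set of private IPs for the SSM-only fleet. The sample record is invented, and custom log formats need a different field mapping.

```typescript
// Sketch: flag accepted inbound SSH in VPC Flow Logs records that use the
// default (version 2) format:
// version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
type FlowRecord = { interfaceId: string; srcAddr: string; dstAddr: string; dstPort: number; protocol: number; action: string };

function parseDefaultFlowRecord(line: string): FlowRecord | null {
  const f = line.trim().split(/\s+/);
  if (f.length !== 14 || f[0] !== "2") return null; // not a default-format record
  if (f[6] === "-") return null; // NODATA / SKIPDATA records carry "-" placeholders
  return {
    interfaceId: f[2] ?? "",
    srcAddr: f[3] ?? "",
    dstAddr: f[4] ?? "",
    dstPort: Number(f[6]),
    protocol: Number(f[7]),
    action: f[12] ?? "",
  };
}

// Protocol 6 is TCP; instanceAddrs holds the private IPs of SSM-only instances.
function isAcceptedInboundSsh(r: FlowRecord, instanceAddrs: ReadonlySet<string>): boolean {
  return r.protocol === 6 && r.dstPort === 22 && r.action === "ACCEPT" && instanceAddrs.has(r.dstAddr);
}

// Invented sample record.
const rec = parseDefaultFlowRecord(
  "2 111122223333 eni-0abc123def4567890 203.0.113.10 10.0.1.25 50512 22 6 12 4020 1746100000 1746100060 ACCEPT OK",
);
console.log(rec !== null && isAcceptedInboundSsh(rec, new Set(["10.0.1.25"]))); // true
```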

Query

fields @timestamp, eventName, requestParameters.keyName, requestParameters.instanceId, responseElements.instancesSet.items.0.tagSet, userIdentity.arn
| filter eventSource = "ec2.amazonaws.com" and eventName = "RunInstances"
| filter ispresent(requestParameters.keyName) and requestParameters.keyName != ""
| sort @timestamp desc
| limit 100

The CloudWatch Logs Insights query catches the at-launch path; for the in-flight path, run a parallel query on ec2:AuthorizeSecurityGroupIngress filtered to port-22 introductions and join the results against the org's SSM-only tag in a downstream alert-correlation step.
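For the in-flight path, the core test is whether an ingress permission covers TCP 22. The sketch below uses a simplified, assumed view of one permission (ipProtocol, fromPort, toPort). Real CloudTrail events nest these under requestParameters.ipPermissions.items, together with CIDR and prefix-list details not modelled here.

```typescript
// Sketch: does an ingress permission open TCP 22? The shape is a simplified,
// assumed view of one entry in requestParameters.ipPermissions.items.
type IngressPermission = { ipProtocol: string; fromPort?: number; toPort?: number };

function opensTcp22(p: IngressPermission): boolean {
  if (p.ipProtocol === "-1") return true; // all protocols, all ports
  if (p.ipProtocol !== "tcp" && p.ipProtocol !== "6") return false;
  if (p.fromPort === undefined || p.toPort === undefined) return false;
  return p.fromPort <= 22 && 22 <= p.toPort; // a port range that includes 22
}

console.log(opensTcp22({ ipProtocol: "tcp", fromPort: 0, toPort: 1024 })); // true
console.log(opensTcp22({ ipProtocol: "tcp", fromPort: 443, toPort: 443 })); // false
```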

Alert threshold

Any RunInstances with a non-empty keyName in production — page immediately; the org's launch templates set keyName=null as policy and a non-null value is a deliberate deviation.
Inbound TCP 22 flow to a production instance: page. Cross-reference the source IP against corporate-egress CIDRs and against any active Session Manager session in progress. Session Manager does not use port 22, so a concurrent SSH flow is high-confidence unauthorised.
SSM ConnectionLost persisting for more than 15 minutes while the instance shows healthy in EC2 — high-priority ticket; the documented access path is broken and the operator must restore it before the next maintenance window.

Initial response

Terminate the SSH path: revoke any port-22 ingress rule on the instance's security group, and if a key pair was attached at launch, remove its public key from ~/.ssh/authorized_keys with an SSM Run Command. The key pair stays associated with the instance and visible in instance metadata (it cannot be detached after launch), so replace the instance from a key-less launch template at the next maintenance window.
Re-attach AmazonSSMManagedInstanceCore to the instance role if the SSM agent is reporting ConnectionLost; confirm the agent reconnects by starting a Session Manager session as a smoke test.
Pull VPC Flow Logs for the exposure window and enumerate every inbound port-22 flow that succeeded; open an incident per general/ir.html for any flow from outside the corporate egress CIDRs and rotate credentials reachable from the instance's IAM role.

References

AWS Systems Manager — Session Manager reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI

Equivalent on: Azure · GCP · OCI

aws-work-03-ecr-scan-on-push ! HIGH DETECTIVE

Enable enhanced scanning (powered by Amazon Inspector) on every Amazon ECR repository, set image_tag_mutability = IMMUTABLE, and gate the deployment pipeline so that any image with CRITICAL findings is blocked from production promotion (Amazon ECR User Guide — image scanning configuration (accessed 2026-05)). Enhanced scanning provides continuous CVE assessment and package-vulnerability detail for both OS and application-language ecosystems (npm, PyPI, RubyGems, Go modules, Maven, NuGet). It also integrates findings into Amazon Inspector and Security Hub. Under the IMMUTABLE tag policy, an attacker who somehow lands push permission cannot overwrite an already-scanned tag such as myrepo:latest with a back-doored image. Once pushed, a tag always resolves to the SHA256 digest that was scanned, and a pipeline that deploys by that digest runs exactly the scanned image. The DETECTIVE typology is deliberate: scanning surfaces unsafe state, and the build-pipeline gate is the PREVENTIVE pair (deployment denied on CRITICAL findings).
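A small pipeline check can enforce deploying by digest rather than by tag. This sketch uses placeholder registry and repository names. In practice the digest would come from a command such as aws ecr describe-images --repository-name app/api --image-ids imageTag=v2024.03.15 --query 'imageDetails[0].imageDigest' --output text.

```typescript
// Sketch: build and check digest-pinned image references, so a deploy names the
// exact image that was scanned. Registry and repository values are placeholders.
const SHA256_DIGEST = /^sha256:[a-f0-9]{64}$/;

function pinnedImageRef(registry: string, repository: string, digest: string): string {
  if (!SHA256_DIGEST.test(digest)) throw new Error(`not a sha256 digest: ${digest}`);
  return `${registry}/${repository}@${digest}`;
}

function isDigestPinned(ref: string): boolean {
  const at = ref.lastIndexOf("@");
  return at > 0 && SHA256_DIGEST.test(ref.slice(at + 1));
}

const scannedDigest = `sha256:${"ab".repeat(32)}`;
const pinnedRef = pinnedImageRef("111122223333.dkr.ecr.eu-west-1.amazonaws.com", "app/api", scannedDigest);
console.log(isDigestPinned(pinnedRef)); // true
console.log(isDigestPinned("111122223333.dkr.ecr.eu-west-1.amazonaws.com/app/api:v2024.03.15")); // false
```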

MITIGATES: Deployment of container images with known-CVE base layers (e.g. log4j Log4Shell, Spring4Shell), images built from outdated base images that have accumulated unpatched CVEs since the last build, supply-chain attacks where a typosquatted dependency lands in the image.

ATTACK VECTOR: A developer adds a new feature that pulls in the transitive dependency colors at version 1.4.1 (the canonical "colors.js sabotage" case). The image builds, is tagged v2024.03.15, and is pushed to ECR. Without scan-on-push the image flows to production. With enhanced scanning, the CRITICAL finding fires at push time (provided the advisory is already in Inspector's vulnerability feed). The deployment-pipeline gate then refuses to promote the image, and the on-call engineer sees the finding in Security Hub.

BLAST RADIUS: Per repository: every image pushed is scanned. Per organisation: enhanced scanning enabled at the Amazon Inspector organisation level catches every repository in every member account.

Remediation — AWS CLI

# Enable enhanced (Amazon Inspector) scanning at the registry level.
aws ecr put-registry-scanning-configuration \
  --scan-type ENHANCED \
  --rules 'scanFrequency=CONTINUOUS_SCAN,repositoryFilters=[{filter="*",filterType="WILDCARD"}]'

# Per-repository: scan-on-push + immutable tags + KMS encryption.
aws ecr create-repository \
  --repository-name app/api \
  --image-tag-mutability IMMUTABLE \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=KMS,kmsKey=arn:aws:kms:eu-west-1:111122223333:key/

# Show finding counts by severity for one image; a non-zero CRITICAL count should block promotion.
aws ecr describe-image-scan-findings \
  --repository-name app/api \
  --image-id imageTag=v2024.03.15 \
  --query 'imageScanFindings.findingSeverityCounts'
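The gate that consumes this output can be sketched as follows. It assumes the describe-image-scan-findings JSON (imageScanStatus.status and imageScanFindings.findingSeverityCounts). It treats COMPLETE (basic) and ACTIVE (enhanced) as finished scans, fails closed on any other status, and blocks on CRITICAL by default. promotionAllowed is a hypothetical helper.

```typescript
// Sketch of a promotion gate over `aws ecr describe-image-scan-findings` JSON.
// Only the fields used are typed; statuses other than COMPLETE (basic) and
// ACTIVE (enhanced) are treated as "not safe to promote yet".
type ScanFindingsOutput = {
  imageScanStatus?: { status?: string };
  imageScanFindings?: { findingSeverityCounts?: Record<string, number> };
};

function promotionAllowed(out: ScanFindingsOutput, blockOn: string[] = ["CRITICAL"]): boolean {
  const status = out.imageScanStatus?.status;
  if (status !== "COMPLETE" && status !== "ACTIVE") return false; // fail closed
  const counts: Record<string, number> = out.imageScanFindings?.findingSeverityCounts ?? {};
  // An absent severity key means zero findings at that severity.
  return blockOn.every((severity) => (counts[severity] ?? 0) === 0);
}

console.log(promotionAllowed({ imageScanStatus: { status: "ACTIVE" }, imageScanFindings: { findingSeverityCounts: { HIGH: 3 } } })); // true
console.log(promotionAllowed({ imageScanStatus: { status: "ACTIVE" }, imageScanFindings: { findingSeverityCounts: { CRITICAL: 1 } } })); // false
```

Failing closed means an image whose scan is still pending cannot be promoted. That trades some pipeline latency for never shipping an unscanned image.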

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_ecr_registry_scanning_configuration" "enhanced" {
  scan_type = "ENHANCED"
  rule {
    scan_frequency = "CONTINUOUS_SCAN"
    repository_filter {
      filter      = "*"
      filter_type = "WILDCARD"
    }
  }
}

resource "aws_ecr_repository" "app_api" {
  name                 = "app/api"
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS"
    kms_key         = aws_kms_key.ecr.arn
  }
}

# Lifecycle policy: keep last 30 immutable tags, expire untagged after 7 days.
resource "aws_ecr_lifecycle_policy" "app_api" {
  repository = aws_ecr_repository.app_api.name
  policy = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Keep last 30 tagged images"
        selection = {
          tagStatus     = "tagged"
          tagPatternList = ["*"]
          countType     = "imageCountMoreThan"
          countNumber   = 30
        }
        action = { type = "expire" }
      },
      {
        rulePriority = 2
        description  = "Expire untagged after 7 days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 7
        }
        action = { type = "expire" }
      }
    ]
  })
}
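The simplified simulation below shows which images the two rules would expire. It is not ECR's evaluator. It assumes rule 1 keeps the 30 most recently pushed tagged images and rule 2 expires untagged images pushed more than 7 days ago. The digests are placeholders.

```typescript
// Simplified simulation of the two lifecycle rules above (not ECR's evaluator).
type RepoImage = { digest: string; tags: string[]; pushedAt: Date };

const DAY_MS = 24 * 60 * 60 * 1000;

function lifecycleExpired(images: RepoImage[], now: Date, keepTagged = 30, untaggedDays = 7): Set<string> {
  const expired = new Set<string>();
  // Rule 1: keep the newest `keepTagged` tagged images, expire the rest.
  const tagged = images
    .filter((img) => img.tags.length > 0)
    .sort((a, b) => b.pushedAt.getTime() - a.pushedAt.getTime()); // newest first
  for (const img of tagged.slice(keepTagged)) expired.add(img.digest);
  // Rule 2: expire untagged images pushed before the cutoff.
  const cutoff = now.getTime() - untaggedDays * DAY_MS;
  for (const img of images) {
    if (img.tags.length === 0 && img.pushedAt.getTime() < cutoff) expired.add(img.digest);
  }
  return expired;
}

// 32 tagged images pushed one per day, plus two untagged images.
const now = new Date("2026-05-01T00:00:00Z");
const repoImages: RepoImage[] = [];
for (let i = 0; i < 32; i++) {
  repoImages.push({ digest: `sha256:t${i}`, tags: [`v${i}`], pushedAt: new Date(now.getTime() - i * DAY_MS) });
}
repoImages.push({ digest: "sha256:u-old", tags: [], pushedAt: new Date(now.getTime() - 10 * DAY_MS) });
repoImages.push({ digest: "sha256:u-new", tags: [], pushedAt: new Date(now.getTime() - 1 * DAY_MS) });
console.log(Array.from(lifecycleExpired(repoImages, now)).sort()); // [ 'sha256:t30', 'sha256:t31', 'sha256:u-old' ]
```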

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: ECR repository with scan-on-push, immutable tags and KMS encryption; enhanced (Inspector) scanning itself is enabled by the registry-level scanning configuration.
Parameters:
  RepoName:
    Type: String
  EcrKmsKeyArn:
    Type: String
Resources:
  ScanOnPushRepo:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: !Ref RepoName
      ImageScanningConfiguration:
        ScanOnPush: true
      ImageTagMutability: IMMUTABLE
      EncryptionConfiguration:
        EncryptionType: KMS
        KmsKey: !Ref EcrKmsKeyArn

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	RA-5; SI-3; SA-11	A.8.8; A.8.29	CLD.12.4.5

Log signals

CloudTrail ecr:PutRegistryScanningConfiguration events where the requestParameters.scanType shifts from ENHANCED to BASIC or where the requestParameters.rules array removes the canonical scan-on-push rule for production repository name patterns.
CloudTrail ecr:PutImageScanningConfiguration per-repository where requestParameters.imageScanningConfiguration.scanOnPush flips to false — silently disables scanning for one repository without touching the registry-wide configuration.
Inspector finding-export volume from ECR-source findings drops to zero on a repository that historically reports findings — passive signal that scan results stopped arriving even if the configuration looks correct.

Query

fields @timestamp, eventName, requestParameters.scanType, requestParameters.repositoryName, requestParameters.imageScanningConfiguration.scanOnPush, requestParameters.rules, userIdentity.arn
| filter eventSource = "ecr.amazonaws.com" and eventName in ["PutRegistryScanningConfiguration","PutImageScanningConfiguration"]
| filter requestParameters.scanType = "BASIC" or requestParameters.imageScanningConfiguration.scanOnPush = false
| sort @timestamp desc
| limit 50

The CloudWatch Logs Insights query targets the two regression paths simultaneously; the scanType=BASIC case is more severe because it strips Inspector-managed CVE feeds and leaves only the older basic-scan engine.

Alert threshold

Any PutRegistryScanningConfiguration shifting to BASIC in production — page immediately; the registry-wide downgrade affects every repository and removes Inspector CVE coverage instantly.
Per-repository scanOnPush=false change — high-priority ticket; the repository's image-push gate is now bypassed and any vulnerability scanning happens (if at all) on a delayed registry scan rather than at the push gate.
Inspector finding-stream volume below 10% of trailing-7-day baseline for an ECR repository with active pushes — informational; promote to incident if the divergence persists for 24 hours and the scan configuration appears nominally correct (indicates an Inspector-side issue or service-link misconfiguration).
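The volume threshold reduces to simple arithmetic over daily counts. This sketch assumes the counts are daily totals of ECR-source findings for one repository. The function names and sample numbers are invented.

```typescript
// Sketch of the finding-volume threshold: informational when today's ECR-source
// finding count falls below 10% of the trailing 7-day average while pushes continue.
type VolumeVerdict = "ok" | "informational";

function findingVolumeVerdict(trailing7Days: number[], today: number, pushesToday: number): VolumeVerdict {
  if (trailing7Days.length !== 7) throw new RangeError("expected 7 daily counts");
  const baseline = trailing7Days.reduce((sum, n) => sum + n, 0) / 7;
  if (pushesToday > 0 && baseline > 0 && today < 0.1 * baseline) return "informational";
  return "ok";
}

// Promote to an incident once the divergence has lasted 24 hours while the
// scan configuration still looks correct.
function promoteToIncident(hoursInformational: number, configLooksCorrect: boolean): boolean {
  return configLooksCorrect && hoursInformational >= 24;
}

console.log(findingVolumeVerdict([40, 38, 45, 41, 39, 44, 40], 2, 5)); // informational
```

Here the baseline is 287 / 7 = 41 findings per day. With pushes still arriving, any daily count under 4.1 trips the informational threshold.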

Initial response

Restore the scanning configuration with aws ecr put-registry-scanning-configuration --scan-type ENHANCED --rules file://canonical-rules.json; for per-repository overrides, re-set scanOnPush=true via put-image-scanning-configuration.
Trigger a manual rescan of all images pushed during the disabled window with aws ecr start-image-scan per image-digest; this surfaces any CVE that would have blocked the push if scanning had been active.
Open an incident via general/ir.html if any image with a Critical or High CVE was pushed during the window and has since been deployed; the corresponding workload's runtime exposure needs to be evaluated and a patched image promoted ahead of normal release cadence.

References

AWS ECR — image scanning reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI

Equivalent on: Azure · GCP · OCI

aws-work-04-inspector-org ! HIGH DETECTIVE

Enable Amazon Inspector organisation-wide with the delegated-administrator pattern, covering EC2 instances, ECR repositories, and Lambda functions, and route findings into AWS Security Hub for unified triage (Amazon Inspector User Guide — EC2/ECR/Lambda scanning (accessed 2026-05)). A naming caveat matters for accuracy in audit reports: the product is Amazon Inspector, not "AWS Inspector". The original service is now called Amazon Inspector Classic. The current service keeps the API namespace inspector2 (hence aws inspector2 in the CLI and AWS::InspectorV2 in CloudFormation), so "Amazon Inspector v2" in older material means today's Amazon Inspector. Inspector continuously assesses EC2 (agent-based and agentless modes), ECR images (the same scanning surfaced in aws-work-03), and Lambda functions (package and code scanning). Severity is HIGH DETECTIVE because Inspector surfaces unsafe state (CVEs, misconfigurations, network-reachability findings) but is not itself the preventive gate. The build-pipeline integration and the IMDSv2/SCP combination on aws-work-01 are the preventive pairs.

MITIGATES: Unpatched CVEs accumulating on running EC2 instances post-deployment, drift between scanned-at-push container images and what is actually running, Lambda functions deployed with vulnerable dependencies, missing visibility into whether internet-routable instances have any of the above.

ATTACK VECTOR: A team deploys an EC2 instance from a golden AMI in January. By June the AMI's base packages have accumulated CVEs (kernel, openssl, libc), but no one rebuilds the AMI or patches the running instance. Inspector's continuous EC2 assessment flags the now-vulnerable instance, including a CRITICAL kernel CVE. The on-call engineer sees the finding in Security Hub and triggers the SSM Patch Manager workflow (aws-work-08) on the affected fleet.

BLAST RADIUS: Per organisation: the delegated-administrator pattern means one account sees findings for every member account in the AWS Organization, eliminating the per-account blind spots that plagued earlier per-account-enabled tools.

Remediation — AWS CLI

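# Designate the delegated administrator (run from the Organizations management account).
aws inspector2 enable-delegated-admin-account \
  --delegated-admin-account-id 222233334444

# From the delegated admin: auto-enable every scan type for accounts that join later.
aws inspector2 update-organization-configuration \
  --auto-enable ec2=true,ecr=true,lambda=true,lambdaCode=true

# Existing members: associate, then enable explicitly (enable takes account IDs, not a wildcard).
aws inspector2 associate-member --account-id 111122223333
aws inspector2 enable \
  --account-ids 111122223333 \
  --resource-types EC2 ECR LAMBDA LAMBDA_CODE

# Verify status.
aws inspector2 batch-get-account-status \
  --account-ids 111122223333 222233334444 \
  --query 'accounts[].[accountId,state.status]'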
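The status check above can be turned into a coverage-gap report. This sketch assumes that batch-get-account-status returns, per account, state.status and resourceState.<type>.status for ec2, ecr, lambda, and lambdaCode. The account data in the sample is invented.

```typescript
// Sketch: find coverage gaps in `aws inspector2 batch-get-account-status` JSON.
// The typed subset (state.status and resourceState.<type>.status) is assumed.
type ResourceKey = "ec2" | "ecr" | "lambda" | "lambdaCode";
type InspectorAccountStatus = {
  accountId: string;
  state: { status: string };
  resourceState: Partial<Record<ResourceKey, { status: string }>>;
};

const ALL_TYPES: ResourceKey[] = ["ec2", "ecr", "lambda", "lambdaCode"];

function coverageGaps(accounts: InspectorAccountStatus[], required: ResourceKey[] = ALL_TYPES): Map<string, ResourceKey[]> {
  const gaps = new Map<string, ResourceKey[]>();
  for (const account of accounts) {
    // Any required scan type not reported as ENABLED counts as a gap.
    const missing = required.filter((type) => account.resourceState[type]?.status !== "ENABLED");
    if (account.state.status !== "ENABLED" || missing.length > 0) gaps.set(account.accountId, missing);
  }
  return gaps;
}

const gaps = coverageGaps([
  { accountId: "111122223333", state: { status: "ENABLED" }, resourceState: { ec2: { status: "ENABLED" }, ecr: { status: "ENABLED" }, lambda: { status: "ENABLED" }, lambdaCode: { status: "DISABLED" } } },
  { accountId: "222233334444", state: { status: "ENABLED" }, resourceState: { ec2: { status: "ENABLED" }, ecr: { status: "ENABLED" }, lambda: { status: "ENABLED" }, lambdaCode: { status: "ENABLED" } } },
]);
console.log(gaps); // Map(1) { '111122223333' => [ 'lambdaCode' ] }
```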

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_inspector2_delegated_admin_account" "security" {
  account_id = var.security_account_id
}

resource "aws_inspector2_organization_configuration" "auto_enable" {
  auto_enable {
    ec2         = true
    ecr         = true
    lambda      = true
    lambda_code = true
  }
}

# Per-account explicit enable (member accounts).
resource "aws_inspector2_enabler" "member" {
  for_each       = toset(var.member_account_ids)
  account_ids    = [each.value]
  resource_types = ["EC2", "ECR", "LAMBDA", "LAMBDA_CODE"]
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Inspector v2 findings filter that highlights HIGH-severity findings. It does not enable scanning; use the CLI or Terraform above for enablement.
Resources:
  HighSeverityFilter:
    Type: AWS::InspectorV2::Filter
    Properties:
      Name: high-severity-findings
      FilterAction: NONE
      FilterCriteria:
        Severity:
          - Comparison: EQUALS
            Value: HIGH

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	RA-5; SI-4	A.8.8	CLD.12.4.5

Log signals

CloudTrail inspector2:Disable events where the requestParameters.resourceTypes includes EC2, ECR, or LAMBDA — turns off scanning for the corresponding resource family at the org level.
CloudTrail inspector2:DisassociateMember on member accounts — peels accounts out of the org aggregation; the finding stream from those accounts stops flowing to the delegated-administrator account from that point.
CloudTrail inspector2:UpdateOrganizationConfiguration where autoEnable for any resource type flips to false — leaves current coverage intact but breaks the onboarding posture for new accounts and resources.

Query

fields @timestamp, eventName, requestParameters.resourceTypes, requestParameters.accountIds, requestParameters.autoEnable, userIdentity.arn
| filter eventSource = "inspector2.amazonaws.com" and eventName in ["Disable","DisassociateMember","UpdateOrganizationConfiguration","DeleteMember"]
| sort @timestamp desc
| limit 100

Run the CloudWatch Logs Insights query against the delegated-administrator's CloudTrail log group; Inspector org-management events route through that account and the delegated-admin context is the canonical source of truth for org-level Inspector state.

Alert threshold

Any Disable affecting a resource type in production — page immediately; CVE-feed coverage stops flowing for that resource family at the moment of the disable and re-enabling triggers a full re-scan which takes hours to complete.
DisassociateMember or DeleteMember on more than one account in a 24-hour window — page; single-account events may reflect legitimate account closures but multi-account events indicate a sweep.
UpdateOrganizationConfiguration with autoEnable=false for any resource type — high-priority ticket within one business hour; the downside surfaces over weeks as new resources / accounts arrive uncovered.
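The multi-account rule above is a sliding-window count of distinct accounts. The following is a minimal Python sketch. It assumes DisassociateMember / DeleteMember events have already been normalised to (timestamp, member account ID) tuples sorted by time. The CloudTrail parsing step is not shown, and the sample account IDs are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def sweep_alerts(events, threshold=1):
    """Return (time, accounts) each time more than `threshold` distinct accounts
    were disassociated or deleted within a trailing 24-hour window.

    events: iterable of (datetime, account_id) tuples, sorted by time.
    """
    window = deque()
    alerts = []
    for ts, account_id in events:
        window.append((ts, account_id))
        # Drop events that have aged out of the trailing window.
        while ts - window[0][0] > WINDOW:
            window.popleft()
        distinct = {acct for _, acct in window}
        if len(distinct) > threshold:
            alerts.append((ts, sorted(distinct)))
    return alerts


if __name__ == "__main__":
    t0 = datetime(2026, 5, 1, 9, 0)
    events = [(t0, "111122223333"), (t0 + timedelta(hours=2), "444455556666")]
    print(sweep_alerts(events))
```

Repeated events for the same account do not trigger an alert. A single legitimate account closure therefore stays below the page threshold.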

Initial response

Re-enable scanning with aws inspector2 enable --resource-types EC2 ECR LAMBDA LAMBDA_CODE --account-ids {accounts}; verify via batch-get-account-status that every resource type reports ENABLED for every member account.
Restore autoEnable=true via aws inspector2 update-organization-configuration and confirm new-account enrolment by creating a sandbox test account and verifying it auto-enrolls within one hour.
Open an incident via general/ir.html. Re-enabling scanning starts a fresh assessment of covered resources, so inventory any Critical / High findings that surface in that post-restoration sweep (for example with aws inspector2 list-findings filtered on severity); these were latent during the coverage gap.

References

AWS Inspector — managing multiple accounts (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI


aws-work-05-lambda-least-priv ! HIGH PREVENTIVE

Every AWS Lambda function gets its own least-privileged execution role (no shared "lambda-default" role, no *:* wildcards). Its function URL (where present) is configured with AuthType = AWS_IAM. Its secrets are pulled at runtime from AWS Secrets Manager via KMS-encrypted references rather than baked into environment variables. Production functions carry a reserved_concurrent_executions ceiling that caps blast-radius cost during anomalous traffic (AWS Lambda Developer Guide — execution-role least privilege (accessed 2026-05)). The canonical secrets-management reference architecture (Secrets Manager rotation, KMS key policies, runtime fetching) is documented on the General IAM page (§secrets management). This control intentionally cross-links to it rather than re-authoring it, per the Phase 4 canonical-content rule. Severity is HIGH PREVENTIVE because Lambda functions inherit AWS-managed isolation, but their execution-role permissions ARE the blast radius if the function is compromised. A function with s3:* on * is effectively an account-wide S3 read-write key in the hands of any attacker who exploits a code bug, and it also reaches any cross-account bucket whose policy trusts the role.

MITIGATES: Confused-deputy abuse via over-broad Lambda execution roles; secret leakage via environment-variable dumps in error stacks or log shipping; unauthenticated invocation of internal-purpose Lambda function URLs; runaway costs when a malicious or accidental loop invokes a function unbounded.

ATTACK VECTOR: A function ingests JSON from an SQS queue, and a deserialisation bug lands code execution inside the function. The execution role has s3:GetObject on arn:aws:s3:::* "because we weren't sure which buckets it would need". The attacker reads objects from every bucket whose name they can discover (from the function's code, configuration, or logs) and exfiltrates regulated data. With the role scoped to a single bucket prefix, the blast radius is one prefix. With secrets in Secrets Manager rather than env vars, the secret can be rotated centrally, which invalidates any value the attacker captured without redeploying the function.

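BLAST RADIUS: Per function. Each function's blast radius is its execution-role permissions plus whatever downstream access the secrets it can actually use unlock. A usable secret is one the role can both read (secretsmanager:GetSecretValue) and decrypt (kms:Decrypt on the protecting key). Per-function roles keep both sets minimal.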
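A function can only use a secret encrypted with a customer managed key if its role can both read the secret and decrypt that key. The following is a minimal Python sketch of that reachability check. It assumes the role's grants have already been resolved into plain sets of ARNs, and the inputs are hypothetical. Wildcards, conditions, resource policies, key policies, and the AWS managed aws/secretsmanager key are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    arn: str
    kms_key_arn: str  # customer managed key that encrypts the secret


def reachable_secrets(secrets, readable_secret_arns, decryptable_key_arns):
    """Return ARNs of secrets the role can both read and decrypt.

    Simplified model: exact-ARN matching only (see lead-in for exclusions).
    """
    return sorted(
        s.arn
        for s in secrets
        if s.arn in readable_secret_arns and s.kms_key_arn in decryptable_key_arns
    )
```

Shrinking either input set (readable secrets or decryptable keys) shrinks the result. This is why per-function roles and per-secret key grants both narrow the blast radius.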

Remediation — AWS CLI

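# Point an existing function at its least-priv role and pass the secret by ARN, not by value.
# Note: --environment replaces the function's entire set of environment variables.
aws lambda update-function-configuration \
  --function-name order-processor \
  --role arn:aws:iam::111122223333:role/lambda-order-processor-role \
  --environment 'Variables={SECRET_ARN=arn:aws:secretsmanager:eu-west-1:111122223333:secret:db/order-xyz}'

# Reserved concurrency is set with a separate call.
aws lambda put-function-concurrency \
  --function-name order-processor \
  --reserved-concurrent-executions 50

# Require IAM-signed requests on the function URL.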
aws lambda update-function-url-config \
  --function-name order-processor \
  --auth-type AWS_IAM

# Audit: list functions whose role is the legacy lambda_basic_execution role.
aws lambda list-functions \
  --query 'Functions[?contains(Role,`lambda_basic_execution`)].[FunctionName,Role]' \
  --output table

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_iam_role" "order_processor" {
  name = "lambda-order-processor-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Function-scoped policy: no wildcards beyond the function's own purpose.
resource "aws_iam_role_policy" "order_processor" {
  name = "order-processor-scope"
  role = aws_iam_role.order_processor.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        Resource = [aws_secretsmanager_secret.db_order.arn]
      },
      {
        Effect = "Allow"
        Action = ["kms:Decrypt"]
        Resource = [aws_kms_key.secrets.arn]
        Condition = {
          StringEquals = { "kms:ViaService" = "secretsmanager.eu-west-1.amazonaws.com" }
        }
      },
      {
        Effect = "Allow"
        Action = ["logs:CreateLogStream", "logs:PutLogEvents"]
        Resource = "arn:aws:logs:*:*:log-group:/aws/lambda/order-processor:*"
      }
    ]
  })
}

resource "aws_lambda_function" "order_processor" {
  function_name = "order-processor"
  role          = aws_iam_role.order_processor.arn
  package_type  = "Zip"
  filename      = var.lambda_zip
  handler       = "index.handler"
  runtime       = "nodejs20.x"

  reserved_concurrent_executions = 50

  environment {
    variables = {
      SECRET_ARN = aws_secretsmanager_secret.db_order.arn
    }
  }
}

resource "aws_lambda_function_url" "order_processor" {
  function_name      = aws_lambda_function.order_processor.function_name
  authorization_type = "AWS_IAM"
}
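The Terraform policy above avoids bare wildcards. A lint in CI can keep it that way. The following is a minimal Python sketch under simple assumptions. It flags Allow statements whose Action is "*" or "service:*", or whose Resource is exactly "*". It does not inspect NotAction/NotResource or partial patterns such as s3:Get*, and the function name is hypothetical.

```python
def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements with a '*' / 'service:*' action or a bare '*' resource."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for idx, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # IAM accepts either a string or a list for Action and Resource.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"Statement[{idx}]: wildcard action {action!r}")
        for resource in resources:
            if resource == "*":
                findings.append(f"Statement[{idx}]: wildcard resource '*'")
    return findings
```

Run it against the rendered JSON (for example, the output of jsonencode in a Terraform plan) and fail the build on any finding. Under these rules, the scoped log-group ARN in the policy above passes.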

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: Lambda execution role scoped to a single DynamoDB table — no broad dynamodb:* or *.
Parameters:
  FunctionName:
    Type: String
  TableArn:
    Type: String
Resources:
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub '${FunctionName}-exec'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - !Ref LambdaTableAccessPolicy
  LambdaTableAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: !Sub '${FunctionName}-table-access'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - dynamodb:GetItem
              - dynamodb:PutItem
              - dynamodb:UpdateItem
              - dynamodb:Query
            Resource: !Ref TableArn

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	AC-6; SC-12; SC-28	A.5.15; A.8.24	n/a

Log signals

CloudTrail lambda CreateFunction or UpdateFunctionConfiguration events where the execution role's attached-policy list includes any *FullAccess, AdministratorAccess, or iam:* policy. These events are logged with API-version suffixes (for example CreateFunction20150331 and UpdateFunctionConfiguration20150331v2), and they carry only the role ARN, so the policy list must be resolved against IAM. The function inherits the entire policy surface, defeating the least-privilege intent.
CloudTrail iam:AttachRolePolicy targeting a Lambda execution role with a managed-policy ARN outside the org's curated allow-list. This silently widens an existing function's permissions without touching the function itself.
CloudTrail lambda:UpdateFunctionConfiguration clearing KmsKeyArn, or lambda:UpdateFunctionUrlConfig changing AuthType from AWS_IAM to NONE. These are adjacent posture regressions that often accompany permission widening.

Query

fields @timestamp, eventName, requestParameters.functionName, requestParameters.role, requestParameters.policyArn, requestParameters.roleName, userIdentity.arn
| filter eventSource in ["lambda.amazonaws.com","iam.amazonaws.com"]
| filter eventName like /^(CreateFunction|UpdateFunctionConfiguration|UpdateFunctionUrlConfig)/ or (eventName = "AttachRolePolicy" and (requestParameters.policyArn like /FullAccess$/ or requestParameters.policyArn like /AdministratorAccess$/ or requestParameters.policyArn like /:policy\/iam-/))
| sort @timestamp desc
| limit 100

The CloudWatch Logs Insights query covers both function-side and role-side mutations. The policyArn regex applies only to AttachRolePolicy events. The function-side events carry the role ARN rather than its policies, so review them against the role's attached policies. Tune the regex against the org's managed-policy allow-list, maintained as a TSV in the IaC repository.

Alert threshold

Any Lambda execution role acquiring an *FullAccess or AdministratorAccess policy in production — page immediately; the function's request-driven invocation model means any caller-reachable trigger now has effective use of the over-privileged role.
A function deployed with KmsKeyArn=null when environment variables contain non-empty values: high-priority ticket. Environment variables encrypted only with the default AWS managed key are readable by anyone with lambda:GetFunctionConfiguration, and they frequently contain secrets that should have been in Secrets Manager instead.
Function-URL AuthType=NONE on a production function — page; the function is now Internet-reachable without IAM-signed requests and the invocation surface is anyone who knows the URL.
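The last two thresholds can be checked directly against Lambda API output. The following is a minimal Python sketch. It assumes the response shapes of get-function-configuration (FunctionName, Environment.Variables, KMSKeyArn) and list-function-url-configs (FunctionUrlConfigs[].FunctionUrl / AuthType). Filtering to production functions happens upstream and is not shown.

```python
def posture_alerts(function_config: dict, url_configs: list[dict]) -> list[str]:
    """Apply the KMS and function-URL thresholds to one function's configuration."""
    alerts = []
    name = function_config.get("FunctionName", "<unknown>")
    variables = function_config.get("Environment", {}).get("Variables", {})
    # Non-empty environment variables without a customer managed key: ticket.
    if variables and not function_config.get("KMSKeyArn"):
        alerts.append(f"{name}: environment variables without a customer managed KMS key (ticket)")
    # Any function URL that accepts unsigned requests: page.
    for url in url_configs:
        if url.get("AuthType") == "NONE":
            alerts.append(f"{name}: function URL {url.get('FunctionUrl')} has AuthType=NONE (page)")
    return alerts
```

Pass the function configuration together with the FunctionUrlConfigs list for the same function name.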

Initial response

Detach the over-privileged policy with aws iam detach-role-policy and re-attach the function's IaC-canonical scoped policy; if the function was created from scratch with the over-privileged role, delete the function (idempotent re-deploy will recreate it with the correct role from IaC).
Query CloudTrail for every AWS API call made by the execution role's sessions during the over-privilege window, filtering on userIdentity.sessionContext.sessionIssuer.arn equal to the role ARN. Any call outside the function's documented action set is a candidate abuse trace. S3 object-level calls appear only if S3 data events were being logged.
Open an incident via general/ir.html if those calls touched S3 objects, KMS keys, or IAM principals outside the function's expected scope. Rotate any credentials the function may have touched via the credential-rotation playbook (aws-ir-06-credential-rotation-playbook).

References

AWS Lambda — execution role and permissions reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI


aws-work-06-eks-pod-identity ! HIGH PREVENTIVE

Harden Amazon EKS clusters along four orthogonal axes: (a) workload identity via EKS Pod Identity (Dec 2023 GA, preferred over IRSA per Phase 5 IAM precedent on aws-iam-06; IRSA remains legacy-but-supported for existing clusters); (b) Pod Security Admission with the restricted profile enforced on every workload namespace; (c) private control-plane endpoint with public access disabled; (d) control-plane audit logs shipped to CloudWatch (Amazon EKS — Pod Identity (accessed 2026-05)). Pod Identity decouples Kubernetes ServiceAccount → IAM role mapping from the per-cluster OIDC-provider step IRSA required; one EKS-side association replaces the cluster-specific IRSA trust-policy edit, which is the property that makes Pod Identity scale across many clusters without OIDC-provider sprawl. Scope deliberately bounded: this control covers (a)-(d) above; service-mesh integration, admission-controller frameworks beyond PSA, and runtime-detection agents are out of scope for this page and will be mentioned as v2 supplementary topics in a future release.

MITIGATES: Pod-to-pod credential theft via overly broad node-instance roles; lateral movement when a single pod is compromised; control-plane abuse from the internet (a leaked kubectl config plus a public endpoint equals direct cluster API access); and container escape via privileged pods admitted because PSA is not enforced.

ATTACK VECTOR: A pod runs as root with hostPath mounted because PSA is unenforced. An attacker who lands code execution in the pod escapes to the underlying node, reads the node's instance-profile credentials, and pivots to whatever the node role can do. That typically means the ECR-read and EC2 networking permissions of the standard worker-node policies, and often much more where extra policies were attached to the node role. The controls close each step:
- With PSA restricted enforced, the privileged pod never lands.
- With Pod Identity, each workload's permissions live in its own scoped IAM role, so the node role can stay minimal.
- With the control-plane endpoint private, the leaked kubeconfig cannot be used from the public internet.

BLAST RADIUS: Per cluster for control-plane / endpoint / logging; per namespace for PSA enforcement; per pod for Pod Identity associations. The combination caps multi-tenant cluster blast radius.

Remediation — AWS CLI

# Create a Pod Identity association: Kubernetes SA <-> IAM role mapping.
aws eks create-pod-identity-association \
  --cluster-name prod-cluster \
  --namespace orders \
  --service-account order-processor-sa \
  --role-arn arn:aws:iam::111122223333:role/eks-order-processor

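# Update cluster: private endpoint only. EKS accepts one type of update per call,
# so the endpoint change and the logging change are separate requests.
aws eks update-cluster-config \
  --name prod-cluster \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=false

# Enable full control-plane audit logging.
aws eks update-cluster-config \
  --name prod-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'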

# Enforce Pod Security Admission restricted profile via namespace labels.
kubectl label namespace orders \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted
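The labels above are applied one namespace at a time, so it is worth auditing which namespaces still lack them. The following is a minimal Python sketch. It reads the parsed output of kubectl get namespaces -o json. The exempt set is an assumption to adjust to local policy.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"
# System namespaces commonly exempted from the restricted profile (assumption).
EXEMPT = {"kube-system", "kube-public", "kube-node-lease"}


def unrestricted_namespaces(namespace_list: dict) -> list[str]:
    """Return workload namespaces whose PSA enforce label is not 'restricted'.

    namespace_list is the parsed JSON of `kubectl get namespaces -o json`.
    """
    result = []
    for item in namespace_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        if name in EXEMPT:
            continue
        labels = meta.get("labels") or {}  # unlabeled namespaces have no labels key
        if labels.get(ENFORCE_LABEL) != "restricted":
            result.append(name)
    return sorted(result)
```

For example, save the kubectl output to a file, load it with json.load, and page or ticket on any name returned.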

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_eks_cluster" "prod" {
  name     = "prod-cluster"
  role_arn = aws_iam_role.cluster.arn
  version  = "1.30"

  vpc_config {
    subnet_ids              = aws_subnet.private[*].id
    endpoint_private_access = true
    endpoint_public_access  = false  # Private control plane only
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  encryption_config {
    provider { key_arn = aws_kms_key.eks.arn }
    resources = ["secrets"]
  }
}

# Pod Identity association (preferred over IRSA for new clusters).
resource "aws_eks_pod_identity_association" "order_processor" {
  cluster_name    = aws_eks_cluster.prod.name
  namespace       = "orders"
  service_account = "order-processor-sa"
  role_arn        = aws_iam_role.order_processor_pod.arn
}

resource "aws_iam_role" "order_processor_pod" {
  name = "eks-order-processor"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["sts:AssumeRole", "sts:TagSession"]
      Principal = { Service = "pods.eks.amazonaws.com" }
    }]
  })
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: EKS Pod Identity addon — installs the agent that mints scoped credentials per pod.
Parameters:
  ClusterName:
    Type: String
Resources:
  PodIdentityAgent:
    Type: AWS::EKS::Addon
    Properties:
      ClusterName: !Ref ClusterName
      AddonName: eks-pod-identity-agent
      ResolveConflicts: OVERWRITE

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
n/a (post-v3.0.0)	n/a	n/a	n/a	AC-3; AC-6; SC-7	A.5.15; A.8.20	CLD.9.5.1

Log signals

EKS audit-log events showing a pod requesting a service-account token that maps via Pod Identity to an IAM role whose policy graph includes iam:*, sts:AssumeRole on broad principals, or any *FullAccess managed policy — surfaces over-privileged pod-identity bindings even when the binding itself was created legitimately.
CloudTrail sts:AssumeRoleWithWebIdentity (legacy IRSA path) on a cluster that should have migrated to Pod Identity — indicates either an un-migrated workload or a deliberate downgrade to the older path that lacks the EKS-managed credential injection guard rails.
CloudTrail use of a pod-identity-bound role's session credentials from outside the cluster's documented egress IP ranges — the SDK-side use should always trace back to the cluster's NAT-gateway egress IP set, so any other source IP is high-confidence credential leakage.

Query

fields @timestamp, eventSource, eventName, sourceIPAddress, userIdentity.sessionContext.sessionIssuer.arn as roleArn
| filter userIdentity.type = "AssumedRole" and roleArn like /:role\/eks-pod-/
| stats count() as n by sourceIPAddress, roleArn
| sort n desc
| limit 50

The CloudWatch Logs Insights query aggregates API calls made with pod-identity role sessions, per source IP per role. It keys on the sessions' own calls rather than on sts:AssumeRole. Pod Identity sessions are minted by EKS on the pod's behalf (the node-local Pod Identity Agent calls the EKS Auth API), so the role assumption itself is not a reliable egress-IP signal. The eks-pod- role-name prefix is a naming convention to adapt locally. The canonical posture has at most a handful of egress IPs per cluster, so any unfamiliar source IP with a non-zero call count is the actionable anomaly.

Alert threshold

Any pod-identity-bound role accessed from a source-IP outside the cluster's egress IP set — page immediately and treat as confirmed credential leakage until proven otherwise; the role's session credentials should never be observed outside the cluster network.
A new pod-identity binding to an IAM role whose effective policy graph includes iam:* — page; the binding makes that role's blast radius the entire IAM control plane and the legitimate use cases for such bindings are extremely rare.
Legacy IRSA AssumeRoleWithWebIdentity traffic on a cluster post-migration — informational; the migration completion criteria should drive this to zero and any residual traffic indicates an un-migrated workload.
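The source-IP threshold reduces to a CIDR membership test. The following is a minimal Python sketch. It assumes CloudTrail events are already loaded as dicts and that the egress set is supplied as CIDR strings. The egress set should also include the VPC CIDR when calls go through VPC interface endpoints, since CloudTrail then records the private source address. Non-IP sources, such as an AWS service name recorded when a service acts for the principal, are set aside for manual review rather than paged.

```python
import ipaddress


def leaked_credential_calls(events, egress_cidrs):
    """Split events into (outside the egress set, non-IP source for review)."""
    networks = [ipaddress.ip_network(c) for c in egress_cidrs]
    outside, non_ip = [], []
    for event in events:
        source = event.get("sourceIPAddress", "")
        try:
            addr = ipaddress.ip_address(source)
        except ValueError:
            non_ip.append(event)
            continue
        # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed sets are safe.
        if not any(addr in net for net in networks):
            outside.append(event)
    return outside, non_ip
```

Every event in the first list meets the page-immediately threshold above.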

Initial response

Revoke the binding via aws eks delete-pod-identity-association if the role over-privilege was the regression, or detach the over-privileged policy from the role if the binding itself was legitimate; force a pod restart in the affected namespace so the in-memory token cache invalidates.
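For source-IP leakage, immediately revoke active sessions. Attach an inline deny policy conditioned on aws:TokenIssueTime (the AWSRevokeOlderSessions pattern that the IAM console's "Revoke active sessions" action applies), backed if needed by a temporary scoped-deny SCP from aws-iam-08-scp-deny-list. This stops further use of the leaked session credentials once the IAM change propagates (typically within seconds), while the IAM-engineering team works on a permanent role replacement.
Open an incident per general/ir.html. Cross-reference every API call made with the role's session credentials against the cluster's egress-IP set during the prior 24 hours, and treat any unaccounted-for source as a candidate active compromise.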
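Session revocation works by denying every action to any session whose token was issued before a cutoff. Sessions the workload obtains afterwards keep working. The following is a minimal Python sketch that builds that inline policy document. Applying it, for example with aws iam put-role-policy --role-name eks-order-processor --policy-name AWSRevokeOlderSessions --policy-document file://revoke.json, is left to the caller. If the pod itself is still compromised, pair this with removing the Pod Identity association, since new sessions remain obtainable.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff: datetime) -> str:
    """Build an inline policy denying all actions to sessions issued before cutoff."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["*"],
                "Resource": ["*"],
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued_before}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Cut off every session issued before now; sessions minted later still work.
    print(revoke_older_sessions_policy(datetime.now(timezone.utc)))
```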

References

AWS EKS — Pod Identity reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI


aws-work-07-ec2-image-builder-golden-amis ! MEDIUM PREVENTIVE

Build EC2 instances from golden AMIs produced by an EC2 Image Builder pipeline whose ImageRecipe applies the CIS Amazon Linux 2023 (or equivalent) hardening components, installs the SSM Agent, and distributes the resulting AMI to every workload region via a DistributionConfiguration (EC2 Image Builder User Guide (accessed 2026-05)). The pipeline rebuilds on a quarterly cadence AND on a CVE-driven trigger (a CRITICAL CVE in the base image fires the build via an EventBridge rule). Severity MEDIUM PREVENTIVE because golden AMIs reduce ongoing patch surface but are not by themselves the patch-management story — aws-work-08 (Patch Manager) handles post-deployment patches; golden AMIs handle initial-state hygiene. The pair (golden AMI + Patch Manager) is the AWS-native equivalent of the "immutable infrastructure with periodic re-baking" pattern.

MITIGATES: Drift between fleet-wide baseline hardening and what any given instance actually has applied, missing CIS-baseline configurations on instances that were launched from a community AMI in a hurry, accumulated CVEs in the AMI itself that go unaddressed because nobody rebuilds.

ATTACK VECTOR: A team launches new instances in a fresh account from a public Marketplace AMI that has not been updated in 14 months. The AMI ships with kernel CVEs, weak SSH config, no SSM Agent, and a default user with sudo nopasswd. Inspector flags the kernel CVEs but the team can't reboot at-will. With a golden AMI: the same launch starts from a CIS-hardened, SSM-managed, current-month-rebuilt image — the kernel CVEs are not present, the SSM Agent is pre-installed (enabling aws-work-02 / aws-work-08), and the SSH config matches policy.

BLAST RADIUS: Per pipeline: every AMI produced by the pipeline carries the same hardening guarantees; per region: distributions cover all workload regions; per CVE: rebuild trigger ensures new CRITICAL findings result in a new AMI within hours, not quarters.

Remediation — AWS CLI

# Create an image recipe referencing a CIS-hardening component + the AWS CLI v2 component.
# The CIS component ARN is illustrative; substitute the CIS component subscribed via
# AWS Marketplace or an in-house hardening component.
aws imagebuilder create-image-recipe \
  --name al2023-cis-hardened \
  --semantic-version 1.0.0 \
  --parent-image arn:aws:imagebuilder:eu-west-1:aws:image/amazon-linux-2023-x86/x.x.x \
  --components componentArn=arn:aws:imagebuilder:eu-west-1:aws:component/cis-amazon-linux-2023-hardening/1.0.0 \
               componentArn=arn:aws:imagebuilder:eu-west-1:aws:component/aws-cli-version-2-linux/1.0.0

# Create a pipeline with a quarterly schedule (the CVE-driven rebuild trigger is a separate EventBridge rule, not configured here).
aws imagebuilder create-image-pipeline \
  --name al2023-cis-quarterly \
  --image-recipe-arn arn:aws:imagebuilder:eu-west-1:111122223333:image-recipe/al2023-cis-hardened/1.0.0 \
  --infrastructure-configuration-arn  \
  --distribution-configuration-arn  \
  --schedule 'scheduleExpression="cron(0 6 1 */3 ? *)",pipelineExecutionStartCondition=EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE'

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_imagebuilder_image_recipe" "al2023_cis" {
  name              = "al2023-cis-hardened"
  parent_image      = "arn:aws:imagebuilder:${var.region}:aws:image/amazon-linux-2023-x86/x.x.x"
  version           = "1.0.0"

  component { component_arn = "arn:aws:imagebuilder:${var.region}:aws:component/cis-amazon-linux-2023-hardening/1.0.0" }
  component { component_arn = "arn:aws:imagebuilder:${var.region}:aws:component/aws-cli-version-2-linux/1.0.0" }
}

resource "aws_imagebuilder_distribution_configuration" "multi_region" {
  name = "al2023-multi-region"
  dynamic "distribution" {
    for_each = var.workload_regions
    content {
      region = distribution.value
      ami_distribution_configuration {
        name = "al2023-cis-{{ imagebuilder:buildDate }}"
      }
    }
  }
}

resource "aws_imagebuilder_image_pipeline" "al2023_cis_quarterly" {
  name                             = "al2023-cis-quarterly"
  image_recipe_arn                 = aws_imagebuilder_image_recipe.al2023_cis.arn
  infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.this.arn
  distribution_configuration_arn   = aws_imagebuilder_distribution_configuration.multi_region.arn

  schedule {
    schedule_expression                = "cron(0 6 1 */3 ? *)"
    pipeline_execution_start_condition = "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE"
  }
}

Remediation — CloudFormation

AWSTemplateFormatVersion: '2010-09-09'
Description: EC2 Image Builder pipeline producing a hardened golden AMI on a weekly cadence (tighter than the quarterly minimum).
Parameters:
  RecipeArn:
    Type: String
  InfrastructureConfigArn:
    Type: String
  DistributionConfigArn:
    Type: String
Resources:
  GoldenAmiPipeline:
    Type: AWS::ImageBuilder::ImagePipeline
    Properties:
      Name: golden-ami-weekly
      ImageRecipeArn: !Ref RecipeArn
      InfrastructureConfigurationArn: !Ref InfrastructureConfigArn
      DistributionConfigurationArn: !Ref DistributionConfigArn
      Schedule:
        ScheduleExpression: cron(0 6 ? * SUN *)
        PipelineExecutionStartCondition: EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE
      Status: ENABLED

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	CM-2; SI-2; SA-10	A.8.9; A.8.32	CLD.12.4.5

Log signals

CloudTrail ec2:RunInstances events whose AMI ID is not in the canonical golden-AMI set tracked by Image Builder pipelines. The AMI ID is recorded under requestParameters.instancesSet.items[].imageId and is absent when the AMI comes from a launch template. Production launches outside the golden-AMI allow-list are by definition unauthorised.
CloudTrail imagebuilder:UpdateImagePipeline events that repoint imageRecipeArn. Image recipes are immutable, so dropping a hardening component means publishing a new recipe version and repointing the pipeline at it. That silently alters the golden-AMI build outcome without touching the AMI distribution.
CloudTrail imagebuilder:CancelImageCreation events, plus build failures (which surface through Image Builder notifications rather than as CloudTrail API events). Both disrupt the cadence of fresh golden AMIs, leading to fleet drift toward stale AMIs that miss recent CVE patches.

Query

fields @timestamp, eventName, requestParameters.instancesSet.items.0.imageId, requestParameters.imagePipelineArn, requestParameters.containerRecipeArn, requestParameters.imageRecipeArn, userIdentity.arn
| filter eventSource in ["ec2.amazonaws.com","imagebuilder.amazonaws.com"] and eventName in ["RunInstances","UpdateImagePipeline","CancelImageCreation","DeleteImagePipeline"]
| sort @timestamp desc
| limit 100

For the RunInstances variant, post-process the CloudWatch Logs Insights output against the org's golden-AMI ID allow-list (maintained in SSM Parameter Store under /golden-amis/{family}/latest) to surface the launches that fell outside the curated set.
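The post-processing step is a set-membership check. The following is a minimal Python sketch. It assumes query rows have been normalised to dicts with eventTime and imageId keys, and that the approved AMI IDs have already been read from the /golden-amis/{family}/latest parameters (that fetch is not shown). Rows with no AMI ID are flagged for manual review, since launch-template launches may not carry one.

```python
def off_allowlist_launches(launches, golden_ami_ids):
    """Return (event_time, image_id) pairs that fall outside the golden set.

    image_id is None when the launch record carried no AMI ID.
    """
    flagged = []
    for row in launches:
        image_id = row.get("imageId")
        if not image_id:
            # AMI probably came from a launch template; resolve it by hand.
            flagged.append((row.get("eventTime"), None))
        elif image_id not in golden_ami_ids:
            flagged.append((row.get("eventTime"), image_id))
    return flagged
```

Consider keeping the previous few golden AMI IDs in the allow-list, not only the latest. Otherwise launches during a rollout window would page spuriously.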

Alert threshold

Any production launch from an AMI ID outside the golden-AMI allow-list — page immediately; the launched instance lacks the org's baked-in hardening (CloudWatch agent, SSM agent, CIS-Level-1 configuration) and represents a fleet posture deviation.
UpdateImagePipeline changing the recipe ARN outside a tracked change-management ticket — high-priority ticket within one business hour; the recipe defines which CIS controls bake into every downstream AMI.
Image Builder pipeline-execution failure persisting beyond two consecutive scheduled runs: informational. Escalate to high if the pipeline has not produced a fresh AMI in 14 days (a threshold sized for the weekly pipeline; scale it to the pipeline's schedule), since the org's patch SLA depends on the cadence.

Initial response

For unauthorised AMI launches, immediately isolate the instance per the EC2 isolation playbook (aws-ir-05-isolation-playbook-ec2). Swap its security groups for a containment security group with no inbound or outbound rules (security groups cannot express deny rules), and snapshot the EBS volumes before any further triage.
Restore the Image Builder pipeline's recipe from IaC. Either re-apply the pipeline definition, or run aws imagebuilder update-image-pipeline --image-pipeline-arn {arn} --image-recipe-arn {canonical-arn} --infrastructure-configuration-arn {infra-arn}; the infrastructure configuration ARN is a required parameter of the update. Then trigger an immediate one-shot build via start-image-pipeline-execution to refresh the golden AMI.
Open an incident via general/ir.html; the unauthorised AMI may have been deliberately crafted with a baked-in backdoor, so the snapshot from step 1 should be forensically imaged and the AMI itself analysed for any deviation from the public Amazon Linux 2023 / Ubuntu base manifests.

References

AWS EC2 Image Builder — pipelines reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI


aws-work-08-systems-manager-patch-manager ! MEDIUM DETECTIVE

Enable AWS Systems Manager Patch Manager on every EC2 instance: define a patch baseline (the approved-patch ruleset per OS family), define a maintenance window (the time-of-day envelope when patches are applied), attach instances via tag-based patch groups, and surface patch-compliance metrics to AWS Config (AWS Systems Manager — Patch Manager (accessed 2026-05)). The DETECTIVE typology is deliberate: Patch Manager reports the compliance state of every managed instance against the baseline, and the maintenance-window automation is the remediation pair. Severity MEDIUM because the control is operational hygiene rather than single-step exploitation prevention; pairs with aws-work-04 (Inspector flags the CVEs that the baseline must include) and aws-work-07 (golden AMIs handle initial-state hygiene; Patch Manager handles steady-state drift).

MITIGATES: Long-tail CVE exposure on instances that survive between AMI rebuilds, missing patches because no maintenance window ever ran, audit findings on "what is your patching SLA" with no defensible answer.

ATTACK VECTOR: A widespread Linux kernel CVE is announced (e.g. a privilege-escalation CVE rated CRITICAL by the distro vendor). Without Patch Manager, each team patches its own fleet: some within hours, others within weeks, and audit logs show 30% of the fleet unpatched at 30 days. With Patch Manager:
- the patch baseline auto-approves the new security advisory once its ApproveAfterDays delay elapses;
- the next maintenance-window run applies it across every managed instance;
- AWS Config records the compliance state;
- the Security Hub finding tracks the SLA of 95% of the fleet patched within 30 days.

BLAST RADIUS: Per patch group: every instance in the patch group inherits the baseline and the maintenance-window schedule; per OS family: separate baselines (Linux / Windows / macOS) avoid a one-size-fits-all rule.
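The 30-day / 95% SLA can be computed from per-instance install dates. The following is a minimal Python sketch. It assumes the SLA means at least 95% of instances have the patch installed within 30 days of the advisory's release. It also assumes an inventory mapping instance IDs to install dates, derived from Patch Manager compliance data (that extraction is not shown).

```python
from datetime import date, timedelta


def patch_sla(release, patched_on, window_days=30, target=0.95):
    """Return (fraction patched within window_days of release, whether target is met).

    patched_on maps instance ID -> date the patch was installed, or None if unpatched.
    """
    if not patched_on:
        # An empty inventory is treated as a reporting failure, not as compliance.
        return 0.0, False
    deadline = release + timedelta(days=window_days)
    on_time = sum(1 for d in patched_on.values() if d is not None and d <= deadline)
    fraction = on_time / len(patched_on)
    return fraction, fraction >= target
```

Unpatched instances (None) count against the SLA, exactly like instances patched after the deadline.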

Remediation — AWS CLI

# Create a patch baseline (Linux example).
aws ssm create-patch-baseline \
  --name AL2023-Security-Critical \
  --operating-system AMAZON_LINUX_2023 \
  --approval-rules '{"PatchRules":[{
      "PatchFilterGroup":{"PatchFilters":[
        {"Key":"CLASSIFICATION","Values":["Security"]},
        {"Key":"SEVERITY","Values":["Critical","Important"]}]},
      "ApproveAfterDays":3,
      "ComplianceLevel":"CRITICAL"}]}'

# Create a maintenance window (Sundays 03:00 UTC, 4-hour duration; no new tasks start in the final hour).
aws ssm create-maintenance-window \
  --name patch-prod-weekly \
  --schedule 'cron(0 3 ? * SUN *)' \
  --duration 4 --cutoff 1 \
  --allow-unassociated-targets

# Register patching task on the maintenance window.
aws ssm register-task-with-maintenance-window \
  --window-id mw-0abc \
  --task-type RUN_COMMAND \
  --task-arn AWS-RunPatchBaseline \
  --targets Key=tag:PatchGroup,Values=prod \
  --task-invocation-parameters 'RunCommand={Parameters={Operation=[Install]}}'
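Duration and cutoff are easy to confuse. Duration is how long the window stays open; cutoff is how many hours before the end Systems Manager stops starting new tasks. The sketch below computes the timing of the next weekly window for the Sunday 03:00 UTC schedule. It assumes a hypothetical helper, `nextWeeklyWindow`, that understands only weekly `cron(0 H ? * DAY *)` expressions evaluated in UTC. It is not a general SSM cron parser.

```typescript
// Hypothetical helper: models only weekly schedules of the form
// cron(0 H ? * DAY *) in UTC. Not a general SSM cron parser.

interface WindowRun {
  start: Date;         // window opens; tasks may start
  lastTaskStart: Date; // end - cutoff: no new tasks are started after this
  end: Date;           // start + duration: window closes
}

const HOUR_MS = 60 * 60 * 1000;

function nextWeeklyWindow(
  now: Date,
  dayOfWeekUtc: number, // 0 = Sunday ... 6 = Saturday (Date#getUTCDay convention)
  hourUtc: number,
  durationHours: number,
  cutoffHours: number,
): WindowRun {
  if (cutoffHours >= durationHours) {
    throw new Error("cutoff must be shorter than duration");
  }
  // Start from today's slot at the scheduled hour.
  const start = new Date(Date.UTC(
    now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hourUtc,
  ));
  // Move forward to the requested weekday.
  const daysAhead = (dayOfWeekUtc - start.getUTCDay() + 7) % 7;
  start.setUTCDate(start.getUTCDate() + daysAhead);
  // If that slot has already opened, the next opening is a week later.
  if (start.getTime() <= now.getTime()) {
    start.setUTCDate(start.getUTCDate() + 7);
  }
  const end = new Date(start.getTime() + durationHours * HOUR_MS);
  const lastTaskStart = new Date(end.getTime() - cutoffHours * HOUR_MS);
  return { start, lastTaskStart, end };
}

// cron(0 3 ? * SUN *), duration 4, cutoff 1, evaluated on Wednesday 2026-05-06.
const run = nextWeeklyWindow(new Date("2026-05-06T12:00:00Z"), 0, 3, 4, 1);
console.log(run.start.toISOString());         // 2026-05-10T03:00:00.000Z
console.log(run.lastTaskStart.toISOString()); // 2026-05-10T06:00:00.000Z
console.log(run.end.toISOString());           // 2026-05-10T07:00:00.000Z
```

A patch run that needs close to the full window should therefore start early. With a 1-hour cutoff, only the first three hours of the 4-hour window accept new task invocations.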

Remediation — Terraform

# Terraform AWS provider ~> 5.0
# Source: AWS docs (accessed 2026-05)
resource "aws_ssm_patch_baseline" "al2023_security" {
  name             = "AL2023-Security-Critical"
  operating_system = "AMAZON_LINUX_2023"

  approval_rule {
    approve_after_days  = 3
    compliance_level    = "CRITICAL"
    patch_filter {
      key    = "CLASSIFICATION"
      values = ["Security"]
    }
    patch_filter {
      key    = "SEVERITY"
      values = ["Critical", "Important"]
    }
  }
}

resource "aws_ssm_patch_group" "prod_linux" {
  baseline_id = aws_ssm_patch_baseline.al2023_security.id
  patch_group = "prod"
}

resource "aws_ssm_maintenance_window" "patch_prod" {
  name     = "patch-prod-weekly"
  schedule = "cron(0 3 ? * SUN *)"
  duration = 4
  cutoff   = 1
}

resource "aws_ssm_maintenance_window_target" "prod" {
  window_id     = aws_ssm_maintenance_window.patch_prod.id
  resource_type = "INSTANCE"
  targets {
    key    = "tag:PatchGroup"
    values = ["prod"]
  }
}

resource "aws_ssm_maintenance_window_task" "patch_run" {
  window_id        = aws_ssm_maintenance_window.patch_prod.id
  task_arn         = "AWS-RunPatchBaseline"
  task_type        = "RUN_COMMAND"
  max_concurrency  = "20%"
  max_errors       = "5%"
  priority         = 1

  targets {
    key    = "WindowTargetIds"
    values = [aws_ssm_maintenance_window_target.prod.id]
  }

  task_invocation_parameters {
    run_command_parameters {
      parameter {
        name   = "Operation"
        values = ["Install"]
      }
    }
  }
}

Remediation — CloudFormation

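AWSTemplateFormatVersion: '2010-09-09'
Description: SSM Patch Manager baseline (Security patches rated Critical or Important) registered to the prod patch group; schedule it with a maintenance window.
Resources:
  ProdPatchBaseline:
    Type: AWS::SSM::PatchBaseline
    Properties:
      Name: AL2023-Security-Critical
      OperatingSystem: AMAZON_LINUX_2023
      # Instances tagged PatchGroup=prod are evaluated against this baseline.
      PatchGroups:
        - prod
      ApprovalRules:
        PatchRules:
          # Auto-approve matching patches 3 days after release, as in the CLI and Terraform examples.
          - ApproveAfterDays: 3
            PatchFilterGroup:
              PatchFilters:
                - Key: CLASSIFICATION
                  Values:
                    - Security
                - Key: SEVERITY
                  Values:
                    - Critical
                    - Important
            ComplianceLevel: CRITICAL
            EnableNonSecurity: false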

Compliance mapping

CIS AWS Foundations v7.0.0	CIS Microsoft Azure Foundations v6.0.0	CIS GCP Foundation v5.0.0	CIS OCI Foundation v3.1.0	NIST SP 800-53 rev5	ISO/IEC 27001:2022	ISO/IEC 27017:2015
(best-practices)	n/a	n/a	n/a	SI-2; SI-2(2); CM-3	A.8.8; A.8.9	CLD.12.4.5

Log signals

CloudTrail ssm:DeleteMaintenanceWindow events targeting the canonical patch-window resources: destroys the scheduled patch cadence, and instances stay at their current patch level until someone intervenes manually.
CloudTrail ssm:DeregisterTargetFromMaintenanceWindow where the deregistered target is the production instance-fleet tag — peels instances out of the patch scope one population at a time.
SSM DescribePatchGroupState output where InstancesWithMissingPatches, InstancesWithFailedPatches or InstancesWithCriticalNonCompliantPatches is non-zero on a patch group whose maintenance window has been disabled or removed: a passive signal of patch drift, and the measurable outcome of the regression.
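The passive drift signal can be evaluated directly from `aws ssm describe-patch-group-state` output. The sketch below models a subset of that JSON and flags drift when the group has non-compliant instances and no active window to fix them. It deliberately ignores InstancesWithNotApplicablePatches, which counts patches that do not apply to an instance rather than missing ones. The `windowActive` input is an assumption: the caller determines it separately, for example from the window's Enabled flag.

```typescript
// Subset of the JSON returned by `aws ssm describe-patch-group-state`.
interface PatchGroupState {
  Instances: number;
  InstancesWithMissingPatches: number;
  InstancesWithFailedPatches: number;
  InstancesWithCriticalNonCompliantPatches?: number;
  InstancesWithNotApplicablePatches?: number; // ignored: not a drift signal
}

// Drift: non-compliant instances exist AND nothing is scheduled to fix them.
function isPatchDrift(state: PatchGroupState, windowActive: boolean): boolean {
  const nonCompliant =
    state.InstancesWithMissingPatches > 0 ||
    state.InstancesWithFailedPatches > 0 ||
    (state.InstancesWithCriticalNonCompliantPatches ?? 0) > 0;
  return nonCompliant && !windowActive;
}

const prodState: PatchGroupState = {
  Instances: 40,
  InstancesWithMissingPatches: 6,
  InstancesWithFailedPatches: 0,
  InstancesWithCriticalNonCompliantPatches: 3,
};

console.log(isPatchDrift(prodState, false)); // true
console.log(isPatchDrift(prodState, true));  // false
```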

Query

fields @timestamp, eventName, requestParameters.windowId, requestParameters.windowTargetId, requestParameters.targets, userIdentity.arn
| filter eventSource = "ssm.amazonaws.com" and eventName in ["DeleteMaintenanceWindow","DeregisterTargetFromMaintenanceWindow","DeregisterTaskFromMaintenanceWindow","UpdateMaintenanceWindow","DeleteAssociation"]
| sort @timestamp desc
| limit 100

Pair the CloudWatch Logs Insights query with a daily completeness check. Use aws ssm describe-maintenance-windows to confirm each canonical window is present and enabled, and aws ssm describe-maintenance-window-targets to confirm it still has the expected number of registered targets.
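The completeness check can be a small pure function over the CLI's JSON output. The sketch below assumes the `WindowIdentities` array from `aws ssm describe-maintenance-windows` has already been parsed. It also assumes a per-window target count has been collected separately, for example from `aws ssm describe-maintenance-window-targets`. The `CanonicalWindow` spec type and `checkWindows` are hypothetical names for illustration.

```typescript
// Subset of one entry in WindowIdentities from `aws ssm describe-maintenance-windows`.
interface WindowIdentity {
  WindowId: string;
  Name: string;
  Enabled: boolean;
  Schedule?: string;
}

// Hypothetical spec for one canonical patch window, kept alongside the IaC.
interface CanonicalWindow {
  name: string;
  schedule: string;
  expectedTargets: number;
}

// Returns a list of problems; an empty list means every canonical window is healthy.
function checkWindows(
  canonical: CanonicalWindow[],
  identities: WindowIdentity[],
  targetCounts: Record<string, number>, // WindowId -> registered target count
): string[] {
  const problems: string[] = [];
  for (const spec of canonical) {
    const found = identities.find((w) => w.Name === spec.name);
    if (!found) {
      problems.push(`${spec.name}: missing`);
      continue;
    }
    if (!found.Enabled) problems.push(`${spec.name}: disabled`);
    if (found.Schedule !== spec.schedule) {
      problems.push(`${spec.name}: schedule drifted to ${found.Schedule ?? "none"}`);
    }
    const targets = targetCounts[found.WindowId] ?? 0;
    if (targets !== spec.expectedTargets) {
      problems.push(`${spec.name}: ${targets} targets, expected ${spec.expectedTargets}`);
    }
  }
  return problems;
}

const canonical: CanonicalWindow[] = [
  { name: "patch-prod-weekly", schedule: "cron(0 3 ? * SUN *)", expectedTargets: 1 },
];
const prodWindow: WindowIdentity = {
  WindowId: "mw-0abc",
  Name: "patch-prod-weekly",
  Enabled: false, // someone ran update-maintenance-window --no-enabled
  Schedule: "cron(0 3 ? * SUN *)",
};
const report = checkWindows(canonical, [prodWindow], { "mw-0abc": 1 });
console.log(JSON.stringify(report)); // ["patch-prod-weekly: disabled"]
```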

Alert threshold

Any DeleteMaintenanceWindow in production — page immediately; the patch cadence breaks at the moment of the delete and the CVE exposure window grows linearly with time.
DeregisterTargetFromMaintenanceWindow removing more than 10% of the fleet — high-priority ticket; an explicit operator choice that should map to a tracked change ticket, not a one-off CLI invocation.
Patch compliance below 95% for any patch-group for more than 14 days — informational hygiene ticket; promote to incident if the affected patch-group includes internet-facing workloads and the missing patches include Critical CVEs.
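The three thresholds above translate directly into a routing function. The sketch below is an illustration under stated assumptions. The `production` flag and `deregisteredInstances` count are hypothetical enrichment fields that the alerting pipeline would attach, because a raw CloudTrail DeregisterTargetFromMaintenanceWindow record does not state how many instances the removed target covered.

```typescript
type Action = "page" | "high-priority-ticket" | "hygiene-ticket" | "incident" | "none";

// Hypothetical enriched CloudTrail event.
interface SsmEvent {
  eventName: string;
  production: boolean;
  deregisteredInstances?: number; // instances covered by the removed target
}

function classifyEvent(e: SsmEvent, fleetSize: number): Action {
  // Any production window deletion pages immediately.
  if (e.eventName === "DeleteMaintenanceWindow" && e.production) return "page";
  // Removing more than 10% of the fleet from patch scope is a high-priority ticket.
  if (
    e.eventName === "DeregisterTargetFromMaintenanceWindow" &&
    fleetSize > 0 &&
    (e.deregisteredInstances ?? 0) / fleetSize > 0.1
  ) {
    return "high-priority-ticket";
  }
  return "none";
}

// Hypothetical per-patch-group compliance summary.
interface PatchGroupCompliance {
  compliantFraction: number;  // 0..1
  daysBelowThreshold: number; // consecutive days under 95%
  internetFacing: boolean;
  missingCriticalCves: boolean;
}

function classifyCompliance(g: PatchGroupCompliance): Action {
  // Only sustained (> 14 days) sub-95% compliance raises anything.
  if (g.compliantFraction >= 0.95 || g.daysBelowThreshold <= 14) return "none";
  // Promote to an incident for internet-facing groups missing Critical CVE fixes.
  return g.internetFacing && g.missingCriticalCves ? "incident" : "hygiene-ticket";
}

console.log(classifyEvent({ eventName: "DeleteMaintenanceWindow", production: true }, 200)); // page
```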

Initial response

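Restore the maintenance window from IaC by re-applying the Terraform or CloudFormation stack. Alternatively, recreate it manually with aws ssm create-maintenance-window using the canonical schedule expression. Then re-register the instance targets with register-target-with-maintenance-window and the AWS-RunPatchBaseline task with register-task-with-maintenance-window.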
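Trigger a one-shot patch sweep against the affected patch group to close the gap left by the missed scheduled window. AWS-RunPatchBaseline is a Command document, so run it with aws ssm send-command --document-name AWS-RunPatchBaseline --targets Key=tag:PatchGroup,Values=prod --parameters Operation=Install.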
Open an incident via general/ir.html if any production instance has missed two or more scheduled patch cycles. Enumerate the workload's exposed CVEs with aws inspector2 list-findings, filtering on the instance ID through the resourceId filter criterion. Any CVE with a known active exploit should drive immediate remediation rather than waiting for the next maintenance window.
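The "two or more missed cycles" trigger is a count of scheduled window openings since an instance last patched successfully. The sketch below assumes a hypothetical input: the timestamp of each instance's last successful AWS-RunPatchBaseline Install run, plus one known opening of the weekly window. For openings at anchor + kW, the number falling in (lastPatched, now] is floor((now - anchor)/W) - floor((lastPatched - anchor)/W).

```typescript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: counts weekly window openings (one known opening,
// repeating every 7 days) that fell in the interval (lastPatched, now].
function missedWeeklyCycles(anchorOpening: Date, lastPatched: Date, now: Date): number {
  if (now.getTime() <= lastPatched.getTime()) return 0;
  // Index of the last opening at or before t (negative before the anchor).
  const idx = (t: number): number => Math.floor((t - anchorOpening.getTime()) / WEEK_MS);
  return idx(now.getTime()) - idx(lastPatched.getTime());
}

// Incident trigger from the initial-response guidance: two or more missed cycles.
function needsIncident(anchorOpening: Date, lastPatched: Date, now: Date): boolean {
  return missedWeeklyCycles(anchorOpening, lastPatched, now) >= 2;
}

const sundayWindow = new Date("2026-05-03T03:00:00Z"); // an opening of cron(0 3 ? * SUN *)
const lastPatched = new Date("2026-05-03T04:00:00Z");  // last successful Install run

console.log(missedWeeklyCycles(sundayWindow, lastPatched, new Date("2026-05-20T12:00:00Z"))); // 2
```

Openings on 10 May and 17 May both passed without a successful run, so this instance meets the incident threshold.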

References

AWS Systems Manager — Patch Manager reference (accessed 2026-05)
Cross-provider equivalence: Azure · GCP · OCI

Sources