A workload, in this corpus, is any executable artefact a customer
runs against a cloud-provider compute substrate: a virtual machine
booted from an image, a container scheduled by Kubernetes or a
managed runtime, a function invoked by an event source, or a
managed-runtime application packaged behind a platform abstraction
such as AWS App Runner, Azure App Service, Google Cloud Run, or
Oracle Container Instances. The abstractions differ; the
principles that determine whether an attacker's
dropper, a stolen credential, or a malicious dependency turns
into customer data loss do not. Workload hardening reduces the
population of resources an attacker can land on, raises the cost
of code execution on those resources, and makes the resulting
activity visible to the detection pipeline documented on
the General logging and detection principles page.
This page treats six principles as the canonical workload-hardening
backbone — image and OS hardening, patch management, runtime
security, container-specific controls, serverless-specific
controls, and software supply chain integrity — plus a brief
cross-reference to secrets at runtime whose canonical treatment
lives on the
General IAM principles page.
Configuration knobs that implement each principle on AWS, Azure,
GCP, and OCI are deferred to the per-provider workload pages at
aws/workloads.html,
azure/workloads.html,
gcp/workloads.html, and
oci/workloads.html.
The threats these principles answer are catalogued on
the cloud threat model page;
the partition of responsibility between provider and customer
is established on
the shared responsibility model page.
Image / OS hardening
The base image is the cheapest place in a workload's life cycle
to remove attack surface. A custom AMI, managed image, custom
image, or custom OCI image that begins from a CIS Hardened Image
or an equivalent provider-blessed minimal image inherits the
configuration baselines codified in the CIS Benchmark for the
underlying operating system — locked-down sysctl, disabled
legacy services, sane file permissions, NIST SP 800-53 rev5
CM-6 (Configuration Settings) compliance — without the customer
re-deriving them. The CIS-CAT tool measures drift against the
benchmark and produces audit-grade reports; running it in the
image-build pipeline catches drift before it ships, not in
production.
Whether the base is a CIS Hardened Image or a custom build, the
surface area that ships with the image is the floor of the
surface area in production. Default accounts (root, ec2-user,
opc, azureuser) that are not strictly required should be
disabled or password-locked; an SSH daemon that is not strictly
required for a workload type (everything that is not a bastion,
and most container hosts) should be removed entirely rather than
firewalled. Removing unnecessary packages (compilers, debuggers,
interactive editors, package managers in production images)
shrinks both the attack surface and the toolkit available to any
lateral-movement attempt. The same logic applies to listening
network services: an image whose ss -lntp output is
minimal at boot has fewer paths a compromised neighbour can
probe.
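The listening-service check can be automated in the image-build pipeline. The sketch below is a minimal illustration, not a corpus-mandated tool: it assumes the usual column layout of ss -lntp output (state first, local address and port fourth), and the function name, the sample capture, and the allow-list are all hypothetical.

```python
from __future__ import annotations


def unexpected_listeners(ss_output: str, allowed_ports: set[int]) -> list[tuple[str, int]]:
    """Return (address, port) pairs that listen on ports outside the allow-list."""
    findings = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; this skips the header and blank lines.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Fourth column is "Local Address:Port", e.g. "0.0.0.0:22" or "[::]:443".
        address, _, port = fields[3].rpartition(":")
        if port.isdigit() and int(port) not in allowed_ports:
            findings.append((address, int(port)))
    return findings


# Hypothetical capture of `ss -lntp` from a freshly booted image.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      128          0.0.0.0:22        0.0.0.0:*     users:(("sshd",pid=812,fd=3))
LISTEN 0      511             [::]:443          [::]:*     users:(("nginx",pid=901,fd=6))
"""

if __name__ == "__main__":
    for address, port in unexpected_listeners(SAMPLE, allowed_ports={443}):
        print(f"unexpected listener on {address}:{port}")
```

Failing the build when the returned list is non-empty keeps the decision about each listener explicit: a new port has to be added to the allow-list on purpose.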
On top of the hardened image, three observability primitives
should be installed at build time, not bolted on at runtime:
host audit logging via auditd or its equivalent
(writing to the centralised logging substrate described on
the General logging and detection principles page);
a host inventory and configuration-introspection agent such as
osquery; and an endpoint detection-and-response sensor sized to
the workload's risk class. The image-build pipeline produces
provenance metadata — SBOM, signature, build attestation —
covered in §Supply chain security;
that metadata is the connective tissue between the hardened
image and the supply-chain controls that protect what is
inside it.
Patch management
Patch cadence is one of the few security metrics that is easy
to measure and difficult to argue with. The cadence this corpus
recommends — and which appears in CIS Controls v8 Safeguard 7.3
and 7.4 and in NIST SP 800-53 rev5 SI-2 (Flaw Remediation) — is
forty-eight to seventy-two hours for critical CVEs against
internet-facing or high-privilege workloads, and thirty days
for everything else. Faster is better; the floor exists to
make slipped cadences visible as findings rather than as
permanently absent controls.
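The cadence above reduces to a small deadline calculation. The following sketch assumes the upper bound of the critical window (seventy-two hours) as the enforcement point; the function names and the severity label are illustrative, not drawn from any provider's patch service.

```python
from datetime import datetime, timedelta, timezone

# Upper bound of the 48-72 hour window for critical CVEs on exposed workloads.
URGENT_SLA = timedelta(hours=72)
# Everything else falls under the thirty-day cadence.
DEFAULT_SLA = timedelta(days=30)


def patch_deadline(published: datetime, severity: str,
                   internet_facing: bool, high_privilege: bool) -> datetime:
    """Return the latest acceptable patch time for one CVE on one workload."""
    urgent = severity == "critical" and (internet_facing or high_privilege)
    return published + (URGENT_SLA if urgent else DEFAULT_SLA)


def is_overdue(deadline: datetime, now: datetime) -> bool:
    """A slipped cadence becomes a finding once the deadline has passed."""
    return now > deadline


if __name__ == "__main__":
    published = datetime(2024, 1, 1, tzinfo=timezone.utc)
    deadline = patch_deadline(published, "critical", internet_facing=True, high_privilege=False)
    print(deadline.isoformat(), is_overdue(deadline, datetime.now(timezone.utc)))
```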
Each major provider ships a cloud-native patch service that
removes the operational excuse of "we lacked tooling." AWS
Systems Manager Patch Manager orchestrates baseline selection,
maintenance windows, and compliance reporting across EC2 and
on-premises hybrid fleets. Azure Update Manager (the Arc-aware
successor to Update Management) drives patch scans and
deployments across Azure VMs and Arc-enabled servers from a
single plane. Google Cloud OS Patch Management (under VM
Manager) covers Compute Engine instances with patch
deployments and compliance reports. Oracle OS Management Hub
provides the equivalent across OCI Compute and on-premises
hosts. Adopting one of these — and routing its compliance
status into the security findings substrate, not into an
email folder — is the cheapest path to a measurable patch
posture.
Patch compliance reporting belongs in the security findings
pane (AWS Security Hub, Microsoft Defender for Cloud, Google
Security Command Center, OCI Cloud Guard) alongside
configuration and threat findings. When a critical CVE
publishes, the question "how many of our workloads are
exposed?" should be answerable in minutes, not days. Where
patching is impossible — third-party appliances, embedded
firmware, fragile legacy applications — compensating controls
(network isolation, EDR-enforced behavioural detection, more
aggressive logging) become explicit rather than implicit.
Runtime security
Patching addresses known flaws; runtime security addresses the
space between disclosure and patch and the long tail of
misuse, abuse, and zero-day exploitation that no patch
schedule closes. On virtual machines, an endpoint
detection-and-response sensor — Microsoft Defender for
Endpoint, Amazon GuardDuty Runtime Monitoring, Google Cloud
Security Command Center virtual machine threat detection,
Oracle Cloud Guard Instance Security — provides behavioural
telemetry (process trees, network connections, file integrity)
and a detection signal far more durable than antivirus
signatures. The sensor should be installed in the base image
(image-time, not runtime), should report into the same
centralised security findings substrate as configuration
findings, and should have its own alerting on health: a
missing or muted EDR sensor is itself an incident-worthy
condition.
Containers run too briefly and at too high a density for
per-host EDR to be the only line of defence. Container runtime
security tooling watches kernel-level syscalls and container
life-cycle events instead. Falco, the CNCF runtime-security
project, expresses detections as YAML rules over syscall and
Kubernetes audit events. Provider equivalents and integrations
include Microsoft Defender for Containers (with its built-in
runtime threat detection for AKS, EKS, and GKE), Google
Kubernetes Engine Security Posture and GKE runtime threat
detection, and Oracle Cloud Guard with the Kubernetes engine
agent. The detections worth wiring up first are the ones
covered by NIST SP 800-190 §4 — shell spawned in a production
container, unexpected outbound network from a workload pod,
mount of the host filesystem, escalation via a writable
/etc — because they are observable, low-noise,
and high-signal.
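The logic behind those four first-wave detections can be sketched in a few lines. This is not Falco rule syntax (Falco rules are YAML); it is a Python illustration over a hypothetical, simplified event dictionary whose keys (kind, binary, dest_ip, source_is_host_path, path) are assumptions made for the example.

```python
from __future__ import annotations

import ipaddress

SHELL_BINARIES = {"sh", "bash", "dash", "ash", "zsh"}


def evaluate(event: dict, allowed_egress: list[str]) -> list[str]:
    """Return alert names raised by one container runtime event."""
    alerts = []
    kind = event.get("kind")
    if kind == "exec" and event.get("binary") in SHELL_BINARIES:
        alerts.append("shell spawned in container")
    elif kind == "connect":
        dest = ipaddress.ip_address(event["dest_ip"])
        # Alert when the destination falls outside every approved CIDR block.
        if not any(dest in ipaddress.ip_network(cidr) for cidr in allowed_egress):
            alerts.append("unexpected outbound connection")
    elif kind == "mount" and event.get("source_is_host_path"):
        alerts.append("host filesystem mounted")
    elif kind == "open_write" and event.get("path", "").startswith("/etc/"):
        alerts.append("write under /etc")
    return alerts


if __name__ == "__main__":
    print(evaluate({"kind": "exec", "binary": "bash"}, allowed_egress=["10.0.0.0/8"]))
```

Each predicate is cheap and has few legitimate triggers in production, which is what makes this set a sensible starting point before broader behavioural content is added.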
Behavioural detection is also worth more than signature
detection on cloud workloads because attackers increasingly
live off the land: legitimate aws sts get-caller-identity
calls from an unusual instance, a kubectl exec
from a service-account that has never exec'd before, an
outbound DNS request to a never-resolved domain. Signatures
catch the previous campaign; behaviour catches the next one.
Detection coverage maps cleanly onto the MITRE ATT&CK Cloud
and Containers matrices, which are the recommended baseline
for runtime-detection content per
General logging and detection principles.
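The "never done this before" examples above share one mechanism: a first-seen check over (principal, action) pairs. A minimal in-memory sketch follows; the class name and the identifier strings are hypothetical, and a production detector would persist its baseline and age entries out rather than hold them in a set.

```python
from __future__ import annotations


class FirstSeenDetector:
    """Flags a (principal, action) pair the first time it is observed."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def learn(self, principal: str, action: str) -> None:
        """Record known-good behaviour during a baseline period."""
        self._seen.add((principal, action))

    def observe(self, principal: str, action: str) -> bool:
        """Return True when the pair is novel, then remember it."""
        key = (principal, action)
        novel = key not in self._seen
        self._seen.add(key)
        return novel


if __name__ == "__main__":
    detector = FirstSeenDetector()
    detector.learn("serviceaccount/deployer", "pods/exec")
    # A service account that has never exec'd before is the interesting case.
    print(detector.observe("serviceaccount/metrics", "pods/exec"))
```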
Container-specific
Containers add four hardening principles on top of the
general workload baseline: image provenance, admission
control, pod-level least privilege, and namespace and network
isolation. NIST SP 800-190 (Application Container Security
Guide) is the canonical reference for the first three; the
fourth overlaps with
the General network principles page.
Image provenance answers the question "did we, in fact, build
and approve this image?" The answer is a cryptographic
signature produced at build time and verified at deploy time.
Sigstore's cosign, the Notary v2 / Notation toolchain, and
AWS Signer all sign container images using short-lived
identities (OIDC-bound, in the cosign keyless model) or
long-lived KMS-backed keys. The signature is attached to the
image (as an OCI artefact in the registry) and verified by
the cluster before the image runs. Without verified
provenance, the supply chain ends at the registry — and any
attacker who can write to the registry can replace what runs.
Admission control is the enforcement point that consumes the
provenance signal. Kubernetes admission controllers (Kyverno,
OPA Gatekeeper, Google Binary Authorization, Azure Policy for
AKS, the AWS GuardDuty / Defender for Containers admission
integrations on EKS) evaluate every pod, deployment, and
custom resource against a policy set before it is admitted to the
cluster. The minimum useful policy set rejects unsigned images,
rejects images from registries outside an allow-list,
rejects containers running as root, rejects writable root
filesystems, rejects pods that mount the host filesystem or
escalate privileges, and rejects pods that grant themselves
excessive capabilities. The illustrative control in
§Illustrative control
expands this pattern in full DS-05 markup.
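The minimum policy set can be read as a list of predicates over a pod spec. The sketch below is an illustration in Python rather than Kyverno or Gatekeeper policy; the pod-spec field names follow the Kubernetes API, while the function name and the signed_digests set (a stand-in for real signature verification) are assumptions.

```python
from __future__ import annotations


def violations(pod_spec: dict, allowed_registries: set[str],
               signed_digests: set[str]) -> list[str]:
    """Return the reasons an admission controller would reject this pod spec."""
    problems = []
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name', '?')}: mounts the host filesystem")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        image = container.get("image", "")
        registry = image.split("/", 1)[0]
        _, _, digest = image.partition("@")   # empty when the image is tag-only
        ctx = container.get("securityContext", {})
        if registry not in allowed_registries:
            problems.append(f"{name}: registry {registry!r} is not allow-listed")
        if digest not in signed_digests:
            problems.append(f"{name}: image has no verified signature")
        if ctx.get("runAsNonRoot") is not True:
            problems.append(f"{name}: may run as root")
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: root filesystem is writable")
        if ctx.get("privileged") or ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: privilege escalation is possible")
        if ctx.get("capabilities", {}).get("add"):
            problems.append(f"{name}: adds Linux capabilities")
    return problems


if __name__ == "__main__":
    risky = {"containers": [{"name": "debug", "image": "docker.io/library/busybox:latest"}]}
    for reason in violations(risky, {"registry.example.com"}, set()):
        print(reason)
```

The checks default to rejection: a missing securityContext field counts as a violation, so a pod has to opt in to the safe settings explicitly.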
Pod-level least privilege is the in-cluster mirror of IAM
least privilege: containers should run as a non-root UID,
with a read-only root filesystem, with the Linux capabilities
set dropped to the minimum the workload actually requires
(most workloads need none), with seccomp profile
RuntimeDefault or stricter, and with no
privileged: true, no hostNetwork,
no hostPID, no host-path volumes. Combined with
Kubernetes NetworkPolicy and provider VPC primitives
(cross-linked from
network principles),
these settings shrink the blast radius of a single
compromised pod from "the cluster" to "the pod."
Serverless-specific
Serverless functions — AWS Lambda, Azure Functions, Google
Cloud Run functions, OCI Functions — eliminate host
patching and image hardening as customer concerns but
introduce three workload-specific hardening principles of
their own. First, the function execution role is the
credential the function holds for every invocation; it is
the single highest-value secret in the function's life
cycle and must be scoped to the function's actual data and
API needs per the least-privilege principle on
the General IAM principles page.
A function whose execution role grants * on
DynamoDB or Storage is a one-shot data-exfiltration tool
for anyone who triggers it with malicious input.
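Over-broad execution roles can be caught before deployment with a simple lint over the policy document. The sketch assumes the AWS IAM JSON policy shape (Statement, Effect, Action, Resource); the function name is hypothetical, and the check is deliberately coarse, since a few actions legitimately require a wildcard resource and would need an exception list.

```python
from __future__ import annotations


def _as_list(value) -> list:
    """IAM allows a single value or a list in most fields; normalise to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant a wildcard action or resource."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        if broad_action or "*" in resources:
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}],
    }
    print(len(wildcard_statements(policy)), "over-broad statement(s)")
```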
Second, environment variables are not a secrets store.
Function environment variables are visible to anyone with
read access to the function configuration (and, in some
providers, to anyone with read access to the deployment
metadata in the platform's audit trail). Secrets belong in
the provider secret store (AWS Secrets Manager, Azure Key
Vault, Google Secret Manager, OCI Vault), fetched at
invocation time with the execution role acting as the
authoriser; canonical treatment lives in
General IAM principles §Secrets management.
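Fetching at invocation time need not mean one secret-store call per request. The sketch below wraps a client in a short-lived cache; it assumes a client object exposing get_secret_value(SecretId=...) that returns a dictionary with a SecretString key, which is the shape of the boto3 Secrets Manager client. The class name and the five-minute TTL are illustrative choices.

```python
from __future__ import annotations

import time


class SecretCache:
    """Fetches secrets through the execution role and caches them briefly."""

    def __init__(self, client, ttl_seconds: float = 300.0) -> None:
        self._client = client
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, str]] = {}

    def get(self, secret_id: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(secret_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # The call is authorised by the function's execution role, not a static key.
        response = self._client.get_secret_value(SecretId=secret_id)
        value = response["SecretString"]
        self._cache[secret_id] = (now, value)
        return value


# Usage inside a function handler (requires boto3 in the runtime):
#   import boto3
#   secrets = SecretCache(boto3.client("secretsmanager"))
#   password = secrets.get("prod/orders/db-password")   # hypothetical secret name
```

A short TTL keeps rotation effective: a rotated secret is picked up within one TTL window without redeploying the function.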
Third, functions that interact with private data planes
(databases, internal APIs) should be attached to a VPC, VNet,
or VCN with controlled egress; functions that do not need
internet egress should not have it. The cold-start identity
window — the moment the platform assumes the execution role
for the first invocation — is also the moment any platform
compromise would be most visible, which is why audit-log
coverage of function invocations and role assumptions is
non-negotiable.
Supply chain security
Software supply chain attacks — SolarWinds, the
xz-utils backdoor, malicious npm typosquats, compromised
GitHub Actions runners — are by now a mature category, not
an edge case. The defensive baseline is codified across
three publications that should be read as a set: NIST SSDF
SP 800-218 (Secure Software Development Framework), which
enumerates the practices every producer of software should
adopt; NIST SP 800-204D, which adapts SSDF for CI/CD
environments; and the SLSA framework (Supply-chain Levels
for Software Artifacts), which formalises a maturity ladder
from SLSA 1 (build provenance exists) through SLSA 4
(hermetic, reproducible, two-person-reviewed builds). That
four-level ladder follows SLSA v0.1; SLSA v1.0 reorganised
the levels into tracks, and its Build track stops at
Level 3. The specification is community-maintained and
versions frequently; verify the current level definitions
at writing time against the slsa.dev specification page.
The artefact that ties supply-chain controls together is the
software bill of materials. CycloneDX and SPDX are the two
standards in use; either is acceptable, but both should be
consumable by downstream tooling. CISA's SBOM guidance and
the NTIA "Minimum Elements for a Software Bill of Materials"
(July 2021) define what a usable SBOM contains: supplier,
component name, version, unique identifiers, dependency
relationships, author of SBOM data, and timestamp. SBOMs
should be generated during the build (not after, not
on demand), signed alongside the artefact, and stored
in a queryable substrate so that "are we exposed to
CVE-YYYY-NNNNN?" becomes a SQL query rather than a fire
drill.
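The exposure question can be sketched against a single CycloneDX JSON document; a queryable store would run the same lookup across every stored SBOM. The example assumes the CycloneDX components array with name, version, and purl fields. The sample SBOM is invented for illustration, and the vulnerable versions shown are those of the xz-utils backdoor (CVE-2024-3094).

```python
from __future__ import annotations

import json


def affected_components(sbom_json: str, package: str,
                        vulnerable_versions: set[str]) -> list[str]:
    """Return identifiers of components matching a vulnerable package version."""
    bom = json.loads(sbom_json)
    return [
        component.get("purl") or f"{component['name']}@{component['version']}"
        for component in bom.get("components", [])
        if component.get("name") == package
        and component.get("version") in vulnerable_versions
    ]


# Invented sample SBOM for one image; real documents carry many more fields.
SAMPLE_SBOM = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "xz-utils", "version": "5.6.1",
         "purl": "pkg:deb/debian/xz-utils@5.6.1"},
        {"type": "library", "name": "openssl", "version": "3.0.13"},
    ],
})

if __name__ == "__main__":
    print(affected_components(SAMPLE_SBOM, "xz-utils", {"5.6.0", "5.6.1"}))
```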
Dependency scanning is the operational layer below SBOM.
GitHub Dependabot, Snyk, Sonatype, Amazon Inspector v2
(which scans ECR images and Lambda functions), Microsoft
Defender for Cloud's vulnerability assessment for
containers, Google Artifact Registry vulnerability
scanning, and Oracle Vulnerability Scanning Service all
surface known-vulnerable dependencies against published
CVE feeds. The value of these tools is proportional to
how quickly findings reach a developer who can fix them;
a 14-day SLA on dependency findings in production
artefacts is a reasonable starting baseline, tighter for
high-privilege internet-facing services.
Build-system isolation closes the last gap. CI/CD
runners that have access to production credentials or
signing keys are themselves part of the supply chain;
they should run on ephemeral, locked-down infrastructure,
authenticate to clouds via short-lived OIDC tokens (AWS
IAM Roles for GitHub Actions, Azure workload identity
federation, GCP Workload Identity Federation, OCI
identity federation) rather than long-lived static keys,
and emit verifiable build provenance attestations
alongside the artefacts they produce.
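A build provenance attestation is, at its core, a statement binding an artefact digest to the builder that produced it. The sketch below assembles an unsigned statement whose field names follow the in-toto Statement v1 and SLSA provenance v1 layouts as best understood here; treat the exact schema as an assumption to verify against the current specifications. The function name and all argument values are hypothetical, and a real pipeline would wrap the statement in a signed envelope.

```python
from __future__ import annotations

import hashlib
import json


def provenance_statement(artifact_name: str, artifact_bytes: bytes,
                         builder_id: str, build_type: str, source_uri: str) -> dict:
    """Build an unsigned provenance statement for one artefact."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject binds the claim to exact bytes, not to a mutable name or tag.
        "subject": [{"name": artifact_name, "digest": {"sha256": digest}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                "externalParameters": {"source": source_uri},
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


if __name__ == "__main__":
    statement = provenance_statement(
        "app.tar.gz", b"example artefact bytes",
        builder_id="https://ci.example.com/runners/ephemeral",      # hypothetical
        build_type="https://ci.example.com/build-types/container",  # hypothetical
        source_uri="git+https://git.example.com/team/app",          # hypothetical
    )
    print(json.dumps(statement, indent=2))
```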
[Diagram placeholder]
Figure 2 — SLSA levels 1 through 4 mapped against
controls: L1 build provenance generated; L2
authenticated provenance with hosted build service; L3
non-falsifiable provenance with isolated, ephemeral
build; L4 reproducible, two-party-reviewed hermetic
build. SBOM generation, signing, and admission-control
verification are layered alongside the SLSA progression.
Secrets in workloads
Workloads consume credentials: database passwords, API
keys for downstream services, signing keys, OAuth client
secrets. The canonical treatment of secrets — including
the rotation cadence, provider-store comparison
(AWS Secrets Manager, Azure Key Vault, GCP Secret Manager,
OCI Vault), workload-identity-federation patterns that
eliminate static keys, and the static-key-elimination
requirement — lives on
the General IAM principles page §Secrets management
rather than being duplicated here. This page reiterates
the three workload-side rules: secrets do not live in
source repositories, container images, function
environment variables, or CI logs; secrets are fetched
from a provider secret store at runtime by an identity
(instance profile, managed identity, service account,
resource principal) whose privilege is narrow; and
secret access is itself an audit event worth alerting
on. See
the General incident response principles page
for the credential-isolation containment pattern that
consumes secret-access telemetry during a live incident.
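The first rule, that secrets do not live in source or CI logs, is commonly backed by a pattern scan in the pipeline. The sketch below is a deliberately small illustration with two patterns (the well-known AWS access key ID prefix format and PEM private-key headers); the function name is hypothetical, and a dedicated secret scanner covers far more credential types.

```python
from __future__ import annotations

import re

PATTERNS = {
    # AWS access key IDs: a four-letter prefix followed by 16 upper-case alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspected secret."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = 'region = "eu-west-1"\naccess_key = "AKIAIOSFODNN7EXAMPLE"\n'
    for number, name in scan(sample):
        print(f"line {number}: possible {name}")
```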
Cross-provider equivalence
The four major providers cover the same hardening
principles with different products. The table below is a
quick-reference for "where do I look in provider X for
control Y?" — not a compliance crosswalk. Per-provider
depth lives in the workload pages of each provider's
domain section, which document the configuration knobs,
CLI invocations, and Terraform resources that this
general page intentionally elides.
Principle | AWS | Azure | GCP | OCI
EDR / VM runtime detection | GuardDuty Runtime Monitoring + Inspector v2 | Microsoft Defender for Servers (Plan 2) | Security Command Center VM Threat Detection | Oracle Cloud Guard Instance Security
Container registry vulnerability scan | Amazon Inspector v2 (ECR) | Microsoft Defender for Containers | Artifact Registry vulnerability scanning | OCI Vulnerability Scanning Service (Container Registry)
Admission control for signed images | No native control plane; OPA / Kyverno on EKS, plus Defender for Containers admission integration | Azure Policy for AKS + Defender for Containers admission | |
Illustrative control
The control below illustrates the canonical
<article class="control-box"> markup as
it appears across the corpus. It is provider-neutral and
intended to be read as a worked example rather than as a
directly-applicable recommendation; each provider's
workloads page restates the same intent with provider-specific
CLI and IaC. The control mitigates supply-chain image
substitution attacks — an adversary with write access to a
container registry replaces a legitimate image, or pushes a
new tag, expecting the cluster to pull it unchecked. The
attack chain is enumerated in
the cloud threat model page
under software-supply-chain adversary classes.
gen-work-ex-01
Enforce signed container images via admission control
⚠ HIGH · PREVENTIVE
The control admits only images whose digest is signed by a
trusted key (cosign keyless via OIDC, AWS Signer KMS-backed
key, Google Binary Authorization attestor, or equivalent)
and whose signature chain resolves to a build pipeline the
organisation controls. Tag-only references are rejected;
references must resolve to immutable digests before signature
verification. The policy is enforced at admission, not at
scheduling, so a pod spec that references a rejected image is
never persisted to the cluster.
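The digest-before-signature ordering can be sketched as a two-step decision. This is an illustration of the control's intent, not any admission controller's implementation: the function name is hypothetical, and trusted_signed_digests stands in for the result of real signature verification against the organisation's trusted keys.

```python
from __future__ import annotations

import re

# An immutable reference: repository, "@", then a sha256 digest of 64 hex characters.
DIGEST_REFERENCE = re.compile(r"(?P<repo>[^@\s]+)@(?P<digest>sha256:[0-9a-f]{64})")


def admit(image_ref: str, trusted_signed_digests: set[str]) -> tuple[bool, str]:
    """Decide whether an image reference may be admitted, with a reason."""
    match = DIGEST_REFERENCE.fullmatch(image_ref)
    if match is None:
        # Tags are mutable, so a signature check on a tag proves nothing durable.
        return False, "image must be referenced by an immutable sha256 digest"
    if match.group("digest") not in trusted_signed_digests:
        return False, "no trusted signature covers this digest"
    return True, "admitted"


if __name__ == "__main__":
    print(admit("registry.example.com/team/api:latest", set()))
```

Checking the reference form first means an attacker who pushes a new tag gains nothing: the tag is never evaluated, and a substituted image has a different digest that no trusted signature covers.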
Compliance mappings follow the canonical seven-column
framework header per
docs/control-template.md;
the cell content below names benchmark recommendations whose
exact numbering must be verified against the pinned version
in use, per the corpus's pinned-version contract documented
on
the compliance frameworks page.