This page covers Google Cloud Platform workload hardening across four execution surfaces — Compute Engine virtual machines, Google Kubernetes Engine clusters (Autopilot and Standard), Cloud Run services, and Cloud Functions — and the supply chain that produces the artefacts those workloads run. Scope is the commercial GCP regions; GCP Sovereign Cloud (formerly Assured Workloads and the Google Cloud Air-Gapped offering) inherits the same controls but exposes a different region table and a different service-availability matrix — re-verify against the relevant cloud.google.com sovereign endpoint documentation before applying the IaC below to a sovereign or air-gapped deployment. CIS sub-IDs throughout this page reference the CIS Google Cloud Platform Foundation Benchmark v4.0.0 — May 2025 release (accessed 2026-05) unless explicitly annotated as a post-v4.0.0 best-practice recommendation that the current benchmark has not yet codified. The cross-cutting principles — image / OS hardening, patch management, runtime security, container-specific concerns, serverless-specific concerns, supply chain, and secrets in workloads — are owned by the General Workloads page; this page maps them to GCP primitives. The canonical secrets-management treatment lives on the General IAM page; gcp-work-05 cross-links to it rather than re-authoring.
The GCP workloads model layers four product families. Compute Engine exposes virtual machines whose firmware (UEFI + measured boot via vTPM), boot integrity (Secure Boot policy), and runtime integrity (kernel-measurement attestation) are configured per-instance and gated by the organisation-level constraints/compute.requireShieldedVm Org Policy. Google Kubernetes Engine (GKE) runs Kubernetes clusters in two flavours — Autopilot (Google-operated node pool, no node access; reduces control-plane operator burden) and Standard (customer-operated node pools; supports privileged DaemonSets, custom OS images, GPU pools) — and gates pod identity via Workload Identity Federation for GKE using the current --workload-pool=PROJECT_ID.svc.id.goog flag (the legacy identity-namespace flag form is deprecated). Cloud Run is the managed-container serverless surface; it accepts container images from Artifact Registry, runs them under a per-service service account, and exposes ingress that must be restricted away from all. Cloud Functions shares the same Cloud Run substrate in its 2nd-generation form. The supply chain is anchored by Artifact Registry (Docker, Maven, npm, Python, Go, Yum/Apt; CMEK-encryptable) with the Container Analysis API performing CVE + OS-package vulnerability scanning, and Binary Authorization as the admission-policy layer that requires attestations from cryptographically-identified attestors before an image can be deployed to GKE or Cloud Run. VM Manager ties the steady-state observability and patch loops together: OS Config Inventory, OS Config Compliance (reporting to Security Command Center), and OS Config Patch Deployments. The cross-cutting severity rubric applies; equivalence callouts at the bottom of each control point at the matching control on the AWS, Azure, and OCI sibling pages.
Four anti-conflation callouts up front, because each pair gets conflated in audit reports and architecture reviews and the distinction matters for control design. First: GKE is presented as a single umbrella control (gcp-work-06), not split by mode. Workload Identity Federation for GKE, private cluster topology (private endpoint + master-authorized networks), Binary Authorization integration, GKE Dataplane V2 (Cilium-based eBPF with NetworkPolicy default-deny), Shielded GKE Nodes, and the Autopilot-vs-Standard authoring choice all sit inside that one control body. Autopilot eliminates control-plane operator burden but constrains node-level customisation; Standard is required for privileged DaemonSets, custom OS images, GPU pools, and any workload that needs direct node access. The choice is a deployment-mode decision, not a different control surface. sibling-anchors.tsv pre-locks an 8-control count for this page and splitting GKE would inflate it without adding pedagogical value — the same umbrella decision applies on the EKS (Phase 6) and AKS (Phase 7) pages.
Second: Artifact Registry vulnerability scanning and Binary Authorization are one workflow, one control (gcp-work-03). The Container Analysis API automatically scans every image push for CVE matches against the upstream OS-package indexes and Go / Python / Java / Node.js language dependencies; the results are persisted as Note + Occurrence resources tied to the image digest. Binary Authorization then evaluates a per-cluster (or per-service) admission policy at deploy time; that policy requires attestations from named attestors — typically encoding "image built by the trusted CI/CD pipeline" and "image passed CVE scan with no CRITICAL/HIGH unfixed". Sigstore cosign produces the keyless or KMS-backed signatures that those attestors consume; SLSA Build Level 3 is the current Google-recommended supply-chain bar (in-toto attestation generated by a hardened, isolated builder). Scanner + policy + attestation is a single supply-chain story; one control, two-part body.
Third: VM Manager has two distinct review workflows on this page (gcp-work-04 and gcp-work-08) even though they share the same product. gcp-work-04 covers OS Config Inventory (what is installed) + OS Config Compliance (whether the configuration matches policy) and the resulting vulnerability-report channel into Security Command Center: the detection and posture side of VM Manager. gcp-work-08 covers OS Config Patch Deployments and Patch Policies, the patch-application side: how the organisation rolls out CVE remediations on a recurring schedule with maintenance windows and rolling-restart semantics. They mirror the Phase 6 Inspector + Systems Manager Patch Manager two-surface split, and the Phase 7 Defender for Servers + Update Manager split: the same product, two distinct review rituals.
Fourth: Cloud Run services run under a dedicated service account, NOT the default Compute SA (gcp-work-05). The Compute Engine default service account ships with roles/editor across the project, so a Cloud Run service running under that identity has project-wide edit on every resource, every secret, every IAM binding. Each Cloud Run service must have its own least-privileged service account; ingress must be restricted to internal-and-cloud-load-balancing or internal for any service that does not legitimately serve the public internet; secrets must arrive via Secret Manager references (run.googleapis.com/secrets annotation or --set-secrets binding), never via plaintext environment variables in the service revision; and the IAM invoker binding must enumerate explicit principals, never allUsers (anonymous) or allAuthenticatedUsers (any Google account on the internet), unless the service is genuinely intended to be public. The control cross-links to the canonical secrets-management treatment on the General IAM page rather than re-authoring the reference architecture here.
Order and scope matter. Controls 01–02 are foundational invariants enforced organisation-wide via Org Policy and instance metadata: every Compute Engine instance gets Shielded VM (Secure Boot + vTPM + Integrity Monitoring) and is reachable only through OS Login + IAP, with no metadata-based SSH keys and no external IPs on workload VMs. Controls 03–04 close the steady-state observability loop: every image is scanned and admission-gated, every running VM reports inventory and compliance. Controls 05–06 harden the serverless and Kubernetes execution surfaces. Controls 07–08 close the supply-chain and patch loops at maturity.
gcp-work-01-shielded-vm (CRITICAL, PREVENTIVE)
Enforce Shielded VM (Secure Boot + virtual Trusted Platform Module + Integrity Monitoring) on every Compute Engine instance organisation-wide via the constraints/compute.requireShieldedVm Org Policy constraint. Shielded VM raises the boot-firmware bar — UEFI firmware with Microsoft-signed root certificate, measured boot recorded into a vTPM, integrity-monitoring baselines verified against the running guest's PCRs — and refuses to boot images whose bootloader or kernel modules fail signature verification (Google Cloud — Shielded VM documentation (accessed 2026-05)). The principle is reinforced in General Workloads — image / OS hardening: an instance whose pre-OS attack surface is not measured cannot reason about whether it has been rootkitted before the operating system even started. Confidential VM (AMD SEV / SEV-SNP, Intel TDX) is called out as the upgrade path for regulated workloads — it adds memory encryption with attestation against the platform — but it is not a replacement for the Shielded VM baseline. Confidential VM is layered on top: --confidential-compute --maintenance-policy=TERMINATE requires Shielded VM features to be enabled. CRITICAL because a non-Shielded VM accepting an unsigned bootloader or a tampered kernel is the canonical rootkit-survives-reboot scenario; CIS GCP v4.0.0 §4 codifies the requirement.
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: enforce constraints/compute.requireShieldedVm at the organisation scope.
cat > require-shielded-vm.yaml <<'YAML'
name: organizations/ORG_ID/policies/compute.requireShieldedVm
spec:
  rules:
    - enforce: true
YAML
# The target organisation is taken from the name field inside the policy file.
gcloud org-policies set-policy require-shielded-vm.yaml
# Step 2: create a new Shielded VM (boot must use a Shielded-VM-compatible image family).
gcloud compute instances create app-prod-01 \
--project=svc-app-prod \
--zone=europe-west1-b \
--machine-type=n2-standard-4 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--no-address \
--service-account=sa-app-prod@svc-app-prod.iam.gserviceaccount.com \
--scopes=cloud-platform
# Step 3: inventory existing instances missing any Shielded VM toggle.
for project in $(gcloud projects list --format='value(projectId)'); do
gcloud compute instances list --project="$project" \
--format="value(name,zone,shieldedInstanceConfig.enableSecureBoot,shieldedInstanceConfig.enableVtpm,shieldedInstanceConfig.enableIntegrityMonitoring)" 2>/dev/null \
| awk -F'\t' '$3!="True" || $4!="True" || $5!="True" { print "'"$project"'\t" $0 }'
done
# Step 4: enable the three toggles on an existing stopped instance.
gcloud compute instances stop app-legacy-01 --zone=europe-west1-b --project=svc-app-prod
gcloud compute instances update app-legacy-01 \
--project=svc-app-prod --zone=europe-west1-b \
--shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring
gcloud compute instances start app-legacy-01 --zone=europe-west1-b --project=svc-app-prod
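The Step 3 inventory filter can be exercised offline before it is pointed at a real fleet. The following is a minimal sketch, assuming tab-separated rows in the same column order as the gcloud format string (name, zone, Secure Boot, vTPM, Integrity Monitoring); the function name flag_unshielded and the sample rows are hypothetical.

```shell
# Print name and zone for rows where any Shielded VM column (3-5) is not "True".
# An empty column (feature never configured) is flagged as well.
flag_unshielded() {
  awk -F'\t' '$3 != "True" || $4 != "True" || $5 != "True" { print $1 "\t" $2 }'
}

# Hypothetical sample rows: name, zone, secureBoot, vTPM, integrityMonitoring.
printf 'vm-a\teurope-west1-b\tTrue\tTrue\tTrue\nvm-b\teurope-west1-b\tFalse\tTrue\tTrue\nvm-c\teurope-west1-c\tTrue\tTrue\t\n' \
  | flag_unshielded
```

With these rows, vm-b (Secure Boot off) and vm-c (Integrity Monitoring unset) are reported, while vm-a passes.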
Cloud Audit Logs on compute.googleapis.com for v1.compute.instances.updateShieldedInstanceConfig where enableSecureBoot, enableVtpm, or enableIntegrityMonitoring transitions to false.
Integrity-monitoring violation entries from running VMs in resource.type="gce_instance" with jsonPayload.eventType="integrityViolation".
Org Policy state of constraints/compute.requireShieldedVm moved away from enforce: true.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="compute.googleapis.com"
AND protoPayload.methodName=~"v1.compute.instances.updateShieldedInstanceConfig"
AND (protoPayload.request.enableSecureBoot=false
OR protoPayload.request.enableVtpm=false
OR protoPayload.request.enableIntegrityMonitoring=false)
Pair this Cloud Logging filter with a saved query on integrity-violation entries (resource.type="gce_instance"); the two streams together provide both config-drift detection and runtime-integrity detection in a single Cloud Monitoring alert policy.
Alert threshold
Page on any updateShieldedInstanceConfig call that disables Secure Boot or Integrity Monitoring on production VMs.
Page on the first integrity-violation entry from any production VM — vTPM measurement divergence indicates boot-stage tampering.
Initial response
Re-enable Shielded VM features via gcloud compute instances update --shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring and reboot the instance to re-measure the boot chain.
If an integrity violation fired, snapshot the VM's boot disk for forensic review and rebuild the VM from a known-good image; do not patch a tampered VM in place.
Re-assert the constraints/compute.requireShieldedVm constraint at the organisation node; pin VM templates in Terraform with all three Shielded flags set.
Enforce OS Login organisation-wide via the constraints/compute.requireOsLogin Org Policy constraint, require 2-Step Verification (enable-oslogin-2fa = TRUE as project or instance metadata), deny metadata-based SSH keys (block-project-ssh-keys = TRUE), and access workload VMs exclusively through Identity-Aware Proxy (IAP) TCP forwarding — no external IPs on workload VMs (Google Cloud — OS Login documentation (accessed 2026-05); Google Cloud — IAP TCP forwarding documentation (accessed 2026-05)). OS Login ties SSH access to Google Cloud IAM: a user with the roles/compute.osLogin (or roles/compute.osAdminLogin for sudo) role on a project or instance can SSH, and their POSIX uid/gid is provisioned from their Google identity rather than from an instance-local authorized_keys file. IAP TCP forwarding tunnels SSH (and any other TCP protocol) through an IAP front-end that checks IAM (roles/iap.tunnelResourceAccessor) before the connection ever reaches the VM; workload VMs can therefore live in subnets with no external IPs. Anti-conflation: OS Login is the identity-binding layer (who can SSH); IAP is the network-path layer (how the SSH connection arrives). Both are required: OS Login alone leaves you needing external IPs or a self-managed bastion; IAP alone leaves you with instance-local SSH keys and the offboarding gap they imply.
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: enforce OS Login organisation-wide via Org Policy.
cat > require-oslogin.yaml <<'YAML'
name: organizations/ORG_ID/policies/compute.requireOsLogin
spec:
  rules:
    - enforce: true
YAML
# The target organisation is taken from the name field inside the policy file.
gcloud org-policies set-policy require-oslogin.yaml
# Step 2: require 2-Step Verification and block metadata SSH keys at project scope.
gcloud compute project-info add-metadata --project=svc-app-prod \
--metadata=enable-oslogin=TRUE,enable-oslogin-2fa=TRUE,block-project-ssh-keys=TRUE
# Step 3: grant a user the OS-Login role + the IAP tunnel role at project scope.
gcloud projects add-iam-policy-binding svc-app-prod \
--member='user:alice@example.com' \
--role='roles/compute.osLogin'
gcloud projects add-iam-policy-binding svc-app-prod \
--member='user:alice@example.com' \
--role='roles/iap.tunnelResourceAccessor'
# Step 4: connect to a VM with no external IP through IAP TCP forwarding.
gcloud compute ssh app-prod-01 \
--project=svc-app-prod --zone=europe-west1-b \
--tunnel-through-iap
# Step 5: audit instances that still carry external IPs (no workload VM should).
for project in $(gcloud projects list --format='value(projectId)'); do
gcloud compute instances list --project="$project" \
--format="value(name,zone,networkInterfaces[].accessConfigs[].natIP)" 2>/dev/null \
| awk -F'\t' 'NF==3 && $3!="" { print "'"$project"'\t" $0 }'
done
Remediation — Terraform
# Terraform Google provider ~> 5.0
# Source: Google Cloud docs (accessed 2026-05)
resource "google_org_policy_policy" "require_oslogin" {
name = "organizations/${var.org_id}/policies/compute.requireOsLogin"
parent = "organizations/${var.org_id}"
spec {
rules {
enforce = "TRUE"
}
}
}
resource "google_compute_project_metadata" "oslogin_metadata" {
project = var.app_project_id
metadata = {
enable-oslogin = "TRUE"
enable-oslogin-2fa = "TRUE"
block-project-ssh-keys = "TRUE"
}
}
resource "google_project_iam_member" "oslogin_alice" {
project = var.app_project_id
role = "roles/compute.osLogin"
member = "user:alice@example.com"
}
resource "google_project_iam_member" "iap_tunnel_alice" {
project = var.app_project_id
role = "roles/iap.tunnelResourceAccessor"
member = "user:alice@example.com"
}
# Explicit per-instance IAP tunnel grant (defence in depth).
resource "google_iap_tunnel_instance_iam_member" "alice_to_app_prod_01" {
project = var.app_project_id
zone = "europe-west1-b"
instance = "app-prod-01"
role = "roles/iap.tunnelResourceAccessor"
member = "user:alice@example.com"
}
Cloud Audit Logs on compute.googleapis.com for setCommonInstanceMetadata or setMetadata calls removing enable-oslogin or adding block-project-ssh-keys=false.
IAP-tunnel disable events: compute.googleapis.com firewall-rule changes removing the IAP source range (35.235.240.0/20) from SSH ingress on production VPCs.
Direct SSH connection attempts to public IPs: VPC Flow Logs entries on port 22 from non-IAP sources — captures both pure SSH-on-public-IP and bypass attempts.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="compute.googleapis.com"
AND protoPayload.methodName=~"v1.compute.(instances.setMetadata|projects.setCommonInstanceMetadata)"
AND (protoPayload.request.items.key="block-project-ssh-keys"
OR protoPayload.request.items.key="enable-oslogin")
AND (protoPayload.request.items.value="false"
OR protoPayload.request.items.value="FALSE")
This Cloud Logging filter watches the metadata-level controls; pair with a VPC Flow Logs query on inbound port 22 traffic so silent IAP-bypass attempts surface alongside the metadata mutation events.
Alert threshold
Page on any metadata mutation disabling OS Login on production VMs or projects.
Page on any inbound SSH connection from outside the IAP source range on a VM tagged for IAP-only access.
Initial response
Re-enable OS Login via gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE; remove unauthorised SSH keys from project metadata and per-instance metadata.
Audit OS Login audit logs (resource.type="audited_resource" with service="oslogin.googleapis.com") for sign-in activity during the gap window; cross-correlate against the OS Login posix-account allow-list.
Pin IAP-tunnel firewall config in Terraform; close the public-SSH path entirely so future bypass attempts fail at the network layer regardless of metadata state.
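When triaging VPC Flow Logs entries on port 22, the question for each source address is whether it falls inside the IAP source range 35.235.240.0/20 named above. The following is a minimal sketch of that check in POSIX shell arithmetic; the function names ip_to_int and in_iap_range are hypothetical helpers, not part of any Google tooling.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when the address is inside the IAP TCP-forwarding range 35.235.240.0/20.
in_iap_range() {
  ip=$(ip_to_int "$1")
  base=$(ip_to_int 35.235.240.0)
  mask=$(( (0xFFFFFFFF << 12) & 0xFFFFFFFF ))   # /20 leaves 12 host bits
  [ $(( ip & mask )) -eq "$base" ]
}

# Example with a documentation-range address (hypothetical flow-log source).
if ! in_iap_range 203.0.113.7; then
  echo "PAGE: SSH from non-IAP source 203.0.113.7"
fi
```

Any source that fails the check on an IAP-only VM matches the second alert threshold above.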
Two-part supply-chain control: (a) every Artifact Registry repository has the Container Analysis API automatic vulnerability scanning enabled (CVE matches against OS-package indexes plus Go / Java / Node.js / Python language dependencies are persisted as Note + Occurrence resources tied to the image digest; Container Analysis — container scanning overview (accessed 2026-05)), AND (b) every GKE cluster and every Cloud Run service that consumes those images has a Binary Authorization policy in PROJECT_SINGLETON_POLICY_ENFORCE mode requiring attestations from one or more named attestors. The attestor encodes "image built by the trusted CI/CD pipeline AND CVE-scanned" using Sigstore cosign signatures (keyless via OIDC, or KMS-backed via Cloud KMS); the in-toto attestation produced by a hardened, isolated builder is the current Google-recommended SLSA Build Level 3 evidence (SLSA v1.0 — build levels (accessed 2026-05); Google Cloud — Software Delivery Shield overview (accessed 2026-05)). The control is DETECTIVE on the scanning half (the scanner surfaces CVEs already in the image) and PREVENTIVE-equivalent on the admission half (BinAuthz blocks deploys that fail the attestation check). It is typed DETECTIVE because the surface that names it is the scanner; the policy is the natural enforcement pair. Repositories should be encrypted with CMEK (kms_key_name) for regulated workloads. See the Binary Authorization key concepts reference for policy-evaluation semantics.
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: create a CMEK-encrypted Artifact Registry repository.
gcloud artifacts repositories create app-images \
--repository-format=docker \
--location=europe-west1 \
--kms-key=projects/svc-kms-prod/locations/europe-west1/keyRings/app-prod/cryptoKeys/artifacts \
--project=svc-app-prod
# Step 2: enable Container Analysis API (scanning is automatic once enabled).
gcloud services enable containeranalysis.googleapis.com \
artifactregistry.googleapis.com binaryauthorization.googleapis.com \
--project=svc-app-prod
# Step 3: query vulnerabilities for a specific image digest.
gcloud artifacts docker images list \
europe-west1-docker.pkg.dev/svc-app-prod/app-images/api \
--include-tags --show-occurrences
gcloud artifacts docker images describe \
europe-west1-docker.pkg.dev/svc-app-prod/app-images/api@sha256:DIGEST \
--show-package-vulnerability
# Step 4: define + import a Binary Authorization policy that requires attestations.
cat > binauthz-policy.yaml <<'YAML'
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/svc-app-prod/attestors/built-by-prod-ci
clusterAdmissionRules:
  europe-west1.gke-prod:
    evaluationMode: REQUIRE_ATTESTATION
    enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
    requireAttestationsBy:
      - projects/svc-app-prod/attestors/built-by-prod-ci
globalPolicyEvaluationMode: ENABLE
YAML
gcloud container binauthz policy import binauthz-policy.yaml --project=svc-app-prod
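# Step 5: create the attestor (the Container Analysis note must already exist).
# The Cloud KMS public key is attached separately via `gcloud container binauthz attestors public-keys add`.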
gcloud container binauthz attestors create built-by-prod-ci \
--attestation-authority-note=projects/svc-app-prod/notes/built-by-prod-ci-note \
--project=svc-app-prod
# Step 6: sign an image post-CVE-scan in CI (cosign with a Cloud KMS key).
cosign sign --key gcpkms://projects/svc-kms-prod/locations/global/keyRings/attest/cryptoKeys/ci/cryptoKeyVersions/1 \
europe-west1-docker.pkg.dev/svc-app-prod/app-images/api@sha256:DIGEST
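The signing step in Step 6 should only run once the scan results clear the severity bar the attestor encodes. The following is a minimal sketch of such a CI gate, assuming the pipeline has already reduced the scan output to tab-separated lines of severity and finding ID; that two-column format, the function name scan_gate, and the sample IDs are all hypothetical.

```shell
# Read "SEVERITY<TAB>FINDING_ID" lines on stdin.
# Exit 1 (block signing) when any CRITICAL or HIGH finding is present.
scan_gate() {
  awk -F'\t' '
    $1 == "CRITICAL" || $1 == "HIGH" { blocked++; print "BLOCK " $2 }
    END { exit (blocked > 0) }
  '
}

# Hypothetical findings; the IDs are placeholders, not real CVEs.
if printf 'LOW\tEXAMPLE-0001\nHIGH\tEXAMPLE-0002\n' | scan_gate; then
  echo "scan clean: proceed to cosign sign"
else
  echo "scan gate failed: do not sign"
fi
```

Keeping the gate separate from the signing command means the attestation is only ever produced for images that passed the scan, which is what the admission policy relies on.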
Cloud Audit Logs on containeranalysis.googleapis.com for Notes.delete or Occurrences.delete targeting vulnerability findings tied to Artifact Registry images.
Artifact Registry repo IAM mutations: artifactregistry.googleapis.com SetIamPolicy calls granting roles/artifactregistry.writer to principals outside the documented CI service-account list.
Vulnerability-scanning enablement state: Artifact Analysis settings transitions from ENABLED to DISABLED at the project scope.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND ((protoPayload.serviceName="containeranalysis.googleapis.com"
AND protoPayload.methodName=~".*Occurrences.Delete")
OR (protoPayload.serviceName="artifactregistry.googleapis.com"
AND protoPayload.methodName=~".*SetIamPolicy"
AND protoPayload.serviceData.policyDelta.bindingDeltas.role="roles/artifactregistry.writer"))
Stream this Cloud Logging filter into Cloud Monitoring; pair with an Artifact Analysis findings query against the Pub/Sub topic so finding-delete activity and finding-creation rate are visible side-by-side.
Alert threshold
Page on any vulnerability-finding delete; findings should age out per the documented severity retention, not be deleted on demand.
Page on any new artifactregistry.writer binding outside the documented CI principals.
Initial response
Quarantine images deployed during the unrestricted-writer window: enumerate via Artifact Registry tag history and tag them quarantine; redeploy production workloads from a known-good tag.
Force a re-scan via Container Analysis API on every image whose findings were deleted; treat re-surfaced findings as if they had been present continuously.
Pin repo IAM bindings in Terraform; gate writer-role bindings via a CI-approval gate so console additions cannot bypass review.
Enable VM Manager across every project hosting Compute Engine workloads: deploy the Ops Agent (or the legacy Monitoring/Logging agents) via a managed policy, enable OS Config Inventory so the running package set is reported back, and enable OS Config Compliance to evaluate the running configuration against an OS Policy Assignment (CIS-hardened baseline; required sshd hardening; required auditd configuration) (Google Cloud — Manage OS with VM Manager (accessed 2026-05)). The compliance results surface as findings in Security Command Center, closing the detection loop from "package installed on a running VM" to "audit-ready dashboard at organisation scope". This is the detection / posture side of VM Manager; the patch-application side is gcp-work-08. Anti-conflation: OS Config Inventory reports what is installed (raw package list, kernel version, agent versions); OS Config Compliance evaluates that inventory against policy (is openssh-server the expected version? does /etc/ssh/sshd_config enforce PermitRootLogin no?); OS Config Patch Deployments (the -08 control) is the remediation surface that closes the loop on a recurring schedule. Same product family, three distinct workflows.
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: enable the VM Manager APIs.
gcloud services enable osconfig.googleapis.com containeranalysis.googleapis.com \
--project=svc-app-prod
# Step 2: enable OS Config on the project (per-instance metadata or project metadata).
gcloud compute project-info add-metadata --project=svc-app-prod \
--metadata=enable-osconfig=TRUE,enable-guest-attributes=TRUE
# Step 3: deploy the Ops Agent across the fleet via an OS Policy Assignment.
cat > ops-agent-policy.yaml <<'YAML'
osPolicies:
  - id: install-ops-agent
    mode: ENFORCEMENT
    resourceGroups:
      - resources:
          - id: ops-agent
            repository:
              apt:
                archiveType: DEB
                uri: https://packages.cloud.google.com/apt
                distribution: google-cloud-ops-agent-focal-all
                components: [main]
                gpgKey: https://packages.cloud.google.com/apt/doc/apt-key.gpg
          - id: ops-agent-pkg
            pkg:
              desiredState: INSTALLED
              apt:
                name: google-cloud-ops-agent
instanceFilter:
  all: true
rollout:
  disruptionBudget:
    percent: 25
  minWaitDuration: 300s
YAML
# OS policy assignments are zonal resources, so --location takes a zone.
gcloud compute os-config os-policy-assignments create install-ops-agent \
  --location=europe-west1-b \
  --project=svc-app-prod \
  --file=ops-agent-policy.yaml
# Step 4: query OS inventory for a single VM.
gcloud compute os-config inventories describe app-prod-01 \
--location=europe-west1-b --project=svc-app-prod
# Step 5: query OS compliance for the fleet.
gcloud compute os-config os-policy-assignment-reports list \
  --location=europe-west1-b --project=svc-app-prod
Cloud Audit Logs on osconfig.googleapis.com for PatchDeployments.delete or PatchDeployments.patch on the production patch schedule.
OS Config agent disable: compute.googleapis.com project-metadata mutations clearing enable-osconfig=TRUE.
VM Manager OS-policy violations: resource.type="gce_instance" entries from the osconfig service reporting compliance_state="NON_COMPLIANT" on a previously compliant instance.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="osconfig.googleapis.com"
AND (protoPayload.methodName=~".*PatchDeployments.(delete|patch)"
OR protoPayload.methodName=~".*OSPolicyAssignments.(delete|patch)")
Run this Cloud Logging filter at project scope; pair with a Cloud Asset Inventory query on compute.googleapis.com/Instance to surface the population of VMs without the OS Config agent enabled at metadata level.
Alert threshold
Page on any patch-deployment or OS-policy-assignment delete or patch on production scope.
Page when the population of OS-Config-enabled VMs falls more than 5% below the documented baseline coverage.
Initial response
Restore the patch deployment / OS policy from the captured baseline via gcloud compute os-config patch-deployments create; trigger an immediate patch-job execution to close the patching gap.
Audit which CVEs the affected VMs were exposed to during the patch-schedule gap by joining the OS-inventory snapshot against the NIST NVD feed.
Pin patch deployments + OS policies in Terraform; require the enable-osconfig metadata to be set at project level via Org Policy so per-VM opt-outs are impossible.
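The 5% coverage threshold above reduces to a simple comparison once the two counts are available. The following is a minimal sketch, assuming the count of OS-Config-enabled VMs and the documented baseline are obtained elsewhere (for example from a Cloud Asset Inventory export); the function name coverage_breach and the numbers are hypothetical.

```shell
# Return 0 (breach) when the enabled count is more than 5% below the baseline.
# Integer arithmetic stands in for: enabled / baseline < 0.95.
coverage_breach() {
  enabled=$1
  baseline=$2
  [ $(( enabled * 100 )) -lt $(( baseline * 95 )) ]
}

# Hypothetical counts: 200 VMs in the documented baseline, 189 currently enabled.
if coverage_breach 189 200; then
  echo "PAGE: OS Config coverage more than 5% below baseline"
fi
```

A count of exactly 95% of the baseline (190 of 200 here) does not page, matching the "more than 5%" wording.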
Every Cloud Run service runs under its own dedicated, least-privileged service account — NOT the Compute Engine default service account, which ships with roles/editor across the entire project. Ingress is restricted to internal-and-cloud-load-balancing (the service is fronted by an external HTTPS Load Balancer with Cloud Armor) or internal (only callers from the same VPC, Shared VPC, or VPC SC perimeter); never the default all for any service that does not legitimately need to be reachable from the public internet. Secrets are injected via Secret Manager references using the run.googleapis.com/secrets annotation or the equivalent --set-secrets binding on gcloud run deploy — never as plaintext environment variables in the service revision (Google Cloud — Cloud Run service identity (accessed 2026-05); Google Cloud — Cloud Run secrets (accessed 2026-05)). The IAM invoker binding enumerates explicit principals; allUsers (anonymous) and allAuthenticatedUsers (any Google account on the internet) are forbidden unless the service is genuinely intended to be public, in which case authorisation is layered behind an external HTTPS Load Balancer with Cloud Armor and IAP. VPC connectors (or Direct VPC egress) route the service to private resources without exposing them publicly. This control DOES NOT re-author the canonical Secrets Manager + KMS reference architecture — that lives at general/iam — secrets management (canonical-content rule).
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: create a dedicated service account for the Cloud Run service.
gcloud iam service-accounts create sa-api-prod \
--display-name="Cloud Run: api-prod" \
--project=svc-app-prod
# Step 2: grant ONLY the narrow IAM roles this service needs.
# Secret Manager resource names in IAM conditions use the project number, not the project ID.
gcloud projects add-iam-policy-binding svc-app-prod \
  --member='serviceAccount:sa-api-prod@svc-app-prod.iam.gserviceaccount.com' \
  --role='roles/secretmanager.secretAccessor' \
  --condition='expression=resource.name.startsWith("projects/PROJECT_NUMBER/secrets/api-prod-"),title=api-prod-secrets-only'
# Step 3: grant the Cloud Run service permission to access the database secret.
gcloud secrets add-iam-policy-binding api-prod-db-url \
--member='serviceAccount:sa-api-prod@svc-app-prod.iam.gserviceaccount.com' \
--role='roles/secretmanager.secretAccessor' \
--project=svc-app-prod
# Step 4: deploy with dedicated SA, internal ingress, Secret Manager refs, no public.
gcloud run deploy api-prod \
--image=europe-west1-docker.pkg.dev/svc-app-prod/app-images/api:v1.0.0 \
--service-account=sa-api-prod@svc-app-prod.iam.gserviceaccount.com \
--region=europe-west1 \
--project=svc-app-prod \
--ingress=internal-and-cloud-load-balancing \
--no-allow-unauthenticated \
--set-secrets=DATABASE_URL=api-prod-db-url:latest \
--vpc-connector=projects/svc-net-prod/locations/europe-west1/connectors/vpc-conn-prod \
--vpc-egress=private-ranges-only
# Step 5: grant the invoker role to a specific principal (NEVER allUsers).
gcloud run services add-iam-policy-binding api-prod \
--region=europe-west1 --project=svc-app-prod \
--member='serviceAccount:sa-frontend-prod@svc-app-prod.iam.gserviceaccount.com' \
--role='roles/run.invoker'
# Step 6: audit all Cloud Run services for forbidden invoker bindings (allUsers/allAuth).
# Assumes a default Cloud Run region is configured (gcloud config set run/region REGION); otherwise add --region.
for project in $(gcloud projects list --format='value(projectId)'); do
  # List names only: a second column would be word-split and looped over as if it were a service.
  for svc in $(gcloud run services list --project="$project" --format='value(metadata.name)' 2>/dev/null); do
    gcloud run services get-iam-policy "$svc" --project="$project" --format=json 2>/dev/null \
      | jq -r '.bindings[]? | select(any(.members[]?; test("^(allUsers|allAuthenticatedUsers)$"))) | "'"$project"'/'"$svc"': " + .role'
  done
done
Cloud Audit Logs on run.googleapis.com for services.setIamPolicy adding allUsers as roles/run.invoker on a production service.
Cloud Run service spec patches setting ingress=all when the documented default is internal-and-cloud-load-balancing.
Cloud Run revision spec patches changing serviceAccountName to the default compute service account (PROJECT_NUMBER-compute@developer.gserviceaccount.com), which returns the service to the project-wide roles/editor scope of that identity.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="run.googleapis.com"
AND ((protoPayload.methodName=~"Services.SetIamPolicy$"
AND protoPayload.serviceData.policyDelta.bindingDeltas.member=("allUsers" OR "allAuthenticatedUsers"))
OR protoPayload.request.spec.template.spec.serviceAccountName=~".*-compute@developer.gserviceaccount.com"
OR protoPayload.request.metadata.annotations."run.googleapis.com/ingress"="all")
Stream this Cloud Logging filter into a Cloud Monitoring log-based metric grouped by service name; pair with HTTP-load-balancer logs on the service so request-pattern shifts after a public-binding addition are visible alongside the policy change.
Alert threshold
Page on any binding adding allUsers/allAuthenticatedUsers to a Cloud Run service or any ingress change to all.
Page on any revision deploy using the default compute service account on a production service.
Initial response
Remove the public binding via gcloud run services remove-iam-policy-binding SERVICE --member=allUsers --role=roles/run.invoker (repeat for allAuthenticatedUsers if present) and revert ingress to internal-and-cloud-load-balancing.
Audit invoke counts during the exposed window from Cloud Run request logs; if the workload processes data on behalf of authenticated callers, treat any unauthenticated invoke as candidate-abuse.
Redeploy with the purpose-built service account; pin service IAM + ingress + serviceAccountName in Terraform and reject deploys that drift.
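The drift rejection described in the initial response can be sketched as a small shell check. This is a minimal illustration, not the page's own tooling: the function names check_run_baseline and audit_run_service are hypothetical, the baseline values come from the remediation steps above, and the wrapper assumes gcloud and jq are installed.

```shell
# Sketch only: function names are hypothetical, not part of gcloud.

# Pure check. $1 = ingress setting, $2 = runtime service account email.
# Prints one line per violation and returns 1 if any was found.
check_run_baseline() {
  rc=0
  if [ "$1" != "internal-and-cloud-load-balancing" ]; then
    echo "ingress is '$1', expected internal-and-cloud-load-balancing"
    rc=1
  fi
  case "$2" in
    ""|*-compute@developer.gserviceaccount.com)
      # An empty value means Cloud Run falls back to the default compute SA.
      echo "service runs as the default compute service account"
      rc=1 ;;
  esac
  return "$rc"
}

# Wrapper. $1 = service, $2 = region, $3 = project. Needs gcloud and jq.
audit_run_service() {
  spec=$(gcloud run services describe "$1" --region="$2" --project="$3" --format=json) || return 2
  ingress=$(printf '%s' "$spec" | jq -r '.metadata.annotations["run.googleapis.com/ingress"] // "all"')
  sa=$(printf '%s' "$spec" | jq -r '.spec.template.spec.serviceAccountName // ""')
  check_run_baseline "$ingress" "$sa"
}
```

A CI job could call audit_run_service api-prod europe-west1 svc-app-prod after each deploy and fail the pipeline on a non-zero exit status. Keeping the comparison in a pure function makes the baseline testable without cloud credentials.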
GKE cluster hardening is treated as a single umbrella control combining five inseparable surfaces.
(a) Workload Identity Federation for GKE: clusters created with --workload-pool=PROJECT_ID.svc.id.goog bind Kubernetes Service Accounts (KSAs) to Google Service Accounts (GSAs) via the roles/iam.workloadIdentityUser role; pods request short-lived OAuth2 tokens from the GKE metadata server using their KSA identity. The legacy identity-namespace flag form is deprecated and counts as a violation (rule G12) on this corpus (Google Cloud: GKE Workload Identity (accessed 2026-05)).
(b) Private cluster topology: --enable-private-nodes removes external IPs from nodes, and --enable-private-endpoint with master authorized networks restricts the API server to a private IP plus a small allowlist of management CIDRs.
(c) Binary Authorization integration: binary_authorization { evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE" } ensures pod admission consults the project's BinAuthz policy (the one configured in gcp-work-03).
(d) GKE Dataplane V2: --datapath-provider=ADVANCED_DATAPATH swaps kube-proxy/iptables for Cilium-based eBPF with built-in NetworkPolicy enforcement, which supports default-deny pod-to-pod rules plus FQDN egress policies. Enforcement is not default-deny out of the box, so cluster authors must ship a default-deny NetworkPolicy per namespace.
(e) Shielded GKE Nodes: enable_shielded_nodes = true, with integrity monitoring + secure boot enabled in the shielded_instance_config of every node pool.
Autopilot vs Standard: Autopilot removes the node-operator burden (Google manages nodes and node pools; the customer gets no SSH access to nodes and no privileged DaemonSets outside the curated allowlist; billing is per-pod resource request) and is the preferred default for stateless web workloads. Standard is required for privileged DaemonSets (for example eBPF agents that are not on the curated GKE Autopilot allowlist), custom OS images, GPU pools, large persistent stateful workloads, and any scenario needing direct node access. Both modes support the full set of hardening toggles above.
Single umbrella, NOT split into two controls.
Remediation — gcloud CLI
# gcloud CLI (latest stable)
# Step 1: create a hardened GKE Standard cluster with the full umbrella.
gcloud container clusters create gke-prod-euw1 \
--project=svc-app-prod --region=europe-west1 \
--workload-pool=svc-app-prod.svc.id.goog \
--enable-private-nodes --enable-private-endpoint \
--master-ipv4-cidr=172.16.0.0/28 \
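--enable-master-authorized-networks \
--master-authorized-networks=10.50.0.0/24 \
--binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE \
--datapath-provider=ADVANCED_DATAPATH \
--enable-shielded-nodes \
--shielded-secure-boot --shielded-integrity-monitoring \
--release-channel=regular
# Dataplane V2 enforces NetworkPolicy natively, so --enable-network-policy (Calico) is not passed;
# --enable-binauthz is deprecated in favour of --binauthz-evaluation-mode.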
# Step 2: alternative — Autopilot cluster (Google-managed nodes; same toggles inherited).
gcloud container clusters create-auto gke-prod-autopilot \
--project=svc-app-prod --region=europe-west1 \
--workload-pool=svc-app-prod.svc.id.goog \
--enable-private-nodes --enable-private-endpoint \
--master-ipv4-cidr=172.16.0.16/28 \
--enable-master-authorized-networks \
--master-authorized-networks=10.50.0.0/24 \
--binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE \
--release-channel=regular
# Step 3: bind a Kubernetes Service Account to a Google Service Account.
kubectl create namespace api
kubectl create serviceaccount ksa-api -n api
gcloud iam service-accounts add-iam-policy-binding \
sa-api-prod@svc-app-prod.iam.gserviceaccount.com \
--role=roles/iam.workloadIdentityUser \
--member='serviceAccount:svc-app-prod.svc.id.goog[api/ksa-api]'
kubectl annotate serviceaccount ksa-api -n api \
iam.gke.io/gcp-service-account=sa-api-prod@svc-app-prod.iam.gserviceaccount.com
# Step 4: ship a default-deny NetworkPolicy in every namespace.
cat <<'YAML' | kubectl apply -n api -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
YAML
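Step 4 applies the policy to one namespace; rolling it out to every namespace can be sketched as a loop. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the list of system namespaces that the loop skips is an assumption for this sketch that should be adjusted to the cluster in question.

```shell
# Returns 0 when the namespace is system-managed and should be left alone.
# The name list is an assumption for this sketch, not an official GKE list.
is_system_namespace() {
  case "$1" in
    kube-system|kube-public|kube-node-lease|gke-*|gmp-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Emits the same default-deny manifest used in Step 4.
default_deny_manifest() {
  cat <<'YAML'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
YAML
}

# Applies the manifest to every non-system namespace (needs a kubectl context).
apply_default_deny_everywhere() {
  for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
    if is_system_namespace "$ns"; then
      echo "skip $ns"
      continue
    fi
    default_deny_manifest | kubectl apply -n "$ns" -f -
  done
}
```

Because the policy also denies egress, workloads lose DNS resolution until an explicit allow rule for the cluster DNS service is added alongside it, so pair the rollout with per-namespace allow policies.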
Cloud Functions deploys via cloudfunctions.googleapis.com functions.create or functions.update with runtime values in the deprecated set (e.g. nodejs10, python37, go113).
Function ingress mutations setting ingressSettings=ALLOW_ALL on functions that previously required VPC-internal callers.
Function IAM bindings adding roles/cloudfunctions.invoker to allUsers.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="cloudfunctions.googleapis.com"
AND (protoPayload.methodName=~"(Create|Update)Function$"
AND (protoPayload.request.function.runtime=~"nodejs10|python37|go113|nodejs12"
OR protoPayload.request.function.ingressSettings="ALLOW_ALL"))
Pin this Cloud Logging filter at project scope; pair with a Cloud Asset Inventory query on cloudfunctions.googleapis.com/Function to surface the steady-state population of functions on deprecated runtimes alongside the change-event stream.
Alert threshold
Page on any function deploy using a runtime on the project's deprecated-runtime deny-list; Google ended support for those runtimes years ago and they receive no security patches.
Page on any ingress relaxation to ALLOW_ALL on a function tagged for internal-only invocation.
Initial response
Block further deploys via a Pub/Sub-triggered admission Cloud Function that rejects deprecated runtimes; migrate affected functions to a runtime on the current supported list.
Re-tighten ingress on affected functions via gcloud functions deploy FUNCTION_NAME --ingress-settings=internal-and-gclb; review invocation logs for the public-exposure window.
Pin runtime + ingress in Terraform; gate updates through CI which checks runtime against the current support matrix.
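The CI runtime gate mentioned above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the deprecated set is copied from the query on this page rather than from the live support matrix, so a real gate would need to refresh it.

```shell
# Deprecated set taken from the detection query on this page (assumption:
# a real gate would load this list from the current runtime support matrix).
is_deprecated_runtime() {
  case "$1" in
    nodejs10|nodejs12|python37|go113) return 0 ;;
    *) return 1 ;;
  esac
}

# CI gate. $1 = runtime the pipeline is about to deploy.
# Returns 1 (and logs to stderr) to block the deploy step.
gate_runtime() {
  if is_deprecated_runtime "$1"; then
    echo "BLOCK: runtime '$1' is deprecated" >&2
    return 1
  fi
  echo "OK: runtime '$1'"
}
```

In a pipeline this would run before the deploy step, for example gate_runtime "$FUNCTION_RUNTIME" with the variable (a hypothetical name) populated from the Terraform variables for the function.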
Image curation runs through a hardened pipeline: Cloud Build consumes pinned base images from the CIS-hardened Public Image Project (or an internal curated image family), produces immutable artefacts in Artifact Registry, signs them via Sigstore cosign against a Binary Authorization attestor (the same one consumed by gcp-work-03), and emits an SLSA in-toto provenance attestation that persists in Container Analysis as a Note / Occurrence pair. Community-provided OS images and "latest"-tagged base images are forbidden for production workloads; pin to image digests or to an organisation-owned image family that has been baseline-scanned. For regulated workloads, layer Confidential VM (AMD SEV / SEV-SNP, Intel TDX) on top of the Shielded VM baseline: Confidential VMs encrypt memory in use and support remote attestation of the platform, so a host administrator or compromised hypervisor cannot read guest RAM in plaintext (Google Cloud: Confidential VM overview (accessed 2026-05)). This control pairs with gcp-work-03: that one is the runtime admission gate; this one is the build-time provenance generator. Anti-conflation: "we use signed images" without an attestation policy enforcing the signature is theatre; "we have a Binary Authorization policy" without a build pipeline producing the attestations is unenforceable. Both halves are required.
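The digest-pinning rule can be checked mechanically before a deploy. The sketch below is illustrative only: is_digest_pinned and resolve_digest are hypothetical helper names, and the image_summary.fully_qualified_digest output field used in the wrapper is an assumption about the gcloud artifacts docker images describe output that should be confirmed against the installed gcloud version.

```shell
# Returns 0 only for references of the form NAME@sha256:<64 lowercase hex chars>.
# Tag-only references such as api:v1.0.0 or api:latest are rejected.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Resolves a tag reference to its digest form via Artifact Registry.
# Assumption: the describe output exposes image_summary.fully_qualified_digest.
resolve_digest() {
  gcloud artifacts docker images describe "$1" \
    --format='value(image_summary.fully_qualified_digest)'
}
```

A deploy script could call is_digest_pinned on the value passed to --image and refuse to continue on failure, using resolve_digest once at build time to record the digest that the attestation was produced for.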
Cloud Audit Logs on compute.googleapis.com for v1.compute.instances.insert where confidentialInstanceConfig.enableConfidentialCompute is omitted or set to false on subnets tagged for confidential-compute-only workloads.
Image-source events: instances created from a Compute Engine image family not on the documented Confidential VM allow-list.
Per-VM attestation events: Confidential Space attestation tokens issued from VMs whose confidentialInstanceConfig.confidentialInstanceType reports SEV_SNP when the policy required TDX (or vice versa).
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="compute.googleapis.com"
AND protoPayload.methodName="v1.compute.instances.insert"
AND protoPayload.request.networkInterfaces.subnetwork=~".*confidential.*"
AND NOT protoPayload.request.confidentialInstanceConfig.enableConfidentialCompute=true
This Cloud Logging filter catches VM-create drift; pair with a Cloud Asset Inventory query enumerating instances on confidential-tagged subnets so the steady-state population stays visible.
Alert threshold
Page on any VM insert on a confidential-tagged subnet that omits or disables Confidential Compute.
Page on attestation-type mismatch between the VM's confidentialInstanceConfig.confidentialInstanceType and the documented policy for the workload class.
Initial response
Terminate the non-confidential VM and redeploy it as a Confidential VM (--confidential-compute-type=SEV_SNP or TDX as appropriate).
If the workload processed regulated data during the gap, treat memory contents as potentially exposed to the hypervisor; rotate any credentials that were held in plaintext and re-wrap any data-encryption keys that were resident in memory under a fresh KEK.
Pin Confidential VM enablement in Terraform via confidential_instance_config block; reject deploys to confidential-tagged subnets that omit the block.
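The deploy-time rejection described above can be sketched as a pre-deploy gate. This is a minimal illustration: confidential_gate is a hypothetical function, and it assumes the same naming convention as the query on this page, namely that in-scope subnets contain the string "confidential" in their name.

```shell
# $1 = subnetwork name or URL
# $2 = confidential instance type set on the planned VM ("" if none)
# $3 = type the workload policy requires (SEV, SEV_SNP or TDX)
confidential_gate() {
  case "$1" in
    *confidential*) ;;   # subnet is in scope, keep checking
    *) return 0 ;;       # out of scope, nothing to enforce
  esac
  if [ -z "$2" ]; then
    echo "BLOCK: no Confidential VM configuration on a confidential subnet" >&2
    return 1
  fi
  if [ "$2" != "$3" ]; then
    echo "BLOCK: type $2 does not match required $3" >&2
    return 1
  fi
  return 0
}
```

The three arguments would typically be extracted from the Terraform plan for each planned instance; the gate covers both alert conditions above (missing configuration and attestation-type mismatch).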
The patch-application surface of VM Manager Patch Deployments closes the loop that gcp-work-04 opens. A recurring Patch Deployment runs on a cron-style schedule, targets an instance filter (all instances, specific zones, or a label selector), executes the platform-appropriate package-manager update (apt / yum / dnf / zypper / Windows Update) inside a maintenance window, and reports compliance back to Security Command Center. Rolling-restart semantics on managed instance groups keep production fleets serving traffic during the patch run. Anti-conflation: gcp-work-04 is the inventory + compliance side ("what's installed and does it match policy"); gcp-work-08 is the patch-application side ("apply the missing CVE fixes on this schedule"). The same VM Manager product, two distinct review rituals, mirroring the Phase 6 Systems Manager Patch Manager split on the AWS page and the Phase 7 Azure Update Manager split on the Azure page. The control is classified DETECTIVE because the compliance dashboard surfaces patch-state drift; the auto-remediation aspect (the deploy itself) is the natural pair to the detection. Schedule semantics follow VM Manager: schedule patch jobs (accessed 2026-05).
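A recurring Patch Deployment of the kind described above can be sketched as a JSON spec plus one gcloud call. The values below (label selector env=prod, Sunday 02:00 in Europe/Brussels, a 25% disruption budget, the deployment name) are illustrative assumptions, not recommendations from this page, and the function names are hypothetical.

```shell
# Writes an illustrative PatchDeployment spec to the path given in $1.
write_patch_deployment_spec() {
  cat > "$1" <<'JSON'
{
  "description": "Weekly OS patching for prod VMs (illustrative values)",
  "instanceFilter": { "groupLabels": [ { "labels": { "env": "prod" } } ] },
  "patchConfig": {
    "rebootConfig": "DEFAULT",
    "apt": { "type": "DIST" },
    "yum": { "security": true }
  },
  "duration": "3600s",
  "recurringSchedule": {
    "timeZone": { "id": "Europe/Brussels" },
    "timeOfDay": { "hours": 2, "minutes": 0 },
    "frequency": "WEEKLY",
    "weekly": { "dayOfWeek": "SUNDAY" }
  },
  "rollout": { "mode": "ZONE_BY_ZONE", "disruptionBudget": { "percent": 25 } }
}
JSON
}

# $1 = patch deployment ID, $2 = spec path, $3 = project. Needs gcloud.
create_patch_deployment() {
  write_patch_deployment_spec "$2" || return 1
  gcloud compute os-config patch-deployments create "$1" --file="$2" --project="$3"
}
```

For example, create_patch_deployment weekly-prod-patch /tmp/patch.json svc-app-prod would register the schedule. The ZONE_BY_ZONE rollout with a disruption budget is one way to keep part of the fleet serving while a zone is being patched.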
Cloud Audit Logs on compute.googleapis.com for v1.compute.instances.setMetadata calls that add ssh-keys at instance-level metadata bypassing OS Login.
Project-metadata mutations via setCommonInstanceMetadata that publish a project-wide SSH key, a broad lateral-access vector.
Serial-console enable events: setMetadata setting serial-port-enable=TRUE on production VMs.
Query
logName=~"projects/.*/logs/cloudaudit.googleapis.com%2Factivity"
AND protoPayload.serviceName="compute.googleapis.com"
AND protoPayload.methodName=~"compute.(instances.setMetadata|projects.setCommonInstanceMetadata)$"
AND (protoPayload.request.items.key="ssh-keys"
OR protoPayload.request.items.key="serial-port-enable")
This Cloud Logging filter runs against the project audit sink; pair it with the OS Login audit feed so legacy-SSH-key additions are visible alongside OS Login sign-in events for cross-validation.
Alert threshold
Page on any ssh-keys metadata addition at instance or project scope when the OS Login + IAP-tunnel baseline applies.
Page on any serial-port-enable=TRUE mutation on production VMs; serial console is a remote-OOB path that bypasses OS Login auth.
Initial response
Remove the SSH key from metadata via gcloud compute instances remove-metadata INSTANCE_NAME --keys=ssh-keys (or gcloud compute project-info remove-metadata --keys=ssh-keys at project scope) and disable the serial port by setting serial-port-enable=FALSE through setMetadata.
Audit the guest OS's /var/log/auth.log (Linux) or Event Viewer Security log (Windows) for sign-ins keyed on the metadata-published SSH key during the gap window; rotate any account credential the key was bound to.
Re-assert the Org Policy constraint compute.requireOsLogin=true at organisation scope; pin instance metadata in Terraform with the block-project-ssh-keys=TRUE invariant.
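The metadata conditions that trigger the alerts above can be expressed as a small classifier for use in CI or in an audit script. This is a minimal sketch: both function names are hypothetical, and the inclusion of the legacy sshKeys key alongside ssh-keys is an addition of this sketch rather than something the query on this page covers.

```shell
# Returns 0 when a metadata key/value pair violates the OS Login baseline.
# $1 = metadata key, $2 = metadata value
violates_os_login_baseline() {
  case "$1" in
    ssh-keys|sshKeys)            # sshKeys is the legacy key name (sketch addition)
      return 0 ;;
    serial-port-enable)
      case "$2" in
        TRUE|true|1) return 0 ;;
      esac ;;
  esac
  return 1
}

# Remediation wrapper. $1 = instance, $2 = zone, $3 = project. Needs gcloud.
remediate_instance_metadata() {
  gcloud compute instances remove-metadata "$1" --zone="$2" --project="$3" --keys=ssh-keys
  gcloud compute instances add-metadata "$1" --zone="$2" --project="$3" \
    --metadata=serial-port-enable=FALSE,block-project-ssh-keys=TRUE
}
```

The classifier can be fed key/value pairs from a Terraform plan or from an instance's current metadata; the wrapper applies the same two corrections the initial response describes and sets block-project-ssh-keys=TRUE so project-wide keys stop applying to the instance.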