Azure Workloads Hardening

Overview

This page covers Microsoft Azure workload hardening across the compute surfaces that decide whether an attacker who lands code execution on a single VM, container, or function can pivot to credentials, sibling workloads, or the Azure Resource Manager control plane. Scope is the Azure commercial regions; Azure Government and Azure operated by 21Vianet (China) inherit the same controls but expose a different sovereign endpoint suffix, a different Microsoft Entra ID (formerly Azure Active Directory) tenant topology, and a slightly different Defender for Cloud plan availability matrix — re-verify the per-region plan availability and the Microsoft Graph endpoint before applying any of the IaC below to a non-commercial cloud. CIS sub-IDs and NIST / ISO mappings throughout this page reference the commercial Microsoft Azure Foundations Benchmark v3.0.0 (Feb 2025) unless explicitly annotated as a post-v3.0.0 feature or a best-practice recommendation that the current benchmark has not yet codified. The crosswalk page at compliance frameworks describes how the seven pinned framework columns relate to each other.

The Azure workload model spans three compute planes that map to three distinct hardening conversations. Virtual Machines (Compute) — Linux and Windows guests on Hyper-V — hardened via Trusted Launch (Secure Boot + virtual TPM), Bastion-fronted remote access, Just-in-Time VM Access for residual direct exposure, Defender for Servers Plan 2 (Microsoft Defender for Endpoint auto-deploy + Defender Vulnerability Management + file integrity monitoring + adaptive application controls), Shared Image Gallery golden images, and Azure Update Manager for patch hygiene. Containers — Azure Container Registry (image supply chain), Azure Kubernetes Service (orchestration), Container Apps and Container Instances (managed runtimes) — hardened via ACR quarantine + content trust, Defender for Containers (agentless + agent-based image scanning, runtime detection, Kubernetes posture), Microsoft Entra Workload Identity for pod-to-Azure authentication, and Azure CNI Powered by Cilium for L4/L7 network policy. Serverless — Azure Functions and App Service Web Apps — hardened via system-assigned managed identity, Key Vault references for secrets (no connection strings in app settings), and Easy Auth / Microsoft Entra authentication for HTTP-triggered endpoints. Cross-cutting principles — image / OS hardening, runtime security, supply chain, secrets in workloads, and patch management — are owned by the General Workloads page; this page maps them to Azure primitives.

One canonical-content cross-link to flag at the top, because authoring this page in isolation would otherwise duplicate ~1500 words of canonical material: secrets management for Azure Functions and App Service is documented on the General IAM — secrets management page, not here. The Phase 4 canonical-content rule (one canonical treatment per cross-cutting topic) is honoured in azure-work-05: the control covers Function App / App Service system-assigned managed identity and Key Vault references (the Azure-specific how-to), and cross-links to general/iam.html for the Key Vault + secret-rotation reference architecture rather than re-authoring it. The same pattern recurs on azure/data.html for the Key Vault data-plane controls and on azure/iam.html — managed identities for the underlying identity-plane primitive.

Three anti-conflation callouts up front, because each pair gets confused in design reviews.

First: VM Trusted Launch is a structural primitive (Secure Boot + vTPM), not a metadata-version control. AWS IMDSv2 is a metadata-token handshake that defeats SSRF-to-credentials reflection; Azure has IMDS but no v1/v2 protocol split. Azure IMDS uses no session token: every request must carry the Metadata: true header, and the service answers only on a non-routable address with a hop limit of 1 by Azure platform default. The structural workload-integrity primitive on Azure is Trusted Launch: Secure Boot validates the UEFI boot chain against a Microsoft signature database (rejects rootkits that tamper with the bootloader or kernel modules), and a virtual TPM 2.0 measures boot integrity into PCRs that downstream Azure Disk Encryption and remote attestation services can verify (covered as azure-work-01). The cross-provider equivalence to aws-work-01 is structural, not mechanical: both raise the bar against pre-OS and credential-theft attacks, but the surface they harden is different. Do not transpose AWS IMDSv2 framing onto Azure.

Second: Azure Bastion is the structural answer; Just-in-Time VM Access is the compensating control. Bastion removes the public IP from the target VM entirely, so there is no NSG rule on TCP 22 or 3389 to expose: the management traffic terminates at a managed Microsoft service inside a dedicated AzureBastionSubnet. JIT VM Access keeps the public IP but opens the NSG rule for a bounded window on demand, with the request authorised through Defender for Cloud RBAC. Bastion is the default; JIT is the compensating control for legacy workloads that for licensing or vendor-support reasons cannot front via Bastion (covered as azure-work-02).

Third: AKS Workload Identity (the umbrella in azure-work-06) replaces Pod Identity; Pod Identity was deprecated 24 October 2022. Microsoft Entra Pod Identity (the managed add-on formerly known as Azure AD Pod Identity) is patched-only until September 2025 and is not the path Microsoft is investing in. New AKS deployments must use Microsoft Entra Workload Identity (OIDC federation between the AKS cluster's OIDC issuer and Entra), covered as a single umbrella control in azure-work-06 alongside Azure CNI Powered by Cilium (Azure NPM retiring September 2028), private API server, and Defender for Containers integration. Do not author Pod Identity into new code; reference it only with deprecation framing.

Order matters. Controls 01–02 are foundational invariants for every VM: Trusted Launch closes the pre-OS attack surface and makes boot-chain tampering detectable, Bastion + JIT closes the remote-access surface. Controls 03–04 close the container and vulnerability-assessment loop: ACR quarantine holds an image push until Defender for Containers clears its vulnerability scan, Defender for Servers Plan 2 provides continuous EDR + vulnerability assessment + FIM across Linux and Windows VMs. Control 05 hardens Function App / App Service identity. Control 06 hardens AKS as a single umbrella. Control 07 establishes golden-image provenance via Shared Image Gallery + Azure Image Builder. Control 08 handles ongoing patch hygiene via Azure Update Manager (the canonical Azure patching plane, formerly named Update management center and the replacement for Azure Automation Update Management).

The page is structured so a reader can skim 01–02 for the everyday VM baseline, then dip into 03–08 by service area as needed. Equivalence callouts at the bottom of each control point to the matching control on the AWS, GCP, and OCI sibling pages. Note that the AWS callouts are bidirectional and load-bearing (the Phase 6 AWS page links INTO the IDs on this page, and the Phase 7 equivalence gate auto-promotes those links from graceful-skip to strict once the control boxes here exist).

Subscription and management-group scope: Azure Policy at the root management group enforces tenant-wide invariants (Trusted Launch required on new VMs, Defender plans enabled, allowed VM SKUs, required tagging) and is the single most important lever for keeping the controls below from drifting out of compliance once dozens of subscriptions and thousands of VMs exist.

azure-work-01-trusted-launch ! CRITICAL PREVENTIVE

Every Azure VM (Linux or Windows) must be deployed with Trusted Launch enabled (Secure Boot on, virtual TPM 2.0 on), and the requirement must be pinned at the root management group via an Azure Policy assignment that denies VM creation with securityProfile.securityType other than TrustedLaunch or ConfidentialVM. Confidential VMs report their own securityType value and include Secure Boot and vTPM, so a deny rule that accepts only TrustedLaunch would block them. See Microsoft Learn: Azure Trusted Launch for VMs (accessed 2026-05). Secure Boot validates the UEFI boot chain against the Microsoft-managed signature database, rejecting bootkits that tamper with the bootloader or unsigned kernel modules; the virtual TPM measures boot integrity into PCRs that Azure Disk Encryption and the Azure Attestation service can later verify.

PITFALL 11: Gen2 prerequisite is non-negotiable. Trusted Launch requires a Generation 2 VM SKU (UEFI boot) and a Generation 2 image. Gen1 legacy gallery images (BIOS boot) cannot enable Trusted Launch under any flag combination. Inventory existing Gen1 VMs before assigning the deny policy and refresh them through azure-work-07 (Shared Image Gallery + Azure Image Builder) onto Gen2 hardened base images. The default-on posture for newly published gallery images (since November 2023) handles the green-field case; the brown-field case is where this control earns its severity.

Anti-conflation vs the AWS sibling: aws-work-01-imdsv2-mandatory defeats the SSRF-to-credentials reflection on the metadata service; azure-work-01 closes the pre-OS firmware attack surface. The equivalence is structural (both raise the workload-integrity bar), but the mechanism and the threat model are different.
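The requirement above can be expressed as a small offline check over exported VM JSON (for example the output of az vm list). This is a minimal sketch, not a replacement for the Azure Policy assignment: the VmRecord shape and the findTrustedLaunchGaps name are hypothetical, only the fields read here are modelled, and a missing securityType is assumed to mean a Standard VM.

```typescript
// Hypothetical subset of the ARM VM resource; only the fields this check reads.
interface VmSecurityProfile {
  securityType?: string;
  uefiSettings?: { secureBootEnabled?: boolean; vTpmEnabled?: boolean };
}

interface VmRecord {
  name: string;
  securityProfile?: VmSecurityProfile;
}

// Returns human-readable gaps; an empty list means the VM meets the control.
function findTrustedLaunchGaps(
  vm: VmRecord,
  acceptedTypes: string[] = ["TrustedLaunch"],
): string[] {
  const gaps: string[] = [];
  const profile = vm.securityProfile;
  // Assumption: a missing securityType is treated as a Standard (non-Trusted Launch) VM.
  const securityType = profile?.securityType ?? "Standard";
  if (acceptedTypes.indexOf(securityType) === -1) {
    gaps.push("securityType is " + securityType);
  }
  const uefi = profile?.uefiSettings;
  if (uefi?.secureBootEnabled !== true) gaps.push("Secure Boot is not enabled");
  if (uefi?.vTpmEnabled !== true) gaps.push("vTPM is not enabled");
  return gaps;
}

// Example: a VM exported without any securityProfile fails all three checks.
console.log(findTrustedLaunchGaps({ name: "vm-legacy-01" }));
```

Passing ["TrustedLaunch", "ConfidentialVM"] as the second argument keeps Confidential VMs from being reported as gaps, since they carry their own securityType value.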

Remediation — Azure CLI

# Azure CLI 2.x
# Inventory: list VMs and their security profile across all subscriptions.
for sub in $(az account list --query '[].id' -o tsv); do
  az vm list --subscription "$sub" --show-details \
    --query "[].{sub:'$sub', name:name, rg:resourceGroup, gen:storageProfile.imageReference.sku, sec:securityProfile.securityType}" \
    -o tsv
done

# Create a new Gen2 VM with Trusted Launch (Secure Boot + vTPM).
az vm create \
  --resource-group rg-app-prod-westeu \
  --name vm-app-01 \
  --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:latest \
  --size Standard_D4s_v5 \
  --security-type TrustedLaunch \
  --enable-secure-boot true \
  --enable-vtpm true \
  --admin-username azureuser \
  --generate-ssh-keys

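# Assign a Trusted Launch initiative at the root management group to enforce the property tenant-wide.
# The built-ins below are audit-effect definitions; a deny on securityProfile.securityType
# needs a custom definition inside the initiative (verify each effect before assigning).
# Built-in: "Guest Attestation extension should be installed on supported Linux virtual machines"
# Built-in: "vTPM should be enabled on supported virtual machines"
# Built-in: "Secure Boot should be enabled on supported Windows virtual machines"
# "tenant-root" is a placeholder; the root management group ID is normally the tenant GUID.
# Assignment names at management-group scope are limited to 24 characters.
az policy assignment create \
  --name pa-tl-required \
  --scope "/providers/Microsoft.Management/managementGroups/tenant-root" \
  --policy-set-definition "/providers/Microsoft.Authorization/policySetDefinitions/<trusted-launch-initiative-id>"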

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_resource_group" "app" {
  name     = "rg-app-prod-westeu"
  location = "westeurope"
}

resource "azurerm_network_interface" "vm" {
  name                = "nic-vm-app-01"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  ip_configuration {
    name                          = "ipcfg"
    subnet_id                     = var.app_subnet_id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_linux_virtual_machine" "app" {
  name                = "vm-app-01"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  size                = "Standard_D4s_v5"
  admin_username      = "azureuser"
  network_interface_ids = [azurerm_network_interface.vm.id]

  # Trusted Launch (PITFALL 11: requires Gen2 image + Gen2 SKU)
  secure_boot_enabled = true
  vtpm_enabled        = true

  admin_ssh_key {
    username   = "azureuser"
    public_key = var.admin_ssh_public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  # Gen2 image — Trusted Launch will refuse to apply against a Gen1 SKU.
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  tags = { tier = "prod", owner = "platform-compute" }
}

# Tenant-wide enforcement at the root management group.
resource "azurerm_management_group_policy_assignment" "trusted_launch_required" {
  name                 = "trusted-launch-required"
  management_group_id  = "/providers/Microsoft.Management/managementGroups/tenant-root"
  policy_definition_id = var.trusted_launch_initiative_id
  description          = "Require Secure Boot + vTPM on all new VMs (Trusted Launch)"
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Hardened-by-default Linux VM with Trusted Launch (Secure Boot + vTPM).')
param vmName string
@description('Subnet ID for the VM NIC.')
param subnetId string
@description('Admin SSH public key.')
@secure()
param adminPublicKey string

param location string = resourceGroup().location

resource nic 'Microsoft.Network/networkInterfaces@2024-03-01' = {
  name: '${vmName}-nic'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig'
        properties: { subnet: { id: subnetId }, privateIPAllocationMethod: 'Dynamic' }
      }
    ]
  }
}

resource vm 'Microsoft.Compute/virtualMachines@2024-07-01' = {
  name: vmName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    hardwareProfile: { vmSize: 'Standard_D4ds_v5' }
    securityProfile: {
      securityType: 'TrustedLaunch'
      uefiSettings: {
        secureBootEnabled: true
        vTpmEnabled: true
      }
    }
    osProfile: {
      computerName: vmName
      adminUsername: 'azureuser'
      linuxConfiguration: {
        disablePasswordAuthentication: true
        ssh: { publicKeys: [{ path: '/home/azureuser/.ssh/authorized_keys', keyData: adminPublicKey }] }
      }
    }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical', offer: '0001-com-ubuntu-server-jammy', sku: '22_04-lts-gen2', version: 'latest'
      }
      osDisk: {
        createOption: 'FromImage'
        managedDisk: { storageAccountType: 'Premium_LRS' }
        deleteOption: 'Delete'
      }
    }
    networkProfile: { networkInterfaces: [{ id: nic.id }] }
  }
}

Remediation — Pulumi (TypeScript)

import * as pulumi from "@pulumi/pulumi";
import * as compute from "@pulumi/azure-native/compute";

// SSH public key read from stack config: pulumi config set adminPublicKey "<key>".
const adminPublicKey = new pulumi.Config().require("adminPublicKey");

new compute.VirtualMachine("vm-trusted-launch", {
  resourceGroupName: "<rg>",
  hardwareProfile: { vmSize: "Standard_D4ds_v5" },
  identity: { type: compute.ResourceIdentityType.SystemAssigned },
  // Trusted Launch: Secure Boot + vTPM (requires a Gen2 image and a Gen2-capable size).
  securityProfile: {
    securityType: compute.SecurityTypes.TrustedLaunch,
    uefiSettings: { secureBootEnabled: true, vTpmEnabled: true },
  },
  storageProfile: {
    imageReference: {
      publisher: "Canonical", offer: "0001-com-ubuntu-server-jammy",
      sku: "22_04-lts-gen2", version: "latest",
    },
    osDisk: { createOption: compute.DiskCreateOptionTypes.FromImage,
              managedDisk: { storageAccountType: compute.StorageAccountTypes.Premium_LRS } },
  },
  osProfile: {
    computerName: "vm-tl",
    adminUsername: "azureuser",
    linuxConfiguration: {
      // Password auth is disabled, so at least one SSH public key must be supplied.
      disablePasswordAuthentication: true,
      ssh: { publicKeys: [{ path: "/home/azureuser/.ssh/authorized_keys", keyData: adminPublicKey }] },
    },
  },
  networkProfile: { networkInterfaces: [{ id: "<nic-id>" }] },
});

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | 7.x (verify) | n/a | n/a | AC-3; CM-7; SC-8 | A.8.20; A.8.25 | CLD.9.5.1

Log signals

  • AzureActivity Microsoft.Compute/virtualMachines/write where the request body sets securityProfile.securityType from TrustedLaunch to Standard on an existing VM — disables Secure Boot and vTPM enforcement.
  • AzureActivity VM creation events where securityProfile.uefiSettings.secureBootEnabled is false on a workload tagged production.
  • AzureActivity scale-set updates that mass-disable vTpmEnabled across an instance fleet — fleet-wide regression.

Query

AzureActivity
| where OperationNameValue in~ ("Microsoft.Compute/virtualMachines/write", "Microsoft.Compute/virtualMachineScaleSets/write")
| extend body = tostring(parse_json(Properties).requestbody)
| where body has "\"securityType\":\"Standard\"" or body has "\"secureBootEnabled\":false" or body has "\"vTpmEnabled\":false"
| project TimeGenerated, Caller, ResourceId, body
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. Trusted Launch downgrade is rare and intentional; persist as a Sentinel analytics rule with severity Medium and require an attached governance ticket reference.

Alert threshold

  • Any flip of securityType back to Standard on a production VM — page on first occurrence.
  • Scale-set update touching more than 5 instances with vTPM or Secure Boot disablement — page; treat as fleet-scale supply-chain event.
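The two thresholds above reduce to a short decision function. The sketch below is illustrative only: the TrustedLaunchChange shape and its field names are hypothetical and would have to be populated from the AzureActivity request body and resource tags by whatever pipeline feeds the alert.

```typescript
// Hypothetical, pre-parsed view of one Trusted Launch related change event.
interface TrustedLaunchChange {
  resourceKind: "vm" | "scaleSet";
  production: boolean;              // derived from resource tags (assumption)
  newSecurityType?: string;         // securityType found in the request body, if any
  disablesSecureBootOrVtpm: boolean;
  instancesTouched: number;         // 1 for a single VM
}

// Returns true when the change meets one of the paging thresholds.
function shouldPageOnChange(change: TrustedLaunchChange): boolean {
  if (change.resourceKind === "vm") {
    // Threshold 1: any flip back to Standard on a production VM pages on first occurrence.
    return change.production && change.newSecurityType === "Standard";
  }
  // Threshold 2: scale-set update disabling vTPM or Secure Boot on more than 5 instances.
  return change.disablesSecureBootOrVtpm && change.instancesTouched > 5;
}

console.log(
  shouldPageOnChange({
    resourceKind: "scaleSet",
    production: true,
    disablesSecureBootOrVtpm: true,
    instancesTouched: 12,
  }),
);
```

Keeping the thresholds in one pure function makes them easy to unit-test before they are encoded in a Sentinel analytics rule.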

Initial response

  1. Reapply Trusted Launch via the IaC baseline; capture the AzureActivity Caller and the prior VM-resource JSON as the rollback ledger.
  2. Walk Defender for Servers Guest Configuration assessments for the affected VM — boot integrity assessment going from PASS to FAIL after the change confirms the impact.
  3. Escalate per general/ir.html — confirm Azure Policy Virtual machines should have Secure Boot enabled and Virtual machines should have vTPM enabled remain in deny mode at the management group.

Equivalent on: AWS · GCP · OCI

azure-work-02-jit-bastion ! HIGH PREVENTIVE

Remote administrative access to Azure VMs must terminate at Azure Bastion Standard, a managed Microsoft service deployed into a dedicated AzureBastionSubnet that brokers RDP and SSH over TLS from the Azure portal (and from native clients via the Standard SKU), with no public IP and no NSG rule on TCP 22 or 3389 on the target VM. For workloads that for licensing or vendor-support reasons cannot front via Bastion, Defender for Cloud Just-in-Time VM Access serves as the compensating control: the NSG rule for the management port is normally absent and is added for a bounded window (typically ≤3 hours) on a per-request basis, authorised through Defender for Cloud RBAC and audited in the Activity Log (Microsoft Learn: Azure Bastion overview (accessed 2026-05); Microsoft Learn: Defender for Cloud Just-in-Time VM Access (accessed 2026-05)).

Bastion Standard adds native-client support (az network bastion ssh / tunnel) and scale (host scale units for high-throughput tenants); Bastion Basic is acceptable for lower-tier subscriptions but loses native-client and scaling. Session recording requires the Premium SKU, so plan for Premium where recorded sessions are a requirement; Bastion audit logs can be routed to Log Analytics through diagnostic settings. Conditional Access policies on the Microsoft Entra users authorised to launch Bastion sessions add MFA + device-compliance enforcement to the management plane.

Anti-conflation vs the AWS sibling: aws-work-02-ssm-session-manager uses an SSM agent inside the instance to broker the session through the AWS API plane (no inbound network path at all); Azure Bastion uses a managed bastion host in the dedicated AzureBastionSubnet of the customer VNet that brokers the session over TLS. The threat posture is equivalent (no exposed management port on the target); the architectural primitive is different.
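The "no NSG rule on TCP 22 or 3389" condition is easy to get wrong when rules use port ranges or the wildcard. The following sketch shows one way to test an exported rule for exposure; the NsgRule shape mirrors a subset of the ARM security-rule properties, while exposesAdminPort and portSpecCovers are hypothetical helper names, and the set of "open" source prefixes is an assumption to adjust locally.

```typescript
// Hypothetical subset of an NSG security rule as exported from ARM.
interface NsgRule {
  name: string;
  direction: "Inbound" | "Outbound";
  access: "Allow" | "Deny";
  sourceAddressPrefix?: string;
  destinationPortRange?: string;     // single port, range "a-b", or "*"
  destinationPortRanges?: string[];  // list form of the same
}

const ADMIN_PORTS: number[] = [22, 3389];

// True when a port spec ("22", "20-25", "*") includes the given port.
function portSpecCovers(spec: string, port: number): boolean {
  if (spec === "*") return true;
  const parts = spec.split("-");
  const low = parseInt(parts[0] ?? "", 10);
  const high = parts.length > 1 ? parseInt(parts[1] ?? "", 10) : low;
  return !isNaN(low) && !isNaN(high) && low <= port && port <= high;
}

// True when the rule allows inbound SSH or RDP from an Internet-wide source.
function exposesAdminPort(rule: NsgRule): boolean {
  if (rule.direction !== "Inbound" || rule.access !== "Allow") return false;
  const source = rule.sourceAddressPrefix ?? "";
  // Assumption: these three prefixes are what counts as "open to the Internet".
  const openSource = source === "*" || source === "Internet" || source === "0.0.0.0/0";
  if (!openSource) return false;
  const specs: string[] = (rule.destinationPortRanges ?? []).slice();
  if (rule.destinationPortRange) specs.push(rule.destinationPortRange);
  return specs.some((spec) => ADMIN_PORTS.some((port) => portSpecCovers(spec, port)));
}

console.log(
  exposesAdminPort({
    name: "allow-mgmt",
    direction: "Inbound",
    access: "Allow",
    sourceAddressPrefix: "*",
    destinationPortRange: "20-25",
  }),
);
```

The range handling matters because a rule for 20-25 exposes SSH without ever mentioning port 22 literally, which a plain string match would miss.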

Remediation — Azure CLI

# Azure CLI 2.x
# Audit: enumerate NSG rules that allow 22/3389 inbound from the Internet tag or any source.
# Checks both the single-port and list-form port properties; ranges such as "20-25" are not
# expanded by this query and need a separate review.
for sub in $(az account list --query '[].id' -o tsv); do
  az network nsg list --subscription "$sub" \
    --query "[].securityRules[?direction=='Inbound' && access=='Allow' && (sourceAddressPrefix=='Internet' || sourceAddressPrefix=='*') && (destinationPortRange=='22' || destinationPortRange=='3389' || destinationPortRange=='*' || contains(destinationPortRanges, '22') || contains(destinationPortRanges, '3389'))].id | []" \
    -o tsv
done

# Deploy Azure Bastion (Standard) into the dedicated AzureBastionSubnet.
az network vnet subnet create \
  --resource-group rg-net-hub-westeu \
  --vnet-name vnet-hub-westeu \
  --name AzureBastionSubnet \
  --address-prefixes 10.0.255.0/26

az network public-ip create \
  --resource-group rg-net-hub-westeu \
  --name pip-bastion-hub \
  --sku Standard --allocation-method Static

az network bastion create \
  --resource-group rg-net-hub-westeu \
  --name bastion-hub \
  --vnet-name vnet-hub-westeu \
  --public-ip-address pip-bastion-hub \
  --sku Standard \
  --enable-tunneling true

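# Enable JIT VM Access on residual VMs that cannot front via Bastion.
# The az security jit-policy command group reads existing policies (list / show), so the policy
# is written here through the Microsoft.Security REST API; verify the api-version before use.
az rest --method put \
  --uri "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/rg-app-legacy-westeu/providers/Microsoft.Security/locations/westeurope/jitNetworkAccessPolicies/default?api-version=2020-01-01" \
  --body '{"kind":"Basic","properties":{"virtualMachines":[{"id":"<vm-resource-id>","ports":[{"number":22,"protocol":"TCP","allowedSourceAddressPrefix":"<corp-vpn-cidr>","maxRequestAccessDuration":"PT3H"}]}]}}'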

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_subnet" "bastion" {
  name                 = "AzureBastionSubnet"
  resource_group_name  = azurerm_resource_group.net.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.255.0/26"]
}

resource "azurerm_public_ip" "bastion" {
  name                = "pip-bastion-hub"
  resource_group_name = azurerm_resource_group.net.name
  location            = azurerm_resource_group.net.location
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_bastion_host" "hub" {
  name                = "bastion-hub"
  resource_group_name = azurerm_resource_group.net.name
  location            = azurerm_resource_group.net.location
  sku                 = "Standard"

  # Standard-SKU features
  tunneling_enabled       = true
  copy_paste_enabled      = true
  file_copy_enabled       = false
  shareable_link_enabled  = false
  ip_connect_enabled      = true
  scale_units             = 2

  ip_configuration {
    name                 = "ipcfg"
    subnet_id            = azurerm_subnet.bastion.id
    public_ip_address_id = azurerm_public_ip.bastion.id
  }
}

# Defender for Cloud JIT VM Access: compensating control for legacy VMs.
# The JIT policy itself is not declared in this Terraform example; author it outside Terraform
# (see the Azure CLI section). The azurerm_security_center_subscription_pricing resource
# enables the Defender for Servers plan that licenses the feature.
resource "azurerm_security_center_subscription_pricing" "servers" {
  tier          = "Standard"
  resource_type = "VirtualMachines"
  subplan       = "P2"
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Azure Bastion (Standard SKU) replaces public RDP/SSH.')
param bastionName string
@description('Subnet named AzureBastionSubnet (/26).')
param bastionSubnetId string
@description('Standard SKU public IP resource ID.')
param publicIpId string

param location string = resourceGroup().location

resource bastion 'Microsoft.Network/bastionHosts@2024-03-01' = {
  name: bastionName
  location: location
  sku: { name: 'Standard' }
  properties: {
    disableCopyPaste: false
    enableTunneling: true
    enableShareableLink: false
    enableIpConnect: false
    ipConfigurations: [
      {
        name: 'bastion-ipconfig'
        properties: {
          subnet: { id: bastionSubnetId }
          publicIPAddress: { id: publicIpId }
        }
      }
    ]
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | 7.x (verify) | n/a | n/a | AC-17; AC-17(3); AU-2 | A.8.5; A.8.15 | CLD.9.5.1

Log signals

  • AzureActivity Microsoft.Security/locations/jitNetworkAccessPolicies/delete on a JIT policy that previously gated VM admin access — silently removes the time-bound NSG-open path.
  • AzureActivity Microsoft.Network/bastionHosts/delete on a Bastion attached to a production VNet — operators may pivot to direct SSH/RDP rules instead.
  • MicrosoftAzureBastionAuditLogs entries showing session disconnects spike followed by AzureActivity NSG-rule writes that open TCP 22/3389 on the same VNet — supply-chain pivot from Bastion to raw NSG access.

Query

AzureActivity
| where OperationNameValue in~ ("Microsoft.Security/locations/jitNetworkAccessPolicies/delete", "Microsoft.Network/bastionHosts/delete", "Microsoft.Network/networkSecurityGroups/securityRules/write")
| extend body = tostring(parse_json(Properties).requestbody)
| where OperationNameValue endswith "/delete" or (body has "\"22\"" or body has "\"3389\"")
| project TimeGenerated, Caller, ResourceId, OperationNameValue, body
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. Pair with the Bastion audit-log baseline — sustained Bastion-session activity that abruptly drops while NSG admin rules appear is the canonical pivot signal.

Alert threshold

  • JIT policy delete on a production VM — page immediately.
  • Bastion delete followed within 24h by NSG admin-port Allow on the same VNet — page; the operator path has shifted off the audited surface.
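The second threshold is a time-window correlation between two event types on the same VNet. A minimal sketch of that correlation follows; the ActivityEvent shape and the findBastionPivots name are hypothetical, and the events are assumed to have been classified and tagged with a VNet ID upstream (for example from the KQL results).

```typescript
// Hypothetical, pre-classified activity event.
interface ActivityEvent {
  time: string; // ISO 8601 timestamp
  operation: "bastionDelete" | "nsgAdminPortAllow";
  vnetId: string;
}

// Returns NSG admin-port Allow events that follow a Bastion delete on the
// same VNet within the window (24 hours by default).
function findBastionPivots(events: ActivityEvent[], windowHours: number = 24): ActivityEvent[] {
  const windowMs = windowHours * 60 * 60 * 1000;
  const deletes = events.filter((e) => e.operation === "bastionDelete");
  return events.filter((e) => {
    if (e.operation !== "nsgAdminPortAllow") return false;
    const allowTime = Date.parse(e.time);
    return deletes.some((d) => {
      const delta = allowTime - Date.parse(d.time);
      // The Allow must come after the delete, inside the window, on the same VNet.
      return d.vnetId === e.vnetId && delta >= 0 && delta <= windowMs;
    });
  });
}

const sampleEvents: ActivityEvent[] = [
  { time: "2026-05-01T00:00:00Z", operation: "bastionDelete", vnetId: "vnet-hub" },
  { time: "2026-05-01T10:00:00Z", operation: "nsgAdminPortAllow", vnetId: "vnet-hub" },
  { time: "2026-05-01T11:00:00Z", operation: "nsgAdminPortAllow", vnetId: "vnet-other" },
];
console.log(findBastionPivots(sampleEvents).length);
```

Only the first Allow in the sample is returned, because the second one targets a VNet that never lost its Bastion.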

Initial response

  1. Restore the JIT policy and the Bastion via the IaC baseline; delete the NSG admin-port Allow rule introduced in the gap window.
  2. Walk SigninLogs and AzureActivity for the exposure window for any VM management operation issued from a principal that did not also use Bastion — these are candidate raw-port-access events.
  3. Escalate per general/ir.html — confirm Azure Policy Internet-facing virtual machines should be protected with NSGs and Just-in-time access should be enabled for VMs remain in deny mode.

Equivalent on: AWS · GCP · OCI

azure-work-03-acr-scanning ! HIGH DETECTIVE

Every Azure Container Registry holding images destined for production workloads must run on the Premium SKU with quarantine_policy_enabled = true (the push is held in a quarantined state until a vulnerability scan clears or an explicit override is recorded), with Microsoft Defender for Containers enabled at the subscription so that registry images and running cluster images are scanned continuously (agent-based for AKS, agentless for non-AKS targets), and with the registry content-trust / signed-images policy enabled so downstream consumers (AKS, Container Apps, Function App container deployments) refuse unsigned tags (Microsoft Learn — ACR content trust (accessed 2026-05); Microsoft Learn — Defender for Containers (accessed 2026-05)). ACR Tasks with base-image-trigger-enabled = true automatically rebuild downstream tags when a base image is updated, closing the "old node-based image, fresh CVE in libcrypto" gap that bites long-lived production images. Defender for Containers replaces the legacy "Defender for container registries" plan and additionally provides Kubernetes-runtime threat detection, admission controller integration, and Kubernetes posture (CIS Kubernetes Benchmark assessment) — but the registry-scanning slice is the part that earns this control's HIGH DETECTIVE severity. Pair-control prose: this control covers the image supply chain at the registry boundary; azure-work-06 covers the AKS cluster posture that consumes those images; azure-data-07 covers sensitive-data discovery inside the running data plane. The three controls are complementary; image scanning at the registry is not a substitute for runtime detection in the cluster.

Remediation — Azure CLI

# Azure CLI 2.x
# Upgrade ACR to Premium and enable quarantine + content trust.
az acr update \
  --name acrprodweu \
  --sku Premium

az acr config content-trust update \
  --registry acrprodweu \
  --status enabled

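# Retention policy: purge untagged manifests after 30 days (separate from quarantine).
az acr config retention update --registry acrprodweu --status enabled --days 30 --type UntaggedManifests

# Quarantine policy: hold push until vulnerability scan clears. Quarantine is set through the
# ARM property rather than a dedicated az acr verb; use ARM/Terraform for the canonical
# declaration (below), or patch the property directly as sketched here.
az resource update \
  --ids "$(az acr show --name acrprodweu --query id -o tsv)" \
  --set properties.policies.quarantinePolicy.status=enabled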

# Enable Defender for Containers at the subscription.
az security pricing create \
  --name Containers \
  --tier Standard

# ACR Task: rebuild on base-image update.
az acr task create \
  --registry acrprodweu \
  --name app-base-rebuild \
  --image app:{{.Run.ID}} \
  --context https://github.com/example/app.git \
  --file Dockerfile \
  --base-image-trigger-enabled true \
  --commit-trigger-enabled true \
  --git-access-token "$GH_PAT"

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_container_registry" "prod" {
  name                = "acrprodweu"
  resource_group_name = azurerm_resource_group.app.name
  location            = azurerm_resource_group.app.location
  sku                 = "Premium"
  admin_enabled       = false

  # Quarantine: hold image push until scan clears (Premium only).
  quarantine_policy_enabled = true

  # Content trust: signed images required.
  trust_policy {
    enabled = true
  }

  # Vulnerability scanning is delivered via Defender for Containers — see plan below.
  retention_policy {
    days    = 30
    enabled = true
  }

  # No public network access; consume via Private Endpoint.
  public_network_access_enabled = false
  network_rule_set {
    default_action = "Deny"
  }
}

# Subscription-level Defender for Containers plan enables registry + runtime scanning.
resource "azurerm_security_center_subscription_pricing" "containers" {
  tier          = "Standard"
  resource_type = "Containers"
}

# ACR Task: rebuild on base-image change.
resource "azurerm_container_registry_task" "rebuild" {
  name                  = "app-base-rebuild"
  container_registry_id = azurerm_container_registry.prod.id

  platform {
    os = "Linux"
  }

  docker_step {
    dockerfile_path      = "Dockerfile"
    context_path         = "https://github.com/example/app.git"
    context_access_token = var.github_pat
    image_names          = ["app:{{.Run.ID}}"]
  }

  base_image_trigger {
    name                        = "default-base-image-trigger"
    type                        = "Runtime"
    enabled                     = true
    update_trigger_payload_type = "Default"
  }
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Container Registry with Defender vuln scanning + content trust + public access disabled.')
param acrName string

param location string = resourceGroup().location

resource acr 'Microsoft.ContainerRegistry/registries@2023-11-01-preview' = {
  name: acrName
  location: location
  sku: { name: 'Premium' }
  identity: { type: 'SystemAssigned' }
  properties: {
    adminUserEnabled: false
    publicNetworkAccess: 'Disabled'
    policies: {
      trustPolicy:     { status: 'enabled', type: 'Notary' }
      retentionPolicy: { status: 'enabled', days: 90 }
      quarantinePolicy: { status: 'enabled' }
      exportPolicy:    { status: 'disabled' }
    }
    networkRuleSet: { defaultAction: 'Deny' }
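    // Platform-managed encryption at rest is the default. Set encryption.status to 'enabled'
    // only together with keyVaultProperties (customer-managed key); alone it fails deployment.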
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | RA-5; SI-3; SA-11 | A.8.8; A.8.29 | CLD.12.4.5

Log signals

  • ContainerRegistryRepositoryEvents showing operationName = "Push" on an ACR repository where the subsequent Defender for Containers scan report never lands — scan pipeline disconnected.
  • AzureActivity Microsoft.ContainerRegistry/registries/write where the request body removes the policies.quarantinePolicy setting — disables the quarantine-on-push gate.
  • AzureDiagnostics Category ContainerRegistryLoginEvents showing pulls from identities outside the documented runtime principals — supply-chain abuse of registry credentials.

Query

ContainerRegistryRepositoryEvents
| where OperationName == "Push"
| join kind=leftanti (
    SecurityAlert
    | where AlertName has "Container image"
    | extend repo = tostring(parse_json(Entities)[0].name)
) on $left.Repository == $right.repo
| project TimeGenerated, Repository, ImageTag, Identity
| order by TimeGenerated desc
| take 200

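Run as a KQL query in Log Analytics. The anti-join surfaces image pushes for which no matching Defender result exists for the repository, which is a coverage gap rather than a per-event policy violation. The query as written has no time bound, so scope it to the expected scan SLA (60 minutes in the alert threshold) before persisting it as a Sentinel analytics rule, and pair it with daily ACR inventory reconciliation.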

Alert threshold

  • Image push without a matching Defender scan report within 60 minutes — page; the runtime fleet may already be pulling an unscanned image.
  • ACR pull from an identity outside the documented runtime principal set — page on first occurrence; treat as registry credential abuse.
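
The 60-minute threshold above can be checked mechanically once push and scan timestamps are available. The following is a minimal sketch in plain shell, assuming both timestamps have already been extracted as epoch seconds (how they are exported from Log Analytics is not covered here). The function name scan_overdue and its argument order are illustrative, not part of any Azure tooling.

```shell
# Exit 0 when a push still has no scan result after the SLA window has elapsed.
# Arguments (epoch seconds): push time, scan time (empty string if none),
# current time, and an optional SLA in seconds (defaults to 3600).
scan_overdue() {
  local pushed="$1" scanned="${2:-}" now="$3" sla_seconds="${4:-3600}"
  # A scan result exists, so the push is not (or no longer) overdue.
  [ -n "$scanned" ] && return 1
  # No scan result: overdue only once the elapsed time exceeds the SLA.
  [ $(( now - pushed )) -gt "$sla_seconds" ]
}
```

A caller would page when `scan_overdue "$push_epoch" "$scan_epoch" "$(date +%s)"` succeeds; passing a fourth argument tightens or relaxes the window per registry.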

Initial response

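  1. Confirm the pushed digest via az acr repository show --name {acr} --image {repo}:{tag} (this command only reads image metadata; it does not start a scan), request a re-scan through Defender for Containers, and block pulls with az acr repository update --name {acr} --image {repo}:{tag} --read-enabled false until results return clean.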
  2. Walk ContainerRegistryLoginEvents for unauthorised pulls; rotate the registry credentials for any non-managed-identity principal that was active during the exposure window.
  3. Escalate per general/ir.html — confirm Azure Policy Container registries should have vulnerability scan completed remains in audit-deny mode and that downstream AKS clusters pull only from the documented ACR.

References

Equivalent on: AWS · GCP · OCI

azure-work-04-defender-for-servers ! HIGH DETECTIVE

Microsoft Defender for Servers Plan 2 must be enabled subscription-wide for every subscription containing production VMs. Plan 2 auto-deploys Microsoft Defender for Endpoint (the EDR engine, formerly Microsoft Defender Advanced Threat Protection) to every Linux and Windows VM in scope, runs Defender Vulnerability Management for continuous OS and application vulnerability assessment, licenses the Just-in-Time VM Access feature exercised by azure-work-02, enables File Integrity Monitoring (FIM) on a configurable set of paths and registry keys, and provisions Adaptive Application Controls (an allow-list of known-good binaries per VM group); see Microsoft Learn — Defender for Servers plans (accessed 2026-05). Plan 1 covers Defender for Endpoint deployment only and is acceptable for low-tier subscriptions; Plan 2 is the production baseline because the vulnerability-assessment, FIM, and adaptive-application-controls bundle is what makes the EDR signal actionable. Paired control: azure-log-04 enables the Defender for Cloud workload-protection plans subscription-wide as a posture and licensing decision, while azure-work-04 covers the per-server-class hardening consequences of running Plan 2 specifically (FIM rule sets, adaptive-application-control allow-lists, the JIT VM Access integration). The two controls are deliberately separated because they answer different operational questions: which plan tier, versus how to use it on a given fleet. Anti-conflation vs the AWS sibling: aws-work-04-inspector-org covers Amazon Inspector (continuous vulnerability assessment), whereas Defender for Servers Plan 2 bundles vulnerability assessment, EDR, FIM, JIT, and adaptive application controls in a single subscription-level plan; the equivalence is at the "continuous workload protection" layer, not at the per-feature level.

Remediation — Azure CLI

# Azure CLI 2.x
# Enable Defender for Servers Plan 2 on the current subscription.
az security pricing create \
  --name VirtualMachines \
  --tier Standard \
  --subplan P2

# Verify the plan and the included extensions (MDE, vulnerability assessment, FIM).
az security pricing show --name VirtualMachines -o jsonc

# File Integrity Monitoring (FIM) rules are configured in Defender for Cloud, not by this command.
# Confirm the Log Analytics workspace that the Defender plan ships data into before authoring them.
az monitor log-analytics workspace show \
  --resource-group rg-mgmt-prod-westeu \
  --workspace-name law-mgmt-prod

# Adaptive Application Controls: list current recommendations, then enforce per VM group.
az security adaptive-application-controls list -o table

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_security_center_subscription_pricing" "servers_p2" {
  tier          = "Standard"
  resource_type = "VirtualMachines"
  subplan       = "P2"
}

# Defender for Endpoint auto-provisioning is enabled by the plan, but the integration
# with the Defender for Cloud workspace must be authored explicitly.
resource "azurerm_security_center_setting" "mde_integration" {
  setting_name = "WDATP"
  enabled      = true
}

# Pin the Log Analytics workspace that Defender ships data into.
resource "azurerm_security_center_workspace" "default" {
  scope        = "/subscriptions/${var.subscription_id}"
  workspace_id = azurerm_log_analytics_workspace.mgmt.id
}

# Tenant-wide enforcement: Defender for Servers Plan 2 required on every subscription.
resource "azurerm_management_group_policy_assignment" "defender_servers_required" {
  name                 = "defender-servers-p2-required"
  management_group_id  = "/providers/Microsoft.Management/managementGroups/tenant-root"
  policy_definition_id = var.defender_servers_p2_policy_id
  description          = "Require Defender for Servers Plan 2 on all subscriptions"
}

Remediation — Bicep

targetScope = 'subscription'

resource defenderServers 'Microsoft.Security/pricings@2024-01-01' = {
  name: 'VirtualMachines'
  properties: {
    pricingTier: 'Standard'
    subPlan: 'P2'
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | 2.1 | n/a | n/a | RA-5; SI-4 | A.8.8 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.Security/pricings/write where name = "VirtualMachines" and the request body sets pricingTier = "Free" — disarms the Defender for Servers agent on the subscription.
  • AzureActivity Microsoft.Compute/virtualMachines/extensions/delete targeting the MDE.Linux or MDE.Windows extension — uninstalls the EDR agent on a single VM.
  • SecurityAlert table volume drop per VM compared to a 30-day baseline — agent health regression that the per-extension delete may not explain (transient sensor failure).

Query

AzureActivity
          // in~ is case-insensitive; AzureActivity commonly stores OperationNameValue in upper case.
          | where OperationNameValue in~ ("Microsoft.Security/pricings/write", "Microsoft.Compute/virtualMachines/extensions/delete")
          | extend body = tostring(parse_json(Properties).requestbody)
          | where (body has "\"VirtualMachines\"" and body has "\"Free\"") or ResourceId has "MDE."
          | project TimeGenerated, Caller, ResourceId, OperationNameValue, body
          | order by TimeGenerated desc
          | take 200

Run as a KQL query in Log Analytics. Pair with a daily Defender for Servers coverage report grouped by subscription — coverage drops are easier to spot at fleet scale than per-extension uninstalls.

Alert threshold

  • Plan flip from Standard P2 to Free on a production subscription — page on first occurrence.
  • MDE extension delete on a production VM — page; the VM is now off-EDR.

Initial response

  1. Reapply the Defender for Servers plan via az security pricing create --name VirtualMachines --tier Standard --subplan P2 (omitting --subplan may not restore Plan 2); the MDE extension should redeploy via auto-provisioning within an hour.
  2. If the extension delete was operator-driven, reinstall via az vm extension set --publisher Microsoft.Azure.AzureDefenderForServers --name MDE.Linux (or MDE.Windows for Windows guests); capture the AzureActivity Caller as the actor of record.
  3. Escalate per general/ir.html — confirm Azure Policy Configure servers to enable Microsoft Defender for Endpoint remains in DeployIfNotExists mode at the management-group root.

References

Equivalent on: AWS · GCP · OCI

azure-work-05-function-managed-identity ! HIGH PREVENTIVE

Every Azure Function App and App Service Web App must run with a system-assigned managed identity: a service principal whose lifecycle is bound to the resource and whose credentials Azure rotates automatically, so there are no shared secrets, no connection strings sitting in app settings, and no Key Vault client IDs and client secrets baked into deployment slots. Secrets the application needs (database passwords, third-party API keys, SAS tokens) are referenced via Key Vault references in the form @Microsoft.KeyVault(SecretUri=https://<vault>.vault.azure.net/secrets/<name>/<version>), with the managed identity granted Key Vault Secrets User on the target vault. HTTP-triggered endpoints that are publicly reachable must have function-key authentication disabled in favour of Easy Auth / Microsoft Entra authentication (the App Service Authentication / Authorization v2 surface, configured in Terraform via auth_settings_v2), pinning the identity provider to Microsoft Entra ID and refusing unauthenticated calls (Microsoft Learn — App Service authentication and authorization (accessed 2026-05); Microsoft Learn — Key Vault references in App Service / Functions (accessed 2026-05)). The canonical Key Vault and secret-rotation reference architecture and the underlying threat model live on general/iam.html — secrets management; this control covers only the Azure-specific how-to (which app setting syntax, which managed-identity flavour, which Key Vault role assignment, which Easy Auth surface) and does not repeat the canonical content. Cross-link to azure/iam.html — managed identities for the underlying identity-plane primitive that this control consumes.
Anti-conflation vs the AWS sibling: aws-work-05-lambda-least-priv covers Lambda execution-role least privilege and function-URL auth; azure-work-05 covers the equivalent Function App / App Service managed identity + Key Vault references + Easy Auth surface — equivalent threat posture (no long-lived secrets in the runtime), different IAM primitives.
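
The Key Vault reference syntax described above is easy to get subtly wrong by hand, so a small helper can generate and check it. This is a minimal sketch, assuming the commercial-cloud vault.azure.net suffix; the function names kv_reference and is_kv_reference are illustrative and not part of the Azure CLI.

```shell
# Build an app-setting value that points at a Key Vault secret.
# Usage: kv_reference <vault-name> <secret-name> [version]
# With no version the URI ends in a trailing slash, the same unversioned
# form used in the remediation examples on this page.
kv_reference() {
  local vault="$1" secret="$2" version="${3:-}"
  printf '@Microsoft.KeyVault(SecretUri=https://%s.vault.azure.net/secrets/%s/%s)\n' \
    "$vault" "$secret" "$version"
}

# Exit 0 only when a setting value is a Key Vault reference rather than an inline secret.
is_kv_reference() {
  case "$1" in
    '@Microsoft.KeyVault('*')') return 0 ;;
    *) return 1 ;;
  esac
}
```

The generated value can be passed straight to the app-settings command, for example `--settings "SQL_CONN_STR=$(kv_reference kv-orders-prod sql-conn)"`, and is_kv_reference can gate a pipeline step that rejects inline connection strings.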

Remediation — Azure CLI

# Azure CLI 2.x
# Assign a system-assigned managed identity to an existing Function App.
az functionapp identity assign \
  --resource-group rg-app-prod-westeu \
  --name func-orders-prod

PRINCIPAL_ID=$(az functionapp identity show \
  --resource-group rg-app-prod-westeu \
  --name func-orders-prod \
  --query principalId -o tsv)

# Grant the identity the Key Vault Secrets User role on the target vault.
az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope "/subscriptions/$SUB/resourceGroups/rg-sec-prod-westeu/providers/Microsoft.KeyVault/vaults/kv-orders-prod"

# Replace the inline secret with a Key Vault reference.
az functionapp config appsettings set \
  --resource-group rg-app-prod-westeu \
  --name func-orders-prod \
  --settings "SQL_CONN_STR=@Microsoft.KeyVault(SecretUri=https://kv-orders-prod.vault.azure.net/secrets/sql-conn/)"

# Lock public HTTP triggers behind Microsoft Entra (Easy Auth v2).
# These flags come from the authV2 CLI extension (az extension add --name authV2);
# --action is an alias of --unauthenticated-client-action, so only one of them is passed.
az webapp auth update \
  --resource-group rg-app-prod-westeu \
  --name func-orders-prod \
  --enabled true \
  --unauthenticated-client-action Return401 \
  --redirect-provider AzureActiveDirectory

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_linux_function_app" "orders" {
  name                       = "func-orders-prod"
  resource_group_name        = azurerm_resource_group.app.name
  location                   = azurerm_resource_group.app.location
  service_plan_id            = azurerm_service_plan.app.id
  storage_account_name       = azurerm_storage_account.func.name
  storage_account_access_key = azurerm_storage_account.func.primary_access_key

  # System-assigned managed identity — credentials rotated by Azure.
  identity {
    type = "SystemAssigned"
  }

  # Key Vault references in lieu of inline secrets.
  app_settings = {
    SQL_CONN_STR        = "@Microsoft.KeyVault(SecretUri=https://kv-orders-prod.vault.azure.net/secrets/sql-conn/)"
    THIRD_PARTY_API_KEY = "@Microsoft.KeyVault(SecretUri=https://kv-orders-prod.vault.azure.net/secrets/partner-api/)"
    FUNCTIONS_WORKER_RUNTIME = "dotnet-isolated"
  }

  # Easy Auth v2 — refuse unauthenticated HTTP traffic.
  auth_settings_v2 {
    auth_enabled           = true
    require_authentication = true
    unauthenticated_action = "Return401"
    default_provider       = "azureactivedirectory"

    active_directory_v2 {
      client_id                  = var.entra_app_client_id
      tenant_auth_endpoint       = "https://login.microsoftonline.com/${var.tenant_id}/v2.0"
      www_authentication_disabled = false
    }

    login {
      token_store_enabled = true
    }
  }

  site_config {
    ftps_state = "Disabled"
    http2_enabled = true
    minimum_tls_version = "1.2"
  }
}

# Grant the system-assigned identity Key Vault Secrets User on the target vault.
resource "azurerm_role_assignment" "func_kv_reader" {
  scope                = azurerm_key_vault.orders.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_linux_function_app.orders.identity[0].principal_id
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Function App with system-assigned identity and no FTP / SCM basic auth.')
param functionAppName string
@description('App Service Plan resource ID.')
param appServicePlanId string

param location string = resourceGroup().location

resource fn 'Microsoft.Web/sites@2023-12-01' = {
  name: functionAppName
  location: location
  kind: 'functionapp,linux'
  identity: { type: 'SystemAssigned' }
  properties: {
    serverFarmId: appServicePlanId
    httpsOnly: true
    siteConfig: {
      minTlsVersion: '1.2'
      ftpsState: 'Disabled'
      scmIpSecurityRestrictionsUseMain: true
      publicNetworkAccess: 'Disabled'
    }
  }
}

resource ftpBasic 'Microsoft.Web/sites/basicPublishingCredentialsPolicies@2023-12-01' = {
  parent: fn
  name: 'ftp'
  properties: { allow: false }
}

resource scmBasic 'Microsoft.Web/sites/basicPublishingCredentialsPolicies@2023-12-01' = {
  parent: fn
  name: 'scm'
  properties: { allow: false }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | AC-6; SC-12; SC-28 | A.5.15; A.8.24 | n/a

Log signals

  • AzureActivity Microsoft.Web/sites/write where the request body sets identity.type = "None" on a Function App that previously had a SystemAssigned identity.
  • AzureActivity Microsoft.Web/sites/config/appsettings/write where the request body adds connection-string secrets that should have come from a managed-identity-mediated Key Vault reference.
  • FunctionAppLogs records showing DefaultAzureCredential falling back to environment-variable provider — runtime symptom that the managed identity path has broken.

Query

AzureActivity
          // in~ is case-insensitive; AzureActivity commonly stores OperationNameValue in upper case.
          | where OperationNameValue in~ ("Microsoft.Web/sites/write", "Microsoft.Web/sites/config/appsettings/write")
          | extend body = tostring(parse_json(Properties).requestbody)
          | where body has "\"identity\":{\"type\":\"None\"" or body has "AccountKey=" or body has "ConnectionString="
          | project TimeGenerated, Caller, ResourceId, OperationNameValue, body
          | order by TimeGenerated desc
          | take 200

Run as a KQL query in Log Analytics. Function workloads should pull credentials from Key Vault via managed identity, so connection-string appsetting writes are a regression signal even when the identity flag is unchanged.

Alert threshold

  • Function identity removal in production — page on first occurrence.
  • Connection-string appsetting write to a Function App that previously used Key Vault references — page; rotate the underlying credential immediately.

Initial response

  1. Restore the managed identity via the IaC pipeline; replace connection-string appsettings with @Microsoft.KeyVault(...) references.
  2. Walk Key Vault AuditEvent for secret reads issued by the Function App identity during the exposure window — if no reads occurred, the workload was using embedded credentials.
  3. Escalate per general/ir.html — confirm Azure Policy Function apps should use managed identity remains in deny mode at the management-group root.

References

Equivalent on: AWS · GCP · OCI

azure-work-06-aks-workload-identity ! HIGH PREVENTIVE

This is a deliberate umbrella control covering five AKS hardening surfaces that are operationally inseparable in a green-field cluster: (a) Microsoft Entra Workload ID via OIDC federation (workload_identity_enabled = true + oidc_issuer_enabled = true) so pods authenticate to Azure resources without long-lived service-principal secrets (Pod Identity was deprecated on 24 October 2022 and the managed add-on is patched only until September 2025); (b) Azure CNI Powered by Cilium with Cilium Network Policy (network_data_plane = "cilium", network_policy = "cilium") for eBPF-backed L4 + L7 default-deny network policy (Azure NPM is retiring in September 2028 and new clusters must not adopt it); (c) private API server endpoint (private_cluster_enabled = true) so the Kubernetes control plane has no public IP and is reachable only over a private endpoint; (d) Microsoft Defender for Containers integration (the agent-based DaemonSet for runtime threat detection plus the agentless image-scanning surface from azure-work-03); and (e) Microsoft Entra-integrated AKS RBAC with Azure RBAC for Kubernetes Authorization so cluster role assignments are managed through Entra group membership instead of through cluster-local kubeconfig files. See Microsoft Learn — Azure Kubernetes Service Workload ID (accessed 2026-05), Microsoft Learn — Azure CNI Powered by Cilium (accessed 2026-05), and Microsoft Learn — Defender for Containers (accessed 2026-05). The umbrella is intentional: each sub-feature is necessary, none is sufficient on its own, and splitting them into separate controls would produce five boxes that all share the same Terraform azurerm_kubernetes_cluster resource and the same threat model. The decision mirrors the aws-work-06-eks-pod-identity umbrella for EKS (Pod Identity + IRSA disposition + audit logging + private endpoint) and applies the same anti-fragmentation discipline to AKS.
Out of scope (deferred to v2): Istio AKS add-on / service-mesh hardening, OPA Gatekeeper / Kyverno admission control, and Defender for Containers runtime detection signature catalogues. The runtime-detection plane is enabled by Defender for Containers; signature management lives downstream.
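
Surface (a) depends on two strings lining up exactly: the federated-credential subject and the ServiceAccount annotation that carries the identity's client ID. The sketch below generates both from the same inputs so they cannot drift. It assumes a user-assigned managed identity whose client ID is already known; the function names wi_subject and render_service_account are illustrative.

```shell
# Federated-credential subject for a Kubernetes ServiceAccount.
# Usage: wi_subject <namespace> <service-account-name>
wi_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Render a ServiceAccount manifest carrying the workload-identity annotation.
# Usage: render_service_account <namespace> <service-account-name> <client-id>
render_service_account() {
  local namespace="$1" name="$2" client_id="$3"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${name}
  namespace: ${namespace}
  annotations:
    azure.workload.identity/client-id: "${client_id}"
EOF
}
```

In use, `render_service_account orders orders-sa "$APP_CLIENT_ID" | kubectl apply -f -` creates the account, and `wi_subject orders orders-sa` supplies the value for the federated credential's subject. Pods that should receive the projected token also need the label azure.workload.identity/use: "true" in their pod template.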

Remediation — Azure CLI

# Azure CLI 2.x
# Create a hardened AKS cluster covering all five umbrella surfaces.
# --defender-config takes a path to a JSON file; defender-config.json is a placeholder name
# for a file containing {"logAnalyticsWorkspaceResourceId": "<workspace resource ID>"}, here
# /subscriptions/$SUB/resourceGroups/rg-mgmt-prod-westeu/providers/Microsoft.OperationalInsights/workspaces/law-mgmt-prod.
# A custom private DNS zone generally requires a user-assigned identity (--assign-identity) holding
# Private DNS Zone Contributor on the zone; verify before relying on --enable-managed-identity alone.
az aks create \
  --resource-group rg-aks-prod-westeu \
  --name aks-orders-prod \
  --kubernetes-version 1.30.0 \
  --node-count 3 \
  --node-vm-size Standard_D4s_v5 \
  --enable-managed-identity \
  --enable-workload-identity \
  --enable-oidc-issuer \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --network-policy cilium \
  --enable-private-cluster \
  --private-dns-zone "/subscriptions/$SUB/resourceGroups/rg-net-hub-westeu/providers/Microsoft.Network/privateDnsZones/privatelink.westeurope.azmk8s.io" \
  --enable-defender \
  --defender-config ./defender-config.json \
  --enable-aad \
  --enable-azure-rbac \
  --aad-admin-group-object-ids "<entra-admin-group-object-id>"

# Federate a Kubernetes ServiceAccount to a user-assigned managed identity (Workload ID).
OIDC_ISSUER=$(az aks show -g rg-aks-prod-westeu -n aks-orders-prod --query oidcIssuerProfile.issuerUrl -o tsv)

az identity create -g rg-aks-prod-westeu -n mi-orders-app
APP_CLIENT_ID=$(az identity show -g rg-aks-prod-westeu -n mi-orders-app --query clientId -o tsv)

az identity federated-credential create \
  --name fc-orders \
  --identity-name mi-orders-app \
  --resource-group rg-aks-prod-westeu \
  --issuer "$OIDC_ISSUER" \
  --subject "system:serviceaccount:orders:orders-sa" \
  --audience api://AzureADTokenExchange

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_kubernetes_cluster" "orders" {
  name                = "aks-orders-prod"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
  dns_prefix          = "aks-orders-prod"
  kubernetes_version  = "1.30.0"

  # (a) Microsoft Entra Workload Identity — OIDC federation.
  workload_identity_enabled = true
  oidc_issuer_enabled       = true

  # (c) Private API server endpoint — no public IP on the control plane.
  private_cluster_enabled             = true
  private_dns_zone_id                 = azurerm_private_dns_zone.aks.id
  private_cluster_public_fqdn_enabled = false

  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name                 = "system"
    node_count           = 3
    vm_size              = "Standard_D4s_v5"
    vnet_subnet_id       = var.aks_subnet_id
    orchestrator_version = "1.30.0"
    only_critical_addons_enabled = true
  }

  # (b) Azure CNI Powered by Cilium + Cilium Network Policy.
  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    network_data_plane  = "cilium"
    network_policy      = "cilium"
    service_cidr        = "10.100.0.0/16"
    dns_service_ip      = "10.100.0.10"
  }

  # (d) Defender for Containers integration.
  microsoft_defender {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.mgmt.id
  }

  # (e) Microsoft Entra-integrated RBAC + Azure RBAC for K8s Authorization.
  azure_active_directory_role_based_access_control {
    managed                = true # needed on AzureRM 3.x for Entra-managed integration; drop it when moving to 4.x
    azure_rbac_enabled     = true
    admin_group_object_ids = [var.entra_admin_group_object_id]
  }

  # Container Insights agent shipping node and container telemetry to the central LAW.
  # Control-plane audit logs are routed separately via Diagnostic Settings (see azure/logging.html).
  oms_agent {
    log_analytics_workspace_id      = azurerm_log_analytics_workspace.mgmt.id
    msi_auth_for_monitoring_enabled = true
  }
}

# Federated identity credential bridging the AKS OIDC issuer to a managed identity.
resource "azurerm_user_assigned_identity" "orders_app" {
  name                = "mi-orders-app"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
}

resource "azurerm_federated_identity_credential" "orders" {
  name                = "fc-orders"
  resource_group_name = azurerm_resource_group.aks.name
  parent_id           = azurerm_user_assigned_identity.orders_app.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.orders.oidc_issuer_url
  subject             = "system:serviceaccount:orders:orders-sa"
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Federated credential binding a Kubernetes ServiceAccount to a UAMI.')
param uamiName string
@description('AKS cluster OIDC issuer URL.')
param aksOidcIssuerUrl string
@description('K8s namespace.')
param namespace string
@description('K8s ServiceAccount name.')
param saName string

resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: uamiName
}

resource fic 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: uami
  name: '${namespace}-${saName}'
  properties: {
    issuer: aksOidcIssuerUrl
    subject: 'system:serviceaccount:${namespace}:${saName}'
    audiences: ['api://AzureADTokenExchange']
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | n/a (post-v3.0.0) | n/a | n/a | AC-3; AC-6; SC-7 | A.5.15; A.8.20 | CLD.9.5.1

Log signals

  • AKSAuditAdmin entries creating a ServiceAccount object without the azure.workload.identity/client-id annotation on a namespace where the GitOps baseline mandates federated identity — silent regression to default token mounting.
  • AuditLogs Category = "ApplicationManagement" showing addition of a client-credential (password or certificate) to a service principal that maps to an AKS service account — adversary establishing a secret-backed parallel identity.
  • AzureActivity removal of federatedIdentityCredentials from a service principal that is the documented workload identity for a deployment.

Query

AzureActivity
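          // Federated credentials on a user-assigned managed identity live under Microsoft.ManagedIdentity; =~ is case-insensitive.
          | where OperationNameValue =~ "Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/delete" or OperationNameValue endswith "addPassword/action"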
          | project TimeGenerated, Caller, ResourceId, OperationNameValue
          | order by TimeGenerated desc
          | take 100
          | union (
              AKSAuditAdmin
              | where Verb == "create" and ObjectRef has "serviceaccounts"
              | extend annotations = tostring(parse_json(ResponseObject).metadata.annotations)
              | where not(annotations has "azure.workload.identity/client-id")
              | project TimeGenerated, ObjectRef, User
          )

Run as a KQL query in Log Analytics. The federated-credential delete is the supply-chain pivot signal; the AKSAuditAdmin annotation-absence query catches the dataplane regression independently.

Alert threshold

  • Service account creation without the workload-identity annotation in a production namespace — page on first occurrence.
  • Federated credential removal from a workload-identity-mediating service principal — page; secret fallback is the only alternative.

Initial response

  1. Reapply the ServiceAccount annotation via the GitOps source; reissue the federated credential via the IaC pipeline.
  2. Walk AuditLogs for any password-credential additions to the affected service principal during the exposure window; rotate any such credential immediately.
  3. Escalate per general/ir.html — confirm Azure Policy Kubernetes cluster pods should only use approved host network and port range set and the workload-identity admission webhook remain in enforce mode.

References

Equivalent on: AWS · GCP · OCI

azure-work-07-vm-images ! MEDIUM PREVENTIVE

Production VMs must be built from golden images produced by an Azure Image Builder pipeline and published into a Shared Image Gallery with versioning, regional replication, and Trusted-Launch-compatible (Gen2) image definitions. Image Builder accepts a Packer-style configuration (base image + customisation steps + validation steps) and produces an immutable image version into the gallery; downstream VM scale sets and individual VMs reference the gallery image by version, not by an ad-hoc managed image, so promotion and rollback are explicit and auditable. The hardening customisation steps applied during image build encode the CIS Microsoft Azure Linux / Windows benchmark hardening (sysctl, sshd_config, PAM, audit, firewall, package allow-lists) so the resulting golden image is born hardened rather than hardened after-the-fact via configuration management on a freshly booted VM. Downstream consumers (Container Registry tasks that build container base images, App Service custom images) extend the same supply-chain trust via the ACR content-trust policy from azure-work-03 — see Microsoft Learn — Shared Image Gallery (accessed 2026-05) and Microsoft Learn — Azure VM Image Builder (accessed 2026-05). Cross-link to azure-work-01: golden images must be Gen2 + Trusted-Launch-compatible (HyperVGeneration = V2 + securityType = TrustedLaunchSupported on the image definition) or downstream Trusted Launch enforcement will fail at VM creation. Implementation note: the first-class Terraform resource azurerm_image_template may not yet be GA in the AzureRM 3.x line; until then, deploy the Image Builder template via azurerm_resource_group_template_deployment referencing an ARM JSON template, or via the azapi provider. Both fallbacks are operationally equivalent and the gallery + image-definition surface remains first-class AzureRM. 
Anti-conflation vs the AWS sibling: aws-work-07-ec2-image-builder-golden-amis uses EC2 Image Builder; this control uses Azure VM Image Builder + Shared Image Gallery — equivalent supply-chain architecture (immutable, versioned, regionally replicated, signed downstream).
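
Referencing the gallery image by explicit version, as required above, comes down to building the full image-version resource ID and refusing anything that is not a pinned Major.Minor.Patch value. The following is a minimal sketch; the function names and the 1.0.0 version in the usage note are illustrative assumptions.

```shell
# Exit 0 only for a pinned Major.Minor.Patch version (digits and exactly two dots).
is_pinned_version() {
  case "$1" in
    ''|*[!0-9.]*|*.*.*.*|.*|*.|*..*) return 1 ;;
    *.*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the resource ID of a gallery image version.
# Usage: gallery_image_version_id <subscription> <resource-group> <gallery> <definition> <version>
gallery_image_version_id() {
  local sub="$1" rg="$2" gallery="$3" definition="$4" version="$5"
  if ! is_pinned_version "$version"; then
    echo "refusing unpinned image version: '$version'" >&2
    return 1
  fi
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/galleries/%s/images/%s/versions/%s\n' \
    "$sub" "$rg" "$gallery" "$definition" "$version"
}
```

The resulting ID can be passed to `az vm create --image "$(gallery_image_version_id "$SUB" rg-img-prod-westeu sig_prod_westeu ubuntu-22-04-hardened 1.0.0)" --security-type TrustedLaunch`, which keeps promotion and rollback tied to a concrete version rather than a floating reference.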

Remediation — Azure CLI

# Azure CLI 2.x
# Create the Shared Image Gallery and a Gen2/Trusted-Launch-compatible image definition.
az sig create \
  --resource-group rg-img-prod-westeu \
  --gallery-name sig_prod_westeu \
  --location westeurope

az sig image-definition create \
  --resource-group rg-img-prod-westeu \
  --gallery-name sig_prod_westeu \
  --gallery-image-definition ubuntu-22-04-hardened \
  --publisher example \
  --offer ubuntu-server-hardened \
  --sku 22-04-gen2-tl \
  --os-type Linux \
  --hyper-v-generation V2 \
  --features 'SecurityType=TrustedLaunchSupported'

# Build a hardened image via Azure VM Image Builder (template authored separately).
az image builder create \
  --name img-ubuntu-22-04-hardened-v1 \
  --resource-group rg-img-prod-westeu \
  --location westeurope \
  --image-template image-builder-templates/ubuntu-22-04-hardened.json

az image builder run \
  --name img-ubuntu-22-04-hardened-v1 \
  --resource-group rg-img-prod-westeu

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_shared_image_gallery" "prod" {
  name                = "sig_prod_westeu"
  resource_group_name = azurerm_resource_group.img.name
  location            = azurerm_resource_group.img.location
  description         = "Hardened golden images for production VMs"
}

resource "azurerm_shared_image" "ubuntu_hardened" {
  name                = "ubuntu-22-04-hardened"
  gallery_name        = azurerm_shared_image_gallery.prod.name
  resource_group_name = azurerm_resource_group.img.name
  location            = azurerm_resource_group.img.location
  os_type             = "Linux"
  hyper_v_generation  = "V2"

  # Trusted Launch compatibility — required for downstream azure-work-01 enforcement.
  trusted_launch_supported = true

  identifier {
    publisher = "example"
    offer     = "ubuntu-server-hardened"
    sku       = "22-04-gen2-tl"
  }
}

# Azure Image Builder template — deploy via ARM until azurerm_image_template GAs.
# Fallback: azurerm_resource_group_template_deployment referencing the ARM JSON.
resource "azurerm_resource_group_template_deployment" "builder" {
  name                = "img-ubuntu-22-04-hardened-v1"
  resource_group_name = azurerm_resource_group.img.name
  deployment_mode     = "Incremental"
  template_content    = file("${path.module}/image-builder/ubuntu-22-04-hardened.json")
  parameters_content  = jsonencode({
    galleryImageId = { value = azurerm_shared_image.ubuntu_hardened.id }
    location       = { value = azurerm_resource_group.img.location }
  })
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Azure Compute Gallery hosting hardened golden images.')
param galleryName string

param location string = resourceGroup().location

resource gallery 'Microsoft.Compute/galleries@2024-03-03' = {
  name: galleryName
  location: location
  properties: {
    description: 'Hardened golden images — CIS-benchmarked + Defender-onboarded'
    sharingProfile: { permissions: 'Private' }
  }
}

resource imgDef 'Microsoft.Compute/galleries/images@2024-03-03' = {
  parent: gallery
  name: 'ubuntu-22-04-cis'
  location: location
  properties: {
    osType: 'Linux'
    osState: 'Generalized'
    hyperVGeneration: 'V2'
    features: [
      { name: 'SecurityType', value: 'TrustedLaunchSupported' }
    ]
    identifier: { publisher: 'example', offer: 'ubuntu-cis', sku: '22-04' }
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | CM-2; SI-2; SA-10 | A.8.9; A.8.32 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.Compute/virtualMachines/write where the request body sets storageProfile.imageReference to a publisher outside the documented golden-image gallery — sidesteps the hardened-image supply chain.
  • AzureActivity Microsoft.Compute/galleries/images/versions/delete on a golden image version still referenced by VM Scale Sets — downstream deployments will fall back to the next available version, possibly older.
  • AzureActivity write events where storageProfile.imageReference.id points at a Marketplace image instead of the Shared Image Gallery — coverage erosion on the image-bake pipeline.

Query

AzureActivity
| where OperationNameValue in ("Microsoft.Compute/virtualMachines/write", "Microsoft.Compute/virtualMachineScaleSets/write")
| extend body = tostring(parse_json(Properties).requestbody)
| where body has "\"publisher\"" and not(body has "/galleries/")
| project TimeGenerated, Caller, ResourceId, body
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. Pair with a daily Resource Graph reconciliation listing all VMs whose storageProfile.imageReference.id does not contain the corporate gallery resource ID — coverage erosion is the slow-motion failure mode for image baselines.
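
The reconciliation step can be sketched in Python. This is a minimal illustration, assuming the Resource Graph results have already been exported as a list of dictionaries; the record shape (`id`, `imageReferenceId`), the `GALLERY_ID` value, and the function name `non_gallery_vms` are hypothetical and would need to match the real export.

```python
from typing import Iterable

# Hypothetical gallery resource ID; substitute the real corporate gallery.
GALLERY_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-img/providers/Microsoft.Compute/galleries/corpGallery"
)


def non_gallery_vms(vms: Iterable[dict], gallery_id: str = GALLERY_ID) -> list[str]:
    """Return the IDs of VMs whose image reference is not under the gallery.

    Marketplace images carry publisher/offer/sku but no imageReference.id,
    so a missing or empty ID is treated as a non-gallery image.
    """
    # Azure resource IDs are case-insensitive, so compare in lower case.
    prefix = gallery_id.lower().rstrip("/") + "/"
    flagged = []
    for vm in vms:
        image_id = (vm.get("imageReferenceId") or "").lower()
        if not image_id.startswith(prefix):
            flagged.append(vm["id"])
    return flagged
```

Comparing on a prefix that ends in `/` avoids matching a second gallery whose name merely starts with the corporate gallery name.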

Alert threshold

  • VM creation referencing a non-gallery image in production — page on first occurrence.
  • Gallery image-version delete on a version still referenced by a Scale Set — page; the next scale event will land an alternate version.
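
The second threshold depends on knowing which Scale Sets reference the deleted version. A minimal Python sketch of that lookup follows, assuming a mapping of Scale Set name to image reference ID has been collected separately; the function name `affected_scale_sets` and the input shape are hypothetical. A reference to the image definition without a version resolves to the latest version, so the sketch flags those conservatively as well.

```python
def affected_scale_sets(deleted_version_id: str, scale_set_images: dict[str, str]) -> list[str]:
    """Return Scale Sets that reference the deleted version or its image definition."""
    # Azure resource IDs are case-insensitive, so compare in lower case.
    deleted = deleted_version_id.lower().rstrip("/")
    definition, sep, _version = deleted.rpartition("/versions/")
    if not sep:
        raise ValueError("expected an ID ending in /versions/<version>")
    affected = []
    for name, image_id in scale_set_images.items():
        ref = (image_id or "").lower().rstrip("/")
        # Exact version pin, or a definition-level reference that tracks "latest".
        if ref in (deleted, definition):
            affected.append(name)
    return sorted(affected)
```

Definition-level references are only truly affected when the deleted version was the latest one; the sketch does not check that and errs toward paging.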

Initial response

  1. Recreate the VM from the gallery baseline if the workload is stateless; for stateful workloads, schedule a maintenance window to redeploy and migrate state.
  2. Rebake and republish any deleted gallery image version via the Image Builder template; capture the AzureActivity Caller as the supply-chain actor of record.
  3. Escalate per general/ir.html — confirm Azure Policy Allowed virtual machine images remains in deny mode and that the gallery-publisher RBAC is bound to the image-bake service principal only.

References

Equivalent on: AWS · GCP · OCI

azure-work-08-update-manager ! MEDIUM DETECTIVE

Azure Update Manager is the canonical Azure patching plane; it was previously named Update Management Center and replaces the legacy Azure Automation Update Management. It must run in assessment mode on every Linux and Windows VM (and Arc-enabled server, where applicable), with a dynamic scope selecting target VMs by subscription, resource group, or tag, and with at least one maintenance configuration per environment that schedules guest patching during an agreed maintenance window. Assessment mode reports patch compliance, which feeds the Defender for Cloud Secure Score (pairing with azure-log-05); the maintenance configuration is what actually applies updates, so without one, assessment alone produces visibility but no remediation (Microsoft Learn: Azure Update Manager overview, accessed 2026-05). The tag-based dynamic-scope pattern is the operational lever: a single maintenance configuration, with a dynamic scope assigned in each subscription, covers every VM tagged patch_schedule=monthly-tier-1 across dozens of subscriptions without per-VM enrolment. Severity is MEDIUM DETECTIVE, following the Phase 6 aws-work-08-systems-manager-patch-manager rationale: the control surfaces compliance state and applies updates on a schedule; it is not a real-time intrusion-prevention surface (that role belongs to azure-work-04 Defender for Servers Plan 2), and the operational-hygiene character of patching makes it detective rather than preventive at the per-VM level. Out of scope (deferred): Update Manager for Azure Arc-enabled SQL Server, Hotpatch for Windows Server, and kernel live patching on Linux are surface-specific overlays referenced in passing; the core control here is the VM-fleet patching plane.
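
The tag-based selection can be modelled in a few lines of Python. This is a simplified sketch, assuming that the values listed under one tag key are alternatives and that the filter operator ("All" or "Any") combines the keys; it uses exact string matching and ignores the other filter dimensions (resource group, location, OS type). The function name `matches_dynamic_scope` is hypothetical and is not part of any Azure SDK.

```python
def matches_dynamic_scope(
    vm_tags: dict[str, str],
    tag_filter: dict[str, list[str]],
    operator: str = "All",
) -> bool:
    """Return True when a VM's tags satisfy the dynamic-scope tag filter."""
    if operator not in ("All", "Any"):
        raise ValueError(f"unsupported filter operator: {operator!r}")
    if not tag_filter:
        # No tag constraint: every VM in the scope qualifies.
        return True
    # One boolean per filter key: does the VM carry an accepted value for it?
    hits = [vm_tags.get(key) in values for key, values in tag_filter.items()]
    return all(hits) if operator == "All" else any(hits)


# Example: the Tier-1 schedule used throughout this control.
tier1_filter = {"patch_schedule": ["monthly-tier-1"]}
print(matches_dynamic_scope({"patch_schedule": "monthly-tier-1", "env": "prod"}, tier1_filter))
```

A model like this is mainly useful in pre-deployment checks, for example to confirm that every production VM in an inventory export would be selected by at least one schedule.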

Remediation — Azure CLI

# Azure CLI 2.x (maintenance extension)
# Create a monthly maintenance configuration for Tier-1 production VMs.
# InGuestPatch scope expects the user-managed patch mode, matching
# in_guest_user_patch_mode = "User" in the Terraform example.
az maintenance configuration create \
  --resource-group rg-mgmt-prod-westeu \
  --resource-name mc-tier1-monthly \
  --location westeurope \
  --maintenance-scope InGuestPatch \
  --duration 03:00 \
  --recur-every '1Month Second Sunday' \
  --start-date-time '2026-06-14 02:00' \
  --time-zone 'UTC' \
  --reboot-setting IfRequired \
  --windows-parameters classifications-to-include='Critical' 'Security' \
  --linux-parameters classifications-to-include='Critical' 'Security' \
  --extension-properties InGuestPatchMode='User'

# Bind a dynamic scope (tag-based) to the configuration so newly tagged VMs are auto-enrolled.
# Dynamic scopes are subscription-level assignments; confirm the flag names with
# `az maintenance assignment create-or-update-subscription --help` for the installed extension version.
az maintenance assignment create-or-update-subscription \
  --maintenance-configuration-id "/subscriptions/$SUB/resourceGroups/rg-mgmt-prod-westeu/providers/Microsoft.Maintenance/maintenanceConfigurations/mc-tier1-monthly" \
  --name asgn-tier1-monthly \
  --location global \
  --filter-resource-types Microsoft.Compute/virtualMachines \
  --filter-tags "{patch_schedule:[monthly-tier-1]}" \
  --filter-tags-operator All

# Trigger an on-demand assessment across the dynamic scope.
az vm assess-patches --ids $(az vm list --query "[?tags.patch_schedule=='monthly-tier-1'].id" -o tsv)

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn (accessed 2026-05)
resource "azurerm_maintenance_configuration" "tier1_monthly" {
  name                = "mc-tier1-monthly"
  resource_group_name = azurerm_resource_group.mgmt.name
  location            = azurerm_resource_group.mgmt.location
  scope               = "InGuestPatch"

  window {
    start_date_time = "2026-06-14 02:00"
    duration        = "03:00"
    recur_every     = "1Month Second Sunday"
    time_zone       = "UTC"
  }

  install_patches {
    reboot = "IfRequired"
    linux {
      classifications_to_include = ["Critical", "Security"]
    }
    windows {
      classifications_to_include = ["Critical", "Security"]
    }
  }

  in_guest_user_patch_mode = "User"
}

# Dynamic scope: every VM tagged patch_schedule=monthly-tier-1 in westeurope within the provider's subscription.
resource "azurerm_maintenance_assignment_dynamic_scope" "tier1_monthly" {
  name                         = "asgn-tier1-monthly"
  maintenance_configuration_id = azurerm_maintenance_configuration.tier1_monthly.id

  filter {
    resource_types = ["Microsoft.Compute/virtualMachines"]
    locations      = ["westeurope"]
    tag_filter     = "All"
    tags {
      tag    = "patch_schedule"
      values = ["monthly-tier-1"]
    }
  }
}

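# Enforce per-VM assessment mode via Azure Policy at the management group.
# Management-group assignment names are limited to 24 characters.
# The identity and location are needed when the referenced definition remediates
# resources (Modify or DeployIfNotExists); grant the identity the roles the definition lists.
resource "azurerm_management_group_policy_assignment" "auto_assess_required" {
  name                 = "vm-auto-assess-required"
  management_group_id  = "/providers/Microsoft.Management/managementGroups/tenant-root"
  policy_definition_id = var.update_manager_assessment_policy_id
  description          = "Require assessmentMode=AutomaticByPlatform on all VMs"
  location             = azurerm_resource_group.mgmt.location

  identity {
    type = "SystemAssigned"
  }
}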

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Azure Update Manager maintenance configuration (monthly patching window).')
param configName string

param location string = resourceGroup().location

resource maint 'Microsoft.Maintenance/maintenanceConfigurations@2023-10-01-preview' = {
  name: configName
  location: location
  properties: {
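    maintenanceScope: 'InGuestPatch'
    visibility: 'Custom'
    // InGuestPatch scope expects the user-managed patch mode, matching the Terraform example.
    extensionProperties: { InGuestPatchMode: 'User' }
    installPatches: {
      rebootSetting: 'IfRequired'
      linuxParameters:   { classificationsToInclude: ['Security', 'Critical'] }
      windowsParameters: { classificationsToInclude: ['Security', 'Critical'] }
    }
    maintenanceWindow: {
      // Same window as the CLI and Terraform examples; 2026-06-14 is the second Sunday of June 2026.
      startDateTime: '2026-06-14 02:00'
      duration: '03:00'
      timeZone: 'UTC'
      recurEvery: '1Month Second Sunday'
    }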
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | SI-2; SI-2(2); CM-3 | A.8.8; A.8.9 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.Maintenance/maintenanceConfigurations/delete on a configuration that was previously the patching schedule for a production VM scope — silently halts patch deployment.
  • AzureActivity Microsoft.Compute/virtualMachines/installPatches/action failures over multiple consecutive runs on the same VM — patch pipeline broken even though the configuration still exists.
  • UpdateRunProgress table showing a sudden drop in covered-VM count on a recurring run — fleet-level coverage failure.

Query

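AzureActivity
| where OperationNameValue in ("Microsoft.Maintenance/maintenanceConfigurations/delete", "Microsoft.Maintenance/configurationAssignments/delete")
| project TimeGenerated, Caller, ResourceId, OperationNameValue
| order by TimeGenerated desc
| take 200
| union (
    // summarize count() never emits a row for an empty group, so a "covered == 0" filter cannot fire.
    // Look instead for schedules that have gone quiet: monthly Nth-weekday runs are at most 35 days apart,
    // so 36 days without a record means a missed run. Set the query time range to cover at least 90 days.
    // Verify the table and column names against the workspace schema.
    UpdateRunProgress
    | summarize lastSeen = max(TimeGenerated) by Configuration = tostring(MaintenanceConfigurationName)
    | where lastSeen < ago(36d)
    | project TimeGenerated = lastSeen, ResourceId = Configuration, Caller = "watchdog", OperationNameValue = "zero-coverage-run"
)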

Run as a KQL query in Log Analytics. The zero-coverage run query catches schedule-still-exists-but-broke failures that the management-plane delete event misses; persist as a Sentinel analytics rule with severity High.

Alert threshold

  • Maintenance configuration delete on a production-scope schedule — page on first occurrence.
  • Two consecutive zero-coverage runs on the same schedule — page; the patching layer is silently broken.
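
The two-consecutive-runs rule can be sketched in Python as follows. This is a minimal illustration, assuming run history is available as (schedule name, ISO 8601 timestamp, covered VM count) tuples; that input shape, the schedule name `mc-tier2-monthly`, and the function name `schedules_to_page` are hypothetical. Only the most recent runs are considered, so a schedule that has recovered does not page again.

```python
from collections import defaultdict
from typing import Iterable


def schedules_to_page(runs: Iterable[tuple[str, str, int]], threshold: int = 2) -> list[str]:
    """Return schedules whose latest `threshold` runs all covered zero VMs."""
    by_schedule: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, timestamp, covered in runs:
        by_schedule[name].append((timestamp, covered))
    paged = []
    for name, items in by_schedule.items():
        # ISO 8601 timestamps in one format sort chronologically as strings.
        recent = sorted(items)[-threshold:]
        if len(recent) == threshold and all(covered == 0 for _, covered in recent):
            paged.append(name)
    return sorted(paged)


history = [
    ("mc-tier1-monthly", "2026-06-14T02:00:00Z", 12),
    ("mc-tier1-monthly", "2026-07-12T02:00:00Z", 0),
    ("mc-tier1-monthly", "2026-08-09T02:00:00Z", 0),
    ("mc-tier2-monthly", "2026-06-21T02:00:00Z", 0),
    ("mc-tier2-monthly", "2026-07-19T02:00:00Z", 5),
]
print(schedules_to_page(history))
```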

Initial response

  1. Recreate the maintenance configuration and its dynamic-scope assignment via the IaC pipeline; trigger an on-demand assessment via az vm assess-patches, then start a manual patch run (az vm install-patches) if the next maintenance window is too far out.
  2. Walk Update Manager assessment reports for the affected VM scope — any VM that has missed critical CVE patches during the gap is incident-grade.
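  3. Escalate per general/ir.html and confirm that the Azure Policy Configure periodic checking for missing system updates on Azure virtual machines remains assigned with its remediation effect enabled (Modify for the built-in definition; check the effect if a custom definition is in use).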

References

Equivalent on: AWS · GCP · OCI

Sources