Azure AKS Hardening

Overview

This page covers hardening controls for Azure Kubernetes Service (AKS). Both AKS Standard and AKS Automatic cluster modes are addressed — Standard/Automatic differences are noted in per-control callouts immediately below each control header. Where a control is enforced by default in AKS Automatic, the callout identifies it; where AKS Automatic manages a setting on your behalf, the callout explains what Azure handles. See general/kubernetes.html for the cross-cutting threat model, cluster-baseline principles, and common misconfigurations that apply to all providers.

Controls are ordered by TSV anchor sequence (which approximates severity: CRITICAL first, then HIGH, then MEDIUM). Terraform examples use hashicorp/azurerm ~> 4.0. This is the same provider pin as the sealed v1.0 Azure pages, so examples can be combined across pages. Supporting Entra ID Workload Identity prerequisites (federated credentials, UAMI provisioning) live on azure/iam.html; private cluster VNet, private DNS zone, and IP allow-list patterns live on azure/network.html; Log Analytics workspace + Diagnostic Settings sink configuration lives on azure/logging.html.

azure-k8s-01 ! CRITICAL PREVENTIVE

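AKS Standard: Enable --enable-private-cluster + --private-dns-zone system (add --disable-public-fqdn to drop the public FQDN). --api-server-authorized-ip-ranges applies only to a public API server endpoint and cannot be applied to the private endpoint, so reserve it for clusters that must keep a public endpoint. AKS Automatic: The API server is public by default (with API Server VNet Integration); request a private cluster explicitly and verify with az aks show --query apiServerAccessProfile. CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 §5.5 covers private-cluster posture.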

Provision the AKS cluster with a private control-plane endpoint so the kube-apiserver is not reachable from the public internet. The API server gets a private IP inside the VNet and resolves only through the linked private DNS zone. Management traffic (CI/CD agents, bastions, operator workstations) must therefore originate from the VNet or a peered or connected network, or go through az aks command invoke. Authorized IP ranges cannot be applied to the private endpoint. If a cluster must keep a public endpoint, restrict it with --api-server-authorized-ip-ranges to known CIDRs. A public, unrestricted kube-apiserver is the single largest AKS breach surface: any leaked kubeconfig, service-account token, or Entra ID bearer token becomes immediately usable from any internet host.
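Where a cluster keeps a public API endpoint, the authorized IP range list is its entire network boundary, so it is worth linting in CI before apply. The following is a minimal sketch that assumes IPv4-only ranges. The /16 breadth threshold and the function names are illustrative assumptions, not Azure limits.

```typescript
// Hypothetical CI lint for an AKS authorized IP range list (IPv4 only).
// The default /16 breadth threshold is an illustrative assumption.
interface RangeFinding {
  cidr: string;
  reason: string;
}

function parseIpv4Cidr(cidr: string): { base: number; prefix: number } | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr.trim());
  if (!m) return null;
  const octets = m.slice(1, 5).map(Number);
  const prefix = Number(m[5]);
  if (octets.some((o) => o > 255) || prefix > 32) return null;
  // Plain arithmetic (not bitwise ops) keeps addresses above 127.x positive.
  const base = octets.reduce((acc, o) => acc * 256 + o, 0);
  return { base, prefix };
}

function lintAuthorizedRanges(ranges: string[], minPrefix = 16): RangeFinding[] {
  const findings: RangeFinding[] = [];
  for (const cidr of ranges) {
    const parsed = parseIpv4Cidr(cidr);
    if (parsed === null) {
      findings.push({ cidr, reason: "not a valid IPv4 CIDR" });
    } else if (parsed.prefix === 0) {
      findings.push({ cidr, reason: "allows the entire internet" });
    } else if (parsed.prefix < minPrefix) {
      findings.push({ cidr, reason: `broader than /${minPrefix}` });
    } else if (parsed.base % 2 ** (32 - parsed.prefix) !== 0) {
      // Host bits set (e.g. 203.0.113.5/24): reviewers may misread the range.
      findings.push({ cidr, reason: "host bits set; use the network address" });
    }
  }
  return findings;
}
```

In a pipeline, the list would come from the Terraform plan or from az aks show output, and a non-empty result fails the check.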

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_kubernetes_cluster" "hardened" {
  name                = "hardened-cluster"
  location            = var.location
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "hardened"

  private_cluster_enabled             = true
  private_dns_zone_id                 = "System"
  private_cluster_public_fqdn_enabled = false

  # api_server_access_profile.authorized_ip_ranges applies only to a public
  # API server endpoint, so it is omitted on this private cluster.

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D4s_v5"
    node_count = 3
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks_cluster.id]
  }
}

Remediation — az aks CLI

az aks create \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-private-cluster \
  --private-dns-zone system \
  --disable-public-fqdn \
  --enable-managed-identity \
  --assign-identity /subscriptions/SUB/resourceGroups/hardened-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-cluster-uami \
  --network-plugin azure \
  --network-plugin-mode overlay

Remediation — Bicep

targetScope = 'resourceGroup'

@description('AKS cluster name (private API server endpoint).')
param clusterName string
@description('Subnet ID for system node pool.')
param subnetId string

param location string = resourceGroup().location

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    dnsPrefix: clusterName
    apiServerAccessProfile: {
      enablePrivateCluster: true
      privateDNSZone: 'system'
      enablePrivateClusterPublicFQDN: false
    }
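    networkProfile: {
      networkPlugin: 'azure'
      networkPluginMode: 'overlay'
      // Cilium network policy requires the Cilium dataplane.
      networkDataplane: 'cilium'
      networkPolicy: 'cilium'
      loadBalancerSku: 'standard'
    }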
    enableRBAC: true
    aadProfile: {
      managed: true
      enableAzureRBAC: true
    }
    agentPoolProfiles: [
      {
        name: 'system'
        count: 3
        vmSize: 'Standard_D4ds_v5'
        mode: 'System'
        vnetSubnetID: subnetId
      }
    ]
  }
}

Remediation — Pulumi (TypeScript)

import * as pulumi from "@pulumi/pulumi";
import * as cs from "@pulumi/azure-native/containerservice";
import * as resources from "@pulumi/azure-native/resources";

const rg = new resources.ResourceGroup("aks-rg");

new cs.ManagedCluster("aks-private", {
  resourceGroupName: rg.name,
  identity: { type: cs.ResourceIdentityType.SystemAssigned },
  dnsPrefix: "aks-private",
  apiServerAccessProfile: {
    enablePrivateCluster: true,
    privateDNSZone: "system",
    enablePrivateClusterPublicFQDN: false,
  },
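  networkProfile: {
    networkPlugin: cs.NetworkPlugin.Azure,
    networkPluginMode: "overlay",
    // Cilium network policy requires the Cilium dataplane.
    networkDataplane: "cilium",
    networkPolicy: "cilium",
    loadBalancerSku: cs.LoadBalancerSku.Standard,
  },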
  enableRBAC: true,
  aadProfile: { managed: true, enableAzureRBAC: true },
  agentPoolProfiles: [{
    name: "system", count: 3, vmSize: "Standard_D4ds_v5",
    mode: cs.AgentPoolMode.System, vnetSubnetID: "<subnet-id>",
  }],
});

Compliance mapping

Control | Severity | Type | Provider | CIS Kubernetes Benchmark v1.11.0 | CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015 | NIST SP 800-190 (Sep 2017) | NSA/CISA Kubernetes Hardening Guide v1.2
azure-k8s-01 | CRITICAL | PREVENTIVE | Azure AKS | n/a (managed control plane) | n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF) | AC-17; SC-7; SC-8 | A.8.20; A.8.22 | CLD.13.1.4 | NIST SP 800-190 §4.4.1 | NSA/CISA Kubernetes Hardening Guide v1.2 §2 (Network separation)

Log signals

  • AzureActivity OperationNameValue = "Microsoft.ContainerService/managedClusters/write" where the request body sets apiServerAccessProfile.enablePrivateCluster = false — flips the cluster API surface from VNet-private back to internet-reachable.
  • AzureActivity body diff showing authorizedIPRanges widened beyond the documented administrator CIDR list (canonical regression: 0.0.0.0/0).
  • AzureDiagnostics ResourceProvider = "MICROSOFT.CONTAINERSERVICE" Category kube-audit entries (AKSAudit in resource-specific mode) showing requests whose sourceIPs fall outside the operator-jump-host network.

Query

AzureActivity
| where OperationNameValue =~ "Microsoft.ContainerService/managedClusters/write"
| where ActivityStatusValue =~ "Success"
| extend body = tostring(parse_json(Properties).requestbody)
| where body contains "\"enablePrivateCluster\":false" or body contains "0.0.0.0/0"
| project TimeGenerated, Caller, CallerIpAddress, ResourceId, body
| order by TimeGenerated desc
| take 200

Run the KQL query in Log Analytics against the workspace receiving AzureActivity export. Promote to a Sentinel analytics rule with severity High; the AKS control plane is one of the highest-value lateral-movement pivots in the tenant.
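The string match in the query is cheap but coarse. It can fire on 0.0.0.0/0 in an unrelated field, and it misses bodies serialized with whitespace. For exported records triaged offline, parsing the body is more precise.

The following sketch assumes Properties.requestbody holds the ARM managedClusters PUT body as a JSON string. That field is not guaranteed to be present or complete on every Activity Log event. The function name is hypothetical.

```typescript
// Structured triage of an exported AzureActivity requestbody for
// Microsoft.ContainerService/managedClusters/write (hypothetical helper).
// Missing fields are treated as "not changed", never as a regression.
interface ApiServerRegression {
  privateClusterDisabled: boolean;
  openToInternet: boolean;
}

function classifyRequestBody(requestBody: string): ApiServerRegression {
  let parsed: any;
  try {
    parsed = JSON.parse(requestBody);
  } catch {
    // Truncated or non-JSON bodies cannot be classified; rely on the KQL match.
    return { privateClusterDisabled: false, openToInternet: false };
  }
  const profile = parsed?.properties?.apiServerAccessProfile ?? {};
  const ranges: unknown = profile.authorizedIPRanges;
  return {
    // Only an explicit false counts as a flip; an absent field does not.
    privateClusterDisabled: profile.enablePrivateCluster === false,
    openToInternet: Array.isArray(ranges) && ranges.includes("0.0.0.0/0"),
  };
}
```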

Alert threshold

  • Any flip of enablePrivateCluster from true to false in production — page immediately; the cluster control plane is now reachable from the internet until rolled back.
  • Any 0.0.0.0/0 entry appearing in authorizedIPRanges: deny it with an Azure Policy assignment so the write fails before alerts fan out, and treat the attempt itself as the incident.

Initial response

  1. Roll back via az aks update --name {cluster} --resource-group {rg} --enable-private-cluster or apply the IaC baseline; capture the AzureActivity OperationId and Caller as the forensic ledger entry.
  2. Inspect the AKS control-plane audit logs (kube-audit category in AzureDiagnostics, or AKSAudit in resource-specific mode) for the exposure window. Any verb=create or verb=patch from an unexpected source IP warrants treating the affected pods as compromised.
  3. Escalate per general/ir.html — rotate cluster certificates via az aks rotate-certs and confirm the Azure Policy denying enablePrivateCluster=false is in deny mode.

References

Equivalent controls in other providers: GKE private cluster + authorized networks, EKS private endpoint, OKE private API endpoint.

azure-k8s-02 ! HIGH PREVENTIVE

AKS Standard: Enable --enable-workload-identity + --enable-oidc-issuer at cluster create (or via az aks update on an existing cluster). AKS Automatic: Microsoft Entra Workload Identity is enabled by default; the OIDC issuer URL is provisioned automatically. ServiceAccount annotation: azure.workload.identity/client-id: <UAMI-CLIENT-ID>. Pod label: azure.workload.identity/use: "true" (the webhook only mutates pods that carry it). The federated identity credential on the UAMI binds the Kubernetes ServiceAccount subject to the User-Assigned Managed Identity in Microsoft Entra ID.

Use Microsoft Entra Workload Identity so pods authenticate to Azure resources via short-lived OIDC tokens federated to a User-Assigned Managed Identity (UAMI), with no static secrets stored in the cluster. The flow works like this:

The Kubernetes ServiceAccount token is projected into the pod. Microsoft Entra ID validates that token against the federated identity credential, where issuer, subject, and audience must match. It then exchanges the token for an Azure access token for the UAMI.

Microsoft Entra Workload Identity replaces the earlier Azure AD pod-managed identity mechanism (the legacy MIC + NMI components, end-of-life September 2025). AKS deploys and manages the mutating admission webhook that injects the token volume and environment variables. There is no NMI DaemonSet intercepting IMDS traffic on the nodes, and the model integrates directly with Microsoft Entra ID's federated credential model.
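The exchange hinges on an exact match between the projected token's claims and the federated identity credential. The model below is a simplified illustration that assumes only the iss, sub, and aud claims matter for the match decision. Token signature, expiry, and issuer-key validation are omitted, and the helper names are hypothetical.

```typescript
// Simplified model of federated identity credential matching: issuer,
// subject, and audience must match exactly. Signature, expiry, and
// issuer-key (JWKS) checks are intentionally left out of this sketch.
interface FederatedCredential {
  issuer: string;
  subject: string;
  audiences: string[];
}

interface ProjectedTokenClaims {
  iss: string;
  sub: string;
  aud: string | string[];
}

// Kubernetes ServiceAccount tokens use this subject format.
function serviceAccountSubject(namespace: string, name: string): string {
  return `system:serviceaccount:${namespace}:${name}`;
}

function matchesCredential(claims: ProjectedTokenClaims, cred: FederatedCredential): boolean {
  const tokenAudiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  return (
    claims.iss === cred.issuer &&
    claims.sub === cred.subject &&
    tokenAudiences.some((aud) => cred.audiences.includes(aud))
  );
}
```

Because the comparison is exact, a ServiceAccount that is renamed or moved to another namespace stops authenticating until the credential's subject is updated.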

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_user_assigned_identity" "app" {
  name                = "app-workload-uami"
  resource_group_name = azurerm_resource_group.aks.name
  location            = var.location
}

resource "azurerm_federated_identity_credential" "app" {
  name                = "app-federated"
  resource_group_name = azurerm_resource_group.aks.name
  parent_id           = azurerm_user_assigned_identity.app.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.hardened.oidc_issuer_url
  subject             = "system:serviceaccount:production:app-sa"
}

# AKS cluster flags
resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  workload_identity_enabled = true
  oidc_issuer_enabled       = true
}

Remediation — az aks CLI + kubectl

az aks update \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-workload-identity \
  --enable-oidc-issuer

# Create the federated identity credential on the UAMI
az identity federated-credential create \
  --name app-federated \
  --identity-name app-workload-uami \
  --resource-group hardened-rg \
  --issuer "$(az aks show -g hardened-rg -n hardened-cluster --query oidcIssuerProfile.issuerUrl -o tsv)" \
  --subject "system:serviceaccount:production:app-sa" \
  --audiences "api://AzureADTokenExchange"

# Annotate the Kubernetes ServiceAccount
kubectl create serviceaccount app-sa --namespace production
kubectl annotate serviceaccount app-sa --namespace production \
  azure.workload.identity/client-id=<UAMI-CLIENT-ID>
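Annotating the ServiceAccount is only half of the binding. Pods that should receive federated tokens also need the azure.workload.identity/use: "true" label, and the AKS-managed webhook skips pods without it. The sketch below emits both pieces as plain objects; the binding shape, the helper names, and the all-zero client ID are placeholders.

```typescript
// Placeholder binding; in practice these values come from the UAMI and the
// federated credential subject (system:serviceaccount:<namespace>:<name>).
interface WorkloadIdentityBinding {
  namespace: string;
  serviceAccountName: string;
  clientId: string;
}

function serviceAccountManifest(b: WorkloadIdentityBinding) {
  return {
    apiVersion: "v1",
    kind: "ServiceAccount",
    metadata: {
      name: b.serviceAccountName,
      namespace: b.namespace,
      annotations: { "azure.workload.identity/client-id": b.clientId },
    },
  };
}

// Fragment to merge into a Deployment's spec.template.
function podTemplateFragment(b: WorkloadIdentityBinding) {
  return {
    metadata: {
      // Label values are strings; the webhook looks for "true".
      labels: { "azure.workload.identity/use": "true" },
    },
    spec: { serviceAccountName: b.serviceAccountName },
  };
}
```

Serialized to YAML, the ServiceAccount object carries the same annotation that the kubectl annotate command adds.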

Remediation — Bicep

targetScope = 'resourceGroup'

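@description('Existing AKS cluster. Merge these properties into the full cluster definition; this fragment alone is not a complete cluster spec.')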
param clusterName string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: resourceGroup().location
  properties: {
    oidcIssuerProfile: { enabled: true }
    securityProfile: {
      workloadIdentity: { enabled: true }
    }
  }
}

Compliance mapping

Control | Severity | Type | Provider | CIS Kubernetes Benchmark v1.11.0 | CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015 | NIST SP 800-190 (Sep 2017) | NSA/CISA Kubernetes Hardening Guide v1.2
azure-k8s-02 | HIGH | PREVENTIVE | Azure AKS | n/a (managed control plane) | n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF) | IA-2; AC-6; IA-5 | A.5.15; A.5.18 | n/a | NIST SP 800-190 §4.4.2 | NSA/CISA Kubernetes Hardening Guide v1.2 §4 (Authentication and authorization)

Log signals

  • AzureActivity Microsoft.ContainerService/managedClusters/write where oidcIssuerProfile.enabled or securityProfile.workloadIdentity.enabled flips from true to false — disarms the federated-credential path and forces secret fallback.
  • AKSAuditAdmin entries where verb = "create" targets secrets resources with names matching *-azure-credentials within an hour of the workload-identity disable event.
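  • AzureActivity Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials/delete against a UAMI bound to a cluster ServiceAccount. For workloads federated to an app registration instead, the equivalent signal is an AuditLogs Category = "ApplicationManagement" entry (Update application – Certificates and secrets management) showing the federated identity credential removed from the application.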

Query

AzureActivity
| where OperationNameValue =~ "Microsoft.ContainerService/managedClusters/write"
| extend body = tostring(parse_json(Properties).requestbody)
| where body contains "\"workloadIdentity\":{\"enabled\":false" or body contains "\"oidcIssuerProfile\":{\"enabled\":false"
| project TimeGenerated, Caller, ResourceId, body
| order by TimeGenerated desc
| take 200

Run the KQL query in Log Analytics. Disabling AKS workload identity pushes workloads back toward Kubernetes Secret-backed credentials, which is exactly the pivot an attacker needs. Pair it with a Sentinel analytics rule that joins the AKSAuditAdmin Secret-create stream over a 60-minute window.
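The correlation the Sentinel rule performs can be sketched as a windowed join. Each workload identity disable event is paired with Secret creations on the same cluster within the following 60 minutes. The event shapes below are hypothetical simplifications of AzureActivity and AKSAuditAdmin rows. Cluster IDs are compared case-insensitively because ARM resource IDs are case-insensitive.

```typescript
// Hypothetical, simplified event shapes for the correlation sketch.
interface DisableEvent {
  clusterId: string; // cluster ARM resource ID
  time: Date;
  caller: string;
}

interface SecretCreateEvent {
  clusterId: string; // cluster ARM resource ID
  time: Date;
  namespace: string;
  name: string;
}

interface CorrelatedHit {
  disable: DisableEvent;
  create: SecretCreateEvent;
}

function correlate(
  disables: DisableEvent[],
  creates: SecretCreateEvent[],
  windowMinutes = 60,
): CorrelatedHit[] {
  const windowMs = windowMinutes * 60 * 1000;
  const hits: CorrelatedHit[] = [];
  for (const d of disables) {
    const cluster = d.clusterId.toLowerCase();
    for (const c of creates) {
      const delta = c.time.getTime() - d.time.getTime();
      // Keep Secrets created on the same cluster at or after the disable, inside the window.
      if (c.clusterId.toLowerCase() === cluster && delta >= 0 && delta <= windowMs) {
        hits.push({ disable: d, create: c });
      }
    }
  }
  return hits;
}
```

The nested loop is fine for triage-sized exports; in production the same join belongs in the KQL rule itself.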

Alert threshold

  • Any workload identity disablement in production: page immediately. Workloads can no longer obtain federated tokens, so they will either fail or fall back to whatever static credential is mounted.
  • Three or more federated-credential removals across the tenant in a 24h window: treat as a possible coordinated campaign against workload identity.

Initial response

  1. Re-enable via az aks update --enable-workload-identity --enable-oidc-issuer; reapply federated credentials per the IaC baseline and force a rolling restart of affected namespaces.
  2. Inspect AKSAudit (the kube-audit category; AKSAuditAdmin excludes get and list events) for the exposure window. Every Secret read by a workload that should be using federated credentials is a leak event.
  3. Escalate per general/ir.html — rotate any service-principal secret that was issued during the disable window and reconfirm the Azure Policy denying workload-identity disable is in deny mode.

References

Equivalent controls in other providers: GKE Workload Identity Federation, EKS Pod Identity, OKE Workload Identity.

azure-k8s-03 ! HIGH PREVENTIVE

AKS Standard: Enable KMS v2 envelope encryption via --enable-azure-keyvault-kms + --azure-keyvault-kms-key-id referencing a Customer-Managed Key (CMK) in Azure Key Vault. KMS v2 is the current Kubernetes envelope encryption protocol (KMS v1 is deprecated as of Kubernetes 1.28). AKS Automatic: KMS v2 + Azure Key Vault integration is opt-in even in Automatic mode; pass the same flags at cluster creation. Network access on the Key Vault must be restricted (Private Link recommended) so the KMS plugin reaches the vault over the VNet only.

Enable KMS v2 envelope encryption for the AKS etcd store so Kubernetes Secrets are encrypted at the application layer with a Customer-Managed Key (CMK) held in Azure Key Vault. This layers on top of Azure-managed at-rest encryption and gives the customer authoritative control over the key lifecycle: rotation cadence, access policies, and emergency revocation. Without CMK envelope encryption, the encryption key is held entirely by Azure. With it, disabling or deleting the key in Key Vault cuts off the cluster's ability to decrypt Secret material until the key is restored.
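The envelope model is easiest to see in miniature. The Secret is encrypted with a data-encryption key (DEK), only the wrapped DEK is stored next to it, and unwrapping requires the key-encryption key (KEK).

The sketch below uses Node's crypto module and assumes a locally generated RSA key pair as a stand-in for the Key Vault key. In AKS the KEK never leaves Key Vault, and KMS v2 reuses and caches DEKs rather than minting one per write. This illustrates the principle, not the plugin protocol.

```typescript
import {
  constants,
  createCipheriv,
  createDecipheriv,
  generateKeyPairSync,
  KeyObject,
  privateDecrypt,
  publicEncrypt,
  randomBytes,
} from "crypto";

// Local stand-in for the Key Vault key (KEK). Assumption for illustration only:
// in AKS the KEK stays in Key Vault and the KMS plugin asks Key Vault to use it.
const kek = generateKeyPairSync("rsa", { modulusLength: 2048 });

interface Envelope {
  wrappedDek: Buffer; // DEK encrypted under the KEK (RSA-OAEP, SHA-256)
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer; // Secret payload encrypted under the DEK (AES-256-GCM)
}

function sealEnvelope(plaintext: string): Envelope {
  // A fresh DEK per call keeps the sketch short; KMS v2 reuses DEKs across writes.
  const dek = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const wrappedDek = publicEncrypt(
    { key: kek.publicKey, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    dek,
  );
  return { wrappedDek, iv, tag: cipher.getAuthTag(), ciphertext };
}

function openEnvelope(env: Envelope, kekPrivateKey: KeyObject = kek.privateKey): string {
  // Unwrapping throws if the KEK is unavailable or wrong.
  const dek = privateDecrypt(
    { key: kekPrivateKey, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    env.wrappedDek,
  );
  const decipher = createDecipheriv("aes-256-gcm", dek, env.iv);
  decipher.setAuthTag(env.tag);
  return Buffer.concat([decipher.update(env.ciphertext), decipher.final()]).toString("utf8");
}
```

In the cluster, disabling the Key Vault key plays the role of the wrong private key in this sketch: the wrapped DEK can no longer be unwrapped.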

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_key_vault" "aks_kms" {
  name                       = "aks-kms-kv"
  location                   = var.location
  resource_group_name        = azurerm_resource_group.aks.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "premium"
  purge_protection_enabled   = true
  enable_rbac_authorization  = true

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"
  }
}

resource "azurerm_key_vault_key" "aks_etcd" {
  name         = "aks-etcd-cmk"
  key_vault_id = azurerm_key_vault.aks_kms.id
  key_type     = "RSA"
  key_size     = 2048
  # The AKS KMS plugin calls Key Vault encrypt/decrypt with this key.
  key_opts     = ["decrypt", "encrypt", "unwrapKey", "wrapKey"]
}

resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  key_management_service {
    key_vault_key_id         = azurerm_key_vault_key.aks_etcd.id
    key_vault_network_access = "Private"
  }
}

Remediation — az aks CLI

KEY_ID=$(az keyvault key show \
  --vault-name aks-kms-kv \
  --name aks-etcd-cmk \
  --query key.kid -o tsv)

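# Assumes the user-assigned identity below was already granted encrypt/decrypt
# on the key (Key Vault Crypto User on an RBAC vault) before the cluster exists.
az aks create \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-azure-keyvault-kms \
  --azure-keyvault-kms-key-id "$KEY_ID" \
  --azure-keyvault-kms-key-vault-network-access Private \
  --azure-keyvault-kms-key-vault-resource-id /subscriptions/SUB/resourceGroups/hardened-rg/providers/Microsoft.KeyVault/vaults/aks-kms-kv \
  --enable-managed-identity \
  --assign-identity /subscriptions/SUB/resourceGroups/hardened-rg/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-cluster-uami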
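The KEY_ID captured from key.kid includes the key version as its final path segment. The following hypothetical pre-flight check (the names are illustrative) splits such an identifier and fails fast on a versionless or truncated ID before it reaches az aks create. It checks shape only and does not call Key Vault.

```typescript
// Hypothetical pre-flight check for a Key Vault key identifier of the form
// https://<vault-host>/keys/<key-name>/<version>. Shape check only.
interface KeyVaultKeyRef {
  vaultHost: string;
  keyName: string;
  version: string;
}

const VERSIONED_KEY_ID = /^https:\/\/([^/\s]+)\/keys\/([^/\s]+)\/([^/\s]+)$/;

function parseVersionedKeyId(kid: string): KeyVaultKeyRef {
  const m = VERSIONED_KEY_ID.exec(kid.trim());
  if (!m) {
    // Covers versionless IDs, trailing slashes, and non-HTTPS values.
    throw new Error(`not a versioned Key Vault key ID: ${kid}`);
  }
  return { vaultHost: m[1], keyName: m[2], version: m[3] };
}
```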

Remediation — Bicep

targetScope = 'resourceGroup'

@description('AKS cluster name. Merge this securityProfile into the full cluster definition; the cluster identity needs encrypt/decrypt on the key.')
param clusterName string
@description('Versioned Key Vault key identifier (key.kid), including the key version segment.')
param keyVaultKeyId string
@description('Key Vault resource ID; required when keyVaultNetworkAccess is Private.')
param keyVaultResourceId string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: resourceGroup().location
  properties: {
    securityProfile: {
      azureKeyVaultKms: {
        enabled: true
        keyId: keyVaultKeyId
        keyVaultNetworkAccess: 'Private'
        keyVaultResourceId: keyVaultResourceId
      }
    }
  }
}

Compliance mapping

Control | Severity | Type | Provider | CIS Kubernetes Benchmark v1.11.0 | CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015 | NIST SP 800-190 (Sep 2017) | NSA/CISA Kubernetes Hardening Guide v1.2
azure-k8s-03 | HIGH | PREVENTIVE | Azure AKS | §1.2 (etcd encryption posture) | n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF) | SC-28; IA-5 | A.8.24; A.8.10 | n/a | NIST SP 800-190 §4.3.2 | NSA/CISA Kubernetes Hardening Guide v1.2 §5 (Log auditing and threat detection; secrets handling)

Log signals

  • AzureActivity Microsoft.ContainerService/managedClusters/write where securityProfile.azureKeyVaultKms.enabled flips to false or where keyId is removed — disables the Key Vault-backed envelope encryption layer for cluster secrets.
  • AzureDiagnostics ResourceProvider = "MICROSOFT.KEYVAULT" OperationName = "KeyDelete" targeting the cluster KMS key — would silently strip the unwrap path.
  • AKSAudit entries showing repeated 500-class responses to secrets reads (reads are get/list events, which AKSAuditAdmin excludes). This is a symptom of a broken KMS plugin chain after a key rotation gone wrong.

Query

AzureActivity
| where OperationNameValue =~ "Microsoft.ContainerService/managedClusters/write"
| extend body = tostring(parse_json(Properties).requestbody)
| where body contains "\"azureKeyVaultKms\":{\"enabled\":false" or body contains "\"keyId\":\"\""
| project TimeGenerated, Caller, ResourceId, body
| order by TimeGenerated desc
| take 100

Run the KQL query in Log Analytics and persist as a Sentinel analytics rule. KMS-disable on a running AKS cluster is rare and indicates either a misconfigured operator action or deliberate envelope-encryption bypass.

Alert threshold

  • Any KMS disablement on a cluster carrying production data — page on first occurrence.
  • Key Vault key delete event on a key referenced by an AKS cluster — page immediately; the cluster will fail to unwrap secrets within minutes.

Initial response

  1. Re-enable KMS via az aks update --enable-azure-keyvault-kms --azure-keyvault-kms-key-id {keyUri}; if the underlying Key Vault key was deleted, soft-delete recovery via az keyvault key recover is the first move.
  2. Enumerate the Secrets the cluster currently holds via az aks command invoke --resource-group {rg} --name {cluster} --command "kubectl get secrets -A" from a privileged-access workstation. List names and types only, since -o yaml would print every Secret value into the command output. Treat any drift versus the GitOps source as suspect.
  3. Escalate per general/ir.html — rotate all cluster-scoped secrets via the IaC pipeline and reconfirm the Key Vault firewall + RBAC scope still permits the cluster's managed identity.

References

Equivalent controls in other providers: GKE Cloud KMS application-layer secrets encryption, EKS KMS envelope encryption, OKE OCI Vault CMK secrets encryption.

azure-k8s-04 ! HIGH DETECTIVE

AKS Standard: Enable Microsoft Defender for Containers via az aks update --enable-defender or by enabling the Defender for Containers plan in Microsoft Defender for Cloud at subscription scope. AKS Automatic: Defender for Containers is enabled by default; verify in the Microsoft Defender for Cloud portal under Environment settings.

Enable Microsoft Defender for Containers to provide runtime threat protection (eBPF-based detection on cluster nodes), admission control posture, vulnerability scanning of running workloads, and centralized security posture management in Microsoft Defender for Cloud. Defender for Containers is the AKS-native answer to runtime detection: it streams kernel-level signals through an eBPF sensor on each node and correlates them against Microsoft's threat library, flagging cryptominer execution, reverse-shell patterns, privilege-escalation attempts, and known-malicious-binary hashes. Without it, in-pod behavior between API events is invisible.

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_log_analytics_workspace" "aks" {
  name                = "aks-law"
  location            = var.location
  resource_group_name = azurerm_resource_group.aks.name
  sku                 = "PerGB2018"
  retention_in_days   = 90
}

resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  microsoft_defender {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.aks.id
  }
}

# Subscription-scoped Defender for Containers plan (optional but recommended)
resource "azurerm_security_center_subscription_pricing" "containers" {
  tier          = "Standard"
  resource_type = "Containers"
}

Remediation — az aks CLI

LAW_ID=$(az monitor log-analytics workspace show \
  --resource-group hardened-rg \
  --workspace-name aks-law \
  --query id -o tsv)

az aks update \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-defender \
  --defender-config-workspace-resource-id "$LAW_ID"

# Subscription-scoped plan
az security pricing create --name Containers --tier Standard

Remediation — Bicep

targetScope = 'subscription'

resource defenderContainers 'Microsoft.Security/pricings@2024-01-01' = {
  name: 'Containers'
  properties: {
    pricingTier: 'Standard'
    // The runtime sensor is enabled as a plan extension.
    extensions: [
      {
        name: 'ContainerSensor'
        isEnabled: 'True'
      }
    ]
  }
}

Compliance mapping

Control | Severity | Type | Provider | CIS Kubernetes Benchmark v1.11.0 | CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015 | NIST SP 800-190 (Sep 2017) | NSA/CISA Kubernetes Hardening Guide v1.2
azure-k8s-04 | HIGH | DETECTIVE | Azure AKS | n/a (runtime detection) | n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF) | SI-3; SI-4; AU-12 | A.8.7; A.8.16 | CLD.12.4.5 | NIST SP 800-190 §4.4.4 | NSA/CISA Kubernetes Hardening Guide v1.2 §6 (Audit logging and threat detection)

Log signals

  • AzureActivity Microsoft.Security/pricings/write where name = "Containers" and the request body sets pricingTier = "Free" — disarms the Defender for Containers analytics layer across the tenant or subscription.
  • AzureActivity Microsoft.ContainerService/managedClusters/write where the request body sets securityProfile.defender.securityMonitoring.enabled = false. This removes the Defender sensor from that cluster while the subscription plan stays on.
  • SecurityAlert table entries that abruptly drop to zero per-cluster after a previously steady baseline — indicates the sensor agent is no longer reporting.

Query

AzureActivity
| where OperationNameValue =~ "Microsoft.Security/pricings/write"
| where ResourceId endswith "/pricings/Containers"
| extend body = tostring(parse_json(Properties).requestbody)
| where body contains "Free"
| project TimeGenerated, Caller, SubscriptionId, ResourceId, body
| order by TimeGenerated desc
| take 100

Run as a KQL query in Log Analytics. Pair with a daily anomaly query against the SecurityAlert table grouped by cluster — sudden silence is more informative than the disable event itself when the disable was applied at a parent scope.

Alert threshold

  • Any flip of the Containers plan from Standard to Free at the subscription or tenant scope: page on first occurrence; entire cluster fleets just lost runtime threat detection.
  • A 24h window with zero SecurityAlerts for a cluster that previously generated more than five — investigate the sensor health and the underlying Defender plan state.
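The sudden-silence threshold can be expressed as a small per-cluster check. The sketch assumes daily SecurityAlert counts per cluster have already been exported, oldest day first. It reads "previously generated more than five" as more than five alerts across the baseline days. Both the input shape and that reading are assumptions.

```typescript
// Flags clusters that had a non-trivial alert baseline but went silent.
// Input (assumed shape): daily SecurityAlert counts per cluster, oldest first,
// with the last element being the most recent 24h window.
function silentClusters(dailyCounts: Record<string, number[]>, baselineThreshold = 5): string[] {
  const flagged: string[] = [];
  for (const [cluster, counts] of Object.entries(dailyCounts)) {
    if (counts.length < 2) continue; // need at least one baseline day plus the latest day
    const latest = counts[counts.length - 1];
    const baseline = counts.slice(0, -1).reduce((sum, n) => sum + n, 0);
    if (latest === 0 && baseline > baselineThreshold) flagged.push(cluster);
  }
  return flagged.sort();
}
```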

Initial response

  1. Re-enable via az security pricing create --name Containers --tier Standard. Confirm the cluster's Defender profile is still configured via az aks show --query securityProfile.defender, and check sensor health in Microsoft Defender for Cloud.
  2. Run a baseline sweep with the Defender for Containers vulnerability assessment query in Defender XDR to confirm the agent backlog has drained.
  3. Escalate per general/ir.html — the disable window itself is incident-grade if any production cluster was uncovered; confirm the Azure Policy denying Defender plan downgrade is in deny mode.

References

azure-k8s-05 ! HIGH PREVENTIVE

AKS Standard: Enable the Azure Policy add-on via az aks enable-addons --addons azure-policy. Assign the built-in initiative Kubernetes cluster pod security restricted standards for Linux-based workloads at cluster or subscription scope to enforce Pod Security Standards via Azure Policy. AKS Automatic: The Azure Policy add-on is enabled by default; the operator still chooses which initiative to assign.

Enable the Azure Policy add-on for AKS and assign the built-in Pod Security Standards initiative to enforce the Restricted profile at admission time. The Azure Policy add-on installs the Gatekeeper-based admission controller in the cluster and translates Azure Policy assignments into ConstraintTemplates and Constraints; this gives a single Azure-native policy-as-code surface that covers both Azure ARM resources and in-cluster Kubernetes objects. The built-in Restricted-PSS initiative blocks privileged pod creation, hostPath mounts, host namespace sharing, and other workload-tenant escape vectors. This control intentionally covers both Azure Policy add-on enablement and Pod Security Standards enforcement, because Azure Policy is the AKS-native PSS enforcement path; the upstream Kubernetes PodSecurity admission controller is also available as a parallel mechanism.
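To make the Restricted profile concrete, the check below covers a handful of its rules against a pod spec. It is an illustrative subset written for this page, not the Gatekeeper policy logic the Azure Policy add-on runs. It skips init and ephemeral containers, seccomp profiles, and the full volume-type allow-list. The type and function names are hypothetical.

```typescript
// Illustrative subset of Pod Security Standards "restricted" checks.
// Not the Azure Policy / Gatekeeper implementation; several rules are omitted.
interface ContainerSpec {
  name: string;
  securityContext?: {
    privileged?: boolean;
    allowPrivilegeEscalation?: boolean;
    runAsNonRoot?: boolean;
    capabilities?: { add?: string[]; drop?: string[] };
  };
}

interface PodSpec {
  hostNetwork?: boolean;
  hostPID?: boolean;
  hostIPC?: boolean;
  securityContext?: { runAsNonRoot?: boolean };
  volumes?: Array<{ name: string; hostPath?: { path: string } }>;
  containers: ContainerSpec[];
}

function restrictedViolations(pod: PodSpec): string[] {
  const v: string[] = [];
  if (pod.hostNetwork) v.push("hostNetwork");
  if (pod.hostPID) v.push("hostPID");
  if (pod.hostIPC) v.push("hostIPC");
  for (const vol of pod.volumes ?? []) {
    if (vol.hostPath) v.push(`hostPath volume ${vol.name}`);
  }
  for (const c of pod.containers) {
    const sc: NonNullable<ContainerSpec["securityContext"]> = c.securityContext ?? {};
    if (sc.privileged) v.push(`${c.name}: privileged`);
    // Restricted requires an explicit allowPrivilegeEscalation: false.
    if (sc.allowPrivilegeEscalation !== false) v.push(`${c.name}: allowPrivilegeEscalation not false`);
    // A container-level runAsNonRoot overrides the pod-level value.
    const nonRoot = sc.runAsNonRoot ?? pod.securityContext?.runAsNonRoot;
    if (nonRoot !== true) v.push(`${c.name}: runAsNonRoot not true`);
    if (!(sc.capabilities?.drop ?? []).includes("ALL")) v.push(`${c.name}: capabilities not dropping ALL`);
    // Restricted allows adding back only NET_BIND_SERVICE.
    for (const cap of sc.capabilities?.add ?? []) {
      if (cap !== "NET_BIND_SERVICE") v.push(`${c.name}: adds capability ${cap}`);
    }
  }
  return v;
}
```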

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  azure_policy_enabled = true
}

# Assign the built-in PSS Restricted initiative to the cluster scope
data "azurerm_policy_set_definition" "pss_restricted" {
  display_name = "Kubernetes cluster pod security restricted standards for Linux-based workloads"
}

resource "azurerm_resource_group_policy_assignment" "pss" {
  name                 = "aks-pss-restricted"
  resource_group_id    = azurerm_resource_group.aks.id
  policy_definition_id = data.azurerm_policy_set_definition.pss_restricted.id
  parameters = jsonencode({
    effect = { value = "deny" }
  })
}

Remediation — az aks CLI

az aks enable-addons \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --addons azure-policy

# Assign the built-in PSS Restricted initiative
INIT_ID=$(az policy set-definition list \
  --query "[?displayName=='Kubernetes cluster pod security restricted standards for Linux-based workloads'].id | [0]" -o tsv)

az policy assignment create \
  --name aks-pss-restricted \
  --policy-set-definition "$INIT_ID" \
  --scope /subscriptions/SUB/resourceGroups/hardened-rg \
  --params '{"effect":{"value":"deny"}}'

Remediation — Bicep

targetScope = 'resourceGroup'

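@description('AKS cluster to enable Azure Policy add-on on. Merge addonProfiles into the full cluster definition; this fragment alone is not a complete cluster spec.')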
param clusterName string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: resourceGroup().location
  properties: {
    addonProfiles: {
      azurepolicy: {
        enabled: true
      }
    }
  }
}

// Assign restricted-pod-security policy initiative at the cluster scope (RG-scoped here).
resource policyAssign 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'aks-restricted-pod-security'
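  properties: {
    policyDefinitionId: tenantResourceId('Microsoft.Authorization/policySetDefinitions', '42b8ef37-b724-4e24-bbc8-7a7708edfe00')
    displayName: 'Kubernetes cluster pod security restricted standards'
    // Set deny explicitly, matching the Terraform and CLI examples.
    parameters: {
      effect: {
        value: 'deny'
      }
    }
  }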
}

Compliance mapping

Control | Severity | Type | Provider | CIS Kubernetes Benchmark v1.11.0 | CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015 | NIST SP 800-190 (Sep 2017) | NSA/CISA Kubernetes Hardening Guide v1.2
azure-k8s-05 | HIGH | PREVENTIVE | Azure AKS | §5.2 (Pod security) | n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF) | CM-6; AC-6; SI-3 | A.8.9; A.8.28 | CLD.6.3.1 | NIST SP 800-190 §4.2 | NSA/CISA Kubernetes Hardening Guide v1.2 §3 (Pod security)

Log signals

  • AzureActivity Microsoft.ContainerService/managedClusters/write where the request body removes azurepolicy from addonProfiles or sets enabled = false — disarms the Gatekeeper admission webhook chain.
  • AKSAuditAdmin entries showing admission decisions disappear from the stream — Gatekeeper's ValidatingAdmissionWebhook log entries should be continuous; gaps indicate webhook failure-open.
  • AzureActivity Microsoft.Authorization/policyAssignments/delete targeting a Kubernetes-policy initiative — coverage erosion at the policy layer rather than the addon.

Query

AzureActivity
| where OperationNameValue =~ "Microsoft.ContainerService/managedClusters/write"
| extend body = tostring(parse_json(Properties).requestbody)
| where body contains "\"azurepolicy\":{\"enabled\":false" or (body contains "addonProfiles" and not(body contains "azurepolicy"))
| project TimeGenerated, Caller, ResourceId, body
| order by TimeGenerated desc
| take 100

Run the KQL query in Log Analytics. Azure Policy addon disablement is one of the highest-signal AKS coverage regressions; persist as a Sentinel analytics rule with severity High.

Alert threshold

  • Any Azure Policy addon disable on a production cluster — page on first occurrence.
  • Cluster-policy initiative delete that takes the cluster outside the documented policy assignment scope — page immediately even if the addon is still enabled.

Initial response

  1. Re-enable via az aks enable-addons --addons azure-policy; confirm the policy initiative is still assigned to the resource group via az policy assignment list.
  2. Sweep recent AKSAuditAdmin for any resource that would have failed the policy gate during the disable window — privileged pods, hostNetwork enablement, hostPath mounts — and treat them as suspect.
  3. Escalate per general/ir.html — reconfirm the parent Azure Policy denying addon-disable is in deny mode and that the initiative compliance scan has refreshed.

References

Equivalent controls in other providers: GKE PSS via PodSecurity admission, EKS PSS namespace labels, OKE PSS admission.

azure-k8s-06 ! HIGH PREVENTIVE

AKS Standard: Enable Microsoft Entra ID integration + Azure RBAC for Kubernetes Authorization at cluster create — pass --enable-aad --enable-azure-rbac --aad-admin-group-object-ids <GROUP-OID> --disable-local-accounts. AKS Automatic: Entra ID integration, Azure RBAC for K8s Authorization, and local-accounts-disabled are all defaults.

Use Microsoft Entra ID as the cluster authentication provider and Azure RBAC for Kubernetes Authorization as the authorization layer, then disable local Kubernetes accounts via --disable-local-accounts. The local-accounts-disabled flag is the critical hardening step here: legacy local Kubernetes admin accounts (kubeconfig credentials issued by the cluster itself) sit outside Microsoft Entra ID's authentication and audit pipeline, so if a legacy kubeconfig leaks, the attacker bypasses Conditional Access policies, MFA, and Entra ID sign-in logs entirely. Azure built-in roles such as Azure Kubernetes Service RBAC Admin, Azure Kubernetes Service RBAC Cluster Admin, Azure Kubernetes Service RBAC Reader, and Azure Kubernetes Service RBAC Writer map to Kubernetes verbs and resources, so role assignments live in Azure RBAC rather than Kubernetes RoleBindings.
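One way to find legacy credentials before disabling local accounts is to scan distributed kubeconfig files. Look for users that carry cluster-issued material (client certificates or static tokens) instead of an exec plugin such as kubelogin. The sketch assumes the kubeconfig has already been parsed into an object. File-path variants such as client-certificate and tokenFile are not covered, and the helper name is hypothetical.

```typescript
// Only the kubeconfig user fields this check reads are typed here.
interface KubeconfigUser {
  name: string;
  user: {
    "client-certificate-data"?: string;
    "client-key-data"?: string;
    token?: string;
    exec?: { command: string; args?: string[] };
  };
}

// Returns users that authenticate with cluster-issued material (client
// certificate or static bearer token) rather than an exec credential plugin.
function localCredentialUsers(users: KubeconfigUser[]): string[] {
  return users
    .filter(({ user }) =>
      Boolean(user["client-certificate-data"] || user["client-key-data"] || user.token),
    )
    .map((u) => u.name);
}
```

With local accounts disabled, az aks get-credentials --admin is rejected, so any flagged entry predates the change and should be retired.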

Remediation — Terraform

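# Terraform Azure provider ~> 4.0
data "azurerm_client_config" "current" {}

resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  # Disable cluster-issued static kubeconfig credentials; all access goes through Entra ID
  local_account_disabled = true

  azure_active_directory_role_based_access_control {
    tenant_id              = data.azurerm_client_config.current.tenant_id
    admin_group_object_ids = [var.aks_admin_group_oid]
    azure_rbac_enabled     = true # Azure RBAC for Kubernetes Authorization
  }
}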

# Assign Azure RBAC for Kubernetes role to a user/group
resource "azurerm_role_assignment" "aks_rbac_admin" {
  scope                = azurerm_kubernetes_cluster.hardened.id
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  principal_id         = var.platform_team_group_oid
}

Remediation — az aks CLI

az aks create \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-aad \
  --enable-azure-rbac \
  --aad-admin-group-object-ids <GROUP-OID> \
  --disable-local-accounts \
  --enable-managed-identity

# Grant an Azure RBAC for Kubernetes role to a user, scoped to the cluster
AKS_ID=$(az aks show --resource-group hardened-rg --name hardened-cluster --query id -o tsv)
az role assignment create \
  --assignee <USER-OID> \
  --role "Azure Kubernetes Service RBAC Admin" \
  --scope "$AKS_ID"

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Existing AKS cluster name.')
param clusterName string
@description('Entra ID group for SREs (RBAC Cluster Admin).')
param sreGroupObjectId string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' existing = {
  name: clusterName
}

// AKS RBAC Cluster Admin role
var clusterAdminRoleId = 'b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b'

resource assignSre 'Microsoft.Authorization/roleAssignments@2024-04-01' = {
  scope: aks
  name: guid(aks.id, sreGroupObjectId, clusterAdminRoleId)
  properties: {
    principalId: sreGroupObjectId
    principalType: 'Group'
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', clusterAdminRoleId)
  }
}

Compliance mapping

Control: azure-k8s-06
Severity: HIGH
Type: PREVENTIVE
Provider: Azure AKS
CIS Kubernetes Benchmark v1.11.0: §5.1 (RBAC and Service Accounts)
CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0: n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF)
NIST SP 800-53 rev5: AC-2; AC-3; AC-6; IA-2
ISO/IEC 27001:2022: A.5.15; A.5.16; A.5.18
ISO/IEC 27017:2015: n/a
NIST SP 800-190 (Sep 2017): §4.4.2
NSA/CISA Kubernetes Hardening Guide v1.2: §4 (Authentication and authorization)

Log signals

  • AzureActivity Microsoft.ContainerService/managedClusters/write where disableLocalAccounts flips from true to false — re-enables the static clusterAdmin kubeconfig path and bypasses Entra-mediated RBAC.
  • AKSAuditAdmin entries where user.username matches masterclient or clusterUser — these are the static-credential identities that should be unused in an Entra-integrated cluster.
  • AzureActivity calls to listClusterAdminCredential — every call hands out a long-lived static kubeconfig that sidesteps conditional access.

Query

AzureActivity
          | where OperationNameValue in~ ("Microsoft.ContainerService/managedClusters/write", "Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action")
          | extend body = tostring(parse_json(Properties).requestbody)
          | extend disableLocal = tobool(parse_json(body).properties.disableLocalAccounts)
          | where OperationNameValue endswith "listClusterAdminCredential/action" or disableLocal == false
          | project TimeGenerated, Caller, OperationNameValue, ResourceId, body
          | order by TimeGenerated desc
          | take 200

Run the KQL in Log Analytics. The listClusterAdminCredential call is the highest-fidelity static-credential signal in AKS; persist as a Sentinel analytics rule with severity High.
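The same filter can be reproduced offline, for example to replay exported AzureActivity rows in a test harness before promoting the analytics rule. This is a minimal sketch. It assumes each record is a dict with OperationNameValue and a Properties value that holds requestbody as a JSON string, which is the shape the KQL above parses. is_static_credential_event is a hypothetical helper name. Operation names are compared case-insensitively because AzureActivity casing varies.

```python
import json

WRITE_OP = "microsoft.containerservice/managedclusters/write"
ADMIN_CRED_OP = "microsoft.containerservice/managedclusters/listclusteradmincredential/action"


def _as_dict(value) -> dict:
    """Accept a parsed dict or a JSON string; anything else becomes an empty dict."""
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return {}
    return value if isinstance(value, dict) else {}


def is_static_credential_event(record: dict) -> bool:
    """True if the record fetches an admin kubeconfig or re-enables local accounts."""
    op = str(record.get("OperationNameValue", "")).lower()
    if op == ADMIN_CRED_OP:
        return True
    if op != WRITE_OP:
        return False
    body = _as_dict(_as_dict(record.get("Properties")).get("requestbody"))
    props = _as_dict(body.get("properties"))
    # Only an explicit false in the request body counts, mirroring the KQL filter.
    return props.get("disableLocalAccounts") is False
```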

Alert threshold

  • Any disableLocalAccounts flip back to false on a production cluster — page on first occurrence.
  • Any listClusterAdminCredential call outside the documented incident-response break-glass procedure — page immediately; the resulting kubeconfig must be considered compromised.

Initial response

  1. Re-disable local accounts via az aks update --disable-local-accounts; rotate cluster certificates via az aks rotate-certs to invalidate any kubeconfig that the listClusterAdminCredential call produced.
  2. Walk AKSAuditAdmin for the exposure window — any verb=create or verb=patch issued under masterclient identity warrants treating the targeted namespaces as compromised.
  3. Escalate per general/ir.html. Reconfirm that the Azure RBAC role assignments mapping the operations team to Azure Kubernetes Service Cluster User Role and Azure Kubernetes Service RBAC Cluster Admin are intact and match the documented access model.

References

Equivalent controls in other providers: GKE Workload Identity (closest IAM-integrated authentication mechanism), EKS Cluster Access Management API access entries (parallel access-control surface), OKE IAM least-privilege cluster access.

azure-k8s-07 ! HIGH PREVENTIVE

AKS Standard: Select --network-plugin azure --network-plugin-mode overlay with either --network-policy azure (Azure CNI Overlay with Azure native network policy) or --network-dataplane cilium (Azure CNI Powered by Cilium, which enforces NetworkPolicy through Cilium). Calico (--network-policy calico) is also supported as a third option. AKS Automatic: Cilium-based CNI with NetworkPolicy support is the default; the operator authors NetworkPolicy manifests directly.

Apply a default-deny NetworkPolicy in every namespace, then add explicit allow rules for required traffic. This includes DNS egress to CoreDNS in kube-system, which a default-deny Egress policy otherwise blocks. The approach requires a NetworkPolicy-capable CNI: Azure CNI Overlay (with the Azure or Cilium policy mode), Azure CNI Powered by Cilium, or Calico. Kubernetes allows all pod traffic until a NetworkPolicy selects a pod. Without default-deny, any pod can therefore reach any other pod and, subject to the cluster's outbound configuration, any external endpoint. Network segmentation inside the cluster is foundational defense-in-depth: it limits the blast radius of any single workload compromise.
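Because a default-deny Egress policy also blocks DNS, a per-namespace baseline usually ships as a pair of policies. Below is a minimal sketch that renders that pair for several namespaces. It assumes that CoreDNS pods carry the k8s-app: kube-dns label in kube-system (the AKS default) and that namespaces have the automatic kubernetes.io/metadata.name label (Kubernetes 1.21+). The allow-dns-egress policy name and the baseline() helper are hypothetical. JSON is valid YAML, so the output can be piped to kubectl apply -f -.

```python
import json


def default_deny(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_dns(namespace: str) -> dict:
    """Re-open only DNS egress to CoreDNS in kube-system."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                # namespaceSelector and podSelector in the same peer are ANDed
                "to": [{
                    "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
                    "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                }],
                "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
            }],
        },
    }


def baseline(namespaces) -> dict:
    """Bundle the policy pair for each namespace into one v1 List."""
    items = []
    for ns in namespaces:
        items += [default_deny(ns), allow_dns(ns)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Usage (file name hypothetical): python netpol_baseline.py | kubectl apply -f -
    print(json.dumps(baseline(["production"]), indent=2))
```

Further allow rules, for example ingress from an ingress controller namespace, stack on top of this pair because NetworkPolicies are additive.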

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    network_policy      = "azure"
    service_cidr        = "10.100.0.0/16"
    dns_service_ip      = "10.100.0.10"
  }
}

# Default-deny NetworkPolicy in the production namespace (kubernetes_manifest is a hashicorp/kubernetes provider resource, not azurerm)
resource "kubernetes_manifest" "default_deny" {
  manifest = {
    apiVersion = "networking.k8s.io/v1"
    kind       = "NetworkPolicy"
    metadata = {
      name      = "default-deny-all"
      namespace = "production"
    }
    spec = {
      podSelector = {}
      policyTypes = ["Ingress", "Egress"]
    }
  }
}

Remediation — kubectl

cat <<'YAML' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
YAML

Remediation — Bicep

targetScope = 'resourceGroup'

@description('AKS cluster name (cluster uses Azure CNI Powered by Cilium for network policy).')
param clusterName string
@description('Subnet ID for the cluster.')
param subnetId string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: resourceGroup().location
  identity: { type: 'SystemAssigned' }
  properties: {
    dnsPrefix: clusterName
    networkProfile: {
      networkPlugin: 'azure'
      networkPluginMode: 'overlay'
      networkPolicy: 'cilium'
      networkDataplane: 'cilium'
    }
    agentPoolProfiles: [
      { name: 'system', count: 3, vmSize: 'Standard_D4ds_v5', mode: 'System', vnetSubnetID: subnetId }
    ]
  }
}

Compliance mapping

Control: azure-k8s-07
Severity: HIGH
Type: PREVENTIVE
Provider: Azure AKS
CIS Kubernetes Benchmark v1.11.0: §5.3 (Network policies and CNI)
CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0: n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF)
NIST SP 800-53 rev5: SC-7; SC-5; AC-4
ISO/IEC 27001:2022: A.8.20; A.8.22
ISO/IEC 27017:2015: CLD.9.5.1
NIST SP 800-190 (Sep 2017): §4.4.1
NSA/CISA Kubernetes Hardening Guide v1.2: §2 (Network separation)

Log signals

  • AKSAuditAdmin entries with verb in ("delete","patch") targeting networkpolicies resources — especially deletions of namespace-default deny baselines.
  • AzureActivity Microsoft.ContainerService/managedClusters/write where networkProfile.networkPolicy flips from azure, calico, or cilium to an empty value or none. This disarms enforcement at the dataplane.
  • AKSAudit pod-create events in namespaces that have no matching NetworkPolicy. This is a coverage-gap signal even when policies still nominally exist elsewhere in the cluster.

Query

AKSAuditAdmin
          | where ObjectRef has "networkpolicies"
          | where Verb in ("delete", "patch", "deletecollection")
          | extend ns = tostring(parse_json(ObjectRef).namespace), name = tostring(parse_json(ObjectRef).name)
          | project TimeGenerated, User, Verb, ns, name, ResponseStatus
          | order by TimeGenerated desc
          | take 200

Run as a KQL query in Log Analytics. AKSAuditAdmin captures every mutating API request, omitting get and list reads. The deny-baseline NetworkPolicy is a stable artifact, so any delete event in a production namespace is incident-grade.

Alert threshold

  • Delete of a NetworkPolicy whose name matches the documented deny-baseline (default-deny-all, deny-egress-default) — page immediately.
  • Cluster-scope flip of networkProfile.networkPolicy to empty — page immediately; lateral movement just became unrestricted.
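The audit-side thresholds above can be expressed as a small triage function, for example inside an automation step that receives AKSAuditAdmin rows. This is a sketch only. It assumes events shaped like AKSAuditAdmin records, with Verb and a parsed ObjectRef. BASELINE_NAMES mirrors the documented baseline names, and triage_networkpolicy_event is hypothetical. The cluster-scope networkPolicy flip comes from AzureActivity and is not covered here.

```python
from typing import Optional

# Names of the documented deny-baseline policies (from the alert threshold above).
BASELINE_NAMES = {"default-deny-all", "deny-egress-default"}


def triage_networkpolicy_event(event: dict) -> Optional[str]:
    """Return "page", "review", or None for an AKSAuditAdmin-style event."""
    ref = event.get("ObjectRef") or {}
    if ref.get("resource") != "networkpolicies":
        return None
    verb = event.get("Verb")
    if verb == "deletecollection":
        # A collection delete removes every policy in the namespace, baseline included.
        return "page"
    if verb == "delete" and ref.get("name") in BASELINE_NAMES:
        return "page"
    if verb in ("delete", "patch"):
        return "review"
    return None
```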

Initial response

  1. Reapply the deny-baseline via the GitOps pipeline or kubectl apply -f deny-baseline.yaml; capture the AKSAuditAdmin auditID and the deleter's UPN for the ledger.
  2. Review pod-level network flow telemetry for any new east-west connection between the affected namespace and others during the exposure window. AKSAudit records API requests, not connections. VNet or NSG flow logs can corroborate, noting that pod addresses may appear as node IPs after SNAT.
  3. Escalate per general/ir.html. Confirm that the Azure Policy for Kubernetes assignment requiring a deny-by-default NetworkPolicy is in deny mode at the cluster scope.

References

Equivalent controls in other providers: GKE Dataplane V2 network policy, EKS default-deny NetworkPolicy, OKE Calico network policy.

azure-k8s-08 ! HIGH DETECTIVE

AKS Standard / AKS Automatic: Configure Diagnostic Settings on the AKS cluster resource to forward the six core Kubernetes control-plane log categories to a Log Analytics workspace: kube-apiserver, kube-audit, kube-audit-admin, kube-controller-manager, kube-scheduler, and cluster-autoscaler. Route them to resource-specific tables (AKSAudit, AKSAuditAdmin, AKSControlPlane) so that the queries on this page resolve.

Enable Diagnostic Settings on the AKS cluster resource and forward the six core Kubernetes control-plane log categories (kube-apiserver, kube-audit, kube-audit-admin, kube-controller-manager, kube-scheduler, and cluster-autoscaler) to a Log Analytics workspace. The kube-audit and kube-audit-admin categories are the highest-value forensic artifacts. kube-audit records every Kubernetes API request with the requester identity, verb, resource, and outcome; kube-audit-admin records the same stream minus get and list reads. Together they are the only reliable source of truth for reconstructing a lateral-movement timeline after a breach. If they are not enabled and forwarded to durable storage outside the cluster, kubectl-driven activity is invisible after the fact.
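Coverage can be verified from the diagnostic settings attached to the cluster. Below is a minimal sketch. It assumes the output of az monitor diagnostic-settings list --resource "$AKS_ID" -o json, where each setting carries workspaceId and a logs array of category (or categoryGroup) and enabled entries. Some CLI versions may wrap the list in a value key, which the function tolerates. missing_categories is a hypothetical helper.

```python
REQUIRED = {
    "kube-apiserver", "kube-audit", "kube-audit-admin",
    "kube-controller-manager", "kube-scheduler", "cluster-autoscaler",
}


def missing_categories(settings) -> set:
    """Return required categories not enabled by any setting that targets Log Analytics."""
    if isinstance(settings, dict):
        settings = settings.get("value", [])
    enabled = set()
    for setting in settings:
        if not setting.get("workspaceId"):
            continue  # storage-only or Event Hub-only settings are not queryable here
        for log in setting.get("logs") or []:
            if not log.get("enabled"):
                continue
            if log.get("categoryGroup") == "allLogs":
                return set()  # the allLogs group covers every category
            if log.get("category"):
                enabled.add(log["category"])
    return REQUIRED - enabled
```

A non-empty result names exactly which categories still need a Terraform, CLI, or Bicep reapply.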

Remediation — Terraform

# Terraform Azure provider ~> 4.0
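resource "azurerm_monitor_diagnostic_setting" "aks_audit" {
  name                           = "aks-audit"
  target_resource_id             = azurerm_kubernetes_cluster.hardened.id
  log_analytics_workspace_id     = azurerm_log_analytics_workspace.aks.id
  # Write to resource-specific tables (AKSAudit, AKSAuditAdmin, AKSControlPlane) queried on this page
  log_analytics_destination_type = "Dedicated"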

  enabled_log { category = "kube-apiserver" }
  enabled_log { category = "kube-audit" }
  enabled_log { category = "kube-audit-admin" }
  enabled_log { category = "kube-controller-manager" }
  enabled_log { category = "kube-scheduler" }
  enabled_log { category = "cluster-autoscaler" }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Remediation — az monitor CLI

AKS_ID=$(az aks show --resource-group hardened-rg --name hardened-cluster --query id -o tsv)
LAW_ID=$(az monitor log-analytics workspace show --resource-group hardened-rg --workspace-name aks-law --query id -o tsv)

az monitor diagnostic-settings create \
  --name aks-audit \
  --resource "$AKS_ID" \
  --workspace "$LAW_ID" \
  --export-to-resource-specific true \
  --logs '[
    {"category":"kube-apiserver","enabled":true},
    {"category":"kube-audit","enabled":true},
    {"category":"kube-audit-admin","enabled":true},
    {"category":"kube-controller-manager","enabled":true},
    {"category":"kube-scheduler","enabled":true},
    {"category":"cluster-autoscaler","enabled":true}
  ]'

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Existing AKS cluster resource ID (the cluster must live in this resource group).')
param aksResourceId string
@description('Log Analytics workspace receiving AKS control-plane logs.')
param workspaceId string

// Diagnostic settings are extension resources: scope must reference a resource symbol, not an ID string
resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' existing = {
  name: last(split(aksResourceId, '/'))
}

resource diag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'aks-control-plane-audit'
  scope: aks
  properties: {
    workspaceId: workspaceId
    // Resource-specific tables (AKSAudit, AKSAuditAdmin, AKSControlPlane) instead of AzureDiagnostics
    logAnalyticsDestinationType: 'Dedicated'
    logs: [
      { category: 'kube-apiserver',          enabled: true }
      { category: 'kube-audit',              enabled: true }
      { category: 'kube-audit-admin',        enabled: true }
      { category: 'kube-controller-manager', enabled: true }
      { category: 'kube-scheduler',          enabled: true }
      { category: 'cluster-autoscaler',      enabled: true }
    ]
  }
}

Compliance mapping

Control: azure-k8s-08
Severity: HIGH
Type: DETECTIVE
Provider: Azure AKS
CIS Kubernetes Benchmark v1.11.0: §1.2.22 (audit logging)
CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0: n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF)
NIST SP 800-53 rev5: AU-2; AU-12; SI-4
ISO/IEC 27001:2022: A.8.15; A.8.16
ISO/IEC 27017:2015: CLD.12.4.5
NIST SP 800-190 (Sep 2017): §4.4.3
NSA/CISA Kubernetes Hardening Guide v1.2: §6 (Audit logging and threat detection)

Log signals

  • AzureActivity Microsoft.Insights/diagnosticSettings/delete targeting a diagnostic setting that exports kube-audit, kube-audit-admin, or kube-apiserver categories from an AKS resource — silences the audit stream.
  • AzureActivity Microsoft.Insights/diagnosticSettings/write where the workspaceId is removed while only the storageAccountId remains — moves audit traffic off the queryable Log Analytics surface.
  • AKSAuditAdmin ingestion gap exceeding the cluster's normal idle window — absence-of-signal indicator that the export pipeline is broken.

Query

AzureActivity
          | where OperationNameValue in~ ("Microsoft.Insights/diagnosticSettings/delete", "Microsoft.Insights/diagnosticSettings/write")
          | where ResourceId has "Microsoft.ContainerService/managedClusters"
          | extend body = tostring(parse_json(Properties).requestbody)
          | project TimeGenerated, Caller, ResourceId, OperationNameValue, body
          | order by TimeGenerated desc
          | take 200

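Run as a KQL query in Log Analytics. Complement it with a watchdog. A cluster that has stopped exporting produces no rows at all, so compare against a longer lookback instead of counting only the last 30 minutes: AKSAuditAdmin | where TimeGenerated > ago(1d) | summarize LastSeen = max(TimeGenerated) by Cluster = tostring(split(_ResourceId, '/')[-1]) | where LastSeen < ago(30m). Any cluster the watchdog returns triggers the same incident path.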
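The absence check is easy to get wrong because a silent cluster contributes no rows. Below is a minimal sketch of the same logic. It assumes a mapping from cluster name to the latest AKSAuditAdmin TimeGenerated, plus an optional inventory of clusters expected to report, so that clusters that never reported are caught too. silent_clusters is hypothetical; the 30-minute default mirrors the alert threshold.

```python
from datetime import datetime, timedelta, timezone


def silent_clusters(last_seen: dict, now: datetime, inventory=(), gap=timedelta(minutes=30)) -> list:
    """Clusters whose newest audit record is older than `gap`, or that never reported."""
    expected = set(inventory) | set(last_seen)
    return sorted(
        name for name in expected
        if name not in last_seen or now - last_seen[name] > gap
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    seen = {"prod-east": now - timedelta(minutes=5), "prod-west": now - timedelta(minutes=50)}
    print(silent_clusters(seen, now, inventory=["prod-central"]))  # ['prod-central', 'prod-west']
```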

Alert threshold

  • Any diagnostic-setting delete on an AKS resource carrying production workloads — page on first occurrence.
  • 30-minute audit ingestion gap for a previously active cluster — page; the export pipeline is the truth source for every other AKS detection.

Initial response

  1. Recreate the diagnostic setting via Bicep/Terraform reapply; confirm the Log Analytics workspace receives the next batch of kube-audit-admin events within five minutes.
  2. Pull the AKS control-plane kube-audit-admin backlog from the Storage Account archive (if also configured) and review any verb=create or verb=patch events that landed during the export gap.
  3. Escalate per general/ir.html — confirm the Azure Policy Deploy Diagnostic Settings for Azure Kubernetes Service to Log Analytics workspace is assigned in deploy-if-not-exists mode at the management-group scope.

References

Equivalent controls in other providers: GKE Cloud Audit Logs Data Access, EKS control-plane logs, OKE OCI Audit Logging.

azure-k8s-09 ! HIGH PREVENTIVE

AKS Standard: Use --enable-managed-identity --assign-identity <UAMI-RESOURCE-ID> --assign-kubelet-identity <KUBELET-UAMI-RESOURCE-ID> at cluster create to assign a User-Assigned Managed Identity (UAMI) to the cluster and a separate UAMI to the kubelet (for ACR image pulls and node-side Azure API calls). AKS Automatic: Cluster identity is a User-Assigned Managed Identity by default; the operator provisions and assigns the UAMI.

The AKS cluster identity must be a User-Assigned Managed Identity (UAMI), not a service principal and not a system-assigned managed identity. A UAMI is a standalone Azure resource that the operator creates, manages, and assigns to the cluster. Because its lifecycle is decoupled from the cluster's, deleting and recreating the cluster neither invalidates the identity nor breaks the role assignments granted to it. Service-principal mode (--service-principal <APP-ID> --client-secret <SECRET>) is the legacy alternative and is no longer recommended: its client secrets expire and must be rotated manually, and pipelines must store and handle those credentials. A system-assigned managed identity is tied to the cluster's lifecycle. When the cluster is replaced, the identity is destroyed and recreated with a new object ID, so every role assignment has to be migrated.
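The identity model can be checked against the cluster's current state. Below is a minimal sketch. It assumes az aks show -o json output with the following properties: identity.type is SystemAssigned or UserAssigned; managed-identity clusters report servicePrincipalProfile.clientId as the placeholder msi; and the kubelet identity appears under identityProfile.kubeletidentity.resourceId. audit_cluster_identity and its expected_kubelet_id parameter are hypothetical.

```python
from typing import Optional


def audit_cluster_identity(cluster: dict, expected_kubelet_id: Optional[str] = None) -> list:
    """Return findings about the cluster and kubelet identities; empty means compliant."""
    findings = []
    sp_client = (cluster.get("servicePrincipalProfile") or {}).get("clientId")
    identity_type = (cluster.get("identity") or {}).get("type")
    if sp_client and sp_client != "msi":
        findings.append("cluster identity is a service principal")
    elif identity_type != "UserAssigned":
        findings.append(f"cluster identity type is {identity_type or 'absent'}, expected UserAssigned")
    kubelet = (cluster.get("identityProfile") or {}).get("kubeletidentity") or {}
    kubelet_id = kubelet.get("resourceId") or ""
    # Azure resource IDs are case-insensitive, so compare lowercased.
    if expected_kubelet_id and kubelet_id.lower() != expected_kubelet_id.lower():
        findings.append("kubelet identity is not the operator-provisioned UAMI")
    return findings
```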

Remediation — Terraform

# Terraform Azure provider ~> 4.0
resource "azurerm_user_assigned_identity" "aks_cluster" {
  name                = "aks-cluster-uami"
  resource_group_name = azurerm_resource_group.aks.name
  location            = var.location
}

resource "azurerm_user_assigned_identity" "aks_kubelet" {
  name                = "aks-kubelet-uami"
  resource_group_name = azurerm_resource_group.aks.name
  location            = var.location
}

resource "azurerm_kubernetes_cluster" "hardened" {
  # ... other args ...
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks_cluster.id]
  }

  kubelet_identity {
    user_assigned_identity_id = azurerm_user_assigned_identity.aks_kubelet.id
    client_id                 = azurerm_user_assigned_identity.aks_kubelet.client_id
    object_id                 = azurerm_user_assigned_identity.aks_kubelet.principal_id
  }
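
  # AKS checks this permission when the kubelet identity is assigned
  depends_on = [azurerm_role_assignment.kubelet_identity_operator]
}

# The cluster (control-plane) identity needs Managed Identity Operator on the kubelet identity
resource "azurerm_role_assignment" "kubelet_identity_operator" {
  scope                = azurerm_user_assigned_identity.aks_kubelet.id
  role_definition_name = "Managed Identity Operator"
  principal_id         = azurerm_user_assigned_identity.aks_cluster.principal_id
}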

Remediation — az aks CLI

az identity create --name aks-cluster-uami --resource-group hardened-rg
az identity create --name aks-kubelet-uami --resource-group hardened-rg

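CLUSTER_UAMI=$(az identity show -g hardened-rg -n aks-cluster-uami --query id -o tsv)
KUBELET_UAMI=$(az identity show -g hardened-rg -n aks-kubelet-uami --query id -o tsv)
CLUSTER_UAMI_PRINCIPAL=$(az identity show -g hardened-rg -n aks-cluster-uami --query principalId -o tsv)

# The cluster identity needs Managed Identity Operator on the kubelet identity
az role assignment create \
  --assignee-object-id "$CLUSTER_UAMI_PRINCIPAL" \
  --assignee-principal-type ServicePrincipal \
  --role "Managed Identity Operator" \
  --scope "$KUBELET_UAMI"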

az aks create \
  --resource-group hardened-rg \
  --name hardened-cluster \
  --enable-managed-identity \
  --assign-identity "$CLUSTER_UAMI" \
  --assign-kubelet-identity "$KUBELET_UAMI"

Remediation — Bicep

targetScope = 'resourceGroup'

@description('User-assigned managed identity granted Network Contributor on the AKS VNet.')
param identityResourceId string
@description('AKS cluster name.')
param clusterName string

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: resourceGroup().location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: { '${identityResourceId}': {} }
  }
  properties: {
    dnsPrefix: clusterName
    agentPoolProfiles: [
      { name: 'system', count: 3, vmSize: 'Standard_D4ds_v5', mode: 'System' }
    ]
  }
}

Compliance mapping

Control: azure-k8s-09
Severity: HIGH
Type: PREVENTIVE
Provider: Azure AKS
CIS Kubernetes Benchmark v1.11.0: n/a (managed control plane)
CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0: n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF)
NIST SP 800-53 rev5: IA-5; AC-3
ISO/IEC 27001:2022: A.5.16; A.5.17
ISO/IEC 27017:2015: n/a
NIST SP 800-190 (Sep 2017): §4.4.2
NSA/CISA Kubernetes Hardening Guide v1.2: §4 (Authentication and authorization)

Log signals

  • AzureActivity Microsoft.ContainerService/managedClusters/write whose request body sets servicePrincipalProfile.clientId to an application ID (managed-identity clusters carry the placeholder value msi). This signals a regression to the long-lived shared-secret cluster-identity model.
  • AuditLogs Category = "ApplicationManagement" showing client-credential or password-credential addition to the service principal that the cluster now references.
  • AzureActivity role-assignment writes scoping the cluster identity to subscriptions or resource groups outside the documented cluster boundary — privilege creep on a workload-mediating identity.

Query

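AzureActivity
          | where OperationNameValue =~ "Microsoft.ContainerService/managedClusters/write"
          | extend body = tostring(parse_json(Properties).requestbody)
          | extend spClientId = tostring(parse_json(body).properties.servicePrincipalProfile.clientId)
          // Managed-identity clusters report the placeholder clientId "msi"
          | where isnotempty(spClientId) and spClientId != "msi"
          | project TimeGenerated, Caller, ResourceId, spClientId, body
          | order by TimeGenerated desc
          | take 100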

Run as a KQL query in Log Analytics. A cluster running with a managed identity (SystemAssigned or UserAssigned) should never move back to a service principal, so any match is high-signal. Persist as a Sentinel analytics rule with severity High.
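The request-body test in the query can be sketched in Python for offline replay of exported activity records. This sketch assumes the requestbody string has the ARM managedClusters shape used by the query above. service_principal_in_request is a hypothetical helper.

```python
import json
from typing import Optional


def service_principal_in_request(requestbody) -> Optional[str]:
    """Return the service-principal client ID a managedClusters/write request sets, if any."""
    try:
        body = json.loads(requestbody)
    except (TypeError, json.JSONDecodeError):
        return None
    props = body.get("properties") if isinstance(body, dict) else None
    profile = props.get("servicePrincipalProfile") if isinstance(props, dict) else None
    if not isinstance(profile, dict):
        return None
    client_id = profile.get("clientId")
    # Managed-identity clusters carry the placeholder "msi" rather than an app ID.
    if client_id and client_id != "msi":
        return client_id
    return None
```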

Alert threshold

  • Any flip from managed identity to service principal on a production cluster — page immediately; rotate the underlying credential within the same operator session.
  • Role assignment writes that grant the cluster identity privileges beyond its documented resource-group scope — page on first occurrence.

Initial response

  1. Migrate the cluster back to managed identity via az aks update --enable-managed-identity, adding --assign-identity <UAMI-RESOURCE-ID> to restore the documented UAMI. Capture the AzureActivity OperationId as the rollback ledger.
  2. Inspect AuditLogs for any credential added to the service principal during the regression window; rotate any such credential before any further pod-credential issuance occurs.
  3. Escalate per general/ir.html. Confirm that the Azure Policy assignment covering AKS clusters that use service-principal credentials is set to the Deny effect at the cluster scope.

References

Equivalent controls in other providers: GKE Workload Identity Federation (workload-to-cloud identity), EKS Pod Identity, OKE cluster IAM identity.

azure-k8s-10 ! MEDIUM PREVENTIVE

AKS Standard / AKS Automatic: Apply a NetworkPolicy in every workload namespace that blocks pod egress to 169.254.169.254/32 (the Azure Instance Metadata Service endpoint) as defense-in-depth. With Microsoft Entra Workload Identity (azure-k8s-02) configured correctly, pods never need direct IMDS access — workload tokens are projected via the SA token, not retrieved from IMDS.

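Even with Microsoft Entra Workload Identity configured, applying a NetworkPolicy that blocks pod egress to the Instance Metadata Service endpoint (169.254.169.254) is defense-in-depth. By default, IMDS answers any HTTP request from a pod on the node that carries the Metadata: true header. A pod could therefore try to retrieve a token for the node's managed identity through IMDS, whether because Workload Identity is misconfigured or disabled or because the pod is misbehaving. An egress-blocking NetworkPolicy stops that request before it reaches the metadata endpoint. This complements, but does not replace, Workload Identity: the two controls operate at different layers, NetworkPolicy at L3/L4 and Workload Identity at the authentication layer. NetworkPolicies are additive, so the allow-all-except-IMDS policy below also permits every other egress destination. In namespaces that already carry the azure-k8s-07 default-deny baseline, add the 169.254.169.254/32 exception to each egress allow rule instead of applying this policy.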
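Kubernetes NetworkPolicies are additive, and that is the non-obvious part of this control. Below is a simplified model of NetworkPolicy egress evaluation, restricted to ipBlock rules (no ports, pod selectors, or namespace selectors). It shows that the allow-all-except-IMDS policy blocks IMDS on its own, but reopens all other egress when it is combined with a default-deny policy. The helper names and the DEFAULT_DENY / DENY_IMDS encodings are hypothetical.

```python
import ipaddress

# One entry per NetworkPolicy that selects the pod and lists Egress in policyTypes.
# Each entry is that policy's egress rules as (cidr, except_cidrs) ipBlock pairs.
DEFAULT_DENY = []  # default-deny-all: selects the pod, allows nothing
DENY_IMDS = [("0.0.0.0/0", ("169.254.169.254/32",))]  # deny-imds-egress


def rule_allows(dest: str, cidr: str, excepts) -> bool:
    """True if dest falls inside cidr and outside every except block."""
    ip = ipaddress.ip_address(dest)
    if ip not in ipaddress.ip_network(cidr):
        return False
    return not any(ip in ipaddress.ip_network(e) for e in excepts)


def egress_allowed(dest: str, policies) -> bool:
    """Union semantics: allowed if any selecting policy has a rule that allows dest."""
    if not policies:
        return True  # no egress policy selects the pod: all egress is allowed
    return any(rule_allows(dest, cidr, exc) for rules in policies for cidr, exc in rules)


if __name__ == "__main__":
    print(egress_allowed("169.254.169.254", [DENY_IMDS]))             # False: IMDS blocked
    print(egress_allowed("203.0.113.10", [DEFAULT_DENY]))             # False: default-deny alone
    print(egress_allowed("203.0.113.10", [DEFAULT_DENY, DENY_IMDS]))  # True: the union reopens egress
```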

Remediation — kubectl

cat <<'YAML' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-imds-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
YAML

Remediation — Terraform

# kubernetes_manifest is a hashicorp/kubernetes provider resource, configured against the AKS cluster
resource "kubernetes_manifest" "deny_imds" {
  manifest = {
    apiVersion = "networking.k8s.io/v1"
    kind       = "NetworkPolicy"
    metadata = {
      name      = "deny-imds-egress"
      namespace = "production"
    }
    spec = {
      podSelector = {}
      policyTypes = ["Egress"]
      egress = [{
        to = [{
          ipBlock = {
            cidr   = "0.0.0.0/0"
            except = ["169.254.169.254/32"]
          }
        }]
      }]
    }
  }
}

Remediation — Bicep

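targetScope = 'resourceGroup'

@description('NSG attached to a dedicated AKS pod subnet; denies pod traffic to IMDS (169.254.169.254).')
param nsgName string

@description('Pod subnet CIDR (Azure CNI with a dedicated pod subnet). Scoping the source keeps the rule off the nodes, which need IMDS themselves.')
param podSubnetPrefix string

param location string = resourceGroup().location

// Caveats: with Azure CNI Overlay, pod egress leaves the node with the node IP, so an NSG
// cannot tell pod traffic from node traffic. Azure may also exempt platform addresses such as
// 169.254.169.254 from NSG evaluation. Verify that the rule takes effect in your topology and
// keep the NetworkPolicy above as the primary control.
resource nsg 'Microsoft.Network/networkSecurityGroups@2024-03-01' = {
  name: nsgName
  location: location
  properties: {
    securityRules: [
      {
        name: 'deny-pod-imds'
        properties: {
          priority: 200
          direction: 'Outbound'
          access: 'Deny'
          protocol: 'Tcp'
          sourceAddressPrefix: podSubnetPrefix
          sourcePortRange: '*'
          destinationAddressPrefix: '169.254.169.254/32'
          destinationPortRange: '80'
          description: 'Pods must use Workload Identity (OIDC), not IMDS'
        }
      }
    ]
  }
}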

Compliance mapping

Control: azure-k8s-10
Severity: MEDIUM
Type: PREVENTIVE
Provider: Azure AKS
CIS Kubernetes Benchmark v1.11.0: §5.3 (Network policies)
CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0: n/a (verify against CIS Azure Kubernetes Service (AKS) Benchmark v2.0.0 PDF)
NIST SP 800-53 rev5: SC-7; SC-5
ISO/IEC 27001:2022: A.8.20; A.8.22
ISO/IEC 27017:2015: CLD.9.5.1
NIST SP 800-190 (Sep 2017): §4.4.1
NSA/CISA Kubernetes Hardening Guide v1.2: §2 (Network separation)

Log signals

  • Pod-level network flow telemetry (for example, Cilium Hubble flows) showing pods in non-system namespaces connecting to 169.254.169.254 on port 80. Pods that use Microsoft Entra Workload Identity obtain tokens from Entra ID with the projected service-account token and should never contact IMDS directly.
  • AzureActivity Microsoft.Compute/virtualMachineScaleSets/write against AKS agent-pool scale sets that removes security extensions responsible for host-level IMDS enforcement.
  • AKSAuditAdmin deletes of the NetworkPolicy that blocks pod-to-IMDS egress in non-system namespaces — coverage erosion event.

Query

AKSAudit
          | where Verb == "create" and tostring(parse_json(ObjectRef).resource) == "pods"
          | extend ns = tostring(parse_json(ObjectRef).namespace), pod = tostring(parse_json(ObjectRef).name)
          | where ns != "kube-system"
          // API audit records carry request bodies, not network connections: flag pods whose spec references IMDS
          | where tostring(RequestObject) contains "169.254.169.254"
          | project TimeGenerated, ns, pod, User, UserAgent
          | order by TimeGenerated desc
          | take 200

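Run as a KQL query in Log Analytics. API-server audit records describe requests, not network connections, so this query surfaces pods whose submitted spec references the IMDS address (for example, in a command or environment variable). Confirm actual direct IMDS traffic from pod-level flow telemetry. A pod reaching for IMDS is the canonical signal of an attempt to harvest the node identity instead of using the federated-credential path; persist the query as a Sentinel analytics rule.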

Alert threshold

  • Any pod in a non-system namespace contacting 169.254.169.254 directly — page on first occurrence; the workload should be using workload identity instead.
  • Deletion of the IMDS-block NetworkPolicy — page immediately; the cluster is one credential-harvest pod away from a node-identity theft.

Initial response

  1. Cordon the affected node (kubectl cordon <NODE>) and label the pod via kubectl label pod <POD> -n <NAMESPACE> quarantine=true --overwrite. A label alone changes no traffic; isolation depends on pre-staged network policy that keys on quarantine=true. Capture the pod manifest and the AKSAudit trail as the forensic ledger.
  2. If any direct IMDS request succeeded (HTTP 200), assume the node identity has been harvested. Replace the node-pool (kubelet) managed identity or strip its role assignments, then reimage the agent pool via az aks nodepool upgrade --node-image-only.
  3. Escalate per general/ir.html — restore the IMDS-block NetworkPolicy from the GitOps source and reconfirm the Azure Policy denying its deletion is in deny mode.

References

Equivalent controls in other providers: EKS parallel IMDS hardening (IMDSv2 + hop-limit 1), OKE Workload Identity removes IMDS-based instance-principal dependency.

Sources