Azure Incident Response Hardening

Overview

This page covers Microsoft Azure incident response (IR) hardening — the controls that decide whether the organisation can contain, investigate, and recover from an Azure-resident incident inside a defensible time window, and whether the resulting forensic record will hold up to subsequent regulatory or legal scrutiny. Scope is the Azure commercial regions; Azure Government and Azure operated by 21Vianet (China) inherit the same controls but route through different sovereign endpoint suffixes, a different Microsoft Entra ID (formerly Azure Active Directory) tenant topology, and (for Azure Government) a separate Microsoft Sentinel commercial-to-government data-residency boundary that prohibits cross-cloud workspace replication. Re-verify partition caveats and the Microsoft Graph endpoint before applying any of the IaC below to a non-commercial cloud.

Cross-cutting IR lifecycle principles — preparation, detection, containment, eradication, recovery, lessons-learned, and forensics & evidence preservation — are documented on the General Incident Response page against NIST SP 800-61 rev 3 (April 2025 CSF 2.0 community profile). This page does not re-author the lifecycle; it maps the Prepare → Detect → Contain → Eradicate → Recover → Lessons-Learned sequence to Azure primitives and to the specific posture controls that make each lifecycle phase executable inside an Azure tenant. Severity assignments follow the rubric documented in methodology — severity assignment; equivalence callouts at the bottom of each control point to the matching control on the AWS, GCP, and OCI sibling pages so a reader can compare break-glass, automated-response, and forensic-retention models across providers.

Azure IR posture splits cleanly into two stacks. The detective stack — subscription Activity Log routed to a central Log Analytics workspace, Microsoft Defender for Cloud workload-protection plans, Microsoft Sentinel analytics rules and data connectors, NSG Flow Logs with traffic analytics, Activity Log alerts on canonical events — lives on the Azure Logging page; it is what tells you an incident is happening. The responsive stack — emergency-access break-glass identities in Microsoft Entra ID, Microsoft Sentinel automation rules and Logic App playbooks, immutable forensic blob storage in a dedicated subscription, Sentinel KQL hunting query libraries, documented VM-isolation and token-revocation runbooks — lives here; it is what you do once you know. The handoff between the two stacks is concrete and Azure-internal: a high-severity Microsoft Sentinel incident triggers an automation rule that runs a Logic App playbook (azure-ir-02); a forensic question about who-touched-what at 03:00 last Tuesday is answered by a Sentinel KQL hunting query against archived Activity Log and SigninLogs (azure-ir-04). Every IR control on this page assumes the corresponding logging control on the Azure Logging page is in place; if it is not, the IR control degrades to a manual playbook with insufficient telemetry to drive it.

This page is the provider-specific how-to; General IR owns the cross-cutting principles. Per the canonical-content rule, the lifecycle phases, the threat-actor taxonomy, and the regulatory-reporting timelines are not re-authored here. Pair-control announcement: azure-ir-02 and azure-ir-04 both cross-link STRICT to azure-log-08-sentinel-data-lake on the Azure Logging page — Microsoft Sentinel is the SIEM/SOAR plane that both controls depend on, and the same-phase bidirectional linkage is the parallel of the Phase 6 AWS pattern (aws-ir-02 / aws-ir-04 ↔ aws-log-04 / aws-log-08). azure-ir-03 additionally cross-links to azure-data-08-immutable-blob: both controls rely on the identical Immutable Blob in Locked mode mechanism, but with different postures — defensive retention of business records vs forensic chain-of-custody storage of incident evidence.

Order matters. Control 01 is the preparation invariant that gates everything else: without a pre-provisioned emergency-access identity, the very first incident that takes out Microsoft Entra ID — exactly the scenario IR exists to handle — locks responders out of the tenant at the moment they need it most. Control 02 is the automation layer that compresses time-to-contain from human-response-time minutes to seconds via Microsoft Sentinel SOAR. Control 03 is the evidence invariant — without write-once-read-many evidence storage in a separate subscription, an attacker with sufficient privileges can erase the very logs and snapshots that would prove what happened. Controls 04–06 are the responsive playbooks themselves: Sentinel KQL hunting for retrospective forensic queries, VM isolation for "we think this VM is compromised, take it off the network without destroying state", and token revocation for "an Entra ID identity is known or suspected stolen". Control 07 closes the lessons-learned loop with quarterly tabletop exercises so the playbooks above are tested before they are needed, and with an annual Microsoft Incident Response (DART) engagement contact test so the vendor escalation path is known-working when needed.

One housekeeping note on the compliance table that follows every control. Most IR controls are playbook-driven and process-bound rather than state-driven — CIS Foundations Benchmarks across all four providers are weighted toward configurable state (encryption, public access, logging enabled) and only lightly cover the IR domain. Expect the CIS columns on this page to read (best-practices) or n/a for most controls; the CIS Microsoft Azure Foundations Benchmark v3.0.0 (Feb 2025) Section 5 (Logging and Monitoring) covers the detective half of IR but does not enumerate playbook-style response controls. NIST SP 800-53 rev5 IR family (IR-4 Incident Handling, IR-5 Incident Monitoring, IR-6 Incident Reporting, IR-8 Incident Response Plan, plus AU-9 / AU-11 for evidence) and ISO/IEC 27001:2022 (A.5.24 information-security incident management, A.5.26 response to incidents, A.5.28 collection of evidence) are the primary mappings.

azure-ir-01-emergency-access ! CRITICAL PREVENTIVE

Pre-provision between two and four emergency-access (break-glass) accounts in Microsoft Entra ID. Every account is cloud-only (created directly in the Entra tenant, never synchronised from on-premises Active Directory via Entra Connect), authenticates with a FIDO2 hardware security key (no passwords, no SMS, no Microsoft Authenticator push), and is explicitly excluded from every Conditional Access policy via a dedicated CA-exclusion group, so that a misconfigured or compromised CA policy cannot lock responders out of the tenant they need to recover (Microsoft Learn — Manage emergency access accounts in Microsoft Entra ID (accessed 2026-05)). Each account is assigned the Global Administrator role at the tenant root; the hardware keys are stored in locked safes in two physically separate buildings; a Microsoft Sentinel analytics rule fires on every sign-in by these accounts and notifies PagerDuty, the security on-call Teams channel, and an SMTP gateway for redundancy.

The control is typed PREVENTIVE, not RESPONSIVE, mirroring the Phase 6 aws-ir-01 precedent. It is the pre-positioning that makes response possible: an emergency-access account cannot be created during the incident itself, whether Microsoft Entra ID is unavailable, the federation provider is compromised, or the IdP-to-Entra trust is misconfigured. The control is also CRITICAL: without it, the very first incident that affects the identity plane has no recovery path. Quarterly access tests (a named responder retrieves their FIDO2 key from the safe, signs into the Entra portal, performs a single read-only Graph API call, and signs out) keep the credential, the hardware key, and the Sentinel alarm pipeline all known-working. Any account whose last test is more than 90 days old is flagged on the security team's drift dashboard. The principle is reinforced in General IR — preparation and cross-references the privileged-access posture documented on azure/iam.html.
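
The 90-day drill check described above can be sketched as a small pure function. This is a minimal illustration, not the drift dashboard's actual implementation: the BreakGlassDrillRecord shape and the function name are hypothetical, and the drill timestamps are assumed to come from wherever the team records completed access tests.

```typescript
// Hypothetical record shape: one entry per break-glass account.
interface BreakGlassDrillRecord {
  upn: string;
  // ISO 8601 timestamp of the last successful access test, or null if never tested.
  lastDrillUtc: string | null;
}

const DRILL_MAX_AGE_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the UPNs whose last drill is missing, unparseable, or older than 90 days.
function findStaleBreakGlassDrills(
  records: BreakGlassDrillRecord[],
  nowUtc: Date
): string[] {
  const stale: string[] = [];
  for (const record of records) {
    if (record.lastDrillUtc === null) {
      stale.push(record.upn);
      continue;
    }
    const ageDays =
      (nowUtc.getTime() - Date.parse(record.lastDrillUtc)) / MS_PER_DAY;
    // An unparseable timestamp yields NaN; treat it as stale rather than fresh.
    if (isNaN(ageDays) || ageDays > DRILL_MAX_AGE_DAYS) {
      stale.push(record.upn);
    }
  }
  return stale;
}
```

Treating a missing or unparseable timestamp as stale keeps the check conservative: an account is only reported healthy when a recent drill is positively recorded.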

Remediation — Azure CLI

# Azure CLI 2.x
# Microsoft Entra emergency-access account creation. The CA-exclusion group MUST
# exist before the account is created; the account MUST be added to the group
# before any Conditional Access policy is applied tenant-wide.

# (1) Create the dedicated CA-exclusion group.
az ad group create \
  --display-name "Emergency Access Accounts" \
  --mail-nickname emergency-access \
  --description "Excluded from ALL Conditional Access policies — break-glass only"

# (2) Create the cloud-only break-glass user account. Note: --account-enabled true
# is intentional; the account is enabled but should only be used by named
# responders retrieving the hardware key from the safe.
az ad user create \
  --display-name "Break-Glass Responder 01" \
  --user-principal-name break-glass-01@contoso.onmicrosoft.com \
  --password "$(openssl rand -base64 48)" \
  --force-change-password-next-sign-in false \
  --account-enabled true

# (3) Add the account to the CA-exclusion group.
USER_ID=$(az ad user show --id break-glass-01@contoso.onmicrosoft.com --query id -o tsv)
GROUP_ID=$(az ad group show --group "Emergency Access Accounts" --query id -o tsv)
az ad group member add --group "$GROUP_ID" --member-id "$USER_ID"

# (4) Assign Global Administrator at tenant root via Microsoft Graph.
# az CLI lacks a first-class verb for directory role assignment; az rest is canonical.
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/directoryRoles/roleTemplateId=62e90394-69f5-4237-9190-012177145e10/members/\$ref" \
  --body "{\"@odata.id\": \"https://graph.microsoft.com/v1.0/directoryObjects/$USER_ID\"}"

# (5) FIDO2 hardware-key enrolment is interactive: the responder signs in once
# (via Temporary Access Pass issued for first sign-in) and registers the FIDO2
# key via aka.ms/mysecurityinfo. CLI enrolment of FIDO2 keys is not supported.

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Entra emergency-access accounts (accessed 2026-05)
# Directory-plane resources use the AzureAD provider declared in-block.
terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.50"
    }
  }
}

# Dedicated Conditional Access exclusion group.
resource "azuread_group" "emergency_access" {
  display_name     = "Emergency Access Accounts"
  mail_nickname    = "emergency-access"
  description      = "Excluded from ALL Conditional Access policies — break-glass only"
  security_enabled = true
}

# Two cloud-only break-glass user identities. Cloud-only = not synced from
# on-prem AD; FIDO2 enrolment via mysecurityinfo (interactive step).
resource "azuread_user" "break_glass" {
  for_each              = toset(["bg01", "bg02"])
  user_principal_name   = "break-glass-${each.key}@contoso.onmicrosoft.com"
  display_name          = "Break-Glass Responder ${each.key}"
  mail_nickname         = "break-glass-${each.key}"
  password              = random_password.break_glass[each.key].result
  force_password_change = false
  account_enabled       = true
}

resource "random_password" "break_glass" {
  for_each = toset(["bg01", "bg02"])
  length   = 48
  special  = true
}

# Bind both break-glass accounts to the CA-exclusion group.
resource "azuread_group_member" "break_glass_in_exclusion" {
  for_each         = azuread_user.break_glass
  group_object_id  = azuread_group.emergency_access.object_id
  member_object_id = each.value.object_id
}

# Microsoft Sentinel analytics rule firing on any sign-in for these accounts.
resource "azurerm_sentinel_alert_rule_scheduled" "break_glass_signin" {
  name                       = "break-glass-signin"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.security.id
  display_name               = "Emergency access account sign-in"
  severity                   = "High"
  query                      = <<-KQL
    SigninLogs
    | where UserPrincipalName in~ ("break-glass-bg01@contoso.onmicrosoft.com",
                                    "break-glass-bg02@contoso.onmicrosoft.com")
    | project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, ResultType
  KQL
  query_frequency            = "PT5M"
  query_period               = "PT5M"
  trigger_operator           = "GreaterThan"
  trigger_threshold          = 0
}

Remediation — Bicep

targetScope = 'tenant'

@description('Object IDs of two cloud-only break-glass accounts (no MFA dependency on tenant IdP).')
param breakGlassObjectIds array

@description('Group used in Conditional Access exclusions for break-glass.')
param exclusionGroupId string

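// Directory objects are not ARM resources. Authoring them from Bicep requires the
// Microsoft Graph Bicep extension; verify the resource type and API version below
// against the extension's current reference before deploying.
resource memberships 'Microsoft.Graph/groups/members@2023-09-01' = [for oid in breakGlassObjectIds: {
  name: '${exclusionGroupId}/${oid}'
}]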

Remediation — Pulumi (TypeScript)

import * as pulumi from "@pulumi/pulumi";

// Break-glass accounts are created out-of-band (no secrets in IaC).
// Pulumi authors the CA-exclusion-group memberships only.
const breakGlassObjectIds = [
  "<bg-account-1-object-id>",
  "<bg-account-2-object-id>",
];

const exclusionGroupId = "<break-glass-exclusion-group-id>";

// graph-membership authoring goes here (azure-native preview surface).
export const breakGlassRoster = breakGlassObjectIds;

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | IR-4; AC-2(8); AC-6 | A.5.24; A.5.26 | CLD.9.5.1

Log signals

  • SigninLogs entries for the tenant break-glass principals outside the quarterly-drill cadence window — every event on these accounts is by design incident-grade and warrants page-on-first-occurrence handling.
  • AuditLogs Category = "UserManagement" showing password rotation or MFA-method change on the break-glass principal; this is an adversary-attempt-to-disarm signal.
  • AuditLogs Category = "RoleManagement" entries (for example "Remove member from role") targeting the Global Administrator assignment on break-glass principals. Entra directory-role changes are recorded in AuditLogs rather than AzureActivity, and a removal would silently strip the emergency path.

Query

SigninLogs
| where UserPrincipalName startswith "break-glass-"
| project TimeGenerated, UserPrincipalName, ResultType, ResultDescription, IPAddress, Location, AppDisplayName
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics scoped to the SigninLogs export. Pair with a Sentinel analytics rule of severity High and an automation playbook that posts to the incident-response channel on every match; legitimate use is rare enough that false positives are negligible.

Alert threshold

  • Any successful break-glass sign-in outside the documented quarterly drill window — page immediately.
  • Any directory-role change or MFA-method change on a break-glass principal — page; treat as adversary preparing the path for later use.
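
The first threshold above depends on knowing whether a sign-in falls inside a declared drill window. A minimal sketch of that decision follows; the DrillWindow shape and the "drill" / "page" verdict labels are hypothetical, and the windows are assumed to be maintained by the security team rather than derived from any Azure API.

```typescript
// Hypothetical shape for a declared quarterly drill window (ISO 8601, UTC).
interface DrillWindow {
  startUtc: string;
  endUtc: string;
}

type BreakGlassSignInVerdict = "drill" | "page";

// Returns "drill" only when the sign-in time falls inside a declared window.
// Anything else, including an unparseable timestamp, pages the on-call.
function classifyBreakGlassSignIn(
  signInUtc: string,
  windows: DrillWindow[]
): BreakGlassSignInVerdict {
  const signInMs = Date.parse(signInUtc);
  if (isNaN(signInMs)) {
    return "page";
  }
  for (const window of windows) {
    const startMs = Date.parse(window.startUtc);
    const endMs = Date.parse(window.endUtc);
    // Comparisons against NaN are false, so a malformed window never matches.
    if (signInMs >= startMs && signInMs <= endMs) {
      return "drill";
    }
  }
  return "page";
}
```

A "drill" verdict only changes how the event is triaged; the analytics rule still fires on every sign-in, so the drill itself exercises the alert pipeline.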

Initial response

  1. Confirm the sign-in maps to a declared incident ticket; if no ticket exists, treat as compromise of the highest-privilege identity in the tenant.
  2. Rotate the break-glass password to a fresh hardware-managed value via four-eyes process; re-pair the FIDO2 key under witness; revoke refresh tokens via Revoke-MgUserSignInSession.
  3. Escalate per general/ir.html — capture SigninLogs + AuditLogs + AzureActivity for the affected tenant slice over the prior 72 hours and confirm the conditional-access exception for break-glass remains tightly scoped.

References

Equivalent on: AWS · GCP · OCI

azure-ir-02-sentinel-playbook ! HIGH RESPONSIVE

Wire Microsoft Sentinel automation rules to invoke Logic App playbooks the moment a high-severity incident is created in the Sentinel workspace. The canonical automation set covers four deterministic actions: (a) isolate the affected VM by attaching a deny-all "quarantine" NSG to its NIC, (b) disable the implicated Entra ID identity (az ad user update --account-enabled false or Microsoft Graph PATCH /users/{id}), (c) snapshot the OS and data disks for forensic preservation (handing off to azure-ir-03), and (d) notify the on-call rotation via Teams adaptive card and PagerDuty. Microsoft Defender for Cloud workflow automation is a complement, not a substitute: Defender raises the finding, Sentinel ingests it as an incident (via the Microsoft Defender for Cloud data connector), the Sentinel automation rule pattern-matches on tactics or severity, and the Logic App carries out the SOAR action (Microsoft Learn — Automate incident handling with automation rules (accessed 2026-05)).

The detective half of this loop lives on the Azure Logging page as azure-log-08-sentinel-data-lake; without Sentinel onboarded to the central Log Analytics workspace with data connectors for Entra ID sign-ins, Activity Log, and Defender for Cloud alerts, this control has nothing to fire on. This is the STRICT same-phase pair-control link to azure/logging.html — the bidirectional linkage parallels the Phase 6 aws-ir-02 ↔ aws-log-04 pattern. Logic Apps are preferred over Azure Functions for SOAR playbooks because the connector library covers Entra ID, ServiceNow, Teams, Jira, PagerDuty, and Microsoft 365 Defender out-of-the-box; the workflow editor is reviewable by non-developer responders; and the run history is queryable per-incident for post-incident review.
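
To make the rule-to-playbook handoff concrete, the sketch below evaluates a severity condition against an incident and returns the four containment actions in order. It is an illustration only: the interfaces mirror the condition JSON used in the automation rule on this page, the action labels are hypothetical, and nothing here reproduces Sentinel's own evaluation engine.

```typescript
// Illustrative shapes only; real Sentinel incidents carry many more properties.
interface IncidentSketch {
  severity: string;
}

interface PropertyConditionSketch {
  conditionType: string;
  conditionProperties: {
    propertyName: string;
    operator: string;
    propertyValues: string[];
  };
}

// The four deterministic containment actions, in the order (a) to (d) above.
const CONTAINMENT_ACTION_ORDER: string[] = [
  "isolate-vm",
  "disable-identity",
  "snapshot-disks",
  "notify-on-call",
];

// True only when every condition matches. Condition kinds this sketch does not
// understand are treated as non-matching, so no automation runs by accident.
function incidentMatchesConditions(
  incident: IncidentSketch,
  conditions: PropertyConditionSketch[]
): boolean {
  for (const condition of conditions) {
    const props = condition.conditionProperties;
    const supported =
      condition.conditionType === "Property" &&
      props.propertyName === "IncidentSeverity" &&
      props.operator === "Equals";
    if (!supported || props.propertyValues.indexOf(incident.severity) === -1) {
      return false;
    }
  }
  return true;
}

// Returns the ordered action plan, or an empty plan when the rule does not match.
function planContainment(
  incident: IncidentSketch,
  conditions: PropertyConditionSketch[]
): string[] {
  return incidentMatchesConditions(incident, conditions)
    ? CONTAINMENT_ACTION_ORDER.slice()
    : [];
}
```

Keeping the action order in one constant makes the sequence reviewable: isolation and identity disablement come before evidence capture and notification, matching the order in which the actions are listed above.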

Remediation — Azure CLI

# Azure CLI 2.x
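# (1) Create the Logic App workflow that the Sentinel automation rule will invoke.
# Workflow definition lives in playbook-isolate-vm.json and contains the
# Entra ID, Network, Compute, and Teams connector actions.
# --mi-system-assigned enables the managed identity that step (2) reads; the flag
# belongs to the az logic extension, so confirm it with --help on your CLI version.
az logic workflow create \
  --resource-group rg-security-prod \
  --name la-ir-isolate-vm \
  --location westeurope \
  --mi-system-assigned true \
  --definition @playbook-isolate-vm.json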

# (2) Grant the Logic App's system-assigned managed identity the roles it needs.
LA_PRINCIPAL=$(az logic workflow show --resource-group rg-security-prod \
                 --name la-ir-isolate-vm --query identity.principalId -o tsv)

# Network Contributor for NSG attach/detach (scoped to a single resource group
# that holds workloads which may need isolation).
az role assignment create --assignee "$LA_PRINCIPAL" \
  --role "Network Contributor" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/rg-workload-prod"

# (3) Create the Sentinel automation rule that invokes the Logic App on any
# incident with severity High. Sentinel incident severities are Informational,
# Low, Medium, and High; there is no Critical level.
az sentinel automation-rule create \
  --resource-group rg-security-prod \
  --workspace-name law-security-prod \
  --automation-rule-id "$(uuidgen)" \
  --display-name "Isolate VM on High-severity incident" \
  --order 1 \
  --triggering-logic '{"isEnabled":true,"triggersOn":"Incidents","triggersWhen":"Created","conditions":[{"conditionType":"Property","conditionProperties":{"propertyName":"IncidentSeverity","operator":"Equals","propertyValues":["High"]}}]}' \
  --actions '[{"actionType":"RunPlaybook","order":1,"actionConfiguration":{"logicAppResourceId":"/subscriptions/'"$SUB_ID"'/resourceGroups/rg-security-prod/providers/Microsoft.Logic/workflows/la-ir-isolate-vm"}}]'

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Sentinel automation rules (accessed 2026-05)
# Pre-staged quarantine NSG that the playbook attaches to compromised VMs.
resource "azurerm_network_security_group" "quarantine" {
  name                = "nsg-ir-quarantine"
  location            = "westeurope"
  resource_group_name = azurerm_resource_group.security.name

  security_rule {
    name                       = "DenyAllInbound"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "DenyAllOutbound"
    priority                   = 100
    direction                  = "Outbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

# Logic App workflow — the SOAR playbook itself. Body lives in playbook.json
# and contains the Microsoft Graph + Compute + Network connector actions.
resource "azurerm_logic_app_workflow" "isolate_vm" {
  name                = "la-ir-isolate-vm"
  location            = "westeurope"
  resource_group_name = azurerm_resource_group.security.name
  identity { type = "SystemAssigned" }
}

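# Placeholder trigger. A playbook invoked by a Sentinel automation rule must
# start with the Microsoft Sentinel incident trigger (an ApiConnectionWebhook on
# the Sentinel connector), so replace this HTTP trigger before wiring the rule.
resource "azurerm_logic_app_trigger_http_request" "isolate_vm" {
  name         = "manual"
  logic_app_id = azurerm_logic_app_workflow.isolate_vm.id
  schema       = jsonencode({ "type": "object" })
}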

# Sentinel automation rule that fires the playbook on High-severity incidents.
# The rule name must be a UUID, and Sentinel has no Critical severity level.
resource "random_uuid" "isolate_vm_rule" {}

resource "azurerm_sentinel_automation_rule" "isolate_vm" {
  name                       = random_uuid.isolate_vm_rule.result
  log_analytics_workspace_id = azurerm_log_analytics_workspace.security.id
  display_name               = "Isolate VM on High-severity incident"
  order                      = 1

  condition_json = jsonencode([{
    conditionType = "Property"
    conditionProperties = {
      propertyName   = "IncidentSeverity"
      operator       = "Equals"
      propertyValues = ["High"]
    }
  }])

  action_playbook {
    logic_app_id = azurerm_logic_app_workflow.isolate_vm.id
    order        = 1
  }
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Sentinel-workspace-scoped Logic App (consumption) acting as an IR playbook.')
param playbookName string

param location string = resourceGroup().location

resource playbook 'Microsoft.Logic/workflows@2019-05-01' = {
  name: playbookName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    state: 'Enabled'
    definition: {
      '$schema': 'https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#'
      contentVersion: '1.0.0.0'
      triggers: {
        SentinelIncident: {
          type: 'ApiConnectionWebhook'
          inputs: { /* … */ }
        }
      }
      actions: { /* tag incident, isolate VM, post to Teams … */ }
    }
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | IR-4(1); IR-4(7); SI-4(7) | A.5.26 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.SecurityInsights/automationRules/delete targeting an automation rule that previously bound a High-severity analytics rule to its containment playbook; this silently disarms the auto-response path.
  • AzureActivity Microsoft.Logic/workflows/disable on a Logic App that is referenced by Sentinel as a playbook — incident creation still occurs but no auto-response runs.
  • Sentinel SecurityIncident table where Status = "New" persists beyond the playbook-SLA timer — downstream signal that auto-handling has stopped.

Query

AzureActivity
| where OperationNameValue in~ ("Microsoft.SecurityInsights/automationRules/delete", "Microsoft.Logic/workflows/disable/action", "Microsoft.SecurityInsights/incidents/relations/delete")
| project TimeGenerated, Caller, ResourceId, OperationNameValue
| order by TimeGenerated desc
| take 200

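Run as a KQL query in Log Analytics. The in~ operator is case-insensitive, which matters because AzureActivity can record operation names in upper case. Pair with a Sentinel analytics rule that joins SecurityIncident rows against the documented playbook coverage matrix; any High-severity incident without a triggered playbook run is itself a control failure.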
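
The coverage check described above reduces to a set difference between incidents and successful playbook runs. The sketch below shows that logic outside KQL, assuming both lists have already been exported; the row shapes, the "Succeeded" status literal, and the function name are hypothetical.

```typescript
// Hypothetical exported row shapes; field names are illustrative.
interface IncidentRowSketch {
  incidentNumber: number;
  severity: string;
}

interface PlaybookRunSketch {
  incidentNumber: number;
  status: string; // assumed to be "Succeeded" for a completed run
}

// Returns incident numbers at the required severity that have no successful
// playbook run recorded against them.
function findIncidentsWithoutPlaybookRun(
  incidents: IncidentRowSketch[],
  runs: PlaybookRunSketch[],
  requiredSeverity: string
): number[] {
  const covered: { [incidentNumber: number]: boolean } = {};
  for (const run of runs) {
    if (run.status === "Succeeded") {
      covered[run.incidentNumber] = true;
    }
  }
  const uncovered: number[] = [];
  for (const incident of incidents) {
    if (incident.severity === requiredSeverity && !covered[incident.incidentNumber]) {
      uncovered.push(incident.incidentNumber);
    }
  }
  return uncovered;
}
```

A failed or missing run counts as no coverage, so an incident whose playbook errored out is reported alongside incidents that never triggered one.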

Alert threshold

  • Automation-rule delete touching a High-severity binding: page on first occurrence.
  • Logic App disable on a playbook referenced by Sentinel — page; the auto-response path is now manual-only.

Initial response

  1. Restore the automation rule and Logic App state via the IaC baseline; trigger a synthetic incident to confirm the playbook execution succeeds end-to-end.
  2. Walk SecurityIncident rows for the exposure window and apply manual containment to any High-severity incident that did not benefit from auto-response.
  3. Escalate per general/ir.html. Confirm that the Azure Policy assignment "Configure Microsoft Sentinel automation rules" remains in place and that the playbook's managed identity retains the requisite Microsoft Sentinel Contributor role.

References

Pair-control: azure-log-08-sentinel-data-lake (detective half, Sentinel SOAR plane). Equivalent on: AWS · GCP · OCI

azure-ir-03-snapshot-forensic ! CRITICAL RESPONSIVE

Stand up a dedicated forensic Azure subscription (separate management group, no shared role assignments with workload subscriptions) that owns a Storage Account configured with Immutable Blob Storage in Locked mode, with a time-based retention policy of at least one year — preferably seven years to align with regulatory evidence-retention norms. Immutable Blob in Locked mode is write-once-read-many at the API level: not even the subscription owner can delete or shorten the retention of a blob during its lock window, and the lock itself cannot be removed or reduced once locked (Microsoft Learn — Immutable storage for Blob data overview (accessed 2026-05)). This is the Azure analog of the Phase 6 aws-ir-03 decision to use S3 Object Lock in COMPLIANCE mode (not Governance) — the threat model is identical: the very privileges an attacker is most likely to acquire are the ones that would let them disable a bypassable lock.

When an incident is declared, the responder snapshots the affected VM's OS and data managed disks using az snapshot create, tagging each snapshot with chain-of-custody metadata (incident_id, captured_by, captured_at, source_resource_id), then exports the snapshot artefacts and the relevant Activity Log slice into the forensic Storage Account. Cross-tenant role assignments granted to an external IR partner's Service Principal allow that partner to read the evidence container without ever having a credential inside the customer tenant. The same Immutable Blob Locked mode mechanism is used for defensive storage of business records on azure-data-08-immutable-blob — same mechanism, different posture; this control is the forensic-storage application of that primitive. The principle is documented in General IR — forensics & evidence preservation.
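
The chain-of-custody tag set named above can be assembled and validated before the snapshot call is made. This is a minimal sketch: the CustodyCapture shape and function name are hypothetical, and the 256-character check reflects the Azure Resource Manager limit on tag values.

```typescript
// Hypothetical input shape for one evidence capture.
interface CustodyCapture {
  incidentId: string;
  capturedBy: string;
  capturedAt: Date;
  sourceResourceId: string;
}

// Azure Resource Manager caps tag values at 256 characters.
const TAG_VALUE_MAX_LENGTH = 256;

// Builds the four chain-of-custody tags and rejects empty or over-long values,
// so an incomplete custody record fails before any snapshot is created.
function buildCustodyTags(capture: CustodyCapture): { [tagName: string]: string } {
  const tags: { [tagName: string]: string } = {
    incident_id: capture.incidentId,
    captured_by: capture.capturedBy,
    captured_at: capture.capturedAt.toISOString(),
    source_resource_id: capture.sourceResourceId,
  };
  for (const tagName of Object.keys(tags)) {
    const value = tags[tagName];
    if (value.length === 0) {
      throw new Error("Missing chain-of-custody tag: " + tagName);
    }
    if (value.length > TAG_VALUE_MAX_LENGTH) {
      throw new Error("Tag value too long for Azure: " + tagName);
    }
  }
  return tags;
}
```

Validating up front matters because a snapshot created with a truncated or missing source_resource_id weakens the custody record and cannot be silently corrected later without leaving its own audit trail.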

Remediation — Azure CLI

# Azure CLI 2.x
# All commands run in the FORENSIC subscription context.
az account set --subscription "$FORENSIC_SUB_ID"

# (1) Storage Account for forensic evidence — geo-redundant, blob-versioning ON.
az storage account create \
  --resource-group rg-forensic-prod \
  --name stforensicirprodweu \
  --location westeurope \
  --sku Standard_GRS \
  --kind StorageV2 \
  --allow-blob-public-access false \
  --public-network-access Disabled \
  --enable-hierarchical-namespace false

# (2) Enable blob versioning (prerequisite for version-level immutability).
az storage account blob-service-properties update \
  --account-name stforensicirprodweu \
  --enable-versioning true

# (3) Create the evidence container.
az storage container create \
  --account-name stforensicirprodweu \
  --name ir-evidence \
  --auth-mode login

# (4) Apply a time-based immutability policy in LOCKED mode (1-year retention).
# Once locked, no API call by any principal can delete or shorten the lock.
az storage container immutability-policy create \
  --account-name stforensicirprodweu \
  --container-name ir-evidence \
  --period 365 \
  --allow-protected-append-writes true

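# Lock using the policy's current ETag so the lock applies to exactly the
# policy created above.
ETAG=$(az storage container immutability-policy show \
  --account-name stforensicirprodweu \
  --container-name ir-evidence \
  --query etag -o tsv)

az storage container immutability-policy lock \
  --account-name stforensicirprodweu \
  --container-name ir-evidence \
  --if-match "$ETAG"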

# (5) Snapshot a VM's OS disk and tag with chain-of-custody metadata.
# The VM lives in a workload subscription, so look it up there explicitly;
# WORKLOAD_SUB_ID is an assumed variable holding that subscription's ID.
INCIDENT_ID=ir-2026-05-23-001
DISK_ID=$(az vm show --subscription "$WORKLOAD_SUB_ID" \
            --resource-group rg-workload-prod --name vm-app-01 \
            --query storageProfile.osDisk.managedDisk.id -o tsv)

az snapshot create \
  --resource-group rg-forensic-prod \
  --name "snap-${INCIDENT_ID}-vm-app-01-os" \
  --source "$DISK_ID" \
  --incremental true \
  --tags "incident_id=$INCIDENT_ID" "captured_by=responder-bg01" \
         "captured_at=$(date -u +%FT%TZ)" "source_resource_id=$DISK_ID"

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Immutable storage for Blob data (accessed 2026-05)
# Forensic Storage Account in the FORENSIC subscription.
resource "azurerm_storage_account" "forensic" {
  name                            = "stforensicirprodweu"
  resource_group_name             = azurerm_resource_group.forensic.name
  location                        = "westeurope"
  account_tier                    = "Standard"
  account_replication_type        = "GRS"
  allow_nested_items_to_be_public = false
  public_network_access_enabled   = false

  blob_properties {
    versioning_enabled = true
  }
}

# Evidence container.
resource "azurerm_storage_container" "ir_evidence" {
  name                  = "ir-evidence"
  storage_account_name  = azurerm_storage_account.forensic.name
  container_access_type = "private"
}

# Time-based immutability policy with 1-year retention. Once locked = true is
# applied, the retention period cannot be reduced and the policy cannot be removed.
resource "azurerm_storage_container_immutability_policy" "ir_evidence" {
  storage_container_resource_manager_id = azurerm_storage_container.ir_evidence.resource_manager_id
  immutability_period_in_days           = 365
  protected_append_writes_all_enabled   = true
  locked                                = true
}

# Cross-tenant role assignment for the external IR partner's Service Principal.
# The partner SP reads evidence without ever holding a credential in this tenant.
resource "azurerm_role_assignment" "ir_partner_reader" {
  scope                = azurerm_storage_account.forensic.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = var.ir_partner_service_principal_object_id
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Source managed disk to snapshot for forensic analysis.')
param sourceDiskResourceId string
@description('Snapshot name (forensic-evidence-<timestamp>).')
param snapshotName string

param location string = resourceGroup().location

resource snap 'Microsoft.Compute/snapshots@2024-03-02' = {
  name: snapshotName
  location: location
  sku: { name: 'Standard_ZRS' }
  properties: {
    creationData: {
      createOption: 'Copy'
      sourceResourceId: sourceDiskResourceId
    }
    incremental: false
    networkAccessPolicy: 'DenyAll'
    publicNetworkAccess: 'Disabled'
    diskAccessId: null
  }
  tags: {
    'forensic-evidence': 'true'
    'do-not-delete': 'true'
  }
}

Remediation — Pulumi (TypeScript)

import * as pulumi from "@pulumi/pulumi";
import * as compute from "@pulumi/azure-native/compute";

new compute.Snapshot("forensic-evidence", {
  resourceGroupName: "<rg>",
  snapshotName: "forensic-evidence-2026-05-26",
  sku: { name: compute.SnapshotStorageAccountTypes.Standard_ZRS },
  creationData: {
    createOption: compute.DiskCreateOption.Copy,
    sourceResourceId: "/subscriptions/.../disks/<source-disk>",
  },
  incremental: false,
  networkAccessPolicy: compute.NetworkAccessPolicy.DenyAll,
  publicNetworkAccess: compute.PublicNetworkAccess.Disabled,
  tags: {
    "forensic-evidence": "true",
    "do-not-delete": "true",
  },
});

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | AU-11; IR-4(7); SI-7 | A.5.28; A.8.13 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.Compute/snapshots/delete on snapshots tagged forensic-evidence=true or stored in the dedicated forensics resource group; this interferes with the evidence chain.
  • AzureActivity write events on the forensics Storage Account where immutability policy enforcement is reduced — investigation archive integrity at risk.
  • Audit-failure spikes against the forensics resource group's RBAC scope — adversary probing the evidence vault.

Query

AzureActivity
| where ResourceId contains "/resourceGroups/rg-forensics" or ResourceId contains "snapshots"
| where OperationNameValue endswith "/delete" or OperationNameValue endswith "/write"
| extend body = tostring(parse_json(Properties).requestbody)
| project TimeGenerated, Caller, ResourceId, OperationNameValue, body, ActivityStatusValue
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. The forensics resource group should be tightly scoped via RBAC and Azure Policy; persist as a Sentinel analytics rule with severity High and treat any write or delete as worth reviewing immediately.

Alert threshold

  • Delete on any resource in the forensics resource group — page on first occurrence.
  • Three or more authorization-failed events in the forensics scope within a 1h window — page; adversary is probing the evidence vault.
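
The two thresholds above can be sketched as a small evaluation function. This is an illustrative sketch only: the event tuples and the kind labels "delete" and "auth_failed" are hypothetical stand-ins for rows already extracted from AzureActivity, not a Sentinel API.

```python
from datetime import datetime, timedelta


def should_page(events, window=timedelta(hours=1), failure_threshold=3):
    """Decide whether to page for the forensics scope.

    events: list of (timestamp, kind) tuples, where kind is the hypothetical
    label "delete" or "auth_failed".
    """
    # Threshold 1: any delete in the forensics resource group pages immediately.
    if any(kind == "delete" for _, kind in events):
        return True

    # Threshold 2: failure_threshold authorization failures inside one window.
    failures = sorted(ts for ts, kind in events if kind == "auth_failed")
    for i in range(len(failures) - failure_threshold + 1):
        # Compare the first and last failure of each consecutive group.
        if failures[i + failure_threshold - 1] - failures[i] <= window:
            return True
    return False
```

The window is evaluated over consecutive failures rather than fixed clock hours, so three failures at 02:50, 03:05, and 03:20 still count as one burst.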

Initial response

  1. Restore the deleted snapshot from the immutable archive container or from the regional backup vault; capture the AzureActivity Caller as the actor of record.
  2. Walk RBAC role-assignment writes targeting the forensics resource group during the prior 30 days — any new principal that should not have evidence-vault access is itself an incident.
  3. Escalate per general/ir.html — confirm the forensics-scope Azure Policy assignment "Resource locks should be applied to forensic resources" remains in deny mode.

References

Pair-control: azure-data-08-immutable-blob (same Locked-mode mechanism; defensive vs forensic posture). Equivalent on: AWS · GCP · OCI

azure-ir-04-sentinel-kql ! HIGH RESPONSIVE

Pre-write and version-control a Microsoft Sentinel KQL hunting query library that answers the most common forensic questions Azure incident responders face: suspicious sign-in chains (impossible travel + new device + unfamiliar app within a short time window from a single UPN), anomalous role assignments (Owner or User Access Administrator added to a subscription outside an approved change window), mass blob-download patterns (a single principal reading thousands of blob URIs across multiple containers within minutes), NSG rule mutations on production resource groups, and Key Vault secret reads at rates above baseline. Each query is committed to the security team's hunting-queries repository as a .kql file with documented inputs, expected output schema, and a short prose paragraph describing the question it answers (Microsoft Learn — Hunt for threats with Microsoft Sentinel (accessed 2026-05)).

The base Sentinel onboarding control (azure-log-08-sentinel-data-lake) is the detective half: it ensures Sentinel is connected to the central Log Analytics workspace and that data connectors for Entra ID sign-ins, Activity Log, Microsoft Defender for Cloud alerts, Office 365, and DNS are emitting. This control is the responsive half: it makes the workspace usable under IR time pressure by pre-writing and version-controlling the hunting-query library so a responder does not have to write KQL from scratch at 03:00 on a Saturday. This is the STRICT same-phase pair-control link to azure/logging.html — bidirectional linkage parallels the Phase 6 aws-ir-04 ↔ aws-log-08 pattern. Long-tail forensic questions are answered against the Log Analytics archive tier (up to 12 years retention) and Sentinel auxiliary logs (cost-efficient for high-volume noisy sources like DNS and proxy) — two-year minimum retention is the operational floor for the canonical hunting workflow.
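
The requirement that every library query ships with documented inputs, an expected output schema, and a prose description lends itself to a repository lint. The sketch below assumes a hypothetical layout in which each .kql file has a metadata record with the keys inputs, output_schema, and question; those key names are illustrative, not a Sentinel convention.

```python
REQUIRED_FIELDS = ("inputs", "output_schema", "question")


def lint_hunting_library(entries):
    """Return {file_name: [problems]} for incomplete library entries.

    entries: dict mapping a query file name to its metadata dict
    (hypothetical sidecar metadata, not a Sentinel format).
    """
    problems = {}
    for name, meta in entries.items():
        # A field is missing when it is absent, None, or an empty string.
        # An empty list is accepted: a query may legitimately take no inputs.
        missing = [f for f in REQUIRED_FIELDS if meta.get(f) in (None, "")]
        if not name.endswith(".kql"):
            missing.append("kql_extension")
        if missing:
            problems[name] = missing
    return problems
```

Run as a pre-merge check, a non-empty result would block the commit so that no query reaches the library without the documentation a responder needs under time pressure.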

Remediation — Azure CLI

# Azure CLI 2.x
# Sentinel hunting queries are stored as Log Analytics saved searches in the
# "Hunting Queries" category, so they are created with the saved-search
# commands. (The global az --query flag is a JMESPath filter and cannot carry KQL.)
# (1) Create a saved hunting query in the Sentinel workspace.
az monitor log-analytics workspace saved-search create \
  --resource-group rg-security-prod \
  --workspace-name law-security-prod \
  --name "$(uuidgen)" \
  --category "Hunting Queries" \
  --display-name "Suspicious sign-in chain — impossible travel + new device" \
  --saved-query "let lookback = 1h;
SigninLogs
| where TimeGenerated > ago(lookback)
| where ResultType == 0
| summarize Cities=make_set(LocationDetails.city),
            Devices=make_set(DeviceDetail.displayName),
            Apps=make_set(AppDisplayName)
          by UserPrincipalName, bin(TimeGenerated, 10m)
| where array_length(Cities) > 1 and array_length(Devices) > 1"

# (2) Pre-saved Activity Log hunting query: privileged-role assignment outside
# an approved change window. The request body carries the role definition GUID,
# not the role name: Owner = 8e3af657-a8ff-443c-a75c-2fe8c4bcb635,
# User Access Administrator = 18d7d88d-d35e-4fb5-a5c3-7773c20a72d9.
# Replace the datetime range with the approved change window under review.
az monitor log-analytics workspace saved-search create \
  --resource-group rg-security-prod \
  --workspace-name law-security-prod \
  --name "$(uuidgen)" \
  --category "Hunting Queries" \
  --display-name "Privileged role assignment outside change window" \
  --saved-query "AzureActivity
| where OperationNameValue =~ 'Microsoft.Authorization/roleAssignments/write'
| where Properties contains '8e3af657-a8ff-443c-a75c-2fe8c4bcb635' or Properties contains '18d7d88d-d35e-4fb5-a5c3-7773c20a72d9'
| where TimeGenerated !between (datetime(2026-05-23T01:00) .. datetime(2026-05-23T05:00))
| project TimeGenerated, Caller, CallerIpAddress, Properties, ResourceGroup"

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Sentinel hunting (accessed 2026-05)
# Each library query that warrants continuous coverage is also deployed as a
# scheduled analytics rule, so matches automatically produce Sentinel
# incidents; the library is operational, not just a saved-query collection.
resource "azurerm_sentinel_alert_rule_scheduled" "impossible_travel" {
  name                       = "impossible-travel-new-device"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.security.id
  display_name               = "Suspicious sign-in chain — impossible travel + new device"
  severity                   = "Medium"
  query                      = <<-KQL
    let lookback = 1h;
    SigninLogs
    | where TimeGenerated > ago(lookback)
    | where ResultType == 0
    | summarize Cities=make_set(LocationDetails.city),
                Devices=make_set(DeviceDetail.displayName),
                Apps=make_set(AppDisplayName)
              by UserPrincipalName, bin(TimeGenerated, 10m)
    | where array_length(Cities) > 1 and array_length(Devices) > 1
  KQL
  query_frequency            = "PT1H"
  query_period               = "PT1H"
  trigger_operator           = "GreaterThan"
  trigger_threshold          = 0
  tactics                    = ["InitialAccess", "CredentialAccess"]
}

# Interactive retention on the Log Analytics workspace (90 days). Long-tail
# forensic queries against SigninLogs and AzureActivity use the per-table
# archive retention configured below.
resource "azurerm_log_analytics_workspace" "security" {
  name                       = "law-security-prod"
  location                   = "westeurope"
  resource_group_name        = azurerm_resource_group.security.name
  sku                        = "PerGB2018"
  retention_in_days          = 90
  daily_quota_gb             = 50
}

# Per-table total retention (interactive + archive) of 730 days for
# SigninLogs and AzureActivity.
resource "azurerm_log_analytics_workspace_table" "signin_logs" {
  workspace_id            = azurerm_log_analytics_workspace.security.id
  name                    = "SigninLogs"
  total_retention_in_days = 730
}

resource "azurerm_log_analytics_workspace_table" "azure_activity" {
  workspace_id            = azurerm_log_analytics_workspace.security.id
  name                    = "AzureActivity"
  total_retention_in_days = 730
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Sentinel workspace name (Log Analytics workspace with Sentinel solution).')
param workspaceName string
@description('Scheduled alert rule name.')
param ruleName string = 'high-severity-signin-anomaly'

// Extension resources take a resource symbol as scope, not a resource ID string.
resource workspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' existing = {
  name: workspaceName
}

resource scheduled 'Microsoft.SecurityInsights/alertRules@2024-09-01' = {
  scope: workspace
  name: ruleName
  kind: 'Scheduled'
  properties: {
    displayName: 'High-severity sign-in anomaly'
    severity: 'High'
    enabled: true
    query: '''
SigninLogs
| where ResultType != 0
| where RiskLevelDuringSignIn in ('high','medium')
| summarize attempts = count() by UserPrincipalName, IPAddress, bin(TimeGenerated, 5m)
| where attempts > 10
'''
    queryFrequency: 'PT5M'
    queryPeriod: 'PT30M'
    triggerOperator: 'GreaterThan'
    triggerThreshold: 0
    suppressionEnabled: false
    // The API requires a duration even when suppression is disabled.
    suppressionDuration: 'PT1H'
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | n/a (post-v3.0.0) | n/a | n/a | AU-11; IR-4(7) | A.5.28 | CLD.12.4.5

Log signals

  • AzureActivity Microsoft.SecurityInsights/alertRules/write where the request body sets enabled = false on an analytics rule the org has classified as critical.
  • AzureActivity Microsoft.SecurityInsights/alertRules/delete on a critical rule — coverage erosion at the detection-content layer.
  • SecurityIncident table volume drop per source connector compared to the 30-day baseline — downstream coverage failure even if no per-rule edit explains it.

Query

AzureActivity
| where OperationNameValue in ("Microsoft.SecurityInsights/alertRules/write", "Microsoft.SecurityInsights/alertRules/delete")
| extend body = tostring(parse_json(Properties).requestbody)
| where body has "\"enabled\":false" or OperationNameValue endswith "/delete"
| project TimeGenerated, Caller, ResourceId, OperationNameValue, body
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. Pair with a daily Sentinel content-pack reconciliation that compares the rule inventory to the GitOps source — undocumented drift between the two is itself a control failure.

Alert threshold

  • Disable or delete of a critical analytics rule — page on first occurrence.
  • Drift between Sentinel rule inventory and GitOps source persisting beyond the next sync cycle — page; the detection content is no longer reproducible.
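
The reconciliation behind the drift threshold reduces to a three-way comparison of the two inventories. A minimal sketch, assuming both sides have already been loaded into dictionaries keyed by rule name (how they are loaded from the repository and from the workspace is out of scope here):

```python
def rule_drift(gitops_rules, deployed_rules):
    """Compare the GitOps rule inventory with the deployed Sentinel inventory.

    Both arguments map rule name -> definition dict (for example
    {"enabled": True, "query": "..."}); the shape is an assumption.
    """
    in_git = set(gitops_rules)
    in_workspace = set(deployed_rules)
    return {
        # Defined in Git but absent from the workspace: deleted or never deployed.
        "missing": sorted(in_git - in_workspace),
        # Present in the workspace but not in Git: created out of band.
        "unmanaged": sorted(in_workspace - in_git),
        # Present on both sides with differing definitions: edited or disabled.
        "changed": sorted(
            name for name in in_git & in_workspace
            if gitops_rules[name] != deployed_rules[name]
        ),
    }
```

Any non-empty bucket that survives the next sync cycle corresponds to the second alert threshold above.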

Initial response

  1. Reapply the analytics rule from the IaC baseline; confirm the next run produces SecurityIncident rows consistent with the 30-day baseline.
  2. Walk SecurityAlert rows for the exposure window — if the rule would have fired during the disable, identify any missed incidents and process them manually.
  3. Escalate per general/ir.html — confirm Sentinel Contributor RBAC scope is bound to the GitOps service principal only and that drift-detection automation reports back to the SOC engineering channel.

References

Pair-control: azure-log-08-sentinel-data-lake (detective half, Sentinel KQL hunting plane). Equivalent on: AWS · GCP · OCI

azure-ir-05-vm-isolation ! HIGH RESPONSIVE

Document and version-control a five-step Azure VM isolation runbook that responders execute when a VM is suspected of compromise but the team needs to preserve in-memory and on-disk state for later investigation rather than deallocate-and-replace. The runbook is the human-driven complement to the Microsoft Sentinel automation playbook in azure-ir-02 — used when the Sentinel incident is below the auto-isolation severity threshold, when responders want to take a slower deliberate path, or when the automation path failed and a manual fallback is needed (Microsoft Learn — Manage security incidents in Defender for Cloud (accessed 2026-05)).

The five steps, in order:

  1. Run az snapshot create on the OS managed disk and every attached data disk, tagged with the incident ID, before any other action. Snapshots are point-in-time, and running step 2 first can mutate disk state (hand off to azure-ir-03 for chain-of-custody).
  2. Run az network nic update --network-security-group to swap the VM's NIC-level NSG for the pre-staged quarantine NSG (deny-all inbound and outbound, except an IR-tooling allowlist for the responder's bastion subnet).
  3. Run az vm identity remove to detach every managed identity assigned to the VM, system-assigned and user-assigned, revoking whatever blast radius those identities grant against Key Vault, Storage, and downstream Azure services.
  4. Tag the VM with incident_id and ir-status=isolated for inventory tracking.
  5. Preserve the Activity Log slice for the affected resource group by exporting the relevant time window via az monitor activity-log list into the forensic Storage Account (handing off to azure-log-01-activity-log-centralized for the centralised audit baseline).

Steps 1–3 are not commutative; document and enforce the order in the runbook.
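
One way to enforce the ordering in tooling rather than by convention is a small prerequisite check in whatever wrapper drives the runbook. The sketch below is illustrative: the step labels are hypothetical names for the five steps, and steps 4 and 5 are allowed in either order once step 3 is done, since only steps 1 to 3 are described as order-sensitive.

```python
# Each step maps to the step that must already be complete (None = no prerequisite).
PREREQUISITES = {
    "snapshot": None,
    "swap_nsg": "snapshot",
    "remove_identity": "swap_nsg",
    "tag": "remove_identity",
    "export_activity_log": "remove_identity",
}


def record_step(completed, step):
    """Return a new list of completed steps, or raise if the order is violated."""
    if step not in PREREQUISITES:
        raise ValueError(f"unknown step: {step}")
    if step in completed:
        raise ValueError(f"step already recorded: {step}")
    prerequisite = PREREQUISITES[step]
    if prerequisite is not None and prerequisite not in completed:
        raise RuntimeError(f"{step} requires {prerequisite} to complete first")
    return completed + [step]
```

A wrapper would call record_step before invoking each CLI command and abort on the exception, so a responder cannot swap the NSG before the snapshots exist.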

Remediation — Azure CLI

# Azure CLI 2.x
# Five-step VM isolation runbook. Execute in order; do not re-order.
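INCIDENT_ID=ir-2026-05-23-001
VM_RG=rg-workload-prod
VM_NAME=vm-app-01
# Subscription that holds the VM and the quarantine NSG (used in steps 2 and 4).
SUB_ID=$(az account show --query id -o tsv)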

# (1) Snapshot OS disk + every data disk BEFORE touching the VM.
OS_DISK_ID=$(az vm show -g "$VM_RG" -n "$VM_NAME" \
              --query storageProfile.osDisk.managedDisk.id -o tsv)
az snapshot create \
  --resource-group rg-forensic-prod \
  --name "snap-${INCIDENT_ID}-${VM_NAME}-os" \
  --source "$OS_DISK_ID" \
  --incremental true \
  --tags incident_id=$INCIDENT_ID source_vm=$VM_NAME

for data_disk_id in $(az vm show -g "$VM_RG" -n "$VM_NAME" \
                        --query "storageProfile.dataDisks[].managedDisk.id" -o tsv); do
  data_name=$(basename "$data_disk_id")
  az snapshot create \
    --resource-group rg-forensic-prod \
    --name "snap-${INCIDENT_ID}-${VM_NAME}-${data_name}" \
    --source "$data_disk_id" \
    --incremental true \
    --tags incident_id=$INCIDENT_ID source_vm=$VM_NAME
done

# (2) Swap NIC NSG to the pre-staged quarantine NSG (deny-all + IR allowlist).
NIC_ID=$(az vm show -g "$VM_RG" -n "$VM_NAME" \
           --query "networkProfile.networkInterfaces[0].id" -o tsv)
az network nic update --ids "$NIC_ID" \
  --network-security-group "/subscriptions/$SUB_ID/resourceGroups/rg-security-prod/providers/Microsoft.Network/networkSecurityGroups/nsg-ir-quarantine-isolation"

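# (3) Detach every managed identity assigned to the VM: the system-assigned
# identity plus any user-assigned identities (listed by resource ID). Drop
# '[system]' if the VM has no system-assigned identity.
USER_IDENTITIES=$(az vm identity show -g "$VM_RG" -n "$VM_NAME" \
                    --query 'keys(userAssignedIdentities || `{}`)' -o tsv)
az vm identity remove --resource-group "$VM_RG" --name "$VM_NAME" \
  --identities '[system]' $USER_IDENTITIES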

# (4) Tag the VM for inventory.
az tag update --operation merge \
  --resource-id "/subscriptions/$SUB_ID/resourceGroups/$VM_RG/providers/Microsoft.Compute/virtualMachines/$VM_NAME" \
  --tags incident_id=$INCIDENT_ID ir-status=isolated isolated_at=$(date -u +%FT%TZ)

# (5) Preserve the Activity Log slice for the resource group into forensic Storage.
# --max-events defaults to 50; raise it so the exported slice is not truncated.
az monitor activity-log list \
  --resource-group "$VM_RG" \
  --start-time "$(date -u -d '-24 hours' +%FT%TZ)" \
  --offset 24h \
  --max-events 10000 \
  --output json \
  > "/tmp/activity-log-${INCIDENT_ID}-${VM_RG}.json"

az storage blob upload \
  --account-name stforensicirprodweu \
  --container-name ir-evidence \
  --name "${INCIDENT_ID}/activity-log-${VM_RG}.json" \
  --file "/tmp/activity-log-${INCIDENT_ID}-${VM_RG}.json" \
  --auth-mode login

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Defender for Cloud incident handling (accessed 2026-05)
# Pre-staged quarantine NSG referenced by both the manual runbook (this control)
# and the Sentinel automation playbook (azure-ir-02). Single source of truth.
resource "azurerm_network_security_group" "ir_quarantine_isolation" {
  name                = "nsg-ir-quarantine-isolation"
  location            = "westeurope"
  resource_group_name = azurerm_resource_group.security.name

  security_rule {
    name                       = "DenyAllInbound"
    priority                   = 4096
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "AllowIRBastionInbound"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "10.99.0.0/24"  # IR bastion subnet only
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "DenyAllOutbound"
    priority                   = 4096
    direction                  = "Outbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  tags = { Purpose = "ir-quarantine" }
}

# Role assignment for IR team — Network Contributor at the quarantine
# NSG scope only; combined with the responder's Entra ID PIM elevation this is
# the operational permission set for executing the five-step runbook.
resource "azurerm_role_assignment" "ir_team_isolation" {
  scope                = azurerm_network_security_group.ir_quarantine_isolation.id
  role_definition_name = "Network Contributor"
  principal_id         = var.ir_team_group_object_id
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('NSG attached during VM isolation that denies all in/out except evidence-collection workstation.')
param nsgName string
@description('CIDR of the forensic workstation allowed to RDP/SSH into the isolated VM.')
param forensicCidr string

param location string = resourceGroup().location

resource nsg 'Microsoft.Network/networkSecurityGroups@2024-03-01' = {
  name: nsgName
  location: location
  properties: {
    securityRules: [
      {
        name: 'allow-forensic-inbound'
        properties: {
          priority: 100, direction: 'Inbound', access: 'Allow', protocol: '*'
          sourceAddressPrefix: forensicCidr, sourcePortRange: '*'
          destinationAddressPrefix: '*',   destinationPortRange: '*'
        }
      }
      {
        name: 'deny-all-inbound'
        properties: {
          priority: 4096, direction: 'Inbound', access: 'Deny', protocol: '*'
          sourceAddressPrefix: '*', sourcePortRange: '*'
          destinationAddressPrefix: '*', destinationPortRange: '*'
        }
      }
      {
        name: 'deny-all-outbound'
        properties: {
          priority: 4096, direction: 'Outbound', access: 'Deny', protocol: '*'
          sourceAddressPrefix: '*', sourcePortRange: '*'
          destinationAddressPrefix: '*', destinationPortRange: '*'
        }
      }
    ]
  }
}

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | IR-4(2); IR-4(7) | A.5.26 | CLD.9.5.1

Log signals

  • SecurityAlert entries with ProductName = "Microsoft Defender for Servers" and AlertSeverity = "High" where no matching isolation playbook run lands in the LogicApps invocation log within the policy SLA (5 minutes).
  • AzureActivity Microsoft.Network/networkSecurityGroups/write applying the documented isolation NSG to the affected VM — confirmation the playbook ran or the SOC analyst ran the manual fallback.
  • AzureActivity absence-of-signal: a High alert with no subsequent isolation NSG attach within the SLA — page on the absence.

Query

SecurityAlert
| where ProductName == "Microsoft Defender for Servers" and AlertSeverity == "High"
| extend vmId = tostring(parse_json(Entities)[0].ResourceId)
| join kind=leftouter (
    AzureActivity
    | where OperationNameValue == "Microsoft.Network/networkSecurityGroups/write"
    | extend nsgVm = tostring(parse_json(Properties).resource)
) on $left.vmId == $right.nsgVm
| where isempty(nsgVm) or (TimeGenerated1 - TimeGenerated > 5m)
| project TimeGenerated, AlertName, vmId, isolationDelay = TimeGenerated1 - TimeGenerated
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. The isolation-SLA query is more reliable than counting playbook invocations directly because it surfaces both playbook-not-triggered and playbook-triggered-but-failed scenarios.

Alert threshold

  • Any High-severity Defender for Servers alert without a matching isolation NSG attach within 5 minutes — page on the gap.
  • Playbook invocation logs showing repeated 4xx/5xx responses from the isolation Logic App — page; the auto-isolation pipeline is broken.
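
The first threshold is an absence-of-signal check, which is easy to get wrong. The sketch below restates its logic outside KQL, assuming the alerts and NSG attach events have already been extracted into plain Python structures; the argument shapes are hypothetical.

```python
from datetime import datetime, timedelta


def isolation_gaps(alerts, nsg_attaches, sla=timedelta(minutes=5)):
    """Return the alerts whose isolation missed the SLA.

    alerts: list of (vm_id, alert_time) for High-severity alerts.
    nsg_attaches: dict mapping vm_id to a list of isolation NSG attach times.
    Each result is (vm_id, alert_time, delay), with delay None when no
    attach followed the alert at all.
    """
    gaps = []
    for vm_id, alert_time in alerts:
        # Only attaches at or after the alert count as a response to it.
        later = [t for t in nsg_attaches.get(vm_id, []) if t >= alert_time]
        delay = min(later) - alert_time if later else None
        if delay is None or delay > sla:
            gaps.append((vm_id, alert_time, delay))
    return gaps
```

Treating a missing attach and a late attach as the same kind of gap mirrors the point made above: the check surfaces both playbook-not-triggered and playbook-triggered-but-failed cases.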

Initial response

  1. Manually apply the isolation NSG to the affected VM via az network nic update --network-security-group {nsg}; capture the SecurityAlert id and the NSG-write AzureActivity record as the ledger.
  2. Pull the Logic App run history for the failed invocation — most playbook failures trace to managed-identity permission scope drift on the target NSG resource group.
  3. Escalate per general/ir.html — confirm the playbook managed identity retains Network Contributor RBAC on the isolation-NSG resource group.

References

Equivalent on: AWS · GCP · OCI

azure-ir-06-token-revocation ! HIGH RESPONSIVE

Document and version-control a five-step compromised-identity runbook covering the canonical Microsoft Entra ID token-theft scenarios: a Service Principal client secret leaked to a public GitHub repository, a lost or stolen developer laptop holding a refresh token cached by the Microsoft Authentication Library (MSAL), a breached third-party SaaS that held delegated Graph permissions, or a primary refresh token (PRT) known or suspected stolen via adversary-in-the-middle (AiTM) tooling such as Evilginx. The runbook integrates the detective signals from azure-ir-04 Sentinel hunting (token-theft KQL patterns) with the responsive actions of revocation and dependent-secret rotation (Microsoft Learn — Revoke user access in an emergency in Microsoft Entra ID (accessed 2026-05)).

The five steps, in order:

  1. Revoke all sign-in sessions. Invalidate every issued refresh token for the user via Microsoft Graph POST /users/{id}/revokeSignInSessions (the Azure CLI lacks a first-class verb for this; az rest --method POST against Graph is canonical, and PowerShell Revoke-MgUserSignInSession is equivalent).
  2. Disable the account. Run az ad user update --account-enabled false or set accountEnabled = false via Graph PATCH; for Service Principals, run az ad sp update --set accountEnabled=false and then delete the leaked credentials (az ad app credential delete).
  3. Enumerate downstream impact via Sentinel KQL. Query SigninLogs for every app the user signed into during the incident window, then query AzureActivity for every resource the user touched, building the dependent-secrets list.
  4. Rotate every dependent secret in Azure Key Vault for any Service Principal the user had Contributor or Owner on. Anything the compromised principal could read from Key Vault must be rotated, not just revoked, because the attacker may already have exfiltrated it.
  5. Re-issue with PIM-elevated scoped roles. When access is restored, use Microsoft Entra Privileged Identity Management to bind the role assignments to just-in-time activation with MFA-on-elevation, not the standing role assignments that enabled the breach blast radius.

Remediation — Azure CLI

# Azure CLI 2.x
# Five-step compromised-identity runbook.
SUSPECT_UPN=alice@contoso.com
INCIDENT_ID=ir-2026-05-23-002

# (1) Revoke all sign-in sessions via Microsoft Graph (no first-class az verb).
USER_ID=$(az ad user show --id "$SUSPECT_UPN" --query id -o tsv)
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/users/${USER_ID}/revokeSignInSessions"

# (2) Disable the account.
az ad user update --id "$SUSPECT_UPN" --account-enabled false

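# (3) Enumerate every sign-in and every Azure activity for the user during the
# incident window. Sentinel hunting library queries (see azure-ir-04) wrap this
# pattern; the CLI invocation below is the runbook fallback. --workspace takes
# the workspace GUID (customerId), not the ARM resource ID.
LAW_ID=$(az monitor log-analytics workspace show -g rg-security-prod \
           -n law-security-prod --query customerId -o tsv)
az monitor log-analytics query \
  --workspace "$LAW_ID" \
  --analytics-query "SigninLogs
| where UserPrincipalName == '${SUSPECT_UPN}'
| where TimeGenerated > ago(7d)
| project TimeGenerated, AppDisplayName, IPAddress, ResultType, RiskState"
az monitor log-analytics query \
  --workspace "$LAW_ID" \
  --analytics-query "AzureActivity
| where Caller =~ '${SUSPECT_UPN}'
| where TimeGenerated > ago(7d)
| project TimeGenerated, OperationNameValue, ResourceGroup, _ResourceId"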

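# (4) Rotate every Key Vault secret the user (or any SP they owned) could read.
# Build the list (one <vault>/<secret> per line) from step (3) output and the
# Activity Log. Writing a new version only updates the vault: the credential
# must also be rotated at the system that issued it, and the random value
# below stands in for that newly issued credential.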
for vault_secret in $(cat /tmp/dependent-secrets-${INCIDENT_ID}.txt); do
  vault=$(echo "$vault_secret" | cut -d/ -f1)
  secret=$(echo "$vault_secret" | cut -d/ -f2)
  NEW_VAL=$(openssl rand -base64 48)
  az keyvault secret set --vault-name "$vault" --name "$secret" --value "$NEW_VAL"
done

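# (5) Re-issue access via PIM (just-in-time elevation; MFA on elevation enforced).
# After incident review, the user's standing role assignments are removed; PIM
# eligibility assignments replace them. ROLE_DEFINITION_ID is the Entra role
# template ID being restored (Security Reader shown; substitute as needed).
# The expiration bounds the eligibility window; the activation duration is
# governed by the role's PIM policy.
ROLE_DEFINITION_ID=5d6b6bb7-de71-4623-b4af-96380a352509
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/roleManagement/directory/roleEligibilityScheduleRequests" \
  --body "{
    \"action\": \"adminAssign\",
    \"justification\": \"Post-incident re-issue for ${INCIDENT_ID}\",
    \"principalId\": \"${USER_ID}\",
    \"roleDefinitionId\": \"${ROLE_DEFINITION_ID}\",
    \"directoryScopeId\": \"/\",
    \"scheduleInfo\": { \"startDateTime\": \"$(date -u +%FT%TZ)\",
                         \"expiration\": { \"type\": \"afterDuration\", \"duration\": \"P90D\" } }
  }"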

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: Microsoft Learn — Revoke user access; Entra PIM (accessed 2026-05)
# Operational: Key Vault rotation policies on every secret used by workloads so
# step (4) is bounded — anything stored in a vault with a rotation policy will
# be rotated on cadence and is recoverable to a known-good state.
terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.50"
    }
  }
}

# Sentinel scheduled analytics rule that surfaces token-theft signal patterns
# (refresh-token replay from new IP within minutes of legitimate sign-in).
resource "azurerm_sentinel_alert_rule_scheduled" "token_replay" {
  name                       = "refresh-token-replay-new-ip"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.security.id
  display_name               = "Refresh token replay from new IP"
  severity                   = "High"
  query                      = <<-KQL
    // Refresh-token redemptions are logged as non-interactive sign-ins.
    AADNonInteractiveUserSignInLogs
    | where TimeGenerated > ago(1h)
    | where ResultType == "0"
    | summarize IPs=make_set(IPAddress) by UserPrincipalName, bin(TimeGenerated, 15m)
    | where array_length(IPs) > 2
  KQL
  query_frequency            = "PT15M"
  query_period               = "PT1H"
  trigger_operator           = "GreaterThan"
  trigger_threshold          = 0
  tactics                    = ["CredentialAccess", "DefenseEvasion"]
}

# Key Vault in the security subscription with RBAC authorization, purge
# protection, and 90-day soft delete; the runbook's step (4) iterates against
# vaults of this shape to rotate dependent secrets.
resource "azurerm_key_vault" "workload_secrets" {
  name                       = "kv-workload-prod-weu"
  location                   = "westeurope"
  resource_group_name        = azurerm_resource_group.security.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "premium"
  enable_rbac_authorization  = true
  purge_protection_enabled   = true
  soft_delete_retention_days = 90
}

Remediation — Bicep

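targetScope = 'resourceGroup'

@description('Object IDs of users whose refresh tokens must be invalidated.')
param compromisedUserIds array
@description('Client ID of the user-assigned identity that holds the Graph permission to revoke sessions.')
param graphIdentityClientId string

// Microsoft Graph: revokeSignInSessions invalidates refresh tokens for the user.
// Authored via deployment script so the action is recorded in IaC. Deployment
// scripts are resource-group resources, so the template targets a resource group.
resource revoke 'Microsoft.Resources/deploymentScripts@2023-08-01' = [for (uid, i) in compromisedUserIds: {
  name: 'revoke-${i}'
  location: resourceGroup().location
  kind: 'AzurePowerShell'
  identity: { type: 'UserAssigned', userAssignedIdentities: { '<graph-identity-id>': {} } }
  properties: {
    azPowerShellVersion: '11.0'
    // The Az PowerShell image does not bundle the Microsoft Graph module, so
    // install it and sign in with the managed identity before revoking.
    scriptContent: 'Install-Module Microsoft.Graph.Users.Actions -Force -Scope CurrentUser; Connect-MgGraph -Identity -ClientId ${graphIdentityClientId}; Revoke-MgUserSignInSession -UserId ${uid}'
    cleanupPreference: 'OnSuccess'
    retentionInterval: 'P1D'
  }
}]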

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | IR-4; IA-5; AC-2(13) | A.5.26; A.5.17 | CLD.9.5.1

Log signals

  • AuditLogs Category = "UserManagement" with ActivityDisplayName = "Reset password (by admin)" or "Revoke sign-in sessions" — establishes the ledger of issued revocations.
  • SigninLogs immediately after revocation showing successful sign-in via a refresh token whose lifetime predates the revocation timestamp — failure of the revocation propagation.
  • AuditLogs ApplicationManagement showing credential rotation events on service principals tied to the compromised identity workflow.

Query

AuditLogs
| where ActivityDisplayName in ("Revoke sign-in sessions", "Reset password (by admin)", "Disable user")
// join keys must be columns, so materialise the target UPN before joining
| extend target = tostring(TargetResources[0].userPrincipalName)
| join kind=leftouter (
    SigninLogs
    | project sigUpn=UserPrincipalName, sigTime=TimeGenerated, sigResult=ResultType
) on $left.target == $right.sigUpn
| where isnotempty(sigTime) and sigTime > TimeGenerated and sigResult == 0
| project TimeGenerated, RevocationActor=InitiatedBy, target, gapMin=(sigTime - TimeGenerated)/1m, sigTime
| order by TimeGenerated desc
| take 200

Run as a KQL query in Log Analytics. Successful sign-in after a revocation event is a propagation-failure signal; the same query also surfaces stuck refresh tokens for principals that should be fully cordoned.

Alert threshold

  • Any successful sign-in for a principal whose Revoke sign-in sessions ran within the prior 15 minutes — page; the refresh-token chain did not propagate.
  • Three or more revocation events on the same UPN within an hour — likely incident in progress; confirm the IR ticket exists.
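
The first threshold pairs each revocation with any successful sign-in that follows it inside the 15-minute window. A minimal sketch of that pairing, assuming the revocation and sign-in rows have already been pulled from AuditLogs and SigninLogs into tuples (the tuple shapes are hypothetical):

```python
from datetime import datetime, timedelta


def post_revocation_signins(revocations, signins, window=timedelta(minutes=15)):
    """Return (upn, revoked_at, signin_time) for sign-ins that beat a revocation.

    revocations: list of (upn, revoked_at).
    signins: list of (upn, signin_time, succeeded).
    """
    hits = []
    for upn, revoked_at in revocations:
        for signin_upn, signin_time, succeeded in signins:
            same_user = signin_upn.lower() == upn.lower()  # UPN casing varies by source
            gap = signin_time - revoked_at
            # Only successful sign-ins strictly after the revocation and inside
            # the window indicate a propagation failure.
            if succeeded and same_user and timedelta(0) < gap <= window:
                hits.append((upn, revoked_at, signin_time))
    return hits
```

Each hit is a candidate page: the session should have been invalid by the time the sign-in succeeded.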

Initial response

  1. Re-issue revocation via Revoke-MgUserSignInSession and rotate the user's password; if the principal is a service principal, rotate the client credential and reissue the federated trust.
  2. Walk SigninLogs for any application-context token issued during the gap window; force token-grant revocation via Revoke-MgServicePrincipalSignInSession per affected app registration.
  3. Escalate per general/ir.html — confirm the Entra ID continuous-access-evaluation (CAE) policy remains assigned at the tenant scope to shorten refresh-token propagation lag.

References

Equivalent on: AWS · GCP · OCI

azure-ir-07-tabletop ! MEDIUM PREVENTIVE

Run quarterly tabletop exercises against four to five documented Azure-specific scenarios drawn from the threat-model corpus on the General threat-model page: a Service Principal client secret discovered leaked to a public GitHub repository, a mass Key Vault key-read event by a workload identity outside its baseline, a Microsoft Defender for Cloud CRITICAL finding (such as a publicly exposed Storage container with regulated data, or a public-access-misconfigured Cosmos DB account), a Storage Account public-access misconfiguration discovered post-deployment, and an Entra ID token-theft chain detected via Sentinel impossible-travel hunting. For each exercise, run the relevant playbook end-to-end against a non-production subscription, time the steps, and track mean-time-to-contain (MTTC) and mean-time-to-eradicate (MTTE) as the primary metrics quarter-over-quarter (NIST SP 800-61 Rev 3 — CSF 2.0 Community Profile, April 2025 release (accessed 2026-05)).

Independently, test the Microsoft Incident Response engagement path annually: confirm the customer's Microsoft Premier or Unified support contract entitles the organisation to engage Microsoft Incident Response (the team formerly known as DART, the Detection and Response Team), validate that the named security contacts in the tenant's Entra ID role assignments are still correct, and run a dry-run engagement request through the Microsoft Support portal so the responder workflow is documented (Microsoft Incident Response engagement page (accessed 2026-05)). The DART team is cited here as the documented escalation vendor; the engagement contract is not itself a control on this page (the control is testing that the engagement works).

Exercises that surface a runbook step that is wrong, missing, or unexecutable are tracked as findings against the runbook repo and remediated before the next quarter. The control is typed PREVENTIVE, mirroring the Phase 6 aws-ir-07 precedent, because its value lies in preventing the decay that sets in when documentation is never run: runbooks written and never tested are, in practice, runbooks that do not work when they are needed. The severity is MEDIUM because the control mitigates failure modes of other controls rather than a direct attack vector. The principle is reinforced in the tabletop-exercise and lessons-learned guidance on the General IR page and in ISO/IEC 27001:2022 A.5.27 (learning from information security incidents).

Remediation — Azure CLI

# Azure CLI 2.x
# Tabletop exercises are run as facilitated workshops; the Azure surface they
# touch is the non-prod tabletop subscription. A representative driver command:
# stand up a deliberately misconfigured Storage Account so the responder can
# practice the "public Storage container with regulated data" scenario end-to-end.

# All commands run against the dedicated tabletop subscription.
az account set --subscription "$TABLETOP_SUB_ID"

# Storage account names are limited to 3-24 lowercase letters and digits and
# must be globally unique; adjust the suffix per exercise.
STORAGE_ACCOUNT="sttabletoppii2606"

# Create the tabletop Storage Account (rg-tabletop must already exist).
az storage account create \
  --resource-group rg-tabletop \
  --name "$STORAGE_ACCOUNT" \
  --location westeurope \
  --sku Standard_LRS \
  --kind StorageV2

# Deliberately permit anonymous blob public access for the exercise. PRODUCTION
# Storage Accounts MUST NEVER have this configuration; this is exercise-only.
az storage account update \
  --resource-group rg-tabletop \
  --name "$STORAGE_ACCOUNT" \
  --allow-blob-public-access true \
  --public-network-access Enabled

# --auth-mode login needs a blob data-plane role (for example Storage Blob Data
# Contributor) assigned to the signed-in identity.
az storage container create \
  --account-name "$STORAGE_ACCOUNT" \
  --name public-pii-scenario \
  --public-access blob \
  --auth-mode login

# Drop a synthetic regulated-data file so the responder has something to remediate.
echo "name,ssn" > /tmp/tabletop-pii.csv
echo "Jane Doe,000-00-0000" >> /tmp/tabletop-pii.csv
az storage blob upload \
  --account-name "$STORAGE_ACCOUNT" \
  --container-name public-pii-scenario \
  --name tabletop-pii.csv \
  --file /tmp/tabletop-pii.csv \
  --auth-mode login

# MTTC is measured from the Microsoft Defender for Cloud finding emission to
# the moment the Storage Account is back to public-access disabled with the
# blob removed. Track the duration in a Log Analytics IR metrics workbook.

# Annual Microsoft Incident Response engagement dry-run — open a low-severity
# Microsoft Support ticket through the Azure portal tagged "IR engagement test"
# so the named-contacts workflow is documented and the response SLA is observed.

Remediation — Terraform

# Terraform AzureRM provider ~> 3.0
# Source: NIST SP 800-61 Rev 3; Microsoft IR engagement page (accessed 2026-05)
# Tabletop-subscription-only resources. Apply only to the dedicated tabletop
# management group; never propagate this module to a production subscription.
resource "azurerm_storage_account" "tabletop_scenario" {
  name                            = "sttabletoppii2606" # 3-24 lowercase letters/digits, globally unique
  resource_group_name             = azurerm_resource_group.tabletop.name
  location                        = "westeurope"
  account_tier                    = "Standard"
  account_replication_type        = "LRS"
  allow_nested_items_to_be_public = true   # tabletop-only; never in prod
  public_network_access_enabled   = true   # tabletop-only; never in prod
  tags                            = { Purpose = "ir-tabletop", Quarter = "2026Q2" }
}

resource "azurerm_storage_container" "tabletop_public" {
  name                  = "public-pii-scenario"
  storage_account_name  = azurerm_storage_account.tabletop_scenario.name
  container_access_type = "blob"  # tabletop-only; never in prod
}

# Log Analytics workbook visualising MTTC + MTTE per quarter across exercises.
# Workbook JSON template lives in the security repo and is applied via the
# azurerm_application_insights_workbook resource.
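resource "azurerm_application_insights_workbook" "ir_tabletop_metrics" {
  # The provider requires the workbook name to be a lowercase GUID. The value
  # below is a placeholder; generate a fresh one (for example with uuidgen).
  name                = "7f3c2a10-5b6e-4d21-9a8f-0c1d2e3f4a5b"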
  resource_group_name = azurerm_resource_group.security.name
  location            = "westeurope"
  display_name        = "IR tabletop — MTTC + MTTE by quarter"
  data_json           = file("${path.module}/workbooks/ir-tabletop-metrics.json")
}

Remediation — Bicep

targetScope = 'resourceGroup'

@description('Storage account hosting tabletop exercise artefacts (immutable retention).')
param storageName string

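resource storage 'Microsoft.Storage/storageAccounts@2024-01-01' existing = {
  name: storageName
}

// The default blob service already exists on every storage account.
resource blobService 'Microsoft.Storage/storageAccounts/blobServices@2024-01-01' existing = {
  parent: storage
  name: 'default'
}

// Evidence container: private, with version-level immutability enabled.
// Version-level immutability requires blob versioning on the storage account.
resource container 'Microsoft.Storage/storageAccounts/blobServices/containers@2024-01-01' = {
  parent: blobService
  name: 'ir-tabletop-evidence'
  properties: {
    publicAccess: 'None'
    immutableStorageWithVersioning: { enabled: true }
  }
}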

Compliance mapping

CIS AWS Foundations v3.0.0 | CIS Microsoft Azure Foundations v3.0.0 | CIS GCP Foundation v4.0.0 | CIS OCI Foundation v2.0.0 | NIST SP 800-53 rev5 | ISO/IEC 27001:2022 | ISO/IEC 27017:2015
n/a | (best-practices) | n/a | n/a | IR-2; IR-3; IR-3(2) | A.5.24; A.5.27 | n/a

Log signals

  • Tabletop exercises are observational, not telemetric — the Sentinel watchlist TabletopSchedule drives the cadence. The detection signal is a missing scheduled exercise on a quarter that has elapsed without a corresponding entry in the TabletopRunResults custom table.
  • Defender for Cloud regulatory-compliance gap on the BCP/DR-related controls (ISO 27001:2022 A.5.30) — downstream signal that exercise evidence has not been refreshed in the audit window.
  • Sentinel automation-rule misfires during exercise replay scenarios — exposes brittle playbook coverage that should be addressed before a real incident exercises the same path.

Query

_GetWatchlist('TabletopSchedule')
| join kind=leftouter (
    TabletopRunResults_CL
    | summarize lastRun=max(TimeGenerated) by ScenarioId=tostring(scenarioId_s)
) on $left.ScenarioId == $right.ScenarioId
| extend overdueDays = (now() - lastRun) / 1d
// lastRun is null for scenarios that have never been run
| where isnull(lastRun) or overdueDays > 90
| project ScenarioId, ScenarioName, lastRun, overdueDays
// list never-run scenarios first, then the most overdue
| order by overdueDays desc nulls first

Run as a KQL query in Log Analytics; the join against the custom TabletopRunResults_CL table is the durable observability surface for the tabletop programme. Persist the query as a Sentinel analytics rule with severity Medium and pair it with a quarterly reporting cadence to the security-governance forum.
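For readers who want to reason about the query's logic offline (for example in a unit test for the reporting pipeline), the same left-join-and-filter step can be sketched in Python. The function name, the tuple shapes, and the sample scenario identifiers are illustrative assumptions; only the 90-day window and the "never run counts as overdue" rule come from the query itself.

```python
from datetime import datetime, timedelta, timezone

OVERDUE_AFTER = timedelta(days=90)  # same window as the KQL filter


def overdue_scenarios(schedule, run_results, now):
    """Left-join scheduled scenario IDs to their latest run.

    schedule:    iterable of scenario IDs (the watchlist side of the join)
    run_results: iterable of (scenario_id, ran_at) tuples (the custom-table side)
    Returns a list of (scenario_id, overdue_days or None), never-run first.
    """
    last_run = {}
    for scenario_id, ran_at in run_results:
        if scenario_id not in last_run or ran_at > last_run[scenario_id]:
            last_run[scenario_id] = ran_at

    flagged = []
    for scenario_id in schedule:
        ran_at = last_run.get(scenario_id)
        if ran_at is None:
            flagged.append((scenario_id, None))  # never run
        elif now - ran_at > OVERDUE_AFTER:
            flagged.append((scenario_id, (now - ran_at).days))

    # Never-run scenarios sort first, then the most overdue.
    flagged.sort(key=lambda item: (item[1] is not None, -(item[1] or 0)))
    return flagged
```

A scenario with a recent run drops out of the result entirely, which matches the behaviour of the `where` clause in the query.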

Alert threshold

  • Any scheduled scenario with no run in the prior 90 days — page the BCP/DR programme manager.
  • Three or more brittle playbook misfires during exercise replay — page the SOC engineering lead; the playbook surface needs hardening before the next live incident.
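The two thresholds above amount to a small routing rule, sketched here in Python. The recipient identifiers and the function name are hypothetical placeholders; the real paging targets live in whatever on-call tooling the organisation uses.

```python
MISFIRE_PAGE_THRESHOLD = 3  # "three or more" misfires, per the threshold above


def pages_for(overdue_scenarios, playbook_misfires):
    """Return (recipient, reason) pairs for the two alert thresholds."""
    pages = []
    if overdue_scenarios:
        # Any scheduled scenario with no run in the prior 90 days.
        pages.append((
            "bcp-dr-programme-manager",
            "no run in prior 90 days: " + ", ".join(sorted(overdue_scenarios)),
        ))
    if playbook_misfires >= MISFIRE_PAGE_THRESHOLD:
        pages.append((
            "soc-engineering-lead",
            f"{playbook_misfires} playbook misfires during exercise replay",
        ))
    return pages
```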

Initial response

  1. Schedule the overdue scenario in the next IR drill calendar; capture the kickoff in the TabletopRunResults_CL custom table to refresh the timestamp.
  2. Cross-check Defender for Cloud regulatory-compliance state on the BCP/DR control set — overdue exercises usually correlate with compliance regressions on the framework dashboards.
  3. Escalate per general/ir.html — confirm the tabletop programme charter remains aligned with the org's audit-frequency commitment and that the watchlist captures every documented scenario.
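Step 1 depends on a kickoff record landing in the custom table. The Python sketch below only builds and serialises such a record; it does not send it. It assumes the table is fed through the legacy HTTP Data Collector API, where a Log-Type of `TabletopRunResults` surfaces as `TabletopRunResults_CL` and a string field named `scenarioId` surfaces as `scenarioId_s`. The `phase` and `recordedAt` fields are hypothetical, and a DCR-based Logs Ingestion pipeline would define its column names in the table schema instead.

```python
import json
from datetime import datetime, timezone

LOG_TYPE = "TabletopRunResults"  # assumption: surfaces as TabletopRunResults_CL


def build_run_record(scenario_id, phase, recorded_at):
    """Build one run-result record for the custom table (sketch only)."""
    if not scenario_id:
        raise ValueError("scenario_id is required; the overdue query joins on it")
    if recorded_at.tzinfo is None:
        raise ValueError("recorded_at must be timezone-aware")
    return {
        "scenarioId": scenario_id,  # assumption: ingested as scenarioId_s
        "phase": phase,             # hypothetical field, e.g. "kickoff"
        "recordedAt": recorded_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def to_payload(records):
    """Serialise records as the JSON array body an ingestion call would carry."""
    return json.dumps(list(records), separators=(",", ":"))
```

Validating the scenario ID before serialising is the useful part: a record without it would ingest successfully but never refresh the timestamp the overdue query reads.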

References

Equivalent on: AWS · GCP · OCI

Sources