General Logging & Detection Principles

Overview

Logging is the foundation every other detective and responsive control in the cloud rests on. Without an authoritative, tamper-evident record of who did what to which resource, incident response is forensically blind: the responder cannot tell whether an attacker reached a database, whether credentials were used after exfiltration, or whether a configuration change was authorised. The corollary is operational. A control that exists only as a configuration setting, with no corresponding log entry, cannot be audited and in practice is not enforced. This page sets the provider-neutral principles that the four provider logging pages (aws/logging.html, azure/logging.html, gcp/logging.html, oci/logging.html) then instantiate.

Detection is logging plus interpretation. A CloudTrail event, an Azure Activity Log entry, a Google Cloud Audit Log record, or an OCI Audit event is raw material; turning it into a finding takes a detection rule, an analyst who maintains the rule, a tested true-positive case, and an alert pipeline that reaches the on-call responder before the attacker has finished. The MITRE ATT&CK for Cloud matrix is the standard taxonomy of attacker techniques that detection engineering aims to cover. Coverage is measured by mapping each maintained detection rule to one or more ATT&CK techniques and reporting gap regions to leadership, rather than by reporting raw rule counts.

The rest of this page treats the logging-and-detection pipeline as a single discipline. Three log classes (control-plane, data-plane, network) are aggregated to a security-dedicated destination, made tamper-evident, retained against compliance- and forensic-driven floors, ingested by a SIEM, turned into alerts through maintained detection content, and routed to runbook-equipped responders via the incident response workflow. The pipeline is the control. See general/threat-model.html for the adversary techniques each stage is designed to surface, and general/network.html for VPC and subnet flow-log sourcing.

What to log

Three log classes are mandatory in every cloud-resident environment. Control-plane audit logs record every API call against the provider's management plane: identity, action, target resource, source IP, success or failure, request parameters. AWS CloudTrail, Azure Activity Log (and Microsoft Entra ID audit and sign-in logs), Google Cloud Audit Logs (Admin Activity, Data Access, System Event, Policy Denied streams), and OCI Audit are the four standard sources. Control-plane logs are the single most important log class, because almost every cloud-resident attack chain (credential abuse, role assumption, key disablement, public-resource creation) passes through the control plane at some point.

Data-plane access logs record every read and write against storage and database services: S3 server-access logs, Azure Storage diagnostic logs, GCS data-access logs (a subset of Cloud Audit Logs), OCI Object Storage request logs. Data-plane volume runs one to three orders of magnitude higher than control-plane volume, so the design choice is which buckets, databases, and PaaS endpoints warrant data-plane logging (typically every Restricted-tagged resource, every public-facing endpoint, and every cross-account-accessed resource) rather than whether to enable it globally.
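The selection rule in the paragraph above can be sketched as a small predicate over a resource inventory. This is a minimal illustration, not a provider API: the `Resource` shape, the field names, and the sample inventory are all hypothetical, and the three criteria are taken directly from the parenthetical above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    classification: str        # e.g. "Restricted", "Internal", "Public"
    public_facing: bool
    cross_account_access: bool


def needs_data_plane_logging(resource: Resource) -> bool:
    """Return True when any of the three criteria from the text applies."""
    return (
        resource.classification == "Restricted"
        or resource.public_facing
        or resource.cross_account_access
    )


# Illustrative inventory; the names are made up for the example.
inventory = [
    Resource("payments-db", "Restricted", False, False),
    Resource("marketing-site-assets", "Public", True, False),
    Resource("build-cache", "Internal", False, False),
    Resource("shared-reports", "Internal", False, True),
]

selected = [r.name for r in inventory if needs_data_plane_logging(r)]
print(selected)
```

Encoding the rule as data-driven logic rather than a per-bucket manual decision keeps the selection auditable: a resource that gains a Restricted tag or a cross-account grant picks up data-plane logging on the next evaluation.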

Network flow logs record connection metadata at the subnet, NIC, or virtual-network level: VPC Flow Logs in AWS, NSG Flow Logs (legacy) and VNet Flow Logs in Azure, VPC Flow Logs in GCP, and VCN Flow Logs in OCI. Flow logs carry no payload, but they do carry source, destination, port, protocol, and action, which makes them the primary evidence source for post-compromise lateral-movement analysis. See general/network.html for the segmentation model that flow logs validate.
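As a rough illustration of how payload-free flow records can still support lateral-movement analysis, the sketch below counts how many distinct internal destinations each internal source reached on accepted connections. The record shape (`src`, `dst`, `port`, `protocol`, `action`) mirrors the fields listed above but is a hypothetical normalised form, not any provider's native flow-log format, and the fan-out threshold is an arbitrary assumption.

```python
import ipaddress
from collections import defaultdict


def internal_fanout(flows, min_destinations=3):
    """Map each private source IP to the private destinations it reached.

    Only accepted flows between two private addresses are counted; sources
    that reached fewer than `min_destinations` distinct hosts are dropped.
    """
    targets = defaultdict(set)
    for flow in flows:
        if flow["action"] != "ACCEPT":
            continue
        src = ipaddress.ip_address(flow["src"])
        dst = ipaddress.ip_address(flow["dst"])
        if src.is_private and dst.is_private:
            targets[flow["src"]].add(flow["dst"])
    return {
        source: sorted(dests)
        for source, dests in targets.items()
        if len(dests) >= min_destinations
    }


# Hypothetical normalised flow records.
flows = [
    {"src": "10.0.1.5", "dst": "10.0.2.10", "port": 22, "protocol": "tcp", "action": "ACCEPT"},
    {"src": "10.0.1.5", "dst": "10.0.2.11", "port": 22, "protocol": "tcp", "action": "ACCEPT"},
    {"src": "10.0.1.5", "dst": "10.0.2.12", "port": 3389, "protocol": "tcp", "action": "ACCEPT"},
    {"src": "10.0.1.5", "dst": "8.8.8.8", "port": 53, "protocol": "udp", "action": "ACCEPT"},
    {"src": "10.0.1.6", "dst": "10.0.2.10", "port": 443, "protocol": "tcp", "action": "ACCEPT"},
    {"src": "10.0.1.6", "dst": "10.0.2.11", "port": 22, "protocol": "tcp", "action": "REJECT"},
]

suspects = internal_fanout(flows)
print(suspects)
```

A real detection would baseline normal fan-out per workload rather than use a fixed threshold; the point here is only that source, destination, and action are enough to reconstruct who talked to whom.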

NIST SP 800-92 (Guide to Computer Security Log Management) formalises the discipline of selecting, prioritising, and managing log sources; CIS Control 8 (Audit Log Management) in CIS Controls v8 sets out the operational checklist (enable audit logs, centralise collection, ensure adequate storage, configure detailed audit logging, review logs). The principle behind both is the same: log selection is a deliberate, classification-driven decision, not a side effect of enabling every available source.

Log integrity

A log the attacker can edit, delete, or silently halt is not evidence. Log integrity is engineered, not assumed. Three controls combine to provide tamper-evidence. First, cryptographic chaining binds log entries together so that any deletion or modification breaks the chain. AWS CloudTrail log file validation produces a digest file that hashes every delivered log file, signed by AWS; equivalent integrity hashing is available in Azure Monitor diagnostic settings exports and in OCI Audit export pipelines. Second, write-once storage places log archives in an object store configured with object-lock or retention-rule policies (S3 Object Lock in Compliance mode, Azure immutable storage with time-based retention, GCS retention policy with bucket lock, OCI Object Storage retention rules) so that even an account-administrator principal cannot delete logs within the locked retention period. Third, cross-account isolation, covered in the next section, separates the identity that writes logs from the identity that can administer the log store, so that compromise of the workload account does not grant access to retroactively modify the logs that captured the compromise.

NIST SP 800-92 §5.4 (Protecting Log Data) defines the integrity model these controls instantiate: logs at rest are protected with the same rigour as the data they describe; logs in transit are encrypted under TLS; access to log infrastructure is restricted to a small, monitored set of administrators. CloudTrail log file validation should be verified continuously by an automated job that re-runs the digest check and alerts on any mismatch. A "validation succeeded" check that nobody runs is the same as no check at all.
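The chaining idea can be illustrated with a toy hash chain. This is a conceptual sketch only: it is not the CloudTrail digest-file format or any provider's implementation, and the function names, the all-zero genesis value, and the sample events are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def _link_digest(prev_digest, entry):
    """Hash the previous digest together with a canonical form of the entry."""
    payload = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_digest + payload).encode("utf-8")).hexdigest()


def chain_entries(entries):
    """Wrap each entry with the digest of its predecessor and its own digest."""
    chained = []
    prev = GENESIS
    for entry in entries:
        digest = _link_digest(prev, entry)
        chained.append({"entry": dict(entry), "prev": prev, "digest": digest})
        prev = digest
    return chained


def verify_chain(chained):
    """Return the index of the first broken link, or None if the chain is intact."""
    prev = GENESIS
    for index, record in enumerate(chained):
        expected = _link_digest(prev, record["entry"])
        if record["prev"] != prev or record["digest"] != expected:
            return index
        prev = record["digest"]
    return None


events = [
    {"actor": "alice", "action": "CreateUser"},
    {"actor": "mallory", "action": "StopLogging"},
    {"actor": "mallory", "action": "DeleteBucket"},
]
log = chain_entries(events)
print(verify_chain(log))
```

Modifying or removing an interior record breaks every later link, which is what the automated re-verification job checks for. Truncation at the tail is not detectable from the chain alone; the latest digest has to be anchored somewhere the attacker cannot write, which is the role a signed digest or an object-locked store plays.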

MISCONFIGURATION "Logs are written to a bucket in the same account as the workload." This is the dominant log-integrity failure mode. An attacker who compromises the workload account inherits the IAM permissions that govern the in-account log bucket; with sufficient privilege, the attacker can disable the trail, delete the historical archive, or replace it with sanitised content before the next forensic review. Logs MUST flow into a security-dedicated account, subscription, project, or compartment whose administrators are organisationally separate from the workload account's administrators and whose log bucket is configured with object lock that survives root-account credentials. The reference architecture is a hub-and-spoke "log archive" account modelled after the AWS Control Tower Log Archive account, the Azure landing-zone Management subscription's centralised log workspace, the GCP organization log sink to a security folder's project, and the OCI security tenancy compartment with an aggregated logging compartment.

Centralisation

Centralisation is the architectural pattern that turns the integrity rules above into operational reality. The pattern is hub-and-spoke: every workload account, subscription, project, or tenancy compartment ("spokes") emits its logs to a single security-dedicated destination ("hub") whose administration is segregated. Centralisation provides three properties at once: tamper-evidence (the hub's object lock survives the compromise of any spoke), unified detection scope (a SIEM ingests from one location rather than N), and economy of analyst attention (cross-spoke correlation surfaces attacks that touch multiple accounts).

Each provider names the centralisation primitive differently. AWS uses an Organization Trail in CloudTrail (one configuration covers every account in the organization) delivering to an S3 bucket in the dedicated Log Archive account, with Object Lock enabled and a bucket policy that permits the organization to PutObject but denies any DeleteObject or modification action. Azure uses Diagnostic Settings on every subscription, routing Activity Log and resource logs to a central Log Analytics workspace (and/or a storage account for long retention) in a dedicated security subscription, optionally fed into an enterprise Microsoft Sentinel workspace. GCP uses Aggregated Sinks at the organization or folder level (one sink, many source projects) routing to a Cloud Storage bucket, BigQuery dataset, or Pub/Sub topic in a security folder's project. OCI uses Connector Hub to route audit and service logs from every compartment into a centralised destination (Object Storage with retention rules, or directly into Logging Analytics for query).

Cross-account and cross-tenant log routing requires explicit trust configuration. In AWS, the central S3 bucket policy permits the organization principal via aws:PrincipalOrgID; in Azure, diagnostic settings can target a Log Analytics workspace in a different subscription as long as the writing identity holds the appropriate role; in GCP, the aggregated sink's writer service account must be granted IAM access on the destination project; in OCI, the Connector Hub identity must hold IAM policies in both source and destination compartments. In every case the trust is one-way: the spoke can write but cannot read or modify, and the hub can read but has no administrative rights back into the spoke.

Retention

Retention floors are compliance-driven and must be encoded into the centralised log destination's retention configuration, not left to operator memory. PCI DSS v4.0 requires audit log retention of at least one year, with the most recent three months immediately available for analysis. HIPAA's Security Rule (45 CFR §164.316(b)(2)) requires retention of documentation, including audit trails, for six years from the date of creation or last effective date. SOX (Sarbanes-Oxley) financial-controls logs are typically retained seven years. SOC 2 requires retention sufficient to support the audit period (usually one year minimum). The applicable floor for any given log is the maximum of the regulatory floors the underlying data class is subject to.
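The "maximum of the applicable floors" rule is simple enough to encode directly. The sketch below uses the figures quoted in the paragraph above, approximates a year as 365 days, and treats the SOX and SOC 2 values as the typical figures the text describes rather than statutory minimums. The dictionary and function names are hypothetical.

```python
# Retention floors in days, taken from the paragraph above.
# Years are approximated as 365 days; SOX and SOC 2 are "typical" values.
RETENTION_FLOOR_DAYS = {
    "PCI DSS": 365,
    "HIPAA": 6 * 365,
    "SOX": 7 * 365,
    "SOC 2": 365,
}


def retention_floor_days(regimes):
    """Return the strictest (longest) floor among the regimes that apply."""
    if not regimes:
        raise ValueError("at least one regime is required to derive a floor")
    return max(RETENTION_FLOOR_DAYS[name] for name in regimes)


# A log covering cardholder data and health records inherits the HIPAA floor.
floor = retention_floor_days({"PCI DSS", "HIPAA"})
print(floor)  # 2190
```

Deriving the floor from the data classification, rather than setting it by hand per bucket, is one way to keep the retention configuration out of operator memory.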

Hot versus cold tiering reconciles retention floors with searchability cost. The hot tier (CloudWatch Logs, Log Analytics workspace, BigQuery, OCI Logging Analytics) is queryable in seconds, expensive per-GB-month, and typically holds 30 to 90 days of data. The cold tier (S3 with Glacier transitions, Storage Account with archive tier, GCS Coldline-Archive, OCI Archive Storage) is queryable in hours, cheap per-GB-month, and holds the rest of the retention floor. Lifecycle rules automate the transition: a SIEM ingest pipeline reads from hot, and ad-hoc forensic retrieval reads from cold via a rehydration job documented in the incident response runbook.
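A lifecycle rule of this kind reduces to an age check. The sketch below is a provider-neutral illustration, not a lifecycle-policy syntax: the function name, the 90-day hot window, and the 365-day floor are assumptions chosen to match the ranges mentioned above, and "expired" here only means the log has passed its retention floor.

```python
from datetime import date


def storage_tier(log_date, today, hot_days=90, floor_days=365):
    """Classify a log by age: 'hot', 'cold', or 'expired' (past the floor)."""
    age_days = (today - log_date).days
    if age_days < 0:
        raise ValueError("log_date is in the future")
    if age_days <= hot_days:
        return "hot"
    if age_days <= floor_days:
        return "cold"
    return "expired"


today = date(2025, 1, 31)
print(storage_tier(date(2025, 1, 1), today))   # hot
print(storage_tier(date(2024, 6, 1), today))   # cold
print(storage_tier(date(2023, 1, 1), today))   # expired
```

In practice the floor argument would come from the compliance-driven retention floor for the data class, so that nothing is transitioned to deletion before the strictest applicable regime allows it.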

SIEM and detection engineering

A SIEM is the system that converts centralised logs into prioritised findings via maintained detection content. Provider-native SIEMs integrate by default with the corresponding provider's audit, posture, and threat-detection signals: AWS Security Hub (ingesting GuardDuty, Inspector, Config, Macie, and IAM Access Analyzer findings), Microsoft Sentinel (cloud-native SIEM/SOAR with KQL detection rules and built-in workbooks), Google Chronicle Security Operations (paired with Security Command Center for posture findings), and OCI Cloud Guard (with Logging Analytics for log-based detections). Third-party SIEMs (Splunk, Elastic Security, Sumo Logic, IBM QRadar, Devo) ingest from the same centralised log destinations via Lambda, Logic App, Cloud Function, or Service Connector forwarders, and are the typical choice when a single SIEM must span multiple clouds plus on-premises sources.

Detection engineering is the discipline of building and maintaining the rules that turn logs into findings. Every detection rule in this corpus follows a four-part contract: a hypothesis (a one-paragraph attacker behaviour described in MITRE ATT&CK terms, e.g., "T1078.004: adversary signs in to a cloud account using valid credentials from an unusual geography"), a log-source dependency (which log class and which fields the rule reads), a true-positive test case (a reproducible event sequence that triggers the rule, verified at least quarterly), and an owner (a named team responsible for tuning the rule). Detection-as-code formalises this: rules live in version control as Sigma (vendor-neutral), Sentinel KQL files, Chronicle YARA-L rules, or Splunk SPL saved searches, with pull-request review and CI-time validation against a golden-event corpus.
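One way to make the four-part contract enforceable at CI time is to model it as a record and reject rules that leave a part empty or whose true-positive test is older than a quarter. The sketch below is an assumption about how such a check might look: the `DetectionRule` fields, the 92-day staleness window, and the sample rule are hypothetical and do not correspond to any Sigma, KQL, YARA-L, or SPL schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DetectionRule:
    """Hypothetical metadata record mirroring the four-part contract."""
    rule_id: str
    hypothesis: str
    attack_techniques: List[str]
    log_source: str
    required_fields: List[str]
    last_true_positive_test: Optional[date]
    owner: str


def contract_violations(rule, today, max_test_age_days=92):
    """Return a list of human-readable contract violations (empty if none)."""
    problems = []
    if not rule.hypothesis.strip():
        problems.append("missing hypothesis")
    if not rule.attack_techniques:
        problems.append("no ATT&CK technique mapped")
    if not rule.log_source.strip() or not rule.required_fields:
        problems.append("missing log-source dependency")
    if rule.last_true_positive_test is None:
        problems.append("no true-positive test recorded")
    elif today - rule.last_true_positive_test > timedelta(days=max_test_age_days):
        problems.append("true-positive test is stale")
    if not rule.owner.strip():
        problems.append("missing owner")
    return problems


rule = DetectionRule(
    rule_id="example-unusual-geo-signin",
    hypothesis="T1078.004: valid cloud credentials used from an unusual geography",
    attack_techniques=["T1078.004"],
    log_source="control-plane sign-in log",
    required_fields=["principal", "source_ip", "result"],
    last_true_positive_test=date(2025, 1, 10),
    owner="detection-engineering",
)
print(contract_violations(rule, today=date(2025, 3, 1)))
```

A pull-request check could fail the build whenever the returned list is non-empty, which keeps ownerless or untested rules out of the maintained set.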

Coverage is measured against MITRE ATT&CK for Cloud (the IaaS, SaaS, Office 365, Azure AD, and Google Workspace sub-matrices) rather than against raw rule counts. A team with 800 detection rules concentrated in two ATT&CK tactics has worse coverage than a team with 200 rules spanning twelve tactics. The detection-engineering output therefore includes a heatmap that maps each maintained rule to one or more ATT&CK techniques, plus an annual review that prioritises new rules into gap regions. Joint CISA and National Security Agency guidance on detection engineering for cloud environments supports this technique-driven coverage model.
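The 800-versus-200 comparison can be reproduced with a few lines. The sketch below measures breadth at the tactic level for simplicity (a real heatmap would go down to techniques); the tactic labels are placeholders rather than real ATT&CK identifiers, and the total of fourteen tactics is an assumption for the example.

```python
def tactic_coverage(rule_tactics, all_tactics):
    """Return (fraction of tactics covered, sorted list of uncovered tactics).

    `rule_tactics` maps a rule id to the tactics that rule addresses.
    """
    universe = set(all_tactics)
    covered = set()
    for tactics in rule_tactics.values():
        covered.update(tactics)
    covered &= universe
    gaps = sorted(universe - covered)
    return len(covered) / len(universe), gaps


# Placeholder tactic labels; not real ATT&CK identifiers.
tactics = [f"tactic-{n:02d}" for n in range(1, 15)]

# Team A: 800 rules concentrated in two tactics.
team_a = {f"a-{i}": [tactics[i % 2]] for i in range(800)}
# Team B: 200 rules spread across twelve tactics.
team_b = {f"b-{i}": [tactics[i % 12]] for i in range(200)}

coverage_a, gaps_a = tactic_coverage(team_a, tactics)
coverage_b, gaps_b = tactic_coverage(team_b, tactics)
print(len(team_a), round(coverage_a, 2), len(gaps_a))
print(len(team_b), round(coverage_b, 2), len(gaps_b))
```

The gap list, not the rule count, is what feeds the annual review: each uncovered entry is a candidate for new detection content.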

Alerting and runbook integration

Alert fatigue is the single most common reason detection programmes fail. A responder who receives 200 alerts per shift, 195 of which are false positives or low-severity noise, will stop reading the channel, and the five real findings travel through the same dead channel. The mitigation is severity-tiered routing combined with continuous false-positive review. CRITICAL findings page the on-call engineer directly (PagerDuty, Opsgenie, Splunk On-Call). HIGH findings open a ticket in the security ticketing queue (Jira Security, ServiceNow SIR) for review within one business day. MEDIUM findings populate a daily-review dashboard. LOW findings populate a weekly-review dashboard. Each tier carries a documented true-positive rate target; rules whose true-positive rate falls below the target are tuned or retired rather than left to add noise to the channel.
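Severity-tiered routing and the true-positive review can be sketched as a lookup plus a ratio check. The destination labels and the per-tier targets below are hypothetical placeholders (the text requires that targets be documented but does not fix their values), and no real paging or ticketing API is called.

```python
# Destination per severity tier, following the routing described above.
ROUTES = {
    "CRITICAL": "page-on-call",
    "HIGH": "security-ticket-queue",
    "MEDIUM": "daily-review-dashboard",
    "LOW": "weekly-review-dashboard",
}

# Hypothetical true-positive rate targets; real values are a local decision.
TP_RATE_TARGET = {
    "CRITICAL": 0.80,
    "HIGH": 0.50,
    "MEDIUM": 0.25,
    "LOW": 0.10,
}


def route(severity):
    """Return the destination for a finding of the given severity."""
    return ROUTES[severity]


def needs_tuning(severity, true_positives, total_alerts):
    """Flag a rule whose observed true-positive rate is below its tier target."""
    if total_alerts == 0:
        return False  # no evidence yet; nothing to judge
    return true_positives / total_alerts < TP_RATE_TARGET[severity]


print(route("CRITICAL"))
# The 5-in-200 case from the text: far below any plausible target.
print(needs_tuning("HIGH", true_positives=5, total_alerts=200))
```

Running the tuning check on a schedule turns false-positive review into a routine output rather than something that happens only after responders have stopped reading the channel.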

Every CRITICAL and HIGH detection rule is paired with a runbook: a documented step-by-step response procedure that the on-call responder executes. The runbook references the general incident response page for the lifecycle phases (containment, eradication, recovery, post-incident) and the standard actions per phase. Without a runbook, a paged responder spends the first thirty minutes deciding what to do; with a runbook, those thirty minutes go to containment.

Cross-provider equivalence

The four providers implement the logging-and-detection pipeline under different names. The table below maps the principles in this page to the provider-native primitives. Each provider deep-dive (aws/logging.html, azure/logging.html, gcp/logging.html, oci/logging.html) carries the per-service configuration detail and the per-provider detection-content libraries.

Principle	AWS	Azure	GCP	OCI
Control-plane audit log	CloudTrail (Organization Trail)	Activity Log + Microsoft Entra audit and sign-in logs	Cloud Audit Logs (Admin Activity stream)	OCI Audit service
Centralisation primitive	Organization Trail → S3 in Log Archive account with Object Lock	Diagnostic Settings → central Log Analytics workspace in security subscription	Organization-level Aggregated Sink → GCS / BigQuery / Pub/Sub in security project	Connector Hub → Object Storage / Logging Analytics in security tenancy compartment
Provider-native SIEM	Security Hub + GuardDuty findings aggregation	Microsoft Sentinel	Chronicle Security Operations + Security Command Center	Cloud Guard + Logging Analytics
Network flow logs	VPC Flow Logs	VNet Flow Logs (NSG Flow Logs legacy)	VPC Flow Logs	VCN Flow Logs
Log integrity / tamper evidence	CloudTrail log file validation + S3 Object Lock	Immutable storage with time-based retention policy	GCS retention policy with bucket lock	Object Storage retention rules with retention-rule lock

Illustrative control: centralized immutable audit log

The control box below is an illustrative example of the markup pattern every provider logging page applies. It is not a production control entry (provider pages carry CLI and IaC remediations specific to each cloud), but the threat-model framing and the HIGH DETECTIVE pairing transfer directly. Reading this box alongside the HIGH PREVENTIVE example on the data-protection page exercises the distinction the methodology page emphasises: same severity, different operational meaning. The illustrative ID gen-log-ex-01 is reserved and is not reused as a real control identifier.

gen-log-ex-01

Centralized immutable audit log for all control-plane API calls

HIGH DETECTIVE

MITIGATES Post-compromise forensic blindness and attacker tampering of evidence. A centralised, organisation-wide control-plane audit log delivered to a tamper-evident store in a security-dedicated account records every privileged action (IAM change, key access, public-resource creation, security-tool disablement) outside the blast radius of the workload account being audited.

ATTACK VECTOR Absent centralisation and immutability, an attacker who reaches the workload account's audit-log administration permission can disable the local trail, delete the historical archive, or replace it with sanitised content before the next review. Capital One (2019), Snowflake / UNC5537 (2024), and Midnight Blizzard / Microsoft (2024) each show variants of this kill chain, where the logs that would have surfaced the intrusion earlier were incomplete, decentralised, or accessible to the same identity scope as the compromised workload.

BLAST RADIUS Without this control: every account, subscription, project, or compartment in the organization is forensically blind once the workload identity is compromised. With this control: the workload account compromise is bounded by the workload account's permissions; the forensic record remains available in the security tenancy for incident response and regulatory reporting.

HIGH (not CRITICAL) because the absence of this control does not by itself enable compromise: an attacker still needs an initial-access vector, a credential, or a vulnerability. Its absence does, however, materially raise the likelihood and impact of successful exploitation and forfeits the ability to detect and respond.

DETECTIVE (not PREVENTIVE) because the control surfaces unsafe states after they occur rather than preventing them. Paired with alerting and runbook integration it becomes the trigger for response, but it does not stop an action at the control plane. This exercises the methodology distinction that a HIGH DETECTIVE differs operationally from a HIGH PREVENTIVE even though both carry the same severity colour: the responder receives an alert and acts; the preventive control would have refused the action entirely.

Maps cross-provider to CIS AWS Foundations v7.0.0 (CloudTrail enabled in all regions, log file validation enabled, log delivery to dedicated S3 bucket), CIS Microsoft Azure Foundations v6.0.0 (Activity Log diagnostic settings, log retention, immutability), CIS GCP Foundation v5.0.0 (Cloud Audit Logs configured for all services, sink to immutable storage), CIS OCI Foundation v3.1.0 (Audit retention and centralisation), NIST SP 800-53 rev5 AU-2 (event logging), AU-9 (protection of audit information), and AU-12 (audit record generation), ISO/IEC 27001:2022 A.8.15 (logging), and ISO/IEC 27017:2015 CLD.12.4.5 (monitoring of cloud services).