Data is what attackers want. Compute is rented, identities are revocable, networks are reconfigurable — but stolen customer records, regulated health information, payment data, and source code carry the financial and regulatory consequences that turn a misconfiguration into a breach disclosure. Every other control domain in this corpus — identity, network, logging and detection, incident response — exists in service of preventing, detecting, or recovering from unauthorized access to data. Data protection therefore sits at the centre of the threat model, and the most cost-effective control posture is the one calibrated to the sensitivity of the data being defended.
Calibration starts with classification. A four-tier scheme labels every dataset by sensitivity before any encryption, access, or retention decision is made. Classification drives encryption-key custody (provider-managed for low-sensitivity data; customer-managed with hardware-security-module backing for regulated data), retention floors (compliance-driven minimums for PCI, HIPAA, SOX), backup posture (immutable for ransomware-targeted data), and data-loss-prevention coverage (full content inspection for restricted data; metadata-only sampling for internal data). Controls applied without classification are theatre: encrypting public marketing copy with a customer-managed key wastes operational effort, while encrypting personal health information with a default service key concedes key access to the provider's operations staff in ways the threat model may not tolerate.
This page treats encryption in transit as a cross-link to the general network principles page rather than duplicating it here. One canonical treatment per cross-cutting topic keeps the corpus internally consistent and avoids the divergence that creeps in when the same material is maintained in two places. Forward-links to aws/data.html, azure/data.html, gcp/data.html, and oci/data.html carry the provider-specific implementations of the principles below.
Data classification
A four-tier classification scheme is the minimum viable taxonomy: Public (intended for unrestricted disclosure — marketing collateral, published documentation, open-source code), Internal (intended for employees and contractors — engineering wikis, internal roadmaps, non-customer operational data), Confidential (subject to contractual or competitive harm if disclosed — customer lists, pricing models, unannounced product plans, security findings), and Restricted (subject to statutory, regulatory, or contractual penalty if disclosed — personally identifiable information, protected health information, payment card data, controlled unclassified information). Restricted data inherits every control applied to Confidential plus additional regulatory-class-specific controls.
Regulatory classes that overlay the four-tier scheme include PII (personally identifiable information — names, addresses, identifiers; jurisdictionally defined, with GDPR Art. 4(1) and CCPA §1798.140 as the typical references), PHI (protected health information under US HIPAA 45 CFR §160.103; subject to the HIPAA Security Rule's administrative, physical, and technical safeguards), PCI (payment card data under PCI DSS v4.0; cardholder data and sensitive authentication data carry separate handling requirements), and CUI (controlled unclassified information under US federal contracts, governed by NIST SP 800-171). A single record can carry multiple regulatory class labels — a healthcare payment record is simultaneously PHI and PCI — and the controls applied are the union, not the intersection.
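The union rule can be illustrated with a minimal Python sketch. The control names and the per-class mapping below are hypothetical placeholders chosen for illustration; they are not taken from any of the cited standards.

```python
# Hypothetical control sets per regulatory class (illustrative names only).
CONTROLS_BY_CLASS = {
    "PII": {"encrypt_at_rest", "access_logging", "subject_access_workflow"},
    "PHI": {"encrypt_at_rest", "access_logging", "audit_trail_retention"},
    "PCI": {"encrypt_at_rest", "access_logging", "mask_card_number"},
    "CUI": {"encrypt_at_rest", "access_logging", "cui_marking"},
}


def required_controls(labels):
    """Return the union of the controls for every regulatory label on a record."""
    required = set()
    for label in labels:
        required |= CONTROLS_BY_CLASS[label]
    return required


# A healthcare payment record carries both PHI and PCI labels.
healthcare_payment = required_controls({"PHI", "PCI"})
```

Taking the intersection instead would silently drop the class-specific controls (here `audit_trail_retention` and `mask_card_number`), which is why the union is the safe default.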
Classification is most reliably enforced when applied at write time as a label that travels with the data. Provider tag-based labelling (AWS resource tags, Azure resource tags and Microsoft Purview sensitivity labels, GCP labels and Data Catalog tags, OCI defined tags) makes classification queryable and policy-actionable: a deny policy can prohibit Restricted-tagged buckets from being made public; a key-access policy can require additional principals for keys protecting Restricted data; a backup policy can require longer retention for Restricted-tagged volumes. NIST SP 800-60 vol 1 provides the federal taxonomy for classifying information types and is the authoritative reference for any organization mapping its scheme to FIPS 199 impact levels.
Classification without enforcement is a documentation exercise. Every classification scheme this corpus references therefore assumes a tagging-and-policy enforcement loop: data is tagged at creation, policies condition on tags, and a periodic scan re-classifies datasets that drift (for example, a bucket that started as Internal and accumulates customer records becomes Restricted by content even if its tag is stale). The data-loss-prevention section below describes the scanning side of that loop.
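The policy half of that loop can be sketched as a tag-conditioned check. This is a provider-neutral illustration, not any provider's policy language: the `Bucket` type, the `classification` tag key, and the fail-closed treatment of untagged resources are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumption: only Public-tier data may be exposed publicly, and a bucket
# with a missing or unknown tag is denied (fail closed).
PUBLIC_EXPOSURE_ALLOWED = {"Public"}


@dataclass
class Bucket:
    name: str
    tags: dict = field(default_factory=dict)


def evaluate_make_public(bucket: Bucket) -> str:
    """Return "allow" or "deny" for a request to make the bucket public."""
    tier = bucket.tags.get("classification")
    return "allow" if tier in PUBLIC_EXPOSURE_ALLOWED else "deny"
```

Denying untagged buckets is the design choice that keeps the loop honest: a resource created outside the tagging workflow cannot bypass the policy simply by having no label.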
Encryption at rest
Encryption at rest protects data persisted to disk — block volumes, object storage, managed databases, snapshots, backups, log archives. Every major cloud provider encrypts data at rest by default with a provider-managed service key (AWS S3 SSE-S3 and EBS default encryption; Azure Storage service-side encryption; GCP default at-rest encryption with Google-managed keys; OCI Object Storage and Block Volume default encryption). The cryptographic primitive in every case is AES-256-GCM or AES-256-XTS, vetted against FIPS 140-2 / 140-3 module requirements. The question is never whether data is encrypted at rest, but who controls the key.
Three key-custody models exist. Provider-managed (SSE) keys are generated, rotated, and accessed by the provider with no customer-visible key material; the customer's IAM permissions on the data resource gate access, but the provider's operations staff can decrypt under legal process or insider-threat scenarios. Customer-managed keys (CMK) — AWS KMS customer-managed keys, Azure Key Vault keys, GCP Cloud KMS CMEK, OCI Vault customer-managed keys — are still hosted by the provider's key-management service, but the customer controls the key policy (who can encrypt, who can decrypt, whether the key may leave the region) and can revoke access by deleting or disabling the key. Customer-supplied / external keys (CSE / HYOK / EKM) — AWS XKS, Azure Key Vault Managed HSM with BYOK, GCP External Key Manager, OCI Vault with HSM-backed keys — keep key material in customer-controlled hardware (often on-premises or in a sovereign HSM) and have the cloud KMS forward decryption requests to the external system; the customer can sever decryption globally by disabling the external key endpoint.
NIST SP 800-111 (Storage Encryption Technologies for End User Devices) and NIST SP 800-175B (Guideline for Using Cryptographic Standards in the Federal Government) define the algorithm-strength and key-handling baselines that the three custody models inherit. The custody decision is a threat-model decision, not a default. CMK is required whenever the threat model treats provider operations staff as a relevant adversary, whenever data sovereignty rules require the customer to attest key control, or whenever the customer must be the party that mediates legal demands (such as subpoenas) for key access. HSM-backed keys are required whenever FIPS 140-2 Level 3 (tamper-resistant, identity-based authentication) or Level 4 (tamper-active) module assurance is mandated by contract or regulation.
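The custody triggers above can be written down as a small decision function. This is a sketch of the reasoning only; the `ThreatModel` fields and the returned keys are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatModel:
    provider_staff_is_adversary: bool = False
    sovereignty_attestation_required: bool = False
    customer_mediated_legal_process: bool = False
    fips_level_3_or_4_mandated: bool = False


def custody_requirements(tm: ThreatModel) -> dict:
    """Map threat-model answers to the key-custody requirements they trigger."""
    needs_cmk = (
        tm.provider_staff_is_adversary
        or tm.sovereignty_attestation_required
        or tm.customer_mediated_legal_process
    )
    return {
        "customer_managed_key": needs_cmk,
        "hsm_backed_key": tm.fips_level_3_or_4_mandated,
    }
```

When every answer is false, neither requirement is triggered and a provider-managed key remains an acceptable choice for that dataset.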
[Diagram placeholder]
Figure 1 — Envelope encryption flow: a data-encryption key (DEK) generated by the provider KMS encrypts the object payload; the DEK itself is encrypted under a customer-managed key-encryption key (KEK) and stored alongside the ciphertext. Decryption requires the consumer to call KMS Decrypt against the KEK, which in turn requires the consumer's IAM principal to satisfy the KEK's key policy. Revoking the KEK invalidates every wrapped DEK without re-encrypting any object.
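The flow in Figure 1 can be traced with a toy model that uses only the Python standard library. Everything here is a stand-in: `ToyKMS` is a hypothetical in-memory class, not a provider SDK, and `_toy_cipher` is a SHAKE-256 keystream XOR used in place of AES-GCM. It provides no integrity protection and must not be used to protect real data. The IAM key-policy check is reduced to a single `enabled` flag.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream. Stand-in for AES-GCM; NOT secure."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Hypothetical in-memory KMS; the KEK never leaves this object."""

    def __init__(self):
        self._kek = secrets.token_bytes(32)
        self.enabled = True

    def _check(self):
        if not self.enabled:
            raise PermissionError("KEK is disabled")

    def generate_data_key(self):
        """Return a fresh DEK in plaintext together with its wrapped form."""
        self._check()
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return dek, nonce + _toy_cipher(self._kek, nonce, dek)

    def decrypt(self, wrapped_dek: bytes) -> bytes:
        """Unwrap a DEK under the KEK; fails once the KEK is disabled."""
        self._check()
        nonce, body = wrapped_dek[:16], wrapped_dek[16:]
        return _toy_cipher(self._kek, nonce, body)


def encrypt_object(kms: ToyKMS, payload: bytes) -> dict:
    """Encrypt the payload under a new DEK and store the wrapped DEK alongside."""
    dek, wrapped_dek = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    return {
        "wrapped_dek": wrapped_dek,
        "nonce": nonce,
        "ciphertext": _toy_cipher(dek, nonce, payload),
    }


def decrypt_object(kms: ToyKMS, envelope: dict) -> bytes:
    """Unwrap the DEK through the KMS, then decrypt the payload locally."""
    dek = kms.decrypt(envelope["wrapped_dek"])
    return _toy_cipher(dek, envelope["nonce"], envelope["ciphertext"])
```

Setting `kms.enabled = False` makes every later `decrypt_object` call raise `PermissionError` while the stored envelopes stay untouched, which mirrors the revocation property described in the caption.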
Key management
Key management is the operational discipline that turns a customer-managed key from a checkbox into a control. NIST SP 800-57 Part 1 Rev 5 (Recommendation for Key Management) defines the canonical key lifecycle: generation (inside an HSM or vetted RNG; never derived from low-entropy sources), distribution (key material never leaves the HSM in plaintext; only wrapped DEKs cross the boundary), storage (keys at rest inside AWS KMS, Azure Key Vault, GCP Cloud KMS, or OCI Vault are themselves wrapped under a service root key), use (key access conditioned on IAM identity, network origin, and request context), rotation (cadenced replacement of the active key version; old versions retained for decrypt-only operations until data is re-encrypted under the new version), and destruction (scheduled deletion with a mandatory waiting period to prevent accidental key loss).
Rotation cadence is a function of data sensitivity. Annual rotation is the minimum for any customer-managed key protecting Internal or Confidential data and is the cadence CIS recommends for symmetric KMS keys (CIS AWS Foundations v3.0.0 calls for annual rotation of KMS customer-managed keys; CIS Microsoft Azure Foundations v3.0.0 calls for Key Vault key rotation; CIS GCP Foundation v4.0.0 specifies CMEK rotation; CIS OCI Foundation v2.0.0 specifies Vault key rotation). 90-day rotation applies to keys protecting Restricted data and to keys associated with high-volume signing operations where cryptanalytic exposure scales with use. Immediate rotation is required whenever a key administrator role is revoked, whenever a key is suspected of compromise, or whenever an HSM tamper event is recorded.
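The cadence rules translate into a short due-date check. The sketch below assumes a 365-day year for "annual" and collapses the three immediate-rotation triggers into one boolean; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

# Maximum key-version age per classification tier, in days.
MAX_KEY_AGE_DAYS = {"Internal": 365, "Confidential": 365, "Restricted": 90}


def rotation_due(classification, last_rotated, today, immediate_trigger=False):
    """Return True when the active key version must be rotated.

    immediate_trigger covers a revoked key-administrator role, suspected
    compromise, or a recorded HSM tamper event.
    """
    if immediate_trigger:
        return True
    max_age = timedelta(days=MAX_KEY_AGE_DAYS[classification])
    return today - last_rotated >= max_age
```

For a key last rotated on 1 January 2024 and checked on 1 June 2024 (152 days later), the check is true for Restricted data and false for Confidential data.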
Key policy hygiene is where most customer-managed-key deployments break. The dominant failure mode is a key policy that grants kms:Decrypt or equivalent to "Principal": "*" with no condition, intended as a development convenience and forgotten in production. Every key policy in this corpus follows three rules: (1) no wildcard principals — every grant names an account, role, or workload identity; (2) least-privilege actions — encrypt-only principals get encrypt-only grants, decrypt-only principals get decrypt-only grants, key administrators get key-administration actions without data-plane decrypt; (3) cross-account or cross-tenant key sharing is an explicit deliberate decision, documented in the key tags and audited via a dedicated CloudTrail / Activity Log / Cloud Audit Logs / OCI Audit alert.
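A lint for the dominant failure mode can be sketched against an AWS-style JSON key policy loaded as a Python dict. The statement shape (`Statement`, `Effect`, `Principal`, `Condition`, `Sid`) follows the AWS policy grammar; the function names and the sample policy are hypothetical.

```python
def _is_wildcard(principal) -> bool:
    """True for "*" or for a principal map such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        for value in principal.values():
            values = value if isinstance(value, list) else [value]
            if "*" in values:
                return True
    return False


def wildcard_findings(policy: dict) -> list:
    """Return the Sids of Allow statements granting to "*" with no Condition."""
    findings = []
    for index, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if _is_wildcard(stmt.get("Principal")) and not stmt.get("Condition"):
            findings.append(stmt.get("Sid", f"statement-{index}"))
    return findings


# Hypothetical policy: one forgotten development grant, one named role.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "DevConvenience", "Effect": "Allow", "Principal": "*",
         "Action": "kms:Decrypt", "Resource": "*"},
        {"Sid": "AppDecrypt", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-decrypt"},
         "Action": "kms:Decrypt", "Resource": "*"},
    ],
}
```

Running `wildcard_findings(sample_policy)` flags only the `DevConvenience` statement. A stricter variant that enforces rule (1) literally would also flag conditioned wildcard grants.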
Separation of duties applies inside the KMS as much as it applies in IAM. Key administrators (who can change policy, schedule deletion, rotate) should not be data-plane users (who can call Decrypt against the key). Break-glass key access (for example, the principal authorised to recover when key-administrator access is lost) is held by a separately controlled identity with strong MFA, alerting on every use, and a documented runbook tying back to the incident response page. See the general IAM principles page for the underlying identity model that key policies attach to.
Encryption in transit
Encryption in transit is treated canonically on the general network principles page. The short version: TLS 1.2 is the floor, TLS 1.3 is preferred, mTLS is required for service-to-service inside the trust boundary, and provider-internal traffic between regions or between services is not exempt — assume the network is hostile and require encryption on every link. See general/network.html §Encryption in transit for the full treatment including IETF RFC 8446, cipher-suite policy, and provider mTLS implementations.
Retention, backup, and recovery
Retention is the deliberate decision to keep data for a defined period and to delete it afterwards. Both halves matter. Under-retention loses forensic evidence (logs deleted before an intrusion is discovered), breaches compliance minimums (PCI DSS requires audit logs retained 1 year with 3 months immediately available; HIPAA requires 6 years of audit-trail retention; SOX requires 7 years for financial records), and forfeits the ability to restore from before a corruption event. Over-retention accumulates regulated data past the lawful basis for processing (GDPR Art. 5(1)(e) storage-limitation principle), expands the breach blast radius unnecessarily, and inflates storage cost. Retention policy is therefore a per-classification, per-data-type decision, not a global default.
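The retention floors quoted above combine by taking the longest applicable minimum. A minimal sketch follows; the regime labels and function names are hypothetical, and the year counts are the ones stated in the paragraph above.

```python
# Retention floors in years, as stated in the text above.
RETENTION_FLOOR_YEARS = {"PCI DSS": 1, "HIPAA": 6, "SOX": 7}


def retention_floor_years(regimes) -> int:
    """Return the longest minimum retention across the applicable regimes."""
    return max((RETENTION_FLOOR_YEARS[r] for r in regimes), default=0)


def meets_floor(proposed_years: int, regimes) -> bool:
    """Check a proposed retention period against the combined floor."""
    return proposed_years >= retention_floor_years(regimes)
```

The floor is only the lower bound. The upper bound comes from the storage-limitation side of the policy, so a complete retention record would carry both a minimum and a deletion deadline.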
Backup is the operational system that makes retention recoverable. The cloud-adapted 3-2-1 backup rule reads: three copies of data, on two different storage classes or providers, with one copy isolated from the primary control plane. "Isolated" means the backup cannot be deleted or encrypted by the same identity that compromised the primary, which is what makes the design ransomware-resistant. Every provider now offers an object-lock or immutable-vault primitive that enforces this isolation in the storage service itself as a write-once-read-many (WORM) lock: AWS S3 Object Lock in Governance or Compliance mode (Compliance mode cannot be disabled even by the root account during the retention period), Azure Backup immutable vaults with locked policies, GCS retention policies with bucket locks, and OCI Object Storage retention rules with locked time-bound retention. Backup encryption MUST use keys distinct from the primary data keys; otherwise compromise of the primary key compromises the backup.
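The cloud-adapted 3-2-1 rule and the distinct-key requirement can be checked mechanically against an inventory of copies. In this sketch the `Copy` record and its fields are hypothetical, and "isolated" is approximated as an immutable copy held under a different control plane from the primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    storage: str          # storage class or provider holding the copy
    control_plane: str    # account or tenancy whose identities manage it
    immutable: bool       # object lock or immutable vault enabled
    key_id: str           # encryption key protecting the copy
    is_primary: bool = False


def check_3_2_1(copies) -> list:
    """Return the rule violations; an empty list means the layout passes."""
    problems = []
    primary = next(c for c in copies if c.is_primary)
    backups = [c for c in copies if not c.is_primary]
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.storage for c in copies}) < 2:
        problems.append("fewer than two storage classes or providers")
    if not any(b.immutable and b.control_plane != primary.control_plane
               for b in backups):
        problems.append("no immutable copy outside the primary control plane")
    if any(b.key_id == primary.key_id for b in backups):
        problems.append("a backup shares the primary encryption key")
    return problems


# Hypothetical layout matching the pattern described above.
good_layout = [
    Copy("prod-volume", "block", "workload-account", False, "key-primary", True),
    Copy("vault-copy", "object-archive", "security-account", True, "key-backup"),
    Copy("cross-region-copy", "object-archive", "security-account", True, "key-backup"),
]
```

`check_3_2_1(good_layout)` returns an empty list. A layout whose only backup is a same-account snapshot under the primary key fails all four checks.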
Recovery is the part that fails most often because it is rarely tested. Every backup policy in this corpus is paired with a documented restore runbook, a recovery point objective (RPO: the maximum acceptable data loss measured in time), and a recovery time objective (RTO: the maximum acceptable downtime). Restore is exercised at least quarterly against a non-production target; an untested backup is not a backup. Ransomware-specific recovery requires the additional discipline of validating, before any restore, that the immutable backup copy was not itself silently corrupted in the period before the ransomware encryption event. CISA's StopRansomware guidance documents the pre-recovery integrity-check pattern. See the incident response page for the full recovery workflow inside an active incident.
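RPO, RTO, and the quarterly restore cadence are all simple time comparisons, sketched below. The function names are hypothetical, and "quarterly" is approximated as 92 days, which is an assumption rather than a figure from the text.

```python
from datetime import datetime, timedelta


def rpo_met(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Data written after the last backup is lost; that window must fit the RPO."""
    return failure_time - last_backup <= rpo


def rto_met(measured_restore: timedelta, rto: timedelta) -> bool:
    """Compare the duration measured in a restore exercise against the RTO."""
    return measured_restore <= rto


def restore_test_overdue(last_test: datetime, now: datetime,
                         max_interval: timedelta = timedelta(days=92)) -> bool:
    """True when the last restore exercise is older than the allowed interval."""
    return now - last_test > max_interval
```

Feeding `rto_met` with durations measured during the quarterly exercise, rather than with estimates, is what turns the RTO from a stated target into a tested one.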
[Diagram placeholder]
Figure 2 — Backup immutability flow: primary data resides in the workload account; nightly snapshots replicate to a backup vault in a dedicated security account; the backup vault enforces object lock in Compliance mode for the retention period such that neither the workload account principals nor the security account principals can shorten or delete retention until the lock expires. A second cross-region copy provides geographic resilience.
Data loss prevention
Data loss prevention (DLP) closes the gap between intent (classification) and reality (where regulated data actually lives). Every cloud has a DLP scanning service: Amazon Macie (S3 sensitive-data discovery; identifies PII, financial data, credentials), Microsoft Defender for Cloud + Microsoft Purview (cross-workload data discovery and classification; integrates with sensitivity labels), Google Cloud DLP (deep content inspection for over 150 information types across storage, BigQuery, and streams), and Oracle Data Safe (database-focused discovery with sensitive-data masking). The choice between content inspection (the scanner reads payload bytes; highest fidelity, highest cost) and metadata-only sampling (the scanner reads object names, sizes, tags, and a content sample; lower fidelity, lower cost) is a per-classification decision.
DLP findings feed into the detection pipeline rather than acting in isolation. A Macie finding, a Defender for Cloud sensitivity-label alert, a Cloud DLP scan job summary, or a Data Safe sensitive-data discovery report becomes a control-plane event ingested by the SIEM — see general/logging.html for the alert-routing pattern. The detection feeds back into classification: data discovered in a location that does not match its sensitivity tag triggers a re-classification or relocation workflow.
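The feedback step can be reduced to a comparison between the tier a resource is tagged with and the tier a DLP finding implies. This sketch uses hypothetical function and action names, and it assumes a detected tier lower than the tag never triggers an automatic downgrade.

```python
# Four-tier scheme ordered from least to most sensitive.
TIER_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


def reconcile(tagged_tier: str, detected_tier: str) -> str:
    """Decide the workflow to trigger when a DLP finding arrives."""
    if TIER_RANK[detected_tier] > TIER_RANK[tagged_tier]:
        # Content is more sensitive than the label claims: the tag is stale
        # or the data is in the wrong place.
        return "reclassify_or_relocate"
    return "no_action"
```

For example, a bucket tagged Internal in which the scanner finds Restricted content yields `reclassify_or_relocate`, which matches the drift case described in the classification section.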
Cross-provider equivalence
The four providers implement the same data-protection primitives under different names. The table below maps the principles in this page to the provider-native service that delivers them. Each provider deep-dive — aws/data.html, azure/data.html, gcp/data.html, oci/data.html — carries the per-service configuration detail.
| Principle | AWS | Azure | GCP | OCI |
| --- | --- | --- | --- | --- |
| | | | | OCI Object Storage retention rules with retention-rule lock |
| Data loss prevention / sensitive-data discovery | Amazon Macie | Microsoft Defender for Cloud + Microsoft Purview | Google Cloud DLP (Sensitive Data Protection) | Oracle Data Safe |
Illustrative control — CMK with annual rotation
The control-box below is an illustrative example of the markup pattern every provider data page applies. It is not a production control entry — provider pages carry CLI and IaC remediations specific to each cloud — but the threat-model framing, severity reasoning, and compliance-mapping pattern transfer directly. The illustrative ID gen-data-ex-01 is reserved and is not reused as a real control identifier.
gen-data-ex-01
Customer-managed key with annual rotation for storage at rest
⛔ CRITICAL PREVENTIVE
CRITICAL because absence directly enables data exfiltration via the provider plane: a single-step path from key-access compromise to plaintext recovery, requiring no additional vulnerability or pivot. PREVENTIVE because the configured key policy and rotation schedule stop the unsafe state (provider-controlled key, no rotation schedule) from existing rather than detecting it after the fact. Maps cross-provider to CIS AWS Foundations v3.0.0 (KMS customer-managed-key rotation), CIS Microsoft Azure Foundations v3.0.0 (Key Vault key rotation), CIS GCP Foundation v4.0.0 (CMEK rotation), CIS OCI Foundation v2.0.0 (Vault key rotation), NIST SP 800-53 rev5 SC-12 (cryptographic key establishment), SC-13 (cryptographic protection), and SC-28 (protection of information at rest), ISO/IEC 27001:2022 A.8.24 (use of cryptography), and ISO/IEC 27017:2015 clause 10.1.2 (key management in cloud services).