GenAI Security Principles
Overview
This page establishes the authoritative, provider-neutral security principles for managed generative AI and large language model (LLM) APIs. Scope is limited to cloud-hosted managed model APIs: AWS Bedrock, Azure OpenAI Service, GCP Vertex AI (Gemini API), and OCI Generative AI Service. Self-hosted models, on-premises inference servers, and cloud training platforms (SageMaker, Azure ML, Vertex AI Training) are explicitly out of scope.
This page does not contain provider-specific controls. For controls, see the provider pages: Azure OpenAI Service Hardening (live — Phase 13 pilot); AWS Bedrock Hardening, GCP Vertex AI Hardening, and OCI Generative AI Hardening arrive in Phase 14. The principles on this page apply uniformly across all four providers; the provider pages translate each principle into auditable configurations, CLI commands, and IaC.
Before reading the provider GenAI pages, security engineers should be familiar with the three foundational cross-cutting domains: Identity and Access Management (workload identities, least-privilege access), Network Security (private endpoints, egress controls), and Logging and Monitoring (audit trail, anomaly detection). GenAI hardening extends these foundations — it does not replace them. A misconfigured IAM role or a missing private endpoint is still critical even when content-safety guardrails are enabled.
Threat Model
LLM-based systems present a fundamentally different threat surface than traditional cloud workloads. In a conventional data store or compute deployment, the application logic is authored and deployed by the developer; the attack surface is the API boundary, the network, and the credentials. In an LLM-based system, the model itself becomes a dynamic execution environment: it interprets natural language instructions from both the developer (system prompt) and the user (completion request), and — in agentic configurations — it issues tool calls that interact with real services. An attacker who can influence model inputs can influence model outputs and, in agentic systems, downstream actions. This is the core novelty of the LLM threat surface.
RAG pipelines combine two of these threats in a compound attack chain. LLM01:2025 (indirect prompt injection) and LLM08:2025 (vector and embedding weaknesses) interact when an attacker controls content that enters the knowledge base. The attacker poisons the retrieval index during ingestion (LLM08:2025), and those poisoned chunks are later retrieved and injected as adversarial instructions into the model context at inference time (LLM01:2025). The defence must address both stages: access-controlled ingestion pipelines with content provenance checks, and differential trust treatment of retrieved content at inference time (retrieved chunks should be treated as untrusted user input, not as trusted system-level instructions).
Cross-Cutting Principles
Nine architecture-level principles apply regardless of which managed model API you deploy. Each principle is stated once here and referenced from provider pages. Provider pages translate these principles into concrete configurations, CLI commands, and IaC — they do not redefine the principles themselves.
- 1. Input filtering and validation
- Validate and sanitise all user-supplied text before it reaches the model. Apply a content-safety classifier or prompt injection detector at the application layer before model invocation. Do not rely solely on provider-managed safety filters at model inference time — an application-layer check provides an independent, earlier defence that catches attacks before they consume model tokens or trigger potentially harmful model outputs.
- 2. Output filtering
- Apply content safety checks and PII redaction to model responses before returning them to the caller. Model outputs are untrusted data: they may reproduce memorised PII, generate harmful content that bypassed inference-time filters, or contain injection payloads designed to be executed by a downstream component (LLM05:2025). Treat model output as you would user-supplied input when passing it to other system components.
- 3. System-prompt isolation
- Treat the system prompt as trusted configuration, not as a user-addressable surface. Never include secrets (API keys, tokens, connection strings) in the system prompt — model extraction attacks can surface them. Enforce system-prompt isolation through provider-level controls (e.g., role separation in the OpenAI message format, Bedrock system-prompt role, Vertex AI system instruction field) rather than trusting the model to protect prompt confidentiality. Assume the system prompt will eventually be extracted; design it to be safe to disclose.
- 4. Content-safety guardrails
- Configure harm-category safety filters explicitly at recommended severity thresholds for your workload. Do not rely on provider defaults, which may be permissive or change without notice. Multiple independent layers are required: provider-managed inference-time filters are one layer; application-layer input and output checks are additional layers. Content filters are not a complete prompt-injection defence — they reduce the attack surface but cannot eliminate it.
- 5. Tool-use authorisation
- Scope agent tool permissions to the minimum specific resources and actions required for the task. Validate all tool invocations server-side before execution — do not let the LLM's tool-call output be executed without an authorisation check at the application layer. Treat every tool invocation as if it were initiated by an untrusted caller. This is the primary control against excessive agency (LLM06:2025) and the principal mitigation for the agentic AI blast-radius failure mode.
- 6. Rate limiting and quota management
- Apply per-user or per-application token and request quotas to prevent unbounded consumption (LLM10:2025). Instrument token usage per caller and alert on abnormal consumption patterns that suggest token-farming, automated abuse, or runaway inference loops. Rate limits at the application API gateway layer provide an additional check independent of provider-level quotas, which are typically per-deployment rather than per-caller.
- 7. Prompt and completion logging with PII redaction
- Log all model invocations with caller identity, timestamp, and request metadata for audit and anomaly detection. Redact PII from prompts and completions before log storage — not as a post-processing step, but as a gate before any log write. Raw unredacted prompts in logs create a secondary exfiltration surface: a log storage misconfiguration or over-privileged analyst can access every user input sent to the model. The logging pipeline and the model invocation pipeline carry equal data-sensitivity risk.
- 8. Data-residency for embeddings
- Confirm that the vector store and embedding compute are processed in the same geographic region as required by your primary data classification for the source documents. RAG pipelines move data across two additional processing stages (embedding generation and vector storage) beyond the model invocation itself — each stage must satisfy the data-residency requirements that govern the source documents. Providers offer region-locked embedding endpoints; verify the configuration rather than accepting defaults.
- 9. PII redaction before model invocation
- Strip or tokenise PII from user input before sending it to the model. This prevents PII from appearing in completions (LLM02:2025) and in prompt logs, and limits the data-sensitivity of the inference request itself. Reversible tokenisation (replacing PII with stable tokens before the prompt and substituting back in the response) allows PII-bearing applications to use managed model APIs without exposing PII to the model provider's inference infrastructure.
Common Misconfigurations
These five patterns appear protective but weaken your GenAI security posture. Each has been observed in production environments.
Enabling model invocation logging without a PII filter routes raw unredacted prompts to CloudWatch Logs, Log Analytics, or Cloud Audit Logs. Every user message, including any PII or sensitive context the user typed, is written verbatim to the log destination. Prompt logs become a secondary exfiltration surface with different access controls and retention policies than the primary application. A log-storage misconfiguration, an over-privileged analyst account, or a log-forwarding misconfiguration to a SIEM can expose user conversations at scale.
Remediation: Apply PII redaction before log storage — as a pre-write gate, not a post-processing step. Configure the logging pipeline to tokenise or mask PII fields before any log record is written to a durable destination.
Setting harm-category safety filters to BLOCK_NONE to reduce false positives eliminates the provider-managed output moderation layer entirely. Any jailbreak, adversarial prompt, or unintentional harmful completion produces unfiltered output that is returned to the caller. Operators frequently set BLOCK_NONE during development to reduce iteration friction, then leave it in place in production. A single misconfigured deployment with BLOCK_NONE becomes the entry point for adversarial users who test filters systematically.
Remediation: Tune thresholds rather than disabling. BLOCK_MEDIUM_AND_ABOVE is the minimum recommended setting for regulated contexts. Maintain separate deployment configurations for development and production environments with different filter thresholds.
Using a single shared API key or shared service account across development, staging, and production environments means a compromised development credential carries production blast radius. Audit logs cannot distinguish per-workload or per-environment activity, breaking forensic traceability. A developer workstation with the shared API key stored in a dotfile or IDE config is a direct path to production model access.
Remediation: Use per-workload, per-environment managed identities or IAM roles — never shared credentials. Each environment (development, staging, production) must have distinct identities with distinct audit trails and distinct permission scopes.
Granting an AI agent wildcard tool permissions — s3:*, lambda:*, Contributor role — "for flexibility" enables any successful prompt injection on the agent's tool-use execution path to perform arbitrary destructive or exfiltrating actions. The agent is the attack amplifier: a single injected instruction turns the agent's granted permissions into the attacker's effective permissions. Configuring wildcard tool permissions on agents is the primary agentic AI failure mode identified in OWASP LLM Top 10:2025. Every penetration test of an agentic AI system with broad permissions finds this path exploitable.
Remediation: Scope execution roles to the minimum specific resources required for each tool. Validate tool invocations server-side before execution. Treat the LLM's tool-call output as untrusted input that must pass an authorisation check before any action is taken.
Applying for the Limited Access exemption to disable Microsoft's human review of flagged completions — citing privacy or latency concerns — without alternative detection controls removes a compensating layer that catches attack patterns automated classifiers miss. Abuse monitoring human review exists specifically to identify novel jailbreaks, prompt-injection campaigns, and policy-violation patterns before they are formalised into automated detectors. Removing it creates a detection gap during the interval between novel attack emergence and detector update.
Remediation: Keep default abuse monitoring enabled. If disabling human review is a documented regulatory requirement, implement Defender for Cloud AI workload alerts and a structured incident-review process as compensating controls, and document the risk acceptance formally. Do not disable without a compensating control in place.
OWASP LLM Top 10:2025 Taxonomy
The OWASP LLM Top 10:2025 (published November 2024) supersedes the 2023 edition (v1.1). Provider pages in this guide map controls to stable LLMxx:2025 IDs. LLM07:2025 (System Prompt Leakage) and LLM08:2025 (Vector and Embedding Weaknesses) are entirely new entries in the 2025 edition — they do not exist in the 2023 list. The 2023 entry for position 7 was "Insecure Plugin Design" (a concept now subsumed by LLM06:2025 Excessive Agency); that 2023-edition mapping is incorrect when applied to the 2025 edition. Always verify which edition a mapping cites before using it as an audit reference.
| ID | Name | Brief description |
|---|---|---|
| LLM01:2025 | Prompt Injection | Direct and indirect prompt injection attacks overriding or supplementing model instructions via user input or retrieved context. |
| LLM02:2025 | Sensitive Information Disclosure | Model reproduces sensitive data, PII, credentials, or proprietary information from training memorisation or prompt context. |
| LLM03:2025 | Supply Chain | Vulnerabilities introduced via third-party models, datasets, plugins, or dependencies in the model delivery and deployment chain. |
| LLM04:2025 | Data and Model Poisoning | Malicious injection into training data, fine-tuning datasets, or model weights that corrupts behaviour for specific input patterns. |
| LLM05:2025 | Improper Output Handling | Downstream components processing LLM output without adequate validation, enabling XSS, SSRF, code injection, or command execution. |
| LLM06:2025 | Excessive Agency | LLM agents granted excessive permissions or autonomy executing unintended, destructive, or exfiltrating actions via tool calls. |
| LLM07:2025 | System Prompt Leakage | Disclosure of confidential system prompt contents to users via adversarial extraction queries. New entry in 2025 edition. |
| LLM08:2025 | Vector and Embedding Weaknesses | RAG pipeline manipulation via poisoned embeddings, adversarial retrieval, or vector-store access control failures. New entry in 2025 edition. |
| LLM09:2025 | Misinformation | LLM generates plausible but factually false or misleading information with downstream security or compliance consequences. |
| LLM10:2025 | Unbounded Consumption | Excessive resource use, denial of service, or cost-exhaustion attacks via token farming, runaway inference chains, or quota abuse. |
Each provider page in this guide maps its controls to these IDs in the compliance table column labelled "OWASP LLM Top 10:2025". Source: OWASP Top 10 for LLM Applications 2025 (accessed 2026-05).
EU AI Act: Provider vs. Deployer Obligations
The EU AI Act (Regulation (EU) 2024/1689) creates distinct obligations for cloud providers acting as general-purpose AI (GPAI) model providers and for enterprises that deploy AI APIs in their applications as deployers of high-risk AI systems. The enforcement timeline is staggered — the articles most relevant to cloud security hardening entered force at different dates spanning 2025 to 2026.
| Obligation | Article | Who it applies to | In force date |
|---|---|---|---|
| GPAI model provider transparency: technical documentation, training-data summary, copyright policy | Art. 53 (in force 2025-08-02) | Cloud providers placing GPAI models on the EU market (AWS, Azure, GCP, OCI) | 2025-08-02 |
| GPAI systemic-risk provider controls: adversarial testing, systemic risk assessment, serious incident reporting, cybersecurity measures for models trained with >1025 FLOPs | Art. 55 (in force 2025-08-02) | Major cloud providers whose foundation models qualify as systemic-risk GPAI | 2025-08-02 |
| High-risk AI deployer risk management: use per provider instructions, human oversight, input data management, impact assessment, incident monitoring, log retention ≥ 6 months | Art. 26 (in force 2026-08-02) | Enterprises deploying managed AI APIs in high-risk use cases (as defined in Annex III) | 2026-08-02 |
| High-risk AI system transparency to users: instructions for use enabling deployers to understand capabilities and limitations | Art. 13 (in force 2026-08-02) | Enterprises deploying in high-risk contexts; also obligations on providers to supply documentation | 2026-08-02 |
Art. 55 (in force 2025-08-02) for controls where the cloud provider holds the Art. 55 obligation, and Art. 26 (in force 2026-08-02) where the enterprise deployer holds the obligation.
Provider page compliance table cells follow the pattern: Art. 55 (in force 2025-08-02) for controls where the cloud provider holds the Art. 55 GPAI obligation, and Art. 26 (in force 2026-08-02) where the deployer holds the obligation under Art. 26. Every Art. reference in compliance cells includes the enforcement-date qualifier — a bare "Art. 26" without a date qualifier does not satisfy the audit-precision standard used throughout this guide.
Reading the Provider Pages
The provider GenAI pages translate the principles on this page into auditable, provider-specific controls. Each control article identifies the threat it addresses (by LLMxx:2025 code), the severity of the gap it closes, and the remediation steps including CLI commands and IaC. Compliance table columns on provider pages follow the 10-column GenAI schema: the four CIS Foundations benchmarks (marked n/a (no dedicated CIS GenAI benchmark) as no CIS benchmark exists for managed LLM APIs as of 2026-05), NIST SP 800-53 rev5, ISO/IEC 27001:2022, ISO/IEC 27017:2015, OWASP LLM Top 10:2025, NIST AI 600-1 (Jul 2024), and EU AI Act (2024/1689).
Azure OpenAI Service Hardening is the Phase 13 pilot page and is currently live. It covers nine controls addressing all AZOPENAI-01 through AZOPENAI-09 requirements, including Entra ID authentication enforcement, Azure AI Content Safety Prompt Shields, content filter baseline configuration, private endpoint, RBAC least-privilege, diagnostic logging, customer-managed key encryption, quota and token rate limiting, and abuse monitoring configuration.
AWS Bedrock Hardening, GCP Vertex AI Hardening, and OCI Generative AI Hardening are forthcoming in Phase 14. Phase 14 will follow the same 10-column compliance table schema validated by the Azure pilot and will add cross-provider equivalence links between all four provider GenAI pages. During Phase 13, equivalence link placeholders appear as HTML comments in the provider page source; live hrefs will be added in the Phase 14 sealing wave once all anchor targets exist.
Each control on the provider pages carries an equivalence callout noting the analogous control on sibling provider pages. These callouts are populated progressively as provider pages are authored — Phase 14 completes the full cross-provider equivalence map.