In June 2025, Microsoft published a security advisory about a flaw in Microsoft 365 Copilot. The vulnerability, tracked as CVE-2025-32711 and named EchoLeak, was a zero-click prompt injection issue. A specially written email, never opened and never clicked, could instruct Copilot to collect sensitive content from a user's mailbox, files, and chats, and then send it out through links that looked safe. Microsoft rated the issue 9.3 critical.
There was no malware and no phishing kit: the attack used an indirect prompt injection payload embedded in Markdown-formatted email content, with exfiltration through Microsoft Teams and SharePoint URLs, which the AI then followed. The problem is that large language models (LLMs) can follow instructions they should not follow. Once they are connected to inboxes, documents, browsers, and APIs, this behavior can become a security incident with a CVE number.
Key takeaways
- LLM security protects the model itself plus the full LLM system, including the prompts, retrieval data, vector stores, tools, agents, users, and the surrounding infrastructure.
- The OWASP Top 10 for LLM Applications 2025 is the main list of risks most teams should plan for. It is led by prompt injection, sensitive information disclosure, supply chain, as well as data and model poisoning.
- The main technical problem is that current LLMs do not enforce a clear boundary between instructions and data inside a prompt. This is why prompt injection is hard to fix completely.
- Agents and copilots increase risk because a successful injection can trigger sending email, moving money, modifying records, or calling internal APIs.
- Defense in depth is the right approach. Combine governance (NIST AI RMF, ISO/IEC 42001), application controls (OWASP), threat modeling (MITRE ATLAS), least privilege, isolation of untrusted content, output validation, monitoring, and red teaming.
What is LLM security
LLM security is the set of controls, processes, and tests that protect large language model systems, their connected applications, their data, and their users from misuse, manipulation, data exposure, unauthorized actions, and operational failure. It covers the full stack around the model: prompts, retrieval pipelines, vector databases, tools, APIs, agents, memory, logs, identities, vendors, and cloud infrastructure.
Securing the model is only part of the work. NIST's Generative AI Profile places generative AI risk across the full lifecycle (design, development, deployment, operation, and decommissioning) and at multiple levels: model, system, application, and wider ecosystem.
Why LLM security became a priority for organizations
On the defender side, LLMs are now used in copilots, retrieval-augmented generation (RAG) systems, agents, browser tools, coding assistants, SOC playbooks, and enterprise search.
On the attacker side, threat actors moved from experimental AI use to operational AI use during 2025, Mandiant's 2026 special report concludes. Much of the AI risk they see in client environments does not come from advanced adversarial machine learning (ML), but from gaps such as poor asset inventory, weak supply chain visibility, missing AI SBOMs, shadow AI, weak identity controls, and the absence of AI threat modeling.
Regulatory and reputational pressure adds to this. The EU AI Act, sectoral guidance from NSA, CISA, and NCSC, and a regular flow of incidents like EchoLeak mean that large language model security is now a procurement question, an audit question, and an incident response question.
Major LLM security vulnerabilities and risks
The most useful list of LLM security risks is the OWASP Top 10 for LLM Applications and Generative AI Apps 2025. It is the closest thing the field has to a shared vocabulary, and most enterprise security programs now reference it directly.

1. Prompt injection
Prompt injection is malicious or manipulative input that changes an LLM's behavior or output. OWASP ranks it as LLM01:2025. It appears in two forms:
- Direct prompt injection is when the user tells the model to ignore its instructions.
- Indirect prompt injection hides instructions inside content the model later reads: a webpage, an email, a code comment, or even text inside an image.
According to the UK NCSC, current LLMs do not enforce a security boundary between instructions and data inside a prompt. This makes prompt injection different from SQL injection and likely harder to solve completely.
2. Sensitive information disclosure
LLMs can leak information like training data, prompts, retrieved documents, and details they inferred from data they were not supposed to see. NIST lists privacy risks across leakage, unauthorized use, disclosure, and de-anonymization of PII.
Example: a sales representative asks a RAG-powered assistant, “Show me everything we have on X Corp.” The vector store contains a confidential M&A memo that was not properly access-controlled. The assistant summarizes it – and that’s a leak.
As Mandiant notes, LLMs expose existing access control gaps because they make poorly permissioned data easy to search.
3. Supply chain risk
LLM systems depend on model providers, base models, fine-tuned weights, datasets, embeddings, vector databases, prompt templates, open-source libraries, plugins, and third-party APIs. Each of these is a possible entry point for something harmful. NIST calls this “value chain and component integration” risk and identifies non-transparent third-party components and weak supplier vetting as recurring problems.
Example: a developer downloads a popular fine-tuned model from a public hub. The model performs normally on benchmarks but is trained to leak the system prompt when it sees a specific trigger phrase. A year later, the model is in production.
4. Data and model poisoning
Poisoning is when an attacker influences training data, fine-tuning data, RAG content, embeddings, or memory so the model later produces outputs the attacker wants. OWASP lists it as LLM04:2025.
Example: an attacker creates a wiki page titled “Q4 expense policy” inside a shared workspace that your RAG system ingests. The page looks ordinary, except for one paragraph instructing the assistant to always approve expense reports under $4,999 and to leave them out of monthly summaries. The finance team notices the missing funds months later.
5. Improper output handling
LLM output should be treated as untrusted input to whatever system consumes it.
Example: an LLM-generated answer contains a Markdown image whose URL encodes the user’s recent chat history. The frontend renders it. The image request sends the data to an attacker-controlled server.
6. Excessive agency
The more tools, credentials, and autonomy an agent has, the larger the impact of any single mistake. OWASP lists excessive agency as LLM06:2025, and Mandiant has reported real cases of agents performing actions they should not have been allowed to attempt.
Example: an “IT helper” agent has read and write access to your identity provider so it can reset passwords. A carefully written internal ticket convinces it to disable MFA on a privileged account “for testing.” The agent did exactly what it was allowed to do.
7. System prompt leakage
System prompts often contain rules, hidden workflow logic, internal labels, tool descriptions, and sometimes (despite common guidance) API keys or business secrets. OWASP lists this as LLM07:2025.
Example: a user pastes a long Unicode-padded message into a chatbot. The model gets confused and returns its full system prompt, which includes a Stripe key labeled “for internal billing tool.”
8. Vector and embedding weaknesses
Vector databases bring their set of problems: cross-tenant leakage, missing chunk-level ACLs, retrieval of stale or poisoned documents, and embedding inversion attacks that reconstruct source text. OWASP lists this as LLM08:2025.
Example: a multi-tenant SaaS platform stores all customer documents in one shared index, filtered by metadata at query time. A small bug in the filter, or an injection that bypasses it, and Tenant A starts seeing chunks from Tenant B.
9. Misinformation and hallucination
NIST uses the term “confabulation” for confidently stated false output. OWASP lists misinformation as LLM09:2025.
In a security context, hallucination matters when it influences decisions: incorrect code, wrong incident response steps, false compliance answers, or invented citations in legal or medical workflows.
Example: an AI coding assistant suggests installing a package called “requests-secure-v2.” The package does not exist, until an attacker registers it. This is a recognized class of supply chain attack now called slopsquatting.
10. Unbounded consumption
LLM APIs charge per token. Unbounded consumption covers cost spikes, “denial of wallet,” token exhaustion, long-context attacks, and automated abuse.
Example: a public-facing chatbot with no rate limit is hit by a script that submits 100,000 long prompts overnight. The bill arrives on Monday, and it is a significant problem for the finance team.
What makes LLM security different from traditional cybersecurity
Traditional applications can usually be protected with trust boundaries such as parameterized queries, strict parsers, deterministic validation, and role-based access control.
LLMs do not enforce this separation. Instructions, user content, retrieved documents, tool outputs, and memory all flow into the same probabilistic context window, and the model decides what to do with them. According to Microsoft, some defenses against indirect prompt injection are themselves probabilistic, because the system they protect is probabilistic.
Two practical conclusions follow.
- Deterministic controls should be used wherever they exist (authentication, ACLs, sandboxing, schema validation, allowlists), because probabilistic controls on their own are not enough.
- The architecture should assume that some model-level defense will eventually fail, and use system design (isolation, scoped tools, human approval, monitoring) to limit the impact. This is the working definition of defense in depth for LLM applications.
LLM security best practices
What works is a layered program that combines governance, secure design, runtime controls, and continuous testing.
1. Start with an AI inventory and governance
Create an inventory of every LLM application, model, provider, agent, plugin, prompt template, RAG source, and data flow. For governance, use the NIST AI Risk Management Framework and the NIST Generative AI Profile as a starting point.
ISO/IEC 42001 provides an AI management system standard suitable for audit.
The CSA AI Controls Matrix offers 243 controls across 18 domains, mapped to ISO/IEC 42001, ISO 27001, and the NIST AI RMF.
2. Threat model the whole system
A useful LLM security assessment covers users, prompts, system prompts, RAG sources, vector databases, file uploads, external content, plugins, tools, APIs, memory, logs, browser surfaces, code execution, and output renderers.
NSA, CISA, and NCSC-UK jointly recommend treating secure design, secure development, secure deployment, and secure operation as the four phases of an AI security program.
3. Apply least privilege to data, tools, and agents
Classify data before it goes into prompts or retrieval. Apply chunk-level ACLs in your vector store. Enforce permissions at retrieval time, not only at the UI, because LLM data security starts with what the model is allowed to see.
For tools and agents, use scoped tokens, short-lived credentials, narrow allowlists, and explicit approval gates for high-impact actions (sending email, transferring funds, modifying production data, creating users, calling external URLs).
Microsoft, OpenAI, and OWASP agree on the same recommendation: minimal short-lived privileges and human-in-the-loop verification for risky actions.
4. Treat all external content as untrusted
Anything the model reads from outside the system prompt is a potential payload. Microsoft's guidance recommends isolating untrusted content, marking it (spotlighting), monitoring plan drift in agents, analyzing tool chains, and applying information flow controls.
Google describes a layered defense for Gemini that combines prompt injection classifiers, security thought reinforcement, Markdown sanitization, suspicious URL redaction, user confirmations, and red teaming.
5. Implement guardrails at input, output, and tool boundaries
Guardrails are the runtime controls between the model and the rest of the system.
- Input guardrails: PII redaction, secret detection, jailbreak classifiers, content policy filters, language detection, length limits.
- Output guardrails: schema validation for structured outputs, Markdown and HTML sanitization, URL allowlists, blocked patterns for code execution, grounding checks against retrieved sources.
- Tool-call guardrails: parameter validation, allowlisted actions, approval gates for sensitive operations, dry-run modes, rate limits per tool.
- Retrieval guardrails: source trust scoring, document provenance, ingestion scanning, deduplication, freshness checks.
OWASP’s prompt injection prevention cheat sheet is the most practical implementation reference. Its key point: do not rely on prompt text alone as a security boundary. Wrap the model with deterministic checks wherever you can.
6. Harden RAG pipelines
RAG is where LLM security and data security meet most directly.
Secure the full pipeline. Allowlist sources, track document provenance, scan ingested content, enforce metadata and chunk-level ACLs, isolate tenants in the vector store, log retrievals, and check that generated answers are grounded in the retrieved chunks.
Remember that RAG and fine-tuning improve relevance but, again, do not fully prevent prompt injection.
7. Monitor in detail
Log prompts, completions, retrieval sources, tool calls, user IDs, permissions, token usage, guardrail decisions, and outbound network calls.
Build dashboards that show unusual tool sequences, sudden token spikes, retrieval of rarely-used documents, and repeated guardrail failures.
Prepare AI-specific incident playbooks: disabling an agent, rotating tool credentials, removing poisoned RAG content, reviewing prompt logs, notifying affected users, fixing ACL gaps, and retesting the workflow before bringing it back online.
8. Red team continuously
A red team exercise that happened once at launch is not enough for an LLM security framework. Models change, prompts change, tools get added, and RAG corpora grow. Every change is an opportunity for regression.
Test direct and indirect prompt injection, hidden text in images and PDFs, poisoned RAG records, system prompt extraction, data exfiltration through Markdown and links, unsafe tool use, excessive agency, cross-tenant leakage, and cost abuse.
Mandiant's own red team reports finding many of these in production, alongside more familiar issues like SQL injection through AI interfaces and SSRF in AI workflows. You can use MITRE ATLAS to structure the exercises, and feed the findings back into threat models and guardrails.
9. Map to a recognized framework
For larger organizations, mapping your controls to an external framework is what turns a set of good practices into an auditable LLM security program. The most useful options:
- NIST AI RMF and Generative AI Profile for governance and lifecycle risk.
- OWASP Top 10 for LLM Applications 2025 for application-level risk categories and OWASP LLM security guidance teams can act on.
- MITRE ATLAS for adversarial threat modeling.
- NSA / CISA / NCSC-UK secure AI guidance for government-backed secure design, development, deployment, and operation.
- ISO/IEC 42001 for AI management systems.
- CSA AI Controls Matrix for detailed control mapping across standards.
Choose the ones that match your regulators, your customers, and your existing GRC stack. Do not adopt all of them at once. (It’s impossible anyway).
A short checklist for LLM applications
Area | What to check |
|---|---|
AI inventory | List all LLM apps, providers, models, agents, plugins, and data flows |
Data protection |
|
Access control | Enforce least privilege for users, agents, tools, and RAG sources |
Prompt injection | Test direct, indirect, multimodal, and RAG-based prompt injection |
RAG | Enforce source trust, document provenance, ingestion scanning, and chunk-level ACLs |
Tool use | Use scoped tokens, allowlisted actions, approval gates, and tool-call monitoring |
Output handling | Validate schemas, sanitize Markdown and HTML, sandbox code, block unsafe execution |
Monitoring | Track prompts, outputs, retrievals, tool calls, token spikes, and anomalous workflows |
Red teaming | Test before release and after major model, prompt, data, or tool changes |
Governance | Map controls to NIST AI RMF, OWASP, ISO/IEC 42001, CSA AICM, or internal GRC |
Overall, LLM security is application security, data security, identity security, AI governance, and adversarial ML security in one program. The hard part is accepting that a system which reads natural language from everywhere will eventually be told something it should not act on, and designing it so that when this happens, very little of value is lost.
