What is generative AI security?

In 2025, a zero-click Microsoft 365 Copilot data-leak vulnerability was found: a single malicious email could inject hidden instructions and quietly cause Copilot to exfiltrate sensitive organizational data from the user’s context.

That kind of incident shows why generative AI security has become its own discipline. It combines classic cybersecurity with new controls for models, prompts, and training data.

Key takeaways

Gen AI security protects generative AI models, applications, training data, infrastructure, outputs, and users across the full AI lifecycle.
It is broader than AI safety: safety covers harm, bias, and reliability, while gen AI cybersecurity covers operational risk such as data leaks, model abuse, and unsafe agent actions.
The defining risk is prompt injection, because large language models (LLMs) do not reliably separate trusted instructions from untrusted data inside a prompt.
Other generative AI security risks include sensitive data disclosure, supply chain compromise, data poisoning, misinformation, runaway costs, etc.
Effective protection is defense in depth: governance, secure development, generative AI data security, access controls, as well as input and output validation.

What is generative AI in cybersecurity?

Generative AI security is the practice of protecting generative AI models, AI applications, data, infrastructure, outputs, and users from cyber threats, misuse, data exposure, model manipulation, unsafe tool actions, and compliance failures across the AI lifecycle. It applies to every stage: from design and development to deployment, operation, and decommissioning, which is the lifecycle framing used in the NIST Generative AI Profile.

It also helps to separate generative AI security from two related ideas.

AI safety covers wider harm prevention, bias, reliability, harmful content, societal risk.
AI governance covers policy, accountability, and risk management.

Generative AI security focuses on cyber and operational risk: confidentiality, integrity, availability, privacy, system misuse, model abuse, agent misuse, data leakage, supply chain risk, and incident response.

The term covers several sub-areas:

Large language model (LLM) security. That means protecting both the model and the apps built on top of it from prompt injection, jailbreaks, model extraction, or unsafe outputs.
AI prompt security. It requires treating prompts as untrusted data, with input validation and monitoring.
Generative AI data security covers protecting training data and outputs from leakage, poisoning, or unauthorized access.
AI API security means protecting the APIs that expose models.
AI code security. Reviewing AI-generated code for vulnerabilities and licensing issues, and securing the developer workflows that use AI coding assistants.
Finally, AI agent security. It means controlling what autonomous agents can read, call, send, and change, plus logging and human approval for high-impact actions.

Organizations can use these 6 categories to divide the work between application security, data security, cloud security, and identity teams.

What are the major security risks and vulnerabilities associated with generative AI?

Generative AI security risks come in two groups: traditional risks that apply to any software, and novel risks that come from how generative AI systems work. The clearest view of generative AI security issues is the OWASP Top 10 for LLM Applications, with extra context from NIST, NCSC, and joint government guidance.

The main risks of generative AI are:

Prompt injection — hidden or malicious instructions in user input or in retrieved content steer the model to ignore its rules, leak data, or misuse tools.
Sensitive information disclosure. Models or apps can reveal personal data or confidential business information through prompts, outputs, or logs.
Supply chain risk. Third-party models and plugins can be tampered with or poisoned.
Data and model poisoning. Hackers can manipulate training data to insert bias or backdoors.
Improper output handling. Apps sometimes trust model output too much and pass it straight into databases or APIs without validation.
Excessive agency. Agents might have too broad permissions or no human approval for high-impact actions.
System prompt leakage. Internal instructions and security logic from the system prompt can end up in the user’s hands.
Vector and embedding weaknesses. RAG knowledge bases and vector databases can get poisoned or used to leak content across users.
Misinformation and confabulation. Fabricated outputs can cause different kinds of damage.
Unbounded consumption. Excessive token use and denial-of-wallet attacks can lead to runaway cloud bills or service outages.

A single incident can combine several genAI security risks: for example, an indirect prompt injection leaks the system prompt, causes the agent to misuse a tool, and then exfiltrates customer data through an unvalidated output channel. That's why defense in depth is the only reliable approach.

How has generative AI affected security?

For defenders, generative AI can speed up code review, vulnerability research, incident triage, threat analysis, patch development, you name it. For attackers, it can reduce the skill needed for phishing, reconnaissance, malware adaptation, exploit development, and abuse of AI-enabled tools.

The concern became sharper in 2026 with Claude Mythos: Anthropic chose not to release Mythos publicly because the model was capable at finding software vulnerabilities and turning them into working exploits. Some media describe the shift as AI becoming “an automated hacker.”

Instead of a broad release, Mythos was made available behind closed doors to a small set of trusted partners so they could use it defensively: find flaws, disclose them responsibly, and patch systems before similar capabilities spread more widely.

It is the core security change generative AI creates: it compresses the time between vulnerability discovery and exploitation. Attackers no longer need a zero-day to abuse your chatbot – a document on a public website may be enough. And with Mythos-class models, the next risk is that attackers may become much better at finding and weaponizing zero-days themselves.

GenAI security frameworks

There is no single standard for genAI security yet, but several frameworks have emerged that cover different angles: risk management, secure development, threats, controls, and management systems. A good program usually borrows from several of them.

OWASP Top 10 for LLM Applications. This is the most practical risk taxonomy for teams building or buying generative AI applications. Use it for application-level threats such as prompt injection, sensitive information disclosure, supply chain risk, and improper output handling.
NIST AI Risk Management Framework (AI RMF). A voluntary framework that helps organizations build trustworthiness into the design, development, use, and evaluation of AI products and systems. It is organized around four functions: Govern, Map, Measure, and Manage. The companion NIST AI 600-1 (Generative AI Profile) applies the framework to generative AI specifically.
NIST SP 800-218A. Adds AI-specific practices to the secure software development framework. It is the right starting point for engineering teams that build, fine-tune, or operate foundation models.
MITRE ATLAS. A knowledge base of adversary tactics, techniques, and case studies for attacks on AI-enabled systems, modeled on the better-known ATT&CK framework. Useful for threat modeling and red teaming.
Google Secure AI Framework (SAIF). It maps AI security across data, infrastructure, model, and application components, with a matching set of controls including input validation, output sanitization, least-privilege agent permissions, user approval for sensitive actions, and AI incident response management.

You do not need to adopt all of these. Many organizations end up with NIST AI RMF for governance, OWASP for application risks, MITRE ATLAS for threat modeling, and SAIF for control categories.

13 generative AI security best practices

There is no single product that solves the problem of generative AI security. The best practices combine guidance from organizations such as NIST, OWASP, NCSC, NSA, CISA, Google, and Microsoft. They focus on the usage risks that show up most often in real deployments.

1. Set policies for approved AI use

Decide which AI use cases are approved, which are prohibited, what data can and cannot go into AI tools, how models and vendors are reviewed, who owns risk for each system, and how incidents are handled. NIST recommends connecting these policies to existing risk processes rather than building a parallel rulebook.

A practical starting point is a one-page AI use policy that answers three questions for every employee: which tools are approved, what data is off-limits, and who to contact when in doubt.

2. Inventory your generative AI systems and data

Build an inventory of every AI app, model, dataset, RAG knowledge base, vector database, agent, plugin, and third-party provider in use. For each entry, record a business owner (who wants the tool), a data owner (who is responsible for the information), and a risk owner (who answers when something goes wrong).

3. Apply secure development practices to AI systems

Secure software development practices still apply, with a few additions for AI. NIST SP 800-218A extends the Secure Software Development Framework to generative AI and dual-use foundation models, with practices for data management, model development, evaluation, deployment, acquisition, and lifecycle controls.

In practice, that means version-controlling prompts and training data, reviewing model changes the way you review code changes, and documenting what each model was trained on.

4. Protect data across the AI lifecycle

A practical generative AI data security checklist:

Classify and label sensitive data before it reaches AI systems.
Keep credentials, customer records, source code, and regulated data out of prompts unless there is a documented business reason and a controlled channel.
Apply data loss prevention (DLP) to prompts, uploads, outputs, and logs.
Apply least-privilege access control to data used by AI apps, RAG systems, and agents.
Track data provenance: where each dataset came from, who owns it, what license applies, and what consent basis covers it.
Limit retention of prompts and responses to the minimum useful period.
Audit access to vector databases, embeddings, logs, and agent memory the same way you audit access to production databases.
Treat AI outputs as sensitive when they were generated from confidential context.

NIST also recommends monitoring AI-generated content for personal or sensitive data exposure, including detection methods for text, images, video, and audio.

5. Validate inputs and outputs

Treat user input, retrieved content, model output, and tool responses as untrusted. The Google SAIF controls recommend input validation, output validation, and output sanitization before model output reaches applications, extensions, or users.

OWASP also notes that RAG and fine-tuning do not fully fix prompt injection, and that defenses need regular updates as attackers learn new tricks.

6. Apply access control and least privilege to AI

Use managed identities instead of long-lived API keys.
Avoid hardcoded API keys in code, config files, or notebooks.
Rotate credentials on a schedule and after any suspected exposure.
Restrict which APIs, files, databases, and SaaS apps each AI system can reach.
Require human approval for sensitive actions such as production deployments, financial transactions, or mass data exports. Google SAIF specifically recommends user approval for actions that alter user data or act on a user's behalf.

7. Secure RAG and vector databases

RAG security deserves its own checklist because the knowledge base is now part of the trust boundary.

Apply access control to who can write to the knowledge base, not just who can read from it.
Validate sources and require trusted ingestion paths.
Check metadata and document signatures where possible.
Enforce document-level authorization, so a user only retrieves what they are entitled to see.
Monitor for poisoned or suspicious content, including documents with unusual instruction-like text.
Isolate vector store namespaces in multi-tenant environments.

8. Protect model and code integrity

Google SAIF recommends integrity protection for all data, models, and code used to produce AI models. It means signing model artifacts, restricting who can publish to internal model registries, and reviewing third-party models before use, especially when they come from public hubs.

For organizations that fine-tune models, the fine-tuning data deserves the same protection as production code. A poisoned fine-tuning dataset can introduce backdoors that are nearly invisible during normal testing.

9. Run AI red teaming and adversarial testing

Test generative AI systems. The NIST Generative AI Profile recommends red teaming against malicious code generation, enhanced phishing content, prompt injection, adversarial prompts, data poisoning, membership inference, and model extraction. Microsoft AI Red Team provides guidance and tools for similar tests.

10. Monitor, log, and detect

NCSC recommends logging enough information to identify suspicious activity, including LLM input and output, tool use, and API calls.

Feed logs into the same SIEM and analytics tools that handle the rest of the environment, and set alerts for the obvious patterns: sudden cost spikes, repeated failed tool calls, retrieval of sensitive documents by unusual identities, or output that contains things like API keys.

11. Manage supply chain and vendor risk

Vendor due diligence for generative AI should cover model provenance, training data sources, security certifications, support for private deployment, data handling commitments, incident response practices, and the right to audit.

NIST recommends approved provider lists, third-party risk monitoring, and contract clauses that allow evaluation of third-party generative AI processes and standards. Add AI-specific questions to existing vendor reviews instead of building a separate process.

12. Bring shadow AI under control

Shadow AI (the use of unsanctioned AI tools with company data) is usually a symptom of unmet business demand. A blanket block tends to push users toward personal devices and personal accounts, where the security team has no visibility at all.

A more durable approach has four parts: provide approved AI tools that work, write clear policies on what data can and cannot be used, apply DLP to prompts and uploads, and train employees on the rules.

13. Update incident response for AI-specific events

Existing incident response plans usually do not mention genAI-specific risks. Add new playbooks for:

Prompt injection that led to data exposure or unauthorized actions.
Poisoned datasets or RAG documents.
Model abuse, including misuse of a public AI feature.
Data exposure through prompts, outputs, or logs.
Rogue or malfunctioning agent actions.
Compromise of a model endpoint or its credentials.
Suspicious API usage patterns.
Unexpected cost spikes.
Incidents involving third-party AI providers.

For each scenario, decide who gets paged, what gets disabled (the agent, the model endpoint, the API key, the integration), how affected users are notified, and what evidence is preserved. Google SAIF includes AI incident response management as a control, and NIST recommends rehearsed third-party incident response plans, which is a polite reminder that the first time you test the playbook should not be during a real incident.

The bottom line

Generative AI security is lifecycle security. It has both classic AI application security and controls for prompts, models, training data, agents, and the cloud workloads that run them. The defining problem is prompt injection, but it is far from the only one, and no single product or filter will close every gap.

Learn next

Network security

Zero Trust

Secure Access Service Edge (SASE)

Cloud security

Virtual Private Network (VPN)

Identity access management (IAM)

Firewall

Access control

PCI-DSS

Regulatory compliance