# Agentic AI Security Spec

> Machine-readable security rules for agentic AI systems.
> Drop this into your AI development workflow as a reference spec.
>
> **Published by Casaba Security** | casaba.com
> Full guide: https://www.casaba.com/guides/agentic-ai-security/
> Assessment: https://www.casaba.com/guides/agentic-ai-security/checklist/
> V1.2 | April 2026

## How to use this spec

This is a condensed, rule-based version of the Casaba Agentic AI Security Guide. Each rule is a testable security requirement for agentic AI systems.

**Tiers:** Rules are tagged `[F]` Foundational, `[O]` Operational, or `[A]` Advanced.
- `[F]` = Non-negotiable before production. Start here.
- `[O]` = Within 90 days of deployment. Demonstrates maturity.
- `[A]` = Leading practice from real-world assessment experience.

**Usage:** Include this file in your project as `SECURITY-SPEC.md`, add it to your AI assistant's context (CLAUDE.md, .cursorrules, system prompt), or use it as a RAG source for security review automation.

---

## 1. Threat Modeling

- `[F]` MUST document a threat model that includes agentic-specific risks: prompt injection, memory poisoning, tool misuse, privilege escalation, and cross-tenant leakage.
- `[F]` MUST model indirect prompt injection (XPIA) for every data source agents ingest - RAG indices, emails, tool outputs, web content, database records.
- `[F]` MUST distinguish between traditional LLM risks (single request-response) and agentic risks (persistent memory, tool access, multi-step workflows, inter-agent communication).
- `[O]` MUST update threat models when agents gain new tools, data sources, or inter-agent communication paths.
- `[O]` MUST model multi-step attack chains - sequences of individually harmless actions that combine into data exfiltration or privilege escalation.
- `[A]` SHOULD map the threat model against OWASP Agentic Top 10 (ASI01-ASI10) and identify coverage gaps.
- `[A]` SHOULD model AI-augmented attacks - threat actors using agentic AI to accelerate reconnaissance, credential harvesting, and lateral movement at machine speed.
- `[A]` SHOULD model XPIA across four attack layers: perception-layer injection (hidden content), reasoning-layer manipulation (statistical bias), memory and learning attacks (persistent poisoning), and action-layer attacks (explicit instruction embedding).

## 2. Architecture

- `[F]` The orchestration layer - deterministic backend code - MUST enforce all security policies. Model outputs are advisory, never authoritative. The model proposes; code decides.
- `[F]` Each agent MUST have a narrow, defined scope with a minimal tool set. No agent should have broad "figure it out" latitude.
- `[F]` MUST apply defense-in-depth at every boundary: UI, orchestration, tools, data, infrastructure. Each layer MUST assume layers above it may be compromised.
- `[F]` Policies MUST be expressed as code or configuration, not as English instructions in prompts. A prompt saying "do not access financial records" is not an access control.
- `[O]` High-risk actions MUST use plan-verify-execute or controlled breakpoints. The model generates a structured plan; deterministic code validates it; a scoped executor runs only approved steps.
- `[O]` Multi-agent communication channels MUST be schema-validated with explicit trust boundaries between agents.
- `[A]` MUST enforce the "lethal trifecta" constraint: no single component ever has access to sensitive data + untrusted content + broad network access simultaneously. At most two of three.
- `[A]` MUST apply Least Agency as a design principle: agents default to read-only. Write access, external calls, and code execution are opt-in capabilities, each individually justified.

## 3. Identity and Access

- `[F]` Every agent MUST be a first-class identity with a unique ID, registered owner, defined environment, and risk classification. MUST NOT use shared service accounts between agents.
- `[F]` RBAC and ABAC policies MUST constrain which data and tools each agent can access, scoped per user and per tenant.
- `[F]` Agent credentials MUST NOT be exposed to model context windows or client-side code. All secrets MUST live in a centralized secrets manager.
- `[O]` High-privilege access MUST be granted just-in-time with short-lived, task-scoped tokens that auto-expire on task completion.
- `[O]` All privilege escalation events, delegation chains, and high-risk actions MUST be logged with full identity attribution traceable to the originating user.
- `[A]` SHOULD implement non-human identity (NHI) governance at enterprise scale: agent identity lifecycle management, shadow agent detection, orphaned credential cleanup.
- `[A]` SHOULD calibrate behavioral baselines for machine identity patterns (API call frequency, tool usage mix) rather than human UBA assumptions.

## 4. Orchestration and Tools

- `[F]` Tools MUST be atomic, single-purpose functions with server-side JSON schema validation on every invocation. MUST NOT expose generic "run SQL" or "make HTTP request" tools.
- `[F]` Per-agent and per-tenant tool allowlists MUST be enforced with default deny. If a tool is not explicitly allowlisted, the agent cannot call it.
- `[F]` High-impact tools (write, delete, transfer, execute) MUST require human approval or propose-confirm flows before execution.
- `[O]` Network egress from agents and tool servers MUST be restricted to approved endpoints only.
- `[O]` All MCP servers and plugins MUST go through security review before production and be registered in a centralized inventory.
- `[A]` MCP servers SHOULD be code-signed and schema-pinned. Tool descriptor changes MUST trigger alerts. Automated discovery SHOULD surface shadow MCP deployments.
- `[A]` SHOULD defend against the full MCP threat taxonomy: tool poisoning, full schema poisoning (FSP), resource content poisoning, impersonation/typosquatting, session ID leakage, cross-component context poisoning, and agent-aware dynamic cloaking.
- `[A]` MCP server validation SHOULD be performed under conditions matching actual agent access patterns, not only human-initiated test requests, to detect cloaking behavior.

## 5. Data, RAG, and Memory

- `[F]` Data MUST be classified and access-controlled at the record and document level with tenant-aware filters.
- `[F]` MUST minimize data sent to models. MUST NOT send secrets to models. MUST NOT send raw PII or full tables when summaries suffice.
- `[F]` RAG ingestion MUST include content validation. Retrieval controls MUST prevent untrusted content from overriding system instructions.
- `[O]` Memory stores MUST have tiered security treatments (session, short-term, long-term) with promotion rules and poisoning detection.
- `[O]` PII and secrets MUST be detected and redacted before storage. Agents MUST access data through mediation layers, never raw SQL.
- `[A]` SHOULD defend against cross-session memory poisoning: memory stores segmented by user/tenant/task, writes carry provenance metadata, periodic re-baselining removes unverified entries.
- `[A]` Agent outputs re-ingested as future context SHOULD carry lower trust scores than primary source material. The system MUST distinguish its own prior outputs from verified external sources.
- `[A]` SHOULD defend against latent memory poisoning - data that appears innocuous at write time but activates maliciously on retrieval. Anomaly detection on retrieval patterns and behavioral output monitoring SHOULD supplement write-time validation.
- `[A]` Externally-sourced few-shot examples SHOULD be treated as untrusted input. Contextual learning within the context window can steer agent behavior without explicit instructions and without touching persistent memory.

## 6. Frontend and UX

- `[F]` Standard web security MUST be in place: strong auth, CSRF protection, strict CSP, security headers, hardened session management.
- `[F]` All model output MUST be treated as untrusted. MUST NOT inject raw model output via innerHTML or equivalent. Security-focused sanitization MUST run before rendering.
- `[F]` The UI MUST communicate agent capabilities and limitations. Domain-specific disclaimers MUST be present for health, legal, and financial contexts.
- `[O]` Export and sharing flows MUST show users what will be transmitted and apply PII/secret scrubbing before sending.
- `[O]` Users MUST be able to flag harmful or incorrect outputs and escalate to humans.
- `[A]` Rich agent content (diagrams, code blocks) SHOULD render in sandboxed iframes with dedicated origins and strict CSP.
- `[A]` SHOULD implement prompt injection-aware UI design: visually separate system content, user input, and untrusted external content.
- `[A]` Agent-facing interface reviews SHOULD test for web-standard obfuscation vectors: CSS visibility manipulation, off-viewport positioning, HTML comment injection, and accessibility attribute abuse (aria-label, alt text, metadata).

## 7. Infrastructure

- `[F]` Agent workloads MUST run in hardened, least-privilege containers with network segmentation preventing direct access to core data stores.
- `[F]` A model gateway MUST mediate all LLM traffic with authentication, rate limits, and request/response logging.
- `[F]` High-risk tools (code execution, file access) MUST run in additional sandboxing (gVisor, Firecracker, Kata) with no default network access and strict resource limits.
- `[O]` Supply chain risks MUST be managed: SBOMs for agent dependencies, vulnerability scanning, model version tracking.
- `[O]` Cloud resources MUST follow provider-specific security best practices: workload identity, managed key vaults, network policies, mTLS.
- `[A]` All agent-generated code MUST be treated as untrusted input: static analysis before execution, ephemeral per-task environments destroyed after completion.
- `[A]` Control plane and data plane MUST be strictly segregated. Agent runtimes MUST NOT access infrastructure management APIs.

## 8. Guardrails and Responsible AI

- `[F]` Pre-input, mid-execution, and post-output guardrails MUST be implemented as independent services, not as prompt instructions.
- `[F]` Responsible AI harm categories MUST be assessed for the use case: toxicity, bias, misinformation, privacy violations, domain-specific risks.
- `[F]` When a guardrail fires or a classifier is uncertain, MUST default to block or escalate. Fail-safe, not fail-open.
- `[O]` Guardrails MUST be configurable per product, per agent, and per domain.
- `[O]` Bias and fairness MUST be explicitly addressed in sensitive contexts with regular evaluation.
- `[A]` Domain-specific constraints (financial, clinical, legal) SHOULD be encoded as deterministic policies, not model-dependent behavior.
- `[A]` Hallucination and grounding checks SHOULD run on every output where correctness is critical.
- `[A]` Human oversight workflows SHOULD be designed to resist approval fatigue, automation bias, and obfuscated action summaries. High-risk actions SHOULD be surfaced clearly regardless of total approval volume.
- `[A]` Oversight interfaces SHOULD display both the agent's natural language summary and the underlying log of API calls, file operations, and network requests.

## 9. Monitoring, Incident Response, and Testing

- `[F]` End-to-end structured audit logging MUST capture identity, request, actions, and outcome for every agent interaction in immutable stores.
- `[F]` AI-specific incident response runbooks MUST be defined for data leakage, tool misuse, memory poisoning, and harmful outputs.
- `[F]` Automated security tests (prompt injection, tool authorization, data isolation, RAI harms) MUST run in CI/CD pipelines.
- `[O]` Behavioral baselines and anomaly detection MUST monitor agents in production with alerts for injection attempts, tool misuse, and behavioral drift.
- `[O]` Periodic adversarial red teaming MUST be performed. Findings MUST feed back into automated test suites.
- `[A]` SHOULD implement decision-layer forensics: log full reasoning context, tool selection rationale, and goal state changes. Agentic IR SHOULD include automated containment triggers.
- `[A]` Testing SHOULD cover agentic-specific scenarios: multi-step attack chains across agent boundaries, cross-session memory poisoning, cascading failure simulation, and goal drift over extended interactions.

---

## Quick Reference

**Total rules:** 74 (29 Foundational, 19 Operational, 26 Advanced)

**Key concepts unique to this spec:**
- **Least Agency** (Sec 2) - Minimize autonomy, not just access. Default read-only.
- **Lethal trifecta** (Sec 2) - Never combine sensitive data + untrusted content + broad network access in one component.
- **NHI governance** (Sec 3) - Non-human identity lifecycle, shadow agent detection, attribution chain auditing.
- **MCP threat taxonomy** (Sec 4) - Tool poisoning, schema poisoning, impersonation, session ID leakage, context poisoning.
- **Cross-session memory poisoning** (Sec 5) - Persistent poisoning across sessions and tenants with provenance tracking.
- **Decision-layer forensics** (Sec 9) - Log reasoning chains, not just I/O. Enable action replay for incident investigation.
- **XPIA attack spectrum** (Sec 1) - Four-layer model: perception, reasoning, memory/learning, and action-layer injection.
- **Latent memory poisoning** (Sec 5) - Data innocuous at write time, activates on retrieval. Write-time validation alone is insufficient.
- **Agent-aware cloaking** (Sec 4) - MCP servers fingerprint agent vs. human access and serve different content.
- **Oversight as attack surface** (Sec 8) - Approval fatigue, automation bias, and obfuscated action summaries weaponize human review.

**Framework alignment:**
- OWASP Top 10 for Agentic Applications 2026 (ASI01-ASI10)
- OWASP Top 10 for LLM Applications 2025
- NIST AI Risk Management Framework
- MITRE ATLAS

---

*This spec is derived from the Casaba Security Agentic AI Security Guide.*
*Full guide: https://www.casaba.com/guides/agentic-ai-security/*
*Assessment tool: https://www.casaba.com/guides/agentic-ai-security/checklist/*
*Contact: https://www.casaba.com/contact/*
*License: (c) 2002-2026 Casaba Security, LLC. All rights reserved.*