Core Design Principles

These principles should drive every agentic AI system. They are not aspirational goals. They are engineering requirements that determine whether your system fails safely or fails catastrophically.

3.1 Assume Prompt Injection and Model Compromise

Treat any untrusted text as potentially adversarial. That includes user prompts, RAG-retrieved content, tool outputs (emails, webpages, files), and inter-agent messages. All of it.

Design your system so that even if the model "breaks character" and follows malicious instructions, it still cannot:

  • Directly execute code, SQL, or HTTP requests
  • Bypass authorization checks
  • Access data outside its policy scope
  • Trigger high-risk actions without external verification

This is the core Zero Trust assumption for agentic AI. The model will eventually be tricked. Your architecture needs to handle that gracefully.
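One way to make "the model cannot directly execute anything" concrete is a closed action catalog: the model may only name an action from a fixed set, and deterministic code vets the proposal before anything runs. This is a minimal sketch with hypothetical names (`ALLOWED_ACTIONS`, `vet_proposal`), not a prescribed API:

```python
from dataclasses import dataclass

# Assumed catalog: the model can only *name* one of these actions.
# It never supplies raw SQL, code, or URLs for direct execution.
ALLOWED_ACTIONS = {"summarize_ticket", "lookup_faq"}

@dataclass
class ProposedAction:
    name: str
    args: dict

def vet_proposal(proposal: ProposedAction) -> ProposedAction:
    """Reject anything outside the closed catalog (fail closed)."""
    if proposal.name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {proposal.name!r} is not permitted")
    # Authorization, argument schemas, and rate limits would also be
    # checked here by deterministic code, regardless of what the model said.
    return proposal
```

Even if an injected prompt convinces the model to propose `run_sql`, the proposal is rejected before it reaches any executor.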

3.2 Orchestration Layer as Policy Brain

The model is a suggestion engine, not the authority. It suggests actions. It does not decide what actually happens.

The orchestration layer - your backend code - must:

  • Authenticate the user and agent
  • Enforce RBAC/ABAC and data access policies
  • Decide which tools may be used, with what parameters
  • Apply guardrails before and after model calls
  • Log and audit every step

Model outputs are advisory. Only deterministic code and policies should decide what gets executed. If your LLM can call a tool without the orchestration layer approving it, you have a gap.
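The approval gap can be closed with a single deterministic checkpoint that every tool call must pass through. A minimal sketch, assuming a simple per-role tool policy (the role names and policy table are illustrative):

```python
# Assumed policy table: which roles may invoke which tools.
ROLE_TOOL_POLICY = {
    "support_agent": {"search_kb", "draft_reply"},
    "analytics_agent": {"run_report"},
}

def approve_tool_call(role: str, tool: str, params: dict) -> bool:
    """Deterministic gate: the model suggests, this code decides."""
    allowed = ROLE_TOOL_POLICY.get(role, set())
    if tool not in allowed:
        return False
    # Parameter-level checks (schemas, tenant scoping, rate limits)
    # would run here before the call is dispatched.
    return True
```

If every tool dispatch in your codebase goes through a function like this, the model has no path to execution that bypasses policy.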

3.3 Constrained Agency and Least Privilege

Give each agent:

  • A narrow mandate (e.g., "support FAQ summarizer," "sales analytics reporter")
  • Minimal tools needed to fulfill that mandate
  • Minimal data visibility - per-tenant, per-role, per-task

Never give an agent:

  • Direct access to production SQL
  • Generic "run any HTTP request" tools
  • Broad "execute arbitrary code" capabilities without sandboxing

High-risk operations must require a separate step: human approval or a dedicated privileged service that enforces its own checks. The agent proposes; something else disposes.
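The propose/dispose split can be sketched as a risk-classified dispatcher: low-risk tools execute, high-risk tools park as pending until an external approval arrives. The risk table and tool names are assumptions for illustration:

```python
# Assumed risk classification; unknown tools default to high risk.
RISK = {"read_record": "low", "refund_payment": "high"}

def execute(tool: str, approved_by_human: bool = False) -> tuple:
    """The agent proposes; a human (or privileged service) disposes."""
    if RISK.get(tool, "high") == "high" and not approved_by_human:
        return ("pending_approval", tool)
    return ("executed", tool)
```

Note the fail-safe default: a tool missing from the table is treated as high risk, not low.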

3.4 Separation of Concerns

Structure your system into clearly separated layers:

  • UI/Presentation - AuthN, UX, basic validation, rendering
  • Orchestration/Policy - Context assembly, policy enforcement, tool mediation
  • Model Inference - Prompt building, LLM invocation
  • Tool & Data Access - APIs, MCP, database, storage
  • Observability & Governance - Central logging, audit, monitoring

Concentrate security logic in the orchestration and tools/data layers. This lets you swap models without having to redo your security architecture. If your security depends on a specific model behaving a specific way, it is fragile.
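The model-swap property falls out naturally if the inference layer is hidden behind a narrow interface and all policy lives in the orchestrator. A structural sketch (class and method names are illustrative, not a framework):

```python
from typing import Callable

class Orchestrator:
    """Orchestration/Policy layer: assembles context, calls the model,
    then enforces policy on the advisory output."""

    def __init__(self, model_complete: Callable[[str], str]):
        # Model Inference layer: any prompt -> text callable. Swapping
        # the model means swapping this one dependency.
        self.model_complete = model_complete

    def handle(self, user_input: str) -> str:
        prompt = self._assemble_context(user_input)   # least data
        suggestion = self.model_complete(prompt)      # advisory only
        return self._enforce_policy(suggestion)       # deterministic

    def _assemble_context(self, user_input: str) -> str:
        return user_input[:2000]   # e.g., input limits from the UI layer

    def _enforce_policy(self, suggestion: str) -> str:
        # Guardrails, authz, and tool mediation would run here.
        return suggestion
```

Because `_enforce_policy` runs after every completion, none of its checks depend on which model is plugged in.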

3.5 Least Data and Purpose Limitation

For each step, send the model only what it needs for that task. Nothing more.

Avoid sending:

  • Whole raw tables or entire documents when a summary suffices
  • PII, PHI, or financial details unless absolutely required
  • Secrets and credentials (never send these to the model at all)

Prefer:

  • Aggregated or masked views (counts, statistics)
  • Domain-specific APIs that abstract raw data

The goal is to minimize the blast radius if a prompt is compromised. If the model never sees the data, the data cannot be exfiltrated through the model.
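In practice this means a context-assembly step that aggregates and masks before anything reaches the prompt. A minimal sketch, assuming rows with an `email` field (the masking rule is illustrative):

```python
def build_context(rows: list) -> dict:
    """Send the model counts and masked views, never the raw table."""
    return {
        "row_count": len(rows),
        # Keep only the first character of the local part: 'alice@...' -> 'a***'
        "emails_masked": [r["email"].split("@")[0][:1] + "***" for r in rows],
    }
```

The model can still answer "how many customers matched?" without ever holding an address it could be tricked into echoing back.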

3.6 Defense-in-Depth and Fail-Safe Defaults

Layer protections at every boundary:

  • UI - Input limits, CSRF protection, XSS protection
  • Orchestration - AuthZ, rate limits, guardrails
  • Tools - Schema validation, business logic checks
  • Data - Row/document-level access, encryption, retention policies
  • Infrastructure - Sandboxing, network policies, monitoring

If a check fails or a classifier is uncertain, default to block or escalate - not "allow and hope." For important flows, degrade functionality (e.g., switch to read-only mode) rather than failing open. Every layer should assume the layers above it may have already been bypassed.
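The "block or escalate, never allow-and-hope" rule reduces to a small decision function. A sketch with an assumed classifier score in [0, 1] and an illustrative threshold:

```python
from typing import Optional

def guardrail_decision(classifier_score: Optional[float],
                       threshold: float = 0.8) -> str:
    """Fail closed: missing or uncertain signals never pass through."""
    if classifier_score is None:
        return "block"        # the check itself failed -> do not proceed
    if classifier_score < threshold:
        return "escalate"     # uncertain -> human review or read-only mode
    return "allow"
```

The important property is the ordering: the error path and the uncertain path are both handled before "allow" is ever reachable.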

3.7 Governance and Traceability

You must be able to answer these questions for any impactful action your system takes:

  • Which user initiated it?
  • Which tenant?
  • Which agent (and which version)?
  • What prompts, context, and tools were involved?
  • What data was touched?
  • What changed?

To support this, you need:

  • Agents with assigned owners, purposes, and risk classifications
  • All high-risk operations logged in immutable, append-only stores
  • Documented runbooks and owners for AI-specific incidents

If you cannot reconstruct what happened and why, you cannot investigate incidents, satisfy auditors, or improve your system. Traceability is not optional - it is a hard requirement.
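One common way to get append-only, tamper-evident audit records is hash chaining: each entry commits to the hash of the previous one, so editing any record breaks every hash after it. A minimal sketch (field names mirror the questions above; this is an illustration, not a production audit store):

```python
import hashlib
import json
import time

def append_audit(log: list, entry: dict) -> dict:
    """Append a hash-chained audit record; `entry` carries who/what/which."""
    prev = log[-1]["hash"] if log else "genesis"
    record = {**entry, "ts": time.time(), "prev": prev}
    # Hash the record (minus its own hash) with stable key ordering.
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record
```

Verifying the chain end to end then answers "was this history altered?" with a single pass over the log.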

3.8 Least Agency

Section 3.3 covers Least Privilege - constraining what an agent can access and which tools it can call. Least Agency goes further. It constrains how much an agent can decide on its own.[1]

We borrowed this from the Principle of Least Privilege and extended it. Autonomy is a design choice, not a default. An agent with a blank check to figure things out on its own is an insider threat waiting to be manipulated.

Apply Least Agency by:

  • Decomposing tasks into narrow, single-purpose agent steps rather than giving one agent broad latitude to "figure it out"
  • Preferring stateless agents over stateful ones where possible - the less an agent remembers, the less there is to poison
  • Requiring human-in-the-loop confirmation for actions above a defined risk threshold, not just for "dangerous" actions in the abstract
  • Defaulting to read-only mode. Write access, external calls, and code execution should be opt-in capabilities, each justified individually

When an agent has too much freedom, every vulnerability in its reasoning becomes an avenue for exploitation. Least Agency limits the blast radius of a compromised agent by limiting what it was allowed to do in the first place. This principle connects directly to how you design non-human identity governance and MCP tool access.
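Read-only-by-default with opt-in capabilities and a defined risk threshold can be expressed directly in an agent's configuration type. A sketch with illustrative names and an assumed 0-10 risk score:

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    """Read-only by default; every extra capability is an explicit opt-in
    that must be justified individually."""
    can_write: bool = False
    can_call_external: bool = False
    can_execute_code: bool = False
    risk_threshold: int = 7   # actions at or above this need a human

def needs_human_approval(caps: AgentCapabilities, action_risk: int) -> bool:
    """Gate by the defined threshold, not an abstract 'dangerous' label."""
    return action_risk >= caps.risk_threshold
```

An agent instantiated with defaults can read and nothing else; granting it more agency is a visible, reviewable diff.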

3.9 Strong Observability

Section 3.7 establishes that you need traceability. Strong Observability elevates that from a logging requirement to a design principle: if you cannot observe it, you cannot secure it.[2]

Traditional application logging captures API calls, HTTP status codes, and system events. That is not enough for agentic systems. Agents make decisions - they select tools, interpret context, choose between possible actions, and sometimes chain dozens of steps together before producing a result. If you only log the inputs and outputs, you are blind to everything that happened in between.

Strong Observability means logging at the decision layer:

  • Which tools the agent considered and which it selected, with the reasoning context that led to the choice
  • What data entered the agent's context window at each step - including RAG retrievals, tool outputs, and inter-agent messages
  • Goal state changes - when the agent's objective shifted during a multi-step workflow, and what caused the shift
  • Confidence signals - if the model expressed uncertainty, hedged, or retried, that is forensically relevant

This is not just for incident response. Strong Observability also enables drift detection - spotting when an agent's behavior gradually changes over time as its memory and context evolve. Without decision-layer telemetry, you will not notice the drift until something breaks.
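The decision-layer bullets above translate into a per-step telemetry record. A sketch of one assumed event schema (serialize it and ship it to your log sink alongside ordinary application logs):

```python
import time

def decision_event(step: int, considered: list, selected: str,
                   context_sources: list, goal: str) -> dict:
    """One decision-layer record per agent step (illustrative schema)."""
    return {
        "ts": time.time(),
        "step": step,
        "tools_considered": considered,       # what the agent weighed
        "tool_selected": selected,            # what it actually chose
        "context_sources": context_sources,   # RAG docs, tool outputs, messages
        "goal": goal,                         # track goal-state drift across steps
    }
```

Emitting this at every step is what makes drift detection possible: you can diff the `goal` and `tools_considered` fields across runs instead of inferring behavior from inputs and outputs alone.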

References

  1. OWASP Top 10 for Agentic Applications 2026 - ASI07: Excessive Agency and Autonomy. OWASP GenAI Security Project. owasp.org
  2. OWASP Top 10 for Agentic Applications 2026 - ASI09: Lack of Observability. OWASP GenAI Security Project. owasp.org

Need help applying these principles?

We review agentic AI architectures and help teams build security in from the start. If you are deploying agents in production, let's talk.


Agentic AI Security Guide V1.1 · Changelog