Core Design Principles
These principles should drive every agentic AI system.
3.1 Assume Prompt Injection and Model Compromise
Treat any untrusted text as potentially adversarial:
- User prompts
- RAG content
- Tool outputs (emails, webpages, files)
- Inter-agent messages
Design so that even if the model "breaks character" and follows malicious instructions, it still cannot:
- Directly execute code, SQL, or HTTP
- Bypass authorization
- Access data outside policy
- Trigger high-risk actions without external checks
This is the core Zero Trust assumption for agentic AI.
3.2 Orchestration Layer as Policy Brain
The model is an optimizer, not the authority.
The orchestration layer (backend) must:
- Authenticate the user and agent
- Enforce RBAC/ABAC and data access policies
- Decide which tools may be used, with what parameters
- Apply guardrails before and after model calls
- Log and audit every step
Model outputs are advisory. Only deterministic code and policies should decide what actually gets executed.
3.3 Constrained Agency and Least Privilege
Give each agent:
- A narrow mandate ("support FAQ summarizer", "sales analytics reporter")
- Minimal tools needed to fulfil that mandate
- Minimal data visibility (per-tenant, per-role, per-task)
Never give an agent:
- Direct access to production SQL
- Generic "run any HTTP request" tools
- Broad "execute arbitrary code" without sandboxing
High-risk operations must require a separate step (human approval or a dedicated privileged service).
3.4 Separation of Concerns
Structure the system into clearly separated layers:
- UI / Presentation – AuthN, UX, basic validation, rendering
- Orchestration / Policy – Context assembly, policy enforcement, tool mediation
- Model inference – Prompt building, LLM invocation
- Tool & data access – APIs, MCP, DB, storage
- Observability & governance – Central logging, audit, monitoring
Concentrate security logic in orchestration + tools/data so models can be swapped without redoing security.
3.5 Least Data and Purpose Limitation
For each step, send the model only what is needed for that task:
Avoid:
- Whole raw tables or entire documents when a summary suffices
- PII/PHI/financial details unless absolutely needed
- Secrets and credentials (never send at all)
Prefer:
- Aggregated or masked views (e.g., counts, statistics)
- Domain-specific APIs that abstract raw data
Minimize the blast radius if a prompt is compromised.
3.6 Defense-in-Depth and Fail-Safe Defaults
Layer protections at:
- UI (input limits, CSRF, XSS protection)
- Orchestration (authZ, rate limits, guardrails)
- Tools (schema validation, business logic checks)
- Data (row/document-level access, encryption, retention)
- Infrastructure (sandboxing, network policies, monitoring)
If a check fails, or a classifier is uncertain, default to block or escalate, not "allow and hope." For important flows, degrade functionality (e.g., read-only mode) rather than failing open.
3.7 Governance and Traceability
You must be able to answer, for any impactful action:
- Which user? Which tenant?
- Which agent (and version)?
- What prompts, context, and tools were involved?
- What data was touched? What changed?
Requirements:
- Agents have owners, purposes, and risk classifications.
- All high-risk operations are logged in immutable, append-only stores.
- There are documented runbooks and owners for AI incidents.