Core Design Principles

These principles should drive every agentic AI system.

3.1 Assume Prompt Injection and Model Compromise

Treat any untrusted text as potentially adversarial:

User prompts
RAG content
Tool outputs (emails, webpages, files)
Inter-agent messages

Design so that even if the model "breaks character" and follows malicious instructions, it still cannot:

Directly execute code, SQL, or HTTP
Bypass authorization
Access data outside policy
Trigger high-risk actions without external checks

This is the core Zero Trust assumption for agentic AI.

3.2 Orchestration Layer as Policy Brain

The model is an optimizer, not the authority.

The orchestration layer (backend) must:

Authenticate the user and agent
Enforce RBAC/ABAC and data access policies
Decide which tools may be used, with what parameters
Apply guardrails before and after model calls
Log and audit every step

Model outputs are advisory. Only deterministic code and policies should decide what actually gets executed.

3.3 Constrained Agency and Least Privilege

Give each agent:

A narrow mandate ("support FAQ summarizer", "sales analytics reporter")
Minimal tools needed to fulfil that mandate
Minimal data visibility (per-tenant, per-role, per-task)

Never give an agent:

Direct access to production SQL
Generic "run any HTTP request" tools
Broad "execute arbitrary code" without sandboxing

High-risk operations must require a separate step (human approval or a dedicated privileged service).

3.4 Separation of Concerns

Structure the system into clearly separated layers:

UI / Presentation – AuthN, UX, basic validation, rendering
Orchestration / Policy – Context assembly, policy enforcement, tool mediation
Model inference – Prompt building, LLM invocation
Tool & data access – APIs, MCP, DB, storage
Observability & governance – Central logging, audit, monitoring

Concentrate security logic in orchestration + tools/data so models can be swapped without redoing security.

3.5 Least Data and Purpose Limitation

For each step, send the model only what is needed for that task:

Avoid:

Whole raw tables or entire documents when a summary suffices
PII/PHI/financial details unless absolutely needed
Secrets and credentials (never send at all)

Prefer:

Aggregated or masked views (e.g., counts, statistics)
Domain-specific APIs that abstract raw data

Minimize the blast radius if a prompt is compromised.

3.6 Defense-in-Depth and Fail-Safe Defaults

Layer protections at:

UI (input limits, CSRF, XSS protection)
Orchestration (authZ, rate limits, guardrails)
Tools (schema validation, business logic checks)
Data (row/document-level access, encryption, retention)
Infrastructure (sandboxing, network policies, monitoring)

If a check fails, or a classifier is uncertain, default to block or escalate, not "allow and hope." For important flows, degrade functionality (e.g., read-only mode) rather than failing open.

3.7 Governance and Traceability

You must be able to answer, for any impactful action:

Which user? Which tenant?
Which agent (and version)?
What prompts, context, and tools were involved?
What data was touched? What changed?

Requirements:

Agents have owners, purposes, and risk classifications.
All high-risk operations are logged in immutable, append-only stores.
There are documented runbooks and owners for AI incidents.