Orchestration and Tool Security

This is where most agentic risk is either controlled or allowed through. The orchestration layer decides what agents can do, and the tool layer is how they do it. Get these wrong and no amount of guardrails will save you.

7.1 Policy Enforcement in the Orchestrator

Your orchestrator needs a centralized policy engine. This is not optional. The policy engine should enforce:

  • User and agent authentication - Verify who is making the request and which agent is acting on their behalf.
  • Authorization - Which agent can act on behalf of which user or tenant? Which tools can be used in this context?
  • Data access rules - Document-level and row-level filters (tenant_id, region, department), plus classification-based constraints.
  • Guardrails - Pre-input, mid-execution, and post-output checks.
  • Rate limits and quotas - Per-user, per-tenant, per-agent, per-tool.

One critical point: policies must be expressed as code or configuration, not as English instructions inside prompts. A prompt saying "do not access financial records" is a suggestion to the model. A policy engine that blocks the query before it reaches the database is an actual control. There is a difference.
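To make the distinction concrete, here is a minimal sketch of a policy check expressed as code. The agent names, classification labels, and scope structure are illustrative, not from any specific framework - the point is that the check runs in the orchestrator, so a model that ignores its prompt still cannot reach the data.

```python
from dataclasses import dataclass

# Classification labels that require an explicit grant (illustrative names)
RESTRICTED = {"financial", "pii"}

@dataclass
class QueryRequest:
    agent_id: str
    tenant_id: str
    classification: str  # label of the dataset the agent wants to touch

def is_allowed(req: QueryRequest, agent_scopes: dict) -> bool:
    """Block the query before it reaches the database unless the agent
    is explicitly scoped to the restricted data classification."""
    granted = agent_scopes.get(req.agent_id, set())
    if req.classification in RESTRICTED:
        return req.classification in granted
    return True

scopes = {"support-agent": {"tickets"}}
```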

7.2 Tool Design: Atomic, Schema-Constrained, Safe-by-Default

The way you design tools determines how much damage a compromised agent can do. Design every tool to be:

  • Atomic - Single-purpose, narrow scope. get_order_status is good. run_sql is not.
  • Versioned - With documented inputs, outputs, and risk level.
  • Schema-constrained - JSON schema for parameters, enforced server-side. Define types, ranges, enums, and regex patterns, and set additionalProperties: false.

Bad examples: run_python(code: string) or http_request(url, headers, body) as general-purpose tools. These give the agent unlimited reach and make every prompt injection a full compromise.

Better examples: get_stock_price(ticker: string) or create_support_ticket(title, description, priority). These are narrow, predictable, and easy to validate.
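A schema-constrained definition for a tool like create_support_ticket might look like the following. Field names and limits are illustrative; the schema would be enforced server-side with a validator (for example, the jsonschema library) before anything executes.

```python
# JSON Schema for create_support_ticket, enforced server-side
# before any tool call executes.
CREATE_TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "minLength": 1, "maxLength": 200},
        "description": {"type": "string", "maxLength": 5000},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "description", "priority"],
    # Reject any parameter the schema does not name - this is what
    # closes the door on injected extra arguments.
    "additionalProperties": False,
}
```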

7.3 Tool Invocation Mediation Layer

Never execute raw model output directly. The model should output structured tool calls - JSON with a tool name and parameters - and a mediation layer must sit between the model and actual execution.

The mediation layer must:

  • Validate the tool name against the agent's allowlist. If the tool is not on the list, reject the call.
  • Validate parameters against JSON schema. Reject anything that does not conform.
  • Apply business logic checks - For example, if the tool sends email, verify the recipient is an internal domain.
  • Apply rate limits and quotas - Prevent runaway agents from hammering downstream services.
  • Enforce impact-aware behavior - For medium-risk actions, use a propose-then-confirm pattern. For high-risk actions, require breakpoints or human approval before execution proceeds.
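The checks above can be sketched as a single choke point between the model and execution. Tool names and the risk tiering below are illustrative; schema validation, business checks, and rate limits are stubbed as comments.

```python
import json

ALLOWLIST = {"get_stock_price", "create_support_ticket"}
HIGH_RISK = {"create_support_ticket"}  # illustrative risk tiering

def mediate(raw_model_output: str) -> dict:
    """Parse and vet a structured tool call. Raw output is never executed."""
    call = json.loads(raw_model_output)
    tool = call.get("tool")
    if tool not in ALLOWLIST:
        raise PermissionError(f"tool not on allowlist: {tool!r}")
    params = call.get("params")
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    # JSON-schema validation, business logic checks, and rate limits
    # would run here before the call is released for execution.
    call["needs_approval"] = tool in HIGH_RISK  # propose-then-confirm hook
    return call
```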

7.4 Network and External Integration Controls

Tools are the "hands" of the agent. You need to control where those hands can reach.

  • Restrict outbound network traffic from agent runtimes and tool/MCP servers. Do not let them talk to arbitrary endpoints.
  • Maintain an allowlist - Internal APIs and specific external domains with a known risk posture. Everything else is blocked by default.
  • Avoid generic HTTP tools. If you absolutely must have one: restrict it to specific domains and paths at the infrastructure layer, and add logic to classify and sanitize fetched content before passing it to the model.
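Real enforcement belongs at the infrastructure layer (egress proxy, firewall rules), but the same default-deny logic can also run in the tool layer as defense in depth. A sketch with made-up hosts:

```python
from urllib.parse import urlparse

# Hosts the agent runtime may reach (illustrative). Everything else: denied.
ALLOWED_HOSTS = {"api.internal.example", "data.vendor.example"}

def outbound_allowed(url: str) -> bool:
    """Default-deny egress check: HTTPS only, allowlisted hosts only."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```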

Network controls are your safety net when all other checks fail. If an agent is tricked into calling an external service, the network layer should stop it cold.

7.5 MCP Security

The Model Context Protocol (MCP) is becoming the connective tissue for agentic deployments. It standardizes how agents discover, connect to, and invoke external tools and data sources. That standardization is valuable, but it also creates a well-defined attack surface that threat actors can target systematically.[1] [2]

Most of the teams we work with haven't inventoried their MCP servers. That's the starting point: you cannot secure what you haven't cataloged.

This section covers the MCP-specific threat categories you need to defend against and the mitigations that address them. It is designed to be self-contained on MCP security - you should not need to go elsewhere for the essentials.

MCP Threat Categories

Tool Poisoning. An attacker modifies tool metadata or descriptors, causing agents to invoke compromised tools. A poisoned descriptor looks legitimate to monitoring systems - the tool name and schema appear unchanged - while routing sensitive data through an attacker-controlled endpoint. Because agents rely on tool descriptors to decide what to call and how, a single poisoned entry can redirect entire workflows without any visible anomaly in the agent's reasoning.

Full Schema Poisoning (FSP). A more structural attack than tool poisoning. Attackers compromise entire tool schema definitions at the structural level, injecting hidden parameters or altered return types that affect all subsequent tool invocations. FSP is harder to detect because the schema itself becomes the source of truth that downstream validation checks against - if the schema is compromised, validation passes on malicious inputs.[1]

Resource Content Poisoning. Malicious instructions embedded in data sources that MCP servers retrieve. The agent processes these as trusted inputs and executes the embedded commands. This is indirect prompt injection (XPIA) delivered through the MCP channel - the attack exploits the trust boundary between the MCP server's data sources and the agent's reasoning context.

MCP Impersonation and Typosquatting. Malicious MCP servers that impersonate legitimate services at the discovery and installation stage. An attacker registers an MCP server with a name similar to a popular service - one character off, a plausible misspelling - and waits for agents or developers to connect. Once connected, the malicious server can serve poisoned tools, exfiltrate data from tool calls, or inject instructions into the agent's context.[3]

Session ID Leakage. Early MCP implementations placed session identifiers in URL query strings rather than headers. Those IDs leak through browser history, proxy logs, referrer headers, and server access logs. An attacker with access to any of these can hijack active MCP sessions and issue commands as the authenticated agent. CVE-2025-32711 (EchoLeak) demonstrated this attack pattern against a production MCP deployment.[4]
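The fix is mechanical: session identifiers belong in headers over TLS, never in the URL. A sketch of the two patterns side by side - the current MCP Streamable HTTP transport defines an Mcp-Session-Id header for exactly this purpose (header name assumed from the spec; the endpoint is illustrative):

```python
SESSION_ID = "abc123"  # illustrative value

# Leaky pattern: the session ID lands in proxy logs, browser history,
# server access logs, and Referer headers.
leaky_url = f"https://mcp.example/rpc?session_id={SESSION_ID}"

# Safer pattern: opaque URL, session ID carried only in a header over TLS.
safe_url = "https://mcp.example/rpc"
safe_headers = {"Mcp-Session-Id": SESSION_ID}
```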

Cross-Component Context Poisoning. A compromised MCP server injects bad data into shared state consumed by other MCP components. This causes behavioral drift across the system without any single obviously compromised component. The poisoned data propagates through normal inter-component communication, making the root cause difficult to trace.

Agent-Aware Dynamic Cloaking. A malicious MCP server can fingerprint whether it is being accessed by an AI agent or by a human auditor - using automation framework artifacts, behavioral timing differences, or access pattern signatures - and serve different tool schemas, responses, or injected content accordingly. An auditor running a security review sees benign output. The agent connecting in normal operation sees something different. This means human-conducted MCP audits are not sufficient on their own. Validate MCP server behavior under conditions that match actual agent access patterns, not just human-initiated test requests. The technique is a direct adaptation of web cloaking to agent infrastructure.[5]

MCP Mitigations

Treat all MCP tool descriptors and schema definitions as untrusted inputs. Validate schemas before installation. Pin schema versions and monitor for unexpected changes. If a tool descriptor changes between invocations, that is an alert condition, not a normal update. Apply the same skepticism you would apply to any external dependency - "it's a standard protocol" is not a reason to trust the server on the other end.
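Pinning can be as simple as hashing the canonical descriptor at approval time and alerting on any drift. A sketch with an illustrative descriptor:

```python
import hashlib
import json

def descriptor_fingerprint(descriptor: dict) -> str:
    """Stable SHA-256 over a canonical JSON encoding of the descriptor."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_descriptor(descriptor: dict, pinned: str) -> None:
    """Raise (an alert condition) if the descriptor changed since pinning."""
    if descriptor_fingerprint(descriptor) != pinned:
        raise RuntimeError("tool descriptor changed since approval")

approved = {"name": "get_stock_price", "params": {"ticker": "string"}}
pinned = descriptor_fingerprint(approved)
```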

Maintain a centralized inventory of all deployed MCP servers. This includes servers deployed by your team, third-party servers integrated during development, and - critically - shadow deployments that teams provisioned without going through your security review process. Automated discovery to surface unregistered MCP servers should be part of your monitoring posture. Apply Least Agency here: if a server is not in the inventory, it should not be reachable.

Implement code signing verification for MCP server installations. Before any MCP server is added to your approved registry, verify its provenance. Code signing, hash verification, and publisher validation reduce the risk of typosquatting and impersonation attacks. Treat MCP server selection with the same rigor you apply to package dependency management.

Log all MCP interactions with full request and response content. OpenTelemetry is becoming the standard approach for MCP observability. Every tool call, every response, every schema change should be logged in a format that supports forensic analysis. When something goes wrong, you need to reconstruct the full sequence of MCP interactions - not just the final outcome. Ensure that agent identity is attached to every logged interaction.
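A sketch of the minimum record shape, with agent identity attached. Field names are illustrative; in production these fields would ride on OpenTelemetry spans rather than print statements.

```python
import json
import time

def log_mcp_interaction(agent_id: str, server: str, tool: str,
                        request: dict, response: dict) -> dict:
    """Emit one structured, replayable record per MCP interaction."""
    record = {
        "ts": time.time(),
        "agent_id": agent_id,   # identity attached to every interaction
        "mcp_server": server,
        "tool": tool,
        "request": request,     # full request content, not a summary
        "response": response,   # full response content
    }
    print(json.dumps(record))   # stand-in for a real log/span exporter
    return record
```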

Apply zero-trust posture to MCP connections. Authenticate and authorize every MCP connection. Use mTLS for transport security. Scope each MCP server's access to the minimum data and capabilities required for its function. Do not allow MCP servers to access resources beyond their defined scope, even if the underlying infrastructure permits it.

Default deny for MCP server access. Agents cannot use arbitrary MCP servers. Every MCP server must be explicitly allowlisted per environment. If it is not in the registry and approved, the agent cannot reach it. Period.

Go deeper: For a detailed look at MCP security testing methodology - including the specific attack surfaces, adversarial workflows, and finding patterns we look for on engagements - see our MCP & Tool Integration Security capability brief. The content in this section draws from that brief and from our assessment experience.

References

  1. Securing Agentic AI: A Threat Model and Call to Action for the Model Context Protocol. CoSAI / OASIS Open, January 2026. cosai.oasis-open.org
  2. OWASP Top 10 for Agentic Applications 2026 - ASI02: Tool Misuse and Exploitation. OWASP GenAI Security Project. owasp.org
  3. OWASP Top 10 for Agentic Applications 2026 - ASI04: Agentic Supply Chain Vulnerabilities. OWASP GenAI Security Project. owasp.org
  4. CVE-2025-32711 (EchoLeak). Microsoft Security Response Center. Session ID leakage in MCP server implementation.
  5. Franklin, M., Tomasev, N., Jacobs, J., Leibo, J.Z., and Osindero, S. "AI Agent Traps." SSRN preprint. Google DeepMind, 2025.

Need help securing your orchestration layer?

We review agentic AI architectures and help teams lock down tool integrations, policy engines, and MCP configurations. If you are deploying agents with real tool access, let's talk.

Get in touch

Agentic AI Security Guide V1.2 · Changelog