Technical Capability Brief

AI Application & Agent Security

This is where the majority of AI security engagements land. Any product that calls an LLM API to add AI capabilities to a workflow - chatbots with tools, RAG pipelines, AI-enhanced SaaS features, autonomous coding assistants, browser agents - falls into this category. The common thread is that an application is making decisions using LLM output, and those decisions have consequences.

Casaba has deep experience testing these systems across the full spectrum of complexity, from simple question-answering interfaces to multi-agent systems with broad tool access and autonomous execution capabilities.

What we test

Seven attack surfaces

Prompt Injection and XPIA

Both direct (user-facing) and indirect (through retrieved documents, tool outputs, emails, web content). We test whether untrusted data flowing into the model's context can hijack its behavior across every input surface the application exposes.

Tool Use and Privilege Escalation

When the AI can call APIs, query databases, send emails, or execute code, every tool is an attack surface. We test whether the model can be manipulated into chaining tools in unintended ways, accessing data outside its scope, or escalating privileges through tool interactions.

Data Exfiltration

Can sensitive data (source code, credentials, PII, internal documents) be extracted through the AI's responses, tool calls, or side channels? We map every path data can take out of the system.

Sandbox and Isolation

For agents with execution environments, we evaluate whether the sandbox actually contains the agent's actions. This includes network access, file system escape, environment variable leakage, and lateral movement.

RAG Pipeline Integrity

Testing whether the retrieval layer enforces proper access controls, whether data poisoning through malicious documents in the knowledge base can influence agent behavior, and whether cross-tenant data isolation holds in multi-tenant RAG systems.

Decision-Making Under Adversarial Pressure

As agent autonomy increases, so does the risk from social engineering-style attacks. We test whether phishing techniques, false credibility, and multi-turn manipulation can get agents to take actions they shouldn't, even when direct injection defenses hold.

Excessive Agency and Autonomy Controls

Testing whether AI agents can be nudged into taking actions beyond their intended scope, including unauthorized tool invocations, context window poisoning that triggers unintended downstream actions, and permissions that exceed what the agent needs for its stated purpose.

How we approach it

Real environments, custom attack strategies

We work within the actual environment where the AI system is deployed, using real workflows and realistic adversarial scenarios. Testing follows a structured process: scoping to identify the highest-risk input surfaces and tool integrations, test plan generation focused on the most impactful attack vectors, and iterative testing with regular status updates.

We build custom prompt injection payloads and attack strategies tailored to the specific application's behavior patterns, tool set, and safety architecture. Where engagements require scale, we build custom AI-vs-AI adversarial automation - using models to generate, mutate, and evaluate adversarial prompts against the target, covering far more attack surface than manual testing alone. We also leverage our internal tooling (Nemesis) to accelerate reconnaissance, finding triage, and report generation.

For multi-agent systems, Casaba tests how agents interact with each other and with external tools, looking for scenarios where one agent's behavior can influence another in unintended ways. We evaluate whether autonomy controls and guardrails hold up when agents are nudged toward the boundaries of their intended functionality.

What we find

Typical finding patterns

Cross-Boundary Injection Through Tool Integrations

Third-party tool connections, MCP servers, and plugin surfaces often introduce injection vectors that the primary product's safety controls don't cover. Content flowing through these integrations can redirect the agent's behavior, and chaining tool calls in unintended sequences can escalate impact significantly.

Sandbox Escape Through Adjacent Capabilities

Development sandboxes and execution environments that appear contained may allow network access, credential leakage, or lateral movement through overlooked pathways. Combining these with prompt injection can escalate from "the AI said something it shouldn't" to "an attacker has access to production credentials."

RAG Poisoning and Cross-Tenant Leakage

In multi-tenant systems, retrieval pipelines that don't enforce strict access boundaries can allow one tenant's data to influence another tenant's AI responses. Poisoned documents in shared knowledge bases can steer agent behavior across user boundaries.

Social Engineering-Style Attacks Against Agent Reasoning

When direct injection defenses hold, phishing-like techniques - establishing false credibility, building context across multiple turns, mimicking legitimate content patterns - can still manipulate agent decision-making. These are harder to defend against because they exploit the model's reasoning rather than its input parsing.

Excessive Agency

AI agents operating with more permissions, functionality, or autonomy than required for their task, enabling them to take actions that were never intended by the developer.

Context Window Poisoning

Manipulated inputs that cause the agent to take additional unintended actions on connected systems, such as modifying code, files, or configurations beyond the scope of the original request.

Multi-Agent Interaction Vulnerabilities

One agent's output becoming another agent's input without adequate validation, creating chains of unintended behavior across agent boundaries.

Insufficient Guardrails on Tool Use

Agents invoking tools or APIs with parameters or frequency that the system should restrict but does not enforce, allowing unchecked actions through legitimate interfaces.

Unsafe Output Rendering

LLM-generated content passed to downstream systems without sanitization - rendered in web pages leading to cross-site scripting, passed to shell commands enabling code execution, or fed to APIs that interpret the output as instructions rather than data.

Uncontrolled Resource Consumption

Missing limits on LLM API usage, token consumption, or query volume that enable denial-of-service or denial-of-wallet attacks. Includes scenarios where an attacker can trigger expensive model operations repeatedly without adequate throttling.

Why it matters

The attack surface is fundamentally different

AI applications are proliferating across every industry, and most security teams are still building their testing methodology for these systems. The attack surface is fundamentally different from traditional web and API security - prompt injection, tool chaining, and adversarial reasoning don't map neatly to existing vulnerability taxonomies. Organizations need testing partners who have already developed the methodology, tooling, and pattern recognition that this work demands.

Need your AI system tested?

We've been testing AI products at scale longer than most. Let's talk about yours.

Get in touch