Monitoring and Incident Response

You cannot defend what you cannot see. Implement telemetry across the full agent lifecycle and build incident response procedures that account for AI-specific failure modes. Without visibility, attacks go undetected and incidents spiral.

11.1 Structured Telemetry and Immutable Audit

For each agentic task, log at minimum these four categories:

Identity

  • User ID (or pseudonymous ID), role, tenant/organization
  • Agent ID and version

Request

  • Timestamp, environment, region
  • User prompt (sanitized - PII and secrets redacted)
  • High-level context such as retrieved document IDs, not full content

Actions

  • Tools called and parameters (sanitized)
  • Data domains touched (e.g., which tables, collections, or indices)
  • Guardrails triggered and decisions taken

Outcome

  • Final agent output (sanitized)
  • Status: success, blocked, or error
  • Any policy violations or escalations
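
The four categories above can be captured as one structured record per agentic task. A minimal sketch in Python, with illustrative field names (this is an assumption for illustration, not a standard schema):

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentTaskRecord:
    # Identity
    user_id: str            # pseudonymous ID
    role: str
    tenant_id: str
    agent_id: str
    agent_version: str
    # Request
    timestamp: str
    environment: str
    region: str
    prompt_sanitized: str   # PII and secrets redacted before logging
    # Actions (IDs and names only, never full content)
    retrieved_doc_ids: list = field(default_factory=list)
    tools_called: list = field(default_factory=list)       # name + sanitized params
    data_domains: list = field(default_factory=list)       # tables, collections, indices
    guardrails_triggered: list = field(default_factory=list)
    # Outcome
    output_sanitized: str = ""
    status: str = "success"  # success | blocked | error
    violations: list = field(default_factory=list)

record = AgentTaskRecord(
    user_id="u-123", role="analyst", tenant_id="t-9",
    agent_id="support-agent", agent_version="1.4.2",
    timestamp=datetime.now(timezone.utc).isoformat(),
    environment="prod", region="eu-west-1",
    prompt_sanitized="Summarize ticket [REDACTED]",
)
```

Serializing with `asdict(record)` gives a flat dictionary ready for any structured log pipeline.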

Store logs in append-only or tamper-evident storage. Use WORM (write once, read many), hash-chaining, or signed logs. Align retention periods with your regulatory requirements. If someone can modify or delete audit logs after the fact, you have lost your ability to investigate incidents reliably.
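
Hash-chaining is simple to sketch: each entry's hash covers the previous entry's hash, so altering any earlier entry breaks verification of everything after it. A minimal illustration (a production system would add signatures and anchor hashes externally):

```python
import hashlib
import json

def append_entry(log, entry):
    """Append an entry whose hash covers the previous entry's hash,
    making any later modification detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": entry_hash})

def verify_chain(log):
    """Recompute every hash in order; returns False if any entry was altered."""
    prev_hash = "0" * 64
    for item in log:
        payload = json.dumps(item["entry"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if item["prev_hash"] != prev_hash or item["hash"] != expected:
            return False
        prev_hash = item["hash"]
    return True

audit_log = []
append_entry(audit_log, {"agent": "a1", "action": "tool_call", "tool": "search"})
append_entry(audit_log, {"agent": "a1", "action": "output", "status": "success"})
assert verify_chain(audit_log)

audit_log[0]["entry"]["tool"] = "delete_all"   # tamper with the first entry
assert not verify_chain(audit_log)             # tampering is detected
```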

11.2 Behavioral Monitoring and AI-Specific Detection

Static rules are not enough. You need to establish baselines per agent and watch for deviations.

Establish Baselines

For each agent, track:

  • Normal tool usage frequency and mix
  • Typical data volumes and classifications accessed
  • Usual response lengths, latency, and behavioral patterns

Monitor for Anomalies

  • Sudden spikes in high-risk tool calls, bulk data exports, or guardrail violations
  • Activity at unusual times or from unexpected geographies
  • Sudden shifts in agent behavior - tone changes, altered recommendations, or systematic policy deviations

Detect AI-Specific Threats

  • Prompt injection and jailbreak attempts - Repeated attempts to override system instructions
  • Cross-tenant access attempts - Any probe for data outside the current tenant boundary
  • Data exfiltration patterns - Unusually large responses, repeated "list all" requests, or attempts to encode data in output
  • Tool abuse - Misuse of code-execution or generic HTTP tools beyond their intended scope
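
The baseline-and-deviation idea can be sketched as a simple per-agent z-score check. The metric (high-risk tool calls per hour), window, and threshold below are illustrative assumptions; real deployments would track several metrics per agent:

```python
import statistics

def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it deviates from the baseline by more than
    `threshold` standard deviations. `history` is a list of past
    per-window counts (e.g. high-risk tool calls per hour) for one agent."""
    if len(history) < 2:
        return False          # not enough data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold

baseline = [4, 5, 3, 6, 4, 5, 4]   # normal hourly high-risk tool calls
is_anomalous(baseline, 5)           # within the baseline
is_anomalous(baseline, 40)          # sudden spike - worth an alert
```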

11.3 Automated Safeguards

Detection without response is just an expensive logging exercise. Build automatic controls that act on what you detect.

Circuit Breakers for Agents

If error rates or policy violation rates cross defined thresholds, disable the agent automatically or switch to a degraded mode - read-only, no tool access. Do not let a malfunctioning agent keep operating at full capability while you figure out what went wrong.
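
A minimal sketch of such a circuit breaker. The window size, minimum sample count, and violation-rate threshold are illustrative defaults:

```python
class AgentCircuitBreaker:
    """Trips to a degraded mode when the recent violation rate
    crosses a threshold."""

    def __init__(self, threshold=0.2, window=50, min_samples=10):
        self.threshold = threshold
        self.window = window
        self.min_samples = min_samples
        self.results = []          # True = violation/error, False = ok
        self.mode = "full"         # full | degraded

    def record(self, violation):
        self.results.append(violation)
        self.results = self.results[-self.window:]
        rate = sum(self.results) / len(self.results)
        if len(self.results) >= self.min_samples and rate > self.threshold:
            self.mode = "degraded"  # read-only, no tool access

    def allow_tool_call(self):
        return self.mode == "full"

breaker = AgentCircuitBreaker()
for _ in range(9):
    breaker.record(False)          # healthy traffic
assert breaker.allow_tool_call()

for _ in range(5):
    breaker.record(True)           # burst of policy violations
assert not breaker.allow_tool_call()   # breaker has tripped
```

Resetting to full capability should be a deliberate, human-reviewed step, not automatic.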

Adaptive Security Posture

When threat levels are elevated:

  • Disable risky tools temporarily
  • Tighten rate limits
  • Force human approval for actions that would normally be automated

Quarantine Modes

For suspicious users or tenants, move them to stricter policies and manual review. This limits blast radius while you investigate, without shutting down the entire system.
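
Both ideas - adaptive posture and quarantine - can be sketched as a policy lookup. The posture names, limits, and tenant IDs below are illustrative assumptions:

```python
# Posture levels and the restrictions each applies; values are illustrative.
POSTURES = {
    "normal":     {"risky_tools": True,  "rate_limit_per_min": 60, "human_approval": False},
    "elevated":   {"risky_tools": False, "rate_limit_per_min": 20, "human_approval": True},
    "quarantine": {"risky_tools": False, "rate_limit_per_min": 5,  "human_approval": True},
}

quarantined_tenants = set()

def effective_posture(threat_level, tenant_id):
    """Quarantined tenants get the strictest policy regardless of the
    global threat level, limiting blast radius during investigation."""
    if tenant_id in quarantined_tenants:
        return POSTURES["quarantine"]
    return POSTURES[threat_level]

quarantined_tenants.add("t-42")             # suspicious tenant under review

assert effective_posture("normal", "t-1")["risky_tools"]        # unaffected tenant
assert not effective_posture("normal", "t-42")["risky_tools"]   # quarantined tenant
```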

11.4 AI-Specific Incident Response

AI incidents are real incidents. Treat them as first-class concerns, integrated with your existing security operations.

Common Incident Classes

  • Data leakage - PII, secrets, or confidential data exposed through agent outputs
  • Tool misuse - Unauthorized changes made through agent tool calls
  • RAG or memory poisoning - Corrupted retrieval data or manipulated agent memory
  • Unsafe or harmful outputs - Toxic, biased, or dangerous content reaching production users
  • Provider compromise or misconfiguration - Issues at the model provider or infrastructure level

For Each Class, Define a Runbook

1. Detection. Which alerts or metrics indicate the problem? Define specific thresholds and signals so your team knows what to look for.

2. Containment. Disable affected agents or tools. Revoke or rotate compromised credentials. Apply network lockdown if needed. Speed matters here.

3. Triage and analysis. Determine the scope: which tenants, users, and data were affected, over what time period, and through which flows. Identify root cause - was it prompt injection, misconfiguration, a code bug, or infrastructure compromise?

4. Remediation. Fix the code or policies and patch the affected components. Clean or roll back poisoned memory and RAG indices. Restore systems under stricter observation until you are confident the fix holds.

5. Communication. Notify internal stakeholders immediately. For regulated industries or contractual obligations, notify customers and regulators as required by applicable law and agreements.

6. Learning and improvement. Update your threat models, test suites, guardrails, and runbooks based on what you learned. Every incident should make the system harder to attack next time.

11.5 Agentic Incident Response - What Changes

The runbook structure in 11.4 applies to any AI incident. But agentic systems introduce specific challenges that traditional IR playbooks - built for human-speed events and deterministic system behavior, not for systems that make their own decisions - are not designed to handle. We're still working through what that means on engagements.

Forensics for Agent Decisions

Standard logs capture API calls and system events, not reasoning chains or goal state. When you investigate an agentic incident, you need to reconstruct why the agent did what it did - not just what it did. This requires decision-layer logging: the full context window at each step, the tools considered and selected, the reasoning that led to each action. Without this, your forensic investigation is limited to input-output pairs with a black box in between.
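
Decision-layer logging can be sketched as a per-step trace record. The field names are illustrative assumptions, and the context window should be sanitized before it is stored:

```python
from datetime import datetime, timezone

def log_decision_step(trace, step, context_window, tools_considered,
                      tool_selected, reasoning_summary):
    """Record one agent step at the decision layer: what the agent saw,
    what it could have done, and what it actually chose."""
    trace.append({
        "step": step,
        "ts": datetime.now(timezone.utc).isoformat(),
        "context_window": context_window,       # sanitize before storing
        "tools_considered": tools_considered,
        "tool_selected": tool_selected,
        "reasoning": reasoning_summary,
    })

trace = []
log_decision_step(
    trace, step=1,
    context_window="[system prompt] + [user: refund order 123]",
    tools_considered=["lookup_order", "issue_refund"],
    tool_selected="lookup_order",
    reasoning_summary="Verify the order exists before refunding.",
)
```

With a trace like this, an investigator can see not only that `lookup_order` was called, but that `issue_refund` was on the table at that step and why it was deferred.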

Blast Radius Containment at Machine Speed

A compromised agent can affect multiple downstream agents, services, and data stores at machine speed. By the time a human analyst notices the anomaly, the damage chain may already be complete. IR playbooks need pre-defined automated containment triggers - circuit breakers that fire based on behavioral signals, not just human escalation paths. Non-human identity (NHI) governance enables fast containment: if every agent has a distinct identity and scoped credentials, you can isolate a compromised agent without disrupting the entire system.

Memory and Context Store Forensics

After a suspected memory poisoning incident, the memory store is evidence. It must be preserved before re-baselining - snapshot it, secure the snapshot, then clean the production store. If you re-baseline first, you destroy the forensic record of what was poisoned, when, and through what vector.
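
A minimal sketch of the preserve-then-clean order of operations. The deep copy and hash here stand in for proper evidence handling (secured storage, chain of custody):

```python
import copy
import hashlib
import json

def preserve_then_clean(memory_store, evidence):
    """Snapshot the (possibly poisoned) memory store and record its hash
    BEFORE cleaning, so the forensic record survives re-baselining."""
    snapshot = copy.deepcopy(memory_store)
    digest = hashlib.sha256(
        json.dumps(snapshot, sort_keys=True).encode()
    ).hexdigest()
    evidence.append({"snapshot": snapshot, "sha256": digest})
    memory_store.clear()          # re-baseline only after preservation
    return digest

memory = {"facts": ["ship to attacker@evil.example"]}   # suspected poison
evidence = []
digest = preserve_then_clean(memory, evidence)

assert memory == {}                       # production store re-baselined
assert evidence[0]["snapshot"]["facts"]   # forensic copy preserved intact
```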

Rogue Agent Containment

Stopping the agent process is not sufficient. When an agent goes rogue, you also need to audit its task queue (what was it about to do?), check downstream effects (what did it trigger that is still in flight?), and verify that no sub-agents or delegated tasks are continuing to execute. A killed agent with live downstream work is only partially contained.

Action Replay

The ability to replay recorded agent actions in an isolated environment is an emerging practice worth building toward. When an incident occurs, replay the agent's action sequence to understand its behavior, test whether the same sequence would trigger cascading failures, and validate that your containment measures would have caught it earlier. This requires the decision-layer logging described above - you cannot replay what you did not record.
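
A minimal sketch of replaying a recorded action sequence against sandboxed tool stand-ins. The tool names, recorded actions, and blocking rule are hypothetical; the point is that replay requires the recorded parameters from decision-layer logs:

```python
def replay(actions, sandbox_tools):
    """Re-execute a recorded action sequence against sandboxed tool
    implementations (no real side effects) and report where a
    containment rule would have stopped the chain."""
    results = []
    for i, action in enumerate(actions):
        tool = sandbox_tools[action["tool"]]
        outcome = tool(**action["params"])
        results.append({"step": i, "tool": action["tool"], "outcome": outcome})
        if outcome.get("blocked"):
            break   # containment would have stopped the chain here
    return results

# Sandboxed stand-ins for real tools.
sandbox = {
    "search":      lambda query: {"blocked": False, "note": f"searched {query}"},
    "bulk_export": lambda table: {"blocked": True, "note": "export blocked in sandbox"},
}

recorded = [
    {"tool": "search",      "params": {"query": "customer list"}},
    {"tool": "bulk_export", "params": {"table": "customers"}},
    {"tool": "search",      "params": {"query": "anything else"}},
]

results = replay(recorded, sandbox)
assert len(results) == 2                       # chain stopped at step 2
assert results[1]["outcome"]["blocked"]        # containment fired on the export
```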

References

  1. OWASP Top 10 for Agentic Applications 2026 - ASI09: Lack of Observability. OWASP GenAI Security Project. owasp.org
  2. NIST AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology, January 2023. nist.gov

Need help with monitoring and incident response?

We help teams design telemetry architectures and build AI-specific incident response playbooks. If your agents are running in production, let's make sure you can see what they are doing and respond when something goes wrong.

Get in touch

Agentic AI Security Guide V1.1 · Changelog