Monitoring & Incident Response
Visibility is essential for detecting attacks and responding to incidents. Implement comprehensive telemetry and AI-specific incident response procedures.
11.1 Structured Telemetry and Immutable Audit
For each agentic task, log at least:
Identity
- User ID (or pseudonymous ID), role
- Tenant/organization
- Agent ID and version
Request
- Timestamp, environment, region
- User prompt (sanitized: PII/secrets redacted)
- High-level context (e.g., retrieved doc IDs, not full content)
Actions
- Tools called, parameters (sanitized)
- Data domains touched (e.g., which tables/collections/indices)
- Guardrails triggered and decisions taken
Outcome
- Final agent output (sanitized)
- Status (success, blocked, error)
- Any policy violations or escalations
Store logs in append-only or tamper-evident storage (WORM, hash-chaining, or signed logs) with retention aligned to regulatory needs.
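The record structure and hash-chaining described above can be sketched as follows; this is a minimal illustration (the `AuditLog` class, field names, and genesis value are assumptions, not a standard schema), not a substitute for a real WORM or signed-log backend:

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log where each record embeds the hash of its
    predecessor (hash-chaining), so later tampering is detectable."""

    def __init__(self):
        self._records = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, *, user_id, tenant, agent_id, agent_version,
               prompt_sanitized, tools_called, data_domains,
               guardrails_triggered, status):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "identity": {"user_id": user_id, "tenant": tenant,
                         "agent_id": agent_id, "agent_version": agent_version},
            "request": {"prompt": prompt_sanitized},
            "actions": {"tools": tools_called, "data_domains": data_domains,
                        "guardrails": guardrails_triggered},
            "outcome": {"status": status},
            "prev_hash": self._last_hash,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self):
        """Recompute the chain; returns False if any record was altered."""
        prev = "0" * 64
        for rec in self._records:
            if rec["prev_hash"] != prev:
                return False
            body = {k: v for k, v in rec.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

In production the chain head would also be periodically signed or anchored externally, so an attacker who rewrites the whole chain is still caught.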
11.2 Behavioral Monitoring and AI-Specific Detection
Establish baselines per agent:
- Normal tool usage frequency and mix
- Typical data volumes and classifications accessed
- Usual response lengths, latency, and patterns
Monitor for Anomalies
- Sudden spikes in:
  - High-risk tool calls
  - Bulk exports
  - Guardrail violations
- Unusual times or geographies for activity
- Sudden shifts in agent behavior (e.g., tone, recommendations, systematic policy deviations)
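One way to make per-agent baselines actionable is a rolling deviation check on tool-usage counts; a minimal sketch (window size, threshold, and the `ToolUsageBaseline` name are illustrative assumptions):

```python
from collections import deque
from statistics import mean, stdev

class ToolUsageBaseline:
    """Tracks an agent's tool-call counts over a sliding window and
    flags intervals that spike far above the established baseline."""

    def __init__(self, window=50, z_threshold=3.0, min_samples=10):
        self.window = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, calls_this_interval):
        """Record one interval's tool-call count; return True if anomalous."""
        anomalous = False
        if len(self.window) >= self.min_samples:
            mu = mean(self.window)
            sigma = stdev(self.window) or 1e-9
            if (calls_this_interval - mu) / sigma > self.z_threshold:
                anomalous = True
        # Only fold normal intervals into the baseline, so a sustained
        # attack cannot gradually normalize itself.
        if not anomalous:
            self.window.append(calls_this_interval)
        return anomalous
```

The same pattern applies to the other baseline dimensions (data volume, response length, latency), each tracked per agent and per tenant.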
Detect AI-Specific Threats
- Repeated prompt injection/jailbreak attempts
- Cross-tenant access attempts
- Data exfiltration patterns (e.g., large responses, repeated "list all" requests)
- Abuse of code-exec or generic HTTP tools
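These threat classes can feed a simple per-user signal tracker that alerts once a signal crosses its threshold; a minimal sketch (signal names and thresholds are illustrative — in practice the signals would come from guardrail verdicts, not raw counters):

```python
from collections import defaultdict

# Illustrative thresholds per threat signal.
THRESHOLDS = {
    "prompt_injection": 3,     # repeated injection/jailbreak attempts
    "cross_tenant_access": 1,  # any single attempt is reportable
    "bulk_export": 5,          # repeated "list all" style requests
}

class ThreatSignalTracker:
    """Counts AI-specific threat signals per (tenant, user) and returns
    an alert dict when a signal reaches its threshold."""

    def __init__(self, thresholds=THRESHOLDS):
        self.thresholds = thresholds
        self.counts = defaultdict(int)

    def record(self, tenant, user, signal):
        key = (tenant, user, signal)
        self.counts[key] += 1
        if self.counts[key] >= self.thresholds.get(signal, float("inf")):
            return {"alert": signal, "tenant": tenant, "user": user,
                    "count": self.counts[key]}
        return None
```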
11.3 Automated Safeguards
Implement automatic controls such as:
Circuit Breakers for Agents
If error or violation rates cross defined thresholds:
- Disable the agent or switch to a degraded mode (read-only, no tools).
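A circuit breaker of this kind can be sketched as a rolling violation-rate check with a cool-down; the thresholds, window, and class name below are illustrative assumptions:

```python
import time

class AgentCircuitBreaker:
    """Trips the agent into a degraded mode when the violation rate
    over a rolling window exceeds a threshold; resets after a
    cool-down period."""

    def __init__(self, max_violation_rate=0.2, window=20, cooldown_s=300):
        self.max_violation_rate = max_violation_rate
        self.window = window
        self.cooldown_s = cooldown_s
        self.outcomes = []          # True = error/violation, False = ok
        self.tripped_at = None

    def record(self, violation):
        self.outcomes.append(bool(violation))
        self.outcomes = self.outcomes[-self.window:]
        rate = sum(self.outcomes) / len(self.outcomes)
        if len(self.outcomes) >= self.window and rate > self.max_violation_rate:
            self.tripped_at = time.monotonic()

    def mode(self):
        """Current operating mode: 'normal' or 'degraded' (read-only, no tools)."""
        if self.tripped_at is None:
            return "normal"
        if time.monotonic() - self.tripped_at < self.cooldown_s:
            return "degraded"
        self.tripped_at = None  # cool-down elapsed; restore normal mode
        return "normal"
```

Whether the breaker fully disables the agent or only drops it to read-only should depend on the severity of the violations that tripped it.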
Adaptive Security Posture
During elevated threat levels:
- Disable risky tools
- Tighten rate limits
- Force human approval for actions that are normally automated
Quarantine Modes
For suspicious users or tenants, move them to stricter policies and manual review.
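Adaptive posture and quarantine can be expressed together as a policy table consulted before every action; a minimal sketch (threat levels, tool names, and limits are placeholders):

```python
# Illustrative posture table: tool names and limits are placeholders.
POSTURES = {
    "normal":   {"disabled_tools": set(),
                 "rate_limit_rpm": 60,
                 "require_approval": set()},
    "elevated": {"disabled_tools": {"code_exec"},
                 "rate_limit_rpm": 20,
                 "require_approval": {"bulk_export"}},
    "critical": {"disabled_tools": {"code_exec", "http"},
                 "rate_limit_rpm": 5,
                 "require_approval": {"bulk_export", "record_delete"}},
}

def evaluate_action(threat_level, tool, quarantined=False):
    """Decide how an action is handled under the current posture.
    Quarantined users/tenants are forced to the strictest posture."""
    posture = POSTURES["critical"] if quarantined else POSTURES[threat_level]
    if tool in posture["disabled_tools"]:
        return "blocked"
    if tool in posture["require_approval"]:
        return "needs_human_approval"
    return "allowed"
```

Keeping the posture table as data (rather than branching logic scattered across services) lets security operations tighten controls without a code deploy.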
11.4 AI-Specific Incident Response
Treat AI incidents as first-class incidents, integrated with security operations.
Common Incident Classes
- Data leakage (PII/secrets/confidential data)
- Tool misuse or unauthorized changes (e.g., records deleted, config modified)
- RAG/memory poisoning
- Unsafe or harmful outputs in production
- Provider compromise or misconfiguration
For Each Class, Define:
1. Detection
Which alerts or metrics indicate the problem?
2. Containment
- Disable affected agents/tools.
- Revoke or rotate credentials.
- Apply network lockdown if needed.
3. Triage & Analysis
- Scope: which tenants/users/data, how long, via which flows.
- Root cause: injection, misconfig, code bug, infra compromise.
4. Remediation
- Fix code/policies or patch components.
- Clean or roll back poisoned memory/RAG indices.
- Restore systems under stricter observation.
5. Communication
- Notify internal stakeholders.
- For regulated or contractual obligations, notify customers/regulators as required.
6. Learning & Improvement
- Update threat models, test suites, guardrails, and runbooks.
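The six-phase structure above can be encoded as machine-readable runbooks so responders and automation share one source of truth; a minimal sketch with two of the incident classes from this section (the action strings are placeholders for real automation hooks or responder instructions):

```python
# Each incident class maps to the six response phases defined above.
RUNBOOKS = {
    "data_leakage": {
        "detection": ["dlp_alert", "guardrail_pii_violation"],
        "containment": ["disable_agent", "rotate_credentials"],
        "triage": ["scope_tenants_and_data", "identify_flow"],
        "remediation": ["fix_policy", "restore_under_observation"],
        "communication": ["notify_security", "assess_regulatory_duty"],
        "learning": ["update_threat_model", "add_regression_test"],
    },
    "rag_memory_poisoning": {
        "detection": ["content_integrity_alert", "behavior_shift"],
        "containment": ["freeze_index_writes", "disable_agent"],
        "triage": ["find_poisoned_documents", "root_cause"],
        "remediation": ["roll_back_index", "rebuild_from_clean_source"],
        "communication": ["notify_security"],
        "learning": ["update_ingestion_guardrails", "update_runbook"],
    },
}

REQUIRED_PHASES = ("detection", "containment", "triage",
                   "remediation", "communication", "learning")

def validate_runbooks(runbooks):
    """Return {incident_class: [missing phases]} so gaps are caught
    in CI rather than mid-incident."""
    missing = {
        cls: [p for p in REQUIRED_PHASES if not rb.get(p)]
        for cls, rb in runbooks.items()
    }
    return {cls: gaps for cls, gaps in missing.items() if gaps}
```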