Monitoring & Incident Response
Visibility is essential for detecting attacks and responding to incidents. Implement comprehensive telemetry and AI-specific incident response procedures.
11.1 Structured Telemetry and Immutable Audit
For each agentic task, log at least:
Identity
- User ID (or pseudonymous ID), role
- Tenant/organization
- Agent ID and version
Request
- Timestamp, environment, region
- User prompt (sanitized: PII/secrets redacted)
- High-level context (e.g., retrieved doc IDs, not full content)
Actions
- Tools called, parameters (sanitized)
- Data domains touched (e.g., which tables/collections/indices)
- Guardrails triggered and decisions taken
Outcome
- Final agent output (sanitized)
- Status (success, blocked, error)
- Any policy violations or escalations
Store logs in append-only or tamper-evident storage (WORM, hash-chaining, or signed logs) with retention aligned to regulatory needs.
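The record structure and hash-chaining described above can be sketched as follows; this is a minimal illustration (the `AuditLog` class, field names, and genesis value are assumptions, not a standard schema), not a substitute for a real WORM or signed-log backend:

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log where each record embeds the hash of its
    predecessor (hash-chaining), so later tampering is detectable."""

    def __init__(self):
        self._records = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, *, user_id, tenant, agent_id, agent_version,
               prompt_sanitized, tools_called, data_domains,
               guardrails_triggered, status):
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "identity": {"user_id": user_id, "tenant": tenant,
                         "agent_id": agent_id, "agent_version": agent_version},
            "request": {"prompt": prompt_sanitized},
            "actions": {"tools": tools_called, "data_domains": data_domains,
                        "guardrails": guardrails_triggered},
            "outcome": {"status": status},
            "prev_hash": self._last_hash,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self):
        """Recompute the chain; returns False if any record was altered."""
        prev = "0" * 64
        for rec in self._records:
            if rec["prev_hash"] != prev:
                return False
            body = {k: v for k, v in rec.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

In production the chain head would also be periodically signed or anchored externally, so an attacker who rewrites the whole chain is still caught.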
11.2 Behavioral Monitoring and AI-Specific Detection
Establish baselines per agent:
- Normal tool usage frequency and mix
- Typical data volumes and classifications accessed
- Usual response lengths, latency, and patterns
Monitor for Anomalies
- Sudden spikes in:
  - High-risk tool calls
  - Bulk exports
  - Guardrail violations
- Unusual times or geographies for activity
- Sudden shifts in agent behavior (e.g., tone, recommendations, systematic policy deviations)
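One way to make per-agent baselines actionable is a rolling deviation check on tool-usage counts; a minimal sketch (window size, threshold, and the `ToolUsageBaseline` name are illustrative assumptions):

```python
from collections import deque
from statistics import mean, stdev

class ToolUsageBaseline:
    """Tracks an agent's tool-call counts over a sliding window and
    flags intervals that spike far above the established baseline."""

    def __init__(self, window=50, z_threshold=3.0, min_samples=10):
        self.window = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, calls_this_interval):
        """Record one interval's tool-call count; return True if anomalous."""
        anomalous = False
        if len(self.window) >= self.min_samples:
            mu = mean(self.window)
            sigma = stdev(self.window) or 1e-9
            if (calls_this_interval - mu) / sigma > self.z_threshold:
                anomalous = True
        # Only fold normal intervals into the baseline, so a sustained
        # attack cannot gradually normalize itself.
        if not anomalous:
            self.window.append(calls_this_interval)
        return anomalous
```

The same pattern applies to the other baseline dimensions (data volume, response length, latency), each tracked per agent and per tenant.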
Detect AI-Specific Threats
- Repeated prompt injection/jailbreak attempts
- Cross-tenant access attempts
- Data exfiltration patterns (e.g., large responses, repeated "list all" requests)
- Abuse of code-exec or generic HTTP tools
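These threat classes can feed a simple per-user signal tracker that alerts once a signal crosses its threshold; a minimal sketch (signal names and thresholds are illustrative — in practice the signals would come from guardrail verdicts, not raw counters):

```python
from collections import defaultdict

# Illustrative thresholds per threat signal.
THRESHOLDS = {
    "prompt_injection": 3,     # repeated injection/jailbreak attempts
    "cross_tenant_access": 1,  # any single attempt is reportable
    "bulk_export": 5,          # repeated "list all" style requests
}

class ThreatSignalTracker:
    """Counts AI-specific threat signals per (tenant, user) and returns
    an alert dict when a signal reaches its threshold."""

    def __init__(self, thresholds=THRESHOLDS):
        self.thresholds = thresholds
        self.counts = defaultdict(int)

    def record(self, tenant, user, signal):
        key = (tenant, user, signal)
        self.counts[key] += 1
        if self.counts[key] >= self.thresholds.get(signal, float("inf")):
            return {"alert": signal, "tenant": tenant, "user": user,
                    "count": self.counts[key]}
        return None
```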
11.3 Automated Safeguards
Implement automatic controls such as:
Circuit Breakers for Agents
If error or violation rates cross defined thresholds:
- Disable the agent or switch to a degraded mode (read-only, no tools).
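A circuit breaker of this kind can be sketched as a rolling violation-rate check with a cool-down; the thresholds, window, and class name below are illustrative assumptions:

```python
import time

class AgentCircuitBreaker:
    """Trips the agent into a degraded mode when the violation rate
    over a rolling window exceeds a threshold; resets after a
    cool-down period."""

    def __init__(self, max_violation_rate=0.2, window=20, cooldown_s=300):
        self.max_violation_rate = max_violation_rate
        self.window = window
        self.cooldown_s = cooldown_s
        self.outcomes = []          # True = error/violation, False = ok
        self.tripped_at = None

    def record(self, violation):
        self.outcomes.append(bool(violation))
        self.outcomes = self.outcomes[-self.window:]
        rate = sum(self.outcomes) / len(self.outcomes)
        if len(self.outcomes) >= self.window and rate > self.max_violation_rate:
            self.tripped_at = time.monotonic()

    def mode(self):
        """Current operating mode: 'normal' or 'degraded' (read-only, no tools)."""
        if self.tripped_at is None:
            return "normal"
        if time.monotonic() - self.tripped_at < self.cooldown_s:
            return "degraded"
        self.tripped_at = None  # cool-down elapsed; restore normal mode
        return "normal"
```

Whether the breaker fully disables the agent or only drops it to read-only should depend on the severity of the violations that tripped it.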
Adaptive Security Posture
During elevated threat levels:
- Disable risky tools
- Tighten rate limits
- Force human approval for actions that are normally automated
Quarantine Modes
For suspicious users or tenants, move them to stricter policies and manual review.
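Adaptive posture and quarantine can be expressed together as a policy table consulted before every action; a minimal sketch (threat levels, tool names, and limits are placeholders):

```python
# Illustrative posture table: tool names and limits are placeholders.
POSTURES = {
    "normal":   {"disabled_tools": set(),
                 "rate_limit_rpm": 60,
                 "require_approval": set()},
    "elevated": {"disabled_tools": {"code_exec"},
                 "rate_limit_rpm": 20,
                 "require_approval": {"bulk_export"}},
    "critical": {"disabled_tools": {"code_exec", "http"},
                 "rate_limit_rpm": 5,
                 "require_approval": {"bulk_export", "record_delete"}},
}

def evaluate_action(threat_level, tool, quarantined=False):
    """Decide how an action is handled under the current posture.
    Quarantined users/tenants are forced to the strictest posture."""
    posture = POSTURES["critical"] if quarantined else POSTURES[threat_level]
    if tool in posture["disabled_tools"]:
        return "blocked"
    if tool in posture["require_approval"]:
        return "needs_human_approval"
    return "allowed"
```

Keeping the posture table as data (rather than branching logic scattered across services) lets security operations tighten controls without a code deploy.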
11.4 AI-Specific Incident Response
Treat AI incidents as first-class incidents, integrated with security operations.
Common Incident Classes
- Data leakage (PII/secrets/confidential data)
- Tool misuse or unauthorized changes (e.g., records deleted, config modified)
- RAG/memory poisoning
- Unsafe or harmful outputs in production
- Provider compromise or misconfiguration
For Each Class, Define:
1. Detection
Which alerts or metrics indicate the problem?
2. Containment
- Disable affected agents/tools.
- Revoke or rotate credentials.
- Apply network lockdown if needed.
3. Triage & Analysis
- Scope: which tenants/users/data, how long, via which flows.
- Root cause: injection, misconfig, code bug, infra compromise.
4. Remediation
- Fix code/policies or patch components.
- Clean or roll back poisoned memory/RAG indices.
- Restore systems under stricter observation.
5. Communication
- Notify internal stakeholders.
- For regulated or contractual obligations, notify customers/regulators as required.
6. Learning & Improvement
- Update threat models, test suites, guardrails, and runbooks.
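The six-phase structure above can be encoded as machine-readable runbooks so responders and automation share one source of truth; a minimal sketch with two of the incident classes from this section (the action strings are placeholders for real automation hooks or responder instructions):

```python
# Each incident class maps to the six response phases defined above.
RUNBOOKS = {
    "data_leakage": {
        "detection": ["dlp_alert", "guardrail_pii_violation"],
        "containment": ["disable_agent", "rotate_credentials"],
        "triage": ["scope_tenants_and_data", "identify_flow"],
        "remediation": ["fix_policy", "restore_under_observation"],
        "communication": ["notify_security", "assess_regulatory_duty"],
        "learning": ["update_threat_model", "add_regression_test"],
    },
    "rag_memory_poisoning": {
        "detection": ["content_integrity_alert", "behavior_shift"],
        "containment": ["freeze_index_writes", "disable_agent"],
        "triage": ["find_poisoned_documents", "root_cause"],
        "remediation": ["roll_back_index", "rebuild_from_clean_source"],
        "communication": ["notify_security"],
        "learning": ["update_ingestion_guardrails", "update_runbook"],
    },
}

REQUIRED_PHASES = ("detection", "containment", "triage",
                   "remediation", "communication", "learning")

def validate_runbooks(runbooks):
    """Return {incident_class: [missing phases]} so gaps are caught
    in CI rather than mid-incident."""
    missing = {
        cls: [p for p in REQUIRED_PHASES if not rb.get(p)]
        for cls, rb in runbooks.items()
    }
    return {cls: gaps for cls, gaps in missing.items() if gaps}
```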