Data, RAG, and Memory Security

Data is the lifeblood of agentic systems - and the primary target for attackers. Every document an agent retrieves, every memory it stores, every database query it runs is a potential attack vector. Protect it through classification, access controls, and integrity verification.

8.1 Data Classification and Access Control

Define and enforce a simple classification scheme:

  • Public - No restrictions on access or processing.
  • Internal - Available to authenticated users within the organization.
  • Confidential - Restricted to specific roles, departments, or teams.
  • Regulated - Subject to GDPR, HIPAA, PCI, or local data-protection laws.

For each classification level, define: who can access it (roles, departments, tenants), where it may be processed (on-prem or specific regions), and how long it may be retained.

Implementation

  • Record-level and document-level filters - Always attach tenant and user context to queries. Enforce filters server-side. Never trust the client or the model to self-filter.
  • Avoid cross-tenant RAG indices when possible. If you must share an index across tenants, enforce tenant filters in the query and re-check results in code before returning them.

8.2 RAG Integrity and Indirect Prompt Injection (XPIA) Defenses

RAG content is a persistent indirect prompt injection vector. When agents retrieve and process documents, those documents can contain malicious instructions that hijack agent behavior. Hardening both ingestion and retrieval is critical.

Ingestion Controls

  • Restrict who can edit high-impact corpora (e.g., configuration docs, policy documents).
  • Require approvals for content that is heavily used by agents and for regulated or sensitive documents.
  • Log and review changes to key sources.
  • Optionally: hash and sign critical documents, verify signatures at retrieval time, and maintain versions with rollback capability.

Retrieval Controls

  • Apply row-level and document-level access controls. Only retrieve documents the current user is allowed to see.
  • Limit the number of retrieved documents and the maximum size per document.
  • Tag documents with: tenant, classification, origin, last editor, and last review date.

In Prompt Construction

  • Explicitly separate untrusted RAG snippets from system instructions.
  • Label retrieved content as untrusted context, and instruct the model not to follow instructions found within it.

8.3 Memory Tiers and Poisoning Defenses

Not all agent memory is the same. Define clear tiers with different security treatments:

  1. Session Memory - Lives for a single conversation or short task. Cleared after completion or a short timeout.
  2. Short-Term Memory - Spans multiple sessions (hours to days) for continuity. Auto-expiring and limited in size.
  3. Long-Term Memory / Knowledge - RAG collections, user profiles, configuration. Curated, versioned, and usually human-reviewed.

Promotion Rules

  • Only promote data to long-term memory if the source is trusted or the data passes validation workflows (consistency checks, approvals).
  • For high-impact information like policy or configuration changes, require manual review before promotion.

Poisoning Detection

  • Monitor agent behavior over time. Sudden shifts in tone, recommendations, or policy interpretations may indicate knowledge base poisoning.
  • Keep snapshots of important memory sets so you can diff changes and roll back when needed.

8.4 PII, Secrets, and Retention

PII and secrets must not be casually fed into third-party models or stored in long-term memory without controls.

PII and Secrets Detection

  • Use detectors to find PII, PHI, financial data, and secrets in prompts, logs, and RAG ingestion streams.
  • Redact or tokenize as required by policy.

Data Minimization for Models

  • For third-party LLMs: avoid sending raw identifiers (names, IDs). Use pseudonyms or tokens where possible.
  • Turn off training and retention features, or use dedicated non-training endpoints.

Retention and Deletion

  • Set different retention periods per data type and classification.
  • Support deletion and erasure requests (e.g., GDPR/CCPA) by deleting or anonymizing chat logs, embeddings, and related artifacts.
  • Ensure logs maintain their security value while minimizing personal data.

8.5 Database Mediation

Agents should not have raw SQL access. Full stop.

Introduce a database mediation layer that:

  • Exposes safe, domain-specific operations as tools - get_sales_summary, find_customer_by_name - instead of generic query access.
  • Uses parameterized queries only. No string concatenation. No dynamic SQL built from model output.
  • Enforces limits on rows returned, query complexity, and request frequency and volume.

For analytical agents that need broader data access, consider pre-aggregated data marts or database views rather than direct access to transactional tables. Give the agent the answers, not the keys to the warehouse.

8.6 Cross-Session and Multi-Tenant Memory Poisoning

Section 8.3 covers memory tiers and poisoning detection within a single session or context. This section addresses a more persistent threat: memory poisoning that persists across sessions and, in multi-tenant deployments, across users.[1]

Single-session prompt injection is well understood - an attacker injects instructions that affect the current conversation. Cross-session memory poisoning is different. The attacker plants instructions or false facts in the agent's persistent memory store. Those instructions survive session boundaries and influence future behavior in different contexts, with different users, days or weeks later. The original injection may be long gone from any active context window, but the poisoned memory persists.

Cross-User Memory Contamination

In multi-tenant deployments, one user's inputs can affect another user's agent behavior through shared memory structures. If tenant A's interactions write to a shared knowledge base or embedding store, and tenant B's agent reads from that same store, then tenant A has a path to influence tenant B's agent behavior. This is not a hypothetical - it is a specific failure mode we look for on multi-tenant assessments.

Memory stores must be segmented by user, tenant, and task domain. Shared memory across tenants should be treated as a design decision that requires explicit justification and compensating controls, not as a default.

Mitigations

  • Provenance tracking on memory writes. Every write to the agent's persistent memory should record who or what produced it, when, and in what context. When the agent retrieves a memory later, provenance metadata helps determine how much to trust it.
  • Periodic re-baselining. Persistent memory stores should be periodically audited and re-baselined. If a memory entry cannot be traced to a legitimate source, it should be flagged or removed.
  • Validation of writes before committing. Do not allow agent-generated outputs to be written directly back into persistent memory without a validation step. Agent outputs carry a lower trust score than human-verified inputs - treat them accordingly.
  • Segmentation by user, task, and domain. The tighter the segmentation, the smaller the blast radius of a poisoning event. An agent's long-term memory for user A should not be readable by user B's agent, even within the same deployment.
  • Lower trust scores for re-ingested agent outputs. If an agent's output is later re-ingested as context for a future session (a common pattern in agentic workflows), that re-ingested content should carry a lower trust score than primary source material. The agent should "know" that it is reading its own prior output, not a verified external source.

Latent Memory Poisoning

Not all memory poisoning is immediately detectable. When injected data is designed to appear innocuous at write time and only activate when retrieved in a specific future context, write-time validation will not catch it. The implication: memory validation at ingestion is necessary but not sufficient. Anomaly detection on retrieval patterns and behavioral output monitoring are also required - catching shifts in agent behavior even when no individual stored item looks suspicious. Research has demonstrated attack success rates exceeding 80% with less than 0.1% data poisoning in latent memory attacks on autonomous agents, while leaving benign behavior largely unaffected.[2]

Contextual Learning Traps

A separate attack surface operates entirely within the context window at inference time. If an agent's context includes examples of how to perform a task - few-shot demonstrations - and those examples have been crafted by an attacker, the agent's behavior on the current task can be steered without any explicit instruction and without touching memory stores or RAG indices. This requires no persistent access and leaves no trace in stored context. Treat any externally-sourced few-shot examples with the same skepticism applied to any other untrusted input.[2]

References

  1. OWASP Top 10 for Agentic Applications 2026 - ASI06: Memory and Context Poisoning. OWASP GenAI Security Project. owasp.org
  2. Franklin, M., Tomasev, N., Jacobs, J., Leibo, J.Z., and Osindero, S. "AI Agent Traps." SSRN preprint. Google DeepMind, 2025.

Need help securing your data pipeline?

We help teams harden RAG systems, design memory architectures, and lock down data access patterns for agentic AI. If your agents touch real data, we should talk.

Get in touch

Agentic AI Security Guide V1.2 · Changelog