Infrastructure & Sandboxing

Infrastructure security provides the foundation for all other controls. Isolate components, harden containers, and leverage cloud-native security features.

10.1 Execution Isolation

Differentiate between two classes of workload:

Standard Services

The orchestrator, the model gateway, and most tools:

  • Container best practices:
    • Non-root users
    • Dropped Linux capabilities
    • Read-only root file systems where possible
    • Regular image scanning and patching
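
The first three items in this checklist map onto fields of a Kubernetes container securityContext, so they can be checked mechanically before deployment. The following is a minimal sketch, assuming the container spec has already been loaded into a Python dict; the function name container_findings and the sample specs are hypothetical and only illustrate the idea. Image scanning and patching are not covered here, since they are not visible in the spec.

```python
from typing import Any


def container_findings(container: dict[str, Any]) -> list[str]:
    """Return hardening problems for one container spec (empty list means OK)."""
    sc = container.get("securityContext", {})
    findings = []
    # Non-root user
    if sc.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")
    # Dropped Linux capabilities
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        findings.append("capabilities.drop does not include ALL")
    # Read-only root file system
    if sc.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not set to true")
    return findings


# Hypothetical container specs, as they would appear under spec.containers[].
hardened = {
    "name": "orchestrator",
    "securityContext": {
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
        "readOnlyRootFilesystem": True,
    },
}
unhardened = {"name": "some-tool"}

for spec in (hardened, unhardened):
    print(spec["name"], container_findings(spec))
```

A check like this could run in CI against rendered manifests; an admission policy in the cluster is a stronger control because it cannot be bypassed by skipping the pipeline.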

High-Risk Tools

Code execution, document parsing of untrusted binaries, browser automation:

  • Extra isolation:
    • A sandboxed runtime: gVisor (user-space kernel), Firecracker or Kata Containers (lightweight VMs), or similar
    • No network access by default; selectively enable if necessary
    • Strict CPU, memory, and time limits to prevent DoS
    • Ephemeral environments purged after each run
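
As one possible illustration of these defaults, the sketch below builds a docker run command line from a small policy object. It assumes Docker is the container engine and that gVisor's runsc runtime is installed and registered on the host; SandboxPolicy, docker_argv, and the image name are hypothetical names chosen for this example. The code only constructs the argument list and does not start a container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    image: str
    runtime: str = "runsc"        # gVisor runtime; assumes it is installed on the host
    allow_network: bool = False   # no network unless explicitly enabled
    memory: str = "256m"
    cpus: str = "0.5"
    pids_limit: int = 64
    timeout_seconds: int = 30     # enforced by the caller, see the note below


def docker_argv(policy: SandboxPolicy, command: list[str]) -> list[str]:
    """Build a `docker run` argument list for one isolated, throwaway run."""
    argv = [
        "docker", "run",
        "--rm",                                # remove the container after it exits
        f"--runtime={policy.runtime}",
        f"--memory={policy.memory}",
        f"--cpus={policy.cpus}",
        f"--pids-limit={policy.pids_limit}",
        "--read-only",
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
    ]
    if not policy.allow_network:
        argv.append("--network=none")
    return argv + [policy.image, *command]


# Placeholder image and command for illustration only.
policy = SandboxPolicy(image="python:3.12-slim")
print(" ".join(docker_argv(policy, ["python", "-c", "print(1)"])))
```

The time limit is not a Docker flag in this sketch; a caller would typically pass timeout_seconds to subprocess.run(argv, timeout=...) and remove the container if the timeout fires.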

10.2 Kubernetes and Service Mesh

  • Use namespaces to separate:
    • Agent workloads
    • Core services
    • Tool services
  • Apply NetworkPolicies (or service mesh authZ) so:
    • Only approved services can call the model gateway and tools.
    • Agents cannot directly talk to databases or internal admin services.
  • Use service-to-service authN (mTLS, JWTs) for internal calls.
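
The NetworkPolicy rule above can be sketched as a manifest generated from a list of approved namespaces. This is a minimal example, assuming the model gateway runs in a namespace called core with the pod label app=model-gateway; those names, and the function gateway_ingress_policy, are hypothetical. It relies on the kubernetes.io/metadata.name label that Kubernetes sets on namespaces automatically.

```python
import json


def gateway_ingress_policy(
    gateway_namespace: str,
    gateway_labels: dict[str, str],
    allowed_namespaces: list[str],
) -> dict:
    """Build a NetworkPolicy that admits ingress only from the listed namespaces."""
    sources = [
        {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": ns}}}
        for ns in allowed_namespaces
    ]
    # An ingress rule with an empty `from` matches every source, so an empty
    # allowlist must produce no rules at all (deny all ingress) instead.
    ingress = [{"from": sources}] if sources else []
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "model-gateway-ingress", "namespace": gateway_namespace},
        "spec": {
            "podSelector": {"matchLabels": gateway_labels},
            "policyTypes": ["Ingress"],
            "ingress": ingress,
        },
    }


policy = gateway_ingress_policy("core", {"app": "model-gateway"}, ["agents"])
print(json.dumps(policy, indent=2))  # JSON output can be passed to `kubectl apply -f -`
```

NetworkPolicies are only enforced when the cluster's network plugin supports them, so a policy like this is worth verifying with a connection test from a namespace that should be blocked.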

10.3 Model Gateway and Plane Segregation

Introduce a model gateway that centralizes:

  • Provider credentials
  • Rate limiting
  • Request and response logging
  • Allowlist of services allowed to call models
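
To make these responsibilities concrete, here is a minimal in-process sketch of a gateway that applies a caller allowlist, a per-caller token-bucket rate limit, and request/response logging. All names (ModelGateway, TokenBucket, call_provider) are hypothetical, and the caller identity is a plain string here; in a real deployment it would come from an authenticated channel such as mTLS or a validated JWT.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger("model_gateway")


@dataclass
class TokenBucket:
    rate_per_second: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ModelGateway:
    def __init__(self, allowed_callers: set[str], call_provider: Callable[[str], str],
                 rate_per_second: float = 5.0, burst: float = 10) -> None:
        self._allowed = set(allowed_callers)
        # call_provider wraps the provider credentials; callers never see them.
        self._call_provider = call_provider
        self._rate = rate_per_second
        self._burst = burst
        self._buckets: dict[str, TokenBucket] = {}

    def handle(self, caller: str, prompt: str) -> str:
        if caller not in self._allowed:
            log.warning("denied caller=%s reason=not_allowlisted", caller)
            raise PermissionError(f"{caller} may not call models")
        bucket = self._buckets.setdefault(caller, TokenBucket(self._rate, self._burst))
        if not bucket.allow():
            log.warning("denied caller=%s reason=rate_limited", caller)
            raise RuntimeError("rate limit exceeded")
        log.info("request caller=%s prompt_chars=%d", caller, len(prompt))
        response = self._call_provider(prompt)
        log.info("response caller=%s response_chars=%d", caller, len(response))
        return response
```

The sketch logs sizes rather than full prompt and response bodies; whether to log content in full is a policy decision that depends on the data classifications involved.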

Segregate:

  • Control plane: Orchestration, policies, configuration, governance
  • Data plane: Inference traffic, tool invocation, data I/O

Restrict control plane APIs to a small set of admin services and teams, and audit every change.

10.4 Supply Chain and Model Provenance

  • Maintain SBOMs for:
    • Base images
    • Key libraries and frameworks (LLM SDKs, vector DBs, guardrail engines)
  • Regularly scan for vulnerabilities and outdated components.
  • Track model versions:
    • Provider, model name, version
    • Training policies (as disclosed), model cards, evaluation results
  • Correlate behavioral changes with model or framework updates.
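
Correlating a behavioral change with an update requires knowing which model version was live at a given moment. The sketch below shows one way to record that, assuming releases are tracked as simple records with a deployment timestamp; ModelRelease, release_active_at, and every provider, model, and version value are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ModelRelease:
    provider: str
    model_name: str
    version: str
    deployed_at: datetime
    model_card_url: Optional[str] = None   # link to the provider's disclosure, if any


def release_active_at(releases: list[ModelRelease], when: datetime) -> Optional[ModelRelease]:
    """Return the most recent release deployed at or before `when`, if any."""
    candidates = [r for r in releases if r.deployed_at <= when]
    return max(candidates, key=lambda r: r.deployed_at, default=None)


# Placeholder history for illustration only.
history = [
    ModelRelease("example-provider", "example-model", "v1",
                 datetime(2024, 1, 10, tzinfo=timezone.utc)),
    ModelRelease("example-provider", "example-model", "v2",
                 datetime(2024, 3, 5, tzinfo=timezone.utc)),
]

# An evaluation regression was first observed at this time; which release was live?
incident_time = datetime(2024, 3, 6, 14, 0, tzinfo=timezone.utc)
active = release_active_at(history, incident_time)
print(active.version if active else "no release deployed yet")
```

The same lookup could be applied to framework and guardrail versions, so that an incident timeline shows every component that changed shortly before the behavior did.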

10.5 Cloud Provider-Specific Recommendations

When deploying agentic AI systems on major cloud platforms, leverage provider-native services that align with security best practices.

Azure

Identity and Access Management

  • Microsoft Entra ID (formerly Azure Active Directory) for user authentication with Conditional Access policies
  • Managed Identities for agent service accounts (avoid storing credentials)
  • Azure RBAC for resource-level access control
  • Privileged Identity Management (PIM) for just-in-time admin access

Secrets Management

  • Azure Key Vault with private endpoints, Azure RBAC authorization (rather than legacy vault access policies), and automated key rotation

Container Orchestration

  • Azure Kubernetes Service (AKS) with Azure Network Policies, Azure Policy for Kubernetes, Workload Identity, and confidential containers for sensitive workloads

Network Security

  • Azure Virtual Network with NSGs, Azure Firewall, Private Endpoints for all PaaS services
  • Azure Private Link for secure access to Azure services

Model Gateway

  • Azure API Management with OAuth 2.0/JWT validation, rate limiting, request/response logging, and private VNET integration

AWS

Identity and Access Management

  • AWS IAM Identity Center for user authentication
  • IAM Roles with least-privilege policies, SCPs, and permission boundaries
  • AWS IAM Access Analyzer to identify overly permissive policies

Secrets Management

  • AWS Secrets Manager with automatic rotation and VPC endpoints

Container Orchestration

  • Amazon EKS with EKS Pod Identity, network policies (Calico or the Amazon VPC CNI), security groups for Pods, and Fargate for serverless pods

Network Security

  • Amazon VPC with Security Groups, Network ACLs, VPC endpoints for AWS services
  • AWS Network Firewall and AWS WAF for advanced filtering

Model Gateway

  • Amazon API Gateway with IAM authorization, usage plans, VPC Link for private integration

Google Cloud Platform (GCP)

Identity and Access Management

  • Google Cloud Identity for user authentication
  • Service Accounts with Workload Identity for GKE pods
  • IAM Conditions for fine-grained access control
  • VPC Service Controls for data perimeter enforcement

Secrets Management

  • Google Secret Manager with IAM-based access control and versioning

Container Orchestration

  • Google Kubernetes Engine (GKE) with Workload Identity, Network Policies, Binary Authorization, and GKE Autopilot
  • Cloud Run for stateless container workloads

Network Security

  • VPC with firewall rules, Private Google Access, VPC Service Controls
  • Cloud Armor for DDoS protection and WAF

Model Gateway

  • Cloud Endpoints or Apigee with service-to-service authentication, rate limiting, and Cloud Armor integration

Cross-Cloud Considerations

When operating across multiple clouds:

  • Unified Identity: Use OIDC/SAML federation; consider HashiCorp Vault for cross-cloud secrets
  • Network Connectivity: AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect for private connectivity
  • Observability: Centralize logs in a SIEM with consistent formatting and correlation IDs
  • Data Residency: Clearly define which regions handle what data classifications
  • Disaster Recovery: Start with multi-region within a single cloud; reserve multi-cloud for critical systems, and run regular DR drills
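
For the Observability item, consistent formatting usually means that every service, in every cloud, emits the same structured fields. The following is a minimal sketch using Python's standard logging module; the field names (cloud, correlation_id) and the JsonFormatter class are assumptions for illustration, not a schema required by any particular SIEM.

```python
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with a fixed set of fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "service": record.name,
            # Values passed through `extra=` become attributes on the record.
            "cloud": getattr(record, "cloud", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("agent-orchestrator")   # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# One correlation ID is generated per user request and passed to every
# downstream call, so events from different clouds can be joined later.
correlation_id = str(uuid.uuid4())
logger.info("tool call started", extra={"cloud": "aws", "correlation_id": correlation_id})
logger.info("model call finished", extra={"cloud": "azure", "correlation_id": correlation_id})
```

Propagating the ID across service boundaries (for example in a request header) is what makes the correlation work; the formatter only guarantees that the field is present and named consistently.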