Infrastructure & Sandboxing

Infrastructure security provides the foundation for all other controls. Isolate components, harden containers, and leverage cloud-native security features.

10.1 Execution Isolation

Differentiate between two classes of workloads:

Standard Services

Examples: the orchestrator, the model gateway, and most tools.

Container best practices:
- Non-root users
- Dropped Linux capabilities
- Read-only root file systems where possible
- Regular image scanning and patching
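In Kubernetes, the first three practices correspond to fields of a container's `securityContext`: `runAsNonRoot`/`runAsUser`, `capabilities.drop`, and `readOnlyRootFilesystem`. The sketch below is a minimal, hypothetical `audit_container` check over a container spec written as a Python dict. Image scanning and patching are outside its scope, and in a real cluster an admission controller or policy engine could enforce the same rules at deploy time.

```python
from typing import Any


def audit_container(container: dict[str, Any]) -> list[str]:
    """Return findings for a Kubernetes container spec given as a dict.

    Hypothetical helper: it checks only the three securityContext settings
    that correspond to the container practices listed above.
    """
    findings: list[str] = []
    ctx = container.get("securityContext") or {}
    name = container.get("name", "<unnamed>")

    # Non-root: require runAsNonRoot, or an explicit non-zero runAsUser.
    if not ctx.get("runAsNonRoot") and ctx.get("runAsUser", 0) == 0:
        findings.append(f"{name}: may run as root")

    # Dropped capabilities: expect ALL to be dropped; add back only what is needed.
    dropped = (ctx.get("capabilities") or {}).get("drop", [])
    if "ALL" not in dropped:
        findings.append(f"{name}: capabilities not dropped (expected drop: [ALL])")

    # Read-only root file system.
    if not ctx.get("readOnlyRootFilesystem"):
        findings.append(f"{name}: root file system is writable")

    return findings


hardened = {
    "name": "orchestrator",
    "image": "registry.example.com/orchestrator:1.0",  # hypothetical image
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 10001,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
        "readOnlyRootFilesystem": True,
    },
}
print(audit_container(hardened))                 # []
print(audit_container({"name": "legacy-tool"}))  # three findings
```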

High-Risk Tools

Examples: code execution, parsing of untrusted documents and binaries, and browser automation.

Extra isolation:
- Sandboxed runtimes such as gVisor (a user-space kernel) or microVM-based isolation such as Firecracker or Kata Containers
- No network access by default; selectively enable if necessary
- Strict CPU, memory, and time limits to prevent DoS
- Ephemeral environments purged after each run
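Process-level limits complement, but do not replace, a gVisor or microVM boundary. As a minimal sketch, assuming a POSIX host and a Python snippet as the untrusted payload, the hypothetical `run_untrusted_python` helper below applies CPU and memory `rlimit`s, a wall-clock timeout, a minimal environment, and a throwaway working directory. It does not restrict network access or system calls; those still require the sandbox runtime and network policy.

```python
import os
import resource
import subprocess
import sys
import tempfile


def _apply_limits(cpu_seconds: int, memory_bytes: int):
    """Return a preexec_fn that sets rlimits in the child before exec (POSIX only)."""
    def apply() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
    return apply


def run_untrusted_python(code: str, cpu_seconds: int = 2,
                         memory_mb: int = 512, wall_seconds: float = 5.0) -> dict:
    """Run a Python snippet in a throwaway directory with CPU, memory, and time caps."""
    # Ephemeral working directory: removed, with anything the code wrote, on exit.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site-packages).
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                # Do not inherit the parent's environment (it may hold credentials).
                env={"PATH": os.defpath},
                capture_output=True,
                text=True,
                timeout=wall_seconds,  # wall-clock cap; the child is killed on expiry
                preexec_fn=_apply_limits(cpu_seconds, memory_mb * 1024 * 1024),
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "returncode": None, "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "returncode": proc.returncode,
            "stdout": proc.stdout, "stderr": proc.stderr}


print(run_untrusted_python("print(6 * 7)")["stdout"])  # 42
print(run_untrusted_python("import time; time.sleep(10)", wall_seconds=1)["status"])  # timeout
```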

10.2 Kubernetes and Service Mesh

Use namespaces to separate:
- Agent workloads
- Core services
- Tool services
Apply NetworkPolicies (or service mesh authorization policies) so that:
- Only approved services can call the model gateway and tools.
- Agents cannot connect directly to databases or internal admin services.
Use service-to-service authentication (mTLS, JWTs) for all internal calls.
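As an illustration of the first NetworkPolicy rule, the sketch below builds a policy that admits traffic to the model gateway only from the agent namespace. The namespace names, the `app: model-gateway` label, and port 8443 are hypothetical; namespaces are matched on the `kubernetes.io/metadata.name` label that current Kubernetes versions set automatically. NetworkPolicies take effect only if the cluster's network plugin enforces them, and keeping agents away from databases would need a matching `Egress` policy on the agent namespace.

```python
import json


def allow_ingress_from_namespaces(name: str, namespace: str, target_labels: dict[str, str],
                                  allowed_namespaces: list[str], port: int) -> dict:
    """Build a NetworkPolicy admitting traffic to the target pods only from listed namespaces."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods selected here become isolated for ingress: anything not allowed below is denied.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Peers in "from" are ORed: any listed namespace may connect.
                "from": [
                    {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": ns}}}
                    for ns in allowed_namespaces
                ],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


gateway_policy = allow_ingress_from_namespaces(
    name="model-gateway-ingress",        # hypothetical names throughout
    namespace="core-services",
    target_labels={"app": "model-gateway"},
    allowed_namespaces=["agent-workloads"],
    port=8443,
)
print(json.dumps(gateway_policy, indent=2))  # JSON can be piped to: kubectl apply -f -
```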

10.3 Model Gateway and Plane Segregation

Introduce a model gateway that centralizes:

Provider credentials
Rate limiting
Request and response logging
Allowlist of services permitted to call models
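A minimal sketch of the allowlist, rate limiting, and logging duties, assuming the caller identity has already been authenticated (for example from an mTLS certificate) rather than self-declared. `GatewayAdmission` and its parameters are hypothetical, and a per-caller token bucket is one common rate-limiting choice, not a requirement. Credential handling is omitted: the gateway would attach the provider key only after `admit` returns `True`, so callers never hold it.

```python
import logging
import time
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("model_gateway")


@dataclass
class TokenBucket:
    """Token bucket: refills `rate` tokens per second, allows bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class GatewayAdmission:
    """Hypothetical gateway front door: caller allowlist, per-caller rate limit, logging."""

    def __init__(self, allowed_callers: set[str], rate: float, burst: int) -> None:
        self.allowed_callers = allowed_callers
        self.rate, self.burst = rate, burst
        self.buckets: dict[str, TokenBucket] = {}

    def admit(self, caller: str, model: str) -> bool:
        if caller not in self.allowed_callers:
            log.warning("deny caller=%s model=%s reason=not_allowlisted", caller, model)
            return False
        bucket = self.buckets.get(caller)
        if bucket is None:
            bucket = self.buckets[caller] = TokenBucket(self.rate, self.burst)
        if not bucket.allow():
            log.warning("deny caller=%s model=%s reason=rate_limited", caller, model)
            return False
        log.info("allow caller=%s model=%s", caller, model)
        return True


gate = GatewayAdmission(allowed_callers={"orchestrator"}, rate=1.0, burst=2)
print([gate.admit("orchestrator", "model-a") for _ in range(3)])  # [True, True, False]
print(gate.admit("unknown-service", "model-a"))                  # False
```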

Segregate:

Control plane: Orchestration, policies, configuration, governance
Data plane: Inference traffic, tool invocation, data I/O

Restrict control plane APIs to a small set of admin services and teams, and audit every change.
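A minimal sketch of plane segregation at the API layer, assuming control-plane endpoints live under a hypothetical `/admin/` prefix and callers are already authenticated. Only members of a small admin set may use control-plane paths, and every control-plane change, along with every denied attempt, is written to an audit log. A plain list stands in for what should be an append-only, tamper-evident store.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    caller: str  # authenticated identity (e.g. from mTLS or a verified JWT)
    method: str
    path: str


# Hypothetical prefix and admin set; adapt to your own API layout.
CONTROL_PLANE_PREFIX = "/admin/"
CONTROL_PLANE_ADMINS = frozenset({"platform-admin-svc", "ai-governance-team"})
MUTATING_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})


def authorize(req: Request, audit_log: list[str]) -> bool:
    """Gate control-plane paths to admins and audit control-plane changes and denials."""
    if not req.path.startswith(CONTROL_PLANE_PREFIX):
        # Data plane: not this check's concern; ordinary per-service authZ applies (not shown).
        return True
    allowed = req.caller in CONTROL_PLANE_ADMINS
    if req.method in MUTATING_METHODS or not allowed:
        audit_log.append(json.dumps({
            "ts": time.time(), "caller": req.caller, "method": req.method,
            "path": req.path, "allowed": allowed,
        }))
    return allowed


audit: list[str] = []
print(authorize(Request("ai-governance-team", "PUT", "/admin/policies/tools"), audit))  # True
print(authorize(Request("agent-7", "PUT", "/admin/policies/tools"), audit))             # False
print(authorize(Request("agent-7", "POST", "/v1/chat"), audit))                          # True
print(len(audit))  # 2
```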

10.4 Supply Chain and Model Provenance

Maintain SBOMs for:
- Base images
- Key libraries and frameworks (LLM SDKs, vector DBs, guardrail engines)
Regularly scan for vulnerabilities and outdated components.
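SBOMs in the CycloneDX JSON format list dependencies under a top-level `components` array with `name` and `version` fields. The sketch below, assuming such a file (for example one produced by a tool such as Syft or Trivy), flags components that match a hard-coded advisory set. The package names and `KNOWN_BAD` entries are hypothetical placeholders; real scanners match version ranges against a vulnerability database and also walk nested components, which this exact-match sketch does not.

```python
import json

# Hypothetical advisory entries: (component name, affected version).
# In practice these come from a scanner or vulnerability database.
KNOWN_BAD = {("example-llm-sdk", "0.9.1"), ("example-vector-db-client", "2.0.0")}


def flag_components(sbom_json: str) -> list[str]:
    """Return 'name@version' for top-level SBOM components matching a known-bad entry."""
    sbom = json.loads(sbom_json)
    flagged = []
    for comp in sbom.get("components", []):
        key = (comp.get("name"), comp.get("version"))
        if key in KNOWN_BAD:
            flagged.append(f"{key[0]}@{key[1]}")
    return flagged


sbom = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "example-llm-sdk", "version": "0.9.1"},
        {"type": "library", "name": "example-guardrails", "version": "1.4.0"},
    ],
})
print(flag_components(sbom))  # ['example-llm-sdk@0.9.1']
```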
Track model versions:
- Provider, model name, version
- Training policies (as disclosed), model cards, evaluation results
Correlate behavioral changes with model or framework updates.
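Correlating behavior with updates is easier when every deployment records model and framework versions in a structured form. The `ModelRecord` fields and the provider, model, and SDK names below are hypothetical; the sketch simply diffs two records so that a shift in an evaluation metric can be annotated with what changed between deployments.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRecord:
    provider: str
    model: str
    version: str
    framework: str          # e.g. agent framework or LLM SDK name
    framework_version: str


def diff_records(before: ModelRecord, after: ModelRecord) -> dict[str, tuple[str, str]]:
    """Return {field: (old, new)} for every field that changed between deployments."""
    old, new = asdict(before), asdict(after)
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


def annotate_metric_shift(metric: str, before: ModelRecord, after: ModelRecord) -> str:
    changes = diff_records(before, after)
    if not changes:
        return f"{metric} shifted with no model or framework change; investigate other causes"
    detail = ", ".join(f"{k}: {a} -> {b}" for k, (a, b) in changes.items())
    return f"{metric} shifted after: {detail}"


v1 = ModelRecord("example-provider", "example-model", "2024-05-01", "example-agent-sdk", "1.2.0")
v2 = ModelRecord("example-provider", "example-model", "2024-08-01", "example-agent-sdk", "1.2.0")
print(annotate_metric_shift("tool-call refusal rate", v1, v2))
# tool-call refusal rate shifted after: version: 2024-05-01 -> 2024-08-01
```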

10.5 Cloud Provider-Specific Recommendations

When deploying agentic AI systems on major cloud platforms, leverage provider-native services that align with security best practices.

Azure

Identity and Access Management

Microsoft Entra ID (formerly Azure Active Directory) for user authentication with Conditional Access policies
Managed Identities for agent service accounts (avoid storing credentials)
Azure RBAC for resource-level access control
Privileged Identity Management (PIM) for just-in-time admin access

Secrets Management

Azure Key Vault with private endpoints, the Azure RBAC permission model (rather than legacy vault access policies), and automated key and secret rotation

Container Orchestration

Azure Kubernetes Service (AKS) with Azure Network Policies, Azure Policy for Kubernetes, Workload Identity, and confidential containers for sensitive workloads

Network Security

Azure Virtual Network with NSGs, Azure Firewall, Private Endpoints for all PaaS services
Azure Private Link for secure access to Azure services

Model Gateway

Azure API Management with OAuth 2.0/JWT validation, rate limiting, request/response logging, and private VNet integration
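Azure API Management performs token checks declaratively through its `validate-jwt` policy, so the following is only a sketch of what such a check verifies: signature, expiry, issuer, and audience. To stay self-contained it assumes an HS256 shared secret; identity providers such as Microsoft Entra ID typically sign with asymmetric keys published through a JWKS endpoint, which this code does not handle. The issuer URL, audience, and key are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_hs256_jwt(claims: dict, key: bytes) -> str:
    """Demo helper only: mint a token the way an identity provider would."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def validate_hs256_jwt(token: str, key: bytes, issuer: str, audience: str) -> dict:
    """Verify signature, exp, iss, and aud; return the claims or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":  # never let the token choose its own algorithm
        raise ValueError("unexpected alg")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():  # a missing exp is treated as expired
        raise ValueError("expired")
    if claims.get("iss") != issuer:
        raise ValueError("wrong issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("wrong audience")
    return claims


key = b"demo-shared-secret"  # hypothetical demo key
tok = make_hs256_jwt({"iss": "https://idp.example.com", "aud": "model-gateway",
                      "exp": time.time() + 300}, key)
print(validate_hs256_jwt(tok, key, "https://idp.example.com", "model-gateway")["aud"])  # model-gateway
```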

AWS

Identity and Access Management

AWS IAM Identity Center for user authentication
IAM roles with least-privilege policies, service control policies (SCPs), and permission boundaries
AWS IAM Access Analyzer to identify overly permissive policies
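As an illustration of least privilege, the sketch below defines a policy that allows reading exactly one secret, plus a crude check that flags `*` wildcards in `Allow` statements, a small subset of the over-permissiveness findings that IAM Access Analyzer reports. The secret ARN uses AWS's documentation account ID and a hypothetical secret name; `wildcard_findings` ignores `NotAction`, `NotResource`, and conditions, so it is far narrower than Access Analyzer.

```python
# Hypothetical ARN of the one secret the agent service needs.
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:agent/model-gateway-key-AbCdEf"

least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue"],
        "Resource": [SECRET_ARN],
    }],
}


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements whose Action or Resource contains a '*' wildcard."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be written as an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for element in ("Action", "Resource"):
            values = stmt.get(element, [])
            if isinstance(values, str):  # both elements accept a string or a list
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append(f"Statement[{i}].{element}: {value}")
    return findings


print(wildcard_findings(least_privilege_policy))  # []
print(wildcard_findings({"Version": "2012-10-17",
                         "Statement": {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}}))
# ['Statement[0].Action: s3:*', 'Statement[0].Resource: *']
```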

Secrets Management

AWS Secrets Manager with automatic rotation and VPC endpoints

Container Orchestration

Amazon EKS with EKS Pod Identity (or IAM Roles for Service Accounts), Kubernetes network policies enforced by the Amazon VPC CNI or Calico, Security Groups for Pods, and AWS Fargate for serverless pods

Network Security

Amazon VPC with Security Groups, Network ACLs, VPC endpoints for AWS services
AWS Network Firewall and AWS WAF for advanced filtering

Model Gateway

Amazon API Gateway with IAM authorization, usage plans, and VPC Link for private integration

Google Cloud Platform (GCP)

Identity and Access Management

Google Cloud Identity for user authentication
Service Accounts with Workload Identity for GKE pods
IAM Conditions for fine-grained access control
VPC Service Controls for data perimeter enforcement

Secrets Management

Google Secret Manager with IAM-based access control and versioning

Container Orchestration

Google Kubernetes Engine (GKE) with Workload Identity, Network Policies, Binary Authorization, and GKE Autopilot
Cloud Run for stateless container workloads

Network Security

VPC with firewall rules, Private Google Access, VPC Service Controls
Cloud Armor for DDoS protection and WAF

Model Gateway

Cloud Endpoints or Apigee with service-to-service authentication, rate limiting, and Cloud Armor integration

Cross-Cloud Considerations

When operating across multiple clouds:

Unified Identity: Use OIDC/SAML federation; consider HashiCorp Vault for cross-cloud secrets
Network Connectivity: AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect for private connectivity
Observability: Centralize logs in a SIEM with consistent formatting and correlation IDs
Data Residency: Clearly define which regions handle what data classifications
Disaster Recovery: Multi-region within a single cloud first; multi-cloud for critical systems with regular DR drills
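Consistent formatting and correlation IDs are easiest to enforce through a shared logging library. A minimal sketch using only Python's standard library: a `JsonFormatter` emits one JSON object per line with a fixed field set, and a `contextvars` variable carries the correlation ID through a request. The field names and the `service`/`cloud` labels are hypothetical, and propagating the ID between services (for example in a request header) is assumed rather than shown.

```python
import contextvars
import json
import logging
import uuid
from datetime import datetime, timezone
from typing import Optional

# Correlation ID for the current request; isolated per thread and per asyncio task.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with the same fields in every service."""

    def __init__(self, service: str, cloud: str) -> None:
        super().__init__()
        self.service, self.cloud = service, cloud

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "cloud": self.cloud,
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
        })


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    # Reuse a propagated ID if one arrived; otherwise mint a new one.
    cid = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("tool invocation started")
        logger.info("tool invocation finished")
    finally:
        correlation_id.reset(token)
    return cid


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="orchestrator", cloud="aws"))
logger = logging.getLogger("agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines via the root logger
handle_request(logger, incoming_id="req-1234")
```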