Secure SDLC & Testing

Security must be built in from the start, not bolted on later. Integrate AI-specific security practices throughout your development lifecycle.

12.1 AI-Aware Secure Development Lifecycle

Integrate AI considerations into your existing SDL.

At Requirements & Design

Perform AI risk assessment for each agent.
Threat model using:
- Traditional methods (STRIDE, data-flow diagrams)
- AI-specific risks (prompt injection, RAG poisoning, tool misuse, cross-tenant leakage).
Identify:
- Data classifications to be processed
- Tools and external APIs
- RAI harm categories and domain-specific constraints

At Implementation

Apply secure coding practices across stack.
Implement:
- Orchestrator policies
- Guardrails
- Data minimization
- Tool schemas and allowlists
Peer reviews that include:
- Security engineer
- AI/ML-aware engineer where possible

At Testing

Standard security tests:
- SAST, DAST, dependency scanning, container image scanning
AI-specific tests:
- Prompt injection and jailbreak attempts
- Tool misuse and exfiltration attempts
- Cross-tenant access
- RAI harm tests (toxicity, bias, hallucinations under realistic prompts)

At Deployment

Security sign-off for agents with elevated risk.
Gradual rollout (canary, feature flags).
Monitoring and alerting configured.

In Production

Continuous monitoring and periodic re-assessment.
Regular review of logs, metrics, and incidents.
Update models, prompts, guardrails, and policies as threats evolve.

12.2 Automated AI Security Testing

Develop automated suites covering:

Prompt Injection / RAG Attacks

Test both:

Direct user prompts ("ignore all previous instructions and…")
Indirect sources (poisoned RAG docs, tool outputs).

Tool Authorization

Verify forbidden tools cannot be called from an agent.
Check that tool parameters outside allowed ranges are rejected.

Data Isolation

Ensure tenant A cannot retrieve data belonging to tenant B.

RAI Behaviors

Seed tests for harmful prompts in your domain.
Confirm that guardrails block/modify outputs appropriately.

Integrate into CI/CD so regressions are caught early.

12.3 Adversarial Red Teaming

Supplement automated testing with targeted, manual adversarial exercises:

Simulate realistic attackers with:
- Long-term memory poisoning strategies
- Attempts to chain tools for exfiltration
- Privilege escalation via inter-agent messaging
Include both:
- Security-focused red team
- Domain experts (to spot subtle RAI harms)

Use findings to:

Harden prompts, policies, tools, and RAG pipelines.
Expand your automated test library.

12.4 Continuous Security and RAI Evaluation

Adopt a holistic, continuous evaluation approach that combines multiple assessment types:

Security Assessments

Traditional penetration testing
Indirect prompt injection (XPIA) testing via RAG and tool outputs
Authorization and data isolation validation

RAI Harm Evaluation

Regular testing against defined harm categories
Domain-specific risk assessments

Privacy Impact Reviews

PII handling and data minimization audits
Compliance with data protection regulations

Reliability and Robustness Metrics

Accuracy and consistency testing
Performance under adversarial conditions

Key Metrics to Track

Security

Blocked injection attempts / total attempts
Unauthorized tool calls prevented
Cross-tenant access attempts blocked

RAI

Toxic/harmful output rate
Hallucination rate
Bias indicators (where applicable)

Privacy

PII detection/redaction accuracy
Data minimization scores

Reliability

Accuracy benchmarks (per use case)
Consistency across runs
Uptime and error rates

Feed these metrics back into product and security planning.

This consolidated guide should serve as the baseline for agentic AI architecture, implementation, and governance. For each new agent or platform, instantiate these principles with concrete designs, threat models, and policies, and keep living documents as the threat landscape evolves.