Case study
Testing Microsoft 365 Copilot for AI security risks
Since January 2024, Microsoft has selected Casaba Security to perform ongoing security assessments of Copilot AI assistants across the M365 product suite. Our work has spanned multiple engagements over two years, testing for AI-specific vulnerabilities including prompt injection, data exfiltration, and responsible AI violations.
The full reports are publicly available on Microsoft's Service Trust Portal.
Overview
Two years of AI security testing at scale
Microsoft Copilot is one of the largest AI deployments in the world, embedded across Word, Excel, PowerPoint, Outlook, Teams, and other M365 applications. Casaba's engagement covers the security of these Copilot implementations from both AI-specific and traditional application security perspectives.
Over 2024 and 2025, our dedicated AI security testing team assessed Copilot across dozens of implementations, web applications, and desktop applications within the M365 suite. This is not a one-time audit - it is an ongoing security partnership testing evolving AI features as Microsoft ships them.
2024 engagement
M365 Copilot vulnerability assessment
In our initial 2024 engagement, we assessed the security of Microsoft 365 Copilot, testing for both AI-specific and traditional security vulnerabilities across the Copilot experience.
The engagement included automated fuzzing and manual prompt injection testing to evaluate the system's resistance to cross-prompt injection attacks, data exfiltration, and responsible AI violations. We validated behavior across multiple RAI harm categories and applied Microsoft's Vulnerability Severity Classification for AI Systems.
Prompt injection testing
Direct and indirect prompt injection attacks, including cross-prompt injection (XPIA) where external data sources attempt to influence Copilot behavior.
Data exfiltration testing
Evaluating whether Copilot can be manipulated into leaking sensitive information from documents, emails, or other M365 data sources.
Responsible AI evaluation
Testing against multiple RAI harm categories to ensure Copilot behaves safely, ethically, and within its intended boundaries.
2025 engagement
Expanded scope across 19 Copilot implementations
The 2025 engagement expanded significantly. Over a nine-month assessment, Casaba tested 19 Copilot implementations, 24 web applications, and 5 desktop applications within the M365 suite.
Our dedicated AI security testing team used custom in-house LLM test automation alongside manual testing to evaluate both traditional web application security and AI-specific attack surfaces. The work included custom protocol test harness development, multi-layered test execution across AI and traditional attack surfaces, and data flow analysis tracing user inputs across dozens of dispersed components.
19 Copilot implementations
Testing AI assistants across the full M365 product suite - Word, Excel, PowerPoint, Outlook, Teams, and more.
Custom LLM test automation
Purpose-built tooling for automated prompt injection and jailbreaking tests, running alongside manual expert testing.
24 web + 5 desktop applications
Traditional application security testing of the web and desktop surfaces where Copilot features are exposed to users.
How we test
Our approach to AI security testing
Casaba's AI security testing methodology goes beyond automated prompt scanning. We combine black-box adversarial testing with code-level analysis, architecture review, and deep collaboration with engineering teams. Our approach is informed by the OWASP Top 10 for LLM Applications and our own research from years of testing AI systems.
We build and maintain custom testing tools - including LLM-vs-LLM attack automation - and continuously update our methods as new research and attack techniques emerge. Our Nemesis platform powers the code analysis and finding triage behind these engagements.
Learn more about our AI and LLM security testing services.
Published reports
Read the reports
These reports are published by Microsoft on the Service Trust Portal, which requires an M365 account to access. They are summary versions prepared for public distribution.
M365 Copilot Vulnerability Assessment (2025)
Nine-month assessment covering 19 Copilot implementations, 24 web applications, and 5 desktop applications.
View on Service Trust Portal →M365 Copilot Vulnerability Assessment (2024)
AI-specific and traditional security testing of the M365 Copilot experience, including prompt injection and RAI evaluation.
View on Service Trust Portal →Need your AI product tested?
We've tested the largest AI deployments in the world. Let's talk about what your system needs.
Get in touch