Technology

Nemesis

An AI-native security testing platform with expert-guided agentic workflows

Nemesis centralizes engagement knowledge and runs continuous, AI-driven investigation loops across both source code and live targets. Consultants set scope, brief agents, and direct the investigation - Nemesis runs static and dynamic analysis, validates findings, and generates report-ready deliverables. Every decision stays in expert hands.


What Nemesis does

One platform, from scope to report

Investigation agents build plans from project knowledge and consultant instructions, run tests in continuous loops, and surface validated findings for review. Consultants test in parallel, and their results feed back into the same workspace. The whole engagement, from agent testing to hands-on expert work, runs in one platform.

Test Plan

AI-generated test plans

Nemesis generates a structured test plan from workspace context. Each task includes specific steps, expected behavior, and a security hypothesis - giving consultants a clear starting point and a record of what was tested.

nemesis / test-plan / acme-webapp

TC-01 - Auth token replay (steps: 5)

hypothesis: session tokens accepted after logout

TC-02 - IDOR on /orders/:id (steps: 4)

hypothesis: object refs not scoped to caller

TC-03 - Privilege escalation (steps: 6)

hypothesis: role check missing on admin route
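A test-plan task can be pictured as a small record holding its steps, expected behavior, and security hypothesis. The Python sketch below is illustrative only: the `PlanTask` class, its field names, the expected-behavior wording, and the placeholder step text are assumptions, not Nemesis's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class PlanTask:
    """Hypothetical shape of one test-plan task (not the real Nemesis schema)."""
    case_id: str
    title: str
    hypothesis: str
    expected_behavior: str
    steps: list[str] = field(default_factory=list)

    def summary(self) -> str:
        # Echoes the listing above: id, title, step count, then hypothesis.
        return (f"{self.case_id} - {self.title} (steps: {len(self.steps)})\n"
                f"hypothesis: {self.hypothesis}")


task = PlanTask(
    case_id="TC-01",
    title="Auth token replay",
    hypothesis="session tokens accepted after logout",
    # Assumed wording; the page does not show the expected-behavior field.
    expected_behavior="a token is rejected once its session has been logged out",
    steps=[f"step {n} (placeholder)" for n in range(1, 6)],
)
print(task.summary())
```

Keeping the hypothesis next to the steps is what lets the same record serve as both a starting point and an audit trail of what was tested.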

Orchestration

Transparent investigation loops

Every agent investigation is recorded as a full transcript. Consultants can review exactly what the agent did, what it observed, and what it concluded - before any finding is accepted into the workspace.

nemesis / investigation-transcript

agent> querying call graph for auth entrypoints

agent> 6 paths reach verifySession()

net> POST /api/orders (forged JWT) returns 200

found> auth bypass confirmed via live request

Deliverables

Shared findings for stakeholder review

Findings can be shared with clients or internal reviewers via a scoped link. Each finding includes impact narrative, reproduction steps, and status tracking - no separate document required.

casaba / shared findings / acme-webapp

F-01 Authentication bypass (critical)

F-02 IDOR - order objects (high)

F-03 Hardcoded API key (high)

F-04 Verbose error disclosure (medium)

Reporting

Report generation from findings

Validated findings feed directly into report generation. Nemesis drafts executive summaries, technical details, and reproduction steps - structured to match Casaba's delivery format. Consultants refine the draft rather than rebuild it from scratch.

nemesis / report / executive summary

OVERALL RISK: HIGH

CRITICAL FINDINGS - 3

Code analysis

Agents that investigate, not scanners that flag

Pattern-based SAST tools find only what they are written to match. Nemesis investigation agents reason about code. They review it the way a person would, following call graphs across the codebase and using semantic search over the full repository to find the code relevant to each lead.

Cross-file taint and data flow

Trace untrusted input across modules and call boundaries, following real execution paths, not single-file pattern matches.

Auth, authz, and business logic

Reason about access-control intent and application logic, areas whose flaws pattern-based scanners structurally cannot model.

Chained vulnerabilities and custom protocols

Connect lower-severity issues into realistic attack chains, and analyze custom or undocumented protocols.
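The cross-file taint idea above can be illustrated with a toy reachability search over a call graph. This is a minimal sketch under stated assumptions, not how Nemesis agents work internally: the module and function names, the `CALL_GRAPH` layout, and the source and sink sets are all hypothetical, and real taint analysis also tracks which arguments carry the data and where it is sanitized.

```python
from collections import deque

# Hypothetical call graph: "module.function" -> functions it passes data to.
CALL_GRAPH = {
    "routes.create_order": ["services.build_order"],
    "services.build_order": ["db.run_query", "utils.log_event"],
    "utils.log_event": [],
    "db.run_query": [],
}
SOURCES = {"routes.create_order"}  # assumed to receive untrusted request input
SINKS = {"db.run_query"}           # assumed to execute a query


def tainted_paths(graph, sources, sinks):
    """Return every acyclic call path that leads from a source to a sink."""
    paths = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            paths.append(path)
            continue
        for callee in graph.get(node, []):
            if callee not in path:  # do not revisit a function already on this path
                queue.append(path + [callee])
    return paths


for path in tainted_paths(CALL_GRAPH, SOURCES, SINKS):
    print(" -> ".join(path))
```

The single path reported here crosses three modules, which is the kind of flow a single-file pattern match would miss.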

Runtime testing

Dynamic testing, not just reading code

Nemesis exercises live targets, not only their source. Dynamic analysis runs alongside code review and feeds the same investigation loop, so findings are confirmed against running systems rather than inferred from code alone.

01 / Fuzzing

API fuzzing pipeline

A built-in pipeline drives dictionary-based attacks against target endpoints, surfacing injection, authentication, and input-handling flaws that only appear at runtime.

dictionary attacks - parameter mutation - response analysis

02 / MCP

MCP dynamic testing

Dedicated tests, purpose-built for AI and agent infrastructure, probe live MCP servers for common misconfigurations and unsafe tool behavior.

tool-permission scope - protocol abuse - unsafe defaults

03 / Agents

Agent-driven network testing

Orchestrator-triggered investigation agents make live requests to targets, validating exploitability and capturing evidence the way a consultant would.

live requests - exploit validation - evidence capture
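The dictionary attack, parameter mutation, and response analysis steps named under API fuzzing can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the payload list, the `mutate` and `fuzz` helpers, and the `fake_send` stand-in that replaces a real HTTP client. Response analysis is reduced to comparing status codes against a baseline.

```python
# Hypothetical payload dictionary; a real one would be far larger and curated.
PAYLOADS = ["'", "\" OR 1=1 --", "../../etc/passwd", "A" * 1024]


def mutate(params, payloads):
    """Yield (name, payload, mutated_params), changing one parameter at a time."""
    for name in params:
        for payload in payloads:
            mutated = dict(params)
            mutated[name] = payload
            yield name, payload, mutated


def fuzz(send, params, payloads):
    """Send each mutation and keep the ones whose status differs from baseline."""
    baseline = send(params)
    anomalies = []
    for name, payload, mutated in mutate(params, payloads):
        status = send(mutated)
        if status != baseline:
            anomalies.append((name, payload, status))
    return anomalies


def fake_send(params):
    """Stand-in for an HTTP call: returns 500 when a single quote reaches 'id'."""
    return 500 if "'" in str(params.get("id", "")) else 200


print(fuzz(fake_send, {"id": "42", "note": "hello"}, PAYLOADS))
```

Mutating one parameter at a time keeps each anomaly attributable to a single input, which makes the result easier to reproduce by hand.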

How it works

Static and dynamic, in a single run

The orchestrator runs static analysis, dynamic testing, and agent investigations as one continuous loop, with code-path analysis and live requests feeding the same set of validated findings.

nemesis > orchestrator start --engagement acme-webapp

[static] source artifacts indexed, call graph built, static baseline complete

[static] 12 validated findings, 9 investigation candidates generated

[agent] launching investigation agents against highest-priority paths...

[agent] tracing auth flow across 6 modules via call graph + semantic search

[dyn] API fuzzing against /api/* endpoints, surfacing runtime input-handling flaws

[dyn] probing MCP server for tool-permission misconfigurations

[found] 3 critical findings confirmed with live requests and code-path analysis

[human] consultant reviews and signs off before the report ships

Expert-in-the-loop

Consultants drive it. Nemesis handles the grind.

Nemesis proposes, the consultant decides, and nothing ships to a client without expert review and sign-off.

Setting scope

A human defines what to test, how, and what matters most. Nemesis executes within those boundaries.

Approving pivots

When the loop uncovers new leads, consultants decide which directions to pursue. No new line of investigation starts without approval.

Validating findings

Every finding that reaches a client is reviewed, validated, and contextualized by a human expert.

Signing off

Reports and assessments carry a consultant's judgment, not just a tool's output. Nemesis drafts; humans deliver.

Agents are equipped for the engagement type (white-box, black-box, or MCP/AI testing) with the right tools and tuned prompts for each.

Architecture

Runs anywhere. Data stays with you.

Containerized

Runs as containerized services that deploy to on-prem, cloud, or hybrid environments. No vendor lock-in.

Any cloud

AWS, Azure, GCP, or your own data center. Nemesis adapts to your environment, not the reverse.

Any LLM provider

A centralized gateway routes AI workloads across OpenAI, Azure OpenAI, Gemini, DeepInfra, Baseten, and AWS Bedrock.
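A centralized gateway of this kind could be modeled as a registry of providers plus a per-workload routing table. The sketch below is a guess at the general pattern, not Nemesis's design: the `ModelGateway` class, the ordered-fallback policy, the `triage` workload name, and the stub handlers (which stand in for real provider clients) are all hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


class ModelGateway:
    """Hypothetical gateway: routes each workload to an ordered provider list."""

    def __init__(self) -> None:
        self._providers: Dict[str, Handler] = {}
        self._routes: Dict[str, List[str]] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._providers[name] = handler

    def route(self, workload: str, provider_order: List[str]) -> None:
        self._routes[workload] = provider_order

    def complete(self, workload: str, prompt: str) -> str:
        # Try providers in order; fall through to the next one on failure.
        errors = []
        for name in self._routes.get(workload, []):
            try:
                return self._providers[name](prompt)
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"no provider succeeded for {workload!r}: {errors}")


def unavailable(prompt: str) -> str:
    raise ConnectionError("provider unreachable")


gateway = ModelGateway()
gateway.register("azure-openai", unavailable)            # stub, not a real client
gateway.register("bedrock", lambda p: f"[bedrock] {p}")  # stub, not a real client
gateway.route("triage", ["azure-openai", "bedrock"])
print(gateway.complete("triage", "summarize finding F-01"))
```

Putting routing in one place means callers name a workload rather than a vendor, so a provider can be swapped without touching agent code.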

Secure by design

Built to the standard we hold others to

Casaba tests the world's software for a living, and we built Nemesis to the same standard we hold our clients to. Security is structural here, not bolted on.

mTLS-protected gateways

Connections to the persistence and model gateways are mutually authenticated with mTLS, so service-to-service traffic is encrypted and verified on both ends.

Workspace-scoped authorization

Access is scoped to the engagement workspace across every route; users, artifacts, runs, and outputs stay isolated within their boundary.

Deterministic before agentic

Protocol-heavy work runs as deterministic baselines first, with agents following up, for stronger reproducibility and lower hallucination risk.

Human gates on state changes

Any tool call that mutates state requires explicit confirmation; agents propose, consultants approve.

Scoped external review

Clients see findings through a dedicated, scoped review surface with its own access tokens, never the internal workspace UI.

Your data stays put

Client source code and findings never leave your deployment; run Nemesis in your cloud, on-prem, or fully air-gapped.
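Two of the controls above, workspace-scoped authorization and human gates on state changes, can be pictured as checks that run before any tool call executes. This is a minimal sketch under assumptions: `ToolCall`, `Denied`, `dispatch`, the tool names, and the `confirm` callback are hypothetical and only illustrate the order of the checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical description of a tool call an agent wants to make."""
    workspace: str
    tool: str
    mutates_state: bool


class Denied(Exception):
    """Raised when a call fails a scope check or is not approved."""


def dispatch(call: ToolCall, caller_workspace: str,
             confirm: Callable[[ToolCall], bool]) -> str:
    # Workspace scoping: a caller may only act inside its own workspace.
    if call.workspace != caller_workspace:
        raise Denied("call targets a different workspace")
    # Human gate: state-changing calls need explicit confirmation.
    if call.mutates_state and not confirm(call):
        raise Denied("consultant did not approve the state change")
    return f"executed {call.tool} in {call.workspace}"


read = ToolCall("acme-webapp", "list_findings", mutates_state=False)
write = ToolCall("acme-webapp", "delete_artifact", mutates_state=True)

# Read-only calls pass without confirmation; the mutating call is blocked.
print(dispatch(read, "acme-webapp", confirm=lambda c: False))
try:
    dispatch(write, "acme-webapp", confirm=lambda c: False)
except Denied as exc:
    print("blocked:", exc)
```

Checking scope before asking for confirmation means a consultant is never prompted to approve a call that could not be allowed anyway.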

Static and dynamic testing, driven by experts.

Nemesis powers our AI security assessments, code analysis, and application testing engagements. Talk to us about how it works.
