exploit-forge

Fully autonomous, multidisciplinary exploit development driven by orchestrated AI agents.

A project by The Exploit Intelligence Platform · exploit-intel.com · 2026

What Is This

Autonomous Exploit Research

Give it a target. Get a verified exploit - lab, code, and documentation included.

⏳ The Problem

A skilled security researcher takes days to weeks to analyze a vulnerability, build a test environment, write an exploit, and prove it works. The process is manual, expensive, and doesn't scale.

🤖 What exploit-forge Does

78 AI agents work together - researching the vulnerability, building Docker labs, writing exploit code, and verifying it against both vulnerable and patched targets. Fully autonomous.

✅ The Result

A complete, verified exploit package - working PoC, Docker lab, and technical writeup - delivered in minutes, not weeks. No human in the loop. Nine pipelines cover nine security domains.

300+

verified exploit labs built, tested, and documented entirely by AI agents

exploit-intel.com/labs

At a Glance

The Scale of exploit-forge

A single TypeScript monorepo that consolidates 9 autonomous security research pipelines, 78 AI agents, and 18,000 lines of tool infrastructure into one unified platform.

9

Pipelines

78

AI Agents

35K

Lines of TypeScript

18K

Lines MCP Tools

11

Docker Images

Each pipeline deploys 6-13 autonomous Claude agents, each executing a specialized security research phase - from reconnaissance to verified exploit delivery. Most pipelines run their agents sequentially; ShannonForge also runs five tracks in parallel.

The Problem

The High-Skill Research Bottleneck

Exploit development is one of the most demanding disciplines in security. It requires deep expertise across multiple domains, tools, and architectures - and even the best researchers hit a ceiling.

🧩 Multidisciplinary Complexity

One CVE needs Ghidra and GDB. The next needs WinDbg on a remote VM. The next needs a Playwright browser session. Each demands a different skillset, different tools, different architectures - and hours of debugging to get right.

🦄 The Unicorn Problem

A researcher who can reverse engineer binaries, write ROP chains, pentest web apps, diff Windows patches, AND decompile Android APKs? They barely exist. And even they can only work on one thing at a time.

🌐 Every Stack, Every Framework

The target could be Node, ASP.NET, PHP, Flask, Spring, Rails - any stack. A human researcher might master two or three. The AI reads them all equally, pulling API docs, source code, and framework internals on the fly.

exploit-forge's answer: 78 specialized agents across 9 disciplines. No skill gaps. No context switching. No ceiling.

The 9 Pipelines

Domain-Specific Security Expertise

Each pipeline is purpose-built for a different security domain - from known CVEs to undiscovered zero-days. Same core engine, different specialized agents and tools.

CVEForge

CVE advisory assessment and PoC development. Takes a CVE ID, builds a lab, writes and verifies an exploit.

7 agents · Production

BinForge

Binary exploitation via reverse engineering. Ghidra + GDB analysis, crash collection, ROP chain development.

8 agents · Production

StackForge

Stack overflow analysis with crash + control gates. Must prove instruction-pointer (EIP/RIP) control before the exploitation phase.

9 agents · Testing

ZeroForge

Zero-day research with fuzz campaigns. AFL++/libFuzzer with coverage feedback and crash triage.

11 agents · Testing

WinForge

Patch Tuesday diffing on Windows VMs. WinDbg remote sessions via QEMU/KVM, ghidriff binary analysis.

8 agents · Active Dev

LabForge

Exploit QA and cleanup. Normalizes labs, anonymizes artifacts, verifies reproducibility for release.

8 agents · Active Dev

ShannonForge

Web application pentesting. 5 vuln types analyzed in parallel with dedicated Playwright browser agents.

13 agents · Active Dev

WPForge

WordPress-native plugin/theme exploit research. Audits source code, builds local WP lab, verifies findings.

6 agents · New

APKForge

Android APK assessment. APKTool + JADX decompilation, secret scanning, bounded external validation.

8 agents · New

CVEForge

CVE Advisory to Proof of Concept

Input a CVE ID. Get a verified exploit with Docker lab, PoC code, and documentation.

🔍

Intel

Query EIP, clone repo, identify versions

🧬

Analysis

Read fix diff, trace vuln path

🏗️

Lab Build

Generate Dockerfile, build containers

💥

PoC Dev

Write exploit, test on vulnerable target

✅

Verify

Confirm patch blocks exploit

📝

Report

Synthesize evidence into README

🔎

QA

Quality review of all deliverables

Production · ~$2-8 per run · 7 Agents · Sequential

BinForge

Binary Exploitation

Reverse engineering meets autonomous exploitation. Ghidra for static analysis, GDB for runtime debugging, pwntools for exploit development.

🔬

Intel

Fingerprint binary, enumerate inputs

🗺️

Reverse

Map control + data flow

🎯

Sink

Analyze dangerous functions

🏗️

Harness

Build test harness + lab

💥

PoC

Execute, measure impact

📝

Report

Document mechanics

🔓

Bypass

Attempt patch bypass

🔎

QA

Validate reliability

Production · GDB + Ghidra · Exploitability Gate

ZeroForge

Zero-Day Research

Discovers unknown vulnerabilities through automated fuzz campaigns with coverage-guided mutation. 11 agents - the most complex pipeline.

🗺️ Surface Mapping

Identify attack surface, entry points, dangerous sinks. Build call graphs and trace data flow from inputs to sinks.

🔎 Variant Hunting

Search for similar code patterns and related CVEs. Find variants of known vulnerability classes in new locations.

🧪 Fuzz Campaigns

Run AFL++/libFuzzer with coverage feedback, ASAN/UBSAN sanitizers. Auto-triage crashes by exploitability.

Testing · 11 Agents · ~$10-40 per run · Coverage-Guided

StackForge

Stack Overflow Exploitation

The only pipeline with two sequential quality gates. It must prove a crash before analyzing control, and must prove register control before attempting exploitation.

🔍

Research

Query EIP, identify vuln class and target

⚙️

Env Setup

Configure build tools and dependencies

🏗️

Lab Build

Compile target, build Docker lab

💥

Crash PoC

Prove crash with ASAN or GDB evidence

🎯

Control

Map registers, calculate offsets

⚡

Exploit

ROP chains, shellcode, pwntools

✅

Validate

Confirm exploit reliability

📝

Report

Document exploit mechanics

🔓

Bypass

Attempt patch bypass

Testing · Crash Gate + Control Gate · GDB + pwntools · 9 Agents
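The crash and control gates described above can be pictured as a simple ordering rule. The following is a minimal sketch, assuming each gate reduces to a boolean; the `GateEvidence` type, its field names, and the phase labels are hypothetical and not taken from the project.

```typescript
// Hypothetical gate evidence; field names are illustrative assumptions.
interface GateEvidence {
  crashProven: boolean; // e.g. an ASAN report or GDB backtrace was captured
  controlProven: boolean; // e.g. register control and offsets were demonstrated
}

type NextPhase = "crash-poc" | "control" | "exploit";

// Returns the furthest phase the pipeline may enter given the evidence so far.
function nextAllowedPhase(e: GateEvidence): NextPhase {
  if (!e.crashProven) return "crash-poc"; // crash gate not passed yet
  if (!e.controlProven) return "control"; // control gate not passed yet
  return "exploit"; // both gates passed
}
```

Under this reading, the exploit phase is unreachable until both gates hold, which matches the "prove a crash, then prove control" ordering on the slide.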

WinForge

Patch Tuesday Analysis

Every month Microsoft patches hundreds of binaries. WinForge diffs them automatically - finding what changed, why it matters, and whether it's exploitable.

📋

Patch Intel

Identify target KB, acquire binaries

🔬

Diff Analysis

ghidriff binary diffing, function-level changes

🏗️

Lab Setup

Provision QEMU/KVM Windows VM

💥

PoC Dev

Write exploit targeting the diff

✅

Verify

WinDbg remote debugging session

📝

Report

Document patch analysis

🔓

Bypass

Attempt patch bypass

🔎

QA

Validate deliverables

Active Dev · QEMU/KVM Windows VMs · WinDbg + ghidriff · 8 Agents

ShannonForge

Web Application Pentesting

The largest pipeline - 13 agents. Performs a full web application security assessment with 5 vulnerability types analyzed in parallel, each with its own Playwright browser agent.

Phase 1: Reconnaissance

Pre-Recon analyzes source code for attack surfaces. Recon performs full site mapping with Playwright. Together, these two agents feed all five parallel tracks.

Phase 2: Vulnerability Analysis

Five agents run in parallel - SQLi, XSS, Authentication Bypass, SSRF, and Authorization Bypass. Each gets its own browser instance.

Phase 3: Exploitation

Only vuln types with confirmed findings proceed to exploitation. Each exploit agent gets a dedicated Playwright session to prove the finding with evidence.

Active Dev · 13 Agents · 5 Parallel Tracks · Playwright Browsers
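The three phases above amount to a fan-out followed by a filter. Below is a minimal sketch, assuming each analysis track reports a single confirmed/unconfirmed flag; the `TrackResult` type, the track identifiers, and the `analyze` callback are hypothetical stand-ins, not the project's actual code.

```typescript
type VulnType = "injection" | "xss" | "auth" | "ssrf" | "authz";

interface TrackResult {
  type: VulnType;
  confirmed: boolean; // true when the analysis agent reported a confirmed finding
}

const TRACKS: VulnType[] = ["injection", "xss", "auth", "ssrf", "authz"];

// Fan out: run one analysis per vulnerability type concurrently.
// `analyze` stands in for an analysis agent and is supplied by the caller.
async function runAnalysis(
  analyze: (t: VulnType) => Promise<TrackResult>,
): Promise<TrackResult[]> {
  return Promise.all(TRACKS.map((t) => analyze(t)));
}

// Gate: only tracks with confirmed findings proceed to exploitation.
function selectForExploitation(results: TrackResult[]): VulnType[] {
  return results.filter((r) => r.confirmed).map((r) => r.type);
}
```

`Promise.all` is used here only to illustrate concurrency; in the described system, Temporal handles the actual orchestration.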

WPForge

WordPress Security Research

Purpose-built for WordPress plugin and theme vulnerabilities. Audits source code, maps hook and filter dependencies, builds a local WordPress lab, and verifies findings end-to-end.

📥

Intake

Identify target plugin/theme, fetch source

🗺️

Dep Map

Map hooks, filters, AJAX handlers, REST routes

🔍

Code Audit

Security review - sinks, guards, callbacks

🏗️

Lab Build

Docker WordPress + MySQL + target plugin

💥

Exploit

Verify findings against live WP instance

📝

Report

Full writeup with reproduction steps

New · WordPress-Native · 6 Agents · Hook/Filter Mapping

APKForge

Android Application Assessment

Takes an APK file and tears it apart - decompiling with APKTool and JADX, hunting for hardcoded secrets, mapping API endpoints, and validating findings against live infrastructure.

📥

Intake

Fetch APK, validate target

📦

Unpack

APKTool + JADX decompilation

🔑

Secrets

Scan for keys, tokens, credentials

🌐

Endpoints

Map API calls, URLs, backends

🎯

Validate

Test findings against live targets

🏗️

Lab Build

Docker test environment

💥

Exploit

Verify exploitability

📝

Report

Document all findings

New · APKTool + JADX · 8 Agents · Bounded External Validation

LabForge

Exploit QA and Release Prep

The pipeline that makes everything publishable. Takes raw exploit output from other forges - scrubs internal paths, fixes Docker configs, normalizes documentation, and verifies reproducibility.

🐳

Docker Fix

Repair and normalize Dockerfiles

🧹

Scrub

Remove all internal paths and IPs

📄

Generate

Create standardized README and docs

✅

Verify

Build and test from clean checkout

🔧

Audit Fix

Fix issues found in verification

🔎

QA

Quality gate check

🔧

QA Fix

Address QA findings

✅

Final QA

Final release approval

Active Dev · 8 Agents · Leakage Scrubbing · Release-Ready Output

Architecture

Modular Exploitation Framework

Write the engine once. Plug in a new security domain with ~2,000 lines. Every pipeline inherits retry logic, cost tracking, audit trails, and MCP tooling for free.

plugins

CVEForge

7 agents

BinForge

8 agents

StackForge

9 agents

ZeroForge

11 agents

WinForge

8 agents

ShannonForge

13 agents

WPForge

6 agents

APKForge

8 agents

LabForge

8 agents

PLUGIN INTERFACE

engine

🤖

Claude Executor

Multi-turn conversations, model selection, retry logic

⏱️

Temporal Runtime

Workflow state, retries, heartbeats, persistence

📊

Audit & Metrics

Cost tracking, turn counts, session telemetry

🚦

Claim Gates

Evidence verification, impact validation, false-positive blocking

MCP TOOL LAYER

infra

10 MCP Servers

18,181 LOC

11 Docker Images

Layered hierarchy

177 Prompts

Template engine

REST API

forge-api/

70%

code reuse

A new pipeline is ~5 files and ~2,000 lines. The core engine, MCP tools, Docker base images, prompt system, and audit trail come built in.
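As a rough illustration of the plugin idea, the sketch below models each pipeline as a small descriptor that a shared engine could register. The `PipelinePlugin` type and its field names are hypothetical, not the project's actual interface; only the pipeline names, agent counts, and status labels come from the slides.

```typescript
// Hypothetical plugin descriptor; field names are illustrative assumptions.
type PipelineStatus = "Production" | "Testing" | "Active Dev" | "New";

interface PipelinePlugin {
  name: string;
  agents: number;
  status: PipelineStatus;
}

// Names, agent counts, and statuses as listed in the pipeline overview.
const pipelines: PipelinePlugin[] = [
  { name: "CVEForge", agents: 7, status: "Production" },
  { name: "BinForge", agents: 8, status: "Production" },
  { name: "StackForge", agents: 9, status: "Testing" },
  { name: "ZeroForge", agents: 11, status: "Testing" },
  { name: "WinForge", agents: 8, status: "Active Dev" },
  { name: "LabForge", agents: 8, status: "Active Dev" },
  { name: "ShannonForge", agents: 13, status: "Active Dev" },
  { name: "WPForge", agents: 6, status: "New" },
  { name: "APKForge", agents: 8, status: "New" },
];

// Total agents across all registered pipelines.
function totalAgents(plugins: PipelinePlugin[]): number {
  return plugins.reduce((sum, p) => sum + p.agents, 0);
}
```

Summing the per-pipeline counts reproduces the headline figure of 78 agents across 9 pipelines.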

Architecture

Execution Flow

One command triggers a cascade - Docker containers spin up, Temporal orchestrates the workflow, Claude agents execute sequentially, and verified deliverables land in the workspace.

⌨️

./forge cveforge start

CLI dispatch + Docker Compose

⏱️

Temporal Workflow

State machine · retries · persistence

🤖

Claude Agent Turns

Multi-turn conversations · MCP tools

🔧

MCP Tool Servers

save_deliverable · run_exploit_test

🐳

Docker Labs

Vulnerable + patched containers

📦

Workspace Deliverables

PoCs · reports · evidence

How It Works

Temporal Workflow Engine

Every pipeline run is a deterministic Temporal workflow - fault-tolerant, resumable, and fully auditable.

🔄 Retry Presets

Production: 5min → 30min backoff, 50 max attempts
Testing: 10s → 30s, 5 attempts
Subscription: 5min → 6hr, 100 attempts
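The three presets can be written down as data plus a delay function. This is a minimal sketch, assuming "5min → 30min" means an initial interval growing to a maximum interval; the backoff coefficient of 2 is an assumption the slides do not state, and the field names only loosely mirror the fields of a Temporal retry policy.

```typescript
interface RetryPreset {
  initialIntervalMs: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first one
}

const MIN = 60 * 1000;
const HOUR = 60 * MIN;

const RETRY_PRESETS: Record<"production" | "testing" | "subscription", RetryPreset> = {
  production: { initialIntervalMs: 5 * MIN, maximumIntervalMs: 30 * MIN, maximumAttempts: 50 },
  testing: { initialIntervalMs: 10 * 1000, maximumIntervalMs: 30 * 1000, maximumAttempts: 5 },
  subscription: { initialIntervalMs: 5 * MIN, maximumIntervalMs: 6 * HOUR, maximumAttempts: 100 },
};

// Delay before retry number `retry` (1-based), or null once attempts are exhausted.
// The default coefficient of 2 is an assumption, not a documented value.
function retryDelayMs(p: RetryPreset, retry: number, coefficient = 2): number | null {
  if (retry < 1 || retry >= p.maximumAttempts) return null;
  const delay = p.initialIntervalMs * Math.pow(coefficient, retry - 1);
  return Math.min(delay, p.maximumIntervalMs);
}
```

With these assumptions, the Testing preset waits 10 s, 20 s, then 30 s (capped) between attempts, and gives up after the fifth attempt.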

📊 State Tracking

Current phase, completed agents, per-agent cost/turns/duration, advisory warnings, and human review flags - all persisted in workflow state.

⏸️ Resume Support

Workflows pause and resume from the last checkpoint. Workspaces persist by default - use CLEAN=true for fresh runs.

Temporal.io v1.15 · PostgreSQL Backend · Heartbeat Monitoring

How It Works

Agent Turn Execution

Each agent is a multi-turn Claude conversation augmented with specialized tools. The executor handles retries, cost tracking, and model selection - so agents focus on the security research.

Claude Executor

Manages multi-turn Claude conversations. Handles MCP tool dispatch, streaming responses, retry logic with exponential backoff, spending cap detection, and API error recovery.

Prompt Manager

Template interpolation with {{CVE_ID}}, {{WORKSPACE_PATH}} variables. Include directives for shared partials. 177 prompt template files across all pipelines.
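Variable interpolation of this kind can be sketched in a few lines. The function below is an illustrative assumption, not the project's Prompt Manager: it handles only `{{NAME}}` placeholders, leaves unknown placeholders untouched, and does not implement include directives.

```typescript
// Replace {{NAME}} placeholders with values from `vars`.
// Placeholders with no matching variable are left as-is so gaps stay visible.
function interpolate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match: string, name: string) => {
    const value = vars[name];
    return value !== undefined ? value : match;
  });
}

// Example prompt fragment; the wording is made up for illustration.
const rendered = interpolate("Assess {{CVE_ID}} inside {{WORKSPACE_PATH}}.", {
  CVE_ID: "CVE-2026-28296",
  WORKSPACE_PATH: "/workspace",
});
```

Leaving unmatched placeholders in place is a design choice of this sketch; a stricter variant could throw instead so that a missing variable fails the run early.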

Multi-Tier Models

Small (Haiku) for lightweight tasks, Medium (Sonnet) for standard agent work, Large (Opus) for complex reasoning. Per-agent model selection optimizes cost vs. capability.
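Per-agent model selection could be as simple as a lookup table with a default. In the sketch below, the tier-to-family mapping comes from the slide, while the agent names and their tier assignments are hypothetical examples.

```typescript
type ModelTier = "small" | "medium" | "large";

// Tier to model family, as described on the slide.
const TIER_FAMILY: Record<ModelTier, string> = {
  small: "Haiku",
  medium: "Sonnet",
  large: "Opus",
};

// Hypothetical per-agent overrides; these agent names are illustrative only.
const AGENT_TIERS: Record<string, ModelTier> = {
  report: "small",
  "poc-dev": "large",
};

// Agents without an override fall back to the medium tier ("standard agent work").
function tierFor(agent: string): ModelTier {
  return AGENT_TIERS[agent] ?? "medium";
}

function modelFamilyFor(agent: string): string {
  return TIER_FAMILY[tierFor(agent)];
}
```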

Metrics & Billing

Per-agent tracking of turns used, USD cost, and wall-clock duration. Aggregated into workflow summary. Billing cap detection auto-pauses if spending limits are hit.
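The aggregation step can be sketched as a fold over per-agent records. The types and the `spendingCapUsd` threshold check below are assumptions for illustration; the real system may detect billing caps differently, for example from API responses.

```typescript
interface AgentMetrics {
  agent: string;
  turns: number;
  costUsd: number;
  durationMs: number;
}

interface WorkflowSummary {
  totalTurns: number;
  totalCostUsd: number;
  totalDurationMs: number;
  capReached: boolean; // true when total cost meets or exceeds the cap
}

// Aggregate per-agent metrics into a workflow-level summary.
function summarize(metrics: AgentMetrics[], spendingCapUsd: number): WorkflowSummary {
  let totalTurns = 0;
  let totalCostUsd = 0;
  let totalDurationMs = 0;
  for (const m of metrics) {
    totalTurns += m.turns;
    totalCostUsd += m.costUsd;
    totalDurationMs += m.durationMs;
  }
  return { totalTurns, totalCostUsd, totalDurationMs, capReached: totalCostUsd >= spendingCapUsd };
}
```

A caller could pause the workflow whenever `capReached` is true, which is one plausible reading of "auto-pauses if spending limits are hit".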

How It Works

Deliverables & Evidence

Every agent produces deterministic deliverables - markdown reports and JSON decision gates that downstream agents read to decide whether to proceed.

📄 Markdown Reports

Intel brief
Vulnerability analysis
Lab build report
PoC verification report
Final README

🚦 JSON Gates

Exploitability gate (proceed/block)
Runtime gate (env ready?)
Finding gate (findings sufficient?)
Confidence scores + blockers
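A downstream agent reading a JSON gate might validate it and then decide whether to continue. The schema below (`gate`, `decision`, `confidence`, `blockers`) and the 0.5 confidence threshold are hypothetical; the slides name the concepts but not the field layout.

```typescript
interface GateFile {
  gate: string; // e.g. "exploitability"
  decision: "proceed" | "block";
  confidence: number; // assumed to be in the range 0..1
  blockers: string[];
}

// Parse and validate a gate document, throwing on malformed input.
function parseGate(json: string): GateFile {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) throw new Error("gate must be a JSON object");
  const obj = raw as Record<string, unknown>;
  const gate = obj["gate"];
  const decision = obj["decision"];
  const confidence = obj["confidence"];
  const blockers = obj["blockers"];
  if (typeof gate !== "string") throw new Error("gate name must be a string");
  if (typeof confidence !== "number") throw new Error("confidence must be a number");
  if (!Array.isArray(blockers)) throw new Error("blockers must be an array");
  if (decision === "proceed" || decision === "block") {
    return { gate, decision, confidence, blockers: blockers.map((b: unknown) => String(b)) };
  }
  throw new Error("decision must be 'proceed' or 'block'");
}

// Proceed only on an explicit "proceed" with no blockers and enough confidence.
function mayProceed(g: GateFile, minConfidence = 0.5): boolean {
  return g.decision === "proceed" && g.blockers.length === 0 && g.confidence >= minConfidence;
}
```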

📦 Workspace Layout

source/ - target code
lab/ - Docker builds
deliverables/ - reports + PoCs
artifacts/ - evidence

Technology

MCP Tool Infrastructure

18,181 lines of MCP server code. 10 in-repo servers plus external integrations - all via stdio transport.

Shared Tools

save_deliverable
write_file + verification
run_in_repo
run_exploit_test
exploit_claim_gate

Pipeline-Specific

WinForge: 18 tools (WinDbg, VM SSH, ghidriff)
APKForge: 14 tools (APKTool, JADX, secrets)
BinForge: 8 tools (GDB, checksec, crash)
ZeroForge: 8 tools (fuzz, sanitizer, triage)

External MCPs

EIP server (vuln intelligence)
Playwright (5 browser agents)
Ghidra + GDB agents
SharkMCP (packet capture)

Technology

Docker Image Hierarchy

11 Dockerfiles organized as a hierarchy. Shared base image with pipeline-specific layers adding domain tools.

base.Dockerfile

Ubuntu 22.04 + Node.js 22 + Docker CLI + build tools, git, ripgrep, jq, cmake, Python 3

base-jvm.Dockerfile

Base + Java 21 + Ghidra headless - for BinForge and StackForge reverse engineering

Pipeline Layers

CVEForge: GDB, strace, pwntools, checksec
BinForge: Ghidra, GDB, ROPgadget, capstone
ZeroForge: AFL++, libFuzzer, ASAN/UBSAN
WinForge: ghidriff, QEMU guest tools

New Domains

ShannonForge: Playwright, Chromium
APKForge: APKTool, JADX, aapt
WPForge: WordPress CLI, PHP, semgrep
LabForge: Docker-in-Docker for cleanup

Technology

The ./forge CLI

One unified command interface for all 9 pipelines. Dispatches to per-pipeline bash scripts that handle Docker Compose orchestration.

Commands

start - Launch a pipeline workflow
stop - Stop containers
logs - Tail live workflow output
status - Structured run status (--json)
workspaces - List workspace dirs

Examples

./forge cveforge start CVE=CVE-2026-28296
./forge binforge start TARGET=/path/to/binary
./forge shannonforge start URL=https://target.com
./forge apkforge start APK=/path/to/app.apk
./forge zeroforge start TARGET_REPO=git://...

CLEAN=true · PIPELINE_TESTING=true · REBUILD=true · WORKSPACE=name
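The invocation shape shown in the examples (`<pipeline> <command> KEY=value ...`) can be modelled with a small parser. The actual CLI dispatches to bash scripts, so the TypeScript below is only a sketch of the argument grammar; `ForgeInvocation` and `parseForgeArgs` are hypothetical names.

```typescript
interface ForgeInvocation {
  pipeline: string;
  command: string;
  options: Record<string, string>;
}

// Parse arguments of the form: <pipeline> <command> [KEY=value ...]
function parseForgeArgs(argv: string[]): ForgeInvocation {
  const [pipeline, command, ...rest] = argv;
  if (!pipeline || !command) {
    throw new Error("usage: ./forge <pipeline> <command> [KEY=value ...]");
  }
  const options: Record<string, string> = {};
  for (const arg of rest) {
    const eq = arg.indexOf("="); // split on the first "=" so values may contain "="
    if (eq <= 0) throw new Error(`expected KEY=value, got "${arg}"`);
    options[arg.slice(0, eq)] = arg.slice(eq + 1);
  }
  return { pipeline, command, options };
}
```

For instance, `parseForgeArgs(["cveforge", "start", "CVE=CVE-2026-28296", "CLEAN=true"])` yields the pipeline `cveforge`, the command `start`, and the two options as key/value pairs.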

Parallel Execution

ShannonForge: 5 Parallel Tracks

The only pipeline with true parallel agent execution - 5 vulnerability types analyzed simultaneously, each with a dedicated Playwright browser agent.

Recon Phase
pre-recon + recon

Fan Out

Injection

vuln + exploit

XSS

vuln + exploit

Auth

vuln + exploit

SSRF

vuln + exploit

AuthZ

vuln + exploit

Report
evidence synthesis

Infrastructure

Production Deployment

Two dedicated servers - EU and US - run pipelines around the clock. Temporal manages state, Docker isolates execution, and daily encrypted backups protect everything.

🇪🇺

forge-eu

EU exploit-forge server. Runs all pipeline types. OVH dedicated hardware with Docker + Temporal.

forge-eu.internal

🇺🇸

forge-us

US exploit-forge server. Parallel capacity for high-volume pipeline runs. OVH dedicated.

forge-us.internal

🌐

eip-app

Application servers running FastAPI + Apache. Serves exploit-intel.com behind Cloudflare CDN.

2 HA app servers + LB

⏱️

Temporal

Workflow engine with PostgreSQL backend. Manages state persistence, retries, and activity heartbeats.

One instance per forge server

🐳

Docker Compose

Per-pipeline compose files orchestrate Temporal worker + exploit-forge container + workspace volumes.

10 compose files · nested Docker

💾

Backups

Daily automated backups to Backblaze B2. GPG-encrypted. Crontab runs at 02:00-03:30 UTC.

B2 + GPG · daily retention

Infrastructure

Automated Provisioning

Two-phase provisioning - install.sh (13 stages, 30-60 min) + setup.sh (6 stages, 10-15 min). From bare metal to fully operational in under 2 hours.

install.sh - 53 KB, 13 stages

Base packages + Docker CE + Node.js 22
Java 21 + Ghidra + ghidriff
QEMU/KVM + libvirt (WinForge VMs)
Clone 15+ repos + build all MCP servers
Temporal + PostgreSQL + Docker images
Windows VM provisioning (ISOs + QCOW2)

setup.sh - 42 KB, 6 stages

System tools + Homebrew
Systemd services (Paperclip, forge-browser)
MCP infrastructure wiring
B2 backup system + GPG keys
AI tool settings (Claude Code, Codex)
Shell config + environment

Quality Assurance

The Exploit Claim Gate

A critical MCP tool that prevents agents from lying about exploit success. Compares the claimed impact against actual evidence and downgrades or blocks inflated claims.

Claimed Impact

Agent claims "RCE achieved" or "authentication bypass confirmed" based on its analysis and PoC execution results.

Evidence Check

Gate compares claim against actual exploit test output - exit codes, signals, stdout/stderr, timeout state. Does the evidence support the claim?

Verdict

Approved - evidence matches claim.
Downgraded - partial evidence, reduced severity.
Blocked - no supporting evidence.
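The claim/evidence/verdict flow above can be approximated as a pure function over test output. The rules in this sketch are assumptions chosen for illustration: a claim is approved only if an expected marker appears in the output, downgraded if the run merely ended with a signal, and blocked otherwise. The type names, field names, and marker convention are all hypothetical.

```typescript
type Verdict = "approved" | "downgraded" | "blocked";

// Observed outcome of an exploit test run (fields mirror the evidence listed above).
interface ExploitTestResult {
  exitCode: number | null;
  signal: string | null; // e.g. "SIGSEGV" when the target crashed
  timedOut: boolean;
  stdout: string;
}

interface Claim {
  impact: string; // the agent's claimed impact, e.g. "RCE achieved"
  expectedMarker: string; // output text that would substantiate the claim
}

// Compare the claim against the evidence and return a verdict.
function evaluateClaim(claim: Claim, result: ExploitTestResult): Verdict {
  if (result.timedOut) return "blocked"; // no usable evidence
  const markerSeen =
    claim.expectedMarker.length > 0 && result.stdout.includes(claim.expectedMarker);
  if (markerSeen) return "approved"; // evidence matches the claim
  if (result.signal !== null) return "downgraded"; // crash only: partial evidence
  return "blocked"; // nothing supports the claim
}
```

The key property is that the verdict depends only on recorded test output, never on the agent's own description of what happened.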

The Platform

exploit-intel.com

The public face of the Exploit Intelligence Platform - vulnerability search, exploit rankings, stats dashboards, labs, and an MCP server for AI integration.

Exploit Intelligence Platform

Blog · Statistics · Labs · CLI Tool · MCP Server · API Docs

Updated 2h ago

339,495

CVEs Tracked

53,335

With Exploits

4,748

Exploited in Wild

1,551

CISA KEV

3,948

Nuclei Templates

49,233

Vendors

42,833

Researchers

CVE-2026-24289 CRITICAL

Windows Kernel IOCP Race Condition - Use After Free

CVE-2026-3910 CRITICAL

V8 Maglev JIT Type Confusion in Chrome

CVE-2026-4105 HIGH

systemd-machined D-Bus Privilege Escalation

Statistics /stats

CVE Monthly Volume

MCP Server /mcp

17 tools for AI assistants - search vulns, analyze exploits, audit stacks, generate pentest findings

Remote · Local Install · 17 Tools

Docker Labs /labs

300+ verified exploit labs with Docker environments, PoC code, and technical writeups

12 Stats Charts · MCP Server - 17 Tools · 7 Theme Options · REST API + CLI

Observability

Forge Archive Browser

Every run, every agent turn, every dollar spent - searchable and browsable. Full visibility into what the AI agents are doing and thinking.

Operational Console

Runs

163

Total Spend

$1,279

Artifacts

8,872

Agents

905

Runs 163

CVE-2039-900... · BYPASS · $4.96

stackforge - Mar 15

CVE-2039-900... · BYPASS · $2.12

stackforge - Mar 15

vuln-bin · $4.81

binforge - Mar 15

Run Detail · stackforge - bypass

CVE

CVE-2039-9000002

Duration

12m 15s

Cost

$2.12

Bypass

yes

Status

completed

Overview · Agent Story · Raw Events · Artifacts · Bypass

VULN-RESEARCH - ATTEMPT 1

thinking Mar 15, 06:15 PM

Let me start by gathering intelligence on the CVE and examining the source code and binary in parallel.

tool use mcp__eip_server__get_vulnerability

query=CVE-2039-9000002

PostgreSQL Backend · 163 Runs Archived · Full Agent Replay

Integration

Connected Systems

exploit-forge doesn't operate in isolation - it's wired into the entire Exploit Intel ecosystem. Vulnerability intelligence feeds in, results flow out to the company workflow, and the founder monitors everything from Telegram.

🔍 EIP Server

Query vulnerability intelligence - CVE details, exploit rankings, EPSS scores. Powers the intel phase of every CVE-based pipeline.

📋 Paperclip AI

Company workflow orchestration. CEO and ForgeRunner agents query pipeline status, trigger runs, and monitor progress via forge-ops MCP.

📱 Telegram

Real-time notifications for pipeline phase transitions and completions. The founder monitors runs from their phone.

🐙 GitHub

Clone target repositories, checkout specific commits/tags. SSH key authentication for private repos.

🗄️ PostgreSQL

forge_archive database stores audit logs, agent metrics, and deliverable metadata. Ingested via scripts for historical analysis.

☁️ Backblaze B2

Daily encrypted backups of forge data, audit logs, and workspace artifacts. GPG-encrypted at rest.

Scale

By the Numbers

The scale of what a small team can build when AI agents do the heavy lifting - from a $2 CVEForge run to a $40 zero-day research campaign.

35K

TypeScript LOC

177

Prompt Templates

10

MCP Servers

78

Total Agents

$2-8

CVEForge run cost

$10-40

ZeroForge run cost

2,000

Max turns per agent

Technology Stack

The Full Stack

From TypeScript and Temporal at the core to Ghidra, GDB, and AFL++ at the edges - every tool a security researcher needs, orchestrated by AI.

🦕

TypeScript

ES2022 strict mode

⏱️

Temporal.io

v1.15 workflows

🤖

Claude SDK

v0.2.38 agent runtime

🐳

Docker

11 images · Compose

🔧

MCP

18K LOC · 10 servers

🔬

Ghidra

Headless analysis

🐛

GDB

Runtime debugging

🔨

AFL++

Coverage fuzzing

Results

Real-World Impact

exploit-forge doesn't just generate reports - it finds real vulnerabilities and produces verified, reproducible exploit code.

300+

Verified Exploit Labs

76

CVEs with Published PoCs

0-days

Found by AI Agents

8

Embargoed CVEs Pending

<75m

Vendor Patch Bypass Time

exploit-intel.com/labs · exploit-intel.com/blog · github.com/exploitintel/eip-pocs-and-cves

Results

Highlight Reel

A selection of findings - from Windows kernel exploits to vendor patch bypasses in under 75 minutes.

Windows Kernel Use-After-Free

0-day

IOCP race condition in the Windows kernel. WinForge identified the vulnerability through binary diffing of Patch Tuesday updates, then built a working exploit on a QEMU/KVM Windows VM.

CVE-2026-24289

V8 JIT Type Confusion

Patch Bypass

Type confusion in Chrome's Maglev JIT compiler. After the vendor patched the original finding, exploit-forge's bypass agent found a way around the fix - in 75 minutes.

CVE-2026-3910

systemd Privilege Escalation

Patch Bypass

Local privilege escalation via D-Bus in systemd-machined. The vendor's patch was incomplete - exploit-forge identified and bypassed it in 72 minutes, autonomously.

CVE-2026-4105

nginx Protocol Desync

0-day

Previously unknown FastCGI protocol desynchronization bugs in nginx, discovered by ZeroForge through automated coverage-guided fuzzing with AFL++ and sanitizer feedback.

ZeroForge Discovery

Summary

What Makes exploit-forge Special

Four architectural decisions that make the difference between a proof-of-concept and a production system.

🔀

70% Code Reuse

Shared core framework - bug fixes and improvements apply to all 9 pipelines automatically

⏱️

Fault-Tolerant Orchestration

Temporal workflows with retries, persistence, and resume - pipelines survive crashes and recover automatically

✅

Deterministic Evidence

Every exploit is verified against both vulnerable and patched targets - claim gates prevent false positives

🌐

9 Security Domains

CVEs, binaries, stack overflows, zero-days, Windows patches, web apps, WordPress, Android APKs, exploit QA - one platform, one CLI

Closing

Autonomous Security Research

78 AI agents across 9 specialized pipelines - from CVE advisory to zero-day discovery - orchestrated by Temporal, powered by Claude, delivering verified exploit research at scale.

Monorepo

Single source of truth

Temporal

Fault-tolerant workflows

MCP

Tool-augmented agents

9 pipelines · 78 agents · 35K LOC · 18K MCP
One CLI. One monorepo. Autonomous exploit research.

A project by The Exploit Intelligence Platform