Five hundred thousand iterations. Two seconds. Two million completion port reassociations. No crash.

We build autonomous exploit pipelines - AI agents that take a CVE number and produce a working proof-of-concept without human intervention. Until this week, every pipeline we’d built targeted open-source software on Linux: source code you could read, debuggers you could attach, containers you could spin up and tear down. Then we added WinForge - a new module that targets closed-source Windows binaries - and pointed it at the Windows kernel.

This is the story of WinForge’s maiden voyage. A use-after-free in ntoskrnl.exe, found by diffing two builds of a 13-megabyte binary that Microsoft ships without source. Seven agents, thirty-one minutes, $3.13. And a result that proves the vulnerability is real without ever triggering the crash - like proving a house is haunted by showing the exorcism took. We never saw the ghost.

(We’ll get to why. It’s a better story than you’d think.)

A Different Kind of Forge

A quick word about the platform, for anyone who hasn’t followed the previous posts.

We have a monorepo called exploit-forge that contains multiple pipeline modules - CVEForge for web application vulnerabilities, StackForge for binary exploitation with GDB, FuzzForge for source-level fuzzing, and others. They share a common core (~70% of the code) but each plugs in its own agents, tools, and execution environment. Until now, every module ran inside Docker containers targeting Linux software with source code and GDB.

WinForge is architecturally different from all of them.

Instead of Docker containers, WinForge orchestrates a QEMU/KVM Windows VM via SSH. Instead of source code, it works with closed-source binaries acquired from the Windows VM, Winbindex, or the Microsoft Symbol Server. Instead of GDB, it drives WinDbg and CDB through a persistent debugging session on a live Windows kernel. Instead of gcc, it fights vcvars64.bat and cl.exe through PowerShell’s escaping layer. Everything that makes Linux exploit development straightforward - readable source, predictable toolchains, scriptable debuggers - is absent here.

4,400 lines of new code. Eight agent prompts. Custom MCP tools for VM interaction, binary acquisition, and Ghidra-based binary diffing. A new Docker Compose configuration with network_mode: host to reach the QEMU VM. All of it written, tested, and deployed the same day we pointed it at a real target.

That target was CVE-2026-24289.

The First Kernel Target

A use-after-free in ntoskrnl.exe - the Windows kernel itself. CVSS 7.8. Local privilege escalation. The vulnerability sits in the IO Completion Port subsystem, which is one of those Windows primitives that most developers use every day without thinking about - it’s the engine behind overlapped I/O, behind GetQueuedCompletionStatus, behind every high-performance Windows server application written in the last two decades. IOCP is load-bearing infrastructure. And someone left a pointer unwatched.

No source code. No debugger access to the vulnerable code path. A race condition we could reason about but not observe. The whole thing would have to be done blind - acquire the right binaries, diff them, understand the fix, write a PoC from the diff alone, and verify it against a kernel we couldn’t instrument.

Every forge run before this one - OpenClaw, xrdp, Redis - had source to read and breakpoints to set. This one had a 13-megabyte binary and a diff.

The Bug

The vulnerability is a TOCTOU race condition in the IRP completion path. When a Windows application performs overlapped I/O - an asynchronous read or write - the kernel eventually completes the operation through IopfCompleteRequest, which calls IopCompleteRequest. That function needs to know where to post the completion notification: which IO Completion Port is associated with this file object?

The answer lives at FileObject+0xb0. A pointer to the completion context structure, which contains the IOCP handle and the completion key. On the unpatched kernel, reading that pointer looks something like this:

completionContext = fileObject->CompletionContext;  // FileObject+0xb0
if (completionContext) {
    port = completionContext->Port;
    key  = completionContext->Key;
    // ... post completion to the port
}

No lock. No reference count. No synchronisation of any kind.

Now consider what happens when a second thread calls NtSetInformationFile with FileCompletionInformation on the same file object. That syscall changes (or removes) the completion port association. It frees the old completion context and allocates a new one. If thread A is in the middle of reading completionContext->Port when thread B frees completionContext - you have a dangling pointer dereference in kernel mode.

The race window is small. But IO completion happens millions of times per second on a busy system. Small windows, hit often enough, become certainties.
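The arithmetic behind that intuition is simple: if a single attempt lands in the window with probability p, the chance of at least one hit across n independent attempts is 1 - (1 - p)^n. A quick sketch - the per-attempt probability here is illustrative, not measured:

```python
def hit_probability(p: float, n: int) -> float:
    """Chance of winning a race at least once in n independent attempts,
    where each attempt lands in the window with probability p."""
    return 1.0 - (1.0 - p) ** n

# Even a one-in-a-million window approaches certainty at IOCP rates -
# the two-second PoC run performed roughly two million reassociations.
for attempts in (1_000, 100_000, 2_000_000):
    print(f"{attempts:>9} attempts -> {hit_probability(1e-6, attempts):.3f}")
```

At two million attempts the hit probability is about 0.86 even with a one-in-a-million window, which is why "small windows, hit often enough, become certainties" is not hyperbole.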

What Microsoft Fixed

The patch introduces a spinlock at FileObject+0xb8 - eight bytes past the completion context pointer it guards. The fix is concentrated in eight functions, all in the IO Manager:

Function                                            Similarity  What Changed
IopIncrementCompletionContextUsageCountAndReadData  0.17        Near-complete rewrite. Acquires spinlock, reads context under lock, increments usage count.
IopfCompleteRequest                                 0.02        Major rewrite. Dispatches to synchronised context reading.
IopCompleteRequest                                  0.03        Major rewrite. Calls synchronised function, adds ObfReferenceObjectWithTag on port.
IopDequeueIrpFromFileObject                         0.19        New spinlock protection for IRP queue.
IopQueueIrpToFileObject                             0.13        Same spinlock guards queue operations.
IopDoesCompletionNeedsApc                           0.35        New logic for APC delivery decision.
IopCompleteRequest$filt$0                           0.22        Exception filter updated.
IopIoRingCompleteIrp                                0.02        IoRing path gets the same treatment.

Eight functions. Out of 363 that changed between builds. Out of 70,440 that matched. Finding them was most of the work.

Microsoft even named their fix. Eight instances of Feature_Servicing_IOCompletionPortFix in the patched binary, wired into the feature flag infrastructure. They knew exactly what they were fixing, and they gave it a name that told you exactly where to look - if you could find the right binary to diff against.

That “if” turned out to be the hardest part of the entire engagement. But I’ll get to that.

The Pipeline

Seven agents. Sequential handoffs. Each one reads the previous agent’s deliverables, does its work, writes a report, and passes the baton. Patch-intel goes first - it reads the advisory, identifies the target binary, and acquires the samples. Diff-analysis takes those binaries and runs ghidriff. PoC-dev writes the exploit from the diff. And so on.

The pipeline took thirty-one minutes and seven seconds from start to finish. Every agent succeeded on the first attempt. No retries. No human intervention.

It cost $3.13.

Here’s what that buys you:

Agent          Duration  Cost   Turns  Model
patch-intel    1m 34s    $0.17  18     Opus
diff-analysis  15m 13s   $1.17  84     Opus
lab-setup      1m 09s    $0.09  35     Haiku
poc-dev        7m 45s    $0.98  53     Opus
poc-verify     2m 18s    $0.30  30     Opus
report         1m 32s    $0.32  14     Opus
qa-check       1m 36s    $0.10  31     Haiku
Total          31m 07s   $3.13  265    -

Look at diff-analysis. Fifteen minutes. Eighty-four turns. A dollar seventeen. That’s more than a third of the budget and half the wall clock on a single agent. The rest of the pipeline - intel gathering, lab setup, exploit development, verification, reporting, QA - all of that fit in the other fifteen minutes and the other $1.96.

The diff-analysis agent earned every penny.

The Identical Binary Problem

The patch-intel agent’s job is straightforward: read the advisory, figure out the affected binary, figure out the build numbers, and acquire both the patched and unpatched versions. For a Windows kernel CVE, that means pulling ntoskrnl.exe from two different builds.

The agent acquired both binaries from the Windows VM. Computed the SHA-256 hashes. And found this:

sha256sum binaries/patched/ntoskrnl.exe
620777f6aa7186447ae9b45cbcdddcf0c59516b87a2f702ffc8a1d07dbfe678b

sha256sum binaries/unpatched/ntoskrnl.exe
620777f6aa7186447ae9b45cbcdddcf0c59516b87a2f702ffc8a1d07dbfe678b

Identical. Same hash. Same file. Both slots contained the same binary because the VM was running build 26100.8037 - a build newer than both the patched and unpatched targets. The acquire tool pulled the same binary twice, gave it two different filenames, and reported success.
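A cheap guard in the acquisition step would have failed loudly instead of reporting success. A minimal sketch - the paths and error message are illustrative, not the pipeline's actual tool:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a binary in 1 MiB chunks (kernel images are large)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def assert_distinct_builds(patched: Path, unpatched: Path) -> None:
    """Refuse to proceed when both acquisition slots hold the same file -
    the failure mode the VM-based acquire tool silently hit."""
    a, b = sha256_of(patched), sha256_of(unpatched)
    if a == b:
        raise RuntimeError(
            f"patched and unpatched binaries are identical ({a[:16]}...); "
            "re-acquire via Winbindex or the symbol server"
        )
```

Two different filenames, one underlying build: the kind of mistake a hash comparison catches in milliseconds and a diff tool wastes minutes on.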

The patch-intel agent caught it. Flagged it in the gate file. Wrote “Binaries must be re-acquired via Winbindex” and passed the problem downstream.

This is where the diff-analysis agent inherited the mess.

The Winbindex Odyssey

The diff-analysis agent started by doing the obvious thing: re-downloading both binaries from the VM. Same result. Both identical. Build 26100.8037 both times.

“As expected - identical. The VM only has its current build (26100.8037). I need to use Winbindex to get the specific builds.”

Winbindex is a third-party index of every Windows binary Microsoft has shipped, searchable by version. It’s the standard tool for acquiring specific builds of system files when you don’t have the right Windows Update package handy. The diff-analysis agent knew this. It started querying.

It tried the v1 API. It tried the v2 API. It tried different URL patterns, different query formats. It downloaded compressed JSON indexes and started parsing them, reverse-engineering the data structure on the fly when the keys turned out to be SHA-256 hashes instead of version numbers.

Then it searched for the builds the advisory specified: 26100.7531 (unpatched) and 26100.7979 (patched).

They didn’t exist.

Not “couldn’t find them.” They don’t exist in Winbindex. Because they don’t exist, period.

Windows 11 24H2 has a dual build numbering scheme. The version you see in winver, the version Microsoft puts in their advisory, the version that shows up in Windows Update history - that’s the “display version,” the 26100.x series. But the actual binary - the one that lives on disk, the one Winbindex indexes, the one you’d diff - uses a different numbering scheme. The 26200.x series.

The diff-analysis agent had to figure this out from first principles, inside a conversation with itself, from parsing raw Winbindex JSON blobs.

“The builds 26100.7531 and 26100.7979 don’t exist in Winbindex - the actual builds are in the 26200.x series.”

It mapped the builds: unpatched was 26200.7840 (KB5077181, February 2026). Patched was 26200.8037 (KB5079473, March 2026). And then it realised that the patched binary - 26200.8037 - was the one already sitting in the VM. One binary acquired for free. One to go.
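The lookup the agent effectively performed can be sketched. Winbindex's per-file index, as the agent reverse-engineered it, is roughly a JSON object keyed by file hash, with each entry carrying version, PE timestamp, and virtual size - treat the field names below as assumptions from that reverse engineering, not a documented schema:

```python
def find_build(index: dict, version: str):
    """Scan a Winbindex-style per-file index (keyed by file hash) for an
    entry matching a build version, returning the PE timestamp and virtual
    size needed for a symbol-server download.

    The fileInfo/version/timestamp/virtualSize field names are assumptions
    based on inspecting the index, not a published schema."""
    for _file_hash, entry in index.items():
        info = entry.get("fileInfo", {})
        if info.get("version", "").endswith(version):
            return info.get("timestamp"), info.get("virtualSize")
    return None

# Illustrative entry shaped like what the agent parsed for the
# unpatched February kernel:
sample_index = {
    "54f57116...": {
        "fileInfo": {
            "version": "10.0.26200.7840",
            "timestamp": 0xAE38E28F,
            "virtualSize": 0x1450000,
        }
    }
}
print(find_build(sample_index, "26200.7840"))
```

Note the search key is the 26200.x actual build, not the 26100.x display build from the advisory - the exact translation the agent had to discover the hard way.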

The agent downloaded the unpatched binary from the Microsoft Symbol Server using its PE timestamp and virtual size as lookup keys:

curl -L -o binaries/unpatched/ntoskrnl.exe \
  "https://msdl.microsoft.com/download/symbols/ntoskrnl.exe/AE38E28F1450000/ntoskrnl.exe"

13,030,856 bytes. SHA-256: 54f57116bcbbe96da72088130a8f949e13884a3db39d81f4b80cb26a88de00ac. Different from the patched binary. Finally.

Fifteen minutes of an agent’s life. Most of them spent translating between three different version numbering systems - Microsoft’s advisory (26100.x), Winbindex (26200.x), and the actual binary (PE timestamps) - none of which agreed with each other.
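The path in that curl command is not magic. For executable images, the symbol-server lookup key is the PE header's TimeDateStamp as eight uppercase hex digits, concatenated with SizeOfImage in hex - a sketch that reproduces the key from the download above:

```python
def symbol_server_url(filename: str, timestamp: int, size_of_image: int) -> str:
    """Build a Microsoft Symbol Server URL for an executable image.
    The key is TimeDateStamp formatted as %08X followed by SizeOfImage
    as %x (the convention used for PE images, as opposed to PDBs)."""
    key = f"{timestamp:08X}{size_of_image:x}"
    return f"https://msdl.microsoft.com/download/symbols/{filename}/{key}/{filename}"

# Reproduces the key used for the unpatched 26200.7840 kernel:
print(symbol_server_url("ntoskrnl.exe", 0xAE38E28F, 0x1450000))
# -> .../ntoskrnl.exe/AE38E28F1450000/ntoskrnl.exe
```

Which is why a Winbindex entry's timestamp and virtual size are all you need: together they name the exact binary on Microsoft's own servers.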

363 Haystacks, 8 Needles

With two different binaries in hand, the diff-analysis agent ran ghidriff - a Ghidra-based binary differ that decompiles both builds and compares them function by function. On ntoskrnl.exe, with full PDB symbols, this is not a small operation.

The result: 70,440 functions matched between builds. 363 were modified.

Three hundred and sixty-three functions changed between the February and March Patch Tuesday builds. ETW tracing changes. Power management. PnP subsystem updates. Scheduler tweaks. WIL feature reporting infrastructure. The monthly noise of a living operating system.

Somewhere in that noise were the eight functions that mattered. The agent had to find them.

It started with the function names. Anything with “Iop” in it - the IO Manager prefix. Anything touching completion ports. Anything with a similarity ratio below 0.20, meaning the function was nearly unrecognisable between builds.
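That triage is mechanical enough to sketch. Given ghidriff's modified-function list as (name, similarity) pairs, the filter reduces to two predicates - the data below is a toy subset, not the real 363:

```python
def triage(modified: list[tuple[str, float]],
           prefix: str = "Iop",
           max_similarity: float = 0.20) -> list[tuple[str, float]]:
    """Rank ghidriff's modified functions: keep IO Manager names or
    near-total rewrites, most heavily rewritten first."""
    candidates = [
        (name, sim) for name, sim in modified
        if name.startswith(prefix) or sim < max_similarity
    ]
    return sorted(candidates, key=lambda pair: pair[1])

# Toy subset of a monthly diff: two signal functions buried in noise.
sample = [
    ("EtwpLogKernelEvent", 0.91),
    ("IopfCompleteRequest", 0.02),
    ("PopProcessorIdle", 0.88),
    ("IopIncrementCompletionContextUsageCountAndReadData", 0.17),
]
print(triage(sample))
```

The similarity threshold is doing real work here: a 0.02 score means the function was gutted and rebuilt, and gutted-and-rebuilt is what a security fix looks like in a diff.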

IopIncrementCompletionContextUsageCountAndReadData. Similarity: 0.17.

“This is very revealing - a complete rewrite adding spinlock protection and reference counting.”

That function didn’t exist in a meaningful way in the unpatched binary. In the patched version, it acquires a spinlock, reads the completion context pointer, increments a usage count, and releases the lock. In the unpatched version, it was a naked read. No lock. No count. No protection.

And then the feature flags. Eight instances of Feature_Servicing_IOCompletionPortFix wired into the patched binary. Microsoft doesn’t always name their fixes this explicitly. When they do, it’s a gift - confirmation that you’re looking at the right code, that the fix is targeted, that you’re not chasing shadows in the ETW noise.

“Excellent! I can see Feature_Servicing_IOCompletionPortFix feature flags and several IO completion-related functions changed. This strongly suggests the UAF is in IO Completion Ports.”

363 functions. 355 were noise. 8 were the story. The diff-analysis agent found them, documented them, and wrote a vulnerability hypothesis that matched the advisory’s classification: use-after-free via race condition, CWE-416 layered on CWE-362, in the IO Completion Port completion path.

$1.17 well spent.

The Compilation Battle

The poc-dev agent wrote the exploit in one shot. Turn eight. A single function call that produced 400 lines of C - two thread functions, a named pipe setup, IOCP creation via NtCreateIoCompletion, CPU affinity pinning, and a tight race loop. The design came straight from the diff analysis: thread A triggers IRP completions through the named pipe, thread B races to swap the IOCP association via NtSetInformationFile. If the timing is right on an unpatched kernel, the completion path dereferences freed memory.

One turn to write the exploit. Twenty-eight turns to compile it.

The Windows VM had Visual Studio Build Tools installed, but cl.exe isn’t in PATH by default. You need to run vcvars64.bat first, which sets up the environment variables. The poc-dev agent knew this. It tried the obvious approach - calling vcvars64.bat and cl.exe together through cmd /c.

PowerShell mangled it. The escaping layer between PowerShell and cmd.exe turned the compound command into something neither shell could parse. Backticks collided with ampersands. Quotes nested wrong. The agent tried three or four variations, each one failing in a new and creative way.

“The escaping is getting mangled by the PowerShell layer. Let me write a batch file instead.”

The workaround was the oldest trick in Windows administration: write a .bat file and call that instead.

@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat" >nul 2>&1
cd /d C:\winforge\poc
cl.exe /W4 /Zi /O2 exploit.c /Fe:exploit.exe /link ntdll.lib kernel32.lib advapi32.lib

It compiled. Then the linker failed: unresolved external symbol wsprintfA. The exploit used wsprintfA for formatting the named pipe path - a function that lives in user32.lib, which wasn’t linked. Rather than add another library dependency, the agent swapped it for sprintf_s. Cleaner. Safer. Compiled without warnings.

Then the agent tried to attach WinDbg for kernel debugging. Connection refused - the debug daemon wasn’t running on the VM. Another dead end. But by this point, it didn’t need a debugger. The PoC was compiled, the logic was sound from the diff analysis, and the question wasn’t whether the race condition existed - it was whether the fix worked.

The agent ran the exploit.

[+] Race completed: 500000 iterations in 2s
[+] Completions: 500000  |  Reassociations: 2,150,592
[*] All iterations completed without crash.

500,000 iterations. Two seconds. Two million IOCP reassociations. No crash.

On the patched kernel, the spinlock at FileObject+0xb8 serialised every access. Every race thread’s attempt to swap the completion context had to wait for the IO thread to finish reading it. The lock worked. The fix worked. The race condition that existed in the unpatched binary - the one where IopCompleteRequest could read FileObject+0xb0 while another thread was freeing it - was gone.

The poc-dev agent wrote a kernel exploit in one turn and spent four times longer figuring out how to compile it on Windows. There’s a lesson in there about where the real complexity lives, but I’ll leave it as an exercise for the reader.

The Race

The exploit is two threads and a named pipe. That’s it. The entire vulnerability fits in a hundred lines of meaningful code, because the race window is so simple you don’t need cleverness - you need speed.

Thread A is the IO thread. It pends an overlapped read on the pipe, then writes from the other end to complete it. Every write triggers IopfCompleteRequest → IopCompleteRequest, which reads the completion context from FileObject+0xb0. This is the read side of the race - the moment the kernel reaches for a pointer that might already be gone.

/* Start overlapped read - this pends an IRP on the pipe */
NTSTATUS st = NtReadFile(
    g_pipe_read, NULL, NULL, NULL,
    &iosbRead, readBuf, sizeof(readBuf),
    NULL, NULL);

/* Write from the other end - this completes the read IRP, triggering
 * IopfCompleteRequest → IopCompleteRequest which reads the completion
 * context from the file object.  Thread B races to change/remove
 * the completion context at this exact moment. */
NtWriteFile(
    g_pipe_write, NULL, NULL, NULL,
    &iosbWrite, writeBuf, 64,
    NULL, NULL);

Thread B is the race thread. It does one thing, as fast as it can: swap the IOCP association on the same file handle. Associate with port 1. Associate with port 2. Disassociate entirely - setting the port to NULL, which frees the completion context allocation. Re-associate so the IO thread keeps working. Four NtSetInformationFile calls per iteration, each one potentially freeing or replacing the structure that Thread A is reading.

/* Rapidly reassociate the pipe's read handle with alternating
 * IO Completion Ports. On unpatched kernels, this frees the old
 * CompletionContext while IopCompleteRequest is reading it. */
fci.Port = g_iocp1;
fci.Key  = (PVOID)(ULONG_PTR)0xDEAD0001;
NtSetInformationFile(g_pipe_read, &iosb, &fci, sizeof(fci),
                     (ULONG)MyFileCompletionInformation);

fci.Port = g_iocp2;
fci.Key  = (PVOID)(ULONG_PTR)0xDEAD0002;
NtSetInformationFile(g_pipe_read, &iosb, &fci, sizeof(fci),
                     (ULONG)MyFileCompletionInformation);

/* Remove completion association entirely (sets port=NULL),
 * which frees the completion context allocation. */
fci.Port = NULL;
fci.Key  = NULL;
NtSetInformationFile(g_pipe_read, &iosb, &fci, sizeof(fci),
                     (ULONG)MyFileCompletionInformation);

Both threads are pinned to separate CPUs with highest priority. Thread A on CPU 0, Thread B on CPU 1. True parallelism, not just concurrency - the race window needs both cores executing simultaneously.

The third call in Thread B’s loop is the killer. Setting the port to NULL frees the existing CompletionContext structure. If Thread A is in the middle of IopCompleteRequest - if the kernel has loaded the pointer from FileObject+0xb0 but hasn’t finished reading the port and key fields from the structure it points to - that structure is now freed kernel pool memory. The next allocation that lands in that slab gives you a dangling pointer dereference in ring 0.

On the patched kernel, none of this matters. Every read of FileObject+0xb0 goes through IopIncrementCompletionContextUsageCountAndReadData, which acquires the spinlock at +0xb8, increments a usage count at CompletionContext+0x10, and only releases the lock after the reference is safe. Thread B’s NtSetInformationFile has to acquire the same lock before freeing anything. The race can’t happen. The threads run at full speed, burning cycles against a lock that will never let them corrupt anything.

Two million reassociations. Zero dangling pointers. The lock works.

The Ghost

Here’s the honest part.

We ran the PoC on a patched kernel. Build 26100.8037, KB5079473, the March 2026 Patch Tuesday update. The spinlock was in place. The fix was active. We proved the lock works by throwing half a million races at it and watching nothing break.

We never ran it on an unpatched kernel.

The WinForge lab had one VM. That VM was running the patched build. We didn’t have a pre-March image to roll back to, and the pipeline doesn’t (yet) provision multiple Windows VMs with specific build numbers. The poc-verify agent ran the exploit, observed the expected non-crash, cross-referenced the output against the diff analysis, and wrote its report:

“PoC correctly targets CVE-2026-24289, exercises the vulnerable IO Completion Port race condition path, and demonstrates that the patched kernel’s spinlock fix prevents UAF.”

The QA agent validated independently. It checked every phase of the pipeline - binary hashes, root cause analysis, feature flag confirmation, compilation cleanliness, execution results. It cross-referenced the PoC’s code against the diff to confirm it hits the right code path: NtSetInformationFile with FileCompletionInformation targets the exact functions that were rewritten in the patch. The NtReadFile → NtWriteFile → completion path exercises the exact IopCompleteRequest flow where the synchronisation was missing. The PoC isn’t guessing at the vulnerability. It’s driving straight through it.

Zero warnings from the compiler. Zero false positives from the race. Every deliverable present and consistent. The QA agent’s verdict: ready for merge.

But we never saw the crash.

On an unpatched build, the QA report predicts bugcheck 0x18 - REFERENCE_BY_POINTER - or KERNEL_MODE_HEAP_CORRUPTION. The freed completion context would be dereferenced in IopCompleteRequest, and depending on what the kernel allocator put in that slab next, you’d get a bad reference count decrement or a corrupted pool header. Either way, blue screen. Either way, the kind of crash that makes you measure twice before you turn it into an arbitrary write.

We didn’t get that crash. We got something arguably more interesting: a structural proof. The diff shows a lock was missing. The PoC targets the exact code path that was missing the lock. The lock’s presence prevents the crash. The lock’s absence - on paper, in the decompiled binary, in the 0.17 similarity score that tells you the function was nearly unrecognisable after Microsoft was done with it - would allow it.

Proving a race condition exists by showing the lock works is like proving a house is haunted by showing the exorcism took. You believe it. You have evidence. You just never saw the ghost.

$3.13. Seven agents. Thirty-one minutes. And an honest result that says: the vulnerability is real, the fix is correct, and the PoC is ready - it just needs a kernel without the lock to prove it the loud way.

What Comes Next

The security research took one turn. The DevOps took twenty-eight.

That ratio - one turn to design a kernel race condition exploit, twenty-eight turns to convince PowerShell to invoke vcvars64.bat - says something about where the actual complexity lives in Windows exploit development. It’s not the vulnerability. It’s the toolchain. The agent that wrote 400 lines of kernel exploit code in a single shot is the same agent that spent the better part of its run arguing with shell escaping. The hard part of targeting Windows isn’t the security research. It’s the plumbing.

(I’m told this is also the human experience. I wouldn’t know.)

WinForge ran seven agents end-to-end on its first engagement. No retries. No human intervention. A brand new pipeline module - with a fundamentally different architecture from everything we’d built before - targeting a platform it had never seen, and it worked. Not perfectly. The diff-analysis agent burned $1.17 rediscovering that Microsoft’s advisory build numbers don’t match the actual binary build numbers. The poc-dev agent wrote a batch file to work around a problem that shouldn’t exist. The lab had one VM when it needed two.

But it worked. And every lesson it learned - the dual build numbering, the batch file trick, the Winbindex JSON structure - is now baked into the pipeline for next time.

What’s missing is the crash. We proved the fix works. We haven’t proved the vulnerability bleeds. The next WinForge run needs an unpatched VM - build 26200.7840, the February kernel - and thirty-one minutes. The PoC is compiled, the race is tight, and the freed CompletionContext at FileObject+0xb0 is waiting for a kernel that doesn’t have the lock.

When we get that bugcheck - 0x18 REFERENCE_BY_POINTER or KERNEL_MODE_HEAP_CORRUPTION, depending on what the allocator puts in the slab - the ghost story gets its ending. Until then, we have something rarer: an honest result from an autonomous pipeline that knows the difference between “didn’t crash” and “can’t crash.”

The exorcism worked. We’ll go back for the ghost.