CVE-2026-53923

HIGH

vLLM GGUF Kernels: int64_t to int truncation of tensor dimensions causes GPU buffer overflow

Title source: cna

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.5.5 up to, but not including, 0.23.1rc0, the GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) truncate tensor dimensions from int64_t to int, so only part of the tensor is processed. The output tensor is allocated at full size with torch::empty, which leaves its memory uninitialized, but the dequantize CUDA kernel writes only the truncated number of elements. The unwritten portion of the output tensor keeps whatever was previously in that GPU memory. In multi-tenant inference deployments, this residual memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The issue is fixed in 0.23.1rc0.
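The arithmetic behind the truncation can be illustrated with a small Python model. This is only a sketch, not vLLM code: the helper names `narrow_to_int32` and `unwritten_elements` are hypothetical, the model assumes the narrowed value wraps modulo 2**32 (as it does on common two's-complement platforms), and it assumes a kernel given a negative count writes nothing.

```python
import ctypes


def narrow_to_int32(value: int) -> int:
    """Mimic storing a 64-bit value in a 32-bit signed int (wraps modulo 2**32)."""
    return ctypes.c_int32(value).value


def unwritten_elements(requested: int) -> int:
    """Count elements of a full-size output buffer that a truncated kernel never writes.

    Assumption for illustration: the buffer holds `requested` elements, while the
    kernel receives the narrowed count and treats a negative count as zero.
    """
    processed = max(narrow_to_int32(requested), 0)
    return requested - processed
```

Under these assumptions, a request for 2**32 + 5 elements is narrowed to 5, so `unwritten_elements(2**32 + 5)` reports 2**32 elements left holding stale memory, while any count that fits in 32 bits reports 0.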

Scores

CVSS v3 7.5
EPSS 0.0028
EPSS Percentile 19.8%
Attack Vector NETWORK
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N
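
The 7.5 base score can be reproduced from the vector string with the CVSS v3.1 base-score equations. The sketch below is a minimal calculator that handles only unchanged scope (S:U), which is what this vector uses; the function names are illustrative and not part of any official tool.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for unchanged scope).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector with unchanged scope."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only supports S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For this record, impact is 6.42 × 0.56 ≈ 3.60 and exploitability is about 3.89, which sum to roughly 7.48 and round up to 7.5.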

CISA SSVC

Vulnrichment
Exploitation none
Automatable no
Technical Impact partial

Details

CWE
CWE-200 CWE-681
Status published
Products (2)
vllm/vllm 0.5.5 - 0.23.1
vllm-project/vllm >= 0.5.5, < 0.23.1rc0
Published Jun 22, 2026
Tracked Since Jun 23, 2026
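
To check a version string against the affected range listed under Products (>= 0.5.5, < 0.23.1rc0), a comparison along the following lines may help. It is a simplified sketch that assumes versions of the form `X.Y.Z` with an optional `rcN` suffix; other suffixes (for example post-release or local build tags) are rejected rather than guessed at. The names `version_key` and `is_affected` are hypothetical.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


def version_key(version: str) -> tuple:
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' into a tuple where release candidates sort before finals."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = match.group(4)
    if rc is None:
        return (major, minor, patch, 1, 0)  # final release
    return (major, minor, patch, 0, int(rc))  # release candidate


def is_affected(version: str) -> bool:
    """Return True if the version lies in [0.5.5, 0.23.1rc0)."""
    return version_key("0.5.5") <= version_key(version) < version_key("0.23.1rc0")
```

With this ordering, `is_affected("0.23.0")` is true, while `is_affected("0.23.1rc0")` and `is_affected("0.23.1")` are both false.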