Description
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt was processed and the PagedAttention mechanism found a matching prefix chunk in its cache, the prefill for that chunk was skipped, measurably reducing the TTFT (Time to First Token). These timing differences caused by cache hits are significant enough to be observed and exploited as a side channel, allowing an attacker to infer whether a given prefix was already submitted by another user. This issue has been patched in version 0.9.0.
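A minimal sketch of the observation the description makes, using a toy in-process cache rather than vLLM itself: when a prefix chunk is already cached, the per-chunk compute is skipped, and the resulting TTFT gap is large enough to detect. All names here (`CHUNK`, `prefill`, the sleep-based compute stand-in) are illustrative assumptions, not vLLM internals.

```python
import time

CHUNK = 16      # tokens per cache block (illustrative; vLLM's block size is configurable)
cache = set()   # simulated prefix cache of already-computed chunks

def prefill(tokens, work_per_chunk=0.01):
    """Toy prefill: chunks already in the cache are skipped, shortening TTFT."""
    start = time.perf_counter()
    for i in range(0, len(tokens), CHUNK):
        chunk = tuple(tokens[i:i + CHUNK])
        if chunk not in cache:
            time.sleep(work_per_chunk)  # stand-in for the attention compute
            cache.add(chunk)
    return time.perf_counter() - start  # observable TTFT proxy

victim_prompt = list(range(64))   # 4 chunks
cold = prefill(victim_prompt)     # victim's request populates the cache
hot = prefill(victim_prompt)      # attacker replays the same prefix
print(cold > hot)                 # the cache hit is clearly visible in timing
```

An attacker who controls their own prompts can replay candidate prefixes and compare TTFT against a known-cold baseline, which is why the fix in 0.9.0 targets this observable discrepancy (CWE-208/CWE-203).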
Scores
CVSS v3
2.6
EPSS
0.0018
EPSS Percentile
38.8%
Attack Vector
NETWORK
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N
CISA SSVC
Vulnrichment
Exploitation
none
Automatable
no
Technical Impact
partial
Details
CWE
CWE-208
CWE-203
Status
published
Products (2)
pypi/vllm
>= 0, < 0.9.0 (PyPI)
vllm/vllm
< 0.9.0
Published
May 29, 2025
Tracked Since
Feb 18, 2026