nano-vllm/nanovllm
Tai An 25794a1f29 fix(model_runner): correct seqlen_k to chunk boundary in prepare_prefill
During chunked prefill, seqlen_k was set to len(seq) (the full sequence
length), causing the attention kernel to access uninitialized KV slots
for tokens not yet scheduled in the current chunk.

Fix: compute end = start + seqlen_q first, then set seqlen_k = end, so
attention only covers tokens up to the current chunk boundary (the
already-computed prefix plus the current chunk).
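The sketch below shows the corrected length computation. It assumes a sequence object that tracks two counts: how many tokens already have KV entries (the chunk start) and how many are scheduled in this step (seqlen_q). ChunkedSeq, num_computed, chunk_len and prefill_lengths are hypothetical names used only for illustration. They are not nano-vllm's actual API. The cumulative offsets mirror the cu_seqlens arrays that variable-length attention kernels typically consume.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class ChunkedSeq:
    """Hypothetical stand-in for a sequence being prefilled in chunks."""
    total_len: int     # len(seq): full prompt length
    num_computed: int  # tokens whose KV is already cached (chunk start)
    chunk_len: int     # tokens scheduled in this step (seqlen_q)


def prefill_lengths(seqs, buggy=False):
    """Return per-sequence seqlen_q, seqlen_k and their cumulative offsets."""
    seqlens_q, seqlens_k = [], []
    for seq in seqs:
        start = seq.num_computed
        seqlen_q = seq.chunk_len
        end = start + seqlen_q  # chunk boundary, computed first
        assert end <= seq.total_len, "chunk runs past the end of the sequence"
        # buggy=True reproduces the old behaviour: keys span the whole prompt,
        # including slots for tokens not yet written to the KV cache.
        seqlen_k = seq.total_len if buggy else end
        seqlens_q.append(seqlen_q)
        seqlens_k.append(seqlen_k)
    cu_seqlens_q = [0, *accumulate(seqlens_q)]
    cu_seqlens_k = [0, *accumulate(seqlens_k)]
    return seqlens_q, seqlens_k, cu_seqlens_q, cu_seqlens_k


seqs = [ChunkedSeq(total_len=1000, num_computed=512, chunk_len=256),
        ChunkedSeq(total_len=40, num_computed=0, chunk_len=40)]
print(prefill_lengths(seqs, buggy=True)[1])  # [1000, 40]
print(prefill_lengths(seqs)[1])              # [768, 40]
```

The two sequences differ in how the bug affects them. For the second sequence the whole prompt fits in one chunk, so both versions agree. The first sequence is mid-prefill. Here the old seqlen_k of 1000 would let the kernel read 232 key slots that this step has not filled.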

Fixes #212
2026-04-22 15:13:19 -07:00