nano-vllm

Tai An 25794a1f29 fix(model_runner): correct seqlen_k to chunk boundary in prepare_prefill

During chunked prefill, seqlen_k was set to len(seq) (the full sequence
length), causing the attention kernel to access uninitialized KV slots
for tokens not yet scheduled in the current chunk.

Fix: compute end = start + seqlen_q first, then set seqlen_k = end, so
that attention covers only keys up to the current chunk boundary.

Fixes #212

2026-04-22 15:13:19 -07:00
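
The length computation described in the commit message above can be sketched as follows. This is a minimal illustration, not the repository's code: `Seq`, its fields, `chunk_lengths`, and `cumulative` are hypothetical names, and the assumption is that each sequence tracks how many tokens already have KV entries written and how many are scheduled for the current step.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Seq:
    """Hypothetical stand-in for a sequence scheduled for prefill."""
    num_tokens: int    # full prompt length, i.e. what len(seq) would return
    num_computed: int  # tokens whose KV entries were written by earlier chunks
    chunk_size: int    # upper bound on tokens scheduled in this step


def chunk_lengths(seq: Seq) -> tuple[int, int]:
    """Return (seqlen_q, seqlen_k) for the current prefill chunk."""
    start = seq.num_computed
    # Queries are only the tokens scheduled in this chunk.
    seqlen_q = min(seq.chunk_size, seq.num_tokens - start)
    end = start + seqlen_q
    # Keys stop at the chunk boundary. Using seq.num_tokens here would
    # let attention read KV slots that have not been written yet.
    seqlen_k = end
    return seqlen_q, seqlen_k


def cumulative(lengths: list[int]) -> list[int]:
    """Build a cu_seqlens-style prefix-sum list starting at 0."""
    return [0, *accumulate(lengths)]


if __name__ == "__main__":
    batch = [Seq(num_tokens=10, num_computed=4, chunk_size=3),
             Seq(num_tokens=5, num_computed=0, chunk_size=8)]
    q_lens, k_lens = zip(*(chunk_lengths(s) for s in batch))
    print(cumulative(list(q_lens)), cumulative(list(k_lens)))
```

For the final chunk of a sequence, `end` equals the full length, so the result matches the pre-fix behavior in that case; the two differ only while part of the prompt is still unscheduled.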

engine

fix(model_runner): correct seqlen_k to chunk boundary in prepare_prefill

2026-04-22 15:13:19 -07:00

layers

support chunked prefill and fix minor bug

2026-04-14 03:05:35 +08:00

models

support chunked prefill and fix minor bug

2026-04-14 03:05:35 +08:00

utils

enable slots=True for dataclasses