nano-vllm/nanovllm
six 77dd709ca1 fix(scheduler): recalculate num_tokens after allocate to prevent IndexError
The scheduler overestimated num_scheduled_tokens because it read a stale num_cached_tokens before block_manager.allocate(seq) had updated it with prefix cache hits. In prepare_prefill (model_runner.py), this made 'end = start + seqlen_q' exceed the sequence length and inflated 'end_block'. As a result, accessing seq.block_table[i] at line 155 went past the blocks physically allocated to the sequence and raised an 'index out of range' error.
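The ordering problem described above can be reproduced in a minimal sketch. Everything here is a hypothetical stand-in rather than the actual nano-vllm code: `Seq`, `allocate`, `schedule_stale`, `schedule_fixed`, `end_block`, and the tiny `BLOCK_SIZE` are assumptions chosen only to show why reading num_cached_tokens before allocation inflates the block range.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, kept small for illustration


@dataclass
class Seq:
    """Toy sequence; field names mirror the commit message, not the real class."""
    num_tokens: int
    num_cached_tokens: int = 0
    block_table: list = field(default_factory=list)


def allocate(seq: Seq, prefix_hit_tokens: int) -> None:
    """Stand-in for block_manager.allocate: records prefix cache hits
    and assigns just enough blocks to hold the sequence."""
    seq.num_cached_tokens = prefix_hit_tokens
    num_blocks = -(-seq.num_tokens // BLOCK_SIZE)  # ceiling division
    seq.block_table = list(range(num_blocks))


def schedule_stale(seq: Seq, prefix_hit_tokens: int) -> int:
    """Buggy order: the token count is computed before allocate runs."""
    num_scheduled = seq.num_tokens - seq.num_cached_tokens  # stale value
    allocate(seq, prefix_hit_tokens)
    return num_scheduled


def schedule_fixed(seq: Seq, prefix_hit_tokens: int) -> int:
    """Fixed order: the token count is recalculated after allocate."""
    allocate(seq, prefix_hit_tokens)
    return seq.num_tokens - seq.num_cached_tokens


def end_block(seq: Seq, num_scheduled: int) -> int:
    """Mimics 'end = start + seqlen_q' followed by the end-block computation."""
    start = seq.num_cached_tokens
    end = start + num_scheduled
    return -(-end // BLOCK_SIZE)


stale_seq = Seq(num_tokens=8)
stale_end = end_block(stale_seq, schedule_stale(stale_seq, prefix_hit_tokens=4))

fixed_seq = Seq(num_tokens=8)
fixed_end = end_block(fixed_seq, schedule_fixed(fixed_seq, prefix_hit_tokens=4))

# With the stale count, the block range runs past the allocated table.
print(stale_end, len(stale_seq.block_table))
print(fixed_end, len(fixed_seq.block_table))
```

In this sketch the stale path computes an end block larger than the length of block_table, which is the condition under which indexing seq.block_table[i] fails, while the recalculated path stays within the allocation.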
2026-04-20 16:34:27 +08:00