nano-vllm

Files

six 77dd709ca1 fix(scheduler): recalculate num_tokens after allocate to prevent IndexError

The scheduler overestimated num_scheduled_tokens because it read num_cached_tokens before block_manager.allocate(seq) had updated it with prefix-cache hits. In prepare_prefill (model_runner.py), the inflated count made 'end = start + seqlen_q' exceed the sequence length, which in turn inflated 'end_block'. As a result, accessing seq.block_table[i] at line 155 raised an 'index out of range' error, because i ran past the blocks physically allocated to the sequence.

2026-04-20 16:34:27 +08:00
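
The ordering problem described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the project's code: ToySeq, toy_allocate, slots_for_prefill, schedule_stale, schedule_fixed, and BLOCK_SIZE = 4 are all hypothetical names and values invented for illustration, and the index arithmetic is a guess at the shape of the logic the message describes.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, chosen only for this toy example


@dataclass
class ToySeq:
    """Hypothetical stand-in for a sequence object."""
    num_tokens: int
    num_cached_tokens: int = 0
    block_table: list[int] = field(default_factory=list)


def toy_allocate(seq: ToySeq, cached_blocks: int) -> None:
    """Toy stand-in for block_manager.allocate(seq): assigns one block id per
    block of tokens and records how many tokens were prefix-cache hits."""
    num_blocks = -(-seq.num_tokens // BLOCK_SIZE)  # ceiling division
    seq.block_table = list(range(num_blocks))
    seq.num_cached_tokens = cached_blocks * BLOCK_SIZE


def slots_for_prefill(seq: ToySeq, num_scheduled_tokens: int) -> list[int]:
    """Toy version of the indexing described for prepare_prefill."""
    start = seq.num_cached_tokens
    end = start + num_scheduled_tokens           # 'end = start + seqlen_q'
    start_block = start // BLOCK_SIZE
    end_block = -(-end // BLOCK_SIZE)            # inflated when end is too large
    return [seq.block_table[i] for i in range(start_block, end_block)]


def schedule_stale(seq: ToySeq, cached_blocks: int) -> int:
    """Buggy order: the token count is computed before allocate updates the cache count."""
    num_scheduled = seq.num_tokens - seq.num_cached_tokens
    toy_allocate(seq, cached_blocks)
    return num_scheduled


def schedule_fixed(seq: ToySeq, cached_blocks: int) -> int:
    """Fixed order: the token count is recalculated after allocate."""
    toy_allocate(seq, cached_blocks)
    return seq.num_tokens - seq.num_cached_tokens
```

In this toy, an 8-token sequence with one cached block is scheduled for 8 tokens by schedule_stale, so slots_for_prefill walks past the two allocated blocks and raises IndexError; schedule_fixed schedules only the 4 uncached tokens and stays within the block table.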

engine

fix(scheduler): recalculate num_tokens after allocate to prevent IndexError

2026-04-20 16:34:27 +08:00

layers

support chunked prefill and fix minor bug

2026-04-14 03:05:35 +08:00

models

support chunked prefill and fix minor bug

2026-04-14 03:05:35 +08:00

utils

enable slots=True for dataclasses