nano-vllm

Author	SHA1	Message	Date
Rain-Bus	ffd2defdfc	add Chinese annotations to all source files for learning purposes. Annotated 16 source files covering the full architecture: engine (scheduler, block manager, model runner), layers (attention, linear, sampler, etc.), model (qwen3), and utils. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 21:33:15 +08:00
GeekExplorer	f64d821c20	fix chunked prefill bugs and refactor	2026-04-26 02:53:06 +08:00
six	77dd709ca1	fix(scheduler): recalculate num_tokens after allocate to prevent IndexError. The scheduler overestimated num_scheduled_tokens because it used an outdated num_cached_tokens before block_manager.allocate(seq) could update it via prefix cache hits. In prepare_prefill (model_runner.py), this caused 'end = start + seqlen_q' to exceed the sequence length, leading to an inflated 'end_block'. Consequently, an 'index out of range' error occurred at line 155 when accessing seq.block_table[i] beyond its actual physical allocation.	2026-04-20 16:34:27 +08:00
GeekExplorer	8d63a98c03	support chunked prefill and fix minor bug	2026-04-14 03:05:35 +08:00
Chengqi Deng	498f5a1aa8	Fix scheduler.postprocess return type	2026-04-11 13:23:49 +08:00
GeeeekExplorer	cde3fc22c2	simplify	2025-06-21 17:19:15 +08:00
Xingkai Yu	326b121fad	Merge pull request #10 from MARD1NO/refine_return_hint_in_schedule	2025-06-15 10:39:51 +08:00
MARD1NO	98bbbefb68	schedule return bool args	2025-06-15 10:15:05 +08:00
cheunglei	53b3ef2e32	support tensor parallel	2025-06-15 01:31:24 +08:00
GeeeekExplorer	f16adb729e	refactor	2025-06-12 09:41:12 +08:00
GeeeekExplorer	386290d69e	refactor	2025-06-11 21:12:57 +08:00
GeeeekExplorer	b98e1ca305	fix	2025-06-10 21:25:54 +08:00
GeeeekExplorer	a5a4909e6a	init commit	2025-06-10 00:27:01 +08:00

13 Commits
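
The failure described in commit 77dd709ca1 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the repository's code: `ToySeq`, `toy_allocate`, `end_block_for`, `schedule_stale`, `schedule_fixed`, and `BLOCK_SIZE = 4` are all hypothetical names and values invented for illustration. Only `num_cached_tokens`, `block_table`, `seqlen_q`, and the `end = start + seqlen_q` arithmetic come from the commit message.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, chosen only to keep the numbers small


@dataclass
class ToySeq:
    """Hypothetical stand-in for a sequence; not the real nano-vllm class."""
    num_tokens: int
    num_cached_tokens: int = 0
    block_table: list = field(default_factory=list)


def toy_allocate(seq: ToySeq, prefix_hit_tokens: int) -> None:
    """Stand-in for block_manager.allocate(seq): records prefix cache hits
    and assigns just enough blocks to cover the sequence."""
    seq.num_cached_tokens = prefix_hit_tokens
    num_blocks = (seq.num_tokens + BLOCK_SIZE - 1) // BLOCK_SIZE
    seq.block_table = list(range(num_blocks))


def end_block_for(seq: ToySeq, seqlen_q: int) -> int:
    """Mirrors the 'end = start + seqlen_q' arithmetic from the commit message."""
    start = seq.num_cached_tokens
    end = start + seqlen_q
    return (end + BLOCK_SIZE - 1) // BLOCK_SIZE


def schedule_stale(seq: ToySeq, prefix_hit_tokens: int) -> int:
    """Buggy order: token count is computed before allocate updates the cache hits."""
    num_scheduled = seq.num_tokens - seq.num_cached_tokens
    toy_allocate(seq, prefix_hit_tokens)
    return num_scheduled


def schedule_fixed(seq: ToySeq, prefix_hit_tokens: int) -> int:
    """Fixed order: token count is recomputed after allocate."""
    toy_allocate(seq, prefix_hit_tokens)
    return seq.num_tokens - seq.num_cached_tokens


stale_seq = ToySeq(num_tokens=10)
stale_n = schedule_stale(stale_seq, prefix_hit_tokens=8)
# The stale count makes end_block exceed the number of allocated blocks.
print(stale_n, end_block_for(stale_seq, stale_n), len(stale_seq.block_table))

fixed_seq = ToySeq(num_tokens=10)
fixed_n = schedule_fixed(fixed_seq, prefix_hit_tokens=8)
# The recomputed count keeps end_block within the block table.
print(fixed_n, end_block_for(fixed_seq, fixed_n), len(fixed_seq.block_table))
```

In the stale ordering, iterating `seq.block_table[i]` up to the inflated `end_block` would index past the allocated blocks, which matches the IndexError the commit message reports.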