Commit Graph

13 Commits

Author SHA1 Message Date
Rain-Bus ffd2defdfc add Chinese annotations to all source files for learning purposes
Annotated 16 source files covering the full architecture:
engine (scheduler, block manager, model runner), layers (attention,
linear, sampler, etc.), model (qwen3), and utils.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:33:15 +08:00
GeekExplorer f64d821c20 fix chunked prefill bugs and refactor 2026-04-26 02:53:06 +08:00
six 77dd709ca1 fix(scheduler): recalculate num_tokens after allocate to prevent IndexError
The scheduler overestimated num_scheduled_tokens because it used an outdated num_cached_tokens before block_manager.allocate(seq) could update it via prefix cache hits. In prepare_prefill (model_runner.py), this caused 'end = start + seqlen_q' to exceed the sequence length, leading to an inflated 'end_block'. Consequently, an 'index out of range' error occurred at line 155 when accessing seq.block_table[i] beyond its actual physical allocation.
2026-04-20 16:34:27 +08:00
GeekExplorer 8d63a98c03 support chunked prefill and fix minor bug 2026-04-14 03:05:35 +08:00
Chengqi Deng 498f5a1aa8 Fix scheduler.postprocess return type 2026-04-11 13:23:49 +08:00
GeeeekExplorer cde3fc22c2 simplify 2025-06-21 17:19:15 +08:00
Xingkai Yu 326b121fad Merge pull request #10 from MARD1NO/refine_return_hint_in_schedule 2025-06-15 10:39:51 +08:00
MARD1NO 98bbbefb68 schedule return bool args 2025-06-15 10:15:05 +08:00
cheunglei 53b3ef2e32 support tensor parallel 2025-06-15 01:31:24 +08:00
GeeeekExplorer f16adb729e refactor 2025-06-12 09:41:12 +08:00
GeeeekExplorer 386290d69e refactor 2025-06-11 21:12:57 +08:00
GeeeekExplorer b98e1ca305 fix 2025-06-10 21:25:54 +08:00
GeeeekExplorer a5a4909e6a init commit 2025-06-10 00:27:01 +08:00
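
The failure described in commit 77dd709ca1 can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the project's scheduler: `ToySeq`, `toy_allocate`, `schedule_buggy`, `schedule_fixed`, `gather_blocks` and `BLOCK_SIZE = 4` are hypothetical names and values invented for illustration. Only `num_cached_tokens`, `block_table`, and the `end = start + seqlen_q` arithmetic come from the commit message.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, chosen only for this example


@dataclass
class ToySeq:
    """Hypothetical stand-in for a sequence tracked by the scheduler."""
    num_tokens: int
    num_cached_tokens: int = 0
    block_table: list = field(default_factory=list)


def toy_allocate(seq, prefix_hit_tokens):
    """Stand-in for block_manager.allocate(seq): a prefix cache hit
    raises num_cached_tokens, and blocks are allocated for the sequence."""
    seq.num_cached_tokens = prefix_hit_tokens
    num_blocks = -(-seq.num_tokens // BLOCK_SIZE)  # ceiling division
    seq.block_table = list(range(num_blocks))


def schedule_buggy(seq, prefix_hit_tokens):
    # Token count is computed from the stale num_cached_tokens ...
    num_scheduled = seq.num_tokens - seq.num_cached_tokens
    # ... and only afterwards does allocation update it.
    toy_allocate(seq, prefix_hit_tokens)
    return num_scheduled


def schedule_fixed(seq, prefix_hit_tokens):
    # Allocate first, then recalculate from the updated value.
    toy_allocate(seq, prefix_hit_tokens)
    return seq.num_tokens - seq.num_cached_tokens


def gather_blocks(seq, seqlen_q):
    """Mimics the prefill-side arithmetic quoted in the commit message."""
    start = seq.num_cached_tokens
    end = start + seqlen_q
    start_block = start // BLOCK_SIZE
    end_block = -(-end // BLOCK_SIZE)  # exclusive upper bound
    return [seq.block_table[i] for i in range(start_block, end_block)]


if __name__ == "__main__":
    good = ToySeq(num_tokens=10)
    print(gather_blocks(good, schedule_fixed(good, prefix_hit_tokens=8)))

    bad = ToySeq(num_tokens=10)
    try:
        gather_blocks(bad, schedule_buggy(bad, prefix_hit_tokens=8))
    except IndexError as exc:
        print("IndexError:", exc)
```

With a 10-token sequence and an 8-token prefix hit, the buggy ordering schedules 10 tokens, so `end` becomes 18 and `end_block` becomes 5 while only 3 blocks exist. Recalculating after allocation schedules 2 tokens and keeps `end_block` at 3.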