six
77dd709ca1
fix(scheduler): recalculate num_tokens after allocate to prevent IndexError
The scheduler overestimated num_scheduled_tokens because it used an outdated num_cached_tokens before block_manager.allocate(seq) could update it via prefix cache hits. In prepare_prefill (model_runner.py), this caused 'end = start + seqlen_q' to exceed the sequence length, leading to an inflated 'end_block'. Consequently, an 'index out of range' error occurred at line 155 when accessing seq.block_table[i] beyond its actual physical allocation.
2026-04-20 16:34:27 +08:00
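The failure described in the commit above can be reproduced with a toy model. This is only a sketch under assumptions: `Seq`, `allocate`, `end_block`, and `BLOCK_SIZE` are hypothetical stand-ins, not the project's actual scheduler, block manager, or `prepare_prefill` code. It shows that counting the tokens to schedule before allocation (while `num_cached_tokens` is still stale) gives an end block beyond the block table, and that recomputing the count after allocation keeps it in range.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, chosen only for this illustration


@dataclass
class Seq:
    # Hypothetical stand-in for a sequence object; not the project's class.
    num_tokens: int
    num_cached_tokens: int = 0
    block_table: list = field(default_factory=list)


def allocate(seq: Seq, cached_blocks: int) -> None:
    """Toy stand-in for block_manager.allocate(seq): builds the block table
    and records how many tokens were satisfied by prefix cache hits."""
    num_blocks = -(-seq.num_tokens // BLOCK_SIZE)  # ceiling division
    seq.block_table = list(range(num_blocks))
    seq.num_cached_tokens = cached_blocks * BLOCK_SIZE


def end_block(seq: Seq, num_scheduled_tokens: int) -> int:
    """Mimics the 'end = start + seqlen_q' arithmetic from the commit message
    and returns the exclusive end index into seq.block_table."""
    start = seq.num_cached_tokens
    end = start + num_scheduled_tokens
    return -(-end // BLOCK_SIZE)


seq = Seq(num_tokens=10)

# Stale count: computed before allocate() has updated num_cached_tokens.
stale = seq.num_tokens - seq.num_cached_tokens

allocate(seq, cached_blocks=2)  # pretend the first two blocks hit the prefix cache

# Fresh count: recomputed after allocate(), as the fix describes.
fresh = seq.num_tokens - seq.num_cached_tokens

print(len(seq.block_table), end_block(seq, stale), end_block(seq, fresh))
```

With the stale count, the computed end block is larger than `len(seq.block_table)`, so indexing `seq.block_table[i]` up to that bound would raise `IndexError`. With the recomputed count, the end block equals the table length.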
GeekExplorer
8d63a98c03
support chunked prefill and fix minor bug
2026-04-14 03:05:35 +08:00
GeekExplorer
9e8507ef41
minor simplify
2026-04-13 22:09:46 +08:00
Xingkai Yu
a4f94cb38b
Merge pull request #200 from KinglittleQ/fix-scheduler-typing
Fix scheduler.postprocess return type
2026-04-13 21:13:37 +08:00
Xingkai Yu
52d2215911
Merge pull request #148 from guodongxiaren/main
remove hard code for block_size
2026-04-13 20:36:32 +08:00
Chengqi Deng
498f5a1aa8
Fix scheduler.postprocess return type
2026-04-11 13:23:49 +08:00
guodongxiaren
55c64e7fdf
remove hard code for block_size
2025-12-30 01:55:17 +08:00
Mengqi
82f5ca244f
fix bug for tp
2025-12-18 01:28:25 +08:00
GeeeekExplorer
2f21442653
support qwen2
2025-11-04 01:44:42 +08:00
GeeeekExplorer
df99418f7d
simplify
2025-08-31 20:02:51 +08:00
PeterDing
f5b4840276
fix(model_runner): correct position indexing to be 0-based
- Change position calculation from len(seq) to len(seq) - 1
2025-07-04 14:29:12 +08:00
GeeeekExplorer
cb0b3dec3f
remove rng state
2025-06-27 22:50:33 +08:00
GeeeekExplorer
1caeec8dfa
same as vllm
2025-06-27 18:50:56 +08:00
GeeeekExplorer
658520b788
warmup and allocate
2025-06-27 01:51:57 +08:00
xiaohajiayou
054aec852d
Fix: Division-by-Zero Risk and Typo
2025-06-24 02:02:33 +08:00
GeeeekExplorer
03cfc13bb3
faster pickle
2025-06-23 00:51:52 +08:00
GeeeekExplorer
cde3fc22c2
simplify
2025-06-21 17:19:15 +08:00
jinghuan-Chen
ffafaeb133
Release CUDA Graphs resource before exit.
2025-06-18 16:17:31 +08:00
Xingkai Yu
4fc764f175
Merge pull request #22 from cheunglei/use_spawn
2025-06-17 23:53:59 +08:00
cheunglei
b5ace32982
use spawn
2025-06-17 23:49:15 +08:00
GeeeekExplorer
bc0ad5a116
better
2025-06-17 23:33:38 +08:00
GeeeekExplorer
7e42fa6f63
fix
2025-06-15 13:28:29 +08:00
Xingkai Yu
326b121fad
Merge pull request #10 from MARD1NO/refine_return_hint_in_schedule
2025-06-15 10:39:51 +08:00
GeeeekExplorer
fc778a4da9
better
2025-06-15 10:36:45 +08:00
MARD1NO
98bbbefb68
schedule return bool args
2025-06-15 10:15:05 +08:00
cheunglei
53b3ef2e32
support tensor parallel
2025-06-15 01:31:24 +08:00
GeeeekExplorer
b6136383c9
support fast pickle
2025-06-14 13:36:57 +08:00
GeeeekExplorer
4a8aa090a7
fix
2025-06-14 00:56:07 +08:00
GeeeekExplorer
59aa3ff57c
better
2025-06-13 13:07:33 +08:00
GeeeekExplorer
98a1551a7d
support CUDA_VISIBLE_DEVICES
2025-06-12 23:14:01 +08:00
GeeeekExplorer
ec3c60d96f
update bench
2025-06-12 22:54:51 +08:00
GeeeekExplorer
f16adb729e
refactor
2025-06-12 09:41:12 +08:00
GeeeekExplorer
fee58d44e4
fix
2025-06-12 01:00:31 +08:00
GeeeekExplorer
08c84ec08d
multi file loader
2025-06-12 01:00:09 +08:00
GeeeekExplorer
386290d69e
refactor
2025-06-11 21:12:57 +08:00
GeeeekExplorer
b98e1ca305
fix
2025-06-10 21:25:54 +08:00
GeeeekExplorer
a5a4909e6a
init commit
2025-06-10 00:27:01 +08:00