Xingkai Yu
bb823b3e06
Merge pull request #218 from GeeeekExplorer/chunked-prefill-refactor
fix chunked prefill bugs and refactor
2026-04-26 13:10:12 +08:00
GeekExplorer
9fa256a56d
fix cache hit
2026-04-26 03:49:14 +08:00
GeekExplorer
f64d821c20
fix chunked prefill bugs and refactor
2026-04-26 02:53:06 +08:00
Xingkai Yu
44a51afc8a
Merge pull request #207 from DestineG/fix-prefill-index-out-of-range
fix(scheduler): recalculate num_tokens after allocate to prevent IndexError at model_runner:155
2026-04-25 18:12:43 +08:00
Xingkai Yu
5df84934f3
Merge pull request #213 from Anai-Guo/fix/prepare-prefill-seqlen-k-chunked-prefill
fix(model_runner): correct seqlen_k to chunk boundary in prepare_prefill
2026-04-25 17:43:24 +08:00
Tai An
25794a1f29
fix(model_runner): correct seqlen_k to chunk boundary in prepare_prefill
During chunked prefill, seqlen_k was set to len(seq) (the full sequence
length), causing the attention kernel to access uninitialized KV slots
for tokens not yet scheduled in the current chunk.
Fix: reorder so that end = start + seqlen_q is computed first, then
set seqlen_k = end — limiting attention to the current chunk boundary.
Fixes #212
2026-04-22 15:13:19 -07:00
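The ordering described in the commit above can be sketched as follows. This is a minimal illustration, not the code in model_runner.py: the function name `prefill_chunk_lengths` and its parameters are hypothetical, and `start` stands in for the number of tokens already processed before the current chunk.

```python
def prefill_chunk_lengths(seq_len: int, start: int, seqlen_q: int) -> tuple[int, int]:
    """Return (end, seqlen_k) for one prefill chunk.

    Hypothetical helper: `start` is the index of the first token in the
    chunk and `seqlen_q` is the number of query tokens scheduled for it.
    """
    if start < 0 or seqlen_q < 0:
        raise ValueError("start and seqlen_q must be non-negative")
    # Compute the chunk boundary first ...
    end = start + seqlen_q
    if end > seq_len:
        raise ValueError("chunk runs past the end of the sequence")
    # ... then bound the key length by it, instead of using the full
    # sequence length (which would cover KV slots not yet written).
    seqlen_k = end
    return end, seqlen_k


seq_len = 10
first = prefill_chunk_lengths(seq_len, start=0, seqlen_q=4)
second = prefill_chunk_lengths(seq_len, start=4, seqlen_q=4)
last = prefill_chunk_lengths(seq_len, start=8, seqlen_q=2)
```

With a 10-token sequence split into chunks of 4, 4 and 2 tokens, the key length grows with each chunk and only reaches the full sequence length on the last one.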
six
77dd709ca1
fix(scheduler): recalculate num_tokens after allocate to prevent IndexError
The scheduler overestimated num_scheduled_tokens because it used an outdated num_cached_tokens before block_manager.allocate(seq) could update it via prefix cache hits. In prepare_prefill (model_runner.py), this caused 'end = start + seqlen_q' to exceed the sequence length, leading to an inflated 'end_block'. Consequently, an 'index out of range' error occurred at line 155 when accessing seq.block_table[i] beyond its actual physical allocation.
2026-04-20 16:34:27 +08:00
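A toy reproduction of the stale-count problem described in the commit above, under stated assumptions: `ToySeq`, `toy_allocate`, and `schedule_tokens` are hypothetical stand-ins for the real sequence, block manager, and scheduler, and the prefix-cache hit is passed in directly as a number.

```python
from dataclasses import dataclass


@dataclass
class ToySeq:
    # Hypothetical stand-in for the real sequence object.
    num_tokens: int
    num_cached_tokens: int = 0


def toy_allocate(seq: ToySeq, prefix_hit_tokens: int) -> None:
    # Stand-in for block_manager.allocate(seq): a prefix cache hit
    # raises num_cached_tokens as a side effect.
    seq.num_cached_tokens = prefix_hit_tokens


def schedule_tokens(seq: ToySeq, budget: int, prefix_hit_tokens: int) -> tuple[int, int]:
    """Return (stale, fresh) token counts for one scheduling step."""
    # Computed before allocation: uses the outdated cached-token count.
    stale = min(budget, seq.num_tokens - seq.num_cached_tokens)
    toy_allocate(seq, prefix_hit_tokens)
    # Recomputed after allocation: accounts for the prefix cache hit.
    fresh = min(budget, seq.num_tokens - seq.num_cached_tokens)
    return stale, fresh


seq = ToySeq(num_tokens=12)
stale, fresh = schedule_tokens(seq, budget=16, prefix_hit_tokens=8)
stale_end = seq.num_cached_tokens + stale  # runs past the sequence
fresh_end = seq.num_cached_tokens + fresh  # stops at the sequence end
```

Here the stale count puts `end` at 20 for a 12-token sequence, which is the kind of overshoot that would index past the allocated block table; the recomputed count stops exactly at 12.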
Xingkai Yu
812eb1c1e4
Merge pull request #204 from GeeeekExplorer/chunked-prefill
Chunked Prefill
2026-04-14 03:06:27 +08:00
GeekExplorer
8d63a98c03
support chunked prefill and fix minor bug
2026-04-14 03:05:35 +08:00
GeekExplorer
9e8507ef41
minor simplify
2026-04-13 22:09:46 +08:00
Xingkai Yu
02a95fdc66
Merge pull request #203 from Anai-Guo/fix-row-parallel-bias-crash
fix RowParallelLinear weight_loader crash when bias is enabled
2026-04-13 21:26:07 +08:00
Xingkai Yu
a4f94cb38b
Merge pull request #200 from KinglittleQ/fix-scheduler-typing
Fix scheduler.postprocess return type
2026-04-13 21:13:37 +08:00
Xingkai Yu
00eea73176
Merge pull request #172 from IceCreamMilkyTea/main
enable 'slots=True' for dataclasses
2026-04-13 20:45:50 +08:00
Xingkai Yu
52d2215911
Merge pull request #148 from guodongxiaren/main
remove hard code for block_size
2026-04-13 20:36:32 +08:00
Xingkai Yu
7f967ed6ff
Merge pull request #145 from LiaoMengqi/fix/tp
fix bug for tensor parallelism (issue #144)
2026-04-13 20:34:49 +08:00
Anai-Guo
bf99453d90
fix RowParallelLinear weight_loader crash when bias is enabled
When RowParallelLinear has bias=True, the weight_loader crashes with an
IndexError because it calls param_data.size(tp_dim) where tp_dim=1, but
the bias tensor is 1D and only has dimension 0.
The bias in RowParallelLinear is not sharded (all ranks hold the full
bias, only rank 0 applies it), so skip the sharding logic for 1D params.
Fixes GeeeekExplorer/nano-vllm#125
2026-04-12 23:10:52 -07:00
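The "skip sharding for 1D params" idea from the commit above can be illustrated without tensors. This is a pure-Python sketch using nested lists in place of the real parameter tensors; `shard_for_rank` is a hypothetical name, and the real loader works on tensors via `param_data.size(tp_dim)` rather than on lists.

```python
def shard_for_rank(loaded, tp_dim: int, tp_rank: int, tp_size: int):
    """Return the slice of `loaded` that one tensor-parallel rank keeps.

    `loaded` is either a flat list (a 1D bias) or a list of rows
    (a 2D weight of shape [out_features, in_features]).
    """
    is_1d = not isinstance(loaded[0], list)
    if is_1d:
        # A 1D bias has no dimension 1 to split, so every rank keeps
        # the full copy and the sharding logic is skipped.
        return list(loaded)
    if tp_dim != 1:
        raise ValueError("this sketch only shards along the input dimension")
    in_features = len(loaded[0])
    shard_size = in_features // tp_size
    start = tp_rank * shard_size
    # Split each row along the input dimension.
    return [row[start:start + shard_size] for row in loaded]


weight = [[1, 2, 3, 4], [5, 6, 7, 8]]
bias = [0.5, 0.25]
weight_shard = shard_for_rank(weight, tp_dim=1, tp_rank=1, tp_size=2)
bias_shard = shard_for_rank(bias, tp_dim=1, tp_rank=1, tp_size=2)
```

Rank 1 of 2 receives the second half of each weight row, while the bias is returned whole on every rank.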
Chengqi Deng
498f5a1aa8
Fix scheduler.postprocess return type
2026-04-11 13:23:49 +08:00
IceCreamMilkyTea
f438ce463f
enable slots=True for dataclasses
2026-02-08 23:46:01 -05:00
guodongxiaren
55c64e7fdf
remove hard code for block_size
2025-12-30 01:55:17 +08:00
Mengqi
82f5ca244f
fix bug for tp
2025-12-18 01:28:25 +08:00
GeeeekExplorer
2f21442653
support qwen2
2025-11-04 01:44:42 +08:00
GeeeekExplorer
db1b49dce4
add logo and trendshift
2025-11-04 00:45:10 +08:00
GeeeekExplorer
6ef2a4f630
compile random sampling
2025-08-31 22:55:34 +08:00
GeeeekExplorer
df99418f7d
simplify
2025-08-31 20:02:51 +08:00
Xingkai Yu
6a6d217de7
Merge pull request #67 from PeterDing/fix/decoding-positions
...
fix(model_runner): correct position indexing to be 0-based
2025-08-31 18:05:45 +08:00
PeterDing
f5b4840276
fix(model_runner): correct position indexing to be 0-based
- Change position calculation from len(seq) to len(seq) - 1
2025-07-04 14:29:12 +08:00
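The off-by-one in the commit above comes down to the index of the newest token. A small sketch, assuming the sequence is a plain list of token IDs and using the hypothetical name `decode_position`:

```python
def decode_position(token_ids: list[int]) -> int:
    """Return the 0-based position of the most recent token."""
    if not token_ids:
        raise ValueError("sequence must contain at least one token")
    # Positions are 0-based, so the last of n tokens sits at n - 1,
    # not at n.
    return len(token_ids) - 1


token_ids = [11, 22, 33]
position = decode_position(token_ids)
```

For a three-token sequence the decode step uses position 2, which matches the index of the last token in the list.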
GeeeekExplorer
38baf0bbe4
remove assert shape
2025-06-27 23:00:30 +08:00
Xingkai Yu
2de882a395
Merge pull request #60 from GeeeekExplorer/warmup
2025-06-27 22:52:11 +08:00
GeeeekExplorer
cb0b3dec3f
remove rng state
2025-06-27 22:50:33 +08:00
Xingkai Yu
6802cb2f42
Merge pull request #54 from TonyLianLong/patch-1
2025-06-27 22:44:38 +08:00
GeeeekExplorer
1caeec8dfa
same as vllm
2025-06-27 18:50:56 +08:00
GeeeekExplorer
658520b788
warmup and allocate
2025-06-27 01:51:57 +08:00
Long(Tony) Lian
c2ee8b8dff
Update pyproject.toml to fix missing files
2025-06-25 17:57:38 -07:00
papadopoulos Aggelos-Michael
cfc4cb6710
docs: add manual download instructions
2025-06-24 23:38:28 +08:00
Xingkai Yu
37eb91f890
Merge pull request #39 from xiaohajiayou/main
2025-06-24 22:51:58 +08:00
xiaohajiayou
054aec852d
Fix: Division-by-Zero Risk and Typo
2025-06-24 02:02:33 +08:00
GeeeekExplorer
03cfc13bb3
faster pickle
2025-06-23 00:51:52 +08:00
Xingkai Yu
8162578b60
star history
2025-06-22 15:13:04 +08:00
GeeeekExplorer
cde3fc22c2
simplify
2025-06-21 17:19:15 +08:00
Xingkai Yu
ad4e95fbdc
update .gitignore
2025-06-21 07:28:40 +08:00
GeeeekExplorer
801365a611
update bench
2025-06-19 23:28:11 +08:00
Xingkai Yu
fa0078174e
Merge pull request #24 from jinghuan-Chen/fix/Release-CUDA-Graphs-resource-before-exit
2025-06-18 17:15:28 +08:00
jinghuan-Chen
ffafaeb133
Release CUDA Graphs resource before exit.
2025-06-18 16:17:31 +08:00
Xingkai Yu
4fc764f175
Merge pull request #22 from cheunglei/use_spawn
2025-06-17 23:53:59 +08:00
cheunglei
b5ace32982
use spawn
2025-06-17 23:49:15 +08:00
GeeeekExplorer
bc0ad5a116
better
2025-06-17 23:33:38 +08:00
GeeeekExplorer
7e42fa6f63
fix
2025-06-15 13:28:29 +08:00
Xingkai Yu
326b121fad
Merge pull request #10 from MARD1NO/refine_return_hint_in_schedule
2025-06-15 10:39:51 +08:00
Xingkai Yu
ba96387043
Merge pull request #11 from GeeeekExplorer/tp_dev
2025-06-15 10:37:21 +08:00
GeeeekExplorer
fc778a4da9
better
2025-06-15 10:36:45 +08:00