Annotated 16 source files covering the full architecture:
engine (scheduler, block manager, model runner), layers (attention,
linear, sampler, etc.), model (qwen3), and utils.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When RowParallelLinear has bias=True, the weight_loader crashes with an
IndexError because it calls param_data.size(tp_dim) where tp_dim=1, but
the bias tensor is 1D and only has dimension 0.
The bias in RowParallelLinear is not sharded (all ranks hold the full
bias, only rank 0 applies it), so skip the sharding logic for 1D params.
FixesGeeeekExplorer/nano-vllm#125