Rain-Bus ffd2defdfc add Chinese annotations to all source files for learning purposes

Annotated 16 source files covering the full architecture:
engine (scheduler, block manager, model runner), layers (attention,
linear, sampler, etc.), model (qwen3), and utils.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 21:33:15 +08:00

Nano-vLLM annotation notes

Annotated files (16):

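| Module | File | Annotation focus |
| --- | --- | --- |
| Entry point | `__init__.py` | Overview of the project architecture and data flow |
| Config | `config.py`, `sampling_params.py` | Meaning and purpose of each parameter |
| Engine | `sequence.py` | Sequence state machine, block_table, serialization mechanism |
| Engine | `block_manager.py` | Prefix caching principles, chained hash computation, reference counting |
| Engine | `scheduler.py` | Prefill/decode scheduling policy, chunked prefill, preemption mechanism |
| Engine | `model_runner.py` | KV cache allocation, CUDA Graph capture, TP shared-memory communication |
| Engine | `llm_engine.py` | Engine initialization flow, step loop, throughput statistics |
| Model | `qwen3.py` | Qwen3 architecture (GQA, Q/K Norm), fused module mapping |
| Layers | `attention.py` | Triton kernel that writes the KV cache, two-phase Flash Attention |
| Layers | `linear.py` | 5 parallel linear layers (column-split / row-split / fused QKV / fused gate_up) |
| Layers | `sampler.py` | Gumbel-like sampling method |
| Layers | `activation.py` | SwiGLU (SiLU * up) |
| Layers | `layernorm.py` | RMSNorm fused with the residual add |
| Layers | `embed_head.py` | Vocabulary-parallel Embedding, LM Head prefix optimization |
| Layers | `rotary_embedding.py` | RoPE principles and precomputed cache |
| Utils | `context.py`, `loader.py` | Global context mechanism, safetensors weight loading |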
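
The `block_manager.py` entry above names three ideas: prefix caching, chained hash computation, and reference counting. The following is a minimal standalone sketch of how those ideas can fit together. It is not the code in `block_manager.py`; the names `ToyBlockCache`, `block_hash`, and `BLOCK_SIZE` are hypothetical, SHA-256 is an assumed hash choice, and eviction of unused blocks is left out.

```python
import hashlib
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


def block_hash(token_ids: List[int], prefix_hash: Optional[str]) -> str:
    """Hash one full block together with the hash of the block before it.

    Chaining means two blocks with identical tokens only share a hash
    when everything that precedes them is identical as well.
    """
    h = hashlib.sha256()
    if prefix_hash is not None:
        h.update(prefix_hash.encode())
    h.update(",".join(map(str, token_ids)).encode())
    return h.hexdigest()


class ToyBlockCache:
    """Illustrative prefix cache: full blocks are shared, partial ones are not."""

    def __init__(self) -> None:
        self.hash_to_block: Dict[str, int] = {}  # chained hash -> block id
        self.ref_count: Dict[int, int] = {}      # block id -> live users
        self.next_id = 0

    def allocate(self, token_ids: List[int]) -> List[int]:
        """Return the block table for a sequence, reusing cached full blocks."""
        table: List[int] = []
        prefix: Optional[str] = None
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = token_ids[start:start + BLOCK_SIZE]
            # Only a completely filled block gets a hash and can be shared.
            h = block_hash(chunk, prefix) if len(chunk) == BLOCK_SIZE else None
            block_id = self.hash_to_block.get(h) if h is not None else None
            if block_id is None:  # cache miss: hand out a fresh block
                block_id = self.next_id
                self.next_id += 1
                self.ref_count[block_id] = 0
                if h is not None:
                    self.hash_to_block[h] = block_id
            self.ref_count[block_id] += 1
            table.append(block_id)
            prefix = h
        return table

    def free(self, table: List[int]) -> None:
        """Drop one reference per block; hashes stay cached for later reuse."""
        for block_id in table:
            self.ref_count[block_id] -= 1


if __name__ == "__main__":
    cache = ToyBlockCache()
    first = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8, 9])
    second = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8, 10, 11])
    print(first, second, cache.ref_count)
```

In this sketch the two sequences share their first two blocks because the first eight tokens match, while the trailing partial blocks stay private. A sequence such as `[9, 9, 9, 9, 5, 6, 7, 8]` would not reuse the `[5, 6, 7, 8]` block, since its chained hash depends on a different preceding block.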
