Rain-Bus ffd2defdfc add Chinese annotations to all source files for learning purposes
Annotated 16 source files covering the full architecture:
engine (scheduler, block manager, model runner), layers (attention,
linear, sampler, etc.), model (qwen3), and utils.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:33:15 +08:00

# Nano-vLLM Annotation Notes
**Annotated files (16):**
| Module | File | What the annotations cover |
|------|------|---------|
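| Entry | `__init__.py` | Overview of the project architecture and data flow |
| Config | `config.py`, `sampling_params.py` | Meaning and role of each parameter |
| Engine | `sequence.py` | Sequence state machine, `block_table`, serialization mechanism |
| Engine | `block_manager.py` | How prefix caching works, chained hash computation, reference counting |
| Engine | `scheduler.py` | Prefill/decode scheduling policy, chunked prefill, preemption |
| Engine | `model_runner.py` | KV cache allocation, CUDA Graph capture, shared-memory communication for tensor parallelism (TP) |
| Engine | `llm_engine.py` | Engine initialization flow, the `step` loop, throughput statistics |
| Model | `qwen3.py` | Qwen3 architecture (GQA, Q/K Norm), fused-module mapping |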
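| Layers | `attention.py` | Triton kernel that writes the KV cache, two-phase Flash Attention (prefill and decode) |
| Layers | `linear.py` | Five parallel linear layer types, including column-parallel, row-parallel, fused QKV, and fused gate_up |
| Layers | `sampler.py` | Gumbel-like sampling method |
| Layers | `activation.py` | SwiGLU (SiLU(gate) * up) |
| Layers | `layernorm.py` | RMSNorm fused with the residual add |
| Layers | `embed_head.py` | Vocabulary-parallel embedding, LM head prefix optimization |
| Layers | `rotary_embedding.py` | How RoPE works, precomputed cos/sin cache |
| Utils | `context.py`, `loader.py` | Global context mechanism, safetensors weight loading |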
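The `block_manager.py` row refers to prefix caching with chained block hashes and reference counting. Below is a minimal pure-Python sketch of that idea. It assumes that only full blocks are hashed and that each block's hash folds in the hash of the block before it.

- `compute_block_hash`, `chain_hashes`, and `ToyPrefixCache` are hypothetical names.
- `hashlib.sha256` stands in for whatever hash `block_manager.py` actually uses.
- Free-list management and eviction are omitted.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def compute_block_hash(token_ids: List[int], prefix: Optional[str]) -> str:
    """Hash one full block of token ids, chained to the previous block's hash."""
    h = hashlib.sha256()
    if prefix is not None:
        h.update(prefix.encode("ascii"))
    h.update(b"|")  # separator between the prefix hash and this block's tokens
    h.update(",".join(str(t) for t in token_ids).encode("ascii"))
    return h.hexdigest()


def chain_hashes(token_ids: List[int], block_size: int) -> List[str]:
    """One chained hash per full block; a trailing partial block gets no hash."""
    hashes: List[str] = []
    prefix: Optional[str] = None
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        prefix = compute_block_hash(token_ids[start:start + block_size], prefix)
        hashes.append(prefix)
    return hashes


class ToyPrefixCache:
    """Maps chained block hashes to physical block ids and counts references."""

    def __init__(self) -> None:
        self.hash_to_block: Dict[str, int] = {}
        self.ref_count: Dict[int, int] = {}
        self.next_block = 0

    def allocate(self, token_ids: List[int], block_size: int) -> Tuple[List[int], int]:
        """Return (block table for the full blocks, number of cache hits)."""
        table: List[int] = []
        hits = 0
        for h in chain_hashes(token_ids, block_size):
            if h in self.hash_to_block:  # same prefix seen before: reuse the block
                block = self.hash_to_block[h]
                hits += 1
            else:  # new prefix: take a fresh physical block
                block = self.next_block
                self.next_block += 1
                self.hash_to_block[h] = block
            self.ref_count[block] = self.ref_count.get(block, 0) + 1
            table.append(block)
        return table, hits

    def release(self, table: List[int]) -> None:
        """Drop one reference per block (eviction and reuse are omitted here)."""
        for block in table:
            self.ref_count[block] -= 1


cache = ToyPrefixCache()
a, hits_a = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8], block_size=4)
b, hits_b = cache.allocate([1, 2, 3, 4, 9, 9, 9, 9], block_size=4)
print(a, hits_a)  # [0, 1] 0
print(b, hits_b)  # [0, 2] 1
```

Each hash covers the whole prefix up to that block. Two sequences therefore share a cached block only if every earlier block also matches, which is exactly when the cached KV values are valid for both.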
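The `block_table` annotated in `sequence.py` and the Triton kernel in `attention.py` that writes the KV cache connect through a per-token slot index. The sketch below makes two assumptions:

- The cache is viewed as a flat array of `num_blocks * block_size` slots.
- Token position `pos` lives in physical block `block_table[pos // block_size]` at offset `pos % block_size`.

`slot_mapping` is a hypothetical helper name, not necessarily the code in `model_runner.py`.

```python
from typing import List


def slot_mapping(block_table: List[int], start: int, end: int, block_size: int) -> List[int]:
    """Flat KV-cache slot index for every token position in [start, end)."""
    slots: List[int] = []
    for pos in range(start, end):
        physical_block = block_table[pos // block_size]  # which block holds this position
        offset = pos % block_size  # where inside that block
        slots.append(physical_block * block_size + offset)
    return slots


# Prefill: 6 tokens whose logical blocks 0 and 1 live in physical blocks 7 and 2.
print(slot_mapping([7, 2], 0, 6, block_size=4))  # [28, 29, 30, 31, 8, 9]
# Decode: only the newest token (position 6) needs a slot.
print(slot_mapping([7, 2], 6, 7, block_size=4))  # [10]
```

Under this layout, a write kernel only needs the slot index per token. It never has to know about sequences or block tables.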
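The `sampler.py` row calls its method "Gumbel-like". This sketch assumes one way to realize that: divide the temperature-scaled probabilities by independent Exp(1) noise and take the argmax.

This produces an exact categorical draw. If `E ~ Exp(1)`, then `-log E` is standard Gumbel. So `argmax(p_i / E_i)` equals `argmax(log p_i + G_i)`, which is the Gumbel-max trick.

The sketch uses plain lists and `random.Random` instead of tensors. The function names are hypothetical.

```python
import math
import random
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one token id from Categorical(softmax(logits / temperature))."""
    probs = softmax(logits, temperature)
    # Divide each probability by an Exp(1) draw (clamped away from 0) and take
    # the argmax; index i wins with probability probs[i].
    scores = [p / max(rng.expovariate(1.0), 1e-10) for p in probs]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
logits = [0.0, math.log(3.0)]  # probabilities 0.25 and 0.75 at temperature 1
draws = [sample_token(logits, 1.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # should be close to 0.75
```

A practical appeal of this form is that it needs only an elementwise divide and an argmax. There is no cumulative sum or search over the distribution.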
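The `activation.py` row summarizes SwiGLU as SiLU times up. The sketch below applies it to a fused `[gate | up]` vector. It assumes the gate half comes first; the actual layout of a fused gate_up projection's output is an assumption here. `silu_and_mul` is a hypothetical name.

```python
import math
from typing import List


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), written to avoid overflow in math.exp."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def silu_and_mul(gate_up: List[float]) -> List[float]:
    """SwiGLU on a fused [gate | up] vector: silu(gate[i]) * up[i]."""
    if len(gate_up) % 2 != 0:
        raise ValueError("fused gate_up vector must have even length")
    half = len(gate_up) // 2
    gate, up = gate_up[:half], gate_up[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


# gate 0.0 gives 0.0; gate 1.0 gives 7 * sigmoid(1), about 5.117
print(silu_and_mul([0.0, 1.0, 5.0, 7.0]))
```

Fusing the gate and up projections into one matrix means one matmul produces both halves. The activation then just splits the result.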
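The `layernorm.py` row mentions RMSNorm fused with the residual add. The sum `x + residual` is needed twice: as the new residual stream and as the input to the norm. Computing both in one call saves a separate pass over the hidden state.

This scalar sketch assumes the standard RMSNorm formula `x / sqrt(mean(x^2) + eps) * weight`. `add_rms_norm` is a hypothetical name.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Scale x by 1 / RMS(x), then by a learned per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def add_rms_norm(
    x: List[float], residual: List[float], weight: List[float], eps: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Fused residual add + RMSNorm: returns (normalized sum, new residual)."""
    summed = [a + b for a, b in zip(x, residual)]
    # `summed` is both the residual carried to the next sub-layer and the norm input.
    return rms_norm(summed, weight, eps), summed


out, new_residual = add_rms_norm([1.0, 2.0], [2.0, 2.0], [1.0, 1.0], eps=0.0)
print(new_residual)  # [3.0, 4.0]
```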
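The `embed_head.py` row mentions a vocabulary-parallel embedding. This sketch assumes the usual scheme:

1. Each tensor-parallel rank stores a contiguous slice of the vocabulary.
2. Each rank looks up only the tokens it owns and writes zeros for the rest.
3. A sum all-reduce assembles the full result.

The ranks are simulated in one process. `all_reduce_sum` stands in for a real collective such as `torch.distributed.all_reduce`, and the other names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shard_bounds(vocab_size: int, tp_size: int, rank: int) -> Tuple[int, int]:
    """Contiguous [start, end) vocab slice owned by `rank` (assumes even division)."""
    per_rank = vocab_size // tp_size
    return rank * per_rank, (rank + 1) * per_rank


def local_lookup(token_ids: List[int], shard: Matrix, start: int, end: int, dim: int) -> Matrix:
    """Embed the tokens this rank owns; emit zero vectors for all other tokens."""
    out: Matrix = []
    for t in token_ids:
        if start <= t < end:
            out.append(list(shard[t - start]))
        else:
            out.append([0.0] * dim)
    return out


def all_reduce_sum(partials: List[Matrix]) -> Matrix:
    """Single-process stand-in for a sum all-reduce: add the ranks' outputs elementwise."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


vocab_size, dim, tp_size = 6, 2, 2
full_table = [[10.0 * t, 10.0 * t + 1.0] for t in range(vocab_size)]
tokens = [0, 4, 2, 5]
partials = []
for rank in range(tp_size):
    start, end = shard_bounds(vocab_size, tp_size, rank)
    partials.append(local_lookup(tokens, full_table[start:end], start, end, dim))
print(all_reduce_sum(partials) == [full_table[t] for t in tokens])  # True
```

Exactly one rank contributes a nonzero row per token, so summing the partial outputs recovers the unsharded lookup. Meanwhile, each rank holds only `1 / tp_size` of the table.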
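The `rotary_embedding.py` row covers RoPE and its precomputed cache. The sketch makes two assumptions:

- It uses the "rotate-half" layout, where element `i` is paired with element `i + head_dim / 2`.
- It uses the common frequency schedule `base ** (-2i / head_dim)`. `base` is a parameter because models set it differently.

The cos/sin tables are built once for every position. Applying RoPE at run time is then a table lookup plus a 2-D rotation per pair. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def build_rope_cache(head_dim: int, max_positions: int, base: float = 10000.0) -> Tuple[Table, Table]:
    """Precompute cos/sin tables indexed as [position][pair], pair in range(head_dim // 2)."""
    half = head_dim // 2
    inv_freq = [1.0 / (base ** (2 * i / head_dim)) for i in range(half)]
    cos_table = [[math.cos(p * f) for f in inv_freq] for p in range(max_positions)]
    sin_table = [[math.sin(p * f) for f in inv_freq] for p in range(max_positions)]
    return cos_table, sin_table


def apply_rope(x: List[float], position: int, cos_table: Table, sin_table: Table) -> List[float]:
    """Rotate each pair (x[i], x[i + half]) by the angle cached for (position, i)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    c, s = cos_table[position], sin_table[position]
    rot1 = [a * ci - b * si for a, b, ci, si in zip(x1, x2, c, s)]
    rot2 = [b * ci + a * si for a, b, ci, si in zip(x1, x2, c, s)]
    return rot1 + rot2


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


cos_table, sin_table = build_rope_cache(head_dim=4, max_positions=16)
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.25]
# The q.k score depends only on the distance between positions (5 - 3 == 2 - 0).
s1 = dot(apply_rope(q, 5, cos_table, sin_table), apply_rope(k, 3, cos_table, sin_table))
s2 = dot(apply_rope(q, 2, cos_table, sin_table), apply_rope(k, 0, cos_table, sin_table))
print(abs(s1 - s2) < 1e-9)  # True
```

Rotating query and key by angles proportional to their positions makes their dot product depend only on the position difference. This relative-position property is why RoPE is applied to Q and K but not to V.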