ffd2defdfc
Annotated 16 source files covering the full architecture: engine (scheduler, block manager, model runner), layers (attention, linear, sampler, etc.), model (qwen3), and utils. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
22 lines
826 B
Python
from dataclasses import dataclass


@dataclass(slots=True)
class SamplingParams:
    """Sampling parameters for generation.

    Args:
        temperature: Sampling temperature; controls how random the output is.
            Higher values are more random; values closer to 0 are more
            deterministic. Note: this project does not support temperature=0
            (greedy decoding); the value must be greater than 1e-10.
        max_tokens: Maximum number of tokens to generate for a single request.
        ignore_eos: Whether to ignore the EOS token. When True, generation
            continues past the end-of-sequence token until max_tokens is
            exhausted. Used in benchmarks to ensure every request generates a
            fixed number of tokens.
    """

    temperature: float = 1.0
    max_tokens: int = 64
    ignore_eos: bool = False

    def __post_init__(self):
        assert self.temperature > 1e-10, "greedy sampling is not permitted"