feat: initial project structure

- Add FastAPI app with paper browsing UI and REST API
- Add crawler service and database models
- Add scripts for DB init and manual crawl
- Add docs (api-and-ui, data-model, services)
- Add requirements and project config
2026-06-05 21:56:40 +08:00
commit f1be24ab83
26 changed files with 2557 additions and 0 deletions
@@ -0,0 +1,41 @@
+# ─── Application ─────────────────────────
+APP_HOST=127.0.0.1
+APP_PORT=8000
+APP_DEBUG=false
+BASE_URL=http://127.0.0.1:8000
+APP_TIMEZONE=Asia/Shanghai
+
+# ─── Security ────────────────────────────
+ADMIN_TOKEN=change-me
+
+# ─── HuggingFace / arXiv ────────────────
+HF_API_BASE=https://huggingface.co/api
+HF_PROXY=
+TOP_N=20
+HTTP_TIMEOUT_SECONDS=30
+HTTP_MAX_RETRIES=3
+HTTP_USER_AGENT=hf-daily-papers-local/0.1
+
+# ─── AI summarization (used in Phase 2) ──
+PI_BIN=/home/rainbus/.local/share/mise/installs/pi/latest/pi
+SUMMARY_SKILL=daily-paper-summary
+SUMMARY_CONCURRENCY=3
+SUMMARY_TIMEOUT_SECONDS=300
+SUMMARY_MAX_RETRIES=1
+
+# ─── Scheduling (used in Phase 4) ────────
+SCHEDULER_ENABLED=false
+SCHEDULE_HOUR=8
+SCHEDULE_MINUTE=0
+APP_WORKERS=1
+
+# ─── Database ────────────────────────────
+DATABASE_URL=sqlite:///data/db/papers.db
+
+# ─── Semantic search (Phase 5 enhancement, left empty for now) ─────
+CHROMA_ENABLED=false
+CHROMA_DIR=data/chroma
+EMBED_API_BASE=
+EMBED_API_KEY=
+EMBED_MODEL=
+EMBED_DIMENSIONS=
@@ -0,0 +1,15 @@
+.env
+__pycache__/
+*.pyc
+*.pyo
+data/db/*.db
+data/papers/
+data/tmp/
+data/chroma/
+logs/*.log
+.venv/
+venv/
+*.egg-info/
+dist/
+build/
+.DS_Store
@@ -0,0 +1,236 @@
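+# HF Daily Papers: Chinese-Language Paper Guide
+
+> A local web app that fetches trending papers from HuggingFace Daily Papers every day, generates structured Chinese-language digests, and supports browsing, search, bookmarks, and administration.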
+
+---
+
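+## Documentation Index
+
+| Document | Contents |
+|------|------|
+| [services.md](docs/services.md) | Service modules: crawler, AI summarization, search, cleanup, scheduling, security, and more |
+| [data-model.md](docs/data-model.md) | SQLite schema, summary.json schema, indexes, and validation strategy |
+| [api-and-ui.md](docs/api-and-ui.md) | Routes, pages, user flows, acceptance criteria |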
+
+---
+
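+## 1. Product Scope
+
+### Current Goals
+
+Build a paper guide site that runs locally:
+
+1. Crawl HuggingFace Daily Papers by date.
+2. Extract the required metadata and write it to SQLite.
+3. During the summarization stage, download the PDF on demand, call the pi CLI to generate a structured Chinese summary for the paper, and clean up the downloaded files afterwards.
+4. Display the home page, per-date lists, paper details, search results, and the reading list.
+5. Support bookmarks, reading status, and personal notes.
+6. Provide secured admin endpoints for manual crawling, summarization, cleanup, and log viewing.
+
+### Out of Scope for Now
+
+- No Docker / Docker Compose.
+- No automatic archiving.
+- Downloaded files are not kept as long-term assets: PDFs and source archives are used only for parsing and summarization, and are cleaned up once the pipeline finishes.
+- No fallback image extraction from PDFs.
+- No multi-user account system.
+- Not designed as a public internet service; local or intranet deployment is assumed.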
+
+---
+
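+## 2. Technology Choices
+
+| Layer | Choice | Notes |
+|----|------|------|
+| Backend framework | FastAPI | Page routes, JSON API, admin endpoints |
+| Templates | Jinja2 | Server-side rendered HTML |
+| Frontend interaction | HTMX + a small amount of vanilla JS | Bookmarks, status, search, partial refresh |
+| Styling | Custom CSS, modeled on the kami style | kami is only a visual and typographic reference; the kami build pipeline is not used |
+| Database | SQLite + SQLAlchemy | Single file, low-maintenance local storage |
+| Full-text search | SQLite FTS5 | Keyword search over titles, abstracts, summaries, authors, and tags |
+| Semantic search | ChromaDB (optional enhancement) | Added after the MVP; vectors come from an online embedding service |
+| AI summarization | pi CLI | One pi invocation per paper |
+| Scheduling | APScheduler | Scheduler embedded in a single process; multiple workers are prohibited to avoid duplicate runs |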
+
+---
+
+## 3. Project Structure
+
+```text
+paper/
+├── README.md
+├── REQUIREMENTS.md
+├── docs/
+│   ├── services.md
+│   ├── data-model.md
+│   └── api-and-ui.md
+├── .env
+├── .env.example
+├── pyproject.toml
+│
+├── app/
+│   ├── main.py
+│   ├── config.py
+│   ├── database.py
+│   ├── models.py
+│   ├── security.py
+│   ├── cli.py
+│   │
+│   ├── routes/
+│   │   ├── pages.py
+│   │   ├── api.py
+│   │   ├── search.py
+│   │   ├── user.py
+│   │   └── admin.py
+│   │
+│   ├── services/
+│   │   ├── crawler.py
+│   │   ├── summarizer.py
+│   │   ├── searcher.py
+│   │   ├── cleaner.py
+│   │   ├── user_data.py
+│   │   └── scheduler.py
+│   │
+│   ├── templates/
+│   │   ├── base.html
+│   │   ├── index.html
+│   │   ├── detail.html
+│   │   ├── search.html
+│   │   ├── reading_list.html
+│   │   ├── admin_logs.html
+│   │   └── partials/
+│   │       ├── paper_card.html
+│   │       ├── date_nav.html
+│   │       └── search_bar.html
+│   │
+│   └── static/
+│       ├── css/style.css
+│       └── js/app.js
+│
+├── data/
+│   ├── db/papers.db
+│   ├── papers/{arxiv_id}/
+│   │   ├── meta.json
+│   │   ├── summary.json
+│   │   └── raw_output.txt
+│   ├── tmp/{arxiv_id}/
+│   │   ├── paper.pdf
+│   │   └── source/
+│   └── chroma/
+│
+├── logs/
+├── tests/
+└── scripts/
+    ├── init_db.py
+    └── manual_crawl.py
+```
+
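+`data/tmp/` is the temporary file directory. Downloads such as PDFs and LaTeX sources are fetched on demand only during the summarization stage and deleted once parsing and summarization finish; the database, `meta.json`, `summary.json`, and `raw_output.txt` may be kept long term.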
+
+---
+
+## 4. Configuration
+
+```bash
+# Application
+APP_HOST=127.0.0.1
+APP_PORT=8000
+APP_DEBUG=false
+BASE_URL=http://127.0.0.1:8000
+APP_TIMEZONE=Asia/Shanghai
+
+# Security
+ADMIN_TOKEN=change-me
+
+# HuggingFace / arXiv
+HF_API_BASE=https://huggingface.co/api
+HF_PROXY=
+TOP_N=20
+HTTP_TIMEOUT_SECONDS=30
+HTTP_MAX_RETRIES=3
+HTTP_USER_AGENT=hf-daily-papers-local/0.1
+
+# AI summarization
+PI_BIN=/home/rainbus/.local/share/mise/installs/pi/latest/pi
+SUMMARY_SKILL=daily-paper-summary
+SUMMARY_CONCURRENCY=3
+SUMMARY_TIMEOUT_SECONDS=300
+SUMMARY_MAX_RETRIES=1
+
+# Scheduling
+SCHEDULER_ENABLED=false
+SCHEDULE_HOUR=8
+SCHEDULE_MINUTE=0
+APP_WORKERS=1
+
+# Database
+DATABASE_URL=sqlite:///data/db/papers.db
+
+# Semantic search (later enhancement, may be left empty)
+CHROMA_ENABLED=false
+CHROMA_DIR=data/chroma
+EMBED_API_BASE=
+EMBED_API_KEY=
+EMBED_MODEL=
+EMBED_DIMENSIONS=
+```
+
+---
+
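+## 5. Milestones
+
+### Phase 1: MVP (crawl, store, browse)
+
+- [ ] FastAPI + SQLite + SQLAlchemy project skeleton.
+- [ ] Data tables, FTS5 table, and basic migration or initialization scripts.
+- [ ] HF Daily Papers crawling: supports dates, TOP_N, deduplication, retries, and dates with no papers.
+- [ ] The crawl stage stores metadata only and does not keep PDFs long term.
+- [ ] Home page `/day/{date}` and paper detail page `/paper/{arxiv_id}`.
+- [ ] CLI: manually crawl a given date.
+
+### Phase 2: AI Summarization
+
+- [ ] pi CLI integration: one invocation per paper.
+- [ ] Download the PDF on demand during summarization and clean up temporary files on success or failure.
+- [ ] summary.json schema validation, degraded display, and retry on failure.
+- [ ] Summary status tracking.
+- [ ] Save raw_output.txt and allow re-runs from the admin console.
+- [ ] After summarization completes, update `papers`, `paper_summaries`, and FTS5.
+
+### Phase 3: Search and Personalization
+
+- [ ] FTS5 keyword search.
+- [ ] Bookmarks, reading status, and personal notes.
+- [ ] Reading list page.
+- [ ] RSS feed.
+
+### Phase 4: Administration and Automation
+
+- [ ] Daily automatic crawl and summarization with APScheduler.
+- [ ] Token authentication for admin endpoints.
+- [ ] Admin console logs.
+- [ ] Manual deletion of data within a given time range.
+- [ ] Temporary file cleanup job.
+
+### Phase 5: Later Enhancements
+
+- [ ] ChromaDB semantic search.
+- [ ] Similar paper recommendations.
+- [ ] Trends dashboard.
+- [ ] Paper comparison.
+- [ ] LaTeX figure extraction.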
+
+---
+
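+## 6. Core Acceptance Criteria
+
+1. Crawling the same day again does not create duplicate rows.
+2. Failed HuggingFace or arXiv requests are covered by timeouts, retries, and logging.
+3. A failed summary for one paper does not block the others.
+4. The home page can show four states: not summarized, summarizing, summary failed, summary complete.
+5. When no summary exists, the detail page shows the English title, abstract, authors, links, and a manual summarization entry point.
+6. Search matches at least titles, abstracts, authors, tags, and the body of the Chinese summary.
+7. Without a token, admin endpoints cannot trigger write operations such as crawl, summarize, or delete.
+8. Temporary PDF/source files are cleaned up once the pipeline finishes.
+9. After manually deleting a date range, pages, the search index, user data, and local files remain consistent.
+10. With a single worker, the scheduler fires the daily job exactly once.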
@@ -0,0 +1,66 @@
+"""CLI tools: manually crawl papers."""
+
+import asyncio
+from datetime import date
+
+import typer
+from dotenv import load_dotenv
+
+# Load .env before importing any app modules
+load_dotenv()
+
+cli_app = typer.Typer(help="HF Daily Papers admin CLI")
+
+
+
+@cli_app.command()
+def crawl(
+    date_str: str = typer.Argument(
+        None,
+        help="Date to crawl (YYYY-MM-DD); defaults to today",
+    ),
+    top_n: int = typer.Option(None, "--top", "-n", help="Keep the top N papers"),
+):
+    """Manually crawl HuggingFace Daily Papers for the given date."""
+    from app.config import settings
+    from app.database import SessionLocal, engine
+    from app.models import init_db as _init
+    from app.services.crawler import crawl_daily
+
+    target = date_str or date.today().isoformat()
+    # Reject malformed dates early instead of failing inside the crawler.
+    try:
+        date.fromisoformat(target)
+    except ValueError:
+        raise typer.BadParameter(f"Invalid date, expected YYYY-MM-DD: {target}")
+
+    # Make sure the database directory and tables exist
+    import os
+    os.makedirs(settings.db_path.parent, exist_ok=True)
+    _init(engine)
+    typer.echo(f"📡 Crawling {target} ...")
+
+    db = SessionLocal()
+    try:
+        result = asyncio.run(crawl_daily(db, target, top_n))
+        if result["status"] == "success":
+            typer.echo(
+                f"✅ Crawl finished: found {result['found']} papers, {result['new']} new"
+            )
+        else:
+            typer.echo(f"❌ Crawl failed: {result['error']}", err=True)
+            raise typer.Exit(code=1)
+    finally:
+        db.close()
+
+
+@cli_app.command()
+def init_db():
+    """Initialize the database tables."""
+    from app.config import settings
+    from app.database import engine
+    from app.models import init_db as _init
+
+    import os
+    os.makedirs(settings.db_path.parent, exist_ok=True)
+    _init(engine)
+    typer.echo(f"✅ Database initialized: {settings.db_path}")
+
+
+if __name__ == "__main__":
+    cli_app()
@@ -0,0 +1,73 @@
+"""Application settings, loaded from .env / environment variables."""
+
+from pathlib import Path
+
+from pydantic_settings import BaseSettings
+
+BASE_DIR = Path(__file__).resolve().parent.parent
+
+
+class Settings(BaseSettings):
+    # Application
+    APP_HOST: str = "127.0.0.1"
+    APP_PORT: int = 8000
+    APP_DEBUG: bool = False
+    BASE_URL: str = "http://127.0.0.1:8000"
+    APP_TIMEZONE: str = "Asia/Shanghai"
+
+    # Security
+    ADMIN_TOKEN: str = "change-me"
+
+    # HuggingFace / arXiv
+    HF_API_BASE: str = "https://huggingface.co/api"
+    HF_PROXY: str = ""
+    TOP_N: int = 20
+    HTTP_TIMEOUT_SECONDS: int = 30
+    HTTP_MAX_RETRIES: int = 3
+    HTTP_USER_AGENT: str = "hf-daily-papers-local/0.1"
+
+    # AI summarization (Phase 2)
+    PI_BIN: str = ""
+    SUMMARY_SKILL: str = "daily-paper-summary"
+    SUMMARY_CONCURRENCY: int = 3
+    SUMMARY_TIMEOUT_SECONDS: int = 300
+    SUMMARY_MAX_RETRIES: int = 1
+
+    # Scheduling (Phase 4)
+    SCHEDULER_ENABLED: bool = False
+    SCHEDULE_HOUR: int = 8
+    SCHEDULE_MINUTE: int = 0
+    APP_WORKERS: int = 1
+
+    # Database
+    DATABASE_URL: str = "sqlite:///data/db/papers.db"
+
+    # Semantic search (Phase 5)
+    CHROMA_ENABLED: bool = False
+    CHROMA_DIR: str = "data/chroma"
+    EMBED_API_BASE: str = ""
+    EMBED_API_KEY: str = ""
+    EMBED_MODEL: str = ""
+    EMBED_DIMENSIONS: int = 0
+
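+    model_config = {
+        "env_file": str(BASE_DIR / ".env"),
+        "env_file_encoding": "utf-8",
+        # Treat empty values such as `EMBED_DIMENSIONS=` as unset so the defaults
+        # apply instead of failing int validation (needs pydantic-settings >= 2.2).
+        "env_ignore_empty": True,
+        "extra": "ignore",
+    }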
+
+    @property
+    def db_path(self) -> Path:
+        """Resolve the SQLite file path from DATABASE_URL."""
+        # sqlite:///data/db/papers.db → BASE_DIR / "data/db/papers.db"
+        url = self.DATABASE_URL
+        if url.startswith("sqlite:///"):
+            return BASE_DIR / url[len("sqlite:///"):]
+        raise ValueError(f"Unsupported DATABASE_URL: {url}")
+
+    @property
+    def http_proxy(self) -> str | None:
+        return self.HF_PROXY or None
+
+
+settings = Settings()
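
Reviewer note on `db_path`: the property anchors a relative SQLite path at `BASE_DIR`, while the engine in `app/database.py` receives the raw `DATABASE_URL`, which SQLite resolves against the current working directory. The two should agree only when the app is started from the project root. The sketch below restates the path logic as a standalone function to make both cases visible; `sqlite_file_from_url` and the `/srv/paper` base directory are hypothetical, and `PurePosixPath` is used only so the printed output does not depend on the host OS.

```python
from pathlib import PurePosixPath


def sqlite_file_from_url(url: str, base_dir: PurePosixPath) -> PurePosixPath:
    """Resolve the file behind a sqlite:/// URL, anchoring relative paths at base_dir."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"Unsupported DATABASE_URL: {url}")
    # Joining with an absolute right-hand side discards base_dir entirely.
    return base_dir / url[len(prefix):]


base = PurePosixPath("/srv/paper")
print(sqlite_file_from_url("sqlite:///data/db/papers.db", base))   # /srv/paper/data/db/papers.db
print(sqlite_file_from_url("sqlite:////var/lib/papers.db", base))  # /var/lib/papers.db
```

One option would be to build the engine URL from the resolved path so both sides always point at the same file.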
@@ -0,0 +1,41 @@
+"""Database engine, session factory, and initialization."""
+
+from sqlalchemy import event, create_engine
+from sqlalchemy.orm import DeclarativeBase, sessionmaker
+
+from app.config import settings
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+def _make_engine():
+    """Create the SQLite engine with foreign_keys and WAL journaling enabled."""
+    engine = create_engine(
+        settings.DATABASE_URL,
+        echo=settings.APP_DEBUG,
+        connect_args={"check_same_thread": False},
+    )
+
+    @event.listens_for(engine, "connect")
+    def _set_sqlite_pragma(dbapi_connection, _connection_record):
+        cursor = dbapi_connection.cursor()
+        cursor.execute("PRAGMA foreign_keys=ON")
+        cursor.execute("PRAGMA journal_mode=WAL")
+        cursor.close()
+
+    return engine
+
+
+engine = _make_engine()
+SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False)
+
+
+def get_db():
+    """FastAPI dependency: yield a database session and close it afterwards."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
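
Reviewer note on the `connect` listener: `PRAGMA foreign_keys` is a per-connection setting in SQLite, which is why it has to run on every new connection; without it the `ON DELETE CASCADE` clauses in `app/models.py` would not be enforced at the database level. The sketch below illustrates this with the standard-library `sqlite3` module and a reduced two-table schema; the helper name `count_orphans` is hypothetical.

```python
import sqlite3


def count_orphans(enable_fk: bool) -> int:
    """Delete a parent row and report how many child rows survive."""
    conn = sqlite3.connect(":memory:")
    # The pragma must be issued outside a transaction, right after connecting.
    conn.execute(f"PRAGMA foreign_keys={'ON' if enable_fk else 'OFF'}")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE paper_authors ("
        " id INTEGER PRIMARY KEY,"
        " paper_id INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO papers (id) VALUES (1)")
    conn.execute("INSERT INTO paper_authors (paper_id) VALUES (1)")
    conn.execute("DELETE FROM papers WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM paper_authors").fetchone()[0]
    conn.close()
    return remaining


print(count_orphans(enable_fk=True))   # 0: the child row was cascaded away
print(count_orphans(enable_fk=False))  # 1: the child row is orphaned
```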
@@ -0,0 +1,59 @@
+"""FastAPI application entry point."""
+
+import logging
+import os
+
+from fastapi import FastAPI
+from fastapi.staticfiles import StaticFiles
+
+from app.config import settings
+from app.database import engine
+from app.models import init_db
+from app.routes.pages import router as pages_router
+
+logging.basicConfig(
+    level=logging.DEBUG if settings.APP_DEBUG else logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+
+def create_app() -> FastAPI:
+    app = FastAPI(
+        title="HF Daily Papers",
+        description="HuggingFace Daily Papers: a Chinese-language paper guide",
+        version="0.1.0",
+    )
+
+    # Make sure the data directory exists
+    os.makedirs(settings.db_path.parent, exist_ok=True)
+
+    # Initialize the database
+    init_db(engine)
+    logger.info("Database initialized at %s", settings.db_path)
+
+    # Security warning
+    if settings.ADMIN_TOKEN == "change-me":
+        logger.warning("⚠️  ADMIN_TOKEN is the default value 'change-me'. Please change it in .env!")
+
+    # Static files
+    app.mount("/static", StaticFiles(directory="app/static"), name="static")
+
+    # Routes
+    app.include_router(pages_router)
+
+    return app
+
+
+app = create_app()
+
+
+if __name__ == "__main__":
+    import uvicorn
+
+    uvicorn.run(
+        "app.main:app",
+        host=settings.APP_HOST,
+        port=settings.APP_PORT,
+        reload=settings.APP_DEBUG,
+    )
@@ -0,0 +1,235 @@
+"""SQLAlchemy ORM models: papers, authors, tags, summaries, FTS5, logs, locks, user data."""
+
+from datetime import date, datetime
+
+from sqlalchemy import (
+    Boolean,
+    Column,
+    Date,
+    DateTime,
+    ForeignKey,
+    Index,
+    Integer,
+    String,
+    Text,
+    UniqueConstraint,
+    text,
+)
+from sqlalchemy.orm import relationship
+
+from app.database import Base
+
+
+# ── papers ──────────────────────────────────────────────────────────────
+class Paper(Base):
+    __tablename__ = "papers"
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    arxiv_id = Column(String, unique=True, nullable=False, index=True)
+    title_en = Column(String, nullable=False)
+    title_zh = Column(String)
+    abstract = Column(Text)
+    published_at = Column(Date)
+    paper_date = Column(Date, nullable=False, index=True)
+    crawled_at = Column(DateTime, nullable=False)
+    upvotes = Column(Integer, default=0)
+    hf_url = Column(String)
+    arxiv_url = Column(String)
+    pdf_url = Column(String)
+    source_url = Column(String)
+    asset_status = Column(String, default="not_downloaded")
+    asset_error = Column(String)
+    meta_path = Column(String)
+    summary_path = Column(String)
+    raw_output_path = Column(String)
+    summary_quality = Column(String)
+
+    authors = relationship("PaperAuthor", back_populates="paper", cascade="all, delete-orphan")
+    tags = relationship("PaperTag", back_populates="paper", cascade="all, delete-orphan")
+    summary = relationship("PaperSummary", back_populates="paper", uselist=False, cascade="all, delete-orphan")
+    summary_status = relationship("SummaryStatus", back_populates="paper", uselist=False, cascade="all, delete-orphan")
+    bookmark = relationship("UserBookmark", back_populates="paper", uselist=False, cascade="all, delete-orphan")
+    reading_status = relationship("UserReadingStatus", back_populates="paper", uselist=False, cascade="all, delete-orphan")
+    note = relationship("UserNote", back_populates="paper", uselist=False, cascade="all, delete-orphan")
+
+
+# ── paper_authors ───────────────────────────────────────────────────────
+class PaperAuthor(Base):
+    __tablename__ = "paper_authors"
+    __table_args__ = (UniqueConstraint("paper_id", "name"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    name = Column(String, nullable=False)
+    position = Column(Integer, default=0)
+
+    paper = relationship("Paper", back_populates="authors")
+
+
+# ── paper_tags ──────────────────────────────────────────────────────────
+class PaperTag(Base):
+    __tablename__ = "paper_tags"
+    __table_args__ = (UniqueConstraint("paper_id", "tag", "source"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    tag = Column(String, nullable=False)
+    source = Column(String, default="hf")
+
+    paper = relationship("Paper", back_populates="tags")
+
+
+# ── paper_summaries ─────────────────────────────────────────────────────
+class PaperSummary(Base):
+    __tablename__ = "paper_summaries"
+
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), primary_key=True)
+    one_line = Column(Text)
+    difficulty = Column(String)
+    prerequisites_json = Column(Text)
+    motivation_problem = Column(Text)
+    motivation_goal = Column(Text)
+    motivation_gap = Column(Text)
+    method_overview = Column(Text)
+    method_key_idea = Column(Text)
+    method_steps_json = Column(Text)
+    method_novelty = Column(Text)
+    results_main_json = Column(Text)
+    results_benchmarks_json = Column(Text)
+    limitations_json = Column(Text)
+    weaknesses_json = Column(Text)
+    future_work_json = Column(Text)
+    reproducibility = Column(String)
+    full_json = Column(Text, nullable=False)
+    updated_at = Column(DateTime, nullable=False)
+
+    paper = relationship("Paper", back_populates="summary")
+
+
+# ── summary_status ──────────────────────────────────────────────────────
+class SummaryStatus(Base):
+    __tablename__ = "summary_status"
+    __table_args__ = (UniqueConstraint("paper_id"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    status = Column(String, nullable=False, default="pending")
+    quality = Column(String)
+    error_type = Column(String)
+    error = Column(Text)
+    retry_count = Column(Integer, default=0)
+    raw_output_saved = Column(Boolean, default=False)
+    started_at = Column(DateTime)
+    completed_at = Column(DateTime)
+
+    paper = relationship("Paper", back_populates="summary_status")
+
+
+# ── crawl_logs ──────────────────────────────────────────────────────────
+class CrawlLog(Base):
+    __tablename__ = "crawl_logs"
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    task = Column(String, nullable=False)
+    status = Column(String, nullable=False)
+    date = Column(Date)
+    papers_found = Column(Integer)
+    papers_new = Column(Integer)
+    error = Column(Text)
+    started_at = Column(DateTime, nullable=False)
+    completed_at = Column(DateTime)
+
+
+# ── task_locks ──────────────────────────────────────────────────────────
+class TaskLock(Base):
+    __tablename__ = "task_locks"
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    task = Column(String, nullable=False)
+    lock_key = Column(String, nullable=False)
+    status = Column(String, nullable=False)
+    owner = Column(String)
+    acquired_at = Column(DateTime, nullable=False)
+    released_at = Column(DateTime)
+
+
+# ── user data ──────────────────────────────────────────────────────────
+class UserBookmark(Base):
+    __tablename__ = "user_bookmarks"
+    __table_args__ = (UniqueConstraint("paper_id"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    note = Column(Text)
+    created_at = Column(DateTime, nullable=False)
+
+    paper = relationship("Paper", back_populates="bookmark")
+
+
+class UserReadingStatus(Base):
+    __tablename__ = "user_reading_status"
+    __table_args__ = (UniqueConstraint("paper_id"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    status = Column(String, nullable=False, default="unread")
+    updated_at = Column(DateTime, nullable=False)
+
+    paper = relationship("Paper", back_populates="reading_status")
+
+
+class UserNote(Base):
+    __tablename__ = "user_notes"
+    __table_args__ = (UniqueConstraint("paper_id"),)
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    paper_id = Column(Integer, ForeignKey("papers.id", ondelete="CASCADE"), nullable=False)
+    content = Column(Text, nullable=False)
+    created_at = Column(DateTime, nullable=False)
+    updated_at = Column(DateTime, nullable=False)
+
+    paper = relationship("Paper", back_populates="note")
+
+
+# ── data_delete_jobs ───────────────────────────────────────────────────
+class DataDeleteJob(Base):
+    __tablename__ = "data_delete_jobs"
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    date_start = Column(Date, nullable=False)
+    date_end = Column(Date, nullable=False)
+    include_notes = Column(Boolean, default=True)
+    paper_count = Column(Integer, default=0)
+    status = Column(String, nullable=False)
+    error = Column(Text)
+    started_at = Column(DateTime, nullable=False)
+    completed_at = Column(DateTime)
+
+
+# ── FTS5 index initialization SQL (plain virtual table, maintained by the application layer) ──
+FTS5_CREATE_SQL = """
+CREATE VIRTUAL TABLE IF NOT EXISTS papers_fts USING fts5(
+    title_en,
+    title_zh,
+    abstract,
+    authors,
+    tags,
+    summary_text,
+    tokenize='unicode61'
+);
+"""
+
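+# NOTE: despite its name, this statement is unrelated to FTS5 and creates no trigger.
+# It adds a partial unique index so that at most one 'running' lock exists per (task, lock_key).
+FTS5_TRIGGER_INDEX = """
+-- partial index for task_locks running
+CREATE UNIQUE INDEX IF NOT EXISTS uq_task_locks_running
+ON task_locks(task, lock_key) WHERE status = 'running';
+"""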
+
+
+def init_db(engine):
+    """Create all ORM tables, the FTS5 virtual table, and the task_locks partial unique index."""
+    Base.metadata.create_all(engine)
+    with engine.connect() as conn:
+        conn.execute(text(FTS5_CREATE_SQL))
+        conn.execute(text(FTS5_TRIGGER_INDEX))
+        conn.commit()
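
Reviewer note on `uq_task_locks_running`: the partial unique index lets the database itself reject a second `running` row for the same `(task, lock_key)`, while rows in any other status stay unconstrained. The sketch below shows how a lock could be taken and released on top of that index using the standard-library `sqlite3` module. The helpers `try_acquire` and `release`, the `'released'` status value, and the reduced column set are assumptions; the diff does not include the locking code itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_locks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        task TEXT NOT NULL,
        lock_key TEXT NOT NULL,
        status TEXT NOT NULL
    );
    CREATE UNIQUE INDEX uq_task_locks_running
    ON task_locks(task, lock_key) WHERE status = 'running';
    """
)


def try_acquire(task: str, lock_key: str) -> bool:
    """Insert a 'running' row; the partial index rejects a second one."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_locks (task, lock_key, status) VALUES (?, ?, 'running')",
                (task, lock_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(task: str, lock_key: str) -> None:
    """Mark the running row as released so the key can be locked again."""
    with conn:
        conn.execute(
            "UPDATE task_locks SET status = 'released' "
            "WHERE task = ? AND lock_key = ? AND status = 'running'",
            (task, lock_key),
        )


first = try_acquire("crawl", "2026-06-05")
second = try_acquire("crawl", "2026-06-05")
release("crawl", "2026-06-05")
third = try_acquire("crawl", "2026-06-05")
print(first, second, third)  # True False True
```

Because released rows fall outside the index predicate, the table also keeps a history of past runs.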
@@ -0,0 +1,109 @@
+"""Page routes: home page, per-date page, paper detail."""
+
+from datetime import date, datetime, timedelta
+from zoneinfo import ZoneInfo
+
+from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi.responses import RedirectResponse
+from fastapi.templating import Jinja2Templates
+from sqlalchemy.orm import Session, joinedload
+
+from app.config import settings
+from app.database import get_db
+from app.models import Paper
+
+router = APIRouter()
+templates = Jinja2Templates(directory="app/templates")
+
+
+def _today() -> str:
+    tz = ZoneInfo(settings.APP_TIMEZONE)
+    return datetime.now(tz).strftime("%Y-%m-%d")
+
+
+@router.get("/")
+def index(request: Request):
+    """Redirect to /day/{today}."""
+    return RedirectResponse(url=f"/day/{_today()}")
+
+
+@router.get("/day/{date_str}")
+def day_page(date_str: str, request: Request, db: Session = Depends(get_db)):
+    """Paper list for the given date."""
+    try:
+        target = date.fromisoformat(date_str)
+    except ValueError:
+        raise HTTPException(status_code=404, detail="Invalid date format")
+
+    prev_day = (target - timedelta(days=1)).isoformat()
+    next_day = (target + timedelta(days=1)).isoformat()
+    today_str = _today()
+
+    papers = (
+        db.query(Paper)
+        # Filter on the parsed date object rather than the raw URL string.
+        .filter(Paper.paper_date == target)
+        .options(
+            joinedload(Paper.authors),
+            joinedload(Paper.tags),
+            joinedload(Paper.summary_status),
+            joinedload(Paper.bookmark),
+        )
+        .order_by(Paper.upvotes.desc())
+        .all()
+    )
+
+    dates_raw = (
+        db.query(Paper.paper_date)
+        .distinct()
+        .order_by(Paper.paper_date.desc())
+        .limit(30)
+        .all()
+    )
+    available_dates = [d[0].isoformat() if isinstance(d[0], date) else str(d[0]) for d in dates_raw]
+
+    return templates.TemplateResponse(
+        request, "index.html",
+        {
+            "papers": papers,
+            "current_date": date_str,
+            "prev_day": prev_day,
+            "next_day": next_day,
+            "today": today_str,
+            "available_dates": available_dates,
+            "page_title": f"Papers for {date_str}",
+        },
+    )
+
+
+@router.get("/paper/{arxiv_id}")
+def paper_detail(arxiv_id: str, request: Request, db: Session = Depends(get_db)):
+    """Paper detail page."""
+    paper = (
+        db.query(Paper)
+        .filter(Paper.arxiv_id == arxiv_id)
+        .options(
+            joinedload(Paper.authors),
+            joinedload(Paper.tags),
+            joinedload(Paper.summary),
+            joinedload(Paper.summary_status),
+            joinedload(Paper.bookmark),
+            joinedload(Paper.reading_status),
+            joinedload(Paper.note),
+        )
+        .first()
+    )
+    if not paper:
+        raise HTTPException(status_code=404, detail="Paper not found")
+
+    summary_state = "none"
+    if paper.summary_status:
+        summary_state = paper.summary_status.status
+
+    return templates.TemplateResponse(
+        request, "detail.html",
+        {
+            "paper": paper,
+            "summary_state": summary_state,
+            "page_title": paper.title_zh or paper.title_en,
+        },
+    )
@@ -0,0 +1,182 @@
+"""Crawler service: fetch paper metadata from the HuggingFace Daily Papers API."""
+
+import logging
+from datetime import date as date_type
+from datetime import datetime, timezone
+
+import httpx
+from sqlalchemy import select, text
+from sqlalchemy.orm import Session
+
+from app.config import settings
+from app.models import (
+    CrawlLog,
+    Paper,
+    PaperAuthor,
+    PaperTag,
+    SummaryStatus,
+)
+
+logger = logging.getLogger(__name__)
+
+
+async def fetch_daily(target_date: str, top_n: int | None = None) -> list[dict]:
+    """Fetch the paper list for the given date from the HF Daily Papers API.
+
+    Args:
+        target_date: Date in YYYY-MM-DD format.
+        top_n: Keep the top N papers; defaults to settings.TOP_N.
+
+    Returns:
+        A list of paper metadata dicts.
+    """
+    top_n = top_n or settings.TOP_N
+    url = f"{settings.HF_API_BASE}/daily_papers"
+    params = {"date": target_date}
+
+    transport = None
+    if settings.http_proxy:
+        transport = httpx.AsyncHTTPTransport(proxy=settings.http_proxy)
+
+    async with httpx.AsyncClient(
+        timeout=settings.HTTP_TIMEOUT_SECONDS,
+        headers={"User-Agent": settings.HTTP_USER_AGENT},
+        transport=transport,
+    ) as client:
+        for attempt in range(1, settings.HTTP_MAX_RETRIES + 1):
+            try:
+                logger.info("Fetching HF Daily Papers: date=%s attempt=%d", target_date, attempt)
+                resp = await client.get(url, params=params)
+                resp.raise_for_status()
+                data = resp.json()
+                break
+            except httpx.HTTPError as exc:  # HTTPStatusError is a subclass of HTTPError
+                logger.warning("Fetch failed (attempt %d/%d): %s", attempt, settings.HTTP_MAX_RETRIES, exc)
+                if attempt == settings.HTTP_MAX_RETRIES:
+                    raise
+        else:
+            data = []
+
+    papers = data[:top_n]
+    logger.info("Fetched %d papers for %s (raw=%d)", len(papers), target_date, len(data))
+    return papers
+
+
+def _parse_paper(item: dict) -> dict:
+    """Extract paper metadata from an HF API response item."""
+    paper_info = item.get("paper", item)
+    arxiv_id = paper_info.get("id", "")
+    published_raw = paper_info.get("publishedAt", "")
+    published_at = None
+    if published_raw:
+        try:
+            published_at = date_type.fromisoformat(published_raw[:10])
+        except ValueError:
+            pass
+    return {
+        "arxiv_id": arxiv_id,
+        "title_en": paper_info.get("title", ""),
+        "abstract": paper_info.get("abstract", ""),
+        "published_at": published_at,
+        "upvotes": item.get("paper", {}).get("upvotes", 0) or item.get("upvotes", 0),
+        "hf_url": f"https://huggingface.co/papers/{arxiv_id}" if arxiv_id else "",
+        "arxiv_url": f"https://arxiv.org/abs/{arxiv_id}" if arxiv_id else "",
+        "pdf_url": f"https://arxiv.org/pdf/{arxiv_id}.pdf" if arxiv_id else "",
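+        # Fall back to an empty string (skipped on insert) rather than the dict itself when "name" is missing.
+        "authors": [a.get("name", "") if isinstance(a, dict) else a for a in (paper_info.get("authors") or [])],
+        "tags": [t.get("name", "") if isinstance(t, dict) else t for t in (paper_info.get("tags") or [])],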
+    }
+
+
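+def upsert_papers(db: Session, papers_raw: list[dict], paper_date: str) -> list[Paper]:
+    """Write paper metadata to the database and return the newly inserted papers.
+
+    Existing papers only get their mutable fields (upvotes, etc.) updated; they are never inserted twice.
+    """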
+    now = datetime.now(timezone.utc)
+    paper_date_obj = date_type.fromisoformat(paper_date)
+    new_papers: list[Paper] = []
+
+    for item in papers_raw:
+        meta = _parse_paper(item)
+        arxiv_id = meta["arxiv_id"]
+        if not arxiv_id:
+            continue
+
+        existing = db.execute(
+            select(Paper).where(Paper.arxiv_id == arxiv_id)
+        ).scalar_one_or_none()
+
+        if existing:
+            existing.upvotes = meta["upvotes"]
+            existing.crawled_at = now
+            logger.debug("Updated existing paper: %s", arxiv_id)
+        else:
+            paper = Paper(
+                arxiv_id=arxiv_id,
+                title_en=meta["title_en"],
+                abstract=meta["abstract"],
+                published_at=meta["published_at"],
+                paper_date=paper_date_obj,
+                crawled_at=now,
+                upvotes=meta["upvotes"],
+                hf_url=meta["hf_url"],
+                arxiv_url=meta["arxiv_url"],
+                pdf_url=meta["pdf_url"],
+            )
+            db.add(paper)
+            db.flush()
+
+            for idx, name in enumerate(meta["authors"]):
+                if name:
+                    db.add(PaperAuthor(paper_id=paper.id, name=name, position=idx))
+
+            for tag_name in meta["tags"]:
+                if tag_name:
+                    db.add(PaperTag(paper_id=paper.id, tag=tag_name, source="hf"))
+
+            db.add(SummaryStatus(paper_id=paper.id, status="pending"))
+
+            authors_text = ", ".join(meta["authors"])
+            tags_text = ", ".join(meta["tags"])
+            db.execute(
+                text(
+                    "INSERT INTO papers_fts(rowid, title_en, abstract, authors, tags) "
+                    "VALUES (:id, :title, :abstract, :authors, :tags)"
+                ),
+                {"id": paper.id, "title": meta["title_en"], "abstract": meta["abstract"] or "",
+                 "authors": authors_text, "tags": tags_text},
+            )
+
+            new_papers.append(paper)
+            logger.debug("Inserted new paper: %s", arxiv_id)
+
+    db.commit()
+    logger.info("Upserted %d papers (%d new) for %s", len(papers_raw), len(new_papers), paper_date)
+    return new_papers
+
+
+async def crawl_daily(db: Session, target_date: str, top_n: int | None = None) -> dict:
+    """Full crawl pipeline: fetch, store, and write the crawl log."""
+    now = datetime.now(timezone.utc)
+    log_entry = CrawlLog(
+        task="crawl",
+        status="running",
+        date=date_type.fromisoformat(target_date),
+        started_at=now,
+    )
+    db.add(log_entry)
+    db.commit()
+
+    try:
+        raw_papers = await fetch_daily(target_date, top_n)
+        new_papers = upsert_papers(db, raw_papers, target_date)
+        log_entry.status = "success"
+        log_entry.papers_found = len(raw_papers)
+        log_entry.papers_new = len(new_papers)
+        log_entry.completed_at = datetime.now(timezone.utc)
+        db.commit()
+        return {"found": len(raw_papers), "new": len(new_papers), "status": "success", "error": None}
+    except Exception as exc:
+        logger.exception("Crawl failed for %s", target_date)
+        # Discard any half-finished transaction so the log row can still be saved.
+        db.rollback()
+        log_entry.status = "failed"
+        log_entry.error = str(exc)
+        log_entry.completed_at = datetime.now(timezone.utc)
+        db.commit()
+        return {"found": 0, "new": 0, "status": "failed", "error": str(exc)}
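
Reviewer note on the retry loop in `fetch_daily`: retries are issued back to back, with no pause between attempts. If a delay between attempts is wanted, one common option is exponential backoff with a little jitter. The sketch below is a generic, standard-library-only illustration and not part of this commit; `retry_async`, `base_delay`, and the `flaky` demo function are hypothetical names.

```python
import asyncio
import random


async def retry_async(func, *, max_retries: int = 3, base_delay: float = 0.5, retry_on=(Exception,)):
    """Await func() until it succeeds, sleeping base_delay * 2**n plus jitter between tries."""
    for attempt in range(1, max_retries + 1):
        try:
            return await func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def _demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        # Fails twice, then succeeds on the third call.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    result = await retry_async(flaky, max_retries=3, base_delay=0.01, retry_on=(ConnectionError,))
    return result, calls


demo_result = asyncio.run(_demo())
print(demo_result)  # ('ok', 3)
```

In the crawler, `retry_on` would presumably be `httpx.HTTPError`, with `max_retries` taken from `settings.HTTP_MAX_RETRIES`.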
@@ -0,0 +1,338 @@
+/* ── kami style reference: paper texture, whitespace, ink-blue accent ─────────────────── */
+:root {
+  --bg: #faf8f5;
+  --surface: #ffffff;
+  --ink: #1a1a2e;
+  --ink-light: #4a4a6a;
+  --accent: #2d5f8a;
+  --accent-hover: #1d4a6f;
+  --border: #e8e4df;
+  --shadow: rgba(0, 0, 0, 0.06);
+  --radius: 8px;
+  --font-body: "Noto Serif SC", "Georgia", serif;
+  --font-sans: "Inter", "Noto Sans SC", system-ui, sans-serif;
+  --max-width: 960px;
+}
+
+*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+
+body {
+  font-family: var(--font-sans);
+  background: var(--bg);
+  color: var(--ink);
+  line-height: 1.7;
+  -webkit-font-smoothing: antialiased;
+}
+
+a { color: var(--accent); text-decoration: none; }
+a:hover { color: var(--accent-hover); text-decoration: underline; }
+
+/* ── Header ─────────────────────────────────────────────────────── */
+.site-header {
+  background: var(--surface);
+  border-bottom: 1px solid var(--border);
+  position: sticky;
+  top: 0;
+  z-index: 100;
+}
+
+.nav-bar {
+  max-width: var(--max-width);
+  margin: 0 auto;
+  padding: 12px 24px;
+  display: flex;
+  align-items: center;
+  gap: 24px;
+}
+
+.nav-brand {
+  font-family: var(--font-body);
+  font-size: 1.2rem;
+  font-weight: 700;
+  color: var(--ink);
+}
+
+.nav-links { display: flex; gap: 16px; margin-left: auto; }
+.nav-links a { font-size: 0.9rem; color: var(--ink-light); }
+.nav-links a:hover { color: var(--accent); }
+
+/* ── Container ──────────────────────────────────────────────────── */
+.container {
+  max-width: var(--max-width);
+  margin: 0 auto;
+  padding: 24px;
+}
+
+/* ── Date Navigation ────────────────────────────────────────────── */
+.date-nav {
+  display: flex;
+  align-items: center;
+  gap: 16px;
+  margin-bottom: 24px;
+  flex-wrap: wrap;
+}
+
+.date-title {
+  font-family: var(--font-body);
+  font-size: 1.5rem;
+  font-weight: 700;
+}
+
+.date-nav-btn {
+  display: inline-block;
+  padding: 6px 14px;
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius);
+  font-size: 0.85rem;
+  color: var(--ink-light);
+  transition: all 0.2s;
+}
+.date-nav-btn:hover { border-color: var(--accent); color: var(--accent); text-decoration: none; }
+
+/* ── Date Chips ─────────────────────────────────────────────────── */
+.date-quick-nav {
+  margin-top: 32px;
+  padding-top: 16px;
+  border-top: 1px solid var(--border);
+  font-size: 0.85rem;
+  color: var(--ink-light);
+  display: flex;
+  align-items: center;
+  gap: 8px;
+  flex-wrap: wrap;
+}
+
+.date-chip {
+  padding: 4px 10px;
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: 4px;
+  font-size: 0.8rem;
+  color: var(--ink-light);
+}
+.date-chip:hover { border-color: var(--accent); color: var(--accent); text-decoration: none; }
+.date-chip.active { background: var(--accent); color: #fff; border-color: var(--accent); }
+
+/* ── Paper Card ─────────────────────────────────────────────────── */
+.paper-list { display: flex; flex-direction: column; gap: 16px; }
+
+.paper-card {
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius);
+  padding: 20px 24px;
+  transition: box-shadow 0.2s;
+}
+.paper-card:hover { box-shadow: 0 2px 12px var(--shadow); }
+
+.paper-card-header {
+  display: flex;
+  justify-content: space-between;
+  align-items: flex-start;
+  gap: 12px;
+}
+
+.paper-title {
+  font-family: var(--font-body);
+  font-size: 1.1rem;
+  font-weight: 600;
+  line-height: 1.5;
+  flex: 1;
+}
+.paper-title a { color: var(--ink); }
+.paper-title a:hover { color: var(--accent); }
+
+.paper-upvotes {
+  font-size: 0.85rem;
+  color: var(--ink-light);
+  white-space: nowrap;
+}
+
+.paper-one-line, .paper-abstract-preview {
+  margin-top: 8px;
+  color: var(--ink-light);
+  font-size: 0.92rem;
+  line-height: 1.6;
+}
+
+.paper-meta {
+  margin-top: 8px;
+  font-size: 0.82rem;
+  color: var(--ink-light);
+}
+
+.paper-tags {
+  margin-top: 8px;
+  display: flex;
+  gap: 6px;
+  flex-wrap: wrap;
+}
+
+.tag {
+  display: inline-block;
+  padding: 2px 8px;
+  background: #eef3f8;
+  color: var(--accent);
+  border-radius: 3px;
+  font-size: 0.75rem;
+  font-weight: 500;
+}
+
+.paper-footer {
+  margin-top: 12px;
+  display: flex;
+  justify-content: space-between;
+  align-items: center;
+}
+
+.summary-badge {
+  font-size: 0.8rem;
+  padding: 2px 8px;
+  border-radius: 3px;
+}
+.summary-none { background: #f0f0f0; color: #888; }
+.summary-pending { background: #fff3e0; color: #e67e22; }
+.summary-processing { background: #e3f2fd; color: #1976d2; }
+.summary-done { background: #e8f5e9; color: #388e3c; }
+.summary-failed, .summary-permanent_failure { background: #fce4ec; color: #c62828; }
+
+.btn-detail {
+  font-size: 0.85rem;
+  color: var(--accent);
+  font-weight: 500;
+}
+
+/* ── Empty State ────────────────────────────────────────────────── */
+.empty-state {
+  text-align: center;
+  padding: 60px 20px;
+  color: var(--ink-light);
+}
+.empty-state p:first-child { font-size: 1.2rem; }
+.hint { font-size: 0.85rem; margin-top: 8px; }
+
+/* ── Paper Detail ───────────────────────────────────────────────── */
+.paper-detail { max-width: 780px; margin: 0 auto; }
+
+.back-link {
+  display: inline-block;
+  margin-bottom: 16px;
+  font-size: 0.85rem;
+  color: var(--ink-light);
+}
+
+.detail-title {
+  font-family: var(--font-body);
+  font-size: 1.6rem;
+  font-weight: 700;
+  line-height: 1.4;
+  margin-bottom: 12px;
+}
+.detail-title .title-en {
+  display: block;
+  font-size: 1rem;
+  font-weight: 400;
+  color: var(--ink-light);
+  margin-top: 4px;
+}
+
+.detail-meta {
+  display: flex;
+  gap: 16px;
+  flex-wrap: wrap;
+  font-size: 0.88rem;
+  color: var(--ink-light);
+  margin-bottom: 12px;
+}
+
+.detail-tags { margin-bottom: 12px; display: flex; gap: 6px; flex-wrap: wrap; }
+
+.detail-links {
+  display: flex;
+  gap: 12px;
+  margin-bottom: 24px;
+}
+.ext-link {
+  padding: 6px 14px;
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius);
+  font-size: 0.85rem;
+  color: var(--ink-light);
+}
+.ext-link:hover { border-color: var(--accent); color: var(--accent); text-decoration: none; }
+
+/* ── Summary Sections ───────────────────────────────────────────── */
+.summary-section {
+  margin-bottom: 24px;
+  padding: 20px;
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius);
+}
+
+.summary-section h2 {
+  font-family: var(--font-body);
+  font-size: 1.05rem;
+  font-weight: 600;
+  margin-bottom: 8px;
+  color: var(--accent);
+}
+
+.summary-section p {
+  font-size: 0.92rem;
+  line-height: 1.8;
+}
+
+.one-line {
+  font-size: 1rem;
+  font-weight: 500;
+  line-height: 1.6;
+}
+
+.abstract-section { background: #faf8f5; }
+.abstract-en { font-size: 0.9rem; color: var(--ink-light); font-style: italic; }
+
+/* ── Summary Placeholders ───────────────────────────────────────── */
+.summary-placeholder {
+  padding: 24px;
+  text-align: center;
+  border-radius: var(--radius);
+  margin-bottom: 24px;
+}
+.summary-placeholder.processing { background: #e3f2fd; }
+.summary-placeholder.failed { background: #fce4ec; }
+.summary-placeholder.none { background: #f5f5f5; }
+.error-detail { font-size: 0.85rem; color: #c62828; margin-top: 8px; }
+
+.quality-warning {
+  padding: 10px 16px;
+  background: #fff8e1;
+  border: 1px solid #ffe082;
+  border-radius: var(--radius);
+  font-size: 0.85rem;
+  color: #f57f17;
+  margin-bottom: 16px;
+}
+
+/* ── Footer ─────────────────────────────────────────────────────── */
+.site-footer {
+  margin-top: 48px;
+  padding: 20px;
+  text-align: center;
+  font-size: 0.8rem;
+  color: var(--ink-light);
+  border-top: 1px solid var(--border);
+}
+
+/* ── Responsive ─────────────────────────────────────────────────── */
+@media (max-width: 640px) {
+  .container { padding: 16px; }
+  .nav-bar { padding: 10px 16px; }
+  .date-nav { gap: 8px; }
+  .date-title { font-size: 1.2rem; }
+  .paper-card { padding: 14px 16px; }
+  .detail-title { font-size: 1.3rem; }
+  .detail-meta { flex-direction: column; gap: 4px; }
+}
@@ -0,0 +1 @@
+/* app.js: basic front-end interactions (HTMX enhancements planned) */
@@ -0,0 +1,32 @@
+<!DOCTYPE html>
+<html lang="zh-CN">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>{% block title %}HF Daily Papers{% endblock %}</title>
+  <link rel="stylesheet" href="/static/css/style.css">
+</head>
+<body>
+  <header class="site-header">
+    <nav class="nav-bar">
+      <a href="/" class="nav-brand">📚 HF Daily Papers</a>
+      <div class="nav-links">
+        <a href="/day/{{ today }}">今日</a>
+        <a href="/search">搜索</a>
+        <a href="/reading-list">阅读列表</a>
+      </div>
+    </nav>
+  </header>
+
+  <main class="container">
+    {% block content %}{% endblock %}
+  </main>
+
+  <footer class="site-footer">
+    <p>HF Daily Papers — 中文论文导览站 · 数据来源于 <a href="https://huggingface.co/papers" target="_blank" rel="noopener noreferrer">HuggingFace</a></p>
+  </footer>
+
+  <script src="/static/js/app.js"></script>
+  {% block scripts %}{% endblock %}
+</body>
+</html>
@@ -0,0 +1,121 @@
+{% extends "base.html" %}
+
+{% block title %}{{ page_title }} — HF Daily Papers{% endblock %}
+
+{% block content %}
+<article class="paper-detail">
+  <a href="/day/{{ paper.paper_date.isoformat() }}" class="back-link">← 返回 {{ paper.paper_date.isoformat() }}</a>
+
+  {# Title #}
+  <h1 class="detail-title">
+    {{ paper.title_zh or paper.title_en }}
+    {% if paper.title_zh and paper.title_en != paper.title_zh %}
+    <small class="title-en">{{ paper.title_en }}</small>
+    {% endif %}
+  </h1>
+
+  {# Metadata #}
+  <div class="detail-meta">
+    <span class="detail-authors">{{ paper.authors|map(attribute='name')|join(', ') }}</span>
+    <span class="detail-date">📅 {{ paper.published_at or paper.paper_date }}</span>
+    <span class="detail-upvotes">👍 {{ paper.upvotes }}</span>
+  </div>
+
+  {# Tags #}
+  {% if paper.tags %}
+  <div class="detail-tags">
+    {% for tag in paper.tags %}
+    <span class="tag">{{ tag.tag }}</span>
+    {% endfor %}
+  </div>
+  {% endif %}
+
+  {# Links #}
+  <div class="detail-links">
+    {% if paper.arxiv_url %}<a href="{{ paper.arxiv_url }}" target="_blank" rel="noopener noreferrer" class="ext-link">arXiv</a>{% endif %}
+    {% if paper.hf_url %}<a href="{{ paper.hf_url }}" target="_blank" rel="noopener noreferrer" class="ext-link">HuggingFace</a>{% endif %}
+    {% if paper.pdf_url %}<a href="{{ paper.pdf_url }}" target="_blank" rel="noopener noreferrer" class="ext-link">PDF</a>{% endif %}
+  </div>
+
+  {# Summary content, degrading by status #}
+  {% if summary_state == 'done' and paper.summary %}
+    {% if paper.summary_status and paper.summary_status.quality == 'low' %}
+    <div class="quality-warning">⚠️ AI 总结质量较低，仅供参考</div>
+    {% elif paper.summary_status and paper.summary_status.quality == 'degraded' %}
+    <div class="quality-warning">📝 总结部分字段不完整</div>
+    {% endif %}
+
+    {% if paper.summary.one_line %}
+    <section class="summary-section">
+      <h2>一句话摘要</h2>
+      <p class="one-line">{{ paper.summary.one_line }}</p>
+    </section>
+    {% endif %}
+
+    {% if paper.summary.difficulty %}
+    <section class="summary-section">
+      <h2>难度</h2>
+      <p>{{ paper.summary.difficulty }}</p>
+    </section>
+    {% endif %}
+
+    {% if paper.summary.motivation_problem %}
+    <section class="summary-section">
+      <h2>研究动机</h2>
+      {% if paper.summary.motivation_problem %}<p><strong>问题：</strong>{{ paper.summary.motivation_problem }}</p>{% endif %}
+      {% if paper.summary.motivation_goal %}<p><strong>目标：</strong>{{ paper.summary.motivation_goal }}</p>{% endif %}
+      {% if paper.summary.motivation_gap %}<p><strong>差距：</strong>{{ paper.summary.motivation_gap }}</p>{% endif %}
+    </section>
+    {% endif %}
+
+    {% if paper.summary.method_key_idea %}
+    <section class="summary-section">
+      <h2>核心方法</h2>
+      {% if paper.summary.method_overview %}<p>{{ paper.summary.method_overview }}</p>{% endif %}
+      <p><strong>关键思路：</strong>{{ paper.summary.method_key_idea }}</p>
+      {% if paper.summary.method_novelty %}<p><strong>新颖性：</strong>{{ paper.summary.method_novelty }}</p>{% endif %}
+    </section>
+    {% endif %}
+
+    {% if paper.summary.results_main_json %}
+    <section class="summary-section">
+      <h2>实验结果</h2>
+      <p>{{ paper.summary.results_main_json }}</p>
+    </section>
+    {% endif %}
+
+    {% if paper.summary.limitations_json %}
+    <section class="summary-section">
+      <h2>局限与改进</h2>
+      <p>{{ paper.summary.limitations_json }}</p>
+    </section>
+    {% endif %}
+
+  {% elif summary_state == 'processing' %}
+    <div class="summary-placeholder processing">
+      <p>🔄 正在生成 AI 总结，请稍后刷新页面</p>
+    </div>
+
+  {% elif summary_state in ('failed', 'permanent_failure') %}
+    <div class="summary-placeholder failed">
+      <p>❌ 总结生成失败{% if paper.summary_status and paper.summary_status.error_type %}（{{ paper.summary_status.error_type }}）{% endif %}</p>
+      {% if paper.summary_status and paper.summary_status.error %}
+      <p class="error-detail">{{ paper.summary_status.error }}</p>
+      {% endif %}
+    </div>
+
+  {% else %}
+    <div class="summary-placeholder none">
+      <p>📝 AI 总结尚未生成</p>
+    </div>
+  {% endif %}
+
+  {# English abstract, always shown #}
+  {% if paper.abstract %}
+  <section class="summary-section abstract-section">
+    <h2>Abstract</h2>
+    <p class="abstract-en">{{ paper.abstract }}</p>
+  </section>
+  {% endif %}
+</article>
+{% endblock %}
@@ -0,0 +1,36 @@
+{% extends "base.html" %}
+
+{% block title %}{{ page_title }} — HF Daily Papers{% endblock %}
+
+{% block content %}
+<div class="date-nav">
+  {% if prev_day %}
+  <a href="/day/{{ prev_day }}" class="date-nav-btn">← 前一天</a>
+  {% endif %}
+  <h1 class="date-title">{{ current_date }}</h1>
+  {% if next_day <= today %}
+  <a href="/day/{{ next_day }}" class="date-nav-btn">后一天 →</a>
+  {% endif %}
+  <a href="/day/{{ today }}" class="date-nav-btn">今日</a>
+</div>
+
+{% if papers %}
+<div class="paper-list">
+  {% for paper in papers %}
+  {% include "partials/paper_card.html" %}
+  {% endfor %}
+</div>
+{% else %}
+<div class="empty-state">
+  <p>📭 当天暂无论文数据</p>
+  <p class="hint">试试浏览其他日期，或使用管理接口抓取数据</p>
+</div>
+{% endif %}
+
+<div class="date-quick-nav">
+  <span>有数据的日期：</span>
+  {% for d in available_dates[:10] %}
+  <a href="/day/{{ d }}" class="date-chip {% if d == current_date %}active{% endif %}">{{ d }}</a>
+  {% endfor %}
+</div>
+{% endblock %}
@@ -0,0 +1,44 @@
+{# Paper card component: requires a paper variable in the template context #}
+<article class="paper-card" data-arxiv="{{ paper.arxiv_id }}">
+  <div class="paper-card-header">
+    <h2 class="paper-title">
+      <a href="/paper/{{ paper.arxiv_id }}">
+        {{ paper.title_zh or paper.title_en }}
+      </a>
+    </h2>
+    <span class="paper-upvotes">👍 {{ paper.upvotes }}</span>
+  </div>
+
+  {% if paper.summary and paper.summary.one_line %}
+  <p class="paper-one-line">{{ paper.summary.one_line }}</p>
+  {% elif paper.abstract %}
+  <p class="paper-abstract-preview">{{ paper.abstract[:200] }}{% if paper.abstract|length > 200 %}…{% endif %}</p>
+  {% endif %}
+
+  <div class="paper-meta">
+    <span class="paper-authors">
+      {{ paper.authors|map(attribute='name')|join(', ')|truncate(80) }}
+    </span>
+  </div>
+
+  <div class="paper-tags">
+    {% for tag in paper.tags[:5] %}
+    <span class="tag">{{ tag.tag }}</span>
+    {% endfor %}
+  </div>
+
+  <div class="paper-footer">
+    <span class="summary-badge summary-{{ paper.summary_status.status if paper.summary_status else 'none' }}">
+      {% if not paper.summary_status or paper.summary_status.status == 'pending' %}
+        未总结
+      {% elif paper.summary_status.status == 'processing' %}
+        🔄 总结中
+      {% elif paper.summary_status.status == 'failed' or paper.summary_status.status == 'permanent_failure' %}
+        ❌ 总结失败
+      {% elif paper.summary_status.status == 'done' %}
+        ✅ 已总结
+      {% endif %}
+    </span>
+    <a href="/paper/{{ paper.arxiv_id }}" class="btn-detail">详情 →</a>
+  </div>
+</article>
@@ -0,0 +1,224 @@
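+# API Routes and Page Design
+
+> This document defines the page routes, JSON API, admin endpoints, user flows, and acceptance criteria.
+
+---
+
+## 1. Page Routes
+
+| Method | Path | Description |
+|------|------|------|
+| GET | `/` | Redirects to `/day/{today}` |
+| GET | `/day/{date}` | Paper list for the given date |
+| GET | `/paper/{arxiv_id}` | Paper detail |
+| GET | `/search` | Search page and search results |
+| GET | `/reading-list` | Bookmarks and reading list |
+| GET | `/admin/logs` | Admin log page; requires a token |
+| GET | `/rss.xml` | RSS feed |
+
+Planned enhancements: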
+
+- `/trends`
+- `/compare?ids=id1,id2`
+- `/similar/{arxiv_id}`
+
+---
+
+## 2. Data API
+
+| Method | Path | Description |
+|------|------|------|
+| GET | `/api/papers?date=&tag=&q=` | Paper list |
+| GET | `/api/paper/{arxiv_id}` | Single paper detail |
+| GET | `/api/dates` | Dates that have data |
+| GET | `/api/tags` | Tags with counts |
+| GET | `/api/stats` | Statistics |
+| GET | `/api/search?q=&tag=` | FTS5 search |
+
+---
+
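+## 3. User Data API
+
+| Method | Path | Description |
+|------|------|------|
+| POST | `/api/bookmark/{arxiv_id}` | Toggle bookmark |
+| POST | `/api/reading-status/{arxiv_id}` | Update reading status |
+| GET | `/api/note/{arxiv_id}` | Get note |
+| POST | `/api/note/{arxiv_id}` | Save note |
+
+Requests and responses use JSON. There is no account system; data is written to the local SQLite database.
+
+Security boundary:
+
+- With the default `APP_HOST=127.0.0.1`, the user data API serves local access only.
+- When bound to a non-local address, the user data write endpoints must enable a same-origin check or require a token.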
+
+---
+
+## 4. Admin Endpoints
+
+All admin endpoints require:
+
+```text
+Authorization: Bearer <ADMIN_TOKEN>
+```
+
+| Method | Path | Description |
+|------|------|------|
+| POST | `/admin/crawl` | Manually crawl a given date; defaults to today |
+| POST | `/admin/summarize/{arxiv_id}` | Manually summarize or re-run a single paper |
+| POST | `/admin/summarize` | Batch-summarize pending papers |
+| POST | `/admin/cleanup` | Clean up temporary files |
+| POST | `/admin/delete` | Delete data within a date range |
+| GET | `/admin/logs` | View task logs |
+
+### `/admin/delete` request body
+
+```json
+{
+  "date_start": "2026-06-01",
+  "date_end": "2026-06-05",
+  "include_notes": true,
+  "confirm": "DELETE"
+}
+```
+
+`confirm` must be `DELETE`; otherwise the request is rejected.
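
A minimal sketch of how the `/admin/delete` body might be validated, using only the standard library. `DeleteRequest`, `parse_delete_request`, and the `MAX_RANGE_DAYS` cap are hypothetical names chosen for illustration; the design docs only state that `confirm` must equal `DELETE` and that the date range must be bounded.

```python
from dataclasses import dataclass
from datetime import date

MAX_RANGE_DAYS = 31  # hypothetical cap; the docs only say the range must be bounded


@dataclass(frozen=True)
class DeleteRequest:
    date_start: date
    date_end: date
    include_notes: bool


def parse_delete_request(body: dict) -> DeleteRequest:
    """Validate a decoded JSON body; raise ValueError on a bad confirm or range."""
    if body.get("confirm") != "DELETE":
        raise ValueError("confirm must be exactly 'DELETE'")
    start = date.fromisoformat(body["date_start"])
    end = date.fromisoformat(body["date_end"])
    if start > end:
        raise ValueError("date_start must not be after date_end")
    if (end - start).days + 1 > MAX_RANGE_DAYS:
        raise ValueError(f"range exceeds {MAX_RANGE_DAYS} days")
    return DeleteRequest(start, end, bool(body.get("include_notes", True)))
```

Checking `confirm` before parsing the dates means a request without the confirmation is rejected regardless of what else it contains.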
+
+---
+
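+## 5. Page States
+
+### Home / Date Page
+
+Each paper card shows:
+
+- The Chinese title, or the English title when there is no summary.
+- The one-line summary, or a truncated English abstract when there is no summary.
+- Tags, authors, upvotes, and difficulty.
+- Summary status: not summarized, in progress, failed, or done.
+- Bookmark button, reading-status control, and detail link.
+
+### Detail Page
+
+The detail page degrades by status:
+
+| Status | Display |
+|------|------|
+| No summary | English title, authors, abstract, HF/arXiv links, manual summarize button |
+| processing | Metadata + "generating summary" notice |
+| failed | Metadata + error type + manual re-run button |
+| done/normal | Full structured Chinese walkthrough |
+| done/degraded | Existing content shown; missing modules marked incomplete |
+| done/low | Quality notice at the top + existing content |
+
+Detail modules:
+
+- One-line summary
+- Prerequisites
+- Motivation
+- Core method
+- Experimental results
+- Limitations and directions for improvement
+- Links to the original paper
+- Bookmark, reading status, personal notes
+
+### Search Page
+
+The MVP offers keyword search only:
+
+- Search box.
+- Tag filter.
+- Results sorted by relevance and date.
+- Matched snippets highlighted.
+
+Semantic search is a later enhancement; the UI does not show a mode switch for now.
+
+### Reading List
+
+Filters:
+
+- All bookmarks.
+- Unread.
+- Summary read.
+- Full paper read.
+- Has notes.
+- Tag.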
+
+---
+
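+## 6. User Flows
+
+```text
+Visit /
+  -> /day/{today}
+  -> Browse paper cards
+  -> Click a paper to open /paper/{arxiv_id}
+  -> Bookmark / change reading status / write notes
+  -> Search via /search?q=...
+  -> Reading list at /reading-list
+```
+
+Admin flow:
+
+```text
+POST /admin/crawl
+  -> Crawl papers and store them
+  -> POST /admin/summarize
+  -> Generate summaries
+  -> POST /admin/cleanup
+  -> Check /admin/logs
+```
+
+Deletion flow:
+
+```text
+POST /admin/delete
+  -> Validate token and confirm
+  -> Delete papers, index rows, user data, and local files in the date range
+  -> Write the deletion record and logs
+```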
+
+---
+
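+## 7. MVP Acceptance Criteria
+
+### Crawling
+
+- Crawling a given date fetches the top N HF Daily Papers.
+- Re-crawling the same day does not insert duplicates.
+- An empty date returns a success status and a log entry with 0 papers.
+- Network failures are covered by a timeout, retries, and error logs.
+
+### Summarization
+
+- A failed summary for one paper does not affect other papers.
+- A missing required field triggers one automatic retry.
+- If the retry also fails, the paper is marked `permanent_failure`.
+- After a successful summary, the page, FTS index, and summary.json are updated together.
+- PDF and source temporary files are cleaned up after both success and failure.
+
+### Pages
+
+- The home page can show the not-summarized, in-progress, failed, and done states.
+- The detail page still shows the English metadata when there is no summary.
+- degraded/low summaries carry a clear notice.
+- On mobile, the main content does not overflow horizontally.
+
+### Search
+
+- Titles, abstracts, authors, tags, and Chinese summaries are searchable.
+- After a paper is deleted, it no longer appears in search results.
+
+### Admin
+
+- Admin endpoints cannot be called without a token.
+- A wrong token returns 401.
+- The delete endpoint rejects requests without `confirm=DELETE`.
+- After a date range is deleted, pages, the search index, user data, and local files stay consistent.
+
+### Scheduling
+
+- With a single worker, the daily task runs exactly once.
+- When a multi-worker or non-local host configuration is risky, the app emits a clear warning at startup or refuses to start.
+- Both the `today` used by `/` and the daily schedule date are computed in `APP_TIMEZONE`.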
@@ -0,0 +1,394 @@
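+# Data Model
+
+> This document defines the SQLite tables, the summary.json schema, index synchronization, validation, and the deletion strategy.
+
+---
+
+## 1. Design Principles
+
+1. SQLite is the primary store; pages and the API read from SQLite first.
+2. Downloaded files such as PDFs and LaTeX sources are temporary assets, cleaned up once parsing and summarization finish.
+3. `meta.json`, `summary.json`, and `raw_output.txt` may be kept under `data/papers/{arxiv_id}/` as human-readable backups.
+4. Authors and tags use normalized tables, which avoids the difficulty of aggregating over JSON strings.
+5. FTS5 is maintained in a separate index table, updated in step whenever a paper is inserted, updated, or deleted.
+6. ChromaDB is a later enhancement and must not become a hard dependency for rendering MVP pages.
+7. Every SQLite connection must run `PRAGMA foreign_keys=ON` so that cascading deletes take effect.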
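
The per-connection pragma in principle 7 can be illustrated with a short `sqlite3` sketch. The `connect` helper and the trimmed two-table schema are assumptions made for this example, not the project's actual code; the point is only that `ON DELETE CASCADE` takes effect once foreign keys are enabled on the connection doing the delete.

```python
import sqlite3


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection with foreign-key enforcement switched on."""
    conn = sqlite3.connect(path)
    # SQLite disables foreign keys by default, and the setting is per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


conn = connect()
conn.executescript(
    """
    CREATE TABLE papers (id INTEGER PRIMARY KEY, arxiv_id TEXT UNIQUE NOT NULL);
    CREATE TABLE paper_authors (
        id INTEGER PRIMARY KEY,
        paper_id INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
        name TEXT NOT NULL
    );
    INSERT INTO papers (id, arxiv_id) VALUES (1, '2401.12345');
    INSERT INTO paper_authors (paper_id, name) VALUES (1, 'Ada'), (1, 'Grace');
    """
)
# Deleting the paper should remove its author rows through the cascade.
conn.execute("DELETE FROM papers WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM paper_authors").fetchone()[0]
```

Routing every connection through one helper like this is one way to make sure no code path forgets the pragma.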
+
+---
+
+## 2. Database Tables
+
+### papers: main paper table
+
+```sql
+CREATE TABLE papers (
+    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
+    arxiv_id           TEXT UNIQUE NOT NULL,
+    title_en           TEXT NOT NULL,
+    title_zh           TEXT,
+    abstract           TEXT,
+    published_at       DATE,
+    paper_date         DATE NOT NULL,
+    crawled_at         DATETIME NOT NULL,
+    upvotes            INTEGER DEFAULT 0,
+    hf_url             TEXT,
+    arxiv_url          TEXT,
+    pdf_url            TEXT,
+    source_url         TEXT,
+    asset_status       TEXT DEFAULT 'not_downloaded', -- not_downloaded / ready / failed / cleaned
+    asset_error        TEXT,
+    meta_path          TEXT,
+    summary_path       TEXT,
+    raw_output_path    TEXT,
+    summary_quality    TEXT       -- normal / degraded / low
+);
+```
+
+Manual deletion is a hard delete. The deletion audit trail is written to `data_delete_jobs` and `crawl_logs`.
+
+### paper_authors: authors table
+
+```sql
+CREATE TABLE paper_authors (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id    INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    name        TEXT NOT NULL,
+    position    INTEGER DEFAULT 0,
+    UNIQUE(paper_id, name)
+);
+```
+
+### paper_tags: tags table
+
+```sql
+CREATE TABLE paper_tags (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id    INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    tag         TEXT NOT NULL,
+    source      TEXT DEFAULT 'hf', -- hf / ai / user
+    UNIQUE(paper_id, tag, source)
+);
+```
+
+### paper_summaries: structured summary table
+
+```sql
+CREATE TABLE paper_summaries (
+    paper_id                 INTEGER PRIMARY KEY REFERENCES papers(id) ON DELETE CASCADE,
+    one_line                 TEXT,
+    difficulty               TEXT,
+    prerequisites_json       TEXT,
+    motivation_problem       TEXT,
+    motivation_goal          TEXT,
+    motivation_gap           TEXT,
+    method_overview          TEXT,
+    method_key_idea          TEXT,
+    method_steps_json        TEXT,
+    method_novelty           TEXT,
+    results_main_json        TEXT,
+    results_benchmarks_json  TEXT,
+    limitations_json         TEXT,
+    weaknesses_json          TEXT,
+    future_work_json         TEXT,
+    reproducibility          TEXT,
+    full_json                TEXT NOT NULL,
+    updated_at               DATETIME NOT NULL
+);
+```
+
+The structured columns serve pages, comparison, search, and sorting; `full_json` keeps the complete original structure.
+
+### summary_status: summarization status
+
+```sql
+CREATE TABLE summary_status (
+    id                INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id          INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    status            TEXT NOT NULL, -- pending / processing / done / failed / permanent_failure
+    quality           TEXT,          -- normal / degraded / low
+    error_type        TEXT,          -- pdf_download_failed / timeout / process_error / json_not_found / json_invalid / field_missing / schema_error / unknown
+    error             TEXT,
+    retry_count       INTEGER DEFAULT 0,
+    raw_output_saved  BOOLEAN DEFAULT FALSE,
+    started_at        DATETIME,
+    completed_at      DATETIME,
+    UNIQUE(paper_id)
+);
+```
+
+### papers_fts: full-text search index
+
+```sql
+CREATE VIRTUAL TABLE papers_fts USING fts5(
+    title_en,
+    title_zh,
+    abstract,
+    authors,
+    tags,
+    summary_text,
+    tokenize='unicode61'
+);
+```
+
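+A regular FTS5 table is used, maintained explicitly by the application layer. A regular FTS5 table stores its own copy of the indexed text; at this data volume that is acceptable, and in exchange updates and deletes have simple, reliable semantics:
+
+- New paper: insert the title, abstract, authors, and tags.
+- Summary completed: update the Chinese title and `summary_text`.
+- Bookmark/note changes: not indexed in FTS, so personal notes do not pollute paper search.
+- Paper deleted: delete the corresponding FTS row in the same operation.
+
+Writes must use `papers.id` as the FTS rowid: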
+
+```sql
+INSERT INTO papers_fts(rowid, title_en, title_zh, abstract, authors, tags, summary_text)
+VALUES (:paper_id, :title_en, :title_zh, :abstract, :authors, :tags, :summary_text);
+```
+
+An update can use a plain `UPDATE`, or delete by rowid and then re-insert. When deleting a paper, run:
+
+```sql
+DELETE FROM papers_fts WHERE rowid = :paper_id;
+```
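
The rowid convention can be exercised end to end with the standard `sqlite3` module, assuming the local SQLite build includes FTS5. `index_paper` and `search` are hypothetical helper names; the sketch uses the delete-then-insert variant of an update and orders results by FTS5's built-in `rank`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE papers_fts USING fts5("
    "title_en, title_zh, abstract, authors, tags, summary_text, tokenize='unicode61')"
)


def index_paper(paper_id: int, doc: dict) -> None:
    """Replace the FTS row whose rowid equals papers.id."""
    conn.execute("DELETE FROM papers_fts WHERE rowid = ?", (paper_id,))
    conn.execute(
        "INSERT INTO papers_fts(rowid, title_en, title_zh, abstract, authors, tags, summary_text) "
        "VALUES (:rowid, :title_en, :title_zh, :abstract, :authors, :tags, :summary_text)",
        {"rowid": paper_id, **doc},
    )


def search(query: str) -> list[int]:
    """Return matching paper ids, best match first."""
    rows = conn.execute(
        "SELECT rowid FROM papers_fts WHERE papers_fts MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


index_paper(7, {"title_en": "Retrieval augmented generation", "title_zh": "", "abstract": "",
                "authors": "Ada Lovelace", "tags": "RAG", "summary_text": ""})
hits_before = search("retrieval")
conn.execute("DELETE FROM papers_fts WHERE rowid = ?", (7,))
hits_after = search("retrieval")
```

Because the rowid is the paper id, search hits can be joined back to `papers` without a separate mapping column.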
+
+### crawl_logs: task log
+
+```sql
+CREATE TABLE crawl_logs (
+    id              INTEGER PRIMARY KEY AUTOINCREMENT,
+    task            TEXT NOT NULL, -- crawl / summarize / cleanup / delete / scheduler
+    status          TEXT NOT NULL, -- running / success / failed
+    date            DATE,
+    papers_found    INTEGER,
+    papers_new      INTEGER,
+    error           TEXT,
+    started_at      DATETIME NOT NULL,
+    completed_at    DATETIME
+);
+```
+
+### task_locks: task locks
+
+```sql
+CREATE TABLE task_locks (
+    id             INTEGER PRIMARY KEY AUTOINCREMENT,
+    task           TEXT NOT NULL,
+    lock_key       TEXT NOT NULL, -- usually a date, e.g. 2026-06-05
+    status         TEXT NOT NULL, -- running / finished / failed
+    owner          TEXT,
+    acquired_at    DATETIME NOT NULL,
+    released_at    DATETIME
+);
+
+CREATE UNIQUE INDEX uq_task_locks_running
+ON task_locks(task, lock_key)
+WHERE status = 'running';
+```
+
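+Re-entrancy rule: before starting a task, insert a lock row with `status='running'`. If the insert fails, the same task is already running, so skip it or return 409. When the task finishes, update the row to `finished` or `failed`.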
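
A sketch of the locking rule on top of the partial unique index, using an in-memory database. `acquire_lock` and `release_lock` are hypothetical helpers; mapping a `None` result to "skip" or HTTP 409 is left to the caller.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_locks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        task TEXT NOT NULL,
        lock_key TEXT NOT NULL,
        status TEXT NOT NULL,
        owner TEXT,
        acquired_at DATETIME NOT NULL,
        released_at DATETIME
    );
    CREATE UNIQUE INDEX uq_task_locks_running
    ON task_locks(task, lock_key) WHERE status = 'running';
    """
)


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def acquire_lock(task: str, lock_key: str, owner: str) -> Optional[int]:
    """Return the new lock id, or None when the same task is already running."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO task_locks (task, lock_key, status, owner, acquired_at) "
                "VALUES (?, ?, 'running', ?, ?)",
                (task, lock_key, owner, _now()),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def release_lock(lock_id: int, ok: bool) -> None:
    """Mark the lock finished or failed, which frees the slot in the partial index."""
    with conn:
        conn.execute(
            "UPDATE task_locks SET status = ?, released_at = ? WHERE id = ?",
            ("finished" if ok else "failed", _now(), lock_id),
        )
```

Finished and failed rows stay in the table as history, since the uniqueness constraint only covers rows whose status is `running`.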
+
+### user_bookmarks: bookmarks
+
+```sql
+CREATE TABLE user_bookmarks (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id    INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    note        TEXT,
+    created_at  DATETIME NOT NULL,
+    UNIQUE(paper_id)
+);
+```
+
+### user_reading_status: reading status
+
+```sql
+CREATE TABLE user_reading_status (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id    INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    status      TEXT NOT NULL, -- unread / skimmed / read_summary / read_full
+    updated_at  DATETIME NOT NULL,
+    UNIQUE(paper_id)
+);
+```
+
+### user_notes: personal notes
+
+```sql
+CREATE TABLE user_notes (
+    id          INTEGER PRIMARY KEY AUTOINCREMENT,
+    paper_id    INTEGER NOT NULL REFERENCES papers(id) ON DELETE CASCADE,
+    content     TEXT NOT NULL,
+    created_at  DATETIME NOT NULL,
+    updated_at  DATETIME NOT NULL,
+    UNIQUE(paper_id)
+);
+```
+
+### data_delete_jobs: manual deletion records
+
+```sql
+CREATE TABLE data_delete_jobs (
+    id              INTEGER PRIMARY KEY AUTOINCREMENT,
+    date_start      DATE NOT NULL,
+    date_end        DATE NOT NULL,
+    include_notes   BOOLEAN DEFAULT TRUE,
+    paper_count     INTEGER DEFAULT 0,
+    status          TEXT NOT NULL, -- running / success / failed
+    error           TEXT,
+    started_at      DATETIME NOT NULL,
+    completed_at    DATETIME
+);
+```
+
+---
+
+## 3. summary.json Schema
+
+```python
+from pydantic import BaseModel, Field, field_validator
+
+
+class Prerequisites(BaseModel):
+    concepts: list[str] = Field(default_factory=list)
+    level: str = ""
+
+
+class Motivation(BaseModel):
+    problem: str
+    goal: str = ""
+    gap: str = ""
+
+
+class Method(BaseModel):
+    overview: str = ""
+    key_idea: str
+    steps: list[str] = Field(default_factory=list)
+    novelty: str = ""
+
+
+class Results(BaseModel):
+    main_findings: list[str] = Field(default_factory=list)
+    benchmarks: list[dict] = Field(default_factory=list)
+    limitations: list[str] = Field(default_factory=list)
+
+
+class Improvements(BaseModel):
+    weaknesses: list[str] = Field(default_factory=list)
+    future_work: list[str] = Field(default_factory=list)
+    reproducibility: str = ""
+
+
+class SummarySchema(BaseModel):
+    title_zh: str
+    one_line: str
+    tags: list[str]
+    difficulty: str = ""
+    paper_date: str | None = None
+    prerequisites: Prerequisites = Field(default_factory=Prerequisites)
+    motivation: Motivation
+    method: Method
+    results: Results = Field(default_factory=Results)
+    improvements: Improvements = Field(default_factory=Improvements)
+
+    @field_validator("title_zh", "one_line")
+    @classmethod
+    def non_empty_text(cls, value: str) -> str:
+        if not value or not value.strip():
+            raise ValueError("field cannot be empty")
+        return value.strip()
+
+    @field_validator("tags")
+    @classmethod
+    def non_empty_tags(cls, value: list[str]) -> list[str]:
+        tags = [tag.strip() for tag in value if tag and tag.strip()]
+        if not tags:
+            raise ValueError("tags cannot be empty")
+        return tags
+```
+
+The actual implementation must also apply the same non-empty validation to `Motivation.problem` and `Method.key_idea`; an empty string counts as `field_missing`.
+
+### Field Tiers
+
+| Tier | Fields | Handling |
+|------|------|------|
+| Required | `title_zh`, `one_line`, `tags`, `motivation.problem`, `method.key_idea` | Missing: fail and retry |
+| Important | `motivation.goal`, `motivation.gap`, `method.overview`, `results.main_findings` | Missing: stored anyway, marked `degraded` |
+| Optional | `benchmarks`, `limitations`, `improvements`, `prerequisites` | Missing: defaults are used |
+
+---
+
+## 4. Validation and Error Handling
+
+### State Transitions
+
+```text
+pending -> processing -> done
+                    ├-> failed -> pending retry -> processing
+                    └-> permanent_failure
+```
+
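+### Error Classification
+
+| error_type | Scenario | Auto retry |
+|------------|------|----------|
+| timeout | pi timed out | Yes |
+| pdf_download_failed | PDF download failed or file unreadable | Yes |
+| process_error | pi process exited with a non-zero code | Yes |
+| json_not_found | No JSON found in the output | Yes |
+| json_invalid | JSON parsing failed | Yes |
+| field_missing | Required field missing | Yes |
+| schema_error | Invalid field type | Yes |
+| unknown | Uncategorized exception | Yes |
+
+The maximum number of automatic retries is 1. If the retry still fails, the paper is marked `permanent_failure` and can be re-run manually from the admin interface.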
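
The retry cap can be expressed as a small pure function. `status_after_failure` is a hypothetical name, and treating `retry_count` as "automatic retries already used" is an assumption about how the `summary_status.retry_count` column is meant to be read.

```python
MAX_AUTO_RETRIES = 1  # mirrors SUMMARY_MAX_RETRIES=1 in the sample .env


def status_after_failure(retry_count: int, max_retries: int = MAX_AUTO_RETRIES) -> str:
    """Pick the next summary_status.status after a failed attempt.

    retry_count is the number of automatic retries already used.
    """
    if retry_count < max_retries:
        return "failed"  # still eligible for one more automatic attempt
    return "permanent_failure"  # only a manual re-run can revive it
```

With the default cap, the first failure yields `failed` and the failure of the single retry yields `permanent_failure`.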
+
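+### Quality Levels
+
+| quality | Condition | Page behavior |
+|---------|------|----------|
+| normal | Required and important fields complete | Full display |
+| degraded | Required fields complete, some important fields missing | Missing modules show "incomplete" |
+| low | Fields present but content clearly hollow | Top notice "AI 总结质量较低" (AI summary quality is low) |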
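
The field tiers and the `normal`/`degraded` split could be checked roughly as follows, operating on the parsed summary as a plain dict. `classify` and its helpers are hypothetical. The `low` level is deliberately left out because the docs do not define what counts as hollow content.

```python
REQUIRED = [("title_zh",), ("one_line",), ("tags",), ("motivation", "problem"), ("method", "key_idea")]
IMPORTANT = [("motivation", "goal"), ("motivation", "gap"), ("method", "overview"), ("results", "main_findings")]


def _get(summary: dict, path: tuple):
    """Walk a nested dict, returning None when any level is missing."""
    value = summary
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def _is_blank(value) -> bool:
    if isinstance(value, str):
        return not value.strip()
    return not value  # None, empty list, empty dict


def classify(summary: dict) -> str:
    """Return 'normal' or 'degraded'; raise ValueError for missing required fields."""
    missing = [".".join(path) for path in REQUIRED if _is_blank(_get(summary, path))]
    if missing:
        raise ValueError(f"field_missing: {', '.join(missing)}")
    if any(_is_blank(_get(summary, path)) for path in IMPORTANT):
        return "degraded"
    return "normal"
```

Whitespace-only strings are treated as blank, matching the rule that an empty string counts as `field_missing`.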
+
+---
+
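+## 5. Deletion and Cleanup Strategy
+
+### Temporary File Cleanup
+
+After each paper is processed, delete:
+
+- `data/tmp/{arxiv_id}/paper.pdf`
+- `data/tmp/{arxiv_id}/source/`
+- Any other intermediate download files
+
+When summarization fails, the downloaded files are cleaned up as well, but `raw_output.txt` and the error log are kept.
+
+### Manual Deletion by Date Range
+
+An administrator can delete data whose `paper_date` falls within a given range. The deletion flow:
+
+1. Query the target papers.
+2. Delete user bookmarks, reading status, and notes.
+3. Delete summary/status/authors/tags.
+4. Delete the FTS5 index rows.
+5. Delete `data/papers/{arxiv_id}/` and `data/tmp/{arxiv_id}/`.
+6. Hard-delete the `papers` rows.
+7. Write to `data_delete_jobs` and `crawl_logs`.
+
+If recoverable deletion is needed later, a `deleted_at` soft-delete column can be introduced; the MVP does not implement it.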
+
+---
+
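+## 6. ChromaDB Enhancement Design
+
+ChromaDB is not part of the MVP. When it is integrated, index only the high-signal fields from `paper_summaries`:
+
+- Chinese title
+- English title
+- Tags
+- One-line summary
+- `motivation_problem`
+- `method_key_idea`
+
+The vector dimension must match `EMBED_MODEL`. Validate the embedding length before writing; on a mismatch, skip semantic indexing and log it, without affecting regular pages or FTS search.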
@@ -0,0 +1,269 @@
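+# Service Modules in Detail
+
+> This document describes each service module's responsibilities, inputs and outputs, failure handling, and implementation constraints.
+
+---
+
+## 1. Crawler Service
+
+**Responsibility**: fetch the paper list from HuggingFace Daily Papers and write the metadata. PDFs are not kept long-term at the crawl stage.
+
+### Data Sources
+
+- Daily Papers API: `GET https://huggingface.co/api/daily_papers?date=YYYY-MM-DD`
+- PDF: `https://arxiv.org/pdf/{arxiv_id}.pdf` (downloaded on demand at the summarization stage)
+- Source (later enhancement): `https://arxiv.org/e-print/{arxiv_id}`
+
+The official HuggingFace Hub API documentation states that `/api/daily_papers` supports the `date` query parameter.
+
+### Rules
+
+- `arxiv_id` is the unique key.
+- When the same day is crawled again, existing papers only have mutable metadata such as upvotes and tags updated; nothing is inserted twice.
+- Network requests must set a timeout, a User-Agent, and a retry count.
+- An empty list from the API is logged as a success, not treated as a failure.
+- The crawl stage does not download PDFs; if the PDF download fails at the summarization stage, set `asset_status=failed` and `summary_status.error_type=pdf_download_failed`.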
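
The retry requirement in the rules above could be met with a small wrapper around whatever performs the HTTP call. This is a generic sketch: `with_retries`, its exponential backoff, and the injectable `sleep` are illustrative assumptions, and the actual crawler may rely on its HTTP client's own retry support instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_retries: int = 3,  # matches HTTP_MAX_RETRIES=3 in the sample .env
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: let the caller log the failure
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

A real implementation would likely catch only network and timeout errors rather than every `Exception`, so that programming errors surface immediately.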
+
+### Interface
+
+```python
+async def fetch_daily(date: str, top_n: int) -> list[PaperMeta]: ...
+async def upsert_papers(papers: list[PaperMeta]) -> list[Paper]: ...
+```
+
+---
+
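+## 2. AI Summarization Service
+
+**Responsibility**: call the pi CLI to turn a single paper into a structured Chinese summary.
+
+### Invocation Principles
+
+- One pi call per paper.
+- Concurrency is controlled by `SUMMARY_CONCURRENCY`, default 3.
+- The per-paper timeout is controlled by `SUMMARY_TIMEOUT_SECONDS`, default 300 seconds.
+- The pi path is configured through `PI_BIN`. The host path can be used for now; the deployment method can be abstracted once the pipeline works end to end.
+- The PDF is downloaded on demand to `data/tmp/{arxiv_id}/paper.pdf` before summarization starts and cleaned up after success or failure.
+
+### Invocation Example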
+
+```bash
+pi -p --skill daily-paper-summary \
+  "请深度解读以下论文，并按指定 JSON schema 输出：
+   @data/papers/2401.12345/meta.json
+   @data/tmp/2401.12345/paper.pdf"
+```
+
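+### Flow
+
+```text
+Pick a pending paper
+  -> Download the PDF to data/tmp/{arxiv_id}/paper.pdf
+  -> status=processing
+  -> Call pi
+  -> Extract JSON
+  -> Pydantic validation
+  -> Write summary.json
+  -> Write paper_summaries / paper_tags / papers_fts
+  -> status=done
+  -> Clean up PDF/source temporary files
+```
+
+On failure, save the raw output, update `summary_status`, and clean up the downloaded files.
+
+If the PDF download fails, pi is not called; `pdf_download_failed` is recorded directly and the paper enters the retry flow.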
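
The JSON extraction step in this flow might look like the sketch below, which scans the raw CLI output for the first decodable JSON object. `SummaryError` and `extract_json` are hypothetical names, and the way failures are split between `json_not_found` and `json_invalid` is an assumption about how those two error types are meant to differ.

```python
import json


class SummaryError(Exception):
    """Carries one of the error_type values used in summary_status."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(message)
        self.error_type = error_type


def extract_json(raw_output: str) -> dict:
    """Return the first JSON object embedded in raw_output."""
    decoder = json.JSONDecoder()
    saw_brace = False
    for index, char in enumerate(raw_output):
        if char != "{":
            continue
        saw_brace = True
        try:
            value, _end = decoder.raw_decode(raw_output, index)
        except json.JSONDecodeError:
            continue  # not a valid object at this brace; try the next one
        if isinstance(value, dict):
            return value
    if saw_brace:
        raise SummaryError("json_invalid", "found '{' but no parsable JSON object")
    raise SummaryError("json_not_found", "no JSON object in output")
```

Using `raw_decode` lets the parser stop at the end of the object, so trailing commentary after the JSON does not cause a failure.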
+
+---
+
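+## 3. Search Service
+
+**Responsibility**: the MVP provides FTS5 keyword search; ChromaDB semantic search is added later.
+
+### FTS5 Search
+
+Indexed fields:
+
+- English title
+- Chinese title
+- English abstract
+- Authors
+- Tags
+- Chinese summary body
+
+The application layer is responsible for keeping FTS in sync: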
+
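+```python
+import json
+
+
+def build_fts_document(paper: Paper, summary: PaperSummary | None) -> FtsDocument:
+    summary_text = ""
+    if summary:
+        # results_main_json is assumed to hold a JSON-encoded list of strings
+        # (the results_main_json column of paper_summaries).
+        main_findings = json.loads(summary.results_main_json or "[]")
+        summary_text = " ".join([
+            summary.one_line or "",
+            summary.motivation_problem or "",
+            summary.motivation_goal or "",
+            summary.method_overview or "",
+            summary.method_key_idea or "",
+            " ".join(main_findings),
+        ])
+    # The remaining FTS columns (titles, abstract, authors, tags) are elided here.
+    return FtsDocument(...)
+```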
+
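+### ChromaDB Semantic Search (Later)
+
+Requirements when it is integrated:
+
+- Initialize only when `CHROMA_ENABLED=true`.
+- An embedding API failure must not affect storing the summary.
+- When the embedding dimension does not match the configuration, log it and skip.
+- Re-check the query and filter syntax against the current official ChromaDB API before implementing.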
+
+---
+
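+## 4. Page Rendering Service
+
+**Responsibility**: read data from SQLite and render Jinja2 templates.
+
+kami serves only as a style reference:
+
+- Borrow its paper texture, whitespace, type hierarchy, and ink-blue accent color.
+- Do not call kami or depend on kami to generate pages.
+- CSS lives in `app/static/css/style.css` and is maintained against this project's actual page structure.
+
+Pages must support the degraded states:
+
+- No summary: show the English metadata and "AI 总结尚未生成" (AI summary not yet generated).
+- Summary failed: show the error type and a manual re-run entry point.
+- degraded/low: show a notice but still display the existing content.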
+
+---
+
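+## 5. User Data Service
+
+**Responsibility**: local personalization data, with no account system.
+
+Features:
+
+- Bookmark / remove bookmark.
+- Reading status: `unread`, `skimmed`, `read_summary`, `read_full`.
+- Personal Markdown notes.
+- Reading list: filter by bookmark, status, tag, and date.
+
+All user data is deleted together with its paper.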
+
+---
+
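+## 6. Cleanup and Deletion Service
+
+**Responsibility**: clean up temporary files and let an administrator manually delete data within a date range.
+
+### Temporary File Cleanup
+
+Triggers:
+
+- After a single paper is summarized successfully.
+- After a single paper's summarization fails.
+- A fallback scan of `data/tmp/` after the daily task ends.
+
+### Manual Deletion
+
+Interface: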
+
+```python
+async def delete_papers_by_date_range(
+    date_start: date,
+    date_end: date,
+    include_notes: bool = True,
+) -> DeleteResult: ...
+```
+
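+Requirements:
+
+- Count the target papers before deleting.
+- Delete the DB rows, the FTS index rows, and the local files.
+- On a deletion failure, record the specific arXiv ID and the error.
+- The date range must be bounded to avoid deleting all data by mistake; the admin endpoint requires a second confirmation parameter.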
+
+---
+
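+## 7. Scheduling service
+
+**Responsibility**: run the daily crawl and summarization automatically.
+
+### Constraints
+
+- The application runs with a single worker.
+- `APP_WORKERS` must be 1; otherwise set `SCHEDULER_ENABLED=false`.
+- Check for running tasks at startup to avoid duplicate execution.
+- For the same task on the same date, use a database lock or the logged status to prevent re-entry.
+- A `task_locks` table is recommended; when the lock cannot be acquired, the automatic job skips the run and the admin endpoint returns 409.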
+
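One way to realize the recommended `task_locks` table is to let a primary-key constraint decide who wins the lock. The sketch below uses the standard-library `sqlite3` module; the column names and helper functions are assumptions for illustration, and the real schema belongs in the data-model document.

```python
import sqlite3


def init_task_locks(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (task, date) that is currently running.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS task_locks (
            task_name   TEXT NOT NULL,
            task_date   TEXT NOT NULL,
            acquired_at TEXT NOT NULL DEFAULT (datetime('now')),
            PRIMARY KEY (task_name, task_date)
        )
        """
    )


def try_acquire_lock(conn: sqlite3.Connection, task_name: str, task_date: str) -> bool:
    """Insert the lock row; a primary-key conflict means someone else holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task_locks (task_name, task_date) VALUES (?, ?)",
                (task_name, task_date),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def release_lock(conn: sqlite3.Connection, task_name: str, task_date: str) -> None:
    with conn:
        conn.execute(
            "DELETE FROM task_locks WHERE task_name = ? AND task_date = ?",
            (task_name, task_date),
        )
```

With this shape, the scheduled job would simply return when `try_acquire_lock` yields `False`, while an admin endpoint would translate the same result into a 409 response. Stale locks left behind by a crash still need the startup check described above.
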
+### Daily flow
+
+```text
+08:00
+  -> compute today in APP_TIMEZONE
+  -> crawl(date=today)
+  -> summarize pending papers
+  -> cleanup tmp files
+  -> write logs
+```
+
+Manual triggers:
+
+- CLI: `python -m app.cli crawl --date YYYY-MM-DD`
+- API: `POST /admin/crawl`
+
+---
+
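+## 8. Admin and security service
+
+**Responsibility**: protect every admin operation that has side effects.
+
+### Authentication
+
+Admin endpoints must require `ADMIN_TOKEN`: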
+
+```text
+Authorization: Bearer <ADMIN_TOKEN>
+```
+
+Protected endpoints:
+
+- `POST /admin/crawl`
+- `POST /admin/summarize/{arxiv_id}`
+- `POST /admin/summarize`
+- `POST /admin/cleanup`
+- `POST /admin/delete`
+- `GET /admin/logs`
+
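+If `ADMIN_TOKEN` is empty or still the default `change-me`, the application should warn at startup; if `APP_HOST` is additionally not `127.0.0.1`, it should refuse to start or require explicit confirmation.
+
+User data endpoints are intended for local use by default. If `APP_HOST=127.0.0.1`, the favorite, reading status, and note endpoints do not require a token. If the application is bound to a non-local address, it should enable a same-origin check or require `ADMIN_TOKEN`, so that other people on the internal network cannot modify local notes.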
+
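The token check and the startup rule can be sketched without any web framework. This is a minimal illustration under stated assumptions: the function names are hypothetical, an empty `ADMIN_TOKEN` is treated as "never authorized" (an assumption that goes beyond the text), and the startup rule reads the paragraph above as "weak token plus non-local host means refuse".

```python
import hmac
from typing import Optional

WEAK_TOKENS = {"", "change-me"}


def is_authorized(authorization_header: Optional[str], admin_token: str) -> bool:
    """Check an `Authorization: Bearer <token>` header against ADMIN_TOKEN."""
    if not admin_token or not authorization_header:
        return False  # assumption: an empty configured token never authorizes
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.strip().encode(), admin_token.encode())


def startup_check(admin_token: str, app_host: str) -> str:
    """Return 'ok', 'warn', or 'refuse' for the configured token and host."""
    weak = admin_token in WEAK_TOKENS
    if weak and app_host != "127.0.0.1":
        return "refuse"
    if weak:
        return "warn"
    return "ok"
```

In a FastAPI application, `is_authorized` would typically sit behind a dependency attached to the `/admin/*` routes, returning 401 when it yields `False`.
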
+---
+
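+## 9. RSS service
+
+**Responsibility**: publish an RSS feed of recent papers.
+
+The MVP implements only `/rss.xml`:
+
+- Defaults to the last 7 days.
+- Supports `?tag=RAG`.
+- Uses the Chinese title when one exists, otherwise the English title.
+- Detail links point to this site's `/paper/{arxiv_id}`.
+
+Atom and JSON Feed are left as later enhancements.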
+
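The title fallback and link rule for `/rss.xml` can be illustrated with the standard-library XML builder. The sketch below is an assumption, not the project's code: `build_rss_item` and the parameter names `title_en` / `title_zh` are hypothetical, and only the `title`, `link`, and `guid` elements are shown.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_rss_item(
    base_url: str,
    arxiv_id: str,
    title_en: str,
    title_zh: Optional[str],
) -> ET.Element:
    """Build one RSS <item> for a paper."""
    item = ET.Element("item")
    # Prefer the Chinese title; fall back to English when it is missing or empty.
    ET.SubElement(item, "title").text = title_zh or title_en
    # Detail links point at this site's own paper page.
    link = f"{base_url.rstrip('/')}/paper/{arxiv_id}"
    ET.SubElement(item, "link").text = link
    guid = ET.SubElement(item, "guid", isPermaLink="true")
    guid.text = link
    return item
```

Building elements through `ElementTree` instead of string templates means titles containing `&` or `<` are escaped automatically when the feed is serialized with `ET.tostring`.
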
+---
+
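+## 10. Later enhancement services
+
+These capabilities are out of scope for the MVP for now:
+
+- LaTeX figure extraction.
+- ChromaDB semantic search.
+- Similar-paper recommendations.
+- Trend dashboard.
+- Paper comparison page.
+
+Before implementing them, re-evaluate data volume, API cost, page complexity, and acceptance criteria.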
@@ -0,0 +1,30 @@
+[project]
+name = "hf-daily-papers"
+version = "0.1.0"
+description = "HuggingFace Daily Papers: a Chinese-language paper guide site"
+requires-python = ">=3.12"
+dependencies = [
+    "fastapi>=0.115",
+    "uvicorn[standard]>=0.34",
+    "sqlalchemy>=2.0",
+    "httpx>=0.28",
+    "jinja2>=3.1",
+    "python-multipart>=0.0.18",
+    "pydantic>=2.0",
+    "pydantic-settings>=2.0",
+    "typer>=0.15",
+    "python-dotenv>=1.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0",
+    "pytest-asyncio>=0.24",
+]
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["app"]
@@ -0,0 +1,5 @@
+"""Shortcut script: initialize the database."""
+
+if __name__ == "__main__":
+    from app.cli import cli_app
+    cli_app(["init-db"])
@@ -0,0 +1,6 @@
+"""Shortcut script: manually crawl a given date. Usage: python scripts/manual_crawl.py [YYYY-MM-DD] [--top N]"""
+
+if __name__ == "__main__":
+    import sys
+    from app.cli import cli_app
+    cli_app(["crawl"] + sys.argv[1:])
@@ -0,0 +1 @@
+/* app.js: basic front-end interactions (HTMX enhancements to follow) */