feat: refactor summarizer and PDF extraction pipeline
- Split summarizer into summary_generator and summary_persister modules
- Refactor pdf_image_extractor to two-phase pipeline with PicoDet layout detection
- Add layout_detector service for PicoDet-S_layout_3cls integration
- Add exceptions module with ConflictError and NotFoundError
- Improve admin dashboard with better statistics and task management
- Add design review document with system optimization suggestions
- Add new tests for crawler, pdf_downloader, pipeline, and summary_utils
- Update dependencies and configuration
- Clean up dead code and improve error handling
@@ -46,3 +46,8 @@ EMBED_API_BASE=https://api.siliconflow.cn/v1/embeddings
EMBED_API_KEY=your_api_key_here
EMBED_MODEL=Qwen/Qwen3-Embedding-4B
EMBED_DIMENSIONS=2560

# ─── Layout detection ─────────────────────────────
# ONNX model path (run scripts/export_picodet_onnx.py to export it before the first run)
# LAYOUT_MODEL_PATH=data/models/picodet_layout_3cls.onnx
# LAYOUT_THRESHOLD=0.5
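For illustration only, here is a minimal sketch of how the two layout settings above could be read at startup. The `LayoutConfig` class and the `load_layout_config` helper are hypothetical names, not part of this commit. The sketch assumes the commented-out values in the example file are the defaults and that the threshold is a confidence score in [0, 1]; the actual layout_detector service may handle these differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the commented-out values in the example env file are the defaults.
DEFAULT_LAYOUT_MODEL_PATH = "data/models/picodet_layout_3cls.onnx"
DEFAULT_LAYOUT_THRESHOLD = "0.5"


@dataclass(frozen=True)
class LayoutConfig:
    """Hypothetical container for the layout detection settings."""
    model_path: str
    threshold: float


def load_layout_config(env: Optional[Mapping[str, str]] = None) -> LayoutConfig:
    """Read LAYOUT_MODEL_PATH and LAYOUT_THRESHOLD, falling back to defaults."""
    source = os.environ if env is None else env
    # Treat unset and empty values the same way: use the default.
    model_path = source.get("LAYOUT_MODEL_PATH") or DEFAULT_LAYOUT_MODEL_PATH
    threshold = float(source.get("LAYOUT_THRESHOLD") or DEFAULT_LAYOUT_THRESHOLD)
    # Assumption: the threshold is a detection confidence score in [0, 1].
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"LAYOUT_THRESHOLD must be in [0, 1], got {threshold}")
    return LayoutConfig(model_path=model_path, threshold=threshold)
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to test, for example `load_layout_config({"LAYOUT_THRESHOLD": "0.7"})`.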