feat: refactor summarizer and PDF extraction pipeline

- Split summarizer into summary_generator and summary_persister modules
- Refactor pdf_image_extractor to two-phase pipeline with PicoDet layout detection
- Add layout_detector service for PicoDet-S_layout_3cls integration
- Add exceptions module with ConflictError and NotFoundError
- Improve admin dashboard with better statistics and task management
- Add design review document with system optimization suggestions
- Add new tests for crawler, pdf_downloader, pipeline, and summary_utils
- Update dependencies and configuration
- Clean up dead code and improve error handling
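The exceptions bullet names only `ConflictError` and `NotFoundError`. The sketch below is one plausible shape for such a module. The base class `AppError`, the HTTP status mapping, and the `to_http_error` helper are hypothetical and do not come from this commit.

```python
"""Hypothetical sketch of an exceptions module defining ConflictError and
NotFoundError. AppError and to_http_error are assumptions for illustration."""

from http import HTTPStatus


class AppError(Exception):
    """Assumed common base class so one handler can catch every domain error."""

    status_code: int = HTTPStatus.INTERNAL_SERVER_ERROR

    def __init__(self, detail: str = "") -> None:
        super().__init__(detail)
        self.detail = detail


class NotFoundError(AppError):
    """Raised when a requested resource does not exist."""

    status_code = HTTPStatus.NOT_FOUND


class ConflictError(AppError):
    """Raised when a request conflicts with the resource's current state."""

    status_code = HTTPStatus.CONFLICT


def to_http_error(exc: AppError) -> tuple[int, str]:
    """Map a domain error to an (HTTP status, message) pair.

    Falls back to the standard reason phrase when no detail was given.
    """
    return int(exc.status_code), exc.detail or HTTPStatus(exc.status_code).phrase
```

Keeping HTTP status codes on the exception classes lets service code raise domain errors without importing web-framework types. In a FastAPI app, a helper like this could back a single handler registered with `app.exception_handler(AppError)`.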
2026-06-13 13:16:47 +08:00
parent e2f0e1a8be
commit 21f16e6756
43 changed files with 3304 additions and 1494 deletions
+5 -3
@@ -12,6 +12,8 @@ from app.utils import templates
 router = APIRouter()
+MAX_COMPARE_PAPERS = 5
 @router.get("/compare")
 def compare_page(
@@ -33,9 +35,9 @@ def compare_page(
     arxiv_ids = [i.strip() for i in ids.split(",") if i.strip()]
-    # At most 5
-    if len(arxiv_ids) > 5:
-        arxiv_ids = arxiv_ids[:5]
+    # At most MAX_COMPARE_PAPERS
+    if len(arxiv_ids) > MAX_COMPARE_PAPERS:
+        arxiv_ids = arxiv_ids[:MAX_COMPARE_PAPERS]
     if not arxiv_ids:
         return templates.TemplateResponse(