feat: refactor summarizer and PDF extraction pipeline
- Split summarizer into summary_generator and summary_persister modules
- Refactor pdf_image_extractor to two-phase pipeline with PicoDet layout detection
- Add layout_detector service for PicoDet-S_layout_3cls integration
- Add exceptions module with ConflictError and NotFoundError
- Improve admin dashboard with better statistics and task management
- Add design review document with system optimization suggestions
- Add new tests for crawler, pdf_downloader, pipeline, and summary_utils
- Update dependencies and configuration
- Clean up dead code and improve error handling
+1
-1
@@ -19,7 +19,7 @@ dependencies = [
"pymupdf>=1.25",
"itsdangerous>=2.2.0",
"bleach>=6.4.0",
"pymupdf4llm>=1.27.2.3",
"onnxruntime>=1.17",
]

[project.optional-dependencies]