🚀 Auto Paper Digest (APD): Automated AI Paper Interpretation and Publishing System
Abstract
Auto Paper Digest (APD) is an automated pipeline for AI papers: it fetches trending papers, generates video explanations of them, and publishes the videos to platforms such as HuggingFace and Douyin so that research results reach a wider audience.
Feature Highlights
📚 Paper Acquisition
APD fetches each week's popular AI papers from Hugging Face and can target a specific week through its weekly URL. The system parses each paper's title, authors, abstract, and other key fields, which later stages use as input.
📄 PDF Download
When downloading paper PDF files from arXiv, the system uses idempotent operations and SHA256 verification mechanisms to ensure the integrity and accuracy of file downloads. Downloaded papers are automatically cached to avoid repeated downloads of the same files, improving processing efficiency.
🎬 Video Generation
Through the NotebookLM platform, APD can automatically generate video explanations of papers. The system converts paper content into a format suitable for video presentation, including voice explanations, subtitle display, etc., making complex academic content more accessible and easy to understand.
📤 Automatic Publishing
Generated videos can be automatically uploaded to HuggingFace Dataset, making it convenient for users to share and disseminate on the platform. The system automatically updates the metadata.json file to ensure the accuracy and completeness of video information.
📱 Douyin Publishing
APD also supports automatic publishing of videos to the Douyin Creator Platform. Users only need to complete one login, and the system will persistently save the login state, eliminating the need for repeated logins in subsequent publishing. During the publishing process, the system automatically fills in video titles, adds topic tags, and other information, simplifying the publishing process.
🌐 Portal Website
Through the Gradio portal website, users can play generated videos online. The portal website provides a user-friendly interface, supporting multiple ways to browse and search video content by week, day, etc., allowing users to quickly find the paper interpretations they are interested in.
💾 Resumable Processing
The system uses SQLite database for state tracking and supports resumable processing. Even if processing is interrupted, users can continue to complete unprocessed papers later, avoiding repeated work.
🔐 Login Reuse
Google and Douyin login states are persistently saved. Users only need to complete one login, and no repeated logins are required for subsequent use. This not only improves usage efficiency but also reduces the operational burden on users.
Architecture Design
APD's architecture is divided into three main phases: Upload, Download, and Publish. A SQLite database tracks the state of each paper so that the phases can hand work off to one another.
┌─────────────────────────────────────────────────────────────────────┐
│                          Auto Paper Digest                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Phase 1: Upload               Phase 2: Download  Phase 3: Publish  │
│  ┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────┐  │
│  │ HF      │───▶│ arXiv   │───▶│ NotebookLM  │───▶│ HuggingFace  │  │
│  │ Papers  │    │ PDFs    │    │ Videos      │    │ Dataset      │  │
│  └─────────┘    └─────────┘    └─────────────┘    └──────────────┘  │
│       │              │                │                  │          │
│       ▼              ▼                ▼                  ▼          │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                        SQLite Database                        │  │
│  │          (status: NEW → PDF_OK → NBLM_OK → VIDEO_OK)          │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                  │                                  │
│               ┌──────────────────┼────────────────┐                 │
│               ▼                  ▼                ▼                 │
│      ┌─────────────────┐  ┌─────────────┐  ┌─────────────┐          │
│      │ Portal Website  │  │ Douyin      │  │ Other       │          │
│      │ (HF Spaces)     │  │ Creator     │  │ Platforms   │          │
│      └─────────────────┘  └─────────────┘  └─────────────┘          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
Quick Start
1. Installation
# Clone the repository
git clone https://github.com/brianxiadong/auto-paper-digest.git
cd auto-paper-digest
# Install dependencies
pip install -e .
# Install browser
playwright install chromium
2. Configure Environment Variables
# Copy configuration template
cp .env.example .env
# Edit .env to fill in HuggingFace configuration
# HF_TOKEN=hf_xxx
# HF_USERNAME=your-username
# HF_DATASET_NAME=paper-digest-videos
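A minimal sketch of how these three settings could be checked before a publish run. The helper names are hypothetical and not part of APD, and combining the values into a `<username>/<dataset>` repository ID is an assumption based on how HuggingFace names repositories.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("HF_TOKEN", "HF_USERNAME", "HF_DATASET_NAME")


def missing_hf_settings(env=None):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def dataset_repo_id(env=None):
    """Build '<username>/<dataset>' (hypothetical helper, assumed format)."""
    env = os.environ if env is None else env
    missing = missing_hf_settings(env)
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return f"{env['HF_USERNAME']}/{env['HF_DATASET_NAME']}"
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to try without touching the real environment.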
3. First-time Google Login
apd login
The browser opens the NotebookLM login page. After you complete the Google login, the session is saved, so later runs do not require logging in again.
Three-Phase Workflow
Phase 1: Upload and Trigger Video Generation
apd upload --week 2026-01 --headful --max 10
This command will perform the following operations:
- Get this week's papers from HuggingFace (using the /week/YYYY-WXX URL)
- Download arXiv PDFs (supports caching, skipped if already downloaded)
- Upload to NotebookLM
- Trigger video generation (does not wait for completion)
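The first step maps a week ID such as 2026-01 onto a /week/YYYY-WXX URL. The sketch below shows one way that mapping might look. The base URL, the function names, and the use of ISO 8601 week numbering for the default week are all assumptions, not details confirmed by APD.

```python
from datetime import date
from typing import Optional

# Assumed base URL for the Hugging Face papers pages.
HF_PAPERS_BASE = "https://huggingface.co/papers"


def week_url(week_id: str) -> str:
    """Turn a week ID such as '2026-01' into a /week/YYYY-WXX URL."""
    year_text, week_text = week_id.split("-")
    year, week = int(year_text), int(week_text)
    if not 1 <= week <= 53:
        raise ValueError(f"week number out of range: {week_id!r}")
    return f"{HF_PAPERS_BASE}/week/{year:04d}-W{week:02d}"


def current_week_id(today: Optional[date] = None) -> str:
    """Default week ID, assuming ISO 8601 week numbering."""
    iso_year, iso_week, _ = (today or date.today()).isocalendar()
    return f"{iso_year:04d}-{iso_week:02d}"
```

Note that under ISO numbering the week's year can differ from the calendar year near January 1, which is why the helper formats the ISO year rather than `today.year`.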
Phase 2: Download Generated Videos
After waiting a few minutes (video generation takes time), run the following command:
apd download-video --week 2026-01 --headful
Videos that have already been downloaded are cached and skipped automatically. To download them again, pass the --force parameter.
Phase 3: Publish to HuggingFace
apd publish --week 2026-01
This command will perform the following operations:
- Upload videos to HuggingFace Dataset
- Update metadata.json
- Generate Markdown summary
Phase 3b: Publish to Douyin (Optional)
First-time use requires logging in to Douyin:
apd douyin-login
The browser will open the Douyin Creator Center login page. After scanning the QR code with the Douyin app to log in, the login state will be saved. No repeated logins are required for subsequent video publishing.
Then publish videos to Douyin:
apd publish-douyin --week 2026-01 --headful
This command will perform the following operations:
- Automatically upload videos to the Douyin Creator Platform
- Fill in video titles (paper titles)
- Add topic tags (AI, paper interpretation, etc.)
- Automatically click publish
For first-time use, add the --headful parameter so that you can watch the publishing process in the browser window. Once you have confirmed that it works correctly, you can remove the parameter.
Daily Processing (Optional)
In addition to weekly processing, APD also supports daily paper processing:
# Get papers for a specific date
apd fetch --date 2026-01-08 --max 10
# Upload and generate videos
apd upload --date 2026-01-08 --headful --max 10
# Download videos
apd download-video --date 2026-01-08 --headful
# Publish to Douyin
apd publish-douyin --date 2026-01-08 --headful
Note that new papers are usually not published on weekends and holidays. For such dates the system reports an error and stops instead of continuing to process.
Folder Structure
Daily and weekly data are stored separately for easy management and lookup by users:
- data/pdfs/weekly/2026-01/ – Weekly processed PDFs
- data/pdfs/daily/2026-01-08/ – Daily processed PDFs
- data/videos/weekly/2026-01/ – Weekly processed videos
- data/videos/daily/2026-01-08/ – Daily processed videos
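The layout above follows a regular pattern: data/{pdfs or videos}/{weekly or daily}/{period}. A small sketch of a path helper built on that pattern is shown here; the function name and its signature are hypothetical.

```python
from pathlib import Path

DATA_DIR = Path("data")


def period_dir(kind: str, period_id: str, *, daily: bool) -> Path:
    """Directory for one period.

    kind is 'pdfs' or 'videos'; period_id is a week ID (2026-01)
    or a date (2026-01-08), matching the layout listed above.
    """
    if kind not in ("pdfs", "videos"):
        raise ValueError(f"unknown kind: {kind!r}")
    return DATA_DIR / kind / ("daily" if daily else "weekly") / period_id
```

For example, `period_dir("videos", "2026-01-08", daily=True)` resolves to data/videos/daily/2026-01-08.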
🌐 Portal Website
After videos are published, they can be directly viewed on the HuggingFace Spaces portal website:
https://huggingface.co/spaces/your-username/paper-digest
The portal lets users browse and search videos by week or by day, so they can quickly find the paper interpretations they are interested in.
Command Reference
| Command | Description |
|---|---|
| apd login | Open browser to complete Google login (NotebookLM) |
| apd douyin-login | Open browser to complete Douyin login |
| apd fetch | Only get paper list (no download) |
| apd download | Only download PDFs (supports caching) |
| apd upload | Phase 1: Get + Download + Upload + Trigger generation |
| apd download-video | Phase 2: Download generated videos (supports caching) |
| apd publish | Phase 3: Publish to HuggingFace |
| apd publish-douyin | Phase 3b: Publish to Douyin Creator Platform |
| apd digest | Generate local weekly report |
| apd run | Complete workflow (one-click execution, needs to wait for video generation) |
| apd status | Check paper processing status |
Common Parameters
--week, -w Specify week ID (e.g., 2026-01), default current week
--max, -m Maximum number of papers
--headful Show browser window (for debugging)
--force, -f Force reprocessing (ignore caching)
--debug Enable debug logging
Directory Structure
auto-paper-digest/
├── apd/ # Main program package
│ ├── cli.py # Command line entry
│ ├── config.py # Configuration constants
│ ├── db.py # SQLite database
│ ├── hf_fetcher.py # HF paper fetching (supports weekly URLs)
│ ├── pdf_downloader.py # PDF downloader
│ ├── nblm_bot.py # NotebookLM automation
│ ├── douyin_bot.py # Douyin Creator Platform automation
│ ├── publisher.py # HuggingFace publishing
│ ├── digest.py # Weekly report generation
│ └── utils.py # Utility functions
├── portal/ # HuggingFace Spaces portal
│ ├── app.py # Gradio application
│ ├── requirements.txt
│ └── README.md
├── data/
│ ├── apd.db # SQLite database
│ ├── .douyin_auth.json # Douyin login state
│ ├── pdfs/ # Downloaded PDFs (organized by week)
│ ├── videos/ # Generated videos (organized by week)
│ ├── digests/ # Weekly report files
│ └── profiles/ # Browser configuration (including login state)
├── .env.example # Environment variable template
└── pyproject.toml
Caching Mechanism
PDF Caching
Downloaded PDFs are verified with SHA256 to ensure file integrity. Files that have already been downloaded are skipped, so the same PDF is never fetched twice.
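As an illustration of SHA256-based cache checking, the sketch below hashes a file in chunks and decides whether a download is still needed. How APD stores the expected hash is not described here, so the sketch simply takes it as an argument; both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(pdf_path: Path, recorded_sha256: Optional[str]) -> bool:
    """True unless the file exists and matches the recorded hash."""
    if recorded_sha256 is None or not pdf_path.is_file():
        return True
    return sha256_of(pdf_path) != recorded_sha256
```

Because the check depends only on the file and its recorded hash, running it repeatedly gives the same answer, which is what makes the download step idempotent.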
Video Caching
Cached videos are detected by filename prefix matching ({paper_id}_*.mp4), which supports the new naming format {paper_id}_{video_title}.mp4. To re-download a video, use the --force parameter.
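A sketch of prefix matching on the {paper_id}_*.mp4 pattern, written with the standard library. The function names are hypothetical, and the `force` argument only mirrors the documented --force parameter.

```python
import glob
from pathlib import Path
from typing import Optional


def find_cached_video(video_dir: Path, paper_id: str) -> Optional[Path]:
    """Return a cached video matching {paper_id}_*.mp4, or None."""
    # glob.escape guards against IDs that contain glob metacharacters.
    pattern = f"{glob.escape(paper_id)}_*.mp4"
    matches = sorted(video_dir.glob(pattern))
    return matches[0] if matches else None


def should_download(video_dir: Path, paper_id: str, force: bool = False) -> bool:
    """Download when forced or when no cached file exists."""
    return force or find_cached_video(video_dir, paper_id) is None
```

The underscore after the ID matters: without it, an ID that is a prefix of another ID would match the wrong file.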
Publishing Caching
Published papers are recorded in metadata.json. Papers already listed there are skipped, so the same content is never published twice.
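The sketch below shows how a metadata.json record could prevent duplicate publishing. The actual schema of metadata.json is not documented here, so the list-of-objects layout and the `paper_id` and `video` keys are assumptions made purely for illustration.

```python
import json
from pathlib import Path


def _load_entries(metadata_path: Path) -> list:
    """Read the entry list, treating a missing file as empty."""
    if not metadata_path.is_file():
        return []
    return json.loads(metadata_path.read_text(encoding="utf-8"))


def load_published_ids(metadata_path: Path) -> set:
    """IDs of papers already recorded as published (assumed schema)."""
    return {entry["paper_id"] for entry in _load_entries(metadata_path)}


def record_published(metadata_path: Path, paper_id: str, video_file: str) -> bool:
    """Append an entry unless the paper is already listed.

    Returns True if a new entry was written, False if it was skipped.
    """
    entries = _load_entries(metadata_path)
    if any(entry["paper_id"] == paper_id for entry in entries):
        return False
    entries.append({"paper_id": paper_id, "video": video_file})
    metadata_path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return True
```

Calling `record_published` twice with the same paper ID writes only one entry, so re-running the publish phase is safe.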
State Tracking
NEW → PDF_OK → NBLM_OK → VIDEO_OK
  │                          │
  └──────── ERROR ◄──────────┘
| Status | Meaning |
|---|---|
| NEW | Paper has been fetched, pending processing |
| PDF_OK | PDF has been downloaded |
| NBLM_OK | Uploaded to NotebookLM, video generation in progress |
| VIDEO_OK | Video has been downloaded |
| ERROR | Processing failed (will automatically retry) |
Check status:
apd status --week 2026-01
apd status --week 2026-01 --status ERROR
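To illustrate how status tracking in SQLite enables resumable processing, here is a minimal sketch using the standard sqlite3 module. The table name, column names, and helper functions are hypothetical; only the status values come from the table above. APD's real schema in db.py may differ.

```python
import sqlite3

# Forward path through the pipeline, as listed in the status table.
PIPELINE = ["NEW", "PDF_OK", "NBLM_OK", "VIDEO_OK"]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "paper_id TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'NEW')"
    )
    return conn


def add_paper(conn: sqlite3.Connection, paper_id: str) -> None:
    # INSERT OR IGNORE keeps re-runs from resetting existing rows.
    conn.execute("INSERT OR IGNORE INTO papers (paper_id) VALUES (?)", (paper_id,))
    conn.commit()


def advance(conn: sqlite3.Connection, paper_id: str) -> str:
    """Move a paper one step forward and return its new status."""
    row = conn.execute(
        "SELECT status FROM papers WHERE paper_id = ?", (paper_id,)
    ).fetchone()
    if row is None:
        raise KeyError(paper_id)
    if row[0] not in PIPELINE[:-1]:
        raise ValueError(f"cannot advance from {row[0]}")
    new_status = PIPELINE[PIPELINE.index(row[0]) + 1]
    conn.execute(
        "UPDATE papers SET status = ? WHERE paper_id = ?", (new_status, paper_id)
    )
    conn.commit()
    return new_status


def mark_error(conn: sqlite3.Connection, paper_id: str) -> None:
    conn.execute("UPDATE papers SET status = 'ERROR' WHERE paper_id = ?", (paper_id,))
    conn.commit()


def papers_with_status(conn: sqlite3.Connection, status: str) -> list:
    """Papers a phase should pick up, e.g. 'NBLM_OK' for video download."""
    rows = conn.execute(
        "SELECT paper_id FROM papers WHERE status = ? ORDER BY paper_id", (status,)
    ).fetchall()
    return [row[0] for row in rows]
```

In this design each phase selects only the papers in its prerequisite status, so an interrupted run resumes by querying the database again rather than starting over.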
Troubleshooting
Login Issues
If you encounter login issues, you can re-run the following command:
apd login
NotebookLM Interface Changes
If the NotebookLM interface changes, check the screenshots stored under the browser profile directory:
ls data/profiles/screenshots/
Video Not Generated
Video generation takes some time. If the video is not generated, you can wait a few minutes and retry:
apd download-video --week 2026-01 --headful
HuggingFace Token Issues
Ensure the .env file is configured correctly:
cat .env
# Check HF_TOKEN and HF_USERNAME
Technology Stack
- Python 3.11+ – Core language
- Playwright – Browser automation
- SQLite – State persistence
- Click – CLI framework
- Requests + BeautifulSoup – Web scraping
- huggingface_hub – HF API
- Gradio – Portal website
- python-dotenv – Environment variable management

