🚀 Auto Paper Digest (APD): Automated AI Paper Interpretation and Publishing System

Abstract

Auto Paper Digest (APD) is an automated pipeline that fetches trending AI papers, generates video explanations of them, and publishes the videos to platforms such as HuggingFace and Douyin, so that research results reach a wider audience.

Feature Highlights

📚 Paper Acquisition

APD automatically fetches the week's popular AI papers from Hugging Face, and a specific week can be targeted through its weekly URL. The system parses each paper's title, authors, abstract, and other key fields, which provide the data for all later processing.

📄 PDF Download

Paper PDFs are downloaded from arXiv. Downloads are idempotent and verified with SHA256, so each stored file is complete and intact. Downloaded papers are cached, so the same file is never fetched twice.

🎬 Video Generation

APD generates video explanations of papers automatically through NotebookLM. Paper content is turned into a video presentation with voice narration, subtitles, etc., which makes complex academic content more accessible.

📤 Automatic Publishing

Generated videos can be uploaded automatically to a HuggingFace Dataset for easy sharing. The system also updates the metadata.json file so that the video information stays accurate and complete.

📱 Douyin Publishing

APD can also publish videos automatically to the Douyin Creator Platform. You log in once; the system persists the login state, so later publishing runs need no further login. During publishing, the system fills in the video title and adds topic tags automatically.

🌐 Portal Website

Generated videos can be played online through the Gradio portal website. The portal supports browsing and searching videos by week, by day, etc., so users can quickly find the paper interpretations they are interested in.

💾 Resumable Processing

The system tracks state in a SQLite database and supports resumable processing. If a run is interrupted, a later run continues with the papers that are still unprocessed instead of repeating finished work.

🔐 Login Reuse

Google and Douyin login states are saved persistently, so each account needs only one login and later runs reuse the saved session.

Architecture Design

APD's architecture is divided into three main phases: Upload, Download, and Publish. A SQLite database manages state so that the phases can hand work off to one another.

┌─────────────────────────────────────────────────────────────────────┐
│                        Auto Paper Digest                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Phase 1: Upload            Phase 2: Download      Phase 3: Publish │
│   ┌─────────┐    ┌─────────┐    ┌─────────────┐    ┌──────────────┐ │
│   │   HF    │───▶│  arXiv  │───▶│ NotebookLM  │───▶│  HuggingFace │ │
│   │ Papers  │    │  PDFs   │    │   Videos    │    │   Dataset    │ │
│   └─────────┘    └─────────┘    └─────────────┘    └──────────────┘ │
│        │               │               │                   │         │
│        ▼               ▼               ▼                   ▼         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                    SQLite Database                           │   │
│   │      (status: NEW → PDF_OK → NBLM_OK → VIDEO_OK)            │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│              ┌───────────────┼───────────────┐                       │
│              ▼               ▼               ▼                       │
│   ┌─────────────────┐ ┌─────────────┐ ┌─────────────┐               │
│   │ Portal Website  │ │   Douyin    │ │   Other     │               │
│   │  (HF Spaces)    │ │  Creator    │ │  Platforms  │               │
│   └─────────────────┘ └─────────────┘ └─────────────┘               │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Quick Start

1. Installation

# Clone the repository
git clone https://github.com/brianxiadong/auto-paper-digest.git
cd auto-paper-digest

# Install dependencies
pip install -e .

# Install browser
playwright install chromium

2. Configure Environment Variables

# Copy configuration template
cp .env.example .env

# Edit .env to fill in HuggingFace configuration
# HF_TOKEN=hf_xxx
# HF_USERNAME=your-username
# HF_DATASET_NAME=paper-digest-videos
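Before running a publish, it can help to confirm that these three values are actually set. The following is a minimal sketch, not APD's own code: the variable names come from the template above, while `missing_hf_settings` and `dataset_repo_id` are hypothetical helpers, and the "<username>/<dataset>" repository ID is an assumption about how the values are combined.

```python
import os
from typing import List, Mapping

# Variable names are taken from the .env template; the helpers are hypothetical.
REQUIRED_HF_SETTINGS = ("HF_TOKEN", "HF_USERNAME", "HF_DATASET_NAME")


def missing_hf_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_HF_SETTINGS if not env.get(name, "").strip()]


def dataset_repo_id(env: Mapping[str, str] = os.environ) -> str:
    """Build "<username>/<dataset>" (assumes the dataset lives under HF_USERNAME)."""
    return f"{env['HF_USERNAME']}/{env['HF_DATASET_NAME']}"
```

Calling `missing_hf_settings()` with no arguments checks the current process environment, so it only sees the .env values once they have been loaded (for example by python-dotenv, which is listed in the technology stack).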

3. First-time Google Login

apd login

The browser opens the NotebookLM login page. Once you complete the Google login, the session is saved, so later runs do not require logging in again.

Three-Phase Workflow

Phase 1: Upload and Trigger Video Generation

apd upload --week 2026-01 --headful --max 10

This command will perform the following operations:

  • Fetch the specified week's papers from HuggingFace (using the /week/YYYY-WXX URL)
  • Download arXiv PDFs (cached; files already downloaded are skipped)
  • Upload to NotebookLM
  • Trigger video generation (does not wait for completion)
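The mapping from a week ID such as 2026-01 to the /week/YYYY-WXX URL can be sketched as follows. This is an illustration under assumptions, not APD's actual code: the base URL is assumed (the text above only gives the /week/YYYY-WXX suffix), ISO week numbering is assumed for the "current week" default, and both function names are hypothetical.

```python
import datetime as dt
from typing import Optional

# Assumption: the weekly listing lives under this base URL.
HF_PAPERS_BASE = "https://huggingface.co/papers"


def week_url(week_id: str) -> str:
    """Turn a week ID such as "2026-01" into a ".../week/2026-W01" URL."""
    year, week = week_id.split("-")
    return f"{HF_PAPERS_BASE}/week/{int(year):04d}-W{int(week):02d}"


def current_week_id(today: Optional[dt.date] = None) -> str:
    """Week ID for the given day (default: today), using ISO week numbering."""
    iso = (today or dt.date.today()).isocalendar()
    return f"{iso[0]:04d}-{iso[1]:02d}"
```

Using the ISO year from `isocalendar()` rather than the calendar year keeps the ID consistent for days around New Year that belong to a week of the neighbouring year.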

Phase 2: Download Generated Videos

Video generation takes time, so wait a few minutes and then run the following command:

apd download-video --week 2026-01 --headful

Videos that have already been downloaded are cached and skipped automatically. To download them again, use the --force parameter.

Phase 3: Publish to HuggingFace

apd publish --week 2026-01

This command will perform the following operations:

  • Upload videos to HuggingFace Dataset
  • Update metadata.json
  • Generate Markdown summary

Phase 3b: Publish to Douyin (Optional)

First-time use requires logging in to Douyin:

apd douyin-login

The browser opens the Douyin Creator Center login page. Scan the QR code with the Douyin app to log in. The login state is then saved, so later publishing runs do not require logging in again.

Then publish videos to Douyin:

apd publish-douyin --week 2026-01 --headful

This command will perform the following operations:

  • Automatically upload videos to the Douyin Creator Platform
  • Fill in video titles (paper titles)
  • Add topic tags (AI, paper interpretation, etc.)
  • Automatically click publish

For first-time use, add the --headful parameter so you can watch the publishing process in the browser. Once you have confirmed that it works correctly, you can drop the parameter.

Daily Processing (Optional)

In addition to weekly processing, APD also supports daily paper processing:

# Get papers for a specific date
apd fetch --date 2026-01-08 --max 10

# Upload and generate videos
apd upload --date 2026-01-08 --headful --max 10

# Download videos
apd download-video --date 2026-01-08 --headful

# Publish to Douyin
apd publish-douyin --date 2026-01-08 --headful

Note that new papers are usually not published on weekends and holidays. For such dates the system reports an error and stops instead of continuing, which avoids wasting resources.
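A pre-check along these lines could flag weekend dates before any browser automation starts. This is a minimal sketch: `is_weekend` is a hypothetical helper, not part of APD, and it knows nothing about holidays.

```python
import datetime as dt


def is_weekend(date_str: str) -> bool:
    """True when an ISO date (YYYY-MM-DD) falls on a Saturday or Sunday."""
    # weekday() counts Monday as 0, so 5 and 6 are Saturday and Sunday.
    return dt.date.fromisoformat(date_str).weekday() >= 5
```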

Folder Structure

Daily and weekly data are stored separately for easy management and lookup by users:

  • data/pdfs/weekly/2026-01/ – Weekly processed PDFs
  • data/pdfs/daily/2026-01-08/ – Daily processed PDFs
  • data/videos/weekly/2026-01/ – Weekly processed videos
  • data/videos/daily/2026-01-08/ – Daily processed videos
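The layout above can be expressed as a small path helper. This is a sketch only: `period_dir` is a hypothetical name, and telling weekly from daily IDs by their length is an illustrative assumption rather than APD's documented behaviour.

```python
from pathlib import Path

DATA_DIR = Path("data")  # matches the layout listed above


def period_dir(kind: str, period_id: str, data_dir: Path = DATA_DIR) -> Path:
    """Return e.g. data/pdfs/weekly/2026-01 or data/videos/daily/2026-01-08.

    A 7-character ID is treated as a week ID and a 10-character ID as a date;
    this length check is an assumption made for the example.
    """
    if kind not in ("pdfs", "videos"):
        raise ValueError(f"unknown kind: {kind!r}")
    if len(period_id) == 7:
        scope = "weekly"
    elif len(period_id) == 10:
        scope = "daily"
    else:
        raise ValueError(f"unrecognized period ID: {period_id!r}")
    return data_dir / kind / scope / period_id
```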

🌐 Portal Website

After videos are published, they can be directly viewed on the HuggingFace Spaces portal website:

https://huggingface.co/spaces/your-username/paper-digest

Command Reference

Command             Description
apd login           Open browser to complete Google login (NotebookLM)
apd douyin-login    Open browser to complete Douyin login
apd fetch           Only get paper list (no download)
apd download        Only download PDFs (supports caching)
apd upload          Phase 1: Get + Download + Upload + Trigger generation
apd download-video  Phase 2: Download generated videos (supports caching)
apd publish         Phase 3: Publish to HuggingFace
apd publish-douyin  Phase 3b: Publish to Douyin Creator Platform
apd digest          Generate local weekly report
apd run             Complete workflow (one-click execution, needs to wait for video generation)
apd status          Check paper processing status

Common Parameters

--week, -w     Specify week ID (e.g., 2026-01); defaults to the current week
--max, -m      Maximum number of papers
--headful      Show browser window (for debugging)
--force, -f    Force reprocessing (ignore caching)
--debug        Enable debug logging

Directory Structure

auto-paper-digest/
├── apd/                    # Main program package
│   ├── cli.py              # Command line entry
│   ├── config.py           # Configuration constants
│   ├── db.py               # SQLite database
│   ├── hf_fetcher.py       # HF paper fetching (supports weekly URLs)
│   ├── pdf_downloader.py   # PDF downloader
│   ├── nblm_bot.py         # NotebookLM automation
│   ├── douyin_bot.py       # Douyin Creator Platform automation
│   ├── publisher.py        # HuggingFace publishing
│   ├── digest.py           # Weekly report generation
│   └── utils.py            # Utility functions
├── portal/                 # HuggingFace Spaces portal
│   ├── app.py              # Gradio application
│   ├── requirements.txt
│   └── README.md
├── data/
│   ├── apd.db              # SQLite database
│   ├── .douyin_auth.json   # Douyin login state
│   ├── pdfs/               # Downloaded PDFs (weekly/ and daily/ subfolders)
│   ├── videos/             # Generated videos (weekly/ and daily/ subfolders)
│   ├── digests/            # Weekly report files
│   └── profiles/           # Browser configuration (including login state)
├── .env.example            # Environment variable template
└── pyproject.toml

Caching Mechanism

PDF Caching

Downloaded PDFs are verified with SHA256 to ensure file integrity. Files that are already present are skipped, so nothing is downloaded twice.
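One way to combine a cache check with SHA256 verification is sketched below. It is not APD's implementation: `fetch` is a hypothetical callable that returns the PDF bytes, and `known_sha256` stands in for a hash recorded after an earlier successful download (where APD stores such a hash is not stated here).

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_pdf(path: Path, fetch: Callable[[], bytes],
               known_sha256: Optional[str] = None) -> str:
    """Download only when the cached copy is missing or fails its hash check."""
    if path.exists() and known_sha256 and sha256_of(path) == known_sha256:
        return known_sha256  # cache hit: nothing to download
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(fetch())
    # Rename into place so an interrupted download never leaves a partial
    # file under the final name.
    tmp.replace(path)
    return sha256_of(path)
```

Calling `ensure_pdf` repeatedly with the same arguments leaves the same file on disk, which is what makes the operation idempotent.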

Video Caching

Cached videos are detected by filename prefix matching ({paper_id}_*.mp4), which also covers the new naming format {paper_id}_{video_title}.mp4. To download a video again, use the --force parameter.
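The prefix match can be sketched with `pathlib` globbing. The function names are hypothetical and the `force` argument simply mirrors the documented --force parameter.

```python
import glob
from pathlib import Path
from typing import Optional


def cached_video(video_dir: Path, paper_id: str) -> Optional[Path]:
    """Return an existing {paper_id}_*.mp4 in video_dir, or None."""
    # glob.escape guards against IDs that contain glob metacharacters.
    pattern = f"{glob.escape(paper_id)}_*.mp4"
    matches = sorted(video_dir.glob(pattern))
    return matches[0] if matches else None


def should_download(video_dir: Path, paper_id: str, force: bool = False) -> bool:
    """Skip papers with a cached video unless a re-download is forced."""
    return force or cached_video(video_dir, paper_id) is None
```

Because the underscore is part of the pattern, an ID that is merely a prefix of another ID does not produce a false cache hit.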

Publishing Caching

Published papers are recorded in metadata.json, and papers already recorded there are skipped on later publishing runs.
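A skip-if-recorded check against metadata.json might look like the sketch below. The schema is an assumption made for illustration (a top-level "videos" list whose entries carry a "paper_id"); the real layout of APD's metadata.json is not described here, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Set


def load_published_ids(metadata_path: Path) -> Set[str]:
    """Read paper IDs already listed in metadata.json (empty if the file is absent)."""
    if not metadata_path.exists():
        return set()
    data = json.loads(metadata_path.read_text(encoding="utf-8"))
    # Assumed schema: {"videos": [{"paper_id": "...", "title": "..."}, ...]}
    return {entry["paper_id"] for entry in data.get("videos", [])}


def record_published(metadata_path: Path, paper_id: str, title: str) -> bool:
    """Append an entry unless the paper is already recorded.

    Returns True when a new entry was written and False when it was skipped.
    """
    if paper_id in load_published_ids(metadata_path):
        return False
    data = {"videos": []}
    if metadata_path.exists():
        data = json.loads(metadata_path.read_text(encoding="utf-8"))
    data.setdefault("videos", []).append({"paper_id": paper_id, "title": title})
    metadata_path.write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return True
```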

State Tracking

NEW → PDF_OK → NBLM_OK → VIDEO_OK
 │                          │
 └──────── ERROR ◄──────────┘

Status     Meaning
NEW        Paper has been fetched, pending processing
PDF_OK     PDF has been downloaded
NBLM_OK    Uploaded to NotebookLM, video generation in progress
VIDEO_OK   Video has been downloaded
ERROR      Processing failed (will automatically retry)
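The sketch below shows how status tracking of this kind supports resuming an interrupted run. The status names come from the table above; the table layout, column names, and helper functions are assumptions for illustration and may differ from what apd/db.py actually does.

```python
import sqlite3
from typing import List, Tuple

# Status names from the table above; everything else here is assumed.
STATUSES = ("NEW", "PDF_OK", "NBLM_OK", "VIDEO_OK", "ERROR")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) papers table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        " paper_id TEXT PRIMARY KEY,"
        " week_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'NEW')"
    )
    conn.commit()
    return conn


def add_paper(conn: sqlite3.Connection, paper_id: str, week_id: str) -> None:
    # INSERT OR IGNORE keeps a re-run from resetting an existing paper's status.
    conn.execute(
        "INSERT OR IGNORE INTO papers (paper_id, week_id) VALUES (?, ?)",
        (paper_id, week_id),
    )
    conn.commit()


def set_status(conn: sqlite3.Connection, paper_id: str, status: str) -> None:
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    conn.execute(
        "UPDATE papers SET status = ? WHERE paper_id = ?", (status, paper_id)
    )
    conn.commit()


def pending(conn: sqlite3.Connection, week_id: str) -> List[Tuple[str, str]]:
    """Papers that have not reached VIDEO_OK, including ERROR rows to retry."""
    rows = conn.execute(
        "SELECT paper_id, status FROM papers"
        " WHERE week_id = ? AND status != 'VIDEO_OK' ORDER BY paper_id",
        (week_id,),
    )
    return rows.fetchall()
```

A resumed run would iterate over `pending(conn, week_id)` and pick up each paper from its recorded status, so finished papers are never processed again.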

Check status:

apd status --week 2026-01
apd status --week 2026-01 --status ERROR

Troubleshooting

Login Issues

If you encounter login issues, re-run the login command:

apd login

NotebookLM Interface Changes

If the NotebookLM interface changes and the automation stops working, you can inspect the saved screenshots:

ls data/profiles/screenshots/

Video Not Generated

Video generation takes some time. If a video is not ready yet, wait a few minutes and retry:

apd download-video --week 2026-01 --headful

HuggingFace Token Issues

Ensure the .env file is configured correctly:

cat .env
# Check HF_TOKEN and HF_USERNAME

Technology Stack

  • Python 3.11+ – Core language
  • Playwright – Browser automation
  • SQLite – State persistence
  • Click – CLI framework
  • Requests + BeautifulSoup – Web scraping
  • huggingface_hub – HF API
  • Gradio – Portal website
  • python-dotenv – Environment variable management