PAPER2WEB: Bringing Your Academic Papers to Life
An integrated guide for turning static PDFs into interactive, structured academic websites and presentation materials.

Table of Contents
1. Introduction
2. What’s New
3. Installation Guide
4. Configuration
5. Quick Start
6. Generating Academic Presentation Videos (Paper2Video)
7. Paper2Web Dataset Overview
8. Benchmarking Paper2Web
9. Contributing
10. Acknowledgments
11. FAQ

1. Introduction
Academic papers are highly structured and information-dense, but their PDF format often limits discoverability and interactivity. Researchers, students, and project teams face challenges such as:
- Difficulty navigating complex content
- Static figures and tables
- Time-consuming manual website creation
PAPER2WEB addresses these challenges by providing an autonomous pipeline that converts academic papers into interactive, explorable project websites. The pipeline iteratively refines both content and layout, producing engaging websites that showcase the research in a structured, readable, and interactive format.
Key features include:
- Automatic layout-aware content generation
- Interactive navigation for users
- Support for posters, presentation videos, and PR materials
- Integration with advanced aesthetic agents (EvoPresent)
2. What’s New
Recent updates from the project:
- EvoPresent Integration: Adds self-improving aesthetic agents for academic presentations.
- Paper2Web Dataset & Benchmark: Tens of thousands of categorized papers, including metadata and citation counts, available for analysis and model training.
- Paper2ALL Pipeline Release: Incorporates Paper2Video, Paper2Poster, and AutoPR, creating a unified toolchain for promotional materials.

3. Installation Guide
3.1 Prerequisites
Before installing, ensure the following:
- Python ≥ 3.11
- Conda (recommended for environment management)
- LibreOffice (required for document conversion)
- Poppler-utils (PDF rendering and parsing)
Tip: Conda environments help isolate dependencies and avoid conflicts between Python packages.
3.2 Creating Conda Environment
conda create -n p2w python=3.11
conda activate p2w
This creates an isolated environment named p2w for all Paper2Web dependencies.
3.3 Installing Dependencies
pip install -r requirements.txt
This installs Python packages required for the pipeline, including libraries for PDF processing, LLM interaction, and website generation.
3.4 System Dependencies
LibreOffice
sudo apt install libreoffice
If sudo is unavailable, download the executable version from the LibreOffice website and add it to your system PATH.
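To verify that LibreOffice is discoverable, you can check for its soffice binary (a quick sanity check; soffice is LibreOffice's command-line entry point):

```python
import shutil

# Prints the full path to LibreOffice's soffice binary, or None if it
# is not on PATH (in which case document conversion will fail).
print(shutil.which("soffice"))
```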
Poppler
conda install -c conda-forge poppler
Poppler is used for PDF parsing and rendering, enabling conversion from LaTeX/PDF to HTML content.
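For example, Poppler's pdftoppm backs the pdf2image Python package, so the snippet below doubles as a check that Poppler is installed correctly (a minimal sketch; this is not the pipeline's own conversion code):

```python
# Render PDF pages to PNG images. pdf2image (pip install pdf2image)
# shells out to Poppler's pdftoppm under the hood.
from pdf2image import convert_from_path

pages = convert_from_path("paper.pdf", dpi=150)  # one PIL Image per page
for i, page in enumerate(pages):
    page.save(f"page_{i:02d}.png")
```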
4. Configuration
Before running the pipeline, configure your API credentials in a .env file:
# OpenAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1
# Optional: OpenRouter (set these instead of the OpenAI values above)
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-your-openrouter-key-here
AutoPR Component:
cp AutoPR/.env.example AutoPR/.env
Edit credentials as needed.
Optional: Google Search API (for logo search):
GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
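To sanity-check the configuration, you can load the .env file and construct a client by hand (a minimal sketch using python-dotenv and the openai package; the pipeline loads these values itself, so this is only for troubleshooting):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv()  # read .env from the current directory into os.environ

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
)
print("Using base URL:", client.base_url)
```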
5. Quick Start
5.1 Input Directory Structure
The pipeline automatically detects target platforms based on folder names:
papers/
├── 12345/               # Numeric → Twitter (English)
│   └── paper.pdf
└── research_project/    # Alphanumeric → Xiaohongshu (Chinese)
    └── paper.pdf
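The rule boils down to a check on the folder name (an illustrative sketch of the convention, not the pipeline's actual detection code):

```python
from pathlib import Path

def detect_platform(paper_dir: Path) -> str:
    """All-digit folder names target Twitter (English); anything else
    targets Xiaohongshu (Chinese), per the convention above."""
    return "Twitter" if paper_dir.name.isdigit() else "Xiaohongshu"

for d in sorted(Path("papers").iterdir()):
    if d.is_dir():
        print(f"{d.name} -> {detect_platform(d)}")
```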
5.2 Running All Modules
python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output"
For a specific PDF:
python pipeline_all.py \
--input-dir "path/to/papers" \
--output-dir "path/to/output" \
--pdf-path "path/to/paper.pdf"
5.3 Running Specific Modules
- Website Generation Only:
  python pipeline_all.py --model-choice 1
- Poster Generation Only (default 48×36 inches):
  python pipeline_all.py --model-choice 2
- Poster with Custom Size:
  python pipeline_all.py --model-choice 2 --poster-width-inches 60 --poster-height-inches 40
- PR Material Generation Only:
  python pipeline_all.py --model-choice 3
6. Generating Academic Presentation Videos (Paper2Video)
Paper2Video converts LaTeX papers into full presentation videos, including:
- Slides
- Subtitles
- Audio narration
- Cursor animations
- Optional talking-head avatars

6.1 Environment Setup
cd paper2all/Paper2Video/src
conda create -n p2v python=3.10
conda activate p2v
pip install -r requirements.txt
conda install -c conda-forge tectonic ffmpeg poppler
6.2 Optional: Talking-Head Generation
A separate environment is recommended to avoid package conflicts:
cd hallo2
conda create -n hallo python=3.10
conda activate hallo
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
huggingface-cli download fudan-generative-ai/hallo2 --local-dir ../pretrained_models
Find the Python executable path:
which python
Use this path as --talking_head_env in the pipeline.
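Equivalently, from inside the activated hallo environment (a tiny, cross-platform alternative to which python):

```python
import sys
print(sys.executable)  # pass this path as --talking_head_env
```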
6.3 Inference Pipeline
The pipeline consumes:
- LaTeX sources of papers
- Reference images
- Reference audio
The pipeline outputs a complete academic presentation video. Minimum recommended GPU: NVIDIA A6000 (48GB).
6.4 Example Commands
Fast generation (without talking-head):
python pipeline_light.py \
--model_name_t gpt-4.1 \
--model_name_v gpt-4.1 \
--result_dir /path/to/output \
--paper_latex_root /path/to/latex_proj \
--ref_img /path/to/ref_img.png \
--ref_audio /path/to/ref_audio.wav \
--gpu_list [0,1,2,3,4,5,6,7]
Full generation (with talking-head):
python pipeline.py \
--model_name_t gpt-4.1 \
--model_name_v gpt-4.1 \
--model_name_talking hallo2 \
--result_dir /path/to/output \
--paper_latex_root /path/to/latex_proj \
--ref_img /path/to/ref_img.png \
--ref_audio /path/to/ref_audio.wav \
--talking_head_env /path/to/hallo2_env \
--gpu_list [0,1,2,3,4,5,6,7]
7. Paper2Web Dataset Overview
The dataset includes:
- Metadata for papers with and without project websites
- Citation counts
- 13 main categories:
| Category | Description |
|---|---|
| 3D Vision & Computational Graphics | Papers on 3D reconstruction and graphics |
| Multimodal Learning | Learning across images, text, and audio |
| Generation Models | Generative AI models |
| Speech & Audio | Processing and understanding audio signals |
| AI for Science | AI applied to scientific domains |
| ML System & Infrastructure | Frameworks and tools for ML |
| Deep Learning Architectures | Neural network design |
| Probabilistic Inference | Probabilistic reasoning methods |
| Natural Language Understanding | NLP and language models |
| Information Retrieval & Recommendation | Search engines, recommender systems |
| Reinforcement Learning | RL algorithms and applications |
| Trustworthy AI | Safety, fairness, explainability |
| ML Theory & Optimization | Theoretical and optimization research |
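Assuming the metadata is distributed as JSON Lines with fields like category and citations (the file name and field names below are hypothetical; check the released dataset for the actual schema), a per-category summary could be computed as:

```python
import json
from collections import Counter

# Hypothetical file name and schema -- adjust to the released dataset.
papers_per_category = Counter()
citations_per_category = Counter()
with open("paper2web_metadata.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        papers_per_category[record["category"]] += 1
        citations_per_category[record["category"]] += record.get("citations", 0)

for category, n in papers_per_category.most_common():
    print(f"{category}: {n} papers, {citations_per_category[category]} citations")
```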
8. Benchmarking Paper2Web
The benchmark includes:
- Original website source URLs
- Paper metadata
- Partial results from PWAgent
- Visual comparison of original vs. generated websites


Evaluation metrics:
- Informative quality
- Aesthetic quality
- QA accuracy
- Content completeness
- Connectivity and interactivity
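As a concrete example of one metric, QA accuracy is the fraction of benchmark questions answered correctly from the website (a minimal illustration with made-up judgments; the benchmark's own scoring scripts may aggregate differently):

```python
def qa_accuracy(judgments: list[tuple[str, bool]]) -> float:
    """Fraction of (question, answered_correctly) pairs judged correct."""
    if not judgments:
        return 0.0
    return sum(ok for _, ok in judgments) / len(judgments)

# Made-up judgments for three questions about one generated website.
judgments = [
    ("What is the paper's main contribution?", True),
    ("Which datasets were used?", True),
    ("What is the reported headline result?", False),
]
print(f"QA accuracy: {qa_accuracy(judgments):.2f}")  # -> 0.67
```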
9. Contributing
- Fork the repository
- Create a feature branch
- Modify code
- Add tests (if applicable)
- Submit a pull request
10. Acknowledgments
Thanks to:
- Authors and guiding advisors
- Open-source community
- Paper2AI ecosystem contributors
- Paper2Video, Paper2Poster, AutoPR, and EvoPresent teams
11. FAQ
Q1: How does PAPER2WEB determine the platform?
- Numeric folder name → Twitter (English)
- Alphanumeric folder name → Xiaohongshu (Chinese)
Q2: Is LibreOffice required?
Yes, it is required for document conversion. If sudo is unavailable, download the executable version and add it to your PATH.
Q3: What is Poppler used for?
PDF parsing and rendering.
Q4: Can I use OpenRouter instead of OpenAI API?
Yes, configure OPENAI_API_BASE and OPENAI_API_KEY in .env.
Q5: Minimum GPU requirement for Paper2Video?
NVIDIA A6000 48GB recommended.
Q6: Can I skip talking-head generation?
Yes. Use pipeline_light.py, which does not require the hallo2 environment.
Q7: Poster default and custom sizes?
Default: 48×36 inches
Custom: Use --poster-width-inches and --poster-height-inches.
Q8: What data is in the Paper2Web dataset?
- Metadata
- Website existence
- Citation counts
- Categories (13 classes)
Q9: What can the benchmark do?
- Comparison of original vs. generated websites
- Evaluation of visual design, content completeness, and connectivity
Q10: How to contribute?
Follow standard open-source workflow: fork → branch → modify → test → PR
✅ Next Steps / Tips for Users
- Use a separate Conda environment per module to avoid dependency conflicts
- Prepare LaTeX sources, reference images, and reference audio before running Paper2Video
- Use the benchmark dataset to evaluate and improve generated websites
- Keep .env files secure; API keys should not be shared publicly

