PAPER2WEB: Bringing Your Academic Papers to Life

An integrated guide for turning static PDFs into interactive, structured academic websites and presentation materials.


Table of Contents

  1. Introduction

  2. What’s New

  3. Installation Guide

  4. Configuration

  5. Quick Start

  6. Generating Academic Presentation Videos (Paper2Video)

  7. Paper2Web Dataset Overview

  8. Benchmarking Paper2Web

  9. Contributing

  10. Acknowledgments

  11. FAQ


1. Introduction

Academic papers are highly structured and information-dense, but their PDF format often limits discoverability and interactivity. Researchers, students, and project teams face challenges such as:

  • Difficulty navigating complex content
  • Static figures and tables
  • Time-consuming manual website creation

PAPER2WEB addresses these challenges by providing an autonomous pipeline that converts academic papers into interactive, explorable project websites. The pipeline iteratively refines both content and layout, producing engaging websites that showcase the research in a structured, readable, and interactive format.

Key features include:

  • Automatic layout-aware content generation
  • Interactive navigation for users
  • Support for posters, presentation videos, and PR materials
  • Integration with advanced aesthetic agents (EvoPresent)

2. What’s New

Recent updates from the project:

  • EvoPresent Integration: Adds self-improving aesthetic agents for academic presentations.
  • Paper2Web Dataset & Benchmark: Tens of thousands of categorized papers, including metadata and citation counts, available for analysis and model training.
  • Paper2ALL Pipeline Release: Incorporates Paper2Video, Paper2Poster, and AutoPR, creating a unified toolchain for promotional materials.


3. Installation Guide

3.1 Prerequisites

Before installing, ensure the following:

  • Python ≥ 3.11
  • Conda (recommended for environment management)
  • LibreOffice (required for document conversion)
  • Poppler-utils (PDF rendering and parsing)

Tip: Conda environments help isolate dependencies and avoid conflicts between Python packages.
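
To confirm these prerequisites are in place, a quick self-check can help. The sketch below is not part of the repository; it assumes LibreOffice and Poppler expose their usual soffice and pdftoppm binaries:

# check_prereqs.py -- minimal sketch, not shipped with Paper2Web
import shutil
import sys

required = {
    "soffice": "LibreOffice (document conversion)",
    "pdftoppm": "Poppler (PDF rendering)",
}
missing = [f"{name} ({why})" for name, why in required.items()
           if shutil.which(name) is None]
if sys.version_info < (3, 11):
    missing.append(f"Python >= 3.11 (found {sys.version.split()[0]})")

if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")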


3.2 Creating Conda Environment

conda create -n p2w python=3.11
conda activate p2w

This creates an isolated environment named p2w for all Paper2Web dependencies.


3.3 Installing Dependencies

pip install -r requirements.txt

This installs Python packages required for the pipeline, including libraries for PDF processing, LLM interaction, and website generation.


3.4 System Dependencies

LibreOffice

sudo apt install libreoffice

If sudo is unavailable, download a portable build from the LibreOffice website and add it to your system PATH.

Poppler

conda install -c conda-forge poppler

Poppler is used for PDF parsing and rendering, enabling conversion from LaTeX/PDF to HTML content.
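
To verify that Poppler is wired up correctly, you can render a PDF to images. This sketch assumes the pdf2image package (pip install pdf2image), a thin Python wrapper around Poppler's pdftoppm:

# Render each page of a PDF to a PNG via Poppler.
from pdf2image import convert_from_path

pages = convert_from_path("paper.pdf", dpi=150)  # one PIL image per page
for i, page in enumerate(pages):
    page.save(f"page_{i + 1}.png", "PNG")
print(f"Rendered {len(pages)} pages.")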


4. Configuration

Before running the pipeline, configure your API credentials in a .env file:

# OpenAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

# Optional: OpenRouter (use these values instead of the OpenAI ones above; do not set both)
OPENAI_API_BASE=https://openrouter.ai/api/v1
OPENAI_API_KEY=sk-or-your-openrouter-key-here

AutoPR Component:

cp AutoPR/.env.example AutoPR/.env

Edit credentials as needed.

Optional: Google Search API (for logo search):

GOOGLE_SEARCH_API_KEY=your_google_search_api_key
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id
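
For reference, here is a minimal sketch of how the .env credentials are typically consumed from Python; it assumes the python-dotenv and openai packages, with variable names matching the keys above:

# Load .env and build an OpenAI-compatible client.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads .env from the current working directory
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
)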

5. Quick Start

5.1 Input Directory Structure

The pipeline automatically detects target platforms based on folder names:

papers/
├── 12345/                    # Numeric → Twitter (English)
│   └── paper.pdf
└── research_project/         # Alphanumeric → Xiaohongshu (Chinese)
    └── paper.pdf
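
The detection rule can be illustrated with a short, hypothetical helper; the pipeline's actual implementation may differ, but the convention is the one shown in the tree above:

# Hypothetical illustration of the folder-name convention.
from pathlib import Path

def detect_platform(paper_dir: Path) -> str:
    # Purely numeric names -> Twitter (English); otherwise Xiaohongshu (Chinese).
    return "Twitter" if paper_dir.name.isdigit() else "Xiaohongshu"

for d in sorted(Path("papers").iterdir()):
    if d.is_dir():
        print(f"{d.name}: {detect_platform(d)}")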

5.2 Running All Modules

python pipeline_all.py --input-dir "path/to/papers" --output-dir "path/to/output"

For a specific PDF:

python pipeline_all.py \
  --input-dir "path/to/papers" \
  --output-dir "path/to/output" \
  --pdf-path "path/to/paper.pdf"

5.3 Running Specific Modules

  • Website Generation Only:
    python pipeline_all.py --model-choice 1
  • Poster Generation Only (default 48×36 inches):
    python pipeline_all.py --model-choice 2
  • Poster with Custom Size:
    python pipeline_all.py --model-choice 2 --poster-width-inches 60 --poster-height-inches 40
  • PR Material Generation Only:
    python pipeline_all.py --model-choice 3

6. Generating Academic Presentation Videos (Paper2Video)

Paper2Video converts LaTeX papers into full presentation videos, including:

  • Slides
  • Subtitles
  • Audio narration
  • Cursor animations
  • Optional talking-head avatars


6.1 Environment Setup

cd paper2all/Paper2Video/src
conda create -n p2v python=3.10
conda activate p2v
pip install -r requirements.txt
conda install -c conda-forge tectonic ffmpeg poppler

6.2 Optional: Talking-Head Generation

A separate environment is recommended to avoid package conflicts:

cd hallo2
conda create -n hallo python=3.10
conda activate hallo
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
huggingface-cli download fudan-generative-ai/hallo2 --local-dir ../pretrained_models

Find the Python executable path:

which python

Use this path as --talking_head_env in the pipeline.


6.3 Inference Pipeline

The pipeline consumes:

  • LaTeX sources of papers
  • Reference images
  • Reference audio

The pipeline outputs a complete academic presentation video. Minimum recommended GPU: NVIDIA A6000 (48GB).
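
Before launching a long run, it is worth checking that all three inputs exist. Below is a minimal pre-flight sketch; the paths are placeholders mirroring the example commands that follow:

# Pre-flight check for Paper2Video inputs.
from pathlib import Path

inputs = {
    "LaTeX project": Path("/path/to/latex_proj"),
    "reference image": Path("/path/to/ref_img.png"),
    "reference audio": Path("/path/to/ref_audio.wav"),
}
for label, path in inputs.items():
    if not path.exists():
        raise SystemExit(f"Missing {label}: {path}")
print("All Paper2Video inputs present.")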


6.4 Example Commands

Fast generation (without talking-head):

python pipeline_light.py \
  --model_name_t gpt-4.1 \
  --model_name_v gpt-4.1 \
  --result_dir /path/to/output \
  --paper_latex_root /path/to/latex_proj \
  --ref_img /path/to/ref_img.png \
  --ref_audio /path/to/ref_audio.wav \
  --gpu_list [0,1,2,3,4,5,6,7]

Full generation (with talking-head):

python pipeline.py \
  --model_name_t gpt-4.1 \
  --model_name_v gpt-4.1 \
  --model_name_talking hallo2 \
  --result_dir /path/to/output \
  --paper_latex_root /path/to/latex_proj \
  --ref_img /path/to/ref_img.png \
  --ref_audio /path/to/ref_audio.wav \
  --talking_head_env /path/to/hallo2_env \
  --gpu_list [0,1,2,3,4,5,6,7]

7. Paper2Web Dataset Overview

The dataset includes:

  • Metadata for papers with and without project websites
  • Citation counts
  • 13 main categories, listed below:

Category                                Description
3D Vision & Computational Graphics      Papers on 3D reconstruction and graphics
Multimodal Learning                     Learning across images, text, and audio
Generation Models                       Generative AI models
Speech & Audio                          Processing and understanding audio signals
AI for Science                          AI applied to scientific domains
ML System & Infrastructure              Frameworks and tools for ML
Deep Learning Architectures             Neural network design
Probabilistic Inference                 Probabilistic reasoning methods
Natural Language Understanding          NLP and language models
Information Retrieval & Recommendation  Search engines, recommender systems
Reinforcement Learning                  RL algorithms and applications
Trustworthy AI                          Safety, fairness, explainability
ML Theory & Optimization                Theoretical and optimization research
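
As a starting point for analysis, the sketch below tallies papers per category. It assumes a JSON Lines file with a "category" field; the file name and schema are assumptions, not the released format:

# Count dataset records per category (schema is assumed, not official).
import json
from collections import Counter

categories = Counter()
with open("paper2web_dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        categories[record["category"]] += 1

for category, count in categories.most_common():
    print(f"{category}: {count}")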

8. Benchmarking Paper2Web

The benchmark includes:

  • Original website source URLs
  • Paper metadata
  • Partial results from PWAgent
  • Visual comparison of original vs. generated websites


Evaluation metrics:

  • Informative quality
  • Aesthetic quality
  • QA accuracy
  • Content completeness
  • Connectivity and interactivity
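
For a rough single-number summary across these metrics, an unweighted mean is one option. The field names and 0-1 score range below are assumptions for illustration, not the benchmark's official aggregation:

# Illustrative aggregation of per-metric scores (assumed 0-1 range).
scores = {
    "informative_quality": 0.82,
    "aesthetic_quality": 0.74,
    "qa_accuracy": 0.90,
    "content_completeness": 0.85,
    "connectivity_interactivity": 0.70,
}
overall = sum(scores.values()) / len(scores)  # simple unweighted mean
print(f"Overall score: {overall:.3f}")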

9. Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Modify code
  4. Add tests (if applicable)
  5. Submit a pull request

10. Acknowledgments

Thanks to:

  • Authors and guiding advisors
  • Open-source community
  • Paper2AI ecosystem contributors
  • Paper2Video, Paper2Poster, AutoPR, EvoPresent teams

11. FAQ

Q1: How does PAPER2WEB determine the platform?

  • Numeric folder name → Twitter (English)
  • Alphanumeric → Xiaohongshu (Chinese)

Q2: Is LibreOffice required?

Yes, it is required for document conversion. If sudo is unavailable, download a portable build and add it to your PATH.


Q3: What is Poppler used for?

PDF parsing and rendering.


Q4: Can I use OpenRouter instead of OpenAI API?

Yes, configure OPENAI_API_BASE and OPENAI_API_KEY in .env.


Q5: Minimum GPU requirement for Paper2Video?

An NVIDIA A6000 (48 GB) is the recommended minimum.


Q6: Can I skip talking-head generation?

Yes, use pipeline_light.py, which does not require the hallo2 environment.


Q7: Poster default and custom sizes?

Default: 48×36 inches
Custom: Use --poster-width-inches and --poster-height-inches.


Q8: What data is in the Paper2Web dataset?

  • Metadata
  • Website existence
  • Citation counts
  • Categories (13 classes)

Q9: What can the benchmark do?

  • Compare original vs. generated websites
  • Evaluate visual design, content completeness, connectivity

Q10: How to contribute?

Follow standard open-source workflow: fork → branch → modify → test → PR


Next Steps / Tips for Users

  • Always use a separate Conda environment per module to avoid dependency conflicts
  • Prepare LaTeX, images, and audio references before running Paper2Video
  • Use the benchmark dataset to evaluate and improve generated websites
  • Keep .env files secure; never commit API keys to version control or share them publicly