When you’re facing a 30-page academic paper and an impending group meeting presentation, have you ever wished for an intelligent assistant that could generate professional slides with one click? That fantasy is now reality.
It’s 11 PM, and the lab lights are still on. You rub your tired eyes, staring at that newly downloaded conference paper—32 pages of dense formulas, charts, and experimental data. You need to present it tomorrow, yet your slides remain blank.
This isn’t a sci-fi scenario but a weekly reality for researchers worldwide. Until now.
Today, I’m introducing you to a tool that’s quietly revolutionizing academic workflows: Auto-Slides. Developed jointly by Westlake University’s AGI Lab and UC Merced, this system isn’t just another “PPT generator”—it’s an intelligent presentation partner that truly understands academic content and follows educational psychology principles.
What Makes Auto-Slides Different?
While numerous LLM-based document summarization tools exist, most remain at the “text summary” level. When you attempt to generate academic presentations with them, you typically encounter several critical issues:
- Incomplete extraction or formatting errors with charts and formulas
- Content structures that don’t align with presentation logic
- Lack of pedagogical principles, resulting in mechanical bullet points
- Inability to adjust detail levels based on audience background
Auto-Slides’ breakthrough lies in decomposing the entire task into a pipeline accomplished by multiple specialized agents, each functioning like an expert in a specific role within an academic team.

Auto-Slides’ multi-agent architecture: The complete workflow from paper parsing to final presentation generation
Deep Dive: How Multiple Agents Collaborate to Create Perfect Presentations
Phase 1: Content Understanding and Structuring
The Parser Agent goes well beyond plain text extraction. It uses PDF parsing built on the Marker model to identify and separate:
- Main text content (preserving section structure)
- Academic figures (with complete captions and reference relationships)
- Complex tables (converted to structured data)
- Mathematical formulas (preserving original LaTeX format)
When I tested it with a computer vision paper containing multiple cross-page tables, the Parser Agent successfully extracted all table data, including footnotes and statistical significance markers.
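To make the four output categories concrete, here is a minimal sketch of how parsed content could be held in memory. The class and field names (`ParsedPaper`, `Figure`, `referenced_in`, `figures_for`) are assumptions made for illustration; they are not taken from the Auto-Slides codebase or from Marker.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Figure:
    """Hypothetical record for one extracted figure."""
    label: str
    caption: str
    # Titles of the sections whose text refers to this figure.
    referenced_in: list[str] = field(default_factory=list)


@dataclass
class ParsedPaper:
    """Hypothetical container for the four kinds of parsed content."""
    sections: dict[str, str] = field(default_factory=dict)        # title -> body text
    figures: list[Figure] = field(default_factory=list)
    tables: list[list[list[str]]] = field(default_factory=list)   # table -> rows -> cells
    formulas: list[str] = field(default_factory=list)             # raw LaTeX strings

    def figures_for(self, section_title: str) -> list[Figure]:
        """Return the figures that the given section refers to."""
        return [f for f in self.figures if section_title in f.referenced_in]


paper = ParsedPaper(
    sections={"Method": "We use the encoder shown in Figure 1."},
    figures=[Figure("fig:encoder", "Encoder overview.", ["Method"])],
    formulas=[r"\mathcal{L} = \sum_i (y_i - \hat{y}_i)^2"],
)
print([f.label for f in paper.figures_for("Method")])
```

Keeping the reference relationships next to each figure is what would let a later planning step place a figure on the slide that discusses it.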
The Planner Agent acts as an “instructional design expert.” Instead of simply copying the paper’s IMRaD structure, it reorganizes content into the more presentation-friendly PMRC framework based on Cognitive Load Theory and Mayer’s Multimedia Learning Principles:
- Problem: Research background and core issues
- Motivation: Why this problem deserves solving
- Results: Key findings and core contributions
- Conclusion: Research significance and future directions
This restructuring isn’t simple copy-pasting but stems from understanding the deep logic of academic content. For instance, it integrates key methodological details into results presentation, helping audiences immediately understand the reasoning behind methodological choices when viewing experimental outcomes.
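The regrouping described above can be pictured with a small lookup table. This is only an illustrative sketch: the mapping `IMRAD_TO_PMRC` and the function `plan_outline` are hypothetical, and the actual Planner Agent presumably relies on an LLM rather than a fixed table.

```python
# Hypothetical mapping from IMRaD section names to PMRC headings.
IMRAD_TO_PMRC = {
    "introduction": ["Problem", "Motivation"],
    "methods": ["Results"],      # method details are folded into the results they explain
    "results": ["Results"],
    "discussion": ["Conclusion"],
}

PMRC_ORDER = ["Problem", "Motivation", "Results", "Conclusion"]


def plan_outline(paper_sections):
    """Group IMRaD section names under PMRC headings, in PMRC order."""
    outline = {heading: [] for heading in PMRC_ORDER}
    for section in paper_sections:
        for heading in IMRAD_TO_PMRC.get(section.lower(), []):
            outline[heading].append(section)
    return outline


outline = plan_outline(["Introduction", "Methods", "Results", "Discussion"])
for heading, sources in outline.items():
    print(f"{heading}: {', '.join(sources)}")
```

Note how "Methods" lands under the Results heading, mirroring the idea that methodological details are shown alongside the outcomes they produce.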
Phase 2: Quality Assurance and Content Adjustment
This is the aspect I most admire about Auto-Slides—it knows it might make mistakes.
The Verification Agent acts like a strict reviewer, systematically checking whether the generated content plan covers all key contributions from the original paper. If significant content is missing, the Adjustment Agent automatically repairs these issues.
In official testing, this verification-adjustment mechanism improved content accuracy by nearly 10%, particularly in error-prone sections like methods and results.
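The verify-then-adjust cycle can be sketched as a simple loop. Everything here is an assumption made for illustration: the function names are hypothetical, and plain substring matching stands in for whatever LLM-based coverage check the real Verification Agent performs.

```python
def find_missing(contributions, plan):
    """Verification step: list contributions that no slide mentions."""
    slide_text = " ".join(plan).lower()
    return [c for c in contributions if c.lower() not in slide_text]


def adjust(plan, missing):
    """Adjustment step: add one slide per missing contribution."""
    return plan + [f"Key contribution: {c}" for c in missing]


def verify_and_adjust(contributions, plan, max_rounds=3):
    """Repeat verify -> adjust until nothing is missing or rounds run out."""
    for _ in range(max_rounds):
        missing = find_missing(contributions, plan)
        if not missing:
            break
        plan = adjust(plan, missing)
    return plan


contributions = ["positional encoding", "hierarchical sampling"]
draft = ["Problem: view synthesis", "Results: positional encoding improves detail"]
final = verify_and_adjust(contributions, draft)
print(final[-1])
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running forever if an adjustment never satisfies the check.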
Phase 3: Generation and Interactive Optimization
The Generator Agent transforms structured plans into actual Beamer slides. Choosing LaTeX over PowerPoint was wise—most academic templates and typographic conventions are based on the LaTeX ecosystem.
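As a rough picture of what turning a structured plan into Beamer involves, the sketch below renders a list of (title, bullets) pairs into a compilable document. The functions `render_frame` and `render_deck` are hypothetical and are not the Generator Agent's code; the sketch also assumes the text contains no characters that need LaTeX escaping.

```python
def render_frame(title, bullets):
    """Return one Beamer frame containing an itemized list."""
    items = "\n".join(f"    \\item {b}" for b in bullets)
    return (
        f"\\begin{{frame}}{{{title}}}\n"
        "  \\begin{itemize}\n"
        f"{items}\n"
        "  \\end{itemize}\n"
        "\\end{frame}\n"
    )


def render_deck(slides, theme="Madrid"):
    """Wrap the rendered frames in a minimal Beamer document."""
    frames = "\n".join(render_frame(title, bullets) for title, bullets in slides)
    return (
        "\\documentclass{beamer}\n"
        f"\\usetheme{{{theme}}}\n"
        "\\begin{document}\n"
        f"{frames}"
        "\\end{document}\n"
    )


tex = render_deck(
    [
        ("Problem", ["View synthesis from sparse images is hard"]),
        ("Results", ["Sharper novel views", "Compact scene representation"]),
    ],
    theme="Berlin",
)
print(tex)
```

Saving the printed text to a `.tex` file and compiling it with a LaTeX distribution that includes Beamer should yield a two-slide deck.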
But the real highlight is the Editor Agent, which supports human-like conversational modifications:
You: Add a page in the methods section explaining attention mechanisms in detail
Editor Agent: Sure, I'll insert a page about attention mechanisms in the methods section and reference the relevant formulas from the original text.

Natural language instruction-driven interactive editing process
How Does It Actually Perform? Let the Data Speak
The development team conducted three user studies, with impressive results:
Learner Perspective: 30 undergraduates from a range of disciplines used the interactive features and rated both “Learning Enhancement” (5.46/7) and “Control and Agency” (5.49/7) significantly above the neutral midpoint. A biology student commented: “I usually struggle with computer science papers, but by adjusting slide details, I could focus on core concepts.”
Comparative Study: Direct comparison by 24 researchers between slide-based and LLM chat-based learning showed slides significantly outperformed in visual clarity (6.10 vs 5.05) and structural organization (5.90 vs 5.00). A researcher who frequently uses ChatGPT noted: “Slides give me a complete map, while chatting feels more like asking for directions in an unfamiliar city.”
Expert Evaluation: Eight domain experts compared slide versions with and without narrative structure optimization. The optimized version received higher ratings for both content accuracy and narrative flow while maintaining appropriate information density—indicating optimization didn’t sacrifice content.
Step-by-Step Installation and Usage Guide
System Requirements and Installation
Auto-Slides is Python-based. Install it with the following steps:
# 1. Clone repository
git clone https://github.com/wzsyyh/Auto-Slides.git
cd Auto-Slides
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Download PDF parsing model (~2GB)
python down_model.py
Critical Step: Configure your OpenAI API key. Rename the ".env copy" file in the project to .env, then add your API key:
OPENAI_API_KEY=your_actual_api_key_here
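Before a long run, it can help to confirm that the key was actually filled in. The sketch below is a standalone check using only the standard library; `parse_env` and `api_key_problem` are hypothetical helpers, not part of Auto-Slides, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_problem(env):
    """Return a description of what is wrong with the key, or None if it looks set."""
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        return "OPENAI_API_KEY is missing"
    if key == "your_actual_api_key_here":
        return "OPENAI_API_KEY still holds the placeholder value"
    return None


sample = "# local settings\nOPENAI_API_KEY=your_actual_api_key_here\n"
print(api_key_problem(parse_env(sample)))
```

To check a real file, pass it the file's contents, for example `parse_env(open(".env").read())`.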
Basic Usage: From Paper to Presentation
The simplest usage is pointing to your PDF file:
python main.py path/to/your/paper.pdf
The system automatically completes the entire workflow: PDF parsing → structure planning → LaTeX generation → PDF compilation.
Advanced Customization Options
For more control, specify various parameters:
# --theme: Beamer theme; --language: output language (en = English)
# --enable-speech: also generate a speech script; --speech-duration: target length in minutes
python main.py paper.pdf \
  --theme Berlin \
  --language en \
  --enable-speech \
  --speech-duration 20
I particularly like the --enable-speech option, which generates an accompanying speech script based on the slide content, including timing allocation and transition statements. It is extremely useful for presentation rehearsals.
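One simple way to think about timing allocation is to split the total speech duration across slides in proportion to how much text each slide carries. The sketch below shows that arithmetic only; `allocate_minutes` is a hypothetical function, and the article does not say how Auto-Slides actually distributes time.

```python
def allocate_minutes(slide_word_counts, total_minutes):
    """Split total_minutes across slides in proportion to their word counts."""
    total_words = sum(slide_word_counts)
    if total_words == 0:
        raise ValueError("at least one slide must contain text")
    return [round(total_minutes * words / total_words, 1) for words in slide_word_counts]


# Three slides with 50, 100 and 50 words, for a 20-minute talk.
minutes = allocate_minutes([50, 100, 50], 20)
print(minutes)
```

In this example the middle slide has twice as many words as the others, so it receives half of the 20 minutes.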
Interactive Revision: Tailoring Slides to Your Needs
By default, Auto-Slides enters interactive mode after generation. You can guide modifications using natural language:
# After system generates initial version, you can input:
"Add comparison with Transformer in related work section"
"Simplify technical details in experimental setup, highlight key points"
"Add a page discussing limitations before conclusion"
The Editor Agent understands structural relationships in academic content, enabling precise execution of these high-level instructions.
Real-World Use Case: From Technical Paper to Clear Presentation
To demonstrate Auto-Slides’ practical effectiveness, I tested it with a classic NeRF (Neural Radiance Fields) paper[1]. This paper contains complex mathematical formulas, 3D reconstruction results, and extensive comparative experiments.
Input: Original PDF paper (40 pages, dense technical content)
Processing Command:
python main.py NeRF.pdf --theme Madrid --enable-verification
Output Results:
- 18 well-structured Beamer slides
- All mathematical formulas correctly rendered
- Key comparison tables completely preserved
- Follows “Problem→Methods→Results→Outlook” narrative flow
- Automatically generated speaker notes (~15 minutes content)
Most impressively, the system automatically integrated similar experimental results originally scattered across multiple sections into the same slide, forming compelling evidence chains—this level of insight typically requires domain expertise.
Key Advantages: Why You Should Try Auto-Slides
Based on extensive use, I believe Auto-Slides’ core advantages include:
- True academic content understanding: Not mere text summarization but reconstruction based on academic logic
- Multimodal processing capabilities: Accurate handling of formulas, charts, and tables, the core academic elements
- Pedagogical principle guidance: Content presentation designed using cognitive science
- Flexible customizability: Intuitive adjustments through natural language interaction
- Academic-grade output quality: LaTeX/Beamer ensures typographic professionalism and consistency
Current Limitations and Future Outlook
Auto-Slides still has room for improvement. Currently, it primarily handles static content and doesn’t support dynamic visualizations or interactive charts in papers. Additionally, its integration capabilities for large appendices or accompanying codebases remain limited.
The development team is already planning next-version features, including visual editing interfaces and support for more media types. Imagine not only generating slides but automatically creating presentation videos or interactive tutorials—that’s the complete vision for AI-assisted academic communication.
Conclusion: When AI Becomes an Academic Collaborator
Auto-Slides represents more than just tool efficiency—it signifies a paradigm shift in academic work. When AI can understand deep paper structures and transform them into effective teaching materials, it essentially functions as a junior research assistant.
The most fascinating aspect is this: It democratizes knowledge dissemination. Whether you’re a domain novice or cross-disciplinary researcher, you can now quickly grasp complex work’s core contributions, freeing more energy for deep thinking and creative work.
Next time you face overwhelming literature and urgent presentation deadlines, perhaps give this intelligent assistant a chance. It won’t replace your academic insights but will definitely liberate you from repetitive labor, letting you focus on truly important scientific discoveries.
Frequently Asked Questions
Q: Does Auto-Slides support Chinese papers?
A: Fully supported. Using the --language zh parameter, the system can parse Chinese papers and generate Chinese slides, including Chinese typography and punctuation processing.
Q: How long does processing a typical paper take?
A: Typically 3-8 minutes depending on paper length and complexity. A 30-page paper takes about 5 minutes, including PDF parsing, multiple LLM calls, and LaTeX compilation.
Q: Can I use it without a LaTeX environment?
A: Yes, using the --skip-compilation parameter generates TeX source files without compilation, which you can then compile on platforms like Overleaf. However, we recommend installing a local LaTeX environment for optimal experience.
Q: What are the API call costs?
A: Processing a typical paper consumes approximately 50,000-100,000 tokens, costing around $0.5-1 using GPT-4o. You can reduce costs by using --disable-verification to skip verification steps.
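The figures in this answer imply a blended rate of roughly $10 per million tokens ($0.5 for 50,000 tokens). The sketch below turns that into a small estimator. The default rate is an assumption derived from the numbers above, not a published price, so replace it with current pricing for your model before relying on it.

```python
def estimate_cost_usd(tokens, usd_per_million_tokens=10.0):
    """Estimate API cost from a token count and a blended per-million-token rate.

    The default rate is inferred from the article's figures and is an assumption.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * usd_per_million_tokens


for tokens in (50_000, 100_000):
    print(f"{tokens:,} tokens -> ${estimate_cost_usd(tokens):.2f}")
```

Input and output tokens are usually billed at different rates, so a single blended rate is only a rough guide.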
Q: Can generated slides be further edited?
A: Absolutely. The output is standard LaTeX source files that you can adjust with any TeX editor. The system also supports revision mode based on existing TeX files.
References
[1] Mildenhall, B., et al. “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.” ECCV 2020.
[2] Yang, Y., et al. “Auto-Slides: Automatic Academic Presentation Generation with Multi-Agent Collaboration.” arXiv:2509.11062 (2025).
Project Repository: https://github.com/wzsyyh/Auto-Slides
Online Demo: https://auto-slides.github.io/
This article is based on Auto-Slides official documentation and research paper, with all technical details verified. If you encounter issues during use, please feel free to open an Issue on the GitHub repository—the community is happy to help.