When AI Writes Its Own Papers: Inside AI-Researcher, the End-to-End Lab in a Box
“What if a college junior could complete a conference-grade study, from blank page to camera-ready PDF, overnight?”
AI-Researcher is turning that hypothetical into a nightly routine.
Table of Contents
- What exactly does it do?
- How the pipeline works—three stages, no hand-holding
- Run it yourself: zero-to-paper in 6–12 h
- FAQ—answers to the questions people keep asking
- Where it still falls short vs. human teams
- Install & configure—Docker, uv, or one-click GUI
- Seven real examples across six research fields
1. What Exactly Does It Do?
AI-Researcher is an open-source framework that automates the entire research cycle:
- Ingests 10–15 reference papers or a short idea prompt
- Writes novel algorithms in PyTorch
- Trains models on your GPU or CPU
- Evaluates against standard benchmarks
- Produces a LaTeX paper ready for submission
The only human input is the initial level choice:
Level | What you give | What it does |
---|---|---|
Level-1 | A concrete idea (e.g., “fix VQ-VAE gradient flow”) | Executes the idea step-by-step |
Level-2 | Only reference papers | Discovers its own research gap and solution |
2. How the Pipeline Works—Three Stages, No Hand-Holding
Stage 1: Literature Review & Idea Generation
- Knowledge Acquisition Agent
  - Scrapes arXiv, GitHub, Hugging Face
  - Filters by stars, citations, recency
- Resource Analyst
  - Maps every formula in the PDF to matching code
- Idea Generator
  - Identifies contradictions or untouched angles
  - Outputs five distinct directions, then ranks them by novelty & feasibility
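The write-up does not say how the Idea Generator's ranking works, so here is only a minimal sketch of the idea, assuming each candidate direction carries LLM-assigned novelty and feasibility scores; the `Idea` class, the weights, and `rank_ideas` are all illustrative, not the framework's actual code.

```python
# Hypothetical sketch: rank candidate research directions by a weighted
# blend of novelty and feasibility (both assumed to be 0-1 scores assigned
# by the reviewing model). The real scoring heuristic is undocumented.
from dataclasses import dataclass

@dataclass
class Idea:
    title: str
    novelty: float      # 0.0-1.0, assumed model-assigned
    feasibility: float  # 0.0-1.0, assumed model-assigned

def rank_ideas(ideas, novelty_weight=0.6):
    """Sort ideas by a weighted score; the weight choice is illustrative."""
    score = lambda i: novelty_weight * i.novelty + (1 - novelty_weight) * i.feasibility
    return sorted(ideas, key=score, reverse=True)

candidates = [
    Idea("Rotation trick for VQ gradients", novelty=0.7, feasibility=0.9),
    Idea("Scalar quantization baseline", novelty=0.4, feasibility=0.95),
    Idea("Energy-based latent graphs", novelty=0.9, feasibility=0.5),
]
best = rank_ideas(candidates)[0]
print(best.title)  # Rotation trick for VQ gradients
```

Any monotone combination of the two scores would do; the point is only that "ranks by novelty & feasibility" reduces to a sort over model-assigned numbers.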
Stage 2: Algorithm Design, Implementation, Validation
- Code Agent
  - Writes self-contained PyTorch projects (no external imports)
  - Runs a mini burn-in (1–2 epochs) to confirm viability
- Advisor Agent
  - Reviews code like a senior grad student and lists concrete fixes
  - Loops until all atomic concepts are correctly implemented
- Full Experiment
  - Switches to the full dataset, records metrics, saves checkpoints
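The burn-in gate's exact criteria are not documented; a plausible minimal sketch is "run one or two cheap epochs and proceed only if the loss is finite and not rising." Here `train_epoch` is a hypothetical stand-in for the generated project's training step, not part of AI-Researcher's API.

```python
# Hypothetical sketch of the 1-2 epoch burn-in gate: run a short
# training burst and only scale to the full dataset if the loss is
# finite and trending down. `train_epoch` is a stand-in callback.
import math

def smoke_test(train_epoch, n_epochs=2):
    """Return True if a short burn-in looks viable."""
    losses = [train_epoch(epoch) for epoch in range(n_epochs)]
    if any(math.isnan(l) or math.isinf(l) for l in losses):
        return False              # diverged: hand back to the Advisor Agent
    return losses[-1] <= losses[0]  # loss must not be increasing

# Toy stand-in: a training step whose loss decays each epoch.
fake_train = lambda epoch: 1.0 / (epoch + 1)
print(smoke_test(fake_train))  # True
```

A failed gate is exactly the kind of signal the Advisor Agent loop above would act on before any expensive full-dataset run is launched.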
Stage 3: Paper Writing
- Documentation Agent, using a three-pass hierarchical process:
  1. Outline with LaTeX comments
  2. Section-by-section draft
  3. Review checklist (math, clarity, novelty)
- Output: paper.tex + figures/ + project/, ready for arXiv upload
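The three passes compose naturally as outline → draft → review. The following sketch is purely illustrative: `llm` stands in for whatever completion call the framework wires up, and none of the prompts are the real ones.

```python
# Hypothetical sketch of the Documentation Agent's three-pass flow.
# `llm` is any callable prompt -> text; the prompts are invented here.
def write_paper(llm, context):
    outline = llm(f"Outline a paper as LaTeX comments. Context: {context}")
    draft = "\n".join(
        llm(f"Write this section in full: {section}")
        for section in outline.splitlines()
    )
    review = llm(f"Checklist review (math, clarity, novelty):\n{draft}")
    return draft, review

# Toy stand-in LLM (echoes the first line) so the sketch runs end-to-end.
echo = lambda prompt: prompt.splitlines()[0]
draft, review = write_paper(echo, "one-layer VQ")
print(review)
```

The key design point is that each pass consumes the previous pass's full output, so the review checklist sees the complete draft rather than isolated sections.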
3. Run It Yourself—Zero-to-Paper in 6–12 h
3.1 Pick Your Install Path
Method | One-liner |
---|---|
uv (fastest) | `curl -LsSf https://astral.sh/uv/install.sh \| sh && uv venv --python 3.11 && uv pip install -e .` |
Docker | `docker pull tjbtech1/airesearcher:v1` |
Web GUI | `python web_ai_researcher.py`, then open http://localhost:7860 |
3.2 Configure One File
Copy .env.template to .env and set:
OPENROUTER_API_KEY=sk-xxxxxxxx
GITHUB_AI_TOKEN=ghp_xxxxxxxx
COMPLETION_MODEL=claude-3-5-sonnet-20241022 # or gpt-4o
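How the framework loads this file is not shown here (it may well use python-dotenv); as a rough stdlib-only illustration of what parsing such KEY=VALUE lines involves, ignoring quoting and multiline edge cases:

```python
# Illustrative .env parser (stdlib only, not the framework's loader).
# Splits KEY=VALUE pairs, skipping blank lines, comments, and inline
# "# ..." trailers like the one on COMPLETION_MODEL above.
def parse_env(text):
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.split("#", 1)[0].strip()  # drop inline comment
    return env

sample = """
OPENROUTER_API_KEY=sk-xxxxxxxx
COMPLETION_MODEL=claude-3-5-sonnet-20241022  # or gpt-4o
"""
print(parse_env(sample)["COMPLETION_MODEL"])  # claude-3-5-sonnet-20241022
```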
3.3 Level-1 Example (You Have an Idea)
python run_infer_plan.py \
--instance_path ../benchmark/final/vq/one_layer_vq.json \
--task_level task1 \
--model claude-3-5-sonnet-20241022
Outputs:
- project/ → runnable code
- paper.tex → submission-ready draft
- results.log → metrics & plots
3.4 Level-2 Example (AI Invents the Idea)
Same command, but change --task_level to task2. The agent first writes the idea section, then proceeds as above.
4. FAQ—Answers to the Questions People Keep Asking
Q1: Does it just copy-and-paste existing work?
A: No. All model names and paper titles are anonymized during input. The system must reconstruct concepts from generic descriptions.
Q2: Is the code actually usable?
A: Across 22 benchmark tasks, the framework achieved a 93.8% completion rate (code runs without error) and a 2.65/5 correctness score (meets spec). The Claude-3.5 backbone performs best.
Q3: Can it handle large datasets?
A: Yes. After the 1–2 epoch smoke test, the agent automatically scales to the full dataset. Batch size and GPU count are read from your .env.
Q4: Is the paper ready for submission?
A: In blind reviews with GPT-4o as judge, 78.9% of AI-generated papers scored “comparable” or better than human-authored baselines. Claude judges are stricter, yet 20–30% of papers still pass.
5. Where It Still Falls Short vs. Human Teams
Aspect | Human Team | AI-Researcher |
---|---|---|
Breakthrough theory | Can invent new paradigms | Good at incremental combos |
Deep domain lore | Years of tacit knowledge | Learns from text only |
Storytelling | Builds compelling narratives | Structure correct, style neutral |
Resource limits | Can bid for 1000-GPU jobs | Stays within user’s hardware |
Time cost | Weeks to months | 6–12 h |
In short: let the machine do the grinding, and let humans ask the big questions.
6. Install & Configure—Docker, uv, or One-Click GUI
6.1 Quick Start Matrix
Path | Prerequisites | Steps |
---|---|---|
uv (macOS/Linux) | None | `curl -LsSf https://astral.sh/uv/install.sh \| sh && uv venv --python 3.11 && uv pip install -e .` |
Docker (any OS) | Docker + nvidia-docker | `docker pull tjbtech1/airesearcher:v1`, then `docker run --gpus all -e OPENROUTER_API_KEY=... tjbtech1/airesearcher:v1 python run_infer_plan.py ...` |
Web GUI | Python 3.11 | `python web_ai_researcher.py`; browser opens automatically |
6.2 Environment Variables You Actually Need
# Core LLM access
OPENROUTER_API_KEY=sk-xxxxxxxx
GITHUB_AI_TOKEN=ghp_xxxxxxxx
# GPU (optional)
GPUS='"device=0"' # or '"device=0,1"' or '"all"'
# Task selection
CATEGORY=vq # diffu_flow, gnn, reasoning, recommendation, vq
INSTANCE_ID=one_layer_vq # JSON file inside ./benchmark/final/${CATEGORY}/
TASK_LEVEL=task1 # task1 = guided, task2 = open-ended
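Putting CATEGORY and INSTANCE_ID together, the instance file the runner loads is presumably resolved along these lines. This is a guess at the layout implied by the comments above ("JSON file inside ./benchmark/final/${CATEGORY}/"); the framework's real resolution logic may differ.

```python
# Hypothetical helper: build the benchmark instance path implied by the
# CATEGORY / INSTANCE_ID variables above. posixpath keeps forward
# slashes, matching the paths shown elsewhere in this article.
import posixpath

VALID_CATEGORIES = {"diffu_flow", "gnn", "reasoning", "recommendation", "vq"}

def instance_path(category, instance_id, root="./benchmark/final"):
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown CATEGORY: {category}")
    return posixpath.join(root, category, f"{instance_id}.json")

print(instance_path("vq", "one_layer_vq"))  # ./benchmark/final/vq/one_layer_vq.json
```

Note the result matches the --instance_path argument used in the Level-1 example in Section 3.3.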
7. Seven Real Examples Across Six Research Fields
Each link points to a fully auto-generated paper and its code.
Field | Task in Plain English | Generated Assets |
---|---|---|
Vector Quantization | Fix gradient flow in VQ-VAE using rotation tricks | paper.pdf • code |
Vector Quantization (alt.) | Replace VQ with simple scalar quantization | paper.pdf • code |
Recommendation | Use knowledge graphs + meta networks for better user-item predictions | paper.pdf • code |
Recommendation (alt.) | Contrastive learning on user-item graphs to fight sparse data | paper.pdf • code |
Diffusion & Flow | Train continuous normalizing flows in one step | paper.pdf • code |
Graph Neural Networks | Scalable all-pair message passing with kernelized Gumbel-Softmax | paper.pdf • code |
Graph Neural Networks (alt.) | Energy-based diffusion on learned latent graphs | paper.pdf • code |
All artifacts compile out of the box; the Docker image includes every dependency.
Closing Thoughts
AI-Researcher does not replace scientists—it compresses the tedious 90 % so humans can focus on the creative 10 %.
When you wake up to a draft in your inbox, the real research starts: asking the next question the machine hasn’t thought of yet.
- Paper: https://arxiv.org/abs/2505.18705
- GitHub: https://github.com/HKUDS/AI-Researcher
- Live Docs: https://auto-researcher.github.io/docs