AI Researcher: A Complete Guide to Building Autonomous Research Agents
Core Question: How Can AI Automate the Entire Research Process from Design to Execution?
AI Researcher represents a revolutionary autonomous research system capable of receiving a research objective, automatically breaking it down into executable experiments, assigning them to specialized research agents, and finally generating paper-level reports. The most striking feature of this system is that each agent can launch GPU sandboxes to train models, run inference, and evaluate results, truly achieving end-to-end automated research workflows.
1. System Overview and Core Value
1.1 How AI Researcher Transforms Traditional Research Models
Traditional research processes often require researchers to manually design experiments, configure environments, run code, collect data, and write reports. This process is not only time-consuming but also susceptible to human bias. AI Researcher fundamentally changes this paradigm through the following approaches:
- Intelligent Task Decomposition: The system automatically breaks down complex research objectives into executable experimental tasks
- Parallel Experiment Execution: Multiple specialized agents work simultaneously, each with access to independent GPU resources
- Dynamic Decision Mechanism: The system decides whether to continue deeper exploration or adjust direction based on experimental results
- Automatic Report Generation: All experimental results are integrated into a coherent academic paper format
1.2 Core Components of the Technical Architecture
The system is built upon three key technical pillars: a lightweight HTTP API, the Modal cloud platform, and an intelligent orchestrator. The HTTP API serves as the frontend interface, passing user requests to the backend CLI program; Modal provides GPU computing resources and persistent storage; the orchestrator handles task allocation and result integration.
2. Quick Start: Launch Your AI Research Assistant in Three Steps
2.1 Environment Setup and Dependency Installation
Core Question: How can you quickly set up the AI Researcher environment?
The simplest approach is to use the one-click startup script provided by the project:
python run_app.py
This command automatically completes the following operations:
- Checks and installs missing Python dependency packages
- Starts the API service and frontend interface
- Opens the interactive notebook in your browser
For manual dependency installation, use:
pip install -r requirements.txt
2.2 Key Configuration and Permission Settings
The system requires the following types of API keys to function properly:
LLM Service Keys (at least one required):
- Google AI Studio: GOOGLE_API_KEY (for Gemini 3 Pro)
- Anthropic: ANTHROPIC_API_KEY (for Claude Opus 4.5)

Modal Platform Keys:
- MODAL_TOKEN_ID
- MODAL_TOKEN_SECRET
These keys can be configured through two methods:
- Create a .env file in the project root directory and add the keys there (see the example below)
- Enter them directly in the Web UI (the system will save them automatically)
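For reference, a minimal .env file with placeholder values (using the variable names listed in this section) might look like this:

GOOGLE_API_KEY=your-google-ai-studio-key
ANTHROPIC_API_KEY=your-anthropic-key
MODAL_TOKEN_ID=your-modal-token-id
MODAL_TOKEN_SECRET=your-modal-token-secret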
2.3 Model Selection and Configuration
AI Researcher supports two major large language models:
- Gemini 3 Pro: Google’s latest large model, suitable for complex reasoning tasks
- Claude Opus 4.5: Anthropic’s flagship model, excelling in academic writing
Users can select models from the dropdown menu in the Web UI or specify via CLI parameters:
--model gemini-3-pro-preview
3. Deep Dive into Modal Platform: The Art of GPU Resource Management
3.1 Understanding Core Concepts
Core Question: How does Modal simplify GPU resource acquisition and management?
Modal employs several core abstractions to simplify cloud resource management (the sketch after this list combines all four):
- App: The basic unit of deployment, defined through modal.App("app-name")
- Image: The runtime environment, containing the operating system and Python packages
- Function: The actual running code, marked with the @app.function decorator
- Volume: Persistent storage for large files (datasets, models)
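To see how these pieces fit together, here is a minimal, self-contained sketch; the app name, package, and function are illustrative, not taken from the AI Researcher codebase:

import modal

app = modal.App("research-demo")  # App: the basic unit of deployment
image = modal.Image.debian_slim().pip_install("numpy")  # Image: OS + Python packages
volume = modal.Volume.from_name("demo-data", create_if_missing=True)  # Volume: persistent storage

@app.function(image=image, volumes={"/data": volume})  # Function: code that runs remotely
def analyze():
    import numpy as np
    print(np.arange(5).sum())

Invoking this file with the modal run CLI (e.g. modal run demo.py::analyze) executes analyze() in the cloud with the image attached and the volume mounted.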
3.2 GPU Resource Selection Strategies
Modal offers multiple GPU options, each suited for different scenarios:
@app.function(gpu="A100:4")  # Request 4 A100 GPUs
def distributed_training():
    import torch
    print(f"Available GPU count: {torch.cuda.device_count()}")
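Other GPU strings follow the same pattern; the examples below are illustrative choices rather than recommendations, since availability and pricing vary:

@app.function(gpu="T4")    # Small and inexpensive: good for smoke tests
def quick_check(): ...

@app.function(gpu="A10G")  # Mid-range: a common choice for inference
def run_inference(): ...

@app.function(gpu="any")   # Let Modal schedule whatever GPU is free first
def flexible_job(): ...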
3.3 Best Practices for Persistent Storage
Core Question: How can you ensure experimental data consistency across multiple runs?
Volume is Modal’s persistent storage solution, particularly suitable for storing datasets and model files:
# Create or get the volume
volume = modal.Volume.from_name("research-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def process_dataset():
    # Read data
    with open("/data/raw_dataset.csv", "r") as f:
        data = f.read()
    # Process and save results (transform() stands in for your own processing logic)
    processed = transform(data)
    with open("/data/processed_dataset.csv", "w") as f:
        f.write(processed)
    # Critical step: commit changes so they persist
    volume.commit()
Important Note: Every write operation must be followed by volume.commit(), otherwise data won’t persist. If other functions update the volume, use volume.reload() to see the latest changes.
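The reload side of that contract looks like this in practice (a sketch; the function name is illustrative):

@app.function(volumes={"/data": volume})
def read_latest():
    # Pick up commits made by other functions since this container started
    volume.reload()
    with open("/data/processed_dataset.csv", "r") as f:
        return f.read()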
3.4 Secure Key Management
Core Question: How can you use API keys without exposing sensitive information?
Modal’s Secret feature provides a secure key management solution:
Create the Secret (via the CLI or the Dashboard):

modal secret create hf-secret HF_TOKEN=hf_your_token_here

Then use it in code:
@app.function(secrets=[modal.Secret.from_name("hf-secret")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
    # Use the token to download the model, e.g. via huggingface_hub
    ...
This approach avoids hardcoding keys in code, improving security.
4. API Usage Deep Dive: Building Flexible Research Interfaces
4.1 HTTP API Design Philosophy
AI Researcher’s HTTP API adopts a lightweight design philosophy. It doesn’t change the existing research logic but serves as a wrapper for the current CLI tool. The advantages of this design include:
- Consistency: The API and CLI share the same core logic
- Simplified Maintenance: Only one set of core code needs maintenance
- Streaming Output: CLI output is returned to the client in real time
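A minimal sketch of this wrapper pattern, not the project's actual implementation, could look like the following: a FastAPI handler that launches the existing CLI as a subprocess and streams its stdout back to the client.

import subprocess

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

api = FastAPI()

@api.post("/research")
def research(payload: dict):
    # Build the same invocation a user would type locally
    cmd = [
        "python", "main.py", payload["objective"],
        "--model", payload.get("model", "gemini-3-pro"),
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    def stream():
        # Forward CLI output line by line as it is produced
        for line in proc.stdout:
            yield line

    return StreamingResponse(stream(), media_type="text/plain")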
4.2 Web Endpoint Creation
Core Question: How can you expose research functionality as accessible web services?
Modal provides a simple way to create web endpoints:
@app.function()
@modal.web_endpoint(method="POST")
def research_webhook(data: dict):
    # Process the research request
    objective = data.get("objective")
    model = data.get("model", "gemini-3-pro")
    # Call the research logic
    result = run_research(objective, model)
    return {"status": "completed", "results": result}
After deployment, Modal outputs the endpoint URL in the console, which can be called directly via HTTP requests.
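Once deployed, the endpoint behaves like any HTTP service. For example (the URL is a placeholder for the one Modal prints):

import requests

resp = requests.post(
    "https://your-workspace--research-webhook.modal.run",  # placeholder URL
    json={"objective": "Does label smoothing improve ViT-Base on CIFAR-10?"},
)
print(resp.json())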
4.3 Asynchronous Execution Patterns
For long-running research tasks, Modal provides asynchronous execution options:
# Synchronous call: blocks until the result is returned
result = run_experiment.remote({"params": config})

# Asynchronous call: immediately returns a task handle
job = run_experiment.spawn({"params": config})
# ... execute other tasks in the meantime ...
result = job.get()  # Get the final result
This pattern is particularly suitable for scenarios requiring simultaneous execution of multiple experiments.
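For example, several experiments can be launched at once and collected afterwards (a sketch, assuming run_experiment is a Modal function and configs is a list of parameter dicts):

# Launch every experiment without blocking
jobs = [run_experiment.spawn({"params": cfg}) for cfg in configs]

# Gather results once all experiments have finished
results = [job.get() for job in jobs]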
5. Deployment Options: From Local to Cloud
5.1 Local Development Environment
For development and testing, local running is the most convenient option:
# Single-agent mode
python main.py "Does label smoothing improve ViT-Base on CIFAR-10?" \
    --mode single --gpu any --model gemini-3-pro

# Multi-agent orchestrator mode
python main.py "Characterize scaling laws for sparse attention transformers" \
    --mode orchestrator --num-agents 3 --max-rounds 3 --max-parallel 2 --gpu any
5.2 Railway Cloud Deployment
Core Question: How can you deploy AI Researcher as a publicly accessible web service?
Railway provides a simplified deployment process:
- Click Railway’s “Deploy from GitHub” button
- Connect your GitHub account and select the repository
- Railway automatically detects the Dockerfile and builds the application
- After deployment, access the application via the generated URL
Environment Variable Configuration:
Default environment variables can be set in Railway:
- GOOGLE_API_KEY: Google AI Studio key
- ANTHROPIC_API_KEY: Anthropic key
- MODAL_TOKEN_ID and MODAL_TOKEN_SECRET: Modal authentication credentials
Users can also input their own keys in the Web UI without setting environment variables.
6. Practical Application Cases and Scenarios
6.1 Machine Learning Research Automation
Scenario: Research the impact of different optimizers on Transformer model performance
# Experiment objective
objective = """
Compare the performance of Adam, AdamW, and SGD with momentum
on training a BERT-base model on GLUE benchmark tasks.
Evaluate convergence speed, final accuracy, and training stability.
"""

# Launch the research run, passing the objective text above as the first argument:
python main.py "<objective>" --mode orchestrator --num-agents 3 \
    --gpu A10G --model claude-opus-4.5
The system will automatically:
- Create three independent experiment agents
- Assign each agent an A10G GPU
- Run the different optimizer experiments in parallel
- Collect and compare the results
- Generate a complete report with charts and analysis
6.2 Hyperparameter Optimization Research
Scenario: Find optimal hyperparameter combinations for specific tasks
objective = """
Perform a comprehensive hyperparameter search for ResNet-50
on an ImageNet subset. Focus on learning rate schedules,
batch sizes, and regularization techniques.
Use Bayesian optimization for efficient search.
"""

# Use more agents to accelerate the search:
python main.py "<objective>" --mode orchestrator --num-agents 5 \
    --max-parallel 5 --gpu any
6.3 Model Architecture Comparison Research
Scenario: Compare the effectiveness of different attention mechanisms
objective = """
Compare standard attention, sparse attention, and linear attention
mechanisms in Transformer models. Evaluate on long-sequence
tasks with varying sequence lengths (1k, 4k, 16k tokens).
"""
7. Best Practices and Experience Reflections
7.1 Resource Management Strategies
Reflection: In practical use, GPU resource selection significantly impacts experimental cost and efficiency. We found:
- For inference tasks, A10G provides the best price-performance ratio
- Large-scale training should prioritize the A100 80GB version to avoid memory bottlenecks
- Using gpu="any" significantly reduces waiting time, making it suitable for latency-insensitive tasks
7.2 Experiment Design Principles
Unique Insight: The most powerful aspect of AI Researcher lies in its iterative capabilities. We recommend:
- Start Simple: First run small-scale experiments to validate hypotheses
- Incremental Expansion: Gradually increase complexity and data scale
- Parallel Validation: Use multiple agents to test different directions simultaneously
- Early Termination: Set reasonable evaluation metrics to terminate ineffective experiments promptly (see the sketch below)
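In code, an early-termination check can be as simple as the following sketch (the thresholds and the metric are illustrative):

def should_stop(history, patience=3, min_delta=0.001):
    """Stop if validation accuracy has not improved by at least
    min_delta over the last `patience` evaluations."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return max(history[-patience:]) < best_before + min_delta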
7.3 Cost Optimization Techniques
Lessons Learned: Long-running research tasks can generate unexpectedly high costs. The following optimization strategies have proven effective:
# Use more economical GPUs for preliminary experiments
@app.function(gpu="T4", timeout=300)  # 5-minute timeout
def quick_experiment():
    pass

# Reserve high-end GPUs for final validation
@app.function(gpu="H100", timeout=3600)  # 1-hour timeout
def final_validation():
    pass
8. Practical Summary and Action Checklist
8.1 Quick Start Checklist
- [ ] Install dependencies: pip install -r requirements.txt
- [ ] Configure API keys (Google/Anthropic + Modal)
- [ ] Choose an appropriate model (Gemini 3 Pro or Claude Opus 4.5)
- [ ] Select a GPU type based on task requirements
- [ ] Set up a Volume for data persistence
- [ ] Configure Secrets for sensitive information management
- [ ] Run tests to verify the environment
8.2 One-page Summary
9. Frequently Asked Questions
Q1: How are experiment failures handled?
The system automatically records failure reasons, and the orchestrator decides whether to retry or adjust experimental parameters based on error types.
Q2: Can I limit the runtime of each experiment?
Yes, you can set the timeout parameter in the function decorator, for example @app.function(timeout=1800) limits to 30 minutes.
Q3: How can I monitor experiment progress?
View real-time logs through the Modal Dashboard, or check terminal output when running locally.
Q4: How are datasets shared between agents?
Use Modal Volume mounted to all functions to ensure data consistency.
Q5: Can I customize the experiment report format?
Currently, the system generates standard academic paper format; output format can be customized by modifying templates.
Q6: How can I estimate experimental costs?
The Modal Dashboard provides detailed cost analysis; an A100, for example, runs at approximately $0.001097 per second.
Q7: What types of experiments are supported?
The system can run any Python code, particularly suitable for machine learning, deep learning, and data analysis tasks.
Q8: How can I debug experimental code?
You can run the same code locally for debugging, or view detailed error logs in Modal.
Through AI Researcher, researchers can focus their energy on research design while entrusting tedious experiment execution to automated systems. This not only improves research efficiency but also opens up entirely new research paradigms. As the platform continues to evolve, we look forward to seeing more breakthrough research results emerge through this approach.

