TextGAN-Researcher: How Adversarial AI Agents Argue Their Way to Better Research Reports

A practical, jargon-free guide for anyone who wants reproducible, high-quality documents without burning the midnight oil.


Table of Contents

  1. What Exactly Is TextGAN-Researcher?
  2. Why Traditional LLMs Fall Short—and How This Tool Fills the Gap
  3. Meet the Four AI “Characters” Inside the System
  4. The Execution State: Your Always-Growing, Never-Overwritten Logbook
  5. The Five-Step Workflow: From Blank Page to Polished Report
  6. Real-World Scenarios Where It Shines
  7. Getting Started: Installation, Configuration, and First Run
  8. Frequently Asked Questions (FAQ)
  9. Final Thoughts: Letting AI Debate Itself So You Don’t Have To

1. What Exactly Is TextGAN-Researcher?

Imagine you have two AI interns.

  • Intern A loves to draft reports at lightning speed.
  • Intern B is a meticulous fact-checker who never lets a mistake slide.

Now imagine a referee who records every argument between them—what was wrong, why it was wrong, and what to fix next.
TextGAN-Researcher is the digital equivalent of that three-person team. It is an open-source Python framework that turns research tasks into a zero-sum debate game between specialized AI agents. The end product is a document that has already survived multiple rounds of self-critique, complete with an audit trail you can read line-by-line.


2. Why Traditional LLMs Fall Short—and How This Tool Fills the Gap

For each pain point below, here is the typical LLM behavior and the TextGAN-Researcher fix.

  • Memory loss. Typical LLM behavior: each prompt starts fresh, so earlier mistakes repeat. Fix: the Execution State keeps an immutable log of every attempt.
  • Hallucination. Typical LLM behavior: confidently cites papers that do not exist. Fix: the Rewarder scores credibility and the Reviewer flags fake sources.
  • No traceability. Typical LLM behavior: you cannot see why an answer changed. Fix: every edit, score, and critique is timestamped.
  • Scope creep. Typical LLM behavior: asks get longer and messier. Fix: the Leader agent breaks tasks into atomic sub-questions.
  • Human bottleneck. Typical LLM behavior: you must prompt, review, re-prompt. Fix: the loop is fully automated until quality gates are met.

3. Meet the Four AI “Characters” Inside the System

Each agent has a role, a file in the code, a job description, and a human analog:

  • Leader (leader.py): writes the initial plan, splits the task into checkable items, and seeds the Execution State. Human analog: the project manager who writes the brief.
  • Prover, also called the Generator (generator.py): reads the entire Execution State and drafts the next version of the artifact (report, code, outline). Human analog: the creative writer who learns from past feedback.
  • Rewarder (rewarder.py): gives a fast, scalar truthfulness score from 0 to 100; cheap to run. Human analog: the first-pass reviewer who rejects obvious junk.
  • Reviewer (reviewer.py): expensive, slow, and interpretable; writes bullet-point criticism such as “Fact error #1: citation retracted in 2023. Fix: replace with DOI:10.xxxx.” Human analog: the senior editor who tells you exactly why a paragraph fails.
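
To make the division of labor concrete, here is a minimal, hypothetical sketch of how the four roles could be wired into one loop. The method names (plan, generate, score, critique) and the break condition are illustrative placeholders, not the framework's actual API.

# Illustrative only: how the four roles might plug into one adversarial loop.
# Method names are hypothetical placeholders, not TextGAN-Researcher's real API.
def run_debate(task, leader, prover, rewarder, reviewer,
               max_rounds=20, threshold=90):
    state = [{"round": 0, "plan": leader.plan(task)}]        # Leader seeds the log
    for round_no in range(1, max_rounds + 1):
        draft = prover.generate(state)                       # Prover reads the whole log
        score = rewarder.score(draft)                        # fast, cheap scalar check
        review = [] if score >= threshold else reviewer.critique(draft)
        state.append({"round": round_no, "artifact": draft,
                      "reward_score": score, "review": review})
        if score >= threshold:                               # the real loop also waits for
            break                                            # several high-scoring rounds
    return state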

4. The Execution State: Your Always-Growing, Never-Overwritten Logbook

Think of the Execution State as a diary that never uses an eraser.
Each entry contains:

{
  "round": 7,
  "artifact": "## Section 3.2 ...",
  "reward_score": 68,
  "review": [
    "1. Source [Smith 2021] does not mention cross-attention.",
    "2. Metric F1 is undefined in the paragraph.",
    "3. Suggest adding ablation table for clarity."
  ],
  "timestamp": "2025-07-14T09:42:11Z"
}

Because the log is append-only, you can rewind to any round, inspect what changed, and even re-run from an earlier checkpoint. No more “it worked yesterday but I lost the prompt” moments.
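
To illustrate the append-only idea, here is a small sketch of how such a log could be written and rewound as JSONL (one JSON entry per line). The file layout is an assumption for illustration, not a guarantee about the framework's internals.

# Illustrative only: appending to and rewinding an append-only JSONL logbook.
import json

def append_entry(path: str, entry: dict) -> None:
    with open(path, "a", encoding="utf-8") as f:      # "a" = append, never overwrite
        f.write(json.dumps(entry) + "\n")

def rewind(path: str, up_to_round: int) -> list[dict]:
    """Return every entry recorded up to and including a given round."""
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e.get("round", 0) <= up_to_round]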


5. The Five-Step Workflow: From Blank Page to Polished Report

Step 1: Initialization

The Leader agent writes a one-page plan:

Task: “Create a 2,000-word survey on low-rank adaptation methods for large language models. Must include (a) three representative papers, (b) Python code that reproduces at least one experiment, (c) comparison table.”

The plan is saved as the first entry in the Execution State.

Step 2: Generation

Prover reads the log (which at this point holds only the Leader's plan) and drafts section 1. It may hallucinate a paper or forget a metric—no problem, the next agents will catch it.

Step 3: Fast Screening

Rewarder scores the draft. A score of 90 or above counts toward the convergence check in Step 5; anything lower comes back with the numeric score and a one-line verdict such as “citation missing.” This step is cheap: often a smaller model or a rules-based check.
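
To give a flavor of what a rules-based screen might look like, here is a toy check. The specific rules and numbers are invented for illustration; the real Rewarder would typically call a small model.

# Toy rules-based screen in the spirit of the Rewarder; rules and thresholds are invented.
import re

def quick_score(draft: str) -> tuple[int, str]:
    score, verdict = 100, "looks fine"
    if len(draft.split()) < 300:                         # obviously too short
        score, verdict = 50, "draft too short"
    elif not re.search(r"\[[A-Za-z]+ \d{4}\]", draft):   # e.g. [Smith 2021]
        score, verdict = 60, "citation missing"
    elif "def " not in draft:                            # task asked for a code snippet
        score, verdict = 70, "code snippet missing"
    return score, verdict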

Step 4: Deep Critique

Reviewer loads a heavier model and writes structured feedback. Example:

  • Accuracy: The claim “LoRA reduces VRAM by 90 %” is based on a tweet, not the LoRA paper.
  • Completeness: Missing AdaLoRA (2023) comparison.
  • Code: Provided script fails when rank > 64; add assertion and unit test.
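
One plausible way to represent such feedback internally is as a small structured record per point, which then lands in the Execution State's review list. The field names below are assumptions for illustration only.

# Hypothetical shape for the Reviewer's structured feedback; field names are assumed.
from dataclasses import dataclass

@dataclass
class ReviewItem:
    category: str   # e.g. "Accuracy", "Completeness", "Code"
    problem: str    # what is wrong
    fix: str        # the concrete change the Prover should make

review = [
    ReviewItem("Accuracy", "VRAM claim cites a tweet, not the LoRA paper",
               "replace the source with the LoRA paper"),
    ReviewItem("Completeness", "AdaLoRA (2023) comparison is missing",
               "add AdaLoRA to the comparison table"),
    ReviewItem("Code", "script fails when rank > 64",
               "add an assertion and a unit test"),
]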

Step 5: Convergence Check

After the critique is appended to the Execution State, Prover starts the next round. The loop exits when either:

  • Rewarder score ≥ 90 for three consecutive rounds, or
  • A user-defined maximum round count (default 20) is reached.
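
In code, that exit condition boils down to something like the following sketch, using the defaults mentioned above (threshold 90, three consecutive rounds, 20 rounds maximum).

# Minimal sketch of the two stop conditions described above.
def should_stop(scores, max_rounds=20, threshold=90, streak=3):
    """scores = list of Rewarder scores, one per completed round."""
    if len(scores) >= max_rounds:                          # hard cap on rounds
        return True
    recent = scores[-streak:]
    return len(recent) == streak and all(s >= threshold for s in recent)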

6. Real-World Scenarios Where It Shines

Scenario 1: Weekly Market Brief

A VC analyst needs a four-page summary of AI chip startups every Monday.

  • She schedules a cron job: 0 6 * * 1 python run.py --task "AI chip startups 2025 week notes"
  • By 6:30 AM the Execution State contains the finished PDF plus the full debate log.

Scenario 2: Academic Literature Review

A PhD student wants a reproducible survey on diffusion models for medical imaging.

  • He exports the final log as supplementary material, satisfying the university’s transparency requirement.

Scenario 3: Internal Technical Memo

An engineering team must decide between two vector databases.

  • They run two parallel tasks, compare the Execution States, and merge the best arguments into an internal wiki page.

7. Getting Started: Installation, Configuration, and First Run

7.1 Prerequisites

  • Python 3.9+
  • Git
  • OpenAI API key (or any OpenAI-compatible endpoint)
  • Optional: CUDA drivers if you plan to run local models

7.2 Clone and Install

git clone https://github.com/imbue-bit/TextGAN-Researcher.git
cd TextGAN-Researcher
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

7.3 One-Time Configuration

Edit config/config.yaml:

openai_api_key: "sk-YourKeyHere"
rewarder_model: "gpt-3.5-turbo"   # fast & cheap
reviewer_model: "gpt-4"           # slower but precise
max_rounds: 20
log_dir: "./logs"

7.4 Your First Task

Create a simple task file task.txt:

Write a 300-word beginner-friendly explanation of LoRA.
Include one Python snippet using Hugging Face PEFT.

Then run:

python run.py --task_file task.txt

After 3–5 minutes, check logs/latest/report.md and the full debate in logs/latest/state.jsonl.
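
If you want to inspect the debate programmatically rather than in a text editor, a few lines of Python are enough, assuming each line of state.jsonl is one entry shaped like the example in Section 4.

# Assumes one JSON entry per line, shaped like the example in Section 4.
import json

with open("logs/latest/state.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f if line.strip()]

for entry in entries:
    if "reward_score" in entry:
        print(f'round {entry["round"]:>2}: score {entry["reward_score"]}')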


8. Frequently Asked Questions (FAQ)

Q1: How much does it cost?

Only the Rewarder and Reviewer calls incur API fees. A typical 10-round run with GPT-3.5-turbo + GPT-4 costs ~$0.80.

Q2: Can I swap in open-source models?

Yes. Any model that exposes an OpenAI-style REST endpoint works. Update the model fields in config.yaml.

Q3: Will it loop forever?

No. There are two hard stops: max_rounds and a no-improvement limit (by default, three stagnant rounds).

Q4: Is my data private?

All logs stay on your disk unless you push them to GitHub. No telemetry is sent.

Q5: How big can the Execution State grow?

Roughly 50 kB per round for a 2,000-word article; twenty rounds come to about 1 MB, which is negligible even on a modest laptop.

Q6: Can I pause and resume?

Yes. Copy the latest state.jsonl and later point the Leader to it with --resume_from.

Q7: Does it handle non-English tasks?

Absolutely. If the underlying model supports the language, the whole pipeline works in that language.

Q8: How do I add custom quality checks?

Append new rules to rewarder_rules.txt or extend reviewer_prompt_template.md.

Q9: Is there a GUI?

Not officially. The logs are plain JSONL and Markdown, so any text editor or Jupyter notebook can browse them.

Q10: License?

MIT. Commercial use is allowed; attribution is appreciated.


9. Final Thoughts: Letting AI Debate Itself So You Don’t Have To

TextGAN-Researcher reframes “research” as a structured argument between specialized agents.

  • The Prover brings creativity.
  • The Rewarder enforces quick filters.
  • The Reviewer delivers the deep, actionable critique.
  • The Execution State guarantees nothing is forgotten.

The outcome is not just a report—it is a traceable, reproducible artifact that you can defend in front of any stakeholder. Instead of praying your prompt was perfect the first time, you let the system iterate until the evidence shows it is good enough.

Clone the repo, run your first task, and watch two AI agents politely (but relentlessly) argue their way to a better answer than either could produce alone.

Happy researching.