Meet Bytebot: The Open-Source AI That Actually Uses a Computer for You

Imagine an intern who never sleeps, never complains, and already knows how to drive Firefox, LibreOffice, and the command line.
Bytebot is exactly that—an open-source desktop agent that lives inside its own Ubuntu computer and carries out multi-step tasks while you watch.


Table of Contents

  1. What Is a Desktop Agent, Really?
  2. Why Hand an AI a Full Computer Instead of Just a Browser?
  3. The 2-Minute Setup Guide (Railway or Docker)
  4. Everyday Tasks Bytebot Can Handle Today
  5. Under the Hood: Four Moving Parts
  6. How to Speak to Bytebot: Prompts, Files, and APIs
  7. Real-World Use Cases (Finance, Legal, DevOps, Research)
  8. Frequently Asked Questions
  9. Going Further: Scripting, Kubernetes, and Community Resources

1. What Is a Desktop Agent, Really?

Most AI tools today live in chat windows or browser tabs.
A desktop agent steps outside that box and takes control of an entire graphical operating system. Bytebot ships as a Docker image that spins up:

  • A complete Ubuntu 22.04 desktop with XFCE
  • Pre-installed apps: Firefox, VS Code, LibreOffice, terminal, and more
  • A small NestJS service that translates your plain-English instructions into mouse clicks, key strokes, file moves, and shell commands

In short, you give Bytebot a sentence such as “Download last month’s invoices from three vendor portals and create a summary spreadsheet”—then you watch it open the browser, log in, download PDFs, rename files, and launch LibreOffice Calc without any further help.


2. Why Hand an AI a Full Computer Instead of Just a Browser?

2.1 End-to-End Autonomy

Traditional RPA or browser-only agents break down when a workflow spans multiple applications.
Bytebot’s environment is unrestricted:

Task Step                 | Browser-Only Tool | Bytebot
Log into banking site     | ✅                | ✅
Download PDF statement    | ✅                | ✅
Open PDF, extract table   | ❌                | ✅ (opens built-in PDF reader or LibreOffice)
Paste table into Excel    | ❌                | ✅
Save file to shared drive | ❌                | ✅

2.2 Native File Handling

You drag-and-drop files onto the web UI. Bytebot places them in its own ~/Downloads, ~/Documents, or any folder you specify. From there it can:

  • Read entire PDFs into context
  • Batch-rename images
  • Convert .docx to .pdf with libreoffice --headless --convert-to
  • Run ffmpeg to extract audio from video—all inside the container
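
The .docx-to-.pdf conversion mentioned above can be scripted in a few lines. This is a minimal sketch, assuming the libreoffice binary is on the PATH inside the container; the helper names build_convert_command and convert_folder are illustrative and not part of Bytebot.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, binary: str = "libreoffice") -> list[str]:
    """Return the argv for a headless LibreOffice conversion of one file to PDF."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_folder(folder: Path, out_dir: Path) -> list[Path]:
    """Convert every .docx in `folder` to PDF and return the expected output paths."""
    if shutil.which("libreoffice") is None:
        raise RuntimeError("libreoffice is not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for source in sorted(folder.glob("*.docx")):
        # check=True raises CalledProcessError if LibreOffice exits non-zero
        subprocess.run(build_convert_command(source, out_dir), check=True)
        produced.append(out_dir / (source.stem + ".pdf"))
    return produced
```

Keeping the command builder separate from the loop makes it easy to inspect the exact command before anything is executed.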

2.3 Genuine Cross-Application Workflows

Because the underlying system is Linux, Bytebot can:

  • Launch VS Code, open a project, run npm test, and push to Git
  • Open an email client, forward attachments, then open the same attachments in GIMP for quick edits
  • Install new software on the fly with apt-get—the change persists as long as the container lives

3. The 2-Minute Setup Guide

Option A: One-Click Railway

Deploy on Railway
  1. Click the button.
  2. Paste your AI-provider key (Anthropic, OpenAI, or Google).
  3. Wait ~90 seconds. Railway returns a public URL; open it in your browser.

Option B: Local Docker (Self-Hosted)

# 1. Clone
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# 2. Choose one provider key
echo "ANTHROPIC_API_KEY=sk-ant-..." > docker/.env
# or: echo "OPENAI_API_KEY=sk-..." > docker/.env
# or: echo "GEMINI_API_KEY=..." > docker/.env

# 3. Launch
docker-compose -f docker/docker-compose.yml up -d

# 4. Open the web UI (`open` is macOS; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9992   # or your Docker host IP

Initial image pull is ~2 GB; expect 3–5 minutes on a normal broadband line.
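
Because the first start can take several minutes, it may be convenient to poll the web UI until it answers instead of refreshing the browser by hand. The sketch below uses only the Python standard library; the function names and the default timeout are assumptions chosen for illustration, and only the URL http://localhost:9992 comes from the steps above.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=3):
            return True
    except urllib.error.HTTPError:
        return True   # the server is up, even if it returned 4xx/5xx
    except (urllib.error.URLError, OSError):
        return False  # connection refused, DNS failure, timeout, ...


def wait_until_up(url: str, timeout_s: float = 300.0, interval_s: float = 2.0,
                  probe=http_probe) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_up("http://localhost:9992") right after the launch step returns True once the UI responds and False if the timeout expires first.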


4. Everyday Tasks Bytebot Can Handle Today

Below are copy-paste prompt ideas you can try immediately after setup.

4.1 Simple Web + File Tasks

Prompt: “Go to Wikipedia, search for ‘quantum computing’, and save a one-page summary to ~/Documents/quantum.txt”
Action stream: Opens Firefox → types query → copies intro paragraphs → opens VS Code → pastes → saves.

Prompt: “Take screenshots of the top 5 US news homepages and store them in ~/screenshots with today’s date in the filename”
Action stream: Navigates to CNN, BBC, Reuters… uses Firefox screenshot tool → renames files.

4.2 Document Processing

Prompt: “Read the uploaded contracts.pdf and list every payment deadline in an Excel file”
Action stream: Opens PDF viewer → searches for “due”, “deadline” → copies dates → opens LibreOffice Calc → pastes → saves as payment_deadlines.xlsx.

Prompt: “Process the 50 invoices in ~/Invoices_2024 and create a summary CSV with vendor, amount, and due date columns”
Action stream: Loops over PDFs → extracts text using pdftotext → regex parse → writes CSV.
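
The invoice pipeline (pdftotext, then a regex parse, then a CSV) could look roughly like the sketch below. The field labels "Vendor:", "Total:", and "Due Date:" are assumptions about the invoice layout, not something Bytebot prescribes; real invoices will need their own patterns. The pdftotext call relies on the poppler command-line tool being installed.

```python
import csv
import io
import re
import subprocess
from pathlib import Path

# Hypothetical invoice layout: one labelled field per line.
VENDOR_RE = re.compile(r"^Vendor:\s*(.+)$", re.MULTILINE)
AMOUNT_RE = re.compile(r"^Total:\s*\$?([\d,]+\.\d{2})[ \t]*$", re.MULTILINE)
DUE_RE = re.compile(r"^Due Date:\s*(\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE)


def extract_text(pdf: Path) -> str:
    """Run pdftotext and return the text; '-' sends the output to stdout."""
    result = subprocess.run(["pdftotext", "-layout", str(pdf), "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_invoice(text: str) -> dict:
    """Pull vendor, amount, and due date out of one invoice's text."""
    def first(pattern: re.Pattern) -> str:
        match = pattern.search(text)
        return match.group(1).strip() if match else ""
    return {
        "vendor": first(VENDOR_RE),
        "amount": first(AMOUNT_RE).replace(",", ""),  # drop thousands separators
        "due_date": first(DUE_RE),
    }


def to_csv(rows: list[dict]) -> str:
    """Render parsed invoices as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["vendor", "amount", "due_date"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A field that cannot be found is left empty rather than guessed, so incomplete rows are easy to spot in the resulting CSV.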

4.3 Multi-System Workflows

Prompt: “Log into our CRM, export last month’s customer list, and import it into the ERP”
Action stream: Opens CRM URL → uses password manager → exports CSV → opens ERP → imports → confirms.

5. Under the Hood: Four Moving Parts

  1. Virtual Desktop
    Ubuntu 22.04 + XFCE inside a Docker container. Installed software and user files persist across restarts as long as the container and its volume are kept.

  2. AI Agent Service
    Written in NestJS. Receives your task, breaks it into atomic actions (click, type, read file), and queries the LLM for next steps.

  3. Web UI & Live View
    Next.js front-end. Left panel for task descriptions and file uploads. Right panel shows a live VNC stream of the desktop so you can watch—or take over—at any time.

  4. REST & WebSocket APIs

    • POST /tasks to queue work
    • POST /computer-use for low-level mouse/keyboard commands
    • WebSocket stream for real-time screen updates
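
The loop described for the AI Agent Service (receive a task, ask the model for the next atomic action, execute it, repeat) can be pictured with a small sketch. This is not Bytebot's actual code, which is written in NestJS; the names Action, run_task, planner, and executor are hypothetical, and the planner stands in for the LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                                  # e.g. "click", "type", "read_file"
    args: dict = field(default_factory=dict)   # parameters for that action


# The planner sees the task and the history so far and returns the next
# action, or None when it considers the task finished.
Planner = Callable[[str, list], Optional[Action]]
# The executor performs one action and returns an observation (e.g. screen text).
Executor = Callable[[Action], str]


def run_task(task: str, planner: Planner, executor: Executor,
             max_steps: int = 20) -> list:
    """Alternate between planning and executing until done or out of steps."""
    history = []
    for _ in range(max_steps):
        action = planner(task, history)
        if action is None:
            break
        observation = executor(action)
        history.append((action, observation))
    return history
```

The max_steps cap is a design choice in this sketch: it keeps a confused planner from looping forever, which is the kind of situation where a human takeover is useful.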

6. How to Speak to Bytebot: Prompts, Files, and APIs

6.1 Natural Language in the Web UI

Type as you would to a colleague:

“Download the latest sales report from the shared drive, open it in LibreOffice Calc, and highlight any row where revenue is below 10,000 USD.”

Bytebot maps “shared drive” to a mounted volume, “LibreOffice Calc” to the application menu, and “highlight” to conditional formatting.

6.2 Uploading Files

Drag files onto the task panel. Bytebot places them in ~/uploads/<task-id>/. From there you can reference them in prompts:

“Read the file ~/uploads/123/report.pdf and create a 300-word summary in Markdown.”

6.3 Programmatic Control

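import requests

BASE_URL = 'http://localhost:9991'  # task API port used in this guide

# Simple task
resp = requests.post(f'{BASE_URL}/tasks', json={
    'description': 'Generate a daily backup script and save it to ~/backups/backup.sh'
}, timeout=30)
resp.raise_for_status()  # fail loudly on a 4xx/5xx response

# Task with file: the context manager closes the file handle after the upload
with open('data.zip', 'rb') as fh:
    resp = requests.post(f'{BASE_URL}/tasks',
        data={'description': 'Unzip data.zip and list all filenames in result.txt'},
        files={'files': fh},
        timeout=30)
resp.raise_for_status()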

6.4 Low-Level Commands

# Move mouse and click
curl -X POST http://localhost:9990/computer-use \
  -H "Content-Type: application/json" \
  -d '{"action": "click_mouse", "coordinate": [500, 300]}'
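
The same low-level call can be made from Python. The sketch below mirrors the curl example exactly (same URL, same "action" and "coordinate" fields) and uses only the standard library; the helper names are illustrative, and no fields beyond those shown in the curl command are assumed.

```python
import json
import urllib.request

COMPUTER_USE_URL = "http://localhost:9990/computer-use"  # endpoint from the curl example


def click_payload(x: int, y: int) -> dict:
    """Build the JSON body for a mouse click at screen position (x, y)."""
    return {"action": "click_mouse", "coordinate": [x, y]}


def build_request(payload: dict, url: str = COMPUTER_USE_URL) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload: dict) -> str:
    """Send the command to the running desktop and return the raw response body."""
    with urllib.request.urlopen(build_request(payload), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the container running, send(click_payload(500, 300)) issues the same click as the curl command.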

7. Real-World Use Cases

7.1 Finance & Accounting

  • Invoice ingestion: Log into multiple vendor portals, download PDFs, extract totals, update internal bookkeeping sheet.
  • Bank reconciliation: Fetch monthly statements, match transactions against ERP exports, flag discrepancies.
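
The matching step in bank reconciliation is the kind of script Bytebot might write and run inside its desktop. Here is a minimal sketch, assuming each transaction has already been reduced to a (date, amount, reference) tuple; that record shape and the function name reconcile are assumptions made for illustration.

```python
from collections import Counter
from decimal import Decimal


def reconcile(bank: list[tuple], erp: list[tuple]) -> tuple[list, list]:
    """Compare two transaction lists and return (only_in_bank, only_in_erp).

    Counter subtraction keeps duplicates honest: if the bank shows a payment
    twice and the ERP shows it once, one copy is flagged.
    """
    bank_counts = Counter(bank)
    erp_counts = Counter(erp)
    only_in_bank = sorted((bank_counts - erp_counts).elements())
    only_in_erp = sorted((erp_counts - bank_counts).elements())
    return only_in_bank, only_in_erp
```

Amounts are best passed as Decimal values rather than floats so that 100.00 and 100.0 compare equal without rounding surprises.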

7.2 Legal & Compliance

  • Contract review: Batch open agreements, search for liability clauses, export findings into a due-diligence report.
  • GDPR data mapping: Scan shared folders for personal data patterns, produce inventory spreadsheet.
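
Scanning for personal data patterns usually starts with a handful of regular expressions. The patterns below are deliberately simplified assumptions (an email shape and an international phone shape) and will miss or over-match real-world cases; the function names and the restriction to .txt files are likewise illustrative.

```python
import re
from pathlib import Path

# Simplified, illustrative patterns; a real inventory needs a reviewed set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+\d{1,3}[ -]?\d{2,4}(?:[ -]?\d{2,4}){2,3}"),
}


def scan_text(text: str) -> dict[str, int]:
    """Count how often each pattern occurs in a piece of text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def scan_folder(root: Path) -> list[tuple[str, str, int]]:
    """Return (file, pattern name, hit count) rows for every .txt file under root."""
    rows = []
    for path in sorted(root.rglob("*.txt")):
        counts = scan_text(path.read_text(errors="ignore"))
        rows.extend((str(path), name, hits) for name, hits in counts.items() if hits)
    return rows
```

The rows returned by scan_folder map directly onto the inventory spreadsheet: one line per file and data category.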

7.3 Software QA & DevOps

  • Cross-browser UI testing: Open Firefox, Chrome, and Edge in sequence, run the same manual test steps, take screenshots for comparison.
  • Release verification: After CI/CD finishes, spin up Bytebot, install the new build, run end-to-end smoke tests.

7.4 Market Research

  • Competitive price monitoring: Visit competitor e-commerce sites daily, capture price tables, save CSV with timestamps.
  • Sentiment scraping: Read public forums, save threads to local DB, run custom sentiment analysis script.

8. Frequently Asked Questions

Is my data safe?

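Everything runs in your container or Railway project. The AI provider receives what the agent needs to decide the next action: your prompt, what is currently on screen, and any file content Bytebot reads into context. Everything else, including stored files and saved credentials, stays inside the environment.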

Can Bytebot run Windows apps?

The base image is Ubuntu. You can install Wine for some Windows software or connect Bytebot to a separate Windows RDP host if required.

How do I store passwords securely?

Install a password-manager browser extension such as Bitwarden or 1Password in the desktop's browser and sign in once. Subsequent tasks can auto-fill logins via the password manager.

What if a task gets stuck?

Hit “Takeover” in the web UI. Your mouse and keyboard events are forwarded to the virtual desktop so you can intervene, then click “Resume” to let the AI continue.

Which AI models are supported?

Out of the box: Anthropic Claude, OpenAI GPT-4/3.5, Google Gemini. Through LiteLLM you can route to Azure OpenAI, AWS Bedrock, or local Ollama models.


9. Going Further: Scripting, Kubernetes, and Community Resources

9.1 Customize the Environment

# Inside the running container
sudo apt update && sudo apt install -y imagemagick tesseract-ocr

Any new software is available to all future tasks that use the same container volume.

9.2 Helm Chart for Enterprise

git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot
helm install bytebot ./helm \
  --set agent.env.ANTHROPIC_API_KEY=sk-ant-... \
  --set persistence.size=20Gi

Use Kubernetes secrets for keys and persistent volumes for user data.

9.3 Community Templates

The Discord server hosts channels such as #task-recipes and #show-and-tell where users share ready-to-use prompts and shell scripts.


Ready to Try?

Click the Railway button or run docker-compose up—in under five minutes you’ll have a tireless digital coworker waiting for its first assignment.