Meet Bytebot: The Open-Source AI That Actually Uses a Computer for You
Imagine an intern who never sleeps, never complains, and already knows how to drive Firefox, LibreOffice, and the command line.
Bytebot is exactly that—an open-source desktop agent that lives inside its own Ubuntu computer and carries out multi-step tasks while you watch.
Table of Contents
- What Is a Desktop Agent, Really?
- Why Hand an AI a Full Computer Instead of Just a Browser?
- The 2-Minute Setup Guide (Railway or Docker)
- Everyday Tasks Bytebot Can Handle Today
- Under the Hood: Four Moving Parts
- How to Speak to Bytebot: Prompts, Files, and APIs
- Real-World Use Cases (Finance, Legal, DevOps, Research)
- Frequently Asked Questions
- Going Further: Scripting, Kubernetes, and Community Resources
1. What Is a Desktop Agent, Really?
Most AI tools today live in chat windows or browser tabs.
A desktop agent steps outside that box and takes control of an entire graphical operating system. Bytebot ships as a Docker image that spins up:
- A complete Ubuntu 22.04 desktop with XFCE
- Pre-installed apps: Firefox, VS Code, LibreOffice, terminal, and more
- A small NestJS service that translates your plain-English instructions into mouse clicks, keystrokes, file moves, and shell commands
In short, you give Bytebot a sentence such as “Download last month’s invoices from three vendor portals and create a summary spreadsheet”—then you watch it open the browser, log in, download PDFs, rename files, and launch LibreOffice Calc without any further help.
2. Why Hand an AI a Full Computer Instead of Just a Browser?
2.1 End-to-End Autonomy
Traditional RPA or browser-only agents break down when a workflow spans multiple applications.
Bytebot has a full desktop at its disposal, so one agent can carry the workflow from start to finish:
Task Step | Browser-Only Tool | Bytebot |
---|---|---|
Log into banking site | ✅ | ✅ |
Download PDF statement | ✅ | ✅ |
Open PDF, extract table | ❌ | ✅ (opens built-in PDF reader or LibreOffice) |
Paste table into a spreadsheet | ❌ | ✅ (LibreOffice Calc) |
Save file to shared drive | ❌ | ✅ |
2.2 Native File Handling
You drag and drop files onto the web UI. Bytebot places them in its own ~/Downloads, ~/Documents, or any folder you specify. From there it can:
- Read entire PDFs into context
- Batch-rename images
- Convert .docx to .pdf with libreoffice --headless --convert-to
- Run ffmpeg to extract audio from video, all inside the container
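The two command-line conversions in the list above could be driven from a script roughly like this. It is a minimal sketch: the helper names are made up for illustration, and it assumes libreoffice and ffmpeg are on the PATH inside the container.

```python
import subprocess
from pathlib import Path


def docx_to_pdf_cmd(docx: Path, outdir: Path) -> list[str]:
    """Build the LibreOffice command that converts one .docx file to PDF."""
    return ["libreoffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(docx)]


def extract_audio_cmd(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and keeps the audio."""
    return ["ffmpeg", "-i", str(video), "-vn", str(audio)]


def run(cmd: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Calling run(docx_to_pdf_cmd(Path("report.docx"), Path("out"))) would write out/report.pdf. Building the argument list separately from running it keeps the commands easy to inspect before anything executes.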
2.3 Genuine Cross-Application Workflows
Because the underlying system is Linux, Bytebot can:
- Launch VS Code, open a project, run npm test, and push to Git
- Open an email client, forward attachments, then open the same attachments in GIMP for quick edits
- Install new software on the fly with apt-get; the change persists as long as the container lives
3. The 2-Minute Setup Guide
Option A: One-Click Railway
- Click the Railway button.
- Paste your AI-provider key (Anthropic, OpenAI, or Google).
- Wait ~90 seconds. Railway returns a public URL; open it in your browser.
Option B: Local Docker (Self-Hosted)
# 1. Clone
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot
# 2. Choose one provider key
echo "ANTHROPIC_API_KEY=sk-ant-..." > docker/.env
# or: echo "OPENAI_API_KEY=sk-..." > docker/.env
# or: echo "GEMINI_API_KEY=..." > docker/.env
# 3. Launch
docker-compose -f docker/docker-compose.yml up -d
# 4. Open
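open http://localhost:9992 # macOS; on Linux use xdg-open, or substitute your Docker host IP
The initial image pull is ~2 GB; expect 3–5 minutes on a typical broadband connection. The containers run on your machine, but the agent still calls the AI provider you configured, so an internet connection is required unless you route to a local model.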
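Because the first start can take a few minutes, a small readiness check saves you from refreshing the browser by hand. The sketch below polls the web UI address from step 4. The function names, attempt count, and delay are illustrative choices, not part of Bytebot.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_ready(url: str, probe: Callable[[str], bool] = http_ok,
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until `probe` succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

For example, wait_until_ready("http://localhost:9992") returns True once the UI responds and False if it never does within the allotted attempts.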
4. Everyday Tasks Bytebot Can Handle Today
Below are copy-paste prompt ideas you can try immediately after setup.
4.1 Simple Web + File Tasks
Prompt | Bytebot Action Stream |
---|---|
“Go to Wikipedia, search for ‘quantum computing’, and save a one-page summary to ~/Documents/quantum.txt” | Opens Firefox → types query → copies intro paragraphs → opens VS Code → pastes → saves. |
“Take screenshots of the top 5 news homepages and store them in ~/screenshots with today’s date in the filename” | Navigates to CNN, BBC, Reuters… → uses Firefox screenshot tool → renames files. |
4.2 Document Processing
Prompt | Bytebot Action Stream |
---|---|
“Read the uploaded contracts.pdf and list every payment deadline in an Excel file” | Opens PDF viewer → searches for “due”, “deadline” → copies dates → opens LibreOffice Calc → pastes → saves as payment_deadlines.xlsx. |
“Process the 50 invoices in ~/Invoices_2024 and create a summary CSV with vendor, amount, and due date columns” | Loops over PDFs → extracts text using pdftotext → regex parse → writes CSV. |
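The invoice row above boils down to three steps: extract text with pdftotext, parse it with regular expressions, and write a CSV. Here is a rough sketch of that pipeline. The label patterns ("Vendor:", "Total:", "Due Date:") are hypothetical, since real invoices differ per vendor, and the script Bytebot actually writes may look nothing like this.

```python
import csv
import io
import re
import subprocess

# Hypothetical label patterns; real invoices vary and need their own regexes.
INVOICE_PATTERNS = {
    "vendor": re.compile(r"^Vendor:\s*(.+)$", re.MULTILINE),
    "amount": re.compile(r"^Total:\s*\$?([\d,]+\.\d{2})$", re.MULTILINE),
    "due_date": re.compile(r"^Due Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
}


def pdf_text(path: str) -> str:
    """Extract text with pdftotext; '-' sends the output to stdout."""
    result = subprocess.run(["pdftotext", "-layout", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_invoice(text: str) -> dict[str, str]:
    """Pull vendor, amount, and due date out of the text; missing fields become ''."""
    row = {}
    for name, pattern in INVOICE_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1).strip() if match else ""
    return row


def to_csv(rows: list[dict[str, str]]) -> str:
    """Render parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(INVOICE_PATTERNS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Leaving unmatched fields empty rather than raising makes it easy to spot the invoices that need a human look.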
4.3 Multi-System Workflows
Prompt | Bytebot Action Stream |
---|---|
“Log into our CRM, export last month’s customer list, and import it into the ERP” | Opens CRM URL → uses password manager → exports CSV → opens ERP → imports → confirms. |
5. Under the Hood: Four Moving Parts
- Virtual Desktop: Ubuntu 22.04 + XFCE inside a Docker container. Persists installed software and user files across restarts.
- AI Agent Service: Written in NestJS. Receives your task, breaks it into atomic actions (click, type, read file), and queries the LLM for next steps.
- Web UI & Live View: Next.js front-end. Left panel for task descriptions and file uploads. Right panel shows a live VNC stream of the desktop so you can watch, or take over, at any time.
- REST & WebSocket APIs:
  - POST /tasks to queue work
  - POST /computer-use for low-level mouse/keyboard commands
  - WebSocket stream for real-time screen updates
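The agent service's job, as described above, is a loop: ask the model for the next atomic action, perform it, feed the result back, and repeat. The sketch below shows that general pattern in Python. It is not Bytebot's actual code (the real service is written in NestJS), and Action, next_action, and execute are hypothetical stand-ins for the planner and the desktop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str                                  # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)   # action parameters


def run_task(task: str,
             next_action: Callable[[str, list], Action],
             execute: Callable[[Action], str],
             max_steps: int = 20) -> list:
    """Ask the planner for one action at a time until it answers "done"."""
    history = []  # (action, observation) pairs fed back to the planner
    for _ in range(max_steps):
        action = next_action(task, history)
        if action.kind == "done":
            break
        observation = execute(action)          # e.g. a screenshot description
        history.append((action, observation))
    return history
```

The max_steps cap is a design choice worth keeping in any such loop: it stops a confused planner from clicking forever.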
6. How to Speak to Bytebot: Prompts, Files, and APIs
6.1 Natural Language in the Web UI
Type as you would to a colleague:
“Download the latest sales report from the shared drive, open it in LibreOffice Calc, and highlight any row where revenue is below 10,000 USD.”
Bytebot maps “shared drive” to a mounted volume, “LibreOffice Calc” to the application menu, and “highlight” to conditional formatting.
6.2 Uploading Files
Drag files onto the task panel. Bytebot places them in ~/uploads/<task-id>/. From there you can reference them in prompts:
“Read the file ~/uploads/123/report.pdf and create a 300-word summary in Markdown.”
6.3 Programmatic Control
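import requests

AGENT_URL = 'http://localhost:9991/tasks'

# Simple task
resp = requests.post(AGENT_URL, json={
    'description': 'Generate a daily backup script and save it to ~/backups/backup.sh'
}, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors

# Task with file: the description goes in the form data next to the upload
with open('data.zip', 'rb') as fh:  # context manager closes the file handle
    resp = requests.post(AGENT_URL,
                         data={'description': 'Unzip data.zip and list all filenames in result.txt'},
                         files={'files': fh},
                         timeout=30)
resp.raise_for_status()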
6.4 Low-Level Commands
# Move mouse and click
curl -X POST http://localhost:9990/computer-use \
-H "Content-Type: application/json" \
-d '{"action": "click_mouse", "coordinate": [500, 300]}'
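If you would rather issue the same low-level call from Python, something like the following should work. It mirrors the endpoint and JSON body of the curl example exactly. The helper names are invented for this sketch, and no other action types are assumed.

```python
import json
import urllib.request

DESKTOP_URL = "http://localhost:9990/computer-use"  # endpoint from the curl example


def click_payload(x: int, y: int) -> dict:
    """Build the click body in the same shape as the curl example."""
    return {"action": "click_mouse", "coordinate": [x, y]}


def build_request(payload: dict, url: str = DESKTOP_URL) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload: dict, url: str = DESKTOP_URL) -> bytes:
    """POST the payload and return the raw response body."""
    with urllib.request.urlopen(build_request(payload, url), timeout=10) as resp:
        return resp.read()
```

With the container running, send(click_payload(500, 300)) performs the same click as the curl command.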
7. Real-World Use Cases
7.1 Finance & Accounting
- Invoice ingestion: Log into multiple vendor portals, download PDFs, extract totals, update internal bookkeeping sheet.
- Bank reconciliation: Fetch monthly statements, match transactions against ERP exports, flag discrepancies.
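The matching step in bank reconciliation is simple enough to sketch. The version below assumes each transaction has already been reduced to a (date, amount) pair of strings, which is a simplification. Real statements usually need fuzzier matching on references and value dates.

```python
from collections import Counter
from decimal import Decimal


def reconcile(bank: list[tuple[str, str]], erp: list[tuple[str, str]]) -> dict:
    """Compare (date, amount) pairs and report entries found on one side only."""
    def as_key(row):
        date, amount = row
        return (date, Decimal(amount))  # Decimal avoids float rounding on money

    bank_counts = Counter(as_key(r) for r in bank)
    erp_counts = Counter(as_key(r) for r in erp)
    return {
        # Counter subtraction keeps only the surplus on the left-hand side
        "missing_in_erp": sorted((bank_counts - erp_counts).elements()),
        "missing_in_bank": sorted((erp_counts - bank_counts).elements()),
    }
```

Using counters instead of sets means two identical payments on the same day are still treated as two transactions.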
7.2 Legal & Compliance
- Contract review: Batch open agreements, search for liability clauses, export findings into a due-diligence report.
- GDPR data mapping: Scan shared folders for personal data patterns, produce inventory spreadsheet.
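Scanning for "personal data patterns" usually starts with a handful of regular expressions. The sketch below looks for email addresses and phone-like numbers in .txt files. The patterns are deliberately simple and illustrative: they will miss cases and raise false positives, so treat the output as a starting inventory, not a compliance result.

```python
import re
from pathlib import Path

# Deliberately simple patterns; tune them for your own data.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scan_text(text: str) -> dict[str, list[str]]:
    """Return every match per category found in a piece of text."""
    return {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}


def scan_folder(root: Path) -> dict[str, dict[str, int]]:
    """Count matches per category for each .txt file under `root`."""
    inventory = {}
    for path in sorted(root.rglob("*.txt")):
        hits = scan_text(path.read_text(errors="ignore"))
        counts = {name: len(found) for name, found in hits.items() if found}
        if counts:  # only list files that contain at least one match
            inventory[str(path)] = counts
    return inventory
```

The per-file counts from scan_folder map directly onto rows of the inventory spreadsheet.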
7.3 Software QA & DevOps
- Cross-browser UI testing: Open Firefox, Chrome, and Edge in sequence, run the same manual test steps, take screenshots for comparison.
- Release verification: After CI/CD finishes, spin up Bytebot, install the new build, run end-to-end smoke tests.
7.4 Market Research
- Competitive price monitoring: Visit competitor e-commerce sites daily, capture price tables, save CSV with timestamps.
- Sentiment scraping: Read public forums, save threads to local DB, run custom sentiment analysis script.
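For the price-monitoring case, the "CSV with timestamps" part might look like the sketch below. It assumes the prices have already been captured into a dictionary. The column names and the append-only layout are illustrative choices.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def price_rows(prices: dict, now: Optional[datetime] = None) -> list:
    """Turn {product: price} into rows stamped with one shared UTC timestamp."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return [[stamp, product, f"{price:.2f}"]
            for product, price in sorted(prices.items())]


def append_rows(csv_path: Path, rows: list) -> None:
    """Append rows to the CSV, writing a header first if the file is new."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["timestamp", "product", "price"])
        writer.writerows(rows)
```

Appending one batch per day to the same file gives you a price history you can chart later without any database.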
8. Frequently Asked Questions
Is my data safe?
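Everything runs in your container or Railway project. To decide the next action, the AI provider receives your prompts plus whatever the agent has in context, such as screenshots of the desktop and any file contents it reads, so treat anything shown on screen as shared with that provider. Files left on disk and stored credentials stay inside the environment.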
Can Bytebot run Windows apps?
The base image is Ubuntu. You can install Wine for some Windows software or connect Bytebot to a separate Windows RDP host if required.
How do I store passwords securely?
Install a password manager extension such as Bitwarden or 1Password in the desktop’s Firefox once and sign in. Subsequent tasks can auto-fill logins via the password manager.
What if a task gets stuck?
Hit “Takeover” in the web UI. Your mouse and keyboard events are forwarded to the virtual desktop so you can intervene, then click “Resume” to let the AI continue.
Which AI models are supported?
Out of the box: Anthropic Claude, OpenAI GPT-4/3.5, Google Gemini. Through LiteLLM you can route to Azure OpenAI, AWS Bedrock, or local Ollama models.
9. Going Further: Scripting, Kubernetes, and Community Resources
9.1 Customize the Environment
# Inside the running container
sudo apt update && sudo apt install imagemagick tesseract-ocr
Any new software is available to all future tasks that use the same container volume.
9.2 Helm Chart for Enterprise
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot
helm install bytebot ./helm \
--set agent.env.ANTHROPIC_API_KEY=sk-ant-... \
--set persistence.size=20Gi
In production, store API keys in Kubernetes Secrets rather than passing them on the command line with --set, and use persistent volumes for user data.
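As one way to prepare such a Secret, the sketch below generates a manifest that kubectl apply -f accepts (Kubernetes reads JSON as well as YAML). The Secret name bytebot-keys is hypothetical, and how the Helm chart consumes an existing Secret is not covered here, so check the chart's values before relying on it.

```python
import base64
import json


def secret_manifest(name: str, values: dict) -> dict:
    """Build an Opaque Secret; Kubernetes expects base64-encoded `data` values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {key: base64.b64encode(value.encode("utf-8")).decode("ascii")
                 for key, value in values.items()},
    }


def manifest_json(name: str, values: dict) -> str:
    """Serialize the manifest so it can be saved to a file or piped to kubectl."""
    return json.dumps(secret_manifest(name, values), indent=2)
```

Keep in mind that base64 is an encoding, not encryption, so the generated file is as sensitive as the key itself.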
9.3 Community Templates
The Discord server hosts channels such as #task-recipes and #show-and-tell where users share ready-to-use prompts and shell scripts.
Ready to Try?
Click the Railway button or run docker-compose up; in under five minutes you’ll have a tireless digital coworker waiting for its first assignment.