Introduction
In our daily work, we often need to repeatedly perform various browser operations—filling out forms, downloading files, extracting data, completing login processes, and more. Traditional automation methods rely on writing scripts for specific websites, using XPath or CSS selectors to locate elements. However, any minor change in website layout can cause these scripts to fail.
Now, a smarter solution has emerged. Skyvern fundamentally changes how browser automation is implemented by combining Large Language Models (LLMs) and computer vision technology. It can “see” and understand web page content like a human, comprehend task requirements, and autonomously decide how to operate—all without writing specific code for each website.
This article provides an in-depth look at Skyvern’s working principles, installation and usage methods, core features, and practical application scenarios, helping you fully understand this revolutionary automation tool.
What is Skyvern?
Skyvern is an AI-based browser automation platform that uses LLMs and computer vision to automate various browser workflows. Unlike traditional methods, Skyvern doesn’t require pre-written scripts for specific websites. Instead, it understands web pages’ visual elements and text content to make autonomous decisions and execute operations.
Key Features:
- No need to write website-specific code
- Resilient to website layout changes
- Capable of handling never-before-seen websites
- Supports complex reasoning and decision-making
How Skyvern Works
Skyvern’s design draws inspiration from task-driven autonomous agent architectures like BabyAGI and AutoGPT, but adds a crucial capability: interacting with websites through browser automation libraries like Playwright.
Multi-Agent System Architecture
Skyvern uses a team of specialized agents that collaborate to complete tasks:
- Understanding Agent: Analyzes web page content and identifies interactive elements
- Planning Agent: Develops the sequence of steps needed to complete the task
- Execution Agent: Actually performs browser operations like clicking, typing, and scrolling
- Validation Agent: Confirms whether operation results meet expectations
This division of labor enables Skyvern to handle complex workflows and adjust strategies when encountering unexpected situations.
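The division of labor described above can be pictured as a simple loop. What follows is a minimal sketch under stated assumptions: the `Page` type, the four function names, and the stubbed logic are hypothetical illustrations, not Skyvern's actual classes or API. A real agent would back each step with an LLM and a live browser.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a browser page: field name -> current value."""
    fields: dict[str, str] = field(default_factory=dict)


def understand(page: Page) -> list[str]:
    """Understanding step: list the interactive elements found on the page."""
    return sorted(page.fields)


def plan(elements: list[str], goal: dict[str, str]) -> list[tuple[str, str]]:
    """Planning step: decide which elements to fill, and with what value."""
    return [(name, goal[name]) for name in elements if name in goal]


def execute(page: Page, action: tuple[str, str]) -> None:
    """Execution step: perform one action (here, typing into a field)."""
    name, value = action
    page.fields[name] = value


def validate(page: Page, goal: dict[str, str]) -> bool:
    """Validation step: check that the page now reflects the goal."""
    return all(page.fields.get(name) == value for name, value in goal.items())


def run(page: Page, goal: dict[str, str], max_rounds: int = 3) -> bool:
    """Repeat understand -> plan -> execute until validation passes."""
    for _ in range(max_rounds):
        for action in plan(understand(page), goal):
            execute(page, action)
        if validate(page, goal):
            return True
    return False


page = Page(fields={"email": "", "name": "", "newsletter": ""})
done = run(page, goal={"name": "Ada", "email": "ada@example.com"})
print(done, page.fields)
```

The validation step is what lets the loop retry or give up cleanly: if the goal mentions something the page never exposes, `run` returns `False` after `max_rounds` instead of reporting success.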
Comparison with Traditional Methods
Traditional browser automation typically relies on:
- DOM parsing and XPath selectors
- Pre-written scripts and workflows
- Custom code tailored to specific websites
The main weakness of these methods is their fragility—minor changes in website layout can break automation workflows.
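To make that fragility concrete, here is a small sketch using only Python's standard library. The two markup snippets and the selector are invented for illustration; they stand in for the same login form before and after a cosmetic redesign.

```python
import xml.etree.ElementTree as ET

# Version 1 of a hypothetical page: the form sits directly under <body>.
V1 = "<html><body><form id='login'><input name='user'/></form></body></html>"

# Version 2: the form was wrapped in a <div> and its id was renamed.
V2 = (
    "<html><body><div class='card'>"
    "<form id='signin'><input name='user'/></form>"
    "</div></body></html>"
)

# A selector written against version 1, encoding its exact structure.
BRITTLE_SELECTOR = "./body/form[@id='login']/input"


def locate(markup: str, selector: str = BRITTLE_SELECTOR):
    """Return the matching element, or None if the selector no longer matches."""
    root = ET.fromstring(markup)
    return root.find(selector)


print(locate(V1) is not None)  # the selector matches the original layout
print(locate(V2) is not None)  # the same selector misses the redesigned layout
```

The input field is still present in the second version and a human would find it instantly, but the script that depends on the old structure gets nothing back.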
Skyvern’s fundamentally different approach includes:
- Visual understanding instead of code-based selectors
- Strong adaptability to handle layout changes
- Reasoning capabilities to manage complex situations
For example, when obtaining a car insurance quote from Geico, Skyvern can infer the answer to “Were you eligible to drive at 18?” from the fact that the driver received their license at age 16, without needing explicit instructions.
Performance and Evaluation
In the WebBench benchmark tests, Skyvern demonstrates outstanding performance with an overall accuracy rate of 64.4%. Particularly in “write” tasks (such as form filling, login, file downloads, etc.), Skyvern is the best-performing agent, which is especially important for Robotic Process Automation (RPA) related tasks.
These results suggest that Skyvern is among the strongest available agents for real-world automation tasks, particularly those that change state on a website rather than only reading from it.
Getting Started with Skyvern
Skyvern Cloud Service
For users who don’t want to manage infrastructure, Skyvern Cloud offers a fully managed service. It lets you run multiple Skyvern instances in parallel and includes anti-bot detection mechanisms, a proxy network, and CAPTCHA solvers.
To try Skyvern Cloud, simply visit app.skyvern.com to create an account.
Local Installation and Usage
Environment Requirements
Before starting, ensure your system meets the following requirements:
- Python 3.11.x (supports 3.12, not ready for 3.13 yet)
- NodeJS and NPM
- Additional requirements for Windows users:
  - Rust
  - VS Code with C++ development tools and Windows SDK
Installation Steps
- Install Skyvern
  pip install skyvern
- Initialize Skyvern
  For first-time runs, database setup and migrations are needed:
  skyvern quickstart
- Run Skyvern Service
  skyvern run all
Once completed, visit http://localhost:8080 to use the web interface for creating and managing tasks.
Running Tasks via Code
Besides the web interface, you can also use Skyvern through Python code:
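import asyncio
from skyvern import Skyvern
async def main():
    skyvern = Skyvern()
    task = await skyvern.run_task(prompt="Find today's top post on HackerNews")
    print(task)
# await is only valid inside a coroutine, so run it through asyncio
asyncio.run(main())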
Skyvern executes tasks in a browser window that pops up, automatically closing when the task is complete. You can view task history at http://localhost:8080/history.
Advanced Usage Techniques
Using Your Own Chrome Browser
Note: Starting from Chrome 136, the default user data directory refuses any CDP connections. To use your browser data, Skyvern copies the default user data directory to ./tmp/user_data_dir when first connecting to your local browser.
- Control via Code
import asyncio
from skyvern import Skyvern
# Chrome path example for Mac systems
browser_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
async def main():
    skyvern = Skyvern(
        base_url="http://localhost:8000",
        api_key="YOUR_API_KEY",
        browser_path=browser_path,
    )
    task = await skyvern.run_task(
        prompt="Find today's top post on HackerNews",
    )
asyncio.run(main())
- Control via Skyvern Service
Add the following variables to your .env file:
# Chrome path example for Mac systems
CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
BROWSER_TYPE=cdp-connect
After restarting the Skyvern service, you can run tasks through the UI or code.
Connecting to Remote Browsers
Get the CDP connection URL and pass it to Skyvern:
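import asyncio
from skyvern import Skyvern
async def main():
    # Connect to an already-running browser through its CDP endpoint
    skyvern = Skyvern(cdp_url="Your CDP connection URL")
    task = await skyvern.run_task(
        prompt="Find today's top post on HackerNews",
    )
asyncio.run(main())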
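If the remote browser is a Chromium instance started with remote debugging enabled, its CDP WebSocket URL can usually be read from the browser's `/json/version` HTTP endpoint. The sketch below assumes the debugging port is 9222 (a common choice, not something stated in this article), and the sample response values are made up. Whether Skyvern expects the HTTP address or the WebSocket URL as `cdp_url` is not specified here, so treat this only as a way to discover the endpoint.

```python
import json
from urllib.request import urlopen


def parse_cdp_url(version_payload: str) -> str:
    """Extract the WebSocket debugger URL from a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


def fetch_cdp_url(host: str = "localhost", port: int = 9222) -> str:
    """Ask a running Chromium instance for its CDP WebSocket URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=5) as response:
        return parse_cdp_url(response.read().decode("utf-8"))


# Offline demonstration with a sample response body (values are made up).
sample = json.dumps({
    "Browser": "Chrome/136.0.0.0",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/abc123",
})
print(parse_cdp_url(sample))
```

Calling `fetch_cdp_url()` only works while a browser is actually listening on that port; the parsing step is separated out so it can be checked without one.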
Getting Structured Output
By specifying a data extraction schema, you can ensure output conforms to a specific format:
import asyncio
from skyvern import Skyvern
async def main():
    skyvern = Skyvern()
    task = await skyvern.run_task(
        prompt="Find today's top post on HackerNews",
        # JSON Schema describing the shape of the extracted data
        data_extraction_schema={
            "type": "object",
            "properties": {
                "title": {
                    "type": "string",
                    "description": "The title of the top post"
                },
                "url": {
                    "type": "string",
                    "description": "The URL of the top post"
                },
                "points": {
                    "type": "integer",
                    "description": "Number of points the post has received"
                }
            }
        }
    )
    print(task)
asyncio.run(main())
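Once a task returns, it can be useful to check the extracted data against the same schema before passing it downstream. The following is a minimal sketch, not a full JSON Schema validator: it handles only flat objects with `string` and `integer` properties, and it treats every listed property as required, which is a simplifying assumption (the schema above declares no `required` list). The sample results are invented.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}

# Maps the JSON Schema type names used above to Python types.
PYTHON_TYPES = {"string": str, "integer": int}


def find_schema_errors(data, schema):
    """Return a list of problems; an empty list means the data fits the schema."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = []
    for name, spec in schema["properties"].items():
        if name not in data:
            errors.append(f"missing field: {name}")
            continue
        expected = PYTHON_TYPES[spec["type"]]
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


good = {"title": "Example post", "url": "https://example.com", "points": 120}
bad = {"title": "Example post", "points": "120"}
print(find_schema_errors(good, SCHEMA))
print(find_schema_errors(bad, SCHEMA))
```

For production use, a complete validator library is a better fit; the point here is only that the schema doubles as a contract you can enforce on your side.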
Common Debugging Commands
# Start Skyvern server separately
skyvern run server
# Start Skyvern UI
skyvern run ui
# Check Skyvern service status
skyvern status
# Stop all Skyvern services
skyvern stop all
# Stop Skyvern UI
skyvern stop ui
# Stop Skyvern server
skyvern stop server
Docker Compose Deployment
For users who prefer containerized deployment, Skyvern provides Docker Compose configuration:
- Ensure Docker Desktop is installed and running
- Make sure Postgres is not already running locally (check with the docker ps command)
- Clone the repository and navigate to the root directory
- Run skyvern init llm to generate a .env file (this will be copied to the Docker image)
- Fill in the LLM provider key in docker-compose.yml
- Run the following command:
  docker compose up -d
- Access http://localhost:8080 in your browser to start using the UI
Important Note: Only one Postgres container can run on port 5432 at a time. If switching from CLI-managed Postgres to Docker Compose, you must first remove the original container:
docker rm -f postgresql-container
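Before switching between the two setups, a quick way to see whether something is already listening on port 5432 is to try connecting to it. This is a generic standard-library sketch, not a Skyvern command; the helper name is hypothetical, and it only tells you that the port is taken, not which process or container holds it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


print("Port 5432 in use:", port_in_use(5432))
```

If this reports the port as in use, `docker ps` shows whether a Postgres container is the one holding it.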
Skyvern Core Features
Task Management
Tasks are the fundamental building blocks in Skyvern. Each task represents a single request, instructing Skyvern to navigate a website and complete a specific goal.
Creating a task requires specifying:
- url: Target website address
- prompt: Task description
- Optional data schema: If output needs to conform to a specific structure
- Optional error codes: If you want to stop execution under specific conditions
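The four inputs listed above can be modeled as a small request object. This is an illustrative sketch only: `TaskRequest` and `to_payload` are hypothetical names, `error_code_mapping` is an assumed field name for the error codes, and the example error code and its description are made up.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TaskRequest:
    """Hypothetical container for the inputs a task needs."""
    url: str
    prompt: str
    data_extraction_schema: Optional[dict] = None
    error_code_mapping: Optional[dict] = None  # assumed field name

    def to_payload(self) -> dict:
        """Drop unset optional fields so only meaningful keys remain."""
        return {key: value for key, value in asdict(self).items() if value is not None}


request = TaskRequest(
    url="https://news.ycombinator.com",
    prompt="Find today's top post on HackerNews",
    error_code_mapping={"login_required": "Stop if the site asks the user to log in"},
)
payload = request.to_payload()
print(payload)
```

Keeping the optional fields out of the payload when they are unset makes it obvious which tasks rely on a schema or stopping conditions and which are free-form.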
Workflow Design
Workflows allow chaining multiple tasks together to form coherent work units.
Typical Workflow Examples:
- Invoice Download Workflow:
  - Navigate to invoice page
  - Filter to show invoices after January 1st
  - Extract list of eligible invoices
  - Iterate through each invoice and download
- E-commerce Purchase Workflow:
  - Navigate to target product page
  - Add product to shopping cart
  - Navigate to cart and validate state
  - Complete checkout process
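The invoice example can be sketched as a chain of blocks that pass shared state from one step to the next. Everything here is a stand-in: the block functions, the `run_workflow` helper, and the invoice data are hypothetical, and the cutoff year (2024) is an assumption because the example above does not name one. Real blocks would drive a browser rather than filter an in-memory list.

```python
from datetime import date

# Made-up portal data standing in for a real invoice page.
PORTAL_INVOICES = [
    {"id": "INV-001", "issued": date(2023, 12, 15)},
    {"id": "INV-002", "issued": date(2024, 1, 10)},
    {"id": "INV-003", "issued": date(2024, 2, 3)},
]


def navigate_to_invoices(state: dict) -> dict:
    """Navigation block: load the invoice page."""
    state["visible"] = list(PORTAL_INVOICES)
    return state


def filter_after(cutoff: date):
    """Build a filter block that keeps invoices issued after `cutoff`."""
    def block(state: dict) -> dict:
        state["visible"] = [inv for inv in state["visible"] if inv["issued"] > cutoff]
        return state
    return block


def extract_ids(state: dict) -> dict:
    """Extraction block: collect the ids of the invoices still visible."""
    state["invoice_ids"] = [inv["id"] for inv in state["visible"]]
    return state


def download_each(state: dict) -> dict:
    """Loop block: 'download' every extracted invoice."""
    state["downloads"] = [f"{invoice_id}.pdf" for invoice_id in state["invoice_ids"]]
    return state


def run_workflow(blocks, state=None) -> dict:
    """Run blocks in order, feeding each one the state left by the previous."""
    state = {} if state is None else state
    for block in blocks:
        state = block(state)
    return state


result = run_workflow([
    navigate_to_invoices,
    filter_after(date(2024, 1, 1)),
    extract_ids,
    download_each,
])
print(result["downloads"])
```

Because each block only reads and writes the shared state, steps can be reordered, reused in other workflows, or checked in isolation.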
Supported Workflow Features:
- Browser tasks
- Browser actions
- Data extraction
- Validation
- Loops
- File parsing
- Email sending
- Text prompts
- HTTP request blocks
- Custom code blocks
- Uploading files to block storage
- (Coming soon) Conditional statements
Live Streaming
Skyvern allows streaming the browser viewport to your local machine in real time, letting you watch Skyvern’s operations on web pages as they happen. This is extremely useful for debugging and understanding how Skyvern interacts with websites, allowing for intervention when necessary.
Form Filling
Skyvern natively supports filling out form inputs on websites. When you pass the relevant information through the navigation_goal, Skyvern interprets it and fills out the form accordingly.
Data Extraction
Skyvern can also extract data from websites. You can directly specify a data_extraction_schema in the main prompt to tell Skyvern exactly what data you want to extract from the website in JSONC format. Skyvern’s output will be structured according to the provided schema.
File Downloading
Skyvern supports downloading files from websites. All downloaded files are automatically uploaded to block storage (if configured), and you can access them through the UI.
Authentication Support
Skyvern supports multiple authentication methods, making it easier to automate tasks behind logins. If you’d like to try this feature, contact the Skyvern team via email or Discord.
Two-Factor Authentication (2FA) Support
Skyvern supports multiple 2FA methods, allowing you to automate workflows that require 2FA:
- QR code-based 2FA (like Google Authenticator, Authy)
- Email-based 2FA
- SMS-based 2FA
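Authenticator apps such as Google Authenticator generate their codes with the time-based one-time password algorithm (TOTP, RFC 6238), using the secret encoded in the QR code. As background, here is a minimal standard-library sketch of that algorithm. It illustrates how such codes are derived in general; how Skyvern itself stores secrets or obtains codes is not covered in this article, and the `totp` function name is hypothetical. The demo secret is the published RFC 6238 test key, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    # Authenticator secrets are often shown without base32 padding; restore it.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test key: the ASCII string "12345678901234567890".
demo_secret = base64.b32encode(b"12345678901234567890").decode("ascii")
print(totp(demo_secret, at=59, digits=8))  # RFC 6238 lists 94287082 for T=59
print(totp(demo_secret))                   # current 6-digit code for the demo key
```

Because the code depends only on the shared secret and the clock, an automation tool that holds the secret can produce valid codes without a phone, which is what makes this form of 2FA automatable.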
Password Manager Integration
Skyvern currently supports the following password manager integrations:
- [x] Bitwarden
- [ ] 1Password (in development)
- [ ] LastPass (in development)
Model Context Protocol (MCP) Support
Skyvern supports the Model Context Protocol (MCP), allowing you to use any LLM that supports MCP.
Zapier / Make.com / N8N Integration
Skyvern integrates with Zapier, Make.com, and N8N, allowing you to connect Skyvern workflows to other applications.
Real-World Application Cases
Here are some practical examples of Skyvern in real-world scenarios:
Multi-Website Invoice Downloading
Businesses often need to download invoices from multiple vendor portals, each with different interfaces and navigation flows. Skyvern can automate this process without writing specific code for each website.
Job Application Automation
Job seekers can use Skyvern to automate the process of submitting resumes and filling out application forms, saving significant time.
Manufacturing Material Procurement
Manufacturing companies can use Skyvern to automate the process of finding and procuring raw materials, comparing prices and inventory across multiple supplier websites.
Government Website Account Registration and Form Filling
Skyvern can handle complex registration and form-filling processes on government websites, which often have unique interfaces and validation processes.
Contact Form Filling
Businesses can use Skyvern to automate filling out contact forms across multiple websites for lead generation or partner outreach.
Multi-Language Insurance Quote Retrieval
Insurance companies or comparison websites can use Skyvern to obtain quotes from multiple insurance providers, even when websites use different languages.
Supported LLM Providers
Skyvern supports multiple LLM providers, allowing you to choose the right model based on your requirements, budget, and performance needs.
| Provider | Supported Models |
|---|---|
| OpenAI | gpt4-turbo, gpt-4o, gpt-4o-mini |
| Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Azure OpenAI | Any GPT models, better performance with multimodal LLMs (azure/gpt4-o) |
| AWS Bedrock | Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet) |
| Gemini | Gemini 2.5 Pro and Flash, Gemini 2.0 |
| Ollama | Run any locally hosted model via Ollama |
| OpenRouter | Access models through OpenRouter |
| OpenAI-compatible | Any custom API endpoint following OpenAI API format (via liteLLM) |
Environment Variable Configuration
OpenAI
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_OPENAI | Register OpenAI models | Boolean | true, false |
| OPENAI_API_KEY | OpenAI API Key | String | sk-1234567890 |
| OPENAI_API_BASE | OpenAI API Base URL, optional | String | https://openai.api.base |
| OPENAI_ORGANIZATION | OpenAI Organization ID, optional | String | your-org-id |
Recommended LLM_KEY: OPENAI_GPT4O, OPENAI_GPT4O_MINI, OPENAI_GPT4_1, OPENAI_O4_MINI, OPENAI_O3
Anthropic
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_ANTHROPIC | Register Anthropic models | Boolean | true, false |
| ANTHROPIC_API_KEY | Anthropic API Key | String | sk-1234567890 |
Recommended LLM_KEY: ANTHROPIC_CLAUDE3.5_SONNET, ANTHROPIC_CLAUDE3.7_SONNET, ANTHROPIC_CLAUDE4_OPUS, ANTHROPIC_CLAUDE4_SONNET
Azure OpenAI
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_AZURE | Register Azure OpenAI models | Boolean | true, false |
| AZURE_API_KEY | Azure deployment API key | String | sk-1234567890 |
| AZURE_DEPLOYMENT | Azure OpenAI deployment name | String | skyvern-deployment |
| AZURE_API_BASE | Azure deployment API base URL | String | https://skyvern-deployment.openai.azure.com/ |
| AZURE_API_VERSION | Azure API version | String | 2024-02-01 |
Recommended LLM_KEY: AZURE_OPENAI
AWS Bedrock
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_BEDROCK | Register AWS Bedrock models. To use AWS Bedrock, make sure your AWS configurations are set up correctly first | Boolean | true, false |
Recommended LLM_KEY: BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE
Gemini
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_GEMINI | Register Gemini models | Boolean | true, false |
| GEMINI_API_KEY | Gemini API Key | String | your_google_gemini_api_key |
Recommended LLM_KEY: GEMINI_2.5_PRO_PREVIEW, GEMINI_2.5_FLASH_PREVIEW
Ollama
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_OLLAMA | Register local models via Ollama | Boolean | true, false |
| OLLAMA_SERVER_URL | Ollama server URL | String | http://host.docker.internal:11434 |
| OLLAMA_MODEL | Ollama model name | String | qwen2.5:7b-instruct |
Recommended LLM_KEY: OLLAMA
Note: Ollama doesn’t support vision capabilities yet.
OpenRouter
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_OPENROUTER | Register OpenRouter models | Boolean | true, false |
| OPENROUTER_API_KEY | OpenRouter API key | String | sk-1234567890 |
| OPENROUTER_MODEL | OpenRouter model name | String | mistralai/mistral-small-3.1-24b-instruct |
| OPENROUTER_API_BASE | OpenRouter API base URL | String | https://api.openrouter.ai/v1 |
Recommended LLM_KEY: OPENROUTER
OpenAI-Compatible
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| ENABLE_OPENAI_COMPATIBLE | Register custom OpenAI-compatible API endpoint | Boolean | true, false |
| OPENAI_COMPATIBLE_MODEL_NAME | OpenAI-compatible endpoint model name | String | yi-34b, gpt-3.5-turbo, mistral-large, etc. |
| OPENAI_COMPATIBLE_API_KEY | OpenAI-compatible endpoint API key | String | sk-1234567890 |
| OPENAI_COMPATIBLE_API_BASE | OpenAI-compatible endpoint base URL | String | https://api.together.xyz/v1, http://localhost:8000/v1, etc. |
| OPENAI_COMPATIBLE_API_VERSION | OpenAI-compatible endpoint API version, optional | String | 2023-05-15 |
| OPENAI_COMPATIBLE_MAX_TOKENS | Maximum tokens for completion, optional | Integer | 4096, 8192, etc. |
| OPENAI_COMPATIBLE_TEMPERATURE | Temperature setting, optional | Float | 0.0, 0.5, 0.7, etc. |
| OPENAI_COMPATIBLE_SUPPORTS_VISION | Whether model supports vision, optional | Boolean | true, false |
Supported LLM Key: OPENAI_COMPATIBLE
General LLM Configuration
| Variable | Description | Type | Sample Value |
|---|---|---|---|
| LLM_KEY | The name of the model you want to use | String | See supported LLM keys above |
| SECONDARY_LLM_KEY | The name of the model for mini agents Skyvern runs with | String | See supported LLM keys above |
| LLM_CONFIG_MAX_TOKENS | Override the max tokens used by the LLM | Integer | 128000 |
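With this many variables, a small pre-flight check can catch an incomplete .env before the service starts. The sketch below is a hypothetical helper, not part of Skyvern. It assumes that every variable not marked optional in the tables above is required once its provider's ENABLE_ flag is true; that is an inference from the tables, not documented behavior.

```python
# Variables assumed to be required once the matching ENABLE_ flag is "true".
REQUIRED_BY_PROVIDER = {
    "ENABLE_OPENAI": ["OPENAI_API_KEY"],
    "ENABLE_ANTHROPIC": ["ANTHROPIC_API_KEY"],
    "ENABLE_AZURE": [
        "AZURE_API_KEY",
        "AZURE_DEPLOYMENT",
        "AZURE_API_BASE",
        "AZURE_API_VERSION",
    ],
    "ENABLE_GEMINI": ["GEMINI_API_KEY"],
    "ENABLE_OLLAMA": ["OLLAMA_SERVER_URL", "OLLAMA_MODEL"],
    "ENABLE_OPENROUTER": ["OPENROUTER_API_KEY", "OPENROUTER_MODEL"],
}


def find_config_problems(env: dict) -> list:
    """Return human-readable problems found in an environment mapping."""
    problems = []
    if not env.get("LLM_KEY"):
        problems.append("LLM_KEY is not set")
    for flag, required in REQUIRED_BY_PROVIDER.items():
        # Skip providers that are not switched on.
        if env.get(flag, "").strip().lower() != "true":
            continue
        for name in required:
            if not env.get(name):
                problems.append(f"{flag}=true but {name} is not set")
    return problems


incomplete = {"LLM_KEY": "OPENAI_GPT4O", "ENABLE_OPENAI": "true"}
complete = {**incomplete, "OPENAI_API_KEY": "sk-1234567890"}
print(find_config_problems(incomplete))
print(find_config_problems(complete))
```

To check the live process environment rather than a hand-built dictionary, pass `dict(os.environ)` after importing `os`.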
Developer Setup
For developers who want to contribute code or customize Skyvern, here are the steps to set up the development environment:
Make sure you have uv installed.
- Create virtual environment (.venv)
  uv sync --group dev
- Perform initial server configuration
  uv run skyvern quickstart
- Access http://localhost:8080 in your browser to start using the UI
Skyvern CLI supports Windows, WSL, macOS, and Linux environments.
Feature Roadmap
The Skyvern team has a clear development plan. Here are the main goals for the coming months:
- [x] Open Source – Open source Skyvern core codebase
- [x] Workflow Support – Support chaining multiple Skyvern calls together
- [x] Improved Context Understanding – Enhance Skyvern’s ability to understand content around interactive elements by providing relevant label context through text prompts
- [x] Cost Optimization – Improve stability and reduce running costs by optimizing the context tree passed to Skyvern
- [x] Self-Service UI – Replace Streamlit UI with React-based UI components allowing users to launch new tasks in Skyvern
- [x] Workflow UI Builder – Introduce UI allowing users to visually build and analyze workflows
- [x] Chrome Viewport Streaming – Introduce method to stream Chrome viewport to user’s browser in real time
- [x] Historical Run UI – Replace Streamlit UI with React-based UI allowing visualization of historical runs and their results
- [x] Auto Workflow Builder (“Observer” Mode) – Allow Skyvern to automatically generate workflows while browsing the web, making it easier to build new workflows
- [x] Prompt Caching – Introduce caching layer for LLM calls, significantly reducing Skyvern running costs
- [x] Web Evaluation Dataset – Integrate Skyvern with public benchmark tests to track model quality over time
- [ ] Improved Debug Mode – Allow Skyvern to plan actions and get “approval” before execution, facilitating debugging and prompt iteration
- [ ] Chrome Extension – Allow users to interact with Skyvern through Chrome extension
- [ ] Skyvern Action Recorder – Allow Skyvern to observe users completing tasks and automatically generate workflows
- [ ] Interactive Live Streaming – Allow users to interact with streams in real time for intervention when necessary
- [ ] Integrated LLM Observability Tools – Integrate LLM observability tools allowing backtesting of prompt changes with specific datasets
- [x] Langchain Integration – Create integration in langchain_community to use Skyvern as a “tool”
Frequently Asked Questions
How is Skyvern different from traditional RPA tools?
Traditional RPA tools typically rely on recording and playback techniques or scripts based on XPath/CSS selectors—methods that often fail when website layouts change. Skyvern uses LLMs and computer vision to understand web page content, adapt to layout changes, handle never-before-seen websites, and employ reasoning capabilities for complex situations.
Can Skyvern handle websites that require login?
Yes, Skyvern supports multiple authentication methods, including username/password login and two-factor authentication (2FA). It supports QR code-based, email-based, and SMS-based 2FA, and can integrate with password managers like Bitwarden.
How does Skyvern ensure data security?
When using local deployment, all data remains in your environment. Skyvern’s open-source version doesn’t include the anti-bot detection features available in the cloud service, but the core automation logic is identical. If you have licensing questions, you can contact the support team.
Which browsers does Skyvern support?
Skyvern is primarily optimized for Chromium-based browsers (like Google Chrome, Microsoft Edge) and interacts with browsers through Chrome DevTools Protocol (CDP). It supports connecting to both local and remote browser instances.
What if Skyvern gets stuck while executing a task?
Skyvern provides multiple debugging tools:
- Live streaming functionality lets you observe the execution process
- Detailed task history allows reviewing each operation step
- You can intervene in task execution through UI or code
- Comprehensive logging helps diagnose issues
How well does Skyvern perform?
According to WebBench benchmark tests, Skyvern achieves 64.4% accuracy on overall tasks, with particularly outstanding performance on “write” tasks (like form filling, login, file downloads, etc.), which are core requirements for RPA scenarios.
Can Skyvern’s behavior be customized?
Yes, Skyvern offers multiple customization methods:
- Define output format through data extraction schemas
- Define stopping conditions through error codes
- Support for custom workflows combining multiple tasks
- Integration of custom code blocks
Conclusion
Skyvern represents a significant advancement in the field of browser automation. By combining LLMs and computer vision, it addresses the fundamental limitations of traditional automation methods. It doesn’t require writing specific code for each website, can adapt to website layout changes, and possesses the ability to handle complex situations.
Whether you’re a business looking to automate repetitive workflows or a developer seeking more reliable browser automation solutions, Skyvern is worth trying. Its open-source version provides complete core functionality, while the cloud service offers convenience for users who don’t want to manage infrastructure.
As AI technology continues to evolve, tools like Skyvern have the potential to fundamentally change how we interact with web applications, freeing people from repetitive tasks and allowing them to focus on more valuable work.
