Introduction

In our daily work, we often need to repeatedly perform various browser operations—filling out forms, downloading files, extracting data, completing login processes, and more. Traditional automation methods rely on writing scripts for specific websites, using XPath or CSS selectors to locate elements. However, any minor change in website layout can cause these scripts to fail.

Now, a smarter solution has emerged. Skyvern fundamentally changes how browser automation is implemented by combining Large Language Models (LLMs) and computer vision technology. It can “see” and understand web page content like a human, comprehend task requirements, and autonomously decide how to operate—all without writing specific code for each website.

This article looks at how Skyvern works, how to install and use it, its core features, and practical application scenarios, so you can judge where it fits in your own automation work.

What is Skyvern?

Skyvern is an AI-based browser automation platform that uses LLMs and computer vision to automate various browser workflows. Unlike traditional methods, Skyvern doesn’t require pre-written scripts for specific websites. Instead, it understands web pages’ visual elements and text content to make autonomous decisions and execute operations.

Key Features:

  • No need to write website-specific code
  • Resilient to website layout changes
  • Capable of handling never-before-seen websites
  • Supports complex reasoning and decision-making

[Figure: Skyvern system architecture]

How Skyvern Works

Skyvern’s design draws inspiration from task-driven autonomous agent architectures like BabyAGI and AutoGPT, but adds a crucial capability: interacting with websites through browser automation libraries like Playwright.

Multi-Agent System Architecture

Skyvern uses a team of specialized agents that collaborate to complete tasks:

  1. Understanding Agent: Analyzes web page content and identifies interactive elements
  2. Planning Agent: Develops the sequence of steps needed to complete the task
  3. Execution Agent: Actually performs browser operations like clicking, typing, and scrolling
  4. Validation Agent: Confirms whether operation results meet expectations

This division of labor enables Skyvern to handle complex workflows and adjust strategies when encountering unexpected situations.
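
The division of labor described above can be sketched as a small loop. This is a toy illustration only, not Skyvern's internal code: the `Page` class and the `understand`, `plan`, `execute`, `validate`, and `run` functions are hypothetical names, and a dictionary of form fields stands in for a real web page.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A toy stand-in for a web page: field labels mapped to current values."""
    fields: dict[str, str] = field(default_factory=dict)


def understand(page: Page) -> list[str]:
    """'Understanding' step: list the interactive fields that are still empty."""
    return [label for label, value in page.fields.items() if value == ""]


def plan(empty_labels: list[str], goal: dict[str, str]) -> list[tuple[str, str]]:
    """'Planning' step: decide which value to type into which field."""
    return [(label, goal[label]) for label in empty_labels if label in goal]


def execute(page: Page, steps: list[tuple[str, str]]) -> None:
    """'Execution' step: perform the planned actions on the page."""
    for label, value in steps:
        page.fields[label] = value


def validate(page: Page, goal: dict[str, str]) -> bool:
    """'Validation' step: confirm the page now matches the goal."""
    return all(page.fields.get(label) == value for label, value in goal.items())


def run(page: Page, goal: dict[str, str], max_rounds: int = 3) -> bool:
    """Repeat understand -> plan -> execute until validation passes."""
    for _ in range(max_rounds):
        if validate(page, goal):
            return True
        execute(page, plan(understand(page), goal))
    return validate(page, goal)


page = Page(fields={"First name": "", "Email": ""})
goal = {"First name": "Ada", "Email": "ada@example.com"}
print(run(page, goal))  # True
```

The point of the loop is the validation step: if an action does not produce the expected state, the agent goes back to understanding the page instead of continuing blindly.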

Comparison with Traditional Methods

Traditional browser automation typically relies on:

  • DOM parsing and XPath selectors
  • Pre-written scripts and workflows
  • Custom code tailored to specific websites

The main weakness of these methods is their fragility—minor changes in website layout can break automation workflows.

Skyvern’s fundamentally different approach includes:

  • Visual understanding instead of code-based selectors
  • Strong adaptability to handle layout changes
  • Reasoning capabilities to manage complex situations

For example, when obtaining a car insurance quote from Geico, Skyvern can infer the answer to “Were you eligible to drive at 18?” from the fact that the driver received their license at age 16, without needing explicit instructions.
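
The fragility of selector-based scripts can be shown with a toy model. The sketch below is an assumption-laden illustration: the two layouts, the paths, and the `find_by_path` and `find_by_intent` helpers are invented for this example, and simple keyword matching is only a crude stand-in for what an LLM with vision does.

```python
# Two versions of the same login form, modelled as lists of elements.
# The redesign wraps the form in new containers, which changes every positional path.
old_layout = [
    {"path": "/html/body/form/input[1]", "label": "Email", "kind": "input"},
    {"path": "/html/body/form/button[1]", "label": "Sign in", "kind": "button"},
]
new_layout = [
    {"path": "/html/body/div[2]/main/form/input[1]", "label": "Email address", "kind": "input"},
    {"path": "/html/body/div[2]/main/form/button[1]", "label": "Sign in", "kind": "button"},
]


def find_by_path(layout, path):
    """Selector-style lookup: exact match on a hard-coded path."""
    return next((el for el in layout if el["path"] == path), None)


def find_by_intent(layout, kind, keyword):
    """Intent-style lookup: match on what the element is and what it says."""
    keyword = keyword.lower()
    return next(
        (el for el in layout if el["kind"] == kind and keyword in el["label"].lower()),
        None,
    )


print(find_by_path(old_layout, "/html/body/form/input[1]") is not None)  # True
print(find_by_path(new_layout, "/html/body/form/input[1]") is not None)  # False
print(find_by_intent(new_layout, "input", "email") is not None)          # True
```

The hard-coded path stops matching after the redesign, while the lookup based on the element's role and label still finds the field.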

Performance and Evaluation

In the WebBench benchmark, Skyvern reports an overall accuracy of 64.4%. On “write” tasks (form filling, logins, file downloads, and similar), Skyvern is the best-performing agent, which matters most for Robotic Process Automation (RPA) workloads.

[Figure: WebBench overall performance]
[Figure: WebBench write task performance]

These results suggest that Skyvern is among the strongest agents for real-world automation tasks, particularly write-heavy ones.

Getting Started with Skyvern

Skyvern Cloud Service

For users who don’t want to handle infrastructure management, Skyvern Cloud offers a fully managed cloud service. It includes features like running multiple Skyvern instances in parallel, anti-bot detection mechanisms, proxy networks, and CAPTCHA solutions.

To try Skyvern Cloud, simply visit app.skyvern.com to create an account.

Local Installation and Usage

Environment Requirements

Before starting, ensure your system meets the following requirements:

  • Python 3.11.x (3.12 is also supported; 3.13 is not yet supported)
  • NodeJS and NPM
  • Additional requirements for Windows users:

    • Rust
    • VS Code with C++ development tools and Windows SDK

Installation Steps

  1. Install Skyvern

    pip install skyvern
    
  2. Initialize Skyvern

    For first-time runs, database setup and migrations are needed:

    skyvern quickstart
    
  3. Run Skyvern Service

    skyvern run all
    

    Once completed, visit http://localhost:8080 to use the web interface for creating and managing tasks.

Running Tasks via Code

Besides the web interface, you can also use Skyvern through Python code:

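import asyncio
from skyvern import Skyvern

async def main():
    # run_task is a coroutine, so await it inside an async function
    skyvern = Skyvern()
    task = await skyvern.run_task(prompt="Find today's top post on HackerNews")
    print(task)

asyncio.run(main())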

Skyvern executes tasks in a browser window that pops up, automatically closing when the task is complete. You can view task history at http://localhost:8080/history.

Advanced Usage Techniques

Using Your Own Chrome Browser

Note: Starting from Chrome 136, the default user data directory refuses any CDP connections. To use your browser data, Skyvern copies the default user data directory to ./tmp/user_data_dir when first connecting to your local browser.

  1. Control via Code

    import asyncio
    from skyvern import Skyvern

    # Chrome path example for Mac systems
    browser_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

    async def main():
        # Point Skyvern at the local API server and your own Chrome binary
        skyvern = Skyvern(
            base_url="http://localhost:8000",
            api_key="YOUR_API_KEY",
            browser_path=browser_path,
        )
        task = await skyvern.run_task(
            prompt="Find today's top post on HackerNews",
        )

    asyncio.run(main())
    
  2. Control via Skyvern Service

    Add the following variables to your .env file:

    # Chrome path example for Mac systems
    CHROME_EXECUTABLE_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    BROWSER_TYPE=cdp-connect
    

    After restarting the Skyvern service, you can run tasks through the UI or code.

Connecting to Remote Browsers

Get the CDP connection URL and pass it to Skyvern:

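import asyncio
from skyvern import Skyvern

async def main():
    # Replace the placeholder with the CDP URL of your remote browser
    skyvern = Skyvern(cdp_url="Your CDP connection URL")
    task = await skyvern.run_task(
        prompt="Find today's top post on HackerNews",
    )

asyncio.run(main())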

Getting Structured Output

By specifying a data extraction schema, you can ensure output conforms to a specific format:

import asyncio
from skyvern import Skyvern

async def main():
    skyvern = Skyvern()
    # The schema describes the fields the extracted result should contain
    task = await skyvern.run_task(
        prompt="Find today's top post on HackerNews",
        data_extraction_schema={
            "type": "object",
            "properties": {
                "title": {
                    "type": "string",
                    "description": "The title of the top post"
                },
                "url": {
                    "type": "string",
                    "description": "The URL of the top post"
                },
                "points": {
                    "type": "integer",
                    "description": "Number of points the post has received"
                }
            }
        }
    )

asyncio.run(main())
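
Because the extraction is produced by an LLM, it can be worth checking the result against the schema before using it downstream. The sketch below is a minimal, dependency-free check under stated assumptions: the `schema_errors` helper is hypothetical, the `sample` dictionary is invented data rather than real Skyvern output, and the check treats every listed property as required, which is stricter than JSON Schema itself.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}

# Mapping from JSON Schema type names to Python types (only those used above).
PYTHON_TYPES = {"string": str, "integer": int, "object": dict}


def schema_errors(data, schema):
    """Return a list of human-readable problems; an empty list means the data fits."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = []
    for name, spec in schema["properties"].items():
        if name not in data:
            errors.append(f"missing field: {name}")
            continue
        value = data[name]
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


# Hypothetical extracted data, for illustration only.
sample = {"title": "Example post", "url": "https://example.com", "points": "120"}
print(schema_errors(sample, SCHEMA))  # ['wrong type for points: expected integer']
```

For production use, a full JSON Schema validator would be a more complete choice than this hand-rolled check.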

Common Debugging Commands

# Start Skyvern server separately
skyvern run server

# Start Skyvern UI
skyvern run ui

# Check Skyvern service status
skyvern status

# Stop all Skyvern services
skyvern stop all

# Stop Skyvern UI
skyvern stop ui

# Stop Skyvern server
skyvern stop server

Docker Compose Deployment

For users who prefer containerized deployment, Skyvern provides Docker Compose configuration:

  1. Ensure Docker Desktop is installed and running

  2. Make sure Postgres is not already running locally (use the docker ps command to check)

  3. Clone the repository and navigate to the root directory

  4. Run skyvern init llm to generate a .env file (this will be copied to the Docker image)

  5. Fill in the LLM provider key in docker-compose.yml

  6. Run the following command:

    docker compose up -d
    
  7. Access http://localhost:8080 in your browser to start using the UI

Important Note: Only one Postgres container can run on port 5432 at a time. If switching from CLI-managed Postgres to Docker Compose, you must first remove the original container:

docker rm -f postgresql-container

Skyvern Core Features

Task Management

Tasks are the fundamental building blocks in Skyvern. Each task represents a single request, instructing Skyvern to navigate a website and complete a specific goal.

Creating a task requires specifying:

  • url: Target website address
  • prompt: Task description
  • Optional data schema: If output needs to conform to a specific structure
  • Optional error codes: If you want to stop execution under specific conditions

[Figure: Skyvern task interface]
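
The four task inputs listed above can be pictured as one small record. The sketch below is only an illustration of that shape: `TaskSpec` is not a Skyvern class, and the `error_codes` field name and the `LOGIN_REQUIRED` code are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TaskSpec:
    """Illustrative container for the fields a task needs (not a Skyvern class)."""
    url: str
    prompt: str
    data_extraction_schema: Optional[dict] = None
    error_codes: Optional[dict] = None  # hypothetical name: error code -> stop condition

    def to_payload(self) -> dict:
        """Drop unset optional fields so the payload only carries what was specified."""
        return {key: value for key, value in asdict(self).items() if value is not None}


spec = TaskSpec(
    url="https://news.ycombinator.com",
    prompt="Find today's top post on HackerNews",
    error_codes={"LOGIN_REQUIRED": "Stop if the page asks the user to log in"},
)
print(sorted(spec.to_payload()))  # ['error_codes', 'prompt', 'url']
```

Only `url` and `prompt` are mandatory here, mirroring the list above; the schema and error codes appear in the payload only when they are set.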

Workflow Design

Workflows allow chaining multiple tasks together to form coherent work units.

Typical Workflow Examples:

  1. Invoice Download Workflow:

    • Navigate to invoice page
    • Filter to show invoices after January 1st
    • Extract list of eligible invoices
    • Iterate through each invoice and download
  2. E-commerce Purchase Workflow:

    • Navigate to target product page
    • Add product to shopping cart
    • Navigate to cart and validate state
    • Complete checkout process
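
The invoice download workflow can be sketched as plain functions to show how extraction, filtering, and looping fit together. This is a stubbed illustration, not Skyvern's workflow format: the `INVOICES` data and the `extract_invoices`, `download`, and `invoice_workflow` functions are invented for this example, and a real run would read the invoices from the website.

```python
from datetime import date

# Stub data standing in for an invoice page; a real run would read this from the site.
INVOICES = [
    {"id": "INV-001", "issued": date(2023, 12, 15)},
    {"id": "INV-002", "issued": date(2024, 1, 10)},
    {"id": "INV-003", "issued": date(2024, 2, 3)},
]


def extract_invoices(after: date) -> list[dict]:
    """Filter and extraction steps: keep invoices issued after the cutoff date."""
    return [inv for inv in INVOICES if inv["issued"] > after]


def download(invoice: dict) -> str:
    """Download step (stubbed): return the file name that would be saved."""
    return f"{invoice['id']}.pdf"


def invoice_workflow(after: date) -> list[str]:
    """Loop step: run the download once per extracted invoice."""
    return [download(inv) for inv in extract_invoices(after)]


print(invoice_workflow(date(2024, 1, 1)))  # ['INV-002.pdf', 'INV-003.pdf']
```

Each function corresponds to one step of the workflow, and the output of the extraction step drives the loop, which is the same chaining idea the workflow examples describe.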

Supported Workflow Features:

  • Browser tasks
  • Browser actions
  • Data extraction
  • Validation
  • Loops
  • File parsing
  • Email sending
  • Text prompts
  • HTTP request blocks
  • Custom code blocks
  • Uploading files to block storage
  • (Coming soon) Conditional statements

[Figure: Workflow block example]

Live Streaming

Skyvern allows streaming the browser viewport to your local machine in real time, letting you watch Skyvern’s operations on web pages as they happen. This is extremely useful for debugging and understanding how Skyvern interacts with websites, allowing for intervention when necessary.

Form Filling

Skyvern natively supports filling out form inputs on websites. By passing information through the navigation_goal, Skyvern can comprehend the information and fill out forms accordingly.

Data Extraction

Skyvern can also extract data from websites. You can pass a data_extraction_schema, written in JSONC format, alongside the main prompt to tell Skyvern exactly which data to extract. Skyvern’s output will be structured according to the provided schema.

File Downloading

Skyvern supports downloading files from websites. All downloaded files are automatically uploaded to block storage (if configured), and you can access them through the UI.

Authentication Support

Skyvern supports multiple authentication methods, making it easier to automate tasks behind logins. To try this feature, contact the Skyvern team via email or Discord.

[Figure: Secure password task example]

Two-Factor Authentication (2FA) Support

Skyvern supports multiple 2FA methods, allowing you to automate workflows that require 2FA:

  • QR code-based 2FA (like Google Authenticator, Authy)
  • Email-based 2FA
  • SMS-based 2FA
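
Authenticator apps such as the ones named above generate time-based one-time passwords (TOTP, RFC 6238) from a shared secret. The sketch below shows how such a code is derived, using only the Python standard library. It illustrates the standard algorithm, not Skyvern's implementation; the `totp` function name is hypothetical, and real secrets are usually Base32-encoded and would need decoding first.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at: float, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the Unix time `at`."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


# Test vector from RFC 6238, Appendix B (SHA-1, 8 digits, time = 59 seconds).
print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

In practice you would pass `time.time()` as `at`, so the code changes every 30 seconds.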

Password Manager Integration

Skyvern currently integrates with Bitwarden; support for other password managers is in development:

  • [x] Bitwarden
  • [ ] 1Password (in development)
  • [ ] LastPass (in development)

Model Context Protocol (MCP) Support

Skyvern supports the Model Context Protocol (MCP), allowing you to use any LLM that supports MCP.

Zapier / Make.com / N8N Integration

Skyvern integrates with Zapier, Make.com, and N8N, allowing you to connect Skyvern workflows to other applications.

Real-World Application Cases

Here are some practical examples of Skyvern in real-world scenarios:

Multi-Website Invoice Downloading

Businesses often need to download invoices from multiple vendor portals, each with different interfaces and navigation flows. Skyvern can automate this process without writing specific code for each website.

[Figure: Invoice download demo]

Job Application Automation

Job seekers can use Skyvern to automate the process of submitting resumes and filling out application forms, saving significant time.

[Figure: Job application demo]

Manufacturing Material Procurement

Manufacturing companies can use Skyvern to automate the process of finding and procuring raw materials, comparing prices and inventory across multiple supplier websites.

[Figure: Material procurement demo]

Government Website Account Registration and Form Filling

Skyvern can handle complex registration and form-filling processes on government websites, which often have unique interfaces and validation processes.

[Figure: Government services demo]

Contact Form Filling

Businesses can use Skyvern to automate filling out contact forms across multiple websites for lead generation or partner outreach.

[Figure: Contact forms demo]

Multi-Language Insurance Quote Retrieval

Insurance companies or comparison websites can use Skyvern to obtain quotes from multiple insurance providers, even when websites use different languages.

[Figure: Insurance quote demo]
[Figure: Geico insurance demo]

Supported LLM Providers

Skyvern supports multiple LLM providers, allowing you to choose the right model based on your requirements, budget, and performance needs.

Provider | Supported Models
OpenAI | gpt4-turbo, gpt-4o, gpt-4o-mini
Anthropic | Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Azure OpenAI | Any GPT models, better performance with multimodal LLMs (azure/gpt4-o)
AWS Bedrock | Anthropic Claude 3 (Haiku, Sonnet, Opus), Claude 3.5 (Sonnet)
Gemini | Gemini 2.5 Pro and Flash, Gemini 2.0
Ollama | Run any locally hosted model via Ollama
OpenRouter | Access models through OpenRouter
OpenAI-compatible | Any custom API endpoint following OpenAI API format (via liteLLM)

Environment Variable Configuration

OpenAI

Variable | Description | Type | Sample Value
ENABLE_OPENAI | Register OpenAI models | Boolean | true, false
OPENAI_API_KEY | OpenAI API Key | String | sk-1234567890
OPENAI_API_BASE | OpenAI API Base URL, optional | String | https://openai.api.base
OPENAI_ORGANIZATION | OpenAI Organization ID, optional | String | your-org-id

Recommended LLM_KEY: OPENAI_GPT4O, OPENAI_GPT4O_MINI, OPENAI_GPT4_1, OPENAI_O4_MINI, OPENAI_O3

Anthropic

Variable | Description | Type | Sample Value
ENABLE_ANTHROPIC | Register Anthropic models | Boolean | true, false
ANTHROPIC_API_KEY | Anthropic API Key | String | sk-1234567890

Recommended LLM_KEY: ANTHROPIC_CLAUDE3.5_SONNET, ANTHROPIC_CLAUDE3.7_SONNET, ANTHROPIC_CLAUDE4_OPUS, ANTHROPIC_CLAUDE4_SONNET

Azure OpenAI

Variable | Description | Type | Sample Value
ENABLE_AZURE | Register Azure OpenAI models | Boolean | true, false
AZURE_API_KEY | Azure deployment API key | String | sk-1234567890
AZURE_DEPLOYMENT | Azure OpenAI deployment name | String | skyvern-deployment
AZURE_API_BASE | Azure deployment API base URL | String | https://skyvern-deployment.openai.azure.com/
AZURE_API_VERSION | Azure API version | String | 2024-02-01

Recommended LLM_KEY: AZURE_OPENAI

AWS Bedrock

Variable | Description | Type | Sample Value
ENABLE_BEDROCK | Register AWS Bedrock models. To use AWS Bedrock, make sure your AWS configurations are set up correctly first | Boolean | true, false

Recommended LLM_KEY: BEDROCK_ANTHROPIC_CLAUDE3.7_SONNET_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_OPUS_INFERENCE_PROFILE, BEDROCK_ANTHROPIC_CLAUDE4_SONNET_INFERENCE_PROFILE

Gemini

Variable | Description | Type | Sample Value
ENABLE_GEMINI | Register Gemini models | Boolean | true, false
GEMINI_API_KEY | Gemini API Key | String | your_google_gemini_api_key

Recommended LLM_KEY: GEMINI_2.5_PRO_PREVIEW, GEMINI_2.5_FLASH_PREVIEW

Ollama

Variable | Description | Type | Sample Value
ENABLE_OLLAMA | Register local models via Ollama | Boolean | true, false
OLLAMA_SERVER_URL | Ollama server URL | String | http://host.docker.internal:11434
OLLAMA_MODEL | Ollama model name | String | qwen2.5:7b-instruct

Recommended LLM_KEY: OLLAMA

Note: Ollama doesn’t support vision capabilities yet.

OpenRouter

Variable | Description | Type | Sample Value
ENABLE_OPENROUTER | Register OpenRouter models | Boolean | true, false
OPENROUTER_API_KEY | OpenRouter API key | String | sk-1234567890
OPENROUTER_MODEL | OpenRouter model name | String | mistralai/mistral-small-3.1-24b-instruct
OPENROUTER_API_BASE | OpenRouter API base URL | String | https://api.openrouter.ai/v1

Recommended LLM_KEY: OPENROUTER

OpenAI-Compatible

Variable | Description | Type | Sample Value
ENABLE_OPENAI_COMPATIBLE | Register custom OpenAI-compatible API endpoint | Boolean | true, false
OPENAI_COMPATIBLE_MODEL_NAME | OpenAI-compatible endpoint model name | String | yi-34b, gpt-3.5-turbo, mistral-large, etc.
OPENAI_COMPATIBLE_API_KEY | OpenAI-compatible endpoint API key | String | sk-1234567890
OPENAI_COMPATIBLE_API_BASE | OpenAI-compatible endpoint base URL | String | https://api.together.xyz/v1, http://localhost:8000/v1, etc.
OPENAI_COMPATIBLE_API_VERSION | OpenAI-compatible endpoint API version, optional | String | 2023-05-15
OPENAI_COMPATIBLE_MAX_TOKENS | Maximum tokens for completion, optional | Integer | 4096, 8192, etc.
OPENAI_COMPATIBLE_TEMPERATURE | Temperature setting, optional | Float | 0.0, 0.5, 0.7, etc.
OPENAI_COMPATIBLE_SUPPORTS_VISION | Whether model supports vision, optional | Boolean | true, false

Supported LLM Key: OPENAI_COMPATIBLE

General LLM Configuration

Variable | Description | Type | Sample Value
LLM_KEY | The name of the model you want to use | String | See supported LLM keys above
SECONDARY_LLM_KEY | The name of the model for mini agents Skyvern runs with | String | See supported LLM keys above
LLM_CONFIG_MAX_TOKENS | Override the max tokens used by the LLM | Integer | 128000
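
A quick pre-flight check can catch a missing variable before the service starts. The sketch below is a hypothetical helper, not part of Skyvern: the `REQUIRED` mapping and the `missing_settings` function are invented for illustration, the variable names come from the tables above, and only a few providers are covered, with optional variables left out.

```python
# Variables each provider needs, taken from the tables above (optional ones omitted).
REQUIRED = {
    "OPENAI": ["ENABLE_OPENAI", "OPENAI_API_KEY"],
    "ANTHROPIC": ["ENABLE_ANTHROPIC", "ANTHROPIC_API_KEY"],
    "GEMINI": ["ENABLE_GEMINI", "GEMINI_API_KEY"],
    "OLLAMA": ["ENABLE_OLLAMA", "OLLAMA_SERVER_URL", "OLLAMA_MODEL"],
}


def missing_settings(env: dict, provider: str) -> list[str]:
    """Return the names of variables that are unset or empty for the given provider."""
    names = REQUIRED[provider] + ["LLM_KEY"]
    return [name for name in names if not env.get(name)]


env = {"ENABLE_OPENAI": "true", "LLM_KEY": "OPENAI_GPT4O"}
print(missing_settings(env, "OPENAI"))  # ['OPENAI_API_KEY']
```

To check the real environment, pass `dict(os.environ)` instead of the sample dictionary.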

Developer Setup

For developers who want to contribute code or customize Skyvern, here are the steps to set up the development environment:

Make sure you have uv installed.

  1. Create virtual environment (.venv)

    uv sync --group dev
    
  2. Perform initial server configuration

    uv run skyvern quickstart
    
  3. Access http://localhost:8080 in your browser to start using the UI

    Skyvern CLI supports Windows, WSL, macOS, and Linux environments.

Feature Roadmap

The roadmap below shows what the Skyvern team has already shipped ([x]) and what is still planned ([ ]):

  • [x] Open Source – Open source Skyvern core codebase
  • [x] Workflow Support – Support chaining multiple Skyvern calls together
  • [x] Improved Context Understanding – Enhance Skyvern’s ability to understand content around interactive elements by providing relevant label context through text prompts
  • [x] Cost Optimization – Improve stability and reduce running costs by optimizing the context tree passed to Skyvern
  • [x] Self-Service UI – Replace Streamlit UI with React-based UI components allowing users to launch new tasks in Skyvern
  • [x] Workflow UI Builder – Introduce UI allowing users to visually build and analyze workflows
  • [x] Chrome Viewport Streaming – Introduce method to stream Chrome viewport to user’s browser in real time
  • [x] Historical Run UI – Replace Streamlit UI with React-based UI allowing visualization of historical runs and their results
  • [x] Auto Workflow Builder (“Observer” Mode) – Allow Skyvern to automatically generate workflows while browsing the web, making it easier to build new workflows
  • [x] Prompt Caching – Introduce caching layer for LLM calls, significantly reducing Skyvern running costs
  • [x] Web Evaluation Dataset – Integrate Skyvern with public benchmark tests to track model quality over time
  • [ ] Improved Debug Mode – Allow Skyvern to plan actions and get “approval” before execution, facilitating debugging and prompt iteration
  • [ ] Chrome Extension – Allow users to interact with Skyvern through Chrome extension
  • [ ] Skyvern Action Recorder – Allow Skyvern to observe users completing tasks and automatically generate workflows
  • [ ] Interactive Live Streaming – Allow users to interact with streams in real time for intervention when necessary
  • [ ] Integrated LLM Observability Tools – Integrate LLM observability tools allowing backtesting of prompt changes with specific datasets
  • [x] Langchain Integration – Create integration in langchain_community to use Skyvern as a “tool”

Frequently Asked Questions

How is Skyvern different from traditional RPA tools?

Traditional RPA tools typically rely on recording and playback techniques or scripts based on XPath/CSS selectors—methods that often fail when website layouts change. Skyvern uses LLMs and computer vision to understand web page content, adapt to layout changes, handle never-before-seen websites, and employ reasoning capabilities for complex situations.

Can Skyvern handle websites that require login?

Yes, Skyvern supports multiple authentication methods, including username/password login and two-factor authentication (2FA). It supports QR code-based, email-based, and SMS-based 2FA, and can integrate with password managers like Bitwarden.

How does Skyvern ensure data security?

With a local deployment, all data stays in your own environment. The open-source version doesn’t include the anti-bot detection features available in the cloud service, but the core automation logic is identical. For licensing questions, contact the Skyvern support team.

Which browsers does Skyvern support?

Skyvern is primarily optimized for Chromium-based browsers (like Google Chrome, Microsoft Edge) and interacts with browsers through Chrome DevTools Protocol (CDP). It supports connecting to both local and remote browser instances.

What if Skyvern gets stuck while executing a task?

Skyvern provides multiple debugging tools:

  • Live streaming functionality lets you observe the execution process
  • Detailed task history allows reviewing each operation step
  • You can intervene in task execution through UI or code
  • Comprehensive logging helps diagnose issues

How well does Skyvern perform?

According to WebBench benchmark tests, Skyvern achieves 64.4% accuracy on overall tasks, with particularly outstanding performance on “write” tasks (like form filling, login, file downloads, etc.), which are core requirements for RPA scenarios.

Can Skyvern’s behavior be customized?

Yes, Skyvern offers multiple customization methods:

  • Define output format through data extraction schemas
  • Define stopping conditions through error codes
  • Support for custom workflows combining multiple tasks
  • Integration of custom code blocks

Conclusion

Skyvern represents a significant advancement in the field of browser automation. By combining LLMs and computer vision, it addresses the fundamental limitations of traditional automation methods. It doesn’t require writing specific code for each website, can adapt to website layout changes, and possesses the ability to handle complex situations.

Whether you’re a business looking to automate repetitive workflows or a developer seeking more reliable browser automation solutions, Skyvern is worth trying. Its open-source version provides complete core functionality, while the cloud service offers convenience for users who don’t want to manage infrastructure.

As AI technology continues to evolve, tools like Skyvern have the potential to fundamentally change how we interact with web applications, freeing people from repetitive tasks and allowing them to focus on more valuable work.