Building Trustworthy Web-Automation Agents in 15 Minutes with Notte

“I need AI to scrape job posts for me, but CAPTCHAs keep blocking the log-in.”
“Our team has to pull data from hundreds of supplier sites. Old-school crawlers break every time the layout changes, while pure AI is too expensive. Is there a middle ground?”
If either sentence sounds familiar, this article is for you.


Table of Contents

  1. What exactly is Notte, and why should you care?
  2. Five-minute install and first run
  3. Local quick win: let an agent scroll through cat memes on Google Images
  4. Taking it to the cloud: managed browsers, auto-CAPTCHA, and proxies
  5. Core features, plain and simple

    • Structured output: turn any page into Python objects
    • Vault: enterprise-grade credential storage
    • Persona: disposable e-mail, phone, and 2FA in one call
    • Stealth: built-in CAPTCHA solving and proxy rotation
    • Hybrid workflows: scripting plus AI to cut costs
  6. Three end-to-end walkthroughs

    • Scraping top posts from Hacker News
    • Uploading a PDF and downloading the receipt
    • Bulk form submission with auto-generated identities
  7. Benchmarks: why Notte finishes tasks in half the time
  8. Frequently asked questions
  9. Next steps and further reading

1. What exactly is Notte, and why should you care?

In one line:
Notte is a full-stack framework that stitches traditional web-automation scripts and large-language-model reasoning together.
The goal is to let you finish any browser-based task with the least code, the lowest bill, and the highest reliability.

It ships in two layers:

| Component | Open-source core | Managed service (recommended) | Typical use case |
|---|---|---|---|
| Browser session | Local Playwright | Cloud with auto-CAPTCHA + proxy | You do not want to run Chrome yourself |
| Agent | Python SDK calls any LLM | Same plus identities, files, cookies | You need advanced credential handling |

Key selling points

  • Cheaper: deterministic steps stay in code; the LLM is invoked only when reasoning is required. Internal tests show token cost savings above 50 %.
  • Stable: the managed fleet carries CAPTCHA solvers, residential proxies, and anti-detection patches.
  • Faster: median task duration is 47 s versus 113 s for the closest open-source alternative.
  • Simpler: write locally, then move to the cloud by swapping the notte module for a NotteClient instance (cli); the rest of your code stays the same.
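
As a quick sanity check on the "Faster" bullet, the two medians quoted above can be turned into a speed-up factor. This is plain arithmetic on the article's own numbers, nothing Notte-specific:

```python
# Median task durations quoted in the "Faster" bullet, in seconds.
notte_median_s = 47
alternative_median_s = 113

speedup = alternative_median_s / notte_median_s          # how many times faster
time_saved = 1 - notte_median_s / alternative_median_s   # fraction of time saved

print(f"speedup: {speedup:.1f}x, time saved: {time_saved:.0%}")
# prints: speedup: 2.4x, time saved: 58%
```

In other words, the quoted medians correspond to a run that takes a little under half as long.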

2. Five-minute install and first run

System requirements

  • Python 3.11 or newer
  • A computer you control (laptop, on-prem server, or cloud VM)

Steps

# 1) Install the package
pip install notte

# 2) Install the browser (skip if you plan to use the cloud only)
patchright install --with-deps chromium

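# 3) Provide an LLM key
# Use the key for the provider named in reasoning_model and place it in .env.
# The examples below use gemini/gemini-2.5-flash, which typically reads GEMINI_API_KEY;
# if you switch to an OpenAI model, set OPENAI_API_KEY instead.
echo "GEMINI_API_KEY=your-key" >> .env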
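
A missing key is an easy mistake to make at this step, so a small preflight check can help. The sketch below is only an illustration: missing_keys is a hypothetical helper (not part of Notte), and OTHER_PROVIDER_KEY is a placeholder for whatever variable your model provider expects.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Hypothetical example: check a fake environment instead of the real one.
fake_env = {"OPENAI_API_KEY": "sk-xxx", "OTHER_PROVIDER_KEY": ""}
print(missing_keys(["OPENAI_API_KEY", "OTHER_PROVIDER_KEY"], fake_env))
# prints: ['OTHER_PROVIDER_KEY']
```

In a real script you would call missing_keys([...]) without the second argument, after load_dotenv(), and stop early if the list is not empty.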

Quick smoke test

Save the snippet below as quick_test.py and run it.
A browser window should open, navigate to https://example.com, and close once the agent finishes; max_steps=5 caps the run at five agent steps.

import notte
from dotenv import load_dotenv
load_dotenv()

with notte.Session(headless=False) as session:
    agent = notte.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=5
    )
    agent.run("visit https://example.com and take a screenshot")

3. Local quick win: let an agent scroll through cat memes on Google Images

import notte
from dotenv import load_dotenv
load_dotenv()

with notte.Session(headless=False) as session:
    agent = notte.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=30
    )
    response = agent.run(
        "search for cat memes on Google Images and scroll down three full screens"
    )
    print(response.answer)

What actually happens?

  1. A Chromium window starts.
  2. The agent navigates to https://images.google.com.
  3. It fills the search box with “cat memes” and presses Enter.
  4. It scrolls three times, waits for images to load, and summarizes the result in plain English.
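
The max_steps argument caps how many such actions the agent may take. The loop below is a conceptual sketch of a step budget, not Notte's actual implementation; run_with_budget, scripted_policy, and the action strings are hypothetical stand-ins for the LLM call and the browser commands.

```python
def run_with_budget(decide_next_action, max_steps):
    """Ask for one action per step until the task is done or the budget runs out."""
    history = []
    for _ in range(max_steps):
        action = decide_next_action(history)  # stand-in for the LLM's decision
        if action == "done":
            return True, history
        history.append(action)  # a real agent would execute the action here
    return False, history  # budget exhausted before the task finished


# Hypothetical scripted "policy" that replays the steps listed above.
plan = ["goto images.google.com", "type 'cat memes'", "scroll", "scroll", "scroll"]


def scripted_policy(history):
    return plan[len(history)] if len(history) < len(plan) else "done"


finished, actions = run_with_budget(scripted_policy, max_steps=30)
print(finished, len(actions))   # finishes within the budget
finished, actions = run_with_budget(scripted_policy, max_steps=3)
print(finished, len(actions))   # stops early because the budget is too small
```

This is why the cat-meme example uses max_steps=30 while the smoke test gets by with 5: a multi-step task needs headroom.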

4. Taking it to the cloud: managed browsers, auto-CAPTCHA, and proxies

If you prefer not to babysit Chrome, switch to the managed service in three steps:

  1. Register at Notte Console and copy your API key.
  2. Replace import notte with from notte_sdk import NotteClient.
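  3. Create a client, cli = NotteClient(api_key=...), and replace the notte. prefix with cli. (notte.Session becomes cli.Session, notte.Agent becomes cli.Agent).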

Example:

from notte_sdk import NotteClient

cli = NotteClient(api_key="nt-xxx")

with cli.Session(headless=False) as session:
    agent = cli.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=30
    )
    agent.run("scroll through cat memes on Google Images")
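
Because the local module and the cloud client expose Session and Agent under the same names in the examples above, one way to keep a script portable is to pass the backend in as a parameter. This is a minimal sketch under that assumption; run_task is a hypothetical helper, not part of Notte.

```python
def run_task(backend, task, **agent_kwargs):
    """Run `task` on any backend exposing Session and Agent like the examples above."""
    with backend.Session(headless=False) as session:
        agent = backend.Agent(session=session, **agent_kwargs)
        return agent.run(task)
```

With this helper, run_task(notte, "...", max_steps=30) would run locally and run_task(cli, "...", max_steps=30) would run on the managed service, with no other change to the calling code.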

Local vs. cloud comparison

| Dimension | Open-source core | Managed service |
|---|---|---|
| Browser | You maintain | Fully hosted |
| CAPTCHA | Manual | Automatic |
| Proxy | DIY | Residential rotation built-in |
| Concurrency | Single machine | Horizontal scaling |
| Cost | Free | Pay-as-you-go |

5. Core features, plain and simple

5.1 Structured output: turn any page into Python objects

Pain point: traditional crawlers rely on brittle XPath/CSS selectors.
Notte’s approach: tell the agent which fields you need and let it figure out the rest.

from notte_sdk import NotteClient
from pydantic import BaseModel
from typing import List

class HackerNewsPost(BaseModel):
    title: str
    url: str
    points: int
    author: str
    comments_count: int

class TopPosts(BaseModel):
    posts: List[HackerNewsPost]

cli = NotteClient()
with cli.Session(headless=False, browser_type="firefox") as session:
    agent = cli.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=15
    )
    response = agent.run(
        task="go to news.ycombinator.com and extract the top 5 posts",
        response_format=TopPosts
    )
print(response.answer.posts[0])

Sample output

HackerNewsPost(
  title='Show HN: A pocket-sized E-ink terminal',
  url='https://github.com/foo/bar',
  points=512,
  author='baz',
  comments_count=97
)

5.2 Vault: enterprise-grade credential storage

Scenario: you need to log in to an internal dashboard but do not want plain-text passwords in the repo.

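import os

from notte_sdk import NotteClient

cli = NotteClient()

with cli.Vault() as vault, cli.Session(headless=False) as session:
    # Read the secret from the environment so it never lands in the repo.
    # X_PASSWORD is an example variable name; use whatever your secret store provides.
    vault.add_credentials(
        url="https://x.com",
        username="you@corp.com",
        password=os.environ["X_PASSWORD"]
    )
    agent = cli.Agent(session=session, vault=vault, max_steps=10)
    agent.run("log in to Twitter and open the messages tab")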

Benefits

  • Credentials are AES-encrypted at rest.
  • Reused automatically for the same origin.
  • Supports TOTP, SSO, and MFA tokens.
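
To make "reused automatically for the same origin" concrete, here is a conceptual sketch of an origin-keyed lookup. It only illustrates the idea; Notte's real matching logic is not shown in this article, and origin, find_credentials, and store are hypothetical names.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return the scheme://host[:port] part of a URL, lower-cased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def find_credentials(store, page_url):
    """Look up credentials saved for the same origin as `page_url`, if any."""
    return store.get(origin(page_url))


# Hypothetical in-memory store keyed by origin (no password shown on purpose).
store = {origin("https://x.com"): {"username": "you@corp.com"}}

print(find_credentials(store, "https://x.com/i/flow/login"))  # same origin: found
print(find_credentials(store, "https://example.com/login"))   # other origin: None
```

Keying on the origin rather than the full URL is what lets one saved entry cover every page of the same site.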

5.3 Persona: disposable e-mail, phone, and 2FA in one call

Scenario: load-testing a registration funnel requires 100 unique accounts.

from notte_sdk import NotteClient

cli = NotteClient()

with cli.Persona(create_phone_number=True) as persona:
    with cli.Session(browser_type="firefox", headless=False) as session:
        agent = cli.Agent(session=session, persona=persona, max_steps=15)
        agent.run(
            "open the Google Form and RSVP yes using the persona’s details",
            url="https://forms.google.com/your-form-url"
        )

Under the hood

  1. A random name, e-mail, and phone are generated.
  2. If SMS verification is required, the platform receives the code and fills it in.
  3. Everything is discarded after the session unless you explicitly save it.

5.4 Stealth: built-in CAPTCHA solving and proxy rotation

Built-in proxy + solver

from notte_sdk import NotteClient

cli = NotteClient()

with cli.Session(
    solve_captchas=True,
    proxies=True,         # rotates US residential IPs
    browser_type="firefox",
    headless=False
) as session:
    agent = cli.Agent(session=session, max_steps=5)
    agent.run("solve the CAPTCHA demo at https://www.google.com/recaptcha/api2/demo")

Custom proxy

from notte_sdk.types import ExternalProxy

proxy = ExternalProxy(
    server="http://proxy.corp.com:8080",
    username="corpUser",
    password="corpPass"
)

with cli.Session(proxies=[proxy]) as session:
    agent = cli.Agent(session=session, max_steps=5)
    agent.run("navigate through the corporate proxy")
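
Proxy settings are often handed around as a single URL. The sketch below splits such a URL into the three fields used in the snippet above; proxy_fields is a hypothetical helper built on the standard library, and passing its result to ExternalProxy assumes the constructor accepts exactly the keywords shown above.

```python
from urllib.parse import urlsplit


def proxy_fields(proxy_url):
    """Split a proxy URL into the server/username/password fields used above."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname or ''}"
    if parts.port is not None:
        server += f":{parts.port}"
    return {
        "server": server,
        "username": parts.username,
        "password": parts.password,
    }


fields = proxy_fields("http://corpUser:corpPass@proxy.corp.com:8080")
print(fields["server"])    # prints: http://proxy.corp.com:8080
print(fields["username"])  # prints: corpUser
```

Under that assumption, ExternalProxy(**fields) would build the same object as the hand-written version, and the URL itself can come from an environment variable instead of the source code.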

5.5 Hybrid workflows: scripting plus AI to cut costs

Idea: keep deterministic navigation in code; bring in the agent only for reasoning-heavy steps.

from notte_sdk import NotteClient
import time

cli = NotteClient()

with cli.Session(headless=False, perception_type="fast") as page:
    # Script: deterministic navigation
    page.execute(type="goto",
                 value="https://www.quince.com/women/organic-stretch-cotton-chino-short")
    page.observe()

    # Agent: reason about color and size
    agent = cli.Agent(session=page)
    agent.run("select ivory color in size 6")

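    # Script: deterministic checkout
    # Note: button[name="..."] matches the HTML name attribute; adjust the
    # selectors if the site identifies these buttons differently.
    page.execute(type="click", selector='button[name="ADD TO CART"]')
    page.execute(type="click", selector='button[name="CHECKOUT"]')
    time.sleep(5)  # keep the session open briefly so the checkout page can load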

Outcome

  • Deterministic steps cost zero tokens.
  • The agent handles only the messy parts, improving reliability.
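
One possible extension of the same idea is to try the scripted step first and call the agent only when the script fails, for example after a layout change. This is a sketch of that pattern rather than a Notte feature; scripted_or_agent is a hypothetical helper, and it assumes a failed scripted step raises an exception.

```python
def scripted_or_agent(scripted_step, agent_step):
    """Try the cheap scripted step first; fall back to the agent if it fails."""
    try:
        return "script", scripted_step()
    except Exception as exc:  # e.g. a selector that no longer matches
        print(f"scripted step failed ({exc}); falling back to the agent")
        return "agent", agent_step()
```

A call might look like scripted_or_agent(lambda: page.execute(type="click", selector="..."), lambda: agent.run("add the item to the cart")), so tokens are spent only on the runs where the selector has broken.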

6. Three end-to-end walkthroughs

6.1 Scraping top posts from Hacker News

Goal: feed the daily top 30 posts into an internal knowledge base.

Code: reuse the snippet in section 5.1, changing "top 5" in the task string to "top 30".
Automation: add a cron job that runs the script every morning and pushes the JSON to your database.
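
The storage half of that job could look like the sketch below. It uses SQLite from the standard library as a stand-in for "your database"; the table name hn_top, the save_posts helper, and the date string are all assumptions made for illustration.

```python
import json
import sqlite3


def save_posts(conn, day, posts):
    """Store one day's posts as a JSON blob, replacing any earlier run for that day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hn_top (day TEXT PRIMARY KEY, posts TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO hn_top (day, posts) VALUES (?, ?)",
        (day, json.dumps(posts)),
    )
    conn.commit()


# Sample record copied from the output in section 5.1.
posts = [{
    "title": "Show HN: A pocket-sized E-ink terminal",
    "url": "https://github.com/foo/bar",
    "points": 512,
    "author": "baz",
    "comments_count": 97,
}]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
save_posts(conn, "2025-01-01", posts)
save_posts(conn, "2025-01-01", posts)  # re-running the job does not duplicate rows
print(conn.execute("SELECT COUNT(*) FROM hn_top").fetchone()[0])  # prints: 1
```

With a real run, and assuming Pydantic v2, the list could be built with [p.model_dump() for p in response.answer.posts]. A crontab entry along the lines of 0 7 * * * python /path/to/your_script.py (path hypothetical) would then trigger it every morning.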

6.2 Uploading a PDF and downloading the receipt

Goal: file tax reports without manual clicks.

from notte_sdk import NotteClient

cli = NotteClient()
storage = cli.FileStorage()

storage.upload("/tmp/report.pdf")

with cli.Session(storage=storage) as session:
    agent = cli.Agent(session=session, max_steps=10)
    agent.run(
        "log in to the tax portal, upload report.pdf, and download the receipt into storage"
    )

receipts = storage.list(type="downloads")
if receipts:
    storage.download(file_name=receipts[0], local_dir="./archive")
else:
    print("No receipt was downloaded; check the agent run.")

6.3 Bulk form submission with auto-generated identities

Goal: generate 100 test votes for a conference feedback form.

Approach: loop over cli.Persona() and submit once per identity.
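
The loop can be sketched as follows, reusing only the calls already shown in section 5.3. submit_for_each_persona is a hypothetical helper, and running the sessions headless is an assumption made here to suit batch jobs.

```python
def submit_for_each_persona(cli, form_url, count, task):
    """Create `count` personas one at a time and run `task` once per identity."""
    results = []
    for _ in range(count):
        # Each iteration gets a fresh persona and a fresh browser session.
        with cli.Persona() as persona:
            with cli.Session(headless=True) as session:
                agent = cli.Agent(session=session, persona=persona, max_steps=15)
                results.append(agent.run(task, url=form_url))
    return results
```

Calling submit_for_each_persona(cli, "https://forms.google.com/your-form-url", 100, "fill in the feedback form using the persona's details") would then produce the 100 submissions sequentially; start with a small count to check the form behaves as expected.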


7. Benchmarks: why Notte finishes tasks in half the time

The maintainers ran 100 public tasks and compared three frameworks (raw data in the README):

| Framework | Self-reported success | Third-party eval | Median time | Reliability |
|---|---|---|---|---|
| Notte | 86.2 % | 79.0 % | 47 s | 96.6 % |
| Browser-Use | 77.3 % | 60.2 % | 113 s | 83.3 % |
Convergence