Building Trustworthy Web-Automation Agents in 15 Minutes with Notte
“I need AI to scrape job posts for me, but CAPTCHAs keep blocking the log-in.”
“Our team has to pull data from hundreds of supplier sites. Old-school crawlers break every time the layout changes, while pure AI is too expensive. Is there a middle ground?”
If either sentence sounds familiar, this article is for you.
Table of Contents
- What exactly is Notte, and why should you care?
- Five-minute install and first run
- Local quick win: let an agent scroll through cat memes on Google Images
- Taking it to the cloud: managed browsers, auto-CAPTCHA, and proxies
- Core features, plain and simple
  - Structured output: turn any page into Python objects
  - Vault: enterprise-grade credential storage
  - Persona: disposable e-mail, phone, and 2FA in one call
  - Stealth: built-in CAPTCHA solving and proxy rotation
  - Hybrid workflows: scripting plus AI to cut costs
- Three end-to-end walkthroughs
  - Scraping top posts from Hacker News
  - Uploading a PDF and downloading the receipt
  - Bulk form submission with auto-generated identities
- Benchmarks: why Notte finishes tasks in half the time
- Frequently asked questions
- Next steps and further reading
1. What exactly is Notte, and why should you care?
In one line:
Notte is a full-stack framework that stitches traditional web-automation scripts and large-language-model reasoning together.
The goal is to let you finish any browser-based task with the least code, the lowest bill, and the highest reliability.
It ships in two layers:
Component | Open-source core | Managed service (recommended) | Typical use case |
---|---|---|---|
Browser session | Local Playwright | Cloud with auto-CAPTCHA + proxy | You do not want to run Chrome yourself |
Agent | Python SDK calls any LLM | Same plus identities, files, cookies | You need advanced credential handling |
Key selling points
- Cheaper: deterministic steps stay in code; the LLM is invoked only when reasoning is required. Internal tests show token cost savings above 50%.
- Stable: the managed fleet carries CAPTCHA solvers, residential proxies, and anti-detection patches.
- Faster: median task duration is 47 s versus 113 s for the closest open-source alternative.
- Simpler: write locally, then swap `notte` for `cli` to move to the cloud; no other change is required.
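To make the "Faster" bullet concrete, the two medians quoted above can be turned into a speedup factor and a time-saved fraction. This is plain arithmetic on the article's own figures, not a new measurement; the variable names are only illustrative.

```python
# Median task durations quoted in the "Faster" bullet, in seconds.
notte_median_s = 47
alternative_median_s = 113

# How many times faster the median Notte task finishes.
speedup = alternative_median_s / notte_median_s
# Fraction of wall-clock time saved relative to the alternative.
time_saved = 1 - notte_median_s / alternative_median_s

print(f"speedup: {speedup:.1f}x, time saved: {time_saved:.0%}")
# prints: speedup: 2.4x, time saved: 58%
```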
2. Five-minute install and first run
System requirements
- Python 3.11 or newer
- A computer you control (laptop, on-prem server, or cloud VM)
Steps
# 1) Install the package
pip install notte
# 2) Install the browser (skip if you plan to use the cloud only)
patchright install --with-deps chromium
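# 3) Provide an LLM key in .env
# Any OpenAI-compatible endpoint will work. The variable name must match the
# provider of the model you run: the examples below use a Gemini model, so add
# GEMINI_API_KEY as well (or switch reasoning_model to an OpenAI model).
echo "OPENAI_API_KEY=sk-xxx" >> .env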
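Before the first run, it can help to confirm that the interpreter and the `.env` file meet the requirements above. The following is a minimal sketch using only the standard library; `parse_env_text` and `preflight` are hypothetical helper names, not part of Notte, and the parser handles only simple `KEY=VALUE` lines.

```python
import sys
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(env_text: str, required_key: str, min_python=(3, 11)) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    if not parse_env_text(env_text).get(required_key):
        problems.append(f"{required_key} is missing or empty in .env")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    # Key name taken from step 3; change it to match your LLM provider.
    for problem in preflight(text, "OPENAI_API_KEY"):
        print("preflight:", problem)
```

Running the script prints nothing when both checks pass, which makes it easy to chain in front of the smoke test.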
Quick smoke test
Save the snippet below as `quick_test.py` and run it.
A browser window should open, navigate to https://example.com, and close after five seconds.
import notte
from dotenv import load_dotenv

load_dotenv()

with notte.Session(headless=False) as session:
    agent = notte.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=5,
    )
    agent.run("visit https://example.com and take a screenshot")
3. Local quick win: let an agent scroll through cat memes on Google Images
import notte
from dotenv import load_dotenv

load_dotenv()

with notte.Session(headless=False) as session:
    agent = notte.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=30,
    )
    response = agent.run(
        "search for cat memes on Google Images and scroll down three full screens"
    )
    print(response.answer)
What actually happens?
- A Chromium window starts.
- The agent types https://images.google.com in the address bar.
- It fills the search box with “cat memes” and presses Enter.
- It scrolls three times, waits for images to load, and summarizes the result in plain English.
4. Taking it to the cloud: managed browsers, auto-CAPTCHA, and proxies
If you prefer not to babysit Chrome, switch to the managed service in three steps:
- Register at Notte Console and copy your API key.
- Replace `import notte` with `from notte_sdk import NotteClient`.
- Create a client (`cli = NotteClient(...)`) and prefix every object with `cli.`, for example `cli.Session` and `cli.Agent`.
Example:
from notte_sdk import NotteClient

cli = NotteClient(api_key="nt-xxx")

with cli.Session(headless=False) as session:
    agent = cli.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=30,
    )
    agent.run("scroll through cat memes on Google Images")
Local vs. cloud comparison
Dimension | Open-source core | Managed service |
---|---|---|
Browser | You maintain | Fully hosted |
CAPTCHA | Manual | Automatic |
Proxy | DIY | Residential rotation built-in |
Concurrency | Single machine | Horizontal scaling |
Cost | Free | Pay-as-you-go |
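Because the local module and the cloud client expose the same `Session` and `Agent` names, the choice between them can be made at run time. The sketch below assumes the API key lives in an environment variable called `NOTTE_API_KEY` (a name chosen for illustration); `choose_backend` and `make_entrypoint` are hypothetical helpers, not part of the SDK.

```python
import os


def choose_backend(env: dict[str, str]) -> str:
    """Return "cloud" when an API key is configured, otherwise "local"."""
    return "cloud" if env.get("NOTTE_API_KEY") else "local"


def make_entrypoint(env: dict[str, str]):
    """Return the object that exposes Session and Agent for the chosen backend."""
    if choose_backend(env) == "cloud":
        # Imported lazily so local-only setups do not need the cloud SDK.
        from notte_sdk import NotteClient
        return NotteClient(api_key=env["NOTTE_API_KEY"])
    # The local module offers the same Session/Agent names as the client.
    import notte
    return notte


if __name__ == "__main__":
    print("selected backend:", choose_backend(dict(os.environ)))
```

With this in place, the rest of a script can call `entry = make_entrypoint(dict(os.environ))` and then use `entry.Session(...)` and `entry.Agent(...)` without caring where the browser runs.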
5. Core features, plain and simple
5.1 Structured output: turn any page into Python objects
Pain point: traditional crawlers rely on brittle XPath/CSS selectors.
Notte’s approach: tell the agent which fields you need and let it figure out the rest.
from notte_sdk import NotteClient
from pydantic import BaseModel
from typing import List


class HackerNewsPost(BaseModel):
    title: str
    url: str
    points: int
    author: str
    comments_count: int


class TopPosts(BaseModel):
    posts: List[HackerNewsPost]


cli = NotteClient()

with cli.Session(headless=False, browser_type="firefox") as session:
    agent = cli.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.5-flash",
        max_steps=15,
    )
    response = agent.run(
        task="go to news.ycombinator.com and extract the top 5 posts",
        response_format=TopPosts,
    )
    print(response.answer.posts[0])
Sample output
HackerNewsPost(
    title='Show HN: A pocket-sized E-ink terminal',
    url='https://github.com/foo/bar',
    points=512,
    author='baz',
    comments_count=97
)
5.2 Vault: enterprise-grade credential storage
Scenario: you need to log in to an internal dashboard but do not want plain-text passwords in the repo.
from notte_sdk import NotteClient

cli = NotteClient()

with cli.Vault() as vault, cli.Session(headless=False) as session:
    # Placeholder values for illustration; load real secrets from the
    # environment or a secrets manager instead of hard-coding them.
    vault.add_credentials(
        url="https://x.com",
        username="you@corp.com",
        password="SuperSecret123",
    )
    agent = cli.Agent(session=session, vault=vault, max_steps=10)
    agent.run("log in to Twitter and open the messages tab")
Benefits
- Credentials are AES-encrypted at rest.
- Reused automatically for the same origin.
- Supports TOTP, SSO, and MFA tokens.
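To keep the scenario's promise of no plain-text passwords in the repo, the values passed to `vault.add_credentials` can come from environment variables. This is a minimal sketch: `load_credentials` and the `<PREFIX>_URL` / `<PREFIX>_USERNAME` / `<PREFIX>_PASSWORD` naming scheme are assumptions made for illustration, not a Notte convention.

```python
def load_credentials(env: dict[str, str], prefix: str) -> dict[str, str]:
    """Build keyword arguments for vault.add_credentials from environment variables.

    Expects <PREFIX>_URL, <PREFIX>_USERNAME and <PREFIX>_PASSWORD (hypothetical names).
    """
    fields = ("url", "username", "password")
    missing = [
        f"{prefix}_{field.upper()}"
        for field in fields
        if not env.get(f"{prefix}_{field.upper()}")
    ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {field: env[f"{prefix}_{field.upper()}"] for field in fields}


if __name__ == "__main__":
    demo_env = {
        "X_URL": "https://x.com",
        "X_USERNAME": "you@corp.com",
        "X_PASSWORD": "read-from-your-secret-store",
    }
    creds = load_credentials(demo_env, "X")
    print(sorted(creds))  # only the field names; values are deliberately not printed
```

In a real script the call would look like `vault.add_credentials(**load_credentials(dict(os.environ), "X"))`, so the secret never appears in source control.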
5.3 Persona: disposable e-mail, phone, and 2FA in one call
Scenario: load-testing a registration funnel requires 100 unique accounts.
from notte_sdk import NotteClient

cli = NotteClient()

with cli.Persona(create_phone_number=True) as persona:
    with cli.Session(browser_type="firefox", headless=False) as session:
        agent = cli.Agent(session=session, persona=persona, max_steps=15)
        agent.run(
            "open the Google Form and RSVP yes using the persona’s details",
            url="https://forms.google.com/your-form-url",
        )
Under the hood
- A random name, e-mail, and phone are generated.
- If SMS verification is required, the platform receives the code and fills it in.
- Everything is discarded after the session unless you explicitly save it.
5.4 Stealth: built-in CAPTCHA solving and proxy rotation
Built-in proxy + solver
from notte_sdk import NotteClient

cli = NotteClient()

with cli.Session(
    solve_captchas=True,
    proxies=True,  # rotates US residential IPs
    browser_type="firefox",
    headless=False,
) as session:
    agent = cli.Agent(session=session, max_steps=5)
    agent.run("solve the CAPTCHA demo at https://www.google.com/recaptcha/api2/demo")
Custom proxy
from notte_sdk import NotteClient
from notte_sdk.types import ExternalProxy

cli = NotteClient()

proxy = ExternalProxy(
    server="http://proxy.corp.com:8080",
    username="corpUser",
    password="corpPass",
)

with cli.Session(proxies=[proxy]) as session:
    agent = cli.Agent(session=session, max_steps=5)
    agent.run("navigate through the corporate proxy")
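Proxy settings are often handed around as a single URL rather than three separate values. The sketch below splits such a URL into the `server`, `username`, and `password` fields used in the custom-proxy example; `proxy_fields` is a hypothetical helper built on the standard library, and it assumes any special characters in the credentials are percent-encoded.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def proxy_fields(proxy_url: str) -> dict[str, Optional[str]]:
    """Split 'scheme://user:pass@host:port' into server, username and password."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a valid proxy URL: {proxy_url!r}")
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return {
        "server": f"{parts.scheme}://{host}",
        # Credentials may be percent-encoded inside a URL, so decode them.
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


if __name__ == "__main__":
    fields = proxy_fields("http://corpUser:corpPass@proxy.corp.com:8080")
    print(fields["server"], fields["username"])
```

The result can then be unpacked into the constructor, for example `ExternalProxy(**proxy_fields(url))`, with the URL itself read from an environment variable of your choosing.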
5.5 Hybrid workflows: scripting plus AI to cut costs
Idea: keep deterministic navigation in code; bring in the agent only for reasoning-heavy steps.
from notte_sdk import NotteClient
import time

cli = NotteClient()

with cli.Session(headless=False, perception_type="fast") as page:
    # Script: deterministic navigation
    page.execute(
        type="goto",
        value="https://www.quince.com/women/organic-stretch-cotton-chino-short",
    )
    page.observe()

    # Agent: reason about color and size
    agent = cli.Agent(session=page)
    agent.run("select ivory color in size 6")

    # Script: deterministic checkout
    page.execute(type="click", selector='button[name="ADD TO CART"]')
    page.execute(type="click", selector='button[name="CHECKOUT"]')
    time.sleep(5)
Outcome
- Deterministic steps cost zero tokens.
- The agent handles only the messy parts, improving reliability.
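A related pattern is to try the scripted step first and call the agent only when the script fails, for instance after a layout change breaks a selector. The sketch below shows that fallback with plain callables so it can be tried without a browser; `scripted_then_agent` is a hypothetical helper, and the broad `except` reflects that this article does not say which exception a failed selector raises.

```python
from typing import Callable


def scripted_then_agent(
    scripted_step: Callable[[], None],
    agent_step: Callable[[], None],
) -> str:
    """Run the cheap scripted step; fall back to the agent only if it raises.

    Returns "script" or "agent" so callers can log how often the LLM was needed.
    """
    try:
        scripted_step()
        return "script"
    except Exception:
        # Broad catch on purpose: the exact error type for a missing
        # selector is an unknown here.
        agent_step()
        return "agent"


if __name__ == "__main__":
    def broken_selector() -> None:
        raise RuntimeError("selector not found")

    used = [
        scripted_then_agent(lambda: None, lambda: None),
        scripted_then_agent(broken_selector, lambda: None),
    ]
    print(used)  # prints: ['script', 'agent']
```

With the objects from the snippet above, the two callables could be `lambda: page.execute(type="click", selector='button[name="ADD TO CART"]')` and `lambda: cli.Agent(session=page).run("add the item to the cart")`. Tokens are then spent only on the runs where the selector no longer matches.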
6. Three end-to-end walkthroughs
6.1 Scraping top posts from Hacker News
Goal: feed the daily top 30 posts into an internal knowledge base.
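Code: reuse the snippet in section 5.1, changing the task to ask for the top 30 posts (and raising max_steps if the agent runs out of steps).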
Automation: add a cron job that runs the script every morning and pushes the JSON to your database.
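The "push the JSON to your database" step could look like the sketch below, which uses SQLite from the standard library as a stand-in for whatever database you run. The table name `hn_posts` and the helper `save_posts` are assumptions for illustration; the input is a list of plain dicts with the same fields as `HackerNewsPost`.

```python
import sqlite3
from datetime import date


def save_posts(conn: sqlite3.Connection, posts: list[dict], day: str) -> int:
    """Store one row per post for the given day and return that day's row count.

    Re-running the job for the same day does not create duplicates.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hn_posts (
               day TEXT, url TEXT, title TEXT, points INTEGER,
               author TEXT, comments_count INTEGER,
               PRIMARY KEY (day, url))"""
    )
    rows = [
        (day, p["url"], p["title"], p["points"], p["author"], p["comments_count"])
        for p in posts
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO hn_posts VALUES (?, ?, ?, ?, ?, ?)", rows
        )
    cursor = conn.execute("SELECT COUNT(*) FROM hn_posts WHERE day = ?", (day,))
    return cursor.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path in a real cron job
    sample = [{
        "title": "Show HN: A pocket-sized E-ink terminal",
        "url": "https://github.com/foo/bar",
        "points": 512, "author": "baz", "comments_count": 97,
    }]
    today = date.today().isoformat()
    print(save_posts(conn, sample, today))
    print(save_posts(conn, sample, today))  # same count: the duplicate is ignored
```

With Pydantic v2 models, the dicts can be produced with `[post.model_dump() for post in response.answer.posts]`. The composite primary key makes the cron job safe to re-run after a failure.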
6.2 Uploading a PDF and downloading the receipt
Goal: file tax reports without manual clicks.
from notte_sdk import NotteClient

cli = NotteClient()
storage = cli.FileStorage()
storage.upload("/tmp/report.pdf")

with cli.Session(storage=storage) as session:
    agent = cli.Agent(session=session, max_steps=10)
    agent.run(
        "log in to the tax portal, upload report.pdf, and download the receipt into storage"
    )

receipts = storage.list(type="downloads")
storage.download(file_name=receipts[0], local_dir="./archive")
6.3 Bulk form submission with auto-generated identities
Goal: generate 100 test votes for a conference feedback form.
Approach: loop over `cli.Persona()` and submit once per identity.
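The loop described above might be sketched as follows. The batch logic (`run_bulk`) is independent of Notte and can be exercised with a stand-in; `make_notte_submitter` mirrors the calls from section 5.3, and its use of a `success` attribute on the response is an assumption, as are both helper names.

```python
from typing import Callable


def run_bulk(count: int, submit_once: Callable[[int], bool]) -> dict[str, int]:
    """Call submit_once(index) `count` times and tally the outcomes.

    A failure (False or an exception) is counted and the loop continues,
    so one bad run does not abort the whole batch.
    """
    tally = {"ok": 0, "failed": 0}
    for index in range(count):
        try:
            succeeded = submit_once(index)
        except Exception:
            succeeded = False
        tally["ok" if succeeded else "failed"] += 1
    return tally


def make_notte_submitter(cli, form_url: str) -> Callable[[int], bool]:
    """Build a submit_once that creates one fresh persona per submission."""
    def submit_once(index: int) -> bool:
        with cli.Persona() as persona, cli.Session() as session:
            agent = cli.Agent(session=session, persona=persona, max_steps=15)
            response = agent.run(
                "open the form and submit feedback using the persona's details",
                url=form_url,
            )
            # Assumption: the response exposes a boolean `success` attribute.
            return bool(getattr(response, "success", False))
    return submit_once


if __name__ == "__main__":
    # Stand-in submitter so the tally logic can be tried without a browser.
    print(run_bulk(5, lambda i: i != 3))  # prints: {'ok': 4, 'failed': 1}
```

For the real run, `run_bulk(100, make_notte_submitter(cli, form_url))` would create 100 identities one after another; running them sequentially keeps the example simple at the cost of wall-clock time.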
7. Benchmarks: why Notte finishes tasks in half the time
The maintainers ran 100 public tasks and compared three frameworks (raw data in the README):
Framework | Self-reported success | Third-party eval | Median time | Reliability |
---|---|---|---|---|
Notte | 86.2 % | 79.0 % | 47 s | 96.6 % |
Browser-Use | 77.3 % | 60.2 % | 113 s | 83.3 % |
Convergence |