Build Your Own Web-Browsing AI Agent with MCP and OpenAI gpt-oss

A hands-on guide for junior developers, content creators, and curious minds

Table of Contents

  1. Why This Guide Exists
  2. What You Will Build
  3. Background: The MCP Ecosystem
  4. Prerequisites: Tools & Accounts
  5. Project 1: Local Browser Agent
  6. Project 2: Hugging Face MCP Hub
  7. Frequently Asked Questions
  8. Next Steps & Roadmap

Why This Guide Exists

If you have ever wished for an assistant that can open web pages, grab the latest AI model rankings, and even create images for your blog—all without you touching a browser—this tutorial is for you.
We will use MCP (Model Context Protocol) and OpenAI gpt-oss-120b served through Fireworks AI. All the code already lives in the /browser-agent and /hf-mcp-server directories of the repository, so you can copy, paste, and start experimenting immediately.


What You Will Build

Project | Core Skill | End Result
Local Browser Agent | Web automation | An agent that browses, searches, screenshots, and reports back in plain English
Hugging Face MCP Hub | AI-space orchestration | An agent that can call thousands of Hugging Face Spaces (text-to-image, text-to-video, etc.)

No GPU is required; everything runs either locally or on managed services.


Background: The MCP Ecosystem

1. What is MCP?

MCP is a lightweight protocol that lets any language model use external tools through a standardized interface.
Think of it as a USB-C port for AI: one plug, many devices.

2. Three Moving Parts

Part | Real-World Analogy | Responsibility
MCP Server | Browser plug-in | Exposes one specific capability (e.g., open Chrome, call an API)
MCP Client | Browser itself | Decides which plug-ins to load, passes user requests, returns results
Agent | End user | Writes or speaks a request in natural language

3. Why not use plain Function Calling?

Function Calling is tied to a single provider. MCP is provider-agnostic: today you run gpt-oss, tomorrow you swap to another model—no code change required.


Prerequisites: Tools & Accounts

Item | Version | Purpose
Node.js | 18 or higher | Runs the Playwright MCP Server
Python | 3.9 or higher | Runs the Tiny Agents client (optional)
Git | any | Clones the repository
Hugging Face Token | free | Authenticates you for both demos

Get Your Hugging Face Token

  1. Visit https://huggingface.co/settings/tokens
  2. Create a token with Write permission
  3. In your terminal, run:

     huggingface-cli login

     Paste the token when prompted.


Project 1: Local Browser Agent

Goal: In under 15 minutes, have an agent that can open websites, search, and take screenshots for you.

Step 0: Quick Checklist

  • [ ] Node 18+ installed
  • [ ] Hugging Face token saved
  • [ ] Terminal open in the project root

Step 1: Explore the Provided Files

All required files live in /browser-agent:

  • agent.json – tells the Tiny Agents client which model and tools to use
  • PROMPT.md – optional system prompt that tells the AI to plan, reflect, and never guess
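
The repository ships its own PROMPT.md, so you do not need to write one. Purely as a hypothetical illustration of what "plan, reflect, and never guess" can look like, a minimal system prompt might read:

You are a careful browsing assistant.
- Before acting, write a short plan of the steps you will take.
- After each tool call, check whether the result matches the plan and adjust if it does not.
- If a page does not contain the requested information, say so plainly; never guess or invent content.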

Step 2: Understand agent.json

Open browser-agent/agent.json:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  ]
}

Key | Meaning
model | The exact model string expected by Fireworks AI
provider | Where the inference actually happens
servers | A single entry that launches the Playwright MCP Server via Node

The Playwright MCP Server exposes browser automation actions such as:

  • navigate(url)
  • screenshot()
  • click(selector)
  • type(selector, text)
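
Under the hood, each of these actions is invoked through MCP's standard tools/call request over JSON-RPC. As a rough, hypothetical illustration (the exact tool names and argument shapes exposed by the Playwright MCP Server may differ), a navigation call looks roughly like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "navigate",
    "arguments": { "url": "https://huggingface.co/models" }
  }
}

The Tiny Agents client assembles and sends these messages for you; you never write them by hand.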

Step 3: Install the Tiny Agents Client

If you prefer Python:

pip install -U "huggingface_hub[mcp]>=0.32.0"

If you prefer Node:

npm install -g @huggingface/tiny-agents

Step 4: Run the Agent

# Python
tiny-agents run ./browser-agent

# Node
npx @huggingface/tiny-agents run ./browser-agent

The first run downloads Chromium through Playwright; grab a coffee.

Step 5: Talk to Your Agent

Example prompt:

“Please open https://huggingface.co/models, sort by ‘Most Downloads this week’, grab the top 10 model names, and save a screenshot of the list.”

The agent will:

  1. Launch a headless browser
  2. Navigate to the URL
  3. Parse the table
  4. Return a neat list plus a screenshot saved locally

Project 2: Hugging Face MCP Hub

Goal: Let your agent tap into thousands of AI Spaces on Hugging Face (text-to-image, text-to-video, audio, etc.).

Step 1: Register Spaces on Hugging Face

  1. Visit https://hf.co/mcp
  2. Click “Add” next to any Space you like.
    Popular picks:
    • evalstate/FLUX.1-Krea-dev – high-quality text-to-image
    • evalstate/ltx-video-distilled – text-to-video
  3. Note your User Access Token (same as before).

Step 2: Create a New Folder

Create a folder named hf-mcp-server and place the following agent.json inside it:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "inputs": [
    {
      "type": "promptString",
      "id": "hf-token",
      "description": "Your Hugging Face Token",
      "password": true
    }
  ],
  "servers": [
    {
      "type": "http",
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer ${input:hf-token}"
      }
    }
  ]
}

New Element | Purpose
inputs | Prompts you for the token at runtime (safer than hard-coding it)
servers | Points to Hugging Face’s MCP gateway, which forwards calls to the Spaces you registered

Step 3: Run It

# Python
tiny-agents run ./hf-mcp-server

# Node
npx @huggingface/tiny-agents run ./hf-mcp-server

Step 4: Prompt Examples

“Using FLUX.1, generate a 1024×1024 image of an astronaut eating ramen on the moon. Use cinematic lighting.”

The agent will:

  • Ask for confirmation
  • Call the Space
  • Return a shareable URL to the generated image

Frequently Asked Questions

Q1: Do I need a GPU?

No. Inference happens on Fireworks AI’s cloud or Hugging Face’s cloud.

Q2: Is my token safe?

Yes. The token is never written to disk; it lives only in memory and is masked when typed.

Q3: Can I run both projects at the same time?

Absolutely. Each project has its own agent.json and can be started in separate terminals.

Q4: How do I add a custom tool?

Write a small MCP Server in any language (templates are available in the MCP docs), then add one more entry under servers.
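
For example, if your custom server were published as an npm package (the package name below is purely hypothetical), the servers array in agent.json could grow to:

"servers": [
  {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest"]
  },
  {
    "type": "stdio",
    "command": "npx",
    "args": ["my-custom-mcp-server@latest"]
  }
]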

Q5: What if I want to use a different model?

Change the model and provider fields in agent.json. As long as the new provider supports MCP, no further code changes are needed.
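
As a sketch (the model and provider strings below are placeholders, not values tested in this guide), only the first two fields of agent.json change:

{
  "model": "some-org/some-other-model",
  "provider": "some-other-provider",
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  ]
}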

Q6: Can I run this on Windows?

Yes. Both Node and Python clients are cross-platform.


Next Steps & Roadmap

Time Investment | Action | Outcome
30 min | Re-run both demos with your own prompts | Solid muscle memory
1 day | Wrap your company’s REST API as an MCP Server | Internal AI assistant
1 week | Chain multiple agents | Browser agent gathers data, Hugging Face agent creates visuals, database agent stores results

Closing Thoughts

MCP turns the chaotic world of AI tools into tidy building blocks.
Today you connected two blocks—browser automation and Hugging Face Spaces—without changing a single line of server code.
Tomorrow you can snap in a database block, an email block, or a custom analytics block, and your agent will keep working exactly as before.
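
To make that concrete, nothing stops you from snapping both of today’s blocks into a single agent. An untested sketch that simply merges the two configurations from this guide:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "inputs": [
    {
      "type": "promptString",
      "id": "hf-token",
      "description": "Your Hugging Face Token",
      "password": true
    }
  ],
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    {
      "type": "http",
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer ${input:hf-token}"
      }
    }
  ]
}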

Clone the repo, run the commands, and start experimenting. The only limit is your imagination.
