Build Your Own Web-Browsing AI Agent with MCP and OpenAI gpt-oss

A hands-on guide for junior developers, content creators, and curious minds

Table of Contents

  1. Why This Guide Exists
  2. What You Will Build
  3. Background: The MCP Ecosystem
  4. Prerequisites: Tools & Accounts
  5. Project 1: Local Browser Agent
  6. Project 2: Hugging Face MCP Hub
  7. Frequently Asked Questions
  8. Next Steps & Roadmap

Why This Guide Exists

If you have ever wished for an assistant that can open web pages, grab the latest AI model rankings, and even create images for your blog—all without you touching a browser—this tutorial is for you.
We will use MCP (Model Context Protocol) and OpenAI gpt-oss-120b served through Fireworks AI. All the code already lives in the /browser-agent and /hf-mcp-server directories of the repository, so you can copy, paste, and start experimenting immediately.


What You Will Build

Project | Core Skill | End Result
Local Browser Agent | Web automation | An agent that browses, searches, screenshots, and reports back in plain English
Hugging Face MCP Hub | AI-space orchestration | An agent that can call thousands of Hugging Face Spaces (text-to-image, text-to-video, etc.)

No GPU is required; everything runs either locally or on managed services.


Background: The MCP Ecosystem

1. What is MCP?

MCP is a lightweight protocol that lets any language model use external tools through a standardized interface.
Think of it as a USB-C port for AI: one plug, many devices.

2. Three Moving Parts

Part | Real-World Analogy | Responsibility
MCP Server | Browser plug-in | Exposes one specific capability (e.g., open Chrome, call an API)
MCP Client | Browser itself | Decides which plug-ins to load, passes user requests, returns results
Agent | End user | Writes or speaks a request in natural language

3. Why not use plain Function Calling?

Function Calling is tied to a single provider. MCP is provider-agnostic: today you run gpt-oss, tomorrow you swap to another model—no code change required.


Prerequisites: Tools & Accounts

Item | Version | Purpose
Node.js | 18 or higher | Runs the Playwright MCP Server
Python | 3.9 or higher | Runs the Tiny Agents client (optional)
Git | any | Clones the repository
Hugging Face Token | free | Authenticates you for both demos

Get Your Hugging Face Token

  1. Visit https://huggingface.co/settings/tokens
  2. Create a token with Write permission
  3. In your terminal, run:

     huggingface-cli login

     Paste the token when prompted.


Project 1: Local Browser Agent

Goal: In under 15 minutes, have an agent that can open websites, search, and take screenshots for you.

Step 0: Quick Checklist

  • [ ] Node 18+ installed
  • [ ] Hugging Face token saved
  • [ ] Terminal open in the project root

Step 1: Explore the Provided Files

All required files live in /browser-agent:

  • agent.json – tells the Tiny Agents client which model and tools to use
  • PROMPT.md – optional system prompt that tells the AI to plan, reflect, and never guess
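
The repository ships its own PROMPT.md, so you do not need to write one. Purely as a hypothetical illustration of what "plan, reflect, and never guess" can look like, a minimal system prompt might read:

You are a careful browsing assistant.
- Before acting, write a short plan of the steps you will take.
- After each tool call, check whether the result matches the plan and adjust if it does not.
- If a page does not contain the requested information, say so plainly; never guess or invent content.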

Step 2: Understand agent.json

Open browser-agent/agent.json:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  ]
}

Key | Meaning
model | The exact model string expected by Fireworks AI
provider | Where the inference actually happens
servers | A single entry that launches the Playwright MCP Server via Node

The Playwright MCP Server exposes browser automation actions such as:

  • navigate(url)
  • screenshot()
  • click(selector)
  • type(selector, text)
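
Under the hood, each of these actions is invoked through MCP's standard tools/call request over JSON-RPC. As a rough, hypothetical illustration (the exact tool names and argument shapes exposed by the Playwright MCP Server may differ), a navigation call looks roughly like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "navigate",
    "arguments": { "url": "https://huggingface.co/models" }
  }
}

The Tiny Agents client assembles and sends these messages for you; you never write them by hand.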

Step 3: Install the Tiny Agents Client

If you prefer Python:

pip install -U "huggingface_hub[mcp]>=0.32.0"

If you prefer Node:

npm install -g @huggingface/tiny-agents

Step 4: Run the Agent

# Python
tiny-agents run ./browser-agent

# Node
npx @huggingface/tiny-agents run ./browser-agent

The first run downloads Chromium through Playwright; grab a coffee.

Step 5: Talk to Your Agent

Example prompt:

“Please open https://huggingface.co/models, sort by ‘Most Downloads this week’, grab the top 10 model names, and save a screenshot of the list.”

The agent will:

  1. Launch a headless browser
  2. Navigate to the URL
  3. Parse the table
  4. Return a neat list plus a screenshot saved locally

Project 2: Hugging Face MCP Hub

Goal: Let your agent tap into thousands of AI Spaces on Hugging Face (text-to-image, text-to-video, audio, etc.).

Step 1: Register Spaces on Hugging Face

  1. Visit https://hf.co/mcp
  2. Click “Add” next to any Space you like.
    Popular picks:
    • evalstate/FLUX.1-Krea-dev – high-quality text-to-image
    • evalstate/ltx-video-distilled – text-to-video
  3. Note your User Access Token (same as before).

Step 2: Create a New Folder

Create a folder named hf-mcp-server and place the following agent.json inside it:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "inputs": [
    {
      "type": "promptString",
      "id": "hf-token",
      "description": "Your Hugging Face Token",
      "password": true
    }
  ],
  "servers": [
    {
      "type": "http",
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer ${input:hf-token}"
      }
    }
  ]
}

New Element | Purpose
inputs | Prompts you for the token at runtime (safer than hard-coding it)
servers | Points to Hugging Face’s MCP gateway, which forwards calls to the Spaces you registered

Step 3: Run It

# Python
tiny-agents run ./hf-mcp-server

# Node
npx @huggingface/tiny-agents run ./hf-mcp-server

Step 4: Prompt Examples

“Using FLUX.1, generate a 1024×1024 image of an astronaut eating ramen on the moon. Use cinematic lighting.”

The agent will:

  • Ask for confirmation
  • Call the Space
  • Return a shareable URL to the generated image

Frequently Asked Questions

Q1: Do I need a GPU?

No. Inference happens on Fireworks AI’s cloud or Hugging Face’s cloud.

Q2: Is my token safe?

Yes. The token is never written to disk; it lives only in memory and is masked when typed.

Q3: Can I run both projects at the same time?

Absolutely. Each project has its own agent.json and can be started in separate terminals.

Q4: How do I add a custom tool?

Write a small MCP Server in any language (templates are available in the MCP docs), then add one more entry under servers.
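
For example, if your custom server were published as an npm package (the package name below is purely hypothetical), the servers array in agent.json could grow to:

"servers": [
  {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest"]
  },
  {
    "type": "stdio",
    "command": "npx",
    "args": ["my-custom-mcp-server@latest"]
  }
]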

Q5: What if I want to use a different model?

Change the model and provider fields in agent.json. As long as the new provider supports MCP, no further code changes are needed.
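
As a sketch (the model and provider strings below are placeholders, not values tested in this guide), only the first two fields of agent.json change:

{
  "model": "some-org/some-other-model",
  "provider": "some-other-provider",
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  ]
}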

Q6: Can I run this on Windows?

Yes. Both Node and Python clients are cross-platform.


Next Steps & Roadmap

Time Investment | Action | Outcome
30 min | Re-run both demos with your own prompts | Solid muscle memory
1 day | Wrap your company’s REST API as an MCP Server | Internal AI assistant
1 week | Chain multiple agents | Browser agent gathers data, Hugging Face agent creates visuals, database agent stores results

Closing Thoughts

MCP turns the chaotic world of AI tools into tidy building blocks.
Today you connected two blocks—browser automation and Hugging Face Spaces—without changing a single line of server code.
Tomorrow you can snap in a database block, an email block, or a custom analytics block, and your agent will keep working exactly as before.
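
To make that concrete, nothing stops you from snapping both of today’s blocks into a single agent. An untested sketch that simply merges the two configurations from this guide:

{
  "model": "openai/gpt-oss-120b",
  "provider": "fireworks-ai",
  "inputs": [
    {
      "type": "promptString",
      "id": "hf-token",
      "description": "Your Hugging Face Token",
      "password": true
    }
  ],
  "servers": [
    {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    {
      "type": "http",
      "url": "https://huggingface.co/mcp",
      "headers": {
        "Authorization": "Bearer ${input:hf-token}"
      }
    }
  ]
}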

Clone the repo, run the commands, and start experimenting. The only limit is your imagination.
