In an era of information overload, quickly accessing accurate search results has become the foundation for many work and research tasks. However, traditional methods of obtaining search engine results often face limitations—either they depend on paid APIs or struggle with anti-scraping mechanisms. The tool we’ll explore today solves these problems: it’s a Node.js tool built on Playwright that enables local Google searches, bypasses anti-scraping restrictions, and even provides real-time search capabilities for AI assistants.
What Problems Does This Tool Solve?
If you frequently need to retrieve Google search results in bulk, you’ve likely encountered these frustrations: paid SERP (Search Engine Results Page) APIs are costly and have call limits; custom-built scrapers get detected repeatedly, leading to temporary account bans; and finding a suitable tool to help AI assistants access real-time information feels impossible.
This Google search tool is designed specifically to address these issues. It runs entirely locally, no third-party API services required. By simulating real user behavior, it avoids anti-scraping detection. Plus, it integrates seamlessly with AI assistants like Claude, giving AI the ability to search in real time.
Core Features: Why It’s Worth Trying
1. Local Alternative to Paid SERP APIs
You no longer need to pay for search engine result APIs or worry about hitting call limits. All search operations happen on your local device, so data acquisition costs are nearly zero—and you have full control over the process.
2. Intelligent Bypass of Anti-Robot Detection
Google’s anti-scraping mechanisms are growing more sophisticated, and ordinary automation tools get flagged easily. This tool uses a combination of techniques to counter this:
- Dynamically manages browser fingerprints, making each request appear to come from a real user on a different device
- Automatically saves and restores browser state (e.g., cookies, login data) to reduce repeated verifications
- Switches from headless mode to headed mode automatically when verification pages appear, making it easy to complete manual checks
- Randomizes device models, region settings, and other parameters to lower the risk of being labeled a robot
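The randomization idea in the list above can be sketched in a few lines. This is a hypothetical illustration, not the tool's actual code: the arrays and the `pickFingerprint()` helper are invented for the example, and a real implementation would feed values like these into Playwright's browser context options.

```typescript
// Illustrative sketch of per-session fingerprint randomization.
// The arrays and pickFingerprint() helper are hypothetical, not the tool's code.
interface Fingerprint {
  deviceName: string;
  locale: string;
  timezoneId: string;
}

const DEVICES = ["Desktop Chrome", "Desktop Firefox", "Desktop Safari"];
const LOCALES = ["en-US", "en-GB", "de-DE"];
const TIMEZONES = ["America/New_York", "Europe/London", "Europe/Berlin"];

// Pick a random element from a list.
function pick<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// Assemble a randomized fingerprint so consecutive runs look like different users.
function pickFingerprint(): Fingerprint {
  return {
    deviceName: pick(DEVICES),
    locale: pick(LOCALES),
    timezoneId: pick(TIMEZONES),
  };
}

const fp = pickFingerprint();
console.log(fp.deviceName, fp.locale, fp.timezoneId);
```

Because each session draws fresh values, two consecutive runs rarely present the same combination of device, locale, and timezone.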
3. Comprehensive Result Processing Capabilities
Beyond extracting structured data like titles, links, and snippets, it also:
- Retrieves the raw HTML of search results pages (automatically removing CSS and JavaScript for easier analysis)
- Captures full-page screenshots automatically to preserve visual records
- Outputs results in JSON format for easy post-processing
4. Seamless Integration with AI Assistants
Through a Model Context Protocol (MCP) server, it can directly provide real-time search capabilities to AI assistants like Claude. This means when AI needs up-to-date information, you won’t have to search manually—it will call this tool automatically to get results.
5. Fully Open-Source and Free
All code is transparent and accessible. You can modify features or extend compatibility to other search engines based on your needs, with no usage restrictions.
Technical Specifications: How It Works
This tool is built on a modern tech stack that balances stability and scalability:
- Programming Language: TypeScript, which provides type safety to reduce code errors
- Browser Automation: Powered by Playwright, supporting multiple browser engines (Chromium, Firefox, WebKit)
- Command-Line Support: Search keywords and custom parameters can be entered directly via the command line
- Output Format: Defaults to JSON, including key information like search queries, titles, links, and snippets
- Operating Modes: Supports headless mode (runs in the background) and headed mode (displays the browser interface for debugging)
- Logging System: Provides detailed logs to simplify troubleshooting
- State Management: Saves and restores browser state to minimize interception by anti-scraping mechanisms
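The state-management idea above can be sketched with plain Node.js file I/O. The real tool persists full browser state through Playwright; the file format, path, and helper names below are assumptions made for illustration only.

```typescript
// Sketch of the state-file idea: persist session data between runs so
// verifications aren't repeated. File format and names are illustrative only;
// the actual tool saves Playwright browser state, not this simplified shape.
import { writeFileSync, readFileSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface BrowserState {
  cookies: { name: string; value: string }[];
}

const statePath = join(tmpdir(), "google-search-state-demo.json");

// Save the current session state to disk.
function saveState(state: BrowserState): void {
  writeFileSync(statePath, JSON.stringify(state));
}

// Restore a previously saved state, or return null on first run.
function loadState(): BrowserState | null {
  if (!existsSync(statePath)) return null;
  return JSON.parse(readFileSync(statePath, "utf8")) as BrowserState;
}

saveState({ cookies: [{ name: "session", value: "example" }] });
const restored = loadState();
console.log(restored?.cookies[0].name);
```

Reusing such a file across runs is what lets the tool skip repeated verification checks.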
Installation Guide: How to Deploy It on Your Device
Whether you use Windows, Mac, or Linux, follow these steps to install the tool:
Basic Installation Steps
1. Clone the Repository
Open your terminal (Command Prompt or PowerShell for Windows), and run the following commands:
git clone https://github.com/web-agent-master/google-search.git
cd google-search
2. Install Dependencies
Three package managers are supported: npm, yarn, and pnpm. Choose the one you use regularly:
# Using npm
npm install
# Or using yarn
yarn
# Or using pnpm (recommended for efficiency)
pnpm install
3. Compile TypeScript Code
The tool is written in TypeScript, so it must be compiled to JavaScript first:
# Using npm
npm run build
# Or using yarn
yarn build
# Or using pnpm
pnpm build
4. Link Globally (Optional but Recommended)
To use the command-line tool from any directory, run the link command:
# Using npm
npm link
# Or using yarn
yarn link
# Or using pnpm
pnpm link
Special Notes for Windows Users
Windows users don’t need to worry about compatibility—this tool includes Windows-specific optimizations:
- Provides .cmd files to ensure normal operation in Command Prompt and PowerShell
- Automatically stores log files in the system's temporary directory (instead of the /tmp directory used on Linux)
- Optimizes process signal handling so the server shuts down properly
- Supports Windows path separators (\); no manual path conversion required
If you encounter browser installation failures on first run, try launching the terminal as an administrator and re-running the installation commands.
Usage Guide: From Basic Searches to Advanced Features
Using It as a Command-Line Tool
The most straightforward way to use the tool is via the command line. Here’s the basic syntax:
# Simple search
google-search "your search keyword"
# Example: Search for "latest artificial intelligence research"
google-search "latest artificial intelligence research"
Customizing Search Parameters
To adjust result limits, timeout duration, and other settings, pass options alongside the query. The options referenced in this guide include:
- --limit: caps the number of results returned
- --timeout: sets the page-load timeout
- --no-headless: displays the browser window (useful for debugging)
- --state-file: specifies where the browser state file is saved
- --get-html: returns the raw HTML of the results page instead of parsed results
- --save-html: saves the page HTML (and a screenshot) to disk
Development and Debugging Modes
If you need to modify the code or troubleshoot issues, these commands are useful:
# Run in development mode (real-time compilation)
pnpm dev "search keyword"
# Debug mode (displays browser interface to observe operations)
pnpm debug "search keyword"
# Run temporarily with npx (no global installation needed)
npx google-search-cli "search keyword"
Example Output
By default, the tool returns structured results in JSON format:
{
"query": "deepseek",
"results": [
{
"title": "DeepSeek",
"link": "https://www.deepseek.com/",
"snippet": "DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1. Available on web, app, and API. Click for details. Into ..."
},
{
"title": "deepseek-ai/DeepSeek-V3",
"link": "https://github.com/deepseek-ai/DeepSeek-V3",
"snippet": "We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token."
}
// More results...
]
}
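The JSON structure above can be consumed with a few lines of TypeScript. The interfaces below simply mirror the documented fields; the sample string is abbreviated from the example output:

```typescript
// Interfaces mirroring the tool's documented JSON output.
interface SearchResult {
  title: string;
  link: string;
  snippet: string;
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Abbreviated sample matching the documented output format.
const raw = JSON.stringify({
  query: "deepseek",
  results: [
    { title: "DeepSeek", link: "https://www.deepseek.com/", snippet: "DeepSeek-R1 is now live..." },
    { title: "deepseek-ai/DeepSeek-V3", link: "https://github.com/deepseek-ai/DeepSeek-V3", snippet: "We present DeepSeek-V3..." },
  ],
});

const response: SearchResponse = JSON.parse(raw);
// Extract just the links for downstream processing.
const links = response.results.map((r) => r.link);
console.log(links);
```

Piping the tool's stdout into a script like this is all the post-processing most workflows need.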
When using the --get-html parameter, it returns HTML-related information:
{
"query": "playwright automation",
"url": "https://www.google.com/",
"originalHtmlLength": 1291733,
"cleanedHtmlLength": 456789,
"htmlPreview": "<!DOCTYPE html><html itemscope=\"\" itemtype=\"http://schema.org/SearchResultsPage\" lang=\"en\"><head>..."
}
When combined with --save-html, it also shows the save path and screenshot path:
{
"query": "playwright automation",
"url": "https://www.google.com/",
"originalHtmlLength": 1292241,
"cleanedHtmlLength": 458976,
"savedPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.html",
"screenshotPath": "./google-search-html/playwright_automation-2025-04-06T03-30-06-852Z.png",
"htmlPreview": "<!DOCTYPE html><html itemscope=\"\" itemtype=\"http://schema.org/SearchResultsPage\" lang=\"en\">..."
}
Using It as an MCP Server to Provide Search Capabilities for AI Assistants
Through the Model Context Protocol (MCP), this tool can enable AI assistants like Claude to perform Google searches directly. Below are the steps to integrate it with Claude Desktop:
Prerequisites
First, ensure you’ve completed the project build:
pnpm build
Configuring Claude Desktop
1. Locate the Configuration File
- Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json (you can enter %APPDATA%\Claude directly in the File Explorer address bar to access this folder)
2. Add Server Configuration
Open the configuration file, add the following content (choose the option that matches your system), then restart Claude:
Option 1: General Configuration (Recommended for Mac/Linux)
{
  "mcpServers": {
    "google-search": {
      "command": "npx",
      "args": ["google-search-mcp"]
    }
  }
}
Option 2: Windows cmd.exe Configuration
{
  "mcpServers": {
    "google-search": {
      "command": "cmd.exe",
      "args": ["/c", "npx", "google-search-mcp"]
    }
  }
}
Option 3: Direct Node Call for Windows (Recommended for Better Compatibility)
{
  "mcpServers": {
    "google-search": {
      "command": "node",
      "args": ["C:/your-installation-path/google-search/dist/mcp-server.js"]
    }
  }
}
Note: Replace C:/your-installation-path with the actual directory where you installed the tool.
After configuration, enter a command like “Search for 2024 AI industry reports” in Claude. The AI will automatically call this tool to retrieve the latest results.
Project Structure: Understanding the Tool’s Components
The tool has a clear code structure, making it easy to understand and modify for secondary development:
google-search/
├── package.json # Project configuration and dependency management
├── tsconfig.json # TypeScript compilation configuration
├── src/
│ ├── index.ts # Command-line parsing and main logic entry
│ ├── search.ts # Core search functionality (Powered by Playwright)
│ ├── mcp-server.ts # MCP server implementation code
│ └── types.ts # Type definitions (ensures code type safety)
├── dist/ # Compiled JavaScript files
├── bin/ # Executable scripts (command-line entry)
└── README.md # Project documentation
Tech Stack Breakdown: The Core Technologies Behind It
The tool relies on mature technologies, each playing a critical role:
- TypeScript: Provides static type checking to reduce runtime errors and improve code maintainability
- Node.js: Serves as the runtime environment, enabling cross-platform operation on local devices
- Playwright: Handles browser automation, simulates user actions, and supports multiple browser engines
- Commander: Parses command-line parameters and processes user input options (e.g., --limit, --timeout)
- Model Context Protocol (MCP): Implements the communication protocol with AI assistants, allowing the tool to be called by AI
- MCP SDK: Simplifies MCP server development and enables quick integration with AI assistants
- Zod: Validates data to ensure input/output formats meet expectations, improving tool stability
- pnpm: An efficient package manager that saves disk space and speeds up dependency installation
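As a rough illustration of the flag parsing that Commander handles for this tool, here is a hand-rolled sketch. The function and its defaults are hypothetical, invented for the example; the actual tool delegates this to Commander:

```typescript
// Hand-rolled sketch of CLI flag parsing (the real tool uses Commander).
// Defaults are hypothetical, not the tool's actual values.
interface CliOptions {
  query?: string;
  limit: number;
  timeout: number;
}

function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { limit: 10, timeout: 30000 }; // hypothetical defaults
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--limit") opts.limit = Number(argv[++i]);
    else if (arg === "--timeout") opts.timeout = Number(argv[++i]);
    else opts.query = arg; // non-flag token is treated as the search query
  }
  return opts;
}

const parsed = parseArgs(["playwright automation", "--limit", "5"]);
console.log(parsed);
```

Commander additionally generates help text and validates inputs, which is why the tool uses it instead of a loop like this.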
Development Guide: How to Modify and Extend the Tool
If you have development experience and want to customize the tool to your needs, refer to these common commands:
Basic Development Commands
# Install dependencies (run after first cloning the project)
pnpm install
# Install Playwright browsers (required for automation)
pnpm run postinstall
# Compile TypeScript code (run after making modifications)
pnpm build
# Clean compiled output (run before recompiling if needed)
pnpm clean
Debugging and Testing
# Run in development mode (auto-recompiles when code changes)
pnpm dev "search keyword"
# Debug mode (displays browser interface to observe operations)
pnpm debug "search keyword"
# Run compiled code (to verify final results)
pnpm start "search keyword"
# Run tests (to ensure functionality works as expected)
pnpm test
MCP Server Development
# Run MCP server in development mode (supports hot updates)
pnpm mcp
# Run compiled MCP server (for production use)
pnpm mcp:build
Error Handling: What to Do When Issues Arise
The tool includes a comprehensive error-handling system that provides clear prompts for common problems:
- If the browser fails to launch, it shows the specific cause (e.g., port in use, browser not installed)
- If the network connection is interrupted, it prompts you to check your network status and retries automatically
- If search result parsing fails, it logs detailed records (including raw HTML) to simplify troubleshooting of page structure changes
- In case of timeouts, it exits gracefully and suggests possible causes (e.g., slow network, delayed verification page handling)
If you’re frequently blocked by Google, try these solutions:
- Reduce request frequency to simulate the search intervals of a real user
- Enable state files (on by default) and specify the save path with --state-file
- Avoid fixed device parameters; let the tool randomize configurations automatically
Important Notes: Must-Read Before Use
Compliance and Usage Guidelines
- This tool is for learning and research purposes only. When using it, comply with Google's Terms of Service and local laws and regulations
- Do not send requests too frequently, as this may overload Google's servers and lead to account or IP restrictions
- Accessing Google may require a proxy in some regions. The tool itself does not provide proxy functionality; you need to configure this separately
State File Management
- State files contain browser data like cookies and local storage, which are critical for bypassing anti-scraping measures
- Keep state files secure and do not share them with others (they may contain personal login information)
- If you encounter persistent verification issues, try deleting the state file so the tool rebuilds the browser environment
System Requirements
- Node.js version 16 or higher (version 18+ recommended for better compatibility)
- The first run automatically downloads browsers (several hundred MB); ensure a stable internet connection
- Minimum configuration: 2 GB RAM and a modern 64-bit CPU
Comparison with Commercial SERP APIs: Why Choose a Local Tool?
In short, if you need to retrieve search results frequently over the long term, or have requirements for data privacy and customization, this local tool is the better choice.
Frequently Asked Questions (FAQ)
Do I need a VPN to use this tool?
Yes, since it requires access to Google Search, your environment must support connecting to Google services. The tool itself does not include proxy functionality—you need to configure a working proxy in advance.
Why does a browser window pop up during a search?
There are two possible reasons: either you used the --no-headless parameter (debug mode), or you encountered a Google verification page (e.g., CAPTCHA). The tool automatically switches to headed mode so you can complete the verification manually, after which it continues the search.
Can I use it to crawl large volumes of results in bulk?
Not recommended. While the tool can bypass some anti-scraping mechanisms, Google has strict limits on high-frequency requests. Sending a large number of requests frequently may lead to temporary IP blocks. It’s best to control search frequency and wait a few minutes between searches.
What should I do if Windows says “command not found”?
This may happen if you didn't run npm link (or yarn link / pnpm link), or if the system environment variables haven't been updated. Solutions:
- Re-run the link command (requires administrator privileges)
- Call the tool directly using the relative path: node ./dist/index.js "search keyword"
How do I update the tool to the latest version?
Navigate to the project directory and run these commands:
git pull
pnpm install
pnpm build
Can it retrieve results from search engines other than Google?
Currently, the tool only supports Google Search. However, since the code is open-source, you can modify the URL and result parsing rules in src/search.ts to adapt it to other search engines (e.g., Bing, Baidu).
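As a rough sketch of such an adaptation, the engine-specific parts could be factored into a config object. The URLs are real search endpoints, but the selector names and the overall structure are illustrative assumptions, not the actual contents of src/search.ts:

```typescript
// Hypothetical refactor: engine-specific settings pulled into one config map
// so another engine can be plugged in. Selector names are illustrative.
interface EngineConfig {
  searchUrl: string;
  resultSelector: string; // CSS selector for one result item (assumed)
}

const ENGINES: Record<string, EngineConfig> = {
  google: { searchUrl: "https://www.google.com/search?q=", resultSelector: "div.g" },
  bing: { searchUrl: "https://www.bing.com/search?q=", resultSelector: "li.b_algo" },
};

// Build the search URL for a given engine and query.
function buildSearchUrl(engine: string, query: string): string {
  const cfg = ENGINES[engine];
  if (!cfg) throw new Error(`Unsupported engine: ${engine}`);
  return cfg.searchUrl + encodeURIComponent(query);
}

console.log(buildSearchUrl("bing", "playwright automation"));
```

With a structure like this, supporting a new engine mostly means adding an entry to the map and writing a matching result parser.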
This tool provides a flexible, low-cost solution for users who need to retrieve Google search results frequently. Whether used as a standalone command-line tool or integrated with AI assistants to enhance real-time information access, it meets basic needs. If you have development skills, you can further extend its functionality by modifying the code to better fit your specific use case.