mgrep: The CLI-Native Way to Semantically Search Everything

For decades, developers have relied on grep as an indispensable part of their toolkit. Since its debut in 1973, this text search utility has served generations of programmers. But in an era of AI-assisted development, it is fair to ask: why do we still need exact keyword matches to find code, instead of simply describing what we’re looking for in natural language?

This is the fundamental question that mgrep seeks to answer.

From Exact Matching to Semantic Understanding: The Evolution of Search Tools

Imagine this common scenario: you’ve just joined a new project and need to locate the code that handles user authentication. Using traditional grep, you might try various keyword combinations: auth, authentication, login, signin, and so on. Each attempt might return numerous results, many of which are false positives, while the code you actually need might be completely missed due to different naming conventions.

Now, with mgrep, you simply ask:

mgrep "where do we set up auth?"

Rather than matching exact strings, it interprets the meaning of your query and returns the most relevant results, regardless of the naming conventions used in the code. The shift is similar to moving from command-line interfaces to graphical user interfaces: once you’ve experienced it, going back is difficult.

Why We Need Semantic Search

Traditional grep operates at the lexical level, looking for exact character matches. mgrep operates at the semantic level, matching on the meaning and context of a query. The distinction matters in several ways:

  • Vocabulary Mismatch: Codebases may use different terms for the same concept (like “auth” versus “authentication”)
  • Cross-Modal Search: Modern projects contain multiple file types, not just code
  • Beginner-Friendly: New team members can find what they need without familiarizing themselves with the project’s specific naming conventions
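To make the lexical/semantic distinction concrete, here is a toy TypeScript sketch. The file paths and three-dimensional “embedding” vectors are hypothetical and hand-written; real semantic search, including Mixedbread’s, uses learned embedding models with many more dimensions. The point is only the contrast: a substring match for “auth” misses a sign-in file that never uses the word, while ranking by cosine similarity surfaces it.

```typescript
// Toy comparison of lexical vs. semantic matching.
// The vectors are hand-made stand-ins for real embeddings (hypothetical).
type Doc = { path: string; text: string; vec: number[] };

const docs: Doc[] = [
  { path: "src/signin.ts", text: "verify password and create session", vec: [0.9, 0.1, 0.0] },
  { path: "src/auth/config.ts", text: "auth provider settings", vec: [0.8, 0.2, 0.1] },
  { path: "src/chart.ts", text: "render bar chart", vec: [0.0, 0.1, 0.9] },
];

// Lexical: keep documents whose text contains the keyword verbatim.
function lexicalSearch(keyword: string): string[] {
  return docs.filter((d) => d.text.includes(keyword)).map((d) => d.path);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Semantic: rank every document by similarity to the query vector, keep top k.
function semanticSearch(queryVec: number[], k: number): string[] {
  return [...docs]
    .sort((x, y) => cosine(queryVec, y.vec) - cosine(queryVec, x.vec))
    .slice(0, k)
    .map((d) => d.path);
}

// Pretend embedding of "where do we set up auth?" (hypothetical).
const queryVec = [1.0, 0.1, 0.0];
console.log(lexicalSearch("auth")); // only the file whose text literally says "auth"
console.log(semanticSearch(queryVec, 2)); // both authentication-related files
```

Swapping the hand-made vectors for real model embeddings is what turns this toy into semantic search; the ranking logic itself stays the same.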

mgrep in Practice: From Installation to Mastery

Getting Started

Installing mgrep is straightforward using package managers like npm, pnpm, or bun:

npm install -g @mixedbread/mgrep

Next, you’ll need to authenticate. mgrep offers two approaches:

Interactive Login (suitable for personal development environments):

mgrep login

API Key Authentication (ideal for CI/CD environments):

export MXBAI_API_KEY=your_api_key_here

Indexing Your Project

One of mgrep’s core features is continuous indexing, which you start with:

cd path/to/your/project
mgrep watch

This command performs an initial sync, respects your project’s .gitignore rules, then keeps the Mixedbread store synchronized with local changes through file watchers. You’ll see processing progress (“processed / uploaded”) in your terminal, giving you clear visibility into the indexing status.
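The article doesn’t describe how mgrep decides what to upload, so the following is a hypothetical TypeScript sketch of one common approach to an initial sync: hash each local file (after ignore rules are applied) and compare it against the hashes a remote store already holds. The SyncPlan type, the planSync and progress functions, and the exact progress string are illustrative assumptions, not mgrep’s code; the file-watching phase is omitted.

```typescript
import { createHash } from "node:crypto";

// Hypothetical initial-sync planner (not mgrep's actual implementation):
// compare local content hashes with the hashes a remote store reports.
type SyncPlan = { upload: string[]; remove: string[]; unchanged: string[] };

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function planSync(
  local: Map<string, string>, // path -> content, already filtered by ignore rules
  remote: Map<string, string>, // path -> content hash held by the remote store
): SyncPlan {
  const plan: SyncPlan = { upload: [], remove: [], unchanged: [] };
  for (const [path, content] of local) {
    if (remote.get(path) === sha256(content)) {
      plan.unchanged.push(path); // identical content already indexed
    } else {
      plan.upload.push(path); // new or modified file
    }
  }
  for (const path of remote.keys()) {
    if (!local.has(path)) plan.remove.push(path); // deleted (or now ignored) locally
  }
  return plan;
}

// Illustrative progress line; mgrep's real output format may differ.
function progress(plan: SyncPlan): string {
  const processed = plan.upload.length + plan.unchanged.length;
  return `${processed} processed / ${plan.upload.length} uploaded`;
}

const plan = planSync(
  new Map([["a.ts", "x"], ["b.ts", "y2"]]),
  new Map([["a.ts", sha256("x")], ["b.ts", sha256("y")], ["old.ts", sha256("z")]]),
);
console.log(progress(plan)); // 2 processed / 1 uploaded
```

Hashing content rather than comparing timestamps means a restarted watcher can skip files that are already indexed instead of re-uploading the whole repository.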


Performing Searches

The basic search syntax is intuitive and easy to use:

# Search in current directory
mgrep "where do we set up auth?"

# Search in specific directory
mgrep "How are chunks defined?" src/models

# Limit result count
mgrep -m 10 "What is the maximum number of concurrent workers?"

# Generate answers based on search results
mgrep -a "What code parsers are available?"

Working with AI Programming Assistants

mgrep was designed with AI coding assistants in mind. Claude Code is supported today, and integrations with other major assistants such as Codex, Cursor, and Windsurf are on the development roadmap.

Configuring Claude Code with mgrep

Setup can be completed with a single command:

mgrep install-claude-code

This command handles all necessary steps: signing in (if needed), adding the Mixedbread mgrep plugin to the marketplace, and installing it for you. Once done, enable the plugin in Claude Code and point your agent at the repository you’re indexing with mgrep watch.

Afterward, you can ask Claude questions just like you would locally, with results streaming directly into the chat interface, complete with file paths and line number hints.

Performance and Efficiency: What the Data Shows

One might worry that adding a semantic layer would introduce overhead. In terms of token usage, the benchmark suggests the opposite.

In a benchmark of 50 QA tasks, the mgrep + Claude Code combination used roughly half as many tokens as grep-based workflows while achieving similar or better quality, as judged by an LLM.

[Figure: mgrep performance comparison]

This efficiency gain stems from a difference in approach: mgrep finds relevant code snippets with a few semantic queries, so the model can spend its capacity on reasoning instead of sifting through irrelevant matches from repeated grep attempts.

Note: Win Rate (%) was calculated using an LLM as a judge.

Design Philosophy: Complement, Don’t Replace

mgrep was designed to complement rather than replace grep. The most effective code search strategy combines both tools.

When to Use grep

grep (or its modern alternative ripgrep) remains irreplaceable in these scenarios:

  • Exact Matches: When you know precisely what you’re looking for
  • Symbol Tracing: Refactoring, finding function calls, etc.
  • Regular Expressions: Complex pattern matching

When to Use mgrep

mgrep excels in these situations:

  • Intent Search: When you want to find code that implements specific functionality but don’t know the exact naming
  • Code Exploration: Understanding the structure and design of new codebases
  • Feature Discovery: Locating code that implements specific features
  • Team Onboarding: Helping new members quickly familiarize themselves with codebases

Technical Architecture Unveiled

Behind mgrep lies Mixedbread Search technology, a full-featured search solution that combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods.

Workflow

  1. File Processing: Each file gets pushed to a Mixedbread store using the same SDK that applications use
  2. Intelligent Search: Search requests return the top-k most relevant results using Mixedbread reranking capabilities
  3. Result Presentation: Results include relative paths and contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience
  4. Cloud Synchronization: Since stores are cloud-backed, agents and team members can query the same corpus without re-uploading
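Steps 2 and 3 can be illustrated with a hypothetical sketch: split files into line-range chunks, score each chunk against the query, keep the top k (the role played by -m), and report results as path:start-end hints. The term-overlap score is a stand-in chosen for brevity; Mixedbread’s actual retrieval and reranking models are not shown, and the three-line chunk size is an arbitrary assumption.

```typescript
// Hypothetical sketch of search steps 2 and 3: chunk files by line range,
// score chunks, keep the top k, and format "path:start-end" hints.
// Term overlap is a stand-in for Mixedbread's retrieval and reranking models.
type Chunk = { path: string; startLine: number; endLine: number; text: string };

function chunkFile(path: string, content: string, linesPerChunk = 3): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += linesPerChunk) {
    const slice = lines.slice(i, i + linesPerChunk);
    // Line numbers are 1-based and inclusive, as in editor line ranges.
    chunks.push({ path, startLine: i + 1, endLine: i + slice.length, text: slice.join("\n") });
  }
  return chunks;
}

function terms(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Toy relevance score: how many distinct query terms appear in the chunk.
function score(query: string, chunk: Chunk): number {
  const chunkTerms = terms(chunk.text);
  let hits = 0;
  for (const t of terms(query)) if (chunkTerms.has(t)) hits++;
  return hits;
}

function topK(query: string, chunks: Chunk[], k: number): string[] {
  return chunks
    .map((c) => ({ c, s: score(query, c) }))
    .filter((x) => x.s > 0) // drop chunks with no overlap at all
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map(({ c }) => `${c.path}:${c.startLine}-${c.endLine}`);
}

const src = [
  "import express from 'express';",
  "const app = express();",
  "app.listen(3000);",
  "function setupAuth(app) {",
  "  app.use(session());",
  "}",
].join("\n");
console.log(topK("set up auth session", chunkFile("src/server.ts", src), 1)); // one hit: src/server.ts:4-6
```

Note that the toy scorer matches only “session”: the identifier setupAuth becomes the single token “setupauth”, so “auth” is missed. That gap is exactly what learned retrieval and reranking models are meant to close.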

Configuration Tips

  • Use the --store <name> parameter to isolate workspaces (by repository, by team, by experiment). Stores are created on-demand if they don’t exist yet
  • Ignore rules come straight from git, so temporary files, build outputs, and vendored dependencies stay out of your embeddings
  • watch reports progress (“processed / uploaded”) as it scans; leave it running in a terminal tab to keep your store fresh
  • search accepts most grep-style switches and politely ignores anything it cannot support, so existing muscle memory still works
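Accepting grep-style switches while ignoring unsupported ones can be sketched as a tolerant argument parser. The SUPPORTED table is an assumption for illustration, built from the -m, -a, and --store options mentioned in this article; it is not mgrep’s real flag list, and parseArgs is a hypothetical name.

```typescript
// Hypothetical tolerant parser: keep switches we support, skip the rest.
// SUPPORTED is an illustrative assumption, not mgrep's real option list.
const SUPPORTED: Record<string, { takesValue: boolean }> = {
  "-m": { takesValue: true }, // limit result count
  "-a": { takesValue: false }, // generate an answer from the results
  "--store": { takesValue: true },
};

type Parsed = {
  flags: Record<string, string | true>;
  positional: string[];
  ignored: string[];
};

function parseArgs(argv: string[]): Parsed {
  const out: Parsed = { flags: {}, positional: [], ignored: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith("-")) {
      out.positional.push(arg); // query text or path
    } else if (Object.prototype.hasOwnProperty.call(SUPPORTED, arg)) {
      // Supported switch: consume its value if it takes one.
      out.flags[arg] = SUPPORTED[arg].takesValue ? (argv[++i] ?? "") : true;
    } else {
      out.ignored.push(arg); // unsupported switch: ignore instead of failing
    }
  }
  return out;
}

const parsed = parseArgs(["-n", "-m", "10", "where do we set up auth?", "src"]);
console.log(parsed.flags, parsed.positional, parsed.ignored);
```

One caveat of this approach: an unknown switch that takes a value (such as grep’s -C 3) would leave its value behind as a positional argument, so a real parser needs to know the arity of the grep flags it drops.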

Environment Variables:

  • MXBAI_API_KEY: Set this to authenticate without browser login (ideal for CI/CD)
  • MXBAI_STORE: Override the default store name (default: mgrep)
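A minimal sketch of how these settings could be resolved, assuming an explicit --store flag takes precedence over MXBAI_STORE (the article only states that MXBAI_STORE overrides the default name mgrep). The resolveStore and authMode functions are hypothetical.

```typescript
// Hypothetical resolution of the settings above.
// Assumption: an explicit --store value beats MXBAI_STORE; the article only
// says MXBAI_STORE overrides the default store name "mgrep".
type Env = Record<string, string | undefined>;

function resolveStore(flagStore: string | undefined, env: Env): string {
  // Empty strings are treated as "not set".
  return flagStore || env.MXBAI_STORE || "mgrep";
}

function authMode(env: Env): "api-key" | "interactive-login" {
  // A non-empty MXBAI_API_KEY means no browser login is needed (e.g. in CI/CD).
  return env.MXBAI_API_KEY ? "api-key" : "interactive-login";
}

console.log(resolveStore(undefined, { MXBAI_STORE: "backend" })); // backend
console.log(authMode({})); // interactive-login
```

In a Node process you would pass process.env as the env argument; keeping it a parameter makes the precedence rules easy to test.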

Advanced Usage Techniques

Custom Ignore Rules

Beyond respecting .gitignore, mgrep also supports a .mgrepignore file in the repository root. This file follows the same syntax as .gitignore.
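A simplified TypeScript sketch of merging .gitignore and .mgrepignore rules follows. It handles only a small subset of the gitignore syntax (comments, dir/ patterns, *.ext patterns, plain names) and ignores negation, anchoring, and **, so treat it as an illustration rather than a faithful matcher; parseRules and isIgnored are hypothetical names.

```typescript
// Simplified sketch: merge .gitignore and .mgrepignore rules and test paths.
// Supports only a subset of gitignore syntax: comments, "dir/" patterns,
// "*.ext" patterns and plain names. No negation ("!"), anchoring or "**".
function parseRules(...files: string[]): string[] {
  return files
    .flatMap((f) => f.split("\n"))
    .map((l) => l.trim())
    .filter((l) => l.length > 0 && !l.startsWith("#"));
}

function isIgnored(path: string, rules: string[]): boolean {
  const parts = path.split("/");
  const base = parts[parts.length - 1];
  return rules.some((rule) => {
    if (rule.endsWith("/")) {
      // Directory rule: matches if any parent directory has this name.
      return parts.slice(0, -1).includes(rule.slice(0, -1));
    }
    if (rule.startsWith("*.")) {
      return base.endsWith(rule.slice(1)); // "*.log" -> ".log"
    }
    return parts.includes(rule); // plain name matches a file or directory
  });
}

// First argument: .gitignore contents; second: .mgrepignore contents.
const rules = parseRules("node_modules/\n*.log\n", "# mgrep-only exclusions\nfixtures/\n");
console.log(isIgnored("src/index.ts", rules)); // false
console.log(isIgnored("test/fixtures/big.json", rules)); // true
```

Keeping mgrep-specific exclusions (like large test fixtures) in .mgrepignore leaves them tracked by git while keeping them out of the search index.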

Multiple Store Management

For complex projects, you can use different store names to manage indexing for different aspects:

mgrep watch --store frontend
mgrep watch --store backend

Then specify the corresponding store when searching:

mgrep "authentication component" --store frontend
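Conceptually, each store name selects an isolated index that is created on first use. The in-memory StoreRegistry below is a hypothetical model of that behavior, not mgrep’s implementation; real stores live in Mixedbread’s cloud and are managed by the CLI.

```typescript
// Hypothetical in-memory model of named stores: each name maps to an isolated
// index, and a store is created the first time it is requested.
class StoreRegistry {
  private stores = new Map<string, Map<string, string>>(); // store -> (path -> content)

  getOrCreate(name: string): Map<string, string> {
    let store = this.stores.get(name);
    if (!store) {
      store = new Map<string, string>(); // created on demand
      this.stores.set(name, store);
    }
    return store;
  }

  names(): string[] {
    return [...this.stores.keys()];
  }
}

const registry = new StoreRegistry();
registry.getOrCreate("frontend").set("src/Login.tsx", "export function Login() {}");
registry.getOrCreate("backend").set("api/auth.ts", "export function verify() {}");
// A search scoped to "frontend" only sees frontend files.
console.log([...registry.getOrCreate("frontend").keys()]); // only src/Login.tsx
```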

File Format Support

Currently Supported: Code, text, PDFs, images
Coming Soon: Audio and video

This multimodal support means the same natural-language queries can find relevant images and PDF documents in your repository, not just source files.

Development and Contribution

If you’re a developer interested in mgrep’s internals, you can explore the source code and contribute:

pnpm install
pnpm build        # or use pnpm dev for quick compile + run
pnpm format       # biome formatting + linting

The project is developed in TypeScript, with the executable located at dist/index.js. Tests are written using the bats framework and can be run via pnpm test.

Troubleshooting

Common Issues and Solutions

  • Login keeps reopening: Run mgrep logout to clear cached tokens, then try mgrep login again
  • Watcher feels noisy: Set MXBAI_STORE or pass --store to separate experiments, or pause the watcher and restart after large refactors
  • Need a fresh store: Delete it from the Mixedbread dashboard, then run mgrep watch. It will auto-create a new one

Performance Optimization Suggestions

  • For large repositories, consider using .mgrepignore to exclude directories that truly don’t need indexing
  • In CI/CD environments, ensure you set the MXBAI_API_KEY environment variable to avoid interactive login
  • Regularly check store usage and delete experimental stores you no longer need

Future Outlook

mgrep represents the trend of command-line tools evolving toward greater intelligence and natural interaction. With audio and video support coming soon, its application scenarios will further expand.

From a broader perspective, mgrep exemplifies the deep integration of developer tools with artificial intelligence. It isn’t a standalone AI product; it brings AI capabilities into developers’ existing workflows, which is a healthy direction for the technology to take.

Conclusion

In today’s rapidly evolving technological landscape, we need to both respect and preserve time-tested tools like grep, while embracing new tools like mgrep that leverage the latest technological advances. They’re not mutually exclusive but complementary, collectively building more efficient development environments.

The emergence of mgrep signals that command-line tools are beginning to understand our intentions rather than just our commands. This shift may redefine how we interact with computers, making technology better fit human thinking patterns rather than forcing humans to adapt to machine operation modes.

Whether you’re an independent developer or part of a large team, whether you’re exploring new codebases or maintaining mature projects, mgrep is worth trying. It might change your expectations about code search and your understanding of command-line tool capabilities.


mgrep is open source under the Apache-2.0 license. See the LICENSE file for details.

Want to try mgrep immediately? Visit the online demo to experience its capabilities!