mgrep: The CLI-Native Way to Semantically Search Everything
For decades, developers have relied on grep as an indispensable tool in their programming toolkit. Since its birth in 1973, this powerful text search utility has served generations of programmers. But as we stand at the threshold of the artificial intelligence era, have we ever stopped to wonder: why do we still need exact keyword matching to find code, rather than being able to directly describe what we’re looking for in natural language?
This is the fundamental question that mgrep seeks to answer.
From Exact Matching to Semantic Understanding: The Evolution of Search Tools
Imagine this common scenario: you’ve just joined a new project and need to locate the code that handles user authentication. Using traditional grep, you might try various keyword combinations: auth, authentication, login, signin, and so on. Each attempt might return numerous results, many of which are false positives, while the code you actually need might be completely missed due to different naming conventions.
Now, with mgrep, you simply ask:
mgrep "where do we set up auth?"
It understands your intent and returns the most relevant results, regardless of what specific naming patterns were used in the code. This shift is similar to moving from command-line interfaces to graphical user interfaces—once you’ve experienced it, going back becomes difficult.
Why We Need Semantic Search
Traditional grep operates at the lexical level, looking for exact character matches. mgrep operates at the semantic level, understanding the meaning and context of queries. This distinction becomes crucial in several aspects:
- Terminology Independence: Codebases often use different terms for the same concept (such as “auth” versus “authentication”)
- Cross-Modal Search: Modern projects contain multiple file types, not just code
- Beginner-Friendly: New team members can find what they need without first learning the project’s naming conventions
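The gap between lexical and semantic matching can be made concrete with a toy example. The Python sketch below is not how mgrep works internally: the corpus, the file names, and the hand-written CONCEPTS table are all hypothetical, and the table merely stands in for a learned semantic model. It shows a substring search for auth returning a false positive while missing the sign-in code.

```python
# Toy corpus: hypothetical file paths mapped to a one-line summary of their contents.
CORPUS = {
    "src/session.py": "def sign_in(user, password): verify credentials and issue a token",
    "src/billing.py": "def charge(card, amount): submit a payment to the processor",
    "docs/authors.md": "list of authors and maintainers",
}


def lexical_search(keyword):
    """grep-style search: files whose text contains the exact keyword."""
    return sorted(path for path, text in CORPUS.items() if keyword in text)


# Hand-written concept table (assumption): a crude stand-in for a semantic model.
CONCEPTS = {"auth": {"sign_in", "credentials", "token", "login", "password"}}


def concept_search(concept):
    """Rank files by how many terms related to the concept they mention."""
    scored = []
    for path, text in CORPUS.items():
        hits = sum(1 for term in CONCEPTS[concept] if term in text)
        if hits:
            scored.append((hits, path))
    # Most related terms first; ties broken alphabetically by path.
    return [path for hits, path in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Here the substring search matches only docs/authors.md, because "authors" happens to contain "auth", while the concept-based lookup surfaces src/session.py even though that file never uses the word.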
mgrep in Practice: From Installation to Mastery
Getting Started
Installing mgrep is straightforward using package managers like npm, pnpm, or bun:
npm install -g @mixedbread/mgrep
Next, you’ll need to authenticate. mgrep offers two approaches:
Interactive Login (suitable for personal development environments):
mgrep login
API Key Authentication (ideal for CI/CD environments):
export MXBAI_API_KEY=your_api_key_here
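In a CI script it can help to fail fast when the key is missing instead of falling back to an interactive login. A minimal Python sketch, assuming two hypothetical helpers named auth_mode and require_api_key; only the MXBAI_API_KEY variable name comes from mgrep itself:

```python
import os


def auth_mode(env=None):
    """Return "api-key" when MXBAI_API_KEY is set and non-empty, else "interactive".

    Hypothetical helper for illustration; it is not part of mgrep.
    """
    env = os.environ if env is None else env
    return "api-key" if env.get("MXBAI_API_KEY", "").strip() else "interactive"


def require_api_key(env=None):
    """Raise early in non-interactive jobs rather than waiting on a login flow."""
    if auth_mode(env) != "api-key":
        raise RuntimeError("MXBAI_API_KEY is not set; export it before running mgrep in CI")
```

Calling require_api_key() at the top of a pipeline step turns a missing secret into an immediate, readable error.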
Indexing Your Project
One of mgrep’s core features is intelligent indexing, achieved through:
cd path/to/your/project
mgrep watch
This command performs an initial sync, respects your project’s .gitignore rules, then keeps the Mixedbread store synchronized with local changes through file watchers. You’ll see processing progress (“processed / uploaded”) in your terminal, giving you clear visibility into the indexing status.
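The article does not describe how the watcher decides what to send, but the general idea of keeping a remote index in step with local files can be sketched with content hashes. Everything in this Python sketch (the function names, the hash-comparison strategy) is an assumption for illustration, not mgrep's actual implementation:

```python
import hashlib


def digest(content: bytes) -> str:
    """Content fingerprint used to detect changed files."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(local_files, indexed):
    """Decide which paths to upload and which to delete.

    local_files: path -> current file bytes (after ignore rules are applied)
    indexed:     path -> digest recorded at the last upload
    """
    # New or modified files: no recorded digest, or the digest no longer matches.
    upload = sorted(p for p, data in local_files.items() if indexed.get(p) != digest(data))
    # Files that were indexed earlier but no longer exist locally.
    delete = sorted(p for p in indexed if p not in local_files)
    return upload, delete
```

An initial sync is then just the case where indexed is empty, so every local file lands in the upload list.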
Performing Searches
The basic search syntax is intuitive and easy to use:
# Search in current directory
mgrep "where do we set up auth?"
# Search in specific directory
mgrep "How are chunks defined?" src/models
# Limit result count
mgrep -m 10 "What is the maximum number of concurrent workers?"
# Generate answers based on search results
mgrep -a "What code parsers are available?"
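When calling mgrep from a script, it is safer to build the argument list programmatically than to concatenate strings. The Python sketch below uses only the switches shown in this article (-m, -a, --store, and an optional path); the helper name build_mgrep_command is hypothetical, and combining several switches in one invocation, as well as their order, is an assumption rather than documented behavior.

```python
def build_mgrep_command(query, path=None, max_results=None, answer=False, store=None):
    """Build an argv list for mgrep without invoking it."""
    argv = ["mgrep"]
    if max_results is not None:
        argv += ["-m", str(max_results)]  # limit result count
    if answer:
        argv.append("-a")  # generate an answer from the search results
    argv.append(query)
    if path is not None:
        argv.append(path)  # restrict the search to a directory
    if store is not None:
        argv += ["--store", store]  # query a named store
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True), which avoids shell quoting problems with queries that contain spaces or question marks.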
Working with AI Programming Assistants
mgrep was designed with AI programming assistant integration in mind. While Claude Code is currently supported, integrations with other major assistants (like Codex, Cursor, Windsurf, etc.) are on the development roadmap.
Configuring Claude Code with mgrep
Setup can be completed with a single command:
mgrep install-claude-code
This command handles all necessary steps: signing in (if needed), adding the Mixedbread mgrep plugin to the marketplace, and installing it for you. Once done, enable the plugin in Claude Code and point your agent at the repository you’re indexing with mgrep watch.
Afterward, you can ask Claude questions just like you would locally, with results streaming directly into the chat interface, complete with file paths and line number hints.
Performance and Efficiency: What the Data Shows
One might worry that adding a semantic understanding layer would introduce overhead. In terms of token usage, at least, the benchmark points the other way.
In a benchmark of 50 QA tasks, the mgrep + Claude Code combination used roughly half as many tokens as grep-based workflows while maintaining similar or better judged quality.
This efficiency improvement stems from a fundamental difference in approach: mgrep first finds relevant code snippets through a few semantic queries, then the model focuses its capacity on reasoning rather than sifting through irrelevant code from endless grep attempts.
Note: Win Rate (%) was calculated using an LLM as a judge.
Design Philosophy: Complement, Don’t Replace
mgrep was designed to complement rather than replace grep. The most effective code search strategy combines both tools.
When to Use grep
grep (or its modern alternative ripgrep) remains irreplaceable in these scenarios:
- Exact Matches: When you know precisely what you’re looking for
- Symbol Tracing: Refactoring, finding function calls, etc.
- Regular Expressions: Complex pattern matching
When to Use mgrep
mgrep excels in these situations:
- Intent Search: When you want to find code that implements specific functionality but don’t know the exact naming
- Code Exploration: Understanding the structure and design of new codebases
- Feature Discovery: Locating code that implements specific features
- Team Onboarding: Helping new members quickly familiarize themselves with codebases
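The split between the two tools can be expressed as a rough routing rule, for example inside a wrapper script or an agent's tool selector. The heuristic below is an invented illustration, not something mgrep ships: a single token or anything that looks like a regular expression goes to grep, and a multi-word question goes to mgrep.

```python
import re

# Characters that usually signal a regular expression rather than a question.
# "?" and "." are deliberately left out because they appear in ordinary sentences.
_REGEX_HINTS = re.compile(r"[\\\[\]()*^$|{}+]")


def choose_tool(query):
    """Return "grep" for exact or pattern-like queries, "mgrep" for intent queries."""
    q = query.strip()
    if not re.search(r"\s", q):
        return "grep"  # a single symbol or pattern: exact matching is the better fit
    if _REGEX_HINTS.search(q):
        return "grep"  # contains regex syntax
    return "mgrep"  # reads like a natural-language description
```

A rule like this will misclassify some inputs (a sentence containing parentheses, for instance), so it is best treated as a default that the user can override.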
Technical Architecture Unveiled
Behind mgrep lies Mixedbread Search technology, a full-featured search solution that combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods.
Workflow
- File Processing: Each file gets pushed to a Mixedbread store using the same SDK that applications use
- Intelligent Search: Search requests return the top-k most relevant results using Mixedbread reranking capabilities
- Result Presentation: Results include relative paths and contextual hints (line ranges for text, page numbers for PDFs, etc.) for a skim-friendly experience
- Cloud Synchronization: Since stores are cloud-backed, agents and team members can query the same corpus without re-uploading
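The search step follows a common retrieve-then-rerank shape: shortlist candidates cheaply, reorder them with a stronger scorer, and return the top k with location hints. The Python sketch below illustrates that generic pattern only. The scoring functions are deliberately simple stand-ins (token overlap and Jaccard similarity), the chunks are made up, and none of it reflects Mixedbread's actual models or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    start_line: int
    end_line: int
    text: str


def _tokens(text):
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def recall_score(query, chunk):
    """Stage 1 (cheap): number of query tokens that appear in the chunk."""
    return len(_tokens(query) & _tokens(chunk.text))


def rerank_score(query, chunk):
    """Stage 2 (stand-in reranker): Jaccard similarity of the two token sets."""
    q, c = _tokens(query), _tokens(chunk.text)
    return len(q & c) / len(q | c) if q | c else 0.0


def search(query, chunks, k=2, candidates=10):
    """Shortlist by recall_score, reorder by rerank_score, return "path:start-end"."""
    shortlist = sorted(chunks, key=lambda ch: recall_score(query, ch), reverse=True)[:candidates]
    best = sorted(shortlist, key=lambda ch: rerank_score(query, ch), reverse=True)[:k]
    return [f"{ch.path}:{ch.start_line}-{ch.end_line}" for ch in best]


# Hypothetical indexed chunks.
CHUNKS = [
    Chunk("src/auth/setup.ts", 10, 42, "configure auth middleware and session tokens"),
    Chunk("src/db/pool.ts", 1, 30, "configure the database connection pool"),
    Chunk("README.md", 5, 12, "project overview and setup notes for auth and the database pool and workers"),
]
```

For the query "configure auth middleware", the auth setup chunk ranks first because it shares the most tokens relative to its length; a real reranker would judge meaning rather than token overlap.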
Configuration Tips
- Use the --store <name> parameter to isolate workspaces (by repository, by team, by experiment). Stores are created on demand if they don’t exist yet
- Ignore rules come straight from git, so temporary files, build outputs, and vendored dependencies stay out of your embeddings
- watch reports progress (“processed / uploaded”) as it scans; leave it running in a terminal tab to keep your store fresh
- search accepts most grep-style switches and politely ignores anything it cannot support, so existing muscle memory still works
Environment Variables:
- MXBAI_API_KEY: Set this to authenticate without browser login (ideal for CI/CD)
- MXBAI_STORE: Override the default store name (default: mgrep)
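Scripts that wrap mgrep may want to know which store a command will end up using. The Python sketch below assumes the usual precedence of an explicit --store value over MXBAI_STORE over the built-in default; that ordering is an assumption, and resolve_store is a hypothetical helper rather than part of mgrep.

```python
import os

DEFAULT_STORE = "mgrep"  # default store name stated in the text above


def resolve_store(cli_store=None, env=None):
    """Pick the store name: explicit flag, then MXBAI_STORE, then the default."""
    env = os.environ if env is None else env
    return cli_store or env.get("MXBAI_STORE") or DEFAULT_STORE
```

Passing env explicitly, as the sketch allows, keeps the function easy to test without touching the real process environment.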
Advanced Usage Techniques
Custom Ignore Rules
Beyond respecting .gitignore, mgrep also supports a .mgrepignore file in the repository root. This file follows the same syntax as .gitignore.
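To preview which files a set of ignore patterns would exclude, a small matcher is enough for the common cases. The Python sketch below handles only a simplified subset of .gitignore syntax (comments, blank lines, name globs, and trailing-slash directory patterns); negation with !, anchored patterns, and ** are not supported. It is an approximation for illustration, not mgrep's matcher, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore(text):
    """Return the active patterns, skipping blank lines and # comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path, patterns):
    """Check a relative file path (POSIX separators) against the patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches a directory of that name at any depth.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Plain glob: matches a file or directory name at any depth.
            return True
    return False
```

For example, with the patterns dist/ and *.min.js, the path dist/index.js is ignored while src/index.ts is kept.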
Multiple Store Management
For complex projects, you can use different store names to manage indexing for different aspects:
mgrep watch --store frontend
mgrep watch --store backend
Then specify the corresponding store when searching:
mgrep "authentication component" --store frontend
File Format Support
Currently Supported: Code, text, PDFs, images
Coming Soon: Audio and video
This multimodal support means you can use the same natural language queries to search for image content or PDF documents within your codebase, significantly improving information retrieval efficiency.
Development and Contribution
If you’re a developer interested in mgrep’s internal workings, you can explore its source code and contribute:
pnpm install
pnpm build # or use pnpm dev for quick compile + run
pnpm format # biome formatting + linting
The project is developed in TypeScript, with the executable located at dist/index.js. Tests are written using the bats framework and can be run via pnpm test.
Troubleshooting
Common Issues and Solutions
- Login keeps reopening: Run mgrep logout to clear cached tokens, then try mgrep login again
- Watcher feels noisy: Set MXBAI_STORE or pass --store to separate experiments, or pause the watcher and restart after large refactors
- Need a fresh store: Delete it from the Mixedbread dashboard, then run mgrep watch. It will auto-create a new one
Performance Optimization Suggestions
- For large repositories, consider using .mgrepignore to exclude directories that truly don’t need indexing
- In CI/CD environments, ensure you set the MXBAI_API_KEY environment variable to avoid interactive login
- Regularly check store usage and delete experimental stores you no longer need
Future Outlook
mgrep represents the trend of command-line tools evolving toward greater intelligence and natural interaction. With audio and video support coming soon, its application scenarios will further expand.
From a broader perspective, mgrep exemplifies the deep integration of developer tools with artificial intelligence technologies. It’s not a standalone AI product but seamlessly integrates AI capabilities into developers’ existing workflows—this represents a healthy direction for technological evolution.
Conclusion
In today’s rapidly evolving technological landscape, we need to both respect and preserve time-tested tools like grep, while embracing new tools like mgrep that leverage the latest technological advances. They’re not mutually exclusive but complementary, collectively building more efficient development environments.
The emergence of mgrep signals that command-line tools are beginning to understand our intentions rather than just our commands. This shift may redefine how we interact with computers, making technology better fit human thinking patterns rather than forcing humans to adapt to machine operation modes.
Whether you’re an independent developer or part of a large team, whether you’re exploring new codebases or maintaining mature projects, mgrep is worth trying. It might change your expectations about code search and your understanding of command-line tool capabilities.
mgrep is open source under the Apache-2.0 license. See the LICENSE file for details.
Want to try mgrep immediately? Visit the online demo to experience its capabilities!
