
Fast Agentic Search (FAS) Cuts Code Search Time 4× with Claude-Level Accuracy: A Deep Dive

Featured Snippet Answer:
Fast Agentic Search (FAS) is a specialized small agent model released by Relace AI that dramatically accelerates codebase navigation. By combining parallel tool calling (4–12 files at once) with on-policy reinforcement learning, FAS achieves the same precision as traditional step-by-step Agentic Search while being 4× faster. Real-world SWE-bench integration shows 9.3% lower median latency and 13.6% fewer tokens.

If you’ve ever watched an AI coding assistant spend two full minutes just “looking for the right file” in a 5 000-file repo, you know the pain.

It opens one file, thinks, closes it, apologizes, opens another…
60% of the tokens and 70% of the wall-clock time are wasted on pure navigation.

Relace AI just pulled the most expensive single step out of the loop and rebuilt it from scratch.
The result is called Fast Agentic Search — FAS.

Why Traditional Codebase Search Is Either Fast or Accurate — Never Both

There are only two mainstream approaches today:

| Approach | Speed | Accuracy | Why It Fails in Real Projects |
|---|---|---|---|
| RAG (vector search) | < 500 ms | Medium–Low | Relies on semantic similarity; misses indirect calls, renames, and complex business logic |
| Classic Agentic Search | 30–90 s | Very High (human-like) | Strictly sequential: think → call one tool → wait → think again |

Every major coding agent (Claude 3.5 Sonnet, GPT-4o, Cursor, Aider, DevGPT, etc.) is stuck in this trade-off — until now.

How FAS Solves the Speed-Accuracy Paradox in One Stroke

Relace didn’t just “add parallelism.” They rebuilt the entire search behavior with three surgical innovations.

1. Parallel Tool Calling (4–12 actions in a single turn)

Traditional agents are forced to call tools one at a time because OpenAI/Anthropic function-calling schemas were designed that way.

FAS was explicitly trained to emit multiple structured tool calls in parallel:

```json
[
  { "tool": "grep", "query": "permission_check|rbac|can_" },
  { "tool": "view", "path": "src/middleware/auth.py" },
  { "tool": "view", "path": "src/services/permission.rs" },
  { "tool": "ast_grep", "pattern": "if (user.role !== 'admin')" },
  ...
]
```

Instead of 10 sequential round-trips (10–15 s of cumulative latency), everything finishes in 1–2 network round-trips.
Real measured speedup: ~4× end-to-end.
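The latency math behind that speedup can be sketched in a few lines. Assuming each tool call is I/O-bound (network or disk), dispatching them concurrently makes total wall-clock time roughly equal to the slowest single call rather than the sum. The tool implementations below are stand-ins, not Relace's actual tooling:

```python
import asyncio
import time

async def run_tool(call: dict) -> dict:
    """Stand-in for one I/O-bound tool call (grep, view, ...)."""
    await asyncio.sleep(0.1)  # simulate network/disk latency
    return {"tool": call["tool"], "result": f"output of {call['tool']}"}

async def run_parallel(calls: list[dict]) -> list[dict]:
    # All calls are dispatched at once; total time ≈ max latency, not the sum.
    return await asyncio.gather(*(run_tool(c) for c in calls))

calls = [
    {"tool": "grep", "query": "permission_check|rbac|can_"},
    {"tool": "view", "path": "src/middleware/auth.py"},
    {"tool": "view", "path": "src/services/permission.rs"},
]

start = time.perf_counter()
results = asyncio.run(run_parallel(calls))
elapsed = time.perf_counter() - start
print(len(results), round(elapsed, 1))  # three 0.1 s calls finish in ~0.1 s, not ~0.3 s
```

With ten calls instead of three, the gap between max and sum is what turns 10–15 s of sequential waiting into one or two round-trips.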

2. On-Policy Reinforcement Learning with a Custom Reward Function

Most teams stop at supervised fine-tuning (SFT). Relace went further and built a full RL loop with a reward that simultaneously maximizes:

  • Recall & Precision of relevant files
  • Minus a heavy penalty for “excessive rounds”
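A reward with that shape can be sketched as a simple scalar: recall plus precision on the retrieved file set, minus a per-round penalty. The weights and penalty value here are illustrative assumptions, not Relace's published numbers:

```python
def search_reward(retrieved: set[str], relevant: set[str],
                  rounds: int, round_penalty: float = 0.2) -> float:
    """Toy reward: maximize file recall and precision, penalize extra turns."""
    if not retrieved or not relevant:
        return -round_penalty * rounds
    hits = len(retrieved & relevant)
    recall = hits / len(relevant)
    precision = hits / len(retrieved)
    return recall + precision - round_penalty * (rounds - 1)

# One parallel round that finds exactly the two relevant files scores highest:
r1 = search_reward({"auth.py", "permission.rs"}, {"auth.py", "permission.rs"}, rounds=1)
# Five sequential rounds with noisy retrieval scores much lower:
r2 = search_reward({"auth.py", "permission.rs", "utils.py", "main.py"},
                   {"auth.py", "permission.rs"}, rounds=5)
assert r1 > r2
```

Under a reward like this, the policy is pushed toward exactly the behavior described below: gather everything relevant in as few turns as possible, without padding the batch with junk files.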

Late in training, a beautiful emergent behavior appeared:
The model learned to pause for 1–2 seconds of pure reasoning before firing parallel calls — exactly like a senior engineer who thinks first and only then opens the six most suspicious files.

This proves FAS is not brute-forcing; it is planning.

3. Sub-Agent Architecture — Decoupling Search from Reasoning

Data from thousands of real programming sessions showed:
≈60% of all tokens in coding agents are spent on navigation, not on code generation or refactoring.

FAS turns the workflow into:

1. User submits a query.
2. The main coding model produces a one-sentence search intent.
3. FAS (a tiny specialized agent) locates the exact files in parallel.
4. Only the 5–10 most relevant files are injected back into the main model.
5. The main model writes the actual patch.

Result: the expensive frontier model no longer wastes context window or time on exploration.
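The five-step handoff above amounts to a thin orchestration layer. In the sketch below the three model calls are stubbed out, and all function names are illustrative rather than Relace's actual API:

```python
def summarize_intent(query: str) -> str:
    """Stub for the main model producing a one-sentence search intent."""
    return f"Find where the codebase implements: {query}"

def fas_search(intent: str, top_k: int = 10) -> list[str]:
    """Stub for FAS locating candidate files via parallel tool calls."""
    candidates = ["src/middleware/auth.py", "src/services/permission.rs",
                  "src/models/user.py"]
    return candidates[:top_k]

def write_patch(query: str, files: list[str]) -> str:
    """Stub for the main model patching with only relevant files in context."""
    return f"patch touching {len(files)} files for: {query}"

def answer(query: str) -> str:
    intent = summarize_intent(query)      # step 2: cheap, one sentence
    files = fas_search(intent, top_k=10)  # steps 3-4: FAS does the exploration
    return write_patch(query, files)      # step 5: frontier model only writes code

print(answer("admin role check bypass"))
```

The design point is the interface between steps 3 and 4: the frontier model never sees the exploration transcript, only the final shortlist of files.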

Hard Numbers from Independent Benchmarks

| Metric | Traditional Agentic | With FAS Integrated | Improvement |
|---|---|---|---|
| End-to-end search speed (same accuracy) | Baseline | 4× faster | 4.0× |
| SWE-bench Verified median latency | Baseline | −9.3% | Significant |
| Token consumption | Baseline | −13.6% | Significant |

Important: SWE-bench tasks are relatively “clean.” In messy enterprise repos with poor naming and sparse comments, the relative gains are dramatically higher because navigation dominates even more.

Frequently Asked Questions (FAQ)

Q: How big is the FAS model?
Relace classifies it as a “small specialized agent.” Inference cost and latency suggest something in the 7B–34B class, but exact size has not been disclosed.

Q: Can I run FAS locally?
Yes. It supports fully air-gapped deployment and also offers encrypted cloud inference similar to Claude’s enterprise mode.

Q: Does it only use grep and view?
No. Out of the box it supports grep, rg, git grep, ast-grep, tree, jump-to-definition, and custom project-specific tools.

Q: Will my current coding assistant support it?
Aider, Continue.dev, Cursor, Sweep, and Roo Code have either already added FAS backends or are in the process. It exposes an OpenAI-compatible API — just change the base URL.
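Because the endpoint is OpenAI-compatible, switching a client over is in principle just a base-URL swap. The URL and model name below are placeholders, not documented Relace values; this sketch only constructs the standard chat-completions request payload:

```python
import json

# Placeholder values — check Relace's docs for the real base URL and model id.
BASE_URL = "https://fas.example.com/v1"          # hypothetical base URL
payload = {
    "model": "fast-agentic-search",              # hypothetical model id
    "messages": [
        {"role": "user",
         "content": "Locate the files implementing the RBAC permission check."}
    ],
}
request_body = json.dumps(payload)
print(BASE_URL + "/chat/completions")  # standard OpenAI-compatible route
```

Any client that lets you override the base URL (the OpenAI SDKs expose this as a constructor parameter) can then POST this payload unchanged.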

Q: Why can’t I just make GPT-4o or Claude call tools in parallel?
You can, but without the specialized RL reward they tend to spray out twenty irrelevant files, wasting even more money and time. FAS was trained extensively in simulation specifically to parallelize intelligently.

The Bigger Picture: The Era of Specialist Sub-Agents Has Begun

For two years the industry chased ever-larger models and longer context windows, yet real-world coding speed barely budged.

Relace’s FAS proves that the true bottleneck was never raw intelligence — it was navigation efficiency.

By carving out the single most expensive sub-task (code search) and optimizing it to death with a dedicated small model, they unlocked a 4× leap practically overnight.

This is not just a feature release; it’s the first clear signal of the next architectural shift:

Tomorrow’s best coding assistant will not be one giant model.
It will be a team of hyper-specialized sub-agents:

  • Navigation agent (FAS)
  • Planning agent
  • Code synthesis agent
  • Test generation agent
  • Refactoring agent

Each one absurdly good at its narrow job, orchestrated together.

The 4× speed you see today is only the beginning.

Welcome to the age of compound AI systems.
