MUVERA Multi‑Vector Retrieval: In‑Depth Guide to the Fixed‑Dimensional Encoding (FDE) Python Implementation
In modern large‑scale search systems, documents are often represented by multiple vectors (hundreds per document) to capture fine‑grained semantics and boost accuracy. However, matching each query against every vector becomes prohibitively slow at scale. MUVERA (Multi‑Vector Retrieval via Fixed‑Dimensional Encodings) introduces Fixed‑Dimensional Encoding (FDE): a technique that compresses a set of vectors into a single high‑dimensional embedding, preserving original similarity relationships. This article walks you through FDE’s core ideas, configuration, helper functions, algorithmic flow, Python API, performance characteristics, and practical examples—everything you need to run FDE end to end in Python.
Why FDE? The Multi‑Vector Search Challenge
Imagine a search index containing one billion documents, each represented by 100 vectors. This multi‑vector representation—popularized by ColBERT—offers high recall and precision by modeling different aspects of content. Yet brute‑force search requires computing dot products between query vectors and every document vector:
- Traditional single-vector approach
  - Represent each document with a single embedding
  - Pros: Extremely fast search via one dot product per document
  - Cons: Oversimplifies rich document semantics
- Modern multi-vector approach
  - Represent each document by hundreds of embeddings
  - Pros: Captures nuances, context, and term interactions
  - Cons: Hundreds of dot products per document → catastrophic slowdown
Fixed‑Dimensional Encoding (FDE) solves this trade‑off. It compresses a set of vectors into a single, fixed‑size embedding that approximates the Chamfer similarity of the original sets. In practice, one dot product between two FDEs yields a score close to aggregating all pairwise similarities—driving search speed to that of single‑vector retrieval while retaining most accuracy.
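To make the target concrete, here is a minimal NumPy sketch of Chamfer similarity, the set-to-set score that an FDE dot product is meant to approximate (the function name chamfer_similarity is illustrative, not part of the library):

import numpy as np

def chamfer_similarity(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    # For each query vector, take its best-matching document vector (max dot
    # product), then sum those maxima over all query vectors.
    sims = query_vecs @ doc_vecs.T          # shape: (num_query_vecs, num_doc_vecs)
    return float(sims.max(axis=1).sum())

A single dot product between a query FDE and a document FDE approximates this score without touching the individual vectors.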
How FDE Works: The Three Key Steps
FDE’s magic lies in combining space partitioning, vector aggregation, and repetition & concatenation. Let’s break down each step.
2.1 Space Partitioning (SimHash + Gray Code)
- SimHash Projection
  - Multiply each original vector by a random Gaussian matrix
  - Convert each projected dimension to a sign bit (positive → bit 1, otherwise → bit 0)
  - Intuition: akin to hashing vectors into buckets by random hyperplanes
- Gray Code Encoding
  - Interpret the sequence of sign bits as a Gray code
  - Gray code ensures adjacent partitions differ by only one bit, preserving locality
  - Result: each vector is assigned a partition index in [0, 2ᵏ), where k = num_simhash_projections
- Partition Count
  - Using k projections yields 2ᵏ partitions (e.g., k = 6 → 64 blocks)
  - (A minimal code sketch of the partitioning step follows this list.)
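Here is a minimal, self-contained sketch of the partitioning step on toy data (all names are illustrative; the library's helpers shown later implement the same idea):

import numpy as np

rng = np.random.default_rng(0)
k = 6                                      # num_simhash_projections → 2**6 = 64 partitions
vectors = rng.normal(size=(100, 128))      # toy point cloud
hyperplanes = rng.normal(size=(128, k))    # random Gaussian projection matrix

bits = (vectors @ hyperplanes) > 0         # one sign bit per projection, shape (100, k)
# Treat each row of bits as a Gray code and decode it to a binary partition index.
gray = (bits * (1 << np.arange(k - 1, -1, -1))).sum(axis=1)
index = gray.copy()
shift = gray >> 1
while shift.any():
    index ^= shift
    shift >>= 1
print(index.min(), index.max())            # indices fall in [0, 64)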
2.2 Vector Aggregation: Sum vs. Average
- Queries: few vectors → sum all vectors falling into each partition
- Documents: many vectors → average the vectors falling into each partition
- Why the difference? (see the sketch after this list)
  - Summation amplifies features when vector counts are low
  - Averaging prevents large partitions from dominating when vector counts are high
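A minimal sketch of both aggregation modes, assuming partition indices have already been computed as above (the aggregate function is illustrative, not part of the library):

import numpy as np

def aggregate(vectors: np.ndarray, partition_ids: np.ndarray,
              num_partitions: int, average: bool) -> np.ndarray:
    # Accumulate each vector into its partition's row.
    out = np.zeros((num_partitions, vectors.shape[1]), dtype=vectors.dtype)
    np.add.at(out, partition_ids, vectors)
    if average:  # documents: divide by per-partition counts (empty partitions stay zero)
        out /= np.maximum(np.bincount(partition_ids, minlength=num_partitions), 1)[:, None]
    return out

# query_block = aggregate(query_vectors, query_ids, 64, average=False)  # sum
# doc_block   = aggregate(doc_vectors,   doc_ids,   64, average=True)   # average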
2.3 Repetition & Concatenation: Reinforcing Locality
One pass of partitioning may mis‑bucket some vectors. To smooth out randomness:
- Repeat the SimHash + aggregation process r times with different seeds
- Concatenate the r resulting partition-aggregated vectors
- Final FDE dimension = r × num_partitions × projection_dim

Example (verified in the snippet below):

- r = 20 repetitions
- num_simhash_projections = 6 → 2⁶ = 64 partitions
- projection_dim = 128
- Total FDE dimension = 20 × 64 × 128 = 163,840
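The arithmetic above as a quick sanity check (plain Python, local variables only):

num_repetitions = 20
num_partitions = 2 ** 6      # num_simhash_projections = 6
projection_dim = 128
print(num_repetitions * num_partitions * projection_dim)  # 163840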
Preparing Your Environment and Installation
You only need Python 3 and NumPy to run this implementation.
# 1. Clone the repository
git clone https://github.com/your-repo/fde_generator_python.git
cd fde_generator_python
# 2. Create a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
.\.venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
Contents of requirements.txt:

numpy>=1.24
tqdm  # optional, for progress bars
Verify installation:
python -c "import numpy; print('NumPy version:', numpy.__version__)"
Configuration Classes Explained
All FDE settings are encapsulated in simple Python classes:
from enum import Enum
from dataclasses import dataclass
from typing import Optional
class EncodingType(Enum):
    DEFAULT_SUM = 0  # queries: sum per partition
    AVERAGE = 1      # documents: average per partition

class ProjectionType(Enum):
    DEFAULT_IDENTITY = 0  # no dimensionality reduction
    AMS_SKETCH = 1        # use AMS Sketch for reduction

@dataclass
class FixedDimensionalEncodingConfig:
    dimension: int = 128
    num_repetitions: int = 10
    num_simhash_projections: int = 6
    seed: int = 42
    encoding_type: EncodingType = EncodingType.DEFAULT_SUM
    projection_type: ProjectionType = ProjectionType.DEFAULT_IDENTITY
    projection_dimension: Optional[int] = None
    fill_empty_partitions: bool = False
    final_projection_dimension: Optional[int] = None
Parameter | Description |
---|---|
dimension | Original vector length |
num_repetitions | Number of independent partitioning runs (r) |
num_simhash_projections | Number of SimHash projections k → 2ᵏ partitions |
seed | Initial random seed for reproducibility |
encoding_type | Query aggregation (sum) vs. document aggregation (average) |
projection_type | No reduction (identity) vs. AMS Sketch |
projection_dimension | Intermediate projection dimension (when using AMS Sketch) |
fill_empty_partitions | Whether to explicitly zero-fill partitions that receive no vectors |
final_projection_dimension | Optional final Count Sketch compression to reduce FDE vector size |
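To see how these fields combine, here is a small helper, not part of the library, that estimates the output FDE length from a config using the dimension formula from Section 2.3 (it assumes that a final Count Sketch, when configured, determines the output length):

def expected_fde_dim(cfg: FixedDimensionalEncodingConfig) -> int:
    # Per-partition block size: the reduced dimension if AMS Sketch is used, else the original.
    block = cfg.projection_dimension or cfg.dimension
    dim = cfg.num_repetitions * (2 ** cfg.num_simhash_projections) * block
    return cfg.final_projection_dimension or dim

print(expected_fde_dim(FixedDimensionalEncodingConfig()))  # 10 * 64 * 128 = 81920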
Deep Dive into Internal Helper Functions
Several helper functions mirror the original C++ logic, now in clear Python.
5.1 Gray Code Functions
def _append_to_gray_code(gray_code: int, bit: bool) -> int:
    # Append a bit to an existing Gray code sequence
    return (gray_code << 1) + (int(bit) ^ (gray_code & 1))

def _gray_code_to_binary(num: int) -> int:
    # Convert Gray code back to binary integer
    mask = num >> 1
    while mask:
        num ^= mask
        mask >>= 1
    return num
- Purpose:
  - Maintain Gray code's single-bit-flip property for neighboring partitions
  - Ensure consistent indexing logic matching the C++ implementation
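A quick sanity check of the two helpers (illustrative only): feeding the bits of the standard reflected Gray code of i through them should recover i.

for i in range(8):
    gray = i ^ (i >> 1)                      # reflected Gray code of i, 3 bits here
    idx = 0
    for b in (2, 1, 0):                      # consume the bits MSB-first
        idx = _append_to_gray_code(idx, bool((gray >> b) & 1))
    assert idx == i
    assert _gray_code_to_binary(gray) == i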
5.2 Random Matrix Generators
import numpy as np

def _simhash_matrix_from_seed(dimension: int, num_projections: int, seed: int):
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size=(dimension, num_projections))
- Role: Creates a Gaussian matrix for SimHash projections.
def _ams_projection_matrix_from_seed(dimension: int, projection_dim: int, seed: int):
    # Sparse random matrix with one nonzero ±1 entry per row, matching the AMS Sketch
    # approach in the C++ code (this body is an illustrative completion of the original stub).
    rng = np.random.default_rng(seed)
    matrix = np.zeros((dimension, projection_dim))
    matrix[np.arange(dimension), rng.integers(0, projection_dim, size=dimension)] = rng.choice([-1.0, 1.0], size=dimension)
    return matrix
- Role: Intermediate dimensionality reduction via Count Sketch.
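For example, reducing a toy batch from 128 to 16 dimensions (a usage sketch; the shapes are assumptions):

import numpy as np

ams = _ams_projection_matrix_from_seed(dimension=128, projection_dim=16, seed=42)
reduced = np.random.randn(100, 128) @ ams    # shape: (100, 16)
print(reduced.shape)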
5.3 Partition Index Calculation
def _simhash_partition_index_gray(sketch_vector: np.ndarray) -> int:
    idx = 0
    for val in sketch_vector:
        idx = _append_to_gray_code(idx, val > 0)
    return idx
- Logic: for each projected dimension, check the sign → append it to the Gray code → the accumulated value is the partition index
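Putting the two helper groups together on a toy batch (a usage sketch; shapes and seed are assumptions):

import numpy as np

vectors = np.random.randn(100, 128).astype(np.float32)
sketches = vectors @ _simhash_matrix_from_seed(128, 6, seed=42)    # (100, 6)
partition_ids = np.array([_simhash_partition_index_gray(row) for row in sketches])
print(partition_ids.min(), partition_ids.max())                    # all ids fall in [0, 64)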
Step‑by‑Step Algorithm Walkthrough
Here's a high-level view of _generate_fde_internal():
def _generate_fde_internal(point_cloud: np.ndarray, config: FixedDimensionalEncodingConfig):
    # 1. Validate input shape and dtype
    # 2. Compute total partitions = 2 ** config.num_simhash_projections
    # 3. If using AMS Sketch, generate the projection matrix
    # 4. Allocate output array: (config.num_repetitions, num_partitions, proj_dim)
    for rep in range(config.num_repetitions):
        # a. Build SimHash matrix with seed = config.seed + rep
        # b. Project and take signs → sketch_vectors
        # c. Compute partition indices for each vector
        # d. Initialize per-partition accumulators (zero if fill_empty_partitions)
        # e. Aggregate vectors by sum or average into partitions
        # f. If AMS Sketch is selected, apply the intermediate reduction
        ...
    # 5. Concatenate the per-repetition outputs into one flat vector
    # 6. If final_projection_dimension is set, apply Count Sketch reduction
    return fde_vector
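To make steps (a)-(e) concrete, here is a minimal, self-contained sketch of a single repetition using only NumPy (identity projection, no empty-partition filling; a simplification, not the full implementation):

import numpy as np

def one_repetition_fde(points: np.ndarray, k: int, seed: int, average: bool = False) -> np.ndarray:
    # (a) SimHash matrix for this repetition.
    planes = np.random.default_rng(seed).normal(size=(points.shape[1], k))
    # (b)-(c) Sign sketches → Gray-coded partition index per vector.
    bits = (points @ planes) > 0
    gray = (bits * (1 << np.arange(k - 1, -1, -1))).sum(axis=1)
    ids, shift = gray.copy(), gray >> 1
    while shift.any():
        ids ^= shift
        shift >>= 1
    # (d)-(e) Aggregate vectors into 2**k partition slots, by sum or average.
    out = np.zeros((2 ** k, points.shape[1]), dtype=points.dtype)
    np.add.at(out, ids, points)
    if average:
        out /= np.maximum(np.bincount(ids, minlength=2 ** k), 1)[:, None]
    return out.ravel()  # one flattened block; the real flow concatenates r of these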
- Time complexity: O(n × d × r × k), where n = number of vectors, d = original dimension, r = repetitions, k = projections
- Resulting FDE size = r × 2ᵏ × (projection_dimension or dimension)
Public Python API: One‑Line FDE Generation
The external interface exposes three simple functions:
def generate_query_fde(point_cloud: np.ndarray, config: FixedDimensionalEncodingConfig):
    config.encoding_type = EncodingType.DEFAULT_SUM
    return _generate_fde_internal(point_cloud, config)

def generate_document_fde(point_cloud: np.ndarray, config: FixedDimensionalEncodingConfig):
    config.encoding_type = EncodingType.AVERAGE
    return _generate_fde_internal(point_cloud, config)

def generate_fde(point_cloud: np.ndarray, config: FixedDimensionalEncodingConfig):
    # Switch based on config.encoding_type (sum → query FDE, average → document FDE).
    fn = generate_query_fde if config.encoding_type == EncodingType.DEFAULT_SUM else generate_document_fde
    return fn(point_cloud, config)
Just pass a NumPy array of shape (num_vectors, dimension)
plus your config object to get an FDE.
C++ vs. Python Function Mappings
C++ Function | Python Function | Purpose |
---|---|---|
GenerateFixedDimensionalEncoding() | generate_fde() | Top-level routing |
GenerateQueryFixedDimensionalEncoding() | generate_query_fde() | Query FDE generation |
GenerateDocumentFixedDimensionalEncoding() | generate_document_fde() | Document FDE generation |
internal::SimHashPartitionIndex() | _simhash_partition_index_gray() | Partition assignment |
internal::ApplyCountSketchToVector() | (internal Count Sketch helper) | Final dimensionality reduction |
Usage Examples
9.1 Basic Usage
import numpy as np
from fde_generator import (
FixedDimensionalEncodingConfig,
generate_query_fde,
generate_document_fde
)
# 1. Build configuration
config = FixedDimensionalEncodingConfig(
dimension=128,
num_repetitions=10,
num_simhash_projections=6,
seed=42
)
# 2. Prepare vectors
query_vectors = np.random.randn(32, 128).astype(np.float32)
doc_vectors = np.random.randn(80, 128).astype(np.float32)
# 3. Generate FDE embeddings
q_fde = generate_query_fde(query_vectors, config)
d_fde = generate_document_fde(doc_vectors, config)
# 4. Compute similarity (approximate Chamfer)
score = np.dot(q_fde, d_fde)
print("Similarity score:", score)
9.2 Configurations with Dimensionality Reduction
from dataclasses import replace
from fde_generator import ProjectionType

# Intermediate AMS Sketch reduction down to 16 dims
config_reduced = replace(
    config,
    projection_type=ProjectionType.AMS_SKETCH,
    projection_dimension=16
)

# Final Count Sketch reduction down to 1024 dims
config_final = replace(
    config,
    final_projection_dimension=1024
)
Adjust projection_dimension
and final_projection_dimension
to balance precision vs. vector size.
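For instance, using the illustrative expected_fde_dim() helper sketched in the configuration section (and assuming the final Count Sketch, when set, determines the output length):

print(expected_fde_dim(config))          # 10 * 64 * 128 = 81920 (baseline)
print(expected_fde_dim(config_reduced))  # 10 * 64 * 16  = 10240
print(expected_fde_dim(config_final))    # 1024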
Performance Highlights
- Indexing overhead: ~33 ms per document with r = 20; 4,600 documents → ~151 s
- Query time: a single dot product per query → < 0.2 ms
- Recall@10: drops slightly from 0.70 (native ColBERT) to 0.64, while retrieval is ~10× faster
Frequently Asked Questions (FAQ)
**Q: What is Fixed-Dimensional Encoding (FDE)?**
FDE compresses a set of vectors into one high-dimensional embedding by partitioning, aggregating, and concatenating multiple runs, approximating the original Chamfer similarity.

**Q: How do I choose the number of SimHash projections?**
The parameter k yields 2ᵏ partitions. Larger k means finer partitions but more computation. Common values range from 6 to 8.

**Q: Why sum for queries and average for documents?**
Queries have few vectors, so summation highlights signals. Documents have many vectors, and averaging prevents any partition from dominating.

**Q: How can I reduce the final FDE dimension?**
Set `projection_dimension` for intermediate AMS Sketch reduction and `final_projection_dimension` for a final Count Sketch compression.
How‑To Guide: Quickstart FDE
1. Install dependencies: `pip install -r requirements.txt`
2. Create the configuration: `config = FixedDimensionalEncodingConfig(dimension=128, num_repetitions=10, num_simhash_projections=6, seed=42)`
3. Prepare vector data: `vectors = np.random.randn(num_vectors, dimension).astype(np.float32)`
4. Call the generation function: `fde = generate_query_fde(vectors, config)`

Tools: Python 3, NumPy.
References
- Original paper: MUVERA: Multi-Vector Retrieval via Fixed-Dimensional Encodings
- C++ reference implementation: Google Graph Mining Repository (sketching/point_cloud)
- Technical blog: Google Research MUVERA blog post