Vector Search vs Hybrid Search
Learn the key differences between vector search and hybrid search in RAG applications. Use cases, performance tradeoffs, and when to choose each.
In this cookbook, we demonstrate how to enhance retrieval performance by implementing hybrid search in your RAG applications. We’ll explore how structured metadata can dramatically improve search relevance and precision beyond what vector similarity alone can achieve.
When users search for products, documents, or other content, they often have specific attributes in mind. For example, a shopper might want “red dresses for summer occasions” or a researcher might need “papers on climate change published after 2020.” Pure semantic search might miss these nuances, but metadata filtering allows you to combine the power of vector search with explicit attribute filtering.
As always, we’ll focus on data-driven approaches to measure and improve retrieval performance.
Requirements
Before starting, ensure you have the following packages installed:
Setup
Start by setting up LangWatch to monitor your RAG application:
The Dataset
In this cookbook, we’ll work with a product catalog dataset containing fashion items with structured metadata. The dataset includes:
- Basic product information: titles, descriptions, brands, and prices
- Categorization: categories, subcategories, and product types
- Attributes: structured characteristics like sleeve length, neckline, and fit
- Materials and patterns: fabric types and design patterns
Here’s what our taxonomy structure looks like:
Having well-structured metadata enables more precise filtering and can significantly improve search relevance, especially for domain-specific applications where users have particular attributes in mind. This data might come from manual tagging by product managers or automated processes with LLMs.
Let’s first load the dataset from Hugging Face:
Now we can load it into our Chroma vector database.
Understanding Our Vector Database
We’ve now loaded our product catalog into a Chroma vector database with the following components:
- Document Text: The product descriptions that will be embedded and used for semantic search
- Metadata: Structured attributes like category, price, material, etc., that can be used for filtering
This setup allows us to perform both:
- Pure semantic search: Finding products based on the meaning of their descriptions
- Hybrid search: Combining semantic similarity with explicit metadata filters
The embeddings are generated using OpenAI’s embedding model, which creates high-dimensional vectors that represent the semantic content of each product description. Similar products will have vectors that are close together in this high-dimensional space.
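To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare embedding vectors. The three-dimensional vectors and product names are made up for illustration; real embeddings have far more dimensions, and the vector database computes the distance for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat it as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values are invented).
toy_embeddings = {
    "red summer dress": [0.9, 0.1, 0.2],
    "floral sundress": [0.8, 0.2, 0.3],
    "leather boots": [0.1, 0.9, 0.4],
}


def rank_by_similarity(query_vec, embeddings):
    """Return (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity([0.85, 0.15, 0.25], toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values the two dresses score close to each other and well above the boots, which is the behavior the paragraph above describes.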
Generating Synthetic Data
When you don’t have production data to start with, you can generate synthetic data to simulate a real-world scenario. We already have the ‘output’, which is the clothing item we just embedded. We now want to generate synthetic queries that would be relevant to the clothing item.
In this case, we’ll use GPT-4 to generate realistic user queries that would naturally lead to each product in our catalog. This gives us query-product pairs where we know the ground truth relevance.
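A rough sketch of how such query-product pairs could be assembled. The prompt wording, the `id` field, and the `llm` callable are all assumptions for illustration, not the cookbook's actual code; in practice `llm` would wrap a call to the model, and a fixed stub stands in for it here so the example runs offline.

```python
from typing import Callable


def build_query_prompt(product: dict) -> str:
    """Render a product record into a prompt asking for one realistic search query."""
    # The internal id is left out so the model cannot echo it back.
    lines = [f"{key}: {value}" for key, value in product.items() if key != "id"]
    return (
        "You are a shopper looking for the product described below.\n"
        "Write one short, natural search query that should lead to it.\n"
        "Do not copy the title verbatim.\n\n" + "\n".join(lines)
    )


def generate_pairs(products, llm: Callable[[str], str]):
    """Produce query/ground-truth pairs: each query should retrieve its source product."""
    pairs = []
    for product in products:
        query = llm(build_query_prompt(product)).strip()
        pairs.append({"query": query, "expected_id": product["id"]})
    return pairs


def fake_llm(prompt: str) -> str:
    """Offline stub; a real implementation would send the prompt to the model."""
    return "lightweight red dress for a beach wedding"


example_products = [{"id": "p1", "title": "Red Linen Dress", "price": 49.0}]
pairs = generate_pairs(example_products, fake_llm)
print(pairs)
```

Keeping the expected product id next to each generated query is what makes the later recall and MRR calculations possible.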
Let’s visualize what this looks like:
Query Filtering
To implement metadata filtering, we first need to extract structured filters from natural language queries. This process involves:
- Understanding user intent: Identifying what specific attributes the user is looking for
- Mapping to our taxonomy: Converting natural language descriptions to our structured metadata schema
- Handling ambiguity: Resolving cases where the user’s language doesn’t precisely match our metadata values
This is where LLMs excel - they can understand the nuances of natural language and extract structured information that aligns with our predefined taxonomy. We’ll use a Pydantic model to ensure the extracted filters conform to our expected schema:
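As an illustration of the kind of schema involved, the sketch below uses a standard-library dataclass as a stand-in for the Pydantic model, so it runs without extra dependencies. The field names (`category`, `material`, `max_price`) and the allowed values are hypothetical; the real schema should mirror your own taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowed values; in practice these would be read from the taxonomy.
ALLOWED = {
    "category": {"Dresses", "Tops", "Shoes"},
    "material": {"Cotton", "Linen", "Leather"},
}


@dataclass(frozen=True)
class QueryFilters:
    """Structured filters extracted from one query; None means 'not specified'."""

    category: Optional[str] = None
    material: Optional[str] = None
    max_price: Optional[float] = None

    def __post_init__(self):
        # Reject values outside the taxonomy instead of passing them to the database.
        for field_name, allowed_values in ALLOWED.items():
            value = getattr(self, field_name)
            if value is not None and value not in allowed_values:
                raise ValueError(f"{field_name}={value!r} is not in the taxonomy")
        if self.max_price is not None and self.max_price <= 0:
            raise ValueError("max_price must be positive")


def parse_filters(raw: dict) -> QueryFilters:
    """Validate raw LLM output, falling back to no filters when it is invalid."""
    known = {k: v for k, v in raw.items() if k in QueryFilters.__dataclass_fields__}
    try:
        return QueryFilters(**known)
    except (ValueError, TypeError):
        return QueryFilters()


print(parse_filters({"category": "Dresses", "max_price": 50}))
print(parse_filters({"category": "Hats"}))  # not in the taxonomy, so no filters
```

Falling back to an empty filter set on invalid output is one possible design choice: the query then degrades to pure semantic search rather than failing or filtering on a value that matches nothing.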
With these models in place, we can start extracting query filters from all queries. We need to let the LLM know what the possible taxonomies are. We’ll use the taxonomy.json file for this.
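One way to expose the taxonomy to the LLM is to render it into the prompt. The sketch below assumes, purely for illustration, that taxonomy.json maps each field name to a flat list of allowed values; the actual file may be nested, in which case the rendering would need to be adapted. The prompt wording is also an assumption.

```python
import json


def load_taxonomy(path: str) -> dict:
    """Read the taxonomy file; assumed here to map field names to lists of allowed values."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def taxonomy_to_prompt(taxonomy: dict) -> str:
    """Render the allowed values into instructions for the filter-extraction model."""
    lines = [
        "Extract filters from the user's query.",
        "Use only the values listed below, and omit any field the query does not mention.",
        "",
    ]
    # Sorting keeps the prompt stable between runs, which helps when comparing results.
    for field_name in sorted(taxonomy):
        values = ", ".join(sorted(taxonomy[field_name]))
        lines.append(f"{field_name}: {values}")
    return "\n".join(lines)


# Hypothetical in-memory taxonomy, used instead of load_taxonomy("taxonomy.json").
example_taxonomy = {"material": ["Linen", "Cotton"], "category": ["Dresses", "Tops"]}
prompt = taxonomy_to_prompt(example_taxonomy)
print(prompt)
```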
Retrieval Evaluation: Semantic Search vs. Metadata Filtering
Now comes the critical part - evaluating how well each retrieval method performs. We’ll compare pure semantic search against metadata-filtered search using two key metrics:
- Recall: The proportion of relevant items successfully retrieved
- Mean Reciprocal Rank (MRR): How high relevant items appear in our results
These metrics help us understand different aspects of retrieval quality:
- High recall means we’re finding most of the relevant items
- High MRR means we’re ranking relevant items near the top of the results
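The two metrics can be sketched in a few lines. This is a minimal version under the assumption that each query has a known set of relevant product ids and that the retriever returns a ranked list of ids; the function names are illustrative, not taken from the notebook.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for item in relevant_ids if item in top_k)
    return hits / len(relevant_ids)


def reciprocal_rank(retrieved_ids, relevant_ids, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if none is found."""
    for rank, item in enumerate(retrieved_ids[:k], start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Average over queries; MRR is the mean of the per-query reciprocal ranks."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Two toy queries, each with a single ground-truth product.
runs = [
    (["a", "b", "c"], {"a"}),  # relevant item ranked first
    (["x", "y", "z"], {"z"}),  # relevant item ranked third
]
avg_recall = mean(recall_at_k(retrieved, relevant, 3) for retrieved, relevant in runs)
avg_mrr = mean(reciprocal_rank(retrieved, relevant, 3) for retrieved, relevant in runs)
print(avg_recall, avg_mrr)
```

When every query has exactly one relevant item, as with synthetic query-product pairs, recall at k reduces to the share of queries whose target product appears in the top k.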
By comparing these metrics across different retrieval methods, we can make data-driven decisions about which approach works best for our specific use case.
For this evaluation, we’ll compare two distinct retrieval approaches:
- Pure Semantic Search: Using only vector embeddings to find similar items
- Semantic Search with Metadata Filtering: Combining vector similarity with structured metadata filters
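The difference between the two approaches can be illustrated with a small in-memory search. This is a toy stand-in, not the cookbook's Chroma code: the vectors and metadata are invented, similarity is a plain dot product, and `where` is a simple dictionary of equality constraints (Chroma's `query` method accepts a `where` argument that plays the same role).

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, items, k, where=None):
    """Rank items by similarity; if `where` is given, keep only exact metadata matches first."""
    candidates = [
        item
        for item in items
        if where is None
        or all(item["metadata"].get(field) == value for field, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda item: dot(query_vec, item["vector"]), reverse=True)
    return [item["id"] for item in ranked[:k]]


# Invented catalog entries with two-dimensional toy vectors.
ITEMS = [
    {"id": "dress-1", "vector": [0.9, 0.1], "metadata": {"category": "Dresses", "color": "red"}},
    {"id": "dress-2", "vector": [0.8, 0.3], "metadata": {"category": "Dresses", "color": "blue"}},
    {"id": "boots-1", "vector": [0.1, 0.9], "metadata": {"category": "Shoes", "color": "red"}},
]

query_vec = [1.0, 0.2]
pure = search(query_vec, ITEMS, k=2)
filtered = search(query_vec, ITEMS, k=2, where={"color": "red"})
mismatched = search(query_vec, ITEMS, k=2, where={"color": "crimson"})
print(pure, filtered, mismatched)
```

The last call shows a risk that is specific to filtering: when an extracted filter value does not match the stored metadata, every candidate is excluded, including the one the query was written for.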
This comparison tests whether metadata filtering improves retrieval precision and relevance, especially for queries with specific attributes or constraints.
Now we can run the evals:
| k | method | avg_recall | avg_mrr |
|---|---|---|---|
| 3 | pure_semantic | 0.921466 | 0.846422 |
| 3 | metadata_filtering | 0.816754 | 0.779232 |
| 5 | pure_semantic | 0.926702 | 0.847731 |
| 5 | metadata_filtering | 0.837696 | 0.784206 |
| 10 | pure_semantic | 0.942408 | 0.849913 |
| 10 | metadata_filtering | 0.858639 | 0.787354 |
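To make the gap between the two methods explicit, the differences can be computed directly from the table. The numbers below are copied from the results above; the dictionary layout and function name are just one convenient way to hold them.

```python
# (avg_recall, avg_mrr) per method, keyed by k, copied from the results table.
RESULTS = {
    3: {"pure_semantic": (0.921466, 0.846422), "metadata_filtering": (0.816754, 0.779232)},
    5: {"pure_semantic": (0.926702, 0.847731), "metadata_filtering": (0.837696, 0.784206)},
    10: {"pure_semantic": (0.942408, 0.849913), "metadata_filtering": (0.858639, 0.787354)},
}


def method_gaps(results):
    """Return {k: (recall_gap, mrr_gap)}, pure semantic minus metadata filtering."""
    gaps = {}
    for k, methods in results.items():
        pure_recall, pure_mrr = methods["pure_semantic"]
        filt_recall, filt_mrr = methods["metadata_filtering"]
        gaps[k] = (round(pure_recall - filt_recall, 6), round(pure_mrr - filt_mrr, 6))
    return gaps


gaps = method_gaps(RESULTS)
for k, (recall_gap, mrr_gap) in gaps.items():
    print(f"k={k}: recall gap {recall_gap:.3f}, MRR gap {mrr_gap:.3f}")
```

Pure semantic search leads at every k, and the recall gap narrows somewhat as k grows, from roughly 0.105 at k=3 to roughly 0.084 at k=10.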
Conclusion
Whilst writing this cookbook, I had secretly ‘hoped’ that hybrid search would outperform pure semantic search. Most people default to vector embeddings, but in production I found that structured metadata extraction consistently delivered better results.
However, this analysis shows that no two applications are the same. There is no ‘universal’ best method; the right choice depends on the specific use case and the data at hand. In our particular experiment:
- Pure semantic search achieved higher recall and MRR across all k values
- This suggests that for this specific dataset and query set, the semantic meaning captured by embeddings was sufficient
- The additional complexity of metadata filtering didn’t provide an advantage in this case
This highlights the importance of empirical evaluation rather than assuming one approach is always superior. Some possible reasons for these results:
- Our synthetic queries might be particularly well-aligned with the semantic content
- The metadata extraction might need refinement to better capture query intent
- The dataset might not have enough attribute diversity to showcase the benefits of filtering
I hope this analysis helps you make informed decisions about the best approach for your own use case.
The full notebook is available on GitHub.