Pure vector search loses exact terms, prices, and document numbers. I fixed this in one day — without changing the LLM, without GPUs, without new dependencies.
My RAG service was working. Vector search found relevant chunks, the LLM generated answers in Ukrainian.
But when a client asked "lawyer consultation 500 UAH" — vector search returned chunks
about legal services in general, ignoring the exact price. And the query "Order No. 142" found
everything about orders, except for document No. 142 itself.
The problem was not with the LLM or the embedding model. Pure vector search looks for *meaning* —
but sometimes *text* is needed. I added BM25 alongside vector search, merged the results
via RRF — and the retrieval quality noticeably increased. In this article — how exactly I did it in production
on Spring Boot + pgvector, what mistakes I made, and what to consider before implementation.
⚡ In Short
- ✅ Problem: vector search "blurs" exact terms, prices, codes, document numbers
- ✅ Solution: hybrid search — BM25 (keywords) + vector (semantics) + RRF (merging)
- ✅ Stack: Java 21, Spring Boot, PostgreSQL + pgvector, tsvector for BM25
- ✅ Configuration: switching vector/hybrid via properties, without recompilation
🎯 Why Hybrid Search: Where Vector Search Fails
I am building a commercial RAG service — business clients upload company documents (PDF, DOCX, CSV, FAQ),
and their users ask questions in natural language and receive answers from the LLM based on
the uploaded content. Stack: Java 21, Spring Boot + Spring AI, PostgreSQL with pgvector (IVFFlat index),
Ollama running locally (nomic-embed-text for embeddings, mistral-nemo for chat).
Before hybrid search, my search worked like this: the user's query is converted into a vector (768 dimensions via
nomic-embed-text), pgvector finds the closest chunks by cosine similarity.
This captures *meaning* well — a query like "how to protect company data" found chunks about
"information security" and "personal data protection," even if the words didn't match.
But I noticed three types of queries where vector search consistently missed:
- Exact prices and numbers: "500 UAH" — the embedding model converts this into a vector that describes
the general "meaning" of the price, but the difference between 500 and 550 in vector space is minimal
- Codes and document numbers: "Order No. 142" — the vector for "order" is similar to the vector for any
other order, the number is lost
- Specific terms: "amortization" — vector search returned semantically similar results
("wear and tear of fixed assets"), but not always the chunk with the exact term
This is a known problem with vector search, which I described in detail in the
article on Hybrid Search and Reranking.
The solution is to add keyword search (BM25), which looks for exact word matches, and combine the results
with vector search.
📌 What is BM25 and Why the 1994 Algorithm Still Works
BM25 (Best Matching 25) is a text search ranking algorithm that was
formalized by Robertson and Walker in 1994.
It hasn't died in 30 years — and here's why.
BM25 evaluates document relevance based on three factors:
- TF (Term Frequency) — how often a word appears in a specific chunk. The more frequent, the more relevant
- IDF (Inverse Document Frequency) — how rare a word is in the entire collection. The word "document"
appears everywhere — it's less valuable. The word "amortization" is rare — it's more important
- Document length — normalization to ensure short and long chunks are on equal footing
For my use case, BM25 is critical because business documents contain exact terms,
prices, numbers — things that vector search "blurs." BM25 finds a chunk with "500 UAH" in milliseconds,
without neural networks, without GPUs.
BM25 limitations: it doesn't understand synonyms. "Car" ≠ "automobile." If a user
writes "cancel subscription," but the document says "opt out of tariff," BM25 will find nothing.
This is precisely why a *combination* is needed — vector search captures meaning, BM25 captures exact words.
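To make the three factors concrete, here is a minimal sketch of the classic Okapi BM25 formula in plain Java. The class name `Bm25Sketch`, the toy pre-tokenized corpus, and the parameter values k1 = 1.2 and b = 0.75 (common defaults) are assumptions for illustration; this is not what PostgreSQL runs internally.

```java
import java.util.List;

/** Illustrative Okapi BM25 scorer over pre-tokenized chunks (not PostgreSQL's implementation). */
class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (common default, assumed)
    static final double B = 0.75;  // strength of length normalization (common default, assumed)

    /** Scores one chunk against the query terms; the corpus supplies IDF and the average length. */
    static double score(List<String> queryTerms, List<String> chunk, List<List<String>> corpus) {
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(1.0);
        double total = 0.0;
        for (String term : queryTerms) {
            long tf = chunk.stream().filter(term::equals).count();
            if (tf == 0) {
                continue; // a term that is absent from the chunk contributes nothing
            }
            long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
            // IDF: terms found in few chunks weigh more than terms found in most chunks
            double idf = Math.log(1.0 + (corpus.size() - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
            // TF with saturation, normalized by chunk length relative to the average length
            double tfPart = tf * (K1 + 1) / (tf + K1 * (1 - B + B * chunk.size() / avgLen));
            total += idf * tfPart;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("lawyer", "consultation", "500", "uah"),
                List.of("lawyer", "consultation", "legal", "services"),
                List.of("accounting", "services", "price", "list"));
        List<String> query = List.of("lawyer", "consultation", "500");
        for (List<String> chunk : corpus) {
            System.out.println(chunk + " -> " + score(query, chunk, corpus));
        }
    }
}
```

In this toy corpus the chunk containing "500" scores highest, because the rare term carries the largest IDF weight, while the chunk with no query terms scores zero.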
Table: When What Works
| Query Type | Vector Search | BM25 | Hybrid |
| --- | --- | --- | --- |
| "lawyer consultation 500 UAH" | ⚠️ finds legal, ignores price | ✅ exact match | ✅✅ |
| "how to protect company data" | ✅ semantics | ❌ no exact matches | ✅ |
| "Order No. 142 on dismissal" | ⚠️ finds orders in general | ✅ "Order No. 142" | ✅✅ |
| "refund" | ✅ semantics | ✅ exact match | ✅✅ both signals |
A detailed comparison of BM25 vs. Dense Vector Search with benchmarks is in my
article on Hybrid Search, Section 1.
📌 Database Preparation: Migration, tsvector, GIN Index
Before writing Java code, I prepared PostgreSQL. My vector_store table already
had vectors (embeddings) for cosine similarity search. For BM25, an additional structure is needed —
tsvector. This is a built-in PostgreSQL type where text is broken down into tokens (lexemes) with positions.
Without it, full-text search using the @@ operator doesn't work.
Analogy: a vector (embedding) is the "understanding of meaning" of a text, while a tsvector
is the alphabetical index at the back of a book. Different types of search need different data structures.
Migration — Two Commands
```sql
-- 1. Add a column for full-text search
ALTER TABLE vector_store ADD COLUMN content_tsv tsvector;

-- 2. Create a GIN index for fast keyword search
CREATE INDEX idx_vector_store_content_tsv ON vector_store USING GIN (content_tsv);
```
The first command adds the content_tsv column. After migration, it will be NULL
for all existing chunks — this is normal, we'll fill it later.
The second command creates a GIN (Generalized Inverted Index), an index type optimized for full-text search.
Without it, queries with the @@ operator would scan all rows; with GIN, the lookup goes through the index.
It plays the same role for text that the IVFFlat index plays for vectors.
Populating tsvector for Existing Chunks
```sql
UPDATE vector_store
SET content_tsv = to_tsvector('simple', content)
WHERE content_tsv IS NULL;
```
⚠️ Pitfall: Choosing the Text Search Configuration
When PostgreSQL converts text to tsvector, it needs to know the language — to remove
stop words ("and," "or," "on") and reduce words to their base form (stemming: "employees" → "employee").
PostgreSQL has no built-in text search configuration for Ukrainian. I had two options:
- simple: splits text into words and converts them to lowercase. No stemming, no stop words. Reliable, because PostgreSQL never gets the chance to stem a Ukrainian word incorrectly
- russian: the closest built-in option. Stemming partially works for Ukrainian (the languages are similar), but it may stem some words incorrectly
I chose simple — less "smart," but reliable. BM25 with the simple config
still finds exact keyword matches, and for "understanding meaning," I have vector search.
📌 Implementation of HybridSearchService: two searches + RRF
My HybridSearchService does three things in the search() method:
- Vector search: the same vectorStore.similaritySearch() via pgvector, cosine similarity
- BM25 search: SQL query with the @@ operator on the content_tsv column
- RRF merge: combining both result lists using the formula 1/(k + rank)
BM25 search: SQL query
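For BM25, I use plainto_tsquery(), which automatically breaks the user's query
into words and combines them with AND. Results are ranked with ts_rank(), a built-in PostgreSQL function
that scores a match by how often the query lexemes occur in the chunk. Strictly speaking, ts_rank() is not
BM25 (it has no IDF component), but it fills the same keyword-ranking role in this setup.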
```sql
SELECT id, content, metadata FROM vector_store
WHERE content_tsv @@ plainto_tsquery(CAST(:tsconfig AS regconfig), :question)
ORDER BY ts_rank(content_tsv, plainto_tsquery(CAST(:tsconfig AS regconfig), :question)) DESC
LIMIT :topK
```
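Because plainto_tsquery() joins the query words with AND, a chunk matches only when it contains every word of the query. The sketch below imitates that behaviour in plain Java. The class `AndMatchSketch` and its lowercase-and-split tokenizer are assumptions: they only roughly approximate the simple configuration, and the real PostgreSQL parser recognizes more token types.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Rough imitation of `content_tsv @@ plainto_tsquery('simple', question)`; not the real parser. */
class AndMatchSketch {

    /** Approximates the 'simple' config: lowercase, split on anything that is not a letter or digit. */
    static Set<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toSet());
    }

    /** AND semantics: every query token must occur in the chunk. */
    static boolean matches(String chunk, String question) {
        return tokens(chunk).containsAll(tokens(question));
    }

    public static void main(String[] args) {
        String chunk = "Order No. 142 on dismissal of an employee";
        System.out.println(matches(chunk, "order no. 142"));              // true
        System.out.println(matches(chunk, "Order No. 142 on dismissal")); // true
        System.out.println(matches(chunk, "how do I find order 142"));    // false: "how", "do", ... are missing
    }
}
```

This is why long conversational questions often produce zero keyword hits: with no stop-word removal, a single word that is absent from the chunk is enough to exclude it.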
⚠️ Pitfall: CAST to regconfig
My first attempt without CAST resulted in a BadSqlGrammarException. PostgreSQL cannot
automatically cast the string parameter ? (which comes as String via
JdbcClient) to the regconfig type. An explicit cast is needed: CAST(:tsconfig AS regconfig).
The same error occurred when populating content_tsv during indexing — it had to be
fixed in two places.
RRF (Reciprocal Rank Fusion): how merging works
RRF was proposed by Cormack, Clarke, and Buettcher in 2009 (SIGIR '09) and has since become a standard for hybrid search. The formula is:
score(d) = Σ_r 1 / (k + rank_r(d))
where the sum runs over the individual rankings r (here: vector and BM25), rank_r(d) is the position of
document d in ranking r, and k is a smoothing constant (I use the standard value of 60). A document that is
missing from a ranking contributes nothing for that ranking.
What k does: it controls how much "first place" differs from "fifth place".
- k=60 (standard): 1st place = 1/61 = 0.0164, 5th = 1/65 = 0.0154. The difference is small — all results are "almost equal"
- k=1 (small): 1st = 1/2 = 0.5, 5th = 1/6 = 0.167. The difference is threefold — top results dominate
- k=200 (large): there is almost no difference — only whether the chunk was included in the results matters
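The effect of k is easy to check numerically. This small Java sketch (the class and method names are illustrative) prints the contribution 1/(k + rank) for ranks 1 and 5, together with their ratio, for the three values of k discussed above.

```java
/** Shows how the RRF constant k changes the gap between rank 1 and rank 5. */
class RrfKSketch {

    /** Contribution of a single ranking to the RRF score of a document at the given 1-based rank. */
    static double contribution(int k, int rank) {
        return 1.0 / (k + rank);
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 60, 200}) {
            double first = contribution(k, 1);
            double fifth = contribution(k, 5);
            System.out.printf("k=%d: rank1=%.4f rank5=%.4f ratio=%.2f%n", k, first, fifth, first / fifth);
        }
    }
}
```

The ratio between first and fifth place is 3 for k=1, roughly 1.07 for k=60, and roughly 1.02 for k=200, which is the flattening described in the list.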
Why 60? This value comes from the original research paper. It is used by Elasticsearch, Qdrant, and Weaviate.
Example with real data
Query: "Order No. 142 on dismissal"
Vector search returns (by cosine similarity — based on the meaning of "dismissal"):
- Chunk about the dismissal procedure (rank 1)
- Chunk about the employment contract (rank 2)
- Chunk "Order No. 142" (rank 5)
BM25 returns (by exact match of the words "Order No. 142"):
- Chunk "Order No. 142" (rank 1)
- Chunk "Order No. 155" (rank 2)
RRF score for the "Order No. 142" chunk:
vector rank=5: 1/(60+5) = 0.0154
BM25 rank=1: 1/(60+1) = 0.0164
total: 0.0318 ← highest among all
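The "Order No. 142" chunk wins because it appears in both rankings and its two contributions add up;
a chunk found by only one search can score at most 1/61 = 0.0164. Without hybrid search,
it would have been in 5th position and might not have been included in the LLM context.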
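The merging step itself fits in a few lines. Below is a minimal sketch of RRF over two ranked lists of chunk ids, replaying the example above. The class `RrfMergeSketch`, the string ids, and the two `filler-*` entries standing in for the unlisted vector ranks 3 and 4 are assumptions for illustration, not the actual HybridSearchService code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal Reciprocal Rank Fusion over two ranked lists of chunk ids. */
class RrfMergeSketch {

    /** Sums 1/(k + rank) per chunk id across both rankings; ranks are 1-based. */
    static Map<String, Double> rrfScores(List<String> vectorRanked, List<String> keywordRanked, int k) {
        Map<String, Double> scores = new LinkedHashMap<>(); // keeps first-seen order for ties
        for (List<String> ranking : List.of(vectorRanked, keywordRanked)) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1;
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    /** Returns up to topK chunk ids ordered by descending RRF score. */
    static List<String> merge(List<String> vectorRanked, List<String> keywordRanked, int k, int topK) {
        Map<String, Double> scores = rrfScores(vectorRanked, keywordRanked, k);
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a))); // stable sort
        return ids.subList(0, Math.min(topK, ids.size()));
    }

    public static void main(String[] args) {
        // filler-3 and filler-4 are placeholders for the ranks the example does not list
        List<String> vector = List.of("dismissal-procedure", "employment-contract",
                "filler-3", "filler-4", "order-142");
        List<String> keyword = List.of("order-142", "order-155");
        System.out.println(merge(vector, keyword, 60, 5)); // order-142 comes first
    }
}
```

Using a `LinkedHashMap` with a stable sort keeps tied chunks in the order they were first seen, so vector results win ties in this sketch; that tie-breaking rule is a design choice, not part of RRF.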
Populating tsvector during indexing of new documents
New documents go through PgVectorIndexingService. After
vectorStore.add(documents) (which stores embeddings), I added an UPDATE
to populate content_tsv:
```java
private void updateTsVector(Long docId) {
    // Fill content_tsv only for this document's chunks that do not have it yet
    jdbcClient.sql(
            "UPDATE vector_store SET content_tsv = to_tsvector(CAST(:tsconfig AS regconfig), content) " +
            "WHERE metadata->>'doc_id' = :docId AND content_tsv IS NULL"
        )
        .param("tsconfig", tsConfig) // @Value("${app.search.tsconfig:simple}")
        .param("docId", String.valueOf(docId))
        .update();
}
```
Why not a trigger: I considered the option with a PostgreSQL trigger, but a trigger is SQL,
it doesn't know about Spring @Value. If the client changes the language (e.g., from simple
to german for a German client) — the trigger would have to be recreated via migration.
With Java code, everything is managed from application.properties.
📌 Configuration: vector vs hybrid via properties
I did not remove the old PgVectorSearchService. Instead, I implemented switching
via @ConditionalOnProperty:
```properties
# application.properties
# Comments go on their own lines: in a .properties file, text after the value
# (including "#") becomes part of the value.

# Search mode: "hybrid", or "vector" for pure vector search
app.search.mode=hybrid
# Text search configuration for tsvector: simple, russian, german, english...
app.search.tsconfig=simple
# RRF smoothing constant
app.search.rrf-k=60
```

```java
@ConditionalOnProperty(name = "app.search.mode", havingValue = "vector", matchIfMissing = true)
public class PgVectorSearchService implements SearchService { ... }

@ConditionalOnProperty(name = "app.search.mode", havingValue = "hybrid")
public class HybridSearchService implements SearchService { ... }
```
Why two modes
My service serves different business clients, and hybrid search is not optimal for all of them:
- Hybrid search puts more load on the database — two queries instead of one (vector + BM25),
plus an additional GIN index consumes RAM. For some clients with a small document base
and simple queries, this is excessive
- Fallback: if the BM25 part breaks or tsvector is not populated for some chunks,
you can switch back to pure vector search by changing a single property (bean conditions are evaluated at startup, so the change takes effect on restart)
- A/B testing — you can compare the quality of responses between modes for the same queries
By default, matchIfMissing = true on PgVectorSearchService — if the property is not set,
it works as before. Nothing breaks.
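The combined effect of the two conditions can be modelled in a few lines of plain Java. This only illustrates the selection logic and is not Spring code: the `SearchModeSketch` class and its `select` method are hypothetical. Note the last case: a value that is neither vector nor hybrid matches no condition, so neither implementation would be registered.

```java
import java.util.Optional;
import java.util.Properties;

/** Plain-Java model of how the two @ConditionalOnProperty conditions pick an implementation. */
class SearchModeSketch {

    /** Returns the name of the service whose condition matches, or empty if none does. */
    static Optional<String> select(Properties props) {
        String mode = props.getProperty("app.search.mode"); // null when the property is not set
        // havingValue = "vector", matchIfMissing = true
        if (mode == null || mode.equalsIgnoreCase("vector")) {
            return Optional.of("PgVectorSearchService");
        }
        // havingValue = "hybrid"
        if (mode.equalsIgnoreCase("hybrid")) {
            return Optional.of("HybridSearchService");
        }
        return Optional.empty(); // e.g. a typo such as "hybrd": no condition matches
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(select(props)); // property missing
        props.setProperty("app.search.mode", "hybrid");
        System.out.println(select(props));
        props.setProperty("app.search.mode", "hybrd");
        System.out.println(select(props)); // neither service selected
    }
}
```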
Configuration per client
For a Ukrainian client:
```properties
app.search.mode=hybrid
app.search.tsconfig=simple
```
For a German client:
```properties
app.search.mode=hybrid
app.search.tsconfig=german
```
For a client with a small database where hybrid is excessive:
```properties
app.search.mode=vector
```
The language for tsvector and tsquery must match — otherwise, the search will not work correctly.
PostgreSQL natively supports: simple, english, german, french, spanish, russian, italian, dutch, turkish, and others.
⚠️ Pitfalls I encountered
1. BadSqlGrammarException with plainto_tsquery
Problem: the first run of BM25 search produced bad SQL grammar. PostgreSQL could not
cast the string parameter ? to the regconfig type.
Solution: explicit cast CAST(:tsconfig AS regconfig) in two places —
in the WHERE and ORDER BY parts of the SQL.
2. The same error during indexing of new documents
Problem: I fixed HybridSearchService, but when loading a new document —
the same BadSqlGrammarException in PgVectorIndexingService.updateTsVector().
Solution: add CAST there too. Lesson learned — if to_tsvector() is used
with a parameter via JdbcClient, CAST is needed *always*.
3. @RequiredArgsConstructor does not work with @Value
Problem: Lombok's @RequiredArgsConstructor generates a constructor only for final
fields. A field with @Value is not final, so it doesn't get into the constructor.
Solution: replaced @RequiredArgsConstructor with an explicit constructor in classes
where there are both final dependencies and @Value configuration.
4. content_tsv = NULL for existing documents
Problem: after the migration, the new content_tsv column was NULL for all
existing chunks. BM25 returned 0 results.
Solution: a one-time UPDATE:
```sql
UPDATE vector_store SET content_tsv = to_tsvector('simple', content) WHERE content_tsv IS NULL;
```
5. Similarity threshold for Ukrainian text
The default cosine similarity threshold for vector search is too high for Ukrainian text from
nomic-embed-text. I lowered it to 0.1 — otherwise, vector search returned
few results. More details on choosing an embedding model and threshold —
in the article on embedding models.
📊 Results: vector=5, bm25=2, merged=5
Logs from my service after implementing hybrid search:
```text
Stream query: 'How long does it take to get a refund?', sessionId=3
HybridSearchService: Hybrid search results: 5 chunks (vector=5, bm25=1, merged=5)

Stream query: 'How long does it take to develop a landing page?', sessionId=null
HybridSearchService: Hybrid search results: 5 chunks (vector=5, bm25=2, merged=5)

Stream query: 'write about Local deployment How it works?', sessionId=3
HybridSearchService: Hybrid search results: 5 chunks (vector=5, bm25=0, merged=5)
```
What we see:
- "refund" — BM25 found 1 chunk with an exact word match, vector found 5 by meaning.
The chunk that appeared in both rankings received the highest RRF score and ended up first
- "landing page development" — BM25 found 2 chunks. Hybrid provided 5 results — chunks from both
rankings, sorted by RRF score
- "Local deployment" — BM25 found 0. This is normal: the query
"write about Local deployment How it works" is semantic, without exact terms from the documents.
Vector search handled it on its own
Main conclusion: hybrid search does not degrade results when BM25 finds nothing —
it simply returns vector results. But when BM25 *does find something* — the quality of the final ranking improves,
because RRF boosts chunks that appear in both searches.
About document chunking — how I split PDF, DOCX, and CSV into chunks for indexing,
including semantic FAQ chunking — read in the
article on Chunking Strategies.
And about choosing Ollama models that work locally on 8 GB RAM —
in the article on Ollama on 8 GB.
❓ Frequently Asked Questions (FAQ)
Is hybrid search necessary if the document base is small (< 100 documents)?
Not necessarily. With a small base, vector search is usually sufficient — there is less "noise"
and relevant chunks reach the top. Hybrid is justified when documents contain
exact terms, codes, or prices that vector search "blurs." That's why I implemented
switching via app.search.mode — for small clients, I keep it as vector.
Why tsvector instead of Elasticsearch for BM25?
I already have PostgreSQL — it stores both documents and vectors (pgvector). Adding Elasticsearch
as a separate service means DevOps overhead, monitoring, data synchronization. The built-in tsvector
with a GIN index solves the BM25 search problem without additional infrastructure. For a scale
of 10K+ documents with high QPS, it's worth considering Elasticsearch or
Qdrant with native hybrid.
How does hybrid search affect latency?
Minimally. BM25 and vector search are executed sequentially (not yet in parallel), but
BM25 via a GIN index takes milliseconds. RRF merging takes microseconds (rank calculation).
The main time is spent embedding the query via nomic-embed-text and vector search via pgvector.
Hybrid adds ~5-15ms to the total search time.
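If the sequential execution ever becomes a bottleneck, the two searches are independent and could run concurrently. The following is a minimal sketch under stated assumptions, not the current implementation: `ParallelSearchSketch`, `BothResults`, and the two function parameters are hypothetical stand-ins for the real vector and BM25 calls, and chunk ids are plain strings.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Sketch: run the vector and keyword searches concurrently, then hand both lists to the merger. */
class ParallelSearchSketch {

    /** Holds both ranked lists so they can be passed to RRF afterwards. */
    record BothResults(List<String> vector, List<String> keyword) {}

    static BothResults searchBoth(String question,
                                  Function<String, List<String>> vectorSearch,
                                  Function<String, List<String>> keywordSearch) {
        // supplyAsync uses the common ForkJoinPool here; a dedicated executor
        // would be the safer choice for blocking database calls
        CompletableFuture<List<String>> vector =
                CompletableFuture.supplyAsync(() -> vectorSearch.apply(question));
        CompletableFuture<List<String>> keyword =
                CompletableFuture.supplyAsync(() -> keywordSearch.apply(question));
        // join() waits for each search; total time is roughly that of the slower one
        return new BothResults(vector.join(), keyword.join());
    }

    public static void main(String[] args) {
        BothResults results = searchBoth("refund",
                q -> List.of("chunk-7", "chunk-2"), // stand-in for the pgvector search
                q -> List.of("chunk-2"));           // stand-in for the tsvector search
        System.out.println(results);
    }
}
```

Since the keyword query is the cheap part, the expected gain is small; the sketch mainly shows that parallelizing does not change the merge logic.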
Can I use the russian config instead of simple for Ukrainian?
You can — stemming works partially (languages are similar). But there's a risk of incorrect stemming
for some Ukrainian words. simple is more reliable: it simply tokenizes without stemming.
For "understanding" words, I have vector search — BM25 is only needed for exact matches.
What to do if BM25 always returns 0 results?
Check three things: (1) is the content_tsv column populated — execute
SELECT count(*) FROM vector_store WHERE content_tsv IS NOT NULL;
(2) does the tsconfig in to_tsvector() and plainto_tsquery() match;
(3) are there exact matches of the query words in the chunk text. If queries are predominantly semantic —
BM25 returns 0, and this is normal behavior.
✅ Conclusions
- 🔹 Vector search alone is not enough: it "blurs" exact terms, prices,
document numbers. BM25 fills these gaps
- 🔹 Hybrid search (BM25 + vector + RRF): two independent searches (currently run one after the other), merged using the formula
1/(k + rank). A chunk that ranks high in both wins
- 🔹 pgvector + tsvector: no Elasticsearch needed — PostgreSQL with a GIN index
is sufficient for BM25 alongside vector search
- 🔹 Client-specific configuration: app.search.mode=hybrid/vector,
app.search.tsconfig=simple/german/english, all via properties, without recompilation
- 🔹 Pitfalls: CAST(:tsconfig AS regconfig) is mandatory for JdbcClient,
content_tsv needs to be populated for existing chunks, similarity threshold for Ukrainian text is 0.1
- 🔹 Hybrid doesn't degrade: when BM25 finds nothing — results = vector search.
When it finds something — quality improves
My main takeaway: hybrid search is the simplest and most effective step
to improve RAG system quality after basic vector search. If your documents contain
terms, prices, codes — the effect is immediately noticeable. And if not — hybrid simply works as vector search,
breaking nothing.