You have 5 tools — everything is fine. You have 15 tools — problems begin.
You have 50 tools — the agent degrades. But there is a solution that elegantly solves the scaling problem — and you already know how it works because you use
it for documents.
This article is part of a series about AI agents on Spring Boot.
If you haven't read about the mechanics of tool selection yet — start with
How LLM Decides When to Call a Tool.
For what to do after the tool has responded, see
Grounding and Trust in Sources.
When Tool RAG is Needed — Decision Tree
Before reading further, determine whether you need this article at all.
Most projects do not require Tool RAG. But everyone should know about it,
because the scaling problem creeps up unnoticed.
How many tools does your agent have?
↓
up to 10
↓
Good description + system prompt
→ Sufficient. Tool RAG is not needed.
→ Read about writing descriptions.
from 10 to 20
↓
Are the tools thematically different?
├── YES → Tool categories and routing (section 7)
│ Simpler than Tool RAG, solves the problem
└── NO → Improve descriptions, delineate responsibilities
from 20 to 50
↓
Do you see selection degradation in logs?
├── YES → Tool RAG (this article)
└── NO → Categories + routing, monitoring (section 9)
50+
↓
Tool RAG is mandatory.
Without it, the agent degrades to ~13% selection accuracy.
Important: Tool RAG is not a silver bullet and not a must-have
for every project. It is a tool for a specific problem.
If you have fewer than 20 tools — stop at section 7 (categories and routing)
and return to this article when the registry grows.
The Scaling Problem: Numbers That Will Surprise You
You add tools gradually. First 5, then 10, then 20.
Each new tool solves a real task. But at some point
the agent's performance starts to decline — and you don't understand why.
There are no errors in the code. Descriptions are written correctly. But the agent
increasingly chooses the wrong tool or doesn't call any.
This is not a problem with your code. This is a systemic problem that researchers
have called "choice paralysis" — and it is confirmed
by several independent studies from 2025-2026.
What the 2025-2026 Studies Show
RAG-MCP (Gan and Sun, May 2025), a study
on real MCP servers, showed catastrophic non-linear degradation:
| Number of tools | Tokens for descriptions | Selection accuracy (baseline) | Accuracy with Tool RAG |
|---|---|---|---|
| ~10 tools | ~2K | 78% | ~90% |
| ~50 tools | 8K | 84-95% | ~95% |
| ~200 tools | 32K | 41-83% | ~85% |
| 100+ tools | ~20K | 13.62% | 43.13% |
| ~740 tools | 120K | 0-20% | ~60% |
Key result: Tool RAG more than tripled the accuracy
(from 13.62% to 43.13%) with a large tool registry and reduced
the prompt size by more than 50%.
Note the non-linearity of the degradation. With 50 tools, it's still acceptable —
84-95%. With 200 tools, it's already critical — a drop to 41%. With 740 tools,
the agent essentially chooses randomly — 0-20%. This is not a gradual deterioration,
it's a cliff.
What Degradation Looks Like in Logs
If you log tool calls in Agent Chat or AskYourDocs —
this is what the problem looks like in reality:
// Agent with 5 tools — normal behavior:
INFO: Round 1 AGENT_A — Tavily search: 'vibe coding productivity'
INFO: Tavily found 3 results
INFO: Round 1 AGENT_A reply: "GitHub Copilot increases productivity by 51%..."
// The same agent but with 30 tools — degradation:
INFO: Round 1 AGENT_A — [no tool call]
INFO: Round 1 AGENT_A reply: "According to general data, vibe coding..."
// stop_reason: "end_turn" without any tool_use
// The agent "decided" to reply from memory because it got lost in choosing tools
// Or another variant of degradation:
INFO: Round 1 AGENT_A — Wikipedia search: 'productivity'
// Chose Wikipedia instead of Tavily — although Tavily is more suitable for current statistics
// With 30 tools, the model doesn't distinguish subtle differences between descriptions
The "Lost in the Middle" Effect for Tools
A separate and less known problem is positional bias.
The BiasBusters study (2025) showed: tools in the middle
of a long list are chosen significantly less often than tools at the beginning or end.
With 741 tools:
- Tools at the beginning and end of the list — 31-32% accuracy
- Tools in the middle (positions 40-60%) — only 22-52% accuracy
Why this happens technically: many transformer models
use Rotary Position Embedding (RoPE), which has a "long-term decay" effect:
tokens at the beginning and end of the context receive more attention than tokens in the middle. This architectural bias is present in most modern LLMs regardless of provider.
Practical consequence: if your best tool for a query
accidentally ends up in the middle of a list of 50+ tools —
the chance that the model will choose it is significantly lower than if it were first.
Tool RAG solves this automatically — inject only 3-5 tools,
all of them are at the beginning of the context, positional bias is minimal.
Another Reason for Degradation: Tokens
Each tool description takes up tokens. With 50 tools with detailed
descriptions — that's 5,000-15,000 tokens just for tool descriptions,
even before the conversation context and history begin.
The Modarressi et al. study (2025) showed:
- An increase in context by 1,000 tokens →
a decrease in accuracy by 16 percentage points
- Exceeding 8,000 tokens →
a drop of up to 50 percentage points
For details on how tokens affect the quality and cost of responses,
see the article LLM Context Window: Why AI Forgets and How Much It Costs.
There you will also find specific cost figures for different providers.
// Token math for Agent Chat with 5 tools (current state):
// 5 tools × ~200 tokens = 1,000 tokens — acceptable ✅
// If scaled to 30 tools:
// 30 tools × ~200 tokens = 6,000 tokens — degradation begins ⚠️
// 50 tools:
// 50 tools × ~200 tokens = 10,000 tokens — significant degradation ❌
// Tool RAG — regardless of registry size:
// inject 3 tools × ~200 tokens = 600 tokens — always acceptable ✅
// even if there are 500 tools in the registry
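The token math above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the class name ToolTokenBudget is hypothetical, and the flat ~200 tokens per tool is the same rough average used above, not a measured value.

```java
/** Back-of-the-envelope token budget for tool descriptions (hypothetical helper). */
final class ToolTokenBudget {

    private ToolTokenBudget() {}

    /** Tokens spent on tool descriptions when every tool costs roughly the same. */
    static int descriptionTokens(int toolCount, int tokensPerTool) {
        return toolCount * tokensPerTool;
    }

    /** Share of description tokens saved by injecting only topK tools, in percent. */
    static double savingsPercent(int registrySize, int topK, int tokensPerTool) {
        int all = descriptionTokens(registrySize, tokensPerTool);
        int injected = descriptionTokens(Math.min(topK, registrySize), tokensPerTool);
        return all == 0 ? 0.0 : 100.0 * (all - injected) / all;
    }

    public static void main(String[] args) {
        int perTool = 200; // assumed average size of one tool description
        for (int size : new int[] {5, 30, 50, 500}) {
            System.out.printf("%d tools: all=%d tokens, top-3=%d tokens, saved=%.1f%%%n",
                    size,
                    descriptionTokens(size, perTool),
                    descriptionTokens(Math.min(3, size), perTool),
                    savingsPercent(size, 3, perTool));
        }
    }
}
```

To get real numbers for your own registry, replace the flat average with token counts measured by your provider's tokenizer.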
"Prompt Bloat" and the Death of MCP That Wasn't:
in late 2025, articles appeared with headlines "MCP is Dead After Just One Year".
InfiniFlow (December 2025) analyzed the situation precisely:
the problem is not with the MCP protocol — the problem is with the approach
of "loading all tool descriptions into context at once".
With 4,400+ MCP servers on mcp.so (April 2025) and hundreds of tools
in enterprise systems — "choice paralysis" becomes inevitable.
Tool RAG solves exactly this problem without changing the protocol.
Tool RAG Concept — The Same Idea as RAG for Documents
If you've built a RAG system — you'll understand Tool RAG in a minute.
If not — here's an analogy: imagine a library with 10,000 books.
When you need an answer to a question — you don't read all the books in order.
You go to the catalog, find 3-5 relevant books, and read only them.
Classic RAG does the same with documents.
Tool RAG does the same with agent tools.
Comparison: What Changes in the Prompt
The most illustrative way to understand Tool RAG is to see
what the prompt looks like before and after:
// ❌ WITHOUT Tool RAG — all 30 tools in every request:
{
"tools": [
{ "name": "searchWikipedia", "description": "Searches Wikipedia..." },
{ "name": "searchWeb", "description": "Searches the internet..." },
{ "name": "getStockPrice", "description": "Gets stock price..." },
{ "name": "searchNews", "description": "Searches for news..." },
{ "name": "searchPapers", "description": "Searches for scientific papers..." },
{ "name": "getWeather", "description": "Gets weather..." },
{ "name": "translateText", "description": "Translates text..." },
{ "name": "summarizeDoc", "description": "Summarizes document..." },
// ... 22 more tools
],
"messages": [{ "role": "user", "content": "what is the price of AAPL stock?" }]
}
// Prompt size: ~6,000 tokens just for tools (30 × ~200)
// The model sees 30 options and can get lost
// ✅ WITH Tool RAG — only 2 relevant tools:
{
"tools": [
{ "name": "getStockPrice", "description": "Gets stock price..." },
{ "name": "searchWeb", "description": "Searches for current financial data..." }
],
"messages": [{ "role": "user", "content": "what is the price of AAPL stock?" }]
}
// Prompt size: ~400 tokens for tools
// The model sees 2 obvious options — selection is accurate
Analogy Table: RAG for Documents vs. Tool RAG
| | Classic RAG | Tool RAG |
|---|---|---|
| What is stored in the vector DB | Document chunks | Tool descriptions |
| What we index (embed) | Document text | Description + trigger scenarios for the tool |
| What we search for | Relevant text fragments | Relevant tools |
| What we inject into the LLM | Top-K fragments as context | Top-K tools as available instruments |
| Technology | pgvector, Qdrant | The same pgvector, Qdrant |
| Embedding model | text-embedding-3-small | The same model |
| What it solves | Hallucinations due to lack of knowledge | Degradation due to excess choice |
Key advantage: if you already have a RAG infrastructure —
Tool RAG is added to it with minimal effort.
The same pgvector, the same embedding model, the same approach.
If you use AskYourDocs or any other RAG system
on pgvector — Tool RAG is essentially another table in the same DB.
If you haven't built a RAG system yet, I recommend getting familiar
with the basic concept before implementing Tool RAG.
How RAG works internally and how to build it
on Spring AI + pgvector is covered in
RAG with Ollama: From Pipeline to Production.
Tool RAG will be clear immediately after that.
Tool RAG flow: from query to injection
The entire Tool RAG process consists of seven steps.
The first four occur before the LLM receives the query, and
this is the main difference from the classic approach.
User query: "find the current price of AAPL stock"
↓
[1] Query Embedding
embeddingModel.embed("find the current price of AAPL stock")
→ vector[1536]
// Convert the query into a numerical vector.
// The same embedding model used for documents in RAG.
// Important: embedding models for tool descriptions and for queries
// must be the same — otherwise, vector search will not work correctly.
↓
[2] Vector Search on Tool Descriptions Registry
SELECT tool_name, bean_name, 1 - (embedding <=> query_vector) as score
FROM tool_registry
WHERE is_active = TRUE
ORDER BY embedding <=> query_vector
LIMIT 5
→ [AlphaVantageTool: 0.91, TavilySearchTool: 0.73,
NewsApiTool: 0.61, WikipediaSearchTool: 0.44, ArxivTool: 0.31]
// pgvector returns the top 5 tools sorted by relevance (LIMIT 5).
// Even WikipediaSearchTool made it into the results,
// but with a low score of 0.44. We'll filter it out in the next step.
↓
[3] Relevance Threshold Filtering
MIN_RELEVANCE_THRESHOLD = 0.65
→ keep: [AlphaVantageTool: 0.91, TavilySearchTool: 0.73]
// Discard tools with a score below the threshold.
// This is a critical step: without it, an irrelevant tool might be injected.
// NewsApiTool (0.61) is borderline: it is dropped at 0.65 but would pass at 0.60.
↓
[4] Load Spring beans for found tools
List<ToolCallback> tools = loadTools(["alphaVantageTool", "tavilySearchTool"])
// Load actual Spring beans by bean_name stored in the registry.
// Not strings — real objects with the @Tool annotation.
↓
[5] LLM query with 2 tools instead of 30+
agentChatModel.call(prompt, tools)
// The model sees only 2 relevant tools.
// ~400 tokens for descriptions instead of 6,000-15,000.
// The choice is obvious — AlphaVantageTool for stock prices.
↓
[6] LLM calls AlphaVantageTool
→ getStockPrice("AAPL")
↓
[7] Response to user
"AAPL Stock: $213.50 | Change: +1.2% | High: $215.20 | Low: $212.80"
Instead of passing 30+ tool descriptions (~6,000 tokens) —
we pass the 2 most relevant ones (~400 tokens).
Token savings: 93%. Selection accuracy: significantly higher.
Latency: +50-100ms for the query embedding, partly offset
by faster processing of the shorter prompt.
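The score in step 2 is cosine similarity: pgvector's <=> operator returns cosine distance, and the query turns it back into a similarity with 1 - distance. The sketch below computes the same quantity in plain Java so the numbers in the flow are easier to interpret. The class name CosineScore is hypothetical and the code is only an illustration; in the actual flow the database does this work.

```java
/** Plain-Java illustration of the score computed by "1 - (embedding <=> query_vector)". */
final class CosineScore {

    private CosineScore() {}

    /** Cosine similarity in [-1, 1]; 1 means both vectors point in the same direction. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("A zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance, the value that pgvector's <=> operator returns. */
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Because only the direction of the vectors matters, a long and a short description of the same tool can still score close to each other, which is one reason the query and the descriptions must be embedded with the same model.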
Two steps worth understanding in detail
Step 1 — Query Embedding: converting text
into a numerical vector — the foundation of all Tool RAG. The quality of the embedding model
determines how accurately the system finds a relevant tool.
Details on how to choose an embedding model for your stack —
Embedding models for RAG in 2026: how to choose, provider comparison.
If you want to understand how embeddings work internally —
Embeddings in simple terms: how AI understands meaning, not just words.
Step 3 — Relevance Threshold: this is the most important
parameter that needs to be adjusted for your registry.
Too high a threshold (0.85+): the agent will often not find any tool
and will respond without searching. Too low (0.40 or below): irrelevant tools will be injected
and degradation will return.
Recommended starting threshold: 0.60-0.65.
Adjust based on monitoring (section 9).
What to do if no tool is found above the threshold?
Two options: (1) respond without tools — safe if the query
does not require current data; (2) lower the threshold and inject
the best result even if the score is low.
In Agent Chat, we use option 1 as the default —
the agent responds from its own knowledge if Tool RAG found nothing.
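The threshold step and both fallback options can be sketched as one small function. This is a minimal illustration under stated assumptions: ThresholdFilter, Scored, and EmptyPolicy are hypothetical names that do not come from Agent Chat, and the scores in the usage note are the ones from the flow above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch of relevance-threshold filtering with a configurable empty-result policy. */
final class ThresholdFilter {

    /** A candidate returned by vector search: tool name plus similarity score. */
    record Scored(String toolName, double score) {}

    /** What to do when no candidate reaches the threshold. */
    enum EmptyPolicy { ANSWER_WITHOUT_TOOLS, INJECT_BEST_MATCH }

    private ThresholdFilter() {}

    static List<Scored> select(List<Scored> candidates, double threshold, EmptyPolicy policy) {
        // Keep only candidates at or above the threshold, best first
        List<Scored> kept = candidates.stream()
                .filter(c -> c.score() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .toList();
        if (!kept.isEmpty() || policy == EmptyPolicy.ANSWER_WITHOUT_TOOLS) {
            return kept; // option 1: an empty list means "answer without tools"
        }
        // Option 2: nothing passed, so fall back to the single best candidate
        Optional<Scored> best = candidates.stream()
                .max(Comparator.comparingDouble(Scored::score));
        return best.isPresent() ? List.of(best.get()) : List.of();
    }
}
```

With the candidates from the flow (0.91, 0.73, 0.61, 0.44, 0.31) and a threshold of 0.65, two tools are kept. With a threshold of 0.95, ANSWER_WITHOUT_TOOLS yields an empty list, while INJECT_BEST_MATCH yields only the 0.91 candidate.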
Implementation: pgvector for tool registry on Spring AI
Database schema for tool registry
-- Tool registry with embedding descriptions
CREATE TABLE tool_registry (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tool_name VARCHAR(200) NOT NULL UNIQUE, -- Java class or method name
display_name VARCHAR(200) NOT NULL, -- human-readable name
description TEXT NOT NULL, -- full description for embedding
category VARCHAR(100), -- category for routing
bean_name VARCHAR(200) NOT NULL, -- Spring bean name for injection
is_active BOOLEAN DEFAULT TRUE,
version INTEGER DEFAULT 1, -- for versioning
embedding vector(1536), -- pgvector
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
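-- Index for fast vector search.
-- Note: ivfflat computes its cluster centers from the rows present at creation time,
-- so create (or REINDEX) it after the registry has been populated.
CREATE INDEX tool_registry_embedding_idx
ON tool_registry
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10); -- 10 lists for a small registry (up to 1000 tools)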
-- Index for searching by category
CREATE INDEX tool_registry_category_idx ON tool_registry(category, is_active);
Tool registration service
@Service
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryService {
private final JdbcTemplate jdbcTemplate;
private final EmbeddingModel embeddingModel;
/**
* Registers a tool in the registry.
* Called on startup or when a new tool is added.
*/
public void registerTool(ToolRegistration registration) {
// Generate embedding from description
float[] embedding = embeddingModel.embed(registration.getDescription());
jdbcTemplate.update("""
INSERT INTO tool_registry
(tool_name, display_name, description, category, bean_name, embedding)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (tool_name) DO UPDATE SET
description = EXCLUDED.description,
category = EXCLUDED.category,
embedding = EXCLUDED.embedding,
version = tool_registry.version + 1,
updated_at = NOW()
""",
registration.getToolName(),
registration.getDisplayName(),
registration.getDescription(),
registration.getCategory(),
registration.getBeanName(),
embedding
);
log.info("Tool registered: {} (category: {})",
registration.getToolName(), registration.getCategory());
}
/**
* Semantic search for relevant tools for a query
*/
public List<ToolMatch> findRelevantTools(String userQuery, int topK) {
float[] queryEmbedding = embeddingModel.embed(userQuery);
return jdbcTemplate.query("""
SELECT tool_name, display_name, bean_name, category,
1 - (embedding <=> ?::vector) as relevance_score
FROM tool_registry
WHERE is_active = TRUE
ORDER BY embedding <=> ?::vector
LIMIT ?
""",
(rs, rowNum) -> ToolMatch.builder()
.toolName(rs.getString("tool_name"))
.displayName(rs.getString("display_name"))
.beanName(rs.getString("bean_name"))
.category(rs.getString("category"))
.relevanceScore(rs.getDouble("relevance_score"))
.build(),
queryEmbedding, queryEmbedding, topK
);
}
}
@Value
@Builder
public class ToolMatch {
String toolName;
String displayName;
String beanName;
String category;
double relevanceScore;
}
@Value
@Builder
public class ToolRegistration {
String toolName;
String displayName;
String description; // full text for embedding — the more detailed, the better
String category;
String beanName;
}
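How a float[] reaches a vector column depends on your JDBC driver and its configuration. If binding the array directly does not work in your setup, one common workaround is to send the embedding in pgvector's text format, [v1,v2,...], and cast the parameter in SQL (for example ?::vector). The helper below is a minimal sketch of that conversion; the class name PgVectorLiteral is hypothetical and not part of Spring AI or pgvector.

```java
import java.util.StringJoiner;

/** Converts an embedding into pgvector's text representation, e.g. "[0.5,-1.0,2.25]". */
final class PgVectorLiteral {

    private PgVectorLiteral() {}

    static String toLiteral(float[] embedding) {
        if (embedding == null || embedding.length == 0) {
            throw new IllegalArgumentException("A vector needs at least one dimension");
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            // NaN and infinity are not valid vector elements
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("Embedding contains a non-finite value");
            }
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
```

The resulting string is bound like any other text parameter, so the same helper can serve the INSERT, the UPDATE, and the similarity query.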
Registering all tools on startup
@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryInitializer implements ApplicationRunner {
private final ToolRegistryService registryService;
@Override
public void run(ApplicationArguments args) {
log.info("Initializing tool registry...");
List<ToolRegistration> tools = List.of(
ToolRegistration.builder()
.toolName("AlphaVantageTool.getStockPrice")
.displayName("Stock Price Lookup")
.description("""
Retrieves the current stock price on the exchange.
Use for queries about: stock prices, company market capitalization,
financial indicators, market dynamics.
Supports tickers: AAPL, GOOGL, TSLA, AMZN, MSFT, and others.
DO NOT use for: news, forecasts, general company information.
""")
.category("FINANCE")
.beanName("alphaVantageTool")
.build(),
ToolRegistration.builder()
.toolName("TavilySearchTool.searchWeb")
.displayName("Web Search")
.description("""
Searches for current information on the internet via Tavily.
Use for: latest news, current statistics,
events of 2024-2025, data not available on Wikipedia.
DO NOT use for: stable facts, definitions, biographies.
""")
.category("SEARCH")
.beanName("tavilySearchTool")
.build(),
ToolRegistration.builder()
.toolName("WikipediaSearchTool.searchWikipedia")
.displayName("Wikipedia Search")
.description("""
Searches for stable factual information on Wikipedia.
Use for: definitions of concepts, biographies, scientific facts,
historical events, geographical information.
DO NOT use for: current news, prices, ongoing events.
""")
.category("SEARCH")
.beanName("wikipediaSearchTool")
.build(),
ToolRegistration.builder()
.toolName("ArxivSearchTool.searchPapers")
.displayName("ArXiv Scientific Papers")
.description("""
Searches for scientific articles and research papers on ArXiv.
Use for: scientific research, academic publications,
technical articles in AI/ML, physics, mathematics, CS.
Queries must be in English.
""")
.category("RESEARCH")
.beanName("arxivSearchTool")
.build(),
ToolRegistration.builder()
.toolName("NewsApiSearchTool.searchNews")
.displayName("News Search")
.description("""
Searches for recent news on a topic via NewsAPI.
Use for: latest news, current events,
corporate news, market news.
Limit: 100 requests per day.
""")
.category("NEWS")
.beanName("newsApiSearchTool")
.build()
);
tools.forEach(registryService::registerTool);
log.info("Tool registry initialized: {} tools registered", tools.size());
}
}
Dynamic tool injection in Spring AI
The most interesting part — how to inject only relevant tools
into the LLM query instead of all of them at once. The key idea:
ToolCallingChatOptions in Spring AI accepts an array
of ToolCallback[] dynamically — meaning different queries
can receive different sets of tools without changing the code.
@Service
@RequiredArgsConstructor
@Slf4j
public class ToolRagAgentService {
private final ToolRegistryService registryService;
private final ApplicationContext applicationContext;
private final ChatModel chatModel;
// How many tools to inject into one query at most
private static final int TOP_K_TOOLS = 3;
// Minimum relevance threshold — adjust for your registry
private static final double MIN_RELEVANCE = 0.60;
public String askWithToolRag(String systemPrompt, String userQuery) {
long startTime = System.currentTimeMillis();
// 1. Find relevant tools via vector search
List<ToolMatch> relevantMatches = registryService
.findRelevantTools(userQuery, TOP_K_TOOLS);
// 2. Filter by minimum relevance threshold
List<ToolMatch> filteredTools = relevantMatches.stream()
.filter(m -> m.getRelevanceScore() >= MIN_RELEVANCE)
.toList();
long ragLatency = System.currentTimeMillis() - startTime;
log.info("Tool RAG: query='{}' found={}/{} tools threshold={} latency={}ms",
userQuery.length() > 50 ? userQuery.substring(0, 50) + "..." : userQuery,
filteredTools.size(),
relevantMatches.size(),
MIN_RELEVANCE,
ragLatency);
filteredTools.forEach(t ->
// SLF4J placeholders do not take format specifiers, so format the score first
log.info(" → {} score={}", t.getToolName(),
String.format("%.3f", t.getRelevanceScore())));
// 3. Construct the message
List<Message> messages = List.of(
new SystemMessage(systemPrompt),
new UserMessage(userQuery)
);
// 4. Fallback if no relevant tool is found
if (filteredTools.isEmpty()) {
log.warn("Tool RAG: no tools above threshold={} for query='{}' — answering without tools",
MIN_RELEVANCE, userQuery);
return chatModel.call(new Prompt(messages))
.getResult().getOutput().getText();
}
// 5. Load Spring beans and make the query
ToolCallback[] tools = loadToolCallbacks(filteredTools);
return chatModel.call(
new Prompt(messages,
ToolCallingChatOptions.builder()
.toolCallbacks(tools)
.build()))
.getResult().getOutput().getText();
}
/**
* Loads ToolCallbacks via Spring ApplicationContext
* by bean name stored in the registry.
*
* Thread-safe: ApplicationContext.getBean() is thread-safe —
* Spring returns singleton beans without blocking.
*/
private ToolCallback[] loadToolCallbacks(List<ToolMatch> matches) {
return matches.stream()
.map(match -> {
try {
Object bean = applicationContext.getBean(match.getBeanName());
return ToolCallbacks.from(bean);
} catch (NoSuchBeanDefinitionException e) {
// Bean not found — perhaps the tool was removed from code
// but remains in the DB registry
log.error("Tool bean not found: '{}' — " +
"deactivate tool in registry or restart the application",
match.getBeanName());
return new ToolCallback[0];
} catch (BeansException e) {
log.error("Failed to load tool bean '{}': {}",
match.getBeanName(), e.getMessage());
return new ToolCallback[0];
}
})
.flatMap(Arrays::stream)
.toArray(ToolCallback[]::new);
}
}
Caching embeddings to reduce latency
Tool RAG adds one embedding query before each LLM call.
If the same or similar queries are repeated — caching
allows avoiding unnecessary embedding requests:
@Service
@RequiredArgsConstructor
@Slf4j
public class CachedToolRegistryService {
private final ToolRegistryService registryService;
// Simple in-memory cache — for production, use Redis
// ConcurrentHashMap is thread-safe
private final Map<String, CachedResult> cache = new ConcurrentHashMap<>();
private static final Duration CACHE_TTL = Duration.ofMinutes(5);
private static final int MAX_CACHE_SIZE = 500;
public List<ToolMatch> findRelevantToolsCached(String userQuery, int topK) {
// Normalize the query for better cache hit rate
String cacheKey = userQuery.toLowerCase().trim();
CachedResult cached = cache.get(cacheKey);
if (cached != null && !cached.isExpired()) {
log.debug("Tool RAG cache HIT for query: '{}'", cacheKey);
return cached.tools();
}
// Cache miss — perform actual search
log.debug("Tool RAG cache MISS for query: '{}'", cacheKey);
List<ToolMatch> tools = registryService.findRelevantTools(userQuery, topK);
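// Store in cache; when the cache is full, drop expired entries first,
// otherwise stale entries would block new ones until the next invalidation
if (cache.size() >= MAX_CACHE_SIZE) {
cache.values().removeIf(CachedResult::isExpired);
}
if (cache.size() < MAX_CACHE_SIZE || cache.containsKey(cacheKey)) {
cache.put(cacheKey, new CachedResult(tools, Instant.now()));
}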
return tools;
}
/**
* Clears the cache when the tool registry is updated
* (call after registerTool or updateToolDescription)
*/
public void invalidateCache() {
int size = cache.size();
cache.clear();
log.info("Tool RAG cache invalidated: {} entries cleared", size);
}
record CachedResult(List<ToolMatch> tools, Instant cachedAt) {
boolean isExpired() {
return Instant.now().isAfter(cachedAt.plus(CACHE_TTL));
}
}
}
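Two details of such a cache are easy to get wrong: the key should include topK (otherwise a top-3 result can be served for a top-5 request), and expired entries should free their slots. The sketch below shows one possible shape with an injectable Clock so that expiry can be tested. TtlCache is a hypothetical class, not part of Agent Chat, and a production setup would more likely rely on Redis or an existing caching library.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Small bounded cache with time-to-live; a sketch, not a tuned implementation. */
final class TtlCache<V> {

    private record Entry<T>(T value, Instant storedAt) {}

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final int maxSize;
    private final Clock clock;

    TtlCache(Duration ttl, int maxSize, Clock clock) {
        this.ttl = ttl;
        this.maxSize = maxSize;
        this.clock = clock;
    }

    /** Normalizes the query and includes topK so different limits never share an entry. */
    static String key(String userQuery, int topK) {
        String normalized = userQuery.strip().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return topK + "|" + normalized;
    }

    Optional<V> get(String key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (isExpired(entry)) {
            entries.remove(key, entry); // free the slot of a stale entry
            return Optional.empty();
        }
        return Optional.of(entry.value());
    }

    void put(String key, V value) {
        if (entries.size() >= maxSize && !entries.containsKey(key)) {
            entries.values().removeIf(this::isExpired); // make room before giving up
        }
        if (entries.size() < maxSize || entries.containsKey(key)) {
            entries.put(key, new Entry<>(value, clock.instant()));
        }
    }

    int size() {
        return entries.size();
    }

    private boolean isExpired(Entry<V> entry) {
        return clock.instant().isAfter(entry.storedAt().plus(ttl));
    }
}
```

When the cache is full and nothing has expired, new entries are simply not stored. That keeps memory bounded, at the price of a lower hit rate under heavy load.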
Integration into AgentConversationRunner
Here's how the Agent Chat migration from a static tool list
to dynamic Tool RAG looks — minimal changes to existing code:
// In AgentConversationRunner.ask()
// ❌ Was — all 5 tools in each round regardless of the topic:
ToolCallback[] tools = ToolCallbacks.from(
wikipediaSearchTool, // always inject
tavilySearchTool, // always inject
alphaVantageTool, // always inject — even if we're talking about architecture
arxivSearchTool, // always inject
newsApiSearchTool // always inject
);
// ~1,000 tokens for tools each round
// ✅ Now — only relevant tools for the current message:
List<ToolMatch> relevantTools = cachedToolRegistryService
.findRelevantToolsCached(lastMessage, 3);
ToolCallback[] tools = loadToolCallbacks(relevantTools);
log.info("Tool RAG round={} injected={} tools: [{}]",
round,
tools.length,
relevantTools.stream()
.map(t -> t.getToolName() + ":" + String.format("%.2f", t.getRelevanceScore()))
.collect(joining(", ")));
// Example logs during a dialogue about vibe coding:
// Tool RAG round=1 injected=2 tools: [TavilySearchTool:0.87, WikipediaSearchTool:0.71]
// Tool RAG round=2 injected=2 tools: [TavilySearchTool:0.83, NewsApiSearchTool:0.68]
// Tool RAG round=3 injected=1 tools: [WikipediaSearchTool:0.79]
// AlphaVantageTool and ArxivSearchTool not injected — not relevant to the topic
Pitfall: registry and code desynchronization.
If you delete or rename a Spring bean but don't update the DB registry,
getBean() inside loadToolCallbacks() throws NoSuchBeanDefinitionException; the method
catches it, logs an error, and silently skips the tool.
To avoid this, add registry validation on application startup
(method validateRegistry() from section 8) and deactivate
obsolete entries using deactivateTool()
instead of deleting them from the DB.
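The core of such a startup check is a comparison between the bean names stored in the registry and the beans that actually exist. The sketch below shows that comparison in plain Java. It is a stand-in, not the validateRegistry() method from section 8: RegistryValidator and RegistryEntry are hypothetical names, and in a Spring application the set of known beans could be derived from the application context, for example with containsBean(name).

```java
import java.util.List;
import java.util.Set;

/** Finds active registry entries whose Spring bean no longer exists. */
final class RegistryValidator {

    /** Minimal projection of a tool_registry row (hypothetical type). */
    record RegistryEntry(String toolName, String beanName, boolean active) {}

    private RegistryValidator() {}

    /** Returns the tool names of active entries that point to an unknown bean, sorted. */
    static List<String> findStaleTools(List<RegistryEntry> entries, Set<String> knownBeanNames) {
        return entries.stream()
                .filter(RegistryEntry::active) // inactive entries are allowed to be stale
                .filter(entry -> !knownBeanNames.contains(entry.beanName()))
                .map(RegistryEntry::toolName)
                .sorted()
                .toList();
    }
}
```

Each returned name is a candidate for deactivateTool(), or a hint that a bean was renamed without a matching registry update.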
Tool categories and routing — a simplified alternative
If you have 10-30 tools and they are thematically diverse —
categories and keyword routing are simpler and faster than full Tool RAG.
This is an intermediate solution that solves 80% of scaling problems
without a vector DB and without embedding requests.
The main difference from Tool RAG: routing determines the category
by searching for keywords in the query — this is a CPU operation in microseconds,
not an embedding request in 50-100ms. For systems where latency is critical —
this is a significant advantage.
Approach: keyword routing
@Service
@RequiredArgsConstructor
@Slf4j
public class ToolCategoryRouter {
// Tools are injected directly — without extra fields
private final AlphaVantageTool alphaVantageTool;
private final TavilySearchTool tavilySearchTool;
private final WikipediaSearchTool wikipediaSearchTool;
private final ArxivSearchTool arxivSearchTool;
private final NewsApiSearchTool newsApiSearchTool;
// Keywords for each category (Ukrainian and English; translations in comments)
// Important: words must be specific enough not to cause
// false positives. "news" can appear in any query,
// so "breaking news", "latest news", "новини сьогодні" ("news today") are better.
// All keywords are lowercase because the query is lowercased before matching.
private static final Map<String, List<String>> CATEGORY_KEYWORDS = Map.of(
"FINANCE", List.of("акція", "ціна акцій", "капіталізація", // stock, stock price, capitalization
"stock price", "market cap", "aapl", "tsla", "googl"),
"NEWS", List.of("новини", "останні події", "сьогодні відбулось", // news, latest events, happened today
"breaking news", "latest news", "поточні події"), // last one: current events
"RESEARCH", List.of("дослідження", "наукова стаття", "arxiv", // research, scientific article
"research paper", "academic study", "peer-reviewed"),
"FACTS", List.of("що таке", "хто такий", "визначення", "wikipedia", // what is, who is, definition
"what is", "who is", "definition of", "history of")
);
// Mapping categories to tools, rebuilt on each call (cheap for a handful of entries)
// List.of keeps the preferred order of tools within a category
private Map<String, List<Object>> buildCategoryTools() {
return Map.of(
"FINANCE", List.of(alphaVantageTool, tavilySearchTool),
"NEWS", List.of(newsApiSearchTool, tavilySearchTool),
"RESEARCH", List.of(arxivSearchTool, wikipediaSearchTool),
"FACTS", List.of(wikipediaSearchTool, tavilySearchTool)
);
}
/**
* Determines the query category and returns the corresponding tools.
* Deduplicates tools if the query falls into multiple categories.
*/
public ToolCallback[] routeTools(String userQuery) {
String queryLower = userQuery.toLowerCase();
Set<String> matchedCategories = CATEGORY_KEYWORDS.entrySet().stream()
.filter(entry -> entry.getValue().stream()
.anyMatch(queryLower::contains))
.map(Map.Entry::getKey)
.collect(Collectors.toSet());
log.info("Tool routing: query='{}' → categories={}",
userQuery.length() > 60 ? userQuery.substring(0, 60) + "..." : userQuery,
matchedCategories.isEmpty() ? "[DEFAULT]" : matchedCategories);
if (matchedCategories.isEmpty()) {
// Default: basic search for any query
log.info("Tool routing: no category matched → using default [Tavily, Wikipedia]");
return ToolCallbacks.from(tavilySearchTool, wikipediaSearchTool);
}
Map<String, List<Object>> categoryTools = buildCategoryTools();
// LinkedHashSet for deduplication — tavilySearchTool won't be added twice
// if the query matches FINANCE and NEWS simultaneously
Set<Object> selectedTools = new LinkedHashSet<>();
matchedCategories.forEach(category ->
selectedTools.addAll(categoryTools.getOrDefault(category, List.of()))
);
log.info("Tool routing: selected {} tools: {}",
selectedTools.size(),
selectedTools.stream()
.map(t -> t.getClass().getSimpleName())
.collect(Collectors.joining(", ")));
return ToolCallbacks.from(selectedTools.toArray());
}
}
When routing works well — and when it breaks
// ✅ Routing works well:
"AAPL stock price today"
→ FINANCE → [AlphaVantageTool, TavilySearchTool] ✓
"what is vibe coding?"
→ FACTS → [WikipediaSearchTool, TavilySearchTool] ✓
"latest news about Tesla stock price"
→ NEWS + FINANCE → [NewsApiSearchTool, TavilySearchTool, AlphaVantageTool] ✓
(deduplication works: TavilySearchTool appears once)
// ❌ Routing breaks:
"tell me about a company that changed the market"
→ [] → DEFAULT → [TavilySearchTool, WikipediaSearchTool]
(no keyword matched, but Tavily is still suitable)
"weather in Kyiv and the dollar exchange rate today"
→ [] → DEFAULT
(no WEATHER or CURRENCY categories, so routing doesn't know what to do)
// With Tool RAG: embedding would find WeatherTool and CurrencyTool automatically
"what is the stock price of AAPL?"
→ FACTS + FINANCE → too many tools
(the phrase "what is" matches FACTS, but the query doesn't need Wikipedia)
// This is where routing becomes fragile
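To experiment with keyword lists without starting Spring, the routing logic can be reduced to plain strings. The sketch below is a simplified stand-in for ToolCategoryRouter under stated assumptions: KeywordRouterSketch is a hypothetical class, tools are represented by their names, and only two categories with a few English keywords are included.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Keyword routing reduced to strings: category keywords in, tool names out. */
final class KeywordRouterSketch {

    // LinkedHashMap keeps category order stable, so results are reproducible
    private static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    private static final Map<String, List<String>> TOOLS = new LinkedHashMap<>();
    static final List<String> DEFAULT_TOOLS = List.of("TavilySearchTool", "WikipediaSearchTool");

    static {
        // Keywords are lowercase because the query is lowercased before matching
        KEYWORDS.put("FINANCE", List.of("stock price", "market cap", "aapl"));
        KEYWORDS.put("NEWS", List.of("breaking news", "latest news"));
        TOOLS.put("FINANCE", List.of("AlphaVantageTool", "TavilySearchTool"));
        TOOLS.put("NEWS", List.of("NewsApiSearchTool", "TavilySearchTool"));
    }

    private KeywordRouterSketch() {}

    static List<String> route(String userQuery) {
        String query = userQuery.toLowerCase(Locale.ROOT);
        // LinkedHashSet deduplicates tools shared by several matched categories
        Set<String> selected = new LinkedHashSet<>();
        KEYWORDS.forEach((category, words) -> {
            if (words.stream().anyMatch(query::contains)) {
                selected.addAll(TOOLS.get(category));
            }
        });
        return selected.isEmpty() ? DEFAULT_TOOLS : List.copyOf(selected);
    }
}
```

Calling route("latest news about Tesla stock price") matches both categories and returns three tool names with TavilySearchTool listed once, while a query without any keyword falls back to DEFAULT_TOOLS.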
Comparison: routing vs Tool RAG
| | Keyword routing | Tool RAG |
|---|---|---|
| Overhead per request | ~0ms (CPU) | ~50-100ms (embedding) |
| Selection accuracy | Good for simple queries | High for any query |
| Code maintenance | Manual keyword updates | Update descriptions in DB |
| Multilingual support | Separate keywords for each language | Automatic via semantic search |
| Number of tools | Up to 30 | Unlimited |
| Implementation complexity | Low (one class) | Medium (DB + embedding) |
| Infrastructure | Nothing additional | pgvector + embedding model |
Three signals that it's time to switch from routing to Tool RAG:
1. The keyword list is growing — you are constantly adding new
keywords because routing misses queries. This is a sign that semantic search
will handle it better.
2. Queries often belong to 2-3 categories simultaneously —
injection becomes unpredictable, the agent receives too many tools.
3. You are adding a new interface language — keywords need to
be duplicated for each language, while Tool RAG supports multilingualism
automatically through semantic search.
Tool versioning — how to update the registry
A practical problem that tutorials rarely cover:
what to do when a tool changes? New functionality, a new description,
new limitations. Or, even more complex, you changed the embedding model
and all old vectors became incompatible.
Key principle: never delete records from the registry —
deactivate them. This preserves the change history and allows
rolling back if something goes wrong.
Four update scenarios
@Service
@RequiredArgsConstructor
@Slf4j
public class ToolVersioningService {
private final ToolRegistryService registryService;
private final CachedToolRegistryService cachedRegistryService;
private final JdbcTemplate jdbcTemplate;
private final EmbeddingModel embeddingModel;
/**
* Scenario 1: Only the description changed (most common)
* Regenerate only one embedding — other fields unchanged
*/
@Transactional
public void updateToolDescription(String toolName, String newDescription) {
// First, check if the tool exists
int exists = jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM tool_registry WHERE tool_name = ? AND is_active = TRUE",
Integer.class, toolName);
if (exists == 0) {
throw new IllegalArgumentException(
"Tool not found or inactive: " + toolName);
}
// Generate a new embedding only for the updated description
float[] newEmbedding = embeddingModel.embed(newDescription);
jdbcTemplate.update("""
UPDATE tool_registry
SET description = ?,
embedding = ?,
version = version + 1,
updated_at = NOW()
WHERE tool_name = ?
""",
newDescription, newEmbedding, toolName
);
// Always invalidate the cache after an update
cachedRegistryService.invalidateCache();
log.info("Tool updated: {} — new embedding generated, cache invalidated",
toolName);
}
/**
* Scenario 2: Tool deactivated (obsolete or removed from code)
* DO NOT delete — deactivate, preserve history
*/
@Transactional
public void deactivateTool(String toolName) {
int updated = jdbcTemplate.update("""
UPDATE tool_registry
SET is_active = FALSE,
updated_at = NOW()
WHERE tool_name = ?
""",
toolName
);
if (updated == 0) {
log.warn("Tool not found for deactivation: {}", toolName);
return;
}
cachedRegistryService.invalidateCache();
log.info("Tool deactivated: {} — removed from active registry", toolName);
}
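/**
* Scenario 3: Mass update after refactoring descriptions
* Regenerate all embeddings; may take several minutes
* for a large registry.
* Deliberately not @Transactional: each UPDATE commits on its own, so a
* failed statement for one tool does not abort the updates that follow
* (in PostgreSQL a failed statement aborts the whole transaction), and no
* transaction stays open while waiting for the embedding model.
*/
public RebuildResult rebuildAllEmbeddings() {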
log.info("Starting full tool registry rebuild...");
long startTime = System.currentTimeMillis();
List<Map<String, Object>> tools = jdbcTemplate.queryForList(
"SELECT tool_name, description FROM tool_registry WHERE is_active = TRUE"
);
int success = 0;
int failed = 0;
for (Map<String, Object> tool : tools) {
String toolName = (String) tool.get("tool_name");
String description = (String) tool.get("description");
try {
float[] newEmbedding = embeddingModel.embed(description);
jdbcTemplate.update("""
UPDATE tool_registry
SET embedding = ?,
version = version + 1,
updated_at = NOW()
WHERE tool_name = ?
""",
newEmbedding, toolName
);
success++;
} catch (Exception e) {
log.error("Failed to rebuild embedding for tool '{}': {}",
toolName, e.getMessage());
failed++;
}
}
cachedRegistryService.invalidateCache();
long elapsed = System.currentTimeMillis() - startTime;
log.info("Tool registry rebuild complete: {}/{} tools updated in {}ms",
success, tools.size(), elapsed);
return new RebuildResult(success, failed, elapsed);
}
/**
* Scenario 4: Changing the embedding model — the most complex case
* All old vectors are incompatible with the new model —
* the entire registry needs to be rebuilt and dimensions updated
*/
public void migrateEmbeddingModel(int newDimensions) {
log.warn("EMBEDDING MODEL MIGRATION STARTED — " +
"all existing vectors will be invalidated!");
// 1. Check if the new dimension differs
// (if it's the same — just rebuildAllEmbeddings)
if (newDimensions != getCurrentDimensions()) {
log.info("Updating vector dimensions: {} → {}",
getCurrentDimensions(), newDimensions);
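// 2. Change the column dimension in pgvector.
// ALTER ... TYPE does not drop data by itself: pgvector rejects the cast
// of existing vectors to a different dimension, so clear them first.
// Assumes the embedding column is nullable.
jdbcTemplate.update("UPDATE tool_registry SET embedding = NULL");
jdbcTemplate.execute(
"ALTER TABLE tool_registry ALTER COLUMN embedding TYPE vector(" +
newDimensions + ")");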
}
// 3. Regenerate all embeddings with the new model
RebuildResult result = rebuildAllEmbeddings();
log.info("Embedding model migration complete: {}", result);
}
private int getCurrentDimensions() {
// Get current dimension from the first record
return jdbcTemplate.queryForObject("""
SELECT vector_dims(embedding)
FROM tool_registry
WHERE embedding IS NOT NULL
LIMIT 1
""", Integer.class);
}
record RebuildResult(int success, int failed, long elapsedMs) {}
}
Registry validation on startup — health check
Add validation in ApplicationRunner to see discrepancies between code and registry immediately after startup:
@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryValidator implements ApplicationRunner {
private final ToolVersioningService versioningService;
private final ApplicationContext applicationContext;
private final JdbcTemplate jdbcTemplate;
@Override
public void run(ApplicationArguments args) {
log.info("Validating tool registry consistency...");
// Get all active bean names from the registry
List<String> registeredBeans = jdbcTemplate.queryForList(
"SELECT bean_name FROM tool_registry WHERE is_active = TRUE",
String.class
);
List<String> issues = new ArrayList<>();
// Check if beans exist in the Spring context
for (String beanName : registeredBeans) {
if (!applicationContext.containsBean(beanName)) {
issues.add("ORPHANED: bean '" + beanName +
"' in registry but NOT in Spring context");
}
}
// Check for tools with null embedding
List<String> noEmbedding = jdbcTemplate.queryForList("""
SELECT tool_name FROM tool_registry
WHERE is_active = TRUE AND embedding IS NULL
""", String.class);
if (!noEmbedding.isEmpty()) {
issues.add("NO EMBEDDING: tools without embedding: " + noEmbedding);
}
// Output the result
if (issues.isEmpty()) {
log.info("Tool registry OK: {} active tools, all consistent",
registeredBeans.size());
} else {
log.warn("Tool registry has {} issues:", issues.size());
issues.forEach(issue -> log.warn(" ⚠️ {}", issue));
log.warn("Run ToolVersioningService.rebuildAllEmbeddings() " +
"or deactivateTool() to fix");
}
}
}
Pitfall — cache after update:
each registry update method calls
cachedRegistryService.invalidateCache().
If you forget about this — the agent will continue to use
old search results for another 5 minutes (cache TTL).
This is especially critical when deactivating a tool: the agent might try
to call a deactivated tool if it's still in the cache.
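To see why the invalidation call matters, here is a minimal in-memory sketch of a TTL cache. `TtlCache`, its injected clock, and the loader function are assumptions made for illustration; the actual `CachedToolRegistryService` may be built differently.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache for tool search results (query -> tool names).
final class TtlCache {
    private record Entry(List<String> tools, long storedAtMs) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    TtlCache(long ttlMs, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    // Returns the cached result if it is younger than the TTL, otherwise reloads it.
    List<String> get(String query, Function<String, List<String>> loader) {
        long now = clock.getAsLong();
        Entry cached = entries.get(query);
        if (cached != null && now - cached.storedAtMs() < ttlMs) {
            return cached.tools();
        }
        List<String> fresh = List.copyOf(loader.apply(query));
        entries.put(query, new Entry(fresh, now));
        return fresh;
    }

    // Must be called after every registry update or deactivation.
    void invalidateCache() {
        entries.clear();
    }
}
```

With a 5-minute TTL (`new TtlCache(300_000, System::currentTimeMillis)`), a deactivated tool keeps coming back from `get` until either the entry expires or `invalidateCache()` runs, which is exactly the window described above.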
Pitfall — embedding model migration:
this is an irreversible operation that deletes all old vectors.
Always back up the table before migration:
CREATE TABLE tool_registry_backup AS SELECT * FROM tool_registry;
Registry Monitoring and Metrics
The tool registry is a living component. Without monitoring, you won't know
which tools are actually being used, which can be removed,
and which descriptions are worth rewriting. Most importantly, you won't know
when Tool RAG starts to fail.
Analytics Table Schema
-- Logging every tool selection with Tool RAG
CREATE TABLE tool_usage_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tool_name VARCHAR(200) NOT NULL,
query_snippet VARCHAR(500), -- first 500 characters of the query
relevance_score DOUBLE PRECISION,
was_called BOOLEAN DEFAULT FALSE, -- whether LLM actually called the tool after injection
session_id VARCHAR(100), -- session ID for correlation
conversation_id BIGINT, -- for Agent Chat
created_at TIMESTAMP DEFAULT NOW()
);
-- Index for fast queries by time and tool_name
CREATE INDEX tool_usage_log_time_idx ON tool_usage_log(created_at DESC);
CREATE INDEX tool_usage_log_tool_idx ON tool_usage_log(tool_name, created_at DESC);
-- Aggregated statistics for the last 30 days
CREATE VIEW tool_usage_stats AS
SELECT
tool_name,
COUNT(*) as injected_count,
SUM(CASE WHEN was_called THEN 1 ELSE 0 END) as called_count,
ROUND(
SUM(CASE WHEN was_called THEN 1 ELSE 0 END)::numeric /
NULLIF(COUNT(*), 0) * 100, 1
) as call_rate_pct,
ROUND(AVG(relevance_score)::numeric, 3) as avg_relevance,
ROUND(MIN(relevance_score)::numeric, 3) as min_relevance,
MAX(created_at) as last_injected,
MAX(CASE WHEN was_called THEN created_at END) as last_called
FROM tool_usage_log
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tool_name
ORDER BY called_count DESC;
ToolUsageMonitor — Monitoring Service
@Service
@RequiredArgsConstructor
@Slf4j
public class ToolUsageMonitor {
private final JdbcTemplate jdbcTemplate;
/**
* Batch insert — one query instead of N separate ones
*/
public void logInjection(List<ToolMatch> injectedTools,
String query,
String sessionId,
Long conversationId) {
if (injectedTools.isEmpty()) return;
String querySnippet = query.length() > 500
? query.substring(0, 500) : query;
// Batch insert for all tools at once
jdbcTemplate.batchUpdate("""
INSERT INTO tool_usage_log
(tool_name, query_snippet, relevance_score,
session_id, conversation_id)
VALUES (?, ?, ?, ?, ?)
""",
injectedTools,
injectedTools.size(),
(ps, tool) -> {
ps.setString(1, tool.getToolName());
ps.setString(2, querySnippet);
ps.setDouble(3, tool.getRelevanceScore());
ps.setString(4, sessionId);
ps.setObject(5, conversationId);
}
);
}
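/**
* Update was_called=TRUE after actual tool call.
* Use session_id instead of time: more reliable.
* Only the most recent unmarked injection is updated, so a tool injected
* several times in one session is not counted as called every time.
*/
public void markToolCalled(String toolName, String sessionId) {
int updated = jdbcTemplate.update("""
UPDATE tool_usage_log
SET was_called = TRUE
WHERE id = (
SELECT id FROM tool_usage_log
WHERE tool_name = ?
AND session_id = ?
AND was_called = FALSE
ORDER BY created_at DESC
LIMIT 1
)
""",
toolName, sessionId
);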
if (updated == 0) {
log.warn("markToolCalled: no record found for tool='{}' session='{}'",
toolName, sessionId);
}
}
/**
* "Dead" tools — injected frequently but rarely called.
* callRateThreshold: 0.10 = less than 10% of calls after injection
*/
public List<DeadToolReport> findDeadTools(double callRateThreshold) {
return jdbcTemplate.query("""
SELECT tool_name, injected_count, called_count,
call_rate_pct, avg_relevance
FROM tool_usage_stats
WHERE injected_count >= 10
AND call_rate_pct < ?
ORDER BY injected_count DESC
""",
(rs, rowNum) -> new DeadToolReport(
rs.getString("tool_name"),
rs.getInt("injected_count"),
rs.getInt("called_count"),
rs.getDouble("call_rate_pct"),
rs.getDouble("avg_relevance")
),
callRateThreshold * 100 // convert 0.10 → 10 for comparison with call_rate_pct
);
}
/**
* Tools with no injections logged in the last 30 days
* (active registry entries without any tool_usage_log rows in that window)
*/
public List<String> findUnusedTools() {
return jdbcTemplate.queryForList("""
SELECT tr.tool_name
FROM tool_registry tr
LEFT JOIN tool_usage_log tul
ON tr.tool_name = tul.tool_name
AND tul.created_at > NOW() - INTERVAL '30 days'
WHERE tr.is_active = TRUE
AND tul.tool_name IS NULL
ORDER BY tr.tool_name
""",
String.class
);
}
/**
* Tools with consistently low relevance score —
* a signal that the description poorly matches actual queries
*/
public List<String> findLowRelevanceTools(double scoreThreshold) {
return jdbcTemplate.queryForList("""
SELECT tool_name
FROM tool_usage_stats
WHERE injected_count >= 5
AND avg_relevance < ?
ORDER BY avg_relevance ASC
""",
String.class,
scoreThreshold // e.g., 0.65
);
}
/**
* Full registry health report
*/
public RegistryHealthReport generateHealthReport() {
List<DeadToolReport> deadTools = findDeadTools(0.10);
List<String> unusedTools = findUnusedTools();
List<String> lowRelevanceTools = findLowRelevanceTools(0.65);
int totalActive = jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM tool_registry WHERE is_active = TRUE",
Integer.class);
RegistryHealthReport report = new RegistryHealthReport(
totalActive, deadTools, unusedTools, lowRelevanceTools);
if (report.hasIssues()) {
log.warn("Tool registry health report:\n{}", report.summary());
} else {
log.info("Tool registry healthy: {} active tools, no issues", totalActive);
}
return report;
}
record DeadToolReport(
String toolName,
int injectedCount,
int calledCount,
double callRatePct,
double avgRelevance
) {}
record RegistryHealthReport(
int totalActiveTools,
List<DeadToolReport> deadTools,
List<String> unusedTools,
List<String> lowRelevanceTools
) {
boolean hasIssues() {
return !deadTools.isEmpty()
|| !unusedTools.isEmpty()
|| !lowRelevanceTools.isEmpty();
}
String summary() {
return String.format("""
Active tools: %d
Dead tools (low call rate): %s
Unused tools (30+ days): %s
Low relevance tools: %s
""",
totalActiveTools,
deadTools.stream().map(DeadToolReport::toolName).toList(),
unusedTools,
lowRelevanceTools
);
}
}
}
Scheduled Monitoring — Automatic Report
@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryHealthScheduler {
private final ToolUsageMonitor monitor;
private final ToolVersioningService versioningService;
/**
* Weekly health check — every Monday at 9:00 AM
*/
@Scheduled(cron = "0 0 9 * * MON")
public void weeklyHealthCheck() {
log.info("=== Weekly Tool Registry Health Check ===");
ToolUsageMonitor.RegistryHealthReport report = monitor.generateHealthReport();
if (report.hasIssues()) {
// In production: send to Slack/email/Grafana
// notificationService.sendAlert("Tool Registry Issues", report.summary());
log.warn("Action required — review tool registry");
}
}
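/**
* Daily consistency check, every day at 8:00 AM
*/
@Scheduled(cron = "0 0 8 * * *")
public void dailyConsistencyCheck() {
// Check for tools without embeddings
// (may appear after a failure during registerTool).
// findToolsWithoutEmbeddings() and rebuildEmbeddingsForTools() are not part of
// the ToolVersioningService listing above; they need to be added there.
List<String> noEmbedding = versioningService.findToolsWithoutEmbeddings();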
if (!noEmbedding.isEmpty()) {
log.warn("Tools without embeddings found: {} — rebuilding",
noEmbedding);
versioningService.rebuildEmbeddingsForTools(noEmbedding);
}
}
}
What to Do with Monitoring Results
| Symptom | Metric | Cause | Action |
|---|---|---|---|
| Tool injected frequently, called rarely | call_rate_pct < 10% | Description too broad | Add anti-use-cases to description, narrow down triggers |
| Tool not appearing in injection | injected_count = 0 | Description does not match user queries | Rewrite description using language from logs |
| Tool not used for 30+ days | last_called IS NULL | Tool is outdated or covered by another tool | Deactivate using deactivateTool() |
| avg_relevance consistently low | avg_relevance < 0.65 | Weak semantic match | Enrich description with synonyms and query examples |
| call_rate_pct = 100% | called_count = injected_count | Tool injected too rarely; threshold is too high | Lower MIN_RELEVANCE or enrich description |
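The table can also be read as a small rule set. The sketch below is one possible encoding, not part of `ToolUsageMonitor`: `ToolStats`, `ToolDiagnosis`, and the `Action` names are hypothetical. The thresholds (10% call rate, 0.65 relevance, at least 10 injections) are the ones used in this section, and `calledCount == 0` stands in for `last_called IS NULL` within the 30-day window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 30-day stats for one tool, mirroring columns of tool_usage_stats.
record ToolStats(String toolName, int injectedCount, int calledCount, double avgRelevance) {
    // Call rate in percent; 0 when the tool was never injected.
    double callRatePct() {
        return injectedCount == 0 ? 0.0 : 100.0 * calledCount / injectedCount;
    }
}

final class ToolDiagnosis {
    enum Action {
        REWRITE_DESCRIPTION,    // never injected
        NARROW_DESCRIPTION,     // injected often, called rarely
        CONSIDER_DEACTIVATION,  // injected but never called in the window
        ENRICH_DESCRIPTION,     // low average relevance
        LOWER_THRESHOLD         // called every single time it was injected
    }

    static final double MIN_CALL_RATE_PCT = 10.0;
    static final double MIN_AVG_RELEVANCE = 0.65;
    static final int MIN_SAMPLE = 10; // same floor as findDeadTools

    // Maps one row of statistics to the actions suggested in the table.
    static List<Action> diagnose(ToolStats s) {
        List<Action> actions = new ArrayList<>();
        if (s.injectedCount() == 0) {
            actions.add(Action.REWRITE_DESCRIPTION);
            return actions; // the other metrics are meaningless without injections
        }
        if (s.injectedCount() >= MIN_SAMPLE && s.callRatePct() < MIN_CALL_RATE_PCT) {
            actions.add(Action.NARROW_DESCRIPTION);
        }
        if (s.calledCount() == 0) {
            actions.add(Action.CONSIDER_DEACTIVATION);
        }
        if (s.avgRelevance() < MIN_AVG_RELEVANCE) {
            actions.add(Action.ENRICH_DESCRIPTION);
        }
        if (s.calledCount() == s.injectedCount()) {
            actions.add(Action.LOWER_THRESHOLD);
        }
        return actions;
    }
}
```

A tool can match several rows at once, which is why `diagnose` returns a list and not a single verdict.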
Practical Tip: the first week after implementing
Tool RAG — log all query_snippets for injections with a low score (<0.70).
These queries show where semantic search fails to find the correct tool.
Add these formulations directly to the description as examples —
and search accuracy will increase without any code changes.
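One way to run that first-week review is to group the low-score snippets by tool, so each description can be compared with the queries it barely matched. This is a sketch over in-memory rows: `InjectionLog` and `LowScoreReview` are hypothetical names, and in practice the rows would come from `tool_usage_log`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory view of one tool_usage_log row.
record InjectionLog(String toolName, String querySnippet, double relevanceScore) {}

final class LowScoreReview {

    // Groups query snippets scored below the threshold by tool name (sorted by name).
    static Map<String, List<String>> snippetsBelow(List<InjectionLog> logs, double threshold) {
        return logs.stream()
                .filter(l -> l.relevanceScore() < threshold)
                .collect(Collectors.groupingBy(
                        InjectionLog::toolName,
                        TreeMap::new,
                        Collectors.mapping(InjectionLog::querySnippet, Collectors.toList())));
    }
}
```

Calling `snippetsBelow(logs, 0.70)` gives, per tool, the exact phrasings worth adding to its description as examples.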
Comparison of Approaches
The three approaches are not mutually exclusive — in production,
a combination is often used: routing for fast
initial filtering and Tool RAG for precise selection within a category.
| Approach | Optimal for | Latency overhead | Selection accuracy | Infrastructure | Complexity |
|---|---|---|---|---|---|
| All tools in prompt | up to 10 tools | 0ms | 78-95% | None | Minimal |
| Categories + keyword routing | 10-30 tools, thematically diverse | ~1ms (CPU) | 75-90% | None | Low |
| Tool RAG (vector search) | 30+ tools, semantically similar | ~50-150ms | 85-95% | pgvector + embedding model | Medium |
| Tool RAG + caching | 30+ tools, repetitive queries | ~10-30ms | 85-95% | pgvector + Redis/ConcurrentHashMap | Medium+ |
| Hybrid: routing → Tool RAG | 50+ tools, mixed categories | ~20-80ms | 90-97% | pgvector + categories in DB | High |
Hybrid Approach — How It Works
For large registries (50+ tools), the most effective combination is:
User Query
↓
[1] Keyword routing → determine category (FINANCE / NEWS / RESEARCH)
~1ms, CPU operation
↓
[2] Tool RAG within the category only
Search among 10-15 tools instead of 50+
~30-50ms instead of 100-150ms
↓
[3] Inject Top-3 tools from the category
Higher accuracy because the search is in a smaller space
// Example:
// Query: "what is the stock price of AAPL and what is the news about Apple?"
// Routing: FINANCE + NEWS → search only within 15 financial and 10 news tools
// Tool RAG: AlphaVantageTool (0.91) + NewsApiTool (0.87) + TavilySearchTool (0.74)
// Inject: 3 tools instead of 50
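Steps 2 and 3 can be sketched without any infrastructure. The code below assumes the routing step has already produced a set of categories and that embeddings are precomputed; `ToolEntry`, `ScoredTool`, and `HybridSelector` are hypothetical names. In the real setup the ranking would be a pgvector query with a category filter, not an in-memory loop.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical registry entry: name, category, and a precomputed embedding.
record ToolEntry(String name, String category, double[] embedding) {}

// A tool together with its similarity to the query.
record ScoredTool(String name, double score) {}

final class HybridSelector {

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Ranks only the tools in the routed categories and keeps the top k.
    static List<ScoredTool> select(List<ToolEntry> registry,
                                   Set<String> routedCategories,
                                   double[] queryEmbedding,
                                   int topK) {
        return registry.stream()
                .filter(t -> routedCategories.contains(t.category()))
                .map(t -> new ScoredTool(t.name(), cosine(queryEmbedding, t.embedding())))
                .sorted(Comparator.comparingDouble(ScoredTool::score).reversed())
                .limit(topK)
                .toList();
    }
}
```

The category filter runs before any similarity is computed, which is where both the latency and the accuracy gain of the hybrid approach come from: the search space shrinks before ranking starts.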
Which Strategy to Choose — Quick Decision
tools <= 10 → All tools in prompt. Don't overcomplicate.
tools 10-30 → Categories + routing. One class, no infrastructure.
tools 30-50 → Tool RAG with pgvector. If you have RAG — it's easy to add.
tools 50+ → Hybrid: routing → Tool RAG. Maximum accuracy.
tools 100+ → Hybrid is mandatory. With all tools in the prompt, accuracy is < 14%.
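The same rule of thumb as a function, for example to log a warning when the registry outgrows its current strategy. `ToolStrategy` and `StrategyChooser` are hypothetical names, and the exact boundaries (10 and 30 inclusive, hybrid from 50) are an assumption, since the ranges above overlap at their edges.

```java
// Hypothetical names for the four strategies from the list above.
enum ToolStrategy { ALL_IN_PROMPT, CATEGORY_ROUTING, TOOL_RAG, HYBRID }

final class StrategyChooser {

    // Picks a strategy from the number of active tools in the registry.
    static ToolStrategy choose(int toolCount) {
        if (toolCount < 0) {
            throw new IllegalArgumentException("toolCount must be >= 0");
        }
        if (toolCount <= 10) return ToolStrategy.ALL_IN_PROMPT;
        if (toolCount <= 30) return ToolStrategy.CATEGORY_ROUTING;
        if (toolCount < 50) return ToolStrategy.TOOL_RAG;
        return ToolStrategy.HYBRID;
    }
}
```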
Real numbers from the RAG-MCP paper (2025):
with 100+ tools, the baseline with every tool in the prompt yields **13.62% accuracy**,
so the agent picks the wrong tool in most cases. Tool RAG yields **43.13%**,
more than three times better. Prompt size is reduced by over 50%.
For comparison in tokens: 100 tools × ~200 tokens = 20,000 tokens
for descriptions in each query. Tool RAG injects 3 tools → 600 tokens.
Savings: **97% of tokens on tools** with higher selection accuracy.
If you pay for tokens — Tool RAG pays for itself very quickly.
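The arithmetic above is easy to rerun with your own numbers. A small sketch, where `ToolTokenBudget` is a hypothetical helper and the figure of roughly 200 tokens per description is the estimate used in this section:

```java
// Hypothetical helper that reproduces the token arithmetic from the text.
final class ToolTokenBudget {

    // Tokens spent on tool descriptions in one request.
    static int descriptionTokens(int toolCount, int tokensPerTool) {
        return toolCount * tokensPerTool;
    }

    // Percentage of description tokens saved by injecting `injected` of `total` tools.
    static double savingsPct(int total, int injected, int tokensPerTool) {
        int before = descriptionTokens(total, tokensPerTool);
        int after = descriptionTokens(injected, tokensPerTool);
        return 100.0 * (before - after) / before;
    }
}
```

`descriptionTokens(100, 200)` is 20,000, `descriptionTokens(3, 200)` is 600, and `savingsPct(100, 3, 200)` is 97.0, matching the figures above.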
Conclusions
When I first ran into the tool scaling problem in Agent Chat,
I didn't understand why the agent started "stuttering" as new tools were added.
The code was correct and the descriptions were written, but selection kept getting worse.
It turned out to be not a bug but an architectural problem, one that Tool RAG solves.
Good news: if you already have pgvector and Spring AI —
adding Tool RAG takes a day of work, not a week.
It's the same infrastructure as for documents,
just applied to tool descriptions.
What I learned from implementation:
- Up to 10 tools — good description and system prompt.
Tool RAG is not needed. Don't overcomplicate prematurely
- 10-30 tools — categories and keyword routing.
One class, no infrastructure, solves 80% of scaling problems
- 30+ tools — Tool RAG is mandatory.
Without it, accuracy degrades non-linearly — and you won't notice it immediately
- "Prompt bloat" — a real and insidious problem —
the agent degrades gradually with each new tool.
With 100+ tools, accuracy drops to 13% — the agent is effectively choosing randomly
- Monitoring is mandatory from day one —
without logs, you don't know which tools are actually called,
which are injected but ignored, and which can be safely deactivated
- Description is not documentation but a prompt —
the more accurately the description matches actual user queries,
the higher the relevance score and the less often re-query is needed.
The first week after implementation — analyze logs and enrich descriptions
Next step in the series: Tool RAG solves the problem
of tool selection. But there's another problem — the agent "forgets"
context between sessions. It answers a question, the next day the same
user asks again — and the agent doesn't remember the previous conversation.
About four types of agent memory and when to use each —
→ Agent AI Memory —
in-context, RAG, episodic, and semantic.
Read also in the series:
→ Tool Use vs Function Calling — basic mechanics before scaling.
→ How LLM Decides When to Call a Tool — how to write descriptions so the model chooses correctly.
→ Grounding and Trust in Sources — what to do after a tool responds.
Sources:
RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection (2025),
vLLM Semantic Tool Selection (2025),
Red Hat: Tool RAG — The Next Breakthrough (2026),
BiasBusters: Tool Selection Bias in LLMs (2025),
RAGFlow: From RAG to Context — 2025 Review,
Spring AI Documentation