Tool RAG: What to Do When Your AI Agent Has Too Many Tools

You have 5 tools — everything is fine. You have 15 tools — problems begin. You have 50 tools — the agent degrades. But there is a solution that elegantly solves the scaling problem — and you already know how it works because you use it for documents.

This article is part of a series about AI agents on Spring Boot. If you haven't read about the mechanics of tool selection yet, start with How LLM Decides When to Call a Tool. For what to do after the tool has responded, see Grounding and Trust in Sources.

When Tool RAG is Needed — Decision Tree

Before reading further, determine whether you need this article at all. 90% of projects do not require Tool RAG. But everyone should know about it, because the scaling problem creeps up unnoticed.

How many tools does your agent have?
          ↓
       up to 10
          ↓
Good description + system prompt
→ Sufficient. Tool RAG is not needed.
→ Read about writing descriptions.

      from 10 to 20
          ↓
Are the tools thematically different?
├── YES → Tool categories and routing (section 7)
│         Simpler than Tool RAG, solves the problem
└── NO → Improve descriptions, delineate responsibilities

      from 20 to 50
          ↓
Do you see selection degradation in logs?
├── YES → Tool RAG (this article)
└── NO → Categories + routing, monitoring (section 9)

      50+
          ↓
Tool RAG is mandatory.
Without it, selection accuracy can collapse to ~13% (the RAG-MCP baseline below).
Important: Tool RAG is not a silver bullet and not a must-have for every project. It is a tool for a specific problem. If you have fewer than 20 tools — stop at section 7 (categories and routing) and return to this article when the registry grows.

The Scaling Problem: Numbers That Will Surprise You

You add tools gradually. First 5, then 10, then 20. Each new tool solves a real task. But at some point the agent's performance starts to decline — and you don't understand why. There are no errors in the code. Descriptions are written correctly. But the agent increasingly chooses the wrong tool or doesn't call any.

This is not a problem with your code. This is a systemic problem that researchers have called "choice paralysis" — and it is confirmed by several independent studies from 2025-2026.

What the 2025-2026 Studies Show

RAG-MCP (May 2025), a study on real MCP servers, showed catastrophic non-linear degradation:

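| Number of tools | Tokens for descriptions | Selection accuracy (baseline) | Accuracy with Tool RAG |
|-----------------|-------------------------|-------------------------------|------------------------|
| ~10             | ~2K                     | 78%                           | ~90%                   |
| ~50             | 8K                      | 84-95%                        | ~95%                   |
| ~200            | 32K                     | 41-83%                        | ~85%                   |
| 100+            | ~20K                    | 13.62%                        | 43.13%                 |
| ~740            | 120K                    | 0-20%                         | ~60%                   |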

Key result: Tool RAG more than tripled the accuracy (from 13.62% to 43.13%) with a large tool registry and reduced the prompt size by more than 50%.

Note the non-linearity of the degradation. With 50 tools it is still acceptable: 84-95%. With 200 tools it is already critical, dropping as low as 41%. With 740 tools the agent is essentially choosing at random: 0-20%. This is not gradual deterioration; it is a cliff.

What Degradation Looks Like in Logs

If you log tool calls in Agent Chat or AskYourDocs, this is what the problem looks like in practice:

// Agent with 5 tools — normal behavior:
INFO: Round 1 AGENT_A — Tavily search: 'vibe coding productivity'
INFO: Tavily found 3 results
INFO: Round 1 AGENT_A reply: "GitHub Copilot increases productivity by 51%..."

// The same agent but with 30 tools — degradation:
INFO: Round 1 AGENT_A — [no tool call]
INFO: Round 1 AGENT_A reply: "According to general data, vibe coding..."
// stop_reason: "end_turn" without any tool_use
// The agent "decided" to reply from memory because it got lost in choosing tools

// Or another variant of degradation:
INFO: Round 1 AGENT_A — Wikipedia search: 'productivity'
// Chose Wikipedia instead of Tavily — although Tavily is more suitable for current statistics
// With 30 tools, the model doesn't distinguish subtle differences between descriptions
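One simple signal to watch for this kind of degradation is the share of rounds in which the agent made no tool call at all. A minimal sketch, assuming log lines in the format shown above (one decision line per round that either names a tool or carries the "[no tool call]" marker, plus separate "reply:" lines); the class and method names are hypothetical:

```java
import java.util.List;

/**
 * Hypothetical helper: share of agent rounds that ended without a tool call,
 * computed from log lines like the ones above.
 */
class ToolCallLogStats {

    /**
     * A "decision line" mentions "Round " and is not the agent's reply.
     * It counts as a miss when it carries the "[no tool call]" marker.
     * Returns a value between 0.0 and 1.0 (0.0 if there are no decision lines).
     */
    static double noToolCallRate(List<String> logLines) {
        long decisions = 0;
        long misses = 0;
        for (String line : logLines) {
            boolean isDecision = line.contains("Round ") && !line.contains(" reply:");
            if (!isDecision) {
                continue;   // e.g. "Tavily found 3 results" or the reply itself
            }
            decisions++;
            if (line.contains("[no tool call]")) {
                misses++;
            }
        }
        return decisions == 0 ? 0.0 : (double) misses / decisions;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "INFO: Round 1 AGENT_A - Tavily search: 'vibe coding productivity'",
            "INFO: Tavily found 3 results",
            "INFO: Round 1 AGENT_A reply: \"GitHub Copilot increases productivity...\"",
            "INFO: Round 2 AGENT_A - [no tool call]",
            "INFO: Round 2 AGENT_A reply: \"According to general data...\"");
        System.out.println("no-tool-call rate: " + noToolCallRate(lines));
    }
}
```

Tracked over time, a rising rate after new tools are added can be an early hint that the registry has outgrown a static tool list. The second degradation variant (a wrong tool chosen) is not visible this way and needs labeled examples to detect.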

The "Lost in the Middle" Effect for Tools

A separate and less well-known problem is positional bias. The BiasBusters study (2025) showed that tools in the middle of a long list are chosen significantly less often than tools at the beginning or end. With 741 tools:

  • Tools at the beginning and end of the list — 31-32% accuracy
  • Tools in the middle (positions 40-60%) — only 22-52% accuracy

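Why this happens technically: many transformer models use Rotary Position Embedding (RoPE), whose "long-term decay" property weakens attention between distant tokens. Together with the strong attention models give to the first tokens of the context, this is a commonly cited explanation for why the beginning and end of a long context receive more attention than the middle. The bias comes from model architecture and training, so it shows up across providers rather than being a quirk of one vendor.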

Practical consequence: if the best tool for a query happens to land in the middle of a list of 50+ tools, the model is significantly less likely to choose it than if it were first. Tool RAG mitigates this automatically: it injects only 3-5 tools, so the list is too short for positional bias to matter much.

Another Reason for Degradation: Tokens

Each tool description takes up tokens. Fifty tools with detailed descriptions take 5,000-15,000 tokens for the descriptions alone, before the conversation context and history even begin. The Modarressi et al. study (2025) showed:

  • An increase in context by 1,000 tokens → a decrease in accuracy by 16 percentage points
  • Exceeding 8,000 tokens → a drop to 50 p.p.

How tokens affect the quality and cost of responses is covered in detail in the article LLM Context Window: Why AI Forgets and How Much It Costs, which also gives specific cost figures for different providers.

// Token math for Agent Chat with 5 tools (current state):
// 5 tools × ~200 tokens = 1,000 tokens — acceptable ✅

// If scaled to 30 tools:
// 30 tools × ~200 tokens = 6,000 tokens — degradation begins ⚠️

// 50 tools:
// 50 tools × ~200 tokens = 10,000 tokens — significant degradation ❌

// Tool RAG — regardless of registry size:
// inject 3 tools × ~200 tokens = 600 tokens — always acceptable ✅
// even if there are 500 tools in the registry
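To keep an eye on this budget in code, a rough estimate is usually enough. A minimal sketch, assuming ~4 characters per token (a common rule of thumb for English text, not a real tokenizer; providers also add their own overhead when serializing tool schemas). The class name is hypothetical, and the 6,000-token warning level is simply the "degradation begins" figure from the math above:

```java
import java.util.Collections;
import java.util.List;

/**
 * Rough token estimate for the tool definitions sent with one request.
 * Assumption: ~4 characters per token; the real count depends on the
 * provider's tokenizer and tool-schema format.
 */
class ToolTokenBudget {

    static final int CHARS_PER_TOKEN = 4;           // heuristic, not a tokenizer
    static final int WARN_THRESHOLD_TOKENS = 6_000; // "degradation begins" level from the math above

    static int estimateTokens(List<String> toolDefinitions) {
        int chars = 0;
        for (String definition : toolDefinitions) {
            chars += definition.length();
        }
        // Round up so that a partial token still counts
        return (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    /** Share of tool tokens saved by injecting `injected` of `total` equally sized tools. */
    static double savings(int total, int injected) {
        return 1.0 - (double) injected / total;
    }

    public static void main(String[] args) {
        String definition = "x".repeat(800);   // stand-in for one ~200-token tool definition
        int all = estimateTokens(Collections.nCopies(30, definition));
        int injected = estimateTokens(Collections.nCopies(3, definition));
        System.out.println("30 tools: ~" + all + " tokens, warn=" + (all >= WARN_THRESHOLD_TOKENS));
        System.out.println("3 tools:  ~" + injected + " tokens");
        System.out.println("saved: " + Math.round(savings(30, 3) * 100) + "%");
    }
}
```

With 800-character stand-ins, the estimate reproduces the ~6,000 and ~600 token figures above. In a real service you would feed it the actual name + description + JSON schema of each tool.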
"Prompt Bloat" and the Death of MCP That Wasn't: in late 2025, articles appeared with headlines "MCP is Dead After Just One Year". InfiniFlow (December 2025) analyzed the situation precisely: the problem is not with the MCP protocol — the problem is with the approach of "loading all tool descriptions into context at once". With 4,400+ MCP servers on mcp.so (April 2025) and hundreds of tools in enterprise systems — "choice paralysis" becomes inevitable. Tool RAG solves exactly this problem without changing the protocol.

Tool RAG Concept — The Same Idea as RAG for Documents

If you've built a RAG system — you'll understand Tool RAG in a minute. If not — here's an analogy: imagine a library with 10,000 books. When you need an answer to a question — you don't read all the books in order. You go to the catalog, find 3-5 relevant books, and read only them.

Classic RAG does the same with documents. Tool RAG does the same with agent tools.

Comparison: What Changes in the Prompt

The most illustrative way to understand Tool RAG is to see what the prompt looks like before and after:

// ❌ WITHOUT Tool RAG — all 30 tools in every request:
{
  "tools": [
    { "name": "searchWikipedia", "description": "Searches Wikipedia..." },
    { "name": "searchWeb", "description": "Searches the internet..." },
    { "name": "getStockPrice", "description": "Gets stock price..." },
    { "name": "searchNews", "description": "Searches for news..." },
    { "name": "searchPapers", "description": "Searches for scientific papers..." },
    { "name": "getWeather", "description": "Gets weather..." },
    { "name": "translateText", "description": "Translates text..." },
    { "name": "summarizeDoc", "description": "Summarizes document..." },
    // ... 22 more tools
  ],
  "messages": [{ "role": "user", "content": "what is the price of AAPL stock?" }]
}
// Prompt size: ~8,000 tokens just for tools
// The model sees 30 options and can get lost

// ✅ WITH Tool RAG — only 2 relevant tools:
{
  "tools": [
    { "name": "getStockPrice", "description": "Gets stock price..." },
    { "name": "searchWeb", "description": "Searches for current financial data..." }
  ],
  "messages": [{ "role": "user", "content": "what is the price of AAPL stock?" }]
}
// Prompt size: ~400 tokens for tools
// The model sees 2 obvious options — selection is accurate

Analogy Table: RAG for Documents vs. Tool RAG

|                                 | Classic RAG                             | Tool RAG                                    |
|---------------------------------|-----------------------------------------|---------------------------------------------|
| What is stored in the vector DB | Document chunks                         | Tool descriptions                           |
| What we index (embed)           | Document text                           | Description + trigger scenarios for the tool |
| What we search for              | Relevant text fragments                 | Relevant tools                              |
| What we inject into the LLM     | Top-K fragments as context              | Top-K tools as available instruments        |
| Technology                      | pgvector, Qdrant                        | The same pgvector, Qdrant                   |
| Embedding model                 | text-embedding-3-small                  | The same model                              |
| What it solves                  | Hallucinations due to lack of knowledge | Degradation due to excess choice            |

Key advantage: if you already have a RAG infrastructure — Tool RAG is added to it with minimal effort. The same pgvector, the same embedding model, the same approach. If you use AskYourDocs or any other RAG system on pgvector — Tool RAG is essentially another table in the same DB.

If you haven't built a RAG system yet, I recommend getting familiar with the basic concept before implementing Tool RAG. How RAG works internally and how to build it on Spring AI + pgvector is covered in RAG with Ollama: From Pipeline to Production; Tool RAG will be clear immediately after that.

Tool RAG flow: from query to injection

The entire Tool RAG process consists of seven steps. The first four happen before the LLM receives the query: this is the main difference from the classic approach, where every tool description is sent with every request.

User query: "find the current price of AAPL stock"
          ↓
[1] Query Embedding
    embeddingModel.embed("find the current price of AAPL stock")
    → vector[1536]
    // Convert the query into a numerical vector.
    // The same embedding model used for documents in RAG.
    // Important: embedding models for tool descriptions and for queries
    // must be the same — otherwise, vector search will not work correctly.
          ↓
[2] Vector Search on Tool Descriptions Registry
    SELECT tool_name, bean_name, 1 - (embedding <=> query_vector) as score
    FROM tool_registry
    WHERE is_active = TRUE
    ORDER BY embedding <=> query_vector
    LIMIT 5
    → [AlphaVantageTool: 0.91, TavilySearchTool: 0.73,
       NewsApiTool: 0.61, WikipediaSearchTool: 0.44, ArxivTool: 0.31]
    // pgvector returns the top 5 tools (LIMIT 5) sorted by relevance.
    // Even WikipediaSearchTool made it into the results —
    // but with a low score of 0.44. We'll filter it in the next step.
          ↓
[3] Relevance Threshold Filtering
    MIN_RELEVANCE_THRESHOLD = 0.65
    → keep: [AlphaVantageTool: 0.91, TavilySearchTool: 0.73]
    // Discard tools with a score below the threshold.
    // This is a critical step: without it, an irrelevant tool might be injected.
    // NewsApiTool (0.61) is borderline: it would pass with a threshold of 0.60.
          ↓
[4] Load Spring beans for found tools
    List<ToolCallback> tools = loadTools(["alphaVantageTool", "tavilySearchTool"])
    // Load actual Spring beans by bean_name stored in the registry.
    // Not strings — real objects with the @Tool annotation.
          ↓
[5] LLM query with 2 tools instead of 30+
    agentChatModel.call(prompt, tools)
    // The model sees only 2 relevant tools.
    // ~400 tokens for descriptions instead of 6,000-15,000.
    // The choice is obvious — AlphaVantageTool for stock prices.
          ↓
[6] LLM calls AlphaVantageTool
    → getStockPrice("AAPL")
          ↓
[7] Response to user
    "AAPL Stock: $213.50 | Change: +1.2% | High: $215.20 | Low: $212.80"

Instead of passing 30+ tool descriptions (~6,000 tokens), we pass the 2 most relevant ones (~400 tokens). Token savings: 93%. Selection accuracy: significantly higher. Latency: +50-100ms for the query embedding, which the faster processing of a much shorter prompt can partly or fully offset.
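Steps [2] and [3] can be reproduced in memory to see how top-K and the threshold interact. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings (the vectors and the class name are made up for illustration); the cosine similarity computed here is the same quantity pgvector returns as 1 - (embedding <=> query):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * In-memory sketch of steps [2] and [3]: rank tools by cosine similarity
 * to the query vector, keep the top K, then drop those below the threshold.
 */
class InMemoryToolSelector {

    record Scored(String tool, double score) {}

    /** Cosine similarity; assumes non-zero vectors of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Scored> select(Map<String, float[]> registry, float[] query,
                               int topK, double minRelevance) {
        return registry.entrySet().stream()
            .map(e -> new Scored(e.getKey(), cosine(e.getValue(), query)))
            .sorted(Comparator.comparingDouble(Scored::score).reversed()) // [2] ORDER BY distance
            .limit(topK)                                                 // [2] LIMIT k
            .filter(s -> s.score() >= minRelevance)                      // [3] threshold
            .toList();
    }

    public static void main(String[] args) {
        Map<String, float[]> registry = Map.of(
            "AlphaVantageTool",    new float[] {0.9f, 0.1f, 0.0f},
            "TavilySearchTool",    new float[] {0.6f, 0.6f, 0.1f},
            "WikipediaSearchTool", new float[] {0.1f, 0.2f, 0.9f});
        float[] query = {1.0f, 0.2f, 0.0f};   // toy stand-in for embed("current price of AAPL")
        select(registry, query, 3, 0.65).forEach(System.out::println);
    }
}
```

With these toy vectors, AlphaVantageTool and TavilySearchTool pass the 0.65 threshold and WikipediaSearchTool is dropped, mirroring the flow above. Applying LIMIT before the threshold matches the SQL-then-Java order used in the service code below.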

Two steps worth understanding in detail

Step 1 (Query Embedding): converting text into a numerical vector is the foundation of Tool RAG. The quality of the embedding model determines how accurately the system finds the relevant tool. For choosing an embedding model for your stack, see Embedding models for RAG in 2026: how to choose, provider comparison. To understand how embeddings work internally, see Embeddings in simple terms: how AI understands meaning, not just words.

Step 3 (Relevance Threshold): this is the most important parameter to tune for your registry. With too high a threshold (0.85 and above), the agent often finds no tool and answers without searching. With too low a threshold (0.40 and below), irrelevant tools are injected and the degradation returns. Recommended starting threshold: 0.60-0.65. Adjust it based on monitoring (section 9).

What to do if no tool is found above the threshold? Two options: (1) respond without tools — safe if the query does not require current data; (2) lower the threshold and inject the best result even if the score is low. In Agent Chat, we use option 1 as the default — the agent responds from its own knowledge if Tool RAG found nothing.
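Both options fit into one small selection function. A minimal sketch, assuming the candidates arrive sorted by score (as the vector search returns them); the enum, record, and method names are hypothetical and not part of Agent Chat:

```java
import java.util.List;

/** Sketch of the two "nothing above the threshold" fallbacks. */
class ToolFallbackPolicy {

    enum Fallback { NO_TOOLS, BEST_EFFORT }

    record Candidate(String tool, double score) {}

    /** Candidates must be sorted by score, best first. */
    static List<Candidate> choose(List<Candidate> sortedCandidates, double threshold,
                                  Fallback fallback) {
        List<Candidate> aboveThreshold = sortedCandidates.stream()
            .filter(c -> c.score() >= threshold)
            .toList();
        if (!aboveThreshold.isEmpty() || sortedCandidates.isEmpty()) {
            return aboveThreshold;
        }
        // Nothing passed the threshold
        return switch (fallback) {
            case NO_TOOLS -> List.of();                            // option 1: answer from model knowledge
            case BEST_EFFORT -> List.of(sortedCandidates.get(0));  // option 2: inject the single best match
        };
    }

    public static void main(String[] args) {
        List<Candidate> weak = List.of(new Candidate("WikipediaSearchTool", 0.44),
                                       new Candidate("ArxivTool", 0.31));
        System.out.println(choose(weak, 0.60, Fallback.NO_TOOLS));
        System.out.println(choose(weak, 0.60, Fallback.BEST_EFFORT));
    }
}
```

NO_TOOLS corresponds to the Agent Chat default described above. BEST_EFFORT never injects more than one low-scoring tool, which limits how much noise the fallback can add.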

Implementation: pgvector for tool registry on Spring AI

Database schema for tool registry

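-- Requires the pgvector extension (gen_random_uuid() is built in since PostgreSQL 13)
CREATE EXTENSION IF NOT EXISTS vector;

-- Tool registry with description embeddings
CREATE TABLE tool_registry (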
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tool_name       VARCHAR(200) NOT NULL UNIQUE,  -- Java class or method name
    display_name    VARCHAR(200) NOT NULL,          -- human-readable name
    description     TEXT NOT NULL,                  -- full description for embedding
    category        VARCHAR(100),                   -- category for routing
    bean_name       VARCHAR(200) NOT NULL,          -- Spring bean name for injection
    is_active       BOOLEAN DEFAULT TRUE,
    version         INTEGER DEFAULT 1,              -- for versioning
    embedding       vector(1536),                   -- must match the embedding model's dimensions (1536 for text-embedding-3-small)
    created_at      TIMESTAMP DEFAULT NOW(),
    updated_at      TIMESTAMP DEFAULT NOW()
);

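-- Index for vector search.
-- ivfflat with lists = 10 is a poor fit here: it should be built after the data
-- is loaded, and with the default ivfflat.probes = 1 it scans only ~1/10 of the
-- rows, so a relevant tool can be missed. For a registry of up to a few thousand
-- tools an exact scan is already fast; if you want an index, HNSW (pgvector 0.5.0+)
-- can be created on an empty table.
CREATE INDEX tool_registry_embedding_idx
ON tool_registry
USING hnsw (embedding vector_cosine_ops);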

-- Index for searching by category
CREATE INDEX tool_registry_category_idx ON tool_registry(category, is_active);

Tool registration service

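import java.util.List;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryService {

    private final JdbcTemplate jdbcTemplate;
    private final EmbeddingModel embeddingModel;

    /**
     * Registers a tool in the registry, or updates it if it already exists.
     * Called on startup or when a new tool is added.
     */
    public void registerTool(ToolRegistration registration) {
        // Generate embedding from description
        float[] embedding = embeddingModel.embed(registration.getDescription());

        jdbcTemplate.update("""
            INSERT INTO tool_registry
                (tool_name, display_name, description, category, bean_name, embedding)
            VALUES (?, ?, ?, ?, ?, CAST(? AS vector))
            ON CONFLICT (tool_name) DO UPDATE SET
                display_name = EXCLUDED.display_name,
                description = EXCLUDED.description,
                category = EXCLUDED.category,
                bean_name = EXCLUDED.bean_name,
                embedding = EXCLUDED.embedding,
                version = tool_registry.version + 1,
                updated_at = NOW()
            """,
            registration.getToolName(),
            registration.getDisplayName(),
            registration.getDescription(),
            registration.getCategory(),
            registration.getBeanName(),
            toVectorLiteral(embedding)   // plain JDBC cannot bind float[] to a vector column
        );

        log.info("Tool registered: {} (category: {})",
            registration.getToolName(), registration.getCategory());
    }

    /**
     * Semantic search for relevant tools for a query.
     * relevance_score = 1 - cosine distance (<=>) = cosine similarity.
     */
    public List<ToolMatch> findRelevantTools(String userQuery, int topK) {
        String queryVector = toVectorLiteral(embeddingModel.embed(userQuery));

        return jdbcTemplate.query("""
            SELECT tool_name, display_name, bean_name, category,
                   1 - (embedding <=> CAST(? AS vector)) AS relevance_score
            FROM tool_registry
            WHERE is_active = TRUE
            ORDER BY embedding <=> CAST(? AS vector)
            LIMIT ?
            """,
            (rs, rowNum) -> ToolMatch.builder()
                .toolName(rs.getString("tool_name"))
                .displayName(rs.getString("display_name"))
                .beanName(rs.getString("bean_name"))
                .category(rs.getString("category"))
                .relevanceScore(rs.getDouble("relevance_score"))
                .build(),
            queryVector, queryVector, topK   // the same query vector binds both placeholders
        );
    }

    /** Formats a float[] in pgvector's text representation, e.g. "[0.1,0.2,0.3]". */
    private static String toVectorLiteral(float[] vector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(vector[i]);
        }
        return sb.append(']').toString();
    }
}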

@Value
@Builder
public class ToolMatch {
    String toolName;
    String displayName;
    String beanName;
    String category;
    double relevanceScore;
}

@Value
@Builder
public class ToolRegistration {
    String toolName;
    String displayName;
    String description;   // full text for embedding — the more detailed, the better
    String category;
    String beanName;
}

Registering all tools on startup

@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryInitializer implements ApplicationRunner {

    private final ToolRegistryService registryService;

    @Override
    public void run(ApplicationArguments args) {
        log.info("Initializing tool registry...");

        List<ToolRegistration> tools = List.of(
            ToolRegistration.builder()
                .toolName("AlphaVantageTool.getStockPrice")
                .displayName("Stock Price Lookup")
                .description("""
                    Retrieves the current stock price on the exchange.
                    Use for queries about: stock prices, company market capitalization,
                    financial indicators, market dynamics.
                    Supports tickers: AAPL, GOOGL, TSLA, AMZN, MSFT, and others.
                    DO NOT use for: news, forecasts, general company information.
                    """)
                .category("FINANCE")
                .beanName("alphaVantageTool")
                .build(),

            ToolRegistration.builder()
                .toolName("TavilySearchTool.searchWeb")
                .displayName("Web Search")
                .description("""
                    Searches for current information on the internet via Tavily.
                    Use for: latest news, current statistics,
                    events of 2024-2025, data not available on Wikipedia.
                    DO NOT use for: stable facts, definitions, biographies.
                    """)
                .category("SEARCH")
                .beanName("tavilySearchTool")
                .build(),

            ToolRegistration.builder()
                .toolName("WikipediaSearchTool.searchWikipedia")
                .displayName("Wikipedia Search")
                .description("""
                    Searches for stable factual information on Wikipedia.
                    Use for: definitions of concepts, biographies, scientific facts,
                    historical events, geographical information.
                    DO NOT use for: current news, prices, ongoing events.
                    """)
                .category("SEARCH")
                .beanName("wikipediaSearchTool")
                .build(),

            ToolRegistration.builder()
                .toolName("ArxivSearchTool.searchPapers")
                .displayName("ArXiv Scientific Papers")
                .description("""
                    Searches for scientific articles and research papers on ArXiv.
                    Use for: scientific research, academic publications,
                    technical articles in AI/ML, physics, mathematics, CS.
                    Queries must be in English.
                    """)
                .category("RESEARCH")
                .beanName("arxivSearchTool")
                .build(),

            ToolRegistration.builder()
                .toolName("NewsApiSearchTool.searchNews")
                .displayName("News Search")
                .description("""
                    Searches for recent news on a topic via NewsAPI.
                    Use for: latest news, current events,
                    corporate news, market news.
                    Limit: 100 requests per day.
                    """)
                .category("NEWS")
                .beanName("newsApiSearchTool")
                .build()
        );

        tools.forEach(registryService::registerTool);
        log.info("Tool registry initialized: {} tools registered", tools.size());
    }
}

Dynamic tool injection in Spring AI

The most interesting part — how to inject only relevant tools into the LLM query instead of all of them at once. The key idea: ToolCallingChatOptions in Spring AI accepts an array of ToolCallback[] dynamically — meaning different queries can receive different sets of tools without changing the code.

@Service
@RequiredArgsConstructor
@Slf4j
public class ToolRagAgentService {

    private final ToolRegistryService registryService;
    private final ApplicationContext applicationContext;
    private final ChatModel chatModel;

    // How many tools to inject into one query at most
    private static final int TOP_K_TOOLS = 3;
    // Minimum relevance threshold — adjust for your registry
    private static final double MIN_RELEVANCE = 0.60;

    public String askWithToolRag(String systemPrompt, String userQuery) {
        long startTime = System.currentTimeMillis();

        // 1. Find relevant tools via vector search
        List<ToolMatch> relevantMatches = registryService
            .findRelevantTools(userQuery, TOP_K_TOOLS);

        // 2. Filter by minimum relevance threshold
        List<ToolMatch> filteredTools = relevantMatches.stream()
            .filter(m -> m.getRelevanceScore() >= MIN_RELEVANCE)
            .toList();

        long ragLatency = System.currentTimeMillis() - startTime;

        log.info("Tool RAG: query='{}' found={}/{} tools threshold={} latency={}ms",
            userQuery.length() > 50 ? userQuery.substring(0, 50) + "..." : userQuery,
            filteredTools.size(),
            relevantMatches.size(),
            MIN_RELEVANCE,
            ragLatency);

        // SLF4J only understands "{}" placeholders, so the score is pre-formatted
        filteredTools.forEach(t ->
            log.info("  → {} score={}", t.getToolName(),
                String.format("%.3f", t.getRelevanceScore())));

        // 3. Construct the message
        List<Message> messages = List.of(
            new SystemMessage(systemPrompt),
            new UserMessage(userQuery)
        );

        // 4. Fallback if no relevant tool is found
        if (filteredTools.isEmpty()) {
            log.warn("Tool RAG: no tools above threshold={} for query='{}' — answering without tools",
                MIN_RELEVANCE, userQuery);
            return chatModel.call(new Prompt(messages))
                .getResult().getOutput().getText();
        }

        // 5. Load Spring beans and make the query
        ToolCallback[] tools = loadToolCallbacks(filteredTools);

        return chatModel.call(
            new Prompt(messages,
                ToolCallingChatOptions.builder()
                    .toolCallbacks(tools)
                    .build()))
            .getResult().getOutput().getText();
    }

    /**
     * Loads ToolCallbacks via the Spring ApplicationContext
     * by the bean name stored in the registry.
     *
     * Note: ToolCallbacks.from(bean) exposes every @Tool method of the bean,
     * not only the method named in tool_name. Registry rows that point to the
     * same bean are therefore collapsed to one lookup, so the same tools are
     * not sent twice in one request.
     *
     * Thread-safe: ApplicationContext.getBean() can be called concurrently;
     * singleton beans are returned from Spring's internal cache.
     */
    private ToolCallback[] loadToolCallbacks(List<ToolMatch> matches) {
        return matches.stream()
            .map(ToolMatch::getBeanName)
            .distinct()
            .map(beanName -> {
                try {
                    Object bean = applicationContext.getBean(beanName);
                    return ToolCallbacks.from(bean);
                } catch (NoSuchBeanDefinitionException e) {
                    // Bean not found: perhaps the tool was removed from code
                    // but remains in the DB registry
                    log.error("Tool bean not found: '{}'; " +
                              "deactivate the tool in the registry or restart the application",
                        beanName);
                    return new ToolCallback[0];
                } catch (BeansException e) {
                    log.error("Failed to load tool bean '{}': {}", beanName, e.getMessage());
                    return new ToolCallback[0];
                }
            })
            .flatMap(Arrays::stream)
            .toArray(ToolCallback[]::new);
    }
}

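Caching Tool RAG results to reduce latency

Tool RAG adds one embedding request before each LLM call. If the same queries repeat, caching the search results avoids both the embedding request and the vector search. The cache below matches queries exactly after lower-casing and trimming, so similar but differently worded queries still miss: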

@Service
@RequiredArgsConstructor
@Slf4j
public class CachedToolRegistryService {

    private final ToolRegistryService registryService;

    // Simple in-memory cache — for production, use Redis
    // ConcurrentHashMap is thread-safe
    private final Map<String, CachedResult> cache = new ConcurrentHashMap<>();

    private static final Duration CACHE_TTL = Duration.ofMinutes(5);
    private static final int MAX_CACHE_SIZE = 500;

    public List<ToolMatch> findRelevantToolsCached(String userQuery, int topK) {

        // Normalize the query for a better hit rate; topK is part of the key
        // because the same query with a different topK returns a different list
        String cacheKey = userQuery.toLowerCase().trim() + "|" + topK;

        CachedResult cached = cache.get(cacheKey);
        if (cached != null && !cached.isExpired()) {
            log.debug("Tool RAG cache HIT for query: '{}'", cacheKey);
            return cached.tools();
        }

        // Cache miss: perform the actual search
        log.debug("Tool RAG cache MISS for query: '{}'", cacheKey);
        List<ToolMatch> tools = registryService.findRelevantTools(userQuery, topK);

        // When the cache is full, drop expired entries first; otherwise it
        // would stop accepting new queries for good once it fills up
        if (cache.size() >= MAX_CACHE_SIZE) {
            cache.values().removeIf(CachedResult::isExpired);
        }
        if (cache.size() < MAX_CACHE_SIZE) {
            cache.put(cacheKey, new CachedResult(tools, Instant.now()));
        }

        return tools;
    }

    /**
     * Clears the cache when the tool registry is updated
     * (call after registerTool or updateToolDescription)
     */
    public void invalidateCache() {
        int size = cache.size();
        cache.clear();
        log.info("Tool RAG cache invalidated: {} entries cleared", size);
    }

    record CachedResult(List<ToolMatch> tools, Instant cachedAt) {
        boolean isExpired() {
            return Instant.now().isAfter(cachedAt.plus(CACHE_TTL));
        }
    }
}
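The MAX_CACHE_SIZE check above refuses new entries whenever the map is full. If you want the cache to keep adapting to new queries instead, one option is a bounded LRU cache with a TTL. A minimal single-JVM sketch built on LinkedHashMap in access order; the class name is hypothetical, and in production a library such as Caffeine, or Redis as mentioned above, is the more usual choice:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded LRU cache with TTL: evicts the least recently used entry when full. */
class LruTtlCache<K, V> {

    // Cached value plus the time it was stored (for the TTL check)
    private record Timed<T>(T value, Instant storedAt) {}

    private final Duration ttl;
    private final Map<K, Timed<V>> map;

    LruTtlCache(int maxSize, Duration ttl) {
        this.ttl = ttl;
        // accessOrder = true: get() moves an entry to the "most recently used" end
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize;   // evict the least recently used entry
            }
        };
    }

    /** Returns a fresh cached value, or calls the loader and caches its result. */
    V get(K key, Function<K, V> loader) {
        synchronized (this) {
            Timed<V> cached = map.get(key);
            if (cached != null && Instant.now().isBefore(cached.storedAt().plus(ttl))) {
                return cached.value();
            }
        }
        // The slow call (embedding + vector search) runs outside the lock;
        // two concurrent misses for the same key may both call the loader
        V value = loader.apply(key);
        synchronized (this) {
            map.put(key, new Timed<>(value, Instant.now()));
        }
        return value;
    }

    synchronized int size() {
        return map.size();
    }

    synchronized void clear() {
        map.clear();
    }
}
```

Used for Tool RAG, the key would be the normalized query plus topK (for example `userQuery.toLowerCase().trim() + "|" + topK`) and the loader a call to `registryService.findRelevantTools(userQuery, topK)`; `clear()` plays the role of `invalidateCache()`.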

Integration into AgentConversationRunner

Here's how the Agent Chat migration from a static tool list to dynamic Tool RAG looks — minimal changes to existing code:

// In AgentConversationRunner.ask()

// ❌ Was — all 5 tools in each round regardless of the topic:
ToolCallback[] tools = ToolCallbacks.from(
    wikipediaSearchTool,   // always inject
    tavilySearchTool,      // always inject
    alphaVantageTool,      // always inject — even if we're talking about architecture
    arxivSearchTool,       // always inject
    newsApiSearchTool      // always inject
);
// ~1,000 tokens for tools each round

// ✅ Now — only relevant tools for the current message:
List<ToolMatch> relevantTools = cachedToolRegistryService
    .findRelevantToolsCached(lastMessage, 3);

ToolCallback[] tools = loadToolCallbacks(relevantTools);

log.info("Tool RAG round={} injected={} tools: [{}]",
    round,
    tools.length,
    relevantTools.stream()
        .map(t -> t.getToolName() + ":" + String.format("%.2f", t.getRelevanceScore()))
        .collect(joining(", ")));

// Example logs during a dialogue about vibe coding:
// Tool RAG round=1 injected=2 tools: [TavilySearchTool:0.87, WikipediaSearchTool:0.71]
// Tool RAG round=2 injected=2 tools: [TavilySearchTool:0.83, NewsApiSearchTool:0.68]
// Tool RAG round=3 injected=1 tools: [WikipediaSearchTool:0.79]
// AlphaVantageTool and ArxivSearchTool not injected — not relevant to the topic
Pitfall: registry and code desynchronization. If you delete or rename a Spring bean but don't update the DB registry, the getBean() call in loadToolCallbacks() fails with NoSuchBeanDefinitionException. The implementation above catches it and logs an error, but the tool silently disappears from the agent's options. To avoid this, validate the registry on application startup (method validateRegistry() from section 8) and deactivate obsolete entries with deactivateTool() instead of deleting them from the DB.

Tool categories and routing — a simplified alternative

If you have 10-30 tools and they are thematically diverse — categories and keyword routing are simpler and faster than full Tool RAG. This is an intermediate solution that solves 80% of scaling problems without a vector DB and without embedding requests.

The main difference from Tool RAG: routing determines the category by searching for keywords in the query — this is a CPU operation in microseconds, not an embedding request in 50-100ms. For systems where latency is critical — this is a significant advantage.

Approach: keyword routing

@Service
@RequiredArgsConstructor
@Slf4j
public class ToolCategoryRouter {

    // Tools are injected directly — without extra fields
    private final AlphaVantageTool alphaVantageTool;
    private final TavilySearchTool tavilySearchTool;
    private final WikipediaSearchTool wikipediaSearchTool;
    private final ArxivSearchTool arxivSearchTool;
    private final NewsApiSearchTool newsApiSearchTool;

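    // Keywords for each category, in Ukrainian and English, because the agent
    // receives queries in both languages (English glosses in comments).
    // Important: keywords must be specific enough not to cause false positives.
    // "news" can appear in almost any query, so "breaking news", "latest news"
    // or "новини сьогодні" ("news today") work better.
    // The query is lower-cased before matching, so keywords (including tickers)
    // must be lower case too.
    private static final Map<String, List<String>> CATEGORY_KEYWORDS = Map.of(
        "FINANCE",  List.of("акція", "ціна акцій", "капіталізація",          // share, share price, capitalization
                            "stock price", "market cap", "aapl", "tsla", "googl"),
        "NEWS",     List.of("новини", "останні події", "сьогодні відбулось",  // news, latest events, happened today
                            "breaking news", "latest news", "поточні події"), // ..., current events
        "RESEARCH", List.of("дослідження", "наукова стаття", "arxiv",         // research, scientific paper
                            "research paper", "academic study", "peer-reviewed"),
        "FACTS",    List.of("що таке", "хто такий", "визначення", "wikipedia", // what is, who is, definition
                            "what is", "who is", "definition of", "history of")
    );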

    // Mapping categories to tools, rebuilt on each routeTools() call from the
    // injected beans (cheap: it only wraps existing references)
    private Map<String, List<Object>> buildCategoryTools() {
        return Map.of(
            "FINANCE",  List.of(alphaVantageTool, tavilySearchTool),
            "NEWS",     List.of(newsApiSearchTool, tavilySearchTool),
            "RESEARCH", List.of(arxivSearchTool, wikipediaSearchTool),
            "FACTS",    List.of(wikipediaSearchTool, tavilySearchTool)
        );
    }

    /**
     * Determines the query category and returns the corresponding tools.
     * Deduplicates tools if the query falls into multiple categories.
     */
    public ToolCallback[] routeTools(String userQuery) {
        String queryLower = userQuery.toLowerCase();

        Set<String> matchedCategories = CATEGORY_KEYWORDS.entrySet().stream()
            .filter(entry -> entry.getValue().stream()
                .anyMatch(queryLower::contains))
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());

        log.info("Tool routing: query='{}' → categories={}",
            userQuery.length() > 60 ? userQuery.substring(0, 60) + "..." : userQuery,
            matchedCategories.isEmpty() ? "[DEFAULT]" : matchedCategories);

        if (matchedCategories.isEmpty()) {
            // Default: basic search for any query
            log.info("Tool routing: no category matched → using default [Tavily, Wikipedia]");
            return ToolCallbacks.from(tavilySearchTool, wikipediaSearchTool);
        }

        Map<String, List<Object>> categoryTools = buildCategoryTools();

        // LinkedHashSet for deduplication — tavilySearchTool won't be added twice
        // if the query matches FINANCE and NEWS simultaneously
        Set<Object> selectedTools = new LinkedHashSet<>();
        matchedCategories.forEach(category ->
            selectedTools.addAll(categoryTools.getOrDefault(category, List.of()))
        );

        log.info("Tool routing: selected {} tools: {}",
            selectedTools.size(),
            selectedTools.stream()
                .map(t -> t.getClass().getSimpleName())
                .collect(Collectors.joining(", ")));

        return ToolCallbacks.from(selectedTools.toArray());
    }
}

When routing works well — and when it breaks

// ✅ Routing works well:
"current stock price of AAPL?"
→ FINANCE → [AlphaVantageTool, TavilySearchTool] ✓

"what is vibe coding?"
→ FACTS → [WikipediaSearchTool, TavilySearchTool] ✓

"latest news about Tesla stock price"
→ NEWS + FINANCE → [NewsApiSearchTool, TavilySearchTool, AlphaVantageTool] ✓
  (deduplication works — TavilySearchTool once)

// ❌ Routing breaks:
"tell me about a company that changed the market"
→ [] → DEFAULT → [TavilySearchTool, WikipediaSearchTool]
  (no keyword matched — but Tavily is still suitable)

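"what is the weather in Kyiv and what is the dollar exchange rate?"
→ FACTS → [WikipediaSearchTool, TavilySearchTool]
  (only the generic "what is" matched; there are no WEATHER or CURRENCY categories,
   so routing cannot offer the tools the query actually needs)
  // With Tool RAG: embedding search would surface WeatherTool and CurrencyTool, if they are registered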

"research shows that stock prices are rising"
→ FINANCE today; FINANCE + RESEARCH as soon as someone adds a bare "research" keyword
  (the query mentions research but doesn't require ArXiv → extra tools get injected)
  // This is where routing becomes fragile
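Whether a query lands in the expected category is easy to check offline. The sketch below is a hypothetical, Spring-free replica of the matching step in routeTools(), keeping only the English keywords (lower-cased); the class name KeywordRoutingCheck is an assumption for illustration, not part of the project.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Spring-free replica of the category-matching step in routeTools().
// Only the English keywords from CATEGORY_KEYWORDS are kept, lower-cased.
public class KeywordRoutingCheck {

    static final Map<String, List<String>> CATEGORY_KEYWORDS = Map.of(
        "FINANCE",  List.of("stock price", "market cap", "aapl", "tsla", "googl"),
        "NEWS",     List.of("breaking news", "latest news"),
        "RESEARCH", List.of("arxiv", "research paper", "academic study", "peer-reviewed"),
        "FACTS",    List.of("wikipedia", "what is", "who is", "definition of", "history of"));

    // Same rule as routeTools(): a category matches when any of its keywords is a
    // substring of the lower-cased query. TreeSet keeps the output sorted.
    static Set<String> matchCategories(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> matched = new TreeSet<>();
        CATEGORY_KEYWORDS.forEach((category, keywords) -> {
            if (keywords.stream().anyMatch(q::contains)) {
                matched.add(category);
            }
        });
        return matched;
    }

    public static void main(String[] args) {
        List.of("what is the stock price of AAPL?",
                "latest news about Tesla stock price",
                "what is the weather in Kyiv and what is the dollar exchange rate?",
                "tell me about a company that changed the market")
            .forEach(query -> System.out.println(query + " -> " + matchCategories(query)));
    }
}
```

Running it makes the substring pitfalls visible: the generic "what is" phrase pulls FACTS into a stock-price question and is the only match for the weather question, while a query with no keyword at all falls through to the default tools.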

Comparison: routing vs Tool RAG

| Aspect | Keyword routing | Tool RAG |
|---|---|---|
| Overhead per request | ~0ms (CPU) | ~50-100ms (embedding) |
| Selection accuracy | Good for simple queries | High, including paraphrased queries |
| Code maintenance | Manual keyword updates | Update descriptions in DB |
| Multilingual support | Separate keywords for each language | Via semantic search (requires a multilingual embedding model) |
| Number of tools | Up to 30 | No hard limit |
| Implementation complexity | Low: one class | Medium: DB + embedding |
| Infrastructure | Nothing additional | pgvector + embedding model |
Three signals that it's time to switch from routing to Tool RAG:

1. The keyword list keeps growing. You are constantly adding new keywords because routing misses queries; semantic search handles such paraphrases better.
2. Queries often belong to 2-3 categories at once. Injection becomes unpredictable, and the agent receives too many tools.
3. You are adding a new interface language. Keywords have to be duplicated for every language, while Tool RAG handles multilingual queries through semantic search, provided the embedding model is multilingual.

Tool versioning — how to update the registry

A practical problem that tutorials rarely cover: what do you do when a tool changes? New functionality, a new description, new limitations. Or, more complex still, you switch the embedding model and all the old vectors become incompatible.

Key principle: never delete records from the registry — deactivate them. This preserves the change history and allows rolling back if something goes wrong.

Four update scenarios

@Service
@RequiredArgsConstructor
@Slf4j
public class ToolVersioningService {

    private final ToolRegistryService registryService;
    private final CachedToolRegistryService cachedRegistryService;
    private final JdbcTemplate jdbcTemplate;
    private final EmbeddingModel embeddingModel;

    /**
     * Scenario 1: Only the description changed (most common)
     * Regenerate only one embedding — other fields unchanged
     */
    @Transactional
    public void updateToolDescription(String toolName, String newDescription) {

        // First, check if the tool exists
        int exists = jdbcTemplate.queryForObject(
            "SELECT COUNT(*) FROM tool_registry WHERE tool_name = ? AND is_active = TRUE",
            Integer.class, toolName);

        if (exists == 0) {
            throw new IllegalArgumentException(
                "Tool not found or inactive: " + toolName);
        }

        // Generate a new embedding only for the updated description
        float[] newEmbedding = embeddingModel.embed(newDescription);

        jdbcTemplate.update("""
            UPDATE tool_registry
            SET description = ?,
                embedding = ?,
                version = version + 1,
                updated_at = NOW()
            WHERE tool_name = ?
            """,
            newDescription, newEmbedding, toolName
        );

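        // Always invalidate the cache after an update. Note: this runs before the
        // transaction commits, so a concurrent request could briefly re-cache stale
        // results; for strict consistency, invalidate in an after-commit callback.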
        cachedRegistryService.invalidateCache();

        log.info("Tool updated: {} — new embedding generated, cache invalidated",
            toolName);
    }

    /**
     * Scenario 2: Tool deactivated (obsolete or removed from code)
     * DO NOT delete — deactivate, preserve history
     */
    @Transactional
    public void deactivateTool(String toolName) {
        int updated = jdbcTemplate.update("""
            UPDATE tool_registry
            SET is_active = FALSE,
                updated_at = NOW()
            WHERE tool_name = ?
            """,
            toolName
        );

        if (updated == 0) {
            log.warn("Tool not found for deactivation: {}", toolName);
            return;
        }

        cachedRegistryService.invalidateCache();
        log.info("Tool deactivated: {} — removed from active registry", toolName);
    }

    /**
     * Scenario 3: Mass update after refactoring descriptions
     * Regenerate all embeddings — may take several minutes
     * for a large registry
     */
    @Transactional
    public RebuildResult rebuildAllEmbeddings() {
        log.info("Starting full tool registry rebuild...");
        long startTime = System.currentTimeMillis();

        List<Map<String, Object>> tools = jdbcTemplate.queryForList(
            "SELECT tool_name, description FROM tool_registry WHERE is_active = TRUE"
        );

        int success = 0;
        int failed = 0;

        for (Map<String, Object> tool : tools) {
            String toolName = (String) tool.get("tool_name");
            String description = (String) tool.get("description");

            try {
                float[] newEmbedding = embeddingModel.embed(description);

                jdbcTemplate.update("""
                    UPDATE tool_registry
                    SET embedding = ?,
                        version = version + 1,
                        updated_at = NOW()
                    WHERE tool_name = ?
                    """,
                    newEmbedding, toolName
                );
                success++;

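            } catch (Exception e) {
                // Mostly covers embedding API errors: in PostgreSQL a failed UPDATE
                // aborts the whole transaction, so later UPDATEs would fail as well
                log.error("Failed to rebuild embedding for tool '{}': {}",
                    toolName, e.getMessage());
                failed++;
            }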
        }

        cachedRegistryService.invalidateCache();

        long elapsed = System.currentTimeMillis() - startTime;
        log.info("Tool registry rebuild complete: {}/{} tools updated in {}ms",
            success, tools.size(), elapsed);

        return new RebuildResult(success, failed, elapsed);
    }

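    /**
     * Scenario 4: Changing the embedding model, the most complex case.
     * All old vectors are incompatible with the new model, so the entire
     * registry has to be rebuilt (and the column resized if the dimension changed).
     * Assumes the application is already running with the NEW EmbeddingModel configured.
     */
    public void migrateEmbeddingModel(int newDimensions) {
        log.warn("EMBEDDING MODEL MIGRATION STARTED: " +
                 "all existing vectors will be invalidated!");

        Integer currentDimensions = getCurrentDimensions();

        // 1. Resize the column only if the dimension differs
        // (if it's the same, rebuildAllEmbeddings() alone is enough)
        if (currentDimensions == null || newDimensions != currentDimensions) {
            log.info("Updating vector dimensions: {} → {}",
                currentDimensions, newDimensions);

            // 2. Change the column dimension in pgvector.
            // pgvector cannot convert a vector to a different dimension, so a plain
            // ALTER ... TYPE fails on existing rows. USING NULL explicitly discards
            // all old vectors (irreversible: back up the table first!)
            jdbcTemplate.execute(
                "ALTER TABLE tool_registry ALTER COLUMN embedding TYPE vector(" +
                newDimensions + ") USING NULL::vector(" + newDimensions + ")");
        }

        // 3. Regenerate all embeddings with the new model.
        // Self-invocation: Spring's @Transactional proxy on rebuildAllEmbeddings()
        // is bypassed here, so it does not run in its own transaction.
        RebuildResult result = rebuildAllEmbeddings();

        log.info("Embedding model migration complete: {}", result);
    }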

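    private Integer getCurrentDimensions() {
        // Current dimension from any record that has an embedding;
        // null when no embeddings exist (queryForObject would throw on zero rows)
        List<Integer> dims = jdbcTemplate.queryForList("""
            SELECT vector_dims(embedding)
            FROM tool_registry
            WHERE embedding IS NOT NULL
            LIMIT 1
            """, Integer.class);
        return dims.isEmpty() ? null : dims.get(0);
    }

    /**
     * Used by ToolRegistryHealthScheduler: active tools left without an
     * embedding (e.g. after a failure during registerTool)
     */
    public List<String> findToolsWithoutEmbeddings() {
        return jdbcTemplate.queryForList("""
            SELECT tool_name FROM tool_registry
            WHERE is_active = TRUE AND embedding IS NULL
            """, String.class);
    }

    /**
     * Used by ToolRegistryHealthScheduler: regenerates embeddings
     * only for the given tools
     */
    public void rebuildEmbeddingsForTools(List<String> toolNames) {
        for (String toolName : toolNames) {
            String description = jdbcTemplate.queryForObject(
                "SELECT description FROM tool_registry WHERE tool_name = ?",
                String.class, toolName);

            jdbcTemplate.update("""
                UPDATE tool_registry
                SET embedding = ?,
                    version = version + 1,
                    updated_at = NOW()
                WHERE tool_name = ?
                """,
                embeddingModel.embed(description), toolName
            );
        }
        cachedRegistryService.invalidateCache();
    }

    record RebuildResult(int success, int failed, long elapsedMs) {}
}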

Registry validation on startup — health check

Add validation in ApplicationRunner to see discrepancies between code and registry immediately after startup:

@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryValidator implements ApplicationRunner {

    private final ToolVersioningService versioningService;
    private final ApplicationContext applicationContext;
    private final JdbcTemplate jdbcTemplate;

    @Override
    public void run(ApplicationArguments args) {
        log.info("Validating tool registry consistency...");

        // Get all active bean names from the registry
        List<String> registeredBeans = jdbcTemplate.queryForList(
            "SELECT bean_name FROM tool_registry WHERE is_active = TRUE",
            String.class
        );

        List<String> issues = new ArrayList<>();

        // Check if beans exist in the Spring context
        for (String beanName : registeredBeans) {
            if (!applicationContext.containsBean(beanName)) {
                issues.add("ORPHANED: bean '" + beanName +
                           "' in registry but NOT in Spring context");
            }
        }

        // Check for tools with null embedding
        List<String> noEmbedding = jdbcTemplate.queryForList("""
            SELECT tool_name FROM tool_registry
            WHERE is_active = TRUE AND embedding IS NULL
            """, String.class);

        if (!noEmbedding.isEmpty()) {
            issues.add("NO EMBEDDING: tools without embedding: " + noEmbedding);
        }

        // Output the result
        if (issues.isEmpty()) {
            log.info("Tool registry OK: {} active tools, all consistent",
                registeredBeans.size());
        } else {
            log.warn("Tool registry has {} issues:", issues.size());
            issues.forEach(issue -> log.warn("  ⚠️ {}", issue));
            log.warn("Run ToolVersioningService.rebuildAllEmbeddings() " +
                     "or deactivateTool() to fix");
        }
    }
}
Pitfall: cache after update. Each registry update method calls cachedRegistryService.invalidateCache(). If you forget it, the agent keeps using old search results for up to another 5 minutes (the cache TTL). This is especially critical when deactivating a tool: the agent might try to call a deactivated tool that is still in the cache.

Pitfall: embedding model migration. This is an irreversible operation that deletes all old vectors. Always back up the table before migrating: CREATE TABLE tool_registry_backup AS SELECT * FROM tool_registry;
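The cache pitfall is easiest to see with a tiny TTL cache. This is a minimal sketch, not the article's CachedToolRegistryService: the TtlCache class, the injectable clock (a LongSupplier, so time can be controlled) and the hypothetical LegacyStockTool are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache: a cached search result is served until its TTL expires,
// unless invalidateAll() (the role of invalidateCache()) clears it first.
public class TtlCache<K, V> {

    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    public TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value while it is fresh; otherwise reloads and caches it
    public V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now < entry.expiresAtMillis()) {
            return entry.value();
        }
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    public void invalidateAll() {
        entries.clear();
    }

    public static void main(String[] args) {
        long[] now = {0};                                   // fake clock in milliseconds
        List<String> activeTools = new ArrayList<>(List.of("AlphaVantageTool", "LegacyStockTool"));
        TtlCache<String, List<String>> cache = new TtlCache<>(5 * 60_000L, () -> now[0]);
        Function<String, List<String>> search = query -> List.copyOf(activeTools);

        cache.get("AAPL stock price", search);              // caches both tools
        activeTools.remove("LegacyStockTool");              // tool deactivated in the registry
        now[0] = 60_000;                                    // one minute later: still within TTL
        System.out.println(cache.get("AAPL stock price", search)); // stale result
        cache.invalidateAll();                              // what invalidateCache() does
        System.out.println(cache.get("AAPL stock price", search)); // fresh result
    }
}
```

The first lookup after deactivation still returns LegacyStockTool from the cache; only after invalidateAll() does the search reflect the registry again.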

Registry Monitoring and Metrics

The tools registry is a living component. Without monitoring, you won't know which tools are actually being used, which can be removed, and which descriptions are worth rewriting. And most importantly, you won't know when Tool RAG starts to fail.

Analytics Table Schema

-- Logging every tool selection with Tool RAG
CREATE TABLE tool_usage_log (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tool_name       VARCHAR(200) NOT NULL,
    query_snippet   VARCHAR(500),      -- first 500 characters of the query
    relevance_score DOUBLE PRECISION,
    was_called      BOOLEAN DEFAULT FALSE, -- whether LLM actually called the tool after injection
    session_id      VARCHAR(100),      -- session ID for correlation
    conversation_id BIGINT,            -- for Agent Chat
    created_at      TIMESTAMP DEFAULT NOW()
);

-- Index for fast queries by time and tool_name
CREATE INDEX tool_usage_log_time_idx ON tool_usage_log(created_at DESC);
CREATE INDEX tool_usage_log_tool_idx ON tool_usage_log(tool_name, created_at DESC);

-- Aggregated statistics for the last 30 days
CREATE VIEW tool_usage_stats AS
SELECT
    tool_name,
    COUNT(*)                                                    as injected_count,
    SUM(CASE WHEN was_called THEN 1 ELSE 0 END)                as called_count,
    ROUND(
        SUM(CASE WHEN was_called THEN 1 ELSE 0 END)::numeric /
        NULLIF(COUNT(*), 0) * 100, 1
    )                                                           as call_rate_pct,
    ROUND(AVG(relevance_score)::numeric, 3)                    as avg_relevance,
    ROUND(MIN(relevance_score)::numeric, 3)                    as min_relevance,
    MAX(created_at)                                            as last_injected,
    MAX(CASE WHEN was_called THEN created_at END)              as last_called
FROM tool_usage_log
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tool_name
ORDER BY called_count DESC;
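The view stores call_rate_pct as a percentage (0-100), which is why a ratio threshold such as 0.10 has to be multiplied by 100 before comparing. Its arithmetic can be mirrored in plain Java, for example to unit-test thresholds without a database. A minimal sketch: ToolUsageStatsSketch and its records are hypothetical, and it assumes PostgreSQL's ROUND(numeric, n) rounds halves away from zero (HALF_UP for these non-negative values).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory mirror of the tool_usage_stats view
public class ToolUsageStatsSketch {

    record UsageRow(String toolName, double relevanceScore, boolean wasCalled) {}

    record Stats(int injectedCount, int calledCount,
                 BigDecimal callRatePct, BigDecimal avgRelevance) {}

    static Map<String, Stats> aggregate(List<UsageRow> rows) {
        // GROUP BY tool_name
        Map<String, List<UsageRow>> byTool = new TreeMap<>();
        for (UsageRow row : rows) {
            byTool.computeIfAbsent(row.toolName(), k -> new ArrayList<>()).add(row);
        }
        Map<String, Stats> stats = new TreeMap<>();
        byTool.forEach((tool, toolRows) -> {
            int injected = toolRows.size();                                  // COUNT(*)
            int called = (int) toolRows.stream().filter(UsageRow::wasCalled).count();
            // called / injected * 100, rounded to 1 decimal like ROUND(..., 1)
            BigDecimal callRate = BigDecimal.valueOf(called * 100L)
                .divide(BigDecimal.valueOf(injected), 1, RoundingMode.HALF_UP);
            // ROUND(AVG(relevance_score)::numeric, 3)
            double avg = toolRows.stream().mapToDouble(UsageRow::relevanceScore).average().orElse(0);
            BigDecimal avgRelevance = BigDecimal.valueOf(avg).setScale(3, RoundingMode.HALF_UP);
            stats.put(tool, new Stats(injected, called, callRate, avgRelevance));
        });
        return stats;
    }

    public static void main(String[] args) {
        List<UsageRow> rows = List.of(
            new UsageRow("WikipediaSearchTool", 0.71, true),
            new UsageRow("WikipediaSearchTool", 0.66, false),
            new UsageRow("WikipediaSearchTool", 0.62, false),
            new UsageRow("AlphaVantageTool", 0.91, true));
        aggregate(rows).forEach((tool, s) -> System.out.println(tool + " " + s));
    }
}
```

For WikipediaSearchTool this gives 3 injections, 1 call, a call rate of 33.3% and an average relevance of 0.663.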

ToolUsageMonitor — Monitoring Service

@Service
@RequiredArgsConstructor
@Slf4j
public class ToolUsageMonitor {

    private final JdbcTemplate jdbcTemplate;

    /**
     * Batch insert — one query instead of N separate ones
     */
    public void logInjection(List<ToolMatch> injectedTools,
                              String query,
                              String sessionId,
                              Long conversationId) {
        if (injectedTools.isEmpty()) return;

        String querySnippet = query.length() > 500
            ? query.substring(0, 500) : query;

        // Batch insert for all tools at once
        jdbcTemplate.batchUpdate("""
            INSERT INTO tool_usage_log
                (tool_name, query_snippet, relevance_score,
                 session_id, conversation_id)
            VALUES (?, ?, ?, ?, ?)
            """,
            injectedTools,
            injectedTools.size(),
            (ps, tool) -> {
                ps.setString(1, tool.getToolName());
                ps.setString(2, querySnippet);
                ps.setDouble(3, tool.getRelevanceScore());
                ps.setString(4, sessionId);
                ps.setObject(5, conversationId);
            }
        );
    }

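    /**
     * Update was_called=TRUE after an actual tool call.
     * Matches by session_id rather than by time, which is more reliable.
     * Only the most recent pending injection is marked, so a single call does
     * not mark every earlier injection of the same tool in the session.
     */
    public void markToolCalled(String toolName, String sessionId) {
        int updated = jdbcTemplate.update("""
            UPDATE tool_usage_log
            SET was_called = TRUE
            WHERE id = (
                SELECT id FROM tool_usage_log
                WHERE tool_name = ?
                  AND session_id = ?
                  AND was_called = FALSE
                ORDER BY created_at DESC
                LIMIT 1
            )
            """,
            toolName, sessionId
        );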

        if (updated == 0) {
            log.warn("markToolCalled: no record found for tool='{}' session='{}'",
                toolName, sessionId);
        }
    }

    /**
     * "Dead" tools — injected frequently but rarely called.
     * callRateThreshold: 0.10 = less than 10% of calls after injection
     */
    public List<DeadToolReport> findDeadTools(double callRateThreshold) {
        return jdbcTemplate.query("""
            SELECT tool_name, injected_count, called_count,
                   call_rate_pct, avg_relevance
            FROM tool_usage_stats
            WHERE injected_count >= 10
              AND call_rate_pct < ?
            ORDER BY injected_count DESC
            """,
            (rs, rowNum) -> new DeadToolReport(
                rs.getString("tool_name"),
                rs.getInt("injected_count"),
                rs.getInt("called_count"),
                rs.getDouble("call_rate_pct"),
                rs.getDouble("avg_relevance")
            ),
            callRateThreshold * 100  // convert 0.10 → 10 for comparison with call_rate_pct
        );
    }

    /**
     * Active tools with no injections (and therefore no calls) in the last 30 days
     */
    public List<String> findUnusedTools() {
        return jdbcTemplate.queryForList("""
            SELECT tr.tool_name
            FROM tool_registry tr
            LEFT JOIN tool_usage_log tul
                ON tr.tool_name = tul.tool_name
                AND tul.created_at > NOW() - INTERVAL '30 days'
            WHERE tr.is_active = TRUE
              AND tul.tool_name IS NULL
            ORDER BY tr.tool_name
            """,
            String.class
        );
    }

    /**
     * Tools with consistently low relevance score —
     * a signal that the description poorly matches actual queries
     */
    public List<String> findLowRelevanceTools(double scoreThreshold) {
        return jdbcTemplate.queryForList("""
            SELECT tool_name
            FROM tool_usage_stats
            WHERE injected_count >= 5
              AND avg_relevance < ?
            ORDER BY avg_relevance ASC
            """,
            String.class,
            scoreThreshold  // e.g., 0.65
        );
    }

    /**
     * Full registry health report
     */
    public RegistryHealthReport generateHealthReport() {
        List<DeadToolReport> deadTools = findDeadTools(0.10);
        List<String> unusedTools = findUnusedTools();
        List<String> lowRelevanceTools = findLowRelevanceTools(0.65);

        int totalActive = jdbcTemplate.queryForObject(
            "SELECT COUNT(*) FROM tool_registry WHERE is_active = TRUE",
            Integer.class);

        RegistryHealthReport report = new RegistryHealthReport(
            totalActive, deadTools, unusedTools, lowRelevanceTools);

        if (report.hasIssues()) {
            log.warn("Tool registry health report:\n{}", report.summary());
        } else {
            log.info("Tool registry healthy: {} active tools, no issues", totalActive);
        }

        return report;
    }

    record DeadToolReport(
        String toolName,
        int injectedCount,
        int calledCount,
        double callRatePct,
        double avgRelevance
    ) {}

    record RegistryHealthReport(
        int totalActiveTools,
        List<DeadToolReport> deadTools,
        List<String> unusedTools,
        List<String> lowRelevanceTools
    ) {
        boolean hasIssues() {
            return !deadTools.isEmpty()
                || !unusedTools.isEmpty()
                || !lowRelevanceTools.isEmpty();
        }

        String summary() {
            return String.format("""
                Active tools: %d
                Dead tools (low call rate): %s
                Unused tools (30+ days): %s
                Low relevance tools: %s
                """,
                totalActiveTools,
                deadTools.stream().map(DeadToolReport::toolName).toList(),
                unusedTools,
                lowRelevanceTools
            );
        }
    }
}

Scheduled Monitoring — Automatic Report

// Requires @EnableScheduling on a configuration class for the @Scheduled methods to run
@Component
@RequiredArgsConstructor
@Slf4j
public class ToolRegistryHealthScheduler {

    private final ToolUsageMonitor monitor;
    private final ToolVersioningService versioningService;

    /**
     * Weekly health check — every Monday at 9:00 AM
     */
    @Scheduled(cron = "0 0 9 * * MON")
    public void weeklyHealthCheck() {
        log.info("=== Weekly Tool Registry Health Check ===");
        ToolUsageMonitor.RegistryHealthReport report = monitor.generateHealthReport();

        if (report.hasIssues()) {
            // In production: send to Slack/email/Grafana
            // notificationService.sendAlert("Tool Registry Issues", report.summary());
            log.warn("Action required — review tool registry");
        }
    }

    /**
     * Daily consistency check at 8:00 AM (the startup check is ToolRegistryValidator)
     */
    @Scheduled(cron = "0 0 8 * * *")
    public void dailyConsistencyCheck() {
        // Check for tools without embeddings
        // (may appear after a failure during registerTool)
        List<String> noEmbedding = versioningService.findToolsWithoutEmbeddings();

        if (!noEmbedding.isEmpty()) {
            log.warn("Tools without embeddings found: {} — rebuilding",
                noEmbedding);
            versioningService.rebuildEmbeddingsForTools(noEmbedding);
        }
    }
}

What to Do with Monitoring Results

| Symptom | Metric | Likely cause | Action |
|---|---|---|---|
| Tool injected frequently, called rarely | call_rate_pct < 10% | Description too broad | Add anti-use-cases to the description, narrow down triggers |
| Tool never appears in injections | No rows in tool_usage_stats (see findUnusedTools()) | Description does not match user queries | Rewrite the description using wording from logs |
| Tool not called for 30+ days | last_called IS NULL | Tool is outdated or covered by another tool | Deactivate it with deactivateTool() |
| Consistently low relevance | avg_relevance < 0.65 | Weak semantic match | Enrich the description with synonyms and query examples |
| Every injection ends in a call | call_rate_pct = 100% (called_count = injected_count) | Tool may be injected too rarely: threshold too high | Lower MIN_RELEVANCE or enrich the description |
Practical Tip: the first week after implementing Tool RAG — log all query_snippets for injections with a low score (<0.70). These queries show where semantic search fails to find the correct tool. Add these formulations directly to the description as examples — and search accuracy will increase without any code changes.
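The table can also be turned into an automated diagnosis. A minimal sketch, assuming per-tool stats are already loaded (for example from tool_usage_stats): ToolDiagnosis is hypothetical, the thresholds follow ToolUsageMonitor (10 injections and 10% call rate; 5 injections and 0.65 relevance), and MIN_SAMPLE for the "always called" row is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mapping per-tool statistics to the symptoms in the table above
public class ToolDiagnosis {

    // Assumed minimum sample before "always called" is treated as a signal
    static final int MIN_SAMPLE = 5;

    record ToolStats(String toolName, int injectedCount, int calledCount, double avgRelevance) {}

    static List<String> diagnose(ToolStats s) {
        List<String> findings = new ArrayList<>();
        if (s.injectedCount() == 0) {
            findings.add("never injected: rewrite the description using wording from logs");
            return findings;
        }
        double callRatePct = 100.0 * s.calledCount() / s.injectedCount();
        if (s.injectedCount() >= 10 && callRatePct < 10.0) {
            findings.add("description too broad: add anti-use-cases, narrow triggers");
        }
        if (s.calledCount() == 0) {
            findings.add("never called: candidate for deactivateTool()");
        }
        if (s.injectedCount() >= 5 && s.avgRelevance() < 0.65) {
            findings.add("weak semantic match: add synonyms and query examples");
        }
        if (s.injectedCount() >= MIN_SAMPLE && s.calledCount() == s.injectedCount()) {
            findings.add("always called when injected: MIN_RELEVANCE may be too high");
        }
        return findings;
    }

    public static void main(String[] args) {
        List.of(new ToolStats("WikipediaSearchTool", 40, 2, 0.71),
                new ToolStats("ArxivSearchTool", 12, 0, 0.58),
                new ToolStats("AlphaVantageTool", 8, 8, 0.90))
            .forEach(s -> System.out.println(s.toolName() + " -> " + diagnose(s)));
    }
}
```

One tool can trigger several rows at once: in the example, ArxivSearchTool is flagged as too broad, never called and weakly matched.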

Comparison of Approaches

The three approaches are not mutually exclusive — in production, a combination is often used: routing for fast initial filtering and Tool RAG for precise selection within a category.

| Approach | Optimal for | Latency overhead | Selection accuracy | Infrastructure | Complexity |
|---|---|---|---|---|---|
| All tools in prompt | Up to 10 tools | 0ms | 78-95% | None | Minimal |
| Categories + keyword routing | 10-30 tools, thematically diverse | ~1ms (CPU) | 75-90% | None | Low |
| Tool RAG (vector search) | 30+ tools, semantically similar | ~50-150ms | 85-95% | pgvector + embedding model | Medium |
| Tool RAG + caching | 30+ tools, repetitive queries | ~10-30ms | 85-95% | pgvector + Redis/ConcurrentHashMap | Medium+ |
| Hybrid: routing → Tool RAG | 50+ tools, mixed categories | ~20-80ms | 90-97% | pgvector + categories in DB | High |

Hybrid Approach — How It Works

For large registries (50+ tools), the most effective combination is:

User Query
      ↓
[1] Keyword routing → determine category (FINANCE / NEWS / RESEARCH)
    ~1ms, CPU operation
      ↓
[2] Tool RAG within the category only
    Search among 10-15 tools instead of 50+
    ~30-50ms instead of 100-150ms
      ↓
[3] Inject Top-3 tools from the category
    Higher accuracy because the search is in a smaller space

// Example:
// Query: "what is the stock price of AAPL and what is the news about Apple?"
// Routing: FINANCE + NEWS → search only within 15 financial and 10 news tools
// Tool RAG: AlphaVantageTool (0.91) + NewsApiSearchTool (0.87) + TavilySearchTool (0.74)
// Inject: 3 tools instead of 50
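The three steps of the hybrid flow can be sketched without a database. This is a minimal, hypothetical illustration: an in-memory list stands in for pgvector, and the 3-dimensional vectors (roughly finance, news, research) are toy values standing in for embeddingModel.embed(...) output. In production, step [2] could be a pgvector similarity query filtered by category.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: [1] keyword routing picks categories, [2] cosine similarity
// ranks only tools in those categories, [3] the top-K are injected.
public class HybridToolSelector {

    record ToolEntry(String name, Set<String> categories, float[] embedding) {}
    record ScoredTool(String name, double score) {}

    static final Map<String, List<String>> KEYWORDS = Map.of(
        "FINANCE",  List.of("stock price", "market cap"),
        "NEWS",     List.of("news", "headlines"),
        "RESEARCH", List.of("arxiv", "research paper"));

    static final List<ToolEntry> REGISTRY = List.of(
        new ToolEntry("AlphaVantageTool", Set.of("FINANCE"), new float[] {1f, 0f, 0f}),
        new ToolEntry("NewsApiSearchTool", Set.of("NEWS"), new float[] {0f, 1f, 0f}),
        new ToolEntry("TavilySearchTool", Set.of("FINANCE", "NEWS"), new float[] {0.5f, 0.5f, 0.5f}),
        new ToolEntry("ArxivSearchTool", Set.of("RESEARCH"), new float[] {0f, 0f, 1f}));

    // Step [1]: cheap keyword routing (substring match, as in routeTools())
    static Set<String> route(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> categories = new HashSet<>();
        KEYWORDS.forEach((category, words) -> {
            if (words.stream().anyMatch(q::contains)) categories.add(category);
        });
        return categories;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Steps [2] + [3]: similarity search restricted to the routed categories, top-K
    static List<ScoredTool> select(List<ToolEntry> registry, Set<String> categories,
                                   float[] queryEmbedding, int topK) {
        return registry.stream()
            .filter(t -> t.categories().stream().anyMatch(categories::contains))
            .map(t -> new ScoredTool(t.name(), cosine(queryEmbedding, t.embedding())))
            .sorted(Comparator.comparingDouble(ScoredTool::score).reversed())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        String query = "what is the stock price of AAPL and what is the news about Apple?";
        Set<String> categories = route(query);           // FINANCE + NEWS
        float[] queryEmbedding = {0.9f, 0.4f, 0.1f};      // stand-in for embeddingModel.embed(query)
        select(REGISTRY, categories, queryEmbedding, 3)
            .forEach(t -> System.out.printf(Locale.ROOT, "%s %.2f%n", t.name(), t.score()));
    }
}
```

With these toy vectors AlphaVantageTool ranks first, and ArxivSearchTool is never scored because routing excluded RESEARCH before the similarity search ran.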

Which Strategy to Choose — Quick Decision

tools <= 10         → All tools in prompt. Don't overcomplicate.
tools 10-30         → Categories + routing. One class, no infrastructure.
tools 30-50         → Tool RAG with pgvector. If you have RAG — it's easy to add.
tools 50+           → Hybrid: routing → Tool RAG. Maximum accuracy.
tools 100+          → Hybrid is mandatory. Without it, accuracy is < 14%.
Real numbers from RAG-MCP (2025): with 100+ tools, the basic approach yields 13.62% selection accuracy, so the agent picks the wrong tool in most cases. Tool RAG yields 43.13%, more than three times better, and prompt size drops by over 50%. In tokens: 100 tools × ~200 tokens = 20,000 tokens of descriptions in every query, while Tool RAG injects 3 tools → ~600 tokens. That saves 97% of the tokens spent on tools, with higher selection accuracy. If you pay per token, Tool RAG pays for itself very quickly.

Conclusions

When I first encountered the problem of scaling tools in Agent Chat — I didn't understand why the agent started "stuttering" when new tools were added. The code was correct, descriptions were written — but the selection became worse. It turned out that this is not a bug but an architectural problem that Tool RAG solves.

Good news: if you already have pgvector and Spring AI — adding Tool RAG takes a day of work, not a week. It's the same infrastructure as for documents, just applied to tool descriptions.

What I learned from implementation:

  • Up to 10 tools — good description and system prompt. Tool RAG is not needed. Don't overcomplicate prematurely
  • 10-30 tools — categories and keyword routing. One class, no infrastructure, solves 80% of scaling problems
  • 30+ tools — Tool RAG is mandatory. Without it, accuracy degrades non-linearly — and you won't notice it immediately
  • "Prompt bloat" is a real and insidious problem: the agent degrades gradually with each new tool. With 100+ tools, accuracy drops to about 13%, so the agent picks the wrong tool most of the time
  • Monitoring is mandatory from day one — without logs, you don't know which tools are actually called, which are injected but ignored, and which can be safely deactivated
  • Description is not documentation but a prompt — the more accurately the description matches actual user queries, the higher the relevance score and the less often re-query is needed. The first week after implementation — analyze logs and enrich descriptions
Next step in the series: Tool RAG solves the problem of tool selection. But there's another problem — the agent "forgets" context between sessions. It answers a question, the next day the same user asks again — and the agent doesn't remember the previous conversation. About four types of agent memory and when to use each — Agent AI Memory — in-context, RAG, episodic, and semantic.

Read also in the series:

Tool Use vs Function Calling — basic mechanics before scaling.

How LLM Decides When to Call a Tool — how to write descriptions so the model chooses correctly.

Grounding and Trust in Sources — what to do after a tool responds.

Sources: RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection (2025), vLLM Semantic Tool Selection (2025), Red Hat: Tool RAG — The Next Breakthrough (2026), BiasBusters: Tool Selection Bias in LLMs (2025), RAGFlow: From RAG to Context — 2025 Review, Spring AI Documentation