В чому різниця між Function Calling і Tool Use?

Function Calling — оригінальний термін OpenAI: модель повертає об'єкт з function.name і function.arguments. Tool Use — ширший термін Anthropic і загальноіндустрійний стандарт 2024–2025, що охоплює кастомні функції + вбудовані інструменти (code interpreter, web search, file reading). По суті — одна механіка, різні назви від різних провайдерів.

Чи виконує LLM функцію сама?

Ні. LLM — stateless text transformer без доступу до мережі або зовнішніх систем. Вона лише аналізує контекст і повертає структурований JSON з назвою функції та аргументами. Виконання відбувається у runtime вашого застосунку — у Agent Scaffolding. Помилки виникають не у моделі, а у вашому коді обробки результату.

В чому різниця синтаксису OpenAI і Anthropic API для tool use?

Головна відмінність: OpenAI використовує ключ parameters всередині обгортки function, Anthropic — input_schema без обгортки function. Решта структури JSON Schema ідентична. Код написаний під OpenAI не запрацює з Anthropic API без цього правлення. LiteLLM і LangChain конвертують формат автоматично.

Коли використовувати tool_choice required і які обмеження?

Тільки у детермінованих pipeline: structured output, обов'язкове логування, примусовий запит до зовнішнього API. У conversational режимі призводить до зайвих tool calls на будь-якому запиті включно з 'Привіт'. Критичне обмеження: tool_choice any і примусовий виклик конкретного tool несумісні з extended thinking у Claude — повертає HTTP 400.

Що таке Agent Scaffolding і яка його роль у tool use?

Agent Scaffolding — прошарок між LLM і зовнішнім світом який керує циклом TAO: Thought (LLM вирішує що робити) → Action (Scaffolding виконує tool call) → Observation (результат повертається в LLM). У простих випадках це кілька рядків Python. У складних — фреймворки LangGraph або AutoGen. Принцип незмінний: модель вирішує, код виконує.

Чим RAG pipeline відрізняється від Tool Use?

RAG — детермінований pipeline де retrieve відбувається завжди незалежно від запиту. Tool Use — механізм де модель сама вирішує чи викликати інструмент, який і з якими параметрами. RAG передбачуваний але неефективний на простих запитах. Tool Use адаптивний але вносить нову точку відмови: модель може не викликати пошук там де він потрібний.

Чи можна реалізувати RAG як інструмент у системі Tool Use?

Так. search_knowledge_base(query) стає tool який модель викликає за потреби замість автоматичного retrieve. Це знижує зайві embedding-запити і витрати токенів. Але додає ризик: модель може відповісти з власних знань без пошуку якщо опис інструменту нечіткий або модель вважає себе достатньо обізнаною. Завжди логуйте stop_reason — якщо end_turn без попереднього tool_use, модель не шукала.

AI_TOOLS 09 April 2026 21 min read 2,055 view

Function Calling vs Tool Use in LLMs: Mechanics of JSON Schema and Its Connection to RAG

Updated: 09 April 2026

Language: 🇺🇦 🇺🇸 🇩🇪 🇪🇸

Dmitro Petrov

A Tech Lead who builds AI/ML systems for production — and writes about how they actually work.

Function Calling vs Tool Use in LLMs: Mechanics of JSON Schema and Its Connection to RAG

When a developer first sees an LLM "calling a function" — an intuitive error arises: it seems like the model itself executed a database or API request. This is not the case, and it is precisely this error that generates a whole class of architectural bugs. Spoiler: The LLM only returns a structured JSON with the function name and arguments — all execution happens in your code.

⚡ In a nutshell

✅ LLM is an orchestrator, not an executor: the model forms a JSON request, your code executes it
✅ Function Calling = Tool Use: different names from OpenAI and Anthropic for the same mechanic
✅ tool_choice: auto — the model decides itself; required — forced call; none — text only
✅ RAG and Tool Use are different abstraction levels: RAG can be one of the tools in a Tool Use system
🎯 You will get: a clear understanding of the function calling mechanic and where it intersects with your RAG pipeline
👇 Below are detailed explanations, code examples, and tables

📚 Table of Contents

📌 The Model as an Orchestrator, Not an Executor
📌 What a Call Looks Like Technically: JSON Schema, tool_choice auto/required/none
📌 RAG as a Pipeline vs Tool Use as a Decision Mechanism
📌 Where RAG and Tool Use Intersect — and Where They Diverge
💼 RAG PDF Service as a Starting Point
❓ Frequently Asked Questions (FAQ)
✅ Conclusions

⸻

What is Function Calling — The Model as an Orchestrator, Not an Executor

The fundamental principle that is constantly confused: The LLM does not execute the function itself. It only analyzes the context, determines which function to call and with which arguments — and returns a structured JSON request. Execution happens at runtime in your application.

To understand why this is the case, we need to look at the architecture from the right angle. An LLM is a stateless text transformer. It has no sockets, cannot open a database connection, cannot make an HTTP request. All it does is take tokens as input and generate tokens as output. Function calling is simply an agreement on the format of these output tokens: instead of natural text, the model generates structured JSON, which your code interprets as an instruction to act.

The Full Call Cycle

Symflower (2025) describes it this way: the LLM asks Agent Scaffolding to perform a tool call on its behalf. Comet (2026) details the cycle through the TAO pattern — Thought → Action → Observation:

1. User message → LLM
   ↓
2. LLM analyzes context (Thought)
   → returns JSON with function name and arguments
   ↓
3. Agent Scaffolding parses JSON (Action)
   → executes the function in your code
   → receives the result
   ↓
4. Result is passed back to LLM as tool result (Observation)
   ↓
5. LLM generates the final response to the user

Note: between step 2 and step 4 — the LLM is not involved at all. It "waits" for your code to do the work and return the result. This is why errors in tool calls most often occur not in the model, but between steps 3 and 4 — in your result processing code.

This architecture is called Agent Scaffolding — a layer between the LLM and the external world, which manages the call cycle. In simple cases, it's a few lines of Python. In complex ones — full-fledged frameworks like LangGraph or AutoGen. But the principle remains the same: the model decides what to do, the code executes.

Terminology Reference: Function Calling vs Tool Use

Both terms describe the same mechanic — but from different perspectives:

Function Calling — the original OpenAI term (appeared in GPT-4, June 2023). The focus is on the model "calling a function" — hence the name. The model returns an object with function.name and function.arguments.
Tool Use — a broader term by Anthropic and a general industry standard for 2024–2025. The focus is on the model "using a tool" — emphasizing that it's just one of the means, not an end in itself. It covers custom functions + built-in tools (code interpreter, web search, file reading).

As noted by Martin Fowler (2025): "tool calling" is a more general and modern term; both terms coexist for historical reasons. It's also worth noting: in Anthropic's documentation, any is used instead of required, and the tool descriptions themselves are passed via input_schema instead of parameters — minor syntactic differences with identical logic.

Why the Model Can Do This at All

Function calling is not a built-in "ability" of the LLM in a hardware sense. It is the result of fine-tuning on synthetic examples where the correct answer is JSON, not text. As described by Simplicity is SOTA (2025), providers generate thousands of examples of prompt → tool call with a Chain-of-Thought reasoning trace, where the model learns not just to generate JSON, but to justify why a specific tool is needed in a specific context.

The quality of tool calling directly depends on the quality of tool descriptions. The model is trained to recognize intent through descriptions — if a description is vague or missing, the model will either not call the tool, or call the wrong one. More on this in How an LLM Model Decides When to Search — The Decision-Making Mechanic.

Parallel Calls: When One Tool Isn't Enough

Modern models support parallel tool calls in a single turn — when multiple independent sources are needed simultaneously for a response. For example, a request "compare terms of contracts A and B" might trigger two parallel calls to search_documents with different parameters instead of two sequential ones.

# Model response with parallel tool calls:
{
  "tool_calls": [
    {
      "id": "call_001",
      "function": {
        "name": "search_documents",
        "arguments": "{\"query\": \"terms of contract A\", \"top_k\": 3}"
      }
    },
    {
      "id": "call_002",
      "function": {
        "name": "search_documents",
        "arguments": "{\"query\": \"terms of contract B\", \"top_k\": 3}"
      }
    }
  ]
}

Your code must process both results and pass them back in the next message as two separate tool_result blocks. If you return only one — the model will receive incomplete context and may hallucinate the second part of the answer.

⚠️ Pitfall #1: The Invisible Error Between Step 3 and 4

The most common error in the architecture: the developer believes the model "executed a database query," when in reality it only formed a JSON request, and the execution might not have happened at all due to an error in the tool call processing code. The LLM, meanwhile, receives an empty or erroneous result — and continues generating a response as if nothing happened, often hallucinating instead of admitting the problem.

Minimum checklist for a reliable cycle:

Log the entire cycle: tool call request → execution → result → model response
Verify that arguments were successfully deserialized via json.loads() before execution
Handle exceptions if the function returned an error — and explicitly pass that error back to the LLM
For parallel calls — always return results for all tool_call.id

What a Call Looks Like Technically: JSON Schema, tool_choice auto/required/none

For the model to know about tools, they need to be described in JSON Schema format and passed in the API request. The syntax differs depending on the provider — and this difference is practical: code written for OpenAI will not work with the Anthropic API without editing one key field.

OpenAI vs Anthropic Syntax: Where the Code Breaks

The main difference: OpenAI uses the parameters key, Anthropic uses input_schema. The rest of the structure is identical.

# OpenAI / OpenAI-compatible API
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Searches for relevant fragments in the corporate knowledge base",
            "parameters": {          # ← OpenAI: "parameters"
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The text of the search query"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of fragments (default 5)"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Anthropic Claude API (native)
tools = [
    {
        "name": "search_documents",
        "description": "Searches for relevant fragments in the corporate knowledge base",
        "input_schema": {            # ← Anthropic: "input_schema", without the "function" wrapper
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The text of the search query"
                },
                "top_k": {
                    "type": "integer",
                    "description": "Number of fragments (default 5)"
                }
            },
            "required": ["query"]
        }
    }
]

If you use LiteLLM or LangChain — they abstract this difference and convert the format automatically. When working directly with the Anthropic SDK — only input_schema.

What the Model Returns and How to Pass the Result Back

Most tutorials stop at how the model returns a tool call. But most often, developers get stuck at the next step — how to correctly pass the execution result back in the next message.

Step 1. The model returns a tool call instead of text:

# Model response (stop_reason: "tool_use")
{
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_01XFDUDYJgAACTvYkLMeDRVQ",   # ← id is needed for the next step
      "name": "search_documents",
      "input": {
        "query": "terms of contract termination",
        "top_k": 5
      }
    }
  ],
  "stop_reason": "tool_use"
}

Note: in the Anthropic API, the field is called input (already an object), not arguments (a string as in OpenAI). Deserialization via json.loads() is only needed when working with the OpenAI format.

Step 2. Your code executes the function and forms the next request:

import anthropic, json

client = anthropic.Anthropic()

# First request
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What are the terms of contract termination?"}]
)

# Get the tool call from the response
tool_use_block = next(b for b in response.content if b.type == "tool_use")
tool_result = search_documents(**tool_use_block.input)  # your function

# Second request — pass the result back
follow_up = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What are the terms of contract termination?"},
        {"role": "assistant", "content": response.content},   # ← the entire model response
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,          # ← id from step 1
                    "content": json.dumps(tool_result, ensure_ascii=False)
                }
            ]
        }
    ]
)

Three critical details of this cycle:

The tool_use_id in the result must exactly match the id from the tool call — otherwise the API will return a validation error.
The entire model response (response.content) is passed in messages, not just the text — the model needs to see its own tool call in context.
For parallel calls (multiple tool calls in one response) — you need to return a tool_result for each id, otherwise the next request will return an error.

The tool_choice Parameter

The tool_choice parameter controls the decision-making mode for tool invocation. Official Anthropic documentation defines four options:

Value	Behavior	When to Use	⚠️ Pitfall
`auto`	The model decides itself: text or tool call	Typical conversational scenario. Default if tools are provided.	The model might not call a tool if the description is unclear or it "knows" the answer from its own knowledge — even if the data is outdated.
`any` / `required`	The model is obligated to call at least one tool	Structured output, mandatory logging, forced external API request	Calls a tool even for "Hello" — extra tokens and latency. Incompatible with extended thinking in Claude (HTTP 400).
`none`	The model does not call any tool, generates only text	Response exclusively from provided context; final step after receiving all results	If you pass `none` along with `tool_result` in messages — the API will return an error: tool definitions must be present.
`{"type": "tool", "name": "..."}`	Forced call of a specific tool	Testing a single tool, deterministic pipeline	Incompatible with extended thinking. The model does not generate text before the tool call — even if explicitly asked.

Note on terminology: OpenAI uses required, Anthropic uses any to indicate "must call at least one tool." LiteLLM converts between formats automatically.

RAG as a Separate Pipeline vs. Tool Use as a Decision-Making Mechanism

This is not just a conceptual difference – it's a difference in who controls the flow. In a RAG pipeline, the decision to search is hardcoded into the architecture. In Tool Use, the model is an active participant in this decision. Understanding this boundary determines what problems you'll encounter in production and where to look for them.

Classic RAG: Deterministic Pipeline

User query
  → Query embedding         ← transforming text into a vector
  → Vector DB search (always, unconditionally)
  → Results reranking
  → Context formation (prompt stuffing)
  → Passing to LLM
  → Response

Each step happens independently of the query's content. If a user asks "what is 2+2" – the pipeline will still perform embedding, go to Qdrant, retrieve the top 5 chunks, and pass them to the model's context. The model will receive unnecessary context and be forced to ignore it.

The first step of this pipeline – embedding – deserves special attention: this is where text is converted into a numerical vector that carries semantic meaning, not just keywords. How it works technically – Embeddings in Simple Terms: How AI Understands Meaning, Not Just Words . More on the full mechanics of a RAG pipeline from chunking to reranking: RAG in 2026: From PoC to Production – A Complete Guide . On the fundamental difference between the capabilities of a "pure" LLM and a RAG system – LLM vs RAG in 2026: Why They Are Not the Same Thing .

This is not a bug – it's a conscious architectural cost for predictability.

Tool Use: Model as an Active Agent

User query
  → LLM analyzes context and intent
  → Decision: respond from own knowledge / call tool / call multiple tools in parallel
       ↓ if tool call
  → Which tool? With what parameters?
  → JSON → Agent Scaffolding → execution → result → LLM
  → Final response

The model doesn't just generate text – it makes decisions about the flow. For the query "what is 2+2," it will respond directly without touching any tools. For the query "what are the contract termination conditions" – it will call search_documents. For the query "compare contracts A and B" – possibly two parallel calls to the same tool with different parameters.

Where the Real Boundary Lies

Key analogy: RAG is a librarian who always goes to the shelf before answering. Tool Use is a consultant who thinks first, and only then decides if they need to look at documents. The consultant is more effective – but their decisions are less predictable.

Characteristic	RAG Pipeline	Tool Use
Who decides "to search or not to search"	Architecture – always searches	Model – depends on context
Number of data sources	Usually one (vector DB)	Any number of tools
Predictability of behavior	High – same path always	Lower – depends on model's decision
Latency	Stable and predictable	Variable: from 0 to N tool calls
Cost per query	Fixed	Depends on the number of calls
Efficiency on simple queries	Low – unnecessary retrieve always	High – model skips unnecessary steps
Debugging complexity	Low – one deterministic path	Higher – need to log the entire decision cycle
Risk of "not finding" needed information	Only if the retriever is poor	Also if the model decides not to search

RAG within Tool Use: What It Offers and What It Costs

A RAG pipeline can be implemented as one of the tools in a Tool Use system – meaning search_knowledge_base(query) becomes a tool that the model calls when needed. This offers a real advantage: unnecessary embedding queries and searches don't happen, and tokens aren't wasted on forming context.

But there's a cost:

New failure points. In classic RAG, retrieval is guaranteed. In Tool Use, the model might decide not to search for a query where searching is critically needed. This is especially dangerous when the model is "confident" in an answer from its own knowledge, but that knowledge is outdated.
Tool description quality becomes critical. If the description is vague or doesn't cover the scenario, the model won't call the tool. In classic RAG, the description doesn't affect the search decision at all. Details on how to write descriptions are in TU-2.
Observability is more complex. In a RAG pipeline, there's always a log of the retrieve query and its result. In Tool Use, you need to separately log the model's decision (called/not called), call parameters, and the result – to understand where the response went wrong.

Practical Conclusion: When to Choose What

Stick with classic RAG if: the sole data source is a corporate document base, all queries require searching, predictability and ease of debugging are critical, SLA for latency is strict.

Switch to Tool Use if: there are multiple different data sources (DBs + APIs + files), some queries don't require external data at all, the ability for parallel queries to different systems is needed, you are building an agentic system where retrieval is just one of the scenarios.

⚠️ Pitfall

The most dangerous scenario in "RAG via Tool Use": the model responds confidently and coherently, but without calling a tool – because it decided it already knows the answer. In a corporate context (contracts, prices, regulations), this means an answer that might be correct in form but outdated in content. Always log the stop_reason: if it's "end_turn" without a preceding "tool_use" – the model did not search. Decide if this is acceptable for your scenario.

Where RAG and Tool Use Intersect — and Where They Diverge

The previous section showed the difference in architectural logic. Here's the practice: how these two approaches interact in real systems, which combination patterns work, and where the boundary between them is intentionally blurred.

Three Combination Patterns

Pattern 1: RAG as a Standalone Pipeline (without Tool Use)

The classic approach. Retrieval always happens, and the model automatically receives context. Suitable when all queries require search and predictability is more important than efficiency. AskYourDocs in its basic configuration is precisely this pattern.

Pattern 2: RAG as One of the Tools in a Tool Use System

The model receives search_knowledge_base(query) as a tool and decides when to call it. Simple queries ("what is PDF?", "hello") are handled without retrieval. Complex or specific ones trigger a search. This increases efficiency but adds a new point of failure: the model might not call the tool when it should.

Pattern 3: Multiple Specialized Tools + RAG as One of Them

The most powerful and most complex option. The model has access to multiple tools simultaneously:

tools = [
    search_knowledge_base(query),      # RAG on corporate documents
    get_contract_status(contract_id),  # query to CRM/ERP
    calculate_deadline(date, days),    # date calculation
    send_notification(user_id, text),  # action in an external system
]

Here, RAG is no longer the entire pipeline but one specialized tool among others. The model itself decides the combination: for example, first search_knowledge_base to find the contract terms, then get_contract_status to check the current status, and finally generate a response based on both results.

Where the Boundary Blurs: RAG with Internal Selection Logic

There's an intermediate option that is often underestimated: a classic RAG pipeline with pre-retrieval routing within it. Before going to the vector database, a lightweight classifier or prompt decides whether retrieval is needed for this query at all.

# Simplified pre-retrieval routing logic
def should_retrieve(query: str) -> bool:
    # Simple heuristic classifier
    factual_keywords = ["contract", "terms", "price", "regulation", "deadline"]
    return any(kw in query.lower() for kw in factual_keywords)

def answer(query: str) -> str:
    if should_retrieve(query):
        context = retriever.search(query)
        return llm.generate(query, context=context)
    else:
        return llm.generate(query)  # without retrieve

This is not "pure" Tool Use — the code makes the decision, not the model. But it's also not "pure" RAG — retrieval doesn't always happen. This hybrid offers the predictability of a classic pipeline plus some of the efficiency of the Tool Use approach. The cost is maintaining the classifier and the risk of false positives.

Pattern Comparison

Pattern	Who Decides Retrieval	Complexity	Best For
RAG Pipeline	Architecture (always)	Low	Single knowledge base, all queries require search
RAG + Pre-retrieval Routing	Classifier in code	Medium	Mixed queries, predictability needed
RAG as a Tool in Tool Use	LLM (based on context)	Medium	Primarily document queries + small percentage of "pure" LLM
Multiple Tools + RAG as One	LLM (based on context)	High	Agent systems, multiple data sources, complex workflows

Where Exactly They Intersect: Common Components

Regardless of the pattern, both approaches rely on the same fundamental components:

Embedding Model — needed in RAG for vector search, and may be needed in Tool Use for tool retrieval when there are 50+ tools. How embeddings turn text into a semantic vector is explained in Embeddings in Simple Terms: How AI Understands Meaning, Not Just Words .
Vector Database — in RAG, it stores document chunks; in Tool RAG systems, it stores tool descriptions for semantic search within the tool registry.
Reranker — in RAG, it re-evaluates the relevance of chunks; in Tool Use, it can re-evaluate tool relevance with a large registry.
Observability Layer — in both cases, it's crucial to log what was found/called and what went into the model's context.

⚠️ Pitfall #3: The Illusion of Choice

The most common architectural mistake when transitioning from RAG to Tool Use: the developer is convinced that "the model is smart and will decide when to search itself" — and doesn't spend time on a quality description for the tool. As a result, the model either fails to call the tool for queries where it's needed, or calls it for queries where it's redundant. A tool without a clear description is like a library without a catalog: technically accessible, but practically unreachable. Details on the mechanics of this decision and how to write descriptions are in TU-2: How the Model Decides When to Search.

Tool Use ≠ Agent: Why Calling a Function Isn't an Agent Yet

This is one of the most common terminological mix-ups in 2026. Upon seeing a model call get_weather() or search_documents(), developers say "I've built an agent." In reality, they've built an LLM with a tool — an important step, but not agency.

What is an Agent, Really?

Comet (2026) defines an agent by three mandatory components:

Loop: An agent doesn't make one call and stop. It operates in a loop: think → act → observe → think again. The number of steps is not predetermined.
Memory: An agent remembers what it has already done, what results it obtained, and uses this for subsequent decisions. Not just dialogue history — but intermediate execution results.
Planning: An agent can break down a complex goal into subtasks, determine the order of actions, and change the plan if something goes wrong.

Tool Use without these three components is simply LLM with RPC calls. Technically, it's no different from how a regular program calls a function. The only difference is who decides which function to call: the programmer (code) or the model (prompt).

Spectrum of Complexity: From Simple Call to True Agent

Agency is not a binary property (either an agent or not), but a spectrum. It's convenient to think in terms of "degree of agency":

L2: Conditional Tool Calls

Level	Capabilities	Example	Is it an Agent?
L0: Single Tool Call	One call, one result, then the answer	"What's the weather in Kyiv?" → `get_weather` → answer	❌ No, regular tool use
L1: Sequential Tool Calls	Multiple calls in sequence, without feedback between them	"Find document A and document B" → two calls in order	❌ No, just a pipeline
The result of the first call affects the second	"Find the contract → if it's active, then check the terms"	⚠️ Partially — nascent planning
L3: ReACT-style Agent	Full loop: Thought → Action → Observation → repeat	"Schedule a meeting: find available slots → choose the best one → send invitation"	✅ Yes, classic agent
L4: Autonomous Agent	Like L3 + long-term memory + independent subtask planning	"Prepare a monthly report" → decides itself what data to collect, in what order, and does it	✅ Yes, a full-fledged agent

Key takeaway: Tool Use appears at level L0. But true agency — the loop + planning — only emerges at L3.

Why This Understanding is Important

Developers who confuse tool use with agents make three typical mistakes:

They don't add a loop. They make one call, get a result — and think it's an agent. But if two tools are needed for the answer, the system fails.
They don't pass execution context. An agent should receive back not only the function's result but also meta-information: whether it executed successfully, how long it took, what errors occurred. Without this, the agent is "blind."
They expect magic. "I added tools — why isn't the model building a complex plan itself?" Because planning is a separate capability that requires either proper prompting or specialized frameworks (LangGraph, AutoGen, CrewAI).

What a Minimal Agent Loop Looks Like

Here's the fundamental difference between a one-time tool call and a looped agent:

# ❌ This is NOT an agent — it's tool use response = client.messages.create( tools=[search_documents], messages=[{"role": "user", "content": "find documents about the contract"}] ) result = execute_tool(response.tool_call) final = client.messages.create( messages=[..., tool_result] ) # ✅ This is a MINIMAL agent — with a loop max_iterations = 5 while iterations < max_iterations and not finished: response = client.messages.create( tools=tools, messages=conversation_history ) if response.stop_reason == "tool_use": for tool_call in response.tool_calls: result = execute_tool(tool_call) conversation_history.append(tool_result(tool_call.id, result)) continue # ← loop! return to the model with the result # stop_reason == "end_turn" — the agent decided the answer is ready finished = True return response.content

The difference is one line — continue. But it's this line that creates the loop, allowing the model to take multiple steps, see the results of its actions, and make subsequent decisions based on what it has observed. Without a loop, it's just RPC. With a loop, it's the seed of an agent.

⚠️ Pitfall #4: An Agent Isn't Always Better

Agents sound cool. But they come with a real cost:

Unpredictable number of calls. One query might generate 1 tool call, and another might generate 10 — depending on complexity and prompt quality. Token costs and latency become non-deterministic.
Risk of infinite loops. Without protective mechanisms, an agent can get stuck in an endless loop: call tool → get error → call again → error again. Always add max_iterations and detection of repeating patterns.
Debugging complexity. In regular tool use, you know: one call was made → got a response. With an agent, it's a chain of N steps, each of which could have gone wrong. Observability tools (langfuse, helicone, arize) become not an option, but a necessity.

Practical advice: start with L0 (single tool call) or L1 (sequential). Add a loop and planning only when a simple pipeline truly hits limitations. An agent is a tool for complex tasks, not a default architecture.

In Short: Remember the Formula

Tool Use = LLM decides which tool to call.
Agent = Tool Use + loop + memory + planning.

If you only have the first line — you don't have an agent. And that's okay. Most tasks don't require full agency. But calling things by their proper names is important for setting correct expectations with the team and stakeholders.

From RAG Pipeline to Tool Use: What's Next

If you came from the RAG Hub , your current stack looks something like this: documents → chunking → embed → Qdrant → BM25 + rerank → LLM. This is a classic RAG pipeline — deterministic, predictable, well-debugged.

The Tool Use Hub, which you are currently reading, describes what happens around this pipeline:

How the model decides when to initiate a search at all → How LLM Models Decide When to Search — Decision-Making Mechanics

AskYourDocs is a specific product example where the RAG pipeline is one of the tools in a broader system: hybrid search over corporate documents, complete data isolation on the client's server, without vendor lock-in at the LLM provider level.

❓ Frequently Asked Questions

What's the difference between Function Calling and Tool Use?

Function Calling is the original OpenAI term. Tool Use is a broader term from Anthropic, which also includes built-in tools (web search, code interpreter). The mechanics are the same: the model returns JSON, your code executes it.

Does the LLM execute the function itself?

No. The model generates a structured JSON request with the function name and arguments. Execution is always on your application's side.

How does RAG differ from Tool Use?

RAG is a deterministic pipeline where retrieval always occurs. Tool Use is a mechanism where the model itself decides if calling a tool is necessary. RAG can be implemented as one of the tools in a Tool Use system.

When to use tool_choice: required?

Only in deterministic pipelines — structured output, mandatory logging. In conversational mode, it leads to unnecessary tool calls and token costs. Also: required / any are incompatible with extended thinking in Claude.

✅ Conclusions

Function Calling and Tool Use are the same mechanics with different names. The model generates JSON, your code executes it.
tool_choice offers three control modes: auto (default), required (forced), none (text only).
RAG pipelines and Tool Use exist at different levels of abstraction — the former is deterministic, the latter is adaptive.
RAG can be a tool within a Tool Use system — but this increases unpredictability and requires precise tool descriptions.
The classic RAG remains the best choice for products with a single knowledge base and strict predictability requirements.

Next step: how exactly the model decides whether to call a tool or not — and how the tool's description affects this — we'll cover in TU-2: How the Model Decides When to Search.

Sources

Symflower — Function calling in LLM agents (2025) · Martin Fowler — Function calling using LLMs (2025) · Anthropic Docs — How to implement tool use · Prompt Engineering Guide — Function Calling with LLMs · Simplicity is SOTA — How LLMs are trained for function calling (2025) · Berkeley BFCL V4 (2025) · Comet — Agent Orchestration (2026)

Categories