What is LLM Tool Calling?

LLM tool calling is a model capability where, instead of generating plain text, the model outputs a structured function call with a name and arguments -- enabling AI applications to execute code, query databases, and interact with external systems.

Large language models generate text. But text alone cannot query a database, send an email, read a file, or call an API. Tool calling bridges this gap by enabling a model to output structured function calls that an application can execute on the model's behalf.

Without tool calling, developers resort to prompt engineering -- instructing the model to output JSON in a specific format, then parsing that output with custom code. This approach is fragile: the model might produce malformed JSON, hallucinate function names, or include extra text around the structured output. Tool calling formalizes this interaction, giving the model a typed interface for expressing "I want to call this function with these arguments."

Tool calling is the foundation of agentic AI. Every autonomous agent that plans, executes multi-step tasks, and interacts with the real world depends on the ability to invoke tools reliably and correctly.

How Tool Calling Works

A tool calling interaction follows a defined loop between the application, the model, and external systems.

Step 1: Define Available Tools

The application provides the model with a set of tool definitions -- each specifying a name, description, and parameter schema. These definitions tell the model what tools are available and how to call them.

{
  "tools": [
    {
      "name": "query_database",
      "description": "Execute a read-only SQL query against the application database",
      "parameters": {
        "type": "object",
        "properties": {
          "sql": {
            "type": "string",
            "description": "The SQL query to execute"
          }
        },
        "required": ["sql"]
      }
    },
    {
      "name": "search_documents",
      "description": "Search indexed documents using natural language",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "limit": { "type": "integer", "default": 10 }
        },
        "required": ["query"]
      }
    }
  ]
}

The quality of tool definitions directly affects how well the model uses them. Clear, specific descriptions and well-typed parameter schemas reduce errors and hallucinated arguments.

Step 2: Model Selects a Tool

Given the user's message and the available tool definitions, the model decides whether to respond with text or invoke a tool. If it chooses a tool, it outputs a structured object with the tool name and arguments:

{
  "tool_call": {
    "name": "query_database",
    "arguments": {
      "sql": "SELECT customer_name, SUM(amount) as total FROM orders WHERE created_at > '2026-01-01' GROUP BY customer_name ORDER BY total DESC LIMIT 10"
    }
  }
}

The model does not execute the tool. It produces a structured request that the application interprets.

Step 3: Application Executes the Tool

The application receives the tool call, validates the arguments, and executes the function. This is where security controls, rate limiting, and authorization checks are applied. The application -- not the model -- decides whether the tool call is safe to execute.
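A minimal dispatch layer, sketched in Python, makes this concrete. The registry, handler, and error strings here are hypothetical stand-ins, not a specific framework's API; a real application would also apply authorization and logging at this point:

```python
# Sketch of application-side tool dispatch (hypothetical registry).
# The application -- not the model -- validates and executes each call.

def search_documents(query: str, limit: int = 10) -> str:
    # Stand-in for a real search backend
    return f"{limit} results for '{query}'"

# Registry mapping tool names to (handler, required parameter names)
TOOL_REGISTRY = {
    "search_documents": (search_documents, {"query"}),
}

def execute_tool_call(call: dict) -> str:
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"  # model hallucinated a tool
    handler, required = TOOL_REGISTRY[name]
    missing = required - args.keys()
    if missing:
        return f"Error: missing required arguments: {sorted(missing)}"
    return handler(**args)

result = execute_tool_call(
    {"name": "search_documents", "arguments": {"query": "churn", "limit": 3}}
)
print(result)  # -> 3 results for 'churn'
```

Returning an error string (rather than raising) lets the model see the failure and correct its next call.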

Step 4: Result Fed Back to Model

The tool's output is sent back to the model as a new message in the conversation. The model then reasons about the result and either responds to the user with text or makes another tool call.

{
  "role": "tool",
  "name": "query_database",
  "content": "[{\"customer_name\": \"Acme Corp\", \"total\": 142500}, {\"customer_name\": \"Globex\", \"total\": 98300}]"
}

This loop -- tool call, execution, result, reasoning -- can repeat multiple times in a single interaction, enabling multi-step workflows.
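The whole loop can be sketched in a few lines of Python. The model is stubbed here with a `fake_model` function so the example is self-contained; a real implementation would call an LLM API in its place:

```python
# Sketch of the tool calling loop with a stubbed model for illustration.
# A real implementation would call an LLM API where `fake_model` appears.
import json

def query_database(sql: str) -> str:
    return json.dumps([{"customer_name": "Acme Corp", "total": 142500}])

TOOLS = {"query_database": query_database}

def fake_model(messages):
    # First turn: request a tool call; after seeing a tool result: answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "query_database",
                              "arguments": {"sql": "SELECT ..."}}}
    return {"text": "Acme Corp leads with $142,500 in orders."}

def run_loop(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_model(messages)
        if "text" in reply:                # model chose to respond
            return reply["text"]
        call = reply["tool_call"]          # model chose a tool
        output = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": output})  # feed the result back

print(run_loop("Who is our top customer?"))
```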

The Multi-Step Tool Calling Loop

Simple questions need a single tool call. Complex tasks require multiple steps where the output of one tool informs the next. Consider a user asking: "Which of our enterprise customers had the highest support ticket volume last quarter, and what were the top issues?"

A capable agent might execute this sequence:

  1. Call query_database to get enterprise customers from the CRM
  2. Call query_database to get support tickets for those customers in the last quarter
  3. Call query_database to aggregate tickets by issue category
  4. Reason about the results and produce a summary

Each step depends on the previous step's output. The model plans the sequence, executes tools iteratively, and synthesizes the results into a coherent response. This multi-step reasoning is what distinguishes agentic tool use from simple function calling.

Parallel Tool Calls

Some models support parallel tool calling, where multiple independent tools are invoked in a single turn. If the model needs both customer data and product data, it can issue both queries simultaneously rather than sequentially. This reduces round trips and improves latency in multi-step workflows.

Tool Calling vs. Prompt Engineering for Structured Output

Before tool calling was widely available, developers extracted structured actions from models using prompt engineering:

You are a helpful assistant. When the user asks for data, respond with
a JSON object like: {"action": "query", "sql": "SELECT ..."}
Do not include any other text in your response.

This approach has several problems:

  • Unreliable formatting: The model might include markdown code fences, explanatory text, or malformed JSON.
  • No schema validation: There is no formal contract between the model's output and the expected structure.
  • Ambiguous intent: The model might respond with text when a tool call was expected, or vice versa.
  • No tool discovery: Adding new tools requires rewriting the system prompt rather than adding a typed definition.

Tool calling solves these problems by making function invocation a first-class capability of the model. The model produces typed, validated tool calls through a dedicated output channel, separate from text generation. This is more reliable, easier to maintain, and scales to dozens or hundreds of tools.

How MCP Standardizes Tool Calling

The Model Context Protocol (MCP) standardizes how AI applications discover, connect to, and invoke tools across distributed servers. Without MCP, every AI application implements its own tool calling integration for each external service. MCP defines a universal protocol so a tool built once works with any MCP-compatible client.

MCP's contribution to tool calling is threefold:

Discovery: MCP servers expose tool manifests -- machine-readable descriptions of available tools, their parameters, and their capabilities. An AI application connecting to an MCP server automatically discovers what tools are available without hardcoded configurations.

Transport: MCP defines how tool calls and results are transmitted between the AI application (client) and the tool provider (server), supporting both local execution (stdio) and remote execution (SSE over HTTP).

Interoperability: A tool exposed as an MCP server works with Claude, GitHub Copilot, Cursor, and any other MCP-compatible client. This eliminates the O(N x M) integration problem where N applications each need custom integrations for M tools.

Tool Calling for Data Access

One of the most common tool calling patterns is giving AI models access to data through SQL queries, API calls, or search operations.

SQL as a Tool

When a model has access to a query_database tool, it can answer data questions by writing and executing SQL. This is more flexible than pre-computed dashboards because the model generates queries dynamically based on the user's specific question.

SQL federation makes this pattern even more powerful. A single tool can provide access to PostgreSQL, MySQL, Snowflake, S3, and 30+ other data sources through one interface. The model writes a SQL query; the federation engine routes it to the correct source.
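As a self-contained sketch of the pattern, the tool handler below runs queries against an in-memory SQLite database (the table and data are invented for illustration); a production setup would point at a real or federated engine instead:

```python
# Sketch of a query_database tool handler backed by SQLite (stdlib).
# Table schema and rows are illustrative only.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Acme Corp", 142500), ("Globex", 98300)])

def query_database(sql: str) -> str:
    """Tool handler: run a query, return rows as JSON for the model."""
    cursor = conn.execute(sql)
    columns = [c[0] for c in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps(rows)

print(query_database(
    "SELECT customer_name, SUM(amount) AS total FROM orders "
    "GROUP BY customer_name ORDER BY total DESC"))
```

Serializing rows to JSON matters: the model consumes the tool result as text, so a predictable, compact format improves its reasoning over the data.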

Search as a Tool

Models that need to retrieve relevant documents or context can use search tools. A hybrid search tool combines keyword and semantic search to find relevant content, which the model then uses to generate informed responses. This is the tool-calling-based approach to retrieval-augmented generation (RAG).

API Calls as Tools

Tools can wrap any HTTP API -- reading from a CRM, posting a message to Slack, creating a Jira ticket, or triggering a deployment. Each API endpoint becomes a tool with a defined schema, and the model invokes it as needed during multi-step workflows.

Security Considerations

Tool calling introduces a new attack surface: the model can now take actions, not just produce text. Security must be treated as a first-class concern.

Input Validation

Every tool call argument must be validated before execution. A model generating SQL queries could produce destructive statements (DROP TABLE, DELETE FROM without a WHERE clause). The application must enforce read-only constraints, parameterize inputs, and reject malformed queries.

# Always validate and constrain tool call arguments
# `db` is assumed to be an existing database handle with read-only credentials.
import json
def execute_query(sql: str) -> str:
    # Reject write operations (an illustrative blocklist; allowlisting
    # SELECT/WITH statements is stricter in practice)
    normalized = sql.strip().upper()
    if any(normalized.startswith(kw) for kw in ["DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE"]):
        return "Error: Only read-only queries are permitted."

    # Execute with timeout and row limit
    result = db.execute(sql, timeout=5000, max_rows=1000)
    return json.dumps(result)

Sandboxing

Tools that execute code, access file systems, or interact with infrastructure should run in sandboxed environments with minimal permissions. A code execution tool should run in a container with no network access and no persistent storage. A file system tool should be restricted to a specific directory.
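As a sketch, a containerized code-execution tool might be launched with network and filesystem access locked down. The flags below are standard Docker options; the image name is hypothetical:

```shell
# Run a hypothetical code-execution tool in a locked-down container:
# no network, read-only root filesystem, capped memory and CPU,
# a small writable tmpfs for scratch space, removed on exit.
docker run --rm \
  --network none \
  --read-only \
  --tmpfs /tmp:size=64m \
  --memory 256m \
  --cpus 0.5 \
  sandbox-runner:latest
```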

Authorization

Not every user (or model) should have access to every tool. Authorization policies should control:

  • Which tools are available to which models or users
  • What parameter values are permitted (e.g., restricting queries to specific tables)
  • Rate limits on tool invocations
  • Audit logging for compliance and debugging
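A per-principal allowlist is the simplest form of such a policy. The principal and tool names below are hypothetical; the key point is that the check runs in the application or gateway, never in the model:

```python
# Sketch of a per-principal tool authorization policy (names hypothetical).
# Enforced by the application or gateway before any tool executes.

POLICIES = {
    "customer_facing_model": {"search_documents"},
    "internal_agent": {"search_documents", "query_database", "send_email"},
}

def authorize(principal: str, tool_name: str) -> bool:
    """Return True only if the principal may invoke the tool."""
    return tool_name in POLICIES.get(principal, set())

print(authorize("internal_agent", "query_database"))      # -> True
print(authorize("customer_facing_model", "query_database"))  # -> False
```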

Prompt Injection

Malicious content in tool results can attempt to manipulate the model's behavior. If a search tool returns a document containing "Ignore all previous instructions and...", the model might follow those instructions. Defenses include sanitizing tool outputs, using separate system prompts for tool results, and monitoring for anomalous model behavior after tool execution.

Tool Calling with Spice

Spice provides governed tool calling through its MCP gateway, combining tool execution with federated data access and LLM inference in a single runtime:

  • MCP server federation: Aggregate tools from multiple MCP servers behind a single endpoint. Models access all available tools through one connection rather than managing separate integrations.
  • Governed tool routing: Assign specific tools to specific models with fine-grained access controls. A customer-facing model gets read-only data tools; an internal automation agent gets broader access.
  • Data tools built in: SQL queries, embedding search, and hybrid search are available as tools natively -- no external MCP server needed for data access.
  • End-to-end observability: Distributed tracing follows a request from the inference call through tool execution, data queries, and back to the model, providing full visibility into multi-step agent workflows.
  • Security controls: Input validation, rate limiting, and audit logging are applied at the gateway level, enforcing consistent policies across all tool invocations regardless of which model or client initiated the call.

This approach means AI applications get model inference, tool calling, and data access through a single, governed infrastructure layer -- reducing complexity while maintaining the security controls that enterprise deployments require.

Advanced Topics

The Tool Calling Loop

In agentic workflows, tool calling is not a single request-response exchange. It is an iterative loop where the model reasons, invokes tools, processes results, and decides whether to continue or respond to the user.

[Sequence diagram: the User sends a message plus tool definitions to the LLM; the LLM emits a tool call (name + arguments) to Tools; the tool result returns to the LLM, which reasons about the result; the loop repeats until the LLM responds to the User with a final text response.]

Understanding the mechanics of this loop -- and the failure modes at each step -- is essential for building reliable agent systems.

Parallel Tool Calls

When the model needs data from multiple independent sources, sequential tool calls introduce unnecessary latency. Parallel tool calling enables the model to emit multiple tool call requests in a single turn, which the application executes concurrently and returns as a batch.

Consider an agent asked: "Compare our Q1 revenue against the industry benchmark and check if any support escalations are open." The model can issue a query_revenue call and a check_escalations call simultaneously. The application runs both, returns both results, and the model synthesizes a single response.

Not all models support parallel tool calls natively. For models that do (including Claude and GPT-4), the tool call response contains an array of calls rather than a single call. The application must match each result back to the correct call ID when returning results. For models that don't support parallel calls, the application can implement a planning layer that detects independent tool calls across sequential turns and executes them concurrently, returning results in a single batch.

Parallel tool calls reduce round trips and end-to-end latency proportionally to the number of independent calls. In multi-step agent workflows with 3-5 independent data lookups, parallel execution can cut total latency by 60-80%.
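The application side of a parallel batch can be sketched with asyncio. The tool names mirror the example above and are hypothetical; the essential detail is pairing each result with the call ID the model assigned:

```python
# Sketch: execute a batch of parallel tool calls concurrently and match
# each result back to its call ID (tool names are hypothetical).
import asyncio

async def query_revenue(quarter: str) -> str:
    await asyncio.sleep(0.01)          # simulate I/O latency
    return f"revenue for {quarter}: $4.2M"

async def check_escalations() -> str:
    await asyncio.sleep(0.01)
    return "2 open escalations"

TOOLS = {"query_revenue": query_revenue, "check_escalations": check_escalations}

async def execute_batch(calls: list[dict]) -> list[dict]:
    coros = [TOOLS[c["name"]](**c["arguments"]) for c in calls]
    outputs = await asyncio.gather(*coros)      # run concurrently
    # Pair each output with the call ID the model assigned
    return [{"call_id": c["id"], "content": out}
            for c, out in zip(calls, outputs)]

results = asyncio.run(execute_batch([
    {"id": "call_1", "name": "query_revenue", "arguments": {"quarter": "Q1"}},
    {"id": "call_2", "name": "check_escalations", "arguments": {}},
]))
print(results)
```

Because `asyncio.gather` preserves input order, the zip against the original calls is a safe way to reattach IDs.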

Tool Call Chaining and Planning

Complex tasks require the model to plan a sequence of tool calls where each step depends on the output of the previous one. This is tool call chaining -- the model decomposes a high-level objective into an ordered sequence of tool invocations.

A user asking "Find the customer with the highest churn risk and draft a retention email based on their recent activity" requires:

  1. Call query_database to retrieve churn risk scores
  2. Call query_database to fetch the top customer's recent activity
  3. Call send_email (or draft the email in text) based on the activity data

The model must plan this chain, execute each step, validate intermediate results, and adjust the plan if unexpected data appears (e.g., the highest-risk customer has no recent activity on record).

Effective chaining depends on the model's ability to maintain a coherent plan across multiple turns. Providing the model with explicit planning instructions in the system prompt -- "Think step by step about what information you need before taking action" -- improves chaining reliability. Some frameworks (like LangChain's plan-and-execute pattern) formalize this by having the model output an explicit plan before executing any tools.

Error Recovery Patterns

Tool calls fail. Databases time out, APIs return errors, arguments are malformed, and rate limits are hit. A robust tool calling system needs strategies for handling these failures gracefully.

Retry with backoff is the simplest pattern: if a tool call returns a transient error (timeout, rate limit, 503), the application retries with exponential backoff before returning a permanent failure to the model. The model should not be responsible for implementing retry logic -- this is an application-layer concern.
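A minimal sketch of that pattern, with an invented `TransientToolError` and a flaky stub tool standing in for a real backend:

```python
# Sketch of application-layer retry with exponential backoff for transient
# tool failures. `TransientToolError`, delays, and the stub are illustrative.
import time

class TransientToolError(Exception):
    """Timeout, rate limit, 503, etc. -- worth retrying."""

def call_with_retries(tool, *args, attempts=3, base_delay=0.05):
    for attempt in range(attempts):
        try:
            return tool(*args)
        except TransientToolError:
            if attempt == attempts - 1:
                raise                                   # permanent failure
            time.sleep(base_delay * (2 ** attempt))     # 0.05s, 0.1s, ...

failures = {"count": 0}

def flaky_tool(sql):
    # Fails twice, then succeeds -- simulates a transient outage.
    if failures["count"] < 2:
        failures["count"] += 1
        raise TransientToolError("timeout")
    return "42 rows"

print(call_with_retries(flaky_tool, "SELECT ..."))  # -> 42 rows
```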

Fallback tools provide alternative paths when the primary tool fails. If a real-time API is unavailable, the application can fall back to a cached data source or a different API that provides approximate data. The model receives the result with a note that it came from a fallback source.

Graceful degradation means the model explains what it could not do rather than failing silently. If a tool call fails after retries and no fallback is available, the model should report the specific failure ("I wasn't able to retrieve the latest revenue data because the database connection timed out") and offer what it can provide from the context it has. This is preferable to hallucinating an answer or returning a generic error message.

Error context matters: when returning a tool failure to the model, include the error type, a human-readable message, and whether the error is transient or permanent. This gives the model enough information to decide whether to retry, use an alternative approach, or report the issue to the user.
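One way to structure that context, sketched as the tool-result message returned to the model (the field names are illustrative, not a standard):

```python
# Illustrative shape for a tool failure returned to the model: the error
# type, a human-readable message, and whether a retry could succeed.
import json

tool_error = {
    "role": "tool",
    "name": "query_database",
    "content": json.dumps({
        "error": "timeout",
        "message": "Database connection timed out after 5 seconds.",
        "transient": True,   # hints to the model that retrying may help
    }),
}
print(tool_error["content"])
```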

Tool Calling FAQ

What is the difference between tool calling and function calling?

Tool calling and function calling refer to the same capability -- the model outputs a structured function invocation instead of plain text. "Function calling" was the original term used by OpenAI; "tool calling" is the more general term adopted across the industry. MCP and most frameworks use "tool calling" to describe this capability.

How does the model decide which tool to use?

The model selects tools based on the user's message, the conversation history, and the tool definitions (name, description, parameter schema) provided by the application. Clear, specific tool descriptions are critical -- they help the model match user intent to the correct tool. When multiple tools could apply, the model uses the descriptions and parameter schemas to choose the best fit.

What are the security risks of tool calling?

Tool calling allows models to take actions, not just produce text, which introduces risks: SQL injection through generated queries, unauthorized access to sensitive APIs, destructive operations (deleting data, modifying configurations), and prompt injection through tool results. Mitigations include input validation, read-only constraints, sandboxed execution, authorization policies, and audit logging.

How does tool calling differ from retrieval-augmented generation (RAG)?

RAG retrieves relevant documents and injects them into the model's context before generation. Tool calling lets the model invoke arbitrary functions -- which can include retrieval, but also database queries, API calls, code execution, and actions. Tool calling is more general: RAG is one pattern that can be implemented through tool calling (a search tool), but tool calling supports many other patterns beyond retrieval.

Can a model use multiple tools in a single interaction?

Yes. Models can make sequential tool calls where the output of one tool informs the next, enabling multi-step workflows. Some models also support parallel tool calling, where multiple independent tools are invoked simultaneously in a single turn. Multi-step tool use is the foundation of agentic AI, where models plan and execute complex tasks autonomously.

See Spice in action

Get a guided walkthrough of how development teams use Spice to query, accelerate, and integrate AI for mission-critical workloads.
