---
title: 'Tool Registry'
sidebar_label: 'Tool Registry'
description: 'Reduce per-turn token cost and improve LLM tool selection accuracy by replacing individual tool definitions with searchable tool_search and tool_invoke meta-tools backed by hybrid full-text, keyword, schema, and vector search.'
sidebar_position: 13
pagination_prev: null
pagination_next: null
tags:
---
The Tool Registry is a runtime-level capability that replaces large lists of individual tool definitions with two meta-tools — tool_search and tool_invoke — backed by a hybrid search index over the runtime's tool catalog. It's used to keep per-turn token cost bounded as the catalog grows, while preserving the model's ability to discover and call any tool on demand.
The registry indexes every tool that's callable from an LLM, regardless of where it came from:
- Built-in tools (`sql`, `list_datasets`, `table_schema`, `search`, `random_sample`, …)
- MCP tools (`sse` or `stdio`)
- Functions declared with `as_tool: true` (the default)
- `tools:` entries with `as_sql: true` (callable from both SQL and the LLM)

If it can be called from a chat completion, it goes through the registry.
Each tool exposed to a model carries a name, a description, and a JSON Schema for its parameters. A typical tool is 200–500 tokens of schema; a Spicepod with rich MCP integrations, several datasets exposed via sql / table_schema / search, and custom user-defined functions can quickly cross 50 tools and 10,000+ tokens of tool definitions injected into every chat turn.
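As a rough back-of-the-envelope check of those figures, the sketch below multiplies a tool count by the per-tool range quoted above. The function name `tool_definition_tokens` and the 20-turn conversation length are hypothetical, and real token counts depend on the tokenizer and on each tool's schema.

```python
def tool_definition_tokens(tool_count: int, tokens_per_tool: int) -> int:
    """Tokens of tool definitions injected into a single chat turn."""
    return tool_count * tokens_per_tool


# 50 tools at the 200-500 tokens-per-tool range quoted above.
low = tool_definition_tokens(50, 200)
high = tool_definition_tokens(50, 500)
print(f"{low:,} to {high:,} tokens per turn")

# The same definitions are injected again on every turn of a conversation.
turns = 20  # assumed conversation length, for illustration only
print(f"{low * turns:,} to {high * turns:,} tokens over {turns} turns")
```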
That cost is paid on every request.
The Tool Registry replaces every individual tool definition with just two meta-tools:
- `tool_search(query, ...)`: Searches the registry for tools relevant to a natural-language query. Returns the top N tools with their full schemas.
- `tool_invoke(tool_id, arguments)`: Invokes a tool returned by `tool_search`.

For a workload with 50 tools, this is roughly a 10× reduction in tool-definition tokens injected per turn, because the model now sees only the schemas of the tools it actively asks for.
list_datasets is always exposed directly alongside the meta-tools so the model can orient itself ("what tables exist?") in a single call without first asking the registry.
The registry is the right default for any model that has access to a substantial number of tools — particularly when those tools include:
- … `tools:` section).

It's less useful when:
- … `tool_search` (saves one tool call per turn).

For everything else, especially Spicepods that compose multiple tool sources, `tools: auto` is the recommended default.
The registry is controlled via the `tools` parameter on a model. Set it to `search_registry` to require registry-based discovery, or `auto` to let Spice decide.
tools: auto switches to the registry only when both of these are true:
- … `AUTO_SEARCH_TOOL_THRESHOLD`).

Otherwise `auto` falls back to providing tools directly, keeping small Spicepods ergonomic while large ones automatically benefit. See the Tool Modes table for the full set of values.
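A minimal sketch of how an `auto` decision of this kind could be expressed. Both conditions in the code are assumptions made for illustration: that the tool count is compared against `AUTO_SEARCH_TOOL_THRESHOLD`, and that a usable embedding model must be available. The threshold value of 10, the `>` comparison, the `resolve_auto_mode` helper, and the `"direct"` label are all hypothetical; the real values live in the runtime.

```python
from typing import Optional

# Hypothetical value: the real AUTO_SEARCH_TOOL_THRESHOLD is defined by the runtime.
AUTO_SEARCH_TOOL_THRESHOLD = 10


def resolve_auto_mode(tool_count: int, embedding_model: Optional[str]) -> str:
    """Return "search_registry" or "direct" for a model configured with tools: auto."""
    over_threshold = tool_count > AUTO_SEARCH_TOOL_THRESHOLD  # assumed comparison
    has_embeddings = embedding_model is not None  # assumed second condition
    if over_threshold and has_embeddings:
        return "search_registry"
    return "direct"


print(resolve_auto_mode(50, "my_embeddings"))
print(resolve_auto_mode(50, None))
print(resolve_auto_mode(3, "my_embeddings"))
```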
### `tool_embedding_model`

The registry's vector channel uses a configured embedding model:

- … `tool_embedding_model` is required and must name one of them.
- … `tools: search_registry` is rejected; `tools: auto` falls back to direct tools with a warning log.

## How `tool_search` Ranks Results

`tool_search` runs a hybrid search over four channels and fuses the results with Reciprocal Rank Fusion (RRF):
| Channel | Signal |
|---|---|
| `full_text` | TF-IDF over tokenized tool name (×3 weight), description (×2), and parameters (×1). |
| `keyword` | Exact-phrase and token matches against name / description / parameter text. Weighted by where the match lands. |
| `schema` | Matches against the parameter keys in the tool's JSON Schema (e.g. `dataset`, `query`). |
| `vector` | Cosine similarity between the query embedding and per-tool document embeddings. |
Each channel produces a ranked list; RRF combines the ranks (not the scores) so a tool that places top-3 in two channels usually outranks one that places top-1 in a single channel. The final score is normalized to 0.0–1.0 against the highest-scoring tool in the result set.
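The fusion step can be illustrated with the textbook form of RRF, where each tool's score is the sum of `1 / (k + rank)` over the channels that returned it. This is a sketch, not the runtime's implementation: the constant `k = 60` is a common default that is assumed here, channels are weighted equally, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

RRF_K = 60  # assumed smoothing constant; not confirmed for this registry


def rrf_fuse(channels: Dict[str, List[str]], k: int = RRF_K) -> Dict[str, float]:
    """Fuse per-channel rankings (best first) into scores normalized to 0.0-1.0."""
    raw: Dict[str, float] = defaultdict(float)
    for ranking in channels.values():
        for rank, tool in enumerate(ranking, start=1):
            # Only the rank contributes; the channel's own score is ignored.
            raw[tool] += 1.0 / (k + rank)
    if not raw:
        return {}
    top = max(raw.values())
    # Normalize against the highest-scoring tool in the result set.
    return {tool: score / top for tool, score in raw.items()}


channels = {
    "full_text": ["sql", "list_datasets", "table_schema"],
    "keyword": ["search", "random_sample", "table_schema"],
    "schema": [],
    "vector": [],
}
scores = rrf_fuse(channels)
for tool, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tool:15s} {score:.3f}")
```

In this made-up input, `table_schema` is only third in each of two channels, yet it ends up ahead of `sql` and `search`, which are each first in a single channel.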
Per-tool embeddings are computed lazily on first search and cached for the lifetime of the registry instance. The runtime keeps an LRU cache (up to 64 entries) of search-tool instances keyed on (runtime, embedding model, tools hash) so a Spicepod that hot-reloads tools without restarting the runtime doesn't pay the embedding cost repeatedly.
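The caching behavior described above can be pictured with a small LRU keyed on the same triple. This is an illustration in Python rather than the runtime's own code; `SearchToolCache`, `get_or_build`, and the key strings are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

CacheKey = Tuple[str, str, str]  # (runtime id, embedding model, tools hash)


class SearchToolCache:
    """LRU cache of search-tool instances, capped at 64 entries by default."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[CacheKey, object]" = OrderedDict()

    def get_or_build(self, key: CacheKey, build: Callable[[], object]) -> object:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        instance = build()  # new instance; its embeddings are computed on first search
        self._entries[key] = instance
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return instance

    def __len__(self) -> int:
        return len(self._entries)


cache = SearchToolCache()
builds: List[str] = []


def build_search_tool() -> object:
    builds.append("built")
    return object()


key = ("runtime-1", "embed-model", "tools-hash-abc")
first = cache.get_or_build(key, build_search_tool)
second = cache.get_or_build(key, build_search_tool)  # same tools hash: instance is reused
third = cache.get_or_build(("runtime-1", "embed-model", "tools-hash-def"), build_search_tool)
print(len(builds), len(cache))
```

Because the tools hash is part of the key, a reload that leaves the tool set unchanged maps to the same entry, while a changed tool set gets a fresh instance.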
## `tool_search` Reference

The model calls `tool_search` with a JSON object:
| Parameter | Type | Description |
|---|---|---|
| `query` | string (required) | Natural-language description of the capability the model needs. |
| `keywords` | string[] | Optional exact-match phrases that boost the keyword channel. Useful for column or table names. |
| `limit` | integer | Maximum results to return. Defaults to 5, capped at 20. |
| `min_score` | number | Optional minimum score (0.0–1.0). When the cutoff filters out everything, the registry still returns the unfiltered top match as a fallback so the model isn't left empty-handed. |
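The `limit` and `min_score` rules in the table can be sketched as a post-processing step over an already ranked list. The helper name `select_results` and the `>=` comparison are assumptions, and limits below 1 are not given any special handling here.

```python
from typing import List, Optional, Tuple

DEFAULT_LIMIT = 5
MAX_LIMIT = 20

Ranked = List[Tuple[str, float]]  # (tool_id, score) pairs, best first


def select_results(
    ranked: Ranked,
    limit: Optional[int] = None,
    min_score: Optional[float] = None,
) -> Ranked:
    """Apply the limit default, the limit cap, and the min_score fallback."""
    effective_limit = DEFAULT_LIMIT if limit is None else min(limit, MAX_LIMIT)
    kept = [r for r in ranked if min_score is None or r[1] >= min_score]
    if not kept and ranked:
        kept = ranked[:1]  # cutoff removed everything: return the unfiltered top match
    return kept[:effective_limit]


ranked = [("sql", 0.4), ("table_schema", 0.3), ("search", 0.1)]
print(select_results(ranked, min_score=0.25))
print(select_results(ranked, min_score=0.9))
```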
Example call (issued by the model):
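One way such a call could look, sketched in Python so the arguments can be checked against the parameter table. The query text, keywords, and numeric values are made up for illustration, and `validate_tool_search_args` is a hypothetical helper, not part of the runtime.

```python
import json
from typing import Any, Dict, List


def validate_tool_search_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a tool_search argument object."""
    problems = []
    if not isinstance(args.get("query"), str):
        problems.append("query is required and must be a string")
    keywords = args.get("keywords", [])
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        problems.append("keywords must be a list of strings")
    limit = args.get("limit", 5)
    if isinstance(limit, bool) or not isinstance(limit, int):
        problems.append("limit must be an integer")
    min_score = args.get("min_score")
    if min_score is not None:
        is_number = isinstance(min_score, (int, float)) and not isinstance(min_score, bool)
        if not is_number or not 0.0 <= min_score <= 1.0:
            problems.append("min_score must be a number between 0.0 and 1.0")
    return problems


# Illustrative arguments only; not taken from a real session.
call = {
    "query": "find the average order value per customer",
    "keywords": ["orders", "customer_id"],
    "limit": 3,
    "min_score": 0.3,
}
assert validate_tool_search_args(call) == []
print(json.dumps(call, indent=2))
```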
### `tool_search` Response

`match_sources` is intentionally surfaced: it lets the model (or a debugger) reason about why a tool was returned. A tool that matched only on `vector` and not on `full_text` may be a semantic match for an unfamiliar phrasing; one that matched all four channels is a high-confidence hit.
## `tool_invoke` Reference

| Parameter | Type | Description |
|---|---|---|
| `tool_id` | string | Tool name returned by `tool_search`. |
| `arguments` | object | JSON object matching the selected tool's parameter schema. Defaults to `{}`. |
Example:
### `tool_invoke` Response

Errors propagate the underlying tool's error message, prefixed with the `tool_id`, so the model can decide whether to retry, ask for a different tool, or surface the failure to the user.
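The dispatch and error-prefixing behavior can be sketched as a thin wrapper around a name-to-callable map. Everything here is illustrative: the `tool_id: message` format, the `ToolInvokeError` type, the handling of unknown ids, and the two stand-in tools are assumptions, not the runtime's actual output.

```python
from typing import Any, Callable, Dict, List, Optional


class ToolInvokeError(Exception):
    """Raised when the underlying tool fails; the message starts with the tool_id."""


def tool_invoke(
    registry: Dict[str, Callable[..., Any]],
    tool_id: str,
    arguments: Optional[Dict[str, Any]] = None,
) -> Any:
    args = {} if arguments is None else arguments  # arguments defaults to {}
    tool = registry.get(tool_id)
    if tool is None:
        raise ToolInvokeError(f"{tool_id}: unknown tool")  # assumed handling
    try:
        return tool(**args)
    except Exception as exc:
        # Prefix the underlying message with the tool_id (format is assumed).
        raise ToolInvokeError(f"{tool_id}: {exc}") from exc


def list_datasets() -> List[str]:
    return ["orders", "customers"]  # stand-in data


def table_schema(dataset: str) -> Dict[str, str]:
    if dataset != "orders":
        raise ValueError(f"dataset '{dataset}' not found")
    return {"order_id": "int", "total": "double"}  # stand-in schema


registry = {"list_datasets": list_datasets, "table_schema": table_schema}

print(tool_invoke(registry, "list_datasets"))
try:
    tool_invoke(registry, "table_schema", {"dataset": "missing"})
except ToolInvokeError as err:
    print(err)
```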
Every function declared with as_tool: true (the default) is registered both as a SQL UDF and as a tool, and therefore participates in the registry. This means a Spicepod with many domain-specific UDFs benefits from the registry exactly the same way as one with many MCP tools — the model only sees the function definitions for the few it actually asks about.
To keep a function out of the registry (and out of the LLM tool surface entirely) while keeping it callable from SQL, set `as_tool: false`.
User-defined table functions (UDTFs) are SQL-only and are not currently registered as LLM tools, so they don't appear in the registry.
tool_search and tool_invoke are reserved names. If a user-defined tool, function, or MCP tool registers under either name:
- `tools: search_registry` → fails at startup with a clear error.
- `tools: auto` → logs a warning and falls back to direct tools.

Rename the offending tool, or set `as_tool: false` to keep it SQL-only.
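The two outcomes above can be sketched as a startup check. The function name `resolve_mode_on_collision`, the `"direct"` label for the fallback, and the error and log wording are hypothetical; only the reserved names and the fail-versus-fall-back split come from the text.

```python
import logging
from typing import Iterable

RESERVED_TOOL_NAMES = {"tool_search", "tool_invoke"}
logger = logging.getLogger("tool_registry_sketch")


def resolve_mode_on_collision(mode: str, tool_names: Iterable[str]) -> str:
    """Return the effective tools mode after checking for reserved-name clashes."""
    clashes = sorted(RESERVED_TOOL_NAMES.intersection(tool_names))
    if not clashes:
        return mode
    if mode == "search_registry":
        # Registry mode was explicitly requested, so refuse to start.
        raise ValueError(f"reserved tool name(s) in use: {', '.join(clashes)}")
    if mode == "auto":
        logger.warning(
            "reserved tool name(s) in use: %s; falling back to direct tools",
            ", ".join(clashes),
        )
        return "direct"
    return mode


print(resolve_mode_on_collision("auto", ["sql", "tool_search"]))
print(resolve_mode_on_collision("search_registry", ["sql", "my_udf"]))
```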
Two ways to inspect the catalog from outside the model:
- `SELECT * FROM list_udfs() WHERE source = 'user';` lists every user-declared function, regardless of whether it's currently in the registry.
- `GET /v1/functions` returns the functions registered as both SQL and tool entries.

For tools (built-in plus MCP plus function-derived), the model can call `tool_search` with an open-ended query (e.g. `query: "*"`), though in practice the model usually wants only the tools relevant to the current step.