spiceai/docs

spiceai/

docs

Help Login

evgenii/docs-spicepod-v2

Edit on GitHub

Fork

/docs/website/versioned_docs/version-2.0.x/features/semantic-model/index.md

spiceai/docs | Spice Cloud Platform

evgenii/docs-spicepod-v2

Edit on GitHub

Fork

/docs/website/versioned_docs/version-2.0.x/features/semantic-model/index.md

spiceai/docs/README.md

title: 'Semantic Model' sidebar_label: 'Semantic Model' description: 'Attach descriptions and metadata to datasets, views, and columns in Spice so LLMs, SQL functions, and humans share the same understanding of your data.' sidebar_position: 12 pagination_prev: null pagination_next: null

The semantic model is the layer that gives every dataset, view, and column a human-readable description plus structured metadata, and exposes that context uniformly to SQL queries, LLM tools, and language models.

Every description ends up in two places at once:

on the dataset's Arrow schema as description metadata — readable from SQL via obj_description and col_description; and
in the LLM tool context — surfaced by the built-in list_datasets and table_schema tools, so models see your descriptions automatically.

Defining a Semantic Model

Semantic data models are defined within the spicepod.yaml file under the datasets section. Each dataset supports description, metadata, and a columns field where individual columns are described with metadata and features for utility and clarity.

Example Configuration

Dataset Metadata

Datasets can be defined with the following metadata:

instructions: Optional. Instructions to provide to a language model when using this dataset.
reference_url_template: Optional. A URL template for citation links.

Arbitrary additional keys may be added under metadata: and are passed through to the Arrow schema unchanged, so anything an LLM tool or downstream consumer expects to find there (e.g. governance labels, owning team, source-of-truth links) can be attached without code changes.

For detailed metadata configuration, see the Dataset Reference.

Column Definitions

Each column can be defined with the following attributes:

description: Optional. A description of the column's contents and purpose.
type (alias data_type): Optional. Declared column type (Postgres-style or Arrow display form).
nullable: Optional. Override column nullability.
embeddings: Optional. Vector embeddings configuration for this column.
full_text_search: Optional. Full-text-search configuration.
metadata: Optional. Arbitrary key/value metadata attached to the column.

For detailed columns configuration, see the Dataset Reference.

Source-Side Comments

When a dataset is loaded from a source that exposes table or column comments, Spice automatically imports those comments into the Arrow schema metadata under the description key. This means database-native COMMENT ON TABLE and COMMENT ON COLUMN annotations show up alongside Spicepod-defined description values, giving the semantic model the same context that already lives in the source database.

Source-side comments are imported automatically from:

Source	What is imported
PostgreSQL	Table and column comments via `obj_description` / `col_description` in `pg_catalog`.
MySQL	`information_schema.tables.table_comment` and `information_schema.columns.column_comment`.
Snowflake	`information_schema.tables.comment` and `information_schema.columns.comment`.
Databricks SQL Warehouse	Table and column comments returned by the SQL Warehouse driver's metadata API.
BigQuery (via ADBC catalog)	`INFORMATION_SCHEMA.TABLE_OPTIONS` (`description`) for the table and `INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.description` for top-level columns.

Connectors not listed above (object stores, file formats, HTTP, etc.) do not have a comment concept on the source side; for those datasets, attach descriptions in the Spicepod.

Precedence

When both a Spicepod description and a source-side comment exist for the same table or column, the Spicepod value wins. This lets you override unhelpful or out-of-date source comments without round-tripping a DDL change through the upstream database.

If the Spicepod does not set a description, the source-side comment passes through unchanged.

Reading descriptions from SQL

Two PostgreSQL-compatible scalar UDFs read the description metadata at query time:

Full signatures, including PostgreSQL-shaped (oid, 'pg_class') variants and four-part (catalog, schema, table, column) forms, are documented in obj_description and col_description.

Because these UDFs read the same metadata that source-side comment extraction populates, queries against PostgreSQL or Snowflake datasets that were authored with COMMENT ON … will return those comments without any additional configuration.

How LLMs see the semantic model

The built-in LLM tools surface descriptions and metadata into the model's context automatically.

`list_datasets`

Returns every dataset, view, and catalog table visible to the runtime as JSON, with each entry's fully-qualified table name, description, metadata, columns, and capability flags (e.g. can_search_documents). Descriptions come from the Spicepod, falling back to source-side comments when no Spicepod description is set.

`table_schema`

Returns one or more tables' columns as a markdown table. When called with output: full (the default), the rendered schema includes a Metadata: block at the top with the table-level description and any other dataset metadata keys, and a per-column Metadata cell containing the column's description and any column metadata. With output: minimal, only column names, types, and nullability are returned.

Because both list_datasets and table_schema are part of the auto and all tool groups, no extra configuration is required — every model declared with tools: auto (or any explicit list that includes table_schema / list_datasets) gets the semantic model in its context.

When the Tool Registry is active, dataset descriptions also feed the hybrid search index used by tool_search to match user questions to relevant dataset-bound tools, so well-described datasets are easier for the model to discover at scale.

End-to-end flow