---
title: 'Vector-Based Search'
sidebar_label: 'Vector Search'
description: 'Learn how Spice can perform searches using vector-based methods.'
sidebar_position: 1
tags:
---
Spice provides advanced vector-based search capabilities, enabling more nuanced and intelligent searches. The runtime supports both:
Embedding models are defined in the spicepod.yaml file as top-level components.
Datasets can be augmented with embeddings for specific columns, which enables similarity search over those columns.
With embeddings defined on the `body` column, Spice can execute similarity searches on the dataset.
For more details, see the API reference for /v1/search.
Spice also supports vector search on datasets with preexisting embeddings. See below for compatibility details.
When performing searches on datasets with chunking enabled, Spice returns the most relevant chunk for each match. To retrieve the full content of a column, include the embedding column in the additional_columns list.
For example:
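A client-side sketch of such a request in Python is shown here. It builds (but does not send) a POST to `/v1/search`. Only the endpoint path and the `additional_columns` field come from the text above; the host and port, the dataset name `my_dataset`, and the `text`, `datasets`, and `limit` body fields are assumptions for illustration, so check the API reference for the exact request schema.

```python
import json
import urllib.request

# Assumed host and port; adjust to wherever the Spice runtime is listening.
SPICE_SEARCH_URL = "http://localhost:8090/v1/search"


def build_search_request(text, dataset, embedded_column, limit=3):
    """Build a POST request for /v1/search that also asks for the full column."""
    payload = {
        "text": text,            # assumed field name: the query text
        "datasets": [dataset],   # assumed field name: datasets to search
        "limit": limit,          # assumed field name: maximum number of matches
        # Ask for the full content of the embedded column, not only the chunk.
        "additional_columns": [embedded_column],
    }
    return urllib.request.Request(
        SPICE_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# `my_dataset` is a hypothetical dataset name; `body` is the embedded column.
request = build_search_request("how do I configure embeddings?", "my_dataset", "body")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running runtime:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect before it reaches the runtime.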
Response:
Datasets that already include embeddings can use the same functionality (e.g., vector search) as those augmented with embeddings using Spice. To ensure compatibility, the dataset must meet the following requirements:

1. An embedding model must be defined in the `spicepod.yaml` file. This model isn't used to compute embeddings on data in the table, but to embed the query text for similarity search operations. Like above, this can be done in the dataset component.
2. **Underlying Column Presence:** The underlying column must exist in the table and be of the `string` Arrow data type.
3. **Embeddings Column Naming Convention:** For each underlying column, the corresponding embeddings column must be named `<column_name>_embedding`. For example, a `customer_reviews` table with a `review` column must have a `review_embedding` column.
4. **Embeddings Column Data Type:** The embeddings column must have the Arrow data type `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors. If the column is chunked, the type is `List[FixedSizeList[Float32 or Float64, N]]`.
5. **Offset Column for Chunked Data:** If the underlying column is chunked, there must be an additional offset column named `<column_name>_offsets` of Arrow data type `List[FixedSizeList[Int32, 2]]`.
By following these guidelines, you can ensure that your dataset with pre-existing embeddings is fully compatible with the vector search and other embedding functionalities provided by Spice.
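The naming and dimension rules can be checked before a table is loaded. The following Python is a minimal sketch of such a check, not part of Spice: the helper names are hypothetical, and it works on plain Python collections rather than Arrow types.

```python
def expected_columns(column_name, chunked=False):
    """Return the companion column names required for `column_name`."""
    names = [f"{column_name}_embedding"]
    if chunked:
        # Chunked columns also need an offsets column.
        names.append(f"{column_name}_offsets")
    return names


def missing_columns(table_columns, column_name, chunked=False):
    """List required columns (underlying and companions) absent from the table."""
    required = [column_name] + expected_columns(column_name, chunked)
    return [name for name in required if name not in table_columns]


def has_fixed_dimension(vectors, n):
    """True if every vector has exactly n elements, mirroring FixedSizeList[_, N]."""
    return all(len(vector) == n for vector in vectors)


columns = {"review", "review_embedding"}
print(missing_columns(columns, "review"))                # []
print(missing_columns(columns, "review", chunked=True))  # ['review_offsets']
print(has_fixed_dimension([[0.1, 0.2], [0.3, 0.4]], 2))  # True
```

A check like this only covers column names and vector lengths; the Arrow data types themselves still need to match when the dataset is loaded into Spice.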
A table sales with an address column and corresponding embedding column(s).
The same table if it was chunked:
The embedding index can also be used to perform search in SQL, via a user-defined table function (UDTF).
The function signature of vector_search is
:::warning[Limitations]
The `vector_search` UDTF does not support chunked embedding columns.
:::
The `<column_name>_offsets` column has the Arrow data type `List[FixedSizeList[Int32, 2]]`, where each element is a pair of integers `[start, end]` representing the start and end indices of the chunk in the underlying text column. This offset column maps each chunk in the embeddings back to the corresponding segment in the underlying text column. For instance, `[[0, 100], [101, 200]]` indicates two chunks covering indices 0–100 and 101–200, respectively.
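To illustrate how offset pairs map back to the underlying text, here is a small Python sketch. It assumes the indices are character positions and that `end` is inclusive, which is one reading of the 0–100 and 101–200 example; the text does not state either point, so the `end_inclusive` flag is a hypothetical switch for the other interpretation.

```python
def chunk_texts(text, offsets, end_inclusive=True):
    """Slice `text` into chunks using [start, end] offset pairs."""
    chunks = []
    for start, end in offsets:
        # Assumption: `end` is inclusive unless the caller says otherwise.
        stop = end + 1 if end_inclusive else end
        chunks.append(text[start:stop])
    return chunks


# A 201-character string, so indices run from 0 to 200.
text = "".join(str(i % 10) for i in range(201))
chunks = chunk_texts(text, [[0, 100], [101, 200]])
print([len(chunk) for chunk in chunks])  # [101, 100]
```

Under the inclusive reading, the two chunks together cover every index of the text with no overlap.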