---
title: 'Vector-Based Search'
sidebar_label: 'Vector Search'
description: 'Learn how Spice can perform searches using vector-based methods.'
sidebar_position: 1
tags:
---
🎓 Learn how it works with the Amazon S3 Vectors with Spice engineering blog post.
Spice supports vector-based search: text columns are converted to embeddings, and queries are matched by semantic similarity rather than exact keyword overlap.
Spice supports two types of embedding providers:
Embedding models are defined in the `spicepod.yaml` file as top-level components.
To enable vector search, specify embeddings for the dataset columns in `spicepod.yaml`:
This configuration instructs Spice to create embeddings from the `body` column, enabling similarity searches on body content.
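To make "similarity search" concrete, here is a minimal sketch of ranking rows by embedding similarity. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors; the metric Spice actually applies is not stated on this page, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of Spice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, rows, limit=3):
    """Return the `limit` most similar (row_id, score) pairs, best first.

    `rows` is a list of (row_id, embedding) tuples.
    """
    scored = [(row_id, cosine_similarity(query_vec, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy embeddings, invented for illustration only.
rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.0, 1.0, 0.0]),
    ("doc-c", [0.9, 0.1, 0.0]),
]
top = rank_by_similarity([1.0, 0.0, 0.0], rows, limit=2)
```

In a real deployment the query text is embedded with the same model as the column, so that both vectors live in the same space.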
Execute similarity searches using Spice's HTTP API:
For detailed API documentation, see Search API Reference.
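As a rough sketch of what a search call could look like from Python, the snippet below builds (but does not send) a JSON `POST` request using only the standard library. The URL, port, route, and the `text`, `datasets`, and `limit` field names are assumptions for illustration; `additional_columns` is the only field named on this page. Check the Search API Reference for the authoritative request shape.

```python
import json
import urllib.request

# Assumed for illustration: a locally running runtime and a /v1/search route.
SPICE_SEARCH_URL = "http://localhost:8090/v1/search"


def build_search_request(text, datasets, limit=3, additional_columns=None):
    """Build a POST request carrying a JSON search payload (hypothetical field names)."""
    payload = {"text": text, "datasets": list(datasets), "limit": limit}
    if additional_columns:
        payload["additional_columns"] = list(additional_columns)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SPICE_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "issues" and "title" are placeholder dataset and column names.
request = build_search_request(
    "login fails after upgrade", ["issues"], additional_columns=["title"]
)
# To send it against a running runtime: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running server.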
If the dataset uses chunking, Spice returns relevant chunks. To retrieve entire documents, include the embedding column in `additional_columns`:
Response:
The embedding index can also be used to perform search in SQL, via a user-defined table function (UDTF).
SQL function signature of `vector_search`:
:::warning[Limitations]

- The `vector_search` UDTF does not support chunked embedding columns.

:::
Spice supports vector searches on datasets with pre-existing embeddings. Ensure the dataset meets these requirements:
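- The embeddings column is named `<original_column_name>_embedding`.
- The embeddings column has one of these Arrow data types:
  - `FixedSizeList[Float32|Float64, N]`
  - `List[FixedSizeList[Float32|Float64, N]]`
- If the column is chunked, an additional offsets column (`<column_name>_offsets`) is required:
  - `List[FixedSizeList[Int32, 2]]`, indicating chunk boundaries.

Example dataset structure (`sales` table):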
Non-chunked:
Chunked:
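1. Underlying Column Presence:
   - The underlying column must be present in the table and have the `string` Arrow data type.
2. Embeddings Column Naming Convention:
   - For each underlying column, the corresponding embeddings column must be named `<column_name>_embedding`. For example, a `customer_reviews` table with a `review` column must have a `review_embedding` column.
3. Embeddings Column Data Type:
   - For a non-chunked column: `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
   - For a chunked column: `List[FixedSizeList[Float32 or Float64, N]]`.
4. Offset Column for Chunked Data:
   - If the underlying column is chunked, there must be an additional offset column named `<column_name>_offsets` with the Arrow data type `List[FixedSizeList[Int32, 2]]`.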
Datasets with pre-existing embeddings that follow these guidelines are compatible with vector search and the other embedding functionality provided by Spice.
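The requirements above can be approximated as a small pre-flight check. This is a sketch under stated assumptions: plain Python lists stand in for the Arrow types, and `expected_columns` and `check_row` are hypothetical helpers rather than Spice APIs. It checks one row at a time, so it does not verify that `N` is the same across all rows.

```python
def expected_columns(column_name, chunked=False):
    """Column names the naming convention requires for one underlying column."""
    names = [column_name, f"{column_name}_embedding"]
    if chunked:
        names.append(f"{column_name}_offsets")
    return names


def check_row(text, embedding, offsets=None):
    """Return a list of problems for one row; an empty list means the row looks valid.

    Non-chunked rows pass a flat vector as `embedding` and leave `offsets` as None.
    Chunked rows pass a list of vectors plus one [start, end] pair per vector.
    """
    problems = []
    if not isinstance(text, str):
        problems.append("underlying column must be a string")
    if offsets is None:
        # Stand-in for FixedSizeList[Float32|Float64, N].
        if not embedding or not all(isinstance(x, float) for x in embedding):
            problems.append("embedding must be a non-empty list of floats")
    else:
        # Stand-in for List[FixedSizeList[Float32|Float64, N]].
        if len({len(vec) for vec in embedding}) != 1:
            problems.append("all chunk vectors must share one dimension N")
        if len(offsets) != len(embedding):
            problems.append("need exactly one offset pair per chunk vector")
        # Stand-in for List[FixedSizeList[Int32, 2]].
        if any(len(pair) != 2 for pair in offsets):
            problems.append("each offset must be a [start, end] pair")
    return problems
```

For example, `expected_columns("review")` yields `review` and `review_embedding`, matching the `customer_reviews` example above.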
A table sales with an address column and corresponding embedding column(s).
The `<column_name>_offsets` column has the type `List[FixedSizeList[Int32, 2]]`, where each element is a pair of integers `[start, end]` representing the start and end indices of the chunk in the underlying text column. This offset column maps each chunk in the embeddings back to the corresponding segment in the underlying text column.

`[[0, 100], [101, 200]]` indicates two chunks covering indices 0–100 and 101–200, respectively.
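The mapping from offsets back to text can be sketched as below. Two points are assumptions, since this page does not state them: that the indices count characters rather than bytes, and that `end` is inclusive (suggested by the `[[0, 100], [101, 200]]` example, where the second chunk starts one past the first chunk's end). `chunks_from_offsets` is a hypothetical helper, not a Spice function.

```python
def chunks_from_offsets(text, offsets, end_inclusive=True):
    """Slice `text` into chunks using [start, end] offset pairs.

    `end_inclusive` is an assumption; pass False if `end` is exclusive.
    """
    chunks = []
    for start, end in offsets:
        if start < 0 or end < start:
            raise ValueError(f"invalid offset pair: [{start}, {end}]")
        stop = end + 1 if end_inclusive else end
        chunks.append(text[start:stop])
    return chunks


text = "Spice embeds text. Chunks map back."
offsets = [[0, 17], [19, 34]]  # one pair per chunk embedding
chunks = chunks_from_offsets(text, offsets)
```

Each entry in `chunks` lines up by position with the corresponding vector in the chunked embedding column, which is how a matching chunk can be traced back to its source text.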