---
title: 'Vector-Based Search'
sidebar_label: 'Vector Search'
description: 'Learn how Spice can perform searches using vector-based methods.'
sidebar_position: 1
tags:
---
🎓 Learn how it works with the Amazon S3 Vectors with Spice engineering blog post.
Vector search uses embeddings (numerical representations of text or data) to find semantically similar content. Unlike keyword search, vector search understands meaning and context, making it useful for:
Spice supports two types of embedding providers:
Embedding models are defined in the `spicepod.yaml` file as top-level components.
To enable vector search, specify embeddings for the dataset columns in `spicepod.yaml`:
This configuration instructs Spice to create embeddings from the `body` column, enabling similarity searches on body content.
Execute similarity searches using Spice's HTTP API:
For detailed API documentation, see Search API Reference.
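
As a rough illustration of calling the HTTP search API from code, the sketch below builds a search request using only the Python standard library. It is not the definitive client. The base URL (`http://localhost:8090`), the `/v1/search` path, the request fields (`datasets`, `text`, `limit`, `additional_columns`), and the dataset name `app_messages` are all assumptions made for illustration; confirm them against the Search API Reference before relying on them.

```python
import json
import urllib.request

# Assumption: a Spice runtime is reachable at this address.
SPICE_HTTP_URL = "http://localhost:8090"


def build_search_request(datasets, text, limit=3, additional_columns=None,
                         base_url=SPICE_HTTP_URL):
    """Build (but do not send) a POST request for the search endpoint.

    The endpoint path and JSON field names are assumptions; check them
    against the Search API Reference.
    """
    payload = {"datasets": list(datasets), "text": text, "limit": limit}
    if additional_columns:
        # Extra columns to return alongside each match.
        payload["additional_columns"] = list(additional_columns)
    return urllib.request.Request(
        url=f"{base_url}/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_search(request, timeout=10.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a runtime running, `send_search(build_search_request(["app_messages"], "Tokyo plane tickets"))` would issue the query. Building the request separately from sending it makes the payload easy to inspect or log without network access.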
If the dataset uses chunking, Spice returns relevant chunks. To retrieve entire documents, include the embedding column in `additional_columns`:
Response:
The embedding index can also be used to perform search in SQL, via a user-defined table function (UDTF).
SQL Function Signature of vector_search:
By default, `vector_search` retrieves up to 1000 results. To adjust this limit, specify the `limit` parameter in the function call. When using a specific vector engine, such as `s3_vectors`, the limit defaults to that of the vector engine.
:::warning[Limitations]
The `vector_search` UDTF does not yet support chunked embedding columns. Chunking support is on the roadmap.

:::
Spice supports vector searches on datasets with pre-existing embeddings. Ensure the dataset meets these requirements:
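- The embedding column must be named `<original_column_name>_embedding`.
- The embedding column must have one of these Arrow data types:
  - Non-chunked: `FixedSizeList[Float32|Float64, N]`
  - Chunked: `List[FixedSizeList[Float32|Float64, N]]`
- For chunked data, an additional offsets column (`<column_name>_offsets`) is required:
  - Type: `List[FixedSizeList[Int32, 2]]`, indicating chunk boundaries.

Example dataset structure (`sales` table):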
Non-chunked:
Chunked:
1. **Underlying Column Presence:**
   - The underlying column must exist in the table and have the `string` Arrow data type.
2. **Embeddings Column Naming Convention:**
   - The embeddings column must be named `<column_name>_embedding`. For example, a `customer_reviews` table with a `review` column must have a `review_embedding` column.
3. **Embeddings Column Data Type:**
   - Non-chunked: `FixedSizeList[Float32 or Float64, N]`, where `N` is the dimension (size) of the embedding vector. `FixedSizeList` is used for efficient storage and processing of fixed-size vectors.
   - Chunked: `List[FixedSizeList[Float32 or Float64, N]]`.
4. **Offset Column for Chunked Data:**
   - If the underlying column is chunked, an additional offset column named `<column_name>_offsets` is required, with the following Arrow data type:
     - `List[FixedSizeList[Int32, 2]]`, where each element is a pair of integers `[start, end]` representing the start and end indices of the chunk in the underlying text column. This offset column maps each chunk in the embeddings back to the corresponding segment in the underlying text column.
     - For example, `[[0, 100], [101, 200]]` indicates two chunks covering indices 0–100 and 101–200, respectively.

Following these guidelines ensures that a dataset with pre-existing embeddings is compatible with vector search and the other embedding functionality in Spice.

A table `sales` with an `address` column and corresponding embedding column(s).
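
To make the naming and offset conventions concrete, here is a minimal Python sketch. It is not Spice code: the helper names `embedding_column_names` and `split_chunks` are hypothetical. It also makes two assumptions that the text above does not settle. First, offsets are treated as character positions in a Python string, whereas the runtime may count bytes. Second, `end` is treated as inclusive by default, as the `[[0, 100], [101, 200]]` example suggests; pass `end_inclusive=False` if your data uses exclusive ends.

```python
def embedding_column_names(column, chunked=False):
    """Return the companion column names expected for `column`.

    Follows the `<column_name>_embedding` / `<column_name>_offsets`
    naming convention described above.
    """
    names = {"embedding": f"{column}_embedding"}
    if chunked:
        names["offsets"] = f"{column}_offsets"
    return names


def split_chunks(text, offsets, end_inclusive=True):
    """Map [start, end] offset pairs back to segments of `text`.

    Assumption: offsets are character indices and `end` is inclusive
    unless `end_inclusive` is False.
    """
    chunks = []
    for start, end in offsets:
        if start < 0 or end < start:
            raise ValueError(f"invalid offset pair: [{start}, {end}]")
        stop = end + 1 if end_inclusive else end
        chunks.append(text[start:stop])
    return chunks
```

For a chunked `address` column, `embedding_column_names("address", chunked=True)` yields the names `address_embedding` and `address_offsets`. `split_chunks` recovers the text segment that each embedding vector in a row corresponds to.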