DuckDB can be used as a vector engine in Spice to store embeddings and execute vector similarity search using HNSW indexes via the DuckDB VSS extension. This is useful when a dataset or view is already accelerated with DuckDB and a fully embedded, single-process vector store is preferred over an external service.
The DuckDB vector engine requires the dataset or view to be accelerated with the DuckDB accelerator. Spice computes embeddings on the configured columns during refresh and write, stores them in the DuckDB accelerator alongside the source data, and creates an HNSW index that is used to answer vector_search and /v1/search queries.
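The requirement above can be sketched as a minimal Spicepod dataset entry. This is an illustration under stated assumptions, not a definitive configuration: the source path, the dataset name docs, the column name body, and the model name local_embedding_model are hypothetical, and the vectors.enabled flag is assumed by analogy with other Spice vector engines.

```yaml
datasets:
  - from: file:data/docs.parquet      # hypothetical source
    name: docs                        # hypothetical dataset name
    acceleration:
      enabled: true
      engine: duckdb                  # the vector engine requires DuckDB acceleration
    vectors:
      enabled: true                   # assumed flag; verify against the Spice reference
      engine: duckdb
    columns:
      - name: body                    # hypothetical column to embed
        embeddings:
          - from: local_embedding_model   # hypothetical model defined under embeddings:
```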
Accelerated views also support DuckDB HNSW vector indexes. Configure columns[].embeddings and vectors on the view:
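A sketch of what such a view might look like follows. The view name docs_view, its SQL, the column names, and the model name local_embedding_model are hypothetical; the vectors.enabled flag and the list form of row_id are assumptions to verify against the Spice reference.

```yaml
views:
  - name: docs_view                       # hypothetical view name
    sql: SELECT id, body FROM docs        # hypothetical query over an existing dataset
    acceleration:
      enabled: true
      engine: duckdb                      # the view itself must be accelerated with DuckDB
    vectors:
      enabled: true                       # assumed flag
      engine: duckdb
    columns:
      - name: body                        # hypothetical column to embed
        embeddings:
          - from: local_embedding_model   # hypothetical model name
            row_id:
              - id                        # assumed list form; identifies rows in the view
```
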
| Parameter | Description | Default |
|---|---|---|
| duckdb_distance_metric | Optional. Vector similarity metric. Accepts cosine, l2 (or l2_norm / euclidean / l2sq), or inner_product (or ip / dot / dot_product). | cosine |
| duckdb_metric | Optional. Alias for duckdb_distance_metric. duckdb_distance_metric takes precedence when both are set. | n/a |
| duckdb_hnsw_m | Optional. HNSW graph parameter m: the number of bidirectional links per node. Higher values improve recall at the cost of index size and build time. | DuckDB VSS default |
embeddings Syntax

When a dataset is accelerated with DuckDB and has embedding columns configured, the DuckDB vector engine can be enabled implicitly by placing HNSW parameters directly on the DuckDB accelerator's params. This avoids the separate vectors: block when an HNSW index is the only vector-engine configuration needed.
Spice detects the HNSW parameters on the accelerator config and automatically attaches a DuckDB vector engine to the dataset. The recognized keys are duckdb_distance_metric (or duckdb_metric), duckdb_hnsw_m, duckdb_hnsw_ef_construction, and duckdb_hnsw_ef_search; any non-vector accelerator parameters are passed through to DuckDB unchanged.
The two configurations are equivalent:
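
As an illustration, the two forms might look like the following pair of dataset entries. The dataset names, source path, column name, model name, and parameter values are hypothetical, and the vectors.enabled flag is an assumption; only the parameter keys and their placement come from the text above.

```yaml
datasets:
  # Form 1: embeddings syntax. HNSW parameters sit on acceleration.params,
  # and the DuckDB vector engine is attached implicitly.
  - from: file:data/docs.parquet          # hypothetical source
    name: docs_implicit                   # hypothetical name
    acceleration:
      enabled: true
      engine: duckdb
      params:
        duckdb_distance_metric: cosine
        duckdb_hnsw_m: 16                 # illustrative value
    columns:
      - name: body
        embeddings:
          - from: local_embedding_model   # hypothetical model name

  # Form 2: explicit vectors block. The same parameters sit on vectors.params.
  - from: file:data/docs.parquet
    name: docs_explicit                   # hypothetical name
    acceleration:
      enabled: true
      engine: duckdb
    vectors:
      enabled: true                       # assumed flag
      engine: duckdb
      params:
        duckdb_distance_metric: cosine
        duckdb_hnsw_m: 16
    columns:
      - name: body
        embeddings:
          - from: local_embedding_model
```
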
- embeddings syntax: HNSW params on acceleration.params. Inferred when the dataset has DuckDB acceleration and at least one recognized HNSW parameter.
- vectors block: vectors.engine: duckdb with HNSW params on vectors.params. Required if the engine name needs to be set explicitly or to disable the vector engine without removing the HNSW parameters.

If both are set, the explicit vectors: block takes precedence.
When configured as a vector engine, Spice:
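- Computes embeddings on the configured columns during refresh and write.
- Stores the embeddings in the DuckDB accelerator alongside the source data.
- Creates an HNSW index over the stored embeddings.
- Executes vector_search and /v1/search against the DuckDB accelerator, computing similarity natively in DuckDB.

The DuckDB VSS extension is installed and loaded automatically by the runtime; no manual setup is required.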
:::warning[Limitations]
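- The dataset or view must be accelerated with DuckDB (acceleration.engine: duckdb) for the DuckDB vector engine to be used.
- A primary key is required. When the source does not carry primary key metadata, specify it on the column embedding with row_id.
- partition_by is not yet supported for the DuckDB vector engine.
- spill_writes is not supported for the DuckDB vector engine.

:::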
Any embedding model supported by Spice can be used to produce the vectors stored in DuckDB, including local models via Hugging Face and hosted models via OpenAI, Bedrock, and others. The vector dimension is inferred from the embedding model and used to size the DuckDB embedding column.
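
For illustration, embedding models might be declared in the Spicepod as sketched below. The model names local_embedding_model and hosted_embedding_model and the secret name OPENAI_API_KEY are hypothetical, and the exact from: prefixes and parameter keys should be checked against the Spice embeddings reference.

```yaml
embeddings:
  # Local model loaded from Hugging Face; all-MiniLM-L6-v2 produces 384-dimensional vectors.
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: local_embedding_model           # hypothetical name referenced by columns[].embeddings
  # Hosted model; text-embedding-3-small produces 1536-dimensional vectors by default.
  - from: openai:text-embedding-3-small
    name: hosted_embedding_model          # hypothetical name
    params:
      openai_api_key: ${ secrets:OPENAI_API_KEY }   # hypothetical secret name
```

Because the dimension is inferred from the model, switching a column from one model to the other changes the size of the stored embedding column.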
Spice requires a primary key to round-trip matches between the HNSW index and the base dataset. If the source dataset does not carry primary key metadata, specify it on the column embedding:
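
A minimal sketch of that setting, assuming a dataset whose rows are uniquely identified by a column named id. The dataset name, source path, column names, and model name are hypothetical, and the list form of row_id and the vectors.enabled flag are assumptions.

```yaml
datasets:
  - from: file:data/docs.parquet          # hypothetical source without primary key metadata
    name: docs                            # hypothetical name
    acceleration:
      enabled: true
      engine: duckdb
    vectors:
      enabled: true                       # assumed flag
      engine: duckdb
    columns:
      - name: body                        # hypothetical column to embed
        embeddings:
          - from: local_embedding_model   # hypothetical model name
            row_id:
              - id                        # hypothetical key column used to map matches back to rows
```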
The distance metric controls how similarity is computed between the query vector and the stored vectors. Pick the metric that matches how your embedding model was trained:
- cosine (default): cosine similarity. Appropriate for most text embedding models.
- l2: Euclidean (L2) distance. Aliases: l2_norm, euclidean, l2sq.
- inner_product: dot-product similarity. Aliases: ip, dot, dot_product, max_inner_product.

The duckdb_hnsw_m, duckdb_hnsw_ef_construction, and duckdb_hnsw_ef_search parameters control the trade-off between recall, index size, build time, and query latency. When unset, Spice defers to the DuckDB VSS defaults. See the DuckDB VSS documentation for guidance on tuning these values.
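
A hedged sketch of a dataset that sets the metric and all three HNSW parameters explicitly is shown below. The numeric values are illustrative only, not recommendations; the dataset name, source path, column, and model name are hypothetical, and the vectors.enabled flag is an assumption.

```yaml
datasets:
  - from: file:data/docs.parquet          # hypothetical source
    name: docs                            # hypothetical name
    acceleration:
      enabled: true
      engine: duckdb
    vectors:
      enabled: true                       # assumed flag
      engine: duckdb
      params:
        duckdb_distance_metric: inner_product   # use when the model was trained for dot-product similarity
        duckdb_hnsw_m: 32                       # more links per node: better recall, larger index
        duckdb_hnsw_ef_construction: 256        # larger build-time candidate list: slower build
        duckdb_hnsw_ef_search: 128              # larger query-time candidate list: higher latency
    columns:
      - name: body
        embeddings:
          - from: local_embedding_model   # hypothetical model name
```

Raising any of the three HNSW values trades resources for recall, so it is usually worth changing one at a time and measuring recall and latency on representative queries.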
Vector search uses the standard Spice search surfaces. When the dataset is backed by the DuckDB vector engine, both vector_search and /v1/search execute natively in DuckDB using the HNSW index.
The query text is embedded with the configured embedding model and used as the probe vector for the HNSW index.
For the full reference, see Vector Search and Search API Reference.
| Parameter | Description | Default |
|---|---|---|
| duckdb_hnsw_ef_construction | Optional. HNSW build-time parameter: the size of the dynamic candidate list during index construction. Higher values improve recall at the cost of build time. | DuckDB VSS default |
| duckdb_hnsw_ef_search | Optional. HNSW query-time parameter: the size of the dynamic candidate list during search. Higher values improve recall at the cost of query latency. | DuckDB VSS default |