/docs/website/versioned_docs/version-2.0.x/components/data-connectors/elasticsearch/index.md


---
title: 'Elasticsearch Data Connector'
sidebar_label: 'Elasticsearch Data Connector'
description: 'Query Elasticsearch indexes as SQL tables in Spice, including kNN vector search, full-text search, and hybrid search.'
tags:
  - data-connectors
  - elasticsearch
  - search
---

The Elasticsearch Data Connector exposes Elasticsearch indexes as SQL tables in Spice. Index mappings are translated to Arrow schemas so that documents can be queried with federated SQL alongside data from other connectors.

To run vector, full-text, or hybrid search (the `vector_search`, `text_search`, and `rrf` UDTFs) against an Elasticsearch index, the dataset must also be configured for search: as an Elasticsearch Vector Engine with an embedding model for vector search, and/or with full-text search columns for `text_search`. See Vector and Full-Text Search below. Registering an index through the data connector alone exposes it to federated SQL but does not make it searchable through those UDTFs.

:::note[Enterprise edition]

The Elasticsearch connector is available in the Spice Enterprise edition.

:::

## Configuration

### `from`

The `from` field takes the form `elasticsearch:{index_name}`, where `index_name` is the Elasticsearch index to query.

In query results, sub-fields of `object` fields are referenced by dot-separated paths (e.g. `address.city`): the connector flattens object mappings into Arrow columns named with that convention.

### `name`

The dataset name used as the table name within Spice. The dataset name cannot be a reserved keyword.

### `params`

The Elasticsearch connector accepts the following params. Use the secret replacement syntax to load credentials from a secret store.

| Parameter Name | Description | Required | Default |
| --- | --- | --- | --- |
| `elasticsearch_endpoint` | Cluster URL (e.g., `https://localhost:9200`). | Yes | - |
| `elasticsearch_user` | Username for HTTP basic authentication. | No | - |
| `elasticsearch_pass` | Password for HTTP basic authentication. | No | - |

## Types

The connector derives an Arrow schema from each index's mapping, fetched with `GET /<index>/_mapping`. Elasticsearch field types map to Arrow types as follows:

| Elasticsearch Field Type | Arrow Type | Notes |
| --- | --- | --- |
| `text`, `keyword`, `wildcard`, `constant_keyword`, `match_only_text` | `Utf8` | |
| `long` | `Int64` | |
| `unsigned_long` | `UInt64` | Accepts both numeric values and digit strings (JavaScript clients commonly serialize values greater than 2^53 - 1 as strings). |
| `integer` | `Int32` | |
| `short` | `Int16` | |
| `byte` | `Int8` | |
| `double` | | |
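The rows above can be read as a lookup table. The sketch below is a minimal Python illustration, not the connector's code: `ES_TO_ARROW`, `arrow_type_for`, and `parse_unsigned_long` are hypothetical names, Arrow types are plain strings to avoid a `pyarrow` dependency, and only the fully listed rows are included. It also shows one way to accept `unsigned_long` values sent either as JSON integers or as digit strings while enforcing the UInt64 range.

```python
from typing import Dict, Optional, Union

# Hypothetical lookup covering only the fully listed rows of the table above.
# Arrow types are plain strings so the sketch has no pyarrow dependency.
ES_TO_ARROW: Dict[str, str] = {
    "text": "Utf8",
    "keyword": "Utf8",
    "wildcard": "Utf8",
    "constant_keyword": "Utf8",
    "match_only_text": "Utf8",
    "long": "Int64",
    "unsigned_long": "UInt64",
    "integer": "Int32",
    "short": "Int16",
    "byte": "Int8",
}

UINT64_MAX = 2**64 - 1


def arrow_type_for(es_type: str) -> Optional[str]:
    """Return the Arrow type name for an Elasticsearch field type, or None if this sketch does not cover it."""
    return ES_TO_ARROW.get(es_type)


def parse_unsigned_long(value: Union[int, str]) -> int:
    """Accept an unsigned_long sent either as a JSON integer or as an ASCII digit string."""
    if isinstance(value, str):
        # Clients that cannot represent values above 2**53 - 1 exactly send digit strings.
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"not a digit string: {value!r}")
        value = int(value)
    elif isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"unsupported unsigned_long value: {value!r}")
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"outside the UInt64 range: {value}")
    return value
```

Returning `None` for an unlisted type such as `geo_point` is a choice of this sketch; the connector's general fallback for unmapped types is not described on this page.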

Fields of type `object` are flattened by joining field names with dots (e.g. `address.city`). Fields of type `nested` are preserved as JSON strings because the per-document ordering of their inner objects must be retained.
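A rough Python sketch of the flattening rule, assuming the standard shape of a `_mapping` response, in which each field is an entry under `properties` and object fields carry their own `properties` (with `type` either `object` or omitted). `flatten_properties` is a hypothetical helper, not part of Spice; it keeps a `nested` field as a single column instead of descending into it.

```python
from typing import Dict


def flatten_properties(properties: Dict[str, dict], prefix: str = "") -> Dict[str, str]:
    """Flatten the `properties` of an index mapping into {column_name: es_type}.

    Object fields (type "object", or no type but their own "properties") are
    expanded into dot-joined child columns. A "nested" field is kept as one
    column, whose values the connector would carry as JSON strings.
    """
    columns: Dict[str, str] = {}
    for name, spec in properties.items():
        path = prefix + name
        es_type = spec.get("type", "object" if "properties" in spec else None)
        if es_type == "object":
            columns.update(flatten_properties(spec.get("properties", {}), prefix=path + "."))
        elif es_type is not None:
            # Leaf fields and "nested" fields both become a single column.
            columns[path] = es_type
        # Fields with neither a type nor sub-properties are skipped here.
    return columns
```

For a mapping with a `title` text field, an `address` object containing `city` and `zip`, and a `nested` field `line_items`, the sketch yields the columns `title`, `address.city`, `address.zip`, and `line_items`.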

## Querying

After registering a dataset, query it with SQL like any other Spice table.
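One way to send such a query from a script is through the Spice runtime's HTTP SQL endpoint. This sketch assumes the runtime listens on its default HTTP port 8090 and accepts a SQL statement as the plain-text body of `POST /v1/sql`; `build_sql_request`, `run_sql`, and the dataset name `my_index` are hypothetical.

```python
import urllib.request

# Assumption: a local Spice runtime serving its HTTP API on the default port 8090.
SPICE_SQL_URL = "http://localhost:8090/v1/sql"


def build_sql_request(sql: str, url: str = SPICE_SQL_URL) -> urllib.request.Request:
    """Build a POST request that carries the SQL statement as a plain-text body."""
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def run_sql(sql: str, timeout: float = 30.0) -> str:
    """Send the statement to the runtime and return the raw JSON response body."""
    with urllib.request.urlopen(build_sql_request(sql), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the runtime running, `run_sql('SELECT "address.city", COUNT(*) AS n FROM my_index GROUP BY "address.city"')` would return the JSON response body. Flattened column names contain a dot, so quote them in SQL; an unquoted `address.city` would be read as column `city` of a table named `address`.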

## Vector and Full-Text Search

Registering an Elasticsearch dataset with the data connector does not, by itself, make it searchable through the search UDTFs. To enable search against an Elasticsearch index, configure the dataset for search:

- For vector and hybrid search, configure the dataset as an Elasticsearch Vector Engine (`vectors: { engine: elasticsearch, enabled: true }`) with a column-level `embeddings` entry that names an embedding model. The embedding model is required because it embeds the query text at search time.
- For full-text search, enable `full_text_search` on the column(s) to search.

Once configured, the following UDTFs are available against the dataset:

- Vector similarity search via `vector_search`, executed natively as an Elasticsearch kNN query.
- Full-text search via `text_search`, executed with Elasticsearch's native BM25 ranking.
- Hybrid search via `rrf`, which combines both with Reciprocal Rank Fusion.
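Reciprocal Rank Fusion merges ranked lists using only rank positions, so BM25 scores and vector similarities never need to share a scale. Below is a minimal Python sketch of the textbook formula, in which each document scores the sum of 1 / (k + rank) over the lists it appears in. `k = 60` is a commonly used default; whether Spice's `rrf` uses the same constant or applies per-list weights is not stated here, and the function name and inputs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of document IDs (best first) with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Results are sorted by fused score, highest first,
    with ties broken by document ID so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For a vector ranking `['a', 'b', 'c']` and a text ranking `['b', 'd', 'a']`, document `b` comes first because it places near the top of both lists, ahead of `a`, which is first in one list but last in the other.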

These operations run against the Elasticsearch cluster directly rather than ingesting vectors into an accelerator, keeping indexing and search colocated in Elasticsearch.


See Search Functionality for the full search feature guide.

## Authentication

The connector uses HTTP basic authentication when `elasticsearch_user` and `elasticsearch_pass` are provided. For production deployments, store credentials in a secret store and reference them with `${secrets:...}` rather than hard-coding them in `spicepod.yaml`.

TLS is enabled automatically for `https://` endpoints.
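To show what basic authentication adds to a request, here is a Python sketch that builds the `GET /<index>/_mapping` call described under Types and attaches an `Authorization: Basic` header only when both a user and a password are supplied. `build_mapping_request` is a hypothetical helper, and any credentials passed to it in an example are placeholders; in a spicepod they would come from `${secrets:...}` references. Python's `urllib` negotiates TLS on its own for `https://` URLs, much as the connector does.

```python
import base64
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_mapping_request(
    endpoint: str,
    index: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> urllib.request.Request:
    """Build GET <endpoint>/<index>/_mapping, with HTTP basic auth if both credentials are set."""
    url = f"{endpoint.rstrip('/')}/{quote(index, safe='')}/_mapping"
    headers = {"Accept": "application/json"}
    if user is not None and password is not None:
        # Basic auth: base64 of "user:password", sent in the Authorization header.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(url, headers=headers, method="GET")
```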

## Limitations

- Fields of type `nested` are exposed as JSON strings rather than structured columns.
- `date` and `date_nanos` fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast them to a timestamp in SQL when chronological comparison or date arithmetic is required.
- `dense_vector` fields without a declared `dims` value fall back to `Utf8` and cannot be used as a vector column.
- For queries with `LIMIT N` where N ≤ 10,000, the connector issues a single `_search` request. For larger result sets or queries without `LIMIT`, it paginates automatically using a Point-In-Time (PIT) with `search_after`, fetching all matching documents in batches of 10,000 hits.
- Pushdown of SQL predicates to the Elasticsearch query DSL is limited; complex filter expressions are evaluated locally by DataFusion after the results are fetched.
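The paging behavior in the `LIMIT` item above can be simulated without a cluster. In this Python sketch, a sorted in-memory list stands in for a point-in-time snapshot, and `search_page` plays the role of one `_search` request returning hits after a `search_after` cursor; a real request would also carry the PIT id and a `sort` with a tiebreaker. The function names are hypothetical, and stopping early once a `LIMIT` above 10,000 is satisfied is an assumption of the sketch. For context, 10,000 is also Elasticsearch's default `index.max_result_window`, the cap on `from + size` for ordinary `_search` paging.

```python
import bisect
from typing import Iterator, List, Optional, Sequence

SINGLE_REQUEST_MAX = 10_000  # LIMIT threshold and batch size described above


def search_page(sort_keys: Sequence[int], search_after: Optional[int], size: int) -> List[int]:
    """Stand-in for one `_search` call against a point-in-time snapshot.

    `sort_keys` plays the role of the snapshot's documents, already sorted by a
    unique sort key; the call returns up to `size` keys strictly after the cursor.
    """
    start = 0 if search_after is None else bisect.bisect_right(sort_keys, search_after)
    return list(sort_keys[start:start + size])


def fetch(sort_keys: Sequence[int], limit: Optional[int],
          batch: int = SINGLE_REQUEST_MAX) -> Iterator[List[int]]:
    """Yield result pages: one request for small LIMITs, search_after batches otherwise."""
    if limit is not None and limit <= batch:
        yield search_page(sort_keys, None, limit)  # single request
        return
    cursor: Optional[int] = None
    remaining = limit  # assumption: stop early once a LIMIT is satisfied
    while remaining is None or remaining > 0:
        size = batch if remaining is None else min(batch, remaining)
        page = search_page(sort_keys, cursor, size)
        if page:
            yield page
        if len(page) < size:
            return  # short page: the snapshot is exhausted
        cursor = page[-1]  # sort value of the last hit becomes the next search_after
        if remaining is not None:
            remaining -= len(page)
```

With `batch=10` and 25 documents, a query without `LIMIT` is served in pages of 10, 10, and 5, while `LIMIT 5` takes a single request.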

Elasticsearch can also be configured as a Vector Engine for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).

## Cookbook

A cookbook recipe to configure Elasticsearch as a data connector in Spice: Elasticsearch Data Connector.