title: 'Elasticsearch Data Connector'
sidebar_label: 'Elasticsearch Data Connector'
description: 'Query Elasticsearch indexes as SQL tables in Spice, including kNN vector search, full-text search, and hybrid search.'
tags:
The Elasticsearch Data Connector exposes Elasticsearch indexes as SQL tables in Spice. Index mappings are translated to Arrow schemas so that documents can be queried with federated SQL alongside data from other connectors.
To run vector, full-text, or hybrid search (the `vector_search`, `text_search`, and `rrf` UDTFs) against an Elasticsearch index, the dataset must additionally be configured for search: as an Elasticsearch Vector Engine with an embedding model for vector search, and/or with full-text search columns for `text_search`. See Vector and Full-Text Search below. Registering an index through the data connector alone exposes it for federated SQL but does not make it searchable through those UDTFs.
:::note[Enterprise edition]
The Elasticsearch connector is available in the Spice Enterprise edition.
:::
`from`

The `from` field takes the form `elasticsearch:{index_name}`, where `index_name` is the Elasticsearch index to query.

Dot-separated paths may be used to refer to nested fields in query results (e.g. `address.city`); the connector flattens object mappings into Arrow columns using that convention.

`name`

The dataset name used as the table name within Spice. The dataset name cannot be a reserved keyword.

`params`

The Elasticsearch connector accepts the following params. Use the secret replacement syntax to load credentials from a secret store.

| Parameter Name | Description | Required | Default |
|---|---|---|---|
| `elasticsearch_endpoint` | Cluster URL (e.g., `https://localhost:9200`). | Yes | - |
| `elasticsearch_user` | Username for HTTP basic authentication. | No | - |
| `elasticsearch_pass` | Password for HTTP basic authentication. | No | - |

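
A minimal `spicepod.yaml` sketch that ties `from`, `name`, and `params` together. The index name `my_index`, the dataset name `es_docs`, and the secret keys `ES_USER` and `ES_PASS` are placeholders chosen for illustration, not values required by the connector; adjust them to your cluster and secret store.

```yaml
datasets:
  # Form is `elasticsearch:{index_name}`; `my_index` is a placeholder index.
  - from: elasticsearch:my_index
    # Table name used in Spice SQL; must not be a reserved keyword.
    name: es_docs
    params:
      elasticsearch_endpoint: https://localhost:9200
      # Optional HTTP basic authentication, loaded from a secret store.
      # `ES_USER` and `ES_PASS` are hypothetical secret keys.
      elasticsearch_user: ${secrets:ES_USER}
      elasticsearch_pass: ${secrets:ES_PASS}
```

If the cluster does not require authentication, the two credential params can be omitted, since only `elasticsearch_endpoint` is required.
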
The connector derives an Arrow schema from each index's mapping via `GET /<index>/_mapping`. Elasticsearch field types map to Arrow as follows:

| Elasticsearch Field Type | Arrow Type | Notes |
|---|---|---|
| `text`, `keyword`, `wildcard`, `constant_keyword`, `match_only_text` | Utf8 | |
| `long` | Int64 | |
| `unsigned_long` | UInt64 | Accepts both numeric values and digit strings (JS clients commonly serialize values > `2^53 - 1` as strings). |
| `integer` | Int32 | |
| `short` | Int16 | |
| `byte` | Int8 | |
| `double` | Float64 | |

Nested object fields are flattened by concatenating field names with dots (e.g. `address.city`). Fields of type `nested` are preserved as JSON strings because per-document ordering must be retained.
After registering a dataset, query it like any other Spice table:
Registering an Elasticsearch dataset with the data connector does not by itself make it searchable through the search UDTFs. To enable search against an Elasticsearch index, configure the dataset for search:

- Vector search: configure the dataset as an Elasticsearch Vector Engine (`vectors: { engine: elasticsearch, enabled: true }`) with a column-level `embeddings` entry naming an embedding model. The embedding model is required because it is used to embed the query text at search time.
- Full-text search: set `full_text_search` on the column(s) to search.

Once configured, the following UDTFs are available against the dataset:

- `vector_search`: executed natively as an Elasticsearch kNN query.
- `text_search`: executed using Elasticsearch's native BM25 ranking.
- `rrf`: combines both with Reciprocal Rank Fusion.

These operations run against the Elasticsearch cluster directly rather than ingesting vectors into an accelerator, keeping indexing and search colocated in Elasticsearch.
Example:
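
A sketch of a search-enabled dataset, assuming an embedding model registered under the hypothetical name `embed_model`, a placeholder index `products`, and a placeholder text column `description`. The `vectors` block and the column-level `embeddings` and `full_text_search` keys come from the description in this section. The top-level `embeddings` entry, its `from` value and `openai_api_key` param, and the `enabled: true` key under `full_text_search` are assumptions about the wider Spicepod schema; check them against the Search Functionality guide before use.

```yaml
embeddings:
  # Hypothetical embedding model; substitute any model your deployment supports.
  - from: openai:text-embedding-3-small
    name: embed_model
    params:
      openai_api_key: ${secrets:OPENAI_API_KEY}

datasets:
  - from: elasticsearch:products
    name: products
    params:
      elasticsearch_endpoint: https://localhost:9200
    # Configure the dataset as an Elasticsearch Vector Engine.
    vectors:
      engine: elasticsearch
      enabled: true
    columns:
      - name: description
        # Required for vector_search: embeds the query text at search time.
        embeddings:
          - from: embed_model
        # Assumed shape for enabling text_search on this column.
        full_text_search:
          enabled: true
```

Configuring both `embeddings` and `full_text_search` on the same column is what would make `rrf` applicable, since it combines the results of the other two UDTFs.
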
See Search Functionality for the full search feature guide.
The connector uses HTTP basic authentication when `elasticsearch_user` and `elasticsearch_pass` are provided. For production deployments, store credentials in a secret store and reference them with `${secrets:...}` rather than hard-coding them in `spicepod.yaml`.
TLS is enabled automatically for https:// endpoints.
- `date` and `date_nanos` fields are preserved as strings because Elasticsearch accepts heterogeneous date formats; cast to a timestamp in SQL when numeric comparison is required.
- `dense_vector` fields without a declared `dims` value fall back to Utf8 and are not usable as a vector column.
- For queries with `LIMIT N` where N ≤ 10,000, the connector issues a single `_search` request. For larger result sets or queries without `LIMIT`, the connector automatically paginates using Point-In-Time (PIT) + `search_after`, fetching all matching documents in 10,000-hit batches.

Elasticsearch can also be configured as a Vector Engine for datasets sourced from other connectors (storing Spice-managed embeddings in Elasticsearch rather than querying an existing index).

The field type mapping table continues here, picking up after `double` (Float64):

| Elasticsearch Field Type | Arrow Type | Notes |
|---|---|---|
| `float`, `half_float`, `scaled_float` | Float32 | |
| `boolean` | Boolean | |
| `date`, `date_nanos` | Utf8 | ES dates are flexibly formatted; preserved as strings. |
| `binary` | Utf8 | Base64-encoded in the JSON response. |
| `ip` | Utf8 | |
| `dense_vector` (with `dims`) | `FixedSizeList<Float32, dims>` | The required `dims` value must fit in an i32. |
| `dense_vector` (missing `dims`) | Utf8 | Falls back to raw JSON when `dims` cannot be resolved. |
| `object` (with sub-fields) | (flattened) | Expanded into dot-separated columns (e.g. `address.city`). |
| `object` (no sub-fields), `nested` | Utf8 | Serialized JSON. |
| Any other mapping type | Utf8 | Fallback: the raw JSON value is preserved as a string. |