---
title: 'Databricks Data Connector'
sidebar_label: 'Databricks Data Connector'
description: 'Databricks Data Connector Documentation'
pagination_prev: null
tags:
---
Databricks as a connector for federated SQL query against Databricks using Spark Connect, directly from Delta Lake tables, or using the SQL Statement Execution API.
## `from`

The `from` field for the Databricks connector takes the form `databricks:catalog.schema.table`, where `catalog.schema.table` is the fully-qualified path to the table to read from.
## `name`

The dataset name. This will be used as the table name within Spice.
Example:
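A minimal sketch of a spicepod dataset entry; the catalog, schema, table, and dataset names are illustrative placeholders:

```yaml
datasets:
  - from: databricks:my_catalog.my_schema.my_table
    name: my_databricks_table
```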
The dataset name cannot be a reserved keyword.
## `params`

Use the secret replacement syntax to reference a secret, e.g. `${secrets:my_token}`.
| Parameter Name | Description |
|---|---|
| `mode` | The execution mode for querying against Databricks. The default is `spark_connect`. Possible values:<br />`spark_connect`: Use Spark Connect to query against Databricks. Requires a Spark cluster to be available.<br />`delta_lake`: Query directly from Delta Tables. Requires the object store credentials to be provided.<br />`sql_warehouse`: Query using the SQL Statement Execution API against a Databricks SQL Warehouse. |
| `databricks_endpoint` | The endpoint of the Databricks instance. Required for all modes. |
| `databricks_sql_warehouse_id` | The ID of the SQL Warehouse in Databricks to use for the query. Only valid when `mode` is `sql_warehouse`. |
| `databricks_cluster_id` | The ID of the compute cluster in Databricks to use for the query. Only valid when `mode` is `spark_connect`. |
| `databricks_use_ssl` | If true, use a TLS connection to connect to the Databricks endpoint. Default is `true`. |
| `client_timeout` | Optional. Applicable only in `delta_lake` mode. Specifies the timeout for object store operations. Default value is `30s`, e.g. `client_timeout: 60s`. |
| `databricks_token` | The Databricks API token to authenticate with the Unity Catalog API. Can't be used with `databricks_client_id` and `databricks_client_secret`. |
| `databricks_client_id` | The Databricks Service Principal Client ID. Can't be used with `databricks_token`. |
| `databricks_client_secret` | The Databricks Service Principal Client Secret. Can't be used with `databricks_token`. |
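A sketch of a `spark_connect` configuration using a personal access token; the endpoint and cluster ID are placeholder values:

```yaml
datasets:
  - from: databricks:my_catalog.my_schema.my_table
    name: my_databricks_table
    params:
      mode: spark_connect
      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
      databricks_cluster_id: 1234-567890-abcde123
      databricks_token: ${secrets:my_token}
```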
To learn more about how to set up personal access tokens, see Databricks PAT docs.
Spice supports the Machine-to-Machine (M2M) OAuth flow with service principal credentials by utilizing the `databricks_client_id` and `databricks_client_secret` parameters. The runtime will automatically refresh the token.
Ensure that you grant your service principal the "Data Reader" privilege preset for the catalog and "Can Attach" cluster permissions when using Spark Connect mode.
To learn more about how to set up the service principal, see the Databricks M2M OAuth docs.
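A sketch of the M2M OAuth variant of the same configuration, with the service principal credentials referenced from the secret store (endpoint and cluster ID are placeholders):

```yaml
params:
  mode: spark_connect
  databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
  databricks_cluster_id: 1234-567890-abcde123
  databricks_client_id: ${secrets:databricks_client_id}
  databricks_client_secret: ${secrets:databricks_client_secret}
```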
Configure the connection to the object store when using `mode: delta_lake`. Use the secret replacement syntax to reference a secret, e.g. `${secrets:aws_access_key_id}`.
| Parameter Name | Description |
|---|---|
| `databricks_aws_region` | Optional. The AWS region for the S3 object store. E.g. `us-west-2`. |
| `databricks_aws_access_key_id` | The access key ID for the S3 object store. |
| `databricks_aws_secret_access_key` | The secret access key for the S3 object store. |
| `databricks_aws_endpoint` | Optional. The endpoint for the S3 object store. E.g. `s3.us-west-2.amazonaws.com`. |
| `databricks_aws_allow_http` | Optional. Enables insecure HTTP connections to `databricks_aws_endpoint`. Defaults to `false`. |
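A sketch of a `delta_lake` configuration backed by S3; the endpoint is a placeholder and the credentials are assumed to be stored in the secret store:

```yaml
params:
  mode: delta_lake
  databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
  databricks_token: ${secrets:my_token}
  databricks_aws_region: us-west-2
  databricks_aws_access_key_id: ${secrets:aws_access_key_id}
  databricks_aws_secret_access_key: ${secrets:aws_secret_access_key}
```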
:::info Note

One of the following auth values must be provided for Azure Blob:

- `databricks_azure_storage_account_key`,
- `databricks_azure_storage_client_id` and `databricks_azure_storage_client_secret`, or
- `databricks_azure_storage_sas_key`.

:::

| Parameter Name | Description |
|---|---|
| `databricks_azure_storage_account_name` | The Azure Storage account name. |
| `databricks_azure_storage_account_key` | The Azure Storage key for accessing the storage account. |
| `databricks_azure_storage_client_id` | The Service Principal client ID for accessing the storage account. |
| `databricks_azure_storage_client_secret` | The Service Principal client secret for accessing the storage account. |
| `databricks_azure_storage_sas_key` | The shared access signature key for accessing the storage account. |
| `databricks_azure_storage_endpoint` | Optional. The endpoint for the Azure Blob storage account. |
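A sketch of a `delta_lake` configuration backed by Azure Blob storage, using the account-key auth option; the endpoint and account name are placeholders:

```yaml
params:
  mode: delta_lake
  databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
  databricks_token: ${secrets:my_token}
  databricks_azure_storage_account_name: myaccount
  databricks_azure_storage_account_key: ${secrets:azure_storage_account_key}
```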
| Parameter Name | Description |
|---|---|
| `google_service_account` | Filesystem path to the Google service account JSON key file. |
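A sketch of a `delta_lake` configuration backed by Google Cloud Storage; the endpoint and key-file path are placeholders:

```yaml
params:
  mode: delta_lake
  databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com
  databricks_token: ${secrets:my_token}
  google_service_account: /path/to/service-account.json
```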
The table below shows the Databricks (`mode: delta_lake`) data types supported, along with the type mapping to Apache Arrow types in Spice.
| Databricks SQL Type | Arrow Type |
|---|---|
| STRING | Utf8 |
| BIGINT | Int64 |
| INT | Int32 |
| SMALLINT | Int16 |
| TINYINT | Int8 |
| FLOAT | Float32 |
| DOUBLE | Float64 |
| BOOLEAN | Boolean |
| BINARY | Binary |
| DATE | Date32 |
| TIMESTAMP | Timestamp(Microsecond, Some("UTC")) |
| TIMESTAMP_NTZ | Timestamp(Microsecond, None) |
| DECIMAL | Decimal128 |
| ARRAY | List |
| STRUCT | Struct |
| MAP | Map |
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation. Additionally, learn how to use referenced secrets in component parameters by visiting the using referenced secrets guide.
The Databricks connector (`mode: delta_lake`) does not support reading Delta tables with the V2Checkpoint feature enabled. To use the Databricks connector (`mode: delta_lake`) with such tables, drop the V2Checkpoint feature by executing the following command:
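The command was not preserved above; per the Delta Lake `ALTER TABLE` documentation, dropping the feature takes this general form (replace `<table-name>` with your table; `TRUNCATE HISTORY` is optional):

```sql
ALTER TABLE <table-name> DROP FEATURE v2Checkpoint [TRUNCATE HISTORY];
```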
For more details on dropping Delta table features, refer to the official documentation: Drop Delta table features
When using `mode: spark_connect`, correlated scalar subqueries can only be used in filters, aggregations, projections, and UPDATE/MERGE/DELETE commands. See the Spark docs for details.
:::warning[Memory Considerations]
When using the Databricks (`mode: delta_lake`) Data Connector without acceleration, data is loaded into memory during query execution. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.

Memory limitations can be mitigated by storing acceleration data on disk, which is supported by the `duckdb` and `sqlite` accelerators by specifying `mode: file`.

The Databricks connector (`mode: spark_connect`) does not yet support streaming query results from Spark.

:::
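A sketch of disk-backed acceleration for a Databricks dataset, assuming the general spicepod `acceleration` block; dataset names are placeholders:

```yaml
datasets:
  - from: databricks:my_catalog.my_schema.my_table
    name: my_databricks_table
    params:
      mode: delta_lake
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
```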