---
title: 'Data Connectors'
sidebar_label: 'Data Connectors'
description: 'Learn how to use Data Connectors to query external data.'
image: /img/og/data-connectors.png
sidebar_position: 1
pagination_prev: null
pagination_next: null
tags:
---
Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.
Supported Data Connectors include:
| Name | Description | Status | Protocol/Format |
|---|---|---|---|
| postgres | PostgreSQL, Amazon Redshift | Stable | PostgreSQL wire protocol |
| mysql | MySQL | Stable | |
| s3 | S3 | Stable | Parquet, CSV, JSON |
| file | File | Stable | Parquet, CSV, JSON |
| duckdb | DuckDB | Stable | Embedded |
| dremio | Dremio | Stable | Arrow Flight |
| spice.ai | Spice.ai OSS & Cloud | Stable | Arrow Flight |
| databricks (mode: delta_lake) | Databricks | Stable | S3/Delta Lake |
| delta_lake | Delta Lake | Stable | Delta Lake |
| github | GitHub | Stable | GitHub API |
| graphql | GraphQL | Release Candidate | JSON |
| databricks (mode: spark_connect) | Databricks | Beta | Spark Connect |
| flightsql | FlightSQL | Beta | Arrow Flight SQL |
| mssql | Microsoft SQL Server | Beta | Tabular Data Stream (TDS) |
| odbc | ODBC | Beta | ODBC |
| snowflake | Snowflake | Beta | Arrow |
| spark | Spark | Beta | Spark Connect |
| iceberg | Apache Iceberg | Beta | Parquet |
| abfs | Azure BlobFS | Alpha | Parquet, CSV, JSON |
| ftp, sftp | FTP/SFTP | Alpha | Parquet, CSV, JSON |
| glue | Glue | Alpha | Iceberg, Parquet, CSV |
| http, https | HTTP(s) | Alpha | Parquet, CSV, JSON |
| imap | IMAP | Alpha | IMAP Emails |
| localpod | Local dataset replication | Alpha | |
| oracle | Oracle | Alpha | Oracle ODPI-C |
| sharepoint | Microsoft SharePoint | Alpha | Unstructured UTF-8 documents |
| clickhouse | ClickHouse | Alpha | |
| debezium | Debezium CDC | Alpha | Kafka + JSON |
| kafka | Kafka | Alpha | Kafka + JSON |
| dynamodb | DynamoDB | Alpha | |
| mongodb | MongoDB | Alpha | |
| elasticsearch | Elasticsearch | Roadmap | |
For data connectors that are object-store compatible, if a folder is provided, the file format must be specified with `params.file_format`. If a file is provided, the file format will be inferred, and `params.file_format` is unnecessary.
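For example, a dataset pointing at a folder must declare its format explicitly. A minimal sketch, assuming a hypothetical bucket name, path, and dataset name:

```yaml
datasets:
  - from: s3://my-bucket/reports/ # folder: format cannot be inferred
    name: reports
    params:
      file_format: parquet
```

A dataset pointing at a single file (e.g. `s3://my-bucket/report.parquet`) could omit `params.file_format`.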
File formats currently supported are:
| Name | Parameter | Supported | Is Document Format |
|---|---|---|---|
| Apache Parquet | file_format: parquet | ✅ | ❌ |
| CSV | file_format: csv | ✅ | ❌ |
| Apache Iceberg | file_format: iceberg | Roadmap | ❌ |
| JSON | file_format: json | Roadmap | ❌ |
| Microsoft Excel | file_format: xlsx | Roadmap | ❌ |
| Markdown | file_format: md | ✅ | ✅ |
| Text | file_format: txt | ✅ | ✅ |
| PDF | file_format: pdf | Alpha | ✅ |
| Microsoft Word | file_format: docx | Alpha | ✅ |
File formats support additional parameters in `params` (like `csv_has_header`), described in File Formats.
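As an illustrative sketch (file path and dataset name are hypothetical), a CSV dataset whose files lack a header row might set:

```yaml
datasets:
  - from: file://data/events.csv
    name: events
    params:
      file_format: csv
      csv_has_header: false
```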
If a format is a document format, each file will be treated as a document, as per document support below.
:::warning[Note]
Document formats in Alpha (e.g. pdf, docx) may not parse all structure or text from the underlying documents correctly.
:::
If a Data Connector supports documents, when the appropriate file format is specified (see above), each file is treated as a row in the table, with the contents of the file in the `content` column. Additional columns may exist, depending on the data connector.
Consider a local filesystem
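For example, a folder of Markdown files (the folder and file names here are illustrative):

```
reports/
├── report_alpha.md
└── report_beta.md
```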
And the spicepod
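A minimal spicepod sketch for such a case, assuming an illustrative path and dataset name:

```yaml
version: v1beta1
kind: Spicepod
name: document-example
datasets:
  - from: file://reports/
    name: reports
    params:
      file_format: md
```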
A Document table will be created.
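Each file then appears as a row, so the document text can be read from the `content` column (the dataset name `reports` is illustrative):

```sql
SELECT content FROM reports LIMIT 1;
```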
import DocCardList from '@theme/DocCardList';

<DocCardList />