spiceai/docs

/docs/website/versioned_docs/version-1.11.x/features/query-federation/url-tables.md

title: 'URL Tables'
sidebar_label: 'URL Tables'
description: 'Query object store files directly using URLs without pre-registering datasets'
sidebar_position: 3
tags:
  - query
  - sql
  - features
  - s3
  - azure

URL tables let you query files in object stores directly by URL, without pre-registering datasets in a Spicepod. This supports ad-hoc exploration of data stored in S3, Azure Blob Storage, or at HTTP(S) endpoints.

Enabling URL Tables

URL tables are disabled by default and must be explicitly enabled in the Spicepod configuration:

Supported URL Schemes

Scheme	Description	Example
`s3://`	Amazon S3	`s3://bucket/path/file.parquet`
`abfs://`	Azure Blob Storage	`abfs://container@account/path/file.parquet`
`abfss://`	Azure Data Lake Storage Gen2	`abfss://container@account.dfs.core.windows.net/path/`
`https://`	HTTPS endpoints	`https://example.com/data.parquet`
`http://`	HTTP endpoints	`http://localhost:8080/data.csv`
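
As a quick way to check whether a URL falls under one of the schemes in the table, the following sketch maps a URL to its backend. The `describe_url` helper is hypothetical and only mirrors the table; it is not part of Spice.

```python
from urllib.parse import urlsplit

# Scheme-to-backend mapping copied from the table above.
SUPPORTED_SCHEMES = {
    "s3": "Amazon S3",
    "abfs": "Azure Blob Storage",
    "abfss": "Azure Data Lake Storage Gen2",
    "https": "HTTPS endpoints",
    "http": "HTTP endpoints",
}


def describe_url(url: str) -> str:
    """Return the backend name for a URL, or raise ValueError if the scheme is not listed."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return SUPPORTED_SCHEMES[scheme]


if __name__ == "__main__":
    print(describe_url("s3://bucket/path/file.parquet"))
    print(describe_url("abfss://container@account.dfs.core.windows.net/path/"))
```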

Query Patterns

Single File

Query a single file by specifying its full URL:
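
A minimal sketch of what such a query might look like when built from client code. It assumes the URL is written as a single-quoted table reference in the `FROM` clause; that syntax, and the helper names `url_table_ref` and `select_from_url`, are assumptions for illustration rather than confirmed Spice API.

```python
def url_table_ref(url: str) -> str:
    """Quote a URL as a SQL string literal, doubling any embedded single quotes."""
    return "'" + url.replace("'", "''") + "'"


def select_from_url(url: str, limit: int = 10) -> str:
    """Build a SELECT over a single file URL (assumed syntax: quoted URL as the table)."""
    return f"SELECT * FROM {url_table_ref(url)} LIMIT {int(limit)}"


if __name__ == "__main__":
    # Example URL taken from the Supported URL Schemes table.
    print(select_from_url("s3://bucket/path/file.parquet"))
```

The resulting string can be submitted through whichever SQL client is already in use.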

Directory or Prefix

Query all files under a directory or prefix by including a trailing slash:
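
To illustrate which objects a trailing-slash URL is expected to cover, the sketch below filters a list of object URLs by prefix. The object names are made up, and the runtime's actual listing logic may differ.

```python
def urls_under_prefix(urls, prefix):
    """Return the URLs that sit under a directory-style prefix ending in '/'."""
    if not prefix.endswith("/"):
        raise ValueError("a directory or prefix URL should end with '/'")
    return [u for u in urls if u.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical bucket contents.
    objects = [
        "s3://bucket/path/a.parquet",
        "s3://bucket/path/sub/b.parquet",
        "s3://bucket/other/c.parquet",
    ]
    print(urls_under_prefix(objects, "s3://bucket/path/"))
```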

Glob Patterns

Use glob patterns to match specific files:
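
The sketch below shows one common interpretation of glob matching, where `*` matches within a single path segment and never crosses a `/`. It is an illustration under that assumption, does not handle `**`, and may not match the runtime's exact glob semantics. The `glob_match` helper and the object names are hypothetical.

```python
from fnmatch import fnmatchcase


def glob_match(pattern: str, url: str) -> bool:
    """Match segment by segment so that wildcards never span a '/'."""
    pattern_parts = pattern.split("/")
    url_parts = url.split("/")
    if len(pattern_parts) != len(url_parts):
        return False
    return all(fnmatchcase(u, p) for p, u in zip(pattern_parts, url_parts))


if __name__ == "__main__":
    pattern = "s3://bucket/path/*.parquet"
    for candidate in (
        "s3://bucket/path/a.parquet",
        "s3://bucket/path/a.csv",
        "s3://bucket/path/sub/b.parquet",
    ):
        print(candidate, glob_match(pattern, candidate))
```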

Hive-Style Partitions

Hive-style partitions are automatically inferred from the path structure, enabling partition pruning:
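
The idea behind partition inference and pruning can be sketched as follows: `key=value` directory segments become partition columns, and a filter on those columns lets whole files be skipped without reading them. The paths and the helper names `hive_partitions` and `prune` are illustrative assumptions, not the runtime's implementation.

```python
def hive_partitions(path: str) -> dict:
    """Extract key=value directory segments from a path (the file name is ignored)."""
    partitions = {}
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key and value:
            partitions[key] = value
    return partitions


def prune(paths, **filters):
    """Keep only the paths whose partition values satisfy every equality filter."""
    return [
        p for p in paths
        if all(hive_partitions(p).get(k) == v for k, v in filters.items())
    ]


if __name__ == "__main__":
    # Hypothetical partitioned layout.
    files = [
        "s3://bucket/events/year=2024/month=01/part-0.parquet",
        "s3://bucket/events/year=2024/month=02/part-0.parquet",
        "s3://bucket/events/year=2023/month=12/part-0.parquet",
    ]
    print(hive_partitions(files[0]))
    print(prune(files, year="2024", month="02"))
```

A query that filters on `year` and `month` would then only need to open the files that survive pruning.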

Authentication

URL tables use the same authentication mechanisms as the corresponding data connectors. Credentials are loaded automatically from environment variables or cloud provider defaults.

S3

For S3, credentials are loaded from:

Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
Shared AWS credentials file (~/.aws/credentials)
IAM instance profiles or roles

For public buckets, no authentication is required.
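
When a query fails with an authentication error, it can help to check which of these sources are visible to the process. The sketch below is a local diagnostic only: the helper name is hypothetical, it does not reproduce the runtime's lookup order, and it cannot detect IAM instance profiles or roles.

```python
import os
from pathlib import Path


def available_s3_credential_sources(env=None, home=None):
    """List the locally detectable credential sources from the list above."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Static keys; AWS_SESSION_TOKEN is optional and only accompanies temporary credentials.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    # Shared AWS credentials file.
    if (home / ".aws" / "credentials").is_file():
        sources.append("shared credentials file")
    return sources


if __name__ == "__main__":
    print(available_s3_credential_sources())
```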

Azure Blob Storage

For Azure, set the storage account name via environment variable:

Alternatively, include the account name in the URL:
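
The `container@account` form shown in the Supported URL Schemes table can be taken apart with a standard URL parser, which makes the role of each component explicit. This is a sketch of the URL anatomy only; `parse_abfs_url` is a hypothetical helper and says nothing about how the runtime resolves the account.

```python
from urllib.parse import urlsplit


def parse_abfs_url(url: str) -> dict:
    """Split an abfs:// or abfss:// URL of the form container@account/path."""
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an Azure URL: {url!r}")
    return {
        "container": parts.username,          # text before '@'
        "account": parts.hostname,            # text after '@', up to the first '/'
        "path": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(parse_abfs_url("abfs://container@account/path/file.parquet"))
    print(parse_abfs_url("abfss://container@account.dfs.core.windows.net/path/"))
```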

Additional authentication options:

Environment variable: AZURE_STORAGE_KEY for access key authentication
Azure Managed Identity (automatic when running on Azure)
Azure CLI credentials

Examples

S3 Query

Azure Blob Storage Query

Set the account via environment variable:

Or include the account in the URL:

Cross-Source Query

URL tables can be combined with registered datasets in federated queries:
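
A sketch of how such a federated query might be assembled. The registered dataset `customers`, the columns `customer_id` and `amount`, and the single-quoted URL table syntax are all assumptions for illustration; substitute the names and syntax that apply to your Spicepod.

```python
# Example URL taken from the Supported URL Schemes table.
ORDERS_URL = "s3://bucket/path/file.parquet"

# Join an ad-hoc URL table (alias o) with a registered dataset (alias c).
query = f"""
SELECT c.customer_id, SUM(o.amount) AS total_amount
FROM '{ORDERS_URL}' AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.customer_id
""".strip()

if __name__ == "__main__":
    print(query)
```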

Considerations

Schema Inference: The schema is inferred from the files at query time. For best performance with large datasets, consider registering datasets in the Spicepod.
File Format Detection: File formats are automatically inferred from file extensions. Supported formats include Parquet, CSV, and JSON.
Performance: URL tables query data directly from the object store without local acceleration. For frequently accessed data or performance-critical queries, register datasets with data acceleration.
Authentication Scope: URL table queries use environment-level credentials. For queries requiring different credentials per source, register datasets with explicit authentication parameters.
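
Extension-based format detection, as described under File Format Detection, can be sketched like this. The mapping from `.parquet`, `.csv`, and `.json` to the three supported formats is an assumption based on the example URLs on this page; the runtime may recognize additional extensions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed extension-to-format mapping for the three formats named above.
FORMATS = {".parquet": "parquet", ".csv": "csv", ".json": "json"}


def infer_format(url: str):
    """Return the format implied by the file extension, or None if it is not recognized."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return FORMATS.get(suffix)


if __name__ == "__main__":
    for url in (
        "s3://bucket/path/file.parquet",
        "http://localhost:8080/data.csv",
        "s3://bucket/path/",
    ):
        print(url, infer_format(url))
```

A directory URL has no extension of its own, so in this sketch it yields `None`; the format would have to come from the files found under it.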