---
title: 'S3 Data Connector Deployment Guide'
sidebar_label: 'Deployment Guide'
description: 'Operating guide for the S3 data connector in production: IAM, credential chains, file formats, metrics, and observability.'
sidebar_position: 10
pagination_prev: null
pagination_next: null
tags:
---
Production operating guide for the S3 data connector covering IAM authentication, credential chains, file-format tuning, metrics, and observability.
S3 authentication is selected via s3_auth:
| Value | Behavior |
|---|---|
| (unset) | Default AWS credential chain (IAM-based). Equivalent to iam_role with iam_role_source: auto. |
| iam_role | Load credentials from the AWS credential chain; the source is further narrowed by iam_role_source. |
| key | Use the explicit s3_key / s3_secret pair. Required for S3-compatible stores that do not speak IAM (MinIO, Cloudflare R2 with keys, Backblaze B2, etc.). |
| public | Unauthenticated access for public buckets. |
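As an illustration of the simplest mode in the table above, the following is a minimal sketch of a dataset that reads a public bucket without credentials. The bucket, path, and dataset name are hypothetical, and `file_format` is an assumption that this guide does not itself document.

```yaml
datasets:
  # Hypothetical public bucket and dataset name, for illustration only.
  - from: s3://example-public-bucket/trips/
    name: trips
    params:
      file_format: parquet   # assumed parameter, not described in this guide
      s3_auth: public        # unauthenticated access, per the table above
```

Omitting `s3_auth` entirely would instead fall back to the default AWS credential chain, as described in the first row of the table.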
When s3_auth is unset or iam_role, the credential source is controlled by iam_role_source:
| Value | Behavior |
|---|---|
| auto | Default AWS credential chain (env vars → shared credentials file → IMDS/ECS/IRSA). |
| metadata | Restrict to instance/container metadata only: IMDS (EC2), ECS task role, EKS IRSA (pod role). |
| env | Restrict to environment variables only (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN). |
For production on EKS or ECS, prefer iam_role_source: metadata to guarantee the runtime only draws credentials from the workload identity, never from ambient environment variables.
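The recommendation above might look like the following in a dataset definition. This is a sketch under assumptions: the bucket, path, dataset name, and region are placeholders, and `file_format` is not covered by this guide.

```yaml
datasets:
  # Hypothetical bucket and dataset name, for illustration only.
  - from: s3://example-bucket/events/
    name: events
    params:
      file_format: parquet        # assumed parameter, not described in this guide
      s3_region: us-east-1        # placeholder region
      s3_auth: iam_role           # use the AWS credential chain
      iam_role_source: metadata   # only IMDS / ECS task role / EKS IRSA
```

With `iam_role_source: metadata`, stray `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` variables in the pod environment should not be picked up, which is the point of the recommendation.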
For MinIO, R2, B2, or on-prem S3 gateways, set s3_auth: key and supply s3_key, s3_secret, and s3_endpoint.
Keys must be sourced from a secret store in production. See Secret Stores.
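A key-based configuration for an S3-compatible store could be sketched as follows. The endpoint URL, bucket, dataset name, and secret names are hypothetical. The `${secrets:...}` reference syntax and `file_format` are assumptions not stated in this guide, so confirm the exact form against Secret Stores.

```yaml
datasets:
  # Hypothetical MinIO bucket and dataset name, for illustration only.
  - from: s3://example-bucket/logs/
    name: logs
    params:
      file_format: parquet                             # assumed parameter
      s3_auth: key                                     # explicit key/secret pair
      s3_endpoint: https://minio.example.internal:9000 # hypothetical endpoint
      s3_region: us-east-1                             # a valid-looking AWS region code is still required
      s3_key: ${secrets:S3_KEY}                        # assumed secret reference syntax
      s3_secret: ${secrets:S3_SECRET}                  # assumed secret reference syntax
```

Keeping the key and secret as references rather than literals means the Spicepod file can be committed without exposing credentials.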
s3_region is validated against AWS's known region set. Uppercase regions are auto-corrected to lowercase with a warning. Unrecognized regions produce a startup warning but do not prevent the connector from starting. Custom S3-compatible endpoints still require a valid-looking AWS region code.
S3 I/O uses the AWS SDK's default retry strategy: standard adaptive backoff with retries on throttling (SlowDown, 503) and transient network errors. Per-operation retry parameters are not currently exposed at the Spice layer.
Authentication failures (401, 403) and missing buckets (404) surface immediately as query errors. Unlike the Databricks connector, the S3 connector does not permanently disable itself: subsequent queries re-attempt authentication, so transient IAM or network issues self-heal.
- Set hive_partitioning_enabled: true when listing partitioned datasets so DataFusion can prune irrelevant partitions at plan time instead of listing and filtering at execution time.
- Specify schema in the dataset definition for large datasets to avoid repeated list/head operations.

S3 I/O metrics are collected via the shared runtime-object-store layer (request counts, retries, bytes read) and are exposed through Spice's runtime metrics. See Component Metrics for configuration.
The connector does not currently register S3-specific dataset-level instruments. Monitor S3 health via:
- Amazon S3 CloudWatch request metrics (AllRequests, 4xxErrors, 5xxErrors, TotalRequestLatency).
- Query-level metrics (query_duration_ms, query_processed_rows) from runtime.metrics.

S3 object reads participate in Spice task history through DataFusion's object-store plan nodes. Individual object GETs are attributed to their enclosing sql_query or accelerated_table_refresh task via the DataFusion execution plan.
s3:// URIs when the region and endpoint match.

| Symptom | Likely cause | Resolution |
|---|---|---|
| The request signature we calculated does not match the signature you provided | Clock skew or wrong s3_key/s3_secret. | Verify secret values; check system clock (AWS tolerates only ~15 min drift). |
| Access Denied | IAM policy lacks s3:GetObject or s3:ListBucket. | Attach a policy granting read on the bucket and prefix. Cross-account buckets also need a bucket policy. |
| NoSuchBucket | Bucket does not exist in the configured region. | Confirm bucket name and s3_region. |
| EnvCredentialsNotSet on EKS | iam_role_source: env while running under IRSA. | Set iam_role_source: metadata or auto. |
| InvalidSignatureException against MinIO/R2 | s3_endpoint not set or AWS SDK trying to sign for AWS S3. | Set s3_endpoint and s3_region to match the S3-compatible provider. |
| Slow queries on large partitioned datasets | Hive partitioning not enabled; every scan lists all files. | Set hive_partitioning_enabled: true and encode partitions as key=value/ in the path. |
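For the last row of the table, a Hive-partitioned dataset might be configured roughly as follows. The bucket, path layout, and dataset name are hypothetical, and `file_format` is an assumption not covered by this guide.

```yaml
datasets:
  # Hypothetical object layout, with partitions encoded as key=value/ segments:
  #   s3://example-bucket/events/year=2024/month=06/part-0000.parquet
  - from: s3://example-bucket/events/
    name: events
    params:
      file_format: parquet             # assumed parameter
      hive_partitioning_enabled: true  # lets DataFusion prune partitions at plan time
```

With this layout, a query that filters on `year` and `month` should only need to touch the matching prefixes rather than every object under `events/`.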