spiceai/docs

evgenii/docs-spicepod-v2

/docs/website/versioned_docs/version-2.0.x/components/data-connectors/glue.md

---
title: 'Glue Data Connector'
sidebar_label: 'Glue Data Connector'
description: 'Connect to and query tables in an AWS Glue Data Catalog'
tags:
  - data-connectors
  - glue
  - write
---

The Glue Data Connector enables federated SQL querying on tables in an AWS Glue Data Catalog.

Configuration

`from`

Specify a table using the format `glue:<database>.<table>`, replacing `<database>` with the name of the Glue database and `<table>` with the name of a table in that database.

`name`

The dataset name. This will be used as the table name within Spice.

Example:
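A minimal sketch of a dataset entry, assuming the usual Spicepod `datasets` list. The names `my_database`, `my_table`, and `cool_dataset` are placeholders, not values taken from this page.

```yaml
datasets:
  # `from` selects the Glue table; `name` is the table name used in Spice SQL.
  - from: glue:my_database.my_table
    name: cool_dataset
```

With an entry like this, queries would refer to the table as `cool_dataset`.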

The dataset name cannot be a reserved keyword.

`params`

The following parameters are supported for configuring the connection to the Glue Data Catalog:

| Parameter Name | Definition |
| --- | --- |
| `glue_region` | The AWS region for the Glue Data Catalog. E.g. `us-west-2`. |
| `glue_catalog_id` | The Glue catalog ID. For Amazon S3 Tables, use the format `<account_id>:s3tablescatalog/<table_bucket_name>`. If not provided, the default catalog for the account is used. |
| `glue_key` | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
| `glue_secret` | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
| `glue_session_token` | Session token (e.g. `AWS_SESSION_TOKEN` for AWS) for temporary credentials. |
| `glue_iam_role_source` | Optional. IAM role credential source. `auto` (default) uses the default AWS credential chain, `metadata` uses only instance/container metadata (IMDS, ECS, EKS/IRSA), `env` uses only environment variables. |
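As an illustration of the parameters above, the following sketch passes explicit credentials to the connector. The `${secrets:...}` references assume a configured secret store with those key names, and the database and table names are placeholders.

```yaml
datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_region: us-west-2
      # Explicit static credentials (assumed to be resolved from a secret store).
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
      # Only needed when using temporary credentials.
      glue_session_token: ${secrets:AWS_SESSION_TOKEN}
```

Omitting `glue_key` and `glue_secret` leaves credential loading to environment variables or IAM roles, as described in the table.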

The following parameters control how the embedded S3 reader fetches Parquet/CSV data files referenced by Glue table metadata. They are inherited from the S3 data connector and do not apply to Iceberg-format tables, whose object I/O is handled by the Iceberg client.

| Parameter Name | Definition |
| --- | --- |
| `glue_endpoint` | Optional. Custom S3-compatible endpoint URL used when reading Parquet/CSV data files (e.g. `https://s3.us-east-1.amazonaws.com`, `http://minio.local:9000`). Leave unset for AWS S3. |
| `glue_url_style` | Optional. S3 URL addressing style for Parquet/CSV data files. One of `vhost` or `path`. Auto-detected from the endpoint when unset. |
| `glue_versioning` | Optional. Enables S3 object versioning support for Parquet/CSV data files when set to `enabled`. Defaults to `enabled`. |
| `client_timeout` | Optional. Timeout for the underlying S3 client used to fetch Parquet/CSV data files. E.g. `30s`. |
| `allow_http` | Optional. Set to `true` to allow insecure HTTP for the S3 endpoint used to read Parquet/CSV data files. Defaults to `false`. Required when `glue_endpoint` uses an `http://` scheme. |
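The S3 reader parameters might be combined as in the sketch below, which points the Parquet/CSV reader at an S3-compatible endpoint. The endpoint is the example value from the table, and the database and table names are placeholders; treat this as an assumption-laden illustration rather than a tested configuration.

```yaml
datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_region: us-west-2
      # S3-compatible endpoint for Parquet/CSV data files.
      glue_endpoint: http://minio.local:9000
      # Path-style addressing instead of virtual-hosted style.
      glue_url_style: path
      # Needed here because the endpoint uses plain HTTP.
      allow_http: true
      client_timeout: 30s
```

These settings have no effect on Iceberg-format tables, whose object I/O is handled by the Iceberg client.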

Examples

Basic Glue Table
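A basic configuration might look like the following sketch. It assumes credentials are available from the environment or an IAM role, and uses placeholder database and table names.

```yaml
datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      # Region of the Glue Data Catalog that holds my_database.
      glue_region: us-west-2
```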

Amazon S3 Tables

Connect to tables in Amazon S3 Tables using the `glue_catalog_id` parameter with the S3 Tables catalog format:
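A sketch of an S3 Tables dataset, using the catalog ID format from the parameter table. The account ID `123456789012`, the table bucket `my-table-bucket`, and the namespace and table names are hypothetical.

```yaml
datasets:
  - from: glue:my_namespace.my_table
    name: my_table
    params:
      glue_region: us-west-2
      # Format: <account_id>:s3tablescatalog/<table_bucket_name>
      glue_catalog_id: "123456789012:s3tablescatalog/my-table-bucket"
```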

Authentication

If AWS credentials are not explicitly provided in the configuration, the connector will automatically load credentials from the following sources in order. These credentials will be used to connect to the S3 bucket as well as the Glue catalog.

1. Environment Variables:
   - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
   - `AWS_SESSION_TOKEN` (if using temporary credentials)
2. Shared AWS Config/Credentials Files:
   - Config file: `~/.aws/config` (Linux/Mac) or `%UserProfile%\.aws\config` (Windows)
   - Credentials file: `~/.aws/credentials` (Linux/Mac) or `%UserProfile%\.aws\credentials` (Windows)
   - The `AWS_PROFILE` environment variable can be used to specify a named profile, otherwise the `[default]` profile is used.
   - Supports both static credentials and SSO sessions
   - Example credentials file:

   :::tip
   To set up SSO authentication:
   1. Run `aws configure sso` to configure a new SSO profile
   2. Use the profile by setting `AWS_PROFILE=sso-profile`
   3. Run `aws sso login --profile sso-profile` to start a new SSO session
   :::

3. AWS STS Web Identity Token Credentials:
   - Used primarily with OpenID Connect (OIDC) and OAuth

The connector will try each source in order until valid credentials are found. If no valid credentials are found, an authentication error will be returned.
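To narrow this lookup, the `glue_iam_role_source` parameter can restrict which sources are consulted. The sketch below, with placeholder database and table names, omits explicit keys and limits the connector to environment variables.

```yaml
datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_region: us-west-2
      # No glue_key/glue_secret: credentials are loaded automatically.
      # `env` restricts the lookup to environment variables only;
      # `metadata` would use only instance/container metadata.
      glue_iam_role_source: env
```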

:::note[IAM Permissions]

Regardless of the credential source, the IAM role or user must have appropriate S3/Glue permissions (e.g., `s3:ListBucket`, `glue:GetTable`) to access the tables. If the Spicepod connects to multiple different AWS services, the permissions should cover all of them.

:::

Required IAM Permissions

The IAM role or user needs the following permissions to access Iceberg tables in S3/Glue:

Permission Details

| Permission | Purpose |
| --- | --- |
| `s3:ListBucket` | Required. Allows scanning all objects from the bucket. |
| `s3:GetObject` | Required. Allows fetching objects. |
| `s3:PutObject` | Required for write operations. Allows writing objects. |
| `glue:GetCatalog` | Required. Retrieve metadata about the specified catalog. |
| `glue:GetDatabases` | Required. List the databases available in the current catalog. |
| `glue:GetDatabase` | Required. Retrieve metadata about the specified database. |
| `glue:GetTable` | Required. Retrieve metadata about the specified table. |
| `glue:GetTables` | Required. List the tables available in the current database. |

Write Support

This connector supports writing data to Glue-managed Iceberg tables using SQL INSERT INTO statements. Writes are currently append-only — inserted data is added as new data files and registered through a new Iceberg table snapshot. Schema validation ensures inserted data matches the target table schema.

To enable writes, set `access: read_write` on the dataset:
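A sketch of a writable dataset, assuming a Glue-managed Iceberg table; the database and table names are placeholders.

```yaml
datasets:
  - from: glue:my_database.my_iceberg_table
    name: my_iceberg_table
    # Enables INSERT INTO against this dataset.
    access: read_write
    params:
      glue_region: us-west-2
```

An `INSERT INTO my_iceberg_table ...` statement would then append new data files to the table.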

Inserting into partitioned Iceberg tables is supported. UPDATE and DELETE operations are not currently supported.

Write operations require s3:PutObject permission on the target S3 bucket in addition to the read permissions listed above. For more details, see Data Ingestion.

Limitations

:::warning[Data Source/Data Format Restrictions]

This connector is limited to tables that use the S3 data source. Kinesis and Kafka data sources are not currently supported. Additionally, only Iceberg tables and tables in Parquet or CSV format are currently supported.

:::

:::warning[Performance Considerations]

When using the Glue Data Connector without acceleration, data is loaded into memory during query execution. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.

Memory limitations can be mitigated by storing acceleration data on disk, which the `duckdb` and `sqlite` accelerators support by specifying `mode: file`.

Each query retrieves data from the S3 source, which might result in significant network requests and bandwidth consumption. This can affect network performance and incur costs related to data transfer from S3.

:::
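The file-mode mitigation mentioned in the performance note could be configured roughly as follows. This is a sketch that assumes the standard dataset `acceleration` block; the database and table names are placeholders.

```yaml
datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_region: us-west-2
    acceleration:
      enabled: true
      # duckdb and sqlite are the accelerators noted as supporting file mode.
      engine: duckdb
      # Store accelerated data on disk instead of in memory.
      mode: file
```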

Cookbook

A cookbook recipe to configure Glue as a data connector in Spice: Glue Data Connector.