---
title: 'DynamoDB Data Connector'
sidebar_label: 'DynamoDB Data Connector'
description: 'DynamoDB Data Connector Documentation'
tags:
---
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. This connector enables using DynamoDB tables as data sources for federated SQL queries in Spice.
## `from`

The `from` field should specify the DynamoDB table name:
| `from` | Description |
|---|---|
| `dynamodb:table` | Read data from a DynamoDB table named `table` |
:::note
If an expected table is not found, verify the `dynamodb_aws_region` parameter. DynamoDB tables are region-specific.
:::
## `name`

The dataset name. This is used as the table name within Spice.
Example:
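A minimal dataset definition might look like the following (the table name `my_table` is illustrative):

```yaml
datasets:
  - from: dynamodb:my_table # DynamoDB table named "my_table"
    name: my_table          # table name exposed within Spice
    params:
      dynamodb_aws_region: us-east-1
```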
The dataset name cannot be a reserved keyword.
## `params`

The DynamoDB data connector supports the following configuration parameters:
| Parameter Name | Description |
|---|---|
| `dynamodb_aws_region` | Required. The AWS region containing the DynamoDB table |
| `dynamodb_aws_access_key_id` | Optional. AWS access key ID for authentication. If not provided, credentials are loaded from environment variables or IAM roles |
| `dynamodb_aws_secret_access_key` | Optional. AWS secret access key for authentication. If not provided, credentials are loaded from environment variables or IAM roles |
| `dynamodb_aws_session_token` | Optional. AWS session token for authentication |
| `dynamodb_aws_auth` | Optional. Authentication method. Use `iam_role` (default) for IAM role-based authentication or `key` for explicit access key credentials |
| `dynamodb_aws_iam_role_source` | Optional. IAM role credential source (only used when `dynamodb_aws_auth: iam_role`). `auto` (default) uses the default AWS credential chain, `metadata` uses only instance/container metadata (IMDS, ECS, EKS/IRSA), `env` uses only environment variables |
| `unnest_depth` | Optional. Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields |
| `schema_infer_max_records` | Optional. The number of documents used to infer the schema. Defaults to 10 |
| `scan_segments` | Optional. Number of segments for the Scan request. Defaults to `auto`, which calculates the number of segments based on the number of records in the table |
| `time_format` | Optional. Go-style time format used for parsing/formatting timestamps. See Time Format |
## Authentication

The DynamoDB connector supports two authentication methods, controlled by the `dynamodb_aws_auth` parameter:
### IAM Role Authentication (`dynamodb_aws_auth: iam_role`)

This is the default authentication method. When using IAM role authentication, the `dynamodb_aws_iam_role_source` parameter controls which credential sources are used:
| Source Value | Description | Credential Sources |
|---|---|---|
| `auto` (default) | Uses the default AWS credential chain | All sources listed below, in order |
| `metadata` | Uses only instance/container metadata | Web Identity Token, ECS Container Credentials, EC2 Instance Metadata (IMDSv2) |
| `env` | Uses only environment variables | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` |
:::note
When using `iam_role` authentication, any explicitly provided access keys (`dynamodb_aws_access_key_id`, `dynamodb_aws_secret_access_key`) are ignored.
:::
### Key Authentication (`dynamodb_aws_auth: key`)

When `dynamodb_aws_auth` is set to `key`, credentials must be provided explicitly:
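For example, assuming secrets named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are configured in your secret store (referenced with Spice's `${secrets:...}` syntax):

```yaml
datasets:
  - from: dynamodb:my_table
    name: my_table
    params:
      dynamodb_aws_region: us-east-1
      dynamodb_aws_auth: key
      dynamodb_aws_access_key_id: ${secrets:AWS_ACCESS_KEY_ID}
      dynamodb_aws_secret_access_key: ${secrets:AWS_SECRET_ACCESS_KEY}
```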
### Default Credential Chain (`auto`)

When using `dynamodb_aws_auth: iam_role` with `dynamodb_aws_iam_role_source: auto` (or when both parameters are omitted), credentials are loaded from the following sources, in order:
1. **Environment Variables**:
   - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
   - `AWS_SESSION_TOKEN` (if using temporary credentials)
2. **Shared AWS Config/Credentials Files**:
   - Config file: `~/.aws/config` (Linux/Mac) or `%UserProfile%\.aws\config` (Windows)
   - Credentials file: `~/.aws/credentials` (Linux/Mac) or `%UserProfile%\.aws\credentials` (Windows)
   - The `AWS_PROFILE` environment variable can be used to specify a named profile; otherwise the `[default]` profile is used.
   - Supports both static credentials and SSO sessions.

   Example credentials file:

   :::tip
   To set up SSO authentication:

   1. Run `aws configure sso` to configure a new SSO profile
   2. Set `AWS_PROFILE=sso-profile`
   3. Run `aws sso login --profile sso-profile` to start a new SSO session
   :::

3. **AWS STS Web Identity Token Credentials**
4. **ECS Container Credentials**:
   - `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` or `AWS_CONTAINER_CREDENTIALS_FULL_URI`, which are automatically injected by ECS
5. **AWS EC2 Instance Metadata Service (IMDSv2)**

The connector tries each source in order until valid credentials are found. If no valid credentials are found, an authentication error is returned.
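The shared credentials file referenced above typically looks like this (the key values are the standard placeholders from AWS documentation; the `sso-profile` section is an illustrative SSO profile):

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

[profile sso-profile]
sso_start_url = https://my-sso-portal.awsapps.com/start
sso_region = us-east-1
sso_account_id = 123456789012
sso_role_name = MyRole
region = us-east-1
```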
:::note[IAM Permissions]
Regardless of the credential source, the IAM role or user must have appropriate DynamoDB permissions (e.g., `dynamodb:Scan`, `dynamodb:Query`, `dynamodb:DescribeTable`) to access the tables. If the Spicepod connects to multiple AWS services, the permissions should cover all of them.
:::
The IAM role or user needs the following permissions to access DynamoDB tables:
| Permission | Purpose |
|---|---|
| `dynamodb:Scan` | Required. Allows reading all items from the table |
| `dynamodb:Query` | Required. Allows reading items from the table using the partition key |
| `dynamodb:DescribeTable` | Required. Allows fetching table metadata and schema information |
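A minimal IAM policy granting these permissions might look like the following (the table ARN is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:DescribeTable"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/my_table"
    }
  ]
}
```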
:::warning[Security Considerations]
Avoid granting broad `dynamodb:*` permissions, as that grants more access than necessary.
:::
## Data Types

The table below shows the supported DynamoDB data types, along with the mapping to Apache Arrow types in Spice.
| DynamoDB Type | Description | Arrow Type | Notes |
|---|---|---|---|
| `Bool` | Boolean | `Boolean` | |
| `S` | String | `Utf8` | |
| `S` | String | `Timestamp(Millisecond)` | Naive timestamp if it matches `time_format` without timezone |
| `S` | String | `Timestamp(Millisecond, <timezone>)` | Timezone-aware timestamp if it matches `time_format` with timezone |
| `Ss` | String Set | `List<Utf8>` | |
| `N` | Number | `Int64` or `Float64` | |
| `Ns` | Number Set | `List<Int64>` or `List<Float64>` | |
| `B` | Binary | `Binary` | |
| `Bs` | Binary Set | `List<Binary>` | |
| `L` | List | `List<Utf8>` | DynamoDB arrays can be heterogeneous, e.g. `[1, "foo", true]`; Arrow arrays must be homogeneous, so strings are used to preserve all data |
| `M` | Map | `Utf8` or unflattened | Depending on the `unnest_depth` value |
## Time Format

Since DynamoDB stores timestamps as strings, Spice supports parsing timestamps using a customizable format. By default, Spice tries to parse timestamps using the ISO8601 format, but you can provide a custom format using the `time_format` parameter.
Once Spice is able to parse a timestamp, it converts it to a `Timestamp(Millisecond)` Arrow type and uses the same format to serialize it back to DynamoDB for filter pushdown.
This parameter uses Go-style time formatting, which uses a reference time of `Mon Jan 2 15:04:05 MST 2006`.
| Format Pattern | Example Value | Description |
|---|---|---|
| `2006-01-02T15:04:05Z07:00` | `2024-03-15T14:30:00Z` | ISO8601 / RFC3339 with timezone (default) |
| `2006-01-02T15:04:05.999Z07:00` | `2024-03-15T14:30:00.123-07:00` | ISO8601 with milliseconds and timezone |
| `2006-01-02T15:04:05` | `2024-03-15T14:30:00` | ISO8601 without timezone (naive timestamp) |
| `2006-01-02 15:04:05` | `2024-03-15 14:30:00` | Date and time with space separator |
| `01/02/2006 15:04:05` | `03/15/2024 14:30:00` | US-style date with time |
| `02/01/2006 15:04:05` | `15/03/2024 14:30:00` | European-style date with time |
| `Jan 2, 2006 3:04:05 PM` | `Mar 15, 2024 2:30:00 PM` | Human-readable with 12-hour clock |
| `20060102150405` | `20240315143000` | Compact format (no separators) |
Go's format uses specific reference values that must appear exactly as shown:
| Component | Reference Value | Alternatives |
|---|---|---|
| Year | `2006` | `06` (2-digit) |
| Month | `01` | `1`, `Jan`, `January` |
| Day | `02` | `2` |
| Hour (24h) | `15` | — |
| Hour (12h) | `03` | `3` |
| Minute | `04` | `4` |
| Second | `05` | `5` |
| AM/PM | `PM` | `pm` |
| Timezone | `Z07:00` | `-0700`, `MST` |
| Milliseconds | `.000` | `.999` (trailing zeros trimmed) |
| Microseconds | `.000000` | `.999999` (trailing zeros trimmed) |
| Nanoseconds | `.000000000` | `.999999999` (trailing zeros trimmed) |
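For example, to parse timestamps stored in a US-style format like `03/15/2024 14:30:00`, a dataset might set (table name and field contents are illustrative):

```yaml
datasets:
  - from: dynamodb:events
    name: events
    params:
      dynamodb_aws_region: us-east-1
      time_format: '01/02/2006 15:04:05'
```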
Consider the following document:
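As an illustrative example (item contents are hypothetical), a DynamoDB item with nested maps in DynamoDB JSON form:

```json
{
  "PK": { "S": "user#1" },
  "Name": { "S": "Joe" },
  "Address": {
    "M": {
      "City": { "S": "Seattle" },
      "Geo": { "M": { "Lat": { "N": "47.6" }, "Lon": { "N": "-122.3" } } }
    }
  }
}
```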
Using `unnest_depth`, you can control the unnesting behavior. For example:
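A sketch of the configuration (the resulting column naming is illustrative):

```yaml
# unnest_depth: 0 keeps map attributes as JSON strings (Utf8)
# unnest_depth: 1 expands one level of map keys into discrete columns
datasets:
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: us-east-1
      unnest_depth: 1
```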
## JSON Column Nesting

When working with DynamoDB tables that have many columns, you can consolidate unspecified columns into a single JSON column using the `json_object` metadata option. This is useful when you only need a few columns as discrete fields and want to bundle the remaining columns into a single JSON structure.

To use JSON nesting, define your desired columns explicitly in the `columns` list and add a "catch-all" column with `json_object: "*"` metadata. Any columns from the source table that are not explicitly listed are nested into this JSON column.
Given a DynamoDB table with this schema:
| Column | Type |
|---|---|
| PK | String |
| SK | String |
| Foo | Map |
| Bar | List |
| Baz | String |
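A configuration along these lines (a sketch; the exact `columns` syntax may differ in your Spice version) lists `PK`, `SK`, and `Baz` explicitly and adds a catch-all `data_json` column:

```yaml
datasets:
  - from: dynamodb:my_table
    name: my_table
    params:
      dynamodb_aws_region: us-east-1
    columns:
      - name: PK
      - name: SK
      - name: Baz
      - name: data_json
        metadata:
          json_object: '*'
```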
The configuration above produces:
| Column | Type |
|---|---|
| PK | String |
| SK | String |
| Baz | String |
| data_json | JSON ({"Foo": <map>, "Bar": <list>}) |
The Foo and Bar columns, which were not explicitly listed, are automatically nested into the data_json column as a JSON object.
### Combining with `unnest_depth`

When both `unnest_depth` and `json_object` are specified, the operations are applied in this order:

1. Columns are unnested according to the `unnest_depth` value
2. Remaining unspecified columns are nested into the `json_object` column

Consider this DynamoDB dataset:
```
+----+------+-------------------+------------------------------------------------------+
| PK | SK   | Baz               | Foo                                                  |
+----+------+-------------------+------------------------------------------------------+
| 1  | 200  | some_string_value | { "Age" : { "N" : "35" }, "Name" : { "S" : "Joe" } } |
+----+------+-------------------+------------------------------------------------------+
```
And this configuration:
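A configuration along the following lines (a sketch; the exact `columns` syntax may differ in your Spice version):

```yaml
datasets:
  - from: dynamodb:my_table
    name: my_table
    params:
      dynamodb_aws_region: us-east-1
      unnest_depth: 1
    columns:
      - name: PK
      - name: SK
      - name: json_data
        metadata:
          json_object: '*'
```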
Will produce the following Spice dataset:
```
+----+-----+---------------------------------------------------------------+
| PK | SK  | json_data                                                     |
+----+-----+---------------------------------------------------------------+
| 1  | 200 | {"Baz":"some_string_value", "Foo.Age":35.0, "Foo.Name":"Joe"} |
+----+-----+---------------------------------------------------------------+
```
:::warning[Limitations]

- The `json_object` metadata only accepts `"*"` as its value, which captures all unspecified columns
- Only one column can have `json_object` metadata. Specifying multiple columns with `json_object` will result in an error

:::
DynamoDB supports complex nested JSON structures. These fields can be queried using SQL:
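For example (a hypothetical `users` table with a nested `address` map; the bracket syntax for nested access follows DataFusion SQL, which Spice uses):

```sql
-- Select a nested field and filter on another nested field
SELECT name, address['city'] AS city
FROM users
WHERE address['country'] = 'US';
```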
:::warning[Limitations]
:::
Example schema from a users table:
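An illustrative nested item from such a table (field names are hypothetical):

```json
{
  "user_id": { "S": "user#42" },
  "name": { "S": "Joe" },
  "address": {
    "M": {
      "city": { "S": "Seattle" },
      "country": { "S": "US" }
    }
  },
  "tags": { "L": [ { "S": "admin" }, { "S": "beta" } ] }
}
```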
## DynamoDB Streams

The DynamoDB Data Connector integrates with DynamoDB Streams to enable real-time streaming of table changes. This feature supports both initial table bootstrapping and continuous change data capture (CDC), so Spice can automatically detect and stream inserts, updates, and deletes from DynamoDB tables.
:::warning
Using DynamoDB Streams requires acceleration with `refresh_mode: changes`.
:::
To enable streaming from DynamoDB, enable acceleration and set the `refresh_mode` to `changes` in your dataset configuration.
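A minimal streaming configuration might look like the following (the table name `orders` is illustrative):

```yaml
datasets:
  - from: dynamodb:orders
    name: orders
    params:
      dynamodb_aws_region: us-east-1
    acceleration:
      enabled: true
      refresh_mode: changes
```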
- `ready_lag` - Defines the maximum lag threshold before the dataset is reported as "Ready". Once the stream lag falls below this value, queries can be executed against the dataset. By default, the dataset is reported ready immediately after bootstrap completes.
- `scan_interval` - Controls the polling frequency for checking new records in the DynamoDB stream. Lower values provide more real-time updates but increase API calls; higher values reduce API usage but may introduce additional latency.
- `snapshots` - Optional. Controls snapshot behavior. Supported values are `disabled` (default), `enabled`, `create_only`, and `bootstrap_only`.
- `snapshots_trigger` - Optional. Determines the type of trigger for creating snapshots. Supported values are `time_interval` (default) and `stream_batches`.
- `snapshots_trigger_threshold` - Optional. Threshold value for snapshot creation. The format depends on the `snapshots_trigger` type:
  - When `snapshots_trigger` is `stream_batches`: a raw integer specifying the number of batches (e.g., `100`, `1000`).
  - When `snapshots_trigger` is `time_interval`: an integer with a time unit suffix (e.g., `10m`, `30s`, `1h`).

See Acceleration snapshots for more details.
The following Component Metrics are provided for monitoring streaming performance and health:
| Metric | Type | Description |
|---|---|---|
| `shards_active` | Gauge | Current number of active shards in the stream |
| `records_consumed_total` | Counter | Total number of records consumed from the stream |
| `lag_ms` | Gauge | Current lag in milliseconds between the stream watermark and the current time |
| `errors_transient_total` | Counter | Total number of transient errors encountered while polling the stream |
These metrics are not enabled by default; enable them by setting the `metrics` parameter.
You can find an example dashboard for DynamoDB Streams in `monitoring/grafana-dashboard.json`.
For production workloads requiring fine-tuned control over streaming behavior and performance characteristics:
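Using the streaming parameters described above, an advanced configuration might look like the following sketch (the placement of these parameters under `params` and the specific values are assumptions for illustration):

```yaml
datasets:
  - from: dynamodb:orders
    name: orders
    params:
      dynamodb_aws_region: us-east-1
      ready_lag: 30s    # report Ready once stream lag drops below 30 seconds
      scan_interval: 5s # poll the stream every 5 seconds
      snapshots: enabled
      snapshots_trigger: time_interval
      snapshots_trigger_threshold: 10m
    acceleration:
      enabled: true
      refresh_mode: changes
```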
:::warning[Limitations]
Datasets accelerated with `refresh_mode: changes` do not support `refresh_sql`.
:::