/docs/website/versioned_docs/version-1.11.x/components/data-accelerators/index.md

---
title: 'Data Accelerators'
sidebar_label: 'Data Accelerators'
description: 'Data acceleration engines for local materialization and query acceleration in Spice'
image: /img/og/data-accelerators.png
sidebar_position: 2
pagination_prev: null
pagination_next: null
---

Data sourced by Data Connectors can be locally materialized and accelerated using a Data Accelerator.

A Data Accelerator fetches data from a connected data source, stores it locally in an embedded acceleration engine such as Spice Cayenne, DuckDB, or SQLite, and keeps that local copy updated. To set data refresh behavior, such as refreshing data on an interval, see Data Refresh.

Dataset acceleration is enabled by setting the acceleration configuration:
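A minimal sketch of such a configuration in a `spicepod.yaml`. The dataset source and name are hypothetical, and connector parameters are omitted:

```yaml
# spicepod.yaml (fragment)
datasets:
  - from: postgres:public.orders   # hypothetical source; connection params omitted
    name: orders                   # hypothetical dataset name
    acceleration:
      enabled: true                # materialize this dataset locally
```

With no `engine` specified, this falls back to the default in-memory Arrow accelerator.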

For the complete reference specification, see datasets.

By default, datasets are locally materialized using in-memory Arrow records.

Supported Data Accelerators

| Name | Description | Status | Engine Modes |
| --- | --- | --- | --- |
| `arrow` | In-Memory Arrow Records | Stable | `memory` |
| `cayenne` | Spice Cayenne | Beta | `file`, `file_create`, `file_update` |
| `duckdb` | Embedded DuckDB | Stable | `memory`, `file` |
| `postgres` | Attached PostgreSQL | Release Candidate | N/A |
| `sqlite` | Embedded SQLite | Release Candidate | `memory`, `file` |
| `turso` | | | |

Choosing an Accelerator

Select the appropriate accelerator based on dataset size, query patterns, and resource constraints:

| Use Case | Recommended Accelerator | Rationale |
| --- | --- | --- |
| Small datasets (under 1 GB), maximum speed | `arrow` | In-memory storage provides lowest latency |
| Medium datasets (1-100 GB), complex SQL | `duckdb` | Mature SQL support with memory management |
| Large datasets (100 GB - 1+ TB), scalable analytics | `cayenne` | Vortex columnar format scales beyond single-file limits |
| Point lookups on large datasets | `cayenne` | Vortex provides 100x faster random access vs Parquet |
| Simple queries, low resource usage | `sqlite` | Lightweight, minimal overhead |
| Async operations, concurrent workloads | `turso` | Native async support, modern connection pooling |
| External database integration | `postgres` | Use existing PostgreSQL infrastructure |
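As a sketch of how a choice from the table above might be applied, the accelerator is selected per dataset with the `engine` field, using the name from the Supported Data Accelerators table. The dataset sources and names below are hypothetical, and connector parameters are omitted:

```yaml
# spicepod.yaml (fragment)
datasets:
  # Medium-sized dataset queried with complex SQL: DuckDB
  - from: postgres:public.events      # hypothetical source
    name: events                      # hypothetical dataset name
    acceleration:
      enabled: true
      engine: duckdb

  # Small lookup table where the default in-memory Arrow engine is enough
  - from: postgres:public.countries   # hypothetical source
    name: countries                   # hypothetical dataset name
    acceleration:
      enabled: true
      engine: arrow
```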

Spice Cayenne vs DuckDB

Both Spice Cayenne and DuckDB support file-based acceleration, but differ in architecture and performance characteristics:

Choose Spice Cayenne when:

Datasets exceed ~1 TB
Multi-file data ingestion is required (e.g., partitioned S3 data)
Lower memory overhead is preferred
Workloads benefit from Vortex's 10-20x faster scans
Point lookups and random access patterns are common (100x faster than Parquet)

Choose DuckDB when:

Datasets are under ~1 TB
Complex SQL features are required (window functions, CTEs)
Existing DuckDB tooling integration is beneficial
Explicit index control is required

Data Types

Data Accelerators may not support all possible Apache Arrow data types. For complete compatibility details, see specifications.

:::warning[Memory Considerations]

When accelerating a dataset using `mode: memory` (the default), some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.

In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by the `duckdb`, `sqlite`, and `turso` accelerators by specifying `mode: file`.

:::
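A minimal sketch of on-disk acceleration, assuming a hypothetical dataset and leaving the on-disk file location at its default:

```yaml
# spicepod.yaml (fragment)
datasets:
  - from: postgres:public.page_views   # hypothetical source; connection params omitted
    name: page_views                   # hypothetical dataset name
    acceleration:
      enabled: true
      engine: sqlite                   # one of the engines listed above with a file mode
      mode: file                       # store accelerated data on disk instead of in memory
```

Queries and the runtime still need memory of their own, so file mode reduces memory pressure without removing it.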

Schema Handling

Data accelerators store the schema that Spice infers from the data source at startup. This schema is fixed for the lifetime of the runtime process and defines the column names, data types, and nullability of the accelerated table.

If the source schema changes while the runtime is running (for example, new columns are added or data types change), subsequent data refreshes into the accelerator will fail because the incoming data no longer matches the schema of the accelerated table. Restart the runtime to re-infer the schema and re-initialize the accelerated table.

For details on how schema inference works per connector and recommendations for managing schema drift, see Schema Inference.

Data Accelerator Docs

import DocCardList from '@theme/DocCardList';

