---
title: 'Managing Memory Usage'
sidebar_label: 'Memory'
sidebar_position: 31
description: 'Guidelines and best practices for managing memory usage and optimizing performance in Spice.ai Open Source deployments.'
keywords:
---
Effective memory management is essential for maintaining optimal performance and stability in Spice deployments. This guide outlines recommendations and best practices for managing memory usage across different Data Accelerators.
Memory requirements vary based on workload characteristics, dataset sizes, query complexity, and refresh modes. Recommended allocations include:
- `refresh_mode: full`: 2.5x dataset size
- `refresh_mode: append`: 1.5x dataset size
- `refresh_mode: changes`: Primarily influenced by CDC event volume and frequency; 1.5x dataset size is a reasonable estimate

Memory requirements can be reduced by using file-based acceleration with DuckDB, SQLite, or Spice Cayenne, which store data on disk and support spilling.
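As a sketch, file-based acceleration can be enabled alongside the refresh mode in the dataset's acceleration settings (the dataset name and source below are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/events/ # illustrative source
    name: events
    acceleration:
      enabled: true
      engine: duckdb
      mode: file           # store accelerated data on disk rather than in memory
      refresh_mode: append # ~1.5x dataset size memory guideline
```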
Different acceleration engines have distinct memory characteristics and tuning options.
The default Arrow accelerator stores all data in memory uncompressed. Datasets must fit entirely in available RAM.
Hash Index Memory (Experimental, v1.11.0-rc.2+):
When using the optional hash index, additional memory is required:
| Component | Memory per Row |
|---|---|
| Hash slot | 16 bytes |
| Bloom filter | ~1.25 bytes |
| Total | ~17.25 bytes |
For a 10 million row dataset with hash index enabled, expect ~165 MB additional memory overhead.
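The overhead estimate can be verified with a quick calculation using the per-row costs from the table above:

```python
# Estimate hash index memory overhead from per-row costs.
rows = 10_000_000
hash_slot_bytes = 16       # hash slot per row
bloom_filter_bytes = 1.25  # approximate Bloom filter cost per row

total_bytes = rows * (hash_slot_bytes + bloom_filter_bytes)
total_mib = total_bytes / (1024 * 1024)
print(f"~{total_mib:.0f} MiB additional memory")  # ~165 MiB
```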
Spice Cayenne stores data on disk using the Vortex columnar format, with configurable caches for metadata and frequently accessed data segments. The caches can be configured to reside either in memory or on disk, which impacts overall memory behavior.
Spice Cayenne is DataFusion query-native, meaning all query execution adheres to the runtime.query.memory_limit setting. When query memory is exhausted, DataFusion spills intermediate results to disk. This architecture provides predictable memory usage while maintaining high query performance.
Memory Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| `cayenne_footer_cache_mb` | 128 | Size of the in-memory Vortex footer cache in megabytes. Larger values improve query performance for repeated scans by caching file metadata. |
| `cayenne_segment_cache_mb` | 256 | Size of the in-memory Vortex segment cache in megabytes. Caches decompressed data segments for improved query performance. |
Memory Usage Guidelines:
Example Configuration:
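A sketch of a Spice Cayenne acceleration with enlarged caches (the dataset name, source, and cache sizes are illustrative; parameter names are from the table above):

```yaml
datasets:
  - from: s3://my-bucket/events/ # illustrative source
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      params:
        cayenne_footer_cache_mb: 256  # larger footer cache for repeated scans
        cayenne_segment_cache_mb: 512 # larger segment cache for hot data
```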
DuckDB manages memory through streaming execution, intermediate spilling, and buffer management. By default, each DuckDB instance uses up to 80% of available system memory.
Memory Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| `duckdb_memory_limit` | 80% of system RAM | Maximum memory for the DuckDB instance |
Memory Usage Guidelines:
- Use `duckdb_memory_limit` to control memory per DuckDB instance

Example Configuration:
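A sketch of a file-backed DuckDB acceleration with an explicit memory limit (the dataset source and limit value are illustrative):

```yaml
datasets:
  - from: postgres:public.orders # illustrative source
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # disk-backed storage with spilling support
      params:
        duckdb_memory_limit: 4GB # cap this DuckDB instance at 4 GB
```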
SQLite is lightweight and efficient for smaller datasets but does not support intermediate spilling. Datasets must fit in memory or use application-level paging.
Refresh modes affect memory usage as follows:
Spice.ai uses DataFusion as its query execution engine. By default, DataFusion does not enforce strict memory limits, which can lead to unbounded usage. Spice.ai addresses this through:
- The `runtime.query.memory_limit` parameter defines the maximum memory available for query execution. Once the memory limit is reached, supported query operations spill data to disk, helping prevent out-of-memory errors and maintain query stability. See Spicepod Configuration for details.
- The `runtime.query.spill_compression` parameter controls how spill files are compressed during query execution. By default, Spice.ai uses Zstandard (zstd) compression, which offers a high compression ratio and helps reduce disk usage when queries spill intermediate data. See Spicepod Configuration for details.
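Interpreting the dotted parameter names above as nested keys, a configuration sketch might look like the following (verify the exact schema against Spicepod Configuration; the limit value is illustrative):

```yaml
runtime:
  query:
    memory_limit: 8GiB      # cap DataFusion query execution memory
    spill_compression: zstd # compress spill files (default)
```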
Spice.ai integrates with embedded accelerators like SQLite and DuckDB, each with unique memory considerations:
Operators such as Sort, Join, and GroupByHash spill intermediate results to disk when memory limits are exceeded, preventing out-of-memory errors. DataFusion writes spill files using the Arrow IPC Stream format.
Spill Compression:
The runtime.query.spill_compression parameter controls how spill files are compressed:
| Value | Description |
|---|---|
| `zstd` (default) | High compression ratio, reduces disk usage |
| `lz4_frame` | Faster compression/decompression, larger files |
| `uncompressed` | No compression overhead, largest files |
DataFusion supports spilling for several operators, but the following operations do not currently support spilling:
Queries using these operators that exceed memory limits may fail. Monitor query patterns and allocate sufficient memory for workloads that rely on these operators.
Predicate pushdown reduces memory consumption by filtering data early in the query execution pipeline. Rather than reading all data and filtering afterward, Spice pushes filter predicates to the data source, reducing the volume of data materialized in memory.
| Stage | Without Pushdown | With Pushdown |
|---|---|---|
| Read from source | All rows | Matching rows only |
| Decompress | Full row groups | Pruned row groups |
| Materialize | Entire dataset | Filtered subset |
| Process | Full scan | Reduced scan |
For a query selecting 1% of rows from a 100 GB dataset, pushdown can reduce peak memory from tens of gigabytes to hundreds of megabytes.
Parquet and Parquet-backed sources (Iceberg, Delta Lake):
Vortex (Spice Cayenne):
For memory-constrained environments, set an appropriate memory limit and use file-based acceleration:
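A sketch combining both settings (the limit, dataset, and source are illustrative; verify keys against Spicepod Configuration):

```yaml
runtime:
  query:
    memory_limit: 2GiB # keep query execution within a small budget

datasets:
  - from: s3://my-bucket/logs/ # illustrative source
    name: logs
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # disk-backed storage keeps the dataset out of RAM
```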
Sorting data by frequently filtered columns maximizes pushdown effectiveness. When data is sorted, entire segments or row groups have non-overlapping value ranges, enabling efficient pruning.
| Data Layout | Pushdown Effectiveness | Memory Impact |
|---|---|---|
| Sorted by filter column | Excellent | Minimal data read |
| Clustered (Z-ordered) | Good | Moderate data read |
| Random | Limited | Most data read |
For time-series data, sort by timestamp. For multi-tenant data, consider sorting by tenant_id or clustering by (tenant_id, timestamp).
| Accelerator | Storage | Query Memory Control | Memory Spilling | Best For |
|---|---|---|---|---|
| Arrow | Memory only | runtime.query.memory_limit | Yes | Small datasets, maximum speed |
| Spice Cayenne | Disk (Vortex) | runtime.query.memory_limit | Yes | Large datasets (1TB+), scalable analytics |
| DuckDB | Memory or Disk | duckdb_memory_limit | Yes | Medium datasets, complex queries |
| SQLite | Memory or Disk | None | No | Small-medium datasets, simple queries |
Spice Cayenne and Arrow both use DataFusion as the query execution engine and share the same runtime.query.memory_limit configuration. DuckDB manages its own memory pool separately via the duckdb_memory_limit parameter.
Spice supports multiple memory allocators, each with different performance characteristics. The allocator is selected at build time and available through different Docker image tags.
| Allocator | Docker Tag Suffix | Characteristics | Best For |
|---|---|---|---|
| snmalloc (default) | (none) | High performance, low fragmentation, multi-threaded | Most workloads |
| jemalloc | -jemalloc | Predictable performance, good memory profiling | Memory-constrained environments, debugging |
| System allocator | -sysalloc | Uses OS default allocator | Compatibility, minimal overhead |
Usage Example:
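For example, an image built with a specific allocator is selected via the tag suffixes from the table above (the image name and version tag are illustrative):

```shell
# Default image (snmalloc)
docker pull spiceai/spiceai:latest

# jemalloc build for profiling or memory-constrained environments
docker pull spiceai/spiceai:latest-jemalloc

# System allocator build
docker pull spiceai/spiceai:latest-sysalloc
```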
Recommendations:
Spice maintains in-memory caches for SQL query results, search results, and embeddings. These caches consume memory in addition to accelerator and query execution memory.
| Cache Type | Default Max Size | Description |
|---|---|---|
| `sql_results` | 128 MiB | Caches SQL query results |
| `search_results` | 128 MiB | Caches vector and full-text search results |
| `embeddings` | 128 MiB | Caches embedding model responses |
Example Configuration:
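A sketch assuming a per-cache configuration block (key names are illustrative; see Caching for the authoritative schema):

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 256MiB          # larger cache for higher hit rates
      item_ttl: 5m
      eviction_policy: tiny_lfu # higher hit rates for skewed access patterns
```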
| Policy | Description | Performance |
|---|---|---|
| `lru` (default) | Least Recently Used | Good general-purpose hit rates |
| `tiny_lfu` | TinyLFU admission policy | Higher hit rates for skewed access patterns |
TinyLFU maintains frequency information to admit only items likely to be accessed again, resulting in higher hit rates for workloads with varying query frequency patterns.
When sizing memory, account for cache allocations:
Example calculation:
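For instance, sizing a deployment with a 4 GB dataset accelerated in Arrow under `refresh_mode: full`, a query memory limit, default caches, and some runtime headroom (all figures are illustrative assumptions):

```python
# Rough memory sizing: accelerator + query execution + caches + headroom.
dataset_gb = 4
accelerator_gb = dataset_gb * 2.5 # Arrow, refresh_mode: full guideline (2.5x)
query_memory_limit_gb = 4         # runtime.query.memory_limit
cache_gb = 3 * 128 / 1024         # three caches at the 128 MiB default
headroom_gb = 1                   # runtime, allocator, OS overhead (assumption)

total_gb = accelerator_gb + query_memory_limit_gb + cache_gb + headroom_gb
print(f"~{total_gb:.2f} GB total")  # ~15.38 GB
```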
- Larger `max_size` values improve hit rates but consume more memory
- Shorter `item_ttl` values reduce memory usage but may decrease hit rates
- Use `stale_while_revalidate_ttl` to serve stale results while refreshing in the background

See Caching for complete cache configuration options.
Configure appropriate memory requests and limits in Kubernetes pod specifications to ensure resource availability:
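A minimal sketch of a pod spec with memory requests and limits (the image and values are illustrative and should be sized using the guidelines above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spiceai
spec:
  containers:
    - name: spiceai
      image: spiceai/spiceai:latest
      resources:
        requests:
          memory: 8Gi # guaranteed allocation for the runtime
        limits:
          memory: 12Gi # hard cap; the pod is OOM-killed above this
```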
Use observability tools to monitor and profile memory usage regularly. This helps identify and resolve potential bottlenecks promptly.
See Observability for configuration details.
Spice Documentation:
External References: