---
title: 'Managing Memory Usage'
sidebar_label: 'Memory'
sidebar_position: 31
description: 'Guidelines and best practices for managing memory usage and optimizing performance in Spice.ai Open Source deployments.'
keywords:
---
Effective memory management is essential for maintaining optimal performance and stability in Spice deployments. This guide outlines recommendations and best practices for managing memory usage across different Data Accelerators.
Memory requirements vary based on workload characteristics, dataset sizes, query complexity, and refresh modes. Recommended allocations include:
- `refresh_mode: full`: 2.5x dataset size
- `refresh_mode: append`: 1.5x dataset size
- `refresh_mode: changes`: Primarily influenced by CDC event volume and frequency; 1.5x dataset size is a reasonable estimate

Memory requirements can be reduced by using file-based acceleration with DuckDB, SQLite, or Spice Cayenne, which store data on disk and support spilling.
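As a sketch, file-based acceleration can be enabled alongside the refresh mode in the dataset's acceleration settings (the dataset name and source below are illustrative):

```yaml
datasets:
  - from: s3://my-bucket/events/ # illustrative source
    name: events
    acceleration:
      enabled: true
      engine: duckdb
      mode: file           # store accelerated data on disk rather than in memory
      refresh_mode: append # ~1.5x dataset size memory guideline
```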
Different acceleration engines have distinct memory characteristics and tuning options.
The default Arrow accelerator stores all data in memory uncompressed. Datasets must fit entirely in available RAM.
Hash Index Memory (Experimental, v1.11.0-rc.2+):
When using the optional hash index, additional memory is required:
| Component | Memory per Row |
|---|---|
| Hash slot | 16 bytes |
| Bloom filter | ~1.25 bytes |
| Total | ~17.25 bytes |
For a 10 million row dataset with hash index enabled, expect ~165 MB additional memory overhead.
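The overhead estimate can be verified with a quick calculation using the per-row costs from the table above:

```python
# Estimate hash index memory overhead from per-row costs.
rows = 10_000_000
hash_slot_bytes = 16       # hash slot per row
bloom_filter_bytes = 1.25  # approximate Bloom filter cost per row

total_bytes = rows * (hash_slot_bytes + bloom_filter_bytes)
total_mib = total_bytes / (1024 * 1024)
print(f"~{total_mib:.0f} MiB additional memory")  # ~165 MiB
```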
Spice Cayenne stores data on disk using the Vortex columnar format, with configurable caches for metadata and frequently accessed data segments. The caches can be configured to reside either in memory or on disk, which impacts overall memory behavior.
Spice Cayenne is DataFusion query-native, meaning all query execution adheres to the runtime.query.memory_limit setting. When query memory is exhausted, DataFusion spills intermediate results to disk. This architecture provides predictable memory usage while maintaining high query performance.
Memory Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| `cayenne_footer_cache_mb` | 128 | Size of the in-memory Vortex footer cache in megabytes. Larger values improve query performance for repeated scans by caching file metadata. |
| `cayenne_segment_cache_mb` | 256 | Size of the in-memory Vortex segment cache in megabytes. Caches decompressed data segments for improved query performance. |
Memory Usage Guidelines:
Example Configuration:
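A sketch of a Spice Cayenne acceleration with enlarged caches (the dataset name, source, and cache sizes are illustrative; parameter names are from the table above):

```yaml
datasets:
  - from: s3://my-bucket/events/ # illustrative source
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      params:
        cayenne_footer_cache_mb: 256  # larger footer cache for repeated scans
        cayenne_segment_cache_mb: 512 # larger segment cache for hot data
```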
DuckDB manages memory through streaming execution, intermediate spilling, and buffer management. By default, each DuckDB instance uses up to 80% of available system memory.
Memory Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| `duckdb_memory_limit` | 80% of system RAM | Maximum memory for the DuckDB instance |
Memory Usage Guidelines:
- Use `duckdb_memory_limit` to control memory per DuckDB instance

Example Configuration:
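A sketch of a file-backed DuckDB acceleration with an explicit memory limit (the dataset source and limit value are illustrative):

```yaml
datasets:
  - from: postgres:public.orders # illustrative source
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # disk-backed storage with spilling support
      params:
        duckdb_memory_limit: 4GB # cap this DuckDB instance at 4 GB
```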
SQLite is lightweight and efficient for smaller datasets but does not support intermediate spilling. Datasets must fit in memory or use application-level paging.
Refresh modes affect memory usage as follows:
Spice.ai uses DataFusion as its query execution engine. By default, DataFusion does not enforce strict memory limits, which can lead to unbounded usage. Spice.ai addresses this through:
- The `runtime.query.memory_limit` parameter defines the maximum memory available for query execution. Once the memory limit is reached, supported query operations spill data to disk, helping prevent out-of-memory errors and maintain query stability. See Spicepod Configuration for details.
- The `runtime.query.spill_compression` parameter controls how spill files are compressed during query execution. By default, Spice.ai uses Zstandard (zstd) compression, which offers a high compression ratio and helps reduce disk usage when queries spill intermediate data. See Spicepod Configuration for details.
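Interpreting the dotted parameter names above as nested keys, a configuration sketch might look like the following (verify the exact schema against Spicepod Configuration; the limit value is illustrative):

```yaml
runtime:
  query:
    memory_limit: 8GiB      # cap DataFusion query execution memory
    spill_compression: zstd # compress spill files (default)
```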
Spice.ai integrates with embedded accelerators like SQLite and DuckDB, each with unique memory considerations:
Operators such as Sort, Join, and GroupByHash spill intermediate results to disk when memory limits are exceeded, preventing out-of-memory errors. DataFusion writes spill files using the Arrow IPC Stream format.
Spill Compression:
The runtime.query.spill_compression parameter controls how spill files are compressed:
| Value | Description |
|---|---|
| `zstd` (default) | High compression ratio, reduces disk usage |
| `lz4_frame` | Faster compression/decompression, larger files |
| `uncompressed` | No compression overhead, largest files |
DataFusion supports spilling for several operators, but the following operations do not currently support spilling:
Queries using these operators that exceed memory limits may fail. Monitor query patterns and allocate sufficient memory for workloads that rely on these operators.
Predicate pushdown reduces memory consumption by filtering data early in the query execution pipeline. Rather than reading all data and filtering afterward, Spice pushes filter predicates to the data source, reducing the volume of data materialized in memory.
| Stage | Without Pushdown | With Pushdown |
|---|---|---|
| Read from source | All rows | Matching rows only |
| Decompress | Full row groups | Pruned row groups |
| Materialize | Entire dataset | Filtered subset |
| Process | Full scan | Reduced scan |
For a query selecting 1% of rows from a 100 GB dataset, pushdown can reduce peak memory from tens of gigabytes to hundreds of megabytes.
Parquet and Parquet-backed sources (Iceberg, Delta Lake):
Vortex (Spice Cayenne):
For memory-constrained environments, set an appropriate memory limit and use file-based acceleration:
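A sketch combining both settings (the limit, dataset, and source are illustrative; verify keys against Spicepod Configuration):

```yaml
runtime:
  query:
    memory_limit: 2GiB # keep query execution within a small budget

datasets:
  - from: s3://my-bucket/logs/ # illustrative source
    name: logs
    acceleration:
      enabled: true
      engine: duckdb
      mode: file # disk-backed storage keeps the dataset out of RAM
```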
Sorting data by frequently filtered columns maximizes pushdown effectiveness. When data is sorted, entire segments or row groups have non-overlapping value ranges, enabling efficient pruning.
| Data Layout | Pushdown Effectiveness | Memory Impact |
|---|---|---|
| Sorted by filter column | Excellent | Minimal data read |
| Clustered (Z-ordered) | Good | Moderate data read |
| Random | Limited | Most data read |
For time-series data, sort by timestamp. For multi-tenant data, consider sorting by tenant_id or clustering by (tenant_id, timestamp).
| Accelerator | Storage | Query Memory Control | Memory Spilling | Best For |
|---|---|---|---|---|
| Arrow | Memory only | runtime.query.memory_limit | Yes | Small datasets, maximum speed |
| Spice Cayenne | Disk (Vortex) | runtime.query.memory_limit | Yes | Large datasets (1TB+), scalable analytics |
| DuckDB | Memory or Disk | duckdb_memory_limit | Yes | Medium datasets, complex queries |
| SQLite | Memory or Disk | None | No | Small-medium datasets, simple queries |
Spice Cayenne and Arrow both use DataFusion as the query execution engine and share the same runtime.query.memory_limit configuration. DuckDB manages its own memory pool separately via the duckdb_memory_limit parameter.
Spice supports multiple memory allocators, each with different performance characteristics. The allocator is selected at build time and available through different Docker image tags.
| Allocator | Docker Tag Suffix | Characteristics | Best For |
|---|---|---|---|
| snmalloc (default) | (none) | High performance, low fragmentation, multi-threaded | Most workloads |
| jemalloc | -jemalloc | Predictable performance, good memory profiling | Memory-constrained environments, debugging |
| System allocator | -sysalloc | Uses OS default allocator | Compatibility, minimal overhead |
Usage Example:
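For example, an image built with a specific allocator is selected via the tag suffixes from the table above (the image name and version tag are illustrative):

```shell
# Default image (snmalloc)
docker pull spiceai/spiceai:latest

# jemalloc build for profiling or memory-constrained environments
docker pull spiceai/spiceai:latest-jemalloc

# System allocator build
docker pull spiceai/spiceai:latest-sysalloc
```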
Recommendations:
Spice maintains in-memory caches for SQL query results, search results, and embeddings. These caches consume memory in addition to accelerator and query execution memory.
| Cache Type | Default Max Size | Description |
|---|---|---|
| `sql_results` | 128 MiB | Caches SQL query results |
| `search_results` | 128 MiB | Caches vector and full-text search results |
| `embeddings` | 128 MiB | Caches embedding model responses |
Example Configuration:
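A sketch assuming a per-cache configuration block (key names are illustrative; see Caching for the authoritative schema):

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 256MiB          # larger cache for higher hit rates
      item_ttl: 5m
      eviction_policy: tiny_lfu # higher hit rates for skewed access patterns
```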
| Policy | Description | Performance |
|---|---|---|
| `lru` (default) | Least Recently Used | Good general-purpose hit rates |
| `tiny_lfu` | TinyLFU admission policy | Higher hit rates for skewed access patterns |
TinyLFU maintains frequency information to admit only items likely to be accessed again, resulting in higher hit rates for workloads with varying query frequency patterns.
When sizing memory, account for cache allocations:
Example calculation:
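For instance, sizing a deployment with a 4 GB dataset accelerated in Arrow under `refresh_mode: full`, a query memory limit, default caches, and some runtime headroom (all figures are illustrative assumptions):

```python
# Rough memory sizing: accelerator + query execution + caches + headroom.
dataset_gb = 4
accelerator_gb = dataset_gb * 2.5 # Arrow, refresh_mode: full guideline (2.5x)
query_memory_limit_gb = 4         # runtime.query.memory_limit
cache_gb = 3 * 128 / 1024         # three caches at the 128 MiB default
headroom_gb = 1                   # runtime, allocator, OS overhead (assumption)

total_gb = accelerator_gb + query_memory_limit_gb + cache_gb + headroom_gb
print(f"~{total_gb:.2f} GB total")  # ~15.38 GB
```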
- Larger `max_size` values improve hit rates but consume more memory
- Shorter `item_ttl` values reduce memory usage but may decrease hit rates
- Use `stale_while_revalidate_ttl` to serve stale results while refreshing in the background

See Caching for complete cache configuration options.
Configure appropriate memory requests and limits in Kubernetes pod specifications to ensure resource availability:
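A minimal sketch of a pod spec with memory requests and limits (the image and values are illustrative and should be sized using the guidelines above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spiceai
spec:
  containers:
    - name: spiceai
      image: spiceai/spiceai:latest
      resources:
        requests:
          memory: 8Gi # guaranteed allocation for the runtime
        limits:
          memory: 12Gi # hard cap; the pod is OOM-killed above this
```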
Use observability tools to monitor and profile memory usage regularly. This helps identify and resolve potential bottlenecks promptly.
See Observability for configuration details.
Spice Documentation:
External References: