---
title: 'Spice Cayenne Data Accelerator'
sidebar_label: 'Spice Cayenne Data Accelerator'
description: 'Spice Cayenne Data Accelerator (Vortex) Documentation'
sidebar_position: 1
tags:
---
Spice Cayenne is a data acceleration engine designed for high-performance, scalable queries on large datasets. Built on Vortex, a high-performance columnar file format, Spice Cayenne combines columnar storage with in-process metadata management to deliver fast queries on datasets larger than 1TB.
Spice Cayenne uses Vortex as its storage format, providing significant performance advantages:
Vortex is a Linux Foundation (LF AI & Data) project released under the Apache-2.0 license with neutral governance. For performance benchmarks, see bench.vortex.dev.
While DuckDB excels for datasets up to approximately 1TB, Spice Cayenne with Vortex is designed to scale beyond these limits.
Spice Cayenne follows a lakehouse architecture inspired by DuckLake, separating metadata management from data storage:

Key Design Principles:
- ListingTable at a unique directory, enabling append operations and parallel reads

For optimal performance, store Cayenne data files on NVMe storage. NVMe provides the lowest latency and highest throughput for the random access patterns that Vortex files require.
Use S3 Express One Zone when persistence of accelerations across restarts is required. S3 Express One Zone adds network latency compared to local NVMe but provides durability. Sharing accelerated data across multiple Spice instances is planned for a future release.
To use Spice Cayenne as the data accelerator, specify cayenne as the engine for acceleration. Spice Cayenne supports mode: file, mode: file_create, and mode: file_update and stores data on disk.
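
A minimal sketch of a dataset accelerated with Spice Cayenne might look like the following. The Spicepod name, dataset name, and source location are hypothetical placeholders; only `engine: cayenne` and the `mode` value come from the description above.

```yaml
version: v1
kind: Spicepod
name: cayenne_demo # hypothetical app name

datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file # file_create and file_update are also supported
```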
Spice Cayenne is configured through two distinct parameter scopes:
- Acceleration parameters are set under a dataset's acceleration.params and control how that dataset's accelerated data is stored, compressed, written, and compacted.
- Runtime parameters are set under the top-level runtime.params and control engine-global behavior (caches, optimizer rules, and dedicated memory pools) shared by every Cayenne-accelerated dataset.

The two scopes are not interchangeable: a runtime parameter set under acceleration.params, or a per-dataset parameter set under runtime.params, has no effect because the value is ignored.
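
To make the two scopes concrete, the sketch below places one parameter in each location. The dataset and its source are hypothetical, and the values are illustrative rather than recommendations.

```yaml
runtime:
  params:
    # Engine-global: shared by every Cayenne-accelerated dataset.
    cayenne_footer_cache_mb: 128

datasets:
  - from: s3://my-source-bucket/orders/ # hypothetical source location
    name: orders # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        # Per-dataset: applies only to this dataset's accelerated data.
        cayenne_compression_strategy: btrblocks
```

Swapping the two entries (for example, putting cayenne_footer_cache_mb under acceleration.params) would leave both at their defaults, since misplaced values are ignored.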
Acceleration parameters (acceleration.params)

Set under a dataset's acceleration.params:
| Parameter | Description |
|---|---|
| cayenne_compression_strategy | Compression algorithm for accelerated data. Defaults to btrblocks. Supports btrblocks or zstd. |
| cayenne_unsupported_type_action | Action when an unsupported data type is encountered. Defaults to error. See Data Type Support. |
| cayenne_segment_cache_mb | Size of the in-memory Vortex segment cache in megabytes, caching decompressed data segments for improved query performance. Defaults to 256. |
| cayenne_file_path | Custom path for storing Cayenne data files. Supports local paths or S3 Express One Zone URLs (e.g., s3://bucket--usw2-az1--x-s3/prefix/). |
| cayenne_target_file_size_mb | Target size for individual Vortex files in MB. When writes exceed this size, a new Vortex file is created. Defaults to 256. Smaller files enable better parallelism and predicate pushdown. |
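| cayenne_metadata_dir | Custom directory for storing Cayenne metadata (SQLite catalog). Defaults to {spice_data_path}/metadata. |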
These are acceleration parameters (set under acceleration.params) used when storing Cayenne data files in S3 Express One Zone:
| Parameter | Description |
|---|---|
| cayenne_s3_zone_ids | Comma-separated availability zone IDs (e.g., usw2-az1,usw2-az2). Auto-generates bucket names in format spice-{app}-{dataset}--{zone}--x-s3. |
| cayenne_s3_region | AWS region (e.g., us-west-2). Auto-derived from zone ID if not specified. |
| cayenne_s3_auth | Authentication method: iam_role (default) or key. |
| cayenne_s3_key | AWS access key ID (required when cayenne_s3_auth: key). |
| cayenne_s3_secret | AWS secret access key (required when cayenne_s3_auth: key). |
| cayenne_s3_session_token | AWS session token (optional, for temporary credentials). |
| cayenne_s3_endpoint | Custom S3 endpoint URL (optional, overrides auto-generated endpoint). |
Runtime parameters (runtime.params)

Set once under the top-level runtime.params and applied to every Cayenne-accelerated dataset in the instance. These are not valid under a dataset's acceleration.params:
| Parameter | Description |
|---|---|
| cayenne_footer_cache_mb | Size of the engine-wide in-memory Vortex footer cache in megabytes. The footer cache stores Vortex file metadata (schemas, statistics, encoding information) and is shared across all Cayenne datasets. Larger values improve query performance for repeated scans. Defaults to 128. |
| cayenne_filter_propagation | Enables Cayenne's filter-propagation optimizer rules. Accepts enabled or disabled; defaults to disabled. |
| cayenne_optimizer_rules | Selects which Cayenne optimizer rules run. Accepts auto (the default, which enables the recommended set, gated by cayenne_filter_propagation), all, none / disabled, or a comma-separated list of individual rule names. |
| cayenne_compaction_memory_fraction | Fraction of the query memory pool carved out for a dedicated Cayenne compaction memory pool. Defaults to 0.2 and is clamped to a supported range. Only applied when at least one Cayenne-accelerated dataset is enabled and dedicated thread pools are not disabled. |
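
As a rough sketch, the runtime-scoped parameters could be set together as shown below. The values are illustrative only; the keys are the ones listed in the table above.

```yaml
runtime:
  params:
    cayenne_footer_cache_mb: 128 # documented default
    cayenne_filter_propagation: enabled # default is disabled
    cayenne_optimizer_rules: auto # recommended set, gated by the line above
    cayenne_compaction_memory_fraction: 0.2 # documented default
```

With cayenne_optimizer_rules left at auto, toggling cayenne_filter_propagation is the main switch for the filter-propagation rules.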
Spice Cayenne performance can be optimized through cache configuration, compression strategy selection, and resource allocation.
Spice Cayenne uses two in-memory caches to accelerate query performance:
Footer Cache (cayenne_footer_cache_mb) — runtime parameter:
The footer cache stores Vortex file metadata, including schemas, statistics, and encoding information. It is engine-global and shared across every Cayenne-accelerated dataset, so it is set under runtime.params, not per dataset. Larger cache sizes benefit workloads with many files.
Segment Cache (cayenne_segment_cache_mb) — acceleration parameter:
The segment cache stores decompressed data segments. It is configured per dataset under acceleration.params. Larger cache sizes benefit workloads with repeated queries on the same data.
Example - High-throughput configuration:
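
One possible shape for such a configuration is sketched below, assuming a host with enough memory to enlarge both caches. The cache sizes, dataset name, and source are illustrative assumptions, not tuned recommendations.

```yaml
runtime:
  params:
    cayenne_footer_cache_mb: 512 # engine-global footer cache

datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_segment_cache_mb: 1024 # per-dataset segment cache
```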
Spice Cayenne supports two compression strategies, each with different performance characteristics. The BtrBlocks compression algorithm is designed for fast analytical queries, while zstd provides fast write performance. Additionally, zstd achieves better compression ratios when data contains large chunks of binary or text.
| Strategy | Compression | Read Speed | Write Speed | Best For |
|---|---|---|---|---|
| btrblocks | Higher | Faster | Moderate | Read-heavy analytics (default) |
| zstd | High | Moderate | Faster | Write-heavy workloads, large binary or text data |
Example - Write-optimized configuration:
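
A write-oriented setup could be sketched as follows, selecting zstd for faster writes. The dataset name and source are hypothetical.

```yaml
datasets:
  - from: s3://my-source-bucket/logs/ # hypothetical source location
    name: logs # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_compression_strategy: zstd # favors write speed over read speed
```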
The cayenne_target_file_size_mb parameter controls when new Vortex files are created during writes:
Spice Cayenne is DataFusion query-native, meaning all query execution uses Apache DataFusion and adheres to the runtime.query.memory_limit setting. This provides:
DataFusion's GreedyMemoryPool allows memory reservations on a first-come, first-served basis, improving throughput for high-concurrency queries with many partitions.
Spice Cayenne uses Vortex's advanced columnar format, which provides:
Vortex reports random access reads up to 100x faster than Apache Parquet (see bench.vortex.dev for benchmarks), enabled by several architectural features:
Segment Statistics (Zone-Map Equivalent):
Vortex's ChunkedLayout maintains per-segment statistics for each column, enabling segment pruning during query execution. Statistics include:
| Statistic | Description | Use Case |
|---|---|---|
| min | Minimum value in segment | Range predicate pruning |
| max | Maximum value in segment | Range predicate pruning |
| null_count | Count of null values | IS NULL/IS NOT NULL optimization |
| is_sorted | Whether segment is sorted | Binary search for point lookups |
| is_constant | Whether all values are identical | Immediate value return |
When a query includes a WHERE clause, Spice Cayenne evaluates whether each segment could contain matching rows. Segments that cannot match based on min/max statistics are skipped entirely, similar to DuckDB's zone-maps without requiring explicit index creation.
Example - Segment Pruning:
For a table with segments containing timestamp ranges [2024-01-01, 2024-01-15], [2024-01-16, 2024-01-31], [2024-02-01, 2024-02-15], a query whose filter excludes every row before 2024-01-20 prunes the first segment (max < 2024-01-20) and reads only the second and third segments.
Fast Random Access Encodings:
Vortex encodings support direct random access to compressed data:
Compute Push-Down:
Vortex supports executing filter and compute operations directly on compressed data, avoiding full decompression for predicate evaluation. This compute push-down reduces CPU and memory overhead by processing data in its compressed form:
| Encoding | Data Type | Operations |
|---|---|---|
| FSST | Strings | Equality, prefix matching on compressed symbols |
| FastLanes | Integers | SIMD-accelerated comparison on bit-packed data |
| ALP | Floats | Range comparisons with minimal decompression |
| Dictionary | Any | Lookup predicates evaluated on dictionary indices |
| RLE | Any | Constant runs evaluated once per run |
Array-level statistics (is_sorted, is_constant, min, max) enable additional optimizations beyond filtering. For example, is_sorted enables binary search for point lookups, and is_constant returns values immediately without scanning.
Performance Characteristics:
For point lookups and selective queries, Spice Cayenne with Vortex often matches or exceeds the performance of traditional B-tree indexes while consuming no additional memory for index structures. Performance scales with:
Spice Cayenne implements efficient deletes without rewriting data files using deletion vectors. Deletion vectors track which rows have been logically deleted, and the information is applied transparently during query execution.
How deletions are recorded and applied is controlled by the cayenne_deletion_mode parameter:
| Mode | How deletes are applied |
|---|---|
| auto (default) | Resolves to position (merge-on-read) for every table. |
| position | Per-file row-position RoaringBitmaps are pushed into the Vortex scan, skipping deleted rows at the storage layer with no per-row CPU cost. |
| key | Deletes are applied above the Vortex scan via a per-row probe on the byte representation of the primary key columns. This is the explicit opt-out from merge-on-read for primary-key tables. |
Under the default auto (position) mode:
- RoaringBitmap for memory-efficient storage of deleted row IDs, providing 50-90% memory savings compared to HashSet for sparse deletions.
- row_idx() read-back after each write, with a key-based fallback for any row whose position is not yet known. Pushing the deletes into the scan eliminates the per-row RowConverter deletion tax above it.

Key-based deletion (cayenne_deletion_mode: key) uses the byte representation of primary key columns and applies deletes above the scan. This approach is position-independent and survives data reorganization.
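
As an illustration, the sketch below opts a primary-key table out of merge-on-read by selecting key-based deletion. The dataset, its source, and the order_id column are hypothetical, and declaring the key through an acceleration-level primary_key field is an assumption made for this example.

```yaml
datasets:
  - from: s3://my-source-bucket/orders/ # hypothetical source location
    name: orders # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      primary_key: order_id # assumed way to declare the key; column is hypothetical
      params:
        cayenne_deletion_mode: key # default is auto, which resolves to position
```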
For tables with a single-column Int64 primary key, Cayenne uses an optimized direct lookup strategy that avoids serialization overhead:
When on_conflict is configured, Cayenne supports upsert semantics using sequence numbers (Iceberg-style ordering):
When a primary key is deleted and then re-inserted:
Spice Cayenne supports storing data files in AWS S3 Express One Zone for single-digit millisecond latency, ideal for latency-sensitive query workloads that require persistence. Metadata remains on local disk for fast catalog operations while data files are stored in S3 Express One Zone.
S3 Express One Zone directory buckets provide:
Example 1 - Explicit bucket:
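
A sketch of the explicit-bucket case, assuming a directory bucket that already exists. The bucket name, prefix, and dataset are hypothetical; the URL follows the form shown for cayenne_file_path.

```yaml
datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        # Hypothetical S3 Express One Zone directory bucket and prefix.
        cayenne_file_path: s3://my-bucket--usw2-az1--x-s3/cayenne/events/
```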
Example 2 - Auto-generated bucket with IAM role:
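
A sketch of the auto-generated case, assuming the runtime has an IAM role with permission to create and access directory buckets. The zone ID and dataset are illustrative.

```yaml
datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_s3_zone_ids: usw2-az1 # region us-west-2 is derived from the zone ID
        cayenne_s3_auth: iam_role # documented default, shown for clarity
```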
Example 3 - Explicit credentials:
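
A sketch of the explicit-credentials case. The secret names are hypothetical, and the `${secrets:...}` reference syntax is assumed to be available from a configured Spice secret store; avoid committing literal keys to a Spicepod.

```yaml
datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_s3_zone_ids: usw2-az1
        cayenne_s3_auth: key
        cayenne_s3_key: ${secrets:AWS_ACCESS_KEY_ID} # hypothetical secret name
        cayenne_s3_secret: ${secrets:AWS_SECRET_ACCESS_KEY} # hypothetical secret name
```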
S3 Express One Zone buckets use a specific naming format:
- Bucket name format: {base-name}--{zone-id}--x-s3
- Zone ID format: {region-code}-az{number} (e.g., usw2-az1, use1-az4)
- Auto-generated bucket name: spice-{app-name}-{dataset-name}--{zone-id}--x-s3

The zone ID is automatically extracted from the bucket name to configure the correct endpoint.
S3 Express One Zone is available in select regions. Spice automatically derives the region from zone IDs:
| Zone ID Prefix | Region |
|---|---|
| use1 | us-east-1 |
| use2 | us-east-2 |
| usw1 | us-west-1 |
| usw2 | us-west-2 |
| euw1 | eu-west-1 |
| euw2 | eu-west-2 |
| euw3 | eu-west-3 |
| euc1 | eu-central-1 |
| eun1 | eu-north-1 |
| eus1 | eu-south-1 |
| apne1 | ap-northeast-1 |
| apne2 | ap-northeast-2 |
| apse1 | ap-southeast-1 |
| apse2 | ap-southeast-2 |
See AWS documentation for the complete list of S3 Express One Zone availability zones.
When cayenne_s3_zone_ids is set, Spice automatically creates the S3 Express directory bucket if it doesn't exist (requires appropriate IAM permissions).

Cayenne (via Vortex) supports most Arrow data types with the following considerations:
- Integer types (Int8, Int16, Int32, Int64, UInt*)
- Floating-point types (Float32, Float64)

| Original Type | Converted To | Notes |
|---|---|---|
| Float16 | Float32 | Automatic conversion for Vortex compatibility |
| Timestamp(Nanosecond/...) | Timestamp(Microsecond) | Precision normalized |
The following types require the unsupported_type_action parameter:
- Interval types
- Duration types
- FixedSizeBinary

unsupported_type_action options:

| Value | Behavior |
|---|---|
| error | Fail with error (default) |
| string | Convert to Utf8 string |
| warn | Include as-is with warning (may fail on insert) |
| ignore | Skip the column entirely |
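
For instance, a dataset containing Interval or Duration columns could be accelerated by converting those columns to strings. This is a minimal sketch; the dataset and source are hypothetical.

```yaml
datasets:
  - from: s3://my-source-bucket/sessions/ # hypothetical source location
    name: sessions # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        unsupported_type_action: string # convert unsupported columns to Utf8
```

Choosing string keeps the column queryable as text, whereas ignore drops it from the accelerated dataset.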
Resource requirements for Spice Cayenne depend on dataset size, query patterns, and cache configuration.
Spice Cayenne manages memory efficiently through columnar storage and selective caching. Memory allocation should account for:
| Component | Default | Notes |
|---|---|---|
| Runtime overhead | ~500 MB | Fixed baseline for the Spice runtime |
| Footer cache | 128 MB | Increase for datasets with many files (1-10 KB per file) |
| Segment cache | 256 MB | Increase based on hot data volume |
| Query execution | Variable | Depends on query complexity and concurrency |
Example - Memory-constrained environment:
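
A sketch for a small host, assuming both caches are halved from their defaults (128 MB footer, 256 MB segment). The exact sizes are illustrative and should be validated against the workload.

```yaml
runtime:
  params:
    cayenne_footer_cache_mb: 64 # half of the 128 MB default

datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_segment_cache_mb: 128 # half of the 256 MB default
```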
Spice Cayenne stores data in a columnar format optimized for analytical queries. Storage requirements include:
Query performance scales with available CPU cores. Vortex's columnar format supports parallel decompression and scanning across multiple threads. Allocate sufficient CPU for:
Consider the following limitations when using Spice Cayenne acceleration:
- Spice Cayenne supports only file-based acceleration (mode: file) and does not support in-memory (mode: memory) acceleration.
- Interval, Duration, and FixedSizeBinary types require unsupported_type_action configuration.
- Spice Cayenne does not support the indexes configuration. Vortex's segment statistics and fast random access encodings provide equivalent or better performance for most point lookup workloads.

Complete example configuration using Spice Cayenne with performance tuning:
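
The sketch below combines the runtime and acceleration scopes in a single Spicepod. The Spicepod name, dataset, source, and event_time sort column are hypothetical, and the values are illustrative starting points rather than tuned settings.

```yaml
version: v1
kind: Spicepod
name: cayenne_demo # hypothetical app name

runtime:
  params:
    cayenne_footer_cache_mb: 256 # engine-global footer cache

datasets:
  - from: s3://my-source-bucket/events/ # hypothetical source location
    name: events # hypothetical dataset name
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_compression_strategy: btrblocks # read-heavy default
        cayenne_segment_cache_mb: 512
        cayenne_target_file_size_mb: 256
        sort_columns: event_time # hypothetical column used in range filters
        unsupported_type_action: string
```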
Spice Documentation:
External References:
Additional acceleration parameters (set under acceleration.params):

| Parameter | Description |
|---|---|
| cayenne_metadata_dir | Custom directory for storing Cayenne metadata (SQLite catalog). Defaults to {spice_data_path}/metadata. |
| cayenne_metastore | Metastore backend type. Supports sqlite (default) or turso (requires turso feature flag). |
| cayenne_upload_concurrency | Maximum number of concurrent file uploads when writing multiple Vortex files to S3 Express One Zone. Defaults to the available CPU parallelism. |
| cayenne_write_concurrency | Writer partition override for unsorted ingests, controlling how many Vortex files are encoded in parallel during a write. Defaults to the session target_partitions. Values below 1 are clamped to 1. The sort-and-rewrite compaction path always writes serially regardless of this setting. |
| cayenne_deletion_mode | How primary-key deletions are recorded and applied. Accepts auto, key, or position; defaults to auto, which resolves to position (merge-on-read). See Deletion Strategies. |
| cayenne_pk_conflict_detection | Controls primary-key conflict detection on insert. Accepts auto or none; defaults to auto, which detects existing primary keys and resolves them as merge-on-read upserts. Set to none to skip conflict detection (blind append) for append-only CDC workloads where the source guarantees primary-key uniqueness. |
| cayenne_compaction_trigger_files | Minimum number of small Vortex files in the current snapshot before tiered compaction runs. A "small" file is one whose size is below cayenne_target_file_size_mb / 4. Defaults to 4 for refresh_mode: caching / changes, or append with refresh_check_interval ≤ 5m; 8 otherwise. A value of 1 is clamped to a minimum of 2. |
| cayenne_compaction_trigger_protected_snapshots | Number of protected snapshots before snapshot-maintenance compaction runs. Separate from cayenne_compaction_trigger_files so small-file tuning does not silently change scan amplification behavior. Defaults to 4 for refresh_mode: caching / changes, or append with refresh_check_interval ≤ 5m; 8 otherwise. A value of 1 is clamped to a minimum of 2. |
| cayenne_compaction_trigger_snapshot_age_ms | Maximum age in milliseconds of the oldest protected snapshot before snapshot-maintenance compaction runs. Set to 0 to disable the age trigger. Defaults to 60000 for refresh_mode: caching / changes, or append with refresh_check_interval ≤ 5m; 300000 otherwise. |
| cayenne_compaction_max_levels | Maximum number of consecutive compaction passes per trigger. Bounds write amplification when promotion keeps producing new candidates. Defaults to 3. |
| cayenne_compaction_max_files_per_pick | Maximum number of eligible file paths retained in one compaction candidate for trigger selection and observability. The current compactor rewrites the whole current snapshot once triggered, so this does not bound rewrite IO or memory. Defaults to 32. |
| cayenne_compaction_background_interval_ms | Background compaction interval in milliseconds. The accelerator runs a per-table background task at this interval. Set to 0 to disable the background task; inline compaction on writes still runs. Defaults to 10000 for refresh_mode: caching / changes, or append with refresh_check_interval ≤ 5m; 30000 otherwise. |
| sort_columns | Comma-separated list of columns to sort data by on refresh operations. Improves segment pruning for frequently filtered columns. |
| unsupported_type_action | Action when encountering unsupported data types. Options: error (default), string, warn, ignore. |

Additional S3 Express One Zone parameters (set under acceleration.params):

| Parameter | Description |
|---|---|
| cayenne_s3_client_timeout | Request timeout duration (e.g., 30s, 5m). Defaults to 120s. |
| cayenne_s3_unsigned_payload | Use unsigned payload for S3 Express One Zone requests. Defaults to true. |
| cayenne_s3_allow_http | Set to true for testing with local S3-compatible storage. Defaults to false. |

Additional runtime parameters (set under runtime.params):

| Parameter | Description |
|---|---|
| cayenne_sort_merge_min_rows | Advanced anti-join tuning: row-count threshold above which the filter-propagation optimizer switches to a sort-merge strategy. Defaults to an internally tuned value; override only when profiling indicates a need. |
| cayenne_sort_merge_memory_pool_fraction | Advanced anti-join tuning: fraction of the memory pool the sort-merge anti-join strategy may use. Defaults to an internally tuned value. |

Additional zone ID prefixes:

| Zone ID Prefix | Region |
|---|---|
| aps1 | ap-south-1 |
| sae1 | sa-east-1 |
| cac1 | ca-central-1 |
| afs1 | af-south-1 |
| mes1 | me-south-1 |