date: 2025-11-18
title: 'Spice v1.9.0-rc.4 (Nov 18, 2025)'
type: blog
authors: [phillipleblanc]
tags: [release, cayenne, acceleration, distributed-query, ballista, datafusion, dynamodb, duckdb, full-text-search, vector-search, hybrid-search, data-connector, performance, sql, vortex]
Announcing the release of Spice v1.9.0-rc.4! :hot_pepper:
This release candidate brings DuckDB v1.4.2, Cayenne partitioning improvements, and comprehensive security hardening across the CLI, data connectors, runtime, and MCP. v1.9.0-rc.4 also includes MySQL and PostgreSQL connector improvements with fixed nullability inference and full-text search support, DynamoDB consistency improvements, HTTP connector validation and UX enhancements, and numerous reliability and performance optimizations. Significant improvements were also made to test and automation infrastructure to ensure high-quality releases.
v1.9.0 introduces Spice Cayenne, a new high-performance data accelerator built on the Vortex columnar format that delivers better than DuckDB performance without single-file scaling limitations, and a preview of Multi-Node Distributed Query based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.
Introducing Cayenne: SQL as an Acceleration Format: A new high-performance Data Accelerator that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the Vortex columnar format, a Linux Foundation project. Cayenne delivers query and ingestion performance better than DuckDB's file-based acceleration without DuckDB's memory overhead and the scaling challenges of single DuckDB files.
Cayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format. This architecture provides:
Key Features:
Example Spicepod.yml configuration:
Note, the Cayenne Data Accelerator is in Beta with limitations.
For more details, refer to the Cayenne Documentation, the Vortex project, and the DuckLake announcement that partly inspired this design.
Apache Ballista Integration: Spice now supports distributed query execution based on Apache Ballista, enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in preview in v1.9.0.
Architecture:
A distributed Spice cluster consists of:
Getting Started:
Start a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:
Start one or more executors configured with the scheduler's flight URI:
Query Execution:
Queries run through the scheduler will now show a distributed_plan in EXPLAIN output, demonstrating how the query is distributed across executor nodes:
Current Limitations:
Spice.ai is built on the Apache DataFusion query engine. The v50 release brings significant performance improvements and enhanced reliability:
Performance Improvements 🚀:
Dynamic Filter Pushdown: Enhanced dynamic filter pushdown for custom ExecutionPlans, ensuring filters propagate correctly through all physical operators for improved query performance.
Partition Pruning: Expanded partition pruning support ensures that unnecessary partitions are skipped when filters are not used, reducing data scanning overhead and improving query execution times.
Apache Spark Compatible Functions: Added support for Spark-compatible functions including array, bit_get/bit_count, bitmap_count, crc32/sha1, date_add/date_sub, if, last_day, like/ilike, luhn_check, mod/pmod, next_day, parse_url, rint, and width_bucket.
Bug Fixes & Reliability: Resolved issues with partition name validation and empty execution plans when vector index lists are empty. Fixed timestamp support for partition expressions, enabling better partitioning for time-series data.
See the Apache DataFusion 50.0.3 Release for more details.
DuckDB v1.4.2: DuckDB has been upgraded to v1.4.2, which includes several performance optimizations.
Composite ART Index Support: DuckDB in Spice now supports composite (multi-column) Adaptive Radix Tree (ART) indexes for accelerated table scans. When queries filter on multiple columns fully covered by a composite index, the optimizer automatically uses index scans instead of full table scans, delivering significant performance improvements for selective queries.
Example configuration:
Performance example with composite index on 7.5M rows:
DuckDB Intermediate Materialization: Queries with indexes now use intermediate materialization (WITH ... AS MATERIALIZED) to leverage faster index scans. Currently supported for non-federated queries (query_federation: disabled) against a single table with indexes only. When predicates cover more columns than the index, the optimizer rewrites queries to first materialize index-filtered results, then apply remaining predicates. This optimization can deliver significant performance improvements for selective queries.
Example configuration:
Performance example:
The optimizer automatically rewrites the query to:
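The exact rewritten SQL depends on the query and the index. As a rough illustration of the rewrite shape only, the Python sketch below splits predicates into those covered by the index and those that are not, then builds a `WITH ... AS MATERIALIZED` query. The function name, the `_indexed` CTE name, the table, and the predicate representation are all hypothetical and are not part of Spice or DuckDB.

```python
def rewrite_with_materialization(table, index_columns, predicates):
    """Sketch of the rewrite shape (hypothetical helper, not a Spice API).

    predicates maps a column name to the SQL text of its filter.
    """
    if not predicates:
        return f"SELECT * FROM {table}"

    # Predicates on indexed columns can be answered by an index scan.
    indexed = [sql for col, sql in predicates.items() if col in index_columns]
    # Everything else is applied afterwards to the materialized result.
    remaining = [sql for col, sql in predicates.items() if col not in index_columns]

    # Nothing to split: keep the query as a plain filtered scan.
    if not indexed or not remaining:
        return f"SELECT * FROM {table} WHERE {' AND '.join(predicates.values())}"

    return (
        f"WITH _indexed AS MATERIALIZED (\n"
        f"  SELECT * FROM {table} WHERE {' AND '.join(indexed)}\n"
        f")\n"
        f"SELECT * FROM _indexed WHERE {' AND '.join(remaining)}"
    )


# Hypothetical table and filters, for illustration only.
sql = rewrite_with_materialization(
    "events",
    ["tenant_id", "event_date"],
    {
        "tenant_id": "tenant_id = 42",
        "event_date": "event_date = DATE '2025-11-18'",
        "status": "status = 'failed'",
    },
)
print(sql)
```

The point of the two-step shape is that the first step touches only indexed columns, so the index can narrow the row set before the non-indexed predicate runs.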
Parquet Buffering for Partitioned Writes: DuckDB partitioned writes in table mode now support Parquet buffering, reducing memory usage and improving write performance for large datasets.
Retention SQL on Refresh Commit: DuckDB accelerations now support running retention SQL on refresh commit, enabling automatic data cleanup and lifecycle management during refresh operations.
UTC Timezone for DuckDB: DuckDB now uses UTC as the default timezone, ensuring consistent behavior for time-based queries across different environments.
Example Spicepod.yml configuration:
Querying endpoints as tables: The HTTP/HTTPS Data Connector now supports querying HTTP endpoints directly as tables in SQL queries with dynamic filters. This feature transforms REST APIs into queryable data sources, making it easy to integrate external service data.
Query HTTP endpoints that return structured data (JSON, CSV, etc.) as if they were database tables
Configurable retry logic, timeouts, and POST request support for more complex API interactions
Example Spicepod.yml configuration:
Example SQL query:
If a request_body is supplied it will be posted to the endpoint:
Example SQL query:
HTTP endpoints can be accelerated using refresh_sql:
Improved Query Performance: The DynamoDB Data Connector now includes improved filter handling for edge cases, parallel scan support for faster data ingestion, and better error handling for misconfigured queries. These improvements enable more reliable and performant access to DynamoDB data.
Example Spicepod.yml configuration:
Atomic Range Reads for Versioned Files: Spice now supports S3 Versioning for all connectors using object-store (S3, Delta Lake, etc.), ensuring range reads over versioned files are atomically correct. When S3 versioning is enabled, Spice automatically tracks version IDs during file discovery and uses them for all subsequent range reads, preventing inconsistencies from concurrent file modifications.
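To see why pinning a version ID matters, consider the minimal Python sketch below. It uses a tiny in-memory stand-in for a versioned object store; the `VersionedStore` class and its methods are hypothetical and only model the idea, not the S3 API or Spice internals.

```python
class VersionedStore:
    """In-memory stand-in for a versioned object store (hypothetical)."""

    def __init__(self):
        self._versions = {}  # key -> list of payloads; list index is the version id

    def put(self, key, data):
        self._versions.setdefault(key, []).append(data)
        return len(self._versions[key]) - 1

    def latest_version(self, key):
        # Models recording the version id during file discovery.
        return len(self._versions[key]) - 1

    def get_range(self, key, start, end, version_id=None):
        versions = self._versions[key]
        data = versions[-1] if version_id is None else versions[version_id]
        return data[start:end]


store = VersionedStore()
store.put("data.bin", b"AAAAAAAA")

# Discovery: remember which version was listed.
pinned = store.latest_version("data.bin")
first_half = store.get_range("data.bin", 0, 4, version_id=pinned)

# A concurrent writer replaces the object between the two range reads.
store.put("data.bin", b"BBBBBBBB")

second_half_pinned = store.get_range("data.bin", 4, 8, version_id=pinned)
second_half_unpinned = store.get_range("data.bin", 4, 8)

print(first_half + second_half_pinned)    # consistent: both halves from one version
print(first_half + second_half_unpinned)  # torn read: halves from different versions
```

Without the pinned version, the second range read silently returns bytes from the newer object, which is the inconsistency that version tracking is meant to prevent.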
Current limitations:
Full-Text Search on Views: Full-text search indexes are now supported on views, enabling advanced search scenarios over pre-aggregated or transformed data. This extends the power of Spice's search capabilities beyond base datasets.
Multi-Column Embeddings on Views: Views now support embedding columns, enabling vector search and semantic retrieval on view data. This is useful for search over aggregated or joined datasets.
Vector Engines on Views: Vector search engines are now available for views, enabling similarity search over complex queries and transformations.
Example Spicepod.yml configuration:
Dedicated Query Thread Pool: Query execution and accelerated refreshes now run on their own dedicated thread pool, separate from the HTTP server. This prevents heavy query workloads from slowing down API responses, keeping health checks fast and avoiding unnecessary Kubernetes pod restarts under load.
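The isolation idea can be sketched with a Python analogy. This is not how the Spice runtime is implemented; the pool sizes and function names below are assumptions chosen for illustration. Heavy queries saturate their own pool while a health check on a separate pool still returns immediately.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Two independent pools: one for query work, one for HTTP handlers.
query_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="query")
http_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="http")

release = threading.Event()


def heavy_query():
    # Simulates a long-running query by blocking until released.
    release.wait(timeout=5)
    return "rows"


def health_check():
    return "ok"


# Submit more queries than the query pool has workers, so it is saturated.
queries = [query_pool.submit(heavy_query) for _ in range(4)]

# The health check runs on the HTTP pool and is not queued behind the queries.
health = http_pool.submit(health_check).result(timeout=1)

release.set()
results = [q.result(timeout=5) for q in queries]

query_pool.shutdown()
http_pool.shutdown()
print(health, results)
```

If both kinds of work shared one pool of two workers, the health check would wait behind the blocked queries, which is the failure mode that can lead to unnecessary pod restarts.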
This feature was opt-in in previous releases and is now enabled by default. To disable it and revert to the previous behavior, add the following spicepod.yaml configuration:
Stale-While-Revalidate Cache Control: Query results now support "stale-while-revalidate" cache control, allowing stale cached data to be served immediately while asynchronously refreshing the cache entry in the background. This improves response times for frequently-accessed queries while maintaining data freshness. Requires cache key type to be set to "sql (raw)" for proper operation.
Optimized Prepared Statements: Prepared statement handling has been optimized for better performance with parameterized queries, reducing planning overhead and improving execution time for repeated queries.
Large RecordBatch Chunking: Large Arrow RecordBatch objects are now automatically chunked to control memory usage during query execution, preventing memory exhaustion for queries returning large result sets.
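As a minimal sketch of the chunking idea, the Python generator below splits one large batch into bounded pieces. It uses a plain list as a stand-in for an Arrow RecordBatch, and the `max_rows` limit is a hypothetical parameter, not a Spice setting.

```python
def chunk_batch(rows, max_rows):
    """Yield consecutive slices of rows, each at most max_rows long."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


# A 10-row "batch" split into chunks of at most 4 rows.
chunks = list(chunk_batch(list(range(10)), max_rows=4))
print([len(c) for c in chunks])
```

Because the chunks are produced lazily, a consumer only holds one bounded slice at a time rather than the full result set.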
HTTP Cache-Control Support: The query result cache now supports the stale-while-revalidate Cache-Control directive, enabling faster response times by serving stale cached results immediately while asynchronously refreshing the cache in the background. This feature is particularly useful for applications that can tolerate slightly stale data in exchange for improved performance.
How it works:
When a cache entry is stale but within the stale-while-revalidate window, Spice will:
Configuration:
Use the Cache-Control HTTP header with the stale-while-revalidate directive:
This configuration caches results for 5 minutes (300 seconds), and allows serving stale results for an additional 60 seconds while refreshing in the background.
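The timing windows described above can be sketched in Python. This is an illustrative model of how `max-age` and `stale-while-revalidate` divide an entry's lifetime, not Spice's implementation; the function names and the state labels are assumptions.

```python
def parse_cache_control(header):
    """Parse a Cache-Control header into {directive: int seconds or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip()
        directives[name.strip().lower()] = int(value) if value.isdigit() else None
    return directives


def classify(age_seconds, header):
    """Classify a cache entry of the given age against the header's windows."""
    directives = parse_cache_control(header)
    max_age = directives.get("max-age") or 0
    swr = directives.get("stale-while-revalidate") or 0
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no refresh needed
    if age_seconds < max_age + swr:
        return "stale-serve-and-refresh"  # serve stale, refresh in background
    return "expired"  # must be re-executed before serving


header = "max-age=300, stale-while-revalidate=60"
for age in (120, 330, 400):
    print(age, classify(age, header))
```

With the header above, an entry is fresh for its first 300 seconds, may be served stale while refreshing for the next 60 seconds, and is treated as expired after 360 seconds.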
Requirements:
cache_key_type to sql or plan in results_caching configuration)
Example configuration via HTTP header:
This feature improves application responsiveness while ensuring data freshness through background updates.
Enhanced HTTP Client Security: HTTP client usage across the runtime has been hardened with improved TLS validation, certificate pinning for critical endpoints, and better error handling for network failures.
ODBC Connector Improvements: Removed unwrap calls from the ODBC connector, improving error handling and reliability. Fixed secret handling and Kubernetes secret integration.
CLI Permissions Hardening: Tightened file permissions for the CLI and install script, ensuring secure defaults for configuration files and credentials.
Oracle Instant Client Pinning: Oracle Instant Client downloads are now pinned to specific SHAs, ensuring reproducible builds and preventing supply chain attacks.
Improved Credential Retry Logic: AWS SDK credential initialization has been significantly improved with more robust retry logic and better error handling. The system now automatically retries transient credential resolution failures using Fibonacci backoff, allowing Spice to tolerate extended AWS outages (up to ~48 hours) without manual intervention.
Key features:
The improvements ensure more reliable AWS service integration, particularly in environments with intermittent network connectivity or during AWS service degradations.
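For readers unfamiliar with Fibonacci backoff, here is a minimal Python sketch of the general technique. The base delay, the cap, the attempt limit, and the use of `ConnectionError` as the transient failure are all assumptions for illustration; they are not the values or error types Spice uses.

```python
import itertools
import time


def fibonacci_backoff(base_seconds=1.0, max_delay_seconds=300.0):
    """Yield delays that follow the Fibonacci sequence, capped at a maximum."""
    a, b = 1, 1
    while True:
        yield min(a * base_seconds, max_delay_seconds)
        a, b = b, a + b


def retry(operation, delays, max_attempts=5, sleep=time.sleep):
    """Call operation, sleeping between attempts on transient failures."""
    for attempt, delay in zip(range(max_attempts), delays):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delay)


# First eight delays with a 1-second base and no effective cap.
first_delays = list(itertools.islice(fibonacci_backoff(), 8))
print(first_delays)

# A hypothetical credential lookup that fails twice before succeeding.
calls = {"count": 0}


def load_credentials():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "credentials"


slept = []
result = retry(load_credentials, fibonacci_backoff(), sleep=slept.append)
print(result, slept)
```

Fibonacci delays grow more gently than exponential ones, so a client keeps probing reasonably often early on while still backing off during a long outage.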
DataFusion Log Emission: The Spice runtime now emits DataFusion internal logs, providing deeper visibility into query planning and execution for debugging and performance analysis.
AI Completions Tracing: Fixed tracing so that ai_completions operations are correctly parented under sql_query traces, improving observability for AI-powered queries.
Version-Controlled Data Access: The new Git Data Connector (Alpha) enables querying datasets stored in Git repositories. This connector is ideal for use cases involving configuration files, documentation, or any data tracked in version control.
Example Spicepod.yml configuration:
For more details, refer to the Git Data Connector Documentation.
The Spice Java SDK has been upgraded with support for a configurable Arrow memory limit: spice-java v0.4.0
Install Specific Versions: The spice install command now supports installing specific versions of the Spice runtime and CLI. This enables easy version management, downgrading, or installation of specific releases for testing or compatibility requirements.
Usage:
Note: Homebrew installations require manual version management via brew install spiceai/spiceai/spice@<version>.
Persistent Query History: The Spice CLI REPL (SQL, search, and chat interfaces) now persists command history to ~/.spice/query_history.txt, making your query history available across sessions. The history file is automatically created if it doesn't exist, with graceful fallback if the home directory cannot be determined.
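The behavior described here (create the file if missing, append on each command, fall back gracefully without a home directory) can be modeled with a short Python sketch. The `QueryHistory` class is hypothetical and is not the CLI's implementation; it also assumes one command per line, which would not hold for multi-line queries.

```python
import tempfile
from pathlib import Path


class QueryHistory:
    """Line-per-command history with an in-memory fallback (hypothetical)."""

    def __init__(self, path):
        # path=None models the case where no home directory can be determined.
        self.path = Path(path) if path is not None else None
        self.entries = []
        if self.path is not None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.touch(exist_ok=True)  # create the file if it is missing
            self.entries = self.path.read_text().splitlines()

    def add(self, command):
        self.entries.append(command)
        if self.path is not None:
            with self.path.open("a") as f:
                f.write(command + "\n")

    def clear(self):
        # Clear both the in-memory list and the persisted file.
        self.entries = []
        if self.path is not None:
            self.path.write_text("")


with tempfile.TemporaryDirectory() as tmp:
    history_file = Path(tmp) / ".spice" / "query_history.txt"
    session = QueryHistory(history_file)
    session.add("SELECT 1;")
    session.add("SELECT 2;")
    reloaded = QueryHistory(history_file).entries  # a "new session"
    session.clear()
    after_clear = QueryHistory(history_file).entries

in_memory = QueryHistory(None)
in_memory.add("SELECT 3;")
print(reloaded, after_clear, in_memory.entries)
```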
New REPL Commands:
.clear - Clear the screen using ANSI escape codes for a clean workspace
.clear history - Clear and persist the query history, removing all stored commands
Tab Completion: Tab completion now includes suggestions based on your command history, making it faster to re-run or modify previous queries.
Example usage:
spicepod.yaml files, providing actionable feedback for misconfiguration.
ListingTable partitions are pruned correctly when filters are not used.
regexp_match function for DuckDB datasets.
No breaking changes.
New HTTP Data Connector Recipe: New recipe demonstrating how to query REST APIs and HTTP(s) endpoints. See HTTP Connector Recipe for details.
The Spice Cookbook includes 82 recipes to help you get started with Spice quickly and easily.
To upgrade to v1.9.0-rc.4, use one of the following methods:
CLI:
Homebrew:
Docker:
Pull the spiceai/spiceai:1.9.0-rc.4 image:
For available tags, see DockerHub.
Helm:
AWS Marketplace:
🎉 Spice is now available in the AWS Marketplace!
generate_changelog script by @krinart in #8028
CachedQueryVector to avoid recomputing embedding vector for spilling/partitioned vector indexes. by @Jeadie in #8059
delta_kernel::listed_log_files warnings by @phillipleblanc in #8158