Announcing the release of Spice v1.4.0! ⚡
This release upgrades DataFusion to v47 and Arrow to v55 for faster queries, more efficient Parquet/CSV handling, and improved reliability. It introduces the AWS Glue Catalog and Data Connectors for native access to Glue-managed data on S3, and adds support for Databricks U2M OAuth for secure Databricks user authentication.
New cron-based dataset refreshes and worker schedules enable automated task management, while improvements to dataset and search result caching further optimize query, search, and RAG performance.
Spice.ai is built on the DataFusion query engine. The v47 release brings:
Performance Improvements 🚀: This release delivers major query speedups through specialized GroupsAccumulator implementations for first_value, last_value, and min/max on Duration types, eliminating unnecessary sorting and computation. TopK operations are now up to 10x faster thanks to early exit optimizations, while sort performance is further enhanced by reusing row converters, removing redundant clones, and optimizing sort-preserving merge streams. Logical operations benefit from short-circuit evaluation for AND/OR, which reduces overhead. Additional enhancements address high latency from sequential metadata fetching, improve int/string comparison efficiency, and simplify logical expressions for better execution.
Bug Fixes & Compatibility Improvements 🛠️: The release addresses issues with external sort, aggregation, and window functions, improves handling of NULL values and type casting in arrays and binary operations, and corrects problems with complex joins and nested window expressions. It also addresses SQL unparsing for subqueries, aliases, and UNION BY NAME.
See the Apache DataFusion 47.0.0 Changelog for details.
Arrow v55 delivers faster Parquet gzip compression, improved array concatenation, and better support for large files (4GB+) and modular encryption. Parquet metadata reads are now more efficient, with support for range requests and enhanced compatibility for INT96 timestamps and timezones. CSV parsing is more robust, with clearer error messages. These updates boost performance, compatibility, and reliability.
See the Arrow 55.0.0 Changelog and Arrow 55.1.0 Changelog for details.
Search Result Caching: Spice now supports runtime caching for search results, improving performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. Caching is configurable with options like maximum size, item TTL, eviction policy, and hashing algorithm.
Example spicepod.yml configuration:
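A minimal sketch of what such a configuration might look like. The key names (`runtime.caching.search_results`, `max_size`, `item_ttl`, `eviction_policy`, `hashing_algorithm`) are assumed to mirror the options listed above, and the values are illustrative only; confirm both against the Caching documentation.

```yaml
# Sketch only: key names and values are assumptions, not confirmed by this document.
runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128mb            # maximum size of the cache
      item_ttl: 5s               # how long each cached result stays valid
      eviction_policy: lru       # which entries are dropped when the cache is full
      hashing_algorithm: siphash # algorithm used to hash cache keys
```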
For more information, refer to the Caching documentation.
AWS Glue Catalog Connector Alpha: Connect to AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.
Example spicepod.yml configuration:
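A hedged sketch of a Glue catalog entry. The catalog name, the `glue_*` parameter names, and the `include` patterns are illustrative assumptions and should be checked against the Glue Catalog Connector documentation.

```yaml
# Sketch only: parameter names and patterns are assumptions.
catalogs:
  - from: glue
    name: my_glue_catalog        # hypothetical catalog name
    params:
      glue_region: us-east-1     # assumed parameter: AWS region of the Glue Data Catalog
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
    include:
      - 'my_database.*'          # hypothetical pattern selecting which tables to expose
```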
For more information, refer to the Glue Catalog Connector documentation.
AWS Glue Data Connector Alpha: Connect to specific tables in AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV in S3.
Example spicepod.yml configuration:
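A hedged sketch of a dataset backed by a single Glue table. The `glue:<database>.<table>` source form, the dataset name, and the `glue_*` parameter names are assumptions for illustration; the Glue Data Connector documentation is the authority on the supported parameters.

```yaml
# Sketch only: source syntax and parameter names are assumptions.
datasets:
  - from: glue:my_database.my_table  # hypothetical database and table
    name: my_table
    params:
      glue_auth: key                 # assumed parameter: authenticate with an access key pair
      glue_region: us-east-1
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
```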
For more information, refer to the Glue Data Connector documentation.
Databricks U2M OAuth: Spice now supports User-to-Machine (U2M) authentication for Databricks when called with a compatible client, such as the Spice Cloud Platform.
Dataset Refresh Schedules: Accelerated datasets now support a refresh_cron parameter, which automatically refreshes the dataset on a defined cron schedule. Cron-scheduled refreshes respect the global dataset_refresh_parallelism parameter.
Example spicepod.yml configuration:
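A minimal sketch, assuming `refresh_cron` sits under the dataset's `acceleration` section (inferred from the parameter applying to accelerated datasets). The dataset name and source path are hypothetical.

```yaml
# Sketch only: dataset name, source, and parameter placement are assumptions.
datasets:
  - name: my_dataset
    from: s3://my-bucket/my_file.parquet  # hypothetical source
    acceleration:
      enabled: true
      refresh_cron: '0 0 * * *'           # standard five-field cron: every day at midnight
```

The cron fields are, in order: minute, hour, day of month, month, and day of week.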
For more information, refer to the Dataset Refresh Schedules documentation.
Worker Execution Schedules: Workers now support a cron parameter and automatically execute an LLM prompt or SQL query on the defined cron schedule. For LLM workers, the schedule is used in conjunction with a provided params.prompt.
Example spicepod.yml configuration:
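A hedged sketch of a scheduled LLM worker. Only `cron` and `params.prompt` are named in the text above; the worker name, the `models` entry, and the prompt wording are hypothetical.

```yaml
# Sketch only: worker name, model reference, and prompt are assumptions.
workers:
  - name: daily_summary
    models:
      - from: my_model          # hypothetical model assumed to be defined elsewhere in the spicepod
    params:
      prompt: 'Summarize the rows added to my_dataset in the last day.'
    cron: '0 2 * * *'           # every day at 02:00
```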
For more information, refer to the Worker Execution Schedules documentation.
SQL Worker Actions: Spice now supports workers with sql actions for automated SQL query execution on a cron schedule:
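A minimal sketch of such a worker, assuming the action is given as a `sql` field next to `cron`. The worker name and query are hypothetical.

```yaml
# Sketch only: field layout, worker name, and query are assumptions.
workers:
  - name: hourly_rollup
    cron: '0 * * * *'                         # at the top of every hour
    sql: 'SELECT count(*) FROM my_dataset'    # hypothetical query run on each tick
```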
For more information, refer to the Workers with a SQL action documentation.
The Spice Cookbook now includes 70 recipes to help you get started with Spice quickly and easily.
To upgrade to v1.4.0, use one of the following methods:
CLI:
Homebrew:
Docker:
Pull the spiceai/spiceai:1.4.0 image:
For available tags, see DockerHub.
Helm:
Full Changelog: v1.3.2...v1.4.0
ORDER BY rand() and ORDER BY NULL (#6071) by @phillipleblanc in #6071
PostApplyCandidateGeneration to handle all filters & projections. (#6096) by @Jeadie in #6096
display_records (#6191) by @Sevenannn in #6191