date: 2025-06-19
title: 'Spice v1.4.0 (June 18, 2025)'
type: blog
authors: [peasee]
tags: [release, datafusion, arrow, aws, cron, glue, workers]

Announcing the release of Spice v1.4.0! ⚡

This release upgrades DataFusion to v47 and Arrow to v55 for faster queries, more efficient Parquet/CSV handling, and improved reliability. It introduces the AWS Glue Catalog and Data Connectors for native access to Glue-managed data on S3, and adds support for Databricks U2M OAuth for secure Databricks user authentication.

New Cron-based dataset refreshes and worker schedules enable automated task management, while dataset and search result caching improvements further optimize query, search, and RAG performance.

What's New in v1.4.0

DataFusion v47 Highlights

Spice.ai is built on the DataFusion query engine. The v47 release brings:

Performance Improvements 🚀: This release delivers major query speedups through specialized GroupsAccumulator implementations for first_value, last_value, and min/max on Duration types, which eliminate unnecessary sorting and computation. TopK operations are now up to 10x faster thanks to early-exit optimizations, and sort performance is further improved by reusing row converters, removing redundant clones, and optimizing sort-preserving merge streams. Logical AND/OR operations now use short-circuit evaluation to reduce overhead. Additional enhancements reduce the high latency caused by sequential metadata fetching, make int/string comparisons more efficient, and simplify logical expressions for better execution.
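To make the TopK early-exit idea concrete, here is a simplified Python sketch. It is not DataFusion's implementation: it assumes a query like ORDER BY a, b LIMIT k over input that is already sorted on a alone, and the function name top_k_with_early_exit is hypothetical. Once the bounded heap holds k rows, any row whose a exceeds the worst retained a proves that no later row can qualify, so the scan stops.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, int]  # (a, b); input is pre-sorted on a only


def top_k_with_early_exit(rows: Iterable[Row], k: int) -> Tuple[List[Row], int]:
    """Return the k smallest rows by (a, b) and the number of rows consumed.

    Assumes rows arrive sorted ascending on a (a prefix of the sort key).
    """
    if k <= 0:
        return [], 0
    heap: List[Row] = []  # max-heap emulated by storing negated keys
    consumed = 0
    for a, b in rows:
        if len(heap) == k and a > -heap[0][0]:
            # Later rows all have a >= this a, which already exceeds the
            # worst retained a, so none of them can enter the top k.
            break
        consumed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-a, -b))
        elif (a, b) < (-heap[0][0], -heap[0][1]):
            heapq.heapreplace(heap, (-a, -b))  # evict the current worst row
    return sorted((-na, -nb) for na, nb in heap), consumed


rows = [(1, 5), (1, 2), (2, 9), (2, 1), (3, 0), (4, 0), (5, 0)]
top, consumed = top_k_with_early_exit(rows, 3)
print(top, consumed)  # [(1, 2), (1, 5), (2, 1)] 4
```

Because the heap never grows past k rows, memory stays bounded regardless of input size; the early exit additionally avoids reading the remainder of the sorted input.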

Bug Fixes & Compatibility Improvements 🛠️: The release fixes issues with external sort, aggregation, and window functions, improves handling of NULL values and type casting in arrays and binary operations, and corrects problems with complex joins and nested window expressions. It also fixes SQL unparsing for subqueries, aliases, and UNION BY NAME.

See the Apache DataFusion 47.0.0 Changelog for details.

Arrow v55 Highlights

Arrow v55 delivers faster Parquet gzip compression, improved array concatenation, and better support for large files (4GB+) and modular encryption. Parquet metadata reads are now more efficient, with support for range requests and enhanced compatibility for INT96 timestamps and timezones. CSV parsing is more robust, with clearer error messages. These updates boost performance, compatibility, and reliability.
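The range-request improvement builds on how Parquet lays out its footer: a file ends with the Thrift-encoded file metadata, a 4-byte little-endian metadata length, and the magic bytes PAR1. The sketch below shows how a reader can locate the metadata with two small range reads instead of downloading the whole file. The helper names are hypothetical, this is not the arrow-rs API, and the in-memory "file" only mimics the footer layout.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"
FOOTER_TAIL = 8  # 4-byte little-endian metadata length + 4-byte magic


def metadata_range(file_size: int, tail: bytes) -> Tuple[int, int]:
    """Given the last 8 bytes of a Parquet file, return the [start, end)
    byte range that holds the Thrift-encoded file metadata."""
    if len(tail) != FOOTER_TAIL or tail[4:] != MAGIC:
        raise ValueError("not a Parquet footer")
    (meta_len,) = struct.unpack("<I", tail[:4])
    end = file_size - FOOTER_TAIL
    start = end - meta_len
    if start < len(MAGIC):  # metadata cannot overlap the leading magic
        raise ValueError("metadata length exceeds file size")
    return start, end


def read_range(data: bytes, start: int, end: int) -> bytes:
    # Stand-in for an HTTP GET with a "Range: bytes=start-(end-1)" header
    return data[start:end]


# Fake in-memory "file" with the same footer layout as a real Parquet file
fake_meta = b"thrift-metadata"
fake_file = MAGIC + b"<column chunks>" + fake_meta + struct.pack("<I", len(fake_meta)) + MAGIC
size = len(fake_file)

tail = read_range(fake_file, size - FOOTER_TAIL, size)  # range read 1: footer tail
start, end = metadata_range(size, tail)
metadata = read_range(fake_file, start, end)            # range read 2: metadata
print(metadata)  # b'thrift-metadata'
```

A reader could also request a larger suffix up front so that small metadata arrives in a single round trip; that refinement is omitted here.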

See the Arrow 55.0.0 Changelog and Arrow 55.1.0 Changelog for details.

Runtime Highlights

Search Result Caching: Spice now supports runtime caching for search results, improving performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. Caching is configurable with options like maximum size, item TTL, eviction policy, and hashing algorithm.

Example spicepod.yml configuration:

For more information, refer to the Caching documentation.
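To illustrate how the max_size, item_ttl, and eviction_policy options interact, here is a minimal Python sketch of a byte-bounded LRU cache with per-item expiry. The class SearchResultCache and its parameters are hypothetical and do not reflect Spice's internal implementation; key hashing is left to Python's dict rather than modeling the hashing_algorithm option, and size is counted as the byte length of each cached result.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class SearchResultCache:
    """Hypothetical LRU cache with a total byte budget and a per-item TTL."""

    def __init__(self, max_size_bytes: int, item_ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size_bytes = max_size_bytes
        self.item_ttl_s = item_ttl_s
        self.clock = clock
        self._entries: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()
        self._size = 0

    def get(self, query: str) -> Optional[bytes]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        inserted_at, value = entry
        if self.clock() - inserted_at > self.item_ttl_s:
            self._remove(query)  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return value

    def put(self, query: str, value: bytes) -> None:
        if len(value) > self.max_size_bytes:
            return  # never cache an item larger than the whole budget
        if query in self._entries:
            self._remove(query)
        self._entries[query] = (self.clock(), value)
        self._size += len(value)
        while self._size > self.max_size_bytes:
            self._remove(next(iter(self._entries)))  # evict least recently used

    def _remove(self, query: str) -> None:
        _, value = self._entries.pop(query)
        self._size -= len(value)


now = [0.0]  # fake clock so expiry is deterministic
cache = SearchResultCache(max_size_bytes=10, item_ttl_s=5.0, clock=lambda: now[0])
cache.put("q1", b"aaaa")
cache.put("q2", b"bbbb")
cache.get("q1")           # q1 becomes most recently used
cache.put("q3", b"cccc")  # 12 bytes > 10: evicts q2, the least recently used
print(cache.get("q2"))    # None
now[0] = 6.0
print(cache.get("q3"))    # None: older than the 5 s TTL
```

Short TTLs like the 5s in the example trade hit rate for freshness: repeated identical searches within the window are served from memory, while results older than the TTL are recomputed.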

AWS Glue Catalog Connector Alpha: Connect to AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV tables in S3.

Example spicepod.yml configuration:

For more information, refer to the Glue Catalog Connector documentation.
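The include patterns in the Glue Catalog example can be read as glob patterns over schema.table names, which is consistent with the show tables output, where only the hive_ and iceberg_ tables from testdb appear. This sketch assumes glob semantics like Python's fnmatchcase; the filter_tables helper and the extra table names in the usage example are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_tables(tables: Iterable[str], include: Iterable[str]) -> List[str]:
    """Keep schema.table names that match at least one include glob."""
    patterns = list(include)
    return [t for t in tables if any(fnmatchcase(t, p) for p in patterns)]


tables = [
    "testdb.hive_table_001",
    "testdb.iceberg_table_001",
    "testdb.delta_table_001",   # hypothetical: not matched by either pattern
    "otherdb.hive_table_001",   # hypothetical: different schema
]
print(filter_tables(tables, ["testdb.hive_*", "testdb.iceberg_*"]))
# ['testdb.hive_table_001', 'testdb.iceberg_table_001']
```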

AWS Glue Data Connector Alpha: Connect to specific tables in AWS Glue Data Catalogs to query Iceberg, Parquet, or CSV in S3.

Example spicepod.yml configuration:

For more information, refer to the Glue Data Connector documentation.

Databricks U2M OAuth: Spice now supports User-to-Machine (U2M) authentication for Databricks when called with a compatible client, such as the Spice Cloud Platform.
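U2M OAuth flows are commonly built on the OAuth 2.0 authorization code grant with PKCE (RFC 7636), in which the client proves that whoever redeems the authorization code is the same party that started the login. Whether Spice or the Spice Cloud Platform uses exactly this flow is an assumption here; the sketch only shows the standard S256 code verifier and challenge derivation, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes encode to a 43-character base64url string
    # (RFC 7636 requires 43 to 128 characters)
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(ASCII(code_verifier))) without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
```

The challenge is sent with the authorization request, and the verifier is presented later when the authorization code is exchanged for tokens.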

Dataset Refresh Schedules: Accelerated datasets now support a refresh_cron parameter, automatically refreshing the dataset on a defined cron schedule. Cron-scheduled refreshes respect the global dataset_refresh_parallelism parameter.

Example spicepod.yml configuration:

For more information, refer to the Dataset Refresh Schedules documentation.
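How a global cap such as dataset_refresh_parallelism can bound cron-triggered refreshes is sketched below with an asyncio semaphore. This assumes the parameter acts as a limit on simultaneous refreshes; refresh_all and the simulated refresh work are hypothetical and are not Spice's scheduler.

```python
import asyncio
from typing import List


async def refresh_all(datasets: List[str], parallelism: int) -> int:
    """Refresh datasets due at the same cron tick, at most `parallelism` at a
    time, and return the peak number of refreshes that ran concurrently."""
    semaphore = asyncio.Semaphore(parallelism)
    active = 0
    peak = 0

    async def refresh(name: str) -> None:
        nonlocal active, peak
        async with semaphore:  # wait for a free refresh slot
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for refreshing `name`
            active -= 1

    await asyncio.gather(*(refresh(d) for d in datasets))
    return peak


print(asyncio.run(refresh_all(["a", "b", "c", "d", "e"], parallelism=2)))  # 2
```

With five datasets due at the same tick and a limit of 2, at most two refreshes run at once and the rest wait for a free slot.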

Worker Execution Schedules: Workers now support a cron parameter and automatically execute an LLM prompt or SQL query on the defined cron schedule. For LLM workers, the prompt is supplied in params.prompt.

Example spicepod.yml configuration:

For more information, refer to the Worker Execution Schedules documentation.
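The cron expressions in these examples use the standard five fields (minute, hour, day of month, month, day of week), so 0 0 * * * fires daily at midnight, 0 2 * * * daily at 2am, and 0 * * * * at the top of every hour. The hypothetical next_run helper below is a deliberately minimal Python sketch: it supports only * and single integers (no ranges, lists, or steps), ignores cron's special day-of-month/day-of-week OR rule, and is not how Spice parses schedules.

```python
from datetime import datetime, timedelta


def _field_matches(field: str, value: int) -> bool:
    # Only "*" and a single integer are supported in this sketch
    return field == "*" or int(field) == value


def next_run(expr: str, after: datetime) -> datetime:
    """Return the first minute strictly after `after` matching a 5-field
    cron expression: minute hour day-of-month month day-of-week."""
    minute, hour, dom, month, dow = expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # scan up to one year of minutes
        if (_field_matches(minute, t.minute) and _field_matches(hour, t.hour)
                and _field_matches(dom, t.day) and _field_matches(month, t.month)
                # cron day of week: 0 = Sunday; isoweekday(): Monday = 1 ... Sunday = 7
                and _field_matches(dow, t.isoweekday() % 7)):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match within a year for {expr!r}")


print(next_run("0 0 * * *", datetime(2025, 6, 18, 13, 45)))  # 2025-06-19 00:00:00
```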

SQL Worker Actions: Spice now supports workers with sql actions for automated SQL query execution on a cron schedule:

For more information, refer to the Workers with a SQL action documentation.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

Added Glue Catalog Connector and Data Connector cookbooks: Connect to tables and databases in the AWS Glue Data Catalog.
Added Cron-based Dataset Refresh: Refresh datasets on defined schedules.

The Spice Cookbook now includes 70 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.4.0, use one of the following methods:

CLI:

Homebrew:

Docker:

Pull the spiceai/spiceai:1.4.0 image:

For available tags, see DockerHub.

Helm:

What's Changed

Dependencies

DataFusion: Upgraded to v47
arrow-rs: Upgraded to v55.1.0
delta_kernel: Upgraded to v0.11.0

Changelog

Update trunk to 1.4.0-unstable (#5878) by @phillipleblanc in #5878
Update openapi.json (#5885) by @app/github-actions in #5885
feat: Testoperator reports benchmark failure summary (#5889) by @peasee in #5889
fix: Publish binaries to dev when platform option is all (#5905) by @peasee in #5905
feat: Print dispatch current test count of total (#5906) by @peasee in #5906
Include multiple duckdb files acceleration scenarios into testoperator dispatch (#5913) by @sgrebnov in #5913
feat: Support building testoperator on dev (#5915) by @peasee in #5915
Update spicepod.schema.json (#5927) by @app/github-actions in #5927
Update ROADMAP & SECURITY for 1.3.0 (#5926) by @phillipleblanc in #5926

Full Changelog: v1.3.2...v1.4.0

runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128mb
      item_ttl: 5s
      eviction_policy: lru
      hashing_algorithm: siphash

catalogs:
  - from: glue
    name: my_glue_catalog
    params:
      glue_key: <your-access-key-id>
      glue_secret: <your-secret-access-key>
      glue_region: <your-region>
    include:
      - 'testdb.hive_*'
      - 'testdb.iceberg_*'

sql> show tables;
+-----------------+--------------+-------------------+------------+
| table_catalog   | table_schema | table_name        | table_type |
+-----------------+--------------+-------------------+------------+
| my_glue_catalog | testdb       | hive_table_001    | BASE TABLE |
| my_glue_catalog | testdb       | iceberg_table_001 | BASE TABLE |
| spice           | runtime      | task_history      | BASE TABLE |
+-----------------+--------------+-------------------+------------+

datasets:
  - from: glue:my_database.my_table
    name: my_table
    params:
      glue_auth: key
      glue_region: us-east-1
      glue_key: ${secrets:AWS_ACCESS_KEY_ID}
      glue_secret: ${secrets:AWS_SECRET_ACCESS_KEY}

datasets:
  - from: databricks:spiceai_sandbox.default.messages
    name: messages
    params:
      databricks_endpoint: ${secrets:DATABRICKS_ENDPOINT}
      databricks_cluster_id: ${secrets:DATABRICKS_CLUSTER_ID}
      databricks_client_id: ${secrets:DATABRICKS_CLIENT_ID}

datasets:
  - name: my_dataset
    from: s3://my-bucket/my_file.parquet
    acceleration:
      refresh_cron: 0 0 * * * # Daily refresh at midnight

workers:
  - name: email_reporter
    models:
      - from: gpt-4o
    params:
      prompt: 'Inspect the latest emails, and generate a summary report for them. Post the summary report to the connected Teams channel'
    cron: 0 2 * * * # Daily at 2am

workers:
  - name: my_worker
    cron: 0 * * * *
    sql: 'SELECT * FROM lineitem'

spice upgrade

brew upgrade spiceai/spiceai/spice

docker pull spiceai/spiceai:1.4.0

helm repo update
helm upgrade spiceai spiceai/spiceai
