Spice.ai caches data from S3 and other object stores by accelerating remote datasets into a local engine. Instead of scanning remote Parquet, CSV, or JSON files on every query, Spice materializes the data locally and refreshes it on a configurable schedule. For single-file datasets, Spice tracks the object's metadata (size, last modified, ETag) and skips the refresh when the file has not changed, reducing S3 API costs.
This pattern is useful for analytics workloads over object store data — for example, querying a Parquet dataset in S3 repeatedly throughout the day without incurring the latency and cost of a full scan on each query.
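The change check described above for single-file datasets can be pictured with a small model. This is only an illustrative sketch, not Spice's implementation: the ObjectMetadata class and the needs_refresh function are hypothetical names, and the only detail taken from the text is that size, last modified time, and ETag are compared.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical record of the three attributes the text says are tracked."""
    size: int
    last_modified: datetime
    etag: str


def needs_refresh(previous: Optional[ObjectMetadata], current: ObjectMetadata) -> bool:
    """Return True on the first load or when any tracked attribute differs."""
    if previous is None:
        return True
    # Dataclass equality compares size, last_modified, and etag field by field.
    return previous != current


# Example values are made up for illustration.
seen = ObjectMetadata(1_048_576, datetime(2024, 1, 1, tzinfo=timezone.utc), '"abc123"')
same = ObjectMetadata(1_048_576, datetime(2024, 1, 1, tzinfo=timezone.utc), '"abc123"')
changed = ObjectMetadata(2_097_152, datetime(2024, 1, 2, tzinfo=timezone.utc), '"def456"')

first_load = needs_refresh(None, seen)       # True: nothing cached yet
unchanged = needs_refresh(seen, same)        # False: refresh is skipped
modified = needs_refresh(seen, changed)      # True: object was rewritten
```

Reading an object's metadata is a much smaller request than downloading the object, which is why skipping unchanged files reduces S3 API cost.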
Consider a Parquet dataset in S3 that is accelerated locally as taxi_trips and checked for changes every 10 minutes.
Queries against taxi_trips run against the local accelerator rather than S3. At each 10-minute check, Spice looks for changes in S3 and refreshes the accelerated copy only if the data has changed.
Using mode: file persists the accelerated data to disk, so the cache survives Spice restarts without re-reading from S3.
For analytics workloads that run the same queries repeatedly against S3 data, such as dashboards or scheduled reports, the SQL results cache stores query output in memory. Identical queries within the TTL window are served from memory without re-executing against the accelerator.
With an item TTL of 30 seconds and a stale-while-revalidate window of 5 minutes, the first execution of a query runs against the accelerated S3 data and the result is cached in memory. Identical queries within 30 seconds are served from the cache. Between 30 seconds and 5 minutes 30 seconds, stale results are served immediately while Spice re-executes the query in the background.
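The timeline above can be sketched as a small function that maps the age of a cached result to a cache state. This is a hedged model only: the constant and function names are hypothetical, the exact boundary handling (strictly less than) is an assumption, and treating entries older than 5 minutes 30 seconds as a miss is also an assumption, since the text does not describe that case.

```python
from datetime import timedelta
from typing import Optional

# Values taken from the timeline in the text: 30 s fresh, then 5 min stale.
ITEM_TTL = timedelta(seconds=30)
STALE_WINDOW = timedelta(minutes=5)


def cache_status(age: Optional[timedelta]) -> str:
    """Classify a lookup by the age of the cached result (None = not cached)."""
    if age is None:
        return "MISS"   # first execution: run the query and cache the result
    if age < ITEM_TTL:
        return "HIT"    # fresh: served straight from memory
    if age < ITEM_TTL + STALE_WINDOW:
        return "STALE"  # served immediately, re-executed in the background
    return "MISS"       # assumption: past the stale window, execute again


statuses = [
    cache_status(None),                   # "MISS"
    cache_status(timedelta(seconds=10)),  # "HIT"
    cache_status(timedelta(minutes=2)),   # "STALE"
    cache_status(timedelta(minutes=6)),   # "MISS"
]
```

The stale window ends at 30 seconds plus 5 minutes, which is where the 5 minutes 30 seconds figure comes from.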
The Results-Cache-Status response header indicates cache state: HIT, MISS, BYPASS, or STALE. Clients can use Cache-Control: no-cache to bypass the cache and force a fresh query execution.
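A client-side sketch of both headers, using only the Python standard library. The endpoint URL, the plain-text request body, and the helper names are assumptions made for illustration and should be checked against the deployment; only the Cache-Control: no-cache request header and the Results-Cache-Status response header come from the text.

```python
import urllib.request
from typing import Optional, Tuple

# Assumption: a local Spice runtime accepting SQL over HTTP at this address.
SPICE_SQL_URL = "http://localhost:8090/v1/sql"


def build_sql_request(sql: str, bypass_cache: bool = False) -> urllib.request.Request:
    """Build a POST request carrying the SQL text, optionally bypassing the cache."""
    headers = {"Content-Type": "text/plain"}
    if bypass_cache:
        # Ask for a fresh execution instead of a cached result.
        headers["Cache-Control"] = "no-cache"
    return urllib.request.Request(
        SPICE_SQL_URL, data=sql.encode("utf-8"), headers=headers, method="POST"
    )


def run_query(sql: str, bypass_cache: bool = False) -> Tuple[Optional[str], bytes]:
    """Send the query and return (cache status header, raw response body)."""
    with urllib.request.urlopen(build_sql_request(sql, bypass_cache)) as resp:
        # Expected values per the text: HIT, MISS, BYPASS, or STALE.
        return resp.headers.get("Results-Cache-Status"), resp.read()


# Building requests does not touch the network; run_query does.
cached_req = build_sql_request("SELECT COUNT(*) FROM taxi_trips")
fresh_req = build_sql_request("SELECT COUNT(*) FROM taxi_trips", bypass_cache=True)
```

Calling run_query with bypass_cache=True against a running instance would be expected to report BYPASS, while repeating the same query without it inside the TTL window would be expected to report HIT.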