What is Data Acceleration?
Data acceleration is a technique that caches frequently accessed data in a fast, local engine -- enabling sub-second query performance without permanently moving data away from its source systems.
Querying data where it lives -- across data warehouses, object stores, and transactional databases -- is the promise of SQL federation. But federation alone has a performance ceiling: every query must travel over the network to the source system, wait for that system to process it, and transfer results back. For latency-sensitive applications, dashboards, and AI workloads, that round-trip can be too slow.
Data acceleration solves this by maintaining a local, queryable copy of frequently accessed datasets in a fast engine close to the application. The source system remains the system of record. The acceleration layer handles reads, serving queries in milliseconds instead of seconds or minutes. When source data changes, the acceleration layer is refreshed -- often via change data capture -- so it stays current.
This is not a new concept. Database caching, materialized views, and read replicas all address the same fundamental problem. What distinguishes modern data acceleration is that it works across heterogeneous sources (not just within a single database), integrates with federation engines for transparent query routing, and supports real-time refresh mechanisms that keep cached data fresh.
How Data Acceleration Works with Federation
Data acceleration and SQL federation are complementary patterns that address different parts of the query lifecycle.
Federation provides unified access. A single SQL query can reach PostgreSQL, S3, Databricks, and 30+ other sources through Spice's connector ecosystem. The federation engine handles connection management, dialect translation, and predicate pushdown.
Acceleration provides speed. Datasets that are queried frequently or require low latency are cached locally in a fast engine. When a query arrives for an accelerated dataset, the federation engine serves it from the local cache instead of routing it to the remote source.
The two patterns work together in a query router:
- A query arrives at the federation engine
- The engine checks whether the requested datasets are accelerated locally
- If yes, the query is served from the local acceleration engine (milliseconds)
- If no, the query is federated to the remote source (seconds to minutes, depending on the source)
This means applications get a single SQL endpoint that transparently handles both accelerated and federated queries. Developers don't need to manage separate connections or caching logic.
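The routing decision described above can be sketched in a few lines of Python. The dataset names and the `ACCELERATED` set are illustrative, not Spice internals:

```python
# Minimal sketch of acceleration-aware query routing (hypothetical names).
ACCELERATED = {"orders", "customers"}  # datasets with a local accelerated copy

def route_query(dataset: str) -> str:
    """Return which engine should serve a query for `dataset`."""
    if dataset in ACCELERATED:
        return "local"      # served from the local acceleration engine (milliseconds)
    return "federated"      # forwarded to the remote source system

r1 = route_query("orders")    # accelerated dataset
r2 = route_query("payments")  # not accelerated, federate to source
```

The point is that the caller never changes: the same SQL endpoint serves both paths, and only the router knows which engine answered.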
Acceleration Strategies
Not all acceleration is the same. The choice of engine, storage medium, and materialization scope determines the performance characteristics and resource requirements.
In-Memory vs. On-Disk Acceleration
In-memory acceleration stores data in RAM using columnar formats like Apache Arrow. This delivers the fastest query performance -- sub-millisecond scans on datasets that fit in memory -- but is limited by available RAM and volatile (data is lost on restart unless backed by a persistent store).
On-disk acceleration stores data on local SSD using embedded engines like DuckDB or SQLite. Performance is slower than in-memory (milliseconds instead of microseconds) but can handle much larger datasets and survives restarts without re-loading from the source.
The right choice depends on the workload:
- In-memory: Real-time dashboards, AI inference pipelines, embedding lookups, and any workload where single-digit-millisecond latency matters
- On-disk: Analytical queries over large datasets, batch processing, and workloads where durability matters more than raw speed
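The volatility difference between the two storage modes can be demonstrated with SQLite (one of the embedded engines mentioned above), which supports both an in-memory and an on-disk mode from the standard library:

```python
# In-memory vs. on-disk storage in an embedded engine (sqlite3, stdlib).
import os
import sqlite3
import tempfile

# In-memory: fastest, but contents vanish when the connection closes.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE cache (k TEXT, v INTEGER)")
mem.execute("INSERT INTO cache VALUES ('hits', 1)")
mem.close()  # the table and its data are gone

# On-disk: survives restarts; reopening the file sees the data.
path = os.path.join(tempfile.mkdtemp(), "accel.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE cache (k TEXT, v INTEGER)")
disk.execute("INSERT INTO cache VALUES ('hits', 1)")
disk.commit()
disk.close()

reopened = sqlite3.connect(path)
row = reopened.execute("SELECT v FROM cache WHERE k = 'hits'").fetchone()
```

The same tradeoff applies to Arrow (in-memory) and DuckDB (on-disk) at larger scale: a restart of an in-memory acceleration engine means a full reload from the source, while an on-disk engine picks up where it left off.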
Full vs. Partial Materialization
Full materialization caches the entire dataset locally. Every row from the source table is replicated in the acceleration engine. This approach is simple and ensures every query can be served locally, but it requires enough storage to hold the full dataset and enough bandwidth to keep it synchronized.
Partial materialization caches only a subset of the data -- typically filtered by time range, partition, or access frequency. For example, an acceleration layer might cache only the last 90 days of order data, while queries for older data are federated to the source warehouse.
Partial materialization reduces storage and refresh costs but requires the query router to determine whether a given query can be served from the local cache or must be routed to the source.
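That routing check can be sketched as a time-window test. The 90-day window and dataset semantics are illustrative:

```python
# Routing for partial materialization: serve locally only if every requested
# row falls inside the cached time window.
from datetime import date, timedelta

CACHE_WINDOW_DAYS = 90  # only the last 90 days are materialized locally

def route(oldest_date_needed: date, today: date) -> str:
    cache_start = today - timedelta(days=CACHE_WINDOW_DAYS)
    return "local" if oldest_date_needed >= cache_start else "federated"

today = date(2024, 6, 1)
recent = route(date(2024, 5, 1), today)   # inside the 90-day window
old = route(date(2023, 1, 1), today)      # older than the window
```

A real router would extract the time predicate from the SQL query itself; the principle is the same.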
Keeping Accelerated Data Fresh
Acceleration is only useful if the cached data is reasonably current. Stale acceleration caches can be worse than no acceleration, because queries return outdated results with no indication that the data is behind.
Change Data Capture (CDC) Refresh
The most effective refresh strategy uses change data capture to stream row-level changes from the source database to the acceleration layer. When a row is inserted, updated, or deleted at the source, the change event is applied to the local cache within seconds.
CDC refresh provides near-real-time freshness with minimal source impact. It works particularly well with transactional databases (PostgreSQL, MySQL) that expose write-ahead logs.
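Applying a CDC stream to a local cache reduces to replaying row-level events against a table keyed by primary key. The event shape below is illustrative, not a specific CDC wire format:

```python
# Applying row-level change events to a local cache keyed by primary key.
cache = {}

def apply_change(event: dict) -> None:
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        cache[key] = event["row"]      # upsert the latest row image
    elif op == "delete":
        cache.pop(key, None)           # remove the row if present

for ev in [
    {"op": "insert", "key": 1, "row": {"status": "new"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 2},
]:
    apply_change(ev)
```

After replay, the cache holds only row 1 with its latest state, matching what a query against the source would return.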
Scheduled Refresh
For sources that don't support CDC -- object stores like S3, REST APIs, or some SaaS platforms -- the acceleration layer refreshes on a schedule. The refresh interval can range from seconds to hours depending on freshness requirements.
Scheduled refresh is simpler to implement but introduces a staleness window equal to the refresh interval. A 5-minute schedule means cached data can be up to 5 minutes behind.
Append-Only Refresh
For time-series and event data that is never updated (only new rows are appended), the acceleration layer can use append-only refresh. It tracks the latest timestamp or sequence number and fetches only new rows on each refresh cycle. This is efficient because it avoids re-scanning unchanged data.
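The high-watermark mechanism behind append-only refresh can be sketched as follows, with a plain list standing in for the source table:

```python
# High-watermark (append-only) refresh: remember the newest timestamp seen
# and fetch only rows past it on each cycle.
source = [(1, "a"), (2, "b"), (3, "c")]   # (timestamp, payload) rows at the source
cache, watermark = [], 0

def refresh():
    global watermark
    new_rows = [r for r in source if r[0] > watermark]
    cache.extend(new_rows)
    if new_rows:
        watermark = max(ts for ts, _ in new_rows)

refresh()                  # initial load picks up all three rows
source.append((4, "d"))    # a new event arrives at the source
refresh()                  # incremental cycle fetches only row 4
```

Each cycle touches only rows newer than the watermark, which is why this strategy avoids re-scanning unchanged data.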
Acceleration Engines
The choice of acceleration engine determines query performance, supported SQL features, and resource requirements.
Apache Arrow
Apache Arrow is an in-memory columnar data format designed for analytical processing. It enables zero-copy reads and SIMD-optimized computation, making it one of the fastest options for scan-heavy analytical queries. Arrow is the default in-memory acceleration engine in Spice.
Arrow's main limitation is memory: the entire accelerated dataset must fit in RAM. For datasets that exceed available memory, on-disk engines are a better fit.
DuckDB
DuckDB is an embedded analytical database that stores data on disk in a columnar format. It supports a rich SQL dialect (including window functions, CTEs, and complex aggregations) and performs well on analytical workloads over datasets that are too large for in-memory processing.
Spice supports DuckDB as an on-disk acceleration engine, making it suitable for multi-gigabyte datasets where in-memory caching is not practical.
Choosing the Right Engine
The decision framework is straightforward:
- Dataset fits in memory + lowest possible latency required: Use Arrow (in-memory)
- Dataset too large for memory + complex analytical queries: Use DuckDB (on-disk)
- Mixed workloads: Use both -- accelerate hot, frequently accessed datasets with Arrow and larger, less latency-sensitive datasets with DuckDB
When to Accelerate vs. When to Federate
Not every dataset benefits from acceleration. The decision depends on query patterns, freshness requirements, and data volume.
Accelerate when:
- The dataset is queried frequently (dashboard queries, API endpoints, AI pipelines)
- Low latency is required (sub-second response times)
- The source system is slow or expensive to query (data warehouses billed per query, remote object stores)
- The dataset is small enough to cache cost-effectively
Federate without acceleration when:
- The dataset is queried infrequently (ad-hoc exploration, one-off reports)
- The source system is already fast enough for the use case
- Data freshness requirements are strict and CDC is not available
- The dataset is too large to cache practically
In practice, most production deployments use a mix: hot datasets are accelerated for performance, while the long tail of less-frequently-accessed data is queried via federation on demand.
Acceleration for AI Workloads
AI applications place unique demands on data infrastructure. Models need fast access to embeddings, features, and context data -- often with strict latency budgets measured in milliseconds.
Embedding Caches
Retrieval-augmented generation systems search vector indexes to find relevant context for LLM prompts. Accelerating the embedding store locally eliminates the network round-trip to a remote vector database, reducing retrieval latency from hundreds of milliseconds to single-digit milliseconds.
Feature Stores
Machine learning models consume feature vectors at inference time. Accelerating feature data in an in-memory engine ensures that model serving pipelines can retrieve features without blocking on slow source queries.
RAG Index Acceleration
RAG systems combine vector search with structured data retrieval. Accelerating both the vector index and the associated metadata tables ensures that the full RAG pipeline -- retrieval, context assembly, and LLM prompt construction -- runs within tight latency budgets.
The Spice Acceleration Architecture
Spice combines SQL federation and data acceleration in a single runtime. The architecture works as follows:
- Define datasets in a Spicepod configuration file, specifying the source connector and acceleration settings (engine, refresh mode, refresh interval)
- Initial load: Spice reads the full dataset from the source and loads it into the local acceleration engine
- Query routing: Incoming SQL queries are routed to the local acceleration engine for accelerated datasets, or federated to the source for non-accelerated datasets
- Refresh: CDC streams or scheduled refresh cycles keep the acceleration cache synchronized with the source
A Spicepod dataset configuration looks like:
```yaml
datasets:
  - from: postgres:orders
    name: orders
    acceleration:
      engine: arrow
      refresh_mode: changes
      refresh_check_interval: 1s
```

This configuration connects to a PostgreSQL orders table, accelerates it in-memory using Arrow, and refreshes via CDC with a 1-second check interval. Queries against this dataset return in milliseconds, backed by data that is at most seconds behind the source.
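For a source without CDC, the same structure can fall back to scheduled refresh; the sketch below is illustrative (the source path, engine choice, and interval are assumptions — consult the Spice documentation for the supported modes and values):

```yaml
datasets:
  - from: s3://my-bucket/events/
    name: events
    acceleration:
      engine: duckdb            # on-disk engine for a larger dataset
      refresh_mode: full        # full re-materialization each cycle
      refresh_check_interval: 10m
```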
For workloads that need to scale beyond a single node, Spice Cayenne provides a next-generation acceleration engine built for high-throughput data lake workloads.
Advanced Topics
Cache Eviction Strategies
When the acceleration layer has finite memory or disk capacity, it must decide which data to evict when new data arrives. The eviction strategy directly affects cache hit rates and query performance.
LRU (Least Recently Used) evicts the data that has not been accessed for the longest time. This works well when query patterns follow temporal locality -- recently accessed datasets are likely to be accessed again soon. LRU is simple to implement but can be defeated by sequential scans: a single large query that touches every cached dataset can flush the entire cache.
LFU (Least Frequently Used) evicts the data accessed the fewest times. This protects frequently accessed datasets from being evicted by one-off queries, but it can be slow to adapt when access patterns shift -- a dataset that was popular last week keeps its high frequency count even if it is no longer relevant.
TTL (Time-To-Live) evicts data based on age rather than access patterns. Each cached dataset has a configured TTL, and data is evicted (or marked for refresh) when the TTL expires. TTL-based eviction is common in acceleration layers because it directly controls freshness -- a 5-minute TTL guarantees that cached data is never more than 5 minutes stale. TTL works well in combination with CDC-based refresh, where TTL serves as a fallback eviction mechanism when CDC streams are unavailable.
In practice, most acceleration engines combine strategies. For example, Spice's acceleration layer uses CDC or scheduled refresh to keep data current (effectively a freshness-driven policy) and relies on the configured engine's memory management for capacity-based eviction.
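The LRU policy described above has a compact reference implementation using an ordered map. This is a generic sketch, not Spice's eviction code:

```python
# LRU eviction: drop the entry that has gone untouched the longest.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("orders", "v1")
cache.put("users", "v1")
cache.get("orders")        # touching "orders" makes "users" the LRU entry
cache.put("events", "v1")  # capacity exceeded: "users" is evicted
```

The scan-flush weakness mentioned above is visible here: a burst of `put` calls for many distinct keys would cycle every resident entry out, regardless of how hot they were.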
Tiered Storage
A single acceleration engine is often not sufficient for diverse workloads. Tiered storage addresses this by placing data in different engines based on access patterns and performance requirements.
The typical tiers are:
- Hot tier (in-memory, Arrow): Datasets queried hundreds or thousands of times per minute -- embedding lookups, real-time dashboard queries, feature store reads. Sub-millisecond latency, limited by RAM.
- Warm tier (on-disk, DuckDB): Datasets queried regularly but without sub-millisecond requirements -- hourly reports, batch analytics, ad-hoc exploration over moderate-sized datasets. Millisecond-range latency, limited by SSD capacity.
- Cold tier (federated, no acceleration): Datasets queried infrequently or too large to cache. Queries are federated to the source system on demand. Latency depends on source performance.
Tiered storage can be configured statically (the operator assigns each dataset to a tier) or dynamically (the engine promotes and demotes datasets between tiers based on observed access patterns). Static assignment is simpler and more predictable. Dynamic tiering optimizes resource utilization but adds complexity in monitoring and debugging.
Consistency Models
Acceleration introduces a fundamental tradeoff between performance and consistency. The cached copy may lag behind the source, so queries against the acceleration layer may return slightly stale data.
Eventual consistency is the default model for most acceleration deployments. The acceleration layer is updated asynchronously -- via CDC or scheduled refresh -- and queries may return data that is seconds to minutes behind the source. This model is acceptable for dashboards, analytics, and most AI workloads where slight staleness does not affect correctness.
Read-your-writes consistency guarantees that if an application writes to the source and then reads from the acceleration layer, it sees its own write. Achieving this requires either synchronous refresh (the acceleration layer is updated before the write is acknowledged, which adds latency) or write-awareness (the application marks certain reads as requiring the latest data, and the acceleration layer routes those reads to the source instead of the cache).
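The write-awareness variant can be sketched with version counters: reads that must observe the caller's own write bypass the cache while it lags the source. The version-tracking scheme is illustrative:

```python
# Read-your-writes via write-awareness: strict reads go to the source
# until the cache has caught up with the latest write.
source_version = {"orders": 5}   # version last committed at the source
cache_version  = {"orders": 4}   # version the cache has applied so far

def route_read(dataset: str, require_own_write: bool) -> str:
    stale = cache_version.get(dataset, -1) < source_version.get(dataset, 0)
    if require_own_write and stale:
        return "source"   # bypass the lagging cache to see the latest write
    return "cache"        # eventual consistency is acceptable

strict = route_read("orders", require_own_write=True)    # cache lags: hit source
relaxed = route_read("orders", require_own_write=False)  # slight staleness is fine
```

Once the CDC stream advances `cache_version` to match, even strict reads return to the fast path.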
Strong consistency guarantees that the acceleration layer always reflects the current source state. This is impractical for most acceleration deployments because it requires synchronous coordination between the source and the cache, negating the latency benefits of acceleration. When strong consistency is required, the better approach is to query the source directly via SQL federation and reserve acceleration for workloads that tolerate eventual consistency.
Data Acceleration FAQ
What is the difference between data acceleration and caching?
Traditional caching (Redis, Memcached) stores key-value pairs and requires application logic to manage cache invalidation. Data acceleration caches entire datasets in a queryable SQL engine and supports automatic refresh via CDC or scheduled sync. Acceleration preserves the full SQL interface, so applications query the acceleration layer with standard SQL rather than key-based lookups.
How fresh is accelerated data?
Freshness depends on the refresh strategy. CDC-based refresh keeps accelerated data within seconds of the source. Scheduled refresh introduces a staleness window equal to the refresh interval (e.g., a 1-minute schedule means data can be up to 1 minute behind). Append-only refresh for event data is typically near-real-time for new data.
When should I accelerate data vs. query the source directly?
Accelerate when datasets are queried frequently, low latency is critical, or the source system is slow or expensive to query. Query the source directly (via federation) when access is infrequent, the source is already fast enough, or the dataset is too large to cache cost-effectively. Most production deployments accelerate hot datasets and federate the rest.
How does data acceleration differ from materialized views?
Materialized views are a database-internal feature that precomputes and stores query results within the same database. Data acceleration caches data from external sources in a local engine, often a different engine than the source. Acceleration works across heterogeneous data sources (PostgreSQL, S3, Databricks), while materialized views are limited to a single database.
What are the cost implications of data acceleration?
Data acceleration trades compute and storage resources at the edge or application layer for reduced load and cost at the source. In-memory acceleration (Arrow) requires RAM proportional to dataset size. On-disk acceleration (DuckDB) requires local SSD storage. The cost savings come from reduced source query volume, lower data warehouse bills (for usage-based pricing), and eliminated ETL pipeline infrastructure.
Learn more about data acceleration
Guides, blog posts, and documentation on accelerating queries and keeping data fresh.
Spice.ai OSS Documentation
Learn how Spice accelerates datasets locally with Arrow and DuckDB, kept fresh via CDC and scheduled refresh.

Introducing Spice Cayenne: The Next-Generation Data Accelerator
Spice Cayenne is the next-gen data accelerator for high-scale workloads.

Getting Started with Spice.ai SQL Query Federation & Acceleration
Learn how to use Spice.ai to federate and accelerate queries.

See Spice in action
Get a guided walkthrough of how development teams use Spice to query, accelerate, and integrate AI for mission-critical workloads.
Get a demo