What is a Hybrid Data Architecture?
A hybrid data architecture pairs lightweight application sidecars with a centralized cluster -- sidecars handle the hot path with locally accelerated data, while the cluster manages data ingestion, refresh, and distributed compute. Think of it as a CDN for your data.
Running a data layer as a sidecar alongside your application gives you the lowest possible read latency -- queries travel over loopback, not the network. But a pure sidecar model breaks down when you need centralized data ingestion, distributed queries across large datasets, or coordination of acceleration refreshes. Every sidecar must independently connect to upstream sources, manage its own refresh cycles, and handle ingestion overhead.
Running everything in a centralized cluster solves the coordination problem. A single cluster can ingest data once, manage refresh schedules, and serve distributed analytical queries. But now every read from an application pod must cross the network to the cluster, adding milliseconds of latency to every query -- unacceptable for hot-path workloads.
A hybrid data architecture combines both patterns. Application sidecars cache frequently accessed datasets locally for sub-millisecond reads. A centralized cluster handles the heavy work: data ingestion from upstream sources, acceleration and refresh, distributed compute for large queries, and hybrid search indexing. When a sidecar receives a query for data beyond its cached working set, it transparently delegates to the cluster. The application never needs to know which tier served the response.
This is the most common production topology for latency-sensitive, data-intensive applications. It separates the concerns of fast reads (sidecars) from data management (cluster) while keeping both accessible through a unified query interface.
How Hybrid Data Architecture Works
The hybrid architecture is a two-tier model. The first tier consists of lightweight sidecars deployed as containers alongside application pods -- typically in Kubernetes. The second tier is a centralized cluster that manages data pipelines, acceleration, and distributed query execution.
The Two-Tier Model
Sidecars run as pod-level containers in Kubernetes, co-located with the application. Each sidecar maintains a subset of accelerated datasets in local memory or on local disk. Because the sidecar runs on the same node as the application (or even in the same pod), queries travel over loopback -- the network hop is eliminated entirely. Sidecars start in seconds and scale horizontally with application pods.
Each sidecar is configured via a spicepod.yaml that declares which datasets to cache, which acceleration engines to use (Arrow for in-memory, DuckDB for on-disk), and which views, search indices, or AI models to load locally. The sidecar handles only caching and query serving -- it does not run data ingestion or refresh pipelines.
The cluster is a centralized deployment (single node or distributed) that handles everything the sidecars do not: connecting to upstream data sources, running ingestion pipelines, managing acceleration refresh cycles (including CDC-based refresh), executing distributed queries across large datasets, and maintaining search indices. The cluster is the authoritative data layer. It connects to data warehouses, transactional databases, object stores, and streaming platforms, then accelerates and serves that data to sidecars on demand.
Query Routing and Transparent Delegation
When a query arrives at a sidecar, the sidecar checks whether the requested data is available in its local acceleration cache. If the data is cached locally, the query is served directly -- sub-millisecond, zero network hops.
If the data is not in the sidecar's local cache -- because the dataset isn't configured for local acceleration, or because the query requires data the sidecar doesn't hold -- the sidecar transparently delegates the query to the cluster over Arrow Flight (gRPC). The cluster executes the query against its own accelerated datasets or federates it to the upstream source, then streams the results back to the sidecar. The application receives the response through the same interface, unaware of which tier served it.
This transparent delegation is what makes the pattern practical. Application code does not need conditional logic to decide where to route queries. The sidecar handles routing automatically based on its local cache state.
Cache Management and Invalidation
Sidecars do not manage their own data ingestion. Instead, the cluster handles all ingestion and refresh, and sidecars either pull updated data from the cluster on a configured schedule or receive push-based updates.
When the cluster refreshes a dataset -- whether via CDC from a PostgreSQL WAL, scheduled polling from an S3 bucket, or streaming ingestion from Kafka -- the updated data becomes available to sidecars on their next refresh cycle. The sidecar's refresh interval determines the maximum staleness of its local cache relative to the cluster.
For datasets where even seconds of staleness are unacceptable, sidecars can be configured to delegate all queries for that dataset to the cluster, using local caching only for datasets with relaxed freshness requirements. This per-dataset configuration gives operators fine-grained control over the latency-vs-freshness tradeoff.
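As a sketch of this per-dataset control, a sidecar's spicepod.yaml might cache one dataset locally while leaving another un-accelerated so its queries always delegate to the cluster. Dataset names and intervals here are illustrative, and omitting the acceleration block to force delegation is an assumption about the configuration surface:

```yaml
# Illustrative spicepod.yaml fragment -- dataset names and values are hypothetical.
datasets:
  # Relaxed freshness: cache locally, tolerate up to 30s of staleness.
  - from: spice.ai/cluster:product_catalog
    name: product_catalog
    acceleration:
      engine: arrow
      refresh_check_interval: 30s
  # Strict freshness: no local acceleration, so every query for this
  # dataset is delegated to the cluster, which holds the latest data.
  - from: spice.ai/cluster:account_balances
    name: account_balances
```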
The CDN Analogy
The hybrid sidecar-cluster pattern mirrors how content delivery networks (CDNs) work. In a CDN, an origin server holds the canonical content. Edge nodes cache popular content close to end users. When a user requests content that the edge node has cached, the edge serves it immediately -- low latency, no origin round-trip. When the edge node doesn't have the content, it fetches from the origin, caches it, and serves the response.
In the hybrid data architecture, the cluster is the origin server. It holds the full set of accelerated datasets, manages ingestion pipelines, and handles distributed queries. The sidecars are the edge nodes. They cache the hot working set -- the datasets and rows that the co-located application queries most frequently -- and serve them with sub-millisecond latency over loopback.
When a sidecar receives a query for data it doesn't have, it fetches from the cluster (the origin), just as a CDN edge fetches from the origin. The network cost of this delegation is higher than a local cache hit, but far lower than querying the upstream data source directly, because the cluster already has the data accelerated and ready to serve.
The analogy extends to scaling. CDNs add edge nodes without increasing load on the origin -- each node serves cached content independently. Similarly, adding sidecars does not increase load on upstream data sources. The cluster absorbs ingestion, and sidecars serve reads from their local caches. Ten sidecars or a thousand sidecars place the same load on PostgreSQL, S3, or Databricks -- only the cluster connects to those sources.
Benefits of the Hybrid Pattern
Sub-Millisecond Reads
Because sidecars run alongside the application -- on loopback in Kubernetes -- queries that hit the local cache avoid all network overhead. In-memory acceleration with Apache Arrow delivers sub-millisecond reads for cached datasets. This is critical for hot-path workloads: API serving, real-time dashboards, AI inference pipelines, and retrieval-augmented generation flows that need embedding lookups in single-digit milliseconds.
Centralized Data Management
Data ingestion, acceleration refresh, and pipeline orchestration happen once in the cluster, not redundantly in every sidecar. This means upstream data sources see a single connection (from the cluster), not hundreds of connections from individual sidecars. It also simplifies operational management -- refresh schedules, CDC pipelines, and schema evolution are configured and monitored in one place.
Horizontal Scalability
Sidecars scale with application pods. When Kubernetes scales an application from 3 replicas to 30, each new pod gets its own sidecar that caches the configured datasets. The cluster's load does not increase proportionally because sidecars serve most reads from their local cache. Only cache misses and refresh cycles generate cluster traffic.
This scaling model is particularly valuable for multi-tenant SaaS applications and microservice architectures where dozens or hundreds of application instances need fast access to the same datasets.
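A minimal sketch of this pod-level wiring in a Kubernetes Deployment, assuming the sidecar ships as a spiceai/spiceai container image -- the names, image tags, and port are illustrative, not a definitive manifest:

```yaml
# Illustrative Kubernetes Deployment fragment -- names and images are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3                 # scaling replicas scales sidecars 1:1
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: app
          image: example.com/orders-api:latest
        - name: spice-sidecar        # co-located; the app queries over loopback
          image: spiceai/spiceai:latest
          ports:
            - containerPort: 8090    # query endpoint (illustrative port)
```

Because the sidecar is part of the pod template, every replica added by the autoscaler carries its own cache with no extra orchestration.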
Resilience
Sidecars serve cached data even if the cluster is temporarily unavailable. If the network between a sidecar and the cluster goes down, the sidecar continues serving queries from its local cache. Queries that require delegation will fail, but cached workloads remain unaffected. When the cluster recovers, sidecars resume normal operation -- fetching updates and delegating cache misses.
This resilience model is similar to how CDN edge nodes continue serving cached content during origin outages. The sidecar's local cache acts as a buffer against cluster-level disruptions.
When to Use Hybrid Architecture
The hybrid pattern is not universally optimal. It adds architectural complexity -- two tiers to deploy, configure, and monitor. The following scenarios justify that complexity.
Real-Time + Analytical Workloads
Applications that need both sub-millisecond reads for real-time serving and distributed analytical queries across large datasets benefit from the two-tier split. Sidecars handle the real-time reads; the cluster handles the analytical workload. This separation prevents heavy analytical queries from competing with latency-sensitive application queries for the same resources.
For example, an operational data lakehouse that serves both live dashboards and batch reports can use sidecars for the dashboard queries and the cluster for the batch analytics -- all through a unified SQL interface.
Multi-Instance and Multi-Tenant Applications
When multiple application instances (microservices, API replicas, tenant-specific deployments) need fast access to the same datasets, the hybrid pattern avoids each instance independently connecting to and querying upstream sources. The cluster ingests once, and sidecars distribute the cached data across all instances.
A multi-tenant SaaS platform that runs isolated pods per tenant can deploy a sidecar in each tenant pod. Each sidecar caches the datasets relevant to that tenant, while the cluster manages the full dataset across all tenants. The tenant's queries are fast (local sidecar), and the platform's upstream sources see only the cluster's connections.
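One way to scope a tenant sidecar's cache is to filter rows at refresh time, sketched below. The refresh_sql filter, tenant ID, and dataset names are illustrative assumptions, not a prescribed configuration:

```yaml
# Illustrative tenant-scoped spicepod.yaml -- tenant ID and names are hypothetical.
datasets:
  - from: spice.ai/cluster:orders
    name: orders
    acceleration:
      engine: arrow
      refresh_check_interval: 10s
      # Cache only this tenant's rows; the cluster holds the full dataset.
      refresh_sql: SELECT * FROM orders WHERE tenant_id = 'tenant-a'
```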
Reducing Upstream Source Load
If the priority is reducing load on upstream data sources -- a production PostgreSQL database, a rate-limited SaaS API, or a cost-per-query data warehouse -- the hybrid pattern centralizes all source access in the cluster. Sidecars never connect to upstream sources directly. This is the same principle behind CDN origin shielding: the edge never reaches the origin except through a controlled, cacheable path.
Edge Computing and Distributed Deployments
Applications running across multiple regions or at the edge benefit from the hybrid pattern when a central cluster can be deployed in a primary region and sidecars deployed alongside applications in satellite regions. Sidecars cache the working set locally, absorbing most read traffic without cross-region network hops. Delegation to the cluster handles the long tail of queries that miss the local cache.
When Hybrid Architecture Is Not Ideal
Simple single-instance applications that don't need horizontal scaling gain little from the two-tier model. A single sidecar or embedded deployment is simpler and sufficient.
Pure batch workloads with relaxed latency requirements (seconds to minutes are acceptable) can run directly against the cluster or the upstream source without needing the sidecar tier.
Unreliable networks between sidecars and the cluster undermine the delegation model. If the sidecar-to-cluster connection is intermittent, queries that miss the local cache will fail unpredictably. In these scenarios, deploying full, self-contained instances (each with its own ingestion) may be more reliable.
Advanced Topics
Cache Coherency Strategies
In a distributed caching architecture, coherency -- ensuring all sidecars have a consistent view of the data -- is a design decision, not a guarantee. The hybrid pattern offers several strategies depending on the application's tolerance for staleness.
Pull-based refresh is the simplest model. Each sidecar periodically pulls the latest data from the cluster on a configured interval (e.g., every 10 seconds). This introduces a staleness window equal to the refresh interval, but it is predictable and easy to reason about. Most production deployments use this model for datasets where seconds of staleness are acceptable.
Push-based invalidation reduces staleness by having the cluster notify sidecars when data changes. When the cluster completes a refresh cycle (e.g., a CDC update), it pushes an invalidation signal to all connected sidecars. Sidecars then pull the updated data immediately rather than waiting for the next scheduled refresh. This reduces worst-case staleness from the full refresh interval to the time it takes for the invalidation-plus-pull cycle.
Delegate-on-write avoids coherency issues entirely for specific datasets by never caching them in the sidecar. All queries for those datasets are delegated to the cluster, which always has the latest data. This sacrifices sidecar-level latency for guaranteed freshness, and is appropriate for datasets where even seconds of staleness are unacceptable.
In practice, production deployments use a mix of these strategies, configured per dataset based on freshness requirements. Hot, frequently read datasets with relaxed freshness use pull-based refresh. Datasets requiring near-real-time freshness use push-based invalidation. Datasets requiring absolute freshness use delegation.
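For the near-real-time tier, a dataset might be configured for change-based refresh rather than periodic polling. This is a sketch, assuming a refresh_mode setting along these lines; the dataset name and field values are illustrative:

```yaml
# Illustrative fragment: near-real-time freshness via change-based refresh.
# Values are assumptions, not a definitive configuration.
datasets:
  - from: spice.ai/cluster:inventory
    name: inventory
    acceleration:
      engine: arrow
      refresh_mode: changes   # apply cluster-propagated changes as they arrive
```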
Multi-Region Deployments
The hybrid pattern extends naturally to multi-region architectures. A primary cluster runs in one region, handling ingestion, refresh, and serving as the authoritative data layer. Sidecars in other regions cache the working set locally, serving reads without cross-region latency.
For multi-region setups with stricter latency requirements, a secondary cluster can be deployed in each region, replicating data from the primary cluster. Sidecars in each region connect to their regional cluster rather than the primary, reducing delegation latency. This mirrors the CDN pattern of regional origin servers behind a global origin.
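A regional sidecar would then point its delegation endpoint at the nearby secondary cluster rather than the primary. A sketch, where the endpoint parameter and hostnames are hypothetical:

```yaml
# Illustrative: sidecar in eu-west delegating to its regional cluster.
# The endpoint param and hostname are hypothetical.
datasets:
  - from: spice.ai/cluster:orders
    name: orders
    params:
      endpoint: grpc://spice-cluster.eu-west.internal:50051  # regional cluster, not the primary
    acceleration:
      engine: arrow
      refresh_check_interval: 10s
```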
The key consideration in multi-region deployments is conflict resolution for writes. If applications in multiple regions write to the same datasets, the architecture must define how those writes are reconciled. The hybrid pattern is primarily a read-optimized architecture -- writes flow through the application's transactional database, and the cluster ingests those changes via CDC or scheduled refresh.
Sidecar Resource Budgeting
Each sidecar consumes CPU and memory on the application pod's node. In Kubernetes, this means setting resource requests and limits in the sidecar container spec that reflect the sidecar's working set.
The primary resource dimension is memory. A sidecar using Arrow in-memory acceleration requires enough RAM to hold all configured datasets. A sidecar accelerating 500 MB of datasets with Arrow needs approximately 500 MB of memory (plus overhead for query execution buffers). Sidecars using DuckDB for on-disk acceleration require less memory but need local disk space.
CPU requirements are typically modest -- sidecars serve cached data using Arrow's zero-copy reads, which require minimal CPU. CPU spikes occur during refresh cycles (loading updated data from the cluster) and during complex queries that involve local computation (joins, aggregations).
A practical budgeting approach is to start with memory equal to 1.5x the total dataset size (to account for query buffers and refresh overhead), a CPU limit of 0.5-1 core, and monitor actual usage during load testing. The sidecar's spicepod.yaml controls exactly which datasets are cached, so operators can tune the working set to fit within the resource budget.
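Applying that 1.5x rule of thumb to the 500 MB example above, a sidecar container spec might look like the following. The values are illustrative starting points to validate under load testing, and the image name is an assumption:

```yaml
# Illustrative sidecar container resources for ~500 MB of Arrow-accelerated data.
containers:
  - name: spice-sidecar
    image: spiceai/spiceai:latest   # image name is an assumption
    resources:
      requests:
        memory: "768Mi"   # ~1.5x the 500 MB working set
        cpu: "500m"
      limits:
        memory: "1Gi"     # headroom for refresh cycles and query buffers
        cpu: "1"
```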
Hybrid Data Architecture with Spice
Spice implements the hybrid sidecar-cluster pattern as its most common production deployment topology. The Spice runtime runs as both a sidecar (lightweight, caching, pod-level) and a cluster node (full-featured, ingestion, distributed compute).
Sidecars are configured via spicepod.yaml, specifying which datasets to accelerate locally, which acceleration engine to use, and the cluster endpoint for delegation:
# Sidecar spicepod.yaml
datasets:
  - from: spice.ai/cluster:orders
    name: orders
    acceleration:
      engine: arrow
      refresh_mode: full
      refresh_check_interval: 10s
  - from: spice.ai/cluster:products
    name: products
    acceleration:
      engine: duckdb
      refresh_check_interval: 60s

The cluster connects to upstream sources -- PostgreSQL, S3, Databricks, and 30+ other connectors -- handles ingestion and CDC-based refresh, and serves queries from sidecars that exceed the local cache. Communication between sidecars and the cluster uses Arrow Flight (gRPC) secured with mTLS, encrypting data in transit.
This architecture enables teams to build data lake acceleration layers, power AI agent workloads with sub-millisecond data access, and run federated queries across heterogeneous sources -- all through a single, Kubernetes-native deployment. The cluster can be self-managed or run on the Spice Cloud Platform for managed operations.
For a detailed deployment guide, see the hybrid architecture documentation.
Hybrid Data Architecture FAQ
What is the difference between a sidecar and a cluster in a hybrid data architecture?
A sidecar is a lightweight runtime that runs alongside an application pod, caching frequently accessed datasets locally for sub-millisecond reads. The cluster is a centralized deployment that handles data ingestion, acceleration refresh, distributed queries, and serves as the authoritative data layer. Sidecars delegate queries to the cluster when data is not available in the local cache.
How does the CDN analogy apply to data architecture?
In a CDN, edge nodes cache popular content close to users while the origin server holds the full content set. In a hybrid data architecture, sidecars act as edge nodes -- caching hot data close to the application -- while the cluster acts as the origin server, holding the complete accelerated dataset. Both patterns reduce latency by serving from the nearest cache and delegating to the origin only on cache misses.
How do sidecars stay in sync with the cluster?
Sidecars periodically pull updated data from the cluster on a configured refresh interval. The cluster manages all upstream data ingestion and refresh (including CDC-based refresh from transactional databases). Depending on the configuration, sidecars can also receive push-based invalidation signals or delegate all queries for freshness-critical datasets directly to the cluster.
Does the hybrid pattern increase load on upstream data sources?
No. The hybrid pattern reduces upstream source load because only the cluster connects to data sources -- sidecars never connect to upstream systems directly. Whether you run 5 or 500 sidecars, the upstream source sees the same number of connections and queries from the single cluster.
When should I use a single sidecar instead of the hybrid architecture?
A single sidecar (without a cluster) is simpler and sufficient for single-instance applications with small datasets, where the sidecar can handle both ingestion and caching. The hybrid pattern is justified when you need centralized data management, horizontal scaling across many application instances, or separation of real-time reads from analytical workloads.
Learn more about hybrid data architecture
Documentation and technical blog posts on deploying the sidecar-cluster pattern in production.
Hybrid Architecture Docs
Learn how to deploy Spice in a hybrid sidecar-cluster architecture for sub-millisecond reads with centralized data management.

Getting Started with Spice.ai SQL Query Federation & Acceleration
Learn how to use Spice.ai to federate and accelerate queries across operational and analytical systems with zero ETL.

How we use Apache DataFusion at Spice AI
A technical overview of how Spice extends Apache DataFusion with custom table providers, optimizer rules, and UDFs to power federated SQL, search, and AI inference.
