# Spice AI - Full Content

> Spice.ai is a data and AI platform that combines federated SQL query, hybrid search, and LLM inference in a portable, open-source runtime

This file contains the complete content from the Spice AI website for AI/LLM consumption.

---

# Local Content

## About Us

URL: https://spice.ai/about-us
Date: 2025-11-19T21:05:55
Description: Learn about Spice AI's mission, team, and vision for empowering developers to build intelligent apps with unified data and AI infrastructure.

Luke Kim, Founder

Over the last 15 years, Luke has brought together the best builders and engineers across the globe to create developer-focused experiences through tools and technologies used by millions worldwide. Before founding Spice AI, Luke was the founding manager and co-creator of Azure Incubations at Microsoft, where he led cross-functional engineering teams to create and develop technologies like Dapr.

Phillip LeBlanc, Founder and CTO
LinkedIn: https://www.linkedin.com/in/leblancphillip/
X: https://x.com/leblancphill

Phillip has spent a decade building some of the largest distributed systems and big data platforms used by millions worldwide. Before co-founding Spice AI, Phillip was both an engineering manager and IC working on distributed systems at GitHub and Microsoft. Phillip has contributed to services developers use every day, including GitHub Actions, Azure Active Directory, and Visual Studio App Center.

---

## 2025 Spice AI Year in Review

URL: https://spice.ai/blog/2025-spice-ai-year-in-review
Date: 2026-01-02T20:15:30
Description: From day one, Spice was designed to simplify building modern, intelligent applications. In 2025 that vision turned into reality.

In January 2025, Spice announced 1.0 stable, marking the transition from an open-source project to an enterprise-grade, production-ready platform. Spice has shipped 35 stable releases and 11 major releases since then.

From day one, Spice was designed to simplify building modern, intelligent applications. In 2025 that vision turned into reality. Spice now serves as the data and AI substrate for global, production workloads at enterprises like Twilio and Barracuda - where mission-critical applications query, search, and reason over big data in real time.

These data and AI workloads impose fundamentally different demands on the data layer than previous generations of applications. Instead of the complexity of multiple query engines, search platforms, caches, and inference layers, Spice brings this functionality into a single, high-performance data and AI stack. Development teams can query operational databases, data lakes, analytical warehouses, and more with a single SQL interface, while taking advantage of built-in acceleration, hybrid search, and AI.

All of this is delivered by a fully open-source engine built in Rust that can be deployed anywhere - as a sidecar, at the edge, in the cloud, or in enterprise clusters. Developers have complete optionality based on their access patterns and business requirements.

Below are some of the major features that defined 2025 across the core pillars of the Spice platform: federation and acceleration, search, and embedded LLM inference.

### Major 2025 Federation & Acceleration Features

SQL federation and acceleration are at the core of Spice and the applications it enables; enterprise AI applications depend on contextual data drawn from many different systems, and that data must be fast and available to search and reason over in real time.

In 2025, Spice simplified querying across disparate data sources while improving performance, scale, and reliability. The connector ecosystem also significantly expanded, enabling teams to ingest and combine data across any source.

  • Spice Cayenne data accelerator: Introduced in v1.9, Spice Cayenne is the new premier data accelerator built on the Vortex columnar format that enables low-latency, highly concurrent queries over large datasets, overcoming the scalability and memory limits of single-file accelerators like DuckDB.
  • Iceberg and Amazon S3 writes: Spice added write support for Iceberg tables (v1.8) and Amazon S3 Tables (v1.10), delivering direct ingestion, transformation, and materialization of data into object storage. This simplifies writing operational data to object stores, eliminating the need for complex and costly batch or streaming pipelines.
  • Multi-node distributed query (preview): v1.9 brought multi-node distributed query execution based on Apache Ballista, designed for querying partitioned data lake formats across multiple execution nodes for significantly improved query performance on large datasets.
  • Managed acceleration snapshots: Acceleration Snapshots enable faster restarts, shared accelerations across multiple Spice instances, reduced load on federated systems, and continued query serving even when source systems are temporarily unavailable for enterprise-grade resiliency.
  • Caching acceleration mode: A new caching mode introduced in v1.10 provides stale-while-revalidate (SWR) behavior for accelerations with background refreshes, and file-persistence with Spice Cayenne, SQLite, or DuckDB.
  • Expanded connector ecosystem: Delta Lake, S3, Databricks, Unity Catalog, AWS Glue, PostgreSQL, MySQL, Kafka, DynamoDB, MongoDB, Iceberg, and more were introduced or reached stable.

#### Federation & Acceleration Feature Highlight: Spice Cayenne

Figure 1: The Spice Cayenne architecture, built on Vortex and SQLite

Spice leans into the industry shift to object storage as the source of truth for applications. These workloads are often multi-terabyte datasets using open data lake formats like Parquet, Iceberg, or Delta that must serve data and search queries to applications with sub-second performance.

Existing data accelerators like DuckDB are fast and simple for datasets up to 1TB; however, for multi-terabyte workloads, a new class of accelerator is required.

So we built Spice Cayenne, the next-generation data accelerator for high-volume, latency-sensitive applications. Spice Cayenne combines Vortex, the next-generation columnar file format from the Linux Foundation, with a simple, embedded metadata layer. This separation of concerns ensures that both the storage and metadata layers are fully optimized for what each does best. Cayenne delivers better performance and lower memory consumption than the existing DuckDB, Arrow, SQLite, and PostgreSQL data accelerators.
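
As a sketch of what this looks like in practice, a dataset opts into Cayenne acceleration in its Spicepod configuration (the dataset source and names here are illustrative; the engine name matches the v1.9 release):

```yaml
# Illustrative example: accelerate an S3 dataset with Spice Cayenne
datasets:
  - from: s3://data-lake/events/   # hypothetical source path
    name: events
    acceleration:
      enabled: true
      engine: cayenne   # Vortex-backed accelerator introduced in v1.9
      mode: file
```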

Figure 2: Cayenne accelerated TPC-H queries 1.4x faster than DuckDB (file mode) and used nearly 3x less memory.

Spice Founder Luke Kim demonstrated and walked through the details of the Cayenne architecture in a December 2025 Community Call:

### Major 2025 Search Features

AI applications are only as effective as the data they can retrieve and reason over. Beyond extracting data, they need to search across both structured and unstructured sources to surface the most relevant context at query time. In 2025, search evolved into a core primitive of the Spice platform, designed to operate natively across federated datasets.

  • Native Amazon S3 Vectors integration: v1.5 added native support for Amazon S3 Vectors, making cost-effective vector search on object storage a first-class feature. Subsequent releases introduced multi-index scatter-gather, multi-column primary keys, and partitioned indexes to support scalable production workloads.
  • Reciprocal Rank Fusion (RRF): Introduced in v1.7, RRF combines vector and full-text search results with configurable weighting and recency bias via a simple SQL table function, producing higher-quality hybrid search rankings than either approach alone (see the sketch after this list).
  • Search on views (full-text and vector): Search on views enables advanced search scenarios across different search modalities over pre-aggregated or transformed data, extending the power of Spice's search functionality beyond base datasets.
  • Search results caching: Runtime caching for search results improves performance for subsequent searches and chat completion requests that use the document_similarity LLM tool. 
  • Table‑level search enhancements: v1.8.2 added additional_columns and where support for table relations in search, enabling multi‑table search workflows.
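
A minimal sketch of an RRF query, assuming a `docs` dataset with both vector and full-text indexes (the `rrf()`, `vector_search()`, and `text_search()` table functions are those shown elsewhere in the Spice docs):

```sql
-- Hybrid search: fuse vector and full-text rankings with RRF
SELECT *
FROM rrf(
  vector_search(docs, 'login failures', 10),
  text_search(docs, 'login failures', 10)
)
LIMIT 10;
```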

#### Search Feature Highlight: Amazon S3 Vectors

Figure 3: Spice and S3 Vectors Architecture

In July, Spice introduced native support for Amazon S3 Vectors as a day 1 launch partner at the AWS Summit in NYC. Vector similarity search, structured filters, joins, and aggregations can now be executed in SQL within Spice without duplicating data.

Developers can make vector searches using SQL or HTTP and combine similarity search with relational predicates and joins. Spice pushes filters down to S3 Vectors to minimize data scanned, delivering scalable sub-second query performance with the flexibility of SQL.
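
For example, a sketch of a vector search combined with a relational predicate, assuming a `documents` dataset backed by an S3 Vectors index:

```sql
-- Similarity search with a relational filter; Spice pushes the
-- filter down to S3 Vectors where possible to minimize data scanned
SELECT title, score
FROM vector_search(documents, 'How do I reset my password?', 10)
WHERE category = 'support'
ORDER BY score;
```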

The Spice team presented a live demo of Spice and Amazon S3 Vectors at AWS re:Invent 2025:

### Major AI Features Released in 2025

Spice deepened its AI capabilities by making LLM inference via SQL native within the query engine. LLMs can be invoked directly in SQL alongside federated queries, joins, and transformations, helping teams move from raw data to insights all within SQL.

  • AI SQL function: The AI SQL function was introduced in v1.8, supporting LLM calls directly from SQL for generation, translation and classification. Model inference can now run in the same execution path as joins, filters, search, and aggregations.
  • MCP server support: Introduced in v1.1, Spice works as both an MCP server and client. Spice can run stdio-based MCP tools internally or connect to external MCP servers over HTTP SSE and streaming.
  • Amazon Nova & Nova 2 embeddings: Added support for Amazon Nova (v1.5.2) and Nova 2 multimodal embeddings (v1.9.1), providing high-dimensional vector representations with configurable truncation modes.
  • Expanded model provider ecosystem: Spice added support for new providers including Anthropic, xAI, HuggingFace, Amazon Bedrock, Model2Vec static models, and more. 
  • Expanded tools ecosystem: Added native tool integrations including the OpenAI Responses API (for streaming tool calls and responses) and a Web Search tool powered by Perplexity. These tools can be invoked within the same execution context as SQL queries and model inference, enabling retrieval-augmented and agent-style workflows without external orchestration.

#### AI Feature Highlight: AI SQL Function

Figure 4: AI SQL function example in Spice Cloud

The ai() SQL function enables developers to invoke LLMs directly within SQL for bulk generation, classification, translation, or analysis. Inference runs alongside joins, filters, aggregations, and search results without additional application-layer plumbing. Developers can transform federated data into structured insights without extracting data, calling external completion APIs, or orchestrating separate pipelines.
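
As a sketch, assuming a configured default model and a hypothetical `reviews` dataset, bulk classification looks like a plain SQL projection (the exact ai() signature may differ; see the docs):

```sql
-- Classify each review inline; ai() runs in the same execution
-- path as the scan, filter, and limit
SELECT
  review_text,
  ai(concat('Classify the sentiment of this review as positive, ',
            'negative, or neutral: ', review_text)) AS sentiment
FROM reviews
LIMIT 100;
```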

Check out a live demo of the AI SQL function here:

### Looking ahead

2025 was a major year for Spice as it grew from single-node data acceleration to a multi-node data, search, and AI platform. In 2026, Spice 2.0 will focus on bringing multi-node distributed query execution to GA, alongside continued improvements to search, acceleration, and AI primitives. These investments will help deliver even more predictable performance and operational simplicity.

The mission remains the same: to provide a durable, open data substrate that helps teams build and scale the next generation of intelligent, data and AI-driven applications.

Interested in seeing it for yourself? Get started with the open source runtime, explore pricing, or try the cloud platform today.

## Frequently Asked Questions

### What is Spice.ai?

Spice.ai is a portable, open-source data and AI compute engine built in Rust. It provides [SQL federation and acceleration](/platform/sql-federation-acceleration), [hybrid search](/platform/hybrid-sql-search), and [LLM inference](/platform/llm-inference) in a single runtime that can be [deployed anywhere](/feature/edge-to-cloud-deployments) from edge to cloud.

### What were the biggest Spice releases in 2025?

Spice shipped 35 stable releases and 11 major releases in 2025. Key milestones included the 1.0-stable production release, the Spice Cayenne [data accelerator](/use-case/datalake-accelerator) built on the Vortex columnar format, native Amazon S3 Vectors integration, multi-node distributed query execution, and the AI SQL function for invoking LLMs directly from SQL.

### How does Spice compare to a traditional data warehouse?

Unlike centralized data warehouses that require data movement, Spice federates queries across databases, data lakes, and warehouses in place. It materializes working sets locally for sub-millisecond performance and is designed for application serving rather than batch analytics. See [pricing](/pricing) for deployment options.

---

## A Developer's Guide to Understanding Spice.ai

URL: https://spice.ai/blog/a-developers-guide-to-understanding-spice-ai
Date: 2026-02-05T22:12:21
Description: Learn what Spice.ai is, when to use it, and how it solves enterprise data challenges. A developer-focused guide to federation, acceleration, search, and AI.

TL;DR: This hands-on guide is designed to help developers quickly build an understanding of Spice: what it is (an AI-native query engine that federates queries, accelerates data, and integrates search and AI), when to use it (data-intensive applications and AI agents), and how it can be leveraged to solve enterprise-scale data challenges.

*Note: This guide was last updated on February 5, 2026. Please see the docs for the latest updates.*

### Who this guide is for

This guide is for developers who want to understand why, how, and when to use Spice.ai.

If you are new to Spice, you might also be wondering how Spice is different from other query engines or data and AI platforms. Most developers exploring Spice are generally doing one of the following:

  • Operationalizing data lakes for real-time queries and search
  • Building applications that need fast access to disparate data
  • Building AI applications and agents that need fast, secure context

Let's start with the problem Spice is solving to anchor the discussion.

    " } /> The problem Spice solves ' } /> Modern applications face a distributed data challenge. 

Enterprise data is spread across operational databases, data lakes, warehouses, third-party APIs, and more. Each source has its own interface, latency characteristics, and access patterns.

AI workloads amplify the problem. RAG applications generally require:

  • A vector database (e.g. Pinecone, Weaviate) for embeddings
  • A text search engine (e.g. Elasticsearch) for keyword matching
  • A cache layer (e.g. Redis) for performance & latency
  • Model hosting and serving (OpenAI, Anthropic) for LLM inference
  • Orchestration code and services to coordinate everything

This can be a lot of complexity, even for a simple application.

### What is Spice?

Spice is an open-source SQL query, search, and LLM-inference engine written in Rust, purpose-built for data-driven applications and AI agents. At its core, Spice is a high-performance compute engine that federates, searches, and processes data across your existing infrastructure - querying & accelerating data where it lives and integrating search and AI capabilities through SQL.

Figure 1. Spice.ai architecture

Unlike databases that require migrations & maintenance, Spice takes a declarative configuration approach: datasets, views, models, and tools are defined in declarative YAML, and Spice handles the operations of fetching, caching, and serving that data.

This makes Spice ideal when:

  • Your application needs fast, unified access to disparate data sources
  • You want simplicity and to avoid building and maintaining ETL pipelines
  • You want an operational data lakehouse for applications and agents
  • You need sub-second query performance without ETL

What Spice is not:

  • Not a replacement for PostgreSQL or MySQL (use those for transactional workloads)
  • Not a data warehouse (use Snowflake/Databricks for centralized analytics)

### Mental model: Spice as a data and AI substrate

Think of Spice as the operational data & AI layer between your applications and your data infrastructure.

Figure 2. Spice as the data substrate for data-intensive AI apps

### How this guide works

We'll start with a hands-on quickstart to get Spice running, then progressively build your mental model through the core concepts:

    " } />
  • Federation 
  • '} />
  • Acceleration 
  • ' } />
  • Views 
  • '} />
  • Caching 
  • '} />
  • Snapshots 
  • '} />
  • Models 
  • '} />
  • Search 
  • '} />
  • Writes 
  • '} /> By the end, you'll understand how these primitives are used together to solve enterprise-scale data challenges.

    " } /> Quickstart ' } /> To install and get Spice started, run: 

```bash
curl https://install.spiceai.org | /bin/bash
```

Or using Homebrew:

```bash
brew install spiceai/spiceai/spice
```

Next, in any folder, create a spicepod.yaml file with the following content:

```yaml
version: v1
kind: Spicepod
name: my_spicepod
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
```

In the same folder, run:

```bash
spice run
```

And, finally, in another terminal, run:

```bash
> spice sql
Welcome to the Spice.ai SQL REPL! Type 'help' for help.

sql> show tables; -- list available tables
+---------------+--------------+--------------+------------+
| table_catalog | table_schema | table_name   | table_type |
+---------------+--------------+--------------+------------+
| spice         | runtime      | task_history | BASE TABLE |
| spice         | public       | taxi_trips   | BASE TABLE |
+---------------+--------------+--------------+------------+

Time: 0.010767 seconds. 2 rows.

sql> select count(*) from taxi_trips;
+----------+
| count(*) |
+----------+
| 2964624  |
+----------+
```

#### Understanding what just happened

In that quickstart, you:

  • Configured a dataset (taxi_trips) pointing to a remote S3 bucket
  • Started the Spice runtime, which connected to that source
  • Queried the data using standard SQL - without moving or copying it

### Spice.ai Cloud Platform

You can run the same Spicepod configuration in Spice.ai Cloud, the fully managed version of Spice that extends the open-source runtime with enterprise capabilities: built-in observability, elastic scaling, and team collaboration.

### Core Concepts

#### 1. Federation

In the quickstart, you queried taxi_trips stored in a remote S3 bucket using standard SQL without copying or moving that data. That's federation in action - querying data where it lives, not where you've moved it to.

    " } /> This is foundational to Spice\'s architecture. Federation in Spice enables you to query data across multiple heterogeneous sources using a single SQL interface, without moving data or building ETL pipelines. 

Traditional approaches force you to build ETL pipelines that extract data from these sources, transform it, and load it into a centralized database or warehouse. Every new data source means building and maintaining another pipeline.

Spice connects directly to your existing data sources and provides a unified SQL interface across all of them. You configure datasets declaratively in YAML, and Spice handles the connection, query translation, and result aggregation.

Spice supports query federation across:

  • Databases: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, MongoDB, ClickHouse, DynamoDB, ScyllaDB
  • Data Warehouses: Snowflake, Databricks, BigQuery
  • Data Lakes: S3, Azure Blob Storage, Delta Lake, Apache Iceberg
  • Other Sources: GitHub, GraphQL, FTP/SFTP, IMAP, Kafka, HTTP/API, and 30+ more connectors

Figure 3. Spice Federation (and acceleration) architecture

##### How it works

When you configure multiple datasets from different sources, Spice's query planner (built on Apache DataFusion) optimizes and routes queries appropriately:

```yaml
datasets:
  # From PostgreSQL
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}

  # From S3 Parquet files
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet

  # From Snowflake
  - from: snowflake:analytics.sales
    name: sales
```

```sql
-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';
```

Without additional configuration, each query fetches data directly from the underlying sources. Spice optimizes this as much as possible using filter pushdown and column projection.

📚 Docs: Spice Federation and Data Connectors

#### 2. Acceleration

Federation solves the data movement problem, but alone often isn't enough for production applications. Querying remote S3 buckets for every request introduces latency - even with query pushdown and optimization, round-trips to distributed data sources can take seconds (or tens of seconds) for large datasets.

    " } />
    Figure 4. Acceleration example in a Spice sidecar architecture
    ' } /> Spice data acceleration materializes working sets of data locally, reducing query latency from seconds to milliseconds. When enabled, Spice syncs data from connected sources and stores it in local stores, like DuckDB or Vortex - giving you the speed of local data with the flexibility of federated access. 

You can think of acceleration as an intelligent caching layer that understands your data access patterns. Hot data gets materialized locally for instant access and cold data remains federated. Unlike traditional caches that just store query results or static database materializations, Spice accelerates entire datasets with configurable refresh strategies, with the flexible compute of an embedded database.

##### Acceleration Engines

| Engine | Mode | Best For |
|--------|------|----------|
| Arrow | In-memory only | Ultra-fast analytical queries, ephemeral workloads |
| DuckDB | Memory or file | General-purpose OLAP, medium datasets, persistent storage |
| SQLite | Memory or file | Row-oriented lookups, OLTP patterns, lightweight deployments |
| Cayenne | File only | High-volume multi-file workloads, terabyte-scale data |

To enable acceleration, add the acceleration block to your dataset configuration:

```yaml
datasets:
  - from: s3://data-lake/events/
    name: events
    acceleration:
      enabled: true
      engine: cayenne  # Choose your engine
      mode: file       # 'memory' or 'file'
```

With this configuration, Spice fetches the events dataset from S3 and stores it in local Spice Cayenne Vortex files. Queries to events are then served from local disk instead of making remote calls to S3.

Figure 5. Spice Cayenne architecture

While DuckDB and SQLite are general-purpose engines, Spice Cayenne is purpose-built for modern data lake workloads. It's built on Vortex - a next-generation columnar format under the Linux Foundation - designed for the scale and access patterns of object storage.

Learn more: Introducing the Spice Cayenne Data Accelerator

📚 Docs: Data Accelerators

##### Refresh Modes

Spice offers multiple strategies for keeping accelerated data synchronized with sources:

| Mode | Description | Use Case |
|------|-------------|----------|
| full | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| append (batch) | Adds new records based on a time column | Append-only logs, time-series data |
| append (stream) | Continuous streaming without a time column | Real-time event streams |
| changes | CDC-based incremental updates via Debezium or DynamoDB | Frequently updated transactional data |
| caching | Request-based row-level caching | API responses, HTTP endpoints |

```yaml
# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h

# Append mode: check for new records from the last day every 10 minutes
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d

# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append

# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes
```

📚 Docs: Refresh Modes

##### Retention Policies

While refresh modes control how acceleration is populated, retention policies prevent unbounded growth. As data continuously flows into an accelerated dataset, especially in append or streaming modes, storage can grow indefinitely. Retention policies automatically evict stale data using time-based or custom SQL strategies.

Retention is particularly useful for time-series workloads like logs, metrics, and event streams where only recent data is relevant for queries. For example, an application monitoring dashboard might only need the last 7 days of logs for troubleshooting, while a real-time analytics pipeline processing IoT sensor data might retain just 24 hours of readings. By defining retention policies, you ensure accelerated datasets stay bounded and performant without manual intervention.

Spice supports two retention strategies: time-based, which removes records older than a specified period, and custom SQL-based, which executes arbitrary DELETE statements for more complex eviction logic. Once defined, Spice runs retention checks automatically at the configured interval:

```yaml
acceleration:
  # Common retention parameters
  retention_check_enabled: true
  retention_check_interval: 1h

  # Time-based retention policy
  retention_period: 7d

  # Custom SQL-based retention
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"
```

📚 Docs: Retention

##### Constraints and Indexes

Accelerated datasets support primary key constraints and indexes for optimized query performance and data integrity:

```yaml
datasets:
  - from: postgres:orders
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      primary_key: order_id  # Creates non-null unique index
      indexes:
        customer_id: enabled            # Single-column index
        '(created_at, status)': unique  # Multi-column unique index
```

📚 Docs: Constraints & Indexes

#### 3. Views

Views are virtual tables defined by SQL queries - useful for pre-aggregations, transformations, and simplified access patterns:

```yaml
views:
  - name: daily_revenue
    sql: |
      SELECT
        DATE_TRUNC('day', created_at) as day,
        SUM(amount) as revenue,
        COUNT(*) as transactions
      FROM orders
      GROUP BY 1

  - name: top_customers
    sql: |
      SELECT
        customer_id,
        SUM(total) as lifetime_value
      FROM orders
      GROUP BY customer_id
      ORDER BY lifetime_value DESC
      LIMIT 100
```

📚 Docs: Views

#### 4. Caching

Spice provides in-memory caching for SQL query results, search results, and embeddings - all enabled by default. Caching eliminates redundant computation for repeated queries and improves performance for non-accelerated datasets.

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none
    search_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none
    embeddings_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none
```

| Option | Description | Default |
|--------|-------------|---------|
| cache_max_size | Maximum cache storage | 128 MiB |
| item_ttl | Entry expiration duration | 1 second |
| eviction_policy | `lru` (least-recently-used) or `tiny_lfu` | lru |
| encoding | Compression: `zstd` or `none` | none |
Spice also supports HTTP cache-control headers (no-cache, max-stale, only-if-cached) for fine-grained control over caching behavior per request.
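
For instance, a hypothetical request that bypasses the results cache for a single query (the `/v1/sql` endpoint, port, and content type are assumptions based on the other examples in this guide):

```bash
# Send one SQL query over HTTP with caching disabled for this request
curl -XPOST "http://localhost:8090/v1/sql" \
  -H "Content-Type: text/plain" \
  -H "Cache-Control: no-cache" \
  -d 'SELECT COUNT(*) FROM orders'
```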

📚 Docs: Results Caching

#### 5. Snapshots

Snapshots allow file-based acceleration engines (DuckDB, SQLite, or Cayenne) to bootstrap from pre-stored snapshots in object storage. This dramatically reduces cold-start latency in distributed deployments.

```yaml
snapshots:
  enabled: true
  location: s3://large_table_snapshots

datasets:
  - from: postgres:large_table
    name: large_table
    acceleration:
      engine: duckdb
      mode: file
      snapshots: enabled
```

Snapshot triggers vary by refresh mode:

  • refresh_complete: Creates snapshots after each refresh (full and batch-append modes)
  • time_interval: Creates snapshots on a fixed schedule (all refresh modes)
  • stream_batches: Creates snapshots after every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams)

📚 Docs: Snapshots

#### 6. Models

AI is a first-class capability in the Spice runtime - not a bolt-on integration. Instead of wiring external APIs, you call LLMs directly from SQL queries using the `ai()` function. Embeddings generate automatically during data ingestion, eliminating separate pipeline infrastructure. Text-to-SQL is schema-aware with direct data access, preventing the hallucinations common in external tools that don't understand your table structure.

    " } /> This SQL-first approach means you can query your federated and accelerated data, pipe results to an LLM for analysis, and get synthesized answers in a single SQL statement.  

You can connect to hosted providers (OpenAI, Anthropic, Bedrock) or serve models locally with GPU acceleration. Spice provides an OpenAI-compatible AI Gateway, so existing applications using OpenAI SDKs can swap endpoints without code changes.

##### Chat Models

Connect to hosted models or serve locally:

```yaml
models:
  - name: gpt4
    from: openai:gpt-4o
    params:
      openai_api_key: ${secrets:OPENAI_API_KEY}
      tools: auto  # Enable tool use

  - name: claude
    from: anthropic:claude-3-5-sonnet
    params:
      anthropic_api_key: ${secrets:ANTHROPIC_KEY}

  - name: local_llama
    from: huggingface:huggingface.co/meta-llama/Llama-3.1-8B
```

Use via the OpenAI-compatible API or the spice chat CLI:

```bash
$ spice chat
Using model: gpt4
chat> How many orders were placed last month?
Based on the orders table, there were 15,234 orders placed last month.
```

##### NSQL (Text-to-SQL)

The /v1/nsql endpoint converts natural language to SQL and executes it:

```bash
curl -XPOST "http://localhost:8090/v1/nsql" \
  -H "Content-Type: application/json" \
  -d '{"query": "What was the highest tip any passenger gave?"}'
```

Spice uses tools like table_schema, random_sample, and sample_distinct_columns to help models write accurate, contextual SQL.

##### Embeddings

Transform text into vectors for similarity search. These embeddings power the vector search capabilities covered in the 'search' section coming up next:

```yaml
embeddings:
  - name: openai_embed
    from: openai:text-embedding-3-small
    params:
      openai_api_key: ${secrets:OPENAI_API_KEY}

  - name: bedrock_titan
    from: bedrock:amazon.titan-embed-text-v2:0
    params:
      aws_region: us-east-1

  - name: local_minilm
    from: huggingface:sentence-transformers/all-MiniLM-L6-v2
```

Configure columns for automatic embedding generation:

```yaml
datasets:
  - from: postgres:documents
    name: documents
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: openai_embed
            chunking:
              enabled: true
              target_chunk_size: 512
```

📚 Docs: Models & Embeddings

#### 7. Search

In the previous section, we configured embeddings to generate automatically during data ingestion. Those embeddings enable vector search - one of three search methods Spice provides as native SQL functions.

Spice takes the same integrated approach with search as it does with AI. Search indexes are built on top of accelerated datasets - the same data you're querying and piping to LLMs. Full-text search uses Tantivy with BM25 scoring for keyword matching. Vector search uses the embeddings you've already configured to generate during ingestion. Hybrid search combines both methods with Reciprocal Rank Fusion (RRF) to merge rankings - all via SQL functions like `text_search()`, `vector_search()`, and `rrf()`. Search in Spice powers retrieval-augmented generation (RAG), recommendation systems, and content discovery:

    " } />
    Method Best For How It Works 
    Full-Text Search  Keyword matching, exact phrases  BM25 scoring via Tantivy  
    Vector Search  Semantic similarity, meaning-based retrieval  Embedding distance calculation  
    Hybrid Search  Queries with both keywords and semantic similarityHybrid execution and ranking through Reciprocal Rank Fusion (RRF)  
##### Full-Text Search

Full-text search performs keyword-driven retrieval optimized for text data. Powered by Tantivy with BM25 scoring, it excels at finding exact phrases, specific terms, and keyword combinations. Enable it by indexing the columns you want to search:

```yaml
datasets:
  - from: postgres:articles
    name: articles
    acceleration:
      enabled: true
    columns:
      - name: title
        full_text_search: enabled
      - name: body
        full_text_search: enabled
```

```sql
SELECT * FROM text_search(articles, 'machine learning', 10);
```

##### Vector Search

Vector search uses embeddings to find documents based on semantic similarity rather than exact keyword matches. This is particularly useful when users search with different wording than the source content: a query for "how to fix login issues" can match documents about "authentication troubleshooting."

Spice supports both local embedding models (like sentence-transformers from Hugging Face) and remote providers (OpenAI, Anthropic, etc.). Embeddings are configured as top-level components and referenced in dataset columns:

```yaml
datasets:
  - from: s3://docs/
    name: documents
    vectors:
      enabled: true
    columns:
      - name: body
        embeddings:
          - from: openai_embed
```

```sql
SELECT * FROM vector_search(documents, 'How do I reset my password?', 10)
WHERE category = 'support'
ORDER BY score;
```

Vector search is also available via the `/v1/search` HTTP API for direct integration with applications.

##### Hybrid Search with RRF

Neither vector nor full-text search alone produces optimal results for every query. A search for "Python error 403" benefits from both semantic understanding ("error" relates to "exception," "failure") and exact keyword matching ("403," "Python"). Hybrid search combines results from multiple search methods using Reciprocal Rank Fusion (RRF), merging rankings to improve relevance across diverse content types:

```sql
SELECT * FROM rrf(
  vector_search(docs, 'query', 10),
  text_search(docs, 'query', 10)
) LIMIT 10;
```

📚 Docs: Search & Vector Search

#### 8. Writing Data

Spice supports writing to Apache Iceberg tables and Amazon S3 Tables via standard INSERT INTO statements.

##### Apache Iceberg Writes

```yaml
catalogs:
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456/namespaces
    name: ice
    access: read_write

datasets:
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions
    name: transactions
    access: read_write
```

```sql
-- Insert from another table
INSERT INTO transactions SELECT * FROM staging_transactions;

-- Insert with values
INSERT INTO transactions (id, amount, timestamp) VALUES (1001, 299.99, '2025-01-15');

-- Insert into catalog table
INSERT INTO ice.sales.orders SELECT * FROM federated_orders;
```

##### Amazon S3 Tables

Spice offers full read/write capability for Amazon S3 Tables, enabling direct integration with AWS' managed table format for S3:

```yaml
datasets:
  - from: glue:my_namespace.my_table
    name: my_table
    params:
      glue_region: us-east-1
      glue_catalog_id: 123456789012:s3tablescatalog/my-bucket
    access: read_write
```

Note: Write support requires the access: read_write configuration.

📚 Docs: Write-Capable Connectors

### Deployment

Spice is designed for deployment flexibility and optionality - from edge devices to multi-node distributed clusters. It ships as a single ~140 MB binary file with no external dependencies beyond your configured data sources.

This portability means you can deploy the same Spicepod configuration on a Raspberry Pi at the edge, as a sidecar in your Kubernetes cluster, or as a fully-managed cloud service - without code changes:
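
As an illustrative sketch of a standalone deployment (the image name, mount path, and published ports here are assumptions; see the deployment docs for the supported options):

```bash
# Run the spicepod.yaml from the current directory in a container
docker run --rm \
  -v "$(pwd)/spicepod.yaml:/app/spicepod.yaml" \
  -p 8090:8090 \
  spiceai/spiceai
```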

| Deployment Model | Description | Best For |
|------------------|-------------|----------|
| Standalone | Single instance via Docker or binary | Development, edge devices, simple workloads |
| Sidecar | Co-located with your application pod | Low-latency access, microservices architectures |
| Microservice | Multiple replicas deployed behind a load balancer | Loosely coupled architectures, heavy or varying traffic |
| Cluster | Distributed multi-node deployment | Large-scale data, horizontal scaling, fault tolerance |
| Sharded | Horizontal data partitioning across multiple instances | Large-scale data, distributed query execution |
| Tiered | Hybrid approach combining sidecar for performance and shared microservice for batch processing | Varying requirements across different application components |
| Cloud | Fully-managed cloud platform | Auto-scaling, built-in observability, zero operational overhead |
### Putting it all together

Spice makes data fast, federated, and AI-ready - through configuration, not code. The flexibility of this architecture means you can start simple and evolve incrementally.

| Concept | Purpose |
|---------|---------|
| Federation | Query 30+ sources with unified SQL |
| Acceleration | Materialize data locally for sub-second queries |
| Views | Virtual tables from SQL transformations |
| Caching | In-memory result caching for SQL, search, and embeddings |
| Snapshots | Fast cold-start from object storage |
| Models | Chat, NSQL, and embeddings via OpenAI-compatible API |
| Search | Full-text and vector search integrated in SQL |
| Writes | INSERT INTO for Iceberg and Amazon S3 Tables |

### What can you build with Spice?

| Use Case | How Spice Helps |
|----------|-----------------|
| Operational Data Lakehouse | Serve real-time operational workloads and AI agents directly from Apache Iceberg, Delta Lake, or Parquet with sub-second query latency. Spice federates across object storage and databases, accelerates datasets locally, and integrates hybrid search and LLM inference - eliminating separate systems for operational access. |
| Data Lake Accelerator | Accelerate data lake queries from seconds to milliseconds by materializing frequently-accessed datasets in local engines. Maintain the scale and cost efficiency of object storage while delivering operational-grade query performance with configurable refresh policies. |
| Data Mesh | Unified SQL access across distributed data sources with automatic performance optimization |
| Enterprise Search | Combine semantic and full-text search across structured and unstructured data |
| RAG Pipelines | Merge federated data with vector search and LLMs for context-aware AI applications |
| Real-Time Analytics | Stream data from Kafka or DynamoDB with sub-second latency into accelerated tables |
| Agentic AI | Build autonomous agents with tool-augmented LLMs and fast access to operational data |
Whether you're replacing complex ETL pipelines, building AI-powered applications, or deploying intelligent agents at the edge, Spice provides the primitives to deliver fast, context-aware access to data wherever it lives.

    " } /> 📚 DocsUse Cases 

### Next steps

Now that you have a mental model for Spice, check out the cookbook recipes for 80+ examples, the GitHub repo, the full docs, and join us on Slack to connect directly with the team and other Spice users.

And remember these principles:

  • Spice is a runtime, not a database: It federates across your existing data infrastructure
  • Configuration over code: Declarative YAML replaces custom integration code
  • Acceleration is optional but powerful: Start with federation, add acceleration for latency-sensitive use cases
  • Composable primitives: Federation + Acceleration + Search + LLM Models work together
  • SQL-first: Everything accessible through standard SQL queries

## Frequently Asked Questions

### What is Spice.ai used for?

Spice.ai is a data infrastructure platform that provides SQL query federation, data acceleration, hybrid search, and LLM inference in a single runtime. Development teams use it to build data-intensive applications and AI agents that need sub-second access to data across distributed sources -- without building custom ETL pipelines or managing multiple systems.

### How is Spice different from a data warehouse like Snowflake or Databricks?

Data warehouses require loading data before querying it and are optimized for batch analytics. Spice [federates queries](/platform/sql-federation-acceleration) across data sources in place, accelerates hot datasets locally, and serves results at application-grade latency (sub-millisecond). It's designed for production application serving rather than analyst-facing dashboards.

### What programming languages work with Spice?

Spice exposes standard HTTP, Arrow Flight, Arrow Flight SQL, ODBC, and JDBC APIs. Any language with an HTTP client or Arrow Flight library can query Spice -- including Python, Go, Rust, TypeScript, Java, and .NET. OpenAI-compatible APIs are also available for [LLM inference](/platform/llm-inference) workloads.

### Can Spice be deployed at the edge or on-premises?

Yes. Spice is a ~140 MB single binary that can be deployed as a standalone process, Kubernetes sidecar, microservice, or multi-node cluster. It runs on cloud, on-premises, and edge environments. A [Kubernetes Operator](https://spiceai.org/docs/deployment/kubernetes) is available for high-availability cluster deployments.

### Is Spice AI open source?

Spice AI has an open-source core licensed under Apache 2.0, available at [github.com/spiceai/spiceai](https://github.com/spiceai/spiceai). [Spice Cloud](/pricing) adds enterprise features including SSO, RBAC, audit logs, SLAs, and managed infrastructure.

---

## A New Class of Applications That Learn and Adapt

URL: https://spice.ai/blog/a-new-class-of-applications-that-learn-and-adapt
Date: 2021-12-30T18:08:39
Description: Explore the history of decision engines and how modern machine learning enables applications that learn, adapt, and make better decisions over time with Spice.ai.

A new class of applications that learn and adapt is becoming possible through machine learning (ML). These applications learn from data and make decisions to achieve the application's goals. In the post Making apps that learn and adapt, Luke described how developers integrate this ability to learn and adapt as a core part of the application's logic. You can think of the component that does this as a "decision engine." This post will explore a brief history of decision engines and use-cases for this application class.

### History of decision engines

The idea to make intelligent decision-making applications is not new. Developers first created these applications around the 1970s [1], and they are some of the earliest examples of using artificial intelligence to solve real-world problems.

The first applications used a class of decision engines called "expert systems". A distinguishing trait of expert systems is that they encode human expertise in rules for decision-making. Domain experts created combinations of rules that powered decision-making capabilities.

Some uses of expert systems include:

  • Fault diagnosis
  • "Smart" operator and troubleshooting manual
  • Recovery from extreme conditions
  • Emergency shutdown

However, the resources required to build expert systems make employing them infeasible for many applications [2]. They often need a significant time and resource investment to capture and encode expertise into complex rule sets. These systems also do not automatically learn from experience, relying on experts to write more rules to improve decision-making.

With the advent of modern deep-learning techniques and the ability to access significantly more data, it is now possible for the computer, not only the developer, to learn and encode the rules that power a decision engine and improve them over time. The vision for Spice.ai is to make it easy for developers to build this new class of applications. So what are some use-cases for these applications?

### Use cases of decision-making applications

#### Reduce energy costs by optimizing air conditioning

Today: The air conditioning system for an office building runs on a fixed schedule and is set to a fixed temperature during business hours, only adjusting using in-room sensor data, if at all. This behavior can potentially overcool the building near close of business, as the outside temperature drops and the building starts to vacate.

With Spice.ai: Using Spice.ai, the application combines time-series data from multiple data sources, including time of day, day of the week, building/room occupancy, outside temperature, energy consumption, and pricing. The A/C controller application learns how to adjust the air conditioning system as the room naturally cools towards the end of the day. As the occupancy decreases, the decision engine is rewarded for maintaining the desired temperature and minimizing energy consumption/cost.

#### Food delivery order dispatching

Today: Customers order food delivery with a mobile app. When the order is ready to be picked up from the restaurant, the order is dispatched to a delivery driver by a simple heuristic that chooses the nearest available driver. As the app gets more popular and the number of restaurants, drivers, and customers increases, the heuristic needs to be constantly tuned or supplemented with human operators to handle the demand.

With Spice.ai: The application learns which driver to dispatch to minimize delivery time and maximize customer star ratings. It considers several factors from data, including patterns in both the restaurant and driver's order histories. As the number of users, drivers, and customers increases over time, the app adapts to keep up with the changing patterns and demands of the business.

    " } /> Routing stock or crypto trades to the best exchange' } /> Today: When trading stocks through a broker like Fidelity or TD Ameritrade, your broker will likely route your order to an exchange like the NYSE. And in the emerging world of crypto, you can place your trade or swap directly on a decentralized exchange (DEX) like Uniswap or Pancake Swap. In both cases, the routing of orders is likely to be either a form of traditional expert system based upon rules or even manually routed.

With Spice.ai: A smart order routing application learns from data such as pending transactions, time of day, day of the week, transaction size, and the recent history of transactions. It finds patterns to determine the most optimal route or exchange to execute the transaction and get you the best trade.

### Summary

A new class of applications that can learn and adapt is made possible by integrating AI-powered decision engines. Spice.ai is a decision engine that makes it easy for developers to build these applications.

If you'd like to partner with us in creating this new generation of intelligent decision-making applications, we invite you to join us on Slack, or reach out on Twitter.

Phillip

### Footnotes

1. Kendal, S. L., & Creen, M. (2007). An Introduction to Knowledge Engineering. London: Springer. ISBN 978-1-84628-475-5.

2. Russell, Stuart; Norvig, Peter (1995). Artificial Intelligence: A Modern Approach. Simon & Schuster. pp. 22-23. ISBN 978-0-13-103805-9.

---

## Adding Spice - The Next Generation of Spice.ai OSS

URL: https://spice.ai/blog/adding-spice-the-next-generation-of-spice-ai-oss
Date: 2024-03-28T18:57:06
Description: Learn how Spice.ai OSS was rebuilt in Rust to deliver fast, local SQL queries across databases, warehouses, and data lakes.

TL;DR: We've rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets sourced from any database, data warehouse or data lake. Learn more at github.com/spiceai/spiceai.

In September 2021, we introduced Spice.ai OSS as a runtime for building AI-driven applications using time-series data.

We quickly ran into a big problem in making these applications work... data, the fuel for intelligent software, was painfully difficult to access, operationalize, and use, not only in machine learning, but also in web frontends, backend applications, dashboards, data pipelines, and notebooks. And we had to make hard tradeoffs between cost and query performance.

We felt this pain every day building 100TB+ scale data and AI systems for the Spice.ai Cloud Platform. So we took our learnings and infused them back into Spice.ai OSS with the capabilities we wished we had.

We rebuilt Spice.ai OSS from the ground up in Rust, as a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse or data lake.

Figure 1: Spice OSS Architecture

Spice is a fast, lightweight (< 150 MB), single binary, designed to be deployed alongside your application, dashboard, and within your data or machine learning pipelines. Spice federates SQL queries across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.) and data lakes (S3, MinIO, Databricks, etc.) so you can easily use and combine data wherever it lives. Datasets, declaratively defined, can be materialized and accelerated using your engine of choice, including DuckDB, SQLite, PostgreSQL, and in-memory Apache Arrow records, for ultra-fast, low-latency query. Acceleration engines run in your infrastructure, giving you flexibility and control over price and performance.
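
As a sketch of the declarative approach (the dataset source and engine choice here are illustrative), a materialized, accelerated dataset is just a few lines of YAML:

```yaml
# Hypothetical Spicepod snippet: accelerate a PostgreSQL table locally
datasets:
  - from: postgres:orders     # illustrative source table
    name: orders
    acceleration:
      enabled: true
      engine: duckdb          # or sqlite, postgres, arrow
```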

### Before Spice

Figure 2: Before Spice, applications submit many queries to external data sources.

### With Spice

Figure 3: With Spice, applications can submit a single request to external data sources.
### Use-Cases

The next generation of Spice.ai OSS enables:

Better applications. Accelerate and co-locate data with frontend and backend applications for high-concurrency queries, serving more users with faster page loads and data updates. Try the CQRS sample app.

Snappy dashboards, analytics, and BI. Faster, more responsive dashboards without massive compute costs. Spice supports Arrow Flight SQL (JDBC/ODBC/ADBC) for connectivity with Tableau, Looker, PowerBI, and more. Watch the Apache Superset with Spice demo.

Faster data pipelines, machine learning training and inference. Co-locate datasets with pipelines where the data is needed to minimize data movement and improve query performance. Predict hard drive failure with the SMART data demo.

Data lake acceleration. Materialize and accelerate data from S3, Delta Lake, or Apache Iceberg for sub-second queries without moving data into a centralized warehouse.

Easily query many data sources. Federated SQL query across databases, data warehouses, and data lakes using Data Connectors.

### Community Built

Spice is open-source, Apache 2.0 licensed, and is built using industry-leading technologies including Apache DataFusion, Arrow, and Arrow Flight SQL. We're launching with several built-in Data Connectors and Accelerators, and Spice is extensible so more will be added in each release. If you're interested in contributing, we'd love to welcome you to the community!

### Getting Started

You can download and run Spice in less than 30 seconds by following the quickstart at spiceai.org/docs/getting-started.

### Conclusion

Spice, rebuilt in Rust, introduces a unified SQL query interface, making it simpler and faster to build data-driven applications. The lightweight Spice runtime is easy to deploy and makes it possible to materialize and query data from any source quickly and cost-effectively. Applications can serve more users, dashboards and analytics can be snappier, and data and ML pipelines finish faster, without the heavy lifting of managing data.

For developers, this translates to less time wrangling data and more time creating innovative applications and business value.

Check out and star the project on GitHub!

Thank you,

Phillip

## Frequently Asked Questions

### What is Spice.ai OSS?

Spice.ai OSS is an open-source, portable runtime written in Rust that provides developers with a unified SQL query interface to locally materialize, accelerate, and query datasets sourced from any database, data warehouse, or data lake. It is designed for [data-intensive applications](/use-case/datalake-accelerator) that require fast, reliable data access.

### How is Spice.ai OSS different from a traditional database or data warehouse?

Rather than replacing your existing databases, Spice sits alongside your application and [federates queries](/platform/sql-federation-acceleration) across multiple data sources. It materializes working datasets locally for sub-second performance while keeping your source of truth intact.

### What programming languages and protocols does Spice support?

Spice exposes data over industry-standard protocols including HTTP, Apache Arrow Flight, and Arrow Flight SQL. This means any language or tool that speaks SQL, Arrow Flight, or ODBC/JDBC can query Spice without custom integration.

---

## AI needs AI-ready data

URL: https://spice.ai/blog/ai-needs-ai-ready-data
Date: 2021-12-05T04:30:17
Description: An introduction to AI-ready data and how Spice.ai handles normalization, encoding, and real-time data preparation for ML applications.

A significant challenge when developing an app powered by AI is providing the machine learning (ML) engine with data in a format that it can use to learn. To do that, you need to normalize the numerical data, one-hot encode categorical data, and decide what to do with incomplete data - among other things.

This data handling is often challenging! For example, to learn from Bitcoin price data, the prices are best normalized to a range between -1 and 1. Values too close to 0 are also a problem, because floating-point representations lose precision there (usually below 1e-5).

As a developer, if you are new to AI and machine learning, a great talk that explains the basics is Machine Learning Zero to Hero. Spice.ai makes the process of getting the data into an AI-ready format easy by doing it for you!

## What is AI-ready data?

You write code with if statements and functions, but your machine only understands 1s and 0s. When you write code, you leverage tools, like a compiler, to translate that human-readable code into a machine-readable format.

Similarly, data for AI needs to be translated or "compiled" to be understood by the ML engine. You may have heard of tensors before; a tensor is simply another word for a multi-dimensional array, and tensors are the language of ML engines. All inputs to and all outputs from the engine are tensors. You could use the following techniques when converting (or "compiling") source data to a tensor.

• Normalization/standardization of the numerical input data. Many of the inputs and outputs in machine learning are interpreted as probability distributions. Much of the math that powers machine learning, such as softmax, tanh, sigmoid, etc., is meant to work in the [-1, 1] range.

Figure 1. Normalizing Bitcoin price data.
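As an illustration of the normalization idea above (not how Spice.ai implements it internally), min-max scaling into [-1, 1] can be written as plain SQL over a hypothetical `prices` table:

```sql
-- Rescale price into [-1, 1] with min-max normalization
-- (`prices` and its columns are hypothetical)
SELECT
  time,
  2.0 * (price - MIN(price) OVER ()) / (MAX(price) OVER () - MIN(price) OVER ()) - 1.0 AS price_scaled
FROM prices;
```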

• Conversion of categorical data into numerical data. For categorical data (i.e., colors such as "red," "blue," or "green"), you can achieve this through a technique called "one-hot encoding." In one-hot encoding, each possible value in the category appears as a column. The values in the column are assigned a binary value of 1 or 0 depending on whether the value exists or not.

Figure 2. A visualization of one-hot encoding.
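One-hot encoding can be sketched the same way in SQL (the `items` table and its columns are hypothetical): each category value becomes its own 0/1 column.

```sql
-- One-hot encode the categorical `color` column into binary columns
SELECT
  item_id,
  CASE WHEN color = 'red'   THEN 1 ELSE 0 END AS color_red,
  CASE WHEN color = 'blue'  THEN 1 ELSE 0 END AS color_blue,
  CASE WHEN color = 'green' THEN 1 ELSE 0 END AS color_green
FROM items;
```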

• Several advanced techniques exist for "compiling" this source data - this process is known in the AI world as "feature engineering." This article goes into more detail on feature engineering techniques if you are interested in learning more.

There are excellent tools like Pandas, NumPy, SciPy, and others that make the process of data transformation easier. However, most of these tools are Python libraries and frameworks - which means having to learn Python if you don't know it already. Plus, when building intelligent apps (instead of just doing pure data analysis), this all needs to work on real-time data in production.

## Building intelligent apps

The tools mentioned above are not designed for building real-time apps. They are often designed for analytics/data science.

In your app, you will need to do this data compilation in real time - and you can't rely on a local script to help process your data. It becomes trickier if the team responsible for the initial training of the machine learning model is not the team responsible for deploying it into production.

    " } /> How data is loaded and processed in a static dataset is likely very different from how the data is loaded and processed in real-time as your app is live. The result often is two separate codebases that are maintained by different teams that are both responsible for doing the same thing! Ensuring that those codebases stay consistent and evolve together is another challenge to tackle.

## Spice.ai helps developers build apps with real-time ML

Spice.ai handles the "compilation" of data for you.

You specify the data that your ML should learn from in a Spicepod. The Spice.ai runtime handles the logistics of gathering the data and compiling it into an AI-ready format.

It does this using many of the techniques described earlier, such as normalization and one-hot encoding. And because we're continuing to evolve Spice.ai, our data compilation will only get better over time.

    " } /> In addition, the design of the Spice.ai runtime naturally ensures that the data used for both the training and real-time cases are consistent. Spice.ai uses the same data-components and runtime logic to produce the data. And not only that, you can take this a step further and share your Spicepod with someone else, and they would be able to use the same AI-ready data for their applications.

## Summary

Spice.ai handles the process of compiling your data into an AI-ready format in a way that is consistent across both the training and real-time stages of the ML engine. A Spicepod defines which data to get and where to get it. Sharing the Spicepod allows someone else to use the same AI-ready data format in their application.

## Learn more and contribute

Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," join our community Slack, or reach out on Twitter.

We are just getting started! 🚀

Phillip

--- ## Spice.ai Now Supports Amazon S3 Vectors For Vector Search at Petabyte Scale! URL: https://spice.ai/blog/amazon-s3-vectors Date: 2025-07-16T18:59:00 Description: Spice AI has partnered with AWS to integrate Amazon S3 Vectors into the Spice.ai Open Source data and AI compute engine. Today, we're announcing native support for Amazon S3 Vectors in the Spice.ai Open Source data and AI compute engine.

As an AWS Startup Partner and AWS Marketplace Seller, Spice AI partners with AWS across technology integration, joint go-to-market, and co-selling to deliver solutions for enterprise customers that address real-world data challenges, accelerating the delivery of AI-native applications on AWS.

The Spice.ai S3 Vectors integration arrives alongside AWS's announcement of the public preview of Amazon S3 Vectors, a new S3 bucket type designed for vector embeddings, complete with a query endpoint and metadata service. Developers can now configure Spice.ai to use S3 Vectors as a vector database backend, for simple, efficient storage, indexing, and querying of embeddings directly from S3.

Figure 1. Spice.ai S3 Vectors integration.

## What is Vector Similarity Search?

Vector similarity search retrieves data by comparing similarities in multi-dimensional representations, instead of relying on exact keyword or value matches. This method powers semantic search, recommendation systems, and retrieval-augmented generation (RAG) in AI applications.

The process works as follows:

• Convert data to vectors: Turn items like text, images, or audio into vectors - arrays of numbers that capture the data's core meaning or features. Machine learning models handle this conversion, known as embedding. Examples include Amazon Titan Embeddings or Cohere Embeddings via AWS Bedrock, or MiniLM L6 available on HuggingFace.
• Store the vectors: Store the embeddings in a specialized vector database or index designed for fast similarity queries.
• Query with a vector: Convert the user's query (e.g., a phrase or image) into a vector. The system then identifies the closest matches using distance measures such as cosine similarity, Euclidean distance, or dot product.

This approach provides precise, context-aware data retrieval from vast unstructured datasets. It supports AI applications that prioritize understanding over simple matching.

With the S3 Vectors integration, this process and the lifecycle of vectors are completely managed by the Spice.ai runtime, which also provides an intuitive SQL interface for querying.

## Amazon S3 Vectors

Amazon S3 Vectors, launched in public preview on July 15, 2025, provides the first cloud object store with native vector storage and querying, extending AWS object storage for semantic search and retrieval. It features vector indexes within buckets for embedding organization, PUT APIs for uploads, and query APIs for similarity searches using metrics like cosine distance.

Reducing costs for uploading, storing, and querying vectors by up to 90% versus alternatives, it supports AI agents, inference, and semantic search on S3 content with sub-second query performance at petabyte scale. It upholds S3's elasticity, durability, and compute-storage separation - vectors stay in durable storage, queries run on transient resources, bypassing monolithic databases and idle-period costs. Suited for tasks like matching scenes in video archives, clustering business documents, or pattern detection in medical images, it uses a new bucket type with dedicated APIs, no provisioning required, and scales to 10,000 indexes per bucket.

    " } /> Spice.ai\'s Integration with Amazon S3 Vectors' } /> Spice.ai's integration with Amazon S3 Vectors simplifies and accelerates application development for developers.

    " } /> With native support for S3 Vectors, Spice developers can configure datasets via YAML to use S3 Vectors as the vector storage engine, annotating columns with hosted embedding models including Amazon Titan Embeddings or Cohere Embeddings via AWS Bedrock, or self-hosted models like MiniLM L6 from Hugging Face.

Figure 2. Simple YAML configuration of S3 Vectors in Spice.ai.
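The configuration in Figure 2 is an image; as a rough sketch of the shape it describes (the dataset, bucket, column names, and the vector-engine keys here are illustrative, not the exact schema - see the S3 Vectors documentation linked below):

```yaml
# Illustrative spicepod: embed a text column and store vectors in S3 Vectors
embeddings:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: minilm

datasets:
  - from: s3://my-bucket/support_tickets/
    name: support_tickets
    columns:
      - name: body
        embeddings:
          - from: minilm   # annotate the column with the model defined above
    # vector storage engine keys are illustrative; consult the docs
    vectors:
      engine: s3_vectors
```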

The Spice.ai runtime manages the full vector lifecycle: it ingests source data from disparate enterprise sources like files, databases, and data lakes, embeds it using specified models, and pushes it into S3 Vector buckets. Applications query via SQL (e.g., `SELECT * FROM vector_search(table, 'search query') WHERE condition ORDER BY score`) with push-down optimization for efficiency, or via HTTP APIs, while the runtime handles indexing and provides an intuitive SQL interface.
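For example, against the hypothetical `support_tickets` dataset sketched above, a semantic search reads like ordinary SQL:

```sql
-- vector_search embeds the query string and returns the closest matches
-- with a similarity score (dataset and columns are hypothetical)
SELECT id, title, score
FROM vector_search(support_tickets, 'customers reporting login failures')
WHERE status = 'open'
ORDER BY score
LIMIT 10;
```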

    " } />
    Figure 3. Using the vector_search SQL function in Spice Cloud for semantic search.
    ' } /> The Spice S3 Vectors integration simplifies and accelerates AI application development by leveraging S3's vector capabilities with minimal application code, without operational overhead, and it can be used together with existing Spice.ai Keyword and Full-Text (BM25) search capabilities.

    " } /> Demo of Amazon S3 Vectors in Spice.ai Open Source' } /> Availability'} /> S3 Vectors support is available today in the v1.5.0 release of Spice.ai Open Source and Spice Cloud!

To learn more about S3 Vectors in Spice, visit spiceai.org/docs/components/vectors/s3_vectors.

## About Spice AI

Spice AI helps enterprises build fast, accurate, and scalable AI applications and agents with its portable, open-source data and AI compute engine. It connects data from disparate sources, simplifies application development, and supports workloads across cloud, edge, and on-premises systems. Based in Seattle, Spice AI focuses on making AI application development simple and easy. Learn more about the Spice AI and AWS partnership.

Get started with Spice.ai Open Source in just 30 seconds at: https://spiceai.org/docs/getting-started

## Frequently Asked Questions

### What is Amazon S3 Vectors?

Amazon S3 Vectors is a new S3 bucket type that provides native vector storage and querying at petabyte scale. It reduces costs for storing and querying vectors by up to 90 percent versus alternatives, and supports similarity searches using metrics like cosine distance without provisioning infrastructure.

### How does Spice.ai integrate with S3 Vectors?

Spice manages the full vector lifecycle: ingesting source data from databases, files, and data lakes, embedding it using specified models (such as Amazon Titan or Cohere via Bedrock), and pushing it into S3 Vector buckets. Applications query with standard SQL using the `vector_search` function, and the runtime handles indexing automatically.

### Can I combine vector search with keyword search in Spice?

Yes. Spice supports [hybrid search](/platform/hybrid-sql-search) that combines vector similarity search with keyword and full-text BM25 search. This lets you blend semantic meaning with exact-match precision for more relevant [RAG](/use-case/retrieval-augmented-generation) results.

--- ## Announcing Spice.ai Open Source 1.0-stable: A Portable Compute Engine for Data-Grounded AI - Now Ready for Production URL: https://spice.ai/blog/announcing-spice-ai-open-source-1-0-stable Date: 2025-01-22T19:24:40 Description: Learn how Spice.ai OSS grounds AI in real data with federated query, fast retrieval, and portable deployment anywhere. 🎉 Today marks the 1.0-stable release of Spice.ai Open Source - purpose-built to help enterprises ground AI in data. By unifying federated data query, retrieval, and AI inference into a single engine, Spice mitigates AI hallucinations, accelerates data access for mission-critical workloads, and makes it simple and easy for developers to build fast and accurate data-intensive applications across cloud, edge, or on-prem.

Figure 1: Spice.ai OSS Stable

Enterprise AI systems are only as good as the context they're provided. When data is inaccessible, incomplete, or outdated, even the most advanced models can generate outputs that are inaccurate, misleading, or worse, potentially harmful. In one example, a chatbot was tricked into selling a 2024 Chevy Tahoe for $1 due to a lack of contextual safeguards. For enterprises, errors like these are unacceptable - it's the difference between success and failure.

Retrieval-Augmented Generation (RAG) is part of the answer - but traditional RAG is only as good as the data it has access to. If data is locked away in disparate, often legacy data systems, or cannot be stitched together for accurate retrieval, you get, as Benioff puts it, "Clippy 2.0".

Figure 2: Marc Benioff on the limitations of Copilot

And often, after initial Python-scripted pilots, you're left with a new set of problems: How do you deploy AI that meets enterprise requirements for performance, security, and compliance while being cost-efficient? Directly querying large datasets for retrieval is slow and expensive. Building and maintaining complex ETL pipelines requires expensive data teams that most organizations don't have. And because enterprise data is highly sensitive, you need secure access and auditable observability - something many RAG setups don't even consider.

    " } /> Developers need a platform at the intersection of data and AI-one specifically designed to ground AI in data. A solution that unifies data query, search, retrieval, and model inference-ensuring performance, security, and accuracy so you can build AI that you and your customers can trust.

## Spice.ai OSS: A portable data, AI, and retrieval engine

In March of 2024, we introduced Spice.ai Open Source, a SQL query engine to materialize and accelerate data from any database, data warehouse, or data lake so that data can be accessed wherever it lives across the enterprise - consistently fast. But that was only the start.

Building on this foundation, Spice.ai OSS unifies data, retrieval, and AI to provide current, relevant context that mitigates AI "hallucinations" and significantly reduces incorrect outputs - just one of the many mission-critical use cases Spice.ai addresses.

Spice is a portable, single-node compute engine built in Rust. It embeds the fastest single-node SQL query engine, DataFusion, to serve secure, virtualized data views to data-intensive apps, AI, and agents. Sub-second data query is accelerated locally using Apache Arrow, DuckDB, or SQLite.

Now at version 1.0-stable, Spice is ready for production. It's already deployed in enterprise use at Twilio, Barracuda Networks, and NRC Health, and can be deployed anywhere - cloud-hosted, BYOC, edge, or on-prem.

    " } /> Spice AI compute engine
    Figure 3: The Spice.ai OSS architecture
    ' } /> ‍Data-grounded AI'} /> Data-grounded AI anchors models in accurate, current, and domain-specific data, rather than relying solely on pre-trained knowledge. By unifying enterprise data-across databases, data lakes, and APIs-and applying advanced ingestion and retrieval techniques, these systems dynamically incorporate real-world context at inference time without leaking sensitive information. This approach helps developers minimize hallucinations, reduce operational risk, and build trust in AI by delivering reliable, relevant outputs.

Figure 4: AI responses with and without contextual data

## How does Spice.ai OSS solve data-grounding?

With Spice, models always have access to materializations of low-latency, real-time data for near-instant retrieval, minimizing data movement while enabling AI feedback so apps and agents can learn and adapt over time. For example, you can join customer records from PostgreSQL with sales data in Snowflake and logs stored in S3 - all with a single SQL query or LLM function call.
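As a sketch with hypothetical dataset names (each source registered as a Spice dataset beforehand), that single query might look like:

```sql
-- Join PostgreSQL customers, Snowflake sales, and S3 logs in one query
SELECT c.name,
       SUM(s.amount)       AS total_sales,
       COUNT(l.request_id) AS api_requests
FROM pg_customers AS c
JOIN snow_sales   AS s ON s.customer_id = c.id
JOIN s3_logs      AS l ON l.customer_id = c.id
GROUP BY c.name
ORDER BY total_sales DESC;
```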

Figure 5: A secure compute engine for AI inference

Spice includes an advanced suite of LLM tools, including vector and hybrid search, text-to-SQL, SQL query and retrieval, data sampling, and context formatting - all purpose-built for accurate outputs.

The latest research is continually incorporated so that teams can focus on business objectives rather than trying to keep up with the incredibly fast-moving and often overwhelming AI space.

## Spice.ai OSS: The engine that makes AI work

Spice.ai OSS is a lightweight, portable runtime (a single ~140 MB binary) with the capabilities of a high-speed cloud data warehouse built into a self-hostable AI inference engine - all in a single, run-anywhere package.

It's designed to be distributed and integrated at the application level, rather than being a bulky, centralized system to manage, and is often deployed as a sidecar. Whether running one Spice instance per service or one for each customer, Spice is flexible enough to fit your application architecture.

    " } /> Apps and agents integrate with Spice.ai OSS via three industry-standard APIs, so that it can be adopted incrementally with minimal changes to applications.

  • SQL Query APIs: HTTP, Arrow Flight, Arrow Flight SQL, ODBC, JDBC, and ADBC.
  • OpenAI-Compatible APIs: HTTP APIs compatible with the OpenAI SDK, AI SDK with local model serving (CUDA/Metal accelerated), and gateway to hosted models.
  • Iceberg Catalog REST APIs: A unified Iceberg Catalog REST API.
Figure 6: The building blocks of the Spice.ai stack

Key features of Spice.ai OSS include:

• Federated SQL Query Across Data Sources: Perform SQL queries across disparate data sources with over 25 open-source data connectors, including catalogs (Unity Catalog, Iceberg Catalog, etc.), databases (PostgreSQL, MySQL, etc.), data warehouses (Snowflake, Databricks, etc.), and data lakes (e.g., S3, ABFS, MinIO).
• Data Materialization and Acceleration: Locally materialize and accelerate data using Arrow, DuckDB, SQLite, and PostgreSQL, enabling low-latency, high-speed transactional and analytical queries. Data can be ingested via Change Data Capture (CDC) using Debezium, via catalog integrations, on an interval, or by trigger.
• AI Inference, Gateway, and LLM toolset: Load and serve models like Llama3 locally, or use Spice as a gateway to hosted AI platforms including OpenAI, Anthropic, xAI, and NVIDIA NIM. Automatically use a purpose-built LLM toolset for data-grounded AI.
• Enterprise Search and Retrieval: Advanced search capabilities for LLM applications, including vector-based similarity search and hybrid search across structured and unstructured data. Real-time retrieval grounds AI applications in dynamic, contextually relevant information, enabling state-of-the-art RAG.
• LLM Memory: Enable long-term memory for LLMs by efficiently storing, retrieving, and updating context across interactions. Support real-time contextual continuity and grounding for applications that require persistent and evolving understanding.
• LLM Evaluations: Test and boost model reliability and accuracy with integrated LLM-powered evaluation tools to assess and refine AI outputs against business objectives and user expectations.
• Monitoring and Observability: Ensure operational excellence with telemetry, distributed tracing, query/task history, and metrics that provide end-to-end visibility into data flows and model performance in production.
• Deploy Anywhere; Edge-to-Cloud Flexibility: Deploy Spice as a standalone instance, Kubernetes sidecar, microservice, or scalable cluster, with the flexibility to run distributed across edge, on-premises, or any cloud environment. Spice AI offers managed, cloud-hosted deployments of Spice.ai OSS through the Spice Cloud Platform (SCP).

## Real-world use-cases

Spice delivers data readiness for teams like Twilio and Barracuda, and accelerates time-to-market of data-grounded AI, such as with developers on GitHub and at NRC Health.

Here are some examples of how Spice.ai OSS solves real problems for these teams.

### CDN for Databases - Twilio

Figure 7: Twilio use case diagram

A core requirement for many applications is consistently fast data access, with or without AI. Twilio uses Spice.ai OSS as a data acceleration framework, or Database CDN, staging data in object storage that's accelerated with Spice for sub-second query to improve the reliability of critical services in its messaging pipelines. Before Spice, a database outage could result in a service outage.

With Spice, Twilio has achieved:

    '} />
  • Significantly Improved Query Performance: Used Spice to co-locate control-plane data in the messaging runtime, accelerated with DuckDB, to send messages with a P99 query time of < 5ms.
  • Low-Latency Multi-Tenancy Controls: Spice is integrated into the message-sending runtime to manage multi-tenancy data controls. Before, data changes required manual triggers and took hours to propagate. Now, they update automatically and reach the messaging front door within five minutes via a resilient data-availability framework.
  • Mission-Critical Reliability: Reduced reliance on queries to databases by using Spice to accelerate data in-memory locally, with automatic failover to query data directly from S3, ensuring uninterrupted service even during database downtime.
### Datalake Accelerator - Barracuda

Figure 8: Barracuda use case diagram

Barracuda uses Spice.ai OSS to modernize data access for its email archiving and audit log systems, solving two big problems: slow query performance and costly queries. Before Spice, customers experienced frustrating delays of up to two minutes when searching email archives, due to the data volume being queried.

With Spice, Barracuda has achieved:

  • Significant Cost Reduction: Replaced expensive Databricks Spark queries, significantly cutting expenses while improving performance.
  • 100x Query Performance Improvement: Accelerated email archive queries from a P99 time of 2 minutes to 100-200 milliseconds.
  • Efficient Audit Logs: Offloaded audit logs to Parquet files in S3, queried directly by Spice.
  • Mission-Critical Reliability: Reduced load on Cassandra, improving overall infrastructure stability.
### Data-Grounded AI apps and agents - NRC Health

Figure 9: NRC Health use case diagram

NRC Health uses Spice.ai OSS to simplify and accelerate the development of data-grounded AI features, unifying data from multiple platforms, including MySQL, SharePoint, and Salesforce, into secure, AI-ready data. Before Spice, scaling AI expertise across the organization to build complex RAG-based scenarios was a challenge.

With Spice OSS, NRC Health has achieved:

  • Developer Productivity: Partnered with Spice in three company-wide AI hackathons to build complete end-to-end data-grounded AI features in hours instead of weeks or months.
  • Accelerated Time-to-Market: Centralized data integration and AI model serving into an enterprise-ready service, accelerating time to market.
### Data-Grounded AI Software Development - Spice.ai GitHub Copilot Extension

When using tools like GitHub Copilot, developers often face the hassle of switching between multiple environments to get the data they need.

The Spice.ai for GitHub Copilot Extension, built on Spice.ai OSS, gives developers the ability to connect data from external sources to Copilot, grounding Copilot in relevant data not generally available in GitHub, like test data stored in a development database.

Developers can simply type @spiceai to interact with connected data, with relevant answers surfaced directly in Copilot Chat, significantly improving productivity.

## Why choose Spice.ai OSS?

Adopting Spice.ai OSS addresses real challenges in modern AI development: it grounds models in accurate, domain-specific, real-time data. With Spice, engineering teams can focus on what matters - delivering innovative, accurate, AI-powered applications and agents that work. Additionally, Spice.ai OSS is open-source under Apache 2.0, ensuring transparency and extensibility so your organization remains free to innovate without vendor lock-in.

## Get started in 30 seconds

You can install Spice.ai OSS in less than a minute on macOS, Linux, and Windows.

macOS, Linux, and WSL:

```bash
curl https://install.spiceai.org | /bin/bash
```

Or using brew:

```bash
brew install spiceai/spiceai/spice
```

Windows:

```shell
curl -L "https://install.spiceai.org/Install.ps1" -o Install.ps1 && PowerShell -ExecutionPolicy Bypass -File ./Install.ps1
```

Once installed, follow the Getting Started with Spice.ai guide to ground OpenAI chat with data from S3 in less than 2 minutes.

## Looking ahead

The 1.0-stable release of Spice.ai OSS marks a major step toward accurate AI for developers. By combining data, AI, and retrieval into a unified runtime, Spice anchors AI in relevant, real-time data - helping you build apps and agents that work.

A cloud-hosted, fully managed Spice.ai OSS service is available in the Spice Cloud Platform. It's SOC 2 Type II compliant and makes it easy to operate Spice deployments.

    " } /> Beyond apps and agents, the vision for Spice is to be the best digital labor platform for building autonomous AI employees and teams. These are exciting times! Stay tuned for some upcoming announcements later in 2025!

The Spice AI Team

## Learn more
  • Cookbook: 47+ samples and examples using Spice.ai OSS
  • Documentation: Learn about features, use cases, and advanced configurations
  • X: Follow @spice_ai on X for news and updates
  • Slack: Connect with the team and the community
  • GitHub: Star the repo, contribute, and raise issues
## Frequently Asked Questions

### What is Spice.ai OSS 1.0-stable?

Spice.ai OSS 1.0-stable is the production-ready release of Spice, an open-source, portable compute engine built in Rust. It unifies [SQL federation](/platform/sql-federation-acceleration), [hybrid search](/platform/hybrid-sql-search), and [AI inference](/platform/llm-inference) into a single lightweight runtime (~140 MB binary) that can be deployed anywhere from edge to cloud.

### How does Spice ground AI in data?

Spice mitigates AI hallucinations by providing models with access to materializations of low-latency, real-time data from across the enterprise. Rather than relying solely on pre-trained knowledge, Spice enables [retrieval-augmented generation](/use-case/retrieval-augmented-generation) by federating queries across databases, data warehouses, and data lakes so models receive accurate, current context at inference time.

### How is Spice different from a data warehouse like Snowflake or Databricks?

Unlike centralized data warehouses that require data movement and introduce cold-start latency, Spice queries data in place with sub-second performance. It is designed for application serving rather than analytics, often deployed as a sidecar alongside production services. See [pricing](/pricing) for deployment options.

### What real-world results have enterprises achieved with Spice?

Twilio achieved P99 query times under 5 ms using Spice as a Database CDN. Barracuda improved email archive queries by 100x (from 15 seconds to 100-200 ms) while cutting costs by 50 percent with the [data lake accelerator](/use-case/datalake-accelerator) pattern. NRC Health accelerated development of data-grounded AI features by centralizing data from MySQL, SharePoint, and Salesforce into a single query layer.

--- ## Spice Cloud v1.7.0: DataFusion v49, Full-Text Search Updates & More URL: https://spice.ai/blog/announcing-spice-cloud-v1-7-0 Date: 2025-09-24T17:41:00 Description: Spice Cloud v1.7.0 includes DataFusion v49, EmbeddingGemma support, and real-time indexing for full-text search Spice Cloud & Spice.ai Enterprise 1.7.0 are now live, bringing performance upgrades with DataFusion v49, real-time full-text search indexing, EmbeddingGemma support, and improvements across search, embeddings, and API integrations. Spice Cloud customers will automatically upgrade to v1.7.0 on deployment, while Spice.ai Enterprise customers can consume the Enterprise v1.7.0 image from the Spice AWS Marketplace listing.

## What's New in v1.7.0

### DataFusion v49 Upgrade

Spice now runs on DataFusion v49, delivering lower latency and improved query optimization.

Source: DataFusion v49 Release Blog

DataFusion v49 highlights include:

  • Dynamic filters and pushdown to skip unnecessary reads in ORDER BY & LIMIT queries
  • Compressed spill files to reduce disk usage during large sorts and aggregations
  • Support for ordered-set aggregates with WITHIN GROUP
  • New REGEXP_INSTR function to identify regex match positions
### EmbeddingGemma Support

Spice now supports EmbeddingGemma, Google's latest embedding model for text and documents. It delivers high-quality embeddings for semantic search, retrieval, and recommendation tasks. Configure it directly in your Spicepod via HuggingFace.
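A minimal spicepod sketch (the HuggingFace model path follows Google's published listing and is an assumption; adjust for your variant):

```yaml
# Serve EmbeddingGemma from HuggingFace as a named embedding model
embeddings:
  - from: huggingface:huggingface.co/google/embeddinggemma-300m
    name: gemma_embed
```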

### Embedding Request Caching

Repeated embedding requests can now be cached in the Spice runtime. This reduces both latency and costs, with configurable cache size and TTL options. Check out the caching documentation for more details.
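A hypothetical sketch of enabling the cache (key names are illustrative, not the exact schema - consult the caching documentation):

```yaml
runtime:
  caching:
    embeddings:
      enabled: true     # illustrative keys; see the Spice caching docs
      max_size: 128mb   # bound the cache footprint
      item_ttl: 300s    # evict cached embeddings after 5 minutes
```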

### Real-Time Indexing for Full-Text Search

Full-text indexing now supports real-time changes from CDC streams such as Debezium. New events are searchable as they arrive, ensuring continuously fresh results.

### OpenAI Responses API Tool Calls with Streaming

The OpenAI Responses API in Spice now supports tool calls with streaming. Results from tools like web_search and code_interpreter are streamed as they're generated, enabling more responsive agent and application experiences.
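A sketch of a streaming tool call against Spice's OpenAI-compatible HTTP endpoint (the localhost port and model name are assumptions; adjust to your deployment):

```bash
# Stream a Responses API tool call through Spice (port/model assumed)
curl -N http://localhost:8090/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What changed in the latest error logs?",
    "tools": [{ "type": "web_search" }],
    "stream": true
  }'
```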

### Bug & Stability Fixes

v1.7.0 includes numerous fixes and improvements:

  • CDC streams readiness and full-text indexing reliability
  • Vector search pipeline and vector_search UDTF fixes
  • Kafka schema inference, consumer group persistence, and cooperative mode
  • Error reporting improvements (e.g., ThrottlingException handling)
  • Iceberg connector support for LIMIT pushdown
  • S3 Vector ingestion reliability and tracing fixes
### v1.7 Release Community Call

We'll walk through highlights of v1.7 live on our Release Community Call. Join us to see the new functionality in action and bring your questions! Register here.

Thursday, October 2nd - Release Community Call

--- ## Apache Iceberg at Spice AI: How we Query, Accelerate, and Write to Open Table Formats URL: https://spice.ai/blog/apache-iceberg-at-spice-ai Date: 2026-02-25T00:00:00 Description: A technical deep-dive into how Spice AI integrates Apache Iceberg for federated queries, sub-second acceleration, and ACID-compliant writes to open table formats.

**TL;DR:** Spice integrates Apache Iceberg as a first-class data source -- connect to any Iceberg catalog, query tables with full SQL semantics, selectively accelerate hot datasets for sub-millisecond reads, and write back with ACID guarantees. This post covers catalog integration, [query acceleration](/platform/sql-federation-acceleration), write support, DataFusion internals, and production lessons from running Iceberg at scale.

---

[Apache Iceberg](https://iceberg.apache.org/) has become the default open table format for production data lakes, bringing ACID transactions, schema evolution, and time travel to data stored in object storage. But Iceberg alone doesn't solve the performance problem: querying data in object storage like S3 carries inherent latency that's too high for many application workloads.

At [Spice AI](/platform/sql-federation-acceleration), we integrate Iceberg as a first-class data source - customers can connect to Iceberg catalogs and query tables with full SQL semantics, then selectively accelerate hot datasets for sub-millisecond reads. This post explains how that integration works, from connecting your first catalog to the query engine internals.

**What this post covers:**

- What Apache Iceberg is and why it matters for data lakes
- The latency gap that Iceberg leaves open for application workloads
- How Spice connects to Iceberg catalogs, accelerates tables, and writes data back
- How the query engine works under the hood (Apache DataFusion, catalog providers, query optimization)
- Production lessons from running Iceberg at scale

This article is the second part of our Engineering at Spice AI series, where we share technical deep-dives into the [open-source technologies](https://spiceai.org/) and practices that power the Spice.ai compute engine.

1. [Apache DataFusion at Spice AI](/blog/how-we-use-apache-datafusion-at-spice-ai): The query engine
2. **Apache Iceberg at Spice AI: Open table format and SQL-based ingestion**
3. Rust at Spice AI: The systems programming foundation
4. Apache Arrow at Spice AI: The core data format
5. DuckDB at Spice AI: Embedded analytics acceleration
6. Vortex at Spice AI: Columnar compression for [Cayenne](/blog/introducing-spice-cayenne-data-accelerator), Spice's premier data accelerator

Want to skip ahead? The [Iceberg catalog connector docs](https://spiceai.org/docs/components/catalogs/iceberg) walk through connecting a local REST catalog in under 5 minutes. The rest of this post explains how the integration works under the hood.

## What is Apache Iceberg?

[Apache Iceberg](https://iceberg.apache.org/) is an open table format - a specification for how to organize metadata and data files so that data stored in object storage (S3, GCS, HDFS) can support database-level properties like ACID transactions, schema evolution, and time travel. It's not a storage system or a query engine, but a layer that sits between the two.
Unlike the older Hive approach, which treats directories as tables and relies on naming conventions for partitioning, Iceberg maintains explicit metadata files that track:

- **Current and historical table schemas** - so readers can handle schema changes without rewriting data
- **Which data files belong to the table** - including which rows have been deleted
- **Partition specifications and their evolution** - old and new partitioning schemes coexist
- **Immutable snapshots** - for time travel queries and concurrent-write safety
- **File-level and column-level statistics** - for pruning data files before reading them

These capabilities are organized into three layers:

- **The metadata layer:** JSON and Avro files tracking table schema, partitions, and data files
- **The data layer:** Parquet files containing the actual data in object storage
- **A catalog API:** The standard interface for table discovery and atomic updates

```mermaid
graph TD
  A["Iceberg Catalog\n(REST, Glue, Hive, Hadoop)"] --> B["Table Metadata\n(Schema, Partitions, Snapshots)"]
  B --> C["Data Files\n(Parquet in object storage, e.g. S3)"]
```

This metadata-driven design is what gives Iceberg its core strengths. We'll return to specific features - ACID transactions, hidden partitioning, schema evolution, time travel, and file-level pruning - later in this post, in the context of how Spice uses them.

## The latency gap: Why Iceberg alone isn't enough for applications

Iceberg solves the thorny data lake consistency and reliability challenges, but there's still an inherent performance gap for latency-sensitive workloads. Object storage like S3 carries 50-200ms per-request latency by design, and resolving an Iceberg query requires multiple metadata round-trips (manifest list, manifests, then data files) before touching a single row. Parquet itself is optimized for bulk analytical scans, not the high-concurrency, low-cardinality reads that application and increasingly AI agent workloads demand.

Consider a concrete example: a 10TB Iceberg table of user events, partitioned by date, stored in S3. Your data team runs ad-hoc queries over years of data - Iceberg's partition pruning makes this efficient. But your application serves a dashboard that queries the last 7 days of data, thousands of times per second.

The common workaround is to copy hot data into a faster system (ClickHouse, Redis, etc.) and introduce an ETL pipeline to keep it in sync with the lake. This works, but it introduces drift between systems, duplicates storage costs, and adds another operational surface to maintain. This is the gap Spice is designed to fill.

## How Spice.ai bridges the gap

[Spice.ai](/platform/sql-federation-acceleration) is a data compute engine that lets you query data across sources using standard SQL. It connects to databases, data lakes, and warehouses - including Iceberg catalogs - and presents them through a single query interface.

The key capability for Iceberg workloads is **acceleration**: Spice can materialize a hot working set into a faster local engine (DuckDB, SQLite, or [Spice Cayenne](/blog/introducing-spice-cayenne-data-accelerator)), then keep it automatically synchronized with the source. Iceberg stays the authoritative source of truth; Spice handles the caching and refresh logic so you don't have to build a separate pipeline.

Spice is built on [Apache DataFusion](https://datafusion.apache.org/) for query planning and execution, and on [Apache Arrow](https://arrow.apache.org/) as the in-memory columnar format.
We'll cover the DataFusion internals later in this post; for now, the important thing is that Spice treats Iceberg as a first-class catalog - connect once, query across data sources, then choose which datasets to accelerate.

Spice supports several [deployment topologies](/feature/edge-to-cloud-deployments). The [sidecar pattern](https://spiceai.org/docs/deployment/architectures/sidecar) works well for workloads that need low latency and high concurrency. In Kubernetes, Spice runs as a second container in the same pod as your application. Communication goes over localhost instead of the network, which removes round-trip overhead. Each pod has its own copy of the accelerated data, so reads never leave the machine and each instance fails independently. The trade-off is resource duplication: every pod stores its own copy, which costs extra memory and compute. This works best when scale is moderate and sub-millisecond reads justify the extra resources - for example, a dashboard serving thousands of concurrent users.

```mermaid
graph TD
  A["Application Layer\nSQL queries via Spice"] --> B["Spice Query Engine\n(Apache DataFusion + acceleration layer)"]
  B -->|"Cold queries\n(full history, ad-hoc)"| C["Iceberg on S3\n(full dataset)\n10TB historical\nPartition pruning\nQuery in 500ms-5s"]
  B -->|"Hot queries\n(recent data, dashboards)"| D["DuckDB Acceleration\n(last 7 days, cached)\n50GB on local NVMe\nSub-10ms queries\nAuto-refresh"]
```

## Connecting Iceberg to Spice

Configuration in Spice uses the [spicepod](https://spiceai.org/docs/getting-started/spicepods) - a YAML file that declares which data sources to connect and how to access them. To connect an Iceberg catalog, add a `catalogs` entry pointing to the catalog's REST endpoint (or Glue, or Hadoop):

```yaml
catalogs:
  - from: iceberg:http://localhost:8181/v1/namespaces
    name: ice
    params:
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_access_key_id: admin
      iceberg_s3_secret_access_key: password
```

On startup, Spice connects to the catalog, discovers all namespaces (databases) and tables, and registers them for SQL access:

```bash
spice run
# 2025-01-27T19:08:37Z INFO Registered catalog 'ice' with 1 schema and 8 tables
```

Every table is now queryable - no additional configuration needed:

```sql
sql> SHOW TABLES;
+---------------+--------------+------------+
| table_catalog | table_schema | table_name |
+---------------+--------------+------------+
| ice           | tpch_sf1     | lineitem   |
| ice           | tpch_sf1     | nation     |
| ice           | tpch_sf1     | orders     |
| ice           | tpch_sf1     | customer   |
+---------------+--------------+------------+

sql> SELECT COUNT(*) FROM ice.tpch_sf1.lineitem;
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
Time: 0.186233833 seconds
```

Tables registered this way go directly to the Iceberg source. This is the simplest setup, but it means query latency depends on the source (S3, network, file format, etc.). For low-latency use cases, the next step is acceleration.

## Accelerating Iceberg tables

Tables registered through a catalog are queryable but not accelerated. To accelerate a table, register it as a **dataset** in the spicepod.
This gives you control over the acceleration engine, refresh schedule, and which subset of data to cache:

```yaml
datasets:
  - from: ice.analytics.events
    name: events
    acceleration:
      enabled: true
      engine: cayenne
      refresh_sql: |
        SELECT * FROM events WHERE event_date > NOW() - INTERVAL '7 days'
      refresh_check_interval: 10m
```

The `from:` field references the catalog table (`ice.analytics.events`), so the Iceberg catalog still serves as the source of truth. The dataset definition adds the acceleration layer on top. What happens behind the scenes:

1. On startup, Spice executes the `refresh_sql` against the Iceberg table
2. Results are loaded into a local [acceleration engine](https://spiceai.org/docs/features/data-acceleration) (Spice supports DuckDB, SQLite, Arrow, and Spice Cayenne for queries with low memory and DuckDB-like scale)
3. Queries against `events` hit the acceleration engine (local NVMe storage)
4. Every 10 minutes, Spice re-executes the refresh SQL to pull new data

You don't have to choose between Iceberg and fast queries. Iceberg stays your source of truth. Acceleration adds a transparent cache for predictable, low-latency reads.

## Writing back to Iceberg

Spice supports [INSERT operations](/blog/write-to-apache-iceberg-tables-with-sql) with full ACID guarantees via Iceberg's transaction protocol (check out the [Iceberg cookbook](https://github.com/spiceai/cookbook/blob/trunk/catalogs/iceberg/README.md) for a full example). To enable writes, set `access: read_write` on the catalog:

```yaml
catalogs:
  - from: iceberg:http://localhost:8181/v1/namespaces
    access: read_write # required to enable INSERT operations
    name: ice
    params:
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_access_key_id: admin
      iceberg_s3_secret_access_key: password
      iceberg_s3_region: us-east-1
```

```sql
-- Insert new rows
INSERT INTO ice.tpch_sf1.region (r_regionkey, r_name, r_comment)
VALUES (5, 'ANTARCTICA', 'A cold and remote region');

-- Verify
SELECT * FROM ice.tpch_sf1.region WHERE r_regionkey = 5;
SELECT * FROM ice.tpch_sf1.nation WHERE n_nationkey = 25;
```

How this works under the hood:

1. Spice writes new Parquet files to S3
2. Creates a new manifest file listing the new files
3. Creates a new snapshot referencing the updated manifest
4. Atomically updates the metadata pointer in the catalog

If another writer commits between steps 1-3, Spice's commit fails and retries. This is Iceberg's optimistic concurrency in action - every write creates a new snapshot, and the catalog uses atomic compare-and-swap operations to ensure only one writer wins. The loser retries from the latest snapshot.

## Under the hood: How Spice queries Iceberg

Now that we've covered the user-facing configuration, let's look at how the query engine processes an Iceberg query. This section introduces the internal components - if you're primarily interested in using Spice with Iceberg, the sections above have you covered.

### Apache DataFusion: Spice's query engine

Spice is built on [Apache DataFusion](https://datafusion.apache.org/), an extensible SQL query engine written in Rust. DataFusion handles SQL parsing, query planning, and execution. (We covered DataFusion in depth in the [first post in this series](/blog/how-we-use-apache-datafusion-at-spice-ai).)

DataFusion organizes data sources into a three-level hierarchy: **Catalog** -> **Schema** -> **Table**.
This maps directly to Iceberg's own structure:

- **Catalog** = Iceberg catalog (REST endpoint, Glue database, or Hadoop warehouse)
- **Schema** = Iceberg namespace
- **Table** = Iceberg table

Iceberg namespaces can technically be nested (`catalog.a.b.c.table`), but most catalog implementations use a single level. When you add an Iceberg catalog to the spicepod, Spice creates three types of DataFusion components to handle it:

- **Catalog connectors** to connect to REST, Glue, or Hadoop catalogs
- **CatalogProvider / SchemaProvider** to discover namespaces and list tables
- **TableProvider** to read and write individual Iceberg tables

### How a query flows through the system

When you run `SELECT * FROM ice.db.events`, here's how Spice resolves it:

```mermaid
graph TD
  A["User Query\nSELECT * FROM iceberg.db.table"] --> B["Name Resolution / Table Lookup\niceberg.db.table → IcebergTableProvider"]
  B --> C["IcebergTableProvider\n(iceberg-datafusion crate)\nReads metadata, plans which Parquet\nfiles to read for the scan"]
  C --> D["Parquet Files\n(S3, GCS, HDFS, Local FS)"]
```

DataFusion parses the three-part table name (`ice.db.table`), resolves `ice` to the Iceberg CatalogProvider, `db` to the SchemaProvider for that namespace, and `table` to an `IcebergTableProvider`. The TableProvider then uses Iceberg's metadata to plan the most efficient read from S3.

### Schema discovery: Loading tables from a namespace

Each Iceberg namespace becomes a DataFusion SchemaProvider. On initialization, the provider:

1. Lists all tables in the namespace from the catalog
2. Filters them against glob patterns if an inclusion list is configured
3. Loads each table concurrently

**Performance note:** Loading table metadata is the slow path - each table requires fetching metadata from S3 (metadata.json, manifest lists). For catalogs with hundreds of tables, this can take 10-30 seconds on startup. Use glob filtering to scope the inclusion list to only the tables a given deployment needs. This keeps startup time predictable regardless of catalog size.

### Query optimization through Iceberg metadata

Spice uses the community `iceberg-datafusion` crate to connect Iceberg tables to DataFusion. When a query hits an Iceberg table, the TableProvider uses Iceberg's metadata to minimize the data read from S3:

- **Manifest pruning** - filter expressions skip manifest files that can't contain matching rows
- **Predicate pushdown** - query predicates push down to the Parquet reader, so only matching row groups are scanned
- **Column projection** - only requested columns are read from Parquet files

This means a query like `SELECT customer_id FROM events WHERE event_date = '2025-01-15'` might read 50MB instead of 10TB - partition pruning eliminates irrelevant files, and column projection skips every column except `customer_id`.

## Key Iceberg features Spice leverages

The query optimization described above depends on several foundational Iceberg features. Here's a closer look at each.

### ACID transactions via optimistic concurrency

Every write creates a new metadata file pointing to a new snapshot. The catalog uses atomic compare-and-swap operations to commit changes.
If two writers conflict, one fails and retries:

```mermaid
sequenceDiagram
  participant A as Writer A
  participant Cat as Catalog
  participant B as Writer B
  A->>Cat: Read snapshot S1
  B->>Cat: Read snapshot S1
  A->>Cat: Add files, commit snapshot S2
  Cat-->>A: Commit successful
  B->>Cat: Add files, try to commit snapshot S3
  Cat-->>B: Rejected (S2 already committed)
  B->>Cat: Retry from S2
```

This comes with a tradeoff - too many concurrent changes by multiple writers will result in a lot of wasted work as rejected writes retry. But the benefit is that writers can work independently, with the catalog serving as the coordination point.

### Hidden partitioning

Iceberg maintains a separate partition spec that maps logical columns to physical partition values via transforms. Unlike Hive-style partitioning, users never query partition columns directly - partitioning is always defined as a transform on existing columns. For example, consider a table:

```sql
events (
  event_id BIGINT,
  user_id BIGINT,
  event_time DATE
)
```

With the partition spec:

```sql
PARTITION SPEC day(event_time)
```

Iceberg physically stores a derived partition value such as `event_time_day = 2024-01-15`, but this field is not part of the table schema. When a user writes `WHERE event_time = '2024-01-15'`, Iceberg rewrites the predicate automatically using the partition transform and prunes data files accordingly.

Changing the partitioning strategy only requires updating the spec. Old data remains partitioned under the old spec; new data uses the new one. Iceberg's metadata tracks which data files belong to which spec, and during scan planning it evaluates predicates across all active specs.

### Schema evolution without rewrites

Iceberg tracks schema evolution in metadata; each data file records which schema version it was written with. Readers handle differences automatically:

- New columns in old files -> return NULL
- Renamed columns -> metadata tracks the mapping
- Type promotions (int -> long) -> handled transparently

### Time travel

Every commit creates an immutable snapshot. You can query as of any snapshot ID or timestamp. The tradeoff is that many small changes create many snapshots, increasing query planning time.

### File-level pruning via metadata statistics

Each manifest file stores min/max values for every column in every data file. Before scanning any Parquet, Iceberg can eliminate files that cannot contain matching data:

```sql
WHERE event_date > '2024-01-01' AND user_id = 12345
```

```mermaid
graph LR
  A["Read Manifests"] --> B["Check Statistics"]
  B --> C{"File matches predicate?"}
  C -->|"event_date max < 2024-01-01"| D["Skip file"]
  C -->|"user_id out of range"| D
  C -->|"Predicate matches"| E["Scan file"]
```

Parquet files from modern writers also include these statistics in the Parquet footer - but Iceberg's metadata-level statistics enable pruning without opening the files at all. Write predicates that Iceberg can optimize:

```sql
-- Good: Iceberg can prune partitions
SELECT * FROM events WHERE event_date > '2024-01-01'

-- Less optimal: Function prevents pushdown
SELECT * FROM events WHERE YEAR(event_date) = 2024
```

## Error handling: Making failures actionable

Iceberg operations can fail in many ways - network issues, permission errors, corrupted metadata, or missing files. Generic error messages make these hard to debug. Spice maps Iceberg errors to specific messages that tell you what went wrong and what to do about it.
For example, if you have a typo in a namespace name:

Generic Iceberg error:

```bash
Unexpected => Failed to execute http request, NoSuchNamespaceException
```

Spice error:

```bash
The namespace 'analytics_prod' does not exist in the Iceberg catalog, verify the namespace name and try again.
```

Or if the catalog URL is wrong:

Generic Iceberg error:

```bash
Unexpected => Failed to execute http request, source: error sending request for url (http://localhoster:8181/v1/config)
```

Spice error:

```bash
Failed to connect to the Iceberg catalog or object store at (http://localhoster:8181/v1/config). Verify the Iceberg catalog is accessible and try again.
```

The same approach applies across TLS certificate errors, unsupported features, and invalid table metadata - each error includes context about the cause and a suggested next step.

## Production lessons

After running Iceberg in production, here are some lessons learned:

### 1. Catalog discovery is a startup bottleneck

Listing all namespaces and tables requires many metadata requests. Each table load fetches metadata from S3, and for catalogs with hundreds of tables, startup can take 10-30 seconds. We address this with:

- **Concurrency control** - a semaphore caps concurrent table loads at 10 to avoid overwhelming the catalog or object store
- **Include patterns** - glob filters scope discovery to only the tables a deployment needs
- **Lazy loading (planned)** - fetch table metadata on first query instead of at startup, trading startup time for first-query latency

### 2. S3 request signing can expire on large tables

AWS SigV4 signatures have a fixed validity window. When loading a large Iceberg table, Spice may queue hundreds of S3 requests for metadata files, manifests, and Parquet files. If requests wait too long in the queue, AWS rejects them with `RequestTimeTooSkewed`. This looks like a permissions error, but it's actually a timing issue - the signature expired before the request was sent. Concurrency limiting (point 1) helps here by keeping the request queue from growing too large.

### 3. Environment credentials can leak into explicit configs

The `iceberg-rust` library uses OpenDAL for S3 access. By default, OpenDAL loads credentials from multiple sources - explicit properties, environment variables, and `~/.aws/config`. Even when you provide credentials explicitly in the spicepod, OpenDAL can pick up `AWS_SESSION_TOKEN` from the environment and mix it with your explicit access key. This causes authentication failures that are hard to trace because the credentials you configured are correct - it's the extra session token that's wrong. Spice now sets `s3.disable-config-load=true` to prevent this. If you hit unexpected auth errors with Iceberg, check whether AWS credentials are set in your environment.

### 4. Not every table in a catalog is an Iceberg table

When connecting to AWS Glue, the catalog may contain Hive, Parquet, or CSV tables alongside Iceberg tables. Loading a non-Iceberg table as Iceberg fails. Spice handles this by skipping non-Iceberg tables with a warning instead of failing the entire catalog. If you see "Failed to load table" warnings on startup, this is likely why - it's not an error, it's Spice filtering out tables it can't read as Iceberg.

## Our contributions to iceberg-rust

Spice depends on `iceberg-rust` for catalog access, metadata handling, and Parquet scanning. We maintain a fork for changes that haven't been upstreamed yet, and contribute fixes back to the upstream project as they stabilize.
Merged upstream to `apache/iceberg-rust`:

- [Handle converting Utf8View & BinaryView to Iceberg schema (#831)](https://github.com/apache/iceberg-rust/pull/831) - Added support for Arrow's newer string/binary view types
- [Make schema and partition_spec optional for TableMetadataV1 (#1087)](https://github.com/apache/iceberg-rust/pull/1087) - Compatibility fix for older Iceberg V1 tables
- [Handle pagination via next-page-token in REST Catalog (#1097)](https://github.com/apache/iceberg-rust/pull/1097) - Fixed REST catalog listing for catalogs with many namespaces/tables
- [Fix predicates not matching the Arrow type of columns read from parquet files (#1308)](https://github.com/apache/iceberg-rust/pull/1308) - Fixed predicate pushdown producing incorrect results when Arrow types didn't match
- [Add support for custom credential loader for S3 FileIO (#1528)](https://github.com/apache/iceberg-rust/pull/1528) - Enabled pluggable AWS credential providers for S3 access
- [Improve IcebergCommitExec to correctly populate properties/schema (#1721)](https://github.com/apache/iceberg-rust/pull/1721) - Fixed write path metadata
- [Fix: ensure CoalescePartitionsExec is enabled for IcebergCommitExec (#1723)](https://github.com/apache/iceberg-rust/pull/1723) - Fixed partitioned writes

## Current limitations and future work

**What works today:**

- Read all Iceberg tables (v1 and v2 format)
- INSERT new data with snapshot isolation
- Partition evolution - old and new partitioning schemes coexist transparently

**Current limitations:**

- No UPDATE or DELETE. Row-level mutations are not yet supported.
- No incremental refresh. Acceleration re-fetches the entire dataset on each refresh. Incremental refresh (only fetch new data) is in development.
- No automatic catalog refresh. Schema changes in the Iceberg catalog require restarting Spice. Periodic auto-refresh is planned.

**What's next:**

- DELETE FROM via equality delete files (merge-on-read)
- Incremental acceleration refresh - only pull new partitions since the last refresh
- Metadata-only queries - answer COUNT, MIN, MAX from Iceberg statistics without reading data files
- Lazy table loading - defer metadata fetches until first query to reduce startup time
- Better predicate pushdown to S3 Select for highly selective queries

## Apache Iceberg + Spice AI: Summary

Apache Iceberg provides ACID transactions, schema evolution, and open file formats for data lakes. Spice adds federated SQL access with local acceleration on top. Together, they give you:

- **Iceberg's reliability** - ACID guarantees and schema evolution over open Parquet files
- **Spice's flexibility** - query Iceberg alongside Postgres, Snowflake, and other sources in one SQL interface
- **Sub-millisecond reads** - accelerate frequently queried datasets locally while Iceberg remains the source of truth

Run Iceberg locally in under 5 minutes using the [Spice.ai Iceberg Catalog Connector](https://github.com/spiceai/cookbook/blob/trunk/catalogs/iceberg/README.md).

## Frequently Asked Questions

### What is Apache Iceberg?

Apache Iceberg is an open table format that brings ACID transactions, schema evolution, hidden partitioning, and time travel to data stored in object storage like S3, GCS, or HDFS. It organizes metadata and Parquet data files so that data lakes can support database-level reliability without requiring a specific storage system or query engine.

### How does Spice AI integrate with Apache Iceberg?
Spice connects to Iceberg catalogs (REST, AWS Glue, or Hadoop) and registers every table for SQL access automatically. Once connected, you can query Iceberg tables alongside other data sources like PostgreSQL or Snowflake through a single SQL interface.

### Can Spice AI accelerate Iceberg tables for low-latency queries?

Yes. Spice can materialize frequently queried subsets of Iceberg tables into a local acceleration engine (DuckDB, SQLite, or [Spice Cayenne](/blog/introducing-spice-cayenne-data-accelerator)) with configurable refresh intervals. This delivers sub-millisecond read performance while Iceberg remains the authoritative source of truth.

### Does Spice AI support writes to Iceberg tables?

Yes. Spice supports INSERT operations with full ACID guarantees via Iceberg's optimistic concurrency protocol. New Parquet files are written to object storage, and the catalog metadata pointer is updated atomically. See [Writing to Apache Iceberg Tables with SQL](/blog/write-to-apache-iceberg-tables-with-sql) for a detailed walkthrough.

### What Iceberg catalog types does Spice support?

Spice supports REST catalogs, AWS Glue, and Hadoop catalogs. Connect by adding a `catalogs` entry to your [spicepod](https://spiceai.org/docs/getting-started/spicepods) configuration file with the catalog endpoint and credentials.

---

## Basis Set Ventures Deploys Spice.ai to Power Natural Language Queries and Mitigate Hallucinations

URL: https://spice.ai/blog/basis-set-ventures-deploys-spice-ai
Date: 2025-09-17T17:51:00
Description: Basis Set Ventures uses Spice.ai Enterprise to power natural language searches directly against real-time datasets.

## TL;DR

Basis Set Ventures needed to search continuously refreshed data on 10,000+ people and companies without the burden of managing embeddings or pipelines. Using Spice's data and AI platform, Basis Set investors can now run natural language searches directly against fresh datasets - ultimately delivering accurate, data-grounded insights that help them spot opportunities earlier and act faster.

    Figure 1. Pascal AI Search: "Show me people in our network that used to be at Duolingo."
## Situation

Founded in 2017 by Dr. Lan Xuezhao, Basis Set Ventures is a San Francisco-based venture capital firm that targets investments in early-stage technology companies across the United States. Basis Set describes itself as an "AI-native venture fund" because of its strong internal use of AI to identify promising entrepreneurs and enterprises.

AI-driven technology has made it easier than ever for someone with an innovative idea to start a new company; this is great for entrepreneurs, but it makes it harder for venture capitalists to identify early prospects in which to invest.

Basis Set saw this challenge as a data problem and created Pascal, their AI investment application for internal use. Pascal scours the internet looking for innovation, monitoring what's happening on GitHub, Reddit, LinkedIn, X, and a number of other sources. It then uses its proprietary algorithms to track more opaque variables like community sentiment about code contributions or areas of traction.

## Challenge: Custom Searches Across Continuously Refreshed Data

Monitoring more than 10,000 individuals and companies, and with harvested updates pouring in from social media and other areas of the internet throughout the day, Basis Set needed to keep its data set continuously updated so its team of investors could monitor areas of interest in real time.

"One key challenge we had is that, due to the nature of our business, we need to keep our database extremely fresh," says Rachel Wong, CTO & Partner at Basis Set. "That means pulling in updates to data every single day, both on people and company metrics."

The company first tried building and managing their own vector-based search system, converting words and images into numerical vectors that could then be matched for similarity.

Basis Set also sought to make it as easy as possible for the company's investment partners to query its evolving data set, which meant converting natural language to SQL queries.

    " } /> "With our first version of Pascal, our users were giving us multiple inputs on the kinds of people they were looking for," says Muhammad Ammad, Staff Engineer at Basis Set Ventures. "We would manually go in and search for those people and form a pipeline and give those people to the investors. To do this we had to continuously change our code because they were giving us new information every day, and we had to sync with them with a lot of back and forth, with different dimensions and criteria. We wanted to reduce our communication and make things self-serve to give them the independence to search the database however they needed to."

The specificity of the searches - which can involve individual work histories, social media contributions, GitHub postings, momentum, community sentiment and a variety of other factors - would be a significant operational lift using traditional search tooling.

## Solution: Basis Set Adopts Spice.ai Enterprise, Purpose-Built to Help Enterprises Ground AI in Data

To solve these challenges, Basis Set adopted Spice.ai Enterprise, an open source and cloud-deployable runtime that unifies query federation, acceleration, hybrid search, and LLM inference in one system.

Spice automatically runs multiple forms of queries against the Basis Set data - including schema and semantic interrogation, data sampling, and evaluation - and then converts the natural language searches to precise SQL queries, delivering the most accurate answer to return to the user.

The Basis Set data remains on the company's dedicated cloud-based infrastructure and communicates with the Spice Compute Engine managed in Spice Cloud. With the Spice Platform, Basis Set can continually add to its data stores without managing embeddings manually or requiring other pre-search preparations.

    " } />
Figure 2. Basis Set Pascal AI app and Spice Cloud architecture.
## Benefits

### Enabling Natural Language Queries Without Managing Embeddings

Basis Set needed to use natural language queries to make it easier for its investors to search precisely for the characteristics they sought. But, as noted earlier, they found managing the embedding process of converting human language to numerical representations too time consuming and expensive to keep up with perpetually changing data stores. "Our data is continuously being updated, which means it is always changing," Ammad says. "With embedding, if even one character is changed, that whole embedding is out of date, and we have to make a new embedding, which is also pretty expensive because we have more than 100,000 people in our database."

With just a few lines of configuration code, Basis Set was able to do away with managing embeddings and instead use the Spice Cloud Platform to convert natural language queries to SQL, enabling always-fresh searches across Basis Set's data. Ammad shared, "With Spice AI, we no longer have to keep updating embeddings to enable natural language queries. Spice takes care of all of that, which is awesome."

Eliminating embedding management has saved significant time for the company.

Behind the scenes, Spice goes beyond transforming natural language into SQL queries; it also tests multiple queries to find the one that generates the most precise results.

### Eliminated Hallucinations

AI is famously prone to hallucinations, in which it authoritatively delivers false information. Spice mitigates hallucinations by grounding its AI with the actual data of Basis Set, and using SQL queries, search, and LLM tools as inputs to AI prompts. "We saw a lot of hallucinations when we were using the manual embedding approach," says Wong. "One problem with our embedded data was that it didn't understand the true sentiment. If we asked it to find people who worked at Dropbox in 2017, it could hallucinate and return people who actually worked at Box. False returns like that are counter-productive and lessen confidence in our tool."

Hallucinations were eliminated after deploying Spice.ai. "We like the way Spice AI has approached this problem set," Wong says. "Spice AI grounds AI in our actual data, using SQL queries across all our data, which brings accuracy to probabilistic AI systems, which are very prone to hallucinations."

### Ease of Use & Deployment

"One of the great things about Spice is that it was truly plug and play," Wong says. "We jumped on a call with the Spice team, set up the configuration files with a couple lines of setup code, and were able to integrate it into our system pretty much immediately. Our whole business team was impressed. We told them we were bringing in a new search engine for our Pascal application, and two days later everyone was using it."

The ability of Spice AI to support natural language searches has proven to be popular throughout the company, including with its investment team. "Spice takes a huge lift off our investment team," Wong continues. "Previously, our investors had to configure very granular filters and settings for the algorithms on our platform. Now they can do search using regular English sentiment, which is a lot more natural for our users. They're investors, they're more business minded, so it makes sense for them to be able to just type in some heuristics of groups of people or companies that they want to be tracking, for example pre-seed companies working on MCP, who were founded in 2017 - just whatever combinations they want to search for. It's a lot more natural for their workflow than having to go in and think about every tiny setting."

### Observability

Basis Set values the observability Spice provides into how its AI-powered SQL queries are processing. "Spice AI gives us observability, with which we can actually see what is happening under the hood, and if something is not as we expect it to be, we can see where we need to change the prompt and so on," says Ammad.

Spice's observability capabilities stand in contrast to the opaqueness of other AI platform solutions.

## Conclusion

With Spice, Basis Set transformed the way its investors interact with data. The platform removes the burden of managing embeddings, reduces hallucinations, and makes natural language search over structured data both accurate and real-time. This leads to faster insights, greater confidence, and a competitive edge in spotting the next generation of high-growth startups.

## Getting Started with Spice

Interested in giving Spice a try? Check out the following resources:

  • Sign up for Spice Cloud for free, or get started with Spice Open Source
  • Book a demo
  • Explore the Spice cookbooks and docs


---

## Spice AI Announces Contribution of TableProviders for PostgreSQL, MySQL, DuckDB, and SQLite to the Apache DataFusion Project

URL: https://spice.ai/blog/contribution-of-tableproviders-to-datafusion
Date: 2024-07-04T19:11:00
Description: Spice AI has contributed new TableProviders for PostgreSQL, MySQL, DuckDB, and SQLite to the Apache DataFusion project.

Spice AI has contributed new TableProviders for PostgreSQL, MySQL, DuckDB, and SQLite to the Apache DataFusion project. This addition reflects our commitment to building together in the data and AI ecosystem and supporting the open-source community.

## What is Apache DataFusion?

Apache DataFusion is a high-performance query engine built on Apache Arrow. It allows you to execute SQL queries quickly and efficiently on data stored in various formats. By using the in-memory columnar format of Apache Arrow, DataFusion speeds up data processing and works natively with other Arrow-based tools.

## About the Spice OSS Project

Spice OSS is an open-source project from Spice AI that provides developers with a unified SQL query interface to locally materialize, accelerate, and query datasets from any database, data warehouse, or data lake. Spice OSS incorporates Apache DataFusion as its SQL query engine. Learn more about Spice SQL federation and acceleration.

Our goal with Spice OSS is to make data and AI-driven development more accessible. By contributing to projects like DataFusion and Arrow, we can help make accessing and using data better for everyone building in the space.

## New TableProviders for DataFusion

We've initially added TableProviders for PostgreSQL, MySQL, DuckDB, and SQLite to DataFusion. This expands the range of data sources you can query using DataFusion, and we plan to add more in the future.

    " } />
  • PostgreSQL: A robust and extensible open-source relational database.
  • MySQL: A reliable and user-friendly open-source relational database.
  • DuckDB: An in-process SQL OLAP database for analytical queries.
  • SQLite: A lightweight, disk-based database commonly used in embedded systems.
These new TableProviders make DataFusion even more versatile, allowing you to work with your existing databases and data lakes more easily.
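As a minimal sketch of what using one of these providers looks like from DataFusion's side - the `register_table` and `sql` calls are standard DataFusion API, while the provider itself would come from the contributed crate (its construction is elided here, since constructor names vary by crate version):

```rust
use std::sync::Arc;
use datafusion::datasource::TableProvider;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

// `provider` is assumed to be built by one of the contributed
// TableProviders (e.g. for PostgreSQL); construction is elided.
async fn query_orders(provider: Arc<dyn TableProvider>) -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_table("orders", provider)?;

    // DataFusion plans the query; the provider performs the actual reads.
    let df = ctx.sql("SELECT count(*) FROM orders").await?;
    df.show().await?;
    Ok(())
}
```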

## Our Commitment to Data and AI

At Spice AI, we believe in the power of open-source and the potential of data and AI. By contributing to Apache DataFusion, we're helping to advance data processing technology and make powerful tools available to developers everywhere.

    " } /> To learn more about Spice AI and our open-source projects, check out our GitHub repository. You can also explore Apache DataFusion at datafusion.apache.org.

Stay tuned for more updates from Spice AI as we continue to contribute to the data and AI ecosystem.


---

## Announcing Our Partnership with Databricks!

URL: https://spice.ai/blog/databricks-partnership
Date: 2025-06-10T19:04:00
Description: Spice partners with Databricks to accelerate operational AI apps with fast SQL queries, Mosaic AI embeddings, and Unity Catalog governance.

Today we announced our partnership with Databricks, the leading data and AI platform! We're also excited to roll out new integrations that support the Databricks platform, enabling customers to build faster and more reliable apps and agents that extend across cloud, on-premises, and edge.

New capabilities now available to Databricks customers include:

  • Databricks SQL Warehouse and Spark Connect integrations for high-performance SQL queries accelerated with DuckDB and SQLite.
  • Databricks Mosaic AI model serving and embeddings integrations to bring Mosaic AI alongside applications.
  • Unity Catalog support for governance and security.
  • Apache Iceberg & Delta Lake support for query and management of open format tables via Unity Catalog.
  • Enterprise-grade security: Service Principal M2M & U2M OAuth authentication for role-based security.

With Spice AI's integrations with Databricks, you can now:

    "} />
  • Query Data For Operational Use-Cases: Execute fast, low-latency SQL queries across Databricks, on-premises, and edge sources with Spice.ai's unified engine, enabling real-time applications like inventory tracking or fraud detection.
  • Embed AI With Applications: Integrate Databricks Mosaic AI model serving and embeddings with the Spice engine to deploy AI features, such as low-latency recommendation systems, search, or predictive maintenance.
  • Streamline Data Governance: Manage Apache Iceberg and Delta Lake tables using Unity Catalog, enforcing secure access to ensure compliance and restrict data access.
  • Optimize Workload Performance: Use Spice.ai to cache hot data, replicate high-demand datasets, and load-balance hosted AI endpoints, maintaining speed and resilience for applications like real-time dashboards.
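A Databricks-backed dataset with local acceleration can be declared in a spicepod along these lines (a sketch only - the parameter names and values here are illustrative; consult the Spice documentation for the exact Databricks connector options):

```yaml
datasets:
  - from: databricks:samples.nyctaxi.trips # catalog.schema.table in Unity Catalog
    name: trips
    params:
      mode: delta_lake # or spark_connect
      databricks_endpoint: dbc-example.cloud.databricks.com # hypothetical workspace
      databricks_token: ${secrets:DATABRICKS_TOKEN}
    acceleration:
      enabled: true
      engine: duckdb # serve hot queries from a local DuckDB materialization
```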
    The Spice.ai Compute Engine
## Get Started Today

Spice AI's partnership with Databricks is available now. Explore the following options to suit your needs:

    " } />
  • Spice.ai Open Source: A single-node, open-source SQL query and AI-inference engine for developers.
  • Spice.ai Enterprise: A scalable, multi-node solution for cloud or self-hosted Kubernetes, built for enterprise performance and scale.
  • Spice Cloud Platform: A fully managed, cloud-hosted platform for building and scaling operational AI applications.

To try Spice.ai for yourself, visit: /login


---

## Getting started with Amazon S3 Vectors and Spice

URL: https://spice.ai/blog/getting-started-with-amazon-s3-vectors-and-spice
Date: 2025-07-31T21:35:21
Description: Learn how Spice AI integrates Amazon S3 Vectors for scalable, cost-effective vector search - combining semantic, full-text, and SQL queries in one runtime.

## TL;DR

The latest Spice.ai release (v1.5.0) brings major improvements to search, including native support for Amazon S3 Vectors. Announced in public preview at AWS Summit New York 2025, Amazon S3 Vectors is a new S3 bucket type purpose-built for vector embeddings, with dedicated APIs for similarity search.

Spice AI was a day 1 launch partner for S3 Vectors, integrating it as a scalable vector index backend. In this post, we explore how S3 Vectors integrates into Spice.ai's data, search, and AI-inference engine, how Spice manages indexing and lifecycle of embeddings for production vector search, and how this unlocks a powerful hybrid search experience. We'll also put this in context with industry trends and compare Spice's approach to other vector database solutions like Qdrant, Weaviate, Pinecone, and Turbopuffer.

    " } /> Amazon S3 Vectors Overview' } />
    Figure 1: Amazon S3 Vectors workflow
Amazon S3 Vectors extends S3 object storage with native support for storing and querying vectors at scale. As AWS describes, it is "designed to provide the same elasticity, scale, and durability as Amazon S3," providing storage of billions of vectors and sub-second similarity queries. Crucially, S3 Vectors dramatically lowers the cost of vector search infrastructure - reducing upload, storage, and query costs by up to 90% compared to traditional solutions. It achieves this by separating storage from compute: vectors reside durably in S3, and queries execute on transient, on-demand resources, avoiding the need for always-on, memory-intensive vector database servers. In practice, S3 Vectors exposes two core operations:

  • Upsert vectors - assign a vector (an array of floats) to a given key (identifier) and optionally store metadata alongside it.
  • Vector similarity query - given a new query vector, efficiently find the stored vectors that are closest (e.g. minimal distance) to it, returning their keys (and scores).
This transforms S3 into a massively scalable vector index service. You can store embeddings at petabyte scale and perform similarity search with metrics like cosine or Euclidean distance via a simple API. It's ideal for AI use cases like semantic search, recommendations, or Retrieval-Augmented Generation (RAG) where large volumes of embeddings need to be queried semantically. By leveraging S3's pay-for-use storage and ephemeral compute, S3 Vectors can handle infrequent or large-scale queries much more cost-effectively than memory-bound databases, yet still deliver sub-second results.
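To make the two operations concrete, here is roughly what they look like with the AWS CLI (a sketch: command and flag names follow the public-preview documentation and may change, and the vectors shown are toy three-dimensional values):

```bash
# Upsert: store a vector under a key, optionally with metadata
aws s3vectors put-vectors \
  --vector-bucket-name my-vector-bucket \
  --index-name reviews \
  --vectors '[{"key": "review-123", "data": {"float32": [0.1, 0.7, 0.2]}, "metadata": {"rating": 3}}]'

# Query: return the nearest stored vectors to a query vector
aws s3vectors query-vectors \
  --vector-bucket-name my-vector-bucket \
  --index-name reviews \
  --query-vector '{"float32": [0.1, 0.6, 0.3]}' \
  --top-k 5
```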

    " } /> Vector Search with Embeddings' } /> Vector similarity search retrieves data by comparing items in a high-dimensional embedding space rather than by exact keywords. In a typical pipeline:

  • Data to vectors: We first convert each data item (text, image, etc.) into a numeric vector representation (embedding) using an ML model. For example, a customer review text might be turned into a 768-dimensional embedding that encodes its semantic content. Models like Amazon Titan Embeddings, OpenAI, or Hugging Face sentence transformers handle this step.
  • Index storage: These vectors are stored in a specialized index or database optimized for similarity search. This could be a dedicated vector database or, in our case, Amazon S3 Vectors acting as the index. Each vector is stored with an identifier (e.g. the primary key of the source record) and possibly metadata.
  • Query by vector: A search query (e.g. a phrase or image) is also converted into an embedding vector. The vector index is then queried to find the closest stored vectors by distance metric (cosine, Euclidean, dot product, etc.). The result is a set of IDs of the most similar items, often with a similarity score.
This process enables semantic search - results are returned based on meaning and similarity rather than exact text matches. It powers features like finding relevant documents by topic even if exact terms differ, recommendation systems (finding similar user behavior or content), and providing knowledge context to LLMs in RAG. With the Spice.ai Open Source integration, this whole lifecycle (embedding data, indexing vectors, querying) is managed by the Spice runtime and exposed via a familiar SQL or HTTP interface.
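To ground the distance metric in code, here is a minimal cosine similarity function - the comparison a vector index performs (at scale, and with approximate algorithms) for each candidate:

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns a value in [-1.0, 1.0]; higher means more semantically similar.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```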

## Amazon S3 Vectors in Spice.ai
    Figure 2: S3 Vectors and Spice architecture
Spice.ai is an open-source data, search and AI compute engine that supports vector search end-to-end. By integrating S3 Vectors as an index, Spice can embed data, store embeddings in S3, and perform similarity queries - all orchestrated through simple configuration and SQL queries. Let's walk through how you enable and use this in Spice.

    " } /> Configuring a Dataset with Embeddings' } /> To use vector search, annotate your dataset schema to specify which column(s) to embed and with which model. Spice supports various embedding models (both local or hosted) via the embeddings section in the configuration. For example, suppose we have a customer reviews table and we want to enable semantic search over the review text (body column):

```yaml
datasets:
  - from: oracle:"CUSTOMER_REVIEWS"
    name: reviews
    columns:
      - name: body
        embeddings:
          from: bedrock_titan # use an embedding model defined below
embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: bedrock_titan
    params:
      aws_region: us-east-2
      dimensions: '256'
```

In this spicepod.yaml, we defined an embedding model bedrock_titan (in this case AWS's Titan text embedding model) and attached it to the body column. When the Spice runtime ingests the dataset, it will automatically generate a vector embedding for each row's body text using that model. By default, Spice can either store these vectors in its acceleration layer or compute them on the fly. However, with S3 Vectors, we can offload them to an S3 Vectors index for scalable storage.

To use S3 Vectors, we simply enable the vector engine in the dataset config:

```yaml
datasets:
  - from: oracle:"CUSTOMER_REVIEWS"
    name: reviews
    vectors:
      enabled: true
      engine: s3_vectors
      params:
        s3_vectors_bucket: my-s3-vector-bucket
    # ... (rest of dataset definition as above)
```

This tells Spice to create or use an S3 Vectors index (in the specified S3 bucket) for storing the body embeddings. Spice manages the entire index lifecycle: it creates the vector index, handles inserting each vector with its primary key into S3, and knows how to query it. The embedding model and data source are as before - the only change is where the vectors are stored and queried. The benefit is that now our vectors reside in S3's highly scalable storage, and we can leverage S3 Vectors' efficient similarity search API.

    " } /> Performing a Vector Search Query' } /> Once configured, performing a semantic search is straightforward. Spice exposes both an HTTP endpoint and a SQL table-valued function for vector search. For example, using the HTTP API:

```bash
curl -X POST http://localhost:8090/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "datasets": ["reviews"],
    "text": "issues with same day shipping",
    "additional_columns": ["rating", "customer_id"],
    "where": "created_at >= now() - INTERVAL '\''7 days'\''",
    "limit": 2
  }'
```

This JSON query says: search the reviews dataset for items similar to the text "issues with same day shipping", and return the top 2 results, including their rating and customer id, filtered to reviews from the last 7 days. The Spice engine will embed the query text (using the same model as the index), perform a similarity lookup in the S3 Vectors index, filter by the WHERE clause, and return the results. A sample response might look like:

```json
{
  "results": [
    {
      "matches": { "body": "Everything on the site made it seem like I'd get it the same day. Still waiting the next morning was a letdown." },
      "data": { "rating": 3, "customer_id": 6482 },
      "primary_key": { "review_id": 123 },
      "score": 0.82,
      "dataset": "reviews"
    },
    {
      "matches": { "body": "It was marked as arriving 'today' when I paid, but the delivery was pushed back without any explanation. Timing was kind of important for me." },
      "data": { "rating": 2, "customer_id": 3310 },
      "primary_key": { "review_id": 24 },
      "score": 0.76,
      "dataset": "reviews"
    }
  ],
  "duration_ms": 86
}
```

Each result includes the matching column snippet (body), the additional requested fields, the primary key, and a relevance score. In this case, the two reviews shown are indeed complaints about "same day" delivery issues, which the vector search found based on semantic similarity to the query (see how the second result made no mention of "same day" delivery, but rather described a similar issue as the first).

Developers can also use SQL for the same operation. Spice provides a table function vector_search(dataset, query) that can be used in the FROM clause of a SQL query. For example, the above search could be expressed as:

```sql
SELECT review_id, rating, customer_id, body, score
FROM vector_search(reviews, 'issues with same day shipping')
WHERE created_at >= to_unixtime(now() - INTERVAL '7 days')
ORDER BY score DESC
LIMIT 2;
```

This would yield a result set (with columns like review_id, score, etc.) similar to the JSON above, which you can join or filter just like any other SQL table. This ability to treat vector search results as a subquery/table and combine them with standard SQL filtering is a powerful feature of Spice.ai's integration - few other solutions let you natively mix vector similarity and relational queries so seamlessly.

    " } /> See a 2-min demo of it in action:

## Managing Embeddings Storage in Spice.ai

An important design question for any vector search system is where and how to store the embedding vectors. Before introducing S3 Vectors, Spice offered two approaches for managing vectors:

  • Accelerator storage: Embed the data in advance and store the vectors alongside other cached data in a Data Accelerator (Spice's high-performance materialization layer). This keeps vectors readily accessible in memory or fast storage.
  • Just-in-time computation: Compute the necessary vectors on the fly during a query, rather than storing them persistently. For example, at query time, embed only the subset of rows that satisfy recent filters (e.g. all reviews in the last 7 days) and compare those to the query vector.
Both approaches have trade-offs. Pre-storing in an accelerator provides fast query responses but may not be feasible for very large datasets (which might not fit entirely, or affordably, in fast storage), and accelerators like DuckDB or SQLite aren't optimized for similarity search algorithms on billion-scale vectors. Just-in-time embedding avoids extra storage but becomes prohibitively slow when computing embeddings over large data scans (and for each query), and provides no efficient algorithm for finding similar neighbors.

    " } /> Amazon S3 Vectors offers a compelling third option: the scalability of S3 with the efficient retrieval of vector index data structures. By configuring the dataset with engine: s3_vectors as shown earlier, Spice will offload the vector storage and similarity computations to S3 Vectors. This means you can handle very large embedding sets (millions or billions of items) without worrying about Spice's memory or local disk limits, and still get fast similarity operations via S3's API. In practice, when Spice ingests data, it will embed each row's body and PUT it into the S3 Vector index (with the review_id as the key, and possibly some metadata). At query time, Spice calls S3 Vectors' query API to retrieve the nearest neighbors for the embedded query. All of this is abstracted away; you simply query Spice and it orchestrates these steps.

    " } /> The Spice runtime manages index creation, updates, and deletion. For instance, if new data comes in or old data is removed, Spice will synchronize those changes to the S3 vector index. Developers don't need to directly interact with S3 - it's configured once in YAML. This tight integration accelerates application development: your app can treat Spice like any other database, while behind the scenes Spice leverages S3's elasticity for the heavy lifting.

    " } /> Vector Index Usage in Query Execution' } /> How does a vector index actually get used in Spice's SQL query planner? To illustrate, consider the simplified SQL we used:

    " } /> ```sql SELECT * FROM vector_search(reviews, 'issues with same day shipping') ORDER BY score DESC LIMIT 5; ``` Logically, without a vector index, Spice would have to do the following at query time:

  1. Embed the query text 'issues with same day shipping' into a vector v.
  2. Retrieve or compute all candidate vectors for the searchable column (here every body embedding in the dataset). This could mean scanning every row, or at least every row matching other filter predicates.
  3. Calculate distances between the query vector v and each candidate vector, and compute a similarity score (e.g. score = 1 - distance).
  4. Sort all candidates by the score and take the top 5.
For large datasets, steps 2-4 would be extremely expensive (a brute-force scan through potentially millions of vectors for each search, then a full sort operation). A vector index avoids unnecessary recomputation of embeddings, reduces the number of distance calculations required, and provides in-order candidate neighbors.

With S3 Vectors, steps 2 and 3 are pushed down to the S3 service. The vector index can directly return the top K closest matches to v. Conceptually, S3 Vectors gives back an ordered list of primary keys with their similarity scores. For example, it might return something like: {(review_id=123, score=0.82), (review_id=24, score=0.76), ...} up to K results.

Spice then uses these results, logically as a temporary table (let's call it vector_query_results), joined with the main reviews table to get the full records. In SQL pseudocode, Spice does something akin to:

    " } /> ```sql -- The vector index returns the closest matches for a given query. CREATE TEMP TABLE vector_query_results ( review_id BIGINT, score FLOAT ); ``` Imagine this temp table is populated by an efficient vector retrieval operatin in S3 Vectors for the query.

```sql
-- Now we join to retrieve full details
SELECT r.review_id, r.rating, r.customer_id, r.body, v.score
FROM vector_query_results v
JOIN reviews r ON r.review_id = v.review_id
ORDER BY v.score DESC
LIMIT 5;
```

This way, only the top few results (say 50 or 100 candidates) are processed in the database, rather than the entire dataset. The heavy work of narrowing down candidates occurs inside the vector index. Spice essentially treats vector_search(dataset, query) as a table-valued function that produces (id, score) pairs which are then joinable.

### Handling Filters Efficiently

One consideration when using an external vector index is how to handle additional filter conditions (the WHERE clause). In our example, we had a filter created_at >= now() - 7 days. If we simply retrieve the top K results from the vector search and then apply the time filter, we might run into an issue: those top K might not include any recent items, even if there are relevant recent items slightly further down the similarity ranking. This is because S3 Vectors (like most ANN indexes) will return the top K most similar vectors globally, unaware of our date constraint.

If only a small fraction of the data meets the filter, a naive approach could drop most of the top results, leaving fewer than the desired number of final results. For example, imagine the vector index returns 100 nearest reviews overall, but only 5% of all reviews are from the last week - we'd expect only ~5 of those 100 to be recent, possibly fewer than the LIMIT. The query could end up with too few results not because they don't exist, but because the index wasn't filter-aware and we truncated the candidate list.

    " } /> To solve this, S3 Vectors supports metadata filtering at query time. We can store certain fields as metadata with each vector and have the similarity search constrained to vectors where the metadata meets criteria. Spice.ai leverages this by allowing you to mark some dataset columns as "vector filterable". In our YAML, we could do:

```yaml
columns:
  - name: created_at
    metadata:
      vectors: filterable
```

By doing this, Spice's query planner will include the created_at value with each vector it upserts to S3, and it will push down the time filter into the S3 Vectors query. Under the hood, the S3 vector query will then return only nearest neighbors that also satisfy created_at >= now()-7d. This greatly improves both efficiency and result relevance. The query execution would conceptually become:

    " } /> ```sql -- Vector query with filter returns a temp table including the metadata CREATE TEMP TABLE vector_query_results ( review_id BIGINT, score FLOAT, created_at TIMESTAMP ); -- vector_query_results is already filtered to last 7 days SELECT r.review_id, r.rating, r.customer_id, r.body, v.score FROM vector_query_results v JOIN reviews r ON r.review_id = v.review_id -- (no need for additional created_at filter here, it's pre-filtered) ORDER BY v.score DESC LIMIT 5; ``` Now the index itself is ensuring all similar reviews are from the last week, and so if there are at least five results from the last week, it will return a full result (i.e. respecting LIMIT 5).

### Including Data to Avoid Joins

Another optimization Spice supports is storing additional, non-filterable columns in the vector index to entirely avoid the expensive table join back to the main table for certain queries. For example, we might mark rating, customer_id, or even the text body as non-filterable vector metadata. This means these fields are stored with the vector in S3, but not used for filtering (just for retrieval). In the Spice config, it would look like:

```yaml
columns:
  - name: rating
    metadata:
      vectors: non-filterable
  - name: customer_id
    metadata:
      vectors: non-filterable
  - name: body
    metadata:
      vectors: non-filterable
```

With this setup, when Spice queries S3 Vectors, the vector index will return not only each match's review_id and score, but also the stored rating, customer_id, and body values. Thus, the temporary vector_query_results table already has all the information needed to satisfy the query. We don't even need to join against the reviews table unless we want some column that wasn't stored. The query can be answered entirely from the index data:

    " } /> ```sql SELECT review_id, rating, customer_id, body, score FROM vector_query_results ORDER BY score DESC LIMIT 5; ``` This is particularly useful for read-heavy query workloads where hitting the main database adds latency. By storing the most commonly needed fields along with the vector, Spice's vector search behaves like an index-only query (similar to covering indexes in relational databases). You trade a bit of extra storage in S3 (duplicating some fields, but still managed by Spice) for faster queries that bypass the heavier join.

    " } /> This extends to WHERE conditions on non-filterable columns, or filter predicate unsupported by S3 vectors. Spice's execution engine can apply these filters, still avoiding any expensive JOIN on the underlying table.

    " } /> ```sql SELECT review_id, rating, customer_id, body, score FROM vector_query_results where rating > 3 -- Filter performed in Spice on, with non-filterable data from vector index ORDER BY score DESC LIMIT 5; ``` t's worth noting that you should choose carefully which fields to mark as metadata - too many or very large fields could increase index storage and query payload sizes. Spice gives you the flexibility to include just what you need for filtering and projection to optimize each use case.

    " } /> Beyond Basic Vector Search in Spice' } /> Many real-world search applications go beyond a single-vector similarity lookup. Spice.ai's strength is that it's a full database engine. You can compose more complex search workflows, including hybrid vector and full-text search (combining keyword/text search with vector search), multi-vector queries, re-ranking strategies, and more. Spice provides both an out-of-the-box hybrid search API and the ability to write custom SQL to implement advanced retrieval logic.

    " } />
  • Multiple vector fields or multi-modal search: You might have vectors for different aspects of data that you want to search across and combine results (e.g. an e-commerce product could have embeddings for both its description and the product's image, or a document has both a title and body that should be searchable individually and together). Spice lets you do vector search on multiple columns easily, and you can weight the importance of each. For instance, you might boost matches in the title higher than matches in the body.
  • Vector and full-text search: Similar to vector search, columns can have text indexes defined that enable full-text BM25 search. Text search can then be performed in SQL with a similar text_search UDTF.
  • Hybrid vector + keyword search: Sometimes you want to ensure certain keywords are present while also using semantic similarity. Spice supports hybrid search natively - its default /v1/search HTTP API actually performs both full-text BM25 search and vector search, then merges results using Reciprocal Rank Fusion (RRF). This means you get a balanced result set that accounts for direct keyword matches as well as semantic similarity. In Spice's SQL, you can also call text_search(dataset, query) for traditional full-text search, and combine it with vector_search results. The example below demonstrates how RRF can be implemented in SQL by combining ranks.
  • Two-phase retrieval (re-ranking): A common pattern is to use a fast first-pass retrieval (e.g. a keyword search) to get a larger candidate set, then apply a more expensive or precise ranking (e.g. vector search) on this subset to improve the ranking of the final candidate set. With Spice, you can orchestrate this in SQL or in application code. For example, you could query a BM25 index for 100 candidates, then perform a vector search amongst this candidate set (i.e. restricted to those IDs) for a second phase. Since Spice supports standard SQL constructs, you can express these multi-step plans with common table expressions (CTEs) and joins.
To illustrate hybrid search, here's a SQL snippet that uses the Reciprocal Rank Fusion (RRF) technique to merge vector and text search results for the same query (RRF is used, when needed, in the v1/search HTTP API):

    " } /> ```sql WITH vector_results AS ( SELECT review_id, RANK() OVER (ORDER BY score DESC) AS vector_rank FROM vector_search(reviews, 'issues with same day shipping') ), text_results AS ( SELECT review_id, RANK() OVER (ORDER BY score DESC) AS text_rank FROM text_search(reviews, 'issues with same day shipping') ) SELECT COALESCE(v.review_id, t.review_id) AS review_id, -- RRF scoring: 1/(60+rank) from each source (1.0 / (60 + COALESCE(v.vector_rank, 1000)) + 1.0 / (60 + COALESCE(t.text_rank, 1000))) AS fused_score FROM vector_results v FULL OUTER JOIN text_results t ON v.review_id = t.review_id ORDER BY fused_score DESC LIMIT 50; ``` This takes the vector similarity results and text (BM25) results, assigns each a rank based not on the score, but rather the relative order of candidates, and combines these ranks for an overall order. Spice's primary key SQL semantics easily enables this document ID join.

    " } /> For a multi-column vector search example, suppose our reviews dataset has both a title and body with embeddings, and we want to prioritize title matches higher. We could create a combined_score where the title is weighted twice as high as the body:

```sql
WITH body_results AS (
  SELECT review_id, score AS body_score
  FROM vector_search(reviews, 'issues with same day shipping', col => 'body')
),
title_results AS (
  SELECT review_id, score AS title_score
  FROM vector_search(reviews, 'issues with same day shipping', col => 'title')
)
SELECT
  COALESCE(body.review_id, title.review_id) AS review_id,
  COALESCE(body_score, 0) + 2.0 * COALESCE(title_score, 0) AS combined_score
FROM body_results AS body
FULL OUTER JOIN title_results AS title ON body.review_id = title.review_id
ORDER BY combined_score DESC
LIMIT 5;
```

These examples scratch the surface of what you can do by leveraging Spice's SQL-based composition. The key point is that Spice isn't just a vector database - it's a hybrid engine that lets you combine vector search with other query logic (text search, filters, joins, aggregations, etc.) all in one place. This can significantly simplify building complex search and AI-driven applications.

    " } /> (Note: Like most vector search systems, S3 Vectors uses an approximate nearest neighbor (ANN) algorithm under the hood for performance. This yields fast results that are probabilistically the closest, which is usually an acceptable trade-off in practice. Additionally, in our examples we focused on one embedding per row; production systems may use techniques like chunking text into multiple embeddings or adding external context, but the principles above remain the same.)

## Industry Context and Comparisons

The rise of vector databases over the past few years (Pinecone, Qdrant, Weaviate, etc.) has been driven by the need to serve AI applications with semantic search at scale. Each solution takes a slightly different approach in architecture and trade-offs. Spice.ai's integration with Amazon S3 Vectors represents a newer trend in this space: decoupling storage from compute for vector search, analogous to how data warehouses separated compute and storage in the past. Let's compare this approach with some existing solutions:

    " } />
  • Traditional Vector Databases (Qdrant, Weaviate, Pinecone): These systems typically run as dedicated services or clusters that handle both the storage of vectors (on disk or in-memory) and the computation of similarity search. For example, Qdrant (an open-source engine in Rust) allows either in-memory storage or on-disk storage (using RocksDB) for vectors and payloads. It's optimized for high performance and offers features like filtering, quantization, and distributed clustering, but you generally need to provision servers/instances that will host all your data and indexes. Weaviate, another popular open-source vector DB, uses a Log-Structured Merge (LSM) tree based storage engine that persists data to disk and keeps indexes in memory. Weaviate supports hybrid search (it can combine keyword and vector queries) and offers a GraphQL API, with a managed cloud option priced mainly by data volume. Pinecone, a fully managed SaaS, also requires you to select a service tier or pod which has certain memory/CPU allocated for your index - essentially your data lives in Pinecone's infrastructure, not in your AWS account. These solutions excel at low-latency search for high query throughput scenarios (since data is readily available in RAM or local SSD), but the cost can be high for large datasets. You pay for a lot of infrastructure to be running, even during idle times. In fact, prior to S3 Vectors, vector search engines often stored data in memory at ~$2/GB and needed multiple replicas on SSD, which is "the most expensive way to store data", as Simon Eskildsen (Turbopuffer's founder) noted. Some databases mitigate cost by compressing or offloading to disk, but still, maintaining say 100 million embeddings might require a sizable cluster of VMs or a costly cloud plan.
  • Spice.ai with Amazon S3 Vectors: This approach flips the script by storing vectors in cheap, durable object storage (S3) and loading/indexing them on demand. As discussed, S3 Vectors keeps the entire vector dataset in S3 at ~$0.02/GB storage, and only spins up transient compute (managed by AWS) to serve queries, meaning you aren't paying for idle GPU or RAM time. AWS states this design can cut total costs by up to 90% while still giving sub-second performance on billions of vectors. It's essentially a serverless vector search model - you don't manage servers or even dedicated indices; you just use the API. Spice.ai's integration means developers get this cost-efficiency without having to rebuild their application: they can use standard SQL and Spice will push down operations to S3 Vectors as appropriate. This decoupled storage/compute model is ideal for use cases where the data is huge but query volumes are moderate or bursty (e.g., an enterprise semantic search that is used a few times an hour, or a nightly ML batch job). It avoids the "monolithic database" scenario of having a large cluster running 24/7. However, one should note that if you need extremely high QPS (thousands of queries per second at ultra-low latency), a purely object-storage-based solution might not outperform a tuned in-memory vector DB - AWS positions S3 Vectors as complementary to higher-QPS solutions like OpenSearch for real-time needs.
  • Turbopuffer: Turbopuffer is a startup that, much like Spice with S3 Vectors, is built from first principles on object storage. It provides "serverless vector and full-text search... fast, 10× cheaper, and extremely scalable," by leveraging S3 or similar object stores with smart caching. The philosophy is the same: use the durability and low cost of object storage for the bulk of data, and layer a cache (memory/SSD) in front for performance-critical portions. According to Turbopuffer's founder, moving from memory/SSD-centric architectures to an object storage core can yield 100× cost savings for cold data and 6-20× for warm data, without sacrificing too much performance. Turbopuffer's engine indexes data incrementally on S3 and uses caching to achieve similar latency to conventional search engines on hot data. The key difference is that Turbopuffer is a standalone search service (with its own API), whereas Spice uses AWS's S3 Vectors service as the backend. Both approaches validate the industry trend toward disaggregated storage for search. Essentially, they are bringing the cloud data warehouse economics to vector search: store everything cheaply, compute on demand.
In summary, Spice.ai's integration with S3 Vectors and similar efforts indicate a shift in vector search towards cost-efficient, scalable architectures that separate the concerns of storing massive vector sets and serving queries. Developers now have options: if you need blazing-fast, real-time vector search with constant high traffic, dedicated compute infrastructure might be justified. But for many applications - enterprise search, AI assistants with a lot of knowledge but lower QPS, periodic analytics over embeddings - offloading to something like S3 Vectors can save enormously on cost while still delivering sub-second performance at huge scale. And with Spice.ai, you get the best of both worlds: the ease of a unified SQL engine that can do keyword + vector hybrid search on structured data, combined with the power of a cloud-native vector store. It simplifies your stack (no separate vector DB service to manage) and accelerates development since you can join and filter vector search results with your data immediately in one query.

    " } /> References:

- Weaviate storage architecture discussion
- Spice.ai announcement: "Spice.ai Now Supports Amazon S3 Vectors For Vector Search at Petabyte Scale!"
- Spice.ai Amazon S3 Vectors documentation
- Spice.ai Amazon S3 Vectors Cookbook Recipe Sample
- Amazon S3 Vectors official page
- Pinecone Database Architecture (managed vector database)
- Qdrant documentation (storage modes and features)
- Turbopuffer blog by Simon Eskildsen (cost of search on object storage)


---

## How we use Apache DataFusion at Spice AI

URL: https://spice.ai/blog/how-we-use-apache-datafusion-at-spice-ai
Date: 2026-01-15T19:31:56
Description: A technical overview of how Spice extends Apache DataFusion with custom table providers, optimizer rules, and UDFs to power federated SQL, search, and AI inference.

**TL;DR:** Spice is built on [Apache DataFusion](https://datafusion.apache.org/), a Rust-native query engine. This post covers how Spice extends DataFusion with custom table providers for [SQL federation](/platform/sql-federation-acceleration), optimizer rules for acceleration routing, UDFs for [hybrid search](/platform/hybrid-sql-search) and [LLM inference](/platform/llm-inference), and lessons learned building a production SQL platform on DataFusion.

---

## Introducing 'Engineering at Spice AI'

'Engineering at Spice AI' is a technical blog series that breaks down the systems and abstractions behind Spice's data and AI platform.

    " } /> We\'ll explain why we chose specific open-source technologies, how we\'ve extended them, and what we\'ve learned building a SQL-first platform for federated query and acceleration, search, and embedded LLM inference.

The goal of this series is to share concrete engineering patterns that teams building data and AI infrastructure can apply in their own systems, while also unfolding how Spice is designed so users understand and can trust the foundations they're relying on. Familiarity with SQL engines, Arrow, or Rust will help.

    " } /> This article kicks off the series by diving into Apache DataFusion, the query engine at the core of Spice.

Future posts will cover:

  • Rust at Spice AI - Our systems programming foundation
  • Apache Arrow at Spice AI - Arrow as our core in-memory data format
  • DuckDB at Spice AI - Embedded analytics acceleration
  • Apache Iceberg at Spice AI - Open table format and SQL-based ingestion
  • Vortex at Spice AI - Columnar compression for Cayenne, our premier data accelerator
Figure 1: The Spice architecture, built on open source
## What is Apache DataFusion?

Apache DataFusion is a fast, extensible query engine written in Rust. It provides SQL and DataFrame APIs, a query planner, a cost-based optimizer, and a multi-threaded execution engine, all built on Apache Arrow.

DataFusion provides the complete query execution pipeline:

SQL → Parsed SQL (*AST) → Logical Plan → Optimizer → Physical Plan → Execution → Arrow Results

*Abstract Syntax Tree
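As a minimal, self-contained illustration of that pipeline using DataFusion's public API (a sketch, not Spice code - the CSV file is a stand-in data source):

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_csv("reviews", "reviews.csv", CsvReadOptions::new()).await?;

    // SQL text -> parsed AST -> logical plan -> optimized plan -> physical plan
    let df = ctx.sql("SELECT rating, count(*) FROM reviews GROUP BY rating").await?;

    // Execution streams Arrow record batches; show() collects and prints them
    df.show().await?;
    Ok(())
}
```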

DataFusion provides extension points across planning and execution, which we use to add custom table providers (20+ sources), optimizer rules (federation and acceleration pushdowns), and UDFs (AI inference, vector search, and text search).

Each stage is extensible:

| Stage | Extension Point | Spice Extensions |
| --- | --- | --- |
| Parser | Custom SQL syntax | - |
| Logical Planning | TableProvider, ScalarUDF, TableFunction | 20+ data connectors |
| Optimization | OptimizerRule, AnalyzerRule | Federation analyzer |
| Physical Planning | ExtensionPlanner | DuckDB aggregate pushdowns |
| Execution | ExecutionPlan | Schema casting, managed streams, fallback execution |
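To make one of those extension points concrete, here is a sketch of registering a scalar UDF so it becomes callable from SQL - the same mechanism behind functions like vector_search and text_search. The function is a toy (`square`), and exact `create_udf` signatures shift between DataFusion versions:

```rust
use std::sync::Arc;
use datafusion::arrow::array::{ArrayRef, Float32Array};
use datafusion::arrow::datatypes::DataType;
use datafusion::error::Result;
use datafusion::logical_expr::{create_udf, ColumnarValue, Volatility};
use datafusion::prelude::SessionContext;

fn register_square(ctx: &SessionContext) {
    // square(x): a toy scalar UDF that operates on Arrow arrays
    let fun = Arc::new(|args: &[ColumnarValue]| -> Result<ColumnarValue> {
        let arrays = ColumnarValue::values_to_arrays(args)?;
        let input = arrays[0]
            .as_any()
            .downcast_ref::<Float32Array>()
            .expect("expected a Float32 column");
        let out: Float32Array = input.iter().map(|v| v.map(|x| x * x)).collect();
        Ok(ColumnarValue::Array(Arc::new(out) as ArrayRef))
    });
    let udf = create_udf(
        "square",
        vec![DataType::Float32],
        DataType::Float32,
        Volatility::Immutable,
        fun,
    );
    ctx.register_udf(udf);
    // Now callable from SQL: SELECT square(rating) FROM reviews;
}
```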
## Why DataFusion at Spice

The core technical challenge we were looking to solve was executing one logical query across many fundamentally different systems: operational databases, data warehouses, object stores, streams, APIs, and more - while still making that query fast, composable, and extensible enough to evolve with rapidly changing data and AI workloads.

After evaluating several engines, DataFusion was the one that met those requirements without forcing architectural compromises:

  • Native Rust and Arrow: DataFusion is written in Rust and uses Arrow as its native memory format, matching our architecture without foreign function interface (FFI) overhead, runtime boundary crossings, or data format conversions.
  • Extensibility: Every component can be replaced or extended. We can add custom data sources, optimizer rules, and execution plans without forking the core engine.
  • Active community: DataFusion has an active community with regular releases. We contribute upstream when our extensions benefit the broader ecosystem.
  • Performance: DataFusion's execution engine uses:
    • Vectorized processing with Arrow arrays
    • Push-based execution for streaming
    • Partition-aware parallelism that scales with CPU cores
    • Predicate and projection pushdown to minimize data movement
## How we use DataFusion

We treat DataFusion as a programmable query compiler and runtime.

At a high level, DataFusion gives us:

  • A full SQL -> logical -> physical execution pipeline
  • A cost-based optimizer we can extend and rewrite
  • Stable extension points at every stage of planning and execution
  • Arrow-native, vectorized execution that works equally well for analytics and streaming results
The unique qualities of our DataFusion implementation include:

  • Deciding where a query should execute (source vs. local accelerator)
  • Deciding when cached data is valid, stale, or needs fallback
  • Injecting AI inference and search functions directly into SQL
  • Coordinating execution across local, remote, and hybrid plans
DataFusion is one of the few engines where these decisions can be expressed inside the planner and execution engine itself, rather than bolted on externally.

## What DataFusion delivers for Spice

DataFusion enables several components that define Spice today:

  • SQL federation: We can push computation down to source systems or pull it into local accelerators
  • Pluggable acceleration: Spice accelerates datasets by materializing them in local compute engines, providing applications with high-performance, low-latency queries and dynamic compute flexibility beyond static materialization.  
    • Spice supports multiple acceleration engines as first-class execution targets: Apache DataFusion + Apache Arrow, SQLite, Spice Cayenne, and DuckDB, with options for in-memory or on-disk storage. Accelerations are implemented as a standard DataFusion TableProvider that manages two underlying table providers: a federated table provider (pointing to the source system) and an acceleration engine table provider (the local materialized copy). This architecture enables accelerations to integrate with DataFusion's query planning and execution; Spice manages refresh by executing special DataFusion queries on the federated table and inserting results into the accelerated table. Most user queries are served directly from the accelerated table, with automatic fallbacks to the federated table under specific conditions (such as cache misses or data freshness requirements).
```rust
pub struct AcceleratedTable {
    dataset_name: TableReference,
    accelerator: Arc<dyn TableProvider>,    // Local cache (DuckDB, SQLite, Arrow)
    federated: Arc<dyn TableProvider>,      // Source data
    zero_results_action: ZeroResultsAction, // Fallback behavior
    refresh_mode: RefreshMode,              // Full, Append, Changes
}
```
  • Search and AI as query operators: Vector search, text search, and LLM calls are modeled as UDFs and table functions.
  • Resilient, production-grade execution: Deferred connections, fallback execution, schema casting, and cache invalidation all live inside the engine, not the application layer.
  • Incremental improvements: As DataFusion adds new optimizer capabilities, execution primitives, and APIs, we can adopt them incrementally while still shipping Spice-specific features on our own cadence
    Figure 2: Spice Cayenne integration with DataFusion
With that context, let's zoom in on some of the specifics of our implementation.

    " } /> SessionState Configuration' } /> DataFusion\'s SessionState holds all configuration for query execution.

Key configuration choices (a configuration sketch follows the list):

  • PostgreSQL Dialect: We use PostgreSQL syntax to provide a widely supported SQL dialect with consistent, well-understood semantics. 
  • Case-Sensitive Identifiers: Disabled normalization preserves column case from source systems. While we use the PostgreSQL dialect for syntax, we differ from PostgreSQL's behavior of normalizing identifiers to lower-case.
  • Custom Analyzer Rules: Our federation analyzer runs before DataFusion\'s default rules to ensure we produce valid federated plans. Some default optimizer rules assume a single execution engine and can generate invalid plans for federation, so we intercept early and then selectively apply DataFusion optimizations such as predicate and column pushdown. 
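The first two choices can be sketched with DataFusion's public configuration options (a minimal sketch, not Spice's actual setup code):

```rust
use datafusion::prelude::{SessionConfig, SessionContext};

fn build_session() -> SessionContext {
    let config = SessionConfig::new()
        // Parse SQL using PostgreSQL syntax.
        .set_str("datafusion.sql_parser.dialect", "PostgreSQL")
        // Preserve identifier case from source systems, rather than
        // normalizing to lower-case as PostgreSQL itself would.
        .set_bool("datafusion.sql_parser.enable_ident_normalization", false);
    SessionContext::new_with_config(config)
}
```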
### Custom TableProvider Implementations

TableProvider is the interface between DataFusion and data sources. We implement it for every connector, but they fall into two distinct categories based on execution model:

SQL-Federated Sources: For these sources (e.g. PostgreSQL, MySQL, DuckDB, and Snowflake), the TableProvider acts primarily as a marker for the federation analyzer to discover. The actual execution doesn't follow DataFusion's normal path - instead, the federation analyzer identifies these tables and replaces them with a simple execution plan that defers computation to the remote source.

Take this query as an example:

```sql
SELECT count(*), course FROM duckdb_table GROUP BY course;
```

This query is sent almost unchanged to DuckDB, with DataFusion doing minimal work in the middle.

For single-source queries, Spice can often push the SQL down nearly unchanged. When a query spans multiple sources (e.g., JOIN/UNION across tables from different systems), Spice splits the work: it pushes per-source subqueries down, then lets DataFusion combine the results locally (see the SQL Federation section below for details).

Non-Federated Sources: For data lake tables and streaming sources where data is stored in formats like Parquet or Vortex files, all execution happens within DataFusion. Here, the TableProvider implementation is critical; DataFusion directly uses the scan() method to create execution plans, and proper implementation of filter pushdown, projection, and other capabilities directly impacts query performance.
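The trimmed sketch below shows the shape of such a provider against DataFusion's trait signatures (illustrative only - the LakeTable type is hypothetical, signatures vary slightly across DataFusion versions, and a real implementation would build a file-scan plan instead of returning EmptyExec):

```rust
use std::any::Any;
use std::sync::Arc;

use async_trait::async_trait;
use datafusion::arrow::datatypes::SchemaRef;
use datafusion::catalog::Session;
use datafusion::common::Result;
use datafusion::datasource::{TableProvider, TableType};
use datafusion::logical_expr::{Expr, TableProviderFilterPushDown};
use datafusion::physical_plan::{empty::EmptyExec, ExecutionPlan};

/// Hypothetical provider over a directory of Parquet/Vortex files.
struct LakeTable {
    schema: SchemaRef,
}

#[async_trait]
impl TableProvider for LakeTable {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn schema(&self) -> SchemaRef {
        Arc::clone(&self.schema)
    }

    fn table_type(&self) -> TableType {
        TableType::Base
    }

    // Advertise which predicates the scan can evaluate, so filtering
    // happens while reading files rather than afterwards.
    fn supports_filters_pushdown(
        &self,
        filters: &[&Expr],
    ) -> Result<Vec<TableProviderFilterPushDown>> {
        Ok(vec![TableProviderFilterPushDown::Inexact; filters.len()])
    }

    // scan() turns projection, filters, and limit into an ExecutionPlan
    // that reads only the needed columns and files.
    async fn scan(
        &self,
        _state: &dyn Session,
        _projection: Option<&Vec<usize>>,
        _filters: &[Expr],
        _limit: Option<usize>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        Ok(Arc::new(EmptyExec::new(self.schema())))
    }
}
```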

### Accelerated dataset architecture

When acceleration is enabled for a dataset, we use a layered TableProvider architecture:

```
┌─────────────────────────────────────────────────────────┐
│ AcceleratedTable                                        │
│ Wraps federated source with local cache                 │
│ Handles refresh, fallback, zero-results policies        │
├─────────────────────────────────────────────────────────┤
│ Accelerator TableProvider                               │
│ DuckDB, SQLite, Arrow, Cayenne, PostgreSQL              │
├─────────────────────────────────────────────────────────┤
│ FederatedTable                                          │
│ Supports immediate or deferred connection               │
│ Enables SQL pushdown to source                          │
├─────────────────────────────────────────────────────────┤
│ Connector TableProvider                                 │
│ PostgreSQL, Snowflake, S3, DuckDB, etc.                 │
└─────────────────────────────────────────────────────────┘
```

For non-accelerated datasets, the architecture is simpler; we register the federated TableProvider directly in DataFusion, without the AcceleratedTable layer or accelerator engine.

### AcceleratedTable

```rust
pub struct AcceleratedTable {
    dataset_name: TableReference,
    accelerator: Arc<dyn TableProvider>,    // Local cache (DuckDB, SQLite, Arrow)
    federated: Arc<FederatedTable>,         // Source data
    zero_results_action: ZeroResultsAction, // Fallback behavior
    refresh_mode: RefreshMode,              // Full, Append, Changes
}
```

AcceleratedTable provides:

  • Local query execution against the accelerator
  • Background refresh from the federated source
  • Fallback to source when local returns zero results (configurable)
### FederatedTable

```rust
pub enum FederatedTable {
    // TableProvider available immediately
    Immediate(Arc<dyn TableProvider>),
    // Retries connection in background, serves stale data from checkpoint
    Deferred(DeferredTableProvider),
}
```

Deferred mode enables resilient startup. If a source is temporarily unavailable, Spice starts with cached data and retries in the background.

### Data Source Coverage

We implement TableProvider for 20+ sources:

| Category | Sources |
| --- | --- |
| Databases | PostgreSQL, MySQL, SQLite, DuckDB, MongoDB, Oracle, MSSQL, ClickHouse, Turso |
| Warehouses | Snowflake, Databricks, BigQuery, Redshift |
| Lakes | Delta Lake, Iceberg, S3, Azure Blob, GCS |
| Streaming | Kafka, Debezium, DynamoDB Streams |
| APIs | GraphQL, HTTP/REST, GitHub, SharePoint |
| Specialized | FTP/SFTP, SMB/NFS |
### SQL Federation

For sources that support SQL (databases, warehouses), we push queries down rather than pulling all data; this means we minimize the work DataFusion does in the middle. The user query is parsed into a LogicalPlan, which the federation analyzer captures and converts (via the DataFusion unparser) into dialect-specific SQL executed directly by the source.

```sql
-- User query
SELECT name, SUM(amount)
FROM sales
WHERE region = 'NA' AND date > '2024-01-01'
GROUP BY name

-- What we push to Snowflake (via Arrow Flight SQL)
SELECT name, SUM(amount)
FROM sales
WHERE region = 'NA' AND date > '2024-01-01'
GROUP BY name

-- Only aggregated results flow over the network
```

### Multi-source query splitting

When a query references multiple federated tables - like a JOIN between Postgres and Snowflake - the federation analyzer rewrites the LogicalPlan into per-source subqueries. Each source executes its portion with filters/projections pushed down, and DataFusion performs the remaining work locally (e.g., join, union, final projection).

Consider this query:

```sql
SELECT o.order_id, o.order_date, c.name AS customer_name
FROM postgres.sales.orders o
JOIN snowflake.crm.customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2025-01-01'
  AND c.country = 'KR';
```

The federation analyzer will split this into two queries, one each to Postgres and Snowflake:

Postgres:

```sql
SELECT order_id, order_date, customer_id
FROM sales.orders
WHERE order_date >= DATE '2025-01-01';
```

Snowflake:

```sql
SELECT customer_id, name
FROM crm.customers
WHERE country = 'KR';
```

### Federation Architecture

We use the datafusion-federation crate to handle query pushdown. At a high level, this enables DataFusion to identify sub-plans in a query that can be executed by an external system (for example, a database or warehouse), push those sub-plans down for remote execution, and then combine the results locally only when necessary.

This is how Spice can efficiently execute queries that span multiple systems, pushing filters, projections, joins, and aggregates to each source when supported, while handling any cross-source work inside DataFusion.

Future articles will explore Spice's federation architecture in more detail. For readers interested in the underlying framework today, see the datafusion-federation README.

### Dialect Translation

Different databases have different SQL dialects. As part of the query pipeline, we first parse the user query into a DataFusion LogicalPlan. The federation analyzer then captures that plan and uses the DataFusion unparser - extended with source-specific dialect rules - to convert it back into SQL that can be executed natively by the underlying system.

We rewrite DataFusion functions into their source-native equivalents:

```rust
pub fn new_duckdb_dialect() -> Arc<dyn Dialect> {
    DuckDBDialect::new().with_custom_scalar_overrides(vec![
        // cosine_distance → array_cosine_distance
        (COSINE_DISTANCE_UDF_NAME, Box::new(duckdb::cosine_distance_to_sql)),
        // rand() → random()
        ("rand", Box::new(duckdb::rand_to_random)),
        // regexp_like → regexp_matches
        (REGEXP_LIKE_NAME, Box::new(duckdb::regexp_like_to_sql)),
    ])
}
```

### Custom Optimizer Rules

DataFusion's optimizer is a pipeline of rules that can rewrite or wrap a logical plan. We extend this pipeline with our own rules for two purposes: (1) semantics-preserving rewrites that produce logically equivalent plans with better execution characteristics, and (2) engine-level behavior that we inject at planning time using the same rule extension point (for example, cache invalidation around DML).

    " } /> Some examples:

### Cache Invalidation Rule

Cache invalidation is not a performance optimization; it's engine logic needed to keep cached results consistent after data changes. We implement it using DataFusion's optimizer rule interface as an extension point: when the planner encounters a DML statement (INSERT, UPDATE, DELETE), we wrap that DML plan in an extension node that triggers invalidation for the affected table(s) after the statement completes.

    " } /> ```rust impl OptimizerRule for CacheInvalidationOptimizerRule { fn name(&self) -> &'static str { "cache_invalidation" } fn rewrite( &self, plan: LogicalPlan, _config: &dyn OptimizerConfig, ) -> Result> { plan.transform_down(|plan| match plan { LogicalPlan::Dml(dml) => { // Wrap DML with cache invalidation node let node = CacheInvalidationNode::new( LogicalPlan::Dml(dml), table_name, Weak::clone(&self.caching), ); Ok(Transformed::yes(LogicalPlan::Extension( Extension { node: Arc::new(node) } ))) } _ => Ok(Transformed::no(plan)), }) } } ``` DuckDB Aggregate Pushdown' } /> When federation is enabled, aggregate pushdown is normally handled by the federation analyzer. When federation is disabled, those analyzer-based pushdowns do not run, and aggregates would not be pushed down through the standard TableProvider interface. To preserve aggregate pushdown for DuckDB-accelerated tables in that configuration, we apply a DuckDB-specific optimizer rule that recognizes supported aggregate functions and rewrites the plan to execute the aggregation inside DuckDB: 

```rust
static SUPPORTED_AGG_FUNCTIONS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    HashSet::from([
        // Basic aggregates
        "avg", "count", "max", "min", "sum",
        // Statistical
        "corr", "covar_pop", "stddev_pop", "var_pop",
        // Boolean
        "bool_and", "bool_or",
        // Approximate
        "approx_percentile_cont",
    ])
});
```

When enabled, the optimizer rewrites:

```sql
-- Original (DataFusion executes aggregate)
SELECT region, SUM(sales) FROM duckdb_table GROUP BY region

-- Rewritten (DuckDB executes aggregate via SQL federation)
SELECT region, SUM(sales) FROM duckdb_table GROUP BY region
-- Pushed as native DuckDB SQL
```

### Physical Optimizer: Empty Hash Join

If we can prove one side of a join is empty at planning time, we skip execution:

```rust
impl PhysicalOptimizerRule for EmptyHashJoinExecPhysicalOptimization {
    fn optimize(
        &self,
        plan: Arc<dyn ExecutionPlan>,
        _config: &ConfigOptions,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        plan.transform_down(|plan| {
            let Some(join) = plan.as_any().downcast_ref::<HashJoinExec>() else {
                return Ok(Transformed::no(plan));
            };
            let is_empty = match join.join_type() {
                JoinType::Inner => {
                    guaranteed_empty(join.left()) || guaranteed_empty(join.right())
                }
                JoinType::Left => guaranteed_empty(join.left()),
                // ... other join types
            };
            if is_empty {
                Ok(Transformed::yes(Arc::new(EmptyExec::new(join.schema()))))
            } else {
                Ok(Transformed::no(plan))
            }
        })
        .data()
    }
}
```

### User-Defined Functions

DataFusion supports scalar UDFs, aggregate UDFs, and table-valued functions.

We use all three:

### Scalar UDFs

Simple functions that operate on individual values:

```rust
use datafusion::common::hash_utils::create_hashes;

pub struct Bucket;

impl ScalarUDFImpl for Bucket {
    fn name(&self) -> &'static str {
        "bucket"
    }

    fn signature(&self) -> &Signature {
        &Signature::any(2, Volatility::Immutable)
    }

    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        Ok(DataType::Int32)
    }

    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        let args = args.args;
        let num_args = args.len();
        if num_args != 2 {
            return Err(BucketError::InvalidArgumentCount { count: args.len() }.into());
        }
        let num_buckets = match &args[0] {
            ColumnarValue::Scalar(ScalarValue::Int64(Some(n))) => {
                if *n <= 0 || *n > MAX_NUM_BUCKETS {
                    return Err(BucketError::InvalidNumBuckets { num_buckets: *n }.into());
                }
                *n
            }
            arg => {
                return Err(BucketError::InvalidFirstArgType {
                    description: describe_columnar_value(arg),
                }
                .into());
            }
        };
        match &args[1] {
            ColumnarValue::Scalar(scalar) => {
                let bucket = compute_bucket(scalar, num_buckets)?;
                Ok(ColumnarValue::Scalar(bucket))
            }
            ColumnarValue::Array(array) => {
                let buckets = compute_bucket_array(array, num_buckets)?;
                Ok(ColumnarValue::Array(Arc::new(buckets)))
            }
        }
    }
}

fn compute_bucket(scalar: &ScalarValue, num_buckets: i64) -> Result<ScalarValue> {
    if scalar.is_null() {
        return Ok(ScalarValue::Int32(None));
    }
    let array = scalar.to_array()?;
    let mut hashes = vec![0; 1];
    create_hashes(&[array], &RANDOM_STATE, &mut hashes)?;
    Ok(ScalarValue::Int32(Some(
        u64::try_from(num_buckets)
            .and_then(|n| i32::try_from(hashes[0] % n))
            .context(BucketLargerThanTypeSnafu)?,
    )))
}
```

### Async Scalar UDFs for AI

LLM calls are async. DataFusion's AsyncScalarUDFImpl trait enables this:

```rust
pub struct Ai {
    // Registry of completion models, keyed by model name.
    model_store: Arc<RwLock<ModelStore>>,
}

#[async_trait]
impl AsyncScalarUDFImpl for Ai {
    fn name(&self) -> &str {
        "ai"
    }

    async fn invoke_async(&self, args: ScalarFunctionArgs) -> DataFusionResult<ColumnarValue> {
        let prompt = extract_string(&args.args[0])?;
        let model_name = extract_string(&args.args[1])?;
        let model = self.model_store.read().get(&model_name)?;
        let response = model.complete(&prompt).await?;
        Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(response))))
    }
}
```

Example usage:

```sql
SELECT ai('Summarize this text: ' || content, 'gpt-4') AS summary
FROM documents
```

### Table-Valued Functions

vector_search() and text_search() return tables:

```rust
impl TableFunctionImpl for VectorSearchTableFunc {
    fn call(&self, args: &[Expr]) -> DataFusionResult<Arc<dyn TableProvider>> {
        let parsed = Self::parse_args(args)?;
        let df = self.df.upgrade().context("Runtime dropped")?;
        let table = df.get_table_sync(&parsed.table)?;
        let embedding_table = find_embedding_table(&table)?;
        Ok(Arc::new(VectorSearchUDTFProvider {
            args: parsed,
            underlying: table,
            embedding_models: embedding_table.embedding_models,
        }))
    }
}
```

Example usage:

```sql
SELECT *
FROM vector_search(
    'documents',
    'embedding_column',
    'search query text',
    10 -- top k
)
```

### UDF Registration

All UDFs are registered at runtime startup:

```rust
pub async fn register_udfs(runtime: &crate::Runtime) {
    let ctx = &runtime.df.ctx;

    // Scalar UDFs
    ctx.register_udf(CosineDistance::new().into());
    ctx.register_udf(Bucket::new().into());
    ctx.register_udf(Truncate::new().into());

    // Async UDFs for AI
    #[cfg(feature = "models")]
    {
        ctx.register_udf(Embed::new(runtime.embeds()).into());
        ctx.register_udf(
            Ai::new(runtime.completion_llms())
                .into_async_udf()
                .into_scalar_udf(),
        );
    }

    // Table-valued functions
    ctx.register_udtf("vector_search", Arc::new(VectorSearchTableFunc::new(...)));
    ctx.register_udtf("text_search", Arc::new(TextSearchTableFunc::new(...)));
}
```

### Physical Execution Extensions

Sometimes we need custom execution behavior beyond logical planning:

### FallbackOnZeroResultsScanExec

If an accelerated table returns zero rows, optionally fall back to the source:

```rust
pub struct FallbackOnZeroResultsScanExec {
    input: Arc<dyn ExecutionPlan>,
    fallback_table_provider: FallbackAsyncTableProvider,
    fallback_scan_params: TableScanParams,
}

impl ExecutionPlan for FallbackOnZeroResultsScanExec {
    fn execute(
        &self,
        partition: usize,
        context: Arc<TaskContext>,
    ) -> DataFusionResult<SendableRecordBatchStream> {
        let input_stream = self.input.execute(partition, context.clone())?;
        // Wrap stream to detect zero results and trigger fallback
        Ok(Box::pin(FallbackStream::new(
            input_stream,
            self.fallback_table_provider.clone(),
            self.fallback_scan_params.clone(),
            context,
        )))
    }
}
```

### SchemaCastScanExec

SchemaCastScanExec handles type representation differences across systems during streaming. Different systems represent the same logical types differently; for example, SQLite supports only 5 storage types while Arrow has 30+. SchemaCastScanExec maps between these type systems as data streams through DataFusion, ensuring type compatibility across connectors:

```rust
pub struct SchemaCastScanExec {
    input: Arc<dyn ExecutionPlan>,
    target_schema: SchemaRef,
}

impl ExecutionPlan for SchemaCastScanExec {
    fn execute(...) -> DataFusionResult<SendableRecordBatchStream> {
        let input_stream = self.input.execute(partition, context)?;
        Ok(Box::pin(SchemaCastStream::new(
            input_stream,
            Arc::clone(&self.target_schema),
        )))
    }
}
```

### Extension Planners

Custom logical plan nodes need physical planners:

```rust
pub fn default_extension_planners() -> Vec<Arc<dyn ExtensionPlanner>> {
    vec![
        Arc::new(IndexTableScanExtensionPlanner::new()),
        Arc::new(FederatedPlanner::new()),
        Arc::new(CacheInvalidationExtensionPlanner::new()),
        #[cfg(feature = "duckdb")]
        DuckDBLogicalExtensionPlanner::new(),
    ]
}
```

### Our DataFusion fork and contributions

Building Spice on Apache DataFusion meant moving quickly at layers of the engine that are still actively evolving upstream. Very early on, we made a deliberate decision to maintain a fork of DataFusion at spiceai/datafusion rather than treat it as a fixed dependency:

```toml
datafusion = { git = "https://github.com/spiceai/datafusion", rev = "10b5cc5" }
```

The benefits of maintaining our own fork include:

Faster iteration: We can ship features before they're merged upstream. Some patches are tightly coupled to Spice-specific concepts: federation semantics, acceleration policies, or execution behaviors that don't generalize cleanly to other DataFusion users. Keeping those changes in our fork lets us move fast without forcing premature abstractions into the core engine.

Predictable stability: We control when we rebase, when we absorb breaking changes, and how we roll out upgrades. This is critical for a production system that spans dozens of connectors and execution paths.

That said, we work hard to avoid drifting away from the community. When improvements are broadly useful, such as bug fixes, performance optimizations, clearer APIs, or missing documentation, we contribute them back upstream. We stay close to DataFusion's main branch and regularly rebase our fork, treating upstream not as an external dependency but as a shared foundation we help maintain.

    " } /> Lessons learned from building on DataFusion' } /> After building Spice on top of DataFusion in production for multiple years, a few patterns and lessons have consistently stood out:

1. TableProvider is incredibly powerful: The TableProvider abstraction lets us add any data source without modifying DataFusion. We've implemented 20+ connectors this way.

    2. Optimizer rules compose well: Each rule does one thing. Cache invalidation, aggregate pushdown, and empty join elimination all coexist without conflicts.

3. Physical planning is the escape hatch: When logical transformations aren't enough, custom ExecutionPlan implementations let us do anything - fallback streams, schema casting, managed runtimes.

4. Schema metadata is your friend: Arrow schema metadata flows through the entire pipeline (see the sketch after this list). We use it for:

  • Source tracking (which connector)
  • Acceleration status (accelerated vs. federated)
  • Optimization hints (enable aggregate pushdown)
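As a rough illustration with plain Arrow APIs (the metadata keys here are hypothetical, not Spice's actual key names):

```rust
use std::collections::HashMap;
use std::sync::Arc;

use datafusion::arrow::datatypes::{DataType, Field, Schema};

fn main() {
    // Attach engine hints as schema-level metadata; keys are illustrative.
    let metadata = HashMap::from([
        ("spice.connector".to_string(), "postgres".to_string()),
        ("spice.accelerated".to_string(), "true".to_string()),
        ("spice.agg_pushdown".to_string(), "enabled".to_string()),
    ]);
    let schema = Arc::new(Schema::new_with_metadata(
        vec![Field::new("region", DataType::Utf8, false)],
        metadata,
    ));

    // Any stage of the pipeline can read the hints back off the schema.
    if schema.metadata().get("spice.accelerated").map(String::as_str) == Some("true") {
        println!("route this scan to the local accelerator");
    }
}
```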
5. Async UDFs open new possibilities: DataFusion's async UDF support enables SQL-embedded AI:

```sql
SELECT ai('Summarize: ' || text) FROM articles
```

This wouldn't be possible with synchronous-only UDFs.

    "} /> 6. Federation requires dialect awareness: Different databases have different SQL. Plan for dialect translation from the start, not as an afterthought.

### Conclusion

Apache DataFusion is the foundation of Spice's query engine: parsing, planning, optimization, and execution all delivered entirely in Rust, with native Arrow memory and vectorized execution. Its design lets us extend the engine at every layer, adding custom table providers, optimizer rules, and execution operators without rewriting or wrapping the core.

    " } /> DataFusion isn't just a fast SQL engine - it's a programmable query compiler. Stable extension points like TableProvider, OptimizerRule, ExecutionPlan, and ScalarUDFImpl allow us to express federation, acceleration, search, and AI inference inside the planner and runtime, not as external systems. By building on these abstractions and contributing improvements back upstream, we get a production-grade engine that evolves with our needs rather than constraining them.

    " } /> If you have further questions about our implementation or are interested in learning more about the Spice platform, join us on Slack.

### References
  • Writing a Custom TableProvider
  • Apache DataFusion Documentation
  • DataFusion GitHub
  • DataFusion Examples
  • datafusion-federation

## Frequently Asked Questions

### What is Apache DataFusion?

Apache DataFusion is an extensible SQL query engine written in Rust that uses Apache Arrow as its in-memory columnar format. It provides SQL parsing, query planning, optimization, and execution, and is designed to be embedded into other systems via well-defined extension points like `TableProvider`, `OptimizerRule`, and `ExecutionPlan`.

### How does Spice AI use Apache DataFusion?

Spice uses DataFusion as its core query engine for parsing SQL, planning queries, optimizing execution, and running federated queries across 30+ data sources. Spice extends DataFusion with custom table providers for [SQL federation](/platform/sql-federation-acceleration), optimizer rules for acceleration routing, UDFs for [hybrid search](/platform/hybrid-sql-search) and [LLM inference](/platform/llm-inference), and physical operators for schema casting and fallback streams.

### What are DataFusion TableProviders?

TableProviders are DataFusion's abstraction for data sources. Each TableProvider implements methods to describe the table schema, report statistics, and produce execution plans. Spice uses custom TableProviders to connect to databases like PostgreSQL, DynamoDB, Snowflake, and Iceberg catalogs, enabling federated SQL queries across all of them.

### Can DataFusion handle federated queries across multiple databases?

Yes. Through DataFusion's `TableProvider` interface and custom optimizer rules, Spice routes queries to remote databases via predicate and projection pushdown, executes portions of queries at the source, and combines results locally. This enables joining data across PostgreSQL, S3, Snowflake, and other sources in a single SQL query.

### Does Spice contribute back to the Apache DataFusion project?

Yes. Spice maintains a fork of DataFusion for rapid iteration but regularly contributes improvements back upstream, including bug fixes, performance optimizations, and API enhancements. The team treats upstream DataFusion as a shared foundation rather than an external dependency.

---

## Interviewing at Spice AI

URL: https://spice.ai/blog/interviewing-at-spice-ai
Date: 2024-03-14T19:22:00
Description: A guide to the Spice AI interview process, covering what to expect at each stage, how we evaluate candidates, and tips for preparation.

It's been said that a leader's core job is two things: one, set direction; and two, put the right people in the right roles. That's why at Spice AI the mission, hiring, and interviewing are fundamental, and the people we bring onto the team are critical for success. This post walks through our interviewing process as a guide for candidates and to share our learnings with the broader community.

    " } /> Spice AI Hiring Principles' } /> Everything we do at Spice AI is built from first-principles. Here are our hiring principles:

  • High Standards - Every hire is exceptional and a bar raiser.
  • Mission-Driven - Candidates are excited by and want to contribute to the Spice AI mission.
  • Able - Hires are intelligent, are great communicators, and have done this before with a proven track record. As a startup, we need contributors who can hit the ground running and contribute from day 1. We're also distributed, so communication is critical.
  • Willing - Hires know the value of overcoming challenge, doing hard things, and are willing to put in the work to build something great together.
  • Clarity of thought - Hires demonstrate clear thinking and problem solving, with written pieces to back it up, because "if you're not writing, you're not thinking."
  • Great judgement - Hires have a history of making great judgement calls and decisions.
  • Vectors over scalars - Beyond table stakes, direction and velocity are more important than a single measure of skill or ability.
  • Hire people not just resumes - We're on this mission and are building this company together.
### A Unique Approach to Interviews

The Spice AI interview process includes a couple of unique elements. We generally minimize methods we believe have limited value, like whiteboard interviews.

Pair Programming: Pair programming sessions help assess collaboration, clarity of thought, and problem solving using real-world challenges. The first session is in your choice of Go or Rust, playing to your strengths, and the second in the other, demonstrating drive for results, adaptability, problem solving, and learning rate.

Book and Paper Challenge: We introduce a novel way to get to know candidates and for them to get to know the team by discussing a research paper, white paper, or leadership book. Think of it like a mini book club. We ask the candidate to lead the discussion as if they were on the team already.

Example papers: MEMTO, Temporal Fusion Transformers, ToolLLM, Forecasting at Scale.

Example books: Mindset, The Big Leap, Extreme Ownership, Grit, The Dip, The Phoenix Project.

### Standard Interview Process

Once a candidate proceeds to the engineering team, the standard interview process has two stages, either on the same or different days, and is as follows.

    The Spice AI interview process
After the interview, the team will meet separately to discuss the candidate's performance. Candidate profiles put forward as hires will then be presented in the weekly team meeting. A member of the team must advocate for the candidate for them to be given an offer.

    " } /> Keys to Success at Spice AI' } /> To succeed in becoming a part of the Spice AI team, candidates must demonstrate more than just technical proficiency. Being an A-player is table stakes. We look for individuals who are adaptable, thrive on challenge, and are deeply committed to the mission of making a better world through intelligent, AI-driven software.

### Conclusion

At Spice AI, we believe A-players want to work with other A-players to create and contribute to something meaningful. We strive to build a team of exceptional individuals aligned to the Spice AI vision and mission. Hiring is principled and uniquely designed to ensure the right people are in the right roles. The interview process is thorough and challenging, but know that if you succeed, you will join and be working with others who have also overcome it and want to build side-by-side with you.

If that resonates with you, check out our open roles at spice.ai/careers, and apply today!

### About Spice AI

Founded in June 2021 by Microsoft and GitHub alumni Luke Kim and Phillip LeBlanc, Spice AI creates technology to help developers build intelligent apps that learn and adapt.

Before co-founding Spice AI, Luke was the co-creator of Azure Incubations in the Office of the Azure CTO, where he led cross-functional engineering teams to create and develop technologies like Dapr, OAM, and Radius.

Spice AI is backed by some of the top industry angel investors and leaders, including Nat Friedman, Chairman of GitHub, Mark Russinovich, CTO of Microsoft Azure, and Thomas Dohmke, CEO of GitHub, who is also on Spice AI's board.

Spice AI also has notable VC backing from Madrona Venture Group, Basis Set Ventures, Founders' Co-op, and Picus Capital.

### Learn More
  • About Spice AI
  • Spice AI Careers
  • The Spice AI Blog
  • Spice.ai OSS GitHub
  • TechCrunch and GeekWire

---

## Introducing Spice Cayenne: The Next-Generation Data Accelerator Built on Vortex for Performance and Scale

URL: https://spice.ai/blog/introducing-spice-cayenne-data-accelerator
Date: 2025-12-17T23:58:50
Description: Spice Cayenne is the next-generation Spice.ai data accelerator built for high-scale and low latency data lake workloads.

### TLDR

Spice Cayenne is the next-generation Spice.ai data accelerator built for high-scale and low-latency data lake acceleration workloads. It combines the Vortex columnar format with an embedded metadata engine to deliver faster queries and significantly lower memory usage than existing Spice data accelerators, including DuckDB and SQLite. Watch the demo for an overview of Spice Cayenne and Vortex.

### Introduction

Spice.ai is a modern, open-source SQL query engine that enables development teams to federate, accelerate, search, and integrate AI across distributed data sources. It's designed for enterprises building data-intensive applications and AI agents across disparate, tiered data infrastructure. Data acceleration of disparate and disaggregated data sources is foundational across many vertical use cases the Spice platform enables.

Spice leans into the industry shift to object storage as the primary source of truth for applications. These object store workloads are often multi-terabyte datasets using open data lake formats like Parquet, Iceberg, or Delta that must serve data and search queries for customer-facing applications with sub-second performance. Spice data acceleration, which transparently materializes working sets of data in embedded databases like DuckDB and SQLite, is the core technology that makes these applications built on object storage functional. Embedded data accelerators are fast and simple for datasets up to 1TB; for multi-terabyte workloads, however, a new class of accelerator is required.

So we built Spice Cayenne, the next-generation data accelerator for high-volume and latency-sensitive applications.

Spice Cayenne combines Vortex, the next-generation columnar file format from the Linux Foundation, with a simple, embedded metadata layer. This separation of concerns ensures that both the storage and metadata layers are fully optimized for what each does best. Cayenne delivers better performance and lower memory consumption than the existing DuckDB, Arrow, SQLite, and PostgreSQL data accelerators.

This post explains why we built Spice Cayenne, how it works, when it makes sense to use it instead of existing acceleration options, and how to get started.

### How data acceleration works in Spice

Spice accelerates datasets by materializing them in local compute engines - Apache DataFusion + Apache Arrow, SQLite, or DuckDB - in-memory or on-disk. This provides applications with high-performance, low-latency queries and dynamic compute flexibility beyond static materialization. It also reduces network I/O, avoids repeated round-trips to downstream data sources, and as a result, accommodates applications that need to access disparate data, join that data, and make it really fast. By bringing frequently accessed working sets of data closer to the application, Spice delivers sub-second, often single-digit-millisecond queries without requiring additional clusters, ingestion pipelines, or ETL.
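For example, accelerating a dataset takes only a few lines of Spicepod configuration (a minimal sketch; the S3 path and dataset name are hypothetical):

```yaml
datasets:
  - from: s3://my-bucket/events/
    name: events
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb   # or arrow, sqlite, cayenne
      mode: file       # on-disk; use memory for in-memory
      refresh_check_interval: 10m
```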

To support the wide range of enterprise workloads run on Spice, the platform includes multiple acceleration engines suited to different data shapes, query patterns, and performance needs. The Spice ethos is to offer optionality: development teams can choose the engine that best fits their requirements. The current acceleration engines are:

  • PostgreSQL: PostgreSQL is great for row-oriented workloads, but is not optimized for high-volume columnar analytics. 
  • Arrow (in-memory): Arrow is ideal for workloads that need very fast in-memory access and low-latency scans. The tradeoff is that data isn't persisted to disk and more sophisticated operations like indexes aren't supported. 
  • DuckDB: DuckDB offers excellent all-around performance for medium-sized datasets and analytical queries. Single file limits and memory usage, however, can become a constraint as data volume grows beyond a terabyte. 
  • SQLite: SQLite is a lightweight option that excels for smaller tables and row-based lookups. SQLite's single-writer model, single-file limits, and limited parallelism make it less ideal for larger or analytical workflows.
  • " } /> Why we built Spice Cayenne'} /> Enterprise workloads on multi-terabyte datasets stored in object storage share a common set of pressure points; the volume of data continues to increase, more applications and services are querying the same accelerated tables at once, and teams need consistently fast performance without having to manage extra infrastructure.

Existing accelerators perform well at smaller scale but run into challenges at different inflection points:

  • Single-file architectures create bottlenecks for concurrency and updates.
  • Memory usage of embedded databases like DuckDB can be prohibitive.
  • Database and search index creation and storage can be prohibitive.
These constraints inspired us to develop a next-generation accelerator built for petabyte scale, one that keeps metadata operations lightweight and maintains low-latency, high-performance queries even as dataset sizes and concurrency increase. It was also critically important that the underlying technologies aligned with the Spice philosophy of open source with strong community support and governance.

Spice Cayenne addresses these requirements by separating metadata and data storage into two complementary layers: the Vortex columnar format and an embedded metadata engine.

### Spice Cayenne architecture

Cayenne is built with two core concepts:

### 1. Data: Vortex Columnar Format

Data is stored in Vortex, the next-generation open-source, Apache-licensed format under the Linux Foundation.

Compared with Apache Parquet, Vortex provides:

  • 100x faster random access
  • 10-20x faster full scans
  • 5x faster writes
  • Zero-copy compatibility with Apache Arrow
  • Pluggable compression, encoding, and layout strategies
Source: Vortex GitHub

Vortex has a clean separation of logical schema and physical layout, which Cayenne leverages to support efficient segment-level access, minimize memory pressure, and extend functionality without breaking compatibility. It draws on years of academic and systems research including innovations from projects like YouTube's Procella, FSST, FastLanes, ALP/G-ALP, and MonetDB/X100 to push the boundaries of what's possible in open-source analytics.

Extensible and community-driven, Vortex is already integrated with tools like Apache Arrow, DataFusion, and DuckDB, and is designed to support Apache Iceberg in future releases. It's also the foundation of commercial offerings from SpiralDB and PolarSignals. Since version 0.36.0, Vortex guarantees backward compatibility of the file format.

### 2. Metadata Layer

Cayenne stores metadata in an embedded database. SQLite is supported today, but aligned with the Spice philosophy of optionality, the design is extensible for pluggable metadata backends in the future. Cayenne's metadata layer was intentionally designed to be as simple as possible, optimizing for maximum ACID performance.

    " } /> The metadata layer includes:

  • Schemas
  • Snapshots
  • File tracking
  • Statistics
  • Refreshes 
All metadata access is done through standard SQL transactions. This provides:

  • A single, local source of truth
  • Fast metadata reads
  • Consistent ACID semantics
  • No external catalog servers
  • No scattered metadata files
A single SQL query retrieves all metadata needed for query planning (see the illustrative sketch below). This eliminates round-trip calls to object storage, supports file pruning, and reduces sensitivity to storage throttling.
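For intuition only - the schema below is hypothetical, not Cayenne's actual metadata layout - a planner-side lookup could be a single local query of roughly this shape:

```sql
-- Current snapshot, its live files, and pruning statistics,
-- fetched in one local transaction (illustrative schema only).
SELECT f.file_path, f.row_count, s.min_value, s.max_value
FROM snapshots sn
JOIN files f ON f.snapshot_id = sn.snapshot_id
JOIN stats s ON s.file_id = f.file_id AND s.column_name = 'order_date'
WHERE sn.is_current = 1;
```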

Together, the metadata engine and Vortex format enable Cayenne to scale beyond the limits of single-file engines while keeping acceleration operationally simple.

### Benchmarks

So, how does Spice Cayenne stack up to the other accelerators?

We benchmarked Cayenne against DuckDB v1.4.2 using industry-standard benchmarks (TPC-H SF100 and ClickBench), comparing both query performance and memory efficiency. All tests ran on a 16 vCPU / 64 GiB RAM instance (AWS c6i.8xlarge equivalent) with local NVMe storage. Cayenne was tested with Spice v1.9.0.

Cayenne accelerated TPC-H queries 1.4x faster than DuckDB (file mode) and used nearly 3x less memory.

Cayenne was 14% faster than DuckDB file mode, and used 3.4x less memory.

Spice Cayenne achieves faster query times and drastically lower memory usage by pairing a purpose-built execution engine with the Vortex columnar format. Unlike DuckDB, Cayenne avoids monolithic file dependencies and high memory spikes, making it ideal for production-grade acceleration at scale.

### Getting started with Spice Cayenne

Use Cayenne by specifying `engine: cayenne` in the Spicepod.yml (dataset configuration).

Following are a few example configurations.

Basic:

```yaml
datasets:
  - from: spice.ai:path.to.my_dataset
    name: my_dataset
    acceleration:
      engine: cayenne
      mode: file
```

Full configuration:

```yaml
version: v1
kind: Spicepod
name: cayenne-example
datasets:
  - from: s3://my-bucket/data/
    name: analytics_data
    params:
      file_format: parquet
    acceleration:
      engine: cayenne
      enabled: true
      refresh_mode: full
      refresh_check_interval: 1h
```

### Memory

Memory usage depends on dataset size, query patterns, and caching configuration. Vortex's design reduces memory overhead by using selective segment reads and zero-copy access.

    " } /> Storage'} /> Disk space is required for:

  • Vortex columnar data
  • Temporary files during query execution
  • Metadata tables
Provision storage according to dataset size and refresh patterns.

### Roadmap

Spice Cayenne is in beta and still evolving. We encourage users to test Cayenne in development environments before deploying to production.

Upcoming improvements include:

  • Index support
  • Improved snapshot bootstrapping
  • Additional metadata backends
  • Advanced compression and encoding strategies
  • Expanded data type coverage
The goal for Spice Cayenne stable is to be the fastest, most efficient accelerator across the full range of analytical and operational data and AI workloads at terabyte and petabyte scale.

### Conclusion

Spice Cayenne represents a step-function improvement in Spice data acceleration, designed to serve multi-terabyte, high-concurrency, and low-latency workflows with predictable operations. By pairing an embedded metadata engine with Vortex's high-performance format, Cayenne offers a scalable alternative to single-file accelerators while keeping configuration simple.

    " } /> Spice Cayenne is available in beta. We welcome feedback on the road to its stable release.


## Frequently Asked Questions

### What is Spice Cayenne?

Spice Cayenne is a data accelerator engine built for multi-terabyte, low-latency [data lake acceleration](/use-case/datalake-accelerator) workloads. It combines the [Vortex columnar format](https://github.com/vortex-data) with an embedded SQLite-backed metadata engine to deliver faster queries and lower memory usage than DuckDB or Arrow-based alternatives.

### How does Cayenne compare to DuckDB for data acceleration?

On TPC-H SF100 benchmarks, Cayenne delivers 1.4x faster query execution and uses 3x less memory than DuckDB file mode. On ClickBench, Cayenne is 14% faster with 3.4x less memory. These gains come from the Vortex format's zero-copy Arrow compatibility and fine-grained pruning capabilities, which avoid the monolithic file dependencies that drive DuckDB's memory spikes.

### What is the Vortex columnar format?

Vortex is an open-source columnar file format under the Linux Foundation, designed as a modern alternative to Apache Parquet. It provides 100x faster random access, 10-20x faster scans, and 5x faster writes compared to Parquet. Vortex is zero-copy compatible with Apache Arrow, meaning data can be queried directly without conversion overhead.

### When should I use Cayenne instead of DuckDB or Arrow acceleration in Spice?

Use Cayenne for large-scale data lake workloads (hundreds of gigabytes to multi-terabyte datasets) where memory efficiency and consistent query performance matter. DuckDB and Arrow remain good choices for smaller datasets or when DuckDB-specific SQL extensions are needed. Cayenne is the recommended default accelerator for production [SQL federation and acceleration](/platform/sql-federation-acceleration) deployments.

---

## Making Apps That Learn And Adapt

URL: https://spice.ai/blog/making-apps-that-learn-and-adapt
Date: 2021-11-05T05:55:47
Description: Building intelligent applications is still too hard for most developers - not because ML is impossible, but because it's treated as something separate from the app.

In the Spice.ai announcement blog post, we shared some of the inspiration for the project stemming from challenges in applying and integrating AI/ML into a neurofeedback application. Building upon those ideas, in this post, we explore the shift in approach from a focus on data science and machine learning (ML) to apps that learn and adapt.

As a developer, I've followed the AI/ML space with keen interest and been impressed with the advances and announcements that only seem to be increasing. stateof.ai recently published its 2021 report, and once again, it's been another great year of progress. At the same time, it's still more challenging than ever for mainstream developers to integrate AI/ML into their applications. For most developers, where AI/ML is not their full-time job, and without the support of a dedicated ML team, creating and developing an intelligent application that learns and adapts is still too hard.

    " } /> Most solutions on the market, even those that claim they are for developers, focus on helping make ML easier instead of making it easier to build applications. These solutions have been great for advancing ML itself but have not helped developers leverage ML in their apps to make them intelligent. Even when a developer successfully integrates ML into an application, it might make that application smart, but often does not help the app continue to learn and adapt over time.

Traditionally, the industry has viewed AI/ML as separate from the application. A pipeline, service, or team is provided with data, which trains on that data, and can then provide answers or insights. These solutions are often created with a waterfall-like approach, gathering and defining requirements, designing, implementing, testing, and deploying. Sometimes this process can take months or even years.

With Spice.ai, we propose a new approach to building applications. By bringing AI/ML alongside your compute and data and incorporating it as part of your application, the app can incrementally adopt recommendations from the AI engine and, in addition, the AI engine can learn from the application's data and actions. This approach shifts from waterfall-like to agile-like, where the AI engine ingests streams of application and external data, along with the results of the application's actions, to continuously learn. This virtuous feedback cycle from the app to the AI engine and back again enables the app to get smarter and adapt over time. In this approach, building your application is developing the ML.

    " } /> Being part of the application is not just conceptual. Development teams deploy the Spice.ai runtime and AI engine with the application as a sidecar or microservice, enabling the app services and runtime to work together and for data to be kept application local. A developer teaches the AI engine how to learn by defining application goals and rewards for actions the application takes. The AI Engine observes the application and the consequences of its actions, which feeds into its experience. As the AI engine learns, the application can adapt.

    Figure 1: The intelligent app flywheel
As developers shift from thinking about disparate applications and ML to building applications where AI that learns and adapts is integrated as a core part of the application logic, a new class of intelligent applications will emerge. And as technical talent becomes even more scarce, applications built this way will be necessary, not just to be competitive, but to be built at all.

In the next post, I'll discuss the concept of Spicepods, bundles of configuration that describe how the application should learn, and how the Spice.ai runtime hosts and uses them to help developers make applications that learn.

    " } /> Learn more and contribute' } /> Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," join our community Slack, or reach out on Twitter.

We are just getting started! 🚀

Luke

---

## Making Object Storage Operational for Real-Time and AI Workloads

URL: https://spice.ai/blog/making-object-storage-operational
Date: 2025-10-06T17:37:00
Description: Transform object stores into real-time AI platforms. Spice adds federation, acceleration, hybrid search, and inference capabilities.

### TLDR
  • Object storage and open table formats deliver nearly limitless scalability and cost efficiency, making them important pieces of modern data architectures. 
  • Despite their advantages, object storage can't function as an independent solution for workloads that require millisecond latency, sophisticated queries, or AI-driven retrieval, where throughput-optimized designs and limited query expressiveness introduce bottlenecks.
  • Spice expands the utility of these systems by pushing object storage closer to the application layer and layering on more advanced compute capabilities. 
  • Spice's federation and acceleration eliminates ETL and transforms object storage into a functional data layer for operational applications and AI agents.
### Introduction

Although legacy systems and workflows remain common, many enterprises are re-evaluating their architectures to meet new demands - driven in part, but not exclusively, by AI - that require support for more data-intensive and real-time applications.

The underlying storage needs for these novel workloads are generally outside the bounds of a traditional operational database for a handful of reasons - namely scalability, flexibility (the need to support heterogeneous data types), or availability (or some combination thereof).

Object storage systems have experienced a renaissance in this environment, often being re-purposed or augmented for more transactional use cases than they've historically supported. Platforms such as Amazon S3 and MinIO provide the scalability to handle petabytes of data, the cost efficiency of commodity hardware and open-source software, and the simplicity of a flat architecture that reduces management overhead. Although object storage systems don't offer some of the guardrails of operational databases like strong consistency, many operational use cases tolerate eventual consistency. Common scenarios like rate-limiting or feature lookups, for example, don't mandate strong consistency, and object storage systems help development teams avoid the performance tax strong consistency can impose.

These attributes have made object storage a source of truth for operational workloads; development teams get the dual benefit of reduced system complexity while maintaining high reliability.

### Challenges for Object Storage in Demanding Operational Workloads

Unfortunately, there's no free lunch in technology.

    " } /> Object storage systems also come with significant tradeoffs for more performance-sensitive workloads. 

  • Object storage systems optimized for throughput rather than responsiveness introduce higher latency, limiting their use in real-time scenarios. 
  • The object storage key-value model makes complex SQL queries difficult to express and slows down analytical flexibility. 
  • Managing governance, consistency, and security at scale becomes a challenge in environments limited to eventual consistency. 
  • AI and ML workloads, which rely on random access patterns and low-latency retrieval, are not natively optimized for object storage.
  • Finally, for enterprises migrating from legacy databases, re-engineering data formats and pipelines to fit object stores can introduce complexity, cost, and downtime.
While object stores are now ubiquitous in enterprise environments, they can't serve as an independent solution for the operational and AI-driven workloads now shaping many application access patterns.

    " } /> Open Table Formats: Structuring Data for Performance and Governance' } /> Open table formats like Apache IcebergDelta Lake, and Apache Parquet represent a step function of improvement for these more demanding operational data workloads by introducing database-like capabilities to object storage. These formats address the shortcomings of raw object storage, such as lack of transactional support and poor query performance, making them ideal for managing structured operational data:

  • Consistency Optionality: ACID transactions ensure reliable updates, while eventual consistency aligns with use cases where brief sync delays are tolerable.
  • Query Performance: Optimizations like data skipping and indexing make complex queries fast.
  • Governance and Security: Features like schema enforcement and audit trails support compliance.
  • Migration Support: Structured formats ease transitions from legacy systems by mimicking database functionality.
However, open table formats are still not a panacea for all operational workloads. They improve governance and query planning, but they don't solve the performance challenges of running federated queries across multiple operational and analytical systems or powering AI applications that embed both structured and unstructured data. Different tools for different jobs, as they say.

    " } /> What if you could maintain all of the great attributes of object storage and open table formats, but add the orchestration necessary to actually power your application without a bunch of ETL pipelines?

Well, you now can with Spice.ai.

### Transforming Object Storage into a High-Performance Data Layer with Spice

Spice was purpose-built to solve this problem. By unifying SQL query federation and acceleration, search and retrieval, and LLM inference into a single, deploy-anywhere runtime, Spice makes it possible to serve data and AI-powered experiences directly from your existing object storage - securely, at low latency, and without sacrificing the simplicity and economics of object storage. Built in Rust on top of modern open-source technologies like Apache DataFusion (query optimization), Apache Arrow (in-memory processing), DuckDB (fast analytics), Apache Iceberg (open table format), and OpenTelemetry (observability), Spice transforms object storage into a high-performance data layer equipped to serve the most demanding operational workloads.

It's a lightweight (~150MB) and portable runtime that:

    "} />
  • Federates, Materializes, and Accelerates Data: Run SQL queries across databases, data lakes, and APIs without moving data. Store hot data in-memory or locally using Apache Arrow, DuckDB, or SQLite for sub-second queries.
  • Delivers Hybrid Search Across Unstructured and Structured Data: Execute keyword, vector, and full-text search from a single SQL query. 
  • Serves AI Models: Support local or hosted AI models, tying real-time data to AI outputs.
    Figure 1: Spice.ai Compute Engine
### One Runtime for All Your Data

Where others solve one piece of the problem (search, query, or inference), Spice brings these capabilities together in one platform. The result is faster delivery of high-performance applications, with fewer moving parts to operate and maintain.

As you can imagine, Spice's value goes beyond operationalizing object stores. With Spice you can federate SQL across transactional and analytical systems, join it with Parquet in S3 or Iceberg tables, and avoid the latency and cost of moving data back and forth.

    " } /> You can run Spice wherever your application lives: as a sidecar for edge workloads, a microservice in the cloud, or a managed deployment. The benefit of this deployment optionality is that it gives applications and AI a controlled execution layer rather than direct database access.

For more on the Spice architecture, visit the OSS overview here.

    Figure 2: AI-driven architecture with Spice.ai
### Real-World Impact: Twilio, Barracuda, and NRC Health

Let's take this out of the abstract and into some real-world applications built on Spice.

    " } /> Twilio: Database CDN for Messaging Pipelines' } /> For Twilio, consistently fast data access is mission-critical. In their messaging pipelines, even a brief database outage could cascade into service interruptions. With Spice, Twilio stages critical control-plane datasets in object storage, then accelerates them locally for sub-second queries. This not only improved P99 query times to under 5ms but also introduced automated multi-tenancy controls that propagate updates in minutes instead of hours. By reducing reliance on direct database queries and adding a resilient S3 failover path, Twilio doubled data redundancy and improved overall reliability - all with a lightweight container drop-in.

### Barracuda: Datalake Accelerator for Email Archives

By deploying Spice as a datalake accelerator, Barracuda reduced P99 query times to under 200 milliseconds and moved audit logs into cost-efficient Parquet files on S3, which Spice queries directly. The shift not only eliminated costly data lakehouse queries but also reduced load on Cassandra, improving stability across the infrastructure. The result was a faster, more reliable customer experience at a fraction of the cost.

### NRC Health: Data-Grounded AI for Healthcare Insights

NRC Health needed a way to build secure, data-grounded AI features that could integrate multiple internal platforms - from MySQL and SharePoint to Salesforce - without lengthy development cycles. Spice provided a unified, AI-ready data layer that developers could access through a single interface. Developers found it easier to experiment with embeddings, search, and inference directly in Spice, avoiding the complexity of stitching together bespoke pipelines. The result is faster innovation and AI features grounded in real, relevant healthcare data.

    ' } /> Conclusion'} /> Object storage and open table formats have become critical parts of modern enterprise data infrastructure, but they were not designed to serve real-time operational or AI-driven workloads on their own. Spice fills that gap by pairing federation with acceleration, search, and inference, turning data lakes into low-latency, AI-ready data layers. For enterprises hoping to get the most leverage possible out of their operational data, Spice is the catalyst.

    ' } /> Getting Started with Spice'} /> Spice is open source (Apache 2.0) and can be installed in less than a minute on macOS, Linux, or Windows, and also offers an enterprise-grade Cloud deployment

## Frequently Asked Questions

### Why is object storage alone not enough for real-time workloads?

Object storage (S3, ADLS, GCS) and open table formats like Apache Iceberg were designed for analytical batch processing, not low-latency queries. Read latencies of hundreds of milliseconds and the overhead of scanning metadata and Parquet files make them too slow for operational applications and AI agents that need sub-second responses.

### How does Spice make object storage operational?

Spice pairs [SQL federation](/platform/sql-federation-acceleration) with a [data acceleration](/use-case/datalake-accelerator) layer that materializes working sets from object storage into local engines such as Arrow, DuckDB, or SQLite. Applications query Spice with standard SQL and get sub-millisecond responses while the source data remains in the data lake.

### Can Spice handle both search and structured queries on object storage data?

Yes. Spice provides [hybrid search](/platform/hybrid-sql-search) that combines keyword (BM25), full-text, and vector similarity search alongside standard SQL queries in a single runtime. This lets you build applications that need both analytical queries and semantic retrieval over the same data lake.

---

## On Writing

URL: https://spice.ai/blog/on-writing
Date: 2024-05-23T19:14:00
Description: Writing is fundamental to formalizing thoughts, communicating effectively, and is the ultimate creation tool.

Words matter. A single word can throw you into the depths of despair or raise you to euphoria. Every significant civilization, culture, and religion has placed emphasis on them because words are how we create. Every idea starts with words which develop, grow, and materialize through the process of writing. Writing is fundamental to formalizing thoughts, communicating effectively, and is the ultimate creation tool.

### Thinking Formalized

Writing is thinking formalized. We think and reason in words, so to think critically, you must write. Paul Graham, the founder of Y Combinator, has written at length on the power and necessity of putting ideas into words as a "severe test" to know something well. Graham also quotes Turing Award winner Leslie Lamport:

"If you're thinking without writing, you only think you're thinking."

Writing well shows clarity of thought, but confusing, illogical, and verbose writing betrays poor thinking. Richard Guindon wrote "Writing is nature's way of letting you know how sloppy your thinking is."

Part of Amazon's success is credited to its culture of writing and Jeff's requirement for 6-pagers. Instead of PowerPoint presentations, detailed documents are required, which are read in silence at the beginning of meetings. Writing documents ensures deep thinking, thorough understanding, critical evaluation, and effective communication. Documents also level the playing field for meeting participants, so everyone starts with the same contextual foundation, facilitating higher-quality discussions.

### Writing at Spice AI

At Spice, we believe great writing is a reflection of clear thinking; moreover, the writing process helps create and crystallize that clear thinking. We infuse writing into everything we do, which started with the Spice AI vision, our first-principles, and includes the mission and strategy, customer discovery, problem-solution, product management, and engineering. Spice board meetings start with a board memo pre-read and discussion rather than a deck. And across the company we document major decisions using Spice Decision Records (SDRs), a version of ADRs that extends beyond engineering.

Writing is leveraged throughout the customer-discovery and product creation process. It starts with written notes from Mom Test conversations, extends to using Amazon-style docs to define product vision and value proposition, which are iteratively tested with prospects and customers, and culminates in distilled requirements for engineering and clear messaging for customers and stakeholders.

Precision, concision, and structure are important, but writing doesn't have to take a lot of time. We start with simple templates that define the What or Goal-State of what we want to achieve, Why it should be prioritized, and By When. Here's the GitHub Issue template we use.

At Spice, writing is crucial in creating clarity, communicating effectively, and working as one team. A culture of writing ensures everyone is on the same page regarding goals, expectations, and plans. Great engineers are great communicators, a trait also recognized by other leaders like Paul Dix, CTO of InfluxData.

For those early in their careers, clear thinking is table stakes. As you rise in seniority, what you ultimately get paid for is making and communicating a small number of high-quality decisions. Writing well becomes even more important as you progress in your career, especially in teams like Spice, where communication and leadership (creating clarity) are core values. At Spice, everyone is expected to write well.

### Writing in the age of ChatGPT

It's seductive to outsource your writing to AI, but if you do, your ability to think critically and create clarity for yourself and others will atrophy, along with your ability to create and forge your own destiny in the world. At Spice, while AI is encouraged for ideation and research, our guidance is to avoid using ChatGPT and other generative AI tools for writing, to maintain our critical thinking muscle. Some engineers take this even further and intentionally disable tools like GitHub Copilot when writing core or critical code.

    " } /> Principles for writing'} />
  • Audience. Be aware of, understand, and target a specific audience. Don\'t waste their time.
  • Clarity. Strive for clear, articulate, and straightforward expression. Be as specific, precise, and concise as possible. Remove all unnecessary words.
  • Narrative. Use narrative, story, or essay form, and avoid bullet-points except for specific lists.
  • Structure. Use structure to organize ideas coherently. E.g. write emails using the Inverted Pyramid structure.
  • Revision. Read, edit, and revise until the piece is as clear, precise, concise, and well-structured for the audience as possible.
  • ' } /> Conclusion'} /> Writing forces us to articulate ideas precisely which creates clarity. Through clarity we generate certainty - one of the 6 human needs. Through high certainty we can lead ourselves and others to do great things and ultimately create the world we desire.

    ' } /> Learning Resources'} />
  • Amazon on writing.
  • Jordan Peterson on Writing and his Essay Writing Guide.
  • On Writing Well by William Zinger
  • The Elements of Style by Strunk and White.
  • Paul Graham\'s Essays.
  • People\'s words and actions can actually shape your brain - TED Ideas.
  • ' } />

---

## Operationalizing Amazon S3 for AI: From Data Lake to AI-Ready Platform in Minutes

URL: https://spice.ai/blog/operationalizing-amazon-s3-for-ai
Date: 2026-02-03T18:03:10
Description: Transform Amazon S3 from passive storage to an AI-ready platform. Real-world example using Spice and S3 for hybrid search and LLM inference.

**TL;DR:** Amazon S3 has evolved into one of the most flexible and powerful systems of record, but leveraging it for real-time AI workloads requires significant distributed systems work to stitch together applications, databases, vector stores, and caches.

Spice handles ingestion, federation, acceleration, caching, and query execution, so teams can build AI applications and agents directly on S3 without that distributed systems complexity.

This post walks through a real-world example of using Spice to transform Amazon S3 from a passive object store into a low-latency, AI-ready execution layer.

### From storage to system of record

Amazon S3 is durable, predictable, and cost-effective at massive scale. Its API has become the storage standard, making it one of the most portable foundations in the modern data stack.

But S3 wasn't designed for serving low-latency, operational workloads. To serve application queries, teams generally copy data into databases, search engines, and caches.

Recently, however, AWS has extended S3 with primitives that can make it the backbone of AI workloads:

  • S3 Tables bring managed Apache Iceberg tables directly into S3. Structured, tabular data that traditionally lived in databases or data warehouses can be queried with the simplicity of the S3 storage layer.
  • S3 Vectors is purpose-built vector storage designed for embeddings at petabyte scale. It can power search and RAG, grounding foundation models with proprietary data and enabling semantic search that understands meaning rather than just matching keywords.

These foundational capabilities change what's possible when you're building AI applications. They are, however, still only storage primitives that don't solve all of the distributed systems complexity you end up facing at scale: ingestion, federation, caching, sharding, partitioning, query optimization, re-ranking, observability, and more. Infrastructure complexity compounds quickly.

Even a nominally straightforward Q&A application needs to ingest structured data (questions, answers, metadata) and unstructured data (long-form text), vectorize the data to produce embeddings (for semantic search) and keyword indexes (for exact matches), and provide high-quality context for LLM inference (for analysis and generation). To serve these workloads, you're still required to build and operate application-layer distributed systems code to deliver the system at scale.

    " } />
    Figure 1. S3 and S3 Vectors are great storage primitives. Production AI systems need federation, ingestion, acceleration, hybrid search, and distributed query on top.
    ' } /> The missing piece is a runtime that turns S3 into an operational, AI-ready system that handles query execution, data lake acceleration, and coordination across structured data, vectors, and real-time sources. 

    ' } /> Spice AI: The operational layer for S3 ' } /> Spice integrates your applications and your data infrastructure - S3, databases, data warehouses, unstructured stores - and handles the distributed systems complexity for you. By unifying SQL query federation and accelerationsearch and retrieval, and LLM inference into a single, deploy-anywhere runtime, Spice makes it possible to serve data and AI-powered experiences directly from S3 - securely, at low latency, and without sacrificing the simplicity and economics of object storage. Spice is built in Rust with modern open-source technologies including Apache DataFusionApache ArrowApache Iceberg, and Vortex.  

    ' } /> Here are four specific challenges Spice addresses for these AI-driven workloads: 

### Federate data where it lives

Spice executes SQL queries across databases, data lakes, and APIs with zero ETL. For example, you can write a single query that joins live data in S3 Tables and Aurora or combines historical logs in S3 with real-time operational data from DynamoDB.

Spice has native data connectors across 35+ data sources: everything from S3, PostgreSQL, MySQL, Snowflake, Databricks, DynamoDB, and more.

### Accelerate data for latency-sensitive AI apps

Spice implements tiered caching and acceleration using embedded databases including DuckDB, SQLite, and Vortex-based Spice Cayenne. Frequently accessed data is materialized locally for millisecond queries.

This is critical for AI workloads: applications and agents making hundreds of queries per second can't wait for S3 round trips on every request.

    " } /> Hybrid search across structured and unstructured data ' } /> Search is a foundational primitive for AI applications - RAG and agents rely on it to build high-quality, relevant context. But in traditional architectures, search lives in separate systems (Elasticsearch, Pinecone, etc.) with their own APIs, integrations, and copies of data. 

    ' } /> Spice takes an opinionated stance: search should use the same SQL interface as query, with indexes built off the same copy of data. 

    ' } /> Spice combines BM25 full-text search with vector similarity search and native re-ranking, enabling developers to execute hybrid search (vector and full-text search) from a single SQL query. Spice does the work to build partitioned data indexes, manage metadata for filtering, and parallelizes cross-index scatter-gather queries. 

### Serve AI models

When you want to pipe query and search results into an LLM for analysis, classification, or generation, Spice provides built-in SQL functions for inference to models hosted on Bedrock, OpenAI, or self-hosted local models with full GPU acceleration. Unlike alternative solutions, you don't need to wire external API calls, write orchestration code, or copy data between systems.

You get S3's scale and cost efficiency without the distributed systems complexity in your application layer.

Figure 2. The data and AI substrate for AWS applications: Spice provides federated SQL, acceleration, hybrid search, and LLM inference across DynamoDB, S3, S3 Vectors, S3 Tables, and more.

### Building an AI-driven search application with Spice and S3

Let's take this out of the abstract and into a real-world scenario. We're going to progressively improve the search experience for Apache Answer, the open-source Stack Overflow-style Q&A application.

    " } />
    Figure 3. Apache Answer interface
    ' } /> By the end of this walk-though, we'll have: 

    " } />
  • Real-time ingestion from Kafka streaming into S3 Tables (structured data) and S3 Vectors (embeddings) so that queries and searches are real-time and accurate  
  • ' } />
  • Federated queries that join S3 data with live data in Aurora and DynamoDB with zero ETL 
  • ' } />
  • Hybrid search combining BM25 keyword matching with semantic vector search, re-ranked with Reciprocal Rank Fusion (RRF) 
  • ' } />
  • Sub-50ms query latency with built-in caching using DuckDB and the Spice Cayenne acceleration engine. 
  • ' } />
  • AI analysis where search results are piped directly into LLMs via SQL for classification, summarization, and generation 
  • ' } /> The entire implementation is less than 100 lines of declarative YAML configuration - eliminating custom application code, orchestration, and the need to copy data. 

    ' } /> This architecture is inspired by Spice AI's Founder and CEO, Luke Kim's Talk at re:Invent 2025. You can watch the full end-to-end demo in under 10 minutes here: 

    " } /> The baseline  ' } />
    Figure 4. Native search alternates between 20,000 results and sometimes nothing at all!
    ' } /> Let\'s start with the Apache Answer default and often the baseline for applications - good, old-fashioned string matching in PostgreSQL. When you search in Apache Answer for something like "MySQL connection error", you\'re waiting several seconds for 20,000 results - most of which are completely irrelevant. Sometimes the query just times out and returns nothing at all. This is a common outcome when your search is just matching keywords in a database with no understanding of semantic meaning or ability to distinguish between a question about MySQL connection errors, and a random post that happens to mention those words in passing. 

    ' } /> Let's see how we can improve this. 

    "} /> The architecture ' } />
    Figure 5. The Apache Answer Agent architecture with Spice and S3
    ' } /> First, to enable real-time indexing, questions and answers are streamed via Debezium CDC through Kafka. Spice ingests that stream and simultaneously indexes the content for BM25 full-text search, generates vector embeddings using Amazon Titan, and materialized data locally in DuckDB. 

    ' } /> Structured data (e.g. question IDs, timestamps, tags) is then written to S3 Tables where it's queryable as Iceberg tables. Embeddings are stored in S3 Vectors, automatically partitioned by date into separate indexes with filterable metadata. When a user searches, Apache Answer queries Spice with SQL, so no new API integration is required. Spice serves queries from the local acceleration, scatter-gather queries S3 Vectors across multiple partitioned indexes, combines full-text and vector results, and returns a re-ranked result set. 

    " } /> Step 1: Configure Spice  ' } /> Here\'s the configuration for everything just described in a single YAML file (what we call a Spicepod). 

    ' } /> Historic questions are stored in an S3 bucket with DuckDB acceleration, and new records are incrementally added as they arrive: 

Figure 6: Acceleration configuration
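The original post shows this configuration as a screenshot. As a rough, hypothetical sketch of what a Spicepod dataset with DuckDB acceleration and append-style refresh might look like (the bucket, dataset name, and path are illustrative, not taken from the demo):

```yaml
datasets:
  - from: s3://my-bucket/questions/   # hypothetical bucket/path
    name: questions
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: append            # incrementally add new records as they arrive
```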
Spice's AWS Glue catalog connector sets up S3 Tables for the structured data:

Figure 7. AWS Glue configuration in Spice
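Again, the original shows a screenshot. A hedged sketch of the general shape of a catalog connector entry follows; the catalog name and the parameter key are assumptions for illustration, so check the Spice Glue connector docs for exact spellings:

```yaml
catalogs:
  - from: glue                     # AWS Glue Data Catalog connector
    name: answers_catalog          # hypothetical catalog name
    params:
      glue_aws_region: us-east-1   # illustrative; parameter name assumed
```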
S3 Vectors is used as the vector search engine, with the answer text vectorized by the Bedrock-hosted Amazon Titan model. The partition_by setting ensures data is striped across multiple indexes, so ingestion and time-based queries are incredibly fast. Metadata fields get pushed down into S3 Vectors, so users can filter by tags or date ranges without scanning every vector:

Figure 8. Amazon S3 Vectors configuration in Spice
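A hypothetical sketch of the shape of this configuration: the partition_by setting comes from the post itself, while the surrounding key names and the model reference are assumptions for illustration only:

```yaml
datasets:
  - from: s3://my-bucket/answers/       # hypothetical source
    name: answers
    columns:
      - name: answer_text
        embeddings:
          - from: titan                 # Bedrock-hosted Amazon Titan model (assumed name)
    vectors:                            # engine-selection keys assumed for illustration
      engine: s3_vectors
      params:
        partition_by: created_date      # stripe embeddings across per-date indexes
```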
The BM25 search index configuration is even simpler; you just specify which columns to index for full-text search:

Figure 9. Full-text search configuration in Spice
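Since the post notes you only specify which columns to index, a hedged sketch might look like the following (the key name is assumed for illustration):

```yaml
datasets:
  - name: answers
    columns:
      - name: answer_text
        full_text_search:     # key name assumed; marks this column for BM25 indexing
          enabled: true
```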
And finally, we're configuring Bedrock models so query and search results can be analyzed directly from SQL queries:

Figure 10: Amazon Bedrock configuration in Spice
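A minimal, hypothetical models block; the model id shown is illustrative, not copied from the demo:

```yaml
models:
  - name: nova
    from: bedrock:amazon.nova-lite-v1:0   # illustrative Bedrock model id
```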
Spice handles the complete end-to-end flow: streaming ingestion from Kafka, partitioning data into the right indexes, generating embeddings, managing caches, and exposing everything through SQL.

### Step 2: Real-time data ingestion

Figure 11. Streaming architecture with Kafka and Debezium

Now that the configuration is in place, we're ready to run Spice. Questions and answers begin streaming into the application through Kafka, which Spice processes in real-time. For each incoming record, Spice generates embeddings using Titan, indexes the content for BM25 search, partitions and shards the records based on timestamp, and pushes filterable metadata like tags and creation date into S3 Vectors.

While data is streaming in, you can query it live in the Spice Cloud Playground, where the record count ticks up in real-time as data flows through the system.

Figure 12. Ingesting real-time data with Kafka

That LIKE query returned in under 100 milliseconds, searching across a quarter million records already ingested and arriving in real-time. This is the power of tiered caching: frequently accessed queries hit the DuckDB acceleration locally instead of making round trips to S3.

### Step 3: Federate across multiple data sources

Spice doesn't just work with S3; it can federate queries across any data source in your stack. That means you can write a single SQL query that joins S3 data with data in Aurora and combines historical logs with real-time metrics from DynamoDB.

Spice pushes down queries to both sources, executes the join, and returns unified results.

### Step 4: Full-text search with BM25

Now, let's improve search results with full-text search for keyword matching and identifying specific terms. Spice provides a text_search function that hits the BM25 index we configured earlier directly (avoiding a full table scan), so if someone is searching for a specific error code or technical term, it will be included and ranked in the results based on relevance:

Figure 13. Full-text search in Spice
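The post shows the query as a screenshot; its shape follows the text_search table function used later in this document's RRF guide (table and column names here are hypothetical):

```sql
-- BM25 keyword search against the index configured above
SELECT id, title
FROM text_search(answers, 'error code 1045')
LIMIT 10;
```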
### Step 5: Semantic search with S3 Vectors

We'll now add semantic understanding with vector search. The syntax looks almost identical:

Figure 14. Vector search in Spice
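Again, as a hedged reconstruction of the screenshot's shape (names hypothetical):

```sql
-- Semantic search; Spice vectorizes the query text with the configured Titan model
SELECT id, title
FROM vector_search(answers, 'database connection problems')
LIMIT 10;
```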
Behind that simple vector_search SQL function is a lot of power. Spice automatically vectorizes the query text using the same Titan model we configured earlier. Then it searches across multiple daily indexes in S3 Vectors and combines the results. We're partitioning by date, so there might be dozens of indexes involved. It applies metadata filters to narrow down the search space, executes the similarity searches in parallel with scatter-gather, merges the results, and returns the top-k most similar answers.

You can see exactly what's happening by running an EXPLAIN query:

Figure 15. Explain plan
The query plan shows multiple parallel queries to S3 Vectors. Each box in the visualization represents a separate index being searched. Spice automatically shards the query across all relevant daily indexes, executes them in parallel, merges and ranks the results, and returns a unified result set.

This is the kind of thing you'd normally need to build yourself - writing code to manage index metadata, run parallel scatter-gather searches, handle failures and retries, and merge results with proper ranking. With Spice, you get it out of the box.

    " } /> Step 5: Hybrid search (BM25 + vector) ' } /> Full-text search and vector search combined provide higher relevance search results. Full-text-search excels at catching exact technical terms and keywords. If someone searches for "error code 1045" you want that exact match. Vector search understands semantic similarity; someone searching for "database connection problems" should find answers about "DB connectivity issues". 

    ' } /> Hybrid search combines both modalities. Here\'s how to do that with Spice: 

    ' } />
    Figure 16. Spice hybrid search
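A hedged reconstruction of what such a query looks like, based on the rrf() syntax documented in the RRF guide later in this document (names hypothetical):

```sql
-- Hybrid search: fuse BM25 and vector results with RRF (names hypothetical)
SELECT id, title, fused_score
FROM rrf(
    text_search(answers, 'error code 1045'),
    vector_search(answers, 'database connection problems'),
    join_key => 'id'
)
ORDER BY fused_score DESC
LIMIT 20;
```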
This query runs both searches in parallel, ranks each result set, then uses Reciprocal Rank Fusion (RRF) to combine them into a single ranking (for a more in-depth explainer on RRF, visit the docs).

### Step 7: Feed results into AI for analysis

Now that we've got high-quality search results, we can pipe them directly into an LLM for deeper analysis. Spice provides an ai() function that makes this trivial:

Figure 17. Spice ai() function.
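A hedged sketch of the pattern described (the exact ai() arguments are an assumption; 'nova' refers to a model that would be defined in the Spicepod's models block, and the table and column names are hypothetical):

```sql
-- Pipe search results into an LLM from SQL (names hypothetical)
SELECT ai(
    'Extract the main technology keywords from this answer: ' || answer_text,
    'nova'
) AS keywords
FROM text_search(answers, 'mysql connection error')
LIMIT 10;
```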
This query pipes the top 10 search results to Amazon Nova to extract the main technology keywords. The results come back right in the SQL result set:

Figure 18. AI query results

From this simple example, you can extrapolate some pretty interesting use cases that can be enabled with this pattern:

  • Run sentiment analysis on customer feedback at scale. For example, search for complaints about a specific product feature and analyze the emotional tone.
  • Identify security threats in logs by searching for suspicious patterns and using an LLM to assess severity.
  • Detect fraud by finding similar transaction patterns and asking an AI to explain why they're anomalous.
  • Automatically tag and categorize content by extracting themes and topics.

This is powerful - we went from raw data to AI-generated insights in just three SQL queries, without an API integration to maintain, data to ETL, or custom orchestration code.

    " } /> The results: Before and after ' } /> Let's step back and look at what we actually accomplished. We started with a Q&A application that had fundamentally broken PostgreSQL search: slow 2-3 second queries, 20,000 irrelevant results, no semantic understanding, and no ability to handle streaming data.  

    " } /> After implementing Spice with S3 Tables and S3 Vectors, we returned 20 highly relevant results. The results are dramatically better. Instead of 20,000 irrelevant results, we get 20 highly relevant answers that combine the precision of keyword matching with the semantic understanding of vector search:

    ' } />
    Figure 18. A short list of relevant, high quality results, trimmed down from the 20,000 baseline.
    ' } /> This solution incorporated:

  • 250,000+ records streaming in real-time while maintaining query performance.
  • Queries federated across multiple data sources joining user data and other context.
  • Semantic vector search alongside keyword matching for better accuracy.
  • Search results piped directly into AI models for analysis.

The difference in user experience is night and day, yet we built this entire system with minimal YAML configuration and SQL queries. Instead of spending weeks building distributed systems and infrastructure, you're defining your configuration and immediately querying data, running searches, and integrating AI into your application, turning your data lake into an AI-ready platform.

    " } /> Next steps ' } /> If you want to try this yourself, star the GitHub repo and explore the Spice.ai recipes. There are recipes for common patterns like hybrid search, S3 Vectors integration, and RAG workflows. 

    ' } /> The Spice documentation has a quickstart guide that walks through setting up your first Spice deployment and running queries. Learn more about how to get started with Spice and S3 Vectors in the launch blog, or the \'Architecting high-performance AI-driven data applications with Spice and AWS\' tutorial hosted on the AWS Storage Blog.  

    ' } /> If you\'re building AI applications with S3 and want to talk through your specific use case, join us on the Spice Community Slack. We\'d love to hear what you\'re working on.  

    ' } /> ## Frequently Asked Questions ### What does it mean to operationalize Amazon S3 for AI? Operationalizing S3 for AI means adding a runtime layer that turns object storage into a queryable, low-latency data platform for AI applications. S3 stores data durably and cheaply, but it lacks the query execution, acceleration, and vector search capabilities that AI workloads require. Spice bridges this gap by federating queries across [S3 Tables](/blog/getting-started-with-amazon-s3-vectors-and-spice), S3 Vectors, and structured data sources in a single SQL interface. ### Can S3 be used as a vector database for AI applications? Yes, with [Amazon S3 Vectors](https://aws.amazon.com/s3/features/vectors/), S3 can store and retrieve vector embeddings at scale. Spice integrates S3 Vectors as a native data source, enabling [hybrid search](/platform/hybrid-sql-search) that combines vector similarity with full-text and keyword search in a single SQL query -- without a separate vector database. ### How does Spice accelerate queries on S3 data? Spice materializes frequently accessed S3 datasets into local accelerator engines like Arrow, DuckDB, or [Cayenne](/blog/introducing-spice-cayenne-data-accelerator). This reduces query latency from seconds (raw S3 reads) to sub-millisecond responses, while the source data remains in S3. Refresh policies keep accelerated data current without manual ETL. ### What is the difference between S3 Tables and traditional S3 storage for analytics? S3 Tables provides managed Apache Iceberg table support directly in S3, adding ACID transactions, schema evolution, and partition pruning to object storage. Traditional S3 stores flat files (Parquet, CSV) without table-level semantics. S3 Tables eliminates the need for a separate catalog service and simplifies [data lake acceleration](/use-case/datalake-accelerator) workflows.

---

## Real-Time Control Plane Acceleration with DynamoDB Streams

URL: https://spice.ai/blog/real-time-acceleration-with-dynamodb-streams
Date: 2026-01-22T19:42:03
Description: How to sync DynamoDB data to thousands of nodes with sub-second latency using a two-tier architecture with DynamoDB Streams and Spice acceleration.

**TL;DR:** A global cloud communications company needed to sync their DynamoDB configuration to thousands of nodes with sub-second latency. Their multi-tier caching setup was creating cold start penalties, tight coupling, and TTL tuning overhead. The solution was a two-tier architecture using DynamoDB Streams and Spice data acceleration that eliminated cache complexity and delivered sub-second propagation.

### The challenge: Decoupling the data plane from an OLTP app

A global cloud communications company came to us with a deceptively hard problem. They were building a new data processing platform with a clear separation between control and data planes:

Control plane: An OLTP application backed by DynamoDB where customers configure their data pipelines (a single-table design holding all configuration data).

Data plane: Thousands of processing nodes needing access to this configuration with single-digit millisecond latency. When customers update configurations, changes must reflect in the pipeline within seconds.

Their initial approach used multi-tiered caching: each data plane node ran a daemon with an in-memory LRU cache backed by DAX and DynamoDB. This led to three problems:

  • Cold start penalty: Cache misses required network traversal to DAX or DynamoDB, adding latency
  • Tight coupling: Data plane nodes were directly coupled to the OLTP database; cache misses meant queries hitting DynamoDB
  • TTL tuning overhead: Constant balancing between keeping hot data local and propagating changes quickly

What they really wanted was to decouple the data plane entirely from DynamoDB by accelerating the complete dataset locally on each node instead of falling back to the source on cache miss.

### The solution: DynamoDB + local acceleration with Spice

In this post, we'll walk through how DynamoDB Streams and Spice keep accelerated datasets in sync across thousands of nodes - illustrating a pattern applicable to many distributed systems where control plane data needs to be available at the edge with ultra-low latency.

### Introduction to Spice acceleration

First, let's cover how Spice makes this architecture possible. Spice is a unified query, search, and LLM inference platform that enables data-intensive applications and AI agents. Spice can be deployed anywhere - in the cloud, on-premises, at the edge, or next to your application as a sidecar - and accelerates data access for teams querying disparate data sources with ultra-low latency.

Spice data acceleration materializes working sets from distributed data sources into local accelerator engines such as Arrow, SQLite, DuckDB, or Spice Cayenne for high-performance querying. By bringing frequently accessed data closer to compute, applications avoid repeated round-trips to source systems while achieving sub-second latency across operational and analytical workloads.

Figure 1: Spice acceleration architecture

### The target architecture

The customer's key requirements included:

    "} />
  • Scale to thousands of data plane nodes  
  • ' } />
  • Single-digit millisecond read latency from local storage  
  • ' } />
  • Sub-second replication from DynamoDB to accelerated datasets 
  • ' } />
  • Support fast cold start so new nodes receive data within seconds 
  • ' } /> With Spice's acceleration capabilities in mind, we designed a two-tiered Spice architecture: 

    " } />
    Figure 2: Spice + DynamoDB Streams architecture
    ' } /> How it works:  ' } />
  • The central Spice layer consumes DynamoDB Streams and maintains a near-real-time accelerated dataset 
  • ' } />
  • Each data plane node runs a local Spice daemon with SQLite or DuckDB that syncs from the central layer
  • Data plane processes read from localhost - no network egress or coupling to DynamoDB 
  • ' } /> DynamoDB Streams vs Kinesis ' } /> DynamoDB offers two change capture options. We evaluated both: 

  • DynamoDB Streams provides exactly-once delivery with strict ordering within each shard. Records arrive in write order with no duplicates.
  • Kinesis Data Streams can deliver duplicates and doesn't guarantee ordering, requiring deduplication logic on every message.

For keeping accelerated tables in sync, exactly-once delivery was decisive. We didn't want deduplication overhead, and the 24-hour retention is sufficient since we checkpoint continuously. The trade-offs - shorter retention and fewer consumers - were acceptable for this use case.

    " } /> Bootstrapping: The checkpoint-first approach ' } /> When connecting a DynamoDB table to Spice, we need to load current state before consuming changes. This is trickier than it sounds. 

    ' } /> The Problem with LATEST Iterators ' } /> The naive approach is to get a LATEST iterator for each shard, scan the table, and start consuming. But DynamoDB Streams iterators expire after 15 minutes. If your table takes longer to scan, your iterators are gone. 

    ' } /> Buffering changes during scan has problems too. For high-throughput tables, you could exhaust memory. For idle streams, you might never receive a message to establish position. 

### Our solution: Checkpoint first, scan second

  1. Create a checkpoint at the current stream position by walking all shards and recording their sequence numbers.
  2. Scan the entire table and load all existing rows.
  3. Subscribe using the checkpoint from step 1 and start consuming from the recorded position.

```rust
// Decide whether to bootstrap, then wrap bootstrap change batches in
// envelopes that are not yet marked ready (the `false` flag).
let (should_bootstrap, checkpoint) =
    load_or_initialize_checkpoint(&dynamodb, &dataset_name).await?;

if should_bootstrap {
    let bootstrap_stream = Arc::clone(&dynamodb)
        .bootstrap_stream()
        .await
        .map(move |msg| {
            msg.map(|change_batch| {
                ChangeEnvelope::new(Box::new(NoOpCommitter), change_batch, false)
            })
        });
```

After bootstrap completes, we commit the checkpoint and start the changes stream:

```rust
bootstrap_stream
    .chain(
        stream::once(async move {
            let committer = DynamoDBStreamCommitter::new(checkpoint_cloned);
            if let Err(err) = committer.commit() {
                tracing::error!("Failed to commit bootstrap checkpoint: {:?}", err);
            }
            stream::empty()
        })
        .flatten(),
    )
    .chain(changes_stream_from_checkpoint(&dynamodb, &checkpoint))
```

### The time travel trade-off

The checkpoint points to a moment before the scan completes. Some changes during the scan will replay afterward. The table can briefly go back in time - a row might update to an older value before catching up.

We mitigate this by not marking the dataset ready until stream lag drops below a threshold (default 2 seconds). Downstream consumers only see the dataset once it's caught up.

This approach works for any table regardless of size or throughput. There's no dependence on receiving messages within a window and no unbounded memory buffering.

    " } /> Cold start and snapshotting ' } /> For the customer's use case, cold start performance was critical. New data plane nodes need to spin up with data ready in seconds, not minutes. 

    " } /> Our solution is to snapshot the accelerated dataset to object storage with the checkpoint embedded: 

    ' } />
    Figure 3: Snapshot to S3
    ' } /> When a new node starts, it downloads the latest snapshot from S3, reads the embedded watermark, and resumes the CDC stream from that position. 

    ' } /> This gets nodes operational in seconds rather than re-scanning the entire source table. For a dataset of a few gigabytes, startup time drops from minutes to single-digit seconds. 

### Shard management with a pure state machine

DynamoDB Streams organizes data into shards with parent-child relationships. You must fully process a parent before reading children to maintain ordering.

We modeled this as a state machine:

```rust
// The map type parameters were lost in the original rendering; the key/value
// types shown here are illustrative placeholders.
pub struct StreamState {
    active: HashMap<String, ShardState>,
    initializing: HashMap<String, ShardState>,
    blocked: HashMap<String, ShardState>,
    historical: HashMap<String, ShardState>,
}
```

The key insight was to keep state transitions pure. All transitions happen through methods that take input and return results without external API calls:

```rust
// Signature types (Option<String>, Vec<Record>, Result<(), Error>) are
// illustrative reconstructions; error handling is simplified for the excerpt.
pub fn handle_poll_result(
    &mut self,
    shard_id: &str,
    new_iterator: Option<String>,
    records: Vec<Record>,
) -> Result<(), Error> {
    if let Some(iter) = new_iterator {
        if let Some(shard) = self.active.get_mut(shard_id) {
            shard.update_iterator(iter);
        }
    } else {
        // Shard exhausted: retire it and unblock its children.
        self.active.remove(shard_id);
        self.promote_children(shard_id);
    }
    Ok(())
}
```

When a shard exhausts, we promote its children from 'blocked' to 'initializing'. This separation means we can test every state transition without mocking AWS.

    " } /> Error handling: Transient vs fatal ' } /> Errors fall into two categories: 

    '} /> ```rust pub enum Error { // Permanent - require intervention TableNotFound, StreamNotFound, StreamBeyondRetention, // Retriable - resolve with retry Timeout, ConnectionFailure, Throttled, // Special handling IteratorExpired, } ``` Iterator expiration needs special treatment. DynamoDB Streams iterators expire after 15 minutes of inactivity. You can't retry with the same iterator - you need a new one from your last checkpoint:

    " } /> ```rust if error.is_retriable() { tracing::warn!("Poll error for shard {}, will retry: {}", shard_id, error); Ok(()) } else if matches!(error, Error::IteratorExpired) { tracing::warn!("Iterator expired for shard {}, reinitializing", shard_id); reinitialize_shard_with_checkpoint(shard_id); Ok(()) } else { Err(error) } } ``` For transient errors, exponential backoff with a 60-second cap prevents thundering herds while recovering quickly from brief network issues.

### Watermarks and dataset readiness

To track how far behind real-time we are, we use watermarks based on each record's approximate creation time. The minimum watermark across active shards indicates global progress.

```rust
fn combine_shard_batches(poll_results: &[ShardPollResult]) -> DynamoDBStreamBatch {
    let mut shard_watermarks = Vec::new();
    for shard_result in poll_results {
        let is_watermark_eligible = match &shard_result.outcome {
            PollOutcome::Records { .. } => true,
            PollOutcome::Failed => true, // Failed shards represent unprocessed lag
            PollOutcome::Empty => false, // Empty shards are caught up
        };
        if is_watermark_eligible {
            if let Some(watermark) = shard_result.current_watermark {
                shard_watermarks.push(watermark);
            }
        }
    }
    let watermark = shard_watermarks.into_iter().min().unwrap_or_else(SystemTime::now);
    // ... batch construction elided in the original excerpt
}
```

This watermark drives dataset readiness. A dataset is marked ready when lag drops below the threshold so downstream consumers don't see stale data during catch-up.

```rust
ChangeEnvelope::new(
    Box::new(committer),
    change_batch,
    lag.is_some_and(|l| l < acceptable_lag), // Ready signal
)
```

For the customer's use case, this means data plane processes don't see the local dataset until it's within 2 seconds of real-time - no stale reads during catch-up.

    " } /> Checkpointing for reliability' } /> Checkpoints capture sequence number positions for each shard:

    ' } /> ```rust pub struct ShardCheckpoint { pub sequence_number: String, pub parent_id: Option, pub updated_at: SystemTime, pub position: CheckpointPosition, } pub enum CheckpointPosition { At, // Resume AT this sequence (inclusive) - not yet processed After, // Resume AFTER this sequence (exclusive) - already processed } ``` On recovery, we resume from leaf shards only - those with no children in the checkpoint. Parents are already exhausted: 

    ' } /> ```rust pub fn leaf_shards(&self) -> Vec<(&String, &ShardCheckpoint)> { let parent_ids: HashSet<&str> = self.shards.values() .filter_map(|sc| sc.parent_id.as_deref()) .collect(); self.shards.iter() .filter(|(shard_id, _)| !parent_ids.contains(shard_id.as_str())) .collect() } ``` Checkpoints serialize as JSON to Spice's file-accelerated storage, enabling reliable resume after restarts. 

    " } /> Scaling to thousands of nodes ' } /> With thousands of data plane nodes, having each node consume directly from DynamoDB Streams isn't realistic. The central Spice layer acts as a fan-out point. 

    " } /> Edge nodes poll the central layer on a configurable interval using an append refresh strategy. This scales well - each edge node independently pulls updates without coordinating with others. Nodes can also filter to pull only relevant partitions, reducing data transfer for deployments where different node pools need different data subsets.  

    ' } /> For the customer's use case, different teams had different requirements: one had fewer nodes but larger datasets, while the other had smaller datasets across more nodes. The pull-based architecture handles both patterns efficiently. 

    " } /> Metrics and monitoring ' } /> DynamoDB Streams lacks built-in lag metrics, so we built our own and exposed them through OpenTelemetry: 

```rust
// Field type parameters are illustrative; they were lost in extraction.
pub struct MetricsCollector {
    pub active_shards_number: RwLock<usize>,
    pub records: AtomicUsize,
    pub transient_errors: AtomicUsize,
    pub watermark: RwLock<Option<SystemTime>>,
}
```

Exposed through OpenTelemetry:

  • shards_active: Current active shards being polled
  • records_consumed_total: Total records since startup
  • lag_ms: Current lag from watermark to wall clock
  • errors_transient_total: Recoverable error count

The lag metric is especially important for the customer's SLA; they need to verify configuration changes propagate within seconds.

    " } /> Configuration: A complete example ' } /> One of our design principles is making everything as easy as possible for developers. Here's a complete Spicepod configuration implementing the architecture described above: 

    " } /> ```yaml version: v1 kind: Spicepod name: dynamodb-streams-demo snapshots: enabled: true location: s3:// bootstrap_on_failure_behavior: fallback params: s3_auth: key s3_key: ${secrets:AWS_ACCESS_KEY_ID} s3_secret: ${secrets:AWS_SECRET_ACCESS_KEY} s3_region: us-east-2 datasets: - from: dynamodb: name:
    params: dynamodb_aws_region: ap-northeast-2 dynamodb_aws_auth: iam_role acceleration: enabled: true refresh_mode: changes engine: duckdb mode: file snapshots: enabled snapshots_trigger: time_interval snapshots_trigger_threshold: 2m metrics: - name: shards_active - name: records_consumed_total - name: lag_ms - name: errors_transient_total ``` This configuration points Spice at a DynamoDB table, enables CDC, accelerates to DuckDB, snapshots to S3 for fast cold start, falls back to bootstrap if snapshot loading fails, and exposes key metrics. 

    ' } /> That's it. No custom CDC consumers to build, checkpoint management code to write, or shard tracking logic to maintain. Point it at your table and start querying. 

    " } /> The magic moment ' } /> When our customer first deployed this configuration, their feedback was immediate: "It was too easy." They had expected weeks of integration work. Instead, they had real-time DynamoDB synchronization running in an afternoon. 

    ' } /> Lessons learned'} />
  • Choose abstractions that match your guarantees: DynamoDB Streams\' exactly-once delivery saved us from deduplication complexity. The "simpler" option with fewer features was actually less work. 
  • Bootstrap carefully: The checkpoint-first approach handles edge cases that naive strategies miss-large tables, idle streams, memory constraints. Temporary "time travel" during catch-up is an acceptable trade-off. 
  • Pure state machines pay off immediately: Separating state transitions from I/O made shard management testable and easy to reason about.
  • Build observability from day one: Without AWS-provided lag metrics, we built our own. Having watermarks and lag tracking from the start made debugging and operations much easier. 
  • Design for the scaling requirements: The two-tier architecture with push/pull flexibility handles both the "few nodes, large data" and "many nodes, small data" patterns the customer needed. 
  • ' } /> The architecture is extensible. If we need Kinesis Data Streams support for longer retention, the core state machine and checkpointing logic can be reused with a deduplication layer on top. 

### Conclusion

This pattern of using real-time change data capture to decouple application data planes from OLTP systems is increasingly common in modern, distributed architectures. What made this particular customer challenge compelling was the combination of strict latency requirements, thousands of downstream consumers, and the need for operational simplicity.

By treating DynamoDB Streams as a reliable source of truth and pairing it with accelerated, local query engines, we were able to eliminate cache complexity, remove DynamoDB from the application critical path, and deliver configuration changes across the fleet in seconds.

The same approach generalizes well beyond this use case: any system that needs fast, consistent access to changing data without rebuilding custom CDC consumers or managing fragile caching layers can benefit from this architecture.

### Get started

Check out the Spice DynamoDB connector docs, Spice's broader CDC support, and the demo below for overviews on using Spice and DynamoDB.

And if you want to dig deeper into architectures like this, ask questions, or share what you're building, join the Spice community on Slack.

---

## Real-Time Hybrid Search Using RRF: A Hands-On Guide with Spice

URL: https://spice.ai/blog/real-time-hybrid-search-using-rrf
Date: 2025-10-23T16:10:00
Description: Learn how to build hybrid search with Reciprocal Rank Fusion (RRF) directly in SQL using Spice - combining text, vector, and time-based relevance in one query for faster, more accurate results.

**TL;DR:** [Reciprocal Rank Fusion (RRF)](/platform/hybrid-sql-search) combines keyword, vector, and metadata search results into a single ranked list without score normalization. This guide walks through building real-time hybrid search with RRF directly in SQL using Spice -- from setup and embedding generation to multi-signal queries with time decay.

---

Surfacing relevant answers to searches across datasets has historically meant navigating significant tradeoffs. Keyword (or lexical) search is fast, cheap, and commoditized, but limited by the constraints of exact matching. Vector (or semantic) search captures nuance and intent, but can be slower, harder to debug, and expensive to run at scale. Combining both usually entails standing up multiple engines (e.g. Elasticsearch for text, Pinecone for vectors), writing custom ranker logic, and maintaining ETL and data sync pipelines.

As real-time, AI-powered applications and agents become ubiquitous, these compromises are less tenable. Users demand instant, context-rich results that balance precision with intent. This applies to both consumer and enterprise application search environments; for example, a business user searching across internal knowledge bases, or a customer searching for an item in their chat history on a consumer app. Waiting for data pipelines to sync, dealing with custom APIs, or troubleshooting a multi-system ranking stack introduces a variety of sub-optimal outcomes: inconsistent rankings, higher latency, or simply inaccurate results.

### Hybrid Search with Reciprocal Rank Fusion (RRF)

Reciprocal Rank Fusion (RRF) is an algorithm for hybrid search that helps mitigate the search challenges outlined above. Instead of favoring one search modality, RRF merges results from multiple independent searches and variables (text, vector, metadata, recency, etc.) by combining their ranks and giving each signal proportional influence. This avoids "winner take all" blending and delivers results that are both topically relevant and contextually meaningful.

In practice, each search query is executed independently, and the ranks of the returned results are combined using the following formula:

RRF Score = Σ(rank_weight / (k + rank))

Documents that appear across multiple result sets receive higher scores, while the smoothing parameter k controls how much rank position affects the final score (lower values make higher-ranked items more influential).

RRF can also incorporate custom weighting and temporal decay, enabling developers to:

  • Adjust the influence of each query type using the rank_weight parameter.
  • Apply recency boosting by specifying a time_column and decay function.
    • Exponential decay: e^(-decay_constant * age_in_units), where age is measured in units of decay_scale_secs
    • Linear decay: max(0, 1 - (age_in_units / decay_window_secs))

This approach lets you incorporate exact keyword matches, semantic similarity, and time-based relevance in one consistent ranking.

Figure 1: RRF Flowchart

RRF is fully integrated in Spice's hybrid search platform:

  • SQL operators combine text and vector search in a single query
  • Rank weights can be tuned per-query
  • Recency and metadata can be included with no extra code
  • No external ranking server, no additional infrastructure, and no manual pipeline management are required

Let's take this out of the abstract and review a sequence of queries that illustrate the business value of RRF in Spice, from basic to more sophisticated ranking techniques.

### Hybrid Search

```sql
-- Combine vector and text search for enhanced relevance
SELECT id, title, content, fused_score
FROM rrf(
    vector_search(documents, 'machine learning algorithms'),
    text_search(documents, 'neural networks deep learning', content),
    join_key => 'id' -- explicit join key for performance
)
WHERE fused_score > 0.01
ORDER BY fused_score DESC
LIMIT 5;
```

This first example illustrates the basic building blocks of RRF-powered hybrid search: merging semantic/vector and traditional keyword/text retrieval in one query. The result set balances conceptual relevance - capturing results related to "machine learning algorithms" - with precise keyword matches like "neural networks" or "deep learning". Using the join_key ensures that performance scales commensurately with data volume.

### Weighted Ranking

```sql
-- Boost semantic search over exact text matching
SELECT fused_score, title, content
FROM rrf(
    text_search(posts, 'artificial intelligence', rank_weight => 50.0),
    vector_search(posts, 'AI machine learning', rank_weight => 200.0)
)
ORDER BY fused_score DESC
LIMIT 10;
```

Weighting lets you fine-tune intent. Semantic results for "AI machine learning" are given four times more influence than exact text matches for "artificial intelligence." This allows development teams to favor context and meaning, surfacing more relevant content even when users don't type a precise phrase.

### Recency-Boosted

```sql
-- Exponential decay favoring recent content
SELECT fused_score, title, created_at
FROM rrf(
    text_search(news, 'breaking news'),
    vector_search(news, 'latest updates'),
    time_column => 'created_at',
    recency_decay => 'exponential',
    decay_constant => 0.05,
    decay_scale_secs => 3600 -- 1 hour scale
)
ORDER BY fused_score DESC
LIMIT 10;
```

Finally, RRF can incorporate time as a ranking signal (important for use cases like trading exchanges, news, or social media).

By specifying a time_column and a decay function, you can automatically boost time-pertinent results. In this example, exponential decay prioritizes newer stories while keeping hybrid relevance intact.

### Use Case Walk-Through

Now, let's walk through a hands-on example: capturing real-time Bluesky posts, embedding and full-text indexing them automatically, and running hybrid search queries with RRF via SQL.

### Step 1. Set up

Clone this repository:

```bash
git clone https://github.com/spiceai/cookbook.git
cd cookbook/search
```

Install websocat and set up Python:

```bash
brew install websocat
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Step 2. Preview and capture data

We can read real-time posts using Bluesky's Jetstream relay service. Use websocat to preview the stream and ensure that the relay is functional:

```bash
websocat wss://jetstream2.us-east.bsky.network/subscribe\?wantedCollections=app.bsky.feed.post | jq
{
  "did": "did:plc:ei3py27iy2orpykshoudxnls",
  "time_us": 1758813540806266,
  "kind": "commit",
  "commit": {
    "rev": "3lzoas6yujs2z",
    "operation": "create",
    "collection": "app.bsky.feed.post",
    "rkey": "3lzoas6nbhs2e",
    "record": {
      "$type": "app.bsky.feed.post",
      "createdAt": "2025-09-25T15:19:00.163Z",
      "langs": [ "ja" ],
      "text": "🧐🧐🧐🧐🧐"
    },
    "cid": "bafyreighkijp5zyclu6qdjtfskmr65ttvxvedvqmfvwgfyf6iaq4jfdje4"
  }
}
...
^C
```

Let's convert this stream into a Parquet file that Spice can read. Let this run for a little while, until satisfied with the total number of posts collected. Run again at any time to resume appending:

```bash
websocat wss://jetstream2.us-east.bsky.network/subscribe\?wantedCollections=app.bsky.feed.post | ./generate_parquet.py
[info] boot!
[info] INSERTED 250 ROWS; TOTAL 250
[info] INSERTED 250 ROWS; TOTAL 500
```

### Step 3. Start Spice and Search

In a new terminal, start Spice. It will embed, full-text index, and ingest the latest data. Additionally, the file connector uses fsnotify to watch the file for updates and eagerly ingest new data.

```bash
spice run
```

You should see this output:

```
2025-09-26T15:21:38.154354Z INFO spiced: Starting runtime v1.8.0-unstable-build.71ac09ff2+models.metal
2025-09-26T15:21:38.225135Z INFO runtime::init::caching: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-09-26T15:21:38.229824Z INFO runtime::init::caching: Initialized search results cache; max size: 128.00 MiB, item ttl: 1s
2025-09-26T15:21:38.230575Z INFO runtime::init::caching: Initialized embeddings cache; max size: 128.00 MiB, item ttl: 1s
2025-09-26T15:21:38.658824Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-09-26T15:21:38.658888Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-09-26T15:21:38.678694Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-09-26T15:21:47.550688Z INFO runtime::init::embedding: Embedding Model potion_128m ready
2025-09-26T15:21:47.659106Z INFO runtime::init::dataset: Dataset bluesky_posts initializing...
2025-09-26T15:21:47.730735Z INFO runtime::dataconnector::file: Watching changes to bluesky_posts.parquet
2025-09-26T15:21:47.730999Z INFO runtime::init::dataset: Dataset bluesky_posts registered (file://bluesky_posts.parquet), acceleration (duckdb:file, append), results cache enabled.
2025-09-26T15:21:47.740354Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset bluesky_posts
2025-09-26T15:21:57.885819Z INFO runtime::accelerated_table::refresh_task: Dataset bluesky_posts received 38,101 records
2025-09-26T15:21:58.507599Z INFO runtime::accelerated_table::refresh_task: Loaded 38,101 rows (54.72 MiB) for dataset bluesky_posts in 10s 775ms.
2025-09-26T15:21:58.550191Z INFO runtime: All components are loaded. Spice runtime is ready!
2025-09-26T15:22:20.335633Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset bluesky_posts
2025-09-26T15:22:21.960722Z INFO runtime::accelerated_table::refresh_task: Loaded 251 rows (339.49 kiB) for dataset bluesky_posts in 1s 656ms.
```

In a new terminal, start the Spice SQL REPL:

    '} /> ```python spice sql ``` Basic Hybrid Search'} /> Combine exact text matching with semantic similarity for comprehensive results:

    ' } /> ```python -- Find posts about space travel using both exact text and semantic search select fused_score, text, created_at, langs from rrf( text_search(bluesky_posts, 'space travel'), vector_search(bluesky_posts, 'space travel') ) order by fused_score desc limit 10; ``` Weighted Ranking'} /> Boost specific search strategies using rank_weight to prioritize different result types:

```sql
-- Heavily prioritize semantic similarity over exact text matches
select fused_score, text, rkey
from rrf(
    text_search(bluesky_posts, 'artificial intelligence', rank_weight => 50.0),
    vector_search(bluesky_posts, 'AI machine learning', rank_weight => 200.0)
)
order by fused_score desc
limit 15;

-- Prioritize exact mentions while including semantic results
select fused_score, text, created_at
from rrf(
    text_search(bluesky_posts, 'climate change', rank_weight => 300.0),
    vector_search(bluesky_posts, 'environmental sustainability', rank_weight => 100.0)
)
order by fused_score desc
limit 20;
```

### Recency-Boosted Search

Use temporal information to surface recent content with exponential or linear decay:

```sql
-- Recent posts get higher scores with exponential decay
select fused_score, text, created_at, rkey
from rrf(
    text_search(bluesky_posts, 'breaking news'),
    vector_search(bluesky_posts, 'latest updates'),
    time_column => 'created_at',
    recency_decay => 'exponential',
    decay_constant => 0.05,
    decay_scale_secs => 3600 -- 1 hour scale
)
order by fused_score desc
limit 10;

-- Linear decay for trending topics over the last day
select fused_score, text, created_at
from rrf(
    text_search(bluesky_posts, 'trending now'),
    vector_search(bluesky_posts, 'viral popular'),
    time_column => 'created_at',
    recency_decay => 'linear',
    decay_window_secs => 86400 -- 24 hours
)
order by fused_score desc
limit 15;
```

### Advanced Parameter Tuning

Fine-tune the RRF algorithm using the smoothing parameter k:

```sql
-- Lower k value for more aggressive ranking differences
select fused_score, text, langs
from rrf(
    text_search(bluesky_posts, 'technology innovation'),
    vector_search(bluesky_posts, 'tech startups'),
    k => 20.0 -- More aggressive than default 60.0
)
order by fused_score desc
limit 12;

-- Higher k for smoother score distribution
select fused_score, text, created_at
from rrf(
    text_search(bluesky_posts, 'social media'),
    vector_search(bluesky_posts, 'online platforms'),
    k => 120.0 -- Smoother than default 60.0
)
order by fused_score desc
limit 10;
```

### Multi-Language and Content Analysis

Combine vector search queries across languages for similar concepts:

```sql
-- Find posts about "breaking news" with a semantic query in Spanish, but a keyword match in English
select fused_score, text, langs, created_at
from rrf(
    vector_search(bluesky_posts, 'ultimas noticias', rank_weight => 100),
    text_search(bluesky_posts, 'news'),
    time_column => 'created_at',
    recency_decay => 'exponential',
    decay_constant => 0.05,
    decay_scale_secs => 3600 -- 1 h
)
where trim(text) != ''
order by fused_score desc
limit 15;

-- Find posts about breaking news using two semantic queries in Spanish, but filter results for English
select fused_score, text, langs, created_at
from rrf(
    vector_search(bluesky_posts, 'ultimas noticias'),
    vector_search(bluesky_posts, 'noticias de ultima hora'),
    time_column => 'created_at',
    recency_decay => 'exponential',
    decay_constant => 0.05,
    decay_scale_secs => 3600 -- 1 h
)
where langs like '%en%' and trim(text) != ''
order by fused_score desc
limit 15;
```

### Step 4. Enable agentic support

Stop Spice, then open spicepod.yml and uncomment the models block. Update the .env file with your OpenAI key. Then start Spice again.

```bash
spice run
```

Afterwards, begin a chat session:

```bash
spice chat
```

Try querying for insights using natural language:

```
chat> Can you see how many posts there are in the last day about photography?

There were 676 posts about photography in the last day on the Bluesky platform. If you have any further questions or need additional insights, feel free to ask!

Time: 10.12s (first token 9.50s). Tokens: 1635. Prompt: 1588. Completion: 47 (75.62/s).

chat> Can you show me a breakdown by language?

Here's a breakdown of the posts about photography in the last day by language:

1. **English (en):** 596 posts
2. **German (de):** 41 posts
3. **Finnish (fi):** 20 posts
4. **Unspecified:** 17 posts
5. **English, Hebrew, Sanskrit (en, he, sa):** 1 post
6. **Dutch (nl):** 1 post
```

And that's it. We've just walked through a full hybrid search pipeline - from live Bluesky ingestion to SQL-based hybrid ranking to natural-language insights.
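For intuition on what the `rrf()` function computes, here is the standard Reciprocal Rank Fusion formula in plain Python - a conceptual sketch, not Spice's implementation (in particular, rank_weight is modeled as a simple per-ranking multiplier, which may not match Spice's exact semantics):

```python
# Conceptual sketch of Reciprocal Rank Fusion: each result contributes
# weight / (k + rank) per ranking it appears in, and scores are summed.
def rrf(rankings: list[list[str]], k: float = 60.0, weights: list[float] | None = None):
    weights = weights or [1.0] * len(rankings)
    scores: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Higher fused score = better; a smaller k amplifies rank differences.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Fuse a keyword ranking with a semantic ranking:
print(rrf([["post_a", "post_b", "post_c"], ["post_b", "post_d", "post_a"]]))
```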

    "} />

    '} /> Next steps'} /> Hybrid Search with RRF in Spice eliminates external ranking servers, sync pipelines, or duplicated datasets; you can query, rank, and reason across disparate data sources from a single SQL interface. Whether you're powering an internal knowledge assistant or surfacing live content from social feeds, you get near-real-time, context-rich results with minimal overhead.

    " } /> Get started:

    '} />
  • Sign up for Spice Cloud for free, or get started with Spice OSS 
  • Explore the Hybrid Search Docs
  • Go through the RRF cookbook example
  • Schedule a demo if you'd like a full walk-through
## Frequently Asked Questions

### What is Reciprocal Rank Fusion (RRF) and how does it work?

Reciprocal Rank Fusion is a ranking algorithm that merges results from multiple search methods -- such as keyword search and vector search -- into a single, unified ranking. RRF scores each result based on its position in each individual ranking using the formula `1 / (k + rank)`, then sums scores across all rankings. This approach is lightweight, requires no model training, and consistently produces high-quality hybrid results.

### When should I use hybrid search instead of vector search alone?

Hybrid search outperforms vector-only search when queries contain specific identifiers, product names, error codes, or exact terms that semantic embeddings may miss. It also helps when your dataset includes both structured metadata and unstructured text. Combining keyword precision with semantic understanding via RRF produces more consistent results across diverse query types.

### How does Spice implement hybrid search in a single query?

Spice provides built-in `vector_search` and `text_search` [SQL functions](/platform/hybrid-sql-search) that can be combined with standard SQL joins and a `rrf()` ranking function. This means keyword search, vector similarity, and RRF fusion all execute within the same query engine -- no external services, no multi-system orchestration. Results are ranked and returned in a single round-trip.

### What is the advantage of RRF over other fusion methods like weighted scoring?

RRF is rank-based rather than score-based, which makes it robust across search methods that use different scoring scales. Weighted scoring requires tuning weights for each method and can be sensitive to score distribution changes. RRF delivers strong results out of the box without hyperparameter tuning, making it a practical default for most [application search](/use-case/application-search) use cases.

---

## Spice AI achieves SOC 2 Type II compliance

URL: https://spice.ai/blog/spice-ai-achieves-soc-2-type-ii-compliance
Date: 2024-03-05T19:25:00
Description: Spice AI completes SOC 2 Type II audit, demonstrating enterprise-grade security and compliance for its data and AI infrastructure platform.
Spice AI has achieved SOC 2 Type II compliance.

In June last year, Spice AI announced enterprise-grade performance with our second-generation platform for sub-second SQL queries across 100TBs of time-series data.

Today, we're announcing Spice AI has achieved SOC 2 Type II compliance as of Feb 16, 2024, in accordance with American Institute of Certified Public Accountants (AICPA) standards for SOC for Service Organizations, also known as SSAE 18. Achieving this standard with an unqualified opinion serves as third-party industry validation that Spice AI provides enterprise-level security for customers' data secured in the Spice AI platform.

    " } /> Spice AI provides a data and AI infrastructure platform that brings together the infra building blocks needed to build intelligent applications. Security and compliance are top priority for Spice AI. Principles including Compliance, Secure-Access-Control, and Data Protection, are core to how we build and operate the Spice.ai platform, team, and company.

The Spice.ai platform is built from security and compliance first-principles.

Spice AI was audited by Prescient Assurance, a leader in security and compliance attestation for B2B SaaS companies worldwide. Prescient Assurance is a registered public accounting firm in the US and Canada and provides risk management and assurance services including, but not limited to, SOC 2, PCI, ISO, NIST, GDPR, CCPA, HIPAA, and CSA STAR. For more information about Prescient Assurance, you may reach out to them at info@prescientassurance.com.

An unqualified opinion on a SOC 2 Type II audit report demonstrates to Spice AI's current and future customers that we manage data and our platform with the highest standard of security and compliance. More information on Spice AI security can be found at the Spice AI security page.

---

## The Spice.ai for GitHub Copilot Extension is now available!

URL: https://spice.ai/blog/spice-ai-for-github-copilot-extension-now-available
Date: 2024-10-27T19:07:00
Description: With the Spice.ai Extension, developers can interact with data, like product requirements documents (PRDs), tickets, and tabular data, from any external data source directly within GitHub Copilot.

The new Spice.ai for GitHub Copilot Extension, now available in preview, gives developers access to data from external sources directly within the GitHub Copilot experience.

Developers often face the hassle of switching between multiple environments to get the data they need. Whether it's referencing internal documentation or copying details from another ticketing system, the constant context-switching disrupts focus and consumes valuable development time.

    " } /> With the Spice.ai Extension, developers can interact with data, like product requirements documents (PRDs), tickets, and tabular data, from any external data source directly within GitHub Copilot. Save hours copying and pasting across various platforms, relevant data and answers are now surfaced in Copilot Chat, right when you need it.

Chatting with logs stored on S3 in Copilot Chat with the Spice.ai Extension.

### Spice.ai Extension installation and activation

Getting started with the Spice.ai Extension is easy. Get the extension directly from GitHub Copilot in just three steps!

  • Type @spiceai in Copilot Chat to activate the extension
  • Click Connect to authorize the Spice.ai Cloud Platform - our enterprise-grade data and AI platform. Spice.ai integrates with GitHub for authentication, automatically creating an account and Copilot app, so you can easily configure data sources.
  • Next, you'll be able to choose from a set of ready-to-use datasets, like React.js and TailwindCSS, to get started. Spice.ai can also connect to a wide range of data sources, including GitHub repositories, SQL databases, data warehouses, data lakes, and GraphQL endpoints, which can be configured later.

Now, with the Spice.ai Extension configured, you can mention @spiceai in Copilot Chat to access configured datasets, documentation, issue trackers, and more.

Configuring the extension.

### Use cases and prompts for the Spice.ai Extension

The Spice.ai GitHub Copilot Extension gives you access to external datasets right within Copilot Chat. Here are just a few ways you can use it.

  • Query available datasets. Quickly list all available datasets
    • Try: @spiceai What datasets do you have access to?
  • Access relevant documentation. Need documentation related to the file or component you're working on?
    • Try: @spiceai What documentation is relevant to this file?
    • Try: @spiceai Write documentation about the user authentication issue.
  • Review tickets and issues. Sometimes issues, tickets, or advisories might be stored in external systems
    • Try: @spiceai What OPEN issues are relevant to Next.js ISR?
    • Try: @spiceai Find the 5 most recent CLOSED issues in Next.js related to routing. Include a brief summary of each issue or fix and a link to the issue.
Check out the video below to see how fast and simple it is to get started with the Spice.ai GitHub Copilot Extension.

### Ready to Try the Spice.ai for Copilot Extension?

You can start using the Spice.ai for GitHub Copilot Extension today. It's available in preview for free via the Community Edition, with usage limits. Stay tuned for the upcoming general availability launch, where we'll introduce paid commercial plans for professionals and organizations.

→ Get the Spice.ai for GitHub Copilot Extension now.

Got questions or feedback? Let us know on the Spice AI Discord. Your feedback helps shape the future of the extension as we work to improve it and add more data sources.

We're excited to see how you'll use the Spice.ai for GitHub Copilot Extension to accelerate your development process. Follow the Spice.ai Blog for updates.

### Learn more about Spice.ai and GitHub Copilot

Spice AI is making data-driven AI app development simple and easy for developers. By providing tools that make data more accessible and actionable to AI, Spice AI helps developers build useful and accurate AI applications faster.

The Spice.ai for GitHub Copilot Extension was developed through the GitHub Copilot Partner Program, which supports partners building personalized workflows for Copilot. GitHub Copilot empowers developers by automating repetitive tasks, with proven results boosting developer productivity by up to 55%.


---

## Spice.ai is now generally available!

URL: https://spice.ai/blog/spice-ai-is-now-generally-available
Date: 2023-10-25T19:34:00
Description: Spice.ai is now available for everyone, including a new community-centric developer hub and Community Edition complimentary for developers.

Powering intelligent applications with composable data and time-series AI building blocks

### TL;DR

Spice.ai, now publicly available, is your hub for building intelligent data and time-series AI applications.

Get composable data and AI building blocks, including pre-trained machine learning models for AI predictions, and a petabyte-scale cloud data platform preloaded with 100TB+ of ready-to-use Web3, Asset Prices, and time-series data. Create, fork, and share hosted Datasets, Views, and ML Models with the new GitHub-integrated Spice.ai Community Edition - complimentary for developers.

Login with GitHub to get started in seconds →

### Spice.ai is now generally available!

Since the Waitlisted Preview launch last year, projects like Yakoa, EigenLayer, and Entendre Finance have leveraged building blocks from Spice AI's enterprise-grade platform to create high-performance, highly available, data and AI-driven applications.

Today, we're announcing Spice.ai is publicly available for everyone, including a brand-new community-centric hub and a new Community Edition plan complimentary for developers.

Developers now have access to petabyte-scale cloud data and AI infrastructure, preloaded with 100TB+ of Web3, Asset Prices, and time-series data, delivered with SQL, Apache Arrow, and DuckDB. Community members can create, fork, and share datasets, and access data in real-time to power data-driven applications, monitoring, and analytics. Datasets can be fed directly into Spice.ai hosted ML training and inferencing for real-time decision-making applications.

Spice.ai is an application platform that brings together everything needed to build an intelligent application in one application-centric AI-backend-as-a-service.

### Announcing new building blocks

We're rolling out exciting new tools and features that can take your intelligent application development to the next level.

    " } /> Spice Firecache: Turbocharge Your SQL Queries' } /> Spice Firecache is a real-time, in-memory SQL service based on cloud-scale DuckDB instances that enables blazing fast SQL query up to 10x the performance of general SQL query. EigenLayer uses Spice Firecache to enable scenarios not possible before, including serving dynamic data to their high traffic dashboards, real-time monitoring, and analytics. ML models hosted on the Spice.ai platform can be paired with Firecache to power fast, low-latency inferencing, as demonstrated by the AI predictions demo on the Spice.ai website.

  • If you like DuckDB, you'll love it at cloud-scale, automatically provisioned and updated with real-time data.

### Spice Functions: Beyond SQL

Spice Functions is a hosted Bacalhau service for serverless function compute that enables data transformation and processing co-located with your data and machine learning pipelines. Developers can write code in Go or Python, and every Spice Function deployment gets access to a dedicated hosted DuckDB instance for working data.

  • Spice Functions unlocks scenarios that are difficult, expensive, or even impossible in pure SQL, such as adding to a basic accumulator or applying a custom algorithm on each new block of data.

Adding PostgreSQL as an external data source.

### Custom Datasets and Views: Your Data, Your Rules

Developers can tailor Spice data to their application with Custom Datasets and Views, defined in GitHub, alongside their application code. Datasets can be populated with Spice Functions and by connecting to external data sources, starting with PostgreSQL. We're excited to announce that Yakoa - IP protection for the blockchain - is one of the first projects to make their NFT data available in the Spice.ai platform with the release of their Copymint datasets.


    "Other solutions were prohibitively expensive - what we could do in Spice with a single query would have taken millions of API calls in other platforms." - Andrew Dworschak, CEO & Co-founder of Yakoa

We're also making over 15 EigenLayer datasets public that ecosystem participants can use to build data-driven experiences, like Nethermind has done with their Restaking Dashboard. Combined with Spice.ai's rich Ethereum Beacon chain data and HTTP API, observability into the EigenLayer universe has never been easier.

Demo of Spice Firecache accelerated inferencing on Spice.ai hosted ML Models on the Spice.ai website.

### Spice ML Models: Automated Machine Learning

Like Datasets and Views, ML Model definitions are synced from GitHub to the Spice.ai platform and connected to Spice.ai hosted data, hosted machine learning pipelines, and Spice Firecache. The entire machine learning data lifecycle from origin to processing to training and inferencing is automatically and seamlessly managed by the Spice.ai platform so that developers can create decision-making applications, such as predicting and responding to resource requirements or mitigating potential security concerns with ease.

### Summary

With the release of Spice Firecache, Spice Functions, Custom Datasets and Views, ML Models, and a community-centric developer hub to build and share datasets like from innovators Yakoa and EigenLayer, developers have the next set of building blocks to ship intelligent software for application and ecosystem observability, real-time monitoring and security, AI-powered IP protection, and more!

Try the Spice.ai platform today →

---

## Faster, Simpler Dashboards with Spice and Power BI

URL: https://spice.ai/blog/spice-and-power-bi
Date: 2025-09-15T18:51:00
Description: Spice AI built a Microsoft Power BI Connector on top of the Flight SQL ADBC driver that makes it easy for Power BI users to query across operational databases, analytical warehouses, and object stores.

### TL;DR

Spice AI built a Microsoft Power BI Connector on top of the Flight SQL ADBC driver that makes it easy for Power BI users to query across operational databases, analytical warehouses, and object stores. Spice federates large OLTP and OLAP datasets and accelerates them at the application layer in DuckDB and Arrow to support sub-second dashboards, driving significant performance and ease-of-use improvements for enterprise BI use cases. Under the hood, Spice leverages Arrow Database Connectivity (ADBC) and Flight SQL for Arrow-native performance that eliminates row-to-column conversion overhead.

### The Enterprise BI Challenge

Enterprises rely on Power BI to analyze and visualize data, but the data itself often lives across many systems: operational databases like Postgres or MongoDB, historical datasets in S3 or Delta/Iceberg tables, and real-time streams from systems like Kafka. Traditionally, making this data available in Power BI requires complex ETL pipelines, duplicated storage in various warehouses, and ongoing engineering effort to keep everything in sync. This results in dashboards with slow refresh times and fragile, operationally-intensive pipelines.

Spice changes this dynamic by combining query federation and local acceleration into a single, lightweight runtime. Instead of copying data into a warehouse, Spice connects directly to distributed systems, federates queries across them, and accelerates datasets with DuckDB and Apache Arrow at the application layer.

### Introducing the Spice.ai Power BI Connector, Built on ADBC

The Spice.ai Power BI Connector sits between Power BI and the systems it needs to query. Data federation removes the need to consolidate sources up front, and Power BI can treat them as if they were a single dataset; analysts can configure Spice once and immediately query across their data estate. For Power BI users, this means dashboards can run on live data from OLTP and OLAP sources without the burden of additional infrastructure management.

Figure 1: Data Federation in Spice

### Acceleration with DuckDB & Arrow

Federation simplifies connectivity, but acceleration delivers performance; the Spice runtime prefetches working sets of data from these upstream systems and locally accelerates them using DuckDB and Apache Arrow. This eliminates repeated network round-trips, stores data closer to applications, and enables sub-second queries - even for billion-row datasets. Data can be refreshed on a schedule or in real time with CDC, ensuring dashboards always present the latest information.

Further, unlike traditional lakehouses that focus on analytical workloads, Spice accelerates both operational and analytical data. Teams can query transactional sources like Postgres alongside large analytical datasets in S3, all with the same sub-second performance and without overwhelming production databases. For example, Spice can ingest row-based Postgres tables, accelerate them in DuckDB, and serve the results to Power BI in columnar Arrow format for interactive dashboards.

Figure 2: Acceleration in Spice

### Decreasing Latency with ADBC

Under the hood, the Spice.ai Power BI Connector is built on Arrow Database Connectivity (ADBC) and Flight SQL. ADBC provides a vendor-agnostic, Arrow-native standard for delivering columnar data directly to applications and dashboards. In contrast to JDBC and ODBC, ADBC avoids row/column conversions and provides a faster, more efficient path for analytical queries.
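To make the Arrow-native path concrete, here is a small sketch of querying a Spice runtime over Flight SQL with the ADBC Python driver (`pip install adbc-driver-flightsql pyarrow`); the table name is hypothetical, and Power BI's connector performs the equivalent steps natively:

```python
# Sketch: columnar results from Spice over ADBC/Flight SQL -- Arrow end to end,
# with no row-by-row conversion as in JDBC/ODBC. Assumes a local Spice runtime
# on the default Flight port; `orders` is a hypothetical accelerated dataset.
import adbc_driver_flightsql.dbapi as flightsql

with flightsql.connect("grpc://localhost:50051") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT order_id, order_total FROM orders LIMIT 1000")
        table = cur.fetch_arrow_table()  # pyarrow.Table, kept columnar throughout
        print(table.schema)
```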

### Use Case Example: Faster Power BI Queries in Spice Compared to RDS

Let's take this out of the abstract and introduce a practical use case to illustrate the value of Spice.

    " } /> Consider an order management platform that previously relied on an AWS RDS database for recent transactions and S3 for longer-term history. Without Spice, a dashboard showing historical orders would require querying RDS directly, moving potentially millions of rows across the network, and joining that data with S3 data in a warehouse. With Spice, those tables can be federated, accelerated locally, and queried directly by Power BI. 

The video illustrates the performance delta between Spice's acceleration model (the dashboard on the right side of the screen) and a more traditional approach of querying RDS directly (the dashboard on the left side of the screen): query time is decreased from 1324 ms to 223 ms with Spice compared to RDS. That's the difference between a clunky analyst experience and an interactive, real-time workflow.

### Before (without Spice)
  • Data must be moved into a warehouse via ETL jobs.
  • Dashboards refresh slowly because data flows through JDBC/ODBC, converting columnar data to rows and back to columns again.
  • Engineering teams maintain multiple pipelines and troubleshoot schema mismatches.
### After (with Spice + ADBC)
  • Both Postgres and S3 datasets are defined in a Spicepod (the core configuration unit in Spice, a YAML-based package that defines the datasets, models, and acceleration an application requires). 
  • Power BI connects to Spice through the ADBC-based connector.
  • Federated queries run directly across both sources; results return in Arrow format with no row/column conversions.
  • Data stored in Postgres is accelerated in DuckDB.
  • Dashboards refresh in under a second. Pipelines are eliminated, and analysts can build new reports faster.
### Walkthrough: Installing and Using the Connector

### Power BI Desktop
  • Download the latest spice_adbc.mez file from the releases page 
  • Copy to the Power BI Custom Connectors directory: C:\Users\[USERNAME]\Documents\Microsoft Power BI Desktop\Custom Connectors

```powershell
Invoke-WebRequest -Uri "https://github.com/spiceai/powerbi-connector/releases/latest/download/spice_adbc.mez" -OutFile "C:\Users\[USERNAME]\Documents\Microsoft Power BI Desktop\Custom Connectors\spice_adbc.mez"
```
  •  Enable Uncertified Connectors in Power BI Desktop settings and restart Power BI Desktop.
### Adding Spice as a Data Source
  • Open Power BI Desktop.
  • Click on Get Data → More....
  • In the dialog, select Spice.ai connector.
  • Click Connect.
  • Enter the ADBC (Arrow Flight SQL) Endpoint:
    • For Spice Cloud Platform:
      grpc+tls://flight.spiceai.io:443
      (Use the region-specific address if applicable.)
    • For on-premises/self-hosted Spice.ai:
      • Without TLS (default): grpc://<server-ip>:50051
      • With TLS: grpc+tls://<server-ip>:50051
  • Select the Data Connectivity mode:
    • Import: Data is loaded into Power BI, enabling extensive functionality but requiring periodic refreshes and sufficient local memory to accommodate the dataset.
    • DirectQuery: Queries are executed directly against Spice in real-time, providing fast performance even on large datasets by leveraging Spice's optimized query engine.
  • Click OK.
  • Select Authentication option:
    • Anonymous: Select for unauthenticated on-premises deployments.
    • API Key: Your Spice.ai API key for authentication (required for Spice Cloud). Follow the guide to obtain it from the Spice Cloud portal.

  • Click Connect to establish the connection.

### Working with Spice datasets

After establishing a connection, Spice datasets appear under their respective schemas, with the default schema being spice.public. When writing native queries, use the PostgreSQL dialect, as Spice is built on this standard.

For a list of supported data types, visit the docs here.

### Getting started with Spice and Power BI

Spice extends Power BI beyond the traditional limits imposed on it by sub-optimal ETL pipelines.

Spice enables Power BI users to:
  • Run federated SQL queries across disparate data sources in one place with zero ETL required. 
  • Accelerate and materialize large datasets for sub-second dashboards in Power BI.
  • Use either Import Mode for full feature access or DirectQuery Mode for real-time results.
  • Build on open standards like ADBC and Apache Arrow, not on proprietary SDKs.
To try it out, download the Spice Power BI Connector and follow the documentation. Configure your first dataset, connect Power BI, and experience how federation and acceleration can make your dashboards faster and more reliable.

### Resources
  • Sign up for Spice Cloud for free, or get started with Spice Open Source
  • Install the connector
  • Explore the Spice cookbooks and docs 
---

## Spice Cloud v1.10: Caching Acceleration Mode, DynamoDB Streams Support, & More!

URL: https://spice.ai/blog/spice-cloud-v1-1-0
Date: 2025-12-10T18:30:11
Description: Spice v1.10 includes a new caching acceleration mode, a new DynamoDB Streams data connector in preview, Amazon S3 location-based pruning, S3 Tables write support, and several performance and security improvements.

Spice Cloud & Spice.ai Enterprise 1.10 are live!

Spice v1.10 includes a new caching acceleration mode, a new DynamoDB Streams data connector in preview, Amazon S3 location-based pruning, S3 Tables write support, and several performance and security improvements.

### New in Spice Cloud: Persisted Metrics!

View complete monitoring history after restarts, ensuring no gaps in visibility.

Spice Cloud customers will automatically upgrade to v1.10 on deployment, while Spice.ai Enterprise customers can consume the Enterprise v1.10 image from the Spice AWS Marketplace listing.

### What's New in v1.10

### Caching Acceleration Mode

Spice's new caching acceleration mode provides stale-while-revalidate (SWR) behavior for accelerations with background refreshes, enabling file-persisted caching using Cayenne, DuckDB, or SQLite.

    " } />
    Figure 1: Spicepod.yaml caching acceleration configuration example
    ' } /> Learn more about getting started with caching accelerator in in the acceleration docs.
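As a mental model for SWR, the sketch below shows the general pattern in plain Python - serve the cached (possibly stale) value immediately and refresh it in the background. This is purely illustrative of the caching pattern, not Spice's internals:

```python
# Illustrative stale-while-revalidate (SWR) cache: reads never wait on a
# refresh once a value exists; stale entries trigger a background re-fetch.
import threading
import time

class SwrCache:
    def __init__(self, fetch, ttl_secs: float):
        self.fetch = fetch        # callable: key -> fresh value (e.g. a source query)
        self.ttl = ttl_secs
        self.lock = threading.Lock()
        self.entries = {}         # key -> (value, fetched_at)

    def get(self, key):
        with self.lock:
            hit = self.entries.get(key)
        if hit is None:           # cold miss: fetch synchronously, once
            value = self.fetch(key)
            with self.lock:
                self.entries[key] = (value, time.monotonic())
            return value
        value, fetched_at = hit
        if time.monotonic() - fetched_at > self.ttl:
            # Stale: answer immediately, revalidate in the background.
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
        return value

    def _refresh(self, key):
        value = self.fetch(key)
        with self.lock:
            self.entries[key] = (value, time.monotonic())
```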

### DynamoDB Streams Support (Preview)

The DynamoDB connector now integrates with DynamoDB Streams, enabling real-time change-data-capture (CDC) for DynamoDB with automatic table bootstrapping and acceleration snapshots support.

Figure 2: DynamoDB Streams Spicepod.yaml configuration example

Learn more about using DynamoDB Streams in the docs.

### S3 Data Connector Improvements

The S3 data connector now supports location-based predicate pruning - dramatically reducing data scanned by pushing down location filter predicates to S3 listing operations. And, AWS S3 Tables now have full read/write capability in Spice!

Figure 3: Location-based predicate pruning example

Figure 4: S3 writes example

Learn more in the S3 data connector docs.

### TinyLFU Cache Eviction Policy

TinyLFU is now available for the SQL results cache! TinyLFU is a probabilistic cache admission policy that maintains higher hit rates than LRU (Least Recently Used) while keeping memory usage predictable, making it ideal for workloads with uneven query patterns.

Figure 5: TinyLFU configuration example

Learn more in the caching docs.

### Additional Updates
  • Search optimizations for faster full-text-search (FTS) queries, reduced vector index overhead, and better limit pushdown.
  • Security hardening including stronger identifier handling, expanded token redaction, and safe archive extraction.
  • Developer experience updates including new health probe latency metrics and REPL history improvements. 
For more details on v1.10, visit the release notes.

### New to the Spice Cookbook

### DynamoDB Streams

Learn how to configure a Spice dataset to stream real-time changes from an AWS-hosted DynamoDB table using DynamoDB Streams. Watch as inserts, updates, and deletes automatically flow into Spice!

Try the DynamoDB Streams cookbook here.

### Caching Accelerator

This recipe walks you through the setup and core functionality for the new caching accelerator, which provides intelligent caching for HTTP-based datasets with Stale-While-Revalidate (SWR) support.

Try the caching accelerator cookbook here.

Schedule a demo if you'd like to see the product live or have any questions, sign up for Spice Cloud for free, or get started with Spice Open Source.

---

## Spice Cloud v1.11: Spice Cayenne Reaches Beta, Apache DataFusion v51, DynamoDB Streams Improvements, & More

URL: https://spice.ai/blog/spice-cloud-v1-11
Date: 2026-01-30T17:56:27
Description: v1.11 brings Spice Cayenne to Beta, DataFusion v51 and Apache Arrow v57.2, improved DynamoDB Streams, and more.

We're excited to announce Spice v1.11 is now available in Spice Cloud - a major release with over 43 new features, improvements, and fixes.
     
    Spice v1.11 brings Spice Cayenne to Beta with 3x lower memory usage than DuckDB, significant performance upgrades across the entire compute stack: DataFusion v51, Apache Arrow v57.2, improved DynamoDB Streams, and an optimized caching acceleration mode. 
     
    And, Spice Cloud monitoring has new real-time metrics and dashboards! 

    " } /> Monitor your Spice Cloud apps in production ' } /> New real-time dashboards give you complete visibility into API performance, data egress, and cache efficiency - so you can optimize costs and catch issues before they impact users. 

Figure 1. HTTP and Flight API request dashboards for insights into volume, latency, and query performance.

Figure 2. Track data egress costs and cache hit rates in real-time.

New to Spice Cloud? Sign up and get $25 in free AI credits. Query databases, data lakes, and data warehouses, and add instant RAG and AI analysis with zero-ETL. (US customers only.)

Spice Cloud customers will automatically upgrade to v1.11 on deployment, while Spice.ai Enterprise customers can consume the Enterprise v1.11 image from the Spice AWS Marketplace listing.

### Join the v1.11 Release Community Call

Connect with the Spice team and community for live demos of what's new in v1.11. Ask questions, share feedback, and get a preview of what's next.

### Major v1.11 Features

### Spice Cayenne Beta

Figure 3. Spice Cayenne TPC-H SF-100 benchmark

Spice Cayenne, the premier Spice data accelerator built on the Vortex columnar format, has been promoted to Beta. Cayenne delivers 1.4x faster queries than DuckDB with 3x lower memory usage on TPC-H SF100 benchmarks.

New Cayenne features in v1.11 include:

  • Acceleration Snapshots: Point-in-time recovery for fast bootstrap and rollback capabilities 
  • Key-based Deletion Vectors: More efficient data management and faster delete operations 
  • S3 Express One Zone: Store Cayenne files in S3 Express One Zone for single-digit millisecond latency 
  • Primary Key On-Conflict Handling: New `on_conflict` config for Cayenne tables with primary keys supports upsert or duplicate-ignore behavior 
### Apache DataFusion v51 Upgrade

Figure 4: Apache DataFusion performance improvements. Source: DataFusion docs.

DataFusion v51 brings significant performance improvements and new SQL functionality:

Performance:

  • Faster CASE expression evaluation with short-circuit optimization 
  • Better defaults for remote Parquet reads (avoids 2 I/O requests per file) 
  • 4x faster Parquet metadata parsing  
New SQL features:

  • Support for `|>` syntax for inline transforms
  • `DESCRIBE <query>` returns schema of any query without executing it  
  • Named function arguments `param => value` syntax for scalar, aggregate, and window functions 
  • Decimal32/Decimal64 type support 
### Apache Arrow 57.2 Upgrade

Figure 5. Apache Parquet performance with Thrift Parser.

Arrow 57.2 delivers major performance improvements:
  • 4x faster Parquet metadata parsing with rewritten thrift metadata parser 
  • Parquet Variant Support (Experimental): Read/write support for semi-structured data 
  • Parquet Geometry Support: Read/write for `GEOMETRY` and `GEOGRAPHY` types 
  • New `arrow-avro` Crate: Efficient conversion between Apache Avro and Arrow with projection pushdown 
### DynamoDB Connector & DynamoDB Streams Improvements

Figure 6. DynamoDB Streams configuration.

DynamoDB Streams are now more reliable and flexible with JSON nesting support and improved batch deletion handling.

### Caching Acceleration Mode Improvements

Figure 7. Sample caching acceleration mode configuration.

Major performance optimizations and reliability fixes for caching acceleration mode deliver sub-millisecond cached queries with faster response times on cache misses.

Performance:

  • Non-blocking cache writes: Cache misses no longer block query responses; data writes asynchronously 
  • Batch cache writes: Multiple entries written in batches for better throughput 
Reliability:

  • Stale-While-Revalidate (SWR) behavior: Refreshes only the entries that were accessed instead of refreshing all stale rows  
  • Deduplicated refresh requests: Prevents redundant source queries 
  • Fixed cache hit detection: Queries now correctly detect cached data 
### Additional Features

Prepared Statements: Spice now supports prepared statements, enabling parameterized queries that improve performance and security by preventing SQL injection, with full SDK support across the Go, Rust, .NET, Java, JavaScript, and Python clients.
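As an illustration of parameterized queries against Spice, here is a sketch using the ADBC Flight SQL driver for Python (endpoint, table, and column names are assumptions; each language-specific SDK exposes its own prepared-statement API):

```python
# Sketch: a parameterized query over Spice's Flight SQL endpoint using ADBC
# (pip install adbc-driver-flightsql). The value is bound as a parameter rather
# than spliced into the SQL text, which is what blocks SQL injection.
import adbc_driver_flightsql.dbapi as flightsql

with flightsql.connect("grpc://localhost:50051") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name FROM customers WHERE region = ?",  # hypothetical table
            ("us-east",),
        )
        for row in cur.fetchall():
            print(row)
```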

iceberg-rust v0.8.0: v0.8.0 brings support for the Iceberg V3 table metadata format, INSERT INTO for partitioned tables, and more.

Acceleration Snapshots Improvements: Additions in v1.11 include flexible triggers based on time intervals or batch counts, automatic compaction to reduce storage overhead, and better creation policies that only create snapshots when data changes.

New Data Connectors

  • NFS: Query data on Unix/Linux NFS exports 
  • ScyllaDB: Query the high-performance NoSQL database via CQL. 
Google LLM Support: Spice now supports Google embedding and chat models via the Google AI provider.

URL Tables: Query data directly via URL in SQL from S3, Azure Blob Storage, and HTTP/HTTPS.

Hash Indexing for Arrow Acceleration (experimental): Arrow-based accelerations now support opt-in hash indexing for faster point lookups on equality predicates.

### From the Blog: How we Use Apache DataFusion at Spice AI

A technical deep-dive on how Spice uses and extends Apache DataFusion with custom table providers, optimizer rules, and UDFs to power federated SQL, search, and AI inference.

### From the Blog: Real-Time Control Plane Acceleration with DynamoDB Streams

Learn how to stream DynamoDB data to thousands of nodes with sub-second latency using a two-tier architecture with DynamoDB Streams and Spice acceleration.

### New Recipe in the Spice Cookbook: ScyllaDB Connector

Learn how to connect Spice to ScyllaDB for sub-second federated queries.

As always, we'd love your feedback! Join us on Slack to connect directly with the team and other Spice users.

---

## Spice Cloud v1.8.0: Iceberg Write Support, Acceleration Snapshots & More

URL: https://spice.ai/blog/spice-cloud-v1-8-0-iceberg-writes
Date: 2025-10-08T17:29:00
Description: Announcing Spice Cloud v1.8.0 - now with Iceberg write support, acceleration snapshots, partitioned S3 Vectors indexes & a new AI SQL function

Spice Cloud & Spice.ai Enterprise 1.8.0 are live! v1.8.0 includes Iceberg write support, acceleration snapshots, partitioned S3 Vector indexes, a new AI SQL function for LLM integration, and an updated Spice.js SDK.

v1.8.0 also introduces developer experience upgrades, including a redesigned Spice Cloud dashboard with tabbed navigation:

Figure 1: Switch between datasets, queries, and models without losing context.

Spice Cloud customers will automatically upgrade to v1.8.0 on deployment, while Spice.ai Enterprise customers can consume the Enterprise v1.8.0 image from the Spice AWS Marketplace listing.

### What's New in v1.8.0

### Iceberg Write Support (Preview)

Spice now supports writing to Apache Iceberg tables using standard SQL INSERT INTO statements. This greatly simplifies creating and updating Iceberg datasets in the Spice runtime - letting teams directly manipulate open table data with SQL instead of third-party tools. Learn more about Spice SQL federation or get started with Iceberg writes in Spice here.

Example query:

```sql
-- Insert from another table
INSERT INTO iceberg_table SELECT * FROM existing_table;

-- Insert with values
INSERT INTO iceberg_table (id, name, amount) VALUES (1, 'John', 100.0), (2, 'Jane', 200.0);

-- Insert into catalog table
INSERT INTO ice.sales.transactions VALUES (1001, '2025-01-15', 299.99, 'completed');
```

### Acceleration Snapshots (Preview)

A new snapshotting system enables datasets accelerated with file-based engines (DuckDB or SQLite) to bootstrap from stored snapshots in object storage like S3 - significantly reducing cold-start latency and simplifying distributed deployments. Learn more.

### Partitioned Amazon S3 Vector Indexes

Vector search at scale is now faster and more efficient with partitioned Amazon S3 Vector indexes - ideal for large-scale semantic search, recommendation systems, and embedding-based applications. Combine with hybrid SQL search for unified keyword and vector retrieval. Learn more.

### AI SQL Function (Preview)

A new asynchronous `ai` SQL function enables developers to call large language models (LLMs) directly from SQL, making it possible to integrate LLM inference directly into federated or analytical workflows without additional services. Learn more.

### Spice.js v3.0.3 SDK

v3.0.3 brings improved reliability and broader platform support. Highlights include new query methods, automatic transport fallback between gRPC and HTTP, and built-in health checks and dataset refresh controls. Learn more.

### Bug & Stability Fixes

v1.8.0 also includes numerous fixes and improvements:

  • Reliability: Improved logging, error handling, and network readiness checks across connectors (Iceberg, Databricks, etc.).
  • Vector search durability and scale: Refined logging, stricter default limits, safeguards against index-only scans and duplicate results, and always-accessible metadata for robust queryability at scale.
  • Cache behavior: Tightened cache logic for modification queries.
  • Full-Text Search: FTS metadata columns now usable in projections.
  • RRF Hybrid Search: Reciprocal Rank Fusion (RRF) UDTF enhancements for advanced hybrid search scenarios.
For more on v1.8.0, check out the full release notes.

### v1.8 Release Community Call

Join us on Thursday, October 16th for live demos of the new functionality delivered in v1.8! Register here.

Figure 2: October 16th, v1.8 Release Community Call

### Resources to Get Started with Spice
  • Sign up for Spice Cloud for free, or get started with Spice Open Source
  • Explore the Spice cookbooks and docs
  • Schedule a demo if you'd like to see the product live or have any questions

---

## Spice Cloud v1.9.0: Introducing the Spice Cayenne Data Accelerator

URL: https://spice.ai/blog/spice-cloud-v1-9-0-cayenne-data-accelerator
Date: 2025-11-20T18:21:00
Description: Spice Cloud v1.9.0 adds the Cayenne Data Accelerator, Apache DataFusion v50, HTTP data connector support for querying endpoints as tables, and much more.

Spice Cloud & Spice.ai Enterprise 1.9.0 are live!

Our mission at Spice is to make building data-intensive applications and AI systems easier, faster, and more secure. With v1.9.0, we're taking a big step forward.

This release introduces Spice Cayenne, our new premier data accelerator based on Vortex, upgrades to DataFusion v50 and DuckDB v1.4.2, new HTTP Data Connector support for querying API endpoints as tables, and many more improvements across performance, scalability, and developer experience.

Spice Cloud customers will automatically upgrade to v1.9.0 on deployment, while Spice.ai Enterprise customers can consume the Enterprise v1.9.0 image from the Spice AWS Marketplace listing.

### What's New in v1.9.0

### Cayenne Data Accelerator (Beta)

Cayenne is the new premier data accelerator for high-volume, multi-file workloads. Built on the Vortex columnar format from the Linux Foundation, Cayenne offers better ingestion and query performance than DuckDB without single-file scaling limits. Spice Cayenne supports high concurrency, retention policies, and SQL-based lifecycle management.

Figure 1: TPC-H SF-100 benchmark

Figure 2: ClickBench benchmark

Learn more about getting started with Cayenne in the docs.

### Apache DataFusion v50 Upgrade

DataFusion v50 features faster filter pushdown, new SQL functions, and more reliable execution plans. Learn more in the Apache DataFusion blog.

Figure 3: Apache DataFusion performance improvements

### HTTP Data Connector: Query Endpoints as Tables

Query HTTP endpoints as tables in SQL queries with dynamic filters, with full support for results-caching including new stale-while-revalidate (SWR) support. Learn more here.

Figure 4: HTTP data connector query

### Full-Text and Vector Search on Views

You can now add full-text indexes or embeddings to accelerated views for advanced search across pre-aggregated or transformed data. Vector engines on views are now also supported. Visit the search and embeddings docs for more information.

Figure 5: Full-text and vector search on views example

### AWS Authentication Improvements

AWS SDK credential initialization now includes more robust retry logic and better handling, supporting transient and extended network and AWS outages without manual intervention. Learn more.

### Additional Updates
  • DuckDB v1.4.2: Composite ART indexes for accelerated table scans, intermediate materialization, and better refresh performance
  • Git Data Connector: Query data directly from Git repositories 
  • DynamoDB Connector Improvements: Improved filter handling, parallel scan support, and better handling for misconfigured queries
  • Spice Java SDK v0.4.0 with configurable memory limits
  • CLI Improvements: version pinning, persistent query history, and tab completion
  • Dedicated Query Thread Pool is now enabled by default
For more details on v1.9.0, visit the release notes.

---

## Spice Firecache | Cloud-Scale DuckDB

URL: https://spice.ai/blog/spice-firecache
Date: 2023-12-05T19:28:00
Description: Cloud-Scale DuckDB

Spice AI is not just another data indexer or data provider. Spice.ai is a set of AI/ML-infra building blocks for creating data and AI-driven applications - web3 data included.

In October's General Availability announcement we also announced Spice Firecache.

Spice Firecache is a real-time, in-memory SQL service based on cloud-scale DuckDB instances that enables blazing fast SQL query up to 10x the performance of general SQL query. For example, EigenLayer uses Spice Firecache to enable scenarios not possible before, including serving dynamic data to their high traffic dashboards, real-time monitoring, and analytics. ML models hosted on the Spice.ai platform can be paired with Firecache to power super-fast, low-latency inferencing, as demonstrated by the AI predictions demo on the Spice.ai website.

If you like DuckDB, you'll love it at cloud-scale, automatically provisioned and updated with real-time data.

    " } />
    Experience high concurrency, blazing fast SQL query up to 10x the performance of general SQL query in the Firecache Playground.
    ' } /> 🔥 Spice Firecache in Preview'} /> Spice.ai is preloaded with over 100TBs of Web3 and Asset Prices data which can be combined with your own PostgreSQL, MySQL, and data lake external data sources. While organizations need access to terabytes or petabytes of data, the working set of data that is used day-to-day is often orders of magnitude smaller, generally in the 10s of GBs.

    ' } /> Cloud-Scale DuckDB. Spice Firecache enables you to configure specific datasets for inclusion into cloud-scale managed DuckDB instances for query performance 10x or sometimes even 100x faster than general SQL query.

    ' } /> Platform managed. The entire Firecache data lifecycle, from ETL to query ensures datasets are updated in real-time and is completely platform managed. All the user has to do is enable Firecache on a dataset, via configuration as code through GitHub-connected Datasets, deploy it, and start querying.

Easy to query. Once datasets have been deployed, they can be queried by the same interfaces as general SQL query, including with the Playground SQL Query Editor in the Spice.ai portal, the HTTP API, the Apache Arrow Flight API, and SDKs.

Dedicated instances. Dedicated Firecache instances are deployed per organization and are Spice app-specific, only available to the Spice app to which they were deployed.

Easy to integrate. Simply swap to using the Firequery() method instead of the Query() method in one of the Go, Python, Node.js, or Rust SDKs to start using Firecache.
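For example, with the Python SDK the swap might look like the sketch below (method and dataset naming are inferred from the Firequery()/Query() pair above - check the spicepy docs for the exact API):

```python
# Sketch: switching a query to Firecache with the Python SDK. The fire_query
# name and dataset are assumptions based on the Firequery()/Query() description.
from spicepy import Client

client = Client("YOUR_API_KEY")

sql = "SELECT number, gas_used FROM eth.recent_blocks ORDER BY number DESC LIMIT 10"

reader = client.query(sql)             # general SQL query path
fast_reader = client.fire_query(sql)   # same SQL, served from Firecache

print(fast_reader.read_pandas())
```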

### What it enables

Intelligent data and AI-driven software is often limited by the speed and cost of data retrieval from big data warehouses and systems.

High-performance access to data is required to power real-time operations, observability, analytics, ML inferencing, and recently popular Retrieval-Augmented Generation (RAG) for large-language-models (LLMs).

By keeping the day-to-day working set of frequently queried data available in Firecache, data retrieval latency can be significantly reduced, resulting in higher-performance (and lower-cost) frontends, applications, and AI-powered insights.

This is particularly beneficial for customers like EigenLayer, who quickly rose to popularity and need to make tens of millions of SQL queries a month (and growing) to power their dynamic community and user dashboards. Spice Firecache ensures that software can meet user expectations for speed and interactivity, supercharging the user experience, all while reducing cost.

Spice Firecache is currently available in preview for Spice AI Design Partners and Enterprise customers.

Get in touch at hey@spice.ai if you'd like to trial Firecache for free.

Thank you!

The Team at Spice AI

🌐 Community updates:

Spice AI recently partnered with RISC Zero, a leader in Zero-Knowledge (ZK) technology, to provide ZK-provable ML inferencing for Ethereum gas-fees.

Read the full case-study at spice.ai/cases/risc-zero.

👥 Join the growing Spice AI Community:

Twitter | LinkedIn | GitHub | Discord | Telegram

💡 We're hiring! 💡

We're looking for innovators across engineering, design, and devrel. Discover the latest opportunities at Spice AI here and learn how you can play a leading role in shaping the future of intelligent software and application development.

### About Spice AI

Spice.ai is a set of AI/ML-infra building blocks for creating data and AI-driven applications - web3 data included.

Spice AI eliminates the complexity of building and operating costly data and AI infrastructure by composing real-time and historical time-series data, custom ETL, machine learning training and inferencing, in a high-performance, enterprise-grade platform.

Have questions or feedback? Get in touch on Discord.

Spice.ai | View previous releases

---

## Spice OSS, rebuilt in Rust

URL: https://spice.ai/blog/spice-oss-rebuilt-in-rust
Date: 2024-04-01T19:19:00
Description: Spice.ai OSS has been rebuilt from the ground up in Rust, delivering the performance, safety, and portability needed for production data infrastructure.

Last week we announced the next-generation of Spice OSS, the technology behind Spice Firecache, completely rebuilt from the ground up in Rust.

Spice OSS is a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

Spice federates SQL query across databases (MySQL, PostgreSQL, etc.), data warehouses (Snowflake, BigQuery, etc.), and data lakes (S3, MinIO, Databricks, etc.) so you can easily use and combine data wherever it lives. And of course Spice OSS connects to the Spice.ai Cloud Platform.

Datasets can be materialized and accelerated using your engine of choice, including DuckDB, SQLite, PostgreSQL, and in-memory Apache Arrow records, for ultra-fast, low-latency query. Accelerated engines run in your infrastructure giving you flexibility and control over price and performance.

You can get started with Spice on your own machine in less than 30 seconds by following the quickstart at github.com/spiceai/spiceai as Phillip demonstrated below.

---

## Getting Started with Spice.ai SQL Query Federation & Acceleration

URL: https://spice.ai/blog/spice-sql-query-federation-acceleration
Date: 2025-10-14T17:15:00
Description: Learn how to use Spice.ai to federate and accelerate queries across operational and analytical systems with zero ETL.

**TL;DR:** Spice provides [SQL query federation](/platform/sql-federation-acceleration) and local acceleration in a lightweight runtime that lets applications query distributed data sources -- databases, warehouses, and lakes -- through a single SQL interface with sub-second performance, zero ETL, and no data movement.

---

Modern applications succeed when they can deliver speed, intelligence, and reliability at scale. Achieving that depends increasingly on how quickly and efficiently they can access the relevant underlying data sources. Most enterprises, however, rely on pipelines, warehouses, and APIs that make real-time or intelligent apps prohibitively expensive or slow. Engineering teams are faced with how to make these fragmented systems faster, more secure, and more productive without rebuilding from scratch. Spice was built to solve this problem, delivering SQL query federation and local acceleration in a lightweight, portable runtime that turns distributed data environments into a high-performance data layer capable of supporting the most demanding workloads.

Figure 1: Federated SQL Query in Spice

### The Current State of Data Access in the Enterprise

The primary mandate for development teams is satisfying the performance, availability, and security benchmarks your use case requires, at a cost profile that makes sense for your business; how you satisfy those requirements in terms of the underlying technology deployed is ultimately an implementation detail.

    ' } /> It's evident that this philosophy is put into practice when you take a peek behind the 'enterprise curtain', where you'll see a patchwork of different systems deployed at different layers of abstraction: operational and analytical apps built on on-premise, cloud, hybrid, or serverless infrastructure depending on the use case. 

    " } /> In order to make these very heterogenous architectures operational, historically (largely by necessity) development teams would have to patch together a variety of pipelines to connect to data spread out across the enterprise: customer records in transactional databases, historical data in warehouses or data lakes, semi-structured content in object stores, etc. All across a variety of deployment tiers. 

How can you make your existing infrastructure investment more productive, secure, and performant, while also serving the low-latency data access that intelligent applications and agents demand? Copying everything into a single warehouse no longer satisfies the performance, security, or cost requirements of modern workloads.

### How Spice's SQL Federation & Acceleration Solves the Data Access Problem

Spice is an open-source data and AI platform that federates and accelerates your operational and analytical data and deploys it at the application layer. Instead of centralizing all data in one system, it lets teams query data in place across multiple sources in a lightweight runtime that runs anywhere - edge, cloud, or on-prem.

  • SQL Query Federation: Run SQL queries across OLTP databases, OLAP warehouses, and object stores without ETL.

Figure 2: SQL Query Federation in Spice

  • SQL Query Acceleration: Cache or index working sets locally in DuckDB or SQLite, cutting query latency from seconds to milliseconds.

Figure 3: SQL Query Acceleration in Spice

By combining SQL federation and acceleration in a single runtime, Spice reduces infrastructure complexity and delivers the sub-second latency needed for real-time apps and AI agents - whereas traditional approaches rely on heavy ETL pipelines or pre-aggregations to make object storage and open table formats queryable. And for use cases that demand search, Spice packages keyword, vector, and full-text search in one SQL query for truly hybrid retrieval.

This breadth of capabilities brings object storage fully into the operational path; Spice turns object storage into an active, queryable data layer that supports real-time ingestion, transformation, and retrieval. You can query data where it lives, ingest real-time updates through change data capture (CDC), index for vector and full-text search, and write directly back to Iceberg tables using standard SQL.

### Use Case Example: Combining S3, PostgreSQL, and Dremio Data in a Single Query

Consider a customer portal that needs to show per-customer delivery stats in real time, which for this use case means joining customer orders in S3 with trip data stored in Dremio. With a traditional pipeline, this would require duplicating data into a warehouse, incurring both cost and latency.

Spice, conversely, federates and accelerates that data locally. Here's what it looks like in practice (which you can validate for yourself in the next section):

    " } />
  • Raw S3 query (~2,800 rows): 0.87s
  • Accelerated S3 query with Spice: 0.02s (40x faster)
  • Raw Dremio query (100,000 rows): 2.67s
  • Accelerated Dremio query with Spice: 0.01s (250x faster)
  • Federated aggregation across both: 0.009s
  • ' } /> This performance delta transforms multi-system queries from a batch job into something fast enough for an interactive app.

### Cookbook: How To Run It Yourself

Watch the video and follow along with the cookbook below to see how to fetch combined data from S3 Parquet, PostgreSQL, and Dremio in a single query.

Follow these steps to use Spice to federate SQL queries across data sources.

Step 1. Clone the github.com/spiceai/cookbook repo and navigate to the federation directory.

```bash
git clone https://github.com/spiceai/cookbook
cd cookbook/federation
```

Step 2. Initialize the Spice app. Use the default name by pressing enter when prompted.

```bash
spice init
name: (federation)?
```

Step 3. Log into the demo Dremio instance. Ensure this command is run in the federation directory.

```bash
spice login dremio -u demo -p demo1234
```

Step 4. Add the spiceai/fed-demo Spicepod from spicerack.org.

```bash
spice add spiceai/fed-demo
```

Step 5. Start the Spice runtime.

```bash
spice run
2025/01/27 11:36:41 INFO Checking for latest Spice runtime release...
2025/01/27 11:36:42 INFO Spice.ai runtime starting...
2025-01-27T19:36:43.199530Z INFO runtime::init::dataset: Initializing dataset dremio_source
2025-01-27T19:36:43.199589Z INFO runtime::init::dataset: Initializing dataset s3_source
2025-01-27T19:36:43.199709Z INFO runtime::init::dataset: Initializing dataset dremio_source_accelerated
2025-01-27T19:36:43.199537Z INFO runtime::init::dataset: Initializing dataset s3_source_accelerated
2025-01-27T19:36:43.201310Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-01-27T19:36:43.201625Z INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2025-01-27T19:36:43.205435Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-01-27T19:36:43.209349Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-01-27T19:36:43.401179Z INFO runtime::init::results_cache: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-01-27T19:36:43.624011Z INFO runtime::init::dataset: Dataset dremio_source_accelerated registered (dremio:datasets.taxi_trips), acceleration (arrow), results cache enabled.
2025-01-27T19:36:43.625619Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset dremio_source_accelerated
2025-01-27T19:36:43.776300Z INFO runtime::init::dataset: Dataset dremio_source registered (dremio:datasets.taxi_trips), results cache enabled.
2025-01-27T19:36:44.182533Z INFO runtime::init::dataset: Dataset s3_source registered (s3://spiceai-demo-datasets/cleaned_sales_data.parquet), results cache enabled.
2025-01-27T19:36:44.203734Z INFO runtime::init::dataset: Dataset s3_source_accelerated registered (s3://spiceai-demo-datasets/cleaned_sales_data.parquet), acceleration (sqlite), results cache enabled.
2025-01-27T19:36:44.205146Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset s3_source_accelerated
2025-01-27T19:36:45.138393Z INFO runtime::accelerated_table::refresh_task: Loaded 2,823 rows (1010.18 kiB) for dataset s3_source_accelerated in 933ms.
2025-01-27T19:36:46.313896Z INFO runtime::accelerated_table::refresh_task: Loaded 100,000 rows (27.91 MiB) for dataset dremio_source_accelerated in 2s 688ms.
```

Step 6. In another terminal window, start the Spice SQL REPL and perform the following SQL queries:

```sql
spice sql
-- Query the federated S3 source
select * from s3_source;
+--------------+------------------+------------+-------------------+---------+---------------------+---------+---------+-------+------+--------------+------+--------------+--------------------+------------------+-------------------------------+---------------+-------+-------+-------------+---------+-----------+-------------------+--------------------+-----------+
| order_number | quantity_ordered | price_each | order_line_number | sales   | order_date          | status  | quarter | month | year | product_line | msrp | product_code | customer_name      | phone            | address_line1                 | address_line2 | city  | state | postal_code | country | territory | contact_last_name | contact_first_name | deal_size |
+--------------+------------------+------------+-------------------+---------+---------------------+---------+---------+-------+------+--------------+------+--------------+--------------------+------------------+-------------------------------+---------------+-------+-------+-------------+---------+-----------+-------------------+--------------------+-----------+
| 10107        | 30               | 95.7       | 2                 | 2871.0  | 2003-02-24T00:00:00 | Shipped | 1       | 2     | 2003 | Motorcycles  | 95   | S10_1678     | Land of Toys Inc.  | 2125557818       | 897 Long Airport Avenue       |               | NYC   | NY    | 10022       | USA     |           | Yu                | Kwai               | Small     |
| 10121        | 34               | 81.35      | 5                 | 2765.9  | 2003-05-07T00:00:00 | Shipped | 2       | 5     | 2003 | Motorcycles  | 95   | S10_1678     | Reims Collectables | 26.47.1555       | 59 rue de l'Abbaye            |               | Reims |       | 51100       | France  | EMEA      | Henriot           | Paul               | Small     |
| 10134        | 41               | 94.74      | 2                 | 3884.34 | 2003-07-01T00:00:00 | Shipped | 3       | 7     | 2003 | Motorcycles  | 95   | S10_1678     | Lyon Souveniers    | +33 1 46 62 7555 | 27 rue du Colonel Pierre Avia |               | Paris |       | 75508       | France  | EMEA      | Da Cunha          | Daniel             | Medium    |
...
+--------------+------------------+------------+-------------------+---------+---------------------+---------+---------+-------+------+--------------+------+--------------+--------------------+------------------+-------------------------------+---------------+-------+-------+-------------+---------+-----------+-------------------+--------------------+-----------+
Time: 0.876282458 seconds. 500/2823 rows displayed.
```

```sql
-- Query the accelerated Dremio source
select * from dremio_source_accelerated;
```

Output:

```sql
+---------------------+-----------------+------------------+-------------+------------+--------------+
| pickup_datetime     | passenger_count | trip_distance_mi | fare_amount | tip_amount | total_amount |
+---------------------+-----------------+------------------+-------------+------------+--------------+
| 2013-08-22T08:24:12 | 1               | 1.1              | 7.5         | 0.0        | 8.0          |
| 2013-08-21T12:40:46 | 1               | 6.1              | 23.0        | 0.0        | 23.5         |
| 2013-08-24T00:40:17 | 2               | 0.6              | 4.5         | 0.0        | 5.5          |
...
+---------------------+-----------------+------------------+-------------+------------+--------------+
Time: 0.015666208 seconds. 500/100000 rows displayed.
```

```sql
-- Perform an aggregation query that combines data from S3 and Dremio
WITH all_sales AS (
  SELECT sales FROM s3_source_accelerated
  UNION ALL
  SELECT fare_amount + tip_amount AS sales FROM dremio_source_accelerated
)
SELECT
  SUM(sales) AS total_sales,
  COUNT(*) AS total_transactions,
  MAX(sales) AS max_sale,
  AVG(sales) AS avg_sale
FROM all_sales;
```

Output:

```sql
+--------------------+--------------------+----------+--------------------+
| total_sales        | total_transactions | max_sale | avg_sale           |
+--------------------+--------------------+----------+--------------------+
| 11501140.079999998 | 102823             | 14082.8  | 111.85376890384445 |
+--------------------+--------------------+----------+--------------------+
Time: 0.009526666 seconds. 1 rows.
```

### Closing Thoughts

Federated SQL with Spice gives development teams a faster, simpler way to work with distributed data, and allows enterprises to accommodate modern access patterns on top of their existing infrastructure investment. By eliminating ETL bottlenecks and enabling low-latency queries across multiple systems, Spice delivers consistent, high-performance access to data wherever it lives.

Clone the cookbook repo and give Spice federation a try!

### Getting Started with Spice

Spice is open source (Apache 2.0) and can be installed in less than a minute on macOS, Linux, or Windows, and also offers an enterprise-grade Cloud deployment.

  • Explore the open source docs and blog
  • Visit the getting started guide
  • Explore the 75+ cookbooks
  • Try Spice.ai Cloud for a fully managed deployment and get started for free.

## Frequently Asked Questions

### What is SQL query federation?

SQL query federation is the ability to execute a single SQL query across multiple data sources -- databases, data warehouses, data lakes, and APIs -- without moving the data first. Spice federates queries across [30+ connectors](/platform/sql-federation-acceleration) and returns results through a unified SQL interface, eliminating the need for ETL pipelines or data duplication.

### How does data acceleration differ from caching?

Data acceleration materializes selected datasets into a local engine (Arrow, DuckDB, SQLite, or [Cayenne](/blog/introducing-spice-cayenne-data-accelerator)) with configurable refresh policies, so queries always hit pre-loaded data. Traditional caching stores individual query results with TTL-based expiration. Acceleration provides consistent sub-millisecond performance for any query pattern against the accelerated dataset, not just previously executed queries.

### Can Spice query data without moving it from the source?

Yes. Spice's federation layer pushes query predicates down to source systems and returns only the matching rows. For data that is queried frequently, you can optionally enable acceleration to materialize a local copy with automatic refresh. Both modes avoid traditional ETL -- the data stays at the source or is managed declaratively by Spice.

### What data sources does Spice support for federation?

Spice supports over 30 data connectors including PostgreSQL, MySQL, DynamoDB, Snowflake, Databricks, S3 (Parquet, Iceberg, Delta Lake), ClickHouse, Dremio, SharePoint, GitHub, and more. New connectors are added regularly. See the full list in the [Spice documentation](https://spiceai.org/docs/components/data-connectors).

--- ## Spice.ai's approach to Time-Series AI URL: https://spice.ai/blog/spiceais-approach-to-time-series-ai Date: 2021-11-18T18:27:54 Description: Explore the challenges of time-series AI and why Spice.ai uses a data-driven reinforcement learning approach to help developers build adaptive, intelligent applications. The Spice.ai project strives to help developers build applications that leverage new AI advances which can be easily trained, deployed, and integrated. We recently introduced Spicepods: a declarative way to create AI applications with Spice.ai technology. While there are many libraries and platforms in the space, Spice.ai focuses on time-series data, aligning to application-centric and frequently time-dependent data, and a reinforcement learning approach, which can be more developer-friendly than expensive, labeled supervised learning.

This post will discuss some of the challenges and directions for the technology we are developing.

### Time Series

Figure 1. Time Series processing visualization: a time window is usually chosen to process part of the data stream

Time-series AI has become more popular in recent years, and there is extensive literature on the subject, including time-series-focused neural networks. Research in this space points to the likelihood that there is no silver bullet, and a single approach to time-series AI will not be sufficient. However, for developers, this can make building a product complex, as it comes with the challenge of exploring and evaluating many algorithms and approaches.

A fundamental challenge of time series is the data itself. The shape and length are usually variable and can even be infinite (real-time streams of data). The volume of data required is often too much for simple and efficient machine learning algorithms such as decision trees. This challenge makes deep learning popular for processing such data. Several types of neural networks have been shown to work well with time series, so let's review some of the common classes:

    " } />
  • Convolutional Neural Networks (CNN): CNN\'s can only accept data with fixed lengths: even with the ability to pad the data, this is a major drawback for time-series data as a specific time window needs to be decided. Despite this limitation, they are the most efficient network to train (computation, data needed, time) and usually the smallest storage. CNN\'s are very robust and used in image/video processing, making them a very good baseline to start with while also benefiting from refined and mature development over the years, such as with the very efficient MobileNet with depth-wise convolutions.
  • Recurrent Neural Networks (RNN): RNNs have been researched for several decades, and while they aren\'t as fast to train as CNNs, they can be faster to apply as there is no need to feed a time window like CNNs if the desired input/output is in real-time (in a continuous fashion, also called \'online). RNNs are proven to be very good in some situations, and many new models are being discovered.
  • Transformers: Most of the state-of-the-art results today have been made from transformers and their variations. They are very good at correlating sparse information. Popularized in the famous paper Attention is all you need, transformers are proven to be flexible with high-performance in many classes (Vision Transformers, Perceiver, etc.). They suffer the same limitation as CNNs for the length of their input (fixed at training time), but they also have a disadvantage of not scaling well with the size of the data (quadratic growth with the length of the time series). They are also the most expensive network to train in general.
  • ' } /> While not a complete representation of classes of neural networks, this list represents the areas of the most potential for Spice.ai's time-series AI technology. We also see other interesting paradigms to explore when improving the core technology like Memory Augmented Neural Networks (MANN) or neural network-based Genetical Algorithms.

    " } /> Reinforcement Learning' } /> Reinforcement Learning (RL) has grown steadily, especially in fields like robotics. Usually, RL doesn't require as much data processing as Supervised Learning, where large datasets can be demanding for hardware and people alike. RL is more dynamic: agents aren't trained to replicate a specific behaviors/output but explore and 'exploit' their environment to maximize a given reward.

    " } /> Most of today's research is based on environments the agent can interact with during the training process, known as online learning. Usually, efficient training processes have multiple agent/environment pairs training together and sharing their experiences. Having an environment for agents to interact enables different actions from the actual historical state known as on-policy learning, and using only past experiences without an environment is off-policy learning.

    " } /> AI training without interacting with the environment Figure 2. AI training without interacting with the environment (real world nor simulation). Only gathered data is used for training.

Spice.ai is initially taking an off-policy approach, where an environment (either pre-made or provided by the user) is not required. Despite limiting the exploration of agents, this aligns with an application-centric approach because:

  • Creating a real-world model or environment can be difficult and expensive, arguably even impossible.
  • Off-policy learning is normally more efficient than on-policy (time/data and computation).

The Spice.ai approach to time-series AI can be described as 'Data-Driven' Reinforcement Learning. This domain is very exciting, and we are building upon excellent research that is being published. The Berkeley Artificial Intelligence Research blog shows the potential of this field, and many other research entities have made great discoveries, like DeepMind, OpenAI, Facebook AI, and Google AI (among many others). We are inspired by and are building upon all the research in Reinforcement Learning to develop core Spice.ai technology.
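To make the off-policy idea concrete, here is a minimal sketch of Q-learning purely from logged experience, with no environment to interact with during training. The transition log, tabular state space, and hyperparameters are all illustrative assumptions, not Spice.ai's implementation:

```python
import numpy as np

# Logged transitions gathered from an application: (state, action, reward, next_state).
# This fixed log is the only training input -- no environment is queried.
log = [
    (0, 1, 0.3, 1),
    (1, 0, -0.1, 2),
    (2, 1, 0.5, 0),
]

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

# Off-policy temporal-difference updates: replay the log, bootstrapping from max Q.
for _ in range(500):
    for s, a, r, s_next in log:
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])

# The learned greedy policy recommends the highest-value action per state.
print(Q.argmax(axis=1))
```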

If you are interested in Reinforcement Learning, we recommend following these blogs, and if you'd like to partner with us on the mission of making it easier to build intelligent applications by leveraging RL, we invite you to chat with us on Slack or reach out on Twitter.

--- ## Spicepods: From Zero to Hero URL: https://spice.ai/blog/spicepods-from-zero-to-hero Date: 2021-12-02T18:41:30 Description: A step-by-step guide to authoring a Spicepod from scratch and using it to build an application that learns and adapts over time. In my previous post, Teaching Apps how to Learn with Spicepods, I introduced Spicepods as packages of configuration that describe an application's data-driven goals and how it should learn from data. To leverage Spice.ai in your application, you can author a Spicepod from scratch or build upon one fetched from the spicerack.org registry. In this post, we'll walk through the creation and authoring of a Spicepod step-by-step from scratch.

As a refresher, a Spicepod consists of:

  • A required YAML manifest that describes how the pod should learn from data
  • Optional seed data
  • Learned model/state
  • Performance telemetry and metrics

We'll create the Spicepod for the ServerOps Quickstart, an application that learns when to optimally run server maintenance operations based upon the CPU-usage patterns of a server machine.

We'll also use the Spice CLI, which you can install by following the Getting Started guide or the Getting Started YouTube video.

### Fast iterations

Modern web development workflows often include a file watcher for hot reload so you can iteratively see the effect of your changes with a live preview.

Spice.ai takes inspiration from this and enables a similar Spicepod manifest authoring experience. If you first start the Spice.ai runtime in your application root before creating your Spicepod, it will watch for changes and apply them continuously so that you can develop in a fast, iterative workflow.

You would normally do this by opening two terminal windows side-by-side, one that runs the runtime using the command spice run and one where you enter CLI commands. In addition, developers can open the Spice.ai dashboard located at http://localhost:8000 to preview changes they make.

Figure 1. Spice.ai's modern development workflow

### Creating a Spicepod

The easiest way to create a Spicepod is to use the Spice.ai CLI command spice init <Spicepod name>. We'll make one in the ServerOps Quickstart application called serverops.
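Concretely, run the init command from the application root:

```bash
spice init serverops
```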

    " } /> Figure 2. Creating a Spicepod.' } /> The CLI saves the Spicepod manifest file in the spicepods directory of your application. You can see it created a new serverops.yaml file, which should be included in your application and be committed to your source repository. Let's take a look at it.

    " } /> Figure 3. Spicepod manifest.' } /> The initialized manifest file is very simple. It contains a name and three main sections being:

  • dataspaces
  • actions
  • training

We'll walk through each of these in detail, and as a Spicepod author, you can always reference the documentation for the Spicepod manifest syntax.
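A rough sketch of the initialized manifest's shape (illustrative; the exact scaffolding may differ by CLI version):

```yaml
# spicepods/serverops.yaml -- illustrative shape of a freshly initialized manifest
name: serverops
dataspaces: []   # data sources are declared here
actions: []      # actions the application can take
training: {}     # rewards and training configuration
```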

### Authoring a Spicepod manifest

You author and edit Spicepod manifest files in your favorite text editor with a combination of Spice.ai CLI helper commands. We eventually plan to have a VS Code extension and dashboard/portal editing abilities to make this even easier.

### Adding a dataspace

To build an intelligent, data-driven application, we must first start with data.

A Spice.ai dataspace is a logical grouping of data with definitions of how that data should be loaded and processed, usually from a single source. A combination of its data source and its name identifies it, for example, nasdaq/msft or twitter/tweets. Read more about Dataspaces in the Core Concepts documentation.

Let's add a dataspace to the Spicepod manifest to load CPU metric data from a CSV file. This file is a snapshot of data from InfluxDB, a time-series database we like.

Figure 4. Adding a dataspace.

We can see this dataspace is identified by its source hostmetrics and name cpu. It includes a data section with a file data connector, the path to the file, and a data processor that knows how to process it. In addition, it defines a single measurement, usage_idle, under the measurements section, which is a measurement of CPU load. In Spice.ai, measurements are the core primitive the AI engine uses to learn and are always numerical data. Spice.ai includes a growing library of community-contributed data connectors and data processors you can include in your Spicepod to access data. You can also contribute your own.

Finally, because the data is a snapshot of live data loaded from a file, we must set a Spicepod epoch_time that defines the data's start Unix time.

    " } /> Now we have a dataspace, called hostmetrics/cpu, that loads CSV data from a file and processes the data into a usage_idle measurement. The file connector might be swapped out with the InfluxDB connector in a production application to stream real-time CPU metrics into Spice.ai. And in addition, applications can always send real-time data to the Spice.ai runtime through its API with a simple HTTP POST (and in the future, using Web Sockets and gRPC).

### Adding actions

Now that the Spicepod has data, let's define some data-driven actions so the ServerOps application can learn when is the best time to take them. We'll add three actions using the CLI helper command spice action add.

    " } /> Figure 5. Adding actions.' } /> And in the manifest:

Figure 6. Actions added to the manifest

### Adding rewards

The Spicepod now has data and possible actions, so we can now define how it should learn when to take them. Similar to how humans learn, we can set rewards or punishments for actions taken based on their effect and the data. Let's add scaffold rewards for all actions using the spice rewards add command.

    " } /> Figure 7. Adding rewards' } /> We now have rewards set for each action. The rewards are uniform (all the same), meaning the Spicepod is rewarded the same for each action. Higher rewards are better, so if we change perform_maintenance to 2, the Spicepod will learn to perform maintenance more often than the other actions. Of course, instead of setting these arbitrarily, we want to learn from data, and we can do that by referencing the state of data at each time-step in the time-series data as the AI engine trains.

Figure 8. Rewards added to the manifest

The rewards themselves are just code. Currently, we support Python code, either inline or in an external .py code file, and we plan to support several other languages. The reward code can access the time-step state through the prev_state and new_state variables and the dataspace name. For the full documentation, see Rewards.

Let's add this reward code to perform_maintenance, which will reward performing maintenance when there is low CPU usage.

    " } /> cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
    cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
    cpu_usage_delta = cpu_usage_prev - cpu_usage_new
    reward = cpu_usage_delta / 100
This code takes the CPU usage (100 minus the idle time) deltas between the previous time state and the current time state, and sets the reward to the delta normalized to between -1 and 1. When the CPU usage is moving from a higher cpu_usage_prev to a lower cpu_usage_new, it's a better time to run server maintenance, so we reward the inverse of the delta, e.g. 80% - 50% = 30% = 0.3. However, if the CPU moves from lower to higher, 50% - 80% = -30% = -0.3, it's a bad time to run maintenance, so we provide a negative reward or "punish" the action.

Figure 9. Reward code

Through these rewards and punishments and the CPU metric data, the Spicepod will learn when it is a good time to perform maintenance and can be the decision engine for the ServerOps application. You might be thinking you could write code without AI to do this, which is true, but handling the variety of cases, like CPU spikes, or patterns in the data, like cyclical server load, would take a lot of code and development time. Applying AI helps you build faster.

### Putting it all together

The manifest now has defined data, actions, and rewards. Using this data, the Spicepod can learn which actions to take and when, based on the rewards provided.

If the Spice.ai runtime is running, the Spicepod automatically trains each time the manifest file is saved. As this happens, reward performance can be monitored in the dashboard.

Once a training run completes, the application can query the Spicepod for a decision recommendation by calling the recommendations API at http://localhost:8000/api/v0.1/pods/serverops/recommendation. The API returns a JSON document that provides the recommended action, the confidence of taking that action, and when that recommendation is valid.
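A response of this shape might look like the following; the field names and values are illustrative, not the exact API contract:

```json
{
  "action": "perform_maintenance",
  "confidence": 0.87,
  "start": 1605313800,
  "end": 1605314700
}
```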

In the ServerOps Quickstart, this API is called from the server maintenance PowerShell script to make an intelligent decision on when to run maintenance. The ServerOps Sample, which uses live data, can be continuously trained to learn and adapt even as the live data changes due to shifting load patterns.

The full Spicepod manifest from this walkthrough can be added from spicerack.org using the spice add quickstarts/serverops command:
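```bash
spice add quickstarts/serverops
```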

### Summary

Leveraging Spice.ai as the decision engine for your server maintenance application helps you build smarter applications faster, applications that will continue to learn and adapt even as usage patterns change over time.

### Learn more and contribute

Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," join our community Slack, or reach out on Twitter.

We are just getting started! 🚀

Luke

--- ## Teaching Apps how to Learn with Spicepods URL: https://spice.ai/blog/teaching-apps-how-to-learn-with-spicepods Date: 2021-11-15T18:20:37 Description: Learn how Spicepods define application goals, rewards, and learning behavior - making it easy for developers to build applications that learn and adapt over time. The last post in this series, Making Apps that Learn and Adapt, described the shift from building AI/ML solutions to building apps that learn and adapt. But how does the app learn? And as a developer, how do I teach it what it should learn?

With Spice.ai, we teach the app how to learn using a Spicepod.

Imagine you own a restaurant. You created a menu, hired staff, constructed the kitchen and dining room, and got off to a great start when it first opened. However, over the years, your customers' tastes changed, you've had to make compromises on ingredients, and there's a hot new place down the street... business is stagnating, and you know that you need to make some changes to stay competitive.

    " } /> You have a few options. First, you could gather all the data, such as customer surveyss, seasonal produce metrics, and staff performance profiles. You may even hire outside consultants. You then take this data to your office, and after spending some time organizing, filtering, and collating it, you've discovered an insight! Your seafood dishes sell poorly and cost the most... you are losing money! You spend several weeks or months perfecting a new menu, which you roll out with much fanfare! And then... business is still poor. What!? How could this be? It was a data-driven approach! You start the process again. While this approach is a worthy option, it has long latency from data to learning to implementation.

    " } /> Another option is to build real-time learning and adaption directly into the restaurant. Imagine a staff member whose sole job was learning and adapting how the restaurant should operate; lets name them Blue. You write a guide for Blue that defines certain goal metrics, like customer food ratings, staff happiness, and of course, profit. Blue tracks each dish served, from start to finish, from who prepared it to its temperature, its costs, and its final customer taste rating. Blue not only learns from each customer review as each dish is consumed but also how dish preparation affects other goal metrics, like profitability. The restaurant staff consults Blue to determine any adjustments to improve goal metrics as they work. The latency from data to learning, to adaption, has been reduced, from weeks or months to minutes. This option, of course, is not feasible for most restaurants, but software applications can use this approach. Blue and his instructions are analogous to the Spice.ai runtime and manifest.

In the Spice.ai model, developers teach the app how to learn by describing goals and rewarding its actions, much like how a parent might teach a child. As these rewards are applied in training, the app learns what actions maximize its rewards towards the defined goals.

Returning to the restaurant example, you can think of the Spice.ai runtime as Blue, and Spicepod manifests as the guide on how Blue should learn. Individual staff members would consult with Blue for ongoing recommendations on decisions to make and how to act. These goals and rewards are defined in Spicepods, or "pods" for short. Spicepods are packages of configuration that describe the application's goals and how it should learn from data. Although it's not a direct analogy, Spicepods and their manifests can be conceptualized similarly to Docker containers and Dockerfiles: whereas Dockerfiles define the packaging of your app, Spicepods specify the packaging of your app's learning and data.

    " } /> Anatomy of a Spicepod' } /> A Spicepod consists of:

  • A required YAML manifest that describes how the pod should learn from data
  • Optional seed data
  • Learned model/state
  • Performance telemetry and metrics

Developers author Spicepods using a spice CLI command such as spice pod init <name>, or simply by creating a manifest file such as mypod.yaml in the spicepods directory of their application.

### Spicepods as packages

On disk, Spicepods are generally layouts of a manifest file, seed data, and trained models, but they can also be exported as zipped packages.

When the runtime exports a Spicepod using the spice export command, it is saved with a .spicepod extension. It can then be shared, archived, or imported into another instance of the Spice.ai runtime.

Soon, we also expect to enable publishing of .spicepods to spicerack.org, from where community-created Spicepods can easily be added to your application using spice add <pod name> (currently, only Spice AI-published pods are available on spicerack.org).

Treating Spicepods as packages and enabling their sharing and distribution through spicerack.org will help developers share their "restaurant guides" and build upon each other's work, much like they do with npmjs.org or pypi.org. In this way, developers can together build better and more intelligent applications.

In the next post, we'll dive deeper into authoring a Spicepod manifest to create an intelligent application. Follow @spice_ai on Twitter to get an update when we post.

If you haven't already, read the first post in the series, Making Apps that Learn and Adapt.

### Learn more and contribute

Building intelligent apps that leverage AI is still way too hard, even for advanced developers. Our mission is to make this as easy as creating a modern web page. If the vision resonates with you, join us!

Our Spice.ai Roadmap is public, and now that we have launched, the project and work are open for collaboration.

If you are interested in partnering, we'd love to talk. Try out Spice.ai, email us "hey," join our community Slack, or reach out on Twitter.

We are just getting started! 🚀

Luke

--- ## True Hybrid Search: Vector, Full-Text, and SQL in One Runtime URL: https://spice.ai/blog/true-hybrid-search Date: 2025-09-22T17:46:00 Description: Build hybrid search without managing multiple systems. Query vectors, run full-text search, and execute SQL in one unified runtime. TL;DR:
  • The success of enterprise AI projects boils down to search and retrieval. In order to deliver production-grade AI apps, developers need tools that efficiently access data across distributed systems.
  • Modern AI apps demand hybrid search across structured, unstructured, and vectorized data. Today's fragmented stacks introduce latency, complexity, and risk.
  • Spice solves the search challenge with its hybrid SQL search functionality that natively combines keyword/text search, relational filters, and vector similarity in one query.
  • Hybrid search alone isn't enough. To deliver production-grade AI, it must be combined with query federation, acceleration, and inference. Spice AI unifies query, search, and inference in a single, open-source runtime that queries data in place - across databases, warehouses, and object stores like S3 - while accelerating the slowest layers.
  • Spice operationalizes object stores for AI, making S3, ADLS, and GCS fast enough for semantic and hybrid search without duplicating data into specialized systems.

### Show Me the (Data)!

It's well established (and maybe even trite) to say that enterprises are going all-in on artificial intelligence, with more than $40 billion directed toward generative AI projects in recent years.

    " } /> The initial results have been underwhelming. A recent study from the Massachusetts Institute of Technology\'s NANDA initiative concluded that despite the enormous allocation of capital, 95% of enterprises have seen no measurable ROI from their AI initiatives. Only 5% of custom AI pilots ever make it to production, and just a fraction of those deliver meaningful business outcomes.

AI models are only as effective as the context (data) they can retrieve, but enterprise data is scattered across transactional databases, analytical platforms, object stores, and countless other data sources. To accommodate this fragmentation, enterprises have historically relied on fragile ETL pipelines, complex integrations, and siloed point solutions like search engines or vector databases. Each new layer introduces latency, operational overhead, and security risk. The proof is reflected in the results of MIT's study: instead of accelerating innovation, these pipelines are slowing it down. Enterprise-grade AI applications fail not necessarily due to the underlying models or because they can't be built, but because they can't reliably access the data they need.

    " } /> The Search Imperative'} /> Enterprise AI is fundamentally a search and retrieval problem. Applications and agents must be able to reach across structured and unstructured data, apply both exact filters and semantic search, and return the right context in milliseconds.

This requires three retrieval paradigms:

  • Exact filtering: Exact matches and filters on structured, relational, or semi-structured data for fields, timestamps, and metadata.
  • Keyword search: Keyword discovery in unstructured text like documents and logs, ranked on term frequency and proximity.
  • Semantic search: Vector embeddings for semantic similarity across any data type, from tables to narratives, beyond literal terms.

Large organizations have worked around this problem by standing up multiple specialized systems: a relational database for queries, a search engine for text, a vector store for embeddings, etc. They then build pipelines to keep those systems in sync, often resulting in complexity and fragility.

Meanwhile, object storage offerings like Amazon S3, Azure Blob, and Google Cloud Storage serve as the system of record for massive volumes of enterprise data - and thus are excellent data sources for AI applications. However, they were never designed for low-latency retrieval. Even with the emergence of open formats like Parquet, Iceberg, and Delta, raw performance lags far behind what modern AI applications require.

This ultimately leaves enterprise developers in a catch-22: either they duplicate data into faster but more expensive systems, or they accept latency that makes real-time AI use cases impractical.

### An Emerging Data Convergence

Three industry shifts are now reshaping this landscape and enabling a new generation of applications:

  • Object stores are becoming queryable. With Iceberg, Delta, Hudi, and now S3 Vectors, they are evolving into platforms for active workloads, not just cold storage.
  • AI workloads are inherently hybrid. They need structured data for grounding, unstructured text for context, and embeddings for semantics. No single monolithic database can meet these needs.
  • Enterprises are under pressure to simplify. The cost of maintaining separate systems for query, search, and vector retrieval is too high - in dollars, in complexity, and in security risk.

Taken together, these shifts mandate a new substrate that unifies search, query federation, and inference across all enterprise data.

### Spice.ai: From Fragmented Data to Unified Intelligence

Spice.ai was purpose-built to address this challenge, offering a data and AI platform that combines hybrid search (vector, full-text, and keyword) with query federation, acceleration, and LLM inference in one engine.

Where other vendors solve a piece of the problem, Spice addresses the full lifecycle. For developers, this means one query interface replacing three. A hybrid search against structured, unstructured, and vectorized data can be expressed in a single SQL statement, with Spice abstracting all of the orchestration. Applications that once required stitching together a handful of different systems can now be built against one, and results that once took minutes arrive in milliseconds.

Take the below Spice query as an example: one SQL statement combines vector search, full-text search, and temporal and lexical filtering.

Figure 1: Hybrid SQL search query
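A hybrid query of this kind can be sketched as follows; the table, columns, and scoring functions are hypothetical illustrations of the pattern, not Spice's exact SQL API:

```sql
-- Illustrative hybrid retrieval pattern (hypothetical schema and function names):
-- blend full-text relevance and vector similarity, constrained by SQL filters.
SELECT
  d.id,
  d.title,
  0.5 * text_score(d.body, 'quarterly revenue guidance')   -- keyword relevance (hypothetical fn)
    + 0.5 * cosine_similarity(                              -- semantic similarity (hypothetical fn)
        d.body_embedding,
        embed('quarterly revenue guidance'))
    AS hybrid_score
FROM documents d
WHERE d.created_at >= NOW() - INTERVAL '90 days'            -- temporal filter
  AND d.category = 'earnings'                               -- lexical/relational filter
ORDER BY hybrid_score DESC
LIMIT 10;
```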
But search alone isn't enough. Enterprise AI applications also need low-latency retrieval across data systems when integrating with LLMs. That's why Spice pairs its search engine with query federation, acceleration, and AI inference in a single, deploy-anywhere runtime. Instead of moving data, Spice queries it where it resides - across OLTP databases, OLAP warehouses, and object stores like S3 - while accelerating the slowest layers through caching and materialization for sub-second access. This makes it possible to serve data and AI-powered experiences directly from your existing systems - securely, at low latency, and without costly re-engineering.

    " } /> In turn, this architecture turns object stores from passive archives into active, AI-ready data layers. With the introduction of native support for Amazon S3 Vectors, Spice can even create, store, and query embeddings natively in S3, making semantic search as accessible as traditional text search or object retrieval.

Enterprises eliminate the operational drag of ETL pipelines and duplicated data stores, security improves because sensitive databases are never directly exposed to AI agents, and, perhaps most importantly, AI apps that previously failed for lack of reliable retrieval are now viable.

### Architecture Overview

Spice's lightweight compute engine is designed to integrate directly into your application stack and is built on a fully open-source foundation. It embeds the DataFusion SQL query engine, supports over 35 open-source connectors, and can locally materialize data for acceleration using DuckDB, SQLite, or PostgreSQL.

    " } /> Core components of Spice:

  • Search & Retrieval: Vector, hybrid, and full-text search across structured and unstructured data.
  • Federated SQL Engine: Execute queries across disparate data sources.
  • Acceleration Engine: Materialize and pre-cache data for millisecond access.
  • LLM Inference: Load models locally or use Spice as a router to hosted AI platforms like OpenAI, Anthropic, Bedrock, or NVIDIA NIM.

Figure 2: Spice.ai Compute Engine
Spice can be deployed:

  • As a sidecar: Co-located with your application for ultra-low latency.
  • As a shared service: Centralized deployment serving multiple applications.
  • At the edge: Serve data and AI capabilities as close as possible to the user.
  • In the cloud: Fully managed via the Spice.ai Cloud Platform.

This deployment flexibility ensures Spice fits your application architecture and compliance requirements.

### Applications Built on Spice

Whether you're powering a RAG-enabled customer service bot, an internal AI agent, or a high-throughput transactional API, the foundational challenge is the same: getting fast, secure access to the right data at the right time.

    " } /> Spice solves that challenge once, and lets you deploy the same solution for every workload. Spice enables three primary categories of applications: AI Apps, RAG Apps, and Data Apps. 

### 1. AI Apps

AI models are only as good as the context they can access. As discussed, most enterprise AI efforts fail because retrieval is too slow, incomplete, or insecure.

Key benefits of using Spice for AI Apps include:

  • Ground LLMs in enterprise data without moving data into separate stores.
  • Retrieve structured and unstructured data in real time, with hybrid SQL search.
  • Keep sensitive systems secure by acting as a containerized execution layer, so AI agents never query a production database directly.

Figure 3: AI app architecture
### 2. RAG Apps

Traditional RAG (retrieval-augmented generation) is powerful but fragile - performance, relevance, and freshness/veracity all depend on the retrieval layer.

Key benefits of using Spice for RAG Apps include:

  • Unified retrieval that spans transactional, analytical, and object store data sources.
  • Low-latency query performance for interactive AI experiences.
  • Dynamic materializations that refresh context automatically, ensuring AI agents always have the latest data without paying a tax on re-ingestion.

Figure 4: Hybrid search architecture
### 3. Data Apps

Modern data applications often need to unify data from multiple disparate systems, deliver low-latency results, and scale globally without constant ETL jobs or manual integrations.

Key benefits of using Spice for Data Apps include:

  • Federated SQL queries across OLTP, OLAP, and object store systems without pre-processing or data duplication.
  • Accelerated database and object store queries to sub-second speeds, enabling a "Database CDN" model where working datasets are staged close to the application.
  • Real-time updates via change data capture (CDC), intervals, or event triggers, so apps never query stale data.

Figure 5: Federation + Acceleration in Spice
### The Path Forward with Spice

The boundaries between operational databases, analytics, and object stores are dissolving, and AI applications demand all three together in real time.

Traditional approaches, anchored on moving and transforming data between systems, can't keep up with these demands. Spice makes search and retrieval reliable, fast, and unified - turning fragmented data into a single searchable layer. With sub-second access to transactional, analytical, and object data, enterprises can finally deliver intelligent applications at scale.

    " } /> Getting Started'} /> Spice is open source (Apache 2.0) and can be installed in less than a minute on macOS, Linux, or Windows:

  • Explore the open source docs and blog for cookbooks and integration examples.
  • Visit the getting started guide.
  • Explore the 70+ cookbooks.
  • Try Spice.ai Cloud for a fully managed deployment and get started for free.

--- ## What Data Informs AI-driven Decision Making? URL: https://spice.ai/blog/what-data-informs-ai-driven-decision-making Date: 2022-01-04T05:36:13 Description: Learn the three classes of data required for intelligent decision-making and how Spice.ai simplifies runtime data engineering for AI-powered applications. AI unlocks a new generation of intelligent applications that learn and adapt from data. These applications use machine learning (ML) to outperform traditionally developed software. However, the data engineering required to leverage ML is a significant challenge for many product teams. In this post, we'll explore the three classes of data you need to build next-generation applications and how Spice.ai handles runtime data engineering for you.

While ML has many different applications, one way to think about ML in a real-time application that can adapt is as a decision engine. Phillip discussed decision engines and their potential uses in A New Class of Applications That Learn and Adapt. This decision engine learns from data and informs the application how to operate. Of course, applications can and do make decisions without ML, but a developer normally has to code that logic, and the intelligence of that code is fixed, whereas ML enables a machine to constantly find the appropriate logic and evolve the code as it learns. For ML to do this, it needs three classes of data.

### The three classes of data for informed decision making

We don't want just any decision, though. We want high-quality, informed decisions. If you consider making higher quality, informed decisions over time, you need three classes of information: historical information, real-time or present information, and the results of your decisions.

    " } /> Especially recently, stock or crypto trading is something many of us can relate to. To make high-quality, informed investing decisions, you first need general historical information on the price, security, financials, industry, previous trades, etc. You study this information and learn what might make a good investment or trade.

Second, you need a real-time, continuously updated stream of data to make a decision. If you were stock trading, this information might be the stock price on the day or hour you want to make the trade. You need to apply what you learned from historical data to the current information to decide what trade to place.

Finally, if we're going to make better decisions over time, we need to capture and learn from the results of those decisions. Whether you make a great or poor trade, you want to incorporate that experience into your historical learning.

    " } /> Using all three data classes together results in higher quality decisions over time. Broad data across these classes are useful, and we could make some nice trades with that. Still, we can make an even higher quality trading decision with personal context. For example, we may want to consider the individual tax consequences or risk level of the trade for our situation. So each of these classes also comes with global or local variants. We combine global information, like what worked well for everyone, and local experience, what worked well for us and our situation, to make the best, overall informed decision.

### The waterfall approach to data engineering

Consider how you would capture these three data classes and make them available to both the application and ML in the trading example. This data engineering can be a pretty big challenge.

First, you need a way to gather and consume historical information, like stock prices, and keep it updated over time. You need to handle streaming constantly updated real-time data to make runtime decisions on how to operate. You need to capture and match the decisions you make and feed them back into learning. And finally, you need a way to provide personal or local context, like holding off on sell trades until next year to stay within a tax threshold, or identifying a pattern you like to trade. If all this wasn't enough, as we learned from Phillip's AI needs AI-ready data post, all three data classes need to be in a format that ML can use.

If you can afford a data or ML team, they may do much of this for you. However, this model starts to look quite waterfall-like and is not well suited to applications that want to learn and adapt in real time. As in a waterfall approach, you would provide requirements to your data team, and they would do the data engineering required to provide you with the first two classes of data, historical and real-time. They may give you ML-ready data or train an ML model for you. However, there is often a large latency in applying that data or model in your application, and a long turnaround time if it does not meet your requirements. In addition, to capture the third class of data, you would need to capture and send the results of the decisions your application made using those models back to the data team to incorporate into future learning. This latency through the data, decision-making, learning, and adaptation process is often infeasible for a real-world app.

And if you can't afford a data team, you have to figure out how to do all that yourself.

    " } /> The agile approach' } /> Modern software engineering practices have favored agile methodologies to reduce time to learn and adapt applications to customer and business needs. Spice.ai takes inspiration from agile methods to provide developers with a fast, iterative development cycle.

Spice.ai provides mechanisms for making all three classes of data available to both the application and the decision engine. Developers author Spicepods declaring how data should be captured, consumed, and made ML-ready so that all three classes are consistent and available to ML.

The Spice.ai runtime exposes developer-friendly APIs and data connectors for capturing and consuming data and annotating that data with personal context. The runtime generates AI-ready data for you and makes it available directly for ML. These APIs also make it easy to capture application decisions and incorporate the resulting learning.

The Spice.ai approach short-circuits the traditional waterfall-like data process by keeping as much data as possible local to the application instead of round-tripping through an external pipeline or team, which is especially valuable for real-time data. The application can learn and adapt faster by reducing the latency from decision consequences to learning.

Spice.ai enables personalized learning from personal context and experiences through the interpretations mechanism. Interpretations allow an application to provide additional information, or an "interpretation," of a time range as input to learning. In the trading example, this could be as simple as labeling a time range as a good time to buy, or providing additional contextual information such as tax considerations. Developers can also use interpretations to record the results of decisions with more context than what might be available in the observation space. You can read more about Interpretations in the Spice.ai docs.

While Spice.ai focuses on ensuring consistent, ML-ready data is available, it does not replace traditional data systems or teams. They still have their place, especially for large historical datasets, and Spice.ai can consume data produced by them. Where possible, especially for application and real-time data, Spice.ai keeps runtime data local to create a virtuous cycle of data from the application to the decision engine and back again, enabling faster and more agile learning and adaptation.

### Summary

In summary, building an intelligent application driven by AI-recommended decisions can require a significant amount of data engineering to learn, make decisions, and incorporate the results. The Spice.ai runtime enables you as a developer to focus on consuming those decisions and tuning how the AI engine should learn, rather than on runtime data engineering.

The potential of the next generation of intelligent applications to improve the quality of our lives is very exciting. Using AI to help applications make better decisions, whether that be AI-assisted investing, improving the energy efficiency of our homes and buildings, or supporting us in deciding on the most appropriate medical treatment, is very promising.

### Learn more and contribute

Even for advanced developers, building intelligent apps that leverage AI is still way too hard. Our mission is to make this as easy as creating a modern web page. If that vision resonates with you, join us!

If you want to get involved, we'd love to talk. Try out Spice.ai, email us "hey," join our community Slack, or reach out on Twitter.

Luke

---

## Write to Apache Iceberg Tables with SQL in Spice

URL: https://spice.ai/blog/write-to-apache-iceberg-tables-with-sql
Date: 2025-11-18T23:27:09
Description: Spice v1.8 adds native Apache Iceberg write support with standard SQL INSERT INTO statements. Build complete data workflows without ETL - query, accelerate, and write from one runtime.

**TL;DR:** Spice v1.8 adds native [Apache Iceberg](https://iceberg.apache.org/) write support using standard SQL `INSERT INTO` statements. Write query results, transformed data, or new records directly to Iceberg tables from the same runtime used for [federation and acceleration](/platform/sql-federation-acceleration) -- no separate ETL pipeline required.

---

With the release of Spice v1.8, developers can now write directly to Apache Iceberg tables and catalogs using standard SQL `INSERT INTO` statements.

This feature extends Spice's SQL federation capabilities beyond reads, enabling data ingestion, transformation, and pipeline workloads to write results back into Iceberg directly from the same runtime used for queries and acceleration.

What sets Spice apart from other query engines is its broader, application-focused feature set designed for modern data and AI workloads. Spice brings together federation, hybrid search, embedded LLM inference, and now native writes in one unified runtime - enabling teams to build complete, end-to-end workflows without the management overhead and performance concessions of using multiple systems.

Iceberg write support is available in preview, with append-only operations and schema validation for secure and predictable data management.

## From Read-Only Federation to Full Data Workflows

Data teams are standardizing on open table formats like Apache Iceberg to unify analytical and operational data across systems. Iceberg offers a consistent way to store, version, and manage data across different engines and clouds, helping teams avoid vendor lock-in while maintaining strong governance and interoperability.

Supporting Iceberg writes natively inside Spice means development teams can:

- **Write directly to Iceberg without ETL:** Insert data into Iceberg straight from SQL queries.
- **Simplify ingestion paths:** Load transformed or federated data into Iceberg without separate tools.
- **Enforce governance:** Maintain schema validation and secure access through `read_write` permissions.

Paired with Spice's built-in SQL federation and acceleration, these write capabilities make it easier to use Iceberg not just as a storage solution, but as a queryable data layer for both operational and AI workloads.

## How It Works

Spice supports `INSERT INTO` statements on Iceberg tables and catalogs explicitly marked as `read_write`.

Example Spicepod configuration:

```yaml
catalogs:
  - from: iceberg:http://localhost:8181/v1/namespaces
    access: read_write
    name: ice
    params:
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_access_key_id: admin
      iceberg_s3_secret_access_key: password
      iceberg_s3_region: us-east-1
```

And here's an example SQL query:

    "} /> ```python -- Insert from another table INSERT INTO iceberg_table SELECT * FROM existing_table; -- Insert with values INSERT INTO iceberg_table (id, name, amount) VALUES (1, 'John', 100.0), (2, 'Jane', 200.0); -- Insert into catalog table INSERT INTO ice.sales.transactions VALUES (1001, '2025-01-15', 299.99, 'completed'); ``` Support for updates, deletes, and merges will be added in future releases.

Now, let's walk through an end-to-end workflow demonstrating how to execute Iceberg writes in Spice.

    " } /> Write to Iceberg Tables with Spice Cookbook' } /> Prerequisites: 

- Access to an Iceberg catalog, or Docker to run an Iceberg catalog locally.
- Spice is installed (see the Getting Started documentation).

### Step 1. Create a new directory and initialize a Spicepod

```bash
mkdir iceberg-catalog-recipe
cd iceberg-catalog-recipe
spice init
```

### Step 2. Run the Docker container for the Iceberg catalog

In a separate terminal, clone the cookbook repository and run the Docker container for the Iceberg catalog.

```bash
git clone https://github.com/spiceai/cookbook.git
cd cookbook/catalogs/iceberg
docker compose up -d
```

### Step 3. Add the Iceberg Catalog Connector to your Spicepod

```yaml
catalogs:
  - from: iceberg:http://localhost:8181/v1/namespaces
    # access: read_write
    name: ice
    params:
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_access_key_id: admin
      iceberg_s3_secret_access_key: password
      iceberg_s3_region: us-east-1
```

### Step 4. Run Spice

```bash
spice run
2025/01/27 11:08:36 INFO Checking for latest Spice runtime release...
2025/01/27 11:08:37 INFO Spice.ai runtime starting...
2025-01-27T19:08:37.494155Z INFO runtime::init::dataset: No datasets were configured. If this is unexpected, check the Spicepod configuration.
2025-01-27T19:08:37.494905Z INFO runtime::init::catalog: Registering catalog 'ice' for iceberg
2025-01-27T19:08:37.499162Z INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2025-01-27T19:08:37.499174Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-01-27T19:08:37.500689Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-01-27T19:08:37.503376Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-01-27T19:08:37.696469Z INFO runtime::init::results_cache: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-01-27T19:08:37.697178Z INFO runtime::init::catalog: Registered catalog 'ice' with 1 schema and 8 tables
```

### Step 5. Query the Iceberg catalog

```bash
spice sql
sql> show tables;
+---------------+--------------+--------------+------------+
| table_catalog | table_schema | table_name   | table_type |
+---------------+--------------+--------------+------------+
| ice           | tpch_sf1     | lineitem     | BASE TABLE |
| ice           | tpch_sf1     | nation       | BASE TABLE |
| ice           | tpch_sf1     | orders       | BASE TABLE |
| ice           | tpch_sf1     | supplier     | BASE TABLE |
| ice           | tpch_sf1     | customer     | BASE TABLE |
| ice           | tpch_sf1     | partsupp     | BASE TABLE |
| ice           | tpch_sf1     | region       | BASE TABLE |
| ice           | tpch_sf1     | part         | BASE TABLE |
| spice         | runtime      | task_history | BASE TABLE |
| spice         | runtime      | metrics      | BASE TABLE |
+---------------+--------------+--------------+------------+
```

Run the Pricing Summary Report Query (Q1). More information about TPC-H and all the queries involved can be found in the official TPC Benchmark H Standard Specification.

```sql
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    ice.tpch_sf1.lineitem
where
    l_shipdate <= date '1998-12-01' - interval '110' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
```

Output:

```
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty     | sum_base_price  | sum_disc_price    | sum_charge          | avg_qty   | avg_price    | avg_disc | count_order |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| A            | F            | 37734107.00 | 56586554400.73  | 53758257134.8700  | 55909065222.827692  | 25.522005 | 38273.129734 | 0.049985 | 1478493     |
| N            | F            | 991417.00   | 1487504710.38   | 1413082168.0541   | 1469649223.194375   | 25.516471 | 38284.467760 | 0.050093 | 38854       |
| N            | O            | 73416597.00 | 110112303006.41 | 104608220776.3836 | 108796375788.183317 | 25.502437 | 38249.282778 | 0.049996 | 2878807     |
| R            | F            | 37719753.00 | 56568041380.90  | 53741292684.6040  | 55889619119.831932  | 25.505793 | 38250.854626 | 0.050009 | 1478870     |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+

Time: 0.186233833 seconds. 10 rows.
```

### Step 6. Write to Iceberg tables

To enable write operations to Iceberg tables, uncomment the `access: read_write` configuration and restart Spice.

#### 6.1. Update the Spicepod configuration

Edit the `spicepod.yaml` file to uncomment the access line:

```yaml
catalogs:
  - from: iceberg:http://localhost:8181/v1/namespaces
    access: read_write # Uncomment this line
    name: ice
    params:
      iceberg_s3_endpoint: http://localhost:9000
      iceberg_s3_access_key_id: admin
      iceberg_s3_secret_access_key: password
      iceberg_s3_region: us-east-1
```

#### 6.2. Restart Spice

Stop the current Spice instance (Ctrl+C) and restart it:

```bash
spice run
```

#### 6.3. Insert data into Iceberg tables

Now you can write data to the Iceberg tables using SQL `INSERT` statements:

```bash
spice sql
```

Example: Insert a new region into the region table:

```sql
INSERT INTO ice.tpch_sf1.region (r_regionkey, r_name, r_comment)
VALUES (5, 'ANTARCTICA', 'A cold and remote region');
```

```
+-------+
| count |
+-------+
| 1     |
+-------+
```

Example: Insert a new nation into the nation table:

```sql
INSERT INTO ice.tpch_sf1.nation (n_nationkey, n_name, n_regionkey, n_comment)
VALUES (25, 'PENGUINIA', 5, 'A vibrant home for brave penguins in Antarctica');
```

```
+-------+
| count |
+-------+
| 1     |
+-------+
```

Verify the inserts by querying the tables:

```sql
SELECT * FROM ice.tpch_sf1.region WHERE r_regionkey = 5;
SELECT * FROM ice.tpch_sf1.nation WHERE n_nationkey = 25;
```

### Step 7. View the Iceberg tables in MinIO

Navigate to http://localhost:9001 and log in with admin and password. View the iceberg bucket to see the created Iceberg tables.

### Step 8. Clean up

```bash
docker compose down --volumes --rmi local
```

### Next steps with Iceberg writes in Spice

Iceberg write support is available in preview. See the Iceberg connector docs for configuration details and try the Iceberg Catalog Connector recipe to get started.
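Applications can also query the runtime programmatically over Arrow Flight SQL (the Flight endpoint is shown listening on port 50051 in the startup logs above). A minimal Python sketch, assuming the ADBC Flight SQL driver and the default local ports:

```python
# pip install adbc-driver-flightsql pyarrow
# Sketch: query a locally running Spice runtime over Arrow Flight SQL.
# grpc://localhost:50051 matches the Flight port from the startup logs
# above; adjust for your deployment.
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect("grpc://localhost:50051")
cur = conn.cursor()
cur.execute("SELECT r_regionkey, r_name FROM ice.tpch_sf1.region ORDER BY r_regionkey")
print(cur.fetch_arrow_table())  # results arrive as an Arrow table
cur.close()
conn.close()
```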

Feedback is welcome as we round out support for Iceberg writes in upcoming releases!


---

## Careers

URL: https://spice.ai/careers
Date: 2025-11-19T20:45:41
Description: Join Spice AI and help build the future of data and AI infrastructure.

---

## Contact

URL: https://spice.ai/contact
Date: 2025-11-19T20:34:04
Description: Get in touch with the Spice AI team. Whether you're exploring enterprise deployments, pricing, integrations, or technical questions, we're here to help.

---

## Spice.ai Cookbook

URL: https://spice.ai/cookbook
Date: 2026-02-24T00:00:00
Description: A collection of guides and samples to help you build data-grounded AI apps and agents with Spice.ai Open-Source. Find ready-to-use examples for data acceleration, AI agents, LLM memory, and more.

The cookbook is a curated set of practical, ready-to-run recipes for Spice.ai Open Source. Recipes cover connectors, federation, acceleration, hybrid search, model integration, and deployment patterns.

## Frequently Asked Questions

### Which recipe should I start with?

If you are new to Spice, start with a recipe that matches your immediate goal: federated SQL for multi-source queries, DuckDB acceleration for faster performance, or OpenAI SDK and MCP recipes for AI agent use cases.

### Do cookbook recipes work with Spice Cloud?

Yes. Cookbook patterns are based on Spice OSS capabilities and can be adapted to Spice Cloud workflows. You can start locally with OSS and move to managed deployments as workloads grow. See Spice Cloud pricing.

---

## AI Model Serving

URL: https://spice.ai/feature/ai-model-serving
Date: 2025-11-14T14:45:31
Description: Serve, evaluate, and ground AI models directly inside Spice. Call LLMs locally or connect to hosted providers from one secure, high-performance runtime.

---

## Distributed Query

URL: https://spice.ai/feature/distributed-query
Date: 2025-11-04T19:51:09
Description: Scale beyond single-node limits with petabyte-scale, multi-node, distributed queries.

---

## Edge to Cloud Deployments

URL: https://spice.ai/feature/edge-to-cloud-deployments
Date: 2025-11-14T14:44:32
Description: Deploy Spice anywhere, from lightweight sidecars to enterprise clusters. Choose the architecture that fits your performance, scale, and governance needs.

---

## MCP Server & Gateway

URL: https://spice.ai/feature/mcp-server-gateway
Date: 2025-11-14T14:44:48
Description: Deploy MCP servers locally or over SSE, route tools to models, and expose Spice securely as an MCP gateway with full observability.

---

## Real-Time Change Data Capture

URL: https://spice.ai/feature/real-time-change-data-capture
Date: 2025-11-14T14:45:01
Description: Sync accelerated datasets with real-time changes using Change Data Capture (CDC) and maintain low-latency analytics without full-table refreshes.

---

## Secure AI Sandboxing

URL: https://spice.ai/feature/secure-ai-sandboxing
Date: 2025-11-14T14:45:12
Description: Safely connect AI to enterprise data. Spice isolates access for agents and models, enforcing least privilege, observability, and compliance across every query.

---

## Get a demo

URL: https://spice.ai/get-a-demo
Date: 2026-01-09T02:06:37
Description: Get in touch with the Spice AI team. Whether you're exploring enterprise deployments, pricing, integrations, or technical questions, we're here to help.

---

## Home

URL: https://spice.ai/home
Date: 2025-10-14T16:53:33
Description: Ground AI in enterprise data with zero ETL. Spice is an open-source SQL query & hybrid search engine built for data-intensive apps & AI agents.

Open-source SQL federation, data acceleration, and hybrid search for data-intensive AI apps.

[Image: Spice.ai platform architecture diagram showing unified query, search, and AI inference]

- Run Spice.ai Open Source locally, at the edge, or on the fully managed Spice.ai Cloud Platform. Lightweight, portable, and designed for scale.
- **AI sandboxing & security:** Provision isolated, least-privilege datasets for apps and agents with zero direct database access. Keep governance intact while enabling RAG, agents, and AI workflows.
- **Distributed observability:** Perform end-to-end tracing across SQL, embeddings, search, and LLM calls. Debug, measure latency, and prove ROI from a single view.

---

## Cybersecurity

URL: https://spice.ai/industry/cybersecurity
Date: 2025-11-21T22:02:05
Description: Build fast, reliable, and intelligent cybersecurity applications. Spice delivers unified data access, real-time performance, and embedded AI integration across any environment.

---

## Financial Services

URL: https://spice.ai/industry/financial-services
Date: 2025-11-21T21:57:14
Description: Unify, govern, and accelerate sensitive financial data. Spice delivers federation, hybrid search, and integrated AI for regulated workloads.

---

## SaaS

URL: https://spice.ai/industry/saas
Date: 2025-11-21T22:11:12
Description: Power SaaS with live, governed data. Federate across warehouses and DBs, accelerate to millisecond latency, and add AI - all on one portable runtime.

---

## Integrations

URL: https://spice.ai/integrations
Date: 2025-11-21T00:46:47
Description: Spice offers 30+ integrations with leading databases, warehouses, data lakes, streaming systems, and more.

---

## What is Apache Ballista?

URL: https://spice.ai/learn/apache-ballista
Date: 2026-02-14T00:00:00
Description: Apache Ballista is a distributed SQL query engine that extends Apache DataFusion across multiple nodes. Learn how Ballista works, its architecture, and how it compares to Spark and Trino.

[Apache DataFusion](/learn/apache-datafusion) is a powerful single-node SQL query engine, but some workloads exceed what a single machine can handle. A dataset might be too large to fit in memory, a query might need to scan terabytes of data within a latency budget, or the computational cost of a complex join might benefit from parallelism across multiple cores on multiple nodes.

Apache Ballista solves this by extending DataFusion's query engine across a cluster of machines. It takes DataFusion's SQL parsing, logical planning, and optimization capabilities and adds the distributed execution layer needed to partition work, shuffle data between nodes, and merge results. Each node in a Ballista cluster runs DataFusion as its local query engine, and Ballista coordinates them.

## Architecture

Ballista uses a scheduler-executor architecture, similar in concept to Spark's driver-executor model but implemented in Rust with Apache Arrow as the native data format.

### Scheduler

The scheduler is the coordinator of a Ballista cluster. When a SQL query arrives, the scheduler:

1. **Parses and optimizes** the query using DataFusion's standard SQL parser and optimizer
2. **Creates a distributed execution plan** by analyzing the logical plan and inserting exchange operators where data needs to move between nodes (e.g., for joins and aggregations that require data from multiple partitions)
3. **Partitions the work** into stages and tasks. A stage is a sequence of operations that can execute on a single partition without data exchange. Tasks are individual units of work assigned to executor nodes.
4. **Assigns tasks to executors** based on data locality, executor capacity, and load balancing
5. **Tracks progress** and handles retries if an executor fails or a task times out

The scheduler maintains a global view of the cluster state, including which executors are available, which tasks are running, and which stages are complete.
### Executors

Executors are the worker nodes that perform the actual computation. Each executor:

1. **Receives task assignments** from the scheduler
2. **Executes tasks** using its local DataFusion engine. The executor reads data from its assigned partitions, applies the operators in the physical plan (scan, filter, project, aggregate, etc.), and produces Arrow record batches as output.
3. **Writes intermediate results** to local storage (or exchanges them with other executors) for downstream stages
4. **Reports status** back to the scheduler, including completion, failure, and performance metrics

Because each executor runs a full DataFusion engine, all of DataFusion's single-node optimizations -- predicate pushdown, projection pruning, vectorized execution on Arrow arrays -- apply at the per-node level. Ballista adds the coordination and data exchange layer on top.

## How Distributed Query Execution Works

Distributed query execution introduces several concepts that don't exist in single-node engines. Understanding these is key to understanding how Ballista (and distributed query engines in general) operate.

### Partitioning

Data is divided into partitions -- subsets of the full dataset that can be processed independently. Partitioning can be based on:

- **Hash partitioning:** Rows are assigned to partitions based on a hash of one or more columns. This ensures that all rows with the same key end up in the same partition, which is necessary for hash joins and group-by aggregations.
- **Range partitioning:** Rows are assigned to partitions based on value ranges. This is useful for ordered scans and range queries.
- **Round-robin partitioning:** Rows are distributed evenly across partitions without regard to content. This maximizes parallelism for operations that don't require co-located keys.

The choice of partitioning strategy affects both performance and correctness. A hash join, for example, requires that both sides of the join are hash-partitioned on the join key so that matching rows are co-located on the same executor.

### Shuffles and Exchanges

When a query requires data to be repartitioned -- for example, when a hash join needs data partitioned by the join key, but the data is currently range-partitioned -- a shuffle (or exchange) occurs. During a shuffle:

1. Each executor reads its local partitions and computes the target partition for each row based on the new partitioning scheme
2. Rows are serialized as Arrow record batches and sent over the network to the appropriate executor
3. The receiving executor collects incoming batches and makes them available for the next stage of execution

Shuffles are the most expensive operation in distributed query execution because they involve network I/O and serialization. Minimizing unnecessary shuffles is a key optimization goal for distributed query planners.

### Stages

Ballista breaks a distributed query plan into stages separated by exchange boundaries. Within a stage, all operations can execute on a single partition without data exchange. Between stages, shuffles repartition the data as needed.

For example, a query that joins two tables and then aggregates the result might be broken into three stages:

1. **Stage 1:** Scan and filter table A, hash-partition by join key
2. **Stage 2:** Scan and filter table B, hash-partition by join key
3. **Stage 3:** Perform the hash join on co-located partitions, then aggregate

Stages 1 and 2 can execute in parallel across different executors. Stage 3 depends on both Stage 1 and Stage 2 completing, because it needs the shuffled output from both.
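To make hash partitioning concrete, here is a small illustrative sketch in Python using PyArrow -- not Ballista's actual implementation, which is in Rust. Rows are routed to partitions by hashing the join key, so matching keys land together:

```python
# Illustrative sketch of hash partitioning with PyArrow. Rows with equal
# keys map to the same partition, which is what co-locates matching rows
# for hash joins and group-by aggregations.
import pyarrow as pa

def hash_partition(batch: pa.RecordBatch, key: str, num_partitions: int) -> list:
    row_indices = [[] for _ in range(num_partitions)]
    for i, value in enumerate(batch.column(key).to_pylist()):
        # A real engine uses a fast, deterministic hash over raw bytes;
        # Python's hash() is enough to show the routing logic.
        row_indices[hash(value) % num_partitions].append(i)
    return [batch.take(pa.array(ix, type=pa.int64())) for ix in row_indices]

batch = pa.RecordBatch.from_pydict(
    {"order_id": [1, 2, 3, 4], "customer": ["a", "b", "a", "c"]}
)
for p, part in enumerate(hash_partition(batch, "customer", 2)):
    print(f"partition {p}: {part.num_rows} rows")
```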
## Ballista vs. Other Distributed Query Engines

### Ballista vs. Apache Spark

Spark is the most widely deployed distributed data processing framework. It runs on the JVM, supports multiple languages (Scala, Python, Java, R), and has a mature ecosystem of libraries for batch processing, streaming, machine learning, and graph processing.

Ballista differs in several ways:

- **Language and runtime:** Ballista is written in Rust with no JVM dependency. This means lower memory overhead, more predictable performance (no garbage collection pauses), and faster startup times.
- **Data format:** Ballista uses Apache Arrow as its native in-memory format. Spark uses its own internal row format for many operations and converts to/from Arrow when interfacing with external systems. Ballista's native Arrow integration eliminates this conversion overhead.
- **Footprint:** Ballista is a lightweight distributed query engine. Spark is a comprehensive data processing framework that includes batch, streaming, ML, and graph libraries. Ballista is smaller and more focused.
- **Extensibility:** Both are extensible, but Ballista inherits DataFusion's Rust trait-based extension model, while Spark uses JVM-based plugin interfaces.

Choose Spark when you need a mature, full-featured distributed data processing platform with a large ecosystem. Choose Ballista when you need a lightweight, Rust-native distributed SQL engine with native Arrow integration and lower operational overhead.

### Ballista vs. Trino

Trino (formerly Presto) is a distributed SQL query engine designed for interactive analytics and [SQL federation](/learn/sql-federation) across heterogeneous data sources. Trino has a mature production track record and a rich connector ecosystem.

Ballista and Trino share the same high-level architecture (scheduler + workers), but differ in implementation:

- **Language:** Trino is written in Java. Ballista is written in Rust.
- **Data format:** Trino uses its own internal page format. Ballista uses Apache Arrow natively.
- **Embeddability:** Trino is designed to be deployed as a standalone cluster. Ballista, like DataFusion, is designed to be embeddable -- it can be integrated into a larger application rather than requiring standalone deployment.
- **Maturity:** Trino has years of production deployment at major companies. Ballista is newer and under active development.

Choose Trino when you need a production-proven distributed SQL engine with a broad connector ecosystem. Choose Ballista when you need a Rust-native, Arrow-native distributed engine that can be embedded into a custom system.
## Ballista and DataFusion: The Relationship

Ballista is built directly on top of DataFusion. This relationship is fundamental to understanding both projects:

- **DataFusion** provides SQL parsing, logical planning, query optimization, and single-node physical execution. It is a library that runs in a single process.
- **Ballista** adds distributed scheduling, partitioning, shuffles, and inter-node coordination. It uses DataFusion as the per-node execution engine.

When Ballista executes a query, each executor node runs DataFusion locally. DataFusion handles all the per-partition computation -- scanning, filtering, projecting, joining, aggregating. Ballista handles the coordination between nodes -- deciding which executor processes which partition, managing shuffles, and collecting final results.

This separation means that improvements to DataFusion's optimizer or execution engine automatically benefit Ballista deployments. And DataFusion extensions -- custom table providers, UDFs, optimizer rules -- work in Ballista without modification.

## Current Status and Development

Ballista is an incubating project within the Apache Arrow ecosystem. It is under active development, with contributions from multiple organizations. Key areas of ongoing work include:

- **Fault tolerance:** Improving task retry logic and executor failure recovery
- **Resource management:** Better scheduling based on executor memory and CPU availability
- **Performance:** Reducing shuffle overhead and improving exchange operator efficiency
- **Integration:** Expanding the set of data sources and file formats supported in distributed mode

Ballista is suitable for experimental and early production workloads. For mission-critical production deployments that require mature fault tolerance and operations tooling, teams should evaluate Ballista alongside established alternatives like Trino and Spark.

## How Spice Uses Distributed Query Concepts

Spice builds on the distributed query concepts pioneered by Ballista and other distributed engines. Spice's distributed architecture enables [SQL federation](/learn/sql-federation) and [data acceleration](/learn/data-acceleration) across multiple nodes:

- **Distributed federation:** Queries are federated across data sources from any node in a Spice cluster. The query planner determines the optimal execution strategy, including which sources to query from which nodes.
- **Distributed acceleration:** Accelerated datasets can be partitioned across nodes, with each node caching a subset of the data. Queries are routed to the nodes that hold the relevant partitions.
- **Arrow-native transport:** Like Ballista, Spice uses Apache Arrow as its native data format for inter-node communication, eliminating serialization overhead.

By combining DataFusion's single-node query engine with distributed execution capabilities, Spice delivers [sub-second federated queries](/platform/sql-federation-acceleration) across distributed data sources and acceleration caches.

## Advanced Topics

### Scheduler-Executor Architecture in Depth

The scheduler and executors communicate through a combination of gRPC services and Arrow Flight endpoints. The scheduler exposes a planning API that accepts SQL or pre-built logical plans and returns a job identifier. It then decomposes the job into a directed acyclic graph (DAG) of stages and tasks.

```mermaid
flowchart TD
    Client["Client"] -->|"SQL / Logical Plan"| Scheduler["Scheduler"]
    Scheduler -->|"Task Assignment"| E1["Executor 1"]
    Scheduler -->|"Task Assignment"| E2["Executor 2"]
    Scheduler -->|"Task Assignment"| E3["Executor 3"]
    E1 -->|"Shuffle Data"| E2
    E1 -->|"Shuffle Data"| E3
    E2 -->|"Shuffle Data"| E1
    E2 -->|"Shuffle Data"| E3
    E3 -->|"Shuffle Data"| E1
    E3 -->|"Shuffle Data"| E2
    E1 -->|"Status / Results"| Scheduler
    E2 -->|"Status / Results"| Scheduler
    E3 -->|"Status / Results"| Scheduler
    Scheduler -->|"Final Results"| Client
```

Each executor registers with the scheduler at startup, reporting its available resources (CPU cores, memory). The scheduler uses this information to make placement decisions. When a task completes, the executor reports back with metrics -- execution time, rows processed, bytes shuffled -- which the scheduler uses to refine future scheduling decisions within the same job.
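The stage DAG itself is straightforward to model. As a simplified sketch -- a hypothetical structure, not Ballista's actual scheduler code -- releasing stages in dependency order is a topological sort:

```python
# Simplified sketch of releasing query stages in dependency order
# (Kahn's algorithm). Not Ballista's actual scheduler, which also tracks
# tasks, executors, and retries -- this shows only the DAG logic.
from collections import deque

def stage_order(deps: dict) -> list:
    # deps maps stage -> set of stages it depends on.
    remaining = {s: set(d) for s, d in deps.items()}
    ready = deque(s for s, d in remaining.items() if not d)
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        for s, d in remaining.items():
            if stage in d:
                d.remove(stage)
                if not d:  # all of s's shuffle inputs now exist
                    ready.append(s)
    return order

print(stage_order({"s1": set(), "s2": set(), "s3": {"s1", "s2"}}))
# ['s1', 's2', 's3'] -- s3 runs only after both upstream stages complete
```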
### Shuffle Strategies

Shuffles are the most performance-critical aspect of distributed query execution. Ballista supports several shuffle strategies, each suited to different workload patterns:

**Hash shuffle** is the default for joins and group-by aggregations. Each executor hashes each row's partition key and writes it to one of N output partitions. The receiving executors pull their assigned partitions. This ensures co-location of matching keys but can create hot partitions if the key distribution is skewed.

**Sort-merge shuffle** is used when the downstream stage requires sorted input -- for example, a sort-merge join or a global ORDER BY. Each executor sorts its local partition and writes sorted runs. The downstream stage merges these sorted runs without needing to buffer the full dataset.

**Broadcast shuffle** is an optimization for small tables. When one side of a join is small enough to fit in executor memory, the scheduler broadcasts the entire small table to all executors rather than hash-partitioning both sides. This eliminates one full shuffle and is a significant performance win for star-schema queries with small dimension tables.

The query planner selects shuffle strategies based on the physical plan operators, available statistics, and configurable thresholds (e.g., the broadcast size limit).

### Fault Tolerance

Distributed query execution must handle executor failures gracefully. Ballista's fault tolerance model operates at the task level:

- **Heartbeat monitoring:** The scheduler expects periodic heartbeats from each executor. If an executor misses consecutive heartbeats, the scheduler marks it as lost and reassigns its in-flight tasks to other executors.
- **Task retries:** When a task fails -- whether due to executor failure, out-of-memory errors, or data source errors -- the scheduler retries the task on a different executor up to a configurable retry limit. If the task depends on intermediate shuffle data that was stored on the failed executor, the scheduler re-executes the upstream stage that produced that data.
- **Stage-level recovery:** If a shuffle output is lost because the executor that stored it has failed, the scheduler must re-execute the entire upstream stage to regenerate the shuffle data. This is the most expensive failure mode and is the primary motivation for persisting shuffle data to durable storage in production deployments.

### Resource Scheduling

Resource-aware scheduling is essential for stable cluster operation. Ballista's scheduler tracks each executor's resource utilization and enforces constraints:

- **Memory-based admission control:** The scheduler estimates the memory requirements of each task based on the physical plan operators (e.g., hash joins require memory proportional to the build side). Tasks are assigned to executors that have sufficient free memory.
- **Slot-based concurrency:** Each executor advertises a fixed number of task slots (typically equal to the number of CPU cores). The scheduler does not assign more tasks than an executor has slots, preventing CPU oversubscription.
- **Data locality preferences:** When a task reads data from a specific storage location, the scheduler prefers executors that are co-located with that data. This reduces network I/O for the initial scan stage. If no co-located executor has available capacity, the scheduler falls back to a remote executor.

These mechanisms work together to keep executor utilization high while avoiding overload conditions that would cause task failures or performance degradation.
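As a simplified illustration of slot- and memory-based admission control with locality preferences -- a hypothetical model, not Ballista's actual code -- a placement function might pick a data-local executor with a free slot and enough memory, falling back to any executor that fits:

```python
# Sketch of resource-aware task placement: prefer executors co-located
# with the task's input data, then fall back to any executor with a free
# task slot and enough estimated memory. Hypothetical model only.

def place_task(task: dict, executors: list):
    # task: {"input_host": str, "mem_estimate": int}
    # executors: list of {"host": str, "free_slots": int, "free_mem": int}
    def fits(ex):
        return ex["free_slots"] > 0 and ex["free_mem"] >= task["mem_estimate"]

    local = [ex for ex in executors if ex["host"] == task["input_host"] and fits(ex)]
    candidates = local or [ex for ex in executors if fits(ex)]
    if not candidates:
        return None  # queue the task until capacity frees up

    chosen = max(candidates, key=lambda ex: ex["free_mem"])  # least loaded
    chosen["free_slots"] -= 1
    chosen["free_mem"] -= task["mem_estimate"]
    return chosen["host"]
```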
## Frequently Asked Questions

### How does Ballista compare to Apache Spark?

Spark is a comprehensive, JVM-based distributed data processing framework with mature libraries for batch, streaming, ML, and graph processing. Ballista is a focused, Rust-native distributed SQL query engine built on Apache DataFusion and Arrow. Ballista has lower memory overhead, faster startup, and native Arrow integration, but Spark has a larger ecosystem, broader language support, and years of production hardening at scale.

### How does Ballista compare to Trino?

Both are distributed SQL query engines with scheduler-worker architectures. Trino is written in Java, uses its own internal data format, and is designed for standalone cluster deployment with a mature connector ecosystem. Ballista is written in Rust, uses Apache Arrow natively, and is designed to be embeddable. Trino is more production-proven; Ballista offers Rust-native performance and tighter Arrow integration.

### Is Ballista production ready?

Ballista is under active development as an incubating Apache project. It is suitable for experimental and early production workloads. For mission-critical production deployments that require mature fault tolerance, operational tooling, and SLA guarantees, teams should evaluate Ballista alongside established alternatives like Trino and Spark, considering their specific requirements for maturity versus native Arrow and Rust integration.

### What is the relationship between Ballista and DataFusion?

Ballista is built on top of DataFusion. DataFusion provides the single-node SQL query engine -- parsing, planning, optimization, and execution. Ballista adds the distributed layer -- scheduling, partitioning, shuffles, and inter-node coordination. Each executor node in a Ballista cluster runs DataFusion locally. DataFusion extensions (custom table providers, UDFs, optimizer rules) work in Ballista without modification.

### What are common use cases for distributed SQL query engines?

Distributed query engines are used when single-node capacity is insufficient: scanning terabytes of data within latency budgets, running complex joins across very large tables, parallelizing expensive aggregations, and serving concurrent analytical queries that exceed single-machine throughput. They are also used for federated queries across geographically distributed data sources where pushing computation to the data is more efficient than centralizing it.

---

## What is Apache DataFusion?

URL: https://spice.ai/learn/apache-datafusion
Date: 2026-01-22T00:00:00
Description: Apache DataFusion is an open-source, extensible SQL query engine written in Rust. Learn how DataFusion works, its architecture, how it compares to Trino and DuckDB, and how teams extend it for production use cases.

Building a SQL query engine from scratch is a multi-year effort. Parsing SQL, generating logical plans, optimizing query execution, managing memory, and parallelizing work across cores are all hard problems individually. Combining them into a reliable, performant system is harder still.

Apache DataFusion provides a production-quality implementation of all of these components as an embeddable Rust library. Instead of building a query engine from zero, developers embed DataFusion and extend it with custom table providers, user-defined functions, and optimizer rules specific to their use case. The result is a fully featured SQL engine tailored to a particular domain, built in weeks instead of years.

## Core Architecture

DataFusion processes a SQL query through a well-defined pipeline: parsing, planning, optimization, and execution. Each stage is modular and extensible.

### SQL Parsing and Logical Planning

When a SQL query arrives, DataFusion parses it into an abstract syntax tree (AST) and then converts the AST into a logical plan. The logical plan is a tree of relational algebra operations -- scans, filters, projections, joins, aggregations, sorts -- that describe what the query computes without specifying how to compute it.

For example, the query:

```sql
SELECT customer_name, SUM(amount)
FROM orders
WHERE created_at > '2026-01-01'
GROUP BY customer_name
ORDER BY SUM(amount) DESC
LIMIT 10
```

Produces a logical plan roughly equivalent to:

```
Limit (10)
  Sort (SUM(amount) DESC)
    Aggregate (GROUP BY customer_name, SUM(amount))
      Filter (created_at > '2026-01-01')
        Scan (orders)
```

### Query Optimization

DataFusion applies a series of optimization passes to the logical plan. These include:

- **Predicate pushdown:** Moving filter expressions closer to the data source so less data is read
- **Projection pushdown:** Eliminating columns that are not needed by downstream operators
- **Constant folding:** Evaluating constant expressions at planning time rather than execution time
- **Join reordering:** Selecting the most efficient join order based on available statistics
- **Common subexpression elimination:** Computing repeated expressions once and reusing the result

The optimizer is rule-based and extensible. Developers can register custom optimization rules that apply domain-specific transformations. For example, a [federation engine](/learn/sql-federation) can add rules that push certain operations down to remote data sources.

### Physical Planning and Execution

After optimization, the logical plan is converted to a physical plan that specifies the actual execution strategy: which join algorithm to use (hash join, sort-merge join, nested loop), how to partition work across threads, and how to manage memory.

Execution produces a stream of Apache Arrow record batches. Arrow is a columnar in-memory format that enables zero-copy data exchange between operators and between systems. Because DataFusion is built natively on Arrow, there is no serialization or deserialization overhead between planning and execution -- the data stays in Arrow format throughout.
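DataFusion can be embedded from Rust or Python. A minimal sketch using the `datafusion` Python bindings runs the example query above end to end; it assumes `pip install datafusion` and a local `orders.parquet` file with `customer_name`, `amount`, and `created_at` columns:

```python
# pip install datafusion
# Minimal sketch of embedding DataFusion via its Python bindings.
# Assumes a local orders.parquet file with the columns used below.
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_parquet("orders", "orders.parquet")

df = ctx.sql("""
    SELECT customer_name, SUM(amount)
    FROM orders
    WHERE created_at > '2026-01-01'
    GROUP BY customer_name
    ORDER BY SUM(amount) DESC
    LIMIT 10
""")

# Results are returned as Arrow record batches.
for batch in df.collect():
    print(batch)
```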
## Key Features

### Full SQL Support

DataFusion supports a comprehensive subset of SQL, including:

- Standard `SELECT`, `INSERT`, `UPDATE`, `DELETE` statements
- `JOIN` (inner, left, right, full outer, cross, semi, anti)
- Window functions (`ROW_NUMBER`, `RANK`, `LAG`, `LEAD`, etc.)
- Common table expressions (CTEs) with `WITH` clauses
- Subqueries and correlated subqueries
- `UNION`, `INTERSECT`, `EXCEPT` set operations
- `GROUP BY`, `HAVING`, `ORDER BY`, `LIMIT`, `OFFSET`

### Extensibility

DataFusion's primary design goal is extensibility. The key extension points are:

**Custom table providers** allow DataFusion to query any data source. A table provider implements the `TableProvider` trait, telling DataFusion how to scan data from a specific source. Out of the box, DataFusion includes providers for Parquet, CSV, JSON, and Arrow IPC files. Custom providers can connect to databases, APIs, object stores, or any other data source.

**User-defined functions (UDFs)** extend DataFusion's expression language. Scalar UDFs operate on individual rows, aggregate UDFs operate on groups of rows, and window UDFs operate over window frames. UDFs are registered with the session context and can be used in SQL queries like built-in functions.

**Custom optimizer rules** allow developers to add domain-specific optimizations. For example, a federated query engine can add rules that detect when a filter or aggregation can be pushed down to a remote source and rewrite the plan accordingly.

**Custom physical plan operators** allow developers to implement new execution strategies. For example, a distributed query engine can replace DataFusion's local join operator with a distributed shuffle-join that partitions data across multiple nodes.

### Native Arrow Integration

DataFusion operates on Apache Arrow arrays throughout the entire pipeline. This means:

- No serialization overhead between operators
- Zero-copy data exchange with other Arrow-based systems
- Compatibility with the broader Arrow ecosystem (PyArrow, Arrow Flight, etc.)
- Efficient SIMD operations on columnar data
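As a small illustration of the UDF extension point, here is a hedged sketch using the Python bindings (the Rust API is analogous): a scalar UDF operating on Arrow arrays is registered with the session context and then called from SQL. The function name and logic are illustrative, and the exact `udf()` signature may vary across `datafusion` package versions:

```python
# Sketch of registering a scalar UDF with DataFusion's Python bindings.
# The UDF receives and returns Arrow arrays, so execution stays columnar.
import pyarrow as pa
import pyarrow.compute as pc
from datafusion import SessionContext, udf

def shout(arr: pa.Array) -> pa.Array:
    # Uppercase every string in the input column.
    return pc.utf8_upper(arr)

shout_udf = udf(shout, [pa.string()], pa.string(), "immutable", name="shout")

ctx = SessionContext()
ctx.register_udf(shout_udf)
ctx.sql("SELECT shout('hello datafusion')").show()
```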
## DataFusion vs. Other Query Engines

### DataFusion vs. DuckDB

DuckDB is an embedded analytical database -- a "SQLite for analytics." Like DataFusion, it is designed for in-process analytical queries. The key difference is in extensibility and architecture.

DuckDB is a complete, self-contained database with its own storage engine, transaction manager, and query executor. It is written in C++ and provides a SQL interface with minimal configuration.

DataFusion is a query engine library, not a database. It does not include a storage engine or transaction manager. Instead, it provides the query planning and execution components that developers embed into their own systems. DataFusion is written in Rust and designed to be extended with custom table providers, UDFs, and optimizer rules.

Choose DuckDB when you need a self-contained analytical database. Choose DataFusion when you are building a custom data system and need an embeddable, extensible query engine as a foundation.

### DataFusion vs. Trino (Presto)

Trino (formerly Presto) is a distributed SQL query engine designed for federated queries across data warehouses and data lakes. It is deployed as a standalone cluster of coordinator and worker nodes.

DataFusion is a single-node, embeddable library. It does not include built-in distributed execution (though [Apache Ballista](/learn/apache-ballista) adds distributed capabilities on top of DataFusion). Trino is a production-ready distributed system with its own cluster management, fault tolerance, and resource scheduling.

Choose Trino when you need a standalone, distributed federated query engine with its own cluster infrastructure. Choose DataFusion when you need an embeddable query engine that you control and extend within your own application.

### DataFusion vs. Apache Spark SQL

Spark SQL is the SQL interface to Apache Spark, a distributed data processing framework. Spark is designed for large-scale batch processing and runs on JVM-based cluster infrastructure (YARN, Mesos, Kubernetes).

DataFusion is a lightweight, Rust-native library with no JVM dependency. It is designed for low-latency, in-process query execution rather than large-scale distributed batch processing. DataFusion's startup time is measured in milliseconds; Spark's is measured in seconds to minutes.

Choose Spark when you need large-scale distributed batch processing with a mature ecosystem. Choose DataFusion when you need a lightweight, low-latency query engine embedded in a Rust or Python application.

## How Spice Extends DataFusion

[Spice](/platform/sql-federation-acceleration) uses DataFusion as its core query engine and extends it with several capabilities:

### Custom Table Providers for Federated Data

Spice registers custom DataFusion table providers for each connected data source -- PostgreSQL, MySQL, Databricks, Amazon S3, Snowflake, and [30+ others](/integrations). When a query references a table backed by a remote source, the corresponding table provider handles connectivity, dialect translation, and data retrieval.

### Custom Optimizer Rules for Pushdown

Spice adds optimizer rules that analyze the query plan and determine which operations can be pushed down to each source. For example, a filter on a PostgreSQL-backed table is rewritten into a `WHERE` clause in the generated PostgreSQL query, so only matching rows are transferred. This minimizes data movement and maximizes source-side performance.

### UDFs for Search and AI Inference

Spice extends DataFusion's function library with UDFs that enable [hybrid search](/learn/hybrid-search) (full-text and vector search within SQL), AI model inference, and embedding generation. These functions are available in standard SQL queries:

```sql
SELECT id, content, search_score
FROM documents
WHERE search(content, 'deployment strategies', 'hybrid')
ORDER BY search_score DESC
LIMIT 10
```

### Integration with the Acceleration Layer

When query acceleration is enabled, Spice stores cached data in [Vortex](/learn/vortex) format and exposes it through custom DataFusion table providers. The query optimizer can push filters and projections directly into the Vortex storage layer, enabling sub-second query performance over locally cached data.
## The DataFusion Ecosystem

DataFusion's embeddable design has led to a growing ecosystem of projects built on top of it:

- **Spice:** [SQL federation](/learn/sql-federation), acceleration, and AI inference engine
- **Apache Ballista:** [Distributed query execution](/learn/apache-ballista) layer for DataFusion
- **InfluxDB 3.0:** Time-series database rebuilt on DataFusion and Arrow
- **Apache Comet:** Spark-compatible query accelerator using DataFusion
- **Delta-rs:** Delta Lake implementation in Rust with DataFusion integration
- **GlareDB:** SQL interface for querying across databases and data lakes

This ecosystem demonstrates DataFusion's value proposition: rather than each project building its own SQL parser, optimizer, and execution engine, they share a common, well-tested foundation and focus on their differentiating features.

## Advanced Topics

### The Query Pipeline in Detail

A SQL query passes through a series of well-defined stages before producing results. Understanding this pipeline is essential for developers who need to extend or debug DataFusion behavior.

```mermaid
flowchart LR
    SQL["SQL String"] --> Parse["Parse"]
    Parse --> LP["Logical Plan"]
    LP --> Optimize["Optimize"]
    Optimize --> OLP["Optimized\nLogical Plan"]
    OLP --> Physical["Physical\nPlanning"]
    Physical --> PP["Physical Plan"]
    PP --> Execute["Execute"]
    Execute --> Arrow["Arrow\nRecord Batches"]
```

The parser converts a SQL string into a logical plan tree. The optimizer applies a sequence of rule-based passes -- predicate pushdown, projection pruning, join reordering, and others -- to produce an optimized logical plan. The physical planner then selects concrete execution strategies (e.g., hash join vs. sort-merge join) and generates a physical plan. Finally, the execution engine evaluates the physical plan and streams results as Apache Arrow record batches.

Each stage is independently extensible. Developers can register custom analyzer rules (which run before optimization), custom optimizer rules (which transform the logical plan), and custom physical plan nodes (which implement new execution strategies).

### The Catalog System

DataFusion organizes data through a three-level naming hierarchy: catalog, schema, and table. A `SessionContext` holds a default catalog that contains one or more schemas, each of which contains tables. When a query references `orders`, DataFusion resolves it through this hierarchy -- by default, `datafusion.public.orders`.

The catalog system is trait-based and fully replaceable. Developers implement the `CatalogProvider`, `SchemaProvider`, and `TableProvider` traits to integrate their own metadata stores. For example, a [SQL federation](/learn/sql-federation) engine can register a catalog provider that discovers schemas and tables dynamically from a remote database's information schema, making remote tables queryable as if they were local.

The `TableProvider` trait is the most commonly extended interface. It defines how DataFusion scans data from a source, what statistics are available for the optimizer, what filters can be pushed down to the source, and what the schema of the data is.

### Custom Execution Plans

When the built-in physical plan operators are not sufficient, developers create custom `ExecutionPlan` implementations. A custom execution plan node participates in the standard pipeline: it receives input partitions, applies its logic, and produces output partitions as Arrow record batch streams.
Common use cases for custom execution plans include remote execution (sending part of a query to an external system and reading results back as Arrow), custom caching (materializing intermediate results for reuse across queries), and specialized operators (e.g., a graph traversal operator or a time-series interpolation operator that has no SQL equivalent).

Custom execution plans integrate with DataFusion's partition-aware execution model. They declare how many output partitions they produce, whether they require a specific input partitioning, and how they distribute work across threads.

### Memory Management and Spilling

DataFusion uses a `MemoryPool` abstraction to track and limit memory consumption during query execution. Operators that accumulate state -- hash joins, hash aggregations, sorts -- register their memory usage with the pool. When an operator's allocation request would exceed the configured memory limit, DataFusion triggers spilling: the operator writes its in-memory state to temporary files on disk and continues processing with reduced memory.

The spilling mechanism is critical for handling queries that process more data than available memory. Hash joins spill their build-side partitions, sorts spill sorted runs, and aggregations spill partial aggregate state. When the spilled data is needed, it is read back from disk and merged. This enables DataFusion to process arbitrarily large datasets within a fixed memory budget, albeit with the performance trade-off of disk I/O.

Developers can implement custom `MemoryPool` strategies to match their deployment constraints -- for example, a pool that reserves memory for concurrent queries or one that integrates with an external resource manager.

## Frequently Asked Questions

### How does DataFusion compare to DuckDB?

DuckDB is a self-contained embedded analytical database with its own storage engine and transaction manager. DataFusion is a query engine library without built-in storage -- it provides SQL parsing, planning, optimization, and execution that developers embed into their own systems. DuckDB is designed for end users who want a ready-to-use analytical database. DataFusion is designed for developers building custom data systems who need an extensible query engine as a foundation.

### Is Apache DataFusion production ready?

Yes. DataFusion is used in production by multiple companies and projects, including Spice, InfluxDB 3.0, and Apache Comet. It is an Apache Software Foundation project with an active contributor community, regular releases, and comprehensive test coverage. Its Rust implementation provides memory safety and predictable performance characteristics suited for production workloads.

### How does DataFusion compare to Trino?

Trino is a standalone, distributed SQL query engine deployed as a cluster of coordinator and worker nodes. DataFusion is a single-node, embeddable library. Trino includes built-in cluster management, fault tolerance, and resource scheduling. DataFusion provides query planning and execution components that developers embed and extend within their own applications. Trino is designed for standalone deployment; DataFusion is designed for embedding.

### How do you extend DataFusion with custom functionality?

DataFusion provides several extension points: custom table providers (to query any data source), user-defined functions (scalar, aggregate, and window), custom optimizer rules (to add domain-specific plan transformations), and custom physical plan operators (to implement new execution strategies). Extensions are implemented as Rust traits and registered with the DataFusion session context.

### What is the relationship between DataFusion and Apache Arrow?

DataFusion is part of the Apache Arrow ecosystem and operates on Arrow arrays throughout its entire query pipeline. Arrow provides the columnar in-memory data format; DataFusion provides the SQL query engine that operates on that format. This native integration means zero serialization overhead between operators and zero-copy data exchange with other Arrow-based systems like PyArrow, Arrow Flight, and Ballista.

---

## What is BM25 Full-Text Search?

URL: https://spice.ai/learn/bm25-full-text-search
Date: 2026-03-05T00:00:00
Description: BM25 (Best Match 25) is the standard ranking function for full-text search. Learn how BM25 scores documents using term frequency, inverse document frequency, and document length normalization, and how it compares to TF-IDF.

Full-text search is the foundation of information retrieval. When a user types a query, the search system must quickly find the most relevant documents from potentially millions of candidates and rank them in order of relevance. BM25 -- Best Match 25 -- is the ranking function that powers this process in virtually every modern search engine, from Elasticsearch and Apache Lucene to PostgreSQL's full-text search.

BM25 works by scoring each document based on three factors: how often query terms appear in the document (term frequency), how rare those terms are across the entire corpus (inverse document frequency), and how long the document is relative to the average (document length normalization). These three signals combine to produce a relevance score that is remarkably effective across a wide range of search tasks.

## How BM25 Scores Documents

### Term Frequency (TF)

The simplest relevance signal is how many times a query term appears in a document. A document mentioning "kubernetes" ten times is probably more relevant to a query about Kubernetes than one mentioning it once. But raw term frequency has a problem: it grows without bound, while in practice the tenth occurrence of a term adds far less relevance than the first. BM25 addresses this with a saturation function -- term frequency contributes to the score with diminishing returns, controlled by the parameter **k1** (typically set to 1.2). Higher k1 values allow term frequency to keep contributing longer before saturating.

### Inverse Document Frequency (IDF)

Not all terms are equally informative. A search for "kubernetes deployment error" should weight "kubernetes" and "deployment" more heavily than "error," because "error" appears in many more documents and is less discriminating. IDF measures term rarity across the corpus. Terms that appear in few documents get high IDF scores; terms that appear everywhere get low scores. BM25 uses a logarithmic IDF formula:

```
IDF(t) = log((N - df(t) + 0.5) / (df(t) + 0.5) + 1)
```

Where **N** is the total number of documents and **df(t)** is the number of documents containing term **t**.

### Document Length Normalization

Longer documents naturally contain more term occurrences, but that doesn't make them more relevant. A 10,000-word document mentioning "kubernetes" five times is likely less focused on the topic than a 500-word document with the same count. BM25 normalizes for document length using the parameter **b** (typically set to 0.75, range 0 to 1). When b = 1, full length normalization is applied -- longer documents are penalized proportionally. When b = 0, no normalization is applied. The normalization compares each document's length to the average document length in the corpus.
### The BM25 Formula

Putting it together, the BM25 score for a document **D** given a query **Q** with terms **q1, q2, ..., qn** is:

```
BM25(D, Q) = sum(IDF(qi) * (tf(qi, D) * (k1 + 1)) / (tf(qi, D) + k1 * (1 - b + b * (|D| / avgdl))))
```

Where:

- **tf(qi, D)** is the frequency of term qi in document D
- **|D|** is the document length
- **avgdl** is the average document length across the corpus
- **k1** controls term frequency saturation (default: 1.2)
- **b** controls document length normalization (default: 0.75)

## How Inverted Indexes Power Full-Text Search

BM25 scoring is only half the story. The other half is the data structure that makes it possible to score documents quickly: the **inverted index**.

### Tokenization

Before indexing, raw text is broken into tokens. The sentence "BM25 handles full-text search" might tokenize into: `["bm25", "handles", "full", "text", "search"]`. Tokenization typically includes lowercasing, punctuation removal, and often stemming (reducing words to their root form -- "searching" becomes "search").

### Posting Lists

The inverted index maps each unique token to a **posting list** -- a sorted list of document IDs where that token appears, along with term frequency and position information:

```
"kubernetes" → [(doc_3, tf=5), (doc_17, tf=2), (doc_42, tf=8), ...]
"deployment" → [(doc_3, tf=3), (doc_8, tf=1), (doc_17, tf=6), ...]
```

### Query Execution

When a query arrives, the search engine:

1. Tokenizes the query into terms
2. Looks up the posting list for each term
3. Intersects or unions the posting lists (depending on AND/OR semantics)
4. Computes BM25 scores for candidate documents
5. Returns the top-k documents sorted by score

This process is fast because posting lists are pre-computed at index time. A multi-term query only needs to scan the posting lists for its specific terms rather than every document in the corpus.

```mermaid
flowchart LR
    A[Query] --> B[Tokenize]
    B --> C[Lookup Inverted Index]
    C --> D[BM25 Score]
    D --> E[Rank]
    E --> F[Results]
```
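To make the pieces concrete, here is a minimal, self-contained Python sketch that builds a toy inverted index and scores documents with the BM25 formula above. It uses the standard defaults (k1 = 1.2, b = 0.75), a deliberately trivial whitespace tokenizer, and a three-document corpus invented for illustration; production engines add analyzers, compressed posting lists, and top-k pruning.

```python
import math
from collections import Counter, defaultdict

K1, B = 1.2, 0.75  # standard BM25 defaults

docs = {
    "doc_1": "kubernetes deployment error during rollout",
    "doc_2": "kubernetes cluster upgrade guide",
    "doc_3": "fixing a database deployment error",
}

# Build the inverted index: term -> {doc_id: term frequency}.
index: dict[str, dict[str, int]] = defaultdict(dict)
doc_len: dict[str, int] = {}
for doc_id, text in docs.items():
    tokens = text.lower().split()  # trivial tokenizer: no stemming, no stop words
    doc_len[doc_id] = len(tokens)
    for term, tf in Counter(tokens).items():
        index[term][doc_id] = tf

N = len(docs)
avgdl = sum(doc_len.values()) / N

def idf(term: str) -> float:
    df = len(index.get(term, {}))
    return math.log((N - df + 0.5) / (df + 0.5) + 1)

def bm25(query: str) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        # Only documents in the term's posting list can score for that term.
        for doc_id, tf in index.get(term, {}).items():
            norm = K1 * (1 - B + B * doc_len[doc_id] / avgdl)
            scores[doc_id] += idf(term) * tf * (K1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(bm25("kubernetes deployment error"))  # doc_1 matches all terms and ranks first
```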
## BM25 vs. TF-IDF

TF-IDF (Term Frequency -- Inverse Document Frequency) is the predecessor to BM25. Both use term frequency and inverse document frequency, but BM25 improves on TF-IDF in two important ways:

1. **Term frequency saturation:** TF-IDF uses raw or log-scaled term frequency, which continues to grow with more occurrences. BM25's saturation function ensures that additional occurrences of a term contribute diminishing marginal relevance, which better reflects how humans judge relevance.
2. **Document length normalization:** TF-IDF has no built-in mechanism for adjusting scores based on document length. BM25's b parameter provides a tunable normalization that penalizes longer documents appropriately.

These differences make BM25 consistently more effective in benchmarks and real-world search applications. TF-IDF is still used in some contexts (notably, as a feature in machine learning pipelines), but BM25 is the default choice for document ranking.

## Full-Text Search in SQL Databases vs. Dedicated Search Engines

Full-text search is available in both SQL databases and dedicated search engines, with different tradeoffs:

**SQL databases** (PostgreSQL, MySQL) provide built-in full-text search using `tsvector` / `tsquery` (PostgreSQL) or `MATCH ... AGAINST` (MySQL). This is convenient -- no additional infrastructure -- but limited. SQL full-text search typically offers basic BM25-like scoring, limited tokenization options, and slower performance at scale compared to dedicated engines.

**Dedicated search engines** (Elasticsearch, Apache Solr, Meilisearch) are purpose-built for search. They provide advanced tokenization and analyzers, configurable BM25 parameters, faceting, highlighting, suggestions, and horizontal scaling through index sharding. The tradeoff is operational complexity -- another system to deploy, monitor, and keep in sync with your primary data store.

## When BM25 Falls Short

BM25 is powerful but limited to lexical matching -- it can only find documents that contain the exact terms in the query. This creates a fundamental problem called **vocabulary mismatch**:

- "How do I cancel my subscription?" won't match a document titled "Account termination guide"
- "Fix slow database" won't match "Query performance optimization"
- "ML model serving" won't match "Machine learning inference deployment"

When vocabulary mismatch is a significant problem, consider [hybrid search](/learn/hybrid-search), which combines BM25 full-text search with [vector search](/learn/vector-search) to capture both exact term matches and semantic similarity. Hybrid search uses [embeddings](/learn/embeddings) to understand meaning, while BM25 handles the precise keyword matching that vector search misses.

## BM25 Full-Text Search with Spice

[Spice](/platform/hybrid-sql-search) integrates BM25 full-text search alongside [vector search](/learn/vector-search) and [SQL federation](/learn/sql-federation) in a single unified runtime. Rather than deploying a separate search engine and keeping it synchronized with your data sources, Spice provides full-text search as a native query capability. This means you can:

- Run BM25 full-text search across federated data from [30+ connected sources](/integrations) without maintaining separate search infrastructure
- Combine full-text search with vector similarity in [hybrid search](/learn/hybrid-search) queries using built-in RRF fusion
- Use [real-time CDC](/learn/change-data-capture) to keep search indexes fresh as source data changes
- Express search queries in standard SQL alongside your existing analytical and operational queries

```sql
-- Full-text search in Spice
SELECT * FROM search(
  'knowledge_base',
  'kubernetes deployment error',
  mode => 'fts',
  limit => 10
)
```

The unified approach eliminates the data synchronization problem -- when source data changes, both full-text and vector indexes update through the same [change data capture](/learn/change-data-capture) pipeline. This is especially valuable for [RAG applications](/learn/retrieval-augmented-generation) where stale search indexes lead to outdated or incorrect answers.
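The RRF fusion mentioned above (Reciprocal Rank Fusion) is a simple, widely used way to merge a BM25 ranking with a vector-search ranking. Here is a generic Python sketch of the algorithm, not Spice's internal implementation; the document IDs and the conventional constant k = 60 are illustrative.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

bm25_ranked = ["doc_1", "doc_3", "doc_2"]    # lexical (BM25) ranking
vector_ranked = ["doc_2", "doc_1", "doc_4"]  # semantic (vector) ranking
print(rrf([bm25_ranked, vector_ranked]))     # doc_1, ranked highly by both, fuses to the top
```

Because RRF works on ranks rather than raw scores, it needs no score normalization between the lexical and semantic systems, which is why it is a common default for hybrid search.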
## Advanced Topics

### Stemming and Analyzers

Tokenization is more nuanced than splitting on whitespace. **Analyzers** are configurable pipelines that process text before indexing and at query time. A typical analyzer includes:

1. **Character filters:** Normalize Unicode, strip HTML tags, or replace patterns
2. **Tokenizer:** Split text into tokens (by whitespace, word boundaries, or n-grams)
3. **Token filters:** Lowercase, remove stop words, apply stemming or lemmatization

**Stemming** reduces words to their root form so that "running," "runs," and "ran" all match the stem "run." Common stemmers include the Porter Stemmer (aggressive, fast) and the Snowball Stemmer (language-aware, more accurate). Stemming improves recall but can reduce precision -- "university" and "universe" might stem to the same root.

### Index Sharding

For large corpora, a single inverted index becomes a bottleneck. **Sharding** splits the index across multiple nodes, where each shard holds a subset of documents. A query is broadcast to all shards, each returns its local top-k results, and a coordinator merges the results.

Sharding strategies include document-based sharding (each shard holds a random subset of documents) and term-based sharding (each shard holds posting lists for a subset of terms). Document-based sharding is more common because it distributes load evenly and allows each shard to compute local BM25 scores independently.

### BM25F for Multi-Field Scoring

Documents often have multiple fields -- title, body, URL, metadata. A match in the title is typically more relevant than a match in the body. **BM25F** (BM25 with field weights) extends BM25 to handle multi-field documents by computing a weighted combination of term frequencies across fields before applying the BM25 formula.

For example, you might weight title matches 3x higher than body matches and URL matches 2x higher. BM25F computes an effective term frequency as `tf_effective = w_title * tf_title + w_body * tf_body + w_url * tf_url`, then applies the standard BM25 formula using this combined frequency. This produces better rankings than scoring each field independently and combining scores, because the saturation function is applied to the combined frequency rather than to each field separately.
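A short Python sketch of the BM25F weighting step, using the hypothetical 3x/2x/1x field weights from the example above; the combined frequency is what gets plugged into the standard BM25 formula in place of the raw term frequency.

```python
# Hypothetical field weights: title matches count 3x, URL 2x, body 1x.
FIELD_WEIGHTS = {"title": 3.0, "url": 2.0, "body": 1.0}

def effective_tf(term_freqs: dict[str, int]) -> float:
    """BM25F combines per-field term frequencies BEFORE saturation is applied."""
    return sum(FIELD_WEIGHTS[field] * tf for field, tf in term_freqs.items())

# "kubernetes" appears once in the title and twice in the body:
tf_eff = effective_tf({"title": 1, "body": 2})  # 3*1 + 1*2 = 5.0
# tf_eff then replaces tf in the standard BM25 scoring formula.
```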

## FAQ

**What does BM25 stand for?**

BM25 stands for Best Match 25. It is the 25th iteration of a series of ranking functions developed by Stephen Robertson and Karen Sparck Jones as part of the Okapi information retrieval system in the 1990s. It has become the de facto standard ranking function for full-text search.

**What are the default BM25 parameters?**

The two main BM25 parameters are k1 and b. The typical defaults are k1 = 1.2 (controls term frequency saturation -- higher values allow term frequency to contribute more) and b = 0.75 (controls document length normalization -- higher values penalize longer documents more). These defaults work well for most use cases, but tuning them on your specific data can improve results.

**How does BM25 differ from TF-IDF?**

BM25 improves on TF-IDF in two key ways: it applies a saturation function to term frequency (so the 10th occurrence of a term adds less relevance than the first), and it normalizes for document length (so longer documents are not unfairly favored). These improvements make BM25 consistently more effective for document ranking in practice.

**When should I use BM25 vs. vector search?**

Use BM25 when queries involve exact terms, identifiers, error codes, or technical terminology that must be matched precisely. Use vector search when queries are conceptual and vocabulary mismatch is likely. For most production systems, hybrid search -- combining BM25 and vector search -- delivers the best results by capturing both exact matches and semantic similarity.

**Can BM25 handle multi-language search?**

Yes, but it requires language-specific configuration. Each language needs appropriate tokenization rules, stop word lists, and stemming algorithms. Most search engines support multiple language analyzers that can be applied per-field or per-index. For multilingual corpora, you can either maintain separate indexes per language or use a language-detection step to route queries to the correct analyzer.

---

## What is Change Data Capture (CDC)?

URL: https://spice.ai/learn/change-data-capture
Date: 2025-12-19T00:00:00
Description: Change data capture (CDC) tracks row-level changes in databases and streams them in real time. Learn how CDC works, common patterns, and how it enables real-time data pipelines.

Keeping data synchronized across systems is one of the hardest problems in distributed architectures. A customer updates their shipping address in the transactional database, but the analytics dashboard still shows the old one. A product price changes, but the search index serves stale results for hours. An AI model generates answers based on yesterday's data because the vector index hasn't been refreshed.

The traditional solution is batch ETL: extract all data on a schedule, transform it, and load it into downstream systems. This works, but it introduces latency (minutes to hours), wastes resources (re-extracting unchanged data), and adds fragile pipeline infrastructure to maintain.

Change data capture (CDC) solves this by streaming only the rows that changed, as they change. Instead of periodic bulk extracts, CDC monitors the database's internal change log and delivers a continuous stream of insert, update, and delete events to downstream consumers. The result is near-real-time data synchronization with minimal impact on the source database.

## How CDC Works: Three Implementation Patterns

CDC can be implemented at different levels of the database stack. Each approach makes different tradeoffs between reliability, latency, and source impact.

### Log-Based CDC

Log-based CDC reads the database's transaction log -- the internal record of every committed change. In PostgreSQL, this is the Write-Ahead Log (WAL). In MySQL, it's the binary log (binlog). In SQL Server, it's the transaction log.

This is the preferred approach for production workloads because:

- **Zero application changes:** The transaction log already exists. CDC reads it asynchronously without modifying queries, adding triggers, or changing schemas.
- **Complete capture:** Every committed change is captured, including deletes. Nothing is missed.
- **Minimal source impact:** Reading the log is an asynchronous, read-only operation. It adds negligible overhead to the source database.
- **Ordering guarantees:** Changes are read in commit order, preserving transactional consistency.

The main limitation is that transaction log formats are database-specific. Each database requires its own CDC connector, and log retention policies must be configured to keep logs available long enough for the CDC process to read them.

### Trigger-Based CDC

Database triggers fire on insert, update, or delete operations and write change records to a separate tracking table. An external process then reads the tracking table and forwards changes to downstream systems.
Trigger-based CDC works on databases that don't expose their transaction logs (some older or proprietary systems), but it has significant drawbacks:

- **Write overhead:** Every write operation on the source table triggers an additional write to the tracking table, increasing latency and I/O
- **Schema maintenance:** The tracking table must be maintained alongside the source schema
- **Performance impact:** Under high write loads, triggers can become a bottleneck

### Polling-Based CDC

A process periodically queries the source table using a timestamp column (`updated_at`) or incrementing sequence number to detect new or changed rows. This is the simplest approach to implement but the least reliable:

- **Misses deletes:** Without a soft-delete pattern, there's no way to detect that a row was removed
- **Latency proportional to poll interval:** A 5-minute polling interval means changes are at least 5 minutes stale
- **Source load:** Frequent polling adds query load to the source database

Polling-based CDC is useful for prototyping or for sources that support no other mechanism, but it's generally not suitable for production real-time workloads.
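For illustration, here is a minimal Python sketch of the polling pattern, assuming a DB-API-style connection (e.g., `sqlite3`) and a hypothetical `orders` table whose last column is `updated_at`. The commented loop shows why staleness equals the poll interval.

```python
import time

def poll_changes(conn, last_seen: str):
    """Fetch rows changed since the last poll, using an updated_at column.

    The simplest CDC form: it misses deletes, and staleness equals the
    polling interval. Assumes updated_at is the final column of the row.
    """
    rows = conn.execute(
        "SELECT * FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    if rows:
        last_seen = rows[-1][-1]  # advance the watermark to the newest row seen
    return rows, last_seen

# last_seen = "1970-01-01T00:00:00"
# while True:
#     changes, last_seen = poll_changes(conn, last_seen)
#     apply_downstream(changes)  # hypothetical consumer
#     time.sleep(300)            # a 5-minute interval means >= 5 minutes stale
```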
## CDC Pipeline Architecture

A production CDC pipeline has three components: the change capture mechanism, a transport layer, and downstream consumers.

### Capture

The CDC connector monitors the source database and emits a stream of change events. Each event includes:

- **Operation type:** INSERT, UPDATE, or DELETE
- **Before state:** The row values before the change (for updates and deletes)
- **After state:** The row values after the change (for inserts and updates)
- **Metadata:** Timestamp, transaction ID, source table, and schema information

### Transport

Change events are typically published to a message broker or streaming platform -- Kafka is the most common choice. The transport layer provides durability (events aren't lost if a consumer is temporarily offline), ordering (events from the same table are delivered in commit order), and fan-out (multiple consumers can independently read the same stream).

For simpler architectures, CDC can also be consumed directly without a message broker. Some systems, like Spice, provide built-in CDC consumption that eliminates the need for a separate streaming layer.

### Consumers

Downstream systems consume change events and update their local state. Common consumers include:

- **Analytics databases:** Keep analytical copies synchronized with transactional sources
- **Search indexes:** Update Elasticsearch or OpenSearch indexes as data changes
- **Cache layers:** Invalidate or refresh Redis or Memcached entries when source data changes
- **Vector indexes:** Re-embed and re-index documents for [RAG systems](/learn/retrieval-augmented-generation) as content is updated
- **Acceleration layers:** Refresh local query caches used by [SQL federation](/learn/sql-federation) engines

## CDC Use Cases

### Real-Time Analytics and Dashboards

Stream changes from transactional databases into analytical systems so dashboards and reports reflect the current state of the business. Instead of waiting for the next ETL batch, every change is visible within seconds. This is particularly important for operational dashboards -- monitoring inventory levels, tracking order fulfillment, or observing system health -- where stale data leads to wrong decisions.

### Cache and Search Index Synchronization

Application caches (Redis, Memcached) and search indexes (Elasticsearch, OpenSearch) go stale when source data changes. Without CDC, teams resort to time-based expiration (which causes periodic staleness) or manual invalidation logic (which is error-prone and hard to maintain). CDC automates this entirely: when a row changes in the source database, the corresponding cache entry or search index document is updated within seconds. No manual invalidation, no stale reads.

### AI and RAG Pipeline Freshness

[Retrieval-augmented generation](/learn/retrieval-augmented-generation) systems depend on vector indexes that represent the current state of source data. If the vector index is rebuilt nightly, every answer is at least a day stale. CDC enables incremental index updates: when a document changes in the source database, only that document is re-embedded and re-indexed. This keeps RAG retrieval fresh without the cost of full re-indexing.

### Event-Driven Microservices

CDC turns database changes into a stream of events that microservices can react to. Instead of services polling each other for updates, changes propagate automatically through the event stream. This pattern -- sometimes called the "outbox pattern" -- decouples services while ensuring they stay synchronized.

### Data Lake Ingestion

Continuously stream changes from operational databases into data lakes (S3, GCS, Azure Blob) in formats like Apache Parquet or Apache Iceberg. This replaces batch export jobs and ensures the data lake reflects the current state of operational systems.

## CDC Best Practices

### Schema Evolution Handling

Source database schemas change over time -- columns are added, data types are modified, tables are renamed. A robust CDC pipeline must handle these changes gracefully. The most common approaches are:

- **Schema registries** that track schema versions and ensure consumers can handle multiple versions
- **Automatic schema migration** where the CDC pipeline detects changes and applies them downstream
- **Backward-compatible changes** enforced by policy, so consumers always handle the latest schema

### Monitoring and Alerting

CDC pipelines should be monitored for:

- **Lag:** The time between when a change is committed at the source and when it's applied downstream. Increasing lag indicates the pipeline is falling behind.
- **Error rates:** Failed events that couldn't be applied downstream
- **Log retention:** If the source database's transaction log is truncated before CDC reads it, changes are lost permanently

### Initial Load

When a CDC pipeline is first set up, the downstream system needs a full snapshot of the current source data. This "initial load" or "snapshot" must be coordinated with the CDC stream to avoid duplicates or gaps. Most production CDC tools handle this automatically.

## CDC with Spice

[Spice](/feature/real-time-change-data-capture) uses CDC to keep accelerated datasets synchronized with source systems. When source data changes, the CDC pipeline detects the change and updates the local acceleration cache within seconds. This enables [federated SQL queries](/learn/sql-federation) that are both real-time (reflecting the latest source state) and fast (served from local acceleration).

Spice supports CDC from PostgreSQL, MySQL, and other common sources, with built-in change detection that eliminates the need for a separate streaming infrastructure like Kafka for many use cases.

## Advanced Topics

### WAL Internals and Logical Replication

Log-based CDC in PostgreSQL reads from the Write-Ahead Log (WAL), which is the database's crash-recovery mechanism.
Every committed transaction is first written to the WAL before being applied to the actual data files. CDC leverages this by attaching a logical replication slot to the WAL, which tells PostgreSQL to retain log segments until the CDC consumer has acknowledged them.

A logical replication slot decodes the raw WAL bytes into structured change events using an output plugin (e.g., `pgoutput` or `wal2json`). The output plugin determines the format of change events -- whether they include full row images, old values for updated columns, or just the changed fields. Configuring `REPLICA IDENTITY FULL` on a table ensures that UPDATE and DELETE events include the complete before-state of the row, which is critical for consumers that need to maintain materialized views or detect specific field-level changes.

The key operational concern with WAL-based CDC is slot management. If a CDC consumer goes offline for an extended period, the replication slot prevents PostgreSQL from reclaiming WAL segments. This can cause disk usage to grow unbounded, eventually filling the disk and crashing the database. Production CDC deployments must monitor replication slot lag and set maximum retention policies to prevent this failure mode.

```mermaid
sequenceDiagram
    participant App as Application
    participant DB as PostgreSQL
    participant WAL as Write-Ahead Log
    participant Slot as Replication Slot
    participant CDC as CDC Consumer
    participant Target as Downstream Target
    App->>DB: INSERT / UPDATE / DELETE
    DB->>WAL: Write transaction record
    DB-->>App: Acknowledge commit
    Slot->>WAL: Read from last confirmed LSN
    WAL-->>Slot: Raw WAL bytes
    Slot->>CDC: Decoded change event
    CDC->>Target: Apply change
    CDC->>Slot: Confirm LSN processed
```

### Exactly-Once Delivery Semantics

Distributed systems offer three delivery guarantees: at-most-once (changes may be lost), at-least-once (changes may be duplicated), and exactly-once (each change is applied precisely once). CDC pipelines must handle the gap between at-least-once delivery (which most transport layers provide) and exactly-once semantics (which consumers require).

The standard approach is idempotent consumers. Instead of trying to guarantee that each change event is delivered exactly once (which is impractical in distributed systems), the consumer is designed so that applying the same event multiple times produces the same result. For database targets, this means using `UPSERT` (INSERT ... ON CONFLICT UPDATE) instead of plain INSERT. For search indexes, it means writing documents with deterministic IDs so that re-applying an update overwrites the previous version.

When idempotency is insufficient -- for example, when the consumer maintains counters or running aggregates -- the consumer must track its position in the change stream (the Log Sequence Number, or LSN, in PostgreSQL terms) and store it transactionally alongside the applied changes. On recovery, the consumer resumes from its last committed LSN, ensuring no events are processed twice.
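A minimal Python sketch of an idempotent consumer combining both techniques: an upsert so replayed events converge to the same state, and a `cdc_progress` bookkeeping table (hypothetical, as are the `customers` columns) committed in the same transaction as the change. It assumes a DB-API connection with named parameters and PostgreSQL/SQLite-style `ON CONFLICT` upserts.

```python
def apply_change(conn, event: dict) -> None:
    """Idempotently apply one CDC event: replaying it yields the same state."""
    if event["op"] in ("insert", "update"):
        conn.execute(
            """
            INSERT INTO customers (id, name, email)
            VALUES (:id, :name, :email)
            ON CONFLICT (id) DO UPDATE
              SET name = excluded.name, email = excluded.email
            """,
            event["after"],  # row image after the change
        )
    elif event["op"] == "delete":
        conn.execute("DELETE FROM customers WHERE id = :id", event["before"])
    # Record stream position transactionally with the applied change, so a
    # crashed consumer resumes from its last committed LSN on recovery.
    conn.execute(
        "INSERT INTO cdc_progress (slot, lsn) VALUES ('customers', :lsn) "
        "ON CONFLICT (slot) DO UPDATE SET lsn = excluded.lsn",
        {"lsn": event["lsn"]},
    )
    conn.commit()
```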
### CDC at Scale

Scaling CDC introduces challenges that don't exist in single-source deployments. When dozens or hundreds of tables must be captured simultaneously, the CDC system must manage connection limits, replication slot resources, and downstream throughput.

Partitioning the change stream by table or by key range allows parallel processing. Events for independent tables can be consumed by separate workers without coordination. Events within a single table can be partitioned by primary key, enabling parallel consumers that each handle a subset of rows -- as long as ordering is maintained within each partition.

Backpressure management is critical at scale. If a downstream consumer slows down (due to indexing lag, network congestion, or resource contention), the CDC pipeline must buffer events without dropping them and without allowing unbounded memory growth. Production systems use bounded buffers with overflow to persistent storage -- writing excess events to disk when in-memory buffers fill, then draining the disk buffer when the consumer catches up.

Monitoring at scale requires tracking per-table lag (the delay between source commit and downstream application), throughput (events per second per table), and error rates. Alerting on lag growth is the most important signal, because increasing lag indicates that the pipeline is falling behind and may eventually lose data if WAL retention is exceeded.
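As a sketch of the bounded-buffer-with-overflow idea, the Python class below keeps events in memory up to a limit and appends the excess to a JSON-lines file on disk, draining it back when the consumer catches up. It is deliberately simplified (single-threaded, and it assumes the consumer drains fully before new pushes interleave, so strict ordering holds only under that assumption); production systems add locking, rotation, and fsync policies.

```python
import collections
import json
import os
import tempfile

class OverflowBuffer:
    """Bounded in-memory event buffer that overflows to disk under backpressure."""

    def __init__(self, max_in_memory: int = 10_000):
        self.mem = collections.deque()
        self.max = max_in_memory
        self.overflow_path = None

    def push(self, event: dict) -> None:
        if len(self.mem) < self.max:
            self.mem.append(event)
            return
        # Memory is full: append the event to a disk-backed overflow log.
        if self.overflow_path is None:
            fd, self.overflow_path = tempfile.mkstemp(suffix=".events")
            os.close(fd)
        with open(self.overflow_path, "a") as f:
            f.write(json.dumps(event) + "\n")

    def pop(self):
        if self.mem:
            return self.mem.popleft()
        if self.overflow_path and os.path.getsize(self.overflow_path) > 0:
            self._drain_overflow()  # refill memory from the disk log
            return self.mem.popleft() if self.mem else None
        return None

    def _drain_overflow(self) -> None:
        with open(self.overflow_path) as f:
            for line in f:
                self.mem.append(json.loads(line))
        open(self.overflow_path, "w").close()  # truncate the drained log
```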

## FAQ

**How does CDC differ from traditional batch ETL?**

ETL extracts data in bulk on a schedule, transforms it, and loads it into a target system. CDC captures only the changes (inserts, updates, deletes) as they happen and streams them continuously. ETL is batch-oriented and introduces latency; CDC is event-driven and near-real-time. Many modern architectures use CDC to replace the "extract" step of traditional ETL.

**Which databases support log-based CDC?**

Most modern relational databases support log-based CDC: PostgreSQL (via logical replication and WAL), MySQL (via binlog), SQL Server (via its built-in CDC feature), Oracle (via LogMiner or GoldenGate), and MongoDB (via change streams). The specific configuration varies by database.

**Does CDC add overhead to the source database?**

Log-based CDC adds minimal overhead because it reads the transaction log asynchronously -- it does not modify queries or add triggers. The primary cost is slightly increased disk I/O for log retention. Trigger-based and polling-based CDC add more overhead because they execute additional queries or triggers during write operations.

**How does CDC handle schema changes?**

Schema changes (adding columns, changing data types) are one of the most challenging aspects of CDC. Log-based CDC systems typically detect schema changes in the transaction log and propagate them downstream. The downstream consumer must handle schema evolution -- for example, by using a schema registry or applying migrations automatically.

**Can CDC be used with data federation?**

Yes. CDC and data federation are complementary patterns. Federation queries data across sources in real time; CDC keeps local acceleration caches synchronized with those sources. Together, they enable sub-second federated queries backed by always-fresh local data. Spice uses this combination to deliver real-time performance across distributed data sources.

---

## What is Data Acceleration?

URL: https://spice.ai/learn/data-acceleration
Date: 2026-01-08T00:00:00
Description: Data acceleration caches frequently accessed data locally for sub-second query performance without moving data permanently. Learn how it works, acceleration strategies, and when to use it.

Querying data where it lives -- across data warehouses, object stores, and transactional databases -- is the promise of [SQL federation](/learn/sql-federation). But federation alone has a performance ceiling: every query must travel over the network to the source system, wait for that system to process it, and transfer results back. For latency-sensitive applications, dashboards, and AI workloads, that round-trip can be too slow.

Data acceleration solves this by maintaining a local, queryable copy of frequently accessed datasets in a fast engine close to the application. The source system remains the system of record. The acceleration layer handles reads, serving queries in milliseconds instead of seconds or minutes. When source data changes, the acceleration layer is refreshed -- often via [change data capture](/learn/change-data-capture) -- so it stays current.

This is not a new concept. Database caching, materialized views, and read replicas all address the same fundamental problem. What distinguishes modern data acceleration is that it works across heterogeneous sources (not just within a single database), integrates with federation engines for transparent query routing, and supports real-time refresh mechanisms that keep cached data fresh.

## How Data Acceleration Works with Federation

Data acceleration and SQL federation are complementary patterns that address different parts of the query lifecycle.

**Federation** provides unified access. A single SQL query can reach PostgreSQL, S3, Databricks, and 30+ other sources through [Spice's connector ecosystem](/integrations). The federation engine handles connection management, dialect translation, and predicate pushdown.

**Acceleration** provides speed. Datasets that are queried frequently or require low latency are cached locally in a fast engine. When a query arrives for an accelerated dataset, the federation engine serves it from the local cache instead of routing it to the remote source.

The two patterns work together in a query router:

1. A query arrives at the federation engine
2. The engine checks whether the requested datasets are accelerated locally
3. If yes, the query is served from the local acceleration engine (milliseconds)
4. If no, the query is federated to the remote source (seconds to minutes, depending on the source)

This means applications get a single SQL endpoint that transparently handles both accelerated and federated queries. Developers don't need to manage separate connections or caching logic.

## Acceleration Strategies

Not all acceleration is the same. The choice of engine, storage medium, and materialization scope determines the performance characteristics and resource requirements.

### In-Memory vs. On-Disk Acceleration

**In-memory acceleration** stores data in RAM using columnar formats like Apache Arrow. This delivers the fastest query performance -- sub-millisecond scans on datasets that fit in memory -- but is limited by available RAM and volatile (data is lost on restart unless backed by a persistent store).

**On-disk acceleration** stores data on local SSD using embedded engines like DuckDB or SQLite.
Performance is slower than in-memory (milliseconds instead of microseconds) but can handle much larger datasets and survives restarts without re-loading from the source.

The right choice depends on the workload:

- **In-memory:** Real-time dashboards, AI inference pipelines, embedding lookups, and any workload where single-digit-millisecond latency matters
- **On-disk:** Analytical queries over large datasets, batch processing, and workloads where durability matters more than raw speed

### Full vs. Partial Materialization

**Full materialization** caches the entire dataset locally. Every row from the source table is replicated in the acceleration engine. This approach is simple and ensures every query can be served locally, but it requires enough storage to hold the full dataset and enough bandwidth to keep it synchronized.

**Partial materialization** caches only a subset of the data -- typically filtered by time range, partition, or access frequency. For example, an acceleration layer might cache only the last 90 days of order data, while queries for older data are federated to the source warehouse. Partial materialization reduces storage and refresh costs but requires the query router to determine whether a given query can be served from the local cache or must be routed to the source.

## Keeping Accelerated Data Fresh

Acceleration is only useful if the cached data is reasonably current. Stale acceleration caches can be worse than no acceleration, because queries return outdated results with no indication that the data is behind.

### Change Data Capture (CDC) Refresh

The most effective refresh strategy uses [change data capture](/learn/change-data-capture) to stream row-level changes from the source database to the acceleration layer. When a row is inserted, updated, or deleted at the source, the change event is applied to the local cache within seconds.

CDC refresh provides near-real-time freshness with minimal source impact. It works particularly well with transactional databases (PostgreSQL, MySQL) that expose write-ahead logs.

### Scheduled Refresh

For sources that don't support CDC -- object stores like S3, REST APIs, or some SaaS platforms -- the acceleration layer refreshes on a schedule. The refresh interval can range from seconds to hours depending on freshness requirements.

Scheduled refresh is simpler to implement but introduces a staleness window equal to the refresh interval. A 5-minute schedule means cached data can be up to 5 minutes behind.

### Append-Only Refresh

For time-series and event data that is never updated (only new rows are appended), the acceleration layer can use append-only refresh. It tracks the latest timestamp or sequence number and fetches only new rows on each refresh cycle. This is efficient because it avoids re-scanning unchanged data.
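A minimal Python sketch of the watermark mechanic behind append-only refresh, assuming DB-API-style `source` and `accel` connections (e.g., `sqlite3` or `duckdb`), a hypothetical table, and identical column order on both sides: read the local high-water mark, fetch only newer rows from the source, and append them.

```python
def refresh_append_only(source, accel, table: str, ts_col: str) -> None:
    """Incremental append-only refresh: fetch only rows past the watermark."""
    # 1. Find the newest row already present in the local acceleration cache.
    (watermark,) = accel.execute(
        f"SELECT coalesce(max({ts_col}), '1970-01-01') FROM {table}"
    ).fetchone()
    # 2. Pull only rows newer than the watermark from the source system.
    new_rows = source.execute(
        f"SELECT * FROM {table} WHERE {ts_col} > ? ORDER BY {ts_col}",
        (watermark,),
    ).fetchall()
    # 3. Append them locally; unchanged history is never re-scanned.
    if new_rows:
        placeholders = ",".join("?" * len(new_rows[0]))
        accel.executemany(f"INSERT INTO {table} VALUES ({placeholders})", new_rows)
```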
## Acceleration Engines

The choice of acceleration engine determines query performance, supported SQL features, and resource requirements.

### Apache Arrow

Apache Arrow is an in-memory columnar data format designed for analytical processing. It enables zero-copy reads and SIMD-optimized computation, making it one of the fastest options for scan-heavy analytical queries. Arrow is the default in-memory acceleration engine in Spice.

Arrow's main limitation is memory: the entire accelerated dataset must fit in RAM. For datasets that exceed available memory, on-disk engines are a better fit.

### DuckDB

DuckDB is an embedded analytical database that stores data on disk in a columnar format. It supports a rich SQL dialect (including window functions, CTEs, and complex aggregations) and performs well on analytical workloads over datasets that are too large for in-memory processing.

Spice supports DuckDB as an on-disk acceleration engine, making it suitable for multi-gigabyte datasets where in-memory caching is not practical.

### Choosing the Right Engine

The decision framework is straightforward:

- **Dataset fits in memory + lowest possible latency required:** Use Arrow (in-memory)
- **Dataset too large for memory + complex analytical queries:** Use DuckDB (on-disk)
- **Mixed workloads:** Use both -- accelerate hot, frequently accessed datasets with Arrow and larger, less latency-sensitive datasets with DuckDB

## When to Accelerate vs. When to Federate

Not every dataset benefits from acceleration. The decision depends on query patterns, freshness requirements, and data volume.

**Accelerate when:**

- The dataset is queried frequently (dashboard queries, API endpoints, AI pipelines)
- Low latency is required (sub-second response times)
- The source system is slow or expensive to query (data warehouses billed per query, remote object stores)
- The dataset is small enough to cache cost-effectively

**Federate without acceleration when:**

- The dataset is queried infrequently (ad-hoc exploration, one-off reports)
- The source system is already fast enough for the use case
- Data freshness requirements are strict and CDC is not available
- The dataset is too large to cache practically

In practice, most production deployments use a mix: hot datasets are accelerated for performance, while the long tail of less-frequently-accessed data is queried via federation on demand.

## Acceleration for AI Workloads

AI applications place unique demands on data infrastructure. Models need fast access to embeddings, features, and context data -- often with strict latency budgets measured in milliseconds.

### Embedding Caches

[Retrieval-augmented generation](/learn/retrieval-augmented-generation) systems search vector indexes to find relevant context for LLM prompts. Accelerating the embedding store locally eliminates the network round-trip to a remote vector database, reducing retrieval latency from hundreds of milliseconds to single-digit milliseconds.

### Feature Stores

Machine learning models consume feature vectors at inference time. Accelerating feature data in an in-memory engine ensures that model serving pipelines can retrieve features without blocking on slow source queries.

### RAG Index Acceleration

RAG systems combine vector search with structured data retrieval. Accelerating both the vector index and the associated metadata tables ensures that the full RAG pipeline -- retrieval, context assembly, and LLM prompt construction -- runs within tight latency budgets.

## The Spice Acceleration Architecture

[Spice](/platform/sql-federation-acceleration) combines SQL federation and data acceleration in a single runtime. The architecture works as follows:

1. **Define datasets** in a Spicepod configuration file, specifying the source connector and acceleration settings (engine, refresh mode, refresh interval)
2. **Initial load:** Spice reads the full dataset from the source and loads it into the local acceleration engine
3. **Query routing:** Incoming SQL queries are routed to the local acceleration engine for accelerated datasets, or federated to the source for non-accelerated datasets
4. **Refresh:** CDC streams or scheduled refresh cycles keep the acceleration cache synchronized with the source

A Spicepod dataset configuration looks like:

```yaml
datasets:
  - from: postgres:orders
    name: orders
    acceleration:
      engine: arrow
      refresh_mode: changes
      refresh_check_interval: 1s
```

This configuration connects to a PostgreSQL `orders` table, accelerates it in-memory using Arrow, and refreshes via CDC with a 1-second check interval. Queries against this dataset return in milliseconds, backed by data that is at most seconds behind the source.

For workloads that need to scale beyond a single node, [Spice Cayenne](/blog/introducing-spice-cayenne-data-accelerator) provides a next-generation acceleration engine built for high-throughput [data lake](/use-case/datalake-accelerator) workloads.

## Advanced Topics

### Cache Eviction Strategies

When the acceleration layer has finite memory or disk capacity, it must decide which data to evict when new data arrives. The eviction strategy directly affects cache hit rates and query performance.

**LRU (Least Recently Used)** evicts the data that has not been accessed for the longest time. This works well when query patterns follow temporal locality -- recently accessed datasets are likely to be accessed again soon. LRU is simple to implement but can be defeated by sequential scans: a single large query that touches every cached dataset can flush the entire cache.

**LFU (Least Frequently Used)** evicts the data accessed the fewest times. This protects frequently accessed datasets from being evicted by one-off queries, but it can be slow to adapt when access patterns shift -- a dataset that was popular last week keeps its high frequency count even if it is no longer relevant.

**TTL (Time-To-Live)** evicts data based on age rather than access patterns. Each cached dataset has a configured TTL, and data is evicted (or marked for refresh) when the TTL expires. TTL-based eviction is common in acceleration layers because it directly controls freshness -- a 5-minute TTL guarantees that cached data is never more than 5 minutes stale. TTL works well in combination with [CDC-based refresh](/learn/change-data-capture), where TTL serves as a fallback eviction mechanism when CDC streams are unavailable.

In practice, most acceleration engines combine strategies. For example, Spice's acceleration layer uses CDC or scheduled refresh to keep data current (effectively a freshness-driven policy) and relies on the configured engine's memory management for capacity-based eviction.
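For reference, the classic LRU mechanic fits in a few lines of Python using an ordered dictionary; this is a generic textbook sketch, not how any particular acceleration engine implements eviction.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction: evict the entry untouched the longest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: OrderedDict[str, object] = OrderedDict()

    def get(self, key: str):
        if key not in self.entries:
            return None                   # cache miss: caller fetches from source
        self.entries.move_to_end(key)     # mark as most recently used
        return self.entries[key]

    def put(self, key: str, value: object) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```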
### Tiered Storage

A single acceleration engine is often not sufficient for diverse workloads. Tiered storage addresses this by placing data in different engines based on access patterns and performance requirements. The typical tiers are:

- **Hot tier (in-memory, Arrow):** Datasets queried hundreds or thousands of times per minute -- embedding lookups, real-time dashboard queries, feature store reads. Sub-millisecond latency, limited by RAM.
- **Warm tier (on-disk, DuckDB):** Datasets queried regularly but without sub-millisecond requirements -- hourly reports, batch analytics, ad-hoc exploration over moderate-sized datasets. Millisecond-range latency, limited by SSD capacity.
- **Cold tier (federated, no acceleration):** Datasets queried infrequently or too large to cache. Queries are federated to the source system on demand. Latency depends on source performance.

Tiered storage can be configured statically (the operator assigns each dataset to a tier) or dynamically (the engine promotes and demotes datasets between tiers based on observed access patterns). Static assignment is simpler and more predictable. Dynamic tiering optimizes resource utilization but adds complexity in monitoring and debugging.

### Consistency Models

Acceleration introduces a fundamental tradeoff between performance and consistency. The cached copy may lag behind the source, so queries against the acceleration layer may return slightly stale data.

**Eventual consistency** is the default model for most acceleration deployments. The acceleration layer is updated asynchronously -- via CDC or scheduled refresh -- and queries may return data that is seconds to minutes behind the source. This model is acceptable for dashboards, analytics, and most AI workloads where slight staleness does not affect correctness.

**Read-your-writes consistency** guarantees that if an application writes to the source and then reads from the acceleration layer, it sees its own write. Achieving this requires either synchronous refresh (the acceleration layer is updated before the write is acknowledged, which adds latency) or write-awareness (the application marks certain reads as requiring the latest data, and the acceleration layer routes those reads to the source instead of the cache).

**Strong consistency** guarantees that the acceleration layer always reflects the current source state. This is impractical for most acceleration deployments because it requires synchronous coordination between the source and the cache, negating the latency benefits of acceleration. When strong consistency is required, the better approach is to query the source directly via [SQL federation](/learn/sql-federation) and reserve acceleration for workloads that tolerate eventual consistency.
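The write-awareness variant of read-your-writes can be sketched as a small router: track when this session last wrote each key, and route reads back to the source while the write may still be propagating to the cache. Everything here is hypothetical (the `source`/`cache` objects and their `read`/`write` methods, and the fixed staleness bound standing in for actual refresh lag); it illustrates the routing idea only.

```python
import time

class ReadYourWritesRouter:
    """Route reads to the source for keys this session wrote recently."""

    def __init__(self, source, cache, staleness_bound_s: float = 5.0):
        self.source, self.cache = source, cache
        self.bound = staleness_bound_s         # upper bound on cache refresh lag
        self.last_write: dict[str, float] = {}

    def write(self, key: str, value) -> None:
        self.source.write(key, value)          # the source stays the system of record
        self.last_write[key] = time.monotonic()

    def read(self, key: str):
        wrote_at = self.last_write.get(key)
        if wrote_at is not None and time.monotonic() - wrote_at < self.bound:
            return self.source.read(key)       # the cache may not have this write yet
        return self.cache.read(key)            # otherwise serve the fast local path
```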

## FAQ

**How does data acceleration differ from traditional caching?**

Traditional caching (Redis, Memcached) stores key-value pairs and requires application logic to manage cache invalidation. Data acceleration caches entire datasets in a queryable SQL engine and supports automatic refresh via CDC or scheduled sync. Acceleration preserves the full SQL interface, so applications query the acceleration layer with standard SQL rather than key-based lookups.

**How fresh is accelerated data?**

Freshness depends on the refresh strategy. CDC-based refresh keeps accelerated data within seconds of the source. Scheduled refresh introduces a staleness window equal to the refresh interval (e.g., a 1-minute schedule means data can be up to 1 minute behind). Append-only refresh for event data is typically near-real-time for new data.

**When should I accelerate data vs. query the source directly?**

Accelerate when datasets are queried frequently, low latency is critical, or the source system is slow or expensive to query. Query the source directly (via federation) when access is infrequent, the source is already fast enough, or the dataset is too large to cache cost-effectively. Most production deployments accelerate hot datasets and federate the rest.

**How does data acceleration differ from materialized views?**

Materialized views are a database-internal feature that precomputes and stores query results within the same database. Data acceleration caches data from external sources in a local engine, often a different engine than the source. Acceleration works across heterogeneous data sources (PostgreSQL, S3, Databricks), while materialized views are limited to a single database.

**What are the cost implications of data acceleration?**

Data acceleration trades compute and storage resources at the edge or application layer for reduced load and cost at the source. In-memory acceleration (Arrow) requires RAM proportional to dataset size. On-disk acceleration (DuckDB) requires local SSD storage. The cost savings come from reduced source query volume, lower data warehouse bills (for usage-based pricing), and eliminated ETL pipeline infrastructure.

---

## Data Virtualization vs Data Replication: How to Choose

URL: https://spice.ai/learn/data-virtualization-vs-replication
Date: 2026-01-15T00:00:00
Description: Compare data virtualization and data replication -- two foundational approaches to data integration. Learn the key differences, tradeoffs, and when each approach is appropriate for your workloads.

When an application, dashboard, or AI model needs data from multiple systems -- transactional databases, cloud warehouses, object stores, SaaS APIs -- engineering teams face a fundamental design decision: query the data where it lives, or copy it somewhere faster and closer.

[Data virtualization](/learn/data-virtualization) takes the first approach. A virtualization layer presents a unified SQL interface across sources, translating and routing queries to each system at runtime. The data never moves -- only the query results are returned to the consumer.

Data replication takes the second approach. Data is physically copied from source systems into a target system -- a data warehouse, a data lake, or a local acceleration engine. Consumers query the replica, which is optimized for their specific access patterns.

Neither approach is inherently better. Each excels in different scenarios, and most production data architectures use both in combination. This guide explains the key differences, walks through a decision framework, and shows how modern platforms unify both patterns.

## How Data Virtualization Works

Data virtualization provides a query abstraction layer over distributed data sources. Instead of moving data, it moves queries. The virtualization engine connects to each source system, translates the incoming SQL into the native dialect of each source, executes queries in parallel, and merges results before returning them to the application.

Key characteristics of virtualization:

- **No data movement:** Data stays in its source systems. There is no duplication, no storage cost for copies, and no synchronization to maintain.
- **Always fresh:** Every query reads from the live source, so results always reflect the current state of each system.
- **Rapid onboarding:** New data sources become queryable immediately after connecting -- no schema design, migration scripts, or pipeline orchestration required.
- **Source-dependent performance:** Query latency depends on source system performance, network distance, and query complexity. Remote sources and complex cross-source joins can be slow.

Virtualization engines optimize performance through predicate pushdown (pushing filters to the source so only matching rows are transferred), aggregation pushdown (computing sums and counts at the source), and query parallelization (executing requests to independent sources concurrently). These optimizations narrow the performance gap with co-located data, but they cannot eliminate the network round-trip entirely.

## How Data Replication Works

Data replication physically copies data from source systems into a target system optimized for the consumer's workload. The replication process can be batch-oriented (traditional ETL that runs on a schedule) or continuous (streaming pipelines powered by [change data capture](/learn/change-data-capture)).

Key characteristics of replication:

- **Co-located data:** Queries run against local, pre-optimized copies. Cross-table joins, aggregations, and scans are fast because all data is in one place.
- **Predictable performance:** Latency is determined by the target system, not the source. Query times are consistent regardless of source load or network conditions.
- **Storage and pipeline costs:** Maintaining replicas requires storage for the copies and engineering effort to keep them synchronized. Schema changes at the source can break pipelines.
- **Staleness window:** Unless replication is continuous, the replica always lags behind the source by at least the replication interval.

Modern replication approaches have narrowed the freshness gap significantly. CDC-based replication can keep replicas within seconds of the source, and [data acceleration](/learn/data-acceleration) engines maintain queryable local copies that refresh automatically -- eliminating much of the traditional ETL burden.

## Key Differences: Side-by-Side Comparison

The following table summarizes the core tradeoffs between virtualization and replication across the dimensions that matter most in production.

| Dimension | Data Virtualization | Data Replication |
|---|---|---|
| **Data freshness** | Real-time -- always reads live source data | Depends on replication method: batch ETL (minutes to hours), CDC (seconds) |
| **Query performance** | Source-dependent; network round-trip for every query | Fast and predictable; queries run against local, optimized copies |
| **Storage cost** | No additional storage -- data stays at source | Requires storage for each replica; cost scales with data volume |
| **Operational complexity** | Low setup; no pipelines to maintain | Pipelines must be built, monitored, and maintained over time |
| **Schema change handling** | Transparent -- connector reads current schema at query time | Pipeline breakage risk; schema changes must be propagated |
| **Cross-source joins** | Handled at query time; performance depends on data volume | Fast if all data is co-located in the target system |
| **Source system load** | Every consumer query hits the source system | Source is queried only during replication; consumer queries don't touch it |
| **Offline resilience** | Queries fail if a source is unavailable | Queries succeed against the replica even if the source is down |
| **Best for** | Real-time access, ad-hoc exploration, rapid prototyping | High-throughput analytics, latency-sensitive applications, offline access |

Neither column is uniformly better. The right choice depends on the specific workload, freshness requirements, and performance constraints.

## Decision Framework

Choosing between virtualization and replication -- or determining the right mix of both -- requires evaluating four key factors.

### 1. Freshness Requirements

If the workload requires data that is always current -- real-time dashboards, fraud detection, operational monitoring -- virtualization provides guaranteed freshness without pipeline delays. If the workload tolerates minutes or hours of staleness -- historical analytics, monthly reporting, compliance archives -- replication with batch ETL is simpler and more cost-effective.

For workloads that need both freshness and speed -- sub-second queries on near-real-time data -- the answer is often CDC-based replication, where a local acceleration cache is kept current through continuous change streaming.

### 2. Query Performance Needs

If queries must return in milliseconds and the source systems are remote, slow, or expensive to query, replication is the right pattern. Pre-computing and co-locating data ensures consistent, fast query times regardless of source conditions.
If query latency in the hundreds-of-milliseconds-to-seconds range is acceptable, virtualization avoids the overhead of maintaining replicas. Query pushdown optimizations can make virtualized queries surprisingly fast, especially for simple lookups and filtered reads.

### 3. Data Volume and Breadth

For workloads that access a small number of well-defined datasets repeatedly, replication is efficient -- the cost of maintaining copies is justified by the performance benefit. For workloads that need broad, ad-hoc access across many datasets (some of which may be queried only once), virtualization avoids the waste of replicating data that may never be read.

In practice, most organizations have a mix: a small set of "hot" datasets that are queried constantly, and a long tail of datasets accessed infrequently. The hot datasets are candidates for replication; the long tail is best served by virtualization.

### 4. Operational Capacity

Replication requires ongoing engineering investment: pipeline monitoring, failure handling, schema evolution, storage management, and cost optimization. Teams with mature data engineering practices and existing pipeline infrastructure can absorb this cost. Teams that are small, moving fast, or focused on application development rather than data infrastructure may prefer the operational simplicity of virtualization.

### Quick Reference

- **Choose virtualization** when freshness is non-negotiable, the dataset count is high, queries are infrequent or ad-hoc, and operational simplicity matters.
- **Choose replication** when query performance is critical, the workload is high-throughput, the dataset set is stable and well-defined, and offline resilience is needed.
- **Choose both** when different workloads have different requirements -- which is the case for nearly every production data platform.

## Advanced Topics

### Consistency Models in Hybrid Architectures

When virtualization and replication coexist in the same platform, consistency becomes a design challenge. A query might touch both a virtualized dataset (live from the source) and a replicated dataset (potentially seconds behind). The results reflect two different points in time, which can produce subtle inconsistencies.

For example, an application joins a virtualized `orders` table with a replicated `customers` table. If a new customer places an order, the virtualized `orders` table shows the order immediately, but the replicated `customers` table might not yet contain the new customer record. The join produces a row with a null customer name -- a temporal inconsistency.

Handling this requires either accepting eventual consistency (appropriate for most analytical and AI workloads), designing queries to tolerate missing joins (using LEFT JOINs instead of INNER JOINs for cross-boundary queries), or ensuring that related datasets use the same access pattern (both virtualized or both replicated). Production platforms that support transparent query routing -- automatically choosing between virtualized and accelerated paths -- must document their consistency guarantees so application developers can make informed decisions.

### Materialization Strategies for Cost Optimization

The cost profile of replication depends heavily on what is replicated, how often, and where. Full-table replication of a multi-terabyte fact table is expensive in both storage and refresh compute. Partial materialization strategies reduce this cost without sacrificing query coverage.
### Materialization Strategies for Cost Optimization

The cost profile of replication depends heavily on what is replicated, how often, and where. Full-table replication of a multi-terabyte fact table is expensive in both storage and refresh compute. Partial materialization strategies reduce this cost without sacrificing query coverage.

**Time-windowed materialization** replicates only recent data -- for example, the last 90 days of transactions. Queries within the window are served from the fast local copy; queries for older data are federated to the source warehouse or data lake. This pattern works well for [operational data lakehouse](/use-case/operational-data-lakehouse) architectures where recent data drives operational decisions and historical data supports periodic analysis.

**Aggregation-based materialization** replicates pre-computed aggregates rather than raw rows. Instead of replicating 100 million order line items, the acceleration layer materializes daily revenue by product category -- a tiny fraction of the storage cost. This approach trades query flexibility for efficiency: only queries that match the pre-computed aggregations can be served from the replica.

**Access-pattern-driven materialization** monitors query logs to identify which datasets and columns are actually accessed, then replicates only that subset. If an application queries 5 columns out of a 200-column table, replicating only those 5 columns reduces storage by 97%. This strategy requires a feedback loop between the query engine and the replication layer -- a capability found in modern [data lake acceleration](/use-case/datalake-accelerator) platforms.

### Federation Pushdown Optimization

The performance gap between virtualization and replication narrows significantly when the virtualization engine can push more computation to the source. Beyond simple predicate pushdown, advanced engines support join pushdown (executing joins between two tables in the same source rather than fetching both and joining locally), limit pushdown (stopping source scans after enough rows are collected), and projection pushdown (requesting only the columns needed rather than full rows).

The effectiveness of pushdown depends on the source system's capabilities. A PostgreSQL source can handle complex pushed-down predicates, joins, and aggregations. An S3 source with Parquet files can handle predicate pushdown and projection pushdown but not joins. A REST API source may support no pushdown at all -- every request returns full results that must be filtered locally. Understanding each [connector's](/integrations) pushdown capabilities is essential for predicting virtualized query performance and deciding which datasets to replicate for better results.

## How Spice Combines Both Approaches

Most comparisons of virtualization and replication present them as either/or choices. In practice, the strongest data architectures use both -- and the challenge is combining them in a single, coherent platform rather than operating two separate systems.

[Spice](/platform/sql-federation-acceleration) unifies data virtualization and replication in one runtime. The platform provides [SQL federation](/learn/sql-federation) across [30+ data connectors](/integrations) -- databases, warehouses, object stores, and streaming systems -- so any dataset is queryable immediately through a single SQL endpoint. This is the virtualization layer: no data movement, always-fresh results, instant onboarding.

For datasets that require faster performance, Spice adds [data acceleration](/learn/data-acceleration) -- local replication into in-memory (Apache Arrow) or on-disk (DuckDB) engines that serve queries in milliseconds. Acceleration is configured per dataset, so teams can selectively replicate only the datasets that benefit from it.
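To make per-dataset acceleration concrete, here is a minimal `spicepod.yaml` sketch that federates one dataset and accelerates another with a time-windowed refresh; the source, dataset names, and retention window are illustrative rather than canonical:

```yaml
version: v1
kind: Spicepod
name: orders-app
datasets:
  # Federated only: queried live from the source, no local copy.
  - from: postgres:public.customers
    name: customers

  # Accelerated: replicated locally into DuckDB and refreshed on a schedule,
  # keeping only the last 90 days (time-windowed materialization).
  - from: postgres:public.orders
    name: orders
    acceleration:
      enabled: true
      engine: duckdb
      refresh_check_interval: 60s
      refresh_sql: SELECT * FROM orders WHERE order_date > now() - INTERVAL '90 days'
```

With a configuration along these lines, queries against `customers` federate to PostgreSQL, while queries against `orders` are served from the local accelerated copy.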
The acceleration cache is kept fresh through [change data capture](/learn/change-data-capture) or scheduled refresh, depending on the source.

The query router handles this transparently. When a query arrives, Spice checks whether the requested datasets are accelerated locally. If so, the query is served from the local engine. If not, it is federated to the remote source. Applications interact with a single SQL endpoint and do not need to know which datasets are virtualized and which are replicated.

This hybrid approach gives teams the operational simplicity of virtualization -- no pipelines to build for initial access -- with the performance of replication where it matters. A team can start by federating all data sources for immediate unified access, identify the hot datasets through query patterns, and then selectively accelerate those datasets without changing application queries.

The [operational data lakehouse](/use-case/operational-data-lakehouse) and [data lake accelerator](/use-case/datalake-accelerator) use cases demonstrate this pattern in production, where Spice federates across the full data estate and accelerates the datasets that drive latency-sensitive applications and AI workloads.

## FAQ

### Can data virtualization and data replication be used together?

Yes. Most production data architectures use both. Virtualization provides broad, real-time access across all data sources, while replication (or acceleration) delivers fast performance for frequently accessed datasets. Modern platforms like Spice unify both patterns in a single runtime with transparent query routing.

### When should I choose data virtualization over data replication?

Choose virtualization when data freshness is critical, when you need ad-hoc access across many sources, when operational simplicity matters, or when you are prototyping and want to avoid building ETL pipelines. Virtualization is also preferred when storage costs for replicas would be prohibitive.

### When should I choose data replication over data virtualization?

Choose replication when sub-second query latency is required, when workloads are high-throughput and predictable, when offline resilience is needed (queries must succeed even if a source is down), or when the source system cannot handle the additional query load from virtualized access.

### Does data replication always mean stale data?

Not necessarily. Traditional batch ETL introduces hours of staleness, but modern replication using change data capture (CDC) keeps replicas within seconds of the source. CDC-based acceleration combines the performance benefits of replication with near-real-time freshness, effectively eliminating the staleness tradeoff for most workloads.

### How does Spice handle the choice between virtualization and replication?

Spice provides both patterns in a single platform. All connected data sources are queryable immediately via SQL federation (virtualization). Teams can then selectively accelerate specific datasets into local in-memory or on-disk engines (replication) for faster performance. The query router transparently serves each query from the optimal path -- no application changes required.

---

## What is Data Virtualization?

URL: https://spice.ai/learn/data-virtualization
Date: 2026-01-03T00:00:00
Description: Data virtualization provides a unified view of data across multiple sources without physical replication. Learn how it works, how it compares to ETL, and when to use it.

Enterprise data is distributed by nature. Customer records live in PostgreSQL, analytics events in Snowflake, product catalogs in a data lake on S3, and business metrics in a SaaS tool like Salesforce or HubSpot. When an application, dashboard, or AI model needs to combine data from several of these systems, teams traditionally build ETL pipelines to replicate everything into a central warehouse.

Data virtualization eliminates this replication step. A virtualization layer sits between data consumers (applications, BI tools, AI models) and data sources, presenting a single unified interface while the data remains in its original system. Queries are translated and routed to each source at runtime, and results are merged and returned as if they came from a single database.

## How Data Virtualization Works

A data virtualization platform operates in three layers: connectivity, abstraction, and optimization.

### The Connectivity Layer

Connectors maintain live links to each data source. A production virtualization platform supports dozens of connector types: relational databases (PostgreSQL, MySQL, SQL Server), cloud warehouses (Databricks, Snowflake, BigQuery), object stores (Amazon S3, Azure Blob Storage, Google Cloud Storage), streaming systems (Kafka), and SaaS APIs.

Each connector handles authentication, connection pooling, and protocol translation. The application never interacts with source databases directly -- it only communicates with the virtualization layer.

### The Abstraction Layer

The virtualization engine presents a unified schema to consumers. Tables from different sources appear as if they belong to the same database. Applications query a single endpoint using standard SQL, unaware of where or how the underlying data is stored.

This abstraction is powerful because it decouples applications from infrastructure. If the team migrates a dataset from PostgreSQL to Databricks, the application query doesn't change -- only the connector configuration in the virtualization layer needs to be updated.

### The Optimization Layer

Raw virtualization would be slow: every query would require a network round-trip to each source, and all joins and aggregations would happen in the virtualization layer. Production platforms optimize this with several techniques:

- **Predicate pushdown:** Filters are pushed to source databases so only matching rows are transferred
- **Aggregation pushdown:** Operations like `COUNT`, `SUM`, and `AVG` are computed at the source when possible
- **Query parallelization:** Requests to independent sources execute concurrently
- **Local acceleration:** Frequently accessed datasets are cached locally for sub-second performance

The combination of these optimizations means that virtualized queries can approach -- and sometimes match -- the performance of queries against a co-located warehouse.
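As a hedged illustration of what pushdown buys, consider this federated query (the table and column names are hypothetical):

```sql
-- Both the filter and the aggregation below can be pushed down to the
-- PostgreSQL source, so only the small grouped result set -- not millions
-- of raw rows -- crosses the network to the virtualization layer.
SELECT customer_region,
       COUNT(*)   AS order_count,
       SUM(total) AS revenue
FROM pg_orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_region;
```

Without pushdown, the same query would stream every matching row to the virtualization layer before filtering and aggregating locally.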
## Data Virtualization vs. ETL

Data virtualization and ETL solve the same problem, but they approach it from opposite directions. Understanding the tradeoffs helps teams choose the right pattern for each workload.

### Data Movement and Storage

ETL physically copies data from sources into a central warehouse. This means duplicate storage costs, ongoing pipeline maintenance, and the engineering effort to keep copies synchronized. When source schemas change -- a column is added, a data type is modified -- ETL pipelines break and require manual intervention.

Data virtualization queries data in place. There is no duplication, no pipeline to break, and no synchronization to maintain. New data sources become queryable as soon as a connector is configured.

### Data Freshness

This is often the deciding factor. ETL pipelines run on schedules -- hourly, daily, or (at best) every few minutes. The warehouse always contains a stale snapshot. For batch analytics on historical data, this staleness is acceptable. For real-time use cases -- operational dashboards, AI models that need current data, [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation) systems, fraud detection -- staleness is not acceptable.

Data virtualization queries live sources, so results always reflect the current state of each system.

### Performance

Here the tradeoff is more nuanced. ETL pre-computes and co-locates data, so warehouse queries are fast. Virtualization depends on source performance and network latency, which can vary.

The best approach for production workloads is a hybrid: virtualize for real-time access, and accelerate performance-critical datasets with local caching kept fresh by [change data capture (CDC)](/learn/change-data-capture). This gives you the freshness of virtualization with the speed of co-located data.

### When to Use Each

**ETL works well for:**

- Stable, high-volume analytical workloads with known query patterns
- Historical data analysis where freshness doesn't matter
- Compliance archives that require a durable copy of data

**Data virtualization works well for:**

- Real-time operational dashboards and monitoring
- AI and machine learning workloads that need fresh data
- Ad-hoc exploration across multiple sources
- Rapid prototyping where pipeline setup time is prohibitive
- Data mesh architectures where domain teams own their data

## Key Benefits

### Faster Time to Value

New data sources become queryable immediately after connecting -- no schema design, migration scripts, or pipeline orchestration required. Teams go from data source to query results in minutes, not weeks.

### Reduced Infrastructure Costs

Without a centralized warehouse to store duplicate copies, teams save on storage, compute, and the engineering effort to keep pipelines running. For organizations with petabytes of data across dozens of sources, this cost reduction is substantial.

### Simplified Governance

A virtualization layer provides a single point of access control, audit logging, and policy enforcement across all connected sources. Instead of managing permissions on each database individually, security teams define policies once. This is particularly important in regulated industries like [financial services](/industry/financial-services) where data access must be auditable.

### Application Decoupling

Applications query the virtualization layer, not individual databases. This means infrastructure changes -- migrating a database, scaling a warehouse, switching cloud providers -- don't require application changes. The virtualization layer absorbs the complexity.

## Common Use Cases

### Unified Data Access for AI

AI models and [RAG systems](/use-case/retrieval-augmented-generation) require data from multiple operational and analytical sources. Building ETL pipelines for each model is slow and fragile.
A virtualization layer provides a single query interface to all data sources, so AI teams can focus on model quality instead of data plumbing.

### Real-Time Operational Dashboards

Business intelligence dashboards that need current data from transactional databases, CRMs, and event streams can query a virtualization layer directly. The results are always fresh, and adding a new data source to a dashboard takes minutes instead of days.

### Data Mesh and Domain-Oriented Architectures

In a data mesh, each domain team owns its data products. A virtualization layer enables governed, cross-domain queries without centralizing data into a monolithic warehouse. Each team maintains autonomy while the organization gets a unified view.

## Data Virtualization with Spice

[Spice](/platform/sql-federation-acceleration) provides data virtualization through [SQL federation](/learn/sql-federation) with [30+ prebuilt connectors](/integrations) for databases, warehouses, object stores, and streaming systems. Queries are automatically optimized with predicate pushdown and parallelization.

For performance-critical workloads, Spice adds local acceleration -- caching frequently accessed datasets in-memory or on-disk -- with [CDC-based refresh](/feature/real-time-change-data-capture) to keep cached data current. This hybrid approach delivers the freshness of virtualization with sub-second query performance.

## Advanced Topics

### Schema Mapping and Reconciliation

A core challenge in data virtualization is presenting a coherent schema across sources that model the same concepts differently. A customer table in PostgreSQL might use `customer_id` as the primary key, while the same customer data in Salesforce uses `account_id`, and an S3-based data lake stores it as `cust_id` in a Parquet file.

Schema mapping resolves these differences by defining explicit relationships between source-specific schemas and the unified virtual schema. The virtualization layer maintains a mapping catalog that records which virtual column corresponds to which source column, along with any type conversions required. When a query references `customers.id` in the virtual schema, the engine translates it to the correct source-specific column name and type for each underlying system.

Production virtualization platforms support several mapping patterns: direct mapping (one-to-one column correspondence), computed mapping (a virtual column derived from an expression over source columns), and conditional mapping (different sources provide the same virtual column, with a priority order for conflict resolution).

```mermaid
flowchart TB
    subgraph VS["Virtual Schema"]
        V1[customers.id]
        V2[customers.name]
        V3[customers.revenue]
    end
    subgraph PG["PostgreSQL"]
        P1[customer_id]
        P2[full_name]
    end
    subgraph SF["Salesforce"]
        S1[account_id]
        S2[account_name]
        S3[annual_revenue]
    end
    subgraph DL["S3 Data Lake"]
        L1[cust_id]
        L2[name]
        L3[total_rev]
    end
    V1 --- P1
    V1 --- S1
    V1 --- L1
    V2 --- P2
    V2 --- S2
    V2 --- L2
    V3 --- S3
    V3 --- L3
```

### Semantic Layer Design

Beyond raw schema mapping, production virtualization systems benefit from a semantic layer -- a set of business-oriented definitions that sit on top of the virtual schema. The semantic layer defines metrics (e.g., "monthly recurring revenue" = `SUM(amount) WHERE type = 'recurring' AND status = 'active'`), dimensions (e.g., "region" mapped from different geographic fields across sources), and relationships (e.g., customers have many orders, orders belong to one product).
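One common way to pin such a metric definition down is a SQL view over the virtual schema, so every consumer computes it identically. A minimal sketch, assuming a hypothetical `subscriptions` table:

```sql
-- Shared definition of "monthly recurring revenue": defined once,
-- reused by every dashboard, model, and application query.
CREATE VIEW monthly_recurring_revenue AS
SELECT date_trunc('month', billed_at) AS month,
       SUM(amount)                    AS mrr
FROM subscriptions
WHERE type = 'recurring'
  AND status = 'active'
GROUP BY 1;
```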
The semantic layer serves two purposes. First, it provides consistent metric definitions that all consumers -- BI tools, AI models, application queries -- use identically. Without it, different teams may calculate "revenue" differently depending on which source they query. Second, it enables [text-to-SQL](/learn/text-to-sql) systems to generate more accurate queries, because the LLM can reference well-defined business concepts rather than raw column names.

Designing an effective semantic layer requires collaboration between data engineers (who understand the source schemas), domain experts (who define business metrics), and platform teams (who configure the virtualization layer). The key tradeoff is between expressiveness and maintenance cost -- a comprehensive semantic layer improves query accuracy but requires ongoing updates as business logic evolves.

### Write-Back Patterns

Most data virtualization deployments are read-only: applications query the virtual layer, and writes go directly to source systems. However, some use cases require write-back -- the ability to write through the virtualization layer back to a source system.

Write-back adds complexity because the virtualization layer must determine which source to target, validate that the write operation is permitted by governance policies, and handle conflicts when multiple sources contain overlapping data. Common write-back patterns include single-source routing (writes are always directed to one designated source), source-aware routing (the write target is determined by the virtual table's primary source mapping), and two-phase writes (the virtualization layer coordinates writes to multiple sources transactionally).

Write-back is most commonly used for operational applications that need to update records visible across the virtual schema -- for example, updating a customer status that must be reflected in both the transactional database and the CRM. For analytical workloads, write-back is rarely needed because the virtualization layer serves as a read-optimized access point.

## FAQ

### What is the difference between data federation and data virtualization?

Data federation is a specific implementation of data virtualization that uses SQL as the query interface. Data virtualization is the broader concept, encompassing SQL federation, REST API abstraction, GraphQL layers, and other query paradigms. In practice, the terms are often used interchangeably when the interface is SQL.

### Does data virtualization replace a data warehouse?

Not necessarily. Data virtualization complements a warehouse by providing real-time access to operational data that hasn't been loaded yet. Many organizations use both: a warehouse for historical analytics and heavy aggregations, and a virtualization layer for real-time queries, AI workloads, and ad-hoc exploration.

### How does data virtualization handle performance?

Raw virtualization depends on source performance, which can be slow for remote or overloaded systems. Production platforms add query acceleration -- local caching of frequently accessed data -- to deliver sub-second performance. Spice combines virtualization with in-memory and on-disk acceleration for latency-sensitive workloads.

### Is data virtualization secure?

A virtualization layer centralizes access control, providing a single point for authentication, authorization, and audit logging across all connected sources. This simplifies compliance because policies are enforced in one place rather than across each individual data source.

### What data sources can be virtualized?

Most platforms support relational databases (PostgreSQL, MySQL, SQL Server), cloud warehouses (Databricks, Snowflake, BigQuery), object stores (S3, Azure Blob, GCS), streaming systems (Kafka, Kinesis), and SaaS APIs. Spice provides 30+ prebuilt connectors covering the most common enterprise data sources.

---

## What are Embeddings?

URL: https://spice.ai/learn/embeddings
Date: 2026-03-02T00:00:00
Description: Embeddings are dense vector representations of text, images, or code that capture semantic meaning. Learn how embedding models work, what dimensions represent, and how embeddings enable semantic search, RAG, and classification.

Modern AI and search systems work with meaning, not just words. When a user searches for "cancel my subscription," the system should also find documents about "account termination" -- even though the two phrases share no words. Embeddings make this possible by converting text (or images, code, or any data) into numerical vectors where similar meanings are close together in vector space.

An embedding is a list of numbers -- typically 384 to 3072 floating-point values -- that represents the semantic content of a piece of text. These vectors are produced by embedding models trained on large text corpora, where the training objective ensures that semantically similar inputs produce vectors that are geometrically close (as measured by cosine similarity or dot product) in high-dimensional space.

## What Embeddings Represent

Each dimension in an embedding vector captures some aspect of meaning, but unlike hand-crafted features, these dimensions are learned automatically and don't correspond to human-interpretable concepts. The key property is that the geometric relationships between vectors reflect semantic relationships between inputs:

- "king" - "man" + "woman" produces a vector close to "queen"
- "Python" and "JavaScript" are closer together than "Python" and "photosynthesis"
- "How do I reset my password?" and "I forgot my login credentials" produce similar vectors

This property emerges from training. Embedding models learn to compress the statistical patterns of language into dense vectors such that inputs appearing in similar contexts (and thus having similar meanings) end up near each other in vector space.

## How Embedding Models Work

### Transformer Encoders

Most modern embedding models are based on **encoder-only transformers** derived from the BERT architecture. The process works as follows:

1. **Tokenization:** Input text is split into subword tokens using a vocabulary (e.g., WordPiece, BPE). "Kubernetes deployment" might become `["kubernetes", "deploy", "##ment"]`.
2. **Token encoding:** Each token is mapped to a learned vector, and positional encodings are added.
3. **Self-attention layers:** Multiple transformer layers process the token vectors, with each layer's self-attention mechanism allowing every token to attend to every other token. This builds contextual representations -- the vector for "bank" differs depending on whether the surrounding context is about finance or rivers.
4. **Pooling:** The final token representations are combined into a single vector representing the entire input. Common pooling strategies include CLS token pooling (using the special classification token's output), mean pooling (averaging all token vectors), and max pooling.

```mermaid
flowchart LR
    A[Text] --> B[Tokenize]
    B --> C[Transformer Encoder]
    C --> D[Pooling]
    D --> E["Vector [768 dims]"]
```

### Sentence Transformers

Raw BERT embeddings are not optimized for semantic similarity -- a sentence like "A dog sits on a bench" and "A dog is sitting outside" might produce dissimilar vectors despite having similar meanings.
**Sentence transformer** models (like those from the sentence-transformers library) fine-tune BERT-style encoders using contrastive learning objectives. During training, the model is shown pairs of similar and dissimilar sentences. It learns to produce embeddings where similar pairs have high cosine similarity and dissimilar pairs have low cosine similarity.

This training process -- often using triplet loss or contrastive loss -- transforms a general-purpose language model into one that produces semantically meaningful embeddings.

## Embedding Dimensions

Embedding models produce vectors of a fixed size, called the **embedding dimension**. Common dimensions include:

| Model | Dimensions | Context |
| --- | --- | --- |
| OpenAI text-embedding-3-small | 1536 | General purpose |
| OpenAI text-embedding-3-large | 3072 | Higher quality |
| Cohere embed-v3 | 1024 | Multilingual |
| BGE-large-en-v1.5 | 1024 | Open-source |
| E5-large-v2 | 1024 | Open-source |
| all-MiniLM-L6-v2 | 384 | Lightweight |

Higher dimensions generally capture more nuanced meaning but require more storage and compute for similarity calculations. A 768-dimensional embedding for a single text chunk uses 3,072 bytes (768 x 4 bytes per float32). At scale -- millions of documents with multiple chunks each -- embedding storage becomes a significant consideration.

## How Embeddings Enable AI Applications

### Semantic Search

The most direct application of embeddings is [vector search](/learn/vector-search). Documents are embedded at index time, queries are embedded at search time, and the nearest vectors in the index are returned as results. This captures meaning rather than keywords -- "cancel subscription" matches "account termination" because their embedding vectors are close together.

### Retrieval-Augmented Generation (RAG)

[RAG systems](/learn/retrieval-augmented-generation) use embeddings to retrieve relevant context before generating an answer. The query is embedded, similar document chunks are retrieved via vector search, and the retrieved text is passed to a language model as context. Embedding quality directly determines retrieval quality, which in turn determines answer quality.

### Clustering and Classification

Embeddings enable unsupervised clustering (grouping similar documents without labeled data) and few-shot classification (categorizing documents with only a handful of examples per category). Because embeddings capture meaning, a simple k-nearest-neighbors classifier over embedding space can achieve strong results without task-specific model training.

## Embeddings vs. Keyword Matching

The fundamental difference is that keyword matching operates on surface-level token overlap, while embeddings operate on learned semantic representations:

| Aspect | Keyword Matching (BM25) | Embeddings |
| --- | --- | --- |
| Matching basis | Exact token overlap | Semantic similarity |
| "cancel subscription" vs. "account termination" | No match | High similarity |
| Exact identifiers (error codes, IDs) | Strong match | Weak match |
| Computational cost | Low (inverted index lookup) | Higher (vector computation) |
| Storage | Inverted index | Dense vectors (KB per document) |
| Interpretability | High (which terms matched) | Low (opaque vector space) |

In practice, combining both approaches through [hybrid search](/learn/hybrid-search) delivers better results than either alone.
[BM25 full-text search](/learn/bm25-full-text-search) handles exact terms and identifiers, while embeddings capture meaning and handle vocabulary mismatch.

## Choosing Embedding Models

Selecting an embedding model involves tradeoffs between quality, cost, latency, and operational complexity:

**Commercial APIs** (OpenAI, Cohere, Google) offer high-quality embeddings with simple API calls. The tradeoffs are per-token cost, network latency, vendor dependency, and sending data to external services. OpenAI's text-embedding-3 models and Cohere's embed-v3 are strong general-purpose choices.

**Open-source models** (BGE, E5, GTE, all-MiniLM) can run locally, eliminating cost and latency concerns. The MTEB (Massive Text Embedding Benchmark) leaderboard is the standard reference for comparing model quality across tasks. Open-source models have closed much of the quality gap with commercial options, especially for English-language tasks.

Key factors to evaluate:

- **Quality on your domain:** General benchmarks don't always predict performance on domain-specific data. Test with your actual queries and documents.
- **Latency requirements:** Smaller models (384 dimensions) embed text faster than larger ones (3072 dimensions), which matters for real-time search.
- **Multilingual support:** If your data spans languages, choose a model trained on multilingual data (e.g., Cohere embed-v3, multilingual-e5-large).
- **Max token length:** Models have a maximum context window (typically 512 tokens). Longer documents must be chunked before embedding.

## Embeddings with Spice

[Spice](/platform/hybrid-sql-search) integrates embedding generation alongside [SQL queries](/learn/sql-federation) and [hybrid search](/learn/hybrid-search) in a single runtime. Rather than managing separate embedding services and vector databases, Spice handles the entire pipeline -- generating embeddings, storing vectors, and executing similarity search -- within the same system that handles your SQL queries.

This means you can:

- Generate embeddings using configured [LLM models](/learn/llm-inference) alongside your data queries, without external API orchestration
- Store and index embeddings alongside your relational data from [30+ connected sources](/integrations)
- Combine embedding-based [vector search](/learn/vector-search) with [BM25 full-text search](/learn/bm25-full-text-search) in [hybrid search](/learn/hybrid-search) queries
- Keep embeddings fresh as source data changes through [real-time CDC](/learn/change-data-capture)

```sql
-- Generate embeddings and search in Spice
SELECT * FROM search(
    'knowledge_base',
    'how to reset API credentials',
    mode => 'hybrid',
    limit => 10
)
```

The unified approach eliminates the typical architecture where embeddings are generated by one service, stored in a vector database, and queried separately from your relational data. Instead, embedding generation, storage, and search are co-located with your [SQL federation](/learn/sql-federation) layer.

## Advanced Topics

### Fine-Tuning Embeddings

General-purpose embedding models may not perform optimally on domain-specific data. **Fine-tuning** adapts a pre-trained embedding model to your domain using labeled pairs (similar and dissimilar examples from your data). The most effective approach is **contrastive fine-tuning**: given anchor-positive-negative triplets, the model learns to pull positive pairs closer and push negative pairs apart in embedding space.
Even small fine-tuning datasets (a few thousand pairs) can significantly improve retrieval quality on domain-specific queries. Libraries like sentence-transformers provide straightforward fine-tuning APIs.

The risk is overfitting -- a model fine-tuned too aggressively on a narrow domain may lose its general-purpose capabilities. Techniques like multi-task training (mixing domain-specific and general pairs) mitigate this.

### Matryoshka Representations

Standard embeddings use a fixed dimension, but not all tasks require the full vector. **Matryoshka Representation Learning (MRL)** trains embedding models such that any prefix of the full embedding is itself a valid, lower-dimensional embedding. For example, a 768-dimensional Matryoshka embedding can be truncated to 256 or 128 dimensions with graceful quality degradation.

This enables adaptive precision -- use the full embedding for high-quality search and a truncated version for fast approximate filtering. OpenAI's text-embedding-3 models support this: you can request a 256-dimensional embedding from a model that natively produces 3072 dimensions.

### Quantized Embeddings

Storing millions of float32 embedding vectors requires substantial memory. **Quantization** reduces storage and compute costs by representing each dimension with fewer bits:

- **float32** (default): 4 bytes per dimension, full precision
- **float16**: 2 bytes per dimension, minimal quality loss
- **int8**: 1 byte per dimension, noticeable but often acceptable quality loss
- **binary**: 1 bit per dimension, significant quality loss but 32x compression

Binary quantization is particularly effective as a first-pass filter -- use binary similarity to quickly identify candidates, then re-rank using full-precision embeddings.

### Chunking Strategies for Long Documents

Embedding models have a maximum context window (typically 512 tokens), so longer documents must be split into chunks before embedding. Chunking strategy significantly affects retrieval quality:

- **Fixed-size chunks** (e.g., 256 tokens with 50-token overlap): Simple and predictable, but may split sentences or paragraphs mid-thought.
- **Semantic chunking:** Split at paragraph or section boundaries to preserve coherent units of meaning. More complex to implement but produces higher-quality chunks.
- **Recursive chunking:** Start with large chunks and recursively split oversized chunks at the most natural boundary (paragraphs, then sentences, then words).

The overlap between adjacent chunks ensures that information at chunk boundaries is not lost. Typical overlap is 10-20% of the chunk size.
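To make the storage and quantization tradeoffs above concrete, here is a back-of-envelope calculation expressed as plain SQL arithmetic (the corpus size and dimension count are illustrative):

```sql
-- Storage for 10 million chunks of 768-dimensional embeddings
-- at different quantization levels (bytes per dimension: 4, 2, 1, 1/8).
SELECT 10000000.0 * 768 * 4   / 1e9 AS float32_gb,  -- ~30.7 GB
       10000000.0 * 768 * 2   / 1e9 AS float16_gb,  -- ~15.4 GB
       10000000.0 * 768 * 1   / 1e9 AS int8_gb,     -- ~7.7 GB
       10000000.0 * 768 / 8.0 / 1e9 AS binary_gb;   -- ~0.96 GB
```

The 32x gap between float32 and binary is why binary quantization is attractive as a first-pass filter despite its quality loss.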

## FAQ

### What is the difference between a vector and an embedding?

A vector is a mathematical object -- a list of numbers. An embedding is a specific kind of vector produced by a trained model to represent the semantic meaning of an input. All embeddings are vectors, but not all vectors are embeddings. The term "embedding" implies that the vector was learned to capture meaning in a way that preserves semantic relationships.

### How many dimensions should my embeddings have?

It depends on your quality and performance requirements. Models with 384 dimensions (like all-MiniLM-L6-v2) are fast and memory-efficient, suitable for lightweight applications. Models with 768-1024 dimensions (like BGE-large or E5-large) offer a good balance of quality and efficiency. Models with 1536-3072 dimensions (like OpenAI text-embedding-3) provide the highest quality but require more storage and compute. Start with a mid-range model and evaluate whether higher dimensions improve results on your specific data.

### Do I need to re-embed all my documents if I change embedding models?

Yes. Embeddings from different models exist in different vector spaces and are not comparable. If you switch from one embedding model to another, all documents must be re-embedded with the new model. This is one reason why embedding model selection is an important upfront decision -- migrating between models at scale can be time-consuming and costly.

### How do embeddings handle multiple languages?

Multilingual embedding models (like Cohere embed-v3 or multilingual-e5-large) are trained on text from many languages simultaneously. These models map semantically similar content to nearby vectors regardless of language, so a query in English can match a document in French if the content is semantically similar. The quality of cross-lingual matching varies by language pair and model.

### What is the relationship between embeddings and RAG?

Embeddings are the foundation of the retrieval step in retrieval-augmented generation (RAG). Documents are embedded and stored in a vector index. When a user asks a question, the query is embedded and the most similar document chunks are retrieved. These chunks are then passed as context to a language model, which generates an answer grounded in the retrieved information. Better embeddings lead to better retrieval, which leads to more accurate and relevant generated answers.

---

## Full-Text Search vs Vector Search: How to Choose

URL: https://spice.ai/learn/full-text-search-vs-vector-search
Date: 2026-01-25T00:00:00
Description: Full-text search matches exact keywords using BM25 scoring, while vector search finds semantically similar content using embeddings. Learn the key differences, when to use each approach, and how hybrid search combines both for optimal results.

Full-text search and vector search solve the same fundamental problem -- finding relevant information in a collection of documents -- but they approach it from opposite directions. Full-text search looks for documents that contain the query's exact terms. Vector search looks for documents whose meaning is closest to the query's meaning, regardless of which words are used.

Neither approach is universally better. Each has strengths that correspond to the other's weaknesses, and understanding these tradeoffs is essential for choosing the right search strategy for your application. In many production systems, the answer is to use both through [hybrid search](/platform/hybrid-sql-search).

## How Full-Text Search Works

[Full-text search](/learn/bm25-full-text-search) uses an inverted index -- a data structure that maps every unique term to the list of documents containing it. When a query arrives, the system looks up each query term in the inverted index, finds the documents that match, and scores them using a ranking function.

The standard ranking function is **BM25** (Best Match 25), which scores documents based on three signals:

1. **Term frequency:** How often the query term appears in the document (with diminishing returns for repeated occurrences)
2. **Inverse document frequency:** How rare the term is across the entire corpus (rare terms are more informative)
3. **Document length normalization:** Shorter documents with the same term count are typically more focused

Full-text search is fast, predictable, and interpretable. You can explain why a document ranked highly -- it contained specific terms at specific frequencies. The inverted index enables sub-millisecond lookups even across millions of documents.
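For reference, the standard BM25 formulation combines these three signals as follows, where f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length, avgdl is the average document length in the corpus, and k_1 and b are tuning parameters (commonly k_1 ≈ 1.2 and b ≈ 0.75):

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

The saturation term in the denominator is what produces the diminishing returns for repeated occurrences, and the |D|/avgdl factor implements the length normalization.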
## How Vector Search Works

[Vector search](/learn/vector-search) converts both documents and queries into numerical vectors called [embeddings](/learn/embeddings) -- lists of floating-point numbers that encode semantic meaning. These embeddings are generated by machine learning models trained on large text corpora, where the model learns to place semantically similar text close together in high-dimensional space.

At query time, the search query is embedded into a vector using the same model, and the system finds the stored vectors closest to the query vector using a distance metric like cosine similarity. The result is a ranked list of documents ordered by semantic similarity to the query.

Vector search captures meaning rather than keywords. "How do I fix a slow API?" matches documents about "endpoint latency optimization" because both concepts map to nearby regions in the embedding space -- even though they share no words.

## Key Differences at a Glance

| Aspect | Full-Text Search (BM25) | Vector Search |
| --- | --- | --- |
| **What it matches** | Exact terms and their variants | Semantic meaning |
| **Index type** | Inverted index (posting lists) | Vector index (HNSW, IVF) |
| **Query "cancel subscription"** | Matches docs containing "cancel" and "subscription" | Matches docs about account termination, ending service, etc. |
| **Exact identifiers** (error codes, product SKUs) | Precise match | Weak -- may return generic related content |
| **Synonym handling** | None without manual expansion | Automatic -- learned from training data |
| **Scoring transparency** | High -- term weights are interpretable | Low -- similarity scores are opaque |
| **Storage per document** | Posting list entries (compact) | Dense vector, typically 1-12 KB depending on dimensions |
| **Index build cost** | Low (tokenize and insert into posting lists) | Higher (generate embeddings via ML model, build ANN index) |
| **Query latency** | Sub-millisecond | Sub-millisecond to low milliseconds (ANN lookup) |
| **Cold start** | Works immediately with any text | Requires an embedding model and vector index |

## Where Full-Text Search Excels

Full-text search is the stronger choice when:

- **Queries contain exact identifiers.** Product names, error codes, model numbers, API endpoints, and other precise identifiers need exact matching. Searching for "ERR-4502" should return documents about that specific error, not documents about errors in general.
- **Domain-specific terminology matters.** In legal, medical, or scientific contexts, precise terminology carries specific meaning. "Negligence" and "carelessness" are not interchangeable in a legal search.
- **Users expect keyword behavior.** When users put terms in quotes or use Boolean operators (AND, OR, NOT), they expect keyword-level precision.
- **Interpretability is required.** Full-text search can highlight exactly which terms matched and why a document scored highly. This is valuable for debugging search quality and for user-facing search interfaces that show match highlights.
- **Infrastructure simplicity is a priority.** Full-text search requires no ML models, no embedding generation pipeline, and no GPU infrastructure. An inverted index is fast to build and cheap to maintain.

## Where Vector Search Excels

Vector search is the stronger choice when:

- **Vocabulary mismatch is the primary challenge.** Users describe problems in their own words, which rarely match the terminology in your documentation. "My app is crashing on startup" should find documents about "application initialization failures."
- **Natural language questions drive search.** Conversational queries like "how do I speed up my database queries?" express intent that keyword matching cannot capture.
- **Cross-language or multi-modal search is needed.** Multilingual embedding models can match queries in one language to documents in another. Multi-modal models can match text queries to images or code.
- **Search powers an AI pipeline.** In [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation) and [AI agent](/use-case/secure-ai-agents) workflows, semantic retrieval finds the conceptually relevant context that the LLM needs to generate accurate answers.
- **Content is unstructured and varied.** Knowledge bases, support tickets, internal wikis, and Slack archives contain diverse language that benefits from semantic understanding over exact term matching.

## When to Use Each: A Decision Framework

The right search approach depends on your query patterns, data characteristics, and application requirements.
**Start with full-text search if:**

- Your data has structured identifiers that users search for directly
- Query patterns are predictable and keyword-oriented
- You need a simple, low-maintenance search solution
- Match transparency is a requirement

**Start with vector search if:**

- Users ask natural language questions
- Vocabulary mismatch between queries and documents is common
- You are building RAG or AI-powered features
- Your content spans diverse topics and terminology

**Use hybrid search if:**

- Your queries include a mix of exact lookups and conceptual questions
- You cannot predict whether a given query will be keyword-oriented or semantic
- Retrieval accuracy is mission-critical (as in RAG, [application search](/use-case/application-search), or enterprise knowledge bases)
- You want the highest overall retrieval quality without compromising on either precision or recall

In practice, most production [search applications](/use-case/application-search) benefit from hybrid search because real-world query traffic is a mix of all these patterns. A user might search for "ERR-4502 connection timeout" -- where the error code needs exact matching and "connection timeout" benefits from semantic understanding.

## How Hybrid Search Combines Both

[Hybrid search](/learn/hybrid-search) runs full-text search and vector search in parallel against the same query, then merges the results using a fusion algorithm. The most common fusion method is **Reciprocal Rank Fusion (RRF)**, which scores each document based on its rank position in each result set rather than its raw score.

The process works in three steps:

1. **Parallel retrieval:** The query is simultaneously processed by the BM25 inverted index and the vector index, producing two independent ranked result sets
2. **Score normalization:** Because BM25 scores and cosine similarity scores are on different scales, they must be normalized before combining
3. **Rank fusion:** RRF assigns each document a score of `1 / (k + rank)` for each result set it appears in, sums these scores, and sorts by the combined score

Documents that rank highly in both result sets receive the highest combined scores. Documents that rank highly in only one set still appear in the final results, but lower in the ranking.

```sql
-- Hybrid search combining BM25 and vector search in Spice
SELECT * FROM search(
    'knowledge_base',
    'how to handle connection timeout errors',
    mode => 'hybrid',
    limit => 10
)
```

Hybrid search adds minimal latency over either method alone because the two searches execute concurrently. The fusion step is a lightweight rank-based operation that typically adds only a few milliseconds.
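For intuition, here is a self-contained sketch of Reciprocal Rank Fusion in plain SQL, over two hypothetical ranked result sets (`fts_results` and `vector_results`, each holding `doc_id` and `rank`) and the conventional constant k = 60:

```sql
-- Reciprocal Rank Fusion over two hypothetical ranked result sets.
-- Each input table holds (doc_id, rank), with rank starting at 1.
WITH unioned AS (
    SELECT doc_id, rank FROM fts_results
    UNION ALL
    SELECT doc_id, rank FROM vector_results
)
SELECT doc_id,
       SUM(1.0 / (60 + rank)) AS rrf_score   -- 1 / (k + rank), k = 60
FROM unioned
GROUP BY doc_id
ORDER BY rrf_score DESC
LIMIT 10;
```

A document ranked 1st in both lists scores 2/61 ≈ 0.033, while a document ranked 1st in only one list scores 1/61 ≈ 0.016 -- which is why agreement between the two methods dominates the fused ranking.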
## Advanced Topics

### Embedding Model Selection and Its Impact on Search Quality

The quality of vector search depends heavily on the embedding model. General-purpose models like OpenAI's `text-embedding-3-large` or open-source models like `bge-large-en-v1.5` work well across many domains, but domain-specific fine-tuning can significantly improve results.

Key considerations for embedding model selection:

- **Dimensionality:** Higher dimensions (1024-3072) capture more nuance but require more storage and compute. Lower dimensions (384-768) are faster and cheaper but may lose fine-grained distinctions.
- **Training data:** Models trained on code perform better for code search. Models trained on scientific papers perform better for research retrieval. General models are a reasonable default.
- **Asymmetric vs. symmetric:** Some models are trained for asymmetric search (short query vs. long document), while others are trained for symmetric similarity (similar-length passages). Choose based on your use case.

When vector search underperforms, the embedding model is often the bottleneck. Before adding complexity (re-ranking, query expansion), evaluate whether a better-suited embedding model improves baseline results.

### Query Expansion and Reformulation

Full-text search can be improved without switching to vector search through **query expansion** -- automatically adding related terms to the original query. Techniques include:

- **Synonym expansion:** Augmenting "car insurance" with "automobile insurance" and "vehicle coverage" using a synonym dictionary or thesaurus
- **Pseudo-relevance feedback:** Running the initial query, extracting frequent terms from the top results, and re-running the query with those terms added
- **LLM-based reformulation:** Using a language model to generate alternative phrasings of the query, then running all variations and merging results

Query expansion narrows the gap between full-text and vector search by addressing vocabulary mismatch at the query level rather than the index level. However, it increases query latency (multiple queries per search) and can introduce noise if expanded terms are imprecise.

### Evaluation Metrics for Comparing Search Methods

Objectively comparing full-text and vector search requires standardized evaluation metrics:

- **Recall@k:** The fraction of relevant documents that appear in the top-k results. High recall means the system finds most relevant documents. This is critical for RAG, where missing a relevant document means the LLM lacks context.
- **Precision@k:** The fraction of top-k results that are actually relevant. High precision means fewer irrelevant results clutter the output.
- **NDCG (Normalized Discounted Cumulative Gain):** Measures ranking quality -- not just whether relevant documents appear, but whether they appear near the top. NDCG penalizes relevant documents that rank lower more heavily.
- **MRR (Mean Reciprocal Rank):** The average of 1/rank for the first relevant result across a set of queries. Useful when users care most about the single best result.

When evaluating hybrid search against individual methods, measure all four metrics across a representative query set. Hybrid search typically improves recall@k significantly (by capturing both keyword and semantic matches) while maintaining or improving precision and NDCG.
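As a sketch of how two of these metrics can be computed directly in SQL, assume a hypothetical `results` table of ranked system output and a `relevance` table of human-judged relevant pairs:

```sql
-- MRR and Recall@10 over hypothetical evaluation tables:
--   results(query_id, doc_id, rank)   -- ranked output of the search system
--   relevance(query_id, doc_id)       -- judged-relevant pairs
WITH per_query AS (
    SELECT rel.query_id,
           MIN(res.rank)                                 AS first_relevant_rank,
           COUNT(res.rank) FILTER (WHERE res.rank <= 10) AS relevant_in_top10,
           COUNT(*)                                      AS relevant_total
    FROM relevance rel
    LEFT JOIN results res
      ON res.query_id = rel.query_id AND res.doc_id = rel.doc_id
    GROUP BY rel.query_id
)
SELECT AVG(relevant_in_top10 * 1.0 / relevant_total) AS recall_at_10,
       AVG(COALESCE(1.0 / first_relevant_rank, 0))   AS mrr  -- 0 if nothing relevant retrieved
FROM per_query;
```

The `FILTER` clause is PostgreSQL/DuckDB syntax; the same logic can be written with `CASE` expressions in other dialects.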
## How Spice Combines Both

[Spice](/platform/hybrid-sql-search) provides full-text search, vector search, and hybrid search in a single SQL-native runtime -- eliminating the need to deploy and synchronize separate search systems. With Spice, you can:

- **Run BM25 and vector search in one query** using a single `search()` function with mode selection (`fts`, `vector`, or `hybrid`)
- **Combine search with SQL** to filter results by metadata, join with relational data, and express complex retrieval logic -- all in standard SQL
- **Keep indexes fresh** with [real-time change data capture](/learn/change-data-capture) that updates both full-text and vector indexes as source data changes
- **Search across federated sources** using [SQL federation](/learn/sql-federation) to query data from [30+ connected sources](/integrations) without moving it into a separate search system
- **Generate embeddings in the same runtime** using built-in [LLM inference](/learn/llm-inference), so embedding generation and search happen without external API calls

This unified approach is particularly valuable for [application search](/use-case/application-search) and RAG use cases where teams would otherwise need to maintain a vector database, a search engine, and an application layer to combine their results. Spice handles all three in a single system, reducing infrastructure complexity while delivering hybrid search quality.

```sql
-- Full-text, vector, and hybrid search in one runtime
SELECT * FROM search('docs', 'connection timeout error', mode => 'fts', limit => 10);
SELECT * FROM search('docs', 'connection timeout error', mode => 'vector', limit => 10);
SELECT * FROM search('docs', 'connection timeout error', mode => 'hybrid', limit => 10);
```

## FAQ

### Is vector search always better than full-text search?

No. Vector search excels at handling vocabulary mismatch and understanding query intent, but full-text search is more precise for exact identifiers, error codes, product names, and domain-specific terminology. Neither method is universally superior -- the best choice depends on your query patterns and data characteristics.

### What is the main advantage of hybrid search over using one method alone?

Hybrid search captures both exact keyword matches and semantic similarity in a single query. This means it handles mixed queries -- like "ERR-4502 connection timeout" where part needs exact matching and part benefits from semantic understanding -- without requiring the application to decide which search method to use per query.

### Does vector search require a GPU?

Generating embeddings (at index time and query time) benefits from GPU acceleration, especially for large batches. However, the vector search itself -- the ANN index lookup -- runs on CPU and is fast without a GPU. Many production systems generate embeddings via API calls to hosted models and run vector search on CPU-only infrastructure.

### How much additional infrastructure does vector search require compared to full-text search?

Vector search requires an embedding model (hosted or self-managed) to generate vectors, a vector index (HNSW or IVF) that uses more memory than an inverted index, and a pipeline to embed new documents as they arrive. In a unified runtime like Spice, these components are integrated, reducing the infrastructure overhead to a single system.

### Can I migrate from full-text search to hybrid search incrementally?

Yes. A common migration path is to start with full-text search, add vector search alongside it, and use hybrid fusion to combine results. Because hybrid search runs both methods in parallel, you can deploy it without removing your existing full-text search infrastructure. Tune the fusion weights to control how much influence each method has on the final ranking.

---

## What is a Hybrid Data Architecture?

URL: https://spice.ai/learn/hybrid-data-architecture
Date: 2026-03-08T00:00:00
Description: A hybrid data architecture combines application sidecars for sub-millisecond reads with a centralized cluster for data ingestion, acceleration, and distributed compute. Learn how the sidecar-cluster pattern works and when to use it.

Running a data layer as a sidecar alongside your application gives you the lowest possible read latency -- queries travel over loopback, not the network. But a pure sidecar model breaks down when you need centralized data ingestion, distributed queries across large datasets, or coordination of [acceleration](/learn/data-acceleration) refreshes. Every sidecar must independently connect to upstream sources, manage its own refresh cycles, and handle ingestion overhead.

Running everything in a centralized cluster solves the coordination problem. A single cluster can ingest data once, manage refresh schedules, and serve distributed analytical queries. But now every read from an application pod must cross the network to the cluster, adding milliseconds of latency to every query -- unacceptable for hot-path workloads.

A hybrid data architecture combines both patterns. Application sidecars cache frequently accessed datasets locally for sub-millisecond reads. A centralized cluster handles the heavy work: data ingestion from upstream sources, [acceleration](/learn/data-acceleration) and refresh, distributed compute for large queries, and [hybrid search](/learn/hybrid-search) indexing. When a sidecar receives a query for data beyond its cached working set, it transparently delegates to the cluster. The application never needs to know which tier served the response.

This is the most common production topology for latency-sensitive, data-intensive applications. It separates the concerns of fast reads (sidecars) from data management (cluster) while keeping both accessible through a unified query interface.

## How Hybrid Data Architecture Works

The hybrid architecture is a two-tier model. The first tier consists of lightweight sidecars deployed as containers alongside application pods -- typically in Kubernetes. The second tier is a centralized cluster that manages data pipelines, acceleration, and distributed query execution.

### The Two-Tier Model

**Sidecars** run as pod-level containers in Kubernetes, co-located with the application. Each sidecar maintains a subset of accelerated datasets in local memory or on local disk. Because the sidecar runs on the same node as the application (or even in the same pod), queries travel over loopback -- the network hop is eliminated entirely. Sidecars start in seconds and scale horizontally with application pods.

Each sidecar is configured via a `spicepod.yaml` that declares which datasets to cache, which acceleration engines to use (Arrow for in-memory, DuckDB for on-disk), and which views, search indices, or AI models to load locally. The sidecar handles only caching and query serving -- it does not run data ingestion or refresh pipelines.
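For illustration only, a sidecar's `spicepod.yaml` along these lines declares its hot working set; the dataset name, the upstream reference, and the exact parameter spellings here are illustrative rather than canonical:

```yaml
version: v1
kind: Spicepod
name: app-sidecar
datasets:
  # Hot working set, cached in-memory beside the application.
  # The `from` reference points at the central cluster (illustrative).
  - from: spice.ai:orders
    name: orders
    acceleration:
      enabled: true
      engine: arrow                # in-memory Apache Arrow engine
      refresh_check_interval: 30s  # bounds staleness relative to the cluster
```

Datasets not declared for local acceleration are simply delegated to the cluster at query time.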
The cluster is the authoritative data layer. It connects to data warehouses, transactional databases, object stores, and streaming platforms, then accelerates and serves that data to sidecars on demand.

### Query Routing and Transparent Delegation

When a query arrives at a sidecar, the sidecar checks whether the requested data is available in its local acceleration cache. If the data is cached locally, the query is served directly -- sub-millisecond, zero network hops.

If the data is not in the sidecar's local cache -- because the dataset isn't configured for local acceleration, or because the query requires data the sidecar doesn't hold -- the sidecar transparently delegates the query to the cluster over Arrow Flight (gRPC). The cluster executes the query against its own accelerated datasets or federates it to the upstream source, then streams the results back to the sidecar. The application receives the response through the same interface, unaware of which tier served it.

This transparent delegation is what makes the pattern practical. Application code does not need conditional logic to decide where to route queries. The sidecar handles routing automatically based on its local cache state.

### Cache Management and Invalidation

Sidecars do not manage their own data ingestion. Instead, the cluster handles all ingestion and refresh, then sidecars either pull updated data from the cluster on a configured schedule or receive push-based updates. When the cluster refreshes a dataset -- whether via CDC from a PostgreSQL WAL, scheduled polling from an S3 bucket, or streaming ingestion from Kafka -- the updated data becomes available to sidecars on their next refresh cycle. The sidecar's refresh interval determines the maximum staleness of its local cache relative to the cluster.

For datasets where even seconds of staleness are unacceptable, sidecars can be configured to delegate all queries for that dataset to the cluster, using local caching only for datasets with relaxed freshness requirements. This per-dataset configuration gives operators fine-grained control over the latency-vs-freshness tradeoff.

```mermaid
flowchart TB
    subgraph K8s["Kubernetes Cluster"]
        subgraph Pod1["Application Pod 1"]
            App1[Application] <-->|loopback| SC1[Sidecar]
        end
        subgraph Pod2["Application Pod 2"]
            App2[Application] <-->|loopback| SC2[Sidecar]
        end
        subgraph Pod3["Application Pod 3"]
            App3[Application] <-->|loopback| SC3[Sidecar]
        end
        subgraph ClusterTier["Centralized Cluster"]
            CL[Cluster Node]
        end
    end
    SC1 <-->|Arrow Flight gRPC| CL
    SC2 <-->|Arrow Flight gRPC| CL
    SC3 <-->|Arrow Flight gRPC| CL
    CL <-->|Connectors| PG[(PostgreSQL)]
    CL <-->|Connectors| S3[(S3 / Data Lake)]
    CL <-->|Connectors| DW[(Data Warehouse)]
```

## The CDN Analogy

The hybrid sidecar-cluster pattern mirrors how content delivery networks (CDNs) work. In a CDN, an origin server holds the canonical content. Edge nodes cache popular content close to end users. When a user requests content that the edge node has cached, it serves it immediately -- low latency, no origin round-trip. When the edge node doesn't have the content, it fetches from the origin, caches it, and serves the response.

In the hybrid data architecture, the cluster is the origin server. It holds the full set of accelerated datasets, manages ingestion pipelines, and handles distributed queries. The sidecars are the edge nodes.
They cache the hot working set -- the datasets and rows that the co-located application queries most frequently -- and serve them with sub-millisecond latency over loopback. When a sidecar receives a query for data it doesn't have, it fetches from the cluster (the origin), just as a CDN edge fetches from the origin. The network cost of this delegation is higher than a local cache hit, but far lower than querying the upstream data source directly, because the cluster already has the data accelerated and ready to serve. The analogy extends to scaling. CDNs add edge nodes without increasing load on the origin -- each node serves cached content independently. Similarly, adding sidecars does not increase load on upstream data sources. The cluster absorbs ingestion, and sidecars serve reads from their local caches. Ten sidecars or a thousand sidecars place the same load on PostgreSQL, S3, or Databricks -- only the cluster connects to those sources. ## Benefits of the Hybrid Pattern ### Sub-Millisecond Reads Because sidecars run alongside the application -- on loopback in Kubernetes -- queries that hit the local cache avoid all network overhead. In-memory acceleration with Apache Arrow delivers sub-millisecond reads for cached datasets. This is critical for hot-path workloads: API serving, real-time dashboards, AI inference pipelines, and [retrieval-augmented generation](/learn/retrieval-augmented-generation) that need embedding lookups in single-digit milliseconds. ### Centralized Data Management Data ingestion, acceleration refresh, and pipeline orchestration happen once in the cluster, not redundantly in every sidecar. This means upstream data sources see a single connection (from the cluster), not hundreds of connections from individual sidecars. It also simplifies operational management -- refresh schedules, CDC pipelines, and schema evolution are configured and monitored in one place. ### Horizontal Scalability Sidecars scale with application pods. When Kubernetes scales an application from 3 replicas to 30, each new pod gets its own sidecar that caches the configured datasets. The cluster's load does not increase proportionally because sidecars serve most reads from their local cache. Only cache misses and refresh cycles generate cluster traffic. This scaling model is particularly valuable for multi-tenant SaaS applications and microservice architectures where dozens or hundreds of application instances need fast access to the same datasets. ### Resilience Sidecars serve cached data even if the cluster is temporarily unavailable. If the network between a sidecar and the cluster goes down, the sidecar continues serving queries from its local cache. Queries that require delegation will fail, but cached workloads remain unaffected. When the cluster recovers, sidecars resume normal operation -- fetching updates and delegating cache misses. This resilience model is similar to how CDN edge nodes continue serving cached content during origin outages. The sidecar's local cache acts as a buffer against cluster-level disruptions. ## When to Use Hybrid Architecture The hybrid pattern is not universally optimal. It adds architectural complexity -- two tiers to deploy, configure, and monitor. The following scenarios justify that complexity. ### Real-Time + Analytical Workloads Applications that need both sub-millisecond reads for real-time serving and distributed analytical queries across large datasets benefit from the two-tier split. 
Sidecars handle the real-time reads; the cluster handles the analytical workload. This separation prevents heavy analytical queries from competing with latency-sensitive application queries for the same resources. For example, an [operational data lakehouse](/use-case/operational-data-lakehouse) that serves both live dashboards and batch reports can use sidecars for the dashboard queries and the cluster for the batch analytics -- all through a unified SQL interface. ### Multi-Instance and Multi-Tenant Applications When multiple application instances (microservices, API replicas, tenant-specific deployments) need fast access to the same datasets, the hybrid pattern avoids each instance independently connecting to and querying upstream sources. The cluster ingests once, and sidecars distribute the cached data across all instances. A multi-tenant SaaS platform that runs isolated pods per tenant can deploy a sidecar in each tenant pod. Each sidecar caches the datasets relevant to that tenant, while the cluster manages the full dataset across all tenants. The tenant's queries are fast (local sidecar), and the platform's upstream sources see only the cluster's connections. ### Reducing Upstream Source Load If the priority is reducing load on upstream data sources -- a production PostgreSQL database, a rate-limited SaaS API, or a cost-per-query data warehouse -- the hybrid pattern centralizes all source access in the cluster. Sidecars never connect to upstream sources directly. This is the same principle behind CDN origin shielding: the edge never reaches the origin except through a controlled, cacheable path. ### Edge Computing and Distributed Deployments Applications running across multiple regions or at the edge benefit from the hybrid pattern when a central cluster can be deployed in a primary region and sidecars deployed alongside applications in satellite regions. Sidecars cache the working set locally, absorbing most read traffic without cross-region network hops. Delegation to the cluster handles the long tail of queries that miss the local cache. ### When Hybrid Architecture Is Not Ideal **Simple single-instance applications** that don't need horizontal scaling gain little from the two-tier model. A single sidecar or embedded deployment is simpler and sufficient. **Pure batch workloads** with relaxed latency requirements (seconds to minutes are acceptable) can run directly against the cluster or the upstream source without needing the sidecar tier. **Unreliable networks between sidecars and cluster** undermine the delegation model. If the sidecar-to-cluster connection is intermittent, queries that miss the local cache will fail unpredictably. In these scenarios, deploying full, self-contained instances (each with its own ingestion) may be more reliable. ## Advanced Topics ### Cache Coherency Strategies In a distributed caching architecture, coherency -- ensuring all sidecars have a consistent view of the data -- is a design decision, not a guarantee. The hybrid pattern offers several strategies depending on the application's tolerance for staleness. **Pull-based refresh** is the simplest model. Each sidecar periodically pulls the latest data from the cluster on a configured interval (e.g., every 10 seconds). This introduces a staleness window equal to the refresh interval, but it is predictable and easy to reason about. Most production deployments use this model for datasets where seconds of staleness are acceptable. 
**Push-based invalidation** reduces staleness by having the cluster notify sidecars when data changes. When the cluster completes a refresh cycle (e.g., a CDC update), it pushes an invalidation signal to all connected sidecars. Sidecars then pull the updated data immediately rather than waiting for the next scheduled refresh. This reduces worst-case staleness from the full refresh interval to the time it takes for the invalidation-plus-pull cycle. **Delegate-on-write** avoids coherency issues entirely for specific datasets by never caching them in the sidecar. All queries for those datasets are delegated to the cluster, which always has the latest data. This sacrifices sidecar-level latency for guaranteed freshness, and is appropriate for datasets where even seconds of staleness are unacceptable. In practice, production deployments use a mix of these strategies, configured per dataset based on freshness requirements. Hot, frequently read datasets with relaxed freshness use pull-based refresh. Datasets requiring near-real-time freshness use push-based invalidation. Datasets requiring absolute freshness use delegation. ### Multi-Region Deployments The hybrid pattern extends naturally to multi-region architectures. A primary cluster runs in one region, handling ingestion, refresh, and serving as the authoritative data layer. Sidecars in other regions cache the working set locally, serving reads without cross-region latency. For multi-region setups with stricter latency requirements, a secondary cluster can be deployed in each region, replicating data from the primary cluster. Sidecars in each region connect to their regional cluster rather than the primary, reducing delegation latency. This mirrors the CDN pattern of regional origin servers behind a global origin. The key consideration in multi-region deployments is conflict resolution for writes. If applications in multiple regions write to the same datasets, the architecture must define how those writes are reconciled. The hybrid pattern is primarily a read-optimized architecture -- writes flow through the application's transactional database, and the cluster ingests those changes via CDC or scheduled refresh. ### Sidecar Resource Budgeting Each sidecar consumes CPU and memory on the application pod's node. In Kubernetes, this means setting resource requests and limits in the sidecar container spec that reflect the sidecar's working set. The primary resource dimension is memory. A sidecar using Arrow in-memory acceleration requires enough RAM to hold all configured datasets. A sidecar accelerating 500 MB of datasets with Arrow needs approximately 500 MB of memory (plus overhead for query execution buffers). Sidecars using DuckDB for on-disk acceleration require less memory but need local disk space. CPU requirements are typically modest -- sidecars serve cached data using Arrow's zero-copy reads, which require minimal CPU. CPU spikes occur during refresh cycles (loading updated data from the cluster) and during complex queries that involve local computation (joins, aggregations). A practical budgeting approach is to start with memory equal to 1.5x the total dataset size (to account for query buffers and refresh overhead), a CPU limit of 0.5-1 core, and monitor actual usage during load testing. The sidecar's `spicepod.yaml` controls exactly which datasets are cached, so operators can tune the working set to fit within the resource budget. 
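As a back-of-the-envelope illustration of that guidance, the sketch below applies the 1.5x heuristic to a hypothetical working set; the dataset names and sizes are invented for the example.

```python
# Sidecar memory budgeting sketch, assuming the 1.5x rule of thumb above.
# Dataset names and sizes are illustrative, not from a real deployment.
datasets_mb = {"orders": 350, "products": 150}  # Arrow in-memory datasets

working_set_mb = sum(datasets_mb.values())    # 500 MB of accelerated data
memory_budget_mb = int(working_set_mb * 1.5)  # headroom for query buffers + refresh

print(f"memory request/limit: ~{memory_budget_mb} MB")  # ~750 MB
print("cpu limit: 0.5-1 core; refine with load-test telemetry")
```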
## Hybrid Data Architecture with Spice

[Spice](/platform/sql-federation-acceleration) implements the hybrid sidecar-cluster pattern as its most common production deployment topology. The Spice runtime runs as both a sidecar (lightweight, caching, pod-level) and a cluster node (full-featured, ingestion, distributed compute).

Sidecars are configured via `spicepod.yaml`, specifying which datasets to accelerate locally, which acceleration engine to use, and the cluster endpoint for delegation:

```yaml
# Sidecar spicepod.yaml
datasets:
  - from: spice.ai/cluster:orders
    name: orders
    acceleration:
      engine: arrow
      refresh_mode: full
      refresh_check_interval: 10s
  - from: spice.ai/cluster:products
    name: products
    acceleration:
      engine: duckdb
      refresh_check_interval: 60s
```

The cluster connects to upstream sources -- PostgreSQL, S3, Databricks, and [30+ other connectors](/integrations) -- handles ingestion and CDC-based refresh, and serves queries from sidecars that exceed the local cache. Communication between sidecars and the cluster uses Arrow Flight (gRPC) with mTLS, encrypting data in transit.

This architecture enables teams to build [data lake acceleration layers](/use-case/datalake-accelerator), power [AI agent workloads](/use-case/secure-ai-agents) with sub-millisecond data access, and run federated queries across heterogeneous sources -- all through a single, Kubernetes-native deployment. The cluster can be self-managed or run on the Spice Cloud Platform for managed operations. For a detailed deployment guide, see the [hybrid architecture documentation](https://spiceai.org/docs/deployment/architectures/hybrid).

A sidecar is a lightweight runtime that runs alongside an application pod, caching frequently accessed datasets locally for sub-millisecond reads. The cluster is a centralized deployment that handles data ingestion, acceleration refresh, and distributed queries, and serves as the authoritative data layer. Sidecars delegate queries to the cluster when data is not available in the local cache.

### How does the CDN analogy apply to data architecture?

In a CDN, edge nodes cache popular content close to users while the origin server holds the full content set. In a hybrid data architecture, sidecars act as edge nodes -- caching hot data close to the application -- while the cluster acts as the origin server, holding the complete accelerated dataset. Both patterns reduce latency by serving from the nearest cache and delegating to the origin only on cache misses.

### How do sidecars stay in sync with the cluster?

Sidecars periodically pull updated data from the cluster on a configured refresh interval. The cluster manages all upstream data ingestion and refresh (including CDC-based refresh from transactional databases). Depending on the configuration, sidecars can also receive push-based invalidation signals or delegate all queries for freshness-critical datasets directly to the cluster.

### Does the hybrid pattern increase load on upstream data sources?

No. The hybrid pattern reduces upstream source load because only the cluster connects to data sources -- sidecars never connect to upstream systems directly. Whether you run 5 or 500 sidecars, the upstream source sees the same number of connections and queries from the single cluster.

### When should I use a single sidecar instead of the hybrid architecture?

A single sidecar (without a cluster) is simpler and sufficient for single-instance applications with small datasets, where the sidecar can handle both ingestion and caching. The hybrid pattern is justified when you need centralized data management, horizontal scaling across many application instances, or separation of real-time reads from analytical workloads.

---

## What is Hybrid Search?

URL: https://spice.ai/learn/hybrid-search
Date: 2026-02-06T00:00:00
Description: Hybrid search combines vector similarity search with keyword matching to deliver more accurate results than either method alone. Learn how hybrid search works, ranking algorithms like RRF, and when to use it.

Search has evolved beyond simple keyword matching. Vector search uses embeddings to find semantically similar content -- "refund policy" matches "return process" even though they share no words. But vector search has blind spots: it can miss exact product names, error codes, or technical identifiers that keyword search handles effortlessly.

Hybrid search solves this by running both methods in parallel and combining their results. A user query is simultaneously matched against a vector index (for semantic relevance) and a keyword index (for exact term matching). The results are merged using a ranking algorithm that balances both signals, producing a final result set that captures both meaning and precision.

## Why Pure Vector Search Falls Short

Vector search (also called semantic search) encodes text into high-dimensional vectors using an embedding model and finds the closest vectors to the query. This captures meaning well -- "How do I cancel my subscription?" matches content about "account termination procedures." But vector search struggles with:

- **Exact identifiers:** Product names, model numbers, error codes, and acronyms may not embed well. Searching for "ERR-4502" might return results about errors in general rather than the specific error code.
- **Rare or technical terms:** Domain-specific jargon, newly coined terms, or proper nouns that weren't well-represented in the embedding model's training data produce weak vectors.
- **Precision at the tail:** For broad queries, vector search returns semantically related but not precisely relevant results. The top results are good, but quality drops quickly.

## Why Pure Keyword Search Falls Short

Keyword search (BM25, TF-IDF) matches documents that contain the query's exact terms, weighted by frequency and rarity. This is precise for exact matches but misses semantic equivalents:

- **Synonym blindness:** "automobile insurance" won't match content about "car coverage" despite identical meaning.
- **Intent misunderstanding:** "How to speed up queries" won't match content titled "Query Performance Optimization" because the terms differ.
- **Vocabulary mismatch:** Users describe problems in their own words, which often don't match the terminology in documentation or knowledge bases.

## How Hybrid Search Works

A hybrid search system operates in three stages: parallel retrieval, score normalization, and result fusion.

### Parallel Retrieval

The query is processed by both search systems simultaneously:

1. **Vector search:** The query is embedded into a vector and matched against the vector index using cosine similarity or dot product. Returns the top-k most semantically similar documents with similarity scores.
2. **Keyword search:** The query is tokenized and matched against the inverted index using BM25 scoring. Returns the top-k documents containing the most relevant keyword matches.

These two retrievals are independent and can execute concurrently, so hybrid search doesn't add meaningful latency over running either method alone.

### Score Normalization

Vector similarity scores and BM25 scores are on different scales.
Cosine similarity ranges from -1 to 1, while BM25 scores are unbounded positive numbers. Before combining results, scores must be normalized to a common scale. Common normalization approaches include min-max normalization (scaling to 0-1 range within each result set) and z-score normalization (centering on mean with unit standard deviation).

### Result Fusion

The normalized results are combined using a ranking algorithm. The most common approach is **Reciprocal Rank Fusion (RRF)**, which works by:

1. Assigning each result a score based on its rank position in each result set: `1 / (k + rank)`
2. Summing scores for documents that appear in multiple result sets
3. Sorting by the combined score

The key insight behind RRF is that it doesn't rely on raw scores at all -- only rank positions. This makes it robust to score distribution differences between search methods; a minimal code sketch appears at the end of this section.

```sql
-- Example: Hybrid search with RRF in Spice
SELECT * FROM search(
  'customer_docs',
  'how to cancel subscription',
  mode => 'hybrid',
  limit => 10
)
```

Other fusion methods include:

- **Weighted linear combination:** `score = alpha * vector_score + (1 - alpha) * keyword_score`, where alpha controls the balance
- **Cross-encoder re-ranking:** A more expensive model re-scores the merged candidates for higher precision
- **Learned fusion:** A trained model determines optimal weights per query type

## Hybrid Search for RAG

Hybrid search is especially important for [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation) systems, where retrieval quality directly determines answer quality.

RAG applications face a unique challenge: the retrieved context must be both semantically relevant (the right topic) and factually precise (the right details). Pure vector search might retrieve content about the right topic but miss the specific document containing the answer. Pure keyword search might find exact term matches in irrelevant contexts.

Hybrid search addresses both failure modes:

- **Semantic recall:** Vector search ensures the retriever captures conceptually related content even when the user's language differs from the source material
- **Precision grounding:** Keyword search ensures exact terms, names, and identifiers are matched, preventing the retriever from drifting to related-but-wrong content

In practice, RAG systems using hybrid search show measurably higher answer accuracy than those using either search method alone, especially for technical and domain-specific queries where vocabulary mismatch is common.

## Hybrid Search vs. Separate Search Systems

Many teams build hybrid search by stitching together separate systems -- a vector database (Pinecone, Weaviate, Qdrant) alongside a search engine (Elasticsearch, OpenSearch). This approach works but introduces operational complexity:

- **Two systems to deploy and manage:** Separate infrastructure, monitoring, and scaling for each
- **Data synchronization:** Both systems must index the same data, and changes must propagate to both
- **Application-layer fusion:** The application must query both systems, normalize scores, and merge results
- **Latency overhead:** Two network round-trips instead of one

A unified runtime that supports vector search, keyword search, and SQL in a single system eliminates these problems. The data is indexed once, queries execute in a single round-trip, and fusion happens internally without application code.
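To make the fusion step concrete, here is a minimal, self-contained sketch of the RRF algorithm described above; the document IDs and ranked lists are illustrative.

```python
from collections import defaultdict

def rrf_fuse(result_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Reciprocal Rank Fusion: score documents by rank position only."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # the 1 / (k + rank) term
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Ranked document IDs from the vector and keyword legs (illustrative).
vector_hits = ["doc_42", "doc_17", "doc_88", "doc_3"]
keyword_hits = ["doc_17", "doc_42", "doc_91"]

print(rrf_fuse([vector_hits, keyword_hits]))
# doc_17 and doc_42 rank highest because they appear near the top of both lists.
```

Because only rank positions enter the score, the same function works unchanged whether the inputs come from BM25, cosine similarity, or any other retriever.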
## Hybrid Search with Spice [Spice](/platform/hybrid-sql-search) provides hybrid search natively in a single runtime: - **Vector, full-text, and SQL search** combined in one query engine -- no separate systems to manage - **Built-in RRF and weighted fusion** for combining vector and keyword results - **[SQL federation](/learn/sql-federation)** for searching across [30+ connected data sources](/integrations) - **[Real-time CDC](/learn/change-data-capture)** to keep search indexes fresh as source data changes - **[LLM inference](/learn/llm-inference)** for generating embeddings alongside search queries in the same runtime This unified approach means RAG applications can index, search, and generate in one system rather than orchestrating separate vector databases, search engines, and data pipelines. ## Advanced Topics ### Learned Sparse Representations (SPLADE) Traditional keyword search relies on exact term matching with BM25 scoring. SPLADE (Sparse Lexical and Expansion) models improve on this by learning sparse representations that include term expansion -- a trained model predicts which vocabulary terms are relevant to a passage even if they don't appear in the text. For example, a passage about "automobile insurance premiums" would receive non-zero weights for related terms like "car," "vehicle," "coverage," and "policy." This means SPLADE captures some semantic understanding within a sparse representation, bridging the gap between keyword and vector search. The practical benefit is that SPLADE models can replace or augment BM25 in the keyword leg of a hybrid search pipeline. Because the output is still a sparse vector, it uses the same inverted index infrastructure as BM25 -- no separate vector database required. SPLADE representations are typically more expensive to compute than BM25 scores but cheaper than dense embeddings, making them an attractive middle ground. In hybrid search systems, replacing BM25 with a SPLADE model improves recall on queries where vocabulary mismatch is an issue, while maintaining the precision advantages of sparse representations for exact-match queries. The tradeoff is additional indexing compute and the need to train or fine-tune the SPLADE model on domain-specific data for optimal results. ### Cross-Encoder Re-ranking The initial retrieval stage of hybrid search -- whether BM25, vector, or both -- uses models that encode the query and documents independently. This is efficient (each document embedding is computed once at index time) but limits how precisely the system can assess relevance. Cross-encoder re-ranking addresses this by scoring each candidate document against the query in a single forward pass through a transformer model. The query and document tokens attend to each other directly, producing a more nuanced relevance score. Because this is computationally expensive (each query-document pair requires a full model inference), cross-encoders are applied only to the top candidates returned by the initial retrieval stage -- typically re-scoring the top 50-100 results to produce the final top-k. The re-ranking stage typically adds 50-200ms of latency depending on the number of candidates and the model size. In practice, this is an acceptable tradeoff for applications like [retrieval-augmented generation](/learn/retrieval-augmented-generation) where retrieval precision directly determines answer quality. Models like Cohere Rerank, ColBERT, and open-source cross-encoders from the sentence-transformers library are commonly used. 
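The retrieve-then-re-rank flow can be sketched with the sentence-transformers library mentioned above; the model choice, query, and candidate documents here are illustrative, not a prescribed setup.

```python
# Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

# Cross-encoder models score a (query, document) pair in one forward pass.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how to cancel subscription"
# In practice these would be the top 50-100 candidates returned by the
# initial BM25 / vector / hybrid retrieval stage.
candidates = [
    "To cancel your subscription, open Billing and choose Cancel plan.",
    "Our refund policy covers purchases made within 30 days.",
    "Subscriptions renew automatically at the end of each term.",
]

# One model inference per pair: precise but expensive, which is why
# cross-encoders are applied only after the cheap retrieval stage.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```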
### Multi-Stage Retrieval Pipelines Production search systems often use more than two stages. A common architecture is: 1. **Candidate generation:** BM25 or a fast approximate nearest neighbor (ANN) search retrieves a broad set of candidates (hundreds to thousands) with high recall but moderate precision. 2. **First-pass ranking:** A lightweight model (e.g., a small bi-encoder or SPLADE) re-scores candidates to reduce the set to a manageable size (50-100). 3. **Second-pass re-ranking:** A cross-encoder re-ranks the reduced candidate set for maximum precision, producing the final top-k results. Each stage narrows the candidate set while applying progressively more expensive (and more accurate) scoring. This cascade design balances latency and quality: the cheap first stage ensures nothing important is missed, while the expensive final stage ensures the top results are maximally relevant. Tuning a multi-stage pipeline requires optimizing each stage independently. The candidate generation stage must have high recall (retrieve all potentially relevant documents), even at the cost of lower precision. The re-ranking stages must have high precision (correctly rank the most relevant documents at the top). Metrics like recall@100 for the first stage and NDCG@10 for the final stage are standard benchmarks. RRF is a ranking algorithm that merges results from multiple search methods by scoring each document based on its rank position (not its raw score) in each result set. Documents appearing near the top of multiple result sets receive the highest combined scores. RRF is popular because it is simple, effective, and doesn't require score normalization.

    ", }, { title: 'Does hybrid search add latency compared to vector search alone?', paragraph: '

    Minimal. The vector and keyword searches execute in parallel, so the total latency is roughly the maximum of the two rather than the sum. In a unified runtime where both indexes are co-located, hybrid search typically adds only a few milliseconds for the fusion step.

### When should I use hybrid search instead of pure vector search?

Use hybrid search when your queries involve exact identifiers (product names, error codes, IDs), domain-specific terminology, or when retrieval precision matters more than just semantic similarity. In practice, hybrid search outperforms pure vector search for most production use cases, especially in RAG systems and enterprise search.

### How do I tune the balance between vector and keyword results?

With RRF, the balance is determined by the k parameter (typically 60). With weighted linear combination, adjust the alpha weight between 0 (all keyword) and 1 (all vector). Start with equal weighting and tune based on evaluation metrics. The optimal balance depends on your data and query patterns.

### Can hybrid search work with SQL queries?

Yes. In systems like Spice, hybrid search is expressed as SQL -- you can combine vector similarity, keyword matching, and structured SQL filters in a single query. This is especially powerful for filtering search results by metadata (date ranges, categories, access permissions) alongside semantic and keyword matching.

---

## Learn Data & AI

URL: https://spice.ai/learn/index
Date: 2026-03-11T00:00:00
Description: Learn about the core technologies behind Spice.ai -- SQL federation, data virtualization, RAG, hybrid search, change data capture, the Model Context Protocol, and more.

---

## What is LLM Inference?

URL: https://spice.ai/learn/llm-inference
Date: 2026-01-29T00:00:00
Description: LLM inference is the process of generating text by running input through a trained large language model. Learn how inference works, key performance metrics, and optimization techniques like KV caching, quantization, and speculative decoding.

Training a large language model and running inference on it are two fundamentally different operations. Training adjusts billions of parameters over weeks or months using massive datasets and GPU clusters. Inference uses those fixed parameters to generate output from a single input -- and it needs to happen in milliseconds.

For most developers, inference is the only interaction point with a model. Whether you are building a chatbot, a code assistant, a search pipeline, or an autonomous agent, the quality and speed of inference determine the user experience. Understanding how inference works -- and what levers you have to optimize it -- is essential for building production AI systems.

## How LLM Inference Works

Inference is the forward pass through a trained neural network. For large language models, this process has four stages: tokenization, the forward pass, sampling, and detokenization.

### Tokenization

The model does not process raw text. Before inference begins, the input prompt is converted into a sequence of **tokens** -- integer IDs that map to subword units in the model's vocabulary. A tokenizer splits text into these units based on a learned vocabulary (typically 32,000 to 128,000 tokens).

For example, the sentence "SQL federation queries multiple databases" might be tokenized into `["SQL", " feder", "ation", " queries", " multiple", " databases"]`, where each piece maps to an integer ID. The model operates entirely on these integer sequences.

### The Forward Pass

The token IDs are converted into dense vector embeddings and passed through the model's transformer layers. Each layer applies self-attention (computing relationships between all tokens in the sequence) and feed-forward transformations. For a model like Llama 3 70B, this means passing through 80 transformer layers with 64 attention heads each.

The output of the final layer is a probability distribution over the entire vocabulary for the **next token**. This is the core computation: given a sequence of tokens, predict the probability of every possible next token.

### Sampling

The raw probability distribution is processed by a sampling strategy to select the next token. Common strategies include:

- **Greedy decoding:** Always select the highest-probability token. Deterministic but can produce repetitive output.
- **Temperature sampling:** Scale the probability distribution by a temperature parameter. Lower temperatures (e.g., 0.2) make the distribution sharper, favoring high-probability tokens. Higher temperatures (e.g., 1.0) flatten the distribution, increasing diversity.
- **Top-k sampling:** Restrict selection to the k most probable tokens, then sample from that subset.
- **Top-p (nucleus) sampling:** Restrict selection to the smallest set of tokens whose cumulative probability exceeds p (e.g., 0.9), then sample.
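These strategies are simple to express in code. The sketch below combines temperature scaling with top-p filtering over a toy distribution; the logits and parameter values are illustrative, and production inference engines implement the same logic inside optimized sampling kernels.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.7, top_p: float = 0.9) -> int:
    # Temperature scaling: lower values sharpen the distribution.
    probs = np.exp(logits / temperature)  # unnormalized softmax (fine for toy logits)
    probs /= probs.sum()
    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability exceeds top_p, then renormalize and sample.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept_probs))

# Toy 5-token vocabulary: sharpen, truncate, then sample.
logits = np.array([2.0, 1.5, 0.3, -1.0, -2.5])
print(sample_next_token(logits))
```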
The choice of sampling strategy affects output quality, creativity, and consistency. For structured tasks like code generation or SQL queries, low temperature with top-p sampling typically produces the best results. For creative writing or brainstorming, higher temperatures introduce useful variation. ### Detokenization The selected token ID is mapped back to its text representation using the tokenizer's vocabulary. This token is appended to the output and -- critically -- fed back into the model as part of the input for predicting the next token. This autoregressive loop continues until the model generates a stop token or reaches a maximum length. This sequential, token-by-token generation is why LLM inference is inherently slower than a simple database query. Each new token requires a forward pass through the entire model. ## Inference Performance Metrics Four metrics define the performance profile of an LLM inference system: ### Latency Total time from request to complete response. For a chat application, this is how long the user waits. Latency is the sum of time-to-first-token plus the time to generate all subsequent tokens. ### Time to First Token (TTFT) The time between receiving a request and producing the first output token. TTFT is dominated by the **prefill phase** -- processing the entire input prompt through the model in a single forward pass. Longer prompts mean longer TTFT because the model must compute attention across all input tokens before generating any output. For interactive applications, TTFT determines perceived responsiveness. A system with 200ms TTFT feels responsive even if total generation takes several seconds, because the user sees output beginning almost immediately. ### Tokens Per Second (TPS) The rate at which output tokens are generated after the first token. TPS measures the speed of the **decode phase** -- the autoregressive loop where each new token is generated one at a time. TPS is bounded by memory bandwidth rather than compute, because each decode step reads the full model weights from memory to generate a single token. ### Throughput The total number of tokens a system can generate per second across all concurrent requests. A system with 50 TPS per request serving 20 concurrent users has an aggregate throughput of 1,000 tokens per second. Throughput determines cost-efficiency: higher throughput means more work done per GPU-hour. ## Inference vs. Training Training and inference use the same model architecture but differ in nearly every operational dimension: - **Direction:** Training computes forward and backward passes (backpropagation) to update weights. Inference computes only the forward pass with fixed weights. - **Compute profile:** Training is compute-bound -- dominated by matrix multiplications across large batches. Inference (especially the decode phase) is memory-bandwidth-bound -- each token generation reads the full model weights but performs relatively little computation. - **Hardware:** Training requires clusters of high-end GPUs with fast interconnects (NVLink, InfiniBand). Inference can run on a single GPU, a CPU, or even edge devices depending on the model size and latency requirements. - **Batching:** Training uses large batch sizes (thousands of samples) for efficiency. Inference batches are constrained by latency requirements -- larger batches improve throughput but increase per-request latency. 
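Back-of-the-envelope arithmetic shows how these metrics combine; the numbers below are illustrative, mirroring the 50 TPS and 20-concurrent-request example above.

```python
# Illustrative latency and throughput math for a single request.
ttft_s = 0.2      # time to first token (prefill phase)
tps = 50          # decode speed: output tokens per second per request
output_tokens = 300

# Total latency = TTFT + time to decode the remaining tokens.
total_latency_s = ttft_s + (output_tokens - 1) / tps
print(f"total latency: {total_latency_s:.2f}s")  # ~6.18s

# Aggregate throughput across concurrent requests.
concurrent_requests = 20
print(f"throughput: {tps * concurrent_requests} tokens/s")  # 1000 tokens/s
```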
## Inference Optimization Techniques

Several techniques reduce the cost and latency of LLM inference without significantly affecting output quality.

### KV Cache

During autoregressive generation, the model recomputes attention over all previous tokens at each step. The **key-value (KV) cache** stores the intermediate key and value tensors from previous tokens so they don't need to be recomputed. This turns each decode step from O(n^2) attention to O(n), dramatically reducing computation for long sequences.

The tradeoff is memory. For a 70B parameter model with a 4,096-token context, the KV cache can consume several gigabytes of GPU memory. Managing KV cache memory is one of the primary challenges in serving long-context models.

### Quantization

Quantization reduces the precision of model weights from 16-bit floating point to 8-bit integers (INT8) or 4-bit integers (INT4). This reduces memory usage by 2-4x and increases inference speed because lower-precision operations are faster and the model reads less data from memory.

```
# Model memory usage at different precisions
# Llama 3 70B parameters:
#   FP16: ~140 GB (70B params x 2 bytes)
#   INT8:  ~70 GB (70B params x 1 byte)
#   INT4:  ~35 GB (70B params x 0.5 bytes)
```

Modern quantization methods (GPTQ, AWQ, GGUF) minimize quality loss by calibrating quantization ranges against representative data. In practice, INT8 quantization produces output nearly indistinguishable from FP16 for most tasks. INT4 introduces measurable quality degradation but enables running large models on consumer hardware.

### Speculative Decoding

Speculative decoding uses a small, fast **draft model** to generate several candidate tokens quickly, then verifies them in a single forward pass through the large target model. If the draft tokens are accepted (because the large model assigns them high probability), multiple tokens are produced in the time it would take to generate one.

This technique works well when the draft model's predictions frequently align with the target model -- which is common for straightforward text. Speculative decoding can improve TPS by 2-3x without any quality loss, because rejected draft tokens are replaced with the target model's output.

### Continuous Batching

Traditional batching groups requests into fixed-size batches and processes them together. The problem: short requests finish early but wait for the longest request in the batch to complete, wasting GPU cycles. Continuous batching (also called iteration-level batching) inserts new requests into the batch as soon as existing requests finish, keeping the GPU fully utilized. This improves throughput significantly -- frameworks like vLLM and TensorRT-LLM use continuous batching to serve 2-5x more concurrent requests on the same hardware.

## Local vs. Cloud Inference

Developers choosing where to run inference face a set of tradeoffs:

**Cloud/API inference** (OpenAI, Anthropic, Google) provides access to the largest models without managing infrastructure. The tradeoffs are per-token cost, network latency, data privacy constraints, and vendor dependency. For prototyping and applications where the largest models are necessary, cloud inference is the practical starting point.

**Local inference** runs models on your own hardware -- GPUs, CPUs, or edge devices. This eliminates per-token cost, removes network latency, and keeps data private. The tradeoffs are hardware investment, model size limitations (you need enough memory to fit the model), and operational overhead.
Quantized open-source models (Llama, Mistral, Qwen) make local inference increasingly practical for production workloads.

**Hybrid approaches** route requests to local or cloud models based on task complexity, latency requirements, or cost budgets. Simple classification or extraction tasks go to a fast, small local model. Complex reasoning tasks go to a large cloud model. This pattern optimizes for both cost and quality.

## Inference for Embeddings vs. Generation

Not all LLM inference is text generation. **Embedding inference** runs input through a model to produce a dense vector representation rather than generating new tokens. Embedding models are used for [semantic search](/learn/hybrid-search), [retrieval-augmented generation](/learn/retrieval-augmented-generation), clustering, and classification.

Embedding inference is fundamentally different from generative inference:

- **Single forward pass:** Embeddings are produced in one pass through the model. There is no autoregressive loop, no sampling, no token-by-token generation.
- **Batch-friendly:** Embedding requests can be batched aggressively because there is no sequential dependency between tokens.
- **Latency profile:** Embedding latency scales with input length but is typically 10-100x faster than generating the same number of tokens, because there is no decode phase.

Production systems often run embedding and generative models side by side. A search query generates an embedding (fast, single-pass inference), which retrieves relevant documents, which are then fed to a generative model for synthesis (slower, autoregressive inference).

## LLM Inference with Spice

[Spice](/platform/llm-inference) serves LLM inference alongside [federated SQL queries](/learn/sql-federation), embedding search, and [tool calling](/learn/llm-tool-calling) in a single runtime. This co-location means AI applications can:

- **Query data and run inference in one request:** Retrieve context from databases via [SQL federation](/learn/sql-federation), generate embeddings for [hybrid search](/learn/hybrid-search), and produce a response -- all through a single endpoint.
- **Route across models:** Direct requests to local open-source models or cloud APIs based on task requirements, cost, and latency constraints.
- **Combine inference with tool use:** Models served through Spice can invoke tools via the [MCP gateway](/feature/mcp-server-gateway) to access live data, execute queries, and take actions as part of the inference loop.
- **Observe everything:** Distributed tracing across data queries, inference calls, and tool invocations provides full visibility into end-to-end AI workflows.

This unified approach eliminates the need to stitch together separate services for data access, model serving, and tool execution -- reducing operational complexity while improving latency through co-located processing.

## Advanced Topics

### The Inference Pipeline

A complete inference request passes through multiple stages, each with distinct performance characteristics and optimization opportunities.

```mermaid
flowchart LR
    A[Prompt] --> B[Tokenize]
    B --> C[Prefill]
    C --> D[Decode Loop]
    D --> E[Detokenize]
    E --> F[Response]
```

The prefill phase processes all input tokens in parallel through the model's transformer layers, producing the KV cache and the first output token. The decode phase then generates tokens one at a time in an autoregressive loop, reading from and appending to the KV cache at each step.
Prefill is compute-bound (matrix multiplications across the full input sequence), while decode is memory-bandwidth-bound (reading model weights for each single-token generation). Understanding this distinction is essential for choosing the right optimization strategy. ### PagedAttention The KV cache is the primary memory bottleneck in LLM serving. Traditional implementations pre-allocate a contiguous block of GPU memory for each request's KV cache based on the maximum possible sequence length. This leads to significant memory waste -- a request that generates 100 tokens still reserves memory for the full context window (e.g., 8,192 or 128,000 tokens). PagedAttention, introduced by the vLLM project, applies virtual memory concepts from operating systems to KV cache management. Instead of allocating contiguous memory, it divides the KV cache into fixed-size blocks (pages) that are allocated on demand as new tokens are generated. Pages can be stored non-contiguously in GPU memory and mapped through a block table, similar to how a CPU's page table maps virtual addresses to physical memory. The practical impact is substantial: PagedAttention reduces KV cache memory waste from 60-80% to near zero, enabling 2-4x more concurrent requests on the same GPU hardware. This directly translates to higher throughput and lower cost per token. PagedAttention also enables efficient memory sharing for techniques like parallel sampling and beam search, where multiple output sequences share the same input prefix. ### Prefix Caching Many inference workloads involve repeated prefixes. Chat applications prepend the same system prompt to every request. [RAG](/learn/retrieval-augmented-generation) systems share common instructions and formatting templates. API endpoints serving the same application reuse the same tool definitions and context structures. Prefix caching stores the KV cache entries for common prefixes in GPU memory so they don't need to be recomputed for each request. When a new request arrives with a matching prefix, the system copies the cached KV entries (or references them via PagedAttention's block table) and only computes the prefill for the unique portion of the prompt. For workloads where the shared prefix constitutes 50-90% of the input (common in production applications with long system prompts), prefix caching can reduce time-to-first-token by a corresponding 50-90%. This optimization is especially impactful for [tool calling](/learn/llm-tool-calling) workloads where tool definitions are repeated across every request. ### Inference Serving Architectures Production inference serving systems must balance throughput, latency, and cost across diverse workload patterns. Two architectural approaches have emerged. **Model-parallel serving** distributes a single large model across multiple GPUs using tensor parallelism (splitting layers across GPUs) or pipeline parallelism (assigning different layers to different GPUs). Tensor parallelism reduces per-token latency by parallelizing the computation within each layer, while pipeline parallelism increases throughput by processing different requests at different pipeline stages simultaneously. **Disaggregated serving** separates the prefill and decode phases onto different hardware. Prefill is compute-bound and benefits from high-FLOPS GPUs, while decode is memory-bandwidth-bound and benefits from GPUs with high memory bandwidth. 
By routing prefill and decode to hardware optimized for each phase, disaggregated architectures can improve overall cost-efficiency by 30-50% compared to running both phases on the same hardware. This pattern is gaining adoption in large-scale serving systems where the workload justifies the additional routing complexity. Training adjusts a model's parameters by computing forward and backward passes over large datasets, typically requiring GPU clusters and running for weeks. Inference uses the trained, fixed parameters to generate output from a single input in milliseconds to seconds. Training is compute-bound; inference (particularly the decode phase) is memory-bandwidth-bound.

    ", }, { title: 'What is the difference between TTFT and TPS?', paragraph: '

    Time to First Token (TTFT) measures how long it takes to produce the first output token, dominated by processing the input prompt (the prefill phase). Tokens Per Second (TPS) measures how fast subsequent tokens are generated during the decode phase. A system can have low TTFT but moderate TPS, or vice versa -- they are independent performance dimensions influenced by different bottlenecks.

### Does quantization significantly reduce output quality?

INT8 quantization produces output that is nearly indistinguishable from full-precision (FP16) inference for most tasks, with minimal quality degradation. INT4 quantization introduces measurable quality loss, particularly on reasoning-heavy tasks, but enables running large models on significantly less hardware. Modern quantization methods (GPTQ, AWQ) minimize this loss by calibrating against representative data.

### When should I use local inference vs. a cloud API?

Use cloud APIs when you need the largest models, want to avoid infrastructure management, or are prototyping. Use local inference when per-token cost, data privacy, or network latency are primary concerns. Many production systems use a hybrid approach: routing simple tasks to fast local models and complex tasks to large cloud models based on cost and quality requirements.

### How does embedding inference differ from text generation?

Embedding inference produces a fixed-size vector representation of input text in a single forward pass -- no autoregressive token generation, no sampling. This makes it significantly faster and more batch-friendly than generative inference. Embedding inference is used for semantic search, retrieval-augmented generation (RAG), classification, and clustering, while generative inference produces free-form text output.

---

## What is LLM Tool Calling?

URL: https://spice.ai/learn/llm-tool-calling
Date: 2026-02-20T00:00:00
Description: LLM tool calling is a capability where a model outputs structured function calls instead of plain text, enabling AI agents to query databases, call APIs, and take actions. Learn how tool calling works, security considerations, and how MCP standardizes tool use.

Large language models generate text. But text alone cannot query a database, send an email, read a file, or call an API. Tool calling bridges this gap by enabling a model to output structured function calls that an application can execute on the model's behalf.

Without tool calling, developers resort to prompt engineering -- instructing the model to output JSON in a specific format, then parsing that output with custom code. This approach is fragile: the model might produce malformed JSON, hallucinate function names, or include extra text around the structured output. Tool calling formalizes this interaction, giving the model a typed interface for expressing "I want to call this function with these arguments."

Tool calling is the foundation of agentic AI. Every autonomous agent that plans, executes multi-step tasks, and interacts with the real world depends on the ability to invoke tools reliably and correctly.

## How Tool Calling Works

A tool calling interaction follows a defined loop between the application, the model, and external systems.

### Step 1: Define Available Tools

The application provides the model with a set of **tool definitions** -- each specifying a name, description, and parameter schema. These definitions tell the model what tools are available and how to call them.

```json
{
  "tools": [
    {
      "name": "query_database",
      "description": "Execute a read-only SQL query against the application database",
      "parameters": {
        "type": "object",
        "properties": {
          "sql": { "type": "string", "description": "The SQL query to execute" }
        },
        "required": ["sql"]
      }
    },
    {
      "name": "search_documents",
      "description": "Search indexed documents using natural language",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "limit": { "type": "integer", "default": 10 }
        },
        "required": ["query"]
      }
    }
  ]
}
```

The quality of tool definitions directly affects how well the model uses them. Clear, specific descriptions and well-typed parameter schemas reduce errors and hallucinated arguments.

### Step 2: Model Selects a Tool

Given the user's message and the available tool definitions, the model decides whether to respond with text or invoke a tool. If it chooses a tool, it outputs a structured object with the tool name and arguments:

```json
{
  "tool_call": {
    "name": "query_database",
    "arguments": {
      "sql": "SELECT customer_name, SUM(amount) as total FROM orders WHERE created_at > '2026-01-01' GROUP BY customer_name ORDER BY total DESC LIMIT 10"
    }
  }
}
```

The model does not execute the tool. It produces a structured request that the application interprets.

### Step 3: Application Executes the Tool

The application receives the tool call, validates the arguments, and executes the function. This is where security controls, rate limiting, and authorization checks are applied. The application -- not the model -- decides whether the tool call is safe to execute.

### Step 4: Result Fed Back to Model

The tool's output is sent back to the model as a new message in the conversation.
The model then reasons about the result and either responds to the user with text or makes another tool call.

```json
{
  "role": "tool",
  "name": "query_database",
  "content": "[{\"customer_name\": \"Acme Corp\", \"total\": 142500}, {\"customer_name\": \"Globex\", \"total\": 98300}]"
}
```

This loop -- tool call, execution, result, reasoning -- can repeat multiple times in a single interaction, enabling multi-step workflows.

## The Multi-Step Tool Calling Loop

Simple questions need a single tool call. Complex tasks require multiple steps where the output of one tool informs the next. Consider a user asking: "Which of our enterprise customers had the highest support ticket volume last quarter, and what were the top issues?" A capable agent might execute this sequence:

1. **Call `query_database`** to get enterprise customers from the CRM
2. **Call `query_database`** to get support tickets for those customers in the last quarter
3. **Call `query_database`** to aggregate tickets by issue category
4. **Reason** about the results and produce a summary

Each step depends on the previous step's output. The model plans the sequence, executes tools iteratively, and synthesizes the results into a coherent response. This multi-step reasoning is what distinguishes agentic tool use from simple function calling.

### Parallel Tool Calls

Some models support **parallel tool calling**, where multiple independent tools are invoked in a single turn. If the model needs both customer data and product data, it can issue both queries simultaneously rather than sequentially. This reduces round trips and improves latency in multi-step workflows.

## Tool Calling vs. Prompt Engineering for Structured Output

Before tool calling was widely available, developers extracted structured actions from models using prompt engineering:

```
You are a helpful assistant. When the user asks for data,
respond with a JSON object like:
{"action": "query", "sql": "SELECT ..."}
Do not include any other text in your response.
```

This approach has several problems:

- **Unreliable formatting:** The model might include markdown code fences, explanatory text, or malformed JSON.
- **No schema validation:** There is no formal contract between the model's output and the expected structure.
- **Ambiguous intent:** The model might respond with text when a tool call was expected, or vice versa.
- **No tool discovery:** Adding new tools requires rewriting the system prompt rather than adding a typed definition.

Tool calling solves these problems by making function invocation a first-class capability of the model. The model produces typed, validated tool calls through a dedicated output channel, separate from text generation. This is more reliable, easier to maintain, and scales to dozens or hundreds of tools.

## How MCP Standardizes Tool Calling

The [Model Context Protocol (MCP)](/learn/model-context-protocol) standardizes how AI applications discover, connect to, and invoke tools across distributed servers. Without MCP, every AI application implements its own tool calling integration for each external service. MCP defines a universal protocol so a tool built once works with any MCP-compatible client.

MCP's contribution to tool calling is threefold:

**Discovery:** MCP servers expose tool manifests -- machine-readable descriptions of available tools, their parameters, and their capabilities. An AI application connecting to an MCP server automatically discovers what tools are available without hardcoded configurations.

**Transport:** MCP defines how tool calls and results are transmitted between the AI application (client) and the tool provider (server), supporting both local execution (stdio) and remote execution (SSE over HTTP).

**Interoperability:** A tool exposed as an MCP server works with Claude, GitHub Copilot, Cursor, and any other MCP-compatible client. This eliminates the O(N x M) integration problem where N applications each need custom integrations for M tools.

## Tool Calling for Data Access

One of the most common tool calling patterns is giving AI models access to data through SQL queries, API calls, or search operations.

### SQL as a Tool

When a model has access to a `query_database` tool, it can answer data questions by writing and executing SQL. This is more flexible than pre-computed dashboards because the model generates queries dynamically based on the user's specific question.

[SQL federation](/learn/sql-federation) makes this pattern even more powerful. A single tool can provide access to PostgreSQL, MySQL, Snowflake, S3, and [30+ other data sources](/integrations) through one interface. The model writes a SQL query; the federation engine routes it to the correct source.

### Search as a Tool

Models that need to retrieve relevant documents or context can use search tools. A [hybrid search](/learn/hybrid-search) tool combines keyword and semantic search to find relevant content, which the model then uses to generate informed responses. This is the tool-calling-based approach to [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation).

### API Calls as Tools

Tools can wrap any HTTP API -- reading from a CRM, posting a message to Slack, creating a Jira ticket, or triggering a deployment. Each API endpoint becomes a tool with a defined schema, and the model invokes it as needed during multi-step workflows.

## Security Considerations

Tool calling introduces a new attack surface: the model can now take actions, not just produce text. Security must be treated as a first-class concern.

### Input Validation

Every tool call argument must be validated before execution. A model generating SQL queries could produce destructive statements (`DROP TABLE`, `DELETE FROM` without a `WHERE` clause). The application must enforce read-only constraints, parameterize inputs, and reject malformed queries.

```python
import json

# Always validate and constrain tool call arguments.
# `db` is the application's database client.
def execute_query(sql: str) -> str:
    # Reject write operations
    normalized = sql.strip().upper()
    if any(normalized.startswith(kw) for kw in ["DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE"]):
        return "Error: Only read-only queries are permitted."
    # Execute with timeout and row limit
    result = db.execute(sql, timeout=5000, max_rows=1000)
    return json.dumps(result)
```

### Sandboxing

Tools that execute code, access file systems, or interact with infrastructure should run in sandboxed environments with minimal permissions. A code execution tool should run in a container with no network access and no persistent storage. A file system tool should be restricted to a specific directory.

### Authorization

Not every user (or model) should have access to every tool. Authorization policies should control:

- Which tools are available to which models or users
- What parameter values are permitted (e.g., restricting queries to specific tables)
- Rate limits on tool invocations
- Audit logging for compliance and debugging

### Prompt Injection

Malicious content in tool results can attempt to manipulate the model's behavior.
If a search tool returns a document containing "Ignore all previous instructions and...", the model might follow those instructions. Defenses include sanitizing tool outputs, using separate system prompts for tool results, and monitoring for anomalous model behavior after tool execution.

## Tool Calling with Spice

[Spice](/feature/mcp-server-gateway) provides governed tool calling through its MCP gateway, combining tool execution with [federated data access](/learn/sql-federation) and [LLM inference](/learn/llm-inference) in a single runtime:

- **MCP server federation:** Aggregate tools from multiple MCP servers behind a single endpoint. Models access all available tools through one connection rather than managing separate integrations.
- **Governed tool routing:** Assign specific tools to specific models with fine-grained access controls. A customer-facing model gets read-only data tools; an internal automation agent gets broader access.
- **Data tools built in:** SQL queries, embedding search, and [hybrid search](/learn/hybrid-search) are available as tools natively -- no external MCP server needed for data access.
- **End-to-end observability:** Distributed tracing follows a request from the [inference](/learn/llm-inference) call through tool execution, data queries, and back to the model, providing full visibility into multi-step agent workflows.
- **Security controls:** Input validation, rate limiting, and audit logging are applied at the gateway level, enforcing consistent policies across all tool invocations regardless of which model or client initiated the call.

This approach means AI applications get model inference, tool calling, and data access through a single, governed infrastructure layer -- reducing complexity while maintaining the security controls that enterprise deployments require.

## Advanced Topics

### The Tool Calling Loop

In agentic workflows, tool calling is not a single request-response exchange. It is an iterative loop where the model reasons, invokes tools, processes results, and decides whether to continue or respond to the user.

```mermaid
sequenceDiagram
    participant User
    participant LLM
    participant Tools
    User->>LLM: User message + tool definitions
    loop Until LLM responds with text
        LLM->>Tools: Tool call (name + arguments)
        Tools->>LLM: Tool result
        Note over LLM: Reason about result
    end
    LLM->>User: Final text response
```

Understanding the mechanics of this loop -- and the failure modes at each step -- is essential for building reliable agent systems.

### Parallel Tool Calls

When the model needs data from multiple independent sources, sequential tool calls introduce unnecessary latency. Parallel tool calling enables the model to emit multiple tool call requests in a single turn, which the application executes concurrently and returns as a batch.

Consider an agent asked: "Compare our Q1 revenue against the industry benchmark and check if any support escalations are open." The model can issue a `query_revenue` call and a `check_escalations` call simultaneously. The application runs both, returns both results, and the model synthesizes a single response.

Not all models support parallel tool calls natively. For models that do (including Claude and GPT-4), the tool call response contains an array of calls rather than a single call. The application must match each result back to the correct call ID when returning results.
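As a sketch of the application side of this fan-out, the following Python executes independent tool calls on a thread pool and tags each result with its originating call ID. The flattened call shape (`id`, `name`, JSON-encoded `arguments`) is an OpenAI-style assumption, and the `TOOL_IMPLS` registry is a hypothetical stand-in for real implementations:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry mapping tool names to local implementations.
TOOL_IMPLS = {
    "query_revenue": lambda quarter: {"quarter": quarter, "revenue": 1_250_000},
    "check_escalations": lambda: {"open_escalations": 2},
}

def run_tool_call(call: dict) -> dict:
    """Execute one tool call and tag the result with its call ID."""
    fn = TOOL_IMPLS[call["name"]]
    result = fn(**json.loads(call["arguments"]))
    return {
        "role": "tool",
        "tool_call_id": call["id"],  # lets the model match result to request
        "content": json.dumps(result),
    }

def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Execute independent tool calls concurrently, preserving call IDs."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_tool_call, tool_calls))

results = run_parallel([
    {"id": "call_1", "name": "query_revenue", "arguments": '{"quarter": "Q1"}'},
    {"id": "call_2", "name": "check_escalations", "arguments": "{}"},
])
```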
For models that don't support parallel calls, the application can implement a planning layer that detects independent tool calls across sequential turns and executes them concurrently, returning results in a single batch.

Parallel tool calls reduce round trips and end-to-end latency proportionally to the number of independent calls. In multi-step agent workflows with 3-5 independent data lookups, parallel execution can cut total latency by 60-80%.

### Tool Call Chaining and Planning

Complex tasks require the model to plan a sequence of tool calls where each step depends on the output of the previous one. This is tool call chaining -- the model decomposes a high-level objective into an ordered sequence of tool invocations.

A user asking "Find the customer with the highest churn risk and draft a retention email based on their recent activity" requires:

1. Call `query_database` to retrieve churn risk scores
2. Call `query_database` to fetch the top customer's recent activity
3. Call `send_email` (or draft the email in text) based on the activity data

The model must plan this chain, execute each step, validate intermediate results, and adjust the plan if unexpected data appears (e.g., the highest-risk customer has no recent activity on record).

Effective chaining depends on the model's ability to maintain a coherent plan across multiple turns. Providing the model with explicit planning instructions in the system prompt -- "Think step by step about what information you need before taking action" -- improves chaining reliability. Some frameworks (like LangChain's plan-and-execute pattern) formalize this by having the model output an explicit plan before executing any tools.

### Error Recovery Patterns

Tool calls fail. Databases time out, APIs return errors, arguments are malformed, and rate limits are hit. A robust tool calling system needs strategies for handling these failures gracefully.

**Retry with backoff** is the simplest pattern: if a tool call returns a transient error (timeout, rate limit, 503), the application retries with exponential backoff before returning a permanent failure to the model. The model should not be responsible for implementing retry logic -- this is an application-layer concern.

**Fallback tools** provide alternative paths when the primary tool fails. If a real-time API is unavailable, the application can fall back to a cached data source or a different API that provides approximate data. The model receives the result with a note that it came from a fallback source.

**Graceful degradation** means the model explains what it could not do rather than failing silently. If a tool call fails after retries and no fallback is available, the model should report the specific failure ("I wasn't able to retrieve the latest revenue data because the database connection timed out") and offer what it can provide from the context it has. This is preferable to hallucinating an answer or returning a generic error message.

Error context matters: when returning a tool failure to the model, include the error type, a human-readable message, and whether the error is transient or permanent. This gives the model enough information to decide whether to retry, use an alternative approach, or report the issue to the user.

**Is tool calling the same as function calling?**

Tool calling and function calling refer to the same capability -- the model outputs a structured function invocation instead of plain text. "Function calling" was the original term used by OpenAI; "tool calling" is the more general term adopted across the industry. MCP and most frameworks use "tool calling" to describe this capability.

**How does the model decide which tool to use?**

The model selects tools based on the user's message, the conversation history, and the tool definitions (name, description, parameter schema) provided by the application. Clear, specific tool descriptions are critical -- they help the model match user intent to the correct tool. When multiple tools could apply, the model uses the descriptions and parameter schemas to choose the best fit.

    ", }, { title: 'What are the security risks of tool calling?', paragraph: '

    Tool calling allows models to take actions, not just produce text, which introduces risks: SQL injection through generated queries, unauthorized access to sensitive APIs, destructive operations (deleting data, modifying configurations), and prompt injection through tool results. Mitigations include input validation, read-only constraints, sandboxed execution, authorization policies, and audit logging.

**How does tool calling differ from retrieval-augmented generation (RAG)?**

RAG retrieves relevant documents and injects them into the model's context before generation. Tool calling lets the model invoke arbitrary functions -- which can include retrieval, but also database queries, API calls, code execution, and actions. Tool calling is more general: RAG is one pattern that can be implemented through tool calling (a search tool), but tool calling supports many other patterns beyond retrieval.

    ", }, { title: 'Can a model use multiple tools in a single interaction?', paragraph: '

    Yes. Models can make sequential tool calls where the output of one tool informs the next, enabling multi-step workflows. Some models also support parallel tool calling, where multiple independent tools are invoked simultaneously in a single turn. Multi-step tool use is the foundation of agentic AI, where models plan and execute complex tasks autonomously.

---

## What is the Model Context Protocol (MCP)?

URL: https://spice.ai/learn/model-context-protocol
Date: 2026-02-25T00:00:00
Description: The Model Context Protocol (MCP) is an open standard for connecting AI models to external data and tools. Learn how MCP works, its architecture, and how MCP servers enable agentic AI.

AI models are increasingly expected to do more than generate text. They query databases, read files, call APIs, search the web, and execute code. Each of these capabilities requires the model to interact with an external system -- and until recently, every integration was custom.

An AI coding assistant that needs to read your project files requires one integration. If it also needs to search Jira tickets, that's a second integration with completely different authentication, error handling, and response formatting. Adding GitHub, Slack, or a database means building and maintaining yet more custom code.

The Model Context Protocol (MCP) standardizes this. MCP defines a universal interface between AI applications and the external tools and data sources they need. Instead of building custom integrations for each service, developers implement the MCP server specification once, and any MCP-compatible AI client can discover and use those tools.

## MCP Architecture

MCP follows a client-server architecture with three core concepts: servers, clients, and transports.

### MCP Servers

An MCP server is a process that exposes capabilities to AI models through a standardized interface. A server can expose three types of capabilities:

**Tools** are functions that an AI model can invoke. A database query tool accepts a SQL string and returns results. A file system tool reads or writes files. A Slack tool sends messages to channels. Each tool has a defined name, description, and parameter schema that the AI model uses to understand what the tool does and how to call it.

**Resources** are data that a model can read. A project's file tree, a database schema, or a configuration file can be exposed as MCP resources. Unlike tools, resources are read-only -- the model accesses them for context but doesn't invoke them to perform actions.

**Prompts** are reusable templates that structure how a model interacts with the server. A code review prompt might define how the model should analyze a pull request; a data analysis prompt might structure how the model should query and interpret results.

### MCP Clients

AI applications act as MCP clients. The client discovers available servers, reads their capability manifests (what tools, resources, and prompts they expose), and orchestrates interactions between the AI model and servers.

When the model decides to use a tool, the client handles:

1. **Discovery:** Querying available servers to find a tool that matches the model's intent
2. **Parameter construction:** Formatting the model's requested parameters according to the tool's schema
3. **Invocation:** Sending the request to the correct server over the configured transport
4. **Response handling:** Parsing the result and feeding it back to the model for further reasoning

### Transport Layer

MCP supports two transport mechanisms that determine how clients and servers communicate:

**stdio (Standard I/O)** is for local execution. The MCP server runs as a subprocess of the client application, and they communicate over stdin/stdout. This is the simplest setup -- no networking, no authentication, minimal latency.
Most development tools and IDE integrations use stdio transport.

**SSE (Server-Sent Events) over HTTP** is for remote execution. The MCP server runs on separate infrastructure and the client connects over HTTP. SSE enables distributed architectures where MCP servers are hosted centrally and shared across multiple clients, teams, or applications.

The transport choice determines the deployment model: stdio for local, single-user tools; SSE for shared, enterprise-grade infrastructure.

## Why MCP Matters

### Before MCP: Custom Integrations Everywhere

Before MCP, connecting an AI model to a new data source or tool required custom code for each combination of AI application and external service. A team using Claude, ChatGPT, and a custom AI agent would need three separate integrations for the same database -- each with its own authentication, error handling, and response parsing logic.

This approach doesn't scale. Every new AI application or tool requires O(N x M) integrations (N applications x M tools) rather than O(N + M) with a shared standard.

### After MCP: Build Once, Use Everywhere

With MCP, a tool integration is built once as an MCP server and works with any MCP-compatible client. A PostgreSQL MCP server works with Claude, GitHub Copilot, Cursor, and custom AI agents -- no modification needed for each client.

This changes the economics of AI tool integration. Instead of every team building its own connectors, a shared ecosystem of MCP servers emerges. Open-source MCP servers already exist for dozens of common services: databases, file systems, GitHub, Slack, Jira, Google Drive, and more.

### Agentic AI Workflows

AI agents -- models that plan and execute multi-step tasks autonomously -- need reliable, discoverable access to tools. An agent that's asked to "analyze our sales pipeline and email a summary to the VP" needs to:

1. Query the CRM database for pipeline data
2. Run analytical queries across historical data
3. Generate a formatted summary
4. Send an email through the organization's email system

MCP provides the standardized discovery and invocation layer that makes this possible. The agent discovers available tools, reads their schemas to understand parameters, invokes them in sequence, and handles results -- all through a uniform protocol.

### Security and Governance

In enterprise environments, giving AI models unrestricted access to tools and data creates security and compliance risks. MCP enables centralized governance:

- **Access control:** Define which models can invoke which tools, with what parameters, under what conditions
- **Audit logging:** Record every tool invocation for compliance and debugging
- **Rate limiting:** Prevent AI models from overwhelming external services
- **Data masking:** Filter sensitive information from tool responses before they reach the model

These controls are especially important in regulated industries like [financial services](/industry/financial-services) and [cybersecurity](/industry/cybersecurity) where AI access to data must be governed and auditable.

## MCP Servers in Practice

### Database Access

An MCP server wrapping a database (PostgreSQL, MySQL, Snowflake) exposes tools for executing queries and resources for reading schema information. The AI model can discover the database schema, write SQL queries, and execute them -- all through the MCP interface.
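To illustrate how compact such a server can be, here is a minimal sketch using the `FastMCP` interface from the official MCP Python SDK; the SQLite file and read-only connection are illustrative stand-ins for a real database:

```python
# A minimal MCP server exposing one database tool over stdio.
# Assumes the official MCP Python SDK (`pip install mcp`); the database
# path and the assumption of a local SQLite file are illustrative.
import json
import sqlite3

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("database")

@mcp.tool()
def query_database(sql: str) -> str:
    """Execute a read-only SQL query and return rows as JSON."""
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # read-only connection
    try:
        rows = conn.execute(sql).fetchall()
        return json.dumps(rows)
    finally:
        conn.close()

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

An MCP-compatible client launching this script as a subprocess would discover `query_database` automatically from its name, docstring, and typed signature.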
With [SQL federation](/learn/sql-federation), a single MCP server can provide access to multiple databases simultaneously, so the AI model queries any connected source through one interface.

### Developer Tools

IDE integrations (VS Code, JetBrains, Cursor) use MCP to give AI coding assistants access to:

- **Project files:** Reading and writing source code
- **Build systems:** Running tests, compiling code, checking linting
- **Version control:** Viewing diffs, creating branches, committing changes
- **Documentation:** Searching API docs, reading README files, accessing style guides

The AI assistant discovers these capabilities through MCP and uses them as needed during coding tasks.

### Enterprise MCP Gateways

As organizations deploy more MCP servers, managing them individually becomes unwieldy. An **MCP gateway** sits between clients and servers, providing:

- **Federation:** Multiple MCP servers behind a single endpoint
- **Centralized auth:** One authentication point for all tool access
- **Observability:** Distributed tracing across all MCP interactions
- **Load balancing:** Routing requests across server replicas

This gateway pattern is how MCP scales from individual developer tools to enterprise-wide AI infrastructure.

## MCP vs. Function Calling

Function calling is a model-level capability where the AI generates structured arguments for predefined functions. MCP is a protocol that standardizes how models discover, connect to, and invoke those functions across distributed servers.

Think of it this way: function calling defines the "what" (the model wants to call a function with these arguments), and MCP defines the "how" (discovering the function, routing the request to the right server, handling authentication, and returning results).

They're complementary layers. Function calling without MCP means custom integration code for every tool. MCP without function calling means the model can't express tool-use intent. Together, they create a complete system for AI-tool interaction.

## MCP with Spice

[Spice](/feature/mcp-server-gateway) functions as an enterprise MCP gateway, federating distributed MCP servers with:

- **Internal and remote hosting:** Run stdio-based tools locally for low latency or federate to remote servers over SSE
- **Governed tool routing:** Dynamically assign tools to specific models with fine-grained access controls
- **Hybrid data access:** Combine MCP tool results with [federated SQL queries](/learn/sql-federation), embeddings, and [hybrid search](/platform/hybrid-sql-search) in a single runtime
- **Distributed tracing:** Full visibility into execution paths across MCP servers, models, and data sources

This means AI applications get tool access, data access, and query capabilities through a single, governed infrastructure layer.

## Advanced Topics

### MCP Transport Protocols in Depth

The choice of transport protocol determines how MCP clients and servers communicate, and each transport has implications for latency, scalability, and security.

**stdio** runs the MCP server as a child process of the client. Communication happens over stdin and stdout using JSON-RPC 2.0 messages. This transport has near-zero latency (no network overhead), strong process isolation, and simple lifecycle management -- the server starts and stops with the client. The limitation is that stdio servers cannot be shared across multiple clients or machines. Each client spawns its own server instance.
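Regardless of transport, the messages themselves are JSON-RPC 2.0. As an illustrative sketch (the tool name and arguments are hypothetical), a client-side tool invocation looks roughly like this on the wire:

```python
import json

# A JSON-RPC 2.0 request invoking a tool, as an MCP client would send it
# over stdin (stdio) or HTTP POST (Streamable HTTP). Tool name and
# arguments are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 7,  # correlates the server's eventual response with this request
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT COUNT(*) FROM orders"},
    },
}
print(json.dumps(request))
```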
**Streamable HTTP** (which supersedes the earlier SSE-only transport) uses HTTP POST for client-to-server messages and Server-Sent Events for server-to-client streaming. This enables remote deployment where MCP servers run on dedicated infrastructure and clients connect over the network. Streamable HTTP supports standard HTTP infrastructure: load balancers, TLS termination, API gateways, and authentication middleware. The tradeoff is network latency (typically 1-50ms per round trip) and the need for explicit authentication and authorization.

For enterprise deployments, the Streamable HTTP transport is necessary. Multiple AI applications across an organization can connect to centrally managed MCP servers. When combined with a gateway, this creates a hub-and-spoke architecture where tool governance is centralized regardless of which client initiates the request.

### Capability Negotiation

When an MCP client connects to a server, the first exchange is a capability negotiation. The client sends an `initialize` request declaring its protocol version and supported features. The server responds with its own capabilities -- which of the three primitives (tools, resources, prompts) it supports, whether it supports change notifications, and any server-specific metadata.

This negotiation serves two purposes. First, it ensures version compatibility -- a client can detect if a server uses an unsupported protocol version and fail gracefully. Second, it enables feature discovery. A client connecting to an unknown server learns exactly what the server offers without hardcoded assumptions. If a server only exposes resources (no tools), the client knows not to send tool invocation requests.

Capability negotiation also supports **change notifications**. A server that declares support for `tools/listChanged` can notify the client when its tool set changes at runtime -- for example, when a new database table is added or a new API endpoint becomes available. The client refreshes its tool manifest without requiring a restart or reconnection. This dynamic discovery is critical for long-running agent systems that need to adapt to evolving tool landscapes.

### Gateway Patterns

```mermaid
flowchart TD
    C1[AI Client A] --> G[MCP Gateway]
    C2[AI Client B] --> G
    C3[AI Client C] --> G
    G --> S1[Database MCP Server]
    G --> S2[Search MCP Server]
    G --> S3[API MCP Server]
    G --> S4[File System MCP Server]
```

As organizations deploy dozens of MCP servers, managing direct client-to-server connections becomes impractical. An MCP gateway sits between clients and servers, aggregating tools from multiple servers behind a single endpoint.

The gateway pattern provides several architectural benefits. **Tool aggregation** combines tools from all downstream servers into a unified manifest. A client sees one tool list regardless of how many servers provide those tools. **Centralized authentication** means each client authenticates once with the gateway rather than managing credentials for every downstream server. **Access control policies** are enforced at the gateway level -- an AI model used for customer support gets access to knowledge base tools but not to deployment or financial tools, regardless of what the underlying servers expose.

**Observability** is another key benefit. The gateway is the single point through which all [tool calls](/learn/llm-tool-calling) pass, making it the natural place to implement distributed tracing, latency monitoring, and usage metrics.
When a multi-step agent workflow spans five tool calls across three servers, the gateway can trace the entire execution path and measure end-to-end performance.

Gateway architectures also enable **tool versioning and migration**. When a tool's implementation changes -- for example, migrating a database query tool from one backend to another -- the gateway can route requests to the new implementation without any client changes. This decouples tool consumers (AI applications) from tool providers (MCP servers), enabling independent evolution of both sides.

**How does MCP relate to function calling?**

Function calling is a model-level capability where the AI generates structured arguments for predefined functions. MCP is a protocol that standardizes how models discover, connect to, and invoke those functions across distributed servers. MCP builds on function calling by standardizing the discovery, transport, and execution layer -- so tools are interoperable across AI applications rather than hardcoded into each one.

**What is an MCP server?**

An MCP server is a process that exposes tools, resources, and prompts through the Model Context Protocol. It can run locally as a subprocess (using stdio transport) or remotely over HTTP with Server-Sent Events (SSE). Examples include MCP servers for databases, file systems, and SaaS APIs like GitHub, Slack, and Jira.

**What is an MCP gateway?**

An MCP gateway federates multiple MCP servers behind a single endpoint, handling authentication, authorization, rate limiting, and observability for all tool invocations. Gateways are important for enterprise deployments where AI access to tools must be centrally governed. Spice functions as an MCP gateway with fine-grained access controls and distributed tracing.

**Is MCP only for chat-based AI applications?**

No. While MCP is widely used with chat assistants and coding tools, the protocol is application-agnostic. Any AI system that needs to interact with external tools or data sources can use MCP -- including autonomous agents, batch processing pipelines, and retrieval-augmented generation (RAG) systems.

**How does MCP handle security?**

MCP supports authentication at the transport layer (TLS for SSE connections, process isolation for stdio). MCP gateways add authorization policies controlling which models can invoke which tools with what parameters. Enterprise deployments typically combine MCP with role-based access control, audit logging, and network segmentation.

---

## RAG vs Fine-Tuning: How to Choose

URL: https://spice.ai/learn/rag-vs-fine-tuning
Date: 2026-02-05T00:00:00
Description: RAG retrieves external data at inference time while fine-tuning embeds knowledge into model weights. Learn the key differences, tradeoffs, and when to use each approach for production AI applications.

When teams build AI applications that need domain-specific knowledge, they face a fundamental question: should the model retrieve relevant data at query time, or should that knowledge be trained directly into the model's weights? This is the core distinction between retrieval-augmented generation (RAG) and fine-tuning.

Neither approach is universally better. RAG excels at injecting current, factual information into model responses. Fine-tuning excels at shaping how a model reasons, responds, and follows domain-specific patterns. Understanding the tradeoffs between them -- and knowing when to combine them -- is essential for building production AI systems that are accurate, maintainable, and cost-effective.

## What is RAG?

[Retrieval-augmented generation](/learn/retrieval-augmented-generation) is an architecture pattern that retrieves relevant data from external sources at inference time and includes it in the LLM's prompt as context. Rather than relying solely on knowledge stored in model weights, the LLM generates responses grounded in specific, current information.

A RAG pipeline operates in three stages:

1. **Indexing:** Source data (documents, database records, knowledge base articles) is chunked and converted into [embeddings](/learn/embeddings) -- dense vector representations stored in a searchable index.
2. **Retrieval:** When a user query arrives, the system searches the index using vector similarity, keyword matching, or [hybrid search](/platform/hybrid-sql-search) to find the most relevant chunks.
3. **Generation:** The retrieved chunks are injected into the LLM prompt as context, and the model generates a response grounded in that specific data.

RAG does not modify the model itself. The same base model can serve different use cases simply by changing which data sources it retrieves from. This makes RAG highly flexible and straightforward to update -- new knowledge becomes available as soon as it is indexed.

## What is Fine-Tuning?

Fine-tuning modifies a pre-trained model's weights by continuing its training on a domain-specific dataset. This permanently embeds knowledge, behavior patterns, and stylistic preferences into the model. After fine-tuning, the model "knows" the new information in the same way it knows its original training data -- through learned parameters rather than external context.

The fine-tuning process typically involves:

1. **Data preparation:** Curating a dataset of input-output pairs that demonstrate the desired behavior (e.g., question-answer pairs in your domain, examples of the target writing style, or task-specific demonstrations).
2. **Training:** Running additional training passes over this data, adjusting the model's weights to minimize prediction error on the new examples. Techniques like LoRA (Low-Rank Adaptation) reduce the computational cost by training only a small subset of parameters.
3. **Evaluation:** Testing the fine-tuned model against held-out examples to measure improvement and check for regressions in general capability.

Fine-tuning changes the model permanently.
The resulting model carries its new knowledge and behaviors without needing any external data at inference time.

## Key Differences

The following table summarizes the core tradeoffs between RAG and fine-tuning across the dimensions that matter most for production systems.

| Dimension | RAG | Fine-Tuning |
| --- | --- | --- |
| **Knowledge source** | External data retrieved at query time | Embedded in model weights during training |
| **Data freshness** | Real-time -- updates available as soon as data is indexed | Static -- requires retraining to incorporate new information |
| **Setup cost** | Moderate -- requires retrieval infrastructure (search index, embedding pipeline) | High -- requires curated training data, GPU compute, and training expertise |
| **Update cost** | Low -- re-index changed data | High -- retrain the model on updated data |
| **Inference latency** | Higher -- adds retrieval step before generation | Lower -- no retrieval step required |
| **Accuracy on factual queries** | High -- answers grounded in retrieved source data | Variable -- depends on training data coverage |
| **Hallucination risk** | Lower for covered topics -- model has source context | Higher for edge cases outside training distribution |
| **Behavioral customization** | Limited -- model behavior unchanged | Strong -- can reshape tone, style, and reasoning patterns |
| **Context window dependency** | Yes -- bounded by how much context the model can process | No -- knowledge is in weights, not context |
| **Auditability** | High -- can trace answers to specific source documents | Low -- knowledge is distributed across model parameters |

## When to Use RAG

RAG is the better choice when your application needs to work with data that changes frequently, when auditability and source attribution matter, or when you need to query across multiple data sources without retraining a model.

**Use RAG when:**

- **Data changes frequently.** Product documentation, pricing, policies, inventory, and support articles change regularly. RAG reflects these changes as soon as the index is updated, without retraining.
- **Source attribution is required.** Compliance, legal, and customer-facing applications often need to cite the specific documents that informed a response. RAG naturally supports this because the retrieved chunks are available alongside the generated answer.
- **You query multiple or heterogeneous data sources.** Enterprise data lives across databases, wikis, APIs, and file systems. RAG can retrieve from all of these sources through a unified search layer.
- **You need to control costs.** RAG avoids the GPU compute and training pipeline required for fine-tuning. Adding new knowledge is a data indexing operation, not a model training operation.
- **Accuracy on factual questions is critical.** Grounding responses in retrieved source data significantly reduces hallucinations compared to relying solely on model weights.

## When to Use Fine-Tuning

Fine-tuning is the better choice when you need to change how a model behaves, not just what information it has access to. It is particularly effective for shaping output format, tone, reasoning style, and domain-specific patterns.

**Use fine-tuning when:**

- **You need a specific output format or style.** If your application requires responses in a particular structure (JSON, specific templates, clinical language, legal prose), fine-tuning teaches the model to consistently produce that format.
- **You need domain-specific reasoning.** Medical diagnosis, legal analysis, and financial modeling involve reasoning patterns that general-purpose models may not handle well. Fine-tuning on expert examples teaches the model how to reason in your domain.
- **Latency is critical.** Fine-tuning eliminates the retrieval step, reducing inference latency. For real-time applications where every millisecond matters, this can be significant.
- **The knowledge is static and well-defined.** If your domain knowledge rarely changes (e.g., established medical terminology, programming language syntax, mathematical concepts), fine-tuning embeds it directly without needing retrieval infrastructure.
- **You want to reduce prompt size.** Fine-tuned models carry knowledge in their weights, so you don't need to include large amounts of context in each prompt. This reduces token costs and avoids context window limitations.

## Decision Framework

Use the following framework to determine which approach -- or combination -- fits your use case.

### Step 1: Identify the Problem Type

Ask: **"Am I trying to give the model new information, or change how it behaves?"**

- New information (facts, documents, records) --> RAG
- New behavior (style, format, reasoning patterns) --> Fine-tuning
- Both --> Combine RAG and fine-tuning

### Step 2: Assess Data Volatility

Ask: **"How often does the underlying data change?"**

- Daily or more frequently --> RAG (retraining at this cadence is impractical)
- Monthly to quarterly --> Either approach works; consider other factors
- Rarely or never --> Fine-tuning is viable

### Step 3: Evaluate Auditability Requirements

Ask: **"Do I need to trace responses back to specific source documents?"**

- Yes --> RAG (source attribution is a built-in capability)
- No --> Either approach works

### Step 4: Consider Infrastructure and Cost

Ask: **"What infrastructure and expertise do I have available?"**

- Strong data infrastructure, limited ML training expertise --> RAG
- Strong ML training expertise, stable training data --> Fine-tuning
- Both --> Combine approaches

### Step 5: Plan for the Combination

In many production systems, the answer is not RAG _or_ fine-tuning, but RAG _and_ fine-tuning. A common pattern is:

- **Fine-tune** the model for domain-specific behavior: output format, terminology, reasoning style, and tone
- **Use RAG** to inject current, factual knowledge at query time: product data, customer records, policy documents, real-time metrics

This combination gives you a model that both _behaves_ correctly for your domain and _knows_ the latest information -- without requiring retraining every time your data changes.

## Advanced Topics

### RAG with Structured Data

Most RAG tutorials focus on unstructured text -- documents, articles, knowledge bases. But enterprise data is frequently structured: relational databases, data warehouses, operational systems. Structured data RAG retrieves from SQL-queryable sources rather than (or in addition to) vector indexes.

Instead of embedding and searching document chunks, structured data RAG translates natural language queries into SQL, executes them against connected databases, and includes the results as context for the LLM. This approach is particularly effective for questions involving aggregations, filtering, joins, and exact lookups -- operations where vector similarity search performs poorly.
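A minimal sketch of that flow in Python, with a hypothetical `llm` helper (prompt in, text out) and a DB-API-style `db` connection standing in for real clients:

```python
import json

def answer_with_sql(question: str, schema: str, llm, db) -> str:
    """Structured-data RAG: translate the question to SQL, run it against
    the database, and ground the final answer in the query results."""
    sql = llm(
        f"Schema:\n{schema}\n\n"
        f"Write a single read-only SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    rows = db.execute(sql).fetchall()  # validate/constrain the SQL in production
    return llm(
        f"Question: {question}\n"
        f"SQL results: {json.dumps(rows)}\n"
        "Answer using only these results."
    )
```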
[Hybrid SQL search](/platform/hybrid-sql-search) combines both paradigms: vector search for semantic retrieval over unstructured content and SQL queries for precise retrieval from structured data. This is critical in enterprise environments where the answer to a question may require joining product documentation (unstructured) with pricing tables (structured) and customer records (structured).

### Parameter-Efficient Fine-Tuning

Full fine-tuning updates all of a model's parameters, which is computationally expensive and risks catastrophic forgetting -- the model loses general capabilities as it overfits to the new data. Parameter-efficient fine-tuning (PEFT) methods address this by training only a small fraction of parameters.

**LoRA (Low-Rank Adaptation)** is the most widely adopted PEFT method. It freezes the original model weights and injects small, trainable rank-decomposition matrices into each layer. Instead of updating millions or billions of parameters, LoRA trains thousands to millions -- reducing GPU memory requirements by 60-80% while achieving comparable quality to full fine-tuning on most tasks.

**QLoRA** combines LoRA with quantization, loading the base model in 4-bit precision and training only the LoRA adapters in full precision. This enables fine-tuning large models (7B-70B parameters) on a single consumer GPU -- a significant reduction in the infrastructure barrier to fine-tuning.

These techniques make fine-tuning more accessible, but the fundamental tradeoffs remain: fine-tuning still requires curated training data, evaluation infrastructure, and retraining when the domain evolves.

### Combining RAG and Fine-Tuning in Production

The most sophisticated production systems use fine-tuning and RAG together, but integrating them introduces its own challenges. A fine-tuned model may have learned patterns during training that conflict with retrieved context at inference time. For example, if the model was fine-tuned on outdated pricing information and the RAG system retrieves current pricing, the model must correctly prioritize the retrieved context over its trained knowledge.

Techniques to manage this include instruction tuning the model to explicitly prefer retrieved context over internal knowledge, using system prompts that reinforce context-grounding behavior, and evaluating with adversarial examples where retrieved context contradicts trained knowledge.

Monitoring is essential in combined systems. Track how often the model's responses align with retrieved context versus its trained knowledge. A drift toward trained knowledge (ignoring retrieved context) is a signal that the fine-tuning is overriding RAG -- a common failure mode that degrades accuracy as source data diverges from training data.

## How Spice Powers RAG Pipelines

[Spice](/use-case/retrieval-augmented-generation) provides the data infrastructure layer that production RAG systems require -- unified retrieval across structured and unstructured data, with the performance characteristics needed for real-time AI applications.

**[Hybrid SQL search](/platform/hybrid-sql-search)** combines vector similarity, full-text keyword matching, and structured SQL queries in a single interface. Rather than managing separate vector databases, search engines, and relational databases, Spice executes all three retrieval modes in one query. This is particularly important for enterprise RAG where answers depend on both unstructured documents and structured operational data.
**[LLM inference](/platform/llm-inference)** runs embedding models and generation models alongside data queries in the same runtime. Embedding generation, retrieval, and response generation happen within a single system -- eliminating the network hops and orchestration complexity of stitching together separate embedding services, vector databases, and LLM APIs.

**Data federation and acceleration** connect RAG pipelines to data wherever it lives. Spice federates queries across [30+ data sources](/integrations) -- databases, warehouses, APIs, and file systems -- so the retrieval layer has access to all relevant enterprise data without complex ETL pipelines. Query acceleration caches frequently accessed data locally for low-latency retrieval, a critical requirement when RAG queries must complete in hundreds of milliseconds.

**Real-time data freshness** keeps indexes current as source data changes. Through change data capture and incremental re-indexing, Spice ensures that the retrieval layer reflects the latest state of your data -- addressing one of the most common failure modes in production RAG systems, where stale indexes produce outdated answers.

For teams evaluating whether to use RAG, fine-tuning, or both, Spice provides the retrieval infrastructure that makes RAG practical at production scale -- letting you focus on the AI application logic rather than the underlying data plumbing.

**Can RAG and fine-tuning be combined?**

Yes, and many production systems do. A common pattern is to fine-tune the model for domain-specific behavior (output format, reasoning style, terminology) and use RAG to inject current factual knowledge at query time. This gives you a model that both behaves correctly for your domain and has access to the latest information without retraining.

**Which approach is more cost-effective?**

RAG is typically more cost-effective for knowledge-intensive applications. The primary cost is retrieval infrastructure (search indexes, embedding pipelines), which scales predictably. Fine-tuning requires GPU compute for training, curated datasets, and retraining whenever domain knowledge changes. However, fine-tuning can reduce per-query costs by eliminating the retrieval step and reducing prompt token counts.

**Does RAG work with structured data like databases?**

Yes. While most RAG implementations focus on unstructured text, production RAG systems increasingly retrieve from structured data sources using SQL queries alongside vector search. Spice supports hybrid SQL search that combines vector similarity, keyword matching, and structured SQL retrieval in a single query -- making it possible to ground LLM responses in both documents and database records.

**How do I know if my fine-tuned model is hallucinating?**

Fine-tuned models hallucinate when queries fall outside their training distribution -- they generate plausible-sounding but incorrect responses. Detection requires evaluation datasets with known correct answers, human review of edge cases, and monitoring confidence signals. RAG reduces this risk by grounding responses in retrieved source data, making it easier to verify accuracy and trace answers to specific documents.

**What are the latency tradeoffs between RAG and fine-tuning?**

RAG adds a retrieval step before generation, typically adding 50-200ms depending on index size, search complexity, and infrastructure. Fine-tuned models skip this step, generating responses directly from weights. For latency-critical applications (real-time chat, autocomplete), this difference matters. However, RAG latency can be minimized with query acceleration, local caching, and optimized search infrastructure.

---

## What is Retrieval Augmented Generation (RAG)?

URL: https://spice.ai/learn/retrieval-augmented-generation
Date: 2025-12-27T00:00:00
Description: Retrieval augmented generation (RAG) grounds LLM responses in real data by retrieving relevant context at inference time. Learn how RAG works, its architecture, and production best practices.

Large language models are trained on massive text corpora, but that training data has a cutoff date and doesn't include an organization's private data. When you ask an LLM a question about your company's products, internal policies, or recent events, it either guesses (hallucinating a plausible-sounding answer) or admits it doesn't know.

Retrieval augmented generation solves this by adding a retrieval step before generation. Instead of relying solely on knowledge baked into model weights, RAG retrieves relevant documents, database records, or API responses and injects them into the LLM's context window. The model generates its answer grounded in this specific, current data -- dramatically reducing hallucinations and enabling domain-specific, accurate responses.

## How RAG Works

A RAG system operates in three stages: indexing source data ahead of time, retrieving relevant context at query time, and augmenting the LLM prompt with that context before generation.

### Stage 1: Indexing

Before RAG can retrieve anything, source data must be indexed. This involves:

1. **Collecting source data** from databases, knowledge bases, document stores, APIs, and file systems
2. **Chunking** long documents into smaller segments (typically 256-1024 tokens) that fit within an LLM's context window
3. **Embedding** each chunk into a vector representation using an embedding model (e.g., OpenAI's text-embedding-3, Cohere Embed, or open-source alternatives)
4. **Storing** the vectors alongside the original text in a vector index or database

This indexing step runs ahead of time and must be refreshed as source data changes. Stale indexes are one of the most common failure modes in production RAG systems -- the retriever returns outdated context, and the LLM generates outdated answers.

### Stage 2: Retrieval

When a user query arrives:

1. The query is embedded using the same embedding model used during indexing
2. The query vector is compared against the index to find the most semantically similar chunks (typically using cosine similarity or dot product)
3. The top-k most relevant chunks are returned as candidate context

In practice, pure vector search often isn't enough. **Hybrid search** -- combining vector similarity with keyword (BM25) matching -- significantly improves retrieval quality. Vector search captures semantic meaning ("What are your refund terms?" matches "return policy"), while keyword search catches exact terms that vector search might miss (product names, error codes, technical identifiers).

The quality of retrieval is the single most important factor in RAG performance. If the retriever returns irrelevant chunks, the LLM generates poor answers regardless of how capable the model is. Improving retrieval quality -- through better chunking, hybrid search, re-ranking, and metadata filtering -- typically has a larger impact than switching to a more powerful LLM.
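To make the retrieval stage concrete, here is a minimal sketch of top-k retrieval by cosine similarity, assuming a hypothetical `embed` function that maps text to a NumPy vector using the same model applied at indexing time:

```python
import numpy as np

def top_k_chunks(query: str, chunks: list[str], chunk_vecs: np.ndarray,
                 embed, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    q = embed(query)  # hypothetical: same embedding model used at indexing time
    q = q / np.linalg.norm(q)
    # Normalize the indexed vectors so the dot product equals cosine similarity
    mat = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = mat @ q  # one similarity score per indexed chunk
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]
```

Production systems replace this brute-force scan with an approximate nearest-neighbor index, but the scoring logic is the same.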
### Stage 3: Augmented Generation

The retrieved chunks are assembled into the LLM prompt as context, typically before the user's question:

```
Context:
[Retrieved chunk 1]
[Retrieved chunk 2]
[Retrieved chunk 3]

User question: What is the refund policy for enterprise plans?

Answer based on the context above:
```

The LLM generates a response grounded in this specific, retrieved data. Because the model has the actual source material in its context, it can provide accurate, specific answers with citations traceable back to source documents.

## RAG vs. Fine-Tuning

RAG and fine-tuning are the two main approaches to customizing LLM behavior with domain-specific knowledge. They solve different problems and are often combined.

**Fine-tuning** modifies a model's weights by training on domain-specific data. This permanently embeds knowledge and behavior patterns into the model. Fine-tuning is effective for changing the model's tone, style, or reasoning patterns -- for example, training it to respond like a technical support agent or to follow a specific output format.

**RAG** retrieves knowledge at inference time without modifying the model. This is effective for injecting factual, frequently changing information -- product documentation, internal policies, customer data, real-time metrics.

The key tradeoff is **maintenance cost vs. flexibility**:

- Fine-tuning is expensive (GPU hours, labeled data) and slow to update. When information changes, the model must be retrained.
- RAG is cheap to update -- new knowledge becomes available as soon as it's indexed. But it depends on retrieval quality and is bounded by context window size.

Most production systems use both: fine-tuning for behavior and style, RAG for factual knowledge.

## Production RAG Challenges

The gap between a RAG prototype and a production RAG system is significant. Several data infrastructure challenges determine whether the system is reliable enough for real users.

### Data Freshness

If your vector index is rebuilt nightly, every answer is at least a day stale. For many use cases -- customer support, compliance queries, operational dashboards -- this staleness is unacceptable.

Production RAG systems need real-time or near-real-time indexing. This typically involves [change data capture (CDC)](/learn/change-data-capture) to detect when source data changes and incrementally update the vector index. Without CDC, teams resort to periodic full re-indexing, which is slow and expensive at scale.

### Retrieval Quality

Poor retrieval is the most common reason RAG systems underperform. Common failure modes include:

- **Chunking too aggressively:** Important context is split across chunks, so no single chunk contains enough information
- **Missing keyword matches:** Pure vector search misses exact terms (product names, error codes) that users search for
- **Irrelevant results:** The retriever returns semantically similar but factually irrelevant content
- **Missing metadata filters:** Queries that should be scoped (e.g., "2026 pricing") retrieve content from all time periods

[Hybrid search](/platform/hybrid-sql-search) -- combining vector similarity with BM25 keyword matching -- addresses several of these failures. Re-ranking models (e.g., Cohere Rerank, cross-encoders) can further improve precision by scoring retrieved chunks against the query using a more expensive model.

### Multi-Source Federation

Enterprise data doesn't live in a single database.
Customer records are in PostgreSQL, product documentation is in Confluence, support tickets are in Zendesk, and financial data is in Snowflake. A production RAG system needs to retrieve from all of these sources.

Building a separate retrieval pipeline for each source is fragile and doesn't scale. [SQL federation](/learn/sql-federation) provides a unified query interface across all sources, so the RAG system can retrieve structured data from any connected system alongside vector search results.

### Observability and Evaluation

Production RAG systems need monitoring to detect quality degradation:

- **Retrieval metrics:** Are the retrieved chunks relevant? How often does the retriever return empty or low-confidence results?
- **Generation metrics:** Are the LLM's answers faithful to the retrieved context, or is it hallucinating beyond what the context supports?
- **End-to-end metrics:** Are users finding the answers helpful? What's the failure rate?

Without observability, RAG quality degrades silently as source data changes, retrieval patterns shift, or index staleness increases.

## Common RAG Use Cases

### Customer Support and Knowledge Base Q&A

Connect an LLM to product documentation, FAQ articles, and support ticket history. Users ask natural language questions and receive accurate, cited answers drawn from authoritative sources -- reducing support ticket volume and improving resolution time.

### Internal Enterprise Search

Employees search across internal wikis, policies, engineering docs, and Slack history using natural language. RAG provides answers with citations, not just a list of matching documents.

### Code Assistance and Developer Tools

AI coding assistants use RAG to ground suggestions in the actual codebase, API documentation, and project-specific patterns. This dramatically reduces incorrect or hallucinated code suggestions.

### Compliance and Regulatory Queries

Legal and compliance teams query regulatory documents, internal policies, and audit records. RAG ensures answers are traceable to specific source documents -- critical for regulatory compliance.

## RAG with Spice

[Spice](/use-case/retrieval-augmented-generation) provides the data infrastructure layer that production RAG systems require:

- **[Hybrid search](/platform/hybrid-sql-search)** combining vector, full-text, and SQL retrieval in a single query
- **[SQL federation](/learn/sql-federation)** for retrieving structured data from [30+ connected sources](/integrations) alongside vector search results
- **[Real-time CDC](/feature/real-time-change-data-capture)** to keep vector indexes and acceleration caches fresh as source data changes
- **[LLM inference](/platform/llm-inference)** for running embedding and generation models alongside data queries

This unified runtime means RAG applications can index, retrieve, and generate in a single system instead of stitching together separate vector databases, search engines, and data pipelines.

## Advanced Topics

### The Full RAG Pipeline

A production RAG pipeline involves more stages than the basic three-step model suggests. Between the user's query and the final generated response, multiple processing and refinement steps determine answer quality.

```mermaid
flowchart LR
    A[Query] --> B[Embed Query]
    B --> C[Retrieve Candidates]
    C --> D[Re-rank]
    D --> E[Augment Prompt]
    E --> F[Generate Response]
```

Understanding each stage -- and where quality breaks down -- is essential for debugging and improving RAG systems in production.
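Expressed as code, the diagram above reduces to a short chain of stages. A minimal sketch, with each hypothetical helper (`embed`, `search_index`, `rerank`, `llm`) standing in for one node:

```python
def rag_answer(question: str, embed, search_index, rerank, llm) -> str:
    """One pass through the pipeline: embed -> retrieve -> re-rank ->
    augment -> generate. All helpers are hypothetical stand-ins."""
    query_vec = embed(question)                    # Embed Query
    candidates = search_index(query_vec, k=50)     # Retrieve Candidates
    context = rerank(question, candidates, top=5)  # Re-rank to the final top-k
    prompt = (                                     # Augment Prompt
        "Context:\n" + "\n".join(context) +
        f"\n\nUser question: {question}\n\nAnswer based on the context above:"
    )
    return llm(prompt)                             # Generate Response
```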
### Chunking Strategies

How source documents are split into chunks has an outsized impact on retrieval quality. The simplest approach -- splitting on a fixed token count -- often breaks mid-sentence or separates a question from its answer.

**Recursive character splitting** divides text hierarchically: first by section headers, then by paragraphs, then by sentences. This preserves semantic boundaries better than fixed-size splits.

**Semantic chunking** goes further by using an embedding model to detect topic shifts and placing chunk boundaries where the semantic similarity between adjacent sentences drops. This produces chunks that are coherent units of meaning rather than arbitrary slices.

**Parent-child chunking** (also called small-to-big retrieval) indexes small chunks for retrieval precision but returns the surrounding parent chunk for generation context. The retriever matches on a focused passage, but the LLM receives enough surrounding context to generate a complete answer. This balances retrieval precision against generation context -- a tradeoff that single-level chunking cannot address.

Chunk overlap -- including 10-20% of the previous chunk at the start of each new chunk -- helps preserve context at boundaries but increases index size. The optimal overlap depends on the nature of the source material.

### Re-ranking

Initial retrieval (whether vector, keyword, or [hybrid](/learn/hybrid-search)) uses a bi-encoder model that embeds the query and documents independently. This is fast but imprecise -- the query and document never directly attend to each other.

**Cross-encoder re-rankers** score each candidate by processing the query and document together through a single model, allowing full cross-attention between them. This produces significantly more accurate relevance scores but is too expensive to run against the full index. The standard pattern is to retrieve a larger candidate set (e.g., top-50 from initial retrieval) and re-rank to the final top-k (e.g., top-5).

Re-ranking is one of the highest-impact improvements for RAG quality. In benchmarks, adding a cross-encoder re-ranker to a hybrid retrieval pipeline typically improves answer accuracy by 10-20% without changing any other component.

### Multi-Hop Retrieval

Some questions cannot be answered from a single retrieved passage. "How does our enterprise pricing compare to competitors mentioned in Q4 analyst reports?" requires first finding the analyst reports, then extracting competitor mentions, then retrieving pricing data for each competitor.

Multi-hop retrieval decomposes complex queries into sub-queries, retrieves context for each, and chains the results. The LLM generates intermediate queries based on partial results, retrieves additional context, and synthesizes across all retrieved information. This is more complex than single-shot retrieval -- it requires the LLM to plan a retrieval strategy -- but it's necessary for questions that span multiple documents or require reasoning across disparate data sources.

Frameworks for multi-hop retrieval include iterative retrieval (retrieve, reason, retrieve again) and graph-based retrieval (following entity relationships across a knowledge graph to gather connected context). Both patterns increase latency but enable the system to answer questions that would otherwise require multiple user interactions.

**What is the difference between RAG and fine-tuning?**

Fine-tuning modifies a model's weights by training on domain-specific data, permanently embedding that knowledge. RAG retrieves relevant data at inference time and injects it into the prompt. Fine-tuning is better for changing style or behavior; RAG is better for injecting factual, frequently changing knowledge. Most production systems combine both.

    ", }, { title: 'What types of data can RAG retrieve from?', paragraph: '

    RAG can retrieve from any data source that can be indexed: relational databases, document stores, vector databases, APIs, PDFs, wikis, and file systems. SQL-based retrieval -- querying structured data directly -- is an emerging pattern that complements vector search for structured enterprise data.

### Does RAG eliminate hallucinations completely?

No. RAG significantly reduces hallucinations by grounding responses in retrieved data, but the model can still misinterpret context or generate plausible but incorrect inferences. Proper evaluation, chunk quality monitoring, and response validation remain important in production systems.

### What is hybrid search in the context of RAG?

Hybrid search combines vector similarity search (finding semantically similar content) with keyword search (exact term matching). This improves retrieval quality because vector search alone can miss exact terms while keyword search alone misses semantic meaning. Combining both methods is especially important for technical and domain-specific queries.

### How does RAG handle large-scale enterprise data?

Enterprise RAG requires a data infrastructure layer that can federate queries across multiple sources, cache frequently accessed data for low-latency retrieval, and keep indexes fresh through real-time synchronization. Spice provides SQL federation, query acceleration, and hybrid search in a single runtime purpose-built for these requirements.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## Sidecar vs Microservice Architecture: How to Choose URL: https://spice.ai/learn/sidecar-vs-microservice-architecture Date: 2026-03-10T00:00:00 Description: Sidecar and microservice are two deployment architectures for data and AI runtimes. Learn the key differences in latency, scaling, resource usage, and when to use each pattern. When deploying a data or AI runtime -- a query engine, inference server, or acceleration layer -- one of the first architectural decisions is how the runtime relates to the applications that consume it. The two most common patterns are the **sidecar** and the **microservice** (centralized) deployment. The sidecar pattern co-locates the runtime alongside each application instance, typically in the same Kubernetes pod or on the same machine. The microservice pattern deploys the runtime as a standalone service, independently scaled and accessed over the network. Neither approach is universally better. The right choice depends on latency requirements, scale, resource constraints, and organizational structure. This guide explains how each architecture works, compares them across the dimensions that matter in production, and provides a decision framework for choosing the right pattern. ## How Sidecar Architecture Works In a sidecar deployment, the data or AI runtime runs as a secondary process alongside the primary application -- in the same Kubernetes pod, the same virtual machine, or the same container group. The application communicates with the sidecar over the local loopback interface (`localhost`), eliminating network hops between the application and its runtime. ```mermaid flowchart LR subgraph Pod A A1[App A] --> S1[Runtime Sidecar] end subgraph Pod B A2[App B] --> S2[Runtime Sidecar] end S1 --> DB[(Data Sources)] S2 --> DB ``` Key characteristics of sidecar deployments: - **Local loopback communication.** The application talks to the runtime over `localhost`, avoiding network latency, DNS resolution, and load balancer overhead. Round-trip times are measured in microseconds rather than milliseconds. - **Lifecycle coupling.** The sidecar starts, stops, and restarts with the application pod. There is no separate deployment pipeline or versioning to manage -- the runtime version is pinned to the application deployment. - **Per-instance resource allocation.** Each application pod gets its own runtime instance with dedicated CPU, memory, and any locally [accelerated data](/learn/data-acceleration). There is no contention between applications. - **Data locality.** Accelerated datasets are replicated to each sidecar instance. Queries against cached data never leave the machine, delivering consistent sub-millisecond response times. The tradeoff is resource duplication. If ten application pods each run a sidecar, the cluster runs ten copies of the runtime, each consuming CPU and memory. Accelerated datasets are replicated to each sidecar, multiplying storage usage. For small-to-moderate deployments, this overhead is manageable. At large scale, it can become expensive. ### When Sidecar Architecture Excels Sidecar deployments are well-suited to scenarios where latency dominates other concerns: - **Real-time decision-making.** A trading bot that needs sub-millisecond access to market data benefits from having the data runtime in the same pod. Every network hop adds latency that can translate into missed opportunities. 
- **Latency-critical AI inference.** Applications that call an [LLM or embedding model](/learn/llm-inference) as part of a request-response cycle benefit from local inference where the model runtime is co-located with the calling application. - **Autonomous edge deployments.** When applications run at edge locations with unreliable network connectivity, a sidecar ensures the runtime remains available even if the connection to central services drops. ## How Microservice Architecture Works In a microservice deployment, the data or AI runtime runs as an independent service -- one or more replicas behind a load balancer, accessed over the network via HTTP, gRPC, or a database protocol like Arrow Flight SQL. The runtime is decoupled from any single application and serves multiple consumers. ```mermaid flowchart LR A1[App A] --> LB[Load Balancer] A2[App B] --> LB A3[App C] --> LB LB --> R1[Runtime Replica 1] LB --> R2[Runtime Replica 2] R1 --> DB[(Data Sources)] R2 --> DB ``` Key characteristics of microservice deployments: - **Loose coupling.** The runtime has its own deployment lifecycle, versioning, and scaling rules. It can be upgraded, restarted, or scaled without touching application deployments. - **Shared infrastructure.** A single runtime service can serve multiple applications and teams. Data is [federated and accelerated](/platform/sql-federation-acceleration) once and shared across all consumers rather than duplicated per pod. - **Independent scaling.** The runtime scales based on its own resource utilization and query load, not on the number of application pods. If query traffic spikes, the runtime auto-scales without requiring the applications to scale in tandem. - **Network hop.** Every query travels over the network from the application to the runtime service. Even within the same cluster, this adds latency compared to local loopback -- typically single-digit milliseconds, but measurable for latency-sensitive workloads. The tradeoff is added infrastructure complexity. The microservice requires service discovery, health checks, load balancing, and connection pooling. Network partitions, DNS failures, or load balancer misconfiguration can disrupt connectivity between applications and the runtime. ### When Microservice Architecture Excels Microservice deployments fit scenarios where sharing, scaling, and operational independence matter more than absolute latency: - **Shared data and AI platform.** When multiple applications or teams need access to the same federated data layer and [connector integrations](/integrations), a centralized microservice avoids duplicating configuration and accelerated datasets across dozens of sidecars. - **Variable traffic patterns.** If query load fluctuates significantly -- low during off-hours, high during business hours -- an independently scaled microservice can right-size resources without over-provisioning every application pod. - **Independent release cycles.** When the data platform team needs to upgrade the runtime, patch security vulnerabilities, or add new [data connectors](/integrations) without coordinating with every application team, a decoupled microservice is the right pattern. ## Comparison Table The following table summarizes the key differences between sidecar and microservice architectures across the dimensions that matter most in production. 
| Dimension | Sidecar | Microservice | |---|---|---| | **Latency** | Sub-millisecond via local loopback | Single-digit milliseconds over the network | | **Scaling** | Scales with application pods | Scales independently based on query load | | **Resource usage** | Runtime duplicated per pod; higher aggregate resource cost | Shared runtime; more efficient resource utilization | | **Data acceleration** | Accelerated data replicated to each sidecar | Single shared acceleration cache | | **Deployment coupling** | Tightly coupled to application lifecycle | Independent deployment and versioning | | **Operational complexity** | Low -- no service discovery or load balancing needed | Higher -- requires load balancer, health checks, connection pooling | | **Multi-tenant access** | One application per sidecar | Multiple applications and teams share one service | | **Failure blast radius** | Failure affects only one application pod | Failure can affect all consuming applications | | **Cost at scale** | Higher -- N copies of runtime for N pods | Lower -- shared replicas serve all consumers | | **Best for** | Latency-critical, small-to-moderate scale | Shared platform, variable traffic, large organizations | Neither column is strictly better. The right choice depends on the workload requirements, which the decision framework below addresses. ## Decision Framework Use the following questions to determine which architecture fits each deployment scenario. ### 1. How sensitive is the application to latency? - **Sub-millisecond required:** Sidecar -- local loopback eliminates network overhead - **Single-digit milliseconds acceptable:** Microservice works well - **Mixed requirements:** Use a tiered approach (described below) ### 2. How many applications consume the runtime? - **One or a few tightly coupled applications:** Sidecar keeps things simple - **Many applications across multiple teams:** Microservice avoids duplicating configuration and accelerated data across sidecars - **Both:** Central microservice for shared access, sidecars for latency-critical paths ### 3. What are the resource constraints? - **Resource-constrained environment (edge, small clusters):** Evaluate whether duplicating the runtime per pod is feasible. A single microservice may use fewer total resources - **Ample cluster resources:** Sidecar duplication overhead is tolerable for the latency benefit - **Cost-sensitive at scale:** Microservice -- sharing runtime replicas is more efficient than running one per pod ### 4. How independent are deployment lifecycles? - **Application and runtime release together:** Sidecar simplifies coordination -- same pod, same deployment - **Runtime team and application teams release independently:** Microservice decouples release cycles - **Mixed:** Microservice with pinned versions for stability-critical consumers ### 5. What is the expected scale? 
- **Small-to-moderate (dozens of pods):** Sidecar duplication overhead is manageable - **Large (hundreds or thousands of pods):** Microservice avoids the cost of running hundreds of runtime instances - **Growing rapidly:** Start with microservice to avoid re-architecting as scale increases ### Summary Matrix | Scenario | Recommended architecture | |---|---| | Real-time trading bot needing sub-millisecond data access | Sidecar | | Shared AI inference engine serving multiple teams | Microservice | | Edge deployment with unreliable connectivity | Sidecar | | Large org where 20+ services query the same data layer | Microservice | | Latency-critical app with variable query traffic | Sidecar with auto-scaling, or tiered approach | | Cost-sensitive cluster with limited resources | Microservice | ## Tiered and Hybrid Approaches In practice, many organizations combine sidecar and microservice patterns in a tiered architecture. This approach uses sidecars for performance-critical paths and a centralized microservice for everything else. A common tiered pattern consists of: - **Edge tier.** Sidecars deployed at edge locations for low-latency local access and offline resilience. - **Application tier.** Sidecars co-located with latency-sensitive applications that require sub-millisecond data access or inline AI inference. - **Platform tier.** A centralized microservice deployment that serves shared queries, batch workloads, and applications where single-digit-millisecond latency is acceptable. This tiered model lets teams optimize each workload independently. A real-time fraud detection service might run a sidecar for instant access to accelerated risk scores, while a reporting dashboard queries the same data through the centralized microservice. Both consume the same [data connectors](/integrations) and acceleration layer -- just at different latency tiers. ## Advanced Topics ### Multi-Cluster Federation In distributed enterprises, data and AI runtimes may span multiple Kubernetes clusters across regions or cloud providers. Multi-cluster federation adds a routing layer that directs queries to the nearest or most appropriate runtime instance. Sidecar deployments in each cluster can serve local reads, while a central microservice handles cross-cluster queries that require joining data from multiple regions. The key challenge is consistency. When accelerated data is replicated to sidecars across clusters, each sidecar's cache may be at a slightly different point in time. Architectures that require strong consistency across clusters typically route those queries to a single authoritative microservice instance, accepting the latency penalty for correctness. ### Service Mesh Integration In Kubernetes environments, service meshes like Istio or Linkerd add observability, mutual TLS, and traffic management to service-to-service communication. For microservice deployments, the service mesh provides load balancing, circuit breaking, and retry logic that improve reliability between applications and the runtime. Sidecar deployments benefit from service meshes differently. Since the application-to-runtime communication happens over `localhost`, the mesh proxy does not intercept it. However, the mesh still manages outbound traffic from the sidecar to data sources, providing encryption and observability for those connections. ### Resource Optimization Strategies Both architectures can be optimized to reduce resource overhead. 
For sidecar deployments, key strategies include limiting the datasets accelerated at each sidecar to only those the co-located application needs, using memory-mapped storage for large accelerated datasets to reduce RSS memory pressure, and configuring CPU limits to prevent the sidecar from starving the primary application. For microservice deployments, optimization focuses on connection pooling to reduce per-query overhead, query result caching to avoid redundant source queries, and horizontal pod autoscaling tuned to query concurrency rather than CPU utilization. [Data acceleration](/learn/data-acceleration) in the microservice tier reduces load on upstream data sources and improves query latency for frequently accessed datasets. ## Deployment Architectures with Spice [Spice](/platform/sql-federation-acceleration) supports both sidecar and microservice deployment patterns natively, along with tiered and cluster architectures for enterprise workloads. In **sidecar mode**, Spice deploys alongside the application in the same pod. The application queries Spice over `localhost` via Arrow Flight SQL, HTTP, or gRPC. Accelerated datasets are cached locally in Apache Arrow (in-memory) or DuckDB (on-disk), delivering sub-millisecond query latency. This pattern works well for latency-critical applications that need real-time access to [federated and accelerated data](/use-case/datalake-accelerator). In **microservice mode**, Spice runs as an independent service with one or more replicas behind a load balancer. Multiple applications and teams share a single Spice deployment, querying the same [federated data sources](/integrations) and acceleration caches. The runtime scales independently based on query traffic, and the platform team manages it separately from application deployments. For organizations with mixed requirements, Spice supports a **tiered architecture** where sidecars serve performance-critical paths and a centralized microservice handles shared workloads. Edge, application, and platform tiers can each run Spice with different acceleration configurations tuned to their latency and throughput requirements. At enterprise scale, Spice provides a **cluster deployment** on Kubernetes with high availability, advanced security, centralized monitoring, and commercial support. The cluster architecture builds on the microservice pattern with multi-replica coordination, automated failover, and [operational data lakehouse](/use-case/operational-data-lakehouse) capabilities for mission-critical workloads. A sidecar deploys alongside the application in the same pod or machine, communicating over local loopback for sub-millisecond latency. A microservice runs as an independent service accessed over the network, enabling independent scaling and shared access across multiple applications. The core tradeoff is latency versus resource efficiency and operational independence.
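From the application's point of view, the two deployment modes differ only in the endpoint it connects to. A sketch using the Python ADBC Flight SQL driver -- the port, Kubernetes service name, and table are illustrative assumptions, not a prescribed configuration:

```python
# Sketch: identical application code in both modes; only the endpoint changes.
# Port, service name, and table are illustrative assumptions.
import adbc_driver_flightsql.dbapi as flight_sql

SIDECAR_URI = "grpc://localhost:50051"                       # same pod
MICROSERVICE_URI = "grpc://runtime.data-platform.svc:50051"  # via service/LB

def fetch_orders(uri: str):
    with flight_sql.connect(uri) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, total_amount FROM orders LIMIT 10")
        return cur.fetchall()

rows = fetch_orders(SIDECAR_URI)       # local loopback: microsecond round-trips
rows = fetch_orders(MICROSERVICE_URI)  # one network hop: single-digit milliseconds
```

Because only the URI changes, moving a workload between tiers is a configuration change rather than a rewrite.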

### When should I choose a sidecar architecture over a microservice?

Choose a sidecar when your application requires sub-millisecond data access, when lifecycle coupling with the application simplifies operations, or when you are deploying at edge locations with unreliable network connectivity. Sidecar is best for small-to-moderate scale where the resource overhead of duplicating the runtime per pod is acceptable.

### Does a microservice architecture introduce significant latency?

A microservice adds a network hop compared to a sidecar, typically single-digit milliseconds within the same Kubernetes cluster. For most applications -- dashboards, batch analytics, shared AI inference -- this latency is negligible. For latency-critical workloads like real-time trading or inline fraud detection, the difference can matter, making the sidecar pattern more appropriate.

### Can I combine sidecar and microservice patterns in the same system?

Yes. A tiered architecture uses sidecars for performance-critical paths and a centralized microservice for shared or batch workloads. This is common in production -- for example, a real-time pricing engine runs a sidecar for sub-millisecond access, while reporting dashboards query the same data through a centralized microservice deployment.

### How does the sidecar pattern affect resource usage at scale?

Each application pod runs its own copy of the runtime, so resource usage scales linearly with the number of pods. If 50 pods each run a sidecar with 2 GB of accelerated data, the cluster uses 100 GB of memory for data alone. Microservice architectures are more resource-efficient at scale because a shared set of replicas serves all consumers without per-pod duplication.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## SQL Federation vs ETL: How to Choose URL: https://spice.ai/learn/sql-federation-vs-etl Date: 2026-02-20T00:00:00 Description: SQL federation and ETL are two approaches to accessing data across distributed systems. Learn the key differences, when to use each, and how modern platforms combine both for real-time performance. Most organizations store data across many systems: transactional databases like PostgreSQL and MySQL, analytical warehouses like Databricks and Snowflake, object stores like Amazon S3, and SaaS platforms like Salesforce. When applications or analytics need to combine data from several of these systems, teams must choose how to bridge the gap. The two dominant approaches are **ETL (extract, transform, load)** and **SQL federation**. ETL copies data from source systems into a central warehouse on a schedule. SQL federation queries data in place across sources at runtime using a single SQL interface. Neither approach is universally better -- they solve different problems and make different tradeoffs around freshness, performance, complexity, and cost. This guide explains how each approach works, compares them across the dimensions that matter in production, and provides a framework for choosing the right pattern for your workloads. ## How ETL Works ETL is the traditional approach to centralizing data. A pipeline extracts data from source systems, transforms it into the target schema, and loads it into a central warehouse or data lake. ### Extract The pipeline connects to each source system and reads data -- either a full snapshot or an incremental batch based on timestamps or sequence numbers. Extraction can be scheduled (hourly, daily) or triggered by events. ### Transform Raw data is cleaned, normalized, and reshaped to match the target schema. Transformations may include deduplication, type casting, joining reference tables, computing derived columns, and enforcing data quality rules. ### Load Transformed data is written to the central warehouse, data lake, or [lakehouse](/use-case/datalake-accelerator). The target system becomes the single source of truth for downstream consumers -- BI tools, dashboards, and analytical queries. ETL pipelines are well-understood and widely supported by tools like Apache Airflow, dbt, Fivetran, and Apache Spark. They work reliably for batch analytics over historical data, but they introduce inherent latency: data in the warehouse is always at least as old as the last pipeline run. ## How SQL Federation Works [SQL federation](/learn/sql-federation) takes a different approach. Instead of moving data, a federation engine connects to each source at query time, translates a single SQL query into source-specific requests, and merges the results. The data stays where it is. A federated query goes through three stages: 1. **Query planning:** The engine parses the SQL, identifies which tables map to which sources, and builds an optimized execution plan with predicate and aggregation pushdown. 2. **Distributed execution:** Sub-queries are dispatched to each source in parallel. Filters and aggregations are pushed down to minimize data transfer. 3. **Result merging:** Partial results are joined, sorted, and formatted in the federation layer before being returned to the application. Federation provides real-time access to data across [30+ source types](/integrations) through a single SQL endpoint. Applications see a unified interface regardless of where data is stored. 
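Concretely, the application issues one SQL statement and receives one merged result set; planning, pushdown, and parallel execution happen inside the engine. A sketch in Python over Arrow Flight SQL, where the endpoint, catalog names, and tables are illustrative assumptions:

```python
# Sketch: a federated query from the application side. Endpoint, catalogs,
# and tables are illustrative assumptions.
import adbc_driver_flightsql.dbapi as flight_sql

FEDERATED_SQL = """
SELECT c.name, SUM(o.total_amount) AS revenue
FROM postgres.customers c
JOIN databricks.orders o ON o.customer_id = c.id
WHERE o.created_at > NOW() - INTERVAL '7 days'  -- pushed down to the source
GROUP BY c.name
ORDER BY revenue DESC
"""

with flight_sql.connect("grpc://localhost:50051") as conn, conn.cursor() as cur:
    # The engine plans per-source sub-queries, runs them in parallel,
    # and merges the partial results into a single result set.
    cur.execute(FEDERATED_SQL)
    for name, revenue in cur.fetchall():
        print(name, revenue)
```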
## Comparison Table The following table summarizes the key differences between SQL federation and ETL across the dimensions that matter most in production deployments. | Dimension | SQL Federation | ETL | |---|---|---| | **Data freshness** | Real-time -- queries hit live sources | Batch -- as fresh as the last pipeline run (minutes to hours) | | **Data movement** | None -- data stays in source systems | Full copy into a central warehouse or lake | | **Time to first query** | Minutes -- configure a connector and query | Days to weeks -- design schemas, build transforms, orchestrate pipelines | | **Schema change handling** | Automatic -- queries execute against the current schema | Manual -- pipeline breaks require code changes and redeployment | | **Query performance** | Bounded by source latency and network; improved with acceleration | Fast for pre-computed, co-located data | | **Storage cost** | No duplication | Duplicate storage in the warehouse | | **Operational overhead** | Low -- no pipelines to monitor | High -- pipeline failures, scheduling, orchestration | | **Best for** | Real-time access, ad-hoc queries, AI workloads | Batch analytics, historical reporting, compliance archives | | **Source availability dependency** | High -- sources must be available at query time | Low -- warehouse is independent after loading | Neither column is strictly better. The right choice depends on the workload requirements, which the decision framework below addresses. ## When ETL Is the Right Choice ETL remains the right approach for several well-defined scenarios. ### Heavy Analytical Workloads on Historical Data When analysts run complex aggregations, window functions, and multi-table joins over months or years of data, co-locating that data in a warehouse optimized for analytical queries delivers the best performance. Federation would require pulling large volumes of data over the network on every query. ### Known, Stable Query Patterns If the same set of reports and dashboards run daily against the same datasets, ETL's batch model is efficient. The upfront cost of building pipelines is amortized over many query executions, and the warehouse can be tuned for those specific access patterns. ### Compliance and Audit Requirements Some regulatory frameworks require durable, timestamped copies of data in a controlled environment. ETL into a governed warehouse or [data lake](/use-case/datalake-accelerator) satisfies these requirements by producing an immutable historical record. ### Source Systems with Limited Query Capacity If a source database cannot handle additional analytical query load -- for example, a production OLTP system under heavy write pressure -- extracting data on a schedule and querying the copy avoids adding load to the source. ## When SQL Federation Is the Right Choice Federation excels in scenarios where freshness, speed-to-value, and cross-source access matter more than pre-computed performance. ### Real-Time Operational Applications Applications that need current data from multiple systems -- operational dashboards, monitoring tools, customer-facing portals -- benefit from federation's real-time access. Stale data from a batch pipeline can lead to incorrect decisions or degraded user experiences. ### AI and Machine Learning Workloads AI models, [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation) systems, and inference pipelines require fresh, multi-source data. 
Federation provides the real-time, cross-source access these workloads demand without building separate data pipelines for each model. ### Ad-Hoc Exploration and Prototyping When data teams need to explore a new data source or prototype a cross-system query, federation eliminates the weeks of pipeline engineering that ETL requires. Configure a connector and start querying in minutes. ### Data Mesh Architectures In data mesh, each domain team owns its data products. Federation enables governed, cross-domain queries without centralizing everything into a monolithic warehouse. Each team maintains autonomy while the organization gets unified access. ## Decision Framework Use the following questions to determine which approach fits each workload. In many organizations, the answer is "both" -- different workloads within the same architecture use different patterns. ### 1. How fresh does the data need to be? - **Seconds to minutes:** Federation, potentially with [change data capture](/learn/change-data-capture) for acceleration cache refresh - **Hours to days:** ETL is sufficient - **Mixed requirements:** Federation for real-time consumers, ETL for batch analytics ### 2. How predictable are the query patterns? - **Known, repeated queries:** ETL can pre-compute and optimize for these patterns - **Ad-hoc, exploratory, or evolving:** Federation adapts without pipeline changes - **Both:** Accelerate known patterns locally; federate the rest on demand ### 3. What is the acceptable time-to-value? - **Minutes:** Federation -- connect and query immediately - **Weeks are acceptable:** ETL with proper schema design and pipeline engineering - **Start fast, optimize later:** Federation first, add acceleration and ETL for mature workloads ### 4. What are the source system constraints? - **Sources can handle additional query load:** Federation is straightforward - **Sources are capacity-constrained:** ETL extracts data during off-peak windows, or federation with [data acceleration](/learn/data-acceleration) caches the data locally to avoid repeated source queries ### 5. What is the data volume? - **Moderate (gigabytes):** Federation with acceleration handles this well - **Very large (terabytes+):** ETL into a [data lake or lakehouse](/use-case/datalake-accelerator) may be more practical for full-scan analytical queries - **Mixed:** Federate smaller, real-time datasets; ETL larger, historical datasets ### Summary Matrix | Scenario | Recommended approach | |---|---| | Real-time dashboard over 3 databases | Federation | | Monthly revenue report over 2 years of data | ETL | | AI model needing fresh features from 5 sources | Federation with acceleration | | Compliance archive of transactional records | ETL | | Ad-hoc exploration of a new data source | Federation | | High-frequency analytics on a data lake | ETL into lakehouse, or federation with acceleration | ## Advanced Topics ### Hybrid Architectures: Combining Federation and ETL In practice, the most effective data architectures use both patterns. Federation handles real-time access and cross-source queries, while ETL pipelines populate warehouses and data lakes for heavy analytical workloads. The challenge is managing the boundary between the two. A common hybrid pattern is **federate-first with selective materialization**. All data sources are accessible via federation by default. 
As query patterns mature and performance requirements become clear, specific datasets are materialized -- either through traditional ETL into a warehouse, or through local acceleration caches kept fresh by [change data capture](/learn/change-data-capture). This approach minimizes upfront pipeline engineering while providing an optimization path for production workloads. The key architectural decision is where to place the boundary between federated and materialized data. Criteria include query frequency (datasets queried hundreds of times per minute should be materialized), latency sensitivity (sub-second requirements demand local acceleration or warehouse co-location), and data volume (very large datasets are expensive to federate repeatedly). ### ELT and the Modern Data Stack The traditional ETL sequence -- extract, transform, load -- has evolved into ELT (extract, load, transform), where raw data is loaded into the warehouse first and transformations happen inside the warehouse using SQL. Tools like dbt popularized this pattern by enabling transformation-as-code within the warehouse. ELT addresses some of ETL's pain points: transformations are version-controlled, testable, and run inside a powerful SQL engine. But ELT still requires extraction pipelines, still introduces batch latency, and still duplicates data into a central store. For organizations evaluating federation vs. ETL, ELT shares most of ETL's tradeoffs -- the key distinction remains batch movement vs. real-time in-place access. Federation complements ELT architectures by providing real-time access to data that hasn't yet been extracted and loaded. An application can query the federation layer for the freshest data while the ELT pipeline processes the same data for historical analytics on a schedule. ### Federation Performance Optimization Raw federation performance depends on source latency, network bandwidth, and the federation engine's ability to push computation down to sources. Several techniques close the gap between federation and co-located warehouse queries. **Predicate pushdown** is the most impactful optimization. When the federation engine pushes `WHERE` clauses to the source, only matching rows are transferred -- reducing network transfer by orders of magnitude for selective queries. **Parallel execution** dispatches sub-queries to independent sources concurrently. A query joining data from PostgreSQL, S3, and Databricks issues all three sub-queries simultaneously rather than sequentially. **Result caching** stores the results of expensive federated queries for a configurable TTL. Subsequent identical queries are served from cache without hitting the sources. This is particularly effective for dashboard queries that refresh on a fixed interval. **Local acceleration** goes further than result caching by materializing entire datasets locally. Instead of caching individual query results, the acceleration layer maintains a full, queryable copy of the dataset that is refreshed via CDC or scheduled sync. This enables sub-second performance for any query pattern against the accelerated dataset, not just previously executed queries. These optimizations can be combined. A federation engine might push predicates to the source, execute sub-queries in parallel, serve frequently accessed datasets from local acceleration, and cache the merged results for identical follow-up queries. 
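As an illustration of the simplest of these techniques, the sketch below implements TTL-based result caching; `run_federated_query` is a hypothetical stand-in for the engine's execution path:

```python
# Sketch of TTL-based result caching for federated queries.
# `run_federated_query` is a hypothetical execution function.
import time

class ResultCache:
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, list]] = {}

    def query(self, sql: str, run_federated_query) -> list:
        now = time.monotonic()
        entry = self._entries.get(sql)
        # Serve identical queries from cache until the TTL expires.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        rows = run_federated_query(sql)  # hits the live sources
        self._entries[sql] = (now, rows)
        return rows
```

Local acceleration extends this idea from individual cached result sets to fully materialized datasets, which is why it can serve arbitrary query patterns rather than only exact repeats.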
## How Spice Bridges Federation and ETL [Spice](/platform/sql-federation-acceleration) combines SQL federation and local data acceleration in a single runtime, providing a practical middle ground between pure federation and full ETL. Queries are federated across [30+ data connectors](/integrations) with automatic predicate pushdown and parallel execution. For datasets that require lower latency than raw federation can deliver, Spice provides local acceleration -- caching data in-memory (Apache Arrow) or on-disk (DuckDB) with [change data capture](/learn/change-data-capture) keeping the cache synchronized with source systems. This hybrid approach gives teams the real-time access and operational simplicity of federation, with the performance characteristics of co-located data -- without building and maintaining traditional ETL pipelines. Data freshness is measured in seconds (via CDC) rather than hours (via batch ETL), and new data sources become queryable in minutes rather than weeks. For large-scale [data lake workloads](/use-case/datalake-accelerator), Spice provides acceleration engines optimized for high-throughput analytical queries over object storage, bridging the gap between data lake scalability and the sub-second performance that applications require. The result is an architecture where federation, acceleration, and ETL coexist. Teams start with federation for immediate access, add acceleration for performance-critical datasets, and retain ETL pipelines only for workloads that genuinely require batch materialization into a central warehouse. For many workloads, yes. Federation eliminates the need for ETL pipelines when the goal is real-time, cross-source data access. However, ETL remains valuable for heavy analytical workloads over large historical datasets, compliance archives, and scenarios where source systems cannot handle additional query load. Most production architectures use both patterns for different workloads.

### Is SQL federation slower than querying a warehouse?

Raw federation can be slower because queries must travel over the network to source systems. However, federation engines with local acceleration close this gap by caching frequently accessed data in-memory or on-disk. With acceleration, federated queries can match warehouse performance while providing real-time data freshness that ETL cannot.

### How does federation handle source system outages?

If a source system is unavailable, federated queries that depend on it will fail. Production federation engines mitigate this with local acceleration caches that can serve queries even when the source is temporarily offline, connection pooling with health checks, and configurable fallback behavior. ETL avoids this issue because the warehouse is independent after data is loaded.

### What happens to ETL pipelines when source schemas change?

Schema changes are one of the biggest operational challenges with ETL. When a source adds a column, changes a data type, or renames a table, ETL pipelines break and require manual fixes. Federation handles schema changes more gracefully because queries execute against the current schema at runtime. The query adapts automatically as long as the referenced columns still exist.

### Can I use federation and ETL together in the same architecture?

Yes, and this is the recommended approach for most organizations. Use federation for real-time access, ad-hoc queries, and AI workloads. Use ETL for batch analytics, historical reporting, and compliance. Spice supports this hybrid model natively -- federated queries and locally accelerated datasets coexist in a single runtime, so teams choose the right pattern per workload without managing separate infrastructure.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## What is SQL Federation? URL: https://spice.ai/learn/sql-federation Date: 2025-12-15T00:00:00 Description: SQL federation lets you query multiple databases and data sources with a single SQL statement, without moving or copying data. Learn how federated queries work, key benefits, and common use cases. Most organizations store data across many systems: transactional databases like PostgreSQL and MySQL, analytical warehouses like Databricks and Snowflake, object stores like Amazon S3, and streaming platforms like Kafka. When an application or analyst needs to combine data from several of these systems, the traditional approach is to build ETL (extract, transform, load) pipelines that copy everything into a central warehouse. SQL federation takes a different approach. Instead of moving data, a federation engine connects to each source at query time, translates a single SQL query into source-specific requests, and merges the results. The data stays where it is. The application sees a single, unified SQL interface. ## How SQL Federation Works A federated query goes through three stages: planning, execution, and merging. ### Query Planning When a query arrives, the federation engine parses the SQL and identifies which tables map to which data sources. It then builds an optimized execution plan. The planner determines which operations -- filters, joins, aggregations, sorts -- can be pushed down to each source system versus which must be handled in the federation layer. This planning step is critical for performance. A well-optimized plan minimizes the amount of data transferred over the network by pushing as much work as possible to the sources. ### Predicate and Aggregation Pushdown Pushdown is the most important optimization in SQL federation. When the engine detects that a filter (e.g., `WHERE created_at > '2026-01-01'`) or aggregation (e.g., `COUNT(*)`, `SUM(amount)`) can be executed natively by the source database, it pushes that operation down rather than pulling all the raw data into the federation layer. For example, consider a query that joins customer records from PostgreSQL with order events from Clickhouse, filtered to the last 30 days: ```sql SELECT c.name, COUNT(o.id) as order_count FROM postgres.customers c JOIN clickhouse.orders o ON c.id = o.customer_id WHERE o.created_at > NOW() - INTERVAL '30 days' GROUP BY c.name ``` A federation engine with good pushdown will: 1. Push the `WHERE o.created_at > ...` filter to Clickhouse, so only recent orders are transferred 2. Potentially push the `COUNT` aggregation partially to each source 3. Pull only the filtered, reduced result sets into the federation layer for the final join Without pushdown, the engine would pull every row from both tables and filter locally -- a much slower and more expensive operation. ### Result Merging After each source returns its partial results, the federation layer applies any remaining operations: cross-source joins, final sorts, limit clauses, and formatting. The merged result is returned to the application as a single result set, indistinguishable from a query against a single database. ## SQL Federation vs. ETL Pipelines ETL and SQL federation solve the same fundamental problem -- accessing data across systems -- but they make different tradeoffs. **Data movement:** ETL copies data from sources into a central warehouse on a schedule. Federation queries data in place at runtime. 
ETL introduces storage duplication and pipeline maintenance. Federation eliminates both but depends on source availability at query time. **Data freshness:** ETL pipelines run on schedules (hourly, daily), so warehouse data is always behind. Federation queries live sources, so results reflect the current state. For AI workloads, real-time dashboards, and operational applications, this freshness difference is significant. **Time to value:** ETL requires schema design, transformation logic, and orchestration before data is queryable. Federation makes a new source available as soon as a connector is configured -- often in minutes. **Performance:** Raw federation queries are bounded by source performance and network latency. ETL trades freshness for speed by pre-computing and co-locating data. The best systems combine both: federation for real-time access, with local acceleration caching for performance-critical queries. **Maintenance:** ETL pipelines break when source schemas change, requiring manual fixes. Federation adapts more gracefully because queries execute against the current schema at runtime. In practice, many production systems use both patterns. Federated queries handle real-time access and ad-hoc exploration, while [acceleration caches](/feature/real-time-change-data-capture) -- kept fresh via change data capture -- provide sub-second performance for latency-sensitive workloads. ## Key Benefits of SQL Federation ### No ETL Pipelines to Maintain Every ETL pipeline is a liability: it can break when source schemas change, it introduces data staleness, and it requires engineering time to build and monitor. Federation eliminates these pipelines for many use cases, reducing the operational burden on data teams. ### Unified SQL Interface Application developers write standard SQL against a single endpoint. The federation engine handles connectivity, dialect translation, and schema mapping across PostgreSQL, MySQL, S3, Databricks, and [30+ other sources](/integrations). Teams don't need to learn each source's query language or manage separate connections. ### Real-Time Data Access Because queries execute against live sources, results reflect the current state of each system. This is critical for operational dashboards, AI workloads, and any application where stale data leads to bad decisions. ### Governed, Secure Access A federation layer provides a single point of access control, audit logging, and policy enforcement. Instead of managing permissions across every source individually, teams define policies once at the federation layer. This simplifies compliance and security, especially in regulated industries like [financial services](/industry/financial-services) and [cybersecurity](/industry/cybersecurity). ## Common SQL Federation Use Cases ### Cross-Database Analytics Join customer data in PostgreSQL with event data in Clickhouse and product data in Amazon S3 -- all in a single query. Federation eliminates the need to pre-join datasets in a warehouse, making it possible to run ad-hoc analytics across any combination of sources. ### AI and Machine Learning Pipelines AI models and [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation) systems need fresh, complete data from multiple sources. Federation provides the real-time, multi-source data access that AI workloads require without building and maintaining separate data pipelines for each model. 
### Operational Data Lakehouses Combine transactional databases with analytical stores and object storage into a [single queryable layer](/use-case/operational-data-lakehouse). Federation bridges the gap between OLTP and OLAP workloads, enabling teams to query operational and analytical data together. ### Data Mesh Architectures In data mesh, each domain team owns its data products. Federation provides governed, cross-domain queries without centralizing data into a monolithic warehouse. Each team maintains autonomy over its data while the federation layer enables organization-wide access. ## SQL Federation with Spice [Spice](/platform/sql-federation-acceleration) combines SQL federation and local acceleration in a single runtime. Queries are federated across [30+ data connectors](/integrations) with predicate pushdown, then frequently accessed data is automatically cached locally for sub-second performance. This combination addresses the main limitation of pure federation -- source latency -- while preserving the benefits of real-time data access. [Change data capture](/learn/change-data-capture) keeps acceleration caches synchronized with source systems, so cached data is always fresh. ## Advanced Topics ### Federation Query Planning Internals The query planner is the most performance-critical component in a federation engine. When a multi-source SQL query arrives, the planner must decompose it into a set of sub-queries that each target a single source, determine the optimal execution order, and decide which operations to execute locally versus remotely. Modern federation engines build a logical plan tree from the parsed SQL, then apply a series of optimizer rules. The most impactful rules include join reordering (choosing which source to query first based on estimated selectivity), predicate pushdown (moving filters as close to the source as possible), and projection pruning (requesting only the columns needed by the final result, rather than `SELECT *` from each source). The planner also handles type coercion between sources. PostgreSQL's `TIMESTAMPTZ`, Clickhouse's `DateTime64`, and S3 Parquet's `TIMESTAMP_MICROS` all represent timestamps differently. The planner inserts cast operations to normalize types before cross-source joins. ```mermaid flowchart LR A[SQL Query] --> B[Query Parser] B --> C[Logical Planner] C --> D[Optimizer Rules] D --> E1[Source Query: PostgreSQL] D --> E2[Source Query: Clickhouse] D --> E3[Source Query: S3] E1 --> F[Merge & Join Layer] E2 --> F E3 --> F F --> G[Final Result Set] ``` ### Cost-Based Optimization Simple rule-based planners apply optimizations in a fixed order, which works for straightforward queries but misses opportunities in complex ones. Cost-based optimizers (CBOs) estimate the execution cost of multiple candidate plans and select the cheapest one. Cost estimation in federation is harder than in a single database because the planner must account for network transfer costs, source-specific query latency, and varying source capabilities. A CBO might estimate that pushing a `GROUP BY` to Clickhouse (which is optimized for aggregations) saves more time than pushing it to PostgreSQL, even if both support the operation. The planner assigns cost weights to network transfer, source compute, and local compute, then evaluates candidate plans against these weights. 
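A toy version of that comparison, with illustrative weights and estimates, might look like:

```python
# Sketch of a cost-based plan comparison. Weights and estimates are
# illustrative assumptions, not real engine statistics.
from dataclasses import dataclass

@dataclass
class CandidatePlan:
    name: str
    network_bytes: float   # estimated bytes moved to the federation layer
    source_seconds: float  # estimated compute time at the source
    local_seconds: float   # estimated compute time in the federation layer

# Relative expense of each resource for this deployment.
W_NETWORK, W_SOURCE, W_LOCAL = 1e-8, 1.0, 0.5

def cost(plan: CandidatePlan) -> float:
    return (plan.network_bytes * W_NETWORK
            + plan.source_seconds * W_SOURCE
            + plan.local_seconds * W_LOCAL)

plans = [
    # Push the GROUP BY to the source: little data crosses the network.
    CandidatePlan("aggregate-at-source", 2e6, 0.4, 0.05),
    # Pull raw rows and aggregate locally: cheap at the source, heavy transfer.
    CandidatePlan("aggregate-locally", 5e9, 0.1, 1.2),
]
best = min(plans, key=cost)  # the planner executes the cheapest candidate
```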
In practice, federation engines maintain statistics about source performance -- average query latency, throughput capacity, and supported pushdown operations -- and use these to inform planning decisions. ### Connection Pooling and Source Management Each federated query opens connections to one or more source databases. Without connection pooling, a burst of concurrent queries could exhaust source connection limits and cause failures. Federation engines maintain connection pools for each configured source, reusing connections across queries and enforcing concurrency limits. Connection pools also handle health checks and failover. If a source becomes temporarily unavailable, the pool marks it unhealthy and returns errors immediately rather than hanging on connection timeouts. When the source recovers, the pool resumes routing queries to it. For latency-sensitive workloads, some engines support read replicas or fallback sources -- if the primary source is slow, the query is routed to a replica or an [acceleration cache](/learn/data-acceleration) instead. ETL copies data from source systems into a central warehouse on a schedule. SQL federation queries data in place at runtime without copying it. ETL introduces latency and pipeline maintenance, while federation provides real-time access but depends on source availability. Many teams combine both: federated queries for real-time access and acceleration caches for performance-critical workloads.

### How does SQL federation handle performance across slow data sources?

Federation engines use predicate pushdown to minimize data transfer and query acceleration (local caching) to avoid repeated round-trips to slow sources. Spice combines federation with local acceleration -- frequently accessed data is cached in-memory or on-disk, enabling sub-second queries even when the underlying source is slow or remote.

### What types of data sources support SQL federation?

Most federation engines support relational databases (PostgreSQL, MySQL, SQL Server), analytical warehouses (Databricks, Snowflake, BigQuery), object stores (S3, Azure Blob, GCS), and streaming systems. The specific connectors vary by engine. Spice supports 30+ data connectors out of the box.

### Is SQL federation suitable for production workloads?

Yes, when paired with query acceleration and proper governance. Raw federation without caching can be slow for latency-sensitive applications. Production-grade federation engines like Spice add local acceleration, connection pooling, and fault tolerance to ensure reliable sub-second performance.

### How does SQL federation differ from data virtualization?

Data virtualization is the broader concept of abstracting data access across sources. SQL federation is a specific implementation that uses SQL as the query interface. All SQL federation is data virtualization, but data virtualization can also include REST APIs, GraphQL, or other query paradigms.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## What is Text-to-SQL? URL: https://spice.ai/learn/text-to-sql Date: 2026-01-15T00:00:00 Description: Text-to-SQL uses large language models to translate natural language questions into SQL queries. Learn how it works, common approaches, key challenges, and production patterns. Databases hold answers to most business questions. But getting those answers requires SQL -- a skill that most stakeholders, analysts, and even many developers don't use daily. The gap between "I want to know X" and `SELECT ... FROM ... WHERE ...` is where text-to-SQL fits in. Text-to-SQL systems accept a natural language question (e.g., "What were our top 10 customers by revenue last quarter?"), generate the corresponding SQL query, execute it against a database, and return the results. The translation is handled by a large language model (LLM) that has been given enough context about the database schema to produce valid, executable SQL. This is not a new idea -- natural language interfaces to databases (NLIDBs) date back to the 1970s. What changed is that modern LLMs are good enough at SQL generation to make these systems practical for real workloads. ## How Text-to-SQL Works A text-to-SQL system has four stages: schema context injection, natural language parsing, SQL generation, and result delivery. ### Schema Context Injection Before the LLM can generate SQL, it needs to understand the database structure. This means providing table names, column names, data types, primary and foreign key relationships, and ideally sample values or column descriptions. This context is injected into the LLM prompt alongside the user's question. The quality of schema context directly determines output quality. An LLM that knows `orders.customer_id` references `customers.id` will generate correct joins. Without that context, it may hallucinate column names or produce syntactically valid but semantically wrong queries. Schema context can be provided statically (a fixed schema description in the prompt) or dynamically (retrieved at query time based on the user's question). Dynamic retrieval is more scalable for large databases with hundreds of tables, because it limits the prompt to only the relevant tables. ### Natural Language Parsing The LLM interprets the user's intent from their natural language input. This involves: - Identifying the entities referenced (tables, columns, metrics) - Understanding temporal references ("last quarter," "year-over-year") - Resolving ambiguity ("revenue" might mean `gross_revenue`, `net_revenue`, or `total_amount` depending on the schema) - Recognizing aggregation intent ("top 10," "average," "total") This step is where most errors originate. Natural language is inherently ambiguous, and the same question can map to different SQL queries depending on business context. ### SQL Generation The LLM produces a SQL query based on the parsed intent and schema context. A well-constructed prompt might yield: ```sql SELECT c.name, SUM(o.total_amount) AS revenue FROM customers c JOIN orders o ON c.id = o.customer_id WHERE o.created_at >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '3 months') AND o.created_at < DATE_TRUNC('quarter', CURRENT_DATE) GROUP BY c.name ORDER BY revenue DESC LIMIT 10 ``` The generated SQL must be syntactically correct for the target database dialect (PostgreSQL, MySQL, DuckDB, etc.), use only columns and tables that actually exist, and correctly express the user's intent. 
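To make "well-constructed prompt" concrete, the sketch below assembles schema DDL, a few-shot example, and dialect instructions into a single request. The schema and the commented-out `llm` client are illustrative assumptions:

```python
# Sketch of prompt assembly for text-to-SQL. Schema, example, and the
# `llm` client are illustrative assumptions.
SCHEMA = """
CREATE TABLE customers (id BIGINT PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT REFERENCES customers(id),  -- FK guides correct joins
    total_amount NUMERIC,
    created_at TIMESTAMPTZ
);
"""

FEW_SHOT = """
Q: How many orders were placed yesterday?
SQL: SELECT COUNT(*) FROM orders
     WHERE created_at >= CURRENT_DATE - INTERVAL '1 day'
       AND created_at < CURRENT_DATE;
"""

def build_prompt(question: str) -> str:
    return (
        "You translate questions into PostgreSQL. Use only the tables and\n"
        "columns in the schema below. Return SQL only.\n\n"
        f"Schema:\n{SCHEMA}\n"
        f"Example:\n{FEW_SHOT}\n"
        f"Q: {question}\nSQL:"
    )

# sql = llm.complete(build_prompt("Top 10 customers by revenue last quarter?"))
```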
### Execution and Result Delivery The generated SQL is executed against the database, and results are returned to the user. In production systems, this step includes validation (checking the SQL for syntax errors or disallowed operations before execution), sandboxing (running the query with restricted permissions), and result formatting (converting tabular results into a human-readable response). ## NSQL: Natural SQL as an Emerging Concept NSQL (natural SQL) refers to an emerging approach where the boundary between natural language and SQL becomes fluid. Rather than treating text-to-SQL as a strict translation problem -- natural language in, SQL out -- NSQL systems allow users to express queries in a mix of natural language and SQL fragments. For example, a user might write: "Show me all orders WHERE total > 1000 from last week." The system interprets the natural language portions ("from last week") and passes through the SQL fragments (`WHERE total > 1000`) directly. NSQL is still an early concept, but it reflects a practical reality: power users often know parts of the SQL they want and prefer to specify those directly rather than relying entirely on LLM interpretation. ## Common Approaches to Text-to-SQL ### Prompt Engineering with General-Purpose LLMs The most common approach uses a general-purpose LLM (GPT-4, Claude, Llama) with a carefully engineered prompt that includes the database schema, query examples, and instructions for SQL generation. This requires no model training and can be deployed immediately. The prompt typically includes: - The database schema (DDL statements or structured descriptions) - A few examples of natural language to SQL mappings (few-shot learning) - Instructions about the target SQL dialect - Business-specific terminology mappings This approach works well for straightforward queries but can struggle with complex joins, subqueries, and domain-specific logic. ### Fine-Tuned Models Fine-tuning a base LLM on a dataset of (question, SQL) pairs from a specific database produces a model that is specialized for that schema. Fine-tuned models are typically more accurate for their target database than general-purpose models, but they require training data, compute resources, and retraining when the schema changes. Open-source models like SQLCoder and NSQL-Llama have been fine-tuned specifically for text-to-SQL tasks and achieve competitive accuracy on benchmarks like Spider and BIRD. ### RAG over Schema Metadata For databases with hundreds or thousands of tables, including the full schema in every prompt is impractical. Retrieval-augmented generation ([RAG](/learn/retrieval-augmented-generation)) addresses this by retrieving only the relevant schema elements at query time. When a user asks a question, the system: 1. Embeds the question and searches a vector index of table and column descriptions 2. Retrieves the most relevant tables and their schemas 3. Constructs a prompt with only the relevant schema context 4. Generates SQL using the focused context This approach scales to large databases and reduces hallucination by limiting the schema surface area the LLM must reason about. ## Challenges in Text-to-SQL ### Ambiguity in Natural Language "Show me active users" could mean users who logged in today, users with active subscriptions, or users who have made a purchase in the last 30 days. Without explicit business definitions, the LLM must guess -- and it often guesses wrong. 
The most reliable mitigation is to include business glossaries or metric definitions in the prompt context: "active user = a user with at least one login event in the last 30 days." ### Complex Joins and Multi-Table Queries Single-table queries are relatively straightforward. Performance degrades significantly for queries that require multiple joins, correlated subqueries, window functions, or CTEs (common table expressions). These queries demand that the LLM correctly trace foreign key relationships across several tables -- a task that increases in difficulty with schema size. ### Hallucinated Column and Table Names LLMs can generate SQL that references columns or tables that don't exist. This is particularly common when the schema context is incomplete or when column names are ambiguous. A model might generate `SELECT user_email FROM users` when the actual column is `email_address`. Schema validation before execution catches these errors, but it doesn't fix them. More advanced systems re-prompt the LLM with the error message, allowing it to self-correct. ### SQL Dialect Differences SQL is not a single language. PostgreSQL, MySQL, BigQuery, DuckDB, and SQL Server each have their own syntax for date functions, string operations, window functions, and type casting. A text-to-SQL system must generate queries in the correct dialect for the target database. ## Evaluating Text-to-SQL Systems Two primary metrics are used to evaluate text-to-SQL accuracy: **Execution accuracy** measures whether the generated SQL, when executed, produces the correct result set. This is the more practical metric -- it doesn't matter if the SQL is different from the reference query as long as the results match. **Exact match accuracy** measures whether the generated SQL exactly matches a reference query. This is stricter and less useful in practice, since many different SQL queries can produce the same results. On the Spider benchmark (a widely used text-to-SQL evaluation dataset), state-of-the-art systems achieve 85-90% execution accuracy on simple queries but drop to 50-70% on complex queries involving multiple joins, nested subqueries, and aggregations. ## Production Patterns for Text-to-SQL ### SQL Validation and Sandboxing Never execute LLM-generated SQL without validation. At minimum, production systems should: - Parse the SQL and verify that all referenced tables and columns exist - Check for disallowed operations (DROP, DELETE, UPDATE in read-only contexts) - Execute the query with a read-only database user and strict resource limits (timeouts, row limits) - Log every generated query for audit and debugging ### Result Verification After execution, verify that the results are reasonable. Common checks include: - Row count sanity checks (a query asking for "top 10" should return at most 10 rows) - Type validation (a "revenue" column should contain numeric values) - NULL handling (unexpected NULLs often indicate a wrong join) ### Multi-Turn Refinement When the first query doesn't match the user's intent, a conversational interface allows the user to refine their question. The system can use the previous query, its results, and the user's feedback to generate an improved query. This iterative approach significantly improves practical accuracy. ## Text-to-SQL with SQL Federation Text-to-SQL becomes more powerful when combined with [SQL federation](/learn/sql-federation). In a federated environment, a single SQL query can access data from PostgreSQL, Databricks, S3, and dozens of other sources simultaneously. 
Text-to-SQL on top of federation means a user can ask a natural language question that spans multiple data systems -- without knowing where the data lives or how to connect to each source. For example, "Compare customer satisfaction scores from our CRM with order volumes from the warehouse" becomes a federated query that joins data across two different systems, all triggered by a single natural language question. The combination also benefits from [LLM inference](/learn/llm-inference) served alongside the data layer. When the LLM that generates SQL and the federation engine that executes it are co-located, the round-trip time from question to answer is minimized. [Tool calling](/learn/llm-tool-calling) patterns enable LLMs to invoke SQL execution as a tool, creating agentic workflows where the model can iteratively query, inspect results, and refine its approach. Spice provides both [SQL federation](/platform/sql-federation-acceleration) and LLM inference in a single runtime, making it possible to build text-to-SQL applications that query across 30+ data sources with sub-second latency. ## Advanced Topics ### Schema-Aware Prompting The difference between a text-to-SQL system that works in demos and one that works in production often comes down to how schema context is structured in the LLM prompt. Naive approaches dump the full DDL (CREATE TABLE statements) into the prompt and hope the model figures out the relationships. Production systems are more deliberate. Effective schema-aware prompting includes several layers of context beyond raw DDL: - **Column descriptions:** Natural language annotations explaining what each column contains. `orders.status` might be an enum with values `pending`, `shipped`, `delivered`, `cancelled` -- without this context, the LLM cannot correctly filter by status. - **Foreign key annotations:** Explicit statements like "orders.customer_id references customers.id" guide the model toward correct joins. Without these, the model may join on column name similarity, which often produces wrong results. - **Sample values:** Including 3-5 representative values for categorical columns (e.g., `region: ['us-east', 'us-west', 'eu-west', 'ap-southeast']`) helps the LLM generate correct filter predicates. - **Business glossary entries:** Definitions like "active customer = a customer with at least one order in the last 90 days" resolve ambiguity before the model encounters it. For large schemas, dynamic schema retrieval is essential. The system embeds the user's question, searches a vector index of table and column descriptions, and includes only the top-k most relevant tables in the prompt. This keeps the prompt focused and reduces hallucination of non-existent columns. ```mermaid flowchart LR A[Natural Language Query] --> B[Schema Retrieval] B --> C[Prompt Assembly] C --> D[LLM Generates SQL] D --> E[SQL Validation] E -->|Valid| F[Query Execution] E -->|Invalid| G[Error Feedback to LLM] G --> D F --> H[Result Formatting] ``` ### Query Validation Pipelines LLM-generated SQL cannot be trusted without validation. A production validation pipeline applies multiple checks before execution: **Syntactic validation** parses the SQL using the target dialect's parser. This catches malformed queries, unclosed parentheses, and invalid keywords before they reach the database. Parsing also produces an AST (abstract syntax tree) that subsequent checks can inspect. **Schema validation** verifies that every table and column referenced in the query actually exists in the database catalog. 
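A minimal sketch of that check, using the open-source sqlglot parser against an in-memory catalog (the catalog contents are hypothetical, and the column check is simplified to a global name lookup rather than per-table scoping):

```python
# Schema validation sketch: parse the SQL, then verify every referenced
# table and column against a known catalog. Catalog is hypothetical.
import sqlglot
from sqlglot import exp

CATALOG = {
    "customers": {"id", "name", "created_at"},
    "orders": {"id", "customer_id", "total_amount", "created_at"},
}

def validate(sql: str) -> list[str]:
    errors = []
    tree = sqlglot.parse_one(sql, read="postgres")
    for table in tree.find_all(exp.Table):
        if table.name not in CATALOG:
            errors.append(f"unknown table: {table.name}")
    # Simplified: checks column names globally, not per table.
    known_columns = set().union(*CATALOG.values())
    for column in tree.find_all(exp.Column):
        if column.name not in known_columns:
            errors.append(f"unknown column: {column.name}")
    return errors

print(validate("SELECT user_email FROM users"))
# ['unknown table: users', 'unknown column: user_email']
```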
This is the most effective defense against hallucinated column names -- the most common failure mode in text-to-SQL systems. Schema validation also checks data types: if the query compares a string column to an integer, the validator flags the type mismatch. **Policy validation** enforces security and governance rules. Common policies include: no DDL statements (DROP, ALTER, CREATE), no DML mutations (INSERT, UPDATE, DELETE) in read-only contexts, no queries against restricted tables or columns, and mandatory WHERE clauses on large tables to prevent full table scans. Policy validation inspects the AST to detect disallowed operations. **Cost estimation** uses the database's EXPLAIN output to estimate query cost before execution. Queries that exceed a cost threshold (indicating a potential full table scan or cartesian join) are rejected or flagged for human review. This prevents runaway queries that could overload the database. When validation fails, the most effective recovery strategy is to feed the error message back to the LLM and ask it to regenerate the query. Most validation errors (wrong column name, missing join condition) are easily correctable with one retry. ### Multi-Turn SQL Conversations Single-turn text-to-SQL -- one question, one query -- covers the simplest use cases. Production systems increasingly support multi-turn conversations where users iteratively refine their queries. In a multi-turn flow, the system maintains a conversation context that includes: the original question, the generated SQL, the query results, and any follow-up questions. When the user says "now filter that to just the US region" or "break that down by month," the system must modify the previous query rather than generating a new one from scratch. The technical challenge is context window management. Each turn adds the previous SQL and results to the prompt, consuming token budget. Production systems manage this by summarizing previous turns (replacing full result sets with row counts and column summaries), maintaining a running SQL query that accumulates modifications, and capping conversation depth (typically 5-10 turns before resetting context). Multi-turn conversations also enable a powerful debugging pattern: when a query returns unexpected results, the user can ask "why are there NULL values in the revenue column?" and the system can inspect the query, identify the likely cause (a LEFT JOIN that produced unmatched rows), and suggest a correction. This iterative refinement loop makes text-to-SQL practical for exploratory analysis where the user doesn't know the exact question upfront. Text-to-SQL treats the problem as a strict translation from natural language to SQL. NSQL (natural SQL) is an emerging concept that allows users to mix natural language and SQL fragments in a single query, giving power users more control over the generated output while still handling natural language interpretation for ambiguous parts.

    ', }, { title: 'How accurate is text-to-SQL in practice?', paragraph: '

    On standard benchmarks, state-of-the-art systems achieve 85-90% execution accuracy on simple queries (single table, basic filters and aggregations). Accuracy drops to 50-70% for complex queries involving multiple joins, subqueries, and window functions. In production, accuracy depends heavily on schema context quality, prompt engineering, and the complexity of the target database.

    ', }, { title: 'What are the security risks of text-to-SQL?', paragraph: '

    The primary risk is a prompt-driven form of SQL injection -- a user's natural language input can steer the LLM into generating destructive SQL (DROP TABLE, DELETE FROM) if not properly constrained. Production systems mitigate this by executing generated queries with read-only database credentials, parsing and validating SQL before execution, disallowing DDL and DML operations, and enforcing query timeouts and row limits.
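As a sketch, an allow-list guard that rejects anything other than a plain SELECT can be written in a few lines with the open-source sqlglot parser (the policy shown is illustrative, not exhaustive):

```python
# Allow-list guard: reject any generated statement that is not a SELECT.
# Simplified: UNION queries, for example, would also need to be allowed.
import sqlglot
from sqlglot import exp

def assert_read_only(sql: str) -> None:
    for statement in sqlglot.parse(sql, read="postgres"):
        if not isinstance(statement, exp.Select):
            raise ValueError(f"blocked non-SELECT statement: {statement.key}")

assert_read_only("SELECT * FROM orders LIMIT 10")  # passes
assert_read_only("DROP TABLE orders")              # raises ValueError
```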

    ', }, { title: 'Can text-to-SQL handle complex multi-table queries?', paragraph: '

    It can, but accuracy decreases with query complexity. Queries requiring multiple joins, correlated subqueries, CTEs, and window functions are significantly harder for LLMs to generate correctly. Providing complete foreign key relationships and example queries in the prompt context improves results. RAG-based schema retrieval also helps by surfacing the most relevant tables.

    ', }, { title: 'Is text-to-SQL production-ready?', paragraph: '

    Text-to-SQL is production-ready for specific, well-scoped use cases -- particularly when paired with SQL validation, sandbox execution, result verification, and human-in-the-loop review for critical queries. It works best as an assistant that generates draft queries for review, rather than a fully autonomous system executing unchecked SQL against production databases.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## What is Vector Search? URL: https://spice.ai/learn/vector-search Date: 2026-03-08T00:00:00 Description: Vector search (semantic search) finds the most similar items by comparing vector embeddings using distance metrics like cosine similarity. Learn how ANN algorithms like HNSW work, vector index tradeoffs, and when to use vector search vs. keyword search. Traditional search systems match keywords. Vector search matches meaning. When a user searches for "how to fix a slow API," vector search can find documents about "improving endpoint latency" or "API performance optimization" -- even though these phrases share no words with the query. This capability has made vector search a foundational technology for AI applications, from semantic search to [retrieval-augmented generation (RAG)](/learn/retrieval-augmented-generation). Vector search works by converting text (or images, code, or any data) into numerical vectors called [embeddings](/learn/embeddings), then finding the vectors in a database that are closest to the query vector. The "closeness" is measured by a distance metric, and specialized index structures make this lookup fast even across millions or billions of vectors. ## How Vector Search Works The vector search pipeline has two phases: indexing and querying. ### Indexing At index time, every document (or document chunk) is passed through an embedding model to produce a dense vector -- a list of floating-point numbers, typically 384 to 3072 dimensions. These vectors are stored in a vector index alongside metadata (document ID, text, source, timestamps). ### Querying At query time: 1. The search query is passed through the same embedding model to produce a query vector 2. The vector index finds the **k nearest neighbors** -- the k stored vectors closest to the query vector 3. The corresponding documents are returned, ranked by similarity ```mermaid flowchart LR A[Query] --> B[Embed] B --> C[ANN Index Lookup] C --> D[Top-k Nearest Vectors] D --> E[Results] ``` The critical property is that vectors for semantically similar content are close together in the embedding space. "Cancel my subscription" and "terminate my account" produce nearby vectors because the embedding model learned that these phrases have similar meanings. ## Distance Metrics The choice of distance metric determines how "closeness" between vectors is measured: ### Cosine Similarity Measures the angle between two vectors, ignoring their magnitudes. Two vectors pointing in the same direction have a cosine similarity of 1, orthogonal vectors have 0, and opposing vectors have -1. This is the most common metric because it handles vectors of different magnitudes gracefully. ``` cosine_similarity(A, B) = (A . B) / (||A|| * ||B||) ``` ### Dot Product Computes the sum of element-wise products. Unlike cosine similarity, the dot product is affected by vector magnitude -- longer vectors produce larger dot products. This is useful when magnitude carries information (e.g., a document's relevance or importance). Many embedding models produce normalized vectors, in which case dot product and cosine similarity are equivalent. ``` dot_product(A, B) = sum(A[i] * B[i]) ``` ### Euclidean Distance (L2) Measures the straight-line distance between two points in vector space. Smaller distances indicate greater similarity. Euclidean distance is sensitive to vector magnitude and works best when vectors are normalized to unit length. 
``` euclidean_distance(A, B) = sqrt(sum((A[i] - B[i])^2)) ``` In practice, cosine similarity is the default choice for text [embeddings](/learn/embeddings). If your embedding model produces normalized vectors (most modern models do), all three metrics produce equivalent rankings. ## Approximate Nearest Neighbor (ANN) Algorithms Exact nearest neighbor search requires comparing the query vector against every stored vector -- an O(n) operation that becomes prohibitively slow at scale. **Approximate nearest neighbor (ANN)** algorithms trade a small amount of recall accuracy for dramatic speed improvements, making it possible to search millions of vectors in milliseconds. ### HNSW (Hierarchical Navigable Small World) HNSW is the most widely used ANN algorithm. It builds a multi-layer graph where: - The bottom layer contains all vectors, connected to their nearest neighbors - Higher layers contain progressively fewer vectors (a random subset), forming "express lanes" - Search starts at the top layer and navigates greedily toward the query vector, dropping to lower layers as it gets closer This hierarchical structure enables O(log n) search complexity. HNSW provides excellent recall (typically 95-99%) with sub-millisecond query times on datasets of millions of vectors. ### IVF (Inverted File Index) IVF partitions the vector space into clusters using k-means clustering. At query time, only the vectors in the nearest clusters are searched, reducing the search space dramatically. 1. **Index time:** Cluster all vectors into **nlist** partitions using k-means 2. **Query time:** Find the **nprobe** nearest cluster centroids, then search only the vectors within those clusters IVF is faster to build than HNSW and uses less memory, but typically achieves lower recall at the same query speed. It works well when combined with other techniques like product quantization. ### Product Quantization (PQ) Product quantization compresses vectors to reduce memory usage. It divides each vector into sub-vectors, quantizes each sub-vector to the nearest centroid in a learned codebook, and stores only the centroid IDs. This reduces memory by 10-100x at the cost of some accuracy. PQ is often combined with IVF (IVF-PQ) or HNSW (HNSW-PQ) to enable vector search on datasets too large to fit in memory at full precision. ## Vector Index Tradeoffs Choosing and configuring a vector index involves balancing three factors: | Factor | HNSW | IVF | IVF-PQ | | ------------ | --------------------------- | ----------------------------------- | ----------------------------- | | Recall | High (95-99%) | Moderate (85-95%) | Lower (80-90%) | | Query speed | Fast (sub-ms) | Fast (sub-ms) | Fast (sub-ms) | | Memory usage | High (full vectors + graph) | Moderate (full vectors + centroids) | Low (compressed vectors) | | Build time | Slow (graph construction) | Moderate (k-means) | Moderate (k-means + codebook) | | Update cost | Low (incremental insert) | High (re-clustering needed) | High (re-clustering needed) | For most applications with fewer than 10 million vectors, HNSW is the default choice -- it provides the best recall with acceptable memory usage. For larger datasets or memory-constrained environments, IVF-PQ provides a good balance. ## Vector Search vs. 
Keyword Search Vector search and [BM25 full-text search](/learn/bm25-full-text-search) have complementary strengths: | Aspect | Vector Search | Keyword Search (BM25) | | -------------------------------------------------- | -------------------------------------- | -------------------------- | | Matches | Semantic meaning | Exact terms | | "fix slow API" vs. "endpoint latency optimization" | Match | No match | | Error code "ERR-4502" | Weak (may match generic error content) | Exact match | | Vocabulary mismatch handling | Strong | None | | Interpretability | Low (opaque similarity scores) | High (which terms matched) | | Index type | Vector index (HNSW, IVF) | Inverted index | | Index storage | Dense vectors (KB per document) | Posting lists (smaller) | Neither method is strictly superior. Vector search excels at understanding intent and handling vocabulary mismatch. Keyword search excels at matching exact terms, identifiers, and technical terminology. In practice, [hybrid search](/learn/hybrid-search) -- running both methods and fusing results with algorithms like Reciprocal Rank Fusion -- delivers the best results for most production use cases. ## Vector Databases vs. Vector Search in SQL Engines Teams implementing vector search have two architectural choices: **Dedicated vector databases** (Pinecone, Weaviate, Qdrant, Milvus) are purpose-built for vector storage and search. They provide optimized ANN algorithms, built-in metadata filtering, and managed scaling. The tradeoff is another system to deploy, another data pipeline to maintain, and no native SQL support for joining vector results with relational data. **Vector search in SQL engines** embeds vector indexing within a SQL-compatible query engine. This approach allows vector similarity search alongside standard SQL queries -- filtering, joining, aggregating -- in a single system. The tradeoff is that general-purpose SQL engines may not match the raw vector search performance of dedicated systems, though for most workloads the difference is negligible. For [RAG applications](/learn/retrieval-augmented-generation) and [enterprise AI use cases](/use-case/secure-ai-agents), the ability to combine vector search with SQL is a significant advantage. Filtering search results by metadata (date ranges, access permissions, document categories), joining with relational data, and expressing complex retrieval logic in SQL eliminates the application-layer glue code required when vector search and SQL live in separate systems. ## Vector Search with Spice [Spice](/platform/hybrid-sql-search) provides vector similarity search alongside [BM25 full-text search](/learn/bm25-full-text-search) and SQL in a single unified runtime. This enables [hybrid search](/learn/hybrid-search) -- combining vector and keyword results with built-in RRF fusion -- without managing separate systems. 
Key capabilities: - **Vector, full-text, and SQL search** in one query engine -- store [embeddings](/learn/embeddings), build vector indexes, and search alongside your relational data - **[SQL federation](/learn/sql-federation)** across [30+ connected data sources](/integrations), with vector search results joinable with federated data - **[Real-time CDC](/learn/change-data-capture)** to keep vector indexes fresh as source data changes - **[LLM inference](/learn/llm-inference)** for generating embeddings alongside search queries in the same runtime ```sql -- Vector similarity search in Spice SELECT * FROM search( 'knowledge_base', 'how to optimize query performance', mode => 'vector', limit => 10 ) ``` The unified approach eliminates the operational complexity of maintaining separate vector databases and keeping them synchronized with your primary data stores. When source data changes, both vector and keyword indexes update through the same [change data capture](/learn/change-data-capture) pipeline, ensuring consistent search results across all modalities. ## Advanced Topics ### HNSW Internals HNSW (Hierarchical Navigable Small World) constructs a proximity graph with a hierarchical structure inspired by skip lists. Understanding its internals helps with tuning: **Graph construction:** When inserting a new vector, HNSW assigns it a random maximum layer (drawn from an exponential distribution). The vector is then connected to its nearest neighbors at each layer from the top down. The greedy search used during insertion finds these neighbors efficiently. **Key parameters:** - **M:** The number of bi-directional links per node at each layer. Higher M increases recall and memory usage. Typical values are 16-64. - **ef_construction:** The size of the dynamic candidate list during index construction. Higher values produce a better graph (higher recall) at the cost of slower build times. Typical values are 100-400. - **ef_search:** The size of the dynamic candidate list during search. Higher values increase recall at the cost of query latency. This is the primary parameter for tuning the recall-speed tradeoff at query time. The relationship between these parameters is: ef_construction determines the quality ceiling of the graph, M determines the memory footprint, and ef_search controls the runtime tradeoff between recall and speed. ### Filtered Vector Search In practice, vector search rarely operates in isolation -- users want to filter results by metadata (date ranges, categories, access permissions) alongside semantic similarity. Filtered vector search combines vector nearest-neighbor queries with predicate-based filtering. There are three approaches: 1. **Pre-filtering:** Apply metadata filters first, then search only the matching vectors. This is precise but can be slow if the filter is very selective (few matching vectors). 2. **Post-filtering:** Run vector search first, then filter results by metadata. This is fast but may return fewer than k results if many top candidates are filtered out. 3. **Integrated filtering:** Apply filters during the ANN search traversal. This is the most sophisticated approach, supported by modern vector indexes, and balances speed with precision. The choice depends on filter selectivity. Highly selective filters (matching < 1% of documents) favor pre-filtering. Broad filters (matching > 50%) favor post-filtering or integrated filtering. ### Multi-Vector Retrieval (ColBERT) Standard vector search represents each document as a single embedding vector. 
**ColBERT** (Contextualized Late Interaction over BERT) takes a different approach: it represents each document as a set of token-level vectors (one per token) and computes similarity using late interaction. At query time, each query token's vector is compared against all document token vectors using a MaxSim operation -- for each query token, find its maximum similarity to any document token, then sum these maximums. This token-level matching is more expressive than single-vector comparison because it can capture fine-grained relevance signals. ColBERT achieves higher retrieval quality than single-vector models, especially on queries requiring precise term-level matching. The tradeoff is significantly higher storage requirements (one vector per token instead of one per document) and more complex index structures. Recent work on compressed ColBERT representations (ColBERTv2) reduces storage costs while maintaining most of the quality advantage. They are often used interchangeably. Semantic search is the broader concept of searching by meaning rather than keywords. Vector search is the specific technique that powers semantic search -- encoding content as vectors and finding nearest neighbors. In practice, saying "vector search" implies the same capability as "semantic search."

    ', }, { title: 'How much memory does a vector index require?', paragraph: '

    Memory depends on the number of vectors, their dimensions, and the index type. A rough formula for HNSW: memory (bytes) = num_vectors * (dimensions * 4 + M * 8 + overhead). For example, 1 million 768-dimensional vectors with M=16 requires approximately 3.2 GB of memory. Product quantization can reduce this by 10-100x at the cost of some recall accuracy.
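That figure follows directly from the formula (a quick sanity check):

```python
# HNSW memory estimate: num_vectors * (dimensions * 4 bytes for float32
# values + M * 8 bytes for graph links), ignoring fixed overhead.
num_vectors, dimensions, M = 1_000_000, 768, 16
memory_gb = num_vectors * (dimensions * 4 + M * 8) / 1e9
print(f"{memory_gb:.1f} GB")  # 3.2 GB
```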

    ', }, { title: 'What recall rate should I target for production?', paragraph: '

    For most applications, 95% recall or higher is a good target. This means 95% of the true nearest neighbors are returned by the approximate search. For RAG applications where retrieval quality directly determines answer quality, aim for 98-99% recall. You can increase recall by tuning ef_search (HNSW) or nprobe (IVF) at the cost of slightly higher query latency.

    ', }, { title: 'Can I update vectors in place without rebuilding the index?', paragraph: '

    It depends on the index type. HNSW supports incremental inserts and deletes without rebuilding -- new vectors are connected into the existing graph. IVF-based indexes may require periodic re-clustering as the data distribution changes. In practice, most production systems use HNSW for workloads that require frequent updates.

    ', }, { title: 'When should I use vector search vs. hybrid search?', paragraph: '

    Use pure vector search when queries are primarily conceptual and vocabulary mismatch is the dominant challenge (e.g., natural language questions against a knowledge base). Use hybrid search when queries may include exact identifiers, technical terms, or product names that must be matched precisely. In most production systems, hybrid search outperforms pure vector search because it captures both semantic and lexical signals.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## What is Vortex? URL: https://spice.ai/learn/vortex Date: 2026-02-11T00:00:00 Description: Vortex is an open-source compressed columnar file format designed for analytical queries. Learn how Vortex compares to Parquet, its adaptive encoding system, and when to use it over Parquet for analytical workloads. Analytical workloads -- dashboards, AI pipelines, federated queries -- depend on fast, efficient reads over large columnar datasets. Apache Parquet has been the standard columnar file format for over a decade, but its design predates many of the techniques that modern hardware and query engines can exploit: memory-mapped I/O, zero-copy reads, and adaptive per-column encoding. Vortex is a new open-source columnar file format built from scratch to take advantage of these capabilities. It is developed by Spiral, the open-source team behind Spice, and is designed specifically for the demands of analytical query engines that need to scan, filter, and aggregate data as fast as possible. ## How Vortex Works At its core, Vortex stores data in a columnar layout -- each column is stored independently, so a query that only needs three columns out of fifty reads only those three. This is the same fundamental principle behind Parquet, ORC, and other columnar formats. Where Vortex diverges is in how it encodes and compresses the data within each column. ### Adaptive Encoding Traditional columnar formats apply a single encoding scheme per column (or per row group). Parquet, for example, uses dictionary encoding for low-cardinality columns and falls back to plain encoding otherwise. The encoding is chosen at write time and remains fixed. Vortex takes a different approach: it uses a cascading encoding system that adapts to the actual data distribution within each column segment. Rather than selecting a single encoding, Vortex can layer multiple encodings on top of each other: - **Dictionary encoding** for columns with repeated values - **Run-length encoding (RLE)** for columns with consecutive repeated values - **Frame-of-reference (FOR)** encoding for columns with values clustered around a base - **Bit-packing** for integer columns that don't use the full bit width - **Delta encoding** for monotonically increasing sequences like timestamps - **Constant encoding** for segments where every value is the same The encoding selection happens per column segment, not per column. A single column can use different encodings for different portions of the data, depending on the local distribution. This means Vortex consistently achieves better compression ratios than formats that apply a single encoding globally. ### Zero-Copy Reads and Memory Mapping Vortex is designed for zero-copy reads from memory-mapped files. When a query engine accesses a Vortex file, it can memory-map the file and read encoded data directly without first decompressing the entire column into a separate buffer. The encodings are designed so that common operations -- scanning, filtering, aggregation -- can operate directly on the encoded representation. This is a significant architectural difference from Parquet, where data must be fully decompressed and decoded before a query engine can process it. With Vortex, decompression is lazy: only the data actually needed by the query is decoded, and only at the point of use. 
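To make the cascading-encoding idea described above concrete, here is a toy sketch -- not the actual Vortex machinery -- of delta-encoding a timestamp column and then measuring how few bits the residuals need before bit-packing:

```python
# Toy cascade: delta-encode a mostly-increasing sequence, then observe how
# few bits the residuals need compared to the raw 64-bit values.
# Illustrative only -- not the actual Vortex encoding implementation.

def delta_encode(values: list[int]) -> tuple[int, list[int]]:
    base = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return base, deltas

timestamps = [1_767_225_600 + i * 30 for i in range(1000)]  # 30s apart
base, deltas = delta_encode(timestamps)

raw_bits = 64                          # storage type bit width
delta_bits = max(deltas).bit_length()  # effective bit width after delta
print(f"raw: {raw_bits} bits/value, after cascade: {delta_bits} bits/value")
# raw: 64 bits/value, after cascade: 5 bits/value
```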
### Random Access Without Full Decompression Parquet supports predicate pushdown through row group statistics (min/max values), but once a row group is selected, the entire column chunk must be decompressed to access individual values. Vortex supports fine-grained random access within encoded segments. A query that needs a single value from a column can locate and decode just that value without decompressing the surrounding data. This property is particularly valuable for point lookups, late materialization, and any query pattern where only a small fraction of the data in a column is actually needed. ## Vortex vs. Apache Parquet Both Vortex and Parquet are columnar file formats, and they share the same goal: efficient storage and retrieval of analytical data. The differences are in execution. **Encoding flexibility:** Parquet uses a fixed set of encodings chosen at write time. Vortex uses adaptive, cascading encodings that vary per column segment based on data distribution. This gives Vortex consistently better compression ratios across diverse data types. **Decompression model:** Parquet requires full decompression of column chunks before processing. Vortex supports lazy decompression and can operate on encoded data directly, reducing memory usage and improving scan performance. **Random access:** Parquet's smallest addressable unit is a column chunk within a row group. Vortex supports finer-grained access within encoded segments, enabling efficient point lookups and late materialization. **Memory mapping:** Vortex is designed for zero-copy reads from memory-mapped files. Parquet was not designed with memory mapping as a primary access pattern, though some implementations (like DuckDB's Parquet reader) add this capability at the reader level. **Ecosystem maturity:** Parquet has broad ecosystem support -- virtually every data tool can read and write Parquet. Vortex is newer and currently used primarily within the Spice ecosystem. For interchange between systems, Parquet remains the standard. For performance-critical acceleration workloads, Vortex offers measurable advantages. ## Vortex vs. Other Columnar Formats ### Vortex vs. Lance Lance is a columnar format designed for machine learning workloads, with a focus on versioned datasets and fast vector search. Vortex is designed for general analytical query performance with an emphasis on scan speed and compression efficiency. Lance optimizes for ML-specific access patterns (random row access, version management); Vortex optimizes for the scan-filter-aggregate patterns common in SQL analytics. ### Vortex vs. Apache ORC ORC (Optimized Row Columnar) is the Hive ecosystem's columnar format. Like Parquet, ORC uses fixed encodings chosen at write time. Vortex's adaptive encoding system and lazy decompression give it performance advantages for scan-heavy workloads. ORC is tightly integrated with the Hadoop ecosystem; Vortex is designed for modern, Rust-native query engines. ### Vortex vs. Apache Arrow IPC Arrow IPC is an in-memory serialization format for Apache Arrow arrays. It is designed for zero-copy data exchange between processes, not for persistent storage with compression. Vortex is a storage format that achieves high compression while preserving the ability to operate on encoded data. They serve different purposes: Arrow IPC for inter-process communication, Vortex for on-disk analytical storage. 
## Performance Characteristics Vortex's design yields several measurable performance benefits: - **Faster scan times:** Lazy decompression means the query engine avoids decoding data that is filtered out early. For selective queries (those that touch a small fraction of rows), this translates to significantly faster scans compared to formats that require full decompression. - **Better compression ratios:** Adaptive encoding that varies per column segment consistently achieves smaller file sizes than fixed-encoding formats on the same data. Smaller files mean less I/O, which compounds the scan speed improvement. - **Lower memory usage:** Zero-copy reads from memory-mapped files eliminate the need to allocate separate buffers for decompressed data. The working memory footprint of a Vortex-backed query is proportional to the data actually accessed, not the total column size. - **Efficient point lookups:** Random access within encoded segments enables efficient lookups without scanning or decompressing surrounding data. These characteristics are most impactful in acceleration workloads -- scenarios where data is cached locally for fast, repeated access by [SQL federation](/learn/sql-federation) queries, dashboards, or AI pipelines. ## How Spice Uses Vortex Spice uses Vortex as the storage format for its Cayenne data accelerator. When data is accelerated in Spice -- cached locally from remote sources like PostgreSQL, Databricks, or Amazon S3 -- it is stored in Vortex format on disk or in memory. This means that [federated queries](/learn/sql-federation) that hit the acceleration layer benefit from Vortex's lazy decompression, adaptive encoding, and zero-copy reads. The result is sub-second query performance over locally cached data, even for datasets that would be too large to hold fully decompressed in memory. The acceleration layer is kept synchronized with source systems using [change data capture](/learn/change-data-capture), so the Vortex-encoded local cache always reflects the current state of the source data. ### Cayenne and Vortex Cayenne is Spice's next-generation data accelerator, purpose-built for high-scale analytical workloads. It uses Vortex as its underlying storage format and adds: - **Incremental updates:** When source data changes, only the affected segments are re-encoded. The entire dataset does not need to be rewritten. - **Tiered storage:** Hot data is memory-mapped for zero-copy access. Warm data is stored on local disk. The tiering is transparent to the query engine. - **Integration with [Apache DataFusion](/learn/apache-datafusion):** Cayenne exposes Vortex-encoded data as DataFusion table providers, so the query engine can push filters and projections directly into the storage layer. ## When to Use Vortex Vortex is the right choice when: - **Query performance is the priority:** If you need the fastest possible scan, filter, and aggregate performance over columnar data, Vortex's adaptive encoding and lazy decompression provide measurable improvements over Parquet. - **Data is cached locally for acceleration:** Vortex is designed for the acceleration use case -- caching remote data locally for fast repeated access. - **Memory efficiency matters:** Zero-copy reads and lazy decompression reduce the memory footprint of analytical workloads. Parquet remains the better choice for data interchange between systems, archival storage in data lakes, and any scenario where broad ecosystem compatibility is more important than raw scan performance. 
## Advanced Topics ### Encoding Selection Algorithms Vortex does not rely on manual encoding hints or fixed heuristics. Instead, it uses a cost-based encoding selection algorithm that evaluates each column segment against the available encoding schemes and selects the combination that minimizes a weighted objective of compressed size and expected decode cost. The algorithm works in two phases. First, it profiles a column segment to compute statistics: cardinality, run lengths, value range, null density, and sort order. Second, it evaluates each candidate encoding against these statistics. Dictionary encoding is favored when cardinality is low relative to segment length. Run-length encoding is favored when there are long runs of consecutive identical values. Frame-of-reference encoding is favored when values fall within a narrow range. Bit-packing is favored when the effective bit width is significantly smaller than the storage type's bit width. Delta encoding is favored for monotonically increasing or decreasing sequences. The cost model accounts for both storage efficiency (bytes per value after encoding) and query performance (estimated CPU cycles to decode a value). This trade-off matters because a highly compressed encoding that is expensive to decode may be slower in practice than a moderately compressed encoding that supports fast scans. The algorithm selects the encoding that optimizes for the expected query workload, which is scan-heavy by default. ### Cascading Encodings One of Vortex's distinguishing features is its support for cascading (layered) encodings. Rather than choosing a single encoding per segment, Vortex can stack multiple encodings in sequence. For example, a timestamp column with mostly increasing values might first be delta-encoded (converting absolute timestamps to small deltas), and then the resulting delta values might be bit-packed (since the deltas require fewer bits than the original timestamps). Cascading works because each encoding transforms data into a representation that may be more amenable to further encoding. The encoding selection algorithm evaluates multi-layer combinations, not just individual encodings. It uses a bounded search to avoid exponential blowup -- typically evaluating up to two or three layers, since additional layers rarely yield significant benefit. This approach lets Vortex achieve compression ratios that no single encoding can match. In practice, cascading is most effective on numeric columns with structured patterns -- timestamps, auto-incrementing IDs, sensor readings with bounded variation -- where the first encoding removes most of the entropy and the second encoding compresses the residual. ### Lazy Decompression and Pushdown Vortex's lazy decompression model is more than a performance optimization -- it changes which operations are possible at the storage layer. Because encoded data retains enough structure for certain operations, Vortex supports compute pushdown into the encoding layer itself. For example, a filter predicate like `WHERE timestamp > '2026-01-01'` on a delta-encoded column can be evaluated without fully decoding the column. The storage layer translates the predicate threshold into the delta domain (by computing the delta relative to the base value) and evaluates it against the encoded representation. Only segments that pass the filter are decoded to Arrow arrays for further processing. Similarly, min/max statistics and null counts are maintained per segment in the Vortex file metadata. 
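For intuition, here is a toy sketch of pruning against per-segment min/max statistics (the segment layout and values are hypothetical):

```python
# Toy min/max segment pruning: consult per-segment stats before reading
# any data. Segment statistics are hypothetical.
segments = [
    {"rows": (0, 9999),      "min": "2025-11-01", "max": "2025-12-31"},
    {"rows": (10000, 19999), "min": "2026-01-01", "max": "2026-02-15"},
    {"rows": (20000, 29999), "min": "2026-02-16", "max": "2026-03-31"},
]

threshold = "2026-01-01"  # WHERE timestamp > '2026-01-01'
to_read = [s for s in segments if s["max"] > threshold]
print(f"reading {len(to_read)} of {len(segments)} segments")
# reading 2 of 3 segments
```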
The query engine uses these statistics for segment pruning -- skipping entire segments that cannot contain matching rows -- before any data is read from disk. This pushdown capability is exposed to [Apache DataFusion](/learn/apache-datafusion) through the `TableProvider` interface. When DataFusion pushes filters and projections down to a Vortex table provider, the provider evaluates them at the encoding level, reads and decodes only the qualifying segments, and returns the results as Arrow record batches. The query engine never sees or processes data that was pruned at the storage layer. ### Segment Layout and Metadata Vortex files are organized into segments, each containing a contiguous range of rows for a single column. Segment boundaries are chosen based on a target size (typically 64 KB to 1 MB of encoded data), balancing between fine-grained pruning and metadata overhead. Each segment stores its encoding type, compressed data, null bitmap, and lightweight statistics (row count, null count, min, max). The file footer contains a segment index that maps row ranges to segment offsets. This index enables efficient range scans and point lookups: the query engine binary-searches the index to find the relevant segments, reads only those segments from disk, and decodes them. For memory-mapped files, this translates to a small number of page faults rather than a sequential scan of the entire column. Both are columnar file formats for analytical data, but they differ in encoding and decompression. Parquet uses fixed encodings chosen at write time and requires full decompression before processing. Vortex uses adaptive, cascading encodings that vary per column segment and supports lazy decompression -- only the data actually needed by a query is decoded. This gives Vortex faster scan times and better compression ratios for most workloads, while Parquet has broader ecosystem support.

    ', }, { title: 'Is Vortex open source?', paragraph: '

    Yes. Vortex is developed by Spiral, the open-source team behind Spice, and is available under an open-source license. The source code, documentation, and issue tracker are publicly accessible.

    ', }, { title: 'When should I use Vortex instead of Parquet?', paragraph: '

    Use Vortex when query performance and memory efficiency are priorities -- particularly for data acceleration workloads where data is cached locally for fast, repeated access. Use Parquet when you need broad ecosystem compatibility or are storing data in a data lake for interchange between multiple tools.

    ', }, { title: 'Can Vortex be used with existing data tools?', paragraph: '

    Vortex is currently used primarily within the Spice ecosystem, where it powers the Cayenne data accelerator. It is not a drop-in replacement for Parquet in arbitrary data pipelines. For data interchange, Parquet remains the standard. Within Spice, Vortex is used automatically when data acceleration is enabled.

    ', }, { title: 'What is the relationship between Vortex and Apache Arrow?', paragraph: '

    Vortex is designed to work with Apache Arrow. When Vortex data is decoded, it produces Arrow arrays that can be processed by any Arrow-compatible query engine. Vortex also supports operating on encoded data directly (without decoding to Arrow) for operations like filtering and aggregation, which is where its performance advantages come from. Arrow IPC is an in-memory interchange format; Vortex is a compressed on-disk storage format.

    ', }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## Spice AI for AWS URL: https://spice.ai/partners/aws Description: Build Fast, Scalable AI Applications with Spice AI and Amazon Web Services Amazon S3, DynamoDB, Redshift, Aurora, RDS, and Amazon MSK with Spice.ai's unified query engine - no data movement required.", }, { icon: 'bolt', title: 'Integrate Amazon Bedrock AI', description: 'Connect Amazon Nova, Titan, Claude, and other foundation models through Amazon Bedrock for LLM inference and embeddings, with Bedrock Guardrails support for safe AI applications.', }, { icon: 'shield', title: 'Vector Search with S3 Vectors', description: 'Build semantic search and RAG applications using Amazon S3 Vectors - a new S3 bucket type for sub-second similarity queries on billions of vectors at up to 90% cost reduction.', }, { icon: 'building', title: 'Deploy via AWS Marketplace', description: 'Purchase and deploy Spice.ai Enterprise through AWS Marketplace with consolidated billing, Private Offers support, and fast onboarding using your existing AWS procurement workflows.', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', theme: 'aws', }} /> Iceberg tables in Amazon S3 Tables using AWS Glue catalog integration.', url: 'https://spiceai.org/docs/components/data-connectors/s3', }, { logo: { src: '/website-assets/partners/aws/services/dynamodb.svg', alt: 'Amazon DynamoDB', }, title: 'Amazon DynamoDB & Streams', description: 'Federated SQL queries on DynamoDB tables with automatic schema inference and real-time CDC via DynamoDB Streams.', url: 'https://spiceai.org/docs/components/data-connectors/dynamodb', }, { logo: { src: '/website-assets/partners/aws/services/bedrock.svg', alt: 'Amazon Bedrock', }, title: 'Amazon Bedrock (Nova, Titan, Claude)', description: 'LLM inference with Amazon Nova and embeddings with Titan. Supports Bedrock Guardrails for content filtering.', url: 'https://spiceai.org/docs/components/models/aws-bedrock', }, { logo: { src: '/website-assets/partners/aws/services/s3-vectors.svg', alt: 'Amazon S3 Vectors', }, title: 'Amazon S3 Vectors', description: 'Sub-second similarity queries on billions of vectors with up to 90% cost reduction compared to traditional vector databases.', url: 'https://spice.ai/blog/amazon-s3-vectors', }, { logo: { src: '/website-assets/partners/aws/services/glue.svg', alt: 'AWS Glue', }, title: 'AWS Glue Data Catalog', description: 'Discover and query tables from AWS Glue with glob pattern filtering. 
Supports Iceberg and Delta Lake table formats.', url: 'https://spiceai.org/docs/components/catalogs/glue', }, { logo: { src: '/website-assets/partners/aws/services/redshift.svg', alt: 'Amazon Redshift', }, title: 'Amazon Redshift, Aurora & RDS', description: 'Connect to Redshift clusters, Aurora PostgreSQL/MySQL, and RDS instances with full SQL federation.', url: 'https://spiceai.org/docs/deployment/aws/integrations', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', theme: 'aws', }} /> --- ## Spice AI for Databricks URL: https://spice.ai/partners/databricks Description: Build Fast, Accurate AI Applications with Spice AI and Databricks inventory tracking or fraud detection.", }, { icon: 'bolt', title: 'Embed AI with Applications', description: 'Integrate Databricks Mosaic AI model serving and embeddings with the Spice engine to deploy AI features, such as low-latency recommendation systems, search, or predictive maintenance.', }, { icon: 'shield', title: 'Streamline Data Governance', description: 'Manage Apache Iceberg and Delta Lake tables using Unity Catalog, enforcing secure access, ensuring compliance, and restricted data access.', }, { icon: 'bolt', title: 'Optimize Workload Performance', description: 'Use Spice.ai to cache hot data, replicate high-demand datasets, and load-balance hosted AI endpoints, maintaining speed and resilience for applications like real-time dashboards.', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', }} /> DuckDB and SQLite.', url: 'https://spiceai.org/docs/components/data-connectors/databricks#spark-connect', }, { icon: 'bolt', title: 'Databricks Mosaic AI Integration', description: 'Model serving and embeddings integrations to bring MosaicAI alongside applications.', url: 'https://spiceai.org/docs/components/embeddings/databricks', }, { icon: 'shield', title: 'Unity Catalog Support', description: 'Comprehensive support for governance and security.', url: 'https://spiceai.org/docs/components/catalogs/databricks', }, { icon: 'building', title: 'Apache Iceberg & Delta Lake Support', description: 'Query and management of open format tables via Unity Catalog.', url: 'https://spiceai.org/docs/components/data-connectors/databricks#delta-lake-s3', }, { icon: 'users', title: 'Service Principal Authentication', description: 'M2M & U2M OAuth authentication for enterprise-grade role-based security.', url: 'https://spiceai.org/docs/components/data-connectors/databricks#authentication', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', }} /> --- ## Spice AI for NetApp URL: https://spice.ai/partners/netapp Description: Build Accelerated, Data-Grounded AI Applications with Spice AI and NetApp ONTAP NetApp ONTAP S3, NFS, SMB, and Azure NetApp Files with Spice.ai's unified query engine - no data movement required.", }, { icon: 'bolt', title: 'Accelerate Queries Locally', description: 'Cache frequently accessed data alongside your application for sub-second latency. Eliminate network round-trips and reduce load on primary storage systems.', }, { icon: 'shield', title: 'Build Modern Data Lakehouses', description: 'Combine NetApp ONTAP S3 Lakehouse with Dremio and PostgreSQL for a composable data platform that supports real-time analytics and AI workloads.', }, { icon: 'building', title: 'Integrate LLMs with Your Data', description: 'Connect foundation models like GPT-4 to your NetApp data sources. 
Use natural language queries with spice chat to get answers grounded in your enterprise data.', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', theme: 'netapp', }} /> ONTAP S3 buckets. Build data lakehouses with native Apache Iceberg support.', url: 'https://spiceai.org/docs/components/data-connectors/s3', }, { logo: { src: '/website-assets/partners/netapp/services/nfs.svg', alt: 'NFS', }, title: 'NFS File Shares', description: 'Access data on NFS-mounted NetApp volumes directly. Query files from network-attached storage with the File Data Connector.', url: 'https://spiceai.org/docs/components/data-connectors/file', }, { logo: { src: '/website-assets/partners/netapp/services/smb.svg', alt: 'SMB', }, title: 'SMB/CIFS Shares', description: 'Connect to SMB file shares for Windows-based workflows. Federate data from CIFS-mounted NetApp storage.', url: 'https://spiceai.org/docs/components/data-connectors/file', }, { logo: { src: '/website-assets/partners/netapp/services/sftp.svg', alt: 'SFTP', }, title: 'FTP/SFTP Access', description: 'Query files from FTP and SFTP servers. Supports Parquet, CSV, and JSON formats with secure transfer.', url: 'https://spiceai.org/docs/components/data-connectors/ftp', }, { logo: { src: '/website-assets/partners/netapp/services/ontap.svg', alt: 'ONTAP', }, title: 'Azure NetApp Files', description: 'Connect to Azure NetApp Files by Instaclustr for cloud-native enterprise file storage with high performance.', url: 'https://community.netapp.com/t5/Tech-ONTAP-Blogs/Building-Accelerated-Data-Grounded-Apps-with-Spice-ai-and-NetApp/ba-p/463612', }, { logo: { src: '/website-assets/partners/netapp/services/file.svg', alt: 'File Connector', }, title: 'Local File Systems', description: 'Query any locally accessible filesystem including mounted network shares. Supports automatic refresh on file changes.', url: 'https://spiceai.org/docs/components/data-connectors/file', }, ], padding_top: 'st-lg', padding_bottom: 'sb-lg', theme: 'netapp', }} /> --- ## Partners URL: https://spice.ai/partners Date: 2025-11-21T00:46:27 Description: Partner with Spice AI to deliver faster data, search, and AI solutions for your customers. Build integrations, co-market solutions, and power data-intensive applications and AI agents together. --- ## Hybrid SQL Search URL: https://spice.ai/platform/hybrid-sql-search Date: 2025-11-14T14:39:28 Description: Combine vector similarity, full-text, and keyword search in one SQL query. Fast, scalable, and production-ready. 
every time', description: 'Retrieve answers that reflect context (semantic meaning) and precision (exact match).', cta: { title: 'Explore hybrid search docs', url: 'https://spiceai.org/docs/features/search', target: '', }, }, { icon: { ID: 605, id: 605, title: 'Platform Search Any Scale Icon', filename: 'platform_search_any_scale_icon.svg', filesize: 8140, url: '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', link: '/platform/hybrid-sql-search/attachment/platform_search_any_scale_icon/', alt: '', author: '6', description: '', caption: '', name: 'platform_search_any_scale_icon', status: 'inherit', uploaded_to: 543, date: '2025-11-18 19:39:16', modified: '2025-11-18 19:39:16', menu_order: 0, mime_type: 'image/svg+xml', type: 'image', subtype: 'svg+xml', icon: '/website-assets/media/default.png', width: 0, height: 0, sizes: { thumbnail: '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', 'thumbnail-width': 1, 'thumbnail-height': 1, medium: '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', 'medium-width': 1, 'medium-height': 1, medium_large: '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', 'medium_large-width': 1, 'medium_large-height': 1, large: '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', 'large-width': 1, 'large-height': 1, '1536x1536': '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', '1536x1536-width': 1, '1536x1536-height': 1, '2048x2048': '/website-assets/media/2025/11/platform_search_any_scale_icon.svg', '2048x2048-width': 1, '2048x2048-height': 1, }, }, title: 'Search at any scale', description: 'Index and search billions of embeddings with low-latency, across cloud object storage or operational databases. Performance remains consistent as datasets grow.', cta: { title: 'Read the S3 Vectors guide', url: '/blog/getting-started-with-amazon-s3-vectors-and-spice', target: '', }, }, { icon: { ID: 604, id: 604, title: 'Platform Search SQL Control Icon', filename: 'platform_search_sql_control_icon.svg', filesize: 6603, url: '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', link: '/platform/hybrid-sql-search/attachment/platform_search_sql_control_icon/', alt: '', author: '6', description: '', caption: '', name: 'platform_search_sql_control_icon', status: 'inherit', uploaded_to: 543, date: '2025-11-18 19:39:15', modified: '2025-11-18 19:39:15', menu_order: 0, mime_type: 'image/svg+xml', type: 'image', subtype: 'svg+xml', icon: '/website-assets/media/default.png', width: 0, height: 0, sizes: { thumbnail: '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', 'thumbnail-width': 1, 'thumbnail-height': 1, medium: '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', 'medium-width': 1, 'medium-height': 1, medium_large: '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', 'medium_large-width': 1, 'medium_large-height': 1, large: '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', 'large-width': 1, 'large-height': 1, '1536x1536': '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', '1536x1536-width': 1, '1536x1536-height': 1, '2048x2048': '/website-assets/media/2025/11/platform_search_sql_control_icon.svg', '2048x2048-width': 1, '2048x2048-height': 1, }, }, title: 'Full SQL control', description: 'Design, refine, and combine search results entirely in standard SQL.', cta: { title: 'See SQL search reference', url: 'https://spiceai.org/docs/reference/sql/search', target: '', }, }, ], 
Hybrid search in Spice.ai merges vector similarity (semantic) and full-text BM25 (keyword) results into one ranked output. Both search types run in parallel, and their ranks are combined using Reciprocal Rank Fusion (RRF) for optimal relevance.
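For reference, the standard RRF formulation sums reciprocal ranks across the participating searches; the damping constant k (conventionally around 60) is an assumption here, as the document does not state the value Spice uses:

```latex
\mathrm{RRF}(d) \;=\; \sum_{s \,\in\, \{\mathrm{vector},\ \mathrm{text}\}} \frac{1}{k + \mathrm{rank}_s(d)}, \qquad k \approx 60
```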

You can query this through the /v1/search API or with SQL functions like vector_search() and text_search(). Because results are treated as tables, you can filter, join, and aggregate just like any SQL dataset.
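As a minimal sketch of that table-like behavior, assuming a hypothetical documents dataset, a structured products table, and id/score result columns (the SQL search reference above defines the exact signatures):

```sql
-- Sketch only: dataset names, column names, and argument shapes are assumed.
-- Search results behave like a table, so they join and filter like one.
SELECT p.name,
       p.category,
       s.score
FROM vector_search(documents, 'waterproof hiking boots') AS s
JOIN products AS p
  ON p.id = s.id            -- enrich semantic matches with structured data
WHERE p.in_stock            -- ordinary SQL predicate over search results
ORDER BY s.score DESC
LIMIT 10;
```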

**How is Spice.ai different from a vector database?**

Traditional vector databases require you to pair a vector index with separate text and keyword search systems, all running on clusters you have to provision and maintain. Spice.ai unifies vector, text, and relational search in a single runtime that you can deploy locally, in your cloud, or fully managed.

You can join vector results with structured data, apply SQL filters, and run inference on top of them, all without moving data or managing multiple systems. For developers, this means a single query layer that delivers the power of a vector database, the flexibility of SQL, and the speed of an accelerated cache.

**When should I use S3 Vectors?**

Use S3 Vectors when you need to store and query embeddings at large scale. It's ideal for workloads with millions or billions of vectors that don't need always-on compute. By offloading storage and similarity search to S3 Vectors, you get the scalability and durability of S3 with sub-second queries via transient compute. Spice manages the entire lifecycle: embedding, indexing, filtering, and query orchestration.

    \n", }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> --- ## LLM Inference URL: https://spice.ai/platform/llm-inference Date: 2025-10-29T14:30:24 Description: Call LLMs directly from SQL. Generate, summarize, and enrich data inline using the SQL AI function or natural language queries. in one workflow', description: 'Avoid context switching and integrate AI with standard SQL operations. Chain model responses to filters, joins, or aggregations to build RAG pipelines.', cta: { title: 'Explore SQL AI functions', url: 'https://spiceai.org/docs/reference/sql/ai', target: '', }, }, { icon: { ID: 606, id: 606, title: 'Platform LLM Governance Icon', filename: 'platform_llm_governance_icon.svg', filesize: 5586, url: '/website-assets/media/2025/11/platform_llm_governance_icon.svg', link: '/platform/llm-inference/attachment/platform_llm_governance_icon/', alt: '', author: '6', description: '', caption: '', name: 'platform_llm_governance_icon', status: 'inherit', uploaded_to: 543, date: '2025-11-18 19:39:16', modified: '2025-11-18 19:39:16', menu_order: 0, mime_type: 'image/svg+xml', type: 'image', subtype: 'svg+xml', icon: '/website-assets/media/default.png', width: 0, height: 0, sizes: { thumbnail: '/website-assets/media/2025/11/platform_llm_governance_icon.svg', 'thumbnail-width': 1, 'thumbnail-height': 1, medium: '/website-assets/media/2025/11/platform_llm_governance_icon.svg', 'medium-width': 1, 'medium-height': 1, medium_large: '/website-assets/media/2025/11/platform_llm_governance_icon.svg', 'medium_large-width': 1, 'medium_large-height': 1, large: '/website-assets/media/2025/11/platform_llm_governance_icon.svg', 'large-width': 1, 'large-height': 1, '1536x1536': '/website-assets/media/2025/11/platform_llm_governance_icon.svg', '1536x1536-width': 1, '1536x1536-height': 1, '2048x2048': '/website-assets/media/2025/11/platform_llm_governance_icon.svg', '2048x2048-width': 1, '2048x2048-height': 1, }, }, title: 'Maintain data governance
    & security', description: 'All AI-driven operations are performed within your governed SQL environment, so data never leaves your compliance boundaries and access is fully auditable.', cta: { title: 'Explore secure AI sandboxing', url: '/feature/secure-ai-sandboxing', target: '', }, }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> AI() is a built-in Spice function that lets you call large language models directly inside SQL queries. It takes a prompt (and optional data columns) as input and returns model completions as query results. This allows you to summarize, translate, generate, or classify text inline without additional code or API management.
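A minimal sketch of inline inference, assuming a hypothetical tickets table and a single-string prompt form of AI() (the SQL AI functions reference above defines the exact signature):

```sql
-- Sketch only: the table and the AI() argument shape are assumed.
-- Each row's body is summarized by the model and returned as a normal column.
SELECT id,
       AI('Summarize this support ticket in one sentence: ' || body) AS summary
FROM tickets
WHERE status = 'open'
LIMIT 20;
```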

**How does text-to-SQL work?**

Spice uses your preferred LLM to convert prompts into executable SQL. Results are constrained to your connected datasets and subject to all existing SQL permissions and governance rules.

**Can I use different model providers?**

Yes. Spice abstracts model providers behind a common interface; select OpenAI, Anthropic, Bedrock, or your custom model by name in each call. This keeps your SQL portable and future-proof.

---

## SQL Federation & Acceleration

URL: https://spice.ai/platform/sql-federation-acceleration
Date: 2025-11-14T14:40:16
Description: Query any data source with sub-second speed. Spice combines SQL federation and acceleration in a single runtime with zero ETL.

- **…data sources**: Query across data lakes, operational databases, and analytical warehouses. Join, aggregate, and analyze without data movement. (Explore federated SQL query: https://docs.spice.ai/features/federated-sql-query)
- **Deliver sub-second query performance**: Accelerate frequently accessed data by materializing and indexing hot tables with local engines like DuckDB and SQLite. (Explore data acceleration: https://docs.spice.ai/features/data-acceleration)
- **Simplify your data stack**: Replace multiple engines, ETL jobs, and custom caches with one lightweight runtime that handles federation, acceleration, and hybrid search in a single environment. (Get started with Spice: https://spiceai.org/docs/getting-started)

SQL federation lets you query data across multiple sources as if it were one. With Spice, you can connect directly to systems like S3, PostgreSQL, or Snowflake and execute unified SQL queries. Spice handles source integration, query planning, and result merging automatically without ETL.
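For example, a single federated query can join object storage with an operational database. The dataset names below (orders_s3, customers_pg) are hypothetical and would map to configured Spice data connectors:

```sql
-- Sketch only: dataset names are assumed; each maps to a Spice data connector.
SELECT c.plan,
       COUNT(*)      AS orders,
       SUM(o.amount) AS revenue
FROM orders_s3 AS o      -- e.g., Parquet in S3 via the S3 connector
JOIN customers_pg AS c   -- e.g., a live table via the PostgreSQL connector
  ON o.customer_id = c.id
GROUP BY c.plan
ORDER BY revenue DESC;
```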

**How does acceleration in Spice work?**

Spice materializes and indexes hot data locally using embedded engines such as DuckDB, PostgreSQL, and SQLite. Frequently queried data is cached and optimized for sub-second responses, while changes from the source are synced through CDC. This approach delivers analytical performance for operational workloads, ideal for APIs, dashboards, and AI agents.

**How is Spice different from other query engines?**

Traditional query engines focus on analytics and often require separate systems for federation, caching, and serving. Spice unifies these capabilities in a single runtime built for operational and AI workloads.

What sets Spice apart from other query engines is its broader, application-focused feature set designed for modern data and AI workloads. Spice combines federation, hybrid search, and embedded LLM inference into a single runtime, enabling teams to build complete, end-to-end workflows without the management overhead and performance concessions of using multiple systems.

---

## Spice Cloud Plans

URL: https://spice.ai/pricing/cloud
Description: Flexible cloud pricing plans for teams of all sizes

|   | Developer | Pro for Teams | Enterprise |
| --- | --- | --- | --- |
| Serverless Compute Instances | Single Instance, 2 vCPU / 4 GB | Single Instance, 4 vCPU / 8 GB | Multi-Instance, Custom vCPU & Mem |
| Cloud Hosting | Shared, Multi-Tenant Cluster (us-east-1, us-west-2) | Shared, Multi-Tenant Cluster (us-east-1, us-west-2) | Dedicated AWS Cluster (us-east-1, us-west-2 + additional regions by request) |
| Ephemeral Local Storage | 100 MB | 1.0 GB | 4.0+ GB |
| Persistent Object Storage | N/A | N/A | 10.0+ GB |
| Query Limits | Up to 16 concurrent queries, 90-second timeout | Up to 64 concurrent queries, 5-minute timeout | Up to 1024 concurrent queries, 30-minute timeout |
| APIs | HTTP, Arrow Flight, Arrow Flight SQL, ADBC | HTTP, Arrow Flight, Arrow Flight SQL, ADBC | HTTP, Arrow Flight, Arrow Flight SQL, JDBC/ODBC/ADBC |
| SDKs | Python, Go, Node.js, Rust, C#, Java | Python, Go, Node.js, Rust, C#, Java | Python, Go, Node.js, Rust, C#, Java |
| Availability | Single Cloud, Single Region | Single Cloud, Single Region | Multi-Cloud, Multi-Region, High-Availability |
| Users | 1 | Unlimited | Unlimited |
| Apps | 5 | 10 | 1000 |
| Commercial License | No | Yes | Yes |
| Commercial Resale | No | No | Yes |
| Support | Community support | Standard, 8/5 next-business-day | Premium, 24/7 on-call |
| Support Channels | Community Slack | Private Slack channel, Email | Private GitHub repo, Private Slack, Email, On-Call Pager |
| SLA | 99.0%+ SLA | 99.0%+ SLA | Enterprise 99.9%+ SLA |
| Compliance | SOC 2 Type II | SOC 2 Type II | SOC 2 Type II |
| SSO | Sign in with GitHub | Sign in with GitHub | Sign in with GitHub |
| Trial | Free 7-day Pro for Teams trial | Free 7-day Pro for Teams trial | Contact us |
- **…Compute**: Scale automatically with serverless compute instances. No infrastructure to manage - just deploy and go. (Get started: https://spice.ai/login)
- **Enterprise Security**: SOC 2 Type II compliant with enterprise-grade security controls and privacy protections built in. (Learn more: /security/)
- **High Availability**: Enterprise plans include multi-region, high-availability deployments with 99.9%+ SLA. (Contact sales: /contact/)

All new accounts start with a free 7-day Pro for Teams trial. This gives you access to enhanced compute resources (4 vCPU / 8 GB), up to 64 concurrent queries, 5-minute query timeout, and 1.0 GB ephemeral storage. No credit card required to start.

**Can I upgrade or downgrade my plan?**

Yes, you can upgrade or downgrade your plan at any time. Changes take effect immediately and billing is prorated based on your usage.

**What APIs and SDKs are supported?**

All plans support HTTP, Apache Arrow Flight, Apache Arrow Flight SQL, and ADBC APIs. SDKs are available for Python, Go, Node.js, Rust, C#, and Java. Enterprise plans also include JDBC/ODBC support.

**What is the difference between Developer and Pro for Teams?**

Developer is designed for individual developers with a single user, 5 apps, and basic compute resources. Pro for Teams supports unlimited users, 10 apps, enhanced compute (4 vCPU / 8 GB), higher concurrency limits, commercial licensing, and standard support with private Slack and email channels.

**What does Enterprise include?**

Enterprise provides dedicated AWS clusters, multi-region high-availability, custom compute configurations, persistent object storage, up to 1024 concurrent queries, 30-minute query timeout, commercial resale rights, premium 24/7 on-call support, private GitHub repository access, and a 99.9%+ SLA.

---

## Pricing

URL: https://spice.ai/pricing
Date: 2025-11-19T12:16:23
Description: Start free and deploy anywhere: on your laptop, on-prem, at the edge, or in the cloud. Flexible pricing designed for teams building data-intensive applications & AI agents.

| Optimized For | Community | Managed Workloads | Self-Hosted Workloads |
| --- | --- | --- | --- |
| Scale | Single-Node; Engine only | High-Availability; Multi-Node with Clustering | High-Availability; Multi-Node with Clustering |
| Control Plane | Static, configuration-driven only (YAML) | Dynamic, remote (Portal/API-managed) | Dynamic, remote/local (API/YAML); remote control in preview |
| Commercial License | Apache 2.0 | Consumption-based TOS | Enterprise Software License |
| Code / Security Audited | No | Yes | Yes |
| Security Updates / Bugfixes | Latest-minus-one release (last 6-8 weeks) | Rolling updates; automated patches; immediate fixes | Tiered up to 3 years; guaranteed patches, regular updates |
| 24/7 Enterprise SLA | N/A | 99.9% uptime, proactive failover | With 24/7 on-call |
| Support | Community (GitHub Issues, Discord) | Rolling updates; automated patches; immediate fixes | Tiered up to 3 years; guaranteed patches, regular updates |
| Distribution | OSS Docker Images; GitHub release binaries | Enterprise image; AWS Marketplace SaaS | Enterprise image; AWS Marketplace ECR & AMI |
| Hosting | Self-Hosted | Fully-managed, cloud-hosted; dedicated deployments (AWS) | Self-hosted BYOL; Kubernetes, AWS AMI |
| Monitoring & Observability | DIY; OpenTelemetry (e.g. Grafana) | Real-time monitoring & observability; built-in dashboards | BYO (e.g., Datadog, New Relic) or Spice Cloud Connect |
| Compliance | N/A | SOC 2 Type II; audited security & privacy controls | SOC 2 Type II; audited security & privacy controls |
    \n', }, ], }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> AI engine', description: 'Federate data, accelerate queries, perform hybrid search, and run AI models all from one portable runtime that can be deployed locally, on-prem, at the edge, or in the cloud.', cta: { title: 'Learn more', url: 'https://spiceai.org/docs/deployment', target: '_blank', }, }, { icon: { ID: 1136, id: 1136, title: 'Pricing_Built on open-source', filename: 'Pricing_Built-on-open-source.svg', filesize: 5035, url: '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', link: '/pricing/attachment/pricing_built-on-open-source/', alt: '', author: '6', description: '', caption: '', name: 'pricing_built-on-open-source', status: 'inherit', uploaded_to: 648, date: '2025-11-21 22:20:30', modified: '2025-11-21 22:20:30', menu_order: 0, mime_type: 'image/svg+xml', type: 'image', subtype: 'svg+xml', icon: '/website-assets/media/default.png', width: 0, height: 0, sizes: { thumbnail: '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', 'thumbnail-width': 1, 'thumbnail-height': 1, medium: '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', 'medium-width': 1, 'medium-height': 1, medium_large: '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', 'medium_large-width': 1, 'medium_large-height': 1, large: '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', 'large-width': 1, 'large-height': 1, '1536x1536': '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', '1536x1536-width': 1, '1536x1536-height': 1, '2048x2048': '/website-assets/media/2025/11/Pricing_Built-on-open-source.svg', '2048x2048-width': 1, '2048x2048-height': 1, }, }, title: 'Built on
    open-source', description: 'Leverage modern open-source technologies, including Arrow, DataFusion, DuckDB, SQLite, Iceberg, and more in one engine. ', cta: { title: 'Learn more', url: 'https://spiceai.org/docs', target: '_blank', }, }, { icon: { ID: 1137, id: 1137, title: 'Pricing_Designed for performance and scale', filename: 'Pricing_Designed-for-performance-and-scale.svg', filesize: 5289, url: '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', link: '/pricing/attachment/pricing_designed-for-performance-and-scale/', alt: '', author: '6', description: '', caption: '', name: 'pricing_designed-for-performance-and-scale', status: 'inherit', uploaded_to: 648, date: '2025-11-21 22:20:31', modified: '2025-11-21 22:20:31', menu_order: 0, mime_type: 'image/svg+xml', type: 'image', subtype: 'svg+xml', icon: '/website-assets/media/default.png', width: 0, height: 0, sizes: { thumbnail: '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', 'thumbnail-width': 1, 'thumbnail-height': 1, medium: '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', 'medium-width': 1, 'medium-height': 1, medium_large: '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', 'medium_large-width': 1, 'medium_large-height': 1, large: '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', 'large-width': 1, 'large-height': 1, '1536x1536': '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', '1536x1536-width': 1, '1536x1536-height': 1, '2048x2048': '/website-assets/media/2025/11/Pricing_Designed-for-performance-and-scale.svg', '2048x2048-width': 1, '2048x2048-height': 1, }, }, title: 'Designed for performance and scale', description: "Spice's architecture is optimized for low-latency data access. Deploy with single-node or distributed multi-node query execution.", cta: { title: 'Learn more', url: 'https://spiceai.org/docs/features/data-acceleration', target: '_blank', }, }, ], padding_top: 'unset', padding_bottom: 'unset', }} /> Yes. Spice.ai Enterprise is available on AWS Marketplace, allowing teams to purchase and deploy Spice through their existing AWS billing and procurement workflows. This includes support for Private Offers, consolidated billing, and fast onboarding. Visit the AWS Marketplace listing here.

**Is on-prem supported?**

Yes. Spice is portable and can run on-prem in Kubernetes, VMs, or bare metal. Enterprises can also use private cloud deployments or hybrid models where acceleration and model serving run close to the application while governance is centralized.

**How is Spice Cloud Enterprise different from the other offerings?**

Enterprise Cloud provides a dedicated, multi-region, high-availability cluster with significantly higher compute, storage, and concurrency limits, as well as enterprise-grade features. Unlike the Developer and Pro tiers, Enterprise includes custom vCPU/memory configurations, persistent object storage, JDBC/ODBC support, 1024+ concurrent queries, commercial licensing and resale rights, and Premium 24/7 on-call support with a 99.9% SLA.

---

## Privacy Policy

URL: https://spice.ai/privacy-policy
Date: 2025-10-23T00:58:38
Description: Your privacy matters. Read Spice AI's policy on data collection, usage, protection, and your rights to control your personal information.
---

## Security

URL: https://spice.ai/security
Date: 2025-11-21T00:42:01
Description: Learn how Spice AI protects your data with SOC 2 Type II compliance, strong access controls, encryption, secure coding, and a principled, defense-in-depth approach.

Yes. Spice AI has achieved SOC 2 Type II compliance, independently audited by Prescient Assurance in accordance with AICPA standards. This certification validates our commitment to enterprise-grade security, availability, and process integrity. A copy of the audit report is available to customers on the Spice.ai Enterprise plan upon request.

**How is data protected in Spice?**

Spice AI encrypts all sensitive data in transit and at rest. Corporate secrets are stored in an enterprise-grade password manager with SSO access, and service secrets are managed using platform-specific secure key vaults. TLS 1.2+ is enforced for all encrypted transmissions. Access is logged, auditable, and restricted using least-privilege and JIT access controls.

**How does Spice enforce access control?**

Spice uses a combination of SSO, RBAC, strong authentication, and least-privilege policies to protect systems and environments. Access is granted only when required through just-in-time (JIT) workflows and automatically expires after a limited time. All access is logged, monitored, and auditable.

---

## Application Search

URL: https://spice.ai/use-case/application-search
Date: 2025-11-21T16:16:20
Description: Add fast, relevant search to your app with hybrid SQL search. Governed, low-latency, and easy to ship anywhere.

---

## Datalake Accelerator

URL: https://spice.ai/use-case/datalake-accelerator
Date: 2025-11-21T16:44:36
Description: Accelerate query performance in your data lake with Spice. Run SQL locally on federated datasets for up to 100x faster performance.

---

## Operational Data Lakehouse

URL: https://spice.ai/use-case/operational-data-lakehouse
Date: 2025-11-21T18:54:05
Description: Federate, accelerate, and serve data-intensive apps and AI agents directly from object storage with millisecond performance.

---

## Retrieval-Augmented Generation

URL: https://spice.ai/use-case/retrieval-augmented-generation
Date: 2025-11-21T18:37:26
Description: Build more accurate and trustworthy RAG systems. Spice unifies SQL federation, vector search, and model inference for data-grounded AI responses.

---

## Secure AI Agents

URL: https://spice.ai/use-case/secure-ai-agents
Date: 2025-11-21T19:01:05
Description: Build and deploy AI agents that are secure by design. Federate governed context, enforce policy inline, and route to any model with full auditability.

---