A Developer’s Guide to Understanding Spice.ai

Viktor Yershov

February 5, 2026

TL;DR 

This hands-on guide is designed to help developers quickly build an understanding of Spice: what it is (an AI-native query engine that federates queries, accelerates data, and integrates search and AI), when to use it (data-intensive applications and AI agents), and how it can be leveraged to solve enterprise-scale data challenges.  

Note: This guide was last updated on February 5, 2026. Please see the docs for the latest updates.

Who this guide is for 

This guide is for developers who want to understand why, how, and when to use Spice.ai.  

If you are new to Spice, you might also be wondering how Spice differs from other query engines or data and AI platforms. Most developers exploring Spice are doing one of the following: 

  • Operationalizing data lakes for real-time queries and search 
  • Building applications that need fast access to disparate data 
  • Building AI applications and agents that need fast, secure context 

Let’s start with the problem Spice is solving to anchor the discussion.  

The problem Spice solves 

Modern applications face a distributed data challenge. 

Enterprise data is spread across operational databases, data lakes, warehouses, third-party APIs, and more. Each source has its own interface, latency characteristics, and access patterns. 

AI workloads amplify the problem. RAG applications generally require: 

  • A vector database (e.g. Pinecone, Weaviate) for embeddings 
  • A text search engine (e.g. Elasticsearch) for keyword matching 
  • A cache layer (e.g. Redis) for performance & latency 
  • Model hosting and serving (OpenAI, Anthropic) for LLM inference 
  • Orchestration code and services to coordinate everything 

That's a lot of complexity, even for a simple application. 

What is Spice? 

Spice is an open-source SQL query, search, and LLM-inference engine written in Rust, purpose-built for data-driven applications and AI agents. At its core, Spice is a high-performance compute engine that federates, searches, and processes data across your existing infrastructure - querying & accelerating data where it lives and integrating search and AI capabilities through SQL. 

Figure 1. Spice.ai architecture

Unlike databases that require migrations & maintenance, Spice takes a declarative configuration approach: datasets, views, models, and tools are defined in YAML, and Spice handles the operations of fetching, caching, and serving that data.  

This makes Spice ideal when: 

  • Your application needs fast, unified access to disparate data sources 
  • You want to keep things simple and avoid building and maintaining ETL pipelines 
  • You want an operational data lakehouse for applications and agents 
  • You need sub-second query performance without ETL 

What Spice is not: 

  • Not a replacement for PostgreSQL or MySQL (use those for transactional workloads) 
  • Not a data warehouse (use Snowflake/Databricks for centralized analytics) 

Mental model: Spice as a data and AI substrate 

Think of Spice as the operational data & AI layer between your applications and your data infrastructure. 

Figure 2. Spice as the data substrate for data-intensive AI apps

How this guide works 

We'll start with a hands-on quickstart to get Spice running, then progressively build your mental model through the core concepts: 

  1. Federation 
  2. Acceleration 
  3. Views 
  4. Caching 
  5. Snapshots 
  6. Models 
  7. Search 
  8. Writes 
By the end, you'll understand how these primitives are used together to solve enterprise-scale data challenges.

Quickstart 

To install and get Spice started, run: 

curl https://install.spiceai.org | /bin/bash

Or using Homebrew: 

brew install spiceai/spiceai/spice

Next, in any folder, create a spicepod.yaml file with the following content: 

version: v1
kind: Spicepod
name: my_spicepod

datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips

In the same folder, run:

spice run

And, finally, in another terminal, run:

> spice sql
Welcome to the Spice.ai SQL REPL! Type 'help' for help.

sql> show tables; -- list available tables
+---------------+--------------+--------------+------------+
| table_catalog | table_schema | table_name   | table_type |
+---------------+--------------+--------------+------------+
| spice         | runtime      | task_history | BASE TABLE |
| spice         | public       | taxi_trips   | BASE TABLE |
+---------------+--------------+--------------+------------+

Time: 0.010767 seconds. 2 rows.

sql> select count(*) from taxi_trips;
+----------+
| count(*) |
+----------+
| 2964624  |
+----------+

Understanding what just happened  

In that quickstart, you: 

  1. Configured a dataset (taxi_trips) pointing to a remote S3 bucket 
  2. Started the Spice runtime, which connected to that source 
  3. Queried the data using standard SQL - without moving or copying it 

Spice.ai Cloud Platform 

You can run the same Spicepod configuration in Spice.ai Cloud, the fully managed version of Spice that extends the open-source runtime with enterprise capabilities: built-in observability, elastic scaling, and team collaboration. 

Core Concepts 

1. Federation 

In the quickstart, you queried taxi_trips stored in a remote S3 bucket using standard SQL without copying or moving that data. That's federation in action - querying data where it lives, not where you've moved it to. 

This is foundational to Spice's architecture. Federation in Spice enables you to query data across multiple heterogeneous sources using a single SQL interface, without moving data or building ETL pipelines. 

Traditional approaches force you to build ETL pipelines that extract data from these sources, transform it, and load it into a centralized database or warehouse. Every new data source means building and maintaining another pipeline. 

Spice connects directly to your existing data sources and provides a unified SQL interface across all of them. You configure datasets declaratively in YAML, and Spice handles the connection, query translation, and result aggregation.  

Spice supports query federation across: 

  • Databases: PostgreSQL, MySQL, Microsoft SQL Server, Oracle, MongoDB, ClickHouse, DynamoDB, ScyllaDB 
  • Data Warehouses: Snowflake, Databricks, BigQuery 
  • Data Lakes: S3, Azure Blob Storage, Delta Lake, Apache Iceberg 
  • Other Sources: GitHub, GraphQL, FTP/SFTP, IMAP, Kafka, HTTP/API, and 30+ more connectors 

Figure 3. Spice Federation (and acceleration) architecture

How it works 

When you configure multiple datasets from different sources, Spice's query planner (built on Apache DataFusion) optimizes and routes queries appropriately: 

datasets:
  # From PostgreSQL
  - from: postgres:customers
    name: customers
    params:
      pg_host: db.example.com
      pg_user: ${secrets:PG_USER}

  # From S3 Parquet files
  - from: s3://bucket/orders/
    name: orders
    params:
      file_format: parquet

  # From Snowflake
  - from: snowflake:analytics.sales
    name: sales

-- Query across all three sources in one statement
SELECT c.name, o.order_total, s.region
FROM customers c
  JOIN orders o ON c.id = o.customer_id
  JOIN sales s ON o.id = s.order_id
WHERE s.region = 'EMEA';

Without additional configuration, each query fetches data directly from the underlying sources. Spice optimizes this as much as possible using filter pushdown and column projection.  
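
To verify what actually gets pushed down, you can inspect the query plan. EXPLAIN comes from the underlying DataFusion engine, so a sketch like the following should show which filters and projections are sent to each source:

-- Inspect the logical and physical plan, including federated pushdown
EXPLAIN
SELECT c.name, o.order_total
FROM customers c
  JOIN orders o ON c.id = o.customer_id
WHERE o.order_total > 100;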

📚 Docs: Spice Federation and Data Connectors 

2. Acceleration 

Federation solves the data movement problem, but alone often isn't enough for production applications. Querying remote S3 buckets for every request introduces latency - even with query pushdown and optimization, round-trips to distributed data sources can take seconds (or tens of seconds) for large datasets. 

Figure 4. Acceleration example in a Spice sidecar architecture

Spice data acceleration materializes working sets of data locally, reducing query latency from seconds to milliseconds. When enabled, Spice syncs data from connected sources and stores it in local stores, like DuckDB or Vortex - giving you the speed of local data with the flexibility of federated access. 

You can think of acceleration as an intelligent caching layer that understands your data access patterns. Hot data gets materialized locally for instant access and cold data remains federated. Unlike traditional caches that just store query results or static database materializations, Spice accelerates entire datasets with configurable refresh strategies, with the flexible compute of an embedded database.  

Acceleration Engines 

| Engine | Mode | Best For |
|---|---|---|
| Arrow | In-memory only | Ultra-fast analytical queries, ephemeral workloads |
| DuckDB | Memory or file | General-purpose OLAP, medium datasets, persistent storage |
| SQLite | Memory or file | Row-oriented lookups, OLTP patterns, lightweight deployments |
| Cayenne | File only | High-volume multi-file workloads, terabyte-scale data |

To enable acceleration, add the acceleration block to your dataset configuration: 

datasets: 
  - from: s3://data-lake/events/ 
    name: events 
    acceleration: 
      enabled: true 
      engine: cayenne       # Choose your engine 
      mode: file            # 'memory' or 'file' 
  

With this configuration, Spice fetches the events dataset from S3 and stores it locally in Vortex files using the Spice Cayenne engine. Queries to events are then served from local disk instead of making remote calls to S3.  
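
The query interface doesn't change with acceleration enabled - the same SQL is simply served from the local store. A minimal sketch, with hypothetical event_type and event_time columns:

SELECT event_type, COUNT(*) AS event_count
FROM events                                    -- now served from local disk
WHERE event_time > NOW() - INTERVAL '1 hour'   -- hypothetical time column
GROUP BY event_type;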

Figure 5. Spice Cayenne architecture

While DuckDB and SQLite are general purpose engines, Spice Cayenne is purpose-built for modern data lake workloads. It's built on Vortex - a next-generation columnar format under the Linux Foundation - designed for the scale and access patterns of object storage. 

Learn more: Introducing the Spice Cayenne Data Accelerator  

📚 Docs: Data Accelerators 

Refresh Modes 

Spice offers multiple strategies for keeping accelerated data synchronized with sources: 

| Mode | Description | Use Case |
|---|---|---|
| full | Complete dataset replacement on each refresh | Small, slowly-changing datasets |
| append (batch) | Adds new records based on a time column | Append-only logs, time-series data |
| append (stream) | Continuous streaming without a time column | Real-time event streams |
| changes | CDC-based incremental updates via Debezium or DynamoDB | Frequently updated transactional data |
| caching | Request-based row-level caching | API responses, HTTP endpoints |

# Full refresh every 8 hours
acceleration:
  refresh_mode: full
  refresh_check_interval: 8h

# Append mode: check for new records from the last day every 10 minutes
acceleration:
  refresh_mode: append
  time_column: created_at
  refresh_check_interval: 10m
  refresh_data_window: 1d

# Continuous ingestion using Kafka
acceleration:
  refresh_mode: append

# CDC with Debezium or DynamoDB Streams
acceleration:
  refresh_mode: changes

📚 Docs: Refresh Modes 

Retention Policies

While refresh modes control how acceleration is populated, retention policies prevent unbounded growth. As data continuously flows into an accelerated dataset—especially in append or streaming modes—storage can grow indefinitely. Retention policies automatically evict stale data using time-based or custom SQL strategies. 
 
Retention is particularly useful for time-series workloads like logs, metrics, and event streams where only recent data is relevant for queries. For example, an application monitoring dashboard might only need the last 7 days of logs for troubleshooting, while a real-time analytics pipeline processing IoT sensor data might retain just 24 hours of readings. By defining retention policies, you ensure accelerated datasets stay bounded and performant without manual intervention. 
 
Spice supports two retention strategies: time-based, which removes records older than a specified period, and custom SQL-based, which executes arbitrary DELETE statements for more complex eviction logic. Once defined, Spice runs retention checks automatically at the configured interval: 

acceleration:
  # Common retention parameters
  retention_check_enabled: true
  retention_check_interval: 1h
  # Time-based retention policy
  retention_period: 7d
  # Custom SQL-based retention policy
  retention_sql: "DELETE FROM logs WHERE status = 'archived'"

📚 Docs: Retention 

Constraints and Indexes 

Accelerated datasets support primary key constraints and indexes for optimized query performance and data integrity: 

datasets: 
  - from: postgres:orders 
    name: orders 
    acceleration: 
      enabled: true 
      engine: duckdb 
      primary_key: order_id # Creates non-null unique index 
      indexes: 
        customer_id: enabled  # Single column index 
        '(created_at, status)': unique  # Multi-column unique index
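
Lookups that filter on an indexed column can then avoid full scans. For example, a point lookup on the indexed customer_id column (values are hypothetical):

-- Served by the customer_id index created above
SELECT order_id, status, created_at
FROM orders
WHERE customer_id = 4837
ORDER BY created_at DESC;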

📚 Docs: Constraints & Indexes 

3. Views 

Views are virtual tables defined by SQL queries - useful for pre-aggregations, transformations, and simplified access patterns: 

views: 
  - name: daily_revenue 
    sql: | 
      SELECT  
        DATE_TRUNC('day', created_at) as day, 
        SUM(amount) as revenue, 
        COUNT(*) as transactions 
      FROM orders 
      GROUP BY 1 
 
  - name: top_customers 
    sql: | 
      SELECT customer_id, SUM(total) as lifetime_value 
      FROM orders 
      GROUP BY customer_id 
      ORDER BY lifetime_value DESC 
      LIMIT 100 
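
Once defined, views are queried like any other table:

SELECT day, revenue, transactions
FROM daily_revenue
ORDER BY day DESC
LIMIT 30;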

📚 Docs: Views 

4. Caching 

Spice provides in-memory caching for SQL query results, search results, and embeddings - all enabled by default. Caching eliminates redundant computation for repeated queries and improves performance for non-accelerated datasets. 

runtime:
  caching:
    sql_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none

    search_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none

    embeddings_results:
      enabled: true
      cache_max_size: 128MiB
      eviction_policy: lru
      item_ttl: 1s
      encoding: none

| Option | Description | Default |
|---|---|---|
| cache_max_size | Maximum cache storage | 128 MiB |
| item_ttl | Entry expiration duration | 1 second |
| eviction_policy | `lru` (least-recently-used) or `tiny_lfu` | lru |
| encoding | Compression: `zstd` or `none` | none |

Spice also supports HTTP cache-control headers (no-cache, max-stale, only-if-cached) for fine-grained control over caching behavior per request. 
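
For example, a single request can bypass the cache entirely - a sketch assuming the /v1/sql HTTP endpoint on the default port:

curl -XPOST "http://localhost:8090/v1/sql" \
  -H "Cache-Control: no-cache" \
  --data "SELECT COUNT(*) FROM taxi_trips"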

📚 Docs: Results Caching 

5. Snapshots 

Snapshots allow file-based acceleration engines (DuckDB, SQLite, or Cayenne) to bootstrap from pre-stored snapshots in object storage. This dramatically reduces cold-start latency in distributed deployments. 

snapshots: 
  enabled: true 
  location: s3://large_table_snapshots 
 
datasets: 
  - from: postgres:large_table 
    name: large_table 
    acceleration: 
      engine: duckdb 
      mode: file 
      snapshots: enabled 

Snapshot triggers vary by refresh mode: 

  • refresh_complete: Creates snapshots after each refresh (full and batch-append modes) 
  • time_interval: Creates snapshots on a fixed schedule (all refresh modes) 
  • stream_batches: Creates snapshots after every N batches (streaming modes: Kafka, Debezium, DynamoDB Streams) 

📚 Docs: Snapshots 

6. Models 

AI is a first-class capability in the Spice runtime - not a bolt-on integration. Instead of wiring external APIs, you call LLMs directly from SQL queries using the `ai()` function. Embeddings generate automatically during data ingestion, eliminating separate pipeline infrastructure. Text-to-SQL is schema-aware with direct data access, preventing the hallucinations common in external tools that don’t understand your table structure.  

This SQL-first approach means you can query your federated and accelerated data, pipe results to an LLM for analysis, and get synthesized answers in a single SQL statement.  
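
For example, federated data can be piped to a model inline. A minimal sketch, assuming the ai() function accepts a prompt string and that orders has a hypothetical order_details column:

-- Summarize recent orders with an LLM, in one statement
SELECT ai(CONCAT('Summarize this order in one sentence: ', order_details)) AS summary
FROM orders
LIMIT 5;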

You can connect to hosted providers (OpenAI, Anthropic, Bedrock) or serve models locally with GPU acceleration. Spice provides an OpenAI-compatible AI Gateway, so existing applications using OpenAI SDKs can swap endpoints without code changes. 
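
Because the gateway speaks the OpenAI API, existing clients only need a new base URL. A sketch against a locally running runtime, using the gpt4 model defined below:

curl -XPOST "http://localhost:8090/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4",
    "messages": [{"role": "user", "content": "How many orders were placed last month?"}]
  }'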

Chat Models 

Connect to hosted models or serve locally: 

models: 
  - name: gpt4 
    from: openai:gpt-4o 
    params: 
      openai_api_key: ${secrets:OPENAI_API_KEY} 
      tools: auto    # Enable tool use 
 
  - name: claude 
    from: anthropic:claude-3-5-sonnet 
    params: 
      anthropic_api_key: ${secrets:ANTHROPIC_KEY} 
 
  - name: local_llama 
    from: huggingface:huggingface.co/meta-llama/Llama-3.1-8B 
  

Use via the OpenAI-compatible API or the spice chat CLI: 

$ spice chat
Using model: gpt4
chat> How many orders were placed last month?
Based on the orders table, there were 15,234 orders placed last month.

NSQL (Text-to-SQL) 

The /v1/nsql endpoint converts natural language to SQL and executes it: 

curl -XPOST "http://localhost:8090/v1/nsql" \ 
  -H "Content-Type: application/json" \ 
  -d '{"query": "What was the highest tip any passenger gave?"}'

Spice uses tools like table_schema, random_sample, and sample_distinct_columns to help models write accurate, contextual SQL. 

Embeddings 

Transform text into vectors for similarity search. These embeddings power the vector search capabilities covered in the 'search' section coming up next: 

embeddings: 
  - name: openai_embed 
    from: openai:text-embedding-3-small 
    params: 
      openai_api_key: ${secrets:OPENAI_API_KEY} 
 
  - name: bedrock_titan 
    from: bedrock:amazon.titan-embed-text-v2:0 
    params: 
      aws_region: us-east-1 
 
  - name: local_minilm 
    from: huggingface:sentence-transformers/all-MiniLM-L6-v2 
  

Configure columns for automatic embedding generation: 

datasets: 
  - from: postgres:documents 
    name: documents 
    acceleration: 
      enabled: true 
      columns: 
        - name: content 
          embeddings: 
            - from: openai_embed 
              chunking: 
                enabled: true 
                target_chunk_size: 512

📚 Docs: Models & Embeddings 

7. Search 

In the previous section, we configured embeddings to generate automatically during data ingestion. Those embeddings enable vector search - one of three search methods Spice provides as native SQL functions. 

Spice takes the same integrated approach with search as it does with AI. Search indexes are built on top of accelerated datasets - the same data you're querying and piping to LLMs. Full-text search uses Tantivy with BM25 scoring for keyword matching. Vector search uses the embeddings you've already configured to generate during ingestion. Hybrid search combines both methods with Reciprocal Rank Fusion (RRF) to merge rankings - all via SQL functions like `text_search()`, `vector_search()`, and `rrf()`. Search in Spice powers retrieval-augmented generation (RAG), recommendation systems, and content discovery: 

| Method | Best For | How It Works |
|---|---|---|
| Full-Text Search | Keyword matching, exact phrases | BM25 scoring via Tantivy |
| Vector Search | Semantic similarity, meaning-based retrieval | Embedding distance calculation |
| Hybrid Search | Queries with both keywords and semantic similarity | Hybrid execution and ranking through Reciprocal Rank Fusion (RRF) |

Full-Text Search 

Full-text search performs keyword-driven retrieval optimized for text data. Powered by Tantivy with BM25 scoring, it excels at finding exact phrases, specific terms, and keyword combinations. Enable it by indexing the columns you want to search: 

datasets: 
  - from: postgres:articles 
    name: articles 
    acceleration: 
      enabled: true 
      columns: 
        - name: title 
          full_text_search: enabled 
        - name: body 
          full_text_search: enabled 

SELECT * FROM text_search(articles, 'machine learning', 10);

Vector Search 

Vector search uses embeddings to find documents based on semantic similarity rather than exact keyword matches. This is particularly useful when users search with different wording than the source content—a query for "how to fix login issues" can match documents about "authentication troubleshooting." 
 
Spice supports both local embedding models (like sentence-transformers from Hugging Face) and remote providers (OpenAI, Anthropic, etc.). Embeddings are configured as top-level components and referenced in dataset columns:

datasets: 
  - from: s3://docs/ 
    name: documents 
    vectors: 
      enabled: true 
      columns: 
        - name: body 
          embeddings: 
            - from: openai_embed 

SELECT * FROM vector_search(documents, 'How do I reset my password?', 10) 
WHERE category = 'support' 
ORDER BY score;

Vector search is also available via the `/v1/search` HTTP API for direct integration with applications. 
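
A sketch of calling it directly - the exact request schema here is an assumption, so check the docs for the current fields:

curl -XPOST "http://localhost:8090/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"text": "How do I reset my password?", "limit": 10}'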

Hybrid Search with RRF 

Neither vector nor full-text search alone produces optimal results for every query. A search for "Python error 403" benefits from both semantic understanding ("error" relates to "exception," "failure") and exact keyword matching ("403," "Python"). Hybrid search combines results from multiple search methods using Reciprocal Rank Fusion (RRF), merging rankings to improve relevance across diverse content types: 

SELECT * FROM rrf( 
  vector_search(docs, 'query', 10), 
  text_search(docs, 'query', 10) 
) LIMIT 10;

📚 Docs: Search & Vector Search 

8. Writing Data 

Spice supports writing to Apache Iceberg tables and Amazon S3 Tables via standard INSERT INTO statements. 

Apache Iceberg Writes 

catalogs: 
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456/namespaces 
    name: ice 
    access: read_write 
 
datasets: 
  - from: iceberg:https://catalog.example.com/v1/namespaces/sales/tables/transactions 
    name: transactions 
    access: read_write

-- Insert from another table 
INSERT INTO transactions 
SELECT * FROM staging_transactions; 
 
-- Insert with values 
INSERT INTO transactions (id, amount, timestamp) 
VALUES (1001, 299.99, '2025-01-15'); 
 
-- Insert into catalog table 
INSERT INTO ice.sales.orders 
SELECT * FROM federated_orders; 
  
Amazon S3 Tables 

Spice offers full read/write capability for Amazon S3 Tables, enabling direct integration with AWS's managed table format for S3: 

datasets: 
  - from: glue:my_namespace.my_table 
    name: my_table 
    params: 
      glue_region: us-east-1 
      glue_catalog_id: 123456789012:s3tablescatalog/my-bucket 
    access: read_write 

Note: Write support requires access: read_write configuration.
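
With that in place, writes use the same INSERT INTO syntax shown above (columns here are hypothetical):

INSERT INTO my_table (id, name, updated_at)
VALUES (1, 'widget', '2025-01-15');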

📚 Docs: Write-Capable Connectors 

Deployment 

Spice is designed for deployment flexibility and optionality - from edge devices to multi-node distributed clusters. It ships as a single ~140 MB binary with no external dependencies beyond your configured data sources.  

This portability means you can deploy the same Spicepod configuration on a Raspberry Pi at the edge, as a sidecar in your Kubernetes cluster, or as a fully-managed cloud service - without code changes: 

| Deployment Model | Description | Best For |
|---|---|---|
| Standalone | Single instance via Docker or binary | Development, edge devices, simple workloads |
| Sidecar | Co-located with your application pod | Low-latency access, microservices architectures |
| Microservice | Multiple replicas deployed behind a load balancer | Loosely coupled architectures, heavy or varying traffic |
| Cluster | Distributed multi-node deployment | Large-scale data, horizontal scaling, fault tolerance |
| Sharded | Horizontal data partitioning across multiple instances | Large-scale data, distributed query execution |
| Tiered | Hybrid approach combining a sidecar for performance with a shared microservice for batch processing | Varying requirements across different application components |
| Cloud | Fully-managed cloud platform | Auto-scaling, built-in observability, zero operational overhead |
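
As an illustration, the standalone model can run the quickstart spicepod with Docker - a sketch assuming the official spiceai/spiceai image and default ports (8090 for HTTP, 50051 for Arrow Flight):

docker run --rm \
  -p 8090:8090 -p 50051:50051 \
  -v "$(pwd)/spicepod.yaml:/app/spicepod.yaml" \
  spiceai/spiceai:latest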

Putting it all together 

Spice makes data fast, federated, and AI-ready - through configuration, not code. The flexibility of this architecture means you can start simple and evolve incrementally.  

| Concept | Purpose |
|---|---|
| Federation | Query 30+ sources with unified SQL |
| Acceleration | Materialize data locally for sub-second queries |
| Views | Virtual tables from SQL transformations |
| Caching | In-memory caching of SQL, search, and embedding results |
| Snapshots | Fast cold-start from object storage |
| Models | Chat, NSQL, and embeddings via OpenAI-compatible API |
| Search | Full-text and vector search integrated in SQL |
| Writes | INSERT INTO for Iceberg and Amazon S3 Tables |

What can you build with Spice? 

| Use Case | How Spice Helps |
|---|---|
| Operational Data Lakehouse | Serve real-time operational workloads and AI agents directly from Apache Iceberg, Delta Lake, or Parquet with sub-second query latency. Spice federates across object storage and databases, accelerates datasets locally, and integrates hybrid search and LLM inference - eliminating separate systems for operational access. |
| Data Lake Accelerator | Accelerate data lake queries from seconds to milliseconds by materializing frequently-accessed datasets in local engines. Maintain the scale and cost efficiency of object storage while delivering operational-grade query performance with configurable refresh policies. |
| Data Mesh | Unified SQL access across distributed data sources with automatic performance optimization |
| Enterprise Search | Combine semantic and full-text search across structured and unstructured data |
| RAG Pipelines | Merge federated data with vector search and LLMs for context-aware AI applications |
| Real-Time Analytics | Stream data from Kafka or DynamoDB with sub-second latency into accelerated tables |
| Agentic AI | Build autonomous agents with tool-augmented LLMs and fast access to operational data |

Whether you're replacing complex ETL pipelines, building AI-powered applications, or deploying intelligent agents at the edge—Spice provides the primitives to deliver fast, context-aware access to data wherever it lives. 

📚 Docs: Use Cases 

Next steps  

Now that you have a mental model for Spice, check out the cookbook recipes for 80+ examples, the GitHub repo, the full docs, and join us on Slack to connect directly with the team and other Spice users.

And, remember these principles: 

  1. Spice is a runtime, not a database: It federates across your existing data infrastructure 
  2. Configuration over code: Declarative YAML replaces custom integration code 
  3. Acceleration is optional but powerful: Start with federation, add acceleration for latency-sensitive use cases  
  4. Composable primitives: Federation + Acceleration + Search + LLM Models work together 
  5. SQL-first: Everything accessible through standard SQL queries 
