Learn Data & AI

Understand the core technologies behind modern data and AI infrastructure. Each guide explains a key concept in depth -- how it works, when to use it, and how it connects to the broader data stack.

Data Infrastructure

SQL Federation

What is SQL Federation?

Query multiple databases with a single SQL statement without moving data. Learn how federated queries and predicate pushdown work, and explore common use cases.

Read the guide

Data Virtualization

What is Data Virtualization?

Access and combine data from multiple sources through a unified interface without replication. Learn how it compares to ETL and when to use it.

Read the guide

Data Acceleration

What is Data Acceleration?

Cache frequently accessed data locally for sub-second queries while keeping it fresh with CDC. Learn acceleration strategies and when to use them.

Read the guide

Change Data Capture

What is Change Data Capture?

Track row-level database changes and stream them in real time. Learn log-based, trigger-based, and polling patterns for real-time data pipelines.

Read the guide

Search

Hybrid Search

What is Hybrid Search?

Combine vector similarity with keyword matching for more accurate results. Learn about RRF, score fusion, and why hybrid search matters for RAG.

Read the guide
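As a rough illustration of the reciprocal rank fusion (RRF) mentioned in the Hybrid Search entry above, here is a minimal sketch. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not taken from the guide itself.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (BM25) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF uses only rank positions, it does not need the keyword and vector scores to be on a comparable scale, which is one reason it is a popular fusion choice.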

BM25 Full-Text Search

What is BM25 Full-Text Search?

The standard ranking function for full-text search. Learn how BM25 scores documents, how inverted indexes work, and when keyword search should be paired with vector search.

Read the guide
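To make the BM25 scoring mentioned in the entry above concrete, here is a minimal sketch over pre-tokenized documents. It assumes the widely used parameters `k1=1.2` and `b=0.75`, a non-negative IDF variant, and a non-empty corpus; the function name and sample documents are hypothetical, and a real engine would look terms up in an inverted index rather than scan every document.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: rarer terms get a larger weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical corpus, already split into lowercase tokens.
docs = [
    ["federated", "sql", "query"],
    ["vector", "search", "index"],
    ["sql", "query", "engine", "sql"],
]
scores = bm25_scores(["sql"], docs)
print(scores)
```

In this toy corpus the third document ranks highest for "sql" because the term appears twice, even after the length normalization penalizes it for being longer than average.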

Vector Search

What is Vector Search?

Find semantically similar content by comparing vector embeddings. Learn about ANN algorithms, distance metrics, and vector indexes.

Read the guide
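As a small illustration of the distance metrics mentioned in the Vector Search entry above, the sketch below ranks vectors by cosine similarity using an exact brute-force scan. ANN algorithms approximate this result without comparing against every vector. The two-dimensional "embeddings" and all names are made up for illustration, and the code assumes no zero-length vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the names of the k vectors most similar to `query` (exact search)."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 2-D embeddings; real embedding models produce hundreds of dimensions.
vectors = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "car": [0.0, 1.0],
}
nearest = top_k([1.0, 0.0], vectors, k=2)
print(nearest)
```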

AI & LLMs

Retrieval Augmented Generation

What is RAG?

Retrieval-augmented generation grounds LLM responses in real data at inference time. Learn the three-stage pipeline, production challenges, and hybrid search integration.

Read the guide

LLM Inference

What is LLM Inference?

Understand how large language models generate responses. Learn about tokenization, KV caching, quantization, and latency optimization.

Read the guide

LLM Tool Calling

What is LLM Tool Calling?

LLMs output structured function calls instead of free-form text to interact with external tools. Learn the tool calling loop, security considerations, and MCP.

Read the guide

Model Context Protocol

What is the Model Context Protocol?

MCP standardizes how AI models discover and invoke external tools and data. Learn the client-server architecture and how gateways enable enterprise AI.

Read the guide

Text-to-SQL

What is Text-to-SQL?

Translate natural language questions into SQL queries using LLMs. Learn about NSQL, schema-aware generation, and production safeguards.

Read the guide

Embeddings

What are Embeddings?

Dense vector representations that capture semantic meaning. Learn how embedding models work, how they enable semantic search and RAG, and how to choose the right model.

Read the guide

Open-Source Technologies

Apache DataFusion

What is Apache DataFusion?

An extensible SQL query engine written in Rust. Learn the architecture, how it compares to DuckDB and Trino, and how Spice extends it.

Read the guide

Apache Ballista

What is Apache Ballista?

A distributed SQL query engine that scales DataFusion across multiple nodes. Learn the scheduler-executor architecture and how it compares to Spark.

Read the guide

Vortex

What is Vortex?

A compressed columnar file format with adaptive encoding for fast analytical queries. Learn how it compares to Parquet and powers Spice Cayenne.

Read the guide

Comparisons

SQL Federation vs ETL

Query data in place or move it to a warehouse? Compare federation and ETL across latency, freshness, cost, and operational complexity.

Read the guide

Full-Text Search vs Vector Search

Keyword matching or semantic similarity? Compare BM25 and vector search across accuracy, performance, and use cases -- and learn when hybrid search wins.

Read the guide

RAG vs Fine-Tuning

Ground LLM responses with retrieved data or train the model directly? Compare RAG and fine-tuning across cost, freshness, accuracy, and implementation effort.

Read the guide

Data Virtualization vs Data Replication

Access data virtually or replicate it physically? Compare virtualization and replication across latency, consistency, cost, and when to combine both.

Read the guide

Sidecar vs Microservice Architecture

Deploy a data runtime alongside your app or as a shared service? Compare sidecar and microservice architectures across latency, scaling, resource usage, and operational complexity.

Read the guide

Architecture

Hybrid Data Architecture

What is a Hybrid Data Architecture?

Combine sidecar caching with a centralized cluster for sub-millisecond reads and centralized data management. Learn the CDN-for-data pattern.

Read the guide
