Learn Data & AI
Understand the core technologies behind modern data and AI infrastructure. Each guide explains a key concept in depth -- how it works, when to use it, and how it connects to the broader data stack.
Data Infrastructure
What is SQL Federation?
Query multiple databases with a single SQL statement without moving data. Learn how federated queries work, predicate pushdown, and common use cases.
Read the guide
What is Data Virtualization?
Access and combine data from multiple sources through a unified interface without replication. Learn how it compares to ETL and when to use it.
Read the guide
What is Data Acceleration?
Cache frequently accessed data locally for sub-second queries while keeping it fresh with change data capture (CDC). Learn acceleration strategies and when to use them.
Read the guide
What is Change Data Capture?
Track row-level database changes and stream them in real time. Learn log-based, trigger-based, and polling patterns for real-time data pipelines.
Read the guide
Search
What is Hybrid Search?
Combine vector similarity with keyword matching for more accurate results. Learn about reciprocal rank fusion (RRF), score fusion, and why hybrid search matters for retrieval-augmented generation (RAG).
Read the guide
What is BM25 Full-Text Search?
The de facto standard ranking function for full-text search. Learn how BM25 scores documents, how inverted indexes work, and when to complement keyword search with vector search.
Read the guide
What is Vector Search?
Find semantically similar content by comparing vector embeddings. Learn about approximate nearest neighbor (ANN) algorithms, distance metrics, and vector indexes.
Read the guide
AI & LLMs
What is RAG?
Retrieval-augmented generation (RAG) grounds LLM responses in real data at inference time. Learn the three-stage pipeline, production challenges, and hybrid search integration.
Read the guide
What is LLM Inference?
Understand how large language models generate responses. Learn about tokenization, KV caching, quantization, and latency optimization.
Read the guide
What is LLM Tool Calling?
LLMs output structured function calls instead of text to interact with external tools. Learn the tool calling loop, security considerations, and MCP.
Read the guide
What is the Model Context Protocol?
MCP standardizes how AI models discover and invoke external tools and data. Learn the client-server architecture and how gateways enable enterprise AI.
Read the guide
What is Text-to-SQL?
Translate natural language questions into SQL queries using LLMs. Learn about NSQL, schema-aware generation, and production safeguards.
Read the guide
What are Embeddings?
Dense vector representations that capture semantic meaning. Learn how embedding models work, how they enable semantic search and RAG, and how to choose the right model.
Read the guide
Open-Source Technologies
What is Apache DataFusion?
An extensible SQL query engine written in Rust. Learn the architecture, how it compares to DuckDB and Trino, and how Spice extends it.
Read the guide
What is Apache Ballista?
A distributed SQL query engine that scales DataFusion across multiple nodes. Learn the scheduler-executor architecture and how it compares to Spark.
Read the guide
What is Vortex?
A compressed columnar file format with adaptive encoding for fast analytical queries. Learn how it compares to Parquet and powers Spice Cayenne.
Read the guide
Comparisons
SQL Federation vs ETL
Query data in place or move it to a warehouse? Compare federation and ETL across latency, freshness, cost, and operational complexity.
Read the guide
Full-Text Search vs Vector Search
Keyword matching or semantic similarity? Compare BM25 and vector search across accuracy, performance, and use cases -- and learn when hybrid search wins.
Read the guide
RAG vs Fine-Tuning
Ground LLM responses with retrieved data or train the model directly? Compare RAG and fine-tuning across cost, freshness, accuracy, and implementation effort.
Read the guide
Data Virtualization vs Data Replication
Access data virtually or replicate it physically? Compare virtualization and replication across latency, consistency, cost, and when to combine both.
Read the guide
Sidecar vs Microservice Architecture
Deploy a data runtime alongside your app or as a shared service? Compare sidecar and microservice architectures across latency, scaling, resource usage, and operational complexity.
Read the guide
Architecture
What is a Hybrid Data Architecture?
Combine sidecar caching with a centralized cluster for sub-millisecond reads and centralized data management. Learn the CDN-for-data pattern.
Read the guide
See Spice in action
Get a guided walkthrough of how development teams use Spice to query, accelerate, and integrate AI for mission-critical workloads.
Get a demo