Spice.ai Demo App

This is a Spice.ai data and AI app.

Prerequisites

  • Spice.ai CLI installed
  • OpenAI API key
  • Hugging Face API token (optional, for the Llama model)
  • curl and jq for API calls
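
The spicepod in this demo references keys through the Spice secrets store (e.g. ${ secrets:SPICE_OPENAI_API_KEY }). A minimal setup sketch, assuming the default environment-variable secrets backend resolves these references to variables of the same name (check the secrets docs for your Spice version); the values are placeholders:

  # Export keys before running `spice run` so the secrets store can resolve them.
  export SPICE_OPENAI_API_KEY="<your-openai-key>"
  export SPICE_HUGGINGFACE_API_KEY="<your-hf-token>"   # optional, for the Llama model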

Learn More

To learn more about Spice.ai, take a look at the following resources:

  • Spice.ai - learn about Spice.ai features, data, and API.
  • Get started with Spice.ai - try out the API and make basic queries.

Connect with us on Discord - your feedback is appreciated!


Demo Steps

Publishing a Spice App in the Cloud

Step 1: Forking and Using the Dataset

  1. Fork the repository https://github.com/jeadie/evals into your GitHub org.
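
If you prefer the command line over the GitHub web UI, the fork can also be created with the GitHub CLI. A sketch, assuming gh is installed and authenticated; <your-org> is a placeholder:

  # Fork the evals repo into your organization without cloning it locally.
  gh repo fork jeadie/evals --org <your-org> --clone=false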

Step 2: Creating a New App in the Cloud

  1. Log into the Spice.ai Cloud Platform and create a new app called evals. The app will start empty.
  2. Connect the app to your repository:
    • Go to the App Settings tab and select Connect Repository.
    • If the repository is not yet linked, follow the prompts to authenticate and link it.

Step 3: Deploying the App

  1. Set the app to Public:
    • Navigate to the app's settings and toggle the visibility to public.
  2. Redeploy the app:
    • Click Redeploy to load the datasets and configurations from the repository.

Step 4: Verifying and Testing

  1. Check the datasets in the Spice.ai Cloud:
    • Verify that the datasets are correctly loaded and accessible.
  2. Test public access:
    • Log in with a different account to confirm the app is accessible to external users.
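
To sanity-check access from outside the UI, the app can also be queried over HTTP. A hedged sketch, assuming Spice.ai Cloud's SQL endpoint at data.spiceai.io and an X-API-Key header (confirm the endpoint and auth header in the current API docs); YOUR_API_KEY is a placeholder for a key from the app's settings:

  # Smoke test: a trivial query against the deployed app.
  curl -XPOST "https://data.spiceai.io/v1/sql" \
    -H "Content-Type: text/plain" \
    -H "X-API-Key: YOUR_API_KEY" \
    -d "SELECT 1" | jq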

Initializing a Local Spice App

  1. Initialize a new local Spice app:

     mkdir demo
     cd demo
     spice init

  2. Log in to Spice.ai Cloud:

     spice login

  3. Get the spicepod from Spicerack: navigate to spicerack.org, search for evals, click on /evals, click Use this app, and copy the spice connect command. Paste the command into the terminal:

     spice connect <username>/evals

     The spicepod.yml should now be updated to:

     version: v1beta1
     kind: Spicepod
     name: demo

     dependencies:
       - Jeadie/evals

  4. Add a model to the spicepod:

     models:
       - name: gpt-4o
         from: openai:gpt-4o
         params:
           openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }

  5. Start spice:

     spice run

  6. Run an eval:

     curl -XPOST "http://localhost:8090/v1/evals/taxes" \
       -H "Content-Type: application/json" \
       -d '{"model": "gpt-4o"}' | jq

  7. Explore incorrect results in the SQL REPL:

     spice sql

     SELECT
       input,
       output,
       actual
     FROM eval.results
     WHERE value = 0.0
     LIMIT 5;
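
For a quick failure count before inspecting individual rows, the same table can also be queried over the runtime's HTTP interface. A minimal sketch, assuming the runtime exposes a POST /v1/sql endpoint that accepts a plain-text SQL body on the same port as the evals API used above (verify against the Spice docs for your version):

  # Count passing vs. failing eval results.
  curl -XPOST "http://localhost:8090/v1/sql" \
    -H "Content-Type: text/plain" \
    -d "SELECT value, COUNT(*) AS n FROM eval.results GROUP BY value;" | jq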


Optional: Create an Eval to Use a Smaller Model

  1. Track the outputs of all AI model calls by enabling task history capture in spicepod.yml:

     runtime:
       task_history:
         captured_output: truncated

  2. Define a new view and evaluation:

     views:
       - name: user_queries
         sql: |
           SELECT
             json_get_json(input, 'messages') AS input,
             json_get_str((captured_output -> 0), 'content') AS ideal
           FROM runtime.task_history
           WHERE task = 'ai_completion'
       - name: latest_eval_runs
         sql: |
           SELECT model, MAX(created_at) AS latest_run
           FROM eval.runs
           GROUP BY model
       - name: model_stats
         sql: |
           SELECT
             r.model,
             COUNT(*) AS total_queries,
             SUM(CASE WHEN res.value = 1.0 THEN 1 ELSE 0 END) AS correct_answers,
             AVG(res.value) AS accuracy
           FROM eval.runs r
           JOIN latest_eval_runs lr ON r.model = lr.model AND r.created_at = lr.latest_run
           JOIN eval.results res ON res.run_id = r.id
           GROUP BY r.model

     evals:
       - name: mimic-user-queries
         description: |
           Evaluates how well a model can copy the exact answers already returned to a user. Useful for testing if a smaller/cheaper model is sufficient.
         dataset: user_queries
         scorers:
           - match

  3. Add a smaller model to the spicepod:

     models:
       - name: llama3
         from: huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct
         params:
           hf_token: ${ secrets:SPICE_HUGGINGFACE_API_KEY }

       - name: gpt-4o # Keep the previous model.

  4. Verify models are loaded:

     spice models

     You should see both models listed:

     NAME    FROM                                                          STATUS
     gpt-4o  openai:gpt-4o                                                 ready
     llama3  huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct  ready

  5. Restart the Spice app:

     spice run

  6. Test the larger model or run another eval:

     spice chat

  7. Run evaluations on both models:

     # Run eval with GPT-4o
     curl -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
       -H "Content-Type: application/json" \
       -d '{"model": "gpt-4o"}' | jq

     # Run eval with Llama
     curl -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
       -H "Content-Type: application/json" \
       -d '{"model": "llama3"}' | jq

  8. Compare model performance:

     spice sql

     SELECT
       model,
       total_queries,
       correct_answers,
       ROUND(accuracy * 100, 2) AS accuracy_percentage
     FROM model_stats
     ORDER BY accuracy_percentage DESC;

     This query will show:

     • Total number of queries processed
     • Number of correct answers
     • Accuracy as a percentage

     You can use these metrics to decide if the smaller model provides acceptable performance for your use case.
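
If you iterate on more than two models, the two calls in step 7 generalize to a small shell loop. This uses the same eval endpoint shown above; the model names must match those configured in spicepod.yml:

  # Run the same eval against each configured model in turn.
  for model in gpt-4o llama3; do
    echo "== ${model} =="
    curl -s -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
      -H "Content-Type: application/json" \
      -d "{\"model\": \"${model}\"}" | jq
  done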


Full Spicepod Configuration

For reference, the complete spicepod.yml:

version: v1beta1
kind: Spicepod
name: demo

dependencies:
  - Jeadie/evals

runtime:
  task_history:
    captured_output: truncated

views:
  - name: user_queries
    sql: |
      SELECT
        json_get_json(input, 'messages') AS input,
        json_get_str((captured_output -> 0), 'content') AS ideal
      FROM runtime.task_history
      WHERE task = 'ai_completion'

evals:
  - name: mimic-user-queries
    description: |
      Evaluates how well a model can copy the exact answers already returned to a user. Useful for testing if a smaller/cheaper model is sufficient.
    dataset: user_queries
    scorers:
      - match

models:
  - name: gpt-4o
    from: openai:gpt-4o
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }

  - name: llama3
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct
    params:
      hf_token: ${ secrets:SPICE_HUGGINGFACE_API_KEY }
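
As a quick check that completions are being captured for the eval, the user_queries view defined in this configuration can be queried directly, using the same assumed local /v1/sql endpoint as in the earlier sketch:

  # Show a few captured input/ideal pairs feeding the mimic-user-queries eval.
  curl -XPOST "http://localhost:8090/v1/sql" \
    -H "Content-Type: text/plain" \
    -d "SELECT input, ideal FROM user_queries LIMIT 3;" | jq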