Connect with us on Discord - your feedback is appreciated!
Demo Steps
Publishing a Spice App in the Cloud
Step 1: Forking and Using the Dataset
Fork the repository https://github.com/jeadie/evals into your GitHub org.
Step 2: Creating a New App in the Cloud
Log into the Spice.ai Cloud Platform and create a new app called evals. The app will start empty.
Connect the app to your repository:
Go to the App Settings tab and select Connect Repository.
If the repository is not yet linked, follow the prompts to authenticate and link it.
Step 3: Deploying the App
Set the app to Public:
Navigate to the app's settings and toggle the visibility to public.
Redeploy the app:
Click Redeploy to load the datasets and configurations from the repository.
Step 4: Verifying and Testing
Check the datasets in the Spice.ai Cloud:
Verify that the datasets are correctly loaded and accessible.
Test public access:
Log in with a different account to confirm the app is accessible to external users.
Initializing a Local Spice App
Initialize a new local Spice app
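For example, assuming the Spice CLI is installed (the app name `evals-demo` is only an example):

```bash
spice init evals-demo
cd evals-demo
```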
Log in to Spice.ai Cloud
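Authenticate the CLI with your Spice.ai account:

```bash
spice login
```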
Get spicepod from Spicerack
Navigate to spicerack.org, search for evals.
Click on /evals, click on Use this app, and copy the spice connect command.
Paste the command into the terminal.
After running the command, spicepod.yml should be updated with the evals app's configuration.
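The exact contents come from the copied connect command and the evals app. Purely as an illustration, a spicepod that pulls in a Spicerack package as a dependency looks roughly like this (the names are assumptions):

```yaml
version: v1beta1
kind: Spicepod
name: evals-demo
dependencies:
  - jeadie/evals
```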
Add a model to the spicepod
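For example, an OpenAI-hosted model can be added under `models:`; the model choice and the secret reference are assumptions, so adjust them to your setup:

```yaml
models:
  - name: gpt-4o
    from: openai:gpt-4o
    params:
      # Secret reference is an assumption; configure per your secret store
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```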
Start spice
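From the app directory:

```bash
spice run
```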
Run an eval
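One way to trigger an eval is through the runtime's HTTP API. The endpoint, default port, and payload below are assumptions, and `<eval_name>` is whatever eval the evals spicepod defines:

```bash
curl -X POST http://localhost:8090/v1/evals/<eval_name> \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o"}'
```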
Explore incorrect results
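For example, query the eval results with `spice sql`; the table and column names below are assumptions, so adapt them to the schema your runtime actually produces:

```sql
-- Show a sample of answers the model got wrong (schema names are assumptions)
SELECT input, output, expected
FROM eval.results
WHERE score < 1
LIMIT 20;
```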
Optional: Create an Eval to Use a Smaller Model
Track the outputs of all AI model calls:
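One way to do this is to enable task history output capture in spicepod.yml, which records runtime tasks (including AI completions) in the `runtime.task_history` table. The exact keys below are assumptions, so check the runtime configuration reference:

```yaml
runtime:
  task_history:
    captured_output: truncated   # record (truncated) inputs/outputs for each task
```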
Define a new view and evaluation:
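A sketch of what this could look like in spicepod.yml; the view SQL, dataset name, eval fields, and scorer are assumptions meant to illustrate the shape:

```yaml
views:
  - name: model_outputs
    sql: |
      SELECT input, captured_output
      FROM runtime.task_history
      WHERE task = 'ai_completion'

evals:
  - name: accuracy_eval
    description: Check model answers against expected outputs
    dataset: eval_questions   # hypothetical dataset with input/ideal columns
    scorers:
      - match
```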
Add a smaller model to the spicepod:
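Extend the existing `models:` list so both models are loaded side by side (the specific smaller model is an assumption):

```yaml
models:
  - name: gpt-4o
    from: openai:gpt-4o
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
  - name: gpt-4o-mini            # the smaller model to compare against
    from: openai:gpt-4o-mini
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
```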
Verify models are loaded:
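One way is to list the loaded models over the runtime's HTTP API (the default port 8090 is an assumption):

```bash
curl http://localhost:8090/v1/models
```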
You should see both models listed in the response.
Restart the Spice app:
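Stop the running instance and start it again so the new model definition is picked up:

```bash
# Ctrl+C the running instance, then:
spice run
```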
Test the larger model or run another eval:
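For an interactive check, `spice chat` can be used; the `--model` flag is an assumption, so check `spice chat --help` for the exact option:

```bash
spice chat --model gpt-4o
```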
Run evaluations on both models:
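Run the same eval once per model so the results are directly comparable (endpoint, payload, and names follow the earlier assumptions):

```bash
curl -X POST http://localhost:8090/v1/evals/accuracy_eval \
  -H "Content-Type: application/json" -d '{"model": "gpt-4o"}'

curl -X POST http://localhost:8090/v1/evals/accuracy_eval \
  -H "Content-Type: application/json" -d '{"model": "gpt-4o-mini"}'
```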
Compare model performance:
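The aggregate below is a sketch of such a comparison; the `eval.results` table and its `model` and `score` columns are assumptions, so adapt it to the schema your runtime produces:

```sql
SELECT
  model,
  COUNT(*)                                   AS total_queries,
  SUM(CASE WHEN score = 1 THEN 1 ELSE 0 END) AS correct_answers,
  ROUND(100.0 * SUM(CASE WHEN score = 1 THEN 1 ELSE 0 END) / COUNT(*), 2) AS accuracy_pct
FROM eval.results
GROUP BY model;
```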
This query will show:
Total number of queries processed
Number of correct answers
Accuracy as a percentage
You can use these metrics to decide if the smaller model provides acceptable performance for your use case.