/docs/website/versioned_docs/version-1.11.x/components/models/huggingface.md

---
title: 'HuggingFace'
description: 'Instructions for using machine learning models hosted on HuggingFace with Spice.'
sidebar_label: 'HuggingFace'
sidebar_position: 4
---

To use a model hosted on HuggingFace, specify the huggingface.co path in the from field and, when needed, the files to include.

Configuration

`from`

The from key takes the form huggingface:model_path. Two common forms are:

huggingface:username/modelname: Implies the latest version of modelname hosted by username.
huggingface:huggingface.co/username/modelname:revision: Specifies a particular revision of modelname by username, including the optional domain.

The from key follows a fixed format consisting of five components:

Prefix: The value must start with huggingface:.
Domain (Optional): huggingface.co/ may appear immediately after the prefix. No other HuggingFace-compatible services are currently supported.
Organization/User: The HuggingFace organization or user name (org).
Model Name: After a /, the model name (model).
Revision (Optional): A colon (:) followed by the git-like revision identifier (revision).
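
As a rough illustration of the components above, the two forms might appear in a spicepod as follows. This is a sketch only: `username`, `modelname`, and `revision` are the placeholders used earlier in this section, and the `name` values are hypothetical.

```yaml
models:
  # Short form: prefix + org/model, which implies the latest version.
  - from: huggingface:username/modelname
    name: model_latest # hypothetical name

  # Full form: prefix + optional domain + org/model + revision.
  - from: huggingface:huggingface.co/username/modelname:revision
    name: model_pinned # hypothetical name
```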

`name`

The model name. This is used as the model ID within Spice and in Spice's endpoints (e.g. http://localhost:8090/v1/models). It can be set to the same value as the model ID in the from field.

`params`

| Param | Description | Default |
| --- | --- | --- |
| `hf_token` | The HuggingFace access token. | - |
| `model_type` | The architecture to load the model as. Supported values: `mistral`, `gemma`, `mixtral`, `llama`, `phi2`, `phi3`, `qwen2`, `gemma2`, `starcoder2`, `phi3.5moe`, `deepseekv2`, `deepseekv3` | - |
| `tools` | Which tools should be made available to the model. Set to `auto` to use all available tools. | - |
| `system_prompt` | | |
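
A minimal sketch of how these parameters might be set together, assuming they sit under a `params` key on the model entry. The model path, `name`, and prompt text are hypothetical placeholders.

```yaml
models:
  - from: huggingface:huggingface.co/username/modelname
    name: my_model # hypothetical name
    params:
      model_type: llama # one of the supported values listed above
      tools: auto # make all available tools usable by the model
      system_prompt: 'You are a helpful assistant.' # illustrative text only
```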

`files`

The specific file path for the HuggingFace model. For example, GGUF model formats require a specific file path; other formats (e.g. .safetensors) are inferred.

Example
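
A sketch of a GGUF configuration, assuming `files` is a list of entries that each carry a `path`. The repository and file names below are hypothetical and should be replaced with the real file name in the model repository.

```yaml
models:
  - from: huggingface:huggingface.co/username/modelname-GGUF # hypothetical repository
    name: my_gguf_model # hypothetical name
    files:
      - path: modelname-Q4_K_M.gguf # hypothetical GGUF file in that repository
```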

Access Tokens

Access tokens can be provided for HuggingFace models in two ways:

In the HuggingFace token cache (i.e. ~/.cache/huggingface/token). This is the default.
Via model params.

Examples

Load an ML model to predict taxi trip outcomes
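
One possible shape for this configuration is sketched below. The repository, model name, and `taxi_trips` dataset are hypothetical, and linking a dataset through a `datasets` key is an assumption about the spicepod format rather than something stated on this page.

```yaml
models:
  - from: huggingface:huggingface.co/username/taxi-trips-model:latest # hypothetical repository
    name: taxi_model # hypothetical name
    files:
      - path: model.onnx # ML models currently support only the ONNX format
    datasets:
      - taxi_trips # assumed: a dataset defined elsewhere in the spicepod
```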

Load an LLM to generate text
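
A minimal sketch, assuming the model's files can be inferred so that only `from` and `name` are needed. The model path and name are hypothetical placeholders.

```yaml
models:
  - from: huggingface:huggingface.co/username/modelname # hypothetical LLM repository
    name: my_llm # the model ID exposed at http://localhost:8090/v1/models
```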

Load a private model
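
A sketch of passing an access token via model params. The repository and name are hypothetical, and the `${ secrets:HF_TOKEN }` reference assumes a secret named `HF_TOKEN` has been configured; avoid committing a literal token to the spicepod.

```yaml
models:
  - from: huggingface:huggingface.co/username/private-model # hypothetical private repository
    name: private_model # hypothetical name
    params:
      hf_token: ${ secrets:HF_TOKEN } # assumed secret reference, not a literal token
```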

For more details on authentication, see access tokens.

:::warning[Limitations]

The throughput, concurrency, and latency of a locally hosted model vary with the underlying hardware and model size. Spice supports Apple Metal and CUDA for accelerated inference.
ML models currently support only the ONNX file format.

:::

Cookbook

Running Llama3 Locally: Use the Llama family of models locally from HuggingFace using Spice.