Model2Vec is a technique that distills the embeddings of a sentence transformer model into static word embeddings, enabling efficient, parallelized embedding generation without any external API calls. The resulting models can be up to 500x faster and 15x smaller than the original sentence transformer.
To use a Model2Vec embedding model with Spice, specify the `model2vec` prefix in the `from` field of your configuration.
Models compatible with Model2Vec can be found on the Hugging Face Hub.
The following parameters are specific to Model2Vec models:
| Parameter | Description | Default |
|---|---|---|
| `hf_token` | The Hugging Face access token for accessing private models. | - |
| `normalize` | Whether to normalize embeddings. | Model's default setting |
| `subfolder` | Optional subfolder path for models that reside in a subfolder of the repository. | - |
| `parallelism` | Number of parallel threads to use for embedding computation. | System CPU count |
| `embed_max_token_length` | Maximum token length for embedding inputs. | - |
| `embed_custom_batch_size` | Custom batch size override for embedding operations. | - |
For more details on Model2Vec parameters and functionality, refer to the `model2vec-rs` documentation.
Example configuration in `spicepod.yaml` for `minishlab/potion-base-8m`:
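A minimal sketch of such a configuration, assuming the standard Spicepod `embeddings` section and that the `model2vec:` prefix accepts a Hugging Face repository ID (the `name` value is illustrative):

```yaml
version: v1
kind: Spicepod
name: model2vec-example

embeddings:
  - from: model2vec:minishlab/potion-base-8m
    name: potion
```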
Model2Vec models can also be loaded from the local filesystem by specifying a file path:
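As a sketch, assuming the `model2vec:` prefix also accepts a filesystem path relative to the Spicepod (the path and `name` below are illustrative):

```yaml
embeddings:
  - from: model2vec:models/potion-base-8m
    name: potion_local
```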
Model2Vec supports private Hugging Face models with authentication:
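A sketch of authenticating against a private repository via the `hf_token` parameter, assuming Spice's `${ secrets:... }` secret-reference syntax; the repository and secret names are illustrative:

```yaml
embeddings:
  - from: model2vec:your-org/private-model2vec-model
    name: private_embeddings
    params:
      hf_token: ${ secrets:HF_TOKEN }
```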
For performance optimization, configure parallelism and embedding batch sizes:
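A sketch combining the performance-related parameters from the table above; the specific values are illustrative and should be tuned to the host and workload:

```yaml
embeddings:
  - from: model2vec:minishlab/potion-base-8m
    name: potion
    params:
      parallelism: 8              # number of threads; defaults to the CPU count
      embed_custom_batch_size: 256
      embed_max_token_length: 512
```

Lowering `parallelism` below the CPU count can be useful when Spice shares the host with other workloads.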
Create custom Model2Vec embeddings by distilling existing sentence transformer models. For more detailed instructions, see the Model2Vec Quickstart guide. Here's how to distill the popular `sentence-transformers/all-MiniLM-L6-v2` model:
Install the Model2Vec Python library:
Create a distillation script:
Use the distilled model with Spice:
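A sketch of loading the distilled model from a local directory, assuming the `model2vec:` prefix accepts filesystem paths (the path and `name` are illustrative):

```yaml
embeddings:
  - from: model2vec:models/all-MiniLM-L6-v2-m2v
    name: minilm_distilled
```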
Race!
Compare the throughput of the distilled embedding model with the full version by declaring both models in the same Spicepod. This uses example Wikipedia article data from Kaggle:
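A sketch of declaring both models side by side, assuming a local path for the distilled model and the `huggingface:` prefix for the full sentence transformer (paths, prefixes, and names are illustrative):

```yaml
embeddings:
  - from: model2vec:models/all-MiniLM-L6-v2-m2v
    name: minilm_m2v
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: minilm_full
```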
Start Spice with `spice run`:
Performance Results:
Note: The dramatic difference is because Model2Vec embedding execution is parallelized across all of the host's cores by default. Per core, Model2Vec achieves a throughput of 300-400 rows/sec on this corpus, and this test machine has 16 cores. Execution of SBERT models is currently not parallelized.
| Model Name | Model Type | Records Processed | Throughput (records/sec) |
|---|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | Model2Vec (distilled) | 278,528 | ~4,043 |
| `sentence-transformers/all-MiniLM-L6-v2` | SBERT (full) | 100 | ~1.1 |
| Performance gain (Model2Vec) | - | - | ~3,675x faster |