title: 'Language Model Overrides' sidebar_label: 'Parameter Overrides' description: 'Learn how to override default LLM hyperparameters in Spice.' sidebar_position: 5 pagination_prev: null pagination_next: null tags:
The v1/chat/completion endpoint is compatible with OpenAI's API. It supports a subset of request body parameters defined in the OpenAI reference documentation. Spice helps configure different defaults for these request parameters.
Supported parameters:
frequency_penaltylogit_biaslogprobsmax_completion_tokensmetadatanparallel_tool_callspresence_penaltyresponse_formatseedstopstorestreamstream_optionstemperaturetool_choicetoolstop_logprobstop_puser:::warning[Deprecated Default Overrides Parameters]
The openai_ prefix is deprecated for non-OpenAI model providers. Use the model provider prefix instead.
:::
To specify a default override for a parameter, use the model provider prefix followed by the parameter name. For example, to set the temperature parameter to 0.1 for all requests with this model for Hugging Face model, use hf_temperature: 0.1. A temperature parameter in the request body will still override the default.
When sending this payload to spice /v1/chat/completions:
Will be passed to the OpenAI API as:
In addition to any system prompts provided in message dialogue, or added by model providers, Spice can configure an additional system prompt.
Any request to HTTP v1/chat/completion or v1/responses will include the configured system prompt. For v1/responses, the system prompt is set as the instructions field. If client-provided instructions are also included in the request, both are combined: the configured system prompt appears first, followed by the client instructions.
This example demonstrates how to create a specialized math tutoring model by combining system prompts with structured JSON output. The configuration ensures consistent, step-by-step mathematical solutions in a machine-readable format.
To use the configured math tutor, send a simple request to the chat completions endpoint:
Example response:
Visit OpenAI Structured Outputs for more information on how to use structured output formats.
Spice supports provider-aware prompt caching to reduce latency and cost for repeated prompts. Set prompt_cache_key on a model to enable the provider's native caching mechanism.
When prompt_cache_key is set as a model default, it is injected into every Chat API and Responses API request to that model (unless the request itself provides one). The key is mapped into the appropriate provider-native mechanism — for example, Anthropic's cache_control, xAI's x-grok-conv-id header, or Bedrock's CachePoint block. See the full provider mapping.
For the OpenAI Responses API, prompt_cache_retention can also be set to request a retention duration (e.g. "24h").
The prompt_cache_key can also be passed per-request in the /v1/nsql API body to enable caching for text-to-SQL queries.
For local models using mistral-rs, paged-attention scheduling is enabled automatically on supported backends (CUDA + Unix) for KV-cache prefix reuse — no configuration is needed.