The HTTP(s) Data Connector enables federated SQL query across supported file formats stored at an HTTP(s) endpoint. The connector supports dynamic query and data refresh through SQL-based filtering.
Custom HTTP headers can be specified for authentication, API keys, or other requirements. Headers are treated as sensitive data and will not be logged.
Headers can also be separated by semicolons:
`from`

The `from` field specifies the HTTP(s) endpoint and can be configured in two ways:

- Direct URL to a file: A complete URL pointing to a specific supported file.
- Base domain/path: A base URL that will be combined with special metadata fields to construct the complete request.
The connector supports templated URLs with query parameters that can be dynamically populated using refresh_sql filters and special metadata fields.
`name`

The dataset name. This will be used as the table name within Spice.
Example:
The dataset name cannot be a reserved keyword.
`params`

The connector supports authentication, timeout, connection pooling, and retry configuration via `params`.

| Parameter Name | Description |
|---|---|
| `http_port` | Optional. Port to create HTTP(s) connection over. Default: 80 and 443 for HTTP and HTTPS respectively. |
| `http_username` | Optional. Username for HTTP basic authentication. Default: None. |
| `http_password` | Optional. Password for HTTP basic authentication. Default: None. Use the secret replacement syntax to load the password from a secret store, e.g. `${secrets:my_http_pass}`. |
| `http_headers` | Optional. Custom HTTP headers as a comma-separated list of key:value pairs. Example: `Content-Type:application/json,Accept:application/json`. Default: None. |
| `allowed_request_paths` | Required for using `request_path` filters. Comma-separated list of allowed paths. Example: `/api/users,/api/posts`. Paths must start with `/` and cannot contain `..` segments. |
| `request_query_filters` | Optional. Set to `enabled` to enable `request_query` filters. Default: `disabled`. When disabled, query parameter filters will be rejected. |

When querying HTTP(s) datasets, Spice respects standard HTTP caching headers in responses. The connector supports the following cache-related response headers:
`Cache-Control`

The `Cache-Control` response header from the HTTP(s) endpoint is passed through to clients querying Spice. When the HTTP(s) server returns a `Cache-Control` header with the `stale-while-revalidate` directive, clients can use this value to determine appropriate caching behavior.
For example, if the HTTP(s) endpoint returns:
Cache-Control: max-age=10, stale-while-revalidate=10
Clients querying Spice will receive this header and can:
The stale-while-revalidate behavior in Spice is controlled by the stale_while_revalidate_ttl parameter in the caching configuration. When stale_while_revalidate_ttl is set to 0 (default), stale data will not be served. When set to a non-zero value, Spice serves stale cache entries while revalidating in the background.
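
As a rough illustration of how a client might interpret the `Cache-Control` header shown above, the following Python sketch classifies a cached response by its age. The function names are hypothetical, the parsing is deliberately simplified (it does not handle quoted values that contain commas), and it is not a description of how Spice implements caching internally.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header into a {directive: value} mapping."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (e.g. no-cache) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def cache_state(age_seconds: float, header: str) -> str:
    """Classify a cached response as fresh, stale-while-revalidate, or expired."""
    directives = parse_cache_control(header)
    max_age = int(directives.get("max-age") or 0)
    swr = int(directives.get("stale-while-revalidate") or 0)
    if age_seconds < max_age:
        return "fresh"
    if age_seconds < max_age + swr:
        # The stale copy may be served while a background refresh runs.
        return "stale-while-revalidate"
    return "expired"


if __name__ == "__main__":
    header = "max-age=10, stale-while-revalidate=10"
    for age in (5, 15, 25):
        print(age, cache_state(age, header))
```

With `max-age=10, stale-while-revalidate=10`, a response is fresh for its first 10 seconds and may be served stale for the following 10 seconds while it is revalidated.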
The HTTP connector provides advanced capabilities for working with dynamic APIs and RESTful services through special metadata fields.
The HTTP connector supports special metadata fields that provide fine-grained control over HTTP requests. These fields can be included in your dataset schema to dynamically construct request URLs and payloads.
:::warning Security Requirements

For security, these metadata fields require explicit configuration to prevent unauthorized access:

- `request_path` requires `allowed_request_paths` to be configured with glob patterns
- `request_query` requires `request_query_filters: enabled`
- `request_body` requires `request_body_filters: enabled`

:::

| Field Name | Type | Description |
|---|---|---|
| `request_path` | String | Specifies the URL path to append to the base URL from the `from` field. When using a base domain/path in `from`, `request_path` constructs the complete endpoint. Example: If `from: https://api.example.com` and `request_path: /users/123`, the request will be made to `https://api.example.com/users/123`. Requires the `allowed_request_paths` parameter. |
| `request_query` | String | Defines query parameters to append to the request URL. Formatted as a query string (e.g., `key1=value1&key2=value2`). These parameters are appended to the URL after any path specified in `request_path`. Requires `request_query_filters: enabled`. Maximum length: configurable via `max_request_query_length` (default: 1024 characters). |
| `request_body` | String | Contains the request body for POST/PUT requests. Typically used with REST APIs that require a JSON or form-encoded payload. The content type should be specified using `http_headers`. Requires `request_body_filters: enabled`. Maximum size: configurable via `max_request_body_bytes` (default: 16 KiB). |

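
To make the URL construction described in the table concrete, here is a minimal Python sketch that combines a base URL with a path and a query string. The function name `build_request_url` is hypothetical, and the joining rules (one slash between base and path, query appended after any existing query) are assumptions for illustration rather than the connector's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def build_request_url(base_url: str, request_path: str = "", request_query: str = "") -> str:
    """Combine a base URL with an optional path and query string."""
    parts = urlsplit(base_url)
    # Join the base path and request_path with exactly one slash between them.
    path = parts.path.rstrip("/")
    if request_path:
        path = path + "/" + request_path.lstrip("/")
    # Keep any query already present on the base URL, then append request_query.
    query = "&".join(q for q in (parts.query, request_query.lstrip("?")) if q)
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


if __name__ == "__main__":
    print(build_request_url("https://api.example.com", "/users/123"))
    print(build_request_url("https://api.example.com/v1", "/users/123", "include=profile,settings"))
```

The first call reproduces the `https://api.example.com/users/123` example from the table.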
These metadata fields work in combination:

- When `from` specifies a complete file URL, these fields are ignored
- When `from` specifies a base URL, these fields construct the full request dynamically
- `request_path` is appended to the base URL
- `request_query` is appended as query parameters
- `request_body` is sent as the request payload (requires appropriate HTTP method configuration)

In addition to request metadata, the HTTP connector includes response metadata fields in the dataset schema. These fields capture information about the HTTP response and are available in SQL queries.

| Field Name | Type | Description |
|---|---|---|
| `content` | String | The response body content. |
| `response_status` | UInt16 | The HTTP status code of the response (e.g., 200, 404, 500). |
| `response_headers` | Map(String, String) | The HTTP response headers as key-value pairs. Each header name maps to its value. Available for inspection in queries, e.g., to check `content-type` or custom headers returned by the API. |
| `fetched_at` | Timestamp (Nanosecond) | The timestamp when the data was fetched. Uses the HTTP `Date` response header when available, falling back to the current system time. |
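
The `fetched_at` fallback rule in the table can be sketched in Python as follows. This is only an illustration of the described behavior: the function name mirrors the field for readability, and the header lookup and `now` parameter are assumptions made so the example is testable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fetched_at(response_headers: dict, now: datetime = None) -> datetime:
    """Return the HTTP Date header as a UTC datetime, or the current time."""
    raw = response_headers.get("Date") or response_headers.get("date")
    if raw:
        try:
            parsed = parsedate_to_datetime(raw)
            if parsed.tzinfo is None:
                # Treat a Date header without an offset as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
        except (TypeError, ValueError):
            pass  # Unparseable header: fall through to the system clock.
    return now or datetime.now(timezone.utc)


if __name__ == "__main__":
    print(fetched_at({"Date": "Wed, 21 Oct 2015 07:28:00 GMT"}))
    print(fetched_at({}))
```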
:::note

When using caching refresh mode, transient HTTP error responses (5xx server errors and 429 Too Many Requests) are automatically excluded from the cache. These responses are still returned to the querying client but are not persisted, preventing temporary failures from polluting cached data.

:::
The HTTP connector validates the configured endpoint during initialization to detect issues such as DNS errors, connection problems, or invalid URLs early in the startup process.
By default, the connector performs a health check by requesting a randomly generated path (e.g., /__spice_health_check_abc123def456) that is expected to return a 404 status. Any HTTP response, including 404 Not Found, indicates that the endpoint is reachable and the dataset will initialize successfully.
This default behavior works for most HTTP endpoints but may not be suitable for APIs that:
For endpoints that require a specific health check path, configure the health_probe parameter:
When a custom health probe is configured:
This provides more reliable validation for APIs with dedicated health check endpoints.
In this configuration, the health probe request to /api/status will include the authentication header, ensuring that the validation succeeds for APIs that require authentication on all endpoints.
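
The difference between the default and custom health checks can be summarized in a short Python sketch. The helper names are hypothetical and the random suffix format is an assumption modeled on the example path above; only the pass/fail rules come from this page.

```python
import secrets


def health_probe_path(custom_probe: str = None) -> str:
    """Pick the path to request during endpoint validation."""
    if custom_probe is not None:
        if not custom_probe.startswith("/"):
            raise ValueError("health_probe must start with /")
        return custom_probe
    # Default: a random path that is expected not to exist (assumed format).
    return "/__spice_health_check_" + secrets.token_hex(6)


def probe_succeeded(status: int, custom_probe: str = None) -> bool:
    """Apply the validation rule for the given response status."""
    if custom_probe is None:
        # Any HTTP response, including 404, shows the endpoint is reachable.
        return True
    # A custom probe must return a 2xx status code.
    return 200 <= status < 300


if __name__ == "__main__":
    print(health_probe_path())
    print(probe_succeeded(404), probe_succeeded(404, "/api/status"))
```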
The `health_probe` parameter has the following requirements:

- Must start with `/`

When using a base URL with special metadata fields, you can dynamically construct different API endpoints:
With the above configuration, you can query different endpoints by providing values for the special metadata fields:
The connector will construct requests like:

- `https://api.example.com/v1/users/123?include=profile,settings`
- `https://api.example.com/v1/data/upload` with the JSON body

The `allowed_request_paths` parameter supports glob patterns, so request paths can be matched securely without listing every possible endpoint.
Pattern Types:

- Single wildcard (`*`): Matches any characters within a single path segment
  - `/shows/*` matches `/shows/123` and `/shows/breaking-bad`
  - `/shows/*` does not match `/shows/123/episodes`
- Recursive wildcard (`**`): Matches any number of path segments
  - `/api/**` matches `/api/users`, `/api/v1/users`, and `/api/v2/posts/123`
- Character classes (`[...]`): Matches one character from a set
  - `/api/v[0-9]/*` matches `/api/v1/users` and `/api/v2/posts`
  - `/api/v[1-3]/*` matches `/api/v1/users`, `/api/v2/posts`, and `/api/v3/data`

Examples:
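
As one way to reason about these pattern semantics, the Python sketch below translates a glob pattern into a regular expression and checks paths against it. It is an illustration only, with hypothetical function names, and it does not claim to match the connector's matcher in every edge case (for example, glob-style negated classes such as `[!a]` are not handled).

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**", i):
            out.append(".*")  # recursive wildcard: may cross '/' boundaries
            i += 2
        elif ch == "*":
            out.append("[^/]*")  # single wildcard: stays inside one segment
            i += 1
        elif ch == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            out.append(pattern[i:end + 1])  # character class copied as written
            i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


def path_allowed(path: str, patterns: list) -> bool:
    """Return True if the path fully matches at least one allowed pattern."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


if __name__ == "__main__":
    print(path_allowed("/shows/123", ["/shows/*"]))
    print(path_allowed("/shows/123/episodes", ["/shows/*"]))
    print(path_allowed("/api/v2/posts/123", ["/api/**"]))
```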
The special metadata fields can be combined with dynamic filters to create sophisticated data refresh patterns.
Query specific API endpoints dynamically:
This configuration:

- Uses `request_path` to specify the `/events` endpoint
- Sets the `request_query` parameter using the latest timestamp from existing data

This incrementally loads pages of data by:
This example demonstrates:

- Using `request_body` to send a JSON payload for a POST request

APIs often return JSON data that requires parsing to extract specific fields. Spice provides JSON functions to process and transform JSON responses directly in SQL queries.
Extract specific fields from JSON responses:
APIs often return deeply nested JSON structures that require parsing to extract specific fields. Use chained JSON functions to navigate nested objects:
This demonstrates extracting nested objects step by step:

- `json_get(content, 'network')` extracts the network object
- `json_get_str(json_get(content, 'network'), 'name')` gets the network name from the nested object
- `json_get` calls can be chained to navigate deeper levels

For more details on available JSON functions including `json_get`, `json_get_str`, `json_get_int`, `json_get_bool`, and others, refer to the JSON functions reference.
The HTTP connector supports dynamic URL construction through refresh_sql with templated query parameters. This enables incremental data loading by appending filter conditions from the SQL query to the HTTP request URL.
When refresh_sql is specified with filters, the connector extracts filter conditions and appends them as query parameters to the URL. This is particularly useful for APIs that support filtering via query parameters.
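
The placeholder substitution can be pictured with a small Python sketch. It assumes the filter values have already been extracted from the `WHERE` clause into a dictionary; the function name, the example URL, and the choice to percent-encode values are all assumptions for illustration, not confirmed connector behavior.

```python
from urllib.parse import quote


def fill_url_template(template: str, filters: dict) -> str:
    """Replace {name} placeholders in a URL with percent-encoded filter values."""
    url = template
    for name, value in filters.items():
        url = url.replace("{" + name + "}", quote(str(value), safe=""))
    return url


if __name__ == "__main__":
    # Hypothetical templated URL and filter values.
    template = "https://api.example.com/events?start={start_time}&end={end_time}"
    print(fill_url_template(template, {"start_time": "2024-01-01", "end_time": "2024-01-31"}))
```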
In this example:

- The `{start_time}` and `{end_time}` placeholders in the URL are replaced with values extracted from the `WHERE` clause in `refresh_sql`

The dynamic filter feature supports the following SQL operations:

- Equality (`=`)
- Greater than (`>`)
- Less than (`<`)
- Greater than or equal (`>=`)
- Less than or equal (`<=`)
- `BETWEEN`

For security and to prevent unauthorized access, the HTTP connector enforces the following constraints on special metadata fields:

- The `request_path` field cannot be used without configuring `allowed_request_paths`
- Paths in `allowed_request_paths` must:
  - Start with `/`
  - Not contain `..` path traversal segments
- Request paths are matched against the `allowed_request_paths` list using:
  - `*` matches a single path segment (e.g., `/shows/*` matches `/shows/123` but not `/shows/123/episodes`)
  - `**` matches multiple path segments recursively (e.g., `/api/**` matches `/api/v1/users` and `/api/v2/posts/123`)
  - `[...]` character classes (e.g., `/api/v[0-9]/*` matches `/api/v1/users` but not `/api/v10/users`)

Example error when `allowed_request_paths` is not configured:
request_path filters are disabled for this dataset. Configure allowed_request_paths to enable them.

- The `request_query` field requires `request_query_filters: enabled`
- Query length is limited (configurable via `max_request_query_length`)
- A leading `?` is removed if present

Example error when query filters are not enabled:
request_query filters are disabled for this dataset. Enable request_query_filters to use them.

- The `request_body` field requires `request_body_filters: enabled`
- Body size is limited (configurable via `max_request_body_bytes`)
- When a `request_body` filter is present, the HTTP method automatically changes to POST

Example error when body filters are not enabled:
request_body filters are disabled for this dataset. Enable request_body_filters to use them.
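
The gating rules above can be summarized in a Python sketch. The parameter names, defaults, and the three "disabled" error messages are taken from this page; the `FilterError` class, the function name, the wording of the other errors, and the `GET` default are hypothetical. Glob matching of the requested path against the allowed list is omitted for brevity.

```python
class FilterError(ValueError):
    """Raised when a metadata field is used without the required configuration."""


def validate_request(params: dict, request_path=None, request_query=None, request_body=None) -> str:
    """Check the metadata fields against dataset params and return the HTTP method."""
    if request_path is not None:
        allowed = [p.strip() for p in params.get("allowed_request_paths", "").split(",") if p.strip()]
        if not allowed:
            raise FilterError("request_path filters are disabled for this dataset. "
                              "Configure allowed_request_paths to enable them.")
        for pattern in allowed:
            # Configured paths must start with / and contain no .. segments.
            if not pattern.startswith("/") or ".." in pattern.split("/"):
                raise FilterError("invalid allowed_request_paths entry: " + pattern)
    if request_query is not None:
        if params.get("request_query_filters") != "enabled":
            raise FilterError("request_query filters are disabled for this dataset. "
                              "Enable request_query_filters to use them.")
        if len(request_query) > int(params.get("max_request_query_length", 1024)):
            raise FilterError("request_query exceeds max_request_query_length")
    if request_body is not None:
        if params.get("request_body_filters") != "enabled":
            raise FilterError("request_body filters are disabled for this dataset. "
                              "Enable request_body_filters to use them.")
        if len(request_body.encode("utf-8")) > int(params.get("max_request_body_bytes", 16384)):
            raise FilterError("request_body exceeds max_request_body_bytes")
        return "POST"  # A request body switches the method to POST.
    return "GET"  # Assumed default method when no body is present.


if __name__ == "__main__":
    print(validate_request({"request_body_filters": "enabled"}, request_body='{"q": 1}'))
```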
To use the special metadata fields (`request_path`, `request_query`, `request_body`), you must:

- For `request_path`: Configure `allowed_request_paths` with a comma-separated list of allowed path patterns (supports glob patterns)
- For `request_query`: Set `request_query_filters: enabled` in `params`
- For `request_body`: Set `request_body_filters: enabled` in `params`

Example minimal configuration for all three fields:
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation. Additionally, learn how to use referenced secrets in component parameters by visiting the using referenced secrets guide.

| Parameter Name | Description |
|---|---|
| `request_query_filters` | Optional. Set to `enabled` to enable `request_query` filters. Default: `disabled`. When disabled, query parameter filters will be rejected. |
| `request_body_filters` | Optional. Set to `enabled` to enable `request_body` filters for POST requests. Default: `disabled`. When disabled, request body filters will be rejected. |
| `client_timeout` | Optional. Maximum time to wait for a response from the HTTP server (in seconds). Default: 30. Supports duration formats like `30s`, `1m`, `500ms`, `2m30s`. Applied to the entire request-response cycle. |
| `connect_timeout` | Optional. Timeout for establishing HTTP(s) connections (in seconds). Default: 10. |
| `pool_max_idle_per_host` | Optional. Maximum number of idle connections to keep alive per host. Default: 10. |
| `pool_idle_timeout` | Optional. Timeout for idle connections in the pool (in seconds). Default: 90. |
| `max_retries` | Optional. Maximum number of retries for failed HTTP requests. Default: 3. |
| `retry_backoff_method` | Optional. Retry backoff strategy: `fibonacci` (default), `linear`, or `exponential`. |
| `retry_max_duration` | Optional. Maximum total duration for all retries (e.g., `30s`, `5m`). If not set, retries continue up to `max_retries`. |
| `retry_jitter` | Optional. Randomization factor for retry delays (0.0 to 1.0). Default: 0.3 (30% randomization). Set to 0 for no jitter. |
| `max_request_query_length` | Optional. Maximum length in characters for `request_query` filter values. Default: 1024. Maximum: 4096. |
| `max_request_body_bytes` | Optional. Maximum size in bytes for `request_body` filter values. Default: 16384 (16 KiB). Maximum: 65536 (64 KiB). |
| `health_probe` | Optional. Custom health probe path for endpoint validation during initialization (e.g., `/health`, `/api/status`). The endpoint must return a 2xx status code to pass validation. If not set, a random path is used and any status (including 404) is accepted. Must start with `/`. |
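
To give a feel for how the three `retry_backoff_method` options differ, the Python sketch below generates a delay schedule for each. The base delay of one second, the exact growth formulas, and the way jitter is applied (scaling each delay by a random factor within plus or minus `retry_jitter`) are assumptions for illustration; this page does not specify them.

```python
import random


def retry_delays(method: str, max_retries: int = 3, base: float = 1.0,
                 jitter: float = 0.3, rng=random.random) -> list:
    """Return one delay in seconds per retry attempt for the chosen strategy."""
    delays = []
    fib_a, fib_b = 1, 1
    for attempt in range(1, max_retries + 1):
        if method == "fibonacci":
            factor = fib_a
            fib_a, fib_b = fib_b, fib_a + fib_b  # 1, 1, 2, 3, 5, ...
        elif method == "linear":
            factor = attempt  # 1, 2, 3, 4, ...
        elif method == "exponential":
            factor = 2 ** (attempt - 1)  # 1, 2, 4, 8, ...
        else:
            raise ValueError("unknown backoff method: " + method)
        # Scale by a random factor in [1 - jitter, 1 + jitter] (assumed behavior).
        delays.append(base * factor * (1 + jitter * (2 * rng() - 1)))
    return delays


if __name__ == "__main__":
    for method in ("fibonacci", "linear", "exponential"):
        print(method, retry_delays(method, max_retries=5, jitter=0))
```

Setting `jitter=0` makes the schedule deterministic, which is convenient for comparing the strategies side by side.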