/docs/website/versioned_docs/version-1.11.x/components/data-connectors/https.md

---
title: 'HTTP(s) Data Connector'
sidebar_label: 'HTTP(s) Data Connector'
description: 'HTTP(s) Data Connector Documentation'
pagination_prev: null
---

The HTTP(s) Data Connector enables federated SQL queries across supported file formats stored at an HTTP(s) endpoint. The connector supports dynamic queries and data refresh through SQL-based filtering.

Examples

Basic Example

Using Basic Authentication

Using Custom Headers

Custom HTTP headers can be specified for authentication, API keys, or other requirements. Headers are treated as sensitive data and will not be logged.

Headers can also be separated by semicolons:
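The exact parsing rules are not spelled out on this page. The sketch below assumes that entries are split on either `,` or `;` and that each entry is split on its first `:` only, so values such as `Bearer a:b` stay intact. `parse_http_headers` is a hypothetical helper, not Spice's parser. Under this assumption, a header value that itself contains a comma or semicolon would be split apart.

```python
from __future__ import annotations

import re


def parse_http_headers(raw: str) -> dict[str, str]:
    """Parse `key:value` header entries separated by commas or semicolons.

    Hypothetical helper for illustration, not Spice's parser.
    """
    headers: dict[str, str] = {}
    for entry in re.split(r"[,;]", raw):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled separators
        # Split on the first colon only, so values such as "Bearer a:b" survive.
        name, sep, value = entry.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"header entry {entry!r} is not in key:value form")
        headers[name.strip()] = value.strip()
    return headers


print(parse_http_headers("X-API-Key:abc123;Accept:text/csv"))
# {'X-API-Key': 'abc123', 'Accept': 'text/csv'}
```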

Configuration

`from`

The from field specifies the HTTP(s) endpoint and can be configured in two ways:

Direct URL to a file: A complete URL pointing to a specific supported file.
Base domain/path: A base URL that will be combined with special metadata fields to construct the complete request.

The connector supports templated URLs with query parameters that can be dynamically populated using refresh_sql filters and special metadata fields.

`name`

The dataset name. This will be used as the table name within Spice.

Example:

The dataset name cannot be a reserved keyword.

`params`

The connector supports authentication, timeout, connection pooling, and retry configuration via params.

Parameter Name	Description
`http_port`	Optional. Port to create HTTP(s) connection over. Default: 80 and 443 for HTTP and HTTPS respectively.
`http_username`	Optional. Username for HTTP basic authentication. Default: None.
`http_password`	Optional. Password for HTTP basic authentication. Default: None. Use the secret replacement syntax to load the password from a secret store, e.g. `${secrets:my_http_pass}`.
`http_headers`	Optional. Custom HTTP headers as a comma-separated (or semicolon-separated) list of `key:value` pairs. Example: `Content-Type:application/json,Accept:application/json`. Default: None.
`allowed_request_paths`	Required for using `request_path` filters. Comma-separated list of allowed paths, which may use glob patterns. Example: `/api/users,/api/posts`. Paths must start with `/` and cannot contain `..` segments.
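`request_query_filters`	Required for using `request_query` filters. Set to `enabled` to allow them. Default: disabled.
`request_body_filters`	Required for using `request_body` filters. Set to `enabled` to allow them. Default: disabled.
`max_request_query_length`	Optional. Maximum length of a `request_query` string, configurable up to 4096 characters. Default: 1024.
`max_request_body_bytes`	Optional. Maximum size of a `request_body` payload, configurable up to 64 KiB. Default: 16 KiB (16,384 bytes).
`health_probe`	Optional. Path requested during endpoint validation. Must start with `/`, cannot exceed 2048 characters, and must return a 2xx status. Default: None (a randomly generated path is requested and any HTTP response is accepted).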
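HTTP basic authentication (RFC 7617) sends `http_username` and `http_password` as a base64-encoded `Authorization: Basic ...` header. The sketch below shows that encoding. It also includes a hypothetical `resolve_secrets` helper that imitates the `${secrets:...}` replacement syntax against an in-memory dictionary. Real resolution goes through the configured secret store, and both function names are illustrative only.

```python
import base64
import re


def basic_auth_header(username: str, password: str) -> str:
    """Build an RFC 7617 `Authorization` header value from a username and password."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical stand-in for the `${secrets:<name>}` replacement syntax.
_SECRET_REF = re.compile(r"\$\{secrets:([^}]+)\}")


def resolve_secrets(value: str, store: dict) -> str:
    """Replace every `${secrets:<name>}` reference with the value from `store`."""
    return _SECRET_REF.sub(lambda m: store[m.group(1)], value)


password = resolve_secrets("${secrets:my_http_pass}", {"my_http_pass": "open sesame"})
print(basic_auth_header("Aladdin", password))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The `Aladdin` / `open sesame` credentials are the worked example from RFC 7617.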

HTTP Response Headers

When querying HTTP(s) datasets, Spice respects standard HTTP caching headers in responses. The connector supports the following cache-related response headers:

`Cache-Control`

The Cache-Control response header from the HTTP(s) endpoint is passed through to clients querying Spice. When the HTTP(s) server returns a Cache-Control header with the stale-while-revalidate directive, clients can use this value to determine appropriate caching behavior.

For example, if the HTTP(s) endpoint returns:

Cache-Control: max-age=10, stale-while-revalidate=10

Clients querying Spice will receive this header and can:

Serve fresh data for 10 seconds after fetching.
Between 10 and 20 seconds, serve stale data while fetching fresh data in the background.
After 20 seconds, fetch fresh data before serving the next request.
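The three windows above follow the `max-age` and `stale-while-revalidate` directives defined in RFC 5861. Below is a minimal client-side sketch that assumes the response age is known in seconds. `parse_cache_control` and `cache_decision` are hypothetical helpers, not part of Spice.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def cache_decision(header: str, age_seconds: float) -> str:
    """Classify a cached response of the given age (hypothetical client helper)."""
    directives = parse_cache_control(header)
    max_age = int(directives.get("max-age") or 0)
    swr = int(directives.get("stale-while-revalidate") or 0)
    if age_seconds < max_age:
        return "fresh"  # serve from cache
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"  # serve stale, refresh in background
    return "fetch"  # too old: fetch before serving


header = "max-age=10, stale-while-revalidate=10"
print([cache_decision(header, age) for age in (5, 15, 25)])
# ['fresh', 'stale-while-revalidate', 'fetch']
```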

The stale-while-revalidate behavior in Spice is controlled by the stale_while_revalidate_ttl parameter in the caching configuration. When stale_while_revalidate_ttl is set to 0 (default), stale data will not be served. When set to a non-zero value, Spice serves stale cache entries while revalidating in the background.

Advanced Features

The HTTP connector provides advanced capabilities for working with dynamic APIs and RESTful services through special metadata fields.

Special Metadata Fields

The HTTP connector supports special metadata fields that provide fine-grained control over HTTP requests. These fields can be included in your dataset schema to dynamically construct request URLs and payloads.

:::warning Security Requirements

For security, these metadata fields require explicit configuration to prevent unauthorized access:

request_path requires allowed_request_paths to be configured with glob patterns
request_query requires request_query_filters: enabled
request_body requires request_body_filters: enabled

:::

Field Name	Type	Description
`request_path`	String	Specifies the URL path to append to the base URL from the `from` field. When using a base domain/path in `from`, `request_path` constructs the complete endpoint. Example: If `from: https://api.example.com` and `request_path: /users/123`, the request will be made to `https://api.example.com/users/123`. Requires `allowed_request_paths` parameter.
`request_query`	String	Defines query parameters to append to the request URL. Formatted as a query string (e.g., `key1=value1&key2=value2`). These parameters are appended to the URL after any path specified in `request_path`. Requires `request_query_filters: enabled`. Maximum length: configurable via `max_request_query_length` (default: 1024 characters).
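`request_body`	String	Contains the request body for POST/PUT requests. Typically used with REST APIs that require a JSON or form-encoded payload. The content type should be specified using `http_headers` (e.g., `Content-Type:application/json`). Requires `request_body_filters: enabled`. Maximum size: configurable via `max_request_body_bytes` (default: 16 KiB).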

These metadata fields work in combination:

If from specifies a complete file URL, these fields are ignored
If from specifies a base URL, these fields construct the full request dynamically
request_path is appended to the base URL
request_query is appended as query parameters
request_body is sent as the request payload, and its presence switches the HTTP method to POST
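Here is a minimal sketch of how these rules could compose a single request. It assumes GET as the method when no body is present and a base URL with no query string of its own. The complete-file-URL case, where the fields are ignored, is left out. `build_request` and `HttpRequest` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HttpRequest:  # hypothetical container for the pieces of one request
    method: str
    url: str
    body: str | None = None


def build_request(
    base_url: str,
    request_path: str | None = None,
    request_query: str | None = None,
    request_body: str | None = None,
) -> HttpRequest:
    """Combine a base URL with the request_* metadata fields (illustrative only)."""
    url = base_url.rstrip("/")
    if request_path:
        url += request_path  # appended to the base URL
    if request_query:
        query = request_query[1:] if request_query.startswith("?") else request_query
        if query:
            url += "?" + query  # appended after the path
    # Assumption: GET unless a body is present, in which case POST.
    method = "POST" if request_body is not None else "GET"
    return HttpRequest(method, url, request_body)


print(build_request("https://api.example.com/v1", "/users/123", "include=profile,settings").url)
# https://api.example.com/v1/users/123?include=profile,settings
```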

Response Metadata Fields

In addition to request metadata, the HTTP connector includes response metadata fields in the dataset schema. These fields capture information about the HTTP response and are available in SQL queries.

Field Name	Type	Description
`content`	String	The response body content.
`response_status`	UInt16	The HTTP status code of the response (e.g., `200`, `404`, `500`).
`response_headers`	Map(String, String)	The HTTP response headers as key-value pairs. Each header name maps to its value. Available for inspection in queries, e.g., to check `content-type` or custom headers returned by the API.
`fetched_at`	Timestamp (Nanosecond)	The timestamp when the data was fetched. Uses the HTTP `Date` response header when available, falling back to the current system time.
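The `fetched_at` fallback can be sketched with the standard library's HTTP-date parser. This sketch assumes response header names are lowercased, matching the `content-type` example in the table. `fetched_at` here is a hypothetical helper, not Spice's code.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def fetched_at(response_headers: dict[str, str]) -> datetime:
    """Prefer the HTTP Date header; fall back to the current time (UTC)."""
    date_value = response_headers.get("date")
    if date_value:
        try:
            return parsedate_to_datetime(date_value)
        except (TypeError, ValueError):
            pass  # malformed Date header: use the fallback below
    return datetime.now(timezone.utc)


print(fetched_at({"date": "Wed, 21 Oct 2015 07:28:00 GMT"}))
# 2015-10-21 07:28:00+00:00
```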

Querying Response Metadata

:::note

When using caching refresh mode, transient HTTP error responses (5xx server errors and 429 Too Many Requests) are automatically excluded from the cache. These responses are still returned to the querying client but are not persisted, preventing temporary failures from polluting cached data.

:::
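The exclusion rule in the note reduces to a status-code check. This sketch assumes rows are dictionaries carrying the `response_status` metadata field. `is_cacheable` and `rows_to_persist` are hypothetical names.

```python
def is_cacheable(status: int) -> bool:
    """Transient failures (5xx, 429) are returned to the client but not cached."""
    return not (status == 429 or 500 <= status <= 599)


def rows_to_persist(rows: list) -> list:
    """Keep only rows whose response_status may be written to the cache."""
    return [row for row in rows if is_cacheable(row["response_status"])]


rows = [
    {"content": "ok", "response_status": 200},
    {"content": "slow down", "response_status": 429},
    {"content": "oops", "response_status": 503},
]
print([row["response_status"] for row in rows_to_persist(rows)])  # [200]
```

Note that under this rule a 404 is still cached, because only 5xx and 429 are treated as transient.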

Endpoint Validation

The HTTP connector validates the configured endpoint during initialization to detect issues such as DNS errors, connection problems, or invalid URLs early in the startup process.

Default Validation Behavior

By default, the connector performs a health check by requesting a randomly generated path (e.g., /__spice_health_check_abc123def456) that is expected to return a 404 status. Any HTTP response, including 404 Not Found, indicates that the endpoint is reachable and the dataset will initialize successfully.

This default behavior works for most HTTP endpoints but may not be suitable for APIs that:

Return error responses for unknown paths without proper HTTP status codes
Have strict path validation that rejects requests to non-existent endpoints
Require authentication for all paths, including health check endpoints

Custom Health Probe

For endpoints that require a specific health check path, configure the health_probe parameter:

When a custom health probe is configured:

The connector validates the endpoint by requesting the specified path
The health probe endpoint must return a 2xx status code (200-299) for validation to succeed
If the health probe returns a non-2xx status code, the dataset will fail to initialize with an error message
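The two validation modes are summarized in the sketch below. It assumes a missing response (DNS failure, refused connection) is represented as `None`. The 12-character hex suffix only imitates the shape of the documented example path. Neither function is part of Spice.

```python
from __future__ import annotations

import secrets


def probe_path(health_probe: str | None) -> str:
    """Return the path requested during endpoint validation."""
    if health_probe:
        return health_probe
    # Default: a random path that is expected to 404. The 12 hex characters
    # only imitate the shape of the documented example.
    return "/__spice_health_check_" + secrets.token_hex(6)


def probe_succeeded(status: int | None, health_probe: str | None) -> bool:
    """Decide whether validation passes; `status` is None when no HTTP response arrived."""
    if status is None:
        return False  # DNS error, refused connection, invalid URL, ...
    if health_probe:
        return 200 <= status <= 299  # custom probe: 2xx required
    return True  # default probe: any response, even 404, proves reachability
```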

This provides more reliable validation for APIs with dedicated health check endpoints.

Example with Authentication

In this configuration, the health probe request to /api/status will include the authentication header, ensuring that the validation succeeds for APIs that require authentication on all endpoints.

Health Probe Requirements

The health_probe parameter has the following requirements:

Must start with /
Cannot exceed 2048 characters in length
The target endpoint must return a 2xx HTTP status code for validation to succeed
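The first two requirements can be checked before any request is made. `health_probe_problems` is a hypothetical helper that returns a list of violations instead of raising, so an empty list means the path is acceptable. It cannot check the 2xx requirement, which is only known once the endpoint responds.

```python
from __future__ import annotations

MAX_HEALTH_PROBE_LENGTH = 2048


def health_probe_problems(path: str) -> list[str]:
    """Return the configuration-time violations for a health_probe value."""
    problems = []
    if not path.startswith("/"):
        problems.append("health_probe must start with '/'")
    if len(path) > MAX_HEALTH_PROBE_LENGTH:
        problems.append(f"health_probe exceeds {MAX_HEALTH_PROBE_LENGTH} characters")
    return problems


for candidate in ("/api/status", "api/status"):
    print(candidate, health_probe_problems(candidate) or "ok")
```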

Advanced Usage

Using Special Metadata Fields with Base URL

When using a base URL with special metadata fields, you can dynamically construct different API endpoints:

With the above configuration, you can query different endpoints by providing values for the special metadata fields:

The connector will construct requests like:

https://api.example.com/v1/users/123?include=profile,settings
https://api.example.com/v1/data/upload with the JSON body

Securing Paths with Glob Patterns

The allowed_request_paths parameter supports glob patterns for matching request paths, so path filtering can be configured securely without listing every possible endpoint.

Pattern Types:

Single wildcard (*): Matches any characters within a single path segment
- Example: /shows/* matches /shows/123 and /shows/breaking-bad
- Does not match across path separators: /shows/* does not match /shows/123/episodes
Recursive wildcard (**): Matches any number of path segments
- Example: /api/** matches /api/users, /api/v1/users, and /api/v2/posts/123
- Use for flexible API version matching or deep hierarchies
Character classes ([...]): Matches one character from a set
- Example: /api/v[0-9]/* matches /api/v1/users and /api/v2/posts
- Example: /api/v[1-3]/* matches a single path segment under /api/v1/, /api/v2/, or /api/v3/, and nothing under other versions
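The pattern semantics above can be approximated by translating each glob into a regular expression. This sketch assumes that `**` may also match an empty remainder and that `[!...]` negates a class. The matcher Spice actually uses may differ on such edge cases. `glob_to_regex` and `is_path_allowed` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path glob into a regex: `*` stays within a segment, `**` spans segments."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # any number of segments
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # within one segment only
            i += 1
        elif pattern[i] == "[" and "]" in pattern[i + 2:]:
            end = pattern.index("]", i + 2)
            body = pattern[i + 1:end].replace("\\", "\\\\")
            if body.startswith("!"):
                body = "^" + body[1:]  # assumed negation syntax
            parts.append("[" + body + "]")  # one character from the set
            i = end + 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def is_path_allowed(path: str, allowed_request_paths: str) -> bool:
    """Check a request_path against a comma-separated allowed_request_paths value."""
    patterns = [p.strip() for p in allowed_request_paths.split(",") if p.strip()]
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


print(is_path_allowed("/shows/123", "/shows/*"))           # True
print(is_path_allowed("/shows/123/episodes", "/shows/*"))  # False
```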

Examples:

Dynamic Filters with Metadata Fields

The special metadata fields can be combined with dynamic filters to create sophisticated data refresh patterns.

Dynamic API Queries with SQL

Query specific API endpoints dynamically:

Incremental Loading with Metadata Fields

This configuration:

Uses request_path to specify the /events endpoint
Dynamically constructs the request_query parameter using the latest timestamp from existing data
On each refresh, only fetches events created after the last refresh
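The query-construction step can be sketched as follows. The `since` parameter name is a hypothetical stand-in for whatever filter parameter the target API accepts. `incremental_query` is not a Spice function.

```python
from __future__ import annotations

from datetime import datetime
from urllib.parse import urlencode


def incremental_query(existing_timestamps: list[datetime]) -> str | None:
    """Build a request_query that asks only for events after the newest loaded one."""
    if not existing_timestamps:
        return None  # first refresh: nothing loaded yet, so no filter
    latest = max(existing_timestamps)
    # `since` is a hypothetical parameter name; use the one the API documents.
    return urlencode({"since": latest.isoformat()})


print(incremental_query([datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 2, 8, 30)]))
# since=2024-01-02T08%3A30%3A00
```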

Paginated Data Loading

This incrementally loads pages of data by:

Tracking the last loaded page number
Constructing the next page query parameter
Fetching 100 records per page
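Here is a sketch of the next-page computation. It assumes the API takes hypothetical `page` and `per_page` query parameters and that pages are numbered from 1.

```python
from __future__ import annotations

from urllib.parse import urlencode

PAGE_SIZE = 100  # records per page, as in the description above


def next_page_query(last_loaded_page: int | None, page_size: int = PAGE_SIZE) -> str:
    """Return the request_query for the page after the last one loaded."""
    next_page = 1 if last_loaded_page is None else last_loaded_page + 1
    # `page` and `per_page` are hypothetical parameter names.
    return urlencode({"page": next_page, "per_page": page_size})


print(next_page_query(4))  # page=5&per_page=100
```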

POST Request with Dynamic Body

This example demonstrates:

Using request_body to send a JSON payload for a POST request
Executing complex search queries against REST APIs
Fetching results based on structured query syntax

Processing JSON Responses

APIs often return JSON data that requires parsing to extract specific fields. Spice provides JSON functions to process and transform JSON responses directly in SQL queries.

Extracting Fields from JSON

Extract specific fields from JSON responses:

Working with Nested JSON

For deeply nested JSON structures, chain JSON functions to navigate nested objects:

This demonstrates extracting nested objects step by step:

json_get(content, 'network') extracts the network object
json_get_str(json_get(content, 'network'), 'name') gets the network name from the nested object
Multiple json_get calls can be chained to navigate deeper levels
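The chaining above behaves roughly like nested dictionary lookups that yield `None` (SQL `NULL`) when a key is missing. The Python functions below are illustrative analogues of the SQL functions, not their implementation. The sample payload is made up.

```python
from __future__ import annotations

import json
from typing import Any


def json_get(doc: Any, key: str) -> Any:
    """Python analogue of json_get: look up `key`, or return None (NULL)."""
    if isinstance(doc, str):  # treat text (like the `content` column) as JSON
        try:
            doc = json.loads(doc)
        except json.JSONDecodeError:
            return None
    return doc.get(key) if isinstance(doc, dict) else None


def json_get_str(doc: Any, key: str) -> str | None:
    """Python analogue of json_get_str: like json_get, but only strings survive."""
    value = json_get(doc, key)
    return value if isinstance(value, str) else None


content = '{"name": "Breaking Bad", "network": {"name": "AMC", "country": {"code": "US"}}}'
print(json_get_str(json_get(content, "network"), "name"))  # AMC
```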

Extracting Multiple Fields

Processing JSON Arrays

For more details on available JSON functions including json_get, json_get_str, json_get_int, json_get_bool, and others, refer to the JSON functions reference.

Refresh SQL with Dynamic Filters

The HTTP connector supports dynamic URL construction through refresh_sql with templated query parameters. This enables incremental data loading by appending filter conditions from the SQL query to the HTTP request URL.

How It Works

When refresh_sql is specified with filters, the connector extracts filter conditions and appends them as query parameters to the URL. This is particularly useful for APIs that support filtering via query parameters.

Time-Based Incremental Loading

In this example:

The {start_time} and {end_time} placeholders in the URL are replaced with values extracted from the WHERE clause in refresh_sql
Each refresh appends only new data since the last refresh
The connector automatically maps SQL filter conditions to URL query parameters
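The placeholder substitution step can be sketched as below. This sketch assumes the filter values have already been extracted from `refresh_sql` and that each value is percent-encoded before insertion. `fill_url_template` and the example URL are hypothetical.

```python
import re
from urllib.parse import quote

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_url_template(template: str, values: dict) -> str:
    """Replace each {name} placeholder with the percent-encoded filter value."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no filter value for placeholder {{{name}}}")
        return quote(str(values[name]), safe="")

    return _PLACEHOLDER.sub(substitute, template)


url = fill_url_template(
    "https://api.example.com/events?start={start_time}&end={end_time}",
    {"start_time": "2024-01-01T00:00:00Z", "end_time": "2024-01-02T00:00:00Z"},
)
print(url)
# https://api.example.com/events?start=2024-01-01T00%3A00%3A00Z&end=2024-01-02T00%3A00%3A00Z
```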

Supported Filter Operations

The dynamic filter feature supports the following SQL operations:

Equality comparisons (=)
Greater than (>)
Less than (<)
Greater than or equal (>=)
Less than or equal (<=)
Range queries with BETWEEN
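The small extractor below illustrates which predicates fit these operations. It handles `column op literal` conjuncts and `BETWEEN` ranges only. It is a sketch, not Spice's filter pushdown: it ignores `OR`, functions, and nested expressions, and it expands `BETWEEN` into a `>=`/`<=` pair.

```python
import re

# A quoted SQL string ('' escapes a quote) or a plain number.
_LITERAL = r"'(?:[^']|'')*'|-?\d+(?:\.\d+)?"
_BETWEEN = re.compile(
    rf"(\w+)\s+BETWEEN\s+({_LITERAL})\s+AND\s+({_LITERAL})", re.IGNORECASE
)
_COMPARE = re.compile(rf"(\w+)\s*(>=|<=|=|>|<)\s*({_LITERAL})")


def _unquote(literal: str) -> str:
    if literal.startswith("'"):
        return literal[1:-1].replace("''", "'")
    return literal


def extract_filters(where_clause: str) -> list:
    """Return (column, operator, value) triples for AND-ed simple predicates."""
    filters = []
    for m in _BETWEEN.finditer(where_clause):
        column, low, high = m.groups()
        filters.append((column, ">=", _unquote(low)))
        filters.append((column, "<=", _unquote(high)))
    # Remove BETWEEN ranges so their inner AND is not misread as a conjunction.
    remaining = _BETWEEN.sub(" ", where_clause)
    for m in _COMPARE.finditer(remaining):
        column, op, literal = m.groups()
        filters.append((column, op, _unquote(literal)))
    return filters


print(extract_filters("ts BETWEEN '2024-01-01' AND '2024-01-31' AND id > 100"))
# [('ts', '>=', '2024-01-01'), ('ts', '<=', '2024-01-31'), ('id', '>', '100')]
```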

Notes

URL parameters must match filter column names in the refresh_sql
Only filters that can be pushed down to the HTTP source will be applied to the URL
Complex filters may not be supported for URL templating

Limitations

Security Constraints

For security and to prevent unauthorized access, the HTTP connector enforces the following constraints on special metadata fields:

Request Path Limitations

Explicit Allow-List Required: The request_path field cannot be used without configuring allowed_request_paths
Path Pattern Format: All patterns in allowed_request_paths must:
- Start with /
- Not contain .. path traversal segments
- Not exceed 2048 characters in length
Glob Pattern Matching: request_path values are matched against the glob patterns in the allowed_request_paths list using:
- * matches a single path segment (e.g., /shows/* matches /shows/123 but not /shows/123/episodes)
- ** matches multiple path segments recursively (e.g., /api/** matches /api/v1/users and /api/v2/posts/123)
- [...] character classes match one character from a set (e.g., /api/v[0-9]/* matches /api/v1/users)

Example error when allowed_request_paths is not configured:

request_path filters are disabled for this dataset. Configure allowed_request_paths to enable them.
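The pattern-format rules and the error above can be sketched as a configuration check. The disabled-filter error text is copied from the example above. The function names, the use of `ValueError`, and the wording of the per-pattern messages are assumptions.

```python
from __future__ import annotations

MAX_PATTERN_LENGTH = 2048
DISABLED_ERROR = (
    "request_path filters are disabled for this dataset. "
    "Configure allowed_request_paths to enable them."
)


def pattern_problems(pattern: str) -> list[str]:
    """List the format rules that one allowed_request_paths pattern breaks."""
    problems = []
    if not pattern.startswith("/"):
        problems.append("must start with '/'")
    if ".." in pattern.split("/"):
        problems.append("must not contain '..' segments")
    if len(pattern) > MAX_PATTERN_LENGTH:
        problems.append(f"must not exceed {MAX_PATTERN_LENGTH} characters")
    return problems


def load_allowed_request_paths(value: str | None) -> list[str]:
    """Parse the parameter, refusing request_path filters when it is unset."""
    if value is None or not value.strip():
        raise ValueError(DISABLED_ERROR)
    patterns = [p.strip() for p in value.split(",") if p.strip()]
    for pattern in patterns:
        problems = pattern_problems(pattern)
        if problems:
            raise ValueError(f"invalid pattern {pattern!r}: " + "; ".join(problems))
    return patterns
```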

Request Query Limitations

Explicit Enable Required: The request_query field requires request_query_filters: enabled
Length Limit: Query strings are limited to 1024 characters by default (configurable up to 4096 via max_request_query_length)
Control Characters: Query strings cannot contain control characters
Leading Question Mark: The connector automatically strips leading ? if present

Example error when query filters are not enabled:

request_query filters are disabled for this dataset. Enable request_query_filters to use them.
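The query-string rules can be sketched as a single normalization step. This sketch assumes the leading `?` is stripped before the length is measured and uses Unicode category `Cc` to detect control characters. `prepare_request_query` is a hypothetical helper.

```python
import unicodedata

DEFAULT_MAX_QUERY_LENGTH = 1024
UPPER_MAX_QUERY_LENGTH = 4096
DISABLED_ERROR = (
    "request_query filters are disabled for this dataset. "
    "Enable request_query_filters to use them."
)


def prepare_request_query(
    query: str, filters_enabled: bool, max_length: int = DEFAULT_MAX_QUERY_LENGTH
) -> str:
    """Apply the documented request_query rules and return the normalized string."""
    if not filters_enabled:
        raise ValueError(DISABLED_ERROR)
    if not 0 < max_length <= UPPER_MAX_QUERY_LENGTH:
        raise ValueError(f"max_request_query_length must be 1..{UPPER_MAX_QUERY_LENGTH}")
    if query.startswith("?"):
        query = query[1:]  # a leading '?' is stripped automatically
    if any(unicodedata.category(ch) == "Cc" for ch in query):
        raise ValueError("request_query must not contain control characters")
    if len(query) > max_length:
        raise ValueError(f"request_query is longer than {max_length} characters")
    return query
```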

Request Body Limitations

Explicit Enable Required: The request_body field requires request_body_filters: enabled
Size Limit: Request bodies are limited to 16 KiB (16,384 bytes) by default (configurable up to 64 KiB via max_request_body_bytes)
POST Method: When a request_body filter is present, the HTTP method automatically changes to POST

Example error when body filters are not enabled:

request_body filters are disabled for this dataset. Enable request_body_filters to use them.
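Here is a corresponding sketch for request bodies. It measures the limit in UTF-8 bytes, since the limit is stated in bytes rather than characters, and returns POST as the method. `prepare_request_body` is a hypothetical helper.

```python
from __future__ import annotations

DEFAULT_MAX_BODY_BYTES = 16 * 1024  # 16 KiB
UPPER_MAX_BODY_BYTES = 64 * 1024  # 64 KiB
DISABLED_ERROR = (
    "request_body filters are disabled for this dataset. "
    "Enable request_body_filters to use them."
)


def prepare_request_body(
    body: str, filters_enabled: bool, max_bytes: int = DEFAULT_MAX_BODY_BYTES
) -> tuple[str, bytes]:
    """Return (method, payload) for a request_body, enforcing the size limit."""
    if not filters_enabled:
        raise ValueError(DISABLED_ERROR)
    if not 0 < max_bytes <= UPPER_MAX_BODY_BYTES:
        raise ValueError(f"max_request_body_bytes must be 1..{UPPER_MAX_BODY_BYTES}")
    payload = body.encode("utf-8")  # the limit is in bytes, not characters
    if len(payload) > max_bytes:
        raise ValueError(f"request_body is {len(payload)} bytes; the limit is {max_bytes}")
    return "POST", payload  # a body switches the method to POST
```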

Configuration Requirements

To use the special metadata fields (request_path, request_query, request_body), you must:

For request_path: Configure allowed_request_paths with a comma-separated list of allowed path patterns (supports glob patterns)
For request_query: Set request_query_filters: enabled in params
For request_body: Set request_body_filters: enabled in params

Example minimal configuration for all three fields:

Performance Considerations

Connection Pooling: The connector maintains up to 10 idle connections per host by default
Retry Overhead: With the default 3 retries and Fibonacci backoff, failed requests may take several seconds before returning an error
Cache Behavior: HTTP responses are cached based on the combination of path, query, and body parameters
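Two small helpers make the retry and caching points concrete. The Fibonacci delay sequence is standard, but the 1-second base delay is a hypothetical value chosen for illustration. Hashing the path, query, and body with SHA-256 is one possible way to build a cache key from those three parts, not necessarily Spice's scheme.

```python
import hashlib


def fibonacci_backoff(retries: int, base_seconds: float = 1.0) -> list:
    """Delay before each retry: base * 1, 1, 2, 3, 5, ... (base is hypothetical)."""
    delays = []
    current, following = 1, 1
    for _ in range(retries):
        delays.append(current * base_seconds)
        current, following = following, current + following
    return delays


def cache_key(path: str, query: str, body: str) -> str:
    """Derive one key from path, query, and body (one possible scheme)."""
    digest = hashlib.sha256()
    for part in (path, query, body):
        data = part.encode("utf-8")
        # Length-prefix each part so ("/a", "b", "") and ("/ab", "", "") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


print(fibonacci_backoff(3), sum(fibonacci_backoff(3)))  # [1.0, 1.0, 2.0] 4.0
```

With the assumed 1-second base, three retries add roughly four seconds of waiting before an error is returned.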

Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation. Additionally, learn how to use referenced secrets in component parameters by visiting the using referenced secrets guide.