:::info Spice is built on Apache DataFusion and uses the PostgreSQL dialect, even when querying datasources with different SQL dialects. When using a data accelerator like DuckDB, function support is specific to each acceleration engine, and not all functions are supported by all acceleration engines. :::
Scalar functions help transform, compute, and manipulate data at the row level. These functions are evaluated for each row in a query result and return a single value per invocation. Spice.ai supports a broad set of scalar functions, including math, string, conditional, date/time, array, struct, map, regular expression, and hashing functions. The function set closely follows the PostgreSQL dialect.
Spark-compatible scalar functions registered by Spice are documented here only when their Spark-specific behavior differs from the PostgreSQL equivalent, or when they have no PostgreSQL analogue. For functions not listed below, refer to the Spark SQL built-in function reference for semantics.
Math functions in Spice.ai SQL help perform numeric calculations, transformations, and analysis. These functions operate on numeric expressions, which can be constants, columns, or results of other functions and operators. The following math functions are supported:
absReturns the absolute value of a numeric expression. If the input is negative, the result is its positive equivalent; if the input is positive or zero, the result is unchanged.
acosReturns the arc cosine (inverse cosine) of a numeric expression. The input must be in the range [-1, 1]. The result is in radians.
acoshReturns the inverse hyperbolic cosine of a numeric expression. The input must be greater than or equal to 1.
asinReturns the arc sine (inverse sine) of a numeric expression. The input must be in the range [-1, 1]. The result is in radians.
asinhReturns the inverse hyperbolic sine of a numeric expression.
atanReturns the arc tangent (inverse tangent) of a numeric expression. The result is in radians.
atan2Returns the arc tangent of the quotient of its arguments, that is, atan(expression_y / expression_x). The result is in radians and takes into account the signs of both arguments to determine the correct quadrant.
atanhReturns the inverse hyperbolic tangent of a numeric expression. The input must be in the range (-1, 1).
cbrtReturns the cube root of a numeric expression.
ceilReturns the smallest integer greater than or equal to the input value.
cosReturns the cosine of a numeric expression, where the input is in radians.
coshReturns the hyperbolic cosine of a numeric expression.
cotReturns the cotangent of a numeric expression, where the input is in radians.
degreesConverts radians to degrees.
expReturns the value of e (Euler's number) raised to the power of the input value.
factorialReturns the factorial of a non-negative integer. For values less than 2, returns 1.
floorReturns the largest integer less than or equal to the input value.
gcdReturns the greatest common divisor of two integer expressions. If both inputs are zero, returns 0.
isnanReturns true if the input is NaN (not a number), otherwise returns false.
iszeroReturns true if the input is +0.0 or -0.0, otherwise returns false.
lcmReturns the least common multiple of two integer expressions. If either input is zero, returns 0.
modReturns the remainder after dividing the first argument by the second, matching the Spark SQL %/mod semantics for signed values.
Reference: Spark SQL mod.
pmodReturns a positive remainder for integer or floating-point division. When the standard remainder is negative, the divisor is added to produce a non-negative result, mirroring Spark SQL behavior.
Reference: Spark SQL pmod.
lnReturns the natural logarithm (base e) of a numeric expression.
logReturns the logarithm of a numeric expression. If a base is provided, returns the logarithm to that base; otherwise, returns the base-10 logarithm.
log10Returns the base-10 logarithm of a numeric expression.
log2Returns the base-2 logarithm of a numeric expression.
nanvlReturns the first argument if it is not NaN; otherwise, returns the second argument.
piReturns an approximate value of π (pi).
pow and powerReturns the value of the first argument raised to the power of the second argument. pow is an alias for power.
radiansConverts degrees to radians.
randomReturns a random floating-point value in the range [0, 1). The random seed is unique for each row.
roundRounds a numeric expression to the nearest integer or to a specified number of decimal places.
rintRounds a double-precision value to the nearest integer using IEEE-754 "round to nearest, ties to even" rules and returns the rounded value as a floating-point number, matching Spark SQL semantics.
Reference: Spark SQL rint.
signumReturns the sign of a numeric expression. Returns -1 for negative numbers, 1 for zero and positive numbers.
sinReturns the sine of a numeric expression, where the input is in radians.
sinhReturns the hyperbolic sine of a numeric expression.
sqrtReturns the square root of a numeric expression.
tanReturns the tangent of a numeric expression, where the input is in radians.
tanhReturns the hyperbolic tangent of a numeric expression.
truncTruncates a numeric expression to a whole number or to a specified number of decimal places. If decimal_places is positive, truncates digits to the right of the decimal point; if negative, truncates digits to the left.
width_bucketAssigns a value to an equiwidth histogram bucket. Returns 0 when the value is below min_value, num_bucket + 1 when it is above max_value, and otherwise the 1-based bucket index, mirroring Spark SQL behavior.
Reference: Spark SQL width_bucket.
Conditional functions help handle null values, select among alternatives, and compare multiple expressions. These are useful for data cleaning and conditional logic in queries.
CASEStandard SQL CASE expression, supported in both simple and searched forms.
coalesceReturns the first non-null value from its arguments. Returns NULL only if every argument is NULL.
greatestReturns the largest value among the arguments, ignoring NULLs. Returns NULL only if every argument is NULL.
ifEvaluates a boolean condition and returns one of two expressions, matching the Spark SQL if function semantics.
Reference: Spark SQL if.
leastReturns the smallest value among the arguments, ignoring NULLs. Returns NULL only if every argument is NULL.
nullifReturns NULL if expression1 equals expression2; otherwise returns expression1. Useful for converting sentinel values to NULL.
expression2.nvlReturns expression2 when expression1 is NULL; otherwise returns expression1. Equivalent to coalesce(expression1, expression2).
Alias: ifnull.
nvl2Returns expression2 if expression1 is not NULL; otherwise returns expression3.
String functions in Spice.ai SQL help manipulate, analyze, and transform text data. These functions operate on string expressions, which can be constants, columns, or results of other functions. The implementation closely follows the PostgreSQL dialect. The following string functions are supported:
asciiReturns the Unicode code point of the first character in a string. If the string is empty, returns 0.
Related function: chr
bit_lengthReturns the number of bits in the string. Each character is counted according to its byte representation (8 bits per byte).
Related functions: length, octet_length
btrimRemoves the longest string containing only characters in trim_str from the start and end of str. If trim_str is omitted, whitespace is removed.
Alternative syntax: trim(BOTH trim_str FROM str) or trim(trim_str FROM str)
Aliases: trim
Related functions: ltrim, rtrim
char_lengthAlias of character_length.
character_lengthReturns the number of characters in a string, not bytes. Handles Unicode correctly.
Aliases: length, char_length
Related functions: bit_length, octet_length
chrReturns the character with the specified Unicode code point.
Related function: ascii
concatConcatenates two or more strings into a single string.
Related function: concat_ws
concat_wsConcatenates strings using a separator between each value.
Related function: concat
containsReturns true if search_str is found within str. The search is case-sensitive.
likePerforms SQL pattern matching using % to match zero or more characters and _ to match a single character. The optional SQL ESCAPE clause can be used to treat a wildcard literally, matching Spark SQL behavior.
% and _ wildcards.Reference: Spark SQL like.
ilikeCase-insensitive variant of like that treats ASCII characters in str and pattern without regard to case. The optional SQL ESCAPE clause may be used to treat % or _ literally.
% and _ wildcards.Reference: Spark SQL ilike.
ends_withReturns true if str ends with the substring substr.
find_in_setReturns the position (1-based) of str in the comma-separated list strlist. Returns 0 if not found.
initcapCapitalizes the first character of each word in the string. Words are delimited by non-alphanumeric characters.
Related functions: lower, upper
instrAlias of strpos.
leftReturns the first n characters from the left side of the string.
Related function: right
lengthAlias of character_length.
levenshteinReturns the Levenshtein distance between two strings.
lowerConverts all characters in the string to lower case.
Related functions: initcap, upper
luhn_checkValidates that a string of digits satisfies the Luhn checksum, returning true for valid numbers and false otherwise. This matches the Spark SQL implementation and is useful for validating identifiers such as credit card numbers.
Reference: Spark SQL luhn_check.
lpadPads the left side of the string with another string until the result reaches the specified length. If the padding string is omitted, a space is used.
Related function: rpad
ltrimRemoves the longest string containing only characters in trim_str from the start of str. If trim_str is omitted, whitespace is removed.
Alternative syntax: trim(LEADING trim_str FROM str)
Related functions: btrim, rtrim
octet_lengthReturns the number of bytes in the string.
Related functions: bit_length, length
overlayReplaces a substring of str with substr, starting at position pos for count characters. If count is omitted, uses the length of substr.
parse_urlExtracts a component from a URL, or retrieves an individual query parameter when provided a key, following Spark SQL semantics. Supported parts include HOST, PATH, QUERY, REF, PROTOCOL, FILE, and AUTHORITY.
QUERY part.Reference: Spark SQL parse_url.
positionAlias of strpos.
repeatReturns a string consisting of the input string repeated n times.
replaceReplaces all occurrences of substr in str with replacement.
reverseReturns the string with the character order reversed.
rightReturns the last n characters from the right side of the string.
Related function: left
rpadPads the right side of the string with another string until the result reaches the specified length. If the padding string is omitted, a space is used.
Related function: lpad
rtrimRemoves the longest string containing only characters in trim_str from the end of str. If trim_str is omitted, whitespace is removed.
Alternative syntax: trim(TRAILING trim_str FROM str)
Related functions: btrim, ltrim
split_partSplits the string on the specified delimiter and returns the substring at the given position (1-based).
starts_withReturns true if str starts with the substring substr.
strposReturns the position (1-based) of the first occurrence of substr in str. Returns 0 if not found.
Alternative syntax: position(substr in origstr)
Aliases: instr, position
substrExtracts a substring from str, starting at start_pos for length characters. If length is omitted, returns the rest of the string.
Alternative syntax: substring(str from start_pos for length)
Aliases: substring
substr_indexReturns the substring from str before or after a specified number of occurrences of the delimiter delim. If count is positive, returns everything to the left of the final delimiter (counting from the left). If count is negative, returns everything to the right of the final delimiter (counting from the right).
Aliases: substring_index
substringAlias of substr.
substring_indexAlias of substr_index.
to_hexConverts an integer to its hexadecimal string representation.
translateReplaces each character in str that matches a character in chars with the corresponding character in translation. If translation is shorter than chars, extra characters are removed.
trimAlias of btrim.
upperConverts all characters in the string to upper case.
Related functions: initcap, lower
uuidReturns a UUID v4 string value that is unique per row.
Binary string functions help encode and decode binary data, such as base64 and hexadecimal conversions. These are useful for working with encoded data or binary blobs.
bit_getReturns the bit (0 or 1) at the specified zero-based position when counting from the least-significant bit of an integral or binary expression, matching Spark SQL semantics.
Reference: Spark SQL bit_get.
bit_countCounts the number of set bits in an integral or binary expression. Useful for quick popcount operations on bitmaps or packed flags, aligned with Spark SQL behavior.
Reference: Spark SQL bit_count.
bitmap_countReturns the number of set bits in a binary bitmap produced by functions such as bitmap_construct_agg, mirroring the Spark SQL implementation.
Reference: Spark SQL bitmap_count.
Regular expression functions help match, extract, and replace patterns in strings. Spice.ai uses a PCRE-like regular expression syntax. Spice supports the following regular expressions:
regexp_likeReturns true if a regular expression has at least one match in a string, false otherwise.
regexp_matchReturns the first regular expression matches in a string.
regexp_replaceReplaces substrings in a string that match a regular expression.
regexp_countReturns the number of matches that a regular expression has in a string.
regexp_instrReturns the position in a string where the specified occurrence of a POSIX regular expression is located.
Time and date functions help extract, format, and manipulate temporal data. Functions include current_date, now, date_part, date_trunc, and various conversion functions. These are essential for time series analysis and working with timestamps.
current_dateReturns the current UTC date.
The current_date() return value is determined at query time and will return the same date, no matter when in the query plan the function executes.
current_timeReturns the current UTC time.
The current_time() return value is determined at query time and will return the same time, no matter when in the query plan the function executes.
current_timestampAlias of now.
date_binCalculates time intervals and returns the start of the interval nearest to the specified timestamp. Use date_bin to downsample time series data by grouping rows into time-based "bins" or "windows" and applying an aggregate or selector function to each window.
For example, if you "bin" or "window" data into 15 minute intervals, an input timestamp of 2023-01-01T18:18:18Z will be updated to the start time of the 15 minute bin it is in: 2023-01-01T18:15:00Z.
date_addAdds a number of days to a DATE or TIMESTAMP expression, matching Spark SQL semantics. Negative offsets move backwards in time.
Reference: Spark SQL date_add.
date_subSubtracts a number of days from a DATE or TIMESTAMP expression using Spark-compatible behavior.
Reference: Spark SQL date_sub.
last_dayReturns the last day of the month that contains the input date or timestamp, matching Spark SQL semantics.
Reference: Spark SQL last_day.
next_dayReturns the first date after start_date that matches the requested day of week. Valid day names include full names (e.g., Monday) or abbreviations such as Mon, matching Spark SQL behavior.
Reference: Spark SQL next_day.
date_formatAlias of to_char.
date_partReturns the specified part of the date as an integer.
part: Part of the date to return. The following date parts are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
date_truncTruncates a timestamp value to a specified precision.
precision: Time precision to truncate to. The following precisions are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
datepartAlias of date_part.
datetruncAlias of date_trunc.
from_unixtimeConverts an integer to RFC3339 timestamp format (YYYY-MM-DDT00:00:00.000000000Z). Integers and unsigned integers are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z) return the corresponding timestamp.
make_dateMake a date from year/month/day component parts.
nowReturns the current UTC timestamp.
The now() return value is determined at query time and will return the same timestamp, no matter when in the query plan the function executes.
to_charReturns a string representation of a date, time, timestamp or duration based on a Chrono format. Unlike the PostgreSQL equivalent of this function numerical formatting is not supported.
to_dateConverts a value to a date (YYYY-MM-DD).
Supports strings, integer and double types as input.
Strings are parsed as YYYY-MM-DD (e.g. '2023-07-20') if no Chrono formats are provided.
Integers and doubles are interpreted as days since the unix epoch (1970-01-01T00:00:00Z).
Returns the corresponding date.
Note: to_date returns Date32, which represents its values as the number of days since unix epoch(1970-01-01) stored as signed 32 bit value. The largest supported date value is 9999-12-31.
to_local_timeConverts a timestamp with a timezone to a timestamp without a timezone (with no offset or timezone information). This function handles daylight saving time changes.
to_timestampConverts a value to a timestamp (YYYY-MM-DDT00:00:00Z). Supports strings, integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no [Chrono formats] are provided. Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
Note: to_timestamp returns Timestamp(Nanosecond). The supported range for integer input is between -9223372037 and 9223372036. Supported range for string input is between 1677-09-21T00:12:44.0 and 2262-04-11T23:47:16.0. Please use to_timestamp_seconds for the input outside of supported bounds.
to_timestamp_microsConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as microseconds since the unix epoch (1970-01-01T00:00:00Z) Returns the corresponding timestamp.
to_timestamp_millisConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as milliseconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_timestamp_nanosConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000000000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as nanoseconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_timestamp_secondsConverts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding timestamp.
to_unixtimeConverts a value to seconds since the unix epoch (1970-01-01T00:00:00Z). Supports strings, dates, timestamps and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided.
todayAlias of current_date.
Array functions in Spice.ai SQL help construct, transform, and query array data types. These functions operate on array expressions, which can be constants, columns, or results of other functions. The implementation closely follows the PostgreSQL dialect. The following array functions are supported:
arrayConstructs an array from the provided expressions using Spark-compatible semantics. Inputs are evaluated left to right, cast to a common element type, and collected into a single Arrow list value without removing duplicates or nulls.
Reference: Spark SQL array.
array_any_valueReturns the first non-null element in the array. If all elements are null, returns null.
array_appendAppends an element to the end of an array and returns the new array.
array_catAlias of array_concat.
array_concatConcatenates two or more arrays into a single array.
array_containsReturns true if the array contains the specified element.
Note: For array-to-array containment operations, use the @> operator.
array_dimsReturns an array of the array's dimensions. For a 2D array, returns the number of rows and columns.
array_distanceReturns the Euclidean distance between two input arrays of equal length.
array_distinctReturns a new array with duplicate elements removed, preserving the order of first occurrence.
array_elementExtracts the element at the specified index from the array. Indexing is 1-based.
array_exceptReturns an array containing elements in array1 that are not in array2, preserving first-occurrence order and without duplicates.
Alias: list_except.
array_hasReturns true if the array contains the specified element.
Aliases: array_contains, list_has.
array_has_allReturns true if every element of sub_array is present in array.
Alias: list_has_all.
array_has_anyReturns true if array and sub_array share at least one element.
Aliases: list_has_any, arrays_overlap.
array_intersectReturns an array of elements present in both input arrays, deduplicated.
Alias: list_intersect.
array_lengthReturns the length of the array at the given (optional) dimension. Dimension defaults to 1.
Alias: list_length.
array_maxReturns the maximum element of the array, ignoring NULLs.
Alias: list_max.
array_minReturns the minimum element of the array, ignoring NULLs.
Alias: list_min.
array_ndimsReturns the number of dimensions of the array.
Alias: list_ndims.
array_pop_backReturns the array with the last element removed.
Alias: list_pop_back.
array_pop_frontReturns the array with the first element removed.
Alias: list_pop_front.
array_positionReturns the 1-based position of the first occurrence of element in array, or NULL if not found. An optional from_index starts the search at a later position.
Aliases: list_position, array_indexof, list_indexof.
array_positionsReturns a 1-based array of all positions where element occurs in array.
Alias: list_positions.
array_prependPrepends an element to the beginning of an array.
Aliases: list_prepend, array_push_front, list_push_front.
array_removeReturns the array with the first occurrence of element removed.
Alias: list_remove.
array_remove_nReturns the array with the first max occurrences of element removed.
Alias: list_remove_n.
array_remove_allReturns the array with all occurrences of element removed.
Alias: list_remove_all.
array_repeatReturns an array containing element repeated count times.
Alias: list_repeat.
array_replaceReplaces the first occurrence of from with to in array.
Alias: list_replace.
array_replace_nReplaces the first max occurrences of from with to in array.
Alias: list_replace_n.
array_replace_allReplaces every occurrence of from with to in array.
Alias: list_replace_all.
array_resizeResizes array to the given length, padding with value (or NULL if omitted) when growing.
Alias: list_resize.
array_reverseReturns the array with elements in reverse order.
Alias: list_reverse.
array_sliceReturns a slice of the array from begin to end (1-based, inclusive). Negative indices count from the end.
Alias: list_slice.
array_sortReturns array sorted in ascending order (default). Optional arguments control sort direction (ASC/DESC) and null placement (NULLS FIRST/NULLS LAST).
Alias: list_sort.
array_to_stringConcatenates array elements into a single string using the given delimiter. Optional null_string replaces NULL elements.
Aliases: list_to_string, array_join, list_join.
array_unionReturns the set-union of two arrays, deduplicated.
Alias: list_union.
arrays_zipMerges the given arrays element-wise into an array of structs. Shorter arrays are padded with NULLs.
Alias: list_zip.
cardinalityReturns the total number of elements in an array (including nested elements) or the number of entries in a map.
emptyReturns true if the array has length 0 (or is NULL).
Aliases: array_empty, list_empty.
flattenFlattens a nested array into a single-level array.
make_arrayConstructs an array (Arrow list) from the given expressions. SQL [expr1, expr2, ...] literal syntax compiles to this function.
Alias: make_list.
rangeGenerates a numeric or date range as an array, half-open on the upper bound. When the step is omitted, the default is 1.
For dates, step is an interval literal, e.g. interval '1 day'. Use generate_series for the inclusive-upper-bound variant.
generate_seriesLike range, but the upper bound is inclusive.
string_to_arraySplits a string into an array of substrings using the given delimiter. An optional null_string turns matching substrings into NULLs.
Alias: string_to_list.
Struct functions help construct and access structured data types (Arrow structs). These are useful for working with nested or composite data.
structConstructs an anonymous Arrow struct from the given values. Field names default to c0, c1, ... in the order provided.
named_structConstructs an Arrow struct from alternating field-name / field-value pairs.
get_fieldExtracts a field by name from a struct or map. struct.field and struct['field'] sugar invoke this function.
Map functions help construct and query key-value data structures. These are useful for semi-structured or JSON-like data.
mapConstructs an Arrow map from alternating key/value arguments, or from two arrays (one of keys, one of values).
map_keysReturns the keys of a map as an array.
map_valuesReturns the values of a map as an array.
map_entriesReturns the entries of a map as an array of structs [{key, value}, ...].
map_extractLooks up a key in a map and returns the associated value, or NULL if the key is absent.
Alias: element_at.
Hashing functions compute cryptographic hashes and checksums for data integrity, fingerprinting, and security applications. Binary digest output is returned as a Binary (bytes) array; use encode(..., 'hex') to render as hex.
digestComputes the digest of the input using the named hash algorithm. Supported algorithms: 'md5', 'sha224', 'sha256', 'sha384', 'sha512', 'blake2s', 'blake2b', 'blake3'.
md5Computes the MD5 128-bit hash of a string and returns the result as a lowercase hex string.
sha224Computes the SHA-224 hash and returns a binary digest.
sha256Computes the SHA-256 hash and returns a binary digest.
sha384Computes the SHA-384 hash and returns a binary digest.
sha512Computes the SHA-512 hash and returns a binary digest.
Binary encoding utilities for converting between binary data and text representations.
encodeEncodes a string or binary value using the specified encoding. Supported encodings: 'hex', 'base64'.
decodeDecodes text back to binary using the specified encoding. Supported encodings: 'hex', 'base64'.
Union functions help work with union (variant) data types.
union_extractExtracts the value of a named member from a union, returning NULL if the union's active member doesn't match.
union_tagReturns the name of the active member of a union value as a string.
PostgreSQL-compatible functions for reading table and column comments from registered datasets. Comments originate from the dataset's source (for connectors that surface COMMENT ON TABLE / COMMENT ON COLUMN metadata, such as PostgreSQL, MySQL, Snowflake, and Databricks) or from description metadata attached to the dataset schema.
obj_descriptionReturns the comment attached to a registered table, or NULL if the table has no comment.
'table', 'schema.table', or 'catalog.schema.table') or an integer table OID. Unqualified names are resolved against the session's default catalog and schema.'pg_class', the call is treated as PostgreSQL-style obj_description(oid, 'pg_class'); any other value returns NULL.col_descriptionReturns the comment attached to a column on a registered table, or NULL if no comment exists.
obj_description.Additional scalar functions include type casting, type inspection, and version reporting.
arrow_castCasts an expression to a specific Arrow data type. Use this function when you need precise control over the target Arrow type, such as specifying timestamp precision.
'Int32', 'Utf8', 'Timestamp(Second, None)').See Data Types Reference for supported Arrow types.
arrow_try_castLike arrow_cast but returns NULL instead of erroring when the cast fails.
arrow_typeofReturns the Arrow data type of the given expression as a string.
arrow_metadataReturns the Arrow schema metadata associated with an expression as a map of key/value strings. Useful for inspecting field-level metadata (units, comments, logical type hints) attached during ingest.
versionReturns the underlying DataFusion runtime version string.
ai and embedSee AI Functions for ai() (LLM text generation) and embed() (vector embedding generation).
bucketAssigns a deterministic bucket identifier for a value by hashing the input and projecting it into a fixed number of buckets. Helpful for partition_by expressions and for co-locating related rows during acceleration refreshes.
[1, 1_000_000]. The literal's integer type (Int8 … Int64, UInt8 … UInt64) determines the return type.Returns an integer in the range [0, num_buckets - 1], matching the integer type of num_buckets. The same input value always maps to the same bucket for a given num_buckets (the hash uses a fixed seed, so buckets are stable across processes and runtime restarts).
In spicepod.yaml, use the function directly inside partition_by to build file-based accelerations:
truncateIceberg-style truncate transform. For numeric values, rounds down to the nearest multiple of width. For strings and binary, returns the first width characters/bytes. Useful for partitioning by wide numeric ranges or string prefixes.
Int64 literal that defines the bucket size or, for strings/binary, the number of leading units to retain. Maximum: i64::MAX / 2.Int8, Int16, Int32, Int64UInt8, UInt16, UInt32, UInt64Decimal128, Decimal256Utf8BinaryReturns the same type as value. For numbers, the result is the largest multiple of width that is less than or equal to value. For strings/binary, the result is the first width characters/bytes.
Spice.ai aims for compatibility with PostgreSQL, but some functions or behaviors may differ depending on the underlying engine version.