The append refresh mode incrementally adds new rows to the acceleration on each refresh. It is designed for append-only or immutable datasets such as time-series, event, and log data.
Use append when:
full mode on each interval.
Append mode requires a time_column, which identifies new rows by comparing the maximum value in the acceleration to the source. Each refresh fetches only the source rows whose time_column is greater than max(time_column) in the acceleration.
To account for clock skew or late-arriving rows, configure an overlap window with acceleration.refresh_append_overlap. Rows within the overlap are re-read on each refresh.
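The row selection described above can be modeled in a few lines of Python. This is a simplified sketch rather than Spice's implementation: the function name, the dictionary row shape, and the choice to read everything when the acceleration is empty are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_rows_to_append(source_rows, accelerated_rows, time_column,
                          overlap=timedelta(0)):
    """Return the source rows a single append refresh would read.

    Hypothetical model: rows are dicts, and `overlap` plays the role of
    acceleration.refresh_append_overlap.
    """
    if not accelerated_rows:
        # Assumption: with nothing accelerated yet, every source row is new.
        return list(source_rows)

    # Largest time_column value already held locally.
    local_max = max(row[time_column] for row in accelerated_rows)

    # Move the cutoff back by the overlap so late-arriving rows are re-read.
    cutoff = local_max - overlap
    return [row for row in source_rows if row[time_column] > cutoff]
```

With a zero overlap the function returns only rows strictly newer than the local maximum. A non-zero overlap also returns rows that fall inside the window, which is how late arrivals get picked up on a later refresh.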
time_partition_column
Datasets partitioned by a less-granular time column (day, month, year) can specify time_partition_column in addition to time_column for efficient partition pruning at the source.
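To illustrate why the coarser column helps, the sketch below models a source partitioned by day. Whole partitions older than the cutoff are skipped before any row is inspected, and the fine-grained time_column filter runs only on what remains. The function name and the dictionary-of-partitions layout are hypothetical and stand in for whatever pruning the source system actually performs.

```python
from datetime import date, datetime


def prune_then_filter(partitions, cutoff, time_column):
    """Select rows newer than `cutoff`, skipping stale partitions entirely.

    `partitions` maps a day (datetime.date) to the list of rows stored
    under that day; this layout is an assumption for illustration.
    """
    selected = []
    for day, rows in partitions.items():
        if day < cutoff.date():
            # The whole partition predates the cutoff, so it is never scanned.
            continue
        # Only surviving partitions pay the cost of the row-level filter.
        selected.extend(row for row in rows if row[time_column] > cutoff)
    return selected
```

The partition on the cutoff's own day still has to be scanned, because it can contain rows on both sides of the cutoff.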
For object-store sources, set time_column or time_partition_column to the special value last_modified to append only newly created or updated files. Spice uses file metadata to determine which files are new, so unchanged files are not rescanned, which reduces scan time for large datasets.
If last_modified exists as a column in the data, the column value takes precedence over file metadata.
This is supported for connectors that accept the file format parameter, such as s3://, abfs://, and file://.
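The file-level selection can be pictured as a filter over an object-store listing. The sketch below is a hypothetical model, not Spice's code: it assumes the listing is available as (path, last_modified) pairs and that the newest modification time already processed is tracked as last_seen. It does not model the case where last_modified is also a column in the data.

```python
from datetime import datetime


def files_to_append(listing, last_seen):
    """Return paths of files created or updated after `last_seen`.

    `listing` is an iterable of (path, last_modified) pairs taken from
    object-store metadata; both names are assumptions for illustration.
    """
    return [path for path, modified in listing if modified > last_seen]
```

Because the decision uses only metadata, files that have not changed are never opened.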
Append-mode accelerations that define a time_column wait to report ready until the first append refresh completes after snapshot bootstrap. This keeps the dataset out of rotation until the freshest data is available while still benefiting from snapshot-assisted startup.
Pair refresh_mode: append with a primary_key and on_conflict: upsert to handle source rows that are occasionally updated. See End-to-End Incremental Ingestion Example.
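As a rough picture of what upsert on a primary key means for appended rows, the following sketch merges an incoming batch into existing rows. It is a simplified in-memory model with hypothetical names, not a description of how any acceleration engine stores data.

```python
def append_with_upsert(accelerated, incoming, primary_key):
    """Merge `incoming` rows into `accelerated`, keyed by `primary_key`.

    A row whose key already exists replaces the stored row; a row with a
    new key is added. Rows are dicts in this illustrative model.
    """
    by_key = {row[primary_key]: row for row in accelerated}
    for row in incoming:
        # Insert a new key, or overwrite the existing row on conflict.
        by_key[row[primary_key]] = row
    return list(by_key.values())
```

In this model an updated source row overwrites the stored copy with the same key instead of producing a second row.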