---
title: 'Arrow Data Accelerator Deployment Guide'
sidebar_label: 'Deployment Guide'
description: 'Operating guide for the Arrow (in-memory) data accelerator in production: memory sizing, indexes, and observability.'
sidebar_position: 10
pagination_prev: null
pagination_next: null
tags:
---
Production operating guide for the Arrow in-memory data accelerator covering memory sizing, optional hash indexes, and observability.
The Arrow accelerator is an in-process, in-memory engine. It uses no external storage, so no authentication or secret management is required.
The Arrow accelerator is not durable. Data is held in RAM and is lost on process restart; every restart re-materializes the dataset from the source connector.
The hash index is built when a `primary_key` (or secondary `indexes` entry) is configured. It is a hash map over the indexed columns. Build time scales linearly with rows; memory overhead is approximately 24–48 bytes per row plus the key size.

Generic acceleration metrics are available with the `dataset_acceleration_` prefix. Hash-index operations emit dedicated metrics when the index is enabled:
| Metric | Type | Description |
|---|---|---|
| `hash_index_builds` | Counter | Total hash-index builds (one per refresh). |
| `hash_index_build_duration_ms` | Histogram | Time to build the hash index. |
| `hash_index_entries` | Gauge | Number of entries in the index. |
| `hash_index_memory_bytes` | Gauge | Approximate memory footprint of the index. |
| `hash_index_lookups` | Counter | Total hash-index lookups performed by queries. |
| `hash_index_lookup_rows` | Counter | Total rows returned via hash-index lookups. |
See Component Metrics for enabling and exporting metrics. Refresh metrics are described in Acceleration.
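
The per-row overhead figure quoted above can be turned into a rough sizing check. The sketch below is only an illustration: the function names are hypothetical, it assumes the key size is paid once per row, and `key_bytes` is an assumed average key width that you would need to measure for your own dataset.

```python
def estimate_hash_index_bytes(rows: int, key_bytes: int) -> tuple[int, int]:
    """Return a (low, high) estimate of hash-index overhead in bytes.

    Applies the 24-48 bytes-per-row range from this guide and adds the key
    size once per row (an assumption made for this sketch).
    """
    if rows < 0 or key_bytes < 0:
        raise ValueError("rows and key_bytes must be non-negative")
    low = rows * (24 + key_bytes)
    high = rows * (48 + key_bytes)
    return low, high


def observed_bytes_per_entry(memory_bytes: float, entries: float) -> float:
    """Average bytes per entry from the two gauges; 0.0 for an empty index."""
    return memory_bytes / entries if entries else 0.0


# Example: 10 million rows with an assumed 8-byte integer key.
low, high = estimate_hash_index_bytes(rows=10_000_000, key_bytes=8)
print(f"estimated index overhead: {low / 2**20:.0f} to {high / 2**20:.0f} MiB")
```

Once the index is live, dividing `hash_index_memory_bytes` by `hash_index_entries` (as in `observed_bytes_per_entry`) gives a measured per-entry figure to compare against this estimate.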
Arrow acceleration operations (refresh, query) participate in task history through the shared acceleration spans (`accelerated_table_refresh`, `sql_query`). No Arrow-specific spans are emitted because the accelerator is a thin wrapper over Arrow memory.
- Hash index: Requires a `primary_key` constraint; unique constraints alone do not enable the index.
- `partition_by`: Not applicable. The Arrow accelerator holds a single in-memory representation.

| Symptom | Likely cause | Resolution |
|---|---|---|
| OOM on refresh | Source dataset larger than RAM. | Switch to a durable accelerator (DuckDB / SQLite / Cayenne) that supports spill to disk. |
| Long startup time | Full-dataset refresh runs on boot. | Switch to a durable accelerator so refresh is incremental, not full, on restart. |
| `hash_index` ignored | No primary-key constraint on the dataset. | Add `primary_key:` to the dataset definition; the hash index activates automatically. |
| Query slow for point lookups | No primary key/index, or wrong key column. | Add a `primary_key:` (or secondary `indexes:` entry); ensure the query filter matches the indexed columns. |
| Accelerator refuses to start with file mode | Arrow rejects file-mode acceleration. | Switch `engine:` to `duckdb`, `sqlite`, `postgres`, or `cayenne`. |
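
To illustrate why a point lookup only benefits when the query filter matches the indexed columns, here is a conceptual sketch of a hash map over key columns. It is not the accelerator's actual implementation: `build_hash_index`, `point_lookup`, and the sample rows are hypothetical, and the sketch assumes the key is unique, as a primary key would be.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def build_hash_index(rows: Sequence[Row], key_columns: Sequence[str]) -> Dict[Key, int]:
    """Map each key tuple to its row position in a single pass over the rows."""
    return {tuple(row[c] for c in key_columns): pos for pos, row in enumerate(rows)}


def point_lookup(
    rows: Sequence[Row],
    index: Dict[Key, int],
    key_columns: Sequence[str],
    filters: Dict[str, object],
) -> Tuple[List[Row], str]:
    """Return (matching rows, access path) for a set of equality filters."""
    if all(c in filters for c in key_columns):
        # Every indexed column is constrained: one hash probe finds the row.
        pos = index.get(tuple(filters[c] for c in key_columns))
        if pos is None:
            return [], "index"
        row = rows[pos]
        # Filters on non-key columns still have to hold for the probed row.
        ok = all(row[c] == v for c, v in filters.items())
        return ([row] if ok else []), "index"
    # The filter does not cover the full key, so every row must be checked.
    return [r for r in rows if all(r[c] == v for c, v in filters.items())], "scan"


rows = [
    {"tenant": "a", "id": 1, "value": 10},
    {"tenant": "a", "id": 2, "value": 20},
    {"tenant": "b", "id": 1, "value": 30},
]
key_columns = ["tenant", "id"]
index = build_hash_index(rows, key_columns)

# Filter covers the full key, so the lookup uses the index.
print(point_lookup(rows, index, key_columns, {"tenant": "a", "id": 2})[1])
# Filter covers only part of the key, so the lookup falls back to a scan.
print(point_lookup(rows, index, key_columns, {"tenant": "a"})[1])
```

In this sketch, a filter on only `tenant` cannot form a complete key, so it degrades to a full scan. This mirrors the "wrong key column" cause in the troubleshooting table.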