title: 'ScyllaDB Data Connector'
sidebar_label: 'ScyllaDB Data Connector'
description: 'ScyllaDB Data Connector Documentation'
tags:
The ScyllaDB Data Connector enables federated SQL queries on data stored in ScyllaDB clusters using CQL (Cassandra Query Language).
Configuration
from
The from field takes the form scylladb:{table_name} where table_name is the table identifier in the ScyllaDB keyspace to read from.
name
The dataset name. This will be used as the table name within Spice.
Example:
The dataset name cannot be a reserved keyword.
params
The ScyllaDB data connector can be configured by providing the following params. Use the secret replacement syntax to load the secret from a secret store, e.g. ${secrets:scylladb_pass}.
| Parameter Name | Description | Required | Default |
|---|
scylladb_host | Hostname(s) of ScyllaDB nodes. Comma-separated for multiple nodes. | Yes | - |
scylladb_hosts | Alternative to scylladb_host. Comma-separated list of hostnames. | No | - |
scylladb_port | ScyllaDB CQL native transport port. | No | 9042 |
scylladb_keyspace | The keyspace to use for queries. | Yes | - |
scylladb_user | Username for authentication. | No | - |
scylladb_pass | Password for authentication. | No | - |
scylladb_datacenter | Preferred datacenter for connection routing. | No | - |
scylladb_ssl | Enable SSL/TLS for connections. | No | false |
connection_timeout |
Types
The table below shows the CQL data types supported, along with the type mapping to Apache Arrow types in Spice.
| CQL Type | Arrow Type | Notes |
|---|
boolean | Boolean | |
tinyint | Int8 | |
smallint | Int16 | |
int | Int32 | |
bigint | Int64 | |
counter | Int64 | Cassandra counter type |
float | Float32 | |
double | Float64 | |
decimal |
Decimal Handling
CQL decimal is an arbitrary-precision type, while Arrow Decimal128 has a maximum precision of 38 digits. The connector uses:
- Precision: 38 (maximum for Decimal128)
- Scale: 2 (suitable for monetary/financial data)
For decimals that exceed this precision, values may be truncated or rounded.
Date/Time Handling
- CQL
date: Stored as days since epoch with a 2^31 offset. Converted to Arrow Date32 (signed days since 1970-01-01).
- CQL
timestamp: Milliseconds since Unix epoch. Directly mapped to Arrow Timestamp(Millisecond).
- CQL
time: Nanoseconds since midnight. Converted to Arrow Timestamp(Microsecond) with nanosecond truncation.
Query Execution
Due to fundamental differences between CQL and SQL, the connector implements a local filtering strategy. All data is fetched from ScyllaDB using SELECT * queries, and filtering, joins, aggregations, and other SQL operations are performed locally by DataFusion.
Why Filter Pushdown is Disabled
CQL lacks many SQL constructs:
| Feature | SQL | CQL |
|---|
| CASE WHEN | ✅ | ❌ |
| Subqueries | ✅ | ❌ |
| Complex JOINs | ✅ | ❌ |
| CAST expressions | ✅ | ❌ |
| INTERVAL | ✅ | ❌ |
| Window functions | ✅ | ❌ |
| NULLS FIRST/LAST | ✅ | ❌ |
| COUNT(DISTINCT) | ✅ | ❌ |
| Arbitrary WHERE | ✅ | ❌ |
Because of these limitations, the connector:
- Fetches full table data using
SELECT * FROM keyspace.table
- Lets DataFusion handle all filtering locally after data retrieval
- Supports projection pushdown—only requested columns are transferred
Streaming Execution
Query results are streamed using the scylla driver's paging mechanism in batches of 8192 rows by default, minimizing memory usage for large result sets.
Performance Considerations
Since filter pushdown is disabled, all table data is transferred for each query. Consider the following optimizations:
Enable Acceleration
For frequently queried data, enable Spice acceleration to cache data locally:
Configure Datacenter Locality
Set the datacenter preference to route queries to the nearest nodes:
Adjust Connection Timeout
Set connection timeouts appropriately for your network:
Examples
Basic Federated Query
With Authentication
Multi-Node Cluster
With Acceleration
Using Environment Variables
Limitations
CQL Limitations
The following SQL operations cannot be pushed down to ScyllaDB and are performed locally:
- JOINs: All joins are performed locally by DataFusion
- Aggregations: COUNT, SUM, AVG, etc. are computed locally
- Subqueries: Nested queries are not supported in CQL
- Window functions: RANK, ROW_NUMBER, etc. not supported
- Complex WHERE clauses: CQL requires partition key in WHERE; Spice fetches all data
- ORDER BY: Sorting is done locally
Connector Limitations
- Read-only: The connector does not support INSERT, UPDATE, or DELETE operations
- Decimal precision: Fixed at precision=38, scale=2; may not suit all use cases
- Collection types: Lists, sets, and maps are converted to JSON string representation
- Large tables: Without acceleration, large tables cause significant data transfer
Data Type Limitations
- varint: Arbitrary-precision integers are converted to strings
- duration: CQL durations are converted to string representation
- UDTs: User-defined types are converted to string representation
- Nested collections: Deeply nested collections become complex JSON strings
Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the secret stores documentation. Additionally, learn how to use referenced secrets in component parameters by visiting the using referenced secrets guide.
See Also