ScyllaDB Python Driver is available under the Apache v2 License. ScyllaDB Python Driver is a fork of DataStax Python Driver. See Copyright here.
Caution
You're viewing documentation for an unstable version of Scylla Python Driver. Switch to the latest stable version.
scylla-driver is shard aware and contains extensions that work with the TokenAwarePolicy supported by Scylla 2.3 and onwards. Using this policy, the driver can select a connection to a particular shard based on the shard’s token. As a result, latency is significantly reduced because there is no need to pass data between the shards.
Details on the Scylla CQL protocol extensions: https://github.com/scylladb/scylla/blob/master/docs/dev/protocol-extensions.md#intranode-sharding
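To make the idea of choosing a shard from a token concrete, here is a minimal sketch of the token-to-shard mapping as the linked protocol extensions document describes it. The function name `shard_for_token` and its parameter names are hypothetical, chosen for illustration; the driver's internal implementation may differ. The sketch assumes a signed 64-bit token, a shard count, and an "ignore most significant bits" value, all as advertised by the server.

```python
def shard_for_token(token: int, shards_count: int, ignore_msb: int = 0) -> int:
    """Map a signed 64-bit token to a shard index (hypothetical helper)."""
    mask = 2**64 - 1
    # Bias the signed token into the unsigned 64-bit range.
    biased = (token + 2**63) & mask
    # Shift out the most significant bits the server asks clients to ignore.
    biased = (biased << ignore_msb) & mask
    # Scale into [0, shards_count) by taking the upper 64 bits of the product.
    return (biased * shards_count) >> 64


if __name__ == "__main__":
    # Tokens at the two ends and the middle of the ring, for an 8-shard node.
    for tok in (-2**63, 0, 2**63 - 1):
        print(tok, shard_for_token(tok, shards_count=8))
```

The mapping depends only on the token and the two values reported by the node, so a client can pick the right connection without an extra round trip.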
To use shard awareness, you only need to enable TokenAwarePolicy on the Cluster.
See the configuration of native_shard_aware_transport_port and native_shard_aware_transport_port_ssl in scylla.yaml:
https://github.com/scylladb/scylla/blob/master/docs/dev/protocols.md#cql-client-protocol
from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, RoundRobinPolicy
cluster = Cluster(load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy()))
shard_aware_options
Setting it to dict(disable=True) disables the shard-aware functionality, for cases that favor one connection per host (for example, many processes connecting from one client host and generating a large number of connections).
Another option is to configure Scylla by setting enable_shard_aware_drivers: false in scylla.yaml.
from cassandra.cluster import Cluster
cluster = Cluster(shard_aware_options=dict(disable=True))
session = cluster.connect()
assert not cluster.is_shard_aware(), "Shard aware should be disabled"
# or just disable the shard aware port logic
cluster = Cluster(shard_aware_options=dict(disable_shardaware_port=True))
session = cluster.connect()
cluster.is_shard_aware()
A method on Cluster that checks whether the remote cluster supports shard awareness. Returns a bool.
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
if cluster.is_shard_aware():
    print("connected to a scylla cluster")
cluster.shard_aware_stats()
A method on Cluster that reports the status of shard-aware connections to all available hosts. Returns a dict.
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
stats = cluster.shard_aware_stats()
if all([v["shards_count"] == v["connected"] for v in stats.values()]):
    print("successfully connected to all shards of all scylla nodes")
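When not every shard is connected, it can help to see which hosts are lagging. The following is a small sketch that works on a dict shaped like the one the example above iterates over (each value has "shards_count" and "connected" keys). The helper name `hosts_missing_shards` and the sample data are hypothetical.

```python
def hosts_missing_shards(stats):
    """Return {host: number of shards without a connection} for incomplete hosts."""
    missing = {}
    for host, info in stats.items():
        gap = info["shards_count"] - info["connected"]
        if gap > 0:
            missing[host] = gap
    return missing


# Hypothetical sample; in real use, pass the result of cluster.shard_aware_stats().
sample = {
    "host-a": {"shards_count": 8, "connected": 8},
    "host-b": {"shards_count": 8, "connected": 5},
}
print(hosts_missing_shards(sample))
```

An empty result means every host has a connection to each of its shards.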
SCYLLA_RATE_LIMIT_ERROR Error
ScyllaDB 5.1 introduced a feature called per-partition rate limiting. If the user-defined per-partition rate limit is exceeded, the database starts returning a Scylla-specific type of error: RateLimitReached.
from cassandra import RateLimitReached
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
session.execute("""
CREATE KEYSPACE IF NOT EXISTS keyspace1
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
""")
session.execute("USE keyspace1")
session.execute("""
CREATE TABLE tbl (pk int PRIMARY KEY, v int)
WITH per_partition_rate_limit = {'max_writes_per_second': 1}
""")
prepared = session.prepare("""
INSERT INTO tbl (pk, v) VALUES (?, ?)
""")
try:
    for _ in range(1000):
        session.execute(prepared.bind((123, 456)))
except RateLimitReached:
    # The per-partition write limit was exceeded; this example simply re-raises.
    raise
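Rather than re-raising immediately, an application might back off and retry when the rate limit is hit. The sketch below shows one possible approach using exponential backoff. The helper `execute_with_backoff` and its parameters are hypothetical and not part of the driver; it takes a callable and an exception type so that it can be exercised without a running cluster.

```python
import time


def execute_with_backoff(run, retry_on, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call run(); on retry_on, wait base_delay * 2**attempt and try again."""
    for attempt in range(attempts):
        try:
            return run()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Assumed usage with the objects from the example above:
# execute_with_backoff(
#     lambda: session.execute(prepared.bind((123, 456))),
#     RateLimitReached,
# )
```

Whether retrying is appropriate depends on the workload; for a sustained overload of a single partition, reducing the request rate is likely the better fix.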
ScyllaDB has a built-in 1MB page size limit that Cassandra does not have. This means that even if you set a high fetch_size (e.g., 10000 rows), ScyllaDB may return fewer rows per page if the total response size exceeds 1MB.
This behavior is particularly noticeable when:
Working with wide tables (many columns)
Using NumpyProtocolHandler where you want large arrays per page
Columns contain large values (blobs, long strings, etc.)
For example, with a table containing 1000 columns, you might receive only 30-50 rows per page even with fetch_size=10000.
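The arithmetic behind that example can be sketched as follows. This is a rough estimate only: it assumes the 1MB limit is 1024 * 1024 bytes, ignores protocol overhead, and uses a hypothetical average of 25 bytes per column. The function name `estimated_rows_per_page` is made up for illustration.

```python
PAGE_LIMIT_BYTES = 1024 * 1024  # assumed size of the 1MB page limit


def estimated_rows_per_page(fetch_size, columns, avg_bytes_per_column):
    """Estimate rows per page as the smaller of fetch_size and what fits in 1MB."""
    row_bytes = columns * avg_bytes_per_column
    rows_that_fit = PAGE_LIMIT_BYTES // row_bytes
    return min(fetch_size, rows_that_fit)


# 1000 columns at an assumed 25 bytes each is about 25 KB per row.
print(estimated_rows_per_page(fetch_size=10000, columns=1000, avg_bytes_per_column=25))
```

With narrow rows the size limit never applies and fetch_size alone decides the page length; with wide rows the size limit dominates.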
Workaround: If you need to receive more rows per page (up to ScyllaDB’s 1MB limit), set default_fetch_size to None:
from cassandra.cluster import Cluster
from cassandra.protocol import NumpyProtocolHandler
from cassandra.query import tuple_factory
cluster = Cluster()
session = cluster.connect(keyspace="mykeyspace")
session.row_factory = tuple_factory
session.client_protocol_handler = NumpyProtocolHandler
session.default_fetch_size = None # Let ScyllaDB control page sizes
results = session.execute("SELECT * FROM wide_table")
With default_fetch_size = None, the driver won’t request a specific page size, allowing ScyllaDB to fill pages up to its 1MB limit. This results in larger arrays when using NumpyProtocolHandler.
For more details on paging, see Paging Large Queries.
scylla-driver is tablet-aware, which means that it can parse the TABLETS_ROUTING_V1 extension to ProtocolFeatures, receive tablet information sent by Scylla in the custom_payload part of the RESULT message, and use it for routing. Thanks to this, queries to tablet-based tables are still shard-aware.
Details on the Scylla CQL protocol extensions: https://github.com/scylladb/scylladb/blob/master/docs/dev/protocol-extensions.md#negotiate-sending-tablets-info-to-the-drivers
Details on sending tablet information to the drivers: https://github.com/scylladb/scylladb/blob/master/docs/dev/protocol-extensions.md#sending-tablet-info-to-the-drivers