You can perform Cassandra benchmarking using the built-in cassandra-stress tool, which measures database performance under different workloads such as write, read, and mixed operations.
The tool is located in the `tools/bin/` directory of your Cassandra installation. In this guide, Cassandra is installed in `~/cassandra`. Confirm that the tool is present:

```console
ls ~/cassandra/tools/bin | grep cassandra-stress
```
If you see cassandra-stress in the list, the tool is installed and ready to use.
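The commands in this guide assume Cassandra is installed in `~/cassandra`. If your installation is somewhere else, a small bash helper can locate the tool for you. This is a sketch. The `find_cassandra_stress` function is hypothetical, and it assumes the `CASSANDRA_HOME` environment variable points at your installation directory, falling back to `~/cassandra` when it is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path to cassandra-stress, or fail with a message.
# Assumption: CASSANDRA_HOME points at your install; defaults to ~/cassandra as in this guide.
find_cassandra_stress() {
  local home="${CASSANDRA_HOME:-$HOME/cassandra}"
  local stress="$home/tools/bin/cassandra-stress"
  if [ -x "$stress" ]; then
    printf '%s\n' "$stress"
  else
    echo "cassandra-stress not found or not executable: $stress" >&2
    return 1
  fi
}
```

For example, `"$(find_cassandra_stress)" help` runs the help command described next, wherever the tool is installed.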
Check the tool’s help options to verify it works correctly:
```console
~/cassandra/tools/bin/cassandra-stress help
```
The output is similar to:
```output
Usage: cassandra-stress <command> [options]
Help usage: cassandra-stress help <command>
---Commands---
read : Multiple concurrent reads - the cluster must first be populated by a write test
write : Multiple concurrent writes against the cluster
mixed : Interleaving of any basic commands, with configurable ratio and distribution - the cluster must first be populated by a write test
counter_write : Multiple concurrent updates of counters.
counter_read : Multiple concurrent reads of counters. The cluster must first be populated by a counterwrite test.
user : Interleaving of user provided queries, with configurable ratio and distribution
help : Print help for a command or option
print : Inspect the output of a distribution definition
version : Print the version of cassandra stress
```
If this list of commands is printed, the tool runs correctly and you’re ready to start testing Cassandra’s performance.
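You may want to script this check, for example as part of a VM setup script. The sketch below reads the help text on standard input and confirms that the `write`, `read`, and `mixed` commands are listed. It assumes bash and the `name : description` layout of the help output above. The name `check_stress_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `cassandra-stress help` output on stdin and confirm
# that the workload commands used in this guide (write, read, mixed) are listed.
check_stress_commands() {
  local help_text cmd missing=0
  help_text="$(cat)"
  for cmd in write read mixed; do
    # Command lines look like "write : Multiple concurrent writes ...";
    # anchoring at line start keeps "counter_write" from matching "write".
    if ! grep -Eq "^[[:space:]]*${cmd}[[:space:]]*:" <<<"$help_text"; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `~/cassandra/tools/bin/cassandra-stress help 2>&1 | check_stress_commands && echo "cassandra-stress is ready"`.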
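Run a write benchmark that inserts 10,000 rows using 50 concurrent client threads. Here `n=10000` sets the number of operations and `-rate threads=50` sets the number of client threads. If the `keyspace1` keyspace does not exist yet, cassandra-stress creates it before the run starts:

```console
~/cassandra/tools/bin/cassandra-stress write n=10000 -rate threads=50
```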
The output is similar to:
```output
******************** Stress Settings ********************
Command:
Type: write
Count: 10,000
No Warmup: false
Consistency Level: LOCAL_ONE
Target Uncertainty: not applicable
Key Size (bytes): 10
Counter Increment Distibution: add=fixed(1)
Rate:
Auto: false
Thread Count: 50
OpsPer Sec: 0
Population:
Sequence: 1..10000
Order: ARBITRARY
Wrap: true
Insert:
Revisits: Uniform: min=1,max=1000000
Visits: Fixed: key=1
Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed: key=1
Batch Type: not batching
Columns:
Max Columns Per Key: 5
Column Names: [C0, C1, C2, C3, C4]
Comparator: AsciiType
Timestamp: null
Variable Column Count: false
Slice: false
Size Distribution: Fixed: key=34
Count Distribution: Fixed: key=5
Errors:
Ignore: false
Tries: 10
Log:
No Summary: false
No Settings: false
File: null
Interval Millis: 1000
Level: NORMAL
Mode:
API: JAVA_DRIVER_NATIVE
Connection Style: CQL_PREPARED
Protocol Version: V5
Username: null
Password: null
Auth Provide Class: null
Max Pending Per Connection: 128
Connections Per Host: 8
Compression: NONE
Node:
Nodes: [localhost]
Is White List: false
Datacenter: null
Schema:
Keyspace: keyspace1
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Options: {replication_factor=1}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
Transport:
Truststore: null
Truststore Password: null
Keystore: null
Keystore Password: null
SSL Protocol: TLS
SSL Algorithm: null
SSL Ciphers: TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA
Port:
Native Port: 9042
JMX Port: 7199
JMX:
Username: null
Password: *not set*
Graph:
File: null
Revision: unknown
Title: null
Operation: WRITE
TokenRange:
Wrap: false
Split Factor: 1
Credentials file:
File: *not set*
CQL username: *not set*
CQL password: *not set*
JMX username: *not set*
JMX password: *not set*
Transport truststore password: *not set*
Transport keystore password: *not set*
Reporting:
Output frequency: 1s
Header frequency: *not set*
Connected to cluster: Test Cluster, max pending requests per connection 128, max connections per host 8
Datacenter: datacenter1; Host: localhost/127.0.0.1:9042; Rack: rack1
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Warming up WRITE with 2500 iterations...
Running WRITE with 50 threads for 10000 iteration
type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 10000, 10690, 10690, 10690, 3.7, 2.8, 9.5, 16.7, 28.9, 38.4, 0.9, 0.00000, 0, 0, 0, 0, 0, 0
Results:
Op rate : 10,690 op/s [WRITE: 10,690 op/s]
Partition rate : 10,690 pk/s [WRITE: 10,690 pk/s]
Row rate : 10,690 row/s [WRITE: 10,690 row/s]
Latency mean : 3.7 ms [WRITE: 3.7 ms]
Latency median : 2.8 ms [WRITE: 2.8 ms]
Latency 95th percentile : 9.5 ms [WRITE: 9.5 ms]
Latency 99th percentile : 16.7 ms [WRITE: 16.7 ms]
Latency 99.9th percentile : 28.9 ms [WRITE: 28.9 ms]
Latency max : 38.4 ms [WRITE: 38.4 ms]
Total partitions : 10,000 [WRITE: 10,000]
Total errors : 0 [WRITE: 0]
Total GC count : 0
Total GC memory : 0 B
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:00
END
```
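If you repeat these runs, for example with different thread counts, it helps to keep each run’s output in a file for later comparison. The sketch below shows one way to do that in bash. It passes the same `n=` and `-rate threads=` options used above. The `run_stress` function, the `STRESS` and `LOG_DIR` variables, and the log file naming are hypothetical choices, not cassandra-stress features.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one cassandra-stress workload and keep a copy of its output.
# Assumptions: STRESS points at the tool (default is this guide's path) and
# logs are written to LOG_DIR (default ./stress-logs); both are illustrative.
STRESS="${STRESS:-$HOME/cassandra/tools/bin/cassandra-stress}"

run_stress() {
  local workload="$1" ops="${2:-10000}" threads="${3:-50}"
  local logdir="${LOG_DIR:-./stress-logs}"
  mkdir -p "$logdir"
  local log="$logdir/${workload}-n${ops}-t${threads}.log"
  # Same options as the commands in this guide: n=<operations>, -rate threads=<client threads>
  "$STRESS" "$workload" "n=$ops" -rate "threads=$threads" | tee "$log"
  # Report the exit status of cassandra-stress rather than that of tee
  return "${PIPESTATUS[0]}"
}
```

For example, `run_stress write 10000 50` followed by `run_stress read 10000 50` repeats the two tests in this guide. Run the write first, because the read test needs the data it creates.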
Next, run a read benchmark on your Cassandra database using cassandra-stress. This simulates multiple clients reading from the cluster at the same time and records performance metrics such as throughput and latency. The read test queries the data created by the write test, so run the write test first:

```console
~/cassandra/tools/bin/cassandra-stress read n=10000 -rate threads=50
```
The output is similar to:
```output
******************** Stress Settings ********************
Command:
Type: read
Count: 10,000
No Warmup: false
Consistency Level: LOCAL_ONE
Target Uncertainty: not applicable
Key Size (bytes): 10
Counter Increment Distibution: add=fixed(1)
Rate:
Auto: false
Thread Count: 50
OpsPer Sec: 0
Population:
Distribution: Gaussian: min=1,max=10000,mean=5000.500000,stdev=1666.500000
Order: ARBITRARY
Wrap: false
Insert:
Revisits: Uniform: min=1,max=1000000
Visits: Fixed: key=1
Row Population Ratio: Ratio: divisor=1.000000;delegate=Fixed: key=1
Batch Type: not batching
Columns:
Max Columns Per Key: 5
Column Names: [C0, C1, C2, C3, C4]
Comparator: AsciiType
Timestamp: null
Variable Column Count: false
Slice: false
Size Distribution: Fixed: key=34
Count Distribution: Fixed: key=5
Errors:
Ignore: false
Tries: 10
Log:
No Summary: false
No Settings: false
File: null
Interval Millis: 1000
Level: NORMAL
Mode:
API: JAVA_DRIVER_NATIVE
Connection Style: CQL_PREPARED
Protocol Version: V5
Username: null
Password: null
Auth Provide Class: null
Max Pending Per Connection: 128
Connections Per Host: 8
Compression: NONE
Node:
Nodes: [localhost]
Is White List: false
Datacenter: null
Schema:
Keyspace: keyspace1
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Strategy Options: {replication_factor=1}
Table Compression: null
Table Compaction Strategy: null
Table Compaction Strategy Options: {}
Transport:
Truststore: null
Truststore Password: null
Keystore: null
Keystore Password: null
SSL Protocol: TLS
SSL Algorithm: null
SSL Ciphers: TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA
Port:
Native Port: 9042
JMX Port: 7199
JMX:
Username: null
Password: *not set*
Graph:
File: null
Revision: unknown
Title: null
Operation: READ
TokenRange:
Wrap: false
Split Factor: 1
Credentials file:
File: *not set*
CQL username: *not set*
CQL password: *not set*
JMX username: *not set*
JMX password: *not set*
Transport truststore password: *not set*
Transport keystore password: *not set*
Reporting:
Output frequency: 1s
Header frequency: *not set*
Sleeping 2s...
Warming up READ with 2500 iterations...
Connected to cluster: Test Cluster, max pending requests per connection 128, max connections per host 8
Datacenter: datacenter1; Host: localhost/127.0.0.1:9042; Rack: rack1
Running READ with 50 threads for 10000 iteration
type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 1540, 1540, 1540, 1540, 8.1, 6.2, 19.2, 38.4, 73.3, 80.9, 1.0, 0.00000, 0, 0, 0, 0, 0, 0
total, 9935, 8395, 8395, 8395, 5.9, 4.2, 16.7, 33.1, 57.3, 86.0, 2.0, 0.48892, 0, 0, 0, 0, 0, 0
total, 10000, 4217, 4217, 4217, 8.5, 4.2, 27.1, 27.4, 27.4, 27.4, 2.0, 1.89747, 0, 0, 0, 0, 0, 0
Results:
Op rate : 4,962 op/s [READ: 4,962 op/s]
Partition rate : 4,962 pk/s [READ: 4,962 pk/s]
Row rate : 4,962 row/s [READ: 4,962 row/s]
Latency mean : 6.3 ms [READ: 6.3 ms]
Latency median : 4.5 ms [READ: 4.5 ms]
Latency 95th percentile : 17.4 ms [READ: 17.4 ms]
Latency 99th percentile : 33.4 ms [READ: 33.4 ms]
Latency 99.9th percentile : 59.6 ms [READ: 59.6 ms]
Latency max : 86.0 ms [READ: 86.0 ms]
Total partitions : 10,000 [READ: 10,000]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0 B
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:02
END
```
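To compare runs without scanning the full output, you can extract individual values from the `Results:` summary. The following is a minimal bash/awk sketch. It assumes the output was saved to a file, for example with `| tee write.log`, and it relies on the `Label : value unit` layout shown above. The `stress_metric` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical parser: print one numeric value from the "Results:" summary,
# e.g. "Op rate : 10,690 op/s [...]" -> 10690. Thousands separators are
# removed so the value can be used in further arithmetic.
stress_metric() {
  local label="$1" file="$2"
  awk -v label="$label" '
    {
      # Split "Label : value unit [DETAIL]" at the first colon
      idx = index($0, ":")
      if (idx == 0) next
      name = substr($0, 1, idx - 1)
      sub(/[ \t]+$/, "", name)          # drop padding before the colon
      if (name != label) next
      split(substr($0, idx + 1), parts, " ")
      value = parts[1]
      gsub(/,/, "", value)
      print value
      exit
    }' "$file"
}
```

With the write output above saved as `write.log`, `stress_metric "Op rate" write.log` prints `10690` and `stress_metric "Latency 99th percentile" write.log` prints `16.7`.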
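The values in the cassandra-stress results summary have the following meanings:

- Op rate: operations completed per second.
- Partition rate: partitions accessed per second. Each operation in these tests touches one partition, so this equals the op rate.
- Row rate: rows accessed per second, which also equals the op rate here.
- Latency mean, median, and 95th, 99th, and 99.9th percentiles: time per operation in milliseconds. For example, a 99th percentile of 16.7 ms means that 99% of operations completed within 16.7 ms.
- Latency max: the slowest single operation.
- Total partitions: the number of partitions written or read.
- Total errors: the number of operations that failed.
- Total GC count, memory, and time: Java garbage collection activity recorded during the run.
- Total operation time: the wall-clock duration of the run (hh:mm:ss).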
Results from the run on the c4a-standard-4 (4 vCPU, 16 GB memory) Arm64 VM in GCP (SUSE shown; Ubuntu results were very similar):
| Metric | Write Test | Read Test |
|---|---|---|
| Operation Rate (op/s) | 10,690 | 4,962 |
| Partition Rate (pk/s) | 10,690 | 4,962 |
| Row Rate (row/s) | 10,690 | 4,962 |
| Latency Mean | 3.7 ms | 6.3 ms |
| Latency Median | 2.8 ms | 4.5 ms |
| Latency 95th Percentile | 9.5 ms | 17.4 ms |
| Latency 99th Percentile | 16.7 ms | 33.4 ms |
| Latency 99.9th Percentile | 28.9 ms | 59.6 ms |
| Latency Max | 38.4 ms | 86.0 ms |
| Total Partitions | 10,000 | 10,000 |
| Total Errors | 0 | 0 |
| Total GC Count | 0 | 0 |
| Total GC Memory | 0 B | 0 B |
| Total GC Time | 0.0 s | 0.0 s |
| Total Operation Time | 00:00:00 (about 0.9 s) | 00:00:02 (about 2.0 s) |
You’ve successfully deployed Apache Cassandra 5.0.5 on a Google Axion C4A Arm-based virtual machine, validated its functionality, and measured its performance using cassandra-stress.
On the c4a-standard-4 Arm64 VM, write operations reached 10,690 op/s and read operations reached 4,962 op/s, so writes delivered roughly twice the throughput of reads. Writes also had lower latency at every reported point in the distribution: a mean of 3.7 ms versus 6.3 ms, a 95th percentile of 9.5 ms versus 17.4 ms, and a 99th percentile of 16.7 ms versus 33.4 ms. Neither run reported errors or recorded any garbage collection activity. Both runs were short (about 1 s for writes and 2 s for reads). Per-second read throughput also ranged from 1,540 to 8,395 op/s during the read run, so treat these numbers as a quick baseline rather than steady-state results.
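The write-versus-read comparisons here are simple ratios of the reported figures. The sketch below reproduces them with awk, using values copied from the results table. `compare_runs` is a hypothetical helper whose defaults are this guide’s numbers. Its estimated durations (operations divided by op rate) agree with the `time` column of the per-interval output: 0.9 s for writes and 2.0 s for reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive run durations and write/read ratios from results figures.
# Arguments (defaults are the values from the results table):
#   $1 operations per run, $2 write op/s, $3 read op/s,
#   $4 write p95 ms, $5 read p95 ms, $6 write p99 ms, $7 read p99 ms
compare_runs() {
  awk -v n="${1:-10000}" \
      -v w_rate="${2:-10690}" -v r_rate="${3:-4962}" \
      -v w_p95="${4:-9.5}" -v r_p95="${5:-17.4}" \
      -v w_p99="${6:-16.7}" -v r_p99="${7:-33.4}" 'BEGIN {
    # Duration = operations / (operations per second)
    printf "write duration: %.2f s\n", n / w_rate
    printf "read duration: %.2f s\n", n / r_rate
    printf "write/read throughput ratio: %.2f\n", w_rate / r_rate
    printf "read/write p95 latency ratio: %.2f\n", r_p95 / w_p95
    printf "read/write p99 latency ratio: %.2f\n", r_p99 / w_p99
  }'
}
```

Running `compare_runs` with no arguments prints a write duration of 0.94 s, a read duration of 2.02 s, a throughput ratio of 2.15, a p95 latency ratio of 1.83, and a p99 latency ratio of 2.00.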
The Arm64 VM handled both workloads efficiently and without errors, making it suitable for high-throughput Cassandra workloads. The low write latencies and high operation rates show that Arm-based infrastructure can handle database operations that need both speed and stability. These tests ran against a single node with a replication factor of 1 and `LOCAL_ONE` consistency. Use the results as a baseline for evaluating Cassandra on Arm64 and to guide decisions about instance sizing and configuration for production deployments.
To continue building on this foundation, you can explore advanced Cassandra configurations such as multi-node cluster deployments, replication strategies for high availability, or performance tuning for specific workload patterns. You might also investigate integrating Cassandra with application frameworks or comparing performance across different Arm-based instance types to optimize for your use case.