Tune network workloads on Arm-based bare-metal instances: Summary

Review the results: Tomcat performance tuning on Arm Neoverse

Each tuning technique improved on the baseline for the Tomcat HTTP benchmark on an Arm Neoverse bare-metal server, with load generated by wrk2. The table summarizes requests per second and average latency for each tuning method.

Method	Requests/sec	Avg latency (s)
Baseline	357,835.75	10.26
NIC queues	378,782.37	8.35
NUMA-local	363,744.39	9.41
IOMMU	428,628.50	4.92

Key takeaways

IOMMU passthrough produced the largest throughput gain: +19.8% vs. baseline, with a 52.0% drop in average latency.
NIC queue count alignment improved throughput by +5.9% and reduced average latency by 18.6%.
NUMA locality yielded a smaller but consistent benefit: +1.7% throughput and 8.3% lower average latency.
Together, these techniques (IOMMU tuning, NIC queue optimization, and NUMA-aware placement) form a practical checklist for improving network workload performance on Arm Neoverse.
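
The percentages in the takeaways follow directly from the results table. As a minimal sketch, the Python below recomputes them from the table values; the names `RESULTS`, `percent_change`, and `summarize` are illustrative and not part of the original tutorial.

```python
# Figures copied from the results table: (requests/sec, average latency in seconds).
RESULTS = {
    "Baseline": (357_835.75, 10.26),
    "NIC queues": (378_782.37, 8.35),
    "NUMA-local": (363_744.39, 9.41),
    "IOMMU": (428_628.50, 4.92),
}


def percent_change(new: float, base: float) -> float:
    """Return the signed percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


def summarize(results: dict, baseline: str = "Baseline") -> dict:
    """Map each non-baseline method to (throughput change %, latency change %)."""
    base_rps, base_lat = results[baseline]
    return {
        method: (
            round(percent_change(rps, base_rps), 1),
            round(percent_change(lat, base_lat), 1),
        )
        for method, (rps, lat) in results.items()
        if method != baseline
    }


if __name__ == "__main__":
    for method, (rps_pct, lat_pct) in summarize(RESULTS).items():
        print(f"{method:<11} throughput {rps_pct:+.1f}%  avg latency {lat_pct:+.1f}%")
```

A negative latency change means latency dropped. Rerunning the same calculation after each kernel, driver, or firmware change keeps comparisons against the baseline consistent.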

Next steps

Apply the same tuning pattern to other HTTP services and microservices (for example, NGINX, Envoy, or custom Jetty/Tomcat apps).
Re‑evaluate queue counts, CPU pinning, and IOMMU mode as you scale cores, update kernels, or change NIC drivers/firmware.
Track end‑to‑end SLOs (p95/p99 latency and error rates) in addition to average metrics to ensure sustained gains under real traffic.
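
To illustrate why tail percentiles matter alongside averages, here is a minimal sketch that computes p95, p99, and error rate from a list of per-request latencies. The sample data, the millisecond unit, and the function names `percentile` and `latency_report` are hypothetical; in practice the samples would come from your load generator or monitoring system.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_report(latencies_ms, error_count, request_count):
    """Summarize average and tail latency plus the error rate."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": error_count / request_count,
    }


if __name__ == "__main__":
    # Hypothetical samples: 98 fast requests and two slow outliers.
    samples = [10] * 98 + [500, 900]
    print(latency_report(samples, error_count=2, request_count=100))
```

In this made-up sample the average stays low while p99 exposes the slow outliers, which is the kind of regression an average-only comparison can hide.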
