IOMMU-based tuning

An IOMMU (Input-Output Memory Management Unit) is a hardware component that controls how I/O devices access memory. In cloud environments, SmartNICs typically offload the IOMMU workload. On bare-metal systems, you can bring performance closer to that of cloud instances by setting iommu.strict=0 (lazy invalidation of DMA mappings) and iommu.passthrough=1 (DMA bypasses IOMMU translation by default) on the kernel command line.
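Before changing anything, it can help to see which domain type the IOMMU groups currently use. The following is a minimal sketch, assuming a kernel that exposes a `type` file under `/sys/kernel/iommu_groups/<id>/`; groups in passthrough mode are expected to report `identity`. The helper name `summarize_iommu_types` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count IOMMU groups per domain type by reading sysfs.
# summarize_iommu_types is a hypothetical helper name. The sysfs root is a
# parameter so the function can also be pointed at a test directory.
summarize_iommu_types() {
  local root="${1:-/sys/kernel/iommu_groups}"
  local f
  for f in "${root}"/*/type; do
    # Skip the unexpanded glob (no groups) and unreadable entries.
    if [ -r "${f}" ]; then
      cat "${f}"
    fi
  done | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example: summarize_iommu_types
```

Each output line has the form `<type>: <number of groups>`. If the directory is empty or the `type` files are absent, the function prints nothing.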

Setting IOMMU

  1. Open /etc/default/grub in a text editor so that you can add or update the GRUB_CMDLINE_LINUX setting:
    

        
        
sudo vi /etc/default/grub

    

Then set the two IOMMU parameters in GRUB_CMDLINE_LINUX, keeping any parameters that are already present on that line:

    

        
        
GRUB_CMDLINE_LINUX="iommu.strict=0 iommu.passthrough=1"

    
  1. Update GRUB and reboot to apply the settings.
    

        
        
sudo update-grub && sudo reboot

    
  1. Verify that the settings have been applied:
    

        
        
sudo dmesg | grep iommu

    

The output should show both parameters on the kernel command line and a default domain type of Passthrough:

    

        
        
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-6.14.0-1011-aws root=PARTUUID=1c3f3c20-db6b-497c-8727-f6702f73a5b2 ro iommu.strict=0 iommu.passthrough=1 console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
[    0.855658] iommu: Default domain type: Passthrough (set via kernel command line)
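The check in the last step can also be scripted. The following is a minimal sketch that tests a kernel command line string for both parameters; the helper name `check_iommu_cmdline` is hypothetical, and reading `/proc/cmdline` for the running kernel's command line is an assumption about how you would feed it on a live system.

```shell
#!/usr/bin/env bash
# Sketch: report whether both IOMMU parameters appear on a kernel command line.
# check_iommu_cmdline is a hypothetical helper name, not part of any tool.
check_iommu_cmdline() {
  local cmdline="$1"
  local param
  local missing=0
  for param in iommu.strict=0 iommu.passthrough=1; do
    # Pad with spaces so only whole, space-separated parameters match.
    case " ${cmdline} " in
      *" ${param} "*) echo "ok: ${param}" ;;
      *) echo "missing: ${param}"; missing=1 ;;
    esac
  done
  # Return non-zero if any parameter is absent.
  return "${missing}"
}

# Example on a live system:
#   check_iommu_cmdline "$(cat /proc/cmdline)"
```

The function returns a non-zero status when either parameter is missing, so it can be used as a guard before starting a benchmark run.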

    

The result after configuring IOMMU

  1. Run the following commands on the Arm Neoverse bare-metal instance where Tomcat is running:
    

        
        
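# Leave only CPUs 96-103 online (8 cores) and take all other CPUs offline.
for no in {96..103}; do sudo bash -c "echo 1 > /sys/devices/system/cpu/cpu${no}/online"; done
for no in {0..95} {104..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
# Pick the network interface (assumes exactly one interface name contains 'en')
# and set it to 8 combined queues.
net=$(ls /sys/class/net/ | grep 'en')
sudo ethtool -L ${net} combined 8
# Restart Tomcat with a raised open-file limit.
~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh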

    
  1. Run wrk2 on the x86_64 bare-metal instance as shown, with tomcat_ip set to the IP address of the Tomcat instance:
    

        
        
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample

    

After IOMMU tuning, the result should look similar to:

    

        
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.92s     2.49s   10.08s    62.27%
    Req/Sec     3.36k    56.23     3.58k    69.64%
  25703668 requests in 1.00m, 13.33GB read
Requests/sec: 428628.50
Transfer/sec:    227.69MB
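To compare runs, it can be convenient to pull the throughput figure out of the wrk output and relate it to the requested rate (the `-R` value). The following is a minimal sketch; the helper names `extract_rps` and `percent_of_target`, and the file name `wrk.log` in the example comment, are hypothetical.

```shell
#!/usr/bin/env bash
# extract_rps and percent_of_target are hypothetical helper names.

# Print the Requests/sec value from wrk output supplied on stdin.
extract_rps() {
  awk '/^Requests\/sec:/ { print $2 }'
}

# Print achieved throughput as a percentage of the target rate (the -R value).
percent_of_target() {
  awk -v rps="$1" -v target="$2" 'BEGIN { printf "%.1f\n", rps / target * 100 }'
}

# Example, assuming the wrk output was saved to a file named wrk.log:
#   percent_of_target "$(extract_rps < wrk.log)" 500000
```

With the figures shown above, 428628.50 requests per second is about 85.7% of the requested rate of 500000.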

        
    