I'm running a VM on RHEL9 (qemu-kvm 9.0.0, release 10.el9_5), and I'm observing quite poor I/O performance (about a quarter of native performance). Here are some metrics, collected with fio:
$ fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
INSIDE VM
- READ: IOPS=506, BW=2027KiB/s
- WRITE: IOPS=509, BW=2017KiB/s, fsync avg latency=766 nanoseconds
NATIVE (no virtualization)
- READ: IOPS=1701, BW=6807KiB/s
- WRITE: IOPS=1693, BW=6773KiB/s, fsync avg latency=385 nanoseconds
$ fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=128 --direct=1 --numjobs=16 --runtime=60 --group_reporting
INSIDE VM
- READ: IOPS=2294, BW=9179KiB/s
- WRITE: IOPS=2292, BW=9168KiB/s, fsync avg latency=838 nanoseconds
NATIVE
- READ: IOPS=10.5k, BW=40.9MiB/s
- WRITE: IOPS=10.5k, BW=40.9MiB/s, fsync avg latency=338 nanoseconds
So inside the VM I'm getting roughly a quarter of the native performance. I was expecting a drop of a few percent, maybe 10%, but not this much -- or are my expectations just unreasonable?
The setup is as follows:
- 96 cores (counting hyperthreading), all of which are assigned to the VM
- 16 iothreads configured
- dedicated disks just for the VM (7.68 TB SAS SSDs)
- the disks are multipath and LUKS-encrypted (decrypted on the host machine, then the resulting /dev/mapper/... device is passed to the VM)
- the disk in libvirt is configured with type='raw', cache='none', io='native', queues='8', queue_size='1024' and bus='virtio' (these values are admittedly a little arbitrary, but this is where I've ended up so far in my search for more performance); see the XML sketch after this list
- I'm using libvirt 10.5.0 and create the VMs with virt-install
- inside the VM I'm currently running CentOS 7 (yes, very outdated, but since we have a LOT of VMs that need to be migrated, it's been slow going, and the Hadoop cluster is getting the short end of that stick...)
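For reference, this is roughly the relevant part of the domain XML as it stands (the /dev/mapper path and target name are placeholders, not the exact values):

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' queues='8' queue_size='1024'/>
  <source dev='/dev/mapper/vm-disk-luks'/>  <!-- placeholder name for the multipath+LUKS device opened on the host -->
  <target dev='vda' bus='virtio'/>
</disk>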
At this point I'm basically looking for ideas as to what might be reducing my performance -- could it be that I'm doing LUKS+multipath outside the VM instead of inside? Should I use more or fewer queues, or a different queue depth? A different I/O engine? Should I try comparing virtio-scsi and virtio-blk?
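If I go down the virtio-scsi route, I'm thinking of something along these lines (a sketch only; the controller's queue count and iothread assignment are guesses, not settled values):

<controller type='scsi' model='virtio-scsi' index='0'>
  <driver queues='8' iothread='1'/>  <!-- hypothetical: pin the controller to one of the 16 iothreads -->
</controller>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/mapper/vm-disk-luks'/>  <!-- same placeholder device as above -->
  <target dev='sda' bus='scsi'/>
</disk>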
I've also considered trying PCI passthrough, but since the SCSI controller handles multiple disks (HPE Synergy hardware), I can't easily do that without giving all of the disks to the VM.
Ultimately what I want to run inside these VMs is Spark jobs + HDFS, so fairly disk-heavy workloads.