I'm running a VM on RHEL9 (qemu-kvm 9.0.0, release 10.el9_5), and I'm observing quite poor I/O performance (about a quarter of native performance). Here are some metrics, collected with fio:
$ fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
INSIDE VM
- READ: IOPS=506, BW=2027KiB/s
- WRITE: IOPS=509, BW=2017KiB/s, fsync avg latency=766 nanoseconds
NATIVE (no virtualization)
- READ: IOPS=1701, BW=6807KiB/s
- WRITE: IOPS=1693, BW=6773KiB/s, fsync avg latency=385 nanoseconds
$ fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=128 --direct=1 --numjobs=16 --runtime=60 --group_reporting
INSIDE VM
- READ: IOPS=2294, BW=9179KiB/s
- WRITE: IOPS=2292, BW=9168KiB/s, fsync avg latency=838 nanoseconds
NATIVE
- READ: IOPS=10.5k, BW=40.9MiB/s
- WRITE: IOPS=10.5k, BW=40.9MiB/s, fsync avg latency=338 nanoseconds
So inside the VM I'm getting roughly a quarter of the native performance. I was expecting a drop of a few percent, maybe 10%, but not this much -- or are my expectations just unreasonable?
The setup is as follows:
- 96 cores (counting hyperthreading), all of which are assigned to the VM
- 16 iothreads configured
- dedicated disks just for the VM (7.68 TB SAS SSDs)
- the disks are multipath and LUKS-encrypted (decrypted on the host machine, then the resulting /dev/mapper/... device is passed to the VM)
- the disk in libvirt is configured with type='raw', cache='none', io='native', queues='8', queue_size='1024' and bus='virtio' (these values are admittedly a little arbitrary, but this is where I've ended up so far in my search for more performance); see the XML sketch after this list
- I'm using libvirt 10.5.0 and create the VMs with virt-install
- inside the VM I'm currently running CentOS 7 (yes, very outdated, but since we have a LOT of VMs that need to be migrated, it's been slow going, and the Hadoop cluster is getting the short end of that stick...)
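For reference, this is roughly the relevant part of the domain XML as it stands (the /dev/mapper path and target name are placeholders, not the exact values):

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native' queues='8' queue_size='1024'/>
  <source dev='/dev/mapper/vm-disk-luks'/>  <!-- placeholder name for the multipath+LUKS device opened on the host -->
  <target dev='vda' bus='virtio'/>
</disk>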
At this point I'm basically looking for ideas as to what might be reducing my performance -- could it be that I'm doing LUKS+multipath outside the VM instead of inside? Should I use more or fewer queues, or a different queue depth? A different I/O engine? Should I try comparing virtio-scsi and virtio-blk?
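If I go down the virtio-scsi route, I'm thinking of something along these lines (a sketch only; the controller's queue count and iothread assignment are guesses, not settled values):

<controller type='scsi' model='virtio-scsi' index='0'>
  <driver queues='8' iothread='1'/>  <!-- hypothetical: pin the controller to one of the 16 iothreads -->
</controller>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/mapper/vm-disk-luks'/>  <!-- same placeholder device as above -->
  <target dev='sda' bus='scsi'/>
</disk>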
I've also considered trying PCI passthrough, but since the SCSI controller handles multiple disks (HPE Synergy hardware), I can't easily do that without giving all of the disks to the VM.
Ultimately what I want to run inside these VMs is Spark jobs + HDFS, so fairly disk-heavy workloads.