
Sorry, I don't know the right place to ask this question. Meta tells me it is safe to ask here, on SuperUser.com:

Tonight, I was reading a review of the Seagate FireCuda 530 M.2 NVMe SSD here.

(Please don't interpret this question as a shill for Tom's Hardware or Seagate!)

The 500GB model says:

  • Random Read 400,000 IOPS; Random Write 700,000 IOPS

The 1TB model says:

  • Random Read 800,000 IOPS; Random Write 1,000,000 IOPS

The 2 & 4TB models say:

  • Random Read 1,000,000 IOPS; Random Write 1,000,000 IOPS

These same facts appear on the manufacturer's data sheet.

Amazingly, deeper in the review on page 2, it says:

Seagate’s 4TB FireCuda 530 does well during the random read workload, matching the Samsung 980 Pro and responding faster than the WD_Black SN850 at a QD of 1. We dialed the workload up to a QD of 256, and the FireCuda 530 maxed out at roughly 825,000 / 1,555,000 random read/write IOPS.

(My brain hurts when I read 1.5M write IOPS, as I came of "computing age" with 5.25-inch floppy disks!)
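For intuition, Little's law connects queue depth, average latency, and IOPS: IOPS ≈ QD / average latency. Here is a rough back-of-the-envelope in Python; the latencies are purely illustrative guesses on my part, not measured values for this drive:

```python
# Little's law for a storage queue: IOPS ~= queue_depth / average_latency.
# The latencies below are illustrative guesses, not measured FireCuda 530 values.

def iops(queue_depth: int, avg_latency_s: float) -> float:
    """Steady-state IOPS for a device that keeps `queue_depth` requests in flight."""
    return queue_depth / avg_latency_s

# At QD = 1, IOPS is capped by single-command latency.
print(f"QD 1,   80 us/read:   {iops(1, 80e-6):>12,.0f} IOPS")     # ~12,500

# At QD = 256, many commands overlap, so IOPS scales up until the controller saturates.
print(f"QD 256, 310 us/read:  {iops(256, 310e-6):>12,.0f} IOPS")  # ~825,000
print(f"QD 256, 165 us/write: {iops(256, 165e-6):>12,.0f} IOPS")  # ~1,550,000
```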

Questions:

  1. How is it possible to have higher write IOPS than read IOPS? My assumption: it is always more expensive to write than to read, because of write validation / confirmation. Or are write IOPS somehow smaller (fewer KB per operation) than read IOPS? If the hardware uses a massive write cache, is that cache also persistent between power cycles?
  2. Why does the ratio of read to write IOPS change between models: (a) 500GB, (b) 1TB, and (c) 2 & 4TB?

Finally, I have seen this same trend for other M.2 NVMe SSDs.


1 Answer


The answer is write caching, in DRAM, an SLC cache, or both. The NVMe controller can post completion of a write I/O to the host as soon as the data has been received, but before it is written to its final NAND location. For reads, the NVMe controller must deliver the data from NAND to the host, which involves looking up the block in the NAND mapping table (relatively time-consuming at these very high IOPS levels) and then transferring the data from NAND.
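A rough sketch of that asymmetry (the timings below are illustrative assumptions, not vendor figures):

```python
# Toy QD=1 latency model of an NVMe controller with a write-back cache.
# All timings are illustrative assumptions, not FireCuda 530 measurements.

# Read: mapping-table (FTL) lookup + NAND sense + transfer to the host.
read_latency_s = 0.000010 + 0.000060   # lookup + NAND access, ~70 us total

# Write: completion is posted as soon as the data lands in the controller's
# DRAM/SLC cache; the NAND program happens later, in the background.
write_latency_s = 0.000015             # ~15 us

print(f"QD=1 read  IOPS ~ {1 / read_latency_s:,.0f}")   # ~14,000
print(f"QD=1 write IOPS ~ {1 / write_latency_s:,.0f}")  # ~67,000

# The same asymmetry holds at high queue depth until the write cache fills up
# or the controller itself becomes the bottleneck.
```

As for persistence: on consumer drives the cached data is generally not guaranteed to survive a sudden power loss (the OS is expected to issue flushes), whereas enterprise drives typically add power-loss-protection capacitors so the controller can finish committing the cache to NAND.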