5

Before an SSD sector1 has ever been written to, it reads back as all zeros.

So, if I write all zeros to a sector, then for all functional purposes it will look just like a free one, and the controller therefore has the technical possibility to treat it as such. My limited knowledge of IC architecture suggests that, hardware-wise, the slowdown from a circuit testing for all zeros would be negligible, if there were any at all.

The question is: does any flash/SSD controller actually implement this or anything similar?

It looks even more applicable to flash memory storage connected via interfaces that don't have the TRIM command, like USB.

In the answers posted so far, a few people outlined possible show-stopper issues, yet they all turned out to be non-issues. Unless there is evidence that something really is a serious problem, please do not claim authoritatively that it is; instead, say honestly that you are only hypothesizing.


1 A logical sector, i.e. what the host sees.

ivan_pozdeev
  • 1,973

6 Answers

3

Can I emulate TRIM by writing all zeros?

No.

Here's how flash works:

  • Unwritten flash is all 1's, and writes pull down the 1's to 0's.

  • Flash is written in a quantity of bytes known as a page, 2048 bytes being an example of a page size. (There is also a small amount of extra data per page - 64 bytes or so - where ECC information can be stored.)

  • What if you want to change 0's back to 1's? You can't, short of an erase operation.

  • When you erase flash - which flips all the bits back to 1's, if the flash is not damaged - the quantity of bytes you erase at once (the eraseblock size, to borrow Linux terminology) is typically much bigger than the page size; 128k is an example of an eraseblock size.

  • Erasing takes a lot more time than just writing to a page.
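The write/erase asymmetry described in the bullets above can be sketched in a few lines of Python. This is a toy model only; the page and eraseblock sizes are the example values from this answer:

```python
# Toy model of raw NAND behaviour (illustrative only; sizes are the
# example values from this answer, not any specific chip).
PAGE_SIZE = 2048          # bytes per page
PAGES_PER_BLOCK = 64      # 64 * 2048 bytes = 128 KiB eraseblock

def fresh_block():
    """An erased eraseblock: every bit is 1 (bytes of 0xFF)."""
    return [bytearray(b"\xff" * PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]

def program_page(block, page_no, data):
    """Programming can only pull 1-bits down to 0 (bitwise AND)."""
    page = block[page_no]
    for i, b in enumerate(data):
        page[i] &= b          # a 0 bit can never go back to 1 here

def erase_block(block):
    """Erase works on the whole block, never on a single page."""
    for page in block:
        page[:] = b"\xff" * PAGE_SIZE

block = fresh_block()
program_page(block, 0, b"\x00" * PAGE_SIZE)   # write zeros to page 0
program_page(block, 0, b"\xff" * PAGE_SIZE)   # try to write 1's back...
print(block[0][0])   # still 0: only an erase can restore 1's
erase_block(block)
print(block[0][0])   # 255: the whole 128 KiB block had to be erased
```

Note how the second `program_page` call is a no-op: once a bit is 0, only a full-block erase brings it back to 1.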

So, because:

  • SSDs pretend to be standard hard drives to the host. Standard hard drives work on 512-byte sectors (addressed by LBAs, numbered 0 up to the capacity of the drive divided by 512), not 2048 bytes or any other size;

  • and the SSD firmware has to do a lot of fakery in the background, since there really aren't 512-byte places to store the data like on a spinning hard drive;

  • and writing to a page that doesn't need to be erased is faster than erasing it, then writing to it.

SSDs maintain something called an LBA to PBA table. The operating system, for example, tells the SSD to write to LBA 20, but it might really go into something like "Flash chip 2 page 56". This is maintained in the LBA to PBA table.
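The indirection can be sketched like this (the chip/page location is the made-up example from this answer; real tables live inside the controller and are far more elaborate):

```python
# Hypothetical sketch of the LBA-to-PBA table described above.
lba_to_pba = {}  # maintained by the SSD firmware, invisible to the host

def host_write(lba, location):
    # Firmware picks a fresh physical page and records where the LBA went.
    lba_to_pba[lba] = location

def host_read(lba):
    # An LBA that was never written has no physical page behind it.
    if lba not in lba_to_pba:
        return "all zeros (no flash read needed)"
    return lba_to_pba[lba]

host_write(20, "flash chip 2, page 56")
print(host_read(20))  # flash chip 2, page 56
print(host_read(21))  # all zeros (no flash read needed)
```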

The SSD firmware will try to direct writes to fresh pages and avoid erasing unless necessary. If no unwritten pages are available, it will have to shuffle things around, doing a read, possibly a write somewhere else, an eraseblock erase, and a write-back of a bunch of data.

So the LBA-to-PBA mapping can be essentially arbitrary.

TRIM tells the SSD that it can remove entries from this table (or mark as "LBA not written to yet") and actually erase some flash, and have it available for fast writes in the future.

So this is why writing all 0x00's or 0xFF's isn't equivalent. Only TRIM tells the firmware it's OK to not track things in that table and consider flash unused - and erase it in preparation for new writes.

Writing all 0x00's or 0xFF's results in a full LBA-to-PBA table that is tracking data it thinks you are using, and things will remain slow due to the need to shuffle things around and read/erase/rewrite.
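To make the contrast concrete, here is a minimal sketch of a hypothetical firmware that does NOT special-case zero-filled writes (an assumption; real controllers vary). TRIM shrinks the mapping table at no cost in pages, while "emulated TRIM" keeps the table full and burns through erased pages:

```python
# Naive FTL sketch: every write consumes an erased page; TRIM is pure
# bookkeeping. Data contents are omitted for brevity.
class NaiveFTL:
    def __init__(self, total_pages):
        self.erased = set(range(total_pages))  # pool of erased pages
        self.stale = set()                     # written, no longer mapped
        self.mapping = {}                      # LBA -> physical page

    def write(self, lba, data):
        if lba in self.mapping:                # old copy becomes stale
            self.stale.add(self.mapping[lba])
        self.mapping[lba] = self.erased.pop()  # every write eats a page

    def trim(self, lba):
        if lba in self.mapping:                # page can be erased lazily,
            self.stale.add(self.mapping.pop(lba))  # for free

trimmed = NaiveFTL(total_pages=100)
zeroed = NaiveFTL(total_pages=100)
for lba in range(10):
    trimmed.write(lba, b"user data")
    zeroed.write(lba, b"user data")
for lba in range(10):
    trimmed.trim(lba)                  # real TRIM: mapping shrinks
    zeroed.write(lba, b"\x00" * 512)   # "emulated TRIM": just more writes

print(len(trimmed.mapping), len(trimmed.erased))  # 0 90
print(len(zeroed.mapping), len(zeroed.erased))    # 10 80
```

After the same workload, the trimmed drive has an empty mapping and 90 erased pages in reserve; the zero-filled drive still tracks 10 LBAs and has spent 20 pages.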

LawrenceC
  • 75,182
3

The answer is that this appears to be the case at least for some SSDs*. I've tested this empirically for an older Kingston SSD (see below), and another person reported this to work for both a 2012 Sandisk SSD and a 2015 Samsung SSD.
As pointed out in the comments and in this answer, the data below does not prove that the firmware of the tested SSD treats writing zeros in exactly the same way as a TRIM command; it is also possible that it, e.g., applies compression, which in practice has a similar but non-identical effect. Hence a smoking-gun result is still outstanding and would require lower-level analysis of the hardware (or retrieving and directly analysing the firmware).

After data recovery from a dead laptop, I had a spare Kingston SATA 256GB SSD (RBU-SNS8152S3256GG2) in an M.2-to-USB3 adapter, which I decided to re-use for an RPi. Unfortunately, it turned out that the adapter does not support TRIM and that write performance was already significantly reduced from previous usage (~50MiB/s for sequential writes as per fio, and even less for small random writes; zfs compression was accidentally still active, so real speeds were even lower).

Trying to avoid buying another M.2-to-USB3 enclosure, I decided to test whether writing all zeros to the SSD would restore the performance (and avoid excessive wear due to write amplification). It worked nicely, with sequential write speeds increasing to ~230MiB/s and 8KiB random writes around 110MiB/s (again tested via fio on a zfs dataset with compression on, for comparability with earlier results).

The explicit command to write zeros to the SSD was as follows:

dd if=/dev/zero of=/dev/disk/by-id/<device_id> bs=1M status=progress &> <filename>.log

While not a proper benchmark and thus to be taken with a grain of salt, the first dd run itself showed the following write speeds, demonstrating the reduced and inconsistent performance: (plot: first dd run)

After performing various tests, incl. running dd a few more times (with random-valued and zero-valued input) and giving the SSD time to perform garbage collection, I did a final zero-valued dd run to refresh the SSD before putting it to use, resulting in the following write speeds: (plot: final dd run)

Ignoring the few outliers, which were probably caused by other processes interfering (the computer was not idle during the test), it now looks rock solid and mirrors the significantly better results from fio.

* I've also observed this behaviour with several older external SMR drives from Western Digital that do not support TRIM (at least not via the USB connection). They had become almost unusable after several years, with even sequential writes being excruciatingly slow and showing huge latency spikes; writing all zeros to those drives with dd fully refreshed them.

Eruvaer
  • 197
3

No and Yes.


TL;DR: TRIM is a command that is designed to inform the drive about LBA ranges it can set aside for garbage collection. While writing zeros can have this effect, depending on how the firmware treats zero-entropy data, in the worst case it actually maps the entire LBA space, leaving just the over-provisioned space as wiggle room for the firmware, potentially increasing write amplification and thus wear - the exact opposite of what you're trying to accomplish.


Simple answer: No. TRIM is a command that in general causes an SSD to unmap LBAs from PBA addresses. If such an unmapped sector is read, the controller returns zeros without even reading the flash. Unmapped or stale sectors become available to the garbage collector, which can then consolidate NAND blocks it can erase, after which the space is available for the drive to write to.

Writing zeros is writing data, so writing zeros from LBA(min) to LBA(max) causes the entire LBA space to be mapped. At least, this is how it used to work, and how it still works on older and perhaps cheaper, lower-spec drives.

So while the effect appears the same when we read from the drive - it reads like a zero-filled drive - it is not actually the same.


I have done experiments that show a difference in power consumption between reading from a zero-filled drive and reading from a trimmed drive: to read actual zeros from the NAND, the drive has to do real work, powering charge pumps to reach the voltages required to read the NAND cells.

(plot: power consumption, zero-filled vs. trimmed drive)


HOWEVER!

  1. As others suggested, the SSD firmware may be clever enough to detect that you're writing zeros, and so may refrain from actually storing them, instead making a 'note' in its mapping tables that a given LBA sector has had zeros written to it; when the time comes to read that LBA sector, the controller simply returns zero-filled sectors.

  2. Even if the controller does not apply the above method, it may be compressing data (example). So, for writing zeros to the entire LBA space, the controller may only have to allocate a fraction of the NAND real estate to store the data.
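The compression point is easy to demonstrate; zlib here merely stands in for whatever proprietary scheme a controller might implement:

```python
# Zero-filled pages compress to almost nothing, while random data does
# not compress at all (zlib is just a stand-in for controller-internal
# compression).
import os
import zlib

zero_page = b"\x00" * 4096
random_page = os.urandom(4096)  # incompressible stand-in for user data

print(len(zlib.compress(zero_page)))    # a couple dozen bytes at most
print(len(zlib.compress(random_page)))  # slightly over 4096: no savings
```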

Both smart firmware and controllers with a dedicated compression unit make sense: if the manufacturer can limit the amount of data actually written to the NAND, the drive will live longer.

Writing zeros to the drive will have an effect similar to TRIM if the zeros are written to LBA space that previously contained data: the controller detects the zeros (or compresses them to virtually nothing), which means the LBA space containing the original data can be unmapped (and handed to the garbage collector), while virtually no physical space has to be assigned to storing the zeros.

To further support this idea: a colleague of mine was recently researching the translation algorithm of a modern SSD, and for this he wrote a pattern to the drive (0x77 bytes). He noticed the SSD was not actually writing, by observing power consumption: to write data, the voltage in the NAND needs to be increased ('pumped up') even more than when reading. So these controllers appear to detect any low-entropy data (not just zeros), and for example keep a table with 'placeholder data' for LBAs that have zero-entropy data written to them.
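A sketch of what such constant-fill detection could look like (the function name is made up; real firmware internals are not public):

```python
# Hypothetical constant-fill detection: before programming NAND, firmware
# could check whether a sector is one repeated byte (zero entropy) and
# store only a tiny note instead of the whole sector.
def classify_sector(data: bytes):
    """Return ('fill', byte) for a constant-fill sector, else ('data', None)."""
    if data and data.count(data[0]) == len(data):
        return ("fill", data[0])
    return ("data", None)

print(classify_sector(b"\x77" * 512))            # ('fill', 119): skip the NAND write
print(classify_sector(b"\x00" * 512))            # ('fill', 0)
print(classify_sector(b"\x00" * 511 + b"\x01"))  # ('data', None): must be stored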

So the longer answer is that it depends on the drive's controller and firmware. The more modern the drive, the better the chance the firmware will detect zeros or compress them, so the effect will be similar to TRIM.

But you cannot just assume a drive will work like this, and even if it does, you could argue it is just a side effect of, for example, on-the-fly compression done by the controller.

One difference is that TRIM is quicker, as no actual data transfer from host to drive is required. Another is that TRIM was actually designed for what you want to do, and is probably more predictable in behavior.

2

The short answer is: likely not, and if you can do it at all, it would only be because your controller explicitly detects zeroed blocks and trims them (I'm not even sure any controller does this, but AFAIK some VM implementations can detect zeroed blocks and trim them on the host OS).

All previous answers talk about the flash format and the fact that it starts with all bits set to 1, but that is completely beside the point. Most controllers will return all zeros for trimmed/unallocated blocks (not all of them guarantee it, IIRC!), so the idea is that if you write 0's to a block, the controller might be able to detect that and trim the block rather than allocating and storing it.

I don't think there can be a clear "no" answer without looking at all major controller firmwares to see whether any implement this feature. But if a controller can implement it without noticeable overhead, it's definitely a plus: zeroed blocks will both be faster to write and extend the disk's life expectancy, not just by saving the block write but also by leaving more free blocks for wear leveling.

1

Can I emulate TRIM by writing all zeros?

No.
The act of writing requires an erased sector, and then the actual write operation occurs.
The write operation is an indication to the SSD that this sector is in use (the opposite condition that you want with a real TRIM command).

Before an SSD sector has ever been written to, it reads back as all zeros.

Incorrect, and apparently your question is based on this faulty premise.
An erased sector is filled with bytes of 0xFF (all ones).

A format traditionally writes all zeros to every sector.

So, if I write all zeros to a sector, for the purpose of functionality, it will look just like a free one.

No, it will not.
Beware that there are "free" sectors at the filesystem level, and "free" sectors at the SSD level. In theory they should be the same set, but since the SSD has to be explicitly informed by the filesystem that a sector is "free" (with a TRIM command), there are discrepancies.

ADDENDUM

Thus the controller has a technical possibility to treat it as such. My limited knowledge of IC architecture says that hardware-wise, slowdown from a circuit testing for all zeros would probably be negligible, if any at all.

The question is: does any flash/SSD controller actually implement this or anything similar?

No, because that would lead to unintended data loss.
Whenever a program wrote a sector of all zeroes (e.g. a memory image can contain such blocks), your scheme would allow the SSD to discard that sector, handling it as an unmapped sector instead of a sector in use and allocated to a file.

Bottom line, your proposed scheme (using data content) does not work.
If you want to designate a sector as free or unused, then there's the TRIM command.
There is no substitute write operation.

sawdust
  • 18,591
1

Actually, an erased SSD sector is filled with ones, not zeros. You are confusing SSD sectors (the actual physical sectors on the SSD, which are what get trimmed) with disk sectors (the logical sectors the SSD presents to the file system after it's done with its management magic). Filling logical sectors with zeros would un-trim them, since it would force the SSD to allocate erased physical sectors and fill them with zeros.

When a logical sector is trimmed, the SSD unmaps any physical sectors mapped to that logical sector. When it gets a chance, it erases them, which fills them with 1's. Once erased, they're added to a pool of erased physical sectors. The goal of trimming is to enlarge the pool of erased physical sectors.

When you read a logical sector that has no corresponding physical sector, the drive returns a page of zeroes. But it doesn't have to read any physical sector to do so, nor could it since no physical sectors are mapped.

See here for more details.