(Moved from a question on StackOverflow. TL;DR: I would like to know how modern flash drives and SSDs handle deleted data (the Samsung T5 doesn't seem to support TRIM, for instance). Do they have a way to detect that data has actually been deleted, or do they keep it around? How bad is the write-amplification issue when/if they wear-level, and is there any way around it? Can I 'refresh' the drive by overwriting all unused LBAs with 0xFF so that new writes to that section of flash require no erases? Otherwise you're writing over prior data, and any bit that has to go 0 -> 1 triggers an erase; plus several LBAs probably share the same block, so one write may trigger several. I know that's worse than TRIM, but since the drive doesn't have TRIM...)
A while ago, I asked the following question about filling a drive with 1s rather than zeros: How to fill a flash drive with 1s instead of zeros. The answer there was that this is not the same as TRIM, and is in fact worse. The "worse" part makes sense to me (TRIM queues an erase for later reallocation rather than forcing one now, and the lower-level approach is probably better optimized, since the wear-leveling count isn't changed), but it isn't clear to me what effect overwriting actually has, or how drives work without TRIM support.
I'm vaguely aware of the high-level details of how FTLs (Flash Translation Layers) work in storage drives, but I have not been able to find information on the write-amplification details. How do drives without TRIM work, and are there ways to extend the life of those drives?
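To pin down what I mean by an FTL, here's a toy page-mapped sketch (Python; every name and number is made up, and I'm not claiming any real controller works this way): each LBA write goes to a fresh pre-erased page, the old copy becomes stale garbage, and the mapping table takes an update.

```python
# Toy page-mapped FTL, just to make my mental model concrete.
# All names and geometry are invented for illustration.
PAGES_PER_BLOCK = 64
NUM_BLOCKS = 1024

class ToyFTL:
    def __init__(self):
        self.map = {}        # LBA -> (block, page): the "system mapping table"
        self.live = set()    # physical pages currently holding valid data
        self.free = [(b, p) for b in range(NUM_BLOCKS)
                            for p in range(PAGES_PER_BLOCK)]

    def write(self, lba):
        old = self.map.get(lba)
        if old is not None:
            self.live.discard(old)  # the old copy becomes stale garbage
        loc = self.free.pop()       # program a pre-erased page instead
        self.map[lba] = loc         # every write costs a map update, too
        self.live.add(loc)
        # (no garbage collection here; stale pages just pile up,
        # which is more or less the whole question below)
```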
Let's say I have a drive whose erase blocks each span several pages (which is all of them, as far as I'm aware). If I change some data in a file, it causes a page (or several) to be updated. (I'm going to skip the "flash uses virtual logical blocks and doesn't have real logical blocks" argument and just pretend that each LBA maps to one or more pages, but that an LBA is almost certainly smaller than an erase block.)
If the update flips any stored 0 back to a 1, the entire block must be erased before the data can be written. (I don't care whether this is the original block the changed data lived in or not; somewhere, a block must be erased if the incoming page would turn a 0 into a 1.) Since the block holds other pages that weren't changed (say this is the end of the file, or it's fragmented), those get copied back to the block as well.
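Spelling out the erase rule I'm relying on, as I understand NAND (this is just the 0 -> 1 constraint from above written as code):

```python
# NAND erase sets every bit to 1; programming can only clear bits to 0.
# So an overwrite fits "in place" only if no bit has to go 0 -> 1.
def can_program_in_place(old: bytes, new: bytes) -> bool:
    return all((o & n) == n for o, n in zip(old, new))

assert can_program_in_place(b"\xff\xff", b"\x0f\x00")      # only 1 -> 0: fine
assert not can_program_in_place(b"\x0f\x00", b"\xff\xff")  # 0 -> 1: erase first
```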
Now, here's where things get hazy. The FTL can remap pages to other positions in the same block, or to other blocks entirely. I assume that, for the sake of the lifespan of the system mapping table (which must also live on that flash chip, albeit presumably in a more durable portion), this isn't done too frequently, so most of the time the data is probably written back to the same block.
The awkward part is that each erase cycle damages the flash, so it's better to write changed data to pre-erased blocks, both for speed and to avoid re-programming the unchanged pages (leaving them free for data that actually changed). That presumably requires an update to the system mapping table to record where the page went, so I'm not sure how this data is stored such that the mapping table isn't worn out before the flash itself is.*
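One way I could imagine the firmware amortizing that wear (purely a guess on my part, with invented names and sizes) is to journal small remap records and rewrite the full table only occasionally, so one big table write replaces many small ones:

```python
# Guess at amortizing mapping-table wear: append cheap delta records
# to a log, and flush the full table only when the log fills.
LOG_CAPACITY = 1024  # made-up number of delta records per log block

class JournaledMap:
    def __init__(self):
        self.table = {}         # full LBA -> page table, flushed rarely
        self.log = []           # cheap append-only delta records
        self.table_flushes = 0  # proxy for wear on the table's blocks

    def remap(self, lba, loc):
        self.table[lba] = loc
        self.log.append((lba, loc))
        if len(self.log) >= LOG_CAPACITY:  # checkpoint: one big write
            self.table_flushes += 1        # instead of 1024 small ones
            self.log.clear()
```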
There are two options here. One is that the drive supports TRIM. This lets the computer say, "I'm not using these LBAs, so you can mark them for garbage collection." As a result, the drive can skip writing back the TRIMmed pages once their block is erased, and can later use the reclaimed space to store a changed page without an erase. Or it might evict any remaining valid pages to empty slots somewhere and erase the entire block for later use. This makes sense.
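To make the benefit concrete with made-up numbers: the copy-out cost of collecting a block is just the number of pages the drive still believes are valid, and TRIM is what lets that number shrink.

```python
# Copy-out cost of garbage-collecting one block: every page the drive
# still believes is valid must be rewritten elsewhere before the erase.
# TRIM is what lets pages be marked invalid here. Numbers are made up.
def gc_copy_cost(valid: list) -> int:
    return sum(valid)

without_trim = gc_copy_cost([True] * 64)                 # all "live": 64 copies
with_trim    = gc_copy_cost([True] * 16 + [False] * 48)  # 48 TRIMmed: 16 copies
print(without_trim, with_trim)
```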
It's all the USB thumb drives that don't support TRIM that confuse me. There, the drive has no idea that the contents of a file you just deleted are actually invalid (the filesystem only deletes its pointer to the data). So I assume it eventually either runs out of empty pages or falls back on its reserve area. When a page is changed, the drive must copy every other page in that block back (or at least remap them to similar LBAs), since it has no way of knowing whether those pages still hold live data.

I would think such a drive gradually fills with garbage: once an LBA has been written, the drive has no way of knowing whether it's still in use. If an LBA is overwritten and remapped, perhaps something can be done with the old page, but the drive mostly can't consolidate freed pages. That also means that unless it keeps this scratch space in the reserve area, it will run out, since eventually it believes every page/LBA is occupied (after enough data has been (re)written, or the filesystem has filled up at any point) and it can't tell whether the "contents" pages of a file are actually in use.

That leads to write amplification: whatever leftover contents a block held from deleted files get written back time and again on every change, rather than that empty space being used for wear-leveling or remapping. Any wear-leveling swap is then forced to copy the target block's contents back to the original so they aren't lost, which means extra writes and a shorter lifespan (plus the drive can't use "erased" pages efficiently, or at all, so every change is guaranteed to cost an erase).
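As back-of-the-envelope arithmetic (with my made-up 64-page blocks), that worst case is brutal: one changed page drags 63 unchanged ones through a copy.

```python
# Write amplification as I understand it: total NAND page writes
# divided by the page writes the host actually asked for.
def write_amplification(host_pages, copied_pages):
    return (host_pages + copied_pages) / host_pages

# Worst case from the paragraph above, with made-up 64-page blocks:
print(write_amplification(1, 63))   # 64.0: one page changed, 63 copied back
```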
I would thus assume that overwriting the drive with 1s (say, with dd, or just by cat-ing a file) would put all the cells into the ideal erased state, ready for writing. I'd guess remapping becomes cheap, since every cell was just overwritten, but the drive still can't tell whether the new contents are significant to the filesystem, so it must keep storing and reporting them (although any remap into those erased pages is instantly usable without a further erase). By the same logic, overwriting with zeros is probably bad: any leftover, not-yet-reused pages in visible blocks then stay programmed to all-zeros and will require an erase whenever they're next written, leaving us with the "no erased pages" problem from before.
But if I were to overwrite with 0xFF, the pages are already erased for whenever a remap moves a page into them, or that LBA is consumed by the filesystem. Shouldn't that be the superior method of clearing out a flash drive that doesn't support TRIM? Sure, it's still nowhere near as good as TRIM, but for drives without it, what is the best option? Leave them entirely alone? Overwrite with zeros? Overwrite with ones? Probably overwrite blank space only, maybe by creating large files with uniform contents? (I imagine that's inferior to a tool that directly overwrites unused LBAs, but I'm not sure such a tool exists or what it would be.)
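For clarity, by "overwriting blank space only" I mean something like the sketch below (Python; the mount point is a made-up example, and whether any given controller actually benefits is exactly what I'm asking):

```python
# Fill the filesystem's free space with 0xFF: write one big all-ones
# file until the disk is full, sync it to the drive, then delete it.
import os

CHUNK = b"\xff" * (4 * 1024 * 1024)   # 4 MiB of 0xFF per write
path = "/mnt/usbdrive/fill.bin"       # hypothetical mount point

with open(path, "wb") as f:
    try:
        while True:
            f.write(CHUNK)
    except OSError:                   # ENOSPC: filesystem is full
        pass
    f.flush()
    os.fsync(f.fileno())              # force the data out to the drive

os.remove(path)                       # frees the LBAs filesystem-side only
```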
*I'm aware of at least one series of SSDs where exactly this happened. That makes some sense (and also worries me, because the failure mode is undetectable and I don't know how rare it is), but I'm confused about how small, frequent writes make it more likely than larger ones. On average the same number of blocks/pages/whatever are changed, presumably, so shouldn't the remapping behavior also be equivalent? Or is it because each block of the mapping table stores many LBA-to-page entries, so a bulk write lets them all change at once?
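With made-up numbers, that last guess would look like this: if one flash page of the mapping table covers 1024 consecutive LBA entries, the same 4 MiB written sequentially versus scattered touches wildly different numbers of map pages.

```python
# Illustrating my guess with invented numbers: contiguous LBAs share
# map pages, scattered writes each hit a different one.
TOTAL = 4 * 1024 * 1024      # 4 MiB of user data either way
CHUNK = 4 * 1024             # written in 4 KiB pieces
LBA   = 512
ENTRIES_PER_MAP_PAGE = 1024  # hypothetical map entries per flash page

writes     = TOTAL // CHUNK                 # 1024 writes
lbas       = TOTAL // LBA                   # 8192 LBAs total
sequential = lbas // ENTRIES_PER_MAP_PAGE   # 8 map pages touched
scattered  = writes                         # up to 1024 map pages touched
print(sequential, scattered)                # ~128x more map churn when random
```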