
I often see people mention that SSD sectors have a limited number of writes before they go bad, especially when compared to classic (rotating disc) hard drives where most of those fail due to mechanical failure, not sectors going bad. I am curious as to why that is.

I am looking for a technical yet consumer-oriented explanation, i.e. the exact component that fails and why frequent writes affect the quality of that component, but explained in such a way that it does not require an extreme amount of knowledge about SSDs.

Nzall

7 Answers


Copied from "Why Flash Wears Out and How to Make it Last Longer":

NAND flash stores the information by controlling the amount of electrons in a region called a “floating gate”. These electrons change the conductive properties of the memory cell (the gate voltage needed to turn the cell on and off), which in turn is used to store one or more bits of data in the cell. This is why the ability of the floating gate to hold a charge is critical to the cell’s ability to reliably store data.

Write and Erase Processes Cause Wear

When written to and erased during the normal course of use, the oxide layer separating the floating gate from the substrate degrades, reducing its ability to hold a charge for an extended period of time. Each solid-state storage device can sustain a finite amount of degradation before it becomes unreliable, meaning it may still function but not consistently. The number of writes and erasures (P/E cycles) a NAND device can sustain while still maintaining a consistent, predictable output, defines its endurance.

Kinnectus

Imagine a piece of regular paper and pencil. Now feel free to write and erase as many times as you please in one spot on the paper. How long does it take before you make it through the paper?

SSDs and USB flash drives have this basic concept but at the electron level.

MonkeyZeus

The problem is that the NAND flash substrate suffers degradation on each erase. The erase process involves hitting the flash cell with a relatively large jolt of electrical energy, which causes the semiconductor layer on the chip to degrade slightly.

Over the long run, this damage increases bit-error rates. These can be corrected in software at first, but eventually the error-correction routines in the flash controller can't keep up with the errors and the flash cell becomes unreliable.
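The race between accumulating raw errors and a fixed ECC budget can be sketched as follows; the linear wear model and the 40-bit correction limit are assumptions for illustration, not real controller figures:

```python
# Illustrative sketch (all numbers hypothetical): raw bit errors grow with
# wear; once they exceed what the ECC can correct, the block is unreliable.
ECC_CORRECTABLE_BITS = 40        # assumed correctable errors per codeword

def raw_bit_errors(pe_cycles_used: int) -> int:
    # Toy wear model: one extra raw error per 100 erase cycles.
    return pe_cycles_used // 100

def block_reliable(pe_cycles_used: int) -> bool:
    # Reliable as long as the ECC can still correct every raw error.
    return raw_bit_errors(pe_cycles_used) <= ECC_CORRECTABLE_BITS

print(block_reliable(1000))   # True: 10 raw errors, well within ECC
print(block_reliable(5000))   # False: 50 raw errors exceed the ECC budget
```

The key point is the cliff edge: the drive looks perfectly healthy right up until the error count crosses the correction limit.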

jcbermu

My answer is taken from people with more knowledge than me!

SSDs use what is called flash memory. A physical process occurs when data is written to a cell (electrons move in and out.) When this happens it erodes the physical structure. This process is pretty much like water erosion; eventually it's too much and the wall gives way. When this happens the cell is rendered useless.

Another failure mode is that these electrons can get "stuck," making it harder for the cell to be read correctly. The analogy here is a lot of people talking at the same time: it's hard to hear any one of them. You may pick out one voice, but it may be the wrong one!

SSDs try to spread the load evenly among the cells in use so that they wear down evenly. Eventually a cell will die and be marked as unavailable. SSDs keep an area of "overprovisioned" cells, i.e. spares (think substitutes in sport). When a cell dies, one of these is used instead. Once all the spare cells are used up as well, the SSD will slowly become unusable.
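The wear-leveling-plus-spares idea can be sketched as a toy model; the block counts and cycle limit are made-up numbers, and a real flash translation layer is far more sophisticated:

```python
# Toy model (all sizes are assumptions, not a real FTL): the controller
# always erases the least-worn block, and worn-out blocks are replaced
# from an overprovisioned spare pool until the spares run out.
class ToySSD:
    def __init__(self, active_blocks: int = 4, spare_blocks: int = 2,
                 max_pe: int = 3):
        self.wear = [0] * active_blocks   # erase count per active block
        self.spares = spare_blocks        # overprovisioned replacements
        self.max_pe = max_pe              # erases a block survives

    def erase_one(self) -> bool:
        """Erase the least-worn block; False once no spares remain."""
        i = self.wear.index(min(self.wear))   # wear leveling: pick least worn
        self.wear[i] += 1
        if self.wear[i] >= self.max_pe:       # block is worn out
            if self.spares == 0:
                return False                  # no substitutes left
            self.spares -= 1
            self.wear[i] = 0                  # swap in a fresh spare block
        return True

ssd = ToySSD()
count = 0
while ssd.erase_one():
    count += 1
print(count)  # total erases absorbed before the spares are exhausted
```

With 4 active blocks, 2 spares, and 3 erases per block, the pool absorbs 14 erases before a worn block has no substitute left, which is more than any single block could survive on its own.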

Hopefully that was a consumer friendly answer!

Edit: Source Here

Lister

Nearly all consumer SSDs use a memory technology called NAND flash memory. The write endurance limit is due to the way flash memory works.

Put simply, flash memory operates by storing electrons inside an insulating barrier. Reading a flash memory cell involves checking its charge level, so to retain stored data, the electron charge must remain stable over time. To increase storage density and reduce cost, most SSDs use flash memory that distinguishes not just two possible charge levels (one bit per cell, SLC), but four (two bits per cell, MLC), eight (three bits per cell, TLC), or even 16 (four bits per cell, QLC).

Writing to flash memory requires driving an elevated voltage to move electrons through the insulator, a process which gradually wears it down. As the insulation wears down, the cell is less able to keep its electron charge stable, eventually causing it to fail to retain data. With TLC and especially QLC NAND, the cells are particularly sensitive to this charge drift because more levels must be distinguished to store multiple bits of data.
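Why more bits per cell leaves less room for charge drift can be shown numerically; the 6000 mV threshold-voltage window below is an assumed figure for illustration, not a real device parameter:

```python
# Sketch with an assumed voltage window: dividing a fixed window among
# 2^n charge levels shows how the margin against drift shrinks per bit.
WINDOW_MV = 6000  # hypothetical usable threshold-voltage window (millivolts)

margins = {}
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    levels = 2 ** bits
    margins[name] = WINDOW_MV / (levels - 1)   # spacing between levels
    print(f"{name}: {levels} levels, ~{margins[name]:.0f} mV between levels")
```

Under these assumed numbers QLC has 15 times less margin between adjacent levels than SLC, which is why the same amount of insulator wear pushes a QLC cell into misreads much sooner.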

To further increase storage density and reduce cost, the process used to manufacture flash memory has been scaled down dramatically, to as small as 15nm today—and smaller cells wear down faster. For planar NAND flash (not 3D NAND), this means that while SLC NAND can last tens or even hundreds of thousands of write cycles, MLC NAND is typically good for only about 3,000 cycles and TLC a mere 750 to 1,500 cycles.

3D NAND, which stacks NAND cells one on top of another, can achieve higher storage density without having to shrink the cells as small, which enables higher write endurance. While Samsung has gone back to a 40nm process for its 3D NAND, other flash memory manufacturers such as Micron have decided to use small processes anyway (though not quite as small as planar NAND) to deliver maximum storage density and minimum cost. Typical endurance ratings for 3D TLC NAND are about 2,000 to 3,000 cycles, but can be higher in enterprise-class devices. 3D QLC NAND is typically rated for about 1,000 cycles.

An emerging memory technology called 3D XPoint, developed by Intel and Micron, uses a completely different approach to storing data which is not subject to the endurance limitations of flash memory. 3D XPoint is also vastly faster than flash memory, fast enough to potentially replace DRAM as system memory. Intel will sell devices using 3D XPoint technology under the Optane brand, while Micron will market 3D XPoint devices under the QuantX brand. Consumer SSDs with this technology may hit the market as soon as 2017, although it is my belief that for cost reasons, 3D NAND (primarily of the TLC variety) will be the dominant form of mass storage for the next several years.

bwDraco

A flash cell stores static electricity. It's exactly the same kind of charge that you can store on an inflated balloon: you place a few extra electrons on it.

What's special about static electricity is that it stays in place. Normally in electronics, everything is connected to everything else in some way with conductors, and if there's even a large resistor between the balloon and ground, the charge will vanish pretty quickly. The reason a balloon stays charged is that air is actually an insulator: it has effectively infinite resistivity.

Normally, that is. Since all matter consists of electrons and atomic cores, you can make anything a conductor: just apply enough energy, and some of the electrons will shake loose and be (for a short while) free to move closer to the balloon, or further from it. This actually happens in air with static electricity: we know this process as lightning!

I don't have to emphasise that lightning is a rather violent process. These electrons are a crucial part of the chemical structure of matter. In the case of air, lightning leaves a bit of the oxygen and nitrogen transformed into ozone and nitrogen dioxide. Only because the air keeps moving and mingling, and those substances eventually react back to oxygen and nitrogen, is no persistent harm done, and the air remains an insulator.

Not so in the case of a flash cell: here, the insulator must be far more compact. This is only feasible with solid-state oxide layers. Sturdy stuff, but even these aren't impervious to the effects of forcing charge through them. And that's what eventually wrecks a flash cell, if you change its state too often.

By contrast, a DRAM cell doesn't have proper insulators in it. That's why it needs to be periodically refreshed, many times a second, to not lose information; however, because it's all just ordinary conductive charge transports, nothing much bad usually happens if you change the state of a RAM cell. Therefore, RAM endures many more read/write cycles than flash does.


Footnotes:

- Or, for a positive charge, you remove some electrons from the molecular bonds. You need to take so few that this doesn't affect the chemical structure in a detectable way.
- These static charges are actually tiny. Even the smallest watch battery that lasts for years supplies enough charge every second to charge hundreds of balloons! It just doesn't have nearly enough voltage to punch through any noteworthy potential barrier.
- At least, all matter on Earth... let's not complicate things by going to neutron stars.


Less technical, and an answer to what I believe the OP means by "I often see people mention that SSDs have a limited amount of writes in their sectors before they go bad, especially compared to classic rotating disk hard drives, where most drives fail due to mechanical failure, not sectors going bad."
I'll interpret the question as: "Since SSD sectors wear out far sooner than spinning rust, how can using one still be reasonably reliable?"

There are two types of failure. The device can fail completely, due to age, quality, abuse, etc. Or it can develop sector errors from heavy read/write activity.

Sector errors happen on all media. The drive controller (SSD or spinning) remaps data from a failing sector to a spare sector. If the sector has already failed completely, the controller may still remap it, but the data is lost. On an SSD the equivalent unit (an erase block) is large and often fails all at once.

SSDs can suffer either or both types of failure. Read/write cycle issues can be mitigated by:

- A larger drive. If you use a small drive for an OS like Windows, it will accumulate a lot of read/write cycles. The same OS on a much, much larger drive spreads those cycles over more cells. So even a drive rated for "only" a few thousand cycles might not be a problem if no individual sector is erased frequently.
- Balancing data. SSDs move data from frequently used sectors to less frequently used ones. Think about the OS and its updates versus a photo you took and just want to keep: at some point the SSD might swap the physical locations of the photo and an OS file to balance out the cycles.
- Compression. Compressed data takes less space, and therefore less writing.
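The "larger drive" point above can be made concrete with a quick calculation; the workload and write-amplification numbers here are assumptions chosen for illustration:

```python
# Assumed workload: with even wear leveling, yearly erase cycles per cell
# scale inversely with drive capacity for the same host write volume.
def cycles_per_cell_per_year(daily_writes_gb: float, capacity_gb: float,
                             write_amp: float = 1.5) -> float:
    return daily_writes_gb * write_amp * 365 / capacity_gb

small = cycles_per_cell_per_year(20, 250)    # 20 GB/day on a 250 GB drive
large = cycles_per_cell_per_year(20, 2000)   # same writes on a 2 TB drive
print(round(small, 1))   # ~44 cycles per cell per year
print(round(large, 1))   # ~5 cycles per cell per year
```

Under these assumptions, even a cell rated for only 1,000 cycles would take decades to wear out on the larger drive.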

Then there is quality of components. The cheapest SSD or USB stick you can find might work for a while, but a quality drive made for enterprise use will last a lot longer, not just in erase cycles but in total use.

As drives get larger and larger (like 100-1000 GB), erase cycles become less of an issue even though denser cells can sustain fewer writes. Some drives use DRAM as a cache to reduce write cycles. Some reserve a higher-quality segment of the flash as a cache, with lower-quality cells providing the low cost and large capacity.
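The DRAM-cache idea can be sketched as a toy write-back buffer; the structure is illustrative only, not how any real controller is implemented:

```python
# Toy sketch (hypothetical structure): buffering writes in DRAM lets
# repeated updates to the same logical block be merged, so the flash
# behind the cache absorbs fewer program/erase operations.
class WriteBackCache:
    def __init__(self):
        self.dirty = {}           # logical block -> pending data (in DRAM)
        self.flash_writes = 0     # writes that actually reached the flash

    def write(self, block: int, data: bytes) -> None:
        self.dirty[block] = data  # overwrite in cache; no flash write yet

    def flush(self) -> None:
        self.flash_writes += len(self.dirty)  # one flash write per block
        self.dirty.clear()

cache = WriteBackCache()
for _ in range(100):              # 100 host writes, all to the same block
    cache.write(7, b"log entry")
cache.flush()
print(cache.flash_writes)         # 1: coalesced into a single flash write
```

A hundred host writes to the same logical block cost only one flash write here, which is exactly the kind of saving a DRAM cache buys in terms of wear.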

Modern good-quality consumer SSDs can last a good long time in a consumer machine. I have some 5+ years old that still work. I also have a couple of cheap, new ones that failed after a few months. Sometimes it is just (bad) luck.

MikeP