141

I need to test the resilience of some read/write code for some embedded hardware. How might I sacrifice a few SD cards and break several known sectors for a controlled study?

The only thing I can think of is to overwrite a single sector a few million times. I wonder if a Linux badblocks script can be created to run its destructive test on a single sector repeatedly for several hours.

mattdm
  • 3,011
Gabe Krause
  • 1,327

16 Answers

169

An alternative approach that may be useful.

If your code runs under Linux, then maybe you can test it with a "faulty" logical device. dmsetup can create devices that return I/O errors. Just build your device using the error and/or flakey target. From man 8 dmsetup:

error
Errors any I/O that goes to this area. Useful for testing or for creating devices with holes in them.

flakey
Creates a similar mapping to the linear target but exhibits unreliable behaviour periodically. Useful for simulating failing devices when testing.

Note: flakey target usage is documented here. Basic example here.

As far as I know, an I/O error will be reported immediately, so this is different from real SD card behaviour, where you can expect delays, stalling, etc. Nevertheless, I think this approach may be useful in some cases, at least to perform a fast preliminary test.
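For illustration, a minimal sketch of both targets, assuming a backing device /dev/sdX (the device name and the sector range 2048-2055 are placeholders; run as root, and note the backing card itself is not harmed):

# total size of the backing device, in 512-byte sectors
SIZE=$(blockdev --getsz /dev/sdX)

# 1:1 mapping, except sectors 2048-2055, which always return I/O errors
dmsetup create faulty_sd << EOF
0 2048 linear /dev/sdX 0
2048 8 error
2056 $((SIZE - 2056)) linear /dev/sdX 2056
EOF

# or: a device that works for 50 seconds, then drops all I/O for 5 seconds, repeatedly
dmsetup create flaky_sd --table "0 $SIZE flakey /dev/sdX 0 50 5"

The mapped devices then show up as /dev/mapper/faulty_sd and /dev/mapper/flaky_sd and can be handed to the code under test.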

75

This guy hacked the microcontroller inside SD cards that is used to mark bad blocks: https://www.bunniestudios.com/blog/?p=3554

You may be able to do the same and arbitrarily mark blocks as faulty.

Today at the Chaos Computer Congress (30C3), xobs and I disclosed a finding that some SD cards contain vulnerabilities that allow arbitrary code execution — on the memory card itself. On the dark side, code execution on the memory card enables a class of MITM (man-in-the-middle) attacks, where the card seems to be behaving one way, but in fact it does something else. On the light side, it also enables the possibility for hardware enthusiasts to gain access to a very cheap and ubiquitous source of microcontrollers.

[…]

These algorithms are too complicated and too device-specific to be run at the application or OS level, and so it turns out that every flash memory disk ships with a reasonably powerful microcontroller to run a custom set of disk abstraction algorithms. Even the diminutive microSD card contains not one, but at least two chips — a controller, and at least one flash chip (high density cards will stack multiple flash die).

[…]

The embedded microcontroller is typically a heavily modified 8051 or ARM CPU. In modern implementations, the microcontroller will approach 100 MHz performance levels, and also have several hardware accelerators on-die. Amazingly, the cost of adding these controllers to the device is probably on the order of $0.15-$0.30, particularly for companies that can fab both the flash memory and the controllers within the same business unit. It’s probably cheaper to add these microcontrollers than to thoroughly test and characterize each flash memory chip, which explains why managed flash devices can be cheaper per bit than raw flash chips, despite the inclusion of a microcontroller.

[…]

The crux is that a firmware loading and update mechanism is virtually mandatory, especially for third-party controllers. End users are rarely exposed to this process, since it all happens in the factory, but this doesn’t make the mechanism any less real. In my explorations of the electronics markets in China, I’ve seen shop keepers burning firmware on cards that “expand” the capacity of the card — in other words, they load a firmware that reports the capacity of a card is much larger than the actual available storage. The fact that this is possible at the point of sale means that most likely, the update mechanism is not secured.

In our talk at 30C3, we report our findings exploring a particular microcontroller brand, namely, Appotech and its AX211 and AX215 offerings. We discover a simple “knock” sequence transmitted over manufacturer-reserved commands (namely, CMD63 followed by ‘A’,’P’,’P’,’O’) that drop the controller into a firmware loading mode. At this point, the card will accept the next 512 bytes and run it as code.

FarO
  • 1,968
38

This typically won't work because most recent SD cards (or eMMC) use static and dynamic wear-levelling, meaning that an intelligent controller interprets your write instruction and maps it to one of the least used flash sectors.

The only thing you could do is try to contact your suppliers and ask for their datasheets; there might be some (vendor-specific) ways to retrieve the state of their wear-levelling algorithm. This would potentially allow you to query the state/usage of the underlying flash. Or you might be unlucky and this might not exist.

If your goal is really to destroy flash, all you can do is run massive read and write cycles and continuously check that the data you read back is still consistent. E.g. create two large files, store their checksums, and read/write them repeatedly to verify their checksums. The larger the flash, the longer this process will take.
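A rough sketch of such a loop, assuming the card is mounted at /mnt/sd (paths and sizes are placeholders; dropping the page cache needs root so the read-back really comes from the card):

dd if=/dev/urandom of=/tmp/pattern.bin bs=1M count=512
REF=$(sha256sum < /tmp/pattern.bin)

while true; do
    cp /tmp/pattern.bin /mnt/sd/a.bin
    sync
    echo 3 > /proc/sys/vm/drop_caches   # make sure the next read hits the card, not the page cache
    [ "$(sha256sum < /mnt/sd/a.bin)" = "$REF" ] || { echo "corruption detected"; break; }
done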

amo-ej1
  • 645
30

You can increase transistor wear by increasing the operating temperature. Run write-erase cycles on a heated chip (70-120 °C); it will wear out faster.

Pavlus
  • 548
17

Preface: This option requires additional programming and hardware modifications, but it would allow for controlled reads that are most likely transparent to the host.

An SD card has multiple I/O options, but it can be controlled over SPI. If you were to take an SD card and modify it so that you could attach its pins to a microcontroller (such as an Arduino), you could have the Arduino mimic the SD card and remain transparent to the device reading it. Your code on the microcontroller could purposely return bad data when needed. In addition, you could put an SD card on the microcontroller so that reads could pass through the microcontroller to the real card, allowing for gigabytes of testing.

15

I would go to eBay/AliExpress and buy the cheapest SD card I can find from China, one of the ones that are "too good to be true". They often come with faulty sectors or have firmware set to report a much larger capacity than they actually have. Either way, you should end up with a faulty SD card to use for testing.

GuzZzt
  • 151
11

Once upon a time, many years ago, I was paid to retrieve a set of graduation photos and videos from an SD card for a rather distraught mother. Upon close inspection, the card had somehow been physically damaged, with a visible crack in the outer case, and had several bad sectors, most notably several early, critical sectors, which made even the most reliable recovery programs at the time completely fail to read the card. Also, forensic data tools back then cost a fortune.

I ended up obtaining an identical brand/size SD card and writing my own custom raw data dump and restore utility to copy the data from the bad card to the good one. Every time the utility hit a bad sector, it would retry a number of times before writing all zeroes for that sector and, instead of giving up and stopping, ignore the failure and move on to the next sector. The retry attempts were made since I had also noticed that some sectors still had around a 40% read success rate. Once the data was on the new SD card, the recovery tools that had failed before worked flawlessly with minimal data loss/corruption. Overall, about 98% of all of the files were recovered. A number of items that had been previously deleted were also recovered because nothing is ever actually deleted - just marked as such and slowly overwritten. What started out as a slightly boring data recovery exercise became one of my more memorable and interesting personal software development projects. In case you were wondering, the mother was thrilled.

At any rate, this story goes to show that it is possible to physically damage an SD card such that data is still accessible but some sectors are only barely functioning, and anything attempting to read from them has difficulties doing so. SD card plastic tends to be pretty flimsy, so bending or cutting into some cheap ones might do the trick. Your mileage may vary.

You could also ask around at some data recovery places in your area. Since they specialize in data recovery from various failing or failed devices, they should have some useful input/tips and might even have some pre-busted SD cards on hand (e.g. for training purposes) that you could obtain from them.

6

This answer is an expansion of @Ruslan's comment:

  1. Fill your SD card up to about 99.9%
  2. Continuously re-write the content of the remaining 0.1% (write A, delete, write B, delete, write A, ...)
  3. Test (periodically) whether you have already broken the card (see the sketch below)
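A rough sketch of the three steps, assuming the card is mounted at /mnt/sd (the paths and the 16 MB slack are placeholders):

FREE_MB=$(df -m /mnt/sd | awk 'NR==2 {print $4}')
dd if=/dev/urandom of=/mnt/sd/filler.bin bs=1M count=$((FREE_MB - 16))   # step 1: fill almost the whole card
sha256sum /mnt/sd/filler.bin > /tmp/filler.sha

while :; do
    dd if=/dev/zero    of=/mnt/sd/churn.bin bs=1M count=16 conv=fsync    # step 2: churn the small remainder
    dd if=/dev/urandom of=/mnt/sd/churn.bin bs=1M count=16 conv=fsync
    sha256sum -c /tmp/filler.sha || break                                # step 3: has the static data broken yet?
done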

Possible alternative:

Not sure whether this works for your purposes, but maybe it will actually suffice to physically damage your card, which could be a lot faster.

Dennis Jaheruddin
  • 496
  • 1
  • 5
  • 24
3

You could try introducing an unstable power supply or higher voltage signalling.

In a family of devices I know, there is a strong correlation between SD card corruption and intermittent battery contact.

PCARR
  • 129
3

Some older, low-capacity SD cards (16 MB-ish) use flash chips in TSOP/TSSOP-style packages. A workshop capable of SMT rework (if you are doing embedded work, you might have that skill in-house; otherwise check with small companies doing board-level phone/laptop repair) could conceivably separate and reattach that chip so that it can be read and written raw (including the ECC codes) with a device programmer.

Still, be aware that you will be mainly testing:

  • How your device will handle possible timing aberrations/hiccups introduced by internal error correction

and in the worst case

  • how your device handles a terminally failing SD card.

If you just want to check how your device behaves with erratic SD card behaviour, whatever the cause, it is probably best to just introduce electrical noise into the interface lines (e.g. by putting a FET bus switch in between and, at random times, momentarily switching it to a source of nonsensical signals at the right electrical levels).

rackandboneman
  • 780
  • 4
  • 6
2

Related to OlafM's answer but different: you can program a microcontroller of your own to speak the SD card protocol, and then emulate whatever behavior you want it to have.

1

Perhaps this is not the direction you wanted, but I found that removing my SD card while my radio or laptop was reading from it produces a crashed SD card about 1 time in 5 or 10. It seems the cards don't do well having power removed during a read, and presumably during writes. After reading Robert Calhoun's comments below, I am led to believe it may be damaging the FAT. Though I don't know why just reading causes a crash; there should not be any writing going on.

jwzumwalt
  • 295
1

The FAT32 Master Boot Record area is probably the most susceptible to abuse, since on a logical level it always needs to be in the same place. (Perhaps this is handled by the soft-remapping of bad sectors, but I am somewhat skeptical that this is implemented on all hardware.) So you could run sfdisk in a loop and see if you can wreck it that way.
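A crude sketch of that loop (the device name and the one-partition layout are placeholders; this repeatedly rewrites sector 0, so only point it at a sacrificial card):

while true; do
    echo 'start=2048, type=c' | sfdisk --quiet /dev/sdX
done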

But I am going to beg you to do whatever you can to improve hardware reliability, instead of trying to handle bad hardware in software. The problem is that SD cards fail in all kinds of weird ways. They become unreadable, they become unwriteable, they give you bad data, they time out during operations, etc. Trying to predict all the ways a card can fail is very difficult.

Here's one of my favorite failures, "big data mode":

[image: a bad SD card reporting a fake, enormous capacity]

SD cards are commodity consumer products that are under tremendous cost pressure. Parts change rapidly and datasheets are hard to come by. Counterfeit product is not unheard of. For cheap storage they are tough to beat, but while SSDs make reliability a priority, the priority for SD cards is speed, capacity and cost (probably not in that order).

Your first line of defense is to use a solderable eMMC part with a real datasheet from a reputable manufacturer instead of a removable SD card. Yes, they cost more per GB, but the part will be in production for a longer period of time, and at least you know what you are getting. Soldering the part down also avoids a whole host of potential problems (cards yanked out during writes, poor electrical contact, etc.) with a removable card.

If your product needs removable storage, or it's just too late to change anything, then consider either spending the extra money for "industrial" grade cards, or treat them as disposable objects. What we do (under linux) is fsck the card on boot and reformat it if any errors are reported, as reformatting is acceptable in this use case. Then we fsck it again. If it still reports errors after reformatting, we RMA it and replace the hardware with a newer variant that uses eMMC.
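A sketch of that boot-time sequence, assuming a FAT filesystem on /dev/mmcblk0p1 (the device name and the exact recovery policy are placeholders, and the exit-code handling is simplified):

fsck.vfat -a /dev/mmcblk0p1 || mkfs.vfat /dev/mmcblk0p1   # errors reported: reformat, acceptable in this use case
fsck.vfat -n /dev/mmcblk0p1 || echo "still failing after reformat - flag the unit for replacement"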

Good luck!

1

If your SD card is FAT32-formatted, you can hex-edit the two FATs and mark a cluster as bad with the correct hex code. This is only a trick for logic-testing software that is supposed to find a bad sector at a particular place; it won't harm your SD card either, since a reformat will bring it back to normal condition.
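For reference, a FAT32 entry value of 0x0FFFFFF7 marks a cluster as bad. A rough sketch of scripting that edit on the raw device (the FAT byte offsets and the cluster number are placeholders; compute the real values from the card's boot sector, and work on an unmounted card):

FAT1_OFFSET=$((0x4000))      # byte offset of the first FAT  (placeholder)
FAT2_OFFSET=$((0x404000))    # byte offset of the second FAT (placeholder)
CLUSTER=1000                 # cluster number to mark as bad (placeholder)

for BASE in $FAT1_OFFSET $FAT2_OFFSET; do
    # 0x0FFFFFF7 stored little-endian: F7 FF FF 0F
    printf '\xf7\xff\xff\x0f' | dd of=/dev/sdX bs=1 seek=$((BASE + CLUSTER * 4)) conv=notrunc
done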

0

I wonder if a Linux badblocks script can be created to run its destructive test on a single sector repeatedly for several hours.

On a single sector—no, because the wear-levelling code inside the SD card will remap the logical blocks all over the place.

But you can easily run badblocks -w in a loop until it causes some bad blocks to appear. Something like this should work:

while badblocks -w /dev/xx; do :; done

assuming that badblocks returns 0 if no bad blocks were detected and non-zero otherwise (the man page doesn't say, and I haven't checked the source code).

Tobia
  • 378
0

SD/µSD cards normally implement wear levelling, so this could be quite hard. Depending on the type (single-level cell, multi-level cell, TLC, 3D NAND, etc.), the writes required to break it enough to exhaust the spare sector pool may run to multiple TB.

I did actually test this with a 4 GB, a 64 GB and a 256 GB device (a Pro Duo, an SSD and a thumbdrive). The 64 GB K---s---, using 4 Micron 16 GB chips, lasted about 3.84 TB of writes before it failed with a single soft error in the FAT area. The 256 GB one lasted a bit less; without direct chip access I would estimate it wrote maybe 5 TB before it finally gave out with MBR corruption, though it wasn't clear whether the controller caused it, as the drive worked solidly in USB3 mode but had more glitches during readback over USB2 and also ran very hot. The 4 GB Duo failed in the reader while copying data; again I can't be sure, but that equates to maybe 6 years of use, and the camera was also showing "Recovering" messages. Incidentally, varying the power supply voltage during writes will make a card fail a LOT faster. My 128 GB microSD failed after about 2 years of use with similar symptoms; it also had excess power drain and heat, yet data still read and wrote fine.
