How to check the physical health of a USB stick in Linux?

Question

How to check the health status of a USB stick?

How do I know that a USB is broken beyond repair, or repairable?

score 100 · Answer 1 · edited Oct 31 '21 at 22:27

There is no way to query a USB memory stick for SMART-like parameters; I'm not aware of any memory sticks that support doing so even via publicly-available proprietary software. The best you can do is to check that you can successfully read+write to the entire device using badblocks.

https://en.wikipedia.org/wiki/Badblocks

You want to specify one of the write tests, which will wipe all data on the stick; make a backup first.

Find the device by looking at dmesg after plugging in the USB stick; you'll see a device name (most likely sd<letter>, e.g., sdc, sdd, etc.) and manufacturer information. Make sure you're using the proper device!
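As a cross-check before running anything destructive, the kernel's sysfs tree can be asked which block devices it considers removable. The following is a minimal sketch, assuming a Linux system where each disk has a /sys/block/<name>/removable flag; the function name is made up for illustration, and some USB sticks report themselves as fixed disks, so an empty result does not prove the stick is absent.

```python
import os


def removable_block_devices(sys_block="/sys/block"):
    """Return /dev paths of block devices whose sysfs 'removable' flag is 1."""
    devices = []
    for name in sorted(os.listdir(sys_block)):
        flag_path = os.path.join(sys_block, name, "removable")
        try:
            with open(flag_path) as f:
                if f.read().strip() == "1":
                    devices.append("/dev/" + name)
        except OSError:
            # Entry without a readable flag: skip it rather than guess.
            continue
    return devices
```

Calling removable_block_devices() before and after plugging in the stick and comparing the two lists is one way to confirm the device name that dmesg reported.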

If the stick is formatted with a valid filesystem, you may have to unmount it first (with the umount command).

Example syntax, for a USB stick enumerated as /dev/sdz, outputting progress information, with a data-destructive write test and error log written to usbstick.log:

sudo badblocks -w -s -o usbstick.log /dev/sdz

You'll need to repartition and reformat the stick afterwards, assuming it passes; this test will wipe everything on the stick. Any failures indicate either a failure of the device's memory controller or that it has run out of spare blocks to remap failed blocks. In that case, no area of the device can be trusted.
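To illustrate the idea behind a destructive write test, here is a rough sketch in Python. It is not how badblocks itself is implemented: it works on an ordinary scratch file, the block size is an assumption, and the function name is made up. It fills the target with each of the patterns 0xaa, 0x55, 0xff and 0x00, reads it back, and reports the blocks that differ. On a real device the read-back could be served from the page cache, so use badblocks for actual testing.

```python
import os
import tempfile

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # the four patterns a badblocks -w run cycles through
BLOCK_SIZE = 4096                    # assumed block size for this sketch


def pattern_test(path, size, block_size=BLOCK_SIZE):
    """Overwrite `path` with each pattern, read it back, return mismatching block numbers."""
    bad = set()
    blocks = size // block_size
    for value in PATTERNS:
        expected = bytes([value]) * block_size
        # Write pass: fill the whole target with the current pattern.
        with open(path, "r+b") as f:
            for _ in range(blocks):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())
        # Read pass: compare every block against what was written.
        with open(path, "rb") as f:
            for block in range(blocks):
                if f.read(block_size) != expected:
                    bad.add(block)
    return sorted(bad)


if __name__ == "__main__":
    # Demonstrate on a scratch file, NOT on a real device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(16 * BLOCK_SIZE)
    try:
        print(pattern_test(tmp.name, 16 * BLOCK_SIZE))
    finally:
        os.unlink(tmp.name)
```

An empty list means every block returned exactly what was written for all four patterns.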

score 28 · Answer 2 · edited Dec 20 '14 at 23:25

Via [ubuntu] Error Check USB Flash Drive, I eventually found this, which could be helpful:

http://oss.digirati.com.br/f3/ "F3 - an alternative to h2testw"

I arrived at the blogs Fight Flash Fraud and SOSFakeFlash, which recommend the software H2testw (see here or here) to test flash memories. I downloaded H2testw and found two issues with it: (1) it is for Windows only, and (2) it is not open source. However, its author was kind enough to include a text file that explains what it does; this page is about my GPLv3 implementation of that algorithm.
My implementation is simple and reliable, and I don't know exactly how F3 compares to H2testw since I've never run H2testw. I call my implementation F3, which is short for Fight Flash Fraud, or Fight Fake Flash.

Addendum by @pbhj: F3 is in the Ubuntu repos. It has two parts: f3write writes 1GB files to the device, and f3read attempts to read them back afterwards. This tests both the real capacity of the device and its ability to write and then reliably read data.
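The write-then-read-back idea can be sketched in a few lines of Python. This is only an illustration of the principle, not F3's actual code or file format: the file names, the 1 MiB file size, and the SHA-256 based filler are assumptions chosen to keep the example small. Each file's content is derived from its index, so the reader can regenerate the expected bytes instead of storing a second copy.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB per file in this sketch; f3write uses 1GB files


def expected_data(index, size=CHUNK):
    """Deterministic filler derived from the file index, so it can be regenerated on read."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_files(directory, count):
    """Write `count` files of known content into `directory` (the mounted stick)."""
    for i in range(count):
        with open(os.path.join(directory, f"{i}.dat"), "wb") as f:
            f.write(expected_data(i))
            f.flush()
            os.fsync(f.fileno())


def verify_files(directory, count):
    """Return the indices of files whose content no longer matches what was written."""
    damaged = []
    for i in range(count):
        with open(os.path.join(directory, f"{i}.dat"), "rb") as f:
            if f.read() != expected_data(i):
                damaged.append(i)
    return damaged
```

On a stick with fake capacity, the files written past the real capacity would come back damaged, which is why the whole device has to be filled for the test to mean anything.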

score 14 · Answer 3 · answered Jan 08 '12 at 23:33

It depends on the failure mode, I suppose. They're cheap for a reason.

Since it is a USB device, watching the bus via Device Manager in Windows or the output of dmesg in Linux will tell you whether the device is even recognized as being plugged in. If it isn't, then either the on-board controller or the physical connections are broken.

If the device is recognized as being plugged in, but doesn't get identified as a disk controller (and I don't know how that could happen, but...) then the controller is shot.

If it's recognized as a disk drive, but you can't mount it, you might be able to repair it by rewriting the partition table with fdisk and then creating a new filesystem.

If you're looking for the equivalent of S.M.A.R.T., then you won't find it. Thumbdrive controllers are cheap. They're commodity storage, and not meant to have the normal failsafes and intelligence that modern drives have.

Lee Dunbar · Answer 4 · 2018-06-07T22:41:12.630

Along the way to today, this thread raised some questions.

-How long will this take? (Implied by the discussion of letting it run overnight.)

I'm currently testing a USB 3.0 128G SanDisk stick using sudo badblocks -w -s -o; it is connected to my USB 3/USB-C PCIe card in an older Athlon 64 X2. So, USB 3 into USB 3 on PCIe should be quite fast.

Here is my console output at 33% completion:

Testing with pattern 0xaa: 33.35% done, 49:47 elapsed. (0/0/0 errors)

and again later:

Testing with pattern 0xaa: 54.10% done, 1:17:04 elapsed. (0/0/0 errors)

Next came this segment:

Reading and comparing: 43.42% done, 2:23:44 elapsed. (0/0/0 errors)

This process repeats for each pattern: 0xaa, then 0x55, 0xff, and finally 0x00.

ArchLinux gave an unqualified statement:

For some devices this will take a couple of days to complete.

N.B.: The testing was started at about 8:30 p.m. and had completed before 8:45 a.m. the next day, so it took about 12 hours in my situation.
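For a rough idea of how long one pass will take, the progress lines can be extrapolated linearly. This is only a back-of-the-envelope sketch with made-up function names. It assumes the write rate stays constant, and because the elapsed figure in the output quoted above appears to keep counting across phases, it is only meaningful for the first write pass.

```python
def parse_elapsed(text):
    """Convert 'MM:SS' or 'H:MM:SS', as printed by badblocks -s, to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def estimate_pass_seconds(percent_done, elapsed):
    """Linear extrapolation: time for 100% of the pass if the rate so far holds."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return parse_elapsed(elapsed) * 100.0 / percent_done


if __name__ == "__main__":
    # The two progress samples quoted above, both from the first write pass.
    for percent, elapsed in ((33.35, "49:47"), (54.10, "1:17:04")):
        total = estimate_pass_seconds(percent, elapsed)
        print(f"{percent}% after {elapsed}: about {total / 3600:.1f} h for this pass")
```

Both samples extrapolate to roughly 2.4 to 2.5 hours for that single write pass. A full run has four write passes and four read passes, so the total is several times that.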

-Destructive testing isn't the only method possible.

Wikipedia offered this statement:

badblocks -nvs /dev/sdb

This would check the drive "sdb" in non-destructive read-write mode and display progress by writing out the block numbers as they are checked.

My current distro man page confirms the -n is nondestructive.

-n Use non-destructive read-write mode. By default only a non-destructive read-only test is done.
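Conceptually, a non-destructive read-write test saves each block, overwrites it with a test pattern, verifies the pattern, and then puts the original data back. The sketch below shows that sequence on an ordinary file. It is not badblocks' implementation; the function name, block size and fixed pattern are assumptions, and an interruption between the overwrite and the restore would still lose that block's data.

```python
import os


def nondestructive_check(path, block_size=4096, pattern=0xAA):
    """Test each full block of `path` in place and put the original data back."""
    bad_blocks = []
    test_block = bytes([pattern]) * block_size
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size   # works for regular files, not block devices
        for block in range(size // block_size):
            offset = block * block_size
            f.seek(offset)
            original = f.read(block_size)     # 1. save the existing contents
            f.seek(offset)
            f.write(test_block)               # 2. overwrite with the test pattern
            f.flush()
            os.fsync(f.fileno())
            f.seek(offset)
            if f.read(block_size) != test_block:   # 3. read back and compare
                bad_blocks.append(block)
            f.seek(offset)
            f.write(original)                 # 4. restore what was there
        f.flush()
        os.fsync(f.fileno())
    return bad_blocks
```

Even though the data survives, every block is still written twice, so this mode wears the flash more than a read-only check does.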

-And finally, the "it isn't worth it" statement.

To summarize: with billions of memory cells in a flash chip, a failure is a cell that has already been written and erased tens of thousands of times and is now failing. And when one test shows a cell has failed, remember that every file you have added and erased has been running up those cycles.

The idea here is that when one cell fails, many more cells are approaching the same failure point. One cell failed today, but you keep using the stick normally for a while longer; then 3 more cells fail, then 24 more, then 183, and before you know it the memory array is riddled with bad spots. Only so many cells can die before your usable capacity begins to fall, eventually rapidly. How will you know more cells are failing? So the posts here are guarding your data by saying that once you have a bad cell, you are pretty much done as far as trustworthy storage is concerned. You might still get a few months of use out of it.

It's your data.

HTH

Alex M · Answer 5 · 2020-01-22T23:40:10.267

Nobody seems to have mentioned a failure variant I ran into - a more general controller/interface failure.

When you plug a USB device in, it will generate some lines in dmesg. e.g.

 [ 3209.991107] usb 2-1.1: New USB device found, idVendor=0951, idProduct=1666
 [ 3209.991117] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
 [ 3209.991123] usb 2-1.1: Product: DataTraveler 3.0
 [ 3209.991129] usb 2-1.1: Manufacturer: Kingston

You can then run: lsusb

For more info you can focus on the Vendor ID:Product ID

lsusb -v -d 0951:1666

If your drive has been probed and recognised by the kernel you'll see a new /dev/sd? entry for a block storage device. If it hasn't automounted a filesystem, you can try to access the filesystem structure (as opposed to content):

e.g. mount /dev/sdb1 /mnt

In my case I had a fritzed controller on a new USB stick rather than dying NAND cells on an older one...

dmesg spat this out a while later, amongst many other messages:

[ 3356.078359] usb 2-1.1: new high-speed USB device number 36 using ehci-pci
[ 3361.098287] usb 2-1.1: device descriptor read/8, error -110
[ 3366.217872] usb 2-1.1: device descriptor read/8, error -110  
[ 3366.321702] usb 2-1-port1: unable to enumerate USB device

So, for me, once I'd finally got the USB filesystem mounted, half way through an fsck (to walk more NAND cells) it keeled over entirely and never came 'online' again!
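Messages like these can be picked out of a long dmesg log mechanically. The following is a small sketch, assuming the line formats shown above; the regular expressions and the function name are my own and will not cover every message the USB subsystem can print. Error -110 is the kernel's ETIMEDOUT, meaning the device did not answer in time.

```python
import re

# Linux errno values that commonly show up in USB error messages.
LINUX_ERRNO = {32: "EPIPE", 71: "EPROTO", 110: "ETIMEDOUT"}

ERROR_RE = re.compile(r"usb (\S+): .*error (-\d+)")
ENUM_RE = re.compile(r"usb (\S+): unable to enumerate USB device")


def usb_problems(dmesg_text):
    """Return (port, problem) tuples for USB error lines found in dmesg output."""
    problems = []
    for line in dmesg_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            code = -int(match.group(2))
            problems.append((match.group(1), LINUX_ERRNO.get(code, f"errno {code}")))
            continue
        match = ENUM_RE.search(line)
        if match:
            problems.append((match.group(1), "enumeration failed"))
    return problems
```

Feeding it the output of dmesg (for example captured with subprocess.run(["dmesg"], capture_output=True, text=True).stdout, which may need root) gives a short list of the ports that misbehaved.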

Look for Krzysztof Opasiak - Debugging Usually Slightly Broken (USB) Devices and Drivers on YouTube

Hope this adds a little more useful background, beyond the dying NAND cells scenario.

score 0 · Answer 6 · answered Mar 09 '22 at 01:24

mke2fs can also check for bad blocks.

As it overwrites the disk, please back up the data on the disk before you proceed.

mke2fs -ccv /dev/sdb

Quote man 8 mke2fs:

-c Check the device for bad blocks before creating the file system. If this option is specified twice, then a slower read-write test is used instead of a fast read-only test.

Of course, this method assumes you want an ext2/ext3/ext4 filesystem on the disk, since mke2fs creates one.

David Pickett · Answer 7 · 2015-08-15T17:36:04.407

Many failures are either complete or make one physical location answer for multiple addresses. I wrote a little random write/read program that uses a prime-number increment as a simple pseudo-random generator, for both patterns and addresses. The reads are staggered behind the writes by enough pages to ensure I am not testing the RAM cache on the system. It is not yet parameterized, just set up for a 64G device on my system with 8G of RAM. Feel free to criticize, parameterize, make it smarter.

This is a powerful check and faster than doing every byte bottom to top, but it is also a great swap generator (it rolls almost everything else out). I set swappiness to 1 temporarily, and it became slower but more tolerable to other apps. Any tips on how to tune against swapout would also be appreciated:

$ sudo ksh -c 'echo 1 > /proc/sys/vm/swappiness'

$ cat mysrc/test64g.c

#define _FILE_OFFSET_BITS 64    // 64-bit off_t so lseek() can reach past 2 GiB on 32-bit systems
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

int main( int argc, char **argv ){

        long long int mask = 0xFFFFFFFF8LL ;   // wraps addresses at 64 GiB, 8-byte aligned
        long long int stag = 8413257 ;  // reads trail writes by 8G / 1021 operations
        long long int inc = 1021L ;     // prime < 1024

        long long int w_addr = 0L ;
        long long int r_addr = 0L ;
        long long int w_ct = 0L ;
        long long int r_ct = 0L ;
        // unsigned: 0xFEDCBA9876543210 does not fit in a signed long long
        unsigned long long int w_patt = 0xFEDCBA9876543210ULL ;
        unsigned long long int r_patt = 0xFEDCBA9876543210ULL ;
        unsigned long long int r_buf ;
        int fd, ret ;

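        if ( argc < 2
          || argv[1] == NULL
          || 0 > ( fd = open( argv[1], O_RDWR ))){
                // argv[1] may be missing here, so guard it before printing
                fprintf( stderr, "Fatal: Cannot open file %s for RW.\n",
                         ( argc > 1 && argv[1] ) ? argv[1] : "(none given)" );
                exit( 1 );
        }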

        while ( 1 ){
                if ( (off_t)-1 == lseek( fd, w_addr & mask, SEEK_SET )){
                        printf( "Seek to %llX\n", w_addr & mask );
                        perror( "Fatal: Seek failed" );
                        exit( 2 );
                }

                if ( 8 != ( ret = write( fd, (void*)&w_patt, 8 ))){
                        printf( "Write at %llX\n", w_addr & mask );
                        perror( "Fatal: Write failed" );
                        exit( 3 );
                }

                w_ct++ ;
                w_addr += inc ;
                w_patt += inc ;

                if ( ( w_ct - r_ct ) < stag ){
                        continue ;
                }

                if ( (off_t)-1 == lseek( fd, r_addr & mask, SEEK_SET )){
                        printf( "Seek to %llX\n", r_addr & mask );
                        perror( "Fatal: Seek failed" );
                        exit( 4 );
                }

                if ( 8 != ( ret = read( fd, (void*)&r_buf, 8 ))){
                        // report the read address, not the write address
                        printf( "Read at %llX\n", r_addr & mask );
                        perror( "Fatal: Read failed" );
                        exit( 5 );
                }

                if ( ( ++r_ct & 0XFFFFF ) == 0 ){
                        printf( "Completed %lld writes, %lld reads.\n", w_ct, r_ct );
                }

                if ( r_buf != r_patt ){
                        printf( "Data miscompare on read # %lld at address %llX:\nWas: %llX\nS/B: %llX\n\n", r_ct, r_addr & mask, r_buf, r_patt );
                }

                r_addr += inc ;
                r_patt += inc ;
        }
}

I Think before I am · Answer 8 · 2023-08-16T19:03:45.973

The OP asked two/three questions in one go.

1. How to check the health status of a USB stick?

2a. How do I know that a USB [stick] is broken beyond repair

2b. or repairable?

Sadly, the OP never received the correct answer, until now.

The answer is fairly simple:

1. You can't.

2a. If it fails to hold the data written to it.

2b. It never is.

Although this might look like a derogatory answer at first glance, it is true nonetheless.

The only health (or SMART) system for flash memory is you. You, yourself, need to keep track of the number of bytes written to it.

The controller in the USB flash drive may, or may not, keep an account of empty cells. (Some do, some don't, price is not really an indication. Specs are.)

Since (supposedly) a single cell can change its value only about 200 to 1000 times, the life expectancy can vary wildly depending on size, manufacturer and usage.
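To put numbers on that, here is a rough sketch of the bookkeeping. It is an idealized estimate, not a measurement: it assumes perfect wear leveling, and the write_amplification parameter and both function names are my own additions for illustration. Real sticks will do worse than this.

```python
def endurance_bytes(capacity_bytes, cycles, write_amplification=1.0):
    """Idealized total bytes that can be written before the cells wear out."""
    return capacity_bytes * cycles / write_amplification


def remaining_fraction(bytes_written, capacity_bytes, cycles, write_amplification=1.0):
    """Fraction of the idealized write budget left, clamped at zero."""
    budget = endurance_bytes(capacity_bytes, cycles, write_amplification)
    return max(0.0, 1.0 - bytes_written / budget)


if __name__ == "__main__":
    capacity = 64 * 10**9  # a nominal 64 GB stick
    for cycles in (200, 1000):
        total_tb = endurance_bytes(capacity, cycles) / 10**12
        print(f"{cycles} cycles: about {total_tb:.1f} TB of writes in the ideal case")
```

For a nominal 64 GB stick, 200 to 1000 cycles works out to between 12.8 TB and 64 TB of total writes in the best case, which is the figure to compare your own tally of bytes written against.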

Because most flash drives are used as WORMs (write once, read many) anyway, data retention is the more pertinent issue, and it is proudly advertised by the more expensive brands (more than 10 years, some claim).

Speed can also be an issue: I have noticed high-speed sticks becoming unreadable after 10 years of non-use, whereas low-speed ones still retained their data intact.

Furthermore, it is useless to "test" a flash drive's cells by "hammering" them: it'll only reduce their "health". (Unless you write once, and then verify the written data.)


The short answers to the OP's questions are therefore:

1. No

2a. You don't

2b. You can't.

Compare this to photographs, which even after more than 100 years still show a discernible picture.

Of course, that is write-once-read-many memory, and not comparable to USB flash memory.

Thank you for reading this. Peace, please.
