
I use a USB thumb drive to back up my most important data. I have noticed that some files write very slowly, and I finally realized what is different about them: these are files that are being overwritten.

For example, mailbox files that change from one backup to the next: one is called IN.MBX and another is OUT.MBX.

Overwriting the original file just crawls. But I can delete the old one and then copy over the new one in an instant. The files are hundreds of MB, but flash drives are fast when copying big files.

Someone asked a similar question here, "Why is it faster to copy than overwrite?", six years ago, and no one seems to have given a proper explanation.

Does anyone know why this is? It is a bit of a nuisance when doing backups each day.

3 Answers


It has to do with HOW Flash storage (as opposed to HDDs) is PHYSICALLY made.
I quote from Spiceworks:

In Flash, it's actually not possible to directly overwrite a particular physical data location.  If a cell has stored data, and the controller decides it needs to write new data there, the cell must first be ERASED, and then a new WRITE operation can occur at that location.  ERASE and WRITE are electrically different operations in Flash.  If we had to do an ERASE and a WRITE consecutively in order to "overwrite" a particular piece of stored data, it would greatly decrease the effective write speed experienced by the user.

This has to do with how Flash drives are made, physically. An ERASE means: set all the cells (= bits) of that block to 1. This needs HIGH energy (that is, a high voltage) and is slow. A WRITE means: set some of the bits to 0. This needs LOWER energy (a lower voltage) and is faster. (Depending on the exact technology, '0' and '1' can be inverted.)

This is actually the main reason that we will write new data in unused locations.  Wear leveling is important, but this differentiation between ERASE and WRITE operations dominates.
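A minimal Python sketch of that asymmetry, under the ERASE = all 1s convention above. Modelling a write as a bitwise AND of the old and new byte is my own simplification for illustration, and the 4-byte page is hypothetical; real pages are kilobytes and real cells may hold several bits each.

```python
# Toy model of flash cells under the convention above:
# ERASE sets every bit to 1; a WRITE (program) can only turn 1s into 0s.
ERASED = 0xFF  # one erased byte: all eight bits are 1

def erase(page: list) -> None:
    """Reset every byte of the page to all-ones (the slow, high-voltage step)."""
    for i in range(len(page)):
        page[i] = ERASED

def program(page: list, data: bytes) -> None:
    """Write without erasing: bits can only go 1 -> 0, so each byte becomes old AND new."""
    for i, b in enumerate(data):
        page[i] &= b

def can_program_in_place(old: bytes, new: bytes) -> bool:
    """True only if the new data never needs a 0 -> 1 transition."""
    return all((o & n) == n for o, n in zip(old, new))

page = [ERASED] * 4          # hypothetical 4-byte page
program(page, b"FOO")
print(bytes(page[:3]))       # b'FOO'
program(page, b"BAR")        # "overwrite" without erasing first
print(bytes(page[:3]))       # b'BAB': the 0 bits left by FOO cannot go back to 1
erase(page)
program(page, b"BAR")
print(bytes(page[:3]))       # b'BAR'
print(can_program_in_place(b"FOO", b"BAR"))  # False
```

An in-place overwrite only works when the new data happens to need no 0 to 1 transitions, which is almost never the case; otherwise the whole erase unit has to be erased first.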

Judging from the quote, the controller of your particular flash drive seems not to be well optimized. Normally, a flash drive would simply mark the old data as invalid and write the new file to a different location. Then, when the drive is idle, the regions holding old data are reset to ERASED (all bits 1).

Such behaviour can change, however, if you don't have enough free space: in that case, the controller may have to first DELETE, then ERASE, then WRITE. This is done file by file, so if you delete these files yourself first, the drive can do one big EMPTY operation, which is faster than switching back and forth between DELETE, ERASE and WRITE operations.
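The "write elsewhere now, erase later" behaviour, and the slowdown when no erased space is left, can be illustrated with a toy flash translation layer. This Python model is hypothetical (the class name, the greedy choice of which block to erase, and the in-block shortcut for relocating survivors are all my assumptions) and far simpler than any real controller.

```python
from __future__ import annotations

# Toy flash translation layer (FTL): a hypothetical, heavily simplified model.
class ToyFTL:
    def __init__(self, blocks: int, pages_per_block: int) -> None:
        self.ppb = pages_per_block
        self.nblocks = blocks
        # State of every physical page: "free" (erased), "valid" or "invalid" (stale).
        self.state = ["free"] * (blocks * pages_per_block)
        self.l2p: dict[int, int] = {}  # logical page -> physical page
        self.erases = 0                # block erases performed
        self.copies = 0                # valid pages moved by garbage collection

    def _next_free(self) -> int | None:
        for p, s in enumerate(self.state):
            if s == "free":
                return p
        return None

    def _garbage_collect(self) -> None:
        """Erase the block with the most stale pages, then put its valid pages back."""
        def stale(b: int) -> int:
            return self.state[b * self.ppb:(b + 1) * self.ppb].count("invalid")

        victim = max(range(self.nblocks), key=stale)
        start, end = victim * self.ppb, (victim + 1) * self.ppb
        survivors = [lp for lp, pp in self.l2p.items() if start <= pp < end]
        self.state[start:end] = ["free"] * self.ppb  # the slow ERASE
        self.erases += 1
        # Toy shortcut: real firmware would copy survivors to a spare block instead.
        for i, lp in enumerate(survivors):
            self.state[start + i] = "valid"
            self.l2p[lp] = start + i
            self.copies += 1

    def write(self, lpage: int) -> None:
        """Out-of-place write: never program over old data, just remap."""
        old = self.l2p.pop(lpage, None)
        if old is not None:
            self.state[old] = "invalid"  # old copy becomes garbage; no erase yet
        p = self._next_free()
        if p is None:                    # no erased page left: must erase first
            self._garbage_collect()
            p = self._next_free()
        if p is None:
            raise RuntimeError("no space left even after garbage collection")
        self.state[p] = "valid"
        self.l2p[lpage] = p

ftl = ToyFTL(blocks=2, pages_per_block=4)
for lp in range(4):
    ftl.write(lp)               # first copy of 4 logical pages
for _ in range(4):
    ftl.write(0)                # rewrite page 0 while erased pages remain
print(ftl.erases)               # 0: every rewrite went to a fresh page
ftl.write(0)                    # no erased page left now
print(ftl.erases, ftl.copies)   # 1 0: the erased block held only stale copies
```

While spare erased pages exist, a rewrite costs no erase at all. Once they run out, the write has to wait for an erase, plus copies of any still-valid pages in the erased block, which is one way a nearly full drive ends up slow on overwrites.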

Moreover:

the smallest unit that we can ERASE in Flash is known as a BLOCK (not to be confused with a logical block at the file system level).  A Flash BLOCK typically consists of 256 or 512 PAGES, so a Flash BLOCK can be up to 8 MB.
This difference between a 16 kB write size and an 8 MB erase size means that the storage device firmware needs to do some juggling when it comes time to start ERASING NAND BLOCKS in order to free up NAND PAGES for new writes from the host computer. This process is known as Garbage Collection. So, when you delete or change a file, the operating system will mark the corresponding set of logical blocks as invalid, and this sets off a chain of events which will eventually end up in the old data being ERASED, irretrievably.
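The size mismatch in the quote is simple arithmetic. This sketch assumes the quoted 16 kB page size and also shows what a hypothetical firmware that never remaps would pay to change one page of a full block; the function names are illustrative only.

```python
from __future__ import annotations

PAGE_KB = 16  # write (page) size from the quote

def erase_block_kb(pages_per_block: int, page_kb: int = PAGE_KB) -> int:
    """Size of one erase block in kB."""
    return pages_per_block * page_kb

def naive_in_place_update(valid_pages: int) -> tuple[int, int]:
    """Hypothetical firmware that never remaps: to change one page it erases the
    whole block and reprograms every page that was still valid in it.
    Returns (block_erases, page_programs)."""
    return 1, valid_pages

for ppb in (256, 512):
    kb = erase_block_kb(ppb)
    print(f"{ppb} pages x {PAGE_KB} kB = {kb} kB = {kb // 1024} MB per erase block")

erases, programs = naive_in_place_update(valid_pages=512)
print(f"changing one {PAGE_KB} kB page in place: {erases} erase + {programs} page programs")
```

Writing the changed page to an already-erased page elsewhere costs a single page program and no erase, which is why firmware prefers it.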

What can you do?
a) buy a USB drive with more space, to avoid lengthy ERASE cycles
b) or, if your drive is already big enough, try a different manufacturer who may have a better algorithm for 'overwrite' commands (e.g. overwrite = delete and write at a different location)
c) buy a small external SSD, which will be MUCH faster.
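The workaround from the question (delete the old copy, then copy the new file) is easy to automate. A minimal Python sketch; the function name and the paths in the usage comment are hypothetical placeholders, and whether deleting first actually helps depends on the drive, file system and OS, so time it on your own setup.

```python
import shutil
from pathlib import Path

def backup_file(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir, deleting an existing backup first instead of overwriting it."""
    dest = dest_dir / src.name
    if dest.exists():
        dest.unlink()            # remove the old backup first
    shutil.copy2(src, dest)      # write a fresh copy (copy2 also preserves timestamps)
    return dest

# Hypothetical usage; the paths are placeholders:
# for name in ("IN.MBX", "OUT.MBX"):
#     backup_file(Path("C:/Mail") / name, Path("E:/Backup"))
```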

I'll try to find a comprehensive article I read some months ago and add it as a reference.

1NN
  1. The USB interface is slow; USB 3.0 (blue) is slower than USB 3.1 (red), and USB 2 (black) is slower still.

  2. You don't have the same kind of IC in a flash drive as you do in a solid state drive (SSD). The controller in SSDs runs the multi-chip drive much faster than the single-chip flash drive.

  3. Overwriting is slightly slower than writing a new file, because after all the blocks are written, the File Allocation Table must also be updated to show that the space used by the original version of the file can be reused. This, however, is only a minor increase in workload.
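For scale, the nominal signalling rates put a hard lower bound on copy time. This sketch assumes a 300 MB file (the question only says "hundreds of MB") and ignores encoding and protocol overhead, so real copies are noticeably slower. Note that USB 3.1 Gen 1 runs at the same 5 Gbit/s as USB 3.0; only USB 3.1 Gen 2 reaches 10 Gbit/s.

```python
# Nominal signalling rates in Mbit/s; real-world throughput is lower.
RATES_MBIT_S = {
    "USB 2.0": 480,
    "USB 3.0 / USB 3.1 Gen 1": 5_000,
    "USB 3.1 Gen 2": 10_000,
}

def min_copy_seconds(size_mb: float, rate_mbit_s: float) -> float:
    """Lower bound on transfer time: megabytes -> megabits, divided by the link rate."""
    return size_mb * 8 / rate_mbit_s

FILE_MB = 300  # hypothetical file size
for name, rate in RATES_MBIT_S.items():
    print(f"{name}: at least {min_copy_seconds(FILE_MB, rate):.2f} s for {FILE_MB} MB")
```

Since the slow overwrite and the fast fresh copy in the question go through the same port, the interface speed alone can't explain the difference between them.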

The process is different from what my learned colleague Jake Gould explained; 1NN hit the nail on the head. With flash memory, you don't overwrite the previously used blocks; instead, the new file is written, and then the prior version is marked as vacant in the FAT.
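A toy model of that write-new-then-free sequence. The table is a simplified FAT-like structure (clusters numbered from 2, 0 meaning free, and -1 standing in for the real end-of-chain marker), and the helper names are hypothetical; real file systems and copy tools don't necessarily overwrite a file this way.

```python
from __future__ import annotations

FREE, EOC = 0, -1  # toy markers: 0 = free cluster, -1 = end of chain

def allocate_chain(fat: list[int], n: int) -> int:
    """Link n free clusters into a chain and return its first cluster."""
    if n < 1:
        raise ValueError("need at least one cluster")
    free = [c for c in range(2, len(fat)) if fat[c] == FREE][:n]
    if len(free) < n:
        raise OSError("disk full")
    for cur, nxt in zip(free, free[1:]):
        fat[cur] = nxt
    fat[free[-1]] = EOC
    return free[0]

def free_chain(fat: list[int], start: int) -> None:
    """Mark every cluster of a chain as free; the data bytes themselves are left alone."""
    c = start
    while c != EOC:
        nxt = fat[c]
        fat[c] = FREE
        c = nxt

def replace_file(fat: list[int], directory: dict[str, int], name: str, n_clusters: int) -> None:
    """Write the new version to free clusters, repoint the directory entry, then free the old chain."""
    old = directory.get(name)
    directory[name] = allocate_chain(fat, n_clusters)
    if old is not None:
        free_chain(fat, old)

fat = [FREE] * 10                          # clusters 2..9 usable
directory: dict[str, int] = {}
replace_file(fat, directory, "IN.MBX", 3)  # first backup: clusters 2-3-4
replace_file(fat, directory, "IN.MBX", 3)  # next backup: 5-6-7, then 2-3-4 freed
print(directory["IN.MBX"], fat[2:8])       # 5 [0, 0, 0, 6, 7, -1]
```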

K7AAY

As the answer to the linked question says, if you want to overwrite data, you can't simply replace it.

It's just like writing with a pencil on paper: once you have written something, you can't write over it because it won't look right, so you have to erase it first.

This is compounded by blocks/sectors: every time you want to overwrite a block, you must erase the whole block, then write it.

“Erasing is slower than writing.” - David Schwartz

For example, [000000000000000000000000000000000000000000000000] is one block. Each digit is one bit, so a block holds 48 bits (6 bytes). This theoretical USB drive has 5 blocks, for a total of 240 bits (30 bytes). Assume that erasing and writing run at the same speed, 1 byte/s.

You create a text file that says 'FOO' (ASCII 0x46 0x4F 0x4F, which is the first 24 bits below). It is written to the first block.

Before:

[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

After:

[010001100100111101001111000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

So one block has been written. Now suppose you want to overwrite the file with 'BAR'. You first have to erase the block. (I don't know whether the drive zeroes the block before erasing it, so that step is shown too.)

Current:

[010001100100111101001111000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

Zeroing:

[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

Erasing (only the block being overwritten):

[111111111111111111111111111111111111111111111111]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

Writing:

[010000100100000101010010111111111111111111111111]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]
[000000000000000000000000000000000000000000000000]

As you can see, overwriting takes more operations. If you delete the file instead, you simply remove the file pointer and the old content remains as raw data; after that, it is just an erase followed by a write.
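The bit patterns in this example can be checked with a few lines of Python. This sketch assumes plain ASCII, the toy 48-bit block and the 1 byte/s rate from the example; it skips the uncertain zeroing step, and the helper names are hypothetical.

```python
BLOCK_BITS = 48       # the toy block above: 48 bits = 6 bytes
BYTES_PER_SECOND = 1  # the example's assumption for both erase and write

def to_bits(text: str) -> str:
    """ASCII text -> string of 8 bits per character."""
    return "".join(f"{ord(ch):08b}" for ch in text)

def block_after_write(text: str, fill: str) -> str:
    """A block after writing text at its start; the remaining bits keep their `fill` value."""
    bits = to_bits(text)
    return bits + fill * (BLOCK_BITS - len(bits))

def seconds_fresh_write(n_bytes: int) -> int:
    """Writing into a block that is still blank: just the write."""
    return n_bytes // BYTES_PER_SECOND

def seconds_overwrite(n_bytes: int) -> int:
    """Reusing a written block: erase the whole block, then write."""
    return (BLOCK_BITS // 8 + n_bytes) // BYTES_PER_SECOND

print(block_after_write("FOO", "0"))  # matches the first row of the 'After' diagram
print(block_after_write("BAR", "1"))  # erased (1) bits that BAR doesn't clear stay 1
print(seconds_fresh_write(3), seconds_overwrite(3))  # 3 9
```

In this toy model, putting 'BAR' into one of the four still-blank blocks takes 3 s, while reusing the first block takes 9 s because the whole 6-byte block must be erased first.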

Giacomo1968
Phoenix