
SMR drives seem best suited to write-once, read-many workloads. I have a significant number of hardlink-duplicated directory structures stored on an SMR device and wished to delete the latest one, but the process is excruciatingly slow. Either there has been a fault, or this is expected behaviour.

I can imagine several reasons why this may be; does concrete information exist to explain it?

Thoughts:

  1. Updating the file table requires modifying early parts of the disk, forcing significant rewriting of data because of SMR's shingled rewrite behaviour.
  2. Hardlinked files require reference-count updates, which again modify the file table and force significant rewriting of data.
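The hardlink point in (2) can be seen directly: a minimal sketch (in Python, using a temporary directory) showing that removing one name of a hardlinked file is purely a metadata update to the inode's link count — which on SMR may still mean rewriting the whole shingled zone holding that metadata.

```python
import os
import tempfile

# Two names, one inode: os.link creates a hardlink.
d = tempfile.mkdtemp()
src = os.path.join(d, "a")
lnk = os.path.join(d, "b")
with open(src, "w") as f:
    f.write("data")
os.link(src, lnk)                 # second name for the same inode

print(os.stat(src).st_nlink)      # 2 — both names count
os.unlink(lnk)                    # "deleting" one name only
print(os.stat(src).st_nlink)      # 1 — data untouched, metadata changed
```

No file data moved at all here; only the link count in the filesystem metadata changed, which is why deleting a hardlink forest is metadata-write-bound rather than data-write-bound.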

More generally, how does a filesystem function efficiently on SMR? The data is spread across the disk, but the file table must be kept separate, otherwise the shingling will kill performance. How could a drive-managed SMR disk know where the file table is in order to handle it specially? Are some filesystems better than others in this respect?

Specifics:

  • Drive: Seagate Archive 6TB
  • OS: CentOS 7 kernel 3.10.0
  • Filesystem: BTRFS
  • Drive usage: 66%
J Collins

1 Answer


BTRFS is copy-on-write (CoW), meaning that (in general) data blocks become heavily fragmented (I use ZFS, which has the same issue), and that includes the file table.

Likely every sector that gets modified affects a whole shingled zone, and that explains the terrible performance you see: say you are writing 10 data blocks (10 × 4 KiB = 40 KiB). Due to lack of locality, you may in fact be reading 10 × 10 MiB (100 MiB) and then writing another 100 MiB. Add the seek latency in the various steps of the process, and performance is killed.
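The amplification arithmetic is easy to sanity-check. A back-of-envelope sketch, assuming (as above) 4 KiB blocks each landing in a different 10 MiB shingled zone, and a zone rewrite costing one full read plus one full write:

```python
# Hypothetical numbers: 10 MiB zones is an illustrative assumption,
# not a spec for any particular drive.
BLOCK_KIB = 4     # filesystem block size
ZONE_MIB = 10     # assumed shingled-zone size
blocks = 10       # scattered blocks we intend to modify

intended_kib = blocks * BLOCK_KIB        # real payload: 40 KiB
moved_mib = blocks * ZONE_MIB * 2        # read + rewrite every zone

amplification = moved_mib * 1024 / intended_kib
print(f"intended write: {intended_kib} KiB")    # 40 KiB
print(f"actual I/O:     {moved_mib} MiB")       # 200 MiB
print(f"amplification:  {amplification:.0f}x")  # 5120x
```

A four-orders-of-magnitude blow-up from a 40 KiB metadata update is consistent with deletes that feel "excruciatingly slow".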

There is probably no solution other than switching to a non-SMR drive or to a non-CoW filesystem.

Edit: additional info on SMR-aware filesystems: https://lwn.net/Articles/637035/

http://www.tomsitpro.com/articles/shingled-magnetic-recoding-smr-101-basics,2-933.html#p3 "HGST is utilizing 256MB bands in its inaugural offerings. Seagate indicates that the sizes of the bands are adjustable for custom drive workloads and applications" — so for every sector (4 KiB) you modify, you are potentially reading AND writing 256 MiB.

FarO