The observed effect is essentially correct, but the method is flawed for at least two reasons.
Various buffer sizes are tested, not various file sizes. This is because dd reads from and writes to the same two files throughout; only the bs= parameter (the size of each read and write chunk) changes.
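For example, both of the following commands move the same 1 MiB of data; only the size of the chunks dd reads and writes differs (testfile is just a placeholder name):

$ dd if=/dev/zero of=testfile bs=1M count=1      # one read/write of 1 MiB
$ dd if=/dev/zero of=testfile bs=1K count=1024   # 1024 reads/writes of 1 KiB each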
The output speed of /dev/zero is affected by the size of the read request (the bs= buffer) because it delivers exactly the requested number of bytes per read; this is what the records in/out lines report.
The apparent slowdown at 1 byte is caused by the fact that creating the file takes relatively more time than reading and writing its contents, while at the speed hump around 10K the file creation ceases to dominate the overall time of the trial. The speed degradation from there on is governed by the burst (sequential) write speed of the media where the test file is located.
To conduct a benchmark of the relative speed of various file sizes, you would have to split a set amount of data, roughly 100 MB here, between files, for example (a sketch of one such trial follows the list):
100MB to 1 file,
10MB to 10 files,
1MB to 100 files,
500KB to 204 files,
64KB to 1600 files,
1KB to 102400 files.
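A minimal sketch of one such trial, assuming bash and GNU dd (testdir, SIZE and COUNT are made-up names and values):

#!/bin/bash
# Sketch: write a fixed total amount of data as COUNT files of SIZE each,
# then time the whole run.  Repeat with different SIZE/COUNT pairs that
# keep SIZE * COUNT constant.
SIZE=1MB        # size of each file (dd size-suffix syntax)
COUNT=100       # number of files
mkdir -p testdir
time for ((i = 0; i < COUNT; i++)); do
    dd if=/dev/zero of=testdir/file_$i bs=$SIZE count=1 2>/dev/null
done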
Other factors come into play as well, such as the block/sector size of the media and the file system's block size, i.e. its minimum allocation unit for a single file.
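Those values can be inspected with something like this (GNU stat and util-linux blockdev; /dev/sda is only an example device name, and blockdev usually needs root):

$ stat -f -c 'filesystem block size: %S' .
$ blockdev --getss /dev/sda      # logical sector size
$ blockdev --getpbsz /dev/sda    # physical sector size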
Reactions to a comment by sawdust:
"The 'bs' in the dd command is for block size, and combined with the count, a suitable transfer size is specified to calculate I/O rate."
A suitable and exact size may be chosen with those two, but the I/O rate will be worse for very low bs=. From what I know (man dd, GNU Coreutils), the dd parameter bs=BYTES is explained with "read and write up to BYTES bytes at a time". To me, that sounds like:
size_t bs;                                  /* the bs= value */
char buffer[bs];                            /* one buffer of bs bytes */
ssize_t n;
/* each read() asks for at most bs bytes; it may return fewer */
while ((n = read(fd, buffer, bs)) > 0) {
    /* ... use the n bytes in buffer ... */
}
In other words, many small reads and writes will take longer than one large one. If you feel that this is a bogus statement, you are welcome to conduct an experiment where you first fill a 5 L bucket using teaspoons, and then compare the time required to complete the same task using cups. If you happen to be more familiar with the internal workings of dd, please present your evidence for why bs= is "block size" and what that means in the code.
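One way to see the per-chunk cost directly is to trace the system calls dd makes. Both commands below copy 2048 bytes, but the first one does it in four times as many read()/write() pairs for the data (strace will also show a few unrelated reads at startup):

$ strace -e trace=read,write dd if=/dev/zero of=/dev/null bs=512 count=4
$ strace -e trace=read,write dd if=/dev/zero of=/dev/null bs=2048 count=1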
Here is why my second statement makes sense:
fixed trial.sh:
MB10="
10MB 1
1MB 10
512KB 20
256KB 40
128KB 80
64KB 160
32KB 320
16KB 640
4KB 2560
1KB 10240
512 20480
64 163840
1 10485760
"
BLOCKS100="
10MB 100
1MB 100
512KB 100
256KB 100
128KB 100
64KB 100
32KB 100
16KB 100
4KB 100
1KB 100
512 100
256 100
128 100
64 100
32 100
16 100
4 100
1 100
"
function trial {
    BS=(`echo -e "$1" | awk '{print $1}'`)
    CO=(`echo -e "$1" | awk '{print $2}'`)
    printf "%-8s %-18s %7s %12s %8s\n" bs count data time speed
    for ((i=0;i<${#BS[@]};i++ )); do
        printf "%-8s %-18s" "bs=${BS[i]}" "count=${CO[i]}"
        dd if=/dev/zero of=/dev/null bs=${BS[i]} count=${CO[i]} \
            |& awk '/bytes/ { printf "%10s %-12s %8s\n", $3" "$4, $6, $8""$9 }'
    done
    echo
}
trial "$BLOCKS100"
trial "$MB10"
$ sh trial.sh
bs count data time speed
bs=10MB count=100 (1.0 GB) 0.781882 1.3GB/s
bs=1MB count=100 (100 MB) 0.0625649 1.6GB/s
bs=512KB count=100 (51 MB) 0.0193581 2.6GB/s
bs=256KB count=100 (26 MB) 0.00990991 2.6GB/s
bs=128KB count=100 (13 MB) 0.00517942 2.5GB/s
bs=64KB count=100 (6.4 MB) 0.00299067 2.1GB/s
bs=32KB count=100 (3.2 MB) 0.00166215 1.9GB/s
bs=16KB count=100 (1.6 MB) 0.00111013 1.4GB/s
bs=4KB count=100 (400 kB) 0.000552862 724MB/s
bs=1KB count=100 (100 kB) 0.000385104 260MB/s
bs=512 count=100 (51 kB) 0.000357936 143MB/s
bs=256 count=100 (26 kB) 0.000509282 50.3MB/s
bs=128 count=100 (13 kB) 0.000419117 30.5MB/s
bs=64 count=100 (6.4 kB) 0.00035179 18.2MB/s
bs=32 count=100 (3.2 kB) 0.000352209 9.1MB/s
bs=16 count=100 (1.6 kB) 0.000341594 4.7MB/s
bs=4 count=100 (400 B) 0.000336425 1.2MB/s
bs=1 count=100 (100 B) 0.000345085 290kB/s
bs count data time speed
bs=10MB count=1 (10 MB) 0.0177581 563MB/s 566MB/s 567MB/s
bs=1MB count=10 (10 MB) 0.00759677 1.3GB/s 1.3GB/s 1.2GB/s
bs=512KB count=20 (10 MB) 0.00545376 1.9GB/s 1.9GB/s 1.8GB/s
bs=256KB count=40 (10 MB) 0.00416945 2.5GB/s 2.4GB/s 2.4GB/s
bs=128KB count=80 (10 MB) 0.00396747 2.6GB/s 2.5GB/s 2.6GB/s
bs=64KB count=160 (10 MB) 0.00446215 2.3GB/s 2.5GB/s 2.5GB/s
bs=32KB count=320 (10 MB) 0.00451118 2.3GB/s 2.4GB/s 2.4GB/s
bs=16KB count=640 (10 MB) 0.003922 2.6GB/s 2.5GB/s 2.5GB/s
bs=4KB count=2560 (10 MB) 0.00613164 1.7GB/s 1.6GB/s 1.7GB/s
bs=1KB count=10240 (10 MB) 0.0154327 664MB/s 655MB/s 626MB/s
bs=512 count=20480 (10 MB) 0.0279125 376MB/s 348MB/s 314MB/s
bs=64 count=163840 (10 MB) 0.212944 49.2MB/s 50.5MB/s 52.5MB/s
bs=1 count=10485760 (10 MB) 16.0154 655kB/s 652kB/s 640kB/s
The relevant part for the second flaw is the run where the data size is constant (10 MB).
The speeds are clearly lower for very small chunks. I'm not sure how to explain
the "drop" at bs=10MB, but my guess is that it is due to how dd handles buffering
for very large chunks.
I had to get to the bottom of this (thanks to sawdust for challenging my assumption)...
It turns out that my assumption of buffer size == bs is not exact, but it is not
entirely false either: the bs parameter does determine the buffer size (plus some slop),
as demonstrated with a dd patched to print its buffer sizes. It also means that my second
flaw is not relevant for files < 8K on systems where the page size is 4K:
$ ./dd if=/dev/zero of=/dev/null bs=1 count=1
OUTPUT_BLOCK_SLOP: 4095
MALLOC INPUT_BLOCK_SLOP: 8195, ibuf: 8196
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000572 s, 1.7 kB/s
$ ./dd if=/dev/zero of=/dev/null bs=1805 count=1
OUTPUT_BLOCK_SLOP: 4095
MALLOC INPUT_BLOCK_SLOP: 8195, ibuf: 10000
1+0 records in
1+0 records out
1805 bytes (1.8 kB) copied, 0.000450266 s, 4.0 MB/s
(dd.c from coreutils 8.20)
line:text
21: #define SWAB_ALIGN_OFFSET 2
97: #define INPUT_BLOCK_SLOP (2 * SWAB_ALIGN_OFFSET + 2 * page_size - 1)
98: #define OUTPUT_BLOCK_SLOP (page_size - 1)
1872: real_buf = malloc (input_blocksize + INPUT_BLOCK_SLOP); // ibuf
1889: real_obuf = malloc (output_blocksize + OUTPUT_BLOCK_SLOP); // obuf
// only if conversion is on, otherwise obuf = ibuf
2187: page_size = getpagesize ();
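Plugging the constants above into the SLOP formulas, assuming a 4K page size, reproduces the instrumented numbers (plain shell arithmetic, nothing measured):

$ echo $(( 1 + 2*2 + 2*4096 - 1 ))      # ibuf for bs=1
8196
$ echo $(( 1805 + 2*2 + 2*4096 - 1 ))   # ibuf for bs=1805
10000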
man 3 memcpy
void *memcpy(void *dest, const void *src, size_t n);
The memcpy() function copies n bytes from memory area src to memory area
dest. The memory areas must not overlap. Use memmove(3) if the memory
areas do overlap.