
I wanted to produce a 1 GB random file, so I used the following command:

dd if=/dev/urandom of=output bs=1G count=1

But instead every time I launch this command I get a 32 MB file:

<11:58:40>$ dd if=/dev/urandom of=output bs=1G count=1
0+1 records in
0+1 records out
33554431 bytes (34 MB, 32 MiB) copied, 0,288321 s, 116 MB/s

What is wrong?

EDIT:

Thanks to the great answers in this topic, I came up with a solution that reads 32 chunks of 32 MB each, which makes 1 GB:

dd if=/dev/urandom of=output bs=32M count=32

Another solution was given that reads 1 GB straight into memory and then writes it to disk. This solution takes a lot of memory, so it is not preferred:

dd if=/dev/urandom of=output bs=1G count=1 iflag=fullblock
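Either way, you can verify the resulting size afterwards (stat -c %s prints the size in bytes on GNU coreutils):

stat -c %s output
1073741824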

2 Answers


bs, the buffer size, means the size of a single read() call done by dd.

(For example, both bs=1M count=1 and bs=1k count=1k will result in a 1 MiB file, but the first version will do it in a single step, while the second will do it in 1024 small chunks.)
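You can check that equivalence yourself; a minimal sketch using /dev/zero, so the two results are byte-for-byte comparable (the file names a.bin and b.bin are arbitrary):

dd if=/dev/zero of=a.bin bs=1M count=1    # one 1 MiB read()
dd if=/dev/zero of=b.bin bs=1k count=1k   # 1024 reads of 1 KiB each
cmp a.bin b.bin && echo identical         # both files are 1048576 bytes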

Regular files can be read at nearly any buffer size (as long as that buffer fits in RAM), but devices and "virtual" files often work very close to the individual calls and have some arbitrary restriction on how much data they'll produce per read() call.
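You can see such a restriction directly by asking the device for one oversized read; a sketch, assuming a Linux kernel with the clamp described below (the exact byte count and speed will vary):

dd if=/dev/urandom of=/dev/null bs=64M count=1
0+1 records in
0+1 records out
33554431 bytes (34 MB, 32 MiB) copied, …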

For /dev/urandom, this limit is defined in urandom_read() in drivers/char/random.c:

#define ENTROPY_SHIFT 3

static ssize_t
urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
    nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
    ...
}

This means that every time the function is called, it clamps the requested size to at most 33554431 bytes (INT_MAX shifted right by ENTROPY_SHIFT + 3 = 6 bits).
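You can reproduce that figure with shell arithmetic (INT_MAX is 2^31 − 1 on these platforms):

echo $(( (2**31 - 1) >> 6 ))
33554431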

By default, unlike most other tools, dd will not retry after receiving less data than requested – you get the 32 MiB and that's it. (To make it retry automatically, as in Kamil's answer, you'll need to specify iflag=fullblock.)
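With iflag=fullblock, dd keeps calling read() until the full 1 GiB block is assembled, which shows up as a complete record in the summary (a sketch; your timing will differ):

dd if=/dev/urandom of=output bs=1G count=1 iflag=fullblock
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, …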


Note also that "the size of a single read()" means that the whole buffer must fit in memory at once, so massive block sizes also correspond to massive memory usage by dd.

And it's all pointless because you usually won't gain any performance when going above ~16–32 MiB blocks – syscalls aren't the slow part here, the random number generator is.

So for simplicity, just use head -c 1G /dev/urandom > output.
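head has no block-size bookkeeping to worry about (with GNU head, -c accepts the 1G suffix), and the result is easy to verify with wc -c, which counts bytes:

head -c 1G /dev/urandom > output
wc -c < output
1073741824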

grawity

dd may read less than ibs (note: bs specifies both ibs and obs), unless iflag=fullblock is specified. 0+1 records in indicates that 0 full blocks and 1 partial block were read. However, any block, full or partial, increments the counter.
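You can watch partial-block accounting with a tiny pipe; a sketch, where a 5-byte input against bs=10 guarantees a short read:

printf 'hello' | dd of=/dev/null bs=10 count=1
0+1 records in
0+1 records out
5 bytes copied, …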

Originally I didn't know the exact mechanism that made dd read a block smaller than 1G in this particular case; I guessed that each block is read into memory before it's written, so memory management might interfere (but that was only a guess). Edit: this concurrent answer explains the actual mechanism: /dev/urandom serves at most 33554431 bytes per read() call.

Anyway, I don't recommend such a large bs. I would use bs=1M count=1024. The most important thing is: without iflag=fullblock, any read attempt may return less than ibs (unless ibs=1, I think, though that is quite inefficient).

So if you need to read an exact amount of data, use iflag=fullblock. Note that iflag is not required by POSIX, so your dd may or may not support it. According to this answer, ibs=1 is probably the only POSIX way to read an exact number of bytes. Of course, if you change ibs, you will need to recalculate count. In your case, lowering ibs to 32M or less would probably fix the issue even without iflag=fullblock.
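For completeness, a POSIX-portable sketch of the ibs=1 approach (one read() per byte, so painfully slow; obs=1M just keeps the writes large):

dd if=/dev/urandom of=output ibs=1 obs=1M count=1073741824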

On my Kubuntu system, I would fix your command like this:

dd if=/dev/urandom of=output bs=1M count=1024 iflag=fullblock