12

I am looking for a way to clone a single disk drive to more than one disk drive at the same time.

I have prepared system images on 1 TB disks, and it takes almost 2 hours to clone one disk to another; the total time then grows linearly when I need, say, 30 disks cloned.

Is it possible to clone one disk to more than one target drive simultaneously?

mr.b

10 Answers

17

You can use bash's "process substitution" along with the tee command to do this:

cat drive.image | tee >(dd of=/dev/sda) >(dd of=/dev/sdb) >(dd of=/dev/sdc) | dd of=/dev/sdd

or for clarity (at the expense of a little efficiency) you can make the last dd be called the same way as the others and send the stdout of tee to /dev/null:

cat drive.image | tee >(dd of=/dev/sda) >(dd of=/dev/sdb) >(dd of=/dev/sdc) >(dd of=/dev/sdd) > /dev/null

and if you have it installed you can use pv (Pipe Viewer) instead of cat to get a useful progress indicator:

pv drive.image | tee >(dd of=/dev/sda) >(dd of=/dev/sdb) >(dd of=/dev/sdc) | dd of=/dev/sdd

This reads the source image only once, so the source drive does not suffer head-thrashing, which is probably why other methods slow down dramatically when you try to copy the image to several destinations in parallel. Using tee as above, the processes should run at the speed of the slowest destination drive.

If the destination drives are connected via USB, be aware that they may all be sharing bus bandwidth, so writing many of them in parallel may be no faster than writing them sequentially, because the USB bus becomes the bottleneck rather than the source or destination drives.

The above assumes you are using Linux or similar (it should work on OS X too, though the device names may be different). If you are using Windows or something else, you need a different solution.

Edit

Imaging over the network has a similar problem to imaging many drives over USB - the transport channel becomes the bottleneck instead of the drives - unless the software you use supports some form of broadcast or multicast transmission.

For the dd method you could probably daisy-chain netcat + tee + dd processes on each machine like so:

  1. Source machine cat/pv/dds the data through nc to destination machine 1.
  2. Destination machine 1 has nc listening for the data from the source machine and piping it through tee, which in turn sends it to dd (and so to the disk) and to another nc process which sends it to destination machine 2.
  3. Destination machine 2 has nc listening for the data from destination machine 1 and piping it through tee, which in turn sends it to dd (and so to the disk) and to another nc process which sends it to destination machine 3.
  4. and so on, until the last machine, which just has nc picking up the data from the previous machine and sending it to disk via dd (sketched below).
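
As a rough sketch of what each hop might run (the host names, port 9000 and the dd block size are placeholders, and the listen syntax differs between netcat variants, e.g. traditional netcat wants nc -l -p 9000):

# source machine: read the image once and push it to the first destination
pv drive.image | nc dest1 9000

# intermediate destinations: write to the local disk and forward the stream to the next hop
nc -l 9000 | tee >(dd bs=1M of=/dev/sdX) | nc dest2 9000

# last destination: just write the stream to its local disk
nc -l 9000 | dd bs=1M of=/dev/sdX

Start the listeners from the last machine backwards so that each forwarding nc already has somewhere to send.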

This way you are potentially using your full network bandwidth, assuming that your switch and network cards have all negotiated a full-duplex link. Instead of the source machine sending out 10 copies of the data (assuming 10 destination machines), with each copy limited to 1/10th of the outgoing bandwidth, it sends only one; each destination machine takes its copy of the data and sends it on again. You might need to tweak the buffer size settings of pv, nc and dd to get closer to the best practical performance.

If you can find some software that just supports multicast though, that would be much easier (and probably a little faster)! But the above is the sort of hacky solution I might be daft enough to try...

Edit Again

Another thought. If the drive image compresses well (which it will if large chunks of it are full of zeros), the outgoing bandwidth of the source machine need not be a problem even when sending to many destinations at once. Just compress the image first, transmit it to everywhere using tee+nc, and decompress on the destinations (network -> nc -> decompressor -> dd -> disk).
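
For example (a sketch only; gzip is just one possible compressor, and the host names, port and nc listen syntax are placeholders that vary between netcat variants):

# source machine: compress once, fan the compressed stream out to every destination
gzip -c drive.image | tee >(nc dest1 9000) >(nc dest2 9000) | nc dest3 9000

# each destination machine: receive, decompress and write to disk
nc -l 9000 | gunzip | dd bs=1M of=/dev/sdX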

5

The first answer on Google suggested (on a Linux system): dd if=/dev/sdb | tee >(dd of=/dev/sdc) >(dd of=/dev/sdd) >(dd of=/dev/sde) > /dev/null, where /dev/sdb is the hard drive you want to clone and /dev/sdc, /dev/sdd, and /dev/sde are the drives to clone to (you can add as many more of these as you want, just copy-paste). A LiveCD should do it, and remember to be careful with your device names!

marcusw
2

All I know is that there are devices called hard drive duplicators. These are dedicated devices for cloning (duplicating) one hard drive to multiple drives at the same time. Maybe this article helps you.

Diskilla
1

Since nobody has mentioned it yet, I'll mention Clonezilla and their Server Edition. (Unfortunately, there doesn't appear to be a direct link to it, but you can find "Server Edition" in the site's left nav menu...)

I've had great luck with Clonezilla Live edition but have yet to try Server Edition. Looks pretty slick though.

Chris_K
1

If you are using Mac OS X, this is built in. From the machine you are going to serve the image from, start a multicast asr session. On the clients, boot from a boot disk, open Terminal, and connect to the asr multicast stream. Free.

Details: http://www.bombich.com/mactips/multicast.html
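
From memory of the asr man page, the commands look roughly like the following (the image name, server address, target disk, and multicast.plist, a configuration file holding the multicast settings, are all placeholders; verify the exact option names with man asr):

# on the serving machine: stream the prepared image over multicast
sudo asr server --source system.dmg --config multicast.plist

# on each client (booted from another disk): restore from the multicast stream
sudo asr restore --source asr://server.example.com --target /dev/disk1 --erase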

speeds images
1

So many answers here are based on

cat … | tee >(dd of=/dev/sda) >(dd of=/dev/sdb) … | dd …

or a similar contraption, while tee alone can read from an image/disk and write to several disks, like this:

<source tee /dev/sda /dev/sdb … >/dev/sdz

Remember tee writes to its stdout; that's why the last destination is provided as a redirection.

You probably need sudo tee … to write to sda and such. The quirk is you need an elevated shell in the first place to set up the redirection to /dev/sdz. This can be solved by:

<source sudo tee /dev/sda /dev/sdb … /dev/sdz >/dev/null

but even this won't work if you need sudo to access the source. Then (and virtually only then) cat may be useful:

sudo cat source | sudo tee /dev/sda /dev/sdb … /dev/sdz >/dev/null

Or you can spawn an elevated shell for the task:

sudo sh -c '<source tee /dev/sda /dev/sdb … >/dev/sdz'

My point is you don't need process substitution, you don't need dd or cat. Reading and writing can be done with tee alone. It won't show you progress though, so pv is a sane addition:

pv source | tee /dev/sda /dev/sdb … >/dev/sdz

or (if sudo needed):

sudo pv source | sudo tee /dev/sda /dev/sdb … /dev/sdz >/dev/null

Still without >(…), useless cat or dd.

1

I found two useful links on the web relating to this. One uses dd without cat to do the disk duplication:

dd if=/dev/sdb | tee >(dd of=/dev/sdc) | tee >(dd of=/dev/sdj) | dd of=/dev/sdh

http://joshhead.wordpress.com/2011/08/04/multiple-output-files-with-dd-utility

Another link expands on this to add a progress meter:

dd if=/dev/sdb | pv -s $(blockdev --getsize64 /dev/sdb) | tee >(dd of=/dev/sdc) | tee >(dd of=/dev/sdj) | dd of=/dev/sdh

http://www.commandlinefu.com/commands/view/6177/dd-with-progress-bar-and-statistics

user35060
0

I wanted to expand on David's answer

pv drive.image | tee >(dd of=/dev/sda) >(dd of=/dev/sdb) >(dd of=/dev/sdc) | dd of=/dev/sdd

The drive.image can actually be another device, like /dev/sde.

Second, the dd command works much faster with a proper bs setting. I used bs=64k and saw a 6x speed increase copying a 40 GB partition, from 1 hour down to 10 minutes.

So the final command will look like this:

pv drive.image | tee >(dd bs=64k of=/dev/sda) >(dd bs=64k of=/dev/sdb) >(dd bs=64k of=/dev/sdc) | dd bs=64k of=/dev/sdd

If your source is a drive instead of a file, it'll look like this:

pv /dev/sde | tee >(dd bs=64k of=/dev/sda) >(dd bs=64k of=/dev/sdb) >(dd bs=64k of=/dev/sdc) | dd bs=64k of=/dev/sdd

Nelson
0

Parallel hard disk duplication is a common task in computer forensics. dc3dd (man page) is a dedicated tool that allows parallel copying of a single source to multiple destinations; it works like UNIX dd, but multiple of= options are allowed.

It is also possible to enable the hashing of the source volume and of the copies to verify their integrity.
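
A sketch of what such an invocation might look like (the device names and log file are placeholders; check the dc3dd man page for the exact option spelling on your version):

dc3dd if=/dev/sdb of=/dev/sdc of=/dev/sdd hash=sha256 log=clone.log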

pietrodn
0

Use case

This answer applies to a case where you suspect the source drive may be faulty, so you want to use GNU ddrescue or a similar tool. The tool writes to exactly one seekable file. You want it to write to two or more seekable files at the same time.


Longer description

Many answers here (including my other one) use tee, which is the right tool if you can read the source from the beginning to the end in one go.

What if the source drive may be faulty? Then GNU ddrescue is the right tool. GNU ddrescue will try to read the good fragments of the drive first, then it will try (and retry) to read as much of the bad fragments as possible. For this it needs its input and output files to be seekable. The source drive is seekable; a pipe to tee is not. You cannot use tee to fork the output of ddrescue to multiple files.


Specific problem

I want to clone /dev/sdSRC to /dev/sdDST and to a regular file named image. The source drive may be faulty, so I want to use GNU ddrescue for reading. In general I may want to write to a larger number of destinations.


Possible workaround

I can use ddrescue to clone /dev/sdSRC to image first, then I can reliably copy from image to /dev/sdDST (and to all other destinations, if any, using tee). Time-wise this is suboptimal: the latter phase can only start after ddrescue finishes. In theory a good solution would write to all the destinations simultaneously and thus finish when ddrescue finishes.
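
As a sketch, the two phases of this workaround would be (the second phase uses the tee-only approach from my other answer):

ddrescue /dev/sdSRC image image.ddrescue.map
pv image | tee /dev/sdDST >/dev/null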


Solution

In Linux you can assemble several block devices into one, so each write gets mirrored to all of them. This can be done "by hand" with dmsetup (low level tool) or more conveniently with mdadm (higher level tool).

dmsetup or mdadm needs block devices to work with. You can "expose" a regular file (whose size is a multiple of 512 bytes) as a block device by using losetup.

This leads to the following procedure (most steps require root access):

  1. Prepare a regular file of the right size in bytes. The size of /dev/sdSRC in bytes can be revealed by blockdev --getsize64 /dev/sdSRC. In this example the size is 500107862016.

    truncate -s 500107862016 image
    
  2. "Expose" image as a block device, i.e. create a loop device associated with the image:

    losetup --find --show image
    

    The above command will print the pathname of the loop device. In this example it's /dev/loop9.

    (Note: losetup --find may be troublesome. Read the description in man 8 losetup.)

  3. Build an array. Note you need mdadm --build (straightforward assembly without a superblock), not mdadm --create.

    mdadm --build my-array --level=raid1 --raid-devices=2 /dev/loop9 /dev/sdDST
    

    (my-array is an arbitrary name.)

    The command will print a message revealing the pathname of the newly created device: /dev/md/my-array or so.

  4. Use ddrescue to copy /dev/sdSRC to /dev/md/my-array:

    ddrescue -b 512 -c 65536 /dev/sdSRC /dev/md/my-array image.ddrescue.map
    
  5. Stop (disassemble) the array:

    mdadm --stop /dev/md/my-array
    
  6. Detach the loop device:

    losetup --detach /dev/loop9
    

Now /dev/sdDST and image are two separate copies of /dev/sdSRC, as good copies as ddrescue could get.


Hints

  • In general you can interrupt ddrescue and resume later by running the same command with the same image.ddrescue.map, even after a reboot. A reboot will disassemble the array though. In case of a reboot you need to repeat losetup --find … and mdadm --build …, and only then run ddrescue … to continue. Note /dev/sdSRC, /dev/sdDST and /dev/loopN may be different after a reboot; adjust the commands accordingly.

  • The procedure can easily be generalized, so you can get three or more copies (to regular files and/or block devices) simultaneously. For each regular file you need losetup --find … to "expose" it as a block device. Then you need to pass all the relevant devices to mdadm --build with the exact number of them as --raid-devices=.
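
    For example, to clone to /dev/sdDST plus two regular files (image and image2) at once, the setup might look like this (the loop device pathnames are just what losetup might print on my system):

    losetup --find --show image      # prints e.g. /dev/loop9
    losetup --find --show image2     # prints e.g. /dev/loop10
    mdadm --build my-array --level=raid1 --raid-devices=3 /dev/loop9 /dev/loop10 /dev/sdDST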