24

I have to send a large amount of data from one machine to another. If I send it with rsync (or any other method), it goes at a steady 320KB/sec. If I initiate two or three transfers at once, each goes at 320KB/sec, and if I do four at once, they max out the link.

I need to be able to send data as fast as possible, so I need a tool that can do inverse multiplexing with file transfers. I need a general solution, so running split on the source machine and catting them together at the other end is not practical. I need this to work in an automated fashion.

Is there a tool that does this, or do I need to make my own? The sender is CentOS, receiver is FreeBSD.

11 Answers

33

Proof that it all adds up: I present the 'holy grail' of remote mirror commands. Thanks to davr for the lftp suggestion.

lftp -c "mirror --use-pget-n=10 --verbose sftp://username:password@server.com/directory" 

The above will recursively mirror a remote directory, downloading each file in 10 parallel segments as it transfers!
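
Since the question is about sending (CentOS to FreeBSD) rather than fetching, it's worth noting that mirror also has a reverse mode. A minimal sketch, with credentials and paths as placeholders; segmented pget only applies to downloads, so on upload the parallelism comes from moving several files at once:

lftp -c "open sftp://username:password@server.com; mirror -R --parallel=4 --verbose /local/directory /remote/directory"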

Gareth
  • 19,080
10

There are a couple tools that might work.

  • LFTP - supports FTP, HTTP, and SFTP. Supports using multiple connections to download a single file. Assuming you want to transfer a file from remoteServer to localServer, install LFTP on localServer, and run:

    lftp -e 'pget -n 4 sftp://userName@remoteServer.com/some/dir/file.ext'

    The '-n 4' is how many connections to use in parallel.

  • Then there are the many 'download accelerator' tools, but they generally only support HTTP or FTP, which you might not want to have to set up on the remote server. Some examples are Axel, aria2, and ProZilla (see the example below).
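
For illustration, a segmented download with two of those accelerators might look like this (the URL is a placeholder, and an HTTP server must already be serving the file):

axel -n 4 http://remoteServer.com/some/dir/file.ext
aria2c -x 4 -s 4 http://remoteServer.com/some/dir/file.ext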

davr
  • 5,588
9

If you have a few large files, use lftp -e 'mirror --parallel=2 --use-pget-n=10 <remote_dir> <local_dir>' <ftp_server>: you'll download 2 files at a time, each split into 10 segments, for a total of 20 FTP connections to <ftp_server>.

If you have a large number of small files, use lftp -e 'mirror --parallel=100 <remote_dir> <local_dir>' <ftp_server>: you'll download 100 files in parallel without segmentation, with a total of 100 connections open. This may exhaust the number of clients the server allows, or get you banned on some servers.

You can use --continue to resume the job :) and the -R option to upload instead of download (switching the argument order to <local_dir> <remote_dir>).
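
Putting those options together, a resumable upload of a whole tree might look like this (server and directory names are placeholders; remember that segmented pget only helps downloads, so the upload parallelism comes from --parallel):

lftp -e 'mirror -R --continue --parallel=2 <local_dir> <remote_dir>' <ftp_server>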

2

You may be able to tweak your TCP settings to avoid this problem, depending on what's causing the 320KB/s per connection limit. My guess is that it is not explicit per-connection rate limiting by the ISP. There are two likely culprits for the throttling:

  1. Some link between the two machines is saturated and dropping packets.
  2. The TCP windows are saturated because the bandwidth delay product is too large.

In the first case, each TCP connection would effectively compete equally in standard TCP congestion control. You could improve this by changing the congestion control algorithm or by reducing the amount of backoff.

In the second case you aren't limited by packet loss. Adding extra connections is a crude way of expanding the total window size. If you can manually increase the window sizes the problem will go away. (This might require TCP window scaling if the connection latency is sufficiently high.)

You can tell approximately how large the window needs to be by multiplying the round-trip ("ping") time by the total speed of the connection. 1280KB/s needs a window of 1280 bytes per millisecond of round trip (1311 bytes/ms if 1K = 1024 bytes). A 64K buffer is therefore maxed out at about 50 ms of latency, which is fairly typical, and a 16K buffer would saturate at around 320KB/s at that latency.
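
As a rough sketch of the second case, here is the bandwidth-delay-product arithmetic plus the kind of buffer-size sysctls involved. The values are illustrative assumptions, not recommendations, and the tunables differ between the CentOS sender and the FreeBSD receiver:

# Bandwidth-delay product: window needed = bandwidth * RTT
# e.g. 1280KB/s * 50ms of round trip = 64KB of data in flight
echo $((1280 * 1024 * 50 / 1000))   # => 65536 bytes

# Linux (CentOS) sender: raise the socket buffer ceilings
sysctl -w net.core.wmem_max=4194304
sysctl -w net.core.rmem_max=4194304
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"

# FreeBSD receiver: raise the socket buffer ceilings
sysctl net.inet.tcp.sendbuf_max=4194304
sysctl net.inet.tcp.recvbuf_max=4194304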

1

If you can set up passwordless ssh login, the following will copy files in batches of 4 per scp invocation (xargs passes up to 4 file names to each call of the script), and because the script backgrounds each scp, the transfers run concurrently:

find . -type f | xargs -L 4 -n 4 /tmp/scp.sh user@host:path

File /tmp/scp.sh:

#!/bin/bash

#Display the help page
function showHelp()
{
    echo "Usage: $0 <destination> <file1 [file2 ... ]>"
}

#No arguments?
if [ -z "$1" ] || [ -z "$2" ]; then
    showHelp
    exit 1
fi

#Display help?
if [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
    showHelp
    exit 0
fi

#Programs and options
SCP='scp'
SCP_OPTS='-B'    #batch mode: never prompt for a password
DESTINATION="$1"; shift

#Check other parameters
if [ -z "$DESTINATION" ]; then
    showHelp
    exit 1
fi

#Show which files this invocation will copy
echo "$@"

#Run scp in the background with the remaining parameters (quoted so file names with spaces survive).
$SCP $SCP_OPTS "$@" "$DESTINATION" &
user67730
  • 11
  • 1
1

How is your data structured? A few large files? A few large directories? You could spawn off multiple instances of rsync on specific branches of your directory tree.

It all depends on how your source data is structured. There are tons of unix tools to slice, dice, and reassemble files.
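
A minimal sketch of the parallel-rsync idea, assuming the top-level subdirectories are roughly similar in size (paths, host, and the degree of parallelism are placeholders):

# Run one rsync per top-level entry of /src/dir, up to 4 at a time.
find /src/dir -mindepth 1 -maxdepth 1 -print0 | \
    xargs -0 -P 4 -I{} rsync -a {} user@host:/dest/dir/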

0

Try sorting all files by inode (find /mydir -type f -print | xargs ls -i | sort -n) and transferring them with, for example, cpio over ssh. This will max out your disk and make the network the bottleneck; it's hard to go faster than that across a network.
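
A rough sketch of that pipeline (host and destination path are placeholders, and the awk step assumes file names without spaces):

# Stream the files in inode order through cpio, straight into ssh.
find /mydir -type f -print | xargs ls -i | sort -n | awk '{print $2}' \
    | cpio -o | ssh user@host 'cd /dest && cpio -idm'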

0

There is a tool that can transfer files in chunks: the 'rtorrent' package/port, which is available on both hosts ;) BitTorrent clients often reserve disk space before the transfer, and chunks are written directly from the sockets to the disk. Additionally, you'll be able to review the state of ALL transfers in a nice ncurses screen.

You can write simple bash scripts to automate "*.torrent" file creation and ssh a command to the remote machine so that it downloads the torrent. This looks a bit ugly, but I don't think you'll find any simple solution without doing some development :)
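
A very rough sketch of that automation, assuming mktorrent is installed on the sender, $TRACKER points at a reachable tracker, and the remote rtorrent already watches ~/watch for new .torrent files (all of these are assumptions, not part of the original answer):

# Create the .torrent on the sender.
mktorrent -a "$TRACKER" -o /tmp/bigfile.torrent /data/bigfile
# Drop it into the receiver's rtorrent watch directory so the download starts.
scp /tmp/bigfile.torrent user@receiver:watch/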

kolypto
  • 3,121
0

FTP uses multiple connections for downloads. If you can set up a secure channel for FTP over a VPN or FTP over SSH, you should be able to max out your network link. (Note that special considerations are required for FTP over SSH--see the link.)

FTPS (FTP over SSL) might also do what you need.

You could also use an SFTP client that supports multiple connections, but I'm not sure if SFTP supports multiple connections for a single file. This should do what you need most of the time, but may not give you the maximum throughput when you only have to transfer one large file.

rob
  • 14,388
-1

Solution 1: I'm not sure if this is practical in your case, but you could create a spanned archive (for example, a tarfile split into chunks, or a spanned 7zip archive), then use multiple instances of rsync to send them over the network and reassemble/extract them on the other side. You could write a general-purpose script whose arguments are the directory to be transferred and the number of connections to use. The obvious downside is that you'll need twice as much free space on both sides, and will have the additional overhead of archiving/extracting the files on both ends.
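
A hedged sketch of Solution 1 (archive name, chunk size, and host are placeholders; as noted, it needs roughly twice the data's size in free space on each side):

# Sender: archive, split into 512M chunks, then push 4 chunks at a time.
tar -czf /tmp/backup.tgz /data/to/send
split -b 512M /tmp/backup.tgz /tmp/backup.tgz.part.
ls /tmp/backup.tgz.part.* | xargs -P 4 -I{} rsync {} user@host:/tmp/

# Receiver: reassemble and extract.
cat /tmp/backup.tgz.part.* > /tmp/backup.tgz
tar -xzf /tmp/backup.tgz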

Solution 2: a better solution would be to write a script or program that divides the large directory tree into subtrees based on size, then copies those subtrees in parallel. It might simplify things if you copy the entire directory structure (without the files) first.
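
For the "copy the directory structure first" step, one common rsync idiom is (paths and host are placeholders):

# Recreate the directory tree on the receiver without transferring any files.
rsync -a --include='*/' --exclude='*' /src/dir/ user@host:/dest/dir/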

rob
  • 14,388
-1

Are your two machines running in a trusted environment? You could try netcat. On the server side:

tar -czf - ./yourdir | nc -l 9999

and on the client:

nc your.server.net 9999 > yourdir.tar.gz

You can have the client connection use an ssh tunnel:

ssh -f -L 23333:127.0.0.1:9999 foo@your.server.net sleep 10; \
    nc 127.0.0.1 23333 > yourdir.tar.gz

Even an entire partition can be moved this way:

dd if=/dev/sda1 | gzip -9 | nc -l 9999

and on the client:

nc your.server.net 9999 > mysda1.img.gz
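
To restore that image later, something along these lines should work (the target device /dev/sdX is a placeholder; double-check it, since dd will overwrite it):

# Write the compressed image back onto a block device (destructive!).
gunzip -c mysda1.img.gz | dd of=/dev/sdX bs=1M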


Note

netcat is not the most secure transfer tool out there, but in the right environment it can be fast because it has such low overhead.

HowtoForge has a good examples page.

DaveParillo
  • 14,761