12

Possible Duplicate:
Free way to share large files over the Internet?
What are some options for transferring large files without using the Internet?

My wife's lab is doing a project here in the US with collaborators in Singapore. They occasionally need to transfer a large amount of high-dimensional image data (~10GB compressed) across continents. With current technologies, what would be a good solution for this usage scenario?

I can think of a few but none of them seems ideal:

  • Direct connection via the Internet: the transfer rate is only about 500KB/s, and we lack a tool to handle errors and retransmissions.
  • Upload to a common server or a service such as Dropbox: uploading is painful for the non-US collaborators.
  • Burning discs or copying to hard drives and shipping via courier: the latency is significant, plus there is the extra work of making a local copy.

Any suggestions?

Update: neither party in the collaboration consists of tech-savvy users.

Frank

6 Answers

20

I suggest you use rsync. Rsync supports a delta-transfer algorithm, so if your files have only partially changed, or if a previous transfer was terminated abnormally, rsync is smart enough to sync only what is new or changed.

There are several ports of the original rsync to Windows and other non-Unix systems, both free and non-free. See the rsync Wikipedia article for details.

Rsync over SSH is very widely used and works well. 10GB is a relatively small amount of data nowadays, and you didn't specify what "occasionally" means. Weekly? Daily? Hourly? At a 500KB/s transfer rate it will take around 6 hours, not really a long time. If you need to transfer the data frequently, it is probably better to create a cron task to start rsync automatically.
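
For illustration, a minimal sketch of what this could look like from the US side, assuming the data sits in ~/imagedata and the Singapore collaborators provide an SSH account on a host here called sg-server (both names are placeholders):

    # compressed, resumable transfer over SSH; re-running it only sends new/changed data
    rsync -avz --partial --progress ~/imagedata/ user@sg-server:/data/imagedata/

    # optional: a cron entry (crontab -e) to push new data every night at 02:00
    0 2 * * * rsync -az --partial ~/imagedata/ user@sg-server:/data/imagedata/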

haimg
12

A connection across the Internet can be a viable option, and a program such as BitTorrent is well suited to this purpose: it breaks the files up into logical pieces that are sent over the Internet and reconstructed at the other end.

BitTorrent also gives you automatic error detection and repair of damaged pieces, and if more people need the files, they get the benefit of being supplied with the data from every source that already has (parts of) the file.

Granted, people see it as a nice way to download films and such, but it has many perfectly legal uses as well.

Many BitTorrent clients also have built-in trackers, so you don't need a dedicated server to host the files.
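
As one possible sketch using the Transmission command-line tools (the file name and tracker URL below are placeholders), you would create a torrent, seed it, and send the small .torrent file to the collaborators:

    # describe the data in a .torrent file, pointing at a public tracker (placeholder URL)
    transmission-create -o imagedata.torrent -t udp://tracker.example.org:6969 imagedata.tar.gz

    # verify the local copy and seed it; the other side opens imagedata.torrent in any client
    transmission-cli -w . imagedata.torrent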

Mokubai
6

Split the file into chunks of e.g. 50MB (using e.g. split). Compute checksums for all of them (e.g. with md5sum). Upload them directly using FTP with an error-tolerant FTP client, such as lftp on Linux. Transfer all of the chunks plus a file containing all the checksums.

On the remote site, verify that all the chunks have the expected checksum, re-upload those that failed, and reassemble them into the original file (e.g. using cat).

Reverse the location of the server as needed (I posted under the assumption that the destination site provides the server and you start the transfer locally when the files are ready). Your FTP client shouldn't care.
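
A rough sketch of both ends, assuming a single archive called imagedata.tar.gz and an FTP account on a placeholder host (lftp's mirror -R is just one way to get retrying, resumable uploads):

    # sender: split into 50 MB chunks in a fresh directory, record checksums
    mkdir chunks && cd chunks
    split -b 50m ../imagedata.tar.gz imagedata.part.
    md5sum imagedata.part.* > imagedata.md5
    # upload the directory with lftp, which retries and resumes on errors
    lftp -e 'mirror -R --continue --parallel=2 . /incoming; quit' ftp://user@server.example.org

    # receiver: verify every chunk, re-request any that fail, then reassemble
    md5sum -c imagedata.md5
    cat imagedata.part.* > imagedata.tar.gz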


I have had similar issues in the past and using an error-tolerant FTP client worked. No bits were ever flipped, just regular connection aborts, so I could skip creating chunks and just upload the file. We still provided a checksum for the complete file, just in case.

Daniel Beck
3

A variation on Daniel Beck's answer is to split the files into chunks on the order of 50MB to 200MB and create parity files for the whole set.

You can then transfer the files (including the parity files) to the remote site with FTP, SCP or something else and check the whole set after it arrives. If parts are damaged, they can be repaired from the parity files, provided there are enough recovery blocks; how much can be repaired depends on how many files are damaged and how many parity files you created.

Parity files are used a lot on Usenet to send large files; most of the time the data is split up into RAR archives first. It's not uncommon to send 50 to 60GB of data this way.

You should definitely check out the first link, and you could also take a look at QuickPar, a tool that can create parity files, verify your downloaded files and even restore damaged files using the provided parity files.
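
On the command line, the par2 utility (par2cmdline) covers the same ground as QuickPar; a small sketch, reusing the placeholder chunk names from the previous answer and an arbitrary 10% redundancy level:

    # sender: create recovery data covering all chunks (10% redundancy)
    par2 create -r10 imagedata.par2 imagedata.part.*

    # receiver: verify the set and repair damaged chunks from the parity data
    par2 verify imagedata.par2
    par2 repair imagedata.par2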

Martijn
1

Is it one big 10GB file? Could it be easily split up?

I haven't played with this much, but it struck me as an interesting and relatively simple concept that might work in this situation:

http://sendoid.com/

Craig H
0

Make the data available via FTP/HTTP/HTTPS/SFTP/FTPS (requiring logon credentials) and use any download manager on the client side.

Download managers are specifically designed to retrieve data despite any errors that may occur, so they fit your task well.

As for the server, an FTP server is typically the easiest to set up; you can consult a list on Wikipedia. HTTPS, SFTP and FTPS allow encryption (with plain FTP/HTTP the password is sent in clear text), but SFTP/FTPS are less commonly supported by client software, and HTTP/HTTPS server setup is trickier.
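
As a command-line example of the same idea, wget behaves like a simple download manager; a minimal sketch, assuming the archive is published at a placeholder URL behind a placeholder account:

    # resume partial downloads (-c) and keep retrying until the file is complete
    wget -c --tries=0 --retry-connrefused --user=labuser --ask-password \
         https://server.example.org/data/imagedata.tar.gz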

ivan_pozdeev