I am running a shell script on machineA that copies files from machineB and machineC to machineA.
If a file is not on machineB, it is guaranteed to be on machineC. So I first try to copy each file from machineB, and if it is missing there, I fall back to machineC.
On machineB and machineC there is a folder like this from which I am supposed to copy the files -
/data/pe_t1_snapshot/20140317
I need to copy around 400 files to machineA from machineB and machineC. Each file is around 3.5 GB, the network is 10 gigabit, and traffic is encrypted and decrypted at both ends.
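For context, a rough back-of-envelope calculation (assuming the link can actually be saturated and ignoring encryption and per-connection overhead) shows how far the 3-hour sequential run is from the theoretical floor:

```shell
#!/bin/bash
# Ideal transfer time for 400 files of 3.5 GB over a 10 Gbit/s link.
total_gb=$((400 * 35 / 10))    # 400 * 3.5 GB = 1400 GB
total_gbit=$((total_gb * 8))   # 1400 GB = 11200 gigabits
seconds=$((total_gbit / 10))   # at 10 Gbit/s = 1120 seconds
echo "~$((seconds / 60)) minutes at line rate"
# prints: ~18 minutes at line rate
```

The large gap between ~18 minutes and 3 hours suggests the bottleneck is per-transfer overhead (ssh handshakes, single-stream encryption throughput), which is exactly what a few parallel transfers can hide.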
Earlier, I was copying the files to machineA one by one, which is really slow: it takes around 3 hours. Is there any way I can have 5 different threads, each handling one file at a time, so that only 5 background processes are ever running? I don't want to download all the files in parallel, since 400 parallel transfers will cause packet loss and angry network admins :)
Or should I split the big group of files into sets of five, and download each set of five in parallel until all the files have been copied?
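The batch-of-five idea can be sketched like this. The `copy_one` function here is a hypothetical stand-in for the real `scp` machineB-then-machineC fallback command (it just creates a local file so the sketch is self-contained), and the partition list is truncated for illustration:

```shell
#!/bin/bash
# Sketch only: copy_one stands in for the real scp-with-fallback command.
PARTITIONS=(0 3 5 7 9 11 13 15 17 19 21 23 25 27 29)
BATCH=5
workdir=$(mktemp -d)

copy_one() {
  # placeholder for: scp ...machineB:$dir1/... || scp ...machineC:$dir1/...
  touch "$workdir/s5_daily_1980_${1}_200003_5.data"
}

for ((i = 0; i < ${#PARTITIONS[@]}; i += BATCH)); do
  for el in "${PARTITIONS[@]:i:BATCH}"; do
    copy_one "$el" &   # start up to BATCH copies in the background
  done
  wait                 # block until this batch of five has finished
done

echo "fetched $(( $(ls "$workdir" | wc -l) )) files"
# prints: fetched 15 files
```

One caveat of this approach: each batch waits for its slowest member, so if one file stalls, the other four slots sit idle until it finishes.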
Below is my shell script, which copies the files one by one to machineA from machineB and machineC.
#!/bin/bash
readonly PRIMARY=/export/home/david/dist/primary
readonly FILERS_LOCATION=(machineB machineC)
PRIMARY_PARTITION=(0 3 5 7 9 11 13 15 17 19 21 23 25 27 29) # this will have more file numbers around 400
dir1=/data/pe_t1_snapshot/20140317
# delete all the files first
find "$PRIMARY" -mindepth 1 -delete
for el in "${PRIMARY_PARTITION[@]}"
do
file="s5_daily_1980_${el}_200003_5.data"
# try machineB first, then fall back to machineC
scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 \
    "david@${FILERS_LOCATION[0]}:$dir1/$file" "$PRIMARY/." \
|| scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 \
    "david@${FILERS_LOCATION[1]}:$dir1/$file" "$PRIMARY/."
done
Problem Statement:-
I don't want to download ALL the files in parallel; I just want to limit the number of concurrent transfers to four or five. Our Unix admin suggested this approach and said it would improve my transfer speed, but I am not sure how to enforce that limit in my shell script above, or how to split the big group of file numbers into sets of five and download each set in parallel.
Is this possible to do? If yes, can anyone provide an example?
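An alternative to fixed batches is a rolling limit, where a new transfer starts as soon as any of the five in flight finishes, so no slot sits idle behind a slow file. One way to sketch that is with `xargs -P` (the `-P` flag is supported by both GNU and BSD xargs); again, `copy_one` is a hypothetical stand-in for the scp fallback, made exportable so the `bash -c` children spawned by xargs can see it:

```shell
#!/bin/bash
# Sketch: keep at most 5 copies in flight; a new one starts as each finishes.
workdir=$(mktemp -d)
export workdir

copy_one() {
  # stand-in for: scp from machineB || scp from machineC
  touch "$workdir/s5_daily_1980_${1}_200003_5.data"
}
export -f copy_one   # bash-specific: lets the xargs-spawned shells call it

# feed the partition numbers (truncated here), one per invocation, 5 at a time
printf '%s\n' 0 3 5 7 9 11 13 15 17 19 21 23 25 27 29 \
  | xargs -n 1 -P 5 bash -c 'copy_one "$1"' _
```

With the real scp command substituted into `copy_one`, the existing `PRIMARY_PARTITION` array can be fed in via `printf '%s\n' "${PRIMARY_PARTITION[@]}"`.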