I'm building on the answer here to write a backup script. The script I have is roughly:
backup_files="/etc /home"
excludes="--exclude-vcs --exclude-ignore-recursive=.tarignore"
#(Skip irrelevant details)
total_size=$(du -csb $backup_files | awk '{print $1}' | tail -n 1)
tar cf - $excludes $backup_files -P | pv -s $total_size | gzip > "$target_file"
However, total_size comes out too large, so pv overestimates the remaining time: du counts everything under $backup_files, including the files that tar excludes. I've been fiddling with the script to tighten the estimate, but I keep running into problems. For instance, I have tried
all_files=$(tar cvf /dev/null $excludes $backup_files -P |grep -v -e /$)
total_size=$(du -csb $all_files)
This fails with a "too many arguments" error, since there are approximately a million files. Iterating over the list with a for loop runs into problems with filenames: spaces break the loop, and so do some odd Unicode filenames. I also timed the loop, and it would take hours.
With a few pointers from the comments and a now-deleted answer, I've gotten as far as:
run_tar () {
    printf '%s\n' "$excludes" "$backup_files" | tar -cSPf - --files-from -
}
list_files () {
    printf '%s\n' "$excludes" "$backup_files" | tar -cvPf /dev/null --files-from - | grep -v -e '/$'
}
compute_size () {
    list_files | while read -r f
    do
        echo -ne "$f\0"
    done | du -csb --files0-from - | awk '{print $1}' | tail -n 1
}
This fixes the overhead from the for loop and the problems with spaces. Currently, it takes about a minute or two to process a million or so files.
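The while loop in compute_size only turns newline-terminated names into NUL-terminated ones, so it could probably be replaced by tr. A minimal sketch, assuming GNU du (for -b and --files0-from) and no newlines in the names; sum_sizes, demo_dir and demo_total are hypothetical names used only for this illustration:

```shell
# Sum the apparent sizes of files whose names arrive one per line on stdin.
# Assumption: GNU du, and no file name contains a newline.
sum_sizes() {
    tr '\n' '\0' | du -csb --files0-from=- | tail -n 1 | cut -f1
}

# Hypothetical demo data in a scratch directory.
demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/with space.txt"   # 5 bytes
printf 'abc'   > "$demo_dir/plain.txt"        # 3 bytes

# Feed the two names, one per line, to sum_sizes.
demo_total=$(printf '%s\n' "$demo_dir/with space.txt" "$demo_dir/plain.txt" | sum_sizes)
echo "$demo_total"   # prints 8

rm -rf "$demo_dir"
```

In the real script this would be used as list_files | sum_sizes. It has the same limitation as the loop: a name containing a newline is split in two.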
Where I'm still stuck is the Unicode errors. The filenames are rendered as e.g. Yle P\344\344uutiset.xml. Redirecting errors to /dev/null hides the problem, and only a handful of files are affected, but they still go uncounted. An ls of one of the misbehaving directories shows that there's a file called 'Yle P'$'\344\344''uutiset.xml'. I think this particular name is the result of an encoding mix-up, but the issue remains that such names are still valid filenames. For that matter, a newline is also a valid character in a filename, and it would break the line-based loop as well.
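As far as I can tell, the \344 escapes are produced by tar's verbose listing rather than stored on disk: GNU tar's default quoting style prints non-printable bytes as octal escapes, and du then looks for a file whose name literally contains the backslashes. A minimal sketch, assuming GNU tar and a filesystem that accepts non-UTF-8 names, of how --quoting-style=literal changes the listing; scratch, odd_name, escaped and literal are hypothetical names, and I have not verified this against the full backup set:

```shell
# Assumption: GNU tar, and a filesystem that accepts non-UTF-8 file names.
scratch=$(mktemp -d)
odd_name=$(printf 'Yle P\344\344uutiset.xml')   # raw Latin-1 bytes, as in the name above
: > "$scratch/$odd_name"                        # create an empty file with that name

# Default quoting style: depending on the locale, the odd bytes are
# printed as octal escapes such as \344.
escaped=$(tar -cvPf /dev/null "$scratch/$odd_name")

# Literal quoting style: the name is printed byte for byte as stored on disk.
literal=$(tar -cvPf /dev/null --quoting-style=literal "$scratch/$odd_name")

printf '%s\n' "$escaped"
[ "$literal" = "$scratch/$odd_name" ] && echo "literal listing matches the real name"

rm -rf "$scratch"
```

This would not help with newlines in names, since the listing is still one name per line.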
How do I include the few files that I'm missing from the total?