215

I have a couple of big files that I would like to compress. I can do this with for example

tar cvfj big-files.tar.bz2 folder-with-big-files

The problem is that I can't see any progress, so I don't have a clue how long it will take or anything like that. Using v I can at least see when each file is completed, but when the files are few and large this isn't the most helpful.

Is there a way I can get tar to show more detailed progress? Like a percentage done or a progress bar or estimated time left or something. Either for each single file or all of them or both.

Journeyman Geek
  • 133,878
Svish
  • 41,258

17 Answers

233

I prefer one-liners like this:

tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz

It will have output like this:

4.69GB 0:04:50 [16.3MB/s] [==========================>        ] 78% ETA 0:01:21

For OS X (from Kenji's answer):

tar cf - /folder-with-big-files -P | pv -s $(($(du -sk /folder-with-big-files | awk '{print $1}') * 1024)) | gzip > big-files.tar.gz

Explanation:

  • tar tarball tool
  • cf create file
  • - use stdout instead of a file (to be able to pipe the output to the next command)
  • /folder-with-big-files The input folder to archive
  • -P use absolute paths (not necessary, see comments)

pipe to

  • pv progress monitor tool
  • -s use the following size as the total data size to transfer (for % calculation)
    • $(...) evaluate the expression
    • du -sb /folder-with-big-files disk usage, summarized in one line, in bytes. Returns e.g. 8367213097 folder-with-big-files
    • pipe (|) to awk '{print $1}' which returns only the first part of the du output (the bytes, removing the foldername)

pipe to

  • gzip gzip compression tool
  • big-files.tar.gz output file name
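The same pipeline can be wrapped in a small reusable function (a sketch; tarpv is a made-up name, and it assumes GNU du -sb, pv and gzip are available):

```shell
# Sketch: reusable wrapper around the tar | pv | gzip pipeline.
# "tarpv" is a hypothetical name; requires GNU du (-b) and pv.
tarpv() {
    src=$1    # folder to archive
    out=$2    # output .tar.gz path
    size=$(du -sb "$src" | awk '{print $1}')    # total bytes, for pv's % and ETA
    tar -cf - "$src" | pv -s "$size" | gzip > "$out"
}

# Usage:
# tarpv folder-with-big-files big-files.tar.gz
```

This keeps the size calculation and the pipeline in one place, so you don't have to retype the du/awk part each time.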
checksum
  • 2,602
85

Use pv. To report progress correctly, pv needs to know how many bytes you are throwing at it, so the first step is to calculate the size (in kilobytes). You can also drop the progress bar entirely and just let pv tell you how many bytes it has seen; it will then report 'done that much, that fast'.

% SIZE=`du -sk folder-with-big-files | cut -f 1`

And then:

% tar cvf - folder-with-big-files | pv -p -s ${SIZE}k | \ 
     bzip2 -c > big-files.tar.bz2
akira
  • 63,447
38

A better progress bar:

apt-get install pv dialog

(pv -n file.tgz | tar xzf - -C target_directory ) \
2>&1 | dialog --gauge "Extracting file..." 6 50


Mr. Black
  • 495
36

Check out the --checkpoint and --checkpoint-action options in the tar info page (on my distribution, these options are described only in the info documentation, not in the man page).

See https://www.gnu.org/software/tar/manual/html_section/checkpoints.html

With these (and maybe the functionality to write your own checkpoint command), you can calculate a percentage…
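For example, a minimal sketch (GNU tar; the format specifiers such as %u are described in the manual section linked above):

```shell
# Minimal sketch (GNU tar): emit a status message every 100 records.
# %u expands to the checkpoint number. On an interactive terminal,
# --checkpoint-action=ttyout='...%u\r' rewrites a single line instead
# of printing a new one; echo works even without a tty attached.
mkdir -p folder-with-big-files                                # demo input
dd if=/dev/zero of=folder-with-big-files/blob bs=1024 count=2048 2>/dev/null
tar -cf big-files.tar \
    --checkpoint=100 \
    --checkpoint-action=echo='checkpoint %u' \
    folder-with-big-files
```

Dividing the total size by the record size gives the expected number of checkpoints, from which a percentage can be derived, as the answers below demonstrate.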

Flux
  • 301
helper
  • 369
31

Inspired by helper’s answer

Another way is to use tar's native options:

FROMSIZE=`du --block-size=1 --apparent-size --summarize ${FROMPATH} | cut -f 1`;
CHECKPOINT=$((FROMSIZE / 10240 / 50));
CHECKPOINTACTION=`printf 'ttyout=\b-\>'`
echo "Estimated: [==================================================]";
echo -n "Progress:  [ ";
tar -c --record-size=10240 --checkpoint="${CHECKPOINT}" --checkpoint-action="${CHECKPOINTACTION}" -f - "${FROMPATH}" | bzip2 > "${TOFILE}";
echo -e "\b]"

The result looks like:

Estimated: [==================================================]
Progress:  [---------------------->

a sample script here

campisano
  • 481
29

Using only tar

tar has the option (since v1.12) to print status information on signals using --totals=$SIGNO, e.g.:

tar --totals=USR1 -czf output.tar input.file
Total bytes written: 6005319680 (5.6GiB, 23MiB/s)

The Total bytes written: [...] information gets printed on every USR1 signal, e.g.:

pkill -SIGUSR1 tar
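To get a periodic report instead of signalling by hand, you can background tar and poke it in a loop (a sketch; GNU tar only, and input-dir and the 1-second interval are placeholders):

```shell
# Sketch (GNU tar): background the archive job, then request a totals
# report every second instead of signalling by hand.
mkdir -p input-dir && printf 'demo' > input-dir/file      # demo input
tar --totals=USR1 -czf output.tar.gz input-dir &
tarpid=$!
while sleep 1; do
    kill -USR1 "$tarpid" 2>/dev/null || break   # kill fails once tar exits
done
wait "$tarpid"
```

Each USR1 produces one "Total bytes written" line on stderr, so you get a running log of throughput without interrupting the archive.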


Tometzky
  • 486
Murmel
  • 1,325
6

Method based upon tqdm:

tar -v -xf tarfile.tar -C TARGET_DIR | tqdm --total $(tar -tvf tarfile.tar | wc -l) > /dev/null
J_Zar
  • 161
4

You won't see how much is left, but you will see that it is progressing, with a very simple addition to your command: --checkpoint=1000

# tar  xf  file.tar --checkpoint=1000
tar: Read checkpoint 1000
tar: Read checkpoint 2000
tar: Read checkpoint 3000
tar: Read checkpoint 4000
tar: Read checkpoint 5000
tar: Read checkpoint 6000
...

More details in the tar manual.

Rub
  • 261
4

For simple extraction with pv:

pv mysql.tar.gz | tar -xzf -

You'll get an output like this:

 249MiB 0:00:19 [14.0MiB/s] [==>                               ] 10% ETA 0:02:44


To install pv on macOS just use Homebrew with:

brew install pv

On other systems, pv is usually available from your package manager or from the pv source repository.

3

Just noticed the comment about macOS. While I think the solution from @akira (and pv) is much neater, I thought I'd chase a hunch and have a quick play around on my macOS box, sending tar a SIGINFO signal. Funnily enough, it worked :) If you're on a BSD-like system this should work, but on a Linux box you may need to send SIGUSR1 instead, and/or tar might not behave the same way.

The downside is that it will only give you output (on stdout) showing how far through the current file it is, since I'm guessing it has no idea how big the overall data stream is.

So an alternative approach is to fire up tar and periodically send it SIGINFO whenever you want to know how far it has gotten. How to do this?

The ad-hoc, manual approach

If you want to check status on an ad-hoc basis, you can hit control-T (as Brian Swift mentioned) in the relevant window, which sends the SIGINFO signal. One issue is that, I believe, it is sent to your entire pipeline, so if you are doing:

% tar cvf - folder-with-big-files | bzip2 -c > big-files.tar.bz2

You will also see bzip2 report its status along with tar:

a folder-with-big-files/big-file.imgload 0.79  cmd: bzip2 13325 running 
      14 0.27u 1.02s
  adding folder-with-big-files/big-file.imgload (17760256 / 32311520)

This works nicely if you just want to check whether the tar you're running is stuck or merely slow. You probably don't need to worry much about formatting issues in this case, since it's only a quick check.

The sort of automated approach

If you know it's going to take a while, but want something like a progress indicator, an alternative is to fire off your tar process, work out its PID in another terminal, and then feed it to a script that repeatedly sends it a signal. For example, with the following scriptlet (invoked as, say, script.sh PID-to-signal interval-to-signal-at):

#!/bin/sh

PID=$1
INTERVAL=$2
# excuse the voodoo, bash gets the translation of SIGINFO, sh won't..
SIGNAL=29

# a quick check to see if the PID is present AND that you can access it..
kill -0 $PID

echo "this process is $$, sending signal $SIGNAL to $PID every $INTERVAL s"
while [ $? -eq 0 ]; do
    sleep $INTERVAL
    # the kill signalling must be the last statement,
    # or else the $? conditional test won't work
    kill -$SIGNAL $PID
done
echo "PID $PID no longer accessible, tar finished?"

If you invoke it this way, since you're targeting only tar, you'll get output more like this:

a folder-with-big-files/tinyfile.1
a folder-with-big-files/tinyfile.2
a folder-with-big-files/tinyfile.3
a folder-with-big-files/bigfile.1
adding folder-with-big-files/bigfile.1 (124612 / 94377241)
adding folder-with-big-files/bigfile.1 (723612 / 94377241)
...

which, I admit, is kinda pretty.

Last but not least - my scripting is kinda rusty, so if anyone wants to go in and clean up/fix/improve the code, go for your life :)

tanantish
  • 1,257
  • 9
  • 12
2

On macOS, first make sure that you have all the commands available, and install the missing ones (e.g. pv) using brew.

If you only want to tar without compression, go with:

tar -c folder-with-big-files | pv -s $(($(du -sk folder-with-big-files | awk '{print $1}') * 1024)) > folder-with-big-files.tar

If you want to compress, go with:

tar cf - folder-with-big-files -P | pv -s $(($(du -sk folder-with-big-files | awk '{print $1}') * 1024)) | gzip > folder-with-big-files.tar.gz

Note: It may take a while before the progress bar appears. Try on a smaller folder first to make sure it works, then move to folder-with-big-files.

2

Inspired by Noah Spurrier’s answer

function tar {
  local bf so
  so=${*: -1}
  case $(file "$so" | awk '{print$2}') in
  XZ) bf=$(xz -lv "$so" |
    perl -MPOSIX -ane '$.==11 && print ceil $F[5]/50688') ;;
  gzip) bf=$(gzip -l "$so" |
    perl -MPOSIX -ane '$.==2 && print ceil $F[1]/50688') ;;
  directory) bf=$(find "$so" -type f | xargs du -B512 --apparent-size |
    perl -MPOSIX -ane '$bk += $F[0]+1; END {print ceil $bk/100}') ;;
  esac
  command tar "$@" --blocking-factor=$bf \
    --checkpoint-action='ttyout=%u%\r' --checkpoint=1
}

Source

Zombo
  • 1
1

If you know the number of files instead of their total size:

An alternative (less accurate, but good enough) is to use pv's -l option and send the file names through the pipe instead of the data content.

Say there are 12345 files in mydir; the command is:

[myhost@myuser mydir]$ tar cvzf ~/mytarfile.tgz . | pv -s 12345 -l > /dev/null

You may know this value in advance (because of your use case), or you can use a command like find+wc to discover it:

[myhost@myuser mydir]$ find | wc -l
12345
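Combining the two, the count can also be computed inline (a sketch; assumes pv is installed, and mydir is a placeholder directory):

```shell
# Sketch: feed tar's verbose listing to pv in line-count mode (-l),
# with the total number of entries computed inline by find.
mkdir -p mydir && touch mydir/a mydir/b        # demo input
cd mydir
tar cvzf ../mytarfile.tgz . | pv -l -s "$(find . | wc -l)" > /dev/null
cd ..
```

Both find and tar's -v listing emit one line per entry, so the counts line up and pv can show a percentage.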
bzimage
  • 36
1

A better-looking progress bar

Install the dialog and pv commands with

sudo apt-get install dialog pv

and then execute tar like this

(tar cf - /folder-with-big-files | pv -n -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip -9 > big-files.tar.gz) 2>&1 | dialog --gauge 'Your backup is in progress...' 7 70
Savman
  • 11
1

If you are ok with using 7z:

7z a example.tar example/

This will show the scanning drive stage

Scanning the drive:
2608M 139834 Scan  example/file.txt

as well as some other useful information to watch.

Scanning the drive:
157318 folders, 601997 files, 13683142277 bytes (13 GiB)           

Creating archive: example.tar

Items to compress: 759315

    3% 29587 + example/file.txt
qwr
  • 1,005
0

Here are some numbers from a Prometheus (metrics data) backup on Debian/buster AMD64:

root# cd /path/to/prometheus/
root# tar -cf - ./metrics | ( pv -p --timer --rate --bytes > prometheus-metrics.tar )

I cancelled this job, as there was not enough disk space available.

Experimenting with zstd as the compressor for tar, monitoring the progress with pv:

root# apt-get update
root# apt-get install zstd pv

root# tar -c --zstd -f - ./metrics | ( pv -p --timer --rate --bytes > prometheus-metrics.tar.zst )
10.2GiB 0:11:50 [14.7MiB/s]

root# du -s -h prometheus
62G    prometheus

root# du -s -h prometheus-metrics.tar.zst
11G    prometheus-metrics.tar.zst
dileks
  • 1
0

In my daily use I don't need to know the exact percentage of progress of the operation, only whether it is working and (sometimes) how close it is to completion.

I meet this need minimally by showing the number of files processed on a single, self-updating line; in Bash:

let n=0; tar zcvf files.tgz directory | while read LINE; do printf "\r%d" $((n++)) ; done ; echo

As I use this a lot, I defined a function in my .bashrc:

function pvl { declare -i n=0; while read L ; do printf "\r%d" $((++n)) ; done ; echo ; }

Then simply:

tar zcvf files.tgz directory | pvl

I can compute the number of files in advance if needed with find directory | wc -l (Or better using the same function shown [find directory | pvl] to squash my impatience!).

Another example, setting rights for a virtual website (after that, a chown -R is fast because the filenames are in the filesystem cache):

find /site -print -type d -exec chmod 2750 "{}" \; -o -type f -exec chmod 640 "{}" \; | pvl

It's true this side processing could slow down the main operation, but I think printing a carriage return and a few digits can't be too expensive (besides, waiting for the next equals sign to appear or percent digit to change feels slow compared with the subjective blazing speed of changing digits!).

Fjor
  • 116