Environment variables in split's filter

Question

So, I am trying to copy the contents of a large block device using split like so:

split --bytes 10M --numeric-suffixes --filter='cat | ssh root@$remote_ip "gzip >> /root/myfilecopy.gz"' /dev/myblockdev

But when I save the remote file name in an environment variable and use that variable in split's filter, it does not work:

remote_file=/root/myfilecopy.gz
split --bytes 10M --numeric-suffixes --filter='cat | ssh root@$remote_ip "gzip >> $remote_file"' /dev/myblockdev

This is what I get:

bash: -c: line 0: syntax error near unexpected token `newline'
bash: -c: line 0: `gzip >> '
split: with FILE=x00, exit 1 from command: cat | ssh root@192.168.0.105 "gzip >> $remote_file"

Looks like the environment variable is not expanded properly inside the filter command.

Any clue how to fix this?

Thanks.

Kamil Maciorowski · Accepted Answer · 2018-05-07T14:30:16.113

Overcomplicated?

Variables aside for a moment. This is your basic code:

split --bytes 10M --numeric-suffixes --filter='cat | ssh root@$remote_ip "gzip >> /root/myfilecopy.gz"' /dev/myblockdev

First of all: cat is useless here.
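split feeds each chunk to the filter's standard input, so the first command in the filter (ssh in your case) can read it directly. Here is a minimal local sketch, assuming GNU coreutils split (which provides --filter) and using wc -c as a stand-in for ssh. It shows that putting cat in front changes nothing:

```shell
#!/bin/sh
# Local demo: wc -c stands in for the ssh command (assumption for illustration).
tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/input"    # 10 bytes -> chunks of 4, 4 and 2 bytes

# Each filter receives one chunk on stdin; cat only copies it through unchanged.
with_cat=$(split --bytes 4 --filter='cat | wc -c' "$tmp/input")
without_cat=$(split --bytes 4 --filter='wc -c' "$tmp/input")

printf 'with cat:\n%s\nwithout cat:\n%s\n' "$with_cat" "$without_cat"
```

Applied to your command, the filter would simply start with ssh: --filter='ssh root@$remote_ip "gzip >> /root/myfilecopy.gz"'.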

Next, I think that appending the output of consecutive gzip processes to the same file (>>) undoes the work of split. In theory, what you end up with is just gzipped /dev/myblockdev, exactly as if you did:

ssh root@$remote_ip 'gzip > /root/myfilecopy.gz' < /dev/myblockdev
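The "in theory" part relies on gzip's multi-member format: several gzip streams concatenated into one file decompress to the concatenation of their inputs. A small local sketch (the file name is arbitrary):

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Two separate gzip runs appending to one file, as consecutive split filters would.
printf 'part one\n' | gzip >> "$tmp/combined.gz"
printf 'part two\n' | gzip >> "$tmp/combined.gz"
# gzip -d decompresses every member in turn and concatenates the results.
result=$(gzip -dc "$tmp/combined.gz")
printf '%s\n' "$result"
```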

In practice I would worry about a race condition. I do expect split to run the ssh processes in sequence on the local side, but I wouldn't be surprised if, in some circumstances, buffers or lags on the remote side caused the next gzip to begin writing before the previous one had finished. This would corrupt myfilecopy.gz. Opening the file just once prevents this.

If you used $FILE (see man 1 split) and wrote to multiple files, I would see the point of using split. Note this wouldn't introduce a race condition because each of these files would be opened just once.
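A sketch of the $FILE variant, assuming GNU split (which sets FILE in each filter's environment). It runs locally so it can be tried without a remote host; the remote form shown in the comment uses a hypothetical path.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'abcdefghij' > input

# One output file per chunk (x00.gz, x01.gz, x02.gz); each is opened exactly once.
# Remote form (hypothetical path):
#   --filter='gzip | ssh root@remote_ip "cat > /root/$FILE.gz"'
split --bytes 4 --numeric-suffixes --filter='gzip > "$FILE.gz"' input

ls x*.gz
# Concatenating the pieces in suffix order restores the original data.
restored=$(cat x*.gz | gzip -dc)
```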

Conclusion: split is probably useless, cat is useless for sure; careless appending may corrupt the resulting file.

The issue you asked about

OK, let's assume you have your reasons to use split and >> this way (your cat is still useless though).

Your current shell knows the value of $remote_file, but split and its filter(s) are child processes, and they won't inherit the variable unless you export it beforehand. A nonexistent variable expands to an empty string, so the command string the remote shell receives is just gzip >> with nothing after it. Hence the syntax error near the unexpected token newline.
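A minimal way to see this, using sh -c as a stand-in for the shell that runs split's filter:

```shell
#!/bin/sh
remote_file=/root/myfilecopy.gz

# Not exported yet: the child shell sees an empty value.
before=$(sh -c 'printf "[%s]" "$remote_file"')

export remote_file
# Exported: the child shell (like split's filter) now inherits it.
after=$(sh -c 'printf "[%s]" "$remote_file"')

printf 'before export: %s\nafter export:  %s\n' "$before" "$after"
```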

The same applies to $remote_ip. I guess in your code you don't use $remote_ip but the actual IP address 192.168.0.105. From now on I use remote_ip as a placeholder (i.e. not a shell variable) for the actual IP address.

To export:

remote_file="/root/myfilecopy.gz"
export remote_file
# now your split command should utilize the variable as you expected
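A related option: a VAR=value prefix on the command line places the variable in the environment of that one command, and therefore of split's filters, without exporting it from your shell. The sketch below runs locally, assuming GNU split, with a temporary file standing in for the remote /root/myfilecopy.gz (an assumption for the demo):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/input"
target="$tmp/myfilecopy.gz"    # local stand-in for the remote path

# remote_file is set only in split's environment; the single-quoted filter
# is expanded by the filter's shell, which inherits the variable from split.
remote_file=$target split --bytes 4 --numeric-suffixes \
    --filter='gzip >> "$remote_file"' "$tmp/input"

# Decompresses all three appended gzip members in order.
gzip -dc "$target"
```

After this command, $remote_file is still unset in the calling shell.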

Alternatively, you can make your current shell expand $remote_file at the moment you run split. In your original command, the current shell leaves the $remote_file string intact because of the single quotes surrounding it. The following syntax changes the type of quotes just for this one variable:

split --bytes 10M --numeric-suffixes --filter='ssh root@remote_ip "gzip >> '"$remote_file"'"' /dev/myblockdev
#                                                     close a single quote ^              ^ and open again
#                                             variable inside double quotes ^^^^^^^^^^^^^^

This way split never sees the literal string $remote_file; it gets the variable's value.
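This is easy to check by handing the same quoted word to printf instead of split:

```shell
#!/bin/sh
remote_file=/root/myfilecopy.gz
# Same quoting as in the split command; printf shows the single argument
# that the current shell builds from it.
arg=$(printf '%s' --filter='ssh root@remote_ip "gzip >> '"$remote_file"'"')
printf '%s\n' "$arg"
# → --filter=ssh root@remote_ip "gzip >> /root/myfilecopy.gz"
```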

Another issue

If you had remote_file="/root/myfile copy.gz", the space in the path would split it into two arguments on the remote side. For this reason, a more robust approach adds another level of quoting. This is the above "no export" approach with extra quotes (escaped with \):

split --bytes 10M --numeric-suffixes --filter='ssh root@remote_ip "gzip >> \"'"$remote_file"'\""' /dev/myblockdev

Let's unpack it step by step. Among other arguments, split would see the following as a single one:

--filter=ssh root@remote_ip "gzip >> \"/root/myfile copy.gz\""

It would run this filter in bash:

ssh root@remote_ip "gzip >> \"/root/myfile copy.gz\""

Then ssh would see this as one of its arguments:

gzip >> "/root/myfile copy.gz"

So on the remote side the redirection would be to "/root/myfile copy.gz". Without these added quotes you would have:

gzip >> /root/myfile copy.gz

which is equivalent to

gzip copy.gz >> /root/myfile
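You can reproduce the difference locally by letting sh -c play the role of the remote shell that receives ssh's command string. This assumes the remote login shell parses it the way a POSIX sh does; the paths live in a temporary directory and are illustrative only:

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
remote_file="$tmp/myfile copy.gz"

# Without inner quotes: sh parses "gzip >> .../myfile copy.gz", so the
# redirection goes to ".../myfile" and "copy.gz" becomes gzip's file operand
# (which does not exist, so gzip fails and myfile stays empty).
echo data | sh -c "gzip >> $remote_file" 2>/dev/null

# With inner quotes: the whole path is a single redirection target.
echo data | sh -c "gzip >> \"$remote_file\""

ls
```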

Additional notes

If the local CPU is fast enough, consider running gzip before ssh. This way you push less data through the network link(s), especially if you prepared your block device to compress well.
If your intention was to run multiple gzip processes to take advantage of multiple CPU cores, note that they would probably run in sequence anyway; if they did not, you would get a race condition, which you don't want. Get familiar with pigz.
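Both notes can be combined into one pipeline: compress locally (with pigz if it is installed, since it spreads gzip-compatible compression across cores, and with plain gzip otherwise), then let ssh carry the already compressed stream into a file opened once. A sketch; the function name copy_gz_over_ssh and its arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy_gz_over_ssh SOURCE USER@HOST REMOTE_PATH
# Compresses SOURCE locally and writes the result to REMOTE_PATH on the host.
copy_gz_over_ssh() {
    if command -v pigz >/dev/null 2>&1; then
        compressor=pigz     # parallel, gzip-compatible output
    else
        compressor=gzip
    fi
    # The remote file is opened once, so there is no append race.
    # The escaped inner quotes keep a path with spaces in one piece remotely.
    "$compressor" -c < "$1" | ssh "$2" "cat > \"$3\""
}

# Example (placeholders as in the answer):
# copy_gz_over_ssh /dev/myblockdev root@remote_ip /root/myfilecopy.gz
```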
