
I'm currently testing GNU Parallel to distribute a compare command across multiple servers using bash. In its most basic form, this compare command takes two inputs to compare (Oracle database accessions) and requires an output filename via -o. The program requires at least one action: load, save, or direct upload.

compare -o cmp.input1.input2.dat Input1 Input2

I have a few thousand of these input pairs, so I created a file listing all the combinations, with each line holding the output filename and the database identifiers the program requires:

#test_parallel
-o cmp.input1.input2.dat Input1 Input2
-o cmp.input1.input3.dat Input1 Input3
-o cmp.input2.input3.dat Input2 Input3
[...]

I then run the command through parallel, but the compare command fails:

parallel -a test_parallel "compare {}"
ERROR: No action specified for results (load, save or direct upload)
usage: compare [-u][-o <file>] query target

With --dryrun, this is what parallel executes:

compare -o\ cmp.input1.input2.dat\ Input1\ Input2

For some reason I don't understand, the compare program does not handle the escaped whitespace correctly. Executing this command directly in bash results in the exact same error message. Removing only the escape after the -o flag (I could move the -o into the parallel command) results in a "too many arguments" error. Removing all escapes executes the command as expected.

Is it possible to tell parallel not to escape the arguments in the command it calls? I can't find anything in the documentation, except that this is the expected default behavior, as indicated by parallel --shellquote.

1 Answer


GNU Parallel treats each input line as a single argument and quotes it, so you can safely use filenames like:

My brother's 12" records costs 30$ each.txt

In your case you want the argument to be parsed by the shell, so the spaces will be unquoted:

parallel -a test_parallel eval compare {}
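
To see why eval helps, here is a minimal bash sketch that does not need GNU Parallel at all. count_args is a hypothetical stand-in for compare (it is not part of either program) and only reports how many arguments it received. The escaped command lines are the same shape as the one shown by --dryrun in the question.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for compare:
# it only prints the number of arguments it was called with.
count_args() { echo "$#"; }

# The shape GNU Parallel generates by default: every space is escaped,
# so the whole input line arrives as ONE argument.
count_args -o\ cmp.input1.input2.dat\ Input1\ Input2        # prints 1

# With eval in front, the shell parses the line a second time.
# The first pass removes the backslashes, the second pass splits on spaces.
eval count_args -o\ cmp.input1.input2.dat\ Input1\ Input2   # prints 4
```

Keep in mind that eval re-parses everything on the line, so characters such as ; or $( ) in the input file would be interpreted by the shell as well. That is fine for a file you generated yourself, less so for untrusted input.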

Or you can split on space:

parallel --colsep ' ' -a test_parallel compare {1} {2} {3} {4}

But since you want to compare all vs. all you can do it much more elegantly:

parallel cmp -o ../out/cmp.{1}.{2} {1} {2} ::: Input* ::: Input*

This will compare all Input* to all Input*. With --results you can get the outputs nicely structured in a dir:

parallel --results out/ cmp {1} {2} ::: Input* ::: Input*

But if you want to skip running cmp InputY InputX after you already ran cmp InputX InputY then you can do this:

parallel --results out/ cmp {=1' $arg[1] ge $arg[2] and $job->skip();' =} {2} ::: Input* ::: Input*
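
The skip rule above keeps a pair only when the first input sorts strictly before the second. As a rough illustration in plain bash (no GNU Parallel involved), the sketch below applies the same rule with nested loops. unique_pairs is a hypothetical helper name, and bash's [[ < ]] compares strings using the current locale, which may order unusual names differently from Perl's ge, so treat it as an approximation.

```shell
#!/usr/bin/env bash
# unique_pairs (hypothetical helper) prints each unordered pair once,
# mimicking the rule "skip the job when arg1 ge arg2".
unique_pairs() {
  local a b
  for a in "$@"; do
    for b in "$@"; do
      # Keep the pair only when a sorts strictly before b (string comparison).
      if [[ "$a" < "$b" ]]; then
        printf '%s %s\n' "$a" "$b"
      fi
    done
  done
}

unique_pairs Input1 Input2 Input3
```

For three inputs this leaves three pairs out of the nine combinations, the same pairs that were listed by hand in test_parallel.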

Edit:

Version 20190722 introduces the function uq.

parallel -a test_parallel compare {=uq=}

uq is a Perl function. When it is called, GNU Parallel refrains from quoting that replacement string, so you can mix quoted and unquoted replacement strings:

parallel echo {} = {=uq=} ::: \$PWD
# You can change $_ if you want: uq() is a normal perl function
parallel echo {}ME = '{=uq(); $_.="ME"=}' ::: \$HO \$LOGNA
Ole Tange