I have been trying to execute piped commands via the subprocess module, but am having some issues.
I have seen the solutions proposed below, but none have solved my problem:
- sending a sequence (list) of arguments
- several Popen commands using subprocess.PIPE
- sending a string with shell=True
I would like to avoid the third option, with shell=True, although it did produce the expected results on my test system.
Here is the command that works in Terminal, which I would like to replicate:
tr -c "[:alpha:]" " " < some\ file\ name_raw.txt | sed -E "s/ +/ /g" | tr "[:upper:]" "[:lower:]" > clean_in_one_command.txt
This command cleans files as required. It first runs tr on an input file whose name contains spaces, replacing every non-alphabetic character with a space. The output is piped to sed, which collapses each run of spaces into a single space, and then to tr again, which converts everything to lower case.
After several iterations, I ended up breaking it all down into the simplest form I could, implementing the second method above: several instances of Popen, passing information using subprocess.PIPE. It is long-winded, but will hopefully make debugging easier:
from subprocess import run, Popen, PIPE
cmd1_func = ['tr']
cmd1_flags = ['-c']
cmd1_arg1 = [r'"[:alpha:]\"']
cmd1_arg2 = [r'" "']
cmd1_pass_input = ['<']
cmd1_infile = ['some file name_raw.txt']
cmd1 = cmd1_func + cmd1_flags + cmd1_arg1 + cmd1_arg2 + cmd1_pass_input + cmd1_infile
print("Command 1:", cmd1) # just to see if things look fine
cmd2_func = ['sed']
cmd2_flags = ['-E']
cmd2_arg = [r'"s/ +/ /g\"']
cmd2 = cmd2_func + cmd2_flags + cmd2_arg
print("command 2:", cmd2)
cmd3_func = ['tr']
cmd3_arg1 = ["\"[:upper:]\""]
cmd3_arg2 = ["\"[:lower:]\""]
cmd3_pass_output = ['>']
cmd3_outfile = [output_file_abs]
cmd3 = cmd3_func + cmd3_arg1 + cmd3_arg2 + cmd3_pass_output + cmd3_outfile
print("command 3:", cmd3)
# run first command into first process
proc1, _ = Popen(cmd1, stdout=PIPE)
# pass its output as input to second process
proc2, _ = Popen(cmd2, stdin=proc1.stdout, stdout=PIPE)
# close first process
proc1.stdout.close()
# output of second process into third process
proc3, _ = Popen(cmd3, stdin=proc2.stdout, stdout=PIPE)
# close second process output
proc2.stdout.close()
# save any output from final process to a logger
output = proc3.communicate()[0]
I would then simply write the output to a text file, but the program doesn't get that far, because I receive the following error:
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
sed: 1: ""s/ +/ /g\"": invalid command code "
usage: tr [-Ccsu] string1 string2
tr [-Ccu] -d string1
tr [-Ccu] -s string1
tr [-Ccu] -ds string1 string2
This suggests that my arguments are not being passed correctly. It seems that both the ' and " quote marks are being passed into sed as ". I do actually need one of them there explicitly. If I put only one set into my list, they are stripped from the command completely, which also breaks it.
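To see what a child process actually receives, here is a minimal sketch. It assumes a hypothetical helper, child_argv, which is not part of the code above: it starts a second Python interpreter that echoes back its own arguments. Without shell=True each list element should reach the program verbatim, so any quote characters inside the string arrive as part of the argument. shlex.split is included to show how a shell would tokenize the same command line.

```python
import json
import shlex
import subprocess
import sys


def child_argv(args):
    """Hypothetical helper: return the arguments exactly as a child process sees them."""
    # The child prints its own argv (minus the '-c' entry) as JSON.
    echo = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", echo, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Quotes embedded in the Python strings are delivered to the child as literal characters.
quoted = child_argv(['"[:alpha:]"', '" "'])

# Without embedded quotes the child gets the bare character class and a single space.
plain = child_argv(["[:alpha:]", " "])

# A shell removes the quotes while tokenizing; shlex.split mimics that step.
shell_style = shlex.split('tr -c "[:alpha:]" " "')

print(quoted)
print(plain)
print(shell_style)
```

If this holds on your system, the quotes in the Terminal command are shell syntax only, and the list elements should be written without them.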
Things I have tried:
- not declaring literal strings for those strings where I need explicit quotations
- escaping and double-escaping explicit quotations
- passing the entire command as one list into the subprocess.Popen and subprocess.run functions
- playing around with the shlex package to deal with quotations
- removing the parts cmd3_pass_output = ['>'] and cmd3_outfile = [output_file_abs] so that only the raw (piped) output is dealt with
Am I missing something, or am I going to be forced to use shell=True?
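For comparison, a minimal sketch of the same three-stage pipeline without shell=True. It rests on two assumptions: the < and > redirections are shell syntax rather than arguments, so they are replaced by file objects passed as stdin and stdout; and the quote marks are dropped from the list elements. The function name clean_file and its parameters are hypothetical, and tr and sed are assumed to be on the PATH.

```python
from subprocess import Popen, PIPE


def clean_file(in_path, out_path):
    """Hypothetical wrapper around: tr -c ... < in | sed -E ... | tr ... > out"""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        # Stage 1: replace every non-alphabetic character with a space.
        # The input file stands in for the shell's '<' redirection.
        p1 = Popen(["tr", "-c", "[:alpha:]", " "], stdin=src, stdout=PIPE)

        # Stage 2: collapse runs of spaces into one space.
        p2 = Popen(["sed", "-E", "s/ +/ /g"], stdin=p1.stdout, stdout=PIPE)
        p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early

        # Stage 3: lower-case everything.
        # The output file stands in for the shell's '>' redirection.
        p3 = Popen(["tr", "[:upper:]", "[:lower:]"], stdin=p2.stdout, stdout=dst)
        p2.stdout.close()

        # Wait for the last stage first, then collect the earlier ones.
        p3.wait()
        p2.wait()
        p1.wait()
    return [p1.returncode, p2.returncode, p3.returncode]
```

A call such as clean_file('some file name_raw.txt', 'clean_in_one_command.txt') would then need no escaping of the spaces in the file name, since the name never passes through a shell.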