I'm trying to do the right thing by porting a Python script that invokes a number of shell command lines via
subprocess.call(... | ... | ... , shell=True) 
to one that avoids the security risk of shell=True by using Popen. So I have written a little sample script to try things out. It executes the command line
awk '{print $1 " - " $2}' < scores.txt | sort | python uppercase.py > teams.txt
as follows:
from subprocess import Popen, PIPE

with open('teams.txt', 'w') as destination:
    with open('scores.txt', 'r') as source:
        p3 = Popen(['python', 'uppercase.py'], stdin=PIPE, stdout=destination)
        p2 = Popen(['sort'], stdin=PIPE, stdout=p3.stdin)
        p1 = Popen(['awk', '{print $1 " - " $2}'], stdin=source, stdout=p2.stdin)
        p1.communicate()
This program works with a small data set.
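For context, here is a self-contained variant of my script that I've seen suggested: the parent closes its copies of the pipe write ends and then wait()s, so the data streams directly between the child processes instead of being buffered in Python. (I'm substituting tr for my uppercase.py here just to keep the example runnable; I'm not sure this is the blessed approach.)

```python
from subprocess import Popen, PIPE

# Sample input file (made up for this sketch).
with open('scores.txt', 'w') as f:
    f.write('tigers 12\nbears 7\n')

with open('teams.txt', 'w') as destination, open('scores.txt') as source:
    p3 = Popen(['tr', 'a-z', 'A-Z'], stdin=PIPE, stdout=destination)
    p2 = Popen(['sort'], stdin=PIPE, stdout=p3.stdin)
    p1 = Popen(['awk', '{print $1 " - " $2}'], stdin=source, stdout=p2.stdin)
    # Close the parent's handles on the pipes: each pipe then has exactly
    # one writer (the upstream child), so EOF propagates when it exits.
    p2.stdin.close()
    p3.stdin.close()
    # wait() rather than communicate(): nothing is read into this process,
    # so memory use stays constant regardless of file size.
    for p in (p1, p2, p3):
        p.wait()
```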
Now I was struck by the following line from the documentation of the communicate method:
Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
What?  But I have huge files that need to be awk'd and sorted, among other things.  The reason I tried to use communicate in the first place is that I saw this warning for subprocess.call:
Note Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
I'm really confused. It seems my choices are:
- use call with shell=True (a security risk, they say)
- use PIPE with call (but then risk deadlock)
- use Popen and communicate() (but my data is too large, hundreds of megabytes).
What am I missing? How do I create a multi-process pipeline in Python for very large files without shell=True, or is shell=True acceptable here?