I have a file with 120k lines. Each line has to be processed by an external application. Currently I start a new subprocess for every line and send the line to its stdin. The application takes at least a second to start, and that is a real bottleneck.
I am looking for a way to start the process once and send data to it line by line.
My current code:
    from subprocess import Popen, PIPE, TimeoutExpired

    # Not pictured: the loop that iterates over all lines.
    # Here, text is the line I need to pass to the application.
    pdebug("Sending to tomita:\n----\n", text, "\n----")
    try:
        p = Popen(['tomita/tomitaparser.exe', "tomita/config.proto"], stdout=PIPE, stdin=PIPE, stderr=PIPE)
        stdout_data, stderr_data = p.communicate(input=bytes(text, 'UTF-8'), timeout=45)
        pdebug("Tomita returned stderr:\n", "stderr: "+stderr_data.decode("utf-8").strip()+"\n" )
    except TimeoutExpired:
        p.kill()
        pdebug("Tomita killed")
    stdout_data = stdout_data.decode("utf-8")
    facts = parse_tomita_output(stdout_data)
    pdebug('Received facts:\n----\n',str(facts),"\n----")
The code I tried recently:
    try:
        p = Popen(['tomita/tomitaparser.exe', "tomita/config.proto"], stdout=PIPE, stdin=PIPE, stderr=PIPE)
        for news_line in news:
            pdebug("Sending to tomita:\n----\n", news_line.text, "\n----")
            stdout_data, stderr_data = p.communicate(input=bytes(news_line.text, 'UTF-8'), timeout=45)
            pdebug("Tomita returned stderr:\n", stderr_data.decode("utf-8").strip() + "\n")
            stdout_data = stdout_data.decode("utf-8")
            facts = parse_tomita_output(stdout_data)
            pdebug('Received facts:\n----\n', str(facts), "\n----")
            news_line.grammemes = facts
    except TimeoutExpired:
        p.kill()
        pdebug("Tomita killed due to timeout")
This second version fails on the second iteration of the loop with this error:
    ValueError: Cannot send input after starting communication
So is there a way to launch the exe once and then, for each line, send the input to stdin, flush, read the result from stdout, and repeat?
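For illustration, the interaction I have in mind can be sketched as below. This is only a sketch under assumptions: the child is a hypothetical stand-in (a small Python script held in `CHILD_SOURCE`), not tomitaparser, and it is assumed to write exactly one output line per input line and flush it immediately. I have not confirmed that tomitaparser behaves this way; if it buffers its output or emits several lines per input, `readline()` would block or return partial results. `process_lines` is a name made up for this example.

```python
import subprocess
import sys

# Hypothetical stand-in for the external application: it reads stdin line by
# line, writes one upper-cased line per input line, and flushes each time.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def process_lines(lines):
    """Start the child once, then exchange one line at a time with it."""
    proc = subprocess.Popen(
        # -X utf8 makes the child's stdin/stdout UTF-8 on every platform.
        [sys.executable, "-X", "utf8", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
        bufsize=1,  # line-buffered on the parent side
    )
    results = []
    try:
        for line in lines:
            # Send one line, terminated by a newline, and push it to the child.
            proc.stdin.write(line + "\n")
            proc.stdin.flush()
            # Assumption: exactly one output line comes back per input line.
            results.append(proc.stdout.readline().rstrip("\n"))
    finally:
        # Closing stdin signals end of input so the child can exit.
        proc.stdin.close()
        proc.wait(timeout=45)
        proc.stdout.close()
    return results


if __name__ == "__main__":
    print(process_lines(["first line", "second line"]))
```

stderr is deliberately left unpiped in this sketch, since a stderr pipe that nobody reads can fill up and stall the child. Note also that `readline()` has no timeout, so a child that never answers would hang the loop.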
 
     
    