6

I have a huge data source that I'm filtering using some greps.

Here's basically what I'm doing right now:

#!/bin/bash
param1='something'
param2='another'
param3='yep'
echo $(avro-read /log/huge_data | grep $param1 | grep "$param2-" | grep $param3 | wc -l) / $(avro-read /log/ap/huge_data | grep $param1 | grep -v "$param2-" | grep $param3 | wc -l) | bc -l

Notice how I'm doing mostly the same filtering twice (with a single difference the second time), taking the count of each, and dividing one count by the other. This is definitely hacky, but I'd like to speed it up a bit by performing the initial filtering only once, without using a temp file.

I tried using a fifo, but I'm not sure it's possible to have two processes in one script read from it, or to have a third process "wait" until both are done before computing the final result. I also looked into tee, but again I wasn't sure how to synchronize the resulting subprocesses.
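For concreteness, the kind of tee split I was looking at, with seq standing in for the real avro-read stream and a throwaway pattern (all hypothetical):

```shell
#!/bin/bash
# One pass over the data, two counting subshells writing to files.
# The problem: the main script doesn't wait for the >(...) subshells,
# so the final echo can run before the count files are written.
seq 1 10 | tee \
    >(grep -c '1' > with_count) \
    >(grep -vc '1' > without_count) \
    > /dev/null
echo "$(cat with_count) / $(cat without_count)" | bc -l
```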

EDIT: Solved this myself using https://superuser.com/a/561248/43649, but marked another suggestion as the answer.

Andrew
  • 351

3 Answers

3

If you just want to avoid creating temporary files (or storing the output of grep in a variable), you can feed it to a for loop like this:

#!/bin/bash

IFS=$'\n'
yay=0
nay=0

for line in $(avro-read /log/huge_data | grep $param1 | grep $param3); do
    [[ $line =~ $param2- ]] && yay=$((yay + 1)) || nay=$((nay + 1))
done

echo $yay / $nay \* 100 | bc -l

unset IFS
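To sanity-check the loop, here's the same counting logic run against a synthetic seq stream instead of avro-read (the digit 3 is just a stand-in for your real filters):

```shell
#!/bin/bash
# Same yay/nay counting loop, fed with numbers 1..50; "yay" lines
# are those containing the digit 3 (3, 13, 23, 30-39, 43).
IFS=$'\n'
yay=0
nay=0
for line in $(seq 1 50); do
    [[ $line =~ 3 ]] && yay=$((yay + 1)) || nay=$((nay + 1))
done
unset IFS
echo "$yay / $nay * 100" | bc -l
```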

I've created a modified version of the approach in your self-answer that won't require temporary files:

#!/bin/bash

(avro-read /log/huge_data | grep $param1 | grep $param3 | tee \
     >(echo yay=$(grep -c "$param2-")) \
     >(echo nay=$(grep -vc "$param2-")) \
     >/dev/null | cat ; echo 'echo $yay / $nay \* 100 | bc -l') | sh

The outputs of the individual grep -c commands and of the final echo command are printed as

yay=123
nay=456
echo $yay / $nay \* 100 | bc -l

to avoid race conditions.¹ Piping to sh executes the printed commands.

¹ Whichever grep -c command finishes first will print the first line of output.

Dennis
  • 50,701

2

I ended up solving this like so:

#!/bin/bash
param1='something'
param2='another'
param3='yep'

avro-read /log/huge_data | grep $param1 | grep $param3 \
| tee \
>(grep "$param2-" | wc -l | tr -d '\n' > has_count) \
>(grep -v "$param2-" | wc -l | tr -d '\n' > not_count) \
> /dev/null

echo $(cat has_count | tr -d '\n') '/' $(cat not_count | tr -d '\n') '* 100' | bc -l

So rather than relying on a fifo, I used tee to split the stream into two separate processes that each just output a count. This way I don't need to synchronize the two processes myself before dividing the counts.
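As a sanity check, here's the same split against a synthetic seq stream instead of avro-read (the digit 1 is a stand-in pattern). The one wrinkle is that bash doesn't wait for the >(...) subshells, so for this check I've added a trailing | cat, which only reaches EOF once both subshells have finished writing their count files:

```shell
#!/bin/bash
# Split one pass of the stream into two counters, one file each.
# Of 1..20, eleven lines contain a '1' (1, 10-19), nine don't.
seq 1 20 | tee \
    >(grep '1' | wc -l | tr -d '\n' > has_count) \
    >(grep -v '1' | wc -l | tr -d '\n' > not_count) \
    > /dev/null | cat
echo "$(cat has_count) / $(cat not_count) * 100" | bc -l
```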

Andrew
  • 351
0

Hm, zsh has a feature called MULTIOS, which makes it possible to connect one process to two fifos. If that's an option, here's a small demo:

#!/bin/zsh -f

setopt multios

mkfifo f1 f2 2> /dev/null

param1='something'
param2='another'
param3='yep'

{ avro-read /log/huge_data | grep $param1 | grep $param3 } > f1 > f2 &

( cat f1 | grep $param2 | wc -l > value1 ) &!
value2=$(cat f2 | grep -v $param2 | wc -l)

print $(( 1. * $( cat value1 ) / $value2 ))

rm value1

However, I could not figure out a way to get around creating the temporary file value1, which should probably be avoided, as Dennis pointed out. But perhaps you'll like this solution nevertheless.

mpy
  • 28,816