
I need to do something similar to this question, except that in that question the OP just concatenates the outputs of command2 and command3, whereas I need them to be handed over separately, like this:

             command2 [stream A]
            /                    \
    command1                      join -j1 [stream A] [stream B]
            \                    /
             command3 [stream B]

(here, join is the coreutils join utility, the only one I've named explicitly, to make it clear that I don't want streams A and B to be merged indiscriminately)

I've tried this:

command1 | tee >(command2 >&3- ) >(command3 >&4- ) >/dev/null | join -j1 /dev/fd/3 /dev/fd/4

But bash rightfully complains:

bash: 3: Bad file descriptor
bash: 4: Bad file descriptor

(because file descriptors 3 and 4 are not open yet)

I think I need to instruct bash to somehow make two extra pipe(2) calls (like it does when piping fd 1 of the left command to fd 0 of the right command in left | right). As each such pipe(2) call creates two fds (one for writing and the other for reading), I need to:

  • for command2 close the reading end, and redirect stdout to the writing end (of pipe1);
  • for command3 close the reading end, and redirect stdout to the writing end (of pipe2);
  • for join close both writing ends and instruct it to open /dev/fd/reading-end-of-pipe1 and /dev/fd/reading-end-of-pipe2

I don't necessarily have write access to any path in this environment, which precludes the use of mkfifo.
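
Just to illustrate the topology I'm after: if I did have a writable path, I believe it could be wired with named FIFOs like this (the FIFO names are made up for illustration, and the usual FIFO buffering caveats apply):

    dir=$(mktemp -d)                    # needs a writable path -- the blocker here
    mkfifo "$dir/c1copy" "$dir/streamB"
    # tee duplicates command1's output: one copy flows on to command3,
    # whose output (stream B) lands in the second FIFO.
    command1 | tee "$dir/c1copy" | command3 > "$dir/streamB" &
    # command2 reads the duplicate; its output (stream A) becomes join's stdin.
    command2 < "$dir/c1copy" | join -j1 - "$dir/streamB"
    wait
    rm -r "$dir"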


1 Answer


In theory you can do this with a coprocess. Either I'm too dumb, or managing descriptors like ${COPROC[1]} (so they are available where I want them and closed where and when I want them closed) is really cumbersome; I tried this and I failed. I found a less cumbersome way, but it requires /proc.

The version of my Bash is 5.0.3.

It's relatively easy to use coproc just to set up the relevant pipes and then use /proc/…/fd/… where needed.

  1. Start a coprocess:

    coproc command2
    
  2. Build the rest of the piping. There's a quirk: I expect command2 to be a filter that exits only after its stdin or stdout is closed. The shell holds both of these open, so if it detaches before anything else has attached to them, the coprocess will be terminated; we therefore cannot detach until the processes that are actually going to use the coprocess have attached. On the other hand, as long as the shell stays attached, the coprocess will never see EOF, and the other processes will most likely wait for it. This means we must run them asynchronously with the shell and detach the shell later, otherwise the whole setup may block. Build the rest of the piping and run it asynchronously:

    command1 | tee "/proc/$COPROC_PID/fd/0" | command3 | join -j1 - "/proc/$COPROC_PID/fd/1" &
    
  3. Detach the shell from the coprocess. In my tests it was enough to close the descriptor whose number is available as ${COPROC[1]}. This is how you do it:

    exec {COPROC[1]}>&-
    

    Note the above syntax does not actually use ${COPROC[1]} with a $, which is quite unintuitive. Before you close the descriptor you can check its number: echo "${COPROC[1]}". Suppose it prints 60. exec 60>&- is a valid command, but don't try exec "${COPROC[1]}">&-. The latter syntax will "work" like exec 60 >&- and it will close the stdout of the shell! (A short self-contained demo of these descriptors follows after this list.)

  4. Now you can wait for the pipeline; or you can fg it if job control is enabled.
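
To make the descriptors from steps 1 and 3 less mysterious, here is a tiny self-contained demo. It uses sed as a stand-in filter; the -u (unbuffered) flag is assumed to be GNU sed, so each line comes back immediately:

    coproc sed -u 's/^/sed: /'         # a filter that exits once its stdin closes
    echo "write fd: ${COPROC[1]}, read fd: ${COPROC[0]}, pid: $COPROC_PID"
    echo hello >&"${COPROC[1]}"        # feed one line to the coprocess
    read -r line <&"${COPROC[0]}"      # read the transformed line back
    echo "$line"                       # prints: sed: hello
    exec {COPROC[1]}>&-                # close the write end; sed sees EOF and exits
    wait "$COPROC_PID"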

Notes:

  • If job control is enabled (it is by default in interactive Bash) and command1 wants to read something from the terminal, then it won't be able to until you fg the job. It's different when job control is disabled, e.g. in a script: some shells then redirect the stdin of command1 to /dev/null or some equivalent file; this is how & works. Bash doesn't do this for pipelines, though.

  • Unfortunately there is a race condition: exec {COPROC[1]}>&- may be executed and the coprocess terminated before our asynchronous commands manage to open /proc/$COPROC_PID/fd/…. A non-elegant "fix" is to sleep for a while; I don't like this very much, since in theory a delay does not guarantee anything, it just reduces the probability of failure. A robust fix is to open the pipes in a subshell that is going to exec to a command that opens …/fd/0 anyway, and then signal the main shell that it's safe to close its descriptor.

    coproc command2
    trap 'exec {COPROC[1]}>&-' USR1   # detach only when told it's safe
    # The subshell grabs both ends of the coprocess's pipes (fds 3 and 4),
    # signals the main shell that it may now detach, and only then execs tee.
    command1 | ( exec 3>"/proc/$COPROC_PID/fd/0" 4<"/proc/$COPROC_PID/fd/1"; kill -s USR1 "$$"; exec tee "/proc/$COPROC_PID/fd/0" ) | command3 | join -j1 - "/proc/$COPROC_PID/fd/1" &
    wait
    wait
    

    We need two waits because the first wait will probably be interrupted by the signal; see the short demonstration below.
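
The two-wait dance can be reproduced in isolation; nothing in this toy example is specific to coproc, and the sleep is only there to order the events:

    trap 'echo "got USR1"' USR1
    ( sleep 1; kill -s USR1 "$$" ) &   # stand-in for the subshell's kill
    wait                               # interrupted by the trapped signal
    echo "first wait returned $?"      # >128, i.e. 128 + the signal number
    wait                               # actually reaps the background job
    echo "second wait returned $?"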


Proof of concept

#!/bin/bash
set +m   # job control explicitly disabled
coproc sed 's/$/ added_by_sed/'
trap 'exec {COPROC[1]}>&-' USR1
echo 'Type few lines and press Ctrl+d (twice if needed).'
cat -n | ( exec 3>"/proc/$COPROC_PID/fd/0" 4<"/proc/$COPROC_PID/fd/1"; kill -s USR1 "$$"; exec tee "/proc/$COPROC_PID/fd/0" ) | awk '{print $0,"added_by_awk"}' | join -j1 - "/proc/$COPROC_PID/fd/1" &
wait
wait
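
If you type foo and press Ctrl+d, the joined output should look something like this (cat -n supplies the join field, awk tags the copy that stayed in the pipeline, sed tags the copy that went through the coprocess):

    1 foo added_by_awk foo added_by_sed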