I have a postgres started like this:
exec 3< <(su -l postgres -c "/usr/local/pgsql/bin/postmaster -p '$port' -d 3 -D '$backupDir' 2>&1 & echo \$!")
This redirects all logs to the socket filedescriptor 3
postgres postgres 64 Mar 19 16:32 3 -> socket:[7119806]
Then I read from it:
while [ true ]
do
read -u 3 line
... //Listening for an event, then kill
done
Which works fine
Once a certain event happens, I stop postgres with "kill "
and normally it shuts down and all is good. But once in a while the shutdown is not working. So to see what happens I start listening to that socket again after the kill
And in that case I always get the same line back, that was the last one I got before. Why? I would expect it to block if nothing new is available on that socket.
Postgres: 2021-03-18 23:52:22.234 GMT [4775] DEBUG: switched WAL source from archive to stream after failure
Postgres: 2021-03-18 23:52:22.234 GMT [4775] DEBUG: switched WAL source from archive to stream after failure
Postgres: 2021-03-18 23:52:22.234 GMT [4775] DEBUG: switched WAL source from archive to stream after failure
Postgres: 2021-03-18 23:52:22.234 GMT [4775] DEBUG: switched WAL source from archive to stream after failure
UPDATE:
Ok, if I do a manual cat on the filedescriptor cat /proc/13458/fd/3, I get all the messages I would like to have gotten, and the process finally exits cleanly as it should. So the Postgres process seems to be stuck trying to write to a receiver and that is why it's stuck.
Still question is, why my attempt to reiceve the data again fails. This behaviour was introduced after an update of our centos 7. Befora that this never happened.
UPDATE2
I did an strace on the script as suggested. Here is the loop that the script is caught in and the stack trace. It begins with the sleep and ends with the sleep. I also tried without -t 0, same result.
while [ true ]
do
#Checking if postgres is still running
ps -q $postgresPid aux | tail -n1
if [ $? -ne 0 ]
then
break
else
#Try reading postgres logs
lastLine=""
line=""
while [ true ]
do
read -t 0 -u 3 line
if [ $? -ne 0 ]
then
break
fi
if [ "$lastLine" == "$line" ]
then
break
fi
lastLine=$line
echo "Postgres: $line"
done
sleep 5
fi
done
Output in one cycle (Always the same)
2021-03-19 21:33:45: postgres 27807 0.0 0.0 410588 19864 ? S 17:02 0:00 /usr/local/pgsql/bin/postmaster -p 57943 -d 3 -D /tank/backups/cv17/databases/postgres/data
2021-03-19 21:33:45: Postgres: 2021-03-19 17:39:20.733 GMT [27808] DEBUG: switched WAL source from archive to stream after failure
STRACE:
strace: Process 9703 attached
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 350
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=350, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffd1dbd6a50, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
pipe([4, 5]) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1686
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1691
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(4) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1691
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1686
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
close(4) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1691, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffd1dbd6c50, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
fcntl(3, F_GETFD) = 0
select(4, [3], NULL, [3], {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "Postgres: 2021-03-19 17:39:20.73"..., 111) = 111
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
fcntl(3, F_GETFD) = 0
select(4, [3], NULL, [3], {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1734
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1734
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1734, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffd1dbd6a50, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
pipe([4, 5]) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1903
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1904
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(4) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1904
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 1903
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
close(4) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1904, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffd1dbd6c50, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn({mask=[]}) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
fcntl(3, F_GETFD) = 0
select(4, [3], NULL, [3], {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "Postgres: 2021-03-19 17:39:20.73"..., 111) = 111
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
fcntl(3, F_GETFD) = 0
select(4, [3], NULL, [3], {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f915b541a10) = 1905
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, {sa_handler=0x43e910, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f915ab7f450}, 8) = 0
wait4(-1,
Update 3
Ok part of my issue is, that indeed doing exec 3< <(command) will block after some lines if no-one consumes the output offered at file descriptor 3. And this buffer size might be what changed after my update and caused my initial problem as I stopped consuming after sending the kill command.
I still have no clue why taking up the consuming after the kill fails, but I guess I can now rewrite my code to not really stop cosnuming in the first place.