I am running this Bash script:

#!/bin/bash

for i in $(seq 1 100); do
    echo $i
    sleep 1
done

When I issue kill -INT <pid> nothing happens -- the script keeps running. If I issue kill -TERM <pid>, the script and the sleep command are both killed. <pid> here refers to the PID of the script, not the sleep process.

I want to understand what the differences in signal handling here are. I understand issuing Ctrl+C will achieve the same results as sending -TERM because of the signal being sent to the process group, but I don't quite understand why issuing -INT doesn't behave the same way as issuing -TERM. From what I gather the default action of SIGINT is to terminate the process.

Some context: My use case is that this script was being started as a subprocess from another program, and that program was sending a SIGINT during shutdown, before waiting for X seconds and sending a SIGKILL. I changed the SIGINT to a SIGTERM and it just abruptly killed the process without waiting, and this got me thinking about the behavior difference here.

Giacomo1968
  • 58,727
rama
  • 51

1 Answer

What you observed is a consequence of an approach called wait and cooperative exit (WCE). It's about handling SIGINT/SIGQUIT.

You can read about WCE (and about alternative approaches) here: Proper handling of SIGINT/SIGQUIT. The article is quite comprehensive. A short description of WCE from Greg's Wiki is enough to get the idea:

bash is among a few shells that implement a wait and cooperative exit approach at handling SIGINT/SIGQUIT delivery. When interpreting a script, upon receiving a SIGINT, it doesn't exit straight away but instead waits for the currently running command to return and only exits (by killing itself with SIGINT) if that command was also killed by that SIGINT. The idea is that if your script calls vi for instance, and you press Ctrl+C within vi to cancel an action, that should not be considered as a request to abort the script.

What does it mean for your script?

If <pid> is the PID of the script (i.e. the PID of the bash process interpreting the script), then your kill -INT <pid> sends SIGINT to that process only. Ctrl+C or kill -INT -- -<pid> (a "negative" PID) would send SIGINT to the entire process group. SIGINT is by design the "interrupt from keyboard" (see man 7 signal), and Ctrl+C is the by-design way to send it. Sending SIGINT to the whole foreground process group, as opposed to a single process, is therefore the by-design usage of SIGINT.

When the bash process interpreting the script gets SIGINT while waiting for sleep 1 to return, it handles the signal and then reacts or not, depending on the reaction of sleep. In fact the shell cannot tell whether sleep is also getting the signal; it assumes so. The shell assumes the signal comes from Ctrl+C (or something equivalent) because this is the by-design usage of SIGINT, and therefore it assumes the current sleep has also got the signal.

If sleep really got the signal (i.e. if you used Ctrl+C or kill -INT -- -<pid>), it would be terminated by it. In general a process can tell whether its child was terminated by a signal, so here the shell would be able to tell that sleep was terminated by SIGINT. In that case the shell would conclude that the user pressed Ctrl+C in order to terminate the whole script, and it would kill itself with SIGINT. Note that it would not just exit: it would stop handling the signal and send the signal to itself, so that if the parent of the shell also implemented WCE, the parent would in turn react accordingly, and so on.

Your sleep did not get the signal, but the bash process interpreting the script assumed it had. The shell then observed that sleep exited normally (not terminated by the signal), so it assumed sleep had handled or ignored the signal. That is, it assumed that yes, there had been a Ctrl+C, but as a part of the normal operation of the child, not as an attempt to terminate the whole script. This is why the shell did not exit: it continued interpreting the script as if nothing had happened, because sleep had exited as if nothing had happened.
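This can be reproduced without a second terminal. The sketch below is not your exact setup: the loop is shortened to two iterations, and a background helper (added purely for the demo) sends SIGINT to the PID of the shell only, so sleep never sees it.

```shell
#!/bin/sh
# Sketch: SIGINT delivered to the shell alone does not stop the loop.
demo_int_to_shell_only() {
    bash -c '
        # Helper: after 1 second, signal only the shell. Inside the
        # subshell, $$ still expands to the PID of the main bash.
        ( sleep 1; kill -INT "$$" ) &

        for i in 1 2; do
            echo "$i"
            sleep 2     # never receives the SIGINT, so it exits normally
        done
    '
}

output=$(demo_int_to_shell_only)
status=$?
printf '%s\n' "$output"
echo "exit status: $status"
```

The loop should run to completion and the exit status should be 0, matching what you saw with kill -INT <pid>.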

In other words a parent implementing WCE reacts to SIGINT/SIGQUIT according to how the currently running child reacts. By sending SIGINT to bash and not sending to sleep, you kinda tricked the shell: you made it assume that sleep had handled or ignored the signal and everything is fine, thus exiting the whole script would be the Wrong Thing.

I changed the SIGINT to a SIGTERM and it just abruptly killed the process without waiting

WCE is about handling SIGINT/SIGQUIT. There is no such mechanism (nor any need for it, I think) in case of SIGTERM, so this signal "just works".

Giacomo1968
  • 58,727