
The symptom is very simple. For instance:

ls | grep a | grep b | grep c | grep d

throws

-bash: child setpgid (8948 to 8943): Operation not permitted
-bash: child setpgid (8950 to 8943): Operation not permitted
-bash: child setpgid (8952 to 8943): Operation not permitted
-bash: child setpgid (8953 to 8943): Operation not permitted
-bash: child setpgid (8954 to 8943): Operation not permitted
-bash: child setpgid (8955 to 8943): Operation not permitted
-bash: child setpgid (8962 to 8957): Operation not permitted
-bash: child setpgid (8964 to 8957): Operation not permitted
-bash: child setpgid (8966 to 8957): Operation not permitted
-bash: child setpgid (8967 to 8957): Operation not permitted
-bash: child setpgid (8968 to 8957): Operation not permitted
-bash: child setpgid (8969 to 8957): Operation not permitted
-bash: child setpgid (8976 to 8971): Operation not permitted
-bash: child setpgid (8978 to 8971): Operation not permitted
-bash: child setpgid (8980 to 8971): Operation not permitted
-bash: child setpgid (8981 to 8971): Operation not permitted
-bash: child setpgid (8982 to 8971): Operation not permitted
-bash: child setpgid (8983 to 8971): Operation not permitted
-bash: child setpgid (8990 to 8985): Operation not permitted
-bash: child setpgid (8992 to 8985): Operation not permitted
-bash: child setpgid (8994 to 8985): Operation not permitted
-bash: child setpgid (8995 to 8985): Operation not permitted
-bash: child setpgid (8996 to 8985): Operation not permitted
-bash: child setpgid (8997 to 8985): Operation not permitted

The number of greps and pipes used doesn't matter. Sometimes ls | grep a also throws the error.

AFAIK, ls and grep do not require root privileges, so I am wondering how to solve this problem.

The machine runs CentOS 5 (kernel 2.6.18). If you need more detailed information, please let me know.

Added: trace of ls and grep

type ls
ls is aliased to `ls -hF --color=auto'
which ls
/bin/ls
type grep
grep is /bin/grep
which grep
/bin/grep

Added 2

I have since found that this is not limited to ls and grep. It seems to apply to all commands using pipes, e.g., echo 'Hello' | tee outfile throws the same error.

Added 3: in response to @Argonauts' answer

Since the logs are too long, please refer to https://gist.github.com/anonymous/5459fa0322d178f85b0cd2d5ee2add53.

In short,

  • ulimit -a
    • pipe size (512 bytes, -p) 8
    • max user processes (-u) 129094
  • type log says -bash: type: log: not found: OK
  • trap -p: trap -- 'history_to_syslog' DEBUG. Could this cause the problem?
  • Trial with cleared environment: sometimes no error, but sometimes the error still appears.
  • Still to be investigated
    • Bash debug output
    • Strace
Jeon

1 Answer


Here are a few things to try. At best they will solve your issue; at worst they will help you figure out what it "isn't". In some cases you may want to combine the steps (e.g. strace and 'try with cleared environment').

Ulimit

Check to see if you have any unusually low limits set for number of allowed processes in your shell or pipeline maximum size with the following command: ulimit -a

If you can, append the output of that command to your question.

Logging

On older versions of bash (< 4.1), pipelines could break when logging functions were enabled.

type log
That should return something like 'log: not found'. If instead it returns a function definition, clear it out with the command unset -f log.

Debug Trap

trap -p

See if any traps are output that are linked to DEBUG or logging. If they are and/or a log function is defined, you need to find out where they are defined and (at least temporarily) remove them.

They could be defined in .bashrc, .bash_profile, or any other related initialization file. Since the problem appears to be impacting root as well, the definition is more likely to be found in a system-level file like /etc/bashrc or /etc/profile.

At the very least you can clear the trap and log function from your current environment and see if it resolves the issue.
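
As a rough sketch of the steps above, the following could be typed at the prompt of the affected shell. The list of files to search is only a guess at the usual places, and history_to_syslog is the function name taken from the trap -p output in the question; adjust both as needed.

```shell
# Search the usual init files for the hook (this file list is an assumption).
grep -n 'history_to_syslog' /etc/profile /etc/bashrc /etc/profile.d/*.sh \
    ~/.bashrc ~/.bash_profile 2>/dev/null || echo "not found in the files checked"

# Clear the hooks for the current shell only (run these directly at the prompt).
trap - DEBUG                      # remove the DEBUG trap
unset -f history_to_syslog log    # remove the logging functions, if defined

# Verify: the first command should print nothing, the second should print "cleared".
trap -p DEBUG
declare -F history_to_syslog log || echo "cleared"
```

This only affects the current session; a new login shell will pick the hook up again from wherever it is defined.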

Try with cleared environment

Another way to check this is to run the piped commands using (corrected)

env -i ls | env -i grep a | env -i grep b | env -i grep c | env -i grep d

to clear the environment (for that command sequence). You may need to change your commands to include full paths. It would also be worthwhile to see whether the values from ulimit -a are different in this environment.
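
A related check, sketched here under the assumption that bash lives at /bin/bash, is to start a whole shell with an empty environment and no startup files, and confirm that it reports no traps or functions. The variable name clean_state is made up for the illustration.

```shell
# A shell started this way should report no traps and no functions.
clean_state=$(env -i /bin/bash --noprofile --norc -c 'trap -p; declare -F')
echo "traps/functions in clean shell: ${clean_state:-none}"

# Limits as seen by the clean shell, for comparison with the normal session.
env -i /bin/bash --noprofile --norc -c 'ulimit -a'

# For the actual retest, start the clean shell interactively and run the
# pipeline inside it, using full paths because PATH is empty:
#   env -i /bin/bash --noprofile --norc
#   /bin/ls | /bin/grep a | /bin/grep b
```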

Bash debug output

Before running your piped command sequence, type set -x on the command line. This turns on bash debugging, so all 'behind the scenes' commands are printed to the screen. You may see something odd, such as a hook calling another function (similar to the log issue discussed above).
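
A minimal example of such a session, using the echo | tee pipeline from the question (with /dev/null instead of outfile so nothing is written):

```shell
set -x                          # print each command to stderr as bash executes it
echo 'Hello' | tee /dev/null    # the pipeline under test
set +x                          # turn tracing off again
```

Traced commands are prefixed with +. Any traced line for a command you did not type yourself points to a hook that bash is running on your behalf.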

Strace

Run the command with strace:
strace ls | grep a | grep b | grep c | grep d

and see what exactly is going on. If you want to post these results, you'd probably need to put them on pastebin or a similar site and post a link. This is the approach most likely to resolve the issue, but the output can be hard to decode.
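
Note that strace placed in front of ls traces only the ls process, while the setpgid call in the error message is made by bash's own child processes. One option, sketched below, is to attach strace to the running shell from a second terminal. trace_shell and the output path are hypothetical names, and the syscall list is an assumption about which calls are of interest.

```shell
# Hypothetical helper: attach strace to a running shell and record the
# process-group related system calls made by it and its children.
trace_shell() {
    local pid=$1 out=${2:-/tmp/bash-pipeline.trace}
    # -f follows forked children, -tt adds timestamps, -o writes to a file,
    # -p attaches to an existing process
    strace -f -tt -e trace=setpgid,clone,execve -o "$out" -p "$pid"
}

# Usage, from a second terminal (12345 is a placeholder; take the real PID
# from `echo $$` in the affected shell):
#   trace_shell 12345
# Then run the failing pipeline in the first terminal and stop strace with Ctrl-C.
```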

Update

After reviewing your logs:

  1. When using env -i, each stage of the pipe needs to use it, because each stage is effectively a separate shell instance. My mistake. The corrected command is: env -i ls | env -i grep a | env -i grep b | env -i grep c | env -i grep d

  2. The logging function that is called between each call, combined with the DEBUG trap, is almost certainly the bug I was referring to. Unfortunately the bug is not available for viewing even with my RHEL subscription. It is https://bugzilla.redhat.com/show_bug.cgi?id=720464

This bug resulted in a race condition when logging occurred in conjunction with debug traps, which is exactly what you have going on - the set -x clearly shows the fairly extensive logging (to syslog) of every command that is issued.

Because a pipe creates subshells, you can't just clear it in the top-level shell and issue piped commands. The next piped stage will have it defined. Retesting with the change in item 1 above will show that it does work without these hooks.
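
To see how often such a hook runs, here is a small sketch with a harmless stand-in for history_to_syslog: a DEBUG trap that appends each command to a temporary file. The variable names trap_log and fired are made up for the illustration.

```shell
trap_log=$(mktemp)                                  # scratch file for the demo
trap 'echo "$BASH_COMMAND" >> "$trap_log"' DEBUG    # stand-in for the logging hook
echo hello | cat > /dev/null                        # a two-stage pipeline
trap - DEBUG                                        # remove the hook again

fired=$(cat "$trap_log")
rm -f "$trap_log"
printf 'DEBUG trap fired for:\n%s\n' "$fired"
```

Each line of the output corresponds to one firing of the trap, so a logging hook attached this way typically runs for the individual stages of a pipeline rather than once per command line. The list also includes the trap - DEBUG command itself, since the trap fires before that command runs.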

The bug report indicates no backport of the fix. I've put some details from RHEL here: http://pastebin.com/dymenY7e

You need to clear the trap and/or remove the definition of the logging function history_to_syslog. If you have root access, you can definitely remove this permanently. I gave some tips in my original answer on where to look.

You could try checking for an update to bash for CentOS 5, but the info I linked above stated that no backport to RHEL 5 was created, so it's unlikely one was made for CentOS 5.
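
To check which bash you are actually running against the 4.1 threshold mentioned earlier, something like the following should do. The variable older_than_4_1 is a made-up name, and the rpm query assumes an RPM-based system such as CentOS.

```shell
# BASH_VERSINFO[0] is the major version, BASH_VERSINFO[1] the minor version.
echo "Running bash $BASH_VERSION"
if (( BASH_VERSINFO[0] < 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] < 1) )); then
    older_than_4_1=yes
else
    older_than_4_1=no
fi
echo "Older than 4.1: $older_than_4_1"

# On an RPM-based system, also show the installed package build.
if command -v rpm > /dev/null 2>&1; then
    rpm -q bash || true
fi
```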

Brief update:

To clarify the tie between the bug and the failure mode a bit: the calls that interact with the process IDs associated with the logging function and the DEBUG hook occur out of sequence (the race condition). As a result, calls such as getppid reference processes that have just exited, which produces the error that you see.

On a side note, that is an aggressive logging capability. The sysadmin clearly doesn't believe in the circle of trust.

Argonauts