
Can a process that was started in a tmux session fall asleep? If so, what causes it, and how can I prevent it?

Example reason for the question: I started a process on a server yesterday (training neural networks, it prints the current training epoch to stdout). I had a split window, and in the one with the process running, I had activated scroll mode before detaching from the session.

Today I came back, and it had made no progress at all.

More specifically, the epoch was still the same. After I quit scroll mode, it happily continued.

The log reads something like

...
Epoch 40: 1h few mins
Epoch 41: 12h few mins
Epoch 42: 12h few more mins
...
Epoch 73: 13h

Meaning, the time it took to get from epoch 0 to 40 was definitely less than two hours; from epoch 40 to 41 it took around 11 hours (!); from epoch 41 to 73 the average time per epoch was around 1.7 minutes. The epochs run in a loop, and there shouldn't be a reason why one takes around 400 times longer than the others.


Additional information: This 'sleeping' doesn't happen every time I detach while in scroll mode, but it has happened before. The scroll mode might not have anything to do with it at all.

The program is a Python script that includes TensorFlow code running on a GPU; the command to run it was:

python train_script.py 2>&1 | tee train_log.txt

For tmux I use tmux attach to re-attach, the standard key mapping and ctrl-b + d to detach, ctrl-b + up(number block) to start scrolling, q to quit scroll mode.

dasWesen
  • 121

2 Answers

1

I know I'm late, but I've had the same thing happen to me a few times. The environment is a little different: I'm running a Python script on a Slurm front end, which submits jobs, moves files, submits more jobs, etc. A single compute job usually takes about an hour.

I started my python script one day in the evening, checked on it a few times and then left tmux in scroll mode, detached and checked on the script in the morning. It seemed to be stuck, so I checked to see if any jobs were currently running, none were. I checked if the expected files were present, which they were not. My script didn't print its "all jobs successful" note, so clearly it was still running, just not doing anything. I left scroll mode, and suddenly the script continued, produced a lot more output and lo and behold, submitted another batch of compute jobs.

Now, this could just be odd timing, and unfortunately I don't have recurring milestones with timestamps to see how long it was stuck. But this is the third time this has happened, so I really doubt it is a coincidence.
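For anyone who wants to measure such a stall next time, here is a minimal sketch of timestamped milestones written straight to a file instead of the terminal. The helper name log_milestone and the file name milestones.log are made up for illustration; they are not part of my script or the asker's.

```python
import time


def log_milestone(message, path="milestones.log"):
    """Append a timestamped line to a file, bypassing stdout entirely.

    Hypothetical helper: the name and the default path are assumptions.
    """
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    # Open, write, and close on every call so each line reaches the file
    # even if the script later hangs on a write to the terminal.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {message}\n")


if __name__ == "__main__":
    for epoch in range(3):
        # ... one unit of real work would go here ...
        log_milestone(f"epoch {epoch} done")
```

A large gap between two consecutive timestamps in that file would then show roughly when the script stopped and when it resumed.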

Did you ever figure out why/if your script got stuck? I will exit scroll mode from now on before detaching and see if it makes a difference.


Edit: Apparently, this used to be a known bug in tmux, but there is no note on whether it has been fixed: https://github.com/tmux/tmux/issues/431. The tmux version on the machine I'm working on is quite outdated: tmux 1.8. So, in essence, the workaround would be:

Always exit scroll mode and detach properly from tmux.
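A possible explanation, and this is my assumption rather than something the thread confirms: if the terminal side stops reading a pane's output while in scroll mode, the buffers between the program and the terminal eventually fill up, and the program's next write simply blocks until someone reads again. The sketch below shows only that general effect with an ordinary pipe on a Unix system; it does not involve tmux, and the function name fill_pipe_until_full is made up.

```python
import os


def fill_pipe_until_full(chunk_size=4096):
    """Write into a pipe that nobody reads; return the bytes accepted.

    Illustrative only: a plain pipe stands in for the terminal here.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes a full pipe raise BlockingIOError.
    # In the default blocking mode the write would hang instead,
    # which is what a "sleeping" process looks like from outside.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * chunk_size)
            except BlockingIOError:
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(fill_pipe_until_full(), "bytes accepted before a write would block")
```

If that is what happens, a program that prints very little could run for a long time before hitting the limit, which might explain why the stall does not show up every time.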

0

Can a process that was started in a tmux session fall asleep?

Basically, all tmux does is attach its own file descriptors in place of STDIN/STDOUT/STDERR for the process running inside it, which allows the process to keep working while detached from the console.

Below is a simple script you can run using the same workflow (attaching to and detaching from a tmux session) you described:

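#!/bin/sh

# Print a timestamp once per second, 1000 times, to the terminal and to log.txt.
c=1000

while [ "$c" -ne 0 ]; do
  date '+%Y-%m-%dT%H:%M:%S' | tee -a log.txt
  sleep 1
  c=$((c - 1))  # decrement the counter so the loop terminates
done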

Even if you switch to scroll mode and then detach from the tmux session, the script continues running; you can check the log.txt file to confirm this. So it isn't an issue with tmux.

Alex
  • 6,375