
As the title says, I have /var and /var/log on separate partitions.

On shutdown, I get errors saying that unmounting /var/log, and later /var, failed.

My question is:

How can I debug this issue?

If it matters, I am running Debian Stretch.

So far, searching the web turned up an issue with journald writing to /var/log until the very last moment; however, on my system the journald logs go to /run.

This means something else is keeping /var busy.


1) Ideally, I would stop the shutdown process at the point where the umount errors appear, open a shell and run lsof, or hook in a script somewhere that does the same. However, I don't know enough to do this. How should I go about it?

I have a vague idea that I should write an init.d script with no requirement on $local_fs and link it into rc0.d and rc6.d as K99, so that it hopefully executes at the right time and writes its output to a log file.

Or maybe the runlevels do not give that fine a control, and I should create a script and a systemd unit to run it.

Either way, even if I tried that, I would not know whether it executed at the right time, so I would have no idea whether what I see in the logs is from before, after, or exactly when the error happens.
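For the systemd variant mentioned above, the timing question can be handed to systemd's own ordering. The following is only a sketch under assumptions, not a tested recipe; the unit name umount-debug.service, the log file /root/umount-debug.log and the function print_unit are made up for illustration. RequiresMountsFor=/var/log orders the unit after the /var and /var/log mounts, and Before=local-fs.target puts it ahead of ordinary services, which by default come after sysinit.target and therefore after local-fs.target. Since shutdown runs in reverse start order, those services should be stopped first, then ExecStop logs a timestamp plus the lsof output, and only then are the mounts stopped.

```sh
#!/bin/sh
# Sketch (hypothetical names): install a oneshot unit whose ExecStop runs
# lsof on /var and /var/log right before those mounts are stopped.
UNIT=/etc/systemd/system/umount-debug.service

print_unit() {
    cat <<'EOF'
[Unit]
Description=Log open files on /var and /var/log just before unmounting
# No implicit After=basic.target, so the unit can sit right above the mounts
DefaultDependencies=no
# Adds Requires= and After= on the mount units for /var/log and its parents
RequiresMountsFor=/var/log
# Services are ordered after sysinit.target, hence after local-fs.target,
# so at shutdown they are stopped before this unit
Before=local-fs.target shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c 'date >> /root/umount-debug.log; /usr/bin/lsof /var /var/log >> /root/umount-debug.log 2>&1; true'

[Install]
WantedBy=local-fs.target
EOF
}

# Only touch the system when explicitly asked to (needs root).
if [ "${1:-}" = install ]; then
    print_unit > "$UNIT"
    systemctl daemon-reload
    systemctl enable --now umount-debug.service
fi
```

Run it as root with the argument install (for example sh umount-debug-setup.sh install, a hypothetical file name); after the next reboot, /root/umount-debug.log should show what still held /var or /var/log right before the unmount. Anything listed there that is not a service (such as a leftover user-session process) would not be covered by unit dependencies at all. To undo it, run systemctl disable umount-debug.service and delete the unit file.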


2) Alternatively, I could use lsof on a normally running (runlevel 2) system to check what is writing to /var/log, then find the startup scripts or units of all those processes and make sure they require /var and /var/log to be mounted.

While doing so, I would also have to make sure I don't create a shutdown dependency loop.

I would rather pinpoint the problem first than start blindly overwriting my system configuration.
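For option 2, running lsof /var/log as root should already list the files open on that filesystem, since lsof treats a mount-point argument as the whole filesystem. As an alternative that needs nothing beyond /proc and readlink, here is a hypothetical helper (the name holders is made up) that walks /proc/PID/fd and /proc/PID/cwd and prints, for each process holding something under a directory, its PID, command name and systemd cgroup path. The last part of that path is usually the owning unit (for example something.service), which tells you whose dependencies to inspect. Unlike lsof, it does not see memory-mapped files.

```sh
#!/bin/sh
# Hypothetical helper: list processes that have a file open, or their
# working directory, under DIR, using only /proc and readlink.
# Run as root to see other users' processes.
holders() {
    dir=${1%/}
    for p in /proc/[0-9]*; do
        pid=${p#/proc/}
        for f in "$p/cwd" "$p"/fd/*; do
            # unreadable or vanished entries are skipped
            target=$(readlink "$f" 2>/dev/null) || continue
            case $target in
                "$dir" | "$dir"/*)
                    comm=$(cat "$p/comm" 2>/dev/null)
                    # systemd's cgroup path; its last part usually names the unit
                    cg=$(grep -m 1 -e 'name=systemd:' -e '^0::' "$p/cgroup" 2>/dev/null)
                    printf '%s %s %s\n' "$pid" "$comm" "${cg##*:}"
                    break   # one line per process is enough
                    ;;
            esac
        done
    done
}

# Default to the two directories from the question.
if [ $# -eq 0 ]; then
    set -- /var/log /var
fi
for d in "$@"; do
    echo "== $d"
    holders "$d"
done
```

Run it as root, for example sh holders.sh /var/log /var (hypothetical file name); with no arguments it checks those two directories.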


A) This is somewhat hijacking my own question, but maybe there is a setting, probably in /etc/fstab, that says: "for the unmount order, treat /var and /var/log the same as /".

Zoltan K.

1 Answer


My solution was to poll. I logged to a file in /root, which is on the root filesystem, the very last one to be unmounted. This version of the script respects "stop", but it could just as well be made to keep logging after that.

Examining the output, the partitions do appear to be unmounted correctly, though the exact timing relative to the other processes changed a bit.

Consequently, the error/warning message seems to be harmless.

Here is the script I used for polling; installation instructions are in the comments. Name the script "oflogger".

#! /bin/sh
### BEGIN INIT INFO
# Provides:          oflogger
# Required-Start:
# Required-Stop:
# Default-Start:     2 3 4 5
# Default-Stop:
# Short-Description: log the open files in selected dirs (/var, /var/log)
# Description: Log the output of lsof and mount, filtered by dir, into
#              root's home every 0.1 s.
#              Root's home is not the safest place, but we want to log
#              for as long as we can, so the location must be on the root
#              partition.  We don't want to litter with this logfile,
#              so ~root seems to be a nice, out-of-the-way place, which
#              will probably also ring a bell when backing up the system.
#              NOTE: We want this to be absolutely the last thing to be
#                    killed (and the first one to be started), so even
#                    though it obviously needs a filesystem, we do not
#                    add this requirement.
#                    This is because the task of this program is exactly
#                    to identify processes that may obstruct unmounting.
### END INIT INFO

# INSTALL:
#cp oflogger /etc/init.d/
#ln -s /etc/init.d/oflogger /etc/rc0.d/K99oflogger
#ln -s /etc/init.d/oflogger /etc/rc1.d/K99oflogger
#ln -s /etc/init.d/oflogger /etc/rc2.d/S01oflogger
#ln -s /etc/init.d/oflogger /etc/rc3.d/S01oflogger
#ln -s /etc/init.d/oflogger /etc/rc4.d/S01oflogger
#ln -s /etc/init.d/oflogger /etc/rc5.d/S01oflogger
#ln -s /etc/init.d/oflogger /etc/rc6.d/K99oflogger
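# not adding to rcS.d
# copy the helpers onto the root partition (the paths below use these
# copies), in case /usr is no longer available late in the shutdown
#cp /usr/bin/cut /usr/bin/uniq /usr/bin/lsof /usr/bin/sort /root/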

# unINSTALL:
#rm /etc/init.d/oflogger
#rm /etc/rc0.d/K99oflogger
#rm /etc/rc1.d/K99oflogger
#rm /etc/rc2.d/S01oflogger
#rm /etc/rc3.d/S01oflogger
#rm /etc/rc4.d/S01oflogger
#rm /etc/rc5.d/S01oflogger
#rm /etc/rc6.d/K99oflogger
#rm /root/cut /root/uniq /root/lsof /root/sort

LSOF='/root/lsof'
GREP='/bin/grep'
CUT='/root/cut'
SORT='/root/sort'
UNIQ='/root/uniq'

test -x "$LSOF" || exit 0

. /lib/lsb/init-functions

# The PID must be kept on disk: "stop" runs in a new shell process,
# where a variable set by "start" would be empty.  /root is on the
# root partition, like the log.
PIDFILE='/root/oflogger.pid'

case "$1" in
  start)
    echo "===start" >> /root/lsof.log
    # NOTE: Error output from here will end up in the system log,
    #       and since lsof prints an error message every time
    #       it runs, we discard it instead.
    #       ALTERNATIVE:
    #       filter out just that one offending message, but since
    #       we know the script works, we simply ignore the problem.
    while sleep 0.1
    do
        echo '.../var'
        $LSOF | $GREP '/var' | $CUT -d' ' -f1 | $SORT | $UNIQ
        echo '.../var/log'
        $LSOF | $GREP '/var/log' | $CUT -d' ' -f1 | $SORT | $UNIQ
        echo '...mount'
        mount | $GREP '\<var\>'
        echo '---------------------'
    done 2>/dev/null 1>>/root/lsof.log &
    echo $! > "$PIDFILE"
    ;;
  restart)
    echo "===restart" >> /root/lsof.log
    ;;
  force-reload)
    echo "===force-reload" >> /root/lsof.log
    ;;
  reload)
    echo "===reload" >> /root/lsof.log
    ;;
  stop)
    echo "===stop" >> /root/lsof.log
    if [ -s "$PIDFILE" ]
    then
        kill "$(cat "$PIDFILE")"
        rm -f "$PIDFILE"
    fi
    ;;
  status)
    echo "===status" >> /root/lsof.log
    ;;
  *)
    echo "===*" >> /root/lsof.log
    ;;
esac

exit 0
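
To examine the log after rebooting, the most telling part is the last complete snapshot the loop wrote before the machine went down. A minimal sketch, assuming the log format of the script above (each snapshot ends with the dashed line); the helper name last_snapshot is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the last complete snapshot from oflogger's log.
# A snapshot ends with the "---------------------" line; an unfinished final
# snapshot (cut off by the shutdown) is ignored. Marker lines such as
# "===stop" that fall inside a complete snapshot are kept in place.
last_snapshot() {
    awk '
        /^-+$/ { last = cur; cur = ""; next }   # separator closes a snapshot
        { cur = cur $0 "\n" }                   # collect the current snapshot
        END { printf "%s", last }
    ' "$1"
}

# Default to the path used by oflogger when no argument is given.
log=${1:-/root/lsof.log}
if [ -r "$log" ]; then
    last_snapshot "$log"
fi
```

For example, sh last-snapshot.sh /root/lsof.log (hypothetical file name); without an argument it reads /root/lsof.log.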

Zoltan K.