I'm running CentOS 6 in AWS, and what I'm seeing baffles me.
There is an s3fs mount in /etc/fstab that sometimes becomes unreadable and unwritable. I have a cron job that worked great for months: every minute it tested that the mount was good, and if the mount had lost its connection, it would umount and mount the share again. The mount tended to go away more often under no load than under actual load, so this was a great solution.
For some reason this stopped working. Machines now come up unable to read from or write to the share, because the first thing they do on boot after provisioning is umount and mount the share.
Now the error I get when trying to read is this:
cp: cannot open `/app/canary.txt' for reading: Input/output error
In the /var/log/messages I see this:
kernel: s3fs[3077]: segfault at e66000 ip 00007f833663d94e sp 00007ffc849c5b18 error 4 in libc-2.12.so[7f83365b4000+18a000]
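For reference, the numbers in that kernel line can be turned into an offset inside libc by subtracting the mapping base (the value before the + in the brackets) from the instruction pointer. The helper name fault_offset below is hypothetical; this is only a sketch of the arithmetic, and it says nothing about which libc function is at that offset.

```shell
# Hypothetical helper: print the offset of a faulting instruction inside a
# mapped library, given the "ip" value and the mapping base from the log line.
fault_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Values copied from the /var/log/messages entry above.
fault_offset 0x00007f833663d94e 0x7f83365b4000   # prints 0x8994e
```

The "error 4" part is the page-fault error code: a user-mode read of a page that was not mapped.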
However, when I run the exact same script in the console as root, it works perfectly: it unmounts and mounts the drive and leaves it in a working state.
My first guess was that something in the environment was causing the difference, so I added a source /root/.bash_profile to my script, to no avail.
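One way to test the environment theory more directly would be to dump the environment from inside the cron run and from the root console, then diff the two. The function names and file paths below are hypothetical; this is a minimal sketch, not something that is currently in the script.

```shell
# Hypothetical helper: write the current environment, sorted, to a file.
dump_env() {
    env | sort > "$1"
}

# Hypothetical helper: show the lines that differ between two dumps.
# diff exits non-zero when the files differ, which is expected here.
compare_env() {
    diff "$1" "$2" || true
}
```

Calling dump_env /tmp/env.cron at the top of check_mount.sh, running dump_env /tmp/env.console from the root shell, and then compare_env /tmp/env.cron /tmp/env.console would list every variable (PATH, HOME, and so on) that differs between the two contexts.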
The line in /etc/fstab is a monster, but this is what we found to work best after many attempts at fine-tuning:
ourbucket /app fuse.s3fs _netdev,allow_other,endpoint=us-west-2,url=https://s3-us-west-2.amazonaws.com,use_path_request_style,use_sse,gid=1001,umask=0007,use_cache=/srv/s3fs,retries=20,parallel_count=30,connect_timeout=30,readwrite_timeout=60,stat_cache_expire=86400,max_stat_cache_size=100000 0 0
This is what the cronfile looks like:
* * * * * root /usr/bin/sudo /root/check_mount.sh
I tried it with and without the sudo, as I thought it might affect the environment.
I've tried many variations of the script, but most of these commands were used at one point or another. The same issue comes up regardless of which type of umount I do.
\cp /app/canary.txt /tmp/canary.txt
retVal=$?
sleep 1
if [ ${retVal} -ne 0 ]; then
    echo "Copy failed, trying to umount"
    umount /app
    echo "umount returned $?"
    sleep 1
    echo "Trying umount -f"
    umount -f /app
    echo "umount -f returned $?"
    sleep 1
    echo "Trying fusermount -u"
    /usr/local/bin/fusermount -u /app
    echo "fusermount returned $?"
    sleep 1
    echo "Trying to mount"
    mount /app
    echo "mount returned $?"
    sleep 1
    echo "Trying copy after mount"
    \cp /app/canary.txt /tmp/canary.txt
fi
This script was originally written in Python, with the key pieces just shelling out via os.system, but I wanted to remove Python from the equation. The shell version shows the same issue.
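To narrow down which step leaves the share broken, the script could also log whether anything is mounted at /app before and after each command. The helper below is hypothetical and only a sketch; it assumes a Linux /proc/mounts, where the second field is the mount point and the third is the filesystem type.

```shell
# Hypothetical helper: report whether anything is mounted at the given path.
# Reads /proc/mounts; prints the filesystem type of the first matching entry.
mount_state() {
    _fstype=$(awk -v m="$1" '$2 == m { print $3; exit }' /proc/mounts 2>/dev/null)
    if [ -n "$_fstype" ]; then
        echo "mounted as $_fstype"
    else
        echo "not mounted"
    fi
}
```

For example, adding echo "state: $(mount_state /app)" after each umount, fusermount, and mount line would show whether the mount table entry survives the s3fs segfault or disappears.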