We have a TS809U that we have joined to the domain. Shares and access rights work as they should with the domain users, and everything is just the way it should be. But after a couple of weeks to a month the domain users and groups disappear from the TS809, and I have to manually rejoin the domain. After rejoining, the process repeats itself within the same timeframe, and I have to rejoin the domain yet again.
There are no errors in the logs in the web interface, and it shows the NAS joining the domain successfully. I updated the TS809U to the latest firmware 4.0.3 (from 3.x) in hopes that this would solve it, but the problem still persists.
Has anyone encountered this before and knows what the issue could be, or how to troubleshoot it further?
The only message I've been able to find in the event viewer that references the NAS is a 5722 that might point in the direction of the comment below:
The session setup from the computer NASC473CD failed to authenticate. The name(s) of the account(s) referenced in the security database is NASC473CD$.
The following error occurred:
Access is denied.
The timing between when the entries disappeared and then re-appeared seems to be 14 days. Our domain is (still) based on Windows Server 2003.
Update: The problem has surfaced again, but the logs didn't show anything interesting. wbinfo -t (testing the trust secret) did not work and, unsurprisingly, neither did wbinfo -c (changing the trust secret). I did discover that the current Kerberos 5 ticket store hadn't been refreshed and the Kerberos tickets had expired, which might be connected. I've now added /sbin/update_krb5_ticket to the crontab so that the tickets are refreshed every hour, to see if that helps.
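For anyone wanting to repeat the two checks above in one go, here is a minimal sketch. The function name check_trust is made up for illustration, and it assumes the klist on the NAS supports the MIT-style -s (silent test) flag, which I have not verified on QNAP firmware:

```shell
#!/bin/sh
# Hypothetical helper (not part of QNAP or Samba): report the state of the
# machine account trust and of the Kerberos ticket cache.
check_trust() {
    # wbinfo -t exits non-zero when the trust secret cannot be verified.
    if wbinfo -t >/dev/null 2>&1; then
        echo "trust: ok"
    else
        echo "trust: FAILED"
    fi
    # Assumption: klist -s prints nothing and exits non-zero when the
    # credentials cache is unreadable or expired (MIT Kerberos behaviour).
    if klist -s 2>/dev/null; then
        echo "ticket: valid"
    else
        echo "ticket: missing or expired"
    fi
}

check_trust
```

Running this from cron and appending the output to a file with a timestamp would show whether the trust fails before or after the ticket expires.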
Update 2014-02-25
Still no success. log.wb-DOMAINNAME shows that we're apparently being refused access, probably because of timed-out credentials or invalid secrets. I'm not sure how to proceed, as the Kerberos ticket list (klist) showed a valid ticket when it occurred.
log.wb-DOMAINNAME shows:
[2014/02/25 03:05:20.545176, 3] winbindd/winbindd_pam.c:1902(winbindd_dual_pam_auth_crap)
could not open handle to NETLOGON pipe (error: NT_STATUS_ACCESS_DENIED)
[2014/02/25 03:05:20.545198, 2] winbindd/winbindd_pam.c:2003(winbindd_dual_pam_auth_crap)
NTLM CRAP authentication for user [DOMAINNAME]\[MACHINE$] returned NT_STATUS_ACCESS_DENIED (PAM: 4)
[2014/02/25 03:05:20.548424, 3] winbindd/winbindd_pam.c:1841(winbindd_dual_pam_auth_crap)
[20497]: pam auth crap domain: DOMAINNAME user: MACHINE$
(The same error messages occur for user accounts.) As far as I understand, the issue is that the server responds with ACCESS_DENIED when Samba tries to open the NETLOGON pipe. I did, however, discover that one of the DNS servers on the TS809 was set to an external server rather than a server in the domain. I've updated both DNS entries to point to our AD DCs to see if that could be the reason (if it falls over to the external server, it will get "host not found" instead of a timeout for internal, domain-based hosts).
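A quick way to catch this kind of DNS misconfiguration is to compare the configured nameservers against the addresses of the domain controllers. This is only a sketch: the function name foreign_nameservers is invented, the IP addresses are placeholders from the documentation range, and it assumes the NAS keeps its resolver settings in a standard /etc/resolv.conf:

```shell
#!/bin/sh
# Hypothetical helper: print every nameserver in a resolv.conf-style file
# that is NOT in the list of allowed addresses given as further arguments.
foreign_nameservers() {
    conf=$1
    shift
    # Nothing to check if the file is missing or unreadable.
    [ -r "$conf" ] || return 0
    for ns in $(awk '$1 == "nameserver" { print $2 }' "$conf"); do
        case " $* " in
            *" $ns "*) ;;          # nameserver is one of the allowed DCs
            *) echo "$ns" ;;       # nameserver is outside the domain
        esac
    done
}

# Placeholder DC addresses; replace with the real AD DNS servers.
foreign_nameservers /etc/resolv.conf 192.0.2.10 192.0.2.11
```

Any address it prints is a resolver that cannot answer for internal, domain-based hosts.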
Update 2015-03-04. Automated rejoin script deployed as a workaround.
We're still no closer to a lasting solution, and we're currently seeing the join time out every week. The interval seems to match how long a Kerberos ticket stays valid, but I've been unable to find any setting that changes it.
I have, however, created a small script that checks whether the domain groups have disappeared from the NAS and rejoins the server if needed (using Samba's net rpc join command). "username" is a user in the domain that has permission to join computers to the domain (we created a user for the QNAP for this purpose only):
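# Count the domain groups that winbind can currently see.
COUNT=$(wbinfo -g | grep -c DOMAINNAME)
# No domain groups means the join has been lost, so rejoin.
if [ "$COUNT" -lt 1 ]
then
    /usr/local/samba/bin/net rpc join -Uusername%password
fi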
This script is run on the QNAP with cron (search for "qnap cron" on Google for how to set up cron properly). This has worked decently for the last few months.
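As a rough sketch of the cron part: QNAP firmware is commonly described as keeping its persistent crontab in /etc/config/crontab, which then has to be reloaded. Treat that path, the reload commands, and the script location below as assumptions to verify on your own firmware. The helper add_cron_entry is made up for illustration and only appends the line if it is not already there:

```shell
#!/bin/sh
# Hypothetical helper: append a cron line to a crontab file unless an
# identical line already exists (so running it twice is harmless).
add_cron_entry() {
    crontab_file=$1
    entry=$2
    # -F: fixed string, -x: whole line, -q: quiet
    grep -qxF "$entry" "$crontab_file" 2>/dev/null || echo "$entry" >> "$crontab_file"
}

# Example use on the NAS (commented out; paths are assumptions):
# add_cron_entry /etc/config/crontab "*/15 * * * * /share/scripts/rejoin_domain.sh"
# crontab /etc/config/crontab && /etc/init.d/crond.sh restart
```

Checking every 15 minutes is an arbitrary choice; any interval shorter than the time users can tolerate missing shares would do.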