Tuesday, April 16, 2013

Netlogon 5719 at startup

This issue was a real booger, and I almost threw in the towel and called in the big guns.

This error has been around for a while. There is a lot of information out there on it, and a LOT of reasons it can occur.

I've actually run across this twice now: once in my server VM environment and once on new desktops. It is possible, even likely, that they are related, since the switches in use are similar or the same.


In this post I'm focusing on the Virtual Environment issue.

I first discovered the issue when building an Exchange 2010 server and finding that its services were not starting automatically on boot. This led me to the Netlogon 5719 error. After reviewing the events, it was obvious that Netlogon was attempting, and failing, to start before the network was connected.
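For anyone checking for the same symptom, the most recent 5719 events can be pulled from the System log at a command prompt. This is a minimal sketch assuming Windows Server 2008 or later, where `wevtutil` is available; the count of 5 is arbitrary.

```bat
REM List the five most recent Event ID 5719 entries in the System log, newest first, as text
wevtutil qe System /q:"*[System[(EventID=5719)]]" /c:5 /rd:true /f:text
```

Comparing their timestamps with the other events logged during the same boot shows whether Netlogon gave up before the network was available.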

After finding this: http://support.microsoft.com/kb/938449 I tried some of the suggestions, with no luck. Note that this setup was ESXi 5.1 connected to HP ProCurve switches (2810s). STP was off on the switches. Also connected to the same switches are a XenServer environment and a few physical servers, none of which see the issue.

Some of the posts and KBs I found suggested that this isn't an issue and can safely be ignored as long as you can reach the DC to log in. After the set time period, Group Policy will apply. Unfortunately, this is neither a solution nor a good workaround (for desktops, servers, anything). It causes lots of issues in a domain environment, especially where folder redirection, logon scripts, etc. are in use. The proper fix is to get the NIC to initialize before Netlogon starts, OR for MS to provide a method for admins to reliably force Netlogon to wait for the NIC.

After messing around for a while, I discovered that this only occurs if the NIC is set to a static IP. When it is set to DHCP, everything works as expected.
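Switching the adapter between static and DHCP for this kind of test can be done with `netsh`. This is a sketch assuming Windows Server 2008 or later; the adapter name "Local Area Connection" and the 192.168.1.x addresses are hypothetical placeholders, not values from my environment.

```bat
REM Hypothetical adapter name and addresses; substitute your own
REM Static configuration (the case where Netlogon 5719 showed up at boot)
netsh interface ipv4 set address name="Local Area Connection" source=static address=192.168.1.50 mask=255.255.255.0 gateway=192.168.1.1
REM DHCP configuration (the case where startup behaved normally)
netsh interface ipv4 set address name="Local Area Connection" source=dhcp
```

Reboot after each change and check the System log for 5719 again to see which configuration triggers it.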

So, at this point we could use DHCP reservations to make it work, BUT this isn't a solution for DCs or DHCP servers, and sometimes a static address is necessary or simply easier.

I then found a thread on the VMware Communities describing exactly my issue, where it was suggested to try changing the ArpRetryCount registry value.
http://communities.vmware.com/thread/316237?start=15&tstart=0

Bingo!
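For reference, ArpRetryCount is a REG_DWORD under the Tcpip Parameters key that sets how many gratuitous ARPs Windows sends for its own address while initializing (valid range 0 to 3, default 3). The value 0 below is an assumption for illustration; use whatever value your own testing settles on. Run it from an elevated prompt and reboot for it to take effect.

```bat
REM Assumed value: 0 (valid range 0-3, default 3); a reboot is required afterwards
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v ArpRetryCount /t REG_DWORD /d 0 /f
REM Confirm the value was written
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v ArpRetryCount
```

Since these gratuitous ARPs are how Windows checks for a duplicate address at startup, a lower value likely means it is less able to flag an IP conflict when the NIC comes up, which is worth keeping in mind on statically addressed servers.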

That this setting fixes it could indicate a deeper network issue, or possibly a flaw in the logic that decides when the Netlogon service should attempt to start.


Note: I also commonly see an issue very similar to this on workstations with SSDs (with some differences: it occurs when set to DHCP but not static, etc.). In these cases changing the ArpRetryCount does not help, although I did find that it is heavily dependent on the type of switch the workstation is plugged into. For instance, the issue occurs when plugged into HP ProCurve switches, but does not occur when plugged into cheapo Linksys / Cisco switches. This likely indicates a configuration issue with the HP ProCurve switches (although many report the same or similar issues with enterprise Cisco switches). It may also be caused by the type of NIC / driver on the system (e.g. a Realtek driver issue). I have not been able to dig into this issue in great detail yet.

Citrix print management service crashing

A while back I posted about cleaning up print drivers in XA4.5. Recently I started to have issues again with printers not being auto-created, with what appeared to be the same issue of the spooler crashing and taking the Citrix service with it. Oddly, though, it wasn't logging the crash.

(see this post: http://didyourestart.blogspot.com/2009/04/terminal-server-citrix-printing-errors.html)

It then occurred to me that it's not the same issue! Duh

On a pool of 8 XenApp 5.0 (Windows 2003) servers with R07 installed, I've found that the Citrix Print Management Service occasionally crashes. Note that in this instance the Print Spooler is NOT crashing, only the Citrix service. I determined this by using the script from my other printing issue post, which logs whenever the print spooler crashes. In this case no log was ever generated on the servers after a failure.

I then added a new short script and set it to run on failure of the Citrix Print Management Service. On the next failure, sure enough, I had a log entry showing the failure time.

Batch File contents
@echo off
REM Restart the Citrix print service, then record when and where the crash happened
net start "Citrix Print Manager Service"
SET logfile=C:\AdminTools\CitrixCrashLogs.log
ECHO Citrix print management service crashed on %date% at %time% on %computername% >> "%logfile%"

I then set the Citrix Print Management Service to run this program on failure.
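The recovery action can also be scripted with `sc`, which is handy as part of a server build. This is a sketch under assumptions: the service key name `cpsvc` and the script path `C:\AdminTools\CitrixPrintRestart.cmd` are hypothetical (confirm the key name with `sc getkeyname`), and the 60-second delay and one-day reset period are arbitrary choices.

```bat
REM Look up the service key name from its display name
sc getkeyname "Citrix Print Manager Service"
REM Hypothetical key name and script path; sc requires a space after each "="
REM reset= is in seconds (86400 = 1 day); each run/60000 runs the command after 60 seconds
sc failure cpsvc reset= 86400 actions= run/60000/run/60000/run/60000 command= "C:\AdminTools\CitrixPrintRestart.cmd"
REM Show the configured recovery actions
sc qfailure cpsvc
```

The `run` action is used rather than `restart` because the batch file already restarts the service itself before writing the log entry.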

This tells me that a different issue is causing the failure of the Citrix service, since the print spooler isn't actually crashing. I believe this issue was introduced sometime after R05, as I never had the problem (that I'm aware of) until updating to R07. Note that I had skipped installing R06. I also tested this on a fresh Citrix build with the same results.

I now include the above batch file as part of my build on all XenApp servers. This has reduced help desk calls for this issue to one or fewer a quarter.