The single biggest cause of Lotus Notes client crashes and how to avoid them


(For a related article, see my post titled Quick Tip: Fix for unexplainable common crashes of Lotus Notes 8.x with Eclipse )

While reviewing an environment with about 3000 users, I discovered an extremely high number of fault reports: between 100 and 200 every day.  Some users were crashing every single day.  This clearly pointed to a systemic problem, probably a software conflict or a configuration issue common across the organization.  Yet for all these crashes, the users were not reporting any problems.  Even so, crashes like these were likely to lead to bigger problems from file corruption, if they had not already.  I needed to find the cause.  One catch, though: I had limited access to the computers and limited contact with the users, which can make troubleshooting very difficult.

The first step was to examine the data submitted in the Fault Reports database.  Unfortunately, the crash reports contained little if any useful data and included only partial .NSD files.  Fewer than 10% of them even reported a version, but those that did were all either Release 8.5.2 or 8.5.3 with various Fix Packs.  Although we were only about halfway through an upgrade from 7.0.x to 8.5.3, none of the crashes reported a 7.x version.  If all the crashes were on 8.5.x, the fault rate was even worse: about 10% per day for the fifteen hundred 8.5.x users!  Yet no one was reporting any problems.  Quite the mystery.
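As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes the 100 to 200 daily faults all came from the roughly 1500 users already on 8.5.x; the function name is made up for illustration.

```python
def daily_fault_rate(faults_per_day, users):
    """Fraction of users producing a fault report on a given day."""
    return faults_per_day / users

USERS_ON_85 = 1500  # about half of the ~3000 users had been upgraded

# Low end, midpoint, and high end of the observed daily fault counts.
for faults in (100, 150, 200):
    rate = daily_fault_rate(faults, USERS_ON_85)
    print(f"{faults} faults/day across {USERS_ON_85} users = {rate:.1%}")
```

The midpoint of the range is where the "about 10% per day" figure comes from.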

The next logical step would be to run Fault Analyzer against the Fault Reports database to look for trends in the fault reports and to examine whatever is available in the .NSD files for any clues.  The .NSD files were mostly empty, and Fault Analyzer proved useless because there wasn’t enough data in the fault reports.  Examining by hand the crashes that did report some data, I found a common thread among some of them:

Host Name       : LAPTOP1234
User Name       : SYSTEM
Date            : Thu Oct 11 10:33:24 2012
Windows Dir     : C:\Windows
Arguments       : "C:\Program Files (x86)\IBM\Lotus\Notes\nsd.exe" -dumpandkill -termstatus 1 -dlgopts showwait  -wctpid 5292 -wctexitcode 1073807364 -panicdirect -crashpid 3940 -crashtid 516 -runtime 300 -ini "C:\Program Files (x86)\IBM\Lotus\Notes\notes.ini" -svcreq 128
NSD Version     : 8.5.23.1132 (Release 8.5.2FP3)
OS Version      : Windows/7 6.1 [64-bit] (Build 7601), PlatID=2, Service Pack 1 (8 Processors)
Running as 32-bit Windows application on 64-bit Windows
Build time      : Mon Jul 11 03:15:18 2011
Latest file mod : Fri May 13 09:03:31 2011
Notes Version   :  (32-bit client)
ERROR (79): the directory () does not exist - (22) Invalid argument
ERROR (44): unable to open file 'C:\Program Files (x86)\IBM\Lotus\Notes\Data\formats.ini' - (2) No such file or directory
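With hundreds of partial reports, it can help to pull the few usable fields out of each header automatically. The following is only a sketch: it assumes the header keeps the "Key : Value" layout shown above, and the helper name parse_header is invented for this example. The date parsing relies on English day and month names.

```python
import re
from datetime import datetime

# A few header lines copied from the report above.
HEADER = """\
Host Name       : LAPTOP1234
User Name       : SYSTEM
Date            : Thu Oct 11 10:33:24 2012
NSD Version     : 8.5.23.1132 (Release 8.5.2FP3)
Notes Version   :  (32-bit client)
"""

def parse_header(text):
    """Split 'Key : Value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

fields = parse_header(HEADER)

# The release is the text inside "(Release ...)" on the NSD Version line.
match = re.search(r"\(Release ([^)]+)\)", fields.get("NSD Version", ""))
release = match.group(1) if match else None

# The Date line uses the classic C 'ctime' layout.
crash_time = datetime.strptime(fields["Date"], "%a %b %d %H:%M:%S %Y")

print(fields["Host Name"], release, crash_time.isoformat())
```

Note that the Notes Version line carries no release number in this report, which is why the sketch reads the release from the NSD Version line instead.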

This is an odd error, but searching the web I found others who had reported a similar problem and solved it by copying the formats.ini file from a good installation to their computer.  Could it be that our customized installation kit was missing this file?  If so, it would be a straightforward fix, though it would have to be applied to all computers already upgraded.  However, an inspection of one of the computers that had been crashing revealed that the file was right where it should be.  This was a dead end.

Finally I was able to work with one user on the issue.  She had been crashing several times a week for the past few months though she never noticed.  The crash reports were time stamped fairly consistently at around 7:30 AM correlating with the time she came in to work.  The user did not report any unusual behavior when she started her computer, though occasionally Lotus Notes did “take a long time to start”.  So one morning I watched her go through her morning routine of starting up and logging in.  There was nothing unusual.  No crash report posted either.  Time to do more trend analysis.

I created several views in the Fault Reports database trying to identify any other trends using different categorized sorts: by date, by user, by hour of the day.  When categorized by the hour of day, the crashes revealed a trend.  The majority of crashes were in the afternoon between 1:00 PM and 5:00 PM (hours 13 – 16).

Fault Reports by Hour of Day
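The same hour-of-day breakdown can be reproduced outside of Notes from an export of the crash times. This is a minimal sketch, not the view I built: the timestamps below are made-up sample data, and the timestamp format is an assumption about how the export would look.

```python
from collections import Counter
from datetime import datetime

# Hypothetical crash times; real values would come from the Fault Reports database.
crash_times = [
    "2012-10-08 07:31:10",
    "2012-10-08 13:05:12",
    "2012-10-08 13:40:55",
    "2012-10-09 14:10:02",
    "2012-10-09 15:55:41",
    "2012-10-09 16:20:30",
    "2012-10-10 16:59:58",
    "2012-10-10 17:00:01",
]

def crashes_by_hour(timestamps):
    """Count crashes per hour of day (0-23), returned in hour order."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps
    )
    return dict(sorted(counts.items()))

histogram = crashes_by_hour(crash_times)
for hour, count in histogram.items():
    print(f"{hour:02d}:00  {'#' * count}  ({count})")
```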

I sorted this view further by user.  From this I noticed that, while the crashes were scattered throughout the afternoon, any given person was usually crashing in the same hour almost every time.  I re-sorted the view so it was categorized first by user and then by hour, and added a column with the exact time of the crash.  Now I could see all the crashes for one person grouped together and categorized by hour.  Then, scanning through the users with very high crash counts, I found the final clue: one user was crashing at precisely 5:00 PM every single day, and the crash reports were being submitted consistently at 8:02 AM the next day.

Crashes occurring at 5:00 PM daily
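The "categorize by user, then by hour" step can be sketched the same way. The idea is to find, for each user, the hour in which most of their crashes fall. The user names and timestamps here are invented sample data, and dominant_hour_per_user is a hypothetical helper.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical (user, crash time) pairs standing in for the Fault Reports data.
crashes = [
    ("User A", "2012-10-08 17:00:03"),
    ("User A", "2012-10-09 17:00:01"),
    ("User A", "2012-10-10 17:00:02"),
    ("User B", "2012-10-08 14:12:40"),
    ("User B", "2012-10-09 14:47:09"),
    ("User B", "2012-10-10 09:03:55"),
]

def dominant_hour_per_user(records):
    """Map each user to (most common crash hour, crashes in that hour, total crashes)."""
    hours = defaultdict(Counter)
    for user, ts in records:
        hours[user][datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour] += 1
    result = {}
    for user, counter in hours.items():
        hour, hits = counter.most_common(1)[0]
        result[user] = (hour, hits, sum(counter.values()))
    return result

summary = dominant_hour_per_user(crashes)
for user, (hour, hits, total) in summary.items():
    print(f"{user}: {hits} of {total} crashes in the {hour:02d}:00 hour")
```

A user whose crashes nearly all land in one hour, like "User A" in the sample, is the kind of pattern that stood out in the view.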

This person happened to be the receptionist.  Her work hours are precisely from 8 to 5.  Looking more closely at the other users, I could see that crashes typically occurred about 8 hours after each person's previous crash report was submitted.  It is important to note that a crash report is created (its Creation date/time) at the next restart of Notes, not at the moment of the crash.  In other words, Notes was crashing at the end of the users' day, and they didn’t restart Notes until the next morning.
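That relationship between the report time and the next crash can be checked with a little date arithmetic. The sketch below uses made-up pairs modeled on the receptionist's pattern (crash at 5:00 PM, report created at 8:02 AM the next day); the function name is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical (crash time, report creation time) pairs for one user, oldest first.
reports = [
    ("2012-10-08 17:00:00", "2012-10-09 08:02:00"),
    ("2012-10-09 17:00:00", "2012-10-10 08:02:00"),
    ("2012-10-10 17:00:00", "2012-10-11 08:02:00"),
]

def hours_from_report_to_next_crash(pairs):
    """Hours between each report's creation and the following crash."""
    gaps = []
    for (_, created), (next_crash, _) in zip(pairs, pairs[1:]):
        delta = datetime.strptime(next_crash, FMT) - datetime.strptime(created, FMT)
        gaps.append(round(delta.total_seconds() / 3600, 2))
    return gaps

gaps = hours_from_report_to_next_crash(reports)
print(gaps)
```

A gap that sits close to the length of a working day, user after user, suggests the crash happens when people leave rather than while they work.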

I called the receptionist and asked how she shuts down her computer at the end of the day.  I expected to hear her say she just hits the power button, but that was not the case.  It turns out she clicks the X in the top right corner of Notes to close the window, then clicks Log Off on the Start menu immediately after.  Apparently Notes 8.5.x takes longer to close than 7.x and it was not able to close before the OS dumped it from memory during shutdown, thus causing it to not close cleanly.

With a bit of user training, this problem has been resolved.  Users were told to give Notes an extra minute to shut down before logging off, or simply to lock or hibernate the computer instead.
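For anyone who would rather automate the waiting than rely on training, here is a hedged sketch of the idea: poll until the Notes processes are gone, and only then log off. It was not part of the fix described above. The process names notes2.exe and nlnotes.exe are my assumption about what the 8.5.x client leaves running and should be confirmed in Task Manager; the two-minute timeout is arbitrary. The Windows commands used are tasklist and shutdown /l.

```python
import subprocess
import time

# Assumption: image names of the Notes 8.5.x client processes; verify locally.
NOTES_PROCESSES = ("notes2.exe", "nlnotes.exe")

def notes_is_running():
    """Return True if any assumed Notes process appears in tasklist output (Windows only)."""
    for image in NOTES_PROCESSES:
        out = subprocess.run(
            ["tasklist", "/FI", f"IMAGENAME eq {image}"],
            capture_output=True, text=True,
        ).stdout
        if image.lower() in out.lower():
            return True
    return False

def wait_until_stopped(is_running, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll is_running() until it returns False; give up once timeout seconds have been waited."""
    waited = 0.0
    while is_running():
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True

def log_off_when_notes_is_gone():
    """Log the current user off only after Notes has fully exited."""
    if wait_until_stopped(notes_is_running):
        subprocess.run(["shutdown", "/l"])
    else:
        print("Notes is still running; not logging off.")

# Demo with a fake probe so the polling logic can be tried on any machine:
# the "process" is seen twice and then disappears.
answers = iter([True, True, False])
demo_result = wait_until_stopped(lambda: next(answers), sleep=lambda seconds: None)
print(demo_result)
```

The user would close Notes and then run log_off_when_notes_is_gone() from a shortcut instead of clicking Log Off directly.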

I think this is a flaw in the interaction between the OS and Notes, but until that is fixed, this is a clean, simple work-around.  What are your ideas and experiences with this?

Posted on January 20, 2013, in Quick Tip. 13 Comments.

  1. Great sleuthing work. Notes 8.5 has a very (very) large memory footprint. Also, I have observed that the Windows O/S takes its own sweet time to de-allocate an exe/dll. I saw this with MS Word automation, which was causing seemingly random crashes and OLE failures.

    Your specific problem though involves having some kind of controlled shutdown. Your user probably doesn’t want to wait around before shutting down, and detecting when Notes is actually dumped from memory is hard.

    However, there is a utility that can run a timer in a batch file and then do the shutdown. You could create a batch file that the users run from the Start menu, and then the user just walks away whilst the batch file runs “psshutdown.exe” with a timer, allowing Notes to properly de-allocate from memory (hopefully).

    What’s psshutdown? It’s a utility from the great Mark Russinovich (Windows Defrag fame).

    http://technet.microsoft.com/en-au/sysinternals/bb897541.aspx

    It’s used in this article about “one-click exits”
    http://pcworld.about.net/gi/dynamic/offsite.htm?site=http://find.pcworld.com/41282

    and here

    http://pcworld.about.net/magazine/2206p160id115628.htm

    The articles are old, but should still work on Windows 7 clients.

  2. Great write-up!
    We’ve experienced similar issues with < 8.5.3 releases, but not with 8.5.3 – at least not with FP2/3.
    Have you looked at how Windows is configured with regard to waiting for applications during shutdown?
    We've seen bad Windows configurations play into this, brute-forcing shutdown after just a couple of seconds.

  3. Thanks. I did not get to work with the desktop support team as I would prefer, to see how things are configured on the Windows side. I am suspicious of that as well because this organization has an unusually high rate of these crashes compared to others I have engaged. Now that I have discovered this, I will know to watch for it in the future. For the benefit of other readers, perhaps you could elaborate on which Windows settings might influence this?

  4. Excellent problem analysis work. In my experience Notes 8.5.x does take a long time to shut down. Important to check if client is set to replicate on shutdown (and whether it is doing it silently) as this can be a cause but certainly if you close Notes and then immediately close Windows, Windows will usually kill Notes before it has had a chance to close.

  5. Good point Alan. That could have been a contributing factor for laptops. In the case of the receptionist, she has a desktop and is not replicating. (Which, by the way, explained a separate issue I encountered where some desktop users had issues sending email to staff that recently left and were re-hired or were converted from contractor to employee. That was because they had the mobile directory catalog, but replication was not enabled.)
    There is always something to keep an administrator busy!

  6. Indeed this is great detective work David, and valuable insight for the customer. Nice job! 🙂

  7. Thanks Scott! This is probably as close as I will ever get to playing Sherlock Holmes.

  8. Although I am not sure whether or not Lotus Notes crashes after sending/forwarding an e-mail, I have found that deleting the “dircat5” file and then relaunching Notes fixes the problem right away. A lot of times, if the “dircat5” file is corrupted, it will cause a crash. Once the “dircat5” file is deleted, relaunching Notes recreates the file.

  9. My Lotus Notes crashes every day right after 1 pm. It happens even if I leave my computer on, in hibernate, or turn it off at night and then back on at 7:30 the next morning.

  10. Larry, check to see if you have Run local agents enabled. Check your log.nsf for errors. Make sure your anti-virus software is not scanning the .nsf files. And look at the .nsd files created in the IBM technical support folder. Submit a ticket to technical support and include the .nsd or at least run a search to find the technote that describes how to analyze the .nsd to see the cause.
