==> Regarding [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:

andystewart> Hi gang,

andystewart> My dual Opteron machine is not happy. I cannot get more than
andystewart> 7 straight days of uptime without getting a hard lock,
andystewart> requiring a reboot. (My definition of hard lock is: machine
andystewart> responds neither to keyboard input, mouse input, nor network
andystewart> pings).

andystewart> I can stimulate hard locks by running OpenOffice 1.1.3 (I had
andystewart> 3 tonight, and 3-4 on a previous occasion while running
andystewart> OpenOffice). It makes no sense to me that an application run
andystewart> as a normal user could lock up a machine.

andystewart> I've tried setting "nmi_watchdog=1" to see if I could get an
andystewart> "oops" when it hard locks - no dice. Do you know any other
andystewart> tricks I could try to see if it is the kernel which is locking
andystewart> up? I'm running SuSE's version of 2.6.8.

Did you verify that NMIs are being delivered? After boot, cat
/proc/interrupts and make sure the counts on the NMI line are non-zero.
Also note that, at least with upstream and Red Hat kernels, nmi_watchdog
defaults to 1 for Opterons (i.e. you shouldn't need to set it manually).

If the NMI watchdog fires, it will print a message to the console.
However, you will not see that message if you are in X. Do you have a
serial console hooked up, by any chance? I strongly suggest it if you
have the means.

Aside from this, if it is indeed a hard lockup, there is really nothing
more you can do without purchasing other hardware to help debug the
problem.

Please give these suggestions a shot and let us know how it goes.

Thanks!

-Jeff
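
For what it's worth, the NMI check described above could be scripted
roughly like this. It is only a sketch: the function name nmi_counts is
made up for illustration, and it assumes the usual /proc/interrupts
layout, where a line labelled "NMI:" is followed by one counter per CPU.

```python
from pathlib import Path


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text.

    Returns None when no "NMI:" line is present (assumed layout, see above).
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Counters come first; stop at any trailing description text.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return None


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if not path.exists():
        print("no /proc/interrupts here (not a Linux system?)")
    else:
        counts = nmi_counts(path.read_text())
        if counts is None:
            print("no NMI line found")
        elif any(counts):
            print("NMI counts per CPU:", counts)
        else:
            print("NMI counts are all zero; the watchdog may not be running")
```

Running it twice, a few seconds apart, should show whether the counts
are moving. If they stay at zero, the watchdog is probably not
delivering NMIs at all.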