-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeff Moyer wrote:
> ==> Regarding [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:
>
> andystewart> HI gang,
>
> andystewart> My dual Opteron machine is not happy. I cannot get more than
> andystewart> 7 straight days of uptime without getting a hard lock,
> andystewart> requiring a reboot. (My definition of hard lock is: machine
> andystewart> responds neither to keyboard input, mouse input, nor network
> andystewart> pings).
>
> andystewart> I can stimulate hard locks by running OpenOffice 1.1.3 (I had
> andystewart> 3 tonight, and 3-4 on a previous occasion while running
> andystewart> OpenOffice). It makes no sense to me that an application run
> andystewart> as a normal user could lockup a machine.
>
> andystewart> I've tried setting "nmi_watchdog=1" to see if I could get an
> andystewart> "oops" when it hard locks - no dice. Do you know any other
> andystewart> tricks I could try to see if it is the kernel which is locking
> andystewart> up? I'm running SuSE's version of 2.6.8.
>
> Did you verify that NMIs are being delivered? After boot, cat
> /proc/interrupts and make sure the NMI line is non-zero. Also note that,
> at least with upstream and Red Hat kernels, the nmi_watchdog defaults to 1
> for Opterons (i.e. you shouldn't need to manually set it).

Hi Jeff,

Well, this is weird: I'm seeing a ZERO count for NMIs, which makes me
think they are NOT being delivered. How would I go about solving *that*
little problem? I did a "cat /proc/cmdline" to ensure that I had
"nmi_watchdog=1", and indeed it is there. Perhaps this is the clue we've
been seeking.
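
For anyone who wants to automate the check Jeff describes, here is a
minimal sketch in Python. It only parses text in the /proc/interrupts
layout and reports whether any CPU has a non-zero NMI count. The function
names (nmi_counts, nmi_delivered) are made up for illustration, and the
parsing assumes the NMI line looks like "NMI: <count> <count> ...",
optionally followed by a description on newer kernels.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in /proc/interrupts-style text.

    Hypothetical helper: returns an empty list if there is no NMI line.
    """
    for line in interrupts_text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() != "NMI":
            continue
        counts = []
        for field in rest.split():
            # Counts come first; stop at any trailing description text.
            if not field.isdigit():
                break
            counts.append(int(field))
        return counts
    return []


def nmi_delivered(interrupts_text):
    """True if at least one CPU has received an NMI."""
    return any(count > 0 for count in nmi_counts(interrupts_text))
```

On the machine itself, something like
nmi_delivered(open("/proc/interrupts").read()) should come back True a few
seconds after boot if the watchdog is really ticking; an all-zero line, as
in my case, makes it return False.
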
> If the NMI watchdog works, it will print a message to the console.
> However, you will not see this if you are in X windows. Do you have a
> serial console hooked up, by any chance? I strongly suggest it if you have
> the means.
I think I have a cable with which I could connect the serial port of the
Opteron to the serial port of another Linux box (and then use minicom or
some such terminal program).
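
As a rough sketch of what the boot line would need for that, the Python
below checks a kernel command line for a serial console= entry and appends
one if it is missing. The helper names are hypothetical, and the port and
speed (ttyS0 at 115200, 8 data bits, no parity) are assumptions; they have
to match the actual port and whatever the terminal program on the other
box is set to.

```python
def console_params(cmdline):
    """Return the value of every console= parameter on a kernel command line."""
    return [tok.split("=", 1)[1]
            for tok in cmdline.split()
            if tok.startswith("console=")]


def has_serial_console(cmdline):
    """True if some console= parameter already points at a ttyS* port."""
    return any(value.startswith("ttyS") for value in console_params(cmdline))


def with_serial_console(cmdline, port="ttyS0", baud=115200):
    """Return cmdline with a serial console added if none is configured.

    port and baud are assumed defaults, not values from this thread.
    """
    if has_serial_console(cmdline):
        return cmdline
    extra = []
    # Keep kernel messages on the local screen as well.
    if "tty0" not in console_params(cmdline):
        extra.append("console=tty0")
    extra.append(f"console={port},{baud}n8")
    return " ".join([cmdline.strip()] + extra).strip()
```

Feeding it the contents of /proc/cmdline shows what the boot loader entry
would have to look like; the result still has to be put into the boot
loader configuration by hand.
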
>
> Aside from this, if it is indeed a hard lockup, there is really nothing you
> can do (without purchasing other hardware to help debug the problem).
Yeah, I was afraid of that.
>
> Please give these suggestions a shot and let us know how it goes.
Shall do - thanks, everybody!

Oh, be advised that when you smash your fist on the keyboard after your
system locks up for the umpteenth time, a lot of dead skin cells will
come flying upward out of the bowels of the keyboard. I would
recommend safety glasses.

Later,
Andy
- --
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFCpiuVHl0iXDssISsRAowjAJsHjggG0QsMPQ/H+2YQnzNZPtF9gQCfepvV
u6O8n+PSW4M0I1MHXsH06Xo=
=ViVg
-----END PGP SIGNATURE-----