==> Regarding Re: [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:
andystewart> Jeff Moyer wrote:
==> Regarding [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:
andystewart> Hi gang,

andystewart> My dual Opteron machine is not happy. I cannot get more
andystewart> than 7 straight days of uptime without getting a hard lock,
andystewart> requiring a reboot. (My definition of hard lock is: machine
andystewart> responds neither to keyboard input, mouse input, nor network
andystewart> pings).

andystewart> I can stimulate hard locks by running OpenOffice 1.1.3 (I had
andystewart> 3 tonight, and 3-4 on a previous occasion while running
andystewart> OpenOffice). It makes no sense to me that an application run
andystewart> as a normal user could lock up a machine.

andystewart> I've tried setting "nmi_watchdog=1" to see if I could get an
andystewart> "oops" when it hard locks - no dice. Do you know any other
andystewart> tricks I could try to see if it is the kernel which is locking
andystewart> up? I'm running SuSE's version of 2.6.8.

Did you verify that NMIs are being delivered? After boot, cat /proc/interrupts and make sure the NMI line is non-zero. Also note that, at least with upstream and Red Hat kernels, the nmi_watchdog defaults to 1 for Opterons (i.e. you shouldn't need to manually set it).
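
For anyone who wants to script that check, here is a minimal Python sketch. It assumes the NMI row in /proc/interrupts starts with "NMI:" followed by one count per CPU, possibly with a text description after the counts. The function names nmi_counts and nmis_delivered are made up for illustration and are not part of any kernel tool.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text.

    Returns an empty list if no line starting with "NMI:" is found.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at any trailing text description
                counts.append(int(field))
            return counts
    return []


def nmis_delivered(interrupts_text):
    """True only if an NMI line exists and every CPU shows a non-zero count."""
    counts = nmi_counts(interrupts_text)
    return bool(counts) and all(c > 0 for c in counts)


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is not mounted
    print("NMI counts per CPU:", nmi_counts(text))
    print("NMIs delivered on every CPU:", nmis_delivered(text))
```

A count that stays at zero on every CPU after boot, as in the reply below, suggests the watchdog NMIs are not arriving.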
andystewart> Hi Jeff,

andystewart> Well, this is weird, I'm seeing a ZERO count for NMIs, so
andystewart> that makes me think they are NOT being delivered. How would I go
andystewart> about solving *that* little problem? I did a "cat

Well, you can try booting with nmi_watchdog=2. This will try to use the
local APIC to deliver NMIs, but I haven't actually seen a dual system
that required this (all of them I've seen work with nmi_watchdog=1). It
is worth a try, however.

Andy,

Your lockups sound like an SMP issue. I'd try booting a non-SMP kernel
and then seeing if you can duplicate the lockup. I say this because I
was getting practically the same behavior with a dual Xeon machine
running FC3. The machine would just up and die in the same way your
machine is.

Just my $.02.

Tim.

-----Original Message-----
From: wlug-bounces@mail.wlug.org [mailto:wlug-bounces@mail.wlug.org] On Behalf Of Andy Stewart
Sent: Wednesday, June 08, 2005 7:15 PM
To: jmoyer@redhat.com; Worcester Linux Users Group
Subject: Re: [Wlug] HOWTO debug hard lockups

With nmi_watchdog=1, I see the following output from "dmesg | grep -i nmi"
with the 2.6.11.11 kernel I very recently installed:

    Bootdata ok (command line is root=/dev/sda3 vga=0x31a selinux=0 console=tty0 resume=/dev/sda2 desktop elevator=as nmi_watchdog=1 splash=verbose)
    ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
    ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
    Kernel command line: root=/dev/sda3 vga=0x31a selinux=0 console=tty0 resume=/dev/sda2 desktop elevator=as nmi_watchdog=1 splash=verbose
    activating NMI Watchdog ... done.
    testing NMI watchdog ... CPU#1: NMI appears to be stuck (0)!

I'll try nmi_watchdog=2 shortly.

Thanks,

Andy

-- 
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org

_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
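
The dmesg check in Andy's message can also be scripted. This is only a rough Python sketch, assuming the kernel prints the "CPU#N: NMI appears to be stuck" line in the same form as quoted above. The helper names stuck_cpus and watchdog_mode are hypothetical, and the sample text is copied from the output in that message.

```python
import re

# Patterns based on the log lines quoted in the message above.
STUCK_RE = re.compile(r"CPU#(\d+): NMI appears to be stuck")
WATCHDOG_RE = re.compile(r"\bnmi_watchdog=(\d+)")


def stuck_cpus(dmesg_text):
    """Return the sorted CPU numbers reported as having a stuck NMI."""
    return sorted({int(m.group(1)) for m in STUCK_RE.finditer(dmesg_text)})


def watchdog_mode(dmesg_text):
    """Return the nmi_watchdog= value from the logged command line, or None."""
    match = WATCHDOG_RE.search(dmesg_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Sample taken from the dmesg output quoted in the message.
    sample = (
        "Kernel command line: root=/dev/sda3 vga=0x31a selinux=0 console=tty0 "
        "resume=/dev/sda2 desktop elevator=as nmi_watchdog=1 splash=verbose\n"
        "activating NMI Watchdog ... done.\n"
        "testing NMI watchdog ... CPU#1: NMI appears to be stuck (0)!\n"
    )
    print("nmi_watchdog mode:", watchdog_mode(sample))
    print("CPUs with stuck NMI:", stuck_cpus(sample))
```

On a live system the text would come from the output of dmesg rather than from the embedded sample.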