RE: [Wlug] HOWTO debug hard lockups
==> Regarding Re: [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:
andystewart> Jeff Moyer wrote:
==> Regarding [Wlug] HOWTO debug hard lockups; Andy Stewart <andystewart@comcast.net> adds:
andystewart> Hi gang,
andystewart> My dual Opteron machine is not happy. I cannot get more than
andystewart> 7 straight days of uptime without getting a hard lock,
andystewart> requiring a reboot. (My definition of hard lock is: machine
andystewart> responds neither to keyboard input, mouse input, nor network
andystewart> pings).
andystewart> I can stimulate hard locks by running OpenOffice 1.1.3 (I had
andystewart> 3 tonight, and 3-4 on a previous occasion while running
andystewart> OpenOffice). It makes no sense to me that an application run
andystewart> as a normal user could lock up a machine.
andystewart> I've tried setting "nmi_watchdog=1" to see if I could get an
andystewart> "oops" when it hard locks - no dice. Do you know any other
andystewart> tricks I could try to see if it is the kernel which is locking
andystewart> up? I'm running SuSE's version of 2.6.8.
Did you verify that NMIs are being delivered? After boot, cat /proc/interrupts and make sure the NMI line is non-zero. Also note that, at least with upstream and Red Hat kernels, the nmi_watchdog defaults to 1 for Opterons (i.e. you shouldn't need to manually set it).
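A minimal sketch of the check described here, assuming a POSIX shell and awk. The function names nmi_total and nmi_check are made up for illustration; the only part taken from the thread is that the NMI line in /proc/interrupts should be non-zero after boot.

```shell
#!/bin/sh
# nmi_total [FILE]: print the sum of the per-CPU counters on the "NMI:" line.
# FILE defaults to /proc/interrupts; a saved copy of that file also works.
# Exits with status 2 if the file has no NMI line at all.
nmi_total() {
    awk '
        $1 == "NMI:" {
            found = 1
            # Add up only the numeric columns; newer kernels append a text label.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+$/)
                    total += $i
        }
        END {
            if (!found)
                exit 2
            printf "%.0f\n", total
        }
    ' "${1:-/proc/interrupts}"
}

# nmi_check [FILE]: report whether any NMIs have been counted.
# Returns 0 if the total is non-zero, 1 if it is zero, 2 if there is no NMI line.
nmi_check() {
    total=$(nmi_total "$@") || { echo "no NMI line found"; return 2; }
    if [ "$total" -gt 0 ]; then
        echo "NMIs are being delivered (total: $total)"
    else
        echo "NMI count is zero"
        return 1
    fi
}
```

Running nmi_check shortly after boot and again a little later gives two totals to compare; with a working watchdog the second one would normally be larger.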
andystewart> Hi Jeff,
andystewart> Well, this is weird, I'm seeing a ZERO count for NMIs, so that
andystewart> makes me think they are NOT being delivered. How would I go
andystewart> about solving *that* little problem? I did a "cat
Well, you can try booting with nmi_watchdog=2. This will try to use the local APIC to deliver NMIs, but I haven't actually seen a dual processor system that required this (all of them I've seen work with nmi_watchdog=1). It is worth a try, however.

Andy,

Your lockups sound like an SMP issue. I'd try booting a non-SMP kernel and then seeing if you can duplicate the lockup. I say this because I was getting practically the same behavior with a dual Xeon machine running FC3. The machine would just up and die in the same way your machine does.

Just my $.02.

Tim.

-----Original Message-----
From: wlug-bounces@mail.wlug.org [mailto:wlug-bounces@mail.wlug.org] On Behalf Of Andy Stewart
Sent: Wednesday, June 08, 2005 7:15 PM
To: jmoyer@redhat.com; Worcester Linux Users Group
Subject: Re: [Wlug] HOWTO debug hard lockups
With nmi_watchdog=1, I see the following output from "dmesg | grep -i nmi" with the 2.6.11.11 kernel I very recently installed:

    Bootdata ok (command line is root=/dev/sda3 vga=0x31a selinux=0 console=tty0 resume=/dev/sda2 desktop elevator=as nmi_watchdog=1 splash=verbose)
    ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
    ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
    Kernel command line: root=/dev/sda3 vga=0x31a selinux=0 console=tty0 resume=/dev/sda2 desktop elevator=as nmi_watchdog=1 splash=verbose
    activating NMI Watchdog ... done.
    testing NMI watchdog ... CPU#1: NMI appears to be stuck (0)!

I'll try nmi_watchdog=2 shortly.

Thanks,

Andy

--
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org

_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
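The dmesg excerpt in this message can also be checked mechanically. A rough sketch, assuming a POSIX shell with sed, and assuming the kernel log uses the same wording as the 2.6.11.11 output quoted in this thread (other kernel versions may phrase these messages differently). Both function names are hypothetical.

```shell
#!/bin/sh
# nmi_stuck_cpus: read kernel log text on stdin and print the number of each
# CPU whose log line says "NMI appears to be stuck".
nmi_stuck_cpus() {
    sed -n 's/.*CPU#\([0-9][0-9]*\): NMI appears to be stuck.*/\1/p'
}

# nmi_watchdog_arg: read kernel log text on stdin and print the value of the
# nmi_watchdog= parameter found on the "Kernel command line:" entry, if any.
nmi_watchdog_arg() {
    sed -n '/Kernel command line:/s/.*nmi_watchdog=\([0-9][0-9]*\).*/\1/p'
}

# Example (not run here):
#   dmesg | nmi_watchdog_arg    # which watchdog mode was requested at boot
#   dmesg | nmi_stuck_cpus      # which CPUs failed the boot-time test
```

An empty result from nmi_stuck_cpus only means the message was not found in the text supplied; it does not by itself prove the watchdog is working, so the /proc/interrupts count is still worth checking.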
So far, so good, but I only installed 2.6.11.11:

    7:49pm up 2 days 20:40, 7 users, load average: 2.04, 2.08, 2.08
    Linux amdtux 2.6.11.11 #1 SMP Tue Jun 7 21:12:53 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux

--
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org
Andy> So far, so good, but I only installed 2.6.11.11:
Andy> 7:49pm up 2 days 20:40, 7 users, load average: 2.04, 2.08, 2.08
Andy> Linux amdtux 2.6.11.11 #1 SMP Tue Jun 7 21:12:53 EDT 2005 x86_64 x86_64
Andy> x86_64 GNU/Linux

I assume you did a 'make oldconfig' to keep your settings as close as possible to the previous kernel? I'm wondering if there's some sort of difference in settings that might have made a difference here.

In any case, good luck, and great to hear it's working well for you.

My old dual-CPU PIII Xeon 550MHz is still chugging along well, but I'm starting to think I want a silent NFS/mysql/daemon server with a bunch of disk and a tape drive to sit off in a quiet corner somewhere to do backups, serve files, etc. And be stable and quiet. Then my main box could become a more general play system again.

John
John Stoffel wrote:

> Andy> So far, so good, but I only installed 2.6.11.11:
> Andy> 7:49pm up 2 days 20:40, 7 users, load average: 2.04, 2.08, 2.08
> Andy> Linux amdtux 2.6.11.11 #1 SMP Tue Jun 7 21:12:53 EDT 2005 x86_64 x86_64
> Andy> x86_64 GNU/Linux
>
> I assume you did a 'make oldconfig' to keep your settings as close as possible to the previous kernel? I'm wondering if there's some sort of difference in settings that might have made a difference here.
Well, what I actually did is this:

    booted previous kernel (2.6.8.something_from_suse)
    cd /usr/src/linux-2.6.11.11
    zcat /proc/config.gz > .config
    make xconfig (make no changes, save it back out)
    make .......
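John's question about whether any settings changed between the two kernels can be answered by comparing the old and new configuration files directly. A minimal sketch, assuming a POSIX shell with grep, sort, diff, and mktemp; config_diff is a name made up here, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# config_diff OLD NEW: print kernel config options that differ between two
# .config files. Lines starting with "<" come from OLD, ">" from NEW.
config_diff() {
    old_cfg=$1
    new_cfg=$2
    tmp_old=$(mktemp) || return 1
    tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 1; }
    # Keep only option lines, including "# CONFIG_FOO is not set" entries,
    # and sort them so option ordering does not show up as a difference.
    grep -E '^(# )?CONFIG_' "$old_cfg" | sort > "$tmp_old"
    grep -E '^(# )?CONFIG_' "$new_cfg" | sort > "$tmp_new"
    # Drop diff's hunk markers and keep only the changed option lines.
    diff "$tmp_old" "$tmp_new" | grep -E '^[<>]' || true
    rm -f "$tmp_old" "$tmp_new"
}

# Example (paths are placeholders):
#   zcat /proc/config.gz > /tmp/running.config
#   config_diff /tmp/running.config /usr/src/linux-2.6.11.11/.config
```

Options that exist only in the newer kernel show up as ">" lines with no "<" counterpart, which is expected when moving from 2.6.8 to 2.6.11.11; the entries worth a closer look are the ones whose value changed.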
> In any case, good luck, and great to hear it's working well for you.
Thanks - I'm keeping my fingers crossed!
> My old dual-CPU PIII Xeon 550MHz is still chugging along well, but I'm starting to think I want a silent NFS/mysql/daemon server with a bunch of disk and a tape drive to sit off in a quiet corner somewhere to do backups, serve files, etc. And be stable and quiet. Then my main box could become a more general play system again.
My dual PIII 450MHz system is now a MythTV backend, and it seems quite happy in that role. :-)

Later,

Andy

--
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org
participants (3)
- Andy Stewart
- John Stoffel
- Keller, Tim