System hang woes continue
Hi gang,

I think I somehow built the computer from Hell. Recall my dual Opteron box with SuSE 9.2 and 1 GB of memory.

I removed every SATA device from my system, and now it doesn't hang as often, but it still hangs periodically. This time I got 10 days of uptime before it hung - a new world's record on this particular box. Strangely enough, I could demonstrate more uptime with the "old" SATA code than with libata - go figure.

I am suspicious of the 2.6 kernel. We run dual Opteron servers at work with the 2.4 kernel series with no problems at all (on RedHat 7.3). I am wondering what would happen if I took SuSE 9.2 and replaced the 2.6 kernel with a 2.4 kernel. Am I asking for a heap of trouble? What specific issues would I encounter? I'm guessing that the kernel modules would be all fubar, since I thought that 2.4 and 2.6 did that quite differently. Perhaps I could bypass that with a monolithic kernel. I'm not sure what other problems I'd make for myself if I did this.

Another thought - does somebody have a .config file for a Linux 2.6 kernel on an Opteron system that works really well?

Thanks,
Andy
-- 
Andy Stewart, Founder
Worcester Linux Users' Group
Worcester, MA, USA
http://www.wlug.org
On Wed, Oct 26, 2005 at 09:06:54PM -0400, Andy Stewart wrote:
I think I somehow built the computer from Hell. Recall my dual Opteron box with SuSE 9.2 and 1 GB of memory.
What version of 2.6? There was an SMP bug with amd64 that was pretty random/nasty, but it should be fixed in 2.6.13+, IIRC...
-mike
Mike Frysinger wrote:
What version of 2.6? There was an SMP bug with amd64 that was pretty random/nasty, but it should be fixed in 2.6.13+, IIRC...
-mike
_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
I am running 2.6.13.3.

Andy
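Mike's "2.6.13+" question comes down to comparing dotted kernel release strings. A minimal Python sketch, assuming the release looks like `uname -r` output (e.g. `2.6.13.3`); `parse_release` and `at_least` are hypothetical helpers, and the 2.6.13 cutoff is only Mike's "iirc" recollection, not a changelog entry. Anything after the leading numeric part (such as `-rc5` or `-smp`) is ignored, so a release candidate compares equal to its final release.

```python
import platform
import re


def parse_release(release: str) -> tuple:
    """Turn '2.6.13.3' into (2, 6, 13, 3); suffixes such as '-rc5' or '-smp' are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(release: str, minimum: str) -> bool:
    """True if release is the same as or newer than minimum (element-wise tuple comparison)."""
    return parse_release(release) >= parse_release(minimum)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    # 2.6.13 is the cutoff Mike recalled for the amd64 SMP bug.
    verdict = "is at or past" if at_least(running, "2.6.13") else "predates"
    print(f"{running} {verdict} 2.6.13")
```

With Andy's release, `at_least("2.6.13.3", "2.6.13")` is true, so if Mike's memory is right, that particular fix should already be in his kernel.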
From: Andy Stewart <andystewart@comcast.net>
I think I somehow built the computer from Hell. Recall my dual Opteron box with SuSE 9.2 and 1 GB of memory.
Not specifically, but each of us has a computer that sometimes stops dead for no reason. I have always assumed that mine was due to some hardware that runs on the edge. Once or twice a crash seemed to happen exactly when I bumped the case. I moved it from my desk and set it on a shelf where it was never touched (well, hardly ever). It has been running as a firewall 24 hours per day for about a year, with random crashes less common than power outages and thunderstorms. In the past few days it has crashed dozens of times, and sometimes takes two or three tries to reboot. Maybe it doesn't like water dripping from the walls and soaking wet carpets. I just bought a bunch of parts and will attempt to build a computer not from Hell.
I am suspicious of the 2.6 kernel. We run dual Opteron servers at work with the 2.4 kernel series with no problems at all (on RedHat 7.3). I am wondering what would happen if I took SuSE 9.2 and replaced the 2.6 kernel with a 2.4 kernel. Am I asking for a heap of trouble?
Oh! What hours of fun you are about to enjoy!
What specific issues would I encounter?
Duh...Incompatible C libraries?
I'm guessing that the kernel modules would be all fubar, since I thought that 2.4 and 2.6 did that quite differently. Perhaps I could bypass that with a monolithic kernel. I'm not sure what other problems I'd make for myself if I did this.
The modules come with the kernel, do they not? Do you have 'third party' modules? It should be no problem having two sets of modules installed, as long as the kernel version numbers are different, since each kernel looks for its modules in a directory with its own name on it. But the new kernel must have new powers. Does SuSE depend on the new stuff? -- Keith
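Keith's point that two kernels can coexist rests on each release keeping its modules in its own directory, conventionally `/lib/modules/<release>` where `<release>` is the `uname -r` string. A small Python sketch of that lookup; `module_dir_for` and `installed_module_trees` are hypothetical helpers, and the root directory is a parameter so the functions can be pointed at a scratch directory.

```python
import platform
from pathlib import Path

MODULES_ROOT = Path("/lib/modules")


def module_dir_for(release: str, root: Path = MODULES_ROOT) -> Path:
    """Directory where a kernel of this release looks for its modules."""
    return root / release


def installed_module_trees(root: Path = MODULES_ROOT) -> list:
    """Sorted release names that have a module tree (a subdirectory) under root."""
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    running = platform.release()
    for release in installed_module_trees():
        marker = "  <- running" if release == running else ""
        print(f"{module_dir_for(release)}{marker}")
```

The separate directories keep the module files apart; the loader tools are a separate question, since 2.4 modules (`.o`, loaded with modutils) and 2.6 modules (`.ko`, loaded with module-init-tools) use different userspace tools.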
==> Regarding [Wlug] System hang woes continue; Andy Stewart <andystewart@comcast.net> adds:

andystewart> Hi gang,

andystewart> I think I somehow built the computer from Hell. Recall my
andystewart> dual Opteron box with SuSE 9.2 and 1 GB of memory.

andystewart> I removed every SATA device from my system, and now it doesn't
andystewart> hang as often, but it still hangs periodically. This time I
andystewart> got 10 days of uptime before it hung - a new world's record on
andystewart> this particular box. Strangely enough, I could demonstrate
andystewart> more uptime with the "old" SATA code than with libata - go
andystewart> figure.

andystewart> I am suspicious of the 2.6 kernel. We run dual Opteron
andystewart> servers at work with the 2.4 kernel series with no problems at
andystewart> all (on RedHat 7.3). I am wondering what would happen if I
andystewart> took SuSE 9.2 and replaced the 2.6 kernel with a 2.4 kernel.
andystewart> Am I asking for a heap of trouble?

andystewart> What specific issues would I encounter?

Problems with your device tree, no doubt. I think this is a rat hole. Just don't go there. I'd run a live CD based on 2.4 before trying what you're suggesting.

andystewart> I'm guessing that the kernel modules would be all fubar since
andystewart> I thought that 2.4 and 2.6 did that quite differently.
andystewart> Perhaps I could bypass that with a monolithic kernel. I'm not
andystewart> sure what other problems I'd make for myself if I did this.

andystewart> Another thought - does somebody have a .config file for a
andystewart> Linux 2.6 kernel on an Opteron system that works really well?

The distribution configs always work for me. They probably see the most testing, if you think about it.

One other suggestion: have you checked to see if there are any BIOS updates? I've seen *very* strange problems of the random variety that are really BIOS bugs.
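Following Jeff's advice to start from the distribution config, one way to see what a hand-built config changes is to diff it against the distribution's. Kernel `.config` files record enabled options as `CONFIG_NAME=value` and disabled ones as `# CONFIG_NAME is not set`. A minimal Python sketch assuming that format; `parse_config` and `diff_configs` are hypothetical helpers, and where a distribution keeps its config (often somewhere under `/boot`) varies.

```python
from __future__ import annotations

import re
import sys

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map every CONFIG_ symbol to its value; 'is not set' lines map to 'n'."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if (m := SET_RE.match(line)) is not None:
            options[m.group(1)] = m.group(2)
        elif (m := UNSET_RE.match(line)) is not None:
            options[m.group(1)] = "n"
        # Any other line (blank, ordinary comment) is ignored.
    return options


def diff_configs(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Symbols whose value differs; None means the symbol is absent from that file."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        changes = diff_configs(parse_config(f_old.read()), parse_config(f_new.read()))
    for name, (before, after) in changes.items():
        print(f"{name}: {before} -> {after}")
```

Run with two file arguments (distribution config first, custom config second); the output lists only the options that differ.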
If you're really serious about tracking this down, I'd start doing some stress testing to see if you can get a reliable reproducer. Without that, it's tough to address. Or, enable kexec/kdump. There's a quick blurb on it in Fedora Weekly News, but that's for Fedora.

-Jeff
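Jeff's suggestion is to find a reliable reproducer. One possible approach, which is an assumption rather than something from the thread: repeatedly fill buffers with a few bit patterns, verify them, and append one line per pass (time, pass number, 1-minute load average, mismatch count) to a log, calling `os.fsync` so the line reaches the disk before a hard hang. After a reboot, the last line shows roughly when the box died and how busy it was, which also bears on John's "idle or in use?" question. A userspace check like this is much weaker than a dedicated memory tester; `stress_pass`, `run`, and the log path are hypothetical.

```python
import os
import time

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


def stress_pass(size_bytes: int) -> int:
    """Write and verify each pattern in a fresh buffer; return the number of bad bytes."""
    bad = 0
    for pattern in PATTERNS:
        buf = bytearray([pattern]) * size_bytes
        expected = bytes([pattern]) * size_bytes
        if buf != expected:
            bad += sum(1 for got, want in zip(buf, expected) if got != want)
    return bad


def run(log_path: str, passes: int, size_bytes: int = 64 * 1024 * 1024) -> int:
    """Run the passes, logging each one durably; return the total bad-byte count."""
    total_bad = 0
    with open(log_path, "a") as log:
        for n in range(1, passes + 1):
            bad = stress_pass(size_bytes)
            total_bad += bad
            load1 = os.getloadavg()[0]  # Unix only
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} pass={n} load={load1:.2f} bad={bad}\n")
            log.flush()
            os.fsync(log.fileno())  # push this line to disk in case the box locks up next
    return total_bad
```

For example, `run("/var/tmp/stress.log", passes=100_000)` left running (perhaps alongside a kernel compile for extra load) leaves a log whose final line is the last pass that completed.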
Andy> I think I somehow built the computer from Hell. Recall my dual
Andy> Opteron box with SuSE 9.2 and 1 GB of memory.

Bummers, I've been thinking about going the Opteron route as well with a dual box like this. Thoughts off the top of my head:

1. Is the BIOS updated?
2. Can you tune/tweak the BIOS settings to more conservative values?
3. Try running kernel 2.6.14-rc5; there were various problems found and fixed with AMD stuff.
4. Try booting a UP kernel, or try booting an SMP kernel with the 'nosmp' boot option.
5. Try booting with the 'noapic' option.
6. Boot with a serial console and log all output to another system. Try to do magic SysRq from that console when the system hangs.

Andy> I removed every SATA device from my system, and now it doesn't
Andy> hang as often, but it still hangs periodically.

Hmm... this points to the problem not being SATA then. Can you disable the SATA chipset from the BIOS level as a test?

Andy> This time I got 10 days of uptime before it hung - a new world's
Andy> record on this particular box. Strangely enough, I could
Andy> demonstrate more uptime with the "old" SATA code than with
Andy> libata - go figure.

How did the system hang? Do you have SysRq enabled? It would be good to know what type of lockups you're getting here. Does it lock up when the system is idle or in use?

Andy> I am suspicious of the 2.6 kernel. We run dual Opteron servers
Andy> at work with the 2.4 kernel series with no problems at all (on
Andy> RedHat 7.3). I am wondering what would happen if I took SuSE
Andy> 9.2 and replaced the 2.6 kernel with a 2.4 kernel. Am I asking
Andy> for a heap of trouble?

Not really, though you might have issues if you have binary modules for closed-source drivers. I'd grab the 2.4.31 kernel and try it out. It should work fairly well on there, though you might have issues if you use udev for your devices.

Someone else suggested a LiveCD, which I think is a good idea.
If you can, trash one of your SATA disks and do a Debian/Ubuntu install using the 2.4 kernel and see what happens.

We run Rackable Dual/Quad Opterons at work with RHEL 3 and they just work. But we're only stressing them with CPU-bound jobs, not with devices or graphics or audio or anything like that.

Good luck!
John
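When trying John's boot options, it helps to confirm that the running kernel actually received them and that magic SysRq is switched on. On Linux the boot command line is exposed in `/proc/cmdline`, and `/proc/sys/kernel/sysrq` holds the SysRq switch (0 means disabled). A minimal Python sketch; `check_boot_options` and `sysrq_setting` are hypothetical helpers, the `console=ttyS0,115200` value is only an example to adapt to your serial port and speed, and splitting on whitespace ignores any quoting.

```python
from pathlib import Path

# Options suggested in the thread, plus an example serial console value (adjust port and speed).
WANTED = ("nosmp", "noapic", "console=ttyS0,115200")


def check_boot_options(cmdline: str, wanted=WANTED) -> dict:
    """For each wanted option, report whether it appears on the whitespace-split command line."""
    present = set(cmdline.split())
    return {option: option in present for option in wanted}


def sysrq_setting(path: Path = Path("/proc/sys/kernel/sysrq")):
    """Return the kernel.sysrq value (0 = disabled), or None if it cannot be read."""
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        for option, active in check_boot_options(cmdline_file.read_text()).items():
            print(f"{option:<24} {'yes' if active else 'no'}")
    print("kernel.sysrq =", sysrq_setting())
```

A missing `sysrq` file usually means the kernel was built without magic SysRq support, in which case the SysRq suggestion needs a rebuilt kernel.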
Friends, I have an HP ML 370p with a DVD RW. I tried to install RHEL 4 ES on it, but it did not work, with several different errors listed during the installation. But it's all OK after replacing the DVD RW with a common CD-ROM. Can you help me with how to copy data from a physical hard disk to a blank DVD? What's the command? Thanks all.

Regards,
Aramico
Aramico> Here, I have tried to install RHEL 4 ES, but not work with
Aramico> several different errors listed during the installation.

Bummers.

Aramico> But, it's all OK after replacing the DVD RW with common CD
Aramico> ROM.

Good.

Aramico> can you help me here how to copy data from physical hard disk
Aramico> to a blank DVD? What's the command? Thanks all.

You should google for 'dvd writing linux' and probably 'dvdrtools' as well. But if the drive was giving errors on reading a disk, then you're probably going to have problems writing.

John
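For Aramico's question, one common tool is `growisofs` from the dvd+rw-tools package, which builds an ISO 9660 image on the fly and burns it; the usual form is `growisofs -Z /dev/dvd -R -J <directory>`. A Python sketch that assembles and optionally runs that command; the device path `/dev/dvd` and the helper names `burn_command` and `burn` are assumptions, so check the device name on your system and the growisofs man page before burning.

```python
import shlex
import shutil
import subprocess


def burn_command(source_dir: str, device: str = "/dev/dvd") -> list:
    """argv for a first burn session: -Z starts a new session, -R/-J add Rock Ridge and Joliet names."""
    return ["growisofs", "-Z", device, "-R", "-J", source_dir]


def burn(source_dir: str, device: str = "/dev/dvd", dry_run: bool = True) -> list:
    """Print the command; only execute it when dry_run is False and growisofs is installed."""
    cmd = burn_command(source_dir, device)
    print(shlex.join(cmd))
    if not dry_run:
        if shutil.which("growisofs") is None:
            raise FileNotFoundError("growisofs not found; install dvd+rw-tools")
        subprocess.run(cmd, check=True)
    return cmd
```

Leaving `dry_run=True` just prints the command line for review. John's caveat still applies: a drive that throws errors while reading may well fail while writing.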
We have had problems with bulging capacitors. After some research, I have found that many different systems have this issue (Dell, Intel, etc. motherboards). Open the BFH (box from hell) and check the caps. They may be bulging on top or even leaking. This is from an electrolyte formula that was stolen from Japan.

Good luck!
Walt

On Thu, 2005-10-27 at 11:55 -0400, John Stoffel wrote:
Andy> I think I somehow built the computer from Hell. Recall my dual
Andy> Opteron box with SuSE 9.2 and 1 GB of memory.
From: "John Stoffel" <john@stoffel.org>
Bummers, I've been thinking about going the opteron route as well with a dual box like this. Thoughts off the top of my head:
1. Is the BIOS updated? 2. Can you tune/tweak the BIOS settings to more conservative values?
I like that idea, where can I learn about BIOS tweaking and what values are conservative? Can I read about overclocking and just do the opposite? I guess I am getting old; I would gladly wait five percent longer if it meant I could be certain it would work, and I feel no need to brag about the size of my MHz.
-- 
Keith
There should be settings in your BIOS main screen for 'Load Setup Defaults' or 'Load Failsafe Defaults'. The setup defaults are pretty conservative, but the failsafe defaults are barebones.

On 10/29/05, Keith Wright <kwright@free-comp-shop.com> wrote:
I like that idea, where can I learn about BIOS tweaking and what values are conservative? Can I read about overclocking and just do the opposite? I guess I am getting old; I would gladly wait five percent longer if it meant I could be certain it would work, and I feel no need to brag about the size of my MHz.
participants (8)
- Andy Stewart
- Aramico
- Eric Martin
- Jeff Moyer
- John Stoffel
- Keith Wright
- Mike Frysinger
- Walt Sawyer