help diagnosing PC issue: suspect power supply
Hi,

I'm trying to diagnose a home built PC running fine since 2007 but lately starting to act odd. Here are the system specs:

Intel Core 2 Duo 1.8GHz cpu
Asus P5B Deluxe mobo
Nvidia 7900GS graphics card
Kingston KHX6400 2x1GB DDR2 memory
Sparkle ATX 400W power supply
3x 1TB SATA HDDs
1x SATA DVD-RW

$ uname -a
Linux spider 3.0.0-25-generic #41-Ubuntu SMP Mon Aug 13 17:58:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"

What recently changed:
i) replaced monitor, last one died
ii) (carefully) vacuumed the dust from inside. Kept hand on case chassis, didn't touch any PCB/chips, etc.

Here are the symptoms:
1) first started seeing BIOS claims of bad CMOS checksum at boot time. Immediately suspected the CMOS battery, replaced it and reset CMOS settings but I continue to see this. This is the one symptom that I can't yet place, unless it's simply a bad replacement battery.
2) next started seeing the monitor fail to wake up after entering power save
3) powering on the PC shows issues:
3a) often the monitor won't wake at boot
3b) the bios often doesn't finish POST, gets stuck at splash screen
3c) bios settings often don't survive across power cycles (possibly related to #1--battery issue?)
3d) bios doesn't always give the affirmative beep (likely same symptom as 3a, 3b)
3e) a cold power on or reset can take several "tries" before the system stays on. I.e. fans will spin up, then spin down, one to three times before staying on. When this happens it's pretty much guaranteed that it won't boot successfully.
4) system will freeze to user input while up. I.e. once I get Linux booted, as long as I disable monitor power save it will stay running for a while, but I observed it stopped responding to KB and mouse input earlier this morning.
Here's what I've tried:
a) reseated both DIMMs, reproduced issues; tried with just one DIMM, repro'd; tried other DIMM, repro'd
b) reseated graphics card, repro'd; removed graphics card (now headless), repro'd problem 3e
c) reseated CMOS battery (CR2032), repro'd
d) pulled all non-essential cables and internal connections, including power and SATA cables from all drives, repro'd
e) updated BIOS to latest version, repro'd

Unfortunately I don't have another ATX power supply to try, but was thinking of buying one. I might also try an old PCI video card, but I suspect due to repro (b) above that it's not the cause.

Anyone have any other insights or agree it sounds like a bad PS? I still can't explain symptom (1), will probably buy another battery to be sure. Five years seems like an old PC, but I'm not a gamer so this suits me fine.

Thanks,
Brett
On Sat, Sep 29, 2012 at 02:32:05PM -0400, Brett Russ wrote:
> Sparkle ATX 400W power supply
>
> Anyone have any other insights or agree it sounds like a bad PS? I still can't explain symptom (1), will probably buy another battery to be sure. Five years seems like an old PC, but I'm not a gamer so this suits me fine.
I agree that the power supply is suspect. You can measure the voltage on a spare hard drive power connector coming from the power supply: there should be 12 V from black to yellow, and 5 V from black to red. If that checks out, you should also measure at the motherboard ATX connector, as some of the other voltages may be out of spec. You can find the standard ATX pinouts and voltages online.

Finally, you can test the power supply completely disconnected from the PC. Again, you can find instructions online for how to do that, as it requires a jumper wire to make the power supply turn on.

Alternatively, it may be easier to just try a replacement.
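For anyone who does take multimeter readings, a small script can do the "is this in spec?" arithmetic. This is only a sketch: the tolerances below are the ones commonly quoted from the ATX12V design guide (plus or minus 5% on the positive rails, 10% on -12 V) and should be checked against the pinout reference you use. The readings in the example are made up, not measurements from this machine.

```python
# Nominal rail voltages and tolerances (assumed from the ATX12V design guide:
# +/-5% on +3.3V, +5V, +12V and +5VSB, +/-10% on -12V). Verify before relying on them.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(name, measured):
    """Return (ok, low, high) for a multimeter reading on the named rail."""
    nominal, tol = ATX_RAILS[name]
    # sorted() keeps low <= high for the negative rail as well
    low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
    return low <= measured <= high, low, high


def report(readings):
    """Build one human-readable line per reading."""
    lines = []
    for name, measured in readings.items():
        ok, low, high = check_rail(name, measured)
        status = "ok" if ok else "OUT OF SPEC"
        lines.append(
            f"{name}: {measured:.2f} V (allowed {low:.2f} to {high:.2f} V) {status}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical readings: yellow-to-black and red-to-black on a drive connector.
    for line in report({"+12V": 11.2, "+5V": 5.08}):
        print(line)
```

Note that a static multimeter reading only catches a rail that sits out of range; a supply that sags or ripples under load can still pass this check.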
Thanks Chuck,

On Sat, Sep 29, 2012 at 6:19 PM, Chuck Anderson <cra@wpi.edu> wrote:
> I agree that the power supply is suspect. You can measure the voltage on a spare power supply hard drive connector, there should be 12v from black to yellow, and 5v from black to red. If that checks out, you should measure from the motherboard ATX connector as some of the other voltages may be out of spec. You can find the standard ATX pinouts and voltages online. Finally, you can test with the power supply completely disconnected from the PC. Again, you can find instructions online for how to do that, as it requires a jumper wire for the power supply to power on.

I suppose it's time to buy a multimeter...or borrow one from work.

> Alternatively, it may be easier to just try a replacement.

I have a new one on the way. I'll reply here with the results.

-BR
On Mon, Oct 1, 2012 at 7:57 AM, Brett Russ <bruss@alum.wpi.edu> wrote:
> On Sat, Sep 29, 2012 at 6:19 PM, Chuck Anderson <cra@wpi.edu> wrote:
>> I agree that the power supply is suspect. You can measure the voltage
>> Alternatively, it may be easier to just try a replacement.
>
> I have a new one on the way. I'll reply here with the results.
Well, brand new power supply and the system behaves the same problematic way as before.

So next try is the memory, as another list reply suggested. I've had memtest86 running against both DIMMs for 47+ hours and counting, 70 total passes so far, and 0 errors. Remember I'd already tried removing first one DIMM, then the other, and the problem repro'd either way. So by these measures it'd seem the memory is OK? I know some mem problems take days to show up so I'll just leave this running for a couple more days anyway.

Not much more I can do here, as I've seen repros with most everything else unplugged so I'm now thinking it must be the motherboard. Perhaps I will try reseating the CPU--haven't done that yet.

Recall the system is having trouble POST'ing: it hangs at various points before BIOS POST completes, i.e. I don't get a confirming 'beep' at end of POST and in some cases the monitor won't wake, other cases the BIOS splash screen stays splashed, etc. The full original email is here: http://mail.wlug.org/pipermail/wlug/2012-September/008059.html

Anyway, if anyone has other ideas here I'd welcome them.

Thanks again,
Brett
If you reseat the CPU, be careful with the heatsink. I once pulled an AMD chip out of its socket, with the socket still engaged, while trying to remove the heatsink. The heatsink compound can get gooey enough to act like a weak glue.

--
kstratton@fastmail.us

_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
On Mon, Oct 8, 2012 at 11:30 PM, Brett Russ <bruss@alum.wpi.edu> wrote:
> Well, brand new power supply and the system behaves the same problematic way as before.
>
> So next try is the memory, as another list reply suggested. I've had memtest86 running against both DIMMs for 47+ hours and counting, 70 total passes so far, and 0 errors. Remember I'd already tried removing first one DIMM, then the other, and the problem repro'd either way. So by these measures it'd seem the memory is OK? I know some mem problems take days to show up so I'll just leave this running for a couple more days anyway.
>
> Not much more I can do here, as I've seen repros with most everything else unplugged so I'm now thinking it must be the motherboard. Perhaps I will try reseating the CPU--haven't done that yet.

It could be the motherboard or CPU, but try unplugging all USB devices and removing all the internal cards that you can, first. If it boots okay then, try adding them back, one at a time, until you find the offending device.

-- Rich
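The add-one-back-at-a-time procedure can be written down as a short routine, which makes the bookkeeping explicit. This is a sketch only: `boots_ok` is a hypothetical callback standing in for a person power-cycling the machine and reporting whether POST completed, and the `trials` parameter is an assumption added because the failure described in this thread is intermittent, so one clean boot proves little.

```python
BARE_SYSTEM = "<bare system>"  # sentinel: fails even with nothing extra attached


def find_offender(devices, boots_ok, trials=3):
    """Add devices back one at a time and return the first one that breaks boot.

    devices  -- list of device names, in the order they will be reattached
    boots_ok -- hypothetical callback: boots_ok(installed) -> True if the
                machine completed POST with exactly `installed` attached
    trials   -- how many power cycles must all succeed before a configuration
                is trusted (the fault here is intermittent)

    Returns the offending device name, BARE_SYSTEM if the stripped-down
    machine already fails, or None if everything boots.
    """
    def stable(installed):
        # Every one of the trial boots has to succeed.
        return all(boots_ok(list(installed)) for _ in range(trials))

    installed = []
    if not stable(installed):
        return BARE_SYSTEM  # points at board, CPU, RAM or power, not add-ons
    for dev in devices:
        installed.append(dev)
        if not stable(installed):
            return dev
    return None


if __name__ == "__main__":
    # Simulated machine in which a (made-up) bad NIC prevents boot.
    suspect = find_offender(
        ["usb hub", "nic", "tv tuner"],
        lambda installed: "nic" not in installed,
    )
    print(suspect)
```

If the routine returns the bare-system sentinel, which matches what Brett reports with everything unplugged, the add-on devices are effectively cleared and attention goes back to the motherboard, CPU, memory and power.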
On Tue, Oct 9, 2012 at 11:59 AM, Richard Klein <rich@richardklein.org> wrote:
> It could be the motherboard or CPU, but try unplugging all USB devices and removing all the internal cards that you can, first. If it boots okay then, try adding them back, one at a time, until you find the offending device.
Just seconding this sort of approach. I once had machines that basically got hit by lightning and went down. In the end, I found that what had actually happened was that the hub I was using got hit and fried all the network cards on the LAN. As I recall, the machines wouldn't even let me power them on until I pulled the (noticeably crispy) network card out, and then everything worked fine after that.
How about the power capacitors and/or ICs on the board itself (as opposed to the power supply)? I have to admit, that is the first thing that came to my mind after reading the first few posts. I'm surprised nobody else went there. Of course, memory is a fickle thing and it still could be that, but I am feeling that it is something else.

DaveC

--
David P. Connell
"Watch where you're going; remember where you've been."
I agree with running a liveCD OS to test for software issues. I almost always do that to draw the line between software and hardware issues. Try Puppy Linux, if only because of the fast download and cute name, and it is of course useful as well. SystemRescueCD too.

And, to clarify about "fickle memory": I meant that running tests on memory can be tricky, as was mentioned by another individual.

It still sounds like it could be power spiking and then lagging in the supply circuit on the motherboard. It could be a single capacitor, or (sadly) something impossible to find economically. I hope it is something simple though, like the NIC idea, although those are usually onboard nowadays and a bad IC or circuit there may remain a problem even if it is not being used.

You could try the same idea our friend suggested, removing peripherals and adding them back, by doing something similar in the BIOS: disable all nonessential items. Turn off the NIC, the serial port, if applicable, etc.

One thing I wasn't sure of: have you tried a different video card? You mentioned replacing the failed monitor in the original post. I always try to go back to previous failures and changes to trace current problems. It may be possible that some sort of power spike during the failure of that old monitor caused an intermittent (or pending) issue in your video card that is now showing up in your system. Since the video subsystem is such a 'major' part of the way we use computers these days and is also very tightly bound to the power system and power requirements in a lot of instances, I would at least check it out.

DaveC

--
David P. Connell
"Watch where you're going; remember where you've been."
All,

I'll try to address everyone's responses at once. In short, it's nothing to do with the OS because the problem is quite simply a failure of the BIOS to complete POST, and quite often at that. As I mentioned, I've already repro'd the problem with everything unplugged (which means the video card, all SATA devices and all USB devices, b/c everything else is onboard). This is why I suspect the HW.

The next step is, as someone suggested, monitoring motherboard LEDs, although I haven't seen much here yet. There is no beep pattern when things fail. There is, apparently, a beep at end of POST, which I don't hear when the problem repro's, which indicates to me that I'm stuck in POST somewhere.

I will double check later tonight the completely bare motherboard (with CPU and at least one DIMM, of course) to see if I can repro again in this state. I'll check for any LEDs and flashing patterns.

Lastly, I'll manually disable through the BIOS menu as many devices as I can and hope that the settings stick across reboot (I'd mentioned some problems in this department in my original email). Even then I'm not sure if the BIOS will still include these devices in POST (I'd expect it might) and then just disable them before proceeding to boot the OS.

Thanks for the help all!
Brett
Brett> I will double check later tonight the completely bare motherboard
Brett> (with CPU and at least one DIMM, of course) to see if I can repro
Brett> again in this state. I'll check for any LEDs and flashing patterns.

Actually, I'd also go into the BIOS and reset it to the manufacturer's defaults, just in case you made a change at some point to tweak settings for more performance, etc. I.e. make sure there's no overclocking or anything like that.

Good luck!
John
On Wed, Oct 10, 2012 at 4:58 PM, John Stoffel <john@stoffel.org> wrote:
> Actually, I'd also go into the BIOS and reset it to the manufacturers defaults, just in case you made a change at some point to tweak settings for more performance, etc. I.E. make sure there's no overclocking or anything like that.

Yes, I've done that several times. In fact, since my BIOS is having issues preserving settings and shows the CMOS checksum failure quite often, I usually have to revert to defaults just to boot. This, despite a brand new CMOS battery. This is the other symptom in my original email that I can't yet explain. But, if it ends up being a motherboard HW issue, then anything could go wrong including corrupted CMOS.

thanks,
Brett
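For background on why corrupted CMOS shows up as a checksum error at all, here is a rough sketch of the check. It assumes the classic PC/AT layout, where the sum of CMOS bytes 0x10 through 0x2D is stored in bytes 0x2E (high) and 0x2F (low). The BIOS on this particular board may cover a different range or use a different scheme, and the 64-byte array below is simulated rather than read from real hardware.

```python
def cmos_checksum(cmos):
    """16-bit sum of bytes 0x10..0x2D (classic PC/AT layout, assumed here)."""
    return sum(cmos[0x10:0x2E]) & 0xFFFF


def stored_checksum(cmos):
    """Checksum as stored: high byte at 0x2E, low byte at 0x2F."""
    return (cmos[0x2E] << 8) | cmos[0x2F]


def write_checksum(cmos):
    """What setup would do when settings are saved."""
    value = cmos_checksum(cmos)
    cmos[0x2E] = value >> 8
    cmos[0x2F] = value & 0xFF


def checksum_ok(cmos):
    """What the power-on check would do: recompute and compare."""
    return cmos_checksum(cmos) == stored_checksum(cmos)


if __name__ == "__main__":
    cmos = bytearray(64)          # simulated CMOS contents, not real hardware
    cmos[0x10:0x2E] = bytes(range(30))
    write_checksum(cmos)
    print(checksum_ok(cmos))      # settings intact

    cmos[0x14] ^= 0x01            # a single flipped bit in a settings byte
    print(checksum_ok(cmos))      # now reported as a bad checksum
```

The point of the sketch is that any disturbance of the stored bytes, whether from a dead battery or from flaky power on the board, trips the same message, so a fresh battery does not rule out the error.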
"Brett" == Brett Russ <bruss@alum.wpi.edu> writes:
Brett> Yes, I've done that several times. In fact, since my BIOS is
Brett> having issues preserving settings and shows the CMOS checksum
Brett> failure quite often, I usually have to revert to defaults just
Brett> to boot. This, despite a brand new CMOS battery. This is the
Brett> other symptom in my original email that I can't yet explain.
Brett> But, if it ends up being a motherboard HW issue, then anything
Brett> could go wrong including corrupted CMOS.

This points more to a low level hardware issue in the motherboard, or possibly the CPU. Hard to know. If it's old enough, it might be time to just replace it.

John
Take a close look at the capacitors on the motherboard. If they are bulging you can either replace them or the whole board.

Walt
On Thu, Oct 11, 2012 at 7:40 PM, Walt <waltsaw@gmail.com> wrote:
> Take a close look at the capacitors on the motherboard. If they are bulging you can either replace them or the whole board.
>
> Walt
Just wanted to follow up with the group on the final resolution of this issue (I hope!). As I wrote earlier the new power supply didn't help.

I ended up buying a new motherboard, got it today, installed it tonight with the old power supply and all came right up w/o issues.

Interesting findings: when I took the CPU cooler off there was a big dead spider curled up amongst the power inductors surrounding the CPU. Also, there were a couple caps that were slightly "crowning", i.e. bulging a bit on top, when compared with some other identical ones on the board with the same capacity. So I figure either of these could be contributors, more likely the caps.

Now I have a nearly brand new Antec 450W power supply back in the box, for sale if anyone's interested.

Thanks all for the help,
Brett
The caps must have been dying of dead spider overload.

Liz J
Yeah, there's nothing to fear except fear itself... AND SPIDERS!
My apologies for excess (here and previous posts). I should have re-read the original post more clearly. Still, I hope I may have sparked a thought along the chain of diagnosis. Good luck.

--
David P. Connell
"Watch where you're going; remember where you've been."
participants (10)
- Brett Russ
- Chuck Anderson
- David P. Connell
- E Johnson
- John Stoffel
- kstratton@fastmail.us
- Randall Mason
- Richard Klein
- Theo Van Dinter
- Walt