Hi guys,

I finally got some type of crash dump from my system. I have tried Ubuntu variations of the 2.6.35, 2.6.36, and 2.6.37 kernels, all of which have crashed on me.

Finally, 2.6.37 gave me a trace on the console screen. I had to copy it down by hand, so don't take it too literally.

Here is the data. I intentionally left out all of the hex addresses in the call trace. It seems pretty obvious to me that something is amiss in the hardware. I wonder if my heat sinks are full of dust. That will be the first thing I check.

Any helpful advice would be most appreciated.

Thanks!
Andy

=======
Hardware error CPU0 Machine Check Exception: 4 Bank 4 2f00001000010c0f
TSC 1ae6ab55d12
Processor 2 f5a time 1296699696 socket 0 apic 0
mc4_status uncorrected error, other errors lost: yes cpu context corrupt: yes
Northbridge error, node 0, crc error detected on HT link
Transaction GEN (GEN), no timeout, Cache level L3/Gen, Participating processor:OBS
Machine check: processor context corrupt
Kernel Panic - not syncing: Fatal Machine check on current CPU
Pid 0, comm: swapper Tainted: G M 2.6.37-020637-generic
Call Trace:
#MC panic printk mce_panic do_machine_check machine_check native_safe_halt
EOE default_idle cpu_idle rest_init start_kernel early_idt_handler x86_64_start_reservations x86_64_start_kernel
panic occurred, switching back to text console
=======

-- 
Andy Stewart (KB1OIQ)
Founder: Worcester Linux Users' Group
Founder: Chelmsford Linux Meetup Group
President: PART of Westford, MA (WB1GOF)
On Wed, 02 Feb 2011 21:38:42 -0500, Andy Stewart wrote:
Hi guys,
I finally got some type of crash dump from my system. I have tried Ubuntu variations of the 2.6.35, 2.6.36, and 2.6.37 kernels, all of which have crashed on me.
Finally, 2.6.37 gave me a trace on the console screen. I had to copy it down by hand, so don't take it too literally.
Here is the data. I intentionally left out all of the hex addresses in the call trace. It seems pretty obvious to me that something is amiss in the hardware. I wonder if my heat sinks are full of dust. That will be the first thing I check.
Any helpful advice would be most appreciated.
Thanks!
Andy
=======
Hardware error CPU0 Machine Check Exception: 4 Bank 4 2f00001000010c0f
TSC 1ae6ab55d12
Processor 2 f5a time 1296699696 socket 0 apic 0
mc4_status uncorrected error, other errors lost: yes cpu context corrupt: yes
Northbridge error, node 0, crc error detected on HT link
Well, that about says it: a CRC error on the HyperTransport link. That sure sounds like a hardware glitch. You can certainly try cleaning out the heat sinks and fans, but if that doesn't work, it may be time for a new CPU and/or mobo.
I'd run memtest86+; the memory controller can cause errors like those if you have a bad module. Try swapping out the RAM before blaming the CPU or motherboard. I had a similar panic a long time ago, and it turned out one of my memory modules was bad. It's definitely a hardware issue, though.

On Feb 2, 2011 9:49 PM, "Robert Krawitz" <rlk@alum.mit.edu> wrote:
On Wed, 02 Feb 2011 21:38:42 -0500, Andy Stewart wrote:
Hi guys,
I finally got some type of ...
Well, that about says it: a CRC error on the HyperTransport link. That sure sounds like a hardware glitch. You can certainly try cleaning out the heat sinks and fans, but if that doesn't work, it may be time for a new CPU and/or mobo.
_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlu...
On 02/02/2011 10:25 PM, Jason Couture wrote:
I'd run memtest86+; the memory controller can cause errors like those if you have a bad module. Try swapping out the RAM before blaming the CPU or motherboard. I had a similar panic a long time ago, and it turned out one of my memory modules was bad. It's definitely a hardware issue, though.
Well, I cleaned out a bunch of dust... maybe a heat problem caused some weirdness. I'm also running memtester on the live system... so far, so good. The machine is about 5 years old, so maybe this is my excuse to build myself another one.

The crash is weird: sometimes it crashes within a couple of minutes of booting the system, and other times it takes a week. I still claim that it started happening when I upgraded from SuSE 10.1 to Kubuntu 10.10, but I did open the case and install a new SATA hard drive at that time as well. This machine was an absolute rock when running SuSE 10.1.

We'll see what happens... thanks, everybody, for the helpful suggestions.

Later,
Andy

-- 
Andy Stewart (KB1OIQ)
Founder: Worcester Linux Users' Group
Founder: Chelmsford Linux Meetup Group
President: PART of Westford, MA (WB1GOF)
Andy,

The mcelog program will decode that Machine Check Exception for you and tell you what's going on. As others mentioned, it could be the heat sink, dust, or RAM. It could also be a strained power supply unit or poor connections.

Good luck!
~j

On Wed, Feb 2, 2011 at 9:38 PM, Andy Stewart <andystewart@comcast.net> wrote:
Hi guys,
I finally got some type of ...
_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
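The mcelog suggestion is the practical route for decoding the exception. To show roughly what such a decoder does with the bank 4 status word, here is a minimal Python sketch. It is not mcelog's code: it assumes an AMD processor (which the HyperTransport and northbridge wording suggests) and the standard MCi_STATUS bit layout, with bus errors encoded in the low 16 bits as 0000 1PPT RRRR IILL. The helper names (`bit`, `decode_status`) and the label strings are hypothetical, picked to mirror the wording of the panic text.

```python
from datetime import datetime, timezone

# Hypothetical labels for the compound "bus error" MCA error code
# (bits 15:0 of MCi_STATUS), laid out as 0000 1PPT RRRR IILL.
# The strings are chosen to mirror the hand-copied panic text.
PP = ["SRC", "RES", "OBS", "GEN"]                  # participating processor
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # request type
II = ["MEM", "RESV", "IO", "GEN"]                  # memory or I/O
LL = ["L0", "L1", "L2", "L3/GEN"]                  # cache level


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into a few architectural fields."""
    ec = status & 0xFFFF                           # MCA error code, bits 15:0
    info = {
        "val": bit(status, 63),                    # register contents valid
        "over": bit(status, 62),                   # other errors lost
        "uc": bit(status, 61),                     # uncorrected error
        "pcc": bit(status, 57),                    # processor context corrupt
        "ext_code": (status >> 16) & 0x1F,         # assumed AMD extended error code, bits 20:16
        "error_code": ec,
    }
    if (ec & 0xF800) == 0x0800:                    # bus error: bits 15:11 are 0000 1
        rrrr = (ec >> 4) & 0xF
        info["bus"] = {
            "pp": PP[(ec >> 9) & 0x3],             # bits 10:9
            "timeout": bit(ec, 8),                 # bit 8
            "rrrr": RRRR[rrrr] if rrrr < len(RRRR) else "RESV",  # bits 7:4
            "ii": II[(ec >> 2) & 0x3],             # bits 3:2
            "ll": LL[ec & 0x3],                    # bits 1:0
        }
    return info


if __name__ == "__main__":
    # Status word as hand-copied from the console; it may contain transcription errors.
    info = decode_status(0x2F00001000010C0F)
    print(info["bus"])
    print(hex(info["ext_code"]), info["uc"], info["pcc"], info["over"])
    # The panic's "time" field, assumed to be seconds since the Unix epoch (UTC).
    print(datetime.fromtimestamp(1296699696, tz=timezone.utc))
```

Run on the value as copied, the bus-error fields come out as OBS, no timeout, GEN (GEN) and L3/GEN, and the extended code is 1, which the kernel reported as a CRC error on the HT link. The timestamp converts to 2011-02-03 02:21:36 UTC (21:21:36 EST on Feb 2), shortly before the first post. Bits 63 and 62 come out clear, though the panic text says "other errors lost: yes", so the leading hex digits were probably mis-copied, as Andy warned they might be.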
participants (4)
- Andy Stewart
- Jason Couture
- Jorden M
- Robert Krawitz