Re: [plug] decoding further a Machine Check Excepton

Author: Michael Tinsay
Date:  
To: Philippine Linux Users' Group (PLUG) Technical Discussion List
Subject: Re: [plug] decoding further a Machine Check Excepton
Thanks fooler and Edwin,
I ran memtest and memtester on the server for several days each, and neither found any problem with the installed memory modules.
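
For anyone reading this in the archive: the STATUS word from the original report can be cross-checked by hand against what mcelog printed. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout and the memory-controller error-code format (0000 0000 1MMM CCCC) as I read them in the Intel SDM. The names decode_status, STATUS_FLAGS and MEM_TRANSACTION are made up for illustration and are not part of mcelog.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory-controller error code (assumed encoding).
MEM_TRANSACTION = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and the MCA error code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    code = status & 0xFFFF  # bits 15:0 hold the MCA error code
    result = {"flags": flags, "mca_code": code}
    # Memory-controller errors have the form 0000 0000 1MMM CCCC.
    if code & 0xFF80 == 0x0080:
        result["transaction"] = MEM_TRANSACTION.get((code >> 4) & 0x7, "reserved")
        result["channel"] = code & 0xF  # 15 would mean "channel not specified"
    return result


if __name__ == "__main__":
    # STATUS value taken from the console output in the original report.
    for key, value in decode_status(0xBA000000000000B2).items():
        print(key, value)
```

For ba000000000000b2 this gives the VAL, UC, EN, MISCV and PCC flags, an address/command error, and channel 2, which agrees with the mcelog lines "Uncorrected error", "Error enabled", "MCi_MISC register valid", "Processor context corrupt", "Transaction: Address/Command error" and "Memory channel ID of error: 2".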

--- mike t.

From: fooler mail <fooler.mail@???>
To: Michael Tinsay <tinsami1@???>; Philippine Linux Users' Group (PLUG) Technical Discussion List <plug@???>
Sent: Wednesday, 26 October 2016, 9:05
Subject: Re: [plug] decoding further a Machine Check Excepton


it looks like a memory error to me... can you remove the memory at
bank 8 and check if that solves the problem?

fooler.

On Mon, Oct 24, 2016 at 9:31 PM, Michael Tinsay <tinsami1@???> wrote:
> Hi!
>
> Yesterday one of our servers had this on the console:
>
> [ 1184.087973] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
> [ 1184.087973] mce: [Hardware Error]: TSC 3a3965b65c0 MISC 80000
> [ 1184.087973] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
> [ 1184.087973] mce: [Hardware Error]: Machine check: Processor context corrupt
>
> So I did some research and found out that I can use an app named mcelog to
> decode this.  This was the output from it:
>
> Hardware event. This is not a software error.
> CPU 0 BANK 8 TSC 3a3965b65c0
> MISC 80000
> TIME 1477301538 Mon Oct 24 17:32:18 2016
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> Processor context corrupt
> MCA: MEMORY CONTROLLER AC_CHANNEL2_ERR
> Transaction: Address/Command error
> Memory corrected error count (CORE_ERR_CNT): 0
> Memory transaction Tracker ID (RTId): 0
> Memory DIMM ID of error: 0
> Memory channel ID of error: 2
> Memory ECC syndrome: 0
> STATUS ba000000000000b2 MCGSTATUS 4
> CPUID Vendor Intel Family 6 Model 44
> SOCKET 0 APIC 0 microcode 2
> tinsaymc@IT-046641:~$  cat mce.txt
> CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
> TSC 3a3965b65c0 MISC 80000
> PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
>
> So my question now, for those who know more about this area than I, is:  Is
> the exception due to a problem in the CPU itself or somewhere on the
> motherboard?
>
> Regards.
>
>
> --- mike t.
>
> _________________________________________________
> Philippine Linux Users' Group (PLUG) Mailing List
> http://lists.linux.org.ph/mailman/listinfo/plug
> Searchable Archives: http://archives.free.net.ph


