Re: [plug] decoding further a Machine Check Exception

Author: fooler mail
Date:  
To: Michael Tinsay, Philippine Linux Users' Group (PLUG) Technical Discussion List
Subject: Re: [plug] decoding further a Machine Check Exception
OK, it was probably a short blip in your power supply (e.g. low voltage),
which can cause data corruption. Does the problem still persist?

fooler.

On Thu, Oct 27, 2016 at 3:45 AM, Michael Tinsay <tinsami1@???> wrote:
> Thanks fooler and Edwin,
>
> I ran memtest and memtester on the server for several days each, and
> neither found any problem with the installed memory modules.
>
>
> --- mike t.
>
> ________________________________
> From: fooler mail <fooler.mail@???>
> To: Michael Tinsay <tinsami1@???>; Philippine Linux Users' Group
> (PLUG) Technical Discussion List <plug@???>
> Sent: Wednesday, 26 October 2016, 9:05
> Subject: Re: [plug] decoding further a Machine Check Exception
>
> It looks like a memory error to me... can you remove the memory at
> bank 8 and see if that solves the problem?
>
> fooler.
>
> On Mon, Oct 24, 2016 at 9:31 PM, Michael Tinsay <tinsami1@???> wrote:
>> Hi!
>>
>> Yesterday one of our servers had this on the console:
>>
>> [ 1184.087973] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
>> [ 1184.087973] mce: [Hardware Error]: TSC 3a3965b65c0 MISC 80000
>> [ 1184.087973] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
>> [ 1184.087973] mce: [Hardware Error]: Machine check: Processor context corrupt
>>
>> So I did some research and found out that I can use an app named mcelog to
>> decode this. This was the output from it:
>>
>> Hardware event. This is not a software error.
>> CPU 0 BANK 8 TSC 3a3965b65c0
>> MISC 80000
>> TIME 1477301538 Mon Oct 24 17:32:18 2016
>> MCG status:MCIP
>> MCi status:
>> Uncorrected error
>> Error enabled
>> MCi_MISC register valid
>> Processor context corrupt
>> MCA: MEMORY CONTROLLER AC_CHANNEL2_ERR
>> Transaction: Address/Command error
>> Memory corrected error count (CORE_ERR_CNT): 0
>> Memory transaction Tracker ID (RTId): 0
>> Memory DIMM ID of error: 0
>> Memory channel ID of error: 2
>> Memory ECC syndrome: 0
>> STATUS ba000000000000b2 MCGSTATUS 4
>> CPUID Vendor Intel Family 6 Model 44
>> SOCKET 0 APIC 0 microcode 2
>> tinsaymc@IT-046641:~$ cat mce.txt
>> CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
>> TSC 3a3965b65c0 MISC 80000
>> PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
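The flags mcelog prints above can be cross-checked by hand from the raw STATUS value. Below is a minimal Python sketch of that decoding, assuming the IA32_MCi_STATUS bit layout and the memory-controller error-code pattern (000F 0000 1MMM CCCC) as described in the Intel SDM. The function and table names are made up for illustration and are not part of mcelog.

```python
# Hypothetical helper: decode a few fields of an IA32_MCi_STATUS value.
# Bit positions are assumed from the Intel SDM machine-check chapter.
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory-controller error code.
MEMORY_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status):
    """Return the set flags and, if applicable, the memory-controller fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # low 16 bits: MCA error code
    result = {"flags": flags, "mca_code": mca_code}

    # Memory-controller errors match 000F 0000 1MMM CCCC (F = filtering bit).
    if (mca_code & 0xEF80) == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        channel = mca_code & 0xF
        result["transaction"] = MEMORY_TRANSACTIONS.get(mmm, "reserved")
        # Channel value 0xF means "channel not specified".
        result["channel"] = None if channel == 0xF else channel
    return result


if __name__ == "__main__":
    decoded = decode_mci_status(0xBA000000000000B2)
    for flag in decoded["flags"]:
        print(flag)
    print("MCA code: 0x%04x" % decoded["mca_code"])
    print("Transaction:", decoded.get("transaction"))
    print("Channel:", decoded.get("channel"))
```

For the value in this report the sketch should agree with mcelog: uncorrected, enabled, MISC valid, processor context corrupt, and an address/command error on memory channel 2. Note that "Bank 8" here is the number of the machine-check register bank that logged the event, which is why the channel field is the more useful pointer to the hardware involved.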
>>
>> So my question now, for those who know more about this area than I do,
>> is: is the exception due to a problem in the CPU itself or somewhere on
>> the motherboard?
>>
>> Regards.
>>
>>
>> --- mike t.
>
>>
>> _________________________________________________
>> Philippine Linux Users' Group (PLUG) Mailing List
>> http://lists.linux.org.ph/mailman/listinfo/plug
>> Searchable Archives: http://archives.free.net.ph