[pve-devel] need help to debug random host freeze on multiple hosts
Alexandre DERUMIER
aderumier at odiso.com
Mon Dec 29 20:05:35 CET 2014
>>I don't have any information about the microcode update, only a note from Dell support saying that it corrects
>>instability on VMware. (So I don't know about KVM.)
Here are the details of the microcode patch:
815 Processor May Read Partially Updated Branch Status Register
Description
Under a highly specific and detailed set of internal timing conditions, the processor may read an internal branch
status register (BSR) while the register is being updated resulting in an incorrect rIP.
Potential Effect on System
The incorrect rIP causes unpredictable program or system behavior, usually observed as a page fault.
Suggested Workaround
Contact your AMD representative for information on a BIOS update.
Fix Planned
No fix planned
I had another crash this afternoon, and this host had been at around 90% CPU usage since 12h (but the load average was OK).
So maybe higher CPU usage gives more chance of hitting this condition.
I have patched the BIOS on this host; I'll wait and see whether it improves things or not.
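To check that the BIOS update really loaded a new microcode, the revision reported by the kernel can be compared before and after the flash. Below is a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo exposes a "microcode" field. The function names are only illustrative, and I don't know which revision the Dell BIOS ships.

```python
from collections import Counter


def microcode_revisions(cpuinfo_text):
    """Count logical CPUs per microcode revision in /proc/cpuinfo text."""
    revisions = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Relevant lines look like "microcode : <hex revision>"; skip the rest.
        if sep and key.strip() == "microcode":
            revisions[value.strip()] += 1
    return revisions


def read_host_revisions(path="/proc/cpuinfo"):
    """Read the local host's revisions (assumption: Linux on x86)."""
    with open(path) as f:
        return microcode_revisions(f.read())
```

Running read_host_revisions() before and after the reboot and comparing the two results should show whether the revision changed. More than one revision in the result would suggest the update did not reach every core.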
----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "datanom.net" <mir at datanom.net>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Monday, 29 December 2014 16:56:32
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts
>>Could this, given the high load, be caused by a race condition which is
>>solved in the new microcode?
I don't have any information about the microcode update, only a note from Dell support saying that it corrects
instability on VMware. (So I don't know about KVM.)
>>Have you tried connecting a serial console to one of the nodes?
>>
>>If you have IPMI on the nodes you should also be able to monitor
>>further than on the default console.
I'm going to set up serial output over the Dell iDRAC.
----- Original message -----
From: "datanom.net" <mir at datanom.net>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Cc: "aderumier" <aderumier at odiso.com>
Sent: Monday, 29 December 2014 13:27:08
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts
On Mon, 29 Dec 2014 07:31:32 +0100 (CET)
Alexandre DERUMIER <aderumier at odiso.com> wrote:
>
> Yes, sure, I have nothing in the logs.
> (That's why I thought of kdump, to try to get more info.)
>
> I really don't know whether it's a real software kernel panic or a hardware bug.
>
> I just saw a report of an AMD microcode bug on the VMware forum, and saw that Dell released a new BIOS update this month.
> I'll try the update to see if it helps.
>
Could this, given the high load, be caused by a race condition which is
solved in the new microcode?
Have you tried connecting a serial console to one of the nodes?
If you have IPMI on the nodes you should also be able to monitor
further than on the default console.
--
Hilsen/Regards
Michael Rasmussen