[pve-devel] need help to debug random host freeze on multiple hosts

Alexandre DERUMIER aderumier at odiso.com
Mon Dec 29 20:05:35 CET 2014


>>I don't have any information about the microcode update, only a note from Dell support saying that it corrects 
>>instability on VMware. (So I don't know about KVM.) 

Here are the details of the microcode patch:

815 Processor May Read Partially Updated Branch Status Register

Description
Under a highly specific and detailed set of internal timing conditions, the processor may read an internal branch
status register (BSR) while the register is being updated resulting in an incorrect rIP.

Potential Effect on System
The incorrect rIP causes unpredictable program or system behavior, usually observed as a page fault.

Suggested Workaround
Contact your AMD representative for information on a BIOS update.

Fix Planned
No fix planned



I had another crash this afternoon, and this host had been at around 90% CPU usage since 12h. (But the load average was OK.)
So maybe higher CPU usage makes it more likely to hit this case.

I have updated the BIOS on this host; I'll wait and see whether it improves things or not.
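
To confirm that the BIOS update really loaded a new microcode revision, the revision reported by the kernel can be compared before and after the reboot. A minimal sketch in Python, assuming a Linux kernel that exposes a "microcode" field in /proc/cpuinfo (older kernels may not have it); the function name microcode_revisions is hypothetical and only for illustration:

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in
    /proc/cpuinfo-style text (one "microcode : <rev>" line per CPU)."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Assumption: the field is named "microcode", as on recent x86 kernels.
        match = re.match(r"^microcode\s*:\s*(\S+)", line)
        if match:
            revisions.add(match.group(1))
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = microcode_revisions(f.read())
    except OSError:
        found = set()
    # An empty set means the kernel does not report the field here.
    print(sorted(found))
```

Running this on each node before and after the BIOS update should show whether the revision changed; more than one value in the set would mean the cores are not all on the same revision.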



----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "datanom.net" <mir at datanom.net>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Monday, 29 December 2014 16:56:32
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts

>>Could this, given the high load, be caused by a race condition which is 
>>solved in the new microcode? 

I don't have any information about the microcode update, only a note from Dell support saying that it corrects 
instability on VMware. (So I don't know about KVM.) 


>>Have you tried connecting a serial console to one of the nodes? 
>> 
>>If you have IPMI on the nodes you should also be able to monitor 
>>further than on the default console. 

I'm going to set up serial console output over the Dell iDRAC. 


----- Original Message ----- 
From: "datanom.net" <mir at datanom.net> 
To: "pve-devel" <pve-devel at pve.proxmox.com> 
Cc: "aderumier" <aderumier at odiso.com> 
Sent: Monday, 29 December 2014 13:27:08 
Subject: Re: [pve-devel] need help to debug random host freeze on multiple hosts 

On Mon, 29 Dec 2014 07:31:32 +0100 (CET) 
Alexandre DERUMIER <aderumier at odiso.com> wrote: 

> 
> Yes, sure, I have nothing in the logs. 
> (That's why I thought of kdump, to try to get more info.) 
> 
> I really don't know whether it's a real software kernel panic or a hardware bug. 
> 
> I just saw some AMD microcode bug mentioned on the VMware forum, and saw that Dell provided a new BIOS update this month. 
> I'll try the update to see if it helps. 
> 
Could this, given the high load, be caused by a race condition which is 
solved in the new microcode? 

Have you tried connecting a serial console to one of the nodes? 

If you have IPMI on the nodes you should also be able to monitor 
further than on the default console. 

-- 
Hilsen/Regards 
Michael Rasmussen 