[pve-devel] Blacklisting HP hardware watchdog timer module ?

Alexandre DERUMIER aderumier at odiso.com
Thu Dec 3 18:33:54 CET 2015


Damn,

I can't force OpenManage to set the timer below 60s :(

#omconfig system recovery timer=10
Error! Recovery reset time must be between 60 and 720 seconds.

I'll try to see if we can disable it.

----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 3 December 2015 18:24:40
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?

I just found a strange bug with ipmi_watchdog, related to Dell OpenManage.

At boot, the timeout is correctly set to 10s:

root@kvmtest1 ~ # ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x44) 
Watchdog Timer Is: Started/Running 
Watchdog Timer Actions: Hard Reset (0x01) 
Pre-timeout interval: 0 seconds 
Timer Expiration Flags: 0x10 
Initial Countdown: 10 sec 
Present Countdown: 9 sec 


But after a few minutes (5-10 min), I see it at 480s:

# ipmitool mc watchdog get 
Watchdog Timer Use: SMS/OS (0xc4) 
Watchdog Timer Is: Started/Running 
Watchdog Timer Actions: No action (0x00) 
Pre-timeout interval: 0 seconds 
Timer Expiration Flags: 0x10 
Initial Countdown: 480 sec 
Present Countdown: 479 sec 


In Dell OpenManage, I see a reset configuration option set to 480s.

(I think it's the OpenManage service that overwrites the value.)

I'll add a note in the wiki about this too. 
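
A minimal sketch of how this overwrite could be spotted automatically, assuming the `ipmitool mc watchdog get` output format shown above. The function names and the expected values (10s countdown, "Hard Reset" action) are hypothetical and taken only from the boot-time output in this mail; this is not part of Proxmox or ipmitool.

```python
import re
import subprocess

# Expected values are assumptions taken from the boot-time output above.
EXPECTED_COUNTDOWN_S = 10
EXPECTED_ACTION = "Hard Reset"


def parse_watchdog(text):
    """Turn 'Key: value' lines from `ipmitool mc watchdog get` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def find_drift(text, countdown_s=EXPECTED_COUNTDOWN_S, action=EXPECTED_ACTION):
    """Return a list of differences between the output and the expected settings."""
    fields = parse_watchdog(text)
    problems = []

    # "Initial Countdown: 480 sec" -> 480
    match = re.match(r"(\d+)\s*sec", fields.get("Initial Countdown", ""))
    if match is None:
        problems.append("initial countdown not found")
    elif int(match.group(1)) != countdown_s:
        problems.append(
            f"initial countdown is {match.group(1)}s, expected {countdown_s}s"
        )

    # "Watchdog Timer Actions: No action (0x00)" -> "No action (0x00)"
    current_action = fields.get("Watchdog Timer Actions", "")
    if not current_action.startswith(action):
        problems.append(f"action is '{current_action}', expected '{action}'")
    return problems


def read_watchdog():
    """Run ipmitool locally (needs root and a BMC); only defined, never called here."""
    return subprocess.run(
        ["ipmitool", "mc", "watchdog", "get"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Calling `find_drift(read_watchdog())` periodically (for example from cron) would return an empty list while the settings still match, and one entry per changed setting once something has rewritten them.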


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:48:14
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?

>>The timeout must be 60 seconds!! Never change that. 
>> 
>>We set the timeout to 60s when we start watchdog-mux. 
Ah, OK. I thought we needed to define it manually.

What is the difference between these two timeouts?

+ int watchdog_timeout = 10; 
+ int client_watchdog_timeout = 60; 


ipmitool gives me 10s, so it seems to work fine :)
# ipmitool mc watchdog get 
Initial Countdown: 10 sec 




> Another question: I did some tests two weeks ago with a customer,
> and I think I had a problem when the node rebooted too fast
> (pve-ha-manager sees the node as down, but it comes back up before the VMs
> have been migrated).
> Is this a known bug?

>>What bug exactly? 
I don't remember exactly, but the LRM or CRM was stuck because the node (and its VMs) had rebooted too fast.

I don't have access to the customer's logs, sorry.



----- Original Message -----
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:28:55
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?

> BTW, what is the best timeout for the watchdog?
> I think pve-ha-manager waits around 1 min before migrating the VMs?
> If so, should the watchdog timeout be lower?

The timeout must be 60 seconds!! Never change that. 

We set the timeout to 60s when we start watchdog-mux. 

> Another question: I did some tests two weeks ago with a customer,
> and I think I had a problem when the node rebooted too fast
> (pve-ha-manager sees the node as down, but it comes back up before the VMs
> have been migrated).
> Is this a known bug?

What bug exactly? 
_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
