[pve-devel] training : watchdog not working on 1 server

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Feb 4 12:13:56 CET 2016



On 02/04/2016 11:58 AM, Alexandre DERUMIER wrote:
>>> If it runs fine on the other two with the same hardware, it smells strongly
>>> like a possible hardware bug/defective hardware (or firmware)?
>>>
>>> The countdown is probably just the default value; as the timer is not
>>> active and has no action configured, it can be dismissed, IMO.
> But this should work with softdog, right?

ipmitool? No, AFAIK it has no built-in function to read out the 
softdog status. In fact, if I run it on a machine with no hardware watchdog 
present (only softdog), I get:
> Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: 
> No such file or directory
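Before running ipmitool, one can check whether any of the device nodes named in that error exist at all. The sketch below does only that; the helper name `find_ipmi_device` is hypothetical, and the probe order simply follows the error message. Note that a missing node does not prove there is no BMC: the nodes are typically created only once the kernel IPMI drivers (e.g. `ipmi_si` and `ipmi_devintf`) are loaded.

```python
import os
from typing import Iterable, Optional

# Device nodes listed in the ipmitool error message; probe order is an assumption.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(candidates: Iterable[str] = IPMI_DEVICE_PATHS) -> Optional[str]:
    """Return the first candidate path that exists, or None if none does."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    device = find_ipmi_device()
    if device is None:
        print("no IPMI device node found (softdog-only machine, or IPMI modules not loaded?)")
    else:
        print(f"IPMI device node: {device}")
```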

Judging from the watchdog-mux output, softdog was loaded successfully 
and will (should) also trigger if it is not reset regularly.
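"Reset regularly" refers to the generic Linux watchdog device API: any write to the watchdog device counts as a keep-alive, and writing the magic character 'V' just before closing asks the driver to disarm the timer (ignored if the driver uses `nowayout`). The following is a minimal, hypothetical sketch of that pattern, not the watchdog-mux implementation. Caution: opening the real `/dev/watchdog` arms the timer, and the device can be opened by only one process at a time, so this must not be run on a node where watchdog-mux already holds it.

```python
import time


def keep_alive(device: str, interval: float, count: int, magic_close: bool = True) -> None:
    """Hypothetical helper: feed a Linux watchdog device `count` times.

    WARNING: pointing this at a real watchdog device arms the timer; if the
    writes stop for longer than the driver's timeout and the magic close is
    not honoured, the machine will be reset.
    """
    # buffering=0 so every single write reaches the driver immediately.
    with open(device, "wb", buffering=0) as dog:
        for i in range(count):
            dog.write(b"\0")  # any write is treated as a keep-alive ping
            if i + 1 < count:
                time.sleep(interval)
        if magic_close:
            # 'V' before close requests a disarm (magic close); drivers
            # loaded with nowayout keep the timer running anyway.
            dog.write(b"V")
```

Passing a regular file as `device` is a safe way to see exactly which bytes would be sent to the driver.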

So the problem is that IPMI gets recognized but the hardware watchdog 
doesn't. As the other systems are working, I still suspect faulty 
hardware/firmware on this one.
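The diagnosis above ("IPMI answers, but the kernel only got softdog") can be checked mechanically from the two outputs quoted in this thread: `ipmitool mc watchdog get` and the watchdog-mux journal lines. The sketch below is a hypothetical heuristic, not a Proxmox tool; it assumes that softdog reports the identity "Software Watchdog", as it does in the log quoted further down, and that ipmitool prints one `Field: value` pair per line.

```python
import re
from typing import Dict, Optional

# Identity string softdog reported in the watchdog-mux log in this thread.
SOFTDOG_IDENTITY = "Software Watchdog"


def parse_ipmitool_watchdog(text: str) -> Dict[str, str]:
    """Split `ipmitool mc watchdog get` output into a {field: value} dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def mux_driver_identity(journal: str) -> Optional[str]:
    """Extract X from a watchdog-mux "Watchdog driver 'X', version N" line."""
    match = re.search(r"Watchdog driver '([^']*)'", journal)
    return match.group(1) if match else None


def diagnose(ipmitool_output: str, mux_journal: str) -> str:
    """Heuristic summary of which watchdog is actually in use."""
    bmc = parse_ipmitool_watchdog(ipmitool_output)
    driver = mux_driver_identity(mux_journal)
    if driver is None:
        return "unknown: no driver line in the watchdog-mux log"
    if driver != SOFTDOG_IDENTITY:
        return f"hardware watchdog in use: {driver}"
    if bmc.get("Watchdog Timer Is") == "Stopped":
        return "IPMI BMC answers, but its watchdog is stopped and watchdog-mux fell back to softdog"
    return "watchdog-mux is using softdog"
```

Fed with the ipmitool output and journal lines from this thread, `diagnose` reports the "BMC answers but fell back to softdog" case.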


>
> ----- Original Message -----
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, 4 February 2016 11:30:08
> Subject: Re: [pve-devel] training : watchdog not working on 1 server
>
> On 02/04/2016 11:07 AM, Alexandre DERUMIER wrote:
>>>> Looks OK to me. It seems there is no HA-enabled VM on this node? That
>>>> would explain why the watchdog does not trigger.
>> The problem is not that the watchdog does not trigger, it's that the
>> watchdog timer is stopped (and with a strange countdown of 15s).
>>
>>
>> # ipmitool mc watchdog get
>> Watchdog Timer Use: Reserved (0x00)
>> Watchdog Timer Is: Stopped
>> Watchdog Timer Actions: No action (0x00)
>> Pre-timeout interval: 1 seconds
>> Timer Expiration Flags: 0x00
>> Initial Countdown: 15 sec
>> Present Countdown: 15 sec
>>
>>
>> I don't have any other external software installed that could stop or change the timer (like OpenManage).
> If it runs fine on the other two with the same hardware, it smells strongly
> like a possible hardware bug/defective hardware (or firmware)?
>
> The countdown is probably just the default value; as the timer is not
> active and has no action configured, it can be dismissed, IMO.
>
> The kernel simply doesn't find/see the dog. Did it work with PVE 3.x or
> some other system?
>
>
>>
>> ----- Original Message -----
>> From: "dietmar" <dietmar at proxmox.com>
>> To: "aderumier" <aderumier at odiso.com>
>> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
>> Sent: Thursday, 4 February 2016 09:34:31
>> Subject: Re: [pve-devel] training : watchdog not working on 1 server
>>
>>>>> What is the output of:
>>> # systemctl status watchdog-mux.service
>>>
>>>
>>> ● watchdog-mux.service - Proxmox VE watchdog multiplexer
>>> Loaded: loaded (/lib/systemd/system/watchdog-mux.service; static)
>>> Active: active (running) since Thu 2016-02-04 09:09:26 CET; 1min 38s ago
>>> Main PID: 2808 (watchdog-mux)
>>> CGroup: /system.slice/watchdog-mux.service
>>> └─2808 /usr/sbin/watchdog-mux
>>>
>>> Feb 04 09:09:26 kvmformation2 watchdog-mux[2808]: Loading watchdog module 'softdog'
>>> Feb 04 09:09:26 kvmformation2 watchdog-mux[2808]: Watchdog driver 'Software Watchdog', version 0
>> Looks OK to me. It seems there is no HA-enabled VM on this node? That
>> would explain why the watchdog does not trigger.
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>