[pve-devel] HA: vm shutdown/stop behaviour and other HA questions

Thu Mar 7 08:07:28 CET 2019

On 3/6/19 4:33 PM, Alexandre DERUMIER wrote:
>> Is it possible to implement a true "vm stop" without shutdown?
> 
>>> It could be, but do you need this often? We assumed that HA VMs/CTs should
>>> more often be shut down gracefully.
> 
> It's more for cases like a kernel panic, filesystem hang, or unresponsive VM, where the shutdown will take several minutes.

In some of those cases the watchdog will trigger anyway, making a reboot "unnecessary";
for a hanging VM it can really make sense, yes.

> 
>> Is it possible to reduce this sleep for manual actions? (Not sure if it's related to the watchdog?)
> 
> 
>>> Theoretically yes, but we then probably want to do nothing if there's no change
>>> (e.g., no new CRM command), as some users already complained about writing out
>>> the LRM status every 10 seconds (they thought it was bad for the pmxcfs DB backing
>>> storage, but IMHO this really shouldn't matter much for current-gen hardware, which is
>>> able to write >100GB per day and still achieve lifespans of >5 years).
> 
> Isn't it possible to keep the default 10s and, when a manual action is done,
> talk to the CRM/LRM (socket, API, ...) to execute these commands quickly?

A socket won't work: you need to propagate the command and then the state over
the whole cluster, and the current HA master can be anywhere.

The easiest way could be an inotify-based watch on the CRM command queue, so that you
reduce the worst-case time to that of the next LRM cycle, ~10 seconds.
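
As a rough illustration of that idea: the sketch below is not PVE code, it
only shows a loop that keeps its 10 second cycle but wakes up early when the
command queue changes. A threading.Event stands in for the inotify
notification, and the CommandLoop class, its methods and the command string
are all hypothetical names made up for this example.

```python
import threading
import time


class CommandLoop:
    """Hypothetical stand-in for a CRM-style loop; not the actual PVE code."""

    def __init__(self, cycle_seconds: float):
        self.cycle_seconds = cycle_seconds
        self.queue = []
        self.lock = threading.Lock()
        # Plays the role of the inotify notification on the command queue.
        self.changed = threading.Event()
        # (command, seconds it waited in the queue) pairs, for inspection.
        self.processed = []

    def submit(self, command: str) -> None:
        with self.lock:
            self.queue.append((command, time.monotonic()))
        # Wake the loop instead of letting it finish its full sleep.
        self.changed.set()

    def run_once(self) -> None:
        # Sleep for at most one cycle, but return early if the queue changed.
        self.changed.wait(timeout=self.cycle_seconds)
        self.changed.clear()
        with self.lock:
            pending, self.queue = self.queue, []
        now = time.monotonic()
        for command, queued_at in pending:
            self.processed.append((command, now - queued_at))


# The loop would normally sleep 10 s; the submit below wakes it right away.
loop = CommandLoop(cycle_seconds=10)
worker = threading.Thread(target=loop.run_once)
worker.start()
time.sleep(0.1)
loop.submit("migrate vm:100 node2")  # illustrative command string
worker.join()
```

With this pattern the idle behaviour stays the same (one wake-up per cycle),
and only the wait for the master to pick up a new command goes away; the
wait for the next LRM cycle remains.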

> 
> (I'm just learning the code, so I really don't know if it would be possible.)

It's a bit different from the rest of PVE, but once you're into it, it isn't too hard.
But changes have a completely different and bigger potential for breakage than
with other components. If you want to experiment, a virtual cluster, maybe
even with the "soft_noboot" option set on the softdog module, may make things easier.

I also tried to write up about the design here:
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_how_it_works

It's not perfect, but it could be a start when trying to understand the internals.
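
For orientation, the node selection discussed in the quoted mail below
(try nodes by priority, then by node id, and move on to the next candidate
if one fails) can be pictured roughly like this. It is a simplified sketch
based only on that description, not a port of the Perl code in Manager.pm;
pick_node and its parameters are hypothetical names.

```python
def pick_node(priorities, online, tried=()):
    """Pick a target node for a service.

    priorities: dict mapping node name -> priority (higher is preferred)
    online:     set of node names that are currently usable
    tried:      nodes that already failed and should be skipped
    Returns the chosen node name, or None if no candidate is left.
    """
    candidates = [
        node for node in priorities
        if node in online and node not in tried
    ]
    if not candidates:
        return None
    # Highest priority first, then node name as a deterministic tie-break.
    candidates.sort(key=lambda node: (-priorities[node], node))
    return candidates[0]


priorities = {"node1": 1, "node2": 2, "node3": 2}
online = {"node1", "node2", "node3"}

first_choice = pick_node(priorities, online)
# Assume starting on the first choice failed, so try the next candidate.
second_choice = pick_node(priorities, online, tried=(first_choice,))
```

A balancing scheduler would mostly change the sort key here, for example by
adding a load metric between priority and node name.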

> 
> 
> ----- Original Message -----
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>, "aderumier" <aderumier at odiso.com>
> Sent: Wednesday, 6 March 2019 08:21:06
> Subject: Re: [pve-devel] HA: vm shutdown/stop behaviour and other HA questions
> 
> Hi! 
> 
> On 3/6/19 7:59 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>>
>> I'm finally going to use HA on my cluster once Proxmox 6.0 is released (waiting for corosync 3.X).
> 
> great. 
> 
>>
>> I have noticed that both shutdown and stop on a VM call "HA stop", which calls "vm shutdown" and then stops HA.
>>
>>
>> Is it possible to implement a true "vm stop" without shutdown?
> 
> It could be, but do you need this often? We assumed that HA VMs/CTs should
> more often be shut down gracefully.
> 
>>
>> Also, I have noticed that when we start/stop/migrate a VM manually, it can take 10-20 seconds between the HA action
>> and the real VM action. (It seems to come from the 10s sleep in the CRM + LRM between each loop.)
> 
> Yes, exactly: worst case, it takes 10 seconds until the current master (CRM) picks
> up the migrate/relocate command and, after that, worst case an additional 10 seconds
> until the LRM sees the new state, coming in at ~20 seconds in the double worst case.
> 
>> Is it possible to reduce this sleep for manual actions? (Not sure if it's related to the watchdog?)
> 
> 
> Theoretically yes, but we then probably want to do nothing if there's no change
> (e.g., no new CRM command), as some users already complained about writing out
> the LRM status every 10 seconds (they thought it was bad for the pmxcfs DB backing
> storage, but IMHO this really shouldn't matter much for current-gen hardware, which is
> able to write >100GB per day and still achieve lifespans of >5 years).
> 
> 
>>
>>
>>
>> In the future, I would like to add some kind of balancing/scheduling of VMs (memory/CPU balancing);
>> I think this is the right place to do it?
>>
>> I have looked at sub select_service_node in /usr/share/perl5/PVE/HA/Manager.pm;
>> it seems pretty basic for now (try nodes by priority and by node ID, try_next if it's failing).
>> I think there is a lot of room for improvement here.
> 
> Yes, this was somewhat prepared for that use case; we just never got around to actually doing it.
> 