[pve-devel] [PATCH ha-manager 2/3] fix #1378: allow to specify a service shutdown policy
Thomas Lamprecht
thomas at lamprecht.org
Fri Dec 21 08:30:58 CET 2018
yes, i really missed some comments from you, sorry!Addressing them here.
Am 12/19/2018 um 01:10 PM schrieb Fabian Grünbichler:
> On Wed, Dec 19, 2018 at 11:39:40AM +0100, Thomas Lamprecht wrote:
>> Allow an admin to set a datacenter wide HA policy which can change
>> the way we handle services on a node shutdown.
>>
>> There's:
>>
>> * freeze: always freeze servivces, independent of the shutdown type
>> (reboot, poweroff)
>> * failover: never freeze services, this means that a service will get
>> recovered to another node if possible and if the current node does
>> not comes back up in the grace period of 1 minute.
> s/comes/come
>
> why 1 minute here, and 2 minutes in the other patch? ;)
>
>> * default: this is the current behavior, freeze on reboot but do not
>> freeze on poweroff
>>
>> Add to tests, shutdown-policy1 which is based of the reboot1 test,
>> but enforces no freeze with a failover policy, and shutdown-policy2
>> which is based on the shutdown1 test but with a explicit freeze
>> policy. You can compare (diff) each tests log result to the test it's
>> based on to see what changes.
>>
>> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
>> ---
>> src/PVE/HA/LRM.pm | 22 +++++++++-
>> src/test/test-shutdown-policy1/cmdlist | 4 ++
>> src/test/test-shutdown-policy1/datacenter.cfg | 5 +++
>> .../test-shutdown-policy1/hardware_status | 5 +++
>> src/test/test-shutdown-policy1/log.expect | 40 +++++++++++++++++++
>> src/test/test-shutdown-policy1/manager_status | 1 +
>> src/test/test-shutdown-policy1/service_config | 3 ++
>> src/test/test-shutdown-policy2/cmdlist | 4 ++
>> src/test/test-shutdown-policy2/datacenter.cfg | 5 +++
>> .../test-shutdown-policy2/hardware_status | 5 +++
>> src/test/test-shutdown-policy2/log.expect | 34 ++++++++++++++++
>> src/test/test-shutdown-policy2/manager_status | 1 +
>> src/test/test-shutdown-policy2/service_config | 3 ++
>> 13 files changed, 130 insertions(+), 2 deletions(-)
>> create mode 100644 src/test/test-shutdown-policy1/cmdlist
>> create mode 100644 src/test/test-shutdown-policy1/datacenter.cfg
>> create mode 100644 src/test/test-shutdown-policy1/hardware_status
>> create mode 100644 src/test/test-shutdown-policy1/log.expect
>> create mode 100644 src/test/test-shutdown-policy1/manager_status
>> create mode 100644 src/test/test-shutdown-policy1/service_config
>> create mode 100644 src/test/test-shutdown-policy2/cmdlist
>> create mode 100644 src/test/test-shutdown-policy2/datacenter.cfg
>> create mode 100644 src/test/test-shutdown-policy2/hardware_status
>> create mode 100644 src/test/test-shutdown-policy2/log.expect
>> create mode 100644 src/test/test-shutdown-policy2/manager_status
>> create mode 100644 src/test/test-shutdown-policy2/service_config
>>
>> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
>> index dda82eb..fc560fa 100644
>> --- a/src/PVE/HA/LRM.pm
>> +++ b/src/PVE/HA/LRM.pm
>> @@ -53,6 +53,20 @@ sub shutdown_request {
>>
>> my ($shutdown, $reboot) = $haenv->is_node_shutdown();
>>
>> + my $dc_ha_cfg = $haenv->get_ha_settings();
>> + my $shutdown_policy = $dc_ha_cfg->{shutdown_policy} // 'default';
>> +
>> + my $freeze_all = $reboot;
>> + if ($shutdown_policy eq 'default') {
>> + $freeze_all = $reboot;
>> + } elsif ($shutdown_policy eq 'freeze') {
>> + $freeze_all = 1;
>> + } elsif ($shutdown_policy eq 'failover') {
>> + $freeze_all = 0;
>> + } else {
>> + $haenv->log('err', "unkown shutdown policy '$shutdown_policy', fall back to default");
>> + }
> s/unkown/unknown/
>
> this is a bit weird to read IMHO. how about
>
> my $freeze_all;
> if ($shutdown_policy eq 'freeze') {
> freeze_all = 1;
> } elsif ($shutdown_policy eq 'failover') {
> freeze_all = 0;
> } else {
> env->log('err', "unknown shutdown policy '$shutdown_policy', fall back to default")
> if $shutdown_policy ne 'default';
> freeze_all = $reboot;
> }
Hmm, I do not like the post if in the else, for me yours is harder to read.
But i could let $freeze_all stay undefined on declaration and explicitly
set it to
$reboot in the else branch. makes it a bit clearer but does not
introduces a post if
in a strict "switch" like if/elsif construct.
>> +
>> if ($shutdown) {
>> # *always* queue stop jobs for all services if the node shuts down,
>> # independent if it's a reboot or a poweroff, else we may corrupt
>> @@ -69,8 +83,12 @@ sub shutdown_request {
>> }
>>
>> if ($shutdown) {
>> - if ($reboot) {
>> - $haenv->log('info', "reboot LRM, stop and freeze all services");
>> + if ($freeze_all) {
>> + if ($shutdown_policy eq 'default') {
> wouldn't $reboot be more appropriate here instead of matching on the
> policy?
It's really the same here, but yeah $reboot could make it easier to read
for an outsider.
> alternatively, we could just log shutdown vs. reboot and policy
> separately (the latter is patch #3 anyway), and drop this combined
> output? or just log shutdown type + policy?
This is already the case? And you wouldn't know what's happening in the
"conditional"
case with your proposal...With this we have the distinction in that case
too, we keep the
regression test clean, do not change logs just for the sake of it and it
wouldn't saves much.
I think about it, but I currently lean to keep it as proposed.
>> + $haenv->log('info', "reboot LRM, stop and freeze all services");
>> + } else {
>> + $haenv->log('info', "shutdown LRM, stop and freeze all services");
>> + }
>> $self->{mode} = 'restart';
>> } else {
>> $haenv->log('info', "shutdown LRM, stop all services");
> isn't this missing the second case as well? i.e., reboot + failover/stop?
Hmm, we never differed here, this was implicit, I would try to keep it
that way
at least in this patch.Also, if the node comes up fast again (e.g. kexec
or just
fast hardware with non-bloat FW) you do not failover, so implying this
is possible
wrong. That's the reason we had it this way. Doing the
"reboot"/"shutdown" could be
done, I could set a shutdown_type message above and reuse it here and in
above's
branch eliminating the if/else for haenv->log completely.
As the comments are just cosmetical issues and do not change behavior at
all I try
to assemble a v3 soon.
>> diff --git a/src/test/test-shutdown-policy1/cmdlist b/src/test/test-shutdown-policy1/cmdlist
>> new file mode 100644
>> index 0000000..8558351
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/cmdlist
>> @@ -0,0 +1,4 @@
>> +[
>> + [ "power node1 on", "power node2 on", "power node3 on"],
>> + [ "reboot node3" ]
>> +]
>> diff --git a/src/test/test-shutdown-policy1/datacenter.cfg b/src/test/test-shutdown-policy1/datacenter.cfg
>> new file mode 100644
>> index 0000000..6108ece
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/datacenter.cfg
>> @@ -0,0 +1,5 @@
>> +{
>> + "ha": {
>> + "shutdown_policy": "failover"
>> + }
>> +}
>> diff --git a/src/test/test-shutdown-policy1/hardware_status b/src/test/test-shutdown-policy1/hardware_status
>> new file mode 100644
>> index 0000000..451beb1
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/hardware_status
>> @@ -0,0 +1,5 @@
>> +{
>> + "node1": { "power": "off", "network": "off" },
>> + "node2": { "power": "off", "network": "off" },
>> + "node3": { "power": "off", "network": "off" }
>> +}
>> diff --git a/src/test/test-shutdown-policy1/log.expect b/src/test/test-shutdown-policy1/log.expect
>> new file mode 100644
>> index 0000000..385b07a
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/log.expect
>> @@ -0,0 +1,40 @@
>> +info 0 hardware: starting simulation
>> +info 20 cmdlist: execute power node1 on
>> +info 20 node1/crm: status change startup => wait_for_quorum
>> +info 20 node1/lrm: status change startup => wait_for_agent_lock
>> +info 20 cmdlist: execute power node2 on
>> +info 20 node2/crm: status change startup => wait_for_quorum
>> +info 20 node2/lrm: status change startup => wait_for_agent_lock
>> +info 20 cmdlist: execute power node3 on
>> +info 20 node3/crm: status change startup => wait_for_quorum
>> +info 20 node3/lrm: status change startup => wait_for_agent_lock
>> +info 20 node1/crm: got lock 'ha_manager_lock'
>> +info 20 node1/crm: status change wait_for_quorum => master
>> +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: adding new service 'vm:103' on node 'node3'
>> +info 22 node2/crm: status change wait_for_quorum => slave
>> +info 24 node3/crm: status change wait_for_quorum => slave
>> +info 25 node3/lrm: got lock 'ha_agent_node3_lock'
>> +info 25 node3/lrm: status change wait_for_agent_lock => active
>> +info 25 node3/lrm: starting service vm:103
>> +info 25 node3/lrm: service status vm:103 started
>> +info 120 cmdlist: execute reboot node3
>> +info 120 node3/lrm: shutdown LRM, stop all services
>> +info 125 node3/lrm: stopping service vm:103
>> +info 125 node3/lrm: service status vm:103 stopped
>> +info 126 node3/lrm: exit (loop end)
>> +info 126 reboot: execute crm node3 stop
>> +info 125 node3/crm: server received shutdown request
>> +info 145 node3/crm: exit (loop end)
>> +info 145 reboot: execute power node3 off
>> +info 145 reboot: execute power node3 on
>> +info 145 node3/crm: status change startup => wait_for_quorum
>> +info 140 node3/lrm: status change startup => wait_for_agent_lock
>> +info 145 node3/lrm: got lock 'ha_agent_node3_lock'
>> +info 145 node3/lrm: status change wait_for_agent_lock => active
>> +info 145 node3/lrm: starting service vm:103
>> +info 145 node3/lrm: service status vm:103 started
>> +info 164 node3/crm: status change wait_for_quorum => slave
>> +info 720 hardware: exit simulation - done
>> diff --git a/src/test/test-shutdown-policy1/manager_status b/src/test/test-shutdown-policy1/manager_status
>> new file mode 100644
>> index 0000000..0967ef4
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/manager_status
>> @@ -0,0 +1 @@
>> +{}
>> diff --git a/src/test/test-shutdown-policy1/service_config b/src/test/test-shutdown-policy1/service_config
>> new file mode 100644
>> index 0000000..c6860e7
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy1/service_config
>> @@ -0,0 +1,3 @@
>> +{
>> + "vm:103": { "node": "node3", "state": "enabled" }
>> +}
>> diff --git a/src/test/test-shutdown-policy2/cmdlist b/src/test/test-shutdown-policy2/cmdlist
>> new file mode 100644
>> index 0000000..a86b9e2
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/cmdlist
>> @@ -0,0 +1,4 @@
>> +[
>> + [ "power node1 on", "power node2 on", "power node3 on"],
>> + [ "shutdown node3" ]
>> +]
>> diff --git a/src/test/test-shutdown-policy2/datacenter.cfg b/src/test/test-shutdown-policy2/datacenter.cfg
>> new file mode 100644
>> index 0000000..0d411c1
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/datacenter.cfg
>> @@ -0,0 +1,5 @@
>> +{
>> + "ha": {
>> + "shutdown_policy": "freeze"
>> + }
>> +}
>> diff --git a/src/test/test-shutdown-policy2/hardware_status b/src/test/test-shutdown-policy2/hardware_status
>> new file mode 100644
>> index 0000000..451beb1
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/hardware_status
>> @@ -0,0 +1,5 @@
>> +{
>> + "node1": { "power": "off", "network": "off" },
>> + "node2": { "power": "off", "network": "off" },
>> + "node3": { "power": "off", "network": "off" }
>> +}
>> diff --git a/src/test/test-shutdown-policy2/log.expect b/src/test/test-shutdown-policy2/log.expect
>> new file mode 100644
>> index 0000000..a36c628
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/log.expect
>> @@ -0,0 +1,34 @@
>> +info 0 hardware: starting simulation
>> +info 20 cmdlist: execute power node1 on
>> +info 20 node1/crm: status change startup => wait_for_quorum
>> +info 20 node1/lrm: status change startup => wait_for_agent_lock
>> +info 20 cmdlist: execute power node2 on
>> +info 20 node2/crm: status change startup => wait_for_quorum
>> +info 20 node2/lrm: status change startup => wait_for_agent_lock
>> +info 20 cmdlist: execute power node3 on
>> +info 20 node3/crm: status change startup => wait_for_quorum
>> +info 20 node3/lrm: status change startup => wait_for_agent_lock
>> +info 20 node1/crm: got lock 'ha_manager_lock'
>> +info 20 node1/crm: status change wait_for_quorum => master
>> +info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
>> +info 20 node1/crm: adding new service 'vm:103' on node 'node3'
>> +info 22 node2/crm: status change wait_for_quorum => slave
>> +info 24 node3/crm: status change wait_for_quorum => slave
>> +info 25 node3/lrm: got lock 'ha_agent_node3_lock'
>> +info 25 node3/lrm: status change wait_for_agent_lock => active
>> +info 25 node3/lrm: starting service vm:103
>> +info 25 node3/lrm: service status vm:103 started
>> +info 120 cmdlist: execute shutdown node3
>> +info 120 node3/lrm: shutdown LRM, stop and freeze all services
>> +info 120 node1/crm: service 'vm:103': state changed from 'started' to 'freeze'
>> +info 125 node3/lrm: stopping service vm:103
>> +info 125 node3/lrm: service status vm:103 stopped
>> +info 126 node3/lrm: exit (loop end)
>> +info 126 shutdown: execute crm node3 stop
>> +info 125 node3/crm: server received shutdown request
>> +info 145 node3/crm: exit (loop end)
>> +info 145 shutdown: execute power node3 off
>> +info 160 node1/crm: node 'node3': state changed from 'online' => 'unknown'
>> +info 720 hardware: exit simulation - done
>> diff --git a/src/test/test-shutdown-policy2/manager_status b/src/test/test-shutdown-policy2/manager_status
>> new file mode 100644
>> index 0000000..0967ef4
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/manager_status
>> @@ -0,0 +1 @@
>> +{}
>> diff --git a/src/test/test-shutdown-policy2/service_config b/src/test/test-shutdown-policy2/service_config
>> new file mode 100644
>> index 0000000..c6860e7
>> --- /dev/null
>> +++ b/src/test/test-shutdown-policy2/service_config
>> @@ -0,0 +1,3 @@
>> +{
>> + "vm:103": { "node": "node3", "state": "enabled" }
>> +}
>> --
>> 2.19.2
More information about the pve-devel
mailing list