[pve-devel] [PATCH ha-manager] always queue service stop if node shuts down

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri May 26 16:49:49 CEST 2017


small comment inside

On Fri, May 26, 2017 at 03:45:38PM +0200, Thomas Lamprecht wrote:
> Commit 61ae38eb6fc5ab351fb61f2323776819e20538b7, which ensured that
> services get frozen on a node reboot, had a side effect where running
> services did not get gracefully shut down on node reboot.
> This may lead to data loss as the services then get hard killed, or
> they may even prevent a node reboot because a storage cannot get
> unmounted while a service still accesses it.
> 
> This commit addresses the issue but does not change the behavior of
> the freeze logic for now; we should still evaluate whether a freeze
> really makes sense here, or at least make it configurable.
> 
> The changed regression test is a result of the fact that we did not
> adopt the correct behavior for the is_node_shutdown command in the
> problematic commit. The simulation environment returned true every
> time a node shut down (reboot and poweroff), while the real world
> environment only returned true on a poweroff, but not on a reboot.
> 
> Now the simulation acts the same way as the real environment.
> Further, I moved the simulation implementation to the base class so
> that both the simulator and the regression test system behave the same.
> 
> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> ---
> 
> Important changes happen in LRM and Env/PVE2; the rest is "noise" from the
> regression/simulation framework.
> 
> Please review as a candidate for PVE 4.4 and PVE 5.
> 
>  src/PVE/HA/Env/PVE2.pm           |  6 ++++--
>  src/PVE/HA/LRM.pm                | 14 ++++++++------
>  src/PVE/HA/NodeStatus.pm         |  6 +++++-
>  src/PVE/HA/Sim/Env.pm            | 20 +++++++++++++++++++-
>  src/PVE/HA/Sim/TestEnv.pm        | 11 -----------
>  src/test/test-reboot1/log.expect | 12 +++++++-----
>  6 files changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
> index 382b61c..2d8f219 100644
> --- a/src/PVE/HA/Env/PVE2.pm
> +++ b/src/PVE/HA/Env/PVE2.pm
> @@ -85,18 +85,21 @@ sub is_node_shutdown {
>      my ($self) = @_;
>  
>      my $shutdown = 0;
> +    my $reboot = 0;
>  
>      my $code = sub {
>  	my $line = shift;
>  
>  	# ensure we match the full unit name by matching /^JOB_ID UNIT /
> -	$shutdown = 1 if ($line =~ m/^\d+\s+(poweroff|halt)\.target\s+/);
> +	# see: man systemd.special
> +	$shutdown = 1 if ($line =~ m/^\d+\s+shutdown\.target\s+/);
> +	$reboot = 1 if ($line =~ m/^\d+\s+reboot\.target\s+/);
>      };
>  
>      my $cmd = ['/bin/systemctl', '--full', 'list-jobs'];
>      eval { PVE::Tools::run_command($cmd, outfunc => $code, noerr => 1); };
>  
> -    return $shutdown;
> +    return ($shutdown, $reboot);
>  }
>  
>  sub queue_crm_commands {
> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
> index 26c5c89..86ec608 100644
> --- a/src/PVE/HA/LRM.pm
> +++ b/src/PVE/HA/LRM.pm
> @@ -51,14 +51,12 @@ sub shutdown_request {
>  
>      my $nodename = $haenv->nodename();
>  
> -    my $shutdown = $haenv->is_node_shutdown();
> +    my ($shutdown, $reboot) = $haenv->is_node_shutdown();
>  
>      if ($shutdown) {
> -	$haenv->log('info', "shutdown LRM, stop all services");
> -	$self->{mode} = 'shutdown';
> -
> -	# queue stop jobs for all services
> -
> +	# *always* queue stop jobs for all services if the node shuts down,
> +	# independent if it's a reboot or a poweroff, else we may corrupt
> +	# services or hinder node shutdown
>  	my $ss = $self->{service_status};
>  
>  	foreach my $sid (keys %$ss) {
> @@ -68,7 +66,11 @@ sub shutdown_request {
>  	    # Note: use undef uid to mark shutdown/stop jobs
>  	    $self->queue_resource_command($sid, undef, 'request_stop');
>  	}
> +    }
>  
> +    if ($shutdown && !$reboot) {
> +	$haenv->log('info', "shutdown LRM, stop all services");
> +	$self->{mode} = 'shutdown';
>      } else {
>  	$haenv->log('info', "restart LRM, freeze all services");

Could we maybe split this if/else up further, and make the log
messages explicit?

e.g.

if shutdown
  if reboot
    log node rebooting, freeze all services
  else
    log node shutting down, stop all services
else
  log restart LRM, freeze all services
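
Roughly like the following (just a sketch against the patched
shutdown_request, reusing the $haenv/$self/$shutdown/$reboot names from the
patch; the exact log wording and the 'restart' mode on reboot are only a
suggestion):

```perl
# Sketch only: split the mode/log decision so each case logs what
# actually happens. Names follow the patch; wording is a suggestion.
if ($shutdown) {
    if ($reboot) {
        # node reboots: services were already queued for stop above,
        # but keep them frozen so they come back after the reboot
        $haenv->log('info', "reboot LRM, stop and freeze all services");
        $self->{mode} = 'restart';
    } else {
        # node powers off: stop everything for good
        $haenv->log('info', "shutdown LRM, stop all services");
        $self->{mode} = 'shutdown';
    }
} else {
    # plain LRM service restart, node keeps running
    $haenv->log('info', "restart LRM, freeze all services");
    $self->{mode} = 'restart';
}
```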


Seeing a "restart LRM" in the middle of shutting down a node is kind of
weird, especially since we are in fact stopping the LRM service at that
point, not restarting it:

May 26 13:28:17 node1 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
May 26 13:28:18 node1 systemd[1]: PID file /var/run/pve-ha-lrm.pid not readable (yet?) after start.
May 26 13:28:18 node1 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
May 26 13:28:18 node1 pve-ha-lrm[3609]: starting server
May 26 13:28:18 node1 pve-ha-lrm[3609]: status change startup => wait_for_agent_lock
May 26 13:30:20 node1 systemd[1]: Stopping PVE Local HA Ressource Manager Daemon...
May 26 13:30:21 node1 pve-ha-lrm[3609]: received signal TERM
May 26 13:30:21 node1 pve-ha-lrm[3609]: restart LRM, freeze all services
May 26 13:30:22 node1 pve-ha-lrm[3609]: server stopped
May 26 13:30:23 node1 systemd[1]: Stopped PVE Local HA Ressource Manager Daemon.


>  	$self->{mode} = 'restart';
> diff --git a/src/PVE/HA/NodeStatus.pm b/src/PVE/HA/NodeStatus.pm
> index b99689c..493298f 100644
> --- a/src/PVE/HA/NodeStatus.pm
> +++ b/src/PVE/HA/NodeStatus.pm
> @@ -221,8 +221,12 @@ sub fence_node {
>      if ($success) {
>  	my $msg = "fencing: acknowledged - got agent lock for node '$node'";
>  	$haenv->log("info", $msg);
> +
> +	if ($state eq 'fence') {
> +	    &$send_fence_state_email($self, 'SUCCEED', $msg, $node);
> +	}
> +
>  	&$set_node_state($self, $node, 'unknown');
> -	&$send_fence_state_email($self, 'SUCCEED', $msg, $node);
>      }
>  
>      return $success;
> diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm
> index cd1574c..fce688a 100644
> --- a/src/PVE/HA/Sim/Env.pm
> +++ b/src/PVE/HA/Sim/Env.pm
> @@ -158,7 +158,25 @@ sub write_lrm_status {
>  sub is_node_shutdown {
>      my ($self) = @_;
>  
> -    return 0; # default to freezing services if not overwritten by subclass
> +    my $node = $self->{nodename};
> +    my $cstatus = $self->{hardware}->read_hardware_status_nolock();
> +
> +    die "undefined node status for node '$node'" if !defined($cstatus->{$node});
> +
> +    my ($shutdown, $reboot) = (0, 0);
> +
> +    if (my $target = $cstatus->{$node}->{shutdown}) {
> +	if ($target eq 'shutdown') {
> +	    $shutdown = 1;
> +	} elsif ($target eq 'reboot') {
> +	    $shutdown = 1;
> +	    $reboot = 1;
> +	} else {
> +	    die "unknown shutdown target '$target'";
> +	}
> +    }
> +
> +    return ($shutdown, $reboot);
>  }
>  
>  sub read_service_config {
> diff --git a/src/PVE/HA/Sim/TestEnv.pm b/src/PVE/HA/Sim/TestEnv.pm
> index fe08be3..0e6eced 100644
> --- a/src/PVE/HA/Sim/TestEnv.pm
> +++ b/src/PVE/HA/Sim/TestEnv.pm
> @@ -112,17 +112,6 @@ sub loop_end_hook {
>      $self->{cur_time} += 1; # easier for simulation
>  }
>  
> -sub is_node_shutdown {
> -    my ($self) = @_;
> -
> -    my $node = $self->{nodename};
> -    my $cstatus = $self->{hardware}->read_hardware_status_nolock();
> -
> -    die "undefined node status for node '$node'" if !defined($cstatus->{$node});
> -
> -    return defined($cstatus->{$node}->{shutdown}) ? 1 : 0;
> -}
> -
>  # must be 0 as we do not want to fork in the regression tests
>  sub get_max_workers {
>      my ($self) = @_;
> diff --git a/src/test/test-reboot1/log.expect b/src/test/test-reboot1/log.expect
> index 840f56d..e2dda5c 100644
> --- a/src/test/test-reboot1/log.expect
> +++ b/src/test/test-reboot1/log.expect
> @@ -21,7 +21,8 @@ info     25    node3/lrm: status change wait_for_agent_lock => active
>  info     25    node3/lrm: starting service vm:103
>  info     25    node3/lrm: service status vm:103 started
>  info    120      cmdlist: execute reboot node3
> -info    120    node3/lrm: shutdown LRM, stop all services
> +info    120    node3/lrm: restart LRM, freeze all services
> +info    120    node1/crm: service 'vm:103': state changed from 'started' to 'freeze'
>  info    125    node3/lrm: stopping service vm:103
>  info    125    node3/lrm: service status vm:103 stopped
>  info    126    node3/lrm: exit (loop end)
> @@ -31,9 +32,10 @@ info    145       reboot: execute power node3 off
>  info    145       reboot: execute power node3 on
>  info    145    node3/crm: status change startup => wait_for_quorum
>  info    140    node3/lrm: status change startup => wait_for_agent_lock
> -info    145    node3/lrm: got lock 'ha_agent_node3_lock'
> -info    145    node3/lrm: status change wait_for_agent_lock => active
> -info    145    node3/lrm: starting service vm:103
> -info    145    node3/lrm: service status vm:103 started
> +info    160    node1/crm: service 'vm:103': state changed from 'freeze' to 'started'
>  info    164    node3/crm: status change wait_for_quorum => slave
> +info    165    node3/lrm: got lock 'ha_agent_node3_lock'
> +info    165    node3/lrm: status change wait_for_agent_lock => active
> +info    165    node3/lrm: starting service vm:103
> +info    165    node3/lrm: service status vm:103 started
>  info    720     hardware: exit simulation - done
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel



