[pve-devel] [PATCH ha-manager 1/6] fix inf. loop error on orphaned workers
Thomas Lamprecht
t.lamprecht at proxmox.com
Mon Feb 8 14:39:03 CET 2016
When a service that still has a running job gets removed from HA,
this can result in an error. That is normally not problematic if
the worker was already started (i.e. has a PID). Otherwise we may
trigger a loop of errors when "$max_workers" workers are already
active and we remove a service with a queued CRM command.
Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---
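The case distinction from the commit message can be tried outside the LRM
with a small stand-alone sketch. It is not the LRM code: the helper name
handle_unconfigured_workers, the service IDs, the PIDs and the state strings
are made up for illustration, and plain hashes stand in for $self->{workers}
and the service configuration. It only models the branch for services
without a resource configuration: a queued worker (no PID) is dropped, an
already running one is kept and reported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched branch in run_workers().
# Returns the log lines it would emit instead of calling $haenv->log().
sub handle_unconfigured_workers {
    my ($workers, $sc) = @_;
    my @log;
    # sort keys builds the list up front, so deleting entries below is safe
    foreach my $sid (sort keys %$workers) {
        my $w = $workers->{$sid};
        next if $sc->{$sid};    # still configured, not handled here
        if (!$w->{pid}) {
            # queued but never started: drop it so it is not retried
            delete $workers->{$sid};
            push @log, "missing resource configuration for " .
                "'$sid' - do not start worker [$w->{state}]";
        } else {
            # already running: keep the entry and only report it
            push @log, "orphaned active worker [$w->{state}] for" .
                " service '$sid' with no resource configuration";
        }
    }
    return @log;
}

# Example data, all values are placeholders.
my %workers = (
    'vm:100' => { state => 'started', pid => 4242 },    # running, configured
    'vm:101' => { state => 'started' },                 # queued, removed from HA
    'vm:102' => { state => 'stopped', pid => 4243 },    # running, removed from HA
);
my %sc = ('vm:100' => { node => 'node1' });

my @log = handle_unconfigured_workers(\%workers, \%sc);
print "$_\n" for @log;
print "remaining: ", join(', ', sort keys %workers), "\n";
```

With this input the queued entry for 'vm:101' is gone after the call, while
'vm:100' and 'vm:102' remain in the worker list.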
src/PVE/HA/LRM.pm | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 1894f3c..217a8ad 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -378,7 +378,16 @@ sub run_workers {
             my $w = $self->{workers}->{$sid};
             my $cd = $sc->{$sid};
             if (!$cd) {
-                $haenv->log('err', "missing resource configuration for '$sid'");
+                # if not already started don't start the worker at all,
+                # as the service was removed from HA management, else warn
+                if (!$w->{pid}) {
+                    delete $self->{workers}->{$sid};
+                    $haenv->log('err', "missing resource configuration for " .
+                                "'$sid' - do not start worker [$w->{state}]");
+                } else {
+                    $haenv->log('err', "orphaned active worker [$w->{state}] for" .
+                                " service '$sid' with no resource configuration");
+                }
                 next;
             }
             if (!$w->{pid}) {
--
2.1.4