[pve-devel] [PATCH cluster v4 1/4] cfs_lock: address race where alarm triggers with lock accquired

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Nov 9 09:47:24 CET 2017


As mkdir can possibly hang forever we need to enforce a timeout on
it. But this was made in such a way so that a small time window
existed where the lock could be acquired successfully but the alarm
triggered still, leaving around an unused lock for 120 seconds.

Wrap only the mkdir call itself in an alarm and save its result
directly in a $got_lock variable, this minimizes the window as far as
possible from the perl side.

This is also easier to track for humans reading the code and should
cope better against code changes, e.g., it does not breaks just if an
error message typo got corrected a few lines above.

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
---
 data/PVE/Cluster.pm | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
index 70ce250..94db591 100644
--- a/data/PVE/Cluster.pm
+++ b/data/PVE/Cluster.pm
@@ -861,6 +861,7 @@ my $cfs_lock = sub {
     my ($lockid, $timeout, $code, @param) = @_;
 
     my $res;
+    my $got_lock = 0;
 
     # this timeout is for aquire the lock
     $timeout = 10 if !$timeout;
@@ -877,21 +878,21 @@ my $cfs_lock = sub {
 	    die "$msg: pve cluster filesystem not online.\n";
 	}
 
-        local $SIG{ALRM} = sub { die "got lock request timeout\n"; };
+	my $timeout_err = sub { die "$msg: got lock request timeout\n"; };
+	local $SIG{ALRM} = $timeout_err;
 
-        alarm ($timeout);
+	while (1) {
+	    alarm ($timeout);
+	    $got_lock = mkdir($filename);
+	    $timeout = alarm(0);
+
+	    last if $got_lock;
+
+	    $timeout_err->() if $timeout == 0;
 
-	if (!(mkdir $filename)) {
 	    print STDERR "trying to aquire cfs lock '$lockid' ...";
- 	    while (1) {
-		if (!(mkdir $filename)) {
-		    (utime 0, 0, $filename); # cfs unlock request
-		} else {
-		    print STDERR " OK\n";
-		    last;
-		}
-		sleep(1);
-	    }
+	    utime (0, 0, $filename); # cfs unlock request
+	    sleep(1);
 	}
 
 	# fixed command timeout: cfs locks have a timeout of 120
-- 
2.11.0





More information about the pve-devel mailing list