[pve-devel] [PATCH cluster v3 1/4] cfs_lock: address race where alarm triggers with lock accquired

Thomas Lamprecht t.lamprecht at proxmox.com
Tue Nov 7 15:37:35 CET 2017


As mkdir can possibly hang forever we need to enforce a timeout on
it. But this was made in such a way so that a small time window
existed where the lock could be acquired successfully but the alarm
triggered still, leaving around an unused lock for 120 seconds.

Wrap only the mkdir call itself in an alarm and save its result
directly in a $got_lock variable, this minimizes the window as far as
possible from the perl side.

This is also easier to track for humans reading the code and should
cope better against code changes, e.g., it does not breaks just if an
error message typo got corrected a few lines above.
---

new in this revision, reuses parts of v2 1/3

 data/PVE/Cluster.pm | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
index 70ce250..968ad4b 100644
--- a/data/PVE/Cluster.pm
+++ b/data/PVE/Cluster.pm
@@ -861,6 +861,7 @@ my $cfs_lock = sub {
     my ($lockid, $timeout, $code, @param) = @_;
 
     my $res;
+    my $got_lock = 0;
 
     # this timeout is for aquire the lock
     $timeout = 10 if !$timeout;
@@ -877,22 +878,27 @@ my $cfs_lock = sub {
 	    die "$msg: pve cluster filesystem not online.\n";
 	}
 
-        local $SIG{ALRM} = sub { die "got lock request timeout\n"; };
+	my $timeout_err = sub { die "$msg: got lock request timeout\n"; };
+	local $SIG{ALRM} = $timeout_err;
 
-        alarm ($timeout);
+	my $start_time = time();
+	while (1) {
+	    alarm ($timeout);
+	    $got_lock = mkdir($filename);
+	    alarm(0);
+
+	    last if $got_lock;
+
+	    my $duration = time() - $start_time;
+	    $timeout_err->() if $duration >= $timeout;
+
+	    $timeout -= $duration;
 
-	if (!(mkdir $filename)) {
 	    print STDERR "trying to aquire cfs lock '$lockid' ...";
- 	    while (1) {
-		if (!(mkdir $filename)) {
-		    (utime 0, 0, $filename); # cfs unlock request
-		} else {
-		    print STDERR " OK\n";
-		    last;
-		}
-		sleep(1);
-	    }
+	    utime (0, 0, $filename); # cfs unlock request
+	    sleep(1);
 	}
+	print STDERR " OK\n";
 
 	# fixed command timeout: cfs locks have a timeout of 120
 	# using 60 gives us another 60 seconds to abort the task
-- 
2.11.0





More information about the pve-devel mailing list