[pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images
Fiona Ebner
f.ebner at proxmox.com
Mon Aug 21 17:05:37 CEST 2023
On 14.06.23 at 13:10, Aaron Lauterer wrote:
> It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
> -l' will then show errors that it is not possible to open the image in
> question:
> ```
> rbd: error opening vm-103-disk-1: (2) No such file or directory
> rbd: listing images failed: (2) No such file or directory
> ```
>
> Originally we only showed the last error line which is too generic and
> doesn't give a good hint what is actually wrong.
>
> We can improve that by catching these specific errors and add the
> problematic disk images to the returned list with a size of '-1'.
>
What do you think about logging a warning instead, hinting that it might
be a partially removed image? The thing I'm a bit worried about is that
existing scripts/tools interacting with our API might get confused by
the -1. And if I use the UI, I don't see it with either approach,
because your next patch hides it. If I use the CLI, I'll see either the
warning or the -1 depending on the approach.
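To make the warning alternative concrete, here is a minimal sketch. The helper name warn_about_missing_images and the message wording are made up for illustration, and it uses Perl's built-in warn rather than any particular PVE logging helper:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch: report images that 'rbd ls -l'
# could not open, instead of adding them to the result with a size of -1.
sub warn_about_missing_images {
    my ($missing_images) = @_;    # hashref: image name => 1

    for my $image (sort keys %$missing_images) {
        warn "rbd: could not open image '$image', it might be partially removed\n";
    }

    # number of images that were reported
    return scalar(keys %$missing_images);
}
```

With this, rbd_ls() would keep returning only images it could actually open, so existing API consumers never see a -1 size, while CLI users still get a hint.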
> @@ -207,13 +209,28 @@ sub rbd_ls {
> my $raw = '';
> my $parser = sub { $raw .= shift };
>
> + my $show_err = 1;
> + my $missing_images = {};
> + my $err_parser = sub {
> + my $line = shift;
> + if ($line =~ m/$missing_image_err_regex/) {
> + $show_err = 0;
While both might be edge cases: What if there was some other error
before this one that we should die on? Or what if another error happens
in such a way that I don't get another stderr log line? Then $show_err
will still be 0 below and the function doesn't die.
It might be slightly better to do:
1. if there was any stderr log line we don't want to ignore, die
2. if there was none, base the decision off whether the final log line
was the "rbd: listing images failed: (2) No such file or directory"
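A rough sketch of what I mean, operating on already collected stderr lines. The function name check_ls_stderr and the regex are assumptions for illustration (the real $missing_image_err_regex is defined elsewhere in the patch), and the way step 2 handles a failed command is just one reading of the idea:

```perl
use strict;
use warnings;

# Assumed pattern for the image-specific error; stands in for the patch's
# $missing_image_err_regex.
my $missing_image_re = qr/^rbd: error opening (\S+): \(2\) No such file or directory$/;
my $generic_failure = "rbd: listing images failed: (2) No such file or directory";

# Hypothetical helper: returns a hashref of missing image names or dies.
sub check_ls_stderr {
    my ($command_failed, @lines) = @_;

    my $missing = {};
    my @unexpected;
    for my $line (@lines) {
        if ($line =~ $missing_image_re) {
            $missing->{$1} = 1;
        } elsif ($line ne $generic_failure) {
            push @unexpected, $line;
        }
    }

    # 1. any stderr line we do not want to ignore is fatal, regardless of order
    die "rbd error: $unexpected[0]\n" if @unexpected;

    # 2. otherwise, a failed command is only tolerated if the final log line
    #    was the generic "listing images failed" one
    if ($command_failed) {
        my $last = @lines ? $lines[-1] : '';
        die "rbd error: command failed\n" if $last ne $generic_failure;
    }

    return $missing;
}
```

The difference to tracking a single $show_err flag is that the decision is made once after all lines are seen, so an earlier unexpected error cannot be masked by a later missing-image line.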
> + $missing_images->{$1} = 1;
> + } elsif ($line ne "rbd: listing images failed: (2) No such file or directory") {
> + # this generic error is shown after the image specific "No such file..." one,
> + # ignore it but not other errors
> + $show_err = 1;
> + die $line;
> + }
> + };
> +
> my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
> eval {
> - run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => $parser);
> + run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => $parser);
> };
> my $err = $@;
>
> - die $err if $err && $err !~ m/doesn't contain rbd images/ ;
> + die $err if $err && $show_err && $err !~ m/doesn't contain rbd images/ ;
>
The "doesn't contain rbd images" bit could also be added to the
err_parser() :)
> my $result;
> if ($raw eq '') {
> @@ -224,6 +241,13 @@ sub rbd_ls {
> die "got unexpected data from rbd ls: '$raw'\n";
> }
>
> + for my $image (keys %$missing_images) {
> + push @$result, {
> + image => $image,
> + size => -1,
> + };
> + }
> +
> my $list = {};
>
> foreach my $el (@$result) {
> @@ -251,7 +275,20 @@ sub rbd_ls_snap {
> my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'ls', $name, '--format', 'json');
>
> my $raw = '';
> - run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => sub { $raw .= shift; });
> + my $show_err = 0;
Similar to the above, but this can happen more easily I think: What if
there is no stderr log line, but the command fails?
Slightly better:
1. if we got no log lines at all, but command failed, die
2. if there was any stderr log line we don't want to ignore, also die
3. if we only got log lines we want to ignore, don't die
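As a sketch, the three cases could look like this. The name snap_ls_should_die and the regex are hypothetical, and I'm assuming the missing-image line for 'snap ls' has the same shape as the one matched by $missing_image_err_regex:

```perl
use strict;
use warnings;

# Assumed stand-in for the patch's $missing_image_err_regex.
my $snap_missing_image_re = qr/^rbd: error opening \S+: \(2\) No such file or directory$/;

# Hypothetical helper: decide whether a finished 'rbd snap ls' should be
# treated as a fatal error, given its exit status and collected stderr lines.
sub snap_ls_should_die {
    my ($command_failed, @stderr_lines) = @_;

    return 0 if !$command_failed;

    # 1. command failed without any log line: die
    return 1 if !@stderr_lines;

    # 2. at least one log line we do not want to ignore: die
    return 1 if grep { $_ !~ $snap_missing_image_re } @stderr_lines;

    # 3. only log lines we want to ignore: treat the image as missing
    return 0;
}
```

The caller would then die with the original error if this returns true, and return {} otherwise.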
> + my $err_parser = sub {
> + my $line = shift;
> + if ($line !~ m/$missing_image_err_regex/) {
> + $show_err = 1;
> + die $line;
> + }
> + };
> + eval {
> + run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => sub { $raw .= shift; });
> + };
> + my $err = $@;
> + die $err if $err && $show_err;
> + return {} if $err && !$show_err; # could not open image, probably missing
>
> my $list;
> if ($raw =~ m/^(\[.*\])$/s) { # untaint
> @@ -633,10 +670,13 @@ sub free_image {
>
> $class->deactivate_volume($storeid, $scfg, $volname);
>
> - my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge', $name);
> - run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
>
> - $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
> + if (keys %{$snaps}) {
> + my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge', $name);
> + run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
> + }
> +
> + my $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
> run_rbd_command($cmd, errmsg => "rbd rm '$name' error");
>
> return undef;
Does the 'snap purge' command on such a partially removed image also
fail? If that was the motivation for this change, please mention it in
the commit message. Otherwise, it can be its own patch ;)