[pve-devel] [RFC v2 storage] smartctl: use json parsing

Oguz Bektas o.bektas at proxmox.com
Wed Mar 31 14:17:40 CEST 2021


hi,

thank you for the comments!



On Wed, Mar 31, 2021 at 02:06:42PM +0200, Stefan Reiter wrote:
> Not familiar with smartctl, but I can comment on the code at least :)
> 
> First off, quick smoke test shows me this in the GUI on an NVMe device:
> 
> available_spare : 100
> available_spare_threshold : 10
> controller_busy_time : 4089
> critical_comp_time : 0
> critical_warning : 0
> data_units_read : 10402470
> data_units_written : 142318653
> host_reads : 341998913
> host_writes : 1135808482
> media_errors : 0
> num_err_log_entries : 345
> percentage_used : 11
> power_cycles : 301
> power_on_hours : 1033
> temperature : 44
> temperature_sensors : ARRAY(0x55c92a53a868)
> unsafe_shutdowns : 27
> warning_temp_time : 0
> 
> ...so the "temperate_sensors" are certainly wrong, and also the format is
> much less readable then before. ATA output was the same.


this field never came up on my machine so i missed it, thanks for
pointing out

> 
> On 3/30/21 2:23 PM, Oguz Bektas wrote:
> > adapt the smartctl endpoint to run smartctl with the --json or -j flag
> > to parse it more reasonably.
> > 
> > additionally add the $format parameter, one can choose between 'text'
> > and 'json' outputs in the API (only relevant for nvme devices, since for
> > API backwards compatibility we have to keep the 'text' field)
> > 
> > for the unit tests we need to collect the smartctl json outputs from
> > now. the current tests cover ssd_smart and nvme_smart cases.
> > 
> > Signed-off-by: Oguz Bektas <o.bektas at proxmox.com>
> > ---
> > 
> > 
> > NOTE: i deleted the rest of the tests so this builds properly with
> > the unit tests passing. but i could still keep the old outputs and place
> > the new ones with a new naming scheme, and adapt the
> > disklist_test.pm to read those instead. i'm open for ideas.
> > though applying this should result in make'able and working package
> > 
> 
> Please put at least the deletion (or moving/renaming if you prefer that) of
> old test cases into a separate patch, if it applies first there still won't
> be any breakage, but this one is a bit much to take in with additions and
> deletions all mixed together.


will do

> 
> > 
> > [...snip...]
> > 
> > diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm
> > index 33bca76..a453efc 100644
> > --- a/PVE/API2/Disks.pm
> > +++ b/PVE/API2/Disks.pm
> > @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({
> >   		description => "If true returns only the health status",
> >   		optional => 1,
> >   	    },
> > +	    format => {
> > +		description => "Return json or text",
> > +		type => 'string',
> > +		enum => ['text', 'json'],
> > +		optional => 1,
> > +	    },
> 
> As mentioned off-list, we already have 'pve-output-format', though for
> pvesh... not sure how this would help or apply here, but certainly worth
> giving a quick thought about.
> 
> 
> For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1
> --output-format text --format json" or "--output-format json-pretty --format
> text" give some interesting results ;)


that is intended. the pvesh output format is separate from this
'$format'.

the reason to add this parameter was to keep it API backwards
compatible (for nvme devices, we send the 'text' field in the json with
the raw text output obtained from smartctl). changing this default would
break things for API clients/scripts that depend on this field existing,
thats why it was made optional to get the json key pairs in the json for
nvme devices.


> 
> >   	},
> >       },
> >       returns => {
> > @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({
> >   	    type => { type => 'string', optional => 1 },
> >   	    attributes => { type => 'array', optional => 1},
> >   	    text => { type => 'string', optional => 1 },
> > +	    json => { type => 'string', optional => 1 },
> >   	},
> >       },
> >       code => sub {
> > @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({
> >   	my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk});
> > -	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly});
> > +	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format});
> >   	$result->{health} = 'UNKNOWN' if !defined $result->{health};
> >   	$result = { health => $result->{health} } if $param->{healthonly};
> > diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> > index 64bb813..5ed140f 100644
> > --- a/PVE/Diskmanage.pm
> > +++ b/PVE/Diskmanage.pm
> > @@ -81,11 +81,12 @@ sub disk_is_used {
> >   }
> >   sub get_smart_data {
> > -    my ($disk, $healthonly) = @_;
> > +    my ($disk, $healthonly, $format) = @_;
> >       assert_blockdev($disk);
> >       my $smartdata = {};
> >       my $type;
> > +    $format = 'text' if !defined($format);
> 
> $format //= 'text';
ok
> 
> >       my $returncode = 0;
> > @@ -95,69 +96,69 @@ sub get_smart_data {
> >   	    or die "failed to get nvme controller device for $disk\n");
> >       }
> > -    my $cmd = [$SMARTCTL, '-H'];
> > -    push @$cmd, '-A', '-f', 'brief' if !$healthonly;
> > +    my $cmd = [$SMARTCTL, '-j', '-H'];
> > +    if (!$healthonly) {
> > +	push @$cmd, '-Afbrief';
> > +    }
> 
> why change the 'push' here?

ah, i was experimenting a bit and i guess this was leftover. i'd still
write it as one array item in the v3 (unit test takes the disk arg by array
element index):

push @$cmd, '-Afbrief' if !$healthonly;


> 
> >       push @$cmd, $disk;
> > +    my $smart_result = '';
> >       eval {
> > -	$returncode = run_command($cmd, noerr => 1, outfunc => sub{
> > -	    my ($line) = @_;
> > -
> > -# ATA SMART attributes, e.g.:
> > -# ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> > -#   1 Raw_Read_Error_Rate     POSR-K   100   100   000    -    0
> > -#
> > -# SAS and NVME disks, e.g.:
> > -# Data Units Written:                 5,584,952 [2.85 TB]
> > -# Accumulated start-stop cycles:  34
> > -
> > -	    if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) {
> > -		my $entry = {};
> > -
> > -		$entry->{name} = $2 if defined $2;
> > -		$entry->{flags} = $3 if defined $3;
> > -		# the +0 makes a number out of the strings
> > -		$entry->{value} = $4+0 if defined $4;
> > -		$entry->{worst} = $5+0 if defined $5;
> > -		# some disks report the default threshold as --- instead of 000
> > -		if (defined($6) && $6 eq '---') {
> > -		    $entry->{threshold} = 0;
> > -		} else {
> > -		    $entry->{threshold} = $6+0 if defined $6;
> > -		}
> > -		$entry->{fail} = $7 if defined $7;
> > -		$entry->{raw} = $8 if defined $8;
> > -		$entry->{id} = $1 if defined $1;
> > -		push @{$smartdata->{attributes}}, $entry;
> > -	    } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) {
> > -		$smartdata->{health} = $1;
> > -	    } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) {
> > -		$type = 'ata';
> > -		delete $smartdata->{text};
> > -	    } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) {
> > -		$type = 'text';
> > -	    } elsif (defined($type) && $type eq 'text') {
> > -		$smartdata->{text} = '' if !defined $smartdata->{text};
> > -		$smartdata->{text} .= "$line\n";
> > -		# extract wearout from nvme/sas text, allow for decimal values
> > -		if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) {
> > -		    $smartdata->{wearout} = 100 - $1;
> > -		}
> > -	    } elsif ($line =~ m/SMART Disabled/) {
> > -		$smartdata->{health} = "SMART Disabled";
> > -	    }
> > -	});
> > +	run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift });
> >       };
> >       my $err = $@;
> > -    # bit 0 and 1 mark an severe smartctl error
> > -    # all others are for disk status, so ignore them
> > -    # see smartctl(8)
> > -    if ((defined($returncode) && ($returncode & 0b00000011)) || $err) {
> > -	die "Error getting S.M.A.R.T. data: Exit code: $returncode\n";
> > +    my $json_result = decode_json($smart_result);
> > +
> > +    my $smart_health = $json_result->{smart_status}->{passed};
> > +    if (JSON::is_bool($smart_health)) {
> > +	$smart_health = ($smart_health)? "PASSED" : "FAILED";
> 
> unnecessary brackets?
i guess
> 
> > +    } else {
> > +	$smart_health = 'UNKNOWN';
> >       }
> > -    $smartdata->{type} = $type;
> > +    $smartdata->{health} = $smart_health;
> > +
> > +    $type = $json_result->{device}->{type};
> > +    if ($type eq 'nvme') {
> > +	$smartdata->{type} = $format; # text or json
> > +
> > +	my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly;
> 
> too long even for my liking, needs a line break somewhere :)

hmm yeah, probably good idea to make
my $nvme_info = $json_result->{nvme_smart_health_information_log};

and then $nvme_info->{percentage_used}



> 
> > +
> > +	foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) {
> > +	    my $v = $json_result->{nvme_smart_health_information_log}->{$k};
> > +	    if ($format eq 'text') {
> > +		$smartdata->{text} .= "$k : $v\n";
> 
> Previously, the "key : value" format would display human-readable info, i.e.
> capitalized key names, spacing around ':' and units for the value.
> With this, it simply prints out the json key/value.
> 
> I suppose this could also be fixed by adapting the GUI, but on the CLI it
> would still look different.

true, the data is the same though. my original plan was to make the GUI
part for nvme devices the same as ATA devices, e.g. with a table for the
attributes/properties

not sure what's the best way to go about it

> 
> > +	    } else {
> > +		$smartdata->{properties}->{$k} = $v;
> > +	    }
> > +	    if ($k eq 'percentage_used') {
> > +		$smartdata->{wearout} = $wearout;
> > +	    }
> > +	}
> > +	delete $smartdata->{attributes}; # this is for ata disks only
> > +    } else {
> > +	$smartdata->{type} = 'ata';
> > +	foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) {
> > +	    # we need to override or delete some fields to fit expected json schema of unit tests
> 
> while true, I feel like the reason isn't really the unit tests but rather
> consumers (i.e. the GUI) depending on it ;)
> 
> > +	    my $flags_string = $attribute->{flags}->{string};
> > +	    $flags_string =~ s/\s+$//;
> > +	    $attribute->{flags} = $flags_string;
> > +	    $attribute->{raw} = $attribute->{raw}->{string};
> > +	    $attribute->{threshold} = delete $attribute->{thresh};
> > +	    $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed};
> > +	    delete ${attribute}->{when_failed};
> > +
> > +	    push @{$smartdata->{attributes}}, $attribute;
> > +	}
> > +	my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}};
> > +	@{$smartdata->{attributes}} = @sorted_attributes;
> > +    }
> > +
> > +    my $exit_status = $json_result->{smartctl}->{exit_status};
> > +    if ($exit_status ne 0) {
> > +	die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n";
> > +    }
> 
> maybe do this before parsing?

hmm. maybe getting the exit_status from
json wasn't a good idea.


> 
> >       return $smartdata;
> >   }





More information about the pve-devel mailing list