[pve-devel] [RFC v2 storage] smartctl: use json parsing

Stefan Reiter s.reiter at proxmox.com
Wed Mar 31 14:06:42 CEST 2021


Not familiar with smartctl, but I can comment on the code at least :)

First off, quick smoke test shows me this in the GUI on an NVMe device:

available_spare : 100
available_spare_threshold : 10
controller_busy_time : 4089
critical_comp_time : 0
critical_warning : 0
data_units_read : 10402470
data_units_written : 142318653
host_reads : 341998913
host_writes : 1135808482
media_errors : 0
num_err_log_entries : 345
percentage_used : 11
power_cycles : 301
power_on_hours : 1033
temperature : 44
temperature_sensors : ARRAY(0x55c92a53a868)
unsafe_shutdowns : 27
warning_temp_time : 0

...so the "temperate_sensors" are certainly wrong, and also the format 
is much less readable then before. ATA output was the same.

On 3/30/21 2:23 PM, Oguz Bektas wrote:
> adapt the smartctl endpoint to run smartctl with the --json or -j flag
> to parse it more reasonably.
> 
> additionally add the $format parameter, one can choose between 'text'
> and 'json' outputs in the API (only relevant for nvme devices, since for
> API backwards compatibility we have to keep the 'text' field)
> 
> for the unit tests we need to collect the smartctl json outputs from
> now. the current tests cover ssd_smart and nvme_smart cases.
> 
> Signed-off-by: Oguz Bektas <o.bektas at proxmox.com>
> ---
> 
> 
> NOTE: i deleted the rest of the tests so this builds properly with
> the unit tests passing. but i could still keep the old outputs and place
> the new ones with a new naming scheme, and adapt the
> disklist_test.pm to read those instead. i'm open for ideas.
> though applying this should result in make'able and working package
> 

Please put at least the deletion (or moving/renaming if you prefer that) 
of old test cases into a separate patch, if it applies first there still 
won't be any breakage, but this one is a bit much to take in with 
additions and deletions all mixed together.

> 
> [...snip...]
> 
> diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm
> index 33bca76..a453efc 100644
> --- a/PVE/API2/Disks.pm
> +++ b/PVE/API2/Disks.pm
> @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({
>   		description => "If true returns only the health status",
>   		optional => 1,
>   	    },
> +	    format => {
> +		description => "Return json or text",
> +		type => 'string',
> +		enum => ['text', 'json'],
> +		optional => 1,
> +	    },

As mentioned off-list, we already have 'pve-output-format', though for 
pvesh... not sure how this would help or apply here, but certainly worth 
giving a quick thought about.


For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1 
--output-format text --format json" or "--output-format json-pretty 
--format text" give some interesting results ;)

>   	},
>       },
>       returns => {
> @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({
>   	    type => { type => 'string', optional => 1 },
>   	    attributes => { type => 'array', optional => 1},
>   	    text => { type => 'string', optional => 1 },
> +	    json => { type => 'string', optional => 1 },
>   	},
>       },
>       code => sub {
> @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({
>   
>   	my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk});
>   
> -	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly});
> +	my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format});
>   
>   	$result->{health} = 'UNKNOWN' if !defined $result->{health};
>   	$result = { health => $result->{health} } if $param->{healthonly};
> diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> index 64bb813..5ed140f 100644
> --- a/PVE/Diskmanage.pm
> +++ b/PVE/Diskmanage.pm
> @@ -81,11 +81,12 @@ sub disk_is_used {
>   }
>   
>   sub get_smart_data {
> -    my ($disk, $healthonly) = @_;
> +    my ($disk, $healthonly, $format) = @_;
>   
>       assert_blockdev($disk);
>       my $smartdata = {};
>       my $type;
> +    $format = 'text' if !defined($format);

$format //= 'text';

>   
>       my $returncode = 0;
>   
> @@ -95,69 +96,69 @@ sub get_smart_data {
>   	    or die "failed to get nvme controller device for $disk\n");
>       }
>   
> -    my $cmd = [$SMARTCTL, '-H'];
> -    push @$cmd, '-A', '-f', 'brief' if !$healthonly;
> +    my $cmd = [$SMARTCTL, '-j', '-H'];
> +    if (!$healthonly) {
> +	push @$cmd, '-Afbrief';
> +    }

why change the 'push' here?

>       push @$cmd, $disk;
>   
> +    my $smart_result = '';
>       eval {
> -	$returncode = run_command($cmd, noerr => 1, outfunc => sub{
> -	    my ($line) = @_;
> -
> -# ATA SMART attributes, e.g.:
> -# ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> -#   1 Raw_Read_Error_Rate     POSR-K   100   100   000    -    0
> -#
> -# SAS and NVME disks, e.g.:
> -# Data Units Written:                 5,584,952 [2.85 TB]
> -# Accumulated start-stop cycles:  34
> -
> -	    if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) {
> -		my $entry = {};
> -
> -		$entry->{name} = $2 if defined $2;
> -		$entry->{flags} = $3 if defined $3;
> -		# the +0 makes a number out of the strings
> -		$entry->{value} = $4+0 if defined $4;
> -		$entry->{worst} = $5+0 if defined $5;
> -		# some disks report the default threshold as --- instead of 000
> -		if (defined($6) && $6 eq '---') {
> -		    $entry->{threshold} = 0;
> -		} else {
> -		    $entry->{threshold} = $6+0 if defined $6;
> -		}
> -		$entry->{fail} = $7 if defined $7;
> -		$entry->{raw} = $8 if defined $8;
> -		$entry->{id} = $1 if defined $1;
> -		push @{$smartdata->{attributes}}, $entry;
> -	    } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) {
> -		$smartdata->{health} = $1;
> -	    } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) {
> -		$type = 'ata';
> -		delete $smartdata->{text};
> -	    } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) {
> -		$type = 'text';
> -	    } elsif (defined($type) && $type eq 'text') {
> -		$smartdata->{text} = '' if !defined $smartdata->{text};
> -		$smartdata->{text} .= "$line\n";
> -		# extract wearout from nvme/sas text, allow for decimal values
> -		if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) {
> -		    $smartdata->{wearout} = 100 - $1;
> -		}
> -	    } elsif ($line =~ m/SMART Disabled/) {
> -		$smartdata->{health} = "SMART Disabled";
> -	    }
> -	});
> +	run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift });
>       };
>       my $err = $@;
>   
> -    # bit 0 and 1 mark an severe smartctl error
> -    # all others are for disk status, so ignore them
> -    # see smartctl(8)
> -    if ((defined($returncode) && ($returncode & 0b00000011)) || $err) {
> -	die "Error getting S.M.A.R.T. data: Exit code: $returncode\n";
> +    my $json_result = decode_json($smart_result);
> +
> +    my $smart_health = $json_result->{smart_status}->{passed};
> +    if (JSON::is_bool($smart_health)) {
> +	$smart_health = ($smart_health)? "PASSED" : "FAILED";

unnecessary brackets?

> +    } else {
> +	$smart_health = 'UNKNOWN';
>       }
>   
> -    $smartdata->{type} = $type;
> +    $smartdata->{health} = $smart_health;
> +
> +    $type = $json_result->{device}->{type};
> +    if ($type eq 'nvme') {
> +	$smartdata->{type} = $format; # text or json
> +
> +	my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly;

too long even for my liking, needs a line break somewhere :)

> +
> +	foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) {
> +	    my $v = $json_result->{nvme_smart_health_information_log}->{$k};
> +	    if ($format eq 'text') {
> +		$smartdata->{text} .= "$k : $v\n";

Previously, the "key : value" format would display human-readable info, 
i.e. capitalized key names, spacing around ':' and units for the value.
With this, it simply prints out the json key/value.

I suppose this could also be fixed by adapting the GUI, but on the CLI 
it would still look different.

> +	    } else {
> +		$smartdata->{properties}->{$k} = $v;
> +	    }
> +	    if ($k eq 'percentage_used') {
> +		$smartdata->{wearout} = $wearout;
> +	    }
> +	}
> +	delete $smartdata->{attributes}; # this is for ata disks only
> +    } else {
> +	$smartdata->{type} = 'ata';
> +	foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) {
> +	    # we need to override or delete some fields to fit expected json schema of unit tests

while true, I feel like the reason isn't really the unit tests but 
rather consumers (i.e. the GUI) depending on it ;)

> +	    my $flags_string = $attribute->{flags}->{string};
> +	    $flags_string =~ s/\s+$//;
> +	    $attribute->{flags} = $flags_string;
> +	    $attribute->{raw} = $attribute->{raw}->{string};
> +	    $attribute->{threshold} = delete $attribute->{thresh};
> +	    $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed};
> +	    delete ${attribute}->{when_failed};
> +
> +	    push @{$smartdata->{attributes}}, $attribute;
> +	}
> +	my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}};
> +	@{$smartdata->{attributes}} = @sorted_attributes;
> +    }
> +
> +    my $exit_status = $json_result->{smartctl}->{exit_status};
> +    if ($exit_status ne 0) {
> +	die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n";
> +    }

maybe do this before parsing?

>   
>       return $smartdata;
>   }





More information about the pve-devel mailing list