[pve-devel] [RFC v2 storage] smartctl: use json parsing
Stefan Reiter
s.reiter at proxmox.com
Wed Mar 31 14:06:42 CEST 2021
Not familiar with smartctl, but I can comment on the code at least :)
First off, quick smoke test shows me this in the GUI on an NVMe device:
available_spare : 100
available_spare_threshold : 10
controller_busy_time : 4089
critical_comp_time : 0
critical_warning : 0
data_units_read : 10402470
data_units_written : 142318653
host_reads : 341998913
host_writes : 1135808482
media_errors : 0
num_err_log_entries : 345
percentage_used : 11
power_cycles : 301
power_on_hours : 1033
temperature : 44
temperature_sensors : ARRAY(0x55c92a53a868)
unsafe_shutdowns : 27
warning_temp_time : 0
...so the "temperate_sensors" are certainly wrong, and also the format
is much less readable then before. ATA output was the same.
On 3/30/21 2:23 PM, Oguz Bektas wrote:
> adapt the smartctl endpoint to run smartctl with the --json or -j flag
> to parse it more reasonably.
>
> additionally add the $format parameter, one can choose between 'text'
> and 'json' outputs in the API (only relevant for nvme devices, since for
> API backwards compatibility we have to keep the 'text' field)
>
> for the unit tests we need to collect the smartctl json outputs from
> now. the current tests cover ssd_smart and nvme_smart cases.
>
> Signed-off-by: Oguz Bektas <o.bektas at proxmox.com>
> ---
>
>
> NOTE: i deleted the rest of the tests so this builds properly with
> the unit tests passing. but i could still keep the old outputs and place
> the new ones with a new naming scheme, and adapt the
> disklist_test.pm to read those instead. i'm open for ideas.
> though applying this should result in make'able and working package
>
Please put at least the deletion (or moving/renaming if you prefer that)
of old test cases into a separate patch, if it applies first there still
won't be any breakage, but this one is a bit much to take in with
additions and deletions all mixed together.
>
> [...snip...]
>
> diff --git a/PVE/API2/Disks.pm b/PVE/API2/Disks.pm
> index 33bca76..a453efc 100644
> --- a/PVE/API2/Disks.pm
> +++ b/PVE/API2/Disks.pm
> @@ -195,6 +195,12 @@ __PACKAGE__->register_method ({
> description => "If true returns only the health status",
> optional => 1,
> },
> + format => {
> + description => "Return json or text",
> + type => 'string',
> + enum => ['text', 'json'],
> + optional => 1,
> + },
As mentioned off-list, we already have 'pve-output-format', though for
pvesh... not sure how this would help or apply here, but certainly worth
giving a quick thought about.
For instance, "pvesh get /nodes/.../disk/smart --disk /dev/nvme0n1
--output-format text --format json" or "--output-format json-pretty
--format text" give some interesting results ;)
> },
> },
> returns => {
> @@ -204,6 +210,7 @@ __PACKAGE__->register_method ({
> type => { type => 'string', optional => 1 },
> attributes => { type => 'array', optional => 1},
> text => { type => 'string', optional => 1 },
> + json => { type => 'string', optional => 1 },
> },
> },
> code => sub {
> @@ -211,7 +218,7 @@ __PACKAGE__->register_method ({
>
> my $disk = PVE::Diskmanage::verify_blockdev_path($param->{disk});
>
> - my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly});
> + my $result = PVE::Diskmanage::get_smart_data($disk, $param->{healthonly}, $param->{format});
>
> $result->{health} = 'UNKNOWN' if !defined $result->{health};
> $result = { health => $result->{health} } if $param->{healthonly};
> diff --git a/PVE/Diskmanage.pm b/PVE/Diskmanage.pm
> index 64bb813..5ed140f 100644
> --- a/PVE/Diskmanage.pm
> +++ b/PVE/Diskmanage.pm
> @@ -81,11 +81,12 @@ sub disk_is_used {
> }
>
> sub get_smart_data {
> - my ($disk, $healthonly) = @_;
> + my ($disk, $healthonly, $format) = @_;
>
> assert_blockdev($disk);
> my $smartdata = {};
> my $type;
> + $format = 'text' if !defined($format);
$format //= 'text';
>
> my $returncode = 0;
>
> @@ -95,69 +96,69 @@ sub get_smart_data {
> or die "failed to get nvme controller device for $disk\n");
> }
>
> - my $cmd = [$SMARTCTL, '-H'];
> - push @$cmd, '-A', '-f', 'brief' if !$healthonly;
> + my $cmd = [$SMARTCTL, '-j', '-H'];
> + if (!$healthonly) {
> + push @$cmd, '-Afbrief';
> + }
why change the 'push' here?
> push @$cmd, $disk;
>
> + my $smart_result = '';
> eval {
> - $returncode = run_command($cmd, noerr => 1, outfunc => sub{
> - my ($line) = @_;
> -
> -# ATA SMART attributes, e.g.:
> -# ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
> -# 1 Raw_Read_Error_Rate POSR-K 100 100 000 - 0
> -#
> -# SAS and NVME disks, e.g.:
> -# Data Units Written: 5,584,952 [2.85 TB]
> -# Accumulated start-stop cycles: 34
> -
> - if (defined($type) && $type eq 'ata' && $line =~ m/^([ \d]{2}\d)\s+(\S+)\s+(\S{6})\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.*)$/) {
> - my $entry = {};
> -
> - $entry->{name} = $2 if defined $2;
> - $entry->{flags} = $3 if defined $3;
> - # the +0 makes a number out of the strings
> - $entry->{value} = $4+0 if defined $4;
> - $entry->{worst} = $5+0 if defined $5;
> - # some disks report the default threshold as --- instead of 000
> - if (defined($6) && $6 eq '---') {
> - $entry->{threshold} = 0;
> - } else {
> - $entry->{threshold} = $6+0 if defined $6;
> - }
> - $entry->{fail} = $7 if defined $7;
> - $entry->{raw} = $8 if defined $8;
> - $entry->{id} = $1 if defined $1;
> - push @{$smartdata->{attributes}}, $entry;
> - } elsif ($line =~ m/(?:Health Status|self\-assessment test result): (.*)$/ ) {
> - $smartdata->{health} = $1;
> - } elsif ($line =~ m/Vendor Specific SMART Attributes with Thresholds:/) {
> - $type = 'ata';
> - delete $smartdata->{text};
> - } elsif ($line =~ m/=== START OF (READ )?SMART DATA SECTION ===/) {
> - $type = 'text';
> - } elsif (defined($type) && $type eq 'text') {
> - $smartdata->{text} = '' if !defined $smartdata->{text};
> - $smartdata->{text} .= "$line\n";
> - # extract wearout from nvme/sas text, allow for decimal values
> - if ($line =~ m/Percentage Used(?: endurance indicator)?:\s*(\d+(?:\.\d+)?)\%/i) {
> - $smartdata->{wearout} = 100 - $1;
> - }
> - } elsif ($line =~ m/SMART Disabled/) {
> - $smartdata->{health} = "SMART Disabled";
> - }
> - });
> + run_command($cmd, noerr => 1, outfunc => sub { $smart_result .= shift });
> };
> my $err = $@;
>
> - # bit 0 and 1 mark an severe smartctl error
> - # all others are for disk status, so ignore them
> - # see smartctl(8)
> - if ((defined($returncode) && ($returncode & 0b00000011)) || $err) {
> - die "Error getting S.M.A.R.T. data: Exit code: $returncode\n";
> + my $json_result = decode_json($smart_result);
> +
> + my $smart_health = $json_result->{smart_status}->{passed};
> + if (JSON::is_bool($smart_health)) {
> + $smart_health = ($smart_health)? "PASSED" : "FAILED";
unnecessary brackets?
> + } else {
> + $smart_health = 'UNKNOWN';
> }
>
> - $smartdata->{type} = $type;
> + $smartdata->{health} = $smart_health;
> +
> + $type = $json_result->{device}->{type};
> + if ($type eq 'nvme') {
> + $smartdata->{type} = $format; # text or json
> +
> + my $wearout = 100.0 - $json_result->{nvme_smart_health_information_log}->{percentage_used} if !$healthonly;
too long even for my liking, needs a line break somewhere :)
> +
> + foreach my $k (sort keys %{$json_result->{nvme_smart_health_information_log}}) {
> + my $v = $json_result->{nvme_smart_health_information_log}->{$k};
> + if ($format eq 'text') {
> + $smartdata->{text} .= "$k : $v\n";
Previously, the "key : value" format would display human-readable info,
i.e. capitalized key names, spacing around ':' and units for the value.
With this, it simply prints out the json key/value.
I suppose this could also be fixed by adapting the GUI, but on the CLI
it would still look different.
> + } else {
> + $smartdata->{properties}->{$k} = $v;
> + }
> + if ($k eq 'percentage_used') {
> + $smartdata->{wearout} = $wearout;
> + }
> + }
> + delete $smartdata->{attributes}; # this is for ata disks only
> + } else {
> + $smartdata->{type} = 'ata';
> + foreach my $attribute (@{$json_result->{ata_smart_attributes}->{table}}) {
> + # we need to override or delete some fields to fit expected json schema of unit tests
while true, I feel like the reason isn't really the unit tests but
rather consumers (i.e. the GUI) depending on it ;)
> + my $flags_string = $attribute->{flags}->{string};
> + $flags_string =~ s/\s+$//;
> + $attribute->{flags} = $flags_string;
> + $attribute->{raw} = $attribute->{raw}->{string};
> + $attribute->{threshold} = delete $attribute->{thresh};
> + $attribute->{fail} = ${attribute}->{when_failed} eq "" ? "-" : ${attribute}->{when_failed};
> + delete ${attribute}->{when_failed};
> +
> + push @{$smartdata->{attributes}}, $attribute;
> + }
> + my @sorted_attributes = sort { $a->{id} <=> $b->{id} } @{$smartdata->{attributes}};
> + @{$smartdata->{attributes}} = @sorted_attributes;
> + }
> +
> + my $exit_status = $json_result->{smartctl}->{exit_status};
> + if ($exit_status ne 0) {
> + die "Error getting S.M.A.R.T. data: Exit code: $exit_status\n";
> + }
maybe do this before parsing?
>
> return $smartdata;
> }
More information about the pve-devel
mailing list