[pve-devel] [PATCH v2 pve-storage 1/2] lvmplugin: use blkdiscard when supported instead cstream to saferemove drive

Wed Oct 22 11:43:15 CEST 2025

On 22.10.25 at 11:35 AM, DERUMIER, Alexandre wrote:
>> +            }
>> +
>> +            my $discard_enabled = undef;
>> +
>> +            if ($scfg->{'saferemove-discard'}) {
>> +                my $discard_zeroes_data =
>> +                    file_read_firstline("$sysdir/queue/discard_zeroes_data") // 0;
> 
>>> Are you sure this works? See:
>>> https://www.kernel.org/doc/html/v6.17/admin-guide/abi-stable.html#abi-sys-block-disk-queue-discard-zeroes-data
>>>
>>> "[RO] Will always return 0. Don’t rely on any specific behavior for
>>> discards, and don’t read this file."
> 
> Ah, you are right, it was removed some years ago:
> https://git.zx2c4.com/linux-rng/commit/?h=jd/vdso-test-harness&id=48920ff2a5a940cd07d12cc79e4a2c75f1185aee
>>>
> From what I understand, it was a hack from the time when
> REQ_OP_WRITE_ZEROES was not implemented yet, so discard was used for
> zeroing (when that was possible).
> 
> 
>>> Isn't discard_max_hw_bytes the correct one, which also can be used to
>>> determine the step size:
>>> https://www.kernel.org/doc/html/v6.17/admin-guide/abi-stable.html#abi-sys-block-disk-queue-discard-max-hw-bytes
> 
>>> "[RO] Devices that support discard functionality may have internal
>>> limits on the number of bytes that can be trimmed or unmapped in a
>>> single operation. The discard_max_hw_bytes parameter is set by the
>>> device driver to the maximum number of bytes that can be discarded in
>>> a single operation. Discard requests issued to the device must not
>>> exceed this limit. A discard_max_hw_bytes value of 0 means that the
>>> device does not support discard functionality."
> 
> Hmm, it's not that simple, because queue/discard_zeroes_data was a
> flag to know whether the discarded blocks on the storage are also
> really zero-filled at the same time (RZAT TRIM, return zero after trim).
> 
> I know this was quite buggy because of storage implementation bugs;
> maybe this is why Red Hat removed discard support later:
> https://access.redhat.com/errata/RHBA-2018:0135
> "The kernel no longer supports the
> /sys/block/dm-X/queue/discard_zeroes_data file in sysfs. It is therefore
> no longer possible to determine whether discarded blocks from a block
> device returns zeros or the actual data. Therefore, the virtual machine
> disk properties "Wipe After Delete" and "Enable Discard" are no longer
> supported at the same time. (BZ#1529305)"
> 
> 
> So, maybe we don't need to use discard at all, REQ_OP_WRITE_ZEROES is
> enough. (From my test, it's quite fast)

Oh sorry, of course discard by itself is not good enough. Yes, let's
just go with always using --zeroout (or the fallback if
REQ_OP_WRITE_ZEROES is not supported).
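
A rough sketch of how that decision could look, not tested against the
actual plugin code. It assumes queue/write_zeroes_max_bytes in sysfs is
the right indicator (0 meaning the device has no write-zeroes support),
and the helper names read_first_line, write_zeroes_supported and
zeroing_command are made up for illustration. Only the blkdiscard
options --zeroout and --step are real.

```perl
use strict;
use warnings;

# Hypothetical helper: first line of a file, or undef if it cannot be read.
sub read_first_line {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    my $line = <$fh>;
    close($fh);
    return undef if !defined($line);
    chomp($line);
    return $line;
}

# Assumption: a non-zero write_zeroes_max_bytes means the device can
# handle REQ_OP_WRITE_ZEROES natively.
sub write_zeroes_supported {
    my ($sysdir) = @_;
    my $max = read_first_line("$sysdir/queue/write_zeroes_max_bytes") // 0;
    return ($max =~ /^\d+$/ && $max > 0) ? 1 : 0;
}

# Returns the command as an array ref, or undef if the caller should
# use the fallback path instead.
sub zeroing_command {
    my ($sysdir, $dev, $step_bytes) = @_;
    return undef if !write_zeroes_supported($sysdir);
    return ['blkdiscard', '--zeroout', '--step', $step_bytes, $dev];
}
```

Taking the sysfs directory as a parameter keeps the check testable
without a real block device.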

>>> And I'm not sure a limit of 32 MiB makes sense then. If the hardware
>>> supports much more, it should be fine to use that, right?
> 
> The default of 32 MB for zeroing came from a Red Hat benchmark on real
> SAN hardware, as a balance between zeroing speed and latency for the
> running VM workload.

For zeroing, the default is perfectly sensible, I was only talking about
discard here.
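
To illustrate the distinction, here is a minimal sketch, only relevant
if a discard path were kept at all. The zeroing step stays at the fixed
default, while a discard step would follow what the device reports in
queue/discard_max_hw_bytes. The names ZERO_STEP_DEFAULT,
read_sysfs_uint and discard_step are hypothetical, and the 32 MiB
constant simply reflects the default discussed above.

```perl
use strict;
use warnings;

# Fixed step for zeroing, as discussed above (assumed to be 32 MiB).
use constant ZERO_STEP_DEFAULT => 32 * 1024 * 1024;

# Hypothetical helper: leading unsigned integer of a sysfs attribute,
# or 0 if the file is missing or does not start with digits.
sub read_sysfs_uint {
    my ($path) = @_;
    open(my $fh, '<', $path) or return 0;
    my $line = <$fh> // '';
    close($fh);
    return $line =~ /^(\d+)/ ? int($1) : 0;
}

# Step size for discard: whatever the hardware reports, or undef when
# the device reports 0 (no discard support).
sub discard_step {
    my ($sysdir) = @_;
    my $max_hw = read_sysfs_uint("$sysdir/queue/discard_max_hw_bytes");
    return $max_hw > 0 ? $max_hw : undef;
}
```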