[pbs-devel] [PATCH proxmox-backup] docs: add note for not using remote storages

Dominik Csapak d.csapak at proxmox.com
Wed Jun 12 08:39:42 CEST 2024


On 6/11/24 8:05 PM, Thomas Lamprecht wrote:
> This section is a quite central and important one, so I'm being a bit
> more nitpicky with it than with other content. NFS boxes are still quite
> popular; a blanket recommendation against them quite probably won't
> help our cause or reduce noise in our help channels.
> 
> Dietmar already applied this, so would need a follow-up please.

sure

> 
> Am 11/06/2024 um 11:30 schrieb Dominik Csapak:
>> such as NFS or SMB. They will not provide the expected performance
>> and it's better to recommend against them.
> 
> Not so sure about recommending against them as a blanket statement,
> the "remote" adjective is a bit subtle and, e.g., using a local
> full flash NVMe storage attached over a 100G link with latency in the µs
> surely beats basically any local spinner only storage and probably even
> a lot of SATA attached SSD ones.

well, the mere fact of using nfs makes some operations a few orders of
magnitude slower. e.g. here, creating a datastore locally takes a few
seconds (probably fast due to the page cache), but on a locally
mounted nfs (so no network involved) on the same disk it takes
a few minutes. so at least some file creation/deletion operations
are orders of magnitude slower just by using nfs (though i guess
there are some options/implementations that can influence that,
such as async/sync export options)

also a remote SMB share from windows (same physical host though, so
again, no real network) takes ~ a minute for the same operation

so yes, while I generally agree that using remote storage can be fast
enough, using any of them slows down some file operations by a
significant amount, even when using fast storage and a fast network

(i know that datastore creation is not the best benchmark for this,
but shows that there is significant overhead on some operations)
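
For anyone who wants to reproduce this kind of comparison without creating a
whole datastore, a rough sketch of a metadata micro-benchmark could look like
the following. This is not what produced the timings above; the function name
`time_metadata_ops` and the default count are made up for illustration. It
only times creating and removing many small directories and files below a
given path, so it can be pointed once at a local directory and once at an
NFS/SMB mount.

```python
import os
import shutil
import tempfile
import time


def time_metadata_ops(base_dir, count=1000):
    """Time creating and deleting `count` directories, each holding one tiny file.

    Returns (create_seconds, delete_seconds). Everything happens inside a
    fresh scratch directory below `base_dir`, which is removed afterwards.
    """
    root = tempfile.mkdtemp(prefix="metabench-", dir=base_dir)
    try:
        # Creation phase: one mkdir plus one small file per iteration.
        start = time.perf_counter()
        for i in range(count):
            sub = os.path.join(root, f"{i:04x}")
            os.mkdir(sub)
            with open(os.path.join(sub, "blob"), "wb") as f:
                f.write(b"x")
        create_seconds = time.perf_counter() - start

        # Deletion phase: remove the file, then the now-empty directory.
        start = time.perf_counter()
        for i in range(count):
            sub = os.path.join(root, f"{i:04x}")
            os.remove(os.path.join(sub, "blob"))
            os.rmdir(sub)
        delete_seconds = time.perf_counter() - start
    finally:
        # Clean up the scratch directory even if something failed midway.
        shutil.rmtree(root, ignore_errors=True)
    return create_seconds, delete_seconds
```

Calling it with a local path and then with a mount point (for example a
hypothetical `/mnt/nfs-test`) gives two pairs of numbers to compare. The page
cache and export options (sync/async) will influence the result, so this is
only a coarse indicator.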

> 
> Also, it can be totally fine to use as second datastore, i.e. in a setup
> with a (smaller) datastore backed by (e.g. local) fast storage that is
> then periodically synced to a slower remote.
> 
>> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
>> ---
>> if we want to discourage users even more, we could also detect it on
>> datastore creation and put a warning into the task log
> 
> I would avoid that, at least not without actually measuring how the
> storage performs (which is probably quite prone to errors, or would
> require periodic measurements).

fine with me

> 
>>
>> also if we ever come around to implementing the 'health' page thomas
>> wished for, we can put a warning/error there too
>>
>>   docs/system-requirements.rst | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/docs/system-requirements.rst b/docs/system-requirements.rst
>> index fb920865..17756b7b 100644
>> --- a/docs/system-requirements.rst
>> +++ b/docs/system-requirements.rst
>> @@ -41,6 +41,9 @@ Recommended Server System Requirements
>>     * Use only SSDs, for best results
>>     * If HDDs are used: Using a metadata cache is highly recommended, for example,
>>       add a ZFS :ref:`special device mirror <local_zfs_special_device>`.
>> +  * While it's technically possible to use remote storages such as NFS or SMB,
> 
> Up-front: I first wrote some possible smaller improvements and then
> a replacement (see below), but I kept the smaller ones anyway
> 
> Would do s/remote storages/remote storage/
> 
> (We use "storages" quite a few times already, but if possible keeping it
> singular sounds nicer IMO)

ok

> 
>> +    the additional latency and overhead drastically reduces performance and it's
> 
> s/additional latency and overhead/additional latency overhead/ ?
> 
> or "network overhead"
> 
> If it'd stay as is, the "reduces" should be changed to "reduce" ("latency and
> overhead" is plural).
> 

i actually meant two things here: the network latency and the additional
overhead of the second filesystem layer

> 
>> +    not recommended to use such a setup.
> 
> The last part would be better off with just:
> 
> "... and is not recommended"
> 

agreed, i was on the edge a bit with that wording anyway but just 
leaving it off sounds better.

> 
> But I'd rather reword the whole thing to focus more on what the actual issue is,
> i.e., not NFS or SMB/CIFS per se, but if the network accessing them is slow.
> Maybe something like:
> 
> * Avoid using remote storage, like NFS or SMB/CIFS, connected over a slow
>    (< 10 Gbps) and/or high latency (> 1 ms) link. Such a storage can
>    dramatically reduce performance and may even negatively impact the
>    backup source, e.g. by causing IO hangs.
> 
> I pulled the numbers in parentheses out of thin air, but IMO they shouldn't be too far
> off from 2024 Slow™, no hard feelings on adapting them though.

IMHO i'd not mention any specific numbers at all, unless we actually
benchmarked such a setup. so what about:

* Avoid using remote storage, like NFS or SMB/CIFS, connected over a 
slow and/or high latency link. Such a storage can dramatically reduce 
performance and may even negatively impact the backup source, e.g. by
causing IO hangs. If you want to use such a storage, make sure it
performs as expected by testing it before using it in production.


By adding that additional sentence we hopefully nudge some users
into actually testing before deploying it, instead of then
complaining that it's slow.
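
As an illustration of what "testing before production" could mean at the
simplest level, here is a minimal sketch that measures the latency of small
synchronous writes on a candidate path. The names `sync_write_latencies` and
`summarize`, as well as the default write count and size, are assumptions for
this example only. It is no substitute for a real benchmark or for running
actual backup, verify and garbage collection jobs against the storage.

```python
import os
import statistics
import tempfile
import time


def sync_write_latencies(base_dir, writes=200, size=4096):
    """Return per-write latencies (seconds) for small fsync'ed writes."""
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(prefix="synctest-", dir=base_dir)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data out to the backing storage
            latencies.append(time.perf_counter() - start)
    finally:
        # Always close and remove the scratch file.
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Return (median, maximum) of the measured latencies, in seconds."""
    return statistics.median(latencies), max(latencies)
```

A large gap between the median and the maximum, or a median far above what a
local disk shows, would be a hint that the link or the storage behind it
deserves a closer look before it is used as a datastore.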


> 
>>   
>>   * Redundant Multi-GBit/s network interface cards (NICs)
>>   
> 



