[pve-devel] [PATCH SERIES storage/qemu-server/-manager] RFC : add lvmqcow2 storage support

Dominik Csapak d.csapak at proxmox.com
Wed Aug 28 14:53:30 CEST 2024


On 8/26/24 13:00, Alexandre Derumier via pve-devel wrote:
> 
> This patch series adds support for a new lvmqcow2 storage format.
> 
> Currently, we can't do snapshots && thin provisioning on shared block devices because
> lvm-thin can't share its metadata volume. I have a lot of on-prem VMware customers
> where this is really blocking the migration to Proxmox (and they are looking at
> oVirt/Oracle virtualisation, where it works fine).
> 
> It's possible to format a block device directly with qcow2, without any filesystem.
> Red Hat RHEV/oVirt have been doing this in their VDSM daemon for almost 10 years.
> 
> For thin provisioning, or to handle the extra space needed by snapshots, we need to be able
> to resize the LVM volume dynamically.
> The volume is grown in chunks of 1GB by default (this can be changed).
> QEMU implements an event to send an alert when write usage reaches a threshold.
> (The threshold is at 50% of the last chunk, i.e. when the VM has 500MB free.)
> 
> The resize is async (around 2s), so the user needs to choose a suitable chunk size && threshold
> if the storage is really fast (NVMe for example, where you can write more than 500MB in 2s).
> 
> If the resize is not fast enough, the VM will pause in io-error.
> pvestatd watches for this error, extends the volume again if needed, and resumes the VM.


Hi,

just my personal opinion, maybe you also want to wait for more feedback from somebody else...
(also, I just glanced over the patches, so correct me if I'm wrong)

I see some problems with this approach (some maybe fixable, some probably not?)

* as you mentioned, if the storage is fast enough you get a runaway VM
   this is IMHO not acceptable, as it leads to VMs that are completely blocked and
   can't do anything. I fear this will generate many support calls asking why
   guests are stopped/hanging...

* the code says containers are supported (rootdir => 1), but I don't see how?
   there is AFAICS no code to handle them in any way...
   (maybe that was just copied over by mistake?)

* you lock the local blockextend call, but give it a timeout of 60 seconds.
   what happens if that timeout expires? the VM again stays completely blocked
   until pvestatd gets around to resizing it

* IMHO pvestatd is the wrong place to make such a call. It is already doing a lot
   of work in a way where a single storage operation blocks many other things
   (metrics, storage/VM status, ballooning, etc.)

   cramming another thing in there seems wrong and will only lead to even more
   people complaining about pvestatd not working, only that in this case the VMs
   will additionally be stuck in an io-error state indefinitely.

   I'd rather have a separate daemon/program, or somehow integrate it into
   qmeventd (but then it would have to become multi-threaded/multi-process/etc.
   so as not to block its other purposes)
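
   Just to illustrate the idea (this is not the series' actual code, and the
   socket path, node name and LV path below are made-up placeholders): a
   standalone watcher built around QEMU's block-set-write-threshold command and
   BLOCK_WRITE_THRESHOLD event, which is presumably the mechanism the cover
   letter refers to, could look roughly like this in Python. Locking, error
   handling and any device refresh QEMU may need after the resize are left out
   on purpose:

      import json
      import socket
      import subprocess

      QMP_SOCK = "/var/run/qemu-server/100.qmp"  # hypothetical per-VM QMP socket
      NODE = "drive-scsi0"                       # hypothetical block node name
      LV = "/dev/vg_shared/vm-100-disk-0"        # hypothetical shared LVM volume
      CHUNK = "1G"                               # grow by 1G, as in the series
      HEADROOM = 512 * 1024**2                   # alert at ~50% of the last chunk

      def qmp_send(sock, cmd, **args):
          # send a single QMP command as one JSON line
          msg = {"execute": cmd, "arguments": args}
          sock.sendall(json.dumps(msg).encode() + b"\n")

      def lv_size(path):
          # current size of the block device backing the LV, in bytes
          out = subprocess.run(["blockdev", "--getsize64", path],
                               capture_output=True, text=True, check=True)
          return int(out.stdout.strip())

      def arm_threshold(sock):
          # fire once a guest write goes past (current LV size - HEADROOM)
          threshold = lv_size(LV) - HEADROOM
          qmp_send(sock, "block-set-write-threshold",
                   **{"node-name": NODE, "write-threshold": threshold})

      def main():
          sock = socket.socket(socket.AF_UNIX)
          sock.connect(QMP_SOCK)
          stream = sock.makefile("r")
          stream.readline()                      # QMP greeting
          qmp_send(sock, "qmp_capabilities")
          arm_threshold(sock)
          for line in stream:                    # command replies are simply skipped
              if json.loads(line).get("event") != "BLOCK_WRITE_THRESHOLD":
                  continue
              # the slow (~2s) part: a fast guest can fill the remaining headroom
              # before this returns, which is exactly the runaway case above
              subprocess.run(["lvextend", "-L", f"+{CHUNK}", LV], check=True)
              qmp_send(sock, "cont")             # resume in case the VM paused on io-error
              arm_threshold(sock)                # re-arm for the next chunk

      if __name__ == "__main__":
          main()

   The point being: everything interesting happens between receiving the event
   and lvextend returning, and that window is what a separate, non-blocking
   daemon would have to keep as short as possible.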

* there is no cluster locking?
   you only mention

   ---8<---
   #don't use global cluster lock here, use on native local lvm lock
   --->8---

   but don't configure any locking? (AFAIR LVM cluster locking needs additional
   configuration and daemons, i.e. lvmlockd?)

   this *will* lead to errors if multiple VMs on different hosts try
   to resize at the same time.

   even with cluster locking, this will very soon lead to contention, since
   storage operations are inherently expensive, e.g. if I have
   10-100 VMs wanting to resize at the same time, some of them will run
   into a timeout or at least into the blocking state.

   That does not even need much IO, just bad luck when multiple VMs go
   over the threshold within a short time.
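
   To put rough numbers on that (purely illustrative, assuming every serialized
   extend takes about as long as the ~2s resize quoted above, and reusing the
   60s lock timeout from the patch):

      # back-of-the-envelope: N VMs hit their threshold at the same time and
      # all resize calls end up serialized behind one cluster-wide lock
      EXTEND_SECONDS = 2    # per locked lvextend, per the cover letter
      LOCK_TIMEOUT = 60     # the series' local lock timeout

      for n_vms in (10, 50, 100):
          worst_wait = n_vms * EXTEND_SECONDS
          verdict = "over" if worst_wait > LOCK_TIMEOUT else "under"
          print(f"{n_vms:3d} VMs -> last one waits ~{worst_wait}s "
                f"({verdict} the {LOCK_TIMEOUT}s timeout)")

   i.e. somewhere between 10 and 50 simultaneous resizes the unlucky VMs already
   wait longer than the 60s timeout, and their guests sit paused in io-error for
   that whole time.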

All in all, I'm not really sure if the gain (snapshots on shared LVM) is worth
the potential cost in maintenance, support and customer dissatisfaction with
stalled/blocked VMs.

Generally a better approach could be for your customers to use some
kind of shared filesystem (GFS2/OCFS/?). I know those are not really
tested or supported by us, but I would hope that they scale and behave
better than qcow2-on-lvm-with-dynamic-resize.

best regards
Dominik



