[RFC storage/qemu-server] Thin provisioning on LVM

Joao Sousa joao.sousa at eurotux.com
Fri Jul 25 12:00:40 CEST 2025


Hi,

Following up on the earlier discussion with Alexandre, here is a proposed 
architecture for thin-provisioned LVs on LVM. The idea is to implement a 
daemon that processes LV extend requests from a queue.

We considered two possible implementations for the queue and the daemon:

1. One queue and daemon per node. This approach increases complexity, 
particularly for live migrations and node failures. If a node fails, 
other nodes would need to "steal" pending requests from the failed 
node’s queue. It also introduces challenges in preserving the execution 
order of extend operations, since multiple daemons would compete for a 
storage lock without a guaranteed order.

2. A centralized queue in /etc/pve and one daemon per node. Each daemon 
would check the first entry in the queue and process the extend request 
only if the target volume is local to that node. This approach is 
simpler and easier to manage. However, we’d need to ensure proper 
locking when writing to the queue. Is there a C-based alternative to 
cfs_lock_file that we can use to coordinate writes to the queue from 
qmeventd and pvestatd?
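
To make option 2 more concrete, here is a minimal Python sketch of the 
head-of-queue check. It is only an illustration: ExtendRequest, 
ExtendQueue, process_head and the local_volumes set are hypothetical 
names, the queue is kept in memory rather than in /etc/pve, and the 
threading.Lock merely stands in for whatever cluster-wide lock would 
protect the real queue file.

```python
import threading
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendRequest:
    volid: str  # volume to extend (hypothetical identifier format)


class ExtendQueue:
    """In-memory stand-in for the shared queue proposed for /etc/pve."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()  # placeholder for a cluster-wide lock

    def submit(self, req):
        """Append a request at the end of the queue."""
        with self._lock:
            self._items.append(req)

    def process_head(self, local_volumes, extend):
        """Process the first entry only if its volume is local to this node.

        Returns the processed request, or None when the queue is empty or
        the head belongs to another node (it is left for that node's daemon).
        """
        with self._lock:
            if not self._items:
                return None
            head = self._items[0]
            if head.volid not in local_volumes:
                return None
            extend(head)  # caller-supplied; would perform the LV extend
            self._items.popleft()
            return head
```

Each node's daemon would call process_head() periodically with the set 
of volumes it currently hosts. One consequence of a strict head-only 
check is that an entry whose owning node is unavailable would hold up 
the entries behind it, so that case probably needs separate handling.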

For both implementations, we need to configure a write threshold for 
each VM's block devices. When this threshold is reached, it should 
trigger an event that qmeventd catches. As a fallback, if a VM is locked 
due to an I/O error, pvestatd should also submit an extend request. That 
fallback request should be prioritized by placing it at the front of the 
queue.
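
The two submission paths could be sketched as follows in Python. The 
source tags and the enqueue() helper are hypothetical, and a plain deque 
stands in for the real queue; the only point is that fallback requests 
go to the front while threshold events keep their arrival order.

```python
from collections import deque

# Hypothetical tags naming which daemon submitted the request.
THRESHOLD_EVENT = "qmeventd"    # write threshold reached
IO_ERROR_FALLBACK = "pvestatd"  # VM stopped on an I/O error


def enqueue(queue, volid, source):
    """Add an extend request; I/O-error fallbacks jump to the front."""
    entry = (volid, source)
    if source == IO_ERROR_FALLBACK:
        queue.appendleft(entry)  # prioritized: handled before pending entries
    else:
        queue.append(entry)      # normal case: keep arrival order
    return queue
```

This sketch does not deduplicate, so a volume that already has a pending 
threshold request would get a second entry if the fallback fires as 
well; whether that is acceptable is an open design question.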

The write threshold must be set on the top node of the block device 
chain (drive-$drive_id) in the QemuServer::Blockdev::attach function 
when the VM starts. It must also be set again each time the volume is 
extended, so the daemon has to re-arm it after every extend.
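
As an illustration of what setting the threshold involves, the Python 
sketch below builds the QMP block-set-write-threshold command for a 
drive. The command name and its node-name / write-threshold arguments 
are QEMU's; the helper names and the policy of firing a fixed headroom 
before the end of the LV are assumptions made for this example only.

```python
import json

GiB = 1024 ** 3


def write_threshold(lv_size, headroom):
    """Byte offset at which the event should fire (assumed policy:
    `headroom` bytes before the end of the LV)."""
    if headroom <= 0 or headroom >= lv_size:
        raise ValueError("headroom must be positive and smaller than the LV")
    return lv_size - headroom


def threshold_command(drive_id, lv_size, headroom):
    """Build the QMP command that arms the threshold on drive-<drive_id>."""
    return {
        "execute": "block-set-write-threshold",
        "arguments": {
            "node-name": f"drive-{drive_id}",
            "write-threshold": write_threshold(lv_size, headroom),
        },
    }


if __name__ == "__main__":
    # 10 GiB LV, ask for an event once writes pass the 9 GiB mark.
    print(json.dumps(threshold_command("scsi0", 10 * GiB, 1 * GiB)))
```

QEMU documents the resulting BLOCK_WRITE_THRESHOLD event as one-shot, 
which is why the same command would have to be sent again, with the new 
LV size, after each extend.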

Here’s a simplified flow of the architecture:

qemu -> qmeventd -> extend_queue <- storage_monitor_daemon
      -> pvestatd

I’m currently implementing the write threshold in the attach function 
but am having trouble debugging it. Are there any recommended methods 
or tools for debugging qemu-server functions? I’m not seeing any 
relevant logs in syslog.


