[pve-devel] orphaned cfs lock when interrupting qm disk import

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Sep 6 19:38:07 CEST 2024


Hi,

On 26/08/2024 at 22:46, Joshua Huber wrote:
> Say you've just kicked off a long-running "qm disk import ..." command and
> notice that an incorrect flag was specified. Ok, cancel the operation with
> control-C, fix the flag and re-execute the import command...
> 
> However, when using shared storage you'll (at least for the next 120
> seconds) bump up against an orphaned cfs lock directory. One could manually
> remove the directory from /etc/pve/priv/locks, but it seems a little risky.
> (and a bad habit to get into)
> 
> So this got me thinking... could we trap SIGINT and more gracefully fail
> the operation? It seems like this would allow the cfs_lock function to
> clean up after itself. This seems like it'd be a nice CLI QOL improvement.

Yeah, that sounds sensible, although I'd have to look more closely into the
code, because it might not be that trivial if we execute another command,
e.g. qemu-img, that then controls the terminal and would receive the
SIGINT directly. There are options for that too, but they are not so nice.
Anyhow, as long as this all happens in a worker it probably should not
be an issue, and one could just install a handler with
`$SIG{INT} = sub { ... cleanup ...; die "interrupted" };`
and be done.
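A minimal sketch of that idea in plain Perl, outside of any PVE code, assuming the lock is simply a directory created with mkdir() (atomic), like the lock directories mentioned above. `with_dir_lock` and the lock name are hypothetical and are not PVE::Cluster's cfs_lock API. One deliberate variation: instead of doing the cleanup inside the handler, the handler only turns SIGINT into a die(), so a single cleanup path runs on success, on error and on Ctrl-C alike.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper, NOT PVE::Cluster's cfs_lock: take a directory lock
# (mkdir is atomic), run $code, and always remove the lock again, even
# when the user hits Ctrl-C.
sub with_dir_lock {
    my ($lockdir, $code) = @_;

    mkdir($lockdir) or die "cannot acquire lock '$lockdir': $!\n";

    my $res = eval {
        # Scoped to this eval: turn SIGINT into a normal die() so the
        # cleanup below runs; the previous handler is restored on exit.
        local $SIG{INT} = sub { die "interrupted by SIGINT\n" };
        $code->();
    };
    my $err = $@;

    # Reached on success, on error and on Ctrl-C alike.
    rmdir($lockdir) or warn "could not remove lock '$lockdir': $!\n";

    die $err if $err;
    return $res;
}

# Demo in a throw-away directory instead of /etc/pve.
my $base = tempdir(CLEANUP => 1);
my $lock = File::Spec->catdir($base, 'storage-demo');    # hypothetical name

# Simulate Ctrl-C arriving in the middle of a long-running import.
eval { with_dir_lock($lock, sub { kill 'INT', $$; sleep 5; 'done' }) };
print "error: $@";
print((-d $lock) ? "lock still held\n" : "lock released\n");
```

If the long-running part is an external command such as qemu-img that owns the terminal, whether this handler fires at all depends on how that command is spawned, which is exactly the caveat above.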

> However, I'm not familiar with the history of the cfs-lock mechanism, why
> it's used for shared storage backends, and what other invalid PVE states
> might be avoided as a side-effect of serializing storage operations.
> Allowing concurrent operations could result in disk numbering collisions,
> but I'm not sure what else. (apart from storage-specific limitations.)

In general, the shared lock is there to avoid concurrent access to the same
volume, but it's currently much coarser than it needs to be, i.e., it could
be on just the volume, or at least on the VMID for volumes that are owned by
guests. But that's a rather different topic.

Anyhow, once the operation is interrupted, keeping the lock won't protect us
from anything, especially as it will become free after 120s anyway, as you
noticed yourself. So actively removing it right away should not cause any
(more) problems, FWICT.

Are you willing to look into this? Otherwise, a bugzilla entry would be fine
too.

cheers
 Thomas