orphaned cfs lock when interrupting qm disk import

Joshua Huber jhuber at blockbridge.com
Mon Aug 26 22:46:10 CEST 2024


Hi everyone,

Say you've just kicked off a long-running "qm disk import ..." command and
then notice that an incorrect flag was specified. Ok: cancel the operation
with Ctrl-C, fix the flag, and re-execute the import command...

However, when using shared storage you'll (at least for the next 120
seconds) bump up against an orphaned cfs lock directory. One could manually
remove the directory from /etc/pve/priv/locks, but that seems a little
risky (and a bad habit to get into).

So this got me thinking... could we trap SIGINT and fail the operation
more gracefully? That would give the cfs_lock function a chance to clean up
after itself, which seems like a nice CLI quality-of-life improvement.
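To illustrate the idea, here's a minimal sketch (not the actual cfs_lock
implementation; the helper name and lock path are made up): converting
SIGINT into a Perl die() lets the interrupt unwind through the eval that
wraps the locked code, so the lock directory still gets removed.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical cfs_lock-style helper: create a lock directory, run the
    # worker inside eval, and always remove the directory afterward.
    sub with_lock_dir {
        my ($lockdir, $code) = @_;
        mkdir $lockdir or die "unable to acquire lock '$lockdir': $!\n";
        my $res = eval { $code->() };
        my $err = $@;
        rmdir $lockdir; # cleanup runs even when $code died
        die $err if $err;
        return $res;
    }

    # Turning SIGINT into die() means Ctrl-C unwinds through the eval
    # above instead of killing the process outright, so rmdir still runs.
    local $SIG{INT} = sub { die "interrupted by user\n" };

    with_lock_dir('/tmp/demo-lock', sub {
        sleep 30; # stand-in for a long-running disk import
    });

With the default SIGINT disposition, by contrast, the process is
terminated mid-eval and the rmdir never happens, which matches the
orphaned-lock behavior above.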

However, I'm not familiar with the history of the cfs-lock mechanism, why
it's used for shared storage backends, or what other invalid PVE states
might be avoided as a side effect of serializing storage operations.
Allowing concurrent operations could result in disk-numbering collisions
(sketched below), but I'm not sure what else, apart from storage-specific
limitations.
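For a concrete (hypothetical) example of the collision case: if disk names
are allocated by scanning for the next free index, two unserialized
operations on different nodes can both observe the same free slot and pick
the same name. (This is not the real allocator, just an illustration.)

    use strict;
    use warnings;

    # Hypothetical next-free-name scan: without a cluster-wide lock, two
    # nodes running this concurrently can both see the same index as free
    # and try to create the same volume.
    sub next_free_disk_name {
        my ($vmid, $existing) = @_; # $existing: hashref of names in use
        for (my $i = 0; ; $i++) {
            my $name = "vm-$vmid-disk-$i";
            return $name if !$existing->{$name};
        }
    }

    my $name = next_free_disk_name(100, { 'vm-100-disk-0' => 1 });
    print "$name\n"; # vm-100-disk-1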

The README in the pve-cluster repo was helpful but a bit limited in scope.
Could anyone shed some more light on this for me?

Thanks in advance,
Josh

