[pve-devel] [PATCH storage 0/2] Fix #2046 and disksize-mismatch with shared LVM

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Jan 4 16:06:49 CET 2019


On Fri, Jan 04, 2019 at 02:41:00PM +0100, Stoiko Ivanov wrote:
> On Fri, 4 Jan 2019 14:12:23 +0100
> Alwin Antreich <a.antreich at proxmox.com> wrote:
> 
> > On Fri, Jan 04, 2019 at 02:06:23PM +0100, Stoiko Ivanov wrote:
> > > The issue was observed recently and can lead to potential dataloss.
> > > When using a shared LVM storage (e.g. over iSCSI) in a clustered
> > > setup only the node, where a guest is active notices the size
> > > change upon disk-resize (lvextend/lvreduce).
> > > 
> > > LVM's metadata gets updated on all nodes eventually (at the latest
> > > when pvestatd runs and lists all LVM volumes, since lvs/vgs update
> > > the metadata); however, the device files (/dev/$vg/$lv) on the nodes
> > > where the guest is not actively running do not reflect the change.
> > > 
> > > Steps to reproduce an I/O error (a rough command sketch follows the list):
> > > * create a qemu-guest with a disk backed by a shared LVM storage
> > > * create a filesystem on that disk and fill it to 100%
> > > * resize the disk/filesystem
> > > * put some more data on the filesystem
> > > * migrate the guest to another node
> > > * try reading past the initial disksize
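> > >
> > > Roughly, at the LVM level (the VG/LV names below are just placeholders):
> > >
> > >   # node A (guest running here): grow the LV
> > >   lvextend -L +1G sharedvg/vm-100-disk-0
> > >
> > >   # node B (LV active, guest not running): the LVM metadata already
> > >   # shows the new size ...
> > >   lvs --noheadings --units b --nosuffix -o lv_size sharedvg/vm-100-disk-0
> > >   # ... but the local device node still reports the old size
> > >   blockdev --getsize64 /dev/sharedvg/vm-100-disk-0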
> > > 
> > > The second patch fixes the size mismatch by running `lvchange
> > > --refresh` whenever we activate a volume with LVM; this should fix
> > > the critical issue.
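> > >
> > > Done by hand, the refresh amounts to roughly the following
> > > (placeholder names again, not the literal patch code):
> > >
> > >   # re-read the metadata and update the local device node
> > >   lvchange --refresh sharedvg/vm-100-disk-0
> > >   # the device node now matches the metadata again
> > >   blockdev --getsize64 /dev/sharedvg/vm-100-disk-0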
> > > 
> > > The first patch introduces a direct implementation of
> > > volume_size_info in LVMPlugin.pm, reading the volume size via
> > > `lvs` instead of falling back to `qemu-img info` from Plugin.pm.
> > > While this should always yield the same result after the second
> > > patch on the node where a guest is currently running, there might
> > > still be a mismatch when the LV is already active on a node (e.g.
> > > after a fresh boot) and gets resized on another node.
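> > >
> > > For illustration (placeholder names, not the literal patch code),
> > > the difference boils down to asking LVM directly:
> > >
> > >   lvs --noheadings --units b --nosuffix -o lv_size sharedvg/vm-100-disk-0
> > >
> > > versus asking qemu-img, which reads the (possibly stale) device node:
> > >
> > >   qemu-img info --output=json /dev/sharedvg/vm-100-disk-0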
> > I faintly recall that there was a discussion offline about changing
> > the activation of LVs, especially for booting - something along the
> > lines of 'we only activate LVs if we need them on the specific node'.
> 
> Could make sense in general. However, the volumes might still get
> activated by some external invocation (e.g. running `vgchange -ay`) -
> then the refresh is still necessary.
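>
> For example, a plain
>
>   vgchange -ay sharedvg
>
> (VG name is a placeholder) activates every LV in that VG on the node,
> regardless of whether a guest actually needs it there.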

specifically, we discussed setting the 'k' attribute ('activationskip')
on PVE-managed LVs - these will then be ignored by all the usual
activation on boot, since those do not pass '--ignoreactivationskip' /
'-K' for obvious reasons ;)
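
for illustration (placeholder names, untested sketch):

  # mark the LV so that the usual auto-activation skips it
  lvchange --setactivationskip y sharedvg/vm-100-disk-0

  # a plain 'lvchange -ay'/'vgchange -ay' (e.g. on boot) now skips the
  # LV; only an explicit activation with -K/--ignoreactivationskip
  # brings it up:
  lvchange -ay -K sharedvg/vm-100-disk-0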

having the refresh here is a good idea anyway, but maybe we want to
re-evaluate the above mechanism for 6.x?



