[PVE-User] shared LVM on host-based mirrored iSCSI LUNs
Stefan Sänger
stsaenger at googlemail.com
Mon Apr 23 17:37:41 CEST 2012
Hi Dietmar,
On 23.04.2012 13:21, Dietmar Maurer wrote:
>> Since I did not configure any locking for mdadm, I expected mdadm to
>> corrupt the contents of the logical volumes used as virtual hard
>> disks.
>>
>> But to my surprise, fsck did not reveal any errors.
>>
>> So my question is: is there some locking already in place that I just missed?
>> clvm is installed but evidently not used: /etc/lvm.conf is set to file-based
>> locking, and the locking_dir is local to every server.
>
> Yes, we have cluster wide locking, as long as you use the pve tools to manage storage.
Well, as far as I understand, the cluster-wide locking works fine for
DRBD or iSCSI/FC targets.
What I am really not sure about is the RAID setup, where mdadm mirrors
two iSCSI targets.
The fault scenario I am thinking about is this:
Node pve1 is running an HA-managed VM when it crashes. At the moment of
the crash, the VM was writing data to its hard disk. In normal operation
the data is passed to LVM, which passes it to mdadm, and mdadm writes
the data to each RAID member disk.
I suspect that the last write operation may have succeeded on only one
of the RAID members.
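To make the suspected failure concrete, here is a toy model (a sketch, not md's actual implementation) of a RAID1 write that is copied to each member in turn, with an optional simulated crash part-way through. The names ToyMirror, CrashDuringWrite and crash_after are hypothetical. The real driver issues the member writes in parallel rather than one after another, but either way one member can complete before the host dies. The model also deliberately ignores md's own recovery machinery, such as resync after an unclean shutdown and write-intent bitmaps.

```python
from typing import Dict, List, Optional


class CrashDuringWrite(Exception):
    """Simulates the host dying part-way through a mirrored write."""


class ToyMirror:
    """Hypothetical toy RAID1: every logical write is copied to each member.

    Not md's implementation: md issues member writes in parallel and has
    its own recovery logic (resync, write-intent bitmaps), none of which
    is modelled here.
    """

    def __init__(self, n_members: int = 2) -> None:
        # Each member disk is modelled as a dict: block number -> data.
        self.members: List[Dict[int, bytes]] = [{} for _ in range(n_members)]

    def write(self, block: int, data: bytes, crash_after: Optional[int] = None) -> None:
        # Copy the data to each member in turn; if crash_after is given,
        # stop once that many members have been written.
        for i, member in enumerate(self.members):
            if crash_after is not None and i == crash_after:
                raise CrashDuringWrite(f"host died after writing {i} member(s)")
            member[block] = data

    def diverged_blocks(self) -> List[int]:
        # Blocks whose contents are not identical on every member.
        blocks = set().union(*self.members)
        return sorted(b for b in blocks if len({m.get(b) for m in self.members}) > 1)


mirror = ToyMirror()
mirror.write(7, b"old")  # completes on both members
try:
    mirror.write(7, b"new", crash_after=1)  # only member 0 gets the update
except CrashDuringWrite:
    pass
print(mirror.diverged_blocks())        # [7]
print([m[7] for m in mirror.members])  # [b'new', b'old']
```

After the interrupted write, the two members disagree about block 7, and nothing at the LVM layer records that fact.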
Now the cluster steps in and does two things: it fences the failed node
and restarts the VM on another node, say pve2.
Since the fencing-initiated reboot of pve1 takes some time and pve2
starts booting the VM earlier, the RAID metadata will most likely still
say "clean" by the time pve1 reconnects to the storage, so that part
should not be a problem.
The physical extents used by the logical volume are a different matter,
though: the last write operation may have failed on one member, so the
two copies of those extents may now hold different data. When data is
read from a RAID1 volume, mdadm is supposed to do round-robin reading in
order to speed up disk access. I believe there is a 50/50 chance as to
which RAID member a given extent will be read from, so it is undefined
whether the correct data will be read. Or am I missing something here?
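To illustrate the read side, here is a sketch of a reader that simply alternates between mirror members, which is the round-robin behaviour assumed in this mail. Linux md raid1 actually picks a member using its own read-balancing heuristics rather than strict alternation, so this is a simplification. RoundRobinReader and the sample member contents are hypothetical: two members that disagree on block 7, as they would after an interrupted mirrored write.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinReader:
    """Hypothetical reader that alternates strictly between RAID1 members.

    Real md raid1 read balancing is heuristic, not strict round-robin;
    this only illustrates the effect of reading from divergent members.
    """

    def __init__(self, members: List[Dict[int, bytes]]) -> None:
        self.members = members
        # Endless sequence of member indices: 0, 1, 0, 1, ...
        self._next = cycle(range(len(members)))

    def read(self, block: int) -> bytes:
        # Pick the next member in turn and return whatever it holds.
        return self.members[next(self._next)][block]


# Two members that disagree on block 7 (hypothetical data).
members = [{7: b"new"}, {7: b"old"}]
reader = RoundRobinReader(members)
print([reader.read(7) for _ in range(4)])
# [b'new', b'old', b'new', b'old']
```

The same block returns different contents on successive reads, which is exactly why the result is undefined from the VM's point of view.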
The cluster-wide locking works at the LVM layer. My concern this time
is one layer further down: mdadm.
Stefan