[pve-devel] Default cache mode for VM hard drives

Stanislav German-Evtushenko ginermail at gmail.com
Thu May 28 09:55:37 CEST 2015


Alexandre,

This is all correct, but it is not related to the inconsistency issue.

Stanislav

On Thu, May 28, 2015 at 10:44 AM, Alexandre DERUMIER <aderumier at odiso.com>
wrote:

> >>That is right, and you just can't use O_DIRECT without alignment. You
> would just get an error from the "write" system call. If you check
> drbd_oos_test.c, you will find posix_memalign there.
> http://people.redhat.com/msnitzer/docs/io-limits.txt
> "Direct I/O best practices
> -------------------------
> Users must always take care to use properly aligned and sized IO.  This
> is especially important for Direct I/O access.  Direct I/O should be
> aligned on a 'logical_block_size' boundary and in multiples of the
> 'logical_block_size'.  With native 4K devices (logical_block_size is 4K)
> it is now critical that applications perform Direct I/O that is a
> multiple of the device's 'logical_block_size'.  This means that
> applications that do not perform 4K aligned I/O, but 512-byte aligned
> I/O, will break with native 4K devices.
> "
>
> About qemu (this concerns qcow2; I think raw is OK):
>
> http://lists.gnu.org/archive/html/qemu-discuss/2015-01/msg00051.html
>
> "
> qcow2 cannot store the "physical block size" as an explicit
> property.  But what you can do is the following:
>
> 1. Make sure the host physical disk partition system that stores
> the qcow2 file is aligned to a multiple of 4K (or the RAID block
> size if on a RAID system).
>
> 2. Make sure the host file system that stores the qcow2 file has
> a block size of 4K or a multiple of 4K.
>
> 3. Make sure the internal qcow2 cluster_size is 4K or a multiple
> of 4K (I think this is the default).  Otherwise this is set using
> the "-o" "cluster_size=4096" option to qemu-img create/convert.
>
> 4. Make sure the guest partition on the virtual disk (backed by
> the qcow2 file) is aligned on a multiple of the qcow2
> cluster_size.
>
> 5. Make sure the guest file system of the guest partition on the
> virtual disk has a block size which is a multiple of the qcow2
> cluster_size.
>
> In other words, the usual "4K issue" procedures, but on both the
> physical and virtual machine.
> "
> ----- Original Message -----
> From: "Stanislav German-Evtushenko" <ginermail at gmail.com>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "dietmar" <dietmar at proxmox.com>, "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, 28 May 2015 09:38:15
> Subject: Re: [pve-devel] Default cache mode for VM hard drives
>
> > Not sure it's related, but with O_DIRECT I think that the write needs to
> be aligned to a multiple of the 4k block size (or 512 bytes).
> That is right, and you just can't use O_DIRECT without alignment. You would
> just get an error from the "write" system call. If you check drbd_oos_test.c,
> you will find posix_memalign there.
>
> On Thu, May 28, 2015 at 10:33 AM, Alexandre DERUMIER <aderumier at odiso.com>
> wrote:
>
>
> Hi,
>
> Not sure it's related, but with O_DIRECT I think that the write needs to be
> aligned to a multiple of the 4k block size (or 512 bytes).
>
> And I remember some bug with qemu and 512b-logical/4k-physical disks:
>
> http://pve.proxmox.com/pipermail/pve-devel/2012-November/004530.html
>
> I'm not an expert so I can't confirm.
>
> ----- Original Message -----
> From: "Stanislav German-Evtushenko" <ginermail at gmail.com>
> To: "dietmar" <dietmar at proxmox.com>
> Cc: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Thursday, 28 May 2015 09:22:12
> Subject: Re: [pve-devel] Default cache mode for VM hard drives
>
> Hi Dietmar,
>
> I did that a couple of times already, and every time I got the same answer:
> "upper layer problem". Well, as we have come this far already, I would like
> to continue.
>
> I have just run the same test with mdadm instead of DRBD, and I found that
> the problem is reproducible on software RAID too, just as Lars Ellenberg
> claimed. This means the problem is not specific to DRBD but applies to
> O_DIRECT generally: when the host cache is not used, the block device reads
> the data directly from userspace memory.
>
> The test case is below.
>
> 1. Prepare
>
> dd if=/dev/zero of=/tmp/mdadm1 bs=1M count=100
> dd if=/dev/zero of=/tmp/mdadm2 bs=1M count=100
> losetup /dev/loop1 /tmp/mdadm1
> losetup /dev/loop2 /tmp/mdadm2
> mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop{1,2}
>
> 2. Write data with O_DIRECT
>
> ./a.out /dev/md0
>
> 3. Check consistency with vbindiff
>
> vbindiff /tmp/mdadm{1,2}  # press Enter multiple times to skip the metadata
>
> And here we find that the data on the two "physical devices" differs, and
> md RAID did not catch this.
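
For context: the a.out in step 2 is the compiled drbd_oos_test.c mentioned earlier in the thread. Its exact source is not included here, but a minimal sketch of the scenario under discussion (an aligned O_DIRECT write whose userspace buffer keeps changing while the write is in flight, so each leg of the mirror can pick up a different snapshot of it) might look like the following. Names and constants are illustrative only; this is not the original test program.

#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096              /* assumed logical_block_size */

static void *buf;
static volatile int done;

/* keep rewriting the buffer while the O_DIRECT write is in flight */
static void *scribble(void *unused)
{
    unsigned char v = 0;
    while (!done)
        memset(buf, v++, BLOCK);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <block-device>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_WRONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    if (posix_memalign(&buf, BLOCK, BLOCK) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    pthread_t t;
    pthread_create(&t, NULL, scribble, NULL);

    /* with O_DIRECT the kernel reads the user buffer directly, so each
       mirror leg may see a different version of the changing data */
    for (int i = 0; i < 100000; i++) {
        if (pwrite(fd, buf, BLOCK, 0) != BLOCK) {
            perror("pwrite");
            break;
        }
    }

    done = 1;
    pthread_join(t, NULL);
    free(buf);
    close(fd);
    return 0;
}

Build with something like gcc -O2 -pthread -o a.out oos_test.c, run it against /dev/md0 as in step 2, and then step 3 checks whether the two loop backing files ended up with the same data.
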
>
>
> On Thu, May 28, 2015 at 7:40 AM, Dietmar Maurer <dietmar at proxmox.com>
> wrote:
>
>
> > What does this mean?
>
> I still think you should discuss that on the DRBD list.
>
>
> Best regards,
> Stanislav German-Evtushenko


-- 
www.helplinux.ru - Find yourself a Guru