[pve-devel] Default cache mode for VM hard drives

Stanislav German-Evtushenko ginermail at gmail.com
Thu May 28 10:27:34 CEST 2015


Alexandre,

> That's why we need to use barriers or FUA in recent kernels in the guest,
> when using O_DIRECT, to be sure that the guest filesystem is OK and data
> is flushed at regular intervals.

The problems are:
- Linux swap: no barriers or anything similar
- Windows: I have no idea what Windows does to ensure consistency, but the
issue is reproducible with Windows 7.

BTW: can anybody test drbd_oos_test.c against Ceph? I guess we will have
the same result.
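
If the file is not at hand, the core of the race can be sketched roughly like
this (a simplified sketch, not the original test; /dev/md0 is only an example
target): one thread keeps rewriting a buffer while the main thread writes that
same buffer with O_DIRECT, so the two legs of a mirror (md RAID-1, DRBD) may
pick up different contents.

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096

static char *buf;
static volatile int done;

/* keep changing the buffer while the O_DIRECT write is in flight */
static void *scribbler(void *arg)
{
    while (!done)
        memset(buf, rand() & 0xff, BUF_SIZE);
    return NULL;
}

int main(int argc, char **argv)
{
    /* example target: an md RAID-1 built on /dev/loop1 and /dev/loop2 */
    const char *path = argc > 1 ? argv[1] : "/dev/md0";
    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT needs an aligned buffer */
    if (posix_memalign((void **)&buf, 4096, BUF_SIZE)) return 1;
    memset(buf, 0, BUF_SIZE);

    pthread_t t;
    pthread_create(&t, NULL, scribbler, NULL);

    for (int i = 0; i < 10000; i++)
        if (pwrite(fd, buf, BUF_SIZE, 0) != BUF_SIZE) { perror("pwrite"); break; }

    done = 1;
    pthread_join(t, NULL);
    close(fd);
    return 0;
}

Build with gcc -pthread. Against an md RAID-1 one can then trigger a check
(echo check > /sys/block/md0/md/sync_action) and look at mismatch_cnt, or
simply compare the two loop backing files; if the legs end up with different
data, that is the same kind of divergence DRBD reports as out-of-sync blocks.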

Stanislav

On Thu, May 28, 2015 at 11:22 AM, Stanislav German-Evtushenko
<ginermail at gmail.com> wrote:

> Alexandre,
>
> > Do you see the problem with qemu cache=directsync? (O_DIRECT + O_DSYNC).
> Yes, it happens in fewer cases (maybe 10 times fewer), but it still
> happens. I have a reproducible case with Windows 7 and directsync.
>
> Stanislav
>
> On Thu, May 28, 2015 at 11:18 AM, Alexandre DERUMIER <aderumier at odiso.com>
> wrote:
>
>> >> Resume: when working in O_DIRECT mode, QEMU has to wait until the
>> >> "write" system call has finished before changing the buffer, OR QEMU
>> >> has to create a new buffer every time, OR ... other ideas?
>>
>> AFAIK, only O_DSYNC can guarantee that data is really written to the
>> last layer (disk platters).
>>
>> That's why we need to use barriers or FUA in recent kernels in the guest,
>> when using O_DIRECT, to be sure that the guest filesystem is OK and data
>> is flushed at regular intervals
>> (to avoid a filesystem that is inconsistent with its data).
>>
>>
>> Do you see the problem with qemu cache=directsync? (O_DIRECT + O_DSYNC).
>>
>> ----- Original Message -----
>> From: "Stanislav German-Evtushenko" <ginermail at gmail.com>
>> To: "dietmar" <dietmar at proxmox.com>
>> Cc: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
>> Sent: Thursday, 28 May 2015 09:54:32
>> Subject: Re: [pve-devel] Default cache mode for VM hard drives
>>
>> Dietmar,
>>
>> fsync ensures that data reaches the underlying hardware, but it does not
>> help ensure that the buffer is not changed until it is fully written.
>>
>> I will describe my understanding of why we get this problem with
>> O_DIRECT and don't have it without.
>>
>> ** Without O_DIRECT **
>> 1. The application writes data from a buffer
>> 2. The data from the buffer goes to the host cache
>> 3. The RAID writers take the data from the host cache and put it on
>> /dev/loop1 and /dev/loop2
>> Even if the buffer changes, the data in the host cache will not change, so
>> the RAID stays consistent.
>>
>> ** With O_DIRECT **
>> 1. The application writes data from a buffer
>> 2. The RAID writers take the data directly from the application (!!!)
>> buffer and put it on /dev/loop1 and /dev/loop2
>> If the data in the buffer changes in the meantime (the change can happen in
>> a different POSIX thread), then different data reaches /dev/loop1 and
>> /dev/loop2.
>>
>> Resume: when working in O_DIRECT mode, QEMU has to wait until the "write"
>> system call has finished before changing the buffer, OR QEMU has to create
>> a new buffer every time, OR ... other ideas?
>>
>> Stanislav
>>
>> On Thu, May 28, 2015 at 10:31 AM, Dietmar Maurer <dietmar at proxmox.com>
>> wrote:
>>
>>
>> > I have just done the same test with mdadm instead of DRBD, and I found
>> > that the problem was reproducible on software RAID too, just as Lars
>> > Ellenberg claimed. It means the problem is not only related to DRBD but
>> > to O_DIRECT mode in general, when we don't use the host cache and a
>> > block device reads data directly from userspace.
>>
>> We simply think the behavior is correct. If you want to be sure data is
>> on disk, you have to call fsync.
>>

