<div dir="ltr"><div>Alexandre,<br><br>> That's why we need to use barriers or FUA in the latest guest kernel, when 

using O_DIRECT, to be sure that the guest filesystem is OK and data is 

flushed at regular intervals.<br></div><div><br>The problems are:<br></div><div>- Linux swap - no barriers or anything similar<br></div><div>- Windows - I have no idea what Windows does to ensure consistency, but the issue is reproducible on Windows 7.<br><br></div><div>BTW: can anybody test drbd_oos_test.c against Ceph? I guess we will get the same result.<br></div><div><br></div>Stanislav<br><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 28, 2015 at 11:22 AM, Stanislav German-Evtushenko <span dir="ltr"><<a href="mailto:ginermail@gmail.com" target="_blank">ginermail@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Alexandre,<span class=""><br><br>> Do you see the problem with qemu cache=directsync? (O_DIRECT + O_DSYNC)<br></span></div>Yes, it happens less often (maybe 10 times less) but it still happens. I have a reproducible case with Windows 7 and directsync.<span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888">Stanislav<br></font></span></div><div class="gmail_extra"><div><div class="h5"><br><div class="gmail_quote">On Thu, May 28, 2015 at 11:18 AM, Alexandre DERUMIER <span dir="ltr"><<a href="mailto:aderumier@odiso.com" target="_blank">aderumier@odiso.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>>>Summary: when working in O_DIRECT mode, QEMU has to wait until the "write" system call has finished before changing the buffer, OR QEMU has to create a new buffer every time, OR ... other ideas?<br>

<br>

</span>AFAIK, only O_DSYNC can guarantee that data is really written to the last layer (disk platters).<br>

<br>

That's why we need to use barriers or FUA in the latest guest kernel, when using O_DIRECT, to be sure that the guest filesystem is OK and data is flushed at regular intervals.<br>

(To avoid a filesystem that is inconsistent with its data.)<br>

<br>

<br>

Do you see the problem with qemu cache=directsync? (O_DIRECT + O_DSYNC)<br>

<span><br>

<br>

<br>

<br>

<br>

----- Original Message -----<br>

From: "Stanislav German-Evtushenko" <<a href="mailto:ginermail@gmail.com" target="_blank">ginermail@gmail.com</a>><br>

To: "dietmar" <<a href="mailto:dietmar@proxmox.com" target="_blank">dietmar@proxmox.com</a>><br>

Cc: "aderumier" <<a href="mailto:aderumier@odiso.com" target="_blank">aderumier@odiso.com</a>>, "pve-devel" <<a href="mailto:pve-devel@pve.proxmox.com" target="_blank">pve-devel@pve.proxmox.com</a>><br>

</span>Sent: Thursday, 28 May 2015 09:54:32<br>

<span>Subject: Re: [pve-devel] Default cache mode for VM hard drives<br>

<br>

</span><div><div>Dietmar,<br>

<br>

fsync ensures that data reaches the underlying hardware, but it does not guarantee that the buffer stays unchanged until it is fully written.<br>

<br>

Here is my understanding of why we get this problem with O_DIRECT and do not get it without.<br>

<br>

** Without O_DIRECT **<br>

1. The application issues a write of the data in its buffer<br>

2. The data is copied from the buffer into the host cache<br>

3. The RAID writers take the data from the host cache and write it to /dev/loop1 and /dev/loop2<br>

Even if the buffer changes, the copy in the host cache does not, so the RAID stays consistent.<br>

<br>

** With O_DIRECT **<br>

1. The application issues a write of the data in its buffer<br>

2. The RAID writers take the data from the application's (!!!) buffer and write it to /dev/loop1 and /dev/loop2<br>

If the data in the buffer changes in the meantime (the change can come from a different POSIX thread), then different data reaches /dev/loop1 and /dev/loop2.<br>
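<br>
The two paths above can be sketched as a toy model in Python. This is only an illustration of the ordering described here, not QEMU, md or DRBD code; the names raid1_write, buffered_write and direct_write are made up for the example, and the "mutate" callback stands in for another thread changing the buffer mid-write.<br>

```python
def raid1_write(source, mutate=None):
    """Model a two-leg mirror: each leg reads `source` on its own."""
    leg1 = bytes(source)          # first leg reads the source
    if mutate is not None:
        mutate()                  # the application changes its buffer mid-write
    leg2 = bytes(source)          # second leg reads the source again
    return leg1, leg2


def buffered_write(app_buf, mutate=None):
    """Without O_DIRECT: the data is copied into the host cache first."""
    page_cache = bytes(app_buf)   # private copy, unaffected by later changes
    return raid1_write(page_cache, mutate)


def direct_write(app_buf, mutate=None):
    """With O_DIRECT: the legs read the application's own buffer."""
    return raid1_write(app_buf, mutate)
```

In this model buffered_write always returns two identical legs, while direct_write returns different legs whenever the callback changes the buffer between the two reads.<br>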

<br>

Summary: when working in O_DIRECT mode, QEMU has to wait until the "write" system call has finished before changing the buffer, OR QEMU has to create a new buffer every time, OR ... other ideas?<br>
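<br>
The "new buffer every time" option could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not what QEMU does: pwrite_snapshot is a hypothetical helper, the file descriptor is opened without O_DIRECT here, and a real O_DIRECT write would additionally need a suitably aligned buffer, which a plain Python bytes object does not guarantee.<br>

```python
import os


def pwrite_snapshot(fd: int, app_buf: bytearray, offset: int) -> bytes:
    """Write a private copy of app_buf at offset and return that copy.

    Hypothetical helper: the copy plays the role of a bounce buffer, so
    later changes to app_buf cannot alter the bytes being written.
    """
    snapshot = bytes(app_buf)     # the "new buffer" made for this write
    view = memoryview(snapshot)
    pos = offset
    while len(view):              # loop to handle short writes
        n = os.pwrite(fd, view, pos)
        view = view[n:]
        pos += n
    return snapshot
```

The price of this approach is one extra memory copy per write, which is part of what O_DIRECT is normally used to avoid; the other option (not touching the buffer until the write has returned) avoids the copy but needs cooperation from whoever owns the buffer.<br>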

<br>

Stanislav<br>

<br>

On Thu, May 28, 2015 at 10:31 AM, Dietmar Maurer < <a href="mailto:dietmar@proxmox.com" target="_blank">dietmar@proxmox.com</a> > wrote:<br>

<br>

<br>

> I have just done the same test with mdadm instead of DRBD. What I found is<br>

> that this problem is reproducible on software RAID too, just as it was<br>

> claimed by Lars Ellenberg. It means that the problem is not only related to<br>

> DRBD but to O_DIRECT mode generally when we don't use host cache and a<br>

> block device reads data directly from userspace.<br>

<br>

We simply think the behavior is correct. If you want to be sure data is<br>

on disk you have to call fsync.<br>

<br>

<br>

<br>

<br>

</div></div></blockquote></div></div></div></div></blockquote></div><br></div></div></div>