[pve-devel] Default cache mode for VM hard drives

Thu May 28 10:18:22 CEST 2015

>>Summary: when working in O_DIRECT mode, QEMU has to wait until the "write" system call has finished before changing this buffer, OR QEMU has to create a new buffer every time, OR ... other ideas?

AFAIK, only O_DSYNC can guarantee that the data is really written to the last layer (the disk platters).

That's why, when using O_DIRECT, we need barriers or FUA in recent guest kernels: to be sure that the guest filesystem is consistent and that data is flushed at regular intervals.
(To avoid a filesystem that is inconsistent with its data.)

Do you see the problem with QEMU cache=directsync? (O_DIRECT + O_DSYNC).
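
For reference, the cache modes discussed in this thread can be written down as combinations of the two flags. This is a minimal sketch of the commonly documented mapping, not QEMU source code; the names `CacheMode`, `CACHE_MODES` and `open_flags` are made up for illustration, and QEMU's real implementation may reach the same semantics differently (for example with explicit flushes instead of a literal O_DSYNC).

```python
import os
from typing import NamedTuple


class CacheMode(NamedTuple):
    direct: bool  # bypass the host page cache (O_DIRECT)
    dsync: bool   # a write completes only once the data is stable (O_DSYNC)


# Conceptual mapping only; hypothetical table, not taken from QEMU's code.
CACHE_MODES = {
    "writeback":    CacheMode(direct=False, dsync=False),
    "none":         CacheMode(direct=True,  dsync=False),
    "writethrough": CacheMode(direct=False, dsync=True),
    "directsync":   CacheMode(direct=True,  dsync=True),
}


def open_flags(cache_mode: str) -> int:
    """Return open(2) flags that conceptually match a cache mode."""
    mode = CACHE_MODES[cache_mode]
    flags = os.O_RDWR
    if mode.direct:
        # O_DIRECT is Linux-specific; contribute nothing where it is missing.
        flags |= getattr(os, "O_DIRECT", 0)
    if mode.dsync:
        flags |= getattr(os, "O_DSYNC", 0)
    return flags
```

With this table, cache=directsync is the only mode that sets both flags, and cache=none is the mode where the buffer race discussed below can show up without any per-write durability guarantee.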

----- Original Message -----
From: "Stanislav German-Evtushenko" <ginermail at gmail.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, May 28, 2015 09:54:32
Subject: Re: [pve-devel] Default cache mode for VM hard drives

Dietmar, 

fsync ensures that data reaches the underlying hardware, but it does not guarantee that the buffer stays unchanged until it is fully written.

I will describe here my understanding of why we get this problem with O_DIRECT and don't get it without.

** Without O_DIRECT ** 
1. Application tries to write data from a buffer
2. Data from the buffer goes to the host cache
3. RAID writers get the data from the host cache and put it to /dev/loop1 and /dev/loop2
Even if the buffer changes, the data in the host cache does not change, so the RAID stays consistent.

** With O_DIRECT ** 
1. Application tries to write data from a buffer
2. RAID writers get the data from the application (!!!) buffer and put it to /dev/loop1 and /dev/loop2
If the data in the buffer is changed in the meantime (this change can be made by a different POSIX thread), then different data reaches /dev/loop1 and /dev/loop2.
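
The two cases above can be imitated with a small, purely in-memory simulation. It is only a sketch of the ordering problem, assuming that each mirror leg reads the source buffer at a different moment; no real O_DIRECT I/O, RAID or threads are involved, and all function names are invented for illustration.

```python
def mirror_write_direct(buf, mutate_between_legs):
    """O_DIRECT-like: each mirror leg reads the caller's buffer at its own time."""
    leg1 = bytes(buf)          # first leg reads the application buffer
    mutate_between_legs(buf)   # another thread changes the buffer mid-write
    leg2 = bytes(buf)          # second leg reads the same buffer, but later
    return leg1, leg2


def mirror_write_cached(buf, mutate_between_legs):
    """Cached: data is copied once into a 'host cache'; both legs read the copy."""
    cached = bytes(buf)        # snapshot taken when the write is submitted
    leg1 = cached
    mutate_between_legs(buf)   # the change no longer affects the in-flight data
    leg2 = cached
    return leg1, leg2


def scribble(buf):
    """Stand-in for a guest or thread that reuses the buffer too early."""
    buf[0:4] = b"XXXX"


direct = mirror_write_direct(bytearray(b"AAAAAAAA"), scribble)
cached = mirror_write_cached(bytearray(b"AAAAAAAA"), scribble)
print("direct legs identical:", direct[0] == direct[1])
print("cached legs identical:", cached[0] == cached[1])
```

In the direct variant the two legs end up with different contents, which is the inconsistency seen on /dev/loop1 and /dev/loop2; in the cached variant both legs see the snapshot.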

Summary: when working in O_DIRECT mode, QEMU has to wait until the "write" system call has finished before changing this buffer, OR QEMU has to create a new buffer every time, OR ... other ideas?
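
The "new buffer every time" option is usually called a bounce buffer. Below is a minimal sketch of that idea, assuming a 4096-byte alignment (`BLOCK`) and a hypothetical helper name `write_via_bounce_buffer`; it is not how QEMU does it. The data is copied into a private, page-aligned buffer, and the write is issued from the copy, so later changes to the caller's buffer cannot reach the data in flight.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; a real device may require a different value


def write_via_bounce_buffer(fd, data, offset=0):
    """Copy data into a private page-aligned buffer and write from that copy.

    The write is padded with zero bytes up to a multiple of BLOCK, as an
    O_DIRECT file descriptor would typically require. Returns bytes written.
    """
    length = max(BLOCK, -(-len(data) // BLOCK) * BLOCK)  # round up to BLOCK
    bounce = mmap.mmap(-1, length)  # anonymous mapping: page-aligned, zero-filled
    try:
        bounce[:len(data)] = data   # the only moment the caller's buffer is read
        return os.pwrite(fd, bounce, offset)
    finally:
        bounce.close()


# Demo on an ordinary temporary file. On Linux the file would be opened with
# os.O_DIRECT for the real case; that is omitted here so the sketch runs anywhere.
guest_buf = bytearray(b"guest data")
with tempfile.TemporaryFile() as f:
    written = write_via_bounce_buffer(f.fileno(), guest_buf)
    guest_buf[:] = b"overwritten"[:len(guest_buf)]  # buffer reused afterwards
    on_disk = os.pread(f.fileno(), 10, 0)
print(written, on_disk)
```

The cost of this approach is one extra memory copy per write, which is part of what O_DIRECT is meant to avoid in the first place.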

Stanislav 

On Thu, May 28, 2015 at 10:31 AM, Dietmar Maurer < dietmar at proxmox.com > wrote: 

> I have just done the same test with mdadm instead of DRBD, and what I found is
> that this problem is reproducible on software RAID too, just as was
> claimed by Lars Ellenberg. It means that the problem is not only related to
> DRBD but to O_DIRECT mode in general, when we don't use the host cache and a
> block device reads data directly from userspace.

We simply think the behavior is correct. If you want to be sure data is 
on disk you have to call fsync.
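
As a small illustration of that last point, here is a sketch of a write followed by fsync. The helper name `durable_write` is invented for this example. Note that fsync only asks the kernel to flush; whether the data has really reached stable media also depends on the device honoring the flush.

```python
import os
import tempfile


def durable_write(fd, data):
    """Write all of data to fd, then block until fsync reports it flushed."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may write fewer bytes than requested
        view = view[written:]
    os.fsync(fd)                      # flush file data and metadata to the device


with tempfile.TemporaryFile() as f:
    durable_write(f.fileno(), b"important block")
    stored = os.pread(f.fileno(), 15, 0)
print(stored)
```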