[pve-devel] Default cache mode for VM hard drives

Wolfgang Bumiller w.bumiller at proxmox.com
Thu May 28 14:41:33 CEST 2015

> On May 28, 2015 at 2:07 PM Stanislav German-Evtushenko <ginermail at gmail.com>
> wrote:
> With O_DIRECT we have to trust our user-space application, because the
> kernel takes the data directly from the application's memory. One
> could imagine the kernel copying the buffer from user space before
> writing it to the block device, but that would mean re-implementing
> the "host cache" again.

Not exactly. A cache does more than just buffering.
But you're right in that a buffer would be an additional performance
hit.  At the same time, you could argue that multiple concurrent
reads on the same user-space buffer aren't ideal behavior either.

IMHO this kind of buffering is the job of whichever layer depends on
consistent data, i.e. mdraid, dmraid, drbd.
After all, their entire *job* is to make sure the same data is
written to multiple devices, so it would make sense that it is also
*their* job to - when used with O_DIRECT - *buffer* the user data
somewhere before sending off multiple inconsistent versions to their
lower levels.
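
To make the argument concrete, here is a minimal toy simulation in Python.
It is only a sketch: the function names are made up for illustration, the
"devices" are plain bytearrays, and nothing here models how mdraid, dmraid
or drbd are actually implemented. It only shows why reading a live user
buffer once per device can produce diverging copies, and why a single
snapshot in the mirroring layer avoids that at the cost of an extra copy.

```python
# Toy simulation: a "mirror" layer writing one user buffer to two devices.
# All names are hypothetical; no real mdraid/dmraid/drbd code is modelled.

def mirror_write_direct(user_buf, devices, between_writes=None):
    """Each device write reads the live user buffer (O_DIRECT-like)."""
    for i, dev in enumerate(devices):
        dev[:] = user_buf                  # reads the buffer as it is right now
        if between_writes and i == 0:
            between_writes(user_buf)       # application changes data in flight


def mirror_write_buffered(user_buf, devices, between_writes=None):
    """The mirror layer snapshots the buffer once and writes the snapshot."""
    snapshot = bytes(user_buf)             # the extra copy (the performance hit)
    for i, dev in enumerate(devices):
        dev[:] = snapshot                  # every device gets the same bytes
        if between_writes and i == 0:
            between_writes(user_buf)       # no longer affects the devices


def scribble(buf):
    """Stand-in for an application that reuses its buffer too early."""
    buf[0:4] = b"XXXX"


def run(write_fn):
    """Return True if both mirror halves end up identical."""
    user_buf = bytearray(b"AAAAAAAA")
    devices = [bytearray(8), bytearray(8)]
    write_fn(user_buf, devices, scribble)
    return devices[0] == devices[1]
```

With the direct variant the two "devices" end up with different contents
once the buffer is modified between the writes; with the buffered variant
they stay identical.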

> BTW: Linus Torvalds considers O_DIRECT broken by design and thinks it
> must never be used. At the same time it is widely used.
Yes, the comment in the open(2) manpage is hilarious ;)
However, the idea of O_DIRECT *does* make sense in *some* cases.
Particularly, if you can predict your own application's storage data
flow well enough, then custom caching in userspace *can* make sense.
So, yes, databases for instance *might* benefit. (Although I doubt
every single one of them truly does a better job than the kernel ;) )
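
As a rough idea of what "custom caching in userspace" can look like, here
is a tiny LRU block cache in Python. This is purely an illustrative sketch:
the BlockCache class and the read_block callback are hypothetical, and a
real database would pair something like this with aligned O_DIRECT reads
and its own knowledge of access patterns.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU read cache keyed by block number (hypothetical sketch)."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block       # callable: block number -> bytes
        self.capacity = capacity           # maximum number of cached blocks
        self.blocks = OrderedDict()        # oldest entry first
        self.misses = 0

    def get(self, n):
        if n in self.blocks:
            self.blocks.move_to_end(n)     # mark as most recently used
            return self.blocks[n]
        self.misses += 1
        data = self.read_block(n)          # stand-in for a direct device read
        self.blocks[n] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return data
```

The application decides the capacity and eviction policy itself, which is
exactly the part it may or may not do better than the kernel's page cache.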
