[pve-devel] Default cache mode for VM hard drives

Fri May 29 09:58:23 CEST 2015

> AFAIK all disk IO is done by a single, dedicated thread.

The same is true for the C testcase: one thread is writing, the
other one is manipulating the buffer.

> I tried to read qemu-kvm code but it is difficult for me as I have
> never written C code.
(...)
> Meanwhile
> if somebody implemented the "default cache" option for a storage
> object it would help a lot for end-users like me.

This is probably the best course of action.

Now, let's get some things straight:

Fact is: Manipulating the buffer while writing means the data which
ends up on the device is UNpredictable. Nobody's denying this.
Fact is: A RAID1 device's entire purpose is to write the very same
data to multiple devices.

While these two facts seem to be incompatible, this is not at all the
case!
The data that actually gets written is best described as follows:
it is some arbitrary mixture of the states the buffer went through
during the entire duration of the write call.
And this is perfectly fine! It is exactly what you'd expect when
writing to a single disk. However, this is ***NOT*** the problem we
are actually encountering here! The problem we see here is the fact
that a RAID implementation is writing MULTIPLE DIFFERENT random
mixtures of states of the buffer to its target devices.

To me the situation is clear: The layer which starts distributing the
data to multiple lower layers is responsible for making sure each of
the lower layers is getting the exact same data. As I said above,
this is the very purpose of a RAID1 device! And it's the RAID1
device's job to buffer a single consistent version of the data,
which in ITSELF might not be in a consistent state from the
*userspace's* point of view, but still has to be consistent across the
devices the data gets spread across!
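
To make the distinction concrete, here is a minimal sketch in C. It is
NOT the md driver's code, only an illustration under simplifying
assumptions: the mirror legs are plain memory arrays, and the second
thread is replaced by a callback so the effect is deterministic. All
names (mirror_leg, raid1_write_naive, raid1_write_snapshot,
flip_bytes) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Hypothetical stand-in for one leg of the mirror (a real RAID1
 * would submit I/O to a block device instead). */
struct mirror_leg {
    unsigned char data[BLOCK_SIZE];
};

/* Stands in for the second thread that manipulates the buffer
 * while the write is in flight. */
typedef void (*mutate_fn)(unsigned char *buf, size_t len);

/* Example manipulator: inverts every byte of the buffer. */
void flip_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0xFF;
}

/* Naive mirroring: each leg reads the caller's buffer on its own.
 * If the buffer changes between the two reads, the legs diverge. */
void raid1_write_naive(struct mirror_leg *a, struct mirror_leg *b,
                       unsigned char *buf, size_t len,
                       mutate_fn racing_writer)
{
    assert(len <= BLOCK_SIZE);
    memcpy(a->data, buf, len);
    if (racing_writer)
        racing_writer(buf, len);   /* buffer changes mid-write */
    memcpy(b->data, buf, len);
}

/* Snapshot mirroring: the buffer is read exactly once into a private
 * copy, and both legs are fed from that copy. The copy may not match
 * any state userspace intended, but both legs get the same bytes. */
void raid1_write_snapshot(struct mirror_leg *a, struct mirror_leg *b,
                          unsigned char *buf, size_t len,
                          mutate_fn racing_writer)
{
    unsigned char bounce[BLOCK_SIZE];

    assert(len <= BLOCK_SIZE);
    memcpy(bounce, buf, len);      /* single read of the caller's data */
    if (racing_writer)
        racing_writer(buf, len);   /* later changes no longer matter */
    memcpy(a->data, bounce, len);
    if (racing_writer)
        racing_writer(buf, len);
    memcpy(b->data, bounce, len);
}
```

With flip_bytes passed as the racing writer, the naive variant leaves
the two legs with different contents, while the snapshot variant
leaves them identical. The price is one extra copy per write, which is
presumably why a zero-copy path is tempting in the first place.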

If this issue persists in current kernels, it might be worth reporting
it on the Linux kernel mailing list.