[PVE-User] excessive I/O latency during CEPH rebuild
Eneko Lacunza
elacunza at binovo.es
Tue Oct 28 16:13:53 CET 2014
Hi Adam,
I suggest setting noout before removing the drive, so that the cluster
doesn't start to rebalance. Then put in the new disk and mark it in;
then unset noout.
That way the only network traffic is what's needed to fill the new
disk with its data (a plain recovery copy), and there is no rebalancing.
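A minimal Python sketch of that sequence, assuming the stock `ceph` CLI is available on a node with admin credentials. The helper names (`before_swap`, `after_swap`, `run`) are hypothetical. Stopping the OSD, swapping the disk and recreating the OSD are left as manual steps (e.g. from the Proxmox GUI), and `osd_id` is assumed to be the ID of the OSD on the new disk.

```python
import subprocess
from typing import List

# One CLI invocation, as an argv list.
Command = List[str]


def before_swap() -> List[Command]:
    # noout stops down OSDs from being marked "out" automatically,
    # so pulling the disk does not trigger a rebalance.
    return [["ceph", "osd", "set", "noout"]]


def after_swap(osd_id: int) -> List[Command]:
    return [
        # Mark the OSD on the new disk "in" so recovery copies data onto it.
        ["ceph", "osd", "in", str(osd_id)],
        # Restore normal behaviour: down OSDs may be marked out again.
        ["ceph", "osd", "unset", "noout"],
    ]


def run(commands: List[Command], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Stop at the first command that fails.
            subprocess.run(cmd, check=True)
```

Typical use: call `run(before_swap(), dry_run=False)`, stop the OSD and swap the disk, then call `run(after_swap(osd_id), dry_run=False)`. Splitting the plan in two keeps the manual hardware step between the phases instead of hiding it inside one script.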
On 28/10/14 16:05, Adam Thompson wrote:
> On 14-10-28 10:03 AM, Adam Thompson wrote:
>> I'm seeing ridiculous I/O latency after out'ing and re-in'ing a disk
>> in the CEPH array; the OSD monitor tab shows two OSDs (i.e. disks)
>> having latency above 10msec - they're both in the 200ms range - but
>> reading a single uncached sector from a virtual disk takes >10sec.
>>
>> It's bad enough that all my virtualized DNS servers are timing out
>> and this, of course, directly impacts service.
>>
>> During normal (non-rebuild, non-rebalance) operations, CEPH is not
>> terribly fast to write, but delivers acceptable read speeds.
>>
>> Where do I start looking for problems? Are there any knobs I should
>> be tweaking for CEPH?
>>
>
> A related question: to proactively replace a disk, I'm doing
> Stop->Out->Remove / swap disk / Create OSD. Is that a viable
> procedure? Other than the rebuild I/O starving regular reads, it
> seems to be working...
>
--
Technical Director
Binovo IT Human Project, S.L.
Tel. 943575997 / 943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es