[PVE-User] excessive I/O latency during CEPH rebuild
Eneko Lacunza
elacunza at binovo.es
Tue Oct 28 16:11:08 CET 2014
Hi Adam,
Do you only have 3 OSDs in your Ceph cluster?
What about the journals? Are they inline or on a separate (SSD?) disk?
What about the network? Do you have a physically independent network for
proxmox/VMs and Ceph?
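A quick way to check both, assuming the default /var/lib/ceph layout
(adjust the paths if your setup differs), is something like:

  # where does each OSD journal point? (inline file vs. symlink to an SSD partition)
  ls -l /var/lib/ceph/osd/ceph-*/journal

  # per-OSD commit/apply latency as reported by Ceph itself
  ceph osd perf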
We currently have a 6-OSD, 3-node Ceph cluster; doing an out/in of an
OSD doesn't create a very high impact. If you bring in a new OSD (replace
a disk) the impact is noticeable, but our ~30 VMs were still workable. We
do have different physical networks for proxmox/VMs and Ceph (1 Gbit).
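
Regarding knobs: you can throttle recovery/backfill so that client I/O
keeps priority. Roughly something like the following (these are the
standard Ceph recovery options, but defaults and exact behaviour depend
on your Ceph version, so treat it as a starting point, not a recipe):

  # apply at runtime to all OSDs
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

To make it persistent, put the same values in the [osd] section of
ceph.conf:

  [osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1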
Cheers
Eneko
On 28/10/14 16:03, Adam Thompson wrote:
> I'm seeing ridiculous I/O latency after out'ing and re-in'ing a disk
> in the CEPH array; the OSD monitor tab shows two OSDs (i.e. disks)
> having latency above 10msec - they're both in the 200ms range - but
> reading a single uncached sector from a virtual disk takes >10sec.
>
> It's bad enough that all my virtualized DNS servers are timing out and
> this, of course, directly impacts service.
>
> During normal (non-rebuild, non-rebalance) operations, CEPH is not
> terribly fast to write, but delivers acceptable read speeds.
>
> Where do I start looking for problems? Are there any knobs I should
> be tweaking for CEPH?
>
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es