[PVE-User] High ceph OSD latency

Fabrizio Cuseo f.cuseo at panservice.it
Fri Jan 16 16:36:33 CET 2015

Following up on my problem: is it correct that Proxmox uses "barrier=1" on Ceph OSDs and "barrier=0" on /var/lib/vz?

The fsyncs/second values are very different with barriers enabled (first run) and disabled (second run, after remounting with barrier=0):

root@proxmox:~# pveperf /var/lib/vz
CPU BOGOMIPS:      40000.24
REGEX/SECOND:      932650
HD SIZE:           325.08 GB (/dev/mapper/pve-data)
BUFFERED READS:    97.43 MB/sec
FSYNCS/SECOND:     20.88
DNS EXT:           69.87 ms
DNS INT:           63.98 ms (test.panservice)

root@proxmox:~# mount -o remount -o barrier=0 /var/lib/vz

root@proxmox:~# pveperf /var/lib/vz
CPU BOGOMIPS:      40000.24
REGEX/SECOND:      980519
HD SIZE:           325.08 GB (/dev/mapper/pve-data)
BUFFERED READS:    82.29 MB/sec
FSYNCS/SECOND:     561.09
DNS EXT:           64.09 ms
DNS INT:           77.50 ms (test.panservice)
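
To confirm which barrier setting is actually in effect on a given filesystem, the mount options can be read from /proc/mounts. The following is only a minimal sketch: barrier_state is a hypothetical helper name (not part of Proxmox or Ceph), and it assumes the usual /proc/mounts column order (device, mount point, type, options). A filesystem mounted with its default barrier behaviour may not list the option at all, which is reported here as "default".

```shell
#!/bin/sh
# barrier_state MOUNTPOINT
# Hypothetical helper: reads /proc/mounts-style lines on stdin and prints
# the barrier option recorded for MOUNTPOINT.
barrier_state() {
    awk -v mp="$1" '
        $2 == mp {
            found = 1
            state = "default"            # no explicit barrier option listed
            n = split($4, opts, ",")     # field 4 holds the mount options
            for (i = 1; i <= n; i++) {
                if (opts[i] == "barrier=0" || opts[i] == "nobarrier")
                    state = "disabled"
                else if (opts[i] == "barrier=1" || opts[i] == "barrier")
                    state = "enabled"
            }
        }
        END { if (found) print state; else print "not mounted" }
    '
}

# Usage on a live system:
#   barrier_state /var/lib/vz < /proc/mounts
```

Running it against each OSD mount point as well as /var/lib/vz would show whether the two really are mounted with different barrier settings.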

Regards, Fabrizio 

----- Original Message -----
From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
To: pve-user at pve.proxmox.com, "Fabrizio Cuseo" <f.cuseo at panservice.it>
Sent: Thursday, 15 January 2015 13:17:07
Subject: Re: [PVE-User] High ceph OSD latency

On Thu, 15 Jan 2015 11:25:44 AM Fabrizio Cuseo wrote:
> What is strange is that in the OSD tree I see high latency: typically apply
> latency is between 5 and 25 ms, but commit latency is between 150 and 300 ms
> (and sometimes 500-600 ms), with 5-10 op/s and a few B/s rd/wr (I have only 3
> VMs, and only 1 is running now, so the cluster is really unloaded).
> I am using a pool with 3 copies, and I have increased pg_num to 256 (the
> default value of 64 is too low), but OSD latency is the same with a
> different pg_num value.
> I have other clusters (similar configuration, using Dell 2950, dual Ethernet
> for Ceph and Proxmox, 4 x OSD with 1 TB drives, PERC 5/i controller), with
> several VMs, where the commit and apply latency is 1-2 ms.
> Another cluster (test cluster) with 3 x Dell PE860, with only 1 OSD per
> node, has better latency (10-20 ms).
> What can I check?

POOMA U, but if you have one drive or controller that is marginal or failing, 
it can slow down the whole cluster.

It might be worthwhile benchmarking individual OSDs.
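
One way to narrow this down is to list the OSDs whose commit latency stands out and then benchmark only those. The sketch below is an assumption-laden example, not a tested procedure for this cluster: slow_osds is a hypothetical helper name, the 100 ms threshold is arbitrary, and it assumes the "ceph osd perf" table has the OSD id in column 1 and the commit latency in ms in column 2 (check the header on your Ceph version).

```shell
#!/bin/sh
# slow_osds LIMIT_MS
# Hypothetical helper: reads a "ceph osd perf"-style table on stdin and
# prints the ids of OSDs whose commit latency (column 2, assumed to be in
# ms) is above LIMIT_MS. Header lines are skipped because their first
# field is not a number.
slow_osds() {
    awk -v limit="$1" '
        $1 ~ /^[0-9]+$/ && ($2 + 0) > (limit + 0) { print $1 }
    '
}

# Usage on a live cluster (the threshold of 100 ms is only an example):
#   ceph osd perf | slow_osds 100
#   for id in $(ceph osd perf | slow_osds 100); do
#       ceph tell "osd.$id" bench
#   done
```

If one OSD consistently shows up in the list or benchmarks much slower than its peers, its drive or controller is the first thing to look at.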

Fabrizio Cuseo - mailto:f.cuseo at panservice.it
General Management - Panservice InterNetWorking
Professional Services for Internet and Networking
Panservice is a member of AIIP - RIPE Local Registry
Phone: +39 0773 410020 - Fax: +39 0773 470219
http://www.panservice.it  mailto:info at panservice.it
National toll-free number: 800 901492
