[PVE-User] High I/O waits, not sure if it's a ceph issue.

Alexandre DERUMIER aderumier at odiso.com
Thu Jul 2 15:57:32 CEST 2020


You should give Ceph Octopus a try. librbd write performance has improved greatly, and I can now recommend enabling writeback by default.

Here are some IOPS results with 1 VM, 1 disk, 4k block size, iodepth=64, librbd, no iothread.

               nautilus-cache=none   nautilus-cache=writeback   octopus-cache=none   octopus-cache=writeback
randread 4k    62.1k                 25.2k                      61.1k                60.8k
randwrite 4k   27.7k                 19.5k                      34.5k                53.0k
seqwrite 4k    7.85k                 37.5k                      24.9k                82.6k
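
To make the comparison easier to read, here is a small sketch that turns the table into speedup ratios (Octopus divided by Nautilus, same cache mode). The numbers are copied from the table; the names `CONFIGS`, `RESULTS` and `speedup` are only made up for this illustration.

```python
# IOPS figures copied from the table above ("k" expanded to thousands).
CONFIGS = ["nautilus-none", "nautilus-writeback", "octopus-none", "octopus-writeback"]
RESULTS = {
    "randread 4k":  [62100, 25200, 61100, 60800],
    "randwrite 4k": [27700, 19500, 34500, 53000],
    "seqwrite 4k":  [7850, 37500, 24900, 82600],
}

def speedup(test, new, old):
    """Return IOPS of config `new` divided by IOPS of config `old`."""
    row = dict(zip(CONFIGS, RESULTS[test]))
    return row[new] / row[old]

for test in RESULTS:
    wb = speedup(test, "octopus-writeback", "nautilus-writeback")
    none = speedup(test, "octopus-none", "nautilus-none")
    print(f"{test:13s} writeback: {wb:.2f}x   cache=none: {none:.2f}x")
```

A ratio above 1 means Octopus is faster for that test and cache mode. These are single runs on one setup, so small differences should not be over-interpreted.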

----- Original Message -----
From: "Mark Schouten" <mark at tuxis.nl>
To: "proxmoxve" <pve-user at pve.proxmox.com>
Sent: Thursday, 2 July 2020 15:15:20
Subject: Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

On Thu, Jul 02, 2020 at 09:06:54AM +1000, Lindsay Mathieson wrote: 
> I did some adhoc testing last night - definitely a difference, in KRBD's 
> favour. Both sequential and random IO was much better with it enabled. 

Interesting! I just did some testing too on our demo cluster: Ceph with
6 OSDs across three nodes, size 2.

root@node04:~# pveversion
pve-manager/6.2-6/ee1d7754 (running kernel: 5.4.41-1-pve)
root@node04:~# ceph -v
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus

rbd create fio_test --size 10G -p Ceph 
rbd create map_test --size 10G -p Ceph 
rbd map Ceph/map_test 

With a write-only test (rw=randwrite), krbd wins by a wide margin (librbd result first, krbd second):
WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec 
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec 

However, with rw=randrw (rwmixread=75), things change a lot (librbd result first, krbd second):
READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec 
WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec 

READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec 
WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec 
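
For a quick sense of scale, the following sketch computes the ratios between the two paths from the summary lines above. The only processing is converting the krbd randrw figures from KiB/s to MiB/s; the dictionary names and the `ratio` helper are made up for illustration.

```python
KIB_PER_MIB = 1024

# Bandwidth in MiB/s, taken from the fio summary lines quoted above.
randwrite    = {"librbd": 37.9, "krbd": 207.0}
randrw_read  = {"librbd": 49.0, "krbd": 5511 / KIB_PER_MIB}
randrw_write = {"librbd": 16.7, "krbd": 1837 / KIB_PER_MIB}

def ratio(faster, slower):
    """How many times faster `faster` is than `slower`."""
    return faster / slower

print(f"randwrite:      krbd is   {ratio(randwrite['krbd'], randwrite['librbd']):.1f}x librbd")
print(f"randrw (read):  librbd is {ratio(randrw_read['librbd'], randrw_read['krbd']):.1f}x krbd")
print(f"randrw (write): librbd is {ratio(randrw_write['librbd'], randrw_write['krbd']):.1f}x krbd")
```

So the advantage flips direction between the two tests rather than just shrinking.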

Maybe I'm interpreting or testing this wrong, but it looks like plain writes
to krbd are much faster, while a mixed read/write workload on krbd is much
slower. Let me know what you think.

Attachments are being stripped, IIRC, so here's the config and the full output 
of the tests: 

rbd_write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32 
rbd_readwrite: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32 
krbd_write: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32 
krbd_readwrite: (g=3): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32 
Starting 4 processes 
Jobs: 1 (f=1): [_(3),f(1)][100.0%][eta 00m:00s] 
rbd_write: (groupid=0, jobs=1): err= 0: pid=1846441: Thu Jul 2 15:08:42 2020 
write: IOPS=9712, BW=37.9MiB/s (39.8MB/s)(10.0GiB/269904msec); 0 zone resets 
slat (nsec): min=943, max=1131.9k, avg=6367.94, stdev=10934.84 
clat (usec): min=1045, max=259066, avg=3286.70, stdev=4553.24 
lat (usec): min=1053, max=259069, avg=3293.06, stdev=4553.20 
clat percentiles (usec): 
| 1.00th=[ 1844], 5.00th=[ 2114], 10.00th=[ 2311], 20.00th=[ 2573], 
| 30.00th=[ 2769], 40.00th=[ 2933], 50.00th=[ 3064], 60.00th=[ 3228], 
| 70.00th=[ 3425], 80.00th=[ 3621], 90.00th=[ 3982], 95.00th=[ 4359], 
| 99.00th=[ 5538], 99.50th=[ 6718], 99.90th=[ 82314], 99.95th=[125305], 
| 99.99th=[187696] 
bw ( KiB/s): min=17413, max=40282, per=83.81%, avg=32561.17, stdev=3777.39, samples=539 
iops : min= 4353, max=10070, avg=8139.93, stdev=944.34, samples=539 
lat (msec) : 2=2.64%, 4=87.80%, 10=9.37%, 20=0.08%, 50=0.01% 
lat (msec) : 100=0.02%, 250=0.09%, 500=0.01% 
cpu : usr=8.73%, sys=5.27%, ctx=1254152, majf=0, minf=8484 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
rbd_readwrite: (groupid=1, jobs=1): err= 0: pid=1852029: Thu Jul 2 15:08:42 2020 
read: IOPS=12.8k, BW=49.0MiB/s (52.4MB/s)(7678MiB/153607msec) 
slat (nsec): min=315, max=4467.8k, avg=3247.91, stdev=7360.28 
clat (usec): min=276, max=160495, avg=1412.53, stdev=656.11 
lat (usec): min=281, max=160497, avg=1415.78, stdev=656.02 
clat percentiles (usec): 
| 1.00th=[ 494], 5.00th=[ 693], 10.00th=[ 832], 20.00th=[ 1012], 
| 30.00th=[ 1139], 40.00th=[ 1254], 50.00th=[ 1352], 60.00th=[ 1450], 
| 70.00th=[ 1549], 80.00th=[ 1696], 90.00th=[ 1926], 95.00th=[ 2343], 
| 99.00th=[ 3621], 99.50th=[ 3949], 99.90th=[ 5604], 99.95th=[ 7373], 
| 99.99th=[11207] 
bw ( KiB/s): min=25546, max=50344, per=78.44%, avg=40147.73, stdev=2610.22, samples=306 
iops : min= 6386, max=12586, avg=10036.57, stdev=652.54, samples=306 
write: IOPS=4270, BW=16.7MiB/s (17.5MB/s)(2562MiB/153607msec); 0 zone resets 
slat (nsec): min=990, max=555362, avg=5474.97, stdev=6241.91 
clat (usec): min=1052, max=196165, avg=3239.08, stdev=3722.92 
lat (usec): min=1056, max=196171, avg=3244.55, stdev=3722.91 
clat percentiles (usec): 
| 1.00th=[ 1663], 5.00th=[ 1991], 10.00th=[ 2180], 20.00th=[ 2442], 
| 30.00th=[ 2606], 40.00th=[ 2769], 50.00th=[ 2966], 60.00th=[ 3130], 
| 70.00th=[ 3359], 80.00th=[ 3654], 90.00th=[ 4359], 95.00th=[ 5014], 
| 99.00th=[ 6325], 99.50th=[ 7177], 99.90th=[ 40109], 99.95th=[104334], 
| 99.99th=[175113] 
bw ( KiB/s): min= 8450, max=17786, per=78.45%, avg=13398.97, stdev=891.56, samples=306 
iops : min= 2112, max= 4446, avg=3349.35, stdev=222.88, samples=306 
lat (usec) : 500=0.79%, 750=4.30%, 1000=9.19% 
lat (msec) : 2=55.67%, 4=26.22%, 10=3.78%, 20=0.03%, 50=0.01% 
lat (msec) : 100=0.01%, 250=0.02% 
cpu : usr=13.97%, sys=7.94%, ctx=1729014, majf=0, minf=2214 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued rwts: total=1965537,655903,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
krbd_write: (groupid=2, jobs=1): err= 0: pid=1855430: Thu Jul 2 15:08:42 2020 
write: IOPS=52.9k, BW=207MiB/s (217MB/s)(10.0GiB/49582msec); 0 zone resets 
slat (nsec): min=1624, max=41411k, avg=17942.28, stdev=482539.15 
clat (nsec): min=1495, max=41565k, avg=586889.90, stdev=2650654.87 
lat (usec): min=3, max=41568, avg=604.90, stdev=2691.73 
clat percentiles (usec): 
| 1.00th=[ 92], 5.00th=[ 93], 10.00th=[ 93], 20.00th=[ 94], 
| 30.00th=[ 95], 40.00th=[ 96], 50.00th=[ 99], 60.00th=[ 102], 
| 70.00th=[ 109], 80.00th=[ 120], 90.00th=[ 139], 95.00th=[ 161], 
| 99.00th=[14877], 99.50th=[18482], 99.90th=[18744], 99.95th=[22676], 
| 99.99th=[22938] 
bw ( KiB/s): min=61770, max=1314960, per=94.28%, avg=199384.09, stdev=331335.15, samples=99 
iops : min=15442, max=328740, avg=49845.71, stdev=82833.70, samples=99 
lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=55.51% 
lat (usec) : 250=41.04%, 500=0.12%, 750=0.01%, 1000=0.01% 
lat (msec) : 2=0.01%, 10=0.01%, 20=3.22%, 50=0.07% 
cpu : usr=6.29%, sys=11.90%, ctx=4350, majf=0, minf=12 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
krbd_readwrite: (groupid=3, jobs=1): err= 0: pid=1858884: Thu Jul 2 15:08:42 2020 
read: IOPS=1377, BW=5511KiB/s (5643kB/s)(7680MiB/1426930msec) 
slat (usec): min=200, max=145771, avg=716.74, stdev=355.05 
clat (usec): min=31, max=169171, avg=16877.60, stdev=2881.41 
lat (usec): min=692, max=170099, avg=17594.97, stdev=2937.38 
clat percentiles (usec): 
| 1.00th=[11207], 5.00th=[12780], 10.00th=[13698], 20.00th=[14746], 
| 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433], 
| 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103], 
| 99.00th=[25035], 99.50th=[28443], 99.90th=[34866], 99.95th=[39060], 
| 99.99th=[57410] 
bw ( KiB/s): min= 2312, max= 6776, per=99.99%, avg=5510.53, stdev=292.16, samples=2853 
iops : min= 578, max= 1694, avg=1377.63, stdev=73.04, samples=2853 
write: IOPS=459, BW=1837KiB/s (1881kB/s)(2560MiB/1426930msec); 0 zone resets 
slat (nsec): min=1731, max=131919, avg=8242.43, stdev=5170.50 
clat (usec): min=4, max=169165, avg=16871.71, stdev=2885.14 
lat (usec): min=22, max=169182, avg=16880.10, stdev=2885.33 
clat percentiles (usec): 
| 1.00th=[11207], 5.00th=[12780], 10.00th=[13698], 20.00th=[14746], 
| 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433], 
| 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103], 
| 99.00th=[25297], 99.50th=[28181], 99.90th=[34866], 99.95th=[38536], 
| 99.99th=[58459] 
bw ( KiB/s): min= 696, max= 2368, per=100.00%, avg=1837.14, stdev=169.59, samples=2853 
iops : min= 174, max= 592, avg=459.28, stdev=42.39, samples=2853 
lat (usec) : 10=0.01%, 50=0.01%, 750=0.01% 
lat (msec) : 2=0.01%, 4=0.01%, 10=0.21%, 20=90.02%, 50=9.75% 
lat (msec) : 100=0.02%, 250=0.01% 
cpu : usr=1.34%, sys=3.68%, ctx=1966986, majf=0, minf=15 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued rwts: total=1966000,655440,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
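
One way to sanity-check the per-job numbers above is Little's law: with the queue kept full (fio reports 32=100.0% under "IO depths"), IOPS should be roughly iodepth divided by the average total latency. The sketch below assumes steady state and, for the mixed jobs, weights the read and write latencies by their IOPS. The values are copied from the output above; `JOBS` and `predicted_iops` are illustrative names.

```python
IODEPTH = 32  # from the fio job definitions

# job -> list of (IOPS, average total latency in usec), one entry per direction
JOBS = {
    "rbd_write":      [(9712, 3293.06)],
    "rbd_readwrite":  [(12800, 1415.78), (4270, 3244.55)],
    "krbd_write":     [(52900, 604.90)],
    "krbd_readwrite": [(1377, 17594.97), (459, 16880.10)],
}

def predicted_iops(ops, iodepth=IODEPTH):
    """Little's law: throughput = requests in flight / mean latency."""
    total = sum(iops for iops, _ in ops)
    # IOPS-weighted mean latency across read and write operations.
    mean_lat_usec = sum(iops * lat for iops, lat in ops) / total
    return iodepth / (mean_lat_usec / 1e6)

for name, ops in JOBS.items():
    reported = sum(iops for iops, _ in ops)
    print(f"{name:15s} reported={reported:6d}  predicted={predicted_iops(ops):8.0f}")
```

The predictions land within about 1% of the reported IOPS for all four jobs, which suggests the throughput here is determined by per-request latency at a fixed queue depth: the krbd randrw job is slow because each request takes around 17 ms.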

Run status group 0 (all jobs): 
WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec 

Run status group 1 (all jobs): 
READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec 
WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec 

Run status group 2 (all jobs): 
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec 

Run status group 3 (all jobs): 
READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec 
WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec 
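
For anyone who wants to compare such runs programmatically, here is a rough sketch of parsing the "Run status" lines and checking that the reported bandwidth matches io divided by runtime. The regular expression is written against the lines shown above only, and is an assumption rather than a general fio parser; fio's JSON output would be the more robust route.

```python
import re

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Matches e.g. "WRITE: bw=37.9MiB/s (...), ..., io=10.0GiB (...), run=269904-269904msec"
LINE_RE = re.compile(
    r"(READ|WRITE): bw=([\d.]+)(KiB|MiB|GiB)/s .*?"
    r"io=([\d.]+)(KiB|MiB|GiB) .*?run=(\d+)-(\d+)msec"
)

def parse_status(line):
    """Return (direction, reported MiB/s, MiB/s derived from io and runtime)."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    direction, bw, bw_unit, io, io_unit, _run_min, run_max = m.groups()
    reported = float(bw) * UNIT_TO_MIB[bw_unit]
    derived = float(io) * UNIT_TO_MIB[io_unit] / (int(run_max) / 1000)
    return direction, reported, derived

lines = [
    "WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec",
    "READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec",
]
for line in lines:
    direction, reported, derived = parse_status(line)
    print(f"{direction}: reported {reported:.2f} MiB/s, io/run gives {derived:.2f} MiB/s")
```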

Disk stats (read/write): 
rbd0: ios=1965643/893981, merge=0/2379481, ticks=1366950/16608305, in_queue=14637096, util=95.65% 

============FIO CONFIG================== 





Mark Schouten | Tuxis B.V. 
KvK: 74698818 | http://www.tuxis.nl/ 
T: +31 318 200208 | info at tuxis.nl 
