[PVE-User] High I/O waits, not sure if it's a ceph issue.
Alexandre DERUMIER
aderumier at odiso.com
Thu Jul 2 15:57:32 CEST 2020
Hi,
you should give Ceph Octopus a try. librbd has greatly improved write performance, and I can now recommend enabling writeback by default.
Here are some IOPS results with 1 VM, 1 disk, 4k blocks, iodepth=64, librbd, no iothread:
               nautilus-cache=none  nautilus-cache=writeback  octopus-cache=none  octopus-cache=writeback
randread 4k          62.1k                  25.2k                   61.1k                 60.8k
randwrite 4k         27.7k                  19.5k                   34.5k                 53.0k
seqwrite 4k           7850                  37.5k                   24.9k                 82.6k
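To enable writeback on a VM disk in Proxmox, something like this (a sketch; the VM ID, bus and storage/disk names are placeholders):
# hypothetical VM 100 with its disk on an RBD storage called 'ceph-rbd'
qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,cache=writeback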
----- Original Message -----
From: "Mark Schouten" <mark at tuxis.nl>
To: "proxmoxve" <pve-user at pve.proxmox.com>
Sent: Thursday, 2 July 2020 15:15:20
Subject: Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.
On Thu, Jul 02, 2020 at 09:06:54AM +1000, Lindsay Mathieson wrote:
> I did some adhoc testing last night - definitely a difference, in KRBD's
> favour. Both sequential and random IO was much better with it enabled.
Interesting! I just did some testing too, on our demo cluster: Ceph with
6 OSDs across three nodes, size 2.
root@node04:~# pveversion
pve-manager/6.2-6/ee1d7754 (running kernel: 5.4.41-1-pve)
root@node04:~# ceph -v
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable)
rbd create fio_test --size 10G -p Ceph
rbd create map_test --size 10G -p Ceph
rbd map Ceph/map_test
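rbd map prints the device node; it came up as /dev/rbd0 here, which is what the fio config below uses. To double-check which images are mapped where:
rbd showmapped    # lists mapped images and their device nodes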
When just using a write test (rw=randwrite), krbd wins big.
rbd:
WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec
krbd:
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec
However, using rw=randrw (rwmixread=75), things change a lot:
rbd:
READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec
krbd:
READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec
Maybe I'm interpreting or testing this wrong, but it looks like simply writing
to krbd is much faster, while actually using that data (mixed read/write) is
much slower. Let me know what you guys think.
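One thing I haven't ruled out (an assumption, untested): the libaio jobs run without direct=1, so the krbd writes may be landing in the host page cache instead of on the cluster, which would flatter the krbd randwrite numbers. A variant job that bypasses the page cache could look like this:
[krbd_write_direct]
ioengine=libaio
filename=/dev/rbd0
direct=1
rw=randwrite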
Attachments are being stripped, IIRC, so here's the config and the full output
of the tests:
============RESULTS==================
rbd_write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
rbd_readwrite: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
krbd_write: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
krbd_readwrite: (g=3): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 4 processes
Jobs: 1 (f=1): [_(3),f(1)][100.0%][eta 00m:00s]
rbd_write: (groupid=0, jobs=1): err= 0: pid=1846441: Thu Jul 2 15:08:42 2020
write: IOPS=9712, BW=37.9MiB/s (39.8MB/s)(10.0GiB/269904msec); 0 zone resets
slat (nsec): min=943, max=1131.9k, avg=6367.94, stdev=10934.84
clat (usec): min=1045, max=259066, avg=3286.70, stdev=4553.24
lat (usec): min=1053, max=259069, avg=3293.06, stdev=4553.20
clat percentiles (usec):
| 1.00th=[ 1844], 5.00th=[ 2114], 10.00th=[ 2311], 20.00th=[ 2573],
| 30.00th=[ 2769], 40.00th=[ 2933], 50.00th=[ 3064], 60.00th=[ 3228],
| 70.00th=[ 3425], 80.00th=[ 3621], 90.00th=[ 3982], 95.00th=[ 4359],
| 99.00th=[ 5538], 99.50th=[ 6718], 99.90th=[ 82314], 99.95th=[125305],
| 99.99th=[187696]
bw ( KiB/s): min=17413, max=40282, per=83.81%, avg=32561.17, stdev=3777.39, samples=539
iops : min= 4353, max=10070, avg=8139.93, stdev=944.34, samples=539
lat (msec) : 2=2.64%, 4=87.80%, 10=9.37%, 20=0.08%, 50=0.01%
lat (msec) : 100=0.02%, 250=0.09%, 500=0.01%
cpu : usr=8.73%, sys=5.27%, ctx=1254152, majf=0, minf=8484
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
rbd_readwrite: (groupid=1, jobs=1): err= 0: pid=1852029: Thu Jul 2 15:08:42 2020
read: IOPS=12.8k, BW=49.0MiB/s (52.4MB/s)(7678MiB/153607msec)
slat (nsec): min=315, max=4467.8k, avg=3247.91, stdev=7360.28
clat (usec): min=276, max=160495, avg=1412.53, stdev=656.11
lat (usec): min=281, max=160497, avg=1415.78, stdev=656.02
clat percentiles (usec):
| 1.00th=[ 494], 5.00th=[ 693], 10.00th=[ 832], 20.00th=[ 1012],
| 30.00th=[ 1139], 40.00th=[ 1254], 50.00th=[ 1352], 60.00th=[ 1450],
| 70.00th=[ 1549], 80.00th=[ 1696], 90.00th=[ 1926], 95.00th=[ 2343],
| 99.00th=[ 3621], 99.50th=[ 3949], 99.90th=[ 5604], 99.95th=[ 7373],
| 99.99th=[11207]
bw ( KiB/s): min=25546, max=50344, per=78.44%, avg=40147.73, stdev=2610.22, samples=306
iops : min= 6386, max=12586, avg=10036.57, stdev=652.54, samples=306
write: IOPS=4270, BW=16.7MiB/s (17.5MB/s)(2562MiB/153607msec); 0 zone resets
slat (nsec): min=990, max=555362, avg=5474.97, stdev=6241.91
clat (usec): min=1052, max=196165, avg=3239.08, stdev=3722.92
lat (usec): min=1056, max=196171, avg=3244.55, stdev=3722.91
clat percentiles (usec):
| 1.00th=[ 1663], 5.00th=[ 1991], 10.00th=[ 2180], 20.00th=[ 2442],
| 30.00th=[ 2606], 40.00th=[ 2769], 50.00th=[ 2966], 60.00th=[ 3130],
| 70.00th=[ 3359], 80.00th=[ 3654], 90.00th=[ 4359], 95.00th=[ 5014],
| 99.00th=[ 6325], 99.50th=[ 7177], 99.90th=[ 40109], 99.95th=[104334],
| 99.99th=[175113]
bw ( KiB/s): min= 8450, max=17786, per=78.45%, avg=13398.97, stdev=891.56, samples=306
iops : min= 2112, max= 4446, avg=3349.35, stdev=222.88, samples=306
lat (usec) : 500=0.79%, 750=4.30%, 1000=9.19%
lat (msec) : 2=55.67%, 4=26.22%, 10=3.78%, 20=0.03%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.02%
cpu : usr=13.97%, sys=7.94%, ctx=1729014, majf=0, minf=2214
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=1965537,655903,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
krbd_write: (groupid=2, jobs=1): err= 0: pid=1855430: Thu Jul 2 15:08:42 2020
write: IOPS=52.9k, BW=207MiB/s (217MB/s)(10.0GiB/49582msec); 0 zone resets
slat (nsec): min=1624, max=41411k, avg=17942.28, stdev=482539.15
clat (nsec): min=1495, max=41565k, avg=586889.90, stdev=2650654.87
lat (usec): min=3, max=41568, avg=604.90, stdev=2691.73
clat percentiles (usec):
| 1.00th=[ 92], 5.00th=[ 93], 10.00th=[ 93], 20.00th=[ 94],
| 30.00th=[ 95], 40.00th=[ 96], 50.00th=[ 99], 60.00th=[ 102],
| 70.00th=[ 109], 80.00th=[ 120], 90.00th=[ 139], 95.00th=[ 161],
| 99.00th=[14877], 99.50th=[18482], 99.90th=[18744], 99.95th=[22676],
| 99.99th=[22938]
bw ( KiB/s): min=61770, max=1314960, per=94.28%, avg=199384.09, stdev=331335.15, samples=99
iops : min=15442, max=328740, avg=49845.71, stdev=82833.70, samples=99
lat (usec) : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=55.51%
lat (usec) : 250=41.04%, 500=0.12%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 10=0.01%, 20=3.22%, 50=0.07%
cpu : usr=6.29%, sys=11.90%, ctx=4350, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
krbd_readwrite: (groupid=3, jobs=1): err= 0: pid=1858884: Thu Jul 2 15:08:42 2020
read: IOPS=1377, BW=5511KiB/s (5643kB/s)(7680MiB/1426930msec)
slat (usec): min=200, max=145771, avg=716.74, stdev=355.05
clat (usec): min=31, max=169171, avg=16877.60, stdev=2881.41
lat (usec): min=692, max=170099, avg=17594.97, stdev=2937.38
clat percentiles (usec):
| 1.00th=[11207], 5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
| 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
| 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
| 99.00th=[25035], 99.50th=[28443], 99.90th=[34866], 99.95th=[39060],
| 99.99th=[57410]
bw ( KiB/s): min= 2312, max= 6776, per=99.99%, avg=5510.53, stdev=292.16, samples=2853
iops : min= 578, max= 1694, avg=1377.63, stdev=73.04, samples=2853
write: IOPS=459, BW=1837KiB/s (1881kB/s)(2560MiB/1426930msec); 0 zone resets
slat (nsec): min=1731, max=131919, avg=8242.43, stdev=5170.50
clat (usec): min=4, max=169165, avg=16871.71, stdev=2885.14
lat (usec): min=22, max=169182, avg=16880.10, stdev=2885.33
clat percentiles (usec):
| 1.00th=[11207], 5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
| 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
| 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
| 99.00th=[25297], 99.50th=[28181], 99.90th=[34866], 99.95th=[38536],
| 99.99th=[58459]
bw ( KiB/s): min= 696, max= 2368, per=100.00%, avg=1837.14, stdev=169.59, samples=2853
iops : min= 174, max= 592, avg=459.28, stdev=42.39, samples=2853
lat (usec) : 10=0.01%, 50=0.01%, 750=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.21%, 20=90.02%, 50=9.75%
lat (msec) : 100=0.02%, 250=0.01%
cpu : usr=1.34%, sys=3.68%, ctx=1966986, majf=0, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=1966000,655440,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec
Run status group 1 (all jobs):
READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec
Run status group 2 (all jobs):
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec
Run status group 3 (all jobs):
READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec
Disk stats (read/write):
rbd0: ios=1965643/893981, merge=0/2379481, ticks=1366950/16608305, in_queue=14637096, util=95.65%
============FIO CONFIG==================
[global]
invalidate=0
bs=4k
iodepth=32
stonewall
[rbd_write]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randwrite
[rbd_readwrite]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randrw
rwmixread=75
[krbd_write]
ioengine=libaio
filename=/dev/rbd0
rw=randwrite
[krbd_readwrite]
ioengine=libaio
filename=/dev/rbd0
rw=randrw
rwmixread=75
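Everything above was presumably run as a single job file, e.g. (file name is a placeholder):
fio bench.fio    # 'stonewall' in [global] runs the four jobs one after another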
--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl
_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user