[PVE-User] High I/O waits, not sure if it's a ceph issue.

Mark Schouten mark at tuxis.nl
Thu Jul 2 15:15:20 CEST 2020


On Thu, Jul 02, 2020 at 09:06:54AM +1000, Lindsay Mathieson wrote:
> I did some adhoc testing last night - definitely a difference, in KRBD's
> favour. Both sequential and random IO was much better with it enabled.

Interesting! I just did some testing too on our demo cluster. Ceph with
6 osd's over three nodes, size 2.

root at node04:~# pveversion 
pve-manager/6.2-6/ee1d7754 (running kernel: 5.4.41-1-pve)
root at node04:~# ceph -v
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus
(stable)

rbd create fio_test --size 10G -p Ceph
rbd create map_test --size 10G -p Ceph
rbd map Ceph/map_test

When just using a write test (rw=randwrite) krbd wins, big.
rbd:
  WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec
krbd:
  WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec

However, using rw=randrw (rwmixread=75), things change a lot:
rbd:
  READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
  WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec

krbd:
  READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
  WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec



Maybe I'm interpreting or testing stuff wrong, but it looks like simply writing
to krbd is much faster, but actually trying to use that data seems slower. Let
me know what you guys think.


Attachments are being stripped, IIRC, so here's the config and the full output
of the tests:

============RESULTS==================
rbd_write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
rbd_readwrite: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
krbd_write: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
krbd_readwrite: (g=3): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 4 processes
Jobs: 1 (f=1): [_(3),f(1)][100.0%][eta 00m:00s]                                                   
rbd_write: (groupid=0, jobs=1): err= 0: pid=1846441: Thu Jul  2 15:08:42 2020
  write: IOPS=9712, BW=37.9MiB/s (39.8MB/s)(10.0GiB/269904msec); 0 zone resets
    slat (nsec): min=943, max=1131.9k, avg=6367.94, stdev=10934.84
    clat (usec): min=1045, max=259066, avg=3286.70, stdev=4553.24
     lat (usec): min=1053, max=259069, avg=3293.06, stdev=4553.20
    clat percentiles (usec):
     |  1.00th=[  1844],  5.00th=[  2114], 10.00th=[  2311], 20.00th=[  2573],
     | 30.00th=[  2769], 40.00th=[  2933], 50.00th=[  3064], 60.00th=[  3228],
     | 70.00th=[  3425], 80.00th=[  3621], 90.00th=[  3982], 95.00th=[  4359],
     | 99.00th=[  5538], 99.50th=[  6718], 99.90th=[ 82314], 99.95th=[125305],
     | 99.99th=[187696]
   bw (  KiB/s): min=17413, max=40282, per=83.81%, avg=32561.17, stdev=3777.39, samples=539
   iops        : min= 4353, max=10070, avg=8139.93, stdev=944.34, samples=539
  lat (msec)   : 2=2.64%, 4=87.80%, 10=9.37%, 20=0.08%, 50=0.01%
  lat (msec)   : 100=0.02%, 250=0.09%, 500=0.01%
  cpu          : usr=8.73%, sys=5.27%, ctx=1254152, majf=0, minf=8484
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rbd_readwrite: (groupid=1, jobs=1): err= 0: pid=1852029: Thu Jul  2 15:08:42 2020
  read: IOPS=12.8k, BW=49.0MiB/s (52.4MB/s)(7678MiB/153607msec)
    slat (nsec): min=315, max=4467.8k, avg=3247.91, stdev=7360.28
    clat (usec): min=276, max=160495, avg=1412.53, stdev=656.11
     lat (usec): min=281, max=160497, avg=1415.78, stdev=656.02
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[  693], 10.00th=[  832], 20.00th=[ 1012],
     | 30.00th=[ 1139], 40.00th=[ 1254], 50.00th=[ 1352], 60.00th=[ 1450],
     | 70.00th=[ 1549], 80.00th=[ 1696], 90.00th=[ 1926], 95.00th=[ 2343],
     | 99.00th=[ 3621], 99.50th=[ 3949], 99.90th=[ 5604], 99.95th=[ 7373],
     | 99.99th=[11207]
   bw (  KiB/s): min=25546, max=50344, per=78.44%, avg=40147.73, stdev=2610.22, samples=306
   iops        : min= 6386, max=12586, avg=10036.57, stdev=652.54, samples=306
  write: IOPS=4270, BW=16.7MiB/s (17.5MB/s)(2562MiB/153607msec); 0 zone resets
    slat (nsec): min=990, max=555362, avg=5474.97, stdev=6241.91
    clat (usec): min=1052, max=196165, avg=3239.08, stdev=3722.92
     lat (usec): min=1056, max=196171, avg=3244.55, stdev=3722.91
    clat percentiles (usec):
     |  1.00th=[  1663],  5.00th=[  1991], 10.00th=[  2180], 20.00th=[  2442],
     | 30.00th=[  2606], 40.00th=[  2769], 50.00th=[  2966], 60.00th=[  3130],
     | 70.00th=[  3359], 80.00th=[  3654], 90.00th=[  4359], 95.00th=[  5014],
     | 99.00th=[  6325], 99.50th=[  7177], 99.90th=[ 40109], 99.95th=[104334],
     | 99.99th=[175113]
   bw (  KiB/s): min= 8450, max=17786, per=78.45%, avg=13398.97, stdev=891.56, samples=306
   iops        : min= 2112, max= 4446, avg=3349.35, stdev=222.88, samples=306
  lat (usec)   : 500=0.79%, 750=4.30%, 1000=9.19%
  lat (msec)   : 2=55.67%, 4=26.22%, 10=3.78%, 20=0.03%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.02%
  cpu          : usr=13.97%, sys=7.94%, ctx=1729014, majf=0, minf=2214
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=1965537,655903,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
krbd_write: (groupid=2, jobs=1): err= 0: pid=1855430: Thu Jul  2 15:08:42 2020
  write: IOPS=52.9k, BW=207MiB/s (217MB/s)(10.0GiB/49582msec); 0 zone resets
    slat (nsec): min=1624, max=41411k, avg=17942.28, stdev=482539.15
    clat (nsec): min=1495, max=41565k, avg=586889.90, stdev=2650654.87
     lat (usec): min=3, max=41568, avg=604.90, stdev=2691.73
    clat percentiles (usec):
     |  1.00th=[   92],  5.00th=[   93], 10.00th=[   93], 20.00th=[   94],
     | 30.00th=[   95], 40.00th=[   96], 50.00th=[   99], 60.00th=[  102],
     | 70.00th=[  109], 80.00th=[  120], 90.00th=[  139], 95.00th=[  161],
     | 99.00th=[14877], 99.50th=[18482], 99.90th=[18744], 99.95th=[22676],
     | 99.99th=[22938]
   bw (  KiB/s): min=61770, max=1314960, per=94.28%, avg=199384.09, stdev=331335.15, samples=99
   iops        : min=15442, max=328740, avg=49845.71, stdev=82833.70, samples=99
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=55.51%
  lat (usec)   : 250=41.04%, 500=0.12%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%, 20=3.22%, 50=0.07%
  cpu          : usr=6.29%, sys=11.90%, ctx=4350, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
krbd_readwrite: (groupid=3, jobs=1): err= 0: pid=1858884: Thu Jul  2 15:08:42 2020
  read: IOPS=1377, BW=5511KiB/s (5643kB/s)(7680MiB/1426930msec)
    slat (usec): min=200, max=145771, avg=716.74, stdev=355.05
    clat (usec): min=31, max=169171, avg=16877.60, stdev=2881.41
     lat (usec): min=692, max=170099, avg=17594.97, stdev=2937.38
    clat percentiles (usec):
     |  1.00th=[11207],  5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
     | 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
     | 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
     | 99.00th=[25035], 99.50th=[28443], 99.90th=[34866], 99.95th=[39060],
     | 99.99th=[57410]
   bw (  KiB/s): min= 2312, max= 6776, per=99.99%, avg=5510.53, stdev=292.16, samples=2853
   iops        : min=  578, max= 1694, avg=1377.63, stdev=73.04, samples=2853
  write: IOPS=459, BW=1837KiB/s (1881kB/s)(2560MiB/1426930msec); 0 zone resets
    slat (nsec): min=1731, max=131919, avg=8242.43, stdev=5170.50
    clat (usec): min=4, max=169165, avg=16871.71, stdev=2885.14
     lat (usec): min=22, max=169182, avg=16880.10, stdev=2885.33
    clat percentiles (usec):
     |  1.00th=[11207],  5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
     | 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
     | 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
     | 99.00th=[25297], 99.50th=[28181], 99.90th=[34866], 99.95th=[38536],
     | 99.99th=[58459]
   bw (  KiB/s): min=  696, max= 2368, per=100.00%, avg=1837.14, stdev=169.59, samples=2853
   iops        : min=  174, max=  592, avg=459.28, stdev=42.39, samples=2853
  lat (usec)   : 10=0.01%, 50=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.21%, 20=90.02%, 50=9.75%
  lat (msec)   : 100=0.02%, 250=0.01%
  cpu          : usr=1.34%, sys=3.68%, ctx=1966986, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=1966000,655440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec

Run status group 1 (all jobs):
   READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
  WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec

Run status group 2 (all jobs):
  WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec

Run status group 3 (all jobs):
   READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
  WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec

Disk stats (read/write):
  rbd0: ios=1965643/893981, merge=0/2379481, ticks=1366950/16608305, in_queue=14637096, util=95.65%



============FIO CONFIG==================
[global]
invalidate=0
bs=4k
iodepth=32
stonewall

[rbd_write]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randwrite

[rbd_readwrite]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randrw
rwmixread=75

[krbd_write]
ioengine=libaio
filename=/dev/rbd0
rw=randwrite

[krbd_readwrite]
ioengine=libaio
filename=/dev/rbd0
rw=randrw
rwmixread=75


-- 
Mark Schouten     | Tuxis B.V.
KvK: 74698818     | http://www.tuxis.nl/
T: +31 318 200208 | info at tuxis.nl


More information about the pve-user mailing list