[PVE-User] ceph rebalance/ raw vs pool usage

Alwin Antreich a.antreich at proxmox.com
Wed May 8 12:33:53 CEST 2019


On Wed, May 08, 2019 at 09:34:44AM +0100, Mark Adams wrote:
> Thanks for getting back to me Alwin. See my response below.
> 
> 
> I have the same size and count in each node, but I have had a disk failure
> (since replaced) and also had issues with OSDs dropping when that memory
> allocation bug was around just before last Christmas (I think it was when
> they made some BlueStore updates, then in the next release they increased
> the default memory allocation to rectify the issue), so that could have
> messed up the balance.
OK, that can impact the distribution of PGs. Could you please post the
CRUSH tunables too? Maybe there is something to tweak besides the
reweight-by-utilization.

> 
> ceph osd df tree:
> 
> ID CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
> -1       209.58572        -  210TiB  151TiB 58.8TiB 71.92 1.00   - root default
> -3        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.91 1.00   -     host prod-pve1
>  0   ssd   6.98619  0.90002 6.99TiB 5.70TiB 1.29TiB 81.54 1.13 116         osd.0
>  1   ssd   6.98619  1.00000 6.99TiB 5.49TiB 1.49TiB 78.65 1.09 112         osd.1
>  2   ssd   6.98619  1.00000 6.99TiB 4.95TiB 2.03TiB 70.88 0.99 101         osd.2
>  4   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.11 0.97 100         osd.4
>  5   ssd   6.98619  1.00000 6.99TiB 4.52TiB 2.47TiB 64.67 0.90  92         osd.5
>  6   ssd   6.98619  1.00000 6.99TiB 5.34TiB 1.64TiB 76.50 1.06 109         osd.6
>  7   ssd   6.98619  1.00000 6.99TiB 4.56TiB 2.42TiB 65.31 0.91  93         osd.7
>  8   ssd   6.98619  1.00000 6.99TiB 4.91TiB 2.08TiB 70.21 0.98 100         osd.8
>  9   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.76 0.93  95         osd.9
> 30   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.78TiB 74.49 1.04 106         osd.30
> -5        69.86191        - 69.9TiB 50.3TiB 19.6TiB 71.93 1.00   -     host prod-pve2
> 10   ssd   6.98619  1.00000 6.99TiB 4.47TiB 2.52TiB 63.92 0.89  91         osd.10
> 11   ssd   6.98619  1.00000 6.99TiB 4.86TiB 2.13TiB 69.53 0.97  99         osd.11
> 12   ssd   6.98619  1.00000 6.99TiB 4.46TiB 2.52TiB 63.91 0.89  91         osd.12
> 13   ssd   6.98619  1.00000 6.99TiB 4.71TiB 2.28TiB 67.43 0.94  96         osd.13
> 14   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.68 1.09 112         osd.14
> 15   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.79TiB 74.38 1.03 106         osd.15
> 16   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.74 0.93  95         osd.16
> 17   ssd   6.98619  1.00000 6.99TiB 5.51TiB 1.48TiB 78.84 1.10 112         osd.17
> 18   ssd   6.98619  1.00000 6.99TiB 5.40TiB 1.59TiB 77.24 1.07 110         osd.18
> 19   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.66 1.09 112         osd.19
> -7        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.93 1.00   -     host prod-pve3
> 20   ssd   6.98619  1.00000 6.99TiB 4.22TiB 2.77TiB 60.40 0.84  86         osd.20
> 21   ssd   6.98619  1.00000 6.99TiB 4.43TiB 2.56TiB 63.35 0.88  90         osd.21
> 22   ssd   6.98619  0.95001 6.99TiB 5.69TiB 1.30TiB 81.45 1.13 116         osd.22
> 23   ssd   6.98619  1.00000 6.99TiB 4.67TiB 2.32TiB 66.79 0.93  95         osd.23
> 24   ssd   6.98619  0.95001 6.99TiB 5.74TiB 1.24TiB 82.20 1.14 117         osd.24
> 25   ssd   6.98619  1.00000 6.99TiB 4.51TiB 2.47TiB 64.59 0.90  92         osd.25
> 26   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.15 0.98 100         osd.26
> 27   ssd   6.98619  1.00000 6.99TiB 5.39TiB 1.59TiB 77.21 1.07 110         osd.27
> 28   ssd   6.98619  1.00000 6.99TiB 5.69TiB 1.29TiB 81.47 1.13 116         osd.28
> 29   ssd   6.98619  1.00000 6.99TiB 5.00TiB 1.98TiB 71.63 1.00 102         osd.29
>                       TOTAL  210TiB  151TiB 58.8TiB 71.92
> 
> MIN/MAX VAR: 0.84/1.14  STDDEV: 6.44
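
For readers following along, the summary line can be roughly reproduced from
the %USE column. This is only a sketch: it assumes VAR is each OSD's
utilization divided by the cluster average and that a plain population
standard deviation is close to what Ceph reports. Because it works from the
rounded percentages in the listing, the deviation may differ slightly from
the quoted STDDEV.

```python
from statistics import mean, pstdev

# %USE per OSD, copied from the `ceph osd df tree` listing quoted above
# (prod-pve1, prod-pve2, prod-pve3, in listing order).
pct_use = [
    81.54, 78.65, 70.88, 70.11, 64.67, 76.50, 65.31, 70.21, 66.76, 74.49,
    63.92, 69.53, 63.91, 67.43, 78.68, 74.38, 66.74, 78.84, 77.24, 78.66,
    60.40, 63.35, 81.45, 66.79, 82.20, 64.59, 70.15, 77.21, 81.47, 71.63,
]

avg = mean(pct_use)

# Assumption: VAR = OSD utilization / average utilization.
# All OSDs have the same size here, so the plain mean is a fair average.
var = [use / avg for use in pct_use]
min_var = round(min(var), 2)
max_var = round(max(var), 2)

# Assumption: population standard deviation of the utilization percentages.
stddev = pstdev(pct_use)

print(f"avg %USE: {avg:.2f}")
print(f"MIN/MAX VAR: {min_var:.2f}/{max_var:.2f}")
print(f"STDDEV (approx.): {stddev:.2f}")
```
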
How many placement groups does each of your pools have?
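
As a side note, the PGS column in the listing already hints at the answer.
The sketch below only sums that column per host. Reading the result as a
pool PG count relies on assumptions that are not stated in this thread: a
replicated size of 3 and a host-level failure domain.

```python
# PGS column per host, copied from the `ceph osd df tree` listing above.
pgs_by_host = {
    "prod-pve1": [116, 112, 101, 100, 92, 109, 93, 100, 95, 106],
    "prod-pve2": [91, 99, 91, 96, 112, 106, 95, 112, 110, 112],
    "prod-pve3": [86, 90, 116, 95, 117, 92, 100, 110, 116, 102],
}

# Total PG replicas held by each host, and the average per OSD.
pg_totals = {host: sum(pgs) for host, pgs in pgs_by_host.items()}
pg_avg_per_osd = {host: sum(pgs) / len(pgs) for host, pgs in pgs_by_host.items()}

for host in pgs_by_host:
    print(f"{host}: {pg_totals[host]} PG replicas, "
          f"{pg_avg_per_osd[host]:.1f} per OSD on average")
```

If every host holds one replica of every PG (the assumption above), equal
per-host totals would correspond to the combined pg_num of all pools. The
authoritative numbers still have to come from the pool settings themselves.
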

> 
> 
> 
> >
> > >
> > > Is it safe enough to keep tweaking this? (I believe I should run ceph osd
> > > reweight-by-utilization 101 0.05 15) Are there any gotchas I need to be
> > > aware of when doing this, apart from the obvious load of reshuffling the
> > > data around? The cluster has 30 OSDs and it looks like it will reweight
> > > 13.
> > Your cluster may get more and more unbalanced, e.g. making an OSD
> > replacement a bigger challenge.
> >
> >
> It can make the balance worse? I thought the whole point was to get it back
> in balance! :)
Yes, but I just meant: be careful. ;) I have re-read the section in
Ceph's docs and the reweights are relative to each other. So, it should
not do much harm, but I faintly recall that I had issues with PG
distribution afterwards. My old memory. ^^
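
To illustrate where the "13" mentioned above comes from, here is a simplified
sketch of the overload selection for the arguments 101 0.05 15. It assumes
that an OSD counts as overloaded once its utilization exceeds 101% of the
cluster average, and that at most 15 OSDs are picked, fullest first. It does
not compute new weights (the 0.05 argument limits how far a weight may change
in one run) and it ignores other details of Ceph's implementation. A dry run
with `ceph osd test-reweight-by-utilization 101 0.05 15` remains the
authoritative check.

```python
# %USE per OSD, copied from the `ceph osd df tree` listing above.
pct_use = {
    "osd.0": 81.54, "osd.1": 78.65, "osd.2": 70.88, "osd.4": 70.11,
    "osd.5": 64.67, "osd.6": 76.50, "osd.7": 65.31, "osd.8": 70.21,
    "osd.9": 66.76, "osd.30": 74.49, "osd.10": 63.92, "osd.11": 69.53,
    "osd.12": 63.91, "osd.13": 67.43, "osd.14": 78.68, "osd.15": 74.38,
    "osd.16": 66.74, "osd.17": 78.84, "osd.18": 77.24, "osd.19": 78.66,
    "osd.20": 60.40, "osd.21": 63.35, "osd.22": 81.45, "osd.23": 66.79,
    "osd.24": 82.20, "osd.25": 64.59, "osd.26": 70.15, "osd.27": 77.21,
    "osd.28": 81.47, "osd.29": 71.63,
}

OLOAD = 101     # overload threshold, in percent of the average utilization
MAX_OSDS = 15   # upper bound on the number of OSDs touched in one run

avg = sum(pct_use.values()) / len(pct_use)
threshold = avg * OLOAD / 100

# Simplification: only overloaded OSDs are considered, fullest first.
overloaded = sorted(
    (osd for osd, use in pct_use.items() if use > threshold),
    key=lambda osd: pct_use[osd],
    reverse=True,
)
candidates = overloaded[:MAX_OSDS]

print(f"threshold: {threshold:.2f}% used")
print(f"{len(candidates)} candidate OSDs: {', '.join(candidates)}")
```
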

--
Cheers,
Alwin


