[PVE-User] ceph rebalance/ raw vs pool usage

Mark Adams mark at openvs.co.uk
Thu May 9 01:42:43 CEST 2019


After some more research this evening, it turns out the big divergence
between the POOLS %USED and GLOBAL %RAW USED I've had is because the pool
numbers are based on the amount of space left on the fullest OSD.

So if you have 1 OSD that is disproportionately full, the %USED for POOLS
will only show you the capacity you have until that overweight OSD is full.
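
A rough way to see the size of this effect, as a sketch only. It assumes
every OSD has the same weight, so new data lands evenly across them and the
cluster is effectively full once the fullest OSD runs out of room. The
function name and this simplified model are assumptions for illustration;
Ceph's own calculation also takes weights, full ratios and replication into
account. The AVAIL figures are copied from the `ceph osd df tree` output
quoted further down.

```python
# AVAIL per OSD in TiB, copied from the quoted `ceph osd df tree` output
avail_tib = [
    1.29, 1.49, 2.03, 2.09, 2.47, 1.64, 2.42, 2.08, 2.32, 1.78,  # prod-pve1
    2.52, 2.13, 2.52, 2.28, 1.49, 1.79, 2.32, 1.48, 1.59, 1.49,  # prod-pve2
    2.77, 2.56, 1.30, 2.32, 1.24, 2.47, 2.09, 1.59, 1.29, 1.98,  # prod-pve3
]


def projected_raw_capacity(avail):
    """Raw TiB writable before the fullest OSD is full.

    Simplified model (an assumption, not Ceph's exact formula): all OSDs
    receive an equal share of new data, so the OSD with the least free
    space caps what the whole cluster can still take.
    """
    return len(avail) * min(avail)


total_free = sum(avail_tib)
projected = projected_raw_capacity(avail_tib)

print(f"free raw space in total:        {total_free:.1f} TiB")  # 58.8 TiB
print(f"writable until fullest is full: {projected:.1f} TiB")   # 37.2 TiB
```

Under these assumptions roughly a third of the free raw space is out of
reach until the data is spread more evenly.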

I've done quite a bit of reweighting and the %USED (POOLS) and %RAW USED
(GLOBAL) are now much closer together.
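
For reference, the command discussed further down in this thread,
`ceph osd reweight-by-utilization 101 0.05 15`, takes a threshold (percent of
the average utilization), a maximum weight change per OSD and a maximum
number of OSDs to touch. The sketch below only reproduces the selection
step, i.e. which OSDs sit above 101% of the average, using the %USE column
from the quoted output. The function name is hypothetical and the real
command does more than this (it limits each change to the given maximum and,
as far as I understand, can also raise previously lowered reweights).

```python
# %USE per OSD id, copied from the quoted `ceph osd df tree` output
pct_use = {
    0: 81.54, 1: 78.65, 2: 70.88, 4: 70.11, 5: 64.67,
    6: 76.50, 7: 65.31, 8: 70.21, 9: 66.76, 30: 74.49,
    10: 63.92, 11: 69.53, 12: 63.91, 13: 67.43, 14: 78.68,
    15: 74.38, 16: 66.74, 17: 78.84, 18: 77.24, 19: 78.66,
    20: 60.40, 21: 63.35, 22: 81.45, 23: 66.79, 24: 82.20,
    25: 64.59, 26: 70.15, 27: 77.21, 28: 81.47, 29: 71.63,
}


def overloaded_osds(use, threshold_pct=101, max_osds=15):
    """Return ids of OSDs above threshold_pct percent of the mean %USE.

    Fullest OSDs come first and the list is capped at max_osds. This is
    an illustrative approximation, not Ceph's implementation.
    """
    mean = sum(use.values()) / len(use)
    limit = mean * threshold_pct / 100
    over = [osd for osd, u in use.items() if u > limit]
    over.sort(key=lambda osd: use[osd], reverse=True)
    return over[:max_osds]


candidates = overloaded_osds(pct_use)
print(len(candidates))  # 13
print(candidates[0])    # 24 (osd.24 at 82.20 %USE)
```

The count of 13 matches the number of OSDs the dry run reported it would
reweight.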

Cheers for your help so far, Alwin. If you have any suggestions to improve
things based on my current tunables, I would love to have your input.

Cheers,
Mark

On Wed, 8 May 2019 at 11:53, Mark Adams <mark at openvs.co.uk> wrote:

>
>
> On Wed, 8 May 2019 at 11:34, Alwin Antreich <a.antreich at proxmox.com>
> wrote:
>
>> On Wed, May 08, 2019 at 09:34:44AM +0100, Mark Adams wrote:
>> > Thanks for getting back to me Alwin. See my response below.
>> >
>> >
>> > I have the same size and count in each node, but I have had a disk failure
>> > (has been replaced) and also had issues with osds dropping when that memory
>> > allocation bug was around just before last christmas (Think it was when
>> > they made some bluestore updates, then the next release they increased the
>> > default memory allocation to rectify the issue) so that could have messed
>> > up the balance.
>> Ok, that can impact the distribution of PGs. Could you please post the
>> crush tunables too? Maybe there could be something to tweak, besides the
>> reweight-by-utilization.
>>
>
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 1,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 54,
>     "profile": "jewel",
>     "optimal_tunables": 1,
>     "legacy_tunables": 0,
>     "minimum_required_version": "jewel",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 1,
>     "require_feature_tunables5": 1,
>     "has_v5_rules": 0
>
>
>> >
>> > ceph osd df tree:
>> >
>> > ID CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
>> > -1       209.58572        -  210TiB  151TiB 58.8TiB 71.92 1.00   - root default
>> > -3        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.91 1.00   -     host prod-pve1
>> >  0   ssd   6.98619  0.90002 6.99TiB 5.70TiB 1.29TiB 81.54 1.13 116         osd.0
>> >  1   ssd   6.98619  1.00000 6.99TiB 5.49TiB 1.49TiB 78.65 1.09 112         osd.1
>> >  2   ssd   6.98619  1.00000 6.99TiB 4.95TiB 2.03TiB 70.88 0.99 101         osd.2
>> >  4   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.11 0.97 100         osd.4
>> >  5   ssd   6.98619  1.00000 6.99TiB 4.52TiB 2.47TiB 64.67 0.90  92         osd.5
>> >  6   ssd   6.98619  1.00000 6.99TiB 5.34TiB 1.64TiB 76.50 1.06 109         osd.6
>> >  7   ssd   6.98619  1.00000 6.99TiB 4.56TiB 2.42TiB 65.31 0.91  93         osd.7
>> >  8   ssd   6.98619  1.00000 6.99TiB 4.91TiB 2.08TiB 70.21 0.98 100         osd.8
>> >  9   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.76 0.93  95         osd.9
>> > 30   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.78TiB 74.49 1.04 106         osd.30
>> > -5        69.86191        - 69.9TiB 50.3TiB 19.6TiB 71.93 1.00   -     host prod-pve2
>> > 10   ssd   6.98619  1.00000 6.99TiB 4.47TiB 2.52TiB 63.92 0.89  91         osd.10
>> > 11   ssd   6.98619  1.00000 6.99TiB 4.86TiB 2.13TiB 69.53 0.97  99         osd.11
>> > 12   ssd   6.98619  1.00000 6.99TiB 4.46TiB 2.52TiB 63.91 0.89  91         osd.12
>> > 13   ssd   6.98619  1.00000 6.99TiB 4.71TiB 2.28TiB 67.43 0.94  96         osd.13
>> > 14   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.68 1.09 112         osd.14
>> > 15   ssd   6.98619  1.00000 6.99TiB 5.20TiB 1.79TiB 74.38 1.03 106         osd.15
>> > 16   ssd   6.98619  1.00000 6.99TiB 4.66TiB 2.32TiB 66.74 0.93  95         osd.16
>> > 17   ssd   6.98619  1.00000 6.99TiB 5.51TiB 1.48TiB 78.84 1.10 112         osd.17
>> > 18   ssd   6.98619  1.00000 6.99TiB 5.40TiB 1.59TiB 77.24 1.07 110         osd.18
>> > 19   ssd   6.98619  1.00000 6.99TiB 5.50TiB 1.49TiB 78.66 1.09 112         osd.19
>> > -7        69.86191        - 69.9TiB 50.2TiB 19.6TiB 71.93 1.00   -     host prod-pve3
>> > 20   ssd   6.98619  1.00000 6.99TiB 4.22TiB 2.77TiB 60.40 0.84  86         osd.20
>> > 21   ssd   6.98619  1.00000 6.99TiB 4.43TiB 2.56TiB 63.35 0.88  90         osd.21
>> > 22   ssd   6.98619  0.95001 6.99TiB 5.69TiB 1.30TiB 81.45 1.13 116         osd.22
>> > 23   ssd   6.98619  1.00000 6.99TiB 4.67TiB 2.32TiB 66.79 0.93  95         osd.23
>> > 24   ssd   6.98619  0.95001 6.99TiB 5.74TiB 1.24TiB 82.20 1.14 117         osd.24
>> > 25   ssd   6.98619  1.00000 6.99TiB 4.51TiB 2.47TiB 64.59 0.90  92         osd.25
>> > 26   ssd   6.98619  1.00000 6.99TiB 4.90TiB 2.09TiB 70.15 0.98 100         osd.26
>> > 27   ssd   6.98619  1.00000 6.99TiB 5.39TiB 1.59TiB 77.21 1.07 110         osd.27
>> > 28   ssd   6.98619  1.00000 6.99TiB 5.69TiB 1.29TiB 81.47 1.13 116         osd.28
>> > 29   ssd   6.98619  1.00000 6.99TiB 5.00TiB 1.98TiB 71.63 1.00 102         osd.29
>> >                       TOTAL  210TiB  151TiB 58.8TiB 71.92
>> >
>> > MIN/MAX VAR: 0.84/1.14  STDDEV: 6.44
>> How many placement groups do(es) your pool(s) have?
>>
>>
> 1024
>
> Cheers!
>
>> >
>> >
>> >
>> > >
>> > > >
>> > > > Is it safe enough to keep tweaking this? (I believe I should run ceph osd
>> > > > reweight-by-utilization 101 0.05 15) Are there any gotchas I need to be
>> > > > aware of when doing this apart from the obvious load of reshuffling the
>> > > > data around? The cluster has 30 OSDs and it looks like it will reweight
>> > > > 13.
>> > > Your cluster may get more and more unbalanced, e.g. making an OSD
>> > > replacement a bigger challenge.
>> > >
>> > >
>> > It can make the balance worse? I thought the whole point was to get it back
>> > in balance! :)
>> Yes, I just meant: be careful. ;) I have re-read the section in
>> ceph's docs and the reweights are relative to each other. So, it should
>> not do much harm, but I faintly recall that I had issues with PG
>> distribution afterwards. My old memory. ^^
>>
>> --
>> Cheers,
>> Alwin
>
>

