[pve-devel] ceph create pool with min_size=1 not possible anymore with last gui wizard

Dominik Csapak d.csapak at proxmox.com
Mon Jun 7 09:14:13 CEST 2021


On 6/7/21 08:57, aderumier at odiso.com wrote:
> On Friday, 4 June 2021 at 15:23 +0200, Dominik Csapak wrote:
>> On 6/4/21 04:47, aderumier at odiso.com wrote:
>>> Hi,
>>>
>>
>>
>> Hi,
>>
>>> I was doing a training week with students,
>>>
>>> and I noticed that the new ceph wizard for creating pools doesn't allow
>>> setting min_size=1 anymore.
>>>
>>> It's currently displaying a warning "min_size <= size/2 can lead to
>>> data loss, incomplete PGs or unfound objects",
>>>
>>> that's ok, but it's also blocking the validation button.
>>>
>>
>> Yes, in our experience setting min_size to 1 is always a bad idea
>> and most likely not what you want.
>>
>> What is possible, though, is to either create the pool on the CLI
>> or change min_size to 1 after creation (this is not blocked).
>>
> Yes, sure. It would be great to be able to change size/min_size from the
> GUI too.
> 
> 

This should already be possible in current versions, but as I said,
not for pool creation, only afterwards.
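
For example, something along these lines should work (pool name and
values are just examples):

  # create the pool on the CLI with the settings the GUI would block
  pveceph pool create mypool --size 2 --min_size 1

  # or lower min_size on an existing pool afterwards
  ceph osd pool set mypool min_size 1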

> 
>>>
>>>
>>> Some users with small clusters/budgets want to use only size=2,
>>>
>>> so with min_size=2, the cluster will go read-only as soon as any OSD
>>> goes down.
>>>
>>> It would be great to allow at least min_size=1 when size=2 is used.
>>>
>>
>> "great" but very dangerous
>>
>>>
>>> Also,
>>> other setups like size=4, min_size=2 also display the warning, but
>>> allow validating the form.
>>>
>>> I'm not sure this warning is correct in this case, as since Octopus
>>> min_size is auto-computed when a pool is created, and a simple
>>>
>>>   ceph osd pool create mypool 128 --size=4
>>>
>>> creates a pool with min_size=2 by default.
>>>
>>>
>>
>> The rationale behind this decision was (I think) that
>> if min_size is exactly 50% of size (e.g. 4/2),
>> you can get inconsistent PGs, with no quorum as to
>> which PG copy is correct
>> (though don't quote me on that).
>>
>> So I think it's always better to have min_size > 50% of size.
>>
> Well, AFAIK there is currently no "quorum" on PG consistency for repair.
> If a PG is corrupt, Ceph simply copies the data from a PG copy whose
> checksum is OK,
> and if no checksum is available, it takes a random copy (maybe it needs a
> manual pg repair in this case).
> But there is nothing like "these 2 copies hold the majority
> (quorum) of checksums".
>
> (Maybe I'm wrong, but 1 or 2 years ago Sage confirmed this on the
> Ceph mailing list.)
> 
> 

I was thinking more about 'inconsistent' PGs; maybe I am wrong,
but how does Ceph cope with multiple 'valid' objects (all checksums are
OK) but with different content, e.g. when there's a power cut during a
write? I assumed that a 'majority' must be
established.

I did not find any document to support that, though, and [0]
only mentions that the authoritative copy is used.

I'll discuss this with my colleagues and check more sources; maybe we can
relax the '> 50%' rule for the warning a little.
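
FWIW, if I read the Ceph defaults correctly, with osd_pool_default_min_size
left at 0 the min_size of a new pool is computed as size - (size/2), which
gives min_size=2 for a size=4 pool, matching what you saw. The values a
pool actually got can be checked with (pool name just an example):

  ceph osd pool get mypool size
  ceph osd pool get mypool min_size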

Thanks :)

0: https://docs.ceph.com/en/latest/rados/operations/pg-repair