[PVE-User] CEPH: How to remove an OSD without experiencing inactive placement groups
Eneko Lacunza
elacunza at binovo.es
Tue Dec 30 11:45:52 CET 2014
Hi All,
Sorry for the delay getting back; I've been on holiday, and yesterday
was too busy to catch up with the list.
On 19/12/14 19:54, Adam Thompson wrote:
> On 14-12-19 12:46 PM, Chris Murray wrote:
>>
>> Thank you. From what I'm reading, those options affect the amount of
>> concurrent recovery that is happening. Forgive my ignorance, but how
>> do they address the 78 placement groups which were inactive from the
>> beginning of the process and still inactive past the end of it?
>>
>> My Google search for the following doesn't turn up much:
>> "stuck inactive" "osd max backfills" "osd recovery max active"
>>
>> I don't understand why these would become 'stuck inactive' until I
>> brought the OSD up again. If it were a case of the lack of IO in the
>> pool getting in the way of recovery (which I can understand with only
>> nine disks), why were there 78 pgs inactive from the beginning, then
>> (presumably the same) 78 at the end? In that situation I might expect
>> the VMs to be slow, and that part-way through, or at the end of the
>> process once the IO had subsided, CEPH would move them back to one of
>> the active states. I'm not familiar with the inner workings of CEPH,
>> and they are probably complex enough to go over my head anyway; I'm
>> just trying to understand roughly what it's chosen to do there and
>> why. I can see why those tunables might improve responsiveness during
>> the recovery process, though.
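[Interjecting for anyone following along: those two settings can be
changed at runtime, along these lines; the values here are only
illustrative, not a recommendation:

  # limit concurrent backfill/recovery per OSD (illustrative values)
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

They only throttle how much recovery runs in parallel, which is why, as
Chris says, they shouldn't by themselves explain pgs stuck inactive.]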
>
> AFAIK you're exactly right about those settings.
> What I found was that the only way to work around it was to adjust
> the "size" and "min_size" pool options to "1" before removing the
> OSD, then set them back to whatever you wanted after the removal.
> I think what's happening is that CEPH is noticing that there are a
> bunch of pgs that, while replicated elsewhere, still have valid
> copies that are now offline... not 100% sure.
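[If anyone wants to try Adam's workaround, here is a rough sketch;
"mypool" and the OSD id "3" are placeholders, and I haven't verified
this exact sequence myself:

  # drop replication so recovery doesn't wait on the departing OSD
  ceph osd pool set mypool size 1
  ceph osd pool set mypool min_size 1

  # take the OSD out of service (stop the osd.3 daemon on its node too)
  ceph osd out 3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3

  # restore the replication you actually want
  ceph osd pool set mypool size 2
  ceph osd pool set mypool min_size 1

Note that running with size=1, even briefly, means no redundancy at
all, so there is a real trade-off here.]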
>
> I wish sheepdog would hurry up and mature; it's much less complicated
> for small-scale situations (1<n<32 hosts) like the ones you and I are
> running.
> After ignoring multiple warnings from Proxmox staff, I configured
> sheepdog, saw fantastic performance (esp. compared to CEPH) and ...
> promptly got burned when the next update changed the metadata format
> with *no* in-place upgrade option. (But until then it was awesome.)
>
> CEPH is a solid option, and I'm glad PVE includes it, but it's very
> big and complex and cumbersome for low disk-count, low host-count
> setups. (E.g. I have 4 hosts, with 2 OSDs each. CEPH isn't really
> designed to scale down that small, at least not very well.)
>
I have experienced similar problems: with a pool of size=3, after
changing it to size=2, Ceph wouldn't show HEALTH_OK. I had to change to
size=1, then back to size=2, to get HEALTH_OK again.
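For reference, the standard commands to see which pgs are holding back
HEALTH_OK (these apply as-is, no placeholders):

  ceph -s                      # overall cluster and pg state summary
  ceph health detail           # lists the pgs behind a HEALTH_WARN
  ceph pg dump_stuck inactive  # dumps pgs stuck in an inactive state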
I think that because Ceph is developed and tested on much larger setups
(in nodes and disks), small setups like ours are hitting some rough
edges and corner cases. :(
Anyway, I like what I've seen so far, and the integration in Proxmox is
also very convenient (I haven't checked sheepdog/glusterfs).
Cheers
Eneko
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997 / 943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es