[PVE-User] CEPH: How to remove an OSD without experiencing inactive placement groups

Adam Thompson athompso at athompso.net
Fri Dec 19 19:54:26 CET 2014


On 14-12-19 12:46 PM, Chris Murray wrote:
> Hi Eneko,
>
> Thank you. From what I'm reading, those options affect how much recovery happens concurrently. Forgive my ignorance, but how do they address the 78 placement groups which were inactive from the beginning of the process and remained so past the end of it?
>
> My google for the following doesn't turn up much:
> "stuck inactive" "osd max backfills" "osd recovery max active"
>
> I don't understand why these would become 'stuck inactive' until I brought
> the OSD up again. If it were a case of the lack of IO capacity in the pool
> getting in the way of recovery (which I can understand with only nine
> disks), why were there 78 PGs inactive from the beginning, and then
> (presumably the same) 78 at the end? In that situation I might expect the
> VMs to be slow, and that at the end of the process, or part-way through
> once the IO had subsided, CEPH would decide to return them to one of the
> active states. I'm not familiar with the inner workings of CEPH, and they
> are probably complex enough to go over my head anyway; I'm just trying to
> understand roughly what it's chosen to do there and why. I can see why
> those tunables might improve responsiveness during the recovery process,
> though.

AFAIK you're exactly right about those settings.
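For reference, those knobs only throttle how aggressively recovery and
backfill run; they're usually adjusted at runtime roughly like this
(example values, and the exact option names can differ between CEPH
releases):

  # throttle recovery/backfill so client IO stays responsive
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # or persistently, in the [osd] section of ceph.conf:
  #   osd max backfills = 1
  #   osd recovery max active = 1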
The only workaround I found was to set the "size" and "min_size" pool 
options to "1" before removing the OSD, then set them back to whatever 
you want after the OSD is removed.
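Roughly, the sequence looks like this (the pool name "rbd" and "osd.3"
are just placeholders; and be aware that with size=1 there is only a
single copy of your data until you set it back, so you're temporarily
running without redundancy):

  # drop replication so no PG depends on the OSD you're about to remove
  ceph osd pool set rbd size 1
  ceph osd pool set rbd min_size 1

  # mark the OSD out, stop its daemon on the host, then remove it
  ceph osd out osd.3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm osd.3

  # restore replication afterwards and let the cluster backfill
  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2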
I think what's happening is that CEPH notices there are a bunch of 
placement groups that, while replicated elsewhere, still have valid 
data on the OSD that is now offline... not 100% sure.
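If you want to see which PGs CEPH thinks are stuck and why, this is
usually where I'd start (the PG id below is just an example):

  # overall health plus the list of stuck-inactive PGs
  ceph health detail
  ceph pg dump_stuck inactive

  # then ask one of the listed PGs what it's waiting for
  ceph pg 1.2f query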

I wish sheepdog would hurry up and mature; it's much less complicated 
for small-scale setups (1<n<32 hosts) like the ones you and I are running.  
After ignoring multiple warnings from Proxmox staff, I configured 
sheepdog, saw fantastic performance (esp. compared to CEPH) and ... 
promptly got burned when the next update changed the metadata format 
with *no* in-place upgrade option.  (But until then it was awesome.)

CEPH is a solid option, and I'm glad PVE includes it, but it's very big 
and complex and cumbersome for low disk-count, low host-count setups.  
(E.g. I have 4 hosts, with 2 OSDs each.  CEPH isn't really designed to 
scale down that small, at least not very well.)

-- 
-Adam Thompson
  athompso at athompso.net



