[PVE-User] CEPH: How to remove an OSD without experiencing inactive placement groups
Chris Murray
chrismurray84 at gmail.com
Fri Dec 19 11:31:43 CET 2014
I'm starting to familiarise myself with CEPH, and am very impressed with
how it's been packaged into Proxmox. Very easy to set up and administer,
thank you. This may be a CEPH question at heart, but I'll ask here in
case it's related to the implementation in Proxmox.
I might be misunderstanding in/out/up/down, but what is the correct
procedure for OSD removal?
I have three hosts, each with three OSDs. In addition to the usual three
pools, there's an additional 'vmpool' pool. All four have size=3 and
min_size=1. The disks vary quite a bit in size, and possibly in health.
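For reference, the size/min_size values above are as reported by commands
along the lines of the following (run per pool); apologies if there's a more
canonical way to check:

  ceph osd pool get vmpool size
  ceph osd pool get vmpool min_size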
There's a mapping to 'vmpool' from another Proxmox cluster, upon which
some virtual machines live.
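That mapping is just an RBD storage entry in /etc/pve/storage.cfg on the
other cluster; from memory it looks roughly like this (storage name is
illustrative, with the keyring in /etc/pve/priv/ceph/<name>.keyring):

  rbd: ceph-vm
       monhost 192.168.12.25;192.168.12.26;192.168.12.27
       pool vmpool
       content images
       username admin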
So, the pool works, but I want to remove OSD.0 on the first CEPH node.
I mark the OSD as 'down' and 'out' (although I can't remember which I did
first), a load of I/O starts, and the VMs become unresponsive. They
aren't very busy virtual machines.
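If it matters, what I did was roughly the following (I can't be sure of the
exact order, and some of it may have been via the GUI):

  ceph osd out 0
  service ceph stop osd.0    # which is what shows it as 'down', I believe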
'ceph status' looks as follows. Note the 78 stuck inactive placement
groups.
cluster e3dd7a1a-bd5f-43fe-a06f-58e830b93b7a
health HEALTH_WARN 48 pgs backfill; 2 pgs backfilling; 340 pgs
degraded; 20 pgs recovering; 123 pgs recovery_wait; 78 pgs stuck
inactive; 613 pgs stuck unclean; 20 requests are blocked > 32 sec;
recovery 158823/691378 objects degraded (22.972%)
monmap e3: 3 mons at
{0=192.168.12.25:6789/0,1=192.168.12.26:6789/0,2=192.168.12.27:6789/0},
election epoch 50, quorum 0,1,2 0,1,2
osdmap e572: 9 osds: 8 up, 8 in
pgmap v96969: 1216 pgs, 4 pools, 888 GB data, 223 kobjects
2249 GB used, 7486 GB / 9736 GB avail
158823/691378 objects degraded (22.972%)
8 active+recovering+remapped
78 inactive
72 active+recovery_wait
603 active+clean
2 active+degraded+remapped+backfilling
12 active+recovering
290 active+degraded
52 active+remapped
51 active+recovery_wait+remapped
48 active+degraded+remapped+wait_backfill
recovery io 17591 kB/s, 4 objects/s
I leave this overnight and find that the same 78 inactive PGs remain once
the recovery has apparently finished.
cluster e3dd7a1a-bd5f-43fe-a06f-58e830b93b7a
health HEALTH_WARN 290 pgs degraded; 78 pgs stuck inactive; 496 pgs
stuck unclean; 4 requests are blocked > 32 sec; recovery 69696/685356
objects degraded (10.169%)
monmap e3: 3 mons at
{0=192.168.12.25:6789/0,1=192.168.12.26:6789/0,2=192.168.12.27:6789/0},
election epoch 50, quorum 0,1,2 0,1,2
osdmap e669: 9 osds: 8 up, 8 in
pgmap v100175: 1216 pgs, 4 pools, 888 GB data, 223 kobjects
2408 GB used, 7327 GB / 9736 GB avail
69696/685356 objects degraded (10.169%)
78 inactive
720 active+clean
290 active+degraded
128 active+remapped
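In case it helps with diagnosis, I can also post output from the likes of:

  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg <pgid> query       # for one of the stuck PGs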
I started the OSD to bring it back 'up'. It's still 'out'.
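(Started from the node's shell with something like the following; I assume
the GUI 'Start' button amounts to the same thing.)

  service ceph start osd.0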
cluster e3dd7a1a-bd5f-43fe-a06f-58e830b93b7a
health HEALTH_WARN 59 pgs degraded; 496 pgs stuck unclean; recovery
30513/688554 objects degraded (4.431%)
monmap e3: 3 mons at
{0=192.168.12.25:6789/0,1=192.168.12.26:6789/0,2=192.168.12.27:6789/0},
election epoch 50, quorum 0,1,2 0,1,2
osdmap e671: 9 osds: 9 up, 8 in
pgmap v103181: 1216 pgs, 4 pools, 892 GB data, 224 kobjects
2408 GB used, 7327 GB / 9736 GB avail
30513/688554 objects degraded (4.431%)
720 active+clean
59 active+degraded
437 active+remapped
client io 2303 kB/s rd, 153 kB/s wr, 85 op/s
No pgs marked inactive now. I stop the OSD. It's now 'down' and 'out'
again, as it was earlier. At this point, I start my virtual machines
again, which now function.
cluster e3dd7a1a-bd5f-43fe-a06f-58e830b93b7a
health HEALTH_WARN 368 pgs degraded; 496 pgs stuck unclean;
recovery 83332/688554 objects degraded (12.102%)
monmap e3: 3 mons at
{0=192.168.12.25:6789/0,1=192.168.12.26:6789/0,2=192.168.12.27:6789/0},
election epoch 50, quorum 0,1,2 0,1,2
osdmap e673: 9 osds: 8 up, 8 in
pgmap v103248: 1216 pgs, 4 pools, 892 GB data, 224 kobjects
2408 GB used, 7327 GB / 9736 GB avail
83332/688554 objects degraded (12.102%)
720 active+clean
368 active+degraded
128 active+remapped
client io 19845 B/s wr, 6 op/s
I then remove the OSD, and data starts moving around, as I'd expect.
The VMs are slow, but they're working, which is good. :-)
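('Remove' here being the final removal from the cluster and CRUSH map; as
far as I understand it, the underlying Ceph commands are along these lines:)

  ceph osd crush remove osd.0
  ceph auth del osd.0
  ceph osd rm 0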
cluster e3dd7a1a-bd5f-43fe-a06f-58e830b93b7a
health HEALTH_WARN 35 pgs backfill; 8 pgs backfilling; 43 pgs
degraded; 17 pgs recovering; 122 pgs recovery_wait; 631 pgs stuck
unclean; 1 requests are blocked > 32 sec; recovery 295039/709243 objects
degraded (41.599%)
monmap e3: 3 mons at
{0=192.168.12.25:6789/0,1=192.168.12.26:6789/0,2=192.168.12.27:6789/0},
election epoch 50, quorum 0,1,2 0,1,2
osdmap e690: 8 osds: 8 up, 8 in
pgmap v103723: 1216 pgs, 4 pools, 892 GB data, 224 kobjects
2412 GB used, 7323 GB / 9736 GB avail
295039/709243 objects degraded (41.599%)
401 active
122 active+recovery_wait
13 active+degraded+remapped
567 active+clean
11 active+remapped+wait_backfill
6 active+degraded+remapped+backfilling
17 active+recovering
53 active+remapped
24 active+degraded+remapped+wait_backfill
2 active+remapped+backfilling
recovery io 197 MB/s, 49 objects/s
client io 7721 B/s wr, 2 op/s
--------
My question is: what is the correct procedure for removing an OSD, and
why would the actions above have rendered placement groups temporarily
'blocked' (for want of a better word) when other replicas of the data
were available in the pool (and must have been, for the process to
ultimately complete)? What if the same sequence of events happened
during an actual failure and it was not possible to bring the OSD back
'up' first, e.g. a disk failure followed by an entire host failure?
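For what it's worth, the order I've pieced together from the Ceph
documentation is to mark the OSD out while it is still up, wait for the
data to migrate off and the cluster to return to active+clean, and only
then stop the daemon and remove it:

  ceph osd out 0               # data drains while the OSD can still serve its replicas
  ceph status                  # watch/repeat until recovery has finished
  service ceph stop osd.0
  ceph osd crush remove osd.0
  ceph auth del osd.0
  ceph osd rm 0

Is that also the intended sequence under Proxmox?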
I understand this is an emerging technology under active development; I
just want to check that I'm not missing anything obvious and haven't
fundamentally misunderstood how it works. I didn't expect the loss of
one of nine devices in the pool to halt I/O, especially when every object
exists three times.
Thanks in advance,
Chris