[PVE-User] Hanging storage tasks in all RH based VMs after update

Thu May 3 07:26:53 CEST 2018

Hi Lindsay,

>> I updated my cluster this morning (version info see end of mail) and rebooted all hosts sequentially, live migrating 
>> VMs between hosts. (Six hosts connected via 10GbE, all participating in a Ceph cluster.)
> 
> What's your ceph status? It's probably doing a massive backfill after the rolling reboot. That will kill your IO.
> 

Backfilling shouldn't be the cause, as I always run "ceph osd set noout" before rebooting the servers.
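
For readers unfamiliar with the flag, the sequence can be sketched roughly as follows. This is only an illustration under assumptions: the function names and the CEPH override variable are made up for this example and are not part of any tool; only "ceph osd set noout" and "ceph osd unset noout" are real commands.

```shell
#!/bin/sh
# Sketch only: function names and the CEPH variable are illustrative.
# Set CEPH="echo ceph" to print the commands instead of running them.
CEPH="${CEPH:-ceph}"

# Before rebooting a node: keep Ceph from marking OSDs that go down as "out",
# so that data is not rebalanced away from the node while it is offline.
before_reboot() {
    $CEPH osd set noout
}

# After the node is back and its OSDs are up again: clear the flag.
after_reboot() {
    $CEPH osd unset noout
}
```

The flag is cluster-wide, so setting it once before the first reboot and clearing it once after the last one should be enough.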

When I last checked yesterday it was HEALTH_OK but now it is:

# ceph status
   cluster:
     id:     982484e6-69bf-490c-9b3a-942a179e759b
     health: HEALTH_WARN
             15 slow requests are blocked > 32 sec

   services:
     mon: 6 daemons, quorum 0,1,2,3,px-echo-cluster,px-foxtrott-cluster
     mgr: px-alpha-cluster(active), standbys: px-bravo-cluster, px-charlie-cluster, px-echo-cluster, px-delta-cluster, px-foxtrott-cluster
     osd: 24 osds: 24 up, 24 in

   data:
     pools:   2 pools, 576 pgs
     objects: 111k objects, 401 GB
     usage:   1287 GB used, 11046 GB / 12334 GB avail
     pgs:     576 active+clean

   io:
     client:   60071 B/s wr, 0 op/s rd, 2 op/s wr

I'll look into the slow requests and let you know.
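
A possible starting point for that, as a rough sketch and not a definitive procedure: the function names and the CEPH override variable below are hypothetical, and the OSD id is a placeholder. "ceph health detail" and the admin socket command "dump_ops_in_flight" are existing Ceph commands.

```shell
#!/bin/sh
# Sketch only: function names and the CEPH variable are illustrative.
# Set CEPH="echo ceph" to print the commands instead of running them.
CEPH="${CEPH:-ceph}"

# Show the failing health checks in detail, including which OSDs
# are reported as having blocked requests.
show_health_detail() {
    $CEPH health detail
}

# Dump the operations currently in flight on one OSD.
# This goes through the admin socket, so it has to be run on the
# host that carries the OSD. $1 is the numeric id, e.g. 3 for osd.3.
show_ops_in_flight() {
    $CEPH daemon "osd.$1" dump_ops_in_flight
}
```

For example, "show_ops_in_flight 3" would query osd.3 on the local host.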

Thanks,

	Uwe