[PVE-User] inconsistency between rgmanager & pve status

Mon Oct 6 11:17:14 CEST 2014

Hello,

I had some trouble this morning with my storage nodes and I needed to 
restart some VMs.
All VMs restarted fine, except one.

On the Proxmox servers I noticed that rgmanager was still reporting the 
VM status as started.

root@proxmoxt2:~# clustat | grep 140
pvevm:140 proxmoxt2 started

But it was not.

root@proxmoxt2:~# qm list | grep 140
        140 bibbona              stopped    3072              33.19 0

I tried to disable/enable the service with clusvcadm, which didn't work 
(the process seemed to wait forever).
Then I found an old process which seemed to have been stalled for some time.

root     23610 31270  0 Sep28 ?        00:00:06 /usr/bin/perl -w /usr/share/cluster/pvevm status

As soon as I killed the process, rgmanager refreshed the status of the VM.

root@proxmoxt2:~# clustat | grep 140
  pvevm:140                      proxmoxt2 started
root@proxmoxt2:~# kill 23610
root@proxmoxt2:~# clustat | grep 140
  pvevm:140                      proxmoxt2 starting
root@proxmoxt2:~# clustat | grep 140
  pvevm:140                      proxmoxt2 stopping
root@proxmoxt2:~# clustat | grep 140
  pvevm:140                      proxmoxt2 stopping
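
For anyone hitting the same thing, here is a rough sketch of how a stalled 
status check could be spotted before killing it by hand. It is only an 
illustration: the function name stale_pvevm_status and the one-hour 
threshold are made up, and it assumes a ps that supports the "etimes" 
(elapsed seconds) field, as procps-ng does.

```shell
#!/bin/sh
# Print the PIDs of "pvevm status" processes that are at least MIN_AGE
# seconds old.
# Input: lines of "PID ELAPSED_SECONDS COMMAND..." on stdin.
# (Hypothetical helper, not part of Proxmox or rgmanager.)
stale_pvevm_status() {
    min_age="$1"
    awk -v min="$min_age" '
        # $2 is the elapsed time in seconds; match the command text literally.
        $2 + 0 >= min + 0 && index($0, "pvevm status") { print $1 }
    '
}

# Example (not run here): list status checks older than one hour.
#   ps -eo pid=,etimes=,args= | stale_pvevm_status 3600
```

The function only prints PIDs; whether to kill them is left to the admin, 
since a slow check is not necessarily a dead one.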

This is not the first time this has happened, and every time I have to kill 
"pvevm status" to unblock the service.
I'm not sure, but maybe a timeout on "pvevm status" would help?
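
To illustrate what I mean by a timeout, here is a minimal sketch using the 
coreutils "timeout" command. It is not how rgmanager or the pvevm agent 
works today; the wrapper name run_with_timeout and the 60-second limit 
are assumptions for the example, and rgmanager normally calls the agent 
with its own environment, so running it by hand may behave differently.

```shell
#!/bin/sh
# Run a command, but give up after LIMIT seconds.
# Returns 124 if the command was stopped by the time limit (the exit
# status coreutils "timeout" uses for that case).
# (Hypothetical wrapper, not part of Proxmox or rgmanager.)
run_with_timeout() {
    limit="$1"
    shift
    # -k 5: send SIGKILL if the command is still alive 5 seconds after SIGTERM.
    timeout -k 5 "$limit" "$@"
}

# Example (not run here): bound a status check to 60 seconds.
#   run_with_timeout 60 /usr/share/cluster/pvevm status
```

With something like this a hung status check would end with an error 
instead of blocking the service indefinitely.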

-- 
Regards,
Alexandre.