[PVE-User] Cluster problem...

Fri Jul 11 19:43:37 CEST 2014

Hi...

Here's another problem I've run into with cluster administration on Proxmox...

I set up two VirtualBox VMs running the latest PVE version...
My laptop, an Intel Core i5 running Ubuntu, acts as the storage server with a
TGT target...
I am able to create the cluster and define the quorum disk...
However, when I reboot both nodes, I get this error:

Starting qdiskd [ FAILED ]...
No local IP Address has been set...

I suspect it is something with the DLM lock or a similar issue...

But, if I go to CLI and check:

pve01:~# /etc/init.d/cman status
qdiskd is stopped

pve01:~# /etc/init.d/rgmanager status
rgmanager is stopped

On both nodes, cman and rgmanager are dead!

So, if I type the command sequence below:

/etc/init.d/cman start
/etc/init.d/rgmanager start

on both nodes, the cluster comes online...

I have run into this issue on physical machines too...

At first, I thought it could be a problem with the VirtualBox VMs, but it is
not...

So, as a workaround, I put these commands in rc.local:

/etc/init.d/cman stop
/etc/init.d/rgmanager stop

/etc/init.d/cman start
/etc/init.d/rgmanager start

in order to bring the cluster online...
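
The same workaround could be wrapped in a small function, sketched below. It differs from the commands above in one respect: it stops rgmanager before cman, on the assumption that rgmanager depends on cman. The names INIT_DIR and RUN are hypothetical helpers for this sketch, not part of PVE.

```shell
#!/bin/sh
# Hypothetical helper for rc.local; INIT_DIR and RUN are illustrative names.
INIT_DIR="${INIT_DIR:-/etc/init.d}"
RUN="${RUN:-}"   # set RUN=echo to print the commands instead of running them

restart_cluster_stack() {
    # Stop in reverse dependency order (assumption: rgmanager relies on cman).
    $RUN "$INIT_DIR/rgmanager" stop
    $RUN "$INIT_DIR/cman" stop
    # Start cman first; only start rgmanager if cman came up.
    $RUN "$INIT_DIR/cman" start || return 1
    $RUN "$INIT_DIR/rgmanager" start
}
```

Calling restart_cluster_stack from rc.local would then replace the four lines above; running it once with RUN=echo shows the commands without touching the services.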

Here's the cluster.conf:

<?xml version="1.0"?>
<cluster config_version="35" name="CLUSTER">
  <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <quorumd allow_kill="0" interval="3" label="quorum" tko="10" votes="1">
    <heuristic interval="3" program="ping 192.168.1.100 -c1 -w1" score="1" tko="4"/>
    <heuristic interval="3" program="ip addr | grep eth0 | grep -q UP" score="2" tko="3"/>
  </quorumd>
  <totem token="54000"/>
  <clusternodes>
    <clusternode name="pve01" nodeid="1" votes="1">
    </clusternode>
    <clusternode name="pve02" nodeid="2" votes="1">
    </clusternode>
  </clusternodes>
  <rm>
   <failoverdomains>
     <failoverdomain name="serverfailover" ordered="1" restricted="0">
       <failoverdomainnode name="pve01" priority="1"/>
       <failoverdomainnode name="pve02" priority="2"/>
     </failoverdomain>
   </failoverdomains>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
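
For reference, the vote arithmetic behind this configuration can be sketched as follows. This assumes cman's usual majority rule (quorum = expected_votes / 2 + 1, integer division) and qdiskd's default min_score of floor((n + 1) / 2) when none is set. The variable names are illustrative only.

```shell
#!/bin/sh
# Values read off the cluster.conf above; variable names are illustrative.
node_votes=$((1 + 1))   # pve01 + pve02, votes="1" each
qdisk_votes=1           # <quorumd votes="1">
expected_votes=$((node_votes + qdisk_votes))

# Assumption: cman requires a simple majority of expected_votes.
quorum=$((expected_votes / 2 + 1))

# Assumption: with no min_score set, qdiskd uses floor((n + 1) / 2),
# where n is the sum of the heuristic scores (1 for ping, 2 for eth0).
score_sum=$((1 + 2))
min_score=$(( (score_sum + 1) / 2 ))

echo "expected_votes=$expected_votes quorum=$quorum min_score=$min_score"
```

Under these assumptions, one node plus the quorum disk (2 votes) is enough to stay quorate, and the eth0 heuristic alone (score 2) meets the minimum score while the ping heuristic alone (score 1) does not.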

And /etc/default/redhat-cluster-pve has the content:

FENCE_JOIN="yes"

After running this:

/etc/init.d/cman stop
/etc/init.d/cman start
/etc/init.d/rgmanager stop
/etc/init.d/rgmanager start
/etc/init.d/pve-cluster stop
/etc/init.d/pve-cluster start
/etc/init.d/pveproxy stop
/etc/init.d/pveproxy start

My cluster comes online, but the weirder issue is this:

clustat
Cluster Status for CLUSTER @ Thu Jul 10 11:43:30 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve01                                       1 Online, Local, rgmanager
 pve02                                       2 Online
 /dev/block/8:33                             0 Online, Quorum Disk

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 pvevm:100                      pve01                          starting

I removed that VM, 100... It doesn't exist anymore... But it is still there,
according to clustat!!!

A few seconds later, I ran clustat again and got this:

pve01:~# clustat
Cluster Status for CLUSTER @ Thu Jul 10 11:43:46 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve01                                       1 Online, Local, rgmanager
 pve02                                       2 Online
 /dev/block/8:33                             0 Online, Quorum Disk

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 pvevm:100                      (none)                         recoverable

And finally:

clustat
Cluster Status for CLUSTER @ Thu Jul 10 11:44:07 2014
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve01                                       1 Online, Local, rgmanager
 pve02                                       2 Online, rgmanager
 /dev/block/8:33                             0 Online, Quorum Disk

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 pvevm:100                      (pve01)                        failed

But, again, there is no VM...
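
One thing that may be worth checking: the cluster.conf shown earlier still contains a pvevm entry with vmid="100" inside the rm section, which could be why clustat keeps listing the service. The sketch below lists the VM IDs a cluster.conf still declares as HA-managed. The function name ha_vmids is hypothetical, and the default path /etc/pve/cluster.conf is an assumption, so pass the real file as the first argument if it differs.

```shell
#!/bin/sh
# List the VM IDs that a cluster.conf still declares as HA-managed (<pvevm .../>).
# ha_vmids is a hypothetical helper; the default path is an assumption.
ha_vmids() {
    conf="${1:-/etc/pve/cluster.conf}"
    # Print only the number inside vmid="..." on lines containing a pvevm element.
    sed -n 's/.*<pvevm[^>]*vmid="\([0-9][0-9]*\)".*/\1/p' "$conf"
}
```

If ha_vmids prints an ID for a VM that no longer exists, the stale entry in the configuration would be a plausible suspect.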

Is there something I am doing wrong?

-- 
Gilberto Ferreira