[pve-devel] Proxmox 4 feedback

Fri Oct 9 20:14:07 CEST 2015

On 09/10/2015 18:36, Gilou wrote:
> On 09/10/2015 18:21, Dietmar Maurer wrote:
>>> So I tried again.. HA doesn't work.
>>> Both resources are now frozen (?), and they didn't restart... Even after
>>> 5 minutes...
>>> service vm:102 (pve1, freeze)
>>> service vm:303 (pve1, freeze)
>>
>> The question is why they are frozen. The only action that
>> puts them into 'freeze' is shutting down a node.
>>
> 
> I pulled the ethernet cables out of the to-be-failing node when I
> tested. It didn't shut down. I plugged them back in 20 minutes later.
> They were down (so I guess the fencing worked). But still?
> 

OK, so I did a fresh reinstall of 3 nodes from the PVE 4 ISO. They use a
single NIC to communicate with an NFS server and with each other. The
cluster is up, and one VM is protected:
# ha-manager status
quorum OK
master pve1 (active, Fri Oct  9 19:55:06 2015)
lrm pve1 (active, Fri Oct  9 19:55:12 2015)
lrm pve2 (active, Fri Oct  9 19:55:07 2015)
lrm pve3 (active, Fri Oct  9 19:55:10 2015)
service vm:100 (pve2, started)
# pvecm status
Quorum information
------------------
Date:             Fri Oct  9 19:55:22 2015
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.44.129
0x00000003          1 192.168.44.132
0x00000001          1 192.168.44.143 (local)
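
For reference, the "Quorum: 2" line above is consistent with the usual
majority rule, floor(expected_votes / 2) + 1. A minimal Python sketch of
that arithmetic, assuming default votequorum behaviour with no special
options (two_node and the like); the function names are made up for
illustration and are not part of corosync or PVE:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)
```

With 3 expected votes the threshold is 2, so the cluster should stay
quorate with one node down but not with two.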

On one of the nodes (incidentally, the one running the HA VM), I already
get these:
Oct 09 19:55:07 pve2 pve-ha-lrm[1211]: watchdog update failed - Broken pipe

Not good.
I tried to migrate to pve1 to see what happens:
Executing HA migrate for VM 100 to node pve1
unable to open file '/etc/pve/ha/crm_commands.tmp.3377' - No such file or directory
TASK ERROR: command 'ha-manager migrate vm:100 pve1' failed: exit code 2

OK, so we can't migrate running HA VMs? What did I get wrong here?
So I removed the VM from HA and migrated it to pve1 to see what happens.
It works. OK. I stopped the VM and enabled HA. It won't start:
service vm:100 (pve1, freeze)

OK. And now, on pve1:
Oct 09 19:59:16 pve1 pve-ha-crm[1202]: watchdog update failed - Broken pipe

OK... Let's try pve3: cold migrate without HA, then enable HA again.
Interesting, now we have:
# ha-manager status
quorum OK
master pve1 (active, Fri Oct  9 20:09:46 2015)
lrm pve1 (old timestamp - dead?, Fri Oct  9 19:58:57 2015)
lrm pve2 (active, Fri Oct  9 20:09:47 2015)
lrm pve3 (active, Fri Oct  9 20:09:50 2015)
service vm:100 (pve3, started)

Why is pve1 not reporting properly?
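
To spot this kind of state without reading the output by eye, something
like the following could be run over the "ha-manager status" text. This is
only a rough sketch: it assumes the line format is exactly what is pasted
in this mail, and find_problems and both regexes are hypothetical helpers,
not anything shipped with PVE.

```python
import re

# "master pve1 (active, Fri Oct  9 20:09:46 2015)"
# "lrm pve1 (old timestamp - dead?, Fri Oct  9 19:58:57 2015)"
NODE_RE = re.compile(r"^(master|lrm)\s+(\S+)\s+\((.+),\s+(.+)\)$")
# "service vm:100 (pve3, started)"
SERVICE_RE = re.compile(r"^service\s+(\S+)\s+\((\S+),\s+(\S+)\)$")


def find_problems(status_text):
    """Return a list of lines describing non-active nodes and frozen services."""
    problems = []
    for raw in status_text.splitlines():
        line = raw.strip()
        m = NODE_RE.match(line)
        if m:
            role, node, state, _timestamp = m.groups()
            # Anything other than "active" (e.g. "old timestamp - dead?") is flagged.
            if state != "active":
                problems.append(f"{role} {node}: {state}")
            continue
        m = SERVICE_RE.match(line)
        if m:
            sid, node, state = m.groups()
            # Only the "freeze" state seen in this thread is flagged here.
            if state == "freeze":
                problems.append(f"service {sid} on {node}: {state}")
    return problems
```

Fed the status output above, it would flag only the pve1 LRM; fed the
earlier output, it would flag the frozen vm:100.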

And now on 3 nodes:
Oct 09 20:10:40 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:10:50 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:11:00 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
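
To see how widespread these messages are, the journal lines can be tallied
per host and daemon. Again just a sketch in Python, assuming the log format
shown in this mail; count_watchdog_failures is a made-up helper name:

```python
import re
from collections import Counter

# Matches e.g.
# "Oct 09 20:10:40 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe"
WATCHDOG_RE = re.compile(
    r"^\w{3} \d{2} \d{2}:\d{2}:\d{2} (\S+) (pve-ha-(?:lrm|crm))\[\d+\]: "
    r"watchdog update failed"
)


def count_watchdog_failures(log_lines):
    """Count 'watchdog update failed' messages keyed by (host, daemon)."""
    counts = Counter()
    for line in log_lines:
        m = WATCHDOG_RE.match(line)
        if m:
            counts[m.groups()] += 1
    return counts
```

The input would be the collected journal lines from each node, one string
per line.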

Wtf? omping reports that multicast is getting through, so I'm not sure
what the issue would be there... It worked on 3.4 on the same physical
setup. So?