[pve-devel] Proxmox 4 feedback
Gilou
contact+dev at gilouweb.com
Fri Oct 9 20:14:07 CEST 2015
On 09/10/2015 18:36, Gilou wrote:
> On 09/10/2015 18:21, Dietmar Maurer wrote:
>>> So I tried again.. HA doesn't work.
>>> Both resources are now frozen (?), and they didn't restart... Even after
>>> 5 minutes...
>>> service vm:102 (pve1, freeze)
>>> service vm:303 (pve1, freeze)
>>
>> The question is why they are frozen. The only action which
>> puts them to 'freeze' is when you shutdown a node.
>>
>
> I pulled the ethernet cables out of the to-be-failing node when I
> tested. It didn't shut down. I plugged them back in 20 minutes later.
> They were down (so I guess the fencing worked). But still?
>
OK, so I did a fresh reinstall of 3 nodes from the PVE 4 ISO. They use a
single NIC to communicate with an NFS server and with each other. The
cluster is up, and one VM is protected:
# ha-manager status
quorum OK
master pve1 (active, Fri Oct 9 19:55:06 2015)
lrm pve1 (active, Fri Oct 9 19:55:12 2015)
lrm pve2 (active, Fri Oct 9 19:55:07 2015)
lrm pve3 (active, Fri Oct 9 19:55:10 2015)
service vm:100 (pve2, started)
# pvecm status
Quorum information
------------------
Date: Fri Oct 9 19:55:22 2015
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 12
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.44.129
0x00000003 1 192.168.44.132
0x00000001 1 192.168.44.143 (local)
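For reference, the "Quorum: 2" figure above follows from a simple majority of the expected votes. A minimal sketch of that arithmetic, assuming a default votequorum setup with no special options such as two_node (the function names are made up for illustration):

```python
def votequorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under a simple majority rule."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be positive")
    # Strict majority: more than half of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return total_votes >= votequorum_threshold(expected_votes)


# Values taken from the pvecm status output above.
print(votequorum_threshold(3))   # threshold for 3 expected votes
print(is_quorate(3, 3))          # all three nodes present
print(is_quorate(1, 3))          # a node isolated from the other two
```

With 3 expected votes the threshold is 2, so a single node cut off from the other two is not quorate on its own.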
On one of the nodes (incidentally, the one running the HA VM), I already
get these:
Oct 09 19:55:07 pve2 pve-ha-lrm[1211]: watchdog update failed - Broken pipe
Not good.
I tried to migrate to pve1 to see what happens:
Executing HA migrate for VM 100 to node pve1
unable to open file '/etc/pve/ha/crm_commands.tmp.3377' - No such file or directory
TASK ERROR: command 'ha-manager migrate vm:100 pve1' failed: exit code 2
OK... so we can't migrate running HA VMs? What did I get wrong here?
So, I remove the VM from HA and migrate it to pve1 to see what happens. It
works. OK. I stop the VM and enable HA. It won't start.
service vm:100 (pve1, freeze)
OK. And now, on pve1:
Oct 09 19:59:16 pve1 pve-ha-crm[1202]: watchdog update failed - Broken pipe
OK... Let's try pve3: cold migrate without HA, then enable HA again.
Interesting, now we have:
# ha-manager status
quorum OK
master pve1 (active, Fri Oct 9 20:09:46 2015)
lrm pve1 (old timestamp - dead?, Fri Oct 9 19:58:57 2015)
lrm pve2 (active, Fri Oct 9 20:09:47 2015)
lrm pve3 (active, Fri Oct 9 20:09:50 2015)
service vm:100 (pve3, started)
Why is pve1 not reporting properly...
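To spot these states without reading every line, the status text can be scanned mechanically. The following is only a sketch: it assumes the line layout seen in the outputs in this thread, and `find_problems` is a made-up helper, not part of ha-manager. The sample combines the status output above with the frozen service line reported earlier.

```python
import re

# Matches lines such as "lrm pve1 (active, Fri Oct 9 19:55:12 2015)"
# and "service vm:100 (pve1, freeze)". Layout assumed from this thread.
LINE_RE = re.compile(
    r"^(?P<kind>master|lrm|service)\s+(?P<name>\S+)\s+\((?P<detail>.*)\)\s*$"
)


def find_problems(status_text):
    """Return (kind, name, state) for stale LRM/master entries and frozen services."""
    problems = []
    for raw in status_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # e.g. the "quorum OK" line
        kind, name, detail = m.group("kind", "name", "detail")
        first, _, rest = detail.partition(",")
        if kind == "service":
            # For services the detail is "<node>, <state>".
            state = rest.strip()
            if state == "freeze":
                problems.append((kind, name, state))
        else:
            # For master/lrm the detail is "<state>, <timestamp>".
            state = first.strip()
            if state.startswith("old timestamp"):
                problems.append((kind, name, state))
    return problems


SAMPLE = """\
quorum OK
master pve1 (active, Fri Oct 9 20:09:46 2015)
lrm pve1 (old timestamp - dead?, Fri Oct 9 19:58:57 2015)
lrm pve2 (active, Fri Oct 9 20:09:47 2015)
lrm pve3 (active, Fri Oct 9 20:09:50 2015)
service vm:100 (pve1, freeze)
"""

for problem in find_problems(SAMPLE):
    print(problem)
```

Only the two states seen in this thread are flagged; other states are ignored rather than guessed at.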
And now on 3 nodes:
Oct 09 20:10:40 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:10:50 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
Oct 09 20:11:00 pve3 pve-ha-lrm[1208]: watchdog update failed - Broken pipe
WTF? omping reports that multicast is getting through, so I'm not sure what
the issue would be... It worked on 3.4 on the same physical setup.
So?