[PVE-User] Missing node in ha-manager

Thomas Lamprecht t.lamprecht at proxmox.com
Tue May 2 09:05:43 CEST 2017


Hi,


On 05/01/2017 11:45 PM, Mark Schouten wrote:
> Hi,
>
> I recently added a new node to a cluster which is also running with HA. The fourth node seems to be working fine, but one of the other nodes is confused. pvecm nodes shows the full hostname for the new node, and the short one for the existing nodes. So probably a result of a imperfect /etc/hosts-file. I corrected the hosts-file on all nodes, but pvecm nodes still shows the incorrect output.

No, the node names for the cluster are not resolved from the /etc/hosts
file, but from /etc/pve/corosync.conf (either the `name` property or, if
that is not set, the ring0_addr property). The hosts file can help to
resolve the node names to their IPs. Normally corosync can do that over
the multicast group, but it's considered good practice to have a valid
node-name-to-IP mapping in /etc/hosts nevertheless.
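
To see which name each node entry would end up with under that rule, here
is a minimal sketch. It assumes the usual layout with one property per
line inside `node { ... }` blocks. The function name corosync_node_names
is made up for this example and is not part of any Proxmox tool.

```shell
# Print the effective name of every node entry in a corosync.conf:
# the `name` property if present, otherwise the `ring0_addr` property.
# Assumes "node {" and the closing "}" each sit on their own line.
corosync_node_names() {
    awk '
        $1 == "node" && $2 == "{"      { in_node = 1; name = ""; addr = ""; next }
        in_node && $1 == "name:"       { name = $2 }
        in_node && $1 == "ring0_addr:" { addr = $2 }
        in_node && $1 == "}"           { print (name != "" ? name : addr); in_node = 0 }
    ' "${1:-/etc/pve/corosync.conf}"
}

# Example: corosync_node_names /etc/pve/corosync.conf
```

An entry without `name` whose ring0_addr is a fully qualified hostname
would be listed with that full hostname, while entries with a short `name`
keep the short one.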

Can you check that the config looks the same on all nodes?
Any difference between the working and the misbehaving nodes would be
especially interesting.
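
One possible way to do that comparison is to copy the file from two nodes
and check the copies locally. This is only a sketch. The helper names are
invented for illustration, and it assumes the totem section carries a
`config_version:` line, as a Proxmox-generated corosync.conf normally does.

```shell
# Print the config_version value found in a corosync.conf copy.
corosync_config_version() {
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Compare two local copies of corosync.conf.
# Returns 0 if they are byte-identical, 1 otherwise.
compare_corosync_conf() {
    if cmp -s "$1" "$2"; then
        echo "identical (config_version $(corosync_config_version "$1"))"
    else
        echo "different: config_version $(corosync_config_version "$1") vs $(corosync_config_version "$2")"
        return 1
    fi
}

# Example, with the host names taken from the log lines in this thread:
#   scp root@proxmox03:/etc/pve/corosync.conf /tmp/corosync.proxmox03.conf
#   scp root@proxmox04:/etc/pve/corosync.conf /tmp/corosync.proxmox04.conf
#   compare_corosync_conf /tmp/corosync.proxmox03.conf /tmp/corosync.proxmox04.conf
```

If the copies differ, `diff -u` on the two files shows where.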

>
> Also, in HA, the new node does not exist on the misbehaving node.  In the logs I see:
> May 01 09:38:36 proxmox03 pve-ha-crm[2777]: node 'proxmox04': state changed from 'unknown' => 'gone'
> May 01 09:38:36 proxmox03 pve-ha-crm[2777]: crm command error - node not online: migrate vm:222 proxmox04
>
> Which is a result of http://pve-devel.pve.proxmox.narkive.com/Eafo8CAz/patch-pve-ha-manager-handle-node-deletion-in-the-ha-stack . I understand why this is done, but I would like to fix this without rebooting the misbehaving node. Can I restart pve-ha-crm to make things right again? /etc/pve/.members on the misbehaving node does not mention the new node at all…

In general you could just restart the CRM, but the CRM is capable of
syncing in new nodes while running, so there shouldn't be any need for
that. The patches you linked also do not change that, AFAIK.
As /etc/pve/.members doesn't show the new node on the misbehaving one,
the problem lies elsewhere.
Who is the current master? Can you give me the output of:
# ha-manager status
# pvecm status
# cat /etc/pve/corosync.conf

From the misbehaving node and an "OK" one? Remember to redact public IP
addresses.
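
For the redaction, a rough sketch of a filter follows. The name
redact_ipv4 is made up for this example. Note that it blanks out every
IPv4-looking string, private addresses and four-part version numbers
included, so check the result before posting.

```shell
# Replace anything that looks like a dotted-quad IPv4 address with x.x.x.x.
# Deliberately crude: it does not distinguish public from private ranges.
redact_ipv4() {
    sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/x.x.x.x/g'
}

# Example, run on each of the two nodes:
#   { ha-manager status; pvecm status; cat /etc/pve/corosync.conf; } 2>&1 \
#       | redact_ipv4 > "report-$(hostname).txt"
```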

cheers,
Thomas



