[PVE-User] Missing node in ha-manager
Thomas Lamprecht
t.lamprecht at proxmox.com
Tue May 2 09:05:43 CEST 2017
Hi,
On 05/01/2017 11:45 PM, Mark Schouten wrote:
> Hi,
>
> I recently added a new node to a cluster which is also running with HA. The fourth node seems to be working fine, but one of the other nodes is confused. pvecm nodes shows the full hostname for the new node, and the short one for the existing nodes. So probably a result of an imperfect /etc/hosts-file. I corrected the hosts-file on all nodes, but pvecm nodes still shows the incorrect output.
No, the node names for the cluster are not resolved from the /etc/hosts
file, but from /etc/pve/corosync.conf (either the `name` property or, if
that is not set, the ring0_addr property). The hosts file can help to
resolve the node names to their IPs. Normally corosync can do that over
the multicast group, but it's considered good practice to have a valid
node name to IP mapping in /etc/hosts nevertheless.
Can you check that the config looks the same on all nodes?
Any difference between the working and the misbehaving nodes would be
especially interesting.
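To make that comparison easier, here is a minimal sketch of a shell helper that prints the effective name of each node entry in a corosync.conf, following the rule above (`name` if set, otherwise ring0_addr). The function name `corosync_node_names` is made up for illustration, and the awk parsing assumes the usual one-property-per-line layout of the file; it is not a full corosync.conf parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): list the effective node name
# for every "node { ... }" block in the given corosync.conf.
# Assumes one "key: value" property per line inside each node block.
corosync_node_names() {
    awk '
        # Start of a node block; "nodelist {" does not match this pattern.
        /^[[:space:]]*node[[:space:]]*[{]/ { in_node = 1; name = ""; addr = ""; next }
        in_node && /^[[:space:]]*name:/       { name = $2 }
        in_node && /^[[:space:]]*ring0_addr:/ { addr = $2 }
        # End of the node block: prefer name, fall back to ring0_addr.
        in_node && /[}]/ { print (name != "" ? name : addr); in_node = 0 }
    ' "$1"
}
```

Running `corosync_node_names /etc/pve/corosync.conf` on a working node and on the misbehaving one, then comparing the two lists, should show quickly whether the new node appears under the same name everywhere.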
>
> Also, in HA, the new node does not exist on the misbehaving node. In the logs I see:
> May 01 09:38:36 proxmox03 pve-ha-crm[2777]: node 'proxmox04': state changed from 'unknown' => 'gone'
> May 01 09:38:36 proxmox03 pve-ha-crm[2777]: crm command error - node not online: migrate vm:222 proxmox04
>
> Which is a result of http://pve-devel.pve.proxmox.narkive.com/Eafo8CAz/patch-pve-ha-manager-handle-node-deletion-in-the-ha-stack . I understand why this is done, but I would like to fix this without rebooting the misbehaving node. Can I restart pve-ha-crm to make things right again? /etc/pve/.members on the misbehaving node does not mention the new node at all…
In general you could just restart the CRM, but the CRM is capable of
syncing in new nodes while running, so there shouldn't be any need for
that. The patches you linked also do not change that, AFAIK.
As /etc/pve/.members doesn't show the new node on the misbehaving one,
the problem lies elsewhere.
Who is the current master? Can you give me the output of:
# ha-manager status
# pvecm status
# cat /etc/pve/corosync.conf
from the misbehaving node and an "OK" one? Remember to redact public IP
addresses.
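For the redaction, a crude sketch of a filter that could be put at the end of the pipeline. The name `redact_ips` is hypothetical, and the pattern masks every IPv4-looking string, private addresses and dotted version numbers included, so check the result by hand before posting.

```shell
#!/bin/sh
# Hypothetical filter: replace anything that looks like a dotted-quad IPv4
# address on stdin with x.x.x.x. It does not distinguish public from private
# addresses and does not handle IPv6.
redact_ips() {
    sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g'
}
```

For example: `{ ha-manager status; pvecm status; cat /etc/pve/corosync.conf; } | redact_ips`.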
cheers,
Thomas