[PVE-User] Missing node in ha-manager
Thomas Lamprecht
t.lamprecht at proxmox.com
Fri May 5 18:14:15 CEST 2017
Hi,
It looks like the PVE cluster filesystem is out of sync or has a
problematic connection to corosync.
From corosync's standpoint the node addition worked and is consistent
on all nodes, which is good.
Now, some logs from `proxmox03`, the problematic node, would be nice:
# journalctl -u corosync -u pve-cluster
As I may not get back to you until Tuesday, here is a likely
resolution now:
I'd suggest restarting pve-cluster, *but* as the pve-ha-lrm and its
watchdog are active on node proxmox03,
this could result in a node reset *if* pve-cluster cannot connect back to
corosync or fails in another way.
This is not likely, but if you cannot schedule a maintenance window you
should guard against it.
First restart the pve-ha-crm so that another node takes up the master role.
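To confirm that the master role has actually moved after the restart, the
`master` line of `ha-manager status` can be read out. A minimal sketch,
assuming the usual output layout with a line like
`master <node> (<state>, <timestamp>)`; the helper name `current_master` is
made up for illustration:

```shell
#!/bin/sh
# Print the node named on the "master" line of `ha-manager status`
# output read from stdin.
# Assumption: the output contains a line of the form
#   master <node> (<state>, <timestamp>)
current_master() {
    awk '$1 == "master" { print $2; exit }'
}

# Usage on the node in question (not run here):
#   systemctl restart pve-ha-crm
#   ha-manager status | current_master
```

If this still prints proxmox03 after a short wait, the master role has not
moved yet.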
Then either move all HA services off this node or remove them
temporarily, and stop the pve-ha-lrm and pve-ha-crm services:
# systemctl stop pve-ha-lrm pve-ha-crm
Now restart the pve-cluster and corosync (just to be sure) services:
# systemctl restart pve-cluster corosync
and check:
# cat /etc/pve/.members
It should show all members and the same version number as on the other
nodes. If that's the case, start pve-ha-lrm and pve-ha-crm again;
all should be clear again then.
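The comparison of `.members` between nodes can be condensed to one line per
node. A rough sketch, assuming the layout pmxcfs normally writes (top-level
keys on their own lines, one nodelist entry per line); it matches text with
sed rather than parsing JSON, and `members_summary` is a hypothetical helper
name. Both the top-level version and the one inside the `cluster` object are
printed, so they can be compared side by side:

```shell
#!/bin/sh
# Summarise a .members-style file on one line:
# top-level version, cluster version, sorted node names.
# Assumption: one top-level key per line and one nodelist entry per
# line. This is plain text matching, not a real JSON parser.
members_summary() {
    file=$1
    # Top-level "version" starts its own line.
    version=$(sed -n 's/^"version": *\([0-9][0-9]*\).*/\1/p' "$file")
    # The cluster version sits inside the one-line "cluster" object.
    cluster_version=$(sed -n 's/^"cluster":.*"version": *\([0-9][0-9]*\).*/\1/p' "$file")
    # Each nodelist entry looks like:  "name": { "id": N, ...
    nodes=$(sed -n 's/^ *"\([^"]*\)": *{ *"id".*/\1/p' "$file" | sort | paste -sd, -)
    printf 'version=%s cluster_version=%s nodes=%s\n' \
        "$version" "$cluster_version" "$nodes"
}

# Usage on a node (not run here):
#   members_summary /etc/pve/.members
```

Run it on every node and compare the lines; the node list should be
identical everywhere.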
Oh and sorry for getting back a bit late :)
cheers,
Thomas
On 05/05/2017 09:38 AM, Mark Schouten wrote:
> Thomas, pretty please? :)
>
> On Wed, 2017-05-03 at 09:45 +0200, Mark Schouten wrote:
>> On Tue, 2017-05-02 at 09:05 +0200, Thomas Lamprecht wrote:
>>> Can you check that the config looks the same on all nodes?
>>> Especially the difference between working and misbehaving nodes
>>> would be interesting.
>> Please see the attachment. That includes /etc/pve/.members and
>> /etc/pve/corosync.conf from all nodes. Only the .members file of the
>> misbehaving node is off.
>>
>>> In general you could just restart CRM, but the CRM is capable of
>>> syncing in new nodes while running, so there shouldn't be any need
>>> for that, the patches you linked also do not change that, AFAIK.
>> I would like to do a sync without a restart as well, but what would
>> trigger this?
>>
>>> As /etc/pve/.members doesn't show the new node on the misbehaving
>>> one, the problem is another one.
>>> Who is the current master? Can you give me an output of:
>>> # ha-manager status
>>> # pvecm status
>>> # cat /etc/pve/corosync.conf
>> Output in the attachment. Because the misbehaving node also is the
>> master, output of ha-manager is identical on all nodes.
>>
>>