[PVE-User] Missing node in ha-manager

Thomas Lamprecht t.lamprecht at proxmox.com
Fri May 5 18:14:15 CEST 2017


Hi,

It looks like the PVE cluster filesystem is out of sync or has a 
problematic connection to corosync.
From corosync's standpoint the node addition worked and is consistent 
on all nodes, which is good.

Now, some logs from `proxmox03`, the problematic node, would be nice:

# journalctl -u corosync -u pve-cluster

As I may not be able to get back to you until Tuesday, here is a likely 
resolution now:

I'd suggest restarting pve-cluster, *but* as the pve-ha-lrm and its 
watchdog are active on node proxmox03,
this could result in a node reset *if* pve-cluster cannot reconnect to 
corosync or fails in another way.
This is not likely, but if you cannot schedule a maintenance window you 
should guard against it.
First restart the pve-ha-crm so that another node takes over the master role.
Then either move all HA services off this node or remove them 
temporarily, and stop the pve-ha-lrm and pve-ha-crm services:

# systemctl stop pve-ha-lrm pve-ha-crm
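
Before the restart below, it may be worth confirming that both HA services really are inactive, since the reset risk comes from the LRM and its watchdog still running. A minimal sketch, assuming a POSIX shell; the `SYSTEMCTL` variable and the function name are hypothetical helpers for illustration, not part of PVE:

```shell
#!/bin/sh
# Sketch: refuse to continue while any HA service is still active.
# SYSTEMCTL is a hypothetical override so the check can be tried without systemd.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

ha_services_stopped() {
    for unit in pve-ha-lrm pve-ha-crm; do
        # "is-active --quiet" exits 0 only while the unit is active
        if $SYSTEMCTL is-active --quiet "$unit"; then
            echo "$unit is still active" >&2
            return 1
        fi
    done
    return 0
}
```

It could be chained in front of the restart, e.g. `ha_services_stopped && systemctl restart pve-cluster corosync`, so nothing is restarted while one of the services still reports active.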

Now restart the pve-cluster and corosync (just to be sure) services:

# systemctl restart pve-cluster corosync

and check

# cat /etc/pve/.members

It should show all members and the same version number as on the other 
members. If that's the case, start pve-ha-lrm and pve-ha-crm again;
everything should then be fine again.
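
To compare version numbers without reading the files by eye, the "version" entries can be pulled out of copies of `.members` fetched from each node. This is only a sketch: it assumes the file is JSON-like text containing `"version": N` entries, and the function names and file paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: print every "version" entry of a .members copy, one per line.
# Assumes entries look like "version": 5 (an assumption about the file layout).
members_versions() {
    # $1: path to a copy of /etc/pve/.members
    grep -o '"version": *[0-9]*' "$1" | tr -d ' '
}

# Exit status 0 means both copies carry the same version entries.
same_versions() {
    [ "$(members_versions "$1")" = "$(members_versions "$2")" ]
}
```

For example, after saving each node's file with something like `ssh root@<node> cat /etc/pve/.members > /tmp/members.<node>`, running `same_versions /tmp/members.<good-node> /tmp/members.proxmox03` should succeed once the restart has brought proxmox03 back in sync. The node list itself still needs a quick look, since this only compares version numbers.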

Oh and sorry for getting back a bit late :)

cheers,
Thomas


On 05/05/2017 09:38 AM, Mark Schouten wrote:
> Thomas, pretty please? :)
>
> On Wed, 2017-05-03 at 09:45 +0200, Mark Schouten wrote:
>> On Tue, 2017-05-02 at 09:05 +0200, Thomas Lamprecht wrote:
>>> Can you check that the config looks the same on all nodes?
>>> Especially the difference between working and misbehaving nodes
>>> would be interesting.
>> Please see the attachment. That includes /etc/pve/.members and
>> /etc/pve/corosync.conf from all nodes. Only the .members file of the
>> misbehaving node is off.
>>
>>> In general you could just restart CRM, but the CRM is capable of
>>> syncing in new nodes while running, so there shouldn't be any need
>>> for that; the patches you linked also do not change that, AFAIK.
>> I would like to do a sync without a restart as well, but what would
>> trigger this?
>>
>>> As /etc/pve/.members doesn't show the new node on the misbehaving
>>> one, the problem is another one.
>>> Who is the current master? Can you give me an output of:
>>> # ha-manager status
>>> # pvecm status
>>> # cat /etc/pve/corosync.conf
>> Output in the attachment. Because the misbehaving node also is the
>> master, output of ha-manager is identical on all nodes.