[PVE-User] Missing node in ha-manager

Thomas Lamprecht t.lamprecht at proxmox.com
Tue May 16 10:31:53 CEST 2017


Hi,

On 05/09/2017 08:48 AM, Mark Schouten wrote:
 > On Fri, 2017-05-05 at 18:14 +0200, Thomas Lamprecht wrote:
 >> Hi,
 >>
 >> It looks like the PVE cluster filesystem is out of sync or has a
 >> problematic connection to corosync.
 >> From corosync's standpoint the node addition worked and is consistent
 >> on all nodes, which is good.
 >>
 >> Now, some logs from `proxmox03` (the problematic node) would be nice:
 >>
 >> # journalctl -u corosync -u pve-cluster
 > See attachment.

Hmm, you had frequent changes where the 4th node left and then rejoined the
cluster, combined with a few message retransmits.
The frequency of the leave/join cycles is strange; apart from that, it's not
too unusual.
What is strange is that while the status module of the cfs got all members,
the dcdb (decentralized database) did not receive the 4th node's updates...

 >
 > BTW: /etc/pve/.members is different on all nodes. Is that file really
 > on pmxcfs, or is it actually a 'local' file?
 >

It isn't a local file per se; it is in fact a virtual one, i.e. its content
is read-only and is produced by pmxcfs on the fly (locally).
It pulls the information from corosync and from its own internal state.
This means that it should be identical on every node; while nodes are being
added or similar, it can naturally differ for a very short time.

How is it different on the different nodes?
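One way to answer that is to collect a copy of the file from each node (for
example with `ssh root@NODE cat /etc/pve/.members > NODE.members`) and compare
the copies. The sketch below assumes the file is the usual JSON with a single
`"nodename"` line that names the local node and so legitimately differs per
node; that line is dropped before comparing. `compare_members` is a
hypothetical helper, not a PVE tool.

```shell
#!/bin/sh
# Hypothetical helper: compare copies of /etc/pve/.members gathered from
# several nodes. The first file is the reference; every other file is
# compared against it.
# Assumption: the per-node "nodename" line is the only expected difference,
# so it is filtered out with grep -v before comparing.
compare_members() {
    ref="$1"; shift
    ref_body=$(grep -v '"nodename"' "$ref")
    rc=0
    for f in "$@"; do
        if [ "$(grep -v '"nodename"' "$f")" = "$ref_body" ]; then
            echo "same:    $f"
        else
            echo "differs: $f"
            rc=1
        fi
    done
    # Non-zero if at least one copy differs from the reference.
    return "$rc"
}
```

Usage: `compare_members proxmox01.members proxmox02.members proxmox03.members`
(file names hypothetical); run `diff` by hand on any file reported as
`differs` to see what changed.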

Honestly, I would just stop the HA services (this marks the VMs currently
under HA as frozen) and then do a clean restart of the pve-cluster and
corosync services.
I'd do this on all nodes, but not on all of them at the same time :) This is
as safe as it can get, no VM should be interrupted, and I expect that even if
we knew the full trigger of this, it would lead to the same action.
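A minimal per-node sketch of that restart, assuming the standard systemd unit
names `pve-ha-lrm`, `pve-ha-crm`, `corosync` and `pve-cluster`. The LRM is
stopped before the CRM, on the assumption that the CRM should still be running
to acknowledge the LRM's freeze request. `run`, `restart_cluster_stack` and the
`DRY_RUN` switch are hypothetical helpers; by default the script only prints
the commands.

```shell
#!/bin/sh
# Print a command and execute it only when DRY_RUN=0 (default: dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Restart the cluster stack on THIS node only; stop at the first failure.
restart_cluster_stack() {
    run systemctl stop pve-ha-lrm &&    # services on this node get frozen
    run systemctl stop pve-ha-crm &&
    run systemctl restart corosync &&
    run systemctl restart pve-cluster &&
    run systemctl start pve-ha-crm &&
    run systemctl start pve-ha-lrm
}
```

Run it with `DRY_RUN=0` on one node, check with `pvecm status` that the node
is back and the cluster is quorate, and only then move on to the next node.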

Before that, run the omping test to check whether multicast works as expected
in your setup:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements
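The two omping invocations from that chapter can be generated for your node
list as sketched below. `omping_cmds` is a hypothetical helper; the node names
in the example are placeholders (only `proxmox03` appears in this thread).
omping has to be started on all listed nodes at roughly the same time.

```shell
#!/bin/sh
# Hypothetical helper: print the two omping commands for the given nodes.
omping_cmds() {
    # Short test: 10000 packets at 1 ms intervals, i.e. about 10 seconds.
    echo "omping -c 10000 -i 0.001 -F -q $*"
    # Long test: 600 packets at 1 s intervals, i.e. about 10 minutes.
    echo "omping -c 600 -i 1 -q $*"
}

# Example (placeholder node names):
#   omping_cmds proxmox01 proxmox02 proxmox03 proxmox04
```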

If this test passes, the restart should fix it.
I know it's not ideal, but I don't have much time available currently, and
such problems are often the result of small details of a setup or of the node
addition process.

cheers,
Thomas
