[PVE-User] PVE 6.2 Strange cluster node fence
Eneko Lacunza
elacunza at binovo.es
Wed Apr 14 17:15:08 CEST 2021
Hi Stefan,
On 14/4/21 at 16:49, Stefan M. Radman wrote:
>> If nodes had only one 1G interface, would you also use RRP? (one ring
>> on 1G and the other on 10G bond)
>
> That’s pretty unlikely. Usually they come in pairs ;)
Right, unless you use "entry" level servers or DIY builds ;)
>
> But yes, in that hypothetical case I’d use the available physical
> interface for ring1 and build ring2 from a tagged interface.
>
> For corosync interfaces I prefer two separate physical interfaces
> (simple, resilient).
> Bonding and tagging adds a layer of complexity you don’t want on a
> cluster heartbeat.
Sure.
>
> Find below an actual configuration of a cluster with one node having
> just 2 interfaces while the other nodes all have 4.
> The 2 interfaces are configured in an HA bond like yours and the
> corosync rings are stacked on it as tagged interfaces in their
> specific VLANs.
> VLAN684 exists on switch1 only and VLAN685 exists on switch2 only.
> It's the most resilient solution under the given circumstances and has
> been working like a charm for several years now.
Thanks for the examples!
Cheers
Eneko
>
> Regards
>
> Stefan
>
> NODE1 - 4 interfaces
> ====================
>
> iface eno1 inet manual
> #Gb1 - Trunk
>
> iface eno2 inet manual
> #Gb2 - Trunk
>
> auto eno3
> iface eno3 inet static
> address 192.168.84.1
> netmask 255.255.255.0
> #Gb3 - COROSYNC1 - VLAN684
>
> auto eno4
> iface eno4 inet static
> address 192.168.85.1
> netmask 255.255.255.0
> #Gb4 - COROSYNC2 - VLAN685
>
> auto bond0
> iface bond0 inet manual
> slaves eno1 eno2
> bond_miimon 100
> bond_mode active-backup
> #HA Bundle Gb1/Gb2 - Trunk
>
>
> NODE3 - 2 interfaces
> ====================
>
> iface eno1 inet manual
> #Gb1 - Trunk
>
> iface eno2 inet manual
> #Gb2 - Trunk
>
> auto bond0
> iface bond0 inet manual
> slaves eno1 eno2
> bond_miimon 100
> bond_mode active-backup
> #HA Bundle Gb1/Gb2 - Trunk
>
> auto bond0.684
> iface bond0.684 inet static
> address 192.168.84.3
> netmask 255.255.255.0
> #COROSYNC1 - VLAN684
>
> auto bond0.685
> iface bond0.685 inet static
> address 192.168.85.3
> netmask 255.255.255.0
> #COROSYNC2 - VLAN685
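>
> For completeness, the corresponding corosync.conf nodelist looks roughly
> like this (a sketch reconstructed from the addresses above, not the actual
> file; node names and IDs are placeholders):
>
> nodelist {
>   node {
>     name: node1
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: 192.168.84.1
>     ring1_addr: 192.168.85.1
>   }
>   node {
>     name: node3
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: 192.168.84.3
>     ring1_addr: 192.168.85.3
>   }
> }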
>
>> On Apr 14, 2021, at 16:07, Eneko Lacunza <elacunza at binovo.es> wrote:
>>
>> Hi Stefan,
>>
>> Thanks for your advice. Seems a really good use for otherwise unused
>> 1G ports so I'll look into configuring that.
>>
>> If nodes had only one 1G interface, would you also use RRP? (one ring
>> on 1G and the other on 10G bond)
>>
>> Thanks
>>
>> On 14/4/21 at 15:57, Stefan M. Radman wrote:
>>> Hi Eneko
>>>
>>> That’s a nice setup and I bet it works well but you should do some
>>> hand-tuning to increase resilience.
>>>
>>> Are the unused eno1 and eno2 interfaces on-board 1GbE copper interfaces?
>>>
>>> If that’s the case I’d strongly recommend turning them into dedicated
>>> untagged interfaces for the cluster traffic, running on two separate
>>> "rings".
>>>
>>> https://pve.proxmox.com/wiki/Separate_Cluster_Network
>>> https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol
>>>
>>> Create two corosync rings, using isolated VLANs on your two switches
>>> e.g. VLAN4001 on Switch1 and VLAN4002 on Switch2.
>>>
>>> eno1 => Switch1 => VLAN4001
>>> eno2 => Switch2 => VLAN4002
>>>
>>> Restrict VLAN4001 to the access ports where the eno1 interfaces are
>>> connected. Prune VLAN4001 from ALL trunks.
>>> Restrict VLAN4002 to the access ports where the eno2 interfaces are
>>> connected. Prune VLAN4002 from ALL trunks.
>>> Assign the eno1 and eno2 interfaces to two separate subnets and you
>>> are done.
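>>>
>>> As a sketch (the addresses are placeholders, pick subnets that fit your
>>> addressing plan), the node side in /etc/network/interfaces could then
>>> look like:
>>>
>>> auto eno1
>>> iface eno1 inet static
>>> address 192.168.94.11
>>> netmask 255.255.255.0
>>> #COROSYNC1 - VLAN4001
>>>
>>> auto eno2
>>> iface eno2 inet static
>>> address 192.168.95.11
>>> netmask 255.255.255.0
>>> #COROSYNC2 - VLAN4002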
>>>
>>> With separate rings you don’t even have to stop your cluster while
>>> migrating corosync to the new subnets.
>>> Just do them one-by-one.
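>>>
>>> Roughly (a sketch based on the wiki above; adapt file names and addresses
>>> to your cluster): edit a copy of the corosync config, add the new ring to
>>> every node entry, bump config_version, then move the file back into place.
>>>
>>> cp /etc/pve/corosync.conf /root/corosync.conf.new
>>> # in /root/corosync.conf.new:
>>> #  - add e.g. "ring1_addr: 192.168.94.11" to each node (its own address)
>>> #  - add "interface { linknumber: 1 }" to the totem section
>>> #  - increment config_version
>>> cp /root/corosync.conf.new /etc/pve/corosync.conf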
>>>
>>> With corosync running on two separate rings isolated from the rest
>>> of your network you should not see any further node fencing.
>>>
>>> Stefan
>>>
>>>> On Apr 14, 2021, at 15:18, Eneko Lacunza <elacunza at binovo.es> wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> On 14/4/21 at 13:22, Stefan M. Radman wrote:
>>>>> Hi Eneko
>>>>>
>>>>> Do you have separate physical interfaces for the cluster
>>>>> (corosync) traffic?
>>>> No.
>>>>> Do you have them on separate VLANs on your switches?
>>>> Only Ceph traffic is on VLAN91, the rest is untagged.
>>>>
>>>>> Are you running 1 or 2 corosync rings?
>>>> This is standard... no hand tuning:
>>>>
>>>> nodelist {
>>>>   node {
>>>>     name: proxmox1
>>>>     nodeid: 2
>>>>     quorum_votes: 1
>>>>     ring0_addr: 192.168.90.11
>>>>   }
>>>>   node {
>>>>     name: proxmox2
>>>>     nodeid: 1
>>>>     quorum_votes: 1
>>>>     ring0_addr: 192.168.90.12
>>>>   }
>>>>   node {
>>>>     name: proxmox3
>>>>     nodeid: 3
>>>>     quorum_votes: 1
>>>>     ring0_addr: 192.168.90.13
>>>>   }
>>>> }
>>>>
>>>> quorum {
>>>>   provider: corosync_votequorum
>>>> }
>>>>
>>>> totem {
>>>>   cluster_name: CLUSTERNAME
>>>>   config_version: 3
>>>>   interface {
>>>>     linknumber: 0
>>>>   }
>>>>   ip_version: ipv4-6
>>>>   secauth: on
>>>>   version: 2
>>>> }
>>>>
>>>>>
>>>>> Please post your /etc/network/interfaces and explain which
>>>>> interface connects where.
>>>> auto lo
>>>> iface lo inet loopback
>>>>
>>>> iface ens2f0np0 inet manual
>>>> # Switch2
>>>>
>>>> iface ens2f1np1 inet manual
>>>> # Switch1
>>>>
>>>> iface eno1 inet manual
>>>>
>>>> iface eno2 inet manual
>>>>
>>>> auto bond0
>>>> iface bond0 inet manual
>>>> bond-slaves ens2f0np0 ens2f1np1
>>>> bond-miimon 100
>>>> bond-mode active-backup
>>>> bond-primary ens2f0np1
>>>>
>>>> auto bond0.91
>>>> iface bond0.91 inet static
>>>> address 192.168.91.11
>>>> #Ceph
>>>>
>>>> auto vmbr0
>>>> iface vmbr0 inet static
>>>> address 192.168.90.11
>>>> gateway 192.168.90.1
>>>> bridge-ports bond0
>>>> bridge-stp off
>>>> bridge-fd 0
>>>>
>>>> Thanks
>>>>>
>>>>> Thanks
>>>>>
>>>>> Stefan
>>>>>
>>>>>
>>>>>> On Apr 14, 2021, at 12:12, Eneko Lacunza via pve-user
>>>>>> <pve-user at lists.proxmox.com> wrote:
>>>>>>
>>>>>>
>>>>>> From: Eneko Lacunza <elacunza at binovo.es>
>>>>>> Subject: Re: [PVE-User] PVE 6.2 Strange cluster node fence
>>>>>> Date: April 14, 2021 at 12:12:09 GMT+2
>>>>>> To: pve-user at lists.proxmox.com
>>>>>>
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> On 14/4/21 at 11:21, Michael Rasmussen via pve-user wrote:
>>>>>>> On Wed, 14 Apr 2021 11:04:10 +0200
>>>>>>> Eneko Lacunza via pve-user <pve-user at lists.proxmox.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Yesterday we had a strange fence happen in a PVE 6.2 cluster.
>>>>>>>>
>>>>>>>> Cluster has 3 nodes (proxmox1, proxmox2, proxmox3) and has been
>>>>>>>> operating normally for a year. Last update was on January 21st
>>>>>>>> 2021.
>>>>>>>> Storage is Ceph and nodes are connected to the same network switch
>>>>>>>> with active-passive bonds.
>>>>>>>>
>>>>>>>> proxmox1 was fenced and automatically rebooted, then everything
>>>>>>>> recovered. HA restarted VMs in other nodes too.
>>>>>>>>
>>>>>>>> proxmox1 syslog: (no network link issues reported at device level)
>>>>>>> I have seen this occasionally, and every time the cause was high
>>>>>>> network load/congestion, which caused a token timeout. The default
>>>>>>> token timeout in corosync (1000 ms) is IMHO very optimistically
>>>>>>> configured, so I changed it to 5000 ms; since then I have never seen
>>>>>>> fencing caused by network load/congestion again. You could try this
>>>>>>> and see if it helps.
>>>>>>>
>>>>>>> PS. my cluster communication is on a dedicated gb bonded vlan.
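>>>>>>>
>>>>>>> For illustration, the setting goes into the totem section of
>>>>>>> /etc/pve/corosync.conf, something like this (a sketch; keep your other
>>>>>>> totem options as they are and bump config_version when editing):
>>>>>>>
>>>>>>> totem {
>>>>>>>   # existing options unchanged
>>>>>>>   token: 5000
>>>>>>> }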
>>>>>> Thanks for the info. In this case network is 10Gbit (I see I
>>>>>> didn't include this info) but only for proxmox nodes:
>>>>>>
>>>>>> - We have 2 Dell N1124T 24x1Gbit 4xSFP+ switches
>>>>>> - Both switches are interconnected with a SFP+ DAC
>>>>>> - Active-passive bonds in each proxmox node go to one SFP+ interface
>>>>>> on each switch. Primary interfaces are configured to be on the
>>>>>> same switch.
>>>>>> - Connectivity to the LAN is done with a 1 Gbit link
>>>>>> - Proxmox 2x10G Bond is used for VM networking and Ceph
>>>>>> public/private networks.
>>>>>>
>>>>>> I wouldn't expect high network load/congestion because it's on an
>>>>>> internal LAN, with 1Gbit clients. No Ceph issues/backfilling were
>>>>>> occurring during the fence.
>>>>>>
>>>>>> Network cards are Broadcom.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eneko Lacunza
>>>>>> Zuzendari teknikoa | Director técnico
>>>>>> Binovo IT Human Project
>>>>>>
>>>>>> Tel. +34 943 569 206 | https://www.binovo.es
>>>>>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>>>>>>
>>>>>> https://www.youtube.com/user/CANALBINOVO
>>>>>> https://www.linkedin.com/company/37269706/
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> pve-user mailing list
>>>>>> pve-user at lists.proxmox.com
>>>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
Eneko Lacunza
Director Técnico | Zuzendari teknikoa
Binovo IT Human Project
943 569 206
elacunza at binovo.es
binovo.es
Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
youtube <https://www.youtube.com/user/CANALBINOVO/>
linkedin <https://www.linkedin.com/company/37269706/>