[pve-devel] Quorum problems with Intel 10 Gb/s NICs and VMs turning off
Alexandre DERUMIER
aderumier at odiso.com
Mon Dec 22 18:58:27 CET 2014
>>After several checks, I found the problem on these two servers: a
>>setting in the hardware BIOS that isn't compatible with
>>pve-kernel-3.10.0-5, which made my NICs take the link down and then back up.
>>(I guess that soon I will share my BIOS setup for the Dell R720.)
>>... :-)
I'm interested to know what this option is ;)
>>The strange behaviour is that when I run "pvecm status", I get this output:
>>Version: 6.2.0
>>Config Version: 41
>>Cluster Name: ptrading
>>Cluster Id: 28503
>>Cluster Member: Yes
>>Cluster Generation: 8360
>>Membership state: Cluster-Member
>>Nodes: 8
>>Expected votes: 8
>>Total votes: 8
>>Node votes: 1
>>Quorum: 5
>>Active subsystems: 6
>>Flags:
>>Ports Bound: 0 177
>>Node name: pve5
>>Node ID: 5
>>Multicast addresses: 239.192.111.198
>>Node addresses: 192.100.100.50
So you have quorum here. All nodes are OK; I don't see any problem.
>>And in the PVE GUI I see a red light on all the other nodes.
That means that the pvestatd daemon is hanging or has crashed.
Can you check that you can write to /etc/pve?
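For example, a minimal write test could be something like this (the file name is arbitrary):

# /etc/pve is the pmxcfs mount; this only checks that it accepts writes
touch /etc/pve/writetest && rm /etc/pve/writetest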
If not, try to restart:
/etc/init.d/pve-cluster restart
then
/etc/init.d/pvedaemon restart
/etc/init.d/pvestatd restart
----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Monday, December 22, 2014 04:01:31
Subject: Re: [pve-devel] Quorum problems with Intel 10 Gb/s NICs and VMs turning off
After several checks, I found the problem on these two servers: a
setting in the hardware BIOS that isn't compatible with
pve-kernel-3.10.0-5, which made my NICs take the link down and then back up.
(I guess that soon I will share my BIOS setup for the Dell R720.)
... :-)
But now I have another problem, with the mix of PVE-manager versions 3.3-5 and 2.3-13
in a PVE cluster of 8 nodes: I am losing quorum on several nodes
very often.
Moreover, for now I cannot upgrade my old PVE nodes, so for the
moment I would like to know if it is possible to make a quick configuration
so that all my nodes always have quorum.
The strange behaviour is that when I run "pvecm status", I get this output:
Version: 6.2.0
Config Version: 41
Cluster Name: ptrading
Cluster Id: 28503
Cluster Member: Yes
Cluster Generation: 8360
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Total votes: 8
Node votes: 1
Quorum: 5
Active subsystems: 6
Flags:
Ports Bound: 0 177
Node name: pve5
Node ID: 5
Multicast addresses: 239.192.111.198
Node addresses: 192.100.100.50
And in the PVE GUI I see a red light on all the other nodes.
Can I apply some kind of temporary workaround, such as "Quorum: 1", so that my nodes
can work well and don't show this strange behaviour? (Only until I have performed
the updates.)
Or, what would be the simplest and quickest temporary solution to avoid
upgrading my nodes?
(Something like, for example, adding to the rc.local file a line that says "pvecm
expected 1".)
A note about quorum: I don't have any hardware fence device enabled, so
I do not care whether each node always has quorum (I can always turn the
server off manually and brutally if necessary).
----- Original Message -----
From: "Cesar Peschiera" <brain at click.com.py>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Saturday, December 20, 2014 9:30 AM
Subject: Re: [pve-devel] Quorum problems with Intel 10 Gb/s NICs and
VMs turning off
> Hi Alexandre
>
> I put the 192.100.100.51 IP address directly on bond0, and I don't have
> network connectivity
> (as if the node were totally isolated).
>
> This was my setup:
> -------------------
> auto bond0
> iface bond0 inet static
> address 192.100.100.51
> netmask 255.255.255.0
> gateway 192.100.100.4
> slaves eth0 eth2
> bond_miimon 100
> bond_mode 802.3ad
> bond_xmit_hash_policy layer2
>
> auto vmbr0
> iface vmbr0 inet manual
> bridge_ports bond0
> bridge_stp off
> bridge_fd 0
> post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping
> post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
>
> ...... :-(
>
> Any other suggestions?
>
> ----- Original Message -----
> From: "Alexandre DERUMIER" <aderumier at odiso.com>
> To: "Cesar Peschiera" <brain at click.com.py>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Friday, December 19, 2014 7:59 AM
> Subject: Re: [pve-devel] Quorum problems with Intel 10 Gb/s NICs and
> VMs turning off
>
>
> Maybe you can try to put the 192.100.100.51 IP address directly on bond0,
>
> to avoid the corosync traffic going through vmbr0.
>
> (I remember some old offloading bugs with 10 GbE NICs and the Linux bridge.)
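>
> If you want to rule offloading out, maybe something like this could be
> tested on the bond0 slaves (only an idea; I'm guessing at which offloads
> are relevant):
>
> # temporarily disable common offloads on the 10 GbE slaves of bond0
> ethtool -K eth0 tso off gso off gro off
> ethtool -K eth2 tso off gso off gro off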
>
>
> ----- Original Message -----
> From: "Cesar Peschiera" <brain at click.com.py>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Friday, December 19, 2014 11:08:33
> Subject: Re: [pve-devel] Quorum problems with Intel 10 Gb/s NICs and
> VMs turning off
>
>>Can you post your /etc/network/interfaces of these 10 Gb/s nodes?
>
> This is my configuration:
> Note: the LAN uses 192.100.100.0/24
>
> #Network interfaces
> auto lo
> iface lo inet loopback
>
> iface eth0 inet manual
> iface eth1 inet manual
> iface eth2 inet manual
> iface eth3 inet manual
> iface eth4 inet manual
> iface eth5 inet manual
> iface eth6 inet manual
> iface eth7 inet manual
> iface eth8 inet manual
> iface eth9 inet manual
> iface eth10 inet manual
> iface eth11 inet manual
>
> #PVE Cluster and VMs (NICs are of 10 Gb/s):
> auto bond0
> iface bond0 inet manual
> slaves eth0 eth2
> bond_miimon 100
> bond_mode 802.3ad
> bond_xmit_hash_policy layer2
>
> #PVE Cluster and VMs:
> auto vmbr0
> iface vmbr0 inet static
> address 192.100.100.51
> netmask 255.255.255.0
> gateway 192.100.100.4
> bridge_ports bond0
> bridge_stp off
> bridge_fd 0
> post-up echo 0 > /sys/devices/virtual/net/vmbr0/bridge/multicast_snooping
> post-up echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier
>
> #A link for DRBD (NICs are of 10 Gb/s):
> auto bond401
> iface bond401 inet static
> address 10.1.1.51
> netmask 255.255.255.0
> slaves eth1 eth3
> bond_miimon 100
> bond_mode balance-rr
> mtu 9000
>
> #Other link for DRBD (NICs are of 10 Gb/s):
> auto bond402
> iface bond402 inet static
> address 10.2.2.51
> netmask 255.255.255.0
> slaves eth4 eth6
> bond_miimon 100
> bond_mode balance-rr
> mtu 9000
>
> #Other link for DRBD (NICs are of 10 Gb/s):
> auto bond403
> iface bond403 inet static
> address 10.3.3.51
> netmask 255.255.255.0
> slaves eth5 eth7
> bond_miimon 100
> bond_mode balance-rr
> mtu 9000
>
> #A link for the NFS-Backups (NICs are of 1 Gb/s):
> auto bond10
> iface bond10 inet static
> address 10.100.100.51
> netmask 255.255.255.0
> slaves eth8 eth10
> bond_miimon 100
> bond_mode balance-rr
> #bond_mode active-backup
> mtu 9000
>