Strange problem on bridge after upgrade to proxmox 7
Simone Piccardi
piccardi at truelite.it
Thu May 19 19:58:09 CEST 2022
Hi, I have a very strange networking problem on a Proxmox server, which
emerged after upgrading from 6.4 to 7.
This is the output of pveversion on the server:
root@lama10:~# pveversion -V
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-1-pve: 5.15.35-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
The server has 4 network interfaces, bonded in pairs in active-passive
mode, then bridged. This is its /etc/network/interfaces:
auto eth0
iface eth0 inet manual

auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet manual

auto eth3
iface eth3 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eth0 eth1
    bond-miimon 100
    bond-mode active-backup
    bond-primary eth0

auto bond1
iface bond1 inet manual
    bond-slaves eth2 eth3
    bond-miimon 100
    bond-mode active-backup
    bond-primary eth2

auto vmbr0
iface vmbr0 inet static
    address 192.168.250.110/23
    gateway 192.168.250.254
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet static
    address 192.168.223.110/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
The network problems occur only when connecting to the virtual machines
hosted by the server (no containers are used); there is no problem at all
connecting to the server itself. The only anomaly I could find is that
the bridge seems to learn the MAC address of some of the VMs on the
wrong internal port, so they become unreachable.
To explain what this means, I put 3 test VMs on the server (two Debian 11
and one Windows, just to exclude problems at the operating system level)
on the vmbr1 bridge; their tap interfaces are:
root@lama10:~# brctl show vmbr1
bridge name     bridge id               STP enabled     interfaces
vmbr1           8000.7a576e974a37       no              bond1
                                                        tap403i0
                                                        tap404i0
                                                        tap603i0
Sometimes some of them work and some do not. While I was writing
this email, VM 404 was not working. Looking at the MAC address of
tap404i0 I got:
root@lama10:~# ip -br link show dev tap404i0
tap404i0 UNKNOWN 26:6f:0c:19:95:58 <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP>
while the MAC address of VM 404 itself is:
root@lama10:~# grep vmbr1 /etc/pve/qemu-server/404.conf
net0: virtio=BE:47:4C:D5:5D:A9,bridge=vmbr1
and when I look up these MAC addresses in the vmbr1 forwarding table
(the columns are port number, MAC address, is local?, ageing timer) I get:
root@lama10:~# brctl showmacs vmbr1 | egrep -i '(26:6f:0c:19:95:58|BE:47:4C:D5:5D:A9)'
4 26:6f:0c:19:95:58 yes 0.00
4 26:6f:0c:19:95:58 yes 0.00
1 be:47:4c:d5:5d:a9 no 0.65
Doing the same for another VM that was working (its MAC addresses were
found as above) I got instead:
root@lama10:~# brctl showmacs vmbr1 | egrep -i '(92:4f:ec:7e:8a:e1|DE:A3:E6:96:0C:6E)'
3 92:4f:ec:7e:8a:e1 yes 0.00
3 92:4f:ec:7e:8a:e1 yes 0.00
3 de:a3:e6:96:0c:6e no 2.32
Note: by "working" I mean that a VM is normally reachable over the network
without packet loss. I checked this multiple times, also on other servers,
and in all working cases the port inside the vmbrX switch is the
same for the tap MAC and the VM MAC, as expected. When a VM is not working,
its own MAC always seems to be associated with port 1 (the port of the
bonding interface).
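The port comparison described above can be sketched in Python. This is
only an illustration, not a tool shipped with Proxmox or bridge-utils: the
function names parse_showmacs and learned_port_matches_tap are made up for
this example, and the sample rows are copied from the outputs in this
message. It assumes each data row of "brctl showmacs" has four
whitespace-separated fields (port, MAC, local flag, ageing timer).

```python
def parse_showmacs(text):
    """Parse data rows of `brctl showmacs` output into (port, mac, is_local)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed row layout: "<port> <mac> <yes|no> <ageing timer>"
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skip the header line and anything unexpected
        port, mac, is_local, _age = fields
        rows.append((int(port), mac.lower(), is_local == "yes"))
    return rows


def learned_port_matches_tap(text, tap_mac, vm_mac):
    """True if the VM MAC is learned on the same bridge port as its tap."""
    tap_ports = set()
    vm_ports = set()
    for port, mac, is_local in parse_showmacs(text):
        if is_local and mac == tap_mac.lower():
            tap_ports.add(port)   # local entry: the tap interface itself
        elif not is_local and mac == vm_mac.lower():
            vm_ports.add(port)    # learned entry: where the bridge saw the VM
    # A MAC missing from the table is treated as a mismatch.
    return bool(tap_ports) and vm_ports == tap_ports


# Sample rows copied from the outputs in this message.
BROKEN_404 = """\
4 26:6f:0c:19:95:58 yes 0.00
4 26:6f:0c:19:95:58 yes 0.00
1 be:47:4c:d5:5d:a9 no 0.65
"""
WORKING_VM = """\
3 92:4f:ec:7e:8a:e1 yes 0.00
3 92:4f:ec:7e:8a:e1 yes 0.00
3 de:a3:e6:96:0c:6e no 2.32
"""

print(learned_port_matches_tap(BROKEN_404, "26:6f:0c:19:95:58", "BE:47:4C:D5:5D:A9"))  # False
print(learned_port_matches_tap(WORKING_VM, "92:4f:ec:7e:8a:e1", "DE:A3:E6:96:0C:6E"))  # True
```

Feeding it the full output of "brctl showmacs vmbr1" (header included)
should work as well, since non-data lines are skipped.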
What I find in a "not working" VM is that the ARP reply is never received
(checked with tcpdump run from the console). The ARP requests are sent,
and are seen in other VMs or on the host, but no reply is ever seen.
Whether a VM works seems almost random (or at least I could not find a
pattern so far). After stopping and restarting the working VM above, it
stopped working and its port on the bridge changed:
root@lama10:~# brctl showmacs vmbr1 | egrep -i '(92:4f:ec:7e:8a:e1|DE:A3:E6:96:0C:6E)'
3 92:4f:ec:7e:8a:e1 yes 0.00
3 92:4f:ec:7e:8a:e1 yes 0.00
1 de:a3:e6:96:0c:6e no 0.86
What makes this behaviour "strange" is that two other identical machines
with the same Proxmox version (they are in a cluster with this one, and inside
a blade rack) are working just fine. And there is no problem on the cluster
(as I said, no network problems at all for the server itself).
The only difference on the other two fully working nodes is that their
bonding is configured as LACP. That was not possible for this one: it
produced loop error messages when configured that way, so I had to remove
that configuration to avoid disturbing the other two nodes, where all
production VMs were migrated and are running without problems.
But another standalone server (with the same Proxmox version as all
the others), which is outside the blade rack and is also configured with
active-passive bonding, is working fine.
So despite the differences in network configuration between all these
servers, I still cannot imagine how a different kind of bonding or the
use of a different switch could have an impact on this problem. In the
previous example I cannot ping VM 404 either from the server itself or
from the other working VM hosted on the same server, and this
traffic is completely internal, staying inside vmbr1.
So I'm asking for directions on what to search for, and where to look to
find out how the ports inside the bridge are assigned, or for any other
suggestion that could shed some light on this issue.
Simone
--
Simone Piccardi Truelite Srl
piccardi at truelite.it (email/jabber) Via Monferrato, 6
Tel. +39-347-1032433 50142 Firenze
http://www.truelite.it Tel. +39-055-7879597