[pve-devel] [PATCH cluster/docs/manager/network/proxmox{, -ve-rs, -firewall, -perl-rs} 00/52] Add SDN Fabrics
Friedrich Weber
f.weber at proxmox.com
Thu Apr 3 10:30:05 CEST 2025
On 28/03/2025 18:12, Gabriel Goller wrote:
> This series allows the user to add fabrics such as OpenFabric and OSPF over
> their clusters.
>
> Overview
> ========
>
> This series allows the user to create routed networks ('fabrics') across their
> clusters, which can be used as the underlay network for a EVPN cluster, or for
> creating Ceph full mesh clusters easily.
>
> This patch series adds the initial support for two routing protocols:
> * OpenFabric
> * OSPF
I tested a bit with packages Gabriel built for me (thanks!),
both OSPF and OpenFabric, and also set up a Ceph full mesh over OpenFabric.
Overall it looked quite smooth! I didn't notice huge issues, but have
some minor points below:
- I think the error message when frr+frr-pythontools is not installed
looked a bit scary. It's on me for not reading the docs, but still,
might be nice to have a friendlier error message in that case :)
- having already added one node, and then adding another using the "Add
Node" dialog, it has happened multiple times that I kept "Node" at the
default first node (which I already had defined) while I thought I was
configuring the second one, and only noticed when I submitted and got
"node already exists". And then, when I changed "Node" to the correct
one, I lost my form input :) I understand that we need to reload when
changing "Node" (the other node might have other interfaces), but to
avoid the above, maybe the dialog could preselect a node that is not yet
defined?
- when removing a fabric, the IP addresses defined on the interfaces
remain until the next reboot. I guess the reason is that ifupdown2
doesn't remove IP addresses when the corresponding stanza vanishes. Not
sure if this can be easily fixed -- if not, maybe this would be worth a
note in the docs?
- when removing the only fabric and applying, the srvreload task has a
couple of spurious error messages:
> 2025-04-03 09:35:59,354 ERROR: Filename /etc/frr/frr.conf is an empty file
> frr reload command fail: command '/usr/lib/frr/frr-reload.py --stdout --reload /etc/frr/frr.conf' failed: exit code 1
> Restarting frr. at /usr/share/perl5/PVE/Network/SDN/Frr.pm line 74.
> TASK OK
- regarding the hello/csnp intervals: it would be nice to mention what the
default values are. Also, probably not relevant for this patch series, but
wanted to mention anyway: For running a Ceph full mesh over a fabric,
one probably wants to set relatively low values here (as our wiki guide
does [3])? If there is a guide in the future for setting up Ceph full mesh
over fabric, would be nice if the guide would mention that.
- I'm not so sure about this, but maybe it would be nice to show the
default-hidden hello/csnp interval columns if I have entered a value
there?
- when I remove hello interval+multiplier and the csnp via the GUI, I get
the following warning in the journal:
> Apr 03 10:20:50 fabric159 pveproxy[9244]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
> Apr 03 10:21:02 fabric159 pveproxy[9246]: Use of uninitialized value $id in concatenation (.) or string at /usr/share/perl5/PVE/API2/Network/SDN/Fabrics.pm line 330.
- after setting up an OSPF fabric in a 3-node full mesh, I couldn't ping
the loopback addresses until I rebooted all nodes. I've attached the
task logs of the srvreloads and the ospf.cfg below [1]. After a reboot,
the pings work fine. Could it be because an OSPF with the same area
existed previously?
- probably a user error, but: after setting up an OpenFabric fabric and
rebooting, the routes didn't come up automatically. My openfabric.cfg is
in [2]. systemctl status frr shows the following:
> Apr 03 10:02:20 fabric159 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:21 fabric159 fabricd[699]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
> Apr 03 10:03:48 fabric159 fabricd[699]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
> Apr 03 10:02:23 fabric160 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:24 fabric160 fabricd[674]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens19 complete.
> Apr 03 10:03:48 fabric160 fabricd[674]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
> Apr 03 10:02:19 fabric161 systemd[1]: Started frr.service - FRRouting.
> Apr 03 10:02:21 fabric161 fabricd[681]: [MZS0T-YRAMC] OpenFabric: Initial synchronization on ens20 complete.
> Apr 03 10:03:48 fabric161 fabricd[681]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
Maybe I'm just too impatient, but restarting frr and waiting for ~30 seconds fixes it.
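
Regarding the "Add Node" dialog point above: the preselection I have in mind
could look roughly like the following. This is only a sketch of the selection
logic in Python, not the actual web UI code; the function name
pick_default_node and both parameter names are made up for illustration.

```python
from typing import Iterable, Optional


def pick_default_node(cluster_nodes: Iterable[str],
                      defined_nodes: Iterable[str]) -> Optional[str]:
    """Return the first cluster node that has no entry in the fabric yet.

    Hypothetical helper: neither the name nor the signature comes from the
    patch series.
    """
    defined = set(defined_nodes)
    for node in cluster_nodes:
        if node not in defined:
            return node
    # Every node is already part of the fabric, nothing sensible to preselect.
    return None


if __name__ == "__main__":
    # fabric159 is already defined, so the dialog would start on fabric160.
    print(pick_default_node(["fabric159", "fabric160", "fabric161"],
                            ["fabric159"]))
```

If all nodes are already defined the helper returns None, in which case the
dialog could fall back to the current behaviour.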
[1]
fabric159:
2025-04-03 09:30:06,673 INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:06,673 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:06,690 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,697 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:06,703 INFO: Executed "ip forwarding"
2025-04-03 09:30:06,709 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:06,709 INFO: /var/run/frr/reload-B14N3D.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.159\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.159\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1667|mgmtd] sending configuration
[1668|zebra] sending configuration
[1671|ospfd] sending configuration
[1674|bgpd] sending configuration
[1668|zebra] done
[1682|watchfrr] sending configuration
[1684|staticd] sending configuration
[1685|bfdd] sending configuration
Waiting for children to finish applying config...
[1682|watchfrr] done
[1674|bgpd] done
[1684|staticd] done
[1685|bfdd] done
[1667|mgmtd] done
[1671|ospfd] done
2025-04-03 09:30:06,721 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:06,729 INFO: /var/run/frr/reload-UJJQIC.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.159\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.159\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1692|mgmtd] sending configuration
[1693|zebra] sending configuration
[1696|ospfd] sending configuration
[1699|bgpd] sending configuration
[1693|zebra] done
[1707|watchfrr] sending configuration
[1709|staticd] sending configuration
[1710|bfdd] sending configuration
Waiting for children to finish applying config...
[1707|watchfrr] done
[1696|ospfd] done
MGMTD: No changes found to be committed!
[1692|mgmtd] done
[1709|staticd] done
[1699|bgpd] done
[1710|bfdd] done
TASK OK
fabric160:
2025-04-03 09:30:09,972 INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:09,972 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:09,985 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:09,992 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:09,998 INFO: Executed "ip forwarding"
2025-04-03 09:30:10,004 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:10,004 INFO: /var/run/frr/reload-5ATLT2.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.160\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.160\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1699|mgmtd] sending configuration
[1700|zebra] sending configuration
[1703|ospfd] sending configuration
[1706|bgpd] sending configuration
[1700|zebra] done
[1714|watchfrr] sending configuration
[1716|staticd] sending configuration
[1717|bfdd] sending configuration
Waiting for children to finish applying config...
[1714|watchfrr] done
[1716|staticd] done
[1706|bgpd] done
[1717|bfdd] done
[1699|mgmtd] done
[1703|ospfd] done
2025-04-03 09:30:10,016 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:10,023 INFO: /var/run/frr/reload-NFS4UM.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.160\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.160\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1724|mgmtd] sending configuration
[1725|zebra] sending configuration
[1728|ospfd] sending configuration
[1731|bgpd] sending configuration
[1739|watchfrr] sending configuration
[1725|zebra] done
[1741|staticd] sending configuration
[1742|bfdd] sending configuration
Waiting for children to finish applying config...
[1739|watchfrr] done
[1741|staticd] done
[1728|ospfd] done
[1731|bgpd] done
[1742|bfdd] done
MGMTD: No changes found to be committed!
[1724|mgmtd] done
TASK OK
fabric161:
2025-04-03 09:30:08,321 INFO: Called via "Namespace(input=None, reload=True, test=False, debug=False, log_level='info', stdout=True, pathspace=None, filename='/etc/frr/frr.conf', overwrite=False, bindir='/usr/bin', confdir='/etc/frr', rundir='/var/run/frr', vty_socket=None, daemon='', test_reset=False)"
2025-04-03 09:30:08,321 INFO: Loading Config object from file /etc/frr/frr.conf
2025-04-03 09:30:08,334 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,342 INFO: "frr defaults traditional" cannot be removed
2025-04-03 09:30:08,348 INFO: Executed "ip forwarding"
2025-04-03 09:30:08,354 INFO: Executed "ipv6 forwarding"
2025-04-03 09:30:08,354 INFO: /var/run/frr/reload-PVFBCH.txt content
['frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.161\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.161\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1671|mgmtd] sending configuration
[1672|zebra] sending configuration
[1675|ospfd] sending configuration
[1678|bgpd] sending configuration
[1686|watchfrr] sending configuration
[1688|staticd] sending configuration
[1672|zebra] done
[1689|bfdd] sending configuration
Waiting for children to finish applying config...
[1688|staticd] done
[1686|watchfrr] done
[1689|bfdd] done
[1678|bgpd] done
[1671|mgmtd] done
[1675|ospfd] done
2025-04-03 09:30:08,367 INFO: Loading Config object from vtysh show running
2025-04-03 09:30:08,374 INFO: /var/run/frr/reload-SKOSWJ.txt content
['line vty\nexit\n',
'frr defaults datacenter\n',
'log syslog informational\n',
'router ospf\nexit\n',
'router ospf\n ospf router-id 172.16.0.161\nexit\n',
'interface dummy_1234\nexit\n',
'interface dummy_1234\n ip ospf area 1234\nexit\n',
'interface dummy_1234\n ip ospf passive\nexit\n',
'interface ens19\nexit\n',
'interface ens19\n ip ospf area 1234\nexit\n',
'interface ens20\nexit\n',
'interface ens20\n ip ospf area 1234\nexit\n',
'access-list ospf_1234_ips permit 172.16.0.0/24\n',
'route-map ospf permit 100\nexit\n',
'route-map ospf permit 100\n match ip address ospf_1234_ips\nexit\n',
'route-map ospf permit 100\n set src 172.16.0.161\nexit\n',
'ip protocol ospf route-map ospf\n',
'line vty\nexit\n']
[1696|mgmtd] sending configuration
[1697|zebra] sending configuration
[1700|ospfd] sending configuration
[1703|bgpd] sending configuration
[1697|zebra] done
[1711|watchfrr] sending configuration
[1713|staticd] sending configuration
Waiting for children to finish applying config...
[1714|bfdd] sending configuration
[1711|watchfrr] done
[1713|staticd] done
[1714|bfdd] done
[1700|ospfd] done
[1703|bgpd] done
MGMTD: No changes found to be committed!
[1696|mgmtd] done
TASK OK
# cat /etc/pve/sdn/fabrics/ospf.cfg
fabric: 1234
loopback_prefix 172.16.0.0/24
node: 1234_fabric159
interface name=ens19,ip=172.31.0.159/24
interface name=ens20,ip=172.31.2.159/24
router_id 172.16.0.159
node: 1234_fabric160
interface name=ens19,ip=172.31.0.160/24
interface name=ens20,ip=172.31.1.160/24
router_id 172.16.0.160
node: 1234_fabric161
interface name=ens19,ip=172.31.1.161/24
interface name=ens20,ip=172.31.2.161/24
router_id 172.16.0.161
[2]
# cat /etc/pve/sdn/fabrics/openfabric.cfg
fabric: fabric
hello_interval 2
loopback_prefix 172.16.0.0/24
node: fabric_fabric159
interface name=ens19,ip=172.31.0.159/24
interface name=ens20,ip=172.31.2.159/24
router_id 172.16.0.159
node: fabric_fabric160
interface name=ens19,ip=172.31.0.160/24
interface name=ens20,ip=172.31.1.160/24
router_id 172.16.0.160
node: fabric_fabric161
interface name=ens19,ip=172.31.1.161/24
interface name=ens20,ip=172.31.2.161/24
router_id 172.16.0.161
[3] https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)
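
To rule out an addressing mistake in the configs in [1] and [2], a quick check
along these lines could be used. It is a minimal sketch, assuming the file
layout exactly as pasted above and assuming that in a full mesh every link
subnet should be shared by exactly two node interfaces. It is not the section
config parser PVE uses, and the helper names are hypothetical.

```python
import ipaddress
from collections import defaultdict

# Node sections copied from the ospf.cfg in [1].
SAMPLE = """\
fabric: 1234
loopback_prefix 172.16.0.0/24
node: 1234_fabric159
interface name=ens19,ip=172.31.0.159/24
interface name=ens20,ip=172.31.2.159/24
router_id 172.16.0.159
node: 1234_fabric160
interface name=ens19,ip=172.31.0.160/24
interface name=ens20,ip=172.31.1.160/24
router_id 172.16.0.160
node: 1234_fabric161
interface name=ens19,ip=172.31.1.161/24
interface name=ens20,ip=172.31.2.161/24
router_id 172.16.0.161
"""


def parse_nodes(text):
    """Map node id -> list of (interface name, ip/prefix)."""
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("node:"):
            current = line.split(":", 1)[1].strip()
            nodes[current] = []
        elif line.startswith("fabric:"):
            current = None  # fabric-level properties are ignored here
        elif current is not None and line.startswith("interface "):
            props = dict(kv.split("=", 1)
                         for kv in line.split(None, 1)[1].split(","))
            nodes[current].append((props["name"], props["ip"]))
    return nodes


def links_by_subnet(nodes):
    """Group (node, interface) pairs by the subnet their address is in."""
    subnets = defaultdict(list)
    for node, ifaces in nodes.items():
        for name, ip in ifaces:
            net = ipaddress.ip_interface(ip).network
            subnets[str(net)].append((node, name))
    return dict(subnets)


def unpaired_subnets(nodes):
    """Subnets that do not have exactly two endpoints (assumed to be wrong)."""
    return {net: ends for net, ends in links_by_subnet(nodes).items()
            if len(ends) != 2}


if __name__ == "__main__":
    for net, ends in sorted(links_by_subnet(parse_nodes(SAMPLE)).items()):
        print(net, ends)
    print("unpaired:", unpaired_subnets(parse_nodes(SAMPLE)))
```

For the configs above, each of the three /24 link subnets has exactly two
endpoints, so the addressing itself looks consistent.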