[pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?

Alexandre DERUMIER aderumier at odiso.com
Mon Jan 22 10:00:28 CET 2018


Hi,

we have done a POC at work with vxlan + bgp-evpn,
and we have tested a very nice feature: anycast gateway.

Basically, each vmbrX of a specific tenant has the same IP address/MAC address on every host.
This IP is the default gateway of the VMs.

That means VMs can be migrated across hosts :)

(OpenStack has a similar network model called DVR, distributed virtual routing; VMware NSX has the Distributed Logical Router (DLR).)

I think this model fits Proxmox very well, as all Proxmox nodes are masters.


This opens a lot of possibilities:

- distributed dhcp on all cluster nodes (btw, http://kea.isc.org seems to be a very good candidate, and can be extended with custom backends)
- distributed dns
- SNAT for private-only VMs -> internet (needs 1 public IP on each host); see the sketch after this list
- 1:1 NAT (aka floating IP in openstack, google cloud, ...): a public IP is created on the host and "migrates" at the same time as the VM
  https://assafmuller.com/2015/04/15/distributed-virtual-routing-floating-ips/
- a cloud-init metadata server should be easy to implement
- add VRF support to isolate the host (prevent VMs from connecting to the host gateway IP), or to prevent inter-tenant routing
- maybe other cool stuff I haven't thought about yet :)
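
For example, a minimal SNAT sketch for the private-only case (assuming vmbr0 holds the host's public IP and 10.0.1.0/24 is a tenant network; names are only illustrative):

# masquerade tenant traffic leaving towards the internet through the host's public IP
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 -o vmbr0 -j MASQUERADE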



I really like vxlan bgp evpn, because it's a standard with no central controller,
and we can also use it on physical switches/routers, across different Proxmox clusters, and also on Docker/Kubernetes clusters.

Also, this is fully supported in the current Linux kernel (Cumulus Networks works hard on it and uses it on their switches).
We only need the Linux kernel + a BGP routing daemon (quagga, frr, bird, gobgp, ...).


I can already configure each bridge and VTEP of each network manually in /etc/network/interfaces
(ifupdown2, https://github.com/CumulusNetworks/ifupdown2/tree/v1.0.0, already supports the new kernel 4.14 features).
I have also checked systemd-networkd; it seems good, with some optional features missing, like ARP suppression (kernel 4.14).
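
For reference, a minimal systemd-networkd sketch of such a VTEP (my reading of the systemd.netdev options; MacLearning=no standing in for nolearning):

/etc/systemd/network/vxlan1.netdev
----------------------------------
[NetDev]
Name=vxlan1
Kind=vxlan

[VXLAN]
Id=1
DestinationPort=4789
Local=203.0.113.1
MacLearning=no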

But this could also be done easily with some custom code using ip commands.


Basically, we need to define something like:

local on each node
-------------------
/etc/pve/nodes/

host1:
------
vtep :  myvtep
        dstport 4789 
        address 203.0.113.1 
        learningmode nolearning

(each node has a loopback with this IP address, which is used to generate the VTEP and the local BGP config)

host2:
vtep :  myvtep
        dstport 4789 
        address 203.0.113.2 
        learningmode nolearning

global
------
/etc/pve/networks.cfg

vxlanebgp: tenantnetwork1
           gateway_address 10.0.1.1/24
           gateway_macaddress a2:ed:21:06:e7:48
           vni 1
           vtep myvtep


vxlanebgp: tenantnetwork2
           gateway_address 10.0.2.1/24
           gateway_macaddress a2:ed:21:06:e7:48
           vni 2
           vtep myvtep



Then when a VM starts, we can generate the bridge with the anycast address,
create a VTEP and plug it into the bridge (different VTEPs reuse the same loopback address).
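
A minimal sketch of the ip commands such code could run at VM start (using the tenantnetwork1/host1 values from above; not a final implementation):

# anycast bridge: same ip/mac on every node
ip link add vmbr1 type bridge
ip link set vmbr1 address a2:ed:21:06:e7:48
ip addr add 10.0.1.1/24 dev vmbr1
# vtep: the local address comes from the node loopback
ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.1 nolearning
ip link set vxlan1 master vmbr1
ip link set vxlan1 up
ip link set vmbr1 up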


manual config:

host1 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
        pre-up ip addr add 203.0.113.1/32 dev lo

auto vmbr1
iface vmbr1 inet static
        address 10.0.1.1/24
        bridge_ports vxlan1
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.1 nolearning
        pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
        address 10.0.2.1/24
        bridge_ports vxlan2
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.1 nolearning
        pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------


router bgp 65000
  bgp router-id 203.0.113.1
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  ! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
  neighbor 203.0.113.2 peer-group fabric
  neighbor 203.0.113.254 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   advertise-all-vni
  exit-address-family
  !
  exit
!


host2 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
        pre-up ip addr add 203.0.113.2/32 dev lo

auto vmbr1
iface vmbr1 inet static
        address 10.0.1.1/24
        bridge_ports vxlan1
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.2 nolearning
        pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr1 vxlan1

auto vmbr2
iface vmbr2 inet static
        address 10.0.2.1/24
        bridge_ports vxlan2
        bridge_stp off
        bridge_fd 0
        pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.2 nolearning
        pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
        pre-up brctl addif vmbr2 vxlan2

quagga bgp config:
------------------

router bgp 65000
  bgp router-id 203.0.113.2
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  ! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
  neighbor 203.0.113.1 peer-group fabric
  neighbor 203.0.113.254 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   advertise-all-vni
  exit-address-family
  !
  exit
!



Regards,

Alexandre






----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, January 5, 2018 12:26:32
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?

>>I think we basically have two kinds of networks: 
>> 
>>1.) local networks: 
>> 
>>This is what we already have in /etc/network/interfaces. Access to local 
>>networks is usually restricted to admins. 
>> 
>>2.) virtual networks: 
>> 
>>Basically a linux bridge where we can connect VMs to. One can connect such 
>>virtual network to local network: 
>> 
>>- directly (this is what we currently use for the firewall) 
>>- vlan 
>>- vxlan 
>> 
>>Or we can connect that bridge to some SDN. 
>> 
>>We can also add additional services to such a virtual network: 
>> 
>>- SNAT, DNAT 
>>- Firewall 
>>- DHCP 
>>- Routing, ... 

Yes, I totally agree with you. 




For vxlan with a linux bridge, I have found very good documentation here: 

https://vincent.bernat.im/fr/blog/2017-vxlan-linux 
https://vincent.bernat.im/fr/blog/2017-vxlan-bgp-evpn 

(In French, sorry.) 

But basically, we can create a simple bridge with a vxlan interface (1 bridge per vxlan): 

Host1 (10.0.0.1) 
------- 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    group 239.0.0.100 \
    dev eth0 \
    ttl 5

# brctl addbr vmbr100
# brctl addif vmbr100 vxlan100


ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1 \
    group 239.0.0.200 \
    dev eth0 \
    ttl 5

# brctl addbr vmbr200
# brctl addif vmbr200 vxlan200


Host2 (10.0.0.2) 
------- 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    group 239.0.0.100 \
    dev eth0 \
    ttl 5

# brctl addbr vmbr100
# brctl addif vmbr100 vxlan100


ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2 \
    group 239.0.0.200 \
    dev eth0 \
    ttl 5

# brctl addbr vmbr200
# brctl addif vmbr200 vxlan200


This simple setup uses multicast to flood ARP requests to all VTEPs. 
It can work on a layer-2 LAN, but not across the internet. 

Another mode is to use unicast instead of multicast 
---------------------------------------------------------- 
host1 
------ 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    dev eth0 \
    ttl 5
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2
# brctl addbr vmbr100
# brctl addif vmbr100 vxlan100


ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1 \
    dev eth0 \
    ttl 5
# bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.2
# brctl addbr vmbr200
# brctl addif vmbr200 vxlan200

host2 
------ 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    dev eth0 \
    ttl 5
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.1
# brctl addbr vmbr100
# brctl addif vmbr100 vxlan100


ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2 \
    dev eth0 \
    ttl 5
# bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.1
# brctl addbr vmbr200
# brctl addif vmbr200 vxlan200


This works fine for small setups, as ARP requests will be replicated in unicast to all VTEPs. 
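
Each additional host only needs one more default fdb entry per VTEP; for example on host1, if a host3 (10.0.0.3, hypothetical) joins:

# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.3
# bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.3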


So to avoid flooding ARP to all VTEPs (for big networks), we can disable learning on the VNI and use a BGP daemon (bgp-evpn protocol) to sync the fdb: 
host1: 
------- 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    nolearning

host2 
------- 
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    nolearning

Then quagga or frr runs locally on each host, to peer with the other hosts directly or through BGP route reflectors (see the doc). 
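
A minimal sketch of the route-reflector side (hypothetical box 10.0.0.254, same AS as the hosts):

router bgp 65000
  bgp router-id 10.0.0.254
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor 10.0.0.1 peer-group fabric
  neighbor 10.0.0.2 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   neighbor fabric route-reflector-client
  exit-address-family
  !
  exit
!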



There is also a description of a manual fdb setup (it could be done by a Proxmox daemon, as we know the MAC addresses of the VMs, but this would only work for 1 Proxmox cluster). 
There are examples in the documentation of the behaviour of Docker libnetwork and flannel. 
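
A minimal sketch of what such a daemon could program, for a VM MAC 52:54:00:aa:bb:cc (hypothetical) running behind the VTEP 10.0.0.2, as seen from host1:

# static fdb entry: no learning or flooding needed to reach this VM
bridge fdb add 52:54:00:aa:bb:cc dev vxlan100 dst 10.0.0.2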


It could be great to have something easy to set up, without needing to configure each host manually. 
For example, something like 
/etc/pve/network.conf: 

vxlanplugin: customer1
           vxlan 100
           underlay_network 10.0.0.0/8

and in the VM config: net0: virtio=....,network=customer1 

This will create vmbr100 with the vxlan100 interface, take the local IP of each host, and do the unicast config with all the other hosts if needed, ... 
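
A minimal sketch of what the plugin could generate on a host whose underlay address is 10.0.0.1 (the peer 10.0.0.2 would be discovered from the cluster membership; all names illustrative):

# local vtep address picked from underlay_network 10.0.0.0/8
ip link add vxlan100 type vxlan id 100 dstport 4789 local 10.0.0.1
ip link add vmbr100 type bridge
ip link set vxlan100 master vmbr100
ip link set vxlan100 up
ip link set vmbr100 up
# unicast mode: one default fdb entry per other cluster node
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2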



De: "dietmar" <dietmar at proxmox.com> 
À: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com> 
Envoyé: Jeudi 4 Janvier 2018 09:30:52 
Objet: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ? 

I think we basically have two kinds of networks: 

1.) local networks: 

This is what we already have in /etc/network/interfaces. Access to local 
networks is usually restricted to admins. 

2.) virtual networks: 

Basically a linux bridge where we can connect VMs to. One can connect such 
virtual network to local network: 

- directly (this is what we currently use for the firewall) 
- vlan 
- vxlan 

Or we can connect that bridge to some SDN. 

We can also add additional services to such a virtual network: 

- SNAT, DNAT 
- Firewall 
- DHCP 
- Routing, ... 


> On January 2, 2018 at 3:04 PM Alexandre DERUMIER <aderumier at odiso.com> wrote: 
> I think we have 2 kinds of setups: 
> 
> - basic local vswitch (bridge, ovs, snabbswitch, ...): can be easily set up with 
> systemd-networkd + some tap/eth plug/unplug scripts. 
> - bigger sdn setup, with external controllers. (which could manage networks 
> across multiple proxmox clusters too) 


