[pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?
Alexandre DERUMIER
aderumier at odiso.com
Mon Jan 22 10:00:28 CET 2018
Hi,
we have done a POC at work with vxlan + bgp-evpn,
and we have tested a very nice feature: anycast gateway.
Basically, each vmbrX of a given tenant has the same IP address and MAC address on every host.
This IP is the default gateway of the VMs.
That means a VM can be migrated across hosts without changing its gateway :)
(OpenStack has a similar network model called "DVR", distributed virtual routing; VMware has the NSX Distributed Logical Router (DLR).)
I think this model fits Proxmox very well, as all Proxmox nodes are masters.
This opens a lot of possibilities:
- distributed DHCP on all cluster nodes (btw, http://kea.isc.org seems to be a very good candidate, and can be extended with a custom backend)
- distributed DNS
- SNAT for private-only VMs -> internet (needs 1 public IP on each host)
- 1:1 NAT (aka floating IP in OpenStack, Google Cloud, ...): a public IP is created on the host and "migrates" at the same time as the VM
  https://assafmuller.com/2015/04/15/distributed-virtual-routing-floating-ips/
- a cloud-init metadata server should be easy to implement
- add VRF support to isolate the host (prevent VMs from connecting to the host through the gateway IP), or to prevent routing between tenants
- maybe other cool stuff I haven't thought about yet :)
I really like vxlan bgp-evpn because it's a standard, with no central controller,
and we can also use it on physical switches/routers, across different Proxmox clusters, and also on docker/kubernetes clusters.
Also, it is fully supported in the current Linux kernel (Cumulus Networks works hard on it and uses it on their switches).
We only need the Linux kernel + a BGP routing daemon (quagga, frr, bird, gobgp, ...).
I can already manually configure each bridge and vtep for each network in /etc/network/interfaces
(with ifupdown2, https://github.com/CumulusNetworks/ifupdown2/tree/v1.0.0, which already supports the new kernel 4.14 features).
I have checked systemd-networkd; it seems good, but it is missing some optional features like arp suppression (kernel 4.14).
But this could easily be done with some custom code calling ip commands.
Basically, we need to define something like:
local on each node
-------------------
/etc/pve/node/
host1:
------
vtep: myvtep
    dstport 4789
    address 203.0.113.1
    learningmode nolearning
(each node has a loopback with this IP address, which is used to generate the vtep and also the local bgp config)
host2:
------
vtep: myvtep
    dstport 4789
    address 203.0.113.2
    learningmode nolearning
global
------
/etc/pve/networks.cfg
vxlanebgp: tenantnetwork1
    gateway_address 10.0.1.1/24
    gateway_macaddress a2:ed:21:06:e7:48
    vni 1
    vtep myvtep
vxlanebgp: tenantnetwork2
    gateway_address 10.0.2.1/24
    gateway_macaddress a2:ed:21:06:e7:48
    vni 2
    vtep myvtep
Then, when a VM starts, we can generate the bridge with the anycast address,
create a vtep and plug it into the bridge (the different vteps reuse the same loopback address).
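As a rough illustration of that generation step, here is a minimal Python sketch that turns one tenant network definition plus the node's vtep address into the ip commands to run. The TenantNetwork class, the bridge_commands function and the vmbr<vni>/vxlan<vni> naming are assumptions made for this example only; they are not an existing Proxmox API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantNetwork:
    """One 'vxlanebgp' entry from the proposed networks.cfg (hypothetical model)."""
    name: str
    gateway_address: str      # anycast gateway, e.g. "10.0.1.1/24"
    gateway_macaddress: str   # anycast MAC, identical on every node
    vni: int


def bridge_commands(net, vtep_address, dstport=4789):
    """Return the ip commands that create the anycast bridge on one node.

    Assumption: bridge and vxlan devices are named after the VNI.
    """
    bridge = f"vmbr{net.vni}"
    vxlan = f"vxlan{net.vni}"
    return [
        # The vtep uses the node's loopback address as its local endpoint.
        f"ip link add {vxlan} type vxlan id {net.vni} dstport {dstport} "
        f"local {vtep_address} nolearning",
        f"ip link add {bridge} type bridge",
        # Same MAC and IP on every node: this is the anycast gateway.
        f"ip link set dev {bridge} address {net.gateway_macaddress}",
        f"ip link set dev {vxlan} master {bridge}",
        f"ip addr add {net.gateway_address} dev {bridge}",
        f"ip link set dev {vxlan} up",
        f"ip link set dev {bridge} up",
    ]


if __name__ == "__main__":
    net = TenantNetwork("tenantnetwork1", "10.0.1.1/24", "a2:ed:21:06:e7:48", 1)
    for cmd in bridge_commands(net, "203.0.113.1"):
        print(cmd)
```

Only the vtep address differs between nodes; the bridge MAC and gateway address come from the cluster-wide definition, so the same function can be called on every node.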
manual config:
host1 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
pre-up ip addr add 203.0.113.1/32 dev lo
auto vmbr1
iface vmbr1 inet static
address 10.0.1.1/24
bridge_ports vxlan1
bridge_stp off
bridge_fd 0
pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.1 nolearning
pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
pre-up brctl addif vmbr1 vxlan1
auto vmbr2
iface vmbr2 inet static
address 10.0.2.1/24
bridge_ports vxlan2
bridge_stp off
bridge_fd 0
pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.1 nolearning
pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
pre-up brctl addif vmbr2 vxlan2
quagga bgp config:
------------------
router bgp 65000
bgp router-id 203.0.113.1
no bgp default ipv4-unicast
neighbor fabric peer-group
neighbor fabric remote-as 65000
neighbor fabric capability extended-nexthop
! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
neighbor 203.0.113.2 peer-group fabric
neighbor 203.0.113.254 peer-group fabric
!
address-family evpn
neighbor fabric activate
advertise-all-vni
exit-address-family
!
exit
!
host2 config
------------
/etc/network/interfaces
-----------------------
auto lo
iface lo inet loopback
pre-up ip addr add 203.0.113.2/32 dev lo
auto vmbr1
iface vmbr1 inet static
address 10.0.1.1/24
bridge_ports vxlan1
bridge_stp off
bridge_fd 0
pre-up ip link add vxlan1 type vxlan id 1 dstport 4789 local 203.0.113.2 nolearning
pre-up ip link set dev vmbr1 address a2:ed:21:06:e7:48
pre-up brctl addif vmbr1 vxlan1
auto vmbr2
iface vmbr2 inet static
address 10.0.2.1/24
bridge_ports vxlan2
bridge_stp off
bridge_fd 0
pre-up ip link add vxlan2 type vxlan id 2 dstport 4789 local 203.0.113.2 nolearning
pre-up ip link set dev vmbr2 address a2:ed:21:06:e7:48
pre-up brctl addif vmbr2 vxlan2
quagga bgp config:
------------------
router bgp 65000
bgp router-id 203.0.113.2
no bgp default ipv4-unicast
neighbor fabric peer-group
neighbor fabric remote-as 65000
neighbor fabric capability extended-nexthop
! BGP sessions with route reflectors or full mesh with all proxmox hosts or routers..
neighbor 203.0.113.1 peer-group fabric
neighbor 203.0.113.254 peer-group fabric
!
address-family evpn
neighbor fabric activate
advertise-all-vni
exit-address-family
!
exit
!
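Since the two BGP configs above differ only in the router-id and the neighbor list, they could be rendered from the list of vtep addresses. The following Python sketch shows the idea; the bgp_config function name and its parameters are hypothetical, and the emitted lines simply mirror the quagga config shown above.

```python
def bgp_config(local_vtep, fabric_members, asn=65000):
    """Render the per-node BGP config from the list of fabric addresses.

    fabric_members may contain other Proxmox hosts, routers or route
    reflectors; the local address is skipped so a node never peers
    with itself.
    """
    lines = [
        f"router bgp {asn}",
        f" bgp router-id {local_vtep}",
        " no bgp default ipv4-unicast",
        " neighbor fabric peer-group",
        f" neighbor fabric remote-as {asn}",
        " neighbor fabric capability extended-nexthop",
    ]
    lines += [
        f" neighbor {member} peer-group fabric"
        for member in fabric_members
        if member != local_vtep
    ]
    lines += [
        " !",
        " address-family evpn",
        "  neighbor fabric activate",
        "  advertise-all-vni",
        " exit-address-family",
        " !",
        "exit",
        "!",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    members = ["203.0.113.1", "203.0.113.2", "203.0.113.254"]
    print(bgp_config("203.0.113.1", members))
```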
Regards,
Alexandre
----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, January 5, 2018 12:26:32
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?
>>I think we basically have two kinds of networks:
>>
>>1.) local networks:
>>
>>This is what we already have in /etc/network/interfaces. Access to local networks
>>is usually restricted to admins.
>>
>>2.) virtual networks:
>>
>>Basically a Linux bridge that we can connect VMs to. One can connect such a
>>virtual network to the local network:
>>
>>- directly (this is what we currently use for the firewall)
>>- vlan
>>- vxlan
>>
>>Or we can connect that bridge to some SDN.
>>
>>We can also add additional services to such a virtual network:
>>
>>- SNAT, DNAT
>>- Firewall
>>- DHCP
>>- Routing, ...
Yes, I totally agree with you.
For vxlan with a Linux bridge, I have found very good documentation here:
https://vincent.bernat.im/fr/blog/2017-vxlan-linux
https://vincent.bernat.im/fr/blog/2017-vxlan-bgp-evpn
(In French, sorry.)
But basically, we can
create a simple bridge with a vxlan interface (1 bridge per vxlan):
Host1 (10.0.0.1)
-------
# "group" must be in the same address family as "local";
# 239.1.1.100 is only an example IPv4 multicast group.
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    group 239.1.1.100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr100
brctl addif vmbr100 vxlan100
ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1 \
    group 239.1.1.100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr200
brctl addif vmbr200 vxlan200
Host2 (10.0.0.2)
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    group 239.1.1.100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr100
brctl addif vmbr100 vxlan100
ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2 \
    group 239.1.1.100 \
    dev eth0 \
    ttl 5
brctl addbr vmbr200
brctl addif vmbr200 vxlan200
This simple setup uses multicast to send ARP requests to all the vteps of a VNI.
It can work on a layer 2 LAN, but not across the internet.
Another mode is to use unicast instead of multicast
----------------------------------------------------------
host1
------
# No multicast group here: the all-zero fdb entry tells the kernel
# which remote vtep to replicate unknown/broadcast traffic to.
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    dev eth0 \
    ttl 5
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.2
brctl addbr vmbr100
brctl addif vmbr100 vxlan100
ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.1 \
    dev eth0 \
    ttl 5
bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.2
brctl addbr vmbr200
brctl addif vmbr200 vxlan200
host2
------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    dev eth0 \
    ttl 5
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.0.1
brctl addbr vmbr100
brctl addif vmbr100 vxlan100
ip link add vxlan200 type vxlan \
    id 200 \
    dstport 4789 \
    local 10.0.0.2 \
    dev eth0 \
    ttl 5
bridge fdb append 00:00:00:00:00:00 dev vxlan200 dst 10.0.0.1
brctl addbr vmbr200
brctl addif vmbr200 vxlan200
This works fine for small setups, as ARP requests are replicated in unicast to every other vtep.
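To see why this only suits small setups, here is a small Python sketch (the flood_entries helper is hypothetical) that lists the all-zero fdb entries one host needs: one per remote host and per VNI, so the total grows with hosts x VNIs and every broadcast frame is copied once per remote host.

```python
def flood_entries(local_ip, all_hosts, vnis):
    """Return the 'bridge fdb append' commands for unicast replication.

    One all-zero entry is needed per (VNI, remote host) pair; the local
    host is skipped. Device names follow the vxlan<vni> convention used
    in the commands above.
    """
    return [
        f"bridge fdb append 00:00:00:00:00:00 dev vxlan{vni} dst {peer}"
        for vni in vnis
        for peer in all_hosts
        if peer != local_ip
    ]


if __name__ == "__main__":
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    for cmd in flood_entries("10.0.0.1", hosts, [100, 200]):
        print(cmd)
```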
So to avoid ARP flooding (for big networks), we can disable learning on the vxlan interface and use a BGP daemon (bgp-evpn protocol) to sync the fdb.
host1:
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.1 \
    nolearning
host2
-------
ip link add vxlan100 type vxlan \
    id 100 \
    dstport 4789 \
    local 10.0.0.2 \
    nolearning
Then run quagga or frr locally on each host, peering with the other hosts or through BGP route reflectors (see the doc).
There is also a description of manual fdb setup (this could be done by a Proxmox daemon, as we know the MAC addresses of the VMs, but it would only work within 1 Proxmox cluster).
There are examples in the documentation of the behaviour of docker libnetwork and flannel.
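A minimal sketch of the manual fdb idea, assuming a daemon that already knows which host each VM MAC address lives on. The static_fdb_commands helper and the MAC-to-host mapping are hypothetical; the generated command uses the same "bridge fdb append" form as the unicast setup above, with the real VM MAC instead of the all-zero address.

```python
def static_fdb_commands(local_host, vm_locations, vni):
    """Return fdb commands pointing each remote VM MAC at its host.

    vm_locations maps a VM MAC address to the vtep IP of the host
    currently running it. VMs on the local host are skipped because
    they are reachable through the local bridge.
    """
    dev = f"vxlan{vni}"
    return [
        f"bridge fdb append {mac} dev {dev} dst {host}"
        for mac, host in sorted(vm_locations.items())
        if host != local_host
    ]


if __name__ == "__main__":
    locations = {
        "52:54:00:aa:bb:01": "10.0.0.1",
        "52:54:00:aa:bb:02": "10.0.0.2",
    }
    for cmd in static_fdb_commands("10.0.0.1", locations, 100):
        print(cmd)
```

A real daemon would also have to remove stale entries when a VM migrates or stops, which this sketch does not cover.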
It would be great to have something easy to set up, without needing to configure each host manually.
For example, something like
/etc/pve/network.conf:
vxlanplugin: customer1
    vxlan 100
    underlay_network 10.0.0.0/8
and in the VM config: net0: virtio=....,network=customer1
This would create vmbr100 with the vxlan100 interface, take the local IP of each host, do the unicast config with all the other hosts if needed, ...
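To make the proposal a bit more concrete, here is a Python sketch of how such a file and the network= option could be read. The file format is only the one proposed above, and parse_network_conf, network_of and pick_local_ip are hypothetical helper names, not existing Proxmox code.

```python
import ipaddress


def parse_network_conf(text):
    """Parse '<type>: <id>' sections followed by 'key value' lines."""
    networks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        head, sep, rest = line.partition(":")
        # A section header has no whitespace before the colon.
        if sep and rest.strip() and not any(c.isspace() for c in head):
            current = {"type": head}
            networks[rest.strip()] = current
        elif current is not None:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return networks


def network_of(net_line):
    """Extract the network name from 'net0: virtio=...,network=customer1'."""
    _, _, options = net_line.partition(":")
    for option in options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "network":
            return value
    return None


def pick_local_ip(host_addresses, underlay_network):
    """Return the first host address that lies inside the underlay network."""
    underlay = ipaddress.ip_network(underlay_network)
    for address in host_addresses:
        if ipaddress.ip_address(address) in underlay:
            return address
    return None


if __name__ == "__main__":
    conf = "vxlanplugin: customer1\n    vxlan 100\n    underlay_network 10.0.0.0/8\n"
    nets = parse_network_conf(conf)
    name = network_of("net0: virtio=AA:BB:CC:DD:EE:FF,network=customer1")
    print(name, nets[name])
    print(pick_local_ip(["192.168.1.5", "10.0.0.1"], nets[name]["underlay_network"]))
```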
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, January 4, 2018 09:30:52
Subject: Re: [pve-devel] proxmox 2018 : add support for "virtual" network and network plugins ?
I think we basically have two kinds of networks:
1.) local networks:
This is what we already have in /etc/network/interfaces. Access to local networks
is usually restricted to admins.
2.) virtual networks:
Basically a Linux bridge that we can connect VMs to. One can connect such a
virtual network to the local network:
- directly (this is what we currently use for the firewall)
- vlan
- vxlan
Or we can connect that bridge to some SDN.
We can also add additional services to such a virtual network:
- SNAT, DNAT
- Firewall
- DHCP
- Routing, ...
> On January 2, 2018 at 3:04 PM Alexandre DERUMIER <aderumier at odiso.com> wrote:
> I think we have 2 kinds of setups:
>
> - basic local vswitch (bridge, ovs, snabbswitch, ...): can be easily set up with
> systemd-networkd + some tap/eth plug/unplug scripts.
> - bigger SDN setups, with external controllers (which could manage networks
> across multiple Proxmox clusters too)