[pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
DERUMIER, Alexandre
alexandre.derumier at groupe-cyllene.com
Fri Oct 27 15:37:59 CEST 2023
-------- Original message --------
From: Stefan Lendl <s.lendl at proxmox.com>
To: "DERUMIER, Alexandre" <alexandre.derumier at groupe-cyllene.com>
Cc: pve-devel at lists.proxmox.com <pve-devel at lists.proxmox.com>
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Date: 27/10/2023 14:53:25
Hi Alexandre, I am proposing a slightly different view.
>>I think it's better to keep all IPs, managed by the IPAM in the IPAM
>>and the VM only configures as DHCP.
Yes, I'm thinking exactly the same!
I tried two years ago to implement IPAM with static IPs in the VM
configuration (+ IPAM), and there are a lot of corner cases.
>>I would implement the 4 mentioned events (vNIC create, destroy,
>>start,
>>stop) in the SDN module and limit interactions between VM configs and
>>the SDN module to these events.
>>
>>On NIC create: it calls the SDN::nic_join_vnet($bridge, $mac)
>>function that handles IPAM registration if necessary, triggers
>>generating the DHCP config, and so on. Same approach for the other
>>SDN related events.
>>
>>All the logic is implemented in the SDN module. This reduces coupling
>>between VM logic and SDN logic.
sounds great :)
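(For illustration, the event-based interface described above could look roughly like the following Python sketch. The function names mirror the Perl SDN::nic_join_vnet mentioned in the mail; the dict-based IPAM and the config-regeneration stub are purely hypothetical stand-ins, not real PVE code.)

```python
# Hypothetical sketch of the event-based SDN interface discussed above.
# The names nic_join_vnet/nic_leave_vnet mirror the Perl functions from
# the mail; the toy IPAM and the config stub are stand-ins for PVE's
# IPAM and dnsmasq handling.

class Ipam:
    """Toy IPAM: hands out the lowest free address, persistently per MAC."""
    def __init__(self, free_ips):
        self.free = sorted(free_ips)
        self.leases = {}                    # mac -> ip

    def allocate(self, mac):
        if mac in self.leases:              # re-join keeps the same IP
            return self.leases[mac]
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return ip

    def release(self, mac):
        ip = self.leases.pop(mac, None)
        if ip is not None:
            self.free.insert(0, ip)

def regenerate_dhcp_config(ipam, bridge):
    # Stand-in for writing the per-zone dnsmasq mapping and reloading it;
    # returns deterministic "mac,ip" lines derived only from IPAM state.
    return [f"{mac},{ip}" for mac, ip in sorted(ipam.leases.items())]

def nic_join_vnet(ipam, bridge, mac):
    """NIC added to a VNet: reserve a persistent IP, regenerate DHCP config."""
    ip = ipam.allocate(mac)
    regenerate_dhcp_config(ipam, bridge)
    return ip

def nic_leave_vnet(ipam, bridge, mac):
    """NIC removed from a VNet: free the IP, regenerate DHCP config."""
    ipam.release(mac)
    regenerate_dhcp_config(ipam, bridge)
```

All VM/SDN coupling goes through these two entry points, which is the reduced-coupling property Stefan argues for.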
"DERUMIER, Alexandre" <alexandre.derumier at groupe-cyllene.com> writes:
> Hi Stefan (Lendl),
>
> I totally agree with you, we should have persistent reservations at
> VM create/NIC plug, NIC delete, and VM delete.
>
> At least for my usage, with multiple clusters in different
> datacenters, I really can't call the IPAM API at each start (for
> scalability, or for security if the IPAM is down).
>
>
> This also allows us to simply do reservations in the dnsmasq file
> without any need to restart it. (AFAIK, OpenStack uses dnsmasq like
> this too.)
>
>
> I'm not sure whether truly dynamic ephemeral IPs, changing at each VM
> stop/start, are interesting for server VM usage. (Maybe for desktop
> VMs where you share a small pool of IPs, but I personally don't know
> of any Proxmox users using Proxmox VE for this.)
>
>
> see my proposal here (it handles both ephemeral && reserved, but it's
> even easier with only reserved):
>
> https://lists.proxmox.com/pipermail/pve-devel/2023-September/059169.html
>
>
>
>
> "
> I think we could implement ipam call like:
>
>
> create vm or add a new nic -->
> -----------------------------
> qm create ... -net0
> bridge=vnet,....,ip=(auto|192.168.0.1|dynamic),ip6=(..)
>
>
> auto: search for a free IP in IPAM, and write the IP address into the
> net0 ip= field.
>
> 192.168.0.1: check that the IP is free in IPAM && register it in
> IPAM, then write the IP into the ip= field.
>
>
> dynamic: write "ephemeral" in net0: ....,ip=ephemeral (this is a
> dynamic IP, registered at VM start and released at VM stop)
>
>
>
> vm start
> ---------
> - if ip=ephemeral, find && register a free IP in IPAM, and write it
> in the VM net0: ...,ip=192.168.0.10[E] (maybe with a special flag [E]
> to indicate it's ephemeral)
> - read the IP from the VM config && inject it into DHCP
>
>
> vm_stop
> -------
> if the IP is ephemeral (netX: ip=192.168.0.10[E]), delete the IP from
> IPAM and set ip=ephemeral in the VM config
>
>
> vm_destroy or nic remove/unplug
> -------------------------
> if netX: ...,ip=192.168.0.10, remove the IP from IPAM
>
>
>
> nic update when vm is running:
> ------------------------------
> if an IP is defined (netX: ip=192.168.0.10), we don't allow bridge
> changes or IP changes, as the VM is not notified about these changes
> and would still use the old IP.
>
> We can allow NIC hot-unplug && hotplug. (The guest OS will remove the
> IP on NIC removal, and will call DHCP again on NIC hotplug.)
>
>
>
>
> nic hotplug with ip=auto:
> -------------------------
>
> --> add the NIC in pending state --> find an IP in IPAM && write it
> in pending --> do the hotplug in QEMU.
>
> We need to handle the config revert to remove the IP from IPAM if the
> NIC hotplug is blocked in the pending state. (I have never seen this
> case unless the OS doesn't have the pci_hotplug module loaded, but
> it's better to be careful.)
>
> "
>
>
> > > I am currently working on the SDN feature. This is an initial
> > > review
> > > of
> > > the patch series and I am trying to make a strong case against
> > > ephemeral
> > > DHCP IP reservation.
> > >
> > > The current state of the patch series invokes the IPAM on every
> > > VM/CT
> > > start/stop to add or remove the IP from the IPAM.
> > > This triggers the dnsmasq config generation on the specific host
> > > with
> > > only the MAC/IP mapping of that particular host.
>
>
>
>
>
> From reading the discussion of the v1 patch series I understand this
> approach tries to implement the ephemeral IP reservation strategy.
> From
> off-list conversations with Stefan Hanreich, I agree that having
> ephemeral IP reservation coordinated by the IPAM requires us to
> re-implement DHCP functionality in the IPAM and heavily rely on
> syncing
> between the different services.
>
> To maintain reliable sync we need to hook into many different places
> where the IPAM needs to be queried. Any issues with the
> implementation may lead to the IPAM and the local DHCP config state
> running out of sync, causing network issues such as duplicate IPs.
>
> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Having a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations.
> Even starting two VMs in parallel will be limited by this central
> lock. At boot, trying to start many VMs (ideally as many in parallel
> as possible) is limited by the central IPAM lock even further.
>
> I argue that we shall not support ephemeral IPs altogether.
> The alternative is to make all IPAM reservations persistent.
>
> Using persistent IPs only reduces the interactions of VM/CTs with the
> IPAM to a minimum of NIC joining a subnet and NIC leaving a subnet. I
> am
> deliberately not referring to VMs because a VM may be part of
> multiple VNets, or even be in the same VNet multiple times
> (regardless of whether that is sensible).
>
> Cases the IPAM needs to be involved:
>
> - NIC with DHCP enabled VNet is added to VM config
> - NIC with DHCP enabled VNet is removed from VM config
> - NIC is assigned to another Bridge
> can be treated as individual leave + join events
>
> Cases that are explicitly not covered but may be added if desired:
>
> - Manually assign an IP address on a NIC
>   will not be automatically visible in the IPAM
> - Manually change the MAC on a NIC
>   don't do that -> you are on your own.
>   Not handled -> change it in the IPAM manually
>
> Once an IP is reserved via IPAM, the dnsmasq config can be generated
> statelessly and idempotently from the PVE IPAM and is identical on
> all nodes, regardless of whether a VM/CT actually resides on that
> node or is running or stopped. This is especially useful for VM
> migration because the IP stays consistent without special
> considerations.
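(A minimal sketch of such stateless, idempotent generation. The nested-dict ipam_db layout and the "mac,ip" line format are assumptions modeled on the dnsmasq ethers file mentioned later in this thread, not the real /etc/pve/priv/ipam.db format.)

```python
# Sketch: derive the per-zone dnsmasq host mapping purely from the IPAM
# database, so every node generates byte-identical output. The
# nested-dict layout of ipam_db is an assumption, not the real
# /etc/pve/priv/ipam.db format.

def render_ethers(ipam_db, zone):
    """Return the ethers file content for one zone: one 'mac,ip' per
    line, sorted so the output is deterministic on every node."""
    entries = ipam_db.get(zone, {})          # mac -> ip
    lines = [f"{mac},{ip}" for mac, ip in sorted(entries.items())]
    return "\n".join(lines) + "\n" if lines else ""
```

Because the output depends only on the IPAM state, regenerating it on any node (or repeatedly on the same node) always yields the same file, which is exactly what makes migration a non-event.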
>
> Snapshot/revert, backup/restore, suspend/hibernate/resume cases are
> automatically covered because the IP will already be reserved for
> that
> MAC.
>
> If the admin wants to change the IP of a VM, this can be done via the
> IPAM API/UI, which will have to be implemented separately.
>
> A limitation of this approach vs. dynamic IP reservation is that the
> IP range on the subnet needs to be large enough to hold the IPs of
> all VMs in that subnet, even stopped ones. This is in contrast to
> default DHCP behavior, where the range only limits the number of
> actively running VMs. It should be enough to mention this in the
> docs.
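(A quick sanity check of that limitation, using the example dhcp-range from the cover letter below; this is plain arithmetic, nothing PVE-specific.)

```python
# With persistent reservations, every NIC in the subnet consumes one
# address from the DHCP range, whether its VM is running or stopped.
import ipaddress

def range_capacity(start: str, end: str) -> int:
    """Number of addresses in an inclusive start..end range."""
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1

# The example range 10.1.0.100..10.1.0.200 holds 101 addresses, so at
# most 101 DHCP-managed NICs (including those of stopped VMs) fit.
capacity = range_capacity("10.1.0.100", "10.1.0.200")
```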
>
> I will further review the code and try to implement the
> aforementioned approach.
>
> Best regards,
> Stefan Lendl
>
> Stefan Hanreich <s.hanreich at proxmox.com> writes:
>
> > This is a WIP patch series, since I will be gone for 3 weeks and
> > wanted to
> > share my current progress with the DHCP support for SDN.
> >
> > This patch series adds support for automatically deploying dnsmasq
> > as
> > a DHCP
> > server to a simple SDN Zone.
> >
> > While certainly not 100% polished in some areas (looking at
> > restarting systemd services in particular), it shows the general
> > idea behind the mechanism. I wanted to gather some feedback on how
> > I approached designing the plugins and the config regeneration
> > process before committing to this design by creating an API and UI
> > around it.
> >
> > You need to install dnsmasq (and disable it afterwards):
> >
> > apt install dnsmasq && systemctl disable --now dnsmasq
> >
> >
> > You can use the following example configuration for deploying a
> > DHCP server in an SDN subnet:
> >
> > /etc/pve/sdn/dhcp.cfg:
> >
> > dnsmasq: nat
> >
> >
> > /etc/pve/sdn/zones.cfg:
> >
> > simple: DHCPNAT
> > ipam pve
> >
> >
> > /etc/pve/sdn/vnets.cfg:
> >
> > vnet: dhcpnat
> > zone DHCPNAT
> >
> >
> > /etc/pve/sdn/subnets.cfg:
> >
> > subnet: DHCPNAT-10.1.0.0-16
> > vnet dhcpnat
> > dhcp-dns-server 10.1.0.1
> > dhcp-range server=nat,start-address=10.1.0.100,end-address=10.1.0.200
> > gateway 10.1.0.1
> > snat 1
> >
> >
> > Then apply the SDN configuration:
> >
> > pvesh set /cluster/sdn
> >
> > You need to apply the SDN configuration once after adding the dhcp-
> > range lines
> > to the configuration, since the running configuration is used for
> > managing
> > DHCP. It will not work otherwise!
> >
> > For testing it can be helpful to monitor the following files (e.g.
> > with watch)
> > to find out what is happening
> > * /etc/dnsmasq.d/<dhcp_id>/ethers (on each node)
> > * /etc/pve/priv/ipam.db
> >
> > Changes from v1 -> v2:
> > * added hooks for handling DHCP when starting / stopping / .. VMs
> > and CTs
> > * Get an IP from IPAM and register that IP in the DHCP server
> > (pve only for now)
> > * remove lease-time, since it is now infinite and managed by the
> > VM
> > lifecycle
> > * add hooks for setting & deleting DHCP mappings to DHCP plugins
> > * modified interface of the abstract class to reflect new
> > requirements
> > * added helpers in existing SDN classes
> > * simplified DHCP configuration settings
> >
> >
> >
> > pve-cluster:
> >
> > Stefan Hanreich (1):
> > cluster files: add dhcp.cfg
> >
> > src/PVE/Cluster.pm | 1 +
> > src/pmxcfs/status.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> >
> > pve-network:
> >
> > Stefan Hanreich (6):
> > subnets: vnets: preparations for DHCP plugins
> > dhcp: add abstract class for DHCP plugins
> > dhcp: subnet: add DHCP options to subnet configuration
> > dhcp: add DHCP plugin for dnsmasq
> > ipam: Add helper methods for DHCP to PVE IPAM
> > dhcp: regenerate config for DHCP servers on reload
> >
> >  debian/control                         |   1 +
> >  src/PVE/Network/SDN.pm                 |  11 +-
> >  src/PVE/Network/SDN/Dhcp.pm            | 192 +++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Dnsmasq.pm    | 186 ++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Makefile      |   8 ++
> >  src/PVE/Network/SDN/Dhcp/Plugin.pm     |  83 +++++++++++
> >  src/PVE/Network/SDN/Ipams/PVEPlugin.pm |  64 +++++++++
> >  src/PVE/Network/SDN/Makefile           |   3 +-
> >  src/PVE/Network/SDN/SubnetPlugin.pm    |  32 +++++
> >  src/PVE/Network/SDN/Subnets.pm         |  43 ++++--
> >  src/PVE/Network/SDN/Vnets.pm           |  27 ++--
> >  11 files changed, 622 insertions(+), 28 deletions(-)
> > create mode 100644 src/PVE/Network/SDN/Dhcp.pm
> > create mode 100644 src/PVE/Network/SDN/Dhcp/Dnsmasq.pm
> > create mode 100644 src/PVE/Network/SDN/Dhcp/Makefile
> > create mode 100644 src/PVE/Network/SDN/Dhcp/Plugin.pm
> >
> >
> > pve-manager:
> >
> > Stefan Hanreich (1):
> > sdn: regenerate DHCP config on reload
> >
> > PVE/API2/Network.pm | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >
> > qemu-server:
> >
> > Stefan Hanreich (1):
> > sdn: dhcp: add DHCP setup to vm-network-scripts
> >
> > PVE/QemuServer.pm | 14 ++++++++++++++
> > vm-network-scripts/pve-bridge | 3 +++
> > vm-network-scripts/pve-bridgedown | 19 +++++++++++++++++++
> > 3 files changed, 36 insertions(+)
> >
> >
> > pve-container:
> >
> > Stefan Hanreich (1):
> > sdn: dhcp: setup DHCP mappings in LXC hooks
> >
> > src/PVE/LXC.pm | 10 ++++++++++
> > src/lxc-pve-poststop-hook | 1 +
> > src/lxc-pve-prestart-hook | 9 +++++++++
> > 3 files changed, 20 insertions(+)
> >
> >
> > Summary over all repositories:
> > 20 files changed, 681 insertions(+), 28 deletions(-)
> >
> > --
> > murpp v0.4.0
> >
> >
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel at lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>