[pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN

DERUMIER, Alexandre alexandre.derumier at groupe-cyllene.com
Fri Oct 27 15:37:59 CEST 2023


-------- Original Message --------
From: Stefan Lendl <s.lendl at proxmox.com>
To: "DERUMIER, Alexandre" <alexandre.derumier at groupe-cyllene.com>
Cc: pve-devel at lists.proxmox.com <pve-devel at lists.proxmox.com>
Subject: Re: [pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN
Date: 27/10/2023 14:53:25


Hi Alexandre, I am proposing a slightly different view.

>>I think it's better to keep all IPs, managed by the IPAM in the IPAM
>>and the VM only configures as DHCP.


Yes, I'm thinking exactly the same!


I tried two years ago to implement IPAM with static IPs in the VM
configuration, and there are a lot of corner cases.




>>I would implement the 4 mentioned events (vNIC create, destroy,
>>start,
>>stop) in the SDN module and limit interactions between VM configs and
>>the SDN module to these events.

>>
>>On NIC create: it calls the SDN::nic_join_vnet($bridge, $mac)
>>function that handles IPAM registration and, if necessary, triggers
>>generating the DHCP config, and so on. Same approach for the other
>>SDN-related events.
>>
>>All the logic is implemented in the SDN module. This reduces coupling
>>between VM logic and SDN logic.


sounds great :)
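The event-based interface described above could look roughly like this (a minimal Python sketch of the idea; the Ipam class and the function bodies are illustrative assumptions, not the actual pve-network code, which is Perl):

```python
# Sketch of the four SDN-side event handlers discussed above.
# All IPAM/DHCP logic stays behind these calls; VM/CT code only
# invokes them on NIC create/destroy (and start/stop if needed).

class Ipam:
    """Toy in-memory IPAM: one persistent IP reservation per MAC."""
    def __init__(self, subnet_hosts):
        self.free = list(subnet_hosts)   # available addresses
        self.by_mac = {}                 # mac -> reserved ip

    def allocate(self, mac):
        if mac not in self.by_mac:       # idempotent: keep an existing lease
            self.by_mac[mac] = self.free.pop(0)
        return self.by_mac[mac]

    def release(self, mac):
        ip = self.by_mac.pop(mac, None)
        if ip is not None:
            self.free.insert(0, ip)

def nic_join_vnet(ipam, bridge, mac):
    """NIC added to a DHCP-enabled VNet: reserve an IP, refresh DHCP config."""
    ip = ipam.allocate(mac)
    regenerate_dhcp_config(ipam)
    return ip

def nic_leave_vnet(ipam, bridge, mac):
    """NIC removed: release the reservation, refresh DHCP config."""
    ipam.release(mac)
    regenerate_dhcp_config(ipam)

def regenerate_dhcp_config(ipam):
    # placeholder: in PVE this would rewrite the per-node dnsmasq
    # host mapping and reload the service
    pass
```

A bridge re-assignment then simply becomes nic_leave_vnet followed by nic_join_vnet, and calling nic_join_vnet again for an already-registered MAC is harmless.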

"DERUMIER, Alexandre" <alexandre.derumier at groupe-cyllene.com> writes:

> Hi Stefan (Lendl),
> 
> I totally agree with you, we should have persistent reservations
> at VM create/NIC plug, NIC delete, VM delete.
> 
> At least for my usage, with multiple clusters in different
> datacenters, I really can't call the IPAM API at each start (for
> scalability, or for resilience if the IPAM is down)
> 
> 
> This also allows us to simply write reservations into the dnsmasq file
> without any need to restart it. (AFAIK, OpenStack uses dnsmasq like this too)
> 
> 
> I'm not sure that truly dynamic ephemeral IPs, changing at each VM
> stop/start, are interesting for server VM usage. (Maybe for desktop
> VMs where you share a small pool of IPs, but I personally don't know
> of any users running Proxmox VE like this.)
> 
> 
> See my proposal here (it handles both ephemeral && reserved IPs, but
> it's even easier with only reserved):
> 
> https://lists.proxmox.com/pipermail/pve-devel/2023-September/059169.html
> 
> 
> 
> 
> "
> I think we could implement ipam call like:
> 
> 
> create vm or add a new nic
> -----------------------------
> qm create ... -net0 bridge=vnet,....,ip=(auto|192.168.0.1|dynamic),ip6=(..)
> 
> 
> auto: search for a free IP in the IPAM and write the IP address in the
> net0 ip field.
> 
> 192.168.0.1: check if the IP is free in the IPAM && register it in the
> IPAM. Write the IP in the ip field.
> 
> 
> dynamic: write "ephemeral" in net0: ....,ip=ephemeral (this is a
> dynamic IP, registered at VM start and released at VM stop)
> 
> 
> 
> vm start
> ---------
> - if ip=ephemeral, find && register a free IP in the IPAM and write it
> in the VM net0: ...,ip=192.168.0.10[E] (maybe with a special flag [E]
> to indicate it's ephemeral)
> - read the IP from the VM config && inject it into the DHCP config
> 
> 
> vm_stop
> -------
> if the IP is ephemeral (netX: ip=192.168.0.10[E]), delete the IP from
> the IPAM and set ip=ephemeral in the VM config
> 
> 
> vm_destroy or nic remove/unplug
> -------------------------
> if netX: ...,ip=192.168.0.10, remove the IP from the IPAM
> 
> 
> 
> nic update when vm is running:
> ------------------------------
> if an IP is defined (netX: ip=192.168.0.10), we don't allow bridge or
> IP changes, as the VM is not notified about these changes and would
> still use the old IP.
> 
> We can allow NIC hot-unplug && hotplug. (The guest OS will remove the
> IP on NIC removal, and will call DHCP again on NIC hotplug.)
> 
> 
> 
> 
> nic hotplug with ip=auto:
> -------------------------
> 
> --> add the NIC in pending state --> find an IP in the IPAM && write
> it in pending --> do the hotplug in QEMU.
> 
> We need to handle the config revert to remove the IP from the IPAM if
> the NIC hotplug stays blocked in pending state. (I have only ever seen
> this case when the OS doesn't have the pci_hotplug module loaded, but
> it's better to be careful.)
> 
> "
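The three ip= modes in the quoted proposal could be dispatched roughly as follows (a hedged Python sketch; resolve_nic_ip and the two callbacks are hypothetical names, while the auto/dynamic/literal semantics come from the proposal above):

```python
import ipaddress

def resolve_nic_ip(ip_opt, ipam_allocate, ipam_reserve):
    """Resolve a netX ip= option at VM create / NIC add time.

    ip_opt:        'auto', 'dynamic', or a literal address like '192.168.0.1'
    ipam_allocate: callable returning a free IP found in the IPAM
    ipam_reserve:  callable(ip) reserving a specific IP, raising if taken

    Returns the value to write back into the netX config entry.
    """
    if ip_opt == "auto":
        # search a free IP in the IPAM and persist it in the config
        return ipam_allocate()
    if ip_opt == "dynamic":
        # defer allocation to VM start: store only the 'ephemeral' marker
        return "ephemeral"
    # literal address: validate it, then check/register it in the IPAM
    ip = str(ipaddress.ip_address(ip_opt))
    ipam_reserve(ip)
    return ip
```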
> 
> 
> > > I am currently working on the SDN feature.  This is an initial
> > > review
> > > of
> > > the patch series and I am trying to make a strong case against
> > > ephemeral
> > > DHCP IP reservation.
> > > 
> > > The current state of the patch series invokes the IPAM on every
> > > VM/CT
> > > start/stop to add or remove the IP from the IPAM.
> > > This triggers the dnsmasq config generation on the specific host
> > > with
> > > only the MAC/IP mapping of that particular host.
> 
> 
> 
> 
> 
> From reading the discussion of the v1 patch series I understand this
> approach tries to implement the ephemeral IP reservation strategy.
> From off-list conversations with Stefan Hanreich, I agree that having
> ephemeral IP reservation coordinated by the IPAM requires us to
> re-implement DHCP functionality in the IPAM and rely heavily on
> syncing between the different services.
> 
> To maintain reliable sync we need to hook into many different places
> where the IPAM needs to be queried. Any issues with the implementation
> may lead to the IPAM and the local DHCP config state running out of
> sync, causing network issues such as duplicate IPs.
> 
> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Taking a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations. Even
> starting two VMs in parallel will be serialized by this central lock.
> At boot, trying to start many VMs (ideally as many in parallel as
> possible) is limited by the central IPAM lock even further.
> 
> I argue that we should not support ephemeral IPs at all.
> The alternative is to make all IPAM reservations persistent.
> 
> Using persistent IPs only reduces the interactions of VMs/CTs with the
> IPAM to a minimum: a NIC joining a subnet and a NIC leaving a subnet.
> I am deliberately not referring to VMs, because a VM may be part of
> multiple VNets, or even appear multiple times in the same VNet
> (regardless of whether that is sensible).
> 
> Cases the IPAM needs to be involved:
> 
> - NIC with DHCP enabled VNet is added to VM config
> - NIC with DHCP enabled VNet is removed from VM config
> - NIC is assigned to another Bridge
>   can be treated as individual leave + join events
> 
> Cases that are explicitly not covered but may be added if desired:
> 
> - Manually assigning an IP address on a NIC:
>   will not be automatically visible in the IPAM
> - Manually changing the MAC on a NIC:
>   don't do that; you are on your own.
>   Not handled; change it in the IPAM manually.
> 
> Once an IP is reserved via the IPAM, the dnsmasq config can be
> generated statelessly and idempotently from the PVE IPAM, and it is
> identical on all nodes regardless of whether a VM/CT actually resides
> on that node, or whether it is running or stopped. This is especially
> useful for VM migration because the IP stays consistent without any
> special considerations.
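As a sketch of why this generation is stateless and idempotent: the host mapping is a pure function of the IPAM state, so every node produces byte-identical output for the same reservations (illustrative Python; the real plugin code is Perl and reads /etc/pve/priv/ipam.db):

```python
def render_ethers(ipam_db):
    """Render a dnsmasq-style MAC,IP host mapping from IPAM state.

    ipam_db is a dict of {mac: ip}. Sorting makes the output
    deterministic, so regenerating the file on any node, at any time,
    yields the same bytes for the same IPAM state.
    """
    return "".join(f"{mac},{ip}\n" for mac, ip in sorted(ipam_db.items()))
```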
> 
> Snapshot/revert, backup/restore and suspend/hibernate/resume are
> automatically covered, because the IP will already be reserved for
> that MAC.
> 
> If the admin wants to change the IP of a VM, this can be done via the
> IPAM API/UI, which will have to be implemented separately.
> 
> A limitation of this approach vs. dynamic IP reservation is that the
> IP range on the subnet needs to be large enough to hold the IPs of all
> VMs in that subnet, even stopped ones. This is in contrast to default
> DHCP behavior, where only the number of actively running VMs is
> limited. It should be enough to mention this in the docs.
> 
> I will further review the code and try to implement the
> aforementioned approach.
> 
> Best regards,
> Stefan Lendl
> 
> Stefan Hanreich <s.hanreich at proxmox.com> writes:
> 
> > This is a WIP patch series, since I will be gone for 3 weeks and
> > wanted to
> > share my current progress with the DHCP support for SDN.
> > 
> > This patch series adds support for automatically deploying dnsmasq
> > as a DHCP server to a simple SDN Zone.
> > 
> > While certainly not 100% polished on some ends (looking at
> > restarting systemd services in particular), the general idea behind
> > the mechanism shows. I wanted to gather some feedback on how I
> > approached designing the plugins and the config regeneration process
> > before committing to this design by creating an API and UI around it.
> > 
> > You need to install dnsmasq (and disable it afterwards):
> > 
> >   apt install dnsmasq && systemctl disable --now dnsmasq
> > 
> > 
> > You can use the following example configuration for deploying a
> > DHCP server in an SDN subnet:
> > 
> > /etc/pve/sdn/dhcp.cfg:
> > 
> >   dnsmasq: nat
> > 
> > 
> > /etc/pve/sdn/zones.cfg:
> > 
> >   simple: DHCPNAT
> >           ipam pve
> > 
> > 
> > /etc/pve/sdn/vnets.cfg:
> > 
> >   vnet: dhcpnat
> >           zone DHCPNAT
> > 
> > 
> > /etc/pve/sdn/subnets.cfg:
> > 
> >   subnet: DHCPNAT-10.1.0.0-16
> >           vnet dhcpnat
> >           dhcp-dns-server 10.1.0.1
> >           dhcp-range server=nat,start-address=10.1.0.100,end-address=10.1.0.200
> >           gateway 10.1.0.1
> >           snat 1
> > 
> > 
> > Then apply the SDN configuration:
> > 
> >   pvesh set /cluster/sdn
> > 
> > You need to apply the SDN configuration once after adding the
> > dhcp-range lines to the configuration, since the running
> > configuration is used for managing DHCP. It will not work otherwise!
> > 
> > For testing, it can be helpful to monitor the following files (e.g.
> > with watch) to find out what is happening:
> >   * /etc/dnsmasq.d/<dhcp_id>/ethers (on each node)
> >   * /etc/pve/priv/ipam.db
> > 
> > Changes from v1 -> v2:
> >   * added hooks for handling DHCP when starting / stopping / .. VMs
> >     and CTs
> >   * get an IP from IPAM and register that IP in the DHCP server
> >     (pve only for now)
> >   * remove lease-time, since it is now infinite and managed by the
> >     VM lifecycle
> >   * add hooks for setting & deleting DHCP mappings to DHCP plugins
> >   * modified interface of the abstract class to reflect new
> >     requirements
> >   * added helpers in existing SDN classes
> >   * simplified DHCP configuration settings
> > 
> > 
> > 
> > pve-cluster:
> > 
> > Stefan Hanreich (1):
> >   cluster files: add dhcp.cfg
> > 
> >  src/PVE/Cluster.pm  | 1 +
> >  src/pmxcfs/status.c | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > 
> > pve-network:
> > 
> > Stefan Hanreich (6):
> >   subnets: vnets: preparations for DHCP plugins
> >   dhcp: add abstract class for DHCP plugins
> >   dhcp: subnet: add DHCP options to subnet configuration
> >   dhcp: add DHCP plugin for dnsmasq
> >   ipam: Add helper methods for DHCP to PVE IPAM
> >   dhcp: regenerate config for DHCP servers on reload
> > 
> >  debian/control                         |   1 +
> >  src/PVE/Network/SDN.pm                 |  11 +-
> >  src/PVE/Network/SDN/Dhcp.pm            | 192 +++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Dnsmasq.pm    | 186 ++++++++++++++++++++++++
> >  src/PVE/Network/SDN/Dhcp/Makefile      |   8 ++
> >  src/PVE/Network/SDN/Dhcp/Plugin.pm     |  83 +++++++++++
> >  src/PVE/Network/SDN/Ipams/PVEPlugin.pm |  64 +++++++++
> >  src/PVE/Network/SDN/Makefile           |   3 +-
> >  src/PVE/Network/SDN/SubnetPlugin.pm    |  32 +++++
> >  src/PVE/Network/SDN/Subnets.pm         |  43 ++++--
> >  src/PVE/Network/SDN/Vnets.pm           |  27 ++--
> >  11 files changed, 622 insertions(+), 28 deletions(-)
> >  create mode 100644 src/PVE/Network/SDN/Dhcp.pm
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Dnsmasq.pm
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Makefile
> >  create mode 100644 src/PVE/Network/SDN/Dhcp/Plugin.pm
> > 
> > 
> > pve-manager:
> > 
> > Stefan Hanreich (1):
> >   sdn: regenerate DHCP config on reload
> > 
> >  PVE/API2/Network.pm | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > 
> > qemu-server:
> > 
> > Stefan Hanreich (1):
> >   sdn: dhcp: add DHCP setup to vm-network-scripts
> > 
> >  PVE/QemuServer.pm                 | 14 ++++++++++++++
> >  vm-network-scripts/pve-bridge     |  3 +++
> >  vm-network-scripts/pve-bridgedown | 19 +++++++++++++++++++
> >  3 files changed, 36 insertions(+)
> > 
> > 
> > pve-container:
> > 
> > Stefan Hanreich (1):
> >   sdn: dhcp: setup DHCP mappings in LXC hooks
> > 
> >  src/PVE/LXC.pm            | 10 ++++++++++
> >  src/lxc-pve-poststop-hook |  1 +
> >  src/lxc-pve-prestart-hook |  9 +++++++++
> >  3 files changed, 20 insertions(+)
> > 
> > 
> > Summary over all repositories:
> >   20 files changed, 681 insertions(+), 28 deletions(-)
> > 
> > --
> > murpp v0.4.0
> > 
> > 
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel at lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 



