[pve-devel] [WIP v2 cluster/network/manager/qemu-server/container 00/10] Add support for DHCP servers to SDN

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Oct 27 09:39:06 CEST 2023

On 23/10/2023 at 14:40, Stefan Lendl wrote:
> I am currently working on the SDN feature.  This is an initial review of
> the patch series and I am trying to make a strong case against ephemeral
> DHCP IP reservation.

Stefan Hanreich's reply to the cover letter already mentions upserts; those
would avoid basically all of these problems while still allowing for some dynamic changes.
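To make the upsert idea concrete, here is a minimal sketch of one possible reading of it (the exact semantics of the proposal are not spelled out here): when a vNIC comes up, it either gets its existing reservation back or a new one is created. The `Ipam` class, its in-memory dict and the first-free allocation are hypothetical illustrations, not the PVE IPAM API; one instance stands for one subnet.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network


@dataclass
class Ipam:
    """Hypothetical in-memory IPAM for ONE subnet: maps MAC -> reserved IP."""

    subnet: IPv4Network
    reservations: dict[str, IPv4Address] = field(default_factory=dict)

    def _first_free(self) -> IPv4Address:
        # Simplistic allocation: first host address not yet reserved.
        used = set(self.reservations.values())
        for ip in self.subnet.hosts():
            if ip not in used:
                return ip
        raise RuntimeError(f"subnet {self.subnet} exhausted")

    def upsert(self, mac: str) -> IPv4Address:
        # Insert a reservation if this MAC has none yet, otherwise return
        # the existing one unchanged, so repeated starts are idempotent.
        key = mac.lower()
        if key not in self.reservations:
            self.reservations[key] = self._first_free()
        return self.reservations[key]
```

Calling `upsert` on every start is then harmless: the first call creates the entry, later calls (restart, migration, restore of the same MAC) just return it.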

> The current state of the patch series invokes the IPAM on every VM/CT
> start/stop to add or remove the IP from the IPAM.
> This triggers the dnsmasq config generation on the specific host with
> only the MAC/IP mapping of that particular host.
> From reading the discussion of the v1 patch series I understand this
> approach tries to implement the ephemeral IP reservation strategy. From
> off-list conversations with Stefan Hanreich, I agree that having
> ephemeral IP reservation coordinated by the IPAM requires us to
> re-implement DHCP functionality in the IPAM and heavily rely on syncing
> between the different services.
> To maintain reliable sync we need to hook into many different places
> where the IPAM needs to be queried.  Any issues with the implementation
> may lead to IPAM and DHCP local config state running out of sync, causing
> network issues such as duplicate IPs.

The same is true for permanent reservations: wherever that reservation is
saved, it needs to be kept in sync with the IPAM, e.g., also on backup restore
(into a new environment), if subnets change their configured CIDRs, ...

> Furthermore, every interaction with the IPAM requires a cluster-wide
> lock on the IPAM. Having a central cluster-wide lock on every VM
> start/stop/migrate will significantly limit parallel operations.  Even
> starting two VMs in parallel will be limited by this central lock. At
> boot, trying to start many VMs (ideally as many in parallel as possible)
> is limited by the central IPAM lock even further.

Cluster-wide locks are relatively cheap, especially if one avoids a long
critical section, i.e., query the IPAM while still unlocked, then read and
update the state while holding the lock; if the newly received IP is already
in there, simply release the lock again and repeat.
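The short-critical-section pattern described above can be sketched as an optimistic retry loop. This is a minimal sketch under stated assumptions: a process-local `threading.Lock` stands in for the cluster-wide lock and a plain dict (MAC -> IP) for the shared IPAM state; `propose_ip` and `reserve` are hypothetical names.

```python
import threading
from ipaddress import IPv4Address, IPv4Network
from typing import Optional

# Hypothetical stand-ins: a process-local lock for the cluster-wide lock and
# a dict (MAC -> IP) for the shared IPAM state.
cluster_lock = threading.Lock()
ipam_state: dict[str, IPv4Address] = {}


def propose_ip(subnet: IPv4Network) -> Optional[IPv4Address]:
    """Pick a candidate WITHOUT the lock, from a possibly stale snapshot."""
    taken = set(dict(ipam_state).values())
    return next((ip for ip in subnet.hosts() if ip not in taken), None)


def reserve(mac: str, subnet: IPv4Network, max_tries: int = 10) -> IPv4Address:
    for _ in range(max_tries):
        candidate = propose_ip(subnet)  # potentially slow part, unlocked
        with cluster_lock:  # short critical section: re-read, check, update
            if mac in ipam_state:
                return ipam_state[mac]
            if candidate is None:
                raise RuntimeError(f"subnet {subnet} exhausted")
            if candidate not in ipam_state.values():
                ipam_state[mac] = candidate
                return candidate
        # Candidate was taken concurrently: the lock is already released, retry.
    raise RuntimeError("could not reserve an IP, too much contention")
```

Only the final check and write happen under the lock, so concurrent starts serialize for a few dict operations rather than for the whole IPAM query.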

We also have a cluster-wide lock for starting HA guests, to set the
wanted ha-resource state, and that is no issue at all: you can start/stop
orders of magnitude more VMs than any HW/storage could cope with.

> I argue that we shall not support ephemeral IPs altogether.
> The alternative is to make all IPAM reservations persistent.

> Using persistent IPs only reduces the interactions of VM/CTs with the
> IPAM to a minimum of NIC joining a subnet and NIC leaving a subnet. I am
> deliberately not referring to VMs because a VM may be part of multiple
> VNets or even multiple times in the same VNet (regardless if that is
> sensible).

Yeah, vNICs / veths are the better terms here; guests are
only indirectly relevant.

> Cases where the IPAM needs to be involved:
> - NIC with DHCP enabled VNet is added to VM config
> - NIC with DHCP enabled VNet is removed from VM config
> - NIC is assigned to another Bridge
>   can be treated as individual leave + join events


- subnet config is changed
- vNIC changes from SDN-DHCP managed to manual, or vice versa,
  although that can almost be treated like a vNet leave/join
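The per-vNIC cases above reduce to leave/join events, which can be sketched as a diff of a guest's old and new vNIC entries. This is a minimal sketch assuming a hypothetical `VNic` record (MAC, vNet, and a flag for whether SDN-DHCP manages it); the names are illustrative, not qemu-server/pve-container config keys.

```python
from typing import NamedTuple


class VNic(NamedTuple):
    mac: str
    vnet: str  # bridge / SDN vNet the vNIC is attached to
    sdn_dhcp: bool  # hypothetical flag: is this vNIC SDN-DHCP managed?


def ipam_events(old: dict[str, VNic], new: dict[str, VNic]) -> list[tuple[str, str, str]]:
    """Diff vNIC entries (e.g. "net0" -> VNic) into (action, vnet, mac) tuples.

    Any change of an entry becomes a leave of the old state followed by a
    join of the new one; vNICs not managed by SDN-DHCP produce no events.
    """
    events = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before == after:
            continue
        if before is not None and before.sdn_dhcp:
            events.append(("leave", before.vnet, before.mac))
        if after is not None and after.sdn_dhcp:
            events.append(("join", after.vnet, after.mac))
    return events
```

A bridge change, a MAC change (e.g. on clone) and a switch between SDN-DHCP managed and manual all fall out as leave + join. Subnet config changes are not per-vNIC and would need a separate pass over all reservations.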

> Cases that are explicitly not covered but may be added if desired:
> - Manually assign an IP address on a NIC
>   will not be automatically visible in the IPAM

This sounds like you want to save the state in the VM config, which I'm
rather skeptical about and would try hard to avoid. We would also need
to distinguish between bridges that are part of DHCP-managed SDN and others,
as otherwise a user could set some IP but nothing would happen.

> - Manually change the MAC on a NIC
>   don't do that > you are on your own.

FWIW, a clone is such a change, and we have to support that; otherwise
the MAC field would need some warning hints or even have to become
read-only in the UI.

>   Not handled > change in IPAM manually
> Once an IP is reserved via IPAM, the dnsmasq config can be generated
> stateless and idempotent from the pve IPAM and is identical on all nodes
> regardless if a VM/CT actually resides on that node or is running or
> stopped.  This is especially useful for VM migration because the IP
> stays consistent without special consideration.

That should be orthogonal to the feature set, if we have all the info
saved somewhere else.
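Generating the dnsmasq part statelessly and idempotently mostly comes down to a deterministic rendering of whatever store holds the reservations. A minimal sketch, assuming the reservations are available as a MAC -> IP mapping; `dhcp-host=<mac>,<ip>` is regular dnsmasq syntax, while the function name and how the result is written out and reloaded are assumptions.

```python
from ipaddress import IPv4Address


def render_dnsmasq_hosts(reservations: dict[str, IPv4Address]) -> str:
    """Render dnsmasq dhcp-host lines from IPAM reservations (MAC -> IP).

    MACs are normalized to lowercase and the lines are sorted, so the output
    is byte-identical on every node for the same reservations, regardless of
    dict insertion order or which guests happen to run locally.
    """
    lines = [
        f"dhcp-host={mac.lower()},{ip}"
        for mac, ip in sorted(reservations.items(), key=lambda kv: kv[0].lower())
    ]
    return "\n".join(lines) + "\n"
```

Because the output only depends on the input, a node can regenerate the file at any time and skip the dnsmasq reload when nothing changed.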

But this also speaks against having it in the VM config, as that would
mean that every node needs to parse every guest's config periodically,
which is way worse than some cluster lock and breaks our basic
axiom that guests are owned by their current node, and only by that,
and a node should not really alter behavior dependent on some "foreign"

> Snapshot/revert, backup/restore, suspend/hibernate/resume cases are
> automatically covered because the IP will already be reserved for that
> MAC.

Not really: restoring to another setup is broken, and one could resume a
VM after the CIDRs of its subnet have changed, which breaks that too, ...
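The restore and CIDR-change cases boil down to reservations that no longer fit the current subnet. A minimal sketch of detecting them, assuming reservations as a MAC -> IP mapping; the function name is hypothetical, and treating the network and broadcast addresses as unusable is a simplification that does not hold for /31 and /32 subnets.

```python
from ipaddress import IPv4Address, IPv4Network


def stale_reservations(
    reservations: dict[str, IPv4Address], subnet: IPv4Network
) -> dict[str, IPv4Address]:
    """Return the reservations (MAC -> IP) that are no longer valid host
    addresses of `subnet`, e.g. after its CIDR changed or after a restore
    into an environment with a different subnet layout."""
    reserved_edges = (subnet.network_address, subnet.broadcast_address)
    return {
        mac: ip
        for mac, ip in reservations.items()
        if ip not in subnet or ip in reserved_edges
    }
```

Whether such entries are then re-allocated automatically or flagged to the admin is a policy question this check does not answer.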

> If the admin wants to change the IP of a VM, this can be done via the
> IPAM API/UI, which will have to be implemented separately.

Providing overrides can be fine, but IMO all of that should still live in
the SDN state, not in a per-VM one, and ideally use a common API.

> A limitation of this approach vs dynamic IP reservation is that the IP
> range on the subnet needs to be large enough to hold all IPs of all,
> even stopped, VMs in that subnet. This is in contrast to default DHCP
> functionality where only the number of actively running VMs is limited.
> It should be enough to mention this in the docs.

In production setups it should not matter _that_ much, but it might
be a bit of a PITA if one has a few "archived" VMs or the like, but
that alone would
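The sizing limitation from the quoted paragraph is simple arithmetic: with persistent reservations, the inclusive DHCP range has to hold one address per vNIC in the vNet, whether its guest is running or not. A minimal sketch with hypothetical function names:

```python
from ipaddress import IPv4Address


def range_size(start: IPv4Address, end: IPv4Address) -> int:
    """Number of addresses in an inclusive DHCP range start..end."""
    if end < start:
        raise ValueError("range end lies before range start")
    return int(end) - int(start) + 1


def fits(start: IPv4Address, end: IPv4Address, vnics_total: int) -> bool:
    """With persistent reservations every vNIC in the vNet needs an address,
    including those of stopped or "archived" guests."""
    return vnics_total <= range_size(start, end)
```

For example, a range 10.0.0.100 to 10.0.0.199 holds 100 addresses, so a 101st vNIC in that vNet would not get a reservation even if most of the guests are stopped.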

> I will further review the code and try to implement the aforementioned
> approach.

You can naturally experiment, but I'd also try the upsert proposal from
Stefan H., as IMO that sounds like a good balance.
