[pve-devel] avoiding VMID reuse

Thu Apr 12 14:41:44 CEST 2018

On Thu, Apr 12 2018 14:26:53 +0200, Fabian Grünbichler wrote:
> > Sure, it's not a guarantee (because it isn't an error to use an unused
> > ID less than nextid -- it would be easy to convert the warning to an
> > error though). But we don't especially need it to be a guarantee, we
> > just want casual web interface use to not end us up in a situation where
> > backups break or data is lost, so it's enough to just fix the suggestion
> > made by the web interface (which is what /cluster/nextid does
> > precisely).
> 
> but it does change the semantics and introduces a new class of problem
> (the guest ID cannot get arbitrarily high, and you only go up and never
> back down). reusing "holes" avoids this altogether.

I consider this an unlikely problem, and a much smaller one than
problems that can arise from not having unique identifiers. So yes, with
these patches we are trading the problem class "it is not possible to
uniquely identify virtual machines over time" for "IDs may get high and
the ID namespace may have holes".

I don't consider it an especially bad thing that holes are possible.
There's precedent in other systems, for example PostgreSQL uses the
"serial" type for auto-incrementing numeric ID fields, and holes appear
there in the same way when rows are deleted. Yes, it limits the
"lifetime" of IDs to 2^31 values, but that *is* a lot.
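
To make the difference between the two behaviours concrete, here is a
minimal sketch in Python. It is for illustration only and is not the
code from the patches: the function names are hypothetical, and
"stored_nextid" merely stands in for the value kept in the nextid file.
The bounds are the API range quoted in this thread.

```python
# Illustrative only: hypothetical helpers, not the PVE implementation.
MIN_VMID = 100          # lower API bound quoted in this thread
MAX_VMID = 999_999_999  # upper API bound quoted in this thread


def next_id_reusing_holes(used_ids):
    """Suggest the lowest free ID, so a deleted guest's ID comes back."""
    used = set(used_ids)
    candidate = MIN_VMID
    while candidate in used:
        candidate += 1
    if candidate > MAX_VMID:
        raise RuntimeError("no free VMID left")
    return candidate


def next_id_monotonic(used_ids, stored_nextid):
    """Suggest an ID that never drops below a persisted counter."""
    used = set(used_ids)
    candidate = max(stored_nextid, MIN_VMID)
    while candidate in used:
        candidate += 1
    if candidate > MAX_VMID:
        raise RuntimeError("no free VMID left")
    return candidate
```

With guests 100, 101 and 102 created and 101 deleted afterwards, the
first function suggests 101 again, while the second one (with a stored
counter of 103) suggests 103 and leaves the hole at 101 alone.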

> > > > another approach would be to adapt your snapshot/sync scripts to remove
> > > > sync targets if the source gets removed, or do a forceful full sync if
> > > > an ID gets re-used. the latter is how PVE's builtin ZFS replication
> > > > works if it fails to find a snapshot combination that allows incremental
> > > > sending.
> > 
> > That sounds super dangerous. If I delete a VM and then someone creates a
> > new one that now gets the same ID, I also lose all backups of my deleted
> > VM!
> 
> replication != backup. replication in PVE is for fast disaster recovery.
> when you delete the source, the replicated copies also get deleted.

Sure, but I'm specifically talking about backups - just pointing out
that your advice does not apply there.

> > > > I am a bit hesitant to introduce such special case heuristics,
> > > > especially since we don't know if anybody relies on the current
> > > > semantics of /cluster/nextid
> > > 
> > > that point still stands though ;)
> > 
> > I didn't make this configurable, because I don't really see how someone
> > could be relying on id's getting reused (unless there's an upper limit
> > to id numbers that could be argued to be reachable).
> 
> guest IDs are unsigned ints (32 bit) internally. the API limits that
> further to the range of 100-999999999. while that might seem like a lot,
> with your proposed change a user just needs to "allocate" the maximum ID
> to break your scheme (intentionally or otherwise).

Sure, you could allocate the maximum ID right away and then the
suggestion in the web UI would break. But that is fixable by just
editing the nextid file, and in at least our deployment we don't really
worry that users creating VMs want to make each other's life more
difficult. I could change it so that you are not allowed to use an ID
that is more than some fixed amount larger than the current highest one
if you think it's a problem though.
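
Such a limit could look roughly like the following Python sketch. Again
this is only an illustration under assumptions: the function name, the
"max_jump" parameter and its default of 1000 are made up here, and the
bounds are the API range quoted above.

```python
# Illustrative only: a hypothetical guard, not part of the patches.
MIN_VMID = 100          # lower API bound quoted in this thread
MAX_VMID = 999_999_999  # upper API bound quoted in this thread


def check_requested_id(requested, used_ids, max_jump=1000):
    """Reject IDs that are taken, out of range, or too far ahead.

    max_jump is an assumed tunable: how far beyond the current highest
    ID a manually chosen ID may lie.
    """
    used = set(used_ids)
    if not MIN_VMID <= requested <= MAX_VMID:
        raise ValueError(f"VMID {requested} is outside the allowed range")
    if requested in used:
        raise ValueError(f"VMID {requested} is already in use")
    # With no guests yet, treat the highest ID as just below the minimum.
    highest = max(used, default=MIN_VMID - 1)
    if requested > highest + max_jump:
        raise ValueError(
            f"VMID {requested} is more than {max_jump} above the "
            f"current highest ID {highest}"
        )
    return requested
```

With guests 100 and 101 present, asking for 150 would pass, while asking
for 999999999 would be refused, so a single allocation could no longer
exhaust the suggestion range.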

Or would you propose something different instead?