[pve-devel] avoiding VMID reuse

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Apr 12 14:26:53 CEST 2018


On Thu, Apr 12, 2018 at 03:12:50PM +0300, Lauri Tirkkonen wrote:
> On Thu, Apr 12 2018 12:42:51 +0200, Fabian Grünbichler wrote:
> > > - please send patch series as threads (cover letter and each patch as
> > >   separate mail) and configure the subjectprefix accordingly for each
> > >   repository. this allows inline comments on each patch.
> 
> Ok. This particular patchset isn't a series per se, but one patch each
> for four different repositories, so I was wondering how you wanted to
> deal with that. The subject prefix answers that, I suppose, but it
> means four threads for this change instead of just one.

one thread is fine, but each patch should get its own mail.

> 
> > > if I understood you correctly, your intended use case is to prevent
> > > accidentally re-using a guest ID formerly used by a no longer existing
> > > guest, because of backups / replicated and/or leftover disks that
> > > reference this ID?
> 
> Yes, that's correct.
> 
> > > I assume you have some kind of client / script for managing guests,
> > > since we don't call /cluster/nextid in our GUI or anywhere else.
> > 
> > Dominik just pointed out to me that we do in fact use it in the JS code
> > (I only checked the perl code). sorry for the confusion.
> > 
> > > wouldn't the simple solution be to keep track of "already used" guest
> > > IDs in your client? especially if you only have single nodes?
> > > alternatively, you could just not use /cluster/nextid and instead use
> > > your own counter (and increment and retry in case the chosen ID is
> > >   already taken - /cluster/nextid does not guarantee it will still be
> > > available when you want to use it anyway..).
> 
> Sure, it's not a guarantee (because it isn't an error to use an unused
> ID less than nextid -- though it would be easy to convert the warning
> into an error). But we don't strictly need a guarantee: we just want
> casual web interface use not to land us in a situation where backups
> break or data is lost, so it's enough to fix the suggestion made by the
> web interface (which is precisely what /cluster/nextid provides).

but it does change the semantics and introduces a new class of problem:
guest IDs cannot grow arbitrarily high, yet your scheme only ever moves
up and never back down, so the bounded ID space can be exhausted while
almost all IDs remain unused. reusing "holes" avoids this altogether.
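for reference, a rough sketch of what a hole-reusing client-side
allocator could look like (a Python sketch with hypothetical helper
names - client, list_guest_ids, create_guest and the conflict error are
placeholders, not actual PVE API calls; the retry matters because any
chosen ID can be grabbed by a concurrent caller between check and
create):

    # pick the lowest free guest ID in the allowed range; retry on
    # conflict, since the chosen ID can be taken concurrently
    class GuestIdTakenError(Exception):  # placeholder conflict error
        pass

    def create_guest_with_free_id(client, config):
        used = set(client.list_guest_ids())  # placeholder listing call
        for vmid in range(100, 999999999 + 1):
            if vmid in used:
                continue
            try:
                return client.create_guest(vmid, config)
            except GuestIdTakenError:
                used.add(vmid)  # lost the race, try the next hole
        raise RuntimeError("no free guest ID left")

this keeps IDs low and only fails when the range is actually exhausted.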
> 
> > > another approach would be to adapt your snapshot/sync scripts to remove
> > > sync targets if the source gets removed, or do a forceful full sync if
> > > an ID gets re-used. the latter is how PVE's builtin ZFS replication
> > > works if it fails to find a snapshot combination that allows incremental
> > > sending.
> 
> That sounds super dangerous. If I delete a VM and then someone creates a
> new one that now gets the same ID, I also lose all backups of my deleted
> VM!

replication != backup. replication in PVE is for fast disaster recovery.
when you delete the source, the replicated copies also get deleted.
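(for context, that fallback decision roughly looks like the following -
a simplified Python sketch, not the actual PVE replication code; the
snapshot lists and send functions are placeholders, and both lists are
assumed ordered oldest to newest:)

    # simplified sketch: prefer an incremental send based on the newest
    # snapshot both sides share, otherwise fall back to a full sync
    def replicate(source_snaps, target_snaps, send_incremental, send_full):
        common = [s for s in source_snaps if s in target_snaps]
        if common:
            base = common[-1]  # newest shared snapshot
            send_incremental(base, source_snaps[-1])
        else:
            send_full(source_snaps[-1])  # no shared base: full (re)send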

> > > I am a bit hesitant to introduce such special case heuristics,
> > > especially since we don't know if anybody relies on the current
> > > semantics of /cluster/nextid
> > 
> > that point still stands though ;)
> 
> I didn't make this configurable, because I don't really see how someone
> could be relying on IDs getting reused (unless there's an upper limit
> to ID numbers that could be argued to be reachable).

guest IDs are unsigned 32-bit ints internally. the API limits that
further to the range 100-999999999. while that might seem like a lot,
with your proposed change a user just needs to "allocate" the maximum
ID to break your scheme (intentionally or otherwise).
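
a toy illustration of that failure mode (not the actual nextid
implementation - just the "highest used ID plus one" scheme in the
abstract, as Python):

    # one guest at the top of the range exhausts a monotonic scheme,
    # even though almost every ID in 100..999999999 is still free
    MIN_ID, MAX_ID = 100, 999999999

    def next_id_monotonic(used_ids):
        candidate = max(used_ids, default=MIN_ID - 1) + 1
        if candidate > MAX_ID:
            raise RuntimeError("ID space exhausted")
        return candidate

    print(next_id_monotonic({100, 101}))  # 102, as expected
    next_id_monotonic({100, 999999999})   # raises: ID space exhausted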
