[pbs-devel] applied: [RFC PATCH proxmox-backup 2/2] file-restore: dynamically increase memory of vm for zpools

Dominik Csapak d.csapak at proxmox.com
Tue Nov 8 11:07:31 CET 2022


On 11/8/22 10:19, Thomas Lamprecht wrote:
> Am 08/11/2022 um 08:59 schrieb Dominik Csapak:
>> On 11/7/22 18:15, Thomas Lamprecht wrote:
>>> Am 07/11/2022 um 13:35 schrieb Wolfgang Bumiller:
>>>> applied
>>>
>>> meh, can we please get this opt-in and only enable it for root at pam or for users
>>> with some powerful priv on /, as discussed as the chosen approach to allow more memory
>>> the last time this came up (off list IIRC)... I really do *not* want the memory DoS
>>> potential increased by a lot just by opening some file-restore tabs; this actually
>>> should get more restrictions (for common "non powerful" users), not fewer..
>>
>> understandable, so i can do that, but maybe it's time we rethink the file-restore
>> mechanism as a whole, since it's currently rather unergonomic:
> 
> IMO the current approach is very ergonomic as the user doesn't have to care
> at all about details, and I'd like to keep it that way as much as possible.
> While we may need to add some, every extra setting or extra workload for
> users/admins should be kept to a minimum.

understandable

> 
>>
>> * users don't know how many and which file-restore vms are running; they
>> may not even know that a vm is started at all
> 
> that isn't an issue, the user wants file restores, not to care about managing
> their details.

yes, but ... (see my answer below)

> 
>> * regardless of whether my patch is applied, the only thing necessary to start a bunch
>> of vms is VM.Backup on the vm and Datastore.AllocateSpace on the storage
>> (which in turn is probably enough to create an arbitrary number of backups)
>> * with arbitrarily sized disks/filesystems inside, no fixed amount of memory we give
>> will ever be enough
>>
>> so here are some proposals on how to improve that (we won't implement all of them
>> simultaneously, but maybe something from that list is usable)
>> * make the list of running file-restore vms visible, and maybe add a manual 'shutdown'
> 
> the VM lives for a minute and can be spawned in milliseconds, so I'm not really seeing
> the benefit of that; a misguided/bad/... user can spawn such VMs much faster than
> one can stop them, and the syslog already contains basic info.

just to clarify: the current shutdown timeout is 10 minutes (600s) after the
last action (maybe we should reduce that?)

anyway, this was mostly intended as groundwork for the point below:

> 
>> * limit the number of restore vms per user (or per vm?)
>>    - this would need the mechanism from above anyway, since otherwise either the user
>>      cannot start the restore vm or we abort an older vm (with a possibly still
>>      running download operation)
> 
> which mechanism? The list of running VMs is only required in the backend
> and we can already find that info out there.
> 

if we limit this in any way (regardless of whether by memory or count) the user has
a problem when reaching the limit. either we
* throw an error on start, and the user is unable to do the file restore until
   one (invisible) vm vanishes (without a way of knowing when that will be)
* let the kernel kill another vm, or kill one manually
   (imho a no-go, since there might be running operations)

so to 'fix' that we could show some list of running file-restore vms
(then the user has control over it)

your suggestion with the 'finish' button is also good,
but maybe we should generally do that when closing the file-restore window?
because when the user closes it without 'finishing', there is again
no way to know which vm is actually running

(also it does not help when the user closes the tab, browser, etc., and it
also does not do anything for cli users (granted, that's root at pam only))
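
for the daemon side of such a 'finish' signal i'd imagine something roughly like
the following - just a sketch, the names and durations are made up and this is
not the actual restore daemon code:

// sketch only: idle shutdown timer that a "finish" request can shorten
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// unix timestamp (in seconds) at which the restore VM should shut down
static SHUTDOWN_AT: AtomicU64 = AtomicU64::new(u64::MAX);

fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs()
}

// called after every restore operation: push the shutdown out by 10 minutes
fn reset_idle_timer() {
    SHUTDOWN_AT.store(now_secs() + 600, Ordering::SeqCst);
}

// called on a "finish" request from the client: shut down in ~3s, which
// still leaves a small grace period for other in-flight operations
fn request_finish() {
    SHUTDOWN_AT.store(now_secs() + 3, Ordering::SeqCst);
}

fn main() {
    reset_idle_timer();
    // e.g. the "Finish" button was pressed or the restore window was closed
    request_finish();
    loop {
        if now_secs() >= SHUTDOWN_AT.load(Ordering::SeqCst) {
            println!("idle timeout reached, shutting down restore VM");
            break;
        }
        std::thread::sleep(Duration::from_secs(1));
    }
}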

>> * make the vm memory configurable (per user/vm/globally?)
> 
> meh, but it will be one of the better mechanisms together with the priv/unpriv
> split and different limits enforced via a cgroup (see below)
> 
>> * limit the global memory usage for file restore vms
>>    - again needs some control mechanism for stopping/seeing these vms
> 
> Not really, it just needs a cgroup limiting them all. Unpriv user A really
> won't be able to stop the file-restore VM of unpriv user B, so this is not
> really gaining anything - you cannot expect that an admin is around all
> the time.
> 
>> * throw away the automatic starting of vms, and make it explicit, i.e.
> 
> NACK, worsens UX a lot without a real benefit; the VM will still be
> started in the end, and whatever limiting mechanism there is could be applied
> without any dialogue anyway..
> 
>>    make the user start/shutdown a vm manually
>>    - we can have some 'configuration panel' before starting (like with restore)
>>    - the user is aware it's starting
>>    - still needs some mechanism to see them, but with a manual starting api call
>>      it's easier to have e.g. a running worker that can be stopped
>>
> 
> 
> As said weeks ago, the simple stopgap for now is to just opt into more possible
> memory once a user has a sufficient privilege (e.g., VM.Allocate and/or Sys.Modify
> on / as a starter); this does away with the basic woes for now. Sure, a priv
> and unpriv user may get to "open" the same restore VM, and depending on the order
> it will then be able to use more or less resources, but that won't matter much
> in practice, and the same is true for the debug flag.
> 
> As a small addition, with possibly nice effects in practice:
> add a "Finish" button to the file-restore window that actively sends a signal
> to the VM on press, which then reduces the idle shutdown timer to ~3s (to avoid
> breaking other restores that are currently running or still open); that way you
> don't need extra lists and management, as most of the time it happens indirectly
> just fine.
> 
> Additionally we can put all VMs in a cgroup with a max memory configured; that
> really could be a new setting in the node or datacenter.cfg, and it can easily
> be nested, as in:
> 
> - root restore CG with a total limit, restores started by a priv'd user start here
>   '- user A CG with a per-unpriv-user limit
>   '- user B CG with a per-unpriv-user limit
> 
> That way we would not need to limit the number of restores, which we don't care
> about, but the actual resource we do care about. Side benefit: the cgroup could
> get reduced CPU shares so that scheduling prefers the "real" workload.
> 
> Another possibility would be to also evaluate the MicroVM machine type to reduce
> the basic impact of the OS - tbh. I'm not sure if Stefan checked that out.
> 
> And please note that this is all mostly an artificial problem anyway. ZFS isn't
> that popular inside of guests, which are the prime target for file-restore,
> and especially not in the form of a huge set of disks, as redundancy et al. is
> normally handled more centrally on the host; if it is used at all, it's rather
> for snapshots (for which lvm-thin or btrfs may be a better option inside guests,
> as they're in-kernel-tree and one doesn't pay such a (relatively) big memory tax).
> 
> IIRC the issue even only got reported from internal testing of PVE or the like,
> iow. odd setups. So please let's keep this simple and avoid breaking UX for all
> the actual realistic use cases.
> So please, make this opt-in on the PBS side first, then opt-in for the respective
> user class (even just root at pam can be argued for, if really unsure), else I'll
> need to revert this for the next bump.

yes, besides my remarks above this seems like a sensible solution.
just to clarify that i understood you correctly,

i'd do the following now:
* add a flag to file-restore that enables the dynamic memory increase behaviour
  (rough sketch below)
* check the user privs in the pve-storage api and automatically enable that flag
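
on the file-restore side i'd wire that flag up roughly like this - just a sketch
to show the intent; the flag name, helper and numbers are invented and not the
actual code:

// sketch only: opt-in dynamic memory for the restore VM
// (names and numbers are invented for illustration)

const BASE_MEMORY_MIB: u64 = 128; // fixed default for unprivileged users
const PER_ZPOOL_EXTRA_MIB: u64 = 256; // extra memory per detected zpool

// decide how much memory the restore VM gets; the dynamic increase only
// happens when the caller explicitly opted in, i.e. a privileged user,
// as decided by the pve-storage API before invoking file-restore
fn restore_vm_memory_mib(dynamic_memory: bool, zpool_count: u64) -> u64 {
    if dynamic_memory {
        BASE_MEMORY_MIB + zpool_count * PER_ZPOOL_EXTRA_MIB
    } else {
        BASE_MEMORY_MIB
    }
}

fn main() {
    // e.g. "--dynamic-memory" was passed because the caller has the required priv
    let memory = restore_vm_memory_mib(true, 2);
    println!("starting restore VM with {} MiB", memory);
}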

later we can introduce the cgroup mechanism, roughly along the lines of the
sketch below
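
for that i'd imagine something like this (cgroup v2; paths and limits are made
up, only meant to illustrate the nesting you described):

// sketch only: nested cgroup v2 memory limits for file-restore VMs
// (paths and limits are invented for illustration)
use std::fs;
use std::io;
use std::path::PathBuf;

const RESTORE_CG_ROOT: &str = "/sys/fs/cgroup/file-restore";

fn setup_restore_cgroups(
    user: &str,
    total_limit: u64,
    per_user_limit: u64,
) -> io::Result<PathBuf> {
    // root restore cgroup with the total limit; privileged restores go here
    let root = PathBuf::from(RESTORE_CG_ROOT);
    fs::create_dir_all(&root)?;
    fs::write(root.join("memory.max"), total_limit.to_string())?;
    // the memory controller has to be enabled for the children so that the
    // per-user limit below actually takes effect
    fs::write(root.join("cgroup.subtree_control"), "+memory")?;

    // nested per-unprivileged-user cgroup with its own, smaller limit
    let user_cg = root.join(user);
    fs::create_dir_all(&user_cg)?;
    fs::write(user_cg.join("memory.max"), per_user_limit.to_string())?;

    Ok(user_cg)
}

fn main() -> io::Result<()> {
    // e.g. 8 GiB total, 1 GiB per unprivileged user (made-up numbers); the
    // restore VM's PID would then be written to cgroup.procs of the result
    let cg = setup_restore_cgroups("user_a", 8 << 30, 1 << 30)?;
    println!("restore VM for user_a would run in {}", cg.display());
    Ok(())
}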





