[pve-devel] [PATCH pve-guest-common 1/1] add pre/post-snapshot hooks

Fri Nov 18 09:27:57 CET 2022

Am 17.11.22 um 12:27 schrieb Stefan Hanreich:> On 11/14/22 09:51, Fiona
Ebner wrote:
>> Am 22.09.22 um 13:54 schrieb Stefan Hanreich:
>>> Signed-off-by: Stefan Hanreich <s.hanreich at proxmox.com>
>>> ---
>>>
>>
>> Should there be a third hook that's called when the snapshot fails? That
>> would allow doing cleanup in all cases. Could still be added later when
>> actually requested by users of course.
>>
>> What are the common use cases you have in mind for these hooks?
>>
> 
> Quoting from the user in #2530:
> Imagine hooks for pre- and post-snapshot too, it will helps with making
> consistent backups, clones, replication greatly, reduces recovery time
> after firing HA or backup restore… If one will be able to ensure that
> your databases and other stuff oflloads data correctly before snapshoting.
> 
> Apparently they want to use it to force DBs and similar applications to
> flush all their writes and maybe block new writes in the meanwhile.
> Adding a failed-snapshot hook might be a good idea then, because in
> those cases one would still have to clean up some stuff that has been
> done in the pre-snapshot hook...

Makes sense. Would be nice have a "partially close #2530" prefix in the
commit title ;) and maybe also briefly describe the use case in the
commit message.

So the proposed use-case is not about editing configs, but I'm sure some
users are going to want that too.

> 
>>> I have opted to include the snapshot hooks in the abstract class,
>>> since this
>>> seemed like the more straightforward way.
>>> The other option would have been duplicating the calls of the hooks
>>> into the
>>> respective Backends, but I couldn't see any upsides to this.
>>>
>>> This hook runs before the preparation steps, since otherwise in case
>>> of a
>>> failing pre-snapshot hook the VM/CT is left in a locked state, which
>>> I would
>>> need to clean up, which would add unnecessary complexity imo.
>>>
>>> Otoh, there is a case to be made that the hook should only run after
>>> we checked
>>> every precondition and are as certain as we can be that the snapshot
>>> will be
>>> successfully created. This would be more convenient from a user's pov.
>>> This can be particularly convenient as it would avoid errors with
>>> user scripts
>>> that are not idempotent. Although those would still fail in case of a
>>> failure
>>> during the snapshotting process.
>>
>> Calling the hook script only after setting the lock in the config would
>> add protection against other modifications happening at the same time.
>> Stupid example: if we add another such hook outside a lock, and both
>> that and the 'pre-snapshot' hook would modify the config, they could
>> interfere when happening right after another. But it doesn't need to be
>> another hook script, the config modification could also come from
>> somewhere else.
>>
>> Although this approach does requires users to pass the --skiplock option
>> to modify the config if using our API/CLI. And we need to repeat the
>> checks after calling the hook script, because they might not apply any
>> more ;)
>>
> 
> Yes, from this POV it's probably a smarter idea to do it this way. Not
> sure if we can trust the users to properly lock the config file in their
> hook scripts when editing the config. Definitely the safer option.
> Especially if you think about the case that the hookscript config option
> could also change sometime inbetween. That could lead to some weird
> behaviour.

Locking in the hookscript wouldn't even help, because you could have:
1. pre-snapshot hook locks config and finishes
2. hook for other operation locks and modifies config and finishes
3. snapshot operation locks config ending up with modified config

Well, I'd say if users change the hookscript option during the
hookscript, they are responsible for the consequences :P

> 
> I can actually see some use-cases for editing the config with regards to
> PCI passthrough. One could script removing / adding PCI-E devices in
> order to be able to take snapshots. Together with the restore hooks, one
> could even try scripting automatically detaching/attaching PCI-E devices
> when snapshotting/restoring. Not sure if this is practical though,
> because of the issues it could cause.
> 

If users *really* want to edit the config, they would just do it
manually. IMHO the need to use skiplock is not that big of a downside
compared to the upside of having protection against modifications from
concurrent operations. But if we go for that approach we should properly
test and document it of course :)

>>>
>>> What do you think would be the better solution?
>>>
>>>   src/PVE/AbstractConfig.pm | 8 +++++++-
>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/PVE/AbstractConfig.pm b/src/PVE/AbstractConfig.pm
>>> index a0c0bc6..8052fde 100644
>>> --- a/src/PVE/AbstractConfig.pm
>>> +++ b/src/PVE/AbstractConfig.pm
>>> @@ -799,11 +799,15 @@ sub __snapshot_activate_storages {
>>>   sub snapshot_create {
>>>       my ($class, $vmid, $snapname, $save_vmstate, $comment) = @_;
>>>   +    my $conf = $class->load_config($vmid);
>>> +    PVE::GuestHelpers::exec_hookscript($conf, $vmid, "pre-snapshot",
>>> 1);
>>> +
>>>       my $snap = $class->__snapshot_prepare($vmid, $snapname,
>>> $save_vmstate, $comment);
>>>         $save_vmstate = 0 if !$snap->{vmstate};
>>>   -    my $conf = $class->load_config($vmid);
>>> +    # reload config after changes in snapshot preparation step
>>
>> I think there should be a cfs_update() call?
> 
> Yes, definitely. If I understand correctly load_config uses a cache to
> get already loaded configurations instead of hitting the disk everytime?
> That's why this is required?
> 

Yes.