[pve-devel] [RFC qemu] savevm-async: improve check for blockers

Thomas Lamprecht t.lamprecht at proxmox.com
Fri May 17 17:00:48 CEST 2024


subject might be improved by being less general/ambiguous, something like:

savevm-async: improve coverage by also checking for migration blockers

or 

savevm-async: block snapshot also if migration would fail

or

savevm-async: reuse migration blocker check for snapshots

Would have helped me to have a better initial context for reading this commit
(message).

Am 17/05/2024 um 13:39 schrieb Fiona Ebner:
> Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
> savevm: consult migration blockers"), migration and (async) snapshot
> are essentially the same operation and thus snapshot also needs to
> check for migration blockers. For example, this catches passed-through
> PCI devices, where the driver does not support migration.
> 
> However, the commit notes:
> 
>> There is really no difference between live migration and savevm, except
>> that savevm does not require bdrv_invalidate_cache to be implemented
>> by all disks.  However, it is unlikely that savevm is used with anything
>> except qcow2 disks, so the penalty is small and worth the improvement
>> in catching bad usage of savevm.
> 
> and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
> and would be broken by simply using migration_is_blocked(). To keep
> this working, introduce a new helper that filters blockers with the
> prefix used by the VMDK migration blocker.
> 
> The function qemu_savevm_state_blocked() is called as part of
> migration_is_blocked_allow_prefix() so no check is lost with this
> patch.
> 
> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
> ---
> 
> An alternative would be to mark the VMDK blocker as a
> "live-migration-only" blocker and extending migration_is_blocked() or
> using separate helpers to check for migration and snapshot blockers
> differently. But that requires touching more machinery and probably
> needs more adaptation going forward than the approach here.
> 
>  migration/migration.c    | 22 ++++++++++++++++++++++
>  migration/migration.h    |  1 +
>  migration/savevm-async.c |  7 ++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b8d7e471a4..6235309a00 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1897,6 +1897,28 @@ void qmp_migrate_pause(Error **errp)
>                 "during postcopy-active or postcopy-recover state");
>  }
>  
> +/*
> + * HACK to allow hibernation in Proxmox VE even when VMDK image is present.
> + */
> +bool migration_is_blocked_allow_prefix(Error **errp, const char *prefix)
> +{
> +    GSList *blockers = migration_blockers[migrate_mode()];
> +
> +    if (qemu_savevm_state_blocked(errp)) {
> +        return true;
> +    }
> +
> +    while (blockers) {
> +        if (!g_str_has_prefix(error_get_pretty(blockers->data), prefix)) {
> +            error_propagate(errp, error_copy(blockers->data));
> +            return true;
> +        }
> +        blockers = g_slist_next(blockers);
> +    }
> +
> +    return false;
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>      GSList *blockers = migration_blockers[migrate_mode()];
> diff --git a/migration/migration.h b/migration/migration.h
> index 8045e39c26..575805a8e2 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -484,6 +484,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>                               Error **errp);
>  
>  int migrate_init(MigrationState *s, Error **errp);
> +bool migration_is_blocked_allow_prefix(Error **errp, const char *prefix);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> diff --git a/migration/savevm-async.c b/migration/savevm-async.c
> index bf36fc06d2..33085446e1 100644
> --- a/migration/savevm-async.c
> +++ b/migration/savevm-async.c
> @@ -363,7 +363,12 @@ void qmp_savevm_start(const char *statefile, Error **errp)
>          return;
>      }
>  
> -    if (qemu_savevm_state_blocked(errp)) {
> +    /*
> +     * The limitation for VMDK images only applies to live-migration, not
> +     * snapshots, see commit 5aaac46793 ("migration: savevm: consult migration
> +     * blockers").
> +     */
> +    if (migration_is_blocked_allow_prefix(errp, "The vmdk format used by node")) {

meh, not a big fan of matching strings here, especially as that is not
stable ABI, I mean I did not check, but I would be surprised if that's
the case – maybe we could factor out that string here and when its added
as blocker into a common constant so that we'd notice if it changes.

And if we only uses this here then why add a generic "ignore one specific
blocker" helper, might be better to at least contain that detail in a
"qemu_savevm_async_state_blocked" one that takes only the `errp` as
parameter, as hacks should IMO always be quite specific to avoid the
spread of them (I know you would check in detail before doing so, but
not everybody does).

>          return;
>      }
>  





More information about the pve-devel mailing list