[pve-devel] [RFC qemu 7/7] block/backup: set callback for cbw errors

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Jul 5 11:37:36 CEST 2024


Quoting Fiona Ebner (2024-06-10 14:59:42)
> The callback is invoked when cbw is configured not to break the guest
> write, and it aborts the backup job immediately. Currently, the backup
> has to wait for the rest of the block copy operation to finish before
> checking the cbw error state.
> 
> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
> ---
> 
> Note for testers: if, e.g., the PBS is completely unreachable, the
> backup job will still need to wait until the in-flight request is
> aborted after 15 minutes. But the guest writes should be fast again.

Could we improve that by periodically checking the status in the pbs-qemu lib
and aborting there as well?
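A minimal sketch of what such a periodic check could look like, assuming the in-flight request can be waited on in short bounded steps (e.g. 900 one-second steps for the 15-minute timeout) and that the backup job sets a shared abort flag when the cbw error callback fires. RequestCtx, request_abort(), wait_request() and the poll_* simulators are hypothetical names for illustration, not existing pbs-qemu lib or QEMU API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; not an existing pbs-qemu lib structure. */
typedef struct {
    atomic_bool aborted; /* set by the backup job, e.g. on a cbw error */
} RequestCtx;

/* Called from the backup job to request that in-flight waits give up. */
static void request_abort(RequestCtx *ctx)
{
    atomic_store(&ctx->aborted, true);
}

/*
 * Wait for an in-flight request in short steps instead of one long wait.
 * poll_once() stands in for a single bounded wait (e.g. poll() with a
 * one-second timeout) and returns true once the request has completed.
 * Returns 0 on completion, -1 if aborted, -2 if max_steps ran out.
 */
static int wait_request(RequestCtx *ctx, int max_steps,
                        bool (*poll_once)(RequestCtx *ctx, int step))
{
    for (int step = 0; step < max_steps; step++) {
        /* Re-check the abort flag between bounded waits. */
        if (atomic_load(&ctx->aborted)) {
            return -1;
        }
        if (poll_once(ctx, step)) {
            return 0;
        }
    }
    return -2;
}

/* Simulated unreachable server: never completes; the job aborts at step 3. */
static bool poll_unreachable(RequestCtx *ctx, int step)
{
    if (step == 3) {
        request_abort(ctx);
    }
    return false;
}

/* Simulated healthy server: the request completes at step 1. */
static bool poll_ok(RequestCtx *ctx, int step)
{
    (void)ctx;
    return step == 1;
}
```

With poll_unreachable, wait_request() returns -1 on the step right after the flag is set instead of running into the full timeout. Whether the request handling in the pbs-qemu lib can actually be split up like this is an open question.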

How is the bitmap handled in case of a cbw timeout/error?

> 
>  block/backup.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ba153110d3..43d34ce4c2 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -32,6 +32,45 @@
>  
>  static const BlockJobDriver backup_job_driver;
>  
> +typedef struct {
> +    Job *job;
> +    int ret;
> +} BackupOnCbwError;
> +
> +static void backup_on_cbw_error_cb_bh(void *opaque)
> +{
> +    BackupOnCbwError *data = opaque;
> +    if (data->job) {
> +        WITH_JOB_LOCK_GUARD() {
> +            if (!job_is_completed_locked(data->job)) {
> +                error_report("backup was cancelled because of copy-before-write error: %s",
> +                             strerror(-data->ret));
> +                job_cancel_locked(data->job, true);
> +            }
> +        }
> +    } else {
> +        error_report("backup_on_cbw_error_cb_bh: no job! Error: %s", strerror(-data->ret));
> +    }
> +
> +    g_free(data);
> +}
> +
> +static void backup_on_cbw_error_cb(void *opaque, int ret)
> +{
> +    BackupOnCbwError *data = g_new0(BackupOnCbwError, 1);
> +    data->job = opaque;
> +    data->ret = ret;
> +
> +    /*
> +     * backup_cancel() cannot run in coroutine context.
> +     */
> +    if (qemu_in_coroutine()) {
> +        aio_bh_schedule_oneshot(qemu_get_aio_context(), backup_on_cbw_error_cb_bh, data);
> +    } else {
> +        backup_on_cbw_error_cb_bh(data);
> +    }
> +}
> +
>  static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>  {
>      BdrvDirtyBitmap *bm;
> @@ -477,6 +516,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>          goto error;
>      }
>  
> +    bdrv_cbw_set_error_cb(cbw, backup_on_cbw_error_cb, job);
> +
>      job->cbw = cbw;
>      job->source_bs = bs;
>      job->target_bs = target;
> -- 
> 2.39.2
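A standalone toy model of the deferral pattern used in backup_on_cbw_error_cb(), assuming that cancelling is only forbidden in a "restricted" (coroutine) context: the callback copies its arguments into a heap-allocated struct and either handles them directly or queues a one-shot handler that later runs from the main loop and frees the struct. The queue, ToyJob, on_error_cb() and the other names are hypothetical stand-ins; this is not how QEMU implements aio_bh_schedule_oneshot().

```c
#include <errno.h>   /* callers pass negative errno values, e.g. -EIO */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*OneshotFn)(void *opaque);

typedef struct {
    OneshotFn fn;
    void *opaque;
} Oneshot;

/* Toy stand-in for the main loop's bottom-half queue. */
#define MAX_PENDING 16
static Oneshot pending[MAX_PENDING];
static int n_pending;

/* Toy stand-in for qemu_in_coroutine(). */
static bool in_restricted_context;

static void schedule_oneshot(OneshotFn fn, void *opaque)
{
    if (n_pending == MAX_PENDING) {
        abort(); /* the toy queue is bounded; a real event loop is not */
    }
    pending[n_pending++] = (Oneshot){ fn, opaque };
}

/* One main loop iteration: run every queued handler, then clear the queue. */
static void run_pending(void)
{
    for (int i = 0; i < n_pending; i++) {
        pending[i].fn(pending[i].opaque);
    }
    n_pending = 0;
}

typedef struct {
    bool cancelled;
} ToyJob;

typedef struct {
    ToyJob *job;
    int ret; /* negative errno */
} OnErrorData;

/* Deferred handler: cancel once, then free the data the callback allocated. */
static void on_error_bh(void *opaque)
{
    OnErrorData *data = opaque;
    if (data->job && !data->job->cancelled) {
        fprintf(stderr, "backup cancelled: %s\n", strerror(-data->ret));
        data->job->cancelled = true;
    }
    free(data);
}

/* Error callback: may fire where cancelling is not allowed, so defer if needed. */
static void on_error_cb(void *opaque, int ret)
{
    OnErrorData *data = calloc(1, sizeof(*data));
    if (!data) {
        abort();
    }
    data->job = opaque;
    data->ret = ret;
    if (in_restricted_context) {
        schedule_oneshot(on_error_bh, data); /* run later from the main loop */
    } else {
        on_error_bh(data);                   /* safe to cancel right away */
    }
}
```

For example, with in_restricted_context set, on_error_cb(&job, -EIO) only queues the handler, and the job is marked cancelled once run_pending() is called; without it, the job is cancelled immediately.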



