[pve-devel] [RFC qemu 5/7] fix #3231+#3631: PVE backup: add timeout for copy-before-write operations and fail backup instead of guest writes

Fiona Ebner f.ebner at proxmox.com
Mon Jun 10 14:59:40 CEST 2024


If the backup target can't be reached or is very slow, the default
behavior of QEMU backup is to fail ("break") the guest write. This is
undesirable: it is less intrusive, and closer to what users expect, to
make the backup fail instead.
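To make the difference between the two policies concrete, here is a minimal model of what happens to a guest write when the copy of the old data to the backup target fails. It is a sketch, not QEMU code: all names are hypothetical. It only mirrors the documented semantics of the copy-before-write filter's "on-cbw-error" option, where "break-guest-write" (QEMU's default) fails the guest write and "break-snapshot" (selected by this patch) lets the write through and marks the backup as failed.

```c
#include <stdbool.h>

/* Hypothetical model of the two on-cbw-error policies of QEMU's
 * copy-before-write filter. Names are illustrative, not QEMU's. */
typedef enum {
    ON_CBW_ERROR_BREAK_GUEST_WRITE, /* QEMU's default */
    ON_CBW_ERROR_BREAK_SNAPSHOT,    /* the policy this patch selects */
} OnCbwError;

typedef struct {
    OnCbwError policy;
    bool snapshot_broken; /* once set, the backup can only fail */
} CbwModel;

/*
 * Model one guest write that first has to copy the old data to the
 * backup target. copy_ok says whether that copy succeeded in time.
 * Returns 0 if the guest write goes through, -1 if the guest sees an error.
 */
static int model_guest_write(CbwModel *m, bool copy_ok)
{
    if (m->snapshot_broken) {
        return 0; /* backup already failed: skip the copy, let the write through */
    }
    if (copy_ok) {
        return 0; /* old data is safe on the target, guest write proceeds */
    }
    if (m->policy == ON_CBW_ERROR_BREAK_GUEST_WRITE) {
        return -1; /* guest write fails, backup stays consistent */
    }
    m->snapshot_broken = true; /* backup will fail, guest is unaffected */
    return 0;
}
```

With "break-snapshot", a single failed copy costs the backup, never the guest's data path.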

Set a timeout of 45 seconds for copy-before-write operations, the same
value as for fleecing. Guest drivers like virtio-win have issues when a
write takes more than 60 seconds and still completes afterwards, so a
value below that was chosen.
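The reasoning behind the 45 second value can be checked with simple arithmetic. The sketch below assumes, as documented for QEMU's "cbw-timeout" option, that 0 means no limit, and it deliberately ignores the time of the guest write itself. The 60 second driver limit is taken from the paragraph above; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Values from the commit message; helper names are hypothetical. */
#define CBW_TIMEOUT_S        45u /* "cbw-timeout" set by this patch */
#define GUEST_DRIVER_LIMIT_S 60u /* e.g. virtio-win has issues beyond this */

/*
 * Worst-case time (seconds) a guest write waits for the copy-before-write
 * step when the target needs target_latency_s to answer.
 * timeout_s == 0 models "no timeout" (QEMU's cbw-timeout default).
 */
static unsigned cbw_wait_s(unsigned target_latency_s, unsigned timeout_s)
{
    if (timeout_s == 0 || target_latency_s < timeout_s) {
        return target_latency_s;
    }
    return timeout_s; /* the cbw operation is abandoned at the deadline */
}

/* True if the guest write finishes before the driver limit is hit. */
static bool guest_write_in_time(unsigned target_latency_s, unsigned timeout_s)
{
    return cbw_wait_s(target_latency_s, timeout_s) < GUEST_DRIVER_LIMIT_S;
}
```

For an unreachable target that takes, say, 120 seconds to answer, the wait is capped at 45 seconds with the timeout, but is the full 120 seconds without it.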

Unfortunately, this alone is not enough: the backup would still try to
run to completion and fail only at the very end. This can be improved
by adding a callback function that aborts the backup as soon as a
copy-before-write operation fails.
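The benefit of such a callback can be illustrated with a toy backup loop. This is a hypothetical sketch and not the interface used by this series: the callback type, the job struct and the simulated failure point are all made up for illustration. With an aborting callback, the loop stops at the first copy-before-write failure. With a no-op callback, it keeps copying until the end, even though the result is already known to be unusable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef void (*CbwErrorCb)(void *opaque);

typedef struct {
    bool aborted;
    size_t clusters_copied;
} ToyBackupJob;

/* Callback that marks the job as aborted on a cbw failure. */
static void toy_abort_cb(void *opaque)
{
    ToyBackupJob *job = opaque;
    job->aborted = true;
}

/* Callback that ignores the failure (backup only fails at the end). */
static void toy_noop_cb(void *opaque)
{
    (void)opaque;
}

/*
 * Copy `total` clusters. At index `fail_at` a simulated copy-before-write
 * operation fails and `cb` is invoked (a negative fail_at means no failure).
 * Returns the number of clusters copied.
 */
static size_t toy_run_backup(ToyBackupJob *job, size_t total, long fail_at,
                             CbwErrorCb cb)
{
    for (size_t i = 0; i < total; i++) {
        if ((long)i == fail_at) {
            cb(job); /* the filter reports its failure */
        }
        if (job->aborted) {
            break; /* stop right away instead of copying the rest */
        }
        job->clusters_copied++;
    }
    return job->clusters_copied;
}
```

With 100 clusters and a failure at cluster 10, the aborting callback stops after 10 copied clusters, while the no-op callback copies all 100 before the backup is reported as failed.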

Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
 pve-backup.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 108e185a20..9843d8d122 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -560,6 +560,7 @@ static void create_backup_jobs_bh(void *opaque) {
         bdrv_drained_begin(di->bs);
 
         BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+        QDict *backup_cbw_opts = qdict_new();
 
         BlockDriverState *source_bs = di->bs;
         bool discard_source = false;
@@ -631,11 +632,18 @@ static void create_backup_jobs_bh(void *opaque) {
                 perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
             }
             perf.has_min_cluster_size = true;
+        } else {
+            /*
+             * When fleecing is not used, need to set the options on the copy-before-write node
+             * installed by the backup job itself.
+             */
+            qdict_put_str(backup_cbw_opts, "on-cbw-error", "break-snapshot");
+            qdict_put_int(backup_cbw_opts, "cbw-timeout", 45);
         }
 
         BlockJob *job = backup_job_create(
-            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
-            bitmap_mode, false, discard_source, NULL, &perf, NULL, BLOCKDEV_ON_ERROR_REPORT,
+            job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap, bitmap_mode,
+            false, discard_source, NULL, &perf, backup_cbw_opts, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
-- 
2.39.2




