[pve-devel] [RFC qemu 1/1] add patch for incremental drive-mirror
Fabian Grünbichler
f.gruenbichler at proxmox.com
Thu Jul 18 14:43:45 CEST 2019
Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
---
Notes:
this is a WIP patch from qemu-devel, see
<20181219065020.81256-1-mahaocong_work at 163.com>
<20181220083839.85523-1-mahaocong_work at 163.com>
<20190102100415.24680-1-mahaocong_work at 163.com>
<20190131040154.3770-1-mahaocong_work at 163.com>
<20190131121715.22954-1-mahaocong_work at 163.com>
<20190214052809.44336-1-mahaocong_work at 163.com>
<20190214064312.44794-1-mahaocong_work at 163.com>
<20190221155521.77094-1-mahaocong_work at 163.com>
for discussions and
<20170504105444.8940-1-daniel.kucera at gmail.com>
for an older, similar patch from a different author but with the
same motivation (ZFS replication) ;)
if we want to include this, we should probably first get it into an upstreamable
state and push for inclusion in qemu proper, so that we can backport it.
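
to illustrate what the patch does at job start: instead of scanning the image
as in full/top mode, the mirror job's own bitmap is seeded by merging in the
user bitmap (see mirror_dirty_init_incremental in the patch below). the
following is only a toy model in plain C, not QEMU code. ToyBitmap and the
toy_* helpers are made up for illustration, and the check that both bitmaps
share size and granularity is an assumption about why the patch overrides the
granularity parameter with the user bitmap's value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_CHUNKS 64

/* Hypothetical stand-in for a dirty bitmap: one flag per chunk. */
typedef struct ToyBitmap {
    uint32_t granularity;        /* bytes covered by one flag */
    uint64_t nb_chunks;          /* number of chunks tracked */
    bool dirty[TOY_MAX_CHUNKS];
} ToyBitmap;

/* Create an all-clean bitmap covering disk_size bytes. */
static void toy_bitmap_init(ToyBitmap *b, uint64_t disk_size,
                            uint32_t granularity)
{
    assert(granularity > 0);
    memset(b, 0, sizeof(*b));
    b->granularity = granularity;
    b->nb_chunks = (disk_size + granularity - 1) / granularity;
    assert(b->nb_chunks <= TOY_MAX_CHUNKS);
}

/* Mark every chunk touched by [offset, offset + bytes) as dirty. */
static void toy_bitmap_set(ToyBitmap *b, uint64_t offset, uint64_t bytes)
{
    assert(bytes > 0);
    uint64_t first = offset / b->granularity;
    uint64_t last = (offset + bytes - 1) / b->granularity;
    assert(last < b->nb_chunks);
    for (uint64_t i = first; i <= last; i++) {
        b->dirty[i] = true;
    }
}

/*
 * Seed dst from src by OR-ing the flags. src is left untouched, so the
 * user bitmap can keep tracking writes for later incremental backups.
 * Returns false if the two bitmaps are not compatible (assumed rule).
 */
static bool toy_bitmap_merge(ToyBitmap *dst, const ToyBitmap *src)
{
    if (dst->granularity != src->granularity ||
        dst->nb_chunks != src->nb_chunks) {
        return false;
    }
    for (uint64_t i = 0; i < dst->nb_chunks; i++) {
        dst->dirty[i] = dst->dirty[i] || src->dirty[i];
    }
    return true;
}

/* Upper bound on the bytes the mirror would have to copy. */
static uint64_t toy_bitmap_dirty_bytes(const ToyBitmap *b)
{
    uint64_t chunks = 0;
    for (uint64_t i = 0; i < b->nb_chunks; i++) {
        chunks += b->dirty[i] ? 1 : 0;
    }
    return chunks * b->granularity;
}
```

in this model, seeding a fresh job bitmap from a user bitmap with three dirty
64K chunks leaves three chunks to copy, and the user bitmap stays unchanged.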
...04-drive-mirror-add-incremental-mode.patch | 329 ++++++++++++++++++
debian/patches/series | 1 +
2 files changed, 330 insertions(+)
create mode 100644 debian/patches/extra/0004-drive-mirror-add-incremental-mode.patch
diff --git a/debian/patches/extra/0004-drive-mirror-add-incremental-mode.patch b/debian/patches/extra/0004-drive-mirror-add-incremental-mode.patch
new file mode 100644
index 0000000..8bbeaa6
--- /dev/null
+++ b/debian/patches/extra/0004-drive-mirror-add-incremental-mode.patch
@@ -0,0 +1,329 @@
+From fbe68abe71867ffac74aa82a1e56f2a697827cf8 Mon Sep 17 00:00:00 2001
+From: mahaocong <mahaocong at didichuxing.com>
+Date: Thu, 14 Feb 2019 14:43:12 +0800
+Subject: [PATCH qemu] drive-mirror: add incremental mode
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This patch adds the possibility to start mirroring with a user-created bitmap.
+In full mode, mirror creates an unnamed bitmap by scanning the whole block chain,
+and in top mode, mirror creates a bitmap by scanning the top block layer. So I
+think I can copy a user-created bitmap and use it as the initial state of the
+mirror, the same as incremental mode drive-backup, and I call this new mode
+incremental mode drive-mirror.
+
+A possible use case for incremental mode mirror is live migration. To preserve
+the block data and recover after a malfunction, someone may back up data to ceph
+or other distributed storage. With qemu incremental backup, we need to create a new
+bitmap and attach it to the block device before the first backup job. The bitmap then
+records the changes made after the backup job. If we want to migrate this vm, we can
+migrate block data between source and destination by using drive-mirror directly,
+or use the backup data and backup bitmap to reduce the amount of data to synchronize.
+To do this, we first create a new image on the destination and set its backing file
+to the backup image, then set the backup bitmap as the initial state of incremental mode
+drive-mirror, and synchronize dirty blocks starting from the state set by the last
+incremental backup job. When the mirror completes, we get an active layer on the destination,
+and its backing file is the backup image on ceph. Then we can live-copy data from
+the backing files into overlay images by using block-stream, or continue doing backups.
+
+In this scenario, if the guest os doesn't write much data after the
+last backup, incremental mode may transmit less data than full mode or top
+mode. However, if a lot of data is written, incremental mode has no advantage
+compared with the other modes.
+
+This scenario can be described by the following steps:
+1. create an rbd image in ceph, and map an nbd device to the rbd image.
+2. create a new bitmap and attach it to the block device.
+3. do a full mode backup to the nbd device and thus sync it to the rbd image.
+4. do several incremental mode backups.
+5. create a new image on the destination and set its backing file to the backup image.
+6. do live migration, and migrate block data by incremental mode drive-mirror
+ with the bitmap created in step 2.
+
+Signed-off-by: Ma Haocong <mahaocong at didichuxing.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
+---
+ include/block/block_int.h | 3 ++-
+ block/mirror.c | 47 +++++++++++++++++++++++++++++----------
+ blockdev.c | 36 ++++++++++++++++++++++++++++--
+ qapi/block-core.json | 14 ++++++++++--
+ 4 files changed, 83 insertions(+), 17 deletions(-)
+
+diff --git a/include/block/block_int.h b/include/block/block_int.h
+index 01e855a066..537ad23476 100644
+--- a/include/block/block_int.h
++++ b/include/block/block_int.h
+@@ -1122,7 +1122,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
+ BlockDriverState *target, const char *replaces,
+ int creation_flags, int64_t speed,
+ uint32_t granularity, int64_t buf_size,
+- MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
++ MirrorSyncMode mode, BdrvDirtyBitmap *src_bitmap,
++ BlockMirrorBackingMode backing_mode,
+ BlockdevOnError on_source_error,
+ BlockdevOnError on_target_error,
+ bool unmap, const char *filter_node_name,
+diff --git a/block/mirror.c b/block/mirror.c
+index ff15cfb197..eb8ff34d5b 100644
+--- a/block/mirror.c
++++ b/block/mirror.c
+@@ -50,6 +50,7 @@ typedef struct MirrorBlockJob {
+ /* Used to block operations on the drive-mirror-replace target */
+ Error *replace_blocker;
+ bool is_none_mode;
++ BdrvDirtyBitmap *src_bitmap;
+ BlockMirrorBackingMode backing_mode;
+ MirrorCopyMode copy_mode;
+ BlockdevOnError on_source_error, on_target_error;
+@@ -821,6 +822,16 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
+ return 0;
+ }
+
++/*
++ * Init the dirty bitmap from the user bitmap. The user bitmap is copied
++ * (merged) into the mirror job's own bitmap rather than reused directly.
++ */
++static void coroutine_fn mirror_dirty_init_incremental(MirrorBlockJob *s,
++ Error **errp)
++{
++ bdrv_merge_dirty_bitmap(s->dirty_bitmap, s->src_bitmap, NULL, errp);
++}
++
+ /* Called when going out of the streaming phase to flush the bulk of the
+ * data to the medium, or just before completing.
+ */
+@@ -846,6 +857,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
+ char backing_filename[2]; /* we only need 2 characters because we are only
+ checking for a NULL string */
+ int ret = 0;
++ Error *local_err = NULL;
+
+ if (job_is_cancelled(&s->common.job)) {
+ goto immediate_exit;
+@@ -920,9 +932,19 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
+
+ s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+ if (!s->is_none_mode) {
+- ret = mirror_dirty_init(s);
+- if (ret < 0 || job_is_cancelled(&s->common.job)) {
+- goto immediate_exit;
++ /* incremental mode */
++ if (s->src_bitmap) {
++ mirror_dirty_init_incremental(s, &local_err);
++ if (local_err) {
++ error_propagate(errp, local_err);
++ ret = -1;
++ goto immediate_exit;
++ }
++ } else {
++ ret = mirror_dirty_init(s);
++ if (ret < 0 || job_is_cancelled(&s->common.job)) {
++ goto immediate_exit;
++ }
+ }
+ }
+
+@@ -1502,7 +1524,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
+ BlockCompletionFunc *cb,
+ void *opaque,
+ const BlockJobDriver *driver,
+- bool is_none_mode, BlockDriverState *base,
++ bool is_none_mode, BdrvDirtyBitmap *src_bitmap,
++ BlockDriverState *base,
+ bool auto_complete, const char *filter_node_name,
+ bool is_mirror, MirrorCopyMode copy_mode,
+ Error **errp)
+@@ -1617,6 +1640,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
+ s->on_source_error = on_source_error;
+ s->on_target_error = on_target_error;
+ s->is_none_mode = is_none_mode;
++ s->src_bitmap = src_bitmap;
+ s->backing_mode = backing_mode;
+ s->copy_mode = copy_mode;
+ s->base = base;
+@@ -1698,7 +1722,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
+ BlockDriverState *target, const char *replaces,
+ int creation_flags, int64_t speed,
+ uint32_t granularity, int64_t buf_size,
+- MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
++ MirrorSyncMode mode, BdrvDirtyBitmap *src_bitmap,
++ BlockMirrorBackingMode backing_mode,
+ BlockdevOnError on_source_error,
+ BlockdevOnError on_target_error,
+ bool unmap, const char *filter_node_name,
+@@ -1707,17 +1732,14 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
+ bool is_none_mode;
+ BlockDriverState *base;
+
+- if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+- error_setg(errp, "Sync mode 'incremental' not supported");
+- return;
+- }
+ is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
+ base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
+ mirror_start_job(job_id, bs, creation_flags, target, replaces,
+ speed, granularity, buf_size, backing_mode,
+ on_source_error, on_target_error, unmap, NULL, NULL,
+- &mirror_job_driver, is_none_mode, base, false,
+- filter_node_name, true, copy_mode, errp);
++ &mirror_job_driver, is_none_mode,
++ src_bitmap, base, false, filter_node_name, true,
++ copy_mode, errp);
+ }
+
+ void commit_active_start(const char *job_id, BlockDriverState *bs,
+@@ -1741,7 +1763,8 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
+ mirror_start_job(job_id, bs, creation_flags, base, NULL, speed, 0, 0,
+ MIRROR_LEAVE_BACKING_CHAIN,
+ on_error, on_error, true, cb, opaque,
+- &commit_active_job_driver, false, base, auto_complete,
++ &commit_active_job_driver, false,
++ NULL, base, auto_complete,
+ filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
+ &local_err);
+ if (local_err) {
+diff --git a/blockdev.c b/blockdev.c
+index 4775a07d93..195620851e 100644
+--- a/blockdev.c
++++ b/blockdev.c
+@@ -3685,6 +3685,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+ BlockDriverState *target,
+ bool has_replaces, const char *replaces,
+ enum MirrorSyncMode sync,
++ bool has_bitmap,
++ const char *bitmap_name,
+ BlockMirrorBackingMode backing_mode,
+ bool has_speed, int64_t speed,
+ bool has_granularity, uint32_t granularity,
+@@ -3702,6 +3704,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+ Error **errp)
+ {
+ int job_flags = JOB_DEFAULT;
++ BdrvDirtyBitmap *src_bitmap = NULL;
+
+ if (!has_speed) {
+ speed = 0;
+@@ -3724,6 +3727,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+ if (!has_filter_node_name) {
+ filter_node_name = NULL;
+ }
++ if (!has_bitmap) {
++ bitmap_name = NULL;
++ }
++
+ if (!has_copy_mode) {
+ copy_mode = MIRROR_COPY_MODE_BACKGROUND;
+ }
+@@ -3788,13 +3795,35 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
+ return;
+ }
+ }
++ if (!bitmap_name && (sync == MIRROR_SYNC_MODE_INCREMENTAL)) {
++ error_setg(errp, "incremental mode must specify the bitmap name");
++ return;
++ }
++ /*
++ * In incremental mode, the mirror job's unnamed bitmap is created
++ * with the user bitmap's granularity.
++ */
++ if (sync == MIRROR_SYNC_MODE_INCREMENTAL) {
++ assert(bitmap_name);
++ src_bitmap = bdrv_find_dirty_bitmap(bs, bitmap_name);
++ if (!src_bitmap) {
++ error_setg(errp, "Error: can't find dirty bitmap "
++ "before start incremental drive-mirror");
++ return;
++ }
++ if (granularity) {
++ warn_report("On incremental mode, granularity is unused, "
++ "the bitmap's granularity is used instead");
++ }
++ granularity = bdrv_dirty_bitmap_granularity(src_bitmap);
++ }
+
+ /* pass the node name to replace to mirror start since it's loose coupling
+ * and will allow to check whether the node still exist at mirror completion
+ */
+ mirror_start(job_id, bs, target,
+ has_replaces ? replaces : NULL, job_flags,
+- speed, granularity, buf_size, sync, backing_mode,
++ speed, granularity, buf_size, sync, src_bitmap, backing_mode,
+ on_source_error, on_target_error, unmap, filter_node_name,
+ copy_mode, errp);
+ }
+@@ -3914,6 +3943,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
+
+ blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
+ arg->has_replaces, arg->replaces, arg->sync,
++ arg->has_bitmap, arg->bitmap,
+ backing_mode, arg->has_speed, arg->speed,
+ arg->has_granularity, arg->granularity,
+ arg->has_buf_size, arg->buf_size,
+@@ -3935,6 +3965,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
+ const char *device, const char *target,
+ bool has_replaces, const char *replaces,
+ MirrorSyncMode sync,
++ bool has_bitmap, const char *bitmap,
+ bool has_speed, int64_t speed,
+ bool has_granularity, uint32_t granularity,
+ bool has_buf_size, int64_t buf_size,
+@@ -3971,7 +4002,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
+ bdrv_set_aio_context(target_bs, aio_context);
+
+ blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
+- has_replaces, replaces, sync, backing_mode,
++ has_replaces, replaces, sync, has_bitmap,
++ bitmap, backing_mode,
+ has_speed, speed,
+ has_granularity, granularity,
+ has_buf_size, buf_size,
+diff --git a/qapi/block-core.json b/qapi/block-core.json
+index 7ccbfff9d0..7074d73df9 100644
+--- a/qapi/block-core.json
++++ b/qapi/block-core.json
+@@ -1914,6 +1914,11 @@
+ # (all the disk, only the sectors allocated in the topmost image, or
+ # only new I/O).
+ #
++# @bitmap: The name of a bitmap to use in incremental mode. This argument must
++# be present for incremental mode and absent otherwise. In incremental
++# mode, granularity is unused, the bitmap's granularity is used instead
++# (since 4.0).
++#
+ # @granularity: granularity of the dirty bitmap, default is 64K
+ # if the image format doesn't have clusters, 4K if the clusters
+ # are smaller than that, else the cluster size. Must be a
+@@ -1955,7 +1960,7 @@
+ { 'struct': 'DriveMirror',
+ 'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
+ '*format': 'str', '*node-name': 'str', '*replaces': 'str',
+- 'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
++ 'sync': 'MirrorSyncMode', '*bitmap': 'str', '*mode': 'NewImageMode',
+ '*speed': 'int', '*granularity': 'uint32',
+ '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
+ '*on-target-error': 'BlockdevOnError',
+@@ -2210,6 +2215,11 @@
+ # (all the disk, only the sectors allocated in the topmost image, or
+ # only new I/O).
+ #
++# @bitmap: The name of a bitmap to use in incremental mode. This argument must
++# be present for incremental mode and absent otherwise. In incremental
++# mode, granularity is unused, the bitmap's granularity is used instead
++# (since 4.0).
++#
+ # @granularity: granularity of the dirty bitmap, default is 64K
+ # if the image format doesn't have clusters, 4K if the clusters
+ # are smaller than that, else the cluster size. Must be a
+@@ -2262,7 +2272,7 @@
+ { 'command': 'blockdev-mirror',
+ 'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
+ '*replaces': 'str',
+- 'sync': 'MirrorSyncMode',
++ 'sync': 'MirrorSyncMode', '*bitmap': 'str',
+ '*speed': 'int', '*granularity': 'uint32',
+ '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
+ '*on-target-error': 'BlockdevOnError',
+--
+2.20.1
+
diff --git a/debian/patches/series b/debian/patches/series
index 549bc52..2b8f645 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,6 +1,7 @@
extra/0001-target-i386-add-MDS-NO-feature.patch
extra/0002-target-i386-define-md-clear-bit.patch
extra/0003-virtio-balloon-fix-QEMU-4.0-config-size-migration-in.patch
+extra/0004-drive-mirror-add-incremental-mode.patch
pve/0001-PVE-Config-block-file-change-locking-default-to-off.patch
pve/0002-PVE-Config-Adjust-network-script-path-to-etc-kvm.patch
pve/0003-PVE-Config-set-the-CPU-model-to-kvm64-32-instead-of-.patch
--
2.20.1