[pve-devel] [RFC qemu-server 00/10] improve live-migration downtime
Thomas Lamprecht
t.lamprecht at proxmox.com
Fri Aug 4 12:28:27 CEST 2017
On 08/04/2017 10:54 AM, Fabian Grünbichler wrote:
> this patch series attempts to reduce the downtime occurring during
> live-migration of VMs to sane levels by
> - conditionalizing potentially unneeded SSH connections
> - replacing commands over SSH with new 'qm mtunnel' commands
> - reducing the polling interval to notice a completed migration faster
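
The first two points can be sketched as a small planning step on the source node. This is a minimal Python illustration, not the Perl in QemuMigrate.pm. It assumes the source knows the tunnel version announced by the target (0 for an old target) and whether the VM has replication jobs. `MigrationContext`, `plan_remote_steps`, `count_ssh_connections` and the action labels are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MigrationContext:
    # Hypothetical summary of what the source node knows late in the migration.
    tunnel_version: int    # 0 if the target's 'qm mtunnel' announced no version
    has_replication: bool  # True if replication jobs are tracked for this VM


def plan_remote_steps(ctx: MigrationContext) -> list:
    """Return (transport, action) pairs for the final migration phase.

    'tunnel' reuses the already open mtunnel connection; 'ssh' stands for a
    freshly spawned SSH connection, which is the expensive path. The action
    labels and their order are illustrative, not taken from QemuMigrate.pm.
    """
    steps = []
    if ctx.has_replication:
        # Conditionalized: only hand over replication state when there is any.
        steps.append(("ssh", "pvesr set-state"))
    if ctx.tunnel_version >= 1:
        # A versioned tunnel is assumed to understand a resume command.
        steps.append(("tunnel", "resume"))
    else:
        # Old target: fall back to resuming the VM over a separate SSH call.
        steps.append(("ssh", "resume"))
    return steps


def count_ssh_connections(steps: list) -> int:
    """Number of new SSH connections the plan would open."""
    return sum(1 for transport, _ in steps if transport == "ssh")
```

With an old target and no replication jobs the plan still opens one SSH connection for the resume; with a versioned tunnel and no replication it opens none.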
>
> attempts to monitor downtime via ping produced rather unreliable results,
> probably because of ARP? But old to old is reliably the slowest there too.
>
> following are durations in the 'paused' state, between 'paused inmigrate' and
> 'running', measured by polling the QMP status with a 0.1s sleep in between; each
> test was repeated 5 times on a network-rate-limited virtual cluster.
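
The measurement method can be sketched as follows. This assumes the poller records chronological (timestamp, status) pairs; the status strings and both helper names are hypothetical. Each edge of the pause is only known to within one polling interval, so a measured value can be off by up to about one interval (0.1s here). A pause shorter than the interval can also fall between two samples and go unrecorded.

```python
def paused_duration(samples, paused_status="paused"):
    """Seconds from the first sample in `paused_status` to the next 'running'
    sample, or None if no paused sample was recorded.

    samples: chronological list of (timestamp_seconds, status_string).
    """
    start = None
    for t, status in samples:
        if start is None and status == paused_status:
            start = t
        elif start is not None and status == "running":
            return t - start
    return None


def simulate_polling(pause_start, pause_end, interval, total):
    """Hypothetical poller: sample a VM that is paused in [pause_start, pause_end)."""
    samples = []
    for i in range(int(round(total / interval))):
        t = i * interval
        status = "paused" if pause_start <= t < pause_end else "running"
        samples.append((t, status))
    return samples


if __name__ == "__main__":
    # A 0.7 s pause is seen, but only to within one sampling interval.
    print(paused_duration(simulate_polling(0.25, 0.95, 0.1, 2.0)))
    # A 0.07 s pause that falls between two samples is not seen at all.
    print(paused_duration(simulate_polling(0.31, 0.38, 0.1, 2.0)))
```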
>
> with old polling, 2G RAM (actual RAM transfer in <2s, so no auto-reduction of
> polling interval happens):
>
> old code: average 3.2s
> new to old: average 1.6s (skips pvesr set-state)
> new to new: average 1.2s
>
> with old polling, 8G RAM (auto-reduction of polling interval kicks in, slightly better results):
>
> old code: average 2.7s
> new to old: 1s
> new to new: 0.7s
>
> with reduced polling interval (last patch applied), 2G and 8G RAM:
> new to old: 0.4s
> new to new: only a single logged paused state across all 5 migrations!
>
> with reduced polling interval, 8G RAM, old code but with last patch applied:
> 2s
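
Why the polling interval matters can be sketched with a simulated poll loop. This assumes a loop that sleeps between status queries and shortens its interval once little RAM remains to transfer. The base/short intervals, the 64 MiB threshold and all helper names are hypothetical, not the values in QemuMigrate.pm. With a fixed interval T, completion is noticed up to T late. Presumably that delay adds directly to the paused window, because the target is only resumed once the source notices completion.

```python
GIB = 2**30
MIB = 2**20


def poll_until_complete(status_at, base=1.0, short=0.1, threshold=64 * MIB,
                        max_polls=100_000):
    """Simulated migration-status poll loop; returns when completion is noticed.

    status_at(t) -> (completed, remaining_bytes). The interval values and the
    RAM threshold are hypothetical, not the values used by QemuMigrate.pm.
    """
    t = 0.0
    interval = base
    for _ in range(max_polls):
        t += interval  # stands in for sleeping `interval` seconds
        completed, remaining = status_at(t)
        if completed:
            return t
        # Auto-reduction: poll faster once little RAM is left to transfer.
        interval = short if remaining < threshold else base
    raise TimeoutError("migration did not complete")


def fake_migration(total_bytes, bytes_per_sec):
    """Hypothetical migration that transfers RAM at a constant rate."""
    def status_at(t):
        remaining = max(0.0, total_bytes - bytes_per_sec * t)
        return remaining <= 0, remaining
    return status_at


if __name__ == "__main__":
    mig = fake_migration(2 * GIB, 1.5 * GIB)  # finishes at t = 4/3 s
    for base in (1.0, 0.1):
        noticed = poll_until_complete(mig, base=base)
        print(f"base={base}s: noticed {noticed - 4 / 3:.2f}s after completion")
```

In this toy run, as in the 2G case above, the transfer finishes before the interval is ever reduced. With a 1s base interval, completion is noticed about 0.67s late; with 0.1s polling, it is noticed under 0.1s late.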
>
> so it seems like this is the right combination of changes to get downtime back
> to acceptable levels without sacrificing consistency.
>
> commands which might be integrated into mtunnel as well in the future:
> - pvesr set-state
> - qm nbdstop
> - qm unlock
>
> Fabian Grünbichler (10):
> migrate: switch back to qm mtunnel
> migrate: refactor mtunnel read/write
> qm mtunnel: add tunnel version
> migrate: read mtunnel version
> qm mtunnel: add write helper
> mtunnel: add and handle OK/ERR replies
> qm mtunnel/migrate: add resume VMID command
> migrate: finish tunnel in phase 3
> migrate: keep track of replication
> migrate: reduce polling intervals
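
The target side of such a tunnel can be sketched as a plain line protocol on stdin/stdout. It mirrors the ideas named in patches 3 to 7 (version announcement, write helper, OK/ERR replies, resume command), but it is written in Python. The greeting, the "ver N" line and the exact OK/ERR strings are assumptions, not necessarily what 'qm mtunnel' prints. `run_tunnel` and the `resume_vm` callback are hypothetical names.

```python
import io
import re


def run_tunnel(stdin, stdout, resume_vm, version=1):
    """Serve a line-based migration tunnel on two text streams.

    resume_vm(vmid) is a hypothetical callback that raises on failure. The
    greeting, version line and OK/ERR strings are assumptions for illustration.
    """
    def write(text):
        # Write helper: exactly one reply per line, flushed immediately so the
        # source side never waits on buffered output.
        stdout.write(text.rstrip("\n") + "\n")
        stdout.flush()

    write("tunnel online")
    write(f"ver {version}")  # lets the source detect tunnel features

    for raw in stdin:
        line = raw.rstrip("\n")
        if line == "quit":
            write("OK")
            break
        match = re.fullmatch(r"resume (\d+)", line)
        if match is None:
            write(f"ERR: unknown command '{line}'")
            continue
        try:
            resume_vm(int(match.group(1)))
        except Exception as err:  # report any failure back to the source
            write(f"ERR: resume failed - {err}")
        else:
            write("OK")


if __name__ == "__main__":
    out = io.StringIO()
    run_tunnel(io.StringIO("resume 100\nquit\n"), out, lambda vmid: None)
    print(out.getvalue(), end="")
```

Replying OK or ERR to every command lets the source treat each tunnel command as a synchronous call and fail the migration step explicitly, instead of guessing from silence.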
>
> PVE/CLI/qm.pm | 28 ++++++++++--
> PVE/QemuMigrate.pm | 126 ++++++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 116 insertions(+), 38 deletions(-)
>
With the small nitpicks addressed:
Reviewed-by: Thomas Lamprecht <t.lamprecht at proxmox.com>