[pve-devel] [RFC qemu-server 00/10] improve live-migration downtime

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Aug 4 12:28:27 CEST 2017


On 08/04/2017 10:54 AM, Fabian Grünbichler wrote:
> this patch series attempts to reduce the downtime occurring during
> live-migration of VMs to sane levels by
> - conditionalizing potentially unneeded SSH connections
> - replacing commands over SSH with new 'qm mtunnel' commands
> - reducing the polling interval to notice a completed migration faster
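
As a rough illustration of why the polling interval in the last point matters: a finished migration is only noticed at the next poll, so the interval bounds the extra time the VM can stay paused. The sketch below assumes a fixed interval with polls at interval, 2*interval, and so on. The function name and the numbers are made up for illustration and are not the values used in the patch.

```python
import math


def detection_delay(completed_at, interval):
    """Seconds between completion and the first poll that can observe it.

    Assumes polls happen at interval, 2*interval, 3*interval, ... seconds
    after the migration started (a simplification for illustration).
    """
    polls_needed = math.ceil(completed_at / interval)
    return polls_needed * interval - completed_at


# Hypothetical numbers: the transfer completes 1.25s after it started.
slow = detection_delay(1.25, 1.0)   # noticed at the poll at 2.0s
fast = detection_delay(1.25, 0.5)   # noticed at the poll at 1.5s
print(slow, fast)
```

In this model the delay is always smaller than one interval, so shortening the interval directly shrinks the worst case.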
> 
> attempts to monitor downtime via ping produced rather unreliable results,
> probably because of ARP? but old to old is reliably the slowest there too..
> 
> following are durations in 'paused' state, between 'paused inmigrate' and
> 'running', measured with qmp status with 0.1s sleep in between, tests repeated 5
> times each on a network-rate-limited virtual cluster.
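
To make the measurement method above concrete, here is a minimal sketch of how the paused window could be derived from timestamped status samples. It is not the script used for the numbers in this mail. The function name, the sample format and the exact status strings are assumptions for illustration.

```python
def paused_duration(samples):
    """Return the seconds between the first paused sample and the first
    'running' sample that follows it, or None if that never happens.

    samples: iterable of (timestamp_in_seconds, status_string) tuples in
    chronological order. The status strings are assumed for illustration.
    """
    paused_at = None
    for timestamp, status in samples:
        if paused_at is None:
            if status.startswith("paused"):
                paused_at = timestamp
        elif status == "running":
            return timestamp - paused_at
    return None


# Hypothetical samples, as a poller sleeping between status queries
# might have recorded them.
samples = [
    (10.0, "paused"),
    (11.0, "paused"),
    (12.5, "running"),
    (13.0, "running"),
]
print(paused_duration(samples))
```

With a 0.1s sleep between samples, each edge of the window can be observed up to one sampling interval late, so a result obtained this way is only accurate to roughly 0.1s.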
> 
> with old polling, 2G RAM (actual RAM transfer in <2s, so no auto-reduction of
> polling interval happens):
> 
> old code: average 3.2s
> new to old: average 1.6s (skips pvesr set-state)
> new to new: average 1.2s
> 
> with old polling, 8G RAM (auto-reduction of polling interval kicks in, slightly better results):
> 
> old code: average 2.7s
> new to old: 1s
> new to new: 0.7s
> 
> with reduced polling interval (last patch applied), 2G and 8G RAM:
> new to old: 0.4s
> new to new: one single instance of logged paused state over 5 migrations!
> 
> with reduced polling interval, 8G RAM, old code but with last patch applied:
> 2s
> 
> so it seems like this is the right combination of changes to get downtime back
> to acceptable levels without sacrificing consistency.
> 
> commands which might be integrated into mtunnel as well in the future:
> - pvesr set-state
> - qm nbdstop
> - qm unlock
> 
> Fabian Grünbichler (10):
>    migrate: switch back to qm mtunnel
>    migrate: refactor mtunnel read/write
>    qm mtunnel: add tunnel version
>    migrate: read mtunnel version
>    qm mtunnel: add write helper
>    mtunnel: add and handle OK/ERR replies
>    qm mtunnel/migrate: add resume VMID command
>    migrate: finish tunnel in phase 3
>    migrate: keep track of replication
>    migrate: reduce polling intervals
> 
>   PVE/CLI/qm.pm      |  28 ++++++++++--
>   PVE/QemuMigrate.pm | 126 ++++++++++++++++++++++++++++++++++++++---------------
>   2 files changed, 116 insertions(+), 38 deletions(-)
> 

With the small nitpicks addressed:
Reviewed-by: Thomas Lamprecht <t.lamprecht at proxmox.com>




