[PVE-User] HA migration behaviour vs. failures

Dhaussy Alexandre ADhaussy at voyages-sncf.com
Tue Jul 29 17:02:39 CEST 2014


On 29/07/2014 07:20, Dietmar Maurer wrote:
> OK, I changed the behavior:
>
> https://git.proxmox.com/?p=qemu-server.git;a=commitdiff;h=debe88829e468928271c6d0baf6592b682a70c46
> https://git.proxmox.com/?p=pve-manager.git;a=commitdiff;h=c0a008a8b3e1a4938b10cbd09f7be403ce17f1cb
>
> Would be great if you can test?
Thank you! Much appreciated!
I just applied your patch and rebooted the 3 cluster nodes.

root@proxmoxt2:~# /usr/local/bin/bascule_rhcluster.pl proxmoxt1
===
=== Starting cluster switch to proxmoxt1 (2 threads)
=== Start time : 29/07/2014 - 15:31:27
===
START (29/07/2014 - 15:31:27) : clusvcadm -M pvevm:101 -m proxmoxt1
START (29/07/2014 - 15:31:27) : clusvcadm -M pvevm:102 -m proxmoxt1
--> Trying to migrate pvevm:101 to proxmoxt1...Success (28 secs)
START (29/07/2014 - 15:31:55) : clusvcadm -M pvevm:103 -m proxmoxt1
--> Trying to migrate pvevm:103 to proxmoxt1...Failed; service running on original owner (9 secs)
START (29/07/2014 - 15:32:04) : clusvcadm -M pvevm:104 -m proxmoxt1
--> Trying to migrate pvevm:102 to proxmoxt1...Success (38 secs)
START (29/07/2014 - 15:32:05) : clusvcadm -M pvevm:105 -m proxmoxt1
--> Trying to migrate pvevm:105 to proxmoxt1...Failed; service running on original owner (6 secs)
START (29/07/2014 - 15:32:11) : clusvcadm -M pvevm:106 -m proxmoxt1
--> Trying to migrate pvevm:104 to proxmoxt1...Failed; service running on original owner (7 secs)
START (29/07/2014 - 15:32:11) : clusvcadm -M pvevm:107 -m proxmoxt1
--> Trying to migrate pvevm:106 to proxmoxt1...Failed; service running on original owner (7 secs)
START (29/07/2014 - 15:32:18) : clusvcadm -M pvevm:108 -m proxmoxt1
--> Trying to migrate pvevm:107 to proxmoxt1...Failed; service running on original owner (9 secs)
START (29/07/2014 - 15:32:20) : clusvcadm -M pvevm:109 -m proxmoxt1
--> Trying to migrate pvevm:108 to proxmoxt1...Success (25 secs)
START (29/07/2014 - 15:32:43) : clusvcadm -M pvevm:111 -m proxmoxt1
--> Trying to migrate pvevm:109 to proxmoxt1...Success (29 secs)
START (29/07/2014 - 15:32:49) : clusvcadm -M pvevm:112 -m proxmoxt1
--> Trying to migrate pvevm:111 to proxmoxt1...Success (50 secs)
START (29/07/2014 - 15:33:33) : clusvcadm -M pvevm:113 -m proxmoxt1
--> Trying to migrate pvevm:112 to proxmoxt1...Success (50 secs)
START (29/07/2014 - 15:33:39) : clusvcadm -M pvevm:114 -m proxmoxt1
--> Trying to migrate pvevm:113 to proxmoxt1...Success (48 secs)
START (29/07/2014 - 15:34:21) : clusvcadm -M pvevm:115 -m proxmoxt1
--> Trying to migrate pvevm:114 to proxmoxt1...Success (50 secs)
--> Trying to migrate pvevm:115 to proxmoxt1...Success (34 secs)
===
=== End time : 29/07/2014 - 15:34:55
===
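The script itself is not shown in this thread. Judging only from its output (2 threads, one `clusvcadm -M pvevm:<id> -m <node>` per VM, and a new migration started as soon as a previous one finishes), a minimal Python sketch of that kind of bounded-concurrency switch might look like the following. `switch_cluster` and `clusvcadm_migrate` are hypothetical names, and the sketch assumes clusvcadm exits with a non-zero status when a migration fails.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def clusvcadm_migrate(service, target):
    """Live-migrate one rgmanager service; True if clusvcadm exits with status 0.

    Assumption: a failed migration ("service running on original owner")
    is reported through a non-zero exit status.
    """
    result = subprocess.run(
        ["clusvcadm", "-M", service, "-m", target],  # -M migrate, -m target member
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def switch_cluster(vmids, target, threads=2, migrate=clusvcadm_migrate):
    """Migrate every pvevm:<id> service to `target`, at most `threads` at a time.

    Returns (succeeded, failed) as lists of service names, in input order.
    """
    services = [f"pvevm:{vmid}" for vmid in vmids]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each worker picks the next pending service as soon as it is free;
        # map() still returns the results in the original order.
        outcomes = list(pool.map(lambda s: migrate(s, target), services))
    succeeded = [s for s, ok in zip(services, outcomes) if ok]
    failed = [s for s, ok in zip(services, outcomes) if not ok]
    return succeeded, failed
```

Calling `switch_cluster([101, 102, 103], "proxmoxt1")` and then running it again on the names in `failed` mirrors what is done by hand below: rerunning the switch until no service is left on the source node. The `migrate` parameter only exists so the driver can be exercised without a cluster.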

root@proxmoxt2:~# clustat | grep 'pvevm.*proxmoxt2'
  pvevm:103 proxmoxt2                                   started
  pvevm:104 proxmoxt2                                   started
  pvevm:105 proxmoxt2                                   started
  pvevm:106 proxmoxt2                                   started
  pvevm:107 proxmoxt2                                   started

Let's try again...

root@proxmoxt2:~# /usr/local/bin/bascule_rhcluster.pl proxmoxt1
....
--> Trying to migrate pvevm:103 to proxmoxt1...Success (80 secs)
--> Trying to migrate pvevm:104 to proxmoxt1...Success (106 secs)
--> Trying to migrate pvevm:105 to proxmoxt1...Success (29 secs)
--> Trying to migrate pvevm:106 to proxmoxt1...Success (100 secs)
--> Trying to migrate pvevm:107 to proxmoxt1...Success (107 secs)

root@proxmoxt2:~# clustat | grep 'pvevm.*proxmoxt2' | wc -l
0

Good. Let's try to live migrate back to the original node...

root@proxmoxt1:~# perl /usr/local/bin/bascule_rhcluster.pl proxmoxt2
....
--> Trying to migrate pvevm:102 to proxmoxt2...Success (23 secs)
--> Trying to migrate pvevm:101 to proxmoxt2...Success (23 secs)
--> Trying to migrate pvevm:103 to proxmoxt2...Success (65 secs)
--> Trying to migrate pvevm:104 to proxmoxt2...Success (73 secs)
--> Trying to migrate pvevm:105 to proxmoxt2...Success (55 secs)
--> Trying to migrate pvevm:106 to proxmoxt2...Failed; service running on original owner (49 secs)
--> Trying to migrate pvevm:108 to proxmoxt2...Success (21 secs)
--> Trying to migrate pvevm:109 to proxmoxt2...Success (21 secs)
--> Trying to migrate pvevm:111 to proxmoxt2...Success (34 secs)
--> Trying to migrate pvevm:107 to proxmoxt2...Success (97 secs)
--> Trying to migrate pvevm:112 to proxmoxt2...Success (25 secs)
--> Trying to migrate pvevm:113 to proxmoxt2...Success (32 secs)
--> Trying to migrate pvevm:114 to proxmoxt2...Success (30 secs)
--> Trying to migrate pvevm:115 to proxmoxt2...Success (21 secs)

root@proxmoxt1:~# perl /usr/local/bin/bascule_rhcluster.pl proxmoxt2
--> Trying to migrate pvevm:106 to proxmoxt2...Success (50 secs)

OK! There are still some random failures.
I wonder whether they could be related to some kind of latency caused by the
ongoing self-healing... whatever.

On the other hand, no more downtime! Sweet! :)

