[pve-devel] [PATCH] migrate : add nocheck for resume

Alexandre DERUMIER aderumier at odiso.com
Wed Oct 14 13:34:44 CEST 2015


Here is an inotify trace on /etc/pve, taken when the problem occurred.

source : 2015-10-14 13:25:34 125.conf MOVED_FROM
target : 2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM

(5s difference, ouch ...)
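The 5s figure can be recomputed from the two events quoted above. The following is only a minimal sketch: the helper name `first_event` is hypothetical, the trace lines are copied from this mail, and the resume timestamp is the one from the migration task log below.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def first_event(lines, name, event):
    """Return the timestamp of the first line where file `name` got `event`."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # events on the watched directory carry no file name
            continue
        date, clock, fname, events = parts
        if fname == name and event in events.split(":"):
            return datetime.strptime(f"{date} {clock}", FMT)
    return None

# Excerpts copied from the traces in this mail.
source = ["2015-10-14 13:25:34 125.conf MOVED_FROM"]
target = ["2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM"]

moved_away = first_event(source, "125.conf", "MOVED_FROM")
arrived = first_event(target, "125.conf.tmp.15438", "MOVED_FROM")
# Time of the "unable to find configuration file" error in the task log.
resume_failed = datetime.strptime("2015-10-14 13:25:36", FMT)

gap = (arrived - moved_away).total_seconds()
print(gap)                                   # 5.0
print(moved_away < resume_failed < arrived)  # True
```

With these timestamps, the failed resume at 13:25:36 falls inside the window between the two events.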

Not sure it's related, but there are also lrm_status.tmp file moves, coming from HA.
I don't know if they can slow things down at the corosync layer.



kvmtest2->kvmtest1
-------------------
Oct 14 13:25:36 ERROR: unable to find configuration file for VM 125 - no such machine
Oct 14 13:25:36 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root at 10.3.94.31 qm resume 125 --skiplock' failed: exit code 2
Oct 14 13:25:39 ERROR: migration finished with problems (duration 00:00:10)
TASK ERROR: migration problems



kvmtest2
--------
2015-10-14 13:25:29 ha_agent_kvmtest2_lock ATTRIB:ISDIR
2015-10-14 13:25:29 ATTRIB:ISDIR 
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 125.conf.tmp.12645 CREATE
2015-10-14 13:25:29 125.conf.tmp.12645 OPEN
2015-10-14 13:25:29 125.conf.tmp.12645 MODIFY
2015-10-14 13:25:29 125.conf.tmp.12645 CLOSE_WRITE:CLOSE
2015-10-14 13:25:29 125.conf.tmp.12645 MOVED_FROM
2015-10-14 13:25:29 125.conf MOVED_TO
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 known_hosts OPEN
2015-10-14 13:25:31 known_hosts ACCESS
2015-10-14 13:25:31 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 known_hosts OPEN
2015-10-14 13:25:31 known_hosts ACCESS
2015-10-14 13:25:31 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:33 cluster.fw OPEN
2015-10-14 13:25:33 cluster.fw ACCESS
2015-10-14 13:25:33 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 lrm_status.tmp.3995 CREATE
2015-10-14 13:25:34 lrm_status.tmp.3995 OPEN
2015-10-14 13:25:34 lrm_status.tmp.3995 MODIFY
2015-10-14 13:25:34 lrm_status.tmp.3995 CLOSE_WRITE:CLOSE
2015-10-14 13:25:34 lrm_status.tmp.3995 MOVED_FROM
2015-10-14 13:25:34 lrm_status MOVED_TO
2015-10-14 13:25:34 125.conf MOVED_FROM
2015-10-14 13:25:34 125.conf MOVED_TO
2015-10-14 13:25:34 known_hosts OPEN
2015-10-14 13:25:34 known_hosts ACCESS
2015-10-14 13:25:34 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 known_hosts OPEN
2015-10-14 13:25:34 known_hosts ACCESS
2015-10-14 13:25:34 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph.conf OPEN
2015-10-14 13:25:36 ceph.conf ACCESS
2015-10-14 13:25:36 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph1.keyring OPEN
2015-10-14 13:25:36 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph1.keyring OPEN
2015-10-14 13:25:36 ceph1.keyring ACCESS
2015-10-14 13:25:36 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 known_hosts OPEN
2015-10-14 13:25:38 known_hosts ACCESS
2015-10-14 13:25:38 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 known_hosts OPEN
2015-10-14 13:25:38 known_hosts ACCESS
2015-10-14 13:25:38 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:39 ha_agent_kvmtest2_lock ATTRIB:ISDIR
2015-10-14 13:25:39 ATTRIB:ISDIR 
2015-10-14 13:25:39 lrm_status.tmp.3995 CREATE
2015-10-14 13:25:39 lrm_status.tmp.3995 OPEN
2015-10-14 13:25:39 lrm_status.tmp.3995 MODIFY
2015-10-14 13:25:39 lrm_status.tmp.3995 CLOSE_WRITE:CLOSE
2015-10-14 13:25:39 lrm_status.tmp.3995 MOVED_FROM
2015-10-14 13:25:39 lrm_status MOVED_TO



kvmtest1
--------

2015-10-14 13:25:30 cluster.fw OPEN
2015-10-14 13:25:30 cluster.fw ACCESS
2015-10-14 13:25:30 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph.conf OPEN
2015-10-14 13:25:31 ceph.conf ACCESS
2015-10-14 13:25:31 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring ACCESS
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 authorized_keys OPEN
2015-10-14 13:25:31 authorized_keys ACCESS
2015-10-14 13:25:31 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 authorized_keys OPEN
2015-10-14 13:25:31 authorized_keys ACCESS
2015-10-14 13:25:31 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph.conf OPEN
2015-10-14 13:25:31 ceph.conf ACCESS
2015-10-14 13:25:31 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring ACCESS
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:33 ha_agent_kvmtest1_lock ATTRIB:ISDIR
2015-10-14 13:25:33 ATTRIB:ISDIR 
2015-10-14 13:25:33 lrm_status.tmp.4283 CREATE
2015-10-14 13:25:33 lrm_status.tmp.4283 OPEN
2015-10-14 13:25:33 lrm_status.tmp.4283 MODIFY
2015-10-14 13:25:33 lrm_status.tmp.4283 CLOSE_WRITE:CLOSE
2015-10-14 13:25:33 lrm_status.tmp.4283 MOVED_FROM
2015-10-14 13:25:33 lrm_status MOVED_TO
2015-10-14 13:25:34 authorized_keys OPEN
2015-10-14 13:25:34 authorized_keys ACCESS
2015-10-14 13:25:34 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 authorized_keys OPEN
2015-10-14 13:25:34 authorized_keys ACCESS
2015-10-14 13:25:34 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 ha_manager_lock ATTRIB:ISDIR
2015-10-14 13:25:35 ATTRIB:ISDIR 
2015-10-14 13:25:35 lrm_status OPEN
2015-10-14 13:25:35 lrm_status ACCESS
2015-10-14 13:25:35 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 lrm_status OPEN
2015-10-14 13:25:35 lrm_status ACCESS
2015-10-14 13:25:35 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 domain-ha CREATE:ISDIR
2015-10-14 13:25:35 domain-ha OPEN:ISDIR
2015-10-14 13:25:35 domain-ha ACCESS:ISDIR
2015-10-14 13:25:35 domain-ha CLOSE_NOWRITE:CLOSE:ISDIR
2015-10-14 13:25:35 crm_commands.tmp.3861 CREATE
2015-10-14 13:25:35 crm_commands.tmp.3861 OPEN
2015-10-14 13:25:35 crm_commands.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:35 crm_commands.tmp.3861 MOVED_FROM
2015-10-14 13:25:35 crm_commands MOVED_TO
2015-10-14 13:25:35 DELETE_SELF 
2015-10-14 13:25:35 domain-ha DELETE:ISDIR
2015-10-14 13:25:35 manager_status.tmp.3861 CREATE
2015-10-14 13:25:35 manager_status.tmp.3861 OPEN
2015-10-14 13:25:35 manager_status.tmp.3861 MODIFY
2015-10-14 13:25:35 manager_status.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:35 manager_status.tmp.3861 MOVED_FROM
2015-10-14 13:25:35 manager_status MOVED_TO
2015-10-14 13:25:37 pve-ssl.key OPEN
2015-10-14 13:25:37 pve-ssl.key ACCESS
2015-10-14 13:25:37 pve-ssl.key CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 pve-ssl.pem OPEN
2015-10-14 13:25:37 pve-ssl.pem ACCESS
2015-10-14 13:25:37 pve-ssl.pem CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 authkey.pub OPEN
2015-10-14 13:25:37 authkey.pub ACCESS
2015-10-14 13:25:37 authkey.pub CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 cluster.fw OPEN
2015-10-14 13:25:37 cluster.fw ACCESS
2015-10-14 13:25:37 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 cluster.fw OPEN
2015-10-14 13:25:37 cluster.fw ACCESS
2015-10-14 13:25:37 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 authkey.pub OPEN
2015-10-14 13:25:37 authkey.pub ACCESS
2015-10-14 13:25:37 authkey.pub CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 authorized_keys OPEN
2015-10-14 13:25:38 authorized_keys ACCESS
2015-10-14 13:25:38 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 authorized_keys OPEN
2015-10-14 13:25:38 authorized_keys ACCESS
2015-10-14 13:25:38 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:39 125.conf.tmp.15438 CREATE
2015-10-14 13:25:39 125.conf.tmp.15438 OPEN
2015-10-14 13:25:39 125.conf.tmp.15438 MODIFY
2015-10-14 13:25:39 125.conf.tmp.15438 CLOSE_WRITE:CLOSE
2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM
2015-10-14 13:25:39 125.conf MOVED_TO
2015-10-14 13:25:40 cluster.fw OPEN
2015-10-14 13:25:40 cluster.fw ACCESS
2015-10-14 13:25:40 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph.conf OPEN
2015-10-14 13:25:41 ceph.conf ACCESS
2015-10-14 13:25:41 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph1.keyring OPEN
2015-10-14 13:25:41 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph1.keyring OPEN
2015-10-14 13:25:41 ceph1.keyring ACCESS
2015-10-14 13:25:41 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:43 ha_agent_kvmtest1_lock ATTRIB:ISDIR
2015-10-14 13:25:43 ATTRIB:ISDIR 
2015-10-14 13:25:43 lrm_status.tmp.4283 CREATE
2015-10-14 13:25:43 lrm_status.tmp.4283 OPEN
2015-10-14 13:25:43 lrm_status.tmp.4283 MODIFY
2015-10-14 13:25:43 lrm_status.tmp.4283 CLOSE_WRITE:CLOSE
2015-10-14 13:25:43 lrm_status.tmp.4283 MOVED_FROM
2015-10-14 13:25:43 lrm_status MOVED_TO
2015-10-14 13:25:45 ha_manager_lock ATTRIB:ISDIR
2015-10-14 13:25:45 ATTRIB:ISDIR 
2015-10-14 13:25:45 lrm_status OPEN
2015-10-14 13:25:45 lrm_status ACCESS
2015-10-14 13:25:45 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:45 lrm_status OPEN
2015-10-14 13:25:45 lrm_status ACCESS
2015-10-14 13:25:45 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:45 domain-ha CREATE:ISDIR
2015-10-14 13:25:45 domain-ha OPEN:ISDIR
2015-10-14 13:25:45 domain-ha ACCESS:ISDIR
2015-10-14 13:25:45 domain-ha CLOSE_NOWRITE:CLOSE:ISDIR
2015-10-14 13:25:45 crm_commands.tmp.3861 CREATE
2015-10-14 13:25:45 crm_commands.tmp.3861 OPEN
2015-10-14 13:25:45 crm_commands.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:45 crm_commands.tmp.3861 MOVED_FROM
2015-10-14 13:25:45 crm_commands MOVED_TO
2015-10-14 13:25:45 DELETE_SELF 
2015-10-14 13:25:45 domain-ha DELETE:ISDIR
2015-10-14 13:25:45 manager_status.tmp.3861 CREATE
2015-10-14 13:25:45 manager_status.tmp.3861 OPEN
2015-10-14 13:25:45 manager_status.tmp.3861 MODIFY
2015-10-14 13:25:45 manager_status.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:45 manager_status.tmp.3861 MOVED_FROM
2015-10-14 13:25:45 manager_status MOVED_TO
2015-10-14 13:25:50 cluster.fw OPEN
2015-10-14 13:25:50 cluster.fw ACCESS
2015-10-14 13:25:50 cluster.fw CLOSE_NOWRITE:CLOSE


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 14 October 2015 13:24:45
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

It doesn't help :(

I'll try to run inotifywatch on /etc/pve on both source and target,

and check the timestamps of the file moves, and also whether there are other file writes at the same time.
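One way to look for other writes around the config move is to filter the captured trace by time window and event type. This is a sketch only: `writes_between` and the `WRITE_EVENTS` set are illustrative choices, and it assumes the trace lines have the form "date time name events", with events joined by ":".

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# Event names treated as "writes" here; this selection is an assumption.
WRITE_EVENTS = {"CREATE", "MODIFY", "MOVED_FROM", "MOVED_TO", "DELETE", "CLOSE_WRITE"}

def writes_between(lines, start, end):
    """Return (name, events) for write-type events with start <= time <= end."""
    lo = datetime.strptime(start, FMT)
    hi = datetime.strptime(end, FMT)
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # skip events on the watched directory itself
            continue
        ts = datetime.strptime(f"{parts[0]} {parts[1]}", FMT)
        if lo <= ts <= hi and set(parts[3].split(":")) & WRITE_EVENTS:
            hits.append((parts[2], parts[3]))
    return hits

# Sample lines in the assumed trace format.
trace = [
    "2015-10-14 13:25:33 cluster.fw OPEN",
    "2015-10-14 13:25:34 lrm_status.tmp.3995 MOVED_FROM",
    "2015-10-14 13:25:34 lrm_status MOVED_TO",
    "2015-10-14 13:25:34 125.conf MOVED_FROM",
    "2015-10-14 13:25:34 125.conf MOVED_TO",
    "2015-10-14 13:25:36 ceph.conf OPEN",
]

same_second = writes_between(trace, "2015-10-14 13:25:34", "2015-10-14 13:25:34")
names = [name for name, _ in same_second]
print(names)
```

Read-only events (OPEN, ACCESS, CLOSE_NOWRITE) are dropped, so only files that were actually changed in the window remain.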




----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 14 October 2015 12:10:31
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

>>I would really like to understand what happens. 

Yes, me too ! 

>>I wonder if it would help
>>if we used the 'direct_io' flag for fuse. Would you mind testing?

Sure, I'll try this afternoon 

----- Original Message -----
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 14 October 2015 11:30:35
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

> Users have reported a resume bug when HA is used.
>
> There seems to be a small race (benchmarks show >0s and <1s) between the VM conf file
> move on the source node, its replication to the target node,
> and the resume on the target node.
>
> I don't know why this happens only with HA; maybe it occurs with standard
> migration too.
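To make the race described above concrete: a resume that checks for the config file can fail if it runs before the moved file is visible on the target. The sketch below shows one generic way to tolerate such a window by polling before acting. It is not what the patch in this thread does (the patch adds nocheck to the resume call), and `wait_for_file` and the example path are hypothetical.

```python
import os
import time

def wait_for_file(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists; return True if it appeared within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Hypothetical usage on the target node before resuming VM 125
# (the path layout is an assumption for illustration):
# if wait_for_file("/etc/pve/nodes/kvmtest1/qemu-server/125.conf", timeout=10):
#     ...  # resume the VM
```

Polling only hides the delay; it does not explain the gap seen in the trace.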

I would really like to understand what happens. I wonder if it would help
if we used the 'direct_io' flag for fuse. Would you mind testing?

diff --git a/data/src/pmxcfs.c b/data/src/pmxcfs.c
index 26cbc30..2c34df2 100644
--- a/data/src/pmxcfs.c
+++ b/data/src/pmxcfs.c
@@ -897,7 +897,7 @@ int main(int argc, char *argv[])
 
 	mkdir(CFSDIR, 0755);
 
-	char *fa[] = { "-f", "-odefault_permissions", "-oallow_other", NULL};
+	char *fa[] = { "-f", "-odirect_io", "-odefault_permissions", "-oallow_other", NULL};
 
 	struct fuse_args fuse_args = FUSE_ARGS_INIT(sizeof (fa)/sizeof(gpointer) - 1, fa);
_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


