[pve-devel] [PATCH] migrate : add nocheck for resume
Alexandre DERUMIER
aderumier at odiso.com
Wed Oct 14 14:15:42 CEST 2015
I'm able to reproduce it without HA.
kvmtest1 -> kvmtest2
--------------------
Oct 14 14:11:20 ERROR: unable to find configuration file for VM 125 - no such machine
Oct 14 14:11:20 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.47 qm resume 125 --skiplock' failed: exit code 2
Oct 14 14:11:23 ERROR: migration finished with problems (duration 00:00:10)
kvmtest1
---------
2015-10-14 14:11:18 125.conf MOVED_FROM
2015-10-14 14:11:18 125.conf MOVED_TO
2015-10-14 14:11:18 known_hosts OPEN
2015-10-14 14:11:18 known_hosts ACCESS
2015-10-14 14:11:18 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:18 known_hosts OPEN
2015-10-14 14:11:18 known_hosts ACCESS
2015-10-14 14:11:18 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:20 cluster.fw OPEN
2015-10-14 14:11:20 cluster.fw ACCESS
2015-10-14 14:11:20 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:21 ceph.conf OPEN
2015-10-14 14:11:21 ceph.conf ACCESS
2015-10-14 14:11:21 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:21 ceph1.keyring OPEN
2015-10-14 14:11:21 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:21 ceph1.keyring OPEN
2015-10-14 14:11:21 ceph1.keyring ACCESS
2015-10-14 14:11:21 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:21 cluster.fw OPEN
2015-10-14 14:11:21 cluster.fw ACCESS
2015-10-14 14:11:21 cluster.fw OPEN
2015-10-14 14:11:21 cluster.fw ACCESS
2015-10-14 14:11:21 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:21 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:22 known_hosts OPEN
2015-10-14 14:11:22 known_hosts ACCESS
2015-10-14 14:11:22 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:22 known_hosts OPEN
2015-10-14 14:11:22 known_hosts ACCESS
2015-10-14 14:11:22 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:23 ha_agent_kvmtest1_lock ATTRIB:ISDIR
2015-10-14 14:11:23 ATTRIB:ISDIR
2015-10-14 14:11:23 lrm_status.tmp.4283 CREATE
2015-10-14 14:11:23 lrm_status.tmp.4283 OPEN
2015-10-14 14:11:23 lrm_status.tmp.4283 MODIFY
2015-10-14 14:11:23 lrm_status.tmp.4283 CLOSE_WRITE:CLOSE
2015-10-14 14:11:23 lrm_status.tmp.4283 MOVED_FROM
2015-10-14 14:11:23 lrm_status MOVED_TO
kvmtest2
---------
2015-10-14 14:11:18 authorized_keys OPEN
2015-10-14 14:11:18 authorized_keys ACCESS
2015-10-14 14:11:18 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:18 authorized_keys OPEN
2015-10-14 14:11:18 authorized_keys ACCESS
2015-10-14 14:11:18 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:19 ha_agent_kvmtest2_lock ATTRIB:ISDIR
2015-10-14 14:11:19 ATTRIB:ISDIR
2015-10-14 14:11:19 lrm_status.tmp.3995 CREATE
2015-10-14 14:11:19 lrm_status.tmp.3995 OPEN
2015-10-14 14:11:19 lrm_status.tmp.3995 MODIFY
2015-10-14 14:11:19 lrm_status.tmp.3995 CLOSE_WRITE:CLOSE
2015-10-14 14:11:19 lrm_status.tmp.3995 MOVED_FROM
2015-10-14 14:11:19 lrm_status MOVED_TO
2015-10-14 14:11:22 authorized_keys OPEN
2015-10-14 14:11:22 authorized_keys ACCESS
2015-10-14 14:11:22 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:22 authorized_keys OPEN
2015-10-14 14:11:22 authorized_keys ACCESS
2015-10-14 14:11:22 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 14:11:23 125.conf.tmp.18587 CREATE
2015-10-14 14:11:23 125.conf.tmp.18587 OPEN
2015-10-14 14:11:23 125.conf.tmp.18587 MODIFY
2015-10-14 14:11:23 125.conf.tmp.18587 CLOSE_WRITE:CLOSE
2015-10-14 14:11:23 125.conf.tmp.18587 MOVED_FROM
2015-10-14 14:11:23 125.conf MOVED_TO
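Reading the three blocks above together: kvmtest1 renames 125.conf at 14:11:18, the resume on kvmtest2 fails at 14:11:20, and 125.conf only appears on kvmtest2 at 14:11:23. A minimal Python sketch of that check follows. It assumes the clocks of both nodes are synchronized, and the function name is made up for illustration; it is not part of any Proxmox tool.

```python
from datetime import datetime

def in_race_window(moved_on_source, failed_at, visible_on_target):
    """Return True if the failure happened at or after the rename on the
    source node but before the config became visible on the target node."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(moved_on_source, fmt)
    fail = datetime.strptime(failed_at, fmt)
    end = datetime.strptime(visible_on_target, fmt)
    return start <= fail < end

# Timestamps copied from the migration log and the two traces above.
print(in_race_window("14:11:18", "14:11:20", "14:11:23"))  # True
```

If the resume had been issued after 14:11:23, the same check would return False, which is the case where the config would already be visible on the target.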
----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday 14 October 2015 13:34:44
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume
Here is an inotify trace on /etc/pve, taken when the problem occurred.
source : 2015-10-14 13:25:34 125.conf MOVED_FROM
target : 2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM
(5s difference, ouch ...)
Not sure it's related, but there are also lrm_status.tmp file moves, with HA.
I don't know whether that can slow things down at the corosync layer.
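The 5s figure is the difference between the rename of 125.conf on the source and its arrival on the target. As a rough sketch, the same comparison can be done mechanically in Python. It assumes trace lines of the form `YYYY-MM-DD HH:MM:SS <file> <events>` with events joined by `:` (as in the traces in this mail) and synchronized clocks on both nodes. All function names are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_trace(text):
    """Yield (timestamp, filename, events) for each trace line.

    Events on the watched directory itself have no filename and are
    yielded with an empty one. Lines that are not trace lines are skipped.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            ts = datetime.strptime(" ".join(parts[:2]), TS_FORMAT)
        except ValueError:
            continue  # e.g. node-name headers or migration log lines
        if len(parts) == 3:
            name, events = "", parts[2]
        else:
            name, events = parts[2], parts[3]
        yield ts, name, events.split(":")

def first_event(text, filename, event):
    """Return the timestamp of the first matching event, or None."""
    for ts, name, events in parse_trace(text):
        if name == filename and event in events:
            return ts
    return None

def replication_gap(source_trace, target_trace, conf="125.conf"):
    """Seconds between the rename on the source and arrival on the target."""
    moved_away = first_event(source_trace, conf, "MOVED_FROM")
    arrived = first_event(target_trace, conf, "MOVED_TO")
    if moved_away is None or arrived is None:
        return None
    return (arrived - moved_away).total_seconds()

# Excerpts copied from the kvmtest2 (source) and kvmtest1 (target) traces.
source = """\
2015-10-14 13:25:34 lrm_status MOVED_TO
2015-10-14 13:25:34 125.conf MOVED_FROM
2015-10-14 13:25:34 125.conf MOVED_TO
"""
target = """\
2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM
2015-10-14 13:25:39 125.conf MOVED_TO
"""
print(replication_gap(source, target))  # 5.0
```

The traces have one-second resolution, so the result is only accurate to about a second either way.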
kvmtest2->kvmtest1
-------------------
Oct 14 13:25:36 ERROR: unable to find configuration file for VM 125 - no such machine
Oct 14 13:25:36 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.31 qm resume 125 --skiplock' failed: exit code 2
Oct 14 13:25:39 ERROR: migration finished with problems (duration 00:00:10)
TASK ERROR: migration problems
kvmtest2
--------
2015-10-14 13:25:29 ha_agent_kvmtest2_lock ATTRIB:ISDIR
2015-10-14 13:25:29 ATTRIB:ISDIR
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 125.conf.tmp.12645 CREATE
2015-10-14 13:25:29 125.conf.tmp.12645 OPEN
2015-10-14 13:25:29 125.conf.tmp.12645 MODIFY
2015-10-14 13:25:29 125.conf.tmp.12645 CLOSE_WRITE:CLOSE
2015-10-14 13:25:29 125.conf.tmp.12645 MOVED_FROM
2015-10-14 13:25:29 125.conf MOVED_TO
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:29 known_hosts OPEN
2015-10-14 13:25:29 known_hosts ACCESS
2015-10-14 13:25:29 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 known_hosts OPEN
2015-10-14 13:25:31 known_hosts ACCESS
2015-10-14 13:25:31 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 known_hosts OPEN
2015-10-14 13:25:31 known_hosts ACCESS
2015-10-14 13:25:31 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:33 cluster.fw OPEN
2015-10-14 13:25:33 cluster.fw ACCESS
2015-10-14 13:25:33 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 lrm_status.tmp.3995 CREATE
2015-10-14 13:25:34 lrm_status.tmp.3995 OPEN
2015-10-14 13:25:34 lrm_status.tmp.3995 MODIFY
2015-10-14 13:25:34 lrm_status.tmp.3995 CLOSE_WRITE:CLOSE
2015-10-14 13:25:34 lrm_status.tmp.3995 MOVED_FROM
2015-10-14 13:25:34 lrm_status MOVED_TO
2015-10-14 13:25:34 125.conf MOVED_FROM
2015-10-14 13:25:34 125.conf MOVED_TO
2015-10-14 13:25:34 known_hosts OPEN
2015-10-14 13:25:34 known_hosts ACCESS
2015-10-14 13:25:34 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 known_hosts OPEN
2015-10-14 13:25:34 known_hosts ACCESS
2015-10-14 13:25:34 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph.conf OPEN
2015-10-14 13:25:36 ceph.conf ACCESS
2015-10-14 13:25:36 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph1.keyring OPEN
2015-10-14 13:25:36 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:36 ceph1.keyring OPEN
2015-10-14 13:25:36 ceph1.keyring ACCESS
2015-10-14 13:25:36 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 known_hosts OPEN
2015-10-14 13:25:38 known_hosts ACCESS
2015-10-14 13:25:38 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 known_hosts OPEN
2015-10-14 13:25:38 known_hosts ACCESS
2015-10-14 13:25:38 known_hosts CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:39 ha_agent_kvmtest2_lock ATTRIB:ISDIR
2015-10-14 13:25:39 ATTRIB:ISDIR
2015-10-14 13:25:39 lrm_status.tmp.3995 CREATE
2015-10-14 13:25:39 lrm_status.tmp.3995 OPEN
2015-10-14 13:25:39 lrm_status.tmp.3995 MODIFY
2015-10-14 13:25:39 lrm_status.tmp.3995 CLOSE_WRITE:CLOSE
2015-10-14 13:25:39 lrm_status.tmp.3995 MOVED_FROM
2015-10-14 13:25:39 lrm_status MOVED_TO
kvmtest1
--------
2015-10-14 13:25:30 cluster.fw OPEN
2015-10-14 13:25:30 cluster.fw ACCESS
2015-10-14 13:25:30 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph.conf OPEN
2015-10-14 13:25:31 ceph.conf ACCESS
2015-10-14 13:25:31 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring ACCESS
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 authorized_keys OPEN
2015-10-14 13:25:31 authorized_keys ACCESS
2015-10-14 13:25:31 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 authorized_keys OPEN
2015-10-14 13:25:31 authorized_keys ACCESS
2015-10-14 13:25:31 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph.conf OPEN
2015-10-14 13:25:31 ceph.conf ACCESS
2015-10-14 13:25:31 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:31 ceph1.keyring OPEN
2015-10-14 13:25:31 ceph1.keyring ACCESS
2015-10-14 13:25:31 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:33 ha_agent_kvmtest1_lock ATTRIB:ISDIR
2015-10-14 13:25:33 ATTRIB:ISDIR
2015-10-14 13:25:33 lrm_status.tmp.4283 CREATE
2015-10-14 13:25:33 lrm_status.tmp.4283 OPEN
2015-10-14 13:25:33 lrm_status.tmp.4283 MODIFY
2015-10-14 13:25:33 lrm_status.tmp.4283 CLOSE_WRITE:CLOSE
2015-10-14 13:25:33 lrm_status.tmp.4283 MOVED_FROM
2015-10-14 13:25:33 lrm_status MOVED_TO
2015-10-14 13:25:34 authorized_keys OPEN
2015-10-14 13:25:34 authorized_keys ACCESS
2015-10-14 13:25:34 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:34 authorized_keys OPEN
2015-10-14 13:25:34 authorized_keys ACCESS
2015-10-14 13:25:34 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 ha_manager_lock ATTRIB:ISDIR
2015-10-14 13:25:35 ATTRIB:ISDIR
2015-10-14 13:25:35 lrm_status OPEN
2015-10-14 13:25:35 lrm_status ACCESS
2015-10-14 13:25:35 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 lrm_status OPEN
2015-10-14 13:25:35 lrm_status ACCESS
2015-10-14 13:25:35 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:35 domain-ha CREATE:ISDIR
2015-10-14 13:25:35 domain-ha OPEN:ISDIR
2015-10-14 13:25:35 domain-ha ACCESS:ISDIR
2015-10-14 13:25:35 domain-ha CLOSE_NOWRITE:CLOSE:ISDIR
2015-10-14 13:25:35 crm_commands.tmp.3861 CREATE
2015-10-14 13:25:35 crm_commands.tmp.3861 OPEN
2015-10-14 13:25:35 crm_commands.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:35 crm_commands.tmp.3861 MOVED_FROM
2015-10-14 13:25:35 crm_commands MOVED_TO
2015-10-14 13:25:35 DELETE_SELF
2015-10-14 13:25:35 domain-ha DELETE:ISDIR
2015-10-14 13:25:35 manager_status.tmp.3861 CREATE
2015-10-14 13:25:35 manager_status.tmp.3861 OPEN
2015-10-14 13:25:35 manager_status.tmp.3861 MODIFY
2015-10-14 13:25:35 manager_status.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:35 manager_status.tmp.3861 MOVED_FROM
2015-10-14 13:25:35 manager_status MOVED_TO
2015-10-14 13:25:37 pve-ssl.key OPEN
2015-10-14 13:25:37 pve-ssl.key ACCESS
2015-10-14 13:25:37 pve-ssl.key CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 pve-ssl.pem OPEN
2015-10-14 13:25:37 pve-ssl.pem ACCESS
2015-10-14 13:25:37 pve-ssl.pem CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 authkey.pub OPEN
2015-10-14 13:25:37 authkey.pub ACCESS
2015-10-14 13:25:37 authkey.pub CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 cluster.fw OPEN
2015-10-14 13:25:37 cluster.fw ACCESS
2015-10-14 13:25:37 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 cluster.fw OPEN
2015-10-14 13:25:37 cluster.fw ACCESS
2015-10-14 13:25:37 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:37 authkey.pub OPEN
2015-10-14 13:25:37 authkey.pub ACCESS
2015-10-14 13:25:37 authkey.pub CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 authorized_keys OPEN
2015-10-14 13:25:38 authorized_keys ACCESS
2015-10-14 13:25:38 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:38 authorized_keys OPEN
2015-10-14 13:25:38 authorized_keys ACCESS
2015-10-14 13:25:38 authorized_keys CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:39 125.conf.tmp.15438 CREATE
2015-10-14 13:25:39 125.conf.tmp.15438 OPEN
2015-10-14 13:25:39 125.conf.tmp.15438 MODIFY
2015-10-14 13:25:39 125.conf.tmp.15438 CLOSE_WRITE:CLOSE
2015-10-14 13:25:39 125.conf.tmp.15438 MOVED_FROM
2015-10-14 13:25:39 125.conf MOVED_TO
2015-10-14 13:25:40 cluster.fw OPEN
2015-10-14 13:25:40 cluster.fw ACCESS
2015-10-14 13:25:40 cluster.fw CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph.conf OPEN
2015-10-14 13:25:41 ceph.conf ACCESS
2015-10-14 13:25:41 ceph.conf CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph1.keyring OPEN
2015-10-14 13:25:41 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:41 ceph1.keyring OPEN
2015-10-14 13:25:41 ceph1.keyring ACCESS
2015-10-14 13:25:41 ceph1.keyring CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:43 ha_agent_kvmtest1_lock ATTRIB:ISDIR
2015-10-14 13:25:43 ATTRIB:ISDIR
2015-10-14 13:25:43 lrm_status.tmp.4283 CREATE
2015-10-14 13:25:43 lrm_status.tmp.4283 OPEN
2015-10-14 13:25:43 lrm_status.tmp.4283 MODIFY
2015-10-14 13:25:43 lrm_status.tmp.4283 CLOSE_WRITE:CLOSE
2015-10-14 13:25:43 lrm_status.tmp.4283 MOVED_FROM
2015-10-14 13:25:43 lrm_status MOVED_TO
2015-10-14 13:25:45 ha_manager_lock ATTRIB:ISDIR
2015-10-14 13:25:45 ATTRIB:ISDIR
2015-10-14 13:25:45 lrm_status OPEN
2015-10-14 13:25:45 lrm_status ACCESS
2015-10-14 13:25:45 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:45 lrm_status OPEN
2015-10-14 13:25:45 lrm_status ACCESS
2015-10-14 13:25:45 lrm_status CLOSE_NOWRITE:CLOSE
2015-10-14 13:25:45 domain-ha CREATE:ISDIR
2015-10-14 13:25:45 domain-ha OPEN:ISDIR
2015-10-14 13:25:45 domain-ha ACCESS:ISDIR
2015-10-14 13:25:45 domain-ha CLOSE_NOWRITE:CLOSE:ISDIR
2015-10-14 13:25:45 crm_commands.tmp.3861 CREATE
2015-10-14 13:25:45 crm_commands.tmp.3861 OPEN
2015-10-14 13:25:45 crm_commands.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:45 crm_commands.tmp.3861 MOVED_FROM
2015-10-14 13:25:45 crm_commands MOVED_TO
2015-10-14 13:25:45 DELETE_SELF
2015-10-14 13:25:45 domain-ha DELETE:ISDIR
2015-10-14 13:25:45 manager_status.tmp.3861 CREATE
2015-10-14 13:25:45 manager_status.tmp.3861 OPEN
2015-10-14 13:25:45 manager_status.tmp.3861 MODIFY
2015-10-14 13:25:45 manager_status.tmp.3861 CLOSE_WRITE:CLOSE
2015-10-14 13:25:45 manager_status.tmp.3861 MOVED_FROM
2015-10-14 13:25:45 manager_status MOVED_TO
2015-10-14 13:25:50 cluster.fw OPEN
2015-10-14 13:25:50 cluster.fw ACCESS
2015-10-14 13:25:50 cluster.fw CLOSE_NOWRITE:CLOSE
----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday 14 October 2015 13:24:45
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume
It doesn't help :(
I'll try running inotifywatch on /etc/pve on both source and target,
to check the time of the file move and whether there are other file writes at the same time.
----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "dietmar" <dietmar at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday 14 October 2015 12:10:31
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume
>>I would really like to understand what happens.
Yes, me too!
>>I wonder if it would help
>>to use the 'direct_io' flag for fuse. Would you mind testing?
Sure, I'll try this afternoon.
----- Original message -----
From: "dietmar" <dietmar at proxmox.com>
To: "aderumier" <aderumier at odiso.com>, "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday 14 October 2015 11:30:35
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume
> Users have reported a resume bug when HA is used.
>
> There seems to be a small race (benchmarks show >0s and <1s) between the VM conf file
> move on the source node, its replication to the target node,
> and the resume on the target node.
>
> I don't know why this happens only with HA; maybe it occurs with standard
> migration too.
I would really like to understand what happens. I wonder if it would help
to use the 'direct_io' flag for fuse. Would you mind testing?
diff --git a/data/src/pmxcfs.c b/data/src/pmxcfs.c
index 26cbc30..2c34df2 100644
--- a/data/src/pmxcfs.c
+++ b/data/src/pmxcfs.c
@@ -897,7 +897,7 @@ int main(int argc, char *argv[])
     mkdir(CFSDIR, 0755);
-    char *fa[] = { "-f", "-odefault_permissions", "-oallow_other", NULL};
+    char *fa[] = { "-f", "-odirect_io", "-odefault_permissions", "-oallow_other", NULL};
     struct fuse_args fuse_args = FUSE_ARGS_INIT(sizeof (fa)/sizeof(gpointer) - 1, fa);
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel