[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Alexandre DERUMIER
aderumier at odiso.com
Fri Dec 23 10:20:03 CET 2016
Hi Wolfgang,
I have done more tests with drive mirror and an nbd target.
It seems that the hang occurs only if the target IP is unreachable (no network response):
# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
If the IP address exists and the host is up, the command fails immediately instead:
# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
Failed to connect socket: Connection refused
I'm not sure, but it may also hang if pve-firewall does a drop instead of a reject on the target port.
I think this comes from net/socket.c in qemu,
where we have an infinite loop.
I'm not sure how to add a timeout here; help is welcome :)
static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for(;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}
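
For reference, a minimal sketch of the usual POSIX pattern for bounding a non-blocking connect: start the connect, wait for the socket to become writable with poll() for a limited time, then read the result from SO_ERROR. This is not qemu code. connect_with_timeout() is a hypothetical helper name, and the sketch assumes a POSIX system and an IPv4 literal address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of qemu): connect to ip:port and give up
 * after timeout_ms milliseconds. Returns a connected fd, or -1 with errno
 * set (ETIMEDOUT when the peer never answered). */
static int connect_with_timeout(const char *ip, int port, int timeout_ms)
{
    struct sockaddr_in saddr;
    struct pollfd pfd;
    int fd, flags, ret, saved, err = 0;
    socklen_t len = sizeof(err);

    assert(ip != NULL && timeout_ms >= 0);

    memset(&saddr, 0, sizeof(saddr));
    saddr.sin_family = AF_INET;
    saddr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &saddr.sin_addr) != 1) {
        errno = EINVAL;              /* not an IPv4 literal */
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
    if (ret == 0)
        return fd;                   /* connected immediately */
    if (errno != EINPROGRESS && errno != EINTR)
        goto fail;                   /* hard error, e.g. ENETUNREACH */

    /* The connect continues in the background: wait until the socket is
     * writable. An interrupted poll() restarts with the full timeout. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);
    if (ret == 0)
        errno = ETIMEDOUT;           /* no answer within timeout_ms */
    if (ret <= 0)
        goto fail;

    /* Writable does not mean connected: fetch the real result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        goto fail;
    if (err != 0) {
        errno = err;                 /* e.g. ECONNREFUSED */
        goto fail;
    }
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

With something like connect_with_timeout("66.66.66.66", 10809, 5000), an unreachable target would be reported as ETIMEDOUT after five seconds instead of hanging. Inside qemu the wait would presumably have to be tied into the main loop rather than done with a blocking poll(), so this only shows the idea.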
----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 21 December 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Then it can still hang if the destination disappears between tcp_ping()
>>and the `drive-mirror` command, so I'd rather get better behavior on qemu's
>>side. It needs a time-out or a way to cancel it or something.
Yes, sure!
I'm currently looking at the qemu code to see how the nbd client works.
----- Original Message -----
From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "pve-devel" <pve-devel at pve.proxmox.com>, "dietmar" <dietmar at proxmox.com>
Sent: Wednesday, 21 December 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderumier at odiso.com> wrote:
>
>
> >>IIRC that was the only blocker.
> >>
> >>Basically the patchset has to work *without* tcp_ping() since it is an
> >>unreliable check, and then we still have to catch failing connections
> >>_correctly_. (There's no point in knowing that "some time in the past
> >>you were able to connect to something which may or may not have been a
> >>qemu nbd server", we need to know whether the drive-mirror job itself
> >>was able to connect.)
>
> For me, the mirror job auto-aborts if the connection fails during the migration. Do you see different behaviour?
That covers one problem. IIRC the disk-deletion problem was that due
to wrong [] usage around an ipv6 address it could not connect in the
first place and didn't error as I would have hoped.
>
> The tcp_ping was done just before launching the drive mirror command, because it was hanging in this case.
Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on qemu's
side. It needs a time-out or a way to cancel it or something.