[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Wolfgang Bumiller
w.bumiller at proxmox.com
Fri Dec 23 10:53:12 CET 2016
On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi wolfgang,
>
> I have done more test with drive mirror and nbd target.
>
> It seems that the hang occurs only if the target IP is unreachable (no network response)
>
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
>
> If the IP address exists and is up,
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
> Failed to connect socket: Connection refused
>
>
>
> I'm not sure, but maybe it can hang too if pve-firewall does a drop instead of a reject on the target port.
>
>
>
> I think this comes from qemu's net/socket.c,
>
> where we have an infinite loop.
>
> I'm not sure how to add a timeout here, help is welcome :)
Doesn't the job show up in query-block-jobs before the connection
finishes? If so, we could just assume that if it's still at 0% after a
timeout, the connection doesn't work. The question, though, is what a
good timeout would be.
Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)
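Roughly, that polling fallback could look like this (a standalone sketch,
not existing PVE code; the `query_progress` callback stands in for whatever
QMP `query-block-jobs` round trip we'd actually do):

```c
#include <unistd.h>

/* Sketch of the polling idea: keep querying the block job's progress; if
 * it is still at 0% when the deadline expires, assume the NBD connection
 * never came up and give up.  The callback is hypothetical and stands in
 * for a QMP query-block-jobs round trip. */
static int wait_for_job_start(int (*query_progress)(void *opaque), void *opaque,
                              int timeout_ms, int interval_ms)
{
    int waited = 0;

    while (waited < timeout_ms) {
        int pct = query_progress(opaque);
        if (pct < 0) {
            return -1;          /* job disappeared: hard error */
        }
        if (pct > 0) {
            return 0;           /* data is flowing, the connection works */
        }
        usleep(interval_ms * 1000);
        waited += interval_ms;
    }
    return -1;                  /* still at 0% after the timeout */
}

/* Demo callback: reports 0% for the first few polls, then 50%. */
static int demo_progress(void *opaque)
{
    int *calls = opaque;
    return (*calls)++ < 3 ? 0 : 50;
}
```

The interval/timeout split mirrors the polling we already do elsewhere, so
it would slot into the existing loop rather than adding a new mechanism.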
Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )
But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection
completion later (in net_socket_fd_init() it calls
qemu_set_fd_handler() where it ends up controlled by the aio handlers).
It might be possible to add a timer (qemu_timer_new() & friends
perhaps?) to NetSocketState (the NetSocketState cleanup would have to
remove it, and if it gets triggered it cancels the connection instead).
> static int net_socket_connect_init(NetClientState *peer,
>                                    const char *model,
>                                    const char *name,
>                                    const char *host_str)
> {
>     NetSocketState *s;
>     int fd, connected, ret;
>     struct sockaddr_in saddr;
>
>     if (parse_host_port(&saddr, host_str) < 0)
>         return -1;
>
>     fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>     if (fd < 0) {
>         perror("socket");
>         return -1;
>     }
>     qemu_set_nonblock(fd);
>
>     connected = 0;
>     for(;;) {
>         ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
>         if (ret < 0) {
>             if (errno == EINTR || errno == EWOULDBLOCK) {
>                 /* continue */
>             } else if (errno == EINPROGRESS ||
>                        errno == EALREADY ||
>                        errno == EINVAL) {
>                 break;
>             } else {
>                 perror("connect");
>                 closesocket(fd);
>                 return -1;
>             }
>         } else {
>             connected = 1;
>             break;
>         }
>     }
>     s = net_socket_fd_init(peer, model, name, fd, connected);
>     if (!s)
>         return -1;
>     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>              "socket: connect to %s:%d",
>              inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
>     return 0;
> }
>
>
>
>
> ----- Original Message -----
> From: "aderumier" <aderumier at odiso.com>
> To: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Wednesday, 21 December 2016 13:57:10
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> >>Then it can still hang if the destination disappears between tcp_ping()
> >>and the `drive-mirror` command, so I'd rather get better behavior on qemu's
> >>side. It needs a time-out or a way to cancel it or something.
> Yes sure!
>
> I'm currently looking at qemu code to see how nbd client works.
>
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumiller at proxmox.com>
> To: "aderumier" <aderumier at odiso.com>
> Cc: "pve-devel" <pve-devel at pve.proxmox.com>, "dietmar" <dietmar at proxmox.com>
> Sent: Wednesday, 21 December 2016 12:20:28
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> > On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderumier at odiso.com> wrote:
> >
> >
> > >>IIRC that was the only blocker.
> > >>
> > >>Basically the patchset has to work *without* tcp_ping() since it is an
> > >>unreliable check, and then we still have to catch failing connections
> > >>_correctly_. (There's no point in knowing that "some time in the past
> > >>you were able to connect to something which may or may not have been a
> > >>qemu nbd server", we need to know whether the drive-mirror job itself
> > >>was able to connect.)
> >
> > For me, the mirror job auto-aborts if the connection fails during the migration. Do you see another behaviour?
>
> That covers one problem. IIRC the disk-deletion problem was that due
> to wrong [] usage around an IPv6 address it could not connect in the
> first place and didn't error as I would have hoped.
>
> >
> > the tcp_ping() was placed just before launching the drive-mirror command, because it was hanging in this case.
>
> Then it can still hang if the destination disappears between tcp_ping()
> and the `drive-mirror` command, so I'd rather get better behavior on qemu's
> side. It needs a time-out or a way to cancel it or something.
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
>