[pve-devel] [PATCH cluster v2 0/8] initial API adaption to corosync 3/kronosnet

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Jun 14 15:03:56 CEST 2019


On Wed, Jun 12, 2019 at 04:36:00PM +0200, Fabian Grünbichler wrote:
> sent some comments as replies to individual patches, rest LGTM - but
> only read and build-tested so far. I'll try to give this (or a
> subsequent v3) an actual spin this week as well if time permits.

did that now, and noticed the following:

ISSUE 1

creating a cluster with

 pvesh create /cluster/config -clustername thomastest -link0 192.168.21.71 -link5 10.0.0.71 

creates an invalid corosync.conf:

Jun 14 14:23:50 clustertest71 systemd[1]: Starting Corosync Cluster Engine...
Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync Cluster Engine 3.0.1-dirty starting up
Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] parse error in config: Not all nodes have the same number of links
Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1386.
Jun 14 14:23:50 clustertest71 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Jun 14 14:23:50 clustertest71 systemd[1]: corosync.service: Failed with result 'exit-code'.
Jun 14 14:23:50 clustertest71 systemd[1]: Failed to start Corosync Cluster Engine.

$ cat /etc/corosync/corosync.conf

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: clustertest71
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.21.71
    ring5_addr: 10.0.0.71
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomastest
  config_version: 1
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 5
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

doing the same with link0 and link1 instead of link0 and link5 works.
subsequently changing that working corosync.conf to use link0 and linkX
with X != 1 also works at runtime, although the reload complains with the
same error message (cmap and corosync-cfgtool show the updated status
just fine). restarting corosync then fails, again with the error shown
above.

haven't checked yet whether that is an issue on our side or corosync,
but probably worth an investigation ;)
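
for anyone who wants to spot such a config before (re)starting corosync, here is a minimal python sketch. it only extracts the ringX_addr and linknumber values from a corosync.conf text and flags numbering that does not run 0..N-1, which is the pattern that failed above. the helper names (link_numbers, has_gap) are made up for this example, and whether corosync really requires contiguous link numbers is an assumption that the test above does not settle.

```python
import re

def link_numbers(conf_text):
    """Return (ringX_addr numbers, totem linknumbers) found in the config text."""
    rings = {int(n) for n in re.findall(r"^\s*ring(\d+)_addr\b", conf_text, re.MULTILINE)}
    links = {int(n) for n in re.findall(r"^\s*linknumber:\s*(\d+)", conf_text, re.MULTILINE)}
    return rings, links

def has_gap(numbers):
    """True if the numbers are not exactly 0, 1, ..., len(numbers) - 1."""
    return sorted(numbers) != list(range(len(numbers)))

# Trimmed copy of the config generated in ISSUE 1.
SAMPLE = """
nodelist {
  node {
    name: clustertest71
    nodeid: 1
    ring0_addr: 192.168.21.71
    ring5_addr: 10.0.0.71
  }
}
totem {
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 5
  }
}
"""

rings, links = link_numbers(SAMPLE)
print(sorted(rings), sorted(links), has_gap(rings))  # [0, 5] [0, 5] True
```

the nodelist and the totem section agree with each other here (both use 0 and 5), so the only oddity this check finds is the gap in the numbering.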

ISSUE 2

after creating a cluster with link0 and link1 configured on node A, a
join on node B with just link0 set fails with the following (incorrect)
error message:

500 corosync: totem interface with linknumber 0 configured but 'link0' parameter not defined!

but not on every attempt; sometimes it correctly complains about link1.

passing the same address twice for link0 and link1 also fails with an
incorrect message:

500 corosync: totem interface with linknumber 1 configured but 'link1' parameter not defined!

further investigation showed that that check uses {id} instead of ${id}
in line 262 of PVE/API2/ClusterConfig.pm
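
to illustrate the class of bug, here is a python analogy (not the actual perl code from ClusterConfig.pm, which is not quoted here). it assumes the literal text ended up in the parameter lookup key; the function names and the shape of the arguments are hypothetical.

```python
def check_links(totem_linknumbers, params):
    """Return an error for the first configured link without a parameter, else None."""
    for linknumber in sorted(totem_linknumbers):
        key = f"link{linknumber}"  # interpolated: "link0", "link1", ...
        if params.get(key) is None:
            return (f"totem interface with linknumber {linknumber} configured "
                    f"but '{key}' parameter not defined!")
    return None

def check_links_buggy(totem_linknumbers, params):
    """Same check, but the key is the literal text 'link{id}' and never matches."""
    for linknumber in sorted(totem_linknumbers):
        key = "link{id}"  # not interpolated: the lookup misses for every link
        if params.get(key) is None:
            return (f"totem interface with linknumber {linknumber} configured "
                    f"but 'link{linknumber}' parameter not defined!")
    return None

both = {"link0": "192.168.21.72", "link1": "10.0.0.72"}
print(check_links([0, 1], {"link0": "192.168.21.72"}))  # complains about link1
print(check_links_buggy([0, 1], both))                  # complains about link0 anyway
```

with the broken key the check rejects whichever link it looks at first, even when all parameters are given. if the real code walks the interfaces in hash order, which perl does not keep stable across runs, that might also explain why the reported link number differed between attempts, but that part is a guess.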

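with that fixed up, joining works, but the generated corosync.conf is
again wrong (the ringX_addr entries of the joined node are written as
nested sections with an 'address' key instead of plain values):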


logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: clustertest71
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.21.71
    ring1_addr: 10.0.0.71
  }
  node {
    name: clustertest72
    nodeid: 2
    quorum_votes: 1
    ring0_addr {
      address: 192.168.21.72
    }
    ring1_addr {
      address: 10.0.0.72
    }
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomastest
  config_version: 3
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
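
a quick way to find the affected node entries in such a file, as a minimal python sketch. the function name is hypothetical, and it assumes the 'name:' line precedes the ringX_addr lines within each node block, as it does in the output above.

```python
import re

def nodes_with_nested_ring_addr(conf_text):
    """Return names of nodes whose ringX_addr is written as a 'ringX_addr {' section."""
    flagged = []
    current = None
    for line in conf_text.splitlines():
        m = re.match(r"\s*name:\s*(\S+)", line)  # does not match 'cluster_name:'
        if m:
            current = m.group(1)
            continue
        if re.match(r"\s*ring\d+_addr\s*\{", line) and current not in flagged:
            flagged.append(current)
    return flagged

# Trimmed copy of the nodelist generated after the join.
SAMPLE = """
nodelist {
  node {
    name: clustertest71
    ring0_addr: 192.168.21.71
    ring1_addr: 10.0.0.71
  }
  node {
    name: clustertest72
    ring0_addr {
      address: 192.168.21.72
    }
    ring1_addr {
      address: 10.0.0.72
    }
  }
}
"""

print(nodes_with_nested_ring_addr(SAMPLE))  # ['clustertest72']
```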

stopped testing there for now ;)

> On Tue, Jun 11, 2019 at 07:36:25PM +0200, Thomas Lamprecht wrote:
> > v2 with a few more patches and some changes regarding Fabian's feedback
> > (thanks) from the initial RFC version[0].
> > 
> > Last two patches, both new, aren't really tested by me too much, just FYI;
> > the others seem mostly reasonable adaptions and got smoke-tested, as I
> > looked more closely last time (sorry, trying to fit quite a lot into my
> > schedule currently).
> > 
> > [0]: https://pve.proxmox.com/pipermail/pve-devel/2019-May/037169.html
> > 
> > Thomas Lamprecht (8):
> >   corosync config: support 'linknumber' property
> >   add new corosync-link format
> >   cluster create: use new corosync-link format for totem interfaces
> >   corosync: allow to set link priorities
> >   node join: use new corosync link parameters
> >   remove now unused old corosync ringX_addr formats
> >   allow to create a cluster with all possible knet links
> >   api: join/add: generalize and allow all knet links
> > 
> >  data/PVE/API2/ClusterConfig.pm | 85 ++++++++++++++--------------------
> >  data/PVE/CLI/pvecm.pm          | 15 ++++--
> >  data/PVE/Cluster.pm            | 72 +++++++++++++++++++++++++---
> >  data/PVE/Corosync.pm           | 59 ++++++++++++-----------
> >  4 files changed, 139 insertions(+), 92 deletions(-)
> > 
> > -- 
> > 2.20.1
> > 
> > 
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel at pve.proxmox.com
> > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel