[PVE-User] Cluster does not start, corosync timeout...

Marco Gaiarin gaio at sv.lnf.it
Thu Jul 4 14:23:45 CEST 2019


Hi! Thomas Lamprecht
  On that day it was said...

> Hmm, that's strange, do you have the full log between "19:58:40" and
> "20:00:09", as normally there should be some more info, at least for
> corosync and pve-cluster, e.g., the following output would be great:
> journalctl -u corosync -u pve-cluster --since "2019-07-03 19:58:40" --until "2019-07-03 20:00:09"

The journal has already been rotated:
	root@pvecn1:~# journalctl -u corosync -u pve-cluster --since "2019-07-03 19:58:40" --until "2019-07-03 20:00:09"
	-- Logs begin at Wed 2019-07-03 21:03:31 CEST, end at Thu 2019-07-04 14:12:38 CEST. --
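
Since the journal only goes back to 21:03:31, the requested window has to come from the rotated syslog. A rough sketch of a filter that approximates `journalctl -u corosync -u pve-cluster` on a classic syslog file follows. The function name `cluster_lines` and the choice of tags to keep (corosync, pmxcfs, and systemd lines mentioning corosync) are assumptions made for illustration, and the time comparison only works within a single day.

```shell
#!/bin/sh
# cluster_lines T0 T1 [FILE...]
# Print syslog lines from corosync and pmxcfs (the pve-cluster daemon), plus
# systemd lines that mention corosync, whose HH:MM:SS stamp lies in [T0, T1].
# Assumes the classic "Mon DD HH:MM:SS host tag: message" layout.
cluster_lines() {
    t0=$1
    t1=$2
    shift 2
    # With no FILE argument awk reads standard input.
    awk -v t0="$t0" -v t1="$t1" '
        $3 >= t0 && $3 <= t1 &&
        ($5 ~ /^(corosync|pmxcfs)\[/ || ($5 ~ /^systemd\[/ && $0 ~ /[Cc]orosync/))
    ' "$@"
}
```

For example, `cluster_lines 19:58:40 20:00:09 /var/log/syslog.1`, assuming the rotated file lives in the usual Debian location.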

Looking at syslog.1 instead:

Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul  3 19:58:40 pvecn1 corosync[3443]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Jul  3 19:58:40 pvecn1 corosync[3443]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Jul  3 19:58:41 pvecn1 pve-firewall[3491]: starting server
Jul  3 19:58:41 pvecn1 pvestatd[3503]: starting server
Jul  3 19:58:41 pvecn1 systemd[1]: Started Proxmox VE firewall.
Jul  3 19:58:41 pvecn1 systemd[1]: Started PVE Status Daemon.
Jul  3 19:58:41 pvecn1 kernel: [   36.327130] ip6_tables: (C) 2000-2006 Netfilter Core Team
Jul  3 19:58:41 pvecn1 kernel: [   36.464756] ip_set: protocol 6
Jul  3 19:58:42 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:58:42 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:58:42 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:58:42 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:58:44 pvecn1 hpasmlited[1740]: hpDeferSPDThread: Starting thread to collect DIMM SPD Data.
Jul  3 19:58:44 pvecn1 hpasmlited[1740]: Initialize data structures successful
Jul  3 19:58:48 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:58:48 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:58:48 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:58:48 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: triggering change event to reinitialize
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: add path (uevent)
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: spurious uevent, path already in pathvec
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: failed to get udev uid: Invalid argument
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: failed to get sysfs uid: Invalid argument
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: failed to get sgio uid: Inappropriate ioctl for device
Jul  3 19:58:49 pvecn1 multipathd[913]: zd0: failed to get path uid
Jul  3 19:58:49 pvecn1 multipathd[913]: uevent trigger error
Jul  3 19:58:52 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:58:54 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:58:54 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:58:54 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:58:54 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:00 pvecn1 systemd[1]: Starting Proxmox VE replication runner...
Jul  3 19:59:00 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:00 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:00 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:00 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:00 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:01 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:01 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:02 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:03 pvecn1 hpasmlited[1740]: hpDeferSPDThread: End of Collecting DIMM SPD data.
Jul  3 19:59:03 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:04 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:05 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:06 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:06 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:06 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:06 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:06 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:07 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:08 pvecn1 pvesr[3641]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 19:59:09 pvecn1 pvesr[3641]: error with cfs lock 'file-replication_cfg': no quorum!
Jul  3 19:59:09 pvecn1 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jul  3 19:59:09 pvecn1 systemd[1]: Failed to start Proxmox VE replication runner.
Jul  3 19:59:09 pvecn1 systemd[1]: pvesr.service: Unit entered failed state.
Jul  3 19:59:09 pvecn1 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul  3 19:59:11 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:12 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:12 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:12 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:12 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:18 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:18 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:18 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:18 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:21 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:24 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:24 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:24 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:24 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:30 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:30 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:30 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:30 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:31 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:36 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:36 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:36 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:36 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:41 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:42 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:42 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:42 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:42 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:48 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:48 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:48 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:48 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 19:59:51 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 19:59:54 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 19:59:54 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 19:59:54 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 19:59:54 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 20:00:00 pvecn1 systemd[1]: Starting Proxmox VE replication runner...
Jul  3 20:00:00 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:00 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 20:00:00 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 20:00:00 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 20:00:00 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 20:00:01 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:01 pvecn1 pvestatd[3503]: storage 'Backup' is not online
Jul  3 20:00:02 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:03 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:04 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:05 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:06 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:06 pvecn1 pmxcfs[3322]: [quorum] crit: quorum_initialize failed: 2
Jul  3 20:00:06 pvecn1 pmxcfs[3322]: [confdb] crit: cmap_initialize failed: 2
Jul  3 20:00:06 pvecn1 pmxcfs[3322]: [dcdb] crit: cpg_initialize failed: 2
Jul  3 20:00:06 pvecn1 pmxcfs[3322]: [status] crit: cpg_initialize failed: 2
Jul  3 20:00:07 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:08 pvecn1 pvesr[4068]: trying to acquire cfs lock 'file-replication_cfg' ...
Jul  3 20:00:09 pvecn1 pvesr[4068]: error with cfs lock 'file-replication_cfg': no quorum!
Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Start operation timed out. Terminating.
Jul  3 20:00:09 pvecn1 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jul  3 20:00:09 pvecn1 systemd[1]: Failed to start Proxmox VE replication runner.
Jul  3 20:00:09 pvecn1 systemd[1]: pvesr.service: Unit entered failed state.
Jul  3 20:00:09 pvecn1 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul  3 20:00:09 pvecn1 systemd[1]: Failed to start Corosync Cluster Engine.
Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Unit entered failed state.
Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Failed with result 'timeout'.
Jul  3 20:00:09 pvecn1 systemd[1]: Starting PVE API Daemon...

Note that I'm not using pvesr, so all the messages about it can be
safely ignored.
Also, the 'Backup' storage is an NFS share exported by one of the nodes,
which was probably still booting...
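
For what it's worth, the timing in the excerpt can be checked with a little arithmetic. pmxcfs retries its connection to corosync every 6 seconds, and systemd gives up on corosync.service at 20:00:09. The helper below (its name is hypothetical) computes the gap from the first corosync message. The result is just under 90 seconds, which would be consistent with systemd's default start timeout, assuming the unit does not set its own TimeoutStartSec.

```shell
#!/bin/sh
# secs_between HH:MM:SS HH:MM:SS
# Seconds from the first to the second timestamp, assuming both fall on the
# same day.
secs_between() {
    echo "$1 $2" | awk '{
        split($1, a, ":")
        split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# First corosync log line to "Start operation timed out":
secs_between 19:58:40 20:00:09   # prints 89
```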


> > But... some hosts in the cluster were missing from /etc/hosts: is this
> > enough to keep corosync from starting correctly?
> depends on the config, as you stated yourself with multicast it normally
> won't be an issue, but maybe the switch had some issues with multicast initially
> after the power outage, as a guess.

I've now tested multicast with 'omping' (and I'm sure I had also tested
it when setting up the cluster), and it works.

So I don't see how multicast can 'not work initially' and then start
working later...
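
A short omping run only shows that multicast works at the moment of the test. A longer run, started on all nodes at about the same time, is a common way to catch multicast that degrades over time (for example IGMP snooping without a querier). By itself it would not reproduce a switch that is slow to pass multicast right after power-up. A sketch, where the helper name `mcast_check` and the `DRY_RUN` switch are assumptions and the node names are the ones from the corosync.conf below:

```shell
#!/bin/sh
# mcast_check NODE...
# Run a roughly 10 minute omping test (600 probes, one per second, quiet
# output) against the given nodes. With DRY_RUN=1 the command is only printed.
mcast_check() {
    set -- omping -c 600 -i 1 -q "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the command that would be run on each node:
DRY_RUN=1 mcast_check pvecn1 pvecn2 pvecn3
```

omping only reports results when it runs on every listed node simultaneously, so the same command has to be started on pvecn1, pvecn2 and pvecn3.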


> can you please post your corosync.conf ?

Sure!

root@pvecn1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pvecn2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pvecn2
  }

  node {
    name: pvecn1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pvecn1
  }

  node {
    name: pvecn3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pvecn3
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: CONEGLIANO
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.1.50
    ringnumber: 0
  }

}
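
Since every ring0_addr above is a host name rather than an address, corosync depends on name resolution being available when it starts. A minimal sketch for checking that each name resolves locally follows. The function names are hypothetical, and `getent hosts` is assumed to follow the same resolver order (/etc/hosts, then DNS) that corosync would see.

```shell
#!/bin/sh
# ring_addrs [FILE]
# Print every ring0_addr value found in a corosync.conf.
ring_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "${1:-/etc/pve/corosync.conf}"
}

# check_ring_addrs [FILE]
# Report each ring0_addr that the local resolver cannot resolve.
# Returns non-zero if at least one name fails.
check_ring_addrs() {
    rc=0
    for name in $(ring_addrs "$1"); do
        if ! getent hosts "$name" >/dev/null; then
            echo "cannot resolve ring0_addr $name" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running `check_ring_addrs` on each node would show whether a missing /etc/hosts entry is in play. Writing the addresses themselves into ring0_addr would avoid the dependency on name resolution altogether.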


Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797


