[PVE-User] Cluster does not start, corosync timeout...

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Jul 4 13:19:34 CEST 2019


On 7/4/19 12:35 PM, Marco Gaiarin wrote:
> We had a major power outage here, and our cluster had some trouble on
> restart. The worst was:
> 
>  Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
>  Jul  3 19:58:40 pvecn1 corosync[3443]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
>  Jul  3 19:58:40 pvecn1 corosync[3443]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
>  Jul  3 19:58:40 pvecn1 corosync[3443]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
>  Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Start operation timed out. Terminating.
>  Jul  3 20:00:09 pvecn1 systemd[1]: corosync.service: Unit entered failed state.

Hmm, that's strange. Do you have the full log between "19:58:40" and
"20:00:09"? Normally there should be some more info, at least for
corosync and pve-cluster. For example, the following output would be great:

journalctl -u corosync -u pve-cluster --since "2019-07-03 19:58:40" --until "2019-07-03 20:00:09"

> 
> But... some hosts in the cluster were missing from /etc/hosts: is this
> enough to keep corosync from starting correctly?
> 

Depends on the config. As you stated yourself, with multicast it normally
won't be an issue. As a guess, maybe the switch had some issues with
multicast initially after the power outage.
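
To rule out the name-resolution side, a quick check along these lines could
help. It is only a sketch: the function name check_hosts is made up for this
example, and it only tells you whether each name resolves on the local node
(via /etc/hosts or DNS), not whether corosync actually needs that name.

```shell
# Hypothetical helper: report which of the given node names resolve locally.
# getent consults the same sources (files, dns) as the system resolver.
check_hosts() {
    missing=0
    for node in "$@"; do
        if getent hosts "$node" >/dev/null; then
            echo "ok: $node"
        else
            echo "MISSING: $node"
            missing=1
        fi
    done
    # Non-zero exit status if at least one name did not resolve.
    return "$missing"
}
```

You would call it with your own node names, for example
"check_hosts pvecn1 pvecn2 pvecn3" (pvecn1 is from your log, the other two
names are placeholders).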

> 
> Looking at docs (https://pve.proxmox.com/pve-docs/pve-admin-guide.html):
> 
>  While it’s often common use to reference all other nodenames in /etc/hosts with their IP this is not strictly necessary for a cluster, which normally uses multicast, to work. It maybe useful as you then can connect from one node to the other with SSH through the easier to remember node name.
> 
> Does this mean multicast is not working correctly? I was sure it was...

Can you please post your corosync.conf?
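
In the meantime, multicast between the nodes can be tested directly with
omping. The wrapper below is just a sketch: the function name multicast_check
is made up, and it assumes omping is installed on every node. The omping
call itself has to be started on all listed nodes at roughly the same time,
and with these options it runs for about ten minutes.

```shell
# Hypothetical wrapper around omping; pass the addresses of all cluster nodes.
multicast_check() {
    if ! command -v omping >/dev/null 2>&1; then
        echo "omping is not installed on this node" >&2
        return 127
    fi
    # 600 probes at a 1 second interval, quiet output (summary only).
    omping -c 600 -i 1 -q "$@"
}
```

If the summary shows high multicast loss while unicast is fine, the
switch (for example IGMP snooping without a querier) would be the first
suspect.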



