[pve-devel] corosync bug: cluster break after 1 node clean shutdown
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Sep 29 10:51:32 CEST 2020
On September 28, 2020 5:59 pm, Alexandre DERUMIER wrote:
> Here a new test http://odisoweb1.odiso.net/test5
>
> This occurred at corosync start
>
>
> node1:
> -----
> start corosync : 17:30:19
>
>
> node2: /etc/pve locked
> --------------
> Current time : 17:30:24
>
>
> I took backtraces of all nodes at the same time with parallel ssh at 17:35:22
>
> and coredumps of all nodes at the same time with parallel ssh at 17:42:26
>
>
> (Note that this time, /etc/pve was still locked after backtrace/coredump)
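The synchronized collection described in the quoted report could be sketched roughly as below. This is a dry run: the node names and the target process name (pmxcfs) are assumptions for illustration, and the per-node commands are printed rather than executed so the loop can be inspected first.

```shell
# Dry-run sketch: assemble the per-node commands that would capture a gdb
# backtrace and a coredump on every node at (roughly) the same time.
# NODES and the pmxcfs process name are assumptions, not from the mail.
NODES="node1 node2 node13"
cmds=""
for n in $NODES; do
    # gdb in batch mode dumps all thread backtraces; gcore writes a core
    # file without killing the process. Run the printed lines to execute.
    cmds="$cmds
ssh $n \"gdb --batch -p \$(pidof pmxcfs) -ex 'thread apply all bt'\"
ssh $n \"gcore -o /tmp/core.$n \$(pidof pmxcfs)\""
done
printf '%s\n' "$cmds"
```

In a real run the per-node commands would be launched in the background and `wait`ed on, so the captures land as close together in time as possible.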
okay, so this time two more log lines got printed on the (again)
problem-causing node #13, but logging still stops at a point where that
makes no sense.
I rebuilt the packages:
f318f12e5983cb09d186c2ee37743203f599d103b6abb2d00c78d312b4f12df942d8ed1ff5de6e6c194785d0a81eb881e80f7bbfd4865ca1a5a509acd40f64aa pve-cluster_6.1-8_amd64.deb
b220ee95303e22704793412e83ac5191ba0e53c2f41d85358a247c248d2a6856e5b791b1d12c36007a297056388224acf4e5a1250ef1dd019aee97e8ac4bcac7 pve-cluster-dbgsym_6.1-8_amd64.deb
with a change to how the logging is set up (I now suspect that some
messages might get dropped if the logging throughput is high enough).
Let's hope this gets us the information we need. Please repeat test5
with these packages.
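As an aside, the published SHA512 sums can be checked before installing. A minimal sketch, using a dummy file as a stand-in for the real .deb (for the real check, the hash lines from this mail would go into the SHA512SUMS file):

```shell
# Sketch: verify a downloaded .deb against a published SHA512 sum before
# installing. A dummy file stands in for the actual package here.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'stand-in for the real package\n' > pve-cluster_6.1-8_amd64.deb
# Record the checksum in the usual "hash  filename" format ...
sha512sum pve-cluster_6.1-8_amd64.deb > SHA512SUMS
# ... and verify it; prints "pve-cluster_6.1-8_amd64.deb: OK" on success
result=$(sha512sum -c SHA512SUMS)
echo "$result"
```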
Is there anything special about node 13? Network topology, slower
hardware, ...?