[pve-devel] corosync bug: cluster break after 1 node clean shutdown
Alexandre DERUMIER
aderumier at odiso.com
Tue Sep 29 12:52:44 CEST 2020
Here is a new test:
http://odisoweb1.odiso.net/test6/
node1
-----
start corosync : 12:08:33
node2 (/etc/pve locked)
-----
Current time : 12:08:39
node1 (stop corosync, which unlocks /etc/pve)
-----
12:28:11 : systemctl stop corosync
backtraces: 12:26:30
coredump : 12:27:21
----- Original message -----
From: "aderumier" <aderumier at odiso.com>
To: "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>
Sent: Tuesday, September 29, 2020 11:37:41
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
>>with a change of how the logging is set up (I now suspect that some
>>messages might get dropped if the logging throughput is high enough),
>>let's hope this gets us the information we need. Please repeat test5
>>with these packages.
I'll test this afternoon.
>>Is there anything special about node 13? Network topology, slower
>>hardware, ...?
No, nothing special: all nodes have exactly the same hardware, CPU (24 cores/48 threads, 3 GHz), memory and disks.
This node runs at around 10% CPU usage, with a load average of around 5.
----- Original message -----
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com>
To: "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>
Sent: Tuesday, September 29, 2020 10:51:32
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown
On September 28, 2020 5:59 pm, Alexandre DERUMIER wrote:
> Here is a new test: http://odisoweb1.odiso.net/test5
>
> This occurred at corosync start.
>
>
> node1:
> -----
> start corosync : 17:30:19
>
>
> node2: /etc/pve locked
> --------------
> Current time : 17:30:24
>
>
> I took backtraces of all nodes at the same time with parallel ssh at 17:35:22
>
> and coredumps of all nodes at the same time with parallel ssh at 17:42:26
>
>
> (Note that this time, /etc/pve was still locked after backtrace/coredump)
Okay, so this time two more log lines got printed on the (again)
problem-causing node #13, but it still stops logging at a point where
this makes no sense.
I rebuilt the packages:
f318f12e5983cb09d186c2ee37743203f599d103b6abb2d00c78d312b4f12df942d8ed1ff5de6e6c194785d0a81eb881e80f7bbfd4865ca1a5a509acd40f64aa pve-cluster_6.1-8_amd64.deb
b220ee95303e22704793412e83ac5191ba0e53c2f41d85358a247c248d2a6856e5b791b1d12c36007a297056388224acf4e5a1250ef1dd019aee97e8ac4bcac7 pve-cluster-dbgsym_6.1-8_amd64.deb
with a change of how the logging is set up (I now suspect that some
messages might get dropped if the logging throughput is high enough).
Let's hope this gets us the information we need. Please repeat test5
with these packages.
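The package lines above pair a 128-hex-digit digest with a file name; 128 hex digits matches SHA-512 output length, so these are assumed (the mail does not say so) to be SHA-512 checksums of the .deb files. A minimal Python sketch for checking downloaded packages against such a listing, assuming the files sit in one directory under the listed names (the helper names `sha512_of` and `verify` are hypothetical):

```python
import hashlib
from pathlib import Path


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(listing: str, directory: str = ".") -> dict:
    """Check each 'digest filename' line against the file in `directory`.

    Assumes SHA-512 digests; returns {filename: True/False}.
    """
    results = {}
    for line in listing.strip().splitlines():
        digest, name = line.split()
        actual = sha512_of(str(Path(directory) / name))
        results[name] = actual == digest.lower()
    return results
```

Usage: paste the two package lines into a string and call `verify(listing, "/path/to/debs")`; every value should be `True` before installing. `sha512sum` from GNU coreutils computes the same digest on the command line.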
Is there anything special about node 13? Network topology, slower
hardware, ...?
_______________________________________________
pve-devel mailing list
pve-devel at lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel