[pve-devel] corosync bug: cluster break after 1 node clean shutdown

Tue Sep 15 16:57:46 CEST 2020

>>I mean this is bad, but also great! 
>>Can you do a coredump of the whole thing and upload it somewhere with the version info
>>used (for dbgsym package)? That could help a lot.

I'll try to reproduce it again (with the full lock everywhere), and do the coredump.

I have tried the real-time scheduling, but I was still able to reproduce the
"lrm too long" for 60s. (As I'm restarting corosync each minute, I think
something gets unlocked at the next corosync restart.)
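
For reference, the real-time scheduling override quoted further down can also be written as a drop-in file without the interactive editor. This is only a minimal sketch: write_rt_dropin is a hypothetical helper name, and the drop-in path in the comments assumes the default location that "systemctl edit" uses.

```shell
#!/bin/sh
# Sketch: write the real-time scheduling override as a systemd drop-in.
# write_rt_dropin is a hypothetical helper name, not part of Proxmox VE.
write_rt_dropin() {
    dropin_dir="$1"    # e.g. /etc/systemd/system/pve-cluster.service.d
    mkdir -p "$dropin_dir" || return 1
    # Same [Service] settings as in the quoted suggestion.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
CPUSchedulingPolicy=rr
CPUSchedulingPriority=99
EOF
}

# On a real node (as root), assuming the default drop-in location:
#   write_rt_dropin /etc/systemd/system/pve-cluster.service.d
#   systemctl daemon-reload
#   systemctl restart pve-cluster
#   chrt -p "$(pidof pmxcfs)"   # check the policy and priority actually applied
```

The last commented command is one way to confirm that pmxcfs really runs with the round-robin policy after the restart.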

This time, it was blocked at the same time on one node in:

work {
...
   } elsif ($state eq 'active') {
      ....
        $self->update_lrm_status();

and on another node in:

        if ($fence_request) {
            $haenv->log('err', "node need to be fenced - releasing agent_lock\n");
            $self->set_local_status({ state => 'lost_agent_lock'});
        } elsif (!$self->get_protected_ha_agent_lock()) {
            $self->set_local_status({ state => 'lost_agent_lock'});
        } elsif ($self->{mode} eq 'maintenance') {
            $self->set_local_status({ state => 'maintenance'});
        }

----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "aderumier" <aderumier at odiso.com>
Cc: "Proxmox VE development discussion" <pve-devel at lists.proxmox.com>
Sent: Tuesday, 15 September 2020 16:32:52
Subject: Re: [pve-devel] corosync bug: cluster break after 1 node clean shutdown

On 9/15/20 4:09 PM, Alexandre DERUMIER wrote: 
>>> Can you try to give pmxcfs real time scheduling, e.g., by doing: 
>>> 
>>> # systemctl edit pve-cluster 
>>> 
>>> And then add snippet: 
>>> 
>>> 
>>> [Service] 
>>> CPUSchedulingPolicy=rr 
>>> CPUSchedulingPriority=99 
> yes, sure, I'll do it now 
> 
> 
>> I'm currently digging the logs 
>>> Is your simplest/most stable reproducer still a periodic restart of corosync on one node?
> yes, a simple "systemctl restart corosync" on 1 node each minute 
> 
> 
> 
> After 1 hour, it's still locked.
>
> On the other nodes, I still have pmxcfs logs like:
> 

I mean this is bad, but also great! 
Can you do a coredump of the whole thing and upload it somewhere with the version info
used (for dbgsym package)? That could help a lot. 

> manual "pmxcfs -d" 
> https://gist.github.com/aderumier/4cd91d17e1f8847b93ea5f621f257c2e 
> 

Hmm, the fuse connection of the previous one got into a weird state (or something is still 
running) but I'd rather say this is a side-effect not directly connected to the real bug. 

> 
> some interesting dmesg about "pvesr" 
> 
> [Tue Sep 15 14:45:34 2020] INFO: task pvesr:19038 blocked for more than 120 seconds. 
> [Tue Sep 15 14:45:34 2020] Tainted: P O 5.4.60-1-pve #1 
> [Tue Sep 15 14:45:34 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Tue Sep 15 14:45:34 2020] pvesr D 0 19038 1 0x00000080 
> [Tue Sep 15 14:45:34 2020] Call Trace: 
> [Tue Sep 15 14:45:34 2020] __schedule+0x2e6/0x6f0 
> [Tue Sep 15 14:45:34 2020] ? filename_parentat.isra.57.part.58+0xf7/0x180 
> [Tue Sep 15 14:45:34 2020] schedule+0x33/0xa0 
> [Tue Sep 15 14:45:34 2020] rwsem_down_write_slowpath+0x2ed/0x4a0 
> [Tue Sep 15 14:45:34 2020] down_write+0x3d/0x40 
> [Tue Sep 15 14:45:34 2020] filename_create+0x8e/0x180 
> [Tue Sep 15 14:45:34 2020] do_mkdirat+0x59/0x110 
> [Tue Sep 15 14:45:34 2020] __x64_sys_mkdir+0x1b/0x20 
> [Tue Sep 15 14:45:34 2020] do_syscall_64+0x57/0x190 
> [Tue Sep 15 14:45:34 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9 
> 

Hmm, it hangs in mkdir (cluster-wide locking).
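
To illustrate why a stuck mkdir points at locking: a lock can be taken by creating a directory, because mkdir either creates it atomically or fails when it already exists. The following is only a generic sketch of that technique with a timeout, run against an ordinary directory. acquire_lock and release_lock are hypothetical names, and this is not the pmxcfs or pvesr implementation.

```shell
#!/bin/sh
# Sketch of mkdir-based locking with a timeout. acquire_lock and release_lock
# are hypothetical names; this is not the pmxcfs or PVE::Cluster code.
acquire_lock() {
    lock_dir="$1"
    timeout_s="$2"
    waited=0
    # mkdir is atomic: exactly one caller creates the directory and wins.
    until mkdir "$lock_dir" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1    # give up instead of waiting forever
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

release_lock() {
    # Removing the directory releases the lock for the next caller.
    rmdir "$1"
}
```

Note that a retry loop like this only helps when mkdir returns with an error. In the trace above the mkdir syscall itself never returns (task in D state), so no userspace timeout around it would fire.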