[PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum

Frank Thommen f.thommen at dkfz-heidelberg.de
Sat Jan 16 13:26:17 CET 2021


Just to close this thread on the mailing list: I finally turned this into a support 
request with Proxmox, and we are still working on it.  It's not an easy case 
to solve :-)

Frank



On 08.01.21 13:01, Frank Thommen wrote:
> Could this entry be the result of the fencing that happened when the 
> host initially crashed?  I assumed that it would automatically be 
> unfenced when it came up again.  I never ran any manual "unfencing" (I 
> wouldn't know how).
> 
> Frank
> 
> 
> 
> On 08.01.21 12:44, Frank Thommen wrote:
>> Yes, /etc/ceph/ceph.conf is identical on all three hosts, and there is a 
>> mon_host line with the correct IPs.  Interestingly, there is a special 
>> section for odcf-pve02:
>>
>> -----------
>> [mon.odcf-pve02]
>>       public_addr = 192.168.255.2
>> -----------
>>
>> This is the same IP as in the mon_host line.  However, there is no 
>> equivalent section for the other two nodes.
>>
>> Frank
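A minimal sketch of the comparison discussed here (is every monitor's own address also listed in mon_host?), assuming a ceph.conf in the usual INI layout. check_mon_addrs is a hypothetical helper written for illustration, not a Ceph or Proxmox command. It only understands plain IPs, ip:port, and address vectors of the form [v2:IP:3300/0,v1:IP:6789/0], and it treats spaces and underscores in option names as equivalent ("mon host" and "mon_host"), as Ceph does.

```shell
#!/usr/bin/env bash
# check_mon_addrs FILE  (hypothetical helper, not a Ceph/Proxmox tool)
# For every [mon.<id>] section in a ceph.conf-style FILE, check that its
# public_addr / "mon addr" IP also appears in the [global] mon_host list.
# Prints one line per monitor section; returns 1 if any IP is missing.
check_mon_addrs() {
    awk '
        function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
        /^[ \t]*[#;]/ { next }                      # skip comment lines
        /^[ \t]*\[/ {                               # section header, e.g. [mon.x]
            section = $0
            sub(/^[ \t]*\[/, "", section); sub(/].*$/, "", section)
            next
        }
        /=/ {
            key = $0; sub(/=.*/, "", key)
            key = trim(key); gsub(/[ \t]+/, "_", key)   # "mon host" -> "mon_host"
            val = $0; sub(/^[^=]*=/, "", val); val = trim(val)
            if (section == "global" && key == "mon_host") monhost = val
            if (section ~ /^mon\./ && (key == "public_addr" || key == "mon_addr")) {
                ip = val; sub(/:[0-9]+$/, "", ip)      # drop an optional :port
                addr[section] = ip
            }
        }
        END {
            # Split mon_host into bare IPs: handles "ip", "ip:port" and
            # address vectors like [v2:ip:3300/0,v1:ip:6789/0].
            n = split(monhost, tok, "[ \t,;]+")
            for (i = 1; i <= n; i++) {
                t = tok[i]
                gsub(/\[|]/, "", t); sub(/^v[12]:/, "", t)
                sub(/\/[0-9]+$/, "", t); sub(/:[0-9]+$/, "", t)
                if (t != "") known[t] = 1
            }
            bad = 0
            for (s in addr) {
                if (addr[s] in known) { print s ": " addr[s] " is in mon_host" }
                else { print s ": " addr[s] " is NOT in mon_host"; bad = 1 }
            }
            exit bad
        }
    ' "$1"
}
```

Run on one of the nodes as check_mon_addrs /etc/ceph/ceph.conf. With the config quoted above it should print a single line for mon.odcf-pve02, since that is the only monitor with its own section; monitors without a [mon.<id>] section are not checked.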
>>
>>
>> On 08.01.21 12:27, Peter Simon wrote:
>>> Hi Frank,
>>>
>>> Is your /etc/ceph/ceph.conf the same on all hosts?
>>>
>>> Is there a line "mon host = ip1, ip2, ip3"?
>>>
>>> And are there separate [mon.x] sections, e.g.:
>>>
>>> [mon.x]
>>> host = hostname
>>> mon addr = ip:6789
>>>
>>> Cheers
>>> Peter
>>>
>>> On 08.01.21 12:05, Frank Thommen wrote:
>>>>
>>>>
>>>> On 08.01.21 11:45, Uwe Sauter wrote:
>>>>>
>>>>>
>>>>> On 08.01.21 11:36, Frank Thommen wrote:
>>>>>>
>>>>>> On 05.01.21 21:17, Frank Thommen wrote:
>>>>>>> On 05.01.21 21:02, Uwe Sauter wrote:
>>>>>>>> There's a paragraph about probing mons on
>>>>>>>>
>>>>>>>> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/ 
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I will check that (tomorrow :-)
>>>>>>
>>>>>>
>>>>>> Using the monitor's admin socket on any of the three nodes, I can
>>>>>> query the monitors of 01 and 03 (the good ones) but not of 02 (the
>>>>>> problematic one):
>>>>>>
>>>>>> root@odcf-pve01:~# ceph tell mon.odcf-pve02 mon_status
>>>>>> Error ENOENT: problem getting command descriptions from mon.odcf-pve02
>>>>>> root@odcf-pve01:~#
>>>>>>
>>>>>> The monitor daemon is running on all three and the ports are open.
>>>>>>
>>>>>> Any other ideas?
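The probing step from the troubleshooting page can be sketched as a small loop. probe_mons and local_mon_status are hypothetical helpers, assuming the monitor IDs match the host names (mon.odcf-pve01, mon.odcf-pve02 and, presumably, mon.odcf-pve03). ceph tell reaches a monitor through the cluster, whereas ceph daemon, run on the monitor's own host, talks to its local admin socket, which may still answer when the monitor is out of quorum.

```shell
#!/usr/bin/env bash
# probe_mons ID...  (hypothetical helper)
# Ask each monitor for mon_status through the cluster ("ceph tell") and
# report which ones answer; returns 1 if any monitor does not answer.
probe_mons() {
    local id rc=0
    for id in "$@"; do
        if ceph tell "mon.${id}" mon_status >/dev/null 2>&1; then
            echo "mon.${id}: answered"
        else
            echo "mon.${id}: no answer via ceph tell"
            rc=1
        fi
    done
    return "$rc"
}

# local_mon_status  (hypothetical helper)
# Query the monitor on this host through its local admin socket instead.
# Assumes the monitor ID equals the short host name, as in this thread
# (mon.odcf-pve02 runs on odcf-pve02).
local_mon_status() {
    ceph daemon "mon.$(hostname -s)" mon_status
}
```

Run probe_mons odcf-pve01 odcf-pve02 odcf-pve03 from any node, then local_mon_status on odcf-pve02 itself. The "state" field of the mon_status output (e.g. probing or electing) is what the troubleshooting page uses to narrow down the problem. ceph tell may take a while to give up on an unreachable monitor.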
>>>>>
>>>>> You could check the permissions on the socket:
>>>>>
>>>>> ss -xln | grep ceph-mon
>>>>> SOCK=$(ss -xln | awk '/ceph-mon/ {print $5}')
>>>>> ls -la ${SOCK}
>>>>>
>>>>> On my host, this shows
>>>>>
>>>>> srwxr-xr-x 1 ceph ceph 0 Dec 20 23:47 /var/run/ceph/ceph-mon.px-alpha-cluster.asok
>>>>
>>>> same here
>>>>
>>>> _______________________________________________
>>>> pve-user mailing list
>>>> pve-user at lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>>



