[PVE-User] After update Ceph monitor shows wrong version in UI and is down and out of quorum
Frank Thommen
f.thommen at dkfz-heidelberg.de
Sat Jan 16 13:26:17 CET 2021
Just to close this thread on the mailing list: I finally opened a support
request with Proxmox, and we are still working on it. It's not an easy case
to solve :-)
Frank
On 08.01.21 13:01, Frank Thommen wrote:
> Could this entry be the result of the fencing which happened when the
> host initially crashed? I assumed that it would automatically be
> unfenced when it comes up again. I never ran any manual "unfencing" (I
> wouldn't know how).
>
> Frank
>
>
>
> On 08.01.21 12:44, Frank Thommen wrote:
>> Yes, /etc/ceph/ceph.conf is identical on all three hosts and there is a
>> mon_host line with the correct IPs. Interestingly, there is a special
>> section for odcf-pve02:
>>
>> -----------
>> [mon.odcf-pve02]
>> public_addr = 192.168.255.2
>> -----------
>>
>> This is the same IP as in the mon_host line. However, there is no
>> equivalent section for the other two nodes.
>>
>> Frank
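One way to see which per-monitor sections a ceph.conf contains is to list its [mon.*] headers. The following is only a minimal sketch and not something posted in the thread: the function name list_mon_sections is made up for illustration, and it assumes each section header sits on its own line, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every [mon.<id>] section
# found in a ceph.conf-style file given as the first argument.
list_mon_sections() {
    awk '/^[ \t]*\[mon\./ {
        sub(/^[ \t]*\[/, "")   # drop leading whitespace and the opening bracket
        sub(/\].*$/, "")       # drop the closing bracket and anything after it
        print
    }' "$1"
}
```

Called as `list_mon_sections /etc/ceph/ceph.conf`, it should print only mon.odcf-pve02 if, as described above, the other two nodes have no section of their own.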
>>
>>
>> On 08.01.21 12:27, Peter Simon wrote:
>>> Hi Frank,
>>>
>>> Is your /etc/ceph/ceph.conf the same on all hosts?
>>>
>>> Is there a line mon host = ip1, ip2, ip3
>>>
>>> and separate sections with [mon.x]
>>> host = hostname
>>> mon addr = ip:6789
>>>
>>> Cheers
>>> Peter
>>>
>>> On 08.01.21 12:05, Frank Thommen wrote:
>>>>
>>>>
>>>> On 08.01.21 11:45, Uwe Sauter wrote:
>>>>>
>>>>>
>>>>> On 08.01.21 11:36, Frank Thommen wrote:
>>>>>>
>>>>>> On 05.01.21 21:17, Frank Thommen wrote:
>>>>>>> On 05.01.21 21:02, Uwe Sauter wrote:
>>>>>>>> There's a paragraph about probing mons on
>>>>>>>>
>>>>>>>> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> I will check that (tomorrow :-)
>>>>>>
>>>>>>
>>>>>> Using the monitor's admin socket on any of the three nodes, I can
>>>>>> query the monitors of 01 and 03 (the good ones) but not of 02 (the
>>>>>> problematic one):
>>>>>>
>>>>>> root@odcf-pve01:~# ceph tell mon.odcf-pve02 mon_status
>>>>>> Error ENOENT: problem getting command descriptions from mon.odcf-pve02
>>>>>> root@odcf-pve01:~#
>>>>>>
>>>>>> The monitor daemon is running on all three and the ports are open.
>>>>>>
>>>>>> Any other ideas?
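Note that `ceph tell` contacts the monitor over the network, whereas the admin socket is a local Unix socket that can only be queried on the monitor's own host. A minimal sketch of such a local query follows. It assumes the default cluster name "ceph" and the default socket directory /var/run/ceph, and the function names mon_asok_path and mon_status_local are hypothetical helpers, not Ceph commands.

```shell
#!/bin/sh
# Hypothetical helper: build the default admin-socket path for a monitor ID.
# Assumes cluster name "ceph" and run directory /var/run/ceph.
mon_asok_path() {
    printf '/var/run/ceph/ceph-mon.%s.asok\n' "$1"
}

# Hypothetical helper: ask a local monitor for its status via the admin socket.
# Must be run on the host where that monitor daemon lives.
mon_status_local() {
    sock=$(mon_asok_path "$1")
    if [ ! -S "$sock" ]; then
        # Not a socket (or missing): nothing to query on this host.
        echo "no admin socket at $sock" >&2
        return 1
    fi
    ceph --admin-daemon "$sock" mon_status
}
```

On odcf-pve02 itself this would be invoked as `mon_status_local odcf-pve02`. Because it bypasses the network path, it may still answer when the monitor is out of quorum, provided the daemon is running.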
>>>>>
>>>>> You could check the permissions on the socket:
>>>>>
>>>>> ss -xln | grep ceph-mon
>>>>> SOCK=$(ss -xln | awk '/ceph-mon/ {print $5}')
>>>>> ls -la ${SOCK}
>>>>>
>>>>> On my host, this shows
>>>>>
>>>>> srwxr-xr-x 1 ceph ceph 0 Dec 20 23:47 /var/run/ceph/ceph-mon.px-alpha-cluster.asok
>>>>
>>>> Same here.
>>>>
>>>> _______________________________________________
>>>> pve-user mailing list
>>>> pve-user at lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>>