[PVE-User] Proxmox 4.1 cluster issue
Guy Plunkett
guy at britewhite.net
Wed Feb 17 12:47:05 CET 2016
I’ve just rebuilt all my Proxmox heads and created a new cluster. No HA.
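For reference, the rebuild followed the standard pvecm workflow, roughly like this (a sketch, not my exact command history):

# on the first node
pvecm create Cork-Training
# on each of the other three nodes
pvecm add <IP-of-first-node>
# afterwards, on any node, to check membership and quorum
pvecm status
pvecm nodes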
This was working just fine before upgrading to Proxmox 4.1.
Within 5 minutes of adding all 4 systems to the cluster, proxmox03 and proxmox01 had dropped from the cluster group.
I’m seeing the following filling up the logs:
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
Feb 17 11:42:23 proxmox04 corosync[34115]: [TOTEM ] Retransmit List: 4d7 4d8 4d9 4da 4db 4dc 4dd 4de 4df 4e0 4e1 4e2 4e
[... the same Retransmit List line repeats continuously ...]
Feb 17 11:44:48 proxmox01 corosync[3195]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 17 11:44:54 proxmox01 corosync[3195]: [TOTEM ] A new membership (10.240.0.100:220) was formed. Members l
Feb 17 11:44:54 proxmox01 corosync[3195]: [TOTEM ] Failed to receive the leave message. failed: 3 1
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] notice: members: 4/3172
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [status] notice: members: 4/3172
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [status] notice: node lost quorum
Feb 17 11:44:54 proxmox01 corosync[3195]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Feb 17 11:44:54 proxmox01 corosync[3195]: [QUORUM] Members[1]: 4
Feb 17 11:44:54 proxmox01 corosync[3195]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] crit: received write while not quorate - trigger resync
Feb 17 11:44:54 proxmox01 pmxcfs[3172]: [dcdb] crit: leaving CPG group
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: start cluster connection
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: members: 4/3172
Feb 17 11:44:55 proxmox01 pmxcfs[3172]: [dcdb] notice: all data is up to date
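If it helps narrow this down, I’m happy to run a multicast test between the blades — something along these lines (just a sketch, assuming omping is installed on all four nodes and run on each of them at the same time):

omping -c 600 -i 1 -q proxmox01 proxmox02 proxmox03 proxmox04

If that reports multicast loss, IGMP snooping on the blade chassis switch would be my first suspect.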
----
Guy
> On 17 Feb 2016, at 07:23, Thomas Lamprecht <t.lamprecht at proxmox.com> wrote:
>
> Note that /etc/cluster/cluster.conf isn't needed anymore; everything cluster-relevant will be read from /etc/pve/corosync.conf (which looks good as far as I can see).
>
> You said you upgraded; are you really _really_ sure you did not miss a step (no offense)?
>
> I assume you rebuilt the cluster cleanly with pvecm addnode <...>?
>
> Can you also post your /etc/hostname and /etc/network/interfaces?
> That said, it seems to be able to connect initially, so they should be fine...
>
>
> proxmox04 seems to be the problem, as the others can connect just fine.
>
> Can you post what's happening there with:
> $ journalctl -u corosync.service -u pve-cluster.service -b
>
> That way we filter out (possibly) irrelevant other logging.
>
> cheers,
> Thomas
>
> On 02/16/2016 07:46 PM, Guy Plunkett wrote:
>> Hello,
>>
>> I’ve upgraded my Dell M1000 blade centre to Proxmox 4.1. The upgrade seemed to go fine; however, I can’t keep all 4 nodes connected at once. It works for a short time, then one node will disappear. I can still SSH to it just fine, and after restarting corosync and pve-cluster it will join again, but shortly afterwards another node will disappear.
>>
>> Eventually a node crashes and restarts. There is nothing in the syslogs as to why it crashed.
>>
>> I’ve spent 2 days fighting with this to try and resolve it. This was working just fine on 3.x.
>>
>> Please can someone help here? I’m pulling my hair out trying to get this working, and I don’t have much left!
>>
>> Cheers,
>> —Guy
>>
>> Feb 16 16:32:50 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35536) was formed. Members
>> Feb 16 16:32:50 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:32:50 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:32:53 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35540) was formed. Members
>> Feb 16 16:32:53 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:32:53 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:32:56 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35544) was formed. Members
>> Feb 16 16:32:56 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:32:56 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:32:59 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35548) was formed. Members
>> Feb 16 16:32:59 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:32:59 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:33:02 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35552) was formed. Members
>> Feb 16 16:33:02 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:33:02 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:33:05 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35556) was formed. Members
>> Feb 16 16:33:05 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:33:05 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:33:08 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35560) was formed. Members
>> Feb 16 16:33:08 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:33:08 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:33:11 proxmox01 corosync[5747]: [TOTEM ] A new membership (10.240.0.100:35564) was formed. Members
>> Feb 16 16:33:11 proxmox01 corosync[5747]: [QUORUM] Members[3]: 4 3 2
>> Feb 16 16:33:11 proxmox01 corosync[5747]: [MAIN ] Completed service synchronization, ready to provide service.
>> Feb 16 16:36:45 proxmox01 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2723" x-info="http://www.rsyslog.com"] start
>> Feb 16 16:36:45 proxmox01 systemd-modules-load[999]: Module 'fuse' is builtin
>> Feb 16 16:36:45 proxmox01 systemd-modules-load[999]: Inserted module 'vhost_net'
>> Feb 16 16:36:45 proxmox01 hdparm[1031]: Setting parameters of disc: (none).
>> Feb 16 16:36:45 proxmox01 lvm[1280]: 3 logical volume(s) in volume group "pve" now active
>>
>>
>>
>> # cat /etc/cluster/cluster.conf
>> <?xml version="1.0"?>
>> <cluster name="Cork-Training" config_version="6">
>>
>> <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
>> </cman>
>>
>> <clusternodes>
>>   <clusternode name="proxmox01" votes="1" nodeid="1"/>
>>   <clusternode name="proxmox02" votes="1" nodeid="2"/>
>>   <clusternode name="proxmox03" votes="1" nodeid="3"/>
>>   <clusternode name="proxmox04" votes="1" nodeid="4"/>
>> </clusternodes>
>>
>> </cluster>
>>
>> # cat /etc/pve/corosync.conf
>> logging {
>>   debug: off
>>   to_syslog: yes
>> }
>>
>> nodelist {
>>   node {
>>     name: proxmox04
>>     nodeid: 1
>>     quorum_votes: 1
>>     ring0_addr: proxmox04
>>   }
>>
>>   node {
>>     name: proxmox03
>>     nodeid: 2
>>     quorum_votes: 1
>>     ring0_addr: proxmox03
>>   }
>>
>>   node {
>>     name: proxmox02
>>     nodeid: 3
>>     quorum_votes: 1
>>     ring0_addr: proxmox02
>>   }
>>
>>   node {
>>     name: proxmox01
>>     nodeid: 4
>>     quorum_votes: 1
>>     ring0_addr: proxmox01
>>   }
>> }
>>
>> quorum {
>>   provider: corosync_votequorum
>> }
>>
>> totem {
>>   cluster_name: Cork-Training
>>   config_version: 6
>>   ip_version: ipv4
>>   secauth: on
>>   version: 2
>>   interface {
>>     bindnetaddr: 10.240.0.100
>>     ringnumber: 0
>>   }
>> }
>>
>>
>>
>>
>> ----
>> Guy
>>
>>
>>
>>
>>