[PVE-User] troubles creating a cluster

Woods, Ken A (DNR) ken.woods at alaska.gov
Tue Oct 30 18:19:31 CET 2018


Or, rather than reinstalling: make sure that multicast is enabled and that omping works between the nodes, as mentioned in the docs.
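
For example (node names here are placeholders; run the same command on
all nodes at roughly the same time, per the multicast notes in the docs):

omping -c 10000 -i 0.001 -F -q node1 node2 node3
omping -c 600 -i 1 -q node1 node2 node3

The short, fast run checks basic multicast connectivity; the roughly
ten-minute run catches IGMP snooping querier timeouts. Heavy loss on the
"multicast" lines means corosync won't be able to form a cluster.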

> On Oct 30, 2018, at 08:37, Gilberto Nunes <gilberto.nunes32 at gmail.com> wrote:
> 
> Consider reinstalling Proxmox.
> ---
> Gilberto Nunes Ferreira
> 
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
> 
> Skype: gilberto.nunes36
> 
> 
> 
> 
> 
> On Tue, 30 Oct 2018 at 13:28, Adam Weremczuk <adamw at matrixscience.com> wrote:
> 
>> It doesn't appear to be related to /etc/hosts.
>> I've reverted them to defaults on all systems, commented out IPv6
>> sections and restarted all nodes.
>> The problem on node1 (lion) persists:
>> 
>> systemctl status pve-cluster.service
>> ● pve-cluster.service - The Proxmox VE cluster filesystem
>>    Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled;
>> vendor preset: enabled)
>>    Active: active (running) since Tue 2018-10-30 16:18:10 GMT; 3min 7s ago
>>   Process: 1864 ExecStartPost=/usr/bin/pvecm updatecerts --silent
>> (code=exited, status=0/SUCCESS)
>>   Process: 1819 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
>>  Main PID: 1853 (pmxcfs)
>>     Tasks: 6 (limit: 4915)
>>    Memory: 46.4M
>>       CPU: 699ms
>>    CGroup: /system.slice/pve-cluster.service
>>            └─1853 /usr/bin/pmxcfs
>> 
>> Oct 30 16:18:08 lion pmxcfs[1853]: [dcdb] crit: can't initialize service
>> Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: cpg_initialize failed: 2
>> Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: can't initialize service
>> Oct 30 16:18:10 lion systemd[1]: Started The Proxmox VE cluster filesystem.
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: update cluster info
>> (cluster name  MS-HA-Cluster, version = 1)
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: node has quorum
>> Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: members: 1/1853
>> Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: all data is up to date
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: members: 1/1853
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: all data is up to date
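>>
>> The two "crit" lines at 16:18:08 look like pmxcfs failing to reach
>> corosync before it was fully up. I suppose something like this would
>> confirm whether corosync itself is healthy now:
>>
>> systemctl status corosync
>> corosync-quorumtool -s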
>> 
>> 
>>> On 30/10/18 15:06, Adam Weremczuk wrote:
>>> Indeed, I have modified /etc/hosts on all nodes.
>>> That's because DNS will be served from one of the containers on the
>>> cluster.
>>> I don't want the cluster nodes to rely on DNS when communicating with
>>> each other.
>>> Maybe I'm trying to duplicate what Proxmox already does under the hood?
>>> 
>>> Anyway, my hosts files look like this:
>>> 
>>> node1
>>> 192.168.8.101 node1.example.com node1 pvelocalhost
>>> 192.168.8.102 node2.example.com node2
>>> 192.168.8.103 node3.example.com node3
>>> 
>>> node2
>>> 192.168.8.101 node1.example.com node1
>>> 192.168.8.102 node2.example.com node2 pvelocalhost
>>> 192.168.8.103 node3.example.com node3
>>> 
>>> node3
>>> 192.168.8.101 node1.example.com node1
>>> 192.168.8.102 node2.example.com node2
>>> 192.168.8.103 node3.example.com node3 pvelocalhost
>>> 
>>> Plus an IPv6 section (identical on all nodes), which I should probably comment out:
>>> 
>>> ::1     ip6-localhost ip6-loopback
>>> fe00::0 ip6-localnet
>>> ff00::0 ip6-mcastprefix
>>> ff02::1 ip6-allnodes
>>> ff02::2 ip6-allrouters
>>> ff02::3 ip6-allhosts
>>> 
>>> 
>>>> On 30/10/18 14:54, Gilberto Nunes wrote:
>>>> How about the /etc/hosts file?
>>>> Remember that Proxmox needs to know its own IP and hostname
>>>> correctly in order to start the CRM accordingly.
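>>>>
>>>> A quick sanity check, assuming the node names from your hosts files
>>>> (each node should print its own cluster IP, e.g. 192.168.8.101 on
>>>> node1):
>>>>
>>>> hostname --ip-address
>>>> getent hosts node1 node2 node3
>>>>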
>>>> ---
>>>> Gilberto Nunes Ferreira
>>>> 
>>>> (47) 3025-5907
>>>> (47) 99676-7530 - Whatsapp / Telegram
>>>> 
>>>> Skype: gilberto.nunes36
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, 30 Oct 2018 at 11:47, Adam Weremczuk
>>>> <adamw at matrixscience.com> wrote:
>>>> 
>>>>    Yes, I have 3 nodes (2 x Lenovo servers + a VM), all on the same
>>>>    LAN with static IPv4 addresses.
>>>>    They can happily ping each other, and the Proxmox web GUI looks
>>>>    OK on all 3.
>>>>    No IPv6 in use.
>>>> 
>>>>    "Systemctl status pve-cluster.service" looks clean on the other
>>>> nodes
>>>>    but on this troublesome one returns:
>>>> 
>>>>    Active: active (running)
>>>>    (...)
>>>>    Oct 30 14:17:10 lion pmxcfs[18003]: [dcdb] crit: can't initialize service
>>>>    Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: cpg_initialize failed: 2
>>>>    Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: can't initialize service
>>>> 
>>>> 
>>>>>    On 30/10/18 14:38, Gilberto Nunes wrote:
>>>>> Hi
>>>>> 
>>>>> It seems to be a problem with the network connection between the
>>>>> servers.
>>>>> Can they ping each other?
>>>>> Is this a separate network, isolated from your LAN?
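>>>>>
>>>>> The "ipcc_send_rec ... Connection refused" lines usually mean the
>>>>> local pmxcfs is not answering. Once the network side checks out,
>>>>> one possible way to bring the services back (service names as
>>>>> shipped with PVE 5.x) would be:
>>>>>
>>>>> systemctl restart corosync pve-cluster
>>>>> systemctl restart pvedaemon pveproxy pvestatd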
>>>>> 
>>>>> ---
>>>>> Gilberto Nunes Ferreira
>>>>> 
>>>>> (47) 3025-5907
>>>>> (47) 99676-7530 - Whatsapp / Telegram
>>>>> 
>>>>> Skype: gilberto.nunes36
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, 30 Oct 2018 at 11:36, Adam Weremczuk
>>>>> <adamw at matrixscience.com> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> My errors:
>>>>>> 
>>>>>> Connection error 500: RPCEnvironment init request failed: Unable
>>>>>> to load access control list: Connection refused
>>>>>> 
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: Unable to load access control list: Connection refused
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
>>>>>> Oct 30 14:17:06 lion systemd[1]: Failed to start Proxmox VE replication runner.
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Unit entered failed state.
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Failed with result 'exit-code'.
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion ntpd[1700]: Soliciting pool server 2001:4860:4806:8::
>>>>>> Oct 30 14:17:07 lion pve-ha-lrm[1980]: updating service status from manager failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[4] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: status update error: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Killing process 1813 (pmxcfs) with signal SIGKILL.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
>>>>>> Oct 30 14:17:10 lion systemd[1]: Stopped The Proxmox VE cluster filesystem.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Unit entered failed state.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Failed with result 'timeout'.
>>>>>> 
>>>>>> System info:
>>>>>> 
>>>>>> pveversion -v
>>>>>> proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
>>>>>> pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
>>>>>> pve-kernel-4.15: 5.2-1
>>>>>> pve-kernel-4.15.17-1-pve: 4.15.17-9
>>>>>> corosync: 2.4.2-pve5
>>>>>> criu: 2.11.1-1~bpo90
>>>>>> glusterfs-client: 3.8.8-1
>>>>>> ksm-control-daemon: 1.2-2
>>>>>> libjs-extjs: 6.0.1-2
>>>>>> libpve-access-control: 5.0-8
>>>>>> libpve-apiclient-perl: 2.0-5
>>>>>> libpve-common-perl: 5.0-40
>>>>>> libpve-guest-common-perl: 2.0-18
>>>>>> libpve-http-server-perl: 2.0-11
>>>>>> libpve-storage-perl: 5.0-23
>>>>>> libqb0: 1.0.1-1
>>>>>> lvm2: 2.02.168-pve6
>>>>>> lxc-pve: 3.0.2+pve1-3
>>>>>> lxcfs: 3.0.2-2
>>>>>> novnc-pve: 1.0.0-2
>>>>>> proxmox-widget-toolkit: 1.0-20
>>>>>> pve-cluster: 5.0-30
>>>>>> pve-container: 2.0-23
>>>>>> pve-docs: 5.2-8
>>>>>> pve-firewall: 3.0-14
>>>>>> pve-firmware: 2.0-5
>>>>>> pve-ha-manager: 2.0-5
>>>>>> pve-i18n: 1.0-6
>>>>>> pve-libspice-server1: 0.12.8-3
>>>>>> pve-qemu-kvm: 2.11.1-5
>>>>>> pve-xtermjs: 1.0-5
>>>>>> qemu-server: 5.0-38
>>>>>> smartmontools: 6.5+svn4324-1
>>>>>> spiceterm: 3.0-5
>>>>>> vncterm: 1.5-3
>>>>>> zfsutils-linux: 0.7.11-pve1~bpo1
>>>>>> 
>>>>>> Any idea what's wrong with my (fresh and default) installation?
>>>>>> 
>>>>>> Thanks,
>>>>>> Adam
>>>>>> 
>>>> 
>>> 
>> 
>> 
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

