[PVE-User] troubles creating a cluster
Woods, Ken A (DNR)
ken.woods at alaska.gov
Tue Oct 30 18:19:31 CET 2018
Or, rather than reinstalling: make sure that multicast is enabled and that omping works between the nodes, as mentioned in the docs.
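For reference, the multicast check from the Proxmox docs is run with omping on all nodes at roughly the same time. A sketch, using the example node names from the /etc/hosts files posted further down this thread (substitute your own; omping must be installed first, e.g. "apt-get install omping"):

```shell
# Quick check: run the identical command on every node simultaneously;
# near-0% loss on the multicast column means multicast is working.
omping -c 10000 -i 0.001 -F -q node1 node2 node3

# Longer test (~10 minutes) to catch IGMP snooping / querier timeouts,
# which typically show up as multicast dying after a few minutes:
omping -c 600 -i 1 -q node1 node2 node3
```

These need a live multi-node network, so they are shown as a fragment rather than something runnable standalone.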
> On Oct 30, 2018, at 08:37, Gilberto Nunes <gilberto.nunes32 at gmail.com> wrote:
>
> Consider reinstalling Proxmox.
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
>
> On Tue, 30 Oct 2018 at 13:28, Adam Weremczuk <adamw at matrixscience.com>
> wrote:
>
>> It doesn't appear to be related to /etc/hosts.
>> I've reverted them to defaults on all systems, commented out IPv6
>> sections and restarted all nodes.
>> The problem on node1 (lion) persists:
>>
>> systemctl status pve-cluster.service
>> ● pve-cluster.service - The Proxmox VE cluster filesystem
>> Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled;
>> vendor preset: enabled)
>> Active: active (running) since Tue 2018-10-30 16:18:10 GMT; 3min 7s ago
>> Process: 1864 ExecStartPost=/usr/bin/pvecm updatecerts --silent
>> (code=exited, status=0/SUCCESS)
>> Process: 1819 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
>> Main PID: 1853 (pmxcfs)
>> Tasks: 6 (limit: 4915)
>> Memory: 46.4M
>> CPU: 699ms
>> CGroup: /system.slice/pve-cluster.service
>> └─1853 /usr/bin/pmxcfs
>>
>> Oct 30 16:18:08 lion pmxcfs[1853]: [dcdb] crit: can't initialize service
>> Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: cpg_initialize failed: 2
>> Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: can't initialize service
>> Oct 30 16:18:10 lion systemd[1]: Started The Proxmox VE cluster filesystem.
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: update cluster info
>> (cluster name MS-HA-Cluster, version = 1)
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: node has quorum
>> Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: members: 1/1853
>> Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: all data is up to date
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: members: 1/1853
>> Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: all data is up to date
>>
>>
>>> On 30/10/18 15:06, Adam Weremczuk wrote:
>>> Indeed, I have modified /etc/hosts on all nodes.
>>> That's because DNS will be served from one of the containers on the cluster,
>>> and I don't want the cluster nodes to rely on DNS when communicating with
>>> each other.
>>> Maybe I'm trying to duplicate what Proxmox already does under the hood?
>>>
>>> Anyway, my hosts files look like this:
>>>
>>> node1
>>> 192.168.8.101 node1.example.com node1 pvelocalhost
>>> 192.168.8.102 node2.example.com node2
>>> 192.168.8.103 node3.example.com node3
>>>
>>> node2
>>> 192.168.8.101 node1.example.com node1
>>> 192.168.8.102 node2.example.com node2 pvelocalhost
>>> 192.168.8.103 node3.example.com node3
>>>
>>> node3
>>> 192.168.8.101 node1.example.com node1
>>> 192.168.8.102 node2.example.com node2
>>> 192.168.8.103 node3.example.com node3 pvelocalhost
>>>
>>> Plus an IPv6 section (identical on all nodes), which I should probably comment out:
>>>
>>> ::1 ip6-localhost ip6-loopback
>>> fe00::0 ip6-localnet
>>> ff00::0 ip6-mcastprefix
>>> ff02::1 ip6-allnodes
>>> ff02::2 ip6-allrouters
>>> ff02::3 ip6-allhosts
>>>
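[Editor's note: a stale or mismatched name-to-IP mapping here is easy to check mechanically. A minimal sketch that extracts the IPv4 address each node name maps to, using the node1 example above as sample data; on a real node you would point it at /etc/hosts instead, and verify that the node's own name never maps to 127.0.0.1:

```shell
# Copy of the node1 /etc/hosts example from this thread, as sample data.
cat > /tmp/hosts.sample <<'EOF'
192.168.8.101 node1.example.com node1 pvelocalhost
192.168.8.102 node2.example.com node2
192.168.8.103 node3.example.com node3
EOF

# For each node name, print the address it maps to; each name should
# resolve to exactly one address, identical on every node in the cluster.
for h in node1 node2 node3; do
    awk -v h="$h" '{ for (i = 2; i <= NF; i++) if ($i == h) print h, $1 }' /tmp/hosts.sample
done
```
]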
>>>
>>>> On 30/10/18 14:54, Gilberto Nunes wrote:
>>>> How about the /etc/hosts file?
>>>> Remember that Proxmox needs to know its own IP and hostname
>>>> correctly in order to start the CRM.
>>>>
>>>> On Tue, 30 Oct 2018 at 11:47, Adam Weremczuk
>>>> <adamw at matrixscience.com> wrote:
>>>>
>>>> Yes, I have 3 nodes (2 x Lenovo servers + a VM), all on the same LAN with static IPv4 addresses.
>>>> They can happily ping each other and the Proxmox web GUI looks OK on all 3.
>>>> No IPv6 in use.
>>>>
>>>> "systemctl status pve-cluster.service" looks clean on the other nodes,
>>>> but on this troublesome one it returns:
>>>>
>>>> Active: active (running)
>>>> (...)
>>>> Oct 30 14:17:10 lion pmxcfs[18003]: [dcdb] crit: can't initialize service
>>>> Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: cpg_initialize failed: 2
>>>> Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: can't initialize service
>>>>
>>>>
>>>>> On 30/10/18 14:38, Gilberto Nunes wrote:
>>>>> Hi
>>>>>
>>>>> It seems to be a problem with the network connection between the servers.
>>>>> Can they ping each other?
>>>>> Is this a separate network, isolated from your LAN?
>>>>>
>>>>>
>>>>> On Tue, 30 Oct 2018 at 11:36, Adam Weremczuk
>>>>> <adamw at matrixscience.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> My errors:
>>>>>>
>>>>>> Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused
>>>>>>
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:06 lion pvesr[17960]: Unable to load access control list: Connection refused
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
>>>>>> Oct 30 14:17:06 lion systemd[1]: Failed to start Proxmox VE replication runner.
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Unit entered failed state.
>>>>>> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Failed with result 'exit-code'.
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:07 lion ntpd[1700]: Soliciting pool server 2001:4860:4806:8::
>>>>>> Oct 30 14:17:07 lion pve-ha-lrm[1980]: updating service status from manager failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[4] failed: Connection refused
>>>>>> Oct 30 14:17:08 lion pvestatd[1879]: status update error: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Killing process 1813 (pmxcfs) with signal SIGKILL.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
>>>>>> Oct 30 14:17:10 lion systemd[1]: Stopped The Proxmox VE cluster filesystem.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Unit entered failed state.
>>>>>> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Failed with result 'timeout'.
>>>>>>
>>>>>> System info:
>>>>>>
>>>>>> pveversion -v
>>>>>> proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
>>>>>> pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
>>>>>> pve-kernel-4.15: 5.2-1
>>>>>> pve-kernel-4.15.17-1-pve: 4.15.17-9
>>>>>> corosync: 2.4.2-pve5
>>>>>> criu: 2.11.1-1~bpo90
>>>>>> glusterfs-client: 3.8.8-1
>>>>>> ksm-control-daemon: 1.2-2
>>>>>> libjs-extjs: 6.0.1-2
>>>>>> libpve-access-control: 5.0-8
>>>>>> libpve-apiclient-perl: 2.0-5
>>>>>> libpve-common-perl: 5.0-40
>>>>>> libpve-guest-common-perl: 2.0-18
>>>>>> libpve-http-server-perl: 2.0-11
>>>>>> libpve-storage-perl: 5.0-23
>>>>>> libqb0: 1.0.1-1
>>>>>> lvm2: 2.02.168-pve6
>>>>>> lxc-pve: 3.0.2+pve1-3
>>>>>> lxcfs: 3.0.2-2
>>>>>> novnc-pve: 1.0.0-2
>>>>>> proxmox-widget-toolkit: 1.0-20
>>>>>> pve-cluster: 5.0-30
>>>>>> pve-container: 2.0-23
>>>>>> pve-docs: 5.2-8
>>>>>> pve-firewall: 3.0-14
>>>>>> pve-firmware: 2.0-5
>>>>>> pve-ha-manager: 2.0-5
>>>>>> pve-i18n: 1.0-6
>>>>>> pve-libspice-server1: 0.12.8-3
>>>>>> pve-qemu-kvm: 2.11.1-5
>>>>>> pve-xtermjs: 1.0-5
>>>>>> qemu-server: 5.0-38
>>>>>> smartmontools: 6.5+svn4324-1
>>>>>> spiceterm: 3.0-5
>>>>>> vncterm: 1.5-3
>>>>>> zfsutils-linux: 0.7.11-pve1~bpo1
>>>>>>
>>>>>> Any idea what's wrong with my (fresh and default) installation?
>>>>>>
>>>>>> Thanks,
>>>>>> Adam
>>>>>>
>>>>>> _______________________________________________
>>>>>> pve-user mailing list
>>>>>> pve-user at pve.proxmox.com
>>>>>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>>>>>
>>>>
>>>
>>
>>
More information about the pve-user mailing list