[PVE-User] troubles creating a cluster
Adam Weremczuk
adamw at matrixscience.com
Tue Oct 30 17:28:49 CET 2018
It doesn't appear to be related to /etc/hosts.
I've reverted the hosts files to defaults on all systems, commented out
the IPv6 sections and restarted all nodes.
The problem on node1 (lion) persists:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-10-30 16:18:10 GMT; 3min 7s ago
  Process: 1864 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
  Process: 1819 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
 Main PID: 1853 (pmxcfs)
    Tasks: 6 (limit: 4915)
   Memory: 46.4M
      CPU: 699ms
   CGroup: /system.slice/pve-cluster.service
           └─1853 /usr/bin/pmxcfs
Oct 30 16:18:08 lion pmxcfs[1853]: [dcdb] crit: can't initialize service
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: cpg_initialize failed: 2
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: can't initialize service
Oct 30 16:18:10 lion systemd[1]: Started The Proxmox VE cluster filesystem.
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: update cluster info (cluster name MS-HA-Cluster, version = 1)
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: node has quorum
Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: members: 1/1853
Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: all data is up to date
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: members: 1/1853
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: all data is up to date
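
The cpg_initialize errors suggest pmxcfs couldn't reach corosync when it
started (pmxcfs joins the cluster through corosync's CPG API), so the
next thing I'll look at is corosync itself, e.g.:

systemctl status corosync.service
journalctl -b -u corosync | tail -n 50
pvecm status
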
On 30/10/18 15:06, Adam Weremczuk wrote:
> I have modified /etc/hosts on all nodes, indeed.
> That's because DNS will be served from one of the containers on the
> cluster, and I don't want the cluster nodes to rely on DNS when
> communicating with each other.
> Maybe I'm trying to duplicate what Proxmox already does under the hood?
>
> Anyway, my hosts files look like this:
>
> node1
> 192.168.8.101 node1.example.com node1 pvelocalhost
> 192.168.8.102 node2.example.com node2
> 192.168.8.103 node3.example.com node3
>
> node2
> 192.168.8.101 node1.example.com node1
> 192.168.8.102 node2.example.com node2 pvelocalhost
> 192.168.8.103 node3.example.com node3
>
> node3
> 192.168.8.101 node1.example.com node1
> 192.168.8.102 node2.example.com node2
> 192.168.8.103 node3.example.com node3 pvelocalhost
>
> Plus an IPv6 section (identical on all nodes), which I should probably comment out:
>
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
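>
> As a quick sanity check (node1 shown as an example, the same idea
> applies on the other nodes), each node can verify its own entry with:
>
> hostname                # should print node1
> getent hosts node1      # should return 192.168.8.101 from /etc/hosts
> hostname --ip-address   # should print 192.168.8.101, not 127.0.0.1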
>
>
> On 30/10/18 14:54, Gilberto Nunes wrote:
>> HOw about /etc/hosts file?
>> Remember that Proxmox need to know about his IP and hostname
>> correctly, in order to start CRM accordingly
>> ---
>> Gilberto Nunes Ferreira
>>
>> (47) 3025-5907
>> (47) 99676-7530 - Whatsapp / Telegram
>>
>> Skype: gilberto.nunes36
>>
>> On Tue, 30 Oct 2018 at 11:47, Adam Weremczuk
>> <adamw at matrixscience.com> wrote:
>>
>> Yes, I have 3 nodes (2 x Lenovo servers + a VM), all on the same LAN
>> with static IPv4 addresses.
>> They can happily ping each other, and the Proxmox web GUI looks OK on all 3.
>> No IPv6 in use.
>>
>> "systemctl status pve-cluster.service" looks clean on the other nodes,
>> but on this troublesome one it returns:
>>
>> Active: active (running)
>> (...)
>> Oct 30 14:17:10 lion pmxcfs[18003]: [dcdb] crit: can't initialize service
>> Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: cpg_initialize failed: 2
>> Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: can't initialize service
>>
>>
>> On 30/10/18 14:38, Gilberto Nunes wrote:
>> > Hi
>> >
>> > It seems to be a problem with the network connection between the servers.
>> > Can they ping each other?
>> > Is this a separate network, isolated from your LAN?
>> >
>> > ---
>> > Gilberto Nunes Ferreira
>> >
>> > (47) 3025-5907
>> > (47) 99676-7530 - Whatsapp / Telegram
>> >
>> > Skype: gilberto.nunes36
>> >
>> > On Tue, 30 Oct 2018 at 11:36, Adam Weremczuk
>> > <adamw at matrixscience.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> My errors:
>> >>
>> >> Connection error 500: RPCEnvironment init request failed: Unable to load access control list: Connection refused
>> >>
>> >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:06 lion pvesr[17960]: Unable to load access control list: Connection refused
>> >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
>> >> Oct 30 14:17:06 lion systemd[1]: Failed to start Proxmox VE replication runner.
>> >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Unit entered failed state.
>> >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Failed with result 'exit-code'.
>> >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:07 lion ntpd[1700]: Soliciting pool server 2001:4860:4806:8::
>> >> Oct 30 14:17:07 lion pve-ha-lrm[1980]: updating service status from manager failed: Connection refused
>> >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[4] failed: Connection refused
>> >> Oct 30 14:17:08 lion pvestatd[1879]: status update error: Connection refused
>> >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[1] failed: Connection refused
>> >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[2] failed: Connection refused
>> >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[3] failed: Connection refused
>> >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
>> >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Killing process 1813 (pmxcfs) with signal SIGKILL.
>> >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
>> >> Oct 30 14:17:10 lion systemd[1]: Stopped The Proxmox VE cluster filesystem.
>> >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Unit entered failed state.
>> >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Failed with result 'timeout'.
>> >>
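>> >> All of the above looks like the daemons simply can't reach pmxcfs
>> >> over its local IPC (the ipcc_send_rec calls), so the first things
>> >> I plan to check are along these lines:
>> >>
>> >> systemctl status pve-cluster.service
>> >> mountpoint /etc/pve          # pmxcfs should be mounted here
>> >> systemctl restart pve-cluster.service
>> >>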
>> >> System info:
>> >>
>> >> pveversion -v
>> >> proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
>> >> pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
>> >> pve-kernel-4.15: 5.2-1
>> >> pve-kernel-4.15.17-1-pve: 4.15.17-9
>> >> corosync: 2.4.2-pve5
>> >> criu: 2.11.1-1~bpo90
>> >> glusterfs-client: 3.8.8-1
>> >> ksm-control-daemon: 1.2-2
>> >> libjs-extjs: 6.0.1-2
>> >> libpve-access-control: 5.0-8
>> >> libpve-apiclient-perl: 2.0-5
>> >> libpve-common-perl: 5.0-40
>> >> libpve-guest-common-perl: 2.0-18
>> >> libpve-http-server-perl: 2.0-11
>> >> libpve-storage-perl: 5.0-23
>> >> libqb0: 1.0.1-1
>> >> lvm2: 2.02.168-pve6
>> >> lxc-pve: 3.0.2+pve1-3
>> >> lxcfs: 3.0.2-2
>> >> novnc-pve: 1.0.0-2
>> >> proxmox-widget-toolkit: 1.0-20
>> >> pve-cluster: 5.0-30
>> >> pve-container: 2.0-23
>> >> pve-docs: 5.2-8
>> >> pve-firewall: 3.0-14
>> >> pve-firmware: 2.0-5
>> >> pve-ha-manager: 2.0-5
>> >> pve-i18n: 1.0-6
>> >> pve-libspice-server1: 0.12.8-3
>> >> pve-qemu-kvm: 2.11.1-5
>> >> pve-xtermjs: 1.0-5
>> >> qemu-server: 5.0-38
>> >> smartmontools: 6.5+svn4324-1
>> >> spiceterm: 3.0-5
>> >> vncterm: 1.5-3
>> >> zfsutils-linux: 0.7.11-pve1~bpo1
>> >>
>> >> Any idea what's wrong with my (fresh and default) installation?
>> >>
>> >> Thanks,
>> >> Adam
>> >>