[PVE-User] troubles creating a cluster

Adam Weremczuk adamw at matrixscience.com
Tue Oct 30 17:28:49 CET 2018


It doesn't appear to be related to /etc/hosts.
I've reverted them to defaults on all systems, commented out IPv6 
sections and restarted all nodes.
The problem on node1 (lion) persists:

systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
    Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; 
vendor preset: enabled)
    Active: active (running) since Tue 2018-10-30 16:18:10 GMT; 3min 7s ago
   Process: 1864 ExecStartPost=/usr/bin/pvecm updatecerts --silent 
(code=exited, status=0/SUCCESS)
   Process: 1819 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
  Main PID: 1853 (pmxcfs)
     Tasks: 6 (limit: 4915)
    Memory: 46.4M
       CPU: 699ms
    CGroup: /system.slice/pve-cluster.service
            └─1853 /usr/bin/pmxcfs

Oct 30 16:18:08 lion pmxcfs[1853]: [dcdb] crit: can't initialize service
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: cpg_initialize failed: 2
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: can't initialize service
Oct 30 16:18:10 lion systemd[1]: Started The Proxmox VE cluster filesystem.
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: update cluster info 
(cluster name  MS-HA-Cluster, version = 1)
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: node has quorum
Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: members: 1/1853
Oct 30 16:18:14 lion pmxcfs[1853]: [dcdb] notice: all data is up to date
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: members: 1/1853
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: all data is up to date
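Reading the journal above closely: the two crit lines are stamped 16:18:08, before systemd reports the unit started at 16:18:10, and by 16:18:14 the node logs quorum and up-to-date data. So on this boot the cpg_initialize failure (pmxcfs apparently unable to reach corosync yet) looks transient. A throwaway sketch, assuming the syslog-style lines shown above, that flags whether the last crit is followed by a quorum notice:

```python
import re

# Sample lines copied from the journal output above.
JOURNAL = """\
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: cpg_initialize failed: 2
Oct 30 16:18:08 lion pmxcfs[1853]: [status] crit: can't initialize service
Oct 30 16:18:10 lion systemd[1]: Started The Proxmox VE cluster filesystem.
Oct 30 16:18:14 lion pmxcfs[1853]: [status] notice: node has quorum
"""

def recovered(journal: str) -> bool:
    """True if the last 'crit:' line precedes a later 'node has quorum'."""
    last_crit = last_quorum = -1
    for i, line in enumerate(journal.splitlines()):
        if re.search(r"\bcrit:", line):
            last_crit = i
        if "node has quorum" in line:
            last_quorum = i
    return last_quorum > last_crit >= 0

print(recovered(JOURNAL))  # True: the failure was transient on this boot
```

If this stayed False across restarts, the next place to look would be corosync itself (e.g. `systemctl status corosync` and `pvecm status`).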


On 30/10/18 15:06, Adam Weremczuk wrote:
> I have indeed modified /etc/hosts on all nodes.
> That's because DNS will be served from one of the containers on the
> cluster, and I don't want the cluster nodes to rely on DNS when
> communicating with each other.
> Maybe I'm trying to duplicate what Proxmox already does under the hood?
>
> Anyway, my hosts files look like this:
>
> node1
> 192.168.8.101 node1.example.com node1 pvelocalhost
> 192.168.8.102 node2.example.com node2
> 192.168.8.103 node3.example.com node3
>
> node2
> 192.168.8.101 node1.example.com node1
> 192.168.8.102 node2.example.com node2 pvelocalhost
> 192.168.8.103 node3.example.com node3
>
> node3
> 192.168.8.101 node1.example.com node1
> 192.168.8.102 node2.example.com node2
> 192.168.8.103 node3.example.com node3 pvelocalhost
>
> Plus an IPv6 section (identical on all nodes), which I should probably
> comment out:
>
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
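Since the thread hinges on these files agreeing, here is a small sketch (IPs and names copied verbatim from the listings above; an illustrative consistency check, not a Proxmox tool) that verifies all three files map every cluster hostname to the same IP, and that each file tags exactly one address, its own, as pvelocalhost:

```python
# Hosts file contents as posted above, keyed by node.
HOSTS = {
    "node1": """\
192.168.8.101 node1.example.com node1 pvelocalhost
192.168.8.102 node2.example.com node2
192.168.8.103 node3.example.com node3
""",
    "node2": """\
192.168.8.101 node1.example.com node1
192.168.8.102 node2.example.com node2 pvelocalhost
192.168.8.103 node3.example.com node3
""",
    "node3": """\
192.168.8.101 node1.example.com node1
192.168.8.102 node2.example.com node2
192.168.8.103 node3.example.com node3 pvelocalhost
""",
}

def parse(text):
    """Map each hostname/alias to its IP; collect IPs tagged pvelocalhost."""
    mapping, local = {}, []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            if name == "pvelocalhost":
                local.append(ip)
            else:
                mapping[name] = ip
    return mapping, local

def check(hosts):
    """All nodes must see identical name->IP mappings, and each file must
    tag exactly one IP (its own) as pvelocalhost."""
    parsed = {node: parse(text) for node, text in hosts.items()}
    maps = [m for m, _ in parsed.values()]
    assert all(m == maps[0] for m in maps), "hosts files disagree"
    for node, (m, local) in parsed.items():
        assert len(local) == 1 and local[0] == m[node], f"{node}: bad pvelocalhost"
    return True

print(check(HOSTS))  # True: the listings above are mutually consistent
```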
>
>
> On 30/10/18 14:54, Gilberto Nunes wrote:
>> How about the /etc/hosts file?
>> Remember that Proxmox needs to know its own IP and hostname
>> correctly in order to start the CRM accordingly.
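Gilberto's point can be sketched as a quick check: the cluster stack misbehaves when a node's own hostname resolves to a loopback address (e.g. a leftover 127.0.1.1 line in /etc/hosts) rather than its LAN IP. A minimal, hypothetical helper:

```python
import ipaddress
import socket

def resolves_to_loopback(hostname: str) -> bool:
    """Return True if hostname resolves to a loopback address.

    pmxcfs needs the node's own hostname to resolve to its real LAN IP;
    a 127.x mapping in /etc/hosts is a common cause of cluster trouble.
    """
    ip = socket.gethostbyname(hostname)
    return ipaddress.ip_address(ip).is_loopback

# "localhost" is expected to map to 127.0.0.1 on a standard system.
print(resolves_to_loopback("localhost"))
```

Run with the node's own hostname (here, `lion`); a True result would point straight at the hosts file.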
>> ---
>> Gilberto Nunes Ferreira
>>
>> (47) 3025-5907
>> (47) 99676-7530 - Whatsapp / Telegram
>>
>> Skype: gilberto.nunes36
>>
>> On Tue, 30 Oct 2018 at 11:47, Adam Weremczuk
>> <adamw at matrixscience.com> wrote:
>>
>>     Yes, I have 3 nodes (2 x Lenovo servers + a VM) all on the same
>>     LAN with
>>     static IPv4 addresses.
>>     They can happily ping each other and Proxmox web GUI looks ok on
>>     all 3.
>>     No IPv6 in use.
>>
>>     "systemctl status pve-cluster.service" looks clean on the other
>> nodes, but on this troublesome one it returns:
>>
>>     Active: active (running)
>>     (...)
>>     Oct 30 14:17:10 lion pmxcfs[18003]: [dcdb] crit: can't initialize
>>     service
>>     Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: cpg_initialize
>>     failed: 2
>>     Oct 30 14:17:10 lion pmxcfs[18003]: [status] crit: can't
>>     initialize service
>>
>>
>>     On 30/10/18 14:38, Gilberto Nunes wrote:
>>     > Hi
>>     >
>>     > It seems to be a problem with the network connection between
>>     > the servers.
>>     > Can they ping each other?
>>     > Is this a separate network, isolated from your LAN?
>>     >
>>     > ---
>>     > Gilberto Nunes Ferreira
>>     >
>>     > (47) 3025-5907
>>     > (47) 99676-7530 - Whatsapp / Telegram
>>     >
>>     > Skype: gilberto.nunes36
>>     >
>>     > On Tue, 30 Oct 2018 at 11:36, Adam Weremczuk
>>     > <adamw at matrixscience.com> wrote:
>>     >
>>     >> Hi all,
>>     >>
>>     >> My errors:
>>     >>
>>     >> Connection error 500: RPCEnvironment init request failed:
>>     Unable to load
>>     >> access control list: Connection refused
>>     >>
>>     >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[1] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[2] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:06 lion pveproxy[14464]: ipcc_send_rec[3] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[1] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[2] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:06 lion pvesr[17960]: ipcc_send_rec[3] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:06 lion pvesr[17960]: Unable to load access
>>     control list:
>>     >> Connection refused
>>     >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Main process
>>     exited,
>>     >> code=exited, status=111/n/a
>>     >> Oct 30 14:17:06 lion systemd[1]: Failed to start Proxmox VE
>>     replication
>>     >> runner.
>>     >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Unit entered
>>     failed state.
>>     >> Oct 30 14:17:06 lion systemd[1]: pvesr.service: Failed with 
>> result
>>     >> 'exit-code'.
>>     >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[1] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[2] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:07 lion pveproxy[17194]: ipcc_send_rec[3] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:07 lion ntpd[1700]: Soliciting pool server
>>     2001:4860:4806:8::
>>     >> Oct 30 14:17:07 lion pve-ha-lrm[1980]: updating service status 
>> from
>>     >> manager failed: Connection refused
>>     >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[1] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[2] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:08 lion pveproxy[17194]: ipcc_send_rec[3] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[1] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[2] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[3] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:08 lion pvestatd[1879]: ipcc_send_rec[4] failed:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:08 lion pvestatd[1879]: status update error:
>>     Connection
>>     >> refused
>>     >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[1] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[2] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:09 lion pveproxy[17194]: ipcc_send_rec[3] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[1] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[2] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:10 lion pveproxy[17194]: ipcc_send_rec[3] failed:
>>     >> Connection refused
>>     >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: State
>>     >> 'stop-sigterm' timed out. Killing.
>>     >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Killing
>>     process
>>     >> 1813 (pmxcfs) with signal SIGKILL.
>>     >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Main 
>> process
>>     >> exited, code=killed, status=9/KILL
>>     >> Oct 30 14:17:10 lion systemd[1]: Stopped The Proxmox VE cluster
>>     filesystem.
>>     >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Unit 
>> entered
>>     >> failed state.
>>     >> Oct 30 14:17:10 lion systemd[1]: pve-cluster.service: Failed
>>     with result
>>     >> 'timeout'.
>>     >>
>>     >> System info:
>>     >>
>>     >> pveversion -v
>>     >> proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
>>     >> pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
>>     >> pve-kernel-4.15: 5.2-1
>>     >> pve-kernel-4.15.17-1-pve: 4.15.17-9
>>     >> corosync: 2.4.2-pve5
>>     >> criu: 2.11.1-1~bpo90
>>     >> glusterfs-client: 3.8.8-1
>>     >> ksm-control-daemon: 1.2-2
>>     >> libjs-extjs: 6.0.1-2
>>     >> libpve-access-control: 5.0-8
>>     >> libpve-apiclient-perl: 2.0-5
>>     >> libpve-common-perl: 5.0-40
>>     >> libpve-guest-common-perl: 2.0-18
>>     >> libpve-http-server-perl: 2.0-11
>>     >> libpve-storage-perl: 5.0-23
>>     >> libqb0: 1.0.1-1
>>     >> lvm2: 2.02.168-pve6
>>     >> lxc-pve: 3.0.2+pve1-3
>>     >> lxcfs: 3.0.2-2
>>     >> novnc-pve: 1.0.0-2
>>     >> proxmox-widget-toolkit: 1.0-20
>>     >> pve-cluster: 5.0-30
>>     >> pve-container: 2.0-23
>>     >> pve-docs: 5.2-8
>>     >> pve-firewall: 3.0-14
>>     >> pve-firmware: 2.0-5
>>     >> pve-ha-manager: 2.0-5
>>     >> pve-i18n: 1.0-6
>>     >> pve-libspice-server1: 0.12.8-3
>>     >> pve-qemu-kvm: 2.11.1-5
>>     >> pve-xtermjs: 1.0-5
>>     >> qemu-server: 5.0-38
>>     >> smartmontools: 6.5+svn4324-1
>>     >> spiceterm: 3.0-5
>>     >> vncterm: 1.5-3
>>     >> zfsutils-linux: 0.7.11-pve1~bpo1
>>     >>
>>     >> Any idea what's wrong with my (fresh and default) installation?
>>     >>
>>     >> Thanks,
>>     >> Adam
>>     >>
>>     >> _______________________________________________
>>     >> pve-user mailing list
>>     >> pve-user at pve.proxmox.com
>>     >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>     >>
>>
>