[pve-devel] [PATCH cluster v3 12/14] api/cluster: create cluster in worker

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Dec 19 14:55:00 CET 2017


On Tue, Dec 19, 2017 at 12:52:37PM +0100, Thomas Lamprecht wrote:
> This may need a bit longer and we get a nice task log entry with
> this.

but we also get the following error message. at first glance it looks
like this happens because the task has already finished and the tasklist
broadcast fails, but I am not sure:

$ pvecm create test
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem
ipcc_send_rec[4] failed: Transport endpoint is not connected

adding an artificial delay before returning from the forked worker does
not help, so it does not seem like it's just the startup of
pmxcfs/corosync:

Dec 19 14:48:41 clustertest01 pvecm[3711]: <root@pam> starting task UPID:clustertest01:00000E80:00008562:5A3918B9:clustercreate::root@pam:
Dec 19 14:48:41 clustertest01 systemd[1]: Stopping The Proxmox VE cluster filesystem...
Dec 19 14:48:41 clustertest01 pmxcfs[2804]: [main] notice: teardown filesystem
Dec 19 14:48:41 clustertest01 pve-ha-lrm[2953]: unable to write lrm status file - unable to open file '/etc/pve/nodes/clustertest01/lrm_status.tmp.2953' - No such file or directory
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:43 clustertest01 pmxcfs[2804]: [main] notice: exit proxmox configuration filesystem (0)
Dec 19 14:48:43 clustertest01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Dec 19 14:48:43 clustertest01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Dec 19 14:48:43 clustertest01 pmxcfs[3721]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Dec 19 14:48:43 clustertest01 pmxcfs[3721]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [quorum] crit: quorum_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [quorum] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [confdb] crit: cmap_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [confdb] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [dcdb] crit: cpg_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [dcdb] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [status] crit: cpg_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [status] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[1] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: Unable to load access control list: Connection refused
Dec 19 14:48:43 clustertest01 systemd[1]: Started The Proxmox VE cluster filesystem.
Dec 19 14:48:43 clustertest01 systemd[1]: Starting Corosync Cluster Engine...
Dec 19 14:48:43 clustertest01 corosync[3754]:  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Dec 19 14:48:43 clustertest01 corosync[3754]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Dec 19 14:48:43 clustertest01 corosync[3754]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Dec 19 14:48:43 clustertest01 corosync[3754]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Dec 19 14:48:43 clustertest01 corosync[3754]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 19 14:48:43 clustertest01 corosync[3754]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec 19 14:48:43 clustertest01 corosync[3754]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 19 14:48:43 clustertest01 corosync[3754]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [TOTEM ] The network interface [10.0.0.11] is now up.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [QB    ] server name: cmap
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [QB    ] server name: cfg
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [QB    ] server name: cpg
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD    ] resource load_15min missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD    ] resource memory_used missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [WD    ] no resources configured.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [QUORUM] Using quorum provider corosync_votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [QUORUM] This node is within the primary component and will provide service.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [QUORUM] Members[0]:
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [QB    ] server name: votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 19 14:48:44 clustertest01 corosync[3754]: info    [QB    ] server name: quorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [TOTEM ] A new membership (10.0.0.11:4) was formed. Members joined: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [QUORUM] Members[1]: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 19 14:48:44 clustertest01 systemd[1]: Started Corosync Cluster Engine.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [TOTEM ] The network interface [10.0.0.11] is now up.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QB    ] server name: cmap
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QB    ] server name: cfg
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QB    ] server name: cpg
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [WD    ] resource load_15min missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [WD    ] resource memory_used missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [WD    ] no resources configured.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QUORUM] Using quorum provider corosync_votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QUORUM] This node is within the primary component and will provide service.
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QUORUM] Members[0]:
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QB    ] server name: votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QB    ] server name: quorum
Dec 19 14:48:44 clustertest01 corosync[3754]:  [TOTEM ] A new membership (10.0.0.11:4) was formed. Members joined: 1
Dec 19 14:48:44 clustertest01 corosync[3754]:  [QUORUM] Members[1]: 1
Dec 19 14:48:44 clustertest01 corosync[3754]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[4] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: status update error: Connection refused
Dec 19 14:48:46 clustertest01 pve-ha-lrm[2953]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: update cluster info (cluster name  test, version = 1)
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: node has quorum
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [dcdb] notice: members: 1/3735
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [dcdb] notice: all data is up to date
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: members: 1/3735
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: all data is up to date
Dec 19 14:48:58 clustertest01 pvecm[3711]: <root@pam> end task UPID:clustertest01:00000E80:00008562:5A3918B9:clustercreate::root@pam: OK
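
to see at a glance which daemons hit the IPC error around the restart,
the log above can be tallied per process. this is only a rough sketch:
the function name count_ipcc_failures is made up for illustration, and it
assumes syslog-style lines ("Mon DD HH:MM:SS host proc[pid]: message") on
stdin, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Proxmox tool): count
# "ipcc_send_rec[N] failed" lines per process name.
# Assumes syslog-style input where field 5 is "proc[pid]:".
count_ipcc_failures() {
    awk '/ipcc_send_rec\[[0-9]+\] failed/ {
        proc = $5
        sub(/\[[0-9]+\]:$/, "", proc)   # strip the trailing "[pid]:"
        n[proc]++
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

it could be fed from the journal for the time window in question, e.g.
journalctl --since "2017-12-19 14:48:40" --until "2017-12-19 14:49:00" | count_ipcc_failures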

> ---
> 
> new in v3
> 
>  data/PVE/API2/ClusterConfig.pm | 44 +++++++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
> index 07e92be..123cecb 100644
> --- a/data/PVE/API2/ClusterConfig.pm
> +++ b/data/PVE/API2/ClusterConfig.pm
> @@ -100,38 +100,42 @@ __PACKAGE__->register_method ({
>  	    },
>  	},
>      },
> -    returns => { type => 'null' },
> -
> +    returns => { type => 'string' },
>      code => sub {
>  	my ($param) = @_;
>  
>  	-f $clusterconf && die "cluster config '$clusterconf' already exists\n";
>  
> -	PVE::Cluster::setup_sshd_config(1);
> -	PVE::Cluster::setup_rootsshconfig();
> -	PVE::Cluster::setup_ssh_keys();
> +	my $rpcenv = PVE::RPCEnvironment::get();
> +	my $authuser = $rpcenv->get_user();
>  
> -	PVE::Tools::run_command(['/usr/sbin/corosync-keygen', '-lk', $authfile])
> -	    if !-f $authfile;
> -	die "no authentication key available\n" if -f !$authfile;
> +	my $worker = sub {
> +	    PVE::Cluster::setup_sshd_config(1);
> +	    PVE::Cluster::setup_rootsshconfig();
> +	    PVE::Cluster::setup_ssh_keys();
>  
> -	my $nodename = PVE::INotify::nodename();
> +	    PVE::Tools::run_command(['/usr/sbin/corosync-keygen', '-lk', $authfile])
> +		if !-f $authfile;
> +	    die "no authentication key available\n" if !-f $authfile;
>  
> -	# get the corosync basis config for the new cluster
> -	my $config = PVE::Corosync::create_conf($nodename, %$param);
> +	    my $nodename = PVE::INotify::nodename();
>  
> -	print "Writing corosync config to /etc/pve/corosync.conf\n";
> -	PVE::Corosync::atomic_write_conf($config);
> +	    # get the corosync basis config for the new cluster
> +	    my $config = PVE::Corosync::create_conf($nodename, %$param);
>  
> -	my $local_ip_address = PVE::Cluster::remote_node_ip($nodename);
> -	PVE::Cluster::ssh_merge_keys();
> -	PVE::Cluster::gen_pve_node_files($nodename, $local_ip_address);
> -	PVE::Cluster::ssh_merge_known_hosts($nodename, $local_ip_address, 1);
> +	    print "Writing corosync config to /etc/pve/corosync.conf\n";
> +	    PVE::Corosync::atomic_write_conf($config);
>  
> -	print "Restart corosync and cluster filesystem\n";
> -	PVE::Tools::run_command('systemctl restart corosync pve-cluster');
> +	    my $local_ip_address = PVE::Cluster::remote_node_ip($nodename);
> +	    PVE::Cluster::ssh_merge_keys();
> +	    PVE::Cluster::gen_pve_node_files($nodename, $local_ip_address);
> +	    PVE::Cluster::ssh_merge_known_hosts($nodename, $local_ip_address, 1);
>  
> -	return undef;
> +	    print "Restart corosync and cluster filesystem\n";
> +	    PVE::Tools::run_command('systemctl restart corosync pve-cluster');
> +	};
> +
> +	return $rpcenv->fork_worker('clustercreate', '',  $authuser, $worker);
>  }});
>  
>  __PACKAGE__->register_method({
> -- 
> 2.11.0