[pve-devel] [PATCH cluster v3 12/14] api/cluster: create cluster in worker
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Dec 19 14:55:00 CET 2017
On Tue, Dec 19, 2017 at 12:52:37PM +0100, Thomas Lamprecht wrote:
> This may need a bit longer and we get a nice task log entry with
> this.
But we also get the following error message. At first glance it looks
like this happens because the task has already finished and the tasklist
broadcast fails, but I am not sure:
$ pvecm create test
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
Writing corosync config to /etc/pve/corosync.conf
Restart corosync and cluster filesystem
ipcc_send_rec[4] failed: Transport endpoint is not connected
Adding an artificial delay before returning from the forked worker does
not help, so it does not seem to be just the startup of
pmxcfs/corosync:
Dec 19 14:48:41 clustertest01 pvecm[3711]: <root@pam> starting task UPID:clustertest01:00000E80:00008562:5A3918B9:clustercreate::root@pam:
Dec 19 14:48:41 clustertest01 systemd[1]: Stopping The Proxmox VE cluster filesystem...
Dec 19 14:48:41 clustertest01 pmxcfs[2804]: [main] notice: teardown filesystem
Dec 19 14:48:41 clustertest01 pve-ha-lrm[2953]: unable to write lrm status file - unable to open file '/etc/pve/nodes/clustertest01/lrm_status.tmp.2953' - No such file or directory
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:42 clustertest01 pveproxy[2910]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:42 clustertest01 pve-ha-crm[2990]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:43 clustertest01 pmxcfs[2804]: [main] notice: exit proxmox configuration filesystem (0)
Dec 19 14:48:43 clustertest01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Dec 19 14:48:43 clustertest01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Dec 19 14:48:43 clustertest01 pmxcfs[3721]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Dec 19 14:48:43 clustertest01 pmxcfs[3721]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [quorum] crit: quorum_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [quorum] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [confdb] crit: cmap_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [confdb] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [dcdb] crit: cpg_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [dcdb] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [status] crit: cpg_initialize failed: 2
Dec 19 14:48:43 clustertest01 pmxcfs[3735]: [status] crit: can't initialize service
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[1] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:43 clustertest01 pvecm[3739]: Unable to load access control list: Connection refused
Dec 19 14:48:43 clustertest01 systemd[1]: Started The Proxmox VE cluster filesystem.
Dec 19 14:48:43 clustertest01 systemd[1]: Starting Corosync Cluster Engine...
Dec 19 14:48:43 clustertest01 corosync[3754]: [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Dec 19 14:48:43 clustertest01 corosync[3754]: notice [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Dec 19 14:48:43 clustertest01 corosync[3754]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Dec 19 14:48:43 clustertest01 corosync[3754]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Dec 19 14:48:43 clustertest01 corosync[3754]: notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 19 14:48:43 clustertest01 corosync[3754]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec 19 14:48:43 clustertest01 corosync[3754]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 19 14:48:43 clustertest01 corosync[3754]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [TOTEM ] The network interface [10.0.0.11] is now up.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync configuration map access [0]
Dec 19 14:48:44 clustertest01 corosync[3754]: info [QB ] server name: cmap
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync configuration service [1]
Dec 19 14:48:44 clustertest01 corosync[3754]: info [QB ] server name: cfg
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 19 14:48:44 clustertest01 corosync[3754]: info [QB ] server name: cpg
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync profile loading service [4]
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync resource monitoring service [6]
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD ] Watchdog /dev/watchdog exists but couldn't be opened.
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD ] resource load_15min missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: warning [WD ] resource memory_used missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: info [WD ] no resources configured.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync watchdog service [7]
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [QUORUM] Using quorum provider corosync_votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [QUORUM] This node is within the primary component and will provide service.
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [QUORUM] Members[0]:
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 19 14:48:44 clustertest01 corosync[3754]: info [QB ] server name: votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 19 14:48:44 clustertest01 corosync[3754]: info [QB ] server name: quorum
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [TOTEM ] A new membership (10.0.0.11:4) was formed. Members joined: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [QUORUM] Members[1]: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: notice [MAIN ] Completed service synchronization, ready to provide service.
Dec 19 14:48:44 clustertest01 systemd[1]: Started Corosync Cluster Engine.
Dec 19 14:48:44 clustertest01 corosync[3754]: [TOTEM ] The network interface [10.0.0.11] is now up.
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync configuration map access [0]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QB ] server name: cmap
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync configuration service [1]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QB ] server name: cfg
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QB ] server name: cpg
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync profile loading service [4]
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Dec 19 14:48:44 clustertest01 corosync[3754]: [WD ] Watchdog /dev/watchdog exists but couldn't be opened.
Dec 19 14:48:44 clustertest01 corosync[3754]: [WD ] resource load_15min missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: [WD ] resource memory_used missing a recovery key.
Dec 19 14:48:44 clustertest01 corosync[3754]: [WD ] no resources configured.
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync watchdog service [7]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QUORUM] Using quorum provider corosync_votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: [QUORUM] This node is within the primary component and will provide service.
Dec 19 14:48:44 clustertest01 corosync[3754]: [QUORUM] Members[0]:
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QB ] server name: votequorum
Dec 19 14:48:44 clustertest01 corosync[3754]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 19 14:48:44 clustertest01 corosync[3754]: [QB ] server name: quorum
Dec 19 14:48:44 clustertest01 corosync[3754]: [TOTEM ] A new membership (10.0.0.11:4) was formed. Members joined: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: [QUORUM] Members[1]: 1
Dec 19 14:48:44 clustertest01 corosync[3754]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[2] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[3] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: ipcc_send_rec[4] failed: Connection refused
Dec 19 14:48:44 clustertest01 pvestatd[1269]: status update error: Connection refused
Dec 19 14:48:46 clustertest01 pve-ha-lrm[2953]: ipcc_send_rec[1] failed: Transport endpoint is not connected
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: update cluster info (cluster name test, version = 1)
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: node has quorum
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [dcdb] notice: members: 1/3735
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [dcdb] notice: all data is up to date
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: members: 1/3735
Dec 19 14:48:49 clustertest01 pmxcfs[3735]: [status] notice: all data is up to date
Dec 19 14:48:58 clustertest01 pvecm[3711]: <root@pam> end task UPID:clustertest01:00000E80:00008562:5A3918B9:clustercreate::root@pam: OK
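To see at a glance which processes log ipcc_send_rec failures around the
restart, an excerpt like the one above can be tallied per process. This is
only a minimal sketch: it assumes syslog-style lines in the format shown
above, and count_ipcc_failures is a hypothetical helper name, not an
existing tool.

```shell
# count_ipcc_failures (hypothetical helper): read syslog-style lines on
# stdin and print "<process> <count>" for every process that logged at
# least one ipcc_send_rec failure, sorted by process name.
count_ipcc_failures() {
    awk '/ipcc_send_rec\[[0-9]+\] failed/ {
        name = $5                 # fifth field looks like "pveproxy[2910]:"
        sub(/\[.*$/, "", name)    # strip "[pid]:" to keep the process name
        count[name]++
    }
    END { for (name in count) print name, count[name] }' | sort
}
```

It could be fed from a saved excerpt or, for example, from
`journalctl -b | count_ipcc_failures`.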
> ---
>
> new in v3
>
> data/PVE/API2/ClusterConfig.pm | 44 +++++++++++++++++++++++-------------------
> 1 file changed, 24 insertions(+), 20 deletions(-)
>
> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
> index 07e92be..123cecb 100644
> --- a/data/PVE/API2/ClusterConfig.pm
> +++ b/data/PVE/API2/ClusterConfig.pm
> @@ -100,38 +100,42 @@ __PACKAGE__->register_method ({
> },
> },
> },
> - returns => { type => 'null' },
> -
> + returns => { type => 'string' },
> code => sub {
> my ($param) = @_;
>
> -f $clusterconf && die "cluster config '$clusterconf' already exists\n";
>
> - PVE::Cluster::setup_sshd_config(1);
> - PVE::Cluster::setup_rootsshconfig();
> - PVE::Cluster::setup_ssh_keys();
> + my $rpcenv = PVE::RPCEnvironment::get();
> + my $authuser = $rpcenv->get_user();
>
> - PVE::Tools::run_command(['/usr/sbin/corosync-keygen', '-lk', $authfile])
> - if !-f $authfile;
> - die "no authentication key available\n" if -f !$authfile;
> + my $worker = sub {
> + PVE::Cluster::setup_sshd_config(1);
> + PVE::Cluster::setup_rootsshconfig();
> + PVE::Cluster::setup_ssh_keys();
>
> - my $nodename = PVE::INotify::nodename();
> + PVE::Tools::run_command(['/usr/sbin/corosync-keygen', '-lk', $authfile])
> + if !-f $authfile;
> + die "no authentication key available\n" if -f !$authfile;
>
> - # get the corosync basis config for the new cluster
> - my $config = PVE::Corosync::create_conf($nodename, %$param);
> + my $nodename = PVE::INotify::nodename();
>
> - print "Writing corosync config to /etc/pve/corosync.conf\n";
> - PVE::Corosync::atomic_write_conf($config);
> + # get the corosync basis config for the new cluster
> + my $config = PVE::Corosync::create_conf($nodename, %$param);
>
> - my $local_ip_address = PVE::Cluster::remote_node_ip($nodename);
> - PVE::Cluster::ssh_merge_keys();
> - PVE::Cluster::gen_pve_node_files($nodename, $local_ip_address);
> - PVE::Cluster::ssh_merge_known_hosts($nodename, $local_ip_address, 1);
> + print "Writing corosync config to /etc/pve/corosync.conf\n";
> + PVE::Corosync::atomic_write_conf($config);
>
> - print "Restart corosync and cluster filesystem\n";
> - PVE::Tools::run_command('systemctl restart corosync pve-cluster');
> + my $local_ip_address = PVE::Cluster::remote_node_ip($nodename);
> + PVE::Cluster::ssh_merge_keys();
> + PVE::Cluster::gen_pve_node_files($nodename, $local_ip_address);
> + PVE::Cluster::ssh_merge_known_hosts($nodename, $local_ip_address, 1);
>
> - return undef;
> + print "Restart corosync and cluster filesystem\n";
> + PVE::Tools::run_command('systemctl restart corosync pve-cluster');
> + };
> +
> + return $rpcenv->fork_worker('clustercreate', '', $authuser, $worker);
> }});
>
> __PACKAGE__->register_method({
> --
> 2.11.0
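The general shape of the change (do the slow work in a forked worker,
return a task identifier right away, keep the output in a task log) can be
sketched outside of PVE. The following is only an illustration under
stated assumptions: start_task and wait_task are hypothetical helpers, the
temporary directory standing in for the task log is made up, and none of
this reflects how PVE::RPCEnvironment::fork_worker is actually implemented.

```shell
# start_task NAME CMD [ARGS...] (hypothetical helper, not part of PVE):
# run CMD in the background with its output captured in a per-task
# directory, and print that directory as the task ID without waiting.
start_task() {
    name=$1; shift
    dir=$(mktemp -d "${TMPDIR:-/tmp}/$name.XXXXXX") || return 1
    (
        rc=0
        "$@" >"$dir/log" 2>&1 || rc=$?
        echo "$rc" >"$dir/status"
    ) >/dev/null 2>&1 &   # detach stdout so $(start_task ...) returns at once
    echo "$dir"
}

# wait_task ID (hypothetical helper): block until the task has recorded
# its exit status, print the captured log and return that exit status.
wait_task() {
    while [ ! -s "$1/status" ]; do sleep 1; done
    cat "$1/log"
    return "$(cat "$1/status")"
}
```

A caller would keep the ID from `id=$(start_task demo some-command)` and
later collect the result with `wait_task "$id"`. Redirecting the background
subshell's stdout matters here: without it, the command substitution would
block until the worker exits.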