[pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

Alwin Antreich a.antreich at proxmox.com
Fri Dec 13 19:14:50 CET 2019


Some comments inline.

On Fri, Dec 13, 2019 at 03:56:42PM +0100, Aaron Lauterer wrote:
> It's possible to have a situation where the cluster network (used for
> inter-OSD traffic) is not configured on a node. The OSD can still be
> created but can't communicate.
> 
> This check will abort the creation if there is no IP within the subnet
> of the cluster network present on the node. If there is no dedicated
> cluster network the public network is used. The chances of that not
> being configured is much lower but better be on the safe side and check
> it if there is no cluster network.
> 
> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
> ---
>  PVE/API2/Ceph/OSD.pm | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index 5f70cf58..59cc9567 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>  	# extract parameter info and fail if a device is set more than once
>  	my $devs = {};
>  
> +	my $ceph_conf = cfs_read_file('ceph.conf');
The public/cluster networks could have been migrated into the MON DB. In
that case they would not appear in ceph.conf.

At the moment this is unlikely, since it causes an ugly warning on every
command execution, but it is still possible.
```
Configuration option 'cluster_network' may not be modified at runtime
```

> +
> +	# check if network is configured
> +	my $osd_network = $ceph_conf->{global}->{cluster_network}
> +			    // $ceph_conf->{global}->{public_network};
An OSD needs both networks: the public network for communication with
the MONs and clients, and the cluster network for replication. In our
default setup, both are the same network.

I tested OSD creation with the cluster network down. Creation itself
only needs the public network, to create the OSD on the MON. The OSD
then can't start, however, and therefore isn't placed in the CRUSH map.
Once it can start, it is added at the correct location in the map.

IMHO, the code needs to check both.
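
A rough sketch of what checking both could look like, not tested against
the actual API code. `missing_ceph_networks` is a made-up helper name,
and the lookup is passed in as a code reference so the snippet runs on
its own; in the patch it would presumably be
PVE::Network::get_local_ip_from_cidr. It also assumes each option holds
a single CIDR.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not existing PVE code).
# $global: the [global] section of ceph.conf as a hash reference.
# $lookup: code reference taking a CIDR and returning an array reference
#          of local IPs inside it (stand-in for the lookup in the patch).
# Returns the list of Ceph networks that have no local address.
sub missing_ceph_networks {
    my ($global, $lookup) = @_;

    my $public = $global->{public_network};
    # without a dedicated cluster network the public network is used
    my $cluster = $global->{cluster_network} // $public;

    my (@missing, %seen);
    for my $net (grep { defined } ($public, $cluster)) {
        next if $seen{$net}++;    # same network for both, check it once
        push @missing, $net if !@{ $lookup->($net) };
    }
    return @missing;
}

# Stub lookup for illustration: only 10.0.0.0/24 has a local address.
my $lookup = sub {
    my ($cidr) = @_;
    return $cidr eq '10.0.0.0/24' ? ['10.0.0.5'] : [];
};

my @missing = missing_ceph_networks(
    { public_network => '10.0.0.0/24', cluster_network => '10.1.0.0/24' },
    $lookup,
);
print "missing: @missing\n";
```

The API call could then die with the list of missing subnets instead of
only the first one it finds.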

> +	die "No network interface configured for subnet $osd_network. Check ".
> +	    "your network config.\n" if !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
> +
>  	# FIXME: rename params on next API compatibillity change (7.0)
>  	$param->{wal_dev_size} = delete $param->{wal_size};
>  	$param->{db_dev_size} = delete $param->{db_size};
> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>  	my $fsid = $monstat->{monmap}->{fsid};
>          $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>  
> -	my $ceph_conf = cfs_read_file('ceph.conf');
>  	my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>  
>  	if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> -- 
> 2.20.1



