[pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

Thomas Lamprecht t.lamprecht at proxmox.com
Sat Dec 14 15:17:03 CET 2019


On 12/13/19 7:14 PM, Alwin Antreich wrote:
> Some comments inline.
> 
> On Fri, Dec 13, 2019 at 03:56:42PM +0100, Aaron Lauterer wrote:
>> It's possible to have a situation where the cluster network (used for
>> inter-OSD traffic) is not configured on a node. The OSD can still be
>> created but can't communicate.
>>
>> This check will abort the creation if there is no IP within the subnet
>> of the cluster network present on the node. If there is no dedicated
>> cluster network the public network is used. The chances of that not
>> being configured is much lower but better be on the safe side and check
>> it if there is no cluster network.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
>> ---
>>  PVE/API2/Ceph/OSD.pm | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
>> index 5f70cf58..59cc9567 100644
>> --- a/PVE/API2/Ceph/OSD.pm
>> +++ b/PVE/API2/Ceph/OSD.pm
>> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>>  	# extract parameter info and fail if a device is set more than once
>>  	my $devs = {};
>>  
>> +	my $ceph_conf = cfs_read_file('ceph.conf');
> The public/cluster networks could have been migrated into the MON DB. In
> this case they would not appear in the ceph.conf.
> 
> ATM it might be unlikely, there is an ugly warning, with every command
> execution. But still possible.
> ```
> Configuration option 'cluster_network' may not be modified at runtime
> ```

This is for setups which are managed by PVE-Ceph, so no concern for that.
Further, as you mentioned, it's highly unlikely, and can be work-arounded
so no concern for now. We also have more places which such potential
"issues", this is something we should look at for the next Ceph release,
Octopus, see were they're going (ceph tends to change mind sometimes) and
so see if we want/need to cope with the config DB better.

> 
>> +
>> +	# check if network is configured
>> +	my $osd_network = $ceph_conf->{global}->{cluster_network}
>> +			    // $ceph_conf->{global}->{public_network};
> An OSD needs both networks. Public for communication with the MONS &
> clients. And the cluster network for replication. On our default setup,
> it's both the same network.
> 
> I have tested the OSD creation with the cluster network down. During
> creation, it only needs the public network to create the OSD on the MON.
> But the OSD can't start and therefore isn't placed on the CRUSH map.
> Once it can start, it will be added to the correct location on the map.
> 
> IMHO, the code needs to check both.

This checks if the OSD communication is up, and is thus a valid fix for #2496,
if the public network is down the OSD cannot be created anyway, so there's
already an error - but yes, we could also check for the public network, would
not harm, the patch is good as is on its own though.

> 
>> +	die "No network interface configured for subnet $osd_network. Check ".
>> +	    "your network config.\n" if !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
>> +
>>  	# FIXME: rename params on next API compatibillity change (7.0)
>>  	$param->{wal_dev_size} = delete $param->{wal_size};
>>  	$param->{db_dev_size} = delete $param->{db_size};
>> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>>  	my $fsid = $monstat->{monmap}->{fsid};
>>          $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>>  
>> -	my $ceph_conf = cfs_read_file('ceph.conf');
>>  	my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>>  
>>  	if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
>> -- 
>> 2.20.1






More information about the pve-devel mailing list