[pve-devel] [PATCH manager v3] fix #4631: ceph: osd: create: add osds-per-device

Aaron Lauterer a.lauterer at proxmox.com
Thu Sep 28 15:16:05 CEST 2023


ping? The patch still applies.

previous patch versions with discussion are:
https://lists.proxmox.com/pipermail/pve-devel/2023-August/058794.html
https://lists.proxmox.com/pipermail/pve-devel/2023-August/058803.html

On 8/23/23 11:44, Aaron Lauterer wrote:
> Allow the automatic creation of multiple OSDs per physical device. The
> main use case is fast NVMe drives that would otherwise be bottlenecked
> by a single OSD service.
> 
> By using the 'ceph-volume lvm batch' command instead of 'ceph-volume
> lvm create' when creating multiple OSDs per device, we don't have to
> handle splitting the drive ourselves.
> 
> This means, however, that the parameters to specify a DB or WAL device
> won't work, as the 'batch' command doesn't use them. Dedicated DB and
> WAL devices don't make much sense anyway if the OSDs are placed on fast
> NVMe drives.
> 
> Some other changes to how the command is built were needed as well:
> the 'batch' command expects the path to the disk as a positional
> argument, not via '--data /dev/sdX'.
> We also drop the '--cluster-fsid' parameter because the 'batch' command
> doesn't accept it; 'create' will fall back to reading it from the
> ceph.conf file.
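> 
> For illustration, the resulting invocations look roughly like this
> (device path and device class are example values):
> 
> ```
> # default, one OSD per device
> ceph-volume lvm create --crush-device-class nvme --data /dev/nvme0n1
> 
> # with osds-per-device=3
> ceph-volume lvm batch --crush-device-class nvme \
>     --osds-per-device 3 --yes --no-auto -- /dev/nvme0n1
> ```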
> 
> Removal of OSDs works as expected without any code changes. As long as
> there are other OSDs on a disk, the VG & PV won't be removed, even if
> 'cleanup' is enabled.
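> 
> To check which OSDs still share a device after removing one, the
> backing LVs can be listed, e.g.:
> 
> ```
> ceph-volume lvm list /dev/nvme0n1
> ```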
> 
> The '--no-auto' parameter is used to avoid the following deprecation
> warning:
> ```
> --> DEPRECATION NOTICE
> --> You are using the legacy automatic disk sorting behavior
> --> The Pacific release will change the default to --no-auto
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 0.3333333333333333
> ```
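> 
> As a usage sketch via the API (assuming the endpoint's existing 'dev'
> parameter):
> 
> ```
> pvesh create /nodes/{node}/ceph/osd --dev /dev/nvme0n1 --osds-per-device 3
> ```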
> 
> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
> ---
> 
> changes since v2:
> * removed check for fsid
> * rework ceph-volume call to place the positional devpath parameter
>    after '--'
> 
>   PVE/API2/Ceph/OSD.pm | 35 +++++++++++++++++++++++++++++------
>   1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index ded35990..a1d92ca7 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,13 @@ __PACKAGE__->register_method ({
>   		type => 'string',
>   		description => "Set the device class of the OSD in crush."
>   	    },
> +	    'osds-per-device' => {
> +		optional => 1,
> +		type => 'integer',
> +		minimum => 1,
> +		description => "OSD services per physical device. Only useful for fast ".
> +		    "NVMe devices to utilize their performance better.",
> +	    },
>   	},
>       },
>       returns => { type => 'string' },
> @@ -294,6 +301,15 @@ __PACKAGE__->register_method ({
>   	# extract parameter info and fail if a device is set more than once
>   	my $devs = {};
>   
> +	# allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
> +	# 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
> +	if ($param->{'osds-per-device'}) {
> +	    for my $type ( qw(db_dev wal_dev) ) {
> +		raise_param_exc({ $type => "cannot use 'osds-per-device' parameter with '${type}'" })
> +		    if $param->{$type};
> +	    }
> +	}
> +
>   	my $ceph_conf = cfs_read_file('ceph.conf');
>   
>   	my $osd_network = $ceph_conf->{global}->{cluster_network};
> @@ -363,10 +379,6 @@ __PACKAGE__->register_method ({
>   	my $rados = PVE::RADOS->new();
>   	my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
>   
> -	die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
> -	my $fsid = $monstat->{monmap}->{fsid};
> -        $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
> -
>   	my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>   
>   	if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> @@ -470,7 +482,10 @@ __PACKAGE__->register_method ({
>   		$test_disk_requirements->($disklist);
>   
>   		my $dev_class = $param->{'crush-device-class'};
> -		my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
> +		# create allows for detailed configuration of DB and WAL devices
> +		# batch for easy creation of multiple OSDs (per device)
> +		my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
> +		my $cmd = ['ceph-volume', 'lvm', $create_mode ];
>   		push @$cmd, '--crush-device-class', $dev_class if $dev_class;
>   
>   		my $devname = $devs->{dev}->{name};
> @@ -504,9 +519,17 @@ __PACKAGE__->register_method ({
>   		    push @$cmd, "--block.$type", $part_or_lv;
>   		}
>   
> -		push @$cmd, '--data', $devpath;
> +		push @$cmd, '--data', $devpath if $create_mode eq 'create';
>   		push @$cmd, '--dmcrypt' if $param->{encrypted};
>   
> +		if ($create_mode eq 'batch') {
> +		    push @$cmd,
> +			'--osds-per-device', $param->{'osds-per-device'},
> +			'--yes',
> +			'--no-auto',
> +			'--',
> +			$devpath;
> +		}
>   		PVE::Diskmanage::wipe_blockdev($devpath);
>   
>   		if (PVE::Diskmanage::is_partition($devpath)) {
