[pve-devel] [PATCH manager v3] fix #4631: ceph: osd: create: add osds-per-device
Aaron Lauterer
a.lauterer at proxmox.com
Thu Sep 28 15:16:05 CEST 2023
Ping? The patch still applies.
Previous patch versions with discussion are:
https://lists.proxmox.com/pipermail/pve-devel/2023-August/058794.html
https://lists.proxmox.com/pipermail/pve-devel/2023-August/058803.html
On 8/23/23 11:44, Aaron Lauterer wrote:
> Allow automatically creating multiple OSDs per physical device. The
> main use case is fast NVMe drives that would be bottlenecked by a
> single OSD service.
>
> By using the 'ceph-volume lvm batch' command instead of 'ceph-volume
> lvm create' when creating multiple OSDs per device, we don't have to
> handle splitting the drive ourselves.
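>
> A rough sketch of the resulting 'batch' call; the device path and OSD
> count are placeholders, the other flags match what the patch pushes
> onto the command:
>
> ```
> ceph-volume lvm batch --osds-per-device 4 --yes --no-auto -- /dev/nvme0n1
> ```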
>
> This means that the parameters to specify a DB or WAL device won't
> work, as the 'batch' command doesn't accept them. Dedicated DB and WAL
> devices don't make much sense anyway if we place the OSDs on fast NVMe
> drives.
>
> Some other changes to how the command is built were needed as well: the
> 'batch' command expects the path to the disk as a positional argument
> rather than via '--data /dev/sdX'.
> We also drop the '--cluster-fsid' parameter because the 'batch' command
> doesn't accept it; 'create' will then fall back to reading it from the
> ceph.conf file.
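>
> For comparison, a sketch of the previous 'create' style call (fsid and
> device path are placeholders):
>
> ```
> ceph-volume lvm create --cluster-fsid <fsid> --data /dev/nvme0n1
> ```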
>
> Removal of OSDs works as expected without any code changes. As long as
> there are other OSDs on a disk, the VG & PV won't be removed, even if
> 'cleanup' is enabled.
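>
> For example, destroying one of several OSDs on such a device (OSD id is
> a placeholder; assuming the usual destroy call with cleanup enabled)
> leaves the shared VG/PV untouched as long as other OSDs still use it:
>
> ```
> pveceph osd destroy 12 --cleanup 1
> ```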
>
> The '--no-auto' parameter is used to avoid the following deprecation
> warning:
> ```
> --> DEPRECATION NOTICE
> --> You are using the legacy automatic disk sorting behavior
> --> The Pacific release will change the default to --no-auto
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 0.3333333333333333
> ```
>
> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
> ---
>
> changes since v2:
> * removed check for fsid
> * rework ceph-volume call to place the positional devpath parameter
> after '--'
>
> PVE/API2/Ceph/OSD.pm | 35 +++++++++++++++++++++++++++++------
> 1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index ded35990..a1d92ca7 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,13 @@ __PACKAGE__->register_method ({
> type => 'string',
> description => "Set the device class of the OSD in crush."
> },
> + 'osds-per-device' => {
> + optional => 1,
> + type => 'integer',
> + minimum => '1',
> + description => 'OSD services per physical device. Only useful for fast '
> + .'NVME devices to utilize their performance better.',
> + },
> },
> },
> returns => { type => 'string' },
> @@ -294,6 +301,15 @@ __PACKAGE__->register_method ({
> # extract parameter info and fail if a device is set more than once
> my $devs = {};
>
> + # allow 'osds-per-device' only without dedicated db and/or wal devs. We cannot specify them with
> + # 'ceph-volume lvm batch' and they don't make a lot of sense on fast NVMEs anyway.
> + if ($param->{'osds-per-device'}) {
> + for my $type ( qw(db_dev wal_dev) ) {
> + raise_param_exc({ $type => "cannot use 'osds-per-device' parameter with '${type}'" })
> + if $param->{$type};
> + }
> + }
> +
> my $ceph_conf = cfs_read_file('ceph.conf');
>
> my $osd_network = $ceph_conf->{global}->{cluster_network};
> @@ -363,10 +379,6 @@ __PACKAGE__->register_method ({
> my $rados = PVE::RADOS->new();
> my $monstat = $rados->mon_command({ prefix => 'quorum_status' });
>
> - die "unable to get fsid\n" if !$monstat->{monmap} || !$monstat->{monmap}->{fsid};
> - my $fsid = $monstat->{monmap}->{fsid};
> - $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
> -
> my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>
> if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> @@ -470,7 +482,10 @@ __PACKAGE__->register_method ({
> $test_disk_requirements->($disklist);
>
> my $dev_class = $param->{'crush-device-class'};
> - my $cmd = ['ceph-volume', 'lvm', 'create', '--cluster-fsid', $fsid ];
> + # create allows for detailed configuration of DB and WAL devices
> + # batch for easy creation of multiple OSDs (per device)
> + my $create_mode = $param->{'osds-per-device'} ? 'batch' : 'create';
> + my $cmd = ['ceph-volume', 'lvm', $create_mode ];
> push @$cmd, '--crush-device-class', $dev_class if $dev_class;
>
> my $devname = $devs->{dev}->{name};
> @@ -504,9 +519,17 @@ __PACKAGE__->register_method ({
> push @$cmd, "--block.$type", $part_or_lv;
> }
>
> - push @$cmd, '--data', $devpath;
> + push @$cmd, '--data', $devpath if $create_mode eq 'create';
> push @$cmd, '--dmcrypt' if $param->{encrypted};
>
> + if ($create_mode eq 'batch') {
> + push @$cmd,
> + '--osds-per-device', $param->{'osds-per-device'},
> + '--yes',
> + '--no-auto',
> + '--',
> + $devpath;
> + }
> PVE::Diskmanage::wipe_blockdev($devpath);
>
> if (PVE::Diskmanage::is_partition($devpath)) {