[pve-devel] [PATCH docs] Add section Ceph CRUSH and device classes for pool assignment
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon Nov 20 10:01:38 CET 2017
looks mostly good, some nitpicks inline
On Fri, Nov 17, 2017 at 02:29:18PM +0100, Alwin Antreich wrote:
> Signed-off-by: Alwin Antreich <a.antreich at proxmox.com>
> ---
> pveceph.adoc | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
> diff --git a/pveceph.adoc b/pveceph.adoc
> index c5eec4f..f152052 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -284,6 +284,77 @@ operation footnote:[Ceph pool operation
> http://docs.ceph.com/docs/luminous/rados/operations/pools/]
> manual.
>
> +Ceph CRUSH & device classes
> +---------------------------
> +The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
> +**U**nder **S**calable **H**ashing
> +(CRUSH footnote:[CRUSH https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf]).
> +
> +CRUSH calculates where to store to and retrieve data from, this has the
> +advantage that no central index service needed. CRUSH works with a map of OSDs,
s/needed/is needed/
> +buckets (device locations) and rulesets (data replication) for pools.
> +
> +NOTE: Further information can be found in the Ceph documentation, under the
> +section CRUSH map footnote:[CRUSH map http://docs.ceph.com/docs/luminous/rados/operations/crush-map/].
> +
> +This map can be altered to reflect different replication hierarchies. The object
> +replicas can be separated (eg. failure domains), while maintaining the desired
> +distribution.
> +
> +A common use case is to use different classes of disks for different Ceph pools.
> +For this reason, Ceph introduced the device classes with luminous, to
> +accommodate the need for easy ruleset generation.
> +
> +The device classes can be seen in the 'ceph osd tree' output. These classes
> +represent its own root bucket, that root can be seen with the below command.
s/its/their/
s/that root/which/
> +
> +[source, bash]
> +----
> +ceph osd crush tree --show-shadow
> +----
> +
> +Example for above command.
Example output for the above command:
> +
> +[source, bash]
> +----
> +ID CLASS WEIGHT TYPE NAME
> +-16 nvme 2.18307 root default~nvme
> +-13 nvme 0.72769 host sumi1~nvme
> + 12 nvme 0.72769 osd.12
> +-14 nvme 0.72769 host sumi2~nvme
> + 13 nvme 0.72769 osd.13
> +-15 nvme 0.72769 host sumi3~nvme
> + 14 nvme 0.72769 osd.14
> + -1 7.70544 root default
> + -3 2.56848 host sumi1
> + 12 nvme 0.72769 osd.12
> + -5 2.56848 host sumi2
> + 13 nvme 0.72769 osd.13
> + -7 2.56848 host sumi3
> + 14 nvme 0.72769 osd.14
> +----
I'd replace the hostnames and osd indices with more generic ones here.
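maybe something like this (hostnames and OSD ids below are just made-up placeholders):

ID CLASS WEIGHT  TYPE NAME
-6  nvme 2.18307 root default~nvme
-5  nvme 0.72769     host ceph-node-a~nvme
 0  nvme 0.72769         osd.0
-4  nvme 0.72769     host ceph-node-b~nvme
 1  nvme 0.72769         osd.1
...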
> +
> +To let a pool distribute its objects only on a specific device class, you need
> +to create a ruleset with the specific class first.
> +
> +[source, bash]
> +----
> +ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>
> +----
I'd add a full example here and/or explain what those placeholders mean, e.g.
ceph osd crush rule create-replicated nvmeonly default host nvme
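where <rule-name> is the name of the new rule, <root> the CRUSH root bucket the rule
starts from (usually 'default'), <failure-domain> the bucket type replicas are spread
across (e.g. 'host'), and <class> the device class the rule is restricted to
('hdd', 'ssd' or 'nvme').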
> +
> +Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.
> +
> +[source, bash]
> +----
> +ceph osd pool set <pool-name> crush_rule <rule-name>
> +----
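same here, a concrete example would help, e.g. (pool name made up, rule name from the
example above):

ceph osd pool set my-nvme-pool crush_rule nvmeonly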
> +
> +TIP: If the pool already contains objects, all of these have to be moved
> +accordingly. Depending on your setup this may introduce a big performance hit on
> +your cluster. As an alternative, you can create a new pool and move disks
> +separately.
> +
> +
> Ceph Client
> -----------
>
> --
> 2.11.0
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel