[pve-devel] [PATCH docs 10/11] Fix #1958: pveceph: add section Ceph maintenance

Aaron Lauterer a.lauterer at proxmox.com
Tue Nov 5 11:34:28 CET 2019



On 11/4/19 2:52 PM, Alwin Antreich wrote:
> Signed-off-by: Alwin Antreich <a.antreich at proxmox.com>
> ---
>   pveceph.adoc | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 54 insertions(+)
> 
> diff --git a/pveceph.adoc b/pveceph.adoc
> index 087c4d0..127e3bb 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -331,6 +331,7 @@ network. In a Ceph cluster, you will usually have one OSD per physical disk.
>   
>   NOTE: By default an object is 4 MiB in size.
>   
> +[[pve_ceph_osd_create]]
>   Creating OSDs
>   ~~~~~~~~~~~~~
>   
> @@ -407,6 +408,7 @@ Starting with Ceph Nautilus, {pve} does not support creating such OSDs with
>   ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
>   ----
>   
> +[[pve_ceph_osd_destroy]]
>   Destroying OSDs
>   ~~~~~~~~~~~~~~~
>   
> @@ -724,6 +726,58 @@ pveceph pool destroy NAME
>   ----
>   
>   
> +Ceph maintenance
> +----------------
> +Replace OSDs
> +~~~~~~~~~~~~
> +One of the common maintenance tasks in Ceph is to replace a disk of an OSD. If
... the disk ...
> +a disk already failed, you can go ahead and run through the steps in
> +xref:pve_ceph_osd_destroy[Destroying OSDs]. As no data is accessible from the
> +disk. Ceph will recreate those copies on the remaining OSDs if possible
... a disk is already in a failed state, the data on it is not accessible 
anymore and you can go/run through the steps in 
xref:pve_ceph_osd_destroy[Destroying OSDs]. Ceph will recreate the 
missing copies on the remaining OSDs if possible.
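
Maybe also worth showing how to find the failed OSD in the first place, 
e.g. something like (just a sketch):

----
# list all OSDs with their up/down status; a failed OSD shows as 'down'
ceph osd tree
----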

> +
> +For replacing a still functioning disk. From the GUI run through the steps as
> +shown in xref:pve_ceph_osd_destroy[Destroying OSDs]. The only addition is to
> +wait till the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.

To replace a still functioning disk via the GUI, go/run through the steps 
in xref:pve_ceph_osd_destroy[Destroying OSDs] with one addition: wait 
until the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.
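
On the CLI the health state could be shown as well, e.g.:

----
# overall cluster health (HEALTH_OK / HEALTH_WARN / HEALTH_ERR)
ceph health
# or the more detailed cluster status
ceph -s
----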

> +
> +On the command line use the below commands.
... use the following commands:
> +----
> +ceph osd out osd.<id>
> +----
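
An example with a concrete id might read easier, e.g. assuming osd.4 is 
the one to be replaced:

----
# mark the OSD 'out' so Ceph starts moving its data to other OSDs
ceph osd out osd.4
# optionally watch the rebalancing progress
ceph -w
----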
> +
> +You can check with the below command if the OSD can be already removed.
... with the command below if the OSD can be safely removed.
# or
... the following command if the OSD can be safely removed:
> +----
> +ceph osd safe-to-destroy osd.<id>
> +----
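
If someone wants to script this, the check can be polled, e.g. (again 
with osd.4 as example):

----
# loop until the OSD can be removed without reducing data redundancy
while ! ceph osd safe-to-destroy osd.4; do sleep 60; done
----

'ceph osd safe-to-destroy' exits non-zero as long as removing the OSD 
would put data at risk.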
> +
> +Once the above check tells you that it is save to remove the OSD, you can
> +continue with below commands.
... continue with the following commands: (and s/save/safe/ in the 
sentence above)
> +----
> +systemctl stop ceph-osd@<id>.service
> +pveceph osd destroy <id>
> +----
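
With the example id 4 from above:

----
# stop the OSD service on the node holding the OSD
systemctl stop ceph-osd@4.service
# remove the OSD from the cluster
pveceph osd destroy 4
----

Afterwards 'ceph osd tree' should not list osd.4 anymore.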
> +
> +Replace the old with the new disk and use the same procedure as described in
> +xref:pve_ceph_osd_create[Creating OSDs].

Replace the old disk with the new one and use the same procedure...
> +
> +NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
> +`size + 1` nodes are available.
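
Maybe close the loop with an example for the new OSD as well, e.g. 
assuming the replacement disk shows up as /dev/sdf (the device name is 
just a placeholder):

----
# create an OSD on the replacement disk
pveceph osd create /dev/sdf
----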
> +
> +Run fstrim (discard)
> +~~~~~~~~~~~~~~~~~~~~
> +It is a good measure to run fstrim (discard) regularly on VMs or containers.
> +This releases data blocks that the filesystem isn’t using anymore. It reduces
> +data usage and the resource load.

... to run 'fstrim' (discard) ...
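
Concrete commands could help here too, e.g.:

----
# inside a VM: trim all mounted filesystems that support discard
fstrim -av
# on the host, for a container with VMID 100 (example id)
pct fstrim 100
----

For VMs the trim only reaches Ceph if the virtual disk has the 'discard' 
option enabled (e.g. with virtio-scsi).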
> +
> +Scrub & Deep Scrub
> +~~~~~~~~~~~~~~~~~~
> +Ceph insures data integrity by 'scrubbing' placement groups. Ceph check every
... Ceph checks every ...
> +object in a PG for its health. There are two forms of Scrubbing, daily
# scrubbing lower case
> +(metadata compare) and weekly. The latter reads the object and uses checksums
... The weekly scrub reads the objects and uses checksums ...
> +to ensure data integrity. If a running scrub interferes with business needs,
> +you can adjust the time of execution of Scrub footnote:[Ceph scrubbing
... adjust the time when scrubs are executed ...
> +https://docs.ceph.com/docs/nautilus/rados/configuration/osd-config-ref/#scrubbing].
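
An example for moving scrubs to off-peak hours could be useful, e.g. in 
ceph.conf (the hours are just placeholders):

----
[osd]
# prefer running scheduled scrubs between 22:00 and 06:00
osd scrub begin hour = 22
osd scrub end hour = 6
----

A single PG can also be scrubbed manually with 'ceph pg scrub <pgid>' or 
'ceph pg deep-scrub <pgid>'.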
> +
> +
>   Ceph monitoring and troubleshooting
>   -----------------------------------
>   A good start is to continuosly monitor the ceph health from the start of
> 



