[pve-devel] [PATCH docs 1/6] ceph: add anchors for use in troubleshooting section

Mon Feb 3 17:19:19 CET 2025

On Mon Feb 3, 2025 at 3:27 PM CET, Alexander Zeidler wrote:
> Signed-off-by: Alexander Zeidler <a.zeidler at proxmox.com>
> ---

Some high-level feedback (details are in my inline comments here and in
the other patches):

- The writing style is IMO quite clear and straightforward, nice work!

- In patch 03, the "_disk_health_monitoring" anchor reference seems to
  break my build for some reason. Does this also happen on your end? The
  single-page docs ("pve-admin-guide.html") seem to build just fine
  otherwise.

- Regarding implicitly / auto-generated anchors, is it fine to break
  those in general or not? See my other comments inline here.

- There are a few tiny style things I personally would correct, but if
  you disagree with them, feel free to leave them as they are.

All in all this seems pretty solid; the stuff regarding the anchors
needs to be clarified first (whether it's okay to break auto-generated
ones & the one anchor that makes my build fail). Otherwise, pretty good!

>  pveceph.adoc | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/pveceph.adoc b/pveceph.adoc
> index da39e7f..93c2f8d 100644
> --- a/pveceph.adoc
> +++ b/pveceph.adoc
> @@ -82,6 +82,7 @@ and vocabulary
>  footnote:[Ceph glossary {cephdocs-url}/glossary].
>  
>  
> +[[pve_ceph_recommendation]]
>  Recommendations for a Healthy Ceph Cluster
>  ------------------------------------------

AsciiDoc already generates an anchor for the heading above
automatically, apparently "_recommendations_for_a_healthy_ceph_cluster".
So there's no need to provide one explicitly here; adding an explicit
one might also break old links that refer to the auto-generated anchor
in the documentation.
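
To illustrate where that implicit name comes from, here is a rough Python
sketch of the default rule as I understand it: lowercase the title,
replace runs of non-alphanumeric characters with "_", trim the ends, and
prepend "_". The function name `implicit_anchor` is made up for this
example, and the sketch ignores details such as de-duplication of
repeated titles or a customized ID prefix, so treat it as an
approximation rather than what the toolchain does exactly.

```python
import re


def implicit_anchor(title: str, prefix: str = "_") -> str:
    """Approximate the section ID AsciiDoc derives from a heading title.

    Assumption: default ID prefix "_" and ASCII-only titles; duplicate
    titles (which would need a numeric suffix) are not handled here.
    """
    # Lowercase, then collapse each run of non-alphanumerics into one "_".
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower())
    # Remove underscores left over at either end, then add the prefix.
    return prefix + slug.strip("_")


print(implicit_anchor("Recommendations for a Healthy Ceph Cluster"))
# → _recommendations_for_a_healthy_ceph_cluster
```

With the same rule, "Replace OSDs" would map to "_replace_osds".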

Though, perhaps in a separate series, you could look for all implicitly
defined anchors and set them explicitly..? Not sure if that's something
we want, though.

>  
> @@ -95,6 +96,7 @@ NOTE: The recommendations below should be seen as a rough guidance for choosing
>  hardware. Therefore, it is still essential to adapt it to your specific needs.
>  You should test your setup and monitor health and performance continuously.
>  
> +[[pve_ceph_recommendation_cpu]]
>  .CPU
>  Ceph services can be classified into two categories:
>  
> @@ -122,6 +124,7 @@ IOPS load over 100'000 with sub millisecond latency, each OSD can use multiple
>  CPU threads, e.g., four to six CPU threads utilized per NVMe backed OSD is
>  likely for very high performance disks.
>  
> +[[pve_ceph_recommendation_memory]]
>  .Memory
>  Especially in a hyper-converged setup, the memory consumption needs to be
>  carefully planned out and monitored. In addition to the predicted memory usage
> @@ -137,6 +140,7 @@ normal operation, but rather leave some headroom to cope with outages.
>  The OSD service itself will use additional memory. The Ceph BlueStore backend of
>  the daemon requires by default **3-5 GiB of memory** (adjustable).
>  
> +[[pve_ceph_recommendation_network]]
>  .Network
>  We recommend a network bandwidth of at least 10 Gbps, or more, to be used
>  exclusively for Ceph traffic. A meshed network setup
> @@ -172,6 +176,7 @@ high-performance setups:
>  * one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync
>    cluster communication.
>  
> +[[pve_ceph_recommendation_disk]]
>  .Disks
>  When planning the size of your Ceph cluster, it is important to take the
>  recovery time into consideration. Especially with small clusters, recovery
> @@ -197,6 +202,7 @@ You also need to balance OSD count and single OSD capacity. More capacity
>  allows you to increase storage density, but it also means that a single OSD
>  failure forces Ceph to recover more data at once.
>  
> +[[pve_ceph_recommendation_raid]]
>  .Avoid RAID
>  As Ceph handles data object redundancy and multiple parallel writes to disks
>  (OSDs) on its own, using a RAID controller normally doesn’t improve
> @@ -1018,6 +1024,7 @@ to act as standbys.
>  Ceph maintenance
>  ----------------
>  
> +[[pve_ceph_osd_replace]]
>  Replace OSDs
>  ~~~~~~~~~~~~

This one here is also implicitly defined already, unfortunately.

>  
> @@ -1131,6 +1138,7 @@ ceph osd unset noout
>  You can now start up the guests. Highly available guests will change their state
>  to 'started' when they power on.
>  
> +[[pve_ceph_mon_and_ts]]
>  Ceph Monitoring and Troubleshooting
>  -----------------------------------
>  

So is this one.

Actually, now I do wonder: I think it's better to define them in the
AsciiDoc code directly, but how would we do that with existing anchors?
Just use the automatically generated anchor name? Or are we fine with
breaking links? Would be nice if someone could chime in here.

(Personally, I think it's fine to break these things, but I'm happy to
be corrected if that's a no-go.)