[pve-devel] [RFC cluster/manager] prepare for cluster wide ceph dashboard
Dominik Csapak
d.csapak at proxmox.com
Fri Apr 26 08:21:24 CEST 2019
In order to have a better Ceph dashboard that is available cluster-wide,
we have to add a few things:

* a cluster-wide status API call
  This is not that hard, since in a hyperconverged setup we always have
  the info about the monitors and how to connect to them.
* a list of existing services
  Ceph only manages monitors and OSDs that exist, but does not care
  about MDS or MGR instances, especially if they are not running.
  We do put the mons/mgrs/mds into the ceph.conf, but this is not
  mandatory for a working Ceph setup.
  I implemented this with a cluster-wide synced list of the existing
  systemd units for those service types (mon/mgr/mds), so we can show
  which services are enabled where, independent of Ceph's status and
  config.
  This way a user can see if any wrong service is left over, can see
  services that are not started (and not in the config), or can see
  that a service is running but not enabled (and thus would not be
  running after e.g. a restart).
* a list of versions of the services
  This is also not that hard and is accomplished with a call to
  'YYY metadata' via RADOS. There we get, for each running service,
  its name, host and version.
  With this information we can warn the user that some services are
  running an older version, so that they can restart them. A rough
  sketch of such a metadata query is shown below.
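To make that more concrete, here is a minimal sketch (not the actual
patch): PVE::RADOS with its mon_command method is used as in the
existing pve-manager Ceph code, everything else (helper name, result
layout) is just an assumption for illustration:

    use PVE::RADOS;

    # sketch only: query 'ceph <type> metadata' via RADOS and index the
    # result by service name
    sub get_ceph_service_metadata {
        my ($type) = @_; # 'mon', 'mgr' or 'mds'

        my $rados = PVE::RADOS->new();
        my $metadata = $rados->mon_command({ prefix => "$type metadata" });

        my $res = {};
        for my $entry (@$metadata) {
            my $name = $entry->{name} // $entry->{id};
            $res->{$name} = {
                host => $entry->{hostname},
                version => $entry->{ceph_version},
            };
        }
        return $res;
    }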
I am sending this as an RFC, since there are the following things I am
not so sure about, and I wanted comments before I begin with the work
on the GUI part:
* the cluster sync interface
  I am not so sure if this is the best way, but we have wanted such a
  thing a few times now and it seems to work pretty well.
  We just have to be careful how we use it, so that we do not fill
  pmxcfs with unnecessary things or rely too much on it.
  A sketch of the idea follows below.
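The idea, as a minimal sketch (the function names and semantics here
are assumptions, not necessarily what the pve-cluster patch adds): a
node broadcasts a small JSON blob under a key via the pmxcfs status
mechanism, and every node can read back the values of all members:

    use JSON;
    use PVE::Cluster;

    # hypothetical broadcast: publish $data under $key for the local node
    sub broadcast_data {
        my ($key, $data) = @_;
        # assumed low-level kv broadcast via the pmxcfs status IPC
        PVE::Cluster::broadcast_node_kv($key, encode_json($data));
    }

    # hypothetical read-back: collect all nodes' values as { node => data }
    sub get_data {
        my ($key) = @_;
        my $raw = PVE::Cluster::get_node_kv($key) // {};
        my $res = {};
        for my $node (keys %$raw) {
            $res->{$node} = eval { decode_json($raw->{$node}) } // {};
        }
        return $res;
    }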
* the service/metadata structure merging
  I mangle the 'YYY metadata' and the service lists into a single
  structure, so that I can later process it in the GUI.
  I am not really happy with how this is done, but could not think of
  a better way (I tried several things). The sketch below shows the
  general idea.
  The only way left that might make things better is to abandon the
  generic data broadcast interface and write one especially for this
  case, though I do not really like the idea of having Ceph-relevant
  code in pve-cluster.
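Purely for illustration (field names and shapes are assumptions here,
not the exact structures of the patches), the merge boils down to
overlaying the broadcast unit lists with what Ceph reports about
running services:

    # $unitlists: { node => { type => { name => { ... } } } } from the broadcast
    # $metadata:  { type => { name => { host => ..., version => ... } } } via RADOS
    sub merge_service_info {
        my ($unitlists, $metadata) = @_;

        my $merged = {};

        # first, everything that has an enabled systemd unit somewhere
        for my $node (keys %$unitlists) {
            for my $type (keys %{$unitlists->{$node}}) {
                for my $name (keys %{$unitlists->{$node}->{$type}}) {
                    $merged->{$type}->{$name}->{host} //= $node;
                    $merged->{$type}->{$name}->{enabled} = 1;
                }
            }
        }

        # then overlay what ceph itself knows about running services
        for my $type (keys %$metadata) {
            for my $name (keys %{$metadata->{$type}}) {
                my $entry = $metadata->{$type}->{$name};
                $merged->{$type}->{$name}->{host} //= $entry->{host};
                $merged->{$type}->{$name}->{version} = $entry->{version};
                $merged->{$type}->{$name}->{running} = 1;
            }
        }

        return $merged;
    }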
* collecting unit info via links in /etc/systemd/system
  As I see it, we have 3 options to get the existing services for Ceph:
  1. calling DBus to query it (expensive, complicated)
  2. parsing 'systemctl list-units GLOB' (complicated, potentially
     error-prone)
  3. parsing the symlinks in /etc/systemd/system (fast, relatively
     easy, should be stable)
  I opted for option 3 (for now), but if someone has another option,
  or a compelling opinion for any of the other options, please share
  it. A sketch of the symlink approach follows below.
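A minimal sketch of option 3, assuming the Ceph packaging enables the
service instances as symlinks under
/etc/systemd/system/ceph-<type>.target.wants/ (the exact paths and
result layout may differ from what the patch uses):

    use strict;
    use warnings;

    # sketch only: collect the locally enabled mon/mgr/mds units by
    # looking at the symlinks systemd creates on 'systemctl enable'
    sub get_local_ceph_services {
        my $res = { mon => {}, mgr => {}, mds => {} };

        for my $type (keys %$res) {
            my $dir = "/etc/systemd/system/ceph-$type.target.wants";
            opendir(my $dh, $dir) or next;
            while (defined(my $unit = readdir($dh))) {
                # enabled instances look like 'ceph-mon@<id>.service'
                if ($unit =~ m/^ceph-\Q$type\E\@(.+)\.service$/) {
                    $res->{$type}->{$1} = { enabled => 1 };
                }
            }
            closedir($dh);
        }
        return $res;
    }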
pve-cluster:

Dominik Csapak (1):
add generic data broadcast interface
data/PVE/Cluster.pm | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
pve-manager:
Dominik Csapak (3):
add get_local_services for ceph
broadcast ceph service data to cluster
add cluster wide ceph api calls
PVE/API2/Cluster.pm | 64 +++++++++++++++++++++++++++++++++++++++++++++++++
PVE/Ceph/Services.pm | 18 ++++++++++++++
PVE/Service/pvestatd.pm | 14 +++++++++++
3 files changed, 96 insertions(+)
--
2.11.0