[pve-devel] [PATCH cluster/manager/ha-manager/proxmox{, -perl-rs} 00/18] fix #4156: introduce new notification module

Fri Apr 14 08:19:09 CEST 2023

Am 27/03/2023 um 17:18 schrieb Lukas Wagner:
> The purpose of this patch series is to overhaul the existing mail
> notification infrastructure in Proxmox VE.

nice! Some high level review, but I didn't checked the code basically at all,
so sorry if some comments of it are moot, as they're either already implemented
or discussed.

> A later patch could mark individual mail recipients in vzdump jobs as
> being deprecated in favor of the new notification endpoints. 

Hmmm, this is something that I did not think to closely about when drafting
the (very rough) design for the notification overhaul, but it has some merit
to be able to configure this per job – but naturally it also adds some
redundancy with a global filtering system. IOW., we still need something 
not all too compley for handling a job for, say, not so important test virtual
guests and one for production virtual guests. FWIW this could also be handled
by having a fixed mapping for some overides and jobs, i.e., the global
notification config could hold stanzas that match to a specific job (or other
notification emitting type) which is then also exposed when editing said job
directly. Otherwise, as we have now a pretty generic datacenter wide config
for almost all relevant jobs Proxmox VE can handle, we could also add a property
there that allows to match some notification policy/filter/config – that might
be cleaner as it avoids having to read/handle two configs in one API call.

> The
> drawback of this approach is that we might send a notification twice
> in case a user has created another sendmail endpoint with the same
> recipient. However, this is still better than not sending a
> notification at all. However, I'm open for suggestions for other
> approaches for maintaining compatibility.
> 
> Alternative approaches that came to my mind are:
>   - Explicitly break existing mail notifications with a major
>     release, maybe create a default sendmail endpoint where *all*
>     notifications are sent to root at pam via a sendmail endpoint.
>     In this variant, the custom mail recipients for vzdump jobs would
>     be removed
>   - On update, create endpoints in the new configuration file for all
>     possible email addresses, along with appropriate filters so that
>     the behavior for every component currently sending mails remains
>     unchanged. I don't really like this approach, as it might lead to
>     a pretty messy 'notifications.cfg' with lots of endpoints/filters, if
>     the user has lots of different backup jobs configured.

If we go that way we could do the ephermal sendpoint only heuristically,
i.e., check if there's another (email) notifaction endpoint that matches
ours and the `root at localhost || $configured_root_at_pam_email` and silence
it then, as that means that the admin switched over to the new mechanism.

Or, much simpler and transparent, some node/cluster global flag in e.g.,
vzdump.conf.

> Side effects of the changes:
>   - vzdump loses the HTML table which lists VMs/CTs. Since different
>     endpoints support different markup styles for formatting, it
>     does not really make sense to support this in the notification API, at
>     least not without 'cross-rendering' (e.g. markdown --> HTML)

This I'd think should be avoided if just anyhow possible, as that is
quite the nice feature. For keeping that possible we could add a optional
strucutred data object that can be passed to the notifaction system and
that (some) plugins can then use to show some more structured data.

Simplest might be a Option<serde_json::Value> that has a defined structure
like for example (psuedo json):

{
  schema: {
    type: "table", // or object
    rows: ["vmid", "job", "..."],
    // ... ?
  },
  data: [
     { vmid: 100, job: "foo", "...": "..."},
     ...
  ],
}

The mail plugin could then produce the two tables again, for the text/plain
and text/html multiparts again.

> 
> 
> Short summary of the introduced changes per repo:
>   - proxmox:
>     Add new proxmox-notification crate, including configuration file
>     parsing, two endpoints (sendmail, gotify) and per-endpoint
>     filtering
>   - proxmox-perl-rs:
>     Add bindings for proxmox-notification, so that we can use it from
>     perl. Also configure logging in such a way that log messages from
>     proxmox-notification integrate nicely in task logs.
>   - proxmox-cluster:
>     Register new configuration file, 'notifications.cfg'
>   - pve-manager:
>     Add some more glue code, so that the notification functions are
>     callable in a nice way. This is needed since we need to read the
>     configuration file and feed the config to the rust bindings as a
>     string.
>     Replace calls to 'sendmail' in vzdump,
>     apt, and replication with calls to the new notification module.
>   - pve-ha-manager:
>     Also replace calls to 'sendmail' with the new notification module
> 
> 
> Follow-up work (in no particular order):
>   - Add CRUD API routes for managing notification endpoints and
>     filters - right now, this can only be done by editing
>     'notifications.cfg'
>   - Add a GUI and CLI using this API
>   - Right now, the notification API is very simple. Sending a
>     notification can be as simple as
>       PVE::Notification::info("Title", "Body")

nit, might use the verb form for this to be slightly shorter, i.e.:
PVE::Notify::info

> 
>     In the future, the API might be changed/extended so that supports
>     "registering" notifications. This allows us to a.) generate a
>     list of all possible notification sources in the system b.) allows
>     users to easily create filters for specific notification events.
>     In my head, using the notification module could look like this
>     then:
> 
>     # Global context
>     my  = PVE::Notification::register({
>       'id' => 'backup-failed',
>       'severity' => 'error',
>       'properties' => ['host', 'vmlist', 'logs'],
>       'title' => '{{ host }}: Backup failed'
>       'body' => <<'EOF'
>     A backup has failed for the following VMs: {{ vmlist }}
> 
>     {{ logs }}
>     EOF
>     });

hmm, yeah having that in module space so that its called on load like our
API endpoints is certainly the straight forward way, but having it act like
it would be rust (i.e., the notifiy send call itself _is_ the registry) would
be even nicer – but might be a tad too complex, as Wolfgang reject implementing
a Perl parser in rust already :(

> 
>     # Later, to send the notification:
>     PVE::Notification::send(->instantiate({
>       'host' => 'earth',
>       'vmlist' => ... ,
>       'logs' => ... ,
>     }));
> 
>     Title and body effectively become templates that can be
>     instantiated with arbitrary properties. This has the following
>     benefits:
>       - This ensures that notifications could be easily translated in
>         the future
>       - Allows us to add functionality that allows users to customize
>         notification text, as wished in [2].
>       - Properties can be used in filters (e.g. exact match, regex,
>         etc.)
>       - Properties can be listed in the list of notifications
>         mentioned above, making it easier to create filters.

Yes that would be nice and they could also have access to the structured
data, if passed as considered above.

> 
>   - proxmox-mail-forward could be integrated as well. This would feed
>     e.g. zfs-zed events into our notification infrastructure. Special
>     care must be taken to not create recursive notification loops
>     (e.g. zed sends to root, forwarder uses notification module, a
>     configured sendmail endpoint sends to root, forwarder uses module
>     --> loop)
> 
>   - Maybe add some CLI so that admins can send notifications in
>     scripts (an API endpoint callable via pvesh might be enough for a
>     start)
> 
>   - Add more notification events
> 
>   - Add other endpoints, e.g. webhook, a generic SMTP, etc.
> 
>   - Integrate the new module into the other products
> 
> [1] https://gotify.net/
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=4526

all in all it doesn't sounds to bad, from a higer level the structured data
to be able to keep special formats like html tables in backup notifiactions
and the how to switch over, or better said, how to also have some control in
the job-specific config about how/if notifications are emitted, are the two
higher level issues we should IMO tackle before initial integration can be
merged.

> 
> 
> Summary over all repositories:
>   34 files changed, 1464 insertions(+), 140 deletions(-)
> 
> Generated by murpp v0.1.0

huh, what's this? some patch handling tool I don't know about?