[pbs-devel] [PATCH proxmox-backup 14/33] server: notifications: send GC notifications via notification system

Wed Apr 17 16:26:04 CEST 2024

On  2024-04-17 09:46, Fabian Grünbichler wrote:
> On April 16, 2024 2:13 pm, Lukas Wagner wrote:
>>
>>
>> On  2024-04-16 11:37, Gabriel Goller wrote:
>>> On Fri Apr 12, 2024 at 12:06 PM CEST, Lukas Wagner wrote:
>>>> diff --git a/src/server/gc_job.rs b/src/server/gc_job.rs
>>>> index 41375d72..ff5bdccf 100644
>>>> --- a/src/server/gc_job.rs
>>>> +++ b/src/server/gc_job.rs
>>>> @@ -19,8 +19,6 @@ pub fn do_garbage_collection_job(
>>>>  ) -> Result<String, Error> {
>>>>      let store = datastore.name().to_string();
>>>>  
>>>> -    let (email, notify) = crate::server::lookup_datastore_notify_settings(&store);
>>>> -
>>>>      let worker_type = job.jobtype().to_string();
>>>>      let upid_str = WorkerTask::new_thread(
>>>>          &worker_type,
>>>> @@ -43,11 +41,9 @@ pub fn do_garbage_collection_job(
>>>>                  eprintln!("could not finish job state for {}: {err}", job.jobtype());
>>>>              }
>>>>  
>>>> -            if let Some(email) = email {
>>>> -                let gc_status = datastore.last_gc_status();
>>>> -                if let Err(err) = send_gc_status(&email, notify, &store, &gc_status, &result) {
>>>> -                    eprintln!("send gc notification failed: {err}");
>>>> -                }
>>>> +            let gc_status = datastore.last_gc_status();
>>>> +            if let Err(err) = send_gc_status(&store, &gc_status, &result) {
>>>> +                eprintln!("send gc notification failed: {err}");
>>>
>>> I think we should use 'task_err!()' here. I know eprintln is used above,
>>> and technically works because we redirect stderr in the service setup 
>>> but it's still slow and kinda the legacy method of printing task errors.
>>
>> I think the reason why the original code does not use task_log is because the
>> job is already marked as finished at that point:
>>
>>             if let Err(err) = job.finish(status) {
>>                 eprintln!("could not finish job state for {}: {err}", job.jobtype());
>>             }
>>
>>             let gc_status = datastore.last_gc_status();
>>             if let Err(err) = send_gc_status(&store, &gc_status, &result) {
>>                 eprintln!("send gc notification failed: {err}");
>>             }
>>
>> Other jobs seem to follow the same pattern - use task_log! before, and eprintln/log::error!
>> after the job is finished.
>> A `task_log!` after the job is finished still seems to work, but I'm not sure if that
>> might lead to problems. What do you think?
> 
> I don't think this is a problem per se - job.finish() just marks the job as
> finished in the job state, it's not related to the worker task
> finish/exit itself.
> 
> since task_log (and friends) require the worker to still exist, and the
> last thing that happens when the worker fn returns its result is that
> result being logged via the same logging mechanism, this should be fine.
> but we might still want to switch the order around, so that a warning
> for notification failure actually gets counted?
> 
> if it's such a common pattern to do job.start at the start, and
> job.finish near/at the end of a worker task, I wonder whether we
> couldn't handle that in a uniform fashion ;) then we might also not end
> up calling worker.create_state twice (once for job.finish, and once for
> worker.log_result as part of the worker exiting) with slightly different
> states..
> 

Thanks a lot for your input!

I think I'll do the changes in a followup (if only to replace the eprintln!) - since
Gabriel's proxmox_log/tracing series is ripping out the task_log! macro any way.

-- 
- Lukas