[pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages
Christian Ebner
c.ebner at proxmox.com
Fri Nov 22 09:41:23 CET 2024
On 11/21/24 20:32, Fabian Grünbichler wrote:
>
>> Gabriel Goller <g.goller at proxmox.com> hat am 21.11.2024 18:04 CET geschrieben:
>>
>>
>> On 21.11.2024 17:26, Christian Ebner wrote:
>>> On 11/21/24 17:06, Gabriel Goller wrote:
>>>> I think I'd be nice if we use `.context()` instead of `format_err` and
>>>> debug print ("{err:?}") instead of default ("{}") here. As these
>>>> messages are all errors they shouldn't appear too often in the log and
>>>> if an error happens, you get much more information.
>>>>
>>>> What do you think?
>>>
>>> Hmm, can check that out, but wouldn't that disrupt the single line
>>> character of the log entries, prefixed by the time stamp?
>>
>> We could use "{err:#}", which will print everything in one line?
>>
>> Although on the other hand, I'd understand if you want to have strict
>> control over what is displayed to the users, so no hard feelings on
>> this one.
>
> big tasks/job like the sync ones are a bit special w.r.t. error handling because they "catch" errors often at intermediate levels, log them, record that an error occur, but then proceed with a sensible next unit of work. so it's not possible to just add a lot of context to an error like with simple(r) API calls, where many/most errors can be treated as fatal, maybe requiring a bit of cleanup before bailing completely.
>
> I think a good compromise is to treat those units of work as a sort of error scope, and
> - add context where it is missing (e.g., low level file or network access, where just bubbling up errors might be meaningless)
> - and then when logging the error, think about formatting
The current series already tries to follow your lead here, following
along the lines of you followup patches to the original series.
I will however double check where the suggested adding of an anyhow
Error context makes sense and how to best log soft errors without
disrupting the logs to much, increasing the verbosity for now as you
suggested below.
>
> e.g., for syncs we have groups as lowest unit of work - if something fails within a group, we abort that group, but proceed with the next one. but if we just log "syncing group X failed - permission denied" that doesn't help whatsoever. it might be better to have three lines of warnings if something goes wrong, if those three lines contain appropriate information like
> - request X / file access Y /.. failed with error Z (root cause)
> - while processing snapshot A in group B (important context, because the request or path of the root cause might not tell us this)
> - syncing group B failed (result of the error chain)
>
> achieving that requires carefully analyzing all error sources/chains though. when in doubt, I'd rather have a bit too much context initially (that usually means too much is added at lower levels and we can optimize there) than too little, but within reason ;) we can't travel back in time and add needed information to debug issues to our or users' logs, a stray duplicate context can easily be filtered out though. of course that doesn't mean every error is allowed to grow 10 lines long by default, balance is still important even when doing a first pass/implementation of improved error messages!
>
> and of course, the general rule applies - if we use errors with proper context, it should be done fairly consistently, so that the next level up can actually use it. it might be easier to use format_err or explicit additional `warn!` invocations to improve the situation immediately, and postpone adding proper contexts to do it for all of sync without time pressure.
Yes, aligning the sync job logs for push and pull would of course also
be of interest here, but given that the pull job also logs information
not controlled by the job directly, but also indirectly by e.g. logs of
operations on the datastore, that will require some more refactoring and
reorganization of the logic to get it right, without breaking other logging.
More information about the pbs-devel
mailing list