[pbs-devel] [RFC proxmox v2 2/2] proxmox-log: added tracing infra

Fri Nov 3 09:56:06 CET 2023

On Thu, Nov 02, 2023 at 03:58:02PM +0100, Gabriel Goller wrote:
> Thanks for the review!
> 
> On 11/2/23 14:43, Wolfgang Bumiller wrote:
> > [..]
> > > +
> > > +impl<S: Subscriber> Layer<S> for FilelogLayer {
> > > +    fn on_event(&self, event: &Event<'_>, _ctx: Context<'_, S>) {
> > > +        let mut buf = String::new();
> > > +
> > > +        event.record(&mut EventVisitor::new(&mut buf));
> > I'd argue the above 2 lines should be part of the closure below,
> > otherwise in the error case you just produce a string to throw away.
> I Agree!
> > > +
> > > +        let logger_exists = LOGGER.try_with(|logger| {
> > > +            log_to_file(&mut logger.borrow_mut(), event.metadata().level(), buf);
> > > +        });
> > > +        if let Err(e) = logger_exists {
> > > +            error!(
> > Is it wise to call log functions from within log handlers? ;-)
> Hmm, yes this could be tricky...
> How do we want to handle this anyway? We could panic here, but that seems a
> bit
> harsh. Logging with `eprintln` is also an option, but a quite useless one
> tbh.

Well it's either eprinln!, or, well, do nothing and hope there's another
layer dealing with it. 
But really, what *is* this error anyway? AFAICT it means the task-local
is not set, so we should not even run into this, and panicking might
almost be fine...

Alternatively we could drop the separate layer idea and just have 1
layer where the file logger not existing is the same as the task logger
not being enabled, that is, we fall back to using the syslogger.
I guess that depends on how much additional functionality we really need
from all this tracing infrastructure?

> > > [..]
> > > +pub struct WorkerTaskFilter {
> > > +    in_worker_task: Arc<Mutex<bool>>,
> > AFAICT you only have locks which are shortlived to set/clear/check this
> > value.
> > For such a thing you can use `Arc<AtomicBool>` and get rid of all the
> > error handling.
> Ooh, this makes it a lot easier, thanks for the heads up!I think we should
> be alright by just using `Ordering::Relaxed` everywhere, right?

Yes, this will AFAICT never be accessed in parallel. (Might as well be
a simple Cell unless we run into a `Sync` bound somewhere?)

> > > +}
> > > +
> > > +impl WorkerTaskFilter {
> > > +    pub fn new(in_worker_task: Arc<Mutex<bool>>) -> WorkerTaskFilter {
> > > +        WorkerTaskFilter { in_worker_task }
> > > +    }
> > > +}
> > > +
> > > +impl<S: Subscriber + for<'a> LookupSpan<'a>> Filter<S> for WorkerTaskFilter {
> > > +    fn on_enter(&self, id: &span::Id, ctx: Context<'_, S>) {
> > > +        let metadata = ctx.metadata(id);
> > > +        if let Some(m) = metadata {
> > > +            if m.name() == "worker_task" {
> > I'm not so happy with this.
> > Now each time we poll a worker task we go through this layer system
> > which uses string comparison to know whether we're currently in a worker
> > task, for something that is actually rather static in the code.
> > I'd much prefer a simply custom `Future` wrapping the worker task's
> > future and setting this flag for the duration of the `poll()` method.
> Ok, but the `Future` wrapper would only work for the task, right?

I don't see the difference. In the threaded version you'd just
initialize in_worker_task to `true` - I mean, it does not matter where
you access that `Arc<AtomicBool>` from?

I don't know. I find this whole thing a bit backwards? Maybe it's just
me, but I'm not convinced that's how this part of `tracing` is meant to
be used.

Consider this:
If you already have an Arc<Bool>, all you need is a thread-local
copy of it:
- A Future based worker would have a wrapping Future containing that
  Arc, which, for the duration of `poll()`, sets the thread-local to
  point to its Arc.
- A thread based worker would just set it at the very beginning.
- A guard to enable/disable it (if we even need it, like on_enter/on_exit)
  would contain the old value and restore it on Drop (giving us a
  push/pop mechanics for cheap).

Also note that AFAICT neither the current tracing variant nor what I
wrote above deal with `tokio::spawn()`ed tasks, but that's fine.
If we want the ability to inherit the logging facility, this would need
to be independent from the boolean anyway since tasks can run on
separate threads - unless we restrict it to `LocalSet`s.
So I think we might need to first decide how that bit *should* work
before adding the ability to enter/leave the scope at will.
Unless I'm missing something.

> So we would need to keep the `span` version (or come up with
> something different) for the thread use-case and then again have a
> lot of `if (thread) { do this } else if (task) { do this }` stuff, which I
> don't
> really like.
> What we could do is have another `tokio::task_local!()` thingy, which
> contains a bool 'log_to_tasklog'. Then have another `scope` and `sync_scope`
> around the worker logic (So we would substitute the span stuff with another
> TLS).
> > This just seems like a whole lot of overhead we don't need for such
> > simple functionality. Also, the `on_enter` and `on_exit` methods make it
> > look like you could easily enter and exit this type of span, but that's
> > not the case. `on_exit` always stores `false`, so nested spans
> > temporarily disabling and enabling the worker task log would just end up
> > with a messed up state (this would need to be a counter...).
> Right, I would need a counter here..

(well actually some kind of stack, if we even really need the
functionality)

> > > [..]
> > > +
> > > +impl Visit for EventVisitor<'_> {
> > > +    fn record_debug(&mut self, field: &Field, value: &dyn fmt::Debug) {
> > > +        if field.name() == "message" {
> > > +            self.buf.push_str(&format!("{:?}", value));
> > String implements fmt::Write. You can
> > 
> >      use std::fmt::Write as _;
> >      let _ = write!(state.buf, "{value:?}");
> > 
> > it's possible for this to be more efficient since it does not enforce
> > the creation of a separate allocated string.
> 
> Thanks!
> 
> > [..]