[pve-devel] [PATCH manager 7/7] report: add recent boot timestamps which may show fencing/crash events

Alexander Zeidler a.zeidler at proxmox.com
Thu Apr 18 17:45:49 CEST 2024


On Thu, 2024-04-18 at 12:43 +0200, Mira Limbeck wrote:
> On 4/18/24 11:16, Alexander Zeidler wrote:
> > Boots that succeeded but crashed at some later point show the same
> > "until" value ("still running" or a timestamp) as the boot(s) that
> > followed them. The most recent boot in such a sequence of duplicated
> > "until" lines has not crashed (or not yet).
> > 
> > Example output where only the boot from 16:25:41 crashed:
> >  reboot system boot 6.5.11-7-pve Thu Apr 11 16:31:24 2024 still running
> >  reboot system boot 6.5.11-7-pve Thu Apr 11 16:29:17 2024 - Thu Apr 11 16:31:12 2024 (00:01)
> >  reboot system boot 6.5.11-7-pve Thu Apr 11 16:25:41 2024 - Thu Apr 11 16:31:12 2024 (00:05)
> >  ...
> > 
> > Furthermore, it shows the booted/crashed/problematic kernel version.
> > 
> > `last` is also used since currently `journalctl --list-boots` can take
> > 10 seconds or even longer on some systems, with no option to limit the
> > amount of reported boot lines.
> > 
> > Signed-off-by: Alexander Zeidler <a.zeidler at proxmox.com>
> > ---
> > v2:
> > * move away from dmesg base
> > * list also recent (5) boot timestamps with additional information
> > 
> > v1: https://lists.proxmox.com/pipermail/pve-devel/2024-March/062342.html
> > 
> > 
> >  PVE/Report.pm | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/PVE/Report.pm b/PVE/Report.pm
> > index d9f81a0f..c3abb776 100644
> > --- a/PVE/Report.pm
> > +++ b/PVE/Report.pm
> > @@ -32,6 +32,7 @@ my $init_report_cmds = sub {
> >  		'hostname',
> >  		'date -R',
> >  		'cat /proc/cmdline',
> > +		'last reboot -F -n5',
> >  		'pveversion --verbose',
> >  		'cat /etc/hosts',
> >  		'pvesubscription get',
> 
> Do we want the reboot info that far up, even above the version output?
> I'd say it's less interesting most of the time than the `pveversion` output.

I'm not sure the position you suggest fits better. While the pveversion
output is often more relevant, I placed the reboot info where it is because
it fits well with the surrounding information:

* You can compare the booted kernel versions with the kernel command line
  and the pveversion output.

* The kernel command line itself makes most sense at the beginning of the
  report.

* It may also be interesting to see how frequently the host is rebooted
  (e.g. after kernel updates).
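
For anyone reading the `last` output by hand, the duplicated-"until" rule
from the commit message can be sketched in a few lines of Python. This is
only an illustration and not part of the patch: the helper names
(parse_boots, flag_crashed) are made up, and the parser assumes the two
line shapes from the example above ("still running" or "- <timestamp>").

```python
# Sketch only: flag boots that likely crashed, based on `last reboot -F` output.
# Assumption: lines look like the example in the commit message, newest first.

SAMPLE = """\
reboot system boot 6.5.11-7-pve Thu Apr 11 16:31:24 2024 still running
reboot system boot 6.5.11-7-pve Thu Apr 11 16:29:17 2024 - Thu Apr 11 16:31:12 2024 (00:01)
reboot system boot 6.5.11-7-pve Thu Apr 11 16:25:41 2024 - Thu Apr 11 16:31:12 2024 (00:05)
"""


def parse_boots(text):
    """Return one dict per 'reboot' line; other lines are skipped."""
    boots = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the trailing "wtmp begins ..." line.
        if len(fields) < 11 or fields[:3] != ["reboot", "system", "boot"]:
            continue
        if fields[9] == "-":
            until = " ".join(fields[10:15])   # full end timestamp
        else:
            until = " ".join(fields[9:11])    # "still running"
        boots.append({
            "kernel": fields[3],
            "start": " ".join(fields[4:9]),
            "until": until,
        })
    return boots


def flag_crashed(boots):
    """Mark a boot as crashed if its 'until' equals that of the next newer boot."""
    flagged = []
    newer_until = None
    for boot in boots:  # newest first, as printed by `last`
        crashed = newer_until is not None and boot["until"] == newer_until
        flagged.append({**boot, "crashed": crashed})
        newer_until = boot["until"]
    return flagged


if __name__ == "__main__":
    for boot in flag_crashed(parse_boots(SAMPLE)):
        state = "crashed" if boot["crashed"] else "ok (or not yet crashed)"
        print(boot["start"], boot["kernel"], state)
```

With the example from the commit message, only the 16:25:41 boot gets
flagged; the 16:29:17 boot is the newest one with that "until" value and
therefore counts as not crashed.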


Btw, the "wtmp begins ..." line does not necessarily show the installation
date. If we do not store that date anywhere, currently something like

stat / | grep Birth

could be used if needed.

> 
> And for uptime, we do have /cluster/resources and `top` which both show it.
> Maybe it could be moved a bit further down? After /cluster/resources
> could perhaps be a nice spot since it is (currently) followed by `top`?




