[pbs-devel] [PATCH 0/5] Add LVM (thin) support for single file restore

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Jul 5 08:26:45 CEST 2021


On 02.07.21 14:39, Fabian Grünbichler wrote:
> Reviewed-By: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> 
> requires a Breaks on the old restore image (else the restore daemon 
> crashes because of missing lock/LVM support).
> 
> some potential for follow-ups/issues I encountered with a not-so-fast 
> PBS instance and bigger VMs:
> - increase VM start timeout (currently 12s?), as the initial scan of 
>   disks seems to block responding to the startup status request, and 
>   scanning all VGs/LVs can take quite a lot longer than that - 
>   alternatively, fix this blockage and/or consider VM started sooner?

I do not think that is required; rather, did it work at all for you?
Here it did not, and that was due to a recently introduced side-effect of the Memcom(pare)
module[0], which is instantiated deep in the REST handler for the cached user handling and
expects to be able to create a file in `/run/proxmox-backup`. As that directory did not
exist in the restore image environment, all requests to the REST server instance there
just got a 400 Bad Request:

> proxmox_backup::server::rest] GET /api2/json/status: 400 Bad Request: [client 0.0.0.0:807] request failed
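
This is roughly the shape of the fix referenced in [2], just as a sketch (the path constant,
the function name and where it gets called are assumptions, not the actual commit):

    use std::fs;
    use std::io;

    /// Directory the Memcom module expects for its file (path taken from the error
    /// above; the real code presumably uses a build-time constant instead).
    const PBS_RUN_DIR: &str = "/run/proxmox-backup";

    /// Create the runtime directory early in restore-daemon startup, before the
    /// REST server is brought up, so the cached user handling can create its file.
    fn ensure_run_dir() -> io::Result<()> {
        fs::create_dir_all(PBS_RUN_DIR)
    }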

Which is stupid in two ways:
* In most cases this should rather be reported as a 50X HTTP error, as it is produced
  by a general error result in the server. User and parameter verification already create
  correct 40X errors, and API calls that want to denote a fault on the client's side
  should be able to do so too, so what remains is quite likely a 500 error - which also
  makes it easier to search for.
* We did not set the log extension for the general error, only returned it in the response
  body, so the real error got lost in any access log.
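
To make the intended split concrete, a minimal sketch of the mapping (the ApiError type and
its variants are hypothetical, not the actual proxmox-backup error handling):

    use http::StatusCode; // `http` crate

    /// Hypothetical classification of API handler failures, only to illustrate
    /// which side of the 4xx/5xx boundary each case should land on.
    enum ApiError {
        /// Schema/parameter verification failed -> fault is on the client.
        BadParameter(String),
        /// Authentication or permission problem.
        Unauthorized(String),
        /// Anything else bubbling up from the handler -> fault is on the server.
        Internal(String),
    }

    impl ApiError {
        fn status(&self) -> StatusCode {
            match self {
                ApiError::BadParameter(_) => StatusCode::BAD_REQUEST,
                ApiError::Unauthorized(_) => StatusCode::UNAUTHORIZED,
                // generic errors should show up as 5xx, not as 400
                ApiError::Internal(_) => StatusCode::INTERNAL_SERVER_ERROR,
            }
        }

        /// The message that should also end up in the access log (the "log
        /// extension" mentioned above), not only in the response body.
        fn log_message(&self) -> &str {
            match self {
                ApiError::BadParameter(msg)
                | ApiError::Unauthorized(msg)
                | ApiError::Internal(msg) => msg.as_str(),
            }
        }
    }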

The latter was fixed[1]; I did not care too much about the first one, nor had the time to
look into its details, so I am just noting it for the record.


Anyway, the symptom of the previously missing, now created[2], proxmox-backup run-dir was a
"VM start timeout": as every API call failed, the status call used for detecting a finished
start failed too -> timeout.

In any case, neither of you actually tested this on master; Stefan sent it the day Dietmar
applied the Memcom stuff, but that still happened 9h before this series was sent, meh...


[0]: https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=fda19dcc6f41b2537896b2e874be1c0cfe71d5ea;hp=cd975e5787a3a9ace56e4aea0acc2364c283f438
[1]: https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=f4d371d2d2c5d79352e9681b5d942627e26c6fd4;hp=835d0e5dd3d69886f0d6bfdd4192a4a8c81af395
[2]: https://git.proxmox.com/?p=proxmox-backup.git;a=commitdiff;h=33d7292f29bedfd7570bebcef2a5d6943ebe8cbb;hp=f4d371d2d2c5d79352e9681b5d942627e26c6fd4

> - filter out "known unreadable" LVs, like Ceph (bluestore) OSDs
> - investigate '--sysinit' option for lvchange/vgchange (seems 
>   appropriate for what we are doing inside the VM ;))
> - investigate '-K' to allow looking at "non-auto-activatable" LVs
> 
> and as discussed off-list, I think we should switch to 
> build-dependencies and copying binaries+needed libs in 
> build-initramfs.sh instead of downloading .deb packages and extracting 
> them only to remove part of their contents again - but that is firmly in 
> "improve/cleanup build" territory, so lower priority than the above.
