[pbs-devel] Scheduler causing connectivity issues?
Mark Schouten
mark at tuxis.nl
Mon Jul 18 09:31:41 CEST 2022
Hi,
>You have 30% of runnable processes getting stalled due to waiting for IO, that
>naturally should not cause the request accept future to get starved but is
>the reason for why it happened with the current (or better old)
>architecture. Increasing available memory, so that the page cache can hold
>more entries, could already relieve that system a bit.
Thanks. Please note that /var/lib/proxmox is on a different set of disks
than the datastores. The root pool is on two PM883s; the datastore is on
many spinning disks with NVMe special devices. Not sure if that’s relevant
to your findings, but here you have it :)
A memory upgrade is somewhere on our roadmap.
>We improved on the reproducer we got locally by simulating a higher latency
>disk using dm-delay on a small single core VM.
>
>For one, we made libpve-storage-perl do more efficient list-snapshot
>requests if they can be filtered by VMID, and on the PBS side we moved most
>operations that cause IO (and are related to backup groups/snapshots) to a
>separate thread pool so that the main thread should be less
>congested/blocked.
Given the other responses in this thread, I’m not going to upgrade
production to a testing version yet. Please let me know if there is any
other info you need from me.
--
Mark Schouten, CTO
Tuxis B.V.
mark at tuxis.nl