[pbs-devel] [PATCH proxmox-backup v3] etc: raise nofile soft limit to hard limit for proxmox-backup-proxy
Fabian Grünbichler
f.gruenbichler at proxmox.com
Fri Nov 21 08:43:11 CET 2025
On November 20, 2025 6:23 pm, Thomas Lamprecht wrote:
> Am 20.11.25 um 16:12 schrieb Christian Ebner:
>> On 11/20/25 4:05 PM, Thomas Lamprecht wrote:
>>> Am 20.11.25 um 15:32 schrieb Christian Ebner:
>>>> This is acceptable since PBS does not directly depend on problematic
>>>> select() calls as verified via `nm` and does not use it in linked
>>>> libraries to the best of my knowledge.
>>>>
>>>
>>> Isn't above and
>>
>> With above I intended to state that the PBS code itself does not call into select(), while below are dependencies on shared objects which might call into select() according to their symbols.
>>
>
> And the systemd news entry you link to in the commit message clearly states:
>
> ----8<----
> Programs that want to take benefit of the increased limit have to "opt-in" into
> high file descriptors explicitly by raising their soft limit. Of course, when
> they do that they must acknowledge that they cannot use select() anymore (and
> **neither can any shared library they use — or any shared library used by any
> shared library they use and so on**).
> ---->8----
>
> I just checked the apt repo, and it includes various select calls. Most seem
> to center around downloading packages and such, but I'd not bet on it that
> no such select is anywhere in the code paths we use.
>
> PAM uses select in the pam_loginuid, which might be part of the login call,
> albeit it uses it only if require_auditd is enabled (which I don't think it is).
> I did not yet check the others out.
>
> I mean, one option might be to provide our own select wrapper preloaded
> overriding the glibc one and keep some FDs below 1024 reserved for that, but
> I really really dislike doing such things. Similar in spirit would be providing
> a select compatible implementation using poll and LD_PRELOAD that, but also far
> from great..
>
> Moving either GC, or all the things that might call select as per your list,
> into a dedicated process might be the nicer thing to do. But as mentioned offlist
> I'll try to walk through the problem and code again tomorrow and see if I can
> find some other viable options (or you/fabian got some ideas), as of my current
> knowledge I cannot really accept doing this bump.
if we move something, we should move the things (potentially) calling
select, as we can then benefit from higher FD limits for all the regular
operations. 1k open FDs is not much even without the newly added locks,
and even before those were added, users ran into issues, which they
fixed by raising the limit with a systemd override or other means (or
not at all):
https://forum.proxmox.com/threads/too-many-open-files-os-error-24.73094/
https://forum.proxmox.com/threads/garbage-collect-job-fails-with-emfile-too-many-open-files.152687/
https://forum.proxmox.com/threads/tasks-fail-with-too-many-open-files-os-error-24.126770/
https://forum.proxmox.com/threads/sync-from-pbs-to-pbs-failed-too-many-open-files.113036/
https://forum.proxmox.com/threads/another-sync-error.73417/
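for context on why select() and a raised soft limit do not mix: an fd_set is a fixed-size bitmap with FD_SETSIZE (1024 on glibc) bits, and POSIX leaves FD_SET() undefined for any fd outside that range. a minimal model of that constraint, where `FdSet` is a hypothetical stand-in written for illustration and not the libc type:

```rust
/// Number of bits in an fd_set on glibc.
const FD_SETSIZE: usize = 1024;

/// Toy stand-in for fd_set (hypothetical, for illustration only).
struct FdSet {
    bits: [u64; FD_SETSIZE / 64],
}

impl FdSet {
    fn new() -> Self {
        FdSet {
            bits: [0; FD_SETSIZE / 64],
        }
    }

    /// Checked counterpart of FD_SET(). In C, passing an fd that is
    /// >= FD_SETSIZE is undefined behavior; here it is reported as an error.
    fn set(&mut self, fd: usize) -> Result<(), String> {
        if fd >= FD_SETSIZE {
            return Err(format!(
                "fd {} does not fit into an fd_set of {} bits",
                fd, FD_SETSIZE
            ));
        }
        self.bits[fd / 64] |= 1u64 << (fd % 64);
        Ok(())
    }

    /// Checked counterpart of FD_ISSET().
    fn is_set(&self, fd: usize) -> bool {
        fd < FD_SETSIZE && self.bits[fd / 64] & (1u64 << (fd % 64)) != 0
    }
}

fn main() {
    let mut set = FdSet::new();
    // With a soft limit of 1024 the kernel never hands out an fd above 1023,
    // so only the first two cases can occur; a raised limit allows the rest.
    for fd in [3, 1023, 1024, 4096] {
        match set.set(fd) {
            Ok(()) => println!("fd {}: ok (set: {})", fd, set.is_set(fd)),
            Err(err) => println!("fd {}: {}", fd, err),
        }
    }
}
```

so any code path that still calls select() is only safe as long as every fd it passes stays below 1024, which is what the default soft limit guarantees and a raised one does not.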
the only alternative I see at the moment would be to either
- reduce the lock granularity of the newly introduced lock (e.g.,
  lock-per-chunk-prefix)
- reduce the batch size (which determines the number of concurrently
  held locks in GC) for S3 deletion
the latter would be a fairly simple patch, but would make GC potentially
a bit more expensive (more delete requests to S3):
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0a5179230..20372190c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1716,6 +1716,24 @@ impl DataStore {
                     }
                     chunk_count += 1;
+
+                    drop(_guard);
+
+                    if delete_list.len() > 100 {
+                        let delete_objects_result = proxmox_async::runtime::block_on(
+                            s3_client.delete_objects(
+                                &delete_list
+                                    .iter()
+                                    .map(|(key, _)| key.clone())
+                                    .collect::<Vec<S3ObjectKey>>(),
+                            ),
+                        )?;
+                        if let Some(_err) = delete_objects_result.error {
+                            bail!("failed to delete some objects");
+                        }
+                        // release all chunk guards
+                        delete_list.clear();
+                    }
                 }
                 if !delete_list.is_empty() {
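
to make the trade-off concrete, here is a rough model of the batching. it is not the PBS code: `simulate_gc` is a hypothetical helper, the counter `held` stands in for `delete_list.len()`, and the batch sizes are illustrative. it flushes once `batch_size` guards are held, whereas the patch above flushes once the list exceeds 100 entries.

```rust
/// Returns (peak number of lock guards held at once, number of bulk delete
/// requests) for a GC run that removes `chunks_to_delete` chunks.
/// Hypothetical model, not taken from pbs-datastore.
fn simulate_gc(chunks_to_delete: usize, batch_size: usize) -> (usize, usize) {
    assert!(batch_size > 0, "batch size must be positive");

    let mut held = 0usize; // guards currently kept alive in the delete list
    let mut peak = 0usize; // highest number of guards held at the same time
    let mut requests = 0usize; // bulk delete requests sent so far

    for _ in 0..chunks_to_delete {
        held += 1; // per-chunk lock acquired, guard pushed to the list
        peak = peak.max(held);
        if held >= batch_size {
            requests += 1; // one bulk delete request for the whole batch
            held = 0; // clearing the list drops all guards
        }
    }
    if held > 0 {
        requests += 1; // final, partially filled batch
    }
    (peak, requests)
}

fn main() {
    for batch_size in [1000, 100] {
        let (peak, requests) = simulate_gc(10_000, batch_size);
        println!(
            "batch size {}: peak held locks {}, delete requests {}",
            batch_size, peak, requests
        );
    }
}
```

each held lock is an open FD, so the peak counts directly against the nofile limit, while the request count grows by the same factor the batch size shrinks.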