[pve-devel] [PATCH cluster v2] limit tasklist to the maximal pmxcfs status entry size

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Aug 18 11:21:18 CEST 2017

We tried to limit the size of the tasklist by including non-running
tasks only if we have fewer than 25 entries. One reason, among others,
was that a single status entry in the cfs_status.kvhash is limited to
32 KiB.

The "max. 25 entries" heuristic assumes that entries are small, which
is also the norm. But the entry of a failed task, e.g. a Qemu VM with a
problematic command line, can be far longer than the usual task entry.

This led to a situation where the last 25 tasks were bigger than
32 KiB, so the ipcc call to the pmxcfs failed with EFBIG.
This then aborted every new task run with fork_worker, and could
render a node partially unusable until "/var/log/pve/tasks/active"
got truncated.

To reproduce this issue quickly, run:
 # qm create 11109 --args "'$(dd if=/dev/urandom bs=1024 count=1 2>/dev/null | base64 -w 0)'"
 # while true; do qm start 11109; done

You should soon see an "ipcc_send_rec failed: File too large" error.
After this all new tasks fail, even if they could succeed. pvestatd
also fails to broadcast the tasklist now. To get out of this do:

To address this, check the length of the serialized list and remove
elements from its end until we do not exceed the size limit anymore.
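
As a rough standalone illustration of the trimming described above, the
following sketch can be run outside of PVE. It is not the patched code:
trim_tasklist, $MAX_STATUS_SIZE and the task data are made up for the
demonstration, and JSON::PP is used only because it ships with Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(encode_json);    # core module, stand-in for the encoder PVE uses

# Assumed limit, mirroring the 32 KiB status entry size mentioned above.
my $MAX_STATUS_SIZE = 32 * 1024;

# Hypothetical helper: drop entries from the end of the list (the oldest
# ones, given a newest-first list) until the serialized list fits.
sub trim_tasklist {
    my ($data, $max) = @_;
    my $size = length(encode_json($data));
    while ($size >= $max && @$data) {
        pop @$data;
        $size = length(encode_json($data));
    }
    return $size;
}

# Made-up data: 25 entries, newest first, each with a ~2 KiB field to
# imitate unusually long task entries.
my @tasks = map { +{ upid => "task-$_", status => 'x' x 2048 } } 1 .. 25;

my $before = length(encode_json(\@tasks));
my $after  = trim_tasklist(\@tasks, $MAX_STATUS_SIZE);

printf "before: %d bytes, after: %d bytes, kept %d entries\n",
    $before, $after, scalar(@tasks);
```

Because the list is trimmed from its end, the entries at the front of the
array are the ones that survive.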

Currently running tasks and chronologically newer ones will get

Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>

v2 from the "limit tasklist also to the maximal pmxcfs status entry size"

changes v1 -> v2:
* move the patch to cluster, as the interface to the pmxcfs lives there and
  thus we should make such checks there.
* move from "add if there may be space" to "remove if there's too much", it's
  more accurate and encode_json gets called less often than with the other
  approach
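
To illustrate the second point, here is a small sketch that counts
encode_json calls for both strategies. The v1 code is not part of this
mail, so build_by_adding is only a guess at what an "add if there may be
space" loop could look like; counted_encode, trim_by_removing and the
sample data are likewise invented for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP ();

my $calls = 0;

# Wrapper that counts how often the encoder is invoked.
sub counted_encode { $calls++; return JSON::PP::encode_json($_[0]); }

# "remove if there's too much": serialize once, pop only while too big.
sub trim_by_removing {
    my ($data, $max) = @_;
    my @list = @$data;
    my $size = length(counted_encode(\@list));
    while ($size >= $max && @list) {
        pop @list;
        $size = length(counted_encode(\@list));
    }
    return \@list;
}

# "add if there may be space" (assumed shape): re-check the size for
# every entry before adding it.
sub build_by_adding {
    my ($data, $max) = @_;
    my @list;
    for my $entry (@$data) {
        last if length(counted_encode([@list, $entry])) >= $max;
        push @list, $entry;
    }
    return \@list;
}

# The normal case: 25 small entries that fit easily.
my @small = map { +{ upid => "task-$_" } } 1 .. 25;

$calls = 0;
trim_by_removing(\@small, 32 * 1024);
my $remove_calls = $calls;

$calls = 0;
build_by_adding(\@small, 32 * 1024);
my $add_calls = $calls;

print "remove approach: $remove_calls call(s), add approach: $add_calls call(s)\n";
```

In the common case of small entries the removing variant serializes the
list once, while this guessed adding variant serializes once per entry.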

 data/PVE/Cluster.pm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
index 2f46754..c845298 100644
--- a/data/PVE/Cluster.pm
+++ b/data/PVE/Cluster.pm
@@ -523,9 +523,18 @@ sub get_nodelist {
     return [ keys %$nodelist ];
+# $data must be a chronological descending ordered array of tasks
 sub broadcast_tasklist {
     my ($data) = @_;
+    # the serialized list may not get bigger than 32kb (CFS_MAX_STATUS_SIZE
+    # from pmxcfs) - drop older items until we satisfy this constraint
+    my $size = length(encode_json($data));
+    while ($size >= (32 * 1024)) {
+	pop @$data;
+	$size = length(encode_json($data));
+    }
     eval {
 	&$ipcc_update_status("tasklist", $data);
