[pve-devel] stream or rrd for ksm sharing counter ?

Wed Oct 9 09:41:58 CEST 2019

On 10/9/19 8:41 AM, Alexandre DERUMIER wrote:
> Hi,
> 
> I'm still trying to improve loadbalancing.
> 
> Currently we don't stream ksm sharing counter,
> I think it could be great to stream it or push it to rrd (with extra rrd ? change the current memory format ?)
> 
> What is the best way to do it ?
> 
> 
> As we could have 2 servers with 80% memory usage, but real ksm can be different.
> (for better loadbalancing, we should calc the memory + ksm memory usage).

But wouldn't you also need to know how much could be shared on the target
node to actually get this calculation right?
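
For reference, the raw counter itself is cheap to read. A minimal Python
sketch, assuming the standard KSM sysfs directory /sys/kernel/mm/ksm, where
pages_sharing roughly tells how many pages are currently saved by
deduplication. The function name and the ksm_dir parameter are made up for
illustration. Note that it only reports what the current node saves, not
what a guest would share after being moved:

```python
import os


def ksm_saved_bytes(ksm_dir="/sys/kernel/mm/ksm"):
    """Estimate the memory currently saved by KSM on this node, in bytes.

    ksm_dir is a parameter only so the function can be pointed at a
    different directory (hypothetical, for illustration and testing).
    """
    with open(os.path.join(ksm_dir, "pages_sharing")) as f:
        pages_sharing = int(f.read().strip())
    # Each "sharing" page is one page that did not need its own copy.
    return pages_sharing * os.sysconf("SC_PAGE_SIZE")
```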

We'd rather keep it simple for now and start off with just the whole code
infrastructure and static scheduling (i.e., VMs/CTs can be assigned
resource-use-points). Once we have that, we have a base we can compare
against and can be sure that we do not get "move everything one node away,
in a circle" situations.
Different load algorithms/params could then be tried out, and users
probably have different needs. E.g., personally I'd mostly want to balance
on memory, as the rest is just too dynamic in my setups.
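
To illustrate what static scheduling could look like, a rough Python
sketch: each guest gets a fixed number of resource-use-points and is placed
on the node with the lowest current total. The name place_guests, the point
values, and the greedy heuristic are all hypothetical, just one possible
choice and not something that exists in the code base:

```python
def place_guests(guest_points, nodes):
    """Greedily assign guests to nodes by static resource-use-points.

    guest_points: dict mapping guest ID -> assigned points (hypothetical unit)
    nodes: iterable of node names
    Returns (placement, load): guest -> node, and node -> summed points.
    """
    load = {node: 0 for node in nodes}
    placement = {}
    # Place the heaviest guests first so large ones do not end up stacked.
    ordered = sorted(guest_points.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for guest, points in ordered:
        # Least-loaded node wins; the node name breaks ties deterministically.
        target = min(load, key=lambda n: (load[n], n))
        placement[guest] = target
        load[target] += points
    return placement, load
```

As the inputs are static, the result is deterministic, so repeated runs
cannot start shuffling guests around in a circle.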

For CPU, IMO the correct metric still needs to be found. But one
that's at least somewhat OK could be pressure stall information[0].

As long as no task is stalled just because of lack of CPU resources
there's no need to move things around (when just looking at CPU
balancing), same for memory.
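
Reading the PSI values is straightforward. A small Python sketch, assuming
the /proc/pressure/{cpu,memory,io} files in the format described in [0]
(lines like "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"). The helper
names parse_psi and read_psi are made up for illustration:

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file into nested dicts.

    Returns e.g. {"some": {"avg10": 0.0, "avg60": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_psi(resource="cpu"):
    """Read and parse /proc/pressure/<resource> (needs a PSI-enabled kernel)."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())
```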

We could even poll the PSI interface and get notified if it passes
certain thresholds[0].

[0]: https://www.kernel.org/doc/html/latest/accounting/psi.html
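
The threshold notification could look roughly like the following Python
sketch. It follows the trigger mechanism from [0]: write
"<some|full> <stall us> <window us>" to the pressure file, then poll for
POLLPRI. The function names and default values are only assumptions for
illustration, and registering a trigger may need extra privileges depending
on the kernel version:

```python
import os
import select


def format_trigger(kind, stall_us, window_us):
    """Build the NUL-terminated trigger string, e.g. b"some 150000 1000000\\0"."""
    return f"{kind} {stall_us} {window_us}\0".encode()


def wait_for_pressure(resource="cpu", kind="some", stall_us=150_000,
                      window_us=1_000_000, timeout_ms=None):
    """Block until the PSI trigger fires; return False on timeout."""
    fd = os.open(f"/proc/pressure/{resource}", os.O_RDWR | os.O_NONBLOCK)
    try:
        # Registering the trigger is done by writing it to the open file.
        os.write(fd, format_trigger(kind, stall_us, window_us))
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _fd, event in poller.poll(timeout_ms):
            if event & select.POLLERR:
                raise OSError("PSI event source is gone")
            if event & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)
```

With the defaults above, the call returns once tasks were stalled on CPU
for at least 150ms within any 1s window.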

I'd actually separate this all into two things:

* dynamic balancing: done only if really needed, doesn't care about
  balancing out as long as all VMs/CTs have enough resources to run.
  IOW, there could be grave utilization differences, but as long as all
  VMs/CTs still get scheduled, we do not move anything. This should
  rather try to ensure that all can keep running over a longer time. PSI
  would be great here, as it can be used to see if a task (group) is
  actually not able to run. If another node has better (lower) PSI,
  then we know that it has less utilization (and we know for real)
  and can move something there, if the difference is big enough.

* user-triggered balance out: this would only be triggered manually
  by an admin. UI should be made so that the suggested movements are
  visible. It's somewhat like your proposed patc
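
The decision rule for the dynamic balancing point could be sketched in
Python like this. It assumes each node reports one PSI percentage (for
example the "some avg60" value); the function name and both thresholds are
hypothetical placeholders, not tuned values:

```python
def pick_target(psi_by_node, source, min_pressure=10.0, min_diff=5.0):
    """Return the node to move load to, or None if no move is warranted.

    psi_by_node: dict mapping node name -> PSI percentage (lower is better)
    source: the node that is checked for being under pressure
    """
    src = psi_by_node[source]
    # No real pressure on the source: leave everything where it is,
    # even if utilization differs a lot between nodes.
    if src < min_pressure:
        return None
    candidates = {n: p for n, p in psi_by_node.items() if n != source}
    if not candidates:
        return None
    target = min(candidates, key=lambda n: (candidates[n], n))
    # Only move if the other node is better by a large enough margin.
    if src - candidates[target] < min_diff:
        return None
    return target
```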

just to throw out my ideas :)