[pve-devel] [RFC cluster 1/1] add generic data broadcast interface

Fri Apr 26 10:43:03 CEST 2019

Am 4/26/19 um 8:21 AM schrieb Dominik Csapak:
> similar to how we handle the cluster wide tasklist and rrd data,
> have an interface that can sync data across the cluster
> 
> this data is only transient and will not be written to disk
> 
> we can use this for a number of things, e.g. getting the locks of the
> guests clusterwide, listing ceph services across the cluster, etc.

How?? One needs to write it on lock, and if it's temporarily it does not
seem really that great for things like locks?? Also you start to broadcast
everything twice to all, the lock info is already there in the memdb store
no need to do some heuristic parallel heuristics for that..

> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  data/PVE/Cluster.pm | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
> index 5af11e6..ea36a0a 100644
> --- a/data/PVE/Cluster.pm
> +++ b/data/PVE/Cluster.pm
> @@ -540,6 +540,53 @@ sub get_nodelist {
>      return [ keys %$nodelist ];
>  }
>  
> +# best effort data store for cluster
> +# this data is gone if the pmxcfs is restarted, but only the local data,

if one is restarted or if all are? Else, who cleans up locks, or other data,
on node failure (in any way possible), else this _will_ have very confusing
outcomes...

> +# so we should not use this for very important data
> +sub broadcast_data {

broadcast_node_kv ?

> +    my ($key, $data) = @_;
> +
> +    my $size = length(encode_json($data));
> +    if ($size >= (32 * 1024)) {
> +	warn "data for '$key' too big\n";
> +	return;
> +    }
> +
> +    eval {
> +	&$ipcc_update_status($key, $data);

so we just allow overwriting existing keys? how about a prefix like we have
for "rrd/"

> +    };
> +
> +    warn $@ if $@;
> +}
> +
> +sub get_data {

get_node_kv ?

> +    my ($key, $nodename) = @_;
> +
> +    my $res = {};
> +    my $sub = sub {

maybe a name which actually says something would help reading this ;-)

> +	my ($node) = @_;
> +	eval {
> +	    my $raw = &$ipcc_get_status($key, $node);

use new coderef call syntax, don't mix it in the same patch..
(above &$ vs below $sub->())

> +	    my $data = decode_json($raw) if $raw;
> +	    $res->{$node} = $data;
> +	};
> +	my $err = $@;
> +	syslog('err', $err) if $err;
> +    };
> +
> +    if ($nodename) {
> +	$sub->($nodename);
> +    } else {
> +	my $nodelist = get_nodelist();
> +
> +	foreach my $node (@$nodelist) {
> +	    $sub->($node);
> +	}

so this is not arbitrary data indexed by key, but a value store indexed with
($key, $node), i.e., same key on different nodes can hold different data, can
be OK, but this needs to be noted, i.e., through more specific method names

not yet really sure about this, or better said, I'm sure I don't want this for
locks, but for things like ceph service states it could be OK.

> +    }
> +
> +    return $res;
> +}
> +
>  # $data must be a chronological descending ordered array of tasks
>  sub broadcast_tasklist {
>      my ($data) = @_;
>