[pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow

Dominik Csapak d.csapak at proxmox.com
Wed Mar 4 10:01:47 CET 2020


On 3/4/20 9:46 AM, Thomas Lamprecht wrote:
> On 2/27/20 4:14 PM, Dominik Csapak wrote:
>> for big vms (many disks/nics) a single status update can be bigger
>> than 20k (i measured about 21k for a vm with 31 disks and nics)
>> which means that a single update can bring the length of
>> $txn->{data} over 65k (if it was >45k before)
> 
> nice catch
> 
>>
>> we could reduce the limit when we send, but this would only move
>> the problem, and we might run into it later on again
>>
>> to solve it (mostly) completely, append the data to $txn->{olddata}
>> as long as the new and old data together do not get over 65k
>>
>> in that case, flush the olddata instead
>>
>> on finish flush olddata and if somehow data was left in data, flush
>> that also
>>
>> one problem remains: if a single update from a vm/node/etc.. is over
>> 65k, but this would require a bigger refactoring of the status
>> plugins...
> 
> hmm, your changes seems wrong, or better said unnecessarily complicated
> while not solving all issues..
> 
> I'll simply loop over data in 64k chunks if it's bigger than that and
> send those out, simple package segmentation..
> 

but i guess that won't work properly, since e.g. the influxdb
protocol is line-based and i guess it does not concatenate
multiple udp packets... so if a line gets cut right in the middle
the first part puts potentially wrong data into the db
and the second is not the right syntax...

>>
>> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
>> ---
>>   PVE/ExtMetric.pm | 32 ++++++++++++++++++++++++++------
>>   1 file changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
>> index 14d98317..2f52b883 100644
>> --- a/PVE/ExtMetric.pm
>> +++ b/PVE/ExtMetric.pm
>> @@ -34,10 +34,16 @@ sub update_all($$@) {
>>   
>>   	$plugin->$method($txn, @params);
>>   
>> -	if (length($txn->{data}) > 48000) {
>> +	$txn->{olddata} //= '';
>> +	$txn->{data} //= '';
>> +
>> +	if ((length($txn->{olddata}) + length($txn->{data})) > 65000) {
>>   	    # UDP stack cannot handle messages > 65k, if we've alot of data we
>>   	    # do smaller batch sends then, but keep the connection alive
>> -	    transaction_flush($txn, 1);
>> +	    transaction_flush($txn, 1, 1);
>> +	    $txn->{olddata} = delete $txn->{data};
>> +	} else {
>> +	    $txn->{olddata} .= delete $txn->{data};
>>   	}
>>       }
>>   }
>> @@ -70,17 +76,27 @@ sub transactions_start {
>>   }
>>   
>>   sub transaction_flush {
>> -    my ($txn, $keepconnected) = @_;
>> +    my ($txn, $keepconnected, $useolddata) = @_;
>>   
>>       if (!$txn->{connection}) {
>>   	return if !$txn->{data}; # OK, if data was already sent/flushed
>>   	die "cannot flush metric data, no connection available!\n";
>>       }
>> -    return if !defined($txn->{data}) || $txn->{data} eq '';
>> +    if ($useolddata) {
>> +	return if !defined($txn->{olddata}) || $txn->{olddata} eq '';
>> +    } else {
>> +	return if !defined($txn->{data}) || $txn->{data} eq '';
>> +    }
>>   
>>       my $plugin = PVE::Status::Plugin->lookup($txn->{cfg}->{type});
>>   
>> -    my $data = delete $txn->{data};
>> +    my $data;
>> +    if ($useolddata) {
>> +	$data = delete $txn->{olddata};
>> +    } else {
>> +	$data = delete $txn->{data};
>> +    }
>> +
>>       eval { $plugin->send($txn->{connection}, $data) };
>>       my $senderr = $@;
>>   
>> @@ -97,7 +113,11 @@ sub transactions_finish {
>>       my ($transactions) = @_;
>>   
>>       for my $txn (@$transactions) {
>> -	eval { transaction_flush($txn) };
>> +	eval { transaction_flush($txn, undef, 1) };
>> +	if ($txn->{data}) {
>> +	    # should not happen, just for safety
>> +	    eval { transaction_flush($txn) };
>> +	}
>>   	warn "$@" if $@;
>>       }
>>   }
>>
> 





More information about the pve-devel mailing list