[pve-devel] [RFC PATCH manager] pvestatd: fix status server udp overflow
Dominik Csapak
d.csapak at proxmox.com
Wed Mar 4 10:01:47 CET 2020
On 3/4/20 9:46 AM, Thomas Lamprecht wrote:
> On 2/27/20 4:14 PM, Dominik Csapak wrote:
>> for big vms (many disks/nics) a single status update can be bigger
>> than 20k (i measured about 21k for a vm with 31 disks and nics)
>> which means that a single update can bring the length of
>> $txn->{data} over 65k (if it was >45k before)
>
> nice catch
>
>>
>> we could reduce the limit when we send, but this would only move
>> the problem, and we might run into it later on again
>>
>> to solve it (mostly) completely, append the data to $txn->{olddata}
>> as long as the new and old data together do not get over 65k
>>
>> in that case, flush the olddata instead
>>
>> on finish flush olddata and if somehow data was left in data, flush
>> that also
>>
>> one problem remains: if a single update from a vm/node/etc.. is over
>> 65k, but this would require a bigger refactoring of the status
>> plugins...
>
> hmm, your changes seems wrong, or better said unnecessarily complicated
> while not solving all issues..
>
> I'll simply loop over data in 64k chunks if it's bigger than that and
> send those out, simple package segmentation..
>
but i guess that won't work properly, since e.g. the influxdb
protocol is line-based and i guess it does not concatenate
multiple udp packets... so if a line gets cut right in the middle
the first part puts potentially wrong data into the db
and the second is not the right syntax...
>>
>> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
>> ---
>> PVE/ExtMetric.pm | 32 ++++++++++++++++++++++++++------
>> 1 file changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
>> index 14d98317..2f52b883 100644
>> --- a/PVE/ExtMetric.pm
>> +++ b/PVE/ExtMetric.pm
>> @@ -34,10 +34,16 @@ sub update_all($$@) {
>>
>> $plugin->$method($txn, @params);
>>
>> - if (length($txn->{data}) > 48000) {
>> + $txn->{olddata} //= '';
>> + $txn->{data} //= '';
>> +
>> + if ((length($txn->{olddata}) + length($txn->{data})) > 65000) {
>> # UDP stack cannot handle messages > 65k, if we've alot of data we
>> # do smaller batch sends then, but keep the connection alive
>> - transaction_flush($txn, 1);
>> + transaction_flush($txn, 1, 1);
>> + $txn->{olddata} = delete $txn->{data};
>> + } else {
>> + $txn->{olddata} .= delete $txn->{data};
>> }
>> }
>> }
>> @@ -70,17 +76,27 @@ sub transactions_start {
>> }
>>
>> sub transaction_flush {
>> - my ($txn, $keepconnected) = @_;
>> + my ($txn, $keepconnected, $useolddata) = @_;
>>
>> if (!$txn->{connection}) {
>> return if !$txn->{data}; # OK, if data was already sent/flushed
>> die "cannot flush metric data, no connection available!\n";
>> }
>> - return if !defined($txn->{data}) || $txn->{data} eq '';
>> + if ($useolddata) {
>> + return if !defined($txn->{olddata}) || $txn->{olddata} eq '';
>> + } else {
>> + return if !defined($txn->{data}) || $txn->{data} eq '';
>> + }
>>
>> my $plugin = PVE::Status::Plugin->lookup($txn->{cfg}->{type});
>>
>> - my $data = delete $txn->{data};
>> + my $data;
>> + if ($useolddata) {
>> + $data = delete $txn->{olddata};
>> + } else {
>> + $data = delete $txn->{data};
>> + }
>> +
>> eval { $plugin->send($txn->{connection}, $data) };
>> my $senderr = $@;
>>
>> @@ -97,7 +113,11 @@ sub transactions_finish {
>> my ($transactions) = @_;
>>
>> for my $txn (@$transactions) {
>> - eval { transaction_flush($txn) };
>> + eval { transaction_flush($txn, undef, 1) };
>> + if ($txn->{data}) {
>> + # should not happen, just for safety
>> + eval { transaction_flush($txn) };
>> + }
>> warn "$@" if $@;
>> }
>> }
>>
>
More information about the pve-devel
mailing list