[pve-devel] [PATCH zsync] fix #2821: only abort if there really is a waiting/syncing job instance already
Thomas Lamprecht
t.lamprecht at proxmox.com
Thu Dec 17 10:23:33 CET 2020
On 17/12/2020 09:40, Fabian Ebner wrote:
> Am 14.12.20 um 14:47 schrieb Thomas Lamprecht:
>> On 14.12.20 14:00, Fabian Ebner wrote:
>>> @@ -584,6 +586,33 @@ sub destroy_job {
>>> });
>>> }
>>> +sub get_process_start_time {
>>> + my ($pid) = @_;
>>> +
>>> + return eval { run_cmd(['ps', '-o', 'lstart=', '-p', "$pid"]); };
>>
>> instead of fork+exec do a much cheaper file read?
>>
>> I.e., copying over file_read_firstline from PVE::Tools then:
>>
>> sub get_process_start_time {
>> my $stat_str = file_read_firstline("/proc/$pid/stat");
>> my $stat = [ split(/\s+/, $stat_str) ];
>>
>> return $stat->[21];
>> }
>>
>> plus some error handling (note I did not test above)
>>
>
> Agreed, although we also need to obtain the boot time (from /proc/stat) to have the actual start time, because the value in /proc/$pid/stat is just the number of clock ticks since boot when the process was started. But it's still much cheaper of course.
hmm, yeah intra-boot this would not be enough to always tell 100% for sure.
FYI, there you probably could also use `/proc/sys/kernel/random/boot_id` can be
read once at program startup.
http://0pointer.de/blog/projects/ids.html (see "Software IDs"),
>>> @@ -593,11 +622,18 @@ sub sync {
>>> eval { $job = get_job($param) };
>>> if ($job) {
>>> - if (defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
>>> + my $state = $job->{state} // 'ok';
>>> + $state = 'ok' if !instance_exists($job->{instance_id});
>>> +
>>> + if ($state eq "syncing" || $state eq "waiting") {
>>> die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
>>> }
>>> $job->{state} = "waiting";
>>> +
>>> + eval { $job->{instance_id} = get_instance_id($$); };
>>
>> I'd query and cache the local instance ID from the current process on startup, this
>> would have the nice side effect of avoiding error potential here completely
>>
>
> What if querying fails on startup? I'd rather have it be a non-critical failure and continue. Then we'd still need a check here to see if the cached instance_id is defined.
if you make it just reads of /proc and it fails you can assume critical
conditions and abort. If you really do not want too, you can add a singleton
which returns the cached info and if not available retry getting it and warn.
my $id_cache;
sub get_local_instance_id {
return $id_cache if defined($id_cache);
$id_cache = eval { get_instance_id($$) };
warn $@ if $@;
return $id_cache;
}
Albeit, I'd have less hard feelings about caching if getting the ID doesn't
fork, nor other rather costly operations.
More information about the pve-devel
mailing list