[pve-devel] [PATCH/RFC guest-common 6/6] job_status: return jobs with target local node
Fabian Ebner
f.ebner at proxmox.com
Tue Aug 11 11:20:55 CEST 2020
There is another minor issue (with and without my patches):
If there is a job 123-0 for a guest on pve0 with source=target=pve0, and
a job 123-4 with source=pve0 and target=pve1, then after migrating the
guest to pve1, switching source and target for job 123-4 is not
possible: there already is a job (123-0) with target pve0. Thus
cfg->write() fails, and by extension job_status. (The
switch_replication_job_target call during migration fails for the same
reason.)
Possible solutions:
1. Instead of making such jobs (i.e. jobs with target=source) visible
in the hope that a user would remove/fix them, we could automatically
remove them ourselves (this could be done as part of the
switch_replication_job_target function as well). Under normal
conditions, there shouldn't be any such jobs anyway.
2. Alternatively (or additionally), we could add checks in the
create/update API paths to ensure that the target is not the node the
guest is on (see the sketch right below).
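A minimal sketch of what such a check could look like in the
create/update path (assuming get_vmlist() is used to resolve the
guest's current node; the actual API code may well differ):

    # Option 2 sketch: refuse a replication target equal to the node the
    # guest currently resides on. $vmid and $target are assumed to come
    # from the API parameters; the error wording is made up.
    my $vmlist = PVE::Cluster::get_vmlist();
    my $guest_node = $vmlist->{ids}->{$vmid}->{node}
        or die "unable to determine node for guest $vmid\n";
    die "replication target '$target' is the node the guest is on\n"
        if $target eq $guest_node;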
Option 2 would add a reason for using guest_migration locks in the
create/update paths. But I'm not sure we'd want that. The ability to
update job configurations while a replication is running is a feature
IMHO, and even with such checks, stealing guests could still lead to a
bad configuration. Therefore, I'd prefer option 1, which just adds a
bit to the automatic fixing we already do (rough sketch below).
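A rough sketch of option 1, e.g. as part of the existing fix-up done
around switch_replication_job_target/job_status (again, names and the
config structure are assumptions, not a drop-in patch):

    # Option 1 sketch: while fixing up the jobs of a guest on the local
    # node, silently drop jobs whose target is the local node instead of
    # keeping (or writing) an invalid configuration.
    my $changed = 0;
    for my $jobid (sort keys %{$cfg->{ids}}) {
        my $jobcfg = $cfg->{ids}->{$jobid};
        next if $jobcfg->{guest} != $vmid;
        if ($jobcfg->{target} eq $local_node) {
            delete $cfg->{ids}->{$jobid};
            $changed = 1;
        }
    }
    $cfg->write() if $changed;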
@Fabian G.: Opinions?
On 10.08.20 at 14:35, Fabian Ebner wrote:
> even if not scheduled for removal, while adapting
> replicate to die gracefully except for the removal case.
>
> This way, such invalid jobs are no longer hidden from the user
> (at least via the API; the GUI still hides them)
>
> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> ---
>
> I think it's a bit weird that such jobs only show up once
> they are scheduled for removal. I'll send a patch for the
> GUI too if we do want the new behavior.
>
> PVE/Replication.pm | 3 +++
> PVE/ReplicationState.pm | 5 +----
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/Replication.pm b/PVE/Replication.pm
> index ae0f145..b5835bd 100644
> --- a/PVE/Replication.pm
> +++ b/PVE/Replication.pm
> @@ -207,6 +207,9 @@ sub replicate {
>
> die "not implemented - internal error" if $jobcfg->{type} ne 'local';
>
> + die "job target is local node\n" if $jobcfg->{target} eq $local_node
> + && !$jobcfg->{remove_job};
> +
> my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
>
> my $migration_network;
> diff --git a/PVE/ReplicationState.pm b/PVE/ReplicationState.pm
> index e486bc7..0b751bb 100644
> --- a/PVE/ReplicationState.pm
> +++ b/PVE/ReplicationState.pm
> @@ -261,10 +261,6 @@ sub job_status {
> $cfg->switch_replication_job_target_nolock($vmid, $local_node, $jobcfg->{source})
> if $local_node ne $jobcfg->{source};
>
> - my $target = $jobcfg->{target};
> - # never sync to local node
> - next if !$jobcfg->{remove_job} && $target eq $local_node;
> -
> next if !$get_disabled && $jobcfg->{disable};
>
> my $state = extract_job_state($stateobj, $jobcfg);
> @@ -280,6 +276,7 @@ sub job_status {
> } else {
> if (my $fail_count = $state->{fail_count}) {
> my $members = PVE::Cluster::get_members();
> + my $target = $jobcfg->{target};
> if (!$fail_count || ($members->{$target} && $members->{$target}->{online})) {
> $next_sync = $state->{last_try} + 60*($fail_count < 3 ? 5*$fail_count : 30);
> }
>