[pve-devel] [PATCH storage] iscsi: disable Open-iSCSI login retries to avoid blocking pvestatd
Mira Limbeck
m.limbeck at proxmox.com
Fri Oct 11 13:20:06 CEST 2024
On 10/11/24 11:37, Friedrich Weber wrote:
> Since 90c1b10 ("fix #254: iscsi: add support for multipath targets"),
> iSCSI storage activation checks whether a session exists for each
> discovered portal. If there is a discovered portal without a session,
> it performs a discovery and login in the hope of establishing a
> session to the portal. If the portal is unreachable when trying to log
> in, Open-iSCSI's default behavior is to retry for up to 2 minutes, as
> explained in /etc/iscsi/iscsid.conf:
>
>> # The default node.session.initial_login_retry_max is 8 and
>> # node.conn[0].timeo.login_timeout is 15 so we have:
>> #
>> # node.conn[0].timeo.login_timeout * \
>> node.session.initial_login_retry_max = 120s
>
> If pvestatd is activating the storage, it will be blocked during that
> time, which is undesirable. This is particularly unfortunate if the
> target announces portals that the host permanently cannot reach. In
> that case, every pvestatd iteration will take 2 minutes. While it can
> be argued that such setups are misconfigured, it is still desirable to
> keep the fallout of that misconfiguration as low as possible.
>
> In order to reduce the time Open-iSCSI tries to log in, instruct
> Open-iSCSI to not perform login retries for that target. For this, set
> node.session.initial_login_retry_max for the target to 0. This setting
> is stored in Open-iSCSI's records under /etc/iscsi/nodes. As these
> records are overwritten with the defaults from /etc/iscsi/iscsid.conf
> on discovery, the setting needs to be applied after discovery.
>
> With this setting, one login attempt should take at most 15 seconds.
> This is still higher than pvestatd's iteration time of 10 seconds, but
> more tolerable. Logins will still be continuously retried by pvestatd
> in every iteration until there is a session to each discovered portal.
>
> Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
> ---
> src/PVE/Storage/ISCSIPlugin.pm | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/src/PVE/Storage/ISCSIPlugin.pm b/src/PVE/Storage/ISCSIPlugin.pm
> index 2bdd9a2..efd9de4 100644
> --- a/src/PVE/Storage/ISCSIPlugin.pm
> +++ b/src/PVE/Storage/ISCSIPlugin.pm
> @@ -132,6 +132,14 @@ sub iscsi_login {
> eval { iscsi_discovery($portals); };
> warn $@ if $@;
>
> + # Disable retries to avoid blocking pvestatd for too long, next iteration will retry anyway
> + eval {
> + my $cmd = [$ISCSIADM, '--mode', 'node', '--targetname', $target, '--op', 'update',
> + '--name', 'node.session.initial_login_retry_max', '--value', '0'];
As briefly discussed off-list, this should probably follow a style similar
to the `Wrapping Arguments` section of the Perl Style Guide, but with each
option and its value grouped together on the same line?
https://pve.proxmox.com/wiki/Perl_Style_Guide#Wrapping_Arguments
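For illustration, one way this could look, with each option kept on the
same line as its value. This is only a sketch of the suggestion, not
wording taken from the style guide. The `$ISCSIADM` path and the `$target`
value are placeholders so that the snippet runs standalone, and the command
is only printed, not executed.

```perl
use strict;
use warnings;

# Placeholder values so the sketch runs on its own; in ISCSIPlugin.pm these
# are provided by the plugin.
my $ISCSIADM = '/usr/bin/iscsiadm';
my $target = 'iqn.2001-04.com.example:storage.target0';

# One option/value pair per line, with a trailing comma after the last element.
my $cmd = [
    $ISCSIADM,
    '--mode', 'node',
    '--targetname', $target,
    '--op', 'update',
    '--name', 'node.session.initial_login_retry_max',
    '--value', '0',
];

# Print instead of calling run_command($cmd), so nothing is changed on the host.
print join(' ', @$cmd), "\n";
```

Grouping the pairs like this would keep later additions or removals of an
option to a single-line diff.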
> + run_command($cmd);
> + };
> + warn $@ if $@;
> +
> run_command([$ISCSIADM, '--mode', 'node', '--targetname', $target, '--login']);
> }
>
Tested this with 4 portals, 2 of which were disconnected. With this patch,
the pvestatd update time was ~30 seconds, matching 2 failed logins.
Without the patch, it was ~484 seconds.
Since a login fails in 7 seconds, the old behavior actually did more
than 8 retries, see the comment for `initial_login_retry_max`:
# Note that if the login fails
# quickly (before node.conn[0].timeo.login_timeout fires) because the network
# layer or the target returns an error, iscsid may retry the login more than
# node.session.initial_login_retry_max times.
So especially for the cases of `no route to host` this should improve
the update time significantly for multiple portals where some are never
reachable.
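As a rough sanity check of the numbers above, the nominal blocking time per
pvestatd iteration can be sketched as login timeout times number of attempts
times number of unreachable portals. This assumes that logins to unreachable
portals happen one after another and that every attempt runs into the full
login_timeout. The function name is made up for this example.

```perl
use strict;
use warnings;

# Nominal blocking time in seconds. Assumptions: every attempt runs into the
# full login timeout, and unreachable portals are tried one after another.
# A retry maximum of 0 still leaves the single initial login attempt.
sub nominal_login_block_time {
    my ($login_timeout, $retry_max, $unreachable_portals) = @_;
    my $attempts = $retry_max > 0 ? $retry_max : 1;
    return $login_timeout * $attempts * $unreachable_portals;
}

# 2 unreachable portals, node.conn[0].timeo.login_timeout of 15 seconds
printf "with patch (retry_max=0): %d s\n", nominal_login_block_time(15, 0, 2);
printf "defaults (retry_max=8):   %d s\n", nominal_login_block_time(15, 8, 2);
```

The first figure (30 s) matches the ~30 seconds measured with the patch. The
second (240 s) is only the nominal value for the defaults and is well below
the ~484 seconds measured, since iscsid may retry more often than
initial_login_retry_max when logins fail quickly.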
Consider this patch:
Tested-by: Mira Limbeck <m.limbeck at proxmox.com>
Reviewed-by: Mira Limbeck <m.limbeck at proxmox.com>