[pve-devel] [PATCH kronosnet 1/2] fix #3672: cherry-pick knet fixes
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Nov 9 13:07:41 CET 2021
see https://github.com/corosync/corosync/issues/660 as well. these are
already queued for 1.23 and taken straight from stable1-proposed.
Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
---
...eq_num-initialization-race-condition.patch | 53 +++++++++++
...or-messages-to-trigger-faster-link-d.patch | 92 +++++++++++++++++++
debian/patches/series | 3 +-
3 files changed, 147 insertions(+), 1 deletion(-)
create mode 100644 debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
create mode 100644 debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
diff --git a/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
new file mode 100644
index 0000000..d01e0d4
--- /dev/null
+++ b/debian/patches/0001-host-fix-dst_seq_num-initialization-race-condition.patch
@@ -0,0 +1,53 @@
+From 7eebe93c5039dad432bdd40101287e7fc04b3d10 Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto at redhat.com>
+Date: Mon, 8 Nov 2021 09:14:22 +0100
+Subject: [PATCH 1/2] [host] fix dst_seq_num initialization race condition
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is a potential race condition where the sender
+is overloaded, sending data packets before pings
+can kick in and set the correct dst_seq_num.
+
+if this node is starting up (dst_seq_num = 0),
+it can start rejecing valid packets and get stuck.
+
+Set the dst_seq_num to the first seen packet and
+use that as reference instead.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto at redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
+---
+ libknet/host.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+diff --git a/libknet/host.c b/libknet/host.c
+index ec73c0df..6fca01f8 100644
+--- a/libknet/host.c
++++ b/libknet/host.c
+@@ -573,6 +573,21 @@ int _seq_num_lookup(struct knet_host *host, seq_num_t seq_num, int defrag_buf, i
+ char *dst_cbuf_defrag = host->circular_buffer_defrag;
+ seq_num_t *dst_seq_num = &host->rx_seq_num;
+
++ /*
++ * There is a potential race condition where the sender
++ * is overloaded, sending data packets before pings
++ * can kick in and set the correct dst_seq_num.
++ *
++ * if this node is starting up (dst_seq_num = 0),
++ * it can start rejecing valid packets and get stuck.
++ *
++ * Set the dst_seq_num to the first seen packet and
++ * use that as reference instead.
++ */
++ if (!*dst_seq_num) {
++ *dst_seq_num = seq_num;
++ }
++
+ if (clear_buf) {
+ _clear_cbuffers(host, seq_num);
+ }
+--
+2.30.2
+
diff --git a/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
new file mode 100644
index 0000000..c8a9990
--- /dev/null
+++ b/debian/patches/0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
@@ -0,0 +1,92 @@
+From 1d52003ae7814ebf2b47c1ac3463c7d82486a5fd Mon Sep 17 00:00:00 2001
+From: "Fabio M. Di Nitto" <fdinitto at redhat.com>
+Date: Sun, 7 Nov 2021 17:02:05 +0100
+Subject: [PATCH 2/2] [udp] use ICMP error messages to trigger faster link down
+ detection
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+this solves a possible race condition when:
+
+- node1 is running
+- node2 very fast
+- node1 does NOT have enough time to detect that node2 has gone
+ and reset the local seq numbers / buffers
+- node1 will start rejecting valid packets from node2
+
+There is still a potential minor race condition where app
+can restart so fast that kernel / network don't have time
+to generate an ICMP error. This will be addressed using
+instance id in onwire v2 protocol, as suggested by Jan F.
+
+Signed-off-by: Fabio M. Di Nitto <fdinitto at redhat.com>
+Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
+---
+ libknet/transport_udp.c | 44 +++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 44 insertions(+)
+
+diff --git a/libknet/transport_udp.c b/libknet/transport_udp.c
+index 482c99b1..a1419c89 100644
+--- a/libknet/transport_udp.c
++++ b/libknet/transport_udp.c
+@@ -364,6 +364,46 @@ static int read_errs_from_sock(knet_handle_t knet_h, int sockfd)
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s destination unknown", addr_str, strerror(sock_err->ee_errno));
+ } else {
+ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Received ICMP error from %s: %s %s", addr_str, strerror(sock_err->ee_errno), addr_remote_str);
++ if ((sock_err->ee_errno == ECONNREFUSED) || /* knet is not running on the other node */
++ (sock_err->ee_errno == ECONNABORTED) || /* local kernel closed the socket */
++ (sock_err->ee_errno == ENONET) || /* network does not exist */
++ (sock_err->ee_errno == ENETUNREACH) || /* network unreachable */
++ (sock_err->ee_errno == EHOSTUNREACH) || /* host unreachable */
++ (sock_err->ee_errno == EHOSTDOWN) || /* host down (from kernel/net/ipv4/icmp.c */
++ (sock_err->ee_errno == ENETDOWN)) { /* network down */
++ struct knet_host *host = NULL;
++ struct knet_link *kn_link = NULL;
++ int link_idx, found = 0;
++
++ for (host = knet_h->host_head; host != NULL; host = host->next) {
++ for (link_idx = 0; link_idx < KNET_MAX_LINK; link_idx++) {
++ kn_link = &host->link[link_idx];
++ if (kn_link->outsock == sockfd) {
++ if (!cmpaddr(&remote, &kn_link->dst_addr)) {
++ found = 1;
++ break;
++ }
++ }
++ }
++ if (found) {
++ break;
++ }
++ }
++
++ if ((host) && (kn_link) &&
++ (kn_link->status.connected)) {
++ log_debug(knet_h, KNET_SUB_TRANSP_UDP, "Setting down host %u link %i", host->host_id, kn_link->link_id);
++ /*
++ * setting transport_connected = 0 will trigger
++ * thread_heartbeat link_down process.
++ *
++ * the process terminates calling into transport_link_down
++ * below that will set transport_connected = 1
++ */
++ kn_link->transport_connected = 0;
++ }
++
++ }
+ }
+ }
+ break;
+@@ -436,5 +476,9 @@ int udp_transport_link_dyn_connect(knet_handle_t knet_h, int sockfd, struct knet
+
+ int udp_transport_link_is_down(knet_handle_t knet_h, struct knet_link *kn_link)
+ {
++ /*
++ * see comments about handling ICMP error messages
++ */
++ kn_link->transport_connected = 1;
+ return 0;
+ }
+--
+2.30.2
+
diff --git a/debian/patches/series b/debian/patches/series
index 8b13789..16fba19 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1 +1,2 @@
-
+0001-host-fix-dst_seq_num-initialization-race-condition.patch
+0002-udp-use-ICMP-error-messages-to-trigger-faster-link-d.patch
--
2.30.2
More information about the pve-devel
mailing list