[pve-devel] [PATCH v2 kernel] fix #950: cherry-pick length check for tcp_mark_head_lost

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Apr 21 11:06:40 CEST 2016


---
rebased on current master
 Makefile                                           |  1 +
 ..._mark_head_lost-to-check-skb-len-before-f.patch | 77 ++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 bug-950-tcp-fix-tcp_mark_head_lost-to-check-skb-len-before-f.patch

diff --git a/Makefile b/Makefile
index 4556942..bea6f28 100644
--- a/Makefile
+++ b/Makefile
@@ -238,6 +238,7 @@ ${KERNEL_SRC}/README ${KERNEL_CFG_ORG}: ${KERNELSRCTAR}
 	#cd ${KERNEL_SRC}; patch -p1 <../vhost-net-extend-device-allocation-to-vmalloc.patch
 	cd ${KERNEL_SRC}; patch -p1 <../CVE-2016-3955-usbip-fix-potential-out-of-bound-write.patch
 	cd ${KERNEL_SRC}; patch -p1 <../CVE-2016-3951-usbnet-memory-corruption-triggered-by-invalid-USB-descriptor.patch
+	cd ${KERNEL_SRC}; patch -p1 <../bug-950-tcp-fix-tcp_mark_head_lost-to-check-skb-len-before-f.patch
 	sed -i ${KERNEL_SRC}/Makefile -e 's/^EXTRAVERSION.*$$/EXTRAVERSION=${EXTRAVERSION}/'
 	touch $@
 
diff --git a/bug-950-tcp-fix-tcp_mark_head_lost-to-check-skb-len-before-f.patch b/bug-950-tcp-fix-tcp_mark_head_lost-to-check-skb-len-before-f.patch
new file mode 100644
index 0000000..2d9bebd
--- /dev/null
+++ b/bug-950-tcp-fix-tcp_mark_head_lost-to-check-skb-len-before-f.patch
@@ -0,0 +1,77 @@
+From d88270eef4b56bd7973841dd1fed387ccfa83709 Mon Sep 17 00:00:00 2001
+From: Neal Cardwell <ncardwell at google.com>
+Date: Mon, 25 Jan 2016 14:01:53 -0800
+Subject: [PATCH] tcp: fix tcp_mark_head_lost to check skb len before
+ fragmenting
+
+This commit fixes a corner case in tcp_mark_head_lost() which was
+causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
+
+tcp_mark_head_lost() was assuming that if a packet has
+tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
+M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
+maintained, this is not always true.
+
+For example, suppose the sender sends 4 1-byte packets and have the
+last 3 packet sacked. It will merge the last 3 packets in the write
+queue into an skb with pcount = 3 and len = 3 bytes. If another
+recovery happens after a sack reneging event, tcp_mark_head_lost()
+may attempt to split the skb assuming it has more than 2*MSS bytes.
+
+This sounds very counterintuitive, but as the commit description for
+the related commit c0638c247f55 ("tcp: don't fragment SACKed skbs in
+tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
+coalesces adjacent regions of SACKed skbs, and when doing this it
+preserves the sum of their packet counts in order to reflect the
+real-world dynamics on the wire. The c0638c247f55 commit tried to
+avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
+where the non-proportionality between pcount and skb->len/mss is known
+to be possible. However, that commit did not handle the case where
+during a reneging event one of these weird SACKed skbs becomes an
+un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
+
+The fix is to simply mark the entire skb lost when this happens.
+This makes the recovery slightly more aggressive in such corner
+cases before we detect reordering. But once we detect reordering
+this code path is by-passed because FACK is disabled.
+
+Signed-off-by: Neal Cardwell <ncardwell at google.com>
+Signed-off-by: Yuchung Cheng <ycheng at google.com>
+Signed-off-by: Eric Dumazet <edumazet at google.com>
+Signed-off-by: David S. Miller <davem at davemloft.net>
+Cherry-picked-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
+---
+ net/ipv4/tcp_input.c | 10 +++++-----
+ 1 file changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
+index 0003d40..d2ad433 100644
+--- a/net/ipv4/tcp_input.c
++++ b/net/ipv4/tcp_input.c
+@@ -2164,8 +2164,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
+ {
+ 	struct tcp_sock *tp = tcp_sk(sk);
+ 	struct sk_buff *skb;
+-	int cnt, oldcnt;
+-	int err;
++	int cnt, oldcnt, lost;
+ 	unsigned int mss;
+ 	/* Use SACK to deduce losses of new sequences sent during recovery */
+ 	const u32 loss_high = tcp_is_sack(tp) ?  tp->snd_nxt : tp->high_seq;
+@@ -2205,9 +2204,10 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
+ 				break;
+ 
+ 			mss = tcp_skb_mss(skb);
+-			err = tcp_fragment(sk, skb, (packets - oldcnt) * mss,
+-					   mss, GFP_ATOMIC);
+-			if (err < 0)
++			/* If needed, chop off the prefix to mark as lost. */
++			lost = (packets - oldcnt) * mss;
++			if (lost < skb->len &&
++			    tcp_fragment(sk, skb, lost, mss, GFP_ATOMIC) < 0)
+ 				break;
+ 			cnt = packets;
+ 		}
+-- 
+2.1.4
+
-- 
2.1.4





More information about the pve-devel mailing list