Message ID | 160765171921.6905.7897898635812579754.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | |
netdev/fixes_present | fail | Series targets non-next tree, but doesn't contain any Fixes tags |
netdev/patch_count | success | |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 1560 this patch: 1564 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 43 lines checked |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 1569 this patch: 1573 |
netdev/header_inline | success | |
netdev/stable | success | Stable not CCed |
On Fri, Dec 11, 2020 at 2:55 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> From: Alexander Duyck <alexanderduyck@fb.com>
>
> In the case of a fastopen SYN there are cases where it may trigger either
> an ICMP_TOOBIG message in the case of IPv6 or a fragmentation request in
> the case of IPv4. This results in the socket stalling for a second or
> more as it does not respond to the message by retransmitting the SYN
> frame.
>
> Normally a SYN frame should not be able to trigger an ICMP_TOOBIG or
> ICMP_FRAG_NEEDED; however, in the case of fastopen we can have a frame
> that makes use of the entire MTU. An additional complication is that the
> retransmit queue doesn't contain the original frames. As a result, when
> tcp_simple_retransmit is called and walks the list of frames in the
> queue, it may not mark the frames as lost because the SYN and the data
> packet are each individually smaller than the MSS size after the
> adjustment. This results in the socket being stalled until the retransmit
> timer kicks in and forces the SYN frame out again without the data
> attached.
>
> In order to resolve this we need to mark the SYN frame as lost if it is
> the first packet in the queue. Doing this allows the socket to recover
> much more quickly without the retransmit timeout stall.
>
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>

I do not think this is a net candidate, but net-next.

Yuchung might correct me, but I think the TCP Fastopen standard was very
conservative about the payload length in the SYN packet, so receiving an
ICMP was never considered.

> ---
>  include/net/tcp.h    | 1 +
>  net/ipv4/tcp_input.c | 8 ++++++++
>  net/ipv4/tcp_ipv4.c  | 6 ++++++
>  net/ipv6/tcp_ipv6.c  | 4 ++++
>  4 files changed, 19 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d4ef5bf94168..6181ad98727a 100644
> --- a/include/net/tcp.h
>
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -546,6 +546,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
>  		if (sk->sk_state == TCP_LISTEN)
>  			goto out;
>
> +		/* fastopen SYN may have triggered the fragmentation
> +		 * request. Mark the SYN or SYN/ACK as lost.
> +		 */
> +		if (sk->sk_state == TCP_SYN_SENT)
> +			tcp_mark_syn_lost(sk);

This is going to crash in some cases, you do not know if you own the
socket. (Look a few lines below)

> +
>  		tp->mtu_info = info;
>  		if (!sock_owned_by_user(sk)) {
>  			tcp_v4_mtu_reduced(sk);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 992cbf3eb9e3..d7b1346863e3 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -443,6 +443,10 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>  		if (!ip6_sk_accept_pmtu(sk))
>  			goto out;
>
> +	/* fastopen SYN may have triggered TOOBIG, mark it lost. */
> +	if (sk->sk_state == TCP_SYN_SENT)
> +		tcp_mark_syn_lost(sk);

Same issue here.

> +
>  	tp->mtu_info = ntohl(info);
>  	if (!sock_owned_by_user(sk))
>  		tcp_v6_mtu_reduced(sk);
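To make the review point concrete, here is a minimal sketch (not the posted
patch) of the rearrangement the comment hints at: only touch the retransmit
queue once sock_owned_by_user() says the socket is free. The deferral logic
shown is the existing tcp_v4_err() code; only the tcp_mark_syn_lost()
placement is new, and the deferred path would still need its own handling:

    /* Sketch only: move the SYN-lost marking under the ownership check.
     * The tp->mtu_info / TCP_MTU_REDUCED_DEFERRED handling is existing
     * tcp_v4_err() logic.
     */
    tp->mtu_info = info;
    if (!sock_owned_by_user(sk)) {
            if (sk->sk_state == TCP_SYN_SENT)
                    tcp_mark_syn_lost(sk);
            tcp_v4_mtu_reduced(sk);
    } else {
            /* Deferred: tcp_v4_mtu_reduced() runs later from release_sock();
             * a complete fix would need the SYN-lost marking there as well.
             */
            if (!test_and_set_bit(TCP_MTU_REDUCED_DEFERRED, &sk->sk_tsq_flags))
                    sock_hold(sk);
    }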
Hi Alexander,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net/master]
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/tcp-Mark-fastopen-SYN-packet-as-lost-when-receiving-ICMP_TOOBIG-ICMP_FRAG_NEEDED/20201211-100032
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git d9838b1d39283c1200c13f9076474c7624b8ec34
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/01abda6be2a196620ae2057c3e654edc82beb144
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexander-Duyck/tcp-Mark-fastopen-SYN-packet-as-lost-when-receiving-ICMP_TOOBIG-ICMP_FRAG_NEEDED/20201211-100032
git checkout 01abda6be2a196620ae2057c3e654edc82beb144
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "tcp_mark_syn_lost" [net/ipv6/ipv6.ko] undefined!
ERROR: modpost: "clk_set_min_rate" [sound/soc/atmel/snd-soc-mchp-spdifrx.ko] undefined!
ERROR: modpost: "__delay" [drivers/net/mdio/mdio-cavium.ko] undefined!
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
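The first modpost error is the expected fallout of calling a symbol defined
in the built-in IPv4 code from ipv6.ko (allmodconfig builds IPv6 as a
module). A likely fix, assuming the helper stays in net/ipv4/tcp_input.c,
is simply to export it (untested sketch):

    void tcp_mark_syn_lost(struct sock *sk)
    {
            struct sk_buff *skb = tcp_rtx_queue_head(sk);

            if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
                    tcp_mark_skb_lost(sk, skb);
    }
    /* Assumed fix for the ipv6.ko link error: tcp_ipv6.c can be modular. */
    EXPORT_SYMBOL(tcp_mark_syn_lost);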
On Thu, Dec 10, 2020 at 10:24 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Dec 11, 2020 at 2:55 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > [...]
>
> I do not think this is a net candidate, but net-next.
>
> Yuchung might correct me, but I think the TCP Fastopen standard was very
> conservative about the payload length in the SYN packet, so receiving an
> ICMP was never considered.

That's fine. I can target this for net-next. I had just selected net
since I had considered it a fix, but I suppose it could be considered a
behavioral change.

> > +		/* fastopen SYN may have triggered the fragmentation
> > +		 * request. Mark the SYN or SYN/ACK as lost.
> > +		 */
> > +		if (sk->sk_state == TCP_SYN_SENT)
> > +			tcp_mark_syn_lost(sk);
>
> This is going to crash in some cases, you do not know if you own the
> socket. (Look a few lines below)

Okay, I will look into moving this down into the block below, since I
assume that if it is owned by the user we cannot make these changes.

> > +	/* fastopen SYN may have triggered TOOBIG, mark it lost. */
> > +	if (sk->sk_state == TCP_SYN_SENT)
> > +		tcp_mark_syn_lost(sk);
>
> Same issue here.

I'll move this one too.
On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:

> That's fine. I can target this for net-next. I had just selected net
> since I had considered it a fix, but I suppose it could be considered
> a behavioral change.

We are very late in the 5.10 cycle, and we never handled ICMP in this
state, so net-next is definitely better.

Note that RFC 7413 states in 4.1.3 :

   The client MUST cache cookies from servers for later Fast Open
   connections.  For a multihomed client, the cookies are dependent on
   the client and server IP addresses.  Hence, the client should cache
   at most one (most recently received) cookie per client and server IP
   address pair.

   When caching cookies, we recommend that the client also cache the
   Maximum Segment Size (MSS) advertised by the server.  The client can
   cache the MSS advertised by the server in order to determine the
   maximum amount of data that the client can fit in the SYN packet in
   subsequent TFO connections.  Caching the server MSS is useful
   because, with Fast Open, a client sends data in the SYN packet before
   the server announces its MSS in the SYN-ACK packet.  If the client
   sends more data in the SYN packet than the server will accept, this
   will likely require the client to retransmit some or all of the data.
   Hence, caching the server MSS can enhance performance.

   Without a cached server MSS, the amount of data in the SYN packet is
   limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
   bytes for IPv6 [RFC2460].  Even if the client complies with this
   limit when sending the SYN, it is known that an IPv4 receiver
   advertising an MSS less than 536 bytes can receive a segment larger
   than it is expecting.

   If the cached MSS is larger than the typical size (1460 bytes for
   IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
   may cause problems that offset the performance benefit of Fast Open.
   For example, the unusually large SYN may trigger IP fragmentation and
   may confuse firewalls or middleboxes, causing SYN retransmission and
   other side effects.  Therefore, the client MAY limit the cached MSS
   to 1460 bytes for IPv4 or 1440 for IPv6.

Relying on ICMP is fragile, since they can be filtered in some way.
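The MAY-clamp in the last quoted paragraph amounts to a one-line bound; a
toy illustration follows (tfo_clamp_cached_mss is a made-up name for this
sketch, not an existing kernel helper):

    /* Toy illustration of the RFC 7413 4.1.3 advice: cap the cached MSS
     * at 1460 (IPv4) / 1440 (IPv6). Made-up helper, not kernel API.
     */
    static inline unsigned int tfo_clamp_cached_mss(unsigned int cached_mss,
                                                    bool is_ipv6)
    {
            unsigned int cap = is_ipv6 ? 1440 : 1460;

            return cached_mss < cap ? cached_mss : cap;
    }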
On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <edumazet@google.com> wrote:
>
> We are very late in the 5.10 cycle, and we never handled ICMP in this
> state, so net-next is definitely better.
>
> Note that RFC 7413 states in 4.1.3 :
>
> [... RFC 7413 section 4.1.3, quoted above ...]
>
> Relying on ICMP is fragile, since they can be filtered in some way.

In this case I am not relying on the ICMP, but thought that since I have
it I should make use of it. Without the ICMP we would still just be
waiting on the retransmit timer.

The problem case has a v6-in-v6 tunnel between the client and the
endpoint, so both ends assume an MTU of 1500 and advertise a 1440 MSS,
which works fine until they actually go to send a large packet between
the two. At that point the tunnel triggers an ICMP_TOOBIG and the
endpoint stalls: the MSS is dropped to 1400, but the SYN and the data
payload were already smaller than that, so no retransmits are triggered.
This results in TFO being 1s slower than non-TFO because of the failure
to trigger the retransmit for the frame that violated the PMTU. The
patch is meant to get the two back into comparable times.
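For reference, the numbers in that scenario line up if one assumes a
40-byte outer IPv6 header on the tunnel (an assumption, not stated in the
thread):

    physical MTU                 1500
    inner MTU after v6-in-v6     1500 - 40 = 1460   (what TOOBIG reports)
    adjusted MSS                 1460 - 40 (IPv6) - 20 (TCP) = 1400

So the full-MTU fastopen SYN overshoots the path, while the individual
retransmit-queue entries (the bare SYN and the data segment) are each
already under the new 1400-byte MSS, leaving nothing to mark lost.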
On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> In this case I am not relying on the ICMP, but thought that since I
> have it I should make use of it. Without the ICMP we would still just
> be waiting on the retransmit timer.
>
> [... tunnel scenario described above ...]
>
> This results in TFO being 1s slower than non-TFO because of the failure
> to trigger the retransmit for the frame that violated the PMTU. The
> patch is meant to get the two back into comparable times.

Okay... Have you studied why tcp_v4_mtu_reduced() (and the IPv6
equivalent) code does not yet handle the retransmit in TCP_SYN_SENT
state ?
On Fri, Dec 11, 2020 at 11:18 AM Eric Dumazet <edumazet@google.com> wrote:
>
> [... earlier discussion trimmed ...]
>
> Okay... Have you studied why tcp_v4_mtu_reduced() (and the IPv6
> equivalent) code does not yet handle the retransmit in TCP_SYN_SENT
> state ?

The problem lies in tcp_simple_retransmit(). Specifically, the loop at
the start of the function checks the retransmit queue for any packets
larger than the MSS and finds none, since we don't place the SYN w/ data
in there and instead have a separate SYN and data packet.

I'm debating if I should take an alternative approach and modify the
loop at the start of tcp_simple_retransmit() to add a check for a SYN
packet with tp->syn_data set, and then compare the next frame's length
plus MAX_TCP_HEADER_OPTIONS against the MSS.
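A rough sketch of that alternative, using the names from this thread plus
the existing MAX_TCP_OPTION_SPACE macro for the option-space slack (the
loop shape follows tcp_simple_retransmit() of this era, but the SYN branch
is hypothetical, not a posted patch):

    /* Hypothetical extension of the scan in tcp_simple_retransmit():
     * a pure SYN that carried fastopen data (tp->syn_data) is treated
     * as lost when the following data segment plus worst-case option
     * space no longer fits in the reduced MSS. tp and mss come from
     * the enclosing function.
     */
    skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
            if (tcp_skb_seglen(skb) > mss) {
                    tcp_mark_skb_lost(sk, skb);
            } else if (unlikely(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN) &&
                       tp->syn_data) {
                    struct sk_buff *next = skb_rb_next(skb);

                    if (next && tcp_skb_seglen(next) +
                                MAX_TCP_OPTION_SPACE > mss)
                            tcp_mark_skb_lost(sk, skb);
            }
    }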
On Fri, Dec 11, 2020 at 1:51 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> [... quoted thread trimmed ...]
>
> The problem lies in tcp_simple_retransmit(). Specifically, the loop at
> the start of the function checks the retransmit queue for any packets
> larger than the MSS and finds none, since we don't place the SYN w/
> data in there and instead have a separate SYN and data packet.
>
> I'm debating if I should take an alternative approach and modify the
> loop at the start of tcp_simple_retransmit() to add a check for a SYN
> packet with tp->syn_data set, and then compare the next frame's length
> plus MAX_TCP_HEADER_OPTIONS against the MSS.

Thanks for bringing up this tricky issue. The root cause seems to be the
special arrangement of storing SYN-data as one (pure) SYN and one
non-SYN data segment. Given that tcp_simple_retransmit() probably is not
called frequently, your alternative approach sounds more appealing to me.

Replacing that strange syn|data arrangement for TFO has been on my wish
list for a long time... Ideally it's better to just store the SYN+data
and carve out the SYN for retransmit.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d4ef5bf94168..6181ad98727a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2062,6 +2062,7 @@ void tcp_init(void);
 
 /* tcp_recovery.c */
 void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb);
+void tcp_mark_syn_lost(struct sock *sk);
 void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
 extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb,
 				u32 reo_wnd);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 389d1b340248..d0c5248bc4e1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1066,6 +1066,14 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
+void tcp_mark_syn_lost(struct sock *sk)
+{
+	struct sk_buff *skb = tcp_rtx_queue_head(sk);
+
+	if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
+		tcp_mark_skb_lost(sk, skb);
+}
+
 /* Updates the delivered and delivered_ce counts */
 static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
 				bool ece_ack)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8391aa29e7a4..ad62fe029646 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -546,6 +546,12 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
 		if (sk->sk_state == TCP_LISTEN)
 			goto out;
 
+		/* fastopen SYN may have triggered the fragmentation
+		 * request. Mark the SYN or SYN/ACK as lost.
+		 */
+		if (sk->sk_state == TCP_SYN_SENT)
+			tcp_mark_syn_lost(sk);
+
 		tp->mtu_info = info;
 		if (!sock_owned_by_user(sk)) {
 			tcp_v4_mtu_reduced(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 992cbf3eb9e3..d7b1346863e3 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -443,6 +443,10 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (!ip6_sk_accept_pmtu(sk))
 			goto out;
 
+		/* fastopen SYN may have triggered TOOBIG, mark it lost. */
+		if (sk->sk_state == TCP_SYN_SENT)
+			tcp_mark_syn_lost(sk);
+
 		tp->mtu_info = ntohl(info);
 		if (!sock_owned_by_user(sk))
 			tcp_v6_mtu_reduced(sk);