Message ID | 20220721204404.388396-1-weiwan@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 4d8f24eeedc58d5f87b650ddda73c16e8ba56559 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] Revert "tcp: change pingpong threshold to 3" | expand |
On Thu, Jul 21, 2022 at 4:44 PM Wei Wang <weiwan@google.com> wrote: > > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > This to-be-reverted commit was meant to apply a stricter rule for the > stack to enter pingpong mode. However, the condition used to check for > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > jiffy based and might be too coarse, which delays the stack entering > pingpong mode. > We revert this patch so that we no longer use the above condition to > determine interactive session, and also reduce pingpong threshold to 1. > > Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > Reported-by: LemmyHuang <hlm3280@163.com> > Suggested-by: Neal Cardwell <ncardwell@google.com> > Signed-off-by: Wei Wang <weiwan@google.com> > > --- > v2: added Fixes tag Acked-by: Neal Cardwell <ncardwell@google.com> Looks great to me. Thanks, Wei! neal
On Thu, Jul 21, 2022 at 10:56 PM Neal Cardwell <ncardwell@google.com> wrote: > > On Thu, Jul 21, 2022 at 4:44 PM Wei Wang <weiwan@google.com> wrote: > > > > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > > > This to-be-reverted commit was meant to apply a stricter rule for the > > stack to enter pingpong mode. However, the condition used to check for > > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > > jiffy based and might be too coarse, which delays the stack entering > > pingpong mode. > > We revert this patch so that we no longer use the above condition to > > determine interactive session, and also reduce pingpong threshold to 1. > > > > Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > > Reported-by: LemmyHuang <hlm3280@163.com> > > Suggested-by: Neal Cardwell <ncardwell@google.com> > > Signed-off-by: Wei Wang <weiwan@google.com> > > > > --- > > v2: added Fixes tag > > Acked-by: Neal Cardwell <ncardwell@google.com> > > Looks great to me. Thanks, Wei! Reviewed-by: Eric Dumazet <edumazet@google.com>
Hello: This patch was applied to netdev/net.git (master) by Jakub Kicinski <kuba@kernel.org>: On Thu, 21 Jul 2022 20:44:04 +0000 you wrote: > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > This to-be-reverted commit was meant to apply a stricter rule for the > stack to enter pingpong mode. However, the condition used to check for > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > jiffy based and might be too coarse, which delays the stack entering > pingpong mode. > We revert this patch so that we no longer use the above condition to > determine interactive session, and also reduce pingpong threshold to 1. > > [...] Here is the summary with links: - [net,v2] Revert "tcp: change pingpong threshold to 3" https://git.kernel.org/netdev/net/c/4d8f24eeedc5 You are awesome, thank you!
On 21. 07. 22, 22:44, Wei Wang wrote: > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > This to-be-reverted commit was meant to apply a stricter rule for the > stack to enter pingpong mode. However, the condition used to check for > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > jiffy based and might be too coarse, which delays the stack entering > pingpong mode. > We revert this patch so that we no longer use the above condition to > determine interactive session, and also reduce pingpong threshold to 1. > > Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > Reported-by: LemmyHuang <hlm3280@163.com> > Suggested-by: Neal Cardwell <ncardwell@google.com> > Signed-off-by: Wei Wang <weiwan@google.com> This breaks python-eventlet [1] (and was backported to stable trees): ________________ TestHttpd.test_018b_http_10_keepalive_framing
On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > On 21. 07. 22, 22:44, Wei Wang wrote: > > This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > > > This to-be-reverted commit was meant to apply a stricter rule for the > > stack to enter pingpong mode. However, the condition used to check for > > interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > > jiffy based and might be too coarse, which delays the stack entering > > pingpong mode. > > We revert this patch so that we no longer use the above condition to > > determine interactive session, and also reduce pingpong threshold to 1. > > > > Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > > Reported-by: LemmyHuang <hlm3280@163.com> > > Suggested-by: Neal Cardwell <ncardwell@google.com> > > Signed-off-by: Wei Wang <weiwan@google.com> > > > This breaks python-eventlet [1] (and was backported to stable trees): > ________________ TestHttpd.test_018b_http_10_keepalive_framing > _________________ > > self = <tests.wsgi_test.TestHttpd > testMethod=test_018b_http_10_keepalive_framing> > > def test_018b_http_10_keepalive_framing(self): > # verify that if an http/1.0 client sends connection: keep-alive > # that we don't mangle the request framing if the app doesn't > read the request > def app(environ, start_response): > resp_body = { > '/1': b'first response', > '/2': b'second response', > '/3': b'third response', > }.get(environ['PATH_INFO']) > if resp_body is None: > resp_body = 'Unexpected path: ' + environ['PATH_INFO'] > if six.PY3: > resp_body = resp_body.encode('latin1') > # Never look at wsgi.input! > start_response('200 OK', [('Content-type', 'text/plain')]) > return [resp_body] > > self.site.application = app > sock = eventlet.connect(self.server_addr) > req_body = b'GET /tricksy HTTP/1.1\r\n' > body_len = str(len(req_body)).encode('ascii') > > sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: > localhost\r\nConnection: keep-alive\r\n' > b'Content-Length: ' + body_len + b'\r\n\r\n' + > req_body) > result1 = read_http(sock) > self.assertEqual(b'first response', result1.body) > self.assertEqual(result1.headers_original.get('Connection'), > 'keep-alive') > > sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: > localhost\r\nConnection: keep-alive\r\n' > b'Content-Length: ' + body_len + b'\r\nExpect: > 100-continue\r\n\r\n') > # Client may have a short timeout waiting on that 100 Continue > # and basically immediately send its body > sock.sendall(req_body) > result2 = read_http(sock) > self.assertEqual(b'second response', result2.body) > self.assertEqual(result2.headers_original.get('Connection'), > 'close') > > > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: > localhost\r\nConnection: close\r\n\r\n') > > tests/wsgi_test.py:648: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > eventlet/greenio/base.py:407: in sendall > tail = self.send(data, flags) > eventlet/greenio/base.py:401: in send > return self._send_loop(self.fd.send, data, flags) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > > self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> > send_method = <built-in method send of socket object at 0x7f5f2f73d520> > data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: close\r\n\r\n' > args = (0,), _timeout_exc = timeout('timed out'), eno = 32 > > def _send_loop(self, send_method, data, *args): > if self.act_non_blocking: > return send_method(data, *args) > > _timeout_exc = socket_timeout('timed out') > while True: > try: > > return send_method(data, *args) > E BrokenPipeError: [Errno 32] Broken pipe > > eventlet/greenio/base.py:388: BrokenPipeError > ==================== > > Reverting this revert on the top of 5.19 solves the issue. > > Any ideas? Interesting. This revert should return the kernel back to the delayed ACK behavior it had for many years before May 2019 and Linux 5.1, which contains the commit it is reverting: 4a41f453bedfd tcp: change pingpong threshold to 3 It sounds like perhaps this test you mention has an implicit dependence on the timing of delayed ACKs. A few questions: (1) What are the timeout values in this test? If there is some implicit or explicit timeout value less than the typical Linux TCP 40ms delayed ACK timer value then this could be the problem. If you make sure all timeouts are at least, say, 300ms then this should remove dependencies on delayed ACK behavior (and make the test more portable). (2) Does this test use the TCP_NODELAY socket option to disable Nagle's algorithm? Presumably it should, given that it's a network app that cares about latency. Omitting the TCP_NODELAY socket option can cause request/response traffic to depend on delayed ACK behavior. (3) If (1) and (2) do not fix the test, would you be able to provide binary .pcap traces of the behavior with the test (a) passing and (b) failing? For example: sudo tcpdump -i any -w /tmp/trace.pcap -s 100 port 80 & # run test killall tcpdump thanks, neal
On 06. 08. 22, 13:24, Neal Cardwell wrote: > On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: >> >> On 21. 07. 22, 22:44, Wei Wang wrote: >>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. >>> >>> This to-be-reverted commit was meant to apply a stricter rule for the >>> stack to enter pingpong mode. However, the condition used to check for >>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is >>> jiffy based and might be too coarse, which delays the stack entering >>> pingpong mode. >>> We revert this patch so that we no longer use the above condition to >>> determine interactive session, and also reduce pingpong threshold to 1. >>> >>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") >>> Reported-by: LemmyHuang <hlm3280@163.com> >>> Suggested-by: Neal Cardwell <ncardwell@google.com> >>> Signed-off-by: Wei Wang <weiwan@google.com> >> >> >> This breaks python-eventlet [1] (and was backported to stable trees): >> ________________ TestHttpd.test_018b_http_10_keepalive_framing >> _________________ >> >> self = <tests.wsgi_test.TestHttpd >> testMethod=test_018b_http_10_keepalive_framing> >> >> def test_018b_http_10_keepalive_framing(self): >> # verify that if an http/1.0 client sends connection: keep-alive >> # that we don't mangle the request framing if the app doesn't >> read the request >> def app(environ, start_response): >> resp_body = { >> '/1': b'first response', >> '/2': b'second response', >> '/3': b'third response', >> }.get(environ['PATH_INFO']) >> if resp_body is None: >> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] >> if six.PY3: >> resp_body = resp_body.encode('latin1') >> # Never look at wsgi.input! >> start_response('200 OK', [('Content-type', 'text/plain')]) >> return [resp_body] >> >> self.site.application = app >> sock = eventlet.connect(self.server_addr) >> req_body = b'GET /tricksy HTTP/1.1\r\n' >> body_len = str(len(req_body)).encode('ascii') >> >> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: >> localhost\r\nConnection: keep-alive\r\n' >> b'Content-Length: ' + body_len + b'\r\n\r\n' + >> req_body) >> result1 = read_http(sock) >> self.assertEqual(b'first response', result1.body) >> self.assertEqual(result1.headers_original.get('Connection'), >> 'keep-alive') >> >> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: >> localhost\r\nConnection: keep-alive\r\n' >> b'Content-Length: ' + body_len + b'\r\nExpect: >> 100-continue\r\n\r\n') >> # Client may have a short timeout waiting on that 100 Continue >> # and basically immediately send its body >> sock.sendall(req_body) >> result2 = read_http(sock) >> self.assertEqual(b'second response', result2.body) >> self.assertEqual(result2.headers_original.get('Connection'), >> 'close') >> >> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: >> localhost\r\nConnection: close\r\n\r\n') >> >> tests/wsgi_test.py:648: >> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >> _ _ _ _ >> eventlet/greenio/base.py:407: in sendall >> tail = self.send(data, flags) >> eventlet/greenio/base.py:401: in send >> return self._send_loop(self.fd.send, data, flags) >> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >> _ _ _ _ >> >> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> >> send_method = <built-in method send of socket object at 0x7f5f2f73d520> >> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: close\r\n\r\n' >> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 >> >> def _send_loop(self, send_method, data, *args): >> if self.act_non_blocking: >> return send_method(data, *args) >> >> _timeout_exc = socket_timeout('timed out') >> while True: >> try: >> > return send_method(data, *args) >> E BrokenPipeError: [Errno 32] Broken pipe >> >> eventlet/greenio/base.py:388: BrokenPipeError >> ==================== >> >> Reverting this revert on the top of 5.19 solves the issue. >> >> Any ideas? > > Interesting. This revert should return the kernel back to the delayed > ACK behavior it had for many years before May 2019 and Linux 5.1, > which contains the commit it is reverting: > > 4a41f453bedfd tcp: change pingpong threshold to 3 > > It sounds like perhaps this test you mention has an implicit > dependence on the timing of delayed ACKs. > > A few questions: Dunno. I am only an openSUSE kernel maintainer and this popped out at me. Feel free to dig to eventlet's sources on your own :P. > (1) What are the timeout values in this test? If there is some > implicit or explicit timeout value less than the typical Linux TCP > 40ms delayed ACK timer value then this could be the problem. If you > make sure all timeouts are at least, say, 300ms then this should > remove dependencies on delayed ACK behavior (and make the test more > portable). > > (2) Does this test use the TCP_NODELAY socket option to disable > Nagle's algorithm? Presumably it should, given that it's a network app > that cares about latency. Omitting the TCP_NODELAY socket option can > cause request/response traffic to depend on delayed ACK behavior. > > (3) If (1) and (2) do not fix the test, would you be able to provide > binary .pcap traces of the behavior with the test (a) passing and (b) > failing? For example: > sudo tcpdump -i any -w /tmp/trace.pcap -s 100 port 80 & > # run test > killall tcpdump > > thanks, > neal
On 06. 08. 22, 16:41, Jiri Slaby wrote: > On 06. 08. 22, 13:24, Neal Cardwell wrote: >> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: >>> >>> On 21. 07. 22, 22:44, Wei Wang wrote: >>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. >>>> >>>> This to-be-reverted commit was meant to apply a stricter rule for the >>>> stack to enter pingpong mode. However, the condition used to check for >>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is >>>> jiffy based and might be too coarse, which delays the stack entering >>>> pingpong mode. >>>> We revert this patch so that we no longer use the above condition to >>>> determine interactive session, and also reduce pingpong threshold to 1. >>>> >>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") >>>> Reported-by: LemmyHuang <hlm3280@163.com> >>>> Suggested-by: Neal Cardwell <ncardwell@google.com> >>>> Signed-off-by: Wei Wang <weiwan@google.com> >>> >>> >>> This breaks python-eventlet [1] (and was backported to stable trees): >>> ________________ TestHttpd.test_018b_http_10_keepalive_framing >>> _________________ >>> >>> self = <tests.wsgi_test.TestHttpd >>> testMethod=test_018b_http_10_keepalive_framing> >>> >>> def test_018b_http_10_keepalive_framing(self): >>> # verify that if an http/1.0 client sends connection: >>> keep-alive >>> # that we don't mangle the request framing if the app doesn't >>> read the request >>> def app(environ, start_response): >>> resp_body = { >>> '/1': b'first response', >>> '/2': b'second response', >>> '/3': b'third response', >>> }.get(environ['PATH_INFO']) >>> if resp_body is None: >>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] >>> if six.PY3: >>> resp_body = resp_body.encode('latin1') >>> # Never look at wsgi.input! >>> start_response('200 OK', [('Content-type', 'text/plain')]) >>> return [resp_body] >>> >>> self.site.application = app >>> sock = eventlet.connect(self.server_addr) >>> req_body = b'GET /tricksy HTTP/1.1\r\n' >>> body_len = str(len(req_body)).encode('ascii') >>> >>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: >>> localhost\r\nConnection: keep-alive\r\n' >>> b'Content-Length: ' + body_len + b'\r\n\r\n' + >>> req_body) >>> result1 = read_http(sock) >>> self.assertEqual(b'first response', result1.body) >>> self.assertEqual(result1.headers_original.get('Connection'), >>> 'keep-alive') >>> >>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: >>> localhost\r\nConnection: keep-alive\r\n' >>> b'Content-Length: ' + body_len + b'\r\nExpect: >>> 100-continue\r\n\r\n') >>> # Client may have a short timeout waiting on that 100 Continue >>> # and basically immediately send its body >>> sock.sendall(req_body) >>> result2 = read_http(sock) >>> self.assertEqual(b'second response', result2.body) >>> self.assertEqual(result2.headers_original.get('Connection'), >>> 'close') >>> >>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: >>> localhost\r\nConnection: close\r\n\r\n') >>> >>> tests/wsgi_test.py:648: >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >>> _ _ _ _ >>> eventlet/greenio/base.py:407: in sendall >>> tail = self.send(data, flags) >>> eventlet/greenio/base.py:401: in send >>> return self._send_loop(self.fd.send, data, flags) >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >>> _ _ _ _ >>> >>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> >>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> >>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: >>> close\r\n\r\n' >>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 >>> >>> def _send_loop(self, send_method, data, *args): >>> if self.act_non_blocking: >>> return send_method(data, *args) >>> >>> _timeout_exc = socket_timeout('timed out') >>> while True: >>> try: >>> > return send_method(data, *args) >>> E BrokenPipeError: [Errno 32] Broken pipe >>> >>> eventlet/greenio/base.py:388: BrokenPipeError >>> ==================== >>> >>> Reverting this revert on the top of 5.19 solves the issue. >>> >>> Any ideas? >> >> Interesting. This revert should return the kernel back to the delayed >> ACK behavior it had for many years before May 2019 and Linux 5.1, >> which contains the commit it is reverting: >> >> 4a41f453bedfd tcp: change pingpong threshold to 3 >> >> It sounds like perhaps this test you mention has an implicit >> dependence on the timing of delayed ACKs. >> >> A few questions: > > Dunno. I am only an openSUSE kernel maintainer and this popped out at > me. Feel free to dig to eventlet's sources on your own :P. Any updates on this or should I send a revert directly? The "before() &&" part of the patch makes the difference. That is this diff: --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, * and it is a reply for ato after last received packet, * increase pingpong count. */ - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u ping=%u\n", + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, + tp->lsndtime, icsk->icsk_ack.lrcvtime, + inet_csk(sk)->icsk_ack.pingpong); + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { inet_csk_inc_pingpong_cnt(sk); + pr_info("\tINC ping=%u before=%u\n", + inet_csk(sk)->icsk_ack.pingpong, + before(tp->lsndtime, icsk->icsk_ack.lrcvtime)); + } tp->lsndtime = now; } makes it work again, and outputs this: > TCP: tcp_event_data_sent: sk=00000000fd67cf8d (100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 lrcvtime=4294902140 ping=0 > TCP: tcp_event_data_sent: sk=00000000a4becf82 (100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 lrcvtime=4294902142 ping=0 > TCP: INC ping=1 before=1 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d (100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 lrcvtime=4294902144 ping=0 > TCP: INC ping=1 before=1 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d (100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 lrcvtime=4294902144 ping=1 > TCP: INC ping=2 before=0 IMO, this "before=0" is the "source" of the problem. But I have no idea what this means at all... > TCP: tcp_event_data_sent: sk=00000000a4becf82 (100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 lrcvtime=4294902148 ping=1 > TCP: INC ping=2 before=1 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d (100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 lrcvtime=4294902150 ping=3 > TCP: INC ping=4 before=1 > TCP: tcp_event_data_sent: sk=00000000c7a417e9 (100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 lrcvtime=4294902153 ping=0 > TCP: tcp_event_data_sent: sk=000000008681183e (100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 lrcvtime=4294902154 ping=0 > TCP: INC ping=1 before=1 >> (1) What are the timeout values in this test? If there is some >> implicit or explicit timeout value less than the typical Linux TCP >> 40ms delayed ACK timer value then this could be the problem. If you >> make sure all timeouts are at least, say, 300ms then this should >> remove dependencies on delayed ACK behavior (and make the test more >> portable). >> >> (2) Does this test use the TCP_NODELAY socket option to disable >> Nagle's algorithm? Presumably it should, given that it's a network app >> that cares about latency. Omitting the TCP_NODELAY socket option can >> cause request/response traffic to depend on delayed ACK behavior. >> >> (3) If (1) and (2) do not fix the test, would you be able to provide >> binary .pcap traces of the behavior with the test (a) passing and (b) >> failing? For example: >> sudo tcpdump -i any -w /tmp/trace.pcap -s 100 port 80 & >> # run test >> killall tcpdump >> >> thanks, >> neal > > thanks,
On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > On 06. 08. 22, 16:41, Jiri Slaby wrote: > > On 06. 08. 22, 13:24, Neal Cardwell wrote: > >> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: > >>> > >>> On 21. 07. 22, 22:44, Wei Wang wrote: > >>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > >>>> > >>>> This to-be-reverted commit was meant to apply a stricter rule for the > >>>> stack to enter pingpong mode. However, the condition used to check for > >>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > >>>> jiffy based and might be too coarse, which delays the stack entering > >>>> pingpong mode. > >>>> We revert this patch so that we no longer use the above condition to > >>>> determine interactive session, and also reduce pingpong threshold to 1. > >>>> > >>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > >>>> Reported-by: LemmyHuang <hlm3280@163.com> > >>>> Suggested-by: Neal Cardwell <ncardwell@google.com> > >>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>> > >>> > >>> This breaks python-eventlet [1] (and was backported to stable trees): > >>> ________________ TestHttpd.test_018b_http_10_keepalive_framing > >>> _________________ > >>> > >>> self = <tests.wsgi_test.TestHttpd > >>> testMethod=test_018b_http_10_keepalive_framing> > >>> > >>> def test_018b_http_10_keepalive_framing(self): > >>> # verify that if an http/1.0 client sends connection: > >>> keep-alive > >>> # that we don't mangle the request framing if the app doesn't > >>> read the request > >>> def app(environ, start_response): > >>> resp_body = { > >>> '/1': b'first response', > >>> '/2': b'second response', > >>> '/3': b'third response', > >>> }.get(environ['PATH_INFO']) > >>> if resp_body is None: > >>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] > >>> if six.PY3: > >>> resp_body = resp_body.encode('latin1') > >>> # Never look at wsgi.input! > >>> start_response('200 OK', [('Content-type', 'text/plain')]) > >>> return [resp_body] > >>> > >>> self.site.application = app > >>> sock = eventlet.connect(self.server_addr) > >>> req_body = b'GET /tricksy HTTP/1.1\r\n' > >>> body_len = str(len(req_body)).encode('ascii') > >>> > >>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: > >>> localhost\r\nConnection: keep-alive\r\n' > >>> b'Content-Length: ' + body_len + b'\r\n\r\n' + > >>> req_body) > >>> result1 = read_http(sock) > >>> self.assertEqual(b'first response', result1.body) > >>> self.assertEqual(result1.headers_original.get('Connection'), > >>> 'keep-alive') > >>> > >>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: > >>> localhost\r\nConnection: keep-alive\r\n' > >>> b'Content-Length: ' + body_len + b'\r\nExpect: > >>> 100-continue\r\n\r\n') > >>> # Client may have a short timeout waiting on that 100 Continue > >>> # and basically immediately send its body > >>> sock.sendall(req_body) > >>> result2 = read_http(sock) > >>> self.assertEqual(b'second response', result2.body) > >>> self.assertEqual(result2.headers_original.get('Connection'), > >>> 'close') > >>> > >>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: > >>> localhost\r\nConnection: close\r\n\r\n') > >>> > >>> tests/wsgi_test.py:648: > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > >>> _ _ _ _ > >>> eventlet/greenio/base.py:407: in sendall > >>> tail = self.send(data, flags) > >>> eventlet/greenio/base.py:401: in send > >>> return self._send_loop(self.fd.send, data, flags) > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > >>> _ _ _ _ > >>> > >>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> > >>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> > >>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: > >>> close\r\n\r\n' > >>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 > >>> > >>> def _send_loop(self, send_method, data, *args): > >>> if self.act_non_blocking: > >>> return send_method(data, *args) > >>> > >>> _timeout_exc = socket_timeout('timed out') > >>> while True: > >>> try: > >>> > return send_method(data, *args) > >>> E BrokenPipeError: [Errno 32] Broken pipe > >>> > >>> eventlet/greenio/base.py:388: BrokenPipeError > >>> ==================== > >>> > >>> Reverting this revert on the top of 5.19 solves the issue. > >>> > >>> Any ideas? > >> > >> Interesting. This revert should return the kernel back to the delayed > >> ACK behavior it had for many years before May 2019 and Linux 5.1, > >> which contains the commit it is reverting: > >> > >> 4a41f453bedfd tcp: change pingpong threshold to 3 > >> > >> It sounds like perhaps this test you mention has an implicit > >> dependence on the timing of delayed ACKs. > >> > >> A few questions: > > > > Dunno. I am only an openSUSE kernel maintainer and this popped out at > > me. Feel free to dig to eventlet's sources on your own :P. > > Any updates on this or should I send a revert directly? > > The "before() &&" part of the patch makes the difference. That is this diff: > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, > * and it is a reply for ato after last received packet, > * increase pingpong count. > */ > - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) > + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u > ping=%u\n", > + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, > + tp->lsndtime, icsk->icsk_ack.lrcvtime, > + inet_csk(sk)->icsk_ack.pingpong); > + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { > inet_csk_inc_pingpong_cnt(sk); > + pr_info("\tINC ping=%u before=%u\n", > + inet_csk(sk)->icsk_ack.pingpong, > + before(tp->lsndtime, > icsk->icsk_ack.lrcvtime)); > + } > > tp->lsndtime = now; > } > > makes it work again, and outputs this: > > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > (100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 > lrcvtime=4294902140 ping=0 > > TCP: tcp_event_data_sent: sk=00000000a4becf82 > (100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 > lrcvtime=4294902142 ping=0 > > TCP: INC ping=1 before=1 > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > (100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 > lrcvtime=4294902144 ping=0 > > TCP: INC ping=1 before=1 > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > (100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 > lrcvtime=4294902144 ping=1 > > TCP: INC ping=2 before=0 > > IMO, this "before=0" is the "source" of the problem. But I have no idea > what this means at all... > > > TCP: tcp_event_data_sent: sk=00000000a4becf82 > (100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 > lrcvtime=4294902148 ping=1 > > TCP: INC ping=2 before=1 > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > (100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 > lrcvtime=4294902150 ping=3 > > TCP: INC ping=4 before=1 > > TCP: tcp_event_data_sent: sk=00000000c7a417e9 > (100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 > lrcvtime=4294902153 ping=0 > > TCP: tcp_event_data_sent: sk=000000008681183e > (100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 > lrcvtime=4294902154 ping=0 > > TCP: INC ping=1 before=1 It sounds like this test has a very specific dependence on the buggy delayed ACK timing behavior from the buggy commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. IMHO I don't think we can revert a kernel bug fix based on a test that decided to depend on the exact timing of delayed ACKs during a time when that delayed ACK behavior was buggy. :-) best regards, neal
On Mon, Aug 15, 2022 at 6:30 AM Neal Cardwell <ncardwell@google.com> wrote: > > On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > > > On 06. 08. 22, 16:41, Jiri Slaby wrote: > > > On 06. 08. 22, 13:24, Neal Cardwell wrote: > > >> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > >>> > > >>> On 21. 07. 22, 22:44, Wei Wang wrote: > > >>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > >>>> > > >>>> This to-be-reverted commit was meant to apply a stricter rule for the > > >>>> stack to enter pingpong mode. However, the condition used to check for > > >>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > > >>>> jiffy based and might be too coarse, which delays the stack entering > > >>>> pingpong mode. > > >>>> We revert this patch so that we no longer use the above condition to > > >>>> determine interactive session, and also reduce pingpong threshold to 1. > > >>>> > > >>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > > >>>> Reported-by: LemmyHuang <hlm3280@163.com> > > >>>> Suggested-by: Neal Cardwell <ncardwell@google.com> > > >>>> Signed-off-by: Wei Wang <weiwan@google.com> > > >>> > > >>> > > >>> This breaks python-eventlet [1] (and was backported to stable trees): > > >>> ________________ TestHttpd.test_018b_http_10_keepalive_framing > > >>> _________________ > > >>> > > >>> self = <tests.wsgi_test.TestHttpd > > >>> testMethod=test_018b_http_10_keepalive_framing> > > >>> > > >>> def test_018b_http_10_keepalive_framing(self): > > >>> # verify that if an http/1.0 client sends connection: > > >>> keep-alive > > >>> # that we don't mangle the request framing if the app doesn't > > >>> read the request > > >>> def app(environ, start_response): > > >>> resp_body = { > > >>> '/1': b'first response', > > >>> '/2': b'second response', > > >>> '/3': b'third response', > > >>> }.get(environ['PATH_INFO']) > > >>> if resp_body is None: > > >>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] > > >>> if six.PY3: > > >>> resp_body = resp_body.encode('latin1') > > >>> # Never look at wsgi.input! > > >>> start_response('200 OK', [('Content-type', 'text/plain')]) > > >>> return [resp_body] > > >>> > > >>> self.site.application = app > > >>> sock = eventlet.connect(self.server_addr) > > >>> req_body = b'GET /tricksy HTTP/1.1\r\n' > > >>> body_len = str(len(req_body)).encode('ascii') > > >>> > > >>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: > > >>> localhost\r\nConnection: keep-alive\r\n' > > >>> b'Content-Length: ' + body_len + b'\r\n\r\n' + > > >>> req_body) > > >>> result1 = read_http(sock) > > >>> self.assertEqual(b'first response', result1.body) > > >>> self.assertEqual(result1.headers_original.get('Connection'), > > >>> 'keep-alive') > > >>> > > >>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: > > >>> localhost\r\nConnection: keep-alive\r\n' > > >>> b'Content-Length: ' + body_len + b'\r\nExpect: > > >>> 100-continue\r\n\r\n') > > >>> # Client may have a short timeout waiting on that 100 Continue > > >>> # and basically immediately send its body > > >>> sock.sendall(req_body) > > >>> result2 = read_http(sock) > > >>> self.assertEqual(b'second response', result2.body) > > >>> self.assertEqual(result2.headers_original.get('Connection'), > > >>> 'close') > > >>> > > >>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: > > >>> localhost\r\nConnection: close\r\n\r\n') > > >>> > > >>> tests/wsgi_test.py:648: > > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > >>> _ _ _ _ > > >>> eventlet/greenio/base.py:407: in sendall > > >>> tail = self.send(data, flags) > > >>> eventlet/greenio/base.py:401: in send > > >>> return self._send_loop(self.fd.send, data, flags) > > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > >>> _ _ _ _ > > >>> > > >>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> > > >>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> > > >>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: > > >>> close\r\n\r\n' > > >>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 > > >>> > > >>> def _send_loop(self, send_method, data, *args): > > >>> if self.act_non_blocking: > > >>> return send_method(data, *args) > > >>> > > >>> _timeout_exc = socket_timeout('timed out') > > >>> while True: > > >>> try: > > >>> > return send_method(data, *args) > > >>> E BrokenPipeError: [Errno 32] Broken pipe > > >>> > > >>> eventlet/greenio/base.py:388: BrokenPipeError > > >>> ==================== > > >>> > > >>> Reverting this revert on the top of 5.19 solves the issue. > > >>> > > >>> Any ideas? > > >> > > >> Interesting. This revert should return the kernel back to the delayed > > >> ACK behavior it had for many years before May 2019 and Linux 5.1, > > >> which contains the commit it is reverting: > > >> > > >> 4a41f453bedfd tcp: change pingpong threshold to 3 > > >> > > >> It sounds like perhaps this test you mention has an implicit > > >> dependence on the timing of delayed ACKs. > > >> > > >> A few questions: > > > > > > Dunno. I am only an openSUSE kernel maintainer and this popped out at > > > me. Feel free to dig to eventlet's sources on your own :P. > > > > Any updates on this or should I send a revert directly? > > > > The "before() &&" part of the patch makes the difference. That is this diff: > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, > > * and it is a reply for ato after last received packet, > > * increase pingpong count. > > */ > > - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > > - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) > > + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u > > ping=%u\n", > > + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, > > + tp->lsndtime, icsk->icsk_ack.lrcvtime, > > + inet_csk(sk)->icsk_ack.pingpong); > > + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > > + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { > > inet_csk_inc_pingpong_cnt(sk); > > + pr_info("\tINC ping=%u before=%u\n", > > + inet_csk(sk)->icsk_ack.pingpong, > > + before(tp->lsndtime, > > icsk->icsk_ack.lrcvtime)); > > + } > > > > tp->lsndtime = now; > > } > > > > makes it work again, and outputs this: Is the above patch made on top of my reverted patch? It seems not according to this part of diff. Then what is the definition of TCP_PINGPONG_THRESH in the working case? I think that is the key, regardless of the result of: before(tp->lsndtime, icsk->icsk_ack.lrcvtime) I tried to look into what exactly the test is doing, and can't tell why it is failing. I don't see any check that is based on the timing of the reply. :( I hope someone could explain more about what this test is doing. > > > > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > > (100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 > > lrcvtime=4294902140 ping=0 > > > TCP: tcp_event_data_sent: sk=00000000a4becf82 > > (100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 > > lrcvtime=4294902142 ping=0 > > > TCP: INC ping=1 before=1 > > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > > (100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 > > lrcvtime=4294902144 ping=0 > > > TCP: INC ping=1 before=1 > > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > > (100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 > > lrcvtime=4294902144 ping=1 > > > TCP: INC ping=2 before=0 > > > > IMO, this "before=0" is the "source" of the problem. But I have no idea > > what this means at all... > > > > > TCP: tcp_event_data_sent: sk=00000000a4becf82 > > (100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 > > lrcvtime=4294902148 ping=1 > > > TCP: INC ping=2 before=1 > > > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > > (100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 > > lrcvtime=4294902150 ping=3 > > > TCP: INC ping=4 before=1 > > > TCP: tcp_event_data_sent: sk=00000000c7a417e9 > > (100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 > > lrcvtime=4294902153 ping=0 > > > TCP: tcp_event_data_sent: sk=000000008681183e > > (100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 > > lrcvtime=4294902154 ping=0 > > > TCP: INC ping=1 before=1 > > It sounds like this test has a very specific dependence on the buggy > delayed ACK timing behavior from the buggy commit > 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > IMHO I don't think we can revert a kernel bug fix based on a test that > decided to depend on the exact timing of delayed ACKs during a time > when that delayed ACK behavior was buggy. :-) > > best regards, > neal
= On Mon, Aug 15, 2022 at 2:54 PM Wei Wang <weiwan@google.com> wrote: > > On Mon, Aug 15, 2022 at 6:30 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > > > > > On 06. 08. 22, 16:41, Jiri Slaby wrote: > > > > On 06. 08. 22, 13:24, Neal Cardwell wrote: > > > >> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > > >>> > > > >>> On 21. 07. 22, 22:44, Wei Wang wrote: > > > >>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > > >>>> > > > >>>> This to-be-reverted commit was meant to apply a stricter rule for the > > > >>>> stack to enter pingpong mode. However, the condition used to check for > > > >>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > > > >>>> jiffy based and might be too coarse, which delays the stack entering > > > >>>> pingpong mode. > > > >>>> We revert this patch so that we no longer use the above condition to > > > >>>> determine interactive session, and also reduce pingpong threshold to 1. > > > >>>> > > > >>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > > > >>>> Reported-by: LemmyHuang <hlm3280@163.com> > > > >>>> Suggested-by: Neal Cardwell <ncardwell@google.com> > > > >>>> Signed-off-by: Wei Wang <weiwan@google.com> > > > >>> > > > >>> > > > >>> This breaks python-eventlet [1] (and was backported to stable trees): > > > >>> ________________ TestHttpd.test_018b_http_10_keepalive_framing > > > >>> _________________ > > > >>> > > > >>> self = <tests.wsgi_test.TestHttpd > > > >>> testMethod=test_018b_http_10_keepalive_framing> > > > >>> > > > >>> def test_018b_http_10_keepalive_framing(self): > > > >>> # verify that if an http/1.0 client sends connection: > > > >>> keep-alive > > > >>> # that we don't mangle the request framing if the app doesn't > > > >>> read the request > > > >>> def app(environ, start_response): > > > >>> resp_body = { > > > >>> '/1': b'first response', > > > >>> '/2': b'second response', > > > >>> '/3': b'third response', > > > >>> }.get(environ['PATH_INFO']) > > > >>> if resp_body is None: > > > >>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] > > > >>> if six.PY3: > > > >>> resp_body = resp_body.encode('latin1') > > > >>> # Never look at wsgi.input! > > > >>> start_response('200 OK', [('Content-type', 'text/plain')]) > > > >>> return [resp_body] > > > >>> > > > >>> self.site.application = app > > > >>> sock = eventlet.connect(self.server_addr) > > > >>> req_body = b'GET /tricksy HTTP/1.1\r\n' > > > >>> body_len = str(len(req_body)).encode('ascii') > > > >>> > > > >>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: > > > >>> localhost\r\nConnection: keep-alive\r\n' > > > >>> b'Content-Length: ' + body_len + b'\r\n\r\n' + > > > >>> req_body) > > > >>> result1 = read_http(sock) > > > >>> self.assertEqual(b'first response', result1.body) > > > >>> self.assertEqual(result1.headers_original.get('Connection'), > > > >>> 'keep-alive') > > > >>> > > > >>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: > > > >>> localhost\r\nConnection: keep-alive\r\n' > > > >>> b'Content-Length: ' + body_len + b'\r\nExpect: > > > >>> 100-continue\r\n\r\n') > > > >>> # Client may have a short timeout waiting on that 100 Continue > > > >>> # and basically immediately send its body > > > >>> sock.sendall(req_body) > > > >>> result2 = read_http(sock) > > > >>> self.assertEqual(b'second response', result2.body) > > > >>> self.assertEqual(result2.headers_original.get('Connection'), > > > >>> 'close') > > > >>> > > > >>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: > > > >>> localhost\r\nConnection: close\r\n\r\n') > > > >>> > > > >>> tests/wsgi_test.py:648: > > > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > > >>> _ _ _ _ > > > >>> eventlet/greenio/base.py:407: in sendall > > > >>> tail = self.send(data, flags) > > > >>> eventlet/greenio/base.py:401: in send > > > >>> return self._send_loop(self.fd.send, data, flags) > > > >>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > > >>> _ _ _ _ > > > >>> > > > >>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> > > > >>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> > > > >>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: > > > >>> close\r\n\r\n' > > > >>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 > > > >>> > > > >>> def _send_loop(self, send_method, data, *args): > > > >>> if self.act_non_blocking: > > > >>> return send_method(data, *args) > > > >>> > > > >>> _timeout_exc = socket_timeout('timed out') > > > >>> while True: > > > >>> try: > > > >>> > return send_method(data, *args) > > > >>> E BrokenPipeError: [Errno 32] Broken pipe > > > >>> > > > >>> eventlet/greenio/base.py:388: BrokenPipeError > > > >>> ==================== > > > >>> > > > >>> Reverting this revert on the top of 5.19 solves the issue. > > > >>> > > > >>> Any ideas? > > > >> > > > >> Interesting. This revert should return the kernel back to the delayed > > > >> ACK behavior it had for many years before May 2019 and Linux 5.1, > > > >> which contains the commit it is reverting: > > > >> > > > >> 4a41f453bedfd tcp: change pingpong threshold to 3 > > > >> > > > >> It sounds like perhaps this test you mention has an implicit > > > >> dependence on the timing of delayed ACKs. > > > >> > > > >> A few questions: > > > > > > > > Dunno. I am only an openSUSE kernel maintainer and this popped out at > > > > me. Feel free to dig to eventlet's sources on your own :P. > > > > > > Any updates on this or should I send a revert directly? > > > > > > The "before() &&" part of the patch makes the difference. That is this diff: > > > --- a/net/ipv4/tcp_output.c > > > +++ b/net/ipv4/tcp_output.c > > > @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, > > > * and it is a reply for ato after last received packet, > > > * increase pingpong count. > > > */ > > > - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > > > - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) > > > + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u > > > ping=%u\n", > > > + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, > > > + tp->lsndtime, icsk->icsk_ack.lrcvtime, > > > + inet_csk(sk)->icsk_ack.pingpong); > > > + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > > > + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { > > > inet_csk_inc_pingpong_cnt(sk); > > > + pr_info("\tINC ping=%u before=%u\n", > > > + inet_csk(sk)->icsk_ack.pingpong, > > > + before(tp->lsndtime, > > > icsk->icsk_ack.lrcvtime)); > > > + } > > > > > > tp->lsndtime = now; > > > } > > > > > > makes it work again, and outputs this: > > Is the above patch made on top of my reverted patch? It seems not > according to this part of diff. > Then what is the definition of TCP_PINGPONG_THRESH in the working > case? I think that is the key, regardless of the result of: > before(tp->lsndtime, icsk->icsk_ack.lrcvtime) > > I tried to look into what exactly the test is doing, and can't tell > why it is failing. I don't see any check that is based on the timing > of the reply. :( > I hope someone could explain more about what this test is doing. Yes, the test case is a bit hard to read. I have a conjecture about what might be going wrong: the eventlet client code connect() method in https://github.com/eventlet/eventlet/blob/master/eventlet/green/http/client.py uses: self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) to disable Nagle's algorithm. However, I don't see anything on the server side in https://github.com/eventlet/eventlet/blob/master/eventlet/green/http/server.py disabling Nagle's algorithm. AFAICT the server code is using TCPServer, so from skimming https://github.com/python/cpython/blob/3.10/Lib/socketserver.py ...it seems that the code should probably be setting disable_nagle_algorithm to True to disable Nagle's algorithm (disable_nagle_algorithm defaults to False). If the server code is indeed leaving Nagle's algorithm enabled (the TCP default), as I suspect here, that's a classic userspace/application bug, which means the server performance can be stalled by waiting for delayed ACKs from the remote side, which means changes in delayed ACK timing behavior (such as from this kernel patch) can cause the test to fail. The appropriate fix is probably to have the eventlet server code use disable_nagle_algorithm=True. Just a conjecture. :-) cheers, neal
Cc eventlet guys + Linus. On 15. 08. 22, 15:30, Neal Cardwell wrote: > On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: >> >> On 06. 08. 22, 16:41, Jiri Slaby wrote: >>> On 06. 08. 22, 13:24, Neal Cardwell wrote: >>>> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: >>>>> >>>>> On 21. 07. 22, 22:44, Wei Wang wrote: >>>>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. >>>>>> >>>>>> This to-be-reverted commit was meant to apply a stricter rule for the >>>>>> stack to enter pingpong mode. However, the condition used to check for >>>>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is >>>>>> jiffy based and might be too coarse, which delays the stack entering >>>>>> pingpong mode. >>>>>> We revert this patch so that we no longer use the above condition to >>>>>> determine interactive session, and also reduce pingpong threshold to 1. >>>>>> >>>>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") >>>>>> Reported-by: LemmyHuang <hlm3280@163.com> >>>>>> Suggested-by: Neal Cardwell <ncardwell@google.com> >>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>> >>>>> >>>>> This breaks python-eventlet [1] (and was backported to stable trees): >>>>> ________________ TestHttpd.test_018b_http_10_keepalive_framing >>>>> _________________ >>>>> >>>>> self = <tests.wsgi_test.TestHttpd >>>>> testMethod=test_018b_http_10_keepalive_framing> >>>>> >>>>> def test_018b_http_10_keepalive_framing(self): >>>>> # verify that if an http/1.0 client sends connection: >>>>> keep-alive >>>>> # that we don't mangle the request framing if the app doesn't >>>>> read the request >>>>> def app(environ, start_response): >>>>> resp_body = { >>>>> '/1': b'first response', >>>>> '/2': b'second response', >>>>> '/3': b'third response', >>>>> }.get(environ['PATH_INFO']) >>>>> if resp_body is None: >>>>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] >>>>> if six.PY3: >>>>> resp_body = resp_body.encode('latin1') >>>>> # Never look at wsgi.input! >>>>> start_response('200 OK', [('Content-type', 'text/plain')]) >>>>> return [resp_body] >>>>> >>>>> self.site.application = app >>>>> sock = eventlet.connect(self.server_addr) >>>>> req_body = b'GET /tricksy HTTP/1.1\r\n' >>>>> body_len = str(len(req_body)).encode('ascii') >>>>> >>>>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: >>>>> localhost\r\nConnection: keep-alive\r\n' >>>>> b'Content-Length: ' + body_len + b'\r\n\r\n' + >>>>> req_body) >>>>> result1 = read_http(sock) >>>>> self.assertEqual(b'first response', result1.body) >>>>> self.assertEqual(result1.headers_original.get('Connection'), >>>>> 'keep-alive') >>>>> >>>>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: >>>>> localhost\r\nConnection: keep-alive\r\n' >>>>> b'Content-Length: ' + body_len + b'\r\nExpect: >>>>> 100-continue\r\n\r\n') >>>>> # Client may have a short timeout waiting on that 100 Continue >>>>> # and basically immediately send its body >>>>> sock.sendall(req_body) >>>>> result2 = read_http(sock) >>>>> self.assertEqual(b'second response', result2.body) >>>>> self.assertEqual(result2.headers_original.get('Connection'), >>>>> 'close') >>>>> >>>>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: >>>>> localhost\r\nConnection: close\r\n\r\n') >>>>> >>>>> tests/wsgi_test.py:648: >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >>>>> _ _ _ _ >>>>> eventlet/greenio/base.py:407: in sendall >>>>> tail = self.send(data, flags) >>>>> eventlet/greenio/base.py:401: in send >>>>> return self._send_loop(self.fd.send, data, flags) >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >>>>> _ _ _ _ >>>>> >>>>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> >>>>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> >>>>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: >>>>> close\r\n\r\n' >>>>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 >>>>> >>>>> def _send_loop(self, send_method, data, *args): >>>>> if self.act_non_blocking: >>>>> return send_method(data, *args) >>>>> >>>>> _timeout_exc = socket_timeout('timed out') >>>>> while True: >>>>> try: >>>>> > return send_method(data, *args) >>>>> E BrokenPipeError: [Errno 32] Broken pipe >>>>> >>>>> eventlet/greenio/base.py:388: BrokenPipeError >>>>> ==================== >>>>> >>>>> Reverting this revert on the top of 5.19 solves the issue. >>>>> >>>>> Any ideas? >>>> >>>> Interesting. This revert should return the kernel back to the delayed >>>> ACK behavior it had for many years before May 2019 and Linux 5.1, >>>> which contains the commit it is reverting: >>>> >>>> 4a41f453bedfd tcp: change pingpong threshold to 3 >>>> >>>> It sounds like perhaps this test you mention has an implicit >>>> dependence on the timing of delayed ACKs. >>>> >>>> A few questions: >>> >>> Dunno. I am only an openSUSE kernel maintainer and this popped out at >>> me. Feel free to dig to eventlet's sources on your own :P. >> >> Any updates on this or should I send a revert directly? >> >> The "before() &&" part of the patch makes the difference. That is this diff: >> --- a/net/ipv4/tcp_output.c >> +++ b/net/ipv4/tcp_output.c >> @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, >> * and it is a reply for ato after last received packet, >> * increase pingpong count. >> */ >> - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && >> - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) >> + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u >> ping=%u\n", >> + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, >> + tp->lsndtime, icsk->icsk_ack.lrcvtime, >> + inet_csk(sk)->icsk_ack.pingpong); >> + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && >> + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { >> inet_csk_inc_pingpong_cnt(sk); >> + pr_info("\tINC ping=%u before=%u\n", >> + inet_csk(sk)->icsk_ack.pingpong, >> + before(tp->lsndtime, >> icsk->icsk_ack.lrcvtime)); >> + } >> >> tp->lsndtime = now; >> } >> >> makes it work again, and outputs this: >> >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d >> (100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 >> lrcvtime=4294902140 ping=0 >> > TCP: tcp_event_data_sent: sk=00000000a4becf82 >> (100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 >> lrcvtime=4294902142 ping=0 >> > TCP: INC ping=1 before=1 >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d >> (100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 >> lrcvtime=4294902144 ping=0 >> > TCP: INC ping=1 before=1 >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d >> (100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 >> lrcvtime=4294902144 ping=1 >> > TCP: INC ping=2 before=0 >> >> IMO, this "before=0" is the "source" of the problem. But I have no idea >> what this means at all... >> >> > TCP: tcp_event_data_sent: sk=00000000a4becf82 >> (100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 >> lrcvtime=4294902148 ping=1 >> > TCP: INC ping=2 before=1 >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d >> (100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 >> lrcvtime=4294902150 ping=3 >> > TCP: INC ping=4 before=1 >> > TCP: tcp_event_data_sent: sk=00000000c7a417e9 >> (100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 >> lrcvtime=4294902153 ping=0 >> > TCP: tcp_event_data_sent: sk=000000008681183e >> (100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 >> lrcvtime=4294902154 ping=0 >> > TCP: INC ping=1 before=1 > > It sounds like this test has a very specific dependence on the buggy > delayed ACK timing behavior from the buggy commit > 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > IMHO I don't think we can revert a kernel bug fix based on a test that > decided to depend on the exact timing of delayed ACKs during a time > when that delayed ACK behavior was buggy. :-) Unfortunately despite the test is likely bogus (I am unable to say it is or not), it does happen and the patch (the revert -- 4d8f24eeedc) breaks userspace. I'd say this is exactly the case where we apply "we do not break userspace". But I might be wrong as we might not care about silly tests. In anyway, openSUSE has to have the patch (the revert) reverted, so that the distro actually builds/works. (Until this is fixed on the eventlet side at least. And more importantly _until_ it propagates to distros or is fixed otherwise (like disabling the test).) And I suppose other distros would have to do the same. That is quite unfortunate :/. thanks,
On Tue, Aug 16, 2022 at 1:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: > > Cc eventlet guys + Linus. > > On 15. 08. 22, 15:30, Neal Cardwell wrote: > > On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby <jirislaby@kernel.org> wrote: > >> > >> On 06. 08. 22, 16:41, Jiri Slaby wrote: > >>> On 06. 08. 22, 13:24, Neal Cardwell wrote: > >>>> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote: > >>>>> > >>>>> On 21. 07. 22, 22:44, Wei Wang wrote: > >>>>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > >>>>>> > >>>>>> This to-be-reverted commit was meant to apply a stricter rule for the > >>>>>> stack to enter pingpong mode. However, the condition used to check for > >>>>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is > >>>>>> jiffy based and might be too coarse, which delays the stack entering > >>>>>> pingpong mode. > >>>>>> We revert this patch so that we no longer use the above condition to > >>>>>> determine interactive session, and also reduce pingpong threshold to 1. > >>>>>> > >>>>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") > >>>>>> Reported-by: LemmyHuang <hlm3280@163.com> > >>>>>> Suggested-by: Neal Cardwell <ncardwell@google.com> > >>>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>> > >>>>> > >>>>> This breaks python-eventlet [1] (and was backported to stable trees): > >>>>> ________________ TestHttpd.test_018b_http_10_keepalive_framing > >>>>> _________________ > >>>>> > >>>>> self = <tests.wsgi_test.TestHttpd > >>>>> testMethod=test_018b_http_10_keepalive_framing> > >>>>> > >>>>> def test_018b_http_10_keepalive_framing(self): > >>>>> # verify that if an http/1.0 client sends connection: > >>>>> keep-alive > >>>>> # that we don't mangle the request framing if the app doesn't > >>>>> read the request > >>>>> def app(environ, start_response): > >>>>> resp_body = { > >>>>> '/1': b'first response', > >>>>> '/2': b'second response', > >>>>> '/3': b'third response', > >>>>> }.get(environ['PATH_INFO']) > >>>>> if resp_body is None: > >>>>> resp_body = 'Unexpected path: ' + environ['PATH_INFO'] > >>>>> if six.PY3: > >>>>> resp_body = resp_body.encode('latin1') > >>>>> # Never look at wsgi.input! > >>>>> start_response('200 OK', [('Content-type', 'text/plain')]) > >>>>> return [resp_body] > >>>>> > >>>>> self.site.application = app > >>>>> sock = eventlet.connect(self.server_addr) > >>>>> req_body = b'GET /tricksy HTTP/1.1\r\n' > >>>>> body_len = str(len(req_body)).encode('ascii') > >>>>> > >>>>> sock.sendall(b'PUT /1 HTTP/1.0\r\nHost: > >>>>> localhost\r\nConnection: keep-alive\r\n' > >>>>> b'Content-Length: ' + body_len + b'\r\n\r\n' + > >>>>> req_body) > >>>>> result1 = read_http(sock) > >>>>> self.assertEqual(b'first response', result1.body) > >>>>> self.assertEqual(result1.headers_original.get('Connection'), > >>>>> 'keep-alive') > >>>>> > >>>>> sock.sendall(b'PUT /2 HTTP/1.0\r\nHost: > >>>>> localhost\r\nConnection: keep-alive\r\n' > >>>>> b'Content-Length: ' + body_len + b'\r\nExpect: > >>>>> 100-continue\r\n\r\n') > >>>>> # Client may have a short timeout waiting on that 100 Continue > >>>>> # and basically immediately send its body > >>>>> sock.sendall(req_body) > >>>>> result2 = read_http(sock) > >>>>> self.assertEqual(b'second response', result2.body) > >>>>> self.assertEqual(result2.headers_original.get('Connection'), > >>>>> 'close') > >>>>> > >>>>> > sock.sendall(b'PUT /3 HTTP/1.0\r\nHost: > >>>>> localhost\r\nConnection: close\r\n\r\n') > >>>>> > >>>>> tests/wsgi_test.py:648: > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > >>>>> _ _ _ _ > >>>>> eventlet/greenio/base.py:407: in sendall > >>>>> tail = self.send(data, flags) > >>>>> eventlet/greenio/base.py:401: in send > >>>>> return self._send_loop(self.fd.send, data, flags) > >>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > >>>>> _ _ _ _ > >>>>> > >>>>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0> > >>>>> send_method = <built-in method send of socket object at 0x7f5f2f73d520> > >>>>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: > >>>>> close\r\n\r\n' > >>>>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32 > >>>>> > >>>>> def _send_loop(self, send_method, data, *args): > >>>>> if self.act_non_blocking: > >>>>> return send_method(data, *args) > >>>>> > >>>>> _timeout_exc = socket_timeout('timed out') > >>>>> while True: > >>>>> try: > >>>>> > return send_method(data, *args) > >>>>> E BrokenPipeError: [Errno 32] Broken pipe > >>>>> > >>>>> eventlet/greenio/base.py:388: BrokenPipeError > >>>>> ==================== > >>>>> > >>>>> Reverting this revert on the top of 5.19 solves the issue. > >>>>> > >>>>> Any ideas? > >>>> > >>>> Interesting. This revert should return the kernel back to the delayed > >>>> ACK behavior it had for many years before May 2019 and Linux 5.1, > >>>> which contains the commit it is reverting: > >>>> > >>>> 4a41f453bedfd tcp: change pingpong threshold to 3 > >>>> > >>>> It sounds like perhaps this test you mention has an implicit > >>>> dependence on the timing of delayed ACKs. > >>>> > >>>> A few questions: > >>> > >>> Dunno. I am only an openSUSE kernel maintainer and this popped out at > >>> me. Feel free to dig to eventlet's sources on your own :P. > >> > >> Any updates on this or should I send a revert directly? > >> > >> The "before() &&" part of the patch makes the difference. That is this diff: > >> --- a/net/ipv4/tcp_output.c > >> +++ b/net/ipv4/tcp_output.c > >> @@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp, > >> * and it is a reply for ato after last received packet, > >> * increase pingpong count. > >> */ > >> - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > >> - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) > >> + pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u > >> ping=%u\n", > >> + __func__, sk, sk->sk_addrpair, sk->sk_portpair, now, > >> + tp->lsndtime, icsk->icsk_ack.lrcvtime, > >> + inet_csk(sk)->icsk_ack.pingpong); > >> + if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && > >> + (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) { > >> inet_csk_inc_pingpong_cnt(sk); > >> + pr_info("\tINC ping=%u before=%u\n", > >> + inet_csk(sk)->icsk_ack.pingpong, > >> + before(tp->lsndtime, > >> icsk->icsk_ack.lrcvtime)); > >> + } > >> > >> tp->lsndtime = now; > >> } > >> > >> makes it work again, and outputs this: > >> > >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > >> (100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 > >> lrcvtime=4294902140 ping=0 > >> > TCP: tcp_event_data_sent: sk=00000000a4becf82 > >> (100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 > >> lrcvtime=4294902142 ping=0 > >> > TCP: INC ping=1 before=1 > >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > >> (100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 > >> lrcvtime=4294902144 ping=0 > >> > TCP: INC ping=1 before=1 > >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > >> (100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 > >> lrcvtime=4294902144 ping=1 > >> > TCP: INC ping=2 before=0 > >> > >> IMO, this "before=0" is the "source" of the problem. But I have no idea > >> what this means at all... > >> > >> > TCP: tcp_event_data_sent: sk=00000000a4becf82 > >> (100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 > >> lrcvtime=4294902148 ping=1 > >> > TCP: INC ping=2 before=1 > >> > TCP: tcp_event_data_sent: sk=00000000fd67cf8d > >> (100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 > >> lrcvtime=4294902150 ping=3 > >> > TCP: INC ping=4 before=1 > >> > TCP: tcp_event_data_sent: sk=00000000c7a417e9 > >> (100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 > >> lrcvtime=4294902153 ping=0 > >> > TCP: tcp_event_data_sent: sk=000000008681183e > >> (100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 > >> lrcvtime=4294902154 ping=0 > >> > TCP: INC ping=1 before=1 > > > > It sounds like this test has a very specific dependence on the buggy > > delayed ACK timing behavior from the buggy commit > > 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. > > > > IMHO I don't think we can revert a kernel bug fix based on a test that > > decided to depend on the exact timing of delayed ACKs during a time > > when that delayed ACK behavior was buggy. :-) > > Unfortunately despite the test is likely bogus (I am unable to say it is > or not), it does happen and the patch (the revert -- 4d8f24eeedc) breaks > userspace. I'd say this is exactly the case where we apply "we do not > break userspace". But I might be wrong as we might not care about silly > tests. > > In anyway, openSUSE has to have the patch (the revert) reverted, so that > the distro actually builds/works. (Until this is fixed on the eventlet > side at least. And more importantly _until_ it propagates to distros or > is fixed otherwise (like disabling the test).) And I suppose other > distros would have to do the same. That is quite unfortunate :/. > > thanks, > -- > js > suse labs Thanks, Jiri. Here are some notes from an analysis of this test and its interaction with the kernel, along with a suggested next step: A summary: ------------------- The python test: https://github.com/eventlet/eventlet/blob/890f320bbb56f915f128e5ed914be5e2a34b26d9/tests/wsgi_test.py is a test of a python networking library: https://github.com/eventlet/eventlet This user-space test has a race, and the change in the timing of TCP ACK packets in v5.19 has uncovered this race and caused the test to fail. But the change in the timing of the ACK is valid, and restores ACK behavior that was in the kernel for a long time: at least between the beginning of kernel git history in 2005 and 4d8f24eeedc5 (Revert "tcp: change pingpong threshold to 3") in v5.1 in May 2019. Slightly more details: The race in the test causes the pass/fail outcome of the test to depend on the exact timing of TCP ACKs. The ACK of the PUT /2 being delayed several hundred microseconds (delayed ACK behavior for kernel v5.19 and kernels up through v5.0), in combination with the client's Nagle behavior, delays the sending of the client PUT /2 payload data, causing the payload data to be unread in the socket when the server closes the connection in this test case, causing the server to send a RST sooner than the client code is expecting (after the PUT /2 payload data rather than after the PUT /3 request). This causes the test to receive an unexpected "BrokenPipeError: [Errno 32] Broken pipe" exception here, in line 648: https://github.com/eventlet/eventlet/blob/890f320bbb56f915f128e5ed914be5e2a34b26d9/tests/wsgi_test.py#L648 IMHO the best solution here is to tweak the test code so that it does not race and depend on the exact timing of TCP ACKs. One possible way to achieve this would be to have the client TCP connection use setsockopt(TCP_NODELAY) so that the body for the PUT /2 request is transmitted immediately, whether or not the server delays the ACK of the PUT /2 headers. Detailed analysis: ------------------- The client has apparently left the default Nagle algorithm behavior enabled (did not use setsockopt(TCP_NODELAY), and so allows at most one sub-MSS packet outstanding/un-ACKed, and so is sensitive to delayed ACK behavior. (1) With the behavior from 4a41f453bedf ("tcp: change pingpong threshold to 3") in kernel 5.1 to 5.18 (passing test): - The ACK of 'PUT /2' is immediate, so the "GET /tricksy" (HTTP content payload for the PUT /2") goes out quickly. - Because the 'GET /tricksy' goes out quickly, the server probably reads it out of its receive buffer while reading the PUT itself. - So when the server process closes the socket there is no data in the receive buffer, so the server socket does not sent a RST - By the time the client test code has called sock.sendall(b'PUT /3...) the client socket has received the FIN, and no RST has arrived yet, so the client gets the expected ConnectionClosed exception (2) With the 4d8f24eeedc5 (Revert "tcp: change pingpong threshold to 3") commit and the behavior from kernels up through v5.0 and now also 5.19 (failing test): - ACKs are delayed, so the server's response to the PUT carries the ACK that triggers the client to sent the GET /tricksy. - The server app code triggers the server to close the socket when the 'GET /tricksy' is still in the server's receive buffer, so when the server closes the socket the server connection generates a RST. - By the time the client test code has called sock.sendall(b'PUT /3') the client socket has received the server's FIN *and* the RST, so the client gets an unexpected: BrokenPipeError: [Errno 32] Broken pipe tcpdump traces: --------------- (1) passing test (kernel based on v5.10.31): 19:38:11.443592 IP 127.0.0.1.49786 > 127.0.0.1.39751: Flags [P.], seq 104:206, ack 141, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 102 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 009a 69c1 4000 4006 d29a 7f00 0001 E...i.@.@....... 0x0020: 7f00 0001 c27a 9b47 142a 4bcf 389c f6cf .....z.G.*K.8... 0x0030: 8018 0200 fe8e 0000 0101 080a 3a85 da88 ............:... 0x0040: 3a85 da88 5055 5420 2f32 2048 5454 502f :...PUT./2.HTTP/ 0x0050: 312e 300d 0a48 6f73 743a 206c 6f63 616c 1.0..Host:.local 0x0060: 686f 7374 0d0a 436f 6e6e 6563 7469 6f6e host..Connection 0x0070: 3a20 6b65 6570 2d61 6c69 7665 0d0a 436f :.keep-alive..Co 0x0080: 6e74 656e 742d 4c65 6e67 7468 3a20 3233 ntent-Length:.23 0x0090: 0d0a 4578 7065 6374 3a20 3130 302d 636f ..Expect:.100-co 0x00a0: 6e74 696e 7565 0d0a 0d0a ntinue.... 19:38:11.443597 IP 127.0.0.1.39751 > 127.0.0.1.49786: Flags [.], ack 206, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 050f 4000 4006 37b3 7f00 0001 E..4..@.@.7..... 0x0020: 7f00 0001 9b47 c27a 389c f6cf 142a 4c35 .....G.z8....*L5 0x0030: 8010 0200 fe28 0000 0101 080a 3a85 da88 .....(......:... 0x0040: 3a85 da88 :... 19:38:11.443610 IP 127.0.0.1.49786 > 127.0.0.1.39751: Flags [P.], seq 206:229, ack 141, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 23 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 004b 69c2 4000 4006 d2e8 7f00 0001 E..Ki.@.@....... 0x0020: 7f00 0001 c27a 9b47 142a 4c35 389c f6cf .....z.G.*L58... 0x0030: 8018 0200 fe3f 0000 0101 080a 3a85 da88 .....?......:... 0x0040: 3a85 da88 4745 5420 2f74 7269 636b 7379 :...GET./tricksy 0x0050: 2048 5454 502f 312e 310d 0a .HTTP/1.1.. 19:38:11.443613 IP 127.0.0.1.39751 > 127.0.0.1.49786: Flags [.], ack 229, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 0510 4000 4006 37b2 7f00 0001 E..4..@.@.7..... 0x0020: 7f00 0001 9b47 c27a 389c f6cf 142a 4c4c .....G.z8....*LL 0x0030: 8010 0200 fe28 0000 0101 080a 3a85 da88 .....(......:... 0x0040: 3a85 da88 :... 19:38:11.443921 IP 127.0.0.1.39751 > 127.0.0.1.49786: Flags [P.], seq 141:277, ack 229, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 136 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 00bc 0511 4000 4006 3729 7f00 0001 E.....@.@.7).... 0x0020: 7f00 0001 9b47 c27a 389c f6cf 142a 4c4c .....G.z8....*LL 0x0030: 8018 0200 feb0 0000 0101 080a 3a85 da88 ............:... 0x0040: 3a85 da88 4854 5450 2f31 2e31 2032 3030 :...HTTP/1.1.200 0x0050: 204f 4b0d 0a43 6f6e 7465 6e74 2d54 7970 .OK..Content-Typ 0x0060: 653a 2074 6578 742f 706c 6169 6e0d 0a43 e:.text/plain..C 0x0070: 6f6e 7465 6e74 2d4c 656e 6774 683a 2031 ontent-Length:.1 0x0080: 350d 0a44 6174 653a 2054 7565 2c20 3136 5..Date:.Tue,.16 0x0090: 2041 7567 2032 3032 3220 3139 3a33 383a .Aug.2022.19:38: 0x00a0: 3131 2047 4d54 0d0a 436f 6e6e 6563 7469 11.GMT..Connecti 0x00b0: 6f6e 3a20 636c 6f73 650d 0a0d 0a73 6563 on:.close....sec 0x00c0: 6f6e 6420 7265 7370 6f6e 7365 ond.response 19:38:11.443926 IP 127.0.0.1.49786 > 127.0.0.1.39751: Flags [.], ack 277, win 511, options [nop,nop,TS val 981850760 ecr 981850760], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 69c3 4000 4006 d2fe 7f00 0001 E..4i.@.@....... 0x0020: 7f00 0001 c27a 9b47 142a 4c4c 389c f757 .....z.G.*LL8..W 0x0030: 8010 01ff fe28 0000 0101 080a 3a85 da88 .....(......:... 0x0040: 3a85 da88 :... 19:38:11.443990 IP 127.0.0.1.39751 > 127.0.0.1.49786: Flags [F.], seq 277, ack 229, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 0512 4000 4006 37b0 7f00 0001 E..4..@.@.7..... 0x0020: 7f00 0001 9b47 c27a 389c f757 142a 4c4c .....G.z8..W.*LL 0x0030: 8011 0200 fe28 0000 0101 080a 3a85 da88 .....(......:... 19:38:11.444132 IP 127.0.0.1.49786 > 127.0.0.1.39751: Flags [P.], seq 229:284, ack 278, win 512, options [nop,nop,TS val 981850760 ecr 981850760], length 55 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 006b 69c4 4000 4006 d2c6 7f00 0001 E..ki.@.@....... 0x0020: 7f00 0001 c27a 9b47 142a 4c4c 389c f758 .....z.G.*LL8..X 0x0030: 8018 0200 fe5f 0000 0101 080a 3a85 da88 ....._......:... 0x0040: 3a85 da88 5055 5420 2f33 2048 5454 502f :...PUT./3.HTTP/ 0x0050: 312e 300d 0a48 6f73 743a 206c 6f63 616c 1.0..Host:.local 0x0060: 686f 7374 0d0a 436f 6e6e 6563 7469 6f6e host..Connection 0x0070: 3a20 636c 6f73 650d 0a0d 0a :.close.... 19:38:11.444144 IP 127.0.0.1.39751 > 127.0.0.1.49786: Flags [R], seq 949811032, win 0, length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0028 0000 4000 4006 3cce 7f00 0001 E..(..@.@.<..... 0x0020: 7f00 0001 9b47 c27a 389c f758 0000 0000 .....G.z8..X.... 0x0030: 5004 0000 2427 0000 P...$'.. (2) failing test (kernel based on v5.10.31 with a cherry-pick of 4d8f24eeedc5 (Revert "tcp: change pingpong threshold to 3"): 19:13:26.543081 IP 127.0.0.1.33382 > 127.0.0.1.45645: Flags [P.], seq 104:206, ack 141, win 512, options [nop,nop,TS val 1506020510 ecr 1506020510], length 102 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 009a 5f26 4000 4006 dd35 7f00 0001 E..._&@.@..5.... 0x0020: 7f00 0001 8266 b24d 1931 9034 880c 382e .....f.M.1.4..8. 0x0030: 8018 0200 fe8e 0000 0101 080a 59c4 0c9e ............Y... 0x0040: 59c4 0c9e 5055 5420 2f32 2048 5454 502f Y...PUT./2.HTTP/ 0x0050: 312e 300d 0a48 6f73 743a 206c 6f63 616c 1.0..Host:.local 0x0060: 686f 7374 0d0a 436f 6e6e 6563 7469 6f6e host..Connection 0x0070: 3a20 6b65 6570 2d61 6c69 7665 0d0a 436f :.keep-alive..Co 0x0080: 6e74 656e 742d 4c65 6e67 7468 3a20 3233 ntent-Length:.23 0x0090: 0d0a 4578 7065 6374 3a20 3130 302d 636f ..Expect:.100-co 0x00a0: 6e74 696e 7565 0d0a 0d0a ntinue.... 19:13:26.543361 IP 127.0.0.1.45645 > 127.0.0.1.33382: Flags [P.], seq 141:277, ack 206, win 512, options [nop,nop,TS val 1506020510 ecr 1506020510], length 136 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 00bc cbe9 4000 4006 7050 7f00 0001 E.....@.@.pP.... 0x0020: 7f00 0001 b24d 8266 880c 382e 1931 909a .....M.f..8..1.. 0x0030: 8018 0200 feb0 0000 0101 080a 59c4 0c9e ............Y... 0x0040: 59c4 0c9e 4854 5450 2f31 2e31 2032 3030 Y...HTTP/1.1.200 0x0050: 204f 4b0d 0a43 6f6e 7465 6e74 2d54 7970 .OK..Content-Typ 0x0060: 653a 2074 6578 742f 706c 6169 6e0d 0a43 e:.text/plain..C 0x0070: 6f6e 7465 6e74 2d4c 656e 6774 683a 2031 ontent-Length:.1 0x0080: 350d 0a44 6174 653a 2054 7565 2c20 3136 5..Date:.Tue,.16 0x0090: 2041 7567 2032 3032 3220 3139 3a31 333a .Aug.2022.19:13: 0x00a0: 3236 2047 4d54 0d0a 436f 6e6e 6563 7469 26.GMT..Connecti 0x00b0: 6f6e 3a20 636c 6f73 650d 0a0d 0a73 6563 on:.close....sec 0x00c0: 6f6e 6420 7265 7370 6f6e 7365 ond.response 19:13:26.543368 IP 127.0.0.1.33382 > 127.0.0.1.45645: Flags [P.], seq 206:229, ack 277, win 511, options [nop,nop,TS val 1506020510 ecr 1506020510], length 23 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 004b 5f27 4000 4006 dd83 7f00 0001 E..K_'@.@....... 0x0020: 7f00 0001 8266 b24d 1931 909a 880c 38b6 .....f.M.1....8. 0x0030: 8018 01ff fe3f 0000 0101 080a 59c4 0c9e .....?......Y... 0x0040: 59c4 0c9e 4745 5420 2f74 7269 636b 7379 Y...GET./tricksy 0x0050: 2048 5454 502f 312e 310d 0a .HTTP/1.1.. 19:13:26.543418 IP 127.0.0.1.45645 > 127.0.0.1.33382: Flags [F.], seq 277, ack 229, win 512, options [nop,nop,TS val 1506020510 ecr 1506020510], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 cbea 4000 4006 70d7 7f00 0001 E..4..@.@.p..... 0x0020: 7f00 0001 b24d 8266 880c 38b6 1931 90b1 .....M.f..8..1.. 0x0030: 8011 0200 fe28 0000 0101 080a 59c4 0c9e .....(......Y... 0x0040: 59c4 0c9e Y... 19:13:26.543459 IP 127.0.0.1.45645 > 127.0.0.1.33382: Flags [R.], seq 278, ack 229, win 512, options [nop,nop,TS val 1506020510 ecr 1506020510], length 0 0x0000: 0000 0304 0006 0000 0000 0000 0000 0800 ................ 0x0010: 4500 0034 cbeb 4000 4006 70d6 7f00 0001 E..4..@.@.p..... 0x0020: 7f00 0001 b24d 8266 880c 38b7 1931 90b1 .....M.f..8..1.. 0x0030: 8014 0200 fe28 0000 0101 080a 59c4 0c9e .....(......Y... 0x0040: 59c4 0c9e Y... best regards, neal
On 17. 08. 22, 0:19, Neal Cardwell wrote: > IMHO the best solution here is to tweak the test code so that it does > not race and depend on the exact timing of TCP ACKs. One possible way > to achieve this would be to have the client TCP connection use > setsockopt(TCP_NODELAY) so that the body for the PUT /2 request is > transmitted immediately, whether or not the server delays the ACK of > the PUT /2 headers. Thanks a lot, Neal! So for the time being, until this is resolved in eventlet, I pushed a change to disable the test in openSUSE. thanks,
On Tue, Aug 16, 2022 at 11:44 PM Jiri Slaby <jirislaby@kernel.org> wrote: > > Thanks a lot, Neal! So for the time being, until this is resolved in > eventlet, I pushed a change to disable the test in openSUSE. Yes, that is the right approach in this case, since it really does seem to be purely about timing. Note that "breaks user space" is not about random test suites. For all we know, a test suite can exist literally to check for known bugs etc. No user space breakage is about real loads by real people, ie "I used to do this, and now that no longer works". The test suites showing behavioral differences are certainly always a cause of worry - because the test may exist for a very particular reason, and may be indicative that yes, there are actual real programs depending on this behavior - but isn't in itself necessarily a problem in itself. Linus
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 85cd695e7fd1..ee88f0f1350f 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -321,7 +321,7 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb, struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu); -#define TCP_PINGPONG_THRESH 3 +#define TCP_PINGPONG_THRESH 1 static inline void inet_csk_enter_pingpong_mode(struct sock *sk) { @@ -338,14 +338,6 @@ static inline bool inet_csk_in_pingpong_mode(struct sock *sk) return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH; } -static inline void inet_csk_inc_pingpong_cnt(struct sock *sk) -{ - struct inet_connection_sock *icsk = inet_csk(sk); - - if (icsk->icsk_ack.pingpong < U8_MAX) - icsk->icsk_ack.pingpong++; -} - static inline bool inet_csk_has_ulp(struct sock *sk) { return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c38e07b50639..d06e72e141ac 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -167,16 +167,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp, if (tcp_packets_in_flight(tp) == 0) tcp_ca_event(sk, CA_EVENT_TX_START); - /* If this is the first data packet sent in response to the - * previous received data, - * and it is a reply for ato after last received packet, - * increase pingpong count. - */ - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) && - (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) - inet_csk_inc_pingpong_cnt(sk); - tp->lsndtime = now; + + /* If it is a reply for ato after last received + * packet, enter pingpong mode. + */ + if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) + inet_csk_enter_pingpong_mode(sk); } /* Account for an ACK we sent. */
This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> --- v2: added Fixes tag --- include/net/inet_connection_sock.h | 10 +--------- net/ipv4/tcp_output.c | 15 ++++++--------- 2 files changed, 7 insertions(+), 18 deletions(-)