Message ID | 9594185559881679d81f071b181a10eb07cd079f.1736004079.git.bcodding@redhat.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Series | tls: Fix tls_sw_sendmsg error handling |
On Sat, 4 Jan 2025 10:29:45 -0500 Benjamin Coddington wrote:
> We've noticed that NFS can hang when using RPC over TLS on an unstable
> connection, and investigation shows that the RPC layer is stuck in a tight
> loop attempting to transmit, but forever getting -EBADMSG back from the
> underlying network. The loop begins when tcp_sendmsg_locked() returns
> -EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
> calling the socket's error reporting handler.
>
> Instead of converting errors from tcp_sendmsg_locked(), let's pass them
> along in this path. The RPC layer handles -EPIPE by reconnecting the
> transport, which prevents the endless attempts to transmit on a broken
> connection.

LGTM, only question in my mind is whether we should send this to stable.
Any preference?
On 6 Jan 2025, at 21:36, Jakub Kicinski wrote:

> On Sat, 4 Jan 2025 10:29:45 -0500 Benjamin Coddington wrote:
>> We've noticed that NFS can hang when using RPC over TLS on an unstable
>> connection, and investigation shows that the RPC layer is stuck in a tight
>> loop attempting to transmit, but forever getting -EBADMSG back from the
>> underlying network. The loop begins when tcp_sendmsg_locked() returns
>> -EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
>> calling the socket's error reporting handler.
>>
>> Instead of converting errors from tcp_sendmsg_locked(), let's pass them
>> along in this path. The RPC layer handles -EPIPE by reconnecting the
>> transport, which prevents the endless attempts to transmit on a broken
>> connection.
>
> LGTM, only question in my mind is whether we should send this to stable.
> Any preference?

Yes, I think it can go, though not a strong preference. This code well
predates RPC over TLS, which landed in v6.5. I haven't investigated other
users - they may not have the same problem since RPC over TLS has very
precise error handling, so perhaps it makes sense to show the Fixes but
limit how far back we go for RPC.

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Cc: <stable@vger.kernel.org> # 6.5.x

Thanks for the look Jakub.

Ben
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bbf26cc4f6ee..7bcc9b4408a2 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -458,7 +458,7 @@ int tls_tx_records(struct sock *sk, int flags)
 
 tx_err:
 	if (rc < 0 && rc != -EAGAIN)
-		tls_err_abort(sk, -EBADMSG);
+		tls_err_abort(sk, rc);
 
 	return rc;
 }
We've noticed that NFS can hang when using RPC over TLS on an unstable
connection, and investigation shows that the RPC layer is stuck in a tight
loop attempting to transmit, but forever getting -EBADMSG back from the
underlying network. The loop begins when tcp_sendmsg_locked() returns
-EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
calling the socket's error reporting handler.

Instead of converting errors from tcp_sendmsg_locked(), let's pass them
along in this path. The RPC layer handles -EPIPE by reconnecting the
transport, which prevents the endless attempts to transmit on a broken
connection.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

base-commit: 0bc21e701a6ffacfdde7f04f87d664d82e8a13bf
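For readers following along, here is a rough userspace sketch of why the one-line change matters to a caller. It is not kernel code: report_old(), report_new(), and caller_policy() are hypothetical names, and the "retry on anything other than -EPIPE" policy is a simplifying assumption standing in for the RPC layer behaviour described in the commit message. Only the `rc < 0 && rc != -EAGAIN` condition and the two error values come from the patch itself.

```c
#include <errno.h>

/* Hypothetical model of the error handed to tls_err_abort() BEFORE the patch:
 * any failure other than -EAGAIN is reported as -EBADMSG. */
static int report_old(int rc)
{
	return (rc < 0 && rc != -EAGAIN) ? -EBADMSG : 0;
}

/* Hypothetical model AFTER the patch: the original error is passed along. */
static int report_new(int rc)
{
	return (rc < 0 && rc != -EAGAIN) ? rc : 0;
}

/* Assumed, simplified caller policy (not the real SUNRPC code):
 * reconnect on -EPIPE, otherwise keep trying to transmit. */
enum action { ACT_NONE, ACT_RECONNECT, ACT_RETRY };

static enum action caller_policy(int sk_err)
{
	if (sk_err == 0)
		return ACT_NONE;
	if (sk_err == -EPIPE)
		return ACT_RECONNECT;
	return ACT_RETRY;
}
```

Under these assumptions, caller_policy(report_old(-EPIPE)) yields ACT_RETRY (the tight loop), while caller_policy(report_new(-EPIPE)) yields ACT_RECONNECT.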