[net,v3,2/2] skmsg: bugfix for sk_msg sge iteration

Message ID	56d8ec28df901432e7bde4953795166ce2edd472.1719553101.git.tanggeliang@kylinos.cn (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CAAB4249F5; Fri, 28 Jun 2024 05:48:28 +0000 (UTC) From: Geliang Tang <geliang@kernel.org> To: John Fastabend <john.fastabend@gmail.com>, Jakub Sitnicki <jakub@cloudflare.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net> Cc: Geliang Tang <tanggeliang@kylinos.cn>, David Ahern <dsahern@kernel.org>, Eduard Zingerman <eddyz87@gmail.com>, Mykola Lysenko <mykolal@fb.com>, Martin KaFai Lau <martin.lau@linux.dev>, Song Liu <song@kernel.org>, Yonghong Song <yonghong.song@linux.dev>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>, Mykyta Yatsenko <yatsenko@meta.com>, Miao Xu <miaxu@meta.com>, Yuran Pereira <yuran.pereira@hotmail.com>, Huacai Chen <chenhuacai@kernel.org>, Tiezhu Yang <yangtiezhu@loongson.cn>, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration Date: Fri, 28 Jun 2024 13:47:48 +0800 Message-ID: <56d8ec28df901432e7bde4953795166ce2edd472.1719553101.git.tanggeliang@kylinos.cn> In-Reply-To: <cover.1719553101.git.tanggeliang@kylinos.cn> References: <cover.1719553101.git.tanggeliang@kylinos.cn> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	bugfixes for skmsg \| expand [net,v3,0/2] bugfixes for skmsg [net,v3,1/2] skmsg: prevent empty ingress skb from enqueuing [net,v3,2/2] skmsg: bugfix for sk_msg sge iteration

Message ID

56d8ec28df901432e7bde4953795166ce2edd472.1719553101.git.tanggeliang@kylinos.cn (mailing list archive)

State

Changes Requested

Delegated to:

Netdev Maintainers

Headers

From: Geliang Tang <geliang@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Cc: Geliang Tang <tanggeliang@kylinos.cn>,
	David Ahern <dsahern@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Mykola Lysenko <mykolal@fb.com>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>,
	Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Shuah Khan <shuah@kernel.org>,
	Mykyta Yatsenko <yatsenko@meta.com>,
	Miao Xu <miaxu@meta.com>,
	Yuran Pereira <yuran.pereira@hotmail.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Tiezhu Yang <yangtiezhu@loongson.cn>,
	netdev@vger.kernel.org,
	bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration
Date: Fri, 28 Jun 2024 13:47:48 +0800
Message-ID: 
 <56d8ec28df901432e7bde4953795166ce2edd472.1719553101.git.tanggeliang@kylinos.cn>
In-Reply-To: <cover.1719553101.git.tanggeliang@kylinos.cn>
References: <cover.1719553101.git.tanggeliang@kylinos.cn>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

bugfixes for skmsg | expand

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for net
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 856 this patch: 856
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 9 of 9 maintainers
netdev/build_clang	success	Errors and warnings before: 860 this patch: 860
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 860 this patch: 860
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-06-28--09-00 (tests: 665)

Context

Check

Description

netdev/series_format

success

Posting correctly formatted

netdev/tree_selection

success

Clearly marked for net

netdev/ynl

success

Generated files up to date; no warnings/errors; no diff in generated;

netdev/fixes_present

success

Fixes tag present in non-next series

netdev/header_inline

success

No static functions without inline keyword in header files

netdev/build_32bit

success

Errors and warnings before: 856 this patch: 856

netdev/build_tools

success

No tools touched, skip

netdev/cc_maintainers

success

CCed 9 of 9 maintainers

netdev/build_clang

success

Errors and warnings before: 860 this patch: 860

netdev/verify_signedoff

success

Signed-off-by tag matches author and committer

netdev/deprecated_api

success

None detected

netdev/check_selftest

success

No net selftest shell script

netdev/verify_fixes

success

Fixes tag looks correct

netdev/build_allmodconfig_warn

success

Errors and warnings before: 860 this patch: 860

netdev/checkpatch

success

total: 0 errors, 0 warnings, 0 checks, 10 lines checked

netdev/build_clang_rust

success

No Rust files in patch. Skipping build

netdev/kdoc

success

Errors and warnings before: 0 this patch: 0

netdev/source_inline

success

Was 0 now: 0

netdev/contest

success

net-next-2024-06-28--09-00 (tests: 665)

Commit Message

Geliang Tang June 28, 2024, 5:47 a.m. UTC

From: Geliang Tang <tanggeliang@kylinos.cn>

Every time run this BPF selftests (./test_sockmap) on a Loongarch platform,
a Kernel panic occurs:

'''
 Oops[#1]:
 CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-rc2+ #32
 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
 ... ...
    ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
   ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 0000000c (PPLV0 +PIE +PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
  BADV: 0000000000000040
  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
 Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
 Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, task=...)
 Stack : ... ...
         ...
 Call Trace:
 [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
 [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
 [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
 [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
 [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
 [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
 [<90000000045b76ec>] do_syscall+0x8c/0xc0
 [<90000000030e1da4>] handle_syscall+0xc4/0x160

 Code: ...

 ---[ end trace 0000000000000000 ]---
'''

This crash is because a NULL pointer is passed to page_address() in
sk_msg_memcopy_from_iter(). Due to the difference in architecture,
page_address(0) will not trigger a panic on the X86 platform but will panic
on the Loogarch platform. So this bug was hidden on the x86 platform, but
now it is exposed on the Loogarch platform.

This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an invalid
"sge" is always used:

	if (msg->sg.copybreak >= sge->length) {
		msg->sg.copybreak = 0;
		sk_msg_iter_var_next(i);
		if (i == msg->sg.end)
			break;
		sge = sk_msg_elem(msg, i);
	}

If the value of i is 2, msg->sg.end is also 2 when entering this if block.
sk_msg_iter_var_next() increases i by 1, and now i is 3, which is no longer
equal to msg->sg.end. The break will not be triggered, and the next sge
obtained by sk_msg_elem(3) will be an invalid one.

The correct approach is to check (i == msg->sg.end) first, and then invoke
sk_msg_iter_var_next() if they are not equal.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/core/skmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

D. Wythe July 1, 2024, 9 a.m. UTC | #1

On 6/28/24 1:47 PM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Every time run this BPF selftests (./test_sockmap) on a Loongarch platform,
> a Kernel panic occurs:
>
> '''
>   Oops[#1]:
>   CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-rc2+ #32
>   Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>   ... ...
>      ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
>     ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
>    CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>    PRMD: 0000000c (PPLV0 +PIE +PWE)
>    EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>    ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
>   ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>    BADV: 0000000000000040
>    PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
>   Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
>   Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, task=...)
>   Stack : ... ...
>           ...
>   Call Trace:
>   [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
>   [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
>   [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
>   [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
>   [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
>   [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
>   [<90000000045b76ec>] do_syscall+0x8c/0xc0
>   [<90000000030e1da4>] handle_syscall+0xc4/0x160
>
>   Code: ...
>
>   ---[ end trace 0000000000000000 ]---
> '''
>
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_memcopy_from_iter(). Due to the difference in architecture,
> page_address(0) will not trigger a panic on the X86 platform but will panic
> on the Loogarch platform. So this bug was hidden on the x86 platform, but
> now it is exposed on the Loogarch platform.
>
> This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an invalid
> "sge" is always used:
>
> 	if (msg->sg.copybreak >= sge->length) {
> 		msg->sg.copybreak = 0;
> 		sk_msg_iter_var_next(i);
> 		if (i == msg->sg.end)
> 			break;
> 		sge = sk_msg_elem(msg, i);
> 	}
>
> If the value of i is 2, msg->sg.end is also 2 when entering this if block.
> sk_msg_iter_var_next() increases i by 1, and now i is 3, which is no longer
> equal to msg->sg.end. The break will not be triggered, and the next sge
> obtained by sk_msg_elem(3) will be an invalid one.
>
> The correct approach is to check (i == msg->sg.end) first, and then invoke
> sk_msg_iter_var_next() if they are not equal.
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>   net/core/skmsg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 44952cdd1425..1906d0d0eeac 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
>   		/* This is possible if a trim operation shrunk the buffer */
>   		if (msg->sg.copybreak >= sge->length) {
>   			msg->sg.copybreak = 0;
> -			sk_msg_iter_var_next(i);
>   			if (i == msg->sg.end)
>   				break;
> +			sk_msg_iter_var_next(i);
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
>   			sge = sk_msg_elem(msg, i);
>   		}
>

Geliang Tang July 1, 2024, 10:29 a.m. UTC | #2

Hello,

On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote:
> 
> 
> On 6/28/24 1:47 PM, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Every time run this BPF selftests (./test_sockmap) on a Loongarch
> > platform,
> > a Kernel panic occurs:
> > 
> > '''
> >   Oops[#1]:
> >   CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-
> > rc2+ #32
> >   Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS
> > Loongson-UDK2018
> >   ... ...
> >      ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
> >     ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
> >    CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> >    PRMD: 0000000c (PPLV0 +PIE +PWE)
> >    EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> >    ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> >   ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> >    BADV: 0000000000000040
> >    PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> >   Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack
> > ipt_REJECT
> >   Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043,
> > task=...)
> >   Stack : ... ...
> >           ...
> >   Call Trace:
> >   [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
> >   [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
> >   [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
> >   [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
> >   [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
> >   [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
> >   [<90000000045b76ec>] do_syscall+0x8c/0xc0
> >   [<90000000030e1da4>] handle_syscall+0xc4/0x160
> > 
> >   Code: ...
> > 
> >   ---[ end trace 0000000000000000 ]---
> > '''
> > 
> > This crash is because a NULL pointer is passed to page_address() in
> > sk_msg_memcopy_from_iter(). Due to the difference in architecture,
> > page_address(0) will not trigger a panic on the X86 platform but
> > will panic
> > on the Loogarch platform. So this bug was hidden on the x86
> > platform, but
> > now it is exposed on the Loogarch platform.
> > 
> > This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an
> > invalid
> > "sge" is always used:
> > 
> > 	if (msg->sg.copybreak >= sge->length) {
> > 		msg->sg.copybreak = 0;
> > 		sk_msg_iter_var_next(i);
> > 		if (i == msg->sg.end)
> > 			break;
> > 		sge = sk_msg_elem(msg, i);
> > 	}
> > 
> > If the value of i is 2, msg->sg.end is also 2 when entering this if
> > block.
> > sk_msg_iter_var_next() increases i by 1, and now i is 3, which is
> > no longer
> > equal to msg->sg.end. The break will not be triggered, and the next
> > sge
> > obtained by sk_msg_elem(3) will be an invalid one.
> > 
> > The correct approach is to check (i == msg->sg.end) first, and then
> > invoke
> > sk_msg_iter_var_next() if they are not equal.
> > 
> > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg
> > interface")
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >   net/core/skmsg.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 44952cdd1425..1906d0d0eeac 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
> > struct iov_iter *from,
> >   		/* This is possible if a trim operation shrunk the
> > buffer */
> >   		if (msg->sg.copybreak >= sge->length) {
> >   			msg->sg.copybreak = 0;
> > -			sk_msg_iter_var_next(i);
> >   			if (i == msg->sg.end)
> >   				break;
> > +			sk_msg_iter_var_next(i);
> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>

Thanks for your review.

But this change breaks test_sockmap. Will send a v4 to fix this.

Changes Requested.

-Geliang

> >   			sge = sk_msg_elem(msg, i);
> >   		}
> >   
> 
> 
>

Geliang Tang July 16, 2024, 6:53 a.m. UTC | #3

Hi John,

On Mon, 2024-07-01 at 18:29 +0800, Geliang Tang wrote:
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On 6/28/24 1:47 PM, Geliang Tang wrote:
> > > > > > > > > > > > > > > > > > > > > > > > From: Geliang Tang
> > > > > > > > > > > > > > > > > > > > > > > > <tanggeliang@kylinos.cn
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Every time run this BPF
> > > > > > > > > > > > > > > > > > > > > > > > selftests
> > > > > > > > > > > > > > > > > > > > > > > > (./test_sockmap) on a
> > > > > > > > > > > > > > > > > > > > > > > > Loongarch
> > > > > > > > > > > > > > > > > > > > > > > > platform,

This issue seems related to "txmsg_cork".

Every subtest of test_sockmap with a txmsg_cork value greater than 1
will trigger this issue. For example, this test_txmsg_cork_hangs test:

static void test_txmsg_cork_hangs(int cgrp, struct sockmap_options
*opt)
{
        txmsg_pass = 1; 
        txmsg_redir = 0; 
        txmsg_cork = 4097;
        txmsg_apply = 4097;
        test_send_large(opt, cgrp);

        txmsg_pass = 0; 
        txmsg_redir = 1; 
        txmsg_apply = 0; 
        txmsg_cork = 4097;
        test_send_large(opt, cgrp);

        txmsg_pass = 0; 
        txmsg_redir = 1; 
        txmsg_apply = 4097;
        txmsg_cork = 4097;
        test_send_large(opt, cgrp);
}

These tests will break the sk_msg sge iteration in
sk_msg_memcopy_from_iter().

I added the following test code:

'''
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index bbf40b999713..e1fd40aa8586 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -374,6 +374,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 	void *to;
 
 	do {
+		pr_info("%s i=%d end=%d\n", __func__, i, msg->sg.end);
 		sge = sk_msg_elem(msg, i);
 		/* This is possible if a trim operation shrunk the
buffer */
 		if (msg->sg.copybreak >= sge->length) {
--
'''

And got this logs when running "test hanging corks" test:

'''
# 1/ 1  sockmap::txmsg test hanging corks:OK
# 2/ 1 sockhash::txmsg test hanging corks:OK
# 3/ 1 sockhash:ktls:txmsg test hanging corks:OK

[   55.751687] sk_msg_memcopy_from_iter i=0 end=8
[   55.751712] sk_msg_memcopy_from_iter i=1 end=8
[   55.751726] sk_msg_memcopy_from_iter i=2 end=8
[   55.751769] sk_msg_memcopy_from_iter i=3 end=8
[   55.751778] sk_msg_memcopy_from_iter i=4 end=8
[   55.751787] sk_msg_memcopy_from_iter i=5 end=8
[   55.751796] sk_msg_memcopy_from_iter i=6 end=8
[   55.751805] sk_msg_memcopy_from_iter i=7 end=8
[   55.752979] sk_msg_memcopy_from_iter i=8 end=16
[   55.752988] sk_msg_memcopy_from_iter i=9 end=16
[   55.752995] sk_msg_memcopy_from_iter i=10 end=16
[   55.753002] sk_msg_memcopy_from_iter i=11 end=16
[   55.753008] sk_msg_memcopy_from_iter i=12 end=16
[   55.753015] sk_msg_memcopy_from_iter i=13 end=16
[   55.753022] sk_msg_memcopy_from_iter i=14 end=16
[   55.753028] sk_msg_memcopy_from_iter i=15 end=16
[   56.087047] sk_msg_memcopy_from_iter i=0 end=1
[   56.087075] sk_msg_memcopy_from_iter i=1 end=1
[   56.087081] sk_msg_memcopy_from_iter i=3 end=1
[   56.087086] sk_msg_memcopy_from_iter i=5 end=1
[   56.087091] sk_msg_memcopy_from_iter i=7 end=1
[   56.087095] sk_msg_memcopy_from_iter i=9 end=1
[   56.087100] sk_msg_memcopy_from_iter i=11 end=1
[   56.087105] sk_msg_memcopy_from_iter i=13 end=1
[   56.087110] sk_msg_memcopy_from_iter i=15 end=1
[   56.087115] sk_msg_memcopy_from_iter i=17 end=1
'''

When "i" is greater than "end", the sge we get is empty, sge->length is
0. If we access this sge, this issue will occur. Kernel panics on some
machines.

To fix this, we can't "break" the loop like I did in this patch itself.
It will break test_sockmap with the following error logs:

# 1/ 1  sockmap::txmsg test hanging corks:OK
sendpage loop error: No space left on device
msg_loop_tx: iov_count 1024 iov_buf 256 cnt 2 err -1
tx thread exited with err 1.
# 2/ 1 sockhash::txmsg test hanging corks:FAIL
# 3/ 1 sockhash:ktls:txmsg test hanging corks:OK
Pass: 2 Fail: 1

We must "continue" the loop to get the next valid sge.

A better fix is here:

'''
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index bbf40b999713..bbaf909d0f9c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -376,13 +376,8 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 	do {
 		sge = sk_msg_elem(msg, i);
 		/* This is possible if a trim operation shrunk the
buffer */
-		if (msg->sg.copybreak >= sge->length) {
-			msg->sg.copybreak = 0;
-			sk_msg_iter_var_next(i);
-			if (i == msg->sg.end)
-				break;
-			sge = sk_msg_elem(msg, i);
-		}
+		if (msg->sg.copybreak >= sge->length)
+			goto next;
 
 		buf_size = sge->length - msg->sg.copybreak;
 		copy = (buf_size > bytes) ? bytes : buf_size;
@@ -399,6 +394,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 		bytes -= copy;
 		if (!bytes)
 			break;
+next:
 		msg->sg.copybreak = 0;
 		sk_msg_iter_var_next(i);
 	} while (i != msg->sg.end);

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 44952cdd1425..1906d0d0eeac 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -378,9 +378,9 @@  int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 		/* This is possible if a trim operation shrunk the buffer */
 		if (msg->sg.copybreak >= sge->length) {
 			msg->sg.copybreak = 0;
-			sk_msg_iter_var_next(i);
 			if (i == msg->sg.end)
 				break;
+			sk_msg_iter_var_next(i);
 			sge = sk_msg_elem(msg, i);
 		}

[net,v3,2/2] skmsg: bugfix for sk_msg sge iteration

Checks

Commit Message

Comments

Patch