diff mbox series

[net,v3,1/2] skmsg: prevent empty ingress skb from enqueuing

Message ID 5b6a55017ab616131f7de1268b60cb34e99941a1.1719553101.git.tanggeliang@kylinos.cn (mailing list archive)
State New
Headers show
Series bugfixes for skmsg | expand

Commit Message

Geliang Tang June 28, 2024, 5:47 a.m. UTC
From: Geliang Tang <tanggeliang@kylinos.cn>

Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, a Kernel panic occurs:

'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
   ... ...
   ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
  ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
 PRMD: 0000000c (PPLV0 +PIE +PWE)
 EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
 ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
 BADV: 0000000000000040
 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
        ...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160

Code: ...

---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
 .text @ 0x9000000003710000
 .data @ 0x9000000004d70000
 .bss  @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''

This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.

This crash is because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
will not trigger a panic on the X86 platform but will panic on the
Loogarch platform. So this bug was hidden on the x86 platform, but now
it is exposed on the Loogarch platform.

The root cause is an empty skb (skb->len == 0) is put on the queue.

In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.

And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
passed to kmap_local_page() and page_address(), then kernel panics.

To solve this, we should prevent empty skb from putting on the queue. So
in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.

Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/core/skmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

D. Wythe July 1, 2024, 9:08 a.m. UTC | #1
On 6/28/24 1:47 PM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
>
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>     ... ...
>     ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>    ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>   PRMD: 0000000c (PPLV0 +PIE +PWE)
>   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>   BADV: 0000000000000040
>   PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>          ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
>
> Code: ...
>
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>   .text @ 0x9000000003710000
>   .data @ 0x9000000004d70000
>   .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
>
> This crash happens every time when running sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
>
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but now
> it is exposed on the Loogarch platform.

Maybe Loongarch ?  Besides that, LGTM.
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>

>
> The root cause is an empty skb (skb->len == 0) is put on the queue.
>
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
> page is put to this sge (see sg_set_page in sg_set_page), but this empty
> sge is queued into ingress_msg list.
>
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
>
> To solve this, we should prevent empty skb from putting on the queue. So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
>
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>   net/core/skmsg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
>   
>   	rcu_read_lock();
>   	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>   		len = 0;
>   		tcp_eat_skb(sk, skb);
>   		sock_drop(sk, skb);
Geliang Tang July 2, 2024, 7:02 a.m. UTC | #2
Superseded.

I just sent a v4 ("skmsg: skip empty sge in sk_msg_recvmsg").

Thanks,
-Geliang

On Fri, 2024-06-28 at 13:47 +0800, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
> 
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-
> rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-
> UDK2018
>    ... ...
>    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>  PRMD: 0000000c (PPLV0 +PIE +PWE)
>  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>  BADV: 0000000000000040
>  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>         ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
> 
> Code: ...
> 
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>  .text @ 0x9000000003710000
>  .data @ 0x9000000004d70000
>  .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
> 
> This crash happens every time when running
> sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
> 
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture,
> page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but
> now
> it is exposed on the Loogarch platform.
> 
> The root cause is an empty skb (skb->len == 0) is put on the queue.
> 
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and
> no
> page is put to this sge (see sg_set_page in sg_set_page), but this
> empty
> sge is queued into ingress_msg list.
> 
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
> 
> To solve this, we should prevent empty skb from putting on the queue.
> So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> 
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/core/skmsg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock
> *sk, struct sk_buff *skb)
>  
>  	rcu_read_lock();
>  	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>  		len = 0;
>  		tcp_eat_skb(sk, skb);
>  		sock_drop(sk, skb);
John Fastabend July 3, 2024, 1:03 a.m. UTC | #3
Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
> 
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>    ... ...
>    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>  PRMD: 0000000c (PPLV0 +PIE +PWE)
>  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>  BADV: 0000000000000040
>  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>         ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
> 
> Code: ...
> 
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>  .text @ 0x9000000003710000
>  .data @ 0x9000000004d70000
>  .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
> 
> This crash happens every time when running sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
> 
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but now
> it is exposed on the Loogarch platform.
> 
> The root cause is an empty skb (skb->len == 0) is put on the queue.
> 
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
> page is put to this sge (see sg_set_page in sg_set_page), but this empty
> sge is queued into ingress_msg list.
> 
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
> 
> To solve this, we should prevent empty skb from putting on the queue. So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> 
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/core/skmsg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
>  
>  	rcu_read_lock();
>  	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>  		len = 0;
>  		tcp_eat_skb(sk, skb);
>  		sock_drop(sk, skb);

The skb->len == 0 here is the FIN pkt right? We are using the EFAULT return
triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.

The concern I have here is if we don't have the skb fin pkt on the recv
queue we might go into wait_data and block instead of return to user when
rcvmsg() is called from user. I wonder if we can write a test for this if
we don't already have one we probably should create one.

Maybe a better fix assuming my assumption about fin being skb->len=0 is
correct?

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..bbf40b999713 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                        page = sg_page(sge);
                        if (copied + copy > len)
                                copy = len - copied;
-                       copy = copy_page_to_iter(page, sge->offset, copy, iter);
+                       if (copy)
+                               copy = copy_page_to_iter(page, sge->offset, copy, iter);
                        if (!copy) {
                                copied = copied ? copied : -EFAULT;
                                goto out;

Thanks,
John
Geliang Tang July 3, 2024, 1:54 a.m. UTC | #4
On Tue, 2024-07-02 at 18:03 -0700, John Fastabend wrote:
> Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Run this BPF selftests (./test_progs -t sockmap_basic) on a
> > Loongarch
> > platform, a Kernel panic occurs:
> > 
> > '''
> > Oops[#1]:
> > CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-
> > rc2+ #18
> > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-
> > UDK2018
> >    ... ...
> >    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
> >   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
> >  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> >  PRMD: 0000000c (PPLV0 +PIE +PWE)
> >  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> >  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> >  BADV: 0000000000000040
> >  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> > Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> > xt_conntrack
> > Process test_progs (pid: 2824, threadinfo=0000000000863a31,
> > task=...)
> > Stack : ...
> >         ...
> > Call Trace:
> > [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> > [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> > [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> > [<90000000049aae34>] inet_recvmsg+0x54/0x100
> > [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> > [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> > [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> > [<9000000004c076ec>] do_syscall+0x8c/0xc0
> > [<9000000003731da4>] handle_syscall+0xc4/0x160
> > 
> > Code: ...
> > 
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Fatal exception
> > Kernel relocated by 0x3510000
> >  .text @ 0x9000000003710000
> >  .data @ 0x9000000004d70000
> >  .bss  @ 0x9000000006469400
> > ---[ end Kernel panic - not syncing: Fatal exception ]---
> > '''
> > 
> > This crash happens every time when running
> > sockmap_skb_verdict_shutdown
> > subtest in sockmap_basic.
> > 
> > This crash is because a NULL pointer is passed to page_address() in
> > sk_msg_recvmsg(). Due to the difference in architecture,
> > page_address(0)
> > will not trigger a panic on the X86 platform but will panic on the
> > Loogarch platform. So this bug was hidden on the x86 platform, but
> > now
> > it is exposed on the Loogarch platform.
> > 
> > The root cause is an empty skb (skb->len == 0) is put on the queue.
> > 
> > In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero,
> > and no
> > page is put to this sge (see sg_set_page in sg_set_page), but this
> > empty
> > sge is queued into ingress_msg list.
> > 
> > And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> > got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> > passed to kmap_local_page() and page_address(), then kernel panics.
> > 
> > To solve this, we should prevent empty skb from putting on the
> > queue. So
> > in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> > 
> > Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> > program")
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >  net/core/skmsg.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..44952cdd1425 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock
> > *sk, struct sk_buff *skb)
> >  
> >  	rcu_read_lock();
> >  	psock = sk_psock(sk);
> > -	if (unlikely(!psock)) {
> > +	if (unlikely(!psock || !len)) {
> >  		len = 0;
> >  		tcp_eat_skb(sk, skb);
> >  		sock_drop(sk, skb);
> 
> The skb->len == 0 here is the FIN pkt right? We are using the EFAULT
> return
> triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.
> 
> The concern I have here is if we don't have the skb fin pkt on the
> recv
> queue we might go into wait_data and block instead of return to user
> when
> rcvmsg() is called from user. I wonder if we can write a test for
> this if
> we don't already have one we probably should create one.
> 
> Maybe a better fix assuming my assumption about fin being skb->len=0
> is
> correct?

Thanks John. Your fix is much better than mine. I'll use this as v5 and
update the commit log. I'll add your "Suggested-by" tag in it.

-Geliang

> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..bbf40b999713 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct
> sk_psock *psock, struct msghdr *msg,
>                         page = sg_page(sge);
>                         if (copied + copy > len)
>                                 copy = len - copied;
> -                       copy = copy_page_to_iter(page, sge->offset,
> copy, iter);
> +                       if (copy)
> +                               copy = copy_page_to_iter(page, sge-
> >offset, copy, iter);
>                         if (!copy) {
>                                 copied = copied ? copied : -EFAULT;
>                                 goto out;
> 
> Thanks,
> John
Geliang Tang July 5, 2024, 4:25 a.m. UTC | #5
Hi John,

On Wed, 2024-07-03 at 09:54 +0800, Geliang Tang wrote:
> On Tue, 2024-07-02 at 18:03 -0700, John Fastabend wrote:
> > Geliang Tang wrote:
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > 
> > > Run this BPF selftests (./test_progs -t sockmap_basic) on a
> > > Loongarch
> > > platform, a Kernel panic occurs:
> > > 
> > > '''
> > > Oops[#1]:
> > > CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE 
> > > 6.10.0-
> > > rc2+ #18
> > > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS
> > > Loongson-
> > > UDK2018
> > >    ... ...
> > >    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
> > >   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
> > >  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> > >  PRMD: 0000000c (PPLV0 +PIE +PWE)
> > >  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> > >  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> > > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> > >  BADV: 0000000000000040
> > >  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> > > Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> > > xt_conntrack
> > > Process test_progs (pid: 2824, threadinfo=0000000000863a31,
> > > task=...)
> > > Stack : ...
> > >         ...
> > > Call Trace:
> > > [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> > > [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> > > [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> > > [<90000000049aae34>] inet_recvmsg+0x54/0x100
> > > [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> > > [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> > > [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> > > [<9000000004c076ec>] do_syscall+0x8c/0xc0
> > > [<9000000003731da4>] handle_syscall+0xc4/0x160
> > > 
> > > Code: ...
> > > 
> > > ---[ end trace 0000000000000000 ]---
> > > Kernel panic - not syncing: Fatal exception
> > > Kernel relocated by 0x3510000
> > >  .text @ 0x9000000003710000
> > >  .data @ 0x9000000004d70000
> > >  .bss  @ 0x9000000006469400
> > > ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > '''
> > > 
> > > This crash happens every time when running
> > > sockmap_skb_verdict_shutdown
> > > subtest in sockmap_basic.
> > > 
> > > This crash is because a NULL pointer is passed to page_address()
> > > in
> > > sk_msg_recvmsg(). Due to the difference in architecture,
> > > page_address(0)
> > > will not trigger a panic on the X86 platform but will panic on
> > > the
> > > Loogarch platform. So this bug was hidden on the x86 platform,
> > > but
> > > now
> > > it is exposed on the Loogarch platform.
> > > 
> > > The root cause is an empty skb (skb->len == 0) is put on the
> > > queue.
> > > 
> > > In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero,
> > > and no
> > > page is put to this sge (see sg_set_page in sg_set_page), but
> > > this
> > > empty
> > > sge is queued into ingress_msg list.
> > > 
> > > And in sk_msg_recvmsg(), this empty sge is used, and a NULL page
> > > is
> > > got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(),
> > > it
> > > passed to kmap_local_page() and page_address(), then kernel
> > > panics.
> > > 
> > > To solve this, we should prevent empty skb from putting on the
> > > queue. So
> > > in sk_psock_verdict_recv(), if the skb->len is zero, drop this
> > > skb.
> > > 
> > > Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> > > program")
> > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > > ---
> > >  net/core/skmsg.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index fd20aae30be2..44952cdd1425 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct
> > > sock
> > > *sk, struct sk_buff *skb)
> > >  
> > >  	rcu_read_lock();
> > >  	psock = sk_psock(sk);
> > > -	if (unlikely(!psock)) {
> > > +	if (unlikely(!psock || !len)) {
> > >  		len = 0;
> > >  		tcp_eat_skb(sk, skb);
> > >  		sock_drop(sk, skb);
> > 
> > The skb->len == 0 here is the FIN pkt right? We are using the
> > EFAULT
> > return
> > triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.

I added some logs for debugging and found that this FIN packet do hit
is_fin check in tcp_bpf.c.

> > 
> > The concern I have here is if we don't have the skb fin pkt on the
> > recv
> > queue we might go into wait_data and block instead of return to
> > user
> > when
> > rcvmsg() is called from user. I wonder if we can write a test for
> > this if
> > we don't already have one we probably should create one.

In test_sockmap_skb_verdict_shutdown(), the FIN packet is sent by

	shutdown(p1, SHUT_WR);

and received by

	n = recv(c1, &b, 1, SOCK_NONBLOCK);
	ASSERT_EQ(n, 0, "recv_timeout(fin)");

I think this test has covered the FIN packet scenario already. No need
to add a new one. WDYT?

> > 
> > Maybe a better fix assuming my assumption about fin being skb-
> > >len=0
> > is
> > correct?
> 
> Thanks John. Your fix is much better than mine. I'll use this as v5
> and
> update the commit log. I'll add your "Suggested-by" tag in it.

Anyway, this v5 (skmsg: skip zero length skb in sk_msg_recvmsg) seems
ready to be merged.

Thanks,
-Geliang

> 
> -Geliang
> 
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..bbf40b999713 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct
> > sk_psock *psock, struct msghdr *msg,
> >                         page = sg_page(sge);
> >                         if (copied + copy > len)
> >                                 copy = len - copied;
> > -                       copy = copy_page_to_iter(page, sge->offset,
> > copy, iter);
> > +                       if (copy)
> > +                               copy = copy_page_to_iter(page, sge-
> > > offset, copy, iter);
> >                         if (!copy) {
> >                                 copied = copied ? copied : -EFAULT;
> >                                 goto out;
> > 
> > Thanks,
> > John
> 
>
diff mbox series

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..44952cdd1425 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1184,7 +1184,7 @@  static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 
 	rcu_read_lock();
 	psock = sk_psock(sk);
-	if (unlikely(!psock)) {
+	if (unlikely(!psock || !len)) {
 		len = 0;
 		tcp_eat_skb(sk, skb);
 		sock_drop(sk, skb);