
[net] net: tun: call napi_schedule_prep() to ensure we own a napi

Message ID: 20221107180011.188437-1-edumazet@google.com
State: Accepted
Commit: 07d120aa33cc9d9115753d159f64d20c94458781
Delegated to: Netdev Maintainers
Series: [net] net: tun: call napi_schedule_prep() to ensure we own a napi

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net, async
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               warning  WARNING: Possible repeated word: 'Google'
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Eric Dumazet Nov. 7, 2022, 6 p.m. UTC
A recent patch exposed another issue in napi_get_frags()
caught by syzbot [1]

Before feeding packets to GRO, and calling napi_complete()
we must first grab NAPI_STATE_SCHED.

[1]
WARNING: CPU: 0 PID: 3612 at net/core/dev.c:6076 napi_complete_done+0x45b/0x880 net/core/dev.c:6076
Modules linked in:
CPU: 0 PID: 3612 Comm: syz-executor408 Not tainted 6.1.0-rc3-syzkaller-00175-g1118b2049d77 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:napi_complete_done+0x45b/0x880 net/core/dev.c:6076
Code: c1 ea 03 0f b6 14 02 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 24 04 00 00 41 89 5d 1c e9 73 fc ff ff e8 b5 53 22 fa <0f> 0b e9 82 fe ff ff e8 a9 53 22 fa 48 8b 5c 24 08 31 ff 48 89 de
RSP: 0018:ffffc90003c4f920 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000030 RCX: 0000000000000000
RDX: ffff8880251c0000 RSI: ffffffff875a58db RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888072d02628
R13: ffff888072d02618 R14: ffff888072d02634 R15: 0000000000000000
FS: 0000555555f13300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c44d3892b8 CR3: 00000000172d2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
napi_complete include/linux/netdevice.h:510 [inline]
tun_get_user+0x206d/0x3a60 drivers/net/tun.c:1980
tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2027
call_write_iter include/linux/fs.h:2191 [inline]
do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
do_iter_write+0x182/0x700 fs/read_write.c:861
vfs_writev+0x1aa/0x630 fs/read_write.c:934
do_writev+0x133/0x2f0 fs/read_write.c:977
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f37021a3c19

Fixes: 1118b2049d77 ("net: tun: Fix memory leaks of napi_get_frags")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wang Yufen <wangyufen@huawei.com>
---
 drivers/net/tun.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
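
The ownership rule from the commit message can be sketched as a small
userspace model. This is an illustration only, not kernel code: every
model_* name is hypothetical, a single bit stands in for NAPI_STATE_SCHED,
and the model assumes the warning in the trace corresponds to completing a
NAPI instance whose SCHED bit the caller never set. The real
napi_schedule_prep() also handles other state bits that this model ignores.

```c
/* Userspace model only, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATE_SCHED 1u	/* stands in for NAPI_STATE_SCHED */

struct model_napi {
	atomic_uint state;
	unsigned int warnings;	/* completions done without ownership */
};

/* Try to take ownership: succeeds only if the SCHED bit was clear before. */
static bool model_schedule_prep(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_STATE_SCHED);

	return !(old & MODEL_STATE_SCHED);
}

/* Release ownership; count a warning if the bit was not set on entry. */
static void model_complete(struct model_napi *n)
{
	unsigned int old = atomic_fetch_and(&n->state, ~MODEL_STATE_SCHED);

	if (!(old & MODEL_STATE_SCHED))
		n->warnings++;
}

/* Patched flow: process only when we own the instance, else report busy. */
static int model_rx(struct model_napi *n)
{
	assert(n != NULL);
	if (!model_schedule_prep(n))
		return -EBUSY;
	/* ... GRO processing would happen here ... */
	model_complete(n);
	return 0;
}
```

In this model, calling model_complete() without a prior successful
model_schedule_prep() increments warnings, which mirrors the pre-patch flow.
model_rx() mirrors the patched flow and returns -EBUSY when another context
already holds the bit.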

Comments

Jakub Kicinski Nov. 9, 2022, 1:40 a.m. UTC | #1
On Mon,  7 Nov 2022 18:00:11 +0000 Eric Dumazet wrote:
>  		if (unlikely(headlen > skb_headlen(skb))) {
> +			WARN_ON_ONCE(1);
> +			err = -ENOMEM;
>  			dev_core_stats_rx_dropped_inc(tun->dev);
> +napi_busy:
>  			napi_free_frags(&tfile->napi);
>  			rcu_read_unlock();
>  			mutex_unlock(&tfile->napi_mutex);
> -			WARN_ON(1);
> -			return -ENOMEM;
> +			return err;
>  		}
>  
> -		local_bh_disable();
> -		napi_gro_frags(&tfile->napi);
> -		napi_complete(&tfile->napi);
> -		local_bh_enable();
> +		if (likely(napi_schedule_prep(&tfile->napi))) {
> +			local_bh_disable();
> +			napi_gro_frags(&tfile->napi);
> +			napi_complete(&tfile->napi);
> +			local_bh_enable();
> +		} else {
> +			err = -EBUSY;
> +			goto napi_busy;

This can only hit if someone else is trying to detach / napi_disable()
at the same time?

> +		}

Eric Dumazet Nov. 9, 2022, 1:45 a.m. UTC | #2
On Tue, Nov 8, 2022 at 5:41 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  7 Nov 2022 18:00:11 +0000 Eric Dumazet wrote:
> >               if (unlikely(headlen > skb_headlen(skb))) {
> > +                     WARN_ON_ONCE(1);
> > +                     err = -ENOMEM;
> >                       dev_core_stats_rx_dropped_inc(tun->dev);
> > +napi_busy:
> >                       napi_free_frags(&tfile->napi);
> >                       rcu_read_unlock();
> >                       mutex_unlock(&tfile->napi_mutex);
> > -                     WARN_ON(1);
> > -                     return -ENOMEM;
> > +                     return err;
> >               }
> >
> > -             local_bh_disable();
> > -             napi_gro_frags(&tfile->napi);
> > -             napi_complete(&tfile->napi);
> > -             local_bh_enable();
> > +             if (likely(napi_schedule_prep(&tfile->napi))) {
> > +                     local_bh_disable();
> > +                     napi_gro_frags(&tfile->napi);
> > +                     napi_complete(&tfile->napi);
> > +                     local_bh_enable();
> > +             } else {
> > +                     err = -EBUSY;
> > +                     goto napi_busy;
>
> This can only hit if someone else is trying to detach / napi_disable()
> at the same time?

I think this can happen if /sys/class/net/${dev}/gro_flush_timeout is used.

napi_watchdog() might grab NAPI_STATE_SCHED.

Since this is mostly used by validation tools, I think it is better to
let the tool retry, rather than trying to spin to acquire NAPI_STATE_SCHED.

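
The "let the tool retry" suggestion above could look roughly like this on
the userspace side. This is a sketch under assumptions: it presumes the
-EBUSY from the patched path reaches the caller as errno == EBUSY on a
failed writev(), and write_frame_retry(), writev_fn and fake_writev() are
hypothetical names introduced here. The function-pointer hook exists only so
the loop can be exercised without a tun device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical hook: same signature as writev(2). */
typedef ssize_t (*writev_fn)(int fd, const struct iovec *iov, int iovcnt);

/*
 * Retry a frame write while the call fails with EBUSY.
 * Returns the last result: a byte count on success, or -1 with errno set
 * (errno stays EBUSY if all max_tries attempts were busy).
 */
static ssize_t write_frame_retry(writev_fn do_writev, int fd,
				 const struct iovec *iov, int iovcnt,
				 int max_tries)
{
	ssize_t ret = -1;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		ret = do_writev(fd, iov, iovcnt);
		if (ret >= 0 || errno != EBUSY)
			break;
	}
	return ret;
}

/* Test stub (hypothetical): fails with EBUSY while fake_busy_left > 0. */
static int fake_busy_left;

static ssize_t fake_writev(int fd, const struct iovec *iov, int iovcnt)
{
	ssize_t total = 0;

	(void)fd;
	if (fake_busy_left > 0) {
		fake_busy_left--;
		errno = EBUSY;
		return -1;
	}
	for (int i = 0; i < iovcnt; i++)
		total += (ssize_t)iov[i].iov_len;
	return total;
}
```

Against a real device the call would be something like
write_frame_retry(writev, tun_fd, iov, iovcnt, 8), where tun_fd and the
retry bound of 8 are placeholders. A bounded loop is used so that a tool
reports persistent contention instead of spinning indefinitely.
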
patchwork-bot+netdevbpf@kernel.org Nov. 9, 2022, 2 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  7 Nov 2022 18:00:11 +0000 you wrote:
> A recent patch exposed another issue in napi_get_frags()
> caught by syzbot [1]
> 
> Before feeding packets to GRO, and calling napi_complete()
> we must first grab NAPI_STATE_SCHED.
> 
> [...]

Here is the summary with links:
  - [net] net: tun: call napi_schedule_prep() to ensure we own a napi
    https://git.kernel.org/netdev/net/c/07d120aa33cc

You are awesome, thank you!

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index eb12f3136a5490afe18f774089a2fa211fd21a54..7a3ab3427369abab7472c3fbb07c24e7031f21b2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1967,18 +1967,25 @@  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 					  skb_headlen(skb));
 
 		if (unlikely(headlen > skb_headlen(skb))) {
+			WARN_ON_ONCE(1);
+			err = -ENOMEM;
 			dev_core_stats_rx_dropped_inc(tun->dev);
+napi_busy:
 			napi_free_frags(&tfile->napi);
 			rcu_read_unlock();
 			mutex_unlock(&tfile->napi_mutex);
-			WARN_ON(1);
-			return -ENOMEM;
+			return err;
 		}
 
-		local_bh_disable();
-		napi_gro_frags(&tfile->napi);
-		napi_complete(&tfile->napi);
-		local_bh_enable();
+		if (likely(napi_schedule_prep(&tfile->napi))) {
+			local_bh_disable();
+			napi_gro_frags(&tfile->napi);
+			napi_complete(&tfile->napi);
+			local_bh_enable();
+		} else {
+			err = -EBUSY;
+			goto napi_busy;
+		}
 		mutex_unlock(&tfile->napi_mutex);
 	} else if (tfile->napi_enabled) {
 		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;