
[net] net: sched: fix race condition in qdisc_graft()

Message ID 20221018203258.2793282-1-edumazet@google.com (mailing list archive)
State Accepted
Commit ebda44da44f6f309d302522b049f43d6f829f7aa
Delegated to: Netdev Maintainers
Series [net] net: sched: fix race condition in qdisc_graft()

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net, async
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers fail 1 blamed authors not CCed: kaber@trash.net; 3 maintainers not CCed: kaber@trash.net jhs@mojatatu.com jiri@resnulli.us
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch warning WARNING: Non-standard signature: Diagnosed-by: WARNING: Possible repeated word: 'Google'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet Oct. 18, 2022, 8:32 p.m. UTC
We had one syzbot report [1] in the syzbot queue for a while.
I was waiting for more occurrences and/or a repro, but
Dmitry Vyukov spotted the issue right away.

<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>

Indeed, the RCU rules for replacing a data structure are clear:
the visible pointer (dev->qdisc in this case) must be updated
to point to the new object _before_ the RCU grace period is started
(by qdisc_put(old) in this case).
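
To make the ordering concrete, here is a minimal, illustrative sketch
(not part of the patch): rtnl_dereference(), rcu_assign_pointer(),
qdisc_put() and noop_qdisc are the kernel APIs the fix actually uses,
but the wrapper function below is a simplified, hypothetical stand-in
for the !ingress branch of qdisc_graft().

/* Illustration only: replace the root qdisc that RCU readers may
 * still be dereferencing through dev->qdisc.
 */
static void graft_root_qdisc(struct net_device *dev, struct Qdisc *new)
{
        struct Qdisc *old;

        /* 1) Snapshot the currently published root qdisc. */
        old = rtnl_dereference(dev->qdisc);

        /* 2) Publish the replacement first, so new RCU readers can no
         *    longer reach the old object.
         */
        rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);

        /* 3) Only now drop the reference: qdisc_put() starts the RCU
         *    grace period (via call_rcu()) that will free the old
         *    object.  Doing this before step 2, as the old code
         *    effectively did, lets a concurrent reader fetch a pointer
         *    whose memory is already queued for freeing.
         */
        qdisc_put(old);
}

In the real qdisc_graft() the netlink notification and the final
qdisc_put() both happen inside notify_and_destroy(), which is why the
fix simply moves that call after rcu_assign_pointer().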

[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027

CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
__tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
__tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5efaa89279
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
</TASK>

Allocated by task 21027:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_node include/linux/slab.h:623 [inline]
kzalloc_node include/linux/slab.h:744 [inline]
qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
__dev_open+0x393/0x4d0 net/core/dev.c:1441
__dev_change_flags+0x583/0x750 net/core/dev.c:8556
rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
__rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 21020:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1754 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
slab_free mm/slub.c:3534 [inline]
kfree+0xe2/0x580 mm/slub.c:4562
rcu_do_batch kernel/rcu/tree.c:2245 [inline]
rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
__do_softirq+0x1d3/0x9c6 kernel/softirq.c:571

Last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
notify_and_destroy net/sched/sch_api.c:1012 [inline]
qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
neigh_destroy+0x431/0x630 net/core/neighbour.c:912
neigh_release include/net/neighbour.h:454 [inline]
neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
neigh_del net/core/neighbour.c:225 [inline]
neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
neigh_forced_gc net/core/neighbour.c:276 [inline]
neigh_alloc net/core/neighbour.c:447 [inline]
___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000, ffff88802065e400)

The buggy address belongs to the physical page:
page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
prep_new_page mm/page_alloc.c:2532 [inline]
get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
__alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
alloc_slab_page mm/slub.c:1824 [inline]
allocate_slab+0x27e/0x3d0 mm/slub.c:1969
new_slab mm/slub.c:2029 [inline]
___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
slab_alloc_node mm/slub.c:3209 [inline]
__kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
sock_write_iter+0x291/0x3d0 net/socket.c:1108
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578
ksys_write+0x1e8/0x250 fs/read_write.c:631
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1449 [inline]
free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
free_unref_page_prepare mm/page_alloc.c:3380 [inline]
free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
__unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
kasan_slab_alloc include/linux/kasan.h:224 [inline]
slab_post_alloc_hook mm/slab.h:727 [inline]
slab_alloc_node mm/slub.c:3243 [inline]
slab_alloc mm/slub.c:3251 [inline]
__kmem_cache_alloc_lru mm/slub.c:3258 [inline]
kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
kmem_cache_zalloc include/linux/slab.h:723 [inline]
alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
alloc_page_buffers+0x280/0x790 fs/buffer.c:829
create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
generic_perform_write+0x246/0x560 mm/filemap.c:3738
ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578

Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Oct. 20, 2022, 12:50 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 18 Oct 2022 20:32:58 +0000 you wrote:
> We had one syzbot report [1] in syzbot queue for a while.
> I was waiting for more occurrences and/or a repro but
> Dmitry Vyukov spotted the issue right away.
> 
> <quoting Dmitry>
> qdisc_graft() drops reference to qdisc in notify_and_destroy
> while it's still assigned to dev->qdisc
> </quoting>
> 
> [...]

Here is the summary with links:
  - [net] net: sched: fix race condition in qdisc_graft()
    https://git.kernel.org/netdev/net/c/ebda44da44f6

You are awesome, thank you!
Gal Pressman Nov. 3, 2022, 1:34 p.m. UTC | #2
On 18/10/2022 23:32, Eric Dumazet wrote:
> [...]
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1099,12 +1099,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>  
>  skip:
>  		if (!ingress) {
> -			notify_and_destroy(net, skb, n, classid,
> -					   rtnl_dereference(dev->qdisc), new);
> +			old = rtnl_dereference(dev->qdisc);
>  			if (new && !new->ops->attach)
>  				qdisc_refcount_inc(new);
>  			rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
>  
> +			notify_and_destroy(net, skb, n, classid, old, new);
> +
>  			if (new && new->ops->attach)
>  				new->ops->attach(new);
>  		} else {

Hi Eric,
We started seeing the following WARN_ON happening on htb destroy
following your patch:
https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561

Does anything come to mind?
I can provide the exact shell commands and more info if needed.

Thanks
Eric Dumazet Nov. 3, 2022, 3:17 p.m. UTC | #3
On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
>
> On 18/10/2022 23:32, Eric Dumazet wrote:
> > [...]
>
> Hi Eric,
> We started seeing the following WARN_ON happening on htb destroy
> following your patch:
> https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
>
> Does anything come to mind?

Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>

> I can provide the exact shell commands and more info if needed.
>
> Thanks
Jakub Kicinski Nov. 3, 2022, 3:27 p.m. UTC | #4
On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
> On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
> >
> > On 18/10/2022 23:32, Eric Dumazet wrote:  
> > > [...]
> >
> > Hi Eric,
> > We started seeing the following WARN_ON happening on htb destroy
> > following your patch:
> > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
> >
> > Does anything come to mind?
> 
> Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>

CC: Maxim on non-nvidia address

> > I can provide the exact shell commands and more info if needed.
> >
> > Thanks
Eric Dumazet Nov. 3, 2022, 3:29 p.m. UTC | #5
On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
> > On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
> > >
> > > On 18/10/2022 23:32, Eric Dumazet wrote:
> > > > [commit message, KASAN report and tags snipped; quoted in full in the original patch above]
> > > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > > > index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
> > > > --- a/net/sched/sch_api.c
> > > > +++ b/net/sched/sch_api.c
> > > > @@ -1099,12 +1099,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> > > >
> > > >  skip:
> > > >               if (!ingress) {
> > > > -                     notify_and_destroy(net, skb, n, classid,
> > > > -                                        rtnl_dereference(dev->qdisc), new);
> > > > +                     old = rtnl_dereference(dev->qdisc);
> > > >                       if (new && !new->ops->attach)
> > > >                               qdisc_refcount_inc(new);
> > > >                       rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
> > > >
> > > > +                     notify_and_destroy(net, skb, n, classid, old, new);
> > > > +
> > > >                       if (new && new->ops->attach)
> > > >                               new->ops->attach(new);
> > > >               } else {
> > >
> > > Hi Eric,
> > > We started seeing the following WARN_ON happening on htb destroy
> > > following your patch:
> > > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
> > >
> > > Anything comes to mind?
> >
> > Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
>
> CC: Maxim on non-nvidia address
>


Wild guess is that we need this fix:

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct Qdisc *sch, struct htb_class *cl,
                /* Before HTB is destroyed, the kernel grafts noop_qdisc to
                 * all queues.
                 */
-               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
+               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));
        else
                WARN_ON(old != q);
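
For clarity, here is a minimal standalone sketch of the difference between
the two conditions (a model only, not kernel code; the struct and helper
names are made up for illustration, and TCQ_F_BUILTIN simply marks the
always-present builtin noop qdisc):

#include <stdbool.h>

#define TCQ_F_BUILTIN 1u

struct qdisc { unsigned int flags; };

/* Current condition: dereferences old unconditionally, so a NULL old
 * would oops rather than warn. */
static bool warns_before(const struct qdisc *old)
{
        return !(old->flags & TCQ_F_BUILTIN);
}

/* Proposed condition: tolerates a NULL old and still warns when a
 * non-builtin qdisc is found on the queue. */
static bool warns_after(const struct qdisc *old)
{
        return old && !(old->flags & TCQ_F_BUILTIN);
}

In other words, the one-line change only makes a difference if old can
actually be NULL on this path.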


> > > I can provide the exact shell commands and more info if needed.
> > >
> > > Thanks
>
Maxim Mikityanskiy Nov. 3, 2022, 4:30 p.m. UTC | #6
On Thu, 3 Nov 2022 at 17:30, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
> > > On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
> > > >
> > > > On 18/10/2022 23:32, Eric Dumazet wrote:
> > > > > [full patch quote snipped; see the original patch above]
> > > >
> > > > Hi Eric,
> > > > We started seeing the following WARN_ON happening on htb destroy
> > > > following your patch:
> > > > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
> > > >
> > > > Anything comes to mind?
> > >
> > > Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
> >
> > CC: Maxim on non-nvidia address
> >
>
>
> Wild guess is that we need this fix:
>
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e
> 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct
> Qdisc *sch, struct htb_class *cl,
>                 /* Before HTB is destroyed, the kernel grafts noop_qdisc to
>                  * all queues.
>                  */
> -               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
> +               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));

I don't think that's the right fix. If the WARN_ON fires, old can't be
NULL: the check dereferences old->flags, so a NULL old would oops rather
than warn, and the extra NULL test won't help. In any case, old comes from
dev_graft_qdisc(), which should never return NULL.

Maybe something changed after Eric's patch so that the statement above is
no longer true? I.e. I expect a noop_qdisc here, but some other qdisc is
attached to the queue. I need to recall what the destroy flow was and which
code assigned noop_qdisc there...
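
To make that invariant concrete, here is a small self-contained model (not
the kernel implementation; the types and the graft() helper are
illustrative) of the behaviour relied on above: the graft operation hands
back whatever qdisc was attached to the queue and installs the builtin
noop_qdisc when asked to graft NULL, so the returned old pointer is never
NULL on an initialized queue.

#include <assert.h>
#include <stddef.h>

#define TCQ_F_BUILTIN 1u

struct qdisc { unsigned int flags; };
struct tx_queue { struct qdisc *qdisc_sleeping; };

static struct qdisc noop_qdisc = { .flags = TCQ_F_BUILTIN };

/* Model of dev_graft_qdisc(): swap in new_q (noop when NULL), return old. */
static struct qdisc *graft(struct tx_queue *q, struct qdisc *new_q)
{
        struct qdisc *old = q->qdisc_sleeping;

        q->qdisc_sleeping = new_q ? new_q : &noop_qdisc;
        return old;
}

int main(void)
{
        struct qdisc htb = { .flags = 0 };                /* stand-in for HTB */
        struct tx_queue q = { .qdisc_sleeping = &noop_qdisc };
        struct qdisc *old;

        old = graft(&q, &htb);         /* attach HTB: old is the builtin noop */
        assert(old == &noop_qdisc);

        old = graft(&q, NULL);         /* graft "nothing": noop goes back in  */
        assert(old == &htb);           /* old is the HTB stand-in, never NULL */

        return 0;
}

So the open question is not a NULL old, but why the qdisc found on the
queue at destroy time is not the builtin noop_qdisc the comment expects.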

>         else
>                 WARN_ON(old != q);
>
>
> > > > I can provide the exact shell commands and more info if needed.
> > > >
> > > > Thanks
> >
Eric Dumazet Nov. 3, 2022, 4:33 p.m. UTC | #7
On Thu, Nov 3, 2022 at 9:31 AM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>
> On Thu, 3 Nov 2022 at 17:30, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
> > > > On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
> > > > >
> > > > > On 18/10/2022 23:32, Eric Dumazet wrote:
> > > > > > [full patch quote snipped; see the original patch above]
> > > > >
> > > > > Hi Eric,
> > > > > We started seeing the following WARN_ON happening on htb destroy
> > > > > following your patch:
> > > > > https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
> > > > >
> > > > > Anything comes to mind?
> > > >
> > > > Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
> > >
> > > CC: Maxim on non-nvidia address
> > >
> >
> >
> > Wild guess is that we need this fix:
> >
> > diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> > index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e
> > 100644
> > --- a/net/sched/sch_htb.c
> > +++ b/net/sched/sch_htb.c
> > @@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct
> > Qdisc *sch, struct htb_class *cl,
> >                 /* Before HTB is destroyed, the kernel grafts noop_qdisc to
> >                  * all queues.
> >                  */
> > -               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
> > +               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));
>
> I don't think that's the right fix. If the WARN_ON fires, old can't be
> NULL: the check dereferences old->flags, so a NULL old would oops rather
> than warn, and the extra NULL test won't help. In any case, old comes from
> dev_graft_qdisc(), which should never return NULL.
>
> Maybe something changed after Eric's patch so that the statement above is
> no longer true? I.e. I expect a noop_qdisc here, but some other qdisc is
> attached to the queue. I need to recall what the destroy flow was and which
> code assigned noop_qdisc there...

Gal, please provide a repro. It is not clear whether you get a WARN or a NULL deref.

>
> >         else
> >                 WARN_ON(old != q);
> >
> >
> > > > > I can provide the exact shell commands and more info if needed.
> > > > >
> > > > > Thanks
> > >
Gal Pressman Nov. 6, 2022, 8:07 a.m. UTC | #8
On 03/11/2022 18:33, Eric Dumazet wrote:
> On Thu, Nov 3, 2022 at 9:31 AM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>> On Thu, 3 Nov 2022 at 17:30, Eric Dumazet <edumazet@google.com> wrote:
>>> On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>> On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
>>>>> On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
>>>>>> On 18/10/2022 23:32, Eric Dumazet wrote:
>>>>>>> [commit message and KASAN report snipped; quoted in full in the original patch above]
>>>>>>> Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api")
>>>>>>> Reported-by: syzbot <syzkaller@googlegroups.com>
>>>>>>> Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
>>>>>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>>>> ---
>>>>>>>  net/sched/sch_api.c | 5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>>>>>>> index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
>>>>>>> --- a/net/sched/sch_api.c
>>>>>>> +++ b/net/sched/sch_api.c
>>>>>>> @@ -1099,12 +1099,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>>>>>>>
>>>>>>>  skip:
>>>>>>>               if (!ingress) {
>>>>>>> -                     notify_and_destroy(net, skb, n, classid,
>>>>>>> -                                        rtnl_dereference(dev->qdisc), new);
>>>>>>> +                     old = rtnl_dereference(dev->qdisc);
>>>>>>>                       if (new && !new->ops->attach)
>>>>>>>                               qdisc_refcount_inc(new);
>>>>>>>                       rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
>>>>>>>
>>>>>>> +                     notify_and_destroy(net, skb, n, classid, old, new);
>>>>>>> +
>>>>>>>                       if (new && new->ops->attach)
>>>>>>>                               new->ops->attach(new);
>>>>>>>               } else {
>>>>>> Hi Eric,
>>>>>> We started seeing the following WARN_ON happening on htb destroy
>>>>>> following your patch:
>>>>>> https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
>>>>>>
>>>>>> Anything comes to mind?
>>>>> Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
>>>> CC: Maxim on non-nvidia address
>>>>
>>>
>>> Wild guess is that we need this fix:
>>>
>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>> index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e
>>> 100644
>>> --- a/net/sched/sch_htb.c
>>> +++ b/net/sched/sch_htb.c
>>> @@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct
>>> Qdisc *sch, struct htb_class *cl,
>>>                 /* Before HTB is destroyed, the kernel grafts noop_qdisc to
>>>                  * all queues.
>>>                  */
>>> -               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
>>> +               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));
>> I don't think it's the right fix. If the WARN_ON happens and doesn't
>> complain about old == NULL, it's not NULL, so this extra check won't
>> help. In any case, old comes from dev_graft_qdisc, which should never
>> return NULL.
>>
>> Maybe something changed after Eric's patch, so that the statement
>> above is no longer true? I.e. I expect a noop_qdisc here, but there is
>> some other qdisc attached to the queue. I need to recall what was the
>> destroy flow and what code assigned noop_qdisc there...
> Gal please provide a repro. It is not clear if you get a WARN or a NULL deref.

The WARN_ON is triggering, old is not NULL and old->flags is equal to 0x174.

It reproduces consistently:
ip link set dev eth2 up
ip addr add 194.237.173.123/16 dev eth2
tc qdisc add dev eth2 clsact
tc qdisc add dev eth2 root handle 1: htb default 1 offload
tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
22500.0mbit burst 450000kbit cburst 450000kbit
tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
89900kbit cburst 89900kbit
tc qdisc delete dev eth2 clsact
tc qdisc delete dev eth2 root handle 1: htb default 1

Please let me know if there's anything else you want me to check.
Gal Pressman Nov. 10, 2022, 9:08 a.m. UTC | #9
On 06/11/2022 10:07, Gal Pressman wrote:
> On 03/11/2022 18:33, Eric Dumazet wrote:
>> On Thu, Nov 3, 2022 at 9:31 AM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>>> On Thu, 3 Nov 2022 at 17:30, Eric Dumazet <edumazet@google.com> wrote:
>>>> On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>> On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
>>>>>> On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
>>>>>>> On 18/10/2022 23:32, Eric Dumazet wrote:
>>>>>>>> We had one syzbot report [1] in syzbot queue for a while.
>>>>>>>> I was waiting for more occurrences and/or a repro but
>>>>>>>> Dmitry Vyukov spotted the issue right away.
>>>>>>>>
>>>>>>>> <quoting Dmitry>
>>>>>>>> qdisc_graft() drops reference to qdisc in notify_and_destroy
>>>>>>>> while it's still assigned to dev->qdisc
>>>>>>>> </quoting>
>>>>>>>>
>>>>>>>> Indeed, RCU rules are clear when replacing a data structure.
>>>>>>>> The visible pointer (dev->qdisc in this case) must be updated
>>>>>>>> to the new object _before_ RCU grace period is started
>>>>>>>> (qdisc_put(old) in this case).
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
>>>>>>>> Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
>>>>>>>>
>>>>>>>> CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
>>>>>>>> Call Trace:
>>>>>>>> <TASK>
>>>>>>>> __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>>> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>>>>>>>> print_address_description mm/kasan/report.c:317 [inline]
>>>>>>>> print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
>>>>>>>> kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
>>>>>>>> __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
>>>>>>>> __tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
>>>>>>>> tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
>>>>>>>> rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> RIP: 0033:0x7f5efaa89279
>>>>>>>> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>>>> RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>>>>>>>> RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
>>>>>>>> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
>>>>>>>> RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>>>> R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
>>>>>>>> </TASK>
>>>>>>>>
>>>>>>>> Allocated by task 21027:
>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>> kasan_set_track mm/kasan/common.c:45 [inline]
>>>>>>>> set_alloc_info mm/kasan/common.c:437 [inline]
>>>>>>>> ____kasan_kmalloc mm/kasan/common.c:516 [inline]
>>>>>>>> ____kasan_kmalloc mm/kasan/common.c:475 [inline]
>>>>>>>> __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
>>>>>>>> kmalloc_node include/linux/slab.h:623 [inline]
>>>>>>>> kzalloc_node include/linux/slab.h:744 [inline]
>>>>>>>> qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
>>>>>>>> qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
>>>>>>>> attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
>>>>>>>> netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
>>>>>>>> attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
>>>>>>>> dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
>>>>>>>> __dev_open+0x393/0x4d0 net/core/dev.c:1441
>>>>>>>> __dev_change_flags+0x583/0x750 net/core/dev.c:8556
>>>>>>>> rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
>>>>>>>> rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
>>>>>>>> __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
>>>>>>>> rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
>>>>>>>> rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>
>>>>>>>> Freed by task 21020:
>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>> kasan_set_track+0x21/0x30 mm/kasan/common.c:45
>>>>>>>> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
>>>>>>>> ____kasan_slab_free mm/kasan/common.c:367 [inline]
>>>>>>>> ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
>>>>>>>> kasan_slab_free include/linux/kasan.h:200 [inline]
>>>>>>>> slab_free_hook mm/slub.c:1754 [inline]
>>>>>>>> slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
>>>>>>>> slab_free mm/slub.c:3534 [inline]
>>>>>>>> kfree+0xe2/0x580 mm/slub.c:4562
>>>>>>>> rcu_do_batch kernel/rcu/tree.c:2245 [inline]
>>>>>>>> rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
>>>>>>>> __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
>>>>>>>>
>>>>>>>> Last potentially related work creation:
>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>> __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
>>>>>>>> call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
>>>>>>>> qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
>>>>>>>> notify_and_destroy net/sched/sch_api.c:1012 [inline]
>>>>>>>> qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
>>>>>>>> tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
>>>>>>>> rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>
>>>>>>>> Second to last potentially related work creation:
>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>> __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
>>>>>>>> kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
>>>>>>>> neigh_destroy+0x431/0x630 net/core/neighbour.c:912
>>>>>>>> neigh_release include/net/neighbour.h:454 [inline]
>>>>>>>> neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
>>>>>>>> neigh_del net/core/neighbour.c:225 [inline]
>>>>>>>> neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
>>>>>>>> neigh_forced_gc net/core/neighbour.c:276 [inline]
>>>>>>>> neigh_alloc net/core/neighbour.c:447 [inline]
>>>>>>>> ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
>>>>>>>> ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
>>>>>>>> __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
>>>>>>>> ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
>>>>>>>> NF_HOOK_COND include/linux/netfilter.h:296 [inline]
>>>>>>>> ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
>>>>>>>> dst_output include/net/dst.h:451 [inline]
>>>>>>>> NF_HOOK include/linux/netfilter.h:307 [inline]
>>>>>>>> NF_HOOK include/linux/netfilter.h:301 [inline]
>>>>>>>> mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
>>>>>>>> mld_send_cr net/ipv6/mcast.c:2121 [inline]
>>>>>>>> mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
>>>>>>>> process_one_work+0x991/0x1610 kernel/workqueue.c:2289
>>>>>>>> worker_thread+0x665/0x1080 kernel/workqueue.c:2436
>>>>>>>> kthread+0x2e4/0x3a0 kernel/kthread.c:376
>>>>>>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
>>>>>>>>
>>>>>>>> The buggy address belongs to the object at ffff88802065e000
>>>>>>>> which belongs to the cache kmalloc-1k of size 1024
>>>>>>>> The buggy address is located 56 bytes inside of
>>>>>>>> 1024-byte region [ffff88802065e000, ffff88802065e400)
>>>>>>>>
>>>>>>>> The buggy address belongs to the physical page:
>>>>>>>> page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
>>>>>>>> head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
>>>>>>>> flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
>>>>>>>> raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
>>>>>>>> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>>>>>>>> page dumped because: kasan: bad access detected
>>>>>>>> page_owner tracks the page as allocated
>>>>>>>> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
>>>>>>>> prep_new_page mm/page_alloc.c:2532 [inline]
>>>>>>>> get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
>>>>>>>> __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
>>>>>>>> alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
>>>>>>>> alloc_slab_page mm/slub.c:1824 [inline]
>>>>>>>> allocate_slab+0x27e/0x3d0 mm/slub.c:1969
>>>>>>>> new_slab mm/slub.c:2029 [inline]
>>>>>>>> ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
>>>>>>>> __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
>>>>>>>> slab_alloc_node mm/slub.c:3209 [inline]
>>>>>>>> __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
>>>>>>>> kmalloc_reserve net/core/skbuff.c:358 [inline]
>>>>>>>> __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
>>>>>>>> alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
>>>>>>>> tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
>>>>>>>> tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
>>>>>>>> tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
>>>>>>>> inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>> sock_write_iter+0x291/0x3d0 net/socket.c:1108
>>>>>>>> call_write_iter include/linux/fs.h:2187 [inline]
>>>>>>>> new_sync_write fs/read_write.c:491 [inline]
>>>>>>>> vfs_write+0x9e9/0xdd0 fs/read_write.c:578
>>>>>>>> ksys_write+0x1e8/0x250 fs/read_write.c:631
>>>>>>>> page last free stack trace:
>>>>>>>> reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>>> free_pages_prepare mm/page_alloc.c:1449 [inline]
>>>>>>>> free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
>>>>>>>> free_unref_page_prepare mm/page_alloc.c:3380 [inline]
>>>>>>>> free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
>>>>>>>> __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
>>>>>>>> qlink_free mm/kasan/quarantine.c:168 [inline]
>>>>>>>> qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
>>>>>>>> kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
>>>>>>>> __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
>>>>>>>> kasan_slab_alloc include/linux/kasan.h:224 [inline]
>>>>>>>> slab_post_alloc_hook mm/slab.h:727 [inline]
>>>>>>>> slab_alloc_node mm/slub.c:3243 [inline]
>>>>>>>> slab_alloc mm/slub.c:3251 [inline]
>>>>>>>> __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
>>>>>>>> kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
>>>>>>>> kmem_cache_zalloc include/linux/slab.h:723 [inline]
>>>>>>>> alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
>>>>>>>> alloc_page_buffers+0x280/0x790 fs/buffer.c:829
>>>>>>>> create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
>>>>>>>> ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
>>>>>>>> ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
>>>>>>>> generic_perform_write+0x246/0x560 mm/filemap.c:3738
>>>>>>>> ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
>>>>>>>> ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
>>>>>>>> call_write_iter include/linux/fs.h:2187 [inline]
>>>>>>>> new_sync_write fs/read_write.c:491 [inline]
>>>>>>>> vfs_write+0x9e9/0xdd0 fs/read_write.c:578
>>>>>>>>
>>>>>>>> Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api")
>>>>>>>> Reported-by: syzbot <syzkaller@googlegroups.com>
>>>>>>>> Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
>>>>>>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>>>>> ---
>>>>>>>>  net/sched/sch_api.c | 5 +++--
>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>>>>>>>> index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
>>>>>>>> --- a/net/sched/sch_api.c
>>>>>>>> +++ b/net/sched/sch_api.c
>>>>>>>> @@ -1099,12 +1099,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>>>>>>>>
>>>>>>>>  skip:
>>>>>>>>               if (!ingress) {
>>>>>>>> -                     notify_and_destroy(net, skb, n, classid,
>>>>>>>> -                                        rtnl_dereference(dev->qdisc), new);
>>>>>>>> +                     old = rtnl_dereference(dev->qdisc);
>>>>>>>>                       if (new && !new->ops->attach)
>>>>>>>>                               qdisc_refcount_inc(new);
>>>>>>>>                       rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
>>>>>>>>
>>>>>>>> +                     notify_and_destroy(net, skb, n, classid, old, new);
>>>>>>>> +
>>>>>>>>                       if (new && new->ops->attach)
>>>>>>>>                               new->ops->attach(new);
>>>>>>>>               } else {
>>>>>>> Hi Eric,
>>>>>>> We started seeing the following WARN_ON happening on htb destroy
>>>>>>> following your patch:
>>>>>>> https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
>>>>>>>
>>>>>>> Anything comes to mind?
>>>>>> Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
>>>>> CC: Maxim on non-nvidia address
>>>>>
>>>> Wild guess is that we need this fix:
>>>>
>>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>>> index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e
>>>> 100644
>>>> --- a/net/sched/sch_htb.c
>>>> +++ b/net/sched/sch_htb.c
>>>> @@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct
>>>> Qdisc *sch, struct htb_class *cl,
>>>>                 /* Before HTB is destroyed, the kernel grafts noop_qdisc to
>>>>                  * all queues.
>>>>                  */
>>>> -               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
>>>> +               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));
>>> I don't think it's the right fix. If the WARN_ON happens and doesn't
>>> complain about old == NULL, it's not NULL, so this extra check won't
>>> help. In any case, old comes from dev_graft_qdisc, which should never
>>> return NULL.
>>>
>>> Maybe something changed after Eric's patch, so that the statement
>>> above is no longer true? I.e. I expect a noop_qdisc here, but there is
>>> some other qdisc attached to the queue. I need to recall what was the
>>> destroy flow and what code assigned noop_qdisc there...
>> Gal please provide a repro. It is not clear if you get a WARN or a NULL deref.
> The WARN_ON is triggering, old is not NULL and old->flags is equal to 0x174.
>
> It reproduces consistently:
> ip link set dev eth2 up
> ip addr add 194.237.173.123/16 dev eth2
> tc qdisc add dev eth2 clsact
> tc qdisc add dev eth2 root handle 1: htb default 1 offload
> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
> 22500.0mbit burst 450000kbit cburst 450000kbit
> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
> 89900kbit cburst 89900kbit
> tc qdisc delete dev eth2 clsact
> tc qdisc delete dev eth2 root handle 1: htb default 1
>
> Please let me know if there's anything else you want me to check.

Hi Eric, did you get a chance to take a look?
Gal Pressman Nov. 20, 2022, 7:42 a.m. UTC | #10
On 10/11/2022 11:08, Gal Pressman wrote:
> On 06/11/2022 10:07, Gal Pressman wrote:
>> On 03/11/2022 18:33, Eric Dumazet wrote:
>>> On Thu, Nov 3, 2022 at 9:31 AM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>>>> On Thu, 3 Nov 2022 at 17:30, Eric Dumazet <edumazet@google.com> wrote:
>>>>> On Thu, Nov 3, 2022 at 8:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>> On Thu, 3 Nov 2022 08:17:06 -0700 Eric Dumazet wrote:
>>>>>>> On Thu, Nov 3, 2022 at 6:34 AM Gal Pressman <gal@nvidia.com> wrote:
>>>>>>>> On 18/10/2022 23:32, Eric Dumazet wrote:
>>>>>>>>> We had one syzbot report [1] in syzbot queue for a while.
>>>>>>>>> I was waiting for more occurrences and/or a repro but
>>>>>>>>> Dmitry Vyukov spotted the issue right away.
>>>>>>>>>
>>>>>>>>> <quoting Dmitry>
>>>>>>>>> qdisc_graft() drops reference to qdisc in notify_and_destroy
>>>>>>>>> while it's still assigned to dev->qdisc
>>>>>>>>> </quoting>
>>>>>>>>>
>>>>>>>>> Indeed, RCU rules are clear when replacing a data structure.
>>>>>>>>> The visible pointer (dev->qdisc in this case) must be updated
>>>>>>>>> to the new object _before_ RCU grace period is started
>>>>>>>>> (qdisc_put(old) in this case).
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
>>>>>>>>> Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
>>>>>>>>>
>>>>>>>>> CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
>>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
>>>>>>>>> Call Trace:
>>>>>>>>> <TASK>
>>>>>>>>> __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>>>> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>>>>>>>>> print_address_description mm/kasan/report.c:317 [inline]
>>>>>>>>> print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
>>>>>>>>> kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
>>>>>>>>> __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
>>>>>>>>> __tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
>>>>>>>>> tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
>>>>>>>>> rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
>>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>> RIP: 0033:0x7f5efaa89279
>>>>>>>>> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>>>>> RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>>>>>>>>> RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
>>>>>>>>> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
>>>>>>>>> RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>>>>> R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
>>>>>>>>> </TASK>
>>>>>>>>>
>>>>>>>>> Allocated by task 21027:
>>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>>> kasan_set_track mm/kasan/common.c:45 [inline]
>>>>>>>>> set_alloc_info mm/kasan/common.c:437 [inline]
>>>>>>>>> ____kasan_kmalloc mm/kasan/common.c:516 [inline]
>>>>>>>>> ____kasan_kmalloc mm/kasan/common.c:475 [inline]
>>>>>>>>> __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
>>>>>>>>> kmalloc_node include/linux/slab.h:623 [inline]
>>>>>>>>> kzalloc_node include/linux/slab.h:744 [inline]
>>>>>>>>> qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
>>>>>>>>> qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
>>>>>>>>> attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
>>>>>>>>> netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
>>>>>>>>> attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
>>>>>>>>> dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
>>>>>>>>> __dev_open+0x393/0x4d0 net/core/dev.c:1441
>>>>>>>>> __dev_change_flags+0x583/0x750 net/core/dev.c:8556
>>>>>>>>> rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
>>>>>>>>> rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
>>>>>>>>> __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
>>>>>>>>> rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
>>>>>>>>> rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
>>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>>
>>>>>>>>> Freed by task 21020:
>>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>>> kasan_set_track+0x21/0x30 mm/kasan/common.c:45
>>>>>>>>> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
>>>>>>>>> ____kasan_slab_free mm/kasan/common.c:367 [inline]
>>>>>>>>> ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
>>>>>>>>> kasan_slab_free include/linux/kasan.h:200 [inline]
>>>>>>>>> slab_free_hook mm/slub.c:1754 [inline]
>>>>>>>>> slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
>>>>>>>>> slab_free mm/slub.c:3534 [inline]
>>>>>>>>> kfree+0xe2/0x580 mm/slub.c:4562
>>>>>>>>> rcu_do_batch kernel/rcu/tree.c:2245 [inline]
>>>>>>>>> rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
>>>>>>>>> __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
>>>>>>>>>
>>>>>>>>> Last potentially related work creation:
>>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>>> __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
>>>>>>>>> call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
>>>>>>>>> qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
>>>>>>>>> notify_and_destroy net/sched/sch_api.c:1012 [inline]
>>>>>>>>> qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
>>>>>>>>> tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
>>>>>>>>> rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
>>>>>>>>> netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
>>>>>>>>> netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>>>>>>>>> netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
>>>>>>>>> netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
>>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>>> ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
>>>>>>>>> ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
>>>>>>>>> __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
>>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>>
>>>>>>>>> Second to last potentially related work creation:
>>>>>>>>> kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
>>>>>>>>> __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
>>>>>>>>> kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
>>>>>>>>> neigh_destroy+0x431/0x630 net/core/neighbour.c:912
>>>>>>>>> neigh_release include/net/neighbour.h:454 [inline]
>>>>>>>>> neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
>>>>>>>>> neigh_del net/core/neighbour.c:225 [inline]
>>>>>>>>> neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
>>>>>>>>> neigh_forced_gc net/core/neighbour.c:276 [inline]
>>>>>>>>> neigh_alloc net/core/neighbour.c:447 [inline]
>>>>>>>>> ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
>>>>>>>>> ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
>>>>>>>>> __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
>>>>>>>>> ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
>>>>>>>>> NF_HOOK_COND include/linux/netfilter.h:296 [inline]
>>>>>>>>> ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
>>>>>>>>> dst_output include/net/dst.h:451 [inline]
>>>>>>>>> NF_HOOK include/linux/netfilter.h:307 [inline]
>>>>>>>>> NF_HOOK include/linux/netfilter.h:301 [inline]
>>>>>>>>> mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
>>>>>>>>> mld_send_cr net/ipv6/mcast.c:2121 [inline]
>>>>>>>>> mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
>>>>>>>>> process_one_work+0x991/0x1610 kernel/workqueue.c:2289
>>>>>>>>> worker_thread+0x665/0x1080 kernel/workqueue.c:2436
>>>>>>>>> kthread+0x2e4/0x3a0 kernel/kthread.c:376
>>>>>>>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
>>>>>>>>>
>>>>>>>>> The buggy address belongs to the object at ffff88802065e000
>>>>>>>>> which belongs to the cache kmalloc-1k of size 1024
>>>>>>>>> The buggy address is located 56 bytes inside of
>>>>>>>>> 1024-byte region [ffff88802065e000, ffff88802065e400)
>>>>>>>>>
>>>>>>>>> The buggy address belongs to the physical page:
>>>>>>>>> page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
>>>>>>>>> head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
>>>>>>>>> flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
>>>>>>>>> raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
>>>>>>>>> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>>>>>>>>> page dumped because: kasan: bad access detected
>>>>>>>>> page_owner tracks the page as allocated
>>>>>>>>> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
>>>>>>>>> prep_new_page mm/page_alloc.c:2532 [inline]
>>>>>>>>> get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
>>>>>>>>> __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
>>>>>>>>> alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
>>>>>>>>> alloc_slab_page mm/slub.c:1824 [inline]
>>>>>>>>> allocate_slab+0x27e/0x3d0 mm/slub.c:1969
>>>>>>>>> new_slab mm/slub.c:2029 [inline]
>>>>>>>>> ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
>>>>>>>>> __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
>>>>>>>>> slab_alloc_node mm/slub.c:3209 [inline]
>>>>>>>>> __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
>>>>>>>>> kmalloc_reserve net/core/skbuff.c:358 [inline]
>>>>>>>>> __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
>>>>>>>>> alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
>>>>>>>>> tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
>>>>>>>>> tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
>>>>>>>>> tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
>>>>>>>>> inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
>>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>> sock_sendmsg+0xcf/0x120 net/socket.c:734
>>>>>>>>> sock_write_iter+0x291/0x3d0 net/socket.c:1108
>>>>>>>>> call_write_iter include/linux/fs.h:2187 [inline]
>>>>>>>>> new_sync_write fs/read_write.c:491 [inline]
>>>>>>>>> vfs_write+0x9e9/0xdd0 fs/read_write.c:578
>>>>>>>>> ksys_write+0x1e8/0x250 fs/read_write.c:631
>>>>>>>>> page last free stack trace:
>>>>>>>>> reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>>>> free_pages_prepare mm/page_alloc.c:1449 [inline]
>>>>>>>>> free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
>>>>>>>>> free_unref_page_prepare mm/page_alloc.c:3380 [inline]
>>>>>>>>> free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
>>>>>>>>> __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
>>>>>>>>> qlink_free mm/kasan/quarantine.c:168 [inline]
>>>>>>>>> qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
>>>>>>>>> kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
>>>>>>>>> __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
>>>>>>>>> kasan_slab_alloc include/linux/kasan.h:224 [inline]
>>>>>>>>> slab_post_alloc_hook mm/slab.h:727 [inline]
>>>>>>>>> slab_alloc_node mm/slub.c:3243 [inline]
>>>>>>>>> slab_alloc mm/slub.c:3251 [inline]
>>>>>>>>> __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
>>>>>>>>> kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
>>>>>>>>> kmem_cache_zalloc include/linux/slab.h:723 [inline]
>>>>>>>>> alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
>>>>>>>>> alloc_page_buffers+0x280/0x790 fs/buffer.c:829
>>>>>>>>> create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
>>>>>>>>> ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
>>>>>>>>> ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
>>>>>>>>> generic_perform_write+0x246/0x560 mm/filemap.c:3738
>>>>>>>>> ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
>>>>>>>>> ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
>>>>>>>>> call_write_iter include/linux/fs.h:2187 [inline]
>>>>>>>>> new_sync_write fs/read_write.c:491 [inline]
>>>>>>>>> vfs_write+0x9e9/0xdd0 fs/read_write.c:578
>>>>>>>>>
>>>>>>>>> Fixes: af356afa010f ("net_sched: reintroduce dev->qdisc for use by sch_api")
>>>>>>>>> Reported-by: syzbot <syzkaller@googlegroups.com>
>>>>>>>>> Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
>>>>>>>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>>>>>> ---
>>>>>>>>>  net/sched/sch_api.c | 5 +++--
>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>>>>>>>>> index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
>>>>>>>>> --- a/net/sched/sch_api.c
>>>>>>>>> +++ b/net/sched/sch_api.c
>>>>>>>>> @@ -1099,12 +1099,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>>>>>>>>>
>>>>>>>>>  skip:
>>>>>>>>>               if (!ingress) {
>>>>>>>>> -                     notify_and_destroy(net, skb, n, classid,
>>>>>>>>> -                                        rtnl_dereference(dev->qdisc), new);
>>>>>>>>> +                     old = rtnl_dereference(dev->qdisc);
>>>>>>>>>                       if (new && !new->ops->attach)
>>>>>>>>>                               qdisc_refcount_inc(new);
>>>>>>>>>                       rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
>>>>>>>>>
>>>>>>>>> +                     notify_and_destroy(net, skb, n, classid, old, new);
>>>>>>>>> +
>>>>>>>>>                       if (new && new->ops->attach)
>>>>>>>>>                               new->ops->attach(new);
>>>>>>>>>               } else {
>>>>>>>> Hi Eric,
>>>>>>>> We started seeing the following WARN_ON happening on htb destroy
>>>>>>>> following your patch:
>>>>>>>> https://elixir.bootlin.com/linux/v6.1-rc3/source/net/sched/sch_htb.c#L1561
>>>>>>>>
>>>>>>>> Anything comes to mind?
>>>>>>> Not really. Let's ask Maxim Mikityanskiy <maximmi@nvidia.com>
>>>>>> CC: Maxim on non-nvidia address
>>>>>>
>>>>> Wild guess is that we need this fix:
>>>>>
>>>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>>>> index e5b4bbf3ce3d5f36edb512d4017ebd97209bb377..f82c07bd3d60e26e5a1fe2b335dbec29aebb602e
>>>>> 100644
>>>>> --- a/net/sched/sch_htb.c
>>>>> +++ b/net/sched/sch_htb.c
>>>>> @@ -1558,7 +1558,7 @@ static int htb_destroy_class_offload(struct
>>>>> Qdisc *sch, struct htb_class *cl,
>>>>>                 /* Before HTB is destroyed, the kernel grafts noop_qdisc to
>>>>>                  * all queues.
>>>>>                  */
>>>>> -               WARN_ON(!(old->flags & TCQ_F_BUILTIN));
>>>>> +               WARN_ON(old && !(old->flags & TCQ_F_BUILTIN));
>>>> I don't think it's the right fix. If the WARN_ON happens and doesn't
>>>> complain about old == NULL, it's not NULL, so this extra check won't
>>>> help. In any case, old comes from dev_graft_qdisc, which should never
>>>> return NULL.
>>>>
>>>> Maybe something changed after Eric's patch, so that the statement
>>>> above is no longer true? I.e. I expect a noop_qdisc here, but there is
>>>> some other qdisc attached to the queue. I need to recall what was the
>>>> destroy flow and what code assigned noop_qdisc there...
>>> Gal please provide a repro. It is not clear if you get a WARN or a NULL deref.
>> The WARN_ON is triggering, old is not NULL and old->flags is equal to 0x174.
>>
>> It reproduces consistently:
>> ip link set dev eth2 up
>> ip addr add 194.237.173.123/16 dev eth2
>> tc qdisc add dev eth2 clsact
>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
>> 22500.0mbit burst 450000kbit cburst 450000kbit
>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
>> 89900kbit cburst 89900kbit
>> tc qdisc delete dev eth2 clsact
>> tc qdisc delete dev eth2 root handle 1: htb default 1
>>
>> Please let me know if there's anything else you want me to check.
> Hi Eric, did you get a chance to take a look?

No response for quite a long time, Jakub, should I submit a revert?
Eric Dumazet Nov. 20, 2022, 4:09 p.m. UTC | #11
On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@nvidia.com> wrote:
>
> On 10/11/2022 11:08, Gal Pressman wrote:
> > On 06/11/2022 10:07, Gal Pressman wrote:

> >> It reproduces consistently:
> >> ip link set dev eth2 up
> >> ip addr add 194.237.173.123/16 dev eth2
> >> tc qdisc add dev eth2 clsact
> >> tc qdisc add dev eth2 root handle 1: htb default 1 offload
> >> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
> >> 22500.0mbit burst 450000kbit cburst 450000kbit
> >> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
> >> 89900kbit cburst 89900kbit
> >> tc qdisc delete dev eth2 clsact
> >> tc qdisc delete dev eth2 root handle 1: htb default 1
> >>
> >> Please let me know if there's anything else you want me to check.
> > Hi Eric, did you get a chance to take a look?
>
> No response for quite a long time, Jakub, should I submit a revert?

Sorry, I won't have time to look at this before maybe two weeks.

If you want to revert a patch which is correct, because some code
assumes something wrong,
I will simply say this seems not good.

I think this offload code path needs more care from nvidia folks.
Gal Pressman Nov. 20, 2022, 4:43 p.m. UTC | #12
On 20/11/2022 18:09, Eric Dumazet wrote:
> On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@nvidia.com> wrote:
>> On 10/11/2022 11:08, Gal Pressman wrote:
>>> On 06/11/2022 10:07, Gal Pressman wrote:
>>>> It reproduces consistently:
>>>> ip link set dev eth2 up
>>>> ip addr add 194.237.173.123/16 dev eth2
>>>> tc qdisc add dev eth2 clsact
>>>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
>>>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
>>>> 22500.0mbit burst 450000kbit cburst 450000kbit
>>>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
>>>> 89900kbit cburst 89900kbit
>>>> tc qdisc delete dev eth2 clsact
>>>> tc qdisc delete dev eth2 root handle 1: htb default 1
>>>>
>>>> Please let me know if there's anything else you want me to check.
>>> Hi Eric, did you get a chance to take a look?
>> No response for quite a long time, Jakub, should I submit a revert?
> Sorry, I won't have time to look at this before maybe two weeks.

Thanks for the response, Eric.

> If you want to revert a patch which is correct, because some code
> assumes something wrong,

I am not convinced about the "code assumes something wrong" part, and
not sure what are the consequences of this WARN being triggered, are you?

> I will simply say this seems not good.

Arguably, it is not that clear that a fix that introduces another issue
is a good thing, particularly when we don't understand the severity of
the thing that got broken.

Two weeks gets us to the end of -rc7, a bit too dangerous to my personal
taste, but I'm not the one making the calls.
Eric Dumazet Nov. 20, 2022, 5 p.m. UTC | #13
On Sun, Nov 20, 2022 at 8:43 AM Gal Pressman <gal@nvidia.com> wrote:
>
> On 20/11/2022 18:09, Eric Dumazet wrote:
> > On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@nvidia.com> wrote:
> >> On 10/11/2022 11:08, Gal Pressman wrote:
> >>> On 06/11/2022 10:07, Gal Pressman wrote:
> >>>> It reproduces consistently:
> >>>> ip link set dev eth2 up
> >>>> ip addr add 194.237.173.123/16 dev eth2
> >>>> tc qdisc add dev eth2 clsact
> >>>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
> >>>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
> >>>> 22500.0mbit burst 450000kbit cburst 450000kbit
> >>>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
> >>>> 89900kbit cburst 89900kbit
> >>>> tc qdisc delete dev eth2 clsact
> >>>> tc qdisc delete dev eth2 root handle 1: htb default 1
> >>>>
> >>>> Please let me know if there's anything else you want me to check.
> >>> Hi Eric, did you get a chance to take a look?
> >> No response for quite a long time, Jakub, should I submit a revert?
> > Sorry, I won't have time to look at this before maybe two weeks.
>
> Thanks for the response, Eric.
>
> > If you want to revert a patch which is correct, because some code
> > assumes something wrong,
>
> I am not convinced about the "code assumes something wrong" part, and
> not sure what are the consequences of this WARN being triggered, are you?
>
> > I will simply say this seems not good.
>
> Arguably, it is not that clear that a fix that introduces another issue
> is a good thing, particularly when we don't understand the severity of
> the thing that got broken.

The offload part was written assuming a certain (clearly wrong) behavior.

RCU rules are quite the first thing we need to respect in the kernel.

Simply put, when KASAN detects a bug, you can be pretty damn sure it
is a real one.

>
> Two weeks gets us to the end of -rc7, a bit too dangerous to my personal
> taste, but I'm not the one making the calls.

Agreed, please try to find someone at nvidia able to understand what Maxim
was doing in commit ca49bfd90a9dde175d2929dc1544b54841e33804

If something needs stronger rules than standard RCU ones, this should
be articulated.

As I said, I won't be able to work on this before ~2 weeks.
Maxim Mikityanskiy Nov. 20, 2022, 9:39 p.m. UTC | #14
On Sun, 20 Nov 2022 at 19:01, Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Nov 20, 2022 at 8:43 AM Gal Pressman <gal@nvidia.com> wrote:
> >
> > On 20/11/2022 18:09, Eric Dumazet wrote:
> > > On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@nvidia.com> wrote:
> > >> On 10/11/2022 11:08, Gal Pressman wrote:
> > >>> On 06/11/2022 10:07, Gal Pressman wrote:
> > >>>> It reproduces consistently:
> > >>>> ip link set dev eth2 up
> > >>>> ip addr add 194.237.173.123/16 dev eth2
> > >>>> tc qdisc add dev eth2 clsact
> > >>>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
> > >>>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
> > >>>> 22500.0mbit burst 450000kbit cburst 450000kbit
> > >>>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
> > >>>> 89900kbit cburst 89900kbit
> > >>>> tc qdisc delete dev eth2 clsact
> > >>>> tc qdisc delete dev eth2 root handle 1: htb default 1
> > >>>>
> > >>>> Please let me know if there's anything else you want me to check.
> > >>> Hi Eric, did you get a chance to take a look?
> > >> No response for quite a long time, Jakub, should I submit a revert?
> > > Sorry, I won't have time to look at this before maybe two weeks.
> >
> > Thanks for the response, Eric.
> >
> > > If you want to revert a patch which is correct, because some code
> > > assumes something wrong,
> >
> > I am not convinced about the "code assumes something wrong" part, and
> > not sure what are the consequences of this WARN being triggered, are you?
> >
> > > I will simply say this seems not good.
> >
> > Arguably, it is not that clear that a fix that introduces another issue
> > is a good thing, particularly when we don't understand the severity of
> > the thing that got broken.
>
> The offload part was written assuming a certain (clearly wrong) behavior.
>
> RCU rules are quite the first thing we need to respect in the kernel.
>
> Simply put, when KASAN detects a bug, you can be pretty damn sure it
> is a real one.
>
> >
> > Two weeks gets us to the end of -rc7, a bit too dangerous to my personal
> > taste, but I'm not the one making the calls.
>
> Agreed, please try to find someone at nvidia able to understand what Maxim
> was doing in commit ca49bfd90a9dde175d2929dc1544b54841e33804

The check for TCQ_F_BUILTIN basically means checking for noop_qdisc.
As the comment above the WARN_ON says, my code expects that before the
root HTB qdisc is destroyed (the notify_and_destroy line from Eric's
patch, in qdisc_graft), the kernel assigns &noop_qdisc to all of
dev_queue->qdisc-s (i.e. qdiscs of HTB classes). IIRC, it should
happen in dev_deactivate in qdisc_graft (and if IFF_UP is unset at
this point, that means dev_deactivate has been called before, from
__dev_close_many). That said, I don't see how Eric's patch could
affect this logic. What has changed in Eric's patch is that the old
root qdisc (HTB) is now destroyed after assigning the new qdisc to
dev->qdisc, but it has nothing to do with dev_queue->qdisc-s.

Gal, are you sure the WARN_ON only happens after Eric's patch? If yes,
could you do some tracing and find out whether all the qdiscs of the
netdev queues are noop_qdisc after the dev_deactivate line in
qdisc_graft — before and after Eric's patch?

for (i = 0; i < num_q; i++) {
        struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
        if (dev_queue->qdisc == &noop_qdisc)
                ...
}
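
For reference, a fleshed-out version of that check might look like the sketch below if dropped into net/sched/sch_api.c next to the dev_deactivate() call; the helper name and the print format are made up for illustration and do not exist in the tree.

/* Illustrative debug helper only: call under RTNL, e.g. right after the
 * dev_deactivate() call in qdisc_graft(), to see which per-queue qdiscs
 * are still something other than &noop_qdisc at that point.
 */
static void debug_dump_txq_qdiscs(struct net_device *dev)
{
	unsigned int i;

	for (i = 0; i < dev->num_tx_queues; i++) {
		struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
		struct Qdisc *q = rtnl_dereference(dev_queue->qdisc);

		pr_info("%s: txq %u: qdisc %s %s &noop_qdisc\n",
			dev->name, i, q ? q->ops->id : "(null)",
			q == &noop_qdisc ? "==" : "!=");
	}
}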

> If something needs stronger rules than standard RCU ones, this should
> be articulated.
>
> As I said, I won't be able to work on this before ~2 weeks.
Gal Pressman Nov. 21, 2022, 8:08 a.m. UTC | #15
On 20/11/2022 23:39, Максим wrote:
> On Sun, 20 Nov 2022 at 19:01, Eric Dumazet <edumazet@google.com> wrote:
>> On Sun, Nov 20, 2022 at 8:43 AM Gal Pressman <gal@nvidia.com> wrote:
>>> On 20/11/2022 18:09, Eric Dumazet wrote:
>>>> On Sat, Nov 19, 2022 at 11:42 PM Gal Pressman <gal@nvidia.com> wrote:
>>>>> On 10/11/2022 11:08, Gal Pressman wrote:
>>>>>> On 06/11/2022 10:07, Gal Pressman wrote:
>>>>>>> It reproduces consistently:
>>>>>>> ip link set dev eth2 up
>>>>>>> ip addr add 194.237.173.123/16 dev eth2
>>>>>>> tc qdisc add dev eth2 clsact
>>>>>>> tc qdisc add dev eth2 root handle 1: htb default 1 offload
>>>>>>> tc class add dev eth2 classid 1: parent root htb rate 18000mbit ceil
>>>>>>> 22500.0mbit burst 450000kbit cburst 450000kbit
>>>>>>> tc class add dev eth2 classid 1:3 parent 1: htb rate 3596mbit burst
>>>>>>> 89900kbit cburst 89900kbit
>>>>>>> tc qdisc delete dev eth2 clsact
>>>>>>> tc qdisc delete dev eth2 root handle 1: htb default 1
>>>>>>>
>>>>>>> Please let me know if there's anything else you want me to check.
>>>>>> Hi Eric, did you get a chance to take a look?
>>>>> No response for quite a long time, Jakub, should I submit a revert?
>>>> Sorry, I won't have time to look at this before maybe two weeks.
>>> Thanks for the response, Eric.
>>>
>>>> If you want to revert a patch which is correct, because some code
>>>> assumes something wrong,
>>> I am not convinced about the "code assumes something wrong" part, and
>>> not sure what are the consequences of this WARN being triggered, are you?
>>>
>>>> I will simply say this seems not good.
>>> Arguably, it is not that clear that a fix that introduces another issue
>>> is a good thing, particularly when we don't understand the severity of
>>> the thing that got broken.
>> The offload part was written assuming a certain (clearly wrong) behavior.
>>
>> RCU rules are quite the first thing we need to respect in the kernel.
>>
>> Simply put, when KASAN detects a bug, you can be pretty damn sure it
>> is a real one.
>>
>>> Two weeks gets us to the end of -rc7, a bit too dangerous to my personal
>>> taste, but I'm not the one making the calls.
>> Agreed, please try to find someone at nvidia able to understand what Maxim
>> was doing in commit ca49bfd90a9dde175d2929dc1544b54841e33804
> The check for TCQ_F_BUILTIN basically means checking for noop_qdisc.
> As the comment above the WARN_ON says, my code expects that before the
> root HTB qdisc is destroyed (the notify_and_destroy line from Eric's
> patch, in qdisc_graft), the kernel assigns &noop_qdisc to all of
> dev_queue->qdisc-s (i.e. qdiscs of HTB classes). IIRC, it should
> happen in dev_deactivate in qdisc_graft (and if IFF_UP is unset at
> this point, that means dev_deactivate has been called before, from
> __dev_close_many). That said, I don't see how Eric's patch could
> affect this logic. What has changed in Eric's patch is that the old
> root qdisc (HTB) is now destroyed after assigning the new qdisc to
> dev->qdisc, but it has nothing to do with dev_queue->qdisc-s.
>
> Gal, are you sure the WARN_ON only happens after Eric's patch? If yes,
> could you do some tracing and find out whether all the qdiscs of the
> netdev queues are noop_qdisc after the dev_deactivate line in
> qdisc_graft — before and after Eric's patch?
>
> for (i = 0; i < num_q; i++) {
>         struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
>         if (dev_queue->qdisc == &noop_qdisc)
>                 ...
> }

Hey Maxim, thanks for looking into this.
Yes, the WARN_ON is not triggered before Eric's patch.

The behavior of the code you mentioned is the same before and after his
patch though.
In qdisc_graft():
* Before dev_deactivate(), qdiscs 0-5 are != noop_qdisc, qdiscs 6-303 are == noop_qdisc
* After dev_deactivate(), all qdiscs are == noop_qdisc.
diff mbox series

Patch

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c98af0ada706efee202a20a6bfb6f2b984106f45..4a27dfb1ba0faab3692a82969fb8b78768742779 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1099,12 +1099,13 @@  static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 skip:
 		if (!ingress) {
-			notify_and_destroy(net, skb, n, classid,
-					   rtnl_dereference(dev->qdisc), new);
+			old = rtnl_dereference(dev->qdisc);
 			if (new && !new->ops->attach)
 				qdisc_refcount_inc(new);
 			rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc);
 
+			notify_and_destroy(net, skb, n, classid, old, new);
+
 			if (new && new->ops->attach)
 				new->ops->attach(new);
 		} else {
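
The ordering enforced above is the standard RCU replace-then-reclaim rule from the commit message: publish the replacement through the visible pointer first, and only then start the grace period for the old object. The following stand-alone sketch, with a hypothetical struct foo and replace_foo() that are not kernel code (a fragment meant to be read, not a complete module), shows the same two steps in isolation:

#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Hypothetical object protected by RCU; foo_lock stands in for RTNL. */
struct foo {
	int val;
	struct rcu_head rcu;
};

static struct foo __rcu *visible_foo;
static DEFINE_MUTEX(foo_lock);

static void replace_foo(struct foo *new)
{
	struct foo *old;

	mutex_lock(&foo_lock);
	old = rcu_dereference_protected(visible_foo,
					lockdep_is_held(&foo_lock));

	/* 1. Make the replacement visible to readers first ... */
	rcu_assign_pointer(visible_foo, new);

	/* 2. ... and only then start the grace period for the old object. */
	if (old)
		kfree_rcu(old, rcu);

	mutex_unlock(&foo_lock);
}

With the two steps swapped, a reader entering rcu_read_lock() after the kfree_rcu() call could still fetch the stale pointer and keep using the object past its grace period, which is the same kind of use-after-free the KASAN report shows for dev->qdisc.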