[net] net/sched: sch_taprio: fix possible use-after-free

Message ID	20230113164849.4004848-1-edumazet@google.com (mailing list archive)
State	Accepted
Commit	3a415d59c1dbec9d772dbfab2d2520d98360caae
Delegated to:	Netdev Maintainers
Headers	Return-Path: <netdev-owner@vger.kernel.org> Date: Fri, 13 Jan 2023 16:48:49 +0000 Mime-Version: 1.0 Message-ID: <20230113164849.4004848-1-edumazet@google.com> Subject: [PATCH net] net/sched: sch_taprio: fix possible use-after-free From: Eric Dumazet <edumazet@google.com> To: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org, eric.dumazet@gmail.com, Eric Dumazet <edumazet@google.com>, syzbot <syzkaller@googlegroups.com>, Alexander Potapenko <glider@google.com>, Vinicius Costa Gomes <vinicius.gomes@intel.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	[net] net/sched: sch_taprio: fix possible use-after-free

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 1373 this patch: 1373
netdev/cc_maintainers	warning	3 maintainers not CCed: jhs@mojatatu.com xiyou.wangcong@gmail.com jiri@resnulli.us
netdev/build_clang	success	Errors and warnings before: 138 this patch: 138
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1394 this patch: 1394
netdev/checkpatch	warning	WARNING: Possible repeated word: 'Google' WARNING: msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Eric Dumazet Jan. 13, 2023, 4:48 p.m. UTC

syzbot reported a nasty crash [1] in net_tx_action() which
made little sense until we got a repro.

This repro installs a taprio qdisc while providing an
invalid TCA_RATE attribute.

qdisc_create() has to destroy the just initialized
taprio qdisc, and taprio_destroy() is called.

However, the hrtimer used by taprio had already fired,
therefore advance_sched() called __netif_schedule().

Then net_tx_action was trying to use a destroyed qdisc.

We cannot undo the __netif_schedule(), so we must wait
until a CPU has serviced the qdisc before we can proceed.

Many thanks to Alexander Potapenko for his help.

[1]
BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
 queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
 do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
 _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
 spin_trylock include/linux/spinlock.h:359 [inline]
 qdisc_run_begin include/net/sch_generic.h:187 [inline]
 qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
 net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
 __do_softirq+0x1cc/0x7fb kernel/softirq.c:571
 run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
 smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
 kthread+0x31b/0x430 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:732 [inline]
 slab_alloc_node mm/slub.c:3258 [inline]
 __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
 kmalloc_reserve net/core/skbuff.c:358 [inline]
 __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
 alloc_skb include/linux/skbuff.h:1257 [inline]
 nlmsg_new include/net/netlink.h:953 [inline]
 netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
 netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
 rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
 __sys_sendmsg net/socket.c:2565 [inline]
 __do_sys_sendmsg net/socket.c:2574 [inline]
 __se_sys_sendmsg net/socket.c:2572 [inline]
 __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/net/sch_generic.h | 7 +++++++
 net/sched/sch_taprio.c    | 3 +++
 2 files changed, 10 insertions(+)
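
The wait described in the commit message (poll until the SCHED bit is clear before tearing the qdisc down) can be modeled in userspace. This is only a sketch, not kernel code: struct model_qdisc, the model_* helpers, and the tick-based "softirq delay" are all hypothetical stand-ins, and the real fix uses test_bit(__QDISC_STATE_SCHED, ...) with msleep(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct Qdisc (not the kernel type). */
struct model_qdisc {
	bool sched;                /* models __QDISC_STATE_SCHED */
	bool destroyed;            /* set once destroy has torn the qdisc down */
	int  ticks_until_serviced; /* "sleeps" needed before the softirq runs */
	int  use_after_destroy;    /* services that touched a destroyed qdisc */
};

/* Models the hrtimer callback calling __netif_schedule(). */
static void model_timer_fired(struct model_qdisc *q, int softirq_delay)
{
	q->sched = true;
	q->ticks_until_serviced = softirq_delay;
}

/* Models net_tx_action(): clears SCHED, then uses the qdisc. */
static void model_softirq_service(struct model_qdisc *q)
{
	if (!q->sched)
		return;
	q->sched = false;
	if (q->destroyed)
		q->use_after_destroy++;
}

/* Models one msleep(1): time passes and the softirq may get to run. */
static void model_sleep_tick(struct model_qdisc *q)
{
	if (q->sched && --q->ticks_until_serviced <= 0)
		model_softirq_service(q);
}

/* Same shape as qdisc_synchronize(): poll until SCHED is clear. */
static void model_synchronize(struct model_qdisc *q)
{
	while (q->sched)
		model_sleep_tick(q);
}

static void model_destroy(struct model_qdisc *q, bool synchronize)
{
	assert(q != NULL);
	if (synchronize)
		model_synchronize(q);
	q->destroyed = true;
}

/* Timer fires, destroy runs, then any still-pending softirq runs.
 * Returns how many times a destroyed qdisc was used.
 */
int model_run(bool synchronize)
{
	struct model_qdisc q = { 0 };

	model_timer_fired(&q, 3);
	model_destroy(&q, synchronize);
	model_softirq_service(&q); /* pending work, if any, runs after destroy */
	return q.use_after_destroy;
}
```

In this model, model_run(false) reports one use of a destroyed qdisc, while model_run(true) reports none, because destroy does not proceed until the pending service has happened.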

Vinicius Costa Gomes Jan. 13, 2023, 11:41 p.m. UTC | #1

Hi,

Eric Dumazet <edumazet@google.com> writes:

> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
>
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
>
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
>
> However, the hrtimer used by taprio had already fired,
> therefore advance_sched() called __netif_schedule().
>
> Then net_tx_action was trying to use a destroyed qdisc.
>
> We can not undo the __netif_schedule(), so we must wait
> until one cpu serviced the qdisc before we can proceed.
>
> Many thanks to Alexander Potapenko for his help.
>
> [1]
> BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
> BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
> BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
> BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
>  queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
>  do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
>  __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
>  _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
>  spin_trylock include/linux/spinlock.h:359 [inline]
>  qdisc_run_begin include/net/sch_generic.h:187 [inline]
>  qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
>  net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
>  __do_softirq+0x1cc/0x7fb kernel/softirq.c:571
>  run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
>  smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
>  kthread+0x31b/0x430 kernel/kthread.c:376
>  ret_from_fork+0x1f/0x30
>
> Uninit was created at:
>  slab_post_alloc_hook mm/slab.h:732 [inline]
>  slab_alloc_node mm/slub.c:3258 [inline]
>  __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
>  kmalloc_reserve net/core/skbuff.c:358 [inline]
>  __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
>  alloc_skb include/linux/skbuff.h:1257 [inline]
>  nlmsg_new include/net/netlink.h:953 [inline]
>  netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
>  netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
>  rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
>  netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>  netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
>  netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
>  sock_sendmsg_nosec net/socket.c:714 [inline]
>  sock_sendmsg net/socket.c:734 [inline]
>  ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
>  ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
>  __sys_sendmsg net/socket.c:2565 [inline]
>  __do_sys_sendmsg net/socket.c:2574 [inline]
>  __se_sys_sendmsg net/socket.c:2572 [inline]
>  __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  include/net/sch_generic.h | 7 +++++++
>  net/sched/sch_taprio.c    | 3 +++
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index d5517719af4ef22282f0a15b132f8e8a07ae4179..af4aa66aaa4eba8f2eacdd00bc8fef31165c6a90 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -1288,4 +1288,11 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx);
>  
>  int sch_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
>  
> +/* Make sure qdisc is no longer in SCHED state. */
> +static inline void qdisc_synchronize(const struct Qdisc *q)
> +{
> +	while (test_bit(__QDISC_STATE_SCHED, &q->state))
> +		msleep(1);
> +}
> +
>  #endif
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 570389f6cdd7dbab5749dc06d886555305cbf623..9a11a499ea2df8d18c9c062496fdcbcf5a861391 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1700,6 +1700,8 @@ static void taprio_reset(struct Qdisc *sch)
>  	int i;
>  
>  	hrtimer_cancel(&q->advance_timer);
> +	qdisc_synchronize(sch);
> +

From the commit message, I got the impression that only the one
qdisc_synchronize() in taprio_destroy() would be needed.

>  	if (q->qdiscs) {
>  		for (i = 0; i < dev->num_tx_queues; i++)
>  			if (q->qdiscs[i])
> @@ -1720,6 +1722,7 @@ static void taprio_destroy(struct Qdisc *sch)
>  	 * happens in qdisc_create(), after taprio_init() has been called.
>  	 */
>  	hrtimer_cancel(&q->advance_timer);
> +	qdisc_synchronize(sch);
>  
>  	taprio_disable_offload(dev, q, NULL);
>  
> -- 
> 2.39.0.314.g84b9a713c41-goog
>


Cheers,

Cong Wang Jan. 16, 2023, 12:35 a.m. UTC | #2

On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
> 
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
> 
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
> 
> However, the hrtimer used by taprio had already fired,
> therefore advance_sched() called __netif_schedule().
> 
> Then net_tx_action was trying to use a destroyed qdisc.
> 
> We can not undo the __netif_schedule(), so we must wait
> until one cpu serviced the qdisc before we can proceed.
> 

This workaround looks a bit ugly. I think we _may_ be able to make
hrtimer_start() the last step of the initialization, IOW, move other
validations and allocations before it.

Can you share your reproducer?

Thanks,

shaozhengchao Jan. 16, 2023, 2:07 a.m. UTC | #3

On 2023/1/16 8:35, Cong Wang wrote:
> On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
>> syzbot reported a nasty crash [1] in net_tx_action() which
>> made little sense until we got a repro.
>>
>> This repro installs a taprio qdisc, but providing an
>> invalid TCA_RATE attribute.
>>
>> qdisc_create() has to destroy the just initialized
>> taprio qdisc, and taprio_destroy() is called.
>>
>> However, the hrtimer used by taprio had already fired,
>> therefore advance_sched() called __netif_schedule().
>>
>> Then net_tx_action was trying to use a destroyed qdisc.
>>
>> We can not undo the __netif_schedule(), so we must wait
>> until one cpu serviced the qdisc before we can proceed.
>>
> 
> This workaround looks a bit ugly. I think we _may_ be able to make
> hrtimer_start() as the last step of the initialization, IOW, move other
> validations and allocations before it.
> 
> Can you share your reproducer?
> 
> Thanks,
Maybe the issue is the same as 
https://syzkaller.appspot.com/bug?id=1ccb246eecb5114c440218336e4c7205aed5f2c8

Alexander Potapenko Jan. 16, 2023, 9:03 a.m. UTC | #4

On Mon, Jan 16, 2023 at 3:07 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
>
>
> On 2023/1/16 8:35, Cong Wang wrote:
> > On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> >> syzbot reported a nasty crash [1] in net_tx_action() which
> >> made little sense until we got a repro.
> >>
> >> This repro installs a taprio qdisc, but providing an
> >> invalid TCA_RATE attribute.
> >>
> >> qdisc_create() has to destroy the just initialized
> >> taprio qdisc, and taprio_destroy() is called.
> >>
> >> However, the hrtimer used by taprio had already fired,
> >> therefore advance_sched() called __netif_schedule().
> >>
> >> Then net_tx_action was trying to use a destroyed qdisc.
> >>
> >> We can not undo the __netif_schedule(), so we must wait
> >> until one cpu serviced the qdisc before we can proceed.
> >>
> >
> > This workaround looks a bit ugly. I think we _may_ be able to make
> > hrtimer_start() as the last step of the initialization, IOW, move other
> > validations and allocations before it.
> >
> > Can you share your reproducer?
> >
> > Thanks,
> Maybe the issue is the same as
> https://syzkaller.appspot.com/bug?id=1ccb246eecb5114c440218336e4c7205aed5f2c8

Most certainly, yes.
I also think there were stall reports with the same stack trace where
qdisc_run was unable to take a freed lock because its value was set to
1 by another task.

Eric Dumazet Jan. 16, 2023, 9:36 a.m. UTC | #5

On Mon, Jan 16, 2023 at 1:35 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> > syzbot reported a nasty crash [1] in net_tx_action() which
> > made little sense until we got a repro.
> >
> > This repro installs a taprio qdisc, but providing an
> > invalid TCA_RATE attribute.
> >
> > qdisc_create() has to destroy the just initialized
> > taprio qdisc, and taprio_destroy() is called.
> >
> > However, the hrtimer used by taprio had already fired,
> > therefore advance_sched() called __netif_schedule().
> >
> > Then net_tx_action was trying to use a destroyed qdisc.
> >
> > We can not undo the __netif_schedule(), so we must wait
> > until one cpu serviced the qdisc before we can proceed.
> >
>
> This workaround looks a bit ugly. I think we _may_ be able to make
> hrtimer_start() as the last step of the initialization, IOW, move other
> validations and allocations before it.
>

taprio_init() detects no error.

So moving around the hrtimer_start() inside it won't help.

The error comes later, from an invalid TCA_RATE attribute, in qdisc_create():

static struct Qdisc *qdisc_create(...
...
	err = gen_new_estimator(...);
	if (err) {
		NL_SET_ERR_MSG(extack, "Failed to generate new estimator");
		goto err_out4;
	}

...

err_out4:
	qdisc_put_stab(rtnl_dereference(sch->stab));
	if (ops->destroy)
		ops->destroy(sch);
	goto err_out3;

This is why we need to make sure ->destroy will fully undo what ->init did,
including the possible fact that the hrtimer already fired.
This seems to be taprio specific.

Or we would need a new method, like ->post_init(), that would be
called once all steps have succeeded.
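
A rough userspace sketch of what such a hook could look like, assuming a hypothetical ->post_init() member that does not exist in struct Qdisc_ops today. All pi_* names are made up for illustration, and the estimator step is reduced to an error code passed in by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a qdisc instance; not the kernel's struct Qdisc. */
struct pi_sch {
	bool initialized;
	bool timer_armed;      /* models hrtimer_start() having been called */
	bool destroyed;
	bool armed_at_destroy; /* was the timer live when destroy ran? */
};

/* Hypothetical ops table; ->post_init() is the proposed, non-existent hook. */
struct pi_ops {
	int  (*init)(struct pi_sch *sch);
	void (*post_init)(struct pi_sch *sch);
	void (*destroy)(struct pi_sch *sch);
};

static int pi_taprio_like_init(struct pi_sch *sch)
{
	sch->initialized = true; /* parse and allocate, but do not arm the timer */
	return 0;
}

static void pi_taprio_like_post_init(struct pi_sch *sch)
{
	sch->timer_armed = true; /* side effects start only here */
}

static void pi_taprio_like_destroy(struct pi_sch *sch)
{
	sch->armed_at_destroy = sch->timer_armed;
	sch->timer_armed = false;
	sch->destroyed = true;
}

const struct pi_ops pi_taprio_like_ops = {
	.init      = pi_taprio_like_init,
	.post_init = pi_taprio_like_post_init,
	.destroy   = pi_taprio_like_destroy,
};

/* Models qdisc_create(): estimator_err stands in for gen_new_estimator(). */
int pi_create(const struct pi_ops *ops, struct pi_sch *sch, int estimator_err)
{
	int err = ops->init(sch);

	if (err)
		return err;
	if (estimator_err) {
		/* Late failure: the timer was never armed, so destroy is simple. */
		if (ops->destroy)
			ops->destroy(sch);
		return estimator_err;
	}
	if (ops->post_init)
		ops->post_init(sch); /* every step succeeded */
	return 0;
}
```

With pi_create(&pi_taprio_like_ops, &sch, -EINVAL), destroy runs with the timer never armed. With an error code of 0, the timer is armed only after every step has passed.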

Or call the hrtimer_start() at first taprio_enqueue(), adding a
conditional in fast path...
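
The same kind of sketch for the lazy-start variant, again a userspace model with hypothetical lazy_* names. The point is only to show where the extra conditional would sit; in the kernel it would presumably be wrapped in unlikely().

```c
#include <stdbool.h>

/* Hypothetical model of taprio private data; not the kernel struct. */
struct lazy_taprio {
	bool timer_started; /* has the advance timer been armed yet? */
	int  timer_starts;  /* how many times the timer was armed */
	int  qlen;          /* packets accepted so far */
};

/* Models taprio_init(): nothing is armed here, so a later failure in
 * qdisc creation leaves no timer to race with.
 */
void lazy_init(struct lazy_taprio *q)
{
	q->timer_started = false;
	q->timer_starts = 0;
	q->qlen = 0;
}

/* Models taprio_enqueue() with the extra fast-path conditional. */
int lazy_enqueue(struct lazy_taprio *q)
{
	if (!q->timer_started) { /* true only for the first packet */
		q->timer_started = true;
		q->timer_starts++;   /* stands in for hrtimer_start() */
	}
	q->qlen++;
	return 0;
}
```

After lazy_init() and three lazy_enqueue() calls, the timer has been armed exactly once. The cost is one branch on every enqueue.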

> Can you share your reproducer?

Not publicly.

Although I think the bug is clear enough.

Eric Dumazet Jan. 16, 2023, 10:03 a.m. UTC | #6

On Sat, Jan 14, 2023 at 12:41 AM Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
>
> Hi,
>
> From the commit message, I got the impression that only the one
> qdisc_synchronize() in taprio_destroy() would be needed.
>

That could be, but then why have hrtimer_cancel(&q->advance_timer)
in taprio_reset(), since it is already in taprio_destroy()?

patchwork-bot+netdevbpf@kernel.org Jan. 16, 2023, 1:30 p.m. UTC | #7

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 13 Jan 2023 16:48:49 +0000 you wrote:
> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
> 
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
> 
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
> 
> [...]

Here is the summary with links:
  - [net] net/sched: sch_taprio: fix possible use-after-free
    https://git.kernel.org/netdev/net/c/3a415d59c1db

You are awesome, thank you!

Eric Dumazet Jan. 18, 2023, 11:43 a.m. UTC | #8

On Sat, Jan 14, 2023 at 12:41 AM Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
>
> Hi,
>
>
> From the commit message, I got the impression that only the one
> qdisc_synchronize() in taprio_destroy() would be needed.
>

Hmm, I think you are right, qdisc_reset() is probably called while
the qdisc lock is held, with BH disabled.

So calling msleep() from qdisc_reset() is a no go.

I will send a patch removing the change in taprio_reset(), thanks.
