diff mbox series

net/sched: Set the flushing flags to false to prevent an infinite loop

Message ID 20230606144511.1520657-1-renmingshuai@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net/sched: Set the flushing flags to false to prevent an infinite loop | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 17 this patch: 17
netdev/cc_maintainers fail 1 blamed authors not CCed: vladbu@mellanox.com; 1 maintainers not CCed: vladbu@mellanox.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 17 this patch: 17
netdev/checkpatch warning WARNING: From:/Signed-off-by: email name mismatch: 'From: renmingshuai <renmingshuai@huawei.com>' != 'Signed-off-by: Ren Mingshuai <renmingshuai@huawei.com>'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

renmingshuai June 6, 2023, 2:45 p.m. UTC
When a new chain is added by using tc, one soft lockup alarm will be
 generated after delete the prio 0 filter of the chain. To reproduce
 the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:


The refcnt of the chain added by step 2 is equal to 1. After step 3,
 the flushing flag of the chain is set to true in the tcf_chain_flush()
 called by tc_del_tfilter() because the prio is 0. In this case, if
 we add a new filter to this chain, it will never succeed and try again
 and again because the refresh flash is always true and refcnt is 1.
 A soft lock alarm is generated 20 seconds later.
The stack is show as below:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 2 PID: 3321861 Comm: tc Kdump: loaded Tainted: G
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
 <IRQ>
 dump_stack+0x57/0x6e
 panic+0x196/0x3ec
 watchdog_timer_fn.cold+0x16/0x5c
 __run_hrtimer+0x5e/0x190
 __hrtimer_run_queues+0x8a/0xe0
 hrtimer_interrupt+0x110/0x2c0
 ? irqtime_account_irq+0x49/0xf0
 __sysvec_apic_timer_interrupt+0x5f/0xe0
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 sysvec_apic_timer_interrupt+0x72/0x80
 asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:mutex_lock+0x24/0x70
RSP: 0018:ffffa836004ab9a8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff95bb02d76700 RCX: 0000000000000000
RDX: ffff95bb27462100 RSI: 0000000000000000 RDI: ffff95ba5b527000
RBP: ffff95ba5b527000 R08: 0000000000000001 R09: ffffa836004abbb8
R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
R13: ffff95ba5b527000 R14: ffffa836004abbb8 R15: 0000000000000001
 __tcf_chain_put+0x27/0x200
 tc_new_tfilter+0x5e8/0x810
 ? tc_setup_cb_add+0x210/0x210
 rtnetlink_rcv_msg+0x2e3/0x380
 ? rtnl_calcit.isra.0+0x120/0x120
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x12d/0x1d0
 netlink_sendmsg+0x286/0x490
 sock_sendmsg+0x62/0x70
 ____sys_sendmsg+0x24c/0x2c0
 ? import_iovec+0x17/0x20
 ? sendmsg_copy_msghdr+0x80/0xa0
 ___sys_sendmsg+0x75/0xc0
 ? do_fault_around+0x118/0x160
 ? do_read_fault+0x68/0xf0
 ? __handle_mm_fault+0x3f9/0x6f0
 __sys_sendmsg+0x59/0xa0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6
RIP: 0033:0x7f96705b8247
RSP: 002b:00007ffe552e9dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96705b8247
RDX: 0000000000000000 RSI: 00007ffe552e9e40 RDI: 0000000000000003
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000558113f678b0
R10: 00007f967069ab00 R11: 0000000000000246 R12: 00000000647ea089
R13: 00007ffe552e9f30 R14: 0000000000000001 R15: 0000558113175f00

To avoid this case, set chain->flushing to be false if the chain->refcnt
 is 1 after flushing the chain when prio is 0.

Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
Signed-off-by: Ren Mingshuai <renmingshuai@huawei.com>
---
 net/sched/cls_api.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Pedro Tammela June 6, 2023, 4:59 p.m. UTC | #1
On 06/06/2023 11:45, renmingshuai wrote:
> When a new chain is added by using tc, one soft lockup alarm will be
>   generated after delete the prio 0 filter of the chain. To reproduce
>   the problem, perform the following steps:
> (1) tc qdisc add dev eth0 root handle 1: htb default 1
> (2) tc chain add dev eth0
> (3) tc filter del dev eth0 chain 0 parent 1: prio 0
> (4) tc filter add dev eth0 chain 0 parent 1:

This seems like it could be added to tdc or 3 and 4 must be run in parallel?

> 
> 
> The refcnt of the chain added by step 2 is equal to 1. After step 3,
>   the flushing flag of the chain is set to true in the tcf_chain_flush()
>   called by tc_del_tfilter() because the prio is 0. In this case, if
>   we add a new filter to this chain, it will never succeed and try again
>   and again because the refresh flash is always true and refcnt is 1.
>   A soft lock alarm is generated 20 seconds later.
> The stack is show as below:
> 
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 2 PID: 3321861 Comm: tc Kdump: loaded Tainted: G
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> Call Trace:
>   <IRQ>
>   dump_stack+0x57/0x6e
>   panic+0x196/0x3ec
>   watchdog_timer_fn.cold+0x16/0x5c
>   __run_hrtimer+0x5e/0x190
>   __hrtimer_run_queues+0x8a/0xe0
>   hrtimer_interrupt+0x110/0x2c0
>   ? irqtime_account_irq+0x49/0xf0
>   __sysvec_apic_timer_interrupt+0x5f/0xe0
>   asm_call_irq_on_stack+0x12/0x20
>   </IRQ>
>   sysvec_apic_timer_interrupt+0x72/0x80
>   asm_sysvec_apic_timer_interrupt+0x12/0x20
> RIP: 0010:mutex_lock+0x24/0x70
> RSP: 0018:ffffa836004ab9a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: ffff95bb02d76700 RCX: 0000000000000000
> RDX: ffff95bb27462100 RSI: 0000000000000000 RDI: ffff95ba5b527000
> RBP: ffff95ba5b527000 R08: 0000000000000001 R09: ffffa836004abbb8
> R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
> R13: ffff95ba5b527000 R14: ffffa836004abbb8 R15: 0000000000000001
>   __tcf_chain_put+0x27/0x200
>   tc_new_tfilter+0x5e8/0x810
>   ? tc_setup_cb_add+0x210/0x210
>   rtnetlink_rcv_msg+0x2e3/0x380
>   ? rtnl_calcit.isra.0+0x120/0x120
>   netlink_rcv_skb+0x50/0x100
>   netlink_unicast+0x12d/0x1d0
>   netlink_sendmsg+0x286/0x490
>   sock_sendmsg+0x62/0x70
>   ____sys_sendmsg+0x24c/0x2c0
>   ? import_iovec+0x17/0x20
>   ? sendmsg_copy_msghdr+0x80/0xa0
>   ___sys_sendmsg+0x75/0xc0
>   ? do_fault_around+0x118/0x160
>   ? do_read_fault+0x68/0xf0
>   ? __handle_mm_fault+0x3f9/0x6f0
>   __sys_sendmsg+0x59/0xa0
>   do_syscall_64+0x33/0x40
>   entry_SYSCALL_64_after_hwframe+0x61/0xc6
> RIP: 0033:0x7f96705b8247
> RSP: 002b:00007ffe552e9dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96705b8247
> RDX: 0000000000000000 RSI: 00007ffe552e9e40 RDI: 0000000000000003
> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000558113f678b0
> R10: 00007f967069ab00 R11: 0000000000000246 R12: 00000000647ea089
> R13: 00007ffe552e9f30 R14: 0000000000000001 R15: 0000558113175f00
> 
> To avoid this case, set chain->flushing to be false if the chain->refcnt
>   is 1 after flushing the chain when prio is 0.
> 
> Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
> Signed-off-by: Ren Mingshuai <renmingshuai@huawei.com>
> ---
>   net/sched/cls_api.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 2621550bfddc..68be55d75831 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -2442,6 +2442,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>   		tfilter_notify_chain(net, skb, block, q, parent, n,
>   				     chain, RTM_DELTFILTER, extack);
>   		tcf_chain_flush(chain, rtnl_held);
> +		/* Set the flushing flags to false to prevent an infinite loop
> +		 * when a new filter is added.
> +		 */
> +		mutex_lock(&chain->filter_chain_lock);
> +		if (chain->refcnt == 1)
> +			chain->flushing = false;
> +		mutex_unlock(&chain->filter_chain_lock);
>   		err = 0;
>   		goto errout;
>   	}
renmingshuai June 7, 2023, 4:19 a.m. UTC | #2
>On 06/06/2023 11:45, renmingshuai wrote:
>> When a new chain is added by using tc, one soft lockup alarm will be
>>   generated after delete the prio 0 filter of the chain. To reproduce
>>   the problem, perform the following steps:
>> (1) tc qdisc add dev eth0 root handle 1: htb default 1
>> (2) tc chain add dev eth0
>> (3) tc filter del dev eth0 chain 0 parent 1: prio 0
>> (4) tc filter add dev eth0 chain 0 parent 1:
>
>This seems like it could be added to tdc or 3 and 4 must be run in parallel?
3 and 4 do not need to be run inparallel. When a new chain is added by the
 way as step 1 and the step 3 is completed, this problem always occurs
 whenever step 4 is run.
>> 
>> 
>> The refcnt of the chain added by step 2 is equal to 1. After step 3,
>>   the flushing flag of the chain is set to true in the tcf_chain_flush()
>>   called by tc_del_tfilter() because the prio is 0. In this case, if
>>   we add a new filter to this chain, it will never succeed and try again
>>   and again because the refresh flash is always true and refcnt is 1.
>>   A soft lock alarm is generated 20 seconds later.
>> The stack is show as below:
>> 
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 2 PID: 3321861 Comm: tc Kdump: loaded Tainted: G
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>> Call Trace:
>>   <IRQ>
>>   dump_stack+0x57/0x6e
>>   panic+0x196/0x3ec
>>   watchdog_timer_fn.cold+0x16/0x5c
>>   __run_hrtimer+0x5e/0x190
>>   __hrtimer_run_queues+0x8a/0xe0
>>   hrtimer_interrupt+0x110/0x2c0
>>   ? irqtime_account_irq+0x49/0xf0
>>   __sysvec_apic_timer_interrupt+0x5f/0xe0
>>   asm_call_irq_on_stack+0x12/0x20
>>   </IRQ>
>>   sysvec_apic_timer_interrupt+0x72/0x80
>>   asm_sysvec_apic_timer_interrupt+0x12/0x20
>> RIP: 0010:mutex_lock+0x24/0x70
>> RSP: 0018:ffffa836004ab9a8 EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: ffff95bb02d76700 RCX: 0000000000000000
>> RDX: ffff95bb27462100 RSI: 0000000000000000 RDI: ffff95ba5b527000
>> RBP: ffff95ba5b527000 R08: 0000000000000001 R09: ffffa836004abbb8
>> R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
>> R13: ffff95ba5b527000 R14: ffffa836004abbb8 R15: 0000000000000001
>>   __tcf_chain_put+0x27/0x200
>>   tc_new_tfilter+0x5e8/0x810
>>   ? tc_setup_cb_add+0x210/0x210
>>   rtnetlink_rcv_msg+0x2e3/0x380
>>   ? rtnl_calcit.isra.0+0x120/0x120
>>   netlink_rcv_skb+0x50/0x100
>>   netlink_unicast+0x12d/0x1d0
>>   netlink_sendmsg+0x286/0x490
>>   sock_sendmsg+0x62/0x70
>>   ____sys_sendmsg+0x24c/0x2c0
>>   ? import_iovec+0x17/0x20
>>   ? sendmsg_copy_msghdr+0x80/0xa0
>>   ___sys_sendmsg+0x75/0xc0
>>   ? do_fault_around+0x118/0x160
>>   ? do_read_fault+0x68/0xf0
>>   ? __handle_mm_fault+0x3f9/0x6f0
>>   __sys_sendmsg+0x59/0xa0
>>   do_syscall_64+0x33/0x40
>>   entry_SYSCALL_64_after_hwframe+0x61/0xc6
>> RIP: 0033:0x7f96705b8247
>> RSP: 002b:00007ffe552e9dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96705b8247
>> RDX: 0000000000000000 RSI: 00007ffe552e9e40 RDI: 0000000000000003
>> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000558113f678b0
>> R10: 00007f967069ab00 R11: 0000000000000246 R12: 00000000647ea089
>> R13: 00007ffe552e9f30 R14: 0000000000000001 R15: 0000558113175f00
>> 
>> To avoid this case, set chain->flushing to be false if the chain->refcnt
>>   is 1 after flushing the chain when prio is 0.
>> 
>> Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
>> Signed-off-by: Ren Mingshuai <renmingshuai@huawei.com>
>> ---
>>   net/sched/cls_api.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>> 
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 2621550bfddc..68be55d75831 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -2442,6 +2442,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>>   		tfilter_notify_chain(net, skb, block, q, parent, n,
>>   				     chain, RTM_DELTFILTER, extack);
>>   		tcf_chain_flush(chain, rtnl_held);
>> +		/* Set the flushing flags to false to prevent an infinite loop
>> +		 * when a new filter is added.
>> +		 */
>> +		mutex_lock(&chain->filter_chain_lock);
>> +		if (chain->refcnt == 1)
>> +			chain->flushing = false;
>> +		mutex_unlock(&chain->filter_chain_lock);
>>   		err = 0;
>>   		goto errout;
>>   	}
Pedro Tammela June 7, 2023, 3:02 p.m. UTC | #3
On 07/06/2023 01:19, renmingshuai wrote:
>> On 06/06/2023 11:45, renmingshuai wrote:
>>> When a new chain is added by using tc, one soft lockup alarm will be
>>>    generated after delete the prio 0 filter of the chain. To reproduce
>>>    the problem, perform the following steps:
>>> (1) tc qdisc add dev eth0 root handle 1: htb default 1
>>> (2) tc chain add dev eth0
>>> (3) tc filter del dev eth0 chain 0 parent 1: prio 0
>>> (4) tc filter add dev eth0 chain 0 parent 1:
>>
>> This seems like it could be added to tdc or 3 and 4 must be run in parallel?
> 3 and 4 do not need to be run inparallel. When a new chain is added by the
>   way as step 1 and the step 3 is completed, this problem always occurs
>   whenever step 4 is run.

Got it,
The test still hangs with the provided patch.

+ tc qdisc add dev lo root handle 1: htb default 1
+ tc chain add dev lo
+ tc filter del dev lo chain 0 parent 1: prio 0
[   68.790030][ T6704] [+]
[   68.790060][ T6704] chain refcnt 2
[   68.790951][ T6704] [-]
+ tc filter add dev lo chain 0 parent 1:
<hangs>

Also please add this test to tdc, it should be straightforward.

>>>
>>>
>>> The refcnt of the chain added by step 2 is equal to 1. After step 3,
>>>    the flushing flag of the chain is set to true in the tcf_chain_flush()
>>>    called by tc_del_tfilter() because the prio is 0. In this case, if
>>>    we add a new filter to this chain, it will never succeed and try again
>>>    and again because the refresh flash is always true and refcnt is 1.
>>>    A soft lock alarm is generated 20 seconds later.
>>> The stack is show as below:
>>>
>>> Kernel panic - not syncing: softlockup: hung tasks
>>> CPU: 2 PID: 3321861 Comm: tc Kdump: loaded Tainted: G
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
>>> Call Trace:
>>>    <IRQ>
>>>    dump_stack+0x57/0x6e
>>>    panic+0x196/0x3ec
>>>    watchdog_timer_fn.cold+0x16/0x5c
>>>    __run_hrtimer+0x5e/0x190
>>>    __hrtimer_run_queues+0x8a/0xe0
>>>    hrtimer_interrupt+0x110/0x2c0
>>>    ? irqtime_account_irq+0x49/0xf0
>>>    __sysvec_apic_timer_interrupt+0x5f/0xe0
>>>    asm_call_irq_on_stack+0x12/0x20
>>>    </IRQ>
>>>    sysvec_apic_timer_interrupt+0x72/0x80
>>>    asm_sysvec_apic_timer_interrupt+0x12/0x20
>>> RIP: 0010:mutex_lock+0x24/0x70
>>> RSP: 0018:ffffa836004ab9a8 EFLAGS: 00000246
>>> RAX: 0000000000000000 RBX: ffff95bb02d76700 RCX: 0000000000000000
>>> RDX: ffff95bb27462100 RSI: 0000000000000000 RDI: ffff95ba5b527000
>>> RBP: ffff95ba5b527000 R08: 0000000000000001 R09: ffffa836004abbb8
>>> R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
>>> R13: ffff95ba5b527000 R14: ffffa836004abbb8 R15: 0000000000000001
>>>    __tcf_chain_put+0x27/0x200
>>>    tc_new_tfilter+0x5e8/0x810
>>>    ? tc_setup_cb_add+0x210/0x210
>>>    rtnetlink_rcv_msg+0x2e3/0x380
>>>    ? rtnl_calcit.isra.0+0x120/0x120
>>>    netlink_rcv_skb+0x50/0x100
>>>    netlink_unicast+0x12d/0x1d0
>>>    netlink_sendmsg+0x286/0x490
>>>    sock_sendmsg+0x62/0x70
>>>    ____sys_sendmsg+0x24c/0x2c0
>>>    ? import_iovec+0x17/0x20
>>>    ? sendmsg_copy_msghdr+0x80/0xa0
>>>    ___sys_sendmsg+0x75/0xc0
>>>    ? do_fault_around+0x118/0x160
>>>    ? do_read_fault+0x68/0xf0
>>>    ? __handle_mm_fault+0x3f9/0x6f0
>>>    __sys_sendmsg+0x59/0xa0
>>>    do_syscall_64+0x33/0x40
>>>    entry_SYSCALL_64_after_hwframe+0x61/0xc6
>>> RIP: 0033:0x7f96705b8247
>>> RSP: 002b:00007ffe552e9dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96705b8247
>>> RDX: 0000000000000000 RSI: 00007ffe552e9e40 RDI: 0000000000000003
>>> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000558113f678b0
>>> R10: 00007f967069ab00 R11: 0000000000000246 R12: 00000000647ea089
>>> R13: 00007ffe552e9f30 R14: 0000000000000001 R15: 0000558113175f00
>>>
>>> To avoid this case, set chain->flushing to be false if the chain->refcnt
>>>    is 1 after flushing the chain when prio is 0.
>>>
>>> Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
>>> Signed-off-by: Ren Mingshuai <renmingshuai@huawei.com>
>>> ---
>>>    net/sched/cls_api.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 2621550bfddc..68be55d75831 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -2442,6 +2442,13 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>>>    		tfilter_notify_chain(net, skb, block, q, parent, n,
>>>    				     chain, RTM_DELTFILTER, extack);
>>>    		tcf_chain_flush(chain, rtnl_held);
>>> +		/* Set the flushing flags to false to prevent an infinite loop
>>> +		 * when a new filter is added.
>>> +		 */
>>> +		mutex_lock(&chain->filter_chain_lock);
>>> +		if (chain->refcnt == 1)
>>> +			chain->flushing = false;
>>> +		mutex_unlock(&chain->filter_chain_lock);
>>>    		err = 0;
>>>    		goto errout;
>>>    	}
diff mbox series

Patch

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2621550bfddc..68be55d75831 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2442,6 +2442,13 @@  static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify_chain(net, skb, block, q, parent, n,
 				     chain, RTM_DELTFILTER, extack);
 		tcf_chain_flush(chain, rtnl_held);
+		/* Set the flushing flags to false to prevent an infinite loop
+		 * when a new filter is added.
+		 */
+		mutex_lock(&chain->filter_chain_lock);
+		if (chain->refcnt == 1)
+			chain->flushing = false;
+		mutex_unlock(&chain->filter_chain_lock);
 		err = 0;
 		goto errout;
 	}