diff mbox series

[v2] Revert "vrf: Remove unnecessary RCU-bh critical section"

Message ID 20240925185216.1990381-1-greearb@candelatech.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v2] Revert "vrf: Remove unnecessary RCU-bh critical section" | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 4 blamed authors not CCed: kuba@kernel.org dsahern@kernel.org horms@kernel.org idosch@nvidia.com; 6 maintainers not CCed: pabeni@redhat.com kuba@kernel.org horms@kernel.org idosch@nvidia.com edumazet@google.com dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 21 this patch: 21
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 79 this patch: 79
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-26--21-00 (tests: 768)

Commit Message

Ben Greear Sept. 25, 2024, 6:52 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.

dev_queue_xmit_nit needs to run with bh locking, otherwise
it conflicts with packets coming in from a nic in softirq
context and packets being transmitted from user context.

================================
WARNING: inconsistent lock state
6.11.0 #1 Tainted: G        W
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x19a/0x4f0
  _raw_spin_lock+0x27/0x40
  packet_rcv+0xa33/0x1320
  __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
  __netif_receive_skb_list_core+0x2c9/0x890
  netif_receive_skb_list_internal+0x610/0xcc0
  napi_complete_done+0x1c0/0x7c0
  igb_poll+0x1dbb/0x57e0 [igb]
  __napi_poll.constprop.0+0x99/0x430
  net_rx_action+0x8e7/0xe10
  handle_softirqs+0x1b7/0x800
  __irq_exit_rcu+0x91/0xc0
  irq_exit_rcu+0x5/0x10
  [snip]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rlock-AF_PACKET);
  <Interrupt>
    lock(rlock-AF_PACKET);

 *** DEADLOCK ***

Call Trace:
 <TASK>
 dump_stack_lvl+0x73/0xa0
 mark_lock+0x102e/0x16b0
 __lock_acquire+0x9ae/0x6170
 lock_acquire+0x19a/0x4f0
 _raw_spin_lock+0x27/0x40
 tpacket_rcv+0x863/0x3b30
 dev_queue_xmit_nit+0x709/0xa40
 vrf_finish_direct+0x26e/0x340 [vrf]
 vrf_l3_out+0x5f4/0xe80 [vrf]
 __ip_local_out+0x51e/0x7a0
[snip]

Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  Edit patch description.

 drivers/net/vrf.c | 2 ++
 net/core/dev.c    | 1 +
 2 files changed, 3 insertions(+)

Comments

Greg KH Sept. 26, 2024, 8:24 a.m. UTC | #1
On Wed, Sep 25, 2024 at 11:52:16AM -0700, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
> 
> dev_queue_xmit_nit needs to run with bh locking, otherwise
> it conflicts with packets coming in from a nic in softirq
> context and packets being transmitted from user context.
> 
> ================================
> WARNING: inconsistent lock state
> 6.11.0 #1 Tainted: G        W
> --------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
> {IN-SOFTIRQ-W} state was registered at:
>   lock_acquire+0x19a/0x4f0
>   _raw_spin_lock+0x27/0x40
>   packet_rcv+0xa33/0x1320
>   __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
>   __netif_receive_skb_list_core+0x2c9/0x890
>   netif_receive_skb_list_internal+0x610/0xcc0
>   napi_complete_done+0x1c0/0x7c0
>   igb_poll+0x1dbb/0x57e0 [igb]
>   __napi_poll.constprop.0+0x99/0x430
>   net_rx_action+0x8e7/0xe10
>   handle_softirqs+0x1b7/0x800
>   __irq_exit_rcu+0x91/0xc0
>   irq_exit_rcu+0x5/0x10
>   [snip]
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(rlock-AF_PACKET);
>   <Interrupt>
>     lock(rlock-AF_PACKET);
> 
>  *** DEADLOCK ***
> 
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x73/0xa0
>  mark_lock+0x102e/0x16b0
>  __lock_acquire+0x9ae/0x6170
>  lock_acquire+0x19a/0x4f0
>  _raw_spin_lock+0x27/0x40
>  tpacket_rcv+0x863/0x3b30
>  dev_queue_xmit_nit+0x709/0xa40
>  vrf_finish_direct+0x26e/0x340 [vrf]
>  vrf_l3_out+0x5f4/0xe80 [vrf]
>  __ip_local_out+0x51e/0x7a0
> [snip]
> 
> Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
> Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> 
> v2:  Edit patch description.
> 
>  drivers/net/vrf.c | 2 ++
>  net/core/dev.c    | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 4d8ccaf9a2b4..4087f72f0d2b 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>  		eth_zero_addr(eth->h_dest);
>  		eth->h_proto = skb->protocol;
>  
> +		rcu_read_lock_bh();
>  		dev_queue_xmit_nit(skb, vrf_dev);
> +		rcu_read_unlock_bh();
>  
>  		skb_pull(skb, ETH_HLEN);
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd479f5f22f6..566e69a38eed 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
>  /*
>   *	Support routine. Sends outgoing frames to any network
>   *	taps currently in use.
> + *	BH must be disabled before calling this.
>   */
>  
>  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
> -- 
> 2.42.0
> 
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Willem de Bruijn Sept. 26, 2024, 9:03 a.m. UTC | #2
greearb@ wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
> 
> dev_queue_xmit_nit needs to run with bh locking, otherwise
> it conflicts with packets coming in from a nic in softirq
> context and packets being transmitted from user context.
> 
> ================================
> WARNING: inconsistent lock state
> 6.11.0 #1 Tainted: G        W
> --------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
> {IN-SOFTIRQ-W} state was registered at:
>   lock_acquire+0x19a/0x4f0
>   _raw_spin_lock+0x27/0x40
>   packet_rcv+0xa33/0x1320
>   __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
>   __netif_receive_skb_list_core+0x2c9/0x890
>   netif_receive_skb_list_internal+0x610/0xcc0
>   napi_complete_done+0x1c0/0x7c0
>   igb_poll+0x1dbb/0x57e0 [igb]
>   __napi_poll.constprop.0+0x99/0x430
>   net_rx_action+0x8e7/0xe10
>   handle_softirqs+0x1b7/0x800
>   __irq_exit_rcu+0x91/0xc0
>   irq_exit_rcu+0x5/0x10
>   [snip]
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(rlock-AF_PACKET);
>   <Interrupt>
>     lock(rlock-AF_PACKET);
> 
>  *** DEADLOCK ***
> 
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x73/0xa0
>  mark_lock+0x102e/0x16b0
>  __lock_acquire+0x9ae/0x6170
>  lock_acquire+0x19a/0x4f0
>  _raw_spin_lock+0x27/0x40
>  tpacket_rcv+0x863/0x3b30
>  dev_queue_xmit_nit+0x709/0xa40
>  vrf_finish_direct+0x26e/0x340 [vrf]
>  vrf_l3_out+0x5f4/0xe80 [vrf]
>  __ip_local_out+0x51e/0x7a0
> [snip]
> 
> Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
> Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>

Please Cc: all previous reviewers and folks who participated in the
discussion. I entirely missed this. No need to add as Cc tags, just
--cc in git send-email will do.

> ---
> 
> v2:  Edit patch description.
> 
>  drivers/net/vrf.c | 2 ++
>  net/core/dev.c    | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 4d8ccaf9a2b4..4087f72f0d2b 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>  		eth_zero_addr(eth->h_dest);
>  		eth->h_proto = skb->protocol;
>  
> +		rcu_read_lock_bh();
>  		dev_queue_xmit_nit(skb, vrf_dev);
> +		rcu_read_unlock_bh();
>  
>  		skb_pull(skb, ETH_HLEN);
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd479f5f22f6..566e69a38eed 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
>  /*
>   *	Support routine. Sends outgoing frames to any network
>   *	taps currently in use.
> + *	BH must be disabled before calling this.

Unnecessary. Especially patches for stable should be minimal to
reduce the chance of conflicts.
Ben Greear Sept. 26, 2024, 6:19 p.m. UTC | #3
On 9/26/24 02:03, Willem de Bruijn wrote:
> greearb@ wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
>>
>> dev_queue_xmit_nit needs to run with bh locking, otherwise
>> it conflicts with packets coming in from a nic in softirq
>> context and packets being transmitted from user context.
>>
>> ================================
>> WARNING: inconsistent lock state
>> 6.11.0 #1 Tainted: G        W
>> --------------------------------
>> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
>> {IN-SOFTIRQ-W} state was registered at:
>>    lock_acquire+0x19a/0x4f0
>>    _raw_spin_lock+0x27/0x40
>>    packet_rcv+0xa33/0x1320
>>    __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
>>    __netif_receive_skb_list_core+0x2c9/0x890
>>    netif_receive_skb_list_internal+0x610/0xcc0
>>    napi_complete_done+0x1c0/0x7c0
>>    igb_poll+0x1dbb/0x57e0 [igb]
>>    __napi_poll.constprop.0+0x99/0x430
>>    net_rx_action+0x8e7/0xe10
>>    handle_softirqs+0x1b7/0x800
>>    __irq_exit_rcu+0x91/0xc0
>>    irq_exit_rcu+0x5/0x10
>>    [snip]
>>
>> other info that might help us debug this:
>>   Possible unsafe locking scenario:
>>
>>         CPU0
>>         ----
>>    lock(rlock-AF_PACKET);
>>    <Interrupt>
>>      lock(rlock-AF_PACKET);
>>
>>   *** DEADLOCK ***
>>
>> Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x73/0xa0
>>   mark_lock+0x102e/0x16b0
>>   __lock_acquire+0x9ae/0x6170
>>   lock_acquire+0x19a/0x4f0
>>   _raw_spin_lock+0x27/0x40
>>   tpacket_rcv+0x863/0x3b30
>>   dev_queue_xmit_nit+0x709/0xa40
>>   vrf_finish_direct+0x26e/0x340 [vrf]
>>   vrf_l3_out+0x5f4/0xe80 [vrf]
>>   __ip_local_out+0x51e/0x7a0
>> [snip]
>>
>> Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
>> Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
> 
> Please Cc: all previous reviewers and folks who participated in the
> discussion. I entirely missed this. No need to add as Cc tags, just
> --cc in git send-email will do.
> 
>> ---
>>
>> v2:  Edit patch description.
>>
>>   drivers/net/vrf.c | 2 ++
>>   net/core/dev.c    | 1 +
>>   2 files changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
>> index 4d8ccaf9a2b4..4087f72f0d2b 100644
>> --- a/drivers/net/vrf.c
>> +++ b/drivers/net/vrf.c
>> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>>   		eth_zero_addr(eth->h_dest);
>>   		eth->h_proto = skb->protocol;
>>   
>> +		rcu_read_lock_bh();
>>   		dev_queue_xmit_nit(skb, vrf_dev);
>> +		rcu_read_unlock_bh();
>>   
>>   		skb_pull(skb, ETH_HLEN);
>>   	}
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cd479f5f22f6..566e69a38eed 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
>>   /*
>>    *	Support routine. Sends outgoing frames to any network
>>    *	taps currently in use.
>> + *	BH must be disabled before calling this.
> 
> Unnecessary. Especially patches for stable should be minimal to
> reduce the chance of conflicts.

I was asked to add this, and also asked to CC stable.

Thanks,
Ben
Greg KH Sept. 27, 2024, 6:19 a.m. UTC | #4
On Thu, Sep 26, 2024 at 11:19:53AM -0700, Ben Greear wrote:
> On 9/26/24 02:03, Willem de Bruijn wrote:
> > greearb@ wrote:
> > > From: Ben Greear <greearb@candelatech.com>
> > > 
> > > This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
> > > 
> > > dev_queue_xmit_nit needs to run with bh locking, otherwise
> > > it conflicts with packets coming in from a nic in softirq
> > > context and packets being transmitted from user context.
> > > 
> > > ================================
> > > WARNING: inconsistent lock state
> > > 6.11.0 #1 Tainted: G        W
> > > --------------------------------
> > > inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > > btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
> > > {IN-SOFTIRQ-W} state was registered at:
> > >    lock_acquire+0x19a/0x4f0
> > >    _raw_spin_lock+0x27/0x40
> > >    packet_rcv+0xa33/0x1320
> > >    __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
> > >    __netif_receive_skb_list_core+0x2c9/0x890
> > >    netif_receive_skb_list_internal+0x610/0xcc0
> > >    napi_complete_done+0x1c0/0x7c0
> > >    igb_poll+0x1dbb/0x57e0 [igb]
> > >    __napi_poll.constprop.0+0x99/0x430
> > >    net_rx_action+0x8e7/0xe10
> > >    handle_softirqs+0x1b7/0x800
> > >    __irq_exit_rcu+0x91/0xc0
> > >    irq_exit_rcu+0x5/0x10
> > >    [snip]
> > > 
> > > other info that might help us debug this:
> > >   Possible unsafe locking scenario:
> > > 
> > >         CPU0
> > >         ----
> > >    lock(rlock-AF_PACKET);
> > >    <Interrupt>
> > >      lock(rlock-AF_PACKET);
> > > 
> > >   *** DEADLOCK ***
> > > 
> > > Call Trace:
> > >   <TASK>
> > >   dump_stack_lvl+0x73/0xa0
> > >   mark_lock+0x102e/0x16b0
> > >   __lock_acquire+0x9ae/0x6170
> > >   lock_acquire+0x19a/0x4f0
> > >   _raw_spin_lock+0x27/0x40
> > >   tpacket_rcv+0x863/0x3b30
> > >   dev_queue_xmit_nit+0x709/0xa40
> > >   vrf_finish_direct+0x26e/0x340 [vrf]
> > >   vrf_l3_out+0x5f4/0xe80 [vrf]
> > >   __ip_local_out+0x51e/0x7a0
> > > [snip]
> > > 
> > > Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
> > > Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
> > > 
> > > Signed-off-by: Ben Greear <greearb@candelatech.com>
> > 
> > Please Cc: all previous reviewers and folks who participated in the
> > discussion. I entirely missed this. No need to add as Cc tags, just
> > --cc in git send-email will do.
> > 
> > > ---
> > > 
> > > v2:  Edit patch description.
> > > 
> > >   drivers/net/vrf.c | 2 ++
> > >   net/core/dev.c    | 1 +
> > >   2 files changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> > > index 4d8ccaf9a2b4..4087f72f0d2b 100644
> > > --- a/drivers/net/vrf.c
> > > +++ b/drivers/net/vrf.c
> > > @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
> > >   		eth_zero_addr(eth->h_dest);
> > >   		eth->h_proto = skb->protocol;
> > > +		rcu_read_lock_bh();
> > >   		dev_queue_xmit_nit(skb, vrf_dev);
> > > +		rcu_read_unlock_bh();
> > >   		skb_pull(skb, ETH_HLEN);
> > >   	}
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index cd479f5f22f6..566e69a38eed 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
> > >   /*
> > >    *	Support routine. Sends outgoing frames to any network
> > >    *	taps currently in use.
> > > + *	BH must be disabled before calling this.
> > 
> > Unnecessary. Especially patches for stable should be minimal to
> > reduce the chance of conflicts.
> 
> I was asked to add this, and also asked to CC stable.


<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Willem de Bruijn Sept. 27, 2024, 9:35 a.m. UTC | #5
Ben Greear wrote:
> On 9/26/24 02:03, Willem de Bruijn wrote:
> > greearb@ wrote:
> >> From: Ben Greear <greearb@candelatech.com>
> >>
> >> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
> >>
> >> dev_queue_xmit_nit needs to run with bh locking, otherwise
> >> it conflicts with packets coming in from a nic in softirq
> >> context and packets being transmitted from user context.
> >>
> >> ================================
> >> WARNING: inconsistent lock state
> >> 6.11.0 #1 Tainted: G        W
> >> --------------------------------
> >> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> >> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
> >> {IN-SOFTIRQ-W} state was registered at:
> >>    lock_acquire+0x19a/0x4f0
> >>    _raw_spin_lock+0x27/0x40
> >>    packet_rcv+0xa33/0x1320
> >>    __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
> >>    __netif_receive_skb_list_core+0x2c9/0x890
> >>    netif_receive_skb_list_internal+0x610/0xcc0
> >>    napi_complete_done+0x1c0/0x7c0
> >>    igb_poll+0x1dbb/0x57e0 [igb]
> >>    __napi_poll.constprop.0+0x99/0x430
> >>    net_rx_action+0x8e7/0xe10
> >>    handle_softirqs+0x1b7/0x800
> >>    __irq_exit_rcu+0x91/0xc0
> >>    irq_exit_rcu+0x5/0x10
> >>    [snip]
> >>
> >> other info that might help us debug this:
> >>   Possible unsafe locking scenario:
> >>
> >>         CPU0
> >>         ----
> >>    lock(rlock-AF_PACKET);
> >>    <Interrupt>
> >>      lock(rlock-AF_PACKET);
> >>
> >>   *** DEADLOCK ***
> >>
> >> Call Trace:
> >>   <TASK>
> >>   dump_stack_lvl+0x73/0xa0
> >>   mark_lock+0x102e/0x16b0
> >>   __lock_acquire+0x9ae/0x6170
> >>   lock_acquire+0x19a/0x4f0
> >>   _raw_spin_lock+0x27/0x40
> >>   tpacket_rcv+0x863/0x3b30
> >>   dev_queue_xmit_nit+0x709/0xa40
> >>   vrf_finish_direct+0x26e/0x340 [vrf]
> >>   vrf_l3_out+0x5f4/0xe80 [vrf]
> >>   __ip_local_out+0x51e/0x7a0
> >> [snip]
> >>
> >> Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
> >> Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
> >>
> >> Signed-off-by: Ben Greear <greearb@candelatech.com>
> > 
> > Please Cc: all previous reviewers and folks who participated in the
> > discussion. I entirely missed this. No need to add as Cc tags, just
> > --cc in git send-email will do.
> > 
> >> ---
> >>
> >> v2:  Edit patch description.
> >>
> >>   drivers/net/vrf.c | 2 ++
> >>   net/core/dev.c    | 1 +
> >>   2 files changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> >> index 4d8ccaf9a2b4..4087f72f0d2b 100644
> >> --- a/drivers/net/vrf.c
> >> +++ b/drivers/net/vrf.c
> >> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
> >>   		eth_zero_addr(eth->h_dest);
> >>   		eth->h_proto = skb->protocol;
> >>   
> >> +		rcu_read_lock_bh();
> >>   		dev_queue_xmit_nit(skb, vrf_dev);
> >> +		rcu_read_unlock_bh();
> >>   
> >>   		skb_pull(skb, ETH_HLEN);
> >>   	}
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index cd479f5f22f6..566e69a38eed 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
> >>   /*
> >>    *	Support routine. Sends outgoing frames to any network
> >>    *	taps currently in use.
> >> + *	BH must be disabled before calling this.
> > 
> > Unnecessary. Especially patches for stable should be minimal to
> > reduce the chance of conflicts.
> 
> I was asked to add this, and also asked to CC stable.

Conflicting feedback is annoying, but I disagree with the other
comment.

Not only does it potentially complicate backporting, it also is no
longer a strict revert. In which case it must not be labeled as such.

Easier is to keep the revert unmodified, and add the comment to the
commit message.

Most important is the Link to our earlier thread that explains the
history.

If the process appears byzantine, if you prefer I can also send it.

Thanks,

  Willem
Ben Greear Sept. 27, 2024, 1:39 p.m. UTC | #6
On 9/27/24 02:35, Willem de Bruijn wrote:
> Ben Greear wrote:
>> On 9/26/24 02:03, Willem de Bruijn wrote:
>>> greearb@ wrote:
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
>>>>
>>>> dev_queue_xmit_nit needs to run with bh locking, otherwise
>>>> it conflicts with packets coming in from a nic in softirq
>>>> context and packets being transmitted from user context.
>>>>
>>>> ================================
>>>> WARNING: inconsistent lock state
>>>> 6.11.0 #1 Tainted: G        W
>>>> --------------------------------
>>>> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>>> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>>> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
>>>> {IN-SOFTIRQ-W} state was registered at:
>>>>     lock_acquire+0x19a/0x4f0
>>>>     _raw_spin_lock+0x27/0x40
>>>>     packet_rcv+0xa33/0x1320
>>>>     __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
>>>>     __netif_receive_skb_list_core+0x2c9/0x890
>>>>     netif_receive_skb_list_internal+0x610/0xcc0
>>>>     napi_complete_done+0x1c0/0x7c0
>>>>     igb_poll+0x1dbb/0x57e0 [igb]
>>>>     __napi_poll.constprop.0+0x99/0x430
>>>>     net_rx_action+0x8e7/0xe10
>>>>     handle_softirqs+0x1b7/0x800
>>>>     __irq_exit_rcu+0x91/0xc0
>>>>     irq_exit_rcu+0x5/0x10
>>>>     [snip]
>>>>
>>>> other info that might help us debug this:
>>>>    Possible unsafe locking scenario:
>>>>
>>>>          CPU0
>>>>          ----
>>>>     lock(rlock-AF_PACKET);
>>>>     <Interrupt>
>>>>       lock(rlock-AF_PACKET);
>>>>
>>>>    *** DEADLOCK ***
>>>>
>>>> Call Trace:
>>>>    <TASK>
>>>>    dump_stack_lvl+0x73/0xa0
>>>>    mark_lock+0x102e/0x16b0
>>>>    __lock_acquire+0x9ae/0x6170
>>>>    lock_acquire+0x19a/0x4f0
>>>>    _raw_spin_lock+0x27/0x40
>>>>    tpacket_rcv+0x863/0x3b30
>>>>    dev_queue_xmit_nit+0x709/0xa40
>>>>    vrf_finish_direct+0x26e/0x340 [vrf]
>>>>    vrf_l3_out+0x5f4/0xe80 [vrf]
>>>>    __ip_local_out+0x51e/0x7a0
>>>> [snip]
>>>>
>>>> Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
>>>> Link: https://lore.kernel.org/netdev/05765015-f727-2f30-58da-2ad6fa7ea99f@candelatech.com/T/
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>
>>> Please Cc: all previous reviewers and folks who participated in the
>>> discussion. I entirely missed this. No need to add as Cc tags, just
>>> --cc in git send-email will do.
>>>
>>>> ---
>>>>
>>>> v2:  Edit patch description.
>>>>
>>>>    drivers/net/vrf.c | 2 ++
>>>>    net/core/dev.c    | 1 +
>>>>    2 files changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
>>>> index 4d8ccaf9a2b4..4087f72f0d2b 100644
>>>> --- a/drivers/net/vrf.c
>>>> +++ b/drivers/net/vrf.c
>>>> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>>>>    		eth_zero_addr(eth->h_dest);
>>>>    		eth->h_proto = skb->protocol;
>>>>    
>>>> +		rcu_read_lock_bh();
>>>>    		dev_queue_xmit_nit(skb, vrf_dev);
>>>> +		rcu_read_unlock_bh();
>>>>    
>>>>    		skb_pull(skb, ETH_HLEN);
>>>>    	}
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index cd479f5f22f6..566e69a38eed 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
>>>>    /*
>>>>     *	Support routine. Sends outgoing frames to any network
>>>>     *	taps currently in use.
>>>> + *	BH must be disabled before calling this.
>>>
>>> Unnecessary. Especially patches for stable should be minimal to
>>> reduce the chance of conflicts.
>>
>> I was asked to add this, and also asked to CC stable.
> 
> Conflicting feedback is annoying, but I disagree with the other
> comment.
> 
> Not only does it potentially complicate backporting, it also is no
> longer a strict revert. In which case it must not be labeled as such.
> 
> Easier is to keep the revert unmodified, and add the comment to the
> commit message.
> 
> Most important is the Link to our earlier thread that explains the
> history.
> 
> If the process appears byzantine, if you prefer I can also send it.

I would appreciate it if you can send it.  As long as it functions,
I will be happy.

Thanks,
Ben

> 
> Thanks,
> 
>    Willem
> 
> 
>
diff mbox series

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4d8ccaf9a2b4..4087f72f0d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,9 @@  static void vrf_finish_direct(struct sk_buff *skb)
 		eth_zero_addr(eth->h_dest);
 		eth->h_proto = skb->protocol;
 
+		rcu_read_lock_bh();
 		dev_queue_xmit_nit(skb, vrf_dev);
+		rcu_read_unlock_bh();
 
 		skb_pull(skb, ETH_HLEN);
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index cd479f5f22f6..566e69a38eed 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2285,6 +2285,7 @@  EXPORT_SYMBOL_GPL(dev_nit_active);
 /*
  *	Support routine. Sends outgoing frames to any network
  *	taps currently in use.
+ *	BH must be disabled before calling this.
  */
 
 void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)