diff mbox series

mac80211: do not iterate active interfaces when in re-configure

Message ID 20200525165317.2269-1-greearb@candelatech.com (mailing list archive)
State Changes Requested
Delegated to: Johannes Berg
Headers show
Series mac80211: do not iterate active interfaces when in re-configure | expand

Commit Message

Ben Greear May 25, 2020, 4:53 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This appears to fix a problem where ath10k firmware would crash,
mac80211 would start re-adding interfaces to the driver, but the
iterate-active-interfaces logic would then try to use the half-built
interfaces.  With a bit of extra debug to catch the problem, the
ath10k crash looks like this:

ath10k_pci 0000:05:00.0: Initializing arvif: ffff8801ce97e320 on vif: ffff8801ce97e1d8

[the print that happens after arvif->ar is assigned is not shown, so code did not make it that far before
 the tx-beacon-nowait method was called]

tx-beacon-nowait:  arvif: ffff8801ce97e320  ar:           (null)
arvif->magic: 0x87560001
------------[ cut here ]------------
kernel BUG at /home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/wmi.c:1781!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 bridge carl9170 mac80211_hwsim ath10k_pci ath10k_core ath5k ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 8021q garp mrp stp llc bnep bluetooth fuse macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache snd_hda_codec_hdmi coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support joydev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device pcspkr snd_pcm snd_timer shpchp snd i2c_i801 lpc_ich soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 serio_raw i2c_algo_bit drm_kms_helper ata_generic e1000e pata_acpi drm ptp pps_core i2c_core fjes video ipv6 [last unloaded: nf_conntrack]
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.10+ #15
Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
task: ffff8801d4f20000 ti: ffff8801d4f28000 task.ti: ffff8801d4f28000
RIP: 0010:[<ffffffffa0efbcfb>]  [<ffffffffa0efbcfb>] ath10k_wmi_tx_beacons_iter+0x28b/0x290 [ath10k_core]
RSP: 0018:ffff8801d6447a98  EFLAGS: 00010293
RAX: 0000000000000018 RBX: ffff8801ce97e1d8 RCX: 0000000000000000
RDX: 0000000000000018 RSI: 0000000000000003 RDI: ffffed003ac88f49
RBP: ffff8801d6447af0 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8801ce97e320 R14: ffff8801ce97e378 R15: ffff8801ce97ca40
FS:  0000000000000000(0000) GS:ffff8801d6440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007eff191ef1ab CR3: 000000000260a000 CR4: 00000000001406e0
Stack:
 1ffff1003ac88f59 0000000041b58ab3 ffffffffa0f4d52a ffff8801d4f20000
 0000000000000246 0000000000000002 ffff8801ce97e1d8 ffff8801bd5d39b8
 0000000000000002 0000000000000001 ffff8801ce97ca40 ffff8801d6447b48
Call Trace:
 <IRQ>
 [<ffffffffa0d03e5c>] __iterate_interfaces+0xfc/0x1d0 [mac80211]
 [<ffffffffa0efba70>] ? ath10k_wmi_cmd_send_nowait+0x260/0x260 [ath10k_core]
 [<ffffffffa0efba70>] ? ath10k_wmi_cmd_send_nowait+0x260/0x260 [ath10k_core]
 [<ffffffffa0d04477>] ieee80211_iterate_active_interfaces_atomic+0x67/0x100 [mac80211]
 [<ffffffffa0d04410>] ? ieee80211_handle_reconfig_failure+0x140/0x140 [mac80211]
 [<ffffffffa0ef4060>] ? ath10k_tpc_config_disp_tables+0x620/0x620 [ath10k_core]
 [<ffffffffa0ef408b>] ath10k_wmi_op_ep_tx_credits+0x2b/0x50 [ath10k_core]
 [<ffffffffa0ee2fd2>] ath10k_htc_rx_completion_handler+0x422/0x5c0 [ath10k_core]
 [<ffffffffa0b4301e>] ath10k_pci_process_rx_cb+0x37e/0x430 [ath10k_pci]
 [<ffffffffa0ee2bb0>] ? ath10k_htc_build_tx_ctrl_skb+0xc0/0xc0 [ath10k_core]
 [<ffffffffa0b42ca0>] ? ath10k_pci_rx_post_pipe+0x550/0x550 [ath10k_pci]
 [<ffffffff8120cbe5>] ? debug_lockdep_rcu_enabled+0x35/0x40
 [<ffffffff811e1893>] ? mark_held_locks+0x23/0xc0
 [<ffffffff8116019a>] ? __local_bh_enable_ip+0x6a/0xd0
 [<ffffffff811e1abb>] ? trace_hardirqs_on_caller+0x18b/0x290
 [<ffffffff811e1bcd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8116019a>] ? __local_bh_enable_ip+0x6a/0xd0
 [<ffffffff81df11d0>] ? _raw_spin_unlock_bh+0x30/0x40
 [<ffffffffa0b4902e>] ? ath10k_ce_per_engine_service+0xee/0x100 [ath10k_pci]
 [<ffffffffa0b43139>] ath10k_pci_htt_htc_rx_cb+0x29/0x30 [ath10k_pci]
 [<ffffffffa0b48fe6>] ath10k_ce_per_engine_service+0xa6/0x100 [ath10k_pci]
 [<ffffffffa0b49116>] ath10k_ce_per_engine_service_any+0xd6/0xf0 [ath10k_pci]
 [<ffffffffa0b45800>] ? ath10k_pci_enable_legacy_irq+0xe0/0xe0 [ath10k_pci]
 [<ffffffffa0b4585f>] ath10k_pci_tasklet+0x5f/0xb0 [ath10k_pci]
 [<ffffffff81160445>] tasklet_action+0x245/0x2b0
 [<ffffffff81df4831>] __do_softirq+0x181/0x595
 [<ffffffff8116137c>] irq_exit+0xbc/0xc0
 [<ffffffff81df423c>] do_IRQ+0x7c/0x150
 [<ffffffff81df23cc>] common_interrupt+0x8c/0x8c
 <EOI>
 [<ffffffff811e1abb>] ? trace_hardirqs_on_caller+0x18b/0x290
 [<ffffffff81b722ae>] ? cpuidle_enter_state+0x1ae/0x4b0
 [<ffffffff81b722a7>] ? cpuidle_enter_state+0x1a7/0x4b0
 [<ffffffff81b72602>] cpuidle_enter+0x12/0x20
 [<ffffffff811d0b6e>] call_cpuidle+0x4e/0x90
 [<ffffffff811d10e7>] cpu_startup_entry+0x3f7/0x540
 [<ffffffff811d0cf0>] ? default_idle_call+0x50/0x50
 [<ffffffff81234bdf>] ? clockevents_config_and_register+0x5f/0x70
 [<ffffffff81085a9a>] ? setup_APIC_timer+0xfa/0x110
 [<ffffffff81083b63>] start_secondary+0x253/0x2b0
 [<ffffffff81083910>] ? set_cpu_sibling_map+0x920/0x920
Code: 4d 49 e0 8b b3 48 01 00 00 48 c7 c7 a0 ee f3 a0 e8 d9 c2 3f e0 49 81 fd 3f 1f 00 00 76 0f 49 81 fc 3f 1f 00 00 0f 87 c0 fd ff ff <0f> 0b 0f 0b 90 55 48 89 e5 41 57 41 56 48 8d 85 58 ff ff ff 41
RIP  [<ffffffffa0efbcfb>] ath10k_wmi_tx_beacons_iter+0x28b/0x290 [ath10k_core]
 RSP <ffff8801d6447a98>
---[ end trace 6588464714e5163a ]---

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 net/mac80211/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Johannes Berg July 30, 2020, 11:48 a.m. UTC | #1
On Mon, 2020-05-25 at 09:53 -0700, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This appears to fix a problem where ath10k firmware would crash,


"appears to", heh

Really though, in general, you need to start thinking about mac80211 and
the drivers as two separate things. You've also submitted another patch
where you say "this iwlwifi thing requires mac80211 to change", and here
you're submitting a patch saying "this ath10k thing requires mac80211 to
change", but I don't see you considering much the fact that mac80211 is
actually used for both.

It'd be good to have a discussion of such things in the commit log for
changes that will affect multiple drivers, rather than focusing on a
single bug for a single driver. In general, not just in this patch.

> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
> index 5db2cd0..186a696 100644
> --- a/net/mac80211/util.c
> +++ b/net/mac80211/util.c
> @@ -831,7 +831,7 @@ static void __iterate_interfaces(struct ieee80211_local *local,
>  			break;
>  		}
>  		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
> -		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
> +		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
>  			continue;

Anyway, this seems wrong to me. If anything, it should skip those
interfaces that weren't re-added yet, but not all of them. I'm pretty
sure this would cause iwlwifi to misbehave with multiple interfaces, as
it sometimes relies on iterating to understand what else is going on
before configuring something.

It might even be that this can only be done subject to driver choice.

johannes
Ben Greear July 30, 2020, 1:05 p.m. UTC | #2
On 7/30/20 4:48 AM, Johannes Berg wrote:
> On Mon, 2020-05-25 at 09:53 -0700, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> This appears to fix a problem where ath10k firmware would crash,
> 
> 
> "appears to", heh
> 
> Really though, in general, you need to start thinking about mac80211 and
> the drivers as two separate things. You've also submitted another patch
> where you say "this iwlwifi thing requires mac80211 to change", and here
> you're submitting a patch saying "this ath10k thing requires mac80211 to
> change", but I don't see you considering much the fact that mac80211 is
> actually used for both.
> 
> It'd be good to have a discussion of such things in the commit log for
> changes that will affect multiple drivers, rather than focusing on a
> single bug for a single driver. In general, not just in this patch.
> 
>> diff --git a/net/mac80211/util.c b/net/mac80211/util.c
>> index 5db2cd0..186a696 100644
>> --- a/net/mac80211/util.c
>> +++ b/net/mac80211/util.c
>> @@ -831,7 +831,7 @@ static void __iterate_interfaces(struct ieee80211_local *local,
>>   			break;
>>   		}
>>   		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
>> -		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
>> +		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
>>   			continue;
> 
> Anyway, this seems wrong to me. If anything, it should skip those
> interfaces that weren't re-added yet, but not all of them. I'm pretty
> sure this would cause iwlwifi to misbehave with multiple interfaces, as
> it sometimes relies on iterating to understand what else is going on
> before configuring something.
> 
> It might even be that this can only be done subject to driver choice.

I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and
64 station vifs on ath10k radios, probably way harder than anyone else is testing
this sort of thing.

Possibly you are correct about iwlwifi, I've never tested it with multi-interface,
and as you see, have had bad luck on ax200.

If you'd accept a patch with a new driver flag check (which I can enable for
ath10k and ath9k), then I'll respin it thus.

Thanks,
Ben
Johannes Berg July 30, 2020, 1:13 p.m. UTC | #3
On Thu, 2020-07-30 at 06:05 -0700, Ben Greear wrote:

> > It might even be that this can only be done subject to driver choice.
> 
> I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and
> 64 station vifs on ath10k radios, probably way harder than anyone else is testing
> this sort of thing.

Yeah, I'm sure!

> Possibly you are correct about iwlwifi, I've never tested it with multi-interface,
> and as you see, have had bad luck on ax200.

Right.

> If you'd accept a patch with a new driver flag check (which I can enable for
> ath10k and ath9k), then I'll respin it thus.

My order of preference would be something like

1. track per vif whether it was re-added, and skip before it is

If that works, I can certainly get behind it for semantic reasons (the
vif isn't yet active again), although even there I'm not sure how
iwlwifi would behave - but that's something I'd look into and perhaps
even consider a bug there since it shouldn't know about that interface
yet.

2. If for some reason that doesn't work, add an iteration flag that
controls this, rather than a per-device config?

johannes
Ben Greear July 30, 2020, 1:27 p.m. UTC | #4
On 7/30/20 6:13 AM, Johannes Berg wrote:
> On Thu, 2020-07-30 at 06:05 -0700, Ben Greear wrote:
> 
>>> It might even be that this can only be done subject to driver choice.
>>
>> I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and
>> 64 station vifs on ath10k radios, probably way harder than anyone else is testing
>> this sort of thing.
> 
> Yeah, I'm sure!
> 
>> Possibly you are correct about iwlwifi, I've never tested it with multi-interface,
>> and as you see, have had bad luck on ax200.
> 
> Right.
> 
>> If you'd accept a patch with a new driver flag check (which I can enable for
>> ath10k and ath9k), then I'll respin it thus.
> 
> My order of preference would be something like
> 
> 1. track per vif whether it was re-added, and skip before it is
> 
> If that works, I can certainly get behind it for semantic reasons (the
> vif isn't yet active again), although even there I'm not sure how
> iwlwifi would behave - but that's something I'd look into and perhaps
> even consider a bug there since it shouldn't know about that interface
> yet.

Wouldn't that complicate the sdata-in-driver check even more?  So it would
be something like is-in-driver-but-not-yet-reconfigured-in-driver?

This sort of state is quite nasty in my experience.  Almost better if
we had less state.  Driver should already know if it has an object or not,
so redundant adds, or requests from mac80211 for objects the driver does not
have can be handled properly by the driver?

> 
> 2. If for some reason that doesn't work, add an iteration flag that
> controls this, rather than a per-device config?

I wrote this patch probably 5 or so years ago, and since I have fixed most of
the ath10k firmware crashes that would tend to trigger this bug.  I think I have no
good way to test any complicated change to this patch.

I am quite certain that ath9k and ath10k are at least not harmed by this patch, and
certainly old ath10k benefited from this.  So, I'm comfortable adding a driver
level flag enabled for those two drivers.  Changing all the calling locations to
(maybe) add a flag or adding and tracking vif state is too risky to be worth this change I
think.

Thanks,
Ben



> 
> johannes
>
Johannes Berg July 30, 2020, 1:41 p.m. UTC | #5
On Thu, 2020-07-30 at 06:27 -0700, Ben Greear wrote:

> > 1. track per vif whether it was re-added, and skip before it is
> > 
> > If that works, I can certainly get behind it for semantic reasons (the
> > vif isn't yet active again), although even there I'm not sure how
> > iwlwifi would behave - but that's something I'd look into and perhaps
> > even consider a bug there since it shouldn't know about that interface
> > yet.
> 
> Wouldn't that complicate the sdata-in-driver check even more?  So it would
> be something like is-in-driver-but-not-yet-reconfigured-in-driver?

We should probably just clear the "is-in-driver" flag for the most part,
and just remember "was-in-driver" so we know which ones to reconfigure,
or something like that?

> This sort of state is quite nasty in my experience.  Almost better if
> we had less state.  Driver should already know if it has an object or not,
> so redundant adds, or requests from mac80211 for objects the driver does not
> have can be handled properly by the driver?

I don't think that'll work. Drivers just act on "add" and "remove",
they're not checking against double-add. And IMHO it makes more sense to
have mac80211 do good sequencing than the throw our hands in the air and
let the drivers have to be idempotent just because we can't figure out
the right sequencing?

> > 2. If for some reason that doesn't work, add an iteration flag that
> > controls this, rather than a per-device config?
> 
> I wrote this patch probably 5 or so years ago, and since I have fixed most of
> the ath10k firmware crashes that would tend to trigger this bug.  I think I have no
> good way to test any complicated change to this patch.
> 
> I am quite certain that ath9k and ath10k are at least not harmed by this patch, and
> certainly old ath10k benefited from this.  So, I'm comfortable adding a driver
> level flag enabled for those two drivers.  Changing all the calling locations to
> (maybe) add a flag or adding and tracking vif state is too risky to be worth this change I
> think.

Uh, why? I mean, at least mechanically replacing all the callers in that
driver wouldn't really be any different than adding a driver flag, but
is so much more flexible and can be used elsewhere. I don't think I buy
this argument much really.

Yes, I understand that there's always resistance to changing something
that works, but ...

Really I think the right thing to do here would be 1., and let mac80211
sort out the sequencing.

Consider

add interface wlan0
add interface wlan1
iterate active interfaces -> wlan0 wlan1
add interface wlan2
iterate active interfaces -> wlan0 wlan1 wlan2

If you apply this scenario to a restart, which ought to be functionally
equivalent to the normal startup, just compressed in time, you're
basically saying that today you get

add interface wlan0
add interface wlan1
iterate active interfaces -> wlan0 wlan1 wlan2 << problem here
add interface wlan2
iterate active interfaces -> wlan0 wlan1 wlan2

which yeah, totally seems wrong.

But fixing that to be

add interface wlan0
add interface wlan1
iterate active interfaces ->
<nothing>
add interface wlan2
iterate active interfaces -> <nothing>
(or
maybe -> wlan0 wlan1 wlan2 if the reconfig already completed)

seems equally wrong?

johannes
Ben Greear July 30, 2020, 2:52 p.m. UTC | #6
On 7/30/20 6:41 AM, Johannes Berg wrote:
> On Thu, 2020-07-30 at 06:27 -0700, Ben Greear wrote:
> 
>>> 1. track per vif whether it was re-added, and skip before it is
>>>
>>> If that works, I can certainly get behind it for semantic reasons (the
>>> vif isn't yet active again), although even there I'm not sure how
>>> iwlwifi would behave - but that's something I'd look into and perhaps
>>> even consider a bug there since it shouldn't know about that interface
>>> yet.
>>
>> Wouldn't that complicate the sdata-in-driver check even more?  So it would
>> be something like is-in-driver-but-not-yet-reconfigured-in-driver?
> 
> We should probably just clear the "is-in-driver" flag for the most part,
> and just remember "was-in-driver" so we know which ones to reconfigure,
> or something like that?
> 
>> This sort of state is quite nasty in my experience.  Almost better if
>> we had less state.  Driver should already know if it has an object or not,
>> so redundant adds, or requests from mac80211 for objects the driver does not
>> have can be handled properly by the driver?
> 
> I don't think that'll work. Drivers just act on "add" and "remove",
> they're not checking against double-add. And IMHO it makes more sense to
> have mac80211 do good sequencing than the throw our hands in the air and
> let the drivers have to be idempotent just because we can't figure out
> the right sequencing?

They certainly could check.  At the least, removing something that is already
gone should be an easy check.  That would at least let mac80211 be more confident
about cleanup.  The current in-kernel code that cleared SDATA_IN_DRIVER in
the non-recoverable firmware crash bit just assumed that drivers would clean
things up, but I'm not even sure they can entirely clean things up since often
the objects in their local memory are held by mac80211 objects (ie, the priv
opaque objects).


>>> 2. If for some reason that doesn't work, add an iteration flag that
>>> controls this, rather than a per-device config?
>>
>> I wrote this patch probably 5 or so years ago, and since I have fixed most of
>> the ath10k firmware crashes that would tend to trigger this bug.  I think I have no
>> good way to test any complicated change to this patch.
>>
>> I am quite certain that ath9k and ath10k are at least not harmed by this patch, and
>> certainly old ath10k benefited from this.  So, I'm comfortable adding a driver
>> level flag enabled for those two drivers.  Changing all the calling locations to
>> (maybe) add a flag or adding and tracking vif state is too risky to be worth this change I
>> think.
> 
> Uh, why? I mean, at least mechanically replacing all the callers in that
> driver wouldn't really be any different than adding a driver flag, but
> is so much more flexible and can be used elsewhere. I don't think I buy
> this argument much really.
> 
> Yes, I understand that there's always resistance to changing something
> that works, but ...
> 
> Really I think the right thing to do here would be 1., and let mac80211
> sort out the sequencing.
> 
> Consider
> 
> add interface wlan0
> add interface wlan1
> iterate active interfaces -> wlan0 wlan1
> add interface wlan2
> iterate active interfaces -> wlan0 wlan1 wlan2
> 
> If you apply this scenario to a restart, which ought to be functionally
> equivalent to the normal startup, just compressed in time, you're
> basically saying that today you get
> 
> add interface wlan0
> add interface wlan1
> iterate active interfaces -> wlan0 wlan1 wlan2 << problem here
> add interface wlan2
> iterate active interfaces -> wlan0 wlan1 wlan2
> 
> which yeah, totally seems wrong.
> 
> But fixing that to be
> 
> add interface wlan0
> add interface wlan1
> iterate active interfaces ->
> <nothing>
> add interface wlan2
> iterate active interfaces -> <nothing>
> (or
> maybe -> wlan0 wlan1 wlan2 if the reconfig already completed)
> 
> seems equally wrong?

So, looks like there is a flags option passed to the iterate logic, and it is indeed called
directly from drivers.  So, I could just add a new flag value, and | it in when calling from ath10k.

I'm not sure it would really solve the second case, but at least in practice,
that one doesn't seem to be a problem with ath10k, and the first case *was*
a problem.

If that sounds OK to you, I'll work on the patch as described.

Thanks,
Ben

> 
> johannes
>
Johannes Berg July 30, 2020, 3:03 p.m. UTC | #7
On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote:

> > Consider
> > 
> > add interface wlan0
> > add interface wlan1
> > iterate active interfaces -> wlan0 wlan1
> > add interface wlan2
> > iterate active interfaces -> wlan0 wlan1 wlan2
> > 
> > If you apply this scenario to a restart, which ought to be functionally
> > equivalent to the normal startup, just compressed in time, you're
> > basically saying that today you get
> > 
> > add interface wlan0
> > add interface wlan1
> > iterate active interfaces -> wlan0 wlan1 wlan2 << problem here
> > add interface wlan2
> > iterate active interfaces -> wlan0 wlan1 wlan2
> > 
> > which yeah, totally seems wrong.
> > 
> > But fixing that to be
> > 
> > add interface wlan0
> > add interface wlan1
> > iterate active interfaces ->
> > <nothing>
> > add interface wlan2
> > iterate active interfaces -> <nothing>
> > (or
> > maybe -> wlan0 wlan1 wlan2 if the reconfig already completed)
> > 
> > seems equally wrong?
> 
> So, looks like there is a flags option passed to the iterate logic, and it is indeed called
> directly from drivers.  So, I could just add a new flag value, and | it in when calling from ath10k.
> 
> I'm not sure it would really solve the second case, but at least in practice,
> that one doesn't seem to be a problem with ath10k, and the first case *was*
> a problem.
> 
> If that sounds OK to you, I'll work on the patch as described.

Right, that'd be the option 2. I described earlier. I can live with that
even if I'd prefer to fix it as per 1. to "make sense". But I guess
there could even be "more legitimate" cases to not want to iterate while
restarting, even if I'm not really sure where that'd make sense?

I guess Kalle should comment on whether he'd accept that into the
driver.

Kalle, as you can see above mac80211 appears to be broken wrt. iterating
"active" interfaces during a restart - the iteration considers all
interfaces active that were active before the restart, not just the ones
that were already re-added to the driver. Ben says this causes trouble
in ath10k.

IMHO the right fix for this would be to fix the iteration to only reach
the ones that have been re-added, like I've said above. OTOH, Ben isn't
really convinced that that's right, and has experience with a patch that
makes mac80211 return *no* interfaces whatsoever in the iteration when
done while in restart. Like I say there, it seems wrong to me.

But depending on what ath10k actually _does_ with this list, perhaps
it's not an issue. Perhaps it's just transient state that it derives
from it, so if it does it again after the reconfig is completed, it
would in fact get all the information it needed.

I'm pretty sure this would break iwlwifi, so one option (less preferred)
would be to add a flag to say "skip iteration in reconfig".

actually does the driver know it's in reconfig? Perhaps it could even do
that completely on its own?

Anyway, the question is what you think about doing such a thing in the
driver, if it fixes issues even if it's probably not really correct.

johannes
Ben Greear Sept. 16, 2020, 10:28 p.m. UTC | #8
On 7/30/20 8:03 AM, Johannes Berg wrote:
> On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote:
> 
>>> Consider
>>>
>>> add interface wlan0
>>> add interface wlan1
>>> iterate active interfaces -> wlan0 wlan1
>>> add interface wlan2
>>> iterate active interfaces -> wlan0 wlan1 wlan2
>>>
>>> If you apply this scenario to a restart, which ought to be functionally
>>> equivalent to the normal startup, just compressed in time, you're
>>> basically saying that today you get
>>>
>>> add interface wlan0
>>> add interface wlan1
>>> iterate active interfaces -> wlan0 wlan1 wlan2 << problem here
>>> add interface wlan2
>>> iterate active interfaces -> wlan0 wlan1 wlan2
>>>
>>> which yeah, totally seems wrong.
>>>
>>> But fixing that to be
>>>
>>> add interface wlan0
>>> add interface wlan1
>>> iterate active interfaces ->
>>> <nothing>
>>> add interface wlan2
>>> iterate active interfaces -> <nothing>
>>> (or
>>> maybe -> wlan0 wlan1 wlan2 if the reconfig already completed)
>>>
>>> seems equally wrong?
>>
>> So, looks like there is a flags option passed to the iterate logic, and it is indeed called
>> directly from drivers.  So, I could just add a new flag value, and | it in when calling from ath10k.
>>
>> I'm not sure it would really solve the second case, but at least in practice,
>> that one doesn't seem to be a problem with ath10k, and the first case *was*
>> a problem.
>>
>> If that sounds OK to you, I'll work on the patch as described.
> 
> Right, that'd be the option 2. I described earlier. I can live with that
> even if I'd prefer to fix it as per 1. to "make sense". But I guess
> there could even be "more legitimate" cases to not want to iterate while
> restarting, even if I'm not really sure where that'd make sense?
> 
> I guess Kalle should comment on whether he'd accept that into the
> driver.
> 
> Kalle, as you can see above mac80211 appears to be broken wrt. iterating
> "active" interfaces during a restart - the iteration considers all
> interfaces active that were active before the restart, not just the ones
> that were already re-added to the driver. Ben says this causes trouble
> in ath10k.
> 
> IMHO the right fix for this would be to fix the iteration to only reach
> the ones that have been re-added, like I've said above. OTOH, Ben isn't
> really convinced that that's right, and has experience with a patch that
> makes mac80211 return *no* interfaces whatsoever in the iteration when
> done while in restart. Like I say there, it seems wrong to me.
> 
> But depending on what ath10k actually _does_ with this list, perhaps
> it's not an issue. Perhaps it's just transient state that it derives
> from it, so if it does it again after the reconfig is completed, it
> would in fact get all the information it needed.
> 
> I'm pretty sure this would break iwlwifi, so one option (less preferred)
> would be to add a flag to say "skip iteration in reconfig".
> 
> actually does the driver know it's in reconfig? Perhaps it could even do
> that completely on its own?
> 
> Anyway, the question is what you think about doing such a thing in the
> driver, if it fixes issues even if it's probably not really correct.
> 
> johannes

So, no response from Kalle on this for some time.  I thought I'd ping one
more time before I make an effort to create another out-of-tree patch.

Johannes, if you are OK with a new flag in mac80211, then I can patch ath10k-ct
driver regardless of whether other drivers use it.

Thanks,
Ben
Kalle Valo Sept. 21, 2020, 8:50 a.m. UTC | #9
Johannes Berg <johannes@sipsolutions.net> writes:

> On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote:
>
>> > Consider
>> > 
>> > add interface wlan0
>> > add interface wlan1
>> > iterate active interfaces -> wlan0 wlan1
>> > add interface wlan2
>> > iterate active interfaces -> wlan0 wlan1 wlan2
>> > 
>> > If you apply this scenario to a restart, which ought to be functionally
>> > equivalent to the normal startup, just compressed in time, you're
>> > basically saying that today you get
>> > 
>> > add interface wlan0
>> > add interface wlan1
>> > iterate active interfaces -> wlan0 wlan1 wlan2 << problem here
>> > add interface wlan2
>> > iterate active interfaces -> wlan0 wlan1 wlan2
>> > 
>> > which yeah, totally seems wrong.
>> > 
>> > But fixing that to be
>> > 
>> > add interface wlan0
>> > add interface wlan1
>> > iterate active interfaces ->
>> > <nothing>
>> > add interface wlan2
>> > iterate active interfaces -> <nothing>
>> > (or
>> > maybe -> wlan0 wlan1 wlan2 if the reconfig already completed)
>> > 
>> > seems equally wrong?
>> 
>> So, looks like there is a flags option passed to the iterate logic,
>> and it is indeed called
>> directly from drivers. So, I could just add a new flag value, and |
>> it in when calling from ath10k.
>> 
>> I'm not sure it would really solve the second case, but at least in practice,
>> that one doesn't seem to be a problem with ath10k, and the first case *was*
>> a problem.
>> 
>> If that sounds OK to you, I'll work on the patch as described.
>
> Right, that'd be the option 2. I described earlier. I can live with that
> even if I'd prefer to fix it as per 1. to "make sense". But I guess
> there could even be "more legitimate" cases to not want to iterate while
> restarting, even if I'm not really sure where that'd make sense?
>
> I guess Kalle should comment on whether he'd accept that into the
> driver.
>
> Kalle, as you can see above mac80211 appears to be broken wrt. iterating
> "active" interfaces during a restart - the iteration considers all
> interfaces active that were active before the restart, not just the ones
> that were already re-added to the driver. Ben says this causes trouble
> in ath10k.
>
> IMHO the right fix for this would be to fix the iteration to only reach
> the ones that have been re-added, like I've said above. OTOH, Ben isn't
> really convinced that that's right, and has experience with a patch that
> makes mac80211 return *no* interfaces whatsoever in the iteration when
> done while in restart. Like I say there, it seems wrong to me.
>
> But depending on what ath10k actually _does_ with this list, perhaps
> it's not an issue. Perhaps it's just transient state that it derives
> from it, so if it does it again after the reconfig is completed, it
> would in fact get all the information it needed.
>
> I'm pretty sure this would break iwlwifi, so one option (less preferred)
> would be to add a flag to say "skip iteration in reconfig".

To me it sounds fine to have such flag in ath10k. I just want the ath10k
patch test tested with upstream ath10k and firmware because Ben's driver
and firmware might behave differently.

> actually does the driver know it's in reconfig? Perhaps it could even do
> that completely on its own?

We do have ar->state which tracks the reconfig process. For example, in
ath10k_reconfig_complete() we have this check:

	/* If device failed to restart it will be in a different state, e.g.
	 * ATH10K_STATE_WEDGED
	 */
	if (ar->state == ATH10K_STATE_RESTARTED) {
		ath10k_info(ar, "device successfully recovered\n");
		ar->state = ATH10K_STATE_ON;
		ieee80211_wake_queues(ar->hw);
	}
diff mbox series

Patch

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 5db2cd0..186a696 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -831,7 +831,7 @@  static void __iterate_interfaces(struct ieee80211_local *local,
 			break;
 		}
 		if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) &&
-		    active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))
+		    (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER))))
 			continue;
 		if (ieee80211_sdata_running(sdata) || !active_only)
 			iterator(data, sdata->vif.addr,