diff mbox series

[v5.18] ath9k: Properly clear TX status area before reporting to mac80211

Message ID 20220330164409.16645-1-toke@toke.dk (mailing list archive)
State Accepted
Commit 037250f0a45cf9ecf5b52d4b9ff8eadeb609c800
Delegated to: Kalle Valo
Headers show
Series [v5.18] ath9k: Properly clear TX status area before reporting to mac80211 | expand

Commit Message

Toke Høiland-Jørgensen March 30, 2022, 4:44 p.m. UTC
The ath9k driver was not properly clearing the status area in the
ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
it was manually filling in fields, which meant that fields introduced later
were left as-is.

Conveniently, mac80211 actually provides a helper to zero out the status
area, so use that to make sure we zero everything.

The last commit touching the driver function writing the status information
seems to have actually been fixing an issue that was also caused by the
area being uninitialised; but it only added clearing of a single field
instead of the whole struct. That is now redundant, though, so revert that
commit and use it as a convenient Fixes tag.

Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Bagas Sanjaya March 31, 2022, 5:06 a.m. UTC | #1
On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
> 
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
> 
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
> 
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>

No regressions and UBSAN warning [1] reported on dmesg.

Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>

However, there is something missing. I don't see Cc: stable@vger.kernel.org
trailer in this patch. I think it should, because I reported that this issue
first occurred on v5.17, then still appeared on v5.17.1.
Bagas Sanjaya March 31, 2022, 5:18 a.m. UTC | #2
On 31/03/22 12.06, Bagas Sanjaya wrote:
> No regressions and UBSAN warning [1] reported on dmesg.
> 
> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
> 
> However, there is something missing. I don't see Cc: stable@vger.kernel.org
> trailer in this patch. I think it should, because I reported that this issue
> first occurred on v5.17, then still appeared on v5.17.1.
> 

Oops, missing link for [1].

[1]: https://lore.kernel.org/linux-wireless/87bkxn4kpf.fsf@toke.dk/T/#mc2ef5bbdf51fd4ead05b1115cedd58153675365c
Kalle Valo March 31, 2022, 5:36 a.m. UTC | #3
Bagas Sanjaya <bagasdotme@gmail.com> writes:

> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>> The ath9k driver was not properly clearing the status area in the
>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> it was manually filling in fields, which meant that fields introduced later
>> were left as-is.
>>
>> Conveniently, mac80211 actually provides a helper to zero out the status
>> area, so use that to make sure we zero everything.
>>
>> The last commit touching the driver function writing the status information
>> seems to have actually been fixing an issue that was also caused by the
>> area being uninitialised; but it only added clearing of a single field
>> instead of the whole struct. That is now redundant, though, so revert that
>> commit and use it as a convenient Fixes tag.
>>
>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> No regressions and UBSAN warning [1] reported on dmesg.
>
> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
>
> However, there is something missing. I don't see Cc: stable@vger.kernel.org
> trailer in this patch. I think it should, because I reported that this issue
> first occurred on v5.17, then still appeared on v5.17.1.

I can add that during commit.
Toke Høiland-Jørgensen March 31, 2022, 8:35 a.m. UTC | #4
Kalle Valo <kvalo@kernel.org> writes:

> Bagas Sanjaya <bagasdotme@gmail.com> writes:
>
>> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>>> The ath9k driver was not properly clearing the status area in the
>>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>>> it was manually filling in fields, which meant that fields introduced later
>>> were left as-is.
>>>
>>> Conveniently, mac80211 actually provides a helper to zero out the status
>>> area, so use that to make sure we zero everything.
>>>
>>> The last commit touching the driver function writing the status information
>>> seems to have actually been fixing an issue that was also caused by the
>>> area being uninitialised; but it only added clearing of a single field
>>> instead of the whole struct. That is now redundant, though, so revert that
>>> commit and use it as a convenient Fixes tag.
>>>
>>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>>> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>
>> No regressions and UBSAN warning [1] reported on dmesg.
>>
>> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
>>
>> However, there is something missing. I don't see Cc: stable@vger.kernel.org
>> trailer in this patch. I think it should, because I reported that this issue
>> first occurred on v5.17, then still appeared on v5.17.1.
>
> I can add that during commit.

Thanks! And sorry about that, I have gotten so used to the netdev policy
of not including an explicit stable Cc that I totally forgot that this
doesn't apply to the wireless tree.

In any case I think the stable tree autoselection bit should pick it up
from the Fixes tag, though...

-Toke
Kalle Valo March 31, 2022, 9:31 a.m. UTC | #5
Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Kalle Valo <kvalo@kernel.org> writes:
>
>> Bagas Sanjaya <bagasdotme@gmail.com> writes:
>>
>>> On 30/03/22 23.44, Toke Høiland-Jørgensen wrote:
>>>> The ath9k driver was not properly clearing the status area in the
>>>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>>>> it was manually filling in fields, which meant that fields introduced later
>>>> were left as-is.
>>>>
>>>> Conveniently, mac80211 actually provides a helper to zero out the status
>>>> area, so use that to make sure we zero everything.
>>>>
>>>> The last commit touching the driver function writing the status information
>>>> seems to have actually been fixing an issue that was also caused by the
>>>> area being uninitialised; but it only added clearing of a single field
>>>> instead of the whole struct. That is now redundant, though, so revert that
>>>> commit and use it as a convenient Fixes tag.
>>>>
>>>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>>>> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>>>
>>> No regressions and UBSAN warning [1] reported on dmesg.
>>>
>>> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
>>>
>>> However, there is something missing. I don't see Cc: stable@vger.kernel.org
>>> trailer in this patch. I think it should, because I reported that this issue
>>> first occurred on v5.17, then still appeared on v5.17.1.
>>
>> I can add that during commit.
>
> Thanks! And sorry about that, I have gotten so used to the netdev policy
> of not including an explicit stable Cc that I totally forgot that this
> doesn't apply to the wireless tree.

No worries!

> In any case I think the stable tree autoselection bit should pick it up
> from the Fixes tag, though...

Yeah, I agree. I also only add the Fixes tag to my patches, and let the
stable team to choose if they want the commit or not. But if people ask
to add cc stable I'll do that, it's easy enough as I have the edit
command in my patchwork script :)
Kalle Valo April 1, 2022, 6:37 a.m. UTC | #6
Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
> 
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
> 
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
> 
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>

Patch applied to wireless.git, thanks.

037250f0a45c ath9k: Properly clear TX status area before reporting to mac80211
Peter Seiderer April 1, 2022, 5:26 p.m. UTC | #7
Hello Toke,

On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> The ath9k driver was not properly clearing the status area in the
> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> it was manually filling in fields, which meant that fields introduced later
> were left as-is.
> 
> Conveniently, mac80211 actually provides a helper to zero out the status
> area, so use that to make sure we zero everything.
> 
> The last commit touching the driver function writing the status information
> seems to have actually been fixing an issue that was also caused by the
> area being uninitialised; but it only added clearing of a single field
> instead of the whole struct. That is now redundant, though, so revert that
> commit and use it as a convenient Fixes tag.
> 
> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
>  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> index d0caf1de2bde..cbcf96ac303e 100644
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>  	struct ath_hw *ah = sc->sc_ah;
>  	u8 i, tx_rateindex;
>  
> +	ieee80211_tx_info_clear_status(tx_info);
> +

As this also clears the status.rates[].count, see include/net/mac80211.h:

1195 static inline void
1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
1197 {
1198         int i;
1199         
1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
1201                      offsetof(struct ieee80211_tx_info, control.rates));
1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
1203                      offsetof(struct ieee80211_tx_info, driver_rates));
1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
1205         /* clear the rate counts */
1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
1207                 info->status.rates[i].count = 0;
1208         memset_after(&info->status, 0, rates);
1209 }

I would have expected some lines added to restore the count (for the
rates with index < tx_rateindex), e.g. as done in
drivers/net/wireless/ath/ath5k/base.c:

1731         ieee80211_tx_info_clear_status(info);
1732 
1733         for (i = 0; i < ts->ts_final_idx; i++) {
1734                 struct ieee80211_tx_rate *r =
1735                         &info->status.rates[i];
1736 
1737                 r->count = tries[i];
1738         }

In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
the tx_rateindex index (which is often zero in case the first suggested rate
succeeds, but in noisy environment is sometimes > 0)...

>  	if (txok)
>  		tx_info->status.ack_signal = ts->ts_rssi;
>  
> @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>  	}
>  
>  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
> -
> -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
> -	tx_info->status.tx_time = 0;
>  }
>  
>  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)


And from drivers/net/wireless/ath/ath9k/xmit.c:

2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
2593                 tx_info->status.rates[i].count = 0;
2594                 tx_info->status.rates[i].idx = -1;
2595         }

Line 2593 can be deleted as status.rates[].count is already zeroed through the
ieee80211_tx_info_clear_status() call...

And it should be sufficient to do:

	if (tx_rateindex + 1 < hw->max_rates)
		tx_info->status.rates[tx_rateindex + 1].idx = -1;

Regards,
Peter
Toke Høiland-Jørgensen April 2, 2022, noon UTC | #8
Peter Seiderer <ps.report@gmx.net> writes:

> Hello Toke,
>
> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> The ath9k driver was not properly clearing the status area in the
>> ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> it was manually filling in fields, which meant that fields introduced later
>> were left as-is.
>> 
>> Conveniently, mac80211 actually provides a helper to zero out the status
>> area, so use that to make sure we zero everything.
>> 
>> The last commit touching the driver function writing the status information
>> seems to have actually been fixing an issue that was also caused by the
>> area being uninitialised; but it only added clearing of a single field
>> instead of the whole struct. That is now redundant, though, so revert that
>> commit and use it as a convenient Fixes tag.
>> 
>> Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> ---
>>  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> index d0caf1de2bde..cbcf96ac303e 100644
>> --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>>  	struct ath_hw *ah = sc->sc_ah;
>>  	u8 i, tx_rateindex;
>>  
>> +	ieee80211_tx_info_clear_status(tx_info);
>> +
>
> As this also clears the status.rates[].count, see include/net/mac80211.h:
>
> 1195 static inline void
> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> 1197 {
> 1198         int i;
> 1199         
> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
> 1205         /* clear the rate counts */
> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> 1207                 info->status.rates[i].count = 0;
> 1208         memset_after(&info->status, 0, rates);
> 1209 }
>
> I would have expected some lines added to restore the count (for the
> rates with index < tx_rateindex), e.g. as done in
> drivers/net/wireless/ath/ath5k/base.c:
>
> 1731         ieee80211_tx_info_clear_status(info);
> 1732 
> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
> 1734                 struct ieee80211_tx_rate *r =
> 1735                         &info->status.rates[i];
> 1736 
> 1737                 r->count = tries[i];
> 1738         }
>
> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> the tx_rateindex index (which is often zero in case the first suggested rate
> succeeds, but in noisy environment is sometimes > 0)...

Ah, you're right! I looked at that code, and somehow parsed that as "OK,
it's setting all the rate counts, we're fine". But obviously that's not
what that code is doing, so we'll lose some information now. Bugger :(

I'll send a follow-up, thanks for flagging this!

-Toke
Peter Seiderer April 2, 2022, 2:33 p.m. UTC | #9
Hello Toke,

On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:

> Hello Toke,
> 
> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> 
> > The ath9k driver was not properly clearing the status area in the
> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> > it was manually filling in fields, which meant that fields introduced later
> > were left as-is.
> > 
> > Conveniently, mac80211 actually provides a helper to zero out the status
> > area, so use that to make sure we zero everything.
> > 
> > The last commit touching the driver function writing the status information
> > seems to have actually been fixing an issue that was also caused by the
> > area being uninitialised; but it only added clearing of a single field
> > instead of the whole struct. That is now redundant, though, so revert that
> > commit and use it as a convenient Fixes tag.
> > 
> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> > ---
> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> > index d0caf1de2bde..cbcf96ac303e 100644
> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >  	struct ath_hw *ah = sc->sc_ah;
> >  	u8 i, tx_rateindex;
> >  
> > +	ieee80211_tx_info_clear_status(tx_info);
> > +  
> 
> As this also clears the status.rates[].count, see include/net/mac80211.h:
> 
> 1195 static inline void
> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> 1197 {
> 1198         int i;
> 1199         
> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
> 1205         /* clear the rate counts */
> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> 1207                 info->status.rates[i].count = 0;
> 1208         memset_after(&info->status, 0, rates);
> 1209 }
> 
> I would have expected some lines added to restore the count (for the
> rates with index < tx_rateindex), e.g. as done in
> drivers/net/wireless/ath/ath5k/base.c:
> 
> 1731         ieee80211_tx_info_clear_status(info);
> 1732 
> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
> 1734                 struct ieee80211_tx_rate *r =
> 1735                         &info->status.rates[i];
> 1736 
> 1737                 r->count = tries[i];
> 1738         }
> 
> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> the tx_rateindex index (which is often zero in case the first suggested rate
> succeeds, but in noisy environment is sometimes > 0)...
> 
> >  	if (txok)
> >  		tx_info->status.ack_signal = ts->ts_rssi;
> >  
> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >  	}
> >  
> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
> > -
> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
> > -	tx_info->status.tx_time = 0;
> >  }
> >  
> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)  
> 
> 
> And from drivers/net/wireless/ath/ath9k/xmit.c:
> 
> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> 2593                 tx_info->status.rates[i].count = 0;
> 2594                 tx_info->status.rates[i].idx = -1;
> 2595         }
> 
> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> ieee80211_tx_info_clear_status() call...
> 
> And it should be sufficient to do:
> 
> 	if (tx_rateindex + 1 < hw->max_rates)
> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
> 
> Regards,
> Peter

And one additional problem found with your patch applied (to 5.16.18), the log
get spammed by:

[  445.489139] ------------[ cut here ]------------
[  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
[  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
[  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
[  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
[  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
[  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
[  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
[  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
[  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
[  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
[  445.489362] Call Trace:
[  445.489366]  <SOFTIRQ>
[  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
[  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
[  445.489407]  ? resched_curr+0x1d/0xc0
[  445.489419]  ? check_preempt_wakeup+0x115/0x250
[  445.489429]  ? task_fork_fair+0xc0/0x170
[  445.489436]  ? put_prev_task_fair+0x40/0x40
[  445.489444]  ? check_preempt_curr+0x62/0x70
[  445.489452]  ? __local_bh_enable_ip+0x33/0x80
[  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
[  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
[  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
[  445.489492]  ? try_to_wake_up+0x7c/0x550
[  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
[  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
[  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
[  445.489544]  tasklet_action+0x21/0x30
[  445.489551]  __do_softirq+0xc5/0x28b
[  445.489559]  ? __entry_text_end+0x4/0x4
[  445.489567]  call_on_stack+0x40/0x50
[  445.489576]  </SOFTIRQ>
[  445.489580]  ? irq_exit_rcu+0x92/0x100
[  445.489587]  ? common_interrupt+0x27/0x40
[  445.489595]  ? asm_common_interrupt+0x102/0x140
[  445.489605] ---[ end trace 5c176b666255bca1 ]---
[  445.607206] ------------[ cut here ]------------


Could be fixed by the following additional change:

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
        struct ieee80211_hw *hw = sc->hw;
        struct ath_hw *ah = sc->sc_ah;
        u8 i, tx_rateindex;
+       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;

        ieee80211_tx_info_clear_status(tx_info);

+       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
+
        if (txok)

Regards,
Peter
Toke Høiland-Jørgensen April 2, 2022, 3:11 p.m. UTC | #10
Peter Seiderer <ps.report@gmx.net> writes:

> Hello Toke,
>
> On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
>
>> Hello Toke,
>> 
>> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> 
>> > The ath9k driver was not properly clearing the status area in the
>> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> > it was manually filling in fields, which meant that fields introduced later
>> > were left as-is.
>> > 
>> > Conveniently, mac80211 actually provides a helper to zero out the status
>> > area, so use that to make sure we zero everything.
>> > 
>> > The last commit touching the driver function writing the status information
>> > seems to have actually been fixing an issue that was also caused by the
>> > area being uninitialised; but it only added clearing of a single field
>> > instead of the whole struct. That is now redundant, though, so revert that
>> > commit and use it as a convenient Fixes tag.
>> > 
>> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> > ---
>> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> >  1 file changed, 2 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> > index d0caf1de2bde..cbcf96ac303e 100644
>> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> >  	struct ath_hw *ah = sc->sc_ah;
>> >  	u8 i, tx_rateindex;
>> >  
>> > +	ieee80211_tx_info_clear_status(tx_info);
>> > +  
>> 
>> As this also clears the status.rates[].count, see include/net/mac80211.h:
>> 
>> 1195 static inline void
>> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
>> 1197 {
>> 1198         int i;
>> 1199         
>> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
>> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
>> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
>> 1205         /* clear the rate counts */
>> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> 1207                 info->status.rates[i].count = 0;
>> 1208         memset_after(&info->status, 0, rates);
>> 1209 }
>> 
>> I would have expected some lines added to restore the count (for the
>> rates with index < tx_rateindex), e.g. as done in
>> drivers/net/wireless/ath/ath5k/base.c:
>> 
>> 1731         ieee80211_tx_info_clear_status(info);
>> 1732 
>> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
>> 1734                 struct ieee80211_tx_rate *r =
>> 1735                         &info->status.rates[i];
>> 1736 
>> 1737                 r->count = tries[i];
>> 1738         }
>> 
>> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
>> the tx_rateindex index (which is often zero in case the first suggested rate
>> succeeds, but in noisy environment is sometimes > 0)...
>> 
>> >  	if (txok)
>> >  		tx_info->status.ack_signal = ts->ts_rssi;
>> >  
>> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> >  	}
>> >  
>> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
>> > -
>> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
>> > -	tx_info->status.tx_time = 0;
>> >  }
>> >  
>> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)  
>> 
>> 
>> And from drivers/net/wireless/ath/ath9k/xmit.c:
>> 
>> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
>> 2593                 tx_info->status.rates[i].count = 0;
>> 2594                 tx_info->status.rates[i].idx = -1;
>> 2595         }
>> 
>> Line 2593 can be deleted as status.rates[].count is already zeroed through the
>> ieee80211_tx_info_clear_status() call...
>> 
>> And it should be sufficient to do:
>> 
>> 	if (tx_rateindex + 1 < hw->max_rates)
>> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
>> 
>> Regards,
>> Peter
>
> And one additional problem found with your patch applied (to 5.16.18), the log
> get spammed by:
>
> [  445.489139] ------------[ cut here ]------------
> [  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> [  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> [  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
> [  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> [  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> [  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> [  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> [  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> [  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> [  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> [  445.489362] Call Trace:
> [  445.489366]  <SOFTIRQ>
> [  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
> [  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
> [  445.489407]  ? resched_curr+0x1d/0xc0
> [  445.489419]  ? check_preempt_wakeup+0x115/0x250
> [  445.489429]  ? task_fork_fair+0xc0/0x170
> [  445.489436]  ? put_prev_task_fair+0x40/0x40
> [  445.489444]  ? check_preempt_curr+0x62/0x70
> [  445.489452]  ? __local_bh_enable_ip+0x33/0x80
> [  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
> [  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> [  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
> [  445.489492]  ? try_to_wake_up+0x7c/0x550
> [  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> [  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
> [  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
> [  445.489544]  tasklet_action+0x21/0x30
> [  445.489551]  __do_softirq+0xc5/0x28b
> [  445.489559]  ? __entry_text_end+0x4/0x4
> [  445.489567]  call_on_stack+0x40/0x50
> [  445.489576]  </SOFTIRQ>
> [  445.489580]  ? irq_exit_rcu+0x92/0x100
> [  445.489587]  ? common_interrupt+0x27/0x40
> [  445.489595]  ? asm_common_interrupt+0x102/0x140
> [  445.489605] ---[ end trace 5c176b666255bca1 ]---
> [  445.607206] ------------[ cut here ]------------
>
>
> Could be fixed by the following additional change:
>
> --- a/drivers/net/wireless/ath/ath9k/xmit.c
> +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>         struct ieee80211_hw *hw = sc->hw;
>         struct ath_hw *ah = sc->sc_ah;
>         u8 i, tx_rateindex;
> +       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
>
>         ieee80211_tx_info_clear_status(tx_info);
>
> +       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> +
>         if (txok)

That doesn't seem right, zeroing is_valid_ack_signal was the whole point
of the first patch... Something seems off in that backtrace; could you
try running it through ./scripts/decode_stacktrace.sh please?

-Toke
Peter Seiderer April 2, 2022, 4:19 p.m. UTC | #11
Hello Toke,

On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Peter Seiderer <ps.report@gmx.net> writes:
> 
> > Hello Toke,
> >
> > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
> >  
> >> Hello Toke,
> >> 
> >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >>   
> >> > The ath9k driver was not properly clearing the status area in the
> >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> >> > it was manually filling in fields, which meant that fields introduced later
> >> > were left as-is.
> >> > 
> >> > Conveniently, mac80211 actually provides a helper to zero out the status
> >> > area, so use that to make sure we zero everything.
> >> > 
> >> > The last commit touching the driver function writing the status information
> >> > seems to have actually been fixing an issue that was also caused by the
> >> > area being uninitialised; but it only added clearing of a single field
> >> > instead of the whole struct. That is now redundant, though, so revert that
> >> > commit and use it as a convenient Fixes tag.
> >> > 
> >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> >> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> >> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> >> > ---
> >> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> >> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >> > 
> >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> >> > index d0caf1de2bde..cbcf96ac303e 100644
> >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >> >  	struct ath_hw *ah = sc->sc_ah;
> >> >  	u8 i, tx_rateindex;
> >> >  
> >> > +	ieee80211_tx_info_clear_status(tx_info);
> >> > +    
> >> 
> >> As this also clears the status.rates[].count, see include/net/mac80211.h:
> >> 
> >> 1195 static inline void
> >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> >> 1197 {
> >> 1198         int i;
> >> 1199         
> >> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> >> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
> >> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> >> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
> >> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
> >> 1205         /* clear the rate counts */
> >> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> >> 1207                 info->status.rates[i].count = 0;
> >> 1208         memset_after(&info->status, 0, rates);
> >> 1209 }
> >> 
> >> I would have expected some lines added to restore the count (for the
> >> rates with index < tx_rateindex), e.g. as done in
> >> drivers/net/wireless/ath/ath5k/base.c:
> >> 
> >> 1731         ieee80211_tx_info_clear_status(info);
> >> 1732 
> >> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
> >> 1734                 struct ieee80211_tx_rate *r =
> >> 1735                         &info->status.rates[i];
> >> 1736 
> >> 1737                 r->count = tries[i];
> >> 1738         }
> >> 
> >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> >> the tx_rateindex index (which is often zero in case the first suggested rate
> >> succeeds, but in noisy environment is sometimes > 0)...
> >>   
> >> >  	if (txok)
> >> >  		tx_info->status.ack_signal = ts->ts_rssi;
> >> >  
> >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >> >  	}
> >> >  
> >> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
> >> > -
> >> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
> >> > -	tx_info->status.tx_time = 0;
> >> >  }
> >> >  
> >> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)    
> >> 
> >> 
> >> And from drivers/net/wireless/ath/ath9k/xmit.c:
> >> 
> >> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> >> 2593                 tx_info->status.rates[i].count = 0;
> >> 2594                 tx_info->status.rates[i].idx = -1;
> >> 2595         }
> >> 
> >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> >> ieee80211_tx_info_clear_status() call...
> >> 
> >> And it should be sufficient to do:
> >> 
> >> 	if (tx_rateindex + 1 < hw->max_rates)
> >> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
> >> 
> >> Regards,
> >> Peter  
> >
> > And one additional problem found with your patch applied (to 5.16.18), the log
> > get spammed by:
> >
> > [  445.489139] ------------[ cut here ]------------
> > [  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> > [  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> > [  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
> > [  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> > [  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> > [  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> > [  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> > [  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> > [  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> > [  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> > [  445.489362] Call Trace:
> > [  445.489366]  <SOFTIRQ>
> > [  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
> > [  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
> > [  445.489407]  ? resched_curr+0x1d/0xc0
> > [  445.489419]  ? check_preempt_wakeup+0x115/0x250
> > [  445.489429]  ? task_fork_fair+0xc0/0x170
> > [  445.489436]  ? put_prev_task_fair+0x40/0x40
> > [  445.489444]  ? check_preempt_curr+0x62/0x70
> > [  445.489452]  ? __local_bh_enable_ip+0x33/0x80
> > [  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
> > [  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> > [  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
> > [  445.489492]  ? try_to_wake_up+0x7c/0x550
> > [  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> > [  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
> > [  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
> > [  445.489544]  tasklet_action+0x21/0x30
> > [  445.489551]  __do_softirq+0xc5/0x28b
> > [  445.489559]  ? __entry_text_end+0x4/0x4
> > [  445.489567]  call_on_stack+0x40/0x50
> > [  445.489576]  </SOFTIRQ>
> > [  445.489580]  ? irq_exit_rcu+0x92/0x100
> > [  445.489587]  ? common_interrupt+0x27/0x40
> > [  445.489595]  ? asm_common_interrupt+0x102/0x140
> > [  445.489605] ---[ end trace 5c176b666255bca1 ]---
> > [  445.607206] ------------[ cut here ]------------
> >
> >
> > Could be fixed by the following additional change:
> >
> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> >         struct ieee80211_hw *hw = sc->hw;
> >         struct ath_hw *ah = sc->sc_ah;
> >         u8 i, tx_rateindex;
> > +       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
> >
> >         ieee80211_tx_info_clear_status(tx_info);
> >
> > +       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> > +
> >         if (txok)  
> 
> That doesn't seem right, zeroing is_valid_ack_signal was the whole point
> of the first patch... Something seems off in that backtrace; could you
> try running it through ./scripts/decode_stacktrace.sh please?

Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
was not set - but need to wait until next week as I need physical access to
the test system for kernel update)...

Regards,
Peter

> 
> -Toke
Peter Seiderer April 4, 2022, 11:04 a.m. UTC | #12
Hello Toke,

On Sat, 2 Apr 2022 18:19:10 +0200, Peter Seiderer <ps.report@gmx.net> wrote:

> Hello Toke,
> 
> On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> 
> > Peter Seiderer <ps.report@gmx.net> writes:
> >   
> > > Hello Toke,
> > >
> > > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
> > >    
> > >> Hello Toke,
> > >> 
> > >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> > >>     
> > >> > The ath9k driver was not properly clearing the status area in the
> > >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
> > >> > it was manually filling in fields, which meant that fields introduced later
> > >> > were left as-is.
> > >> > 
> > >> > Conveniently, mac80211 actually provides a helper to zero out the status
> > >> > area, so use that to make sure we zero everything.
> > >> > 
> > >> > The last commit touching the driver function writing the status information
> > >> > seems to have actually been fixing an issue that was also caused by the
> > >> > area being uninitialised; but it only added clearing of a single field
> > >> > instead of the whole struct. That is now redundant, though, so revert that
> > >> > commit and use it as a convenient Fixes tag.
> > >> > 
> > >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
> > >> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
> > >> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> > >> > ---
> > >> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
> > >> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >> > 
> > >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > index d0caf1de2bde..cbcf96ac303e 100644
> > >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > >> >  	struct ath_hw *ah = sc->sc_ah;
> > >> >  	u8 i, tx_rateindex;
> > >> >  
> > >> > +	ieee80211_tx_info_clear_status(tx_info);
> > >> > +      
> > >> 
> > >> As this also clears the status.rates[].count, see include/net/mac80211.h:
> > >> 
> > >> 1195 static inline void
> > >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
> > >> 1197 {
> > >> 1198         int i;
> > >> 1199         
> > >> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> > >> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
> > >> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> > >> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
> > >> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
> > >> 1205         /* clear the rate counts */
> > >> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> > >> 1207                 info->status.rates[i].count = 0;
> > >> 1208         memset_after(&info->status, 0, rates);
> > >> 1209 }
> > >> 
> > >> I would have expected some lines added to restore the count (for the
> > >> rates with index < tx_rateindex), e.g. as done in
> > >> drivers/net/wireless/ath/ath5k/base.c:
> > >> 
> > >> 1731         ieee80211_tx_info_clear_status(info);
> > >> 1732 
> > >> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
> > >> 1734                 struct ieee80211_tx_rate *r =
> > >> 1735                         &info->status.rates[i];
> > >> 1736 
> > >> 1737                 r->count = tries[i];
> > >> 1738         }
> > >> 
> > >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
> > >> the tx_rateindex index (which is often zero in case the first suggested rate
> > >> succeeds, but in noisy environment is sometimes > 0)...
> > >>     
> > >> >  	if (txok)
> > >> >  		tx_info->status.ack_signal = ts->ts_rssi;
> > >> >  
> > >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > >> >  	}
> > >> >  
> > >> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
> > >> > -
> > >> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
> > >> > -	tx_info->status.tx_time = 0;
> > >> >  }
> > >> >  
> > >> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)      
> > >> 
> > >> 
> > >> And from drivers/net/wireless/ath/ath9k/xmit.c:
> > >> 
> > >> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
> > >> 2593                 tx_info->status.rates[i].count = 0;
> > >> 2594                 tx_info->status.rates[i].idx = -1;
> > >> 2595         }
> > >> 
> > >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
> > >> ieee80211_tx_info_clear_status() call...
> > >> 
> > >> And it should be sufficient to do:
> > >> 
> > >> 	if (tx_rateindex + 1 < hw->max_rates)
> > >> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
> > >> 
> > >> Regards,
> > >> Peter    
> > >
> > > And one additional problem found with your patch applied (to 5.16.18), the log
> > > get spammed by:
> > >
> > > [  445.489139] ------------[ cut here ]------------
> > > [  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
> > > [  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
> > > [  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
> > > [  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
> > > [  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
> > > [  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
> > > [  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
> > > [  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
> > > [  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
> > > [  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
> > > [  445.489362] Call Trace:
> > > [  445.489366]  <SOFTIRQ>
> > > [  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
> > > [  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
> > > [  445.489407]  ? resched_curr+0x1d/0xc0
> > > [  445.489419]  ? check_preempt_wakeup+0x115/0x250
> > > [  445.489429]  ? task_fork_fair+0xc0/0x170
> > > [  445.489436]  ? put_prev_task_fair+0x40/0x40
> > > [  445.489444]  ? check_preempt_curr+0x62/0x70
> > > [  445.489452]  ? __local_bh_enable_ip+0x33/0x80
> > > [  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
> > > [  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
> > > [  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
> > > [  445.489492]  ? try_to_wake_up+0x7c/0x550
> > > [  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
> > > [  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
> > > [  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
> > > [  445.489544]  tasklet_action+0x21/0x30
> > > [  445.489551]  __do_softirq+0xc5/0x28b
> > > [  445.489559]  ? __entry_text_end+0x4/0x4
> > > [  445.489567]  call_on_stack+0x40/0x50
> > > [  445.489576]  </SOFTIRQ>
> > > [  445.489580]  ? irq_exit_rcu+0x92/0x100
> > > [  445.489587]  ? common_interrupt+0x27/0x40
> > > [  445.489595]  ? asm_common_interrupt+0x102/0x140
> > > [  445.489605] ---[ end trace 5c176b666255bca1 ]---
> > > [  445.607206] ------------[ cut here ]------------
> > >
> > >
> > > Could be fixed by the following additional change:
> > >
> > > --- a/drivers/net/wireless/ath/ath9k/xmit.c
> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
> > > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
> > >         struct ieee80211_hw *hw = sc->hw;
> > >         struct ath_hw *ah = sc->sc_ah;
> > >         u8 i, tx_rateindex;
> > > +       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
> > >
> > >         ieee80211_tx_info_clear_status(tx_info);
> > >
> > > +       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
> > > +
> > >         if (txok)    
> > 
> > That doesn't seem right, zeroing is_valid_ack_signal was the whole point
> > of the first patch... Something seems off in that backtrace; could you
> > try running it through ./scripts/decode_stacktrace.sh please?  
> 
> Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
> was not set - but need to wait until next week as I need physical access to
> the test system for kernel update)...

Same with an fresh build (and with stacktrace run through decode_stacktrace.sh), but
did some more investigations, added an debug print of tx_info->status.is_valid_ack_signal
in ath_tx_rc_status() and an corresponding one in ath_txq_skb_done() printing out
fi->txq, with the additional

	tx_info->status.is_valid_ack_signal = is_valid_ack_signal;

line added showed the following output

[  839.543175] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[  839.543195] XXX - ath_txq_skb_done() txq: 2
[  839.617898] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[  839.617961] XXX - ath_txq_skb_done() txq: 2
[  840.024845] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[  840.024863] XXX - ath_txq_skb_done() txq: 2
[...]

and without the following:

[  924.614745] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
[  924.614762] XXX - ath_txq_skb_done() txq: 0
[  924.614805] ------------[ cut here ]------------
[  924.614809] WARNING: CPU: 0 PID: 0 at drivers/net/wireless/ath/ath9k/xmit.c:176 ath_txq_skb_done.constprop.0+0x28/0x38 [ath9k]
[...]


Seems ieee80211_tx_info_clear_status() clears/destroys the dual use of the
struct ieee80211_tx_info.status vs. ieee80211_tx_info.rate_driver_data...

See drivers/net/wireless/ath/ath9k/xmit.c

 140 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
 141 {
 142         struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
 143         BUILD_BUG_ON(sizeof(struct ath_frame_info) >
 144                      sizeof(tx_info->rate_driver_data));
 145         return (struct ath_frame_info *) &tx_info->rate_driver_data[0];
 146 }


 205 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
 206                              struct sk_buff *skb)
 207 {
 208         struct ath_frame_info *fi = get_frame_info(skb);
 209         int q = fi->txq;
 210 
 211         if (q < 0)
 212                 return;
 213 
 214         txq = sc->tx.txq_map[q];
 215         if (WARN_ON(--txq->pending_frames < 0))
 216                 txq->pending_frames = 0;
 217 
 218 }

Seems it only worked by chance (?) before...

Regards,
Peter


> 
> Regards,
> Peter
> 
> > 
> > -Toke  
>
Toke Høiland-Jørgensen April 4, 2022, 4:03 p.m. UTC | #13
Peter Seiderer <ps.report@gmx.net> writes:

> Hello Toke,
>
> On Sat, 2 Apr 2022 18:19:10 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
>
>> Hello Toke,
>> 
>> On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> 
>> > Peter Seiderer <ps.report@gmx.net> writes:
>> >   
>> > > Hello Toke,
>> > >
>> > > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
>> > >    
>> > >> Hello Toke,
>> > >> 
>> > >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> > >>     
>> > >> > The ath9k driver was not properly clearing the status area in the
>> > >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> > >> > it was manually filling in fields, which meant that fields introduced later
>> > >> > were left as-is.
>> > >> > 
>> > >> > Conveniently, mac80211 actually provides a helper to zero out the status
>> > >> > area, so use that to make sure we zero everything.
>> > >> > 
>> > >> > The last commit touching the driver function writing the status information
>> > >> > seems to have actually been fixing an issue that was also caused by the
>> > >> > area being uninitialised; but it only added clearing of a single field
>> > >> > instead of the whole struct. That is now redundant, though, so revert that
>> > >> > commit and use it as a convenient Fixes tag.
>> > >> > 
>> > >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> > >> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> > >> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> > >> > ---
>> > >> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> > >> >  1 file changed, 2 insertions(+), 3 deletions(-)
>> > >> > 
>> > >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > index d0caf1de2bde..cbcf96ac303e 100644
>> > >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> >  	struct ath_hw *ah = sc->sc_ah;
>> > >> >  	u8 i, tx_rateindex;
>> > >> >  
>> > >> > +	ieee80211_tx_info_clear_status(tx_info);
>> > >> > +      
>> > >> 
>> > >> As this also clears the status.rates[].count, see include/net/mac80211.h:
>> > >> 
>> > >> 1195 static inline void
>> > >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
>> > >> 1197 {
>> > >> 1198         int i;
>> > >> 1199         
>> > >> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
>> > >> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
>> > >> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
>> > >> 1205         /* clear the rate counts */
>> > >> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> > >> 1207                 info->status.rates[i].count = 0;
>> > >> 1208         memset_after(&info->status, 0, rates);
>> > >> 1209 }
>> > >> 
>> > >> I would have expected some lines added to restore the count (for the
>> > >> rates with index < tx_rateindex), e.g. as done in
>> > >> drivers/net/wireless/ath/ath5k/base.c:
>> > >> 
>> > >> 1731         ieee80211_tx_info_clear_status(info);
>> > >> 1732 
>> > >> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
>> > >> 1734                 struct ieee80211_tx_rate *r =
>> > >> 1735                         &info->status.rates[i];
>> > >> 1736 
>> > >> 1737                 r->count = tries[i];
>> > >> 1738         }
>> > >> 
>> > >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
>> > >> the tx_rateindex index (which is often zero in case the first suggested rate
>> > >> succeeds, but in noisy environment is sometimes > 0)...
>> > >>     
>> > >> >  	if (txok)
>> > >> >  		tx_info->status.ack_signal = ts->ts_rssi;
>> > >> >  
>> > >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> >  	}
>> > >> >  
>> > >> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
>> > >> > -
>> > >> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
>> > >> > -	tx_info->status.tx_time = 0;
>> > >> >  }
>> > >> >  
>> > >> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)      
>> > >> 
>> > >> 
>> > >> And from drivers/net/wireless/ath/ath9k/xmit.c:
>> > >> 
>> > >> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
>> > >> 2593                 tx_info->status.rates[i].count = 0;
>> > >> 2594                 tx_info->status.rates[i].idx = -1;
>> > >> 2595         }
>> > >> 
>> > >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
>> > >> ieee80211_tx_info_clear_status() call...
>> > >> 
>> > >> And it should be sufficient to do:
>> > >> 
>> > >> 	if (tx_rateindex + 1 < hw->max_rates)
>> > >> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
>> > >> 
>> > >> Regards,
>> > >> Peter    
>> > >
>> > > And one additional problem found with your patch applied (to 5.16.18), the log
>> > > get spammed by:
>> > >
>> > > [  445.489139] ------------[ cut here ]------------
>> > > [  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
>> > > [  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
>> > > [  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
>> > > [  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
>> > > [  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
>> > > [  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
>> > > [  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
>> > > [  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
>> > > [  445.489362] Call Trace:
>> > > [  445.489366]  <SOFTIRQ>
>> > > [  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
>> > > [  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
>> > > [  445.489407]  ? resched_curr+0x1d/0xc0
>> > > [  445.489419]  ? check_preempt_wakeup+0x115/0x250
>> > > [  445.489429]  ? task_fork_fair+0xc0/0x170
>> > > [  445.489436]  ? put_prev_task_fair+0x40/0x40
>> > > [  445.489444]  ? check_preempt_curr+0x62/0x70
>> > > [  445.489452]  ? __local_bh_enable_ip+0x33/0x80
>> > > [  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
>> > > [  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
>> > > [  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
>> > > [  445.489492]  ? try_to_wake_up+0x7c/0x550
>> > > [  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
>> > > [  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
>> > > [  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
>> > > [  445.489544]  tasklet_action+0x21/0x30
>> > > [  445.489551]  __do_softirq+0xc5/0x28b
>> > > [  445.489559]  ? __entry_text_end+0x4/0x4
>> > > [  445.489567]  call_on_stack+0x40/0x50
>> > > [  445.489576]  </SOFTIRQ>
>> > > [  445.489580]  ? irq_exit_rcu+0x92/0x100
>> > > [  445.489587]  ? common_interrupt+0x27/0x40
>> > > [  445.489595]  ? asm_common_interrupt+0x102/0x140
>> > > [  445.489605] ---[ end trace 5c176b666255bca1 ]---
>> > > [  445.607206] ------------[ cut here ]------------
>> > >
>> > >
>> > > Could be fixed by the following additional change:
>> > >
>> > > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >         struct ieee80211_hw *hw = sc->hw;
>> > >         struct ath_hw *ah = sc->sc_ah;
>> > >         u8 i, tx_rateindex;
>> > > +       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
>> > >
>> > >         ieee80211_tx_info_clear_status(tx_info);
>> > >
>> > > +       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>> > > +
>> > >         if (txok)    
>> > 
>> > That doesn't seem right, zeroing is_valid_ack_signal was the whole point
>> > of the first patch... Something seems off in that backtrace; could you
>> > try running it through ./scripts/decode_stacktrace.sh please?  
>> 
>> Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
>> was not set - but need to wait until next week as I need physical access to
>> the test system for kernel update)...
>
> Same with an fresh build (and with stacktrace run through decode_stacktrace.sh), but
> did some more investigations, added an debug print of tx_info->status.is_valid_ack_signal
> in ath_tx_rc_status() and an corresponding one in ath_txq_skb_done() printing out
> fi->txq, with the additional
>
> 	tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>
> line added showed the following output
>
> [  839.543175] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  839.543195] XXX - ath_txq_skb_done() txq: 2
> [  839.617898] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  839.617961] XXX - ath_txq_skb_done() txq: 2
> [  840.024845] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  840.024863] XXX - ath_txq_skb_done() txq: 2
> [...]
>
> and without the following:
>
> [  924.614745] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  924.614762] XXX - ath_txq_skb_done() txq: 0
> [  924.614805] ------------[ cut here ]------------
> [  924.614809] WARNING: CPU: 0 PID: 0 at drivers/net/wireless/ath/ath9k/xmit.c:176 ath_txq_skb_done.constprop.0+0x28/0x38 [ath9k]
> [...]
>
>
> Seems ieee80211_tx_info_clear_status() clears/destroys the dual use of the
> struct ieee80211_tx_info.status vs. ieee80211_tx_info.rate_driver_data...
>
> See drivers/net/wireless/ath/ath9k/xmit.c
>
>  140 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
>  141 {
>  142         struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
>  143         BUILD_BUG_ON(sizeof(struct ath_frame_info) >
>  144                      sizeof(tx_info->rate_driver_data));
>  145         return (struct ath_frame_info *) &tx_info->rate_driver_data[0];
>  146 }
>
>
>  205 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
>  206                              struct sk_buff *skb)
>  207 {
>  208         struct ath_frame_info *fi = get_frame_info(skb);
>  209         int q = fi->txq;
>  210 
>  211         if (q < 0)
>  212                 return;
>  213 
>  214         txq = sc->tx.txq_map[q];
>  215         if (WARN_ON(--txq->pending_frames < 0))
>  216                 txq->pending_frames = 0;
>  217 
>  218 }
>
> Seems it only worked by chance (?) before...

Yikes, yeah, that's no good! Also, this comment above
ieee80211_tx_info_clear_status() is completely misleading:

 * NOTE: You can only use this function if you do NOT use
 *	 info->driver_data! Use info->rate_driver_data
 *	 instead if you need only the less space that allows.

(I read that, and figured that the current use was OK, but nope, guess
not!)

The ath_frame_info struct is small enough to fit into the
status_driver_data field, though, so guess we could just move it there.
However, ieee80211_tx_info_clear_status() also clears that AFAICT, so
we'll need to save & restore it? That seems a bit odd, is this
intentional Johannes?

-Toke
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index d0caf1de2bde..cbcf96ac303e 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -2553,6 +2553,8 @@  static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
 	struct ath_hw *ah = sc->sc_ah;
 	u8 i, tx_rateindex;
 
+	ieee80211_tx_info_clear_status(tx_info);
+
 	if (txok)
 		tx_info->status.ack_signal = ts->ts_rssi;
 
@@ -2595,9 +2597,6 @@  static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
 	}
 
 	tx_info->status.rates[tx_rateindex].count = ts->ts_longretry + 1;
-
-	/* we report airtime in ath_tx_count_airtime(), don't report twice */
-	tx_info->status.tx_time = 0;
 }
 
 static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)