diff mbox series

iwlwifi RFC related to iwl_mvm_tx_reclaim

Message ID c9b0c01e-acac-9f15-730f-a0ba991a68dc@candelatech.com (mailing list archive)
State RFC
Delegated to: Johannes Berg
Headers show
Series iwlwifi RFC related to iwl_mvm_tx_reclaim | expand

Commit Message

Ben Greear Feb. 12, 2024, 11:22 p.m. UTC
Hello,

I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as integer.

It fails the initial check for null STA, but I'm thinking it might should check for IS_ERR(sta)
as well.

(I have my own patch that references sta before the IS_ERR check later in the code, and this
causes the crash I'm seeing.  I guess upstream will not crash in this situation.).

My question:  Is the patch below a preferred approach, or should I add special checks to where I
access sta and only exit the method lower where it already has the IS_ERR(sta) check?


Thanks,
Ben

Comments

Ben Greear Feb. 12, 2024, 11:27 p.m. UTC | #1
On 2/12/24 15:22, Ben Greear wrote:
> Hello,
> 
> I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as integer.
> 
> It fails the initial check for null STA, but I'm thinking it might should check for IS_ERR(sta)
> as well.
> 
> (I have my own patch that references sta before the IS_ERR check later in the code, and this
> causes the crash I'm seeing.  I guess upstream will not crash in this situation.).
> 
> My question:  Is the patch below a preferred approach, or should I add special checks to where I
> access sta and only exit the method lower where it already has the IS_ERR(sta) check?
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> index 0567f4eefebc..bd3d2fe424cd 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> @@ -2337,7 +2337,7 @@ static void iwl_mvm_tx_reclaim(struct iwl_mvm *mvm, int sta_id, int tid,
>          sta = rcu_dereference(mvm->fw_id_to_mac_id[sta_id]);
> 
>          /* Reclaiming frames for a station that has been deleted ? */
> -       if (WARN_ON_ONCE(!sta)) {
> +       if (IS_ERR(sta) || !sta) {
>                  rcu_read_unlock();
>                  return;
>          }

Or another idea came to mind:  Should this check above go away entirely, and check for null
down where it currently checks IS_ERR()?  From the comment about the IS_ERR check, I am thinking
that might be a better idea...

Thanks,
Ben

> 
> Thanks,
> Ben
>
Johannes Berg Feb. 13, 2024, 9:37 a.m. UTC | #2
On Mon, 2024-02-12 at 15:22 -0800, Ben Greear wrote:
> I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as integer.
> 
> It fails the initial check for null STA, but I'm thinking it might should check for IS_ERR(sta)
> as well.
> 
> (I have my own patch that references sta before the IS_ERR check later in the code, and this
> causes the crash I'm seeing.  I guess upstream will not crash in this situation.).

Indeed.

> My question:  Is the patch below a preferred approach, or should I add special checks to where I
> access sta and only exit the method lower where it already has the IS_ERR(sta) check?

You can do whatever you want in your tree, but I guess generally I'd
advocate you assume that the code does what it should ;-)

In this case, ERR_PTR(-ENOENT) is used to indicate the station is being
deleted, but has not yet been fully removed, and so indeed we still want
to reclaim the frames here correctly, which the code does.

The comment below even kind of explains that?

johannes
Ben Greear Feb. 13, 2024, 2:13 p.m. UTC | #3
On 2/13/24 1:37 AM, Johannes Berg wrote:
> On Mon, 2024-02-12 at 15:22 -0800, Ben Greear wrote:
>> I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as integer.
>>
>> It fails the initial check for null STA, but I'm thinking it might should check for IS_ERR(sta)
>> as well.
>>
>> (I have my own patch that references sta before the IS_ERR check later in the code, and this
>> causes the crash I'm seeing.  I guess upstream will not crash in this situation.).
> 
> Indeed.
> 
>> My question:  Is the patch below a preferred approach, or should I add special checks to where I
>> access sta and only exit the method lower where it already has the IS_ERR(sta) check?
> 
> You can do whatever you want in your tree, but I guess generally I'd
> advocate you assume that the code does what it should ;-)
> 
> In this case, ERR_PTR(-ENOENT) is used to indicate the station is being
> deleted, but has not yet been fully removed, and so indeed we still want
> to reclaim the frames here correctly, which the code does.
> 
> The comment below even kind of explains that?

If sta is NULL, we should still reclaim the frames?  If so the check earlier in the code where
it returns early if sta is NULL could be deleted, and add a null check down near the IS_ERR
check?

I can (and have) fixed the bug in my code, I'm trying to understand if the driver
itself needs improving here to cover an edge case.

Thanks,
Ben

> 
> johannes
>
Johannes Berg Feb. 13, 2024, 2:14 p.m. UTC | #4
On Tue, 2024-02-13 at 06:13 -0800, Ben Greear wrote:
> 
> If sta is NULL, we should still reclaim the frames?  If so the check earlier in the code where
> it returns early if sta is NULL could be deleted, and add a null check down near the IS_ERR
> check?

If the sta is NULL something went pretty much horribly wrong, not sure
what we should be trying to do in that case. I guess you could argue for
reclaiming anyway, but question is how far you go and what that risks, I
don't really know.

johannes
Ben Greear Feb. 13, 2024, 2:21 p.m. UTC | #5
On 2/13/24 6:14 AM, Johannes Berg wrote:
> On Tue, 2024-02-13 at 06:13 -0800, Ben Greear wrote:
>>
>> If sta is NULL, we should still reclaim the frames?  If so the check earlier in the code where
>> it returns early if sta is NULL could be deleted, and add a null check down near the IS_ERR
>> check?
> 
> If the sta is NULL something went pretty much horribly wrong, not sure
> what we should be trying to do in that case. I guess you could argue for
> reclaiming anyway, but question is how far you go and what that risks, I
> don't really know.

Ok, I haven't seen it actually be null, so will just fix my particular bug
and leave the rest as is.

Thanks,
Ben
Miri Korenblit Feb. 14, 2024, 6:08 a.m. UTC | #6
: iwlwifi RFC related to iwl_mvm_tx_reclaim
> 
> Hello,
> 
> I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as
> integer.
> 
> It fails the initial check for null STA, but I'm thinking it might should check for
> IS_ERR(sta) as well.
> 
> (I have my own patch that references sta before the IS_ERR check later in the
> code, and this causes the crash I'm seeing.  I guess upstream will not crash in this
> situation.).
> 
> My question:  Is the patch below a preferred approach, or should I add special
> checks to where I access sta and only exit the method lower where it already has
> the IS_ERR(sta) check?
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> index 0567f4eefebc..bd3d2fe424cd 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> @@ -2337,7 +2337,7 @@ static void iwl_mvm_tx_reclaim(struct iwl_mvm
> *mvm, int sta_id, int tid,
>          sta = rcu_dereference(mvm->fw_id_to_mac_id[sta_id]);
> 
>          /* Reclaiming frames for a station that has been deleted ? */
> -       if (WARN_ON_ONCE(!sta)) {
> +       if (IS_ERR(sta) || !sta) {
>                  rcu_read_unlock();
>                  return;
>          }
> 

Hi, 

Did you see this: 2b3eb122342c?

This can explain why the code is how it is.
And no, you should not access the sta pointer before checking if IS_ERR.

> Thanks,
> Ben
> 
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
Ben Greear Feb. 14, 2024, 6:08 p.m. UTC | #7
On 2/13/24 22:08, Korenblit, Miriam Rachel wrote:
> : iwlwifi RFC related to iwl_mvm_tx_reclaim
>>
>> Hello,
>>
>> I'm seeing a crash due to accessing an 'sta' object in this method that is '-2' as
>> integer.
>>
>> It fails the initial check for null STA, but I'm thinking it might should check for
>> IS_ERR(sta) as well.
>>
>> (I have my own patch that references sta before the IS_ERR check later in the
>> code, and this causes the crash I'm seeing.  I guess upstream will not crash in this
>> situation.).
>>
>> My question:  Is the patch below a preferred approach, or should I add special
>> checks to where I access sta and only exit the method lower where it already has
>> the IS_ERR(sta) check?
>>
>> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
>> b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
>> index 0567f4eefebc..bd3d2fe424cd 100644
>> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
>> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
>> @@ -2337,7 +2337,7 @@ static void iwl_mvm_tx_reclaim(struct iwl_mvm
>> *mvm, int sta_id, int tid,
>>           sta = rcu_dereference(mvm->fw_id_to_mac_id[sta_id]);
>>
>>           /* Reclaiming frames for a station that has been deleted ? */
>> -       if (WARN_ON_ONCE(!sta)) {
>> +       if (IS_ERR(sta) || !sta) {
>>                   rcu_read_unlock();
>>                   return;
>>           }
>>
> 
> Hi,
> 
> Did you see this: 2b3eb122342c?
> 
> This can explain why the code is how it is.
> And no, you should not access the sta pointer before checking if IS_ERR.

Thanks for the pointer.  I think if I ever see the WARN_ON_ONCE(!sta) hit then
I'd want to just try to remove that and let the reclaim code work, and check for null in the IS_ERR() logic
added by the patch you referenced.

But I have not seen sta be NULL, and Johannes is fine with current code, so for now
I think it is fine as is.

Thanks,
Ben
diff mbox series

Patch

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 0567f4eefebc..bd3d2fe424cd 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -2337,7 +2337,7 @@  static void iwl_mvm_tx_reclaim(struct iwl_mvm *mvm, int sta_id, int tid,
         sta = rcu_dereference(mvm->fw_id_to_mac_id[sta_id]);

         /* Reclaiming frames for a station that has been deleted ? */
-       if (WARN_ON_ONCE(!sta)) {
+       if (IS_ERR(sta) || !sta) {
                 rcu_read_unlock();
                 return;
         }