4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

Message ID 532c257e-52a0-18c1-1afe-04d37c28e072@broadcom.com (mailing list archive)
State RFC
Delegated to: Kalle Valo

Commit Message

Arend van Spriel May 22, 2017, 9:02 p.m. UTC
On 22-5-2017 14:09, Arend van Spriel wrote:
> On 5/22/2017 12:57 PM, Johannes Berg wrote:
>> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>>> Hi,
>>>
>>> I encountered this splat with 4.12-RC2.
>>
>> Ugh, yeah, I should've seen that in the review.
>>
>> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
>> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
>> on RCU though?
> 
> I see. I think you are right on RCU. Don't have the code in front of me
> now, but I think the lookup has an ASSERT_RTNL. Will look into it after
> my Monday meeting :-p

I realized I have a laptop lying around with intel 3160 wifi chip and
tried to reproduce the issue. Did not run into the splat running
4.12-rc1 from wireless-drivers-next repo. I did not get the email from
Sander so I don't know any details.

Here is what I changed based on the info Johannes provided. Can you
please check if this gets rid of the splat and let me know.

Regards,
Arend
---
        }
@@ -398,13 +396,13 @@ void cfg80211_sched_scan_results(struct wiphy *wiphy, u64
        trace_cfg80211_sched_scan_results(wiphy, reqid);
        /* ignore if we're not scanning */

-       rtnl_lock();
+       rcu_read_lock();
        request = cfg80211_find_sched_scan_req(rdev, reqid);
        if (request) {
                request->report_results = true;
                queue_work(cfg80211_wq, &rdev->sched_scan_res_wk);
        }
-       rtnl_unlock();
+       rcu_read_unlock();
 }
 EXPORT_SYMBOL(cfg80211_sched_scan_results);

Comments

Johannes Berg May 22, 2017, 9:04 p.m. UTC | #1
Hi Arend,

Sorry, I forgot that the original message wasn't Cc'ed to the wireless
list, only netdev.

> +++ b/net/wireless/scan.c
> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct cfg80211_regi
>  {
>         struct cfg80211_sched_scan_request *pos;
> 
> -       ASSERT_RTNL();
> -
> -       list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
> +       list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {

[snip]

This looks fine, but perhaps in the above we should have some kind of
locking assertion, e.g.

	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());

johannes
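
Note on the assertion suggested above: the lookup is reached from two kinds of callers, one inside an RCU read-side section and one holding RTNL, so the check only warns when neither protection is in place. The following is a minimal userspace sketch of that logic, not kernel code. Every `model_*` name is hypothetical and exists only for illustration; plain counters and flags stand in for the kernel's lock-state queries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int model_rcu_depth;   /* > 0 while inside a modelled RCU read section */
static bool model_rtnl_held;  /* true while the modelled RTNL is held */
static int model_warnings;    /* how often the check below would have warned */

struct model_req {
	uint64_t reqid;
	bool report_results;
	struct model_req *next;
};

/* Lookup that accepts either form of protection. */
static struct model_req *model_find_req(struct model_req *head, uint64_t reqid)
{
	struct model_req *pos;

	/* Same shape as the suggested check: warn only if neither is held. */
	if (model_rcu_depth == 0 && !model_rtnl_held)
		model_warnings++;

	for (pos = head; pos; pos = pos->next)
		if (pos->reqid == reqid)
			return pos;
	return NULL;
}

/* Results path: may not sleep, so it takes only the read-side section. */
static void model_sched_scan_results(struct model_req *head, uint64_t reqid)
{
	struct model_req *request;

	model_rcu_depth++;            /* models rcu_read_lock() */
	request = model_find_req(head, reqid);
	if (request)
		request->report_results = true;
	model_rcu_depth--;            /* models rcu_read_unlock() */
}
```

Calling `model_find_req()` directly, with neither the counter raised nor the flag set, increments `model_warnings`. That is the case the assertion is meant to catch.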
Arend van Spriel May 23, 2017, 7:19 a.m. UTC | #2
On 22-5-2017 23:04, Johannes Berg wrote:
> Hi Arend,
> 
> Sorry, I forgot that the original message wasn't Cc'ed to the wireless
> list, only netdev.

That explains it. I'm not subscribed to netdev.

>> +++ b/net/wireless/scan.c
>> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct cfg80211_regi
>>  {
>>         struct cfg80211_sched_scan_request *pos;
>>
>> -       ASSERT_RTNL();
>> -
>> -       list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
>> +       list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
> 
> [snip]
> 
> This looks fine, but perhaps in the above we should have some kind of
> locking assertion, e.g.
> 
> 	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());

Thought about something like this after sending the email. So there are
two call sites. One for scheduled scan results notification and one in
scheduled scan stop scenario. So for the latter it is not needed to use
the rcu_read_lock() as it should have RTNL lock hence the two checks above?

Will create a formal patch.

Regards,
Arend
Johannes Berg May 23, 2017, 7:22 a.m. UTC | #3
On Tue, 2017-05-23 at 09:19 +0200, Arend Van Spriel wrote:
> 
> > 	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
> 
> Thought about something like this after sending the email. So there
> are two call sites. One for scheduled scan results notification and
> one in scheduled scan stop scenario. So for the latter it is not
> needed to use the rcu_read_lock() as it should have RTNL lock hence
> the two checks above?

Right. The latter can't even really use rcu_read_lock() since it also
wants to modify the list, and that's not sufficient protection for
modifying.

Thanks!

johannes
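
Note on the point above: a read-side section keeps readers safe from concurrent changes, but it does not keep two writers apart, so the stop path still needs an exclusive lock. The sketch below is a userspace model of that split using C11 atomics. It is an illustration under stated assumptions, not the cfg80211 code. The names `req`, `req_find`, `req_unlink` and `writer_lock` are hypothetical, a spinning flag stands in for RTNL, and grace periods are not modelled at all.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct req {
	uint64_t reqid;
	_Atomic(struct req *) next;
};

/* Hypothetical writer-side lock standing in for RTNL. */
static atomic_flag writer_lock = ATOMIC_FLAG_INIT;

static void writer_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&writer_lock,
						 memory_order_acquire))
		; /* spin until the current writer is done */
}

static void writer_lock_release(void)
{
	atomic_flag_clear_explicit(&writer_lock, memory_order_release);
}

/* Reader: lockless traversal that never blocks. */
static struct req *req_find(_Atomic(struct req *) *head, uint64_t reqid)
{
	struct req *pos = atomic_load_explicit(head, memory_order_acquire);

	while (pos) {
		if (pos->reqid == reqid)
			return pos;
		pos = atomic_load_explicit(&pos->next, memory_order_acquire);
	}
	return NULL;
}

/*
 * Writer: unlink under the exclusive lock. The node is returned, not
 * freed, because a concurrent reader may still be looking at it.
 */
static struct req *req_unlink(_Atomic(struct req *) *head, uint64_t reqid)
{
	_Atomic(struct req *) *link = head;
	struct req *pos, *found = NULL;

	writer_lock_acquire();
	pos = atomic_load_explicit(link, memory_order_relaxed);
	while (pos) {
		if (pos->reqid == reqid) {
			struct req *next = atomic_load_explicit(&pos->next,
						memory_order_relaxed);
			/* Publish the new link so readers skip the node. */
			atomic_store_explicit(link, next, memory_order_release);
			found = pos;
			break;
		}
		link = &pos->next;
		pos = atomic_load_explicit(link, memory_order_relaxed);
	}
	writer_lock_release();
	return found;
}
```

In the kernel, the corresponding writer-side primitives would be along the lines of list_del_rcu() followed by a deferred free such as kfree_rcu() or a wait with synchronize_rcu(). Whether the stop path here uses exactly those is not shown in this thread.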
Arend van Spriel May 23, 2017, 7:24 a.m. UTC | #4
On 23-5-2017 9:22, Johannes Berg wrote:
> On Tue, 2017-05-23 at 09:19 +0200, Arend Van Spriel wrote:
>>
>>> 	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
>>
>> Thought about something like this after sending the email. So there
>> are two call sites. One for scheduled scan results notification and
>> one in scheduled scan stop scenario. So for the latter it is not
>> needed to use the rcu_read_lock() as it should have RTNL lock hence
>> the two checks above?
> 
> Right. The latter can't even really use rcu_read_lock() since it also
> wants to modify the list, and that's not sufficient protection for
> modifying.

Hence the name ;-)

Regards,
Arend
Sander Eikelenboom May 23, 2017, 5:51 p.m. UTC | #5
On 22/05/17 23:02, Arend Van Spriel wrote:
> 
> 
> On 22-5-2017 14:09, Arend van Spriel wrote:
>> On 5/22/2017 12:57 PM, Johannes Berg wrote:
>>> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>>>> Hi,
>>>>
>>>> I encountered this splat with 4.12-RC2.
>>>
>>> Ugh, yeah, I should've seen that in the review.
>>>
>>> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
>>> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
>>> on RCU though?
>>
>> I see. I think you are right on RCU. Don't have the code in front of me
>> now, but I think the lookup has an ASSERT_RTNL. Will look into it after
>> my Monday meeting :-p
> 
> I realized I have a laptop lying around with intel 3160 wifi chip and
> tried to reproduce the issue. Did not run into the splat running
> 4.12-rc1 from wireless-drivers-next repo. I did not get the email from
> Sander so I don't know any details.
> 
> Here is what I changed based on the info Johannes provided. Can you
> please check if this gets rid of the splat and let me know.

Hi Arend,

I ran your patch today, so far no issues.

--
Sander


> Regards,
> Arend
> ---
> diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> index 14d5f0c..04833bb 100644
> --- a/net/wireless/scan.c
> +++ b/net/wireless/scan.c
> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct cfg80211_regi
>  {
>         struct cfg80211_sched_scan_request *pos;
> 
> -       ASSERT_RTNL();
> -
> -       list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
> +       list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
>                 if (pos->reqid == reqid)
>                         return pos;
>         }
> @@ -398,13 +396,13 @@ void cfg80211_sched_scan_results(struct wiphy *wiphy, u64
>         trace_cfg80211_sched_scan_results(wiphy, reqid);
>         /* ignore if we're not scanning */
> 
> -       rtnl_lock();
> +       rcu_read_lock();
>         request = cfg80211_find_sched_scan_req(rdev, reqid);
>         if (request) {
>                 request->report_results = true;
>                 queue_work(cfg80211_wq, &rdev->sched_scan_res_wk);
>         }
> -       rtnl_unlock();
> +       rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(cfg80211_sched_scan_results);
> 
>
Patch

diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 14d5f0c..04833bb 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct cfg80211_regi
 {
        struct cfg80211_sched_scan_request *pos;

-       ASSERT_RTNL();
-
-       list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
+       list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
                if (pos->reqid == reqid)
                        return pos;