Message ID | 1471569995-10028-2-git-send-email-greearb@candelatech.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Headers | show |
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 916119c..d96c06e 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > int max; > int loop_max = 2000; > > - spin_lock_bh(&ar->txqs_lock); > rcu_read_lock(); > + spin_lock_bh(&ar->txqs_lock); > Ben, It is quite possible that push_pending can be preempted after acquiring rcu_lock which may lead to deadlock. no? I assume to prevent that spin_lock is taken first. Could you please explain how this reordering is fixing dead lock? -Rajkumar
On 08/18/2016 08:01 PM, Manoharan, Rajkumar wrote: >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 916119c..d96c06e 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> int max; >> int loop_max = 2000; >> >> - spin_lock_bh(&ar->txqs_lock); >> rcu_read_lock(); >> + spin_lock_bh(&ar->txqs_lock); >> > Ben, > > It is quite possible that push_pending can be preempted after acquiring rcu_lock which > may lead to deadlock. no? I assume to prevent that spin_lock is taken first. > > Could you please explain how this reordering is fixing dead lock? It did not obviously fix the spin lock issue, but the issue went away. Maybe it was because I fixed the stale memory access issues at around the same time. But, I don't think you can deadlock by taking rcu lock first and then the spinlock. I checked all places where the spinlock is held, and I do not see any way for any of them to block forever (In my kernel, I have a 2000 time limit on spinning in the push pending method, which can help make sure we don't spin forever). http://dmz2.candelatech.com/?p=linux-4.7.dev.y/.git;a=commitdiff;h=5d192657269d8e20fce733f894bb1b68df71db00 I was also worried that some calls from mac80211 might be holding rcu_read_lock while calling into ath10k code that would grab the spinlock. If that is the case (and I didn't verify it was), then you could have a lock inversion by taking spinlock before rcu read lock in the push-pending method. Anyway, someone that understands locking subtleties better might have more clue about this code than I do. Thanks, Ben
greearb@candelatech.com writes: > From: Ben Greear <greearb@candelatech.com> > > I was seeing some spin-lock hangs in this area of the code, > and it seems more proper to do the rcu-read-lock outside of > the spin lock. I am not sure how much this matters, however. > > Signed-off-by: Ben Greear <greearb@candelatech.com> > --- > drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 916119c..d96c06e 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > int max; > int loop_max = 2000; > > - spin_lock_bh(&ar->txqs_lock); > rcu_read_lock(); > + spin_lock_bh(&ar->txqs_lock); > > last = list_last_entry(&ar->txqs, struct ath10k_txq, list); > while (!list_empty(&ar->txqs)) { > @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > break; > } > > - rcu_read_unlock(); > spin_unlock_bh(&ar->txqs_lock); > + rcu_read_unlock(); I'm no RCU expert but this isn't making any sense. Maybe it changes timings on your kernel so that it hides the real problem?
On 09/09/2016 06:36 AM, Valo, Kalle wrote: > greearb@candelatech.com writes: > >> From: Ben Greear <greearb@candelatech.com> >> >> I was seeing some spin-lock hangs in this area of the code, >> and it seems more proper to do the rcu-read-lock outside of >> the spin lock. I am not sure how much this matters, however. >> >> Signed-off-by: Ben Greear <greearb@candelatech.com> >> --- >> drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 916119c..d96c06e 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> int max; >> int loop_max = 2000; >> >> - spin_lock_bh(&ar->txqs_lock); >> rcu_read_lock(); >> + spin_lock_bh(&ar->txqs_lock); >> >> last = list_last_entry(&ar->txqs, struct ath10k_txq, list); >> while (!list_empty(&ar->txqs)) { >> @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> break; >> } >> >> - rcu_read_unlock(); >> spin_unlock_bh(&ar->txqs_lock); >> + rcu_read_unlock(); > > I'm no RCU expert but this isn't making any sense. Maybe it changes > timings on your kernel so that it hides the real problem? I'm not sure this fixed anything or not, it just seemed weird so I changed it. I was hoping someone that understood rcu locking would comment... Thanks, Ben
> > > - rcu_read_unlock(); > > > spin_unlock_bh(&ar->txqs_lock); > > > + rcu_read_unlock(); > > > > I'm no RCU expert but this isn't making any sense. Maybe it changes > > timings on your kernel so that it hides the real problem? > > I'm not sure this fixed anything or not, it just seemed weird so I > changed it. > > I was hoping someone that understood rcu locking would comment... > RCU is no "locking". The sooner you get over that notion, the better. This therefore make no sense whatsoever. In fact, you want to keep the RCU protected section *small*, so having the spinlock inside hurts overall system performance. johannes
On 09/11/2016 11:41 PM, Johannes Berg wrote: > >>>> - rcu_read_unlock(); >>>> spin_unlock_bh(&ar->txqs_lock); >>>> + rcu_read_unlock(); >>> >>> I'm no RCU expert but this isn't making any sense. Maybe it changes >>> timings on your kernel so that it hides the real problem? >> >> I'm not sure this fixed anything or not, it just seemed weird so I >> changed it. >> >> I was hoping someone that understood rcu locking would comment... >> > > RCU is no "locking". The sooner you get over that notion, the better. > > This therefore make no sense whatsoever. > > In fact, you want to keep the RCU protected section *small*, so having > the spinlock inside hurts overall system performance. Ok, thanks for the review. I'll drop this patch from my tree. Thanks, Ben
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 916119c..d96c06e 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) int max; int loop_max = 2000; - spin_lock_bh(&ar->txqs_lock); rcu_read_lock(); + spin_lock_bh(&ar->txqs_lock); last = list_last_entry(&ar->txqs, struct ath10k_txq, list); while (!list_empty(&ar->txqs)) { @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) break; } - rcu_read_unlock(); spin_unlock_bh(&ar->txqs_lock); + rcu_read_unlock(); } /************/