Message ID | 201304230258.08359.chunkeey@googlemail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
On Tue, 2013-04-23 at 02:58 +0200, Christian Lamparter wrote:
> This patch fixes the following RCU debug splat:
>
> ===============================
> [ INFO: suspicious RCU usage. ]
> 3.9.0-rc8-wl+ #31 Tainted: G O
> -------------------------------
> net/mac80211/rate.c:691 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 1, debug_locks = 1
> 3 locks held by hostapd/9451:
> #0: (genl_mutex){+.+.+.}, at: [<c1326365>] genl_lock+0xf/0x11
> #1: (rtnl_mutex){+.+.+.}, at: [<c13133c4>] rtnl_lock+0xf/0x11
> #2: (&rdev->mtx){+.+.+.}, at: [<f853395e>] nl80211_pre_doit+0x166/0x180 [cfg80211]
>
> stack backtrace:
> Pid: 9451, comm: hostapd Tainted: G O 3.9.0-rc8-wl+ #31
> Call Trace:
> [<c107da0b>] lockdep_rcu_suspicious+0xe6/0xee
> [<f8bf82ad>] rate_control_set_rates+0x43/0x5a [mac80211]
> [<f8c2cacb>] minstrel_update_rates+0xdc/0xe2 [mac80211]
> [<f8c2cfb0>] minstrel_rate_init+0x24c/0x33d [mac80211]
> [<f8c2d9d3>] minstrel_ht_update_caps+0x206/0x234 [mac80211]
> [<c1080a8d>] ? lock_release+0x1c9/0x226
> [<f8c2da25>] minstrel_ht_rate_init+0x10/0x14 [mac80211]
> [...]
>
> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
> ---
> Actually, rcu_read_lock() might not be necessary in this special
> case [the RC is not yet initialized, so nothing bad can happen].
>
> But, since the rcu_read_lock() has a low overhead and
> rate_control_set_rates mac80211.h doc does not mention
> anything about locking, I think this is a viable way.

I think that, on the contrary, it's completely strange/wrong. ;-)

> + rcu_read_lock();
> + old = rcu_dereference(pubsta->rates);

Here we have a dereference.

> rcu_assign_pointer(pubsta->rates, rates);

and here's an assignment. The assignment ought to be protected already
by some locking, presumably, so similarly is the rcu_dereference() which
then should just be rcu_dereference_protected()?

johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, April 23, 2013 08:48:28 AM Johannes Berg wrote:
> On Tue, 2013-04-23 at 02:58 +0200, Christian Lamparter wrote:
> > This patch fixes the following RCU debug splat:
> >
> > ===============================
> > [ INFO: suspicious RCU usage. ]
> > 3.9.0-rc8-wl+ #31 Tainted: G O
> > -------------------------------
> > net/mac80211/rate.c:691 suspicious rcu_dereference_check() usage!
> >
> > other info that might help us debug this:
> >
> > rcu_scheduler_active = 1, debug_locks = 1
> > 3 locks held by hostapd/9451:
> > #0: (genl_mutex){+.+.+.}, at: [<c1326365>] genl_lock+0xf/0x11
> > #1: (rtnl_mutex){+.+.+.}, at: [<c13133c4>] rtnl_lock+0xf/0x11
> > #2: (&rdev->mtx){+.+.+.}, at: [<f853395e>] nl80211_pre_doit+0x166/0x180 [cfg80211]
> >
> > stack backtrace:
> > Pid: 9451, comm: hostapd Tainted: G O 3.9.0-rc8-wl+ #31
> > Call Trace:
> > [<c107da0b>] lockdep_rcu_suspicious+0xe6/0xee
> > [<f8bf82ad>] rate_control_set_rates+0x43/0x5a [mac80211]
> > [<f8c2cacb>] minstrel_update_rates+0xdc/0xe2 [mac80211]
> > [<f8c2cfb0>] minstrel_rate_init+0x24c/0x33d [mac80211]
> > [<f8c2d9d3>] minstrel_ht_update_caps+0x206/0x234 [mac80211]
> > [<c1080a8d>] ? lock_release+0x1c9/0x226
> > [<f8c2da25>] minstrel_ht_rate_init+0x10/0x14 [mac80211]
> > [...]
> >
> > Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
> > ---
> > Actually, rcu_read_lock() might not be necessary in this special
> > case [the RC is not yet initialized, so nothing bad can happen].
> >
> > But, since the rcu_read_lock() has a low overhead and
> > rate_control_set_rates mac80211.h doc does not mention
> > anything about locking, I think this is a viable way.
>
> I think that, on the contrary, it's completely strange/wrong. ;-)

Sorry, I think I cut too much from the stack trace and I didn't
explain how the code ends up in this case. This time, I commented out
the rcu_read_(un)lock() [=> rate.c:694 is rate.c:691 in wireless-testing.git]
and started hostapd and let a station connect. (see attached log)

> > + rcu_read_lock();
> > + old = rcu_dereference(pubsta->rates);
>
> Here we have a dereference.
>
> > rcu_assign_pointer(pubsta->rates, rates);
>
> and here's an assignment. The assignment ought to be protected already
> by some locking, presumably, so similarly is the rcu_dereference() which
> then should just be rcu_dereference_protected()?

The issue seems to be in ieee80211_add_station in net/mac80211/cfg.c.
This function allocates, initializes and adds the new station for
hostapd. And of course: the alloc and (rate_)init part is done without
acquiring any special mac80211 locks. (just rtnl, genl and rdev->mtx).

[And why should it? After all, during initialization, the station is
not yet in the station hash table.]

So, what else can be done?

Obviously, the locking requirement needs to be added to the
doc entry for rate_control_set_rates in include/net/mac80211.h.

And one of the following changes:

1. move the rate_control_rate_init after sta_info_insert_rcu
   and remove the rcu_read_locks from rate_control_set_rates.
   However then we would add an incomplete station (this can't be right?!).

2. add rcu or other lock around rate_control_set_rates in
   minstrel_update_rates() and minstrel_ht_update_rates().

3. add a new function: rate_control_init_rates which is
   reserved for this case and only does the assignment.

(4. use rcu_dereference_protected and test the rtnl_lock - really?)

(5. some other way?)

Regards,
Christian

---
===============================
[ INFO: suspicious RCU usage. ]
3.9.0-rc8-wl+ #32 Tainted: G O
------------------------------
net/mac80211/rate.c:694 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
3 locks held by hostapd/2906:
#0: (genl_mutex){+.+.+.}, at: [<c1326365>] genl_lock+0xf/0x11
#1: (rtnl_mutex){+.+.+.}, at: [<c13133c4>] rtnl_lock+0xf/0x11
#2: (&rdev->mtx){+.+.+.}, at: [<f852195e>] nl80211_pre_doit+0x166/0x180 [cfg80211]

stack backtrace:
Pid: 2906, comm: hostapd Tainted: G O 3.9.0-rc8-wl+ #32
Call Trace:
[<c107da0b>] lockdep_rcu_suspicious+0xe6/0xee
[<f884835f>] rate_control_set_rates+0x43/0x5a [mac80211]
[<f8882e52>] minstrel_ht_update_rates+0x9f/0xa7 [mac80211]
[<f88833ec>] minstrel_ht_update_caps+0x1cf/0x234 [mac80211]
[<c1080a8d>] ? lock_release+0x1c9/0x226
[<f8883475>] minstrel_ht_rate_init+0x10/0x14 [mac80211]
[<f884d326>] rate_control_rate_init+0xc4/0xd8 [mac80211]
[<f884e219>] ieee80211_add_station+0xdc/0x11b [mac80211]
[<f8526595>] nl80211_new_station+0x27e/0x2c7 [cfg80211]
[<c132653d>] genl_rcv_msg+0x1b6/0x1ee
[<c1326387>] ? genl_rcv+0x20/0x20

[The full unaltered trace is available at: <http://pastebin.com/gYc8yAqB>]
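Option 3 in the list above could look roughly like the following. This is only a userspace sketch for illustration, not a patch: `rate_control_init_rates` is the hypothetical name from the list, the two structs are cut-down stand-ins for the mac80211 ones, and `publish_pointer` is an assumed stand-in for the kernel's `rcu_assign_pointer()`, modelled here as a C11 release store. The check for an already-set table is an addition of this sketch; the proposal itself only asks for the assignment. It relies on the premise stated above that the station is not yet in the station hash table, so nothing else can see `pubsta->rates`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Cut-down stand-ins for the mac80211 structures (assumption, not the real layout). */
struct ieee80211_sta_rates {
	int placeholder;
};

struct ieee80211_sta {
	_Atomic(struct ieee80211_sta_rates *) rates;
};

/* Stand-in for rcu_assign_pointer(): publish the pointer with release ordering. */
static void publish_pointer(struct ieee80211_sta *pubsta,
			    struct ieee80211_sta_rates *rates)
{
	atomic_store_explicit(&pubsta->rates, rates, memory_order_release);
}

/*
 * Hypothetical init-only variant: meant to be called before the station
 * is inserted, when there is no old table to read or free. Returns -1
 * if a table is already set, so misuse is reported instead of leaking it.
 */
int rate_control_init_rates(struct ieee80211_sta *pubsta,
			    struct ieee80211_sta_rates *rates)
{
	if (atomic_load_explicit(&pubsta->rates, memory_order_relaxed) != NULL)
		return -1;

	publish_pointer(pubsta, rates);
	return 0;
}
```

Because this variant never reads the old pointer for freeing, there is no dereference left for lockdep to complain about in the init path; the read-old-and-free logic would stay in `rate_control_set_rates` for the update paths.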
On Tue, 2013-04-23 at 15:26 +0200, Christian Lamparter wrote:
> > > Actually, rcu_read_lock() might not be necessary in this special
> > > case [the RC is not yet initialized, so nothing bad can happen].
> > >
> > > But, since the rcu_read_lock() has a low overhead and
> > > rate_control_set_rates mac80211.h doc does not mention
> > > anything about locking, I think this is a viable way.
> >
> > I think that, on the contrary, it's completely strange/wrong. ;-)

> Sorry, I think I cut too much from the stack trace and I didn't
> explain how the code ends up in this case. This time, I commented out
> the rcu_read_(un)lock() [=> rate.c:694 is rate.c:691 in wireless-testing.git]
> and started hostapd and let a station connect. (see attached log)

Yes, I understand how you can get here. But every time the assignment
here happens, the value is completely overwritten. And when we free it
here, we don't look at the value.

> > > + rcu_read_lock();
> > > + old = rcu_dereference(pubsta->rates);
> >
> > Here we have a dereference.
> >
> > > rcu_assign_pointer(pubsta->rates, rates);
> >
> > and here's an assignment. The assignment ought to be protected already
> > by some locking, presumably, so similarly is the rcu_dereference() which
> > then should just be rcu_dereference_protected()?

> The issue seems to be in ieee80211_add_station in net/mac80211/cfg.c.
> This function allocates, initializes and adds the new station for
> hostapd. And of course: the alloc and (rate_)init part is done without
> acquiring any special mac80211 locks. (just rtnl, genl and rdev->mtx).
>
> [And why should it? After all, during initialization, the station is
> not yet in the station hash table.]
>
> So, what else can be done?
>
> Obviously, the locking requirement needs to be added to the
> doc entry for rate_control_set_rates in include/net/mac80211.h.

I don't see that any bug can happen here right now, even without
locking.

> And one of the following changes:
>
> 1. move the rate_control_rate_init after sta_info_insert_rcu
> and remove the rcu_read_locks from rate_control_set_rates.
> However then we would add an incomplete station (this can't be right?!).
>
> 2. add rcu or other lock around rate_control_set_rates in
> minstrel_update_rates() and minstrel_ht_update_rates().

Both seem wrong.

> 3. add a new function: rate_control_init_rates which is
> reserved for this case and only does the assignment.

I like that.

> (4. use rcu_dereference_protected and test the rtnl_lock - really?)

Nah that'll never work anyway.

> (5. some other way?)

The problem here is that even the rcu_read_lock() around here that's
actually there in most cases *isn't* what's protecting this code. What's
protecting this assignment is the fact that we require drivers to not
call ieee80211_tx_status() concurrently (and if they call
ieee80211_tx_status_irqsafe() then we serialize via the tasklet.)

If this wasn't the case, then calling the function could cause
double-free or so by having two CPUs read the old pointer and both call
kfree_rcu() on it.

Actually, looking at this code, this does seem possible in minstrel_ht
because it also calls this from minstrel_ht_rate_update() (indirectly),
which is called from the RX path which I'm not sure we require to be not
concurrent with the TX status path? Most drivers probably don't call
them concurrently, but I haven't checked all of them.

So as you can see, the RCU warning is just the tip of the iceberg.

johannes
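The double-free scenario described above can be replayed deterministically in userspace. This is a single-threaded model, not kernel code: `struct fake_sta`, `struct fake_rates`, `fake_kfree_rcu` and `simulate_concurrent_set_rates` are illustrative names, and the "free" is only counted rather than performed. Each simulated CPU runs the same read-old, assign-new, free-old sequence as `rate_control_set_rates`, with both reads placed before either assignment.

```c
#include <assert.h>
#include <stddef.h>

struct fake_rates {
	int free_count;	/* how many times this table was handed to the "free" */
};

struct fake_sta {
	struct fake_rates *rates;
};

/* Counts the request instead of releasing memory, so a double free is observable and harmless. */
static void fake_kfree_rcu(struct fake_rates *old)
{
	if (old)
		old->free_count++;
}

/*
 * Replays the problematic interleaving: both CPUs read the old pointer
 * before either one assigns its new table. Returns how often the
 * original table was queued for freeing.
 */
int simulate_concurrent_set_rates(void)
{
	struct fake_rates original = { 0 }, new_a = { 0 }, new_b = { 0 };
	struct fake_sta sta = { &original };

	struct fake_rates *old_on_cpu0 = sta.rates;	/* CPU 0 reads old */
	struct fake_rates *old_on_cpu1 = sta.rates;	/* CPU 1 reads the same old */

	sta.rates = &new_a;				/* CPU 0 assigns */
	sta.rates = &new_b;				/* CPU 1 assigns */

	fake_kfree_rcu(old_on_cpu0);			/* CPU 0 frees the original */
	fake_kfree_rcu(old_on_cpu1);			/* CPU 1 frees it again */

	return original.free_count;
}
```

In this interleaving the original table is queued for freeing twice, and the table installed by CPU 0 is overwritten without ever being freed. Wrapping the sequence in `rcu_read_lock()` would not change either outcome, since a read-side critical section does not serialize writers against each other.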
===============================
[ INFO: suspicious RCU usage. ]
3.9.0-rc8-wl+ #31 Tainted: G O
-------------------------------
net/mac80211/rate.c:691 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
3 locks held by hostapd/9451:
#0: (genl_mutex){+.+.+.}, at: [<c1326365>] genl_lock+0xf/0x11
#1: (rtnl_mutex){+.+.+.}, at: [<c13133c4>] rtnl_lock+0xf/0x11
#2: (&rdev->mtx){+.+.+.}, at: [<f853395e>] nl80211_pre_doit+0x166/0x180 [cfg80211]

stack backtrace:
Pid: 9451, comm: hostapd Tainted: G O 3.9.0-rc8-wl+ #31
Call Trace:
[<c107da0b>] lockdep_rcu_suspicious+0xe6/0xee
[<f8bf82ad>] rate_control_set_rates+0x43/0x5a [mac80211]
[<f8c2cacb>] minstrel_update_rates+0xdc/0xe2 [mac80211]
[<f8c2cfb0>] minstrel_rate_init+0x24c/0x33d [mac80211]
[<f8c2d9d3>] minstrel_ht_update_caps+0x206/0x234 [mac80211]
[<c1080a8d>] ? lock_release+0x1c9/0x226
[<f8c2da25>] minstrel_ht_rate_init+0x10/0x14 [mac80211]
[...]

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
---
Actually, rcu_read_lock() might not be necessary in this special
case [the RC is not yet initialized, so nothing bad can happen].

But, since the rcu_read_lock() has a low overhead and
rate_control_set_rates mac80211.h doc does not mention
anything about locking, I think this is a viable way.
---
diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index 0d51877..615d3a8 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -688,11 +688,15 @@ int rate_control_set_rates(struct ieee80211_hw *hw,
 			   struct ieee80211_sta *pubsta,
 			   struct ieee80211_sta_rates *rates)
 {
-	struct ieee80211_sta_rates *old = rcu_dereference(pubsta->rates);
+	struct ieee80211_sta_rates *old;
+
+	rcu_read_lock();
+	old = rcu_dereference(pubsta->rates);
 
 	rcu_assign_pointer(pubsta->rates, rates);
 	if (old)
 		kfree_rcu(old, rcu_head);
+	rcu_read_unlock();
 
 	return 0;
 }