Message ID | 1314728954-22646-1-git-send-email-javier@cozybit.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Tue, 2011-08-30 at 11:29 -0700, Javier Cardona wrote: > Under failure conditions, the mesh stack sends PERR messages to the > previous sender of the failed frame. This happens in the tx feedback > path, in which the transmission queue lock may be taken. Avoid a > deadlock by sending the path error via the pending queue. > > Signed-off-by: Javier Cardona <javier@cozybit.com> > --- > net/mac80211/mesh_hwmp.c | 7 ++++++- > 1 files changed, 6 insertions(+), 1 deletions(-) > > diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c > index fd4f76a..f760679 100644 > --- a/net/mac80211/mesh_hwmp.c > +++ b/net/mac80211/mesh_hwmp.c > @@ -209,6 +209,10 @@ static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags, > * @target_sn: SN of the broken destination > * @target_rcode: reason code for this PERR > * @ra: node this frame is addressed to > + * > + * Note: This function may be called from the tx feedback path, possibly with > + * the transmission queue lock taken. To avoid a deadlock we don't transmit > + * the frame directly but add it to the pending queue instead. I'd change that to say "with driver locks taken that the driver also acquires in the TX path" or something like that maybe? But that's not a big thing. Have you tested it? I'm wondering if because we take a lock here we might run into lock dependencies issues (lockdep would say) but I don't think so because stop_queue etc. all take the same lock from an arbitrary driver context already. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> johannes -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 30, 2011 at 11:43 AM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2011-08-30 at 11:29 -0700, Javier Cardona wrote: >> Under failure conditions, the mesh stack sends PERR messages to the >> previous sender of the failed frame. This happens in the tx feedback >> path, in which the transmission queue lock may be taken. Avoid a >> deadlock by sending the path error via the pending queue. >> >> Signed-off-by: Javier Cardona <javier@cozybit.com> >> --- >> net/mac80211/mesh_hwmp.c | 7 ++++++- >> 1 files changed, 6 insertions(+), 1 deletions(-) >> >> diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c >> index fd4f76a..f760679 100644 >> --- a/net/mac80211/mesh_hwmp.c >> +++ b/net/mac80211/mesh_hwmp.c >> @@ -209,6 +209,10 @@ static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags, >> * @target_sn: SN of the broken destination >> * @target_rcode: reason code for this PERR >> * @ra: node this frame is addressed to >> + * >> + * Note: This function may be called from the tx feedback path, possibly with >> + * the transmission queue lock taken. To avoid a deadlock we don't transmit >> + * the frame directly but add it to the pending queue instead. > > I'd change that to say "with driver locks taken that the driver also > acquires in the TX path" or something like that maybe? OK > But that's not a big thing. Have you tested it? I'm wondering if because > we take a lock here we might run into lock dependencies issues (lockdep > would say) but I don't think so because stop_queue etc. all take the > same lock from an arbitrary driver context already. We don't have a test specific for that particular perr. Other tests run fine and lockdep does not complain. I can resubmit the patch with the clearer comment once we've run our whole test suite if that gives you peace of mind. Javier > Reviewed-by: Johannes Berg <johannes@sipsolutions.net> > > johannes > >
On Tue, Aug 30, 2011 at 2:38 PM, Javier Cardona <javier@cozybit.com> wrote: > On Tue, Aug 30, 2011 at 11:43 AM, Johannes Berg >> But that's not a big thing. Have you tested it? I'm wondering if because >> we take a lock here we might run into lock dependencies issues (lockdep >> would say) but I don't think so because stop_queue etc. all take the >> same lock from an arbitrary driver context already. > > We don't have a test specific for that particular perr. Other tests > run fine and lockdep does not complain. > I can resubmit the patch with the clearer comment once we've run our > whole test suite if that gives you peace of mind. The stress tests do trigger perrs and, with this change, we see a warning due to missing info->control.vif [14202.988077] ------------[ cut here ]------------ [14202.988351] WARNING: at net/mac80211/util.c:358 ieee80211_add_pending_skb+0x97/0xa0() [14202.988353] Hardware name: Bochs [14202.988355] Modules linked in: mac80211_hwsim [last unloaded: mac80211_hwsim] [14202.988359] Pid: 15051, comm: iperf Tainted: G W 3.1.0-rc4-wl+ #47 [14202.988361] Call Trace: [14202.988364] [<c103ad2d>] warn_slowpath_common+0x6d/0xa0 [14202.988367] [<c15fbca7>] ? ieee80211_add_pending_skb+0x97/0xa0 [14202.988370] [<c15fbca7>] ? ieee80211_add_pending_skb+0x97/0xa0 [14202.988373] [<c103ad7d>] warn_slowpath_null+0x1d/0x20 [14202.988376] [<c15fbca7>] ieee80211_add_pending_skb+0x97/0xa0 [14202.988379] [<c1605ab1>] mesh_path_error_tx+0x151/0x190 [14202.988382] [<c160362b>] mesh_path_discard_frame+0xfb/0x100 [14202.988385] [<c1603580>] ? mesh_path_discard_frame+0x50/0x100 [14202.988387] [<c1605dd7>] mesh_nexthop_lookup+0x157/0x1d0 [14202.988390] [<c1605c80>] ? mesh_queue_preq+0x190/0x190 [14202.988393] [<c15f826a>] ieee80211_xmit+0x10a/0x240 [14202.988395] [<c15f8160>] ? ieee80211_tx+0xe0/0xe0 [14202.988398] [<c15f5c3a>] ? ieee80211_skb_resize+0x7a/0x100 [14202.988401] [<c15f8ba7>] ieee80211_subif_start_xmit+0x307/0x810 [14202.988404] [<c15f8cf8>] ? ieee80211_subif_start_xmit+0x458/0x810 I'll look into it tomorrow. Cheers, j
On Tue, 2011-08-30 at 18:50 -0700, Javier Cardona wrote: > The stress tests do trigger perrs and, with this change, we see a > warning due to missing info->control.vif > > [14202.988077] ------------[ cut here ]------------ > [14202.988351] WARNING: at net/mac80211/util.c:358 > ieee80211_add_pending_skb+0x97/0xa0() Interesting. Ok, that means we can't just use add_pending_skb(), we need to do some setup before, all the setup that ieee80211_tx_skb() and ieee80211_xmit() would do up to ieee80211_tx()... I guess we need a little bit of helper code to enable this. johannes -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 30, 2011 at 10:11 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2011-08-30 at 18:50 -0700, Javier Cardona wrote: > >> The stress tests do trigger perrs and, with this change, we see a >> warning due to missing info->control.vif >> >> [14202.988077] ------------[ cut here ]------------ >> [14202.988351] WARNING: at net/mac80211/util.c:358 >> ieee80211_add_pending_skb+0x97/0xa0() > > Interesting. Ok, that means we can't just use add_pending_skb(), we need > to do some setup before, all the setup that ieee80211_tx_skb() and > ieee80211_xmit() would do up to ieee80211_tx()... > > I guess we need a little bit of helper code to enable this. The patch that follows seems to do it. Passes our stress test without warnings. I increased the headroom on allocation to avoid a resize. Does it look correct to you? Cheers, Javier
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c index fd4f76a..f760679 100644 --- a/net/mac80211/mesh_hwmp.c +++ b/net/mac80211/mesh_hwmp.c @@ -209,6 +209,10 @@ static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags, * @target_sn: SN of the broken destination * @target_rcode: reason code for this PERR * @ra: node this frame is addressed to + * + * Note: This function may be called from the tx feedback path, possibly with + * the transmission queue lock taken. To avoid a deadlock we don't transmit + * the frame directly but add it to the pending queue instead. */ int mesh_path_error_tx(u8 ttl, u8 *target, __le32 target_sn, __le16 target_rcode, const u8 *ra, @@ -263,7 +267,8 @@ int mesh_path_error_tx(u8 ttl, u8 *target, __le32 target_sn, pos += 4; memcpy(pos, &target_rcode, 2); - ieee80211_tx_skb(sdata, skb); + /* see note in function header */ + ieee80211_add_pending_skb(local, skb); return 0; }
Under failure conditions, the mesh stack sends PERR messages to the previous sender of the failed frame. This happens in the tx feedback path, in which the transmission queue lock may be taken. Avoid a deadlock by sending the path error via the pending queue. Signed-off-by: Javier Cardona <javier@cozybit.com> --- net/mac80211/mesh_hwmp.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-)