diff mbox

mac80211: Defer tranmission of mesh path errors

Message ID 1314728954-22646-1-git-send-email-javier@cozybit.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Javier Cardona Aug. 30, 2011, 6:29 p.m. UTC
Under failure conditions, the mesh stack sends PERR messages to the
previous sender of the failed frame.  This happens in the tx feedback
path, in which the transmission queue lock may be taken.  Avoid a
deadlock by sending the path error via the pending queue.

Signed-off-by: Javier Cardona <javier@cozybit.com>
---
 net/mac80211/mesh_hwmp.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Comments

Johannes Berg Aug. 30, 2011, 6:43 p.m. UTC | #1
On Tue, 2011-08-30 at 11:29 -0700, Javier Cardona wrote:
> Under failure conditions, the mesh stack sends PERR messages to the
> previous sender of the failed frame.  This happens in the tx feedback
> path, in which the transmission queue lock may be taken.  Avoid a
> deadlock by sending the path error via the pending queue.
> 
> Signed-off-by: Javier Cardona <javier@cozybit.com>
> ---
>  net/mac80211/mesh_hwmp.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
> index fd4f76a..f760679 100644
> --- a/net/mac80211/mesh_hwmp.c
> +++ b/net/mac80211/mesh_hwmp.c
> @@ -209,6 +209,10 @@ static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags,
>   * @target_sn: SN of the broken destination
>   * @target_rcode: reason code for this PERR
>   * @ra: node this frame is addressed to
> + *
> + * Note: This function may be called from the tx feedback path, possibly with
> + * the transmission queue lock taken.  To avoid a deadlock we don't transmit
> + * the frame directly but add it to the pending queue instead.

I'd change that to say "with driver locks taken that the driver also
acquires in the TX path" or something like that maybe?

But that's not a big thing. Have you tested it? I'm wondering if because
we take a lock here we might run into lock dependencies issues (lockdep
would say) but I don't think so because stop_queue etc. all take the
same lock from an arbitrary driver context already.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Javier Cardona Aug. 30, 2011, 9:38 p.m. UTC | #2
On Tue, Aug 30, 2011 at 11:43 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Tue, 2011-08-30 at 11:29 -0700, Javier Cardona wrote:
>> Under failure conditions, the mesh stack sends PERR messages to the
>> previous sender of the failed frame.  This happens in the tx feedback
>> path, in which the transmission queue lock may be taken.  Avoid a
>> deadlock by sending the path error via the pending queue.
>>
>> Signed-off-by: Javier Cardona <javier@cozybit.com>
>> ---
>>  net/mac80211/mesh_hwmp.c |    7 ++++++-
>>  1 files changed, 6 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
>> index fd4f76a..f760679 100644
>> --- a/net/mac80211/mesh_hwmp.c
>> +++ b/net/mac80211/mesh_hwmp.c
>> @@ -209,6 +209,10 @@ static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags,
>>   * @target_sn: SN of the broken destination
>>   * @target_rcode: reason code for this PERR
>>   * @ra: node this frame is addressed to
>> + *
>> + * Note: This function may be called from the tx feedback path, possibly with
>> + * the transmission queue lock taken.  To avoid a deadlock we don't transmit
>> + * the frame directly but add it to the pending queue instead.
>
> I'd change that to say "with driver locks taken that the driver also
> acquires in the TX path" or something like that maybe?

OK

> But that's not a big thing. Have you tested it? I'm wondering if because
> we take a lock here we might run into lock dependencies issues (lockdep
> would say) but I don't think so because stop_queue etc. all take the
> same lock from an arbitrary driver context already.

We don't have a test specific for that particular perr.  Other tests
run fine and lockdep does not complain.
I can resubmit the patch with the clearer comment once we've run our
whole test suite if that gives you peace of mind.

Javier

> Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
>
> johannes
>
>
Javier Cardona Aug. 31, 2011, 1:50 a.m. UTC | #3
On Tue, Aug 30, 2011 at 2:38 PM, Javier Cardona <javier@cozybit.com> wrote:
> On Tue, Aug 30, 2011 at 11:43 AM, Johannes Berg
>> But that's not a big thing. Have you tested it? I'm wondering if because
>> we take a lock here we might run into lock dependencies issues (lockdep
>> would say) but I don't think so because stop_queue etc. all take the
>> same lock from an arbitrary driver context already.
>
> We don't have a test specific for that particular perr.  Other tests
> run fine and lockdep does not complain.
> I can resubmit the patch with the clearer comment once we've run our
> whole test suite if that gives you peace of mind.

The stress tests do trigger perrs and, with this change, we see a
warning due to missing info->control.vif

[14202.988077] ------------[ cut here ]------------
[14202.988351] WARNING: at net/mac80211/util.c:358
ieee80211_add_pending_skb+0x97/0xa0()
[14202.988353] Hardware name: Bochs
[14202.988355] Modules linked in: mac80211_hwsim [last unloaded: mac80211_hwsim]
[14202.988359] Pid: 15051, comm: iperf Tainted: G        W   3.1.0-rc4-wl+ #47
[14202.988361] Call Trace:
[14202.988364]  [<c103ad2d>] warn_slowpath_common+0x6d/0xa0
[14202.988367]  [<c15fbca7>] ? ieee80211_add_pending_skb+0x97/0xa0
[14202.988370]  [<c15fbca7>] ? ieee80211_add_pending_skb+0x97/0xa0
[14202.988373]  [<c103ad7d>] warn_slowpath_null+0x1d/0x20
[14202.988376]  [<c15fbca7>] ieee80211_add_pending_skb+0x97/0xa0
[14202.988379]  [<c1605ab1>] mesh_path_error_tx+0x151/0x190
[14202.988382]  [<c160362b>] mesh_path_discard_frame+0xfb/0x100
[14202.988385]  [<c1603580>] ? mesh_path_discard_frame+0x50/0x100
[14202.988387]  [<c1605dd7>] mesh_nexthop_lookup+0x157/0x1d0
[14202.988390]  [<c1605c80>] ? mesh_queue_preq+0x190/0x190
[14202.988393]  [<c15f826a>] ieee80211_xmit+0x10a/0x240
[14202.988395]  [<c15f8160>] ? ieee80211_tx+0xe0/0xe0
[14202.988398]  [<c15f5c3a>] ? ieee80211_skb_resize+0x7a/0x100
[14202.988401]  [<c15f8ba7>] ieee80211_subif_start_xmit+0x307/0x810
[14202.988404]  [<c15f8cf8>] ? ieee80211_subif_start_xmit+0x458/0x810

I'll look into it tomorrow.

Cheers,

j
Johannes Berg Aug. 31, 2011, 5:11 a.m. UTC | #4
On Tue, 2011-08-30 at 18:50 -0700, Javier Cardona wrote:

> The stress tests do trigger perrs and, with this change, we see a
> warning due to missing info->control.vif
> 
> [14202.988077] ------------[ cut here ]------------
> [14202.988351] WARNING: at net/mac80211/util.c:358
> ieee80211_add_pending_skb+0x97/0xa0()

Interesting. Ok, that means we can't just use add_pending_skb(), we need
to do some setup before, all the setup that ieee80211_tx_skb() and
ieee80211_xmit() would do up to ieee80211_tx()...

I guess we need a little bit of helper code to enable this.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Javier Cardona Sept. 1, 2011, 5:04 p.m. UTC | #5
On Tue, Aug 30, 2011 at 10:11 PM, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2011-08-30 at 18:50 -0700, Javier Cardona wrote:
>
>> The stress tests do trigger perrs and, with this change, we see a
>> warning due to missing info->control.vif
>>
>> [14202.988077] ------------[ cut here ]------------
>> [14202.988351] WARNING: at net/mac80211/util.c:358
>> ieee80211_add_pending_skb+0x97/0xa0()
>
> Interesting. Ok, that means we can't just use add_pending_skb(), we need
> to do some setup before, all the setup that ieee80211_tx_skb() and
> ieee80211_xmit() would do up to ieee80211_tx()...
>
> I guess we need a little bit of helper code to enable this.

The patch that follows seems to do it.  Passes our stress test without
warnings.  I increased the headroom on allocation to avoid a resize.  Does it
look correct to you?

Cheers,

Javier
diff mbox

Patch

diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index fd4f76a..f760679 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -209,6 +209,10 @@  static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags,
  * @target_sn: SN of the broken destination
  * @target_rcode: reason code for this PERR
  * @ra: node this frame is addressed to
+ *
+ * Note: This function may be called from the tx feedback path, possibly with
+ * the transmission queue lock taken.  To avoid a deadlock we don't transmit
+ * the frame directly but add it to the pending queue instead.
  */
 int mesh_path_error_tx(u8 ttl, u8 *target, __le32 target_sn,
 		       __le16 target_rcode, const u8 *ra,
@@ -263,7 +267,8 @@  int mesh_path_error_tx(u8 ttl, u8 *target, __le32 target_sn,
 	pos += 4;
 	memcpy(pos, &target_rcode, 2);
 
-	ieee80211_tx_skb(sdata, skb);
+	/* see note in function header */
+	ieee80211_add_pending_skb(local, skb);
 	return 0;
 }