diff mbox

[20/22] IB/ipoib: don't queue a work struct up twice

Message ID 30a5bd6461381448c52af0d7408dbc14da9ac4d0.1423703861.git.dledford@redhat.com (mailing list archive)
State Rejected
Headers show

Commit Message

Doug Ledford Feb. 12, 2015, 1:43 a.m. UTC
In b2d408f78b3 (IB/ipoib: make delayed tasks not hold up everything) I
added the ability to have some joins processed around delayed joins.
Because the join task processes one join at a time, we needed to queue
the task to run again immediately and then again at the appropriate
delay time.  I didn't think about the fact that our work struct can only
be on the workqueue once at a given time and instead tried to queue the
same struct up twice.  Instead, kick the queue again immediately on a
delayed restart attempt, when the task sees that it has nothing to do
but delayed work, it will pick the shortest work item and queue a delay
equal to its expiration.  Finally, in case we have delayed work, in the
send_mcast routine we need to cancel any delayed work before kicking the
thread immediately.

Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 49 ++++++++++++++++----------
 1 file changed, 31 insertions(+), 18 deletions(-)

Comments

Or Gerlitz Feb. 12, 2015, 6:33 p.m. UTC | #1
On 02/12/2015 03:43 AM, Doug Ledford wrote:
> In b2d408f78b3 (IB/ipoib: make delayed tasks not hold up everything) I
> added the ability to have some joins processed around delayed joins.
> Because the join task processes one join at a time, we needed to queue
> the task to run again immediately and then again at the appropriate
> delay time.  I didn't think about the fact that our work struct can only
> be on the workqueue once at a given time and instead tried to queue the
> same struct up twice.  Instead, kick the queue again immediately on a
> delayed restart attempt, when the task sees that it has nothing to do
> but delayed work, it will pick the shortest work item and queue a delay
> equal to its expiration.  Finally, in case we have delayed work, in the
> send_mcast routine we need to cancel any delayed work before kicking the
> thread immediately.
This is strictly a fix to a downstream patch of the series, lets squash 
it there somehow, I see no point to submit
commit that has a bug and later a fix in the same series.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Feb. 12, 2015, 7:47 p.m. UTC | #2
On Thu, 2015-02-12 at 20:33 +0200, Or Gerlitz wrote:
> On 02/12/2015 03:43 AM, Doug Ledford wrote:
> > In b2d408f78b3 (IB/ipoib: make delayed tasks not hold up everything) I
> > added the ability to have some joins processed around delayed joins.
> > Because the join task processes one join at a time, we needed to queue
> > the task to run again immediately and then again at the appropriate
> > delay time.  I didn't think about the fact that our work struct can only
> > be on the workqueue once at a given time and instead tried to queue the
> > same struct up twice.  Instead, kick the queue again immediately on a
> > delayed restart attempt, when the task sees that it has nothing to do
> > but delayed work, it will pick the shortest work item and queue a delay
> > equal to its expiration.  Finally, in case we have delayed work, in the
> > send_mcast routine we need to cancel any delayed work before kicking the
> > thread immediately.
> This is strictly a fix to a downstream patch of the series, lets squash 
> it there somehow, I see no point to submit
> commit that has a bug and later a fix in the same series.

A number of them *are* strictly fixes.  That's a given.  But both you
and Erez emailed me and wanted this patch set sooner rather than later
because Erez in particular had some other things he wanted to send
through and wanted them to be based upon this patchset so they would
apply cleanly.  When I went to squash these things, it became clear very
quickly that I would end up rewriting significant portions of this work,
and that I would have to retest all but the first few patches one patch
at a time.  And I'm actually OK doing that, but it wasn't possible to
meet both requirements at the same time.  So, I'll keep a new branch and
this branch, and I'll reorder things (like the singleton fix you brought
up in another email being first) and squash things, and when I get done,
I'll do a diff between the two branches to make sure that there are no
logical differences.  That should avoid invalidating all of the
testing/QE work that has already been done on this patchset.  In the
meantime, Erez can work off of these patches knowing the end result will
be the same either way.
Or Gerlitz Feb. 12, 2015, 9:35 p.m. UTC | #3
On 02/12/2015 09:47 PM, Doug Ledford wrote:
> So, I'll keep a new branch and this branch, and I'll reorder things (like the singleton fix you brought
> up in another email being first) and squash things, and when I get done,
> I'll do a diff between the two branches to make sure that there are no
> logical differences.  That should avoid invalidating all of the
> testing/QE work that has already been done on this patchset.

yes, lets do that, for example, patches 1-21 touch 56 times the 
mcast_mutex and patch #22 removes that mutex all together, this is 
nightmare for future bisection and maintenance. Please post the 
revised/shorter series once you have it to the list and specify the 
branch too.

> In the meantime, Erez can work off of these patches knowing the end result will be the same either way.
We are almost one week into the 3.20 merge window and our WW (Sun-Thu) 
is pretty much done. Erez will try to give some post-WW feedback but 
next week is on a well-pre-planned OOO -- I'm really not sure what 
should we do here. If you have the squashed patches by the end of this 
week, and our regression system runs them early next week maybe we can 
ack them.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index ad2bcd105e4..68b47bed33a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -71,29 +71,28 @@  static void __ipoib_mcast_continue_join_thread(struct ipoib_dev_priv *priv,
 					       int delay)
 {
 	if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) {
-		/*
-		 * Mark this mcast for its delay and set a timer to kick the
-		 * thread when the delay completes
-		 */
 		if (mcast && delay) {
 			mcast->backoff *= 2;
 			if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS)
 				mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS;
 			mcast->delay_until = jiffies + (mcast->backoff * HZ);
-			queue_delayed_work(priv->wq, &priv->mcast_task,
-					   mcast->backoff * HZ);
+			/*
+			 * Mark this mcast for its delay, but restart the
+			 * task immediately.  It will make sure to clear
+			 * out all entries without delays, and then
+			 * schedule itself for run again when the delay
+			 * expires
+			 */
+			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 		} else if (delay) {
-			/* Special case of retrying after a failure to
+			/*
+			 * Special case of retrying after a failure to
 			 * allocate the broadcast multicast group, wait
 			 * 1 second and try again
 			 */
 			queue_delayed_work(priv->wq, &priv->mcast_task, HZ);
-		}
-		/*
-		 * But also rekick the thread immediately for any other
-		 * tasks in queue behind this one
-		 */
-		queue_delayed_work(priv->wq, &priv->mcast_task, delay);
+		} else
+			queue_delayed_work(priv->wq, &priv->mcast_task, 0);
 	}
 }
 
@@ -568,6 +567,7 @@  void ipoib_mcast_join_task(struct work_struct *work)
 	struct ib_port_attr port_attr;
 	struct ipoib_mcast *mcast = NULL;
 	int create = 1;
+	unsigned long delay_until = 0;
 
 	if (!test_bit(IPOIB_MCAST_RUN, &priv->flags))
 		return;
@@ -636,11 +636,14 @@  void ipoib_mcast_join_task(struct work_struct *work)
 	list_for_each_entry(mcast, &priv->multicast_list, list) {
 		if (IS_ERR_OR_NULL(mcast->mc) &&
 		    !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags) &&
-		    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags) &&
-		    (mcast->backoff == 1 ||
-		     time_after_eq(jiffies, mcast->delay_until))) {
-			/* Found the next unjoined group */
-			break;
+		    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
+			if (mcast->backoff == 1 ||
+			    time_after_eq(jiffies, mcast->delay_until))
+				/* Found the next unjoined group */
+				break;
+			else if (!delay_until ||
+				 time_before(mcast->delay_until, delay_until))
+				delay_until = mcast->delay_until;
 		}
 	}
 
@@ -653,6 +656,9 @@  void ipoib_mcast_join_task(struct work_struct *work)
 		mcast = NULL;
 		ipoib_dbg_mcast(priv, "successfully joined all "
 				"multicast groups\n");
+		if (delay_until)
+			queue_delayed_work(priv->wq, &priv->mcast_task,
+					   delay_until - jiffies);
 	}
 
 out:
@@ -766,6 +772,13 @@  void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 		memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
 		__ipoib_mcast_add(dev, mcast);
 		list_add_tail(&mcast->list, &priv->multicast_list);
+		/*
+		 * Make sure that if we are currently waiting for a backoff
+		 * delay to expire, that we cancel that, run immediately,
+		 * and then requeue our task to fire when the backoff
+		 * timer is over.
+		 */
+		cancel_delayed_work(&priv->mcast_task);
 		__ipoib_mcast_continue_join_thread(priv, NULL, 0);
 	}