[2/2] nbd: don't start req until after the dead connection logic

Message ID 1508444519-8751-2-git-send-email-josef@toxicpanda.com (mailing list archive)
State New, archived

Commit Message

Josef Bacik Oct. 19, 2017, 8:21 p.m. UTC
From: Josef Bacik <jbacik@fb.com>

We can end up sleeping for a while waiting for the dead timeout, which
means the per-request timer could fire.  We did handle this case, but if
the dead timeout happened right after we submitted, we'd either tear down
the connection or possibly requeue while already handling an error, racing
with the endio, which can lead to panics and other hilarity.

Fixes: 560bc4b39952 ("nbd: handle dead connections")
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)
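
The race is easier to see as a timeline. Below is a rough sketch of the
pre-patch ordering, written as a C comment; the flow is simplified from
drivers/block/nbd.c of this era and is not verbatim driver code:

/*
 * Pre-patch ordering (simplified sketch):
 *
 *   nbd_queue_rq()
 *     blk_mq_start_request(req)          // per-request timer armed here
 *     nbd_handle_cmd()
 *       // all connections dead: sleep until one comes back or the
 *       // dead timeout expires
 *       wait_event_interruptible(config->conn_wait, ...)
 *         --> the per-request timer can fire while we sleep, running
 *             nbd_xmit_timeout() against this request
 *       // on failure we tear down the connection or requeue, racing
 *       // with the timeout/endio handling of the same request
 *
 * Starting the request only after the dead-connection logic means the
 * per-request timer cannot fire while we are still deciding whether the
 * request can be sent at all.
 */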

Comments

Bart Van Assche May 17, 2018, 6:21 p.m. UTC | #1
On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> +	blk_mq_start_request(req);
>  	if (unlikely(nsock->pending && nsock->pending != req)) {
>  		blk_mq_requeue_request(req, true);
>  		ret = 0;

(replying to an e-mail from seven months ago)

Hello Josef,

Are you aware that the nbd driver is one of the very few block drivers that
calls blk_mq_requeue_request() after a request has been started? I think that
can lead the block layer core into undesired behavior, e.g. the timeout
handler firing concurrently with a request being reinserted. Can you or a
colleague have a look at this? I would like to add the following code to the
block layer core, and I think the nbd driver would trigger this warning:

 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
+       WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
        __blk_mq_requeue_request(rq);

Thanks,

Bart.
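
For readers following along: the snippet above references old_state without
showing where it comes from; presumably it is sampled from the blk-mq request
state just before the check. A minimal sketch of the intended change follows,
where the old_state line and the blk_mq_rq_state() helper are assumptions
based on the v4.16-era blk-mq internals, not part of the quoted diff:

void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
	/* Assumed context: sample the blk-mq state before requeueing. */
	enum mq_rq_state old_state = blk_mq_rq_state(rq);

	/* Requeueing a request that has not completed is suspect. */
	WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);

	__blk_mq_requeue_request(rq);

	/* ... remainder of the function unchanged ... */
}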
Josef Bacik May 17, 2018, 6:41 p.m. UTC | #2
On Thu, May 17, 2018 at 06:21:40PM +0000, Bart Van Assche wrote:
> On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> > +	blk_mq_start_request(req);
> >  	if (unlikely(nsock->pending && nsock->pending != req)) {
> >  		blk_mq_requeue_request(req, true);
> >  		ret = 0;
> 
> (replying to an e-mail from seven months ago)
> 
> Hello Josef,
> 
> Are you aware that the nbd driver is one of the very few block drivers that
> calls blk_mq_requeue_request() after a request has been started? I think that
> can lead the block layer core into undesired behavior, e.g. the timeout
> handler firing concurrently with a request being reinserted. Can you or a
> colleague have a look at this? I would like to add the following code to the
> block layer core, and I think the nbd driver would trigger this warning:
> 
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
>  {
> +       WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
> +
>         __blk_mq_requeue_request(rq);
> 

Yup I can tell you why, on 4.11 where I originally did this work
__blk_mq_requeue_request() did this

static void __blk_mq_requeue_request(struct request *rq)
{
        struct request_queue *q = rq->q;

        trace_block_rq_requeue(q, rq);
        wbt_requeue(q->rq_wb, &rq->issue_stat);
        blk_mq_sched_requeue_request(rq);

        if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
                if (q->dma_drain_size && blk_rq_bytes(rq))
                        rq->nr_phys_segments--;
        }
}

So it was clearing the started state when it did the requeue.  If that's not
what I'm supposed to be doing anymore then I can send a patch to fix it.  What
is supposed to be done if I've already called blk_mq_start_request()?  I can
avoid doing the start until after that chunk of code, but there's a part
further down that needs the request started before we reach it, so I'll have
to do whatever the special thing is now there.  Thanks,

Josef
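
To make the "part further down" concrete: in the patch at the bottom of this
page, each early error path gains its own blk_mq_start_request() call, and
the main start moves to just before the pending check that may requeue. A
condensed outline of nbd_handle_cmd() after the patch (abridged from the
diff below, not the complete function):

static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
{
	/* Early failures: start the request so erroring it out is valid. */
	if (!refcount_inc_not_zero(&nbd->config_refs)) {
		blk_mq_start_request(req);
		return -EINVAL;
	}

	/* ... possibly sleep waiting for a live connection ... */

	/* The "part further down": the request must be started before the
	 * paths below may requeue it. */
	blk_mq_start_request(req);
	if (unlikely(nsock->pending && nsock->pending != req)) {
		blk_mq_requeue_request(req, true);
		return 0;
	}

	/* ... send the command ... */
}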
Bart Van Assche May 17, 2018, 6:52 p.m. UTC | #3
On Thu, 2018-05-17 at 14:41 -0400, Josef Bacik wrote:
> Yup I can tell you why, on 4.11 where I originally did this work
> __blk_mq_requeue_request() did this
> 
> static void __blk_mq_requeue_request(struct request *rq)
> {
>         struct request_queue *q = rq->q;
> 
>         trace_block_rq_requeue(q, rq);
>         wbt_requeue(q->rq_wb, &rq->issue_stat);
>         blk_mq_sched_requeue_request(rq);
> 
>         if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
>                 if (q->dma_drain_size && blk_rq_bytes(rq))
>                         rq->nr_phys_segments--;
>         }
> }
> 
> So it was clearing the started state when it did the requeue.  If that's not
> what I'm supposed to be doing anymore then I can send a patch to fix it.  What
> is supposed to be done if I've already called blk_mq_start_request()?  I can
> avoid doing the start until after that chunk of code, but there's a part
> further down that needs the request started before we reach it, so I'll have
> to do whatever the special thing is now there.  Thanks,

Hello Josef,

After having had a closer look I think calling blk_mq_start_request() before
blk_mq_requeue_request() is fine after all. The v4.16 block layer core defers
timeout processing until after .queue_rq() has returned, so the timeout code
shouldn't see requests for which both blk_mq_start_request() and
blk_mq_requeue_request() are called from inside .queue_rq(). I will make sure
this behavior is preserved.

Bart.
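
The safety argument, in sketch form: the timeout scan only considers requests
that are still marked in flight. Roughly, based on a reading of the v4.16-era
block/blk-mq.c (simplified, not verbatim):

/*
 *   blk_mq_timeout_work()
 *     blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, ...)
 *       // only requests still in the in-flight state are timeout
 *       // candidates
 *
 * A request that is started and then requeued from inside .queue_rq() is
 * no longer in flight once the requeue runs, so the timeout work skips it
 * and nbd_xmit_timeout() does not race with the requeue.
 */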

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index fd2f724462b6..528e6f6951cc 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -289,15 +289,6 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		cmd->status = BLK_STS_TIMEOUT;
 		return BLK_EH_HANDLED;
 	}
-
-	/* If we are waiting on our dead timer then we could get timeout
-	 * callbacks for our request.  For this we just want to reset the timer
-	 * and let the queue side take care of everything.
-	 */
-	if (!completion_done(&cmd->send_complete)) {
-		nbd_config_put(nbd);
-		return BLK_EH_RESET_TIMER;
-	}
 	config = nbd->config;
 
 	if (config->num_connections > 1) {
@@ -732,6 +723,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -740,6 +732,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -763,6 +756,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 			 */
 			sock_shutdown(nbd);
 			nbd_config_put(nbd);
+			blk_mq_start_request(req);
 			return -EIO;
 		}
 		goto again;
@@ -773,6 +767,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	 * here so that it gets put _after_ the request that is already on the
 	 * dispatch list.
 	 */
+	blk_mq_start_request(req);
 	if (unlikely(nsock->pending && nsock->pending != req)) {
 		blk_mq_requeue_request(req, true);
 		ret = 0;
@@ -785,10 +780,10 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	ret = nbd_send_cmd(nbd, cmd, index);
 	if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Request send failed trying another connection\n");
+				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-		mutex_unlock(&nsock->tx_lock);
-		goto again;
+		blk_mq_requeue_request(req, true);
+		ret = 0;
 	}
 out:
 	mutex_unlock(&nsock->tx_lock);
@@ -812,7 +807,6 @@ static blk_status_t nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * done sending everything over the wire.
 	 */
 	init_completion(&cmd->send_complete);
-	blk_mq_start_request(bd->rq);
 
 	/* We can be called directly from the user space process, which means we
 	 * could possibly have signals pending so our sendmsg will fail.  In