From patchwork Thu Oct 19 20:21:59 2017
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10018285
From: Josef Bacik
To: axboe@kernel.dk, nbd@other.debian.org, linux-block@vger.kernel.org,
	kernel-team@fb.com
Cc: Josef Bacik, stable@vger.kernel.org
Subject: [PATCH 2/2] nbd: don't start req until after the dead connection logic
Date: Thu, 19 Oct 2017 16:21:59 -0400
Message-Id: <1508444519-8751-2-git-send-email-josef@toxicpanda.com>
X-Mailer: git-send-email 2.7.5
In-Reply-To: <1508444519-8751-1-git-send-email-josef@toxicpanda.com>
References: <1508444519-8751-1-git-send-email-josef@toxicpanda.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Josef Bacik

We can end up sleeping for a while waiting for the dead timeout, which
means the per-request timer could fire.  We did handle this case, but if
the dead timeout happened right after we submitted, we'd either tear down
the connection or possibly requeue as we're handling an error, and race
with the endio, which can lead to panics and other hilarity.

Fixes: 560bc4b39952 ("nbd: handle dead connections")
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik
---
 drivers/block/nbd.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index fd2f724462b6..528e6f6951cc 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -289,15 +289,6 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		cmd->status = BLK_STS_TIMEOUT;
 		return BLK_EH_HANDLED;
 	}
-
-	/* If we are waiting on our dead timer then we could get timeout
-	 * callbacks for our request. For this we just want to reset the timer
-	 * and let the queue side take care of everything.
-	 */
-	if (!completion_done(&cmd->send_complete)) {
-		nbd_config_put(nbd);
-		return BLK_EH_RESET_TIMER;
-	}
 	config = nbd->config;
 
 	if (config->num_connections > 1) {
@@ -732,6 +723,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -740,6 +732,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -763,6 +756,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 			 */
 			sock_shutdown(nbd);
 			nbd_config_put(nbd);
+			blk_mq_start_request(req);
 			return -EIO;
 		}
 		goto again;
@@ -773,6 +767,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	 * here so that it gets put _after_ the request that is already on the
 	 * dispatch list.
 	 */
+	blk_mq_start_request(req);
 	if (unlikely(nsock->pending && nsock->pending != req)) {
 		blk_mq_requeue_request(req, true);
 		ret = 0;
@@ -785,10 +780,10 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	ret = nbd_send_cmd(nbd, cmd, index);
 	if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Request send failed trying another connection\n");
+				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-		mutex_unlock(&nsock->tx_lock);
-		goto again;
+		blk_mq_requeue_request(req, true);
+		ret = 0;
 	}
 out:
 	mutex_unlock(&nsock->tx_lock);
@@ -812,7 +807,6 @@ static blk_status_t nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * done sending everything over the wire.
 	 */
 	init_completion(&cmd->send_complete);
-	blk_mq_start_request(bd->rq);
 
 	/* We can be called directly from the user space process, which means we
 	 * could possibly have signals pending so our sendmsg will fail. In
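
For readers who want to see the ordering problem from the commit message in
isolation, below is a small userspace model in plain C with pthreads. It is
not kernel code: struct fake_request, arm_timer(), TIMEOUT_SECS and
DEAD_WAIT_SECS are all invented for illustration. It only models the ordering
argument: arming the per-request timer before sleeping on the dead-connection
wait lets the timeout handler complete the request behind the submitter's
back, while arming it only at actual submission, as the patch does by moving
blk_mq_start_request(), closes that window.

/*
 * Toy userspace model of the race fixed by this patch.  "Old ordering"
 * arms the per-request timer before the long dead-connection wait;
 * "new ordering" arms it only when we are about to send.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

struct fake_request {
	pthread_mutex_t lock;
	bool timer_armed;	/* models blk_mq_start_request() */
	time_t deadline;	/* models the per-request timeout */
	bool completed;		/* models completion of the request */
};

#define TIMEOUT_SECS	1	/* per-request timeout (invented value) */
#define DEAD_WAIT_SECS	2	/* models waiting on the dead timeout */

static void arm_timer(struct fake_request *req)
{
	pthread_mutex_lock(&req->lock);
	req->timer_armed = true;
	req->deadline = time(NULL) + TIMEOUT_SECS;
	pthread_mutex_unlock(&req->lock);
}

/* Models the block layer timeout handler firing on an armed request. */
static void *timeout_thread(void *arg)
{
	struct fake_request *req = arg;

	for (int i = 0; i < DEAD_WAIT_SECS + 2; i++) {
		sleep(1);
		pthread_mutex_lock(&req->lock);
		if (req->timer_armed && !req->completed &&
		    time(NULL) >= req->deadline) {
			req->completed = true;
			printf("timeout handler completed the request\n");
		}
		pthread_mutex_unlock(&req->lock);
	}
	return NULL;
}

/* Models the submit path: wait out the dead connection, then send. */
static void submit(struct fake_request *req, bool start_before_wait)
{
	if (start_before_wait)
		arm_timer(req);		/* old ordering */

	sleep(DEAD_WAIT_SECS);		/* sleeping on the dead timeout */

	if (!start_before_wait)
		arm_timer(req);		/* new ordering: start at send time */

	pthread_mutex_lock(&req->lock);
	if (req->completed)
		printf("race: request already completed before send\n");
	else
		printf("request sent with no spurious timeout\n");
	req->completed = true;
	pthread_mutex_unlock(&req->lock);
}

int main(void)
{
	for (int old_order = 1; old_order >= 0; old_order--) {
		struct fake_request req = {
			.lock = PTHREAD_MUTEX_INITIALIZER,
		};
		pthread_t t;

		printf("%s ordering:\n", old_order ? "old" : "new");
		pthread_create(&t, NULL, timeout_thread, &req);
		submit(&req, old_order);
		pthread_join(t, NULL);
	}
	return 0;
}

Built with cc -pthread, the old ordering should print the "race" line and the
new ordering should not, which is the same reason the patch starts the request
only on the paths where it is actually handed to (or requeued by) the block
layer.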