From patchwork Mon Nov 6 21:11:58 2017
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10044513
From: Josef Bacik
To: axboe@kernel.dk, nbd@other.debian.org, linux-block@vger.kernel.org, kernel-team@fb.com
Cc: Josef Bacik, stable@vger.kernel.org
Subject: [PATCH 2/2][RESEND] nbd: don't start req until after the dead connection logic
Date: Mon, 6 Nov 2017 16:11:58 -0500
Message-Id: <1510002718-9574-2-git-send-email-josef@toxicpanda.com>
X-Mailer: git-send-email 2.7.5
In-Reply-To: <1510002718-9574-1-git-send-email-josef@toxicpanda.com>
References: <1510002718-9574-1-git-send-email-josef@toxicpanda.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Josef Bacik

We can end up sleeping for a while waiting for the dead timeout, which
means the per-request timer could fire. We did handle this case, but if
the dead timeout happened right after we submitted, we'd either tear
down the connection or possibly requeue while handling an error, and
race with the endio, which can lead to panics and other hilarity.

Fixes: 560bc4b39952 ("nbd: handle dead connections")
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik
---
 drivers/block/nbd.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index fdef8efcdabc..5f2a4240a204 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -288,15 +288,6 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		cmd->status = BLK_STS_TIMEOUT;
 		return BLK_EH_HANDLED;
 	}
-
-	/* If we are waiting on our dead timer then we could get timeout
-	 * callbacks for our request.  For this we just want to reset the timer
-	 * and let the queue side take care of everything.
-	 */
-	if (!completion_done(&cmd->send_complete)) {
-		nbd_config_put(nbd);
-		return BLK_EH_RESET_TIMER;
-	}
 	config = nbd->config;
 
 	if (config->num_connections > 1) {
@@ -740,6 +731,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -748,6 +740,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -771,6 +764,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		 */
 		sock_shutdown(nbd);
 		nbd_config_put(nbd);
+		blk_mq_start_request(req);
 		return -EIO;
 	}
 	goto again;
@@ -781,6 +775,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	 * here so that it gets put _after_ the request that is already on the
 	 * dispatch list.
 	 */
+	blk_mq_start_request(req);
 	if (unlikely(nsock->pending && nsock->pending != req)) {
 		blk_mq_requeue_request(req, true);
 		ret = 0;
@@ -793,10 +788,10 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	ret = nbd_send_cmd(nbd, cmd, index);
 	if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Request send failed trying another connection\n");
+				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-		mutex_unlock(&nsock->tx_lock);
-		goto again;
+		blk_mq_requeue_request(req, true);
+		ret = 0;
 	}
 out:
 	mutex_unlock(&nsock->tx_lock);
@@ -820,7 +815,6 @@ static blk_status_t nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * done sending everything over the wire.
 	 */
 	init_completion(&cmd->send_complete);
-	blk_mq_start_request(bd->rq);
 
 	/* We can be called directly from the user space process, which means we
 	 * could possibly have signals pending so our sendmsg will fail.  In