From patchwork Thu Apr 6 21:01:57 2017
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 9668413
From: Josef Bacik
To: axboe@kernel.dk, nbd-general@lists.sourceforge.net,
	linux-block@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 02/12] nbd: handle single path failures gracefully
Date: Thu, 6 Apr 2017 17:01:57 -0400
Message-Id: <1491512527-4286-3-git-send-email-jbacik@fb.com>
In-Reply-To: <1491512527-4286-1-git-send-email-jbacik@fb.com>
References: <1491512527-4286-1-git-send-email-jbacik@fb.com>
X-Mailing-List: linux-block@vger.kernel.org

Currently if we have multiple connections and one of them goes down we will
tear down the whole device. However, there is no reason to do this: other
connections may still be working fine. Deal with this by keeping track of the
state of the different connections; if we lose one, mark it as dead and send
all IO destined for that socket to one of the other healthy sockets. Any
outstanding requests that were on the dead socket will time out and be
re-submitted properly.
Signed-off-by: Josef Bacik
---
 drivers/block/nbd.c | 151 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 125 insertions(+), 26 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a4fd82e..4e64d60 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -49,6 +49,8 @@ struct nbd_sock {
 	struct mutex tx_lock;
 	struct request *pending;
 	int sent;
+	bool dead;
+	int fallback_index;
 };
 
 #define NBD_TIMEDOUT			0
@@ -82,6 +84,7 @@ struct nbd_device {
 
 struct nbd_cmd {
 	struct nbd_device *nbd;
+	int index;
 	struct completion send_complete;
 };
 
@@ -124,6 +127,15 @@ static const char *nbdcmd_to_ascii(int cmd)
 	return "invalid";
 }
 
+static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
+{
+	if (!nsock->dead)
+		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+	nsock->dead = true;
+	nsock->pending = NULL;
+	nsock->sent = 0;
+}
+
 static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
 {
 	if (bdev->bd_openers <= 1)
@@ -191,7 +203,31 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 	struct nbd_device *nbd = cmd->nbd;
 
-	dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
+	if (nbd->num_connections > 1) {
+		dev_err_ratelimited(nbd_to_dev(nbd),
+				    "Connection timed out, retrying\n");
+		mutex_lock(&nbd->config_lock);
+		/*
+		 * Hooray we have more connections, requeue this IO, the submit
+		 * path will put it on a real connection.
+		 */
+		if (nbd->socks && nbd->num_connections > 1) {
+			if (cmd->index < nbd->num_connections) {
+				struct nbd_sock *nsock =
+					nbd->socks[cmd->index];
+				mutex_lock(&nsock->tx_lock);
+				nbd_mark_nsock_dead(nsock);
+				mutex_unlock(&nsock->tx_lock);
+			}
+			mutex_unlock(&nbd->config_lock);
+			blk_mq_requeue_request(req, true);
+			return BLK_EH_NOT_HANDLED;
+		}
+		mutex_unlock(&nbd->config_lock);
+	} else {
+		dev_err_ratelimited(nbd_to_dev(nbd),
+				    "Connection timed out\n");
+	}
 	set_bit(NBD_TIMEDOUT, &nbd->runtime_flags);
 	req->errors = -EIO;
 
@@ -301,6 +337,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		}
 		iov_iter_advance(&from, sent);
 	}
+	cmd->index = index;
 	request.type = htonl(type);
 	if (type != NBD_CMD_FLUSH) {
 		request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
@@ -328,7 +365,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		}
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 			"Send control failed (result %d)\n", result);
-		return -EIO;
+		return -EAGAIN;
 	}
 send_pages:
 	if (type != NBD_CMD_WRITE)
@@ -370,7 +407,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 				dev_err(disk_to_dev(nbd->disk),
 					"Send data failed (result %d)\n",
 					result);
-				return -EIO;
+				return -EAGAIN;
 			}
 			/*
 			 * The completion might already have come in,
@@ -389,6 +426,12 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
+static int nbd_disconnected(struct nbd_device *nbd)
+{
+	return test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) ||
+		test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags);
+}
+
 /* NULL returned = something went wrong, inform userspace */
 static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
@@ -405,8 +448,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
 	if (result <= 0) {
-		if (!test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) &&
-		    !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
+		if (!nbd_disconnected(nbd))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
 		return ERR_PTR(result);
@@ -449,8 +491,19 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		if (result <= 0) {
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive data failed (result %d)\n", result);
-			req->errors = -EIO;
-			return cmd;
+			/*
+			 * If we've disconnected or we only have 1
+			 * connection then we need to make sure we
+			 * complete this request, otherwise error out
+			 * and let the timeout stuff handle resubmitting
+			 * this request onto another connection.
+			 */
+			if (nbd_disconnected(nbd) ||
+			    nbd->num_connections <= 1) {
+				req->errors = -EIO;
+				return cmd;
+			}
+			return ERR_PTR(-EIO);
 		}
 		dev_dbg(nbd_to_dev(nbd), "request %p: got %d bytes data\n",
 			cmd, bvec.bv_len);
@@ -495,19 +548,17 @@ static void recv_work(struct work_struct *work)
 	while (1) {
 		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
+			struct nbd_sock *nsock = nbd->socks[args->index];
+
+			mutex_lock(&nsock->tx_lock);
+			nbd_mark_nsock_dead(nsock);
+			mutex_unlock(&nsock->tx_lock);
 			ret = PTR_ERR(cmd);
 			break;
 		}
 
 		nbd_end_request(cmd);
 	}
-
-	/*
-	 * We got an error, shut everybody down if this wasn't the result of a
-	 * disconnect request.
-	 */
-	if (ret && !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
-		sock_shutdown(nbd);
 	atomic_dec(&nbd->recv_threads);
 	wake_up(&nbd->recv_wq);
 }
@@ -531,6 +582,47 @@ static void nbd_clear_que(struct nbd_device *nbd)
 	dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }
 
+static int find_fallback(struct nbd_device *nbd, int index)
+{
+	int new_index = -1;
+	struct nbd_sock *nsock = nbd->socks[index];
+	int fallback = nsock->fallback_index;
+
+	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
+		return new_index;
+
+	if (nbd->num_connections <= 1) {
+		dev_err_ratelimited(disk_to_dev(nbd->disk),
+				    "Attempted send on invalid socket\n");
+		return new_index;
+	}
+
+	if (fallback >= 0 && fallback < nbd->num_connections &&
+	    !nbd->socks[fallback]->dead)
+		return fallback;
+
+	if (nsock->fallback_index < 0 ||
+	    nsock->fallback_index >= nbd->num_connections ||
+	    nbd->socks[nsock->fallback_index]->dead) {
+		int i;
+		for (i = 0; i < nbd->num_connections; i++) {
+			if (i == index)
+				continue;
+			if (!nbd->socks[i]->dead) {
+				new_index = i;
+				break;
+			}
+		}
+		nsock->fallback_index = new_index;
+		if (new_index < 0) {
+			dev_err_ratelimited(disk_to_dev(nbd->disk),
+					    "Dead connection, failed to find a fallback\n");
+			return new_index;
+		}
+	}
+	new_index = nsock->fallback_index;
+	return new_index;
+}
+
 static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 {
 	struct nbd_device *nbd = cmd->nbd;
@@ -544,22 +636,16 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 				    "Attempted send on invalid socket\n");
 		return -EINVAL;
 	}
-
-	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags)) {
-		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Attempted send on closed socket\n");
-		return -EINVAL;
-	}
-
 	req->errors = 0;
-
+again:
 	nsock = nbd->socks[index];
 	mutex_lock(&nsock->tx_lock);
-	if (unlikely(!nsock->sock)) {
+	if (nsock->dead) {
+		index = find_fallback(nbd, index);
 		mutex_unlock(&nsock->tx_lock);
-		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Attempted send on closed socket\n");
-		return -EINVAL;
+		if (index < 0)
+			return -EIO;
+		goto again;
 	}
 
 	/* Handle the case that we have a pending request that was partially
@@ -572,7 +658,18 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		ret = 0;
 		goto out;
 	}
+	/*
+	 * Some failures are related to the link going down, so anything that
+	 * returns EAGAIN can be retried on a different socket.
+	 */
 	ret = nbd_send_cmd(nbd, cmd, index);
+	if (ret == -EAGAIN) {
+		dev_err_ratelimited(disk_to_dev(nbd->disk),
+				    "Request send failed trying another connection\n");
+		nbd_mark_nsock_dead(nsock);
+		mutex_unlock(&nsock->tx_lock);
+		goto again;
+	}
 out:
 	mutex_unlock(&nsock->tx_lock);
 	return ret;
@@ -646,6 +743,8 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 	nbd->socks = socks;
 
+	nsock->fallback_index = -1;
+	nsock->dead = false;
 	mutex_init(&nsock->tx_lock);
 	nsock->sock = sock;
 	nsock->pending = NULL;