From patchwork Thu Oct 27 14:05:28 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022201
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Chris Horn, Lustre Development List
Date: Thu, 27 Oct 2022 10:05:28 -0400
Message-Id: <1666879542-10737-2-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 01/15] lnet: o2iblnd: Avoid NULL md deref
List-Id: "For discussing Lustre software development."

From: Chris Horn

struct lnet_msg::msg_md is NULL when a router is forwarding a REPLY.
ko2iblnd attempts to access this pointer on the receive path, which
causes a panic.

Fixes: 4c95c6b6c7 ("lnet: Replace msg_rdma_force with a new md_flag LNET_MD_FLAG_GPU.")
HPE-bug-id: LUS-11269
WC-bug-id: https://jira.whamcloud.com/browse/LU-16211
Lustre-commit: f792297212387c2ff ("LU-16211 o2iblnd: Avoid NULL md deref")
Signed-off-by: Chris Horn
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48777
Reviewed-by: Serguei Smirnov
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 6f04096..3e3be065 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1738,7 +1738,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct bio_vec *kiov = lntmsg->msg_kiov;
 	unsigned int offset = lntmsg->msg_offset;
 	unsigned int nob = lntmsg->msg_len;
-	struct lnet_libmd *payload_md = lntmsg->msg_md;
+	struct lnet_libmd *msg_md = lntmsg->msg_md;
 	struct kib_tx *tx;
 	int rc;

@@ -1749,7 +1749,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		goto failed_0;
 	}

-	tx->tx_gpu = !!(payload_md->md_flags & LNET_MD_FLAG_GPU);
+	tx->tx_gpu = msg_md ?
+		     (msg_md->md_flags & LNET_MD_FLAG_GPU) : 0;

 	if (!nob)
 		rc = 0;
 	else
@@ -1847,7 +1847,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case IBLND_MSG_PUT_REQ: {
 		struct kib_msg *txmsg;
 		struct kib_rdma_desc *rd;
-		struct lnet_libmd *payload_md = lntmsg->msg_md;
+		struct lnet_libmd *msg_md = lntmsg->msg_md;

 		ibprm_cookie = rxmsg->ibm_u.putreq.ibprm_cookie;

@@ -1867,7 +1867,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 			break;
 		}

-		tx->tx_gpu = !!(payload_md->md_flags & LNET_MD_FLAG_GPU);
+		tx->tx_gpu = msg_md ? (msg_md->md_flags & LNET_MD_FLAG_GPU) : 0;
+
 		txmsg = tx->tx_msg;
 		rd = &txmsg->ibm_u.putack.ibpam_rd;
 		rc = kiblnd_setup_rd_kiov(ni, tx, rd,

From patchwork Thu Oct 27 14:05:29 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022199
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Thu, 27 Oct 2022 10:05:29 -0400
Message-Id: <1666879542-10737-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 02/15] lnet: support IPv6 in lnet_inet_enumerate()
List-Id: "For discussing Lustre software development."

From: Mr NeilBrown

lnet_inet_enumerate() can now optionally report IPv6 addresses on
interfaces. We use this in socklnd to determine the address of the
interface.

Unlike IPv4, different IPv6 addresses associated with a single
interface cannot be given different labels (e.g. eth0:2). This means
that lnet_inet_enumerate() must report the same name for each address.
For now, we only report the first non-temporary address to avoid any
confusion.

The network mask provided with IPv4 is only used for reporting
information for an ioctl. It isn't clear this will be useful for IPv6,
so no netmask is collected.

To save a bit of space in struct lnet_inetdev{}, which must now hold a
16-byte address, we replace the 4-byte flags field with a 1-byte bool,
as only the IFF_MASTER flag is ever of interest. Another bool is
needed to report whether the address is IPv6.
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 781499eee645a635d ("LU-10391 lnet: support IPv6 in lnet_inet_enumerate()")
Signed-off-by: Mr NeilBrown
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48572
Reviewed-by: James Simmons
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 include/linux/lnet/lib-lnet.h    | 15 +++++---
 net/lnet/klnds/o2iblnd/o2iblnd.c |  4 +--
 net/lnet/klnds/socklnd/socklnd.c | 39 +++++++++++++-------
 net/lnet/lnet/config.c           | 77 ++++++++++++++++++++++++++++++++++------
 4 files changed, 105 insertions(+), 30 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index eb48d29..bd4acef 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -823,14 +823,21 @@ void lnet_connect_console_error(int rc, struct lnet_nid *peer_nid,
 struct lnet_inetdev {
 	u32 li_cpt;
-	u32 li_flags;
-	u32 li_ipaddr;
-	u32 li_netmask;
+	union {
+		struct {
+			u32 li_ipaddr;
+			u32 li_netmask;
+		};
+		u32 li_ipv6addr[4];
+	};
 	u32 li_index;
+	bool li_iff_master;
+	bool li_ipv6;
 	char li_name[IFNAMSIZ];
 };

-int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns);
+int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns,
+			bool v6);
 void lnet_sock_setbuf(struct socket *socket, int txbufsize, int rxbufsize);
 void lnet_sock_getbuf(struct socket *socket, int *txbufsize, int *rxbufsize);
 int lnet_sock_getaddr(struct socket *socket, bool remote,
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index d5ca1a3..14dd686 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -3034,7 +3034,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
 		goto failed;
 	}

-	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns);
+	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns, false);
 	if (rc < 0)
 		goto failed;

@@ -3062,7 +3062,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	ibdev->ibd_ifip = ifaces[i].li_ipaddr;
 	strlcpy(ibdev->ibd_ifname, ifaces[i].li_name, sizeof(ibdev->ibd_ifname));
-	ibdev->ibd_can_failover = !!(ifaces[i].li_flags & IFF_MASTER);
+	ibdev->ibd_can_failover = ifaces[i].li_iff_master;

 	INIT_LIST_HEAD(&ibdev->ibd_nets);
 	INIT_LIST_HEAD(&ibdev->ibd_list); /* not yet in kib_devs */
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 00e33c8..8d3c0d6 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -1744,11 +1744,13 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_processid *id)
 		iface = &net->ksnn_interface;
 		sa = (void *)&iface->ksni_addr;
-		if (sa->sin_family == AF_INET)
+		if (sa->sin_family == AF_INET) {
 			data->ioc_u32[0] = ntohl(sa->sin_addr.s_addr);
-		else
+			data->ioc_u32[1] = iface->ksni_netmask;
+		} else {
 			data->ioc_u32[0] = 0xFFFFFFFF;
-		data->ioc_u32[1] = iface->ksni_netmask;
+			data->ioc_u32[1] = 0;
+		}
 		data->ioc_u32[2] = iface->ksni_npeers;
 		data->ioc_u32[3] = iface->ksni_nroutes;
 	}
@@ -2443,7 +2445,6 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused,
 	struct ksock_net *net;
 	struct ksock_interface *ksi = NULL;
 	struct lnet_inetdev *ifaces = NULL;
-	struct sockaddr_in *sa;
 	int i = 0;
 	int rc;

@@ -2464,7 +2465,7 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused,
 	ksocknal_tunables_setup(ni);

-	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns);
+	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns, true);
 	if (rc < 0)
 		goto fail_1;

@@ -2485,11 +2486,26 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused,
 	ni->ni_dev_cpt = ifaces[i].li_cpt;
 	ksi->ksni_index = ifaces[i].li_index;
-	sa = (void *)&ksi->ksni_addr;
-	memset(sa, 0, sizeof(*sa));
-	sa->sin_family = AF_INET;
-	sa->sin_addr.s_addr = htonl(ifaces[i].li_ipaddr);
-	ksi->ksni_netmask = ifaces[i].li_netmask;
+	if (ifaces[i].li_ipv6) {
+		struct sockaddr_in6 *sa;
+
+		sa = (void *)&ksi->ksni_addr;
+		memset(sa, 0, sizeof(*sa));
+		sa->sin6_family = AF_INET6;
+		memcpy(&sa->sin6_addr, ifaces[i].li_ipv6addr,
+		       sizeof(struct in6_addr));
+		ni->ni_nid.nid_size = sizeof(struct in6_addr) - 4;
+		memcpy(&ni->ni_nid.nid_addr, ifaces[i].li_ipv6addr,
+		       sizeof(struct in6_addr));
+	} else {
+		struct sockaddr_in *sa;
+
+		sa = (void *)&ksi->ksni_addr;
+		memset(sa, 0, sizeof(*sa));
+		sa->sin_family = AF_INET;
+		sa->sin_addr.s_addr = htonl(ifaces[i].li_ipaddr);
+		ksi->ksni_netmask = ifaces[i].li_netmask;
+		ni->ni_nid.nid_size = 4 - 4;
+		ni->ni_nid.nid_addr[0] = sa->sin_addr.s_addr;
+	}
 	strlcpy(ksi->ksni_name, ifaces[i].li_name, sizeof(ksi->ksni_name));

 	/* call it before add it to ksocknal_data.ksnd_nets */
@@ -2497,9 +2513,6 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused,
 	if (rc)
 		goto fail_1;

-	LASSERT(ksi);
-	LASSERT(ksi->ksni_addr.ss_family == AF_INET);
-	ni->ni_nid.nid_addr[0] = ((struct sockaddr_in *)&ksi->ksni_addr)->sin_addr.s_addr;
 	list_add(&net->ksnn_list, &ksocknal_data.ksnd_nets);
 	net->ksnn_ni = ni;
 	ksocknal_data.ksnd_nnets++;
diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c
index 083a9a2..cebc725 100644
--- a/net/lnet/lnet/config.c
+++ b/net/lnet/lnet/config.c
@@ -31,11 +31,12 @@
  */

 #define DEBUG_SUBSYSTEM S_LNET

+#include
+#include
 #include
 #include
-#include
 #include
-#include
+#include

 struct lnet_text_buf {	/* tmp struct for parsing routes */
 	struct list_head ltb_list;	/* stash on lists */
@@ -1488,7 +1489,7 @@ struct lnet_ni *
 	return count;
 }

-int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
+int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6)
 {
 	struct lnet_inetdev *ifaces = NULL;
 	struct net_device *dev;
@@ -1500,6 +1501,8 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
 		int flags = dev_get_flags(dev);
 		const struct in_ifaddr *ifa;
 		struct in_device *in_dev;
+		struct inet6_dev *in6_dev;
+		const struct inet6_ifaddr *ifa6;
 		int node_id;
 		int cpt;

@@ -1511,15 +1514,18 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
 			      dev->name);
 			continue;
 		}
+
+		node_id = dev_to_node(&dev->dev);
+		cpt = cfs_cpt_of_node(lnet_cpt_table(), node_id);
+
 		in_dev = __in_dev_get_rtnl(dev);
 		if (!in_dev) {
-			CWARN("lnet: Interface %s has no IPv4 status.\n",
-			      dev->name);
-			continue;
+			if (!v6)
+				CWARN("lnet: Interface %s has no IPv4 status.\n",
+				      dev->name);
+			goto try_v6;
 		}

-		node_id = dev_to_node(&dev->dev);
-		cpt = cfs_cpt_of_node(lnet_cpt_table(), node_id);
-
 		in_dev_for_each_ifa_rtnl(ifa, in_dev) {
 			if (nip >= nalloc) {
 				struct lnet_inetdev *tmp;
@@ -1537,7 +1543,8 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
 			}

 			ifaces[nip].li_cpt = cpt;
-			ifaces[nip].li_flags = flags;
+			ifaces[nip].li_iff_master = !!(flags & IFF_MASTER);
+			ifaces[nip].li_ipv6 = false;
 			ifaces[nip].li_index = dev->ifindex;
 			ifaces[nip].li_ipaddr = ntohl(ifa->ifa_local);
 			ifaces[nip].li_netmask = ntohl(ifa->ifa_mask);
 			strlcpy(ifaces[nip].li_name, dev->name,
				sizeof(ifaces[nip].li_name));
 			nip++;
 		}
@@ -1545,6 +1552,53 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
+try_v6:
+		if (!v6)
+			continue;
+#if IS_ENABLED(CONFIG_IPV6)
+		in6_dev = __in6_dev_get(dev);
+		if (!in6_dev) {
+			if (!in_dev)
+				CWARN("lnet: Interface %s has no IP status.\n",
+				      dev->name);
+			continue;
+		}
+
+		list_for_each_entry_rcu(ifa6, &in6_dev->addr_list, if_list) {
+			if (ifa6->flags & IFA_F_TEMPORARY)
+				continue;
+			if (nip >= nalloc) {
+				struct lnet_inetdev *tmp;
+
+				nalloc += LNET_INTERFACES_NUM;
+				tmp = krealloc(ifaces, nalloc * sizeof(*tmp),
+					       GFP_KERNEL);
+				if (!tmp) {
+					kfree(ifaces);
+					ifaces = NULL;
+					nip = -ENOMEM;
+					goto unlock_rtnl;
+				}
+				ifaces = tmp;
+			}
+
+			ifaces[nip].li_cpt = cpt;
+			ifaces[nip].li_iff_master = !!(flags & IFF_MASTER);
+			ifaces[nip].li_ipv6 = true;
+			ifaces[nip].li_index = dev->ifindex;
+			memcpy(ifaces[nip].li_ipv6addr,
+			       &ifa6->addr, sizeof(struct in6_addr));
+			strlcpy(ifaces[nip].li_name, dev->name,
+				sizeof(ifaces[nip].li_name));
+			nip++;
+			/* As different IPv6 addresses don't have unique
+			 * labels, it is safest just to use the first
+			 * and ignore the rest.
+			 */
+			break;
+		}
+#endif /* IS_ENABLED(CONFIG_IPV6) */
 	}
 unlock_rtnl:
 	rtnl_unlock();
@@ -1569,9 +1623,10 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns)
 	int i;

 	if (current->nsproxy && current->nsproxy->net_ns)
-		nip = lnet_inet_enumerate(&ifaces, current->nsproxy->net_ns);
+		nip = lnet_inet_enumerate(&ifaces, current->nsproxy->net_ns,
+					  false);
 	else
-		nip = lnet_inet_enumerate(&ifaces, &init_net);
+		nip = lnet_inet_enumerate(&ifaces, &init_net, false);
 	if (nip < 0) {
 		if (nip != -ENOENT) {
 			LCONSOLE_ERROR_MSG(0x117,

From patchwork Thu Oct 27 14:05:30 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022198
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Thu, 27 Oct 2022 10:05:30 -0400
Message-Id: <1666879542-10737-4-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 03/15] lustre: sec: retry ro mount if read-only flag set
List-Id: "For discussing Lustre software development."

From: Sebastien Buisson

In case a client mount fails with -EROFS because the read-only nodemap
flag is set and the ro mount option is not specified, just retry the
ro mount internally. This avoids the need for users to manually retry
the mount with the ro option.
WC-bug-id: https://jira.whamcloud.com/browse/LU-15451
Lustre-commit: 56b5b5be43d88e604 ("LU-15451 sec: retry ro mount if read-only flag set")
Signed-off-by: Sebastien Buisson
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47490
Reviewed-by: Andreas Dilger
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/fld/fld_request.c       |  1 +
 fs/lustre/include/lu_object.h     |  1 +
 fs/lustre/ldlm/ldlm_lib.c         | 13 +++++++++++--
 fs/lustre/llite/llite_lib.c       | 21 ++++++++++++++++++---
 fs/lustre/lmv/lmv_obd.c           | 23 ++++++++++++++++-------
 fs/lustre/obdclass/lu_tgt_descs.c |  5 +++--
 6 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/fs/lustre/fld/fld_request.c b/fs/lustre/fld/fld_request.c
index b365dc2..bafd5a9 100644
--- a/fs/lustre/fld/fld_request.c
+++ b/fs/lustre/fld/fld_request.c
@@ -224,6 +224,7 @@ static void fld_client_debugfs_init(struct lu_client_fld *fld)
 	ldebugfs_add_vars(fld->lcf_debugfs_entry, fld_client_debugfs_list, fld);
 }
+EXPORT_SYMBOL(fld_client_del_target);

 void fld_client_debugfs_fini(struct lu_client_fld *fld)
 {
diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index 5c7f439..4e101fa 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1592,6 +1592,7 @@ struct lu_tgt_descs {
 u64 lu_prandom_u64_max(u64 ep_ro);
 int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd);
+int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd);
 void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt);
 int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt);
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index 804bb9c..08aff4f 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -593,6 +593,16 @@ int client_connect_import(const struct lu_env *env,
 	LASSERT(obd->obd_namespace);

+	spin_lock(&imp->imp_lock);
+	if (imp->imp_state == LUSTRE_IMP_CLOSED && imp->imp_deactive) {
+		/* need to reactivate import if trying to connect
+		 * to a previously disconnected import
+		 */
+		imp->imp_deactive = 0;
+		imp->imp_invalid = 0;
+	}
+	spin_unlock(&imp->imp_lock);
+
 	imp->imp_dlm_handle = conn;
 	rc = ptlrpc_init_import(imp);
 	if (rc != 0)
@@ -631,8 +641,7 @@ int client_connect_import(const struct lu_env *env,
 out_sem:
 	up_write(&cli->cl_sem);

-	if (!rc && localdata) {
-		LASSERT(!cli->cl_cache); /* only once */
+	if (!rc && localdata && !cli->cl_cache) {
 		cli->cl_cache = (struct cl_client_cache *)localdata;
 		cl_cache_incref(cli->cl_cache);
 		cli->cl_lru_left = &cli->cl_cache->ccc_lru_left;
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 191a83c..55a9202 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -363,6 +363,9 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	data->ocd_brw_size = MD_MAX_BRW_SIZE;

+retry_connect:
+	if (sb_rdonly(sb))
+		data->ocd_connect_flags |= OBD_CONNECT_RDONLY;
 	err = obd_connect(NULL, &sbi->ll_md_exp, sbi->ll_md_obd,
 			  &sbi->ll_sb_uuid, data, sbi->ll_cache);
 	if (err == -EBUSY) {
@@ -405,8 +408,20 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	err = obd_statfs(NULL, sbi->ll_md_exp, osfs,
 			 ktime_get_seconds() - sbi->ll_statfs_max_age,
 			 OBD_STATFS_FOR_MDT0);
-	if (err)
+	if (err == -EROFS && !sb_rdonly(sb)) {
+		/* We got -EROFS from the server, maybe it is imposing
+		 * read-only mount. So just retry like this.
+		 */
+		LCONSOLE_INFO("Forcing read-only mount.\n");
+		CERROR("%s: mount failed with %d, forcing read-only mount.\n",
+		       sbi->ll_md_exp->exp_obd->obd_name, err);
+		sb->s_flags |= SB_RDONLY;
+		obd_fid_fini(sbi->ll_md_exp->exp_obd);
+		obd_disconnect(sbi->ll_md_exp);
+		goto retry_connect;
+	} else if (err) {
 		goto out_md_fid;
+	}

 	/* This needs to be after statfs to ensure connect has finished.
 	 * Note that "data" does NOT contain the valid connect reply.
@@ -1329,8 +1344,8 @@ int ll_fill_super(struct super_block *sb)
 	if (err)
 		ll_put_super(sb);
 	else if (test_bit(LL_SBI_VERBOSE, sbi->ll_flags))
-		LCONSOLE_WARN("Mounted %s\n", profilenm);
-
+		LCONSOLE_WARN("Mounted %s%s\n", profilenm,
+			      sb_rdonly(sb) ? " read-only" : "");
 	return err;
 } /* ll_fill_super */
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 84d583e..3a02cc1 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -494,24 +494,33 @@ static int lmv_disconnect_mdc(struct obd_device *obd, struct lmv_tgt_desc *tgt)
 		       mdc_obd->obd_name);
 	}

+	rc = lu_qos_del_tgt(&lmv->lmv_qos, tgt);
+	if (rc)
+		CERROR("%s: Can't del target from QoS table: rc = %d\n",
+		       tgt->ltd_exp->exp_obd->obd_name, rc);
+
+	rc = fld_client_del_target(&lmv->lmv_fld, tgt->ltd_index);
+	if (rc)
+		CERROR("%s: Can't del fld targets: rc = %d\n",
+		       tgt->ltd_exp->exp_obd->obd_name, rc);
+
 	rc = obd_fid_fini(tgt->ltd_exp->exp_obd);
 	if (rc)
-		CERROR("Can't finalize fids factory\n");
+		CERROR("%s: Can't finalize fids factory: rc = %d\n",
+		       tgt->ltd_exp->exp_obd->obd_name, rc);

 	CDEBUG(D_INFO, "Disconnected from %s(%s) successfully\n",
 	       tgt->ltd_exp->exp_obd->obd_name,
 	       tgt->ltd_exp->exp_obd->obd_uuid.uuid);

+	lmv_activate_target(lmv, tgt, 0);
 	obd_register_observer(tgt->ltd_exp->exp_obd, NULL);
 	rc = obd_disconnect(tgt->ltd_exp);
 	if (rc) {
-		if (tgt->ltd_active) {
-			CERROR("Target %s disconnect error %d\n",
-			       tgt->ltd_uuid.uuid, rc);
-		}
+		CERROR("%s: Target %s disconnect error: rc = %d\n",
+		       tgt->ltd_exp->exp_obd->obd_name,
+		       tgt->ltd_uuid.uuid, rc);
 	}
-
-	lmv_activate_target(lmv, tgt, 0);
 	tgt->ltd_exp = NULL;

 	return 0;
 }
diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index 51d2e21..7394789 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -170,7 +170,7 @@ int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *tgt)
  * Return:	0 on success
  *		-ENOENT if no server was found
  */
-static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
+int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
 {
 	struct lu_svr_qos *svr;
 	int rc = 0;
@@ -182,12 +182,12 @@ static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
 		goto out;
 	}

+	ltd->ltd_qos.ltq_svr = NULL;
 	svr->lsq_tgt_count--;
 	if (svr->lsq_tgt_count == 0) {
 		CDEBUG(D_OTHER, "removing server %s\n",
 		       obd_uuid2str(&svr->lsq_uuid));
 		list_del(&svr->lsq_svr_list);
-		ltd->ltd_qos.ltq_svr = NULL;
 		kfree(svr);
 	}
@@ -196,6 +196,7 @@ static int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd)
 	up_write(&qos->lq_rw_sem);

 	return rc;
 }
+EXPORT_SYMBOL(lu_qos_del_tgt);

 static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt)
 {

From patchwork Thu Oct 27 14:05:31 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022200
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Thu, 27 Oct 2022 10:05:31 -0400
Message-Id: <1666879542-10737-5-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 04/15] lustre: ptlrpc: reduce lock contention in ptlrpc_free_committed
List-Id: "For discussing Lustre software development."

From: Andreas Dilger

This patch breaks out of the loop in ptlrpc_free_committed() if
need_resched() is true or there are other threads waiting on the
imp_lock. This avoids a thread holding the CPU for too long while
freeing a large number of requests. The remaining requests in the
list will be processed the next time this function is called. That
also avoids delaying any single thread for too long when the list is
long.
WC-bug-id: https://jira.whamcloud.com/browse/LU-16180
Lustre-commit: d3074511f3ee322d ("LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed")
Signed-off-by: Andreas Dilger
Signed-off-by: Jian Yu
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48629
Reviewed-by: Arshad Hussain
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_import.h       |  6 ++
 fs/lustre/obdclass/genops.c             |  1 +
 fs/lustre/ptlrpc/client.c               | 99 ++++++++++++++++++++++++++++-----
 include/uapi/linux/lustre/lustre_user.h |  2 +-
 4 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h
index 8c1fe65..3ae05b5 100644
--- a/fs/lustre/include/lustre_import.h
+++ b/fs/lustre/include/lustre_import.h
@@ -279,6 +279,12 @@ struct obd_import {
 	/** Protects flags, level, generation, conn_cnt, *_list */
 	spinlock_t imp_lock;

+	/**
+	 * A "sentinel" value used to check if there are other threads
+	 * waiting on the imp_lock.
+	 */
+	atomic_t imp_waiting;
+
 	/* flags */
 	unsigned long imp_invalid:1,	/* evicted */
 		/* administratively disabled */
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 81e3498..2031320 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -997,6 +997,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
 	atomic_set(&imp->imp_replay_inflight, 0);
 	init_waitqueue_head(&imp->imp_replay_waitq);
 	atomic_set(&imp->imp_inval_count, 0);
+	atomic_set(&imp->imp_waiting, 0);
 	INIT_LIST_HEAD(&imp->imp_conn_list);
 	init_imp_at(&imp->imp_at);
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 5f0ff47..6c1d98d 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -1507,7 +1507,15 @@ static int after_reply(struct ptlrpc_request *req)
 	}

 	if (imp->imp_replayable) {
+		/* if other threads are waiting for ptlrpc_free_committed()
+		 * they could continue the work of freeing RPCs. That reduces
+		 * lock hold times, and distributes work more fairly across
+		 * waiting threads. We can't use spin_is_contended() since
+		 * there are many other places where imp_lock is held.
+		 */
+		atomic_inc(&imp->imp_waiting);
 		spin_lock(&imp->imp_lock);
+		atomic_dec(&imp->imp_waiting);
 		/*
 		 * No point in adding already-committed requests to the replay
 		 * list, we will just remove them immediately. b=9829
@@ -1528,7 +1536,9 @@ static int after_reply(struct ptlrpc_request *req)
 			 */
 			spin_unlock(&imp->imp_lock);
 			req->rq_commit_cb(req);
+			atomic_inc(&imp->imp_waiting);
 			spin_lock(&imp->imp_lock);
+			atomic_dec(&imp->imp_waiting);
 		}

 		/* Replay-enabled imports return commit-status information. */
@@ -2754,25 +2764,33 @@ void ptlrpc_free_committed(struct obd_import *imp)
 	struct ptlrpc_request *req, *saved;
 	struct ptlrpc_request *last_req = NULL; /* temporary fire escape */
 	bool skip_committed_list = true;
+	unsigned int replay_scanned = 0, replay_freed = 0;
+	unsigned int commit_scanned = 0, commit_freed = 0;
+	unsigned int debug_level = D_INFO;
+	u64 peer_committed_transno;
+	int imp_generation;
+	time64_t start, now;

 	assert_spin_locked(&imp->imp_lock);

-	if (imp->imp_peer_committed_transno == imp->imp_last_transno_checked &&
-	    imp->imp_generation == imp->imp_last_generation_checked) {
+	start = ktime_get_seconds();
+	/* save these here, we can potentially drop imp_lock after checking */
+	peer_committed_transno = imp->imp_peer_committed_transno;
+	imp_generation = imp->imp_generation;
+
+	if (peer_committed_transno == imp->imp_last_transno_checked &&
+	    imp_generation == imp->imp_last_generation_checked) {
 		CDEBUG(D_INFO, "%s: skip recheck: last_committed %llu\n",
-		       imp->imp_obd->obd_name, imp->imp_peer_committed_transno);
+		       imp->imp_obd->obd_name, peer_committed_transno);
 		return;
 	}

 	CDEBUG(D_RPCTRACE, "%s: committing for last_committed %llu gen %d\n",
-	       imp->imp_obd->obd_name, imp->imp_peer_committed_transno,
-	       imp->imp_generation);
+	       imp->imp_obd->obd_name, peer_committed_transno, imp_generation);

-	if (imp->imp_generation != imp->imp_last_generation_checked ||
+	if (imp_generation != imp->imp_last_generation_checked ||
 	    !imp->imp_last_transno_checked)
 		skip_committed_list = false;
-
-	imp->imp_last_transno_checked = imp->imp_peer_committed_transno;
-	imp->imp_last_generation_checked = imp->imp_generation;
+	/* maybe drop imp_lock here, if another lock protected the lists */

 	list_for_each_entry_safe(req, saved, &imp->imp_replay_list,
 				 rq_replay_list) {
@@ -2784,7 +2802,27 @@ void ptlrpc_free_committed(struct obd_import *imp)
 			DEBUG_REQ(D_EMERG, req, "zero transno during replay");
 			LBUG();
 		}
-		if (req->rq_import_generation < imp->imp_generation) {
+
+		/* If other threads are waiting on imp_lock, stop processing
+		 * in this thread. Another thread can finish remaining work.
+		 * This may happen if there are huge numbers of open files
+		 * that are closed suddenly or evicted, or if the server
+		 * commit interval is very high vs. RPC rate.
+		 */
+		if (++replay_scanned % 2048 == 0) {
+			now = ktime_get_seconds();
+			if (now > start + 5)
+				debug_level = D_WARNING;
+
+			if ((replay_freed > 128 && now > start + 3) &&
+			    atomic_read(&imp->imp_waiting)) {
+				if (debug_level == D_INFO)
+					debug_level = D_RPCTRACE;
+				break;
+			}
+		}
+
+		if (req->rq_import_generation < imp_generation) {
 			DEBUG_REQ(D_RPCTRACE, req, "free request with old gen");
 			goto free_req;
 		}
@@ -2803,29 +2841,62 @@ void ptlrpc_free_committed(struct obd_import *imp)
 		}

 		DEBUG_REQ(D_INFO, req, "commit (last_committed %llu)",
-			  imp->imp_peer_committed_transno);
+			  peer_committed_transno);
 free_req:
+		replay_freed++;
 		ptlrpc_free_request(req);
 	}

 	if (skip_committed_list)
-		return;
+		goto out;

 	list_for_each_entry_safe(req, saved, &imp->imp_committed_list,
 				 rq_replay_list) {
 		LASSERT(req->rq_transno != 0);
-		if (req->rq_import_generation < imp->imp_generation ||
+
+		/* If other threads are waiting on imp_lock, stop processing
+		 * in this thread. Another thread can finish remaining work.
+		 */
+		if (++commit_scanned % 2048 == 0) {
+			now = ktime_get_seconds();
+			if (now > start + 6)
+				debug_level = D_WARNING;
+
+			if ((commit_freed > 128 && now > start + 4) &&
+			    atomic_read(&imp->imp_waiting)) {
+				if (debug_level == D_INFO)
+					debug_level = D_RPCTRACE;
+				break;
+			}
+		}
+
+		if (req->rq_import_generation < imp_generation ||
 		    !req->rq_replay) {
 			DEBUG_REQ(D_RPCTRACE, req, "free %s open request",
 				  req->rq_import_generation <
-				  imp->imp_generation ? "stale" : "closed");
+				  imp_generation ? "stale" : "closed");

 			if (imp->imp_replay_cursor == &req->rq_replay_list)
 				imp->imp_replay_cursor =
 					req->rq_replay_list.next;

+			commit_freed++;
 			ptlrpc_free_request(req);
 		}
 	}
+out:
+	/* if full lists processed without interruption, avoid next scan */
+	if (debug_level == D_INFO) {
+		imp->imp_last_transno_checked = peer_committed_transno;
+		imp->imp_last_generation_checked = imp_generation;
+	}
+
+	CDEBUG_LIMIT(debug_level,
+		     "%s: %s: skip=%u replay=%u/%u committed=%u/%u\n",
+		     imp->imp_obd->obd_name,
+		     debug_level == D_INFO ? "normal" : "overloaded",
+		     skip_committed_list, replay_freed, replay_scanned,
+		     commit_freed, commit_scanned);
 }

 /**
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index db18cd5..e97ccb0 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -987,7 +987,7 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen)
  */
 #define SFID "0x%llx:0x%x:0x%x"
 #define RFID(fid) &((fid)->f_seq), &((fid)->f_oid), &((fid)->f_ver)
-#define PLOGID(logid) ((unsigned long long)(logid)->lgl_oi.oi.oi_seq, (__u32)(logid)->lgl_oi.oi.oi_id, 0)
+#define PLOGID(logid) (unsigned long long)(logid)->lgl_oi.oi.oi_seq, (__u32)(logid)->lgl_oi.oi.oi_id, 0

 /********* Quotas **********/

From patchwork Thu Oct 27 14:05:32 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022203
7D9F310090FA; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 737EBFD4FB; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Oct 2022 10:05:32 -0400 Message-Id: <1666879542-10737-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/15] lustre: llite: only statfs for projid if PROJINHERIT set X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger If projid is set on a directory but PROJINHERIT is not, do not report the project quota for statfs. This matches how ext4_statfs() and xfs_fs_statfs() behave, on which Lustre project quota is modelled. 
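The guard this patch adds can be modeled in a small userspace sketch. The flag bit and function name below are illustrative stand-ins, not the kernel API: statfs reports project-quota values only when the inode both carries a nonzero project ID and has the project-inherit flag set, matching the ext4/XFS behavior the commit message cites.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the LLIF_PROJECT_INHERIT inode flag bit. */
#define PROJECT_INHERIT_FLAG (1u << 0)

/* Model of the guard in ll_statfs(): take the project-quota statfs
 * path only if a project ID is assigned AND inheritance is enabled,
 * mirroring ext4_statfs() and xfs_fs_statfs(). */
static bool statfs_uses_project_quota(unsigned int projid,
				      unsigned long inode_flags)
{
	return projid != 0 && (inode_flags & PROJECT_INHERIT_FLAG) != 0;
}
```

Before the fix, the first condition alone selected the project path, so a directory with a project ID but without PROJINHERIT still reported project-limited free space.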
Fixes: 323e22e731 ("lustre: quota: df should return projid-specific values") WC-bug-id: https://jira.whamcloud.com/browse/LU-15721 Lustre-commit: 59f0d691686c9ab8e ("LU-15721 llite: only statfs for projid if PROJINHERIT set") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47352 Reviewed-by: Wang Shilong Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 55a9202..81c7fa3 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2510,7 +2510,8 @@ int ll_statfs(struct dentry *de, struct kstatfs *sfs) sfs->f_bavail = osfs.os_bavail; sfs->f_fsid.val[0] = (u32)fsid; sfs->f_fsid.val[1] = (u32)(fsid >> 32); - if (ll_i2info(de->d_inode)->lli_projid) + if (ll_i2info(de->d_inode)->lli_projid && + test_bit(LLIF_PROJECT_INHERIT, &ll_i2info(de->d_inode)->lli_flags)) return ll_statfs_project(de->d_inode, sfs); ll_stats_ops_tally(ll_s2sbi(sb), LPROC_LL_STATFS, From patchwork Thu Oct 27 14:05:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13022208 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32988FA3740 for ; Thu, 27 Oct 2022 14:22:09 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4Myncb1vsnz21Jc; Thu, 27 Oct 2022 07:09:51 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov 
[160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MynZY2yhHz21BD for ; Thu, 27 Oct 2022 07:08:05 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 7FB1A1009100; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 767B7FD4FC; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Oct 2022 10:05:33 -0400 Message-Id: <1666879542-10737-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/15] lustre: llite: revert: "lustre: llite: prevent mulitple group locks" X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman This reverts commit a1fd83a981c5813e3d9bc031c767bb21ba2305d2 since it makes group unlock synchronous, which leads to poor performance on shared file IO under group lock.
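The serialization being reverted here (and reintroduced at the mdc/osc layer later in this series) follows a simple counting scheme. The sketch below is an illustrative single-threaded model, not the kernel code: the first holder records the group ID, further holders of the same gid share the lock, and a different gid must wait until the user count drops to zero.

```c
#include <assert.h>

struct group_state {
	unsigned long gid;	/* gid of the current holders, 0 if none */
	unsigned long users;	/* number of active group-lock users */
};

/* Try to take the group lock: returns 0 on success, -1 if a
 * different gid currently holds it (the real code either waits
 * on a waitqueue or returns -EAGAIN for O_NONBLOCK openers). */
static int grouplock_get(struct group_state *s, unsigned long gid)
{
	if (s->users != 0 && s->gid != gid)
		return -1;
	if (s->users == 0)
		s->gid = gid;
	s->users++;
	return 0;
}

/* Drop one user; the last one clears the gid (and would wake waiters). */
static void grouplock_put(struct group_state *s)
{
	s->users--;
	if (s->users == 0)
		s->gid = 0;
}
```

The reverted llite version made the final `grouplock_put()` synchronous with page discard, which is where the shared-file IO slowdown came from.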
WC-bug-id: https://jira.whamcloud.com/browse/LU-16046 Lustre-commit: bc37f89a81ea0a2fa ("LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"") Signed-off-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48037 Reviewed-by: Alexander Reviewed-by: Zhenyu Xu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 3 +- fs/lustre/llite/file.c | 76 ++++++++++++++-------------------------- fs/lustre/llite/llite_internal.h | 3 -- fs/lustre/llite/llite_lib.c | 3 -- fs/lustre/osc/osc_lock.c | 2 -- 5 files changed, 27 insertions(+), 60 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 56ae9b1..cf5a290 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -773,8 +773,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, lock->l_conn_export = exp; lock->l_export = NULL; lock->l_blocking_ast = einfo->ei_cb_bl; - lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | LDLM_FL_EXCL | - LDLM_FL_ATOMIC_CB)); + lock->l_flags |= (*flags & (LDLM_FL_NO_LRU | LDLM_FL_EXCL)); lock->l_activity = ktime_get_real_seconds(); /* lock not sent to server yet */ diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index e75f482..f96557e 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2448,30 +2448,15 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, if (ll_file_nolock(file)) return -EOPNOTSUPP; -retry: - if (file->f_flags & O_NONBLOCK) { - if (!mutex_trylock(&lli->lli_group_mutex)) - return -EAGAIN; - } else - mutex_lock(&lli->lli_group_mutex); - + read_lock(&lli->lli_lock); if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { CWARN("group lock already existed with gid %lu\n", fd->fd_grouplock.lg_gid); - rc = -EINVAL; - goto out; - } - if (arg != lli->lli_group_gid && lli->lli_group_users != 0) { - if (file->f_flags & O_NONBLOCK) { - rc = -EAGAIN; - goto out; - } - 
mutex_unlock(&lli->lli_group_mutex); - wait_var_event(&lli->lli_group_users, !lli->lli_group_users); - rc = 0; - goto retry; + read_unlock(&lli->lli_lock); + return -EINVAL; } LASSERT(!fd->fd_grouplock.lg_lock); + read_unlock(&lli->lli_lock); /** * XXX: group lock needs to protect all OST objects while PFL @@ -2490,10 +2475,8 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, u16 refcheck; env = cl_env_get(&refcheck); - if (IS_ERR(env)) { - rc = PTR_ERR(env); - goto out; - } + if (IS_ERR(env)) + return PTR_ERR(env); rc = cl_object_layout_get(env, obj, &cl); if (rc >= 0 && cl.cl_is_composite) @@ -2502,26 +2485,28 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, cl_env_put(env, &refcheck); if (rc < 0) - goto out; + return rc; } rc = cl_get_grouplock(ll_i2info(inode)->lli_clob, arg, (file->f_flags & O_NONBLOCK), &grouplock); - if (rc) - goto out; + return rc; + + write_lock(&lli->lli_lock); + if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { + write_unlock(&lli->lli_lock); + CERROR("another thread just won the race\n"); + cl_put_grouplock(&grouplock); + return -EINVAL; + } fd->fd_flags |= LL_FILE_GROUP_LOCKED; fd->fd_grouplock = grouplock; - if (lli->lli_group_users == 0) - lli->lli_group_gid = grouplock.lg_gid; - lli->lli_group_users++; + write_unlock(&lli->lli_lock); CDEBUG(D_INFO, "group lock %lu obtained\n", arg); -out: - mutex_unlock(&lli->lli_group_mutex); - - return rc; + return 0; } static int ll_put_grouplock(struct inode *inode, struct file *file, @@ -2530,40 +2515,31 @@ static int ll_put_grouplock(struct inode *inode, struct file *file, struct ll_inode_info *lli = ll_i2info(inode); struct ll_file_data *fd = file->private_data; struct ll_grouplock grouplock; - int rc; - mutex_lock(&lli->lli_group_mutex); + write_lock(&lli->lli_lock); if (!(fd->fd_flags & LL_FILE_GROUP_LOCKED)) { + write_unlock(&lli->lli_lock); CWARN("no group lock held\n"); - rc = -EINVAL; - goto out; + return -EINVAL; } + LASSERT(fd->fd_grouplock.lg_lock); 
if (fd->fd_grouplock.lg_gid != arg) { CWARN("group lock %lu doesn't match current id %lu\n", arg, fd->fd_grouplock.lg_gid); - rc = -EINVAL; - goto out; + write_unlock(&lli->lli_lock); + return -EINVAL; } grouplock = fd->fd_grouplock; memset(&fd->fd_grouplock, 0, sizeof(fd->fd_grouplock)); fd->fd_flags &= ~LL_FILE_GROUP_LOCKED; + write_unlock(&lli->lli_lock); cl_put_grouplock(&grouplock); - - lli->lli_group_users--; - if (lli->lli_group_users == 0) { - lli->lli_group_gid = 0; - wake_up_var(&lli->lli_group_users); - } CDEBUG(D_INFO, "group lock %lu released\n", arg); - rc = 0; -out: - mutex_unlock(&lli->lli_group_mutex); - - return rc; + return 0; } /** diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6d85b96..e7e4387 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -253,9 +253,6 @@ struct ll_inode_info { u64 lli_pcc_generation; enum pcc_dataset_flags lli_pcc_dsflags; struct pcc_inode *lli_pcc_inode; - struct mutex lli_group_mutex; - u64 lli_group_users; - unsigned long lli_group_gid; u64 lli_attr_valid; u64 lli_lazysize; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 81c7fa3..645fbd9 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1194,9 +1194,6 @@ void ll_lli_init(struct ll_inode_info *lli) lli->lli_pcc_inode = NULL; lli->lli_pcc_dsflags = PCC_DATASET_INVALID; lli->lli_pcc_generation = 0; - mutex_init(&lli->lli_group_mutex); - lli->lli_group_users = 0; - lli->lli_group_gid = 0; } mutex_init(&lli->lli_layout_mutex); memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index dd10949..3b22688 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -1221,8 +1221,6 @@ int osc_lock_init(const struct lu_env *env, oscl->ols_flags = osc_enq2ldlm_flags(enqflags); oscl->ols_speculative = !!(enqflags & CEF_SPECULATIVE); - if (lock->cll_descr.cld_mode == CLM_GROUP) 
- oscl->ols_flags |= LDLM_FL_ATOMIC_CB; if (oscl->ols_flags & LDLM_FL_HAS_INTENT) { oscl->ols_flags |= LDLM_FL_BLOCK_GRANTED; From patchwork Thu Oct 27 14:05:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13022202 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5C0C8FA3740 for ; Thu, 27 Oct 2022 14:13:17 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4MynZY6GRXz21BM; Thu, 27 Oct 2022 07:08:05 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4MynZ51LRpz217K for ; Thu, 27 Oct 2022 07:07:41 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 7F9C910090FF; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7A9B4E8CAE; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Oct 2022 10:05:34 -0400 Message-Id: <1666879542-10737-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/15] lustre: ldlm: group lock fix X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman The original LU-9964 fix had a problem because with many pages in memory grouplock unlock takes 10+ seconds just to discard them. The current patch makes grouplock unlock asynchronous. It introduces a logic similar to the original one, but on mdc/osc layer. HPE-bug-id: LUS-10644, LUS-10906 WC-bug-id: https://jira.whamcloud.com/browse/LU-16046 Lustre-commit: 3ffcb5b700ebfd68 ("LU-16046 ldlm: group lock fix") Signed-off-by: Vitaly Fertman Reviewed-on: https://es-gerrit.dev.cray.com/159856 Reviewed-by: Andriy Skulysh Reviewed-by: Alexander Boyko Tested-by: Elena Gryaznova Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48038 Reviewed-by: Alexander Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 15 ++++ fs/lustre/mdc/mdc_dev.c | 46 ++++++++++-- fs/lustre/osc/osc_lock.c | 157 +++++++++++++++++++++++++++++++++++++++-- fs/lustre/osc/osc_object.c | 16 +++++ 4 files changed, 222 insertions(+), 12 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 884eafe..2e8c184 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -321,6 +321,11 @@ struct osc_object { const struct osc_object_operations *oo_obj_ops; bool oo_initialized; + + wait_queue_head_t oo_group_waitq; + struct mutex oo_group_mutex; + u64 oo_group_users; + unsigned long oo_group_gid; }; static inline void osc_build_res_name(struct osc_object *osc, @@ -657,6 +662,16 @@ int osc_object_glimpse(const struct lu_env *env, const struct cl_object *obj, int osc_object_find_cbdata(const struct lu_env *env, struct cl_object *obj, ldlm_iterator_t iter, void *data); int osc_object_prune(const 
struct lu_env *env, struct cl_object *obj); +void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock); +void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock); +int osc_grouplock_enqueue_init(const struct lu_env *env, + struct osc_object *obj, + struct osc_lock *oscl, + struct lustre_handle *lh); +void osc_grouplock_enqueue_fini(const struct lu_env *env, + struct osc_object *obj, + struct osc_lock *oscl, + struct lustre_handle *lh); /* osc_request.c */ void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd); diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 2fd137d..978fee3 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -330,6 +330,7 @@ static int mdc_dlm_canceling(const struct lu_env *env, */ if (obj) { struct cl_attr *attr = &osc_env_info(env)->oti_attr; + void *data; /* Destroy pages covered by the extent of the DLM lock */ result = mdc_lock_flush(env, cl2osc(obj), cl_index(obj, 0), @@ -339,12 +340,17 @@ static int mdc_dlm_canceling(const struct lu_env *env, */ /* losing a lock, update kms */ lock_res_and_lock(dlmlock); + data = dlmlock->l_ast_data; dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); attr->cat_kms = 0; cl_object_attr_update(env, obj, attr, CAT_KMS); cl_object_attr_unlock(obj); unlock_res_and_lock(dlmlock); + + /* Skip dec in case mdc_object_ast_clear() did it */ + if (data && dlmlock->l_req_mode == LCK_GROUP) + osc_grouplock_dec(cl2osc(obj), dlmlock); cl_object_put(env, obj); } return result; @@ -451,7 +457,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, } static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh) + struct lustre_handle *lockh, int errcode) { struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; @@ -504,6 +510,9 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, LASSERT(oscl->ols_state != 
OLS_GRANTED); oscl->ols_state = OLS_GRANTED; + + if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP) + osc_grouplock_inc_locked(osc, dlmlock); } /** @@ -535,7 +544,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh, CDEBUG(D_INODE, "rc %d, err %d\n", rc, errcode); if (rc == 0) - mdc_lock_granted(env, oscl, lockh); + mdc_lock_granted(env, oscl, lockh, errcode); /* Error handling, some errors are tolerable. */ if (oscl->ols_glimpse && rc == -ENAVAIL) { @@ -824,9 +833,9 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, * * This function does not wait for the network communication to complete. */ -static int mdc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) +static int __mdc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) { struct osc_thread_info *info = osc_env_info(env); struct osc_io *oio = osc_env_io(env); @@ -912,6 +921,28 @@ static int mdc_lock_enqueue(const struct lu_env *env, return result; } +static int mdc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) +{ + struct osc_object *obj = cl2osc(slice->cls_obj); + struct osc_lock *oscl = cl2osc_lock(slice); + struct lustre_handle lh = { 0 }; + int rc; + + if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) { + rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh); + if (rc < 0) + return rc; + } + + rc = __mdc_lock_enqueue(env, slice, unused, anchor); + + if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) + osc_grouplock_enqueue_fini(env, obj, oscl, &lh); + return rc; +} + static const struct cl_lock_operations mdc_lock_lockless_ops = { .clo_fini = osc_lock_fini, .clo_enqueue = mdc_lock_enqueue, @@ -950,8 +981,6 @@ int mdc_lock_init(const struct lu_env *env, struct cl_object *obj, ols->ols_flags = 
flags; ols->ols_speculative = !!(enqflags & CEF_SPECULATIVE); - if (lock->cll_descr.cld_mode == CLM_GROUP) - ols->ols_flags |= LDLM_FL_ATOMIC_CB; if (ols->ols_flags & LDLM_FL_HAS_INTENT) { ols->ols_flags |= LDLM_FL_BLOCK_GRANTED; @@ -1439,6 +1468,9 @@ static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data) memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); + + if (lock->l_req_mode == LCK_GROUP) + osc_grouplock_dec(osc, lock); } return LDLM_ITER_CONTINUE; } diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 3b22688..a3e72a6 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -198,7 +198,7 @@ void osc_lock_lvb_update(const struct lu_env *env, } static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh) + struct lustre_handle *lockh, int errcode) { struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; @@ -254,7 +254,126 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, LASSERT(oscl->ols_state != OLS_GRANTED); oscl->ols_state = OLS_GRANTED; + + if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP) + osc_grouplock_inc_locked(osc, dlmlock); +} + +void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock) +{ + LASSERT(lock->l_req_mode == LCK_GROUP); + + if (osc->oo_group_users == 0) + osc->oo_group_gid = lock->l_policy_data.l_extent.gid; + osc->oo_group_users++; + + LDLM_DEBUG(lock, "users %llu gid %llu\n", + osc->oo_group_users, + lock->l_policy_data.l_extent.gid); +} +EXPORT_SYMBOL(osc_grouplock_inc_locked); + +void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock) +{ + LASSERT(lock->l_req_mode == LCK_GROUP); + + mutex_lock(&osc->oo_group_mutex); + + LASSERT(osc->oo_group_users > 0); + osc->oo_group_users--; + if (osc->oo_group_users == 0) { + osc->oo_group_gid = 0; + 
wake_up_all(&osc->oo_group_waitq); + } + mutex_unlock(&osc->oo_group_mutex); + + LDLM_DEBUG(lock, "users %llu gid %lu\n", + osc->oo_group_users, osc->oo_group_gid); } +EXPORT_SYMBOL(osc_grouplock_dec); + +int osc_grouplock_enqueue_init(const struct lu_env *env, + struct osc_object *obj, + struct osc_lock *oscl, + struct lustre_handle *lh) +{ + struct cl_lock_descr *need = &oscl->ols_cl.cls_lock->cll_descr; + int rc = 0; + + LASSERT(need->cld_mode == CLM_GROUP); + + while (true) { + bool check_gid = true; + + if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT) { + if (!mutex_trylock(&obj->oo_group_mutex)) + return -EAGAIN; + } else { + mutex_lock(&obj->oo_group_mutex); + } + + /** + * If a grouplock of the same gid already exists, match it + * here in advance. Otherwise, if that lock is being cancelled + * there is a chance to get 2 grouplocks for the same file. + */ + if (obj->oo_group_users && + obj->oo_group_gid == need->cld_gid) { + struct osc_thread_info *info = osc_env_info(env); + struct ldlm_res_id *resname = &info->oti_resname; + union ldlm_policy_data *policy = &info->oti_policy; + struct cl_lock *lock = oscl->ols_cl.cls_lock; + u64 flags = oscl->ols_flags | LDLM_FL_BLOCK_GRANTED; + struct ldlm_namespace *ns; + enum ldlm_mode mode; + + ns = osc_export(obj)->exp_obd->obd_namespace; + ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname); + osc_lock_build_policy(env, lock, policy); + mode = ldlm_lock_match(ns, flags, resname, + oscl->ols_einfo.ei_type, policy, + oscl->ols_einfo.ei_mode, lh); + if (mode) + oscl->ols_flags |= LDLM_FL_MATCH_LOCK; + else + check_gid = false; + } + + /** + * If a grouplock exists but cannot be matched, let it to flush + * and wait just for zero users for now. 
+ */ + if (obj->oo_group_users == 0 || + (check_gid && obj->oo_group_gid == need->cld_gid)) + break; + + mutex_unlock(&obj->oo_group_mutex); + if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT) + return -EAGAIN; + + rc = l_wait_event_abortable(obj->oo_group_waitq, + !obj->oo_group_users); + if (rc) + return rc; + } + + return 0; +} +EXPORT_SYMBOL(osc_grouplock_enqueue_init); + +void osc_grouplock_enqueue_fini(const struct lu_env *env, + struct osc_object *obj, + struct osc_lock *oscl, + struct lustre_handle *lh) +{ + LASSERT(oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP); + + /* If a user was added on enqueue_init, decref it */ + if (lustre_handle_is_used(lh)) + ldlm_lock_decref(lh, oscl->ols_einfo.ei_mode); + mutex_unlock(&obj->oo_group_mutex); +} +EXPORT_SYMBOL(osc_grouplock_enqueue_fini); /** * Lock upcall function that is executed either when a reply to ENQUEUE rpc is @@ -284,7 +403,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh, } if (rc == 0) - osc_lock_granted(env, oscl, lockh); + osc_lock_granted(env, oscl, lockh, errcode); /* Error handling, some errors are tolerable. */ if (oscl->ols_glimpse && rc == -ENAVAIL) { @@ -421,6 +540,7 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, struct ldlm_extent *extent = &dlmlock->l_policy_data.l_extent; struct cl_attr *attr = &osc_env_info(env)->oti_attr; u64 old_kms; + void *data; /* Destroy pages covered by the extent of the DLM lock */ result = osc_lock_flush(cl2osc(obj), @@ -433,6 +553,7 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, /* clearing l_ast_data after flushing data, * to let glimpse ast find the lock and the object */ + data = dlmlock->l_ast_data; dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); /* Must get the value under the lock to avoid race. 
*/ @@ -446,6 +567,9 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, cl_object_attr_unlock(obj); unlock_res_and_lock(dlmlock); + /* Skip dec in case osc_object_ast_clear() did it */ + if (data && dlmlock->l_req_mode == LCK_GROUP) + osc_grouplock_dec(cl2osc(obj), dlmlock); cl_object_put(env, obj); } return result; @@ -931,9 +1055,9 @@ int osc_lock_enqueue_wait(const struct lu_env *env, struct osc_object *obj, * * This function does not wait for the network communication to complete. */ -static int osc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) +static int __osc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) { struct osc_thread_info *info = osc_env_info(env); struct osc_io *oio = osc_env_io(env); @@ -1053,6 +1177,29 @@ static int osc_lock_enqueue(const struct lu_env *env, return result; } +static int osc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) +{ + struct osc_object *obj = cl2osc(slice->cls_obj); + struct osc_lock *oscl = cl2osc_lock(slice); + struct lustre_handle lh = { 0 }; + int rc; + + if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) { + rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh); + if (rc < 0) + return rc; + } + + rc = __osc_lock_enqueue(env, slice, unused, anchor); + + if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) + osc_grouplock_enqueue_fini(env, obj, oscl, &lh); + + return rc; +} + /** * Breaks a link between osc_lock and dlm_lock. 
*/ diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index efb0533..c3667a3 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -74,6 +74,10 @@ int osc_object_init(const struct lu_env *env, struct lu_object *obj, atomic_set(&osc->oo_nr_ios, 0); init_waitqueue_head(&osc->oo_io_waitq); + init_waitqueue_head(&osc->oo_group_waitq); + mutex_init(&osc->oo_group_mutex); + osc->oo_group_users = 0; + osc->oo_group_gid = 0; osc->oo_root.rb_node = NULL; INIT_LIST_HEAD(&osc->oo_hp_exts); @@ -113,6 +117,7 @@ void osc_object_free(const struct lu_env *env, struct lu_object *obj) LASSERT(atomic_read(&osc->oo_nr_writes) == 0); LASSERT(list_empty(&osc->oo_ol_list)); LASSERT(!atomic_read(&osc->oo_nr_ios)); + LASSERT(!osc->oo_group_users); lu_object_fini(obj); /* osc doen't contain an lu_object_header, so we don't need call_rcu */ @@ -225,6 +230,17 @@ static int osc_object_ast_clear(struct ldlm_lock *lock, void *data) memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); + + /** + * Object is being destroyed and gets unlinked from the lock, + * IO is finished and no cached data is left under the lock. As + * grouplock is immediately marked CBPENDING it is not reused. + * It will also be not possible to flush data later due to a + * NULL l_ast_data - enough conditions to let new grouplocks to + * be enqueued even if the lock still exists on client. 
+ */ + if (lock->l_req_mode == LCK_GROUP) + osc_grouplock_dec(osc, lock); } return LDLM_ITER_CONTINUE; } From patchwork Thu Oct 27 14:05:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13022205 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 04752FA3740 for ; Thu, 27 Oct 2022 14:17:48 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4Mync7202Mz21HM; Thu, 27 Oct 2022 07:09:27 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4Mynbv5tDwz21H2 for ; Thu, 27 Oct 2022 07:09:15 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 829AF1009101; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7DA76FD4E1; Thu, 27 Oct 2022 10:05:44 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Oct 2022 10:05:35 -0400 Message-Id: <1666879542-10737-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/15] lustre: llite: adjust read count as file got truncated X-BeenThere: lustre-devel@lists.lustre.org 
Cc: Lustre Development List
From: Bobi Jam

A file read will not notice that the file was truncated by another node, and will continue to read zero-filled pages beyond the new file size. This patch confines the read count to the current file size to prevent the issue, and adds a test case verifying the fix.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16025
Lustre-commit: 4468f6c9d92448cb7 ("LU-16025 llite: adjust read count as file got truncated")
Signed-off-by: Bobi Jam
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47896
Reviewed-by: Andreas Dilger
Reviewed-by: Patrick Farrell
Reviewed-by: Oleg Drokin
Reviewed-by: Sebastien Buisson
Signed-off-by: James Simmons
---
 fs/lustre/llite/file.c          | 76 ++++++++++++++++++++++++++++++++++++++++-
 fs/lustre/llite/glimpse.c       |  7 +++-
 fs/lustre/lov/lov_cl_internal.h |  6 ++--
 fs/lustre/lov/lov_object.c      | 14 ++++----
 4 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index f96557e..f35cddc 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1957,6 +1957,59 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, return result; } +/** + * Confine read iter lest read beyond the EOF + * + * @iocb kernel iocb + * @to reader iov_iter + * + * RETURN 0 success + * <0 failure + * >0 @iocb->ki_pos has passed the EOF + */ +static int file_read_confine_iter(struct lu_env *env, struct kiocb *iocb, + struct iov_iter *to) +{ + struct cl_attr *attr = vvp_env_thread_attr(env); + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct ll_inode_info *lli = ll_i2info(inode); + loff_t read_end = iocb->ki_pos + iov_iter_count(to); + loff_t kms; + loff_t size; + int rc; + +
cl_object_attr_lock(lli->lli_clob); + rc = cl_object_attr_get(env, lli->lli_clob, attr); + cl_object_attr_unlock(lli->lli_clob); + if (rc != 0) + return rc; + + kms = attr->cat_kms; + /* if read beyond end-of-file, adjust read count */ + if (kms > 0 && (iocb->ki_pos >= kms || read_end > kms)) { + rc = ll_glimpse_size(inode); + if (rc != 0) + return rc; + + size = i_size_read(inode); + if (iocb->ki_pos >= size || read_end > size) { + CDEBUG(D_VFSTRACE, + "%s: read [%llu, %llu] over eof, kms %llu, file_size %llu.\n", + file_dentry(file)->d_name.name, + iocb->ki_pos, read_end, kms, size); + + if (iocb->ki_pos >= size) + return 1; + + if (read_end > size) + iov_iter_truncate(to, size - iocb->ki_pos); + } + } + + return rc; +} + static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct lu_env *env; @@ -1967,6 +2020,7 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ssize_t rc2; ktime_t kstart = ktime_get(); bool cached; + bool stale_data = false; CDEBUG(D_VFSTRACE|D_IOTRACE, "file %s:"DFID", ppos: %lld, count: %zu\n", file_dentry(file)->d_name.name, @@ -1976,6 +2030,16 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; + env = cl_env_get(&refcheck); + if (IS_ERR(env)) + return PTR_ERR(env); + + result = file_read_confine_iter(env, iocb, to); + if (result < 0) + goto out; + else if (result > 0) + stale_data = true; + /** * Currently when PCC read failed, we do not fall back to the * normal read path, just return the error. @@ -2012,8 +2076,18 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) else if (result == 0) result = rc2; - cl_env_put(env, &refcheck); out: + cl_env_put(env, &refcheck); + + if (stale_data && result > 0) { + /** + * we've reached EOF before the read, the data read are cached + * stale data. 
+ */ + iov_iter_truncate(to, 0); + result = 0; + } + if (result > 0) { ll_rw_stats_tally(ll_i2sbi(file_inode(file)), current->pid, file->private_data, iocb->ki_pos, result, diff --git a/fs/lustre/llite/glimpse.c b/fs/lustre/llite/glimpse.c index c55d079..0190cb5 100644 --- a/fs/lustre/llite/glimpse.c +++ b/fs/lustre/llite/glimpse.c @@ -206,7 +206,12 @@ int __cl_glimpse_size(struct inode *inode, int agl) } else if (result == 0) { result = cl_glimpse_lock(env, io, inode, io->ci_obj, agl); - if (!agl && result == -EAGAIN) + /** + * need to limit retries for FLR mirrors if fast read + * is short because of concurrent truncate. + */ + if (!agl && result == -EAGAIN && + !io->ci_tried_all_mirrors) io->ci_need_restart = 1; } diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index 95dbb43..49cc40b 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -377,8 +377,10 @@ static inline struct lov_layout_entry *lov_entry(struct lov_object *lov, int i) } #define lov_for_layout_entry(lov, entry, start, end) \ - for (entry = lov_entry(lov, start); \ - entry <= lov_entry(lov, end); entry++) + if (lov->u.composite.lo_entries && \ + lov->u.composite.lo_entry_count > 0) \ + for (entry = lov_entry(lov, start); \ + entry <= lov_entry(lov, end); entry++) #define lov_foreach_layout_entry(lov, entry) \ lov_for_layout_entry(lov, entry, 0, \ diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 064764c..5245fd6 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -847,19 +847,17 @@ static int lov_delete_composite(const struct lu_env *env, struct lov_object *lov, union lov_layout_state *state) { - struct lov_layout_composite *comp = &state->composite; struct lov_layout_entry *entry; dump_lsm(D_INODE, lov->lo_lsm); lov_layout_wait(env, lov); - if (comp->lo_entries) - lov_foreach_layout_entry(lov, entry) { - if (entry->lle_lsme && lsme_is_foreign(entry->lle_lsme)) - continue; + 
lov_foreach_layout_entry(lov, entry) { + if (entry->lle_lsme && lsme_is_foreign(entry->lle_lsme)) + continue; - lov_delete_raid0(env, lov, entry); - } + lov_delete_raid0(env, lov, entry); + } return 0; } @@ -997,6 +995,8 @@ static int lov_attr_get_composite(const struct lu_env *env, attr->cat_size = 0; attr->cat_blocks = 0; + attr->cat_kms = 0; + lov_foreach_layout_entry(lov, entry) { int index = lov_layout_entry_index(lov, entry); struct cl_attr *lov_attr = NULL;

From patchwork Thu Oct 27 14:05:36 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022206
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Oct 2022 10:05:36 -0400
Message-Id:
<1666879542-10737-10-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 09/15] lnet: Discovery queue and deletion race
Cc: Chris Horn , Lustre Development List
From: Chris Horn

lnet_peer_deletion() can race with another thread calling lnet_peer_queue_for_discovery().

Discovery thread:
- Calls lnet_peer_deletion():
  - LNET_PEER_DISCOVERING bit is cleared from lnet_peer::lp_state
  - Releases lnet_peer::lp_lock

Another thread:
- Acquires lnet_net_lock/EX
- Calls lnet_peer_queue_for_discovery():
  - Takes lnet_peer::lp_lock
  - Sets the LNET_PEER_DISCOVERING bit
  - Releases lnet_peer::lp_lock
  - Sees lnet_peer::lp_dc_list is not empty, so it does not add the peer to the dc request queue
- lnet_peer_queue_for_discovery() returns and lnet_net_lock/EX is released

Discovery thread:
- Acquires lnet_net_lock/EX
- Deletes the peer from the ln_dc_working list
- Performs the peer deletion

At this point, the peer is not on any discovery list, yet it has the LNET_PEER_DISCOVERING bit set. This peer is now stranded, and any messages on the peer's lnet_peer::lp_dc_pendq are likewise stranded.

To solve this, we modify lnet_peer_deletion() so that it waits to clear the LNET_PEER_DISCOVERING bit until it has completed deleting the peer and re-acquired lnet_peer::lp_lock. This ensures we cannot race with any other thread that may add the LNET_PEER_DISCOVERING bit back to the peer. We also avoid deleting the peer from the ln_dc_working list in lnet_peer_deletion(); this is already done by lnet_peer_discovery_complete().
There is another window where the LNET_PEER_DISCOVERING bit can be added when the discovery thread drops the lp_lock just before acquiring the net_lock/EX and calling lnet_peer_discovery_complete(). Have lnet_peer_discovery_complete() clear LNET_PEER_DISCOVERING to deal with this (it already does this for the case where discovery hit an error). Also move the deletion of lp_dc_list to after we clear the DISCOVERING bit. This is to mirror the behavior of lnet_peer_queue_for_discovery() which sets the DISCOVERING bit and then manipulates the lp_dc_list. Also tweak the logic in lnet_peer_deletion() to call lnet_peer_del_locked() in order to avoid extra calls to lnet_net_lock()/lnet_net_unlock(). HPE-bug-id: LUS-11237 WC-bug-id: https://jira.whamcloud.com/browse/LU-16149 Lustre-commit: a966b624ac76e34e8 ("LU-16149 lnet: Discovery queue and deletion race") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48532 Reviewed-by: Frank Sehr Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 39 ++++++++++++++++++++++----------------- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 9b20660..d8d1857 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2206,15 +2206,19 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp, int dc_error) CDEBUG(D_NET, "Discovery complete. Dequeue peer %s\n", libcfs_nidstr(&lp->lp_primary_nid)); - list_del_init(&lp->lp_dc_list); spin_lock(&lp->lp_lock); + /* Our caller dropped lp_lock which may have allowed another thread to + * set LNET_PEER_DISCOVERING, or it may be set if dc_error is non-zero. + * Ensure it is cleared. 
+ */ + lp->lp_state &= ~LNET_PEER_DISCOVERING; if (dc_error) { lp->lp_dc_error = dc_error; - lp->lp_state &= ~LNET_PEER_DISCOVERING; lp->lp_state |= LNET_PEER_REDISCOVER; } list_splice_init(&lp->lp_dc_pendq, &pending_msgs); spin_unlock(&lp->lp_lock); + list_del_init(&lp->lp_dc_list); wake_up(&lp->lp_dc_waitq); if (lp->lp_rtr_refcount > 0) @@ -3152,18 +3156,16 @@ static int lnet_peer_deletion(struct lnet_peer *lp) struct list_head rlist; struct lnet_route *route, *tmp; int sensitivity = lp->lp_health_sensitivity; - int rc; + int rc = 0; INIT_LIST_HEAD(&rlist); - lp->lp_state &= ~(LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING | - LNET_PEER_FORCE_PUSH); CDEBUG(D_NET, "peer %s(%p) state %#x\n", libcfs_nidstr(&lp->lp_primary_nid), lp, lp->lp_state); /* no-op if lnet_peer_del() has already been called on this peer */ if (lp->lp_state & LNET_PEER_MARK_DELETED) - return 0; + goto clear_discovering; spin_unlock(&lp->lp_lock); @@ -3172,28 +3174,25 @@ static int lnet_peer_deletion(struct lnet_peer *lp) the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) { mutex_unlock(&the_lnet.ln_api_mutex); spin_lock(&lp->lp_lock); - return -ESHUTDOWN; + rc = -ESHUTDOWN; + goto clear_discovering; } + lnet_peer_cancel_discovery(lp); lnet_net_lock(LNET_LOCK_EX); - /* remove the peer from the discovery work - * queue if it's on there in preparation - * of deleting it. 
- */ - if (!list_empty(&lp->lp_dc_list)) - list_del_init(&lp->lp_dc_list); list_for_each_entry_safe(route, tmp, &lp->lp_routes, lr_gwlist) lnet_move_route(route, NULL, &rlist); - lnet_net_unlock(LNET_LOCK_EX); - /* lnet_peer_del() deletes all the peer NIs owned by this peer */ - rc = lnet_peer_del(lp); + /* lnet_peer_del_locked() deletes all the peer NIs owned by this peer */ + rc = lnet_peer_del_locked(lp); if (rc) CNETERR("Internal error: Unable to delete peer %s rc %d\n", libcfs_nidstr(&lp->lp_primary_nid), rc); + lnet_net_unlock(LNET_LOCK_EX); + list_for_each_entry_safe(route, tmp, &rlist, lr_list) { /* re-add these routes */ @@ -3209,7 +3208,13 @@ static int lnet_peer_deletion(struct lnet_peer *lp) spin_lock(&lp->lp_lock); - return 0; + rc = 0; + +clear_discovering: + lp->lp_state &= ~(LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING | + LNET_PEER_FORCE_PUSH); + + return rc; } /*

From patchwork Thu Oct 27 14:05:37 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13022209
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Oct 2022 10:05:37 -0400
Message-Id: <1666879542-10737-11-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 10/15] lustre: statahead: avoid to block ptlrpcd interpret context
Cc: Lustre Development List
From: Qian Yingjin

If a stat-ahead entry is a striped directory, or a regular file with a layout change, it will generate a new RPC and block the ptlrpcd interpret context for a long time. Blocking in a ptlrpcd thread is dangerous, however, as it may result in deadlock.
The following is the stack trace for the timeout of replay-dual test_26:

task:ptlrpcd_00_01 state:I stack: 0 pid: 8026 ppid: 2
 osc_extent_wait+0x44d/0x560 [osc]
 osc_cache_wait_range+0x2b8/0x930 [osc]
 osc_io_fsync_end+0x67/0x80 [osc]
 cl_io_end+0x58/0x130 [obdclass]
 lov_io_end_wrapper+0xcf/0xe0 [lov]
 lov_io_fsync_end+0x6f/0x1c0 [lov]
 cl_io_end+0x58/0x130 [obdclass]
 cl_io_loop+0xa7/0x200 [obdclass]
 cl_sync_file_range+0x2c9/0x340 [lustre]
 vvp_prune+0x5d/0x1e0 [lustre]
 cl_object_prune+0x58/0x130 [obdclass]
 lov_layout_change.isra.47+0x1ba/0x640 [lov]
 lov_conf_set+0x38d/0x4e0 [lov]
 cl_conf_set+0x60/0x140 [obdclass]
 cl_file_inode_init+0xc8/0x380 [lustre]
 ll_update_inode+0x432/0x6e0 [lustre]
 ll_iget+0x227/0x320 [lustre]
 ll_prep_inode+0x344/0xb60 [lustre]
 ll_statahead_interpret_common.isra.26+0x69/0x830 [lustre]
 ll_statahead_interpret+0x2c8/0x5b0 [lustre]
 mdc_intent_getattr_async_interpret+0x14a/0x3e0 [mdc]
 ptlrpc_check_set+0x5b8/0x1fe0 [ptlrpc]
 ptlrpcd+0x6c6/0xa50 [ptlrpc]

In this patch, we use a work queue to handle, in a separate thread, the extra RPC and the long wait for striped directories and for regular files with a layout change:
(@ll_prep_inode->@lmv_revalidate_slaves);
(@ll_prep_inode->@lov_layout_change->osc_cache_wait_range)

WC-bug-id: https://jira.whamcloud.com/browse/LU-16139
Lustre-commit: 2e089743901433855 ("LU-16139 statahead: avoid to block ptlrpcd interpret context")
Signed-off-by: Qian Yingjin
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48451
Reviewed-by: Andreas Dilger
Reviewed-by: Lai Siyao
Reviewed-by: Alex Zhuravlev
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_intent.h |   2 -
 fs/lustre/include/obd.h           |   6 +-
 fs/lustre/llite/llite_internal.h  |   6 --
 fs/lustre/llite/llite_lib.c       |   8 --
 fs/lustre/llite/statahead.c       | 173 +++++++++++++++-----------------------
 fs/lustre/mdc/mdc_locks.c         |   3 +-
 6 files changed, 73 insertions(+), 125 deletions(-)

diff --git a/fs/lustre/include/lustre_intent.h
b/fs/lustre/include/lustre_intent.h index 298270b..e7d81f6 100644 --- a/fs/lustre/include/lustre_intent.h +++ b/fs/lustre/include/lustre_intent.h @@ -50,8 +50,6 @@ struct lookup_intent { u64 it_remote_lock_handle; struct ptlrpc_request *it_request; unsigned int it_lock_set:1; - unsigned int it_extra_rpc_check:1; - unsigned int it_extra_rpc_need:1; }; static inline int it_disposition(struct lookup_intent *it, int flag) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index c452da7..16f66ea 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -835,9 +835,7 @@ struct md_readdir_info { }; struct md_op_item; -typedef int (*md_op_item_cb_t)(struct req_capsule *pill, - struct md_op_item *item, - int rc); +typedef int (*md_op_item_cb_t)(struct md_op_item *item, int rc); struct md_op_item { struct md_op_data mop_data; @@ -847,6 +845,8 @@ struct md_op_item { md_op_item_cb_t mop_cb; void *mop_cbdata; struct inode *mop_dir; + struct req_capsule *mop_pill; + struct work_struct mop_work; }; struct obd_ops { diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index e7e4387..d245dd8 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1517,12 +1517,6 @@ struct ll_statahead_info { atomic_t sai_cache_count; /* entry count in cache */ }; -struct ll_interpret_work { - struct work_struct lpw_work; - struct md_op_item *lpw_item; - struct req_capsule *lpw_pill; -}; - int ll_revalidate_statahead(struct inode *dir, struct dentry **dentry, bool unplug); int ll_start_statahead(struct inode *dir, struct dentry *dentry, bool agl); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 645fbd9..130a723 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -3095,14 +3095,6 @@ int ll_prep_inode(struct inode **inode, struct req_capsule *pill, if (rc) goto out; - if (S_ISDIR(md.body->mbo_mode) && md.lmv && lmv_dir_striped(md.lmv) && - it && 
it->it_extra_rpc_check) { - /* TODO: Check @lsm unchanged via @lsm_md_eq. */ - it->it_extra_rpc_need = 1; - rc = -EAGAIN; - goto out; - } - /* * clear default_lmv only if intent_getattr reply doesn't contain it. * but it needs to be done after iget, check this early because diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 1f1fafd..e6ea2ee 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -329,8 +329,7 @@ static void sa_fini_data(struct md_op_item *item) kfree(item); } -static int ll_statahead_interpret(struct req_capsule *pill, - struct md_op_item *item, int rc); +static int ll_statahead_interpret(struct md_op_item *item, int rc); /* * prepare arguments for async stat RPC. @@ -591,56 +590,6 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai) iput(inode); } -static int ll_statahead_interpret_common(struct inode *dir, - struct ll_statahead_info *sai, - struct req_capsule *pill, - struct lookup_intent *it, - struct sa_entry *entry, - struct mdt_body *body) -{ - struct inode *child; - int rc; - - child = entry->se_inode; - rc = ll_prep_inode(&child, pill, dir->i_sb, it); - if (rc) - goto out; - - /* If encryption context was returned by MDT, put it in - * inode now to save an extra getxattr. 
- */ - if (body->mbo_valid & OBD_MD_ENCCTX) { - void *encctx = req_capsule_server_get(pill, &RMF_FILE_ENCCTX); - u32 encctxlen = req_capsule_get_size(pill, &RMF_FILE_ENCCTX, - RCL_SERVER); - - if (encctxlen) { - CDEBUG(D_SEC, - "server returned encryption ctx for "DFID"\n", - PFID(ll_inode2fid(child))); - rc = ll_xattr_cache_insert(child, - xattr_for_enc(child), - encctx, encctxlen); - if (rc) - CWARN("%s: cannot set enc ctx for "DFID": rc = %d\n", - ll_i2sbi(child)->ll_fsname, - PFID(ll_inode2fid(child)), rc); - } - } - - CDEBUG(D_READA, "%s: setting %.*s"DFID" l_data to inode %p\n", - ll_i2sbi(dir)->ll_fsname, entry->se_qstr.len, - entry->se_qstr.name, PFID(ll_inode2fid(child)), child); - ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, child, it, NULL); - - entry->se_inode = child; - - if (agl_should_run(sai, child)) - ll_agl_add(sai, child, entry->se_index); -out: - return rc; -} - static void ll_statahead_interpret_fini(struct ll_inode_info *lli, struct ll_statahead_info *sai, struct md_op_item *item, @@ -664,13 +613,11 @@ static void ll_statahead_interpret_fini(struct ll_inode_info *lli, spin_unlock(&lli->lli_sa_lock); } -static void ll_statahead_interpret_work(struct work_struct *data) +static void ll_statahead_interpret_work(struct work_struct *work) { - struct ll_interpret_work *work = container_of(data, - struct ll_interpret_work, - lpw_work); - struct md_op_item *item = work->lpw_item; - struct req_capsule *pill = work->lpw_pill; + struct md_op_item *item = container_of(work, struct md_op_item, + mop_work); + struct req_capsule *pill = item->mop_pill; struct inode *dir = item->mop_dir; struct ll_inode_info *lli = ll_i2info(dir); struct ll_statahead_info *sai = lli->lli_sai; @@ -709,11 +656,43 @@ static void ll_statahead_interpret_work(struct work_struct *data) goto out; } - LASSERT(it->it_extra_rpc_check == 0); - rc = ll_statahead_interpret_common(dir, sai, pill, it, entry, body); + rc = ll_prep_inode(&child, pill, dir->i_sb, it); + if (rc) + goto out; + + /* 
If encryption context was returned by MDT, put it in + * inode now to save an extra getxattr. + */ + if (body->mbo_valid & OBD_MD_ENCCTX) { + void *encctx = req_capsule_server_get(pill, &RMF_FILE_ENCCTX); + u32 encctxlen = req_capsule_get_size(pill, &RMF_FILE_ENCCTX, + RCL_SERVER); + + if (encctxlen) { + CDEBUG(D_SEC, + "server returned encryption ctx for "DFID"\n", + PFID(ll_inode2fid(child))); + rc = ll_xattr_cache_insert(child, + xattr_for_enc(child), + encctx, encctxlen); + if (rc) + CWARN("%s: cannot set enc ctx for "DFID": rc = %d\n", + ll_i2sbi(child)->ll_fsname, + PFID(ll_inode2fid(child)), rc); + } + } + + CDEBUG(D_READA, "%s: setting %.*s"DFID" l_data to inode %p\n", + ll_i2sbi(dir)->ll_fsname, entry->se_qstr.len, + entry->se_qstr.name, PFID(ll_inode2fid(child)), child); + ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, child, it, NULL); + + entry->se_inode = child; + + if (agl_should_run(sai, child)) + ll_agl_add(sai, child, entry->se_index); out: ll_statahead_interpret_fini(lli, sai, item, entry, pill->rc_req, rc); - kfree(work); } /* @@ -721,14 +700,15 @@ static void ll_statahead_interpret_work(struct work_struct *data) * the inode and set lock data directly in the ptlrpcd context. It will wake up * the directory listing process if the dentry is the waiting one. 
*/ -static int ll_statahead_interpret(struct req_capsule *pill, - struct md_op_item *item, int rc) +static int ll_statahead_interpret(struct md_op_item *item, int rc) { + struct req_capsule *pill = item->mop_pill; struct lookup_intent *it = &item->mop_it; struct inode *dir = item->mop_dir; struct ll_inode_info *lli = ll_i2info(dir); struct ll_statahead_info *sai = lli->lli_sai; struct sa_entry *entry = (struct sa_entry *)item->mop_cbdata; + struct work_struct *work = &item->mop_work; struct mdt_body *body; struct inode *child; u64 handle = 0; @@ -770,50 +750,33 @@ static int ll_statahead_interpret(struct req_capsule *pill, entry->se_handle = it->it_lock_handle; /* * In ptlrpcd context, it is not allowed to generate new RPCs - * especially for striped directories. + * especially for striped directories or regular files with layout + * change. */ - it->it_extra_rpc_check = 1; - rc = ll_statahead_interpret_common(dir, sai, pill, it, entry, body); - if (rc == -EAGAIN && it->it_extra_rpc_need) { - struct ll_interpret_work *work; - - /* - * release ibits lock ASAP to avoid deadlock when statahead - * thread enqueues lock on parent in readdir and another - * process enqueues lock on child with parent lock held, eg. - * unlink. - */ - handle = it->it_lock_handle; - ll_intent_drop_lock(it); - ll_unlock_md_op_lsm(&item->mop_data); - it->it_extra_rpc_check = 0; - it->it_extra_rpc_need = 0; - - /* - * If the stat-ahead entry is a striped directory, there are two - * solutions: - * 1. It can drop the result, let the scanning process do stat() - * on the striped directory in synchronous way. By this way, it - * can avoid to generate new RPCs to obtain the attributes for - * slaves of the striped directory in the ptlrpcd context as it - * is dangerous of blocking in ptlrpcd thread. - * 2. Use work queue or the separate statahead thread to handle - * the extra RPCs (@ll_prep_inode->@lmv_revalidate_slaves). - * Here we adopt the second solution. 
- */ - work = kmalloc(sizeof(*work), GFP_ATOMIC); - if (!work) { - rc = -ENOMEM; - goto out; - } - INIT_WORK(&work->lpw_work, ll_statahead_interpret_work); - work->lpw_item = item; - work->lpw_pill = pill; - ptlrpc_request_addref(pill->rc_req); - schedule_work(&work->lpw_work); - return 0; - } + /* + * release ibits lock ASAP to avoid deadlock when statahead + * thread enqueues lock on parent in readdir and another + * process enqueues lock on child with parent lock held, eg. + * unlink. + */ + handle = it->it_lock_handle; + ll_intent_drop_lock(it); + ll_unlock_md_op_lsm(&item->mop_data); + /* + * If the statahead entry is a striped directory or regular file with + * layout change, it will generate a new RPC and long wait in the + * ptlrpcd context. + * However, it is dangerous of blocking in ptlrpcd thread. + * Here we use work queue or the separate statahead thread to handle + * the extra RPC and long wait: + * (@ll_prep_inode->@lmv_revalidate_slaves); + * (@ll_prep_inode->@lov_layout_change->osc_cache_wait_range); + */ + INIT_WORK(work, ll_statahead_interpret_work); + ptlrpc_request_addref(pill->rc_req); + schedule_work(work); + return 0; out: ll_statahead_interpret_fini(lli, sai, item, entry, NULL, rc); return rc; diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 31c5bc0..f36e0ec 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -1396,7 +1396,8 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, rc = mdc_finish_intent_lock(exp, req, &item->mop_data, it, lockh); out: - item->mop_cb(&req->rq_pill, item, rc); + item->mop_pill = &req->rq_pill; + item->mop_cb(item, rc); return 0; } From patchwork Thu Oct 27 14:05:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13022210 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: 
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Oct 2022 10:05:38 -0400
Message-Id: <1666879542-10737-12-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 11/15] lnet: add mechanism for dumping lnd peer debug info
Cc: Serguei Smirnov , Lustre Development List
From: Serguei Smirnov

Add the ability to dump lnd peer debug info:

    lnetctl debug peer --nid=

The debug info is dumped to the log as D_CONSOLE by the respective lnd and can be retrieved with "lctl dk" or seen in syslog. This mechanism has been added for socklnd and o2iblnd peers.

WC-bug-id: https://jira.whamcloud.com/browse/LU-15234
Lustre-commit: 950e59ced18d49e9f ("LU-15234 lnet: add mechanism for dumping lnd peer debug info")
Signed-off-by: Serguei Smirnov
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48566
Reviewed-by: Frank Sehr
Reviewed-by: Cyril Bordage
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 96 +++++++++++++++++++++++++++++++++++++++-
 net/lnet/klnds/socklnd/socklnd.c | 51 ++++++++++++++++++++-
 2 files changed, 143 insertions(+), 4 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 14dd686..d2e4ce9 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -422,7 +422,96 @@ void kiblnd_unlink_peer_locked(struct kib_peer_ni *peer_ni) kiblnd_peer_decref(peer_ni); } -static int kiblnd_get_peer_info(struct lnet_ni *ni, int index, +static void +kiblnd_debug_rx(struct kib_rx *rx) +{ + CDEBUG(D_CONSOLE, " %p msg_type %x cred %d\n", + rx, rx->rx_msg->ibm_type, + rx->rx_msg->ibm_credits); +} + +static void +kiblnd_debug_tx(struct kib_tx *tx) +{ + CDEBUG(D_CONSOLE, + " %p snd %d q %d w %d rc %d dl %lld cookie %#llx msg %s%s type %x cred %d\n", + tx, tx->tx_sending, tx->tx_queued, tx->tx_waiting, + tx->tx_status, ktime_to_ns(tx->tx_deadline), tx->tx_cookie, + !tx->tx_lntmsg[0] ? "-" : "!", + !tx->tx_lntmsg[1] ?
"-" : "!", + tx->tx_msg->ibm_type, tx->tx_msg->ibm_credits); +} + +static void +kiblnd_debug_conn(struct kib_conn *conn) +{ + struct list_head *tmp; + int i; + + spin_lock(&conn->ibc_lock); + + CDEBUG(D_CONSOLE, "conn[%d] %p [version %x] -> %s:\n", + atomic_read(&conn->ibc_refcount), conn, + conn->ibc_version, libcfs_nid2str(conn->ibc_peer->ibp_nid)); + CDEBUG(D_CONSOLE, + " state %d nposted %d/%d cred %d o_cred %d r_cred %d\n", + conn->ibc_state, conn->ibc_noops_posted, + conn->ibc_nsends_posted, conn->ibc_credits, + conn->ibc_outstanding_credits, conn->ibc_reserved_credits); + CDEBUG(D_CONSOLE, " comms_err %d\n", conn->ibc_comms_error); + + CDEBUG(D_CONSOLE, " early_rxs:\n"); + list_for_each(tmp, &conn->ibc_early_rxs) + kiblnd_debug_rx(list_entry(tmp, struct kib_rx, rx_list)); + + CDEBUG(D_CONSOLE, " tx_noops:\n"); + list_for_each(tmp, &conn->ibc_tx_noops) + kiblnd_debug_tx(list_entry(tmp, struct kib_tx, tx_list)); + + CDEBUG(D_CONSOLE, " tx_queue_nocred:\n"); + list_for_each(tmp, &conn->ibc_tx_queue_nocred) + kiblnd_debug_tx(list_entry(tmp, struct kib_tx, tx_list)); + + CDEBUG(D_CONSOLE, " tx_queue_rsrvd:\n"); + list_for_each(tmp, &conn->ibc_tx_queue_rsrvd) + kiblnd_debug_tx(list_entry(tmp, struct kib_tx, tx_list)); + + CDEBUG(D_CONSOLE, " tx_queue:\n"); + list_for_each(tmp, &conn->ibc_tx_queue) + kiblnd_debug_tx(list_entry(tmp, struct kib_tx, tx_list)); + + CDEBUG(D_CONSOLE, " active_txs:\n"); + list_for_each(tmp, &conn->ibc_active_txs) + kiblnd_debug_tx(list_entry(tmp, struct kib_tx, tx_list)); + + CDEBUG(D_CONSOLE, " rxs:\n"); + for (i = 0; i < IBLND_RX_MSGS(conn); i++) + kiblnd_debug_rx(&conn->ibc_rxs[i]); + + spin_unlock(&conn->ibc_lock); +} + +static void +kiblnd_dump_peer_debug_info(struct kib_peer_ni *peer_ni) +{ + struct kib_conn *conn; + struct kib_conn *cnxt; + int count = 0; + + CDEBUG(D_CONSOLE, "[last_alive, races, reconnected, error]: %lld, %d, %d, %d\n", + peer_ni->ibp_last_alive, + peer_ni->ibp_races, + peer_ni->ibp_reconnected, + 
peer_ni->ibp_error); + list_for_each_entry_safe(conn, cnxt, &peer_ni->ibp_conns, + ibc_list) { + CDEBUG(D_CONSOLE, "Conn %d:\n", count); + kiblnd_debug_conn(conn); + count++; + } +} + +static int kiblnd_get_peer_info(struct lnet_ni *ni, lnet_nid_t nid, int index, lnet_nid_t *nidp, int *count) { struct kib_peer_ni *peer_ni; @@ -437,6 +526,9 @@ static int kiblnd_get_peer_info(struct lnet_ni *ni, int index, if (peer_ni->ibp_ni != ni) continue; + if (peer_ni->ibp_nid == nid) + kiblnd_dump_peer_debug_info(peer_ni); + if (index-- > 0) continue; @@ -1065,7 +1157,7 @@ static int kiblnd_ctl(struct lnet_ni *ni, unsigned int cmd, void *arg) lnet_nid_t nid = 0; int count = 0; - rc = kiblnd_get_peer_info(ni, data->ioc_count, + rc = kiblnd_get_peer_info(ni, data->ioc_nid, data->ioc_count, &nid, &count); data->ioc_nid = nid; data->ioc_count = count; diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 8d3c0d6..996d3a9 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -277,6 +277,52 @@ struct ksock_peer_ni * ksocknal_peer_decref(peer_ni); } +static void +ksocknal_dump_peer_debug_info(struct ksock_peer_ni *peer_ni) +{ + struct ksock_conn *conn; + struct list_head *ctmp; + struct list_head *txtmp; + int ccount = 0; + int txcount = 0; + + list_for_each(ctmp, &peer_ni->ksnp_conns) { + conn = list_entry(ctmp, struct ksock_conn, ksnc_list); + + if (!list_empty(&conn->ksnc_tx_queue)) + list_for_each(txtmp, &conn->ksnc_tx_queue) txcount++; + + CDEBUG(D_CONSOLE, "Conn %d [type, closing, crefcnt, srefcnt]: %d, %d, %d, %d\n", + ccount, + conn->ksnc_type, + conn->ksnc_closing, + refcount_read(&conn->ksnc_conn_refcount), + refcount_read(&conn->ksnc_sock_refcount)); + CDEBUG(D_CONSOLE, "Conn %d rx [scheduled, ready, state]: %d, %d, %d\n", + ccount, + conn->ksnc_rx_scheduled, + conn->ksnc_rx_ready, + conn->ksnc_rx_state); + CDEBUG(D_CONSOLE, + "Conn %d tx [txqcnt, scheduled, last_post, ready, deadline]: %d, %d, %lld, %d, 
%lld\n", + ccount, + txcount, + conn->ksnc_tx_scheduled, + conn->ksnc_tx_last_post, + conn->ksnc_rx_ready, + conn->ksnc_rx_deadline); + + if (conn->ksnc_scheduler) + CDEBUG(D_CONSOLE, "Conn %d sched [nconns, cpt]: %d, %d\n", + ccount, + conn->ksnc_scheduler->kss_nconns, + conn->ksnc_scheduler->kss_cpt); + + txcount = 0; + ccount++; + } +} + static int ksocknal_get_peer_info(struct lnet_ni *ni, int index, struct lnet_processid *id, u32 *myip, u32 *peer_ip, @@ -295,9 +341,9 @@ struct ksock_peer_ni * if (index-- > 0) continue; + *id = peer_ni->ksnp_id; conn_cb = peer_ni->ksnp_conn_cb; if (!conn_cb) { - *id = peer_ni->ksnp_id; *myip = 0; *peer_ip = 0; *port = 0; @@ -305,7 +351,8 @@ struct ksock_peer_ni * *share_count = 0; rc = 0; } else { - *id = peer_ni->ksnp_id; + ksocknal_dump_peer_debug_info(peer_ni); + if (conn_cb->ksnr_addr.ss_family == AF_INET) { struct sockaddr_in *sa; From patchwork Thu Oct 27 14:05:39 2022
From: James Simmons To: Andreas Dilger, Oleg Drokin, NeilBrown Date: Thu, 27 Oct 2022 10:05:39 -0400 Message-Id: <1666879542-10737-13-git-send-email-jsimmons@infradead.org> In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/15] lnet: ksocklnd: fix irq lock inversion while calling sk_data_ready() Cc: Lustre Development List sk->sk_data_ready() of an sctp socket can be called from both BH and non-BH contexts, but the ksocklnd version of sk_data_ready(), ksocknal_data_ready(), does not handle the BH case. Change how ksnd_global_lock is taken in this case.
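The deadlock the patch avoids can be sketched in user space. This is not kernel code: the `read_lock*()` functions below are illustrative stubs standing in for the kernel primitives, and the helper only models the invariant — holding a lock that a softirq (BH) handler can also take is only safe if BH is masked while the lock is held, which is exactly what switching to `read_lock_bh()` provides.

```c
/* Illustrative stand-ins for the kernel's rwlock primitives. */
#include <stdbool.h>

static bool bh_enabled = true;   /* may a softirq preempt us on this CPU? */
static int  lock_depth;          /* emulated rwlock hold count */

static void read_lock(void)      { lock_depth++; }
static void read_unlock(void)    { lock_depth--; }
static void read_lock_bh(void)   { bh_enabled = false; lock_depth++; }
static void read_unlock_bh(void) { lock_depth--; bh_enabled = true; }

/* A softirq deadlocks if it can run now (BH enabled) and would try to
 * take a lock the interrupted code already holds. */
bool softirq_would_deadlock(void)
{
	return bh_enabled && lock_depth > 0;
}

/* Pattern before the patch: plain read_lock() leaves BH enabled, so a
 * softirq arriving here could recurse on the same lock. */
bool unsafe_data_ready(void)
{
	bool bad;

	read_lock();
	bad = softirq_would_deadlock();
	read_unlock();
	return bad;
}

/* Pattern after the patch: read_lock_bh() holds softirqs off for the
 * duration of the critical section. */
bool safe_data_ready(void)
{
	bool bad;

	read_lock_bh();
	bad = softirq_would_deadlock();
	read_unlock_bh();
	return bad;
}
```

In the real patch the same substitution is made in ksocknal_write_space()/ksocknal_data_ready() on `ksnd_global_lock`, since those callbacks can now fire from BH context.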
WC-bug-id: https://jira.whamcloud.com/browse/LU-15807 Lustre-commit: 1df5199097ef0d789 ("LU-15807 ksocklnd: fix irq lock inversion while calling sk_data_ready()") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48715 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin --- net/lnet/klnds/socklnd/socklnd_lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd_lib.c b/net/lnet/klnds/socklnd/socklnd_lib.c index 047e7a6..dd12945 100644 --- a/net/lnet/klnds/socklnd/socklnd_lib.c +++ b/net/lnet/klnds/socklnd/socklnd_lib.c @@ -374,7 +374,7 @@ static int lustre_csum(struct kvec *v, void *context) /* interleave correctly with closing sockets... */ LASSERT(!in_irq()); - read_lock(&ksocknal_data.ksnd_global_lock); + read_lock_bh(&ksocknal_data.ksnd_global_lock); conn = sk->sk_user_data; wspace = sk_stream_wspace(sk); @@ -408,7 +408,7 @@ static int lustre_csum(struct kvec *v, void *context) clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags); } - read_unlock(&ksocknal_data.ksnd_global_lock); + read_unlock_bh(&ksocknal_data.ksnd_global_lock); } void From patchwork Thu Oct 27 14:05:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13022212 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C82FCECAAA1 for ; Thu, 27 Oct 2022 14:27:17 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4Mynjm5nGrz219y; Thu, 27 Oct 2022 
From: James Simmons To: Andreas Dilger, Oleg Drokin, NeilBrown Date: Thu, 27 Oct 2022 10:05:40 -0400 Message-Id: <1666879542-10737-14-git-send-email-jsimmons@infradead.org> In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/15] lustre: obdclass: fix race in class_del_profile Cc: Li Dongyang, Lustre Development List From: Li Dongyang Move the profile lookup and its removal from lustre_profile_list into the same critical section, otherwise we could race with class_del_profiles() or another class_del_profile(). Do not create duplicate mount opts in the client config, otherwise we would add a duplicate lustre_profile to lustre_profile_list for a single mount.
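The race being closed can be shown with a minimal user-space sketch. The names and the stub lock below are illustrative (the real code uses `spin_lock(&lustre_profile_list_lock)` and Lustre's list macros); the point is that the lookup and the unlink happen inside one critical section, so two concurrent deleters can never both find the same entry.

```c
/* Minimal sketch of the class_del_profile() fix: lookup and list
 * removal under a single lock acquisition.  Illustrative names, not
 * the Lustre code. */
#include <stdlib.h>
#include <string.h>

struct profile {
	struct profile *next;
	char name[32];
};

static struct profile *profile_list;
static int list_locked;                 /* stand-in for the spinlock */

static void profile_lock(void)   { list_locked = 1; }
static void profile_unlock(void) { list_locked = 0; }

void add_profile(const char *name)
{
	struct profile *p = calloc(1, sizeof(*p));

	strncpy(p->name, name, sizeof(p->name) - 1);
	profile_lock();
	p->next = profile_list;
	profile_list = p;
	profile_unlock();
}

/* Lookup and unlink in one critical section: a second concurrent
 * deleter cannot see the entry between our lookup and our unlink,
 * which is the window the patch removes. */
int del_profile(const char *name)
{
	struct profile **pp, *victim = NULL;

	profile_lock();
	for (pp = &profile_list; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			victim = *pp;
			*pp = victim->next;   /* unlink while still locked */
			break;
		}
	}
	profile_unlock();

	if (!victim)
		return -1;                    /* another deleter got here first */
	free(victim);
	return 0;
}
```

The patch achieves the same by introducing class_get_profile_nolock() so class_del_profile() can do the lookup and the list_del() under one hold of the spinlock.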
WC-bug-id: https://jira.whamcloud.com/browse/LU-15305 Lustre-commit: 83d3f42118579d7fb ("LU-15305 obdclass: fix race in class_del_profile") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48802 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/obd_config.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 2b24276..75fc6a6 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -622,21 +622,28 @@ static int class_del_conn(struct obd_device *obd, struct lustre_cfg *lcfg) static LIST_HEAD(lustre_profile_list); static DEFINE_SPINLOCK(lustre_profile_list_lock); -struct lustre_profile *class_get_profile(const char *prof) +static struct lustre_profile *class_get_profile_nolock(const char *prof) { struct lustre_profile *lprof; - spin_lock(&lustre_profile_list_lock); list_for_each_entry(lprof, &lustre_profile_list, lp_list) { - if (!strcmp(lprof->lp_profile, prof)) { + if (strcmp(lprof->lp_profile, prof) == 0) { lprof->lp_refs++; - spin_unlock(&lustre_profile_list_lock); return lprof; } } - spin_unlock(&lustre_profile_list_lock); return NULL; } + +struct lustre_profile *class_get_profile(const char *prof) +{ + struct lustre_profile *lprof; + + spin_lock(&lustre_profile_list_lock); + lprof = class_get_profile_nolock(prof); + spin_unlock(&lustre_profile_list_lock); + return lprof; +} EXPORT_SYMBOL(class_get_profile); /** Create a named "profile". 
@@ -701,9 +708,9 @@ void class_del_profile(const char *prof) CDEBUG(D_CONFIG, "Del profile %s\n", prof); - lprof = class_get_profile(prof); + spin_lock(&lustre_profile_list_lock); + lprof = class_get_profile_nolock(prof); if (lprof) { - spin_lock(&lustre_profile_list_lock); /* because get profile increments the ref counter */ lprof->lp_refs--; list_del(&lprof->lp_list); @@ -711,6 +718,8 @@ void class_del_profile(const char *prof) spin_unlock(&lustre_profile_list_lock); class_put_profile(lprof); + } else { + spin_unlock(&lustre_profile_list_lock); } } EXPORT_SYMBOL(class_del_profile); From patchwork Thu Oct 27 14:05:41 2022 From: James Simmons To: Andreas Dilger, Oleg Drokin, NeilBrown Date: Thu, 27 Oct 2022 10:05:41 -0400 Message-Id: <1666879542-10737-15-git-send-email-jsimmons@infradead.org> In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/15] lnet: use 'fallthrough' pseudo keyword for switch Cc: Lustre Development List From: Jian Yu '/* fallthrough */' hits an implicit-fallthrough error with GCC 11. This patch replaces the existing '/* fallthrough */' comments and their variants with the 'fallthrough' pseudo keyword, which was added by Linux kernel commit v5.4-rc2-141-g294f69e662d1.
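The 'fallthrough' pseudo keyword can be sketched outside the kernel with a small compatibility macro, modeled on the kernel's include/linux/compiler_attributes.h: it expands to the compiler's fallthrough attribute when available and to an empty statement otherwise, so -Wimplicit-fallthrough is satisfied without relying on the compiler parsing comments. The example function is illustrative, not from the patch.

```c
/* Portable approximation of the kernel's 'fallthrough' macro. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* comment fallback */
#endif

/* Intentional cascade: each case contributes its flag bit and then
 * deliberately falls into the next, marked with 'fallthrough;' where
 * the old code would have carried a '/ * fall through * /' comment. */
unsigned int accumulate_flags(int level)
{
	unsigned int flags = 0;

	switch (level) {
	case 3:
		flags |= 0x4;
		fallthrough;
	case 2:
		flags |= 0x2;
		fallthrough;
	case 1:
		flags |= 0x1;
		break;
	default:
		break;
	}
	return flags;
}
```

Unlike the comment form, the attribute survives preprocessing and code motion, which is why GCC 11's stricter -Wimplicit-fallthrough no longer accepts comment variants it cannot recognize.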
WC-bug-id: https://jira.whamcloud.com/browse/LU-15220 Lustre-commit: 8fed107588b74c2a8 ("LU-15220 lnet: use 'fallthrough' pseudo keyword for switch") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/45566 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 4 ++-- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 5 ++--- net/lnet/klnds/socklnd/socklnd.c | 3 ++- net/lnet/klnds/socklnd/socklnd_cb.c | 2 +- net/lnet/lnet/lib-move.c | 4 ++-- net/lnet/lnet/net_fault.c | 2 +- net/lnet/selftest/conctl.c | 2 +- net/lnet/selftest/module.c | 8 ++++---- net/lnet/selftest/rpc.c | 21 +++++++++++---------- 9 files changed, 26 insertions(+), 25 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index d2e4ce9..94ff926 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2841,7 +2841,7 @@ static void kiblnd_base_shutdown(void) !atomic_read(&kiblnd_data.kib_nthreads), "Waiting for %d threads to terminate\n", atomic_read(&kiblnd_data.kib_nthreads)); - /* fall through */ + fallthrough; case IBLND_INIT_NOTHING: break; @@ -2898,7 +2898,7 @@ static void kiblnd_shutdown(struct lnet_ni *ni) "%s: waiting for %d conns to clean\n", libcfs_nidstr(&ni->ni_nid), atomic_read(&net->ibn_nconns)); - /* fall through */ + fallthrough; case IBLND_INIT_NOTHING: LASSERT(!atomic_read(&net->ibn_nconns)); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 3e3be065..b16841e 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1821,9 +1821,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, switch (rxmsg->ibm_type) { default: LBUG(); - + fallthrough; case IBLND_MSG_IMMEDIATE: - /* fallthrough */ nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[rlen]); if (nob > rx->rx_nob) { CERROR("Immediate message from %s too big: 
%d(%d)\n", @@ -2918,7 +2917,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } break; } - /* fall through */ + fallthrough; default: CNETERR("%s rejected: reason %d, size %d\n", libcfs_nid2str(peer_ni->ibp_nid), reason, priv_nob); diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 996d3a9..e8f8020 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2135,7 +2135,8 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused, switch (ksocknal_data.ksnd_init) { default: LASSERT(0); - /* fall through */ + fallthrough; + case SOCKNAL_INIT_ALL: case SOCKNAL_INIT_DATA: hash_for_each(ksocknal_data.ksnd_peers, i, peer_ni, ksnp_list) diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index f358875..15fba9d 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -1254,7 +1254,7 @@ struct ksock_conn_cb * ksocknal_close_conn_and_siblings(conn, rc); return -EPROTO; } - /* Fall through */ + fallthrough; case SOCKNAL_RX_SLOP: /* starting new packet? 
*/ diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index d465789..225acca 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3804,7 +3804,7 @@ struct lnet_mt_event_info { case LNET_EVENT_UNLINK: CDEBUG(D_NET, "%s recovery ping unlinked\n", libcfs_nidstr(&ev_info->mt_nid)); - /* fall-through */ + fallthrough; case LNET_EVENT_REPLY: lnet_handle_recovery_reply(ev_info, event->status, false, event->type == LNET_EVENT_UNLINK); @@ -4012,7 +4012,7 @@ void lnet_monitor_thr_stop(void) ready_delay = true; goto again; } - /* fall through */ + fallthrough; case LNET_MATCHMD_DROP: CNETERR("Dropping PUT from %s portal %d match %llu offset %d length %d: %d\n", diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c index fe7a07c..77cdb73 100644 --- a/net/lnet/lnet/net_fault.c +++ b/net/lnet/lnet/net_fault.c @@ -721,7 +721,7 @@ struct delay_daemon_data { case LNET_CREDIT_OK: lnet_ni_recv(ni, msg->msg_private, msg, 0, 0, msg->msg_len, msg->msg_len); - /* fall through */ + fallthrough; case LNET_CREDIT_WAIT: continue; default: /* failures */ diff --git a/net/lnet/selftest/conctl.c b/net/lnet/selftest/conctl.c index b6dc4ee..ede7fe5 100644 --- a/net/lnet/selftest/conctl.c +++ b/net/lnet/selftest/conctl.c @@ -136,7 +136,7 @@ case LST_OPC_BATCHSRV: client = 0; - /* fall through */ + fallthrough; case LST_OPC_BATCHCLI: if (!args->lstio_dbg_namep) goto out; diff --git a/net/lnet/selftest/module.c b/net/lnet/selftest/module.c index f6a3ec2..333f392 100644 --- a/net/lnet/selftest/module.c +++ b/net/lnet/selftest/module.c @@ -57,13 +57,13 @@ enum { switch (lst_init_step) { case LST_INIT_CONSOLE: lstcon_console_fini(); - /* fall through */ + fallthrough; case LST_INIT_FW: sfw_shutdown(); - /* fall through */ + fallthrough; case LST_INIT_RPC: srpc_shutdown(); - /* fall through */ + fallthrough; case LST_INIT_WI_TEST: for (i = 0; i < cfs_cpt_number(lnet_cpt_table()); i++) { @@ -73,7 +73,7 @@ enum { } kvfree(lst_test_wq); lst_test_wq = NULL; - 
/* fall through */ + fallthrough; case LST_INIT_WI_SERIAL: destroy_workqueue(lst_serial_wq); lst_serial_wq = NULL; diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index c376019..c75addc 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -1018,6 +1018,7 @@ struct srpc_bulk * switch (wi->swi_state) { default: LBUG(); + fallthrough; case SWI_STATE_NEWBORN: { struct srpc_msg *msg; struct srpc_generic_reply *reply; @@ -1059,7 +1060,7 @@ struct srpc_bulk * ev->ev_status = rc; } } - /* fall through */ + fallthrough; case SWI_STATE_BULK_STARTED: LASSERT(!rpc->srpc_bulk || ev->ev_fired); @@ -1255,8 +1256,7 @@ struct srpc_bulk * break; wi->swi_state = SWI_STATE_REQUEST_SENT; - /* perhaps more events */ - /* fall through */ + fallthrough; case SWI_STATE_REQUEST_SENT: { enum srpc_msg_type type = srpc_service2reply(rpc->crpc_service); @@ -1288,7 +1288,7 @@ struct srpc_bulk * wi->swi_state = SWI_STATE_REPLY_RECEIVED; } - /* fall through */ + fallthrough; case SWI_STATE_REPLY_RECEIVED: if (do_bulk && !rpc->crpc_bulkev.ev_fired) break; @@ -1462,11 +1462,11 @@ struct srpc_client_rpc * CERROR("Unknown event: status %d, type %d, lnet %d\n", rpcev->ev_status, rpcev->ev_type, rpcev->ev_lnet); LBUG(); + fallthrough; case SRPC_REQUEST_SENT: if (!ev->status && ev->type != LNET_EVENT_UNLINK) atomic_inc(&RPC_STAT32(SRPC_RPC_SENT)); - - /* fall through */ + fallthrough; case SRPC_REPLY_RCVD: case SRPC_BULK_REQ_RCVD: crpc = rpcev->ev_data; @@ -1588,7 +1588,7 @@ struct srpc_client_rpc * if (!ev->unlinked) break; /* wait for final event */ - /* fall through */ + fallthrough; case SRPC_BULK_PUT_SENT: if (!ev->status && ev->type != LNET_EVENT_UNLINK) { atomic64_t *data; @@ -1600,7 +1600,7 @@ struct srpc_client_rpc * atomic64_add(ev->mlength, data); } - /* fall through */ + fallthrough; case SRPC_REPLY_SENT: srpc = rpcev->ev_data; scd = srpc->srpc_scd; @@ -1673,6 +1673,7 @@ struct srpc_client_rpc * switch (state) { default: LBUG(); + fallthrough; case 
SRPC_STATE_RUNNING: spin_lock(&srpc_data.rpc_glock); @@ -1686,13 +1687,13 @@ struct srpc_client_rpc * spin_unlock(&srpc_data.rpc_glock); stt_shutdown(); - /* fall through */ + fallthrough; case SRPC_STATE_EQ_INIT: rc = LNetClearLazyPortal(SRPC_FRAMEWORK_REQUEST_PORTAL); rc = LNetClearLazyPortal(SRPC_REQUEST_PORTAL); LASSERT(!rc); lnet_assert_handler_unused(srpc_data.rpc_lnet_handler); - /* fall through */ + fallthrough; case SRPC_STATE_NI_INIT: LNetNIFini(); } From patchwork Thu Oct 27 14:05:42 2022 From: James Simmons To: Andreas Dilger, Oleg Drokin, NeilBrown Date: Thu, 27 Oct 2022 10:05:42 -0400 Message-Id:
<1666879542-10737-16-git-send-email-jsimmons@infradead.org> In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> References: <1666879542-10737-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/15] lustre: use 'fallthrough' pseudo keyword for switch Cc: Lustre Development List From: Jian Yu '/* fallthrough */' hits an implicit-fallthrough error with GCC 11. This patch replaces the existing '/* fallthrough */' comments and their variants with the 'fallthrough' pseudo keyword, which was added by Linux kernel commit v5.4-rc2-141-g294f69e662d1. WC-bug-id: https://jira.whamcloud.com/browse/LU-15220 Lustre-commit: 5549b1b9e032c6eae ("LU-15220 lustre: use 'fallthrough' pseudo keyword for switch") Signed-off-by: Jian Yu Reviewed-on: https://review.whamcloud.com/46269 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Peter Jones Signed-off-by: James Simmons --- fs/lustre/ldlm/ldlm_request.c | 2 +- fs/lustre/llite/dir.c | 4 ++-- fs/lustre/llite/file.c | 8 ++++---- fs/lustre/llite/llite_lib.c | 4 ++-- fs/lustre/llite/namei.c | 6 +++--- fs/lustre/lov/lov_object.c | 2 +- fs/lustre/obdecho/echo_client.c | 2 +- fs/lustre/osc/osc_cache.c | 2 +- fs/lustre/ptlrpc/pack_generic.c | 8 ++++---- fs/lustre/ptlrpc/ptlrpc_module.c | 12 ++++++------ 10 files changed, 25 insertions(+), 25 deletions(-) diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index cf5a290..8b244d7 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1372,7 +1372,7 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, case LDLM_IBITS: if (ns->ns_cancel &&
ns->ns_cancel(lock) != 0) break; - /* fall through */ + fallthrough; default: result = LDLM_POLICY_SKIP_LOCK; break; diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 451bd0e..abbba96 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -2017,9 +2017,9 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return rc; } case OBD_IOC_GETNAME_OLD: - /* fall through */ + fallthrough; case OBD_IOC_GETDTNAME: - /* fall through */ + fallthrough; case OBD_IOC_GETMDNAME: return ll_get_obd_name(inode, cmd, arg); case LL_IOC_FLUSHCTX: diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index f35cddc..350d5df 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -176,7 +176,7 @@ static int ll_close_inode_openhandle(struct inode *inode, op_data->op_attr_blocks += ((struct inode *)data)->i_blocks; op_data->op_attr.ia_valid |= ATTR_SIZE; op_data->op_xvalid |= OP_XVALID_BLOCKS; - /* fallthrough */ + fallthrough; case MDS_CLOSE_LAYOUT_SPLIT: case MDS_CLOSE_LAYOUT_SWAP: { struct split_param *sp = data; @@ -3317,7 +3317,7 @@ static int ll_ladvise_sanity(struct inode *inode, ladvise_names[advice], rc); goto out; } - /* fallthrough */ + fallthrough; case LU_LADVISE_WILLREAD: case LU_LADVISE_DONTNEED: default: @@ -4028,9 +4028,9 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags) return 0; } case OBD_IOC_GETNAME_OLD: - /* fall through */ + fallthrough; case OBD_IOC_GETDTNAME: - /* fall through */ + fallthrough; case OBD_IOC_GETMDNAME: return ll_get_obd_name(inode, cmd, arg); case LL_IOC_HSM_STATE_GET: { diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 130a723..3dc0030 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1057,7 +1057,7 @@ static int ll_options(char *options, struct super_block *sb) case LL_SBI_CHECKSUM: sbi->ll_checksum_set = 1; - /* fall through */ + fallthrough; case LL_SBI_USER_XATTR: case LL_SBI_USER_FID2PATH: case 
LL_SBI_LRU_RESIZE: @@ -1135,7 +1135,7 @@ static int ll_options(char *options, struct super_block *sb) LCONSOLE_ERROR_MSG(0x152, "invalid %s option\n", s1); } - /* fall through */ + fallthrough; default: break; } diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 8b21effc..5ac634c 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -410,13 +410,13 @@ int ll_md_need_convert(struct ldlm_lock *lock) switch (lock->l_req_mode) { case LCK_PR: mode = LCK_PR; - /* fall-through */ + fallthrough; case LCK_PW: mode |= LCK_CR; break; case LCK_CW: mode = LCK_CW; - /* fall-through */ + fallthrough; case LCK_CR: mode |= LCK_CR; break; @@ -1784,7 +1784,7 @@ static int ll_mknod(struct inode *dir, struct dentry *dchild, case 0: mode |= S_IFREG; /* for mode = 0 case */ - /* fall through */ + fallthrough; case S_IFREG: case S_IFCHR: case S_IFBLK: diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 5245fd6..34cb6a0 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -2308,7 +2308,7 @@ int lov_read_and_clear_async_rc(struct cl_object *clob) } case LLT_RELEASED: case LLT_EMPTY: - /* fall through */ + fallthrough; case LLT_FOREIGN: break; default: diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index f25ea41..d7de6e4 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1072,7 +1072,7 @@ static int echo_client_brw_ioctl(const struct lu_env *env, int rw, } rw = OBD_BRW_WRITE; - /* fall through */ + fallthrough; case OBD_IOC_BRW_READ: rc = echo_client_brw_ioctl(env, rw, exp, data); goto out; diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 12d9ab5..e563809 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -230,7 +230,7 @@ static int __osc_extent_sanity_check(struct osc_extent *ext, rc = 65; goto out; } - /* fall through */ + fallthrough; default: if (atomic_read(&ext->oe_users) > 0) { rc = 70; diff 
--git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 9acea24..9a0341c 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -783,7 +783,7 @@ u32 lustre_msg_get_flags(struct lustre_msg *msg) CERROR("invalid msg %p: no ptlrpc body!\n", msg); } - /* fall through */ + fallthrough; default: /* flags might be printed in debug code while message * uninitialized @@ -851,7 +851,7 @@ u32 lustre_msg_get_op_flags(struct lustre_msg *msg) CERROR("invalid msg %p: no ptlrpc body!\n", msg); } - /* fall through */ + fallthrough; default: return 0; } @@ -1032,7 +1032,7 @@ int lustre_msg_get_status(struct lustre_msg *msg) CERROR("invalid msg %p: no ptlrpc body!\n", msg); } - /* fall through */ + fallthrough; default: /* status might be printed in debug code while message * uninitialized @@ -2069,7 +2069,7 @@ void lustre_swab_lmv_user_md(struct lmv_user_md *lum) switch (lum->lum_magic) { case LMV_USER_MAGIC_SPECIFIC: count = lum->lum_stripe_count; - /* fallthrough */ + fallthrough; case __swab32(LMV_USER_MAGIC_SPECIFIC): lustre_swab_lmv_user_md_objects(lum->lum_objects, count); break; diff --git a/fs/lustre/ptlrpc/ptlrpc_module.c b/fs/lustre/ptlrpc/ptlrpc_module.c index 7e29a91..95a29b2 100644 --- a/fs/lustre/ptlrpc/ptlrpc_module.c +++ b/fs/lustre/ptlrpc/ptlrpc_module.c @@ -135,23 +135,23 @@ static int __init ptlrpc_init(void) switch (cleanup_phase) { case 8: ptlrpc_nrs_fini(); - /* Fall through */ + fallthrough; case 7: sptlrpc_fini(); - /* Fall through */ + fallthrough; case 6: ldlm_exit(); - /* Fall through */ + fallthrough; case 5: ptlrpc_connection_fini(); - /* Fall through */ + fallthrough; case 3: ptlrpc_request_cache_fini(); - /* Fall through */ + fallthrough; case 1: ptlrpc_hr_fini(); req_layout_fini(); - /* Fall through */ + fallthrough; default: break; }
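The ptlrpc_module.c hunk above converts a staged-cleanup switch, an idiom worth spelling out: each init step records a phase number, and on failure a switch over the phase falls through case after case so only the steps that actually completed are undone, in reverse order. A minimal sketch of the idiom (illustrative names, not the ptlrpc code):

```c
/* Staged cleanup via switch fallthrough, as in ptlrpc_init()'s
 * error path.  Step/teardown names are illustrative. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

int torn_down[4];   /* records which fini_* ran, in order */
int n_torn;

static void fini_c(void) { torn_down[n_torn++] = 3; }
static void fini_b(void) { torn_down[n_torn++] = 2; }
static void fini_a(void) { torn_down[n_torn++] = 1; }

/* fail_at: which init step reports failure (0 = none fail). */
int init_chain(int fail_at)
{
	int cleanup_phase = 0;
	int rc = 0;

	cleanup_phase = 1;                    /* step a done */
	if (fail_at == 2) { rc = -1; goto cleanup; }
	cleanup_phase = 2;                    /* step b done */
	if (fail_at == 3) { rc = -1; goto cleanup; }
	cleanup_phase = 3;                    /* step c done */
	return 0;

cleanup:
	/* Unwind exactly the completed steps, newest first: entering at
	 * the current phase and falling through reaches every earlier
	 * teardown in reverse init order. */
	switch (cleanup_phase) {
	case 3:
		fini_c();
		fallthrough;
	case 2:
		fini_b();
		fallthrough;
	case 1:
		fini_a();
		fallthrough;
	default:
		break;
	}
	return rc;
}
```

This is why these switches have no `break` between cases, and why the fallthrough annotations (formerly comments, now the pseudo keyword) must be preserved rather than "fixed".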