From patchwork Fri Jan 14 01:37:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713309 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0F3A0C433EF for ; Fri, 14 Jan 2022 01:38:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93CA621FF8F; Thu, 13 Jan 2022 17:38:08 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F332421FF8F for ; Thu, 13 Jan 2022 17:38:06 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id A75D610087C6; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9A8D2A8103; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:40 -0500 Message-Id: <1642124283-10148-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/24] lustre: osc: don't have extra gpu call X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov osc doesn't need to call into the GPU driver to check whether a page is a GPU page; this is already recorded in the oap_brw_flags (OBD_BRW_RDMA_ONLY). WC-bug-id: https://jira.whamcloud.com/browse/LU-15189 Lustre-commit: a75f1a90611038ea0 ("LU-15189 osc: don't have extra nvidia call") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/45481 Reviewed-by: Andrew Perepechko Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_request.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 59dc625..14863dc 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1584,7 +1584,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, } } - if (lnet_is_rdma_only_page(pga[0]->pg)) { + if (brw_page2oap(pga[0])->oap_brw_flags & OBD_BRW_RDMA_ONLY) { enable_checksum = false; short_io_size = 0; } From patchwork Fri Jan 14 01:37:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B3063C433F5 for ; Fri, 14 Jan 2022 01:38:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E230E3AD960; Thu, 13 Jan 2022 17:38:23 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 371E121FF8F for ; Thu, 13 Jan 2022 17:38:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id A8F34100BAF3; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9EDFADF4C4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:41 -0500 Message-Id: <1642124283-10148-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/24] lustre: llite: add trusted.projid virtual xattr X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Li Dongyang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Li Dongyang Add a trusted.projid virtual xattr in ldiskfs to export the current project ID, intended for ldiskfs-level MDT backup. When the project ID is EXT4_DEF_PROJID (0), the virtual xattr is hidden from listxattr(2). It is also hidden on the Lustre client when the parent has the project inherit flag and the same project ID, to stop mv from setting the virtual xattr on the destination with the project ID of the source, which could differ from the destination's. getxattr(2) on trusted.projid reports the current project ID, setxattr(2) changes the current project ID, and removexattr(2) resets the project ID to EXT4_DEF_PROJID (0). Both getxattr(2) and setxattr(2) work even when the virtual xattr is hidden.
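The sketch below is a user-space model in C, not the kernel code, of the getxattr(2)/listxattr(2) behaviour described above. It mirrors the size handling in the ll_xattr_cache_get() hunk of this patch: a zero size only queries the length, a buffer that is too small gets -ERANGE, and the value is copied without a trailing NUL. It also combines the ldiskfs hiding rule (project ID 0) with the client rule (same ID as a parent that has the inherit flag). The helpers projid_xattr_get() and projid_xattr_hidden() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper modelling a read of the virtual "trusted.projid"
 * xattr under the getxattr(2) size contract: size == 0 only reports the
 * value length, a too-small buffer yields -ERANGE, and the value is
 * copied without a trailing NUL.
 */
static ssize_t projid_xattr_get(uint32_t projid, char *buf, size_t size)
{
	char tmp[11];	/* 10 decimal digits for a u32, plus the NUL */
	int len = snprintf(tmp, sizeof(tmp), "%" PRIu32, projid);

	assert(len > 0 && (size_t)len < sizeof(tmp));
	if (size == 0)
		return len;		/* length query only */
	if ((size_t)len > size)
		return -ERANGE;		/* caller's buffer is too small */
	memcpy(buf, tmp, len);		/* no trailing NUL, like an xattr value */
	return len;
}

/*
 * Hypothetical helper: should listxattr(2) hide trusted.projid?
 * ldiskfs hides the default project ID (0); the client also hides it
 * when the parent has the inherit flag and the same project ID.
 */
static bool projid_xattr_hidden(uint32_t projid, uint32_t parent_projid,
				bool parent_inherit)
{
	if (projid == 0)
		return true;
	return parent_inherit && projid == parent_projid;
}
```

A caller would normally probe with size 0 first, allocate that many bytes, then fetch the value. This matches the two-step pattern getxattr(2) users already follow for ordinary xattrs.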
Invalidate the client xattr cache for the inode when changing its project ID, so the next getxattr(2) of the virtual xattr returns the new value. Add test cases to verify the virtual projid xattr, and that an MDT backup/restore using tar now preserves the project ID. Change mds_backup_restore in the test framework to use tar with the --xattrs --xattrs-include='trusted.*' options. WC-bug-id: https://jira.whamcloud.com/browse/LU-12056 Lustre-commit: 665383d3a1f4d1dc7 ("LU-12056 ldiskfs: add trusted.projid virtual xattr") Signed-off-by: Li Dongyang Reviewed-on: https://review.whamcloud.com/45679 Reviewed-by: Andreas Dilger Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/xattr.c | 15 +++++++++++++++ fs/lustre/llite/xattr_cache.c | 15 +++++++++++++++ include/uapi/linux/lustre/lustre_idl.h | 1 + 3 files changed, 31 insertions(+) diff --git a/fs/lustre/llite/xattr.c b/fs/lustre/llite/xattr.c index 6aea651..ce9585a 100644 --- a/fs/lustre/llite/xattr.c +++ b/fs/lustre/llite/xattr.c @@ -613,6 +613,7 @@ static int ll_xattr_get(const struct xattr_handler *handler, ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) { + struct inode *dir = d_inode(dentry->d_parent); struct inode *inode = d_inode(dentry); struct ll_sb_info *sbi = ll_i2sbi(inode); ktime_t kstart = ktime_get(); @@ -656,6 +657,20 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) hide_xattr = true; } + /* Hide virtual project id xattr from the list when + * parent has the inherit flag and the same project id, + * so project id won't be messed up by copying the xattrs + * when mv to a tree with different project id.
+ */ + if (get_xattr_type(xattr_name)->flags == XATTR_TRUSTED_T && + strcmp(xattr_name, XATTR_NAME_PROJID) == 0) { + if (ll_i2info(inode)->lli_projid == + ll_i2info(dir)->lli_projid && + test_bit(LLIF_PROJECT_INHERIT, + &ll_i2info(dir)->lli_flags)) + hide_xattr = true; + } + len = strnlen(xattr_name, rem - 1) + 1; rem -= len; if (!xattr_type_filter(sbi, hide_xattr ? NULL : diff --git a/fs/lustre/llite/xattr_cache.c b/fs/lustre/llite/xattr_cache.c index 7c1f5b7..723cc39 100644 --- a/fs/lustre/llite/xattr_cache.c +++ b/fs/lustre/llite/xattr_cache.c @@ -563,6 +563,21 @@ int ll_xattr_cache_get(struct inode *inode, const char *name, char *buffer, else rc = -ERANGE; } + /* Return the project id when the virtual project id xattr + * is explicitly asked. + */ + } else if (strcmp(name, XATTR_NAME_PROJID) == 0) { + /* 10 chars to hold u32 in decimal, plus ending \0 */ + char projid[11]; + + rc = snprintf(projid, sizeof(projid), + "%u", lli->lli_projid); + if (size != 0) { + if (rc <= size) + memcpy(buffer, projid, rc); + else + rc = -ERANGE; + } } } else if (valid & OBD_MD_FLXATTRLS) { rc = ll_xattr_cache_list(&lli->lli_xattrs, diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 35d3ed2..78e20a7 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -1078,6 +1078,7 @@ struct lov_mds_md_v1 { /* LOV EA mds/wire data (little-endian) */ #define XATTR_NAME_SOM "trusted.som" #define XATTR_NAME_HSM "trusted.hsm" #define XATTR_NAME_LFSCK_NAMESPACE "trusted.lfsck_namespace" +#define XATTR_NAME_PROJID "trusted.projid" #define LL_XATTR_NAME_ENCRYPTION_CONTEXT XATTR_SECURITY_PREFIX"c" From patchwork Fri Jan 14 01:37:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713324 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 37474C433F5 for ; Fri, 14 Jan 2022 01:38:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EEE493AD923; Thu, 13 Jan 2022 17:38:40 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 89D9321FF8F for ; Thu, 13 Jan 2022 17:38:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id ACA43100F320; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A34EEE07E3; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:42 -0500 Message-Id: <1642124283-10148-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/24] lnet: o2iblnd: cleanup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexey Lyashkov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexey Lyashkov Simplify kiblnd_send() by avoiding code duplication: pick up an idle tx first, before dispatching on the message type, and export lnet_msgtyp2str() for the shared allocation-failure message.
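The restructuring can be sketched in isolation as below, assuming a toy fixed-size descriptor pool. The descriptor is taken once before the switch, so the allocation and its single -ENOMEM path are no longer repeated in each case. struct tx, get_idle_tx(), put_tx(), msgtyp2str() and send_msg() are hypothetical stand-ins for struct kib_tx, kiblnd_get_idle_tx(), lnet_msgtyp2str() and kiblnd_send(); they are not the LNet API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for LNet message types and struct kib_tx. */
enum msg_type { MSG_ACK, MSG_GET, MSG_PUT, MSG_REPLY };

struct tx {
	enum msg_type type;
	int in_use;
};

static struct tx pool[2];	/* toy descriptor pool */

static struct tx *get_idle_tx(void)
{
	for (size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			return &pool[i];
		}
	}
	return NULL;	/* pool exhausted */
}

static void put_tx(struct tx *tx)
{
	assert(tx->in_use);	/* catch a double release */
	tx->in_use = 0;
}

static const char *msgtyp2str(enum msg_type type)
{
	switch (type) {
	case MSG_ACK:	return "ACK";
	case MSG_GET:	return "GET";
	case MSG_PUT:	return "PUT";
	case MSG_REPLY:	return "REPLY";
	}
	return "<UNKNOWN>";
}

/* Take the descriptor once, before dispatching on the type, so the
 * allocation and its error message exist in exactly one place. */
static int send_msg(enum msg_type type)
{
	struct tx *tx = get_idle_tx();

	if (!tx) {
		fprintf(stderr, "Can't allocate %s txd\n", msgtyp2str(type));
		return -ENOMEM;
	}
	tx->type = type;

	switch (type) {
	case MSG_GET:
		/* GET-specific setup (e.g. an RDMA descriptor) goes here */
		break;
	case MSG_PUT:
	case MSG_REPLY:
		/* PUT/REPLY-specific setup goes here */
		break;
	default:
		/* e.g. ACK: nothing type-specific to set up */
		break;
	}
	return 0;
}
```

With a two-entry pool, the third send fails with -ENOMEM until a descriptor is released with put_tx().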
HPE-bug-id: LUS-1796 WC-bug-id: https://jira.whamcloud.com/browse/LU-14008 Lustre-commit: 3916b9d7226ebb21c ("LU-14008 o2iblnd: cleanup") Signed-off-by: Alexey Lyashkov Reviewed-on: https://review.whamcloud.com/40260 Reviewed-by: Chris Horn Reviewed-by: Alexander Boyko Reviewed-by: Cyril Bordage Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 43 ++++++++++++++----------------------- net/lnet/lnet/lib-move.c | 1 + 2 files changed, 17 insertions(+), 27 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index db13f41..7560fe1 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1543,6 +1543,15 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, iov_iter_advance(&from, payload_offset); + tx = kiblnd_get_idle_tx(ni, target.nid); + if (!tx) { + CERROR("Can't allocate %s txd for %s\n", + lnet_msgtyp2str(type), + libcfs_nid2str(target.nid)); + return -ENOMEM; + } + ibmsg = tx->tx_msg; + switch (type) { default: LBUG(); @@ -1561,14 +1570,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force) break; /* send IMMEDIATE */ - tx = kiblnd_get_idle_tx(ni, target.nid); - if (!tx) { - CERROR("Can't allocate txd for GET to %s\n", - libcfs_nid2str(target.nid)); - return -ENOMEM; - } - - ibmsg = tx->tx_msg; rd = &ibmsg->ibm_u.get.ibgm_rd; rc = kiblnd_setup_rd_kiov(ni, tx, rd, payload_niov, payload_kiov, @@ -1595,7 +1596,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, return -EIO; } - tx->tx_lntmsg[0] = lntmsg; /* finalise lntmsg[0,1] on completion */ + /* finalise lntmsg[0,1] on completion */ + tx->tx_lntmsg[0] = lntmsg; tx->tx_waiting = 1; /* waiting for GET_DONE */ kiblnd_launch_tx(ni, tx, target.nid); return 0; @@ -1607,14 +1609,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, if (nob <= 
IBLND_MSG_SIZE && !lntmsg->msg_rdma_force) break; /* send IMMEDIATE */ - tx = kiblnd_get_idle_tx(ni, target.nid); - if (!tx) { - CERROR("Can't allocate %s txd for %s\n", - type == LNET_MSG_PUT ? "PUT" : "REPLY", - libcfs_nid2str(target.nid)); - return -ENOMEM; - } - rc = kiblnd_setup_rd_kiov(ni, tx, tx->tx_rd, payload_niov, payload_kiov, payload_offset, payload_nob); @@ -1625,12 +1619,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, return -EIO; } - ibmsg = tx->tx_msg; ibmsg->ibm_u.putreq.ibprm_hdr = *hdr; ibmsg->ibm_u.putreq.ibprm_cookie = tx->tx_cookie; kiblnd_init_tx_msg(ni, tx, IBLND_MSG_PUT_REQ, sizeof(struct kib_putreq_msg)); - tx->tx_lntmsg[0] = lntmsg; /* finalise lntmsg on completion */ + /* finalise lntmsg on completion */ + tx->tx_lntmsg[0] = lntmsg; tx->tx_waiting = 1; /* waiting for PUT_{ACK,NAK} */ kiblnd_launch_tx(ni, tx, target.nid); return 0; @@ -1641,13 +1635,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[payload_nob]) <= IBLND_MSG_SIZE); - tx = kiblnd_get_idle_tx(ni, target.nid); - if (!tx) { - CERROR("Can't send %d to %s: tx descs exhausted\n", - type, libcfs_nid2str(target.nid)); - return -ENOMEM; - } - ibmsg = tx->tx_msg; ibmsg->ibm_u.immediate.ibim_hdr = *hdr; @@ -1661,7 +1648,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, nob = offsetof(struct kib_immediate_msg, ibim_payload[payload_nob]); kiblnd_init_tx_msg(ni, tx, IBLND_MSG_IMMEDIATE, nob); - tx->tx_lntmsg[0] = lntmsg; /* finalise lntmsg on completion */ + /* finalise lntmsg on completion */ + tx->tx_lntmsg[0] = lntmsg; + kiblnd_launch_tx(ni, tx, target.nid); return 0; } diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index caffa30..133397e 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4215,6 +4215,7 @@ void lnet_monitor_thr_stop(void) return ""; } } +EXPORT_SYMBOL(lnet_msgtyp2str); int lnet_parse(struct lnet_ni *ni, 
struct lnet_hdr *hdr, lnet_nid_t from_nid4, From patchwork Fri Jan 14 01:37:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C0970C433EF for ; Fri, 14 Jan 2022 01:38:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 402BB3AD9A7; Thu, 13 Jan 2022 17:38:27 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F0D7821FF9A for ; Thu, 13 Jan 2022 17:38:07 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B0D78100F323; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A7471E07E4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:43 -0500 Message-Id: <1642124283-10148-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/24] lustre: ptlrpc: make rq_replied flag always correct X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Zarochentsev The rq_replied flag is cleared only in ptl_rpc_send(), so its state may be incorrect for RPCs that time out without ever having been sent. Also clear it in ptlrpc_cli_req_init(). HPE-bug-id: LUS-8752 WC-bug-id: https://jira.whamcloud.com/browse/LU-15112 Lustre-commit: 94f3f1b511609fa19 ("LU-15112 ptlrpc: make rq_replied flag always correct") Signed-off-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/45871 Reviewed-by: Andrew Perepechko Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/ptlrpc_internal.h | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/lustre/ptlrpc/ptlrpc_internal.h b/fs/lustre/ptlrpc/ptlrpc_internal.h index d6edfde..d902cfe 100644 --- a/fs/lustre/ptlrpc/ptlrpc_internal.h +++ b/fs/lustre/ptlrpc/ptlrpc_internal.h @@ -336,6 +336,7 @@ static inline void ptlrpc_cli_req_init(struct ptlrpc_request *req) req->rq_receiving_reply = 0; req->rq_req_unlinked = 1; req->rq_reply_unlinked = 1; + req->rq_replied = 0; INIT_LIST_HEAD(&cr->cr_set_chain); INIT_LIST_HEAD(&cr->cr_ctx_chain); From patchwork Fri Jan 14 01:37:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713321 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B5361C433EF for ; Fri, 14 Jan 2022 01:38:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C95093AD9DA; Thu, 13 Jan 2022 17:38:30 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 363CA21FFA9 for ; Thu, 13 Jan 2022 17:38:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B519E100F324; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AB34AE080D; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:44 -0500 Message-Id: <1642124283-10148-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/24] lustre: mgc: do not ignore target registration failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Zarochentsev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Zarochentsev A serious target registration failure with the LDD_F_ERROR flag set is ignored by the target, which makes it possible to register a new target with an already-used index. The writeconf flag should be encoded in the fs label regardless of the "first_time" flag; otherwise the target cannot be registered after an initial registration failure.
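The mgc_request.c hunk in this patch hardens the client-side reply handling. A hypothetical, simplified model of that logic follows. The reply is copied only when req_capsule_server_get() actually returned a buffer, and the RPC status is passed back to the caller unchanged. struct target_info and handle_register_reply() are illustrative stand-ins, not Lustre types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct mgs_target_info. */
struct target_info {
	char svname[16];
	int stripe_index;
};

/*
 * rc is the RPC status; reply is the unpacked reply buffer, which may be
 * NULL. The reply is copied only when it is present, and rc is returned
 * to the caller unchanged so a failure is not lost.
 */
static int handle_register_reply(int rc, const struct target_info *reply,
				 struct target_info *mti)
{
	assert(mti != NULL);
	if (!rc && reply)
		memcpy(mti, reply, sizeof(*reply));
	if (!rc)	/* stands in for the CDEBUG() in the patch */
		printf("register %s got index = %d\n",
		       mti->svname, mti->stripe_index);
	return rc;
}
```

Without the NULL check, a missing reply buffer would be dereferenced by memcpy(). With it, the caller keeps its original target info and still sees the status.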
HPE-bug-id: LUS-8752 WC-bug-id: https://jira.whamcloud.com/browse/LU-15112 Lustre-commit: cefabee52586f443b ("LU-15112 mgc: do not ignore target registration failure") Signed-off-by: Alexander Zarochentsev Reviewed-on: https://review.whamcloud.com/45259 Reviewed-by: Alexander Boyko Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mgc/mgc_request.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index 3955d1f..62bf0ea 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -937,7 +937,10 @@ static int mgc_target_register(struct obd_export *exp, if (!rc) { rep_mti = req_capsule_server_get(&req->rq_pill, &RMF_MGS_TARGET_INFO); - memcpy(mti, rep_mti, sizeof(*rep_mti)); + if (rep_mti) + memcpy(mti, rep_mti, sizeof(*rep_mti)); + } + if (!rc) { CDEBUG(D_MGC, "register %s got index = %d\n", mti->mti_svname, mti->mti_stripe_index); } From patchwork Fri Jan 14 01:37:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713326 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0EBE2C433F5 for ; Fri, 14 Jan 2022 01:38:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D391821F792; Thu, 13 Jan 2022 17:38:44 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6E51121FFA9 for ; Thu, 13 Jan 2022 17:38:08 -0800 (PST) Received: from star.ccs.ornl.gov 
(star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B7C11100F325; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AF560E080E; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:45 -0500 Message-Id: <1642124283-10148-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/24] lustre: llite: make foreign symlinks aware of mount namespaces X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Currently the foreign symlink code checks whether the caller is in the same namespace as the sysfs tree by comparing the caller's root mount with the client mount. This doesn't cover all cases: Linux can limit which mounts are visible to a process by using mount namespaces. Record the mount namespace at mount time and compare it with the caller's mount namespace instead.
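From user space, the same idea can be sketched by recording a mount-namespace identity once and comparing it later. Per namespaces(7), two processes share a mount namespace when stat(2) on their /proc/<pid>/ns/mnt links returns the same st_dev and st_ino. This is only an analogy to saving current->nsproxy->mnt_ns in ll_mnt_ns, not Lustre code. mnt_ns_capture() and mnt_ns_same() are hypothetical helpers, and the sketch assumes Linux with /proc mounted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* User-space identity of a mount namespace (hypothetical type). */
struct mnt_ns_id {
	dev_t dev;
	ino_t ino;
};

/* Record the mount namespace behind /proc/<pid>/ns/mnt ("self" for the
 * caller), analogous to saving current->nsproxy->mnt_ns at mount time.
 * Returns 0 on success, -1 if the link cannot be stat()ed. */
static int mnt_ns_capture(const char *pid, struct mnt_ns_id *id)
{
	char path[64];
	struct stat st;

	assert(pid != NULL && id != NULL);
	snprintf(path, sizeof(path), "/proc/%s/ns/mnt", pid);
	if (stat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Same namespace iff both the device and inode numbers match. */
static bool mnt_ns_same(const struct mnt_ns_id *a, const struct mnt_ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

A process that stays in its namespace always compares equal to its own earlier capture. A process in a different mount namespace would not, even if its root directory happens to be the same mount.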
WC-bug-id: https://jira.whamcloud.com/browse/LU-10824 Lustre-commit: 942b4e118677af587 ("LU-10824 llite: make foreign symlinks aware of mount namespaces") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/45609 Reviewed-by: Andreas Dilger Reviewed-by: Faccini Bruno Reviewed-by: Oleg Drokin --- fs/lustre/llite/llite_foreign_symlink.c | 8 ++++---- fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/llite_lib.c | 1 + 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/llite_foreign_symlink.c b/fs/lustre/llite/llite_foreign_symlink.c index bfade93..64bc5db 100644 --- a/fs/lustre/llite/llite_foreign_symlink.c +++ b/fs/lustre/llite/llite_foreign_symlink.c @@ -367,15 +367,15 @@ static struct dentry *ll_foreign_dir_lookup(struct inode *parent, static bool has_same_mount_namespace(struct ll_sb_info *sbi) { - int rc; + bool same; - rc = (sbi->ll_mnt.mnt == current->fs->root.mnt); - if (!rc) + same = (sbi->ll_mnt_ns == current->nsproxy->mnt_ns); + if (!same) LCONSOLE_WARN("%s: client mount %s and '%s.%d' not in same mnt-namespace\n", sbi->ll_fsname, sbi->ll_kset.kobj.name, current->comm, current->pid); - return rc; + return same; } ssize_t foreign_symlink_enable_show(struct kobject *kobj, diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 54fd8d4..a2abec6 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -672,6 +672,7 @@ struct ll_sb_info { struct obd_device *ll_dt_obd; struct dentry *ll_debugfs_entry; struct lu_fid ll_root_fid; /* root object fid */ + struct mnt_namespace *ll_mnt_ns; DECLARE_BITMAP(ll_flags, LL_SBI_NUM_FLAGS); /* enum ll_sbi_flags */ unsigned int ll_xattr_cache_enabled:1, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 87cdc36..f8ecdcba 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -445,6 +445,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) 
sb->s_maxbytes = MAX_LFS_FILESIZE; sbi->ll_namelen = osfs->os_namelen; sbi->ll_mnt.mnt = current->fs->root.mnt; + sbi->ll_mnt_ns = current->nsproxy->mnt_ns; if (test_bit(LL_SBI_USER_XATTR, sbi->ll_flags) && !(data->ocd_connect_flags & OBD_CONNECT_XATTR)) { From patchwork Fri Jan 14 01:37:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4C75FC433EF for ; Fri, 14 Jan 2022 01:38:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4A8A521FF91; Thu, 13 Jan 2022 17:38:23 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A77BA21FFA9 for ; Thu, 13 Jan 2022 17:38:08 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id BC88D100F326; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B53E1A8102; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:46 -0500 Message-Id: <1642124283-10148-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/24] lustre: lov: Cache stripe offset calculation X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Calculating the page offset relative to the stripe (etc.) in a file is surprisingly expensive. Because i/o has already been split up by stripe in the cl_io code, recalculating the stripe for every page is unnecessary, so cache most of the values that require calculation. This significantly speeds up AIO/DIO page submission, improving performance by a bit over 10%. Also remove lps_layout_gen, which isn't doing anything useful. This suggests the possibility of removing lov_page, but that's for another patch.

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 119 ms/GiB
Read: 121 ms/GiB

mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write 7531 MiB/s
read 7179 MiB/s

Plus this patch:
write 8637 MiB/s
read 8488 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: 14db1faa0fbe813fe ("LU-13799 lov: Cache stripe offset calculation") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39445 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_cl_internal.h | 9 +++++-- fs/lustre/lov/lov_io.c | 6 +++++ fs/lustre/lov/lov_page.c | 57 +++++++++++++++++++++++++++++++---------- 3 files changed, 57 insertions(+), 15 deletions(-) diff --git a/fs/lustre/lov/lov_cl_internal.h b/fs/lustre/lov/lov_cl_internal.h index d48e2df3..42fd10a 100644 --- a/fs/lustre/lov/lov_cl_internal.h +++ b/fs/lustre/lov/lov_cl_internal.h @@ -453,8 +453,6 @@ struct lov_lock { struct lov_page { struct cl_page_slice lps_cl; - /* the layout gen when this page was created */ - u32 lps_layout_gen; }; /* @@ -524,6
+522,7 @@ struct lov_io_sub { /** * IO state private for LOV. */ +#define LIS_CACHE_ENTRY_NONE -ENOENT struct lov_io { /** super-class */ struct cl_io_slice lis_cl; @@ -590,6 +589,12 @@ struct lov_io { * All sub-io's created in this lov_io. */ struct list_head lis_subios; + /* Cached results from stripe & offset calculations for page init */ + int lis_cached_entry; + int lis_cached_stripe; + loff_t lis_cached_off; + loff_t lis_cached_suboff; + struct lov_io_sub *lis_cached_sub; }; struct lov_session { diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 8df13ee..904bafd 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -467,6 +467,7 @@ static int lov_io_slice_init(struct lov_io *lio, struct lov_object *obj, io->ci_result = 0; lio->lis_object = obj; + lio->lis_cached_entry = LIS_CACHE_ENTRY_NONE; switch (io->ci_type) { case CIT_READ: @@ -1053,6 +1054,11 @@ static void lov_io_end(const struct lu_env *env, const struct cl_io_slice *ios) { int rc; + /* Before ending each i/o, we must set lis_cached_entry to tell the + * next i/o not to use stale cached lis information. 
+ */ + cl2lov_io(env, ios)->lis_cached_entry = LIS_CACHE_ENTRY_NONE; + rc = lov_io_call(env, cl2lov_io(env, ios), lov_io_end_wrapper); LASSERT(rc == 0); } diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c index fdc415b..16bd7cd 100644 --- a/fs/lustre/lov/lov_page.c +++ b/fs/lustre/lov/lov_page.c @@ -56,8 +56,7 @@ static int lov_comp_page_print(const struct lu_env *env, struct lov_page *lp = cl2lov_page(slice); return (*printer)(env, cookie, - LUSTRE_LOV_NAME "-page@%p, gen: %u\n", - lp, lp->lps_layout_gen); + LUSTRE_LOV_NAME"-page@%p\n", lp); } static const struct cl_page_operations lov_comp_page_ops = { @@ -74,33 +73,65 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj, struct cl_object *o; struct lov_io_sub *sub; struct lov_page *lpg = cl_object_page_slice(obj, page); + bool stripe_cached = false; u64 offset; u64 suboff; - int stripe; int entry; + int stripe; int rc; + /* Direct i/o (CPT_TRANSIENT) is split strictly to stripes, so we can + * cache the stripe information. Buffered i/o is differently + * organized, and stripe calculation isn't a significant cost for + * buffered i/o, so we only cache this for direct i/o. 
+ */ + stripe_cached = lio->lis_cached_entry != LIS_CACHE_ENTRY_NONE && + page->cp_type == CPT_TRANSIENT; + offset = cl_offset(obj, index); - entry = lov_io_layout_at(lio, offset); + + if (stripe_cached) { + entry = lio->lis_cached_entry; + stripe = lio->lis_cached_stripe; + /* Offset can never go backwards in an i/o, so this is valid */ + suboff = lio->lis_cached_suboff + offset - lio->lis_cached_off; + } else { + entry = lov_io_layout_at(lio, offset); + + stripe = lov_stripe_number(loo->lo_lsm, entry, offset); + rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, + &suboff); + LASSERT(rc == 0); + lio->lis_cached_entry = entry; + lio->lis_cached_stripe = stripe; + lio->lis_cached_off = offset; + lio->lis_cached_suboff = suboff; + } + if (entry < 0 || !lsm_entry_inited(loo->lo_lsm, entry)) { /* non-existing layout component */ lov_page_init_empty(env, obj, page, index); return 0; } - r0 = lov_r0(loo, entry); - stripe = lov_stripe_number(loo->lo_lsm, entry, offset); - LASSERT(stripe < r0->lo_nr); - rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, &suboff); - LASSERT(rc == 0); + CDEBUG(D_PAGE, "offset %llu, entry %d, stripe %d, suboff %llu\n", + offset, entry, stripe, suboff); page->cp_lov_index = lov_comp_index(entry, stripe); - lpg->lps_layout_gen = loo->lo_lsm->lsm_layout_gen; cl_page_slice_add(page, &lpg->lps_cl, obj, &lov_comp_page_ops); - sub = lov_sub_get(env, lio, page->cp_lov_index); - if (IS_ERR(sub)) - return PTR_ERR(sub); + if (!stripe_cached) { + sub = lov_sub_get(env, lio, page->cp_lov_index); + if (IS_ERR(sub)) + return PTR_ERR(sub); + } else { + sub = lio->lis_cached_sub; + } + + lio->lis_cached_sub = sub; + + r0 = lov_r0(loo, entry); + LASSERT(stripe < r0->lo_nr); subobj = lovsub2cl(r0->lo_sub[stripe]); cl_object_for_each(o, subobj) { From patchwork Fri Jan 14 01:37:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713322 
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:47 -0500 Message-Id: <1642124283-10148-9-git-send-email-jsimmons@infradead.org> In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/24] lnet: o2iblnd: treat cmid->device == NULL as an error
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov Even if rdma_bind_addr is successful, kiblnd_dev_failover should treat cmid->device == NULL as an error in order to later avoid calling kiblnd_set_ni_fatal_on with possibly dev->ibd_hdev == NULL. Fixes: 5e07562bc3 ("lnet: o2iblnd: clear fatal error on successful failover") WC-bug-id: https://jira.whamcloud.com/browse/LU-15018 Lustre-commit: abd0ce62e96523193 ("LU-15018 o2iblnd: treat cmid->device == NULL as an error") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/44981 Reviewed-by: Chris Horn Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 7d28acd..76f5e7f 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2365,6 +2365,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) struct kib_net *net; struct sockaddr_in addr; struct net_device *netdev; + bool set_fatal = true; unsigned long flags; int rc = 0; int i; @@ -2416,6 +2417,8 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) CERROR("Failed to bind %s:%pI4h to device(%p): %d\n", dev->ibd_ifname, &dev->ibd_ifip, cmid->device, rc); + if (!rc && !cmid->device) + set_fatal = false; rdma_destroy_id(cmid); goto out; } @@ -2490,11 +2493,13 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns) } else { dev->ibd_failed_failover = 0; - rcu_read_lock(); - netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname); - if (netdev && (kiblnd_get_link_status(netdev) == 1)) - kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0); - rcu_read_unlock(); + if (set_fatal) { + rcu_read_lock(); + netdev = 
dev_get_by_name_rcu(ns, dev->ibd_ifname); + if (netdev && (kiblnd_get_link_status(netdev) == 1)) + kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0); + rcu_read_unlock(); + } } return rc; From patchwork Fri Jan 14 01:37:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6274BC433F5 for ; Fri, 14 Jan 2022 01:38:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB7693AD9A0; Thu, 13 Jan 2022 17:38:26 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 361583AD812 for ; Thu, 13 Jan 2022 17:38:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C2D08100F32C; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BE109DF4C4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:48 -0500 Message-Id: <1642124283-10148-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/24] lustre: lmv: set default LMV for "lfs mkdir -c 1" X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software 
development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao With the introduction of filesystem-wide default LMV, dirs will be created on MDT by space usage, but if dir is created by "lfs mkdir -c 1 ...", its subdirs should be kept on the same MDT. To achieve this, set default LMV on such dirs, NB if user doesn't want this, he needs to create dir with "lfs mkdir -c 1 --max-inherit=0 ...". The policy to choose MDT in mkdir is as below: 1. is "lfs mkdir -i N"? mkdir on MDT N. 2. is "lfs mkdir -i -1"? mkdir by space usage. 3. is starting MDT specified in default LMV? mkdir on MDT N. 4. is default LMV space balanced? mkdir by space usage. WC-bug-id: https://jira.whamcloud.com/browse/LU-14560 Lustre-commit: bc2d7f065af6b4f9a ("LU-13560 lod: set default LMV for "lfs mkdir -c 1") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/45290 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 104 +++++++++++++++++++++++++++--------------------- 1 file changed, 58 insertions(+), 46 deletions(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index c87f37f..55816a1 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1770,46 +1770,39 @@ int lmv_old_layout_lookup(struct lmv_obd *lmv, struct md_op_data *op_data) return rc; } +/* mkdir by QoS upon 'lfs mkdir -i -1'. + * + * NB, mkdir by QoS only if parent is not striped, this is to avoid remote + * directories under striped directory. 
+ */ static inline bool lmv_op_user_qos_mkdir(const struct md_op_data *op_data) { const struct lmv_user_md *lum = op_data->op_data; + if (op_data->op_code != LUSTRE_OPC_MKDIR) + return false; + + if (lmv_dir_striped(op_data->op_mea1)) + return false; + return (op_data->op_cli_flags & CLI_SET_MEA) && lum && le32_to_cpu(lum->lum_magic) == LMV_USER_MAGIC && le32_to_cpu(lum->lum_stripe_offset) == LMV_OFFSET_DEFAULT; } +/* mkdir by QoS if either ROOT or parent default LMV is space balanced. */ static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data) { const struct lmv_stripe_md *lsm = op_data->op_default_mea1; - return (op_data->op_flags & MF_QOS_MKDIR) || - (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT); -} - -/* mkdir by QoS in three cases: - * 1. ROOT default LMV is space balanced. - * 2. 'lfs mkdir -i -1' - * 3. parent default LMV master_mdt_index is -1 - * - * NB, mkdir by QoS only if parent is not striped, this is to avoid remote - * directories under striped directory. 
- */ -static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data) -{ if (op_data->op_code != LUSTRE_OPC_MKDIR) return false; if (lmv_dir_striped(op_data->op_mea1)) return false; - if (lmv_op_user_qos_mkdir(op_data)) - return true; - - if (lmv_op_default_qos_mkdir(op_data)) - return true; - - return false; + return (op_data->op_flags & MF_QOS_MKDIR) || + (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT); } /* if parent default LMV is space balanced, and @@ -1853,6 +1846,38 @@ static inline bool lmv_op_user_specific_mkdir(const struct md_op_data *op_data) LMV_OFFSET_DEFAULT; } +/* locate MDT by space usage */ +static struct lu_tgt_desc *lmv_locate_tgt_by_space(struct lmv_obd *lmv, + struct md_op_data *op_data, + struct lmv_tgt_desc *tgt) +{ + struct lmv_tgt_desc *tmp = tgt; + + tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds, op_data->op_dir_depth); + if (tgt == ERR_PTR(-EAGAIN)) { + if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) && + !lmv_op_default_rr_mkdir(op_data) && + !lmv_op_user_qos_mkdir(op_data)) + /* if not necessary, don't create remote directory. */ + tgt = tmp; + else + tgt = lmv_locate_tgt_rr(lmv); + } + + /* + * only update statfs after QoS mkdir, this means the cached statfs may + * be stale, and current mkdir may not follow QoS accurately, but it's + * not serious, and avoids periodic statfs when client doesn't mkdir by + * QoS. + */ + if (!IS_ERR(tgt)) { + op_data->op_mds = tgt->ltd_index; + lmv_statfs_check_update(lmv2obd_dev(lmv), tgt); + } + + return tgt; +} + int lmv_create(struct obd_export *exp, struct md_op_data *op_data, const void *data, size_t datalen, umode_t mode, uid_t uid, gid_t gid, kernel_cap_t cap_effective, u64 rdev, @@ -1886,6 +1911,12 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, if (IS_ERR(tgt)) return PTR_ERR(tgt); + /* the order to apply policy in mkdir: + * 1. is "lfs mkdir -i N"? mkdir on MDT N. + * 2. is "lfs mkdir -i -1"? mkdir by space usage. + * 3. 
is starting MDT specified in default LMV? mkdir on MDT N. + * 4. is default LMV space balanced? mkdir by space usage. + */ if (lmv_op_user_specific_mkdir(op_data)) { struct lmv_user_md *lum = op_data->op_data; @@ -1893,39 +1924,20 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, tgt = lmv_tgt(lmv, op_data->op_mds); if (!tgt) return -ENODEV; + } else if (lmv_op_user_qos_mkdir(op_data)) { + tgt = lmv_locate_tgt_by_space(lmv, op_data, tgt); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); } else if (lmv_op_default_specific_mkdir(op_data)) { op_data->op_mds = op_data->op_default_mea1->lsm_md_master_mdt_index; tgt = lmv_tgt(lmv, op_data->op_mds); if (!tgt) return -ENODEV; - } else if (lmv_op_qos_mkdir(op_data)) { - struct lmv_tgt_desc *tmp = tgt; - - tgt = lmv_locate_tgt_qos(lmv, op_data->op_mds, - op_data->op_dir_depth); - if (tgt == ERR_PTR(-EAGAIN)) { - if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) && - !lmv_op_default_rr_mkdir(op_data) && - !lmv_op_user_qos_mkdir(op_data)) - /* if it's not necessary, don't create remote - * directory. - */ - tgt = tmp; - else - tgt = lmv_locate_tgt_rr(lmv); - } + } else if (lmv_op_default_qos_mkdir(op_data)) { + tgt = lmv_locate_tgt_by_space(lmv, op_data, tgt); if (IS_ERR(tgt)) return PTR_ERR(tgt); - - op_data->op_mds = tgt->ltd_index; - /* - * only update statfs after QoS mkdir, this means the cached - * statfs may be stale, and current mkdir may not follow QoS - * accurately, but it's not serious, and avoids periodic statfs - * when client doesn't mkdir by QoS. 
*/ - lmv_statfs_check_update(obd, tgt); } retry: From patchwork Fri Jan 14 01:37:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713323 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:49 -0500 Message-Id: <1642124283-10148-11-git-send-email-jsimmons@infradead.org> In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/24] lnet: socklnd: decrement connection counters on close
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov To gracefully handle potential race with delayed connection create, decrement connection counters per type as connections are being closed. Fixes: 511ace4a ("lnet: socklnd: add conns_per_peer parameter") WC-bug-id: https://jira.whamcloud.com/browse/LU-15137 Lustre-commit: 7e26413aa85fdc931 ("LU-15137 socklnd: decrement connection counters on close") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/45422 Reviewed-by: Amir Shehata Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 69 ++++++++++++++++++++++++++++++++++------ 1 file changed, 60 insertions(+), 9 deletions(-) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index b014aa8..6d1f85c 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -422,7 +422,9 @@ struct ksock_peer_ni * switch (type) { case SOCKLND_CONN_CONTROL: conn_cb->ksnr_ctrl_conn_count++; - /* there's a single control connection per peer */ + /* there's a single control connection per peer, + * two in case of loopback + */ conn_cb->ksnr_connected |= BIT(type); break; case SOCKLND_CONN_BULK_IN: @@ -449,6 +451,45 @@ struct ksock_peer_ni * } static void +ksocknal_decr_conn_count(struct ksock_conn_cb *conn_cb, + int type) +{ + conn_cb->ksnr_conn_count--; + + /* check if all connections of the given type got created */ + switch (type) { + case SOCKLND_CONN_CONTROL: + conn_cb->ksnr_ctrl_conn_count--; + /* there's a single control connection per peer, + * two in case of loopback + */ + if (conn_cb->ksnr_ctrl_conn_count == 0) + conn_cb->ksnr_connected &= ~BIT(type); + break; + case SOCKLND_CONN_BULK_IN: + conn_cb->ksnr_blki_conn_count--; + if 
(conn_cb->ksnr_blki_conn_count < conn_cb->ksnr_max_conns) + conn_cb->ksnr_connected &= ~BIT(type); + break; + case SOCKLND_CONN_BULK_OUT: + conn_cb->ksnr_blko_conn_count--; + if (conn_cb->ksnr_blko_conn_count < conn_cb->ksnr_max_conns) + conn_cb->ksnr_connected &= ~BIT(type); + break; + case SOCKLND_CONN_ANY: + if (conn_cb->ksnr_conn_count < conn_cb->ksnr_max_conns) + conn_cb->ksnr_connected &= ~BIT(type); + break; + default: + LBUG(); + break; + } + + CDEBUG(D_NET, "Del conn type %d, ksnr_connected %x ksnr_max_conns %d\n", + type, conn_cb->ksnr_connected, conn_cb->ksnr_max_conns); +} + +static void ksocknal_associate_cb_conn_locked(struct ksock_conn_cb *conn_cb, struct ksock_conn *conn) { @@ -1249,6 +1290,8 @@ struct ksock_peer_ni * struct ksock_peer_ni *peer_ni = conn->ksnc_peer; struct ksock_conn_cb *conn_cb; struct ksock_conn *conn2; + int conn_count; + int duplicate_count = 0; LASSERT(!peer_ni->ksnp_error); LASSERT(!conn->ksnc_closing); @@ -1262,21 +1305,29 @@ struct ksock_peer_ni * /* dissociate conn from cb... 
*/ LASSERT(!conn_cb->ksnr_deleted); + conn_count = ksocknal_get_conn_count_by_type(conn_cb, + conn->ksnc_type); /* connected bit is set only if all connections * of the given type got created */ - if (ksocknal_get_conn_count_by_type(conn_cb, conn->ksnc_type) == - conn_cb->ksnr_max_conns) + if (conn_count == conn_cb->ksnr_max_conns) LASSERT((conn_cb->ksnr_connected & BIT(conn->ksnc_type)) != 0); - list_for_each_entry(conn2, &peer_ni->ksnp_conns, ksnc_list) { - if (conn2->ksnc_conn_cb == conn_cb && - conn2->ksnc_type == conn->ksnc_type) - goto conn2_found; + if (conn_count == 1) { + list_for_each_entry(conn2, &peer_ni->ksnp_conns, + ksnc_list) { + if (conn2->ksnc_conn_cb == conn_cb && + conn2->ksnc_type == conn->ksnc_type) + duplicate_count += 1; + } + if (duplicate_count > 0) + CERROR("Found %d duplicate conns type %d\n", + duplicate_count, + conn->ksnc_type); } - conn_cb->ksnr_connected &= ~BIT(conn->ksnc_type); -conn2_found: + ksocknal_decr_conn_count(conn_cb, conn->ksnc_type); + conn->ksnc_conn_cb = NULL; /* drop conn's ref on route */ From patchwork Fri Jan 14 01:37:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713310 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 58D12C433F5 for ; Fri, 14 Jan 2022 01:38:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4DFE43B60F6; Thu, 13 Jan 2022 17:38:13 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CAC943B47A4 for ; Thu, 13 Jan 
2022 17:38:09 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C8EAD100F32E; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C7340E07E4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:50 -0500 Message-Id: <1642124283-10148-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/24] lustre: lmv: improve MDT QOS space balance X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao When MDTs are not balanced, the QOS code tries to keep subdirectory creation local to the parent MDT when the directory is deep in the tree, to avoid creating too many remote directories. However, the existing weighting, which keeps new directories on the parent MDT until it drops to 50% of the other MDTs, is too aggressive and causes mkdirs to get "stuck" on the same MDT. * Remove "lq_threshold_rr" from the above calculation, because the check in ltd_qos_is_usable() already handles it, and use only "dir_depth". * Change the factor to "16 / (dir_depth + 10)", so that top-level directories are less likely to stay on the parent MDT, while low-level directories are more likely to stay there. The resulting threshold, relative to the average available space per MDT, is: depth=0 -> 160%, depth=4 -> 114%, depth=6 -> 100%, depth=8 -> 88%, depth=12 -> 72%. * Rename lli_depth to lli_dir_depth to make its usage clearer.
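To illustrate the new factor, here is a hypothetical user-space model of the "stay on the parent MDT" check; the function names below are invented for this sketch and are not Lustre APIs. It assumes, as the comment in lmv_locate_tgt_qos() suggests, that total_avail / total_usable is the average available space per usable MDT, and it mirrors the integer arithmetic of the new "rand" computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not Lustre code) of the threshold computed in
 * lmv_locate_tgt_qos() after this patch:
 *   rand = total_avail * 16 / (total_usable * (dir_depth + 10))
 * i.e. the average available space scaled by 16 / (dir_depth + 10).
 */
static uint64_t qos_stay_threshold(uint64_t total_avail, uint32_t total_usable,
				   uint32_t dir_depth)
{
	assert(total_usable > 0);
	return total_avail * 16 / ((uint64_t)total_usable * (dir_depth + 10));
}

/* Keep the new directory on the parent MDT if it has enough space. */
static bool qos_stay_on_parent(uint64_t parent_avail, uint64_t total_avail,
			       uint32_t total_usable, uint32_t dir_depth)
{
	return parent_avail >= qos_stay_threshold(total_avail, total_usable,
						  dir_depth);
}

/* Factor as a truncated percentage of the average: 160, 114, 100, 88, 72 */
static unsigned int qos_factor_percent(uint32_t dir_depth)
{
	return 1600 / (dir_depth + 10);
}
```

For example, with 4 usable MDTs and total_avail = 4000 (an average of 1000), the parent MDT needs at least 1600 to keep a depth-0 directory local, but only 727 at depth 12.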
WC-bug-id: https://jira.whamcloud.com/browse/LU-15216 Lustre-commit: 38c4c538f53fb5f0c ("LU-15216 lmv: improve MDT QOS space balance") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/45544 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 2 +- fs/lustre/llite/llite_internal.h | 2 +- fs/lustre/llite/llite_lib.c | 6 +++--- fs/lustre/llite/namei.c | 6 +++--- fs/lustre/lmv/lmv_obd.c | 7 ++++--- 5 files changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index f3f1ce7..43cd3cc 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -480,7 +480,7 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, if (IS_ERR(op_data)) return PTR_ERR(op_data); - op_data->op_dir_depth = ll_i2info(parent)->lli_depth; + op_data->op_dir_depth = ll_i2info(parent)->lli_dir_depth; if (ll_sbi_has_encrypt(sbi) && (IS_ENCRYPTED(parent) || diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index a2abec6..0398b5f 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -184,7 +184,7 @@ struct ll_inode_info { */ pid_t lli_opendir_pid; /* directory depth to ROOT */ - unsigned short lli_depth; + unsigned short lli_dir_depth; /* stat will try to access statahead entries or start * statahead if this flag is set, and this flag will be * set upon dir open, and cleared when dir is closed, diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index f8ecdcba..e3e871d 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2609,9 +2609,9 @@ void ll_update_dir_depth(struct inode *dir, struct inode *inode) return; lli = ll_i2info(inode); - lli->lli_depth = ll_i2info(dir)->lli_depth + 1; - CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid), - lli->lli_depth); + lli->lli_dir_depth = ll_i2info(dir)->lli_dir_depth + 1; + CDEBUG(D_INODE, DFID" 
depth %hu\n", + PFID(&lli->lli_fid), lli->lli_dir_depth); } void ll_truncate_inode_pages_final(struct inode *inode) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index d46a30f..0683614 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -1493,7 +1493,7 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir) struct ll_inode_info *lli = ll_i2info(dir); struct lmv_stripe_md *lsm; - op_data->op_dir_depth = lli->lli_depth; + op_data->op_dir_depth = lli->lli_dir_depth; /* parent directory is striped */ if (unlikely(lli->lli_lsm_md)) @@ -1522,11 +1522,11 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir) if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE && (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED || - lsm->lsm_md_max_inherit >= lli->lli_depth)) { + lsm->lsm_md_max_inherit >= lli->lli_dir_depth)) { op_data->op_flags |= MF_QOS_MKDIR; if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE && (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED || - lsm->lsm_md_max_inherit_rr >= lli->lli_depth)) + lsm->lsm_md_max_inherit_rr >= lli->lli_dir_depth)) op_data->op_flags |= MF_RR_MKDIR; CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n", PFID(&lli->lli_fid), op_data->op_flags); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 55816a1..3e050b7 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1471,10 +1471,11 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 mdt, /* if current MDT has above-average space, within range of the QOS * threshold, stay on the same MDT to avoid creating needless remote - * MDT directories. It's more likely for low level directories. + * MDT directories. It's more likely for low level directories + * "16 / (dir_depth + 10)" is the factor to make it more unlikely for + * top level directories, while more likely for low levels. 
*/ - rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) / - (total_usable * 256 * (1 + dir_depth / 4)); + rand = total_avail * 16 / (total_usable * (dir_depth + 10)); if (cur && cur->ltd_qos.ltq_avail >= rand) { tgt = cur; goto unlock; From patchwork Fri Jan 14 01:37:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713325 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:51 -0500 Message-Id: <1642124283-10148-13-git-send-email-jsimmons@infradead.org> In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/24] lustre: llite: access striped directory with missing stripe
X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao This patch allows accessing striped directory with missing stripes: * lmv_revalidate_slave() skip error if one stripe returns -ESHUTDOWN. * add ll_dir_flush(), which will return error found in reading stripe dir pages, thus 'ls' can list dirents on other stripes, and return an error in the end. WC-bug-id: https://jira.whamcloud.com/browse/LU-9206 Lustre-commit: c0fa6f7a10d1162f8 ("LU-9206 llite: access striped directory with missing stripe") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/45631 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 9 ++++++--- fs/lustre/include/obd_class.h | 7 +++---- fs/lustre/llite/dir.c | 43 ++++++++++++++++++++++++++++++---------- fs/lustre/llite/llite_internal.h | 8 ++++++-- fs/lustre/llite/llite_nfs.c | 2 +- fs/lustre/llite/statahead.c | 6 +++--- fs/lustre/lmv/lmv_intent.c | 4 ++-- fs/lustre/lmv/lmv_obd.c | 22 ++++++++++---------- fs/lustre/mdc/mdc_request.c | 7 +++---- 9 files changed, 69 insertions(+), 39 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index f6b9d16..ecee321 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -826,10 +826,12 @@ struct md_op_data { u32 op_archive_id; }; -struct md_callback { - int (*md_blocking_ast)(struct ldlm_lock *lock, +struct md_readdir_info { + int (*mr_blocking_ast)(struct ldlm_lock *lock, struct ldlm_lock_desc *desc, void *data, int flag); + /* if striped directory is partially read, the result is stored here */ + int mr_partial_readdir_rc; }; struct md_enqueue_info; @@ -1028,8 +1030,9 @@ struct md_ops { int (*fsync)(struct 
obd_export *, const struct lu_fid *, struct ptlrpc_request **); int (*read_page)(struct obd_export *, struct md_op_data *, - struct md_callback *cb_op, u64 hash_offset, + struct md_readdir_info *mrinfo, u64 hash_offset, struct page **ppage); + int (*unlink)(struct obd_export *, struct md_op_data *, struct ptlrpc_request **); diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index f2a3d2b..b69331d 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1399,9 +1399,8 @@ static inline int md_file_resync(struct obd_export *exp, static inline int md_read_page(struct obd_export *exp, struct md_op_data *op_data, - struct md_callback *cb_op, - u64 hash_offset, - struct page **ppage) + struct md_readdir_info *mrinfo, + u64 hash_offset, struct page **ppage) { int rc; @@ -1412,7 +1411,7 @@ static inline int md_read_page(struct obd_export *exp, lprocfs_counter_incr(exp->exp_obd->obd_md_stats, LPROC_MD_READ_PAGE); - return MDP(exp->exp_obd, read_page)(exp, op_data, cb_op, hash_offset, + return MDP(exp->exp_obd, read_page)(exp, op_data, mrinfo, hash_offset, ppage); } diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 43cd3cc..b4870d9 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -140,17 +140,21 @@ * */ struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data, - u64 offset) + u64 offset, int *partial_readdir_rc) { - struct md_callback cb_op; + struct md_readdir_info mrinfo = { + .mr_blocking_ast = ll_md_blocking_ast + }; struct page *page; int rc; - cb_op.md_blocking_ast = ll_md_blocking_ast; - rc = md_read_page(ll_i2mdexp(dir), op_data, &cb_op, offset, &page); + rc = md_read_page(ll_i2mdexp(dir), op_data, &mrinfo, offset, &page); if (rc) return ERR_PTR(rc); + if (partial_readdir_rc && mrinfo.mr_partial_readdir_rc) + *partial_readdir_rc = mrinfo.mr_partial_readdir_rc; + return page; } @@ -177,7 +181,7 @@ void ll_release_page(struct inode *inode, struct page *page, bool remove) 
} int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data, - struct dir_context *ctx) + struct dir_context *ctx, int *partial_readdir_rc) { struct ll_sb_info *sbi = ll_i2sbi(inode); u64 pos = *ppos; @@ -194,7 +198,7 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data, return rc; } - page = ll_get_dir_page(inode, op_data, pos); + page = ll_get_dir_page(inode, op_data, pos, partial_readdir_rc); while (rc == 0 && !done) { struct lu_dirpage *dp; @@ -285,7 +289,8 @@ int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data, le32_to_cpu(dp->ldp_flags) & LDF_COLLIDE); next = pos; - page = ll_get_dir_page(inode, op_data, pos); + page = ll_get_dir_page(inode, op_data, pos, + partial_readdir_rc); } } @@ -305,8 +310,13 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) struct md_op_data *op_data; struct lu_fid pfid = { 0 }; ktime_t kstart = ktime_get(); + /* result of possible partial readdir */ + int partial_readdir_rc = 0; int rc; + LASSERT(lfd); + pos = lfd->lfd_pos; + CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p) pos/size %lu/%llu 32bit_api %d\n", PFID(ll_inode2fid(inode)), inode, (unsigned long)pos, @@ -369,10 +379,11 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) op_data->op_fid3 = pfid; ctx->pos = pos; - rc = ll_dir_read(inode, &pos, op_data, ctx); + rc = ll_dir_read(inode, &pos, op_data, ctx, &partial_readdir_rc); pos = ctx->pos; - if (lfd) - lfd->lfd_pos = pos; + lfd->lfd_pos = pos; + if (!lfd->fd_partial_readdir_rc) + lfd->fd_partial_readdir_rc = partial_readdir_rc; if (pos == MDS_DIR_END_OFF) { if (api32) @@ -2294,6 +2305,17 @@ static int ll_dir_release(struct inode *inode, struct file *file) return ll_file_release(inode, file); } +/* notify error if partially read striped directory */ +static int ll_dir_flush(struct file *file, fl_owner_t id) +{ + struct ll_file_data *lfd = file->private_data; + int rc = lfd->fd_partial_readdir_rc; + + lfd->fd_partial_readdir_rc = 
0; + + return rc; +} + const struct file_operations ll_dir_operations = { .llseek = ll_dir_seek, .open = ll_dir_open, @@ -2302,4 +2324,5 @@ static int ll_dir_release(struct inode *inode, struct file *file) .iterate_shared = ll_readdir, .unlocked_ioctl = ll_dir_ioctl, .fsync = ll_fsync, + .flush = ll_dir_flush, }; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 0398b5f..54f0218 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -920,6 +920,10 @@ struct ll_file_data { */ u32 fd_layout_version; struct pcc_file fd_pcc_file; + /* striped directory may read partially if some stripe inaccessible, + * -errno is saved here, and will return to user in close(). + */ + int fd_partial_readdir_rc; }; void llite_tunables_unregister(void); @@ -1043,11 +1047,11 @@ enum { extern const struct file_operations ll_dir_operations; extern const struct inode_operations ll_dir_inode_operations; int ll_dir_read(struct inode *inode, u64 *ppos, struct md_op_data *op_data, - struct dir_context *ctx); + struct dir_context *ctx, int *partial_readdir_rc); int ll_get_mdt_idx(struct inode *inode); int ll_get_mdt_idx_by_fid(struct ll_sb_info *sbi, const struct lu_fid *fid); struct page *ll_get_dir_page(struct inode *dir, struct md_op_data *op_data, - u64 offset); + u64 offset, int *partial_readdir_rc); void ll_release_page(struct inode *inode, struct page *page, bool remove); int quotactl_ioctl(struct super_block *sb, struct if_quotactl *qctl); diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index 07fcad6..3c4c9ef 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -233,7 +233,7 @@ static int ll_get_name(struct dentry *dentry, char *name, } inode_lock(dir); - rc = ll_dir_read(dir, &pos, op_data, &lgd.ctx); + rc = ll_dir_read(dir, &pos, op_data, &lgd.ctx, NULL); inode_unlock(dir); ll_finish_md_op_data(op_data); if (!rc && !lgd.lgd_found) diff --git a/fs/lustre/llite/statahead.c 
b/fs/lustre/llite/statahead.c index afb668e..c781e49 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1041,7 +1041,7 @@ static int ll_statahead_thread(void *arg) } sai->sai_in_readpage = 1; - page = ll_get_dir_page(dir, op_data, pos); + page = ll_get_dir_page(dir, op_data, pos, NULL); ll_unlock_md_op_lsm(op_data); sai->sai_in_readpage = 0; if (IS_ERR(page)) { @@ -1325,7 +1325,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry) /** * FIXME choose the start offset of the readdir */ - page = ll_get_dir_page(dir, op_data, pos); + page = ll_get_dir_page(dir, op_data, pos, NULL); while (1) { struct lu_dirpage *dp; @@ -1429,7 +1429,7 @@ static int is_first_dirent(struct inode *dir, struct dentry *dentry) ll_release_page(dir, page, le32_to_cpu(dp->ldp_flags) & LDF_COLLIDE); - page = ll_get_dir_page(dir, op_data, pos); + page = ll_get_dir_page(dir, op_data, pos, NULL); } } out: diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 906ca16..2322b6a 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -222,8 +222,8 @@ int lmv_revalidate_slaves(struct obd_export *exp, rc = md_intent_lock(tgt->ltd_exp, op_data, &it, &req, cb_blocking, extra_lock_flags); - if (rc == -ENOENT) { - /* skip stripe is not exists */ + if (rc == -ENOENT || rc == -ESHUTDOWN) { + /* skip stripe that doesn't exist or is inaccessible */ rc = 0; continue; } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 3e050b7..5fd00d3 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -2574,7 +2574,7 @@ struct stripe_dirent { struct lmv_dir_ctxt { struct lmv_obd *ldc_lmv; struct md_op_data *ldc_op_data; - struct md_callback *ldc_cb_op; + struct md_readdir_info *ldc_mrinfo; u64 ldc_hash; int ldc_count; struct stripe_dirent ldc_stripes[0]; @@ -2675,7 +2675,7 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, op_data->op_fid2 = oinfo->lmo_fid; op_data->op_data = 
oinfo->lmo_root; - rc = md_read_page(tgt->ltd_exp, op_data, ctxt->ldc_cb_op, hash, + rc = md_read_page(tgt->ltd_exp, op_data, ctxt->ldc_mrinfo, hash, &stripe->sd_page); op_data->op_fid1 = fid; @@ -2696,6 +2696,7 @@ static struct lu_dirent *stripe_dirent_load(struct lmv_dir_ctxt *ctxt, LASSERT(!ent); /* treat error as eof, so dir can be partially accessed */ stripe->sd_eof = true; + ctxt->ldc_mrinfo->mr_partial_readdir_rc = rc; LCONSOLE_WARN("dir " DFID " stripe %d readdir failed: %d, directory is partially accessed!\n", PFID(&ctxt->ldc_op_data->op_fid1), stripe_index, rc); @@ -2793,7 +2794,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) * * @exp: obd export refer to LMV * @op_data: hold those MD parameters of read_entry - * @cb_op: ldlm callback being used in enqueue in mdc_read_entry + * @mrinfo: ldlm callback being used in enqueue in mdc_read_entry, + * and partial readdir results will be stored in it. * @offset: the entry being read * @ppage: the page holding the entry. 
Note: because the entry * will be accessed in upper layer, so we need hold the @@ -2805,8 +2807,8 @@ static struct lu_dirent *lmv_dirent_next(struct lmv_dir_ctxt *ctxt) */ static int lmv_striped_read_page(struct obd_export *exp, struct md_op_data *op_data, - struct md_callback *cb_op, - u64 offset, struct page **ppage) + struct md_readdir_info *mrinfo, u64 offset, + struct page **ppage) { struct page *page = NULL; struct lu_dirpage *dp; @@ -2848,7 +2850,7 @@ static int lmv_striped_read_page(struct obd_export *exp, } ctxt->ldc_lmv = &exp->exp_obd->u.lmv; ctxt->ldc_op_data = op_data; - ctxt->ldc_cb_op = cb_op; + ctxt->ldc_mrinfo = mrinfo; ctxt->ldc_hash = offset; ctxt->ldc_count = stripe_count; @@ -2925,7 +2927,7 @@ static int lmv_striped_read_page(struct obd_export *exp, } static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, - struct md_callback *cb_op, u64 offset, + struct md_readdir_info *mrinfo, u64 offset, struct page **ppage) { struct obd_device *obd = exp->exp_obd; @@ -2936,15 +2938,15 @@ static int lmv_read_page(struct obd_export *exp, struct md_op_data *op_data, return -ENODATA; if (unlikely(lmv_dir_striped(op_data->op_mea1))) { - return lmv_striped_read_page(exp, op_data, cb_op, - offset, ppage); + return lmv_striped_read_page(exp, op_data, mrinfo, offset, + ppage); } tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); if (IS_ERR(tgt)) return PTR_ERR(tgt); - return md_read_page(tgt->ltd_exp, op_data, cb_op, offset, ppage); + return md_read_page(tgt->ltd_exp, op_data, mrinfo, offset, ppage); } /** diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 9788bd3..3284c01 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1294,7 +1294,6 @@ struct readpage_param { u64 rp_off; int rp_hash64; struct obd_export *rp_exp; - struct md_callback *rp_cb; }; /** @@ -1410,7 +1409,7 @@ static int mdc_read_page_remote(void *data, struct page *page0) * @exp: MDC export * @op_data: client MD stack parameters, 
transferring parameters * between different layers on client MD stack. - * @cb_op: callback required for ldlm lock enqueue during + * @mrinfo: callback required for ldlm lock enqueue during * read page * @hash_offset: the hash offset of the page to be read * @ppage the page to be read @@ -1419,7 +1418,7 @@ static int mdc_read_page_remote(void *data, struct page *page0) * errno(<0) get the page failed */ static int mdc_read_page(struct obd_export *exp, struct md_op_data *op_data, - struct md_callback *cb_op, u64 hash_offset, + struct md_readdir_info *mrinfo, u64 hash_offset, struct page **ppage) { struct lookup_intent it = { .it_op = IT_READDIR }; @@ -1440,7 +1439,7 @@ static int mdc_read_page(struct obd_export *exp, struct md_op_data *op_data, mapping = dir->i_mapping; rc = mdc_intent_lock(exp, op_data, &it, &enq_req, - cb_op->md_blocking_ast, 0); + mrinfo->mr_blocking_ast, 0); if (enq_req) ptlrpc_req_finished(enq_req); From patchwork Fri Jan 14 01:37:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713314 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5E2A5C433EF for ; Fri, 14 Jan 2022 01:38:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ADEE73AD84D; Thu, 13 Jan 2022 17:38:19 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6DAF73AD84A for ; Thu, 13 Jan 2022 17:38:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov 
(Postfix) with ESMTP id D3DD6100F332; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D03A7A8103; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:52 -0500 Message-Id: <1642124283-10148-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/24] lnet: libcfs: Remove D_TTY X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The D_TTY flag is almost entirely unused and certainly not needed. Remove it so we have a spare flag to use for iotrace. 
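For context, here is a minimal sketch (not libcfs code) of why a debug flag is a scarce resource. Each D_* flag is one bit of a 32-bit mask, from D_TRACE (0x00000001) up to D_LAYOUT (0x80000000). The user-visible name at index i of LIBCFS_DEBUG_MASKS_NAMES describes bit 1 << i. Dropping D_TTY therefore frees bit 0x00000008 together with its "tty" name slot. The table copies only the first 25 names that appear in this series. `debug_names` and `debug_flag_name()` are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical copy of the first 25 entries of LIBCFS_DEBUG_MASKS_NAMES
 * (before PATCH 14). Index i names the flag bit (1u << i); the real table
 * continues further, up to D_LAYOUT (0x80000000).
 */
static const char *const debug_names[] = {
	"trace", "inode", "super", "tty", "malloc", "cache", "info",
	"ioctl", "neterror", "net", "warning", "buffs", "other",
	"dentry", "nettrace", "page", "dlmtrace", "error", "emerg",
	"ha", "rpctrace", "vfstrace", "reada", "mmap", "config",
};

#define N_DEBUG_NAMES (sizeof(debug_names) / sizeof(debug_names[0]))

/* Return the name of a single-bit flag, or NULL if it is outside the table. */
static const char *debug_flag_name(uint32_t flag)
{
	unsigned int bit = 0;

	/* Exactly one bit must be set. */
	assert(flag != 0 && (flag & (flag - 1)) == 0);

	/* Find the bit position by shifting until the set bit reaches bit 0. */
	while (!(flag & 1u)) {
		flag >>= 1;
		bit++;
	}
	return bit < N_DEBUG_NAMES ? debug_names[bit] : NULL;
}
```

With this table, debug_flag_name(0x00000008) yields "tty". PATCH 14 reuses both that bit and that name slot for "iotrace".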
WC-bug-id: https://jira.whamcloud.com/browse/LU-15137 Lustre-commit: f9fe2977d184fbc8e ("LU-15317 libcfs: Remove D_TTY") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45751 Reviewed-by: Andreas Dilger Reviewed-by: Sebastien Buisson Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 2 +- include/uapi/linux/lnet/libcfs_debug.h | 1 - net/lnet/libcfs/tracefile.c | 51 +--------------------------------- 3 files changed, 2 insertions(+), 52 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 30e99c0..05f2f1a 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -4883,7 +4883,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, */ if (!(fd->fd_flags & LL_FILE_FLOCK_WARNING)) { fd->fd_flags |= LL_FILE_FLOCK_WARNING; - CDEBUG_LIMIT(D_TTY | D_CONSOLE, + CDEBUG_LIMIT(D_CONSOLE, "flock disabled, mount with '-o [local]flock' to enable\r\n"); } return -EINVAL; diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h index 6b64f0e..4cb6594 100644 --- a/include/uapi/linux/lnet/libcfs_debug.h +++ b/include/uapi/linux/lnet/libcfs_debug.h @@ -106,7 +106,6 @@ struct ptldebug_header { #define D_TRACE 0x00000001 /* ENTRY/EXIT markers */ #define D_INODE 0x00000002 #define D_SUPER 0x00000004 -#define D_TTY 0x00000008 /* notification printed to TTY */ #define D_MALLOC 0x00000010 /* print malloc, free information */ #define D_CACHE 0x00000020 /* cache-related items */ #define D_INFO 0x00000040 /* general information */ diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index b27732a..948eaaa 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -44,7 +44,6 @@ #include #include #include -#include #include #include "tracefile.h" @@ -352,41 +351,6 @@ static void cfs_set_ptldebug_header(struct ptldebug_header *header, header->ph_extern_pid = 0; } -/** - * 
tty_write_msg - write a message to a certain tty, not just the console. - * @tty: the destination tty_struct - * @msg: the message to write - * - * tty_write_message is not exported, so write a same function for it - * - */ -static void tty_write_msg(struct tty_struct *tty, const char *msg) -{ - mutex_lock(&tty->atomic_write_lock); - tty_lock(tty); - if (tty->ops->write && tty->count > 0) - tty->ops->write(tty, msg, strlen(msg)); - tty_unlock(tty); - mutex_unlock(&tty->atomic_write_lock); - wake_up_interruptible_poll(&tty->write_wait, POLLOUT); -} - -static void cfs_tty_write_message(const char *prefix, int mask, const char *msg) -{ - struct tty_struct *tty; - - tty = get_current_tty(); - if (!tty) - return; - - tty_write_msg(tty, prefix); - if ((mask & D_EMERG) || (mask & D_ERROR)) - tty_write_msg(tty, "Error"); - tty_write_msg(tty, ": "); - tty_write_msg(tty, msg); - tty_kref_put(tty); -} - static void cfs_vprint_to_console(struct ptldebug_header *hdr, int mask, struct va_format *vaf, const char *file, const char *fn) @@ -421,10 +385,6 @@ static void cfs_vprint_to_console(struct ptldebug_header *hdr, int mask, else if (mask & (D_CONSOLE | libcfs_printk)) pr_info("%s: %pV", prefix, vaf); } - - if (mask & D_TTY) - /* tty_write_msg doesn't handle formatting */ - cfs_tty_write_message(prefix, mask, vaf->fmt); } static void cfs_print_to_console(struct ptldebug_header *hdr, int mask, @@ -534,14 +494,6 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, if (*(string_buf + needed - 1) != '\n') { pr_info("Lustre: format at %s:%d:%s doesn't end in newline\n", file, msgdata->msg_line, msgdata->msg_fn); - } else if (mask & D_TTY) { - /* TTY needs '\r\n' to move carriage to leftmost position */ - if (needed < 2 || *(string_buf + needed - 2) != '\r') - pr_info("Lustre: format at %s:%d:%s doesn't end in '\\r\\n'\n", - file, msgdata->msg_line, msgdata->msg_fn); - if (strnchr(string_buf, needed, '%')) - pr_info("Lustre: format at %s:%d:%s mustn't contain %%\n", - file, 
msgdata->msg_line, msgdata->msg_fn); } header.ph_len = known_size + needed; @@ -627,8 +579,7 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, } if (cdls && cdls->cdls_count) { - /* Do not allow print this to TTY */ - cfs_print_to_console(&header, mask & ~D_TTY, file, + cfs_print_to_console(&header, mask, file, msgdata->msg_fn, "Skipped %d previous similar message%s\n", cdls->cdls_count, From patchwork Fri Jan 14 01:37:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713329 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AB992C433EF for ; Fri, 14 Jan 2022 01:39:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 11DE03AD81B; Thu, 13 Jan 2022 17:38:50 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B8AD63AD851 for ; Thu, 13 Jan 2022 17:38:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D5AF5100F333; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D47E6DF4C4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:53 -0500 Message-Id: <1642124283-10148-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: 
<1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/24] lustre: llite: Add D_IOTRACE X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When looking into performance problems, it's very important to be able to trace the I/O patterns from userspace into Lustre, and also to understand the basics of how Lustre handles that I/O (readahead, RPC generation). This is best done with a dedicated debug flag: no userspace tool can provide all this information, and existing debug flags collect a huge number of unrelated pieces of, well, debug information. The goal is for customers to be able to quickly gather log files of a reasonable size which contain the necessary information and which can easily be interpreted by engineering. This is not possible if the information is spread out across a number of heavyweight debug flags. This is a first pass at adding the flag and the debug messages required to track basic data I/O. One significant omission in this first patch is RPC generation; I have not decided how best to do that yet, so it will be added in a future patch.
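Here is a minimal sketch of how a dual-tagged message such as CDEBUG(D_VFSTRACE|D_IOTRACE, ...) gets selected. The message is kept when any of its flag bits is set in the enabled debug mask. Enabling only "iotrace" therefore captures these messages without turning on the heavier vfstrace or reada output. The sketch is a simplification: it ignores the per-subsystem mask and the always-printed flags that the real CDEBUG() also considers. `debug_name_to_mask()` and `debug_msg_enabled()` are hypothetical helpers, and only the first 25 names of LIBCFS_DEBUG_MASKS_NAMES (as changed by this patch) are copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy of the first 25 names; index i names bit (1u << i). */
static const char *const debug_names[] = {
	"trace", "inode", "super", "iotrace", "malloc", "cache", "info",
	"ioctl", "neterror", "net", "warning", "buffs", "other",
	"dentry", "nettrace", "page", "dlmtrace", "error", "emerg",
	"ha", "rpctrace", "vfstrace", "reada", "mmap", "config",
};

/* Map a flag name to its single-bit mask, or 0 if the name is unknown. */
static uint32_t debug_name_to_mask(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(debug_names) / sizeof(debug_names[0]); i++)
		if (strcmp(debug_names[i], name) == 0)
			return 1u << i;
	return 0;
}

/* A message tagged with several flags is emitted if any of them is enabled. */
static bool debug_msg_enabled(uint32_t msg_mask, uint32_t enabled_mask)
{
	return (msg_mask & enabled_mask) != 0;
}
```

Because the check is a plain bitwise AND, tagging the existing D_READA and D_MMAP messages with D_IOTRACE leaves their old behavior intact. It only adds one more way to enable them.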
WC-bug-id: https://jira.whamcloud.com/browse/LU-15137 Lustre-commit: 40d286e11138fc67f ("LU-15317 llite: Add D_IOTRACE") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45752 Reviewed-by: Sebastien Buisson Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 10 ++++++++++ fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/llite_mmap.c | 13 ++++++++++--- fs/lustre/llite/rw.c | 10 ++++++++-- include/uapi/linux/lnet/libcfs_debug.h | 3 ++- 5 files changed, 31 insertions(+), 6 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 05f2f1a..dec0109 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1954,6 +1954,11 @@ static ssize_t ll_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ktime_t kstart = ktime_get(); bool cached; + CDEBUG(D_VFSTRACE|D_IOTRACE, "file %s:"DFID", ppos: %lld, count: %zu\n", + file_dentry(file)->d_name.name, + PFID(ll_inode2fid(file_inode(file))), iocb->ki_pos, + iov_iter_count(to)); + if (!iov_iter_count(to)) return 0; @@ -2075,6 +2080,11 @@ static ssize_t ll_file_write_iter(struct kiocb *iocb, struct iov_iter *from) ktime_t kstart = ktime_get(); int result; + CDEBUG(D_VFSTRACE|D_IOTRACE, "file %s:"DFID", ppos: %lld, count: %zu\n", + file_dentry(file)->d_name.name, + PFID(ll_inode2fid(file_inode(file))), iocb->ki_pos, + iov_iter_count(from)); + if (!iov_iter_count(from)) { rc_normal = 0; goto out; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 54f0218..8c7361a 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -885,6 +885,7 @@ struct ll_readahead_work { struct file *lrw_file; pgoff_t lrw_start_idx; pgoff_t lrw_end_idx; + pid_t lrw_user_pid; /* async worker to handler read */ struct work_struct lrw_readahead_work; diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 0009c5f..d87a68d 100644 --- 
a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -377,9 +377,10 @@ static vm_fault_t ll_fault(struct vm_fault *vmf) if (cached) goto out; - CDEBUG(D_MMAP, DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx\n", + CDEBUG(D_MMAP|D_IOTRACE, + DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n", PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid), - vma, vma->vm_start, vma->vm_end, vma->vm_flags); + vma, vma->vm_start, vma->vm_end, vma->vm_flags, vmf->pgoff); /* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite * so that it can be killed by admin but not cause segfault by @@ -440,8 +441,14 @@ static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) bool retry; bool cached; int err; - vm_fault_t ret; ktime_t kstart = ktime_get(); + vm_fault_t ret; + + CDEBUG(D_MMAP|D_IOTRACE, + DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n", + PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid), + vma, vma->vm_start, vma->vm_end, vma->vm_flags, + vmf->page->index); ret = pcc_page_mkwrite(vma, vmf, &cached); if (cached) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index c9f29ef..9f6e140 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -595,6 +595,11 @@ static void ll_readahead_handle_work(struct work_struct *wq) inode = file_inode(file); sbi = ll_i2sbi(inode); + CDEBUG(D_READA|D_IOTRACE, + "%s: async ra from %lu to %lu triggered by user pid %d\n", + file_dentry(file)->d_name.name, work->lrw_start_idx, + work->lrw_end_idx, work->lrw_user_pid); + env = cl_env_alloc(&refcheck, LCT_NOREF); if (IS_ERR(env)) { rc = PTR_ERR(env); @@ -1301,7 +1306,7 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode, spin_lock(&ras->ras_lock); if (!hit) - CDEBUG(D_READA, DFID " pages at %lu miss.\n", + CDEBUG(D_READA|D_IOTRACE, DFID " pages at %lu miss.\n", PFID(ll_inode2fid(inode)), index); ll_ra_stats_inc_sbi(sbi, hit ? 
RA_STAT_HIT : RA_STAT_MISS); @@ -1670,7 +1675,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io, skip_index = vvp_index(vpg); rc2 = ll_readahead(env, io, &queue->c2_qin, ras, uptodate, file, skip_index); - CDEBUG(D_READA, DFID " %d pages read ahead at %lu\n", + CDEBUG(D_READA|D_IOTRACE, DFID " %d pages read ahead at %lu\n", PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg)); } else if (vvp_index(vpg) == io_start_index && io_end_index - io_start_index > 0) { @@ -1770,6 +1775,7 @@ static int kickoff_async_readahead(struct file *file, unsigned long pages) lrw->lrw_file = get_file(file); lrw->lrw_start_idx = start_idx; lrw->lrw_end_idx = end_idx; + lrw->lrw_user_pid = current->pid; spin_lock(&ras->ras_lock); ras->ras_next_readahead_idx = end_idx + 1; ras->ras_async_last_readpage_idx = start_idx; diff --git a/include/uapi/linux/lnet/libcfs_debug.h b/include/uapi/linux/lnet/libcfs_debug.h index 4cb6594..bbd9f25 100644 --- a/include/uapi/linux/lnet/libcfs_debug.h +++ b/include/uapi/linux/lnet/libcfs_debug.h @@ -106,6 +106,7 @@ struct ptldebug_header { #define D_TRACE 0x00000001 /* ENTRY/EXIT markers */ #define D_INODE 0x00000002 #define D_SUPER 0x00000004 +#define D_IOTRACE 0x00000008 /* simple, low overhead io tracing */ #define D_MALLOC 0x00000010 /* print malloc, free information */ #define D_CACHE 0x00000020 /* cache-related items */ #define D_INFO 0x00000040 /* general information */ @@ -136,7 +137,7 @@ struct ptldebug_header { #define D_LAYOUT 0x80000000 #define LIBCFS_DEBUG_MASKS_NAMES { \ - "trace", "inode", "super", "tty", "malloc", "cache", "info", \ + "trace", "inode", "super", "iotrace", "malloc", "cache", "info",\ "ioctl", "neterror", "net", "warning", "buffs", "other", \ "dentry", "nettrace", "page", "dlmtrace", "error", "emerg", \ "ha", "rpctrace", "vfstrace", "reada", "mmap", "config", \ From patchwork Fri Jan 14 01:37:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
James Simmons X-Patchwork-Id: 12713311 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 76437C433FE for ; Fri, 14 Jan 2022 01:38:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 27F3B3AD876; Thu, 13 Jan 2022 17:38:14 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 127A23AD84A for ; Thu, 13 Jan 2022 17:38:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id DA55E100F334; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D81A7E07E3; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:54 -0500 Message-Id: <1642124283-10148-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/24] lustre: llite: Add start_idx debug X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell When readahead is triggered, the current readahead debug message prints the user-requested page that triggered readahead and the number of pages read by readahead. However, readahead does not necessarily start reading from the user-requested page, so it is important to also print the page where readahead starts. WC-bug-id: https://jira.whamcloud.com/browse/LU-15069 Lustre-commit: e13ed446337273a04 ("LU-15069 llite: Add start_idx debug") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45674 Reviewed-by: Sebastien Buisson Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 9f6e140..b8cffde 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -713,12 +713,13 @@ static void ll_readahead_handle_work(struct work_struct *wq) static int ll_readahead(const struct lu_env *env, struct cl_io *io, struct cl_page_list *queue, struct ll_readahead_state *ras, bool hit, - struct file *file, pgoff_t skip_index) + struct file *file, pgoff_t skip_index, + pgoff_t *start_idx) { struct vvp_io *vio = vvp_env_io(env); struct ll_thread_info *lti = ll_env_info(env); unsigned long pages, pages_min = 0; - pgoff_t ra_end_idx = 0, start_idx = 0, end_idx = 0; + pgoff_t ra_end_idx = 0, end_idx = 0; struct inode *inode; struct ra_io_arg *ria = &lti->lti_ria; struct cl_object *clob; @@ -761,16 +762,16 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, * so that stride read ahead can work correctly.
*/ if (stride_io_mode(ras)) - start_idx = max_t(pgoff_t, ras->ras_next_readahead_idx, + *start_idx = max_t(pgoff_t, ras->ras_next_readahead_idx, ras->ras_stride_offset >> PAGE_SHIFT); else - start_idx = ras->ras_next_readahead_idx; + *start_idx = ras->ras_next_readahead_idx; if (ras->ras_window_pages > 0) end_idx = ras->ras_window_start_idx + ras->ras_window_pages - 1; if (skip_index) - end_idx = start_idx + ras->ras_window_pages - 1; + end_idx = *start_idx + ras->ras_window_pages - 1; /* Enlarge the RA window to encompass the full read */ if (vio->vui_ra_valid && @@ -787,7 +788,7 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io, ria->ria_eof = true; } } - ria->ria_start_idx = start_idx; + ria->ria_start_idx = *start_idx; ria->ria_end_idx = end_idx; /* If stride I/O mode is detected, get stride window*/ if (stride_io_mode(ras)) { @@ -1627,6 +1628,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io, struct cl_2queue *queue = &io->ci_queue; struct ll_sb_info *sbi = ll_i2sbi(inode); struct cl_sync_io *anchor = NULL; + pgoff_t ra_start_index = 0; pgoff_t io_start_index; pgoff_t io_end_index; int rc = 0, rc2 = 0; @@ -1674,9 +1676,12 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io, if (ras->ras_next_readahead_idx < vvp_index(vpg)) skip_index = vvp_index(vpg); rc2 = ll_readahead(env, io, &queue->c2_qin, ras, - uptodate, file, skip_index); - CDEBUG(D_READA|D_IOTRACE, DFID " %d pages read ahead at %lu\n", - PFID(ll_inode2fid(inode)), rc2, vvp_index(vpg)); + uptodate, file, skip_index, + &ra_start_index); + CDEBUG(D_READA|D_IOTRACE, + DFID " %d pages read ahead at %lu, triggered by user read at %lu\n", + PFID(ll_inode2fid(inode)), rc2, ra_start_index, + vvp_index(vpg)); } else if (vvp_index(vpg) == io_start_index && io_end_index - io_start_index > 0) { rc2 = ll_readpages(env, io, &queue->c2_qin, io_start_index + 1, From patchwork Fri Jan 14 01:37:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713312 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9BF5FC433EF for ; Fri, 14 Jan 2022 01:38:19 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C0E9C3B60F7; Thu, 13 Jan 2022 17:38:16 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5BBC13AD861 for ; Thu, 13 Jan 2022 17:38:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id DE219100F335; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DCC2EA8102; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:55 -0500 Message-Id: <1642124283-10148-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/24] lnet: Skip router discovery on send path X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn When the router checker is enabled, routes are regularly marked as out of date w.r.t. discovery. This can cause upper level messages to be delayed while the router undergoes discovery. We can avoid delaying messages by relying on the router checker to initiate discovery of routers. If we happen to send a message to a router before it has been discovered then the worst case scenario is that the route is actually down or we end up utilizing a subset of a multi-rail router's interfaces. Both situations can be remedied by utilizing the check_routers_before_use parameter. Change the logic in lnet_handle_find_routed_path() so that we only initiate discovery if the alive_router_check_interval is <= 0 (i.e. router checker pings are disabled). WC-bug-id: https://jira.whamcloud.com/browse/LU-15275 Lustre-commit: c8e74c395d5634dbb ("LU-15275 lnet: Skip router discovery on send path") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/45684 Reviewed-by: Alexey Lyashkov Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 133397e..8d4fd4d 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2104,13 +2104,23 @@ struct lnet_ni * LASSERT(gw == gwni->lpni_peer_net->lpn_peer); } - /* Discover this gateway if it hasn't already been discovered. - * This means we might delay the message until discovery has - * completed + /* If the router checker is not active then discover the gateway here. 
+ * This ensures we are able to take advantage of multi-rail routing, but + * if the router checker is active then we do not unecessarily delay + * messages while the gateway is being checked by the dedicated monitor + * thread. + * + * NB: We're only checking the alive_router_check_interval here, rather + * than calling lnet_router_checker_active(), because the other + * conditions that are checked by that function are either + * irrelevant (the_lnet.ln_routing) or must be true (list of routers + * is not empty) */ - rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt); - if (rc) - return rc; + if (alive_router_check_interval <= 0) { + rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt); + if (rc) + return rc; + } if (!sd->sd_best_ni) { lpn = gwni->lpni_peer_net; From patchwork Fri Jan 14 01:37:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713328 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 18E15C433EF for ; Fri, 14 Jan 2022 01:39:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F18223AD7F0; Thu, 13 Jan 2022 17:38:48 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 94AEA3AD861 for ; Thu, 13 Jan 2022 17:38:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E2AF2100F338; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 
E13ADA8103; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:56 -0500 Message-Id: <1642124283-10148-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/24] lustre: mdc: GET(X)ATTR to READPAGE portal X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Send the MDS_GETATTR and MDS_GETXATTR RPCs to the MDS_READPAGE_PORTAL instead of the default portal to avoid deadlocks with other MDS_REINT RPCs that may block all of the MDS service threads on that portal. This deadlock occurs with MDS_GETXATTR when SELinux is enabled: getxattr becomes part of the lookup and therefore takes a reference on a lock used for the lookup. However, all of the MDS service threads on the default portal can be consumed by threads waiting for that lock, and the getxattr then cannot be processed, resulting in a deadlock.
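As a rough illustration of the thread-exhaustion argument above, the toy model below gives each portal its own fixed pool of service threads, fills the default pool with REINT requests blocked on the lookup lock, and checks whether the GETXATTR that must complete before the lock is released can still get a thread. This is not Lustre code: the portal enum, toy_dispatch() and the fixed thread counts are hypothetical stand-ins for MDS_REQUEST_PORTAL, MDS_READPAGE_PORTAL and the real ptlrpc service scheduling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MDS_REQUEST_PORTAL and MDS_READPAGE_PORTAL. */
enum toy_portal {
	TOY_PORTAL_REQUEST,
	TOY_PORTAL_READPAGE,
	TOY_PORTAL_COUNT
};

/* Each portal is served by its own fixed-size pool of service threads. */
struct toy_server {
	int free_threads[TOY_PORTAL_COUNT];
};

/* Take a service thread on portal @p; false means the pool is exhausted. */
static bool toy_dispatch(struct toy_server *srv, enum toy_portal p)
{
	assert(p < TOY_PORTAL_COUNT);
	if (srv->free_threads[p] == 0)
		return false;
	srv->free_threads[p]--;
	return true;
}

/*
 * Occupy every thread on the request portal with REINT RPCs that block on
 * the lookup lock, then try to dispatch the GETXATTR whose completion is
 * needed before that lock can be released. Returns true if the GETXATTR
 * still gets a thread, i.e. the blocked requests can eventually proceed.
 */
static bool toy_getxattr_can_run(enum toy_portal getxattr_portal,
				 int threads_per_portal)
{
	struct toy_server srv = {
		.free_threads = { threads_per_portal, threads_per_portal },
	};
	int i;

	for (i = 0; i < threads_per_portal; i++)
		(void)toy_dispatch(&srv, TOY_PORTAL_REQUEST);

	return toy_dispatch(&srv, getxattr_portal);
}
```

With four threads per portal, toy_getxattr_can_run(TOY_PORTAL_REQUEST, 4) returns false while toy_getxattr_can_run(TOY_PORTAL_READPAGE, 4) returns true, which is the separation the patch relies on; the model deliberately ignores everything else about request scheduling.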
WC-bug-id: https://jira.whamcloud.com/browse/LU-15245 Lustre-commit: 5552eba1451d47ce1 ("LU-15245 mdc: GET(X)ATTR to READPAGE portal") Signed-off-by: Andreas Dilger Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45593 Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 3284c01..1064d9f 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -224,6 +224,9 @@ static int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data, return rc; } + /* LU-15245: avoid deadlock with modifying RPCs on MDS_REQUEST_PORTAL */ + req->rq_request_portal = MDS_READPAGE_PORTAL; + again: mdc_pack_body(&req->rq_pill, &op_data->op_fid1, op_data->op_valid, op_data->op_mode, -1, 0); @@ -402,6 +405,10 @@ static int mdc_xattr_common(struct obd_export *exp, } else { mdc_pack_body(&req->rq_pill, fid, valid, output_size, suppgid, flags); + /* Avoid deadlock with modifying RPCs on MDS_REQUEST_PORTAL. + * See LU-15245. 
+ */ + req->rq_request_portal = MDS_READPAGE_PORTAL; } if (xattr_name) { From patchwork Fri Jan 14 01:37:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713331 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 98AADC433FE for ; Fri, 14 Jan 2022 01:39:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B42A73AD998; Thu, 13 Jan 2022 17:38:53 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC8C63AD7BC for ; Thu, 13 Jan 2022 17:38:11 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E7314100F339; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E5D31DF4C4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:57 -0500 Message-Id: <1642124283-10148-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/24] lnet: libcfs: set x->ls_len to 0 when x->ls_str is NULL X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Etienne AUJAMES cfs_gettok() sets next->ls_str to NULL if no delimiter is found, but it does not reset next->ls_len to 0. Leaving a stale non-zero ls_len next to a NULL ls_str can lead to a NULL pointer dereference in callers such as nrs_tbf_id_parse(). This patch fixes cfs_gettok() to also set "next->ls_len = 0;" when no delimiter is found. WC-bug-id: https://jira.whamcloud.com/browse/LU-15130 Lustre-commit: cec864b7938f1138d ("LU-15130 nrs: null pointer dereference in nrs_tbf_id_parse") Signed-off-by: Etienne AUJAMES Reviewed-on: https://review.whamcloud.com/45291 Reviewed-by: Andreas Dilger Reviewed-by: Li Xi Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/libcfs_string.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/lnet/libcfs/libcfs_string.c b/net/lnet/libcfs/libcfs_string.c index 4259f8b8..0563c42 100644 --- a/net/lnet/libcfs/libcfs_string.c +++ b/net/lnet/libcfs/libcfs_string.c @@ -154,6 +154,7 @@ int cfs_str2mask(const char *str, const char *(*bit2str)(int bit), /* there is no the delimeter in the string */ end = next->ls_str + next->ls_len; next->ls_str = NULL; + next->ls_len = 0; } else { next->ls_str = end + 1; next->ls_len -= (end - res->ls_str + 1); From patchwork Fri Jan 14 01:37:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4E45EC433F5 for ; Fri, 14 Jan 2022 01:38:41 +0000 (UTC) Received: 
from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with
ESMTP id 6B67B3AD9D0; Thu, 13 Jan 2022 17:38:30 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 10D8E3AD7BC for ; Thu, 13 Jan 2022 17:38:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id EB4FC100F33A; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EA020A8102; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:58 -0500 Message-Id: <1642124283-10148-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/24] lustre: uapi: set default max-inherit to 3 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lei Feng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lei Feng Change the default max-inherit for striped directories from 0 (unlimited) to 3 by splitting LMV_INHERIT_DEFAULT into LMV_INHERIT_DEFAULT_PLAIN (still 0) and LMV_INHERIT_DEFAULT_STRIPED (3). This way the default stripe policy of a directory is no longer inherited without limit, which could unexpectedly reduce performance.
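To make the effect of the new default concrete, here is a small sketch of how a max-inherit counter could be consumed as subdirectories are created. The enum values are copied from the lustre_user.h hunk of this patch and redefined only so the example compiles on its own. The decrement rule in toy_inherit_next() (0 is passed on unchanged, LMV_INHERIT_END stops propagation, any other value drops by one per level) is an assumption based on the comments in that hunk, not a copy of the Lustre implementation; treating the reserved values above LMV_INHERIT_MAX as "stop" is likewise a simplification of this toy.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the lustre_user.h hunk of this patch. */
enum {
	LMV_INHERIT_UNLIMITED		= 0,
	LMV_INHERIT_DEFAULT_PLAIN	= LMV_INHERIT_UNLIMITED,
	LMV_INHERIT_END			= 1,
	LMV_INHERIT_DEFAULT_STRIPED	= 3,
	LMV_INHERIT_MAX			= 250,
	LMV_INHERIT_NONE		= 255,
};

/*
 * Assumed rule (illustration only): the max-inherit value handed to a newly
 * created subdirectory. 0 is passed on unchanged, LMV_INHERIT_END (and, in
 * this toy, anything above LMV_INHERIT_MAX) stops propagation, and any other
 * value is decremented by one.
 */
static uint8_t toy_inherit_next(uint8_t inherit)
{
	if (inherit == LMV_INHERIT_UNLIMITED)
		return LMV_INHERIT_UNLIMITED;
	if (inherit == LMV_INHERIT_END || inherit > LMV_INHERIT_MAX)
		return LMV_INHERIT_NONE;
	return inherit - 1;
}

/*
 * Number of successive descendant levels that still carry the default
 * layout, capped at @limit so that the unlimited case terminates.
 */
static int toy_levels_inherited(uint8_t inherit, int limit)
{
	int levels = 0;

	assert(limit >= 0);
	while (levels < limit) {
		inherit = toy_inherit_next(inherit);
		if (inherit == LMV_INHERIT_NONE)
			break;
		levels++;
	}
	return levels;
}
```

Under these assumptions a plain default (0) keeps propagating to every level the caller is willing to count, while the striped default of 3 is handed down through two more levels (3, then 2, then 1) and then stops.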
WC-bug-id: https://jira.whamcloud.com/browse/LU-15314 Lustre-commit: 956b4b1e0d9f18c6f ("LU-15314 utils: set default max-inherit to 3") Signed-off-by: Lei Feng Reviewed-on: https://review.whamcloud.com/45874 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_user.h | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 1e66930..3b53a5b 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -838,23 +838,25 @@ enum lmv_type { */ enum { /* for historical reason, 0 means unlimited inheritance */ - LMV_INHERIT_UNLIMITED = 0, - /* unlimited lum_max_inherit by default */ - LMV_INHERIT_DEFAULT = 0, + LMV_INHERIT_UNLIMITED = 0, + /* unlimited lum_max_inherit by default for plain stripe (0 or 1) */ + LMV_INHERIT_DEFAULT_PLAIN = LMV_INHERIT_UNLIMITED, /* not inherit any more */ - LMV_INHERIT_END = 1, + LMV_INHERIT_END = 1, + /* for multiple stripes, the default lum_max_inherit is 3 */ + LMV_INHERIT_DEFAULT_STRIPED = 3, /* max inherit depth */ - LMV_INHERIT_MAX = 250, + LMV_INHERIT_MAX = 250, /* [251, 254] are reserved */ /* not set, or when inherit depth goes beyond end, */ - LMV_INHERIT_NONE = 255, + LMV_INHERIT_NONE = 255, }; enum { /* not set, or when inherit_rr depth goes beyond end, */ LMV_INHERIT_RR_NONE = 0, /* disable lum_max_inherit_rr by default */ - LMV_INHERIT_RR_DEFAULT = 0, + LMV_INHERIT_RR_DEFAULT = LMV_INHERIT_RR_NONE, /* not inherit any more */ LMV_INHERIT_RR_END = 1, /* default inherit_rr of ROOT */ From patchwork Fri Jan 14 01:37:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4A563C433EF for ; Fri, 14 Jan 2022 01:39:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1355C3AD841; Thu, 13 Jan 2022 17:38:58 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 497883AD7BC for ; Thu, 13 Jan 2022 17:38:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id F22BA100F34A; Thu, 13 Jan 2022 20:38:04 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EE92DA8103; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:37:59 -0500 Message-Id: <1642124283-10148-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 20/24] lustre: llite: Switch pcc to lookup_one_len X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Using kern_path() to look up files in the PCC cache makes the lookup subject to user namespaces, so the PCC volume must be mapped into a container or the cached files cannot be found.
One solution is to switch to using lookup_one_len - this is what the code which *creates* PCC files does. This manually walks the path from the root, which avoids namespace issues. This is appropriate because PCC is kernel functionality - the user should not be able to directly access the volume, but it should be accessible as a cache. WC-bug-id: https://jira.whamcloud.com/browse/LU-15170 Lustre-commit: f3be560031cc7022a ("LU-15170 llite: Switch pcc to lookup_one_len") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45436 Reviewed-by: Andreas Dilger Reviewed-by: Yingjin Qian Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/pcc.c | 119 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 84 insertions(+), 35 deletions(-) diff --git a/fs/lustre/llite/pcc.c b/fs/lustre/llite/pcc.c index 85114b8..8bdf681e 100644 --- a/fs/lustre/llite/pcc.c +++ b/fs/lustre/llite/pcc.c @@ -1085,7 +1085,7 @@ void pcc_inode_free(struct inode *inode) * reduce overhead: * (fid->f_oid >> 16 & oxFFFF)/FID */ -#define MAX_PCC_DATABASE_PATH (6 * 5 + FID_NOBRACE_LEN + 1) +#define PCC_DATASET_MAX_PATH (6 * 5 + FID_NOBRACE_LEN + 1) static int pcc_fid2dataset_path(char *buf, int sz, struct lu_fid *fid) { return scnprintf(buf, sz, "%04x/%04x/%04x/%04x/%04x/%04x/" @@ -1160,21 +1160,6 @@ static int pcc_get_layout_info(struct inode *inode, struct cl_layout *clt) return rc < 0 ? 
rc : 0; } -static int pcc_fid2dataset_fullpath(char *buf, int sz, struct lu_fid *fid, - struct pcc_dataset *dataset) -{ - return scnprintf(buf, sz, "%s/%04x/%04x/%04x/%04x/%04x/%04x/" - DFID_NOBRACE, - dataset->pccd_pathname, - (fid)->f_oid & 0xFFFF, - (fid)->f_oid >> 16 & 0xFFFF, - (unsigned int)((fid)->f_seq & 0xFFFF), - (unsigned int)((fid)->f_seq >> 16 & 0xFFFF), - (unsigned int)((fid)->f_seq >> 32 & 0xFFFF), - (unsigned int)((fid)->f_seq >> 48 & 0xFFFF), - PFID(fid)); -} - /* Must be called with pcci->pcci_lock held */ static void pcc_inode_attach_init(struct pcc_dataset *dataset, struct pcc_inode *pcci, @@ -1221,6 +1206,72 @@ static inline bool pcc_inode_has_layout(struct pcc_inode *pcci) return pcci->pcci_layout_gen != CL_LAYOUT_GEN_NONE; } +static struct dentry *pcc_lookup(struct dentry *base, char *pathname) +{ + char *ptr = NULL, *component; + struct dentry *parent; + struct dentry *child = ERR_PTR(-ENOENT); + + ptr = pathname; + + /* move past any initial '/' to the start of the first path component*/ + while (*ptr == '/') + ptr++; + + /* store the start of the first path component */ + component = ptr; + + parent = dget(base); + while (ptr) { + /* find the start of the next component - if we don't find it, + * the current component is the last component + */ + ptr = strchr(ptr, '/'); + /* put a NUL char in place of the '/' before the next compnent + * so we can treat this component as a string; note the full + * path string is NUL terminated to this is not needed for the + * last component + */ + if (ptr) + *ptr = '\0'; + + /* look up the current component */ + inode_lock(parent->d_inode); + child = lookup_one_len(component, parent, strlen(component)); + inode_unlock(parent->d_inode); + + /* repair the path string: put '/' back in place of the NUL */ + if (ptr) + *ptr = '/'; + + dput(parent); + + if (IS_ERR_OR_NULL(child)) + break; + + /* we may find a cached negative dentry */ + if (!d_is_positive(child)) { + dput(child); + child = NULL; + break; + } + 
+ /* descend in to the next level of the path */ + parent = child; + + /* move the pointer past the '/' to the next component */ + if (ptr) + ptr++; + component = ptr; + } + + /* NULL child means we didn't find anything */ + if (!child) + child = ERR_PTR(-ENOENT); + + return child; +} + static int pcc_try_dataset_attach(struct inode *inode, u32 gen, enum lu_pcc_type type, struct pcc_dataset *dataset, @@ -1229,9 +1280,8 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, struct ll_inode_info *lli = ll_i2info(inode); struct pcc_inode *pcci = lli->lli_pcc_inode; const struct cred *old_cred; - struct dentry *pcc_dentry; - struct path path; - char *pathname; + struct dentry *pcc_dentry = NULL; + char pathname[PCC_DATASET_MAX_PATH]; u32 pcc_gen; int rc; @@ -1239,27 +1289,27 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, !(dataset->pccd_flags & PCC_DATASET_RWPCC)) return 0; - pathname = kzalloc(PATH_MAX, GFP_KERNEL); - if (!pathname) - return -ENOMEM; - - pcc_fid2dataset_fullpath(pathname, PATH_MAX, &lli->lli_fid, dataset); + rc = pcc_fid2dataset_path(pathname, PCC_DATASET_MAX_PATH, + &lli->lli_fid); old_cred = override_creds(pcc_super_cred(inode->i_sb)); - rc = kern_path(pathname, LOOKUP_FOLLOW, &path); - if (rc) { + pcc_dentry = pcc_lookup(dataset->pccd_path.dentry, pathname); + if (IS_ERR(pcc_dentry)) { + rc = PTR_ERR(pcc_dentry); + CDEBUG(D_CACHE, "%s: path lookup error on "DFID":%s: rc = %d\n", + ll_i2sbi(inode)->ll_fsname, PFID(&lli->lli_fid), + pathname, rc); /* ignore this error */ rc = 0; goto out; } - pcc_dentry = path.dentry; rc = __vfs_getxattr(pcc_dentry, pcc_dentry->d_inode, pcc_xattr_layout, &pcc_gen, sizeof(pcc_gen)); if (rc < 0) { /* ignore this error */ rc = 0; - goto out_put_path; + goto out_put_pcc_dentry; } rc = 0; @@ -1271,7 +1321,7 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, pcci = kmem_cache_zalloc(pcc_inode_slab, GFP_NOFS); if (!pcci) { rc = -ENOMEM; - goto out_put_path; + goto 
out_put_pcc_dentry; } pcc_inode_init(pcci, lli); @@ -1294,11 +1344,10 @@ static int pcc_try_dataset_attach(struct inode *inode, u32 gen, pcc_layout_gen_set(pcci, gen); *cached = true; } -out_put_path: - path_put(&path); +out_put_pcc_dentry: + dput(pcc_dentry); out: revert_creds(old_cred); - kfree(pathname); return rc; } @@ -2072,11 +2121,11 @@ static int __pcc_inode_create(struct pcc_dataset *dataset, struct dentry *child; int rc = 0; - path = kzalloc(MAX_PCC_DATABASE_PATH, GFP_NOFS); + path = kzalloc(PCC_DATASET_MAX_PATH, GFP_NOFS); if (!path) return -ENOMEM; - pcc_fid2dataset_path(path, MAX_PCC_DATABASE_PATH, fid); + pcc_fid2dataset_path(path, PCC_DATASET_MAX_PATH, fid); base = pcc_mkdir_p(dataset->pccd_path.dentry, path, 0); if (IS_ERR(base)) { @@ -2084,7 +2133,7 @@ static int __pcc_inode_create(struct pcc_dataset *dataset, goto out; } - snprintf(path, MAX_PCC_DATABASE_PATH, DFID_NOBRACE, PFID(fid)); + snprintf(path, PCC_DATASET_MAX_PATH, DFID_NOBRACE, PFID(fid)); child = pcc_create(base, path, 0); if (IS_ERR(child)) { rc = PTR_ERR(child); From patchwork Fri Jan 14 01:38:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713327 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 37E9FC433F5 for ; Fri, 14 Jan 2022 01:39:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F37CB3AD921; Thu, 13 Jan 2022 17:38:45 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9344C3AD7BC for ; Thu, 13 
Jan 2022 17:38:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 017F7100F34B; Thu, 13 Jan 2022 20:38:05 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F3238DF4C4; Thu, 13 Jan 2022 20:38:04 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:38:00 -0500 Message-Id: <1642124283-10148-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/24] lustre: llite: revalidate dentry if LOOKUP lock fetched X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Once ll_inode_revalidate() has fetched a LOOKUP lock, it should revalidate the dentry so that a subsequent lookup can find it in the dcache. It should also update lli_dir_depth.
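The new control flow in ll_revalidate_it_finish() can be reduced to the toy model below: only when the granted lock bits include LOOKUP is the dentry marked valid and its directory depth refreshed from the parent. Everything here is a hypothetical simplification: struct toy_dentry, the TOY_INODELOCK_* bit values and the "parent depth plus one" rule stand in for the real dentry, the MDS_INODELOCK_* constants and ll_update_dir_depth(), and are not taken from Lustre.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_INODELOCK_LOOKUP	0x1ULL
#define TOY_INODELOCK_UPDATE	0x2ULL

struct toy_dentry {
	bool valid;		/* can be found by a dcache lookup */
	int dir_depth;		/* stand-in for lli_dir_depth */
};

/*
 * After the intent RPC succeeds, only a granted LOOKUP bit allows the
 * dentry to be marked valid and its depth refreshed from the parent.
 */
static void toy_revalidate_finish(struct toy_dentry *de,
				  const struct toy_dentry *parent,
				  uint64_t granted_bits)
{
	assert(de && parent);
	if (granted_bits & TOY_INODELOCK_LOOKUP) {
		de->dir_depth = parent->dir_depth + 1;
		de->valid = true;
	}
}
```

In this sketch a reply that grants only UPDATE leaves the dentry untouched, while one that also grants LOOKUP makes it visible to later dcache lookups, mirroring the `if (bits & MDS_INODELOCK_LOOKUP)` branch added in dcache.c.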
WC-bug-id: https://jira.whamcloud.com/browse/LU-15200 Lustre-commit: 92fadf9cc1d06b21b ("LU-15200 llite: revalidate dentry if LOOKUP lock fetched") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/45599 Reviewed-by: Yang Sheng Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dcache.c | 19 +++++++++++++++++-- fs/lustre/llite/file.c | 2 +- fs/lustre/llite/llite_internal.h | 2 +- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/lustre/llite/dcache.c b/fs/lustre/llite/dcache.c index a074a2c..d9fb0cd 100644 --- a/fs/lustre/llite/dcache.c +++ b/fs/lustre/llite/dcache.c @@ -200,15 +200,30 @@ void ll_prune_aliases(struct inode *inode) int ll_revalidate_it_finish(struct ptlrpc_request *request, struct lookup_intent *it, - struct inode *inode) + struct dentry *de) { + struct inode *inode = d_inode(de); + u64 bits = 0; + int rc; + if (!request) return 0; if (it_disposition(it, DISP_LOOKUP_NEG)) return -ENOENT; - return ll_prep_inode(&inode, &request->rq_pill, NULL, it); + rc = ll_prep_inode(&inode, &request->rq_pill, NULL, it); + if (rc) + return rc; + + ll_set_lock_data(ll_i2sbi(inode)->ll_md_exp, inode, it, + &bits); + if (bits & MDS_INODELOCK_LOOKUP) { + ll_update_dir_depth(de->d_parent->d_inode, inode); + d_lustre_revalidate(de); + } + + return rc; } void ll_lookup_finish_locks(struct lookup_intent *it, struct inode *inode) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index dec0109..d9b1457 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -5033,7 +5033,7 @@ static int ll_inode_revalidate(struct dentry *dentry, enum ldlm_intent_flags op) goto out; } - rc = ll_revalidate_it_finish(req, &oit, inode); + rc = ll_revalidate_it_finish(req, &oit, dentry); if (rc != 0) { ll_intent_release(&oit); goto out; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 8c7361a..dd338f2 100644 --- a/fs/lustre/llite/llite_internal.h +++ 
b/fs/lustre/llite/llite_internal.h @@ -1177,7 +1177,7 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size, void ll_prune_aliases(struct inode *inode); void ll_lookup_finish_locks(struct lookup_intent *it, struct inode *inode); int ll_revalidate_it_finish(struct ptlrpc_request *request, - struct lookup_intent *it, struct inode *inode); + struct lookup_intent *it, struct dentry *de); /* llite/llite_lib.c */ extern const struct super_operations lustre_super_operations; From patchwork Fri Jan 14 01:38:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713313 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 31961C433FE for ; Fri, 14 Jan 2022 01:38:21 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73BE73AD7BC; Thu, 13 Jan 2022 17:38:17 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CB7813AD7F0 for ; Thu, 13 Jan 2022 17:38:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 06382100F34C; Thu, 13 Jan 2022 20:38:05 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0352AA8102; Thu, 13 Jan 2022 20:38:05 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:38:01 -0500 Message-Id: <1642124283-10148-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 22/24] lustre: llite: Simplify cda_no_aio_complete use X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell It is better to handle AIO and DIO the same way as far as possible, limiting the differences to setup. In this spirit, move the DIO check (is_sync_kiocb()) from the cleanup function to the setup function, and have cleanup rely only on cda_no_aio_complete. WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: b60bd21ec5d5f34ed ("LU-13799 llite: Simplify cda_no_aio_complete use") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/44154 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_io.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index f33a5f38..675116d 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1135,8 +1135,7 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor) cl_page_put(env, page); } - if (!is_sync_kiocb(aio->cda_iocb) && !aio->cda_no_aio_complete && - aio->cda_iocb->ki_complete) + if (!aio->cda_no_aio_complete) aio->cda_iocb->ki_complete(aio->cda_iocb, ret ?: aio->cda_bytes, 0); } @@ -1156,7 +1155,10 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj) cl_aio_end); cl_page_list_init(&aio->cda_pages); aio->cda_iocb = iocb; - aio->cda_no_aio_complete = 0; + if (is_sync_kiocb(iocb)) + aio->cda_no_aio_complete = 1; + else +
aio->cda_no_aio_complete = 0; cl_object_get(obj); aio->cda_obj = obj; } From patchwork Fri Jan 14 01:38:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12713315 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 36DCDC433FE for ; Fri, 14 Jan 2022 01:38:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5ED2C3AD880; Thu, 13 Jan 2022 17:38:20 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0FA1E21FFAD for ; Thu, 13 Jan 2022 17:38:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 099BF100F34D; Thu, 13 Jan 2022 20:38:05 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 06630E07E3; Thu, 13 Jan 2022 20:38:05 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 13 Jan 2022 20:38:02 -0500 Message-Id: <1642124283-10148-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 23/24] lustre: osc: Always set aio in anchor X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
Cc: Lustre Development List

From: Patrick Farrell

We currently do not set csi_aio for DIO, and we use this to control
when we free the aio struct. (For AIO, we must free it in
cl_sync_io_note, but for other users, we have to wait until after
cl_sync_io_wait has been called.)

The lack of csi_aio causes trouble for the implementation of the next
patch, so instead we always set it and control freeing by checking at
that time if we are doing DIO.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: eadccb33ac4bbe54a ("LU-13799 osc: Always set aio in anchor")
Signed-off-by: Patrick Farrell
Reviewed-on: https://review.whamcloud.com/44153
Reviewed-by: Wang Shilong
Reviewed-by: Andreas Dilger
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/obdclass/cl_io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 675116d..b72f5db 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1150,9 +1150,7 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 		 * Hold one ref so that it won't be released until
 		 * every pages is added.
 		 */
-		cl_sync_io_init_notify(&aio->cda_sync, 1,
-				       is_sync_kiocb(iocb) ? NULL : aio,
-				       cl_aio_end);
+		cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
 		if (is_sync_kiocb(iocb))
@@ -1203,16 +1201,20 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		wake_up_locked(&anchor->csi_waitq);
 		if (end_io)
 			end_io(env, anchor);
-		if (anchor->csi_aio)
-			aio = anchor->csi_aio;
+
+		aio = anchor->csi_aio;
 
 		spin_unlock(&anchor->csi_waitq.lock);
 
 		/**
-		 * If anchor->csi_aio is set, we are responsible for freeing
-		 * memory here rather than when cl_sync_io_wait() completes.
+		 * For AIO (!is_sync_kiocb), we are responsible for freeing
+		 * memory here. This is because we are the last user of this
+		 * aio struct, whereas in other cases, we will call
+		 * cl_sync_io_wait to wait after this, and so the memory is
+		 * freed after that call.
 		 */
-		cl_aio_free(env, aio);
+		if (aio && !is_sync_kiocb(aio->cda_iocb))
+			cl_aio_free(env, aio);
 	}
 }
 EXPORT_SYMBOL(cl_sync_io_note);

From patchwork Fri Jan 14 01:38:03 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12713330
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 13 Jan 2022 20:38:03 -0500
Message-Id: <1642124283-10148-25-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1642124283-10148-1-git-send-email-jsimmons@infradead.org>
References: <1642124283-10148-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 24/24] lustre: llite: Implement lower/upper aio
Cc: Lustre Development List

From: Patrick Farrell

This patch creates a lower-level aio struct for each set of pages
submitted and attaches it to the llite-level aio. This means that the
completion of I/O (in the sense of successful RPC/page completion) is
associated with the lower-level aio struct, and the upper-level aio
waits for all of these lower-level structs to complete. Previously,
all pages were associated with the upper-level (and only) aio struct.

This reorganization/cleanup is needed by the next patch, which moves
page release into aio_end. The justification for that change
(correctness and performance) will be provided in that patch.
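The two-level scheme described above is essentially nested reference counting. What follows is a minimal, single-threaded user-space sketch under stated assumptions, not Lustre code: `struct model_aio`, `model_init()`, `model_add_page()`, `model_note()` and `model_end()` are hypothetical stand-ins for `struct cl_dio_aio`, `cl_aio_alloc()`, page submission, `cl_sync_io_note()` and `cl_aio_end()`. A plain `int` replaces the atomic `csi_sync_nr` and the wait-queue locking, and the sketch assumes each in-flight page holds one reference.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model of the upper/lower aio split.
 * "nr" plays the role of cda_sync.csi_sync_nr, model_note() the role of
 * cl_sync_io_note() and model_end() the role of cl_aio_end().
 * None of these names are Lustre APIs.
 */
struct model_aio {
	int nr;                  /* outstanding references */
	int completed;           /* set once nr reaches zero */
	struct model_aio *upper; /* like cda_ll_aio; NULL for the llite aio */
};

static void model_note(struct model_aio *aio);

/* Runs when the last reference is dropped. */
static void model_end(struct model_aio *aio)
{
	aio->completed = 1;
	/* A finished lower-level struct releases its hold on the upper one. */
	if (aio->upper)
		model_note(aio->upper);
}

/* Drop one reference; complete the struct when none remain. */
static void model_note(struct model_aio *aio)
{
	assert(aio->nr > 0);
	if (--aio->nr == 0)
		model_end(aio);
}

/*
 * Start with one reference so completion cannot fire while pages are
 * still being added; a lower-level struct also pins its parent, like
 * atomic_add(1, &ll_aio->cda_sync.csi_sync_nr) in the patch.
 */
static void model_init(struct model_aio *aio, struct model_aio *upper)
{
	aio->nr = 1;
	aio->completed = 0;
	aio->upper = upper;
	if (upper)
		upper->nr++;
}

/* Assumption: each in-flight page holds one reference until it completes. */
static void model_add_page(struct model_aio *aio)
{
	aio->nr++;
}
```

Because each lower-level struct adds a reference to its parent when it is set up, the upper-level struct cannot reach zero until every lower-level struct has finished and the waiter has dropped its own initial reference.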
WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 46ff76137160b66f1 ("LU-13799 llite: Implement lower/upper aio")
Signed-off-by: Patrick Farrell
Reviewed-on: https://review.whamcloud.com/44209
Reviewed-by: Andreas Dilger
Reviewed-by: Yingjin Qian
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/cl_object.h |  7 +++++--
 fs/lustre/llite/file.c        |  2 +-
 fs/lustre/llite/rw26.c        | 34 +++++++++++++++++++++++++--------
 fs/lustre/obdclass/cl_io.c    | 44 +++++++++++++++++++++++++++++++++----------
 4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 1746c4e..9815b19 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2592,7 +2592,8 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		     int ioret);
 int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 			    long timeout, int ioret);
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj);
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj,
+				struct cl_dio_aio *ll_aio);
 void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio);
 
 static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr)
@@ -2626,7 +2627,9 @@ struct cl_dio_aio {
 	struct cl_object	*cda_obj;
 	struct kiocb		*cda_iocb;
 	ssize_t			cda_bytes;
-	unsigned int		cda_no_aio_complete:1;
+	struct cl_dio_aio	*cda_ll_aio;
+	unsigned int		cda_no_aio_complete:1,
+				cda_no_aio_free:1;
 };
 
 /** @} cl_sync_io */
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index d9b1457..6b95133 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1684,7 +1684,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		is_parallel_dio = false;
 
 	ci_aio = cl_aio_alloc(args->u.normal.via_iocb,
-			      ll_i2info(inode)->lli_clob);
+			      ll_i2info(inode)->lli_clob, NULL);
 	if (!ci_aio) {
 		rc = -ENOMEM;
 		goto out;
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 4c2ab38..16cccfa 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -330,7 +330,8 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct cl_io *io;
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
-	struct cl_dio_aio *aio;
+	struct cl_dio_aio *ll_aio;
+	struct cl_dio_aio *ldp_aio;
 	size_t count = iov_iter_count(iter);
 	ssize_t tot_bytes = 0, result = 0;
 	loff_t file_offset = iocb->ki_pos;
@@ -365,12 +366,12 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	io = lcc->lcc_io;
 	LASSERT(io);
 
-	aio = io->ci_aio;
-	LASSERT(aio);
-	LASSERT(aio->cda_iocb == iocb);
+	ll_aio = io->ci_aio;
+	LASSERT(ll_aio);
+	LASSERT(ll_aio->cda_iocb == iocb);
 
 	while (iov_iter_count(iter)) {
-		struct ll_dio_pages pvec = { .ldp_aio = aio };
+		struct ll_dio_pages pvec = {};
 		struct page **pages;
 
 		count = min_t(size_t, iov_iter_count(iter), MAX_DIO_SIZE);
@@ -382,10 +383,23 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			count = i_size_read(inode) - file_offset;
 		}
 
+		/* this aio is freed on completion from cl_sync_io_note, so we
+		 * do not need to directly free the memory here
+		 */
+		ldp_aio = cl_aio_alloc(iocb, ll_i2info(inode)->lli_clob,
+				       ll_aio);
+		if (!ldp_aio) {
+			result = -ENOMEM;
+			goto out;
+		}
+		pvec.ldp_aio = ldp_aio;
+
 		result = ll_get_user_pages(rw, iter, &pages,
 					   &pvec.ldp_count, count);
-		if (unlikely(result <= 0))
+		if (unlikely(result <= 0)) {
+			cl_sync_io_note(env, &ldp_aio->cda_sync, result);
 			goto out;
+		}
 
 		count = result;
 		pvec.ldp_file_offset = file_offset;
@@ -393,6 +407,10 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 		result = ll_direct_rw_pages(env, io, count,
 					    rw, inode, &pvec);
+		/* We've submitted pages and can now remove the extra
+		 * reference for that
+		 */
+		cl_sync_io_note(env, &ldp_aio->cda_sync, result);
 		ll_free_user_pages(pages, pvec.ldp_count);
 
 		if (unlikely(result < 0))
@@ -404,7 +422,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 out:
-	aio->cda_bytes += tot_bytes;
+	ll_aio->cda_bytes += tot_bytes;
 
 	if (rw == WRITE)
 		vio->u.readwrite.vui_written += tot_bytes;
@@ -424,7 +442,7 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		ssize_t rc2;
 
 		/* Wait here rather than doing async submission */
-		rc2 = cl_sync_io_wait_recycle(env, &aio->cda_sync, 0, 0);
+		rc2 = cl_sync_io_wait_recycle(env, &ll_aio->cda_sync, 0, 0);
 		if (result == 0 && rc2)
 			result = rc2;
 
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index b72f5db..038ab4c 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1138,9 +1138,13 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 	if (!aio->cda_no_aio_complete)
 		aio->cda_iocb->ki_complete(aio->cda_iocb,
 					   ret ?: aio->cda_bytes, 0);
+
+	if (aio->cda_ll_aio)
+		cl_sync_io_note(env, &aio->cda_ll_aio->cda_sync, ret);
 }
 
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj,
+				struct cl_dio_aio *ll_aio)
 {
 	struct cl_dio_aio *aio;
 
@@ -1153,12 +1157,30 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 		cl_sync_io_init_notify(&aio->cda_sync, 1, aio, cl_aio_end);
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
-		if (is_sync_kiocb(iocb))
+		if (is_sync_kiocb(iocb) || ll_aio)
 			aio->cda_no_aio_complete = 1;
 		else
 			aio->cda_no_aio_complete = 0;
+		/* in the case of a lower level aio struct (ll_aio is set), or
+		 * true AIO (!is_sync_kiocb()), the memory is freed by
+		 * the daemons calling cl_sync_io_note, because they are the
+		 * last users of the aio struct
+		 *
+		 * in other cases, the last user is cl_sync_io_wait, and in
+		 * that case, the caller frees the aio struct after that call
+		 * completes
+		 */
+		if (ll_aio || !is_sync_kiocb(iocb))
+			aio->cda_no_aio_free = 0;
+		else
+			aio->cda_no_aio_free = 1;
+
 		cl_object_get(obj);
 		aio->cda_obj = obj;
+		aio->cda_ll_aio = ll_aio;
+
+		if (ll_aio)
+			atomic_add(1, &ll_aio->cda_sync.csi_sync_nr);
 	}
 	return aio;
 }
@@ -1206,14 +1228,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 
 		spin_unlock(&anchor->csi_waitq.lock);
 
-		/**
-		 * For AIO (!is_sync_kiocb), we are responsible for freeing
-		 * memory here. This is because we are the last user of this
-		 * aio struct, whereas in other cases, we will call
-		 * cl_sync_io_wait to wait after this, and so the memory is
-		 * freed after that call.
-		 */
-		if (aio && !is_sync_kiocb(aio->cda_iocb))
+		if (aio && !aio->cda_no_aio_free)
 			cl_aio_free(env, aio);
 	}
 }
@@ -1223,8 +1238,15 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 			    long timeout, int ioret)
 {
+	bool no_aio_free = anchor->csi_aio->cda_no_aio_free;
 	int rc = 0;
 
+	/* for true AIO, the daemons running cl_sync_io_note would normally
+	 * free the aio struct, but if we're waiting on it, we need them to not
+	 * do that. This ensures the aio is not freed when we drop the
+	 * reference count to zero in cl_sync_io_note below
+	 */
+	anchor->csi_aio->cda_no_aio_free = 1;
 	/*
 	 * @anchor was inited as 1 to prevent end_io to be
 	 * called before we add all pages for IO, so drop
@@ -1244,6 +1266,8 @@ int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 	 */
 	atomic_add(1, &anchor->csi_sync_nr);
 
+	anchor->csi_aio->cda_no_aio_free = no_aio_free;
+
 	return rc;
 }
 EXPORT_SYMBOL(cl_sync_io_wait_recycle);
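The ownership rules that cl_aio_alloc() now computes at setup time, and the way cl_sync_io_wait_recycle() temporarily overrides them, can be modelled the same way. This is another hypothetical user-space sketch, not Lustre code: `model_no_aio_complete()` and `model_no_aio_free()` restate the two conditions from the diff, while `model_alloc()`, `model_note()` and `model_wait_recycle()` loosely mirror cl_aio_alloc(), cl_sync_io_note() and cl_sync_io_wait_recycle(). It assumes all I/O has already finished when the waiter drops its reference, so nothing actually waits, and `model_frees` is a test hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of struct cl_dio_aio; not a Lustre API. */
struct model_aio {
	int nr;           /* like cda_sync.csi_sync_nr */
	bool no_aio_free; /* like cda_no_aio_free */
};

static int model_frees; /* test hook: number of structs freed */

/* Restates the cda_no_aio_complete condition: only a top-level, truly
 * asynchronous request calls ki_complete() itself. */
static bool model_no_aio_complete(bool is_sync, bool has_upper)
{
	return is_sync || has_upper;
}

/* Restates the cda_no_aio_free condition: lower-level structs and true
 * AIO are freed by the last note; sync DIO is freed by the caller. */
static bool model_no_aio_free(bool is_sync, bool has_upper)
{
	return is_sync && !has_upper;
}

static struct model_aio *model_alloc(bool is_sync, bool has_upper)
{
	struct model_aio *aio = malloc(sizeof(*aio));

	if (!aio)
		return NULL;
	aio->nr = 1; /* setup reference, dropped once pages are submitted */
	aio->no_aio_free = model_no_aio_free(is_sync, has_upper);
	return aio;
}

static void model_free(struct model_aio *aio)
{
	model_frees++;
	free(aio);
}

/* Drop one reference; returns true if this call freed the struct. */
static bool model_note(struct model_aio *aio)
{
	assert(aio->nr > 0);
	if (--aio->nr == 0 && !aio->no_aio_free) {
		model_free(aio);
		return true;
	}
	return false;
}

/* Drop the waiter's reference without letting note() free the struct,
 * then re-arm it for reuse and restore the original policy. */
static void model_wait_recycle(struct model_aio *aio)
{
	bool saved = aio->no_aio_free;

	aio->no_aio_free = true;
	model_note(aio);
	assert(aio->nr == 0); /* assumption: all I/O already finished */
	aio->nr += 1;
	aio->no_aio_free = saved;
}
```

Saving and restoring the flag, rather than clearing it unconditionally, lets a recycled struct keep its original ownership rule: after the wait, a true AIO struct is again freed by whichever caller drops the last reference.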