From patchwork Wed Jul 7 19:11:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29D2AC07E9C for ; Wed, 7 Jul 2021 19:11:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB1E961C48 for ; Wed, 7 Jul 2021 19:11:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB1E961C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 007962FA8B4; Wed, 7 Jul 2021 12:11:28 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DE07021F888 for ; Wed, 7 Jul 2021 12:11:19 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3691E10090E2; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 30BF69D8B2; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 
15:11:02 -0400 Message-Id: <1625685076-1964-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/15] lustre: osc: Notify server if cache discard takes a long time X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin Discarding a large number of pages from a mapping under a single lock can take a very long time (discarding 750GB takes over 170s). Unlike a read or write, a discard sends no stream of RPCs to the server to prolong the DLM lock timeout, so the server may evict the client because it sees no progress being made. To avoid this, send periodic "empty" RPCs to the server to show that the client is still alive and working on the pages under the lock. 
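The periodic refresh described above can be sketched in plain user-space C. This is only an illustration, not the kernel code: `struct discard_keepalive` and the `keepalive_*` helpers are hypothetical names, and time is passed in as plain seconds. The 5/16 fraction of the timeout and the "zero means disabled" convention are taken from the diff in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-IO state; only the deadline is modelled. */
struct discard_keepalive {
	int64_t next_rpc_time;	/* seconds; 0 means keepalives are disabled */
};

/* Interval between keepalives: 5/16 of the timeout, as in the patch. */
int64_t keepalive_interval(int64_t timeout_sec)
{
	assert(timeout_sec >= 0);
	return 5 * timeout_sec / 16;
}

/* Arm (or re-arm) the deadline relative to the current time. */
void keepalive_arm(struct discard_keepalive *ka, int64_t now,
		   int64_t timeout_sec)
{
	ka->next_rpc_time = now + keepalive_interval(timeout_sec);
}

/*
 * Called once per batch of discarded pages. Returns true when the caller
 * should send an "empty" RPC, and pushes the deadline forward in that case.
 */
bool keepalive_due(struct discard_keepalive *ka, int64_t now,
		   int64_t timeout_sec)
{
	if (ka->next_rpc_time == 0 || now <= ka->next_rpc_time)
		return false;

	keepalive_arm(ka, now, timeout_sec);
	return true;
}
```

With a 100 second timeout the interval works out to 31 seconds, so a discard that runs for several minutes would refresh the lock a handful of times rather than once per page batch.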
For compatibility reasons the RPC is formed as a one-byte OST_READ request with a special flag set to avoid doing actual I/O, but older servers will still perform the one-byte read. WC-bug-id: https://jira.whamcloud.com/browse/LU-14711 Lustre-commit: 564070343ac4ccf4 ("LU-14711 osc: Notify server if cache discard takes a long time") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/43857 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Patrick Farrell Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 3 +++ fs/lustre/osc/osc_cache.c | 11 +++++++++ fs/lustre/osc/osc_internal.h | 1 + fs/lustre/osc/osc_request.c | 54 +++++++++++++++++++++++++++++++++---------- 4 files changed, 57 insertions(+), 12 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index c615091..1495949 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1919,6 +1919,9 @@ struct cl_io { loff_t ls_result; int ls_whence; } ci_lseek; + struct cl_misc_io { + time64_t lm_next_rpc_time; + } ci_misc; } u; struct cl_2queue ci_queue; size_t ci_nob; diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 8dd12b1..321e9d9 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -3186,6 +3186,15 @@ bool osc_page_gang_lookup(const struct lu_env *env, struct cl_io *io, if (!res) break; + + if (io->ci_type == CIT_MISC && + io->u.ci_misc.lm_next_rpc_time && + ktime_get_seconds() > io->u.ci_misc.lm_next_rpc_time) { + osc_send_empty_rpc(osc, idx << PAGE_SHIFT); + io->u.ci_misc.lm_next_rpc_time = ktime_get_seconds() + + 5 * obd_timeout / 16; + } + if (need_resched()) cond_resched(); @@ -3320,6 +3329,8 @@ int osc_lock_discard_pages(const struct lu_env *env, struct osc_object *osc, io->ci_obj = cl_object_top(osc2cl(osc)); io->ci_ignore_layout = 1; + io->u.ci_misc.lm_next_rpc_time = ktime_get_seconds() + + 5 * obd_timeout / 16; result = cl_io_init(env, io, CIT_MISC, io->ci_obj); 
if (result != 0) goto out; diff --git a/fs/lustre/osc/osc_internal.h b/fs/lustre/osc/osc_internal.h index 3b65f2d..d174691 100644 --- a/fs/lustre/osc/osc_internal.h +++ b/fs/lustre/osc/osc_internal.h @@ -87,6 +87,7 @@ int osc_ladvise_base(struct obd_export *exp, struct obdo *oa, int osc_process_config_base(struct obd_device *obd, struct lustre_cfg *cfg); int osc_build_rpc(const struct lu_env *env, struct client_obd *cli, struct list_head *ext_list, int cmd); +void osc_send_empty_rpc(struct osc_object *osc, pgoff_t start); unsigned long osc_lru_reserve(struct client_obd *cli, unsigned long npages); void osc_lru_unreserve(struct client_obd *cli, unsigned long npages); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 0d590ed..2b2ee83 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -1399,21 +1399,23 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli, struct brw_page *pg_prev; void *short_io_buf; const char *obd_name = cli->cl_import->imp_obd->obd_name; - struct inode *inode; + struct inode *inode = NULL; bool directio = false; - inode = page2inode(pga[0]->pg); - if (!inode) { - /* Try to get reference to inode from cl_page if we are - * dealing with direct IO, as handled pages are not - * actual page cache pages. - */ - struct osc_async_page *oap = brw_page2oap(pga[0]); - struct cl_page *clpage = oap2cl_page(oap); + if (pga[0]->pg) { + inode = page2inode(pga[0]->pg); + if (!inode) { + /* Try to get reference to inode from cl_page if we are + * dealing with direct IO, as handled pages are not + * actual page cache pages. 
+ */ + struct osc_async_page *oap = brw_page2oap(pga[0]); + struct cl_page *clpage = oap2cl_page(oap); - inode = clpage->cp_inode; - if (inode) - directio = true; + inode = clpage->cp_inode; + if (inode) + directio = true; + } } if (OBD_FAIL_CHECK(OBD_FAIL_OSC_BRW_PREP_REQ)) return -ENOMEM; /* Recoverable */ @@ -2666,6 +2668,34 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli, return rc; } +/* This is to refresh our lock in face of no RPCs. */ +void osc_send_empty_rpc(struct osc_object *osc, pgoff_t start) +{ + struct ptlrpc_request *req; + struct obdo oa; + struct brw_page bpg = { .off = start, .count = 1}; + struct brw_page *pga = &bpg; + int rc; + + memset(&oa, 0, sizeof(oa)); + oa.o_oi = osc->oo_oinfo->loi_oi; + oa.o_valid = OBD_MD_FLID | OBD_MD_FLGROUP | OBD_MD_FLFLAGS; + /* For updated servers - don't do a read */ + oa.o_flags = OBD_FL_NORPC; + + rc = osc_brw_prep_request(OBD_BRW_READ, osc_cli(osc), &oa, 1, &pga, + &req, 0); + + /* If we succeeded we ship it off, if not there's no point in doing + * anything. Also no resends. + * No interpret callback, no commit callback. 
+ */ + if (!rc) { + req->rq_no_resend = 1; + ptlrpcd_add_req(req); + } +} + static int osc_set_lock_data(struct ldlm_lock *lock, void *data) { int set = 0; From patchwork Wed Jul 7 19:11:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE31EC07E95 for ; Wed, 7 Jul 2021 19:12:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8507B61A13 for ; Wed, 7 Jul 2021 19:12:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8507B61A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1F78D338AA6; Wed, 7 Jul 2021 12:11:53 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4117721F888 for ; Wed, 7 Jul 2021 12:11:20 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3C2B610090E4; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, 
from userid 2004) id 33B339D8BA; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:03 -0400 Message-Id: <1625685076-1964-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/15] lustre: osc: Move shrink update to per-write X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The grant shrink interval is currently updated for each page submitted, rather than once per write. Since the grant shrink interval is in seconds, this is unnecessary. This came up because the update function, osc_update_next_shrink(), showed up in the perf traces for https://review.whamcloud.com/#/c/38151/, and it is called with the cl_loi_list_lock held. Note that this change makes this access to the grant shrink interval a 'dirty' access, without locking, but the grant shrink interval is: A) already accessed like this in various places, and B) safe to be out of date or to suffer a lost update without affecting correctness or performance. 
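The difference between the two update placements can be sketched in user-space C. This is a simplified model and not the Lustre code: `struct grant_client`, `update_next_shrink()` and the two `commit_*` functions are hypothetical names, and locking is left out entirely. It only shows that both variants end with the same deadline while the per-write variant recomputes it once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified client state. */
struct grant_client {
	int64_t shrink_interval;	/* seconds */
	int64_t next_shrink;		/* seconds */
	size_t pages_queued;		/* pages handled so far */
	unsigned long updates;		/* times the deadline was recomputed */
};

/* Recompute the next shrink deadline from the current time. */
void update_next_shrink(struct grant_client *cli, int64_t now)
{
	assert(cli->shrink_interval >= 0);
	cli->next_shrink = now + cli->shrink_interval;
	cli->updates++;
}

/* Before the patch: the deadline is refreshed for every page. */
void commit_per_page(struct grant_client *cli, size_t npages, int64_t now)
{
	for (size_t i = 0; i < npages; i++) {
		cli->pages_queued++;
		update_next_shrink(cli, now);
	}
}

/* After the patch: the deadline is refreshed once per write. */
void commit_per_write(struct grant_client *cli, size_t npages, int64_t now)
{
	for (size_t i = 0; i < npages; i++)
		cli->pages_queued++;

	update_next_shrink(cli, now);
}
```

Because the deadline has one-second granularity, every recomputation within the same write yields the same value, which is why the extra per-page calls add cost without changing the result.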
IOR performance testing with this test:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F
No patches: 5942 MiB/s
With 38151: 14950 MiB/s
With 38151+this: 15320 MiB/s
WC-bug-id: https://jira.whamcloud.com/browse/LU-13419 Lustre-commit: c24c25dc1b84912 ("LU-13419 osc: Move shrink update to per-write") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/38214 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 1 - fs/lustre/osc/osc_io.c | 5 +++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 321e9d9..0f0daa1 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1426,7 +1426,6 @@ static void osc_consume_write_grant(struct client_obd *cli, pga->flag |= OBD_BRW_FROM_GRANT; CDEBUG(D_CACHE, "using %lu grant credits for brw %p page %p\n", PAGE_SIZE, pga, pga->pg); - osc_update_next_shrink(cli); } /* the companion to osc_consume_write_grant, called when a brw has completed. diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index de214ba..67fe85b 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -354,6 +354,11 @@ int osc_io_commit_async(const struct lu_env *env, pagevec_reinit(pvec); } } + /* The shrink interval is in seconds, so we can update it once per + * write, rather than once per page. 
+ */ + osc_update_next_shrink(osc_cli(osc)); + /* Clean up any partially full pagevecs */ if (pagevec_count(pvec) != 0) From patchwork Wed Jul 7 19:11:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E824C07E9C for ; Wed, 7 Jul 2021 19:11:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15B8A61A13 for ; Wed, 7 Jul 2021 19:11:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15B8A61A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC1933379F4; Wed, 7 Jul 2021 12:11:25 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 78DDD21F888 for ; Wed, 7 Jul 2021 12:11:20 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3ECEF10090E5; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 35B8C9D8BC; Wed, 
7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:04 -0400 Message-Id: <1625685076-1964-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/15] lustre: client: don't panic for mgs evictions X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ben Evans , Alexander Boyko , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alexander Boyko Avoid client panics on MGS evictions. Add a helper, do_dump_on_eviction(), that checks whether the evicted device is an MGC (that is, the eviction came from an MGS) and, if so, skips the LBUG; the debug log dump still follows obd_dump_on_eviction. Rework the dump_on_eviction and lbug_on_eviction handling so that all of the logic lives in one place. 
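The decision made on eviction can be sketched as a small pure function in user-space C. This is an illustration under stated assumptions, not the kernel helper: `MGC_TYPE_NAME` is a stand-in for `LUSTRE_MGC_NAME` (its value here is assumed), and `struct eviction_action`, `is_mgc_type()` and `eviction_policy()` are hypothetical names. In the real code the LBUG happens immediately and nothing after it runs; the sketch returns both decisions so they can be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: stand-in for LUSTRE_MGC_NAME. */
#define MGC_TYPE_NAME "mgc"

/* What the client should do when an import is evicted. */
struct eviction_action {
	bool lbug;	/* panic the client */
	bool dump;	/* dump the debug log */
};

/* Prefix match on the device type name, mirroring the strncmp() in the patch. */
bool is_mgc_type(const char *type_name)
{
	assert(type_name != NULL);
	return strncmp(type_name, MGC_TYPE_NAME, strlen(MGC_TYPE_NAME)) == 0;
}

/* Never LBUG for an MGC device; the dump setting applies to every device. */
struct eviction_action eviction_policy(const char *type_name,
				       bool lbug_on_eviction,
				       bool dump_on_eviction)
{
	struct eviction_action act = {
		.lbug = lbug_on_eviction && !is_mgc_type(type_name),
		.dump = dump_on_eviction,
	};

	return act;
}
```

Keeping both tunables behind one function means a caller cannot test one flag and forget the MGC exception for the other.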
HPE-bug-id: LUS-197 WC-bug-id: https://jira.whamcloud.com/browse/LU-13811 Lustre-commit: 5d8f6742e65d588d ("LU-13811 client: don't panic for mgs evictions") Signed-off-by: Alexander Boyko Signed-off-by: Ben Evans Reviewed-on: https://review.whamcloud.com/43655 Reviewed-by: Andriy Skulysh Reviewed-by: Alexander Zarochentsev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_class.h | 15 +++++++++++++++ fs/lustre/ptlrpc/import.c | 5 ++--- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 2fe4ea2..f2a3d2b 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1701,6 +1701,21 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids, int class_procfs_init(void); int class_procfs_clean(void); +extern unsigned int obd_lbug_on_eviction; +extern unsigned int obd_dump_on_eviction; + +static inline bool do_dump_on_eviction(struct obd_device *exp_obd) +{ + if (obd_lbug_on_eviction && + strncmp(exp_obd->obd_type->typ_name, LUSTRE_MGC_NAME, + strlen(LUSTRE_MGC_NAME))) { + CERROR("LBUG upon eviction\n"); + LBUG(); + } + + return obd_dump_on_eviction; +} + /* statfs_pack.c */ struct kstatfs; void statfs_pack(struct obd_statfs *osfs, struct kstatfs *sfs); diff --git a/fs/lustre/ptlrpc/import.c b/fs/lustre/ptlrpc/import.c index 1f31edb..f28fb68 100644 --- a/fs/lustre/ptlrpc/import.c +++ b/fs/lustre/ptlrpc/import.c @@ -1473,13 +1473,12 @@ static int ptlrpc_invalidate_import_thread(void *data) imp->imp_obd->obd_name, obd2cli_tgt(imp->imp_obd), imp->imp_connection->c_remote_uuid.uuid); - ptlrpc_invalidate_import(imp); - - if (obd_dump_on_eviction) { + if (do_dump_on_eviction(imp->imp_obd)) { CERROR("dump the log upon eviction\n"); libcfs_debug_dumplog(); } + ptlrpc_invalidate_import(imp); import_set_state(imp, LUSTRE_IMP_RECOVER); ptlrpc_import_recovery_state_machine(imp); From patchwork Wed Jul 7 19:11:05 2021 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 58A91C07E95 for ; Wed, 7 Jul 2021 19:12:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1D98561C48 for ; Wed, 7 Jul 2021 19:12:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D98561C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6DE6338A3E; Wed, 7 Jul 2021 12:11:55 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B20DA21F888 for ; Wed, 7 Jul 2021 12:11:20 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4056410090E6; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3939E9D8BD; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:05 -0400 Message-Id: 
<1625685076-1964-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/15] lnet: Add health ping stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Add the NI and peer NI ping count and next ping timestamp to detailed output of lnetctl peer and net output. HPE-bug-id: LUS-9109 WC-bug-id: https://jira.whamcloud.com/browse/LU-13569 Lustre-commit: 4c7e4aa576296603 ("LU-13569 lnet: Add health ping stats") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/40314 Reviewed-by: Alexander Boyko Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lnet/lnet-dlc.h | 4 ++++ net/lnet/lnet/api-ni.c | 2 ++ net/lnet/lnet/peer.c | 7 +++++-- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index b375d0a..c1c063f 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -191,6 +191,8 @@ struct lnet_ioctl_local_ni_hstats { __u32 hlni_local_timeout; __u32 hlni_local_error; __s32 hlni_health_value; + __u32 hlni_ping_count; + __u64 hlni_next_ping; }; struct lnet_ioctl_peer_ni_hstats { @@ -199,6 +201,8 @@ struct lnet_ioctl_peer_ni_hstats { __u32 hlpni_remote_error; __u32 hlpni_network_timeout; __s32 hlpni_health_value; + __u32 hlpni_ping_count; + __u64 hlpni_next_ping; }; struct lnet_ioctl_element_msg_stats { diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index d6a8c1b..e52bb41 100644 
--- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3634,6 +3634,8 @@ u32 lnet_get_dlc_seq_locked(void) atomic_read(&ni->ni_hstats.hlt_local_error); stats->hlni_health_value = atomic_read(&ni->ni_healthv); + stats->hlni_ping_count = ni->ni_ping_count; + stats->hlni_next_ping = ni->ni_next_ping; unlock: lnet_net_unlock(cpt); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 2fc784d..76b2d2f 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3986,6 +3986,8 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) atomic_read(&lpni->lpni_hstats.hlt_remote_error); lpni_hstats->hlpni_health_value = atomic_read(&lpni->lpni_healthv); + lpni_hstats->hlpni_ping_count = lpni->lpni_ping_count; + lpni_hstats->hlpni_next_ping = lpni->lpni_next_ping; if (copy_to_user(bulk, lpni_hstats, sizeof(*lpni_hstats))) goto out_free_hstats; bulk += sizeof(*lpni_hstats); @@ -4081,7 +4083,7 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lnet_net_unlock(LNET_LOCK_EX); return; } - atomic_set(&lpni->lpni_healthv, value); + lnet_set_lpni_healthv_locked(lpni, value); lnet_peer_ni_add_to_recoveryq_locked(lpni, &the_lnet.ln_mt_peerNIRecovq, now); lnet_peer_ni_decref_locked(lpni); @@ -4102,7 +4104,8 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk) lpn_peer_nets) { list_for_each_entry(lpni, &lpn->lpn_peer_nis, lpni_peer_nis) { - atomic_set(&lpni->lpni_healthv, value); + lnet_set_lpni_healthv_locked(lpni, + value); lnet_peer_ni_add_to_recoveryq_locked(lpni, &the_lnet.ln_mt_peerNIRecovq, now); From patchwork Wed Jul 7 19:11:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363927 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, 
HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B64EC07E9B for ; Wed, 7 Jul 2021 19:12:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E379B61C48 for ; Wed, 7 Jul 2021 19:12:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E379B61C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5D398338A80; Wed, 7 Jul 2021 12:11:44 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 11FD321F888 for ; Wed, 7 Jul 2021 12:11:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 44B2310090EE; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3BF119D8BF; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:06 -0400 Message-Id: <1625685076-1964-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/15] lnet: Ensure ref taken when queueing for discovery X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Call lnet_peer_queue_for_discovery() in lnet_discovery_event_handler() to ensure that we take a ref on the peer when forcing it onto the discovery queue. This also ensures that the peer state has LNET_PEER_DISCOVERING. Add a test to sanity-lnet.sh that can trigger the refcount loss bug in discovery. HPE-bug-id: LUS-7651 WC-bug-id: https://jira.whamcloud.com/browse/LU-14627 Lustre-commit: 2ce6957b69370b0c ("LU-14627 lnet: Ensure ref taken when queueing for discovery") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/43418 Reviewed-by: Serguei Smirnov Reviewed-by: Alexander Boyko Reviewed-by: James Simmons Reviewed-by: Stephane Thiell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 76b2d2f..29c3372 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2783,7 +2783,8 @@ static void lnet_discovery_event_handler(struct lnet_event *event) /* Put peer back at end of request queue, if discovery not already * done */ - if (rc == LNET_REDISCOVER_PEER && !lnet_peer_is_uptodate(lp)) { + if (rc == LNET_REDISCOVER_PEER && !lnet_peer_is_uptodate(lp) && + lnet_peer_queue_for_discovery(lp)) { list_move_tail(&lp->lp_dc_list, &the_lnet.ln_dc_request); wake_up(&the_lnet.ln_dc_waitq); } From patchwork Wed Jul 7 19:11:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27591C07E9B for ; Wed, 7 Jul 2021 19:11:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3131961A13 for ; Wed, 7 Jul 2021 19:11:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3131961A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2FC0A309ED3; Wed, 7 Jul 2021 12:11:29 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4905421F888 for ; Wed, 7 Jul 2021 12:11:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 446E610090E8; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3EDD59D8C0; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:07 -0400 Message-Id: <1625685076-1964-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> 
Subject: [lustre-devel] [PATCH 06/15] lnet: Correct distance calculation of local NIDs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Multi-rail peers can have multiple local NIDs on the same net, but LNetDist() may only identify a NID as local if it is the first one returned by lnet_get_next_ni_locked(). We need to check all local NIs to find a match for the target NID in LNetDist(). Add test to check LNetDist() calculation of local NIDs for a peer with multiple NIDs on the same net. HPE-bug-id: LUS-9964 WC-bug-id: https://jira.whamcloud.com/browse/LU-14649 Lustre-commit: 4d0162037415988b ("LU-14649 lnet: Correct distance calculation of local NIDs") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/43498 Reviewed-by: Serguei Smirnov Reviewed-by: Alexander Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 40 +++++++++++++++++++++++++++------------- 1 file changed, 27 insertions(+), 13 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 3ae0209..33d7e78 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -4981,6 +4981,7 @@ struct lnet_msg * int cpt; u32 order = 2; struct list_head *rn_list; + bool matched_dstnet = false; /* * if !local_nid_dist_zero, I don't return a distance of 0 ever @@ -5007,27 +5008,40 @@ struct lnet_msg * return local_nid_dist_zero ? 0 : 1; } - if (LNET_NIDNET(ni->ni_nid) == dstnet) { - /* - * Check if ni was originally created in - * current net namespace. - * If not, assign order above 0xffff0000, - * to make this ni not a priority. 
+ if (!matched_dstnet && LNET_NIDNET(ni->ni_nid) == dstnet) { + matched_dstnet = true; + /* We matched the destination net, but we may have + * additional local NIs to inspect. + * + * We record the nid and order as appropriate, but + * they may be overwritten if we match local NI above. */ - if (current->nsproxy && - !net_eq(ni->ni_net_ns, current->nsproxy->net_ns)) - order += 0xffff0000; if (srcnidp) *srcnidp = ni->ni_nid; - if (orderp) - *orderp = order; - lnet_net_unlock(cpt); - return 1; + + if (orderp) { + /* Check if ni was originally created in + * current net namespace. + * If not, assign order above 0xffff0000, + * to make this ni not a priority. + */ + if (current->nsproxy && + !net_eq(ni->ni_net_ns, + current->nsproxy->net_ns)) + *orderp = order + 0xffff0000; + else + *orderp = order; + } } order++; } + if (matched_dstnet) { + lnet_net_unlock(cpt); + return 1; + } + rn_list = lnet_net2rnethash(dstnet); list_for_each_entry(rnet, rn_list, lrn_list) { if (rnet->lrn_net == dstnet) { From patchwork Wed Jul 7 19:11:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE2B0C07E9B for ; Wed, 7 Jul 2021 19:11:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 58B7061C48 for ; Wed, 7 Jul 2021 
19:11:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 58B7061C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5456221FDD7; Wed, 7 Jul 2021 12:11:25 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8433921F888 for ; Wed, 7 Jul 2021 12:11:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 488D610090EF; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 429BA9D8AD; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:08 -0400 Message-Id: <1625685076-1964-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/15] lnet: socklnd: detect link state to set fatal error on ni X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov To help avoid selecting lnet ni which corresponds to a downed ethernet link for sending, add a mechanism for detecting link events in socklnd. On link up/down events, find corresponding ni and toggle ni_fatal_error_on flag, similar to o2iblnd way. 
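For readers following the description above, the decision made on each link event can be sketched in plain userspace C. This is only an illustration of the logic in the diff below, not the kernel code: the sketch_* names and enums are hypothetical stand-ins (the patch itself uses dev->operstate, IF_OPER_UP and the ethtool get_link callback), and the "unknown" state models a driver that provides no link settings.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's operstate (IF_OPER_UP vs. anything else). */
enum sketch_operstate { SKETCH_OPER_DOWN = 0, SKETCH_OPER_UP = 1 };

/* Tri-state result of the link query, as in the patch:
 * 0 = link down (or device not running), 1 = link up,
 * -1 = the driver cannot report link state. */
enum sketch_link {
	SKETCH_LINK_UNKNOWN = -1,
	SKETCH_LINK_DOWN = 0,
	SKETCH_LINK_UP = 1,
};

/* Returns the value the fatal-error flag would be set to for one event. */
int sketch_fatal_error_on(enum sketch_operstate operstate,
			  enum sketch_link link)
{
	/* Operational state is not "up": always flag the interface. */
	if (operstate != SKETCH_OPER_UP)
		return 1;

	/* Operational state is "up": flag only if the link query says
	 * "down". An unknown link state is treated as healthy. */
	return link == SKETCH_LINK_DOWN;
}
```

Under these assumptions an interface whose driver cannot report link state is never marked fatal on an "up" event, which matches the -1 default in ksocknal_get_link_status().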
WC-bug-id: https://jira.whamcloud.com/browse/LU-14742 Lustre-commit: fc2df80e96dc5db9f ("LU-14742 socklnd: detect link state to set fatal error on ni") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/43952 Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd.c | 78 ++++++++++++++++++++++++++++++++++++++++ net/lnet/klnds/socklnd/socklnd.h | 1 + 2 files changed, 79 insertions(+) diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index eb8c736..e15f1c0 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1843,6 +1843,78 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) } } +static int ksocknal_get_link_status(struct net_device *dev) +{ + int ret = -1; + + LASSERT(dev); + + if (!netif_running(dev)) + ret = 0; + /* Some devices may not be providing link settings */ + else if (dev->ethtool_ops->get_link) + ret = dev->ethtool_ops->get_link(dev); + + return ret; +} + +static int +ksocknal_handle_link_state_change(struct net_device *dev, + unsigned char operstate) +{ + struct lnet_ni *ni; + struct ksock_net *net; + struct ksock_net *cnxt; + int ifindex; + unsigned char link_down = !(operstate == IF_OPER_UP); + + ifindex = dev->ifindex; + + if (!ksocknal_data.ksnd_nnets) + goto out; + + list_for_each_entry_safe(net, cnxt, &ksocknal_data.ksnd_nets, + ksnn_list) { + if (net->ksnn_interface.ksni_index != ifindex) + continue; + ni = net->ksnn_ni; + if (link_down) + atomic_set(&ni->ni_fatal_error_on, link_down); + else + atomic_set(&ni->ni_fatal_error_on, + (ksocknal_get_link_status(dev) == 0)); + } +out: + return 0; +} + + +/************************************ + * Net device notifier event handler + ************************************/ +static int ksocknal_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct 
net_device *dev = netdev_notifier_info_to_dev(ptr); + unsigned char operstate; + + operstate = dev->operstate; + + switch (event) { + case NETDEV_UP: + case NETDEV_DOWN: + case NETDEV_CHANGE: + ksocknal_handle_link_state_change(dev, operstate); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block ksocknal_notifier_block = { + .notifier_call = ksocknal_device_event, +}; + static void ksocknal_base_shutdown(void) { @@ -1852,6 +1924,9 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) LASSERT(!ksocknal_data.ksnd_nnets); + if (ksocknal_data.ksnd_init == SOCKNAL_INIT_ALL) + unregister_netdevice_notifier(&ksocknal_notifier_block); + switch (ksocknal_data.ksnd_init) { default: LASSERT(0); @@ -2015,6 +2090,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) goto failed; } + register_netdevice_notifier(&ksocknal_notifier_block); + /* flag everything initialised */ ksocknal_data.ksnd_init = SOCKNAL_INIT_ALL; @@ -2297,6 +2374,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) ni->ni_nid = LNET_MKNID(LNET_NIDNET(ni->ni_nid), ntohl(((struct sockaddr_in *)&ksi->ksni_addr)->sin_addr.s_addr)); list_add(&net->ksnn_list, &ksocknal_data.ksnd_nets); + net->ksnn_ni = ni; ksocknal_data.ksnd_nnets++; return 0; diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index dac8559..357769a 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -175,6 +175,7 @@ struct ksock_net { struct list_head ksnn_list; /* chain on global list */ atomic_t ksnn_npeers; /* # peers */ struct ksock_interface ksnn_interface; /* IP interface */ + struct lnet_ni *ksnn_ni; }; /* When the ksock_net is shut down, this bias is added to From patchwork Wed Jul 7 19:11:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363917 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C13FC07E9B for ; Wed, 7 Jul 2021 19:11:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 430F261C48 for ; Wed, 7 Jul 2021 19:11:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 430F261C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 95C93338A1D; Wed, 7 Jul 2021 12:11:32 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EEF2F21F8CC for ; Wed, 7 Jul 2021 12:11:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 49CF810090F0; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 45DC79D8B2; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:09 -0400 Message-Id: <1625685076-1964-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: 
<1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/15] lustre: mdt: New connect flag for non-open-by-fid lock request X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin While we removed the 2.1 check for open by fid when open lock is requested, when you talk to old servers that don't have that patch - they get an open error, so introduce a compat flag. Fixes: c9e0538f2b ("lustre: llite: Introduce inode open heat counter") WC-bug-id: https://jira.whamcloud.com/browse/LU-10948 Lustre-commit: 72c9a6e5fb6e11fca ("LU-10948 mdt: New connect flag for non-open-by-fid lock request") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/43907 Reviewed-by: Andreas Dilger Reviewed-by: James Nunez Signed-off-by: James Simmons --- fs/lustre/llite/llite_lib.c | 3 ++- fs/lustre/llite/namei.c | 4 +++- fs/lustre/obdclass/lprocfs_status.c | 6 ++++++ fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_idl.h | 1 + 5 files changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 646bff8..b131edd 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -316,7 +316,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT2_CRUSH | OBD_CONNECT2_LSEEK | OBD_CONNECT2_GETATTR_PFID | OBD_CONNECT2_DOM_LVB | - OBD_CONNECT2_REP_MBITS; + OBD_CONNECT2_REP_MBITS | + OBD_CONNECT2_ATOMIC_OPEN_LOCK; if (sbi->ll_flags & LL_SBI_LRU_RESIZE) data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index f42e872..f32aa14 100644 --- a/fs/lustre/llite/namei.c +++ 
b/fs/lustre/llite/namei.c @@ -1145,7 +1145,9 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, * we only need to request open lock if it was requested * for every open */ - if (ll_i2sbi(dir)->ll_oc_thrsh_count == 1) + if (ll_i2sbi(dir)->ll_oc_thrsh_count == 1 && + exp_connect_flags2(ll_i2mdexp(dir)) & + OBD_CONNECT2_ATOMIC_OPEN_LOCK) it->it_flags |= MDS_OPEN_LOCK; /* Dentry added to dcache tree in ll_lookup_it */ diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c index 0cad91d..db809f3 100644 --- a/fs/lustre/obdclass/lprocfs_status.c +++ b/fs/lustre/obdclass/lprocfs_status.c @@ -131,6 +131,12 @@ "lseek", /* 0x40000 */ "dom_lvb", /* 0x80000 */ "reply_mbits", /* 0x100000 */ + "mode_convert", /* 0x200000 */ + "batch_rpc", /* 0x400000 */ + "pcc_ro", /* 0x800000 */ + "mne_nid_type", /* 0x1000000 */ + "lock_contend", /* 0x2000000 */ + "atomic_open_lock", /* 0x4000000 */ NULL }; diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index db97748..9e0eaa7 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1252,6 +1252,8 @@ void lustre_assert_wire_constants(void) OBD_CONNECT2_DOM_LVB); LASSERTF(OBD_CONNECT2_REP_MBITS == 0x100000ULL, "found 0x%.16llxULL\n", OBD_CONNECT2_REP_MBITS); + LASSERTF(OBD_CONNECT2_ATOMIC_OPEN_LOCK == 0x4000000ULL, "found 0x%.16llxULL\n", + OBD_CONNECT2_ATOMIC_OPEN_LOCK); LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n", (unsigned int)OBD_CKSUM_CRC32); LASSERTF(OBD_CKSUM_ADLER == 0x00000002UL, "found 0x%.8xUL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 813e4fc..68bb807 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -840,6 +840,7 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT2_LSEEK 0x40000ULL /* SEEK_HOLE/DATA RPC */ #define OBD_CONNECT2_DOM_LVB 0x80000ULL /* pack DOM glimpse data in LVB */ #define OBD_CONNECT2_REP_MBITS 
0x100000ULL /* match reply by mbits, not xid */ +#define OBD_CONNECT2_ATOMIC_OPEN_LOCK 0x4000000ULL/* request lock on 1st open */ /* XXX README XXX: * Please DO NOT add flag values here before first ensuring that this same * flag value is not in use on some other branch. Please clear any such From patchwork Wed Jul 7 19:11:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44DF4C07E95 for ; Wed, 7 Jul 2021 19:11:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 030CD61C48 for ; Wed, 7 Jul 2021 19:11:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 030CD61C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1065E337D76; Wed, 7 Jul 2021 12:11:36 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4522A21F8CC for ; Wed, 7 Jul 2021 12:11:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by 
smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4B437100BB0E; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 48BF69D8BA; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:10 -0400 Message-Id: <1625685076-1964-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/15] lustre: obdclass: Wake up entire queue of requests on close completion X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin Since close requests could be stuck behind normal requests and get more slots we need to wake up entire accumulated queue waiting for the next modrpc slot or have additional waitqueue just for close requests. This patch goes with the former approach. 
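The "wake the whole queue" approach can be illustrated with a small userspace analogue built on POSIX condition variables. This is a sketch under stated assumptions, not the obdclass code: every sketch_* name is hypothetical, and the rule that a close request may use one slot beyond the normal limit is an assumption taken from the "get more slots" wording above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the modify-RPC slot accounting. */
struct sketch_slot_pool {
	pthread_mutex_t lock;
	pthread_cond_t waitq;
	unsigned int in_flight;
	unsigned int max_in_flight;
};

void sketch_pool_init(struct sketch_slot_pool *pool, unsigned int max)
{
	pthread_mutex_init(&pool->lock, NULL);
	pthread_cond_init(&pool->waitq, NULL);
	pool->in_flight = 0;
	pool->max_in_flight = max;
}

/* Assumption: a close request may take one slot beyond the normal limit.
 * Caller must hold pool->lock when other threads are active. */
bool sketch_can_take(const struct sketch_slot_pool *pool, bool close_req)
{
	unsigned int limit = pool->max_in_flight + (close_req ? 1 : 0);

	return pool->in_flight < limit;
}

/* Block until a slot is available for this kind of request. */
void sketch_get_slot(struct sketch_slot_pool *pool, bool close_req)
{
	pthread_mutex_lock(&pool->lock);
	while (!sketch_can_take(pool, close_req))
		pthread_cond_wait(&pool->waitq, &pool->lock);
	pool->in_flight++;
	pthread_mutex_unlock(&pool->lock);
}

/* Release a slot and wake waiters. */
void sketch_put_slot(struct sketch_slot_pool *pool, bool close_req)
{
	pthread_mutex_lock(&pool->lock);
	assert(pool->in_flight > 0);
	pool->in_flight--;
	pthread_mutex_unlock(&pool->lock);

	/* After a close completes, wake every waiter so that a queued
	 * close is not left sleeping behind a normal request that cannot
	 * use the freed slot. Otherwise waking one waiter is enough. */
	if (close_req)
		pthread_cond_broadcast(&pool->waitq);
	else
		pthread_cond_signal(&pool->waitq);
}
```

The alternative mentioned above, a second wait queue used only by close requests, would avoid waking normal waiters that then go back to sleep, at the cost of more state to keep consistent.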
Fixes: 7cb15d0448 ("staging: lustre: mdc: manage number of modify RPCs in flight") WC-bug-id: https://jira.whamcloud.com/browse/LU-14741 Lustre-commit: a4e1567d67559b797 ("LU-14741 obdclass: Wake up entire queue of requests on close completion") Signed-off-by: Oleg Drokin Reviewed-on: https://review.whamcloud.com/43941 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Neil Brown Signed-off-by: James Simmons --- fs/lustre/obdclass/genops.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index bbb63b2..4e89e0a 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1587,6 +1587,10 @@ void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag) LASSERT(tag - 1 < OBD_MAX_RIF_MAX); LASSERT(test_and_clear_bit(tag - 1, cli->cl_mod_tag_bitmap) != 0); spin_unlock(&cli->cl_mod_rpcs_lock); - wake_up(&cli->cl_mod_rpcs_waitq); + /* LU-14741 - to prevent close RPCs stuck behind normal ones */ + if (close_req) + wake_up_all(&cli->cl_mod_rpcs_waitq); + else + wake_up(&cli->cl_mod_rpcs_waitq); } EXPORT_SYMBOL(obd_put_mod_rpc_slot); From patchwork Wed Jul 7 19:11:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363923 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F0BBC07E95 for ; Wed, 7 Jul 2021 19:11:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F3A5161A13 for ; Wed, 7 Jul 2021 19:11:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F3A5161A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 54F19338A54; Wed, 7 Jul 2021 12:11:39 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8673421F8D5 for ; Wed, 7 Jul 2021 12:11:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4F243100F361; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4BCC99D8BC; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:11 -0400 Message-Id: <1625685076-1964-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/15] lnet: add netlink infrastructure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" Netlink was designed as a successor to ioctl as defined under RFC 3549. There are several advantages to using netlink over ioctls or virtual file system interfaces like proc. 
Collecting proc doesn't scale well, which was seen with power drain on Android phones. A netlink implementation was developed to remove this performance hit. Details can be read at: https://lwn.net/Articles/406975 Besides the scaling gains, the other benefit is the flexibility with API changes. Adding or removing information to be transmitted doesn't require creating a new interface like ioctls do. Instead you add new code to handle the stream of attributes read from the socket. Lastly you can multiplex data to N listeners with groups using one request. This patch adds netlink handling in a generic way that can be used by the libyaml library. This greatly lowers the barrier by only requiring the implementor to understand the libyaml API. WC-bug-id: https://jira.whamcloud.com/browse/LU-9680 Lustre-commit: 3c39dac19aaf7f3f ("LU-9680 utils: add netlink infrastructure") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/34230 Reviewed-by: Petros Koutoupis Reviewed-by: Ben Evans Reviewed-by: Oleg Drokin --- include/linux/lnet/lib-types.h | 15 ++++++++ include/uapi/linux/lnet/lnet-nl.h | 67 ++++++++++++++++++++++++++++++++ net/lnet/lnet/api-ni.c | 81 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+) create mode 100644 include/uapi/linux/lnet/lnet-nl.h diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index cb0a950..64d7472 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -43,7 +43,9 @@ #include #include #include +#include +#include #include #include #include @@ -1280,4 +1282,17 @@ struct lnet { struct list_head ln_udsp_list; }; +static const struct nla_policy scalar_attr_policy[LN_SCALAR_CNT + 1] = { + [LN_SCALAR_ATTR_LIST] = { .type = NLA_NESTED }, + [LN_SCALAR_ATTR_LIST_SIZE] = { .type = NLA_U16 }, + [LN_SCALAR_ATTR_INDEX] = { .type = NLA_U16 }, + [LN_SCALAR_ATTR_NLA_TYPE] = { .type = NLA_U16 }, + [LN_SCALAR_ATTR_VALUE] = { .type = NLA_STRING }, +
[LN_SCALAR_ATTR_KEY_FORMAT] = { .type = NLA_U16 }, +}; + +int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq, + const struct genl_family *family, int flags, + u8 cmd, const struct ln_key_list *data[]); + #endif diff --git a/include/uapi/linux/lnet/lnet-nl.h b/include/uapi/linux/lnet/lnet-nl.h new file mode 100644 index 0000000..f5bb67c --- /dev/null +++ b/include/uapi/linux/lnet/lnet-nl.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: LGPL-2.0+ WITH Linux-syscall-note */ +/* + * LGPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. 
+ * + * LGPL HEADER END + * + */ +/* Copyright (c) 2021, UT-Battelle, LLC + * + * Author: James Simmons + */ + +#ifndef __UAPI_LNET_NL_H__ +#define __UAPI_LNET_NL_H__ + +#include + +enum lnet_nl_key_format { + /* Is it FLOW or BLOCK */ + LNKF_FLOW = 1, + /* Is it SEQUENCE or MAPPING */ + LNKF_MAPPING = 2, + LNKF_SEQUENCE = 4, +}; + +enum lnet_nl_scalar_attrs { + LN_SCALAR_ATTR_UNSPEC = 0, + LN_SCALAR_ATTR_LIST, + + LN_SCALAR_ATTR_LIST_SIZE, + LN_SCALAR_ATTR_INDEX, + LN_SCALAR_ATTR_NLA_TYPE, + LN_SCALAR_ATTR_VALUE, + LN_SCALAR_ATTR_KEY_FORMAT, + + __LN_SCALAR_ATTR_LAST, +}; + +#define LN_SCALAR_CNT (__LN_SCALAR_ATTR_LAST - 1) + +struct ln_key_props { + char *lkp_values; + __u16 lkp_key_format; + __u16 lkp_data_type; +}; + +struct ln_key_list { + __u16 lkl_maxattr; + struct ln_key_props lkl_list[]; +}; + +#endif /* __UAPI_LNET_NL_H__ */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index e52bb41..687df3b 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2572,6 +2572,87 @@ static void lnet_push_target_fini(void) return rc; } +static int lnet_genl_parse_list(struct sk_buff *msg, + const struct ln_key_list *data[], u16 idx) +{ + const struct ln_key_list *list = data[idx]; + const struct ln_key_props *props; + struct nlattr *node; + u16 count; + + if (!list) + return 0; + + if (!list->lkl_maxattr) + return -ERANGE; + + props = list->lkl_list; + if (!props) + return -EINVAL; + + node = nla_nest_start(msg, LN_SCALAR_ATTR_LIST); + if (!node) + return -ENOBUFS; + + for (count = 1; count <= list->lkl_maxattr; count++) { + struct nlattr *key = nla_nest_start(msg, count); + + if (count == 1) + nla_put_u16(msg, LN_SCALAR_ATTR_LIST_SIZE, + list->lkl_maxattr); + + nla_put_u16(msg, LN_SCALAR_ATTR_INDEX, count); + if (props[count].lkp_values) + nla_put_string(msg, LN_SCALAR_ATTR_VALUE, + props[count].lkp_values); + if (props[count].lkp_key_format) + nla_put_u16(msg, LN_SCALAR_ATTR_KEY_FORMAT, + props[count].lkp_key_format); + nla_put_u16(msg, 
LN_SCALAR_ATTR_NLA_TYPE, + props[count].lkp_data_type); + if (props[count].lkp_data_type == NLA_NESTED) { + int rc; + + rc = lnet_genl_parse_list(msg, data, ++idx); + if (rc < 0) + return rc; + } + + nla_nest_end(msg, key); + } + + nla_nest_end(msg, node); + return 0; +} + +int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq, + const struct genl_family *family, int flags, + u8 cmd, const struct ln_key_list *data[]) +{ + int rc = 0; + void *hdr; + + if (!data[0]) + return -EINVAL; + + hdr = genlmsg_put(msg, portid, seq, family, flags, cmd); + if (!hdr) { + rc = -EMSGSIZE; + goto canceled; + } + + rc = lnet_genl_parse_list(msg, data, 0); + if (rc < 0) + goto canceled; + + genlmsg_end(msg, hdr); +canceled: + if (rc < 0) + genlmsg_cancel(msg, hdr); + return rc; +} +EXPORT_SYMBOL(lnet_genl_send_scalar_list); + /** * Initialize LNet library. * From patchwork Wed Jul 7 19:11:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3356FC07E9C for ; Wed, 7 Jul 2021 19:11:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 14C4561A13 for ; Wed, 7 Jul 2021 19:11:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14C4561A13 Authentication-Results: mail.kernel.org; 
dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2619F337DC1; Wed, 7 Jul 2021 12:11:36 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D2E5621F925 for ; Wed, 7 Jul 2021 12:11:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 510B5100F362; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4F19A9D8BD; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:12 -0400 Message-Id: <1625685076-1964-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/15] lustre: llite: parallelize direct i/o issuance X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Currently, the direct i/o code issues an i/o to a given stripe, and then waits for that i/o to complete. (This is for i/os from a single process.) This forces DIO to send only one RPC at a time, serially. 
In the case of multi-stripe files and larger i/os from userspace, this means that i/o is serialized - so single thread/single process direct i/o doesn't see any benefit from the combination of extra stripes & larger i/os. Using part of the AIO support, it is possible to move this waiting up a level, so it happens after all the i/o is issued. (See LU-4198 for AIO support.) This means we can issue many RPCs and then wait, dramatically improving performance vs waiting for each RPC serially. This is referred to as 'parallel dio'. Notes: AIO is not supported on pipes, so we fall back to the old sync behavior if the source or destination is a pipe. Error handling is similar to buffered writes: We do not wait for individual chunks, so we can get an error on an RPC in the middle of an i/o. The solution is to return an error in this case, because we cannot know how many bytes were written contiguously. This is similar to buffered i/o combined with fsync(). The performance improvement from this is dramatic, and greater at larger sizes. lfs setstripe -c 8 -S 4M . 
mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect Without the patch: write 764.85 MiB/s read 682.87 MiB/s With patch: write 4030 MiB/s read 4468 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13798 Lustre-commit: cba07b68f9386b61 ("LU-13798 llite: parallelize direct i/o issuance") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39436 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 10 +++++++- fs/lustre/include/lustre_osc.h | 2 +- fs/lustre/llite/file.c | 51 ++++++++++++++++++++++++++++++++++++---- fs/lustre/llite/llite_internal.h | 9 +++++++ fs/lustre/llite/llite_lib.c | 1 + fs/lustre/llite/lproc_llite.c | 37 +++++++++++++++++++++++++++++ fs/lustre/llite/rw26.c | 38 +++++++++--------------------- fs/lustre/llite/vvp_io.c | 1 + fs/lustre/obdclass/cl_io.c | 29 +++++++++++++++++++++++ fs/lustre/osc/osc_cache.c | 12 +++++++++- 10 files changed, 155 insertions(+), 35 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 1495949..61a14f4 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -1996,7 +1996,13 @@ struct cl_io { /** * Sequential read hints. */ - ci_seq_read:1; + ci_seq_read:1, + /** + * Do parallel (async) submission of DIO RPCs. Note DIO is still sync + * to userspace, only the RPCs are submitted async, then waited for at + * the llite layer before returning. 
+ */ + ci_parallel_dio:1; /** * Bypass quota check */ @@ -2585,6 +2591,8 @@ int cl_sync_io_wait(const struct lu_env *env, struct cl_sync_io *anchor, long timeout); void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, int ioret); +int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, + long timeout, int ioret); struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb); void cl_aio_free(struct cl_dio_aio *aio); diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 0947677..884ea59 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -602,7 +602,7 @@ int osc_teardown_async_page(const struct lu_env *env, struct osc_object *obj, struct osc_page *ops); int osc_flush_async_page(const struct lu_env *env, struct cl_io *io, struct osc_page *ops); -int osc_queue_sync_pages(const struct lu_env *env, const struct cl_io *io, +int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io, struct osc_object *obj, struct list_head *list, int brw_flags); int osc_cache_truncate_start(const struct lu_env *env, struct osc_object *obj, diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 2dcf25f..54e343f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1619,12 +1619,15 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, struct ll_sb_info *sbi = ll_i2sbi(inode); struct vvp_io *vio = vvp_env_io(env); struct range_lock range; + bool range_locked = false; struct cl_io *io; ssize_t result = 0; int rc = 0; + int rc2 = 0; unsigned int retried = 0; unsigned int dio_lock = 0; bool is_aio = false; + bool is_parallel_dio = false; struct cl_dio_aio *ci_aio = NULL; size_t per_bytes; bool partial_io = false; @@ -1642,6 +1645,17 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, if (file->f_flags & O_DIRECT) { if (!is_sync_kiocb(args->u.normal.via_iocb)) is_aio = true; + + /* the kernel does not support AIO on pipes, and 
parallel DIO + * uses part of the AIO path, so we must not do parallel dio + * to pipes + */ + is_parallel_dio = !iov_iter_is_pipe(args->u.normal.via_iter) && + !is_aio; + + if (!ll_sbi_has_parallel_dio(sbi)) + is_parallel_dio = false; + ci_aio = cl_aio_alloc(args->u.normal.via_iocb); if (!ci_aio) { rc = -ENOMEM; @@ -1665,10 +1679,9 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, io->ci_aio = ci_aio; io->ci_dio_lock = dio_lock; io->ci_ndelay_tried = retried; + io->ci_parallel_dio = is_parallel_dio; if (cl_io_rw_init(env, io, iot, *ppos, per_bytes) == 0) { - bool range_locked = false; - if (file->f_flags & O_APPEND) range_lock_init(&range, 0, LUSTRE_EOF); else @@ -1697,17 +1710,41 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, ll_cl_add(file, env, io, LCC_RW); rc = cl_io_loop(env, io); ll_cl_remove(file, env); - if (range_locked) { + if (range_locked && !is_parallel_dio) { CDEBUG(D_VFSTRACE, "Range unlock [%llu, %llu]\n", range.rl_start, range.rl_last); range_unlock(&lli->lli_write_tree, &range); + range_locked = false; } } else { /* cl_io_rw_init() handled IO */ rc = io->ci_result; } + /* N/B: parallel DIO may be disabled during i/o submission; + * if that occurs, async RPCs are resolved before we get here, and this + * wait call completes immediately. 
+ */ + if (is_parallel_dio) { + struct cl_sync_io *anchor = &io->ci_aio->cda_sync; + + /* for dio, EIOCBQUEUED is an implementation detail, + * and we don't return it to userspace + */ + if (rc == -EIOCBQUEUED) + rc = 0; + + rc2 = cl_sync_io_wait_recycle(env, anchor, 0, 0); + if (rc2 < 0) + rc = rc2; + + if (range_locked) { + range_unlock(&lli->lli_write_tree, &range); + range_locked = false; + } + } + /* * In order to move forward AIO, ci_nob was increased, * but that doesn't mean io have been finished, it just @@ -1717,8 +1754,12 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, */ if (io->ci_nob > 0) { if (!is_aio) { - result += io->ci_nob; - *ppos = io->u.ci_wr.wr.crw_pos; /* for splice */ + if (rc2 == 0) { + result += io->ci_nob; + *ppos = io->u.ci_wr.wr.crw_pos; /* for splice */ + } else if (rc2) { + result = 0; + } } count -= io->ci_nob; diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 3674af9..a073d6d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -631,6 +631,9 @@ enum stats_track_type { #define LL_SBI_FOREIGN_SYMLINK 0x20000000 /* foreign fake-symlink support */ /* foreign fake-symlink upcall registered */ #define LL_SBI_FOREIGN_SYMLINK_UPCALL 0x40000000 +#define LL_SBI_PARALLEL_DIO 0x80000000 /* parallel (async) submission of + * RPCs for DIO + */ #define LL_SBI_FLAGS { \ "nolck", \ @@ -664,6 +667,7 @@ enum stats_track_type { "noencrypt", \ "foreign_symlink", \ "foreign_symlink_upcall", \ + "parallel_dio", \ } /* @@ -1001,6 +1005,11 @@ static inline bool ll_sbi_has_foreign_symlink(struct ll_sb_info *sbi) return !!(sbi->ll_flags & LL_SBI_FOREIGN_SYMLINK); } +static inline bool ll_sbi_has_parallel_dio(struct ll_sb_info *sbi) +{ + return !!(sbi->ll_flags & LL_SBI_PARALLEL_DIO); +} + void ll_ras_enter(struct file *f, loff_t pos, size_t count); /* llite/lcommon_misc.c */ diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index b131edd..153d34e 
100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -179,6 +179,7 @@ static struct ll_sb_info *ll_init_sbi(void) sbi->ll_flags |= LL_SBI_AGL_ENABLED; sbi->ll_flags |= LL_SBI_FAST_READ; sbi->ll_flags |= LL_SBI_TINY_WRITE; + sbi->ll_flags |= LL_SBI_PARALLEL_DIO; ll_sbi_set_encrypt(sbi, true); /* root squash */ diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index cd8394c..3b4f60c 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1100,6 +1100,42 @@ static ssize_t tiny_write_store(struct kobject *kobj, } LUSTRE_RW_ATTR(tiny_write); +static ssize_t parallel_dio_show(struct kobject *kobj, + struct attribute *attr, + char *buf) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + + return snprintf(buf, PAGE_SIZE, "%u\n", + !!(sbi->ll_flags & LL_SBI_PARALLEL_DIO)); +} + +static ssize_t parallel_dio_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) +{ + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + bool val; + int rc; + + rc = kstrtobool(buffer, &val); + if (rc) + return rc; + + spin_lock(&sbi->ll_lock); + if (val) + sbi->ll_flags |= LL_SBI_PARALLEL_DIO; + else + sbi->ll_flags &= ~LL_SBI_PARALLEL_DIO; + spin_unlock(&sbi->ll_lock); + + return count; +} +LUSTRE_RW_ATTR(parallel_dio); + static ssize_t max_read_ahead_async_active_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -1685,6 +1721,7 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = { &lustre_attr_xattr_cache.attr, &lustre_attr_fast_read.attr, &lustre_attr_tiny_write.attr, + &lustre_attr_parallel_dio.attr, &lustre_attr_file_heat.attr, &lustre_attr_heat_decay_percentage.attr, &lustre_attr_heat_period_second.attr, diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index 2de956d..6a1b5bb 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -404,39 +404,23 @@ static ssize_t 
ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) out: aio->cda_bytes += tot_bytes; - if (is_sync_kiocb(iocb)) { - struct cl_sync_io *anchor = &aio->cda_sync; - ssize_t rc2; + if (rw == WRITE) + vio->u.readwrite.vui_written += tot_bytes; + else + vio->u.readwrite.vui_read += tot_bytes; - /** - * @anchor was inited as 1 to prevent end_io to be - * called before we add all pages for IO, so drop - * one extra reference to make sure we could wait - * count to be zero. - */ - cl_sync_io_note(env, anchor, result); + /* If async dio submission is not allowed, we must wait here. */ + if (is_sync_kiocb(iocb) && !io->ci_parallel_dio) { + ssize_t rc2; - rc2 = cl_sync_io_wait(env, anchor, 0); + rc2 = cl_sync_io_wait_recycle(env, &aio->cda_sync, 0, 0); if (result == 0 && rc2) result = rc2; - /** - * One extra reference again, as if @anchor is - * reused we assume it as 1 before using. - */ - atomic_add(1, &anchor->csi_sync_nr); - if (result == 0) { - /* no commit async for direct IO */ - vio->u.readwrite.vui_written += tot_bytes; - result = tot_bytes; - } - } else { - if (rw == WRITE) - vio->u.readwrite.vui_written += tot_bytes; - else - vio->u.readwrite.vui_read += tot_bytes; if (result == 0) - result = -EIOCBQUEUED; + result = tot_bytes; + } else if (result == 0) { + result = -EIOCBQUEUED; } return result; diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 12314fd..0e54f46 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -526,6 +526,7 @@ static void vvp_io_advance(const struct lu_env *env, * of relying on VFS, we move iov iter by ourselves. 
*/ iov_iter_advance(vio->vui_iter, nob); + CDEBUG(D_VFSTRACE, "advancing %ld bytes\n", nob); vio->vui_tot_count -= nob; iov_iter_reexpand(vio->vui_iter, vio->vui_tot_count); } diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 6c22137..beda7fc 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1202,3 +1202,32 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, } } EXPORT_SYMBOL(cl_sync_io_note); + + +int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, + long timeout, int ioret) +{ + int rc = 0; + + /* + * @anchor was inited as 1 to prevent end_io to be + * called before we add all pages for IO, so drop + * one extra reference to make sure we could wait + * count to be zero. + */ + cl_sync_io_note(env, anchor, ioret); + /* Wait for completion of normal dio. + * This replaces the EIOCBQUEUED return from the DIO/AIO + * path, and this is where AIO and DIO implementations + * split. + */ + rc = cl_sync_io_wait(env, anchor, timeout); + /** + * One extra reference again, as if @anchor is + * reused we assume it as 1 before using. 
+ */ + atomic_add(1, &anchor->csi_sync_nr); + + return rc; +} +EXPORT_SYMBOL(cl_sync_io_wait_recycle); diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 0f0daa1..e37c034 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2640,7 +2640,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io, return rc; } -int osc_queue_sync_pages(const struct lu_env *env, const struct cl_io *io, +int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io, struct osc_object *obj, struct list_head *list, int brw_flags) { @@ -2701,6 +2701,7 @@ int osc_queue_sync_pages(const struct lu_env *env, const struct cl_io *io, grants += (1 << cli->cl_chunkbits) * ((page_count + ppc - 1) / ppc); + CDEBUG(D_CACHE, "requesting %d bytes grant\n", grants); spin_lock(&cli->cl_loi_list_lock); if (osc_reserve_grant(cli, grants) == 0) { list_for_each_entry(oap, list, oap_pending_item) { @@ -2710,6 +2711,15 @@ int osc_queue_sync_pages(const struct lu_env *env, const struct cl_io *io, } osc_unreserve_grant_nolock(cli, grants, 0); ext->oe_grants = grants; + } else { + /* We cannot report ENOSPC correctly if we do parallel + * DIO (async RPC submission), so turn off parallel dio + * if there is not sufficient grant available. This + * makes individual RPCs synchronous. 
+ */ + io->ci_parallel_dio = false; + CDEBUG(D_CACHE, + "not enough grant available, switching to sync for this i/o\n"); } spin_unlock(&cli->cl_loi_list_lock); } From patchwork Wed Jul 7 19:11:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363925 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9957EC07E95 for ; Wed, 7 Jul 2021 19:11:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 45A7561C48 for ; Wed, 7 Jul 2021 19:11:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 45A7561C48 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB7D9338A6F; Wed, 7 Jul 2021 12:11:42 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 28B1C21F978 for ; Wed, 7 Jul 2021 12:11:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 53617100F3DE; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id 525769D8BF; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:13 -0400 Message-Id: <1625685076-1964-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/15] lustre: osc: Don't get time for each page X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Getting the time when each batch of pages starts is sufficiently accurate, and ktime_get() is several % of the CPU time when doing AIO + DIO. This relies on previous patches in this series. Measuring this in milliseconds/gigabyte lets us measure the improvement in absolute terms, rather than just relative terms. 
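To illustrate the idea of sampling the clock once per batch instead of once per page, here is a minimal userspace sketch. It is not the Lustre code: the demo_* names and struct demo_page are hypothetical, and clock_gettime(CLOCK_MONOTONIC) merely stands in for ktime_get().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a page queued for I/O. */
struct demo_page {
	uint64_t submit_ns;	/* start time of the batch holding this page */
};

/* Read the monotonic clock; userspace stand-in for ktime_get(). */
static uint64_t demo_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stamp one page with a time the caller has already sampled. */
static void demo_page_submit(struct demo_page *pg, uint64_t submit_ns)
{
	pg->submit_ns = submit_ns;
}

/* Sample the clock once, then reuse that value for every page. */
static uint64_t demo_batch_submit(struct demo_page *pages, size_t nr)
{
	uint64_t submit_ns = demo_now_ns();
	size_t i;

	assert(pages || nr == 0);
	for (i = 0; i < nr; i++)
		demo_page_submit(&pages[i], submit_ns);

	return submit_ns;
}
```

Every page in a batch ends up with the same timestamp, so the cost of the clock read no longer scales with the number of pages.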
This patch reduces i/o time in ms/GiB by: Write: 17 ms/GiB Read: 6 ms/GiB Totals: Write: 237 ms/GiB Read: 223 ms/GiB IOR: mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect Without the patch: write 4030 MiB/s read 4468 MiB/s With patch: write 4326 MiB/s read 4587 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: 485976ab451dd6708 ("LU-13799 osc: Don't get time for each page") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39437 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 2 +- fs/lustre/osc/osc_io.c | 3 ++- fs/lustre/osc/osc_page.c | 4 ++-- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 884ea59..208bb59 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -584,7 +584,7 @@ void osc_index2policy(union ldlm_policy_data *policy, pgoff_t start, pgoff_t end); void osc_lru_add_batch(struct client_obd *cli, struct list_head *list); void osc_page_submit(const struct lu_env *env, struct osc_page *opg, - enum cl_req_type crt, int brw_flags); + enum cl_req_type crt, int brw_flags, ktime_t submit_time); int lru_queue_work(const struct lu_env *env, void *data); long osc_lru_shrink(const struct lu_env *env, struct client_obd *cli, long target, bool force); diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index 67fe85b..bd92b5d 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -132,6 +132,7 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, unsigned int max_pages; unsigned int ppc_bits; /* pages per chunk bits */ unsigned int ppc; + ktime_t submit_time = ktime_get(); bool sync_queue = false; LASSERT(qin->pl_nr > 0); @@ -195,7 +196,7 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, oap->oap_async_flags |= ASYNC_COUNT_STABLE; 
spin_unlock(&oap->oap_lock); - osc_page_submit(env, opg, crt, brw_flags); + osc_page_submit(env, opg, crt, brw_flags, submit_time); list_add_tail(&oap->oap_pending_item, &list); if (page->cp_sync_io) diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index 94db9d2..0f088fe 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -295,7 +295,7 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj, * transfer (i.e., transferred synchronously). */ void osc_page_submit(const struct lu_env *env, struct osc_page *opg, - enum cl_req_type crt, int brw_flags) + enum cl_req_type crt, int brw_flags, ktime_t submit_time) { struct osc_io *oio = osc_env_io(env); struct osc_async_page *oap = &opg->ops_oap; @@ -316,7 +316,7 @@ void osc_page_submit(const struct lu_env *env, struct osc_page *opg, oap->oap_cmd |= OBD_BRW_NOQUOTA; } - opg->ops_submit_time = ktime_get(); + opg->ops_submit_time = submit_time; osc_page_transfer_get(opg, "transfer\0imm"); osc_page_transfer_add(env, opg, crt); } From patchwork Wed Jul 7 19:11:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363929 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BF67C07E95 for ; Wed, 7 Jul 2021 19:12:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 
31D5161CBE for ; Wed, 7 Jul 2021 19:12:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31D5161CBE Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 950ED3389FC; Wed, 7 Jul 2021 12:11:46 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 72EEA21F978 for ; Wed, 7 Jul 2021 12:11:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 57239100F3DF; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 555799D8AD; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:14 -0400 Message-Id: <1625685076-1964-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/15] lustre: clio: Implement real list splice X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Lustre's list_splice is actually just a slightly depressing list_for_each; let's use a real list_splice. This saves significant time in AIO/DIO page submission, getting a several % performance boost. 
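For readers less familiar with why a real splice is cheaper than moving entries one at a time, the following is a rough userspace sketch of a counted, circular doubly linked list with a constant-time splice. The demo_* types and functions are hypothetical and only mimic the shape of cl_page_list and list_splice_tail_init(); this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal list node, similar in spirit to struct list_head. */
struct demo_node {
	struct demo_node *next, *prev;
};

/* Hypothetical counted list, loosely modelled on cl_page_list. */
struct demo_page_list {
	struct demo_node head;	/* sentinel; empty list points to itself */
	unsigned int nr;	/* number of entries on the list */
};

static void demo_list_init(struct demo_page_list *l)
{
	l->head.next = l->head.prev = &l->head;
	l->nr = 0;
}

static void demo_list_add_tail(struct demo_page_list *l, struct demo_node *n)
{
	n->prev = l->head.prev;
	n->next = &l->head;
	l->head.prev->next = n;
	l->head.prev = n;
	l->nr++;
}

/*
 * Append all of @src to the tail of @dst and leave @src empty.
 * Only four pointers and two counters change, whatever the length.
 */
static void demo_list_splice(struct demo_page_list *src,
			     struct demo_page_list *dst)
{
	struct demo_node *first, *last;

	assert(src != dst);
	if (src->head.next == &src->head)
		return;		/* nothing to move */

	first = src->head.next;
	last = src->head.prev;

	first->prev = dst->head.prev;
	dst->head.prev->next = first;
	last->next = &dst->head;
	dst->head.prev = last;

	dst->nr += src->nr;
	demo_list_init(src);
}
```

The per-entry loop touches every element; the splice above does a fixed amount of work, which is where the saving in page submission would come from.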
This patch reduces i/o time in ms/GiB by: Write: 16 ms/GiB Read: 14 ms/GiB Totals: Write: 220 ms/GiB Read: 209 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 4326 MiB/s read 4587 MiB/s With this patch: write 4647 MiB/s read 4888 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: dfe2d225b86d4215 ("LU-13799 clio: Implement real list splice") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39439 Reviewed-by: Wang Shilong Reviewed-by: Bobi Jam Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_io.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index beda7fc..63ce39c 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -891,13 +891,11 @@ void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src, /** * splice the cl_page_list, just as list head does */ -void cl_page_list_splice(struct cl_page_list *list, struct cl_page_list *head) +void cl_page_list_splice(struct cl_page_list *src, struct cl_page_list *dst) { - struct cl_page *page; - struct cl_page *tmp; - - cl_page_list_for_each_safe(page, tmp, list) - cl_page_list_move(head, list, page); + dst->pl_nr += src->pl_nr; + src->pl_nr = 0; + list_splice_tail_init(&src->pl_pages, &dst->pl_pages); } EXPORT_SYMBOL(cl_page_list_splice); From patchwork Wed Jul 7 19:11:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363931 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham 
autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D28DC07E95 for ; Wed, 7 Jul 2021 19:12:09 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C607B61A13 for ; Wed, 7 Jul 2021 19:12:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C607B61A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 58DCE338A95; Wed, 7 Jul 2021 12:11:50 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AB3AB21F978 for ; Wed, 7 Jul 2021 12:11:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 59925100F3E0; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 587219D8B2; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:15 -0400 Message-Id: <1625685076-1964-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/15] lustre: osc: Simplify clipping for transient pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The combination of page clip and page flag setting for transient pages takes up several % of the time when submitting them for async DIO. But neither is required - Transient pages do not change after creation except in limited cases, and in any case, they are only accessible from the submitting thread - there is no possibility of parallel access. So we can set the page flags, etc, at init time. This patch improves i/o time in ms/GiB by: Write: 17 ms/GiB Read: 22 ms/GiB Totals: Write: 204 ms/GiB Read: 198 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 4647 MiB/s read 4888 MiB/s Plus this patch: write 5030 MiB/s read 5174 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: b64b9646f17b771c ("LU-13799 osc: Simplify clipping for transient pages") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39440 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 2 +- fs/lustre/llite/rw26.c | 3 ++- fs/lustre/osc/osc_cache.c | 18 +++++++++++++----- fs/lustre/osc/osc_io.c | 10 ++++++---- fs/lustre/osc/osc_page.c | 6 ++++-- 5 files changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 208bb59..13e9363 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -593,7 +593,7 @@ long osc_lru_shrink(const struct lu_env *env, struct client_obd *cli, int osc_set_async_flags(struct osc_object *obj, struct osc_page *opg, u32 async_flags); int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, - struct page *page, loff_t offset); + struct cl_page *page, 
loff_t offset); int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, struct osc_page *ops, cl_commit_cbt cb); int osc_page_cache_add(const struct lu_env *env, struct osc_page *opg, diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index 6a1b5bb..ba9c070 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -269,7 +269,8 @@ struct ll_dio_pages { * Set page clip to tell transfer formation engine * that page has to be sent even if it is beyond KMS. */ - cl_page_clip(env, page, 0, min(size, page_size)); + if (size < page_size) + cl_page_clip(env, page, 0, size); ++io_pages; /* drop the reference count for cl_page_find */ diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index e37c034..84c6b68 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2311,10 +2311,11 @@ int __osc_io_unplug(const struct lu_env *env, struct client_obd *cli, EXPORT_SYMBOL(__osc_io_unplug); int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, - struct page *page, loff_t offset) + struct cl_page *page, loff_t offset) { struct obd_export *exp = osc_export(osc); struct osc_async_page *oap = &ops->ops_oap; + struct page *vmpage = page->cp_vmpage; if (!page) return -EIO; @@ -2323,17 +2324,24 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, oap->oap_cli = &exp->exp_obd->u.cli; oap->oap_obj = osc; - oap->oap_page = page; + oap->oap_page = vmpage; oap->oap_obj_off = offset; LASSERT(!(offset & ~PAGE_MASK)); + /* Count of transient (direct i/o) pages is always stable by the time + * they're submitted. Setting this here lets us avoid calling + * cl_page_clip later to set this. 
+ */ + if (page->cp_type == CPT_TRANSIENT) + oap->oap_async_flags |= ASYNC_COUNT_STABLE|ASYNC_URGENT| + ASYNC_READY; + INIT_LIST_HEAD(&oap->oap_pending_item); INIT_LIST_HEAD(&oap->oap_rpc_item); spin_lock_init(&oap->oap_lock); - CDEBUG(D_INFO, "oap %p page %p obj off %llu\n", - oap, page, oap->oap_obj_off); - + CDEBUG(D_INFO, "oap %p vmpage %p obj off %llu\n", + oap, vmpage, oap->oap_obj_off); return 0; } EXPORT_SYMBOL(osc_prep_async_page); diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index bd92b5d..f69f201 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -191,10 +191,12 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, continue; } - spin_lock(&oap->oap_lock); - oap->oap_async_flags = ASYNC_URGENT | ASYNC_READY; - oap->oap_async_flags |= ASYNC_COUNT_STABLE; - spin_unlock(&oap->oap_lock); + if (page->cp_type != CPT_TRANSIENT) { + spin_lock(&oap->oap_lock); + oap->oap_async_flags = ASYNC_URGENT | ASYNC_READY; + oap->oap_async_flags |= ASYNC_COUNT_STABLE; + spin_unlock(&oap->oap_lock); + } osc_page_submit(env, opg, crt, brw_flags, submit_time); list_add_tail(&oap->oap_pending_item, &list); diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index 0f088fe..8aa21ee 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -212,6 +212,9 @@ static void osc_page_clip(const struct lu_env *env, opg->ops_from = from; /* argument @to is exclusive, but @ops_to is inclusive */ opg->ops_to = to - 1; + /* This isn't really necessary for transient pages, but we also don't + * call clip on transient pages often, so it's OK. 
+ */ spin_lock(&oap->oap_lock); oap->oap_async_flags |= ASYNC_COUNT_STABLE; spin_unlock(&oap->oap_lock); @@ -257,8 +260,7 @@ int osc_page_init(const struct lu_env *env, struct cl_object *obj, opg->ops_to = PAGE_SIZE - 1; INIT_LIST_HEAD(&opg->ops_lru); - result = osc_prep_async_page(osc, opg, cl_page->cp_vmpage, - cl_offset(obj, index)); + result = osc_prep_async_page(osc, opg, cl_page, cl_offset(obj, index)); if (result != 0) return result; From patchwork Wed Jul 7 19:11:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12363915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA696C07E95 for ; Wed, 7 Jul 2021 19:11:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6D00B61A13 for ; Wed, 7 Jul 2021 19:11:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6D00B61A13 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0C5ED3379F4; Wed, 7 Jul 2021 12:11:32 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com 
(Postfix) with ESMTP id 013F921F978 for ; Wed, 7 Jul 2021 12:11:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5C9B1100F3E1; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5B8D69D8BA; Wed, 7 Jul 2021 15:11:18 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Wed, 7 Jul 2021 15:11:16 -0400 Message-Id: <1625685076-1964-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> References: <1625685076-1964-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/15] lustre: mgc: configurable wait-to-reprocess time X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev Make the MGC wait-to-reprocess time configurable so it can be set shorter, for testing purposes at least. To change the minimum wait time, use the MGC module option 'mgc_requeue_timeout_min' (in seconds). Additionally, a random value of up to mgc_requeue_timeout_min is added to avoid a flood of config re-read requests from clients. If mgc_requeue_timeout_min is set to 0, the random part will be up to 1 second. 
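As a rough illustration of the wait computation described above (minimum wait plus a random part, with a 1 second random window when the minimum is 0), here is a userspace sketch. DEMO_HZ, demo_rand_below() and demo_requeue_wait_ticks() are hypothetical names; rand() only stands in for prandom_u32_max(), and the 120 second bound mirrors the limit enforced by the parameter setter in this patch.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_HZ 100u	/* assumed tick rate, for illustration only */

/*
 * Return a value in [0, bound); stand-in for prandom_u32_max().
 * The modulo is slightly biased, which is fine for a sketch.
 */
static unsigned int demo_rand_below(unsigned int bound)
{
	return bound ? (unsigned int)rand() % bound : 0;
}

/*
 * Wait time in ticks: the configured minimum plus a random part of
 * up to the same length, or up to one second if the minimum is 0.
 */
static unsigned int demo_requeue_wait_ticks(unsigned int min_sec)
{
	unsigned int jitter_sec = min_sec == 0 ? 1 : min_sec;

	assert(min_sec <= 120);	/* same cap as the module parameter */

	return min_sec * DEMO_HZ + demo_rand_below(jitter_sec * DEMO_HZ);
}
```

With a minimum of 5 seconds the result falls between 5 and just under 10 seconds' worth of ticks; with a minimum of 0 it falls below one second.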
ost-pools: before: 5840s, after: 3474s sanity-flr: before: 1575s, after: 1381s sanity-quota: before: 10679s, after: 9703s WC-bug-id: https://jira.whamcloud.com/browse/LU-14516 Lustre-commit: 04b2da6180d3c8eda ("LU-14516 mgc: configurable wait-to-reprocess time") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/42020 Reviewed-by: Andreas Dilger Reviewed-by: Aurelien Degremont Reviewed-by: Sebastien Buisson Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/mgc/mgc_internal.h | 8 ++++++++ fs/lustre/mgc/mgc_request.c | 44 +++++++++++++++++++++++++++++++++----------- 2 files changed, 41 insertions(+), 11 deletions(-) diff --git a/fs/lustre/mgc/mgc_internal.h b/fs/lustre/mgc/mgc_internal.h index a2a09d4..91f5fa1 100644 --- a/fs/lustre/mgc/mgc_internal.h +++ b/fs/lustre/mgc/mgc_internal.h @@ -43,6 +43,14 @@ int mgc_process_log(struct obd_device *mgc, struct config_llog_data *cld); +/* this timeout represents how many seconds MGC should wait before + * requeue config and recover lock to the MGS. We need to randomize this + * in order to not flood the MGS. + */ +#define MGC_TIMEOUT_MIN_SECONDS 5 + +extern unsigned int mgc_requeue_timeout_min; + static inline bool cld_is_sptlrpc(struct config_llog_data *cld) { return cld->cld_type == MGS_CFG_T_SPTLRPC; diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c index 1dfc74b..50044aa2 100644 --- a/fs/lustre/mgc/mgc_request.c +++ b/fs/lustre/mgc/mgc_request.c @@ -530,13 +530,6 @@ static void do_requeue(struct config_llog_data *cld) up_read(&cld->cld_mgcexp->exp_obd->u.cli.cl_sem); } -/* this timeout represents how many seconds MGC should wait before - * requeue config and recover lock to the MGS. We need to randomize this - * in order to not flood the MGS. 
- */ -#define MGC_TIMEOUT_MIN_SECONDS 5 -#define MGC_TIMEOUT_RAND_CENTISEC 500 - static int mgc_requeue_thread(void *data) { bool first = true; @@ -548,7 +541,6 @@ static int mgc_requeue_thread(void *data) rq_state |= RQ_RUNNING; while (!(rq_state & RQ_STOP)) { struct config_llog_data *cld, *cld_prev; - int rand = prandom_u32_max(MGC_TIMEOUT_RAND_CENTISEC); int to; /* Any new or requeued lostlocks will change the state */ @@ -565,11 +557,11 @@ static int mgc_requeue_thread(void *data) * random so everyone doesn't try to reconnect at once. */ /* rand is centi-seconds, "to" is in centi-HZ */ - to = MGC_TIMEOUT_MIN_SECONDS * HZ * 100; - to += rand * HZ; + to = mgc_requeue_timeout_min == 0 ? 1 : mgc_requeue_timeout_min; + to = mgc_requeue_timeout_min * HZ + prandom_u32_max(to * HZ); wait_event_idle_timeout(rq_waitq, rq_state & (RQ_STOP | RQ_PRECLEANUP), - to/100); + to); /* * iterate & processing through the list. for each cld, process @@ -1835,6 +1827,36 @@ static int mgc_process_config(struct obd_device *obd, u32 len, void *buf) .process_config = mgc_process_config, }; +static int mgc_param_requeue_timeout_min_set(const char *val, + const struct kernel_param *kp) +{ + int rc; + unsigned int num; + + rc = kstrtouint(val, 0, &num); + if (rc < 0) + return rc; + if (num > 120) + return -EINVAL; + + mgc_requeue_timeout_min = num; + + return 0; +} + +static struct kernel_param_ops param_ops_requeue_timeout_min = { + .set = mgc_param_requeue_timeout_min_set, + .get = param_get_uint, +}; + +#define param_check_requeue_timeout_min(name, p) \ + __param_check(name, p, unsigned int) + +unsigned int mgc_requeue_timeout_min = MGC_TIMEOUT_MIN_SECONDS; +module_param_call(mgc_requeue_timeout_min, mgc_param_requeue_timeout_min_set, + param_get_uint, &param_ops_requeue_timeout_min, 0644); +MODULE_PARM_DESC(mgc_requeue_timeout_min, "Minimal requeue time to refresh logs"); + static int __init mgc_init(void) { int rc;