From patchwork Sun Nov 20 14:16:47 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13050053
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8E36EC433FE for ; Sun, 20 Nov 2022 14:17:43 +0000 (UTC)
Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXf24lV7z1y35; Sun, 20 Nov 2022 06:17:14 -0800 (PST)
Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXdz4nCsz1wM4 for ; Sun, 20 Nov 2022 06:17:11 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id AB3D71006F3B; Sun, 20 Nov 2022 09:17:09 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A39DCE8B88; Sun, 20 Nov 2022 09:17:09 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Sun, 20 Nov 2022 09:16:47 -0500
Message-Id: <1668953828-10909-2-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org>
References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 01/22] lustre: llite: clear stale page's uptodate bit
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Bobi Jam

In the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage() call path, the page can be picked up by ll_read_ahead_page()->grab_cache_page_nowait() before the vmpage is deleted from the page cache. If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's uptodate bit, readahead may pick up the stale page and wrongly treat it as already uptodate.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(), filemap_fault() calls ll_readpage() to read the vmpage and then waits for the vmpage to be unlocked. Once ll_readpage() has successfully read the vmpage and unlocked it, memory pressure or truncate can step in and delete the cl_page; filemap_fault() then finds that the vmpage is no longer uptodate and returns VM_FAULT_SIGBUS.

To fix this, make vvp_io_kernel_fault() restart filemap_fault() so that it obtains an uptodate vmpage again.
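For illustration only, the core idea of the fix can be sketched as the retry loop below. This is not the code added by the patch (the real vvp_io_kernel_fault() also checks the cl_page state, e.g. cp_defer_detach, before deciding to retry); it only shows the shape of "retry filemap_fault() while the faulting offset is still inside i_size":

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>

/*
 * Sketch of the retry idea (not the patch itself): if filemap_fault()
 * reports SIGBUS because the page went stale after ll_readpage() filled
 * it, retry the fault as long as the faulting offset is still below EOF.
 * The real code only retries when it can tell the SIGBUS was caused by
 * the invalidation race (cp_defer_detach set on the cl_page).
 */
static vm_fault_t fault_retry_sketch(struct vm_fault *vmf, struct inode *inode)
{
	vm_fault_t ret;

retry:
	ret = filemap_fault(vmf);
	if (ret & VM_FAULT_SIGBUS) {
		pgoff_t max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

		/* Don't loop forever on a genuine fault past EOF. */
		if (vmf->pgoff < max_idx)
			goto retry;
	}
	return ret;
}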
WC-bug-id: https://jira.whamcloud.com/browse/LU-16160 Lustre-commit: 5b911e03261c3de6b ("LU-16160 llite: clear stale page's uptodate bit") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48607 Reviewed-by: Alex Zhuravlev Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 15 ++++- fs/lustre/llite/rw.c | 10 +++- fs/lustre/llite/vvp_io.c | 124 +++++++++++++++++++++++++++++++++++++++--- fs/lustre/llite/vvp_page.c | 5 ++ fs/lustre/obdclass/cl_page.c | 37 ++++++++++--- 5 files changed, 172 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 41ce0b0..8be58ff 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -768,7 +768,15 @@ struct cl_page { enum cl_page_type cp_type:CP_TYPE_BITS; unsigned int cp_defer_uptodate:1, cp_ra_updated:1, - cp_ra_used:1; + cp_ra_used:1, + /* fault page read grab extra referece */ + cp_fault_ref:1, + /** + * if fault page got delete before returned to + * filemap_fault(), defer the vmpage detach/put + * until filemap_fault() has been handled. + */ + cp_defer_detach:1; /* which slab kmem index this memory allocated from */ short int cp_kmem_index; @@ -2393,6 +2401,11 @@ int cl_io_lru_reserve(const struct lu_env *env, struct cl_io *io, int cl_io_read_ahead(const struct lu_env *env, struct cl_io *io, pgoff_t start, struct cl_read_ahead *ra); +static inline int cl_io_is_pagefault(const struct cl_io *io) +{ + return io->ci_type == CIT_FAULT && !io->u.ci_fault.ft_mkwrite; +} + /** * True, if @io is an O_APPEND write(2). */ diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 2290b31..0283af4 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -1947,7 +1947,15 @@ int ll_readpage(struct file *file, struct page *vmpage) unlock_page(vmpage); result = 0; } - cl_page_put(env, page); + if (cl_io_is_pagefault(io) && result == 0) { + /** + * page fault, retain the cl_page reference until + * vvp_io_kernel_fault() release it. + */ + page->cp_fault_ref = 1; + } else { + cl_page_put(env, page); + } } else { unlock_page(vmpage); result = PTR_ERR(page); diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index ef7a3d92..be6f17f 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1292,14 +1292,41 @@ static void vvp_io_rw_end(const struct lu_env *env, trunc_sem_up_read(&lli->lli_trunc_sem); } -static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) +static void detach_and_deref_page(struct cl_page *clp, struct page *vmpage) +{ + if (!clp->cp_defer_detach) + return; + + /** + * cl_page_delete0() took a vmpage reference, but not unlink the vmpage + * from its cl_page. 
+ */ + clp->cp_defer_detach = 0; + ClearPagePrivate(vmpage); + vmpage->private = 0; + + put_page(vmpage); + refcount_dec(&clp->cp_ref); +} + +static int vvp_io_kernel_fault(const struct lu_env *env, + struct vvp_fault_io *cfio) { struct vm_fault *vmf = cfio->ft_vmf; + struct file *vmff = cfio->ft_vma->vm_file; + struct address_space *mapping = vmff->f_mapping; + struct inode *inode = mapping->host; + struct page *vmpage = NULL; + struct cl_page *clp = NULL; + int rc = 0; + ll_inode_size_lock(inode); +retry: cfio->ft_flags = filemap_fault(vmf); cfio->ft_flags_valid = 1; if (vmf->page) { + /* success, vmpage is locked */ CDEBUG(D_PAGE, "page %p map %p index %lu flags %lx count %u priv %0lx: got addr %p type NOPAGE\n", vmf->page, vmf->page->mapping, vmf->page->index, @@ -1311,24 +1338,105 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) } cfio->ft_vmpage = vmf->page; - return 0; + + /** + * ll_filemap_fault()->ll_readpage() could get an extra cl_page + * reference. So we have to get the cl_page's to check its + * cp_fault_ref and drop the reference later. + */ + clp = cl_vmpage_page(vmf->page, NULL); + + goto unlock; + } + + /* filemap_fault() fails, vmpage is not locked */ + if (!clp) { + vmpage = find_get_page(mapping, vmf->pgoff); + if (vmpage) { + lock_page(vmpage); + clp = cl_vmpage_page(vmpage, NULL); + unlock_page(vmpage); + } } if (cfio->ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) { + pgoff_t max_idx; + + /** + * ll_filemap_fault()->ll_readpage() could fill vmpage + * correctly, and unlock the vmpage, while memory pressure or + * truncate could detach cl_page from vmpage, and kernel + * filemap_fault() will wait_on_page_locked(vmpage) and find + * out that the vmpage has been cleared its uptodate bit, + * so it returns VM_FAULT_SIGBUS. + * + * In this case, we'd retry the filemap_fault()->ll_readpage() + * to rebuild the cl_page and fill vmpage with uptodated data. + */ + if (likely(vmpage)) { + bool need_retry = false; + + if (clp) { + if (clp->cp_defer_detach) { + detach_and_deref_page(clp, vmpage); + /** + * check i_size to make sure it's not + * over EOF, we don't want to call + * filemap_fault() repeatedly since it + * returns VM_FAULT_SIGBUS without even + * trying if vmf->pgoff is over EOF. 
+ */ + max_idx = DIV_ROUND_UP(i_size_read(inode), + PAGE_SIZE); + if (vmf->pgoff < max_idx) + need_retry = true; + } + if (clp->cp_fault_ref) { + clp->cp_fault_ref = 0; + /* ref not released in ll_readpage() */ + cl_page_put(env, clp); + } + if (need_retry) + goto retry; + } + } + CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", (void *)vmf->address); - return -EFAULT; + rc = -EFAULT; + goto unlock; } if (cfio->ft_flags & VM_FAULT_OOM) { CDEBUG(D_PAGE, "got addr %p - OOM\n", (void *)vmf->address); - return -ENOMEM; + rc = -ENOMEM; + goto unlock; } - if (cfio->ft_flags & VM_FAULT_RETRY) - return -EAGAIN; + if (cfio->ft_flags & VM_FAULT_RETRY) { + rc = -EAGAIN; + goto unlock; + } CERROR("Unknown error in page fault %d!\n", cfio->ft_flags); - return -EINVAL; + rc = -EINVAL; +unlock: + ll_inode_size_unlock(inode); + if (clp) { + if (clp->cp_defer_detach && vmpage) + detach_and_deref_page(clp, vmpage); + + /* additional cl_page ref has been taken in ll_readpage() */ + if (clp->cp_fault_ref) { + clp->cp_fault_ref = 0; + /* ref not released in ll_readpage() */ + cl_page_put(env, clp); + } + /* ref taken in this function */ + cl_page_put(env, clp); + } + if (vmpage) + put_page(vmpage); + return rc; } static void mkwrite_commit_callback(const struct lu_env *env, struct cl_io *io, @@ -1368,7 +1476,7 @@ static int vvp_io_fault_start(const struct lu_env *env, LASSERT(cfio->ft_vmpage); lock_page(cfio->ft_vmpage); } else { - result = vvp_io_kernel_fault(cfio); + result = vvp_io_kernel_fault(env, cfio); if (result != 0) return result; } diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index f359596..9e8c158 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -104,6 +104,11 @@ static void vvp_page_completion_read(const struct lu_env *env, ll_ra_count_put(ll_i2sbi(inode), 1); if (ioret == 0) { + /** + * cp_defer_uptodate is used for readahead page, and the + * vmpage Uptodate bit is deferred to set in ll_readpage/ + * ll_io_read_page. + */ if (!cp->cp_defer_uptodate) SetPageUptodate(vmpage); } else if (cp->cp_defer_uptodate) { diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c index 7011235..3bc1a9b 100644 --- a/fs/lustre/obdclass/cl_page.c +++ b/fs/lustre/obdclass/cl_page.c @@ -725,16 +725,35 @@ static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp) LASSERT(PageLocked(vmpage)); LASSERT((struct cl_page *)vmpage->private == cp); - /* Drop the reference count held in vvp_page_init */ - refcount_dec(&cp->cp_ref); - ClearPagePrivate(vmpage); - vmpage->private = 0; - - /* - * The reference from vmpage to cl_page is removed, - * but the reference back is still here. It is removed - * later in cl_page_free(). + /** + * clear vmpage uptodate bit, since ll_read_ahead_pages()-> + * ll_read_ahead_page() could pick up this stale vmpage and + * take it as uptodated. */ + ClearPageUptodate(vmpage); + /** + * vvp_io_kernel_fault()->ll_readpage() set cp_fault_ref + * and need it to check cl_page to retry the page fault read. + */ + if (cp->cp_fault_ref) { + cp->cp_defer_detach = 1; + /** + * get a vmpage reference, so that filemap_fault() + * won't free it from pagecache. + */ + get_page(vmpage); + } else { + /* Drop the reference count held in vvp_page_init */ + refcount_dec(&cp->cp_ref); + ClearPagePrivate(vmpage); + vmpage->private = 0; + + /* + * The reference from vmpage to cl_page is removed, + * but the reference back is still here. It is removed + * later in cl_page_free(). 
+ */ + } } } From patchwork Sun Nov 20 14:16:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050055 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 83638C433FE for ; Sun, 20 Nov 2022 14:20:13 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXfj0JXYz1yBy; Sun, 20 Nov 2022 06:17:49 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXf03R3Vz1wM4 for ; Sun, 20 Nov 2022 06:17:12 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id AC63710077E7; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A694AE8B89; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:48 -0500 Message-Id: <1668953828-10909-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/22] lustre: osc: Remove oap lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The OAP lock is taken around setting the oap flags, but not any of the other fields in oap. As far as I can tell, this is just some cargo cult belief about locking - there's no reason for it. Remove it entirely. (From the code, a queued spin lock appears to be 12 bytes on x86_64.) 
WC-bug-id: https://jira.whamcloud.com/browse/LU-15619 Lustre-commit: b2274a716087fad24 ("LU-15619 osc: Remove oap lock") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46719 Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Reviewed-by: Zhenyu Xu Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 2 -- fs/lustre/osc/osc_cache.c | 11 ----------- fs/lustre/osc/osc_io.c | 8 ++------ fs/lustre/osc/osc_page.c | 5 ----- 4 files changed, 2 insertions(+), 24 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 2e8c184..a0f1afc 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -88,8 +88,6 @@ struct osc_async_page { struct ptlrpc_request *oap_request; struct osc_object *oap_obj; - - spinlock_t oap_lock; }; #define oap_page oap_brw_page.pg diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index e563809..b5776a1 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -1140,9 +1140,7 @@ static int osc_extent_make_ready(const struct lu_env *env, rc = osc_make_ready(env, oap, OBD_BRW_WRITE); switch (rc) { case 0: - spin_lock(&oap->oap_lock); oap->oap_async_flags |= ASYNC_READY; - spin_unlock(&oap->oap_lock); break; case -EALREADY: LASSERT((oap->oap_async_flags & ASYNC_READY) != 0); @@ -1165,9 +1163,7 @@ static int osc_extent_make_ready(const struct lu_env *env, "last_oap_count %d\n", last_oap_count); LASSERT(last->oap_page_off + last_oap_count <= PAGE_SIZE); last->oap_count = last_oap_count; - spin_lock(&last->oap_lock); last->oap_async_flags |= ASYNC_COUNT_STABLE; - spin_unlock(&last->oap_lock); } /* for the rest of pages, we don't need to call osf_refresh_count() @@ -1176,9 +1172,7 @@ static int osc_extent_make_ready(const struct lu_env *env, list_for_each_entry(oap, &ext->oe_pages, oap_pending_item) { if (!(oap->oap_async_flags & ASYNC_COUNT_STABLE)) { oap->oap_count = PAGE_SIZE - oap->oap_page_off; - spin_lock(&last->oap_lock); oap->oap_async_flags |= ASYNC_COUNT_STABLE; - spin_unlock(&last->oap_lock); } } @@ -1866,9 +1860,7 @@ static void osc_ap_completion(const struct lu_env *env, struct client_obd *cli, } /* As the transfer for this page is being done, clear the flags */ - spin_lock(&oap->oap_lock); oap->oap_async_flags = 0; - spin_unlock(&oap->oap_lock); if (oap->oap_cmd & OBD_BRW_WRITE && xid > 0) { spin_lock(&cli->cl_loi_list_lock); @@ -2330,7 +2322,6 @@ int osc_prep_async_page(struct osc_object *osc, struct osc_page *ops, INIT_LIST_HEAD(&oap->oap_pending_item); INIT_LIST_HEAD(&oap->oap_rpc_item); - spin_lock_init(&oap->oap_lock); CDEBUG(D_INFO, "oap %p vmpage %p obj off %llu\n", oap, vmpage, oap->oap_obj_off); return 0; @@ -2619,9 +2610,7 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io, if (rc) goto out; - spin_lock(&oap->oap_lock); oap->oap_async_flags |= ASYNC_READY | ASYNC_URGENT; - spin_unlock(&oap->oap_lock); if (current->flags & PF_MEMALLOC) ext->oe_memalloc = 1; diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index aa8f61d..b9362d9 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -192,12 +192,8 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios, continue; } - if (page->cp_type != CPT_TRANSIENT) { - spin_lock(&oap->oap_lock); - oap->oap_async_flags = ASYNC_URGENT | ASYNC_READY; - oap->oap_async_flags |= ASYNC_COUNT_STABLE; - spin_unlock(&oap->oap_lock); - } + if (page->cp_type != CPT_TRANSIENT) + oap->oap_async_flags = ASYNC_URGENT | 
ASYNC_READY | ASYNC_COUNT_STABLE; osc_page_submit(env, opg, crt, brw_flags); list_add_tail(&oap->oap_pending_item, &list); diff --git a/fs/lustre/osc/osc_page.c b/fs/lustre/osc/osc_page.c index ba10ba3..667825a 100644 --- a/fs/lustre/osc/osc_page.c +++ b/fs/lustre/osc/osc_page.c @@ -204,12 +204,7 @@ static void osc_page_clip(const struct lu_env *env, opg->ops_from = from; /* argument @to is exclusive, but @ops_to is inclusive */ opg->ops_to = to - 1; - /* This isn't really necessary for transient pages, but we also don't - * call clip on transient pages often, so it's OK. - */ - spin_lock(&oap->oap_lock); oap->oap_async_flags |= ASYNC_COUNT_STABLE; - spin_unlock(&oap->oap_lock); } static int osc_page_flush(const struct lu_env *env, From patchwork Sun Nov 20 14:16:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050056 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6E029C433FE for ; Sun, 20 Nov 2022 14:22:46 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXgg02Rrz1yDD; Sun, 20 Nov 2022 06:18:38 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXf12MT8z1wM4 for ; Sun, 20 Nov 2022 06:17:13 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B3BE010077F7; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AC604E8B8B; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:49 -0500 Message-Id: <1668953828-10909-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/22] lnet: Don't modify uptodate peer with temp NI X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn When processing the config log it is possible that we attempt to add temp NIs after discovery has completed on a peer. These temp may not actually exist on the peer. Since discovery has already completed the peer is considered up-to-date and we can end up with incorrect peer entries. We shouldn't add temp NIs to a peer that is already up-to-date. 
HPE-bug-id: LUS-10867 WC-bug-id: https://jira.whamcloud.com/browse/LU-15852 Lustre-commit: 8f718df474e453fbc ("LU-15852 lnet: Don't modify uptodate peer with temp NI") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47322 Reviewed-by: Frank Sehr Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index d8d1857..52ad791 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1855,6 +1855,7 @@ struct lnet_peer_net * int lnet_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr, bool temp) +__must_hold(&the_lnet.ln_api_mutex) { struct lnet_peer *lp = NULL; struct lnet_peer_ni *lpni; @@ -1906,6 +1907,13 @@ struct lnet_peer_net * return -EPERM; } + if (temp && lnet_peer_is_uptodate(lp)) { + CDEBUG(D_NET, + "Don't add temporary peer NI for uptodate peer %s\n", + libcfs_nidstr(&lp->lp_primary_nid)); + return -EINVAL; + } + return lnet_peer_add_nid(lp, nid, flags); } From patchwork Sun Nov 20 14:16:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050058 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C70E4C433FE for ; Sun, 20 Nov 2022 14:25:10 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXgy16s7z1yG1; Sun, 20 Nov 2022 06:18:54 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXf24WKsz1y2s for ; Sun, 20 Nov 2022 06:17:14 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id B698E1007866; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0629E8B9B; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:50 -0500 Message-Id: <1668953828-10909-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/22] lustre: llite: Explicitly support .splice_write X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Shaun Tancheff , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Shaun Tancheff Linux commit v5.9-rc1-6-g36e2c7421f02 fs: don't allow splice read/write without explicit ops Lustre supports splice_write and previously provide handlers for splice_read. Explicitly use iter_file_splice_write, if it exists. 
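As a condensed illustration of what the three hunks below do (this is not a literal excerpt from fs/lustre/llite/file.c; the read_iter/write_iter handler names are assumed for the example), each file_operations variant now names its splice handlers explicitly, since after the referenced kernel commit the VFS no longer falls back to the read_iter/write_iter paths for splice:

static const struct file_operations ll_file_operations_sketch = {
	.read_iter	= ll_file_read_iter,		/* assumed handler name */
	.write_iter	= ll_file_write_iter,		/* assumed handler name */
	.mmap		= ll_file_mmap,
	.llseek		= ll_file_seek,
	.splice_read	= generic_file_splice_read,	/* already present */
	.splice_write	= iter_file_splice_write,	/* added by this patch */
	.fsync		= ll_fsync,
	.flush		= ll_flush,
	.fallocate	= ll_fallocate,
};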
HPE-bug-id: LUS-11259 WC-bug-id: https://jira.whamcloud.com/browse/LU-16258 Lustre-commit: c619b6d6a54235cc0 ("LU-16258 llite: Explicitly support .splice_write") Signed-off-by: Shaun Tancheff Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48928 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 350d5df..34a449e 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -5564,6 +5564,7 @@ int ll_inode_permission(struct inode *inode, int mask) .mmap = ll_file_mmap, .llseek = ll_file_seek, .splice_read = generic_file_splice_read, + .splice_write = iter_file_splice_write, .fsync = ll_fsync, .flush = ll_flush, .fallocate = ll_fallocate, @@ -5578,6 +5579,7 @@ int ll_inode_permission(struct inode *inode, int mask) .mmap = ll_file_mmap, .llseek = ll_file_seek, .splice_read = generic_file_splice_read, + .splice_write = iter_file_splice_write, .fsync = ll_fsync, .flush = ll_flush, .flock = ll_file_flock, @@ -5595,6 +5597,7 @@ int ll_inode_permission(struct inode *inode, int mask) .mmap = ll_file_mmap, .llseek = ll_file_seek, .splice_read = generic_file_splice_read, + .splice_write = iter_file_splice_write, .fsync = ll_fsync, .flush = ll_flush, .flock = ll_file_noflock, From patchwork Sun Nov 20 14:16:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050054 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F34A7C433FE for ; Sun, 20 Nov 2022 14:19:48 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXfb2w1Kz1yBc; Sun, 20 Nov 2022 06:17:43 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXf424KFz1y6B for ; Sun, 20 Nov 2022 06:17:16 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id BA67D1007A82; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B504BE8B84; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:51 -0500 Message-Id: <1668953828-10909-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/22] lnet: o2iblnd: add verbose debug prints for rx/tx events X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov Added/modified debug messages for syncing with mlnx driver debug output. On rx/tx events print message type, size and peer credits. Make printing of debug message on o2iblnd conn refcount change events compile-time optional. Add compile-time option for dumping detailed connection state info to net debug log. WC-bug-id: https://jira.whamcloud.com/browse/LU-16172 Lustre-commit: b4e06ff1e4856ce08 ("LU-16172 o2iblnd: add verbose debug prints for rx/tx events") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48600 Reviewed-by: Chris Horn Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.h | 78 +++++++++++++++++------- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 117 ++++++++++++++++++++++-------------- 2 files changed, 129 insertions(+), 66 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 56d486f..bef7a55 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -588,28 +588,32 @@ static inline int kiblnd_timeout(void) return dev->ibd_can_failover; } -#define kiblnd_conn_addref(conn) \ -do { \ - CDEBUG(D_NET, "conn[%p] (%d)++\n", \ - (conn), atomic_read(&(conn)->ibc_refcount)); \ - atomic_inc(&(conn)->ibc_refcount); \ -} while (0) - -#define kiblnd_conn_decref(conn) \ -do { \ - unsigned long flags; \ - \ - CDEBUG(D_NET, "conn[%p] (%d)--\n", \ - (conn), atomic_read(&(conn)->ibc_refcount)); \ - LASSERT_ATOMIC_POS(&(conn)->ibc_refcount); \ - if (atomic_dec_and_test(&(conn)->ibc_refcount)) { \ - spin_lock_irqsave(&kiblnd_data.kib_connd_lock, flags); \ - list_add_tail(&(conn)->ibc_list, \ - &kiblnd_data.kib_connd_zombies); \ - wake_up(&kiblnd_data.kib_connd_waitq); \ - spin_unlock_irqrestore(&kiblnd_data.kib_connd_lock, flags);\ - } \ -} while (0) +static inline void kiblnd_conn_addref(struct kib_conn *conn) +{ +#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK + CDEBUG(D_NET, "conn[%p] (%d)++\n", + conn, atomic_read(&conn->ibc_refcount)); +#endif + atomic_inc(&(conn)->ibc_refcount); +} + +static inline void kiblnd_conn_decref(struct kib_conn *conn) +{ + unsigned long flags; + +#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK + CDEBUG(D_NET, "conn[%p] (%d)--\n", + conn, atomic_read(&conn->ibc_refcount)); +#endif + LASSERT_ATOMIC_POS(&conn->ibc_refcount); + if (atomic_dec_and_test(&conn->ibc_refcount)) { + spin_lock_irqsave(&kiblnd_data.kib_connd_lock, flags); + list_add_tail(&conn->ibc_list, + &kiblnd_data.kib_connd_zombies); + wake_up(&kiblnd_data.kib_connd_waitq); + spin_unlock_irqrestore(&kiblnd_data.kib_connd_lock, flags); + } +} void kiblnd_destroy_peer(struct kref *kref); @@ -971,3 +975,33 @@ void kiblnd_pack_msg(struct lnet_ni *ni, struct kib_msg *msg, int version, int kiblnd_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg, int delayed, struct iov_iter *to, unsigned int rlen); unsigned int kiblnd_get_dev_prio(struct lnet_ni *ni, unsigned int dev_idx); + +#define kiblnd_dump_conn_dbg(conn) \ +({ \ + if (conn && conn->ibc_cmid) \ + CDEBUG(D_NET, \ + "conn %p state %d nposted %d/%d c/o/r %d/%d/%d ce %d : cm_id %p qp_num 0x%x device_name %s\n", \ + conn, \ + conn->ibc_state, \ + conn->ibc_noops_posted, \ + conn->ibc_nsends_posted, \ + conn->ibc_credits, \ + 
conn->ibc_outstanding_credits, \ + conn->ibc_reserved_credits, \ + conn->ibc_comms_error, \ + conn->ibc_cmid, \ + conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0, \ + conn->ibc_cmid->qp ? (conn->ibc_cmid->qp->device ? dev_name(&conn->ibc_cmid->qp->device->dev) : "NULL") : "NULL"); \ + else if (conn) \ + CDEBUG(D_NET, \ + "conn %p state %d nposted %d/%d c/o/r %d/%d/%d ce %d : cm_id NULL\n", \ + conn, \ + conn->ibc_state, \ + conn->ibc_noops_posted, \ + conn->ibc_nsends_posted, \ + conn->ibc_credits, \ + conn->ibc_outstanding_credits, \ + conn->ibc_reserved_credits, \ + conn->ibc_comms_error \ + ); \ +}) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index b16841e..d4de326 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -337,9 +337,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, LASSERT(conn->ibc_state >= IBLND_CONN_ESTABLISHED); - CDEBUG(D_NET, "Received %x[%d] from %s\n", + CDEBUG(D_NET, "Received %x[%d] nob %u cm_id %p qp_num 0x%x\n", msg->ibm_type, credits, - libcfs_nid2str(conn->ibc_peer->ibp_nid)); + msg->ibm_nob, + conn->ibc_cmid, + conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0); + kiblnd_dump_conn_dbg(conn); if (credits) { /* Have I received credits that will let me send? */ @@ -760,8 +763,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } if (credit && !conn->ibc_credits) { /* no credits */ - CDEBUG(D_NET, "%s: no credits\n", - libcfs_nid2str(peer_ni->ibp_nid)); + CDEBUG(D_NET, "%s: no credits cm_id %p qp_num 0x%x\n", + libcfs_nid2str(peer_ni->ibp_nid), + conn->ibc_cmid, + conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0); + kiblnd_dump_conn_dbg(conn); return -EAGAIN; } @@ -790,12 +796,22 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR; kiblnd_tx_done(tx); spin_lock(&conn->ibc_lock); - CDEBUG(D_NET, "%s(%d): redundant or enough NOOP\n", + CDEBUG(D_NET, "%s(%d): redundant or enough NOOP cm_id %p qp_num 0x%x\n", libcfs_nid2str(peer_ni->ibp_nid), - conn->ibc_noops_posted); + conn->ibc_noops_posted, + conn->ibc_cmid, + conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0); + kiblnd_dump_conn_dbg(conn); return 0; } + CDEBUG(D_NET, "Transmit %x[%d] nob %u cm_id %p qp_num 0x%x\n", + msg->ibm_type, credit, + msg->ibm_nob, + conn->ibc_cmid, + conn->ibc_cmid->qp ? conn->ibc_cmid->qp->qp_num : 0); + kiblnd_dump_conn_dbg(conn); + kiblnd_pack_msg(peer_ni->ibp_ni, msg, ver, conn->ibc_outstanding_credits, peer_ni->ibp_nid, conn->ibc_incarnation); @@ -1000,6 +1016,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, tx->tx_hstatus = LNET_MSG_STATUS_REMOTE_DROPPED; tx->tx_waiting = 0; /* don't wait for peer_ni */ tx->tx_status = -EIO; +#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK + kiblnd_dump_conn_dbg(conn); +#endif } idle = !tx->tx_sending && /* This is the final callback */ @@ -1982,10 +2001,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, list_empty(&conn->ibc_tx_queue_rsrvd) && list_empty(&conn->ibc_tx_queue_nocred) && list_empty(&conn->ibc_active_txs)) { - CDEBUG(D_NET, "closing conn to %s\n", + CDEBUG(D_NET, "closing conn %p to %s\n", + conn, libcfs_nid2str(peer_ni->ibp_nid)); } else { - CNETERR("Closing conn to %s: error %d%s%s%s%s%s\n", + CNETERR("Closing conn %p to %s: error %d%s%s%s%s%s\n", + conn, libcfs_nid2str(peer_ni->ibp_nid), error, list_empty(&conn->ibc_tx_queue) ? "" : "(sending)", list_empty(&conn->ibc_tx_noops) ? 
"" : "(sending_noops)", @@ -2660,11 +2681,11 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, cp.retry_count = *kiblnd_tunables.kib_retry_count; cp.rnr_retry_count = *kiblnd_tunables.kib_rnr_retry_count; - CDEBUG(D_NET, "Accept %s\n", libcfs_nid2str(nid)); + CDEBUG(D_NET, "Accept %s conn %p\n", libcfs_nid2str(nid), conn); rc = rdma_accept(cmid, &cp); if (rc) { - CERROR("Can't accept %s: %d\n", libcfs_nid2str(nid), rc); + CNETERR("Can't accept %s: %d cm_id %p\n", libcfs_nid2str(nid), rc, cmid); rej.ibr_version = version; rej.ibr_why = IBLND_REJECT_FATAL; @@ -3085,10 +3106,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, rc = rdma_connect(cmid, &cp); if (rc) { - CERROR("Can't connect to %s: %d\n", - libcfs_nid2str(peer_ni->ibp_nid), rc); + CNETERR("Can't connect to %s: %d cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), rc, cmid); kiblnd_connreq_done(conn, rc); kiblnd_conn_decref(conn); + } else { + CDEBUG(D_NET, "Connected to %s: cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), cmid); } return 0; @@ -3112,13 +3136,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, rc = kiblnd_passive_connect(cmid, (void *)KIBLND_CONN_PARAM(event), KIBLND_CONN_PARAM_LEN(event)); - CDEBUG(D_NET, "connreq: %d\n", rc); + CDEBUG(D_NET, "connreq: %d cm_id %p\n", rc, cmid); return rc; case RDMA_CM_EVENT_ADDR_ERROR: peer_ni = (struct kib_peer_ni *)cmid->context; - CNETERR("%s: ADDR ERROR %d\n", - libcfs_nid2str(peer_ni->ibp_nid), event->status); + CNETERR("%s: ADDR ERROR %d cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid); kiblnd_peer_connect_failed(peer_ni, 1, -EHOSTUNREACH); kiblnd_peer_decref(peer_ni); return -EHOSTUNREACH; /* rc destroys cmid */ @@ -3126,13 +3150,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, case RDMA_CM_EVENT_ADDR_RESOLVED: peer_ni = (struct kib_peer_ni *)cmid->context; - CDEBUG(D_NET, "%s Addr resolved: %d\n", - libcfs_nid2str(peer_ni->ibp_nid), event->status); + CDEBUG(D_NET, "%s Addr resolved: %d cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid); if (event->status) { - CNETERR("Can't resolve address for %s: %d\n", + CNETERR("Can't resolve address for %s: %d cm_id %p\n", libcfs_nid2str(peer_ni->ibp_nid), - event->status); + event->status, cmid); rc = event->status; } else { rc = rdma_resolve_route(cmid, @@ -3151,8 +3175,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } /* Can't initiate route resolution */ - CERROR("Can't resolve route for %s: %d\n", - libcfs_nid2str(peer_ni->ibp_nid), rc); + CNETERR("Can't resolve route for %s: %d cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), rc, cmid); } kiblnd_peer_connect_failed(peer_ni, 1, rc); kiblnd_peer_decref(peer_ni); @@ -3160,8 +3184,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, case RDMA_CM_EVENT_ROUTE_ERROR: peer_ni = (struct kib_peer_ni *)cmid->context; - CNETERR("%s: ROUTE ERROR %d\n", - libcfs_nid2str(peer_ni->ibp_nid), event->status); + CNETERR("%s: ROUTE ERROR %d cm_id %p\n", + libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid); kiblnd_peer_connect_failed(peer_ni, 1, -EHOSTUNREACH); kiblnd_peer_decref(peer_ni); return -EHOSTUNREACH; /* rc destroys cmid */ @@ -3174,17 +3198,15 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, if (!event->status) return kiblnd_active_connect(cmid); - CNETERR("Can't resolve route for %s: %d\n", - libcfs_nid2str(peer_ni->ibp_nid), event->status); + CNETERR("Can't resolve route for %s: %d cm_id %p\n", + 
libcfs_nid2str(peer_ni->ibp_nid), event->status, cmid); kiblnd_peer_connect_failed(peer_ni, 1, event->status); kiblnd_peer_decref(peer_ni); return event->status; /* rc destroys cmid */ case RDMA_CM_EVENT_UNREACHABLE: - CNETERR("%s: UNREACHABLE %d, ibc_state: %d\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid), - event->status, - conn->ibc_state); + CNETERR("%s: UNREACHABLE %d cm_id %p conn %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status, cmid, conn); LASSERT(conn->ibc_state != IBLND_CONN_ESTABLISHED && conn->ibc_state != IBLND_CONN_INIT); if (conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT || @@ -3198,8 +3220,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, conn = (struct kib_conn *)cmid->context; LASSERT(conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT || conn->ibc_state == IBLND_CONN_PASSIVE_WAIT); - CNETERR("%s: CONNECT ERROR %d\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status); + CNETERR("%s: CONNECT ERROR %d cm_id %p conn %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status, cmid, conn); kiblnd_connreq_done(conn, -ENOTCONN); kiblnd_conn_decref(conn); return 0; @@ -3211,9 +3233,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LBUG(); case IBLND_CONN_PASSIVE_WAIT: - CERROR("%s: REJECTED %d\n", + CERROR("%s: REJECTED %d cm_id %p\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), - event->status); + event->status, cmid); kiblnd_connreq_done(conn, -ECONNRESET); break; @@ -3233,14 +3255,14 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LBUG(); case IBLND_CONN_PASSIVE_WAIT: - CDEBUG(D_NET, "ESTABLISHED (passive): %s\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid)); + CDEBUG(D_NET, "ESTABLISHED (passive): %s cm_id %p conn %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn); kiblnd_connreq_done(conn, 0); break; case IBLND_CONN_ACTIVE_CONNECT: - CDEBUG(D_NET, "ESTABLISHED(active): %s\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid)); + CDEBUG(D_NET, "ESTABLISHED(active): %s cm_id %p conn %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn); kiblnd_check_connreply(conn, (void *)KIBLND_CONN_PARAM(event), KIBLND_CONN_PARAM_LEN(event)); @@ -3255,8 +3277,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, case RDMA_CM_EVENT_DISCONNECTED: conn = (struct kib_conn *)cmid->context; if (conn->ibc_state < IBLND_CONN_ESTABLISHED) { - CERROR("%s DISCONNECTED\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid)); + CERROR("%s DISCONNECTED cm_id %p conn %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), cmid, conn); kiblnd_connreq_done(conn, -ECONNRESET); } else { kiblnd_close_conn(conn, 0); @@ -3372,6 +3394,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, conn->ibc_credits, conn->ibc_outstanding_credits, conn->ibc_reserved_credits); +#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK + kiblnd_dump_conn_dbg(conn); +#endif list_add(&conn->ibc_connd_list, &closes); } else { list_add(&conn->ibc_connd_list, &checksends); @@ -3425,7 +3450,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(!in_interrupt()); LASSERT(current == kiblnd_data.kib_connd); LASSERT(conn->ibc_state == IBLND_CONN_CLOSING); - +#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK + kiblnd_dump_conn_dbg(conn); +#endif rdma_disconnect(conn->ibc_cmid); kiblnd_finalise_conn(conn); @@ -3716,6 +3743,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, (conn->ibc_nrx > 0 || conn->ibc_nsends_posted > 0)) { kiblnd_conn_addref(conn); /* +1 ref for sched_conns */ + kiblnd_dump_conn_dbg(conn); 
conn->ibc_scheduled = 1; list_add_tail(&conn->ibc_sched_list, &sched->ibs_conns); @@ -3788,8 +3816,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, rc = ib_req_notify_cq(conn->ibc_cq, IB_CQ_NEXT_COMP); if (rc < 0) { - CWARN("%s: ib_req_notify_cq failed: %d, closing connection\n", - libcfs_nid2str(conn->ibc_peer->ibp_nid), rc); + CWARN("%s: ib_req_notify_cq failed: %d, closing connection %p\n", + libcfs_nid2str(conn->ibc_peer->ibp_nid), + rc, conn); kiblnd_close_conn(conn, -EIO); kiblnd_conn_decref(conn); spin_lock_irqsave(&sched->ibs_lock, @@ -3810,9 +3839,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } if (rc < 0) { - CWARN("%s: ib_poll_cq failed: %d, closing connection\n", + CWARN("%s: ib_poll_cq failed: %d, closing connection %p\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), - rc); + rc, conn); kiblnd_close_conn(conn, -EIO); kiblnd_conn_decref(conn); spin_lock_irqsave(&sched->ibs_lock, flags); From patchwork Sun Nov 20 14:16:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050057 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 02341C4332F for ; Sun, 20 Nov 2022 14:22:44 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXgg0KdCz1yDG; Sun, 20 Nov 2022 06:18:39 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXfc46k0z1wfv for ; Sun, 20 Nov 2022 06:17:44 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C001A1007B72; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BAFA5E8B88; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:52 -0500 Message-Id: <1668953828-10909-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/22] lnet: use Netlink to support old and new NI APIs. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The LNet layer uses two different sets of ioctls. One ioctl set is for Multi-Rail and the other is an older API. Both are in heavy use and with the upcoming support for IPv6 we are looking at an explosion of ioctls. The solution is to move the LNet layer to Netlink which can easily handle all the differences between the APIs. 
This also resolves a long standing issue of the user land API constantly changing in a non-compatible way with previous versions. This patch unifies the handling the LNet NI to use Netlink and is fully aware of the new large NID addressing. WC-bug-id: https://jira.whamcloud.com/browse/LU-10003 Lustre-commit: 8f8f6e2f36e56e53e ("LU-10003 lnet: use Netlink to support old and new NI APIs.") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48814 Reviewed-by: Serguei Smirnov Reviewed-by: Neil Brown Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin --- include/linux/lnet/lib-lnet.h | 6 +- include/linux/lnet/lib-types.h | 103 +++++ include/uapi/linux/lnet/libcfs_ioctl.h | 2 +- include/uapi/linux/lnet/lnet-dlc.h | 23 + include/uapi/linux/lnet/lnet-types.h | 15 + net/lnet/klnds/o2iblnd/o2iblnd.c | 88 +++- net/lnet/klnds/o2iblnd/o2iblnd.h | 16 + net/lnet/klnds/socklnd/socklnd.c | 37 +- net/lnet/klnds/socklnd/socklnd.h | 9 + net/lnet/lnet/api-ni.c | 779 +++++++++++++++++++++++++++++++-- net/lnet/lnet/config.c | 4 +- net/lnet/lnet/module.c | 42 +- 12 files changed, 1054 insertions(+), 70 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index bd4acef..13ce2bf 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -457,6 +457,7 @@ struct lnet_ni * struct lnet_ni * lnet_ni_alloc_w_cpt_array(struct lnet_net *net, u32 *cpts, u32 ncpts, char *iface); +int lnet_ni_add_interface(struct lnet_ni *ni, char *iface); static inline int lnet_nid2peerhash(struct lnet_nid *nid) @@ -622,8 +623,9 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src, struct lnet_remotenet *lnet_find_rnet_locked(u32 net); int lnet_dyn_add_net(struct lnet_ioctl_config_data *conf); int lnet_dyn_del_net(u32 net); -int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf); -int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf); +int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf, u32 net, + struct lnet_ioctl_config_lnd_tunables *tun); +int lnet_dyn_del_ni(struct lnet_nid *nid); int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason); struct lnet_net *lnet_get_net_locked(u32 net_id); void lnet_net_clr_pref_rtrs(struct lnet_net *net); diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 499385b..2d3b044 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -335,6 +335,11 @@ struct lnet_lnd { /* get dma_dev priority */ unsigned int (*lnd_get_dev_prio)(struct lnet_ni *ni, unsigned int dev_idx); + + /* Handle LND specific Netlink handling */ + int (*lnd_nl_set)(int cmd, struct nlattr *attr, int type, void *data); + + const struct ln_key_list *lnd_keys; }; /* FIXME !!!!! The abstract for GPU page support (PCI peer2peer) @@ -464,6 +469,104 @@ struct lnet_net { struct list_head net_rtr_pref_nids; }; +/* Normally Netlink atttributes are defined in UAPI headers but Lustre is + * different in that the ABI is in a constant state of change unlike other + * Netlink interfaces. LNet sends a special header to help user land handle + * the differences. + */ + +/** enum lnet_net_attrs - LNet NI netlink properties + * attributes that describe LNet 'NI' + * These values are used to piece together + * messages for sending and receiving. 
+ * + * @LNET_NET_ATTR_UNSPEC: unspecified attribute to catch errors + * + * @LNET_NET_ATTR_HDR: grouping for LNet net data (NLA_NESTED) + * @LNET_NET_ATTR_TYPE: LNet net this NI belongs to (NLA_STRING) + * @LNET_NET_ATTR_LOCAL: Local NI information (NLA_NESTED) + */ +enum lnet_net_attrs { + LNET_NET_ATTR_UNSPEC = 0, + + LNET_NET_ATTR_HDR, + LNET_NET_ATTR_TYPE, + LNET_NET_ATTR_LOCAL, + + __LNET_NET_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_ATTR_MAX (__LNET_NET_ATTR_MAX_PLUS_ONE - 1) + +/** enum lnet_net_local_ni_attrs - LNet local NI netlink properties + * attributes that describe local NI + * + * @LNET_NET_LOCAL_NI_ATTR_UNSPEC: unspecified attribute to catch errors + * + * @LNET_NET_LOCAL_NI_ATTR_NID: NID that represents this NI (NLA_STRING) + * @LNET_NET_LOCAL_NI_ATTR_STATUS: State of this NI (NLA_STRING) + * @LNET_NET_LOCAL_NI_ATTR_INTERFACE: Defines physical devices (NLA_NESTED) + * Used to be many devices but no longer. + */ +enum lnet_net_local_ni_attrs { + LNET_NET_LOCAL_NI_ATTR_UNSPEC = 0, + + LNET_NET_LOCAL_NI_ATTR_NID, + LNET_NET_LOCAL_NI_ATTR_STATUS, + LNET_NET_LOCAL_NI_ATTR_INTERFACE, + + __LNET_NET_LOCAL_NI_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_LOCAL_NI_ATTR_MAX (__LNET_NET_LOCAL_NI_ATTR_MAX_PLUS_ONE - 1) + +/** enum lnet_net_local_ni_intf_attrs - LNet NI device netlink properties + * attribute that reports the device + * in use + * + * @LNET_NET_LOCAL_NI_INTF_ATTR_UNSPEC: unspecified attribute to catch errors + * + * @LNET_NET_LOCAL_NI_INTF_ATTR_TYPE: Physcial device interface (NLA_STRING) + */ +enum lnet_net_local_ni_intf_attrs { + LNET_NET_LOCAL_NI_INTF_ATTR_UNSPEC = 0, + + LNET_NET_LOCAL_NI_INTF_ATTR_TYPE, + + __LNET_NET_LOCAL_NI_INTF_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_LOCAL_NI_INTF_ATTR_MAX (__LNET_NET_LOCAL_NI_INTF_ATTR_MAX_PLUS_ONE - 1) + +/** enum lnet_net_local_ni_tunables_attrs - LNet NI tunables + * netlink properties. + * Performance options + * for your NI. + * + * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC: unspecified attribute + * to catch errors + * + * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT: Timeout for LNet peer. + * (NLA_S32) + * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS: Credits for LNet peer. + * (NLA_S32) + * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS: Buffer credits for + * LNet peer. (NLA_S32) + * @LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS: Credits for LNet peer + * TX. 
(NLA_S32) + */ +enum lnet_net_local_ni_tunables_attr { + LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC = 0, + + LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT, + LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS, + LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS, + LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS, + __LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX (__LNET_NET_LOCAL_NI_TUNABLES_ATTR_MAX_PLUS_ONE - 1) + struct lnet_ni { spinlock_t ni_lock; /* chain on the lnet_net structure */ diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index f2ae76c..89ac075 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -94,7 +94,7 @@ struct libcfs_ioctl_data { #define IOC_LIBCFS_MARK_DEBUG _IOWR('e', 32, IOCTL_LIBCFS_TYPE) /* IOC_LIBCFS_MEMHOG obsolete in 2.8.0, was _IOWR('e', 36, IOCTL_LIBCFS_TYPE) */ /* lnet ioctls */ -#define IOC_LIBCFS_GET_NI _IOWR('e', 50, IOCTL_LIBCFS_TYPE) +/* IOC_LIBCFS_GET_NI obsolete in 2.16, was _IOWR('e', 50, IOCTL_LIBCFS_TYPE) */ #define IOC_LIBCFS_FAIL_NID _IOWR('e', 51, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_NOTIFY_ROUTER _IOWR('e', 55, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_UNCONFIGURE _IOWR('e', 56, IOCTL_LIBCFS_TYPE) diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 415968a..58697c1 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -49,6 +49,29 @@ #define __user #endif +#define LNET_GENL_NAME "lnet" +#define LNET_GENL_VERSION 0x05 + +/* enum lnet_commands - Supported core LNet Netlink commands + * + * @LNET_CMD_UNSPEC: unspecified command to catch errors + * + * @LNET_CMD_NETS: command to manage the LNet networks + */ +enum lnet_commands { + LNET_CMD_UNSPEC = 0, + + LNET_CMD_CONFIGURE = 1, + LNET_CMD_NETS = 2, + LNET_CMD_PEERS = 3, + LNET_CMD_ROUTES = 4, + LNET_CMD_CONNS = 5, + + __LNET_CMD_MAX_PLUS_ONE +}; + +#define LNET_CMD_MAX (__LNET_CMD_MAX_PLUS_ONE - 1) + /* * To allow for future enhancements to extend the tunables * add a hdr to this structure, so that the version can be set diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h index 5a2ea45..304add9 100644 --- a/include/uapi/linux/lnet/lnet-types.h +++ b/include/uapi/linux/lnet/lnet-types.h @@ -37,8 +37,12 @@ #include #include +#include #include #include +#ifndef __KERNEL__ +#include +#endif /** \addtogroup lnet * @{ @@ -111,6 +115,17 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num) #define LNET_NET_ANY LNET_NIDNET(LNET_NID_ANY) +/* check for address set */ +static inline bool nid_addr_is_set(const struct lnet_nid *nid) +{ + int sum = 0, i; + + for (i = 0; i < NID_ADDR_BYTES(nid); i++) + sum |= nid->nid_addr[i]; + + return sum ? 
true : false; +} + static inline int nid_is_nid4(const struct lnet_nid *nid) { return NID_ADDR_BYTES(nid) == 4; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 94ff926..cbb3445 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -491,6 +491,86 @@ void kiblnd_unlink_peer_locked(struct kib_peer_ni *peer_ni) spin_unlock(&conn->ibc_lock); } +static const struct ln_key_list kiblnd_tunables_keys = { + .lkl_maxattr = LNET_NET_O2IBLND_TUNABLES_ATTR_MAX, + .lkl_list = { + [LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS] = { + .lkp_value = "peercredits_hiw", + .lkp_data_type = NLA_U32 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND] = { + .lkp_value = "map_on_demand", + .lkp_data_type = NLA_FLAG + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS] = { + .lkp_value = "concurrent_sends", + .lkp_data_type = NLA_U32 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE] = { + .lkp_value = "fmr_pool_size", + .lkp_data_type = NLA_U32 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER] = { + .lkp_value = "fmr_flush_trigger", + .lkp_data_type = NLA_U32 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE] = { + .lkp_value = "fmr_cache", + .lkp_data_type = NLA_U32 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_NTX] = { + .lkp_value = "ntx", + .lkp_data_type = NLA_U16 + }, + [LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER] = { + .lkp_value = "conns_per_peer", + .lkp_data_type = NLA_U16 + }, + }, +}; + +static int +kiblnd_nl_set(int cmd, struct nlattr *attr, int type, void *data) +{ + struct lnet_lnd_tunables *tunables = data; + + if (cmd != LNET_CMD_NETS) + return -EOPNOTSUPP; + + if (nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE) + return -EINVAL; + + switch (type) { + case LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS: + tunables->lnd_tun_u.lnd_o2ib.lnd_peercredits_hiw = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND: + tunables->lnd_tun_u.lnd_o2ib.lnd_map_on_demand = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS: + tunables->lnd_tun_u.lnd_o2ib.lnd_concurrent_sends = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE: + tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_pool_size = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER: + tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_flush_trigger = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE: + tunables->lnd_tun_u.lnd_o2ib.lnd_fmr_cache = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_NTX: + tunables->lnd_tun_u.lnd_o2ib.lnd_ntx = nla_get_s64(attr); + break; + case LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER: + tunables->lnd_tun_u.lnd_o2ib.lnd_conns_per_peer = nla_get_s64(attr); + default: + break; + } + + return 0; +} + static void kiblnd_dump_peer_debug_info(struct kib_peer_ni *peer_ni) { @@ -3173,7 +3253,11 @@ static int kiblnd_startup(struct lnet_ni *ni) net->ibn_dev = ibdev; ni->ni_nid.nid_addr[0] = cpu_to_be32(ibdev->ibd_ifip); - + if (!ni->ni_interface) { + rc = lnet_ni_add_interface(ni, ifaces[i].li_name); + if (rc < 0) + CWARN("ko2iblnd failed to allocate ni_interface\n"); + } ni->ni_dev_cpt = ifaces[i].li_cpt; rc = kiblnd_dev_start_threads(ibdev, newdev, ni->ni_cpts, ni->ni_ncpts); @@ -3220,6 +3304,8 @@ static int kiblnd_startup(struct lnet_ni *ni) .lnd_send = kiblnd_send, .lnd_recv = kiblnd_recv, .lnd_get_dev_prio = kiblnd_get_dev_prio, + .lnd_nl_set = kiblnd_nl_set, + .lnd_keys = &kiblnd_tunables_keys, }; static 
void ko2inlnd_assert_wire_constants(void) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index bef7a55..e3c069b 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -65,6 +65,22 @@ #include #include "o2iblnd-idl.h" +enum kiblnd_ni_lnd_tunables_attr { + LNET_NET_O2IBLND_TUNABLES_ATTR_UNSPEC = 0, + + LNET_NET_O2IBLND_TUNABLES_ATTR_HIW_PEER_CREDITS, + LNET_NET_O2IBLND_TUNABLES_ATTR_CONCURRENT_SENDS, + LNET_NET_O2IBLND_TUNABLES_ATTR_MAP_ON_DEMAND, + LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_POOL_SIZE, + LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_FLUSH_TRIGGER, + LNET_NET_O2IBLND_TUNABLES_ATTR_FMR_CACHE, + LNET_NET_O2IBLND_TUNABLES_ATTR_NTX, + LNET_NET_O2IBLND_TUNABLES_ATTR_CONNS_PER_PEER, + __LNET_NET_O2IBLND_TUNABLES_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_O2IBLND_TUNABLES_ATTR_MAX (__LNET_NET_O2IBLND_TUNABLES_ATTR_MAX_PLUS_ONE - 1) + #define IBLND_PEER_HASH_BITS 7 /* log2 of # peer_ni lists */ #define IBLND_N_SCHED 2 diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index e8f8020..21fccfa 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -840,6 +840,33 @@ struct ksock_peer_ni * return 0; } +static const struct ln_key_list ksocknal_tunables_keys = { + .lkl_maxattr = LNET_NET_SOCKLND_TUNABLES_ATTR_MAX, + .lkl_list = { + [LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER] = { + .lkp_value = "conns_per_peer", + .lkp_data_type = NLA_S32 + }, + }, +}; + +static int +ksocknal_nl_set(int cmd, struct nlattr *attr, int type, void *data) +{ + struct lnet_lnd_tunables *tunables = data; + + if (cmd != LNET_CMD_NETS) + return -EOPNOTSUPP; + + if (type != LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER || + nla_type(attr) != LN_SCALAR_ATTR_INT_VALUE) + return -EINVAL; + + tunables->lnd_tun_u.lnd_sock.lnd_conns_per_peer = nla_get_s64(attr); + + return 0; +} + static int ksocknal_connecting(struct ksock_conn_cb *conn_cb, struct sockaddr *sa) { @@ -2520,16 +2547,20 @@ static int ksocknal_inetaddr_event(struct notifier_block *unused, ksi = &net->ksnn_interface; /* Use the first discovered interface or look in the list */ if (ni->ni_interface) { - for (i = 0; i < rc; i++) + for (i = 0; i < rc; i++) { if (strcmp(ifaces[i].li_name, ni->ni_interface) == 0) break; - + } /* ni_interface doesn't contain the interface we want */ if (i == rc) { CERROR("ksocklnd: failed to find interface %s\n", ni->ni_interface); goto fail_1; } + } else { + rc = lnet_ni_add_interface(ni, ifaces[i].li_name); + if (rc < 0) + CWARN("ksocklnd failed to allocate ni_interface\n"); } ni->ni_dev_cpt = ifaces[i].li_cpt; @@ -2590,6 +2621,8 @@ static void __exit ksocklnd_exit(void) .lnd_recv = ksocknal_recv, .lnd_notify_peer_down = ksocknal_notify_gw_down, .lnd_accept = ksocknal_accept, + .lnd_nl_set = ksocknal_nl_set, + .lnd_keys = &ksocknal_tunables_keys, }; static int __init ksocklnd_init(void) diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index bb68a3d..50892b1 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -74,6 +74,15 @@ # define SOCKNAL_RISK_KMAP_DEADLOCK 1 #endif +enum ksocklnd_ni_lnd_tunables_attr { + LNET_NET_SOCKLND_TUNABLES_ATTR_UNSPEC = 0, + + LNET_NET_SOCKLND_TUNABLES_ATTR_CONNS_PER_PEER, + __LNET_NET_SOCKLND_TUNABLES_ATTR_MAX_PLUS_ONE, +}; + +#define LNET_NET_SOCKLND_TUNABLES_ATTR_MAX (__LNET_NET_SOCKLND_TUNABLES_ATTR_MAX_PLUS_ONE - 1) + /* per scheduler state */ struct ksock_sched { /* per scheduler state */ spinlock_t kss_lock; /* 
serialise */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 9459fc0..af875ba 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -34,6 +34,8 @@ #include #include #include +#include +#include #include #include @@ -2498,6 +2500,36 @@ static void lnet_push_target_fini(void) return rc; } +static struct lnet_lnd *lnet_load_lnd(u32 lnd_type) +{ + struct lnet_lnd *lnd; + int rc = 0; + + mutex_lock(&the_lnet.ln_lnd_mutex); + lnd = lnet_find_lnd_by_type(lnd_type); + if (!lnd) { + mutex_unlock(&the_lnet.ln_lnd_mutex); + rc = request_module("%s", libcfs_lnd2modname(lnd_type)); + mutex_lock(&the_lnet.ln_lnd_mutex); + + lnd = lnet_find_lnd_by_type(lnd_type); + if (!lnd) { + mutex_unlock(&the_lnet.ln_lnd_mutex); + CERROR("Can't load LND %s, module %s, rc=%d\n", + libcfs_lnd2str(lnd_type), + libcfs_lnd2modname(lnd_type), rc); +#ifndef HAVE_MODULE_LOADING_SUPPORT + LCONSOLE_ERROR_MSG(0x104, + "Your kernel must be compiled with kernel module loading support."); +#endif + return ERR_PTR(-EINVAL); + } + } + mutex_unlock(&the_lnet.ln_lnd_mutex); + + return lnd; +} + static int lnet_startup_lndnet(struct lnet_net *net, struct lnet_lnd_tunables *tun) { @@ -2525,27 +2557,14 @@ static void lnet_push_target_fini(void) if (lnet_net_unique(net->net_id, &the_lnet.ln_nets, &net_l)) { lnd_type = LNET_NETTYP(net->net_id); - mutex_lock(&the_lnet.ln_lnd_mutex); - lnd = lnet_find_lnd_by_type(lnd_type); - - if (!lnd) { - mutex_unlock(&the_lnet.ln_lnd_mutex); - rc = request_module("%s", libcfs_lnd2modname(lnd_type)); - mutex_lock(&the_lnet.ln_lnd_mutex); - - lnd = lnet_find_lnd_by_type(lnd_type); - if (!lnd) { - mutex_unlock(&the_lnet.ln_lnd_mutex); - CERROR("Can't load LND %s, module %s, rc=%d\n", - libcfs_lnd2str(lnd_type), - libcfs_lnd2modname(lnd_type), rc); - rc = -EINVAL; - goto failed0; - } + lnd = lnet_load_lnd(lnd_type); + if (IS_ERR(lnd)) { + rc = PTR_ERR(lnd); + goto failed0; } + mutex_lock(&the_lnet.ln_lnd_mutex); net->net_lnd = lnd; - mutex_unlock(&the_lnet.ln_lnd_mutex); net_l = net; @@ -2766,6 +2785,8 @@ int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq, } EXPORT_SYMBOL(lnet_genl_send_scalar_list); +static struct genl_family lnet_family; + /** * Initialize LNet library. 
* @@ -2803,6 +2824,13 @@ int lnet_lib_init(void) return rc; } + rc = genl_register_family(&lnet_family); + if (rc != 0) { + lnet_destroy_locks(); + CERROR("Can't register LNet netlink family: %d\n", rc); + return rc; + } + the_lnet.ln_refcount = 0; INIT_LIST_HEAD(&the_lnet.ln_net_zombie); INIT_LIST_HEAD(&the_lnet.ln_msg_resend); @@ -2846,6 +2874,7 @@ void lnet_lib_exit(void) for (i = 0; i < NUM_LNDS; i++) LASSERT(!the_lnet.ln_lnds[i]); lnet_destroy_locks(); + genl_unregister_family(&lnet_family); } /** @@ -3525,31 +3554,24 @@ static int lnet_handle_legacy_ip2nets(char *ip2nets, return rc; } -int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf) +int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf, u32 net_id, + struct lnet_ioctl_config_lnd_tunables *tun) { struct lnet_net *net; struct lnet_ni *ni; - struct lnet_ioctl_config_lnd_tunables *tun = NULL; int rc, i; - u32 net_id, lnd_type; - - /* get the tunables if they are available */ - if (conf->lic_cfg_hdr.ioc_len >= - sizeof(*conf) + sizeof(*tun)) - tun = (struct lnet_ioctl_config_lnd_tunables *) - conf->lic_bulk; + u32 lnd_type; /* handle legacy ip2nets from DLC */ if (conf->lic_legacy_ip2nets[0] != '\0') return lnet_handle_legacy_ip2nets(conf->lic_legacy_ip2nets, tun); - net_id = LNET_NIDNET(conf->lic_nid); lnd_type = LNET_NETTYP(net_id); if (!libcfs_isknown_lnd(lnd_type)) { CERROR("No valid net and lnd information provided\n"); - return -EINVAL; + return -ENOENT; } net = lnet_net_alloc(net_id, NULL); @@ -3559,7 +3581,7 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf) for (i = 0; i < conf->lic_ncpts; i++) { if (conf->lic_cpts[i] >= LNET_CPT_NUMBER) { lnet_net_free(net); - return -EINVAL; + return -ERANGE; } } @@ -3588,16 +3610,15 @@ int lnet_dyn_add_ni(struct lnet_ioctl_config_ni *conf) return rc; } -int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) +int lnet_dyn_del_ni(struct lnet_nid *nid) { struct lnet_net *net; struct lnet_ni *ni; - u32 net_id = LNET_NIDNET(conf->lic_nid); + u32 net_id = LNET_NID_NET(nid); struct lnet_ping_buffer *pbuf; struct lnet_handle_md ping_mdh; int net_bytes, rc; bool net_empty; - u32 addr; /* don't allow userspace to shutdown the LOLND */ if (LNET_NETTYP(net_id) == LOLND) @@ -3619,8 +3640,7 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) goto unlock_net; } - addr = LNET_NIDADDR(conf->lic_nid); - if (addr == 0) { + if (!nid_addr_is_set(nid)) { /* remove the entire net */ net_bytes = lnet_get_net_ni_bytes_locked(net); @@ -3642,10 +3662,9 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) goto unlock_api_mutex; } - ni = lnet_nid2ni_locked(conf->lic_nid, 0); + ni = lnet_nid_to_ni_locked(nid, 0); if (!ni) { - CERROR("nid %s not found\n", - libcfs_nid2str(conf->lic_nid)); + CERROR("nid %s not found\n", libcfs_nidstr(nid)); rc = -ENOENT; goto unlock_net; } @@ -3952,8 +3971,6 @@ u32 lnet_get_dlc_seq_locked(void) { struct libcfs_ioctl_data *data = arg; struct lnet_ioctl_config_data *config; - struct lnet_process_id id4 = {}; - struct lnet_processid id = {}; struct lnet_ni *ni; struct lnet_nid nid; int rc; @@ -3963,11 +3980,6 @@ u32 lnet_get_dlc_seq_locked(void) sizeof(struct lnet_ioctl_config_data)); switch (cmd) { - case IOC_LIBCFS_GET_NI: - rc = LNetGetId(data->ioc_count, &id); - data->ioc_nid = lnet_nid_to_nid4(&id.nid); - return rc; - case IOC_LIBCFS_FAIL_NID: return lnet_fail_nid(data->ioc_nid, data->ioc_count); @@ -4351,6 +4363,7 @@ u32 lnet_get_dlc_seq_locked(void) return lnet_fault_ctl(data->ioc_flags, data); case IOC_LIBCFS_PING: { + struct lnet_process_id id4; signed long 
timeout; id4.nid = data->ioc_nid; @@ -4561,6 +4574,682 @@ u32 lnet_get_dlc_seq_locked(void) } EXPORT_SYMBOL(LNetCtl); +static const struct ln_key_list net_props_list = { + .lkl_maxattr = LNET_NET_ATTR_MAX, + .lkl_list = { + [LNET_NET_ATTR_HDR] = { + .lkp_value = "net", + .lkp_key_format = LNKF_SEQUENCE | LNKF_MAPPING, + .lkp_data_type = NLA_NUL_STRING, + }, + [LNET_NET_ATTR_TYPE] = { + .lkp_value = "net type", + .lkp_data_type = NLA_STRING + }, + [LNET_NET_ATTR_LOCAL] = { + .lkp_value = "local NI(s)", + .lkp_key_format = LNKF_SEQUENCE | LNKF_MAPPING, + .lkp_data_type = NLA_NESTED + }, + }, +}; + +static struct ln_key_list local_ni_list = { + .lkl_maxattr = LNET_NET_LOCAL_NI_ATTR_MAX, + .lkl_list = { + [LNET_NET_LOCAL_NI_ATTR_NID] = { + .lkp_value = "nid", + .lkp_data_type = NLA_STRING + }, + [LNET_NET_LOCAL_NI_ATTR_STATUS] = { + .lkp_value = "status", + .lkp_data_type = NLA_STRING + }, + [LNET_NET_LOCAL_NI_ATTR_INTERFACE] = { + .lkp_value = "interfaces", + .lkp_key_format = LNKF_MAPPING, + .lkp_data_type = NLA_NESTED + }, + }, +}; + +static const struct ln_key_list local_ni_interfaces_list = { + .lkl_maxattr = LNET_NET_LOCAL_NI_INTF_ATTR_MAX, + .lkl_list = { + [LNET_NET_LOCAL_NI_INTF_ATTR_TYPE] = { + .lkp_value = "0", + .lkp_data_type = NLA_STRING + }, + }, +}; + +/* Use an index since the traversal is across LNet nets and ni collections */ +struct lnet_genl_net_list { + unsigned int lngl_net_id; + unsigned int lngl_idx; +}; + +static inline struct lnet_genl_net_list * +lnet_net_dump_ctx(struct netlink_callback *cb) +{ + return (struct lnet_genl_net_list *)cb->args[0]; +} + +static int lnet_net_show_done(struct netlink_callback *cb) +{ + struct lnet_genl_net_list *nlist = lnet_net_dump_ctx(cb); + + kfree(nlist); + cb->args[0] = 0; + + return 0; +} + +/* LNet net ->start() handler for GET requests */ +static int lnet_net_show_start(struct netlink_callback *cb) +{ + struct genlmsghdr *gnlh = nlmsg_data(cb->nlh); + struct netlink_ext_ack *extack = cb->extack; + struct lnet_genl_net_list *nlist; + int msg_len = genlmsg_len(gnlh); + struct nlattr *params, *top; + int rem, rc = 0; + + if (the_lnet.ln_refcount == 0) { + NL_SET_ERR_MSG(extack, "LNet stack down"); + return -ENETDOWN; + } + + nlist = kmalloc(sizeof(*nlist), GFP_KERNEL); + if (!nlist) + return -ENOMEM; + + nlist->lngl_net_id = LNET_NET_ANY; + nlist->lngl_idx = 0; + cb->args[0] = (long)nlist; + + if (!msg_len) + return 0; + + params = genlmsg_data(gnlh); + nla_for_each_attr(top, params, msg_len, rem) { + struct nlattr *net; + int rem2; + + nla_for_each_nested(net, top, rem2) { + char filter[LNET_NIDSTR_SIZE]; + + if (nla_type(net) != LN_SCALAR_ATTR_VALUE || + nla_strcmp(net, "name") != 0) + continue; + + net = nla_next(net, &rem2); + if (nla_type(net) != LN_SCALAR_ATTR_VALUE) { + NL_SET_ERR_MSG(extack, "invalid config param"); + rc = -EINVAL; + goto report_err; + } + + rc = nla_strlcpy(filter, net, sizeof(filter)); + if (rc < 0) { + NL_SET_ERR_MSG(extack, "failed to get param"); + goto report_err; + } + rc = 0; + + nlist->lngl_net_id = libcfs_str2net(filter); + if (nlist->lngl_net_id == LNET_NET_ANY) { + NL_SET_ERR_MSG(extack, "cannot parse net"); + rc = -ENOENT; + goto report_err; + } + } + } +report_err: + if (rc < 0) + lnet_net_show_done(cb); + + return rc; +} + +static int lnet_net_show_dump(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct lnet_genl_net_list *nlist = lnet_net_dump_ctx(cb); + struct netlink_ext_ack *extack = cb->extack; + int portid = NETLINK_CB(cb->skb).portid; + int seq = cb->nlh->nlmsg_seq; + 
struct lnet_net *net; + int idx = 0, rc = 0; + bool found = false; + void *hdr = NULL; + + if (!nlist->lngl_idx) { + const struct ln_key_list *all[] = { + &net_props_list, &local_ni_list, + &local_ni_interfaces_list, + NULL + }; + + rc = lnet_genl_send_scalar_list(msg, portid, seq, + &lnet_family, + NLM_F_CREATE | NLM_F_MULTI, + LNET_CMD_NETS, all); + if (rc < 0) { + NL_SET_ERR_MSG(extack, "failed to send key table"); + goto send_error; + } + } + + lnet_net_lock(LNET_LOCK_EX); + + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + struct lnet_ni *ni; + + if (nlist->lngl_net_id != LNET_NET_ANY && + nlist->lngl_net_id != net->net_id) + continue; + + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + struct nlattr *local_ni, *ni_attr; + char *status = "up"; + + if (idx++ < nlist->lngl_idx) + continue; + + hdr = genlmsg_put(msg, portid, seq, &lnet_family, + NLM_F_MULTI, LNET_CMD_NETS); + if (!hdr) { + NL_SET_ERR_MSG(extack, "failed to send values"); + rc = -EMSGSIZE; + goto net_unlock; + } + + if (idx == 1) + nla_put_string(msg, LNET_NET_ATTR_HDR, ""); + + nla_put_string(msg, LNET_NET_ATTR_TYPE, + libcfs_net2str(net->net_id)); + found = true; + + local_ni = nla_nest_start(msg, LNET_NET_ATTR_LOCAL); + ni_attr = nla_nest_start(msg, idx - 1); + + lnet_ni_lock(ni); + nla_put_string(msg, LNET_NET_LOCAL_NI_ATTR_NID, + libcfs_nidstr(&ni->ni_nid)); + if (nid_is_lo0(&ni->ni_nid) && + *ni->ni_status != LNET_NI_STATUS_UP) + status = "down"; + nla_put_string(msg, LNET_NET_LOCAL_NI_ATTR_STATUS, "up"); + + if (!nid_is_lo0(&ni->ni_nid) && ni->ni_interface) { + struct nlattr *intf_nest, *intf_attr; + + intf_nest = nla_nest_start(msg, + LNET_NET_LOCAL_NI_ATTR_INTERFACE); + intf_attr = nla_nest_start(msg, 0); + nla_put_string(msg, + LNET_NET_LOCAL_NI_INTF_ATTR_TYPE, + ni->ni_interface); + nla_nest_end(msg, intf_attr); + nla_nest_end(msg, intf_nest); + } + + lnet_ni_unlock(ni); + nla_nest_end(msg, ni_attr); + nla_nest_end(msg, local_ni); + + genlmsg_end(msg, hdr); + } + } + + if (!found) { + struct nlmsghdr *nlh = nlmsg_hdr(msg); + + nlmsg_cancel(msg, nlh); + NL_SET_ERR_MSG(extack, "Network is down"); + rc = -ESRCH; + } +net_unlock: + lnet_net_unlock(LNET_LOCK_EX); +send_error: + nlist->lngl_idx = idx; + + return rc; +} + +static int lnet_genl_parse_tunables(struct nlattr *settings, + struct lnet_ioctl_config_lnd_tunables *tun) +{ + struct nlattr *param; + int rem, rc = 0; + + nla_for_each_nested(param, settings, rem) { + int type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_UNSPEC; + s64 num; + + if (nla_type(param) != LN_SCALAR_ATTR_VALUE) + continue; + + if (nla_strcmp(param, "peer_timeout") == 0) + type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT; + else if (nla_strcmp(param, "peer_credits") == 0) + type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS; + else if (nla_strcmp(param, "peer_buffer_credits") == 0) + type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS; + else if (nla_strcmp(param, "credits") == 0) + type = LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS; + + param = nla_next(param, &rem); + if (nla_type(param) != LN_SCALAR_ATTR_INT_VALUE) + return -EINVAL; + + num = nla_get_s64(param); + switch (type) { + case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_TIMEOUT: + tun->lt_cmn.lct_peer_timeout = num; + break; + case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_CREDITS: + tun->lt_cmn.lct_peer_tx_credits = num; + break; + case LNET_NET_LOCAL_NI_TUNABLES_ATTR_PEER_BUFFER_CREDITS: + tun->lt_cmn.lct_peer_rtr_credits = num; + break; + case LNET_NET_LOCAL_NI_TUNABLES_ATTR_CREDITS: + tun->lt_cmn.lct_max_tx_credits = 
num; + break; + default: + rc = -EINVAL; + break; + } + } + return rc; +} + +static int +lnet_genl_parse_lnd_tunables(struct nlattr *settings, + struct lnet_ioctl_config_lnd_tunables *tun, + const struct lnet_lnd *lnd) +{ + const struct ln_key_list *list = lnd->lnd_keys; + struct nlattr *param; + int rem, rc = 0; + int i = 1; + + if (!list) + return 0; + + if (!lnd->lnd_nl_set) + return -EOPNOTSUPP; + + if (!list->lkl_maxattr) + return -ERANGE; + + nla_for_each_nested(param, settings, rem) { + if (nla_type(param) != LN_SCALAR_ATTR_VALUE) + continue; + + for (i = 1; i <= list->lkl_maxattr; i++) { + if (!list->lkl_list[i].lkp_value || + nla_strcmp(param, list->lkl_list[i].lkp_value) != 0) + continue; + + param = nla_next(param, &rem); + rc = lnd->lnd_nl_set(LNET_CMD_NETS, param, i, tun); + if (rc < 0) + return rc; + } + } + + return rc; +} + +static int +lnet_genl_parse_local_ni(struct nlattr *entry, struct genl_info *info, + int net_id, struct lnet_ioctl_config_ni *conf, + struct lnet_ioctl_config_lnd_tunables *tun, + bool *ni_list) +{ + struct nlattr *settings; + int rem3, rc = 0; + + nla_for_each_nested(settings, entry, rem3) { + if (nla_type(settings) != LN_SCALAR_ATTR_VALUE) + continue; + + if (nla_strcmp(settings, "interfaces") == 0) { + struct nlattr *intf; + int rem4; + + settings = nla_next(settings, &rem3); + if (nla_type(settings) != + LN_SCALAR_ATTR_LIST) { + GENL_SET_ERR_MSG(info, + "invalid interfaces"); + rc = -EINVAL; + goto out; + } + + nla_for_each_nested(intf, settings, rem4) { + intf = nla_next(intf, &rem4); + if (nla_type(intf) != + LN_SCALAR_ATTR_VALUE) { + GENL_SET_ERR_MSG(info, + "0 key is invalid"); + rc = -EINVAL; + goto out; + } + + rc = nla_strlcpy(conf->lic_ni_intf, intf, + sizeof(conf->lic_ni_intf)); + if (rc < 0) { + GENL_SET_ERR_MSG(info, + "failed to parse interfaces"); + goto out; + } + } + *ni_list = true; + } else if (nla_strcmp(settings, "tunables") == 0) { + settings = nla_next(settings, &rem3); + if (nla_type(settings) != + LN_SCALAR_ATTR_LIST) { + GENL_SET_ERR_MSG(info, + "invalid tunables"); + rc = -EINVAL; + goto out; + } + + rc = lnet_genl_parse_tunables(settings, tun); + if (rc < 0) { + GENL_SET_ERR_MSG(info, + "failed to parse tunables"); + goto out; + } + } else if ((nla_strcmp(settings, "lnd tunables") == 0)) { + const struct lnet_lnd *lnd; + + lnd = lnet_load_lnd(LNET_NETTYP(net_id)); + if (IS_ERR(lnd)) { + GENL_SET_ERR_MSG(info, + "LND type not supported"); + rc = PTR_ERR(lnd); + goto out; + } + + settings = nla_next(settings, &rem3); + if (nla_type(settings) != + LN_SCALAR_ATTR_LIST) { + GENL_SET_ERR_MSG(info, + "lnd tunables should be list\n"); + rc = -EINVAL; + goto out; + } + + rc = lnet_genl_parse_lnd_tunables(settings, + tun, lnd); + if (rc < 0) { + GENL_SET_ERR_MSG(info, + "failed to parse lnd tunables"); + goto out; + } + } else if (nla_strcmp(settings, "CPT") == 0) { + struct nlattr *cpt; + int rem4; + + settings = nla_next(settings, &rem3); + if (nla_type(settings) != LN_SCALAR_ATTR_LIST) { + GENL_SET_ERR_MSG(info, + "CPT should be list"); + rc = -EINVAL; + goto out; + } + + nla_for_each_nested(cpt, settings, rem4) { + s64 core; + + if (nla_type(cpt) != + LN_SCALAR_ATTR_INT_VALUE) { + GENL_SET_ERR_MSG(info, + "invalid CPT config"); + rc = -EINVAL; + goto out; + } + + core = nla_get_s64(cpt); + if (core >= LNET_CPT_NUMBER) { + GENL_SET_ERR_MSG(info, + "invalid CPT value"); + rc = -ERANGE; + goto out; + } + + conf->lic_cpts[conf->lic_ncpts] = core; + conf->lic_ncpts++; + } + } + } +out: + return rc; +} + +static int lnet_net_cmd(struct 
sk_buff *skb, struct genl_info *info) +{ + struct nlmsghdr *nlh = nlmsg_hdr(skb); + struct genlmsghdr *gnlh = nlmsg_data(nlh); + struct nlattr *params = genlmsg_data(gnlh); + int msg_len, rem, rc = 0; + struct nlattr *attr; + + msg_len = genlmsg_len(gnlh); + if (!msg_len) { + GENL_SET_ERR_MSG(info, "no configuration"); + return -ENOMSG; + } + + nla_for_each_attr(attr, params, msg_len, rem) { + struct lnet_ioctl_config_ni conf; + u32 net_id = LNET_NET_ANY; + struct nlattr *entry; + bool ni_list = false; + int rem2; + + if (nla_type(attr) != LN_SCALAR_ATTR_LIST) + continue; + + nla_for_each_nested(entry, attr, rem2) { + switch (nla_type(entry)) { + case LN_SCALAR_ATTR_VALUE: { + ssize_t len; + + memset(&conf, 0, sizeof(conf)); + if (nla_strcmp(entry, "ip2net") == 0) { + entry = nla_next(entry, &rem2); + if (nla_type(entry) != + LN_SCALAR_ATTR_VALUE) { + GENL_SET_ERR_MSG(info, + "ip2net has invalid key"); + rc = -EINVAL; + goto out; + } + + len = nla_strlcpy(conf.lic_legacy_ip2nets, + entry, + sizeof(conf.lic_legacy_ip2nets)); + if (len < 0) { + GENL_SET_ERR_MSG(info, + "ip2net key string is invalid"); + rc = len; + goto out; + } + ni_list = true; + } else if (nla_strcmp(entry, "net type") == 0) { + char tmp[LNET_NIDSTR_SIZE]; + + entry = nla_next(entry, &rem2); + if (nla_type(entry) != + LN_SCALAR_ATTR_VALUE) { + GENL_SET_ERR_MSG(info, + "net type has invalid key"); + rc = -EINVAL; + goto out; + } + + len = nla_strlcpy(tmp, entry, + sizeof(tmp)); + if (len < 0) { + GENL_SET_ERR_MSG(info, + "net type key string is invalid"); + rc = len; + goto out; + } + + net_id = libcfs_str2net(tmp); + if (!net_id) { + GENL_SET_ERR_MSG(info, + "cannot parse net"); + rc = -ENODEV; + goto out; + } + if (LNET_NETTYP(net_id) == LOLND) { + GENL_SET_ERR_MSG(info, + "setting @lo not allowed"); + rc = -ENODEV; + goto out; + } + conf.lic_legacy_ip2nets[0] = '\0'; + conf.lic_ni_intf[0] = '\0'; + ni_list = false; + } + if (rc < 0) + goto out; + break; + } + case LN_SCALAR_ATTR_LIST: { + bool create = info->nlhdr->nlmsg_flags & + NLM_F_CREATE; + struct lnet_ioctl_config_lnd_tunables tun; + + memset(&tun, 0, sizeof(tun)); + tun.lt_cmn.lct_peer_timeout = -1; + conf.lic_ncpts = 0; + + rc = lnet_genl_parse_local_ni(entry, info, + net_id, &conf, + &tun, &ni_list); + if (rc < 0) + goto out; + + if (!create) { + struct lnet_net *net; + struct lnet_ni *ni; + + rc = -ENODEV; + if (!strlen(conf.lic_ni_intf)) { + GENL_SET_ERR_MSG(info, + "interface is missing"); + goto out; + } + + lnet_net_lock(LNET_LOCK_EX); + net = lnet_get_net_locked(net_id); + if (!net) { + GENL_SET_ERR_MSG(info, + "LNet net doesn't exist"); + goto out; + } + list_for_each_entry(ni, &net->net_ni_list, + ni_netlist) { + if (!ni->ni_interface || + strncmp(ni->ni_interface, + conf.lic_ni_intf, + strlen(conf.lic_ni_intf)) != 0) { + ni = NULL; + continue; + } + + lnet_net_unlock(LNET_LOCK_EX); + rc = lnet_dyn_del_ni(&ni->ni_nid); + lnet_net_lock(LNET_LOCK_EX); + if (rc < 0) { + GENL_SET_ERR_MSG(info, + "cannot del LNet NI"); + goto out; + } + break; + } + + lnet_net_unlock(LNET_LOCK_EX); + } else { + rc = lnet_dyn_add_ni(&conf, net_id, &tun); + switch (rc) { + case -ENOENT: + GENL_SET_ERR_MSG(info, + "cannot parse net"); + break; + case -ERANGE: + GENL_SET_ERR_MSG(info, + "invalid CPT set"); + fallthrough; + default: + GENL_SET_ERR_MSG(info, + "cannot add LNet NI"); + case 0: + break; + } + if (rc < 0) + goto out; + } + break; + } + /* it is possible a newer version of the user land send + * values older kernels doesn't handle. 
So silently + * ignore these values + */ + default: + break; + } + } + + /* Handle case of just sent NET with no list of NIDs */ + if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE) && !ni_list) { + rc = lnet_dyn_del_net(net_id); + if (rc < 0) { + GENL_SET_ERR_MSG(info, + "cannot del network"); + } + } + } +out: + return rc; +} + +static const struct genl_multicast_group lnet_mcast_grps[] = { + { .name = "ip2net", }, + { .name = "net", }, +}; + +static const struct genl_ops lnet_genl_ops[] = { + { + .cmd = LNET_CMD_NETS, + .start = lnet_net_show_start, + .dumpit = lnet_net_show_dump, + .done = lnet_net_show_done, + .doit = lnet_net_cmd, + }, +}; + +static struct genl_family lnet_family = { + .name = LNET_GENL_NAME, + .version = LNET_GENL_VERSION, + .module = THIS_MODULE, + .netnsok = true, + .ops = lnet_genl_ops, + .n_ops = ARRAY_SIZE(lnet_genl_ops), + .mcgrps = lnet_mcast_grps, + .n_mcgrps = ARRAY_SIZE(lnet_mcast_grps), +}; + void LNetDebugPeer(struct lnet_processid *id) { lnet_debug_peer(lnet_nid_to_nid4(&id->nid)); diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index cebc725..4b2d776 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -367,8 +367,7 @@ struct lnet_net * return net; } -static int -lnet_ni_add_interface(struct lnet_ni *ni, char *iface) +int lnet_ni_add_interface(struct lnet_ni *ni, char *iface) { if (!ni) return -ENOMEM; @@ -395,6 +394,7 @@ struct lnet_net * return 0; } +EXPORT_SYMBOL(lnet_ni_add_interface); static struct lnet_ni * lnet_ni_alloc_common(struct lnet_net *net, char *iface) diff --git a/net/lnet/lnet/module.c b/net/lnet/lnet/module.c index 9d7b39a..6e41e4b 100644 --- a/net/lnet/lnet/module.c +++ b/net/lnet/lnet/module.c @@ -41,8 +41,7 @@ static DEFINE_MUTEX(lnet_config_mutex); -static int -lnet_configure(void *arg) +int lnet_configure(void *arg) { /* 'arg' only there so I can be passed to cfs_create_thread() */ int rc = 0; @@ -68,8 +67,7 @@ return rc; } -static int -lnet_unconfigure(void) +int lnet_unconfigure(void) { int refcount; @@ -134,16 +132,26 @@ { struct lnet_ioctl_config_ni *conf = (struct lnet_ioctl_config_ni *)hdr; - int rc; + int rc = -EINVAL; if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf)) - return -EINVAL; + return rc; mutex_lock(&lnet_config_mutex); - if (the_lnet.ln_niinit_self) - rc = lnet_dyn_add_ni(conf); - else - rc = -EINVAL; + if (the_lnet.ln_niinit_self) { + struct lnet_ioctl_config_lnd_tunables *tun = NULL; + struct lnet_nid nid; + u32 net_id; + + /* get the tunables if they are available */ + if (conf->lic_cfg_hdr.ioc_len >= + sizeof(*conf) + sizeof(*tun)) + tun = (struct lnet_ioctl_config_lnd_tunables *) conf->lic_bulk; + + lnet_nid4_to_nid(conf->lic_nid, &nid); + net_id = LNET_NID_NET(&nid); + rc = lnet_dyn_add_ni(conf, net_id, tun); + } mutex_unlock(&lnet_config_mutex); return rc; @@ -154,16 +162,16 @@ { struct lnet_ioctl_config_ni *conf = (struct lnet_ioctl_config_ni *)hdr; - int rc; + struct lnet_nid nid; + int rc = EINVAL; - if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf)) - return -EINVAL; + if (conf->lic_cfg_hdr.ioc_len < sizeof(*conf) || + !the_lnet.ln_niinit_self) + return rc; + lnet_nid4_to_nid(conf->lic_nid, &nid); mutex_lock(&lnet_config_mutex); - if (the_lnet.ln_niinit_self) - rc = lnet_dyn_del_ni(conf); - else - rc = -EINVAL; + rc = lnet_dyn_del_ni(&nid); mutex_unlock(&lnet_config_mutex); return rc; From patchwork Sun Nov 20 14:16:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050060 
Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2B02AC43217 for ; Sun, 20 Nov 2022 14:28:27 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXhm0C8vz215y; Sun, 20 Nov 2022 06:19:36 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXg63Nndz1yCf for ; Sun, 20 Nov 2022 06:18:10 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C44D41007B78; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id BE863E8B89; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:53 -0500 Message-Id: <1668953828-10909-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/22] lustre: obdclass: improve precision of wakeups for mod_rpcs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown There is a limit of the number of in-flight mod rpcs with a complication that a 'close' rpc is always permitted if there are no other close rpcs in flight, even if that would exceed the limit. When a non-close-request complete, we just wake the first waiting request and assume it will use the slot we released. When a close-request completes, the first waiting request may not find a slot if the close was using the 'extra' slot. So in that case we wake all waiting requests and let them fit it out. This is wasteful and unfair. To correct this we revise the wait/wake approach to use a dedicated wakeup function which atomically checks if a given task can proceed, and updates the counters when permission to proceed is given. This means that once a task has been woken, it has already been accounted and it can proceed. To minimise locking, cl_mod_rpcs_lock is discarded and cl_mod_rpcs_waitq.lock is used to protect the counters. For the fast-path where the max has not been reached, this means we take and release that spinlock just once. We call wake_up_locked while still holding the lock, and if that woke the process, then we don't drop the spinlock to wait, but proceed directly to the remainder of the task. When the last 'close' rpc completes, the wake function will iterate the whole wait queue until it finds a task waiting to submit a close request. When any other rpc completes, the queue will only be searched until the maximum is reached. 
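As a reader aid, here is a minimal, self-contained sketch of the wait-queue idiom described above: the wake callback itself accounts the freed slot, so a task that wakes up with WQ_FLAG_WOKEN set has already been admitted and never has to recheck. All names here (slot_pool, slot_waiter, slot_get, slot_put) are invented for illustration and the accounting is reduced to a single limit; the actual change below applies this pattern to struct client_obd and cl_mod_rpcs_waitq, including the extra close-RPC slot. Only standard kernel primitives (init_wait(), wait_woken(), woken_wake_function(), wake_up_locked()) are assumed.

/* Sketch only: admit a waiter from inside the wake callback.
 * pool->waitq must have been set up with init_waitqueue_head().
 */
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct slot_pool {
	wait_queue_head_t waitq;	/* waitq.lock also guards the counters */
	unsigned int in_flight;
	unsigned int max_in_flight;
};

struct slot_waiter {
	struct slot_pool *pool;
	wait_queue_entry_t wqe;
};

/* Runs under pool->waitq.lock as part of the wake_up_locked() scan. */
static int claim_slot_function(wait_queue_entry_t *wq_entry, unsigned int mode,
			       int flags, void *key)
{
	struct slot_waiter *w = container_of(wq_entry, struct slot_waiter, wqe);
	struct slot_pool *pool = w->pool;

	if (wq_entry->flags & WQ_FLAG_WOKEN)
		return 0;		/* already admitted, don't count twice */
	if (pool->in_flight >= pool->max_in_flight)
		return -1;		/* no slot: stop scanning the queue */

	pool->in_flight++;		/* account before the task runs */
	return woken_wake_function(wq_entry, mode, flags, key);
}

static void slot_get(struct slot_pool *pool)
{
	struct slot_waiter wait = { .pool = pool };

	init_wait(&wait.wqe);
	wait.wqe.func = claim_slot_function;

	spin_lock_irq(&pool->waitq.lock);
	__add_wait_queue(&pool->waitq, &wait.wqe);
	/* Try to admit ourselves immediately; sleep only if that fails. */
	wake_up_locked(&pool->waitq);
	while (!(wait.wqe.flags & WQ_FLAG_WOKEN)) {
		spin_unlock_irq(&pool->waitq.lock);
		wait_woken(&wait.wqe, TASK_UNINTERRUPTIBLE,
			   MAX_SCHEDULE_TIMEOUT);
		spin_lock_irq(&pool->waitq.lock);
	}
	__remove_wait_queue(&pool->waitq, &wait.wqe);
	spin_unlock_irq(&pool->waitq.lock);
}

static void slot_put(struct slot_pool *pool)
{
	spin_lock_irq(&pool->waitq.lock);
	pool->in_flight--;
	/* Hand the freed slot to at most one waiter, under the same lock. */
	wake_up_locked(&pool->waitq);
	spin_unlock_irq(&pool->waitq.lock);
}

The design point is that the wait queue's own spinlock doubles as the counter lock, so the uncontended fast path takes and releases a single lock, and a wakeup can never race with another waiter stealing the slot.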
WC-bug-id: https://jira.whamcloud.com/browse/LU-15947 Lustre-commit: 5243630b09d22e0b5 ("LU-15947 obdclass: improve precision of wakeups for mod_rpcs") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44041 Reviewed-by: James Simmons Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 1 - fs/lustre/ldlm/ldlm_lib.c | 1 - fs/lustre/obdclass/genops.c | 158 ++++++++++++++++++++++++-------------------- 3 files changed, 88 insertions(+), 72 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 16f66ea..56e5641 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -326,7 +326,6 @@ struct client_obd { /* modify rpcs in flight * currently used for metadata only */ - spinlock_t cl_mod_rpcs_lock; u16 cl_max_mod_rpcs_in_flight; u16 cl_mod_rpcs_in_flight; u16 cl_close_rpcs_in_flight; diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 08aff4f..e4262c3 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -444,7 +444,6 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg) else cli->cl_max_rpcs_in_flight = OBD_MAX_RIF_DEFAULT; - spin_lock_init(&cli->cl_mod_rpcs_lock); spin_lock_init(&cli->cl_mod_rpcs_hist.oh_lock); cli->cl_max_mod_rpcs_in_flight = 0; cli->cl_mod_rpcs_in_flight = 0; diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c index 2031320..6e4d240 100644 --- a/fs/lustre/obdclass/genops.c +++ b/fs/lustre/obdclass/genops.c @@ -1426,16 +1426,16 @@ int obd_set_max_mod_rpcs_in_flight(struct client_obd *cli, u16 max) return -ERANGE; } - spin_lock(&cli->cl_mod_rpcs_lock); + spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock); prev = cli->cl_max_mod_rpcs_in_flight; cli->cl_max_mod_rpcs_in_flight = max; /* wakeup waiters if limit has been increased */ if (cli->cl_max_mod_rpcs_in_flight > prev) - wake_up(&cli->cl_mod_rpcs_waitq); + wake_up_locked(&cli->cl_mod_rpcs_waitq); - spin_unlock(&cli->cl_mod_rpcs_lock); + spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock); return 0; } @@ -1446,7 +1446,7 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) unsigned long mod_tot = 0, mod_cum; int i; - spin_lock(&cli->cl_mod_rpcs_lock); + spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock); lprocfs_stats_header(seq, ktime_get(), cli->cl_mod_rpcs_init, 25, ":", true); seq_printf(seq, "modify_RPCs_in_flight: %hu\n", @@ -1469,13 +1469,13 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) break; } - spin_unlock(&cli->cl_mod_rpcs_lock); + spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock); return 0; } EXPORT_SYMBOL(obd_mod_rpc_stats_seq_show); -/* - * The number of modify RPCs sent in parallel is limited + +/* The number of modify RPCs sent in parallel is limited * because the server has a finite number of slots per client to * store request result and ensure reply reconstruction when needed. * On the client, this limit is stored in cl_max_mod_rpcs_in_flight @@ -1484,34 +1484,55 @@ int obd_mod_rpc_stats_seq_show(struct client_obd *cli, struct seq_file *seq) * On the MDC client, to avoid a potential deadlock (see Bugzilla 3462), * one close request is allowed above the maximum. 
*/ -static inline bool obd_mod_rpc_slot_avail_locked(struct client_obd *cli, - bool close_req) +struct mod_waiter { + struct client_obd *cli; + bool close_req; + wait_queue_entry_t wqe; +}; +static int claim_mod_rpc_function(wait_queue_entry_t *wq_entry, + unsigned int mode, int flags, void *key) { + struct mod_waiter *w = container_of(wq_entry, struct mod_waiter, wqe); + struct client_obd *cli = w->cli; + bool close_req = w->close_req; bool avail; + int ret; + + /* As woken_wake_function() doesn't remove us from the wait_queue, + * we could get called twice for the same thread - take care. + */ + if (wq_entry->flags & WQ_FLAG_WOKEN) + /* Already woke this thread, don't try again */ + return 0; /* A slot is available if * - number of modify RPCs in flight is less than the max * - it's a close RPC and no other close request is in flight */ avail = cli->cl_mod_rpcs_in_flight < cli->cl_max_mod_rpcs_in_flight || - (close_req && !cli->cl_close_rpcs_in_flight); - - return avail; -} - -static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli, - bool close_req) -{ - bool avail; - - spin_lock(&cli->cl_mod_rpcs_lock); - avail = obd_mod_rpc_slot_avail_locked(cli, close_req); - spin_unlock(&cli->cl_mod_rpcs_lock); - return avail; + (close_req && cli->cl_close_rpcs_in_flight == 0); + if (avail) { + cli->cl_mod_rpcs_in_flight++; + if (w->close_req) + cli->cl_close_rpcs_in_flight++; + ret = woken_wake_function(wq_entry, mode, flags, key); + } else if (cli->cl_close_rpcs_in_flight) + /* No other waiter could be woken */ + ret = -1; + else if (!key) + /* This was not a wakeup from a close completion, so there is no + * point seeing if there are close waiters to be woken + */ + ret = -1; + else + /* There might be a close so we could wake, keep looking */ + ret = 0; + return ret; } /* Get a modify RPC slot from the obd client @cli according - * to the kind of operation @opc that is going to be sent. + * to the kind of operation @opc that is going to be sent + * and the intent @it of the operation if it applies. * If the maximum number of modify RPCs in flight is reached * the thread is put to sleep. * Returns the tag to be set in the request message. 
Tag 0 @@ -1519,51 +1540,51 @@ static inline bool obd_mod_rpc_slot_avail(struct client_obd *cli, */ u16 obd_get_mod_rpc_slot(struct client_obd *cli, u32 opc) { - bool close_req = false; + struct mod_waiter wait = { + .cli = cli, + .close_req = (opc == MDS_CLOSE), + }; u16 i, max; - if (opc == MDS_CLOSE) - close_req = true; - - do { - spin_lock(&cli->cl_mod_rpcs_lock); - max = cli->cl_max_mod_rpcs_in_flight; - if (obd_mod_rpc_slot_avail_locked(cli, close_req)) { - /* there is a slot available */ - cli->cl_mod_rpcs_in_flight++; - if (close_req) - cli->cl_close_rpcs_in_flight++; - lprocfs_oh_tally(&cli->cl_mod_rpcs_hist, - cli->cl_mod_rpcs_in_flight); - /* find a free tag */ - i = find_first_zero_bit(cli->cl_mod_tag_bitmap, - max + 1); - LASSERT(i < OBD_MAX_RIF_MAX); - LASSERT(!test_and_set_bit(i, cli->cl_mod_tag_bitmap)); - spin_unlock(&cli->cl_mod_rpcs_lock); - /* tag 0 is reserved for non-modify RPCs */ - - CDEBUG(D_RPCTRACE, - "%s: modify RPC slot %u is allocated opc %u, max %hu\n", - cli->cl_import->imp_obd->obd_name, - i + 1, opc, max); - - return i + 1; - } - spin_unlock(&cli->cl_mod_rpcs_lock); - - CDEBUG(D_RPCTRACE, "%s: sleeping for a modify RPC slot opc %u, max %hu\n", - cli->cl_import->imp_obd->obd_name, opc, max); + init_wait(&wait.wqe); + wait.wqe.func = claim_mod_rpc_function; - wait_event_idle_exclusive(cli->cl_mod_rpcs_waitq, - obd_mod_rpc_slot_avail(cli, - close_req)); - } while (true); + spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock); + __add_wait_queue(&cli->cl_mod_rpcs_waitq, &wait.wqe); + /* This wakeup will only succeed if the maximums haven't + * been reached. If that happens, WQ_FLAG_WOKEN will be cleared + * and there will be no need to wait. + */ + wake_up_locked(&cli->cl_mod_rpcs_waitq); + if (!(wait.wqe.flags & WQ_FLAG_WOKEN)) { + spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock); + wait_woken(&wait.wqe, TASK_UNINTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock); + } + __remove_wait_queue(&cli->cl_mod_rpcs_waitq, &wait.wqe); + + max = cli->cl_max_mod_rpcs_in_flight; + lprocfs_oh_tally(&cli->cl_mod_rpcs_hist, + cli->cl_mod_rpcs_in_flight); + /* find a free tag */ + i = find_first_zero_bit(cli->cl_mod_tag_bitmap, + max + 1); + LASSERT(i < OBD_MAX_RIF_MAX); + LASSERT(!test_and_set_bit(i, cli->cl_mod_tag_bitmap)); + spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock); + /* tag 0 is reserved for non-modify RPCs */ + + CDEBUG(D_RPCTRACE, + "%s: modify RPC slot %u is allocated opc %u, max %hu\n", + cli->cl_import->imp_obd->obd_name, + i + 1, opc, max); + + return i + 1; } EXPORT_SYMBOL(obd_get_mod_rpc_slot); -/* - * Put a modify RPC slot from the obd client @cli according +/* Put a modify RPC slot from the obd client @cli according * to the kind of operation @opc that has been sent. 
*/ void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag) @@ -1576,18 +1597,15 @@ void obd_put_mod_rpc_slot(struct client_obd *cli, u32 opc, u16 tag) if (opc == MDS_CLOSE) close_req = true; - spin_lock(&cli->cl_mod_rpcs_lock); + spin_lock_irq(&cli->cl_mod_rpcs_waitq.lock); cli->cl_mod_rpcs_in_flight--; if (close_req) cli->cl_close_rpcs_in_flight--; /* release the tag in the bitmap */ LASSERT(tag - 1 < OBD_MAX_RIF_MAX); LASSERT(test_and_clear_bit(tag - 1, cli->cl_mod_tag_bitmap) != 0); - spin_unlock(&cli->cl_mod_rpcs_lock); - /* LU-14741 - to prevent close RPCs stuck behind normal ones */ - if (close_req) - wake_up_all(&cli->cl_mod_rpcs_waitq); - else - wake_up(&cli->cl_mod_rpcs_waitq); + __wake_up_locked_key(&cli->cl_mod_rpcs_waitq, TASK_NORMAL, + (void *)close_req); + spin_unlock_irq(&cli->cl_mod_rpcs_waitq.lock); } EXPORT_SYMBOL(obd_put_mod_rpc_slot); From patchwork Sun Nov 20 14:16:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050059 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DD00CC433FE for ; Sun, 20 Nov 2022 14:26:55 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXhK33FPz1yGl; Sun, 20 Nov 2022 06:19:13 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXgd4TbJz1yD3 for ; Sun, 20 Nov 2022 06:18:37 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C56911008049; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C1A0BE8BBF; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:54 -0500 Message-Id: <1668953828-10909-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/22] lnet: allow ping packet to contain large nids X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The ping packet has an array of fixed-size status entries that only have room for a 4-byte-address nid. This patches adds a feature flag which activates a list of variable sized entries after the initial array. Each entry contains a 4-byte status and then a nid, rounded to a multiple of 4 bytes. The total number of bytes of the ping_info (header, first array, subsequent list) is stored in the ns_unused field of the first entry in the array. 
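To make that layout concrete before the diff, here is a rough userspace sketch of walking such a ping buffer. The structures and names below (ping_info, ni_status, ni_large_status, big_nid, PING_FEAT_LARGE_ADDR) are simplified stand-ins for the definitions this patch adds, not the real UAPI headers, and the size helper only mirrors the round-up-to-4-bytes rule rather than the exact kernel macros.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PING_FEAT_LARGE_ADDR	(1 << 5)	/* stand-in feature bit */

struct ni_status {			/* fixed-size nid4 entry */
	uint64_t ns_nid;
	uint32_t ns_status;
	uint32_t ns_msg_size;		/* total buffer size, first entry only */
};

struct big_nid {			/* stand-in for a large nid */
	uint8_t  nid_size;		/* address bytes beyond the first 4 */
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

struct ni_large_status {		/* variable-size entry */
	uint32_t ns_status;
	struct big_nid ns_nid;
};

struct ping_info {
	uint32_t pi_magic;
	uint32_t pi_features;
	uint32_t pi_pid;
	uint32_t pi_nnis;		/* counts the nid4 entries only */
	struct ni_status pi_ni[];
};

/* status word + nid header + address bytes, rounded up to 4 bytes */
static size_t large_entry_size(const struct ni_large_status *lns)
{
	size_t sz = sizeof(lns->ns_status) + 4 + 4 + lns->ns_nid.nid_size;

	return (sz + 3) & ~(size_t)3;
}

static void walk_ping_info(const struct ping_info *pi)
{
	const struct ni_large_status *lns;
	const char *end;
	uint32_t i;

	for (i = 0; i < pi->pi_nnis; i++)
		printf("nid4 entry %u: status %#x\n", i,
		       (unsigned int)pi->pi_ni[i].ns_status);

	if (!(pi->pi_features & PING_FEAT_LARGE_ADDR))
		return;

	/* Variable-size entries follow the nid4 array; the first array slot
	 * carries the total size so we know where the list ends.
	 */
	end = (const char *)pi + pi->pi_ni[0].ns_msg_size;
	lns = (const struct ni_large_status *)&pi->pi_ni[pi->pi_nnis];
	while ((const char *)(lns + 1) <= end) {
		printf("large entry: status %#x, address bytes %u\n",
		       (unsigned int)lns->ns_status,
		       4u + lns->ns_nid.nid_size);
		lns = (const struct ni_large_status *)
			((const char *)lns + large_entry_size(lns));
	}
}

int main(void)
{
	size_t nbytes = sizeof(struct ping_info) + sizeof(struct ni_status);
	struct ping_info *pi = calloc(1, nbytes);

	if (!pi)
		return 1;
	pi->pi_features = 0;			/* no large nids in this demo */
	pi->pi_nnis = 1;			/* just a loopback nid4 entry */
	pi->pi_ni[0].ns_status = 0x15aac0de;	/* "up" */
	walk_ping_info(pi);
	free(pi);
	return 0;
}

This is only meant to make the wire layout tangible; the in-kernel helpers added below (lnet_ping_sts_size(), lnet_ping_sts_next()) are the authoritative implementation.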
The user-space interfaces only see the initial array. WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: db0fb8f2b649c0c38 ("LU-10391 lnet: allow ping packet to contain large nids") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44628 Tested-by: James Simmons Reviewed-by: James Simmons Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-types.h | 39 +++++++++++ include/uapi/linux/lnet/lnet-idl.h | 58 +++++++++++----- net/lnet/lnet/api-ni.c | 131 +++++++++++++++++++++++-------------- net/lnet/lnet/lib-msg.c | 2 +- 4 files changed, 165 insertions(+), 65 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 2d3b044..73d962f 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -684,6 +684,45 @@ struct lnet_ping_buffer { #define LNET_PING_INFO_TO_BUFFER(PINFO) \ container_of((PINFO), struct lnet_ping_buffer, pb_info) +static inline int +lnet_ping_sts_size(const struct lnet_nid *nid) +{ + int size; + + if (nid_is_nid4(nid)) + return sizeof(struct lnet_ni_status); + + size = offsetof(struct lnet_ni_large_status, ns_nid) + + NID_BYTES(nid); + + return round_up(size, 4); +} + +static inline struct lnet_ni_large_status * +lnet_ping_sts_next(const struct lnet_ni_large_status *nis) +{ + return (void *)nis + lnet_ping_sts_size(&nis->ns_nid); +} + +static inline bool +lnet_ping_at_least_two_entries(const struct lnet_ping_info *pi) +{ + /* Return true if we have at lease two entries. There is always a + * least one, a 4-byte lo0 interface. + */ + struct lnet_ni_large_status *lns; + + if ((pi->pi_features & LNET_PING_FEAT_LARGE_ADDR) == 0) + return pi->pi_nnis <= 2; + /* There is at least 1 large-address entry */ + if (pi->pi_nnis != 1) + return false; + lns = (void *)&pi->pi_ni[1]; + lns = lnet_ping_sts_next(lns); + + return ((void *)pi + lnet_ping_info_size(pi) <= (void *)lns); +} + struct lnet_nid_list { struct list_head nl_list; struct lnet_nid nl_nid; diff --git a/include/uapi/linux/lnet/lnet-idl.h b/include/uapi/linux/lnet/lnet-idl.h index 41bbb40..479e7fa 100644 --- a/include/uapi/linux/lnet/lnet-idl.h +++ b/include/uapi/linux/lnet/lnet-idl.h @@ -247,7 +247,6 @@ struct lnet_counters_common { __u64 lcc_drop_length; } __attribute__((packed)); - #define LNET_NI_STATUS_UP 0x15aac0de #define LNET_NI_STATUS_DOWN 0xdeadface #define LNET_NI_STATUS_INVALID 0x00000000 @@ -255,19 +254,32 @@ struct lnet_counters_common { struct lnet_ni_status { lnet_nid_t ns_nid; __u32 ns_status; - __u32 ns_unused; + __u32 ns_msg_size; /* represents ping buffer size if message + * contains large NID addresses. + */ } __attribute__((packed)); -/* - * NB: value of these features equal to LNET_PROTO_PING_VERSION_x +/* When this appears in lnet_ping_info, it will be large + * enough to hold whatever nid is present, rounded up + * to a multiple of 4 bytes. + * NOTE: all users MUST check ns_nid.nid_size is usable. 
+ */ +struct lnet_ni_large_status { + __u32 ns_status; + struct lnet_nid ns_nid; +} __attribute__((packed)); + +/* NB: value of these features equal to LNET_PROTO_PING_VERSION_x * of old LNet, so there shouldn't be any compatibility issue */ #define LNET_PING_FEAT_INVAL (0) /* no feature */ #define LNET_PING_FEAT_BASE (1 << 0) /* just a ping */ #define LNET_PING_FEAT_NI_STATUS (1 << 1) /* return NI status */ -#define LNET_PING_FEAT_RTE_DISABLED (1 << 2) /* Routing enabled */ -#define LNET_PING_FEAT_MULTI_RAIL (1 << 3) /* Multi-Rail aware */ +#define LNET_PING_FEAT_RTE_DISABLED (1 << 2) /* Routing enabled */ +#define LNET_PING_FEAT_MULTI_RAIL (1 << 3) /* Multi-Rail aware */ #define LNET_PING_FEAT_DISCOVERY (1 << 4) /* Supports Discovery */ +#define LNET_PING_FEAT_LARGE_ADDR (1 << 5) /* Large addr nids present */ +#define LNET_PING_FEAT_PRIMARY_LARGE (1 << 6) /* Primary is first Large addr */ /* * All ping feature bits fit to hit the wire. @@ -277,17 +289,26 @@ struct lnet_ni_status { * New feature bits can be added, just be aware that this does change the * over-the-wire protocol. */ -#define LNET_PING_FEAT_BITS (LNET_PING_FEAT_BASE | \ - LNET_PING_FEAT_NI_STATUS | \ - LNET_PING_FEAT_RTE_DISABLED | \ - LNET_PING_FEAT_MULTI_RAIL | \ - LNET_PING_FEAT_DISCOVERY) - +#define LNET_PING_FEAT_BITS (LNET_PING_FEAT_BASE | \ + LNET_PING_FEAT_NI_STATUS | \ + LNET_PING_FEAT_RTE_DISABLED | \ + LNET_PING_FEAT_MULTI_RAIL | \ + LNET_PING_FEAT_DISCOVERY | \ + LNET_PING_FEAT_LARGE_ADDR | \ + LNET_PING_FEAT_PRIMARY_LARGE) + +/* NOTE: + * The first address in pi_ni *must* be the loop-back nid: LNET_NID_LO_0 + * The second address must be the primary nid for the host unless + * LNET_PING_FEAT_PRIMARY_LARGE is set, then the first large address + * is the preferred primary. However nodes that do not recognise that + * flag will quietly ignore it. + */ struct lnet_ping_info { __u32 pi_magic; __u32 pi_features; lnet_pid_t pi_pid; - __u32 pi_nnis; + __u32 pi_nnis; /* number of nid4 entries */ struct lnet_ni_status pi_ni[0]; } __attribute__((packed)); @@ -297,7 +318,14 @@ struct lnet_ping_info { offsetof(struct lnet_ping_info, pi_ni[LNET_INTERFACES_MIN]) #define LNET_PING_INFO_LONI(PINFO) ((PINFO)->pi_ni[0].ns_nid) #define LNET_PING_INFO_SEQNO(PINFO) ((PINFO)->pi_ni[0].ns_status) -#define lnet_ping_info_size(pinfo) \ - offsetof(struct lnet_ping_info, pi_ni[(pinfo)->pi_nnis]) +/* If LNET_PING_FEAT_LARGE_ADDR set, pi_nnis is the number of nid4 entries + * and pi_ni[0].ns_msg_size is the total number of bytes, including header and + * lnet_ni_large_status entries which follow the lnet_ni_status entries. + * This must be a multiple of 4. + */ +#define lnet_ping_info_size(pinfo) \ + (((pinfo)->pi_features & LNET_PING_FEAT_LARGE_ADDR) \ + ? 
((pinfo)->pi_ni[0].ns_msg_size & ~3) \ + : offsetof(struct lnet_ping_info, pi_ni[(pinfo)->pi_nnis])) #endif diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index af875ba..935c848 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -823,8 +823,15 @@ static void lnet_assert_wire_constants(void) BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_nid) != 8); BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_status) != 8); BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_status) != 4); - BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_unused) != 12); - BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_unused) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_msg_size) != 12); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_msg_size) != 4); + + /* Checks for struct lnet_ni_large_status */ + BUILD_BUG_ON((int)sizeof(struct lnet_ni_large_status) != 24); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_large_status, ns_status) != 0); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_large_status *)0)->ns_status) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_large_status, ns_nid) != 4); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_large_status *)0)->ns_nid) != 20); /* Checks for struct lnet_ping_info and related constants */ BUILD_BUG_ON(LNET_PROTO_PING_MAGIC != 0x70696E67); @@ -834,7 +841,9 @@ static void lnet_assert_wire_constants(void) BUILD_BUG_ON(LNET_PING_FEAT_RTE_DISABLED != 4); BUILD_BUG_ON(LNET_PING_FEAT_MULTI_RAIL != 8); BUILD_BUG_ON(LNET_PING_FEAT_DISCOVERY != 16); - BUILD_BUG_ON(LNET_PING_FEAT_BITS != 31); + BUILD_BUG_ON(LNET_PING_FEAT_LARGE_ADDR != 32); + BUILD_BUG_ON(LNET_PING_FEAT_PRIMARY_LARGE != 64); + BUILD_BUG_ON(LNET_PING_FEAT_BITS != 127); /* Checks for struct lnet_ping_info */ BUILD_BUG_ON((int)sizeof(struct lnet_ping_info) != 16); @@ -1770,21 +1779,7 @@ struct lnet_ping_buffer * int bytes = 0; list_for_each_entry(ni, &net->net_ni_list, ni_netlist) - if (nid_is_nid4(&ni->ni_nid)) - bytes += sizeof(struct lnet_ni_status); - - return bytes; -} - -static inline int -lnet_get_net_ni_bytes_pre(struct lnet_net *net) -{ - struct lnet_ni *ni; - int bytes = 0; - - list_for_each_entry(ni, &net->net_ni_added, ni_netlist) - if (nid_is_nid4(&ni->ni_nid)) - bytes += sizeof(struct lnet_ni_status); + bytes += lnet_ping_sts_size(&ni->ni_nid); return bytes; } @@ -1800,9 +1795,7 @@ struct lnet_ping_buffer * list_for_each_entry(net, &the_lnet.ln_nets, net_list) { list_for_each_entry(ni, &net->net_ni_list, ni_netlist) - if (nid_is_nid4(&ni->ni_nid)) - bytes += sizeof(struct lnet_ni_status); - + bytes += lnet_ping_sts_size(&ni->ni_nid); } lnet_net_unlock(0); @@ -1813,6 +1806,7 @@ struct lnet_ping_buffer * void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) { + struct lnet_ni_large_status *lstat, *lend; struct lnet_ni_status *stat, *end; int nnis; int i; @@ -1827,6 +1821,19 @@ struct lnet_ping_buffer * for (i = 0; i < nnis && stat + 1 <= end; i++, stat++) { __swab64s(&stat->ns_nid); __swab32s(&stat->ns_status); + if (i == 0) + /* Might be total size */ + __swab32s(&stat->ns_msg_size); + } + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_LARGE_ADDR)) + return; + + lstat = (struct lnet_ni_large_status *)stat; + lend = (void *)end; + while (lstat + 1 <= lend) { + __swab32s(&lstat->ns_status); + /* struct lnet_nid never needs to be swabed */ + lstat = lnet_ping_sts_next(lstat); } } @@ -1954,6 +1961,7 @@ struct lnet_ping_buffer * static void lnet_ping_target_install_locked(struct lnet_ping_buffer *pbuf) { + struct 
lnet_ni_large_status *lns, *lend; struct lnet_ni_status *ns, *end; struct lnet_ni *ni; struct lnet_net *net; @@ -1964,8 +1972,14 @@ struct lnet_ping_buffer * end = (void *)&pbuf->pb_info + pbuf->pb_nbytes; list_for_each_entry(net, &the_lnet.ln_nets, net_list) { list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { - if (!nid_is_nid4(&ni->ni_nid)) + if (!nid_is_nid4(&ni->ni_nid)) { + if (ns == &pbuf->pb_info.pi_ni[1]) { + /* This is primary, and it is long */ + pbuf->pb_info.pi_features |= + LNET_PING_FEAT_PRIMARY_LARGE; + } continue; + } LASSERT(ns + 1 <= end); ns->ns_nid = lnet_nid_to_nid4(&ni->ni_nid); @@ -1979,6 +1993,31 @@ struct lnet_ping_buffer * } } + lns = (void *)ns; + lend = (void *)end; + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + if (nid_is_nid4(&ni->ni_nid)) + continue; + LASSERT(lns + 1 <= lend); + + lns->ns_nid = ni->ni_nid; + + lnet_ni_lock(ni); + ns->ns_status = lnet_ni_get_status_locked(ni); + ni->ni_status = &lns->ns_status; + lnet_ni_unlock(ni); + + lns = lnet_ping_sts_next(lns); + } + } + if ((void *)lns > (void *)ns) { + /* Record total info size */ + pbuf->pb_info.pi_ni[0].ns_msg_size = + (void *)lns - (void *)&pbuf->pb_info; + pbuf->pb_info.pi_features |= LNET_PING_FEAT_LARGE_ADDR; + } + /* We (ab)use the ns_status of the loopback interface to * transmit the sequence number. The first interface listed * must be the loopback interface. @@ -3397,7 +3436,6 @@ static int lnet_add_net_common(struct lnet_net *net, struct lnet_ping_buffer *pbuf; struct lnet_remotenet *rnet; struct lnet_ni *ni; - int net_ni_bytes; u32 net_id; int rc; @@ -3415,39 +3453,32 @@ static int lnet_add_net_common(struct lnet_net *net, return -EUSERS; } - /* - * make sure you calculate the correct number of slots in the ping + if (tun) + memcpy(&net->net_tunables, + &tun->lt_cmn, sizeof(net->net_tunables)); + else + memset(&net->net_tunables, -1, sizeof(net->net_tunables)); + + net_id = net->net_id; + + rc = lnet_startup_lndnet(net, (tun ? &tun->lt_tun : NULL)); + if (rc < 0) + return rc; + + /* make sure you calculate the correct number of slots in the ping * buffer. Since the ping info is a flattened list of all the NIs, * we should allocate enough slots to accomodate the number of NIs * which will be added. - * - * since ni hasn't been configured yet, use - * lnet_get_net_ni_bytes_pre() which checks the net_ni_added list */ - net_ni_bytes = lnet_get_net_ni_bytes_pre(net); - rc = lnet_ping_target_setup(&pbuf, &ping_mdh, LNET_PING_INFO_HDR_SIZE + - net_ni_bytes + lnet_get_ni_bytes(), + lnet_get_ni_bytes(), false); if (rc < 0) { - lnet_net_free(net); + lnet_shutdown_lndnet(net); return rc; } - if (tun) - memcpy(&net->net_tunables, - &tun->lt_cmn, sizeof(net->net_tunables)); - else - memset(&net->net_tunables, -1, sizeof(net->net_tunables)); - - net_id = net->net_id; - - rc = lnet_startup_lndnet(net, (tun ? 
- &tun->lt_tun : NULL)); - if (rc < 0) - goto failed; - lnet_net_lock(LNET_LOCK_EX); net = lnet_get_net_locked(net_id); LASSERT(net); @@ -3678,7 +3709,7 @@ int lnet_dyn_del_ni(struct lnet_nid *nid) rc = lnet_ping_target_setup(&pbuf, &ping_mdh, (LNET_PING_INFO_HDR_SIZE + lnet_get_ni_bytes() - - sizeof(pbuf->pb_info.pi_ni[0])), + lnet_ping_sts_size(&ni->ni_nid)), false); if (rc != 0) goto unlock_api_mutex; @@ -5428,10 +5459,12 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid, goto fail_ping_buffer_decref; } - /* Test if smaller than lnet_pinginfo with no pi_ni status info */ - if (nob < LNET_PING_INFO_HDR_SIZE) { + /* Test if smaller than lnet_pinginfo with just one pi_ni status info. + * That one might contain size when large nids are used. + */ + if (nob < LNET_PING_INFO_SIZE(1)) { CERROR("%s: Short reply %d(%lu min)\n", - libcfs_idstr(&id), nob, LNET_PING_INFO_HDR_SIZE); + libcfs_idstr(&id), nob, LNET_PING_INFO_SIZE(1)); goto fail_ping_buffer_decref; } diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 9fb001e..898d867 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -831,7 +831,7 @@ * I only have a single (non-lolnd) interface. */ pi = &the_lnet.ln_ping_target->pb_info; - if (pi->pi_nnis <= 2) { + if (lnet_ping_at_least_two_entries(pi)) { handle_local_health = false; attempt_local_resend = false; } From patchwork Sun Nov 20 14:16:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050062 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D16B8C4332F for ; Sun, 20 Nov 2022 14:31:21 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXhz4JT5z21B1; Sun, 20 Nov 2022 06:19:47 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXgv6P5Dz1yFh for ; Sun, 20 Nov 2022 06:18:51 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id C796C1008252; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C5CE7E8B84; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:55 -0500 Message-Id: <1668953828-10909-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/22] lustre: llog: skip bad records in llog X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mikhail Pershin , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mikhail Pershin This patch is further development of idea to skip bad corrupted) llogs data. If llog has fixed-size records then it is possible to skip one record but not rest of llog block. Patch also fixes the skipping to the next chunk: - make sure to skip to the next block for partial chunk or it causes the same block re-read. - handle index == 0 as goal for the llog_next_block() as expected exclusion and just return requested block - set new index after block was skipped to the first one in block - don't create fake padding record in llog_osd_next_block() as the caller can handle it and would know about - restore test_8 functionality to check corruption handling Fixes: b79e7c205e40 ("lustre: llog: add synchronization for the last record") WC-bug-id: https://jira.whamcloud.com/browse/LU-16203 Lustre-commit: cf121b16685fe2a27 ("LU-16203 llog: skip bad records in llog") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48776 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/llog.c | 86 ++++++++++++++++++++++++++++------------------- 1 file changed, 52 insertions(+), 34 deletions(-) diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c index eb8f7e5..90bb8bd 100644 --- a/fs/lustre/obdclass/llog.c +++ b/fs/lustre/obdclass/llog.c @@ -233,27 +233,26 @@ int llog_init_handle(const struct lu_env *env, struct llog_handle *handle, } EXPORT_SYMBOL(llog_init_handle); +#define LLOG_ERROR_REC(lgh, rec, format, a...) 
\ + CERROR("%s: "DFID" rec type=%x idx=%u len=%u, " format "\n", \ + loghandle2name(lgh), PLOGID(&lgh->lgh_id), (rec)->lrh_type, \ + (rec)->lrh_index, (rec)->lrh_len, ##a) + int llog_verify_record(const struct llog_handle *llh, struct llog_rec_hdr *rec) { int chunk_size = llh->lgh_hdr->llh_hdr.lrh_len; - if (rec->lrh_len == 0 || rec->lrh_len > chunk_size) { - CERROR("%s: record is too large: %d > %d\n", - loghandle2name(llh), rec->lrh_len, chunk_size); - return -EINVAL; - } - if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr)) { - CERROR("%s: index is too high: %d\n", - loghandle2name(llh), rec->lrh_index); - return -EINVAL; - } - if ((rec->lrh_type & LLOG_OP_MASK) != LLOG_OP_MAGIC) { - CERROR("%s: magic %x is bad\n", - loghandle2name(llh), rec->lrh_type); - return -EINVAL; - } + if ((rec->lrh_type & LLOG_OP_MASK) != LLOG_OP_MAGIC) + LLOG_ERROR_REC(llh, rec, "magic is bad"); + else if (rec->lrh_len == 0 || rec->lrh_len > chunk_size) + LLOG_ERROR_REC(llh, rec, "bad record len, chunk size is %d", + chunk_size); + else if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr)) + LLOG_ERROR_REC(llh, rec, "index is too high"); + else + return 0; - return 0; + return -EINVAL; } static inline bool llog_is_index_skipable(int idx, struct llog_log_hdr *llh, @@ -278,7 +277,6 @@ static int llog_process_thread(void *arg) int saved_index = 0; int last_called_index = 0; bool repeated = false; - bool refresh_idx = false; if (!llh) return -EINVAL; @@ -346,6 +344,11 @@ static int llog_process_thread(void *arg) rc = 0; goto out; } + /* EOF while trying to skip to the next chunk */ + if (!index && rc == -EBADR) { + rc = 0; + goto out; + } if (rc) goto out; @@ -377,6 +380,15 @@ static int llog_process_thread(void *arg) CDEBUG(D_OTHER, "after swabbing, type=%#x idx=%d\n", rec->lrh_type, rec->lrh_index); + /* start with first rec if block was skipped */ + if (!index) { + CDEBUG(D_OTHER, + "%s: skipping to the index %u\n", + loghandle2name(loghandle), + rec->lrh_index); + index = rec->lrh_index; + } + if (index == (synced_idx + 1) && synced_idx == LLOG_HDR_TAIL(llh)->lrt_index) { rc = 0; @@ -399,11 +411,15 @@ static int llog_process_thread(void *arg) * it turns to * lh_last_idx != LLOG_HDR_TAIL(llh)->lrt_index * This exception is working for catalog only. + * The last check is for the partial chunk boundary, + * if it is reached then try to re-read for possible + * new records once. */ if ((index == lh_last_idx && synced_idx != index) || (index == (lh_last_idx + 1) && lh_last_idx != LLOG_HDR_TAIL(llh)->lrt_index) || - (rec->lrh_index == 0 && !repeated)) { + (((char *)rec - buf >= cur_offset - chunk_offset) && + !repeated)) { /* save offset inside buffer for the re-read */ buf_offset = (char *)rec - (char *)buf; cur_offset = chunk_offset; @@ -415,24 +431,27 @@ static int llog_process_thread(void *arg) CDEBUG(D_OTHER, "synced_idx: %d\n", synced_idx); goto repeat; } - repeated = false; rc = llog_verify_record(loghandle, rec); if (rc) { - CERROR("%s: invalid record in llog "DFID" record for index %d/%d: rc = %d\n", - loghandle2name(loghandle), - PLOGID(&loghandle->lgh_id), - rec->lrh_len, index, rc); + CDEBUG(D_OTHER, "invalid record at index %d\n", + index); /* - * the block seem to be corrupted, let's try - * with the next one. reset rc to go to the - * next chunk. + * for fixed-sized llogs we can skip one record + * by using llh_size from llog header. + * Otherwise skip the next llog chunk. 
*/ - refresh_idx = true; - index = 0; rc = 0; - goto repeat; + if (llh->llh_flags & LLOG_F_IS_FIXSIZE) { + rec->lrh_len = llh->llh_size; + goto next_rec; + } + /* make sure that is always next block */ + cur_offset = chunk_offset + chunk_size; + /* no goal to find, just next block to read */ + index = 0; + break; } if (rec->lrh_index < index) { @@ -446,10 +465,9 @@ static int llog_process_thread(void *arg) * gap which can be result of old bugs, just * keep going */ - CERROR("%s: "DFID" index %u, expected %u\n", - loghandle2name(loghandle), - PLOGID(&loghandle->lgh_id), - rec->lrh_index, index); + LLOG_ERROR_REC(loghandle, rec, + "gap in index, expected %u", + index); index = rec->lrh_index; } @@ -470,7 +488,7 @@ static int llog_process_thread(void *arg) if (rc) goto out; } - +next_rec: /* exit if the last index is reached */ if (index >= last_index) { rc = 0; From patchwork Sun Nov 20 14:16:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 84DA0C433FE for ; Sun, 20 Nov 2022 14:30:46 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXhw1fLFz219Q; Sun, 20 Nov 2022 06:19:44 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXhl5PPnz215t for ; Sun, 20 Nov 2022 06:19:35 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id CFF341008260; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C9463E8B8B; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:56 -0500 Message-Id: <1668953828-10909-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/22] lnet: fix build issue when IPv6 is disabled. X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" struct inet6_dev and struct inet6_ifaddr are not defined if IPv6 is not configured for the Linux kernel. 
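The fix is the usual conditional-compilation guard: any declaration or code path that touches IPv6-only types gets wrapped in IS_ENABLED(CONFIG_IPV6), which holds for both the built-in (=y) and modular (=m) configurations. A reduced stand-alone sketch of the idiom follows; it is not the patch code, and the function name is made up for illustration:

    #include <linux/netdevice.h>
    #include <linux/rcupdate.h>
    #if IS_ENABLED(CONFIG_IPV6)
    #include <net/addrconf.h>	/* __in6_dev_get(), struct inet6_dev */
    #endif

    /* True if @dev has at least one IPv6 address; compiles down to
     * "always false" when the kernel is built without IPv6 support.
     */
    static bool demo_dev_has_ipv6_addr(struct net_device *dev)
    {
    	bool found = false;
    #if IS_ENABLED(CONFIG_IPV6)
    	struct inet6_dev *in6_dev;

    	rcu_read_lock();
    	in6_dev = __in6_dev_get(dev);
    	if (in6_dev && !list_empty(&in6_dev->addr_list))
    		found = true;
    	rcu_read_unlock();
    #endif
    	return found;
    }

IS_ENABLED() is preferred over a bare #ifdef CONFIG_IPV6 because IPv6 may also be built as a module (CONFIG_IPV6=m), which a plain #ifdef would miss.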
Fixes: 351a6df78c3 ("lnet: support IPv6 in lnet_inet_enumerate()") WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: 896cd5b7bcf94d4fd ("LU-10391 lnet: fix build issue when IPv6 is disabled.") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48990 Reviewed-by: Chris Horn Reviewed-by: Neil Brown Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin --- net/lnet/lnet/config.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index 4b2d776..5bfae4e 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1501,8 +1501,10 @@ int lnet_inet_enumerate(struct lnet_inetdev **dev_list, struct net *ns, bool v6) int flags = dev_get_flags(dev); const struct in_ifaddr *ifa; struct in_device *in_dev; +#if IS_ENABLED(CONFIG_IPV6) struct inet6_dev *in6_dev; const struct inet6_ifaddr *ifa6; +#endif int node_id; int cpt; From patchwork Sun Nov 20 14:16:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 818FEC4332F for ; Sun, 20 Nov 2022 14:32:20 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXjG2yfpz21By; Sun, 20 Nov 2022 06:20:02 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXhr16gKz1y7d for ; Sun, 20 Nov 2022 06:19:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D29D610084C5; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id CE4FCE8B88; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:57 -0500 Message-Id: <1668953828-10909-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/22] lustre: obdclass: fill jobid in a safe way X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lei Feng , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lei Feng Ensure jobid_interpret_string() fills jobid in an atomic way. Make sure we use the proper length. The Linux native client got this mostly right. 
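The pattern the fix follows is: render the new ID into a correctly sized local buffer first, then publish it to the shared buffer with one final copy, and only when it actually changed. A minimal sketch of that pattern (buffer size, function and variable names here are illustrative, not the Lustre ones):

    #include <linux/minmax.h>
    #include <linux/string.h>

    #define DEMO_JOBID_SIZE 32	/* stand-in for the real jobid buffer size */

    static void demo_set_jobid(char *jobid, size_t joblen, const char *new_id)
    {
    	char id[DEMO_JOBID_SIZE] = "";
    	/* never copy more than either the caller's or the local buffer */
    	size_t len = min_t(size_t, joblen, sizeof(id));

    	strscpy(id, new_id, len);	/* bounded, always NUL-terminated */

    	/* single final copy: readers see the old ID or the new one,
    	 * never a half-written mix, and an unchanged ID is not rewritten
    	 */
    	if (strcmp(jobid, id) != 0)
    		strcpy(jobid, id);
    }

Because the intermediate buffer is clamped to min(joblen, sizeof(id)), the final strcpy() cannot overrun the caller's buffer.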
WC-bug-id: https://jira.whamcloud.com/browse/LU-16251 Lustre-commit: 9a0a89520e8b57bd6 ("LU-16251 obdclass: fill jobid in a safe way") Signed-off-by: Lei Feng Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48915 Reviewed-by: Andreas Dilger Reviewed-by: Jian Yu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/jobid.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/lustre/obdclass/jobid.c b/fs/lustre/obdclass/jobid.c index da1af51..77ea5b2 100644 --- a/fs/lustre/obdclass/jobid.c +++ b/fs/lustre/obdclass/jobid.c @@ -308,7 +308,8 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid, */ int lustre_get_jobid(char *jobid, size_t joblen) { - char tmp_jobid[LUSTRE_JOBID_SIZE] = ""; + char id[LUSTRE_JOBID_SIZE] = ""; + int len = min_t(int, joblen, LUSTRE_JOBID_SIZE); if (unlikely(joblen < 2)) { if (joblen == 1) @@ -324,14 +325,14 @@ int lustre_get_jobid(char *jobid, size_t joblen) if (strcmp(obd_jobid_var, JOBSTATS_NODELOCAL) == 0 || strnstr(obd_jobid_name, "%j", LUSTRE_JOBID_SIZE)) { int rc2 = jobid_interpret_string(obd_jobid_name, - tmp_jobid, joblen); + id, len); if (!rc2) goto out_cache_jobid; } /* Use process name + fsuid as jobid */ if (strcmp(obd_jobid_var, JOBSTATS_PROCNAME_UID) == 0) { - snprintf(tmp_jobid, LUSTRE_JOBID_SIZE, "%s.%u", + snprintf(id, LUSTRE_JOBID_SIZE, "%s.%u", current->comm, from_kuid(&init_user_ns, current_fsuid())); goto out_cache_jobid; @@ -343,7 +344,7 @@ int lustre_get_jobid(char *jobid, size_t joblen) rcu_read_lock(); jid = jobid_current(); if (jid) - strlcpy(tmp_jobid, jid, sizeof(tmp_jobid)); + strlcpy(id, jid, sizeof(id)); rcu_read_unlock(); goto out_cache_jobid; } @@ -352,8 +353,8 @@ int lustre_get_jobid(char *jobid, size_t joblen) out_cache_jobid: /* Only replace the job ID if it changed. 
*/ - if (strcmp(jobid, tmp_jobid) != 0) - strcpy(jobid, tmp_jobid); + if (strcmp(jobid, id) != 0) + strcpy(jobid, id); return 0; } From patchwork Sun Nov 20 14:16:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050064 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 42A8AC433FE for ; Sun, 20 Nov 2022 14:34:16 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXjl0Kskz21GY; Sun, 20 Nov 2022 06:20:27 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXhx1pBcz219c for ; Sun, 20 Nov 2022 06:19:45 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D508410084D5; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D23B8E8B89; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:58 -0500 Message-Id: <1668953828-10909-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/22] lustre: llite: remove linefeed from LDLM_DEBUG X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev to make the corresponding messages single-line WC-bug-id: https://jira.whamcloud.com/browse/LU-15825 Lustre-commit: 93784852c8f20b27c ("LU-15825 ldlm: remove linefeed from LDLM_DEBUG") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47219 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 5ac634c..93abec8 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -233,7 +233,7 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) */ if (lock->l_resource->lr_lvb_inode) LDLM_DEBUG(lock, - "can't take inode for the lock (%sevicted)\n", + "can't take inode for the lock (%sevicted)", lock->l_resource->lr_lvb_inode->i_state & I_FREEING ? 
"" : "not "); return; From patchwork Sun Nov 20 14:16:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050066 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 543B5C433FE for ; Sun, 20 Nov 2022 14:34:46 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXkQ3gPFz21J5; Sun, 20 Nov 2022 06:21:02 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXj95zYTz21Bb for ; Sun, 20 Nov 2022 06:19:57 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id D8C4410084E2; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D6609E8B84; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:16:59 -0500 Message-Id: <1668953828-10909-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/22] lnet: selftest: migrate LNet selftest session handling to Netlink X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" The currently LNet selftest ioctl interface has a few issues which can be resolved using Netlink. The first is the current API using struct list_head is disliked by the Linux VFS maintainers. While we technically don't need to use the struct list_head directly its still confusing and passing pointers from userland to kernel space is also frowned on. Second issue that is exposed with debug kernels is that ioctl handling done with the lstcon_ioctl_handler can easily end up in a might_sleep state. The new Netlink work is also needed for the IPv6 support. Update the session handling to work with large NIDs. Internally use struct lst_session_id which supports large NIDs instead of struct lst_sid. Lastly we have been wanting YAMl handling with LNet selftest (LU-10975) which comes naturally with this work. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-8915 Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43298 Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Reviewed-by: Frank Sehr --- include/uapi/linux/lnet/lnetst.h | 21 ++- net/lnet/selftest/conctl.c | 349 +++++++++++++++++++++++++++++++-------- net/lnet/selftest/conrpc.c | 28 +++- net/lnet/selftest/console.c | 81 +++------ net/lnet/selftest/console.h | 68 ++++---- net/lnet/selftest/framework.c | 43 +++-- net/lnet/selftest/selftest.h | 78 +++++++-- 7 files changed, 477 insertions(+), 191 deletions(-) diff --git a/include/uapi/linux/lnet/lnetst.h b/include/uapi/linux/lnet/lnetst.h index af0435f1..d04496d 100644 --- a/include/uapi/linux/lnet/lnetst.h +++ b/include/uapi/linux/lnet/lnetst.h @@ -84,8 +84,6 @@ struct lst_sid { __s64 ses_stamp; /* time stamp */ }; /*** session id */ -extern struct lst_sid LST_INVALID_SID; - struct lst_bid { __u64 bat_id; /* unique id in session */ }; /*** batch id (group of tests) */ @@ -577,4 +575,23 @@ struct sfw_counters { __u32 ping_errors; } __packed; +#define LNET_SELFTEST_GENL_NAME "lnet_selftest" +#define LNET_SELFTEST_GENL_VERSION 0x1 + +/* enum lnet_selftest_commands - Supported core LNet Selftest Netlink + * commands + * + * @LNET_SELFTEST_CMD_UNSPEC: unspecified command to catch errors + * @LNET_SELFTEST_CMD_SESSIONS: command to manage sessions + */ +enum lnet_selftest_commands { + LNET_SELFTEST_CMD_UNSPEC = 0, + + LNET_SELFTEST_CMD_SESSIONS = 1, + + __LNET_SELFTEST_CMD_MAX_PLUS_ONE, +}; + +#define LNET_SELFTEST_CMD_MAX (__LNET_SELFTEST_CMD_MAX_PLUS_ONE - 1) + #endif diff --git a/net/lnet/selftest/conctl.c b/net/lnet/selftest/conctl.c index ede7fe5..aa11885 100644 --- a/net/lnet/selftest/conctl.c +++ b/net/lnet/selftest/conctl.c @@ -40,67 +40,6 @@ #include "console.h" static int -lst_session_new_ioctl(struct lstio_session_new_args *args) -{ - char name[LST_NAME_SIZE + 1]; - int rc; - - if (!args->lstio_ses_idp || /* address for output sid */ - !args->lstio_ses_key || /* no key is specified */ - !args->lstio_ses_namep || /* session name */ - args->lstio_ses_nmlen <= 0 || - args->lstio_ses_nmlen > LST_NAME_SIZE) - return -EINVAL; - - if (copy_from_user(name, args->lstio_ses_namep, - args->lstio_ses_nmlen)) { - return -EFAULT; - } - - name[args->lstio_ses_nmlen] = 0; - - rc = lstcon_session_new(name, - args->lstio_ses_key, - args->lstio_ses_feats, - args->lstio_ses_timeout, - args->lstio_ses_force, - args->lstio_ses_idp); - - return rc; -} - -static int -lst_session_end_ioctl(struct lstio_session_end_args *args) -{ - if (args->lstio_ses_key != console_session.ses_key) - return -EACCES; - - return lstcon_session_end(); -} - -static int -lst_session_info_ioctl(struct lstio_session_info_args *args) -{ - /* no checking of key */ - - if (!args->lstio_ses_idp || /* address for output sid */ - !args->lstio_ses_keyp || /* address for output key */ - !args->lstio_ses_featp || /* address for output features */ - !args->lstio_ses_ndinfo || /* address for output ndinfo */ - !args->lstio_ses_namep || /* address for output name */ - args->lstio_ses_nmlen <= 0 || - args->lstio_ses_nmlen > LST_NAME_SIZE) - return -EINVAL; - - return lstcon_session_info(args->lstio_ses_idp, - args->lstio_ses_keyp, - args->lstio_ses_featp, - args->lstio_ses_ndinfo, - args->lstio_ses_namep, - args->lstio_ses_nmlen); -} - -static int lst_debug_ioctl(struct lstio_debug_args *args) { char name[LST_NAME_SIZE + 1]; @@ -729,13 +668,11 @@ static int lst_test_add_ioctl(struct lstio_test_args 
*args) switch (opc) { case LSTIO_SESSION_NEW: - rc = lst_session_new_ioctl((struct lstio_session_new_args *)buf); - break; + fallthrough; case LSTIO_SESSION_END: - rc = lst_session_end_ioctl((struct lstio_session_end_args *)buf); - break; + fallthrough; case LSTIO_SESSION_INFO: - rc = lst_session_info_ioctl((struct lstio_session_info_args *)buf); + rc = -EOPNOTSUPP; break; case LSTIO_DEBUG: rc = lst_debug_ioctl((struct lstio_debug_args *)buf); @@ -797,3 +734,283 @@ static int lst_test_add_ioctl(struct lstio_test_args *args) return notifier_from_ioctl_errno(rc); } + +static struct genl_family lst_family; + +static const struct ln_key_list lst_session_keys = { + .lkl_maxattr = LNET_SELFTEST_SESSION_MAX, + .lkl_list = { + [LNET_SELFTEST_SESSION_HDR] = { + .lkp_value = "session", + .lkp_key_format = LNKF_MAPPING, + .lkp_data_type = NLA_NUL_STRING, + }, + [LNET_SELFTEST_SESSION_NAME] = { + .lkp_value = "name", + .lkp_data_type = NLA_STRING, + }, + [LNET_SELFTEST_SESSION_KEY] = { + .lkp_value = "key", + .lkp_data_type = NLA_U32, + }, + [LNET_SELFTEST_SESSION_TIMESTAMP] = { + .lkp_value = "timestamp", + .lkp_data_type = NLA_S64, + }, + [LNET_SELFTEST_SESSION_NID] = { + .lkp_value = "nid", + .lkp_data_type = NLA_STRING, + }, + [LNET_SELFTEST_SESSION_NODE_COUNT] = { + .lkp_value = "nodes", + .lkp_data_type = NLA_U16, + }, + }, +}; + +static int lst_sessions_show_dump(struct sk_buff *msg, + struct netlink_callback *cb) +{ + const struct ln_key_list *all[] = { + &lst_session_keys, NULL + }; + struct netlink_ext_ack *extack = cb->extack; + int portid = NETLINK_CB(cb->skb).portid; + int seq = cb->nlh->nlmsg_seq; + unsigned int node_count = 0; + struct lstcon_ndlink *ndl; + int flag = NLM_F_MULTI; + int rc = 0; + void *hdr; + + if (console_session.ses_state != LST_SESSION_ACTIVE) { + NL_SET_ERR_MSG(extack, "session is not active"); + rc = -ESRCH; + goto out_unlock; + } + + list_for_each_entry(ndl, &console_session.ses_ndl_list, ndl_link) + node_count++; + + rc = lnet_genl_send_scalar_list(msg, portid, seq, &lst_family, + NLM_F_CREATE | NLM_F_MULTI, + LNET_SELFTEST_CMD_SESSIONS, all); + if (rc < 0) { + NL_SET_ERR_MSG(extack, "failed to send key table"); + goto out_unlock; + } + + if (console_session.ses_force) + flag |= NLM_F_REPLACE; + + hdr = genlmsg_put(msg, portid, seq, &lst_family, flag, + LNET_SELFTEST_CMD_SESSIONS); + if (!hdr) { + NL_SET_ERR_MSG(extack, "failed to send values"); + genlmsg_cancel(msg, hdr); + rc = -EMSGSIZE; + goto out_unlock; + } + + nla_put_string(msg, LNET_SELFTEST_SESSION_NAME, + console_session.ses_name); + nla_put_u32(msg, LNET_SELFTEST_SESSION_KEY, + console_session.ses_key); + nla_put_u64_64bit(msg, LNET_SELFTEST_SESSION_TIMESTAMP, + console_session.ses_id.ses_stamp, + LNET_SELFTEST_SESSION_PAD); + nla_put_string(msg, LNET_SELFTEST_SESSION_NID, + libcfs_nidstr(&console_session.ses_id.ses_nid)); + nla_put_u16(msg, LNET_SELFTEST_SESSION_NODE_COUNT, + node_count); + genlmsg_end(msg, hdr); +out_unlock: + return rc; +} + +static int lst_sessions_cmd(struct sk_buff *skb, struct genl_info *info) +{ + struct sk_buff *msg = NULL; + int rc = 0; + + mutex_lock(&console_session.ses_mutex); + + console_session.ses_laststamp = ktime_get_real_seconds(); + + if (console_session.ses_shutdown) { + GENL_SET_ERR_MSG(info, "session is shutdown"); + rc = -ESHUTDOWN; + goto out_unlock; + } + + if (console_session.ses_expired) + lstcon_session_end(); + + if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE) && + console_session.ses_state == LST_SESSION_NONE) { + GENL_SET_ERR_MSG(info, "session is not 
active"); + rc = -ESRCH; + goto out_unlock; + } + + memset(&console_session.ses_trans_stat, 0, + sizeof(struct lstcon_trans_stat)); + + if (!(info->nlhdr->nlmsg_flags & NLM_F_CREATE)) { + lstcon_session_end(); + goto out_unlock; + } + + if (info->attrs[LN_SCALAR_ATTR_LIST]) { + struct genlmsghdr *gnlh = nlmsg_data(info->nlhdr); + const struct ln_key_list *all[] = { + &lst_session_keys, NULL + }; + char name[LST_NAME_SIZE]; + struct nlmsghdr *nlh; + struct nlattr *item; + bool force = false; + s64 timeout = 300; + void *hdr; + int rem; + + if (info->nlhdr->nlmsg_flags & NLM_F_REPLACE) + force = true; + + nla_for_each_nested(item, info->attrs[LN_SCALAR_ATTR_LIST], + rem) { + if (nla_type(item) != LN_SCALAR_ATTR_VALUE) + continue; + + if (nla_strcmp(item, "name") == 0) { + ssize_t len; + + item = nla_next(item, &rem); + if (nla_type(item) != LN_SCALAR_ATTR_VALUE) { + rc = -EINVAL; + goto err_conf; + } + + len = nla_strlcpy(name, item, sizeof(name)); + if (len < 0) + rc = len; + } else if (nla_strcmp(item, "timeout") == 0) { + item = nla_next(item, &rem); + if (nla_type(item) != + LN_SCALAR_ATTR_INT_VALUE) { + rc = -EINVAL; + goto err_conf; + } + + timeout = nla_get_s64(item); + if (timeout < 0) + rc = -ERANGE; + } + if (rc < 0) { +err_conf: + GENL_SET_ERR_MSG(info, + "failed to get config"); + goto out_unlock; + } + } + + rc = lstcon_session_new(name, info->nlhdr->nlmsg_pid, + gnlh->version, timeout, + force); + if (rc < 0) { + GENL_SET_ERR_MSG(info, "new session creation failed"); + lstcon_session_end(); + goto out_unlock; + } + + msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) { + GENL_SET_ERR_MSG(info, "msg allocation failed"); + rc = -ENOMEM; + goto out_unlock; + } + + rc = lnet_genl_send_scalar_list(msg, info->snd_portid, + info->snd_seq, &lst_family, + NLM_F_CREATE | NLM_F_MULTI, + LNET_SELFTEST_CMD_SESSIONS, + all); + if (rc < 0) { + GENL_SET_ERR_MSG(info, "failed to send key table"); + goto out_unlock; + } + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &lst_family, NLM_F_MULTI, + LNET_SELFTEST_CMD_SESSIONS); + if (!hdr) { + GENL_SET_ERR_MSG(info, "failed to send values"); + genlmsg_cancel(msg, hdr); + rc = -EMSGSIZE; + goto out_unlock; + } + + nla_put_string(msg, LNET_SELFTEST_SESSION_NAME, + console_session.ses_name); + nla_put_u32(msg, LNET_SELFTEST_SESSION_KEY, + console_session.ses_key); + nla_put_u64_64bit(msg, LNET_SELFTEST_SESSION_TIMESTAMP, + console_session.ses_id.ses_stamp, + LNET_SELFTEST_SESSION_PAD); + nla_put_string(msg, LNET_SELFTEST_SESSION_NID, + libcfs_nidstr(&console_session.ses_id.ses_nid)); + nla_put_u16(msg, LNET_SELFTEST_SESSION_NODE_COUNT, 0); + + genlmsg_end(msg, hdr); + + nlh = nlmsg_put(msg, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, NLM_F_MULTI); + if (!nlh) { + GENL_SET_ERR_MSG(info, "failed to complete message"); + genlmsg_cancel(msg, hdr); + rc = -ENOMEM; + goto out_unlock; + } + rc = genlmsg_reply(msg, info); + if (rc) + GENL_SET_ERR_MSG(info, "failed to send reply"); + } +out_unlock: + if (rc < 0 && msg) + nlmsg_free(msg); + mutex_unlock(&console_session.ses_mutex); + return rc; +} + +static const struct genl_multicast_group lst_mcast_grps[] = { + { .name = "sessions", }, +}; + +static const struct genl_ops lst_genl_ops[] = { + { + .cmd = LNET_SELFTEST_CMD_SESSIONS, + .dumpit = lst_sessions_show_dump, + .doit = lst_sessions_cmd, + }, +}; + +static struct genl_family lst_family = { + .name = LNET_SELFTEST_GENL_NAME, + .version = LNET_SELFTEST_GENL_VERSION, + .maxattr = LN_SCALAR_MAX, + .module = THIS_MODULE, + .ops 
= lst_genl_ops, + .n_ops = ARRAY_SIZE(lst_genl_ops), + .mcgrps = lst_mcast_grps, + .n_mcgrps = ARRAY_SIZE(lst_mcast_grps), +}; + +int lstcon_init_netlink(void) +{ + return genl_register_family(&lst_family); +} + +void lstcon_fini_netlink(void) +{ + genl_unregister_family(&lst_family); +} diff --git a/net/lnet/selftest/conrpc.c b/net/lnet/selftest/conrpc.c index 0170219..8096c46 100644 --- a/net/lnet/selftest/conrpc.c +++ b/net/lnet/selftest/conrpc.c @@ -602,7 +602,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, return rc; msrq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.mksn_reqst; - msrq->mksn_sid = console_session.ses_id; + msrq->mksn_sid.ses_stamp = console_session.ses_id.ses_stamp; + msrq->mksn_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); msrq->mksn_force = console_session.ses_force; strlcpy(msrq->mksn_name, console_session.ses_name, sizeof(msrq->mksn_name)); @@ -615,7 +617,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, return rc; rsrq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.rmsn_reqst; - rsrq->rmsn_sid = console_session.ses_id; + rsrq->rmsn_sid.ses_stamp = console_session.ses_id.ses_stamp; + rsrq->rmsn_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); break; default: @@ -638,7 +642,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, drq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.dbg_reqst; - drq->dbg_sid = console_session.ses_id; + drq->dbg_sid.ses_stamp = console_session.ses_id.ses_stamp; + drq->dbg_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); drq->dbg_flags = 0; return rc; @@ -658,7 +664,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, brq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.bat_reqst; - brq->bar_sid = console_session.ses_id; + brq->bar_sid.ses_stamp = console_session.ses_id.ses_stamp; + brq->bar_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); brq->bar_bid = tsb->tsb_id; brq->bar_testidx = tsb->tsb_index; brq->bar_opc = transop == LST_TRANS_TSBRUN ? SRPC_BATCH_OPC_RUN : @@ -690,7 +698,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, srq = &(*crpc)->crp_rpc->crpc_reqstmsg.msg_body.stat_reqst; - srq->str_sid = console_session.ses_id; + srq->str_sid.ses_stamp = console_session.ses_id.ses_stamp; + srq->str_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); srq->str_type = 0; /* XXX remove it */ return 0; @@ -877,7 +887,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, trq->tsr_loop = test->tes_loop; } - trq->tsr_sid = console_session.ses_id; + trq->tsr_sid.ses_stamp = console_session.ses_id.ses_stamp; + trq->tsr_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); trq->tsr_bid = test->tes_hdr.tsb_id; trq->tsr_concur = test->tes_concur; trq->tsr_is_client = (transop == LST_TRANS_TSBCLIADD) ? 
1 : 0; @@ -1259,7 +1271,9 @@ void lstcon_rpc_stat_reply(struct lstcon_rpc_trans *, struct srpc_msg *, drq = &crpc->crp_rpc->crpc_reqstmsg.msg_body.dbg_reqst; - drq->dbg_sid = console_session.ses_id; + drq->dbg_sid.ses_stamp = console_session.ses_id.ses_stamp; + drq->dbg_sid.ses_nid = + lnet_nid_to_nid4(&console_session.ses_id.ses_nid); drq->dbg_flags = 0; lstcon_rpc_trans_addreq(trans, crpc); diff --git a/net/lnet/selftest/console.c b/net/lnet/selftest/console.c index 85e9300..1ed6191 100644 --- a/net/lnet/selftest/console.c +++ b/net/lnet/selftest/console.c @@ -1679,27 +1679,32 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, } int -lstcon_session_match(struct lst_sid sid) +lstcon_session_match(struct lst_sid id) { - return (console_session.ses_id.ses_nid == sid.ses_nid && - console_session.ses_id.ses_stamp == sid.ses_stamp) ? 1 : 0; + struct lst_session_id sid; + + sid.ses_stamp = id.ses_stamp; + lnet_nid4_to_nid(id.ses_nid, &sid.ses_nid); + + return (nid_same(&console_session.ses_id.ses_nid, &sid.ses_nid) && + console_session.ses_id.ses_stamp == sid.ses_stamp) ? 1 : 0; } static void -lstcon_new_session_id(struct lst_sid *sid) +lstcon_new_session_id(struct lst_session_id *sid) { struct lnet_processid id; LASSERT(console_session.ses_state == LST_SESSION_NONE); LNetGetId(1, &id); - sid->ses_nid = lnet_nid_to_nid4(&id.nid); + sid->ses_nid = id.nid; sid->ses_stamp = div_u64(ktime_get_ns(), NSEC_PER_MSEC); } int lstcon_session_new(char *name, int key, unsigned int feats, - int timeout, int force, struct lst_sid __user *sid_up) + int timeout, int force) { int rc = 0; int i; @@ -1731,7 +1736,6 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, lstcon_new_session_id(&console_session.ses_id); console_session.ses_key = key; - console_session.ses_state = LST_SESSION_ACTIVE; console_session.ses_force = !!force; console_session.ses_features = feats; console_session.ses_feats_updated = 0; @@ -1757,52 +1761,12 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, return rc; } - if (!copy_to_user(sid_up, &console_session.ses_id, - sizeof(struct lst_sid))) - return rc; - - lstcon_session_end(); - - return -EFAULT; -} - -int -lstcon_session_info(struct lst_sid __user *sid_up, int __user *key_up, - unsigned __user *featp, - struct lstcon_ndlist_ent __user *ndinfo_up, - char __user *name_up, int len) -{ - struct lstcon_ndlist_ent *entp; - struct lstcon_ndlink *ndl; - int rc = 0; - - if (console_session.ses_state != LST_SESSION_ACTIVE) - return -ESRCH; - - entp = kzalloc(sizeof(*entp), GFP_NOFS); - if (!entp) - return -ENOMEM; - - list_for_each_entry(ndl, &console_session.ses_ndl_list, ndl_link) - LST_NODE_STATE_COUNTER(ndl->ndl_node, entp); - - if (copy_to_user(sid_up, &console_session.ses_id, - sizeof(*sid_up)) || - copy_to_user(key_up, &console_session.ses_key, - sizeof(*key_up)) || - copy_to_user(featp, &console_session.ses_features, - sizeof(*featp)) || - copy_to_user(ndinfo_up, entp, sizeof(*entp)) || - copy_to_user(name_up, console_session.ses_name, len)) - rc = -EFAULT; - - kfree(entp); + console_session.ses_state = LST_SESSION_ACTIVE; return rc; } -int -lstcon_session_end(void) +int lstcon_session_end(void) { struct lstcon_rpc_trans *trans; struct lstcon_group *grp; @@ -1907,9 +1871,10 @@ static void lstcon_group_ndlink_release(struct lstcon_group *, mutex_lock(&console_session.ses_mutex); - jrep->join_sid = console_session.ses_id; + jrep->join_sid.ses_stamp = console_session.ses_id.ses_stamp; + jrep->join_sid.ses_nid = 
lnet_nid_to_nid4(&console_session.ses_id.ses_nid); - if (console_session.ses_id.ses_nid == LNET_NID_ANY) { + if (LNET_NID_IS_ANY(&console_session.ses_id.ses_nid)) { jrep->join_status = ESRCH; goto out; } @@ -2041,14 +2006,21 @@ static void lstcon_init_acceptor_service(void) goto out; } + rc = lstcon_init_netlink(); + if (rc < 0) + goto out; + rc = blocking_notifier_chain_register(&libcfs_ioctl_list, &lstcon_ioctl_handler); - if (!rc) { - lstcon_rpc_module_init(); - return 0; + if (rc < 0) { + lstcon_fini_netlink(); + goto out; } + lstcon_rpc_module_init(); + return 0; + out: srpc_shutdown_service(&lstcon_acceptor_service); srpc_remove_service(&lstcon_acceptor_service); @@ -2067,6 +2039,7 @@ static void lstcon_init_acceptor_service(void) blocking_notifier_chain_unregister(&libcfs_ioctl_list, &lstcon_ioctl_handler); + lstcon_fini_netlink(); mutex_lock(&console_session.ses_mutex); diff --git a/net/lnet/selftest/console.h b/net/lnet/selftest/console.h index 93aa515..dd416dc 100644 --- a/net/lnet/selftest/console.h +++ b/net/lnet/selftest/console.h @@ -136,36 +136,34 @@ struct lstcon_test { #define LST_CONSOLE_TIMEOUT 300 /* default console timeout */ struct lstcon_session { - struct mutex ses_mutex; /* only 1 thread in session */ - struct lst_sid ses_id; /* global session id */ - int ses_key; /* local session key */ - int ses_state; /* state of session */ - int ses_timeout; /* timeout in seconds */ - time64_t ses_laststamp; /* last operation stamp (secs) */ - unsigned int ses_features; /* tests features of the session */ - unsigned int ses_feats_updated:1; /* features are synced with - * remote test nodes - */ - unsigned int ses_force:1; /* force creating */ - unsigned int ses_shutdown:1; /* session is shutting down */ - unsigned int ses_expired:1; /* console is timedout */ - u64 ses_id_cookie; /* batch id cookie */ - char ses_name[LST_NAME_SIZE];/* session name */ - struct lstcon_rpc_trans - *ses_ping; /* session pinger */ - struct stt_timer ses_ping_timer; /* timer for pinger */ - struct lstcon_trans_stat - ses_trans_stat; /* transaction stats */ - - struct list_head ses_trans_list; /* global list of transaction */ - struct list_head ses_grp_list; /* global list of groups */ - struct list_head ses_bat_list; /* global list of batches */ - struct list_head ses_ndl_list; /* global list of nodes */ - struct list_head *ses_ndl_hash; /* hash table of nodes */ - - spinlock_t ses_rpc_lock; /* serialize */ - atomic_t ses_rpc_counter; /* # of initialized RPCs */ - struct list_head ses_rpc_freelist; /* idle console rpc */ + struct mutex ses_mutex; /* only 1 thread in session */ + struct lst_session_id ses_id; /* global session id */ + u32 ses_key; /* local session key */ + int ses_state; /* state of session */ + int ses_timeout; /* timeout in seconds */ + time64_t ses_laststamp; /* last operation stamp (secs) */ + unsigned int ses_features; /* tests features of the session */ + unsigned int ses_feats_updated:1; /* features are synced with + * remote test nodes + */ + unsigned int ses_force:1; /* force creating */ + unsigned int ses_shutdown:1; /* session is shutting down */ + unsigned int ses_expired:1; /* console is timedout */ + u64 ses_id_cookie; /* batch id cookie */ + char ses_name[LST_NAME_SIZE];/* session name */ + struct lstcon_rpc_trans *ses_ping; /* session pinger */ + struct stt_timer ses_ping_timer; /* timer for pinger */ + struct lstcon_trans_stat ses_trans_stat;/* transaction stats */ + + struct list_head ses_trans_list; /* global list of transaction */ + struct list_head ses_grp_list; /* 
global list of groups */ + struct list_head ses_bat_list; /* global list of batches */ + struct list_head ses_ndl_list; /* global list of nodes */ + struct list_head *ses_ndl_hash; /* hash table of nodes */ + + spinlock_t ses_rpc_lock; /* serialize */ + atomic_t ses_rpc_counter;/* # of initialized RPCs */ + struct list_head ses_rpc_freelist;/* idle console rpc */ }; /* session descriptor */ extern struct lstcon_session console_session; @@ -186,14 +184,16 @@ struct lstcon_session { int lstcon_ioctl_entry(struct notifier_block *nb, unsigned long cmd, void *vdata); + +int lstcon_init_netlink(void); +void lstcon_fini_netlink(void); + int lstcon_console_init(void); int lstcon_console_fini(void); + int lstcon_session_match(struct lst_sid sid); int lstcon_session_new(char *name, int key, unsigned int version, - int timeout, int flags, struct lst_sid __user *sid_up); -int lstcon_session_info(struct lst_sid __user *sid_up, int __user *key, - unsigned __user *verp, struct lstcon_ndlist_ent __user *entp, - char __user *name_up, int len); + int timeout, int flags); int lstcon_session_end(void); int lstcon_session_debug(int timeout, struct list_head __user *result_up); int lstcon_session_feats_check(unsigned int feats); diff --git a/net/lnet/selftest/framework.c b/net/lnet/selftest/framework.c index e84904e..0dd0421 100644 --- a/net/lnet/selftest/framework.c +++ b/net/lnet/selftest/framework.c @@ -39,7 +39,7 @@ #include "selftest.h" -struct lst_sid LST_INVALID_SID = { .ses_nid = LNET_NID_ANY, .ses_stamp = -1 }; +struct lst_session_id LST_INVALID_SID = { .ses_nid = LNET_ANY_NID, .ses_stamp = -1}; static int session_timeout = 100; module_param(session_timeout, int, 0444); @@ -244,7 +244,7 @@ LASSERT(sn == sfw_data.fw_session); CWARN("Session expired! sid: %s-%llu, name: %s\n", - libcfs_nid2str(sn->sn_id.ses_nid), + libcfs_nidstr(&sn->sn_id.ses_nid), sn->sn_id.ses_stamp, &sn->sn_name[0]); sn->sn_timer_active = 0; @@ -268,7 +268,8 @@ strlcpy(&sn->sn_name[0], name, sizeof(sn->sn_name)); sn->sn_timer_active = 0; - sn->sn_id = sid; + sn->sn_id.ses_stamp = sid.ses_stamp; + lnet_nid4_to_nid(sid.ses_nid, &sn->sn_id.ses_nid); sn->sn_features = features; sn->sn_timeout = session_timeout; sn->sn_started = ktime_get(); @@ -357,6 +358,18 @@ return bat; } +static struct lst_sid get_old_sid(struct sfw_session *sn) +{ + struct lst_sid sid = { .ses_nid = LNET_NID_ANY, .ses_stamp = -1 }; + + if (sn) { + sid.ses_stamp = sn->sn_id.ses_stamp; + sid.ses_nid = lnet_nid_to_nid4(&sn->sn_id.ses_nid); + } + + return sid; +} + static int sfw_get_stats(struct srpc_stat_reqst *request, struct srpc_stat_reply *reply) { @@ -364,7 +377,7 @@ struct sfw_counters *cnt = &reply->str_fw; struct sfw_batch *bat; - reply->str_sid = !sn ? LST_INVALID_SID : sn->sn_id; + reply->str_sid = get_old_sid(sn); if (request->str_sid.ses_nid == LNET_NID_ANY) { reply->str_status = EINVAL; @@ -407,14 +420,14 @@ int cplen = 0; if (request->mksn_sid.ses_nid == LNET_NID_ANY) { - reply->mksn_sid = !sn ? 
LST_INVALID_SID : sn->sn_id; + reply->mksn_sid = get_old_sid(sn); reply->mksn_status = EINVAL; return 0; } if (sn) { reply->mksn_status = 0; - reply->mksn_sid = sn->sn_id; + reply->mksn_sid = get_old_sid(sn); reply->mksn_timeout = sn->sn_timeout; if (sfw_sid_equal(request->mksn_sid, sn->sn_id)) { @@ -464,7 +477,7 @@ spin_unlock(&sfw_data.fw_lock); reply->mksn_status = 0; - reply->mksn_sid = sn->sn_id; + reply->mksn_sid = get_old_sid(sn); reply->mksn_timeout = sn->sn_timeout; return 0; } @@ -475,7 +488,7 @@ { struct sfw_session *sn = sfw_data.fw_session; - reply->rmsn_sid = !sn ? LST_INVALID_SID : sn->sn_id; + reply->rmsn_sid = get_old_sid(sn); if (request->rmsn_sid.ses_nid == LNET_NID_ANY) { reply->rmsn_status = EINVAL; @@ -497,7 +510,7 @@ spin_unlock(&sfw_data.fw_lock); reply->rmsn_status = 0; - reply->rmsn_sid = LST_INVALID_SID; + reply->rmsn_sid = get_old_sid(NULL); LASSERT(!sfw_data.fw_session); return 0; } @@ -510,12 +523,12 @@ if (!sn) { reply->dbg_status = ESRCH; - reply->dbg_sid = LST_INVALID_SID; + reply->dbg_sid = get_old_sid(NULL); return 0; } reply->dbg_status = 0; - reply->dbg_sid = sn->sn_id; + reply->dbg_sid = get_old_sid(sn); reply->dbg_timeout = sn->sn_timeout; if (strlcpy(reply->dbg_name, &sn->sn_name[0], sizeof(reply->dbg_name)) >= sizeof(reply->dbg_name)) @@ -1119,7 +1132,7 @@ struct sfw_batch *bat; request = &rpc->srpc_reqstbuf->buf_msg.msg_body.tes_reqst; - reply->tsr_sid = !sn ? LST_INVALID_SID : sn->sn_id; + reply->tsr_sid = get_old_sid(sn); if (!request->tsr_loop || !request->tsr_concur || @@ -1187,7 +1200,7 @@ int rc = 0; struct sfw_batch *bat; - reply->bar_sid = !sn ? LST_INVALID_SID : sn->sn_id; + reply->bar_sid = get_old_sid(sn); if (!sn || !sfw_sid_equal(request->bar_sid, sn->sn_id)) { reply->bar_status = ESRCH; @@ -1266,7 +1279,9 @@ CNETERR("Features of framework RPC don't match features of current session: %x/%x\n", request->msg_ses_feats, sn->sn_features); reply->msg_body.reply.status = EPROTO; - reply->msg_body.reply.sid = sn->sn_id; + reply->msg_body.reply.sid.ses_stamp = sn->sn_id.ses_stamp; + reply->msg_body.reply.sid.ses_nid = + lnet_nid_to_nid4(&sn->sn_id.ses_nid); goto out; } diff --git a/net/lnet/selftest/selftest.h b/net/lnet/selftest/selftest.h index 223a432..5bffe73 100644 --- a/net/lnet/selftest/selftest.h +++ b/net/lnet/selftest/selftest.h @@ -49,6 +49,39 @@ #define MADE_WITHOUT_COMPROMISE #endif +/* enum lnet_selftest_session_attrs - LNet selftest session Netlink + * attributes + * + * @LNET_SELFTEST_SESSION_UNSPEC: unspecified attribute to catch errors + * @LNET_SELFTEST_SESSION_PAD: padding for 64-bit attributes, ignore + * + * @LENT_SELFTEST_SESSION_HDR: Netlink group this data is for + * (NLA_NUL_STRING) + * @LNET_SELFTEST_SESSION_NAME: name of this session (NLA_STRING) + * @LNET_SELFTEST_SESSION_KEY: key used to represent the session + * (NLA_U32) + * @LNET_SELFTEST_SESSION_TIMESTAMP: timestamp when the session was created + * (NLA_S64) + * @LNET_SELFTEST_SESSION_NID: NID of the node selftest ran on + * (NLA_STRING) + * @LNET_SELFTEST_SESSION_NODE_COUNT: Number of nodes in use (NLA_U16) + */ +enum lnet_selftest_session_attrs { + LNET_SELFTEST_SESSION_UNSPEC = 0, + LNET_SELFTEST_SESSION_PAD = LNET_SELFTEST_SESSION_UNSPEC, + + LNET_SELFTEST_SESSION_HDR, + LNET_SELFTEST_SESSION_NAME, + LNET_SELFTEST_SESSION_KEY, + LNET_SELFTEST_SESSION_TIMESTAMP, + LNET_SELFTEST_SESSION_NID, + LNET_SELFTEST_SESSION_NODE_COUNT, + + __LNET_SELFTEST_SESSION_MAX_PLUS_ONE, +}; + +#define LNET_SELFTEST_SESSION_MAX (__LNET_SELFTEST_SESSION_MAX_PLUS_ONE - 1) + 
#define SWI_STATE_NEWBORN 0 #define SWI_STATE_REPLY_SUBMITTED 1 #define SWI_STATE_REPLY_SENT 2 @@ -318,23 +351,40 @@ struct srpc_service { int (*sv_bulk_ready)(struct srpc_server_rpc *, int); }; +struct lst_session_id { + s64 ses_stamp; /* time stamp in milliseconds */ + struct lnet_nid ses_nid; /* nid of console node */ +}; /*** session id (large addr) */ + +extern struct lst_session_id LST_INVALID_SID; + struct sfw_session { - struct list_head sn_list; /* chain on fw_zombie_sessions */ - struct lst_sid sn_id; /* unique identifier */ - unsigned int sn_timeout; /* # seconds' inactivity to expire */ - int sn_timer_active; - unsigned int sn_features; - struct stt_timer sn_timer; - struct list_head sn_batches; /* list of batches */ - char sn_name[LST_NAME_SIZE]; - atomic_t sn_refcount; - atomic_t sn_brw_errors; - atomic_t sn_ping_errors; - ktime_t sn_started; + /* chain on fw_zombie_sessions */ + struct list_head sn_list; + struct lst_session_id sn_id; /* unique identifier */ + /* # seconds' inactivity to expire */ + unsigned int sn_timeout; + int sn_timer_active; + unsigned int sn_features; + struct stt_timer sn_timer; + struct list_head sn_batches; /* list of batches */ + char sn_name[LST_NAME_SIZE]; + atomic_t sn_refcount; + atomic_t sn_brw_errors; + atomic_t sn_ping_errors; + ktime_t sn_started; }; -#define sfw_sid_equal(sid0, sid1) ((sid0).ses_nid == (sid1).ses_nid && \ - (sid0).ses_stamp == (sid1).ses_stamp) +static inline int sfw_sid_equal(struct lst_sid sid0, + struct lst_session_id sid1) +{ + struct lnet_nid ses_nid; + + lnet_nid4_to_nid(sid0.ses_nid, &ses_nid); + + return ((sid0.ses_stamp == sid1.ses_stamp) && + nid_same(&ses_nid, &sid1.ses_nid)); +} struct sfw_batch { struct list_head bat_list; /* chain on sn_batches */ From patchwork Sun Nov 20 14:17:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A0CFFC4332F for ; Sun, 20 Nov 2022 14:34:19 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXkF4RV5z21Hf; Sun, 20 Nov 2022 06:20:53 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXk01Zt4z21H8 for ; Sun, 20 Nov 2022 06:20:40 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id DCCE310087C5; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id D9CBBE8B9B; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:00 -0500 Message-Id: <1668953828-10909-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/22] lustre: clio: 
append to non-existent component X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman should return an error, but it fails now with a BUG below because @rc of lov_io_layout_at() is not checked for < 0 BUG: unable to handle kernel paging request at ffff99d3c2f74030 Call Trace: lov_stripe_number+0x19/0x40 [lov] lov_page_init_composite+0x103/0x5f0 [lov] ? kmem_cache_alloc+0x12e/0x270 cl_page_alloc+0x19f/0x660 [obdclass] cl_page_find+0x1a0/0x250 [obdclass] ll_write_begin+0x1f7/0xfb0 [lustre] HPE-bug-id: LUS-11075 WC-bug-id: https://jira.whamcloud.com/browse/LU-16281 Lustre-commit: 8fdeca3b6faf22c72 ("LU-16281 clio: append to non-existent component") Signed-off-by: Vitaly Fertman Reviewed-on: https://es-gerrit.dev.cray.com/161123 Reviewed-by: Alexander Zarochentsev Reviewed-by: Alexander Boyko Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48994 Reviewed-by: Andreas Dilger Reviewed-by: Alexander Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_page.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c index a22b71f..6e28e62 100644 --- a/fs/lustre/lov/lov_page.c +++ b/fs/lustre/lov/lov_page.c @@ -84,6 +84,8 @@ int lov_page_init_composite(const struct lu_env *env, struct cl_object *obj, suboff = lio->lis_cached_suboff + offset - lio->lis_cached_off; } else { entry = lov_io_layout_at(lio, offset); + if (entry < 0) + return -ENODATA; stripe = lov_stripe_number(loo->lo_lsm, entry, offset); rc = lov_stripe_offset(loo->lo_lsm, entry, offset, stripe, From patchwork Sun Nov 20 14:17:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050067 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7144BC4332F for ; Sun, 20 Nov 2022 14:38:26 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXl30xgcz21Jw; Sun, 20 Nov 2022 06:21:35 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXkv5rvlz1yC5 for ; Sun, 20 Nov 2022 06:21:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id DFD4D10087CA; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DD8E3E8B88; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:01 -0500 Message-Id: <1668953828-10909-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> 
References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/22] lnet: fix debug message in lnet_discovery_event_reply X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov The message in lnet_discovery_event_reply currently says "Peer X has discovery disabled" even though the same path may be taken if discovery is disabled locally. Change the debug message to indicate whether discovery is disabled on the peer side or locally. WC-bug-id: https://jira.whamcloud.com/browse/LU-16282 Lustre-commit: 9f45a79e983c11def ("LU-16282 lnet: fix debug message in lnet_discovery_event_reply") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48997 Reviewed-by: Neil Brown Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 52ad791..35b135e 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2592,6 +2592,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) struct lnet_ping_buffer *pbuf; int infobytes; int rc; + bool ping_feat_disc; spin_lock(&lp->lp_lock); @@ -2629,14 +2630,15 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) goto out; } - /* - * The peer may have discovery disabled at its end. Set + /* The peer may have discovery disabled at its end. Set * NO_DISCOVERY as appropriate. */ - if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) && - lnet_peer_discovery_disabled) { - CDEBUG(D_NET, "Peer %s has discovery enabled\n", - libcfs_nidstr(&lp->lp_primary_nid)); + ping_feat_disc = pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY; + if (!ping_feat_disc || lnet_peer_discovery_disabled) { + CDEBUG(D_NET, "Peer %s has discovery %s, local discovery %s\n", + libcfs_nidstr(&lp->lp_primary_nid), + ping_feat_disc ? "enabled" : "disabled", + lnet_peer_discovery_disabled ? "disabled" : "enabled"); /* Detect whether this peer has toggled discovery from on to * off and whether we can delete and re-create the peer. 
Peers From patchwork Sun Nov 20 14:17:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050068 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A4480C433FE for ; Sun, 20 Nov 2022 14:39:09 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXln5tp7z21b5; Sun, 20 Nov 2022 06:22:13 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXl63d6fz21Jn for ; Sun, 20 Nov 2022 06:21:38 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E56F91009354; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E2164E8B89; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:02 -0500 Message-Id: <1668953828-10909-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/22] lustre: ldlm: group lock unlock fix X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman The original LU-9964 fix had a problem because with many pages in memory grouplock unlock takes 10+ seconds just to discard them. The current patch makes grouplock unlock thread to be not atomic, but makes a new grouplock enqueue to wait until previous CBPENDING lock gets destroyed. 
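The client-side shape of the change is a classic wait-and-retry loop: if another group still holds the lock, drop the mutex, sleep until that group's user count drains to zero, then start the whole attempt over. A reduced sketch of that loop (field and function names are invented for illustration, and the O_NONBLOCK/-EAGAIN handling is omitted):

    #include <linux/mutex.h>
    #include <linux/wait_bit.h>

    struct demo_inode_info {
    	struct mutex	group_mutex;
    	unsigned long	group_gid;	/* gid currently holding the lock */
    	u64		group_users;	/* number of holders in that group */
    };

    static void demo_group_lock(struct demo_inode_info *lli, unsigned long gid)
    {
    retry:
    	mutex_lock(&lli->group_mutex);
    	if (gid != lli->group_gid && lli->group_users != 0) {
    		/* another group still holds the lock: release the mutex,
    		 * wait for its users to drain, then retry from scratch
    		 */
    		mutex_unlock(&lli->group_mutex);
    		wait_var_event(&lli->group_users, !lli->group_users);
    		goto retry;
    	}
    	lli->group_gid = gid;
    	lli->group_users++;
    	mutex_unlock(&lli->group_mutex);
    }

    static void demo_group_unlock(struct demo_inode_info *lli)
    {
    	mutex_lock(&lli->group_mutex);
    	if (--lli->group_users == 0)
    		wake_up_var(&lli->group_users);	/* let waiters re-check */
    	mutex_unlock(&lli->group_mutex);
    }

On the DLM side the same idea is applied to a matched CBPENDING group lock, as the ldlm_lock.c hunk below shows: the lookup pins the dying lock, the caller sleeps on its l_waitq until it is destroyed, and then the match is repeated.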
HPE-bug-id: LUS-10644 WC-bug-id: https://jira.whamcloud.com/browse/LU-16046 Lustre-commit: 3dc261c06434eceee ("LU-16046 ldlm: group lock unlock fix") Lustre-commit: 62fd8f9b498ae3d16 ("Revert "LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"") Lustre-commit: dd609c6f31adeadab ("Revert "LU-16046 ldlm: group lock fix") Signed-off-by: Vitaly Fertman Reviewed-on: https://es-gerrit.dev.cray.com/161411 Reviewed-by: Andriy Skulysh Reviewed-by: Alexander Boyko Tested-by: Alexander Lezhoev Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49008 Reviewed-by: Alexander Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 1 + fs/lustre/include/lustre_osc.h | 15 ---- fs/lustre/ldlm/ldlm_lock.c | 28 ++++++- fs/lustre/llite/file.c | 76 ++++++++++++------- fs/lustre/llite/llite_internal.h | 3 + fs/lustre/llite/llite_lib.c | 3 + fs/lustre/mdc/mdc_dev.c | 58 ++++----------- fs/lustre/osc/osc_lock.c | 157 ++------------------------------------- fs/lustre/osc/osc_object.c | 16 ---- fs/lustre/osc/osc_request.c | 14 ++-- 10 files changed, 110 insertions(+), 261 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 6053e01..d08c48f 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -855,6 +855,7 @@ enum ldlm_match_flags { LDLM_MATCH_AST = BIT(1), LDLM_MATCH_AST_ANY = BIT(2), LDLM_MATCH_RIGHT = BIT(3), + LDLM_MATCH_GROUP = BIT(4), }; /** diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index a0f1afc..d15f46b 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -319,11 +319,6 @@ struct osc_object { const struct osc_object_operations *oo_obj_ops; bool oo_initialized; - - wait_queue_head_t oo_group_waitq; - struct mutex oo_group_mutex; - u64 oo_group_users; - unsigned long oo_group_gid; }; static inline void osc_build_res_name(struct osc_object *osc, @@ -660,16 +655,6 @@ int osc_object_glimpse(const struct lu_env *env, const struct cl_object *obj, int osc_object_find_cbdata(const struct lu_env *env, struct cl_object *obj, ldlm_iterator_t iter, void *data); int osc_object_prune(const struct lu_env *env, struct cl_object *obj); -void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock); -void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock); -int osc_grouplock_enqueue_init(const struct lu_env *env, - struct osc_object *obj, - struct osc_lock *oscl, - struct lustre_handle *lh); -void osc_grouplock_enqueue_fini(const struct lu_env *env, - struct osc_object *obj, - struct osc_lock *oscl, - struct lustre_handle *lh); /* osc_request.c */ void osc_init_grant(struct client_obd *cli, struct obd_connect_data *ocd); diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c index 39ab2a0..8659aa5 100644 --- a/fs/lustre/ldlm/ldlm_lock.c +++ b/fs/lustre/ldlm/ldlm_lock.c @@ -324,6 +324,7 @@ static int ldlm_lock_destroy_internal(struct ldlm_lock *lock) return 0; } ldlm_set_destroyed(lock); + wake_up(&lock->l_waitq); ldlm_lock_remove_from_lru(lock); class_handle_unhash(&lock->l_handle); @@ -1067,10 +1068,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) * can still happen. 
*/ if (ldlm_is_cbpending(lock) && - !(data->lmd_flags & LDLM_FL_CBPENDING)) + !(data->lmd_flags & LDLM_FL_CBPENDING) && + !(data->lmd_match & LDLM_MATCH_GROUP)) return false; - if (!(data->lmd_match & LDLM_MATCH_UNREF) && ldlm_is_cbpending(lock) && + if (!(data->lmd_match & (LDLM_MATCH_UNREF | LDLM_MATCH_GROUP)) && + ldlm_is_cbpending(lock) && !lock->l_readers && !lock->l_writers) return false; @@ -1136,7 +1139,12 @@ static bool lock_matches(struct ldlm_lock *lock, void *vdata) return false; matched: - if (data->lmd_flags & LDLM_FL_TEST_LOCK) { + /** + * In case the lock is a CBPENDING grouplock, just pin it and return, + * we need to wait until it gets to DESTROYED. + */ + if ((data->lmd_flags & LDLM_FL_TEST_LOCK) || + (ldlm_is_cbpending(lock) && (data->lmd_match & LDLM_MATCH_GROUP))) { LDLM_LOCK_GET(lock); ldlm_lock_touch_in_lru(lock); } else { @@ -1296,6 +1304,7 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, }; struct ldlm_resource *res; struct ldlm_lock *lock; + struct ldlm_lock *group_lock; int matched; if (!ns) { @@ -1314,6 +1323,8 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, return 0; } +repeat: + group_lock = NULL; LDLM_RESOURCE_ADDREF(res); lock_res(res); if (res->lr_type == LDLM_EXTENT) @@ -1323,8 +1334,19 @@ enum ldlm_mode ldlm_lock_match_with_skip(struct ldlm_namespace *ns, if (!lock && !(flags & LDLM_FL_BLOCK_GRANTED)) lock = search_queue(&res->lr_waiting, &data); matched = lock ? mode : 0; + + if (lock && ldlm_is_cbpending(lock) && + (data.lmd_match & LDLM_MATCH_GROUP)) + group_lock = lock; unlock_res(res); LDLM_RESOURCE_DELREF(res); + + if (group_lock) { + l_wait_event_abortable(group_lock->l_waitq, + ldlm_is_destroyed(lock)); + LDLM_LOCK_RELEASE(lock); + goto repeat; + } ldlm_resource_putref(res); if (lock) { diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 34a449e..dac829f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2522,15 +2522,30 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, if (ll_file_nolock(file)) return -EOPNOTSUPP; - read_lock(&lli->lli_lock); +retry: + if (file->f_flags & O_NONBLOCK) { + if (!mutex_trylock(&lli->lli_group_mutex)) + return -EAGAIN; + } else + mutex_lock(&lli->lli_group_mutex); + if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { CWARN("group lock already existed with gid %lu\n", fd->fd_grouplock.lg_gid); - read_unlock(&lli->lli_lock); - return -EINVAL; + rc = -EINVAL; + goto out; + } + if (arg != lli->lli_group_gid && lli->lli_group_users != 0) { + if (file->f_flags & O_NONBLOCK) { + rc = -EAGAIN; + goto out; + } + mutex_unlock(&lli->lli_group_mutex); + wait_var_event(&lli->lli_group_users, !lli->lli_group_users); + rc = 0; + goto retry; } LASSERT(!fd->fd_grouplock.lg_lock); - read_unlock(&lli->lli_lock); /** * XXX: group lock needs to protect all OST objects while PFL @@ -2549,8 +2564,10 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, u16 refcheck; env = cl_env_get(&refcheck); - if (IS_ERR(env)) - return PTR_ERR(env); + if (IS_ERR(env)) { + rc = PTR_ERR(env); + goto out; + } rc = cl_object_layout_get(env, obj, &cl); if (rc >= 0 && cl.cl_is_composite) @@ -2559,28 +2576,26 @@ static int ll_lov_setstripe(struct inode *inode, struct file *file, cl_env_put(env, &refcheck); if (rc < 0) - return rc; + goto out; } rc = cl_get_grouplock(ll_i2info(inode)->lli_clob, arg, (file->f_flags & O_NONBLOCK), &grouplock); - if (rc) - return rc; - write_lock(&lli->lli_lock); - if (fd->fd_flags & LL_FILE_GROUP_LOCKED) { - 
write_unlock(&lli->lli_lock); - CERROR("another thread just won the race\n"); - cl_put_grouplock(&grouplock); - return -EINVAL; - } + if (rc) + goto out; fd->fd_flags |= LL_FILE_GROUP_LOCKED; fd->fd_grouplock = grouplock; - write_unlock(&lli->lli_lock); + if (lli->lli_group_users == 0) + lli->lli_group_gid = grouplock.lg_gid; + lli->lli_group_users++; CDEBUG(D_INFO, "group lock %lu obtained\n", arg); - return 0; +out: + mutex_unlock(&lli->lli_group_mutex); + + return rc; } static int ll_put_grouplock(struct inode *inode, struct file *file, @@ -2589,31 +2604,40 @@ static int ll_put_grouplock(struct inode *inode, struct file *file, struct ll_inode_info *lli = ll_i2info(inode); struct ll_file_data *fd = file->private_data; struct ll_grouplock grouplock; + int rc; - write_lock(&lli->lli_lock); + mutex_lock(&lli->lli_group_mutex); if (!(fd->fd_flags & LL_FILE_GROUP_LOCKED)) { - write_unlock(&lli->lli_lock); CWARN("no group lock held\n"); - return -EINVAL; + rc = -EINVAL; + goto out; } - LASSERT(fd->fd_grouplock.lg_lock); if (fd->fd_grouplock.lg_gid != arg) { CWARN("group lock %lu doesn't match current id %lu\n", arg, fd->fd_grouplock.lg_gid); - write_unlock(&lli->lli_lock); - return -EINVAL; + rc = -EINVAL; + goto out; } grouplock = fd->fd_grouplock; memset(&fd->fd_grouplock, 0, sizeof(fd->fd_grouplock)); fd->fd_flags &= ~LL_FILE_GROUP_LOCKED; - write_unlock(&lli->lli_lock); cl_put_grouplock(&grouplock); + + lli->lli_group_users--; + if (lli->lli_group_users == 0) { + lli->lli_group_gid = 0; + wake_up_var(&lli->lli_group_users); + } CDEBUG(D_INFO, "group lock %lu released\n", arg); - return 0; + rc = 0; +out: + mutex_unlock(&lli->lli_group_mutex); + + return rc; } /** diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index d245dd8..998eed8 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -253,6 +253,9 @@ struct ll_inode_info { u64 lli_pcc_generation; enum pcc_dataset_flags lli_pcc_dsflags; struct pcc_inode *lli_pcc_inode; + struct mutex lli_group_mutex; + u64 lli_group_users; + unsigned long lli_group_gid; u64 lli_attr_valid; u64 lli_lazysize; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 3dc0030..176e61b5 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1194,6 +1194,9 @@ void ll_lli_init(struct ll_inode_info *lli) lli->lli_pcc_inode = NULL; lli->lli_pcc_dsflags = PCC_DATASET_INVALID; lli->lli_pcc_generation = 0; + mutex_init(&lli->lli_group_mutex); + lli->lli_group_users = 0; + lli->lli_group_gid = 0; } mutex_init(&lli->lli_layout_mutex); memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 978fee3..e0f5b45 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -330,7 +330,6 @@ static int mdc_dlm_canceling(const struct lu_env *env, */ if (obj) { struct cl_attr *attr = &osc_env_info(env)->oti_attr; - void *data; /* Destroy pages covered by the extent of the DLM lock */ result = mdc_lock_flush(env, cl2osc(obj), cl_index(obj, 0), @@ -340,17 +339,12 @@ static int mdc_dlm_canceling(const struct lu_env *env, */ /* losing a lock, update kms */ lock_res_and_lock(dlmlock); - data = dlmlock->l_ast_data; dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); attr->cat_kms = 0; cl_object_attr_update(env, obj, attr, CAT_KMS); cl_object_attr_unlock(obj); unlock_res_and_lock(dlmlock); - - /* Skip dec in case mdc_object_ast_clear() did it */ - if (data && dlmlock->l_req_mode == LCK_GROUP) - 
osc_grouplock_dec(cl2osc(obj), dlmlock); cl_object_put(env, obj); } return result; @@ -457,7 +451,7 @@ void mdc_lock_lvb_update(const struct lu_env *env, struct osc_object *osc, } static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh, int errcode) + struct lustre_handle *lockh) { struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; @@ -510,9 +504,6 @@ static void mdc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, LASSERT(oscl->ols_state != OLS_GRANTED); oscl->ols_state = OLS_GRANTED; - - if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP) - osc_grouplock_inc_locked(osc, dlmlock); } /** @@ -544,7 +535,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh, CDEBUG(D_INODE, "rc %d, err %d\n", rc, errcode); if (rc == 0) - mdc_lock_granted(env, oscl, lockh, errcode); + mdc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. */ if (oscl->ols_glimpse && rc == -ENAVAIL) { @@ -706,7 +697,8 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, struct ldlm_intent *lit; enum ldlm_mode mode; bool glimpse = *flags & LDLM_FL_HAS_INTENT; - u64 match_flags = *flags; + u64 search_flags = *flags; + u64 match_flags = 0; LIST_HEAD(cancels); int rc, count; int lvb_size; @@ -716,11 +708,14 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, if (einfo->ei_mode == LCK_PR) mode |= LCK_PW; - match_flags |= LDLM_FL_LVB_READY; + search_flags |= LDLM_FL_LVB_READY; if (glimpse) - match_flags |= LDLM_FL_BLOCK_GRANTED; - mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id, - einfo->ei_type, policy, mode, &lockh); + search_flags |= LDLM_FL_BLOCK_GRANTED; + if (mode == LCK_GROUP) + match_flags = LDLM_MATCH_GROUP; + mode = ldlm_lock_match_with_skip(obd->obd_namespace, search_flags, 0, + res_id, einfo->ei_type, policy, mode, + &lockh, match_flags); if (mode) { struct ldlm_lock *matched; @@ -833,9 +828,9 @@ int mdc_enqueue_send(const struct lu_env *env, struct obd_export *exp, * * This function does not wait for the network communication to complete. 
*/ -static int __mdc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) +static int mdc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) { struct osc_thread_info *info = osc_env_info(env); struct osc_io *oio = osc_env_io(env); @@ -921,28 +916,6 @@ static int __mdc_lock_enqueue(const struct lu_env *env, return result; } -static int mdc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) -{ - struct osc_object *obj = cl2osc(slice->cls_obj); - struct osc_lock *oscl = cl2osc_lock(slice); - struct lustre_handle lh = { 0 }; - int rc; - - if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) { - rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh); - if (rc < 0) - return rc; - } - - rc = __mdc_lock_enqueue(env, slice, unused, anchor); - - if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) - osc_grouplock_enqueue_fini(env, obj, oscl, &lh); - return rc; -} - static const struct cl_lock_operations mdc_lock_lockless_ops = { .clo_fini = osc_lock_fini, .clo_enqueue = mdc_lock_enqueue, @@ -1468,9 +1441,6 @@ static int mdc_object_ast_clear(struct ldlm_lock *lock, void *data) memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); - - if (lock->l_req_mode == LCK_GROUP) - osc_grouplock_dec(osc, lock); } return LDLM_ITER_CONTINUE; } diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index a3e72a6..3b22688 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -198,7 +198,7 @@ void osc_lock_lvb_update(const struct lu_env *env, } static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, - struct lustre_handle *lockh, int errcode) + struct lustre_handle *lockh) { struct osc_object *osc = cl2osc(oscl->ols_cl.cls_obj); struct ldlm_lock *dlmlock; @@ -254,126 +254,7 @@ static void osc_lock_granted(const struct lu_env *env, struct osc_lock *oscl, LASSERT(oscl->ols_state != OLS_GRANTED); oscl->ols_state = OLS_GRANTED; - - if (errcode != ELDLM_LOCK_MATCHED && dlmlock->l_req_mode == LCK_GROUP) - osc_grouplock_inc_locked(osc, dlmlock); -} - -void osc_grouplock_inc_locked(struct osc_object *osc, struct ldlm_lock *lock) -{ - LASSERT(lock->l_req_mode == LCK_GROUP); - - if (osc->oo_group_users == 0) - osc->oo_group_gid = lock->l_policy_data.l_extent.gid; - osc->oo_group_users++; - - LDLM_DEBUG(lock, "users %llu gid %llu\n", - osc->oo_group_users, - lock->l_policy_data.l_extent.gid); -} -EXPORT_SYMBOL(osc_grouplock_inc_locked); - -void osc_grouplock_dec(struct osc_object *osc, struct ldlm_lock *lock) -{ - LASSERT(lock->l_req_mode == LCK_GROUP); - - mutex_lock(&osc->oo_group_mutex); - - LASSERT(osc->oo_group_users > 0); - osc->oo_group_users--; - if (osc->oo_group_users == 0) { - osc->oo_group_gid = 0; - wake_up_all(&osc->oo_group_waitq); - } - mutex_unlock(&osc->oo_group_mutex); - - LDLM_DEBUG(lock, "users %llu gid %lu\n", - osc->oo_group_users, osc->oo_group_gid); } -EXPORT_SYMBOL(osc_grouplock_dec); - -int osc_grouplock_enqueue_init(const struct lu_env *env, - struct osc_object *obj, - struct osc_lock *oscl, - struct lustre_handle *lh) -{ - struct cl_lock_descr *need = &oscl->ols_cl.cls_lock->cll_descr; - int rc = 0; - - LASSERT(need->cld_mode == CLM_GROUP); - - while (true) { - bool check_gid = true; - - if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT) { - if 
(!mutex_trylock(&obj->oo_group_mutex)) - return -EAGAIN; - } else { - mutex_lock(&obj->oo_group_mutex); - } - - /** - * If a grouplock of the same gid already exists, match it - * here in advance. Otherwise, if that lock is being cancelled - * there is a chance to get 2 grouplocks for the same file. - */ - if (obj->oo_group_users && - obj->oo_group_gid == need->cld_gid) { - struct osc_thread_info *info = osc_env_info(env); - struct ldlm_res_id *resname = &info->oti_resname; - union ldlm_policy_data *policy = &info->oti_policy; - struct cl_lock *lock = oscl->ols_cl.cls_lock; - u64 flags = oscl->ols_flags | LDLM_FL_BLOCK_GRANTED; - struct ldlm_namespace *ns; - enum ldlm_mode mode; - - ns = osc_export(obj)->exp_obd->obd_namespace; - ostid_build_res_name(&obj->oo_oinfo->loi_oi, resname); - osc_lock_build_policy(env, lock, policy); - mode = ldlm_lock_match(ns, flags, resname, - oscl->ols_einfo.ei_type, policy, - oscl->ols_einfo.ei_mode, lh); - if (mode) - oscl->ols_flags |= LDLM_FL_MATCH_LOCK; - else - check_gid = false; - } - - /** - * If a grouplock exists but cannot be matched, let it to flush - * and wait just for zero users for now. - */ - if (obj->oo_group_users == 0 || - (check_gid && obj->oo_group_gid == need->cld_gid)) - break; - - mutex_unlock(&obj->oo_group_mutex); - if (oscl->ols_flags & LDLM_FL_BLOCK_NOWAIT) - return -EAGAIN; - - rc = l_wait_event_abortable(obj->oo_group_waitq, - !obj->oo_group_users); - if (rc) - return rc; - } - - return 0; -} -EXPORT_SYMBOL(osc_grouplock_enqueue_init); - -void osc_grouplock_enqueue_fini(const struct lu_env *env, - struct osc_object *obj, - struct osc_lock *oscl, - struct lustre_handle *lh) -{ - LASSERT(oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP); - - /* If a user was added on enqueue_init, decref it */ - if (lustre_handle_is_used(lh)) - ldlm_lock_decref(lh, oscl->ols_einfo.ei_mode); - mutex_unlock(&obj->oo_group_mutex); -} -EXPORT_SYMBOL(osc_grouplock_enqueue_fini); /** * Lock upcall function that is executed either when a reply to ENQUEUE rpc is @@ -403,7 +284,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh, } if (rc == 0) - osc_lock_granted(env, oscl, lockh, errcode); + osc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. */ if (oscl->ols_glimpse && rc == -ENAVAIL) { @@ -540,7 +421,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, struct ldlm_extent *extent = &dlmlock->l_policy_data.l_extent; struct cl_attr *attr = &osc_env_info(env)->oti_attr; u64 old_kms; - void *data; /* Destroy pages covered by the extent of the DLM lock */ result = osc_lock_flush(cl2osc(obj), @@ -553,7 +433,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, /* clearing l_ast_data after flushing data, * to let glimpse ast find the lock and the object */ - data = dlmlock->l_ast_data; dlmlock->l_ast_data = NULL; cl_object_attr_lock(obj); /* Must get the value under the lock to avoid race. */ @@ -567,9 +446,6 @@ static int __osc_dlm_blocking_ast(const struct lu_env *env, cl_object_attr_unlock(obj); unlock_res_and_lock(dlmlock); - /* Skip dec in case osc_object_ast_clear() did it */ - if (data && dlmlock->l_req_mode == LCK_GROUP) - osc_grouplock_dec(cl2osc(obj), dlmlock); cl_object_put(env, obj); } return result; @@ -1055,9 +931,9 @@ int osc_lock_enqueue_wait(const struct lu_env *env, struct osc_object *obj, * * This function does not wait for the network communication to complete. 
*/ -static int __osc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) +static int osc_lock_enqueue(const struct lu_env *env, + const struct cl_lock_slice *slice, + struct cl_io *unused, struct cl_sync_io *anchor) { struct osc_thread_info *info = osc_env_info(env); struct osc_io *oio = osc_env_io(env); @@ -1177,29 +1053,6 @@ static int __osc_lock_enqueue(const struct lu_env *env, return result; } -static int osc_lock_enqueue(const struct lu_env *env, - const struct cl_lock_slice *slice, - struct cl_io *unused, struct cl_sync_io *anchor) -{ - struct osc_object *obj = cl2osc(slice->cls_obj); - struct osc_lock *oscl = cl2osc_lock(slice); - struct lustre_handle lh = { 0 }; - int rc; - - if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) { - rc = osc_grouplock_enqueue_init(env, obj, oscl, &lh); - if (rc < 0) - return rc; - } - - rc = __osc_lock_enqueue(env, slice, unused, anchor); - - if (oscl->ols_cl.cls_lock->cll_descr.cld_mode == CLM_GROUP) - osc_grouplock_enqueue_fini(env, obj, oscl, &lh); - - return rc; -} - /** * Breaks a link between osc_lock and dlm_lock. */ diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index c3667a3..efb0533 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -74,10 +74,6 @@ int osc_object_init(const struct lu_env *env, struct lu_object *obj, atomic_set(&osc->oo_nr_ios, 0); init_waitqueue_head(&osc->oo_io_waitq); - init_waitqueue_head(&osc->oo_group_waitq); - mutex_init(&osc->oo_group_mutex); - osc->oo_group_users = 0; - osc->oo_group_gid = 0; osc->oo_root.rb_node = NULL; INIT_LIST_HEAD(&osc->oo_hp_exts); @@ -117,7 +113,6 @@ void osc_object_free(const struct lu_env *env, struct lu_object *obj) LASSERT(atomic_read(&osc->oo_nr_writes) == 0); LASSERT(list_empty(&osc->oo_ol_list)); LASSERT(!atomic_read(&osc->oo_nr_ios)); - LASSERT(!osc->oo_group_users); lu_object_fini(obj); /* osc doen't contain an lu_object_header, so we don't need call_rcu */ @@ -230,17 +225,6 @@ static int osc_object_ast_clear(struct ldlm_lock *lock, void *data) memcpy(lvb, &oinfo->loi_lvb, sizeof(oinfo->loi_lvb)); cl_object_attr_unlock(&osc->oo_cl); ldlm_clear_lvb_cached(lock); - - /** - * Object is being destroyed and gets unlinked from the lock, - * IO is finished and no cached data is left under the lock. As - * grouplock is immediately marked CBPENDING it is not reused. - * It will also be not possible to flush data later due to a - * NULL l_ast_data - enough conditions to let new grouplocks to - * be enqueued even if the lock still exists on client. - */ - if (lock->l_req_mode == LCK_GROUP) - osc_grouplock_dec(osc, lock); } return LDLM_ITER_CONTINUE; } diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index 7577fad..5a3f418 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -3009,7 +3009,8 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, struct lustre_handle lockh = { 0 }; struct ptlrpc_request *req = NULL; int intent = *flags & LDLM_FL_HAS_INTENT; - u64 match_flags = *flags; + u64 search_flags = *flags; + u64 match_flags = 0; enum ldlm_mode mode; int rc; @@ -3040,11 +3041,14 @@ int osc_enqueue_base(struct obd_export *exp, struct ldlm_res_id *res_id, * because they will not actually use the lock. 
*/ if (!speculative) - match_flags |= LDLM_FL_LVB_READY; + search_flags |= LDLM_FL_LVB_READY; if (intent != 0) - match_flags |= LDLM_FL_BLOCK_GRANTED; - mode = ldlm_lock_match(obd->obd_namespace, match_flags, res_id, - einfo->ei_type, policy, mode, &lockh); + search_flags |= LDLM_FL_BLOCK_GRANTED; + if (mode == LCK_GROUP) + match_flags = LDLM_MATCH_GROUP; + mode = ldlm_lock_match_with_skip(obd->obd_namespace, search_flags, 0, + res_id, einfo->ei_type, policy, mode, + &lockh, match_flags); if (mode) { struct ldlm_lock *matched; From patchwork Sun Nov 20 14:17:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050069 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A4283C433FE for ; Sun, 20 Nov 2022 14:39:57 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXmY3R2Yz1wZV; Sun, 20 Nov 2022 06:22:53 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXm33nbnz226T for ; Sun, 20 Nov 2022 06:22:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E761A1009355; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E460CE8B84; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:03 -0500 Message-Id: <1668953828-10909-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/22] lnet: Signal completion on ping send failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Call complete() on the ping_data::completion if we get LNET_EVENT_SEND with non-zero status. Otherwise the thread which issued the ping is stuck waiting for the full ping timeout. A pd_unlinked member is added to struct ping_data to indicate whether the associated MD has been unlinked. This is checked by lnet_ping() to determine whether it needs to explicitly called LNetMDUnlink(). Lastly, in cases where we do not receive a reply, we now return the value of pd.rc, if it is non-zero, rather than -EIO. This can provide more information about the underlying ping failure. 
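Condensed, the two sides of the change look roughly like the sketch below (illustrative only: the real event handler and lnet_ping() carry more state and error handling, and how the handler reaches its ping_data is shown schematically as a parameter):

/* event callback side */
static void ping_event_cb(struct lnet_event *event, struct ping_data *pd)
{
	if (event->type == LNET_EVENT_REPLY && event->status == 0) {
		pd->replied = 1;
		pd->rc = event->mlength;	/* bytes of ping info received */
	}
	if (event->unlinked)
		pd->pd_unlinked = 1;		/* MD gone, no further events */

	/* A failed SEND means no REPLY will ever arrive; wake the pinger
	 * now instead of letting it sleep for the whole ping timeout. */
	if (event->unlinked ||
	    (event->type == LNET_EVENT_SEND && event->status != 0))
		complete(&pd->completion);
}

/* waiter side, inside lnet_ping() */
	wait_for_completion_timeout(&pd.completion, timeout);
	if (!pd.pd_unlinked) {
		/* MD still linked (timeout or early send failure): unlink it
		 * explicitly and wait for the UNLINK event. */
		LNetMDUnlink(pd.mdh);
		wait_for_completion(&pd.completion);
	}
	if (!pd.replied)
		rc = pd.rc ?: -EIO;	/* pass on the send error if we saw one */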
HPE-bug-id: LUS-11317 WC-bug-id: https://jira.whamcloud.com/browse/LU-16290 Lustre-commit: 48c34c71de65e8a25 ("LU-16290 lnet: Signal completion on ping send failure") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49020 Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/api-ni.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 935c848..8b53adf 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -5333,6 +5333,7 @@ void LNetDebugPeer(struct lnet_processid *id) struct ping_data { int rc; int replied; + int pd_unlinked; struct lnet_handle_md mdh; struct completion completion; }; @@ -5353,7 +5354,12 @@ struct ping_data { pd->replied = 1; pd->rc = event->mlength; } + if (event->unlinked) + pd->pd_unlinked = 1; + + if (event->unlinked || + (event->type == LNET_EVENT_SEND && event->status)) complete(&pd->completion); } @@ -5424,13 +5430,14 @@ static int lnet_ping(struct lnet_process_id id4, struct lnet_nid *src_nid, /* NB must wait for the UNLINK event below... */ } - if (wait_for_completion_timeout(&pd.completion, timeout) == 0) { - /* Ensure completion in finite time... */ + /* Ensure completion in finite time... */ + wait_for_completion_timeout(&pd.completion, timeout); + if (!pd.pd_unlinked) { LNetMDUnlink(pd.mdh); wait_for_completion(&pd.completion); } if (!pd.replied) { - rc = -EIO; + rc = pd.rc ?: -EIO; goto fail_ping_buffer_decref; } From patchwork Sun Nov 20 14:17:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050070 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 94B2DC4332F for ; Sun, 20 Nov 2022 14:41:29 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXnT0zZDz1y6P; Sun, 20 Nov 2022 06:23:41 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXnC6qf3z2278 for ; Sun, 20 Nov 2022 06:23:27 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id EF2C11009356; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EB19EE8B8B; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:04 -0500 Message-Id: <1668953828-10909-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/22] lnet: extend lnet_is_nid_in_ping_info() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown lnet_is_nid_in_ping_info() now checks the ping_info for both nid4 and larger nids. WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: 56bcfbf22d91b96c3 ("LU-10391 lnet: extend lnet_is_nid_in_ping_info()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44629 Reviewed-by: Oleg Drokin Reviewed-by: Frank Sehr Reviewed-by: James Simmons Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 9 ++++++ net/lnet/lnet/peer.c | 71 +++++++++++++++++++++++++++++++++++++------ 2 files changed, 70 insertions(+), 10 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 13ce2bf..7ce6cff 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -886,6 +886,15 @@ static inline void lnet_ping_buffer_decref(struct lnet_ping_buffer *pbuf) } } +struct lnet_ping_iter { + struct lnet_ping_info *pinfo; + void *pos, *end; +}; + +u32 *ping_iter_first(struct lnet_ping_iter *pi, struct lnet_ping_buffer *pbuf, + struct lnet_nid *nid); +u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid); + static inline int lnet_push_target_resize_needed(void) { return the_lnet.ln_push_target->pb_nbytes < the_lnet.ln_push_target_nbytes; diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 35b135e..b33d6ac 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2875,6 +2875,56 @@ static void lnet_discovery_event_handler(struct lnet_event *event) lnet_net_unlock(LNET_LOCK_EX); } +u32 *ping_iter_first(struct lnet_ping_iter *pi, + struct lnet_ping_buffer *pbuf, + struct lnet_nid *nid) +{ + pi->pinfo = &pbuf->pb_info; + pi->pos = &pbuf->pb_info.pi_ni; + pi->end = (void *)pi->pinfo + + min_t(int, pbuf->pb_nbytes, + lnet_ping_info_size(pi->pinfo)); + /* lnet_ping_info_validiate ensures there will be one + * lnet_ni_status at the start + */ + if (nid) + lnet_nid4_to_nid(pbuf->pb_info.pi_ni[0].ns_nid, nid); + return &pbuf->pb_info.pi_ni[0].ns_status; +} + +u32 *ping_iter_next(struct lnet_ping_iter *pi, struct lnet_nid *nid) +{ + int off = offsetof(struct lnet_ping_info, pi_ni[pi->pinfo->pi_nnis]); + + if (pi->pos < ((void *)pi->pinfo + off)) { + struct lnet_ni_status *ns = pi->pos; + + pi->pos = ns + 1; + if (pi->pos > pi->end) + return NULL; + if (nid) + lnet_nid4_to_nid(ns->ns_nid, nid); + return &ns->ns_status; + } + + while (pi->pinfo->pi_features & LNET_PING_FEAT_LARGE_ADDR) { + struct lnet_ni_large_status *lns = pi->pos; + + if (pi->pos + 8 > pi->end) + /* Not safe to examine next */ + return NULL; + pi->pos = lnet_ping_sts_next(lns); + if (pi->pos > pi->end) + return NULL; + if (NID_BYTES(&lns->ns_nid) > sizeof(struct lnet_nid)) + continue; + if (nid) + *nid = lns->ns_nid; + return &lns->ns_status; + } + return NULL; +} + /* * Build a peer from incoming data. 
* @@ -3140,16 +3190,18 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, return 0; } -static bool lnet_is_nid_in_ping_info(lnet_nid_t nid, - struct lnet_ping_info *pinfo) +static bool lnet_is_nid_in_ping_info(struct lnet_nid *nid, + struct lnet_ping_buffer *pbuf) { - int i; - - for (i = 0; i < pinfo->pi_nnis; i++) { - if (pinfo->pi_ni[i].ns_nid == nid) + struct lnet_ping_iter pi; + struct lnet_nid pnid; + u32 *st; + + for (st = ping_iter_first(&pi, pbuf, &pnid); + st; + st = ping_iter_next(&pi, &pnid)) + if (nid_same(nid, &pnid)) return true; - } - return false; } @@ -3308,8 +3360,7 @@ static int lnet_peer_data_present(struct lnet_peer *lp) * recorded in that peer. */ } else if (nid_same(&lp->lp_primary_nid, &nid) || - (lnet_is_nid_in_ping_info(lnet_nid_to_nid4(&lp->lp_primary_nid), - &pbuf->pb_info) && + (lnet_is_nid_in_ping_info(&lp->lp_primary_nid, pbuf) && lnet_is_discovery_disabled(lp))) { rc = lnet_peer_merge_data(lp, pbuf); } else { From patchwork Sun Nov 20 14:17:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050071 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8972BC4332F for ; Sun, 20 Nov 2022 14:43:14 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXpX0b4fz22Pg; Sun, 20 Nov 2022 06:24:36 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXpL2trYz1yDb for ; Sun, 20 Nov 2022 06:24:26 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id F1A401009357; Sun, 20 Nov 2022 09:17:09 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id EE6CCE8B88; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:05 -0500 Message-Id: <1668953828-10909-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/22] lnet: find correct primary for peer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown If the peer has a large-address for the primary, it can now be found. 
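The ping_iter_first()/ping_iter_next() helpers added in the previous patch are what make this possible: they walk every NID advertised in a ping buffer, legacy lnet_ni_status and large-address lnet_ni_large_status entries alike. As a usage illustration, a hypothetical caller (not part of either patch) counting the NIs reported up might look like this:

static int ping_buffer_count_up(struct lnet_ping_buffer *pbuf)
{
	struct lnet_ping_iter pi;
	struct lnet_nid nid;
	u32 *st;
	int up = 0;

	for (st = ping_iter_first(&pi, pbuf, &nid);
	     st;
	     st = ping_iter_next(&pi, &nid)) {
		CDEBUG(D_NET, "ni %s status %u\n", libcfs_nidstr(&nid), *st);
		if (*st == LNET_NI_STATUS_UP)
			up++;
	}
	return up;
}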
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: 022b46d887603f703 ("LU-10391 lnet: find correct primary for peer") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44632 Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 41 ++++++++++++++++++++++++++++++++++------- 1 file changed, 34 insertions(+), 7 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index b33d6ac..a1305b6 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -2585,11 +2585,40 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) libcfs_nidstr(&lp->lp_primary_nid), ev->status); } +static bool find_primary(struct lnet_nid *nid, + struct lnet_ping_buffer *pbuf) +{ + struct lnet_ping_info *pi = &pbuf->pb_info; + struct lnet_ping_iter piter; + u32 *stp; + + if (pi->pi_features & LNET_PING_FEAT_PRIMARY_LARGE) { + /* First large nid is primary */ + for (stp = ping_iter_first(&piter, pbuf, nid); + stp; + stp = ping_iter_next(&piter, nid)) { + if (nid_is_nid4(nid)) + continue; + /* nid has already been copied in */ + return true; + } + /* no large nids ... weird ... ignore the flag + * and use first nid. + */ + } + /* pi_nids[1] is primary */ + if (pi->pi_nnis < 2) + return false; + lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, nid); + return true; +} + /* Handle a Reply message. This is the reply to a Ping message. */ static void lnet_discovery_event_reply(struct lnet_peer *lp, struct lnet_event *ev) { struct lnet_ping_buffer *pbuf; + struct lnet_nid primary; int infobytes; int rc; bool ping_feat_disc; @@ -2731,9 +2760,8 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) * available if the reply came from a Multi-Rail peer. */ if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL && - pbuf->pb_info.pi_nnis > 1 && - lnet_nid_to_nid4(&lp->lp_primary_nid) == - pbuf->pb_info.pi_ni[1].ns_nid) { + find_primary(&primary, pbuf) && + nid_same(&lp->lp_primary_nid, &primary)) { if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) CDEBUG(D_NET, "peer %s: seq# got %u have %u. peer rebooted?\n", @@ -3081,11 +3109,11 @@ static int lnet_peer_merge_data(struct lnet_peer *lp, * peer's lp_peer_nets list, and the peer NI for the primary NID should * be the first entry in its peer net's lpn_peer_nis list. */ - lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid); + find_primary(&nid, pbuf); lpni = lnet_peer_ni_find_locked(&nid); if (!lpni) { CERROR("Internal error: Failed to lookup peer NI for primary NID: %s\n", - libcfs_nid2str(pbuf->pb_info.pi_ni[1].ns_nid)); + libcfs_nidstr(&nid)); goto out; } @@ -3341,11 +3369,10 @@ static int lnet_peer_data_present(struct lnet_peer *lp) * primary NID to the correct value here. Moreover, this peer * can show up with only the loopback NID in the ping buffer. 
*/ - if (pbuf->pb_info.pi_nnis <= 1) { + if (!find_primary(&nid, pbuf)) { lnet_ping_buffer_decref(pbuf); goto out; } - lnet_nid4_to_nid(pbuf->pb_info.pi_ni[1].ns_nid, &nid); if (nid_is_lo0(&lp->lp_primary_nid)) { rc = lnet_peer_set_primary_nid(lp, &nid, flags); if (rc) From patchwork Sun Nov 20 14:17:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050072 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A2C85C433FE for ; Sun, 20 Nov 2022 14:43:33 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXq50jY1z22RT; Sun, 20 Nov 2022 06:25:05 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXpm2GCLz22RJ for ; Sun, 20 Nov 2022 06:24:48 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 039611009358; Sun, 20 Nov 2022 09:17:10 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F35A9E8B84; Sun, 20 Nov 2022 09:17:09 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:06 -0500 Message-Id: <1668953828-10909-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 20/22] lnet: change lnet_notify() to take struct lnet_nid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown lnet_notify() now takes a 'struct lnet_nid *' instead of a lnet_nid_t. 
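For LND callers that still track peers by the legacy 64-bit NID, the change is mechanical: convert on the stack with lnet_nid4_to_nid() before calling, as the o2iblnd and socklnd hunks below do. A minimal sketch of that calling pattern (hypothetical caller):

static void peer_reported_down(struct lnet_ni *ni, lnet_nid_t peer_nid4,
			       time64_t last_alive)
{
	struct lnet_nid nid;

	/* widen the legacy 64-bit NID to the variable-size form */
	lnet_nid4_to_nid(peer_nid4, &nid);
	/* alive = false, reset = false, last seen at last_alive */
	lnet_notify(ni, &nid, false, false, last_alive);
}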
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: 4a88236f40a47c05d ("LU-10391 lnet: change lnet_notify() to take struct lnet_nid") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44633 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 4 ++-- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 10 +++++++--- net/lnet/klnds/socklnd/socklnd.c | 2 +- net/lnet/lnet/api-ni.c | 3 ++- net/lnet/lnet/router.c | 15 +++++++-------- 5 files changed, 19 insertions(+), 15 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 7ce6cff..3bcea11 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -574,8 +574,8 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid, void lnet_mt_event_handler(struct lnet_event *event); -int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, bool alive, bool reset, - time64_t when); +int lnet_notify(struct lnet_ni *ni, struct lnet_nid *peer, bool alive, + bool reset, time64_t when); void lnet_notify_locked(struct lnet_peer_ni *lp, int notifylnd, int alive, time64_t when); int lnet_add_route(u32 net, u32 hops, struct lnet_nid *gateway, diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index d4de326..451363b 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1967,9 +1967,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - if (error) - lnet_notify(peer_ni->ibp_ni, - peer_ni->ibp_nid, false, false, last_alive); + if (error != 0) { + struct lnet_nid nid; + + lnet_nid4_to_nid(peer_ni->ibp_nid, &nid); + lnet_notify(peer_ni->ibp_ni, &nid, + false, false, last_alive); + } } void diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index 21fccfa..d8d1071 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -1424,7 +1424,7 @@ struct ksock_peer_ni * if (notify) lnet_notify(peer_ni->ksnp_ni, - lnet_nid_to_nid4(&peer_ni->ksnp_id.nid), + &peer_ni->ksnp_id.nid, false, false, last_alive); } diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 8b53adf..5be2aff 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -4372,7 +4372,8 @@ u32 lnet_get_dlc_seq_locked(void) * that deadline to the wall clock. */ deadline += ktime_get_seconds(); - return lnet_notify(NULL, data->ioc_nid, data->ioc_flags, false, + lnet_nid4_to_nid(data->ioc_nid, &nid); + return lnet_notify(NULL, &nid, data->ioc_flags, false, deadline); } diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c index ee4f1d8..358c3f1 100644 --- a/net/lnet/lnet/router.c +++ b/net/lnet/lnet/router.c @@ -1672,26 +1672,25 @@ bool lnet_router_checker_active(void) * when: notificaiton time. */ int -lnet_notify(struct lnet_ni *ni, lnet_nid_t nid4, bool alive, bool reset, +lnet_notify(struct lnet_ni *ni, struct lnet_nid *nid, bool alive, bool reset, time64_t when) { struct lnet_peer_ni *lpni = NULL; struct lnet_route *route; struct lnet_peer *lp; time64_t now = ktime_get_seconds(); - struct lnet_nid nid; int cpt; LASSERT(!in_interrupt()); CDEBUG(D_NET, "%s notifying %s: %s\n", !ni ? "userspace" : libcfs_nidstr(&ni->ni_nid), - libcfs_nidstr(&nid), alive ? "up" : "down"); + libcfs_nidstr(nid), alive ? 
"up" : "down"); if (ni && - LNET_NID_NET(&ni->ni_nid) != LNET_NID_NET(&nid)) { + LNET_NID_NET(&ni->ni_nid) != LNET_NID_NET(nid)) { CWARN("Ignoring notification of %s %s by %s (different net)\n", - libcfs_nidstr(&nid), alive ? "birth" : "death", + libcfs_nidstr(nid), alive ? "birth" : "death", libcfs_nidstr(&ni->ni_nid)); return -EINVAL; } @@ -1700,7 +1699,7 @@ bool lnet_router_checker_active(void) if (when > now) { CWARN("Ignoring prediction from %s of %s %s %lld seconds in the future\n", ni ? libcfs_nidstr(&ni->ni_nid) : "userspace", - libcfs_nidstr(&nid), alive ? "up" : "down", when - now); + libcfs_nidstr(nid), alive ? "up" : "down", when - now); return -EINVAL; } @@ -1718,11 +1717,11 @@ bool lnet_router_checker_active(void) return -ESHUTDOWN; } - lpni = lnet_peer_ni_find_locked(&nid); + lpni = lnet_peer_ni_find_locked(nid); if (!lpni) { /* nid not found */ lnet_net_unlock(0); - CDEBUG(D_NET, "%s not found\n", libcfs_nidstr(&nid)); + CDEBUG(D_NET, "%s not found\n", libcfs_nidstr(nid)); return 0; } From patchwork Sun Nov 20 14:17:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id BDEB1C433FE for ; Sun, 20 Nov 2022 14:45:20 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXvd4xgJz22Xp; Sun, 20 Nov 2022 06:29:01 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXqd4Z0Hz22SJ for ; Sun, 20 Nov 2022 06:25:33 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 064A91009359; Sun, 20 Nov 2022 09:17:10 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 032F5E8B89; Sun, 20 Nov 2022 09:17:10 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:07 -0500 Message-Id: <1668953828-10909-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/22] lnet: discard lnet_nid2ni_*() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown These 'struct lnet_ni' lookup functions which take a nid4, are discarded in favour of the versions which take a 'struct lnet_nid'. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: cbfbe6d132c6d0fe5 ("LU-10391 lnet: discard lnet_nid2ni_*()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44634 Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 -- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 9 +++++---- net/lnet/lnet/api-ni.c | 33 +++------------------------------ 3 files changed, 8 insertions(+), 36 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index 3bcea11..a2d5adc 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -542,9 +542,7 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid, int lnet_cpt_of_nid_locked(struct lnet_nid *nid, struct lnet_ni *ni); int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni); int lnet_nid2cpt(struct lnet_nid *nid, struct lnet_ni *ni); -struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt); struct lnet_ni *lnet_nid_to_ni_locked(struct lnet_nid *nid, int cpt); -struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid); struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt); struct lnet_ni *lnet_net2ni_addref(u32 net); struct lnet_ni *lnet_nid_to_ni_addref(struct lnet_nid *nid); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 451363b..6fc1730 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -2397,8 +2397,9 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, struct kib_peer_ni *peer_ni; struct kib_peer_ni *peer2; struct kib_conn *conn; - struct lnet_ni *ni = NULL; + struct lnet_ni *ni = NULL; struct kib_net *net = NULL; + struct lnet_nid destnid; lnet_nid_t nid; struct rdma_conn_param cp; struct kib_rej rej; @@ -2461,7 +2462,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } nid = reqmsg->ibm_srcnid; - ni = lnet_nid2ni_addref(reqmsg->ibm_dstnid); + lnet_nid4_to_nid(reqmsg->ibm_dstnid, &destnid); + ni = lnet_nid_to_ni_addref(&destnid); if (ni) { net = (struct kib_net *)ni->ni_data; @@ -2469,8 +2471,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, } if (!ni || /* no matching net */ - lnet_nid_to_nid4(&ni->ni_nid) != - reqmsg->ibm_dstnid || /* right NET, wrong NID! */ + !nid_same(&ni->ni_nid, &destnid) || /* right NET, wrong NID! 
*/ net->ibn_dev != ibdev) { /* wrong device */ CERROR("Can't accept conn from %s on %s (%s:%d:%pI4h): bad dst nid %s\n", libcfs_nid2str(nid), diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 5be2aff..0146509 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1654,33 +1654,6 @@ struct lnet_ni * return NULL; } -struct lnet_ni * -lnet_nid2ni_locked(lnet_nid_t nid4, int cpt) -{ - struct lnet_nid nid; - - lnet_nid4_to_nid(nid4, &nid); - return lnet_nid_to_ni_locked(&nid, cpt); -} - -struct lnet_ni * -lnet_nid2ni_addref(lnet_nid_t nid4) -{ - struct lnet_ni *ni; - struct lnet_nid nid; - - lnet_nid4_to_nid(nid4, &nid); - - lnet_net_lock(0); - ni = lnet_nid_to_ni_locked(&nid, 0); - if (ni) - lnet_ni_addref_locked(ni, 0); - lnet_net_unlock(0); - - return ni; -} -EXPORT_SYMBOL(lnet_nid2ni_addref); - struct lnet_ni * lnet_nid_to_ni_addref(struct lnet_nid *nid) { @@ -3918,11 +3891,11 @@ u32 lnet_get_dlc_seq_locked(void) { int cpt, rc = 0; struct lnet_ni *ni; - lnet_nid_t nid = stats->hlni_nid; + struct lnet_nid nid; + lnet_nid4_to_nid(stats->hlni_nid, &nid); cpt = lnet_net_lock_current(); - ni = lnet_nid2ni_locked(nid, cpt); - + ni = lnet_nid_to_ni_locked(&nid, cpt); if (!ni) { rc = -ENOENT; goto unlock; From patchwork Sun Nov 20 14:17:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13050074 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C77A8C4332F for ; Sun, 20 Nov 2022 14:47:13 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4NFXwM0Wczz22YL; Sun, 20 Nov 2022 06:29:39 -0800 (PST) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4NFXsC6cw7z22Vd for ; Sun, 20 Nov 2022 06:26:55 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 09182100935B; Sun, 20 Nov 2022 09:17:10 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0707FE8B8B; Sun, 20 Nov 2022 09:17:10 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 20 Nov 2022 09:17:08 -0500 Message-Id: <1668953828-10909-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> References: <1668953828-10909-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 22/22] lnet: change lnet_debug_peer() to struct lnet_nid X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown lnet_debug_peer() now takes 'struct lnet_nid *'. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: e834ad5992adef598 ("LU-10391 lnet: change lnet_debug_peer() to struct lnet_nid") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44635 Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Reviewed-by: James Simmons Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 2 +- net/lnet/lnet/api-ni.c | 2 +- net/lnet/lnet/peer.c | 10 ++++------ 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index a2d5adc..ba68d50 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -936,7 +936,7 @@ void lnet_peer_primary_nid_locked(struct lnet_nid *nid, void lnet_peer_tables_cleanup(struct lnet_net *net); void lnet_peer_uninit(void); int lnet_peer_tables_create(void); -void lnet_debug_peer(lnet_nid_t nid); +void lnet_debug_peer(struct lnet_nid *nid); struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id); bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 0146509..e400de7 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -5257,7 +5257,7 @@ static int lnet_net_cmd(struct sk_buff *skb, struct genl_info *info) void LNetDebugPeer(struct lnet_processid *id) { - lnet_debug_peer(lnet_nid_to_nid4(&id->nid)); + lnet_debug_peer(&id->nid); } EXPORT_SYMBOL(LNetDebugPeer); diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index a1305b6..8c603c9 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -3966,21 +3966,19 @@ void lnet_peer_discovery_stop(void) /* Debugging */ void -lnet_debug_peer(lnet_nid_t nid4) +lnet_debug_peer(struct lnet_nid *nid) { char *aliveness = "NA"; struct lnet_peer_ni *lp; - struct lnet_nid nid; int cpt; - lnet_nid4_to_nid(nid4, &nid); - cpt = lnet_nid2cpt(&nid, NULL); + cpt = lnet_nid2cpt(nid, NULL); lnet_net_lock(cpt); - lp = lnet_peerni_by_nid_locked(&nid, NULL, cpt); + lp = lnet_peerni_by_nid_locked(nid, NULL, cpt); if (IS_ERR(lp)) { lnet_net_unlock(cpt); - CDEBUG(D_WARNING, "No peer %s\n", libcfs_nidstr(&nid)); + CDEBUG(D_WARNING, "No peer %s\n", libcfs_nidstr(nid)); return; }