From patchwork Mon Sep 17 17:30:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10603161 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 534AF157B for ; Mon, 17 Sep 2018 17:31:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 545482A2F2 for ; Mon, 17 Sep 2018 17:31:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 52E922A300; Mon, 17 Sep 2018 17:31:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 010332A2F2 for ; Mon, 17 Sep 2018 17:31:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B61F921FAF6; Mon, 17 Sep 2018 10:31:06 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7099221F7B3 for ; Mon, 17 Sep 2018 10:30:52 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id CB454498; Mon, 17 Sep 2018 13:30:46 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id C9B412B3; Mon, 17 Sep 2018 13:30:46 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 17 Sep 2018 13:30:24 -0400 Message-Id: <1537205440-6656-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537205440-6656-1-git-send-email-jsimmons@infradead.org> References: <1537205440-6656-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/30] lustre: ldlm: cond_resched in ldlm_bl_thread_main X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Patrick Farrell When clearing all of the ldlm LRUs (as Cray does at the end of a job), a ldlm_bl_work_item is generated for each namespace and then they are placed on a list for the ldlm_bl threads to iterate over. If the number of namespaces greatly exceeds the number of ldlm_bl threads, a given thread will iterate over many namespaces without sleeping looking for work. This can go on for an extremely long time and result in an RCU stall. This patch adds a cond_resched() between completing one work item and looking for the next. This is a fairly cheap operation, as it will only schedule if there is an interrupt waiting, and it will not be called too much - Even the largest file systems have < 100 namespaces per ldlm_bl_thread currently. Signed-off-by: Patrick Farrell WC-bug-id: https://jira.whamcloud.com/browse/LU-8307 Reviewed-on: https://review.whamcloud.com/20888 Reviewed-by: Ned Bass Reviewed-by: Ann Koehler Reviewed-by: Ben Evans Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c index adc96b6..a8de3d9 100644 --- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c @@ -856,6 +856,12 @@ static int ldlm_bl_thread_main(void *arg) if (rc == LDLM_ITER_STOP) break; + + /* If there are many namespaces, we will not sleep waiting for + * work, and must do a cond_resched to avoid holding the CPU + * for too long + */ + cond_resched(); } atomic_dec(&blp->blp_num_threads);