From: Waiman Long
To: "Darrick J. Wong", Ingo Molnar, Peter Zijlstra
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Dave Chinner, Waiman Long
Subject: [PATCH v2 1/3] sched/core: Export wake_q functions to kernel modules
Date: Sun, 26 Aug 2018 16:53:13 -0400
Message-Id: <1535316795-21560-2-git-send-email-longman@redhat.com>
In-Reply-To: <1535316795-21560-1-git-send-email-longman@redhat.com>

The wake_q_add() and wake_up_q() functions allow tasks to be queued for
wakeup while a lock is held and woken up only after the lock has been
released, which can help to reduce lock hold time. They should be
available to kernel modules as well.

A new task_in_wake_q() inline function is also added to check whether
the given task is currently in a wake_q.
Signed-off-by: Waiman Long
---
 include/linux/sched/wake_q.h | 5 +++++
 kernel/sched/core.c          | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 10b19a192b2d..902bf1228d32 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -47,6 +47,11 @@ static inline void wake_q_init(struct wake_q_head *head)
 	head->lastp = &head->first;
 }
 
+static inline bool task_in_wake_q(struct task_struct *task)
+{
+	return READ_ONCE(task->wake_q.next) != NULL;
+}
+
 extern void wake_q_add(struct wake_q_head *head,
 		       struct task_struct *task);
 extern void wake_up_q(struct wake_q_head *head);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 625bc9897f62..d90a2930b8ce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -420,6 +420,7 @@ void wake_q_add(struct wake_q_head *head, struct task_struct *task)
 	*head->lastp = node;
 	head->lastp = &node->next;
 }
+EXPORT_SYMBOL_GPL(wake_q_add);
 
 void wake_up_q(struct wake_q_head *head)
 {
@@ -442,6 +443,7 @@ void wake_up_q(struct wake_q_head *head)
 		put_task_struct(task);
 	}
 }
+EXPORT_SYMBOL_GPL(wake_up_q);
 
 /*
  * resched_curr - mark rq's current task 'to be rescheduled now'.
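To make the wake_q semantics this series relies on concrete, here is a minimal userspace C11 sketch of them. It is a model under stated assumptions, not the kernel implementation: `struct task`, `task_init()` and the `wakeups` counter are hypothetical stand-ins for `struct task_struct` and wake_up_process(), and the task reference counting done by the real wake_q_add()/wake_up_q() is omitted. What it does mirror is the behaviour the patches depend on: a task whose `wake_q.next` is non-NULL is already queued, so a second wake_q_add() is a no-op; wake_up_q() clears `next` before delivering the wakeup so the task can be queued again; and task_in_wake_q() is simply a `next != NULL` test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * End-of-list sentinel: NULL in a node's next pointer means "not queued",
 * any non-NULL value (including the tail marker) means "queued".
 */
#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

struct wake_q_node {
	_Atomic(struct wake_q_node *) next;
};

/* Hypothetical stand-in for struct task_struct. */
struct task {
	int wakeups;			/* counts simulated wakeups */
	struct wake_q_node wake_q;
};

struct wake_q_head {
	_Atomic(struct wake_q_node *) first;
	_Atomic(struct wake_q_node *) *lastp;
};

static void task_init(struct task *t)
{
	t->wakeups = 0;
	atomic_init(&t->wake_q.next, NULL);
}

static void wake_q_init(struct wake_q_head *head)
{
	atomic_init(&head->first, WAKE_Q_TAIL);
	head->lastp = &head->first;
}

/* Same test as the patch's task_in_wake_q(): queued iff next != NULL. */
static bool task_in_wake_q(struct task *t)
{
	return atomic_load(&t->wake_q.next) != NULL;
}

/* Queue t for a later wakeup unless some wake_q already holds it. */
static void wake_q_add(struct wake_q_head *head, struct task *t)
{
	struct wake_q_node *expected = NULL;

	/* Atomically claim the node: NULL -> WAKE_Q_TAIL. */
	if (!atomic_compare_exchange_strong(&t->wake_q.next, &expected,
					    WAKE_Q_TAIL))
		return;		/* already queued; that wakeup covers us */

	atomic_store(head->lastp, &t->wake_q);	/* link after current tail */
	head->lastp = &t->wake_q.next;
}

/* Deliver all queued wakeups; meant to run after the lock is dropped. */
static void wake_up_q(struct wake_q_head *head)
{
	struct wake_q_node *node = atomic_load(&head->first);

	while (node != WAKE_Q_TAIL) {
		struct task *t;

		assert(node != NULL);	/* a queued list never contains NULL */
		t = container_of(node, struct task, wake_q);
		node = atomic_load(&node->next);
		/* Clear next first so the task can be queued again. */
		atomic_store(&t->wake_q.next, NULL);
		t->wakeups++;		/* stand-in for wake_up_process(t) */
	}
}
```

In the pattern used by patch 3, wake_q_add() is called while the spinlock is held and wake_up_q() only after it has been dropped. In this model wake_up_q() does not reset the head, so a wake_q_head has to be re-initialised with wake_q_init() before it is reused, which is what the `wake_q_init(&wakeq)` calls in patch 3 do.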
From: Waiman Long
To: "Darrick J. Wong", Ingo Molnar, Peter Zijlstra
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Dave Chinner, Waiman Long
Subject: [PATCH v2 2/3] xfs: Prevent multiple wakeups of the same log space waiter
Date: Sun, 26 Aug 2018 16:53:14 -0400
Message-Id: <1535316795-21560-3-git-send-email-longman@redhat.com>
In-Reply-To: <1535316795-21560-1-git-send-email-longman@redhat.com>

The current log space reservation code allows the same sleeping waiter
to be woken up multiple times. This is just a waste of CPU time, and it
also increases the spinlock hold time. So a new XLOG_TIC_WAKING flag is
added to track whether a task is already being woken up, and the
wake_up_process() call is skipped if the flag is set.

With the AIM7 fserver workload running on a 2-socket 24-core 48-thread
Broadwell system with a small xfs filesystem on ramfs, performance
increased from 91,486 jobs/min to 192,666 jobs/min with this change.

Signed-off-by: Waiman Long
---
 fs/xfs/xfs_log.c      | 9 +++++++++
 fs/xfs/xfs_log_priv.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index c3b610b687d1..ac1dc8db7112 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -232,8 +232,16 @@ xlog_grant_head_wake(
 			return false;
 
 		*free_bytes -= need_bytes;
+
+		/*
+		 * Skip task that is being waken up already.
+		 */
+		if (tic->t_flags & XLOG_TIC_WAKING)
+			continue;
+
 		trace_xfs_log_grant_wake_up(log, tic);
 		wake_up_process(tic->t_task);
+		tic->t_flags |= XLOG_TIC_WAKING;
 	}
 
 	return true;
@@ -264,6 +272,7 @@ xlog_grant_head_wait(
 		trace_xfs_log_grant_wake(log, tic);
 
 		spin_lock(&head->lock);
+		tic->t_flags &= ~XLOG_TIC_WAKING;
 		if (XLOG_FORCED_SHUTDOWN(log))
 			goto shutdown;
 	} while (xlog_space_left(log, &head->grant) < need_bytes);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b5f82cb36202..738df09bf352 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -59,6 +59,7 @@ static inline uint xlog_get_client_id(__be32 i)
  */
 #define XLOG_TIC_INITED		0x1	/* has been initialized */
 #define XLOG_TIC_PERM_RESERV	0x2	/* permanent reservation */
+#define XLOG_TIC_WAKING		0x4	/* task is being waken up */
 
 #define XLOG_TIC_FLAGS \
 	{ XLOG_TIC_INITED,	"XLOG_TIC_INITED" }, \

From: Waiman Long
To: "Darrick J. Wong", Ingo Molnar, Peter Zijlstra
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Dave Chinner, Waiman Long
Subject: [PATCH v2 3/3] xfs: Use wake_q for waking up log space waiters
Date: Sun, 26 Aug 2018 16:53:15 -0400
Message-Id: <1535316795-21560-4-git-send-email-longman@redhat.com>
In-Reply-To: <1535316795-21560-1-git-send-email-longman@redhat.com>

In the current log space reservation slowpath code, the log space
waiters are woken up by an incoming waiter while it holds the lock.
As waking up a task can be time consuming, doing it while holding the
lock can make spinlock contention, if present, more severe.

This patch changes the slowpath code to use a wake_q so that tasks are
woken up after the lock has been released, thus improving performance
and reducing spinlock contention.

With the AIM7 fserver workload running on a 2-socket 24-core 48-thread
Broadwell system with a small xfs filesystem on ramfs, performance
increased from 192,666 jobs/min to 285,221 jobs/min with this change.

Signed-off-by: Waiman Long
---
 fs/xfs/xfs_linux.h |  1 +
 fs/xfs/xfs_log.c   | 50 ++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index edbd5a210df2..1548a353da1e 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -60,6 +60,7 @@ typedef __u32			xfs_nlink_t;
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index ac1dc8db7112..70d5f85ff059 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -221,7 +221,8 @@ STATIC bool
 xlog_grant_head_wake(
 	struct xlog		*log,
 	struct xlog_grant_head	*head,
-	int			*free_bytes)
+	int			*free_bytes,
+	struct wake_q_head	*wakeq)
 {
 	struct xlog_ticket	*tic;
 	int			need_bytes;
@@ -240,7 +241,7 @@ xlog_grant_head_wake(
 			continue;
 
 		trace_xfs_log_grant_wake_up(log, tic);
-		wake_up_process(tic->t_task);
+		wake_q_add(wakeq, tic->t_task);
 		tic->t_flags |= XLOG_TIC_WAKING;
 	}
 
@@ -252,8 +253,9 @@ xlog_grant_head_wait(
 	struct xlog		*log,
 	struct xlog_grant_head	*head,
 	struct xlog_ticket	*tic,
-	int			need_bytes) __releases(&head->lock)
-					    __acquires(&head->lock)
+	int			need_bytes,
+	struct wake_q_head	*wakeq) __releases(&head->lock)
+					__acquires(&head->lock)
 {
 	list_add_tail(&tic->t_queue, &head->waiters);
 
@@ -265,6 +267,11 @@ xlog_grant_head_wait(
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		spin_unlock(&head->lock);
 
+		if (wakeq) {
+			wake_up_q(wakeq);
+			wakeq = NULL;
+		}
+
 		XFS_STATS_INC(log->l_mp, xs_sleep_logspace);
 
 		trace_xfs_log_grant_sleep(log, tic);
@@ -272,7 +279,21 @@ xlog_grant_head_wait(
 		trace_xfs_log_grant_wake(log, tic);
 
 		spin_lock(&head->lock);
-		tic->t_flags &= ~XLOG_TIC_WAKING;
+		/*
+		 * The XLOG_TIC_WAKING flag should be set. However, it is
+		 * very unlikely that the current task is still in the
+		 * wake_q. If that happens (maybe anonymous wakeup), we
+		 * have to wait until the task is dequeued before proceeding
+		 * to avoid the possibility of having the task put into
+		 * another wake_q simultaneously.
+		 */
+		if (tic->t_flags & XLOG_TIC_WAKING) {
+			while (task_in_wake_q(current))
+				cpu_relax();
+
+			tic->t_flags &= ~XLOG_TIC_WAKING;
+		}
+
 		if (XLOG_FORCED_SHUTDOWN(log))
 			goto shutdown;
 	} while (xlog_space_left(log, &head->grant) < need_bytes);
@@ -310,6 +331,7 @@ xlog_grant_head_check(
 {
 	int			free_bytes;
 	int			error = 0;
+	DEFINE_WAKE_Q(wakeq);
 
 	ASSERT(!(log->l_flags & XLOG_ACTIVE_RECOVERY));
 
@@ -323,15 +345,17 @@ xlog_grant_head_check(
 	free_bytes = xlog_space_left(log, &head->grant);
 	if (!list_empty_careful(&head->waiters)) {
 		spin_lock(&head->lock);
-		if (!xlog_grant_head_wake(log, head, &free_bytes) ||
+		if (!xlog_grant_head_wake(log, head, &free_bytes, &wakeq) ||
 		    free_bytes < *need_bytes) {
 			error = xlog_grant_head_wait(log, head, tic,
-						     *need_bytes);
+						     *need_bytes, &wakeq);
+			wake_q_init(&wakeq); /* Set wake_q to empty */
 		}
 		spin_unlock(&head->lock);
+		wake_up_q(&wakeq);
 	} else if (free_bytes < *need_bytes) {
 		spin_lock(&head->lock);
-		error = xlog_grant_head_wait(log, head, tic, *need_bytes);
+		error = xlog_grant_head_wait(log, head, tic, *need_bytes, NULL);
 		spin_unlock(&head->lock);
 	}
 
@@ -1077,6 +1101,7 @@ xfs_log_space_wake(
 {
 	struct xlog		*log = mp->m_log;
 	int			free_bytes;
+	DEFINE_WAKE_Q(wakeq);
 
 	if (XLOG_FORCED_SHUTDOWN(log))
 		return;
@@ -1086,8 +1111,11 @@ xfs_log_space_wake(
 
 		spin_lock(&log->l_write_head.lock);
 		free_bytes = xlog_space_left(log, &log->l_write_head.grant);
-		xlog_grant_head_wake(log, &log->l_write_head, &free_bytes);
+		xlog_grant_head_wake(log, &log->l_write_head, &free_bytes,
+				     &wakeq);
 		spin_unlock(&log->l_write_head.lock);
+		wake_up_q(&wakeq);
+		wake_q_init(&wakeq); /* Re-init wake_q to be reused again */
 	}
 
 	if (!list_empty_careful(&log->l_reserve_head.waiters)) {
@@ -1095,8 +1123,10 @@ xfs_log_space_wake(
 
 		spin_lock(&log->l_reserve_head.lock);
 		free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
-		xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes);
+		xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes,
+				     &wakeq);
 		spin_unlock(&log->l_reserve_head.lock);
+		wake_up_q(&wakeq);
 	}
 }
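The combined effect of patches 2 and 3 on the waker side can also be sketched in plain C. This is a hypothetical, single-threaded model: `struct ticket`, `struct pending`, `grant_head_wake()`, `deliver_wakeups()`, `TIC_WAKING` and `MAX_WAITERS` are illustrative names, not XFS code, and a fixed-size array stands in for the wake_q. Like xlog_grant_head_wake(), it walks the waiters in order and stops at the first one that does not fit; a waiter already flagged as waking still has its bytes subtracted but is not queued again, and the queued wakeups are delivered only later, at the point where the (omitted) lock would have been dropped. The sleeping side, including the task_in_wake_q() spin in xlog_grant_head_wait(), is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TIC_WAKING  0x4		/* same bit value as XLOG_TIC_WAKING */
#define MAX_WAITERS 8

/* Hypothetical stand-in for struct xlog_ticket. */
struct ticket {
	int need_bytes;
	unsigned int flags;
	int wakeups;		/* wakeups actually delivered to this waiter */
};

/* Hypothetical stand-in for a wake_q: wakeups to deliver after unlocking. */
struct pending {
	struct ticket *t[MAX_WAITERS];
	size_t n;
};

/*
 * Modelled on xlog_grant_head_wake() after patches 2 and 3: returns false
 * as soon as a waiter does not fit in *free_bytes.
 */
static bool grant_head_wake(struct ticket **waiters, size_t nwaiters,
			    int *free_bytes, struct pending *q)
{
	for (size_t i = 0; i < nwaiters; i++) {
		struct ticket *tic = waiters[i];

		if (*free_bytes < tic->need_bytes)
			return false;
		*free_bytes -= tic->need_bytes;

		/* Skip a waiter whose wakeup is already pending (patch 2). */
		if (tic->flags & TIC_WAKING)
			continue;

		/* Defer the wakeup instead of waking under the lock (patch 3). */
		assert(q->n < MAX_WAITERS);
		q->t[q->n++] = tic;
		tic->flags |= TIC_WAKING;
	}
	return true;
}

/* Deliver deferred wakeups, as wake_up_q() would after spin_unlock(). */
static void deliver_wakeups(struct pending *q)
{
	for (size_t i = 0; i < q->n; i++)
		q->t[i]->wakeups++;
	q->n = 0;
}
```

Before patch 2, a second wake pass arriving before the waiters had run would call wake_up_process() on them again. In this model the second pass queues nothing for a waiter until it clears its own flag, which corresponds to xlog_grant_head_wait() clearing XLOG_TIC_WAKING after it reacquires head->lock.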