From patchwork Wed Sep 26 21:51:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10616933
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix)
 with ESMTP id A8FE2913 for ; Wed, 26 Sep 2018 21:51:41 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 985652B851
 for ; Wed, 26 Sep 2018 21:51:41 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 8C3CF2B87C; Wed, 26 Sep 2018 21:51:41 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
 RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from ml01.01.org (ml01.01.org [198.145.21.10])
 (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 163272B851
 for ; Wed, 26 Sep 2018 21:51:41 +0000 (UTC)
Received: from [127.0.0.1] (localhost [IPv6:::1])
 by ml01.01.org (Postfix) with ESMTP id F360D21159800;
 Wed, 26 Sep 2018 14:51:40 -0700 (PDT)
X-Original-To: linux-nvdimm@lists.01.org
Delivered-To: linux-nvdimm@lists.01.org
Received-SPF: None (no SPF record) identity=mailfrom;
 client-ip=192.55.52.43; helo=mga05.intel.com;
 envelope-from=alexander.h.duyck@linux.intel.com;
 receiver=linux-nvdimm@lists.01.org
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 2638321157438
 for ; Wed, 26 Sep 2018 14:51:39 -0700 (PDT)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 26 Sep 2018 14:51:38 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,307,1534834800"; d="scan'208";a="94009086"
Received: from ahduyck-mobl.amr.corp.intel.com (HELO localhost.localdomain)
 ([10.7.198.154]) by orsmga001.jf.intel.com with ESMTP;
 26 Sep 2018 14:51:38 -0700
Subject: [RFC workqueue/driver-core PATCH 1/5] workqueue: Provide
 queue_work_near to queue work near a given NUMA node
From: Alexander Duyck
To: linux-nvdimm@lists.01.org, gregkh@linuxfoundation.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org
Date: Wed, 26 Sep 2018 14:51:38 -0700
Message-ID: <20180926215138.13512.33146.stgit@localhost.localdomain>
In-Reply-To: <20180926214433.13512.30289.stgit@localhost.localdomain>
References: <20180926214433.13512.30289.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
X-BeenThere: linux-nvdimm@lists.01.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Linux-nvdimm developer list."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: len.brown@intel.com, rafael@kernel.org, jiangshanlai@gmail.com,
 pavel@ucw.cz, zwisler@kernel.org
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
X-Virus-Scanned: ClamAV using ClamSMTP

This patch provides a new function queue_work_near which is meant to
schedule work on the nearest unbound CPU to the requested NUMA node. The
main motivation for this is to help asynchronous init improve boot times
for devices that are local to a specific node.

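As a rough illustration of the node fallback (a userspace sketch, not part
of this patch): when the requested node has no usable CPU, the closest node
that does have one is chosen. The distance table, the has_cpu flags, the
NO_NODE macro and the name pick_near_node are all assumptions made up for
this example; the kernel code works on cpumasks and node_distance() instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/*
 * Hypothetical userspace model of the fallback: dist is an n x n row-major
 * distance table and has_cpu[i] says whether node i has a usable CPU.
 * Returns the requested node if it has CPUs, else the closest node that
 * does, else NO_NODE.
 */
int pick_near_node(int node, int n, const int *dist, const bool *has_cpu)
{
	int best = NO_NODE, min_val = INT_MAX;

	assert(n > 0 && dist && has_cpu);

	/* an out-of-range node means "no preference" */
	if (node < 0 || node >= n)
		return NO_NODE;

	if (has_cpu[node])
		return node;

	for (int i = 0; i < n; i++) {
		int val = dist[node * n + i];

		if (i == node || !has_cpu[i])
			continue;
		/* a strictly larger distance never wins; ties go to the later node */
		if (min_val < val)
			continue;
		best = i;
		min_val = val;
	}

	return best;
}
```

A caller that gets NO_NODE back would simply queue the work without any
node preference, which mirrors the WORK_CPU_UNBOUND fallback in the patch.
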
Signed-off-by: Alexander Duyck
---
 include/linux/workqueue.h |    2 +
 kernel/workqueue.c        |  129 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 60d673e15632..1f9f0a65437b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -463,6 +463,8 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,

 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
+extern bool queue_work_near(int node, struct workqueue_struct *wq,
+			    struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0280deac392e..a971d3c4096e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include

 #include "workqueue_internal.h"

@@ -1332,8 +1333,9 @@ static bool is_chained_work(struct workqueue_struct *wq)
  * by wq_unbound_cpumask. Otherwise, round robin among the allowed ones to
  * avoid perturbing sensitive tasks.
  */
-static int wq_select_unbound_cpu(int cpu)
+static int wq_select_unbound_cpu(void)
 {
+	int cpu = raw_smp_processor_id();
 	static bool printed_dbg_warning;
 	int new_cpu;

@@ -1385,7 +1387,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 		return;
 retry:
 	if (req_cpu == WORK_CPU_UNBOUND)
-		cpu = wq_select_unbound_cpu(raw_smp_processor_id());
+		cpu = wq_select_unbound_cpu();

 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND))
@@ -1492,6 +1494,129 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL(queue_work_on);

+/**
+ * workqueue_select_unbound_cpu_near - Select an unbound CPU based on NUMA node
+ * @node: NUMA node ID that we want to bind a CPU from
+ *
+ * This function will attempt to find a "random" cpu available to the unbound
+ * workqueues on a given node. If there are no CPUs available on the given
+ * node it will return WORK_CPU_UNBOUND indicating that we should just
+ * schedule to any available CPU if we need to schedule this work.
+ */
+static int workqueue_select_unbound_cpu_near(int node)
+{
+	const struct cpumask *wq_cpumask, *node_cpumask;
+	int cpu;
+
+	/* No point in doing this if NUMA isn't enabled for workqueues */
+	if (!wq_numa_enabled)
+		return WORK_CPU_UNBOUND;
+
+	/* delay binding to CPU if node is not valid or online */
+	if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
+		return WORK_CPU_UNBOUND;
+
+	/* If wq_unbound_cpumask is empty then just use cpu_online_mask */
+	wq_cpumask = cpumask_empty(wq_unbound_cpumask) ? cpu_online_mask :
+							 wq_unbound_cpumask;
+
+	/*
+	 * If node has no CPUs, or no CPUs in the unbound cpumask then we
+	 * need to try and find the nearest node that does have CPUs in the
+	 * unbound cpumask.
+	 */
+	if (!nr_cpus_node(node) ||
+	    !cpumask_intersects(cpumask_of_node(node), wq_cpumask)) {
+		int min_val = INT_MAX, best_node = NUMA_NO_NODE;
+		int this_node, val;
+
+		for_each_online_node(this_node) {
+			if (this_node == node)
+				continue;
+
+			val = node_distance(node, this_node);
+			if (min_val < val)
+				continue;
+
+			if (!nr_cpus_node(this_node) ||
+			    !cpumask_intersects(cpumask_of_node(this_node),
+						wq_cpumask))
+				continue;
+
+			best_node = this_node;
+			min_val = val;
+		}
+
+		/* If we failed to find a close node just defer */
+		if (best_node == NUMA_NO_NODE)
+			return WORK_CPU_UNBOUND;
+
+		/* update node to reflect optimal value */
+		node = best_node;
+	}
+
+
+	/* Use local node/cpu if we are already there */
+	cpu = raw_smp_processor_id();
+	if (node == cpu_to_node(cpu) &&
+	    cpumask_test_cpu(cpu, wq_unbound_cpumask))
+		return cpu;
+
+	/*
+	 * Reuse the same value as wq_select_unbound_cpu above to prevent
+	 * us from mapping the same CPU each time. The impact to
+	 * wq_select_unbound_cpu should be minimal since the above function
+	 * only uses it when it has to load balance on remote CPUs similar
+	 * to what I am doing here.
+	 */
+	cpu = __this_cpu_read(wq_rr_cpu_last);
+	node_cpumask = cpumask_of_node(node);
+	cpu = cpumask_next_and(cpu, wq_cpumask, node_cpumask);
+	if (unlikely(cpu >= nr_cpu_ids)) {
+		cpu = cpumask_first_and(wq_cpumask, node_cpumask);
+		if (unlikely(cpu >= nr_cpu_ids))
+			return WORK_CPU_UNBOUND;
+	}
+	__this_cpu_write(wq_rr_cpu_last, cpu);
+
+	return cpu;
+}
+
+/**
+ * queue_work_near - queue work on the nearest unbound cpu to a given NUMA node
+ * @node: NUMA node that we are targeting the work for
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * We queue the work to a specific CPU based on a given NUMA node, the
+ * caller must ensure it can't go away.
+ *
+ * This function will only make a best effort attempt at getting this onto
+ * the right NUMA node. If no node is requested or the requested node is
+ * offline then we just fall back to standard queue_work behavior.
+ *
+ * Return: %false if @work was already on a queue, %true otherwise.
+ */
+bool queue_work_near(int node, struct workqueue_struct *wq,
+		     struct work_struct *work)
+{
+	unsigned long flags;
+	bool ret = false;
+
+	local_irq_save(flags);
+
+	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+		int cpu = workqueue_select_unbound_cpu_near(node);
+
+		__queue_work(cpu, wq, work);
+		ret = true;
+	}
+
+	local_irq_restore(flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(queue_work_near);
+
 void delayed_work_timer_fn(struct timer_list *t)
 {
 	struct delayed_work *dwork = from_timer(dwork, t, timer);

From patchwork Wed Sep 26 21:51:43 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10616937
Subject: [RFC workqueue/driver-core PATCH 2/5] async: Add support for
 queueing on specific NUMA node
From: Alexander Duyck
To: linux-nvdimm@lists.01.org, gregkh@linuxfoundation.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org
Date: Wed, 26 Sep 2018 14:51:43 -0700
Message-ID: <20180926215143.13512.56522.stgit@localhost.localdomain>
In-Reply-To: <20180926214433.13512.30289.stgit@localhost.localdomain>
References: <20180926214433.13512.30289.stgit@localhost.localdomain>
Cc: len.brown@intel.com, rafael@kernel.org, jiangshanlai@gmail.com,
 pavel@ucw.cz, zwisler@kernel.org

This patch introduces four new variants of the async_schedule_ functions
that allow scheduling on a specific NUMA node.

The first two functions are async_schedule_near and
async_schedule_near_domain which end up mapping to async_schedule and
async_schedule_domain but provide NUMA node specific functionality. They
replace the original functions which were moved to inline function
definitions that call the new functions while passing NUMA_NO_NODE.

The second two functions are async_schedule_dev and
async_schedule_dev_domain which provide NUMA specific functionality when
passing a device as the data member and that device has a NUMA node other
than NUMA_NO_NODE.

The main motivation behind this is to address the need to be able to
schedule device specific init work on specific NUMA nodes in order to
improve performance of memory initialization.

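The layering described above can be modelled in a few lines of userspace C
(an illustrative sketch, not part of this patch). Everything in it is
hypothetical: the struct device here only carries a node id, the callback
type is simplified to take a single pointer, and schedule_near,
schedule_any, schedule_dev and count_call are made-up names standing in for
the async_schedule_* family. The callback runs inline instead of being
queued on a workqueue.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses for "no node" */

/* simplified callback type; the kernel one also receives a cookie */
typedef void (*async_func_t)(void *data);

/* hypothetical device that only knows which node it sits on */
struct device {
	int numa_node;
};

/* counts how many callbacks have run so the example can be checked */
static int calls;

static void count_call(void *data)
{
	(void)data;
	calls++;
}

/* core entry point: every other variant funnels into this one */
static int schedule_near(async_func_t func, void *data, int node)
{
	func(data);	/* run inline; a real implementation queues the work */
	return node;	/* report which node was requested */
}

/* node-agnostic wrapper: same behaviour as before, no preferred node */
static inline int schedule_any(async_func_t func, void *data)
{
	return schedule_near(func, data, NUMA_NO_NODE);
}

/* device wrapper: the device is both the payload and the node hint */
static inline int schedule_dev(async_func_t func, struct device *dev)
{
	return schedule_near(func, dev, dev->numa_node);
}
```

Keeping a single core function means existing callers of the node-agnostic
wrapper need no changes, while device code gets locality by switching to
the device wrapper.
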
Signed-off-by: Alexander Duyck
---
 include/linux/async.h |   27 +++++++++++++++++++--
 kernel/async.c        |   62 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/include/linux/async.h b/include/linux/async.h
index 6b0226bdaadc..abf3ee9102df 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -14,6 +14,9 @@

 #include
 #include
+#include
+
+struct device;

 typedef u64 async_cookie_t;
 typedef void (*async_func_t) (void *data, async_cookie_t cookie);
@@ -37,9 +40,27 @@ struct async_domain {
 	struct async_domain _name = { .pending = LIST_HEAD_INIT(_name.pending), \
 				      .registered = 0 }

-extern async_cookie_t async_schedule(async_func_t func, void *data);
-extern async_cookie_t async_schedule_domain(async_func_t func, void *data,
-					    struct async_domain *domain);
+async_cookie_t async_schedule_near(async_func_t func, void *data,
+				   int node);
+async_cookie_t async_schedule_near_domain(async_func_t func, void *data,
+					  int node,
+					  struct async_domain *domain);
+
+static inline async_cookie_t async_schedule(async_func_t func, void *data)
+{
+	return async_schedule_near(func, data, NUMA_NO_NODE);
+}
+
+static inline async_cookie_t
+async_schedule_domain(async_func_t func, void *data,
+		      struct async_domain *domain)
+{
+	return async_schedule_near_domain(func, data, NUMA_NO_NODE, domain);
+}
+
+async_cookie_t async_schedule_dev(async_func_t func, struct device *dev);
+async_cookie_t async_schedule_dev_domain(async_func_t func, struct device *dev,
+					 struct async_domain *domain);
 void async_unregister_domain(struct async_domain *domain);
 extern void async_synchronize_full(void);
 extern void async_synchronize_full_domain(struct async_domain *domain);
diff --git a/kernel/async.c b/kernel/async.c
index a893d6170944..13fcf222b89a 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -56,6 +56,7 @@ synchronization with the async_synchronize_full() function, before returning
 #include
 #include
 #include
+#include
 #include

 #include "workqueue_internal.h"
@@ -149,7 +150,21 @@ static void async_run_entry_fn(struct work_struct *work)
 	wake_up(&async_done);
 }

-static async_cookie_t __async_schedule(async_func_t func, void *data, struct async_domain *domain)
+/**
+ * async_schedule_near_domain - schedule a function for asynchronous execution within a certain domain
+ * @func: function to execute asynchronously
+ * @data: data pointer to pass to the function
+ * @node: NUMA node that we want to schedule this on or close to
+ * @domain: the domain
+ *
+ * Returns an async_cookie_t that may be used for checkpointing later.
+ * @domain may be used in the async_synchronize_*_domain() functions to
+ * wait within a certain synchronization domain rather than globally. A
+ * synchronization domain is specified via @domain. Note: This function
+ * may be called from atomic or non-atomic contexts.
+ */
+async_cookie_t async_schedule_near_domain(async_func_t func, void *data,
+					  int node, struct async_domain *domain)
 {
 	struct async_entry *entry;
 	unsigned long flags;
@@ -195,43 +210,56 @@ static async_cookie_t __async_schedule(async_func_t func, void *data, struct asy
 	current->flags |= PF_USED_ASYNC;

 	/* schedule for execution */
-	queue_work(system_unbound_wq, &entry->work);
+	queue_work_near(node, system_unbound_wq, &entry->work);

 	return newcookie;
 }
+EXPORT_SYMBOL_GPL(async_schedule_near_domain);

 /**
- * async_schedule - schedule a function for asynchronous execution
+ * async_schedule_near - schedule a function for asynchronous execution
  * @func: function to execute asynchronously
  * @data: data pointer to pass to the function
+ * @node: NUMA node that we want to schedule this on or close to
  *
  * Returns an async_cookie_t that may be used for checkpointing later.
  * Note: This function may be called from atomic or non-atomic contexts.
  */
-async_cookie_t async_schedule(async_func_t func, void *data)
+async_cookie_t async_schedule_near(async_func_t func, void *data, int node)
 {
-	return __async_schedule(func, data, &async_dfl_domain);
+	return async_schedule_near_domain(func, data, node, &async_dfl_domain);
 }
-EXPORT_SYMBOL_GPL(async_schedule);
+EXPORT_SYMBOL_GPL(async_schedule_near);

 /**
- * async_schedule_domain - schedule a function for asynchronous execution within a certain domain
+ * async_schedule_dev_domain - schedule a function for asynchronous execution within a certain domain
  * @func: function to execute asynchronously
- * @data: data pointer to pass to the function
+ * @dev: device that we are scheduling this work for
  * @domain: the domain
  *
- * Returns an async_cookie_t that may be used for checkpointing later.
- * @domain may be used in the async_synchronize_*_domain() functions to
- * wait within a certain synchronization domain rather than globally. A
- * synchronization domain is specified via @domain. Note: This function
- * may be called from atomic or non-atomic contexts.
+ * Device specific version of async_schedule_near_domain that provides some
+ * NUMA awareness based on the device node.
+ */
+async_cookie_t async_schedule_dev_domain(async_func_t func, struct device *dev,
+					 struct async_domain *domain)
+{
+	return async_schedule_near_domain(func, dev, dev_to_node(dev), domain);
+}
+EXPORT_SYMBOL_GPL(async_schedule_dev_domain);
+
+/**
+ * async_schedule_dev - schedule a function for asynchronous execution
+ * @func: function to execute asynchronously
+ * @dev: device that we are scheduling this work for
+ *
+ * Device specific version of async_schedule_near that provides some NUMA
+ * awareness based on the device node.
  */
-async_cookie_t async_schedule_domain(async_func_t func, void *data,
-				     struct async_domain *domain)
+async_cookie_t async_schedule_dev(async_func_t func, struct device *dev)
 {
-	return __async_schedule(func, data, domain);
+	return async_schedule_dev_domain(func, dev, &async_dfl_domain);
 }
-EXPORT_SYMBOL_GPL(async_schedule_domain);
+EXPORT_SYMBOL_GPL(async_schedule_dev);

 /**
  * async_synchronize_full - synchronize all asynchronous function calls

From patchwork Wed Sep 26 21:51:49 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10616941
Subject: [RFC workqueue/driver-core PATCH 3/5] driver core: Probe devices
 asynchronously instead of the driver
From: Alexander Duyck
To: linux-nvdimm@lists.01.org, gregkh@linuxfoundation.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org
Date: Wed, 26 Sep 2018 14:51:49 -0700
Message-ID: <20180926215149.13512.51991.stgit@localhost.localdomain>
In-Reply-To: <20180926214433.13512.30289.stgit@localhost.localdomain>
References: <20180926214433.13512.30289.stgit@localhost.localdomain>
Cc: len.brown@intel.com, rafael@kernel.org, jiangshanlai@gmail.com,
 pavel@ucw.cz, zwisler@kernel.org

This change makes it so that we probe devices asynchronously instead of
the driver. This results in us seeing the same behavior if the device is
registered before the driver or after.
This way we can avoid serializing the initialization should the driver not
be loaded until after the devices have already been added.

The motivation behind this is that if we have a set of devices that take a
significant amount of time to load we can greatly reduce the time to load
by processing them in parallel instead of one at a time. In addition, each
device can exist on a different node so placing a single thread on one CPU
to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.

One issue I can see with this patch is that I am using the
dev_set/get_drvdata functions to store the driver in the device while I am
waiting on the asynchronous init to complete. For now I am protecting it by
using the lack of a dev->driver and the device lock.

Signed-off-by: Alexander Duyck
---
 drivers/base/bus.c |   23 +++--------------------
 drivers/base/dd.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 8bfd27ec73d6..2a17bed657ec 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -616,17 +616,6 @@ static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);

-static void driver_attach_async(void *_drv, async_cookie_t cookie)
-{
-	struct device_driver *drv = _drv;
-	int ret;
-
-	ret = driver_attach(drv);
-
-	pr_debug("bus: '%s': driver %s async attach completed: %d\n",
-		 drv->bus->name, drv->name, ret);
-}
-
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -659,15 +648,9 @@ int bus_add_driver(struct device_driver *drv)

 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		if (driver_allows_async_probing(drv)) {
-			pr_debug("bus: '%s': probing driver %s asynchronously\n",
-				drv->bus->name, drv->name);
-			async_schedule(driver_attach_async, drv);
-		} else {
-			error = driver_attach(drv);
-			if (error)
-				goto out_unregister;
-		}
+		error = driver_attach(drv);
+		if (error)
+			goto out_unregister;
 	}
 	module_add_driver(drv->owner, drv);

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 169412ee4ae8..5ba366c1cb83 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -864,6 +864,29 @@ void device_initial_probe(struct device *dev)
 	__device_attach(dev, true);
 }

+static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
+{
+	struct device *dev = _dev;
+
+	if (dev->parent && dev->bus->need_parent_lock)
+		device_lock(dev->parent);
+	device_lock(dev);
+
+	if (!dev->driver) {
+		struct device_driver *drv = dev_get_drvdata(dev);
+
+		driver_probe_device(drv, dev);
+	}
+
+	dev_dbg(dev, "async probe completed\n");
+
+	device_unlock(dev);
+	if (dev->parent && dev->bus->need_parent_lock)
+		device_unlock(dev->parent);
+
+	put_device(dev);
+}
+
 static int __driver_attach(struct device *dev, void *data)
 {
 	struct device_driver *drv = data;
@@ -891,6 +914,25 @@ static int __driver_attach(struct device *dev, void *data)
 		return ret;
 	} /* ret > 0 means positive match */

+	if (driver_allows_async_probing(drv)) {
+		/*
+		 * Instead of probing the device synchronously we will
+		 * probe it asynchronously to allow for more parallelism.
+		 *
+		 * We only take the device lock here in order to guarantee
+		 * that the dev->driver and driver_data fields are protected
+		 */
+		dev_dbg(dev, "scheduling asynchronous probe\n");
+		device_lock(dev);
+		if (!dev->driver) {
+			get_device(dev);
+			dev_set_drvdata(dev, drv);
+			async_schedule(__driver_attach_async_helper, dev);
+		}
+		device_unlock(dev);
+		return 0;
+	}
+
 	if (dev->parent && dev->bus->need_parent_lock)
 		device_lock(dev->parent);
 	device_lock(dev);

From patchwork Wed Sep 26 21:51:55 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10616945
Subject: [RFC workqueue/driver-core PATCH 4/5] driver core: Use new
 async_schedule_dev command
From: Alexander Duyck
To: linux-nvdimm@lists.01.org, gregkh@linuxfoundation.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org
Date: Wed, 26 Sep 2018 14:51:55 -0700
Message-ID: <20180926215155.13512.30900.stgit@localhost.localdomain>
In-Reply-To: <20180926214433.13512.30289.stgit@localhost.localdomain>
References: <20180926214433.13512.30289.stgit@localhost.localdomain>
Cc: len.brown@intel.com, rafael@kernel.org, jiangshanlai@gmail.com,
 pavel@ucw.cz, zwisler@kernel.org

This change makes it so that we use the device specific version of the
async_schedule commands to defer various tasks related to devices.
By doing this we should see a slight improvement in performance as any
device that is sensitive to latency/locality in the setup will now be
initializing on the node closest to the device.

Signed-off-by: Alexander Duyck
---
 drivers/base/dd.c         |    4 ++--
 drivers/base/power/main.c |   12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 5ba366c1cb83..81472dc44a70 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -826,7 +826,7 @@ static int __device_attach(struct device *dev, bool allow_async)
 			 */
 			dev_dbg(dev, "scheduling asynchronous probe\n");
 			get_device(dev);
-			async_schedule(__device_attach_async_helper, dev);
+			async_schedule_dev(__device_attach_async_helper, dev);
 		} else {
 			pm_request_idle(dev);
 		}
@@ -927,7 +927,7 @@ static int __driver_attach(struct device *dev, void *data)
 		if (!dev->driver) {
 			get_device(dev);
 			dev_set_drvdata(dev, drv);
-			async_schedule(__driver_attach_async_helper, dev);
+			async_schedule_dev(__driver_attach_async_helper, dev);
 		}
 		device_unlock(dev);
 		return 0;
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 3f68e2919dc5..8495d9b1e9d0 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -726,7 +726,7 @@ void dpm_noirq_resume_devices(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume_noirq, dev);
+			async_schedule_dev(async_resume_noirq, dev);
 		}
 	}
 
@@ -883,7 +883,7 @@ void dpm_resume_early(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume_early, dev);
+			async_schedule_dev(async_resume_early, dev);
 		}
 	}
 
@@ -1047,7 +1047,7 @@ void dpm_resume(pm_message_t state)
 		reinit_completion(&dev->power.completion);
 		if (is_async(dev)) {
 			get_device(dev);
-			async_schedule(async_resume, dev);
+			async_schedule_dev(async_resume, dev);
 		}
 	}
 
@@ -1366,7 +1366,7 @@ static int device_suspend_noirq(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend_noirq, dev);
+		async_schedule_dev(async_suspend_noirq, dev);
 		return 0;
 	}
 	return __device_suspend_noirq(dev, pm_transition, false);
@@ -1569,7 +1569,7 @@ static int device_suspend_late(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend_late, dev);
+		async_schedule_dev(async_suspend_late, dev);
 		return 0;
 	}
 
@@ -1830,7 +1830,7 @@ static int device_suspend(struct device *dev)
 
 	if (is_async(dev)) {
 		get_device(dev);
-		async_schedule(async_suspend, dev);
+		async_schedule_dev(async_suspend, dev);
 		return 0;
 	}
 

From patchwork Wed Sep 26 21:52:01 2018
X-Patchwork-Submitter: Alexander Duyck
X-Patchwork-Id: 10616949
Subject: [RFC workqueue/driver-core PATCH 5/5] nvdimm: Schedule device
 registration on node local to the device
From: Alexander Duyck
To: linux-nvdimm@lists.01.org, gregkh@linuxfoundation.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org
Date: Wed, 26 Sep 2018 14:52:01 -0700
Message-ID: <20180926215201.13512.55945.stgit@localhost.localdomain>
In-Reply-To: <20180926214433.13512.30289.stgit@localhost.localdomain>
References: <20180926214433.13512.30289.stgit@localhost.localdomain>
Cc: len.brown@intel.com, rafael@kernel.org, jiangshanlai@gmail.com,
 pavel@ucw.cz, zwisler@kernel.org

This patch is meant to force the device registration for nvdimm devices
to be closer to the actual device. This is achieved by using either the
NUMA node ID of the region, or of the parent. By doing this we can have
everything above the region based on the region, and everything below the
region based on the nvdimm bus.

By guaranteeing NUMA locality I see an improvement of as high as 25% for
per-node init of a system with 12TB of persistent memory.

Signed-off-by: Alexander Duyck
---
 drivers/nvdimm/bus.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 96154670bf07..f663d6ff524e 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -513,11 +514,15 @@ void __nd_device_register(struct device *dev)
 		set_dev_node(dev, to_nd_region(dev)->numa_node);
 
 	dev->bus = &nvdimm_bus_type;
-	if (dev->parent)
+	if (dev->parent) {
 		get_device(dev->parent);
+		if (dev_to_node(dev) == NUMA_NO_NODE)
+			set_dev_node(dev, dev_to_node(dev->parent));
+	}
 	get_device(dev);
-	async_schedule_domain(nd_async_device_register, dev,
-			&nd_async_domain);
+
+	async_schedule_dev_domain(nd_async_device_register, dev,
+				  &nd_async_domain);
 }
 
 void nd_device_register(struct device *dev)
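
For reviewers who want to reason about the node fallback in
__nd_device_register() without booting a kernel, it can be modelled in
userspace. This is only a rough sketch under stated assumptions, not the
kernel code: struct toy_device, toy_dev_to_node(), toy_set_dev_node() and
toy_pick_node() are hypothetical stand-ins for struct device,
dev_to_node() and set_dev_node(), and reference counting and the async
scheduling call are left out. Only NUMA_NO_NODE (-1) is meant to match the
kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/* Hypothetical stand-in for struct device: only what the sketch needs. */
struct toy_device {
	struct toy_device *parent;
	int numa_node;
};

/* Hypothetical analogue of dev_to_node(). */
static int toy_dev_to_node(const struct toy_device *dev)
{
	return dev->numa_node;
}

/* Hypothetical analogue of set_dev_node(). */
static void toy_set_dev_node(struct toy_device *dev, int node)
{
	dev->numa_node = node;
}

/*
 * Models the fallback from the hunk above: a device that has a parent
 * but no node of its own inherits the parent's node. Returns the node
 * the registration work would then be scheduled near.
 */
static int toy_pick_node(struct toy_device *dev)
{
	assert(dev != NULL);

	if (dev->parent && toy_dev_to_node(dev) == NUMA_NO_NODE)
		toy_set_dev_node(dev, toy_dev_to_node(dev->parent));

	return toy_dev_to_node(dev);
}
```

In this model a child with NUMA_NO_NODE under a region on node 3 ends up
on node 3, a device that already has a node keeps it, and a device with
neither a node nor a parent stays at NUMA_NO_NODE.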