Message ID: 154170044652.12967.17419321472770956712.stgit@ahduyck-desk1.jf.intel.com (mailing list archive)
State: Superseded
Series: Add NUMA aware async_schedule calls
On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> Force the device registration for nvdimm devices to be closer to the actual
> device. This is achieved by using either the NUMA node ID of the region, or
> of the parent. By doing this we can have everything above the region based
> on the region, and everything below the region based on the nvdimm bus.
>
> By guaranteeing NUMA locality I see an improvement of as high as 25% for
> per-node init of a system with 12TB of persistent memory.

It seems the speed-up is achieved with just patches 1, 2, and 9 from this
series, correct? I wouldn't want to hold up that benefit while the
driver-core bits are debated.

You can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...if the series needs to be kept together, but as far as I can see the
workqueue changes enable 2 sub-topics of development and it might make
sense for Tejun to take those first 2 and then Greg and I can base any
follow-up topics on that stable baseline.
On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
> On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
> >
> > Force the device registration for nvdimm devices to be closer to the actual
> > device. This is achieved by using either the NUMA node ID of the region, or
> > of the parent. By doing this we can have everything above the region based
> > on the region, and everything below the region based on the nvdimm bus.
> >
> > By guaranteeing NUMA locality I see an improvement of as high as 25% for
> > per-node init of a system with 12TB of persistent memory.
>
> It seems the speed-up is achieved with just patches 1, 2, and 9 from
> this series, correct? I wouldn't want to hold up that benefit while
> the driver-core bits are debated.

Actually patch 6 ends up impacting things for persistent memory as well.
The problem is that all the async calls to add interfaces only do anything
if the driver is already loaded. So there are cases, such as the
X86_PMEM_LEGACY_DEVICE case, where the memory regions still end up being
serialized because the devices are added before the driver.

> You can add:
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> ...if the series needs to be kept together, but as far as I can see
> the workqueue changes enable 2 sub-topics of development and it might
> make sense for Tejun to take those first 2 and then Greg and I can
> base any follow-up topics on that stable baseline.

I had originally put this out there for Tejun to apply, but he and Greg
had talked and Greg agreed to apply the set. If it works for you I would
prefer to just keep it together for now, as I don't believe there will be
too many more revisions of this needed.
On Tue, Nov 27, 2018 at 10:04 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
> > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > Force the device registration for nvdimm devices to be closer to the actual
> > > device. This is achieved by using either the NUMA node ID of the region, or
> > > of the parent. By doing this we can have everything above the region based
> > > on the region, and everything below the region based on the nvdimm bus.
> > >
> > > By guaranteeing NUMA locality I see an improvement of as high as 25% for
> > > per-node init of a system with 12TB of persistent memory.
> >
> > It seems the speed-up is achieved with just patches 1, 2, and 9 from
> > this series, correct? I wouldn't want to hold up that benefit while
> > the driver-core bits are debated.
>
> Actually patch 6 ends up impacting things for persistent memory as
> well. The problem is that all the async calls to add interfaces only do
> anything if the driver is already loaded. So there are cases such as
> the X86_PMEM_LEGACY_DEVICE case where the memory regions end up still
> being serialized because the devices are added before the driver.

Ok, but is the patch 6 change generally useful outside of the libnvdimm
case? Yes, local hacks like MODULE_SOFTDEP are terrible for global
problems, but what I'm trying to tease out is whether this change benefits
other async probing subsystems outside of libnvdimm, SCSI perhaps? Bart,
can you chime in with the benefits you see so it's clear to Greg that the
driver-core changes are a generic improvement?

> > You can add:
> >
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> >
> > ...if the series needs to be kept together, but as far as I can see
> > the workqueue changes enable 2 sub-topics of development and it might
> > make sense for Tejun to take those first 2 and then Greg and I can
> > base any follow-up topics on that stable baseline.
>
> I had originally put this out there for Tejun to apply, but he and Greg
> had talked and Greg agreed to apply the set. If it works for you I
> would prefer to just keep it together for now, as I don't believe there
> will be too many more revisions of this needed.

That works for me.
On Tue, 2018-11-27 at 11:34 -0800, Dan Williams wrote:
> On Tue, Nov 27, 2018 at 10:04 AM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
> >
> > On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
> > > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > > <alexander.h.duyck@linux.intel.com> wrote:
> > > >
> > > > Force the device registration for nvdimm devices to be closer to the actual
> > > > device. This is achieved by using either the NUMA node ID of the region, or
> > > > of the parent. By doing this we can have everything above the region based
> > > > on the region, and everything below the region based on the nvdimm bus.
> > > >
> > > > By guaranteeing NUMA locality I see an improvement of as high as 25% for
> > > > per-node init of a system with 12TB of persistent memory.
> > >
> > > It seems the speed-up is achieved with just patches 1, 2, and 9 from
> > > this series, correct? I wouldn't want to hold up that benefit while
> > > the driver-core bits are debated.
> >
> > Actually patch 6 ends up impacting things for persistent memory as
> > well. The problem is that all the async calls to add interfaces only do
> > anything if the driver is already loaded. So there are cases such as
> > the X86_PMEM_LEGACY_DEVICE case where the memory regions end up still
> > being serialized because the devices are added before the driver.
>
> Ok, but is the patch 6 change generally useful outside of the
> libnvdimm case? Yes, local hacks like MODULE_SOFTDEP are terrible for
> global problems, but what I'm trying to tease out is whether this
> change benefits other async probing subsystems outside of libnvdimm,
> SCSI perhaps? Bart, can you chime in with the benefits you see so it's
> clear to Greg that the driver-core changes are a generic improvement?

Hi Dan,

For SCSI, asynchronous probing is really important because when scanning
SAN LUNs there is plenty of potential for concurrency due to the network
delay.

I think the following quote provides the information you are looking for:

"This patch reduces the time needed for loading the scsi_debug kernel
module with parameters delay=0 and max_luns=256 from 0.7s to 0.1s. In
other words, this specific test runs about seven times faster."

Source: https://www.spinics.net/lists/linux-scsi/msg124457.html

Best regards,

Bart.
On Tue, Nov 27, 2018 at 12:33 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, 2018-11-27 at 11:34 -0800, Dan Williams wrote:
> > On Tue, Nov 27, 2018 at 10:04 AM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > On Mon, 2018-11-26 at 18:21 -0800, Dan Williams wrote:
> > > > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > > > <alexander.h.duyck@linux.intel.com> wrote:
> > > > >
> > > > > Force the device registration for nvdimm devices to be closer to the actual
> > > > > device. This is achieved by using either the NUMA node ID of the region, or
> > > > > of the parent. By doing this we can have everything above the region based
> > > > > on the region, and everything below the region based on the nvdimm bus.
> > > > >
> > > > > By guaranteeing NUMA locality I see an improvement of as high as 25% for
> > > > > per-node init of a system with 12TB of persistent memory.
> > > >
> > > > It seems the speed-up is achieved with just patches 1, 2, and 9 from
> > > > this series, correct? I wouldn't want to hold up that benefit while
> > > > the driver-core bits are debated.
> > >
> > > Actually patch 6 ends up impacting things for persistent memory as
> > > well. The problem is that all the async calls to add interfaces only do
> > > anything if the driver is already loaded. So there are cases such as
> > > the X86_PMEM_LEGACY_DEVICE case where the memory regions end up still
> > > being serialized because the devices are added before the driver.
> >
> > Ok, but is the patch 6 change generally useful outside of the
> > libnvdimm case? Yes, local hacks like MODULE_SOFTDEP are terrible for
> > global problems, but what I'm trying to tease out is whether this
> > change benefits other async probing subsystems outside of libnvdimm,
> > SCSI perhaps? Bart, can you chime in with the benefits you see so it's
> > clear to Greg that the driver-core changes are a generic improvement?
>
> Hi Dan,
>
> For SCSI, asynchronous probing is really important because when scanning
> SAN LUNs there is plenty of potential for concurrency due to the network
> delay.
>
> I think the following quote provides the information you are looking for:
>
> "This patch reduces the time needed for loading the scsi_debug kernel
> module with parameters delay=0 and max_luns=256 from 0.7s to 0.1s. In
> other words, this specific test runs about seven times faster."
>
> Source: https://www.spinics.net/lists/linux-scsi/msg124457.html

Thanks Bart, so tying this back to Alex's patches, does the ordering
problem that Alex's patches solve impact the SCSI case? I'm looking for
something like "SCSI depends on asynchronous probing and without 'driver
core: Establish clear order of operations for deferred probe and remove'
probing is often needlessly serialized". I.e. does it suffer from the same
platform problem that libnvdimm ran into, where its local async probing
implementation was hindered by the driver core?
On Tue, 2018-11-27 at 12:50 -0800, Dan Williams wrote:
> Thanks Bart, so tying this back to Alex's patches, does the ordering
> problem that Alex's patches solve impact the SCSI case? I'm looking
> for something like "SCSI depends on asynchronous probing and without
> 'driver core: Establish clear order of operations for deferred probe
> and remove' probing is often needlessly serialized". I.e. does it
> suffer from the same platform problem that libnvdimm ran into, where
> its local async probing implementation was hindered by the driver
> core?

(+Martin)

Hi Dan,

Patch 6/9 reduces the time needed to scan SCSI LUNs significantly. The
only way to realize that speedup is by enabling more concurrency. That's
why I think that patch 6/9 is a significant driver core improvement.

Bart.
On Tue, Nov 27, 2018 at 1:22 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, 2018-11-27 at 12:50 -0800, Dan Williams wrote:
> > Thanks Bart, so tying this back to Alex's patches, does the ordering
> > problem that Alex's patches solve impact the SCSI case? I'm looking
> > for something like "SCSI depends on asynchronous probing and without
> > 'driver core: Establish clear order of operations for deferred probe
> > and remove' probing is often needlessly serialized". I.e. does it
> > suffer from the same platform problem that libnvdimm ran into, where
> > its local async probing implementation was hindered by the driver
> > core?
>
> (+Martin)
>
> Hi Dan,
>
> Patch 6/9 reduces the time needed to scan SCSI LUNs significantly. The
> only way to realize that speedup is by enabling more concurrency. That's
> why I think that patch 6/9 is a significant driver core improvement.

Perfect. Alex, with that added to the 6/9 changelog and moving to
device_private for the async state tracking, you can add my Reviewed-by.
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index f1fb39921236..b1e193541874 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -23,6 +23,7 @@
 #include <linux/ndctl.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/cpu.h>
 #include <linux/fs.h>
 #include <linux/io.h>
 #include <linux/mm.h>
@@ -513,11 +514,15 @@ void __nd_device_register(struct device *dev)
 		set_dev_node(dev, to_nd_region(dev)->numa_node);

 	dev->bus = &nvdimm_bus_type;
-	if (dev->parent)
+	if (dev->parent) {
 		get_device(dev->parent);
+		if (dev_to_node(dev) == NUMA_NO_NODE)
+			set_dev_node(dev, dev_to_node(dev->parent));
+	}
 	get_device(dev);
-	async_schedule_domain(nd_async_device_register, dev,
-			&nd_async_domain);
+
+	async_schedule_dev_domain(nd_async_device_register, dev,
+				  &nd_async_domain);
 }

 void nd_device_register(struct device *dev)