Message ID | 20241113-refactor-blk-affinity-helpers-v4-5-dd3baa1e267f@kernel.org |
---|---|
State | Superseded |
Series | blk: refactor queue affinity helpers |
On Wed, Nov 13, 2024 at 03:26:19PM +0100, Daniel Wagner wrote:
> blk_mq_pci_map_queues and blk_mq_virtio_map_queues will create a CPU to
> hardware queue mapping based on affinity information. These two functions
> share common code and only differ in how the affinity information is
> retrieved. Also, those functions are located in the block subsystem where
> they don't really fit in; they are virtio and pci subsystem specific.
>
> Thus introduce a generic mapping function which uses the irq_get_affinity
> callback from bus_type.
>
> Original idea from Ming Lei <ming.lei@redhat.com>
>
> Signed-off-by: Daniel Wagner <wagi@kernel.org>
> ---
>  block/blk-mq-cpumap.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blk-mq.h |  2 ++
>  2 files changed, 45 insertions(+)
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 9638b25fd52124f0173e968ebdca5f1fe0b42ad9..3506f1c25a02d331d28212a2a97fb269cb21e738 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -11,6 +11,7 @@
>  #include <linux/smp.h>
>  #include <linux/cpu.h>
>  #include <linux/group_cpus.h>
> +#include <linux/device/bus.h>
>
>  #include "blk.h"
>  #include "blk-mq.h"
> @@ -54,3 +55,45 @@ int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
>
>  	return NUMA_NO_NODE;
>  }
> +
> +/**
> + * blk_mq_hctx_map_queues - Create CPU to hardware queue mapping
> + * @qmap:	CPU to hardware queue map.
> + * @dev:	The device to map queues.
> + * @offset:	Queue offset to use for the device.
> + *
> + * Create a CPU to hardware queue mapping in @qmap. The struct bus_type
> + * irq_get_affinity callback will be used to retrieve the affinity.
> + */
> +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,

Some drivers may not know hctx at all, maybe blk_mq_map_hw_queues()?

> +		struct device *dev, unsigned int offset)
> +
> +{
> +	const struct cpumask *(*irq_get_affinity)(struct device *dev,
> +			unsigned int irq_vec);
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	if (dev->driver->irq_get_affinity)
> +		irq_get_affinity = dev->driver->irq_get_affinity;
> +	else if (dev->bus->irq_get_affinity)
> +		irq_get_affinity = dev->bus->irq_get_affinity;

It is one generic API, I think both 'dev->driver' and 'dev->bus' should
be validated here.

Thanks,
Ming
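For context, blk_mq_hctx_map_queues() only consumes an ->irq_get_affinity()
callback; the bus-side wiring lives outside this patch (presumably in other
patches of the series). Below is a minimal sketch of how a bus such as PCI
might provide it, built on the existing pci_irq_get_affinity() helper; the
function and structure names here are illustrative, not taken from this patch.

	/*
	 * Sketch only: a bus-level irq_get_affinity() implementation that
	 * blk_mq_hctx_map_queues() could consume. pci_irq_get_affinity()
	 * exists today; the callback slot on bus_type is the one this
	 * series introduces, the rest is an assumption for illustration.
	 */
	static const struct cpumask *
	pci_device_irq_get_affinity(struct device *dev, unsigned int irq_vec)
	{
		return pci_irq_get_affinity(to_pci_dev(dev), irq_vec);
	}

	const struct bus_type pci_bus_type = {
		.name			= "pci",
		/* ... */
		.irq_get_affinity	= pci_device_irq_get_affinity,
	};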
On Thu, Nov 14, 2024 at 09:58:25AM +0800, Ming Lei wrote:
> > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
>
> Some drivers may not know hctx at all, maybe blk_mq_map_hw_queues()?

I am not really attached to the name, I am fine with renaming it to
blk_mq_map_hw_queues.

> > +	if (dev->driver->irq_get_affinity)
> > +		irq_get_affinity = dev->driver->irq_get_affinity;
> > +	else if (dev->bus->irq_get_affinity)
> > +		irq_get_affinity = dev->bus->irq_get_affinity;
>
> It is one generic API, I think both 'dev->driver' and 'dev->bus' should
> be validated here.

What do you have in mind here if we get two masks? What should the
operation be: AND, OR?

This brings up another topic I left out in this series.
blk_mq_map_queues does almost the same thing except it starts with the
mask returned by group_cpus_evenly. If we figure out how this could be
combined in a sane way it's possible to clean up even a bit more. A bunch
of drivers do

	if (i != HCTX_TYPE_POLL && offset)
		blk_mq_hctx_map_queues(map, dev->dev, offset);
	else
		blk_mq_map_queues(map);

IMO it would be nice just to have one blk_mq_map_queues() which handles
this correctly for both cases.
On Thu, Nov 14, 2024 at 08:54:46AM +0100, Daniel Wagner wrote:
> On Thu, Nov 14, 2024 at 09:58:25AM +0800, Ming Lei wrote:
> > > +void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
> >
> > Some drivers may not know hctx at all, maybe blk_mq_map_hw_queues()?
>
> I am not really attached to the name, I am fine with renaming it to
> blk_mq_map_hw_queues.
>
> > > +	if (dev->driver->irq_get_affinity)
> > > +		irq_get_affinity = dev->driver->irq_get_affinity;
> > > +	else if (dev->bus->irq_get_affinity)
> > > +		irq_get_affinity = dev->bus->irq_get_affinity;
> >
> > It is one generic API, I think both 'dev->driver' and 'dev->bus' should
> > be validated here.
>
> What do you have in mind here if we get two masks? What should the
> operation be: AND, OR?

IMO you just need one callback to return the mask.

I feel the driver should get higher priority, but in the probe() example,
call_driver_probe() actually tries bus->probe() first.

But it looks like this is not an issue for this patchset, since only
hisi_sas_v2_driver (a platform_driver) defines ->irq_get_affinity(), and
platform_bus_type doesn't have the callback.

> This brings up another topic I left out in this series.
> blk_mq_map_queues does almost the same thing except it starts with the
> mask returned by group_cpus_evenly. If we figure out how this could be
> combined in a sane way it's possible to clean up even a bit more. A bunch
> of drivers do
>
> 	if (i != HCTX_TYPE_POLL && offset)
> 		blk_mq_hctx_map_queues(map, dev->dev, offset);
> 	else
> 		blk_mq_map_queues(map);
>
> IMO it would be nice just to have one blk_mq_map_queues() which handles
> this correctly for both cases.

I guess it is doable, and the driver just sets up tag_set->map[], then
calls one generic map_queues API to do everything?

Thanks,
Ming
On Thu, Nov 14, 2024 at 05:12:22PM +0800, Ming Lei wrote:
> I feel the driver should get higher priority, but in the probe() example,
> call_driver_probe() actually tries bus->probe() first.
>
> But it looks like this is not an issue for this patchset, since only
> hisi_sas_v2_driver (a platform_driver) defines ->irq_get_affinity(), and
> platform_bus_type doesn't have the callback.

Oh, I was not aware of this ordering. And after digging this up here:

  https://lore.kernel.org/all/20060105142951.13.01@flint.arm.linux.org.uk/

I don't think it's worthwhile to add the callback to device_driver just
for hisi_sas_v2. So I am going to drop this part again.

> > This brings up another topic I left out in this series.
> > blk_mq_map_queues does almost the same thing except it starts with the
> > mask returned by group_cpus_evenly. If we figure out how this could be
> > combined in a sane way it's possible to clean up even a bit more. A bunch
> > of drivers do
> >
> > 	if (i != HCTX_TYPE_POLL && offset)
> > 		blk_mq_hctx_map_queues(map, dev->dev, offset);
> > 	else
> > 		blk_mq_map_queues(map);
> >
> > IMO it would be nice just to have one blk_mq_map_queues() which handles
> > this correctly for both cases.
>
> I guess it is doable, and the driver just sets up tag_set->map[], then
> calls one generic map_queues API to do everything?

Yes, that is my idea. Just having one function which handles what
blk_mq_map_queues and blk_mq_hctx_map_queues/blk_mq_map_hw_queues
currently do.
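The single map_queues() entry point discussed here might look roughly like
the sketch below. It is not part of the posted series and assumes the
affinity-based helper (under whichever final name) is reworked to report
whether it could actually build a mapping:

	/*
	 * Sketch of the "one blk_mq_map_queues()" idea, not part of this
	 * series: try the irq-affinity based mapping first and fall back
	 * to the even spread otherwise. Assumes blk_mq_map_hw_queues()
	 * is changed to return true only when affinity data was available.
	 */
	static void blk_mq_map_queues_one(struct blk_mq_queue_map *qmap,
					  struct device *dev, unsigned int offset)
	{
		if (dev && blk_mq_map_hw_queues(qmap, dev, offset))
			return;			/* mapped from irq affinity */

		blk_mq_map_queues(qmap);	/* even spread fallback */
	}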
On Thu, Nov 14, 2024 at 01:06:49PM +0100, Daniel Wagner wrote:
> Oh, I was not aware of this ordering. And after digging this up here:
>
>   https://lore.kernel.org/all/20060105142951.13.01@flint.arm.linux.org.uk/
>
> I don't think it's worthwhile to add the callback to device_driver just
> for hisi_sas_v2. So I am going to drop this part again.

Yes, I don't really see how querying driver specific information like
this from code called by the driver makes much sense.
On 14/11/2024 12:06, Daniel Wagner wrote:
> On Thu, Nov 14, 2024 at 05:12:22PM +0800, Ming Lei wrote:
>> I feel the driver should get higher priority, but in the probe() example,
>> call_driver_probe() actually tries bus->probe() first.
>>
>> But it looks like this is not an issue for this patchset, since only
>> hisi_sas_v2_driver (a platform_driver) defines ->irq_get_affinity(), and
>> platform_bus_type doesn't have the callback.
> Oh, I was not aware of this ordering. And after digging this up here:
>
>   https://lore.kernel.org/all/20060105142951.13.01@flint.arm.linux.org.uk/
>
> I don't think it's worthwhile to add the callback to device_driver just
> for hisi_sas_v2. So I am going to drop this part again.

I agree. I would not add anything new to core code just for that driver.
diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 9638b25fd52124f0173e968ebdca5f1fe0b42ad9..3506f1c25a02d331d28212a2a97fb269cb21e738 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/cpu.h>
 #include <linux/group_cpus.h>
+#include <linux/device/bus.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -54,3 +55,45 @@ int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
 
 	return NUMA_NO_NODE;
 }
+
+/**
+ * blk_mq_hctx_map_queues - Create CPU to hardware queue mapping
+ * @qmap:	CPU to hardware queue map.
+ * @dev:	The device to map queues.
+ * @offset:	Queue offset to use for the device.
+ *
+ * Create a CPU to hardware queue mapping in @qmap. The struct bus_type
+ * irq_get_affinity callback will be used to retrieve the affinity.
+ */
+void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
+		struct device *dev, unsigned int offset)
+
+{
+	const struct cpumask *(*irq_get_affinity)(struct device *dev,
+			unsigned int irq_vec);
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	if (dev->driver->irq_get_affinity)
+		irq_get_affinity = dev->driver->irq_get_affinity;
+	else if (dev->bus->irq_get_affinity)
+		irq_get_affinity = dev->bus->irq_get_affinity;
+	else
+		goto fallback;
+
+	for (queue = 0; queue < qmap->nr_queues; queue++) {
+		mask = irq_get_affinity(dev, queue + offset);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
+	}
+
+	return;
+
+fallback:
+	WARN_ON_ONCE(qmap->nr_queues > 1);
+	blk_mq_clear_mq_map(qmap);
+}
+EXPORT_SYMBOL_GPL(blk_mq_hctx_map_queues);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2035fad3131fb60781957095ce8a3a941dd104be..1a85fdcb443c154390cd29f2b1f2a807bf10bfe3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -923,6 +923,8 @@ void blk_mq_unfreeze_queue_non_owner(struct request_queue *q);
 void blk_freeze_queue_start_non_owner(struct request_queue *q);
 
 void blk_mq_map_queues(struct blk_mq_queue_map *qmap);
+void blk_mq_hctx_map_queues(struct blk_mq_queue_map *qmap,
+		struct device *dev, unsigned int offset);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
blk_mq_pci_map_queues and blk_mq_virtio_map_queues will create a CPU to
hardware queue mapping based on affinity information. These two functions
share common code and only differ in how the affinity information is
retrieved. Also, those functions are located in the block subsystem where
they don't really fit in; they are virtio and pci subsystem specific.

Thus introduce a generic mapping function which uses the irq_get_affinity
callback from bus_type.

Original idea from Ming Lei <ming.lei@redhat.com>

Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
 block/blk-mq-cpumap.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 2 files changed, 45 insertions(+)
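For reference, a converted driver's ->map_queues callback would then look
roughly like the sketch below; the driver structure, field names, and the
queue offset are hypothetical, and the real PCI/virtio conversions are done
in the other patches of this series.

	/* Hypothetical PCI driver, shown only to illustrate a caller of the
	 * new helper in place of blk_mq_pci_map_queues(). */
	static void foo_map_queues(struct blk_mq_tag_set *set)
	{
		struct foo_ctrl *ctrl = set->driver_data;	/* hypothetical */

		/*
		 * Map hardware queues from the IRQ affinity exposed via the
		 * PCI bus; the offset of 1 assumes vector 0 is an admin/config
		 * interrupt not used for I/O queues.
		 */
		blk_mq_hctx_map_queues(&set->map[HCTX_TYPE_DEFAULT],
				       &ctrl->pdev->dev, 1);
	}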