Message ID | 1491140492-25703-6-git-send-email-sagi@grimberg.me (mailing list archive) |
---|---|
State | RFC |
On Sun, Apr 02, 2017 at 04:41:31PM +0300, Sagi Grimberg wrote:
> Like pci and virtio, we add a rdma helper for affinity
> spreading. This achieves optimal mq affinity assignments
> according to the underlying rdma device affinity maps.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  block/Kconfig               |  5 ++++
>  block/Makefile              |  1 +
>  block/blk-mq-rdma.c         | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/blk-mq-rdma.h | 10 ++++++++
>  4 files changed, 72 insertions(+)
>  create mode 100644 block/blk-mq-rdma.c
>  create mode 100644 include/linux/blk-mq-rdma.h
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 89cd28f8d051..3ab42bbb06d5 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -206,4 +206,9 @@ config BLK_MQ_VIRTIO
>  	depends on BLOCK && VIRTIO
>  	default y
>
> +config BLK_MQ_RDMA
> +	bool
> +	depends on BLOCK && INFINIBAND
> +	default y
> +
>  source block/Kconfig.iosched
> diff --git a/block/Makefile b/block/Makefile
> index 081bb680789b..4498603dbc83 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
>  obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
>  obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
>  obj-$(CONFIG_BLK_MQ_VIRTIO)	+= blk-mq-virtio.o
> +obj-$(CONFIG_BLK_MQ_RDMA)	+= blk-mq-rdma.o
>  obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
>  obj-$(CONFIG_BLK_WBT)		+= blk-wbt.o
>  obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
> diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
> new file mode 100644
> index 000000000000..d402f7c93528
> --- /dev/null
> +++ b/block/blk-mq-rdma.c
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (c) 2017 Sagi Grimberg.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#include <linux/blk-mq.h>
> +#include <linux/blk-mq-rdma.h>
> +#include <rdma/ib_verbs.h>
> +#include <linux/module.h>
> +#include "blk-mq.h"
> +
> +/**
> + * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
> + * @set:	tagset to provide the mapping for
> + * @dev:	rdma device associated with @set.
> + * @first_vec:	first interrupt vector to use for queues (usually 0)
> + *
> + * This function assumes the rdma device @dev has at least as many available
> + * interrupt vectors as @set has queues.  It will then query its affinity mask
> + * and build a queue mapping that maps a queue to the CPUs that have irq affinity
> + * for the corresponding vector.
> + *
> + * In case either the driver passed a @dev with fewer vectors than
> + * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
> + * vector, we fall back to the naive mapping.
> + */
> +int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
> +		struct ib_device *dev, int first_vec)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	if (set->nr_hw_queues > dev->num_comp_vectors)
> +		goto fallback;

Maybe print a warning here?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
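For illustration only, the warning suggested in the review above could look roughly like the following userspace model. The struct and function names (`chk_tag_set`, `chk_ib_device`, `chk_needs_fallback`) are hypothetical stand-ins, not part of the patch; in kernel code the message would presumably go through something like `pr_warn()` rather than `fprintf()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the two fields the check reads. */
struct chk_tag_set {
	unsigned int nr_hw_queues;
};

struct chk_ib_device {
	const char *name;
	int num_comp_vectors;
};

/*
 * Returns 1 (and warns on stderr) when the device exposes fewer completion
 * vectors than the tag set has hardware queues, 0 otherwise.
 */
static int chk_needs_fallback(const struct chk_tag_set *set,
			      const struct chk_ib_device *dev)
{
	assert(set != NULL && dev != NULL);

	if (dev->num_comp_vectors < 0 ||
	    set->nr_hw_queues > (unsigned int)dev->num_comp_vectors) {
		fprintf(stderr,
			"%s: %u hw queues but only %d completion vectors, "
			"using naive queue mapping\n",
			dev->name, set->nr_hw_queues, dev->num_comp_vectors);
		return 1;
	}
	return 0;
}
```

Whether a warning is appropriate here depends on how often drivers are expected to hit the fallback; a driver that routinely asks for more queues than vectors would log on every tag set allocation.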
> diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
> new file mode 100644
> index 000000000000..d402f7c93528
> --- /dev/null
> +++ b/block/blk-mq-rdma.c
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (c) 2017 Sagi Grimberg.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */

shouldn't you include <linux/kobject.h> and <linux/blkdev.h> like in
commit 8ec2ef2b66ea2f that fixes blk-mq-pci.c?

> +#include <linux/blk-mq.h>
> +#include <linux/blk-mq-rdma.h>
> +#include <rdma/ib_verbs.h>
> +#include <linux/module.h>
> +#include "blk-mq.h"

Is this include needed?

> +
> +/**
> + * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
> + * @set:	tagset to provide the mapping for
> + * @dev:	rdma device associated with @set.
> + * @first_vec:	first interrupt vector to use for queues (usually 0)
> + *
> + * This function assumes the rdma device @dev has at least as many available
> + * interrupt vectors as @set has queues.  It will then query its affinity mask
> + * and build a queue mapping that maps a queue to the CPUs that have irq affinity
> + * for the corresponding vector.
> + *
> + * In case either the driver passed a @dev with fewer vectors than
> + * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
> + * vector, we fall back to the naive mapping.
> + */
> +int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
> +		struct ib_device *dev, int first_vec)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	if (set->nr_hw_queues > dev->num_comp_vectors)
> +		goto fallback;
> +
> +	for (queue = 0; queue < set->nr_hw_queues; queue++) {
> +		mask = ib_get_vector_affinity(dev, first_vec + queue);
> +		if (!mask)
> +			goto fallback;

Christoph,
we can use fallback also in the blk-mq-pci.c in case
pci_irq_get_affinity fails, right?

> +
> +		for_each_cpu(cpu, mask)
> +			set->mq_map[cpu] = queue;
> +	}
> +
> +	return 0;
> +fallback:
> +	return blk_mq_map_queues(set);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);

Otherwise, looks good.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
On Tue, Apr 04, 2017 at 10:46:54AM +0300, Max Gurtovoy wrote:
>> +	if (set->nr_hw_queues > dev->num_comp_vectors)
>> +		goto fallback;
>> +
>> +	for (queue = 0; queue < set->nr_hw_queues; queue++) {
>> +		mask = ib_get_vector_affinity(dev, first_vec + queue);
>> +		if (!mask)
>> +			goto fallback;
>
> Christoph,
> we can use fallback also in the blk-mq-pci.c in case pci_irq_get_affinity
> fails, right?

For PCI it shouldn't fail as the driver calling pci_irq_get_affinity
knows how it set up the interrupts.  So I don't think it's necessary
there.
On 04/02/2017 07:41 AM, Sagi Grimberg wrote:
> Like pci and virtio, we add a rdma helper for affinity
> spreading. This achieves optimal mq affinity assignments
> according to the underlying rdma device affinity maps.

Reviewed-by: Jens Axboe <axboe@fb.com>
> shouldn't you include <linux/kobject.h> and <linux/blkdev.h> like in
> commit 8ec2ef2b66ea2f that fixes blk-mq-pci.c?

Not really. We can lose these from blk-mq-pci.c as well.

>> +#include <linux/blk-mq.h>
>> +#include <linux/blk-mq-rdma.h>
>> +#include <rdma/ib_verbs.h>
>> +#include <linux/module.h>
>> +#include "blk-mq.h"
>
> Is this include needed?

You're right, I can just keep:
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-rdma.h>
+#include <rdma/ib_verbs.h>
diff --git a/block/Kconfig b/block/Kconfig
index 89cd28f8d051..3ab42bbb06d5 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -206,4 +206,9 @@ config BLK_MQ_VIRTIO
 	depends on BLOCK && VIRTIO
 	default y
 
+config BLK_MQ_RDMA
+	bool
+	depends on BLOCK && INFINIBAND
+	default y
+
 source block/Kconfig.iosched
diff --git a/block/Makefile b/block/Makefile
index 081bb680789b..4498603dbc83 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
 obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
 obj-$(CONFIG_BLK_MQ_PCI)	+= blk-mq-pci.o
 obj-$(CONFIG_BLK_MQ_VIRTIO)	+= blk-mq-virtio.o
+obj-$(CONFIG_BLK_MQ_RDMA)	+= blk-mq-rdma.o
 obj-$(CONFIG_BLK_DEV_ZONED)	+= blk-zoned.o
 obj-$(CONFIG_BLK_WBT)		+= blk-wbt.o
 obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
new file mode 100644
index 000000000000..d402f7c93528
--- /dev/null
+++ b/block/blk-mq-rdma.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2017 Sagi Grimberg.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-rdma.h>
+#include <rdma/ib_verbs.h>
+#include <linux/module.h>
+#include "blk-mq.h"
+
+/**
+ * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
+ * @set:	tagset to provide the mapping for
+ * @dev:	rdma device associated with @set.
+ * @first_vec:	first interrupt vector to use for queues (usually 0)
+ *
+ * This function assumes the rdma device @dev has at least as many available
+ * interrupt vectors as @set has queues.  It will then query its affinity mask
+ * and build a queue mapping that maps a queue to the CPUs that have irq affinity
+ * for the corresponding vector.
+ *
+ * In case either the driver passed a @dev with fewer vectors than
+ * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
+ * vector, we fall back to the naive mapping.
+ */
+int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+		struct ib_device *dev, int first_vec)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	if (set->nr_hw_queues > dev->num_comp_vectors)
+		goto fallback;
+
+	for (queue = 0; queue < set->nr_hw_queues; queue++) {
+		mask = ib_get_vector_affinity(dev, first_vec + queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			set->mq_map[cpu] = queue;
+	}
+
+	return 0;
+fallback:
+	return blk_mq_map_queues(set);
+}
+EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h
new file mode 100644
index 000000000000..b4ade198007d
--- /dev/null
+++ b/include/linux/blk-mq-rdma.h
@@ -0,0 +1,10 @@
+#ifndef _LINUX_BLK_MQ_RDMA_H
+#define _LINUX_BLK_MQ_RDMA_H
+
+struct blk_mq_tag_set;
+struct ib_device;
+
+int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+		struct ib_device *dev, int first_vec);
+
+#endif /* _LINUX_BLK_MQ_RDMA_H */
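The mapping logic in the patch above can be tried outside the kernel with a small userspace model. This is only a sketch under stated assumptions: the `model_*` names are hypothetical, affinity masks are modelled as plain bitmasks instead of `struct cpumask`, a zero mask stands in for a NULL return from `ib_get_vector_affinity()`, and the fallback is a simple round-robin spread rather than the real `blk_mq_map_queues()`.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS	8
#define MODEL_MAX_VECS	8

struct model_tag_set {
	unsigned int nr_hw_queues;
	unsigned int mq_map[MODEL_NR_CPUS];	/* cpu -> hw queue index */
};

struct model_ib_device {
	int num_comp_vectors;
	/* bit c of vec_affinity[v] set: vector v has affinity to CPU c */
	uint32_t vec_affinity[MODEL_MAX_VECS];
};

/* Simplified stand-in for the naive mapping: round-robin over queues. */
static int model_map_queues(struct model_tag_set *set)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		set->mq_map[cpu] = cpu % set->nr_hw_queues;
	return 0;
}

static int model_rdma_map_queues(struct model_tag_set *set,
				 const struct model_ib_device *dev,
				 int first_vec)
{
	unsigned int queue, cpu, vec;
	uint32_t mask;

	assert(set->nr_hw_queues > 0 && first_vec >= 0);

	/* Fewer vectors than queues: affinity-based mapping is impossible. */
	if (dev->num_comp_vectors < 0 ||
	    set->nr_hw_queues > (unsigned int)dev->num_comp_vectors)
		goto fallback;

	for (queue = 0; queue < set->nr_hw_queues; queue++) {
		vec = (unsigned int)first_vec + queue;
		if (vec >= MODEL_MAX_VECS)	/* bound of this model only */
			goto fallback;

		mask = dev->vec_affinity[vec];
		if (!mask)			/* no affinity information */
			goto fallback;

		/* Every CPU the vector is affine to submits on this queue. */
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			if (mask & (UINT32_C(1) << cpu))
				set->mq_map[cpu] = queue;
	}

	return 0;
fallback:
	return model_map_queues(set);
}
```

As in the patch, a fallback taken partway through the loop is harmless because the naive mapping rewrites every `mq_map` entry.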
Like pci and virtio, we add a rdma helper for affinity
spreading. This achieves optimal mq affinity assignments
according to the underlying rdma device affinity maps.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/Kconfig               |  5 ++++
 block/Makefile              |  1 +
 block/blk-mq-rdma.c         | 56 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq-rdma.h | 10 ++++++++
 4 files changed, 72 insertions(+)
 create mode 100644 block/blk-mq-rdma.c
 create mode 100644 include/linux/blk-mq-rdma.h
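As with the pci and virtio helpers, a driver would presumably call the new helper from its own queue-mapping callback. The following userspace sketch only models that wiring: every `drv_*` name is hypothetical, `drv_stub_rdma_map_queues()` is a trivial stub standing in for `blk_mq_rdma_map_queues()`, and the ops structure merely mirrors the shape of a `map_queues` callback that receives the tag set.

```c
#include <assert.h>
#include <stddef.h>

#define DRV_NR_CPUS 4

struct drv_ib_device {
	int num_comp_vectors;
};

struct drv_tag_set {
	unsigned int nr_hw_queues;
	unsigned int mq_map[DRV_NR_CPUS];
	void *driver_data;	/* driver-private pointer stored in the set */
};

/* Models an ops table with a queue-mapping callback. */
struct drv_mq_ops {
	int (*map_queues)(struct drv_tag_set *set);
};

/* Stub for the helper: ignores affinity, maps CPU n to queue n % nr. */
static int drv_stub_rdma_map_queues(struct drv_tag_set *set,
				    struct drv_ib_device *dev, int first_vec)
{
	unsigned int cpu;

	(void)dev;
	(void)first_vec;
	for (cpu = 0; cpu < DRV_NR_CPUS; cpu++)
		set->mq_map[cpu] = cpu % set->nr_hw_queues;
	return 0;
}

/* Hypothetical per-controller state of an RDMA block driver. */
struct drv_ctrl {
	struct drv_ib_device *ibdev;
};

/* The driver callback just forwards to the helper with its own device. */
static int drv_map_queues(struct drv_tag_set *set)
{
	struct drv_ctrl *ctrl = set->driver_data;

	assert(ctrl != NULL && ctrl->ibdev != NULL);
	return drv_stub_rdma_map_queues(set, ctrl->ibdev, 0);
}

static const struct drv_mq_ops drv_ops = {
	.map_queues = drv_map_queues,
};
```

Passing 0 as `first_vec` matches the "usually 0" note in the kernel-doc; a driver that reserves its first vectors for other purposes would pass a different offset.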