Message ID | 20181229032650.27256-2-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | nvme pci: two fixes on nvme_setup_irqs | expand |
On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote: > The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC > if leass than @min_vecs interrupt vectors are available for @dev. s/leass/fewer/ > However, this way may be changed by falling back to > __pci_enable_msi_range(), for example, if the device isn't capable of > MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is > returned to users of pci_alloc_irq_vectors_affinity() even though > there are quite MSIX vectors available. This way violates the interface. I *think* the above means: If a device supports MSI-X but not MSI and a caller requests @min_vecs that can't be satisfied by MSI-X, we previously returned -EINVAL (from the failed attempt to enable MSI), not -ENOSPC. and I agree that this doesn't match the documented API. > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq > vectors and allocate vectors again in case that -ENOSPC is returned, such > as NVMe, so we need to respect the current interface and give preference to > -ENOSPC. I thought the whole point of the (min_vecs, max_vecs) tuple was to avoid this sort of "reduce and try again" iteration in the callers. > Cc: Jens Axboe <axboe@fb.com>, > Cc: Keith Busch <keith.busch@intel.com>, > Cc: linux-pci@vger.kernel.org, > Cc: Bjorn Helgaas <bhelgaas@google.com>, > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > drivers/pci/msi.c | 20 +++++++++++--------- > 1 file changed, 11 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index 7a1c8a09efa5..91b4f03fee91 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > const struct irq_affinity *affd) > { > static const struct irq_affinity msi_default_affd; > - int vecs = -ENOSPC; > + int msix_vecs = -ENOSPC; > + int msi_vecs = -ENOSPC; > > if (flags & PCI_IRQ_AFFINITY) { > if (!affd) > @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > } > > if (flags & PCI_IRQ_MSIX) { > - vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, > - affd); > - if (vecs > 0) > - return vecs; > + msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs, > + max_vecs, affd); > + if (msix_vecs > 0) > + return msix_vecs; > } > > if (flags & PCI_IRQ_MSI) { > - vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd); > - if (vecs > 0) > - return vecs; > + msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, > + affd); > + if (msi_vecs > 0) > + return msi_vecs; > } > > /* use legacy irq if allowed */ > @@ -1199,7 +1201,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > } > } > > - return vecs; > + return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs; If you know you want to return -ENOSPC, just return that, not a variable that happens to contain it, i.e., if (msix_vecs == -ENOSPC) return -ENOSPC; return msi_vecs; > } > EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity); > > -- > 2.9.5 > > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme
On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote: > On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote: > > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq > > vectors and allocate vectors again in case that -ENOSPC is returned, such > > as NVMe, so we need to respect the current interface and give preference to > > -ENOSPC. > > I thought the whole point of the (min_vecs, max_vecs) tuple was to > avoid this sort of "reduce and try again" iteration in the callers. The min/max vecs doesn't work correctly when using the irq_affinity nr_sets because rebalancing the set counts is driver specific. To get around that, drivers using nr_sets have to set min and max to the same value and handle the "reduce and try again".
On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote: > On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote: > > The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC > > if leass than @min_vecs interrupt vectors are available for @dev. > > s/leass/fewer/ > > > However, this way may be changed by falling back to > > __pci_enable_msi_range(), for example, if the device isn't capable of > > MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is > > returned to users of pci_alloc_irq_vectors_affinity() even though > > there are quite MSIX vectors available. This way violates the interface. > > I *think* the above means: > > If a device supports MSI-X but not MSI and a caller requests > @min_vecs that can't be satisfied by MSI-X, we previously returned > -EINVAL (from the failed attempt to enable MSI), not -ENOSPC. > > and I agree that this doesn't match the documented API. OK, will use the above comment log. > > > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq > > vectors and allocate vectors again in case that -ENOSPC is returned, such > > as NVMe, so we need to respect the current interface and give preference to > > -ENOSPC. > > I thought the whole point of the (min_vecs, max_vecs) tuple was to > avoid this sort of "reduce and try again" iteration in the callers. As Keith replied, in case of NVMe, we have to keep min_vecs same with max_vecs. > > > Cc: Jens Axboe <axboe@fb.com>, > > Cc: Keith Busch <keith.busch@intel.com>, > > Cc: linux-pci@vger.kernel.org, > > Cc: Bjorn Helgaas <bhelgaas@google.com>, > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > drivers/pci/msi.c | 20 +++++++++++--------- > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index 7a1c8a09efa5..91b4f03fee91 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > const struct irq_affinity *affd) > > { > > static const struct irq_affinity msi_default_affd; > > - int vecs = -ENOSPC; > > + int msix_vecs = -ENOSPC; > > + int msi_vecs = -ENOSPC; > > > > if (flags & PCI_IRQ_AFFINITY) { > > if (!affd) > > @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > } > > > > if (flags & PCI_IRQ_MSIX) { > > - vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, > > - affd); > > - if (vecs > 0) > > - return vecs; > > + msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs, > > + max_vecs, affd); > > + if (msix_vecs > 0) > > + return msix_vecs; > > } > > > > if (flags & PCI_IRQ_MSI) { > > - vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd); > > - if (vecs > 0) > > - return vecs; > > + msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, > > + affd); > > + if (msi_vecs > 0) > > + return msi_vecs; > > } > > > > /* use legacy irq if allowed */ > > @@ -1199,7 +1201,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > } > > } > > > > - return vecs; > > + return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs; > > If you know you want to return -ENOSPC, just return that, not a > variable that happens to contain it, i.e., > > if (msix_vecs == -ENOSPC) > return -ENOSPC; > return msi_vecs; OK. Thanks, Ming
On Tue, Jan 01, 2019 at 01:24:59PM +0800, Ming Lei wrote: > On Mon, Dec 31, 2018 at 04:00:59PM -0600, Bjorn Helgaas wrote: > > On Sat, Dec 29, 2018 at 11:26:48AM +0800, Ming Lei wrote: ... > > > Users of pci_alloc_irq_vectors_affinity() may try to reduce irq > > > vectors and allocate vectors again in case that -ENOSPC is returned, such > > > as NVMe, so we need to respect the current interface and give preference to > > > -ENOSPC. > > > > I thought the whole point of the (min_vecs, max_vecs) tuple was to > > avoid this sort of "reduce and try again" iteration in the callers. > > As Keith replied, in case of NVMe, we have to keep min_vecs same with > max_vecs. Keith said: > The min/max vecs doesn't work correctly when using the irq_affinity > nr_sets because rebalancing the set counts is driver specific. To > get around that, drivers using nr_sets have to set min and max to > the same value and handle the "reduce and try again". Sorry I saw that, but didn't follow it at first. After a little archaeology, I see that 6da4b3ab9a6e ("genirq/affinity: Add support for allocating interrupt sets") added nr_sets and some validation tests (if affd.nr_sets, min_vecs == max_vecs) for using it in the API. That's sort of a wart on the API, but I don't know if we should live with it or try to clean it up somehow. At the very least, this seems like something that could be documented somewhere in Documentation/PCI/MSI-HOWTO.txt, which mentions PCI_IRQ_AFFINITY, but doesn't cover struct irq_affinity or pci_alloc_irq_vectors_affinity() at all, let alone this wrinkle about affd.nr_sets/min_vecs/max_vecs. Obviously that would not be part of *this* patch. Bjorn
On Wed, Jan 02, 2019 at 03:02:02PM -0600, Bjorn Helgaas wrote: > Keith said: > > The min/max vecs doesn't work correctly when using the irq_affinity > > nr_sets because rebalancing the set counts is driver specific. To > > get around that, drivers using nr_sets have to set min and max to > > the same value and handle the "reduce and try again". > > Sorry I saw that, but didn't follow it at first. After a little > archaeology, I see that 6da4b3ab9a6e ("genirq/affinity: Add support > for allocating interrupt sets") added nr_sets and some validation > tests (if affd.nr_sets, min_vecs == max_vecs) for using it in the API. > > That's sort of a wart on the API, but I don't know if we should live > with it or try to clean it up somehow. Yeah, that interface is a bit awkward. I was thinking it would be nice to thread a driver callback to PCI for the driver to redistribute the sets as needed and let the PCI handle the retries as before. I am testing with the following, and seems to work, but I'm getting some unexpected warnings from blk-mq when I have nvme use it. Still investigating that, but just throwing this out for early feedback. --- diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 7a1c8a09efa5..e33abb167c19 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1035,13 +1035,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, if (maxvec < minvec) return -ERANGE; - /* - * If the caller is passing in sets, we can't support a range of - * vectors. The caller needs to handle that. - */ - if (affd && affd->nr_sets && minvec != maxvec) - return -EINVAL; - if (WARN_ON_ONCE(dev->msi_enabled)) return -EINVAL; @@ -1093,13 +1086,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev, if (maxvec < minvec) return -ERANGE; - /* - * If the caller is passing in sets, we can't support a range of - * supported vectors. The caller needs to handle that. - */ - if (affd && affd->nr_sets && minvec != maxvec) - return -EINVAL; - if (WARN_ON_ONCE(dev->msix_enabled)) return -EINVAL; @@ -1110,6 +1096,9 @@ static int __pci_enable_msix_range(struct pci_dev *dev, return -ENOSPC; } + if (nvec != maxvec && affd && affd->recalc_sets) + affd->recalc_sets(affd, nvec); + rc = __pci_enable_msix(dev, entries, nvec, affd); if (rc == 0) return nvec; diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index c672f34235e7..326c9bd05f62 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -249,12 +249,16 @@ struct irq_affinity_notify { * the MSI(-X) vector space * @nr_sets: Length of passed in *sets array * @sets: Number of affinitized sets + * @recalc_sets: Recalculate sets when requested allocation failed + * @priv: Driver private data */ struct irq_affinity { int pre_vectors; int post_vectors; int nr_sets; int *sets; + void (*recalc_sets)(struct irq_affinity *, unsigned int); + void *priv; }; /** --
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 7a1c8a09efa5..91b4f03fee91 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, const struct irq_affinity *affd) { static const struct irq_affinity msi_default_affd; - int vecs = -ENOSPC; + int msix_vecs = -ENOSPC; + int msi_vecs = -ENOSPC; if (flags & PCI_IRQ_AFFINITY) { if (!affd) @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } if (flags & PCI_IRQ_MSIX) { - vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, - affd); - if (vecs > 0) - return vecs; + msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs, + max_vecs, affd); + if (msix_vecs > 0) + return msix_vecs; } if (flags & PCI_IRQ_MSI) { - vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd); - if (vecs > 0) - return vecs; + msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, + affd); + if (msi_vecs > 0) + return msi_vecs; } /* use legacy irq if allowed */ @@ -1199,7 +1201,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } } - return vecs; + return msix_vecs == -ENOSPC ? msix_vecs : msi_vecs; } EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC if leass than @min_vecs interrupt vectors are available for @dev. However, this way may be changed by falling back to __pci_enable_msi_range(), for example, if the device isn't capable of MSI, __pci_enable_msi_range() will return -EINVAL, and finally it is returned to users of pci_alloc_irq_vectors_affinity() even though there are quite MSIX vectors available. This way violates the interface. Users of pci_alloc_irq_vectors_affinity() may try to reduce irq vectors and allocate vectors again in case that -ENOSPC is returned, such as NVMe, so we need to respect the current interface and give preference to -ENOSPC. Cc: Jens Axboe <axboe@fb.com>, Cc: Keith Busch <keith.busch@intel.com>, Cc: linux-pci@vger.kernel.org, Cc: Bjorn Helgaas <bhelgaas@google.com>, Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/pci/msi.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)