| Message ID | 20190103013106.26452-1-ming.lei@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Delegated to: | Bjorn Helgaas |
| Series | PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity |
On Thu, Jan 03, 2019 at 09:31:06AM +0800, Ming Lei wrote:
> The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC
> if fewer than @min_vecs interrupt vectors are available for @dev.
>
> However, if a device supports MSI-X but not MSI and a caller requests
> @min_vecs that can't be satisfied by MSI-X, we previously returned
> -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.
>
> Users of pci_alloc_irq_vectors_affinity() may try to reduce irq vectors
> and allocate vectors again in case that -ENOSPC is returned, such as NVMe,
> so we need to respect the current interface and give preference to -ENOSPC.
>
> Especially the min/max vecs doesn't work correctly when using the
> irq_affinity nr_sets because rebalancing the set counts is driver specific.
> To get around that, drivers using nr_sets have to set min and max to the
> same value and handle the "reduce and try again".
>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: linux-nvme@lists.infradead.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Applied to pci/msi for v5.1, thanks!

If this is something that should be in v5.0, let me know and include the
justification, e.g., something we already merged for v5.0 or regression
info, etc, and a Fixes: line, and I'll move it to for-linus.

> ---
>  drivers/pci/msi.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 7a1c8a09efa5..4c0b47867258 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>                                     const struct irq_affinity *affd)
>  {
>          static const struct irq_affinity msi_default_affd;
> -        int vecs = -ENOSPC;
> +        int msix_vecs = -ENOSPC;
> +        int msi_vecs = -ENOSPC;
>
>          if (flags & PCI_IRQ_AFFINITY) {
>                  if (!affd)
> @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>          }
>
>          if (flags & PCI_IRQ_MSIX) {
> -                vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
> -                                               affd);
> -                if (vecs > 0)
> -                        return vecs;
> +                msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
> +                                                    max_vecs, affd);
> +                if (msix_vecs > 0)
> +                        return msix_vecs;
>          }
>
>          if (flags & PCI_IRQ_MSI) {
> -                vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
> -                if (vecs > 0)
> -                        return vecs;
> +                msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
> +                                                  affd);
> +                if (msi_vecs > 0)
> +                        return msi_vecs;
>          }
>
>          /* use legacy irq if allowed */
> @@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>                  }
>          }
>
> -        return vecs;
> +        if (msix_vecs == -ENOSPC)
> +                return -ENOSPC;
> +        return msi_vecs;
>  }
>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>
> --
> 2.9.5
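[Editorial note] To make the return-value point concrete, here is a minimal caller-side sketch of the "reduce and try again" pattern the patch description refers to. It is hypothetical (example_setup_irqs and the halving policy are invented, not taken from the NVMe driver); the point is that the retry only makes sense when the failure is -ENOSPC rather than -EINVAL:

```c
#include <linux/errno.h>
#include <linux/pci.h>

/* Hypothetical caller, for illustration only. */
static int example_setup_irqs(struct pci_dev *pdev, unsigned int want)
{
	int ret;

	while (want >= 1) {
		/* min == max: ask for exactly 'want' vectors. */
		ret = pci_alloc_irq_vectors(pdev, want, want,
					    PCI_IRQ_MSIX | PCI_IRQ_MSI);
		if (ret > 0)
			return ret;	/* number of vectors allocated */
		if (ret != -ENOSPC)
			return ret;	/* a real error; retrying won't help */
		want /= 2;		/* not enough vectors: ask for fewer */
	}
	return -ENOSPC;
}
```

Before this patch, a device that supports MSI-X but not MSI would cause the allocation above to fail with -EINVAL, so a loop like this would give up on the first pass instead of retrying with fewer vectors.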
On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> Applied to pci/msi for v5.1, thanks!
>
> If this is something that should be in v5.0, let me know and include the
> justification, e.g., something we already merged for v5.0 or regression
> info, etc, and a Fixes: line, and I'll move it to for-linus.

I'd be tempted to queue this up for 5.0.  Ming, what is your position?
On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
>> Applied to pci/msi for v5.1, thanks!
>>
>> If this is something that should be in v5.0, let me know and include the
>> justification, e.g., something we already merged for v5.0 or regression
>> info, etc, and a Fixes: line, and I'll move it to for-linus.
>
> I'd be tempted to queue this up for 5.0.  Ming, what is your position?

I think we should - the API was introduced in this series, I think there's
little (to no) reason NOT to fix it for 5.0.
On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> >> Applied to pci/msi for v5.1, thanks!
> >>
> >> If this is something that should be in v5.0, let me know and include the
> >> justification, e.g., something we already merged for v5.0 or regression
> >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> >
> > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
>
> I think we should - the API was introduced in this series, I think there's
> little (to no) reason NOT to fix it for 5.0.

I'm guessing the justification goes something like this (I haven't
done all the research, so I'll leave it to Ming to fill in the details):

pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
It had this return value defect then, but its min_vecs/max_vecs
parameters removed the need for callers to iteratively reduce the
number of IRQs requested and retry the allocation, so they didn't
need to distinguish -ENOSPC from -EINVAL.

In v5.0, XXX ("...") added IRQ sets to the interface, which
reintroduced the need to check for -ENOSPC and possibly reduce the
number of IRQs requested and retry the allocation.
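[Editorial note] The min_vecs/max_vecs behaviour Bjorn describes can be sketched as follows. This is a hypothetical example (the function name and the minimum of 2 are invented): when a driver does not use interrupt sets, it simply passes a range and lets the PCI core settle on any achievable vector count, so no caller-side retry loop is needed and -ENOSPC only means "even the minimum could not be allocated":

```c
#include <linux/interrupt.h>
#include <linux/pci.h>

/* Hypothetical range-style allocation: no retry loop required. */
static int example_alloc_range(struct pci_dev *pdev, unsigned int nr_queues)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* e.g. one non-affinitized admin vector */
	};

	/*
	 * Any result between 2 and nr_queues + 1 is acceptable; the core
	 * only returns an error if even the minimum cannot be allocated.
	 */
	return pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
					      PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY,
					      &affd);
}
```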
Hi Bjorn,

I think Christoph and Jens are correct, we should make this patch into
5.0 because the issue is triggered since 3b6592f70ad7b4c2 ("nvme: utilize
two queue maps, one for reads and one for writes"), which is merged to
5.0-rc.

For example, before 3b6592f70ad7b4c2, one nvme controller may be
allocated 64 irq vectors; but after that commit, only 1 irq vector
is assigned to this controller.

On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > >> Applied to pci/msi for v5.1, thanks!
> > >>
> > >> If this is something that should be in v5.0, let me know and include the
> > >> justification, e.g., something we already merged for v5.0 or regression
> > >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > >
> > > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> >
> > I think we should - the API was introduced in this series, I think there's
> > little (to no) reason NOT to fix it for 5.0.
>
> I'm guessing the justification goes something like this (I haven't
> done all the research, so I'll leave it to Ming to fill in the details):
>
> pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").

dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")

> It had this return value defect then, but its min_vecs/max_vecs
> parameters removed the need for callers to iteratively reduce the
> number of IRQs requested and retry the allocation, so they didn't
> need to distinguish -ENOSPC from -EINVAL.
>
> In v5.0, XXX ("...") added IRQ sets to the interface, which

3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one
for writes")

> reintroduced the need to check for -ENOSPC and possibly reduce the
> number of IRQs requested and retry the allocation.

Thanks,
Ming
On Wed, Jan 16, 2019 at 06:46:32AM +0800, Ming Lei wrote:
> Hi Bjorn,
>
> I think Christoph and Jens are correct, we should make this patch into
> 5.0 because the issue is triggered since 3b6592f70ad7b4c2 ("nvme: utilize
> two queue maps, one for reads and one for writes"), which is merged to
> 5.0-rc.
>
> For example, before 3b6592f70ad7b4c2, one nvme controller may be
> allocated 64 irq vectors; but after that commit, only 1 irq vector
> is assigned to this controller.
>
> On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > > >> Applied to pci/msi for v5.1, thanks!
> > > >>
> > > >> If this is something that should be in v5.0, let me know and include the
> > > >> justification, e.g., something we already merged for v5.0 or regression
> > > >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > > >
> > > > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> > >
> > > I think we should - the API was introduced in this series, I think there's
> > > little (to no) reason NOT to fix it for 5.0.
> >
> > I'm guessing the justification goes something like this (I haven't
> > done all the research, so I'll leave it to Ming to fill in the details):
> >
> > pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
>
> dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")
>
> > It had this return value defect then, but its min_vecs/max_vecs
> > parameters removed the need for callers to iteratively reduce the
> > number of IRQs requested and retry the allocation, so they didn't
> > need to distinguish -ENOSPC from -EINVAL.
> >
> > In v5.0, XXX ("...") added IRQ sets to the interface, which
>
> 3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one
> for writes")
>
> > reintroduced the need to check for -ENOSPC and possibly reduce the
> > number of IRQs requested and retry the allocation.

We're fixing a PCI core defect, so we should mention the relevant PCI core
commits, not the nvme-specific ones.  I looked them up for you and moved
this to for-linus for v5.0.

commit 77f88abd4a6f73a1a68dbdc0e3f21575fd508fc3
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Jan 15 17:31:29 2019 -0600

    PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()

    The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC if
    fewer than @min_vecs interrupt vectors are available for @dev.

    However, if a device supports MSI-X but not MSI and a caller requests
    @min_vecs that can't be satisfied by MSI-X, we previously returned
    -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.

    When -ENOSPC is returned, callers may reduce the number of IRQs they
    request and try again.  Most callers can use the @min_vecs and @max_vecs
    parameters to avoid this retry loop, but that doesn't work when using
    IRQ affinity "nr_sets" because rebalancing the sets is driver-specific.

    This return value bug has been present since pci_alloc_irq_vectors() was
    added in v4.10 by aff171641d18 ("PCI: Provide sensible IRQ vector
    alloc/free routines"), but it wasn't an issue because @min_vecs/@max_vecs
    removed the need for callers to iteratively reduce the number of IRQs
    requested and retry the allocation, so they didn't need to distinguish
    -ENOSPC from -EINVAL.

    In v5.0, 6da4b3ab9a6e ("genirq/affinity: Add support for allocating
    interrupt sets") added IRQ sets to the interface, which reintroduced the
    need to check for -ENOSPC and possibly reduce the number of IRQs
    requested and retry the allocation.

    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Jens Axboe <axboe@fb.com>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Christoph Hellwig <hch@lst.de>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
                                    const struct irq_affinity *affd)
 {
         static const struct irq_affinity msi_default_affd;
-        int vecs = -ENOSPC;
+        int msix_vecs = -ENOSPC;
+        int msi_vecs = -ENOSPC;
 
         if (flags & PCI_IRQ_AFFINITY) {
                 if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
         }
 
         if (flags & PCI_IRQ_MSIX) {
-                vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-                                               affd);
-                if (vecs > 0)
-                        return vecs;
+                msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+                                                    max_vecs, affd);
+                if (msix_vecs > 0)
+                        return msix_vecs;
         }
 
         if (flags & PCI_IRQ_MSI) {
-                vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-                if (vecs > 0)
-                        return vecs;
+                msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+                                                  affd);
+                if (msi_vecs > 0)
+                        return msi_vecs;
         }
 
         /* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
                 }
         }
 
-        return vecs;
+        if (msix_vecs == -ENOSPC)
+                return -ENOSPC;
+        return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
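[Editorial note] The "nr_sets" case from the changelog can be sketched as follows, using the v5.0-era struct irq_affinity fields (nr_sets and sets). Everything here is hypothetical apart from the API itself: because the split between the two sets is driver policy, the driver passes min == max and does its own rebalance-and-retry when it sees -ENOSPC:

```c
#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/pci.h>

/* Hypothetical driver with separate "read" and "write" queue sets. */
static int example_alloc_sets(struct pci_dev *pdev,
			      unsigned int read_qs, unsigned int write_qs)
{
	int sets[2];
	struct irq_affinity affd = {
		.pre_vectors	= 1,			/* e.g. one admin vector */
		.nr_sets	= ARRAY_SIZE(sets),
		.sets		= sets,
	};
	int ret;

	while (read_qs && write_qs) {
		unsigned int total = 1 + read_qs + write_qs;

		sets[0] = read_qs;
		sets[1] = write_qs;

		/* min == max: the PCI core cannot rebalance the sets for us. */
		ret = pci_alloc_irq_vectors_affinity(pdev, total, total,
						     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
						     &affd);
		if (ret > 0)
			return ret;
		if (ret != -ENOSPC)
			return ret;	/* only -ENOSPC means "try a smaller split" */

		/* Driver-specific rebalancing: shrink the larger set first. */
		if (read_qs >= write_qs)
			read_qs--;
		else
			write_qs--;
	}
	return -ENOSPC;
}
```

A real driver would choose its own rebalancing policy; the point of the sketch is only that the retry decision hinges on seeing -ENOSPC rather than -EINVAL, which is exactly what the applied commit guarantees.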
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
                                    const struct irq_affinity *affd)
 {
         static const struct irq_affinity msi_default_affd;
-        int vecs = -ENOSPC;
+        int msix_vecs = -ENOSPC;
+        int msi_vecs = -ENOSPC;
 
         if (flags & PCI_IRQ_AFFINITY) {
                 if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
         }
 
         if (flags & PCI_IRQ_MSIX) {
-                vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-                                               affd);
-                if (vecs > 0)
-                        return vecs;
+                msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+                                                    max_vecs, affd);
+                if (msix_vecs > 0)
+                        return msix_vecs;
         }
 
         if (flags & PCI_IRQ_MSI) {
-                vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-                if (vecs > 0)
-                        return vecs;
+                msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+                                                  affd);
+                if (msi_vecs > 0)
+                        return msi_vecs;
         }
 
         /* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
                 }
         }
 
-        return vecs;
+        if (msix_vecs == -ENOSPC)
+                return -ENOSPC;
+        return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
The API of pci_alloc_irq_vectors_affinity() requires to return -ENOSPC
if fewer than @min_vecs interrupt vectors are available for @dev.

However, if a device supports MSI-X but not MSI and a caller requests
@min_vecs that can't be satisfied by MSI-X, we previously returned
-EINVAL (from the failed attempt to enable MSI), not -ENOSPC.

Users of pci_alloc_irq_vectors_affinity() may try to reduce irq vectors
and allocate vectors again in case that -ENOSPC is returned, such as NVMe,
so we need to respect the current interface and give preference to -ENOSPC.

Especially the min/max vecs doesn't work correctly when using the
irq_affinity nr_sets because rebalancing the set counts is driver specific.
To get around that, drivers using nr_sets have to set min and max to the
same value and handle the "reduce and try again".

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/pci/msi.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)