diff mbox

[1/2] PCI: Provide sensible irq vector alloc/free routines

Message ID 20160512073552.GA4027@lst.de (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Christoph Hellwig May 12, 2016, 7:35 a.m. UTC
Hi Alex,

what do you think about the incremental patch below?  This should
address the concerns about the strange PPC bridges, although I don't
have a way to test one:

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander Gordeev May 12, 2016, 9:44 a.m. UTC | #1
On Thu, May 12, 2016 at 09:35:52AM +0200, Christoph Hellwig wrote:
> Hi Alex,
> 
> what do you think about the incremental patch below?  This should
> address the concerns about the strange PPC bridges, although I don't
> have a way to test one:
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a510484..32ce65e 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1177,6 +1177,7 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int nr_vecs,
>  	if (WARN_ON_ONCE(dev->msi_enabled || dev->msix_enabled))
>  		return -EINVAL;
>  
> +retry:

A retry does not seem needs checking MSI support each time, nor reading
MSI header info like number of vectors (not visible in this patch).

>  	if (!pci_msi_supported(dev, 1))
>  		goto use_legacy_irq;
>  
> @@ -1191,17 +1192,26 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int nr_vecs,
>  	if (!dev->irqs)
>  		return -ENOMEM;
>  
> -	if (dev->msix_cap && !(flags & PCI_IRQ_NOMSIX))
> +	if (dev->msix_cap && !(flags & PCI_IRQ_NOMSIX)) {
>  		ret = __pci_enable_msix(dev, nr_vecs);
> -	else
> +		if (ret < 0)
> +			flags |= PCI_IRQ_NOMSIX;
> +	} else {
>  		ret = __pci_enable_msi(dev, nr_vecs);
> -	if (ret)
> -		goto out_free_irqs;
> +	}
>  
> -	return 0;
> +	/* if we succeeded getting all vectors return the number we got: */
> +	if (!ret)
> +		return nr_vecs;
>  
> -out_free_irqs:
>  	kfree(dev->irqs);
> +	/* if ret is positive it's the numbers of vectors we can use, retry: */
> +	if (ret > 0) {
> +		nr_vecs = ret;
> +		goto retry;
> +	}

Okay, now we have a subtlety here. You switch from MSI-X to MSI in case
MSI-X retries exhausted. At this point nr_vecs is lowest (likely 1). So
you start trying MSI from 1 rather from 32. I would suggest two subsequent
loops instead.

Another thing, pci_alloc_irq_vectors() tries to allocate vectors in a
range from 1 to nr_vecs now. So this function implicitly falls into
the other two range functions family and therefore:

  (a) pci_alloc_irq_vectors() name is not perfec;
  (b) why not introduce 'minvec' minimal number of interrupts then?
      We could have a handy pci_enable_irq_range() as result;
  (c) if you do (b) then PCI_IRQ_NOMSIX flag becomes redundant, since
      caller would invoke pci_enable_msi_range() instead;

> +
> +	/* no MSI or MSI-X vectors available, fall back to the legacy IRQ: */
>  use_legacy_irq:
>  	dev->irqs = &dev->irq;
>  	return 1;
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig May 12, 2016, 11:03 a.m. UTC | #2
On Thu, May 12, 2016 at 11:44:33AM +0200, Alexander Gordeev wrote:
> On Thu, May 12, 2016 at 09:35:52AM +0200, Christoph Hellwig wrote:
> > Hi Alex,
> > 
> > what do you think about the incremental patch below?  This should
> > address the concerns about the strange PPC bridges, although I don't
> > have a way to test one:
> > 
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index a510484..32ce65e 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -1177,6 +1177,7 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int nr_vecs,
> >  	if (WARN_ON_ONCE(dev->msi_enabled || dev->msix_enabled))
> >  		return -EINVAL;
> >  
> > +retry:
> 
> A retry does not seem needs checking MSI support each time, nor reading
> MSI header info like number of vectors (not visible in this patch).

I can easily skip the pci_msi_supported check, but we either need
to re-read the number of vectors, as that is done differently for
MSI vs MSIX, or totally restructure the code, which doesn't seem worth it,
especially as the existing code also re-reads it every time.

> > -out_free_irqs:
> >  	kfree(dev->irqs);
> > +	/* if ret is positive it's the numbers of vectors we can use, retry: */
> > +	if (ret > 0) {
> > +		nr_vecs = ret;
> > +		goto retry;
> > +	}
> 
> Okay, now we have a subtlety here. You switch from MSI-X to MSI in case
> MSI-X retries exhausted. At this point nr_vecs is lowest (likely 1). So
> you start trying MSI from 1 rather from 32. I would suggest two subsequent
> loops instead.

So bridges could support less MSI-X entries than MSI ones?  Yikes.

> Another thing, pci_alloc_irq_vectors() tries to allocate vectors in a
> range from 1 to nr_vecs now. So this function implicitly falls into
> the other two range functions family and therefore:
> 
>   (a) pci_alloc_irq_vectors() name is not perfec;

What would you call it instead?

>   (b) why not introduce 'minvec' minimal number of interrupts then?
>       We could have a handy pci_enable_irq_range() as result;

That seems pretty pointless, when the caller can simply treat a too
small number as failure and use the existing failure path for that.

>   (c) if you do (b) then PCI_IRQ_NOMSIX flag becomes redundant, since
>       caller would invoke pci_enable_msi_range() instead;

pci_enable_msi_range still has a horrible API that forces the caller
to deal with the irq numbers differently than the MSI-X case, so it
should also go away in the long run.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev May 12, 2016, 12:11 p.m. UTC | #3
On Thu, May 12, 2016 at 01:03:18PM +0200, Christoph Hellwig wrote:
> > Okay, now we have a subtlety here. You switch from MSI-X to MSI in case
> > MSI-X retries exhausted. At this point nr_vecs is lowest (likely 1). So
> > you start trying MSI from 1 rather from 32. I would suggest two subsequent
> > loops instead.
> 
> So bridges could support less MSI-X entries than MSI ones?  Yikes.

Not this way. MSI vectors could be a scarce resource in a platform. So
even though devices could support more MSI-Xs than MSIs the underlying
platform might fail to fulfil a device request. 

> > Another thing, pci_alloc_irq_vectors() tries to allocate vectors in a
> > range from 1 to nr_vecs now. So this function implicitly falls into
> > the other two range functions family and therefore:
> > 
> >   (a) pci_alloc_irq_vectors() name is not perfec;
> 
> What would you call it instead?

I do not know, really :( I would expect "range" within the name since
a range requested indeed, but I am just hinting here.

> >   (b) why not introduce 'minvec' minimal number of interrupts then?
> >       We could have a handy pci_enable_irq_range() as result;
> 
> That seems pretty pointless, when the caller can simply treat a too
> small number as failure and use the existing failure path for that.

There was a huge discussion on this few years ago, when the range
functions were introduced. Actually, the prototypes of these two is
the outcome of that discussion. I almost sure your point was expressed
by many at the time ;)

> >   (c) if you do (b) then PCI_IRQ_NOMSIX flag becomes redundant, since
> >       caller would invoke pci_enable_msi_range() instead;
> 
> pci_enable_msi_range still has a horrible API that forces the caller
> to deal with the irq numbers differently than the MSI-X case, so it
> should also go away in the long run.

Well, if we introduce pci_enable_irq_range() (or something) and
pci_get_dev_irq(int vector) (or something) that covers MSI-X, MSI and
legacy IRQs then we can get it done now. Your pci_alloc_irq_vectors()
is just few steps from there, huh?

(Sorry for the weird quotting)
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig May 12, 2016, 12:19 p.m. UTC | #4
On Thu, May 12, 2016 at 02:11:57PM +0200, Alexander Gordeev wrote:
> Not this way. MSI vectors could be a scarce resource in a platform. So
> even though devices could support more MSI-Xs than MSIs the underlying
> platform might fail to fulfil a device request. 

I can go back to just calling the existing exported functions like
the first version did for now, that'll handle it for now.

> > > Another thing, pci_alloc_irq_vectors() tries to allocate vectors in a
> > > range from 1 to nr_vecs now. So this function implicitly falls into
> > > the other two range functions family and therefore:
> > > 
> > >   (a) pci_alloc_irq_vectors() name is not perfec;
> > 
> > What would you call it instead?
> 
> I do not know, really :( I would expect "range" within the name since
> a range requested indeed, but I am just hinting here.

We're requesting multiple vectors, so I think the naming should be fine.

> > >   (b) why not introduce 'minvec' minimal number of interrupts then?
> > >       We could have a handy pci_enable_irq_range() as result;
> > 
> > That seems pretty pointless, when the caller can simply treat a too
> > small number as failure and use the existing failure path for that.
> 
> There was a huge discussion on this few years ago, when the range
> functions were introduced. Actually, the prototypes of these two is
> the outcome of that discussion. I almost sure your point was expressed
> by many at the time ;)

A quick audit shows that there are indeed a few users of this
interface.  I can add pci_alloc_irq_vectors_range that allows passing
a minvec, and the old pci_alloc_irq_vectors interface for everyone
who wants to keep it simple.

> > pci_enable_msi_range still has a horrible API that forces the caller
> > to deal with the irq numbers differently than the MSI-X case, so it
> > should also go away in the long run.
> 
> Well, if we introduce pci_enable_irq_range() (or something) and
> pci_get_dev_irq(int vector) (or something) that covers MSI-X, MSI and
> legacy IRQs then we can get it done now. Your pci_alloc_irq_vectors()
> is just few steps from there, huh?

Or we can just look at pdev->irqs as in my patch.  That'll work without
much overhead for the single IRQ case as it can just point to ->irq,
and gives a neat interface for MSI-X and the rare multi-MSI case.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a510484..32ce65e 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1177,6 +1177,7 @@  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int nr_vecs,
 	if (WARN_ON_ONCE(dev->msi_enabled || dev->msix_enabled))
 		return -EINVAL;
 
+retry:
 	if (!pci_msi_supported(dev, 1))
 		goto use_legacy_irq;
 
@@ -1191,17 +1192,26 @@  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int nr_vecs,
 	if (!dev->irqs)
 		return -ENOMEM;
 
-	if (dev->msix_cap && !(flags & PCI_IRQ_NOMSIX))
+	if (dev->msix_cap && !(flags & PCI_IRQ_NOMSIX)) {
 		ret = __pci_enable_msix(dev, nr_vecs);
-	else
+		if (ret < 0)
+			flags |= PCI_IRQ_NOMSIX;
+	} else {
 		ret = __pci_enable_msi(dev, nr_vecs);
-	if (ret)
-		goto out_free_irqs;
+	}
 
-	return 0;
+	/* if we succeeded getting all vectors return the number we got: */
+	if (!ret)
+		return nr_vecs;
 
-out_free_irqs:
 	kfree(dev->irqs);
+	/* if ret is positive it's the numbers of vectors we can use, retry: */
+	if (ret > 0) {
+		nr_vecs = ret;
+		goto retry;
+	}
+
+	/* no MSI or MSI-X vectors available, fall back to the legacy IRQ: */
 use_legacy_irq:
 	dev->irqs = &dev->irq;
 	return 1;