Message ID | 20090507082841.GA31751@redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote: > pci_enable_msix currently returns -EINVAL if you ask > for more vectors than supported by the device, which would > typically cause fallback to regular interrupts. > > It's better to return the table size, making the driver retry > MSI-X with less vectors. Hi Michael I think driver should read from capability list to know how many vector supported by this device before enable MSI-X for device, as pci_msix_table_size() did...
On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote: > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote: > > pci_enable_msix currently returns -EINVAL if you ask > > for more vectors than supported by the device, which would > > typically cause fallback to regular interrupts. > > > > It's better to return the table size, making the driver retry > > MSI-X with less vectors. > > Hi Michael > > I think driver should read from capability list to know how many vector > supported by this device before enable MSI-X for device, as > pci_msix_table_size() did... Drivers can do this, but it's more code. Since pci_enable_msix calls pci_msix_table_size already, let it do the work. Right? > -- > regards > Yang, Sheng > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > > > Hi Jesse, > > This came up when I was adding MSI-X support to virtio pci driver, > > which does not know the exact table size upfront. > > Could you consider this patch for 2.6.31 please? > > > > > > drivers/pci/msi.c | 6 +++--- > > 1 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index 6f2e629..f5bd1c9 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -687,8 +687,8 @@ int pci_msix_table_size(struct pci_dev *dev) > > * indicates the successful configuration of MSI-X capability structure > > * with new allocated MSI-X irqs. A return of < 0 indicates a failure. > > * Or a return of > 0 indicates that driver request is exceeding the > > number - * of irqs available. Driver should use the returned value to > > re-send - * its request. > > + * of irqs or MSI-X vectors available. Driver should use the returned > > value to + * re-send its request. > > **/ > > int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int > > nvec) { > > @@ -704,7 +704,7 @@ int pci_enable_msix(struct pci_dev* dev, struct > > msix_entry *entries, int nvec) > > > > nr_entries = pci_msix_table_size(dev); > > if (nvec > nr_entries) > > - return -EINVAL; > > + return nr_entries; > > > > /* Check for any invalid entries */ > > for (i = 0; i < nvec; i++) { >
On Thursday 07 May 2009 17:05:06 Michael S. Tsirkin wrote: > On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote: > > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote: > > > pci_enable_msix currently returns -EINVAL if you ask > > > for more vectors than supported by the device, which would > > > typically cause fallback to regular interrupts. > > > > > > It's better to return the table size, making the driver retry > > > MSI-X with less vectors. > > > > Hi Michael > > > > I think driver should read from capability list to know how many vector > > supported by this device before enable MSI-X for device, as > > pci_msix_table_size() did... > > Drivers can do this, but it's more code. Since pci_enable_msix > calls pci_msix_table_size already, let it do the work. Right? If you don't know the vectors number before you enable MSI-X, how can you setup vectors? I don't think it's proper way to go.
On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote: > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote: > > pci_enable_msix currently returns -EINVAL if you ask > > for more vectors than supported by the device, which would > > typically cause fallback to regular interrupts. > > > > It's better to return the table size, making the driver retry > > MSI-X with less vectors. > > Hi Michael > > I think driver should read from capability list to know how many vector > supported by this device before enable MSI-X for device, as > pci_msix_table_size() did... I think Michael's patch makes sense. It reduces the amount of work the driver has to do without requiring any additional work in the core. I don't see the disadvantage to it. Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
On Thu, May 07, 2009 at 05:10:46PM +0800, Sheng Yang wrote: > > > I think driver should read from capability list to know how many vector > > > supported by this device before enable MSI-X for device, as > > > pci_msix_table_size() did... > > > > Drivers can do this, but it's more code. Since pci_enable_msix > > calls pci_msix_table_size already, let it do the work. Right? > > If you don't know the vectors number before you enable MSI-X, how can you > setup vectors? I don't know how many irqs I will be able to get anyway. vectors that can't be assigned are useless ...
On Thursday 07 May 2009 17:27:31 Matthew Wilcox wrote: > On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote: > > On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote: > > > pci_enable_msix currently returns -EINVAL if you ask > > > for more vectors than supported by the device, which would > > > typically cause fallback to regular interrupts. > > > > > > It's better to return the table size, making the driver retry > > > MSI-X with less vectors. > > > > Hi Michael > > > > I think driver should read from capability list to know how many vector > > supported by this device before enable MSI-X for device, as > > pci_msix_table_size() did... > > I think Michael's patch makes sense. It reduces the amount of work the > driver has to do without requiring any additional work in the core. I > don't see the disadvantage to it. > > Reviewed-by: Matthew Wilcox <willy@linux.intel.com> It's indeed weird. Why the semantic of pci_enable_msix can be changed to "enable msix, or tell me how many vector do you have"? You can simply call pci_msix_table_size() to get what you want, also without any more work, no? I can't understand...
On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote: > It's indeed weird. Why the semantic of pci_enable_msix can be changed to > "enable msix, or tell me how many vector do you have"? You can simply call > pci_msix_table_size() to get what you want, also without any more work, no? I > can't understand... Here's a good example. Let's suppose you have a driver which supports two different models of cards, one has 16 MSI-X interrupts, the other has 10. You can call pci_enable_msix() asking for 16 vectors. If your card is model A, you get 16 interrupts. If your card is model B, it says "you can have 10". This is less work in the driver (since it must implement falling back to a smaller number of interrupts *anyway*) than interrogating the card to find out how many interrupts there are, then requesting the right number, and still having the fallback path which is going to be less tested.
On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote: > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote: > > It's indeed weird. Why the semantic of pci_enable_msix can be changed to > > "enable msix, or tell me how many vector do you have"? You can simply > > call pci_msix_table_size() to get what you want, also without any more > > work, no? I can't understand... > > Here's a good example. Let's suppose you have a driver which supports > two different models of cards, one has 16 MSI-X interrupts, the other > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > card is model A, you get 16 interrupts. If your card is model B, it says > "you can have 10". > > This is less work in the driver (since it must implement falling back to > a smaller number of interrupts *anyway*) than interrogating the card to > find out how many interrupts there are, then requesting the right number, > and still having the fallback path which is going to be less tested. Yeah, partly understand now. But the confusing of return value is not that pleasure compared to this benefit. And even you have to fall back if return > 0 anyway, but in the past, you just need fall back once at most; but now you may fall back twice. This make thing more complex - you need either two ifs or a simple loop. And just one "if" can deal with it before. All that required is one call for pci_msix_table_size(), and I believe most driver would like to know how much vector it have before it fill the vectors, so mostly no extra cost. But for this ambiguous return meaning, you have to add more code for fall back - yes, the driver may can assert that the positive return value always would be irq numbers if it call pci_msix_table_size() before, but is it safe in logic?
On Thu, 2009-05-07 at 03:53 -0600, Matthew Wilcox wrote: > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote: > > It's indeed weird. Why the semantic of pci_enable_msix can be changed to > > "enable msix, or tell me how many vector do you have"? You can simply call > > pci_msix_table_size() to get what you want, also without any more work, no? I > > can't understand... > > Here's a good example. Let's suppose you have a driver which supports > two different models of cards, one has 16 MSI-X interrupts, the other > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > card is model A, you get 16 interrupts. If your card is model B, it says > "you can have 10". > > This is less work in the driver (since it must implement falling back to > a smaller number of interrupts *anyway*) than interrogating the card to > find out how many interrupts there are, then requesting the right number, > and still having the fallback path which is going to be less tested. Not to mention that there's no guarantee that you'll get as many interrupts as the device supports, so you should really be coding to cope with that anyway. Like the example in MSI-HOWTO.txt: 197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) 198 { 199 while (nvec >= FOO_DRIVER_MINIMUM_NVEC) { 200 rc = pci_enable_msix(adapter->pdev, 201 adapter->msix_entries, nvec); 202 if (rc > 0) 203 nvec = rc; 204 else 205 return rc; 206 } 207 208 return -ENOSPC; 209 } So I agree, this patch is an improvement. cheers
On Thursday 07 May 2009 18:23:50 Michael Ellerman wrote: > On Thu, 2009-05-07 at 03:53 -0600, Matthew Wilcox wrote: > > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote: > > > It's indeed weird. Why the semantic of pci_enable_msix can be changed > > > to "enable msix, or tell me how many vector do you have"? You can > > > simply call pci_msix_table_size() to get what you want, also without > > > any more work, no? I can't understand... > > > > Here's a good example. Let's suppose you have a driver which supports > > two different models of cards, one has 16 MSI-X interrupts, the other > > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > > card is model A, you get 16 interrupts. If your card is model B, it says > > "you can have 10". > > > > This is less work in the driver (since it must implement falling back to > > a smaller number of interrupts *anyway*) than interrogating the card to > > find out how many interrupts there are, then requesting the right number, > > and still having the fallback path which is going to be less tested. > > Not to mention that there's no guarantee that you'll get as many > interrupts as the device supports, so you should really be coding to > cope with that anyway. Like the example in MSI-HOWTO.txt: > > 197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int > nvec) 198 { > 199 while (nvec >= FOO_DRIVER_MINIMUM_NVEC) { > 200 rc = pci_enable_msix(adapter->pdev, > 201 adapter->msix_entries, nvec); > 202 if (rc > 0) > 203 nvec = rc; > 204 else > 205 return rc; > 206 } > 207 > 208 return -ENOSPC; > 209 } > > So I agree, this patch is an improvement. > Oh yeah. Forgot irq counts can also be changed from time to time. OK, there should be a loop, so that's fine. :)
On Thu, May 07, 2009 at 06:19:53PM +0800, Sheng Yang wrote: > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote: > > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote: > > > It's indeed weird. Why the semantic of pci_enable_msix can be changed to > > > "enable msix, or tell me how many vector do you have"? You can simply > > > call pci_msix_table_size() to get what you want, also without any more > > > work, no? I can't understand... > > > > Here's a good example. Let's suppose you have a driver which supports > > two different models of cards, one has 16 MSI-X interrupts, the other > > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > > card is model A, you get 16 interrupts. If your card is model B, it says > > "you can have 10". > > > > This is less work in the driver (since it must implement falling back to > > a smaller number of interrupts *anyway*) than interrogating the card to > > find out how many interrupts there are, then requesting the right number, > > and still having the fallback path which is going to be less tested. > > Yeah, partly understand now. > > But the confusing of return value is not that pleasure compared to this > benefit. And even you have to fall back if return > 0 anyway, but in the past, > you just need fall back once at most; but now you may fall back twice. I don't think that's right - you might not be able to get the number of interrupts that pci_enable_msix reported. > This > make thing more complex - you need either two ifs or a simple loop. And just > one "if" can deal with it before. All that required is one call for > pci_msix_table_size(), and I believe most driver would like to know how much > vector it have before it fill the vectors, so mostly no extra cost. But for > this ambiguous return meaning, you have to add more code for fall back - yes, > the driver may can assert that the positive return value always would be irq > numbers if it call pci_msix_table_size() before, but is it safe in logic? If you know how many vectors the card has, then the only failure mode is when you are out of irqs. No change there.
Michael Ellerman wrote: > Not to mention that there's no guarantee that you'll get as many > interrupts as the device supports, so you should really be coding to > cope with that anyway. Like the example in MSI-HOWTO.txt: > > 197 static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) > 198 { > 199 while (nvec >= FOO_DRIVER_MINIMUM_NVEC) { > 200 rc = pci_enable_msix(adapter->pdev, > 201 adapter->msix_entries, nvec); > 202 if (rc > 0) > 203 nvec = rc; > 204 else > 205 return rc; > 206 } > 207 > 208 return -ENOSPC; > 209 } > > So I agree, this patch is an improvement. > I imagine this loop is present in many drivers. So why not add a helper static int pci_enable_msix_min(struct foo_adapter *adapter, int min_nvec) { int nvec = 2048; while (nvec >= min_nvec) { rc = pci_enable_msix(adapter->pdev, adapter->msix_entries, nvec); if (rc == 0) return nvec; else if (rc > 0) nvec = rc; else return rc; } return -ENOSPC; }
On Thu, May 07, 2009 at 01:44:38PM +0300, Avi Kivity wrote:
> I imagine this loop is present in many drivers. So why not add a helper
Let's look!
arch/x86/kernel/amd_iommu_init.c : Needs an exact number of vectors.
drivers/block/cciss.c : If it doesn't get all 4 vectors, falls back to pin mode (instead of MSI mode -- bug!)
drivers/dma/ioat_dma.c : Falls back down to 1 vector if it can't get nvec
drivers/infiniband/hw/mthca/mthca_main.c : Reverts to MSI if it can't get 3.
drivers/net/benet/be_main.c : Falls back to MSI if it can't get 2.
drivers/net/bnx2.c : Falls back to MSI if it can't get 9.
drivers/net/bnx2x_main.c : Falls back to MSI if it can't get N.
drivers/net/e1000e/netdev.c: Falls back to MSI if it can't get N.
drivers/net/enic/enic_main.c: Falls back to MSI if it can't get N.
drivers/net/forcedeth.c: Falls back to MSI if it can't get N.
drivers/net/igb/igb_main.c: Falls back to MSI if it can't get N.
drivers/net/igbvf/netdev.c: Falls back to MSI if it can't get 3.
drivers/net/myri10ge/myri10ge.c: Falls back to Pin if it can't get N.
drivers/net/netxen/netxen_nic_main.c: Falls back to MSI if it can't get N.
drivers/net/qlge/qlge_main.c: Falls back to MSI if it can't get N.
drivers/net/s2io.c: Falls back to MSI if it can't get N.
drivers/net/vxge/vxge-main.c: Falls back once to a second number.
drivers/pci/pcie/portdrv_core.c: Falls back to MSI if it can't get all of them.
drivers/scsi/lpfc/lpfc_init.c: Falls back to MSI if it can't get N.
drivers/scsi/mpt2sas/mpt2sas_base.c: Only allocates 1.
drivers/net/cxgb3/cxgb3_main.c: Actually falls back ... in a bit of a weird way. This one could definitely do with the proposed loop.
drivers/net/ixgbe/ixgbe_main.c: Could also be improved with this loop.
drivers/net/mlx4/main.c: Nasty goto-based loop.
drivers/net/niu.c: Ditto
drivers/net/sfc/efx.c: Only falls back once. Would benefit from loop.
drivers/scsi/qla2xxx/qla_isr.c: Only falls back once. Would benefit from loop.
drivers/staging/sxg/sxg.c: /*Should try with less vector returned.*/
so ... 7 drivers would benefit from this loop. 20 seem like they wouldn't.
What a lot of drivers seem to do is fall back either to a single MSI or
just pin-based interrupts when they can't get as many interrupts as they
would like. They don't try to get a single MSI-X interrupt. I feel an
update to the MSI-HOWTO coming on.
On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote: > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote: > > Here's a good example. Let's suppose you have a driver which supports > > two different models of cards, one has 16 MSI-X interrupts, the other > > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > > card is model A, you get 16 interrupts. If your card is model B, it says > > "you can have 10". Sheng is absolutely right, that's a horrid API. If it actually enabled that number and returned it, it might make sense (cf. write() returning less bytes than you give it). But overloading the return value to save an explicit call is just ugly; it's not worth saving a few lines of code at cost of making all the drivers subtle and tricksy. Fail with -ENOSPC or something. Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 08, 2009 at 09:25:00AM +0930, Rusty Russell wrote:
> Sheng is absolutely right, that's a horrid API.
But the API already exists, and is in use by 27 drivers. If this were a
change to the API, I'd be against it, but it is the existing API.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2009-05-08 at 09:25 +0930, Rusty Russell wrote: > On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote: > > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote: > > > Here's a good example. Let's suppose you have a driver which supports > > > two different models of cards, one has 16 MSI-X interrupts, the other > > > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > > > card is model A, you get 16 interrupts. If your card is model B, it says > > > "you can have 10". > > Sheng is absolutely right, that's a horrid API. It's not horrid, though it is tricky - but only for drivers that care, you can still do: if (pci_enable_msix(..)) bail; > If it actually enabled that number and returned it, it might make sense (cf. > write() returning less bytes than you give it). It could do that, but I think that would be worse. The driver, on finding out it can only get a certain number of MSIs might need to make a decision about how it allocates those - eg. in a network driver, sharing them between TX/RX/management. And in practice most of the drivers just bail if they can't get what they asked for, so enabling less than they wanted would just mean they have to go and disable them. > But overloading the return > value to save an explicit call is just ugly; it's not worth saving a few lines > of code at cost of making all the drivers subtle and tricksy. Looking at just this patch, I would agree, but unfortunately it's not that simple. The first limitation on the number of MSIs the driver can have is the number the device supports, that's what this patch does. But there are others, and they come out of the arch code, or even the firmware. So to implement pci_how_many_msis(), we'd need a parallel API all the way down to the arch code, or a flag to all the existing routines saying "don't really allocate, just find out". That would be horrid IMHO. cheers
On Fri, May 08, 2009 at 09:25:00AM +0930, Rusty Russell wrote: > On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote: > > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote: > > > Here's a good example. Let's suppose you have a driver which supports > > > two different models of cards, one has 16 MSI-X interrupts, the other > > > has 10. You can call pci_enable_msix() asking for 16 vectors. If your > > > card is model A, you get 16 interrupts. If your card is model B, it says > > > "you can have 10". > > Sheng is absolutely right, that's a horrid API. > > If it actually enabled that number and returned it, it might make sense (cf. > write() returning less bytes than you give it). But overloading the return > value to save an explicit call is just ugly; it's not worth saving a few lines > of code at cost of making all the drivers subtle and tricksy. > > Fail with -ENOSPC or something. > > Rusty. I do agree that returning a positive value from pci_enable_msix it an ugly API (but note that this is the API that linux currently has). Here's a wrapper that I ended up with in my driver: static int enable_msix(struct pci_dev *dev, struct msix_entry *entries, int *options, int noptions) { int i; for (i = 0; i < noptions; ++i) if (!pci_enable_msix(dev, entries, options[i])) return options[i]; return -EBUSY; } This gets an array of options for # of vectors and tries them one after the other until an option that the system can support is found. On success, we get the # of vectors actually enabled, and driver can then use them as it sees fit. Is there interest in moving something like this to pci.h?
On Thu, 7 May 2009 11:28:41 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > pci_enable_msix currently returns -EINVAL if you ask > for more vectors than supported by the device, which would > typically cause fallback to regular interrupts. > > It's better to return the table size, making the driver retry > MSI-X with less vectors. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > > Hi Jesse, > This came up when I was adding MSI-X support to virtio pci driver, > which does not know the exact table size upfront. > Could you consider this patch for 2.6.31 please? Applied this one to my linux-next branch; hopefully Rusty won't mind too much. :)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 6f2e629..f5bd1c9 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -687,8 +687,8 @@ int pci_msix_table_size(struct pci_dev *dev) * indicates the successful configuration of MSI-X capability structure * with new allocated MSI-X irqs. A return of < 0 indicates a failure. * Or a return of > 0 indicates that driver request is exceeding the number - * of irqs available. Driver should use the returned value to re-send - * its request. + * of irqs or MSI-X vectors available. Driver should use the returned value to + * re-send its request. **/ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) { @@ -704,7 +704,7 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec) nr_entries = pci_msix_table_size(dev); if (nvec > nr_entries) - return -EINVAL; + return nr_entries; /* Check for any invalid entries */ for (i = 0; i < nvec; i++) {
pci_enable_msix currently returns -EINVAL if you ask for more vectors than supported by the device, which would typically cause fallback to regular interrupts. It's better to return the table size, making the driver retry MSI-X with less vectors. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- Hi Jesse, This came up when I was adding MSI-X support to virtio pci driver, which does not know the exact table size upfront. Could you consider this patch for 2.6.31 please? drivers/pci/msi.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-)