Message ID | 8575dc590b819892f366852fe50835efaf579f4f.1368431413.git.agordeev@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
* Alexander Gordeev <agordeev@redhat.com> wrote:

> When multiple MSIs are enabled with pci_enable_msi_block(), the
> requested number of interrupts 'nvec' is rounded up to the nearest
> power-of-two value. The result is then used for setting up the
> number of MSI messages in the PCI device and allocation of
> interrupt resources in the operating system (i.e. vector numbers).
> Thus, in cases when a device driver requests some number of MSIs
> and this number is not a power-of-two value, the extra operating
> system resources (allocated as the result of rounding) are wasted.
>
> This fix introduces 'msi_desc::nvec' field to address the above
> issue. When non-zero, it will report the actual number of MSIs the
> device will send, as requested by the device driver. This value
> should be used by architectures to properly set up and tear down
> associated interrupt resources.
>
> Note, although the existing 'msi_desc::multiple' field might seem
> redundant, in fact in does not. In general case the number of MSIs a
> PCI device is initialized with is not necessarily the closest power-
> of-two value of the number of MSIs the device will send. Thus, in
> theory it would not be always possible to derive the former from the
> latter and we need to keep them both, to stress this corner case.
> Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
> would not save us any space.
>
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

Would be nice to have an Acked-by from Bjorn for this patch.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 13, 2013 at 11:05:48AM +0200, Alexander Gordeev wrote:
> Note, although the existing 'msi_desc::multiple' field might seem
> redundant, in fact in does not. In general case the number of MSIs a
> PCI device is initialized with is not necessarily the closest power-
> of-two value of the number of MSIs the device will send. Thus, in
> theory it would not be always possible to derive the former from the
> latter and we need to keep them both, to stress this corner case.
> Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
> would not save us any space.

The last paragraph makes me curious. The only place where 'multiple' is set is
in do_setup_msi_irqs() and this uses the next power of two for it. And since a
device is not enabled twice, it is not overridden.
So it should be possible to compute 'multiple' out of 'nvec' but it saves
cycles not do to so. I agree to keep 'multiple' but your argument does not
seem to make sense.

While nitpicking, 'nvec' might deserve a better comment than 'number of
messages' since it holds the number of allocated interrupts. :)

Sebastian
On Wed, Jun 5, 2013 at 2:56 PM, Sebastian Andrzej Siewior <sebastian@breakpoint.cc> wrote:
> On Mon, May 13, 2013 at 11:05:48AM +0200, Alexander Gordeev wrote:
>> Note, although the existing 'msi_desc::multiple' field might seem
>> redundant, in fact in does not. In general case the number of MSIs a
>> PCI device is initialized with is not necessarily the closest power-
>> of-two value of the number of MSIs the device will send. Thus, in
>> theory it would not be always possible to derive the former from the
>> latter and we need to keep them both, to stress this corner case.
>> Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
>> would not save us any space.
>
> The last paragraph makes me curious. The only place where 'multiple' is set is
> in do_setup_msi_irqs() and this uses the next power of two for it. And since a
> device is not enabled twice, it is not overridden.
> So it should be possible to compute 'multiple' out of 'nvec' but it saves
> cycles not do to so. I agree to keep 'multiple' but your argument does not
> seem to make sense.

Alexander had an example device that advertised 16 vectors, but the
driver knew that it could only generate 6. That's a case where we
can't compute 'multiple' from 'nvec' (assuming the driver supplies
'nvec == 6'). If we just rounded up to compute 'multiple', I think
we'd compute 8 instead of 16.

> While nitpicking, 'nvec' might deserve a better comment than 'number of
> messages' since it holds the number of allocated interrupts. :)

I did change the name 'nvec' to 'nvec_used', which should help a bit.
But I agree that it's still somewhat confusing.

BTW, the patches actually in my tree are at
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/alexander-msi
(I tweaked this name and some comments slightly).

Bjorn
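The arithmetic behind the 6-versus-16 example can be checked with a small userspace sketch. It is not kernel code: log2_roundup() is a hypothetical helper that only mimics the "round nvec up to a power of two" step described in the thread, and it assumes nvec stays within the MSI limit of 32 vectors.

```c
/*
 * Hypothetical helper (not part of drivers/pci/msi.c): return the smallest
 * exponent k such that (1 << k) >= nvec. This models how a 'multiple'-style
 * exponent would be derived from 'nvec' by rounding up alone.
 * Assumes nvec <= 32, the MSI maximum, so the shift cannot overflow.
 */
static unsigned int log2_roundup(unsigned int nvec)
{
	unsigned int k = 0;

	while ((1u << k) < nvec)
		k++;
	return k;
}
```

With nvec == 6 the helper yields exponent 3, i.e. 8 messages, whereas a device that advertises 16 vectors corresponds to exponent 4. That gap is why the advertised count cannot be recovered from the driver-supplied count in that case.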
On Wed, Jun 05, 2013 at 03:09:34PM -0600, Bjorn Helgaas wrote:
>
> Alexander had an example device that advertised 16 vectors, but the
> driver knew that it could only generate 6. That's a case where we
> can't compute 'multiple' from 'nvec' (assuming the driver supplies
> 'nvec == 6'). If we just rounded up to compute 'multiple', I think
> we'd compute 8 instead of 16.

Sure, but as I said: the only place where 'multiple' is computed / written it
is doing the round-up thingy.

> > While nitpicking, 'nvec' might deserve a better comment than 'number of
> > messages' since it holds the number of allocated interrupts. :)
>
> I did change the name 'nvec' to 'nvec_used', which should help a bit.
> But I agree that it's still somewhat confusing.
>
> BTW, the patches actually in my tree are at
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/alexander-msi
> (I tweaked this name and some comments slightly).

'nvec_used' is better the comment next to it is still wrong I think.

> Bjorn

Sebastian
On Wed, Jun 05, 2013 at 10:56:38PM +0200, Sebastian Andrzej Siewior wrote:
> On Mon, May 13, 2013 at 11:05:48AM +0200, Alexander Gordeev wrote:
> > Note, although the existing 'msi_desc::multiple' field might seem
> > redundant, in fact in does not. In general case the number of MSIs a
> > PCI device is initialized with is not necessarily the closest power-
> > of-two value of the number of MSIs the device will send. Thus, in
> > theory it would not be always possible to derive the former from the
> > latter and we need to keep them both, to stress this corner case.
> > Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
> > would not save us any space.
>
> The last paragraph makes me curious. The only place where 'multiple' is set is
> in do_setup_msi_irqs() and this uses the next power of two for it. And since a
> device is not enabled twice, it is not overridden.
> So it should be possible to compute 'multiple' out of 'nvec' but it saves
> cycles not do to so. I agree to keep 'multiple' but your argument does not
> seem to make sense.
> While nitpicking, 'nvec' might deserve a better comment than 'number of
> messages' since it holds the number of allocated interrupts. :)

Sebastian,

I re-read my comment few times and I admit it might be confusing. You are
right - 'multiple' is set by rounding up only. The part '...not necessarily
the closest power-of-two value...' implied an abstract PCI device rather than
the described code, but the wording is less than perfect, indeed.

In fact, at the moment of writing I kept in mind a follow-up patch that could
help with aforementioned devices. That would be a new interface:

	int pci_enable_msi_block_partial(struct pci_dev *dev,
					 unsigned int nvec_use,
					 unsigned int nvec_init);

In this case 'nvec_use' would go to 'msi_desc::nvec_used' and 'nvec_init'
would translate to 'msi_desc::multiple' in case 'nvec_init' is not zero.
In case 'nvec_init' is zero, 'msi_desc::multiple' would be initialized
with the maximum possible value for the device (the way it is done now for
pci_enable_msi_block_auto() interface). So, for the AHCI device (Bjorn
mentioned) such a call would conserve on 10 of 16 vectors:

	pci_enable_msi_block_partial(pdev, 6, 0);

What I am not sure is whether we need to read out the maximum possible
number of vectors like pci_enable_msi_block_auto() does:

	int pci_enable_msi_block_partial(struct pci_dev *dev,
					 unsigned int nvec_use,
					 unsigned int nvec_init,
					 unsigned int *maxvec);

I can not think of any use of 'maxvec' with this interface, but the second
variant completes the whole picture about a device...

> Sebastian
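The proposed semantics can be modelled in userspace to check the "10 of 16 vectors" figure. Everything below is a hypothetical sketch: struct msi_desc_model, log2_roundup() and enable_msi_block_partial_model() are invented names, the device maximum is passed in as a plain maxvec argument instead of being read from the MSI capability, and the -1 error return is an assumption, since the thread does not define error handling.

```c
/* Hypothetical stand-in for the two msi_desc fields discussed above. */
struct msi_desc_model {
	unsigned int multiple;	/* log2 of the MSI count programmed into the device */
	unsigned int nvec_used;	/* vectors the driver says the device will send */
};

/* Smallest k with (1 << k) >= n; assumes n <= 32 (the MSI maximum). */
static unsigned int log2_roundup(unsigned int n)
{
	unsigned int k = 0;

	while ((1u << k) < n)
		k++;
	return k;
}

/*
 * Model of the proposed call: 'nvec_use' goes to nvec_used, 'nvec_init'
 * selects 'multiple', and nvec_init == 0 means "use the device maximum".
 * Returns 0 on success, -1 on an inconsistent request (assumed convention).
 */
static int enable_msi_block_partial_model(struct msi_desc_model *desc,
					  unsigned int maxvec,
					  unsigned int nvec_use,
					  unsigned int nvec_init)
{
	if (nvec_init == 0)
		nvec_init = maxvec;

	if (nvec_use == 0 || nvec_use > nvec_init || nvec_init > maxvec)
		return -1;

	desc->multiple = log2_roundup(nvec_init);
	desc->nvec_used = nvec_use;
	return 0;
}
```

For a device with maxvec == 16, the call modelled as (6, 0) leaves multiple == 4 and nvec_used == 6, so (1 << 4) - 6 = 10 vectors would not need to be allocated by the operating system.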
On Thu, Jun 06, 2013 at 10:30:20AM +0200, Alexander Gordeev wrote:
> Sebastian,

Hi Alexander,

> I re-read my comment few times and I admit it might be confusing. You are
> right - 'multiple' is set by rounding up only. The part '...not necessarily
> the closest power-of-two value...' implied an abstract PCI device rather than
> the described code, but the wording is less than perfect, indeed.

Good, so it is not just me :)

> In fact, at the moment of writing I kept in mind a follow-up patch that could
> help with aforementioned devices. That would be a new interface:
>
> int pci_enable_msi_block_partial(struct pci_dev *dev,
> 				   unsigned int nvec_use,
> 				   unsigned int nvec_init);
>
> In this case 'nvec_use' would go to 'msi_desc::nvec_used' and 'nvec_init'
> would translate to 'msi_desc::multiple' in case 'nvec_init' is not zero.
> In case 'nvec_init' is zero, 'msi_desc::multiple' would be initialized
> with the maximum possible value for the device (the way it is done now for
> pci_enable_msi_block_auto() interface). So, for the AHCI device (Bjorn
> mentioned) such a call would conserve on 10 of 16 vectors:
>
> pci_enable_msi_block_partial(pdev, 6, 0);

Ah okay. that makes sense.

>
> What I am not sure is whether we need to read out the maximum possible
> number of vectors like pci_enable_msi_block_auto() does:
>
> int pci_enable_msi_block_partial(struct pci_dev *dev,
> 				   unsigned int nvec_use,
> 				   unsigned int nvec_init,
> 				   unsigned int *maxvec);
>
> I can not think of any use of 'maxvec' with this interface, but the second
> variant completes the whole picture about a device...

The user of pci_enable_msi_block_auto() does not know how many it will get
so argument seems essential. Your new function on the other hand says exactly
how many it requires. Anything less should be an error.

Sebastian
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 00cc78c7..014b9d5 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
 		int i, nvec;
 		if (entry->irq == 0)
 			continue;
-		nvec = 1 << entry->msi_attrib.multiple;
+		if (entry->nvec)
+			nvec = entry->nvec;
+		else
+			nvec = 1 << entry->msi_attrib.multiple;
 		for (i = 0; i < nvec; i++)
 			arch_teardown_msi_irq(entry->irq + i);
 	}
@@ -340,7 +343,10 @@ static void free_msi_irqs(struct pci_dev *dev)
 		int i, nvec;
 		if (!entry->irq)
 			continue;
-		nvec = 1 << entry->msi_attrib.multiple;
+		if (entry->nvec)
+			nvec = entry->nvec;
+		else
+			nvec = 1 << entry->msi_attrib.multiple;
 #ifdef CONFIG_GENERIC_HARDIRQS
 		for (i = 0; i < nvec; i++)
 			BUG_ON(irq_has_action(entry->irq + i));
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ce93a34..0e20dfc 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -35,6 +35,7 @@ struct msi_desc {
 
 	u32 masked;			/* mask bits */
 	unsigned int irq;
+	unsigned int nvec;		/* number of messages */
 	struct list_head list;
 
 	union {
When multiple MSIs are enabled with pci_enable_msi_block(), the
requested number of interrupts 'nvec' is rounded up to the nearest
power-of-two value. The result is then used for setting up the
number of MSI messages in the PCI device and allocation of
interrupt resources in the operating system (i.e. vector numbers).
Thus, in cases when a device driver requests some number of MSIs
and this number is not a power-of-two value, the extra operating
system resources (allocated as the result of rounding) are wasted.

This fix introduces 'msi_desc::nvec' field to address the above
issue. When non-zero, it will report the actual number of MSIs the
device will send, as requested by the device driver. This value
should be used by architectures to properly set up and tear down
associated interrupt resources.

Note, although the existing 'msi_desc::multiple' field might seem
redundant, in fact in does not. In general case the number of MSIs a
PCI device is initialized with is not necessarily the closest power-
of-two value of the number of MSIs the device will send. Thus, in
theory it would not be always possible to derive the former from the
latter and we need to keep them both, to stress this corner case.
Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
would not save us any space.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 drivers/pci/msi.c   |   10 ++++++++--
 include/linux/msi.h |    1 +
 2 files changed, 9 insertions(+), 2 deletions(-)
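The fallback rule the patch applies in both default_teardown_msi_irqs() and free_msi_irqs() can be read as one small function. The sketch below is a userspace model under stated assumptions, not the kernel implementation: struct msi_desc_model and msi_nvec_to_release() are hypothetical names, and 'multiple' is a plain field here instead of the msi_attrib bitfield.

```c
/* Hypothetical userspace stand-in for the msi_desc fields the patch touches. */
struct msi_desc_model {
	unsigned int multiple;	/* log2 of the number of MSIs set up in the device */
	unsigned int nvec;	/* actual number of MSIs requested; 0 if unset */
};

/*
 * Number of interrupts to tear down for one entry: prefer the exact
 * count when the driver supplied it, otherwise fall back to the
 * power-of-two count derived from 'multiple', as before the patch.
 */
static unsigned int msi_nvec_to_release(const struct msi_desc_model *entry)
{
	if (entry->nvec)
		return entry->nvec;
	return 1u << entry->multiple;
}
```

For an entry with multiple == 3 and nvec == 6 the model releases 6 interrupts instead of 8. With nvec == 0 it behaves exactly like the old code and releases 8.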