Message ID | 20250115083215.2781310-1-danielsftsai@google.com
---|---
State | Changes Requested
Delegated to | Bjorn Helgaas
Series | PCI: dwc: Separate MSI out to different controller
On Wed, Jan 15, 2025 at 08:32:15AM +0000, Daniel Tsai wrote: > From: Tsai Sung-Fu <danielsftsai@google.com> > > Setup the struct irq_affinity at EP side, and passing that as 1 of the > function parameter when endpoint calls pci_alloc_irq_vectors_affinity, > this could help to setup non-default irq_affinity for target irq (end up > in irq_desc->irq_common_data.affinity), and we can make use of that to > separate msi vector out to bind to other msi controller. > > In current design, we have 8 msi controllers, and each of them own up to > 32 msi vectors, layout as below > > msi_controller0 <- msi_vector0 ~ 31 > msi_controller1 <- msi_vector32 ~ 63 > msi_controller2 <- msi_vector64 ~ 95 > . > . > . > msi_controller7 <- msi_vector224 ~ 255 > > dw_pcie_irq_domain_alloc is allocating msi vector number in a contiguous > fashion, that would end up those allocated msi vectors all handled by > the same msi_controller, which all of them would have irq affinity in > equal. To separate out to different CPU, we need to distribute msi > vectors to different msi_controller, which require to allocate the msi > vector in a stride fashion. > > To do that, the CL make use the cpumask_var_t setup by the endpoint, > compare against to see if the affinities are the same, if they are, > bind to msi controller which previously allocated msi vector goes to, if > they are not, assign to new msi controller. It's crunch time right now getting ready for the merge window, but some superficial things you can address when you get more detailed feedback later: Add "()" after function names. s/EP/Endpoint/ at least the first time it's used s/msi/MSI/ in English text s/irq/IRQ/ s/the CL make use/use/ ("CL" looks like a Google-ism) > + * All IRQs on a given controller will use the same parent interrupt, > + * and therefore the same CPU affinity. We try to honor any CPU spreading > + * requests by assigning distinct affinity masks to distinct vectors. > + * So instead of always allocating the msi vectors in a contiguous fashion, > + * the algo here honor whoever comes first can bind the msi controller to > + * its irq affinity mask, or compare its cpumask against > + * currently recorded to decide if binding to this msi controller. Wrap comment to fit in 80 columns like the rest of the file. s/msi/MSI/ s/irq/IRQ/ > + * no msi controller matches, we would error return (no space) and > + * clear those previously allocated bit, because all those msi vectors > + * didn't really successfully allocated, so we shouldn't occupied that > + * position in the bitmap in case other endpoint may still make use of > + * those. An extra step when choosing to not allocate in contiguous > + * fashion. Similar. Capitalize beginning of sentence. Some of these are not quite sentences or have grammatical issues. Bjorn
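For readers unfamiliar with the layout the commit message describes, the stand-alone sketch below (illustration only, not taken from the patch) shows why contiguous hwirq allocation pins all vectors to a single MSI controller: the controller index is simply hwirq / MAX_MSI_IRQS_PER_CTRL, which is 32 in pcie-designware.h, so only a strided allocation spreads vectors, and therefore affinities, across controllers. The function name and sample vector numbers are invented for the example.

/*
 * Illustration only: how a DWC MSI hwirq maps to a controller and bit.
 * MAX_MSI_IRQS_PER_CTRL is 32 in pcie-designware.h; everything else here
 * is made up for the example.
 */
#include <stdio.h>

#define MAX_MSI_IRQS_PER_CTRL	32

static void show_layout(const char *label, const unsigned int *hwirq, int n)
{
	printf("%s:\n", label);
	for (int i = 0; i < n; i++)
		printf("  hwirq %3u -> msi_controller%u, bit %2u\n", hwirq[i],
		       hwirq[i] / MAX_MSI_IRQS_PER_CTRL,
		       hwirq[i] % MAX_MSI_IRQS_PER_CTRL);
}

int main(void)
{
	unsigned int contiguous[] = { 0, 1, 2, 3 };	/* all on controller 0 */
	unsigned int strided[] = { 0, 32, 64, 96 };	/* controllers 0..3 */

	show_layout("contiguous allocation (one parent IRQ, one affinity)",
		    contiguous, 4);
	show_layout("strided allocation (four parent IRQs, four affinities)",
		    strided, 4);
	return 0;
}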
Subject needs to be improved. Something like, PCI: dwc: Support IRQ affinity by assigning MSIs to supported MSI controller On Wed, Jan 15, 2025 at 08:32:15AM +0000, Daniel Tsai wrote: > From: Tsai Sung-Fu <danielsftsai@google.com> > > Setup the struct irq_affinity at EP side, What do you mean by 'EP side'? > and passing that as 1 of the > function parameter when endpoint calls pci_alloc_irq_vectors_affinity, > this could help to setup non-default irq_affinity for target irq (end up > in irq_desc->irq_common_data.affinity), and we can make use of that to > separate msi vector out to bind to other msi controller. > BY 'other msi controller' you mean the parent interrupt controller that is chained by the DWC MSI controller? > In current design, we have 8 msi controllers, and each of them own up to Current design of what? I believe that you guys at Google want to add support for your secret future SoC, but atleast mention that in the patch descriptions. Otherwise, we don't know whether the patch applies to currently submitted platforms or future ones. > 32 msi vectors, layout as below > > msi_controller0 <- msi_vector0 ~ 31 > msi_controller1 <- msi_vector32 ~ 63 > msi_controller2 <- msi_vector64 ~ 95 > . > . > . > msi_controller7 <- msi_vector224 ~ 255 > > dw_pcie_irq_domain_alloc is allocating msi vector number in a contiguous > fashion, that would end up those allocated msi vectors all handled by > the same msi_controller, which all of them would have irq affinity in > equal. To separate out to different CPU, we need to distribute msi > vectors to different msi_controller, which require to allocate the msi > vector in a stride fashion. > > To do that, the CL make use the cpumask_var_t setup by the endpoint, What is 'CL'? > compare against to see if the affinities are the same, if they are, > bind to msi controller which previously allocated msi vector goes to, if > they are not, assign to new msi controller. > > Signed-off-by: Tsai Sung-Fu <danielsftsai@google.com> > --- > .../pci/controller/dwc/pcie-designware-host.c | 80 +++++++++++++++---- > drivers/pci/controller/dwc/pcie-designware.h | 2 + > 2 files changed, 67 insertions(+), 15 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > index d2291c3ceb8be..192d05c473b3b 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -181,25 +181,75 @@ static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, > void *args) > { > struct dw_pcie_rp *pp = domain->host_data; > - unsigned long flags; > - u32 i; > - int bit; > + const struct cpumask *mask; > + unsigned long flags, index, start, size; > + int irq, ctrl, p_irq, *msi_vec_index; > + unsigned int controller_count = (pp->num_vectors / MAX_MSI_IRQS_PER_CTRL); > + > + /* > + * All IRQs on a given controller will use the same parent interrupt, > + * and therefore the same CPU affinity. We try to honor any CPU spreading > + * requests by assigning distinct affinity masks to distinct vectors. > + * So instead of always allocating the msi vectors in a contiguous fashion, > + * the algo here honor whoever comes first can bind the msi controller to > + * its irq affinity mask, or compare its cpumask against > + * currently recorded to decide if binding to this msi controller. 
> + */ > + > + msi_vec_index = kcalloc(nr_irqs, sizeof(*msi_vec_index), GFP_KERNEL); > + if (!msi_vec_index) > + return -ENOMEM; > > raw_spin_lock_irqsave(&pp->lock, flags); > > - bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors, > - order_base_2(nr_irqs)); > + for (irq = 0; irq < nr_irqs; irq++) { > + mask = irq_get_affinity_mask(virq + irq); > + for (ctrl = 0; ctrl < controller_count; ctrl++) { > + start = ctrl * MAX_MSI_IRQS_PER_CTRL; > + size = start + MAX_MSI_IRQS_PER_CTRL; > + if (find_next_bit(pp->msi_irq_in_use, size, start) >= size) { > + cpumask_copy(&pp->msi_ctrl_to_cpu[ctrl], mask); > + break; I don't see how this relates to the affinity setting of the parent interrupt line that the MSI controller is muxed to. Here you are just copying the target MSI affinity mask to the msi_ctrl_to_cpu[] array entry of the controller. And as per the hierarchial IRQ domain implementation, DWC MSI cannot set the IRQ affinity on its own and that's why MSI_FLAG_NO_AFFINITY flag is set in this driver. So I don't see how this patch is relevant. Am I missing something? - Mani > + } > > - raw_spin_unlock_irqrestore(&pp->lock, flags); > + if (cpumask_equal(&pp->msi_ctrl_to_cpu[ctrl], mask) && > + find_next_zero_bit(pp->msi_irq_in_use, size, start) < size) > + break; > + } > > - if (bit < 0) > - return -ENOSPC; > + /* > + * no msi controller matches, we would error return (no space) and > + * clear those previously allocated bit, because all those msi vectors > + * didn't really successfully allocated, so we shouldn't occupied that > + * position in the bitmap in case other endpoint may still make use of > + * those. An extra step when choosing to not allocate in contiguous > + * fashion. > + */ > + if (ctrl == controller_count) { > + for (p_irq = irq - 1; p_irq >= 0; p_irq--) > + bitmap_clear(pp->msi_irq_in_use, msi_vec_index[p_irq], 1); > + raw_spin_unlock_irqrestore(&pp->lock, flags); > + kfree(msi_vec_index); > + return -ENOSPC; > + } > + > + index = bitmap_find_next_zero_area(pp->msi_irq_in_use, > + size, > + start, > + 1, > + 0); > + bitmap_set(pp->msi_irq_in_use, index, 1); > + msi_vec_index[irq] = index; > + } > > - for (i = 0; i < nr_irqs; i++) > - irq_domain_set_info(domain, virq + i, bit + i, > + raw_spin_unlock_irqrestore(&pp->lock, flags); > + > + for (irq = 0; irq < nr_irqs; irq++) > + irq_domain_set_info(domain, virq + irq, msi_vec_index[irq], > pp->msi_irq_chip, > pp, handle_edge_irq, > NULL, NULL); > + kfree(msi_vec_index); > > return 0; > } > @@ -207,15 +257,15 @@ static int dw_pcie_irq_domain_alloc(struct irq_domain *domain, > static void dw_pcie_irq_domain_free(struct irq_domain *domain, > unsigned int virq, unsigned int nr_irqs) > { > - struct irq_data *d = irq_domain_get_irq_data(domain, virq); > + struct irq_data *d; > struct dw_pcie_rp *pp = domain->host_data; > unsigned long flags; > > raw_spin_lock_irqsave(&pp->lock, flags); > - > - bitmap_release_region(pp->msi_irq_in_use, d->hwirq, > - order_base_2(nr_irqs)); > - > + for (int i = 0; i < nr_irqs; i++) { > + d = irq_domain_get_irq_data(domain, virq + i); > + bitmap_clear(pp->msi_irq_in_use, d->hwirq, 1); > + } > raw_spin_unlock_irqrestore(&pp->lock, flags); > } > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > index 347ab74ac35aa..95629b37a238e 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.h > +++ b/drivers/pci/controller/dwc/pcie-designware.h > @@ -14,6 +14,7 @@ > #include <linux/bitfield.h> > #include <linux/bitops.h> > #include 
<linux/clk.h> > +#include <linux/cpumask.h> > #include <linux/dma-mapping.h> > #include <linux/dma/edma.h> > #include <linux/gpio/consumer.h> > @@ -373,6 +374,7 @@ struct dw_pcie_rp { > struct irq_chip *msi_irq_chip; > u32 num_vectors; > u32 irq_mask[MAX_MSI_CTRLS]; > + struct cpumask msi_ctrl_to_cpu[MAX_MSI_CTRLS]; > struct pci_host_bridge *bridge; > raw_spinlock_t lock; > DECLARE_BITMAP(msi_irq_in_use, MAX_MSI_IRQS); > -- > 2.48.0.rc2.279.g1de40edade-goog >
Hi Mani and Bjorn, Sorry for the late reply, we just found out some problems in the patch we are trying to upstream here, and figuring out it might not be a good idea to keep this process going, so I would drop this patch submission, and come back once we figure it out a better way. BTW, May I ask why upstream chose to flag this driver with MSI_FLAG_NO_AFFINITY and remove the function dw_pci_msi_set_affinity implementation ? Thanks
On Tue, Feb 11, 2025 at 03:16:11PM +0800, Tsai Sung-Fu wrote: > Hi Mani and Bjorn, > > Sorry for the late reply, we just found out some problems in the patch > we are trying to upstream here, and figuring out it might not be a > good idea to keep this process going, so I would drop this patch > submission, and come back once we figure it out a better way. > > BTW, May I ask why upstream chose to flag this driver with > MSI_FLAG_NO_AFFINITY and remove the function dw_pci_msi_set_affinity > implementation ? > Because you cannot set affinity for chained MSIs as these MSIs are muxed to another parent interrupt. Since the IRQ affinity is all about changing which CPU gets the IRQ, affinity setting is only possible for the MSI parent. - Mani
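For context, the sketch below condenses the chained arrangement Mani is describing, paraphrased from pcie-designware-host.c (names and details may differ between kernel versions; dwc_msi_demux() and dwc_msi_attach_parents() are this sketch's own names, the driver's versions are dw_chained_msi_isr() and the setup in dw_pcie_msi_host_init(), if memory serves). Each MSI controller's output is one parent interrupt, and a chained handler demuxes up to 32 child vectors behind it, which is why only the parent line has a meaningful CPU affinity.

/* Sketch of the chained-MSI demux pattern used by the DWC host driver. */
#include <linux/irq.h>
#include <linux/irqchip/chained_irq.h>

#include "pcie-designware.h"

static void dwc_msi_demux(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);
	struct dw_pcie_rp *pp = irq_desc_get_handler_data(desc);

	chained_irq_enter(chip, desc);
	dw_handle_msi_irq(pp);	/* walks the per-controller status registers
				 * and forwards each set bit to its child virq */
	chained_irq_exit(chip, desc);
}

/* One parent IRQ per MSI controller; 32 child vectors are muxed behind each. */
static void dwc_msi_attach_parents(struct dw_pcie_rp *pp, u32 num_ctrls)
{
	u32 ctrl;

	for (ctrl = 0; ctrl < num_ctrls; ctrl++)
		if (pp->msi_irq[ctrl] > 0)
			irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
							 dwc_msi_demux, pp);
}

Changing the affinity of one child vector would therefore mean changing the affinity of pp->msi_irq[ctrl] itself, affecting all 32 vectors muxed behind it, which is the ABI concern discussed further down the thread.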
>Because you cannot set affinity for chained MSIs as these MSIs are muxed to >another parent interrupt. Since the IRQ affinity is all about changing which CPU >gets the IRQ, affinity setting is only possible for the MSI parent. So if we can find the MSI parent by making use of chained relationships (32 MSI vectors muxed to 1 parent), is it possible that we can add that implementation back ? We have another patch that would like to add the dw_pci_msi_set_affinity feature. Would it be a possible try from your perspective ? Thank you On Tue, Feb 11, 2025 at 3:57 PM Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote: > > On Tue, Feb 11, 2025 at 03:16:11PM +0800, Tsai Sung-Fu wrote: > > Hi Mani and Bjorn, > > > > Sorry for the late reply, we just found out some problems in the patch > > we are trying to upstream here, and figuring out it might not be a > > good idea to keep this process going, so I would drop this patch > > submission, and come back once we figure it out a better way. > > > > BTW, May I ask why upstream chose to flag this driver with > > MSI_FLAG_NO_AFFINITY and remove the function dw_pci_msi_set_affinity > > implementation ? > > > > Because you cannot set affinity for chained MSIs as these MSIs are muxed to > another parent interrupt. Since the IRQ affinity is all about changing which CPU > gets the IRQ, affinity setting is only possible for the MSI parent. > > - Mani > > -- > மணிவண்ணன் சதாசிவம்
On Tue, Feb 11, 2025 at 04:23:53PM +0800, Tsai Sung-Fu wrote: > >Because you cannot set affinity for chained MSIs as these MSIs are muxed to > >another parent interrupt. Since the IRQ affinity is all about changing which CPU > >gets the IRQ, affinity setting is only possible for the MSI parent. > > So if we can find the MSI parent by making use of chained > relationships (32 MSI vectors muxed to 1 parent), > is it possible that we can add that implementation back ? > We have another patch that would like to add the > dw_pci_msi_set_affinity feature. > Would it be a possible try from your perspective ? > This question was brought up plenty of times and the concern from the irqchip maintainer Marc was that if you change the affinity of the parent when the child MSI affinity changes, it tends to break the userspace ABI of the parent. See below links: https://lore.kernel.org/all/87mtg0i8m8.wl-maz@kernel.org/ https://lore.kernel.org/all/874k0bf7f7.wl-maz@kernel.org/ - Mani
On Fri, Feb 14, 2025 at 12:45:52PM +0530, Manivannan Sadhasivam wrote: > On Tue, Feb 11, 2025 at 04:23:53PM +0800, Tsai Sung-Fu wrote: > > >Because you cannot set affinity for chained MSIs as these MSIs are muxed to > > >another parent interrupt. Since the IRQ affinity is all about changing which CPU > > >gets the IRQ, affinity setting is only possible for the MSI parent. > > > > So if we can find the MSI parent by making use of chained > > relationships (32 MSI vectors muxed to 1 parent), > > is it possible that we can add that implementation back ? > > We have another patch that would like to add the > > dw_pci_msi_set_affinity feature. > > Would it be a possible try from your perspective ? > > > > This question was brought up plenty of times and the concern from the irqchip > maintainer Marc was that if you change the affinity of the parent when the child > MSI affinity changes, it tends to break the userspace ABI of the parent. > > See below links: > > https://lore.kernel.org/all/87mtg0i8m8.wl-maz@kernel.org/ > https://lore.kernel.org/all/874k0bf7f7.wl-maz@kernel.org/ It's hard to meaningfully talk about a patch that hasn't been posted yet, but the implementation we have at least attempts to make *some* kind of resolution to those ABI questions. For one, it rejects affinity changes that are incompatible (by some definition) with affinities requested by other virqs shared on the same parent line. It also updates their effective affinities upon changes. Those replies seem to over-focus on dynamic, user-space initiated changes too. But how about for "managed-affinity" interrupts? Those are requested by drivers internally to the kernel (a la pci_alloc_irq_vectors_affinity()), and can't be changed by user space afterward. It seems like there'd be room for supporting that, provided we don't allow conflicting/non-overlapping configurations. I do see that Marc sketched out a complex sysfs/hierarchy API in some of his replies. I'm not sure that would provide too much value to the managed-affinity cases we're interested in, but I get the appeal for user-managed affinity. Brian
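As a concrete reference for the "managed-affinity" case Brian mentions, a hypothetical endpoint driver would request spread vectors roughly as in the sketch below. pci_alloc_irq_vectors_affinity() and PCI_IRQ_AFFINITY are the real kernel interfaces; the device, queue count, handler, and error handling are invented for the example.

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Hypothetical endpoint driver requesting managed-affinity MSI/MSI-X vectors. */
static int example_ep_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues,
				 irq_handler_t handler, void *data)
{
	struct irq_affinity affd = {
		.pre_vectors = 1,	/* one admin vector, excluded from spreading */
	};
	int nvecs, i, ret;

	nvecs = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
					       PCI_IRQ_MSI | PCI_IRQ_MSIX |
					       PCI_IRQ_AFFINITY, &affd);
	if (nvecs < 0)
		return nvecs;

	/*
	 * The core spreads vectors 1..nvecs-1 across the online CPUs; the
	 * resulting masks are "managed" and cannot be changed from user space.
	 * Error unwinding is omitted for brevity.
	 */
	for (i = 0; i < nvecs; i++) {
		ret = request_irq(pci_irq_vector(pdev, i), handler, 0,
				  "example-ep", data);
		if (ret)
			return ret;
	}

	return nvecs;
}

It is the per-vector masks produced by such a request that the proposed allocator reads back via irq_get_affinity_mask() when it picks an MSI controller for each vector.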
On Fri, Feb 14, 2025 at 11:54:52AM -0800, Brian Norris wrote: > On Fri, Feb 14, 2025 at 12:45:52PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Feb 11, 2025 at 04:23:53PM +0800, Tsai Sung-Fu wrote: > > > >Because you cannot set affinity for chained MSIs as these MSIs are muxed to > > > >another parent interrupt. Since the IRQ affinity is all about changing which CPU > > > >gets the IRQ, affinity setting is only possible for the MSI parent. > > > > > > So if we can find the MSI parent by making use of chained > > > relationships (32 MSI vectors muxed to 1 parent), > > > is it possible that we can add that implementation back ? > > > We have another patch that would like to add the > > > dw_pci_msi_set_affinity feature. > > > Would it be a possible try from your perspective ? > > > > > > > This question was brought up plenty of times and the concern from the irqchip > > maintainer Marc was that if you change the affinity of the parent when the child > > MSI affinity changes, it tends to break the userspace ABI of the parent. > > > > See below links: > > > > https://lore.kernel.org/all/87mtg0i8m8.wl-maz@kernel.org/ > > https://lore.kernel.org/all/874k0bf7f7.wl-maz@kernel.org/ > > It's hard to meaningfully talk about a patch that hasn't been posted > yet, but the implementation we have at least attempts to make *some* > kind of resolution to those ABI questions. For one, it rejects affinity > changes that are incompatible (by some definition) with affinities > requested by other virqs shared on the same parent line. It also updates > their effective affinities upon changes. > > Those replies seem to over-focus on dynamic, user-space initiated > changes too. But how about for "managed-affinity" interrupts? Those are > requested by drivers internally to the kernel (a la > pci_alloc_irq_vectors_affinity()), and can't be changed by user space > afterward. It seems like there'd be room for supporting that, provided > we don't allow conflicting/non-overlapping configurations. > > I do see that Marc sketched out a complex sysfs/hierarchy API in some of > his replies. I'm not sure that would provide too much value to the > managed-affinity cases we're interested in, but I get the appeal for > user-managed affinity. > Whatever your proposal is, please get it reviewed by Marc. - Mani
On Wed, 19 Feb 2025 17:51:19 +0000, Manivannan Sadhasivam <mani@kernel.org> wrote: > > On Fri, Feb 14, 2025 at 11:54:52AM -0800, Brian Norris wrote: > > On Fri, Feb 14, 2025 at 12:45:52PM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Feb 11, 2025 at 04:23:53PM +0800, Tsai Sung-Fu wrote: > > > > >Because you cannot set affinity for chained MSIs as these MSIs are muxed to > > > > >another parent interrupt. Since the IRQ affinity is all about changing which CPU > > > > >gets the IRQ, affinity setting is only possible for the MSI parent. > > > > > > > > So if we can find the MSI parent by making use of chained > > > > relationships (32 MSI vectors muxed to 1 parent), > > > > is it possible that we can add that implementation back ? > > > > We have another patch that would like to add the > > > > dw_pci_msi_set_affinity feature. > > > > Would it be a possible try from your perspective ? > > > > > > > > > > This question was brought up plenty of times and the concern from the irqchip > > > maintainer Marc was that if you change the affinity of the parent when the child > > > MSI affinity changes, it tends to break the userspace ABI of the parent. > > > > > > See below links: > > > > > > https://lore.kernel.org/all/87mtg0i8m8.wl-maz@kernel.org/ > > > https://lore.kernel.org/all/874k0bf7f7.wl-maz@kernel.org/ > > > > It's hard to meaningfully talk about a patch that hasn't been posted > > yet, but the implementation we have at least attempts to make *some* > > kind of resolution to those ABI questions. For one, it rejects affinity > > changes that are incompatible (by some definition) with affinities > > requested by other virqs shared on the same parent line. It also updates > > their effective affinities upon changes. > > > > Those replies seem to over-focus on dynamic, user-space initiated > > changes too. But how about for "managed-affinity" interrupts? Those are > > requested by drivers internally to the kernel (a la > > pci_alloc_irq_vectors_affinity()), and can't be changed by user space > > afterward. It seems like there'd be room for supporting that, provided > > we don't allow conflicting/non-overlapping configurations. > > > > I do see that Marc sketched out a complex sysfs/hierarchy API in some of > > his replies. I'm not sure that would provide too much value to the > > managed-affinity cases we're interested in, but I get the appeal for > > user-managed affinity. > > > > Whatever your proposal is, please get it reviewed by Marc. Please see b673fe1a6229a and avoid dragging me into discussion I have purposely removed myself from. I'd also appreciate if you didn't volunteer me for review tasks I have no intention to perform (this is the second time you are doing it, and that's not on). M.
On Wed, Feb 19, 2025 at 06:02:24PM +0000, Marc Zyngier wrote: > On Wed, 19 Feb 2025 17:51:19 +0000, > Manivannan Sadhasivam <mani@kernel.org> wrote: > > > > On Fri, Feb 14, 2025 at 11:54:52AM -0800, Brian Norris wrote: > > > On Fri, Feb 14, 2025 at 12:45:52PM +0530, Manivannan Sadhasivam wrote: > > > > On Tue, Feb 11, 2025 at 04:23:53PM +0800, Tsai Sung-Fu wrote: > > > > > >Because you cannot set affinity for chained MSIs as these MSIs are muxed to > > > > > >another parent interrupt. Since the IRQ affinity is all about changing which CPU > > > > > >gets the IRQ, affinity setting is only possible for the MSI parent. > > > > > > > > > > So if we can find the MSI parent by making use of chained > > > > > relationships (32 MSI vectors muxed to 1 parent), > > > > > is it possible that we can add that implementation back ? > > > > > We have another patch that would like to add the > > > > > dw_pci_msi_set_affinity feature. > > > > > Would it be a possible try from your perspective ? > > > > > > > > > > > > > This question was brought up plenty of times and the concern from the irqchip > > > > maintainer Marc was that if you change the affinity of the parent when the child > > > > MSI affinity changes, it tends to break the userspace ABI of the parent. > > > > > > > > See below links: > > > > > > > > https://lore.kernel.org/all/87mtg0i8m8.wl-maz@kernel.org/ > > > > https://lore.kernel.org/all/874k0bf7f7.wl-maz@kernel.org/ > > > > > > It's hard to meaningfully talk about a patch that hasn't been posted > > > yet, but the implementation we have at least attempts to make *some* > > > kind of resolution to those ABI questions. For one, it rejects affinity > > > changes that are incompatible (by some definition) with affinities > > > requested by other virqs shared on the same parent line. It also updates > > > their effective affinities upon changes. > > > > > > Those replies seem to over-focus on dynamic, user-space initiated > > > changes too. But how about for "managed-affinity" interrupts? Those are > > > requested by drivers internally to the kernel (a la > > > pci_alloc_irq_vectors_affinity()), and can't be changed by user space > > > afterward. It seems like there'd be room for supporting that, provided > > > we don't allow conflicting/non-overlapping configurations. > > > > > > I do see that Marc sketched out a complex sysfs/hierarchy API in some of > > > his replies. I'm not sure that would provide too much value to the > > > managed-affinity cases we're interested in, but I get the appeal for > > > user-managed affinity. > > > > > > > Whatever your proposal is, please get it reviewed by Marc. > > Please see b673fe1a6229a and avoid dragging me into discussion I have > purposely removed myself from. I'd also appreciate if you didn't > volunteer me for review tasks I have no intention to perform (this is > the second time you are doing it, and that's not on). > Apologies for not catching up with the MAINTAINERS update. Since you were pretty much involved in the affinity discussion, I thought about asking your help. Regarding looping you in, I thought you only wanted to avoid reviewing the new driver changes that were deviating from the spec too much. But anyway, now it is clear to me that you are not maintaining the irqchip drivers, so I will not bother you anymore. Sorry about that. - Mani
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index d2291c3ceb8be..192d05c473b3b 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -181,25 +181,75 @@ static int dw_pcie_irq_domain_alloc(struct irq_domain *domain,
 				    void *args)
 {
 	struct dw_pcie_rp *pp = domain->host_data;
-	unsigned long flags;
-	u32 i;
-	int bit;
+	const struct cpumask *mask;
+	unsigned long flags, index, start, size;
+	int irq, ctrl, p_irq, *msi_vec_index;
+	unsigned int controller_count = (pp->num_vectors / MAX_MSI_IRQS_PER_CTRL);
+
+	/*
+	 * All IRQs on a given controller will use the same parent interrupt,
+	 * and therefore the same CPU affinity. We try to honor any CPU spreading
+	 * requests by assigning distinct affinity masks to distinct vectors.
+	 * So instead of always allocating the msi vectors in a contiguous fashion,
+	 * the algo here honor whoever comes first can bind the msi controller to
+	 * its irq affinity mask, or compare its cpumask against
+	 * currently recorded to decide if binding to this msi controller.
+	 */
+
+	msi_vec_index = kcalloc(nr_irqs, sizeof(*msi_vec_index), GFP_KERNEL);
+	if (!msi_vec_index)
+		return -ENOMEM;
 
 	raw_spin_lock_irqsave(&pp->lock, flags);
 
-	bit = bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors,
-				      order_base_2(nr_irqs));
+	for (irq = 0; irq < nr_irqs; irq++) {
+		mask = irq_get_affinity_mask(virq + irq);
+		for (ctrl = 0; ctrl < controller_count; ctrl++) {
+			start = ctrl * MAX_MSI_IRQS_PER_CTRL;
+			size = start + MAX_MSI_IRQS_PER_CTRL;
+			if (find_next_bit(pp->msi_irq_in_use, size, start) >= size) {
+				cpumask_copy(&pp->msi_ctrl_to_cpu[ctrl], mask);
+				break;
+			}
 
-	raw_spin_unlock_irqrestore(&pp->lock, flags);
+			if (cpumask_equal(&pp->msi_ctrl_to_cpu[ctrl], mask) &&
+			    find_next_zero_bit(pp->msi_irq_in_use, size, start) < size)
+				break;
+		}
 
-	if (bit < 0)
-		return -ENOSPC;
+		/*
+		 * no msi controller matches, we would error return (no space) and
+		 * clear those previously allocated bit, because all those msi vectors
+		 * didn't really successfully allocated, so we shouldn't occupied that
+		 * position in the bitmap in case other endpoint may still make use of
+		 * those. An extra step when choosing to not allocate in contiguous
+		 * fashion.
+		 */
+		if (ctrl == controller_count) {
+			for (p_irq = irq - 1; p_irq >= 0; p_irq--)
+				bitmap_clear(pp->msi_irq_in_use, msi_vec_index[p_irq], 1);
+			raw_spin_unlock_irqrestore(&pp->lock, flags);
+			kfree(msi_vec_index);
+			return -ENOSPC;
+		}
+
+		index = bitmap_find_next_zero_area(pp->msi_irq_in_use,
+						   size,
+						   start,
+						   1,
+						   0);
+		bitmap_set(pp->msi_irq_in_use, index, 1);
+		msi_vec_index[irq] = index;
+	}
 
-	for (i = 0; i < nr_irqs; i++)
-		irq_domain_set_info(domain, virq + i, bit + i,
+	raw_spin_unlock_irqrestore(&pp->lock, flags);
+
+	for (irq = 0; irq < nr_irqs; irq++)
+		irq_domain_set_info(domain, virq + irq, msi_vec_index[irq],
 				    pp->msi_irq_chip,
 				    pp, handle_edge_irq,
 				    NULL, NULL);
+	kfree(msi_vec_index);
 
 	return 0;
 }
@@ -207,15 +257,15 @@ static int dw_pcie_irq_domain_alloc(struct irq_domain *domain,
 static void dw_pcie_irq_domain_free(struct irq_domain *domain,
 				    unsigned int virq, unsigned int nr_irqs)
 {
-	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
+	struct irq_data *d;
 	struct dw_pcie_rp *pp = domain->host_data;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&pp->lock, flags);
-
-	bitmap_release_region(pp->msi_irq_in_use, d->hwirq,
-			      order_base_2(nr_irqs));
-
+	for (int i = 0; i < nr_irqs; i++) {
+		d = irq_domain_get_irq_data(domain, virq + i);
+		bitmap_clear(pp->msi_irq_in_use, d->hwirq, 1);
+	}
 	raw_spin_unlock_irqrestore(&pp->lock, flags);
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 347ab74ac35aa..95629b37a238e 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -14,6 +14,7 @@
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/clk.h>
+#include <linux/cpumask.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma/edma.h>
 #include <linux/gpio/consumer.h>
@@ -373,6 +374,7 @@ struct dw_pcie_rp {
 	struct irq_chip *msi_irq_chip;
 	u32 num_vectors;
 	u32 irq_mask[MAX_MSI_CTRLS];
+	struct cpumask msi_ctrl_to_cpu[MAX_MSI_CTRLS];
 	struct pci_host_bridge *bridge;
 	raw_spinlock_t lock;
 	DECLARE_BITMAP(msi_irq_in_use, MAX_MSI_IRQS);
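To summarize what the allocation loop above is trying to do, here is a compact user-space model of the same strategy (illustration only; the fixed controller counts, the plain unsigned long standing in for struct cpumask, and the per-controller counter in place of the real bitmap are all simplifications): the first vector placed on an empty controller binds that controller to its affinity mask, later vectors go to a controller whose recorded mask matches and which still has room, and the request fails when neither exists.

/*
 * User-space model of the per-controller affinity binding the patch
 * implements. Not kernel code; names and data layout are simplified.
 */
#include <stdbool.h>
#include <stdio.h>

#define CTRLS		8
#define VECS_PER_CTRL	32

struct ctrl_state {
	unsigned long cpumask;	/* stand-in for struct cpumask */
	int used;		/* vectors already placed on this controller */
};

static struct ctrl_state ctrls[CTRLS];

/* Returns the allocated hwirq, or -1 when no controller can take the vector. */
static int alloc_vector(unsigned long mask)
{
	for (int c = 0; c < CTRLS; c++) {
		bool empty = (ctrls[c].used == 0);
		bool match = !empty && ctrls[c].cpumask == mask &&
			     ctrls[c].used < VECS_PER_CTRL;

		if (empty)
			ctrls[c].cpumask = mask;	/* first user binds the mask */
		if (empty || match)
			return c * VECS_PER_CTRL + ctrls[c].used++;
	}
	return -1;
}

int main(void)
{
	/* Four vectors spread over CPUs 0..3 land on four different controllers. */
	for (unsigned long cpu = 0; cpu < 4; cpu++)
		printf("mask 0x%lx -> hwirq %d\n", 1UL << cpu,
		       alloc_vector(1UL << cpu));
	return 0;
}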