diff mbox series

[v3] PCI: xilinx-nwl: Fix Multi MSI data programming

Message ID 1559133469-11981-1-git-send-email-bharat.kumar.gogada@xilinx.com (mailing list archive)
State Superseded, archived
Headers show
Series [v3] PCI: xilinx-nwl: Fix Multi MSI data programming | expand

Commit Message

Bharat Kumar Gogada May 29, 2019, 12:37 p.m. UTC
The current Multi MSI data programming fails if multiple end points
requesting MSI and multi MSI are connected with switch, i.e the current
multi MSI data being given is not considering the number of vectors
being requested in case of multi MSI.
Ex: Two EP's connected via switch, EP1 requesting single MSI first,
EP2 requesting Multi MSI of count four. The current code gives
MSI data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits
due to which EP2 also sends interrupt with MSI data 0x0 which results
in always invoking virq of EP1 due to which EP2 MSI interrupt never
gets handled.

Fix Multi MSI data programming with required alignment by
using number of vectors being requested.

Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe
Host Controller")
Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
---
V3:
 - Added example description of the issue
---
 drivers/pci/controller/pcie-xilinx-nwl.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Lorenzo Pieralisi May 31, 2019, 4:09 p.m. UTC | #1
[+Marc]

On Wed, May 29, 2019 at 06:07:49PM +0530, Bharat Kumar Gogada wrote:
> The current Multi MSI data programming fails if multiple end points
> requesting MSI and multi MSI are connected with switch, i.e the current
> multi MSI data being given is not considering the number of vectors
> being requested in case of multi MSI.
> Ex: Two EP's connected via switch, EP1 requesting single MSI first,
> EP2 requesting Multi MSI of count four. The current code gives
> MSI data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits
> due to which EP2 also sends interrupt with MSI data 0x0 which results
> in always invoking virq of EP1 due to which EP2 MSI interrupt never
> gets handled.

If this is a problem it is not the only driver where it should be fixed
it seems. CC'ed Marc in case I have missed something in relation to MSI
IRQs but AFAIU it looks like HW is allowed to toggled bits (according to
bits[6:4] in Message Control for MSI) in the MSI data, given that the
data written is the hwirq number (in this specific MSI controller)
it ought to be fixed.

The commit log and patch should be rewritten (I will do that) but
first I would like to understand if there are more drivers to be
updated.

Lorenzo

> Fix Multi MSI data programming with required alignment by
> using number of vectors being requested.
> 
> Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe
> Host Controller")
> Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
> ---
> V3:
>  - Added example description of the issue
> ---
>  drivers/pci/controller/pcie-xilinx-nwl.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
> index 81538d7..8efcb8a 100644
> --- a/drivers/pci/controller/pcie-xilinx-nwl.c
> +++ b/drivers/pci/controller/pcie-xilinx-nwl.c
> @@ -483,7 +483,16 @@ static int nwl_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>  	int i;
>  
>  	mutex_lock(&msi->lock);
> -	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
> +
> +	/*
> +	 * Multi MSI count is requested in power of two
> +	 * Check if multi msi is requested
> +	 */
> +	if (nr_irqs % 2 == 0)
> +		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
> +					 nr_irqs, nr_irqs - 1);
> +	else
> +		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
>  					 nr_irqs, 0);
>  	if (bit >= INT_PCI_MSI_NR) {
>  		mutex_unlock(&msi->lock);
> -- 
> 2.7.4
>
Marc Zyngier June 3, 2019, 8:49 a.m. UTC | #2
On 31/05/2019 17:09, Lorenzo Pieralisi wrote:
> [+Marc]
> 
> On Wed, May 29, 2019 at 06:07:49PM +0530, Bharat Kumar Gogada wrote:
>> The current Multi MSI data programming fails if multiple end points
>> requesting MSI and multi MSI are connected with switch, i.e the current
>> multi MSI data being given is not considering the number of vectors
>> being requested in case of multi MSI.
>> Ex: Two EP's connected via switch, EP1 requesting single MSI first,
>> EP2 requesting Multi MSI of count four. The current code gives
>> MSI data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits
>> due to which EP2 also sends interrupt with MSI data 0x0 which results
>> in always invoking virq of EP1 due to which EP2 MSI interrupt never
>> gets handled.
> 
> If this is a problem it is not the only driver where it should be fixed
> it seems. CC'ed Marc in case I have missed something in relation to MSI
> IRQs but AFAIU it looks like HW is allowed to toggled bits (according to
> bits[6:4] in Message Control for MSI) in the MSI data, given that the
> data written is the hwirq number (in this specific MSI controller)
> it ought to be fixed.

Yeah, it looks like a number of MSI controllers could be quite broken
in this particular area.

> 
> The commit log and patch should be rewritten (I will do that) but
> first I would like to understand if there are more drivers to be
> updated.
> 
> Lorenzo
> 
>> Fix Multi MSI data programming with required alignment by
>> using number of vectors being requested.
>>
>> Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe
>> Host Controller")
>> Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
>> ---
>> V3:
>>  - Added example description of the issue
>> ---
>>  drivers/pci/controller/pcie-xilinx-nwl.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
>> index 81538d7..8efcb8a 100644
>> --- a/drivers/pci/controller/pcie-xilinx-nwl.c
>> +++ b/drivers/pci/controller/pcie-xilinx-nwl.c
>> @@ -483,7 +483,16 @@ static int nwl_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
>>  	int i;
>>  
>>  	mutex_lock(&msi->lock);
>> -	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
>> +
>> +	/*
>> +	 * Multi MSI count is requested in power of two
>> +	 * Check if multi msi is requested
>> +	 */
>> +	if (nr_irqs % 2 == 0)
>> +		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
>> +					 nr_irqs, nr_irqs - 1);
>> +	else
>> +		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
>>  					 nr_irqs, 0);
>>  	if (bit >= INT_PCI_MSI_NR) {
>>  		mutex_unlock(&msi->lock);
>> -- 
>> 2.7.4
>>

This doesn't look like the best fix. The only case where nr_irqs is not 
set to 1 is when using Multi-MSI, so the '% 2' case actually covers all 
cases. Now, and in the interest of consistency, other drivers use a 
construct such as:

offset = bitmap_find_free_region(bitmap, bitmap_size,
				 get_count_order(nr_irqs));

which has the advantage of dealing with the bitmap setting as well.

I'd suggest something like this (completely untested):

diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
index 3b031f00a94a..8b9b58909e7c 100644
--- a/drivers/pci/controller/pcie-xilinx-nwl.c
+++ b/drivers/pci/controller/pcie-xilinx-nwl.c
@@ -482,15 +482,13 @@ static int nwl_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	int i;
 
 	mutex_lock(&msi->lock);
-	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
-					 nr_irqs, 0);
-	if (bit >= INT_PCI_MSI_NR) {
+	bit = bitmap_find_free_region(msi->bitmap, INT_PCI_MSI_NR,
+				      get_count_order(nr_irqs));
+	if (bit < 0) {
 		mutex_unlock(&msi->lock);
 		return -ENOSPC;
 	}
 
-	bitmap_set(msi->bitmap, bit, nr_irqs);
-
 	for (i = 0; i < nr_irqs; i++) {
 		irq_domain_set_info(domain, virq + i, bit + i, &nwl_irq_chip,
 				domain->host_data, handle_simple_irq,
@@ -508,7 +506,7 @@ static void nwl_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 	struct nwl_msi *msi = &pcie->msi;
 
 	mutex_lock(&msi->lock);
-	bitmap_clear(msi->bitmap, data->hwirq, nr_irqs);
+	bitmap_release_region(msi->bitmap, data->hwirq, get_count_order(nr_irqs));
 	mutex_unlock(&msi->lock);
 }
 

Thanks,

	M.
Bharat Kumar Gogada June 6, 2019, 4:49 a.m. UTC | #3
> On 31/05/2019 17:09, Lorenzo Pieralisi wrote:
> > [+Marc]
> >
> > On Wed, May 29, 2019 at 06:07:49PM +0530, Bharat Kumar Gogada wrote:
> >> The current Multi MSI data programming fails if multiple end points
> >> requesting MSI and multi MSI are connected with switch, i.e the
> >> current multi MSI data being given is not considering the number of
> >> vectors being requested in case of multi MSI.
> >> Ex: Two EP's connected via switch, EP1 requesting single MSI first,
> >> EP2 requesting Multi MSI of count four. The current code gives MSI
> >> data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits due
> >> to which EP2 also sends interrupt with MSI data 0x0 which results in
> >> always invoking virq of EP1 due to which EP2 MSI interrupt never gets
> >> handled.
> >
> > If this is a problem it is not the only driver where it should be
> > fixed it seems. CC'ed Marc in case I have missed something in relation
> > to MSI IRQs but AFAIU it looks like HW is allowed to toggled bits
> > (according to bits[6:4] in Message Control for MSI) in the MSI data,
> > given that the data written is the hwirq number (in this specific MSI
> > controller) it ought to be fixed.
> 
> Yeah, it looks like a number of MSI controllers could be quite broken in this
> particular area.
> 
> >
> > The commit log and patch should be rewritten (I will do that) but
> > first I would like to understand if there are more drivers to be
> > updated.
> >
> > 
Hi Lorenzo and Marc, thanks for your time.
Marc, I'm yet to test the below suggested solution,
GIC v2m and GIC v3 supports multi MSI, do we see above issue in these MSI controllers ?

Regards,
Bharat
> >
> >> Fix Multi MSI data programming with required alignment by using
> >> number of vectors being requested.
> >>
> >> Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL
> >> PCIe Host Controller")
> >> Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
> >> ---
> >> V3:
> >>  - Added example description of the issue
> >> ---
> >>  drivers/pci/controller/pcie-xilinx-nwl.c | 11 ++++++++++-
> >>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c
> >> b/drivers/pci/controller/pcie-xilinx-nwl.c
> >> index 81538d7..8efcb8a 100644
> >> --- a/drivers/pci/controller/pcie-xilinx-nwl.c
> >> +++ b/drivers/pci/controller/pcie-xilinx-nwl.c
> >> @@ -483,7 +483,16 @@ static int nwl_irq_domain_alloc(struct
> irq_domain *domain, unsigned int virq,
> >>  	int i;
> >>
> >>  	mutex_lock(&msi->lock);
> >> -	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
> >> +
> >> +	/*
> >> +	 * Multi MSI count is requested in power of two
> >> +	 * Check if multi msi is requested
> >> +	 */
> >> +	if (nr_irqs % 2 == 0)
> >> +		bit = bitmap_find_next_zero_area(msi->bitmap,
> INT_PCI_MSI_NR, 0,
> >> +					 nr_irqs, nr_irqs - 1);
> >> +	else
> >> +		bit = bitmap_find_next_zero_area(msi->bitmap,
> INT_PCI_MSI_NR, 0,
> >>  					 nr_irqs, 0);
> >>  	if (bit >= INT_PCI_MSI_NR) {
> >>  		mutex_unlock(&msi->lock);
> >> --
> >> 2.7.4
> >>
> 
> This doesn't look like the best fix. The only case where nr_irqs is not
> set to 1 is when using Multi-MSI, so the '% 2' case actually covers all
> cases. Now, and in the interest of consistency, other drivers use a
> construct such as:
> 
> offset = bitmap_find_free_region(bitmap, bitmap_size,
> 				 get_count_order(nr_irqs));
> 
> which has the advantage of dealing with the bitmap setting as well.
> 
> I'd suggest something like this (completely untested):
> 
> diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c
> b/drivers/pci/controller/pcie-xilinx-nwl.c
> index 3b031f00a94a..8b9b58909e7c 100644
> --- a/drivers/pci/controller/pcie-xilinx-nwl.c
> +++ b/drivers/pci/controller/pcie-xilinx-nwl.c
> @@ -482,15 +482,13 @@ static int nwl_irq_domain_alloc(struct
> irq_domain *domain, unsigned int virq,
>  	int i;
> 
>  	mutex_lock(&msi->lock);
> -	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
> -					 nr_irqs, 0);
> -	if (bit >= INT_PCI_MSI_NR) {
> +	bit = bitmap_find_free_region(msi->bitmap, INT_PCI_MSI_NR,
> +				      get_count_order(nr_irqs));
> +	if (bit < 0) {
>  		mutex_unlock(&msi->lock);
>  		return -ENOSPC;
>  	}
> 
> -	bitmap_set(msi->bitmap, bit, nr_irqs);
> -
>  	for (i = 0; i < nr_irqs; i++) {
>  		irq_domain_set_info(domain, virq + i, bit + i, &nwl_irq_chip,
>  				domain->host_data, handle_simple_irq,
> @@ -508,7 +506,7 @@ static void nwl_irq_domain_free(struct irq_domain
> *domain, unsigned int virq,
>  	struct nwl_msi *msi = &pcie->msi;
> 
>  	mutex_lock(&msi->lock);
> -	bitmap_clear(msi->bitmap, data->hwirq, nr_irqs);
> +	bitmap_release_region(msi->bitmap, data->hwirq,
> get_count_order(nr_irqs));
>  	mutex_unlock(&msi->lock);
>  }
>
Marc Zyngier June 6, 2019, 7:18 a.m. UTC | #4
On Thu, 06 Jun 2019 05:49:45 +0100,
Bharat Kumar Gogada <bharatku@xilinx.com> wrote:
> 
> > On 31/05/2019 17:09, Lorenzo Pieralisi wrote:
> > > [+Marc]
> > >
> > > On Wed, May 29, 2019 at 06:07:49PM +0530, Bharat Kumar Gogada wrote:
> > >> The current Multi MSI data programming fails if multiple end points
> > >> requesting MSI and multi MSI are connected with switch, i.e the
> > >> current multi MSI data being given is not considering the number of
> > >> vectors being requested in case of multi MSI.
> > >> Ex: Two EP's connected via switch, EP1 requesting single MSI first,
> > >> EP2 requesting Multi MSI of count four. The current code gives MSI
> > >> data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits due
> > >> to which EP2 also sends interrupt with MSI data 0x0 which results in
> > >> always invoking virq of EP1 due to which EP2 MSI interrupt never gets
> > >> handled.
> > >
> > > If this is a problem it is not the only driver where it should be
> > > fixed it seems. CC'ed Marc in case I have missed something in relation
> > > to MSI IRQs but AFAIU it looks like HW is allowed to toggled bits
> > > (according to bits[6:4] in Message Control for MSI) in the MSI data,
> > > given that the data written is the hwirq number (in this specific MSI
> > > controller) it ought to be fixed.
> > 
> > Yeah, it looks like a number of MSI controllers could be quite broken in this
> > particular area.
> > 
> > >
> > > The commit log and patch should be rewritten (I will do that) but
> > > first I would like to understand if there are more drivers to be
> > > updated.
> > >
> > > 
> Hi Lorenzo and Marc, thanks for your time.
> Marc, I'm yet to test the below suggested solution,
> GIC v2m and GIC v3 supports multi MSI, do we see above issue in
> these MSI controllers ?

To the best of my knowledge, these drivers do support MultiMSI
correctly. GICv2m actually gained the support pretty recently (see
de337ee30142). The GICv3 ITS never ha an issue with that, given that
per device EventIDs are always 0-based.

Thanks,

	M.
Lorenzo Pieralisi June 10, 2019, 10:27 a.m. UTC | #5
On Thu, Jun 06, 2019 at 08:18:25AM +0100, Marc Zyngier wrote:
> On Thu, 06 Jun 2019 05:49:45 +0100,
> Bharat Kumar Gogada <bharatku@xilinx.com> wrote:
> > 
> > > On 31/05/2019 17:09, Lorenzo Pieralisi wrote:
> > > > [+Marc]
> > > >
> > > > On Wed, May 29, 2019 at 06:07:49PM +0530, Bharat Kumar Gogada wrote:
> > > >> The current Multi MSI data programming fails if multiple end points
> > > >> requesting MSI and multi MSI are connected with switch, i.e the
> > > >> current multi MSI data being given is not considering the number of
> > > >> vectors being requested in case of multi MSI.
> > > >> Ex: Two EP's connected via switch, EP1 requesting single MSI first,
> > > >> EP2 requesting Multi MSI of count four. The current code gives MSI
> > > >> data 0x0 to EP1 and 0x1 to EP2, but EP2 can modify lower two bits due
> > > >> to which EP2 also sends interrupt with MSI data 0x0 which results in
> > > >> always invoking virq of EP1 due to which EP2 MSI interrupt never gets
> > > >> handled.
> > > >
> > > > If this is a problem it is not the only driver where it should be
> > > > fixed it seems. CC'ed Marc in case I have missed something in relation
> > > > to MSI IRQs but AFAIU it looks like HW is allowed to toggled bits
> > > > (according to bits[6:4] in Message Control for MSI) in the MSI data,
> > > > given that the data written is the hwirq number (in this specific MSI
> > > > controller) it ought to be fixed.
> > > 
> > > Yeah, it looks like a number of MSI controllers could be quite broken in this
> > > particular area.
> > > 
> > > >
> > > > The commit log and patch should be rewritten (I will do that) but
> > > > first I would like to understand if there are more drivers to be
> > > > updated.
> > > >
> > > > 
> > Hi Lorenzo and Marc, thanks for your time.
> > Marc, I'm yet to test the below suggested solution,
> > GIC v2m and GIC v3 supports multi MSI, do we see above issue in
> > these MSI controllers ?
> 
> To the best of my knowledge, these drivers do support MultiMSI
> correctly. GICv2m actually gained the support pretty recently (see
> de337ee30142). The GICv3 ITS never ha an issue with that, given that
> per device EventIDs are always 0-based.

AFAIU I think the issues is only present in controllers that use the
hwirq as MSI data and bitmap allocation that is not a power of two and
that's what Marc suggested as fix. There is still some chasing to do to
fix other MSI controllers in the kernel where this subtle issue went
undetected (and the driver has the same bitmap allocation issues as
this one).

@Bharat, please test Marc's patch and post it on completion, I will
rewrite your commit log because I want it to be clear so that we
have a reference to the issue linked to the specs.

Thanks,
Lorenzo
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-xilinx-nwl.c b/drivers/pci/controller/pcie-xilinx-nwl.c
index 81538d7..8efcb8a 100644
--- a/drivers/pci/controller/pcie-xilinx-nwl.c
+++ b/drivers/pci/controller/pcie-xilinx-nwl.c
@@ -483,7 +483,16 @@  static int nwl_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	int i;
 
 	mutex_lock(&msi->lock);
-	bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
+
+	/*
+	 * Multi MSI count is requested in power of two
+	 * Check if multi msi is requested
+	 */
+	if (nr_irqs % 2 == 0)
+		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
+					 nr_irqs, nr_irqs - 1);
+	else
+		bit = bitmap_find_next_zero_area(msi->bitmap, INT_PCI_MSI_NR, 0,
 					 nr_irqs, 0);
 	if (bit >= INT_PCI_MSI_NR) {
 		mutex_unlock(&msi->lock);