diff mbox series

PCI: dwc: fix MSI IRQ handler ordering

Message ID 20190104214509.16891-1-swarren@wwwdotorg.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show
Series PCI: dwc: fix MSI IRQ handler ordering | expand

Commit Message

Stephen Warren Jan. 4, 2019, 9:45 p.m. UTC
From: Stephen Warren <swarren@nvidia.com>

The current code does this when handling MSI IRQs:

a) Process the irq.
b) Clear the latched IRQ status.

If a new IRQ occurs any time after (a) has read the IRQ status for the
last time and before (b), it will be lost. For example, this occurs in
practice when using a Marvell 9171 AHCI controller with NCQ enabled;
many command timeouts occur with certain disk access patterns.

Fix the code to do the following instead, so that if any new IRQs are
raised during the processing of the IRQ, the IRQ status is not cleared,
so that the IRQ is not lost.

a) Clear the latched IRQ status.
b) Process the IRQ.

This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
status after it is handled, not before")

This change re-applies commit ca1658921b63 ("PCI: designware: Fix
missing MSI IRQs")

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Faiz Abbas <faiz_abbas@ti.com>
Cc: Harro Haan <hrhaan@gmail.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Juergen Beisert <jbe@pengutronix.de>
Cc: Marek Vasut <marex@denx.de>
Cc: Matthias Mann <m.mann@arkona-technologies.de>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Pratyush Anand <pratyush.anand@st.com>
Cc: Richard Zhu <hong-xing.zhu@freescale.com>
Cc: Sean Cross <xobs@kosagi.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
Cc: Tim Harvey <tharvey@gateworks.com>
---
Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
However, the exact same code structure is present in mainline and I have
no reason to believe the problem would not reproduce there. I have
compile tested but not runtime tested it in mainline, since my board is
not yet supported in mainline.
---
 drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Trent Piepho Jan. 4, 2019, 10:45 p.m. UTC | #1
On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> The current code does this when handling MSI IRQs:
> 
> a) Process the irq.
> b) Clear the latched IRQ status.
> 
> If a new IRQ occurs any time after (a) has read the IRQ status for the
> last time and before (b), it will be lost. For example, this occurs in

There was a patch series to fix this bug, sent in October, stretching
to December.

See "PCI: dwc: Fix interrupt race in when handling MSI"

And "PCI: designware: Move interrupt acking into the proper callback"

The result was more code changes, which in the end produce the same
order of operations as is done here to fix the race.  But it uses the
irq framework correctly.

I do not think this problem was introduced until 4.14, so the nvidia
4.9 must have additional patches if this is observed there.  I'm sure
it was not present in 4.12, as we used that, and did not see the issue
until updating to 4.16.

The current fix (for 4.21?) depends on significant restructuring that
was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
allocation to IRQ domains hierarchical API".

I think my original patch ("Fix interrupt race ..") or your patch here,
which is basically the same, should be ported to 4.14 stable.  But
there are some other opinions on that.  It's not clear to me if the
hierarchical domain stuff would be back ported to stable series.  It
does not seem to me to be a good idea to do so.
Stephen Warren Jan. 4, 2019, 11 p.m. UTC | #2
On 1/4/19 3:45 PM, Trent Piepho wrote:
> On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> The current code does this when handling MSI IRQs:
>>
>> a) Process the irq.
>> b) Clear the latched IRQ status.
>>
>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>> last time and before (b), it will be lost. For example, this occurs in
> 
> There was a patch series to fix this bug, sent in October, stretching
> to December.
> 
> See "PCI: dwc: Fix interrupt race in when handling MSI"
> 
> And "PCI: designware: Move interrupt acking into the proper callback"
> 
> The result was more code changes, which in the end produce the same
> order of operations as is done here to fix the race.  But it uses the
> irq framework correctly.
> 
> I do not think this problem was introduced until 4.14, so the nvidia
> 4.9 must have additional patches if this is observed there.  I'm sure
> it was not present in 4.12, as we used that, and did not see the issue
> until updating to 4.16.

Yes, I took the DWC driver from our 4.14 kernel and dropped it into the 
4.9 kernel, so that's why the issue shows up in 4.9 for us.

> The current fix (for 4.21?) depends on significant restructuring that
> was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
> allocation to IRQ domains hierarchical API".
> 
> I think my original patch ("Fix interrupt race ..") or your patch here,
> which is basically the same, should be ported to 4.14 stable.  But
> there are some other opinions on that.  It's not clear to me if the
> hierarchical domain stuff would be back ported to stable series.  It
> does not seem to me to be a good idea to do so.

Hmm. Well I guess I'll go for the patch I posted in our downstream 
kernels, since back-porting a bunch of not-yet-available restructuring 
to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
quick look at the other patches you mentioned just in case. Thanks!
Stephen Warren Jan. 4, 2019, 11:23 p.m. UTC | #3
On 1/4/19 4:00 PM, Stephen Warren wrote:
> On 1/4/19 3:45 PM, Trent Piepho wrote:
>> On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> The current code does this when handling MSI IRQs:
>>>
>>> a) Process the irq.
>>> b) Clear the latched IRQ status.
>>>
>>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>>> last time and before (b), it will be lost. For example, this occurs in
>>
>> There was a patch series to fix this bug, sent in October, stretching
>> to December.
>>
>> See "PCI: dwc: Fix interrupt race in when handling MSI"
>>
>> And "PCI: designware: Move interrupt acking into the proper callback"
>>
>> The result was more code changes, which in the end produce the same
>> order of operations as is done here to fix the race.  But it uses the
>> irq framework correctly.
>>
>> I do not think this problem was introduced until 4.14, so the nvidia
>> 4.9 must have additional patches if this is observed there.  I'm sure
>> it was not present in 4.12, as we used that, and did not see the issue
>> until updating to 4.16.
> 
> Yes, I took the DWC driver from our 4.14 kernel and dropped it into the 
> 4.9 kernel, so that's why the issue shows up in 4.9 for us.
> 
>> The current fix (for 4.21?) depends on significant restructuring that
>> was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
>> allocation to IRQ domains hierarchical API".
>>
>> I think my original patch ("Fix interrupt race ..") or your patch here,
>> which is basically the same, should be ported to 4.14 stable.  But
>> there are some other opinions on that.  It's not clear to me if the
>> hierarchical domain stuff would be back ported to stable series.  It
>> does not seem to me to be a good idea to do so.
> 
> Hmm. Well I guess I'll go for the patch I posted in our downstream 
> kernels, since back-porting a bunch of not-yet-available restructuring 
> to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
> quick look at the other patches you mentioned just in case. Thanks!

So I went and read the thread for "PCI: dwc: Fix interrupt race in when 
handling MSI" and see:

Vignesh R wrote:
 > Lorenzo Pieralisi wrote:
>> AFAICS:
>> 
>> 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled,
>> not before")
>> 
>> was fixing a bug, causing "timeouts on some wireless lan cards", we want
>> to understand what the problem is, fix it once for all on all DWC
>> based systems.
>> 
> 
> That issue was root caused to be due to a HW errata in dra7xx DWC
> wrapper which requires a special way of handling MSI interrupts at
> wrapper level. More info in this thread:
> https://www.spinics.net/lists/linux-pci/msg70462.html
> 
> Unfortunately, commit 8c934095fa2f did not fix WLAN issue in longer
> tests and also broke PCIe USB cards. Therefore, it makes sense to revert
> 8c934095fa2f
> 
> I am working on patches fix dra7xx wrapper for WLAN card issue.

So it seems quite clear that the correct course of action is to 
immediately revert commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt 
status after it is handled, not before") (or apply my or Trent's patches 
which are effectively reverts) since it (a) attempts to fix a bug (in 
the core DWC driver) that doesn't actually exist (the bug is in the DRA 
HW and is being fixed in the DRA wrapper driver), and (b) has many 
reports that the change causes regressions; I'm at least the 3rd or 4th 
person to confirm this now (Faiz originally, then later Trent, Vignesh, 
myself).

Now, whether there's some cleanup or additional fixes needed beyond the 
simple revert is another question I currently have no insight into, but 
let's at least get back to a driver without that worked well for people 
for years, even if there's a theoretical issue to be fixed that nobody 
hit in practice.
Trent Piepho Jan. 5, 2019, 12:15 a.m. UTC | #4
On Fri, 2019-01-04 at 16:23 -0700, Stephen Warren wrote:
> On 1/4/19 4:00 PM, Stephen Warren wrote:
> > 
> > 
> > Hmm. Well I guess I'll go for the patch I posted in our downstream 
> > kernels, since back-porting a bunch of not-yet-available restructuring 
> > to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
> > quick look at the other patches you mentioned just in case. Thanks!
> 
> So I went and read the thread for "PCI: dwc: Fix interrupt race in when 
> handling MSI" and see:
> 
> Vignesh R wrote:
>  > Lorenzo Pieralisi wrote:
> > > AFAICS:
> > > 
> > > 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled,
> > > not before")
> > > 
> > > was fixing a bug, causing "timeouts on some wireless lan cards", we want
> > > to understand what the problem is, fix it once for all on all DWC
> > > based systems.
> > > 
> So it seems quite clear that the correct course of action is to 
> immediately revert commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt 
> status after it is handled, not before") (or apply my or Trent's patches 
> which are effectively reverts) since it (a) attempts to fix a bug (in 
> the core DWC driver) that doesn't actually exist (the bug is in the DRA 
> HW and is being fixed in the DRA wrapper driver), and (b) has many 
> reports that the change causes regressions; I'm at least the 3rd or 4th 
> person to confirm this now (Faiz originally, then later Trent, Vignesh, 
> myself).

It also seemed clear to me!  But if you check the thread fully, this
was very forcefully rejected.

> Now, whether there's some cleanup or additional fixes needed beyond the 
> simple revert is another question I currently have no insight into, but 
> let's at least get back to a driver without that worked well for people 
> for years, even if there's a theoretical issue to be fixed that nobody 
> hit in practice.

I do not think a real or theoretical mechanism for incorrect behavior
was identified beyond the revert we know of.  More of a case of pieces
not being where the framework intended them to be.
Gustavo Pimentel Jan. 7, 2019, 9:36 a.m. UTC | #5
Hi Stephen,

On 04/01/2019 21:45, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> The current code does this when handling MSI IRQs:
>
> a) Process the irq.
> b) Clear the latched IRQ status.
>
> If a new IRQ occurs any time after (a) has read the IRQ status for the
> last time and before (b), it will be lost. For example, this occurs in
> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
> many command timeouts occur with certain disk access patterns.
>
> Fix the code to do the following instead, so that if any new IRQs are
> raised during the processing of the IRQ, the IRQ status is not cleared,
> so that the IRQ is not lost.
>
> a) Clear the latched IRQ status.
> b) Process the IRQ.
>
> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
> status after it is handled, not before")
>
> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
> missing MSI IRQs")
>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Faiz Abbas <faiz_abbas@ti.com>
> Cc: Harro Haan <hrhaan@gmail.com>
> Cc: Jingoo Han <jg1.han@samsung.com>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Juergen Beisert <jbe@pengutronix.de>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Matthias Mann <m.mann@arkona-technologies.de>
> Cc: Mohit Kumar <mohit.kumar@st.com>
> Cc: Pratyush Anand <pratyush.anand@st.com>
> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
> Cc: Sean Cross <xobs@kosagi.com>
> Cc: Shawn Guo <shawn.guo@linaro.org>
> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
> Cc: Tim Harvey <tharvey@gateworks.com>
> ---
> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
> However, the exact same code structure is present in mainline and I have
> no reason to believe the problem would not reproduce there. I have
> compile tested but not runtime tested it in mainline, since my board is
> not yet supported in mainline.
> ---
>  drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 692dd1b264fb..7fd6c56a6f35 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>  			irq = irq_find_mapping(pp->irq_domain,
>  					       (i * MAX_MSI_IRQS_PER_CTRL) +
>  					       pos);
> -			generic_handle_irq(irq);
>  			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>  						(i * MSI_REG_CTRL_BLOCK_SIZE),
>  					    4, 1 << pos);
> +			generic_handle_irq(irq);
>  			pos++;
>  		}
>  	}

This fix was already suggested by Trent Piepho, however, after review by Marc
Zyngier and some information obtained from Synopsys IP development team, the
following set of patches was generated to correct the problem.

830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")

Regards,

Gustavo
Stephen Warren Jan. 7, 2019, 5:16 p.m. UTC | #6
On 1/7/19 2:36 AM, Gustavo Pimentel wrote:
> Hi Stephen,
> 
> On 04/01/2019 21:45, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> The current code does this when handling MSI IRQs:
>>
>> a) Process the irq.
>> b) Clear the latched IRQ status.
>>
>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>> last time and before (b), it will be lost. For example, this occurs in
>> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
>> many command timeouts occur with certain disk access patterns.
>>
>> Fix the code to do the following instead, so that if any new IRQs are
>> raised during the processing of the IRQ, the IRQ status is not cleared,
>> so that the IRQ is not lost.
>>
>> a) Clear the latched IRQ status.
>> b) Process the IRQ.
>>
>> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
>> status after it is handled, not before")
>>
>> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
>> missing MSI IRQs")
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Faiz Abbas <faiz_abbas@ti.com>
>> Cc: Harro Haan <hrhaan@gmail.com>
>> Cc: Jingoo Han <jg1.han@samsung.com>
>> Cc: Joao Pinto <jpinto@synopsys.com>
>> Cc: Juergen Beisert <jbe@pengutronix.de>
>> Cc: Marek Vasut <marex@denx.de>
>> Cc: Matthias Mann <m.mann@arkona-technologies.de>
>> Cc: Mohit Kumar <mohit.kumar@st.com>
>> Cc: Pratyush Anand <pratyush.anand@st.com>
>> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
>> Cc: Sean Cross <xobs@kosagi.com>
>> Cc: Shawn Guo <shawn.guo@linaro.org>
>> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
>> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
>> Cc: Tim Harvey <tharvey@gateworks.com>
>> ---
>> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
>> However, the exact same code structure is present in mainline and I have
>> no reason to believe the problem would not reproduce there. I have
>> compile tested but not runtime tested it in mainline, since my board is
>> not yet supported in mainline.
>> ---
>>   drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>> index 692dd1b264fb..7fd6c56a6f35 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>>   			irq = irq_find_mapping(pp->irq_domain,
>>   					       (i * MAX_MSI_IRQS_PER_CTRL) +
>>   					       pos);
>> -			generic_handle_irq(irq);
>>   			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>>   						(i * MSI_REG_CTRL_BLOCK_SIZE),
>>   					    4, 1 << pos);
>> +			generic_handle_irq(irq);
>>   			pos++;
>>   		}
>>   	}
> 
> This fix was already suggested by Trent Piepho, however, after review by Marc
> Zyngier and some information obtained from Synopsys IP development team, the
> following set of patches was generated to correct the problem.
> 
> 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
> fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
> 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")

OK. I assume that dw_pci_bottom_ack() gets called before 
dw_handle_msi_irq() for each IRQ? In which case, the last of those 3 
changes is essentially identical to the patch I proposed, it's just that 
the acking is handled in a separate function rather than earlier in the 
same function.
Gustavo Pimentel Jan. 7, 2019, 6:09 p.m. UTC | #7
On 07/01/2019 17:16, Stephen Warren wrote:
> On 1/7/19 2:36 AM, Gustavo Pimentel wrote:
>> Hi Stephen,
>>
>> On 04/01/2019 21:45, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> The current code does this when handling MSI IRQs:
>>>
>>> a) Process the irq.
>>> b) Clear the latched IRQ status.
>>>
>>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>>> last time and before (b), it will be lost. For example, this occurs in
>>> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
>>> many command timeouts occur with certain disk access patterns.
>>>
>>> Fix the code to do the following instead, so that if any new IRQs are
>>> raised during the processing of the IRQ, the IRQ status is not cleared,
>>> so that the IRQ is not lost.
>>>
>>> a) Clear the latched IRQ status.
>>> b) Process the IRQ.
>>>
>>> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
>>> status after it is handled, not before")
>>>
>>> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
>>> missing MSI IRQs")
>>>
>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>>> Cc: Faiz Abbas <faiz_abbas@ti.com>
>>> Cc: Harro Haan <hrhaan@gmail.com>
>>> Cc: Jingoo Han <jg1.han@samsung.com>
>>> Cc: Joao Pinto <jpinto@synopsys.com>
>>> Cc: Juergen Beisert <jbe@pengutronix.de>
>>> Cc: Marek Vasut <marex@denx.de>
>>> Cc: Matthias Mann <m.mann@arkona-technologies.de>
>>> Cc: Mohit Kumar <mohit.kumar@st.com>
>>> Cc: Pratyush Anand <pratyush.anand@st.com>
>>> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
>>> Cc: Sean Cross <xobs@kosagi.com>
>>> Cc: Shawn Guo <shawn.guo@linaro.org>
>>> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
>>> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
>>> Cc: Tim Harvey <tharvey@gateworks.com>
>>> ---
>>> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
>>> However, the exact same code structure is present in mainline and I have
>>> no reason to believe the problem would not reproduce there. I have
>>> compile tested but not runtime tested it in mainline, since my board is
>>> not yet supported in mainline.
>>> ---
>>>   drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>>> index 692dd1b264fb..7fd6c56a6f35 100644
>>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>>> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>>>   			irq = irq_find_mapping(pp->irq_domain,
>>>   					       (i * MAX_MSI_IRQS_PER_CTRL) +
>>>   					       pos);
>>> -			generic_handle_irq(irq);
>>>   			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>>>   						(i * MSI_REG_CTRL_BLOCK_SIZE),
>>>   					    4, 1 << pos);
>>> +			generic_handle_irq(irq);
>>>   			pos++;
>>>   		}
>>>   	}
>>
>> This fix was already suggested by Trent Piepho, however, after review by Marc
>> Zyngier and some information obtained from Synopsys IP development team, the
>> following set of patches was generated to correct the problem.
>>
>> 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
>> fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
>> 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")
> 
> OK. I assume that dw_pci_bottom_ack() gets called before 
> dw_handle_msi_irq() for each IRQ? In which case, the last of those 3 

Yes, the dw_pci_bottom_ack() gets called before dw_handle_msi_irq().

> changes is essentially identical to the patch I proposed, it's just that 
> the acking is handled in a separate function rather than earlier in the 
> same function.
> 

No, there is also two other fixes. A missing lock/unlock on the ack and the
correct use of the MASK/UNMASK interrupt register instead of ENABLE interrupt
register.
diff mbox series

Patch

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 692dd1b264fb..7fd6c56a6f35 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -98,10 +98,10 @@  irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
 			irq = irq_find_mapping(pp->irq_domain,
 					       (i * MAX_MSI_IRQS_PER_CTRL) +
 					       pos);
-			generic_handle_irq(irq);
 			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
 						(i * MSI_REG_CTRL_BLOCK_SIZE),
 					    4, 1 << pos);
+			generic_handle_irq(irq);
 			pos++;
 		}
 	}