
PCI: mediatek: Change MSI interrupt processing sequence

Message ID 20220123033306.29799-1-qizhong.cheng@mediatek.com (mailing list archive)
State New, archived
Series PCI: mediatek: Change MSI interrupt processing sequence

Commit Message

qizhong cheng Jan. 23, 2022, 3:33 a.m. UTC
As an edge-triggered interrupt, its interrupt status should be cleared
before dispatching to the device's handler.

Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
---
 drivers/pci/controller/pcie-mediatek.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Chen-Yu Tsai Jan. 24, 2022, 3:12 a.m. UTC | #1
Hi,

On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
<qizhong.cheng@mediatek.com> wrote:
>
> As an edge-triggered interrupts, its interrupt status should be cleared
> before dispatch to the handler of device.

I'm curious, is this just a code correction or are there real world
cases where something fails?

Also, please add a Fixes tag and maybe Cc stable so this gets backported
automatically.

ChenYu

> Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> index 2f3f974977a3..705ea33758b1 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
>                 if (status & MSI_STATUS){
>                         unsigned long imsi_status;
>
> +                       /* Clear MSI interrupt status */
> +                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>                         while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
>                                 for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
>                                         generic_handle_domain_irq(port->inner_domain, bit);
>                         }
> -                       /* Clear MSI interrupt status */
> -                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>                 }
>         }
>
> --
> 2.25.1
>
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
qizhong cheng Jan. 24, 2022, 6:27 a.m. UTC | #2
Hi chenYu,

On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> Hi,
> 
> On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> <qizhong.cheng@mediatek.com> wrote:
> > 
> > As an edge-triggered interrupts, its interrupt status should be
> > cleared
> > before dispatch to the handler of device.
> 
> I'm curious, is this just a code correction or are there real world
> cases where something fails?

Yes, we found a failure when using the iperf tool for Wi-Fi and network
card performance testing. The "while" loop had just finished, and the EP
sent an MSI before "Clear MSI interrupt status" was executed. After
"Clear MSI interrupt status" was executed, the latched edge-triggered
interrupt status was cleared, but the EP was still waiting for its
interrupt to be handled.

> 
> Also, please add a Fixes tag and maybe Cc stable so this gets
> backported
> automatically.

Thanks for your review, I will fix it in the next version.

> 
> ChenYu
> 
> > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > ---
> >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > index 2f3f974977a3..705ea33758b1 100644
> > --- a/drivers/pci/controller/pcie-mediatek.c
> > +++ b/drivers/pci/controller/pcie-mediatek.c
> > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> >                 if (status & MSI_STATUS){
> >                         unsigned long imsi_status;
> > 
> > +                       /* Clear MSI interrupt status */
> > +                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >                         while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> >                                 for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> >                                         generic_handle_domain_irq(port->inner_domain, bit);
> >                         }
> > -                       /* Clear MSI interrupt status */
> > -                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >                 }
> >         }
> > 
> > --
> > 2.25.1
> > 
> > 
Chen-Yu Tsai Jan. 24, 2022, 6:55 a.m. UTC | #3
On Mon, Jan 24, 2022 at 2:27 PM qizhong.cheng
<qizhong.cheng@mediatek.com> wrote:
>
> Hi chenYu,
>
> On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> > Hi,
> >
> > On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> > <qizhong.cheng@mediatek.com> wrote:
> > >
> > > As an edge-triggered interrupts, its interrupt status should be
> > > cleared
> > > before dispatch to the handler of device.
> >
> > I'm curious, is this just a code correction or are there real world
> > cases where something fails?
>
> Yes, we found a failure when used iperf tool for wifi and network cards
> performance testing. The function of "while" has just been executed,
> and the EP sent an MSI before executing "Clear MSI interrupt status".
> After executing "Clear MSI interrupt status", this edge-triggered
> interrupt status is cleared, but EP is still waiting for interrupt
> handler.

Can you also include this in the commit log?  It would be nice to record
the exact scenario that this fix targets.

ChenYu

> >
> > Also, please add a Fixes tag and maybe Cc stable so this gets
> > backported
> > automatically.
>
> Thanks for your review, I will fix it in the next version.
>
> >
> > ChenYu
> >
> > > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > > ---
> > >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > > index 2f3f974977a3..705ea33758b1 100644
> > > --- a/drivers/pci/controller/pcie-mediatek.c
> > > +++ b/drivers/pci/controller/pcie-mediatek.c
> > > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> > >                 if (status & MSI_STATUS){
> > >                         unsigned long imsi_status;
> > >
> > > +                       /* Clear MSI interrupt status */
> > > +                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >                         while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> > >                                 for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> > >                                         generic_handle_domain_irq(port->inner_domain, bit);
> > >                         }
> > > -                       /* Clear MSI interrupt status */
> > > -                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >                 }
> > >         }
> > >
> > > --
> > > 2.25.1
> > >
> > >
>
qizhong cheng Jan. 24, 2022, 8:34 a.m. UTC | #4
On Mon, 2022-01-24 at 14:55 +0800, Chen-Yu Tsai wrote:
> On Mon, Jan 24, 2022 at 2:27 PM qizhong.cheng
> <qizhong.cheng@mediatek.com> wrote:
> > 
> > Hi chenYu,
> > 
> > On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> > > Hi,
> > > 
> > > On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> > > <qizhong.cheng@mediatek.com> wrote:
> > > > 
> > > > As an edge-triggered interrupts, its interrupt status should be
> > > > cleared
> > > > before dispatch to the handler of device.
> > > 
> > > I'm curious, is this just a code correction or are there real
> > > world
> > > cases where something fails?
> > 
> > Yes, we found a failure when used iperf tool for wifi and network
> > cards
> > performance testing. The function of "while" has just been
> > executed,
> > and the EP sent an MSI before executing "Clear MSI interrupt
> > status".
> > After executing "Clear MSI interrupt status", this edge-triggered
> > interrupt status is cleared, but EP is still waiting for interrupt
> > handler.
> 
> Can you also include this in the commit log?  It would be nice to
> record
> the exact scenario that this fix targets.

Thanks for your suggestion. I will add this to the commit log in the
next version for others to review.

> 
> ChenYu
> 
> > > 
> > > Also, please add a Fixes tag and maybe Cc stable so this gets
> > > backported
> > > automatically.
> > 
> > Thanks for your review, I will fix it in the next version.
> > 
> > > 
> > > ChenYu
> > > 
> > > > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > > > ---
> > > >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > > > index 2f3f974977a3..705ea33758b1 100644
> > > > --- a/drivers/pci/controller/pcie-mediatek.c
> > > > +++ b/drivers/pci/controller/pcie-mediatek.c
> > > > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> > > >                 if (status & MSI_STATUS){
> > > >                         unsigned long imsi_status;
> > > > 
> > > > +                       /* Clear MSI interrupt status */
> > > > +                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >                         while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> > > >                                 for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> > > >                                         generic_handle_domain_irq(port->inner_domain, bit);
> > > >                         }
> > > > -                       /* Clear MSI interrupt status */
> > > > -                       writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >                 }
> > > >         }
> > > > 
> > > > --
> > > > 2.25.1
> > > > 
> > > > 
Bjorn Helgaas Jan. 25, 2022, 4:57 p.m. UTC | #5
All patches change *something*.  Can you update the subject line so it
says something specific about the change?

Maybe something like "Clear MSI status before dispatching handler"?

On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> As an edge-triggered interrupts, its interrupt status should be cleared
> before dispatch to the handler of device.

I'm not an IRQ expert, but the reasoning that "we should clear the MSI
interrupt status before dispatching the handler because MSI is an
edge-triggered interrupt" doesn't seem completely convincing because
your code will now look like this:

  /* Clear the INTx */
  writel(1 << bit, port->base + PCIE_INT_STATUS);
  generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
  ...

  /* Clear MSI interrupt status */
  writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
  generic_handle_domain_irq(port->inner_domain, bit);

You clear interrupt status before dispatching the handler for *both*
level-triggered INTx interrupts and edge-triggered MSI interrupts.

So it doesn't seem that simply being edge-triggered is the critical
factor here.

> Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> index 2f3f974977a3..705ea33758b1 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
>  		if (status & MSI_STATUS){
>  			unsigned long imsi_status;
>  
> +			/* Clear MSI interrupt status */
> +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
>  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
>  					generic_handle_domain_irq(port->inner_domain, bit);
>  			}
> -			/* Clear MSI interrupt status */
> -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  		}
>  	}
>  
> -- 
> 2.25.1
> 
> 
Marc Zyngier Jan. 25, 2022, 5:21 p.m. UTC | #6
On 2022-01-25 16:57, Bjorn Helgaas wrote:
> All patches change *something*.  Can you update the subject line so it
> says something specific about the change?
> 
> Maybe something like "Clear MSI status before dispatching handler"?
> 
> On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
>> As an edge-triggered interrupts, its interrupt status should be 
>> cleared
>> before dispatch to the handler of device.
> 
> I'm not an IRQ expert, but the reasoning that "we should clear the MSI
> interrupt status before dispatching the handler because MSI is an
> edge-triggered interrupt" doesn't seem completely convincing because
> your code will now look like this:
> 
>   /* Clear the INTx */
>   writel(1 << bit, port->base + PCIE_INT_STATUS);
>   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
>   ...
> 
>   /* Clear MSI interrupt status */
>   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>   generic_handle_domain_irq(port->inner_domain, bit);
> 
> You clear interrupt status before dispatching the handler for *both*
> level-triggered INTx interrupts and edge-triggered MSI interrupts.
> 
> So it doesn't seem that simply being edge-triggered is the critical
> factor here.

This is the usual problem with these half-baked implementations.
The signalling to the primary interrupt controller is level, as
they take a multitude of inputs and (crucially) latch the MSI
edges. Effectively, this is an edge-to-level converter, with
all the problems that this creates.

By clearing the status *after* the handling, you lose edges that
have been received and coalesced after the read of the status
register. By clearing it *before*, you are acknowledging the
interrupts early, and allowing them to be coalesced independently
of the ones that have been received earlier.

This is however mostly an educated guess. Someone with access
to the TRM should verify this.

Thanks,

         M.
qizhong cheng Jan. 26, 2022, 3:37 a.m. UTC | #7
On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > All patches change *something*.  Can you update the subject line so
> > it
> > says something specific about the change?
> > 
> > Maybe something like "Clear MSI status before dispatching handler"?
> > 
> > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > As an edge-triggered interrupts, its interrupt status should be 
> > > cleared
> > > before dispatch to the handler of device.
> > 
> > I'm not an IRQ expert, but the reasoning that "we should clear the
> > MSI
> > interrupt status before dispatching the handler because MSI is an
> > edge-triggered interrupt" doesn't seem completely convincing
> > because
> > your code will now look like this:
> > 
> >   /* Clear the INTx */
> >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> >   ...
> > 
> >   /* Clear MSI interrupt status */
> >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >   generic_handle_domain_irq(port->inner_domain, bit);
> > 
> > You clear interrupt status before dispatching the handler for
> > *both*
> > level-triggered INTx interrupts and edge-triggered MSI interrupts.
> > 
> > So it doesn't seem that simply being edge-triggered is the critical
> > factor here.
> 
> This is the usual problem with these half-baked implementations.
> The signalling to the primary interrupt controller is level, as
> they take a multitude of input and (crucially) latch the MSI
> edges. Effectively, this is an edge-to-level converter, with
> all the problems that this creates.
> 
> By clearing the status *after* the handling, you lose edges that
> have been received and coalesced after the read of the status
> register. By clearing it *before*, you are acknowledging the
> interrupts early, and allowing them to be coalesced independently
> of the ones that have been received earlier.
> 
> This is however mostly an educated guess. Someone with access
> to the TRM should verify this.
> 

Yes, as Maz said, we latch the edge-triggered interrupt status so that it
becomes a level interrupt. This is similar to an edge-to-level converter,
so we need to clear it *before* handling. We found this problem through
many experiments and have tested this patch.

Thanks Helgaas and Maz for your comments.

Bjorn Helgaas Jan. 27, 2022, 9:21 p.m. UTC | #8
[+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]

On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > As an edge-triggered interrupts, its interrupt status should
> > > > be cleared before dispatch to the handler of device.
> > > 
> > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > the MSI interrupt status before dispatching the handler because
> > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > convincing because your code will now look like this:
> > > 
> > >   /* Clear the INTx */
> > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > >   ...
> > > 
> > >   /* Clear MSI interrupt status */
> > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > 
> > > You clear interrupt status before dispatching the handler for
> > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > interrupts.
> > > 
> > > So it doesn't seem that simply being edge-triggered is the
> > > critical factor here.
> > 
> > This is the usual problem with these half-baked implementations.
> > The signalling to the primary interrupt controller is level, as
> > they take a multitude of input and (crucially) latch the MSI
> > edges. Effectively, this is an edge-to-level converter, with all
> > the problems that this creates.
> > 
> > By clearing the status *after* the handling, you lose edges that
> > have been received and coalesced after the read of the status
> > register. By clearing it *before*, you are acknowledging the
> > interrupts early, and allowing them to be coalesced independently
> > of the ones that have been received earlier.
> > 
> > This is however mostly an educated guess. Someone with access to
> > the TRM should verify this.
> 
> Yes, as Maz said, we save the edge-interrupt status so that it
> becomes a level-interrupt. This is similar to an edge-to-level
> converter, so we need to clear it *before*. We found this problem
> through a lot of experiments and tested this patch.

I thought there might be other host controllers with similar design,
so I looked at all the other drivers and tried to figure out whether
any others had similar problems.

The ones below look suspicious to me because they all clear some sort
of status register *after* handling an MSI.  Can you guys take a look
and make sure they are working correctly?

  keembay_pcie_msi_irq_handler
    status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
    if (status & MSI_CTRL_INT)
      dw_handle_msi_irq
	generic_handle_domain_irq
      writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)

  spear13xx_pcie_irq_handler
    status = readl(&app_reg->int_sts)
    if (status & MSI_CTRL_INT)
      dw_handle_msi_irq
	generic_handle_domain_irq
    writel(status, &app_reg->int_clr)

  advk_pcie_handle_int
    isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
    if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
      advk_pcie_handle_msi
        advk_readl(pcie, PCIE_MSI_STATUS_REG)
	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
	generic_handle_irq
	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)

  mtk_pcie_irq_handler
    status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
    for_each_set_bit_from(irq_bit, &status, ...)
      mtk_pcie_msi_handler
        generic_handle_domain_irq
      writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Bjorn
Jianjun Wang (王建军) Jan. 28, 2022, 7:58 a.m. UTC | #9
Hi Bjorn,

On Thu, 2022-01-27 at 15:21 -0600, Bjorn Helgaas wrote:
> [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
> 
> On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > be cleared before dispatch to the handler of device.
> > > > 
> > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > the MSI interrupt status before dispatching the handler because
> > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > convincing because your code will now look like this:
> > > > 
> > > >   /* Clear the INTx */
> > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > >   ...
> > > > 
> > > >   /* Clear MSI interrupt status */
> > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > 
> > > > You clear interrupt status before dispatching the handler for
> > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > interrupts.
> > > > 
> > > > So it doesn't seem that simply being edge-triggered is the
> > > > critical factor here.
> > > 
> > > This is the usual problem with these half-baked implementations.
> > > The signalling to the primary interrupt controller is level, as
> > > they take a multitude of input and (crucially) latch the MSI
> > > edges. Effectively, this is an edge-to-level converter, with all
> > > the problems that this creates.
> > > 
> > > By clearing the status *after* the handling, you lose edges that
> > > have been received and coalesced after the read of the status
> > > register. By clearing it *before*, you are acknowledging the
> > > interrupts early, and allowing them to be coalesced independently
> > > of the ones that have been received earlier.
> > > 
> > > This is however mostly an educated guess. Someone with access to
> > > the TRM should verify this.
> > 
> > Yes, as Maz said, we save the edge-interrupt status so that it
> > becomes a level-interrupt. This is similar to an edge-to-level
> > converter, so we need to clear it *before*. We found this problem
> > through a lot of experiments and tested this patch.
> 
> I thought there might be other host controllers with similar design,
> so I looked at all the other drivers and tried to figure out whether
> any others had similar problems.
> 
> The ones below look suspicious to me because they all clear some sort
> of status register *after* handling an MSI.  Can you guys take a look
> and make sure they are working correctly?
> 
>   keembay_pcie_msi_irq_handler
>     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
> 	generic_handle_domain_irq
>       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> 
>   spear13xx_pcie_irq_handler
>     status = readl(&app_reg->int_sts)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
> 	generic_handle_domain_irq
>     writel(status, &app_reg->int_clr)
> 
>   advk_pcie_handle_int
>     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
>     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
>       advk_pcie_handle_msi
>         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> 	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> 	generic_handle_irq
> 	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
> 
>   mtk_pcie_irq_handler
>     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
>     for_each_set_bit_from(irq_bit, &status, ...)
>       mtk_pcie_msi_handler
>         generic_handle_domain_irq
>       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Thanks for mentioning that. In the hardware corresponding to
pcie-mediatek-gen3.c, the interrupt status in PCIE_INT_STATUS_REG cannot
be cleared while MSI status remains in the msi_set registers, so we have
to clear it after handling the MSI.

I guess the root cause addressed by this patch is that the interrupt
status can be cleared even while MSI status remains, so if some MSIs are
received while the interrupt status is being cleared, those MSIs cannot
be serviced.

We will discuss and test this internally and report the results later.
Thanks for your review.

Thanks.

> 
> Bjorn
Marc Zyngier Jan. 28, 2022, 8:57 a.m. UTC | #10
On Thu, 27 Jan 2022 21:21:00 +0000,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
> 
> On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > be cleared before dispatch to the handler of device.
> > > > 
> > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > the MSI interrupt status before dispatching the handler because
> > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > convincing because your code will now look like this:
> > > > 
> > > >   /* Clear the INTx */
> > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > >   ...
> > > > 
> > > >   /* Clear MSI interrupt status */
> > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > 
> > > > You clear interrupt status before dispatching the handler for
> > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > interrupts.
> > > > 
> > > > So it doesn't seem that simply being edge-triggered is the
> > > > critical factor here.
> > > 
> > > This is the usual problem with these half-baked implementations.
> > > The signalling to the primary interrupt controller is level, as
> > > they take a multitude of input and (crucially) latch the MSI
> > > edges. Effectively, this is an edge-to-level converter, with all
> > > the problems that this creates.
> > > 
> > > By clearing the status *after* the handling, you lose edges that
> > > have been received and coalesced after the read of the status
> > > register. By clearing it *before*, you are acknowledging the
> > > interrupts early, and allowing them to be coalesced independently
> > > of the ones that have been received earlier.
> > > 
> > > This is however mostly an educated guess. Someone with access to
> > > the TRM should verify this.
> > 
> > Yes, as Maz said, we save the edge-interrupt status so that it
> > becomes a level-interrupt. This is similar to an edge-to-level
> > converter, so we need to clear it *before*. We found this problem
> > through a lot of experiments and tested this patch.
> 
> I thought there might be other host controllers with similar design,
> so I looked at all the other drivers and tried to figure out whether
> any others had similar problems.
> 
> The ones below look suspicious to me because they all clear some sort
> of status register *after* handling an MSI.  Can you guys take a look
> and make sure they are working correctly?
> 
>   keembay_pcie_msi_irq_handler
>     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
> 	generic_handle_domain_irq
>       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> 
>   spear13xx_pcie_irq_handler
>     status = readl(&app_reg->int_sts)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
> 	generic_handle_domain_irq
>     writel(status, &app_reg->int_clr)

I think these two are fine.

The top level interrupt is only a level signal that there is something
to process. The only thing that is unclear is the effect of writing to
that status register if MSIs are pending at that point. A sane
implementation would just ignore the write.

The actual processing is done in dw_handle_msi_irq(), reading the
PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack
the interrupt, one bit at a time, as interrupts are handled (see
dw_pci_bottom_ack). Since the Ack takes place before the handling, it is
safe for edge delivery.

> 
>   advk_pcie_handle_int
>     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
>     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
>       advk_pcie_handle_msi
>         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> 	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> 	generic_handle_irq
> 	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)

Same thing, I guess. It is just that the Ack has been open-coded.

>
>   mtk_pcie_irq_handler
>     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
>     for_each_set_bit_from(irq_bit, &status, ...)
>       mtk_pcie_msi_handler
>         generic_handle_domain_irq
>       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Similar thing. The PCIE_MSI_SET_STATUS register is read first, and
then written back in the ack callback.

	M.
Bjorn Helgaas Jan. 28, 2022, 1:12 p.m. UTC | #11
On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote:
> On Thu, 27 Jan 2022 21:21:00 +0000,
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > > be cleared before dispatch to the handler of device.
> > > > > 
> > > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > > the MSI interrupt status before dispatching the handler because
> > > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > > convincing because your code will now look like this:
> > > > > 
> > > > >   /* Clear the INTx */
> > > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > > >   ...
> > > > > 
> > > > >   /* Clear MSI interrupt status */
> > > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > > 
> > > > > You clear interrupt status before dispatching the handler for
> > > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > > interrupts.
> > > > > 
> > > > > So it doesn't seem that simply being edge-triggered is the
> > > > > critical factor here.
> > > > 
> > > > This is the usual problem with these half-baked implementations.
> > > > The signalling to the primary interrupt controller is level, as
> > > > they take a multitude of input and (crucially) latch the MSI
> > > > edges. Effectively, this is an edge-to-level converter, with all
> > > > the problems that this creates.
> > > > 
> > > > By clearing the status *after* the handling, you lose edges that
> > > > have been received and coalesced after the read of the status
> > > > register. By clearing it *before*, you are acknowledging the
> > > > interrupts early, and allowing them to be coalesced independently
> > > > of the ones that have been received earlier.
> > > > 
> > > > This is however mostly an educated guess. Someone with access to
> > > > the TRM should verify this.
> > > 
> > > Yes, as Maz said, we save the edge-interrupt status so that it
> > > becomes a level-interrupt. This is similar to an edge-to-level
> > > converter, so we need to clear it *before*. We found this problem
> > > through a lot of experiments and tested this patch.
> > 
> > I thought there might be other host controllers with similar design,
> > so I looked at all the other drivers and tried to figure out whether
> > any others had similar problems.
> > 
> > The ones below look suspicious to me because they all clear some sort
> > of status register *after* handling an MSI.  Can you guys take a look
> > and make sure they are working correctly?
> > 
> >   keembay_pcie_msi_irq_handler
> >     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> > 
> >   spear13xx_pcie_irq_handler
> >     status = readl(&app_reg->int_sts)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >     writel(status, &app_reg->int_clr)
> 
> I think these two are fine.
> 
> The top level interrupt is only a level signal that there is something
> to process. The only thing that is unclear is what the effect of
> writing to that status register is if MSIs are pending at that point. A
> sane implementation would just ignore the write.
> 
> The actual processing is done in dw_handle_msi_irq(), reading the
> PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack
> the interrupt, one bit at a time, as interrupts are handled (see
> dw_pci_bottom_ack). Since the Ack takes place before the handling,
> this is safe for edge delivery.
> 
> >   advk_pcie_handle_int
> >     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
> >     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
> >       advk_pcie_handle_msi
> >         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> > 	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> > 	generic_handle_irq
> > 	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
> 
> Same thing, I guess. It is just that the Ack has been open-coded.
> 
> >   mtk_pcie_irq_handler
> >     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
> >     for_each_set_bit_from(irq_bit, &status, ...)
> >       mtk_pcie_msi_handler
> >         generic_handle_domain_irq
> >       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)
> 
> Similar thing. The PCIE_MSI_SET_STATUS register is read first, and
> then written back in the ack callback.

Thanks a lot for taking a look at these, Marc!  Is there anything we
can do to make all these drivers/pci/controller/* drivers more
consistent and easier to review?  I found it very difficult to look
across all of them and find similar design patterns.

Bjorn
Marc Zyngier Jan. 28, 2022, 3:09 p.m. UTC | #12
On Fri, 28 Jan 2022 13:12:50 +0000,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote:
> > On Thu, 27 Jan 2022 21:21:00 +0000,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Thanks a lot for taking a look at these, Marc!  Is there anything we
> can do to make all these drivers/pci/controller/* drivers more
> consistent and easier to review?  I found it very difficult to look
> across all of them and find similar design patterns.

It looks to me that a number of them are just wrapping the same
underlying IP block, most likely the DW controller (this looks to be
the case for at least the first two).

They probably all use different register and bit offsets, but it
should be possible to write a library abstracting all these details
and have a common handling for most of them. This would certainly go a
long way in making things more solid.

	M.
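As a rough illustration of the kind of shared helper Marc suggests, the sketch below factors the read/ack/dispatch loop into one function driven by a per-controller descriptor. Everything here is hypothetical: `struct msi_demux_desc`, `msi_demux()`, `ctrl_a` and `ctrl_b` are invented for illustration and are not an existing kernel API, and register accesses are modeled on a plain `uint32_t` array with write-1-to-clear semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-controller layout; not an existing kernel API. */
struct msi_demux_desc {
	unsigned status_reg;	/* word index of the W1C per-vector status register */
	unsigned nr_vectors;	/* number of valid vector bits (1..32) */
};

typedef void (*msi_dispatch_fn)(void *ctx, unsigned vec);

/* Stand-ins for MMIO accessors; W1C is modeled as "clear the written bits". */
static uint32_t reg_read(const uint32_t *regs, unsigned idx)
{
	return regs[idx];
}

static void reg_w1c(uint32_t *regs, unsigned idx, uint32_t mask)
{
	regs[idx] &= ~mask;
}

/* Common demux: re-read status until empty, ack each bit before
 * dispatching it. Returns the number of vectors dispatched. */
static unsigned msi_demux(uint32_t *regs, const struct msi_demux_desc *d,
			  msi_dispatch_fn fn, void *ctx)
{
	uint32_t valid, pending;
	unsigned handled = 0;

	assert(d->nr_vectors >= 1 && d->nr_vectors <= 32);
	valid = d->nr_vectors == 32 ? UINT32_MAX : (1u << d->nr_vectors) - 1;

	while ((pending = reg_read(regs, d->status_reg) & valid) != 0) {
		for (unsigned bit = 0; bit < d->nr_vectors; bit++) {
			if (!(pending & (1u << bit)))
				continue;
			reg_w1c(regs, d->status_reg, 1u << bit);	/* ack first */
			fn(ctx, bit);					/* then dispatch */
			handled++;
		}
	}
	return handled;
}

/* Two hypothetical controllers sharing the logic with different layouts. */
static const struct msi_demux_desc ctrl_a = { .status_reg = 2, .nr_vectors = 32 };
static const struct msi_demux_desc ctrl_b = { .status_reg = 5, .nr_vectors = 8 };

/* Example dispatch callback: count how often each vector fired. */
static void count_vec(void *ctx, unsigned vec)
{
	((unsigned *)ctx)[vec]++;
}
```

Each driver would then only supply its descriptor (and real MMIO accessors), while the ack-before-dispatch ordering lives in one place where it can be reviewed once.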
qizhong cheng Feb. 8, 2022, 7:08 a.m. UTC | #13
On Fri, 2022-01-28 at 15:58 +0800, Jianjun Wang wrote:
> Hi Bjorn,
> 
> On Thu, 2022-01-27 at 15:21 -0600, Bjorn Helgaas wrote:
> > [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
> > 
> > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > > be cleared before dispatch to the handler of device.
> > > > > 
> > > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > > the MSI interrupt status before dispatching the handler because
> > > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > > convincing because your code will now look like this:
> > > > > 
> > > > >   /* Clear the INTx */
> > > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > > >   ...
> > > > > 
> > > > >   /* Clear MSI interrupt status */
> > > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > > 
> > > > > You clear interrupt status before dispatching the handler for
> > > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > > interrupts.
> > > > > 
> > > > > So it doesn't seem that simply being edge-triggered is the
> > > > > critical factor here.
> > > > 
> > > > This is the usual problem with these half-baked implementations.
> > > > The signalling to the primary interrupt controller is level, as
> > > > they take a multitude of input and (crucially) latch the MSI
> > > > edges. Effectively, this is an edge-to-level converter, with all
> > > > the problems that this creates.
> > > > 
> > > > By clearing the status *after* the handling, you lose edges that
> > > > have been received and coalesced after the read of the status
> > > > register. By clearing it *before*, you are acknowledging the
> > > > interrupts early, and allowing them to be coalesced independently
> > > > of the ones that have been received earlier.
> > > > 
> > > > This is however mostly an educated guess. Someone with access to
> > > > the TRM should verify this.
> > > 
> > > Yes, as Maz said, we save the edge-interrupt status so that it
> > > becomes a level-interrupt. This is similar to an edge-to-level
> > > converter, so we need to clear it *before*. We found this problem
> > > through a lot of experiments and tested this patch.
> > 
> > I thought there might be other host controllers with similar design,
> > so I looked at all the other drivers and tried to figure out whether
> > any others had similar problems.
> > 
> > The ones below look suspicious to me because they all clear some sort
> > of status register *after* handling an MSI.  Can you guys take a look
> > and make sure they are working correctly?
> > 
> >   keembay_pcie_msi_irq_handler
> >     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> > 
> >   spear13xx_pcie_irq_handler
> >     status = readl(&app_reg->int_sts)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> > 	generic_handle_domain_irq
> >     writel(status, &app_reg->int_clr)
> > 
> >   advk_pcie_handle_int
> >     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
> >     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
> >       advk_pcie_handle_msi
> >         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> > 	advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> > 	generic_handle_irq
> > 	advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
> > 
> >   mtk_pcie_irq_handler
> >     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
> >     for_each_set_bit_from(irq_bit, &status, ...)
> >       mtk_pcie_msi_handler
> >         generic_handle_domain_irq
> >       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)
> 
> Thanks for mentioning that. In the hardware corresponding to
> pcie-mediatek-gen3.c, the interrupt status in PCIE_INT_STATUS_REG
> cannot be cleared while MSI status remains in the msi_set registers,
> so we have to clear it after handling the MSIs.
> 
> I guess the root cause addressed by this patch is that the interrupt
> status can be cleared even while MSI status is still pending. Hence,
> if some MSIs are received while the interrupt status is being
> cleared, those MSIs cannot be serviced.
> 
> We will discuss and test this internally and report the results later.
> Thanks for your review.
> 
> Thanks.
> 
> > 
> > Bjorn
> 
> 
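Jianjun's description of the gen3 hardware suggests why the two drivers need opposite orderings. The sketch below is a hypothetical user-space model built only from that description: a write that tries to clear the summary bit is ignored while any msi_set status bit is still latched. Names such as `struct gen3_model`, `gen3_clear_int()`, `gen3_handle()` and `late_vector_2()` are illustrative, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the gen3 behavior described above. */
struct gen3_model {
	uint32_t msi_set_status;	/* per-vector status, W1C */
	bool int_status;		/* summary bit in PCIE_INT_STATUS_REG */
};

static void gen3_send_msi(struct gen3_model *m, unsigned vec)
{
	assert(vec < 32);
	m->msi_set_status |= 1u << vec;
	m->int_status = true;
}

/* Assumed behavior: the clear only takes effect once no MSI status remains. */
static void gen3_clear_int(struct gen3_model *m)
{
	if (m->msi_set_status == 0)
		m->int_status = false;
}

typedef void (*race_hook)(struct gen3_model *m);

/* Drain and ack all vectors, then clear the summary bit (gen3 order).
 * late_msi, if non-NULL, injects an MSI right after the drain finishes. */
static unsigned gen3_handle(struct gen3_model *m, race_hook late_msi)
{
	unsigned handled = 0;

	while (m->msi_set_status) {
		for (unsigned bit = 0; bit < 32; bit++) {
			if (m->msi_set_status & (1u << bit)) {
				m->msi_set_status &= ~(1u << bit);	/* ack */
				handled++;				/* dispatch */
			}
		}
	}
	if (late_msi)
		late_msi(m);
	gen3_clear_int(m);	/* ignored if the late MSI is still latched */
	return handled;
}

static void late_vector_2(struct gen3_model *m)
{
	gen3_send_msi(m, 2);
}
```

Under this model a late MSI keeps the summary bit asserted, so clearing after the drain is safe on gen3. The pcie-mediatek.c hardware, by contrast, lets the clear succeed regardless of pending MSI status, which is the case the patch addresses.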

Sorry for the late reply, and thanks for your comments. I will update
the subject and expand the commit log in the next version.

The interrupt status can be cleared even while MSI status is still
pending. Because the MSI is edge-triggered, its interrupt status should
be cleared before dispatching the handler so that the next interrupt is
captured.

The MSI hardware block diagram is as follows:
      +-----+
      | GIC |
      +-----+
         ^
         |
  +-------------+
  | INT_STATUS  |
  +-------------+
         ^
         | (edge-triggered)
  +-------------+
  | MSI_STATUS  |
  +-------------+
         ^
         |
  +-------------+
  | EP send MSI |
  +-------------+

Thanks

Patch

diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
index 2f3f974977a3..705ea33758b1 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -624,12 +624,12 @@  static void mtk_pcie_intr_handler(struct irq_desc *desc)
 		if (status & MSI_STATUS){
 			unsigned long imsi_status;
 
+			/* Clear MSI interrupt status */
+			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
 			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
 				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
 					generic_handle_domain_irq(port->inner_domain, bit);
 			}
-			/* Clear MSI interrupt status */
-			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
 		}
 	}
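To check the reasoning behind the patch, the following user-space sketch models the block diagram from the discussion (a W1C per-vector IMSI status latch feeding a summary INT_STATUS bit that every incoming MSI sets) and injects one extra MSI at every possible point of the handler, for both the old and the patched ordering. The model, the `tick()` injection mechanism and all names are assumptions made for illustration; the real hardware may differ in details only the TRM can confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the MSI block diagram. */
struct sim {
	uint32_t imsi_status;	/* per-vector latch, write-1-to-clear */
	bool int_status;	/* summary bit driving the GIC (level) */
	int step;		/* current point in the handler */
	int inject_at;		/* step at which one extra MSI arrives */
};

/* Called before every register access; delivers the extra MSI (vector 5)
 * when the handler reaches the chosen step. */
static void tick(struct sim *s)
{
	if (s->step++ == s->inject_at) {
		s->imsi_status |= 1u << 5;
		s->int_status = true;	/* every MSI sets the summary bit */
	}
}

/* Read IMSI status until empty, acking each bit before "dispatching". */
static void drain(struct sim *s)
{
	uint32_t pending;

	for (;;) {
		tick(s);
		pending = s->imsi_status;
		if (!pending)
			break;
		for (unsigned bit = 0; bit < 32; bit++) {
			if (pending & (1u << bit)) {
				tick(s);
				s->imsi_status &= ~(1u << bit);	/* ack */
			}
		}
	}
}

/* Old order: drain, then clear MSI_STATUS in INT_STATUS. */
static void handler_old(struct sim *s)
{
	drain(s);
	tick(s);
	s->int_status = false;
	tick(s);
}

/* Patched order: clear first, then drain. */
static void handler_new(struct sim *s)
{
	tick(s);
	s->int_status = false;
	drain(s);
	tick(s);
}

/* Count injection points that leave an MSI latched with no interrupt
 * pending, i.e. an MSI that will never be serviced. */
static int count_lost(void (*handler)(struct sim *s), int max_steps)
{
	int lost = 0;

	assert(max_steps > 0);
	for (int k = 0; k < max_steps; k++) {
		struct sim s = { .imsi_status = 1u << 3, .int_status = true,
				 .step = 0, .inject_at = k };	/* vector 3 pending */

		handler(&s);
		if (s.imsi_status && !s.int_status)
			lost++;
	}
	return lost;
}
```

With vector 3 pending and one late MSI injected at each of the first 16 steps, the old order loses exactly one MSI (the one arriving after the final empty IMSI read but before the INT_STATUS write), while the patched order loses none, at the cost of an occasional spurious re-entry that finds nothing pending.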