Message ID | 20220123033306.29799-1-qizhong.cheng@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Series | PCI: mediatek: Change MSI interrupt processing sequence |
Hi,

On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng <qizhong.cheng@mediatek.com> wrote:
>
> As an edge-triggered interrupts, its interrupt status should be cleared
> before dispatch to the handler of device.

I'm curious, is this just a code correction or are there real world
cases where something fails?

Also, please add a Fixes tag and maybe Cc stable so this gets backported
automatically.

ChenYu

> Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> index 2f3f974977a3..705ea33758b1 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
>  		if (status & MSI_STATUS){
>  			unsigned long imsi_status;
>
> +			/* Clear MSI interrupt status */
> +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
>  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
>  					generic_handle_domain_irq(port->inner_domain, bit);
>  			}
> -			/* Clear MSI interrupt status */
> -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  		}
>  	}
>
> --
> 2.25.1
>
>
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek

Hi chenYu,

On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> Hi,
>
> On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> <qizhong.cheng@mediatek.com> wrote:
> >
> > As an edge-triggered interrupts, its interrupt status should be
> > cleared before dispatch to the handler of device.
>
> I'm curious, is this just a code correction or are there real world
> cases where something fails?

Yes, we found a failure when used iperf tool for wifi and network cards
performance testing. The function of "while" has just been executed,
and the EP sent an MSI before executing "Clear MSI interrupt status".
After executing "Clear MSI interrupt status", this edge-triggered
interrupt status is cleared, but EP is still waiting for interrupt
handler.

> Also, please add a Fixes tag and maybe Cc stable so this gets
> backported automatically.

Thanks for your review, I will fix it in the next version.

> ChenYu
>
> > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > ---
> >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > index 2f3f974977a3..705ea33758b1 100644
> > --- a/drivers/pci/controller/pcie-mediatek.c
> > +++ b/drivers/pci/controller/pcie-mediatek.c
> > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> >  		if (status & MSI_STATUS){
> >  			unsigned long imsi_status;
> >
> > +			/* Clear MSI interrupt status */
> > +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> >  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> >  					generic_handle_domain_irq(port->inner_domain, bit);
> >  			}
> > -			/* Clear MSI interrupt status */
> > -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >  		}
> >  	}
> >
> > --
> > 2.25.1

On Mon, Jan 24, 2022 at 2:27 PM qizhong.cheng <qizhong.cheng@mediatek.com> wrote:
>
> Hi chenYu,
>
> On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> > Hi,
> >
> > On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> > <qizhong.cheng@mediatek.com> wrote:
> > >
> > > As an edge-triggered interrupts, its interrupt status should be
> > > cleared before dispatch to the handler of device.
> >
> > I'm curious, is this just a code correction or are there real world
> > cases where something fails?
>
> Yes, we found a failure when used iperf tool for wifi and network cards
> performance testing. The function of "while" has just been executed,
> and the EP sent an MSI before executing "Clear MSI interrupt status".
> After executing "Clear MSI interrupt status", this edge-triggered
> interrupt status is cleared, but EP is still waiting for interrupt
> handler.

Can you also include this in the commit log? It would be nice to record
the exact scenario that this fix targets.

ChenYu

> > Also, please add a Fixes tag and maybe Cc stable so this gets
> > backported automatically.
>
> Thanks for your review, I will fix it in the next version.
>
> > ChenYu
> >
> > > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > > ---
> > >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > > index 2f3f974977a3..705ea33758b1 100644
> > > --- a/drivers/pci/controller/pcie-mediatek.c
> > > +++ b/drivers/pci/controller/pcie-mediatek.c
> > > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> > >  		if (status & MSI_STATUS){
> > >  			unsigned long imsi_status;
> > >
> > > +			/* Clear MSI interrupt status */
> > > +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> > >  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> > >  					generic_handle_domain_irq(port->inner_domain, bit);
> > >  			}
> > > -			/* Clear MSI interrupt status */
> > > -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >  		}
> > >  	}
> > >
> > > --
> > > 2.25.1

On Mon, 2022-01-24 at 14:55 +0800, Chen-Yu Tsai wrote:
> On Mon, Jan 24, 2022 at 2:27 PM qizhong.cheng
> <qizhong.cheng@mediatek.com> wrote:
> >
> > Hi chenYu,
> >
> > On Mon, 2022-01-24 at 11:12 +0800, Chen-Yu Tsai wrote:
> > > Hi,
> > >
> > > On Sun, Jan 23, 2022 at 11:34 AM qizhong cheng
> > > <qizhong.cheng@mediatek.com> wrote:
> > > >
> > > > As an edge-triggered interrupts, its interrupt status should be
> > > > cleared before dispatch to the handler of device.
> > >
> > > I'm curious, is this just a code correction or are there real
> > > world cases where something fails?
> >
> > Yes, we found a failure when used iperf tool for wifi and network
> > cards performance testing. The function of "while" has just been
> > executed, and the EP sent an MSI before executing "Clear MSI
> > interrupt status". After executing "Clear MSI interrupt status",
> > this edge-triggered interrupt status is cleared, but EP is still
> > waiting for interrupt handler.
>
> Can you also include this in the commit log? It would be nice to
> record the exact scenario that this fix targets.

Thanks for your suggestion. I will add commit log in the next version
for others review.

> ChenYu
>
> > > Also, please add a Fixes tag and maybe Cc stable so this gets
> > > backported automatically.
> >
> > Thanks for your review, I will fix it in the next version.
> >
> > > ChenYu
> > >
> > > > Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> > > > ---
> > > >  drivers/pci/controller/pcie-mediatek.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> > > > index 2f3f974977a3..705ea33758b1 100644
> > > > --- a/drivers/pci/controller/pcie-mediatek.c
> > > > +++ b/drivers/pci/controller/pcie-mediatek.c
> > > > @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
> > > >  		if (status & MSI_STATUS){
> > > >  			unsigned long imsi_status;
> > > >
> > > > +			/* Clear MSI interrupt status */
> > > > +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
> > > >  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
> > > >  					generic_handle_domain_irq(port->inner_domain, bit);
> > > >  			}
> > > > -			/* Clear MSI interrupt status */
> > > > -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >  		}
> > > >  	}
> > > >
> > > > --
> > > > 2.25.1

All patches change *something*. Can you update the subject line so it
says something specific about the change?

Maybe something like "Clear MSI status before dispatching handler"?

On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> As an edge-triggered interrupts, its interrupt status should be cleared
> before dispatch to the handler of device.

I'm not an IRQ expert, but the reasoning that "we should clear the MSI
interrupt status before dispatching the handler because MSI is an
edge-triggered interrupt" doesn't seem completely convincing because
your code will now look like this:

  /* Clear the INTx */
  writel(1 << bit, port->base + PCIE_INT_STATUS);
  generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
  ...

  /* Clear MSI interrupt status */
  writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
  generic_handle_domain_irq(port->inner_domain, bit);

You clear interrupt status before dispatching the handler for *both*
level-triggered INTx interrupts and edge-triggered MSI interrupts.

So it doesn't seem that simply being edge-triggered is the critical
factor here.

> Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
> ---
>  drivers/pci/controller/pcie-mediatek.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
> index 2f3f974977a3..705ea33758b1 100644
> --- a/drivers/pci/controller/pcie-mediatek.c
> +++ b/drivers/pci/controller/pcie-mediatek.c
> @@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
>  		if (status & MSI_STATUS){
>  			unsigned long imsi_status;
>
> +			/* Clear MSI interrupt status */
> +			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
>  				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
>  					generic_handle_domain_irq(port->inner_domain, bit);
>  			}
> -			/* Clear MSI interrupt status */
> -			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>  		}
>  	}
>
> --
> 2.25.1

On 2022-01-25 16:57, Bjorn Helgaas wrote:
> All patches change *something*. Can you update the subject line so it
> says something specific about the change?
>
> Maybe something like "Clear MSI status before dispatching handler"?
>
> On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
>> As an edge-triggered interrupts, its interrupt status should be
>> cleared before dispatch to the handler of device.
>
> I'm not an IRQ expert, but the reasoning that "we should clear the MSI
> interrupt status before dispatching the handler because MSI is an
> edge-triggered interrupt" doesn't seem completely convincing because
> your code will now look like this:
>
>   /* Clear the INTx */
>   writel(1 << bit, port->base + PCIE_INT_STATUS);
>   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
>   ...
>
>   /* Clear MSI interrupt status */
>   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
>   generic_handle_domain_irq(port->inner_domain, bit);
>
> You clear interrupt status before dispatching the handler for *both*
> level-triggered INTx interrupts and edge-triggered MSI interrupts.
>
> So it doesn't seem that simply being edge-triggered is the critical
> factor here.

This is the usual problem with these half-baked implementations.
The signalling to the primary interrupt controller is level, as
they take a multitude of input and (crucially) latch the MSI
edges. Effectively, this is an edge-to-level converter, with
all the problems that this creates.

By clearing the status *after* the handling, you lose edges that
have been received and coalesced after the read of the status
register. By clearing it *before*, you are acknowledging the
interrupts early, and allowing them to be coalesced independently
of the ones that have been received earlier.

This is however mostly an educated guess. Someone with access
to the TRM should verify this.

Thanks,

        M.

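The lost-edge window described above can be sketched as a small user-space toy model. This is only an illustration under assumptions: `struct toy_msi_hw` and every `toy_*` name are invented here, the single latched summary bit stands in for MSI_STATUS in PCIE_INT_STATUS, the pending mask stands in for PCIE_IMSI_STATUS, and the real controller's behaviour would still need to be checked against its TRM.

```c
#include <stdbool.h>

/*
 * Toy model only: the names are illustrative stand-ins, not driver code,
 * and the latch behaviour is an assumption taken from the discussion.
 */
struct toy_msi_hw {
	bool summary_latched;   /* models MSI_STATUS in PCIE_INT_STATUS */
	unsigned long pending;  /* models the per-vector PCIE_IMSI_STATUS */
};

/* An incoming MSI marks its vector pending and sets the latched summary bit. */
static void toy_msi_arrives(struct toy_msi_hw *hw, unsigned int vec)
{
	hw->pending |= 1UL << vec;
	hw->summary_latched = true;
}

/* Handle and acknowledge every pending vector, like the driver's while loop. */
static void toy_drain(struct toy_msi_hw *hw)
{
	while (hw->pending)
		hw->pending &= hw->pending - 1;  /* ack the lowest pending vector */
}

/* True if a vector is pending but nothing signals the CPU any more. */
static bool toy_edge_lost(const struct toy_msi_hw *hw)
{
	return hw->pending != 0 && !hw->summary_latched;
}

/* Old order: drain the vectors, then clear the summary bit. */
static bool toy_clear_after(void)
{
	struct toy_msi_hw hw = { 0 };

	toy_msi_arrives(&hw, 0);
	toy_drain(&hw);
	toy_msi_arrives(&hw, 1);     /* MSI lands between the drain and the clear */
	hw.summary_latched = false;  /* the late clear wipes the new edge */
	return toy_edge_lost(&hw);
}

/* New order: clear the summary bit, then drain the vectors. */
static bool toy_clear_before(void)
{
	struct toy_msi_hw hw = { 0 };

	toy_msi_arrives(&hw, 0);
	hw.summary_latched = false;  /* early clear */
	toy_drain(&hw);
	toy_msi_arrives(&hw, 1);     /* the same late MSI re-latches the summary bit */
	return toy_edge_lost(&hw);
}
```

In this model `toy_clear_after()` reports a stranded vector while `toy_clear_before()` does not, because the late MSI sets the summary bit again and the chained handler would simply run once more.
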
On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > All patches change *something*. Can you update the subject line so
> > it says something specific about the change?
> >
> > Maybe something like "Clear MSI status before dispatching handler"?
> >
> > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > As an edge-triggered interrupts, its interrupt status should be
> > > cleared before dispatch to the handler of device.
> >
> > I'm not an IRQ expert, but the reasoning that "we should clear the
> > MSI interrupt status before dispatching the handler because MSI is
> > an edge-triggered interrupt" doesn't seem completely convincing
> > because your code will now look like this:
> >
> >   /* Clear the INTx */
> >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> >   ...
> >
> >   /* Clear MSI interrupt status */
> >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> >   generic_handle_domain_irq(port->inner_domain, bit);
> >
> > You clear interrupt status before dispatching the handler for
> > *both* level-triggered INTx interrupts and edge-triggered MSI
> > interrupts.
> >
> > So it doesn't seem that simply being edge-triggered is the critical
> > factor here.
>
> This is the usual problem with these half-baked implementations.
> The signalling to the primary interrupt controller is level, as
> they take a multitude of input and (crucially) latch the MSI
> edges. Effectively, this is an edge-to-level converter, with
> all the problems that this creates.
>
> By clearing the status *after* the handling, you lose edges that
> have been received and coalesced after the read of the status
> register. By clearing it *before*, you are acknowledging the
> interrupts early, and allowing them to be coalesced independently
> of the ones that have been received earlier.
>
> This is however mostly an educated guess. Someone with access
> to the TRM should verify this.
>

Yes, as Maz said, we save the edge-interrupt status so that it becomes
a level-interrupt. This is similar to an edge-to-level converter, so we
need to clear it *before*. We found this problem through a lot of
experiments and tested this patch.

Thanks Helgaas and Maz for your comment.

--
Jazz ain't dead, dreams haven't parted with you.

[+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]

On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > As an edge-triggered interrupts, its interrupt status should
> > > > be cleared before dispatch to the handler of device.
> > >
> > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > the MSI interrupt status before dispatching the handler because
> > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > convincing because your code will now look like this:
> > >
> > >   /* Clear the INTx */
> > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > >   ...
> > >
> > >   /* Clear MSI interrupt status */
> > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > >   generic_handle_domain_irq(port->inner_domain, bit);
> > >
> > > You clear interrupt status before dispatching the handler for
> > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > interrupts.
> > >
> > > So it doesn't seem that simply being edge-triggered is the
> > > critical factor here.
> >
> > This is the usual problem with these half-baked implementations.
> > The signalling to the primary interrupt controller is level, as
> > they take a multitude of input and (crucially) latch the MSI
> > edges. Effectively, this is an edge-to-level converter, with all
> > the problems that this creates.
> >
> > By clearing the status *after* the handling, you lose edges that
> > have been received and coalesced after the read of the status
> > register. By clearing it *before*, you are acknowledging the
> > interrupts early, and allowing them to be coalesced independently
> > of the ones that have been received earlier.
> >
> > This is however mostly an educated guess. Someone with access to
> > the TRM should verify this.
>
> Yes, as Maz said, we save the edge-interrupt status so that it
> becomes a level-interrupt. This is similar to an edge-to-level
> converter, so we need to clear it *before*. We found this problem
> through a lot of experiments and tested this patch.

I thought there might be other host controllers with similar design,
so I looked at all the other drivers and tried to figure out whether
any others had similar problems.

The ones below look suspicious to me because they all clear some sort
of status register *after* handling an MSI. Can you guys take a look
and make sure they are working correctly?

  keembay_pcie_msi_irq_handler
    status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
    if (status & MSI_CTRL_INT)
      dw_handle_msi_irq
        generic_handle_domain_irq
      writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)

  spear13xx_pcie_irq_handler
    status = readl(&app_reg->int_sts)
    if (status & MSI_CTRL_INT)
      dw_handle_msi_irq
        generic_handle_domain_irq
    writel(status, &app_reg->int_clr)

  advk_pcie_handle_int
    isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
    if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
      advk_pcie_handle_msi
        advk_readl(pcie, PCIE_MSI_STATUS_REG)
        advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
        generic_handle_irq
        advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)

  mtk_pcie_irq_handler
    status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
    for_each_set_bit_from(irq_bit, &status, ...)
      mtk_pcie_msi_handler
        generic_handle_domain_irq
      writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Bjorn

Hi Bjorn,

On Thu, 2022-01-27 at 15:21 -0600, Bjorn Helgaas wrote:
> [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
>
> On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > be cleared before dispatch to the handler of device.
> > > >
> > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > the MSI interrupt status before dispatching the handler because
> > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > convincing because your code will now look like this:
> > > >
> > > >   /* Clear the INTx */
> > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > >   ...
> > > >
> > > >   /* Clear MSI interrupt status */
> > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > >
> > > > You clear interrupt status before dispatching the handler for
> > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > interrupts.
> > > >
> > > > So it doesn't seem that simply being edge-triggered is the
> > > > critical factor here.
> > >
> > > This is the usual problem with these half-baked implementations.
> > > The signalling to the primary interrupt controller is level, as
> > > they take a multitude of input and (crucially) latch the MSI
> > > edges. Effectively, this is an edge-to-level converter, with all
> > > the problems that this creates.
> > >
> > > By clearing the status *after* the handling, you lose edges that
> > > have been received and coalesced after the read of the status
> > > register. By clearing it *before*, you are acknowledging the
> > > interrupts early, and allowing them to be coalesced independently
> > > of the ones that have been received earlier.
> > >
> > > This is however mostly an educated guess. Someone with access to
> > > the TRM should verify this.
> >
> > Yes, as Maz said, we save the edge-interrupt status so that it
> > becomes a level-interrupt. This is similar to an edge-to-level
> > converter, so we need to clear it *before*. We found this problem
> > through a lot of experiments and tested this patch.
>
> I thought there might be other host controllers with similar design,
> so I looked at all the other drivers and tried to figure out whether
> any others had similar problems.
>
> The ones below look suspicious to me because they all clear some sort
> of status register *after* handling an MSI. Can you guys take a look
> and make sure they are working correctly?
>
>   keembay_pcie_msi_irq_handler
>     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
>         generic_handle_domain_irq
>       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>
>   spear13xx_pcie_irq_handler
>     status = readl(&app_reg->int_sts)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
>         generic_handle_domain_irq
>     writel(status, &app_reg->int_clr)
>
>   advk_pcie_handle_int
>     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
>     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
>       advk_pcie_handle_msi
>         advk_readl(pcie, PCIE_MSI_STATUS_REG)
>         advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
>         generic_handle_irq
>         advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
>
>   mtk_pcie_irq_handler
>     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
>     for_each_set_bit_from(irq_bit, &status, ...)
>       mtk_pcie_msi_handler
>         generic_handle_domain_irq
>       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Thanks for mention that.

In the hardware corresponding to pcie-mediatek-gen3.c, the interrupt
status in PCIE_INT_STATUS_REG cannot be cleared if the MSI status
remaining in the register of msi_set, so we have to clear it after
handling the MSI. I guess the root cause of this patch is the interrupt
status can be cleared even the MSI status still remaining, hence that
if there are some MSIs received while clearing the interrupt status,
these MSIs cannot be serviced.

We will discuss and test internally and update the results later,
thanks for your review.

Thanks.

>
> Bjorn

On Thu, 27 Jan 2022 21:21:00 +0000,
Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
>
> On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > be cleared before dispatch to the handler of device.
> > > >
> > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > the MSI interrupt status before dispatching the handler because
> > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > convincing because your code will now look like this:
> > > >
> > > >   /* Clear the INTx */
> > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > >   ...
> > > >
> > > >   /* Clear MSI interrupt status */
> > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > >
> > > > You clear interrupt status before dispatching the handler for
> > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > interrupts.
> > > >
> > > > So it doesn't seem that simply being edge-triggered is the
> > > > critical factor here.
> > >
> > > This is the usual problem with these half-baked implementations.
> > > The signalling to the primary interrupt controller is level, as
> > > they take a multitude of input and (crucially) latch the MSI
> > > edges. Effectively, this is an edge-to-level converter, with all
> > > the problems that this creates.
> > >
> > > By clearing the status *after* the handling, you lose edges that
> > > have been received and coalesced after the read of the status
> > > register. By clearing it *before*, you are acknowledging the
> > > interrupts early, and allowing them to be coalesced independently
> > > of the ones that have been received earlier.
> > >
> > > This is however mostly an educated guess. Someone with access to
> > > the TRM should verify this.
> >
> > Yes, as Maz said, we save the edge-interrupt status so that it
> > becomes a level-interrupt. This is similar to an edge-to-level
> > converter, so we need to clear it *before*. We found this problem
> > through a lot of experiments and tested this patch.
>
> I thought there might be other host controllers with similar design,
> so I looked at all the other drivers and tried to figure out whether
> any others had similar problems.
>
> The ones below look suspicious to me because they all clear some sort
> of status register *after* handling an MSI. Can you guys take a look
> and make sure they are working correctly?
>
>   keembay_pcie_msi_irq_handler
>     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
>         generic_handle_domain_irq
>       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
>
>   spear13xx_pcie_irq_handler
>     status = readl(&app_reg->int_sts)
>     if (status & MSI_CTRL_INT)
>       dw_handle_msi_irq
>         generic_handle_domain_irq
>     writel(status, &app_reg->int_clr)

I think these two are fine.

The top level interrupt is only a level signal that there is something
to process. The only thing that is unclear is what the effect of
writing to that status register if MSIs are pending at that point. A
sane implementation would just ignore the write.

The actual processing is done in dw_handle_msi_irq(), reading the
PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack
the interrupt, one bit at a time, as interrupts are handled (see
dw_pci_bottom_ack). Ack taking place before the handling, it makes it
safe for edge delivery.

>
>   advk_pcie_handle_int
>     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
>     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
>       advk_pcie_handle_msi
>         advk_readl(pcie, PCIE_MSI_STATUS_REG)
>         advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
>         generic_handle_irq
>         advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)

Same thing, I guess. It is just that the Ack has been open-coded.

>
>   mtk_pcie_irq_handler
>     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
>     for_each_set_bit_from(irq_bit, &status, ...)
>       mtk_pcie_msi_handler
>         generic_handle_domain_irq
>       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)

Similar thing. The PCIE_MSI_SET_STATUS register is read first, and
then written back in the ack callback.

        M.

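The "Ack before handling" ordering described above can be illustrated with a small user-space sketch. Everything here is hypothetical: `toy_status` stands in for a per-vector status register whose bits are cleared by the ack, and the `toy_*` helpers are invented for illustration, not taken from any driver.

```c
#define TOY_NR_VECTORS 32u

static unsigned int toy_status;                /* hypothetical per-vector status bits */
static unsigned int toy_handled[TOY_NR_VECTORS];
static int toy_reraise_once = 1;

/* A device signals vector 'vec': the status bit is latched. */
static void toy_raise(unsigned int vec)
{
	toy_status |= 1u << vec;
}

/* Acknowledge one vector, modelling a write-1-to-clear of its status bit. */
static void toy_ack(unsigned int vec)
{
	toy_status &= ~(1u << vec);
}

/* Stand-in for the device's handler. */
static void toy_device_handler(unsigned int vec)
{
	toy_handled[vec]++;
	/* Simulate the device signalling again while its handler still runs. */
	if (vec == 3 && toy_reraise_once) {
		toy_reraise_once = 0;
		toy_raise(3);
	}
}

/* Demultiplex: ack each vector first, then run its handler. */
static void toy_demux(void)
{
	unsigned int snapshot;

	while ((snapshot = toy_status) != 0) {
		for (unsigned int vec = 0; vec < TOY_NR_VECTORS; vec++) {
			if (!(snapshot & (1u << vec)))
				continue;
			toy_ack(vec);             /* ack before handling */
			toy_device_handler(vec);  /* a new edge can latch again here */
		}
	}
}

/* Raise vector 3 once and return how many times its handler ran. */
static unsigned int toy_run(void)
{
	toy_raise(3);
	toy_demux();
	return toy_handled[3];
}
```

Because the bit is cleared before the handler runs, the edge raised from inside the handler is latched again and picked up on the next pass, so the handler runs twice. With the two calls swapped, the ack would wipe that second edge.
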
On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote:
> On Thu, 27 Jan 2022 21:21:00 +0000,
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > > be cleared before dispatch to the handler of device.
> > > > >
> > > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > > the MSI interrupt status before dispatching the handler because
> > > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > > convincing because your code will now look like this:
> > > > >
> > > > >   /* Clear the INTx */
> > > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > > >   ...
> > > > >
> > > > >   /* Clear MSI interrupt status */
> > > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > >
> > > > > You clear interrupt status before dispatching the handler for
> > > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > > interrupts.
> > > > >
> > > > > So it doesn't seem that simply being edge-triggered is the
> > > > > critical factor here.
> > > >
> > > > This is the usual problem with these half-baked implementations.
> > > > The signalling to the primary interrupt controller is level, as
> > > > they take a multitude of input and (crucially) latch the MSI
> > > > edges. Effectively, this is an edge-to-level converter, with all
> > > > the problems that this creates.
> > > >
> > > > By clearing the status *after* the handling, you lose edges that
> > > > have been received and coalesced after the read of the status
> > > > register. By clearing it *before*, you are acknowledging the
> > > > interrupts early, and allowing them to be coalesced independently
> > > > of the ones that have been received earlier.
> > > >
> > > > This is however mostly an educated guess. Someone with access to
> > > > the TRM should verify this.
> > >
> > > Yes, as Maz said, we save the edge-interrupt status so that it
> > > becomes a level-interrupt. This is similar to an edge-to-level
> > > converter, so we need to clear it *before*. We found this problem
> > > through a lot of experiments and tested this patch.
> >
> > I thought there might be other host controllers with similar design,
> > so I looked at all the other drivers and tried to figure out whether
> > any others had similar problems.
> >
> > The ones below look suspicious to me because they all clear some sort
> > of status register *after* handling an MSI. Can you guys take a look
> > and make sure they are working correctly?
> >
> >   keembay_pcie_msi_irq_handler
> >     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> >         generic_handle_domain_irq
> >       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >
> >   spear13xx_pcie_irq_handler
> >     status = readl(&app_reg->int_sts)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> >         generic_handle_domain_irq
> >     writel(status, &app_reg->int_clr)
>
> I think these two are fine.
>
> The top level interrupt is only a level signal that there is something
> to process. The only thing that is unclear is what the effect of
> writing to that status register if MSIs are pending at that point. A
> sane implementation would just ignore the write.
>
> The actual processing is done in dw_handle_msi_irq(), reading the
> PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack
> the interrupt, one bit at a time, as interrupts are handled (see
> dw_pci_bottom_ack). Ack taking place before the handling, it makes it
> safe for edge delivery.
>
> >   advk_pcie_handle_int
> >     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
> >     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
> >       advk_pcie_handle_msi
> >         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> >         advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> >         generic_handle_irq
> >         advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
>
> Same thing, I guess. It is just that the Ack has been open-coded.
>
> >   mtk_pcie_irq_handler
> >     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
> >     for_each_set_bit_from(irq_bit, &status, ...)
> >       mtk_pcie_msi_handler
> >         generic_handle_domain_irq
> >       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)
>
> Similar thing. The PCIE_MSI_SET_STATUS register is read first, and
> then written back in the ack callback.

Thanks a lot for taking a look at these, Marc! Is there anything we
can do to make all these drivers/pci/controller/* drivers more
consistent and easier to review? I found it very difficult to look
across all of them and find similar design patterns.

Bjorn

On Fri, 28 Jan 2022 13:12:50 +0000,
Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote:
> > On Thu, 27 Jan 2022 21:21:00 +0000,
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Thanks a lot for taking a look at these, Marc! Is there anything we
> can do to make all these drivers/pci/controller/* drivers more
> consistent and easier to review? I found it very difficult to look
> across all of them and find similar design patterns.

It looks to me that a number of them are just wrapping the same
underlying IP block, most likely the DW controller (this looks to be
the case for at least the first two).

They probably all use different register and bit offsets, but it
should be possible to write a library abstracting all these details
and have a common handling for most of them. This would certainly go a
long way in making things more solid.

M.
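To make the library idea above a bit more concrete, here is a rough user-space sketch of what a shared MSI demultiplexing helper could look like. Everything in it is hypothetical and for illustration only: `struct msi_chip_desc`, `struct msi_chip`, `msi_common_handle()` and `record_vector()` are made-up names, not existing kernel APIs, and the `regs[]` array merely stands in for MMIO. The one point it tries to show is that each driver would supply only its register layout, while the common code always acks a vector before dispatching it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_REGS 8	/* size of the fake register file (assumption) */

/* Hypothetical per-driver description: only the layout differs. */
struct msi_chip_desc {
	unsigned int status_reg;	/* index of the per-vector status register */
	unsigned int nr_vectors;	/* number of MSI vectors, at most 32 */
};

/* Hypothetical controller instance; regs[] stands in for MMIO. */
struct msi_chip {
	const struct msi_chip_desc *desc;
	uint32_t regs[SKETCH_MAX_REGS];
	void (*dispatch)(unsigned int vec, void *data);
	void *data;
};

static uint32_t chip_read(const struct msi_chip *c, unsigned int reg)
{
	assert(reg < SKETCH_MAX_REGS);
	return c->regs[reg];
}

/* Models a write-1-to-clear status register. */
static void chip_ack(struct msi_chip *c, unsigned int reg, uint32_t bits)
{
	assert(reg < SKETCH_MAX_REGS);
	c->regs[reg] &= ~bits;
}

/* Common handling: ack each vector *before* dispatching it. */
static unsigned int msi_common_handle(struct msi_chip *c)
{
	const struct msi_chip_desc *d = c->desc;
	uint32_t mask = d->nr_vectors >= 32 ? ~0u : (1u << d->nr_vectors) - 1;
	unsigned int handled = 0;
	uint32_t pending;

	while ((pending = chip_read(c, d->status_reg) & mask) != 0) {
		for (unsigned int vec = 0; vec < d->nr_vectors && vec < 32; vec++) {
			if (!(pending & (1u << vec)))
				continue;
			chip_ack(c, d->status_reg, 1u << vec);
			c->dispatch(vec, c->data);
			handled++;
		}
	}
	return handled;
}

/* Example callback: remember which vectors were dispatched. */
static void record_vector(unsigned int vec, void *data)
{
	*(uint32_t *)data |= 1u << vec;
}
```

A driver would then only fill in a `msi_chip_desc` with its own offsets. Whether a real abstraction could be this simple depends on how much the underlying IP blocks actually differ, which this sketch does not attempt to answer.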
On Fri, 2022-01-28 at 15:58 +0800, Jianjun Wang wrote:
> Hi Bjorn,
>
> On Thu, 2022-01-27 at 15:21 -0600, Bjorn Helgaas wrote:
> > [+cc Srikanth, Pratyush, Thomas, Pali, Ryder, Jianjun]
> >
> > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote:
> > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote:
> > > > On 2022-01-25 16:57, Bjorn Helgaas wrote:
> > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote:
> > > > > > As an edge-triggered interrupts, its interrupt status should
> > > > > > be cleared before dispatch to the handler of device.
> > > > >
> > > > > I'm not an IRQ expert, but the reasoning that "we should clear
> > > > > the MSI interrupt status before dispatching the handler because
> > > > > MSI is an edge-triggered interrupt" doesn't seem completely
> > > > > convincing because your code will now look like this:
> > > > >
> > > > >   /* Clear the INTx */
> > > > >   writel(1 << bit, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT);
> > > > >   ...
> > > > >
> > > > >   /* Clear MSI interrupt status */
> > > > >   writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
> > > > >   generic_handle_domain_irq(port->inner_domain, bit);
> > > > >
> > > > > You clear interrupt status before dispatching the handler for
> > > > > *both* level-triggered INTx interrupts and edge-triggered MSI
> > > > > interrupts.
> > > > >
> > > > > So it doesn't seem that simply being edge-triggered is the
> > > > > critical factor here.
> > > >
> > > > This is the usual problem with these half-baked implementations.
> > > > The signalling to the primary interrupt controller is level, as
> > > > they take a multitude of input and (crucially) latch the MSI
> > > > edges. Effectively, this is an edge-to-level converter, with all
> > > > the problems that this creates.
> > > >
> > > > By clearing the status *after* the handling, you lose edges that
> > > > have been received and coalesced after the read of the status
> > > > register. By clearing it *before*, you are acknowledging the
> > > > interrupts early, and allowing them to be coalesced independently
> > > > of the ones that have been received earlier.
> > > >
> > > > This is however mostly an educated guess. Someone with access to
> > > > the TRM should verify this.
> > >
> > > Yes, as Maz said, we save the edge-interrupt status so that it
> > > becomes a level-interrupt. This is similar to an edge-to-level
> > > converter, so we need to clear it *before*. We found this problem
> > > through a lot of experiments and tested this patch.
> >
> > I thought there might be other host controllers with similar design,
> > so I looked at all the other drivers and tried to figure out whether
> > any others had similar problems.
> >
> > The ones below look suspicious to me because they all clear some sort
> > of status register *after* handling an MSI. Can you guys take a look
> > and make sure they are working correctly?
> >
> >   keembay_pcie_msi_irq_handler
> >     status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> >         generic_handle_domain_irq
> >       writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS)
> >
> >   spear13xx_pcie_irq_handler
> >     status = readl(&app_reg->int_sts)
> >     if (status & MSI_CTRL_INT)
> >       dw_handle_msi_irq
> >         generic_handle_domain_irq
> >       writel(status, &app_reg->int_clr)
> >
> >   advk_pcie_handle_int
> >     isr0_status = advk_readl(pcie, PCIE_ISR0_REG)
> >     if (isr0_status & PCIE_ISR0_MSI_INT_PENDING)
> >       advk_pcie_handle_msi
> >         advk_readl(pcie, PCIE_MSI_STATUS_REG)
> >         advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG)
> >         generic_handle_irq
> >       advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG)
> >
> >   mtk_pcie_irq_handler
> >     status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG)
> >     for_each_set_bit_from(irq_bit, &status, ...)
> >       mtk_pcie_msi_handler
> >         generic_handle_domain_irq
> >       writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG)
>
> Thanks for mentioning that. In the hardware corresponding to
> pcie-mediatek-gen3.c, the interrupt status in PCIE_INT_STATUS_REG
> cannot be cleared if MSI status is still remaining in the register of
> msi_set, so we have to clear it after handling the MSI.
>
> I guess the root cause behind this patch is that the interrupt status
> can be cleared even if MSI status is still remaining, so if some MSIs
> are received while the interrupt status is being cleared, these MSIs
> cannot be serviced.
>
> We will discuss and test internally and update the results later,
> thanks for your review.
>
> Thanks.
> >
> > Bjorn
> >

Sorry for the late reply. Thanks for your comment. I will update the
subject and add a commit log in the next version.

The interrupt status can be cleared even if MSI status is still
remaining. Because this is an edge-triggered interrupt, its interrupt
status should be cleared before dispatching the handler, so that the
next interrupt can be captured.
The design of MSI hardware block diagram is as follows:

          +-----+
          | GIC |
          +-----+
             ^
             |
    +-----------------+
    |   INT_STATUS    |
    +-----------------+
             ^
             |  (edge-triggered)
    +-----------------+
    |   MSI_STATUS    |
    +-----------------+
             ^
             |
    +-----------------+
    |   EP send MSI   |
    +-----------------+

Thanks
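The race described in this thread can be reproduced with a small user-space model of the diagram above. This is only a sketch under stated assumptions: INT_STATUS is modelled as a summary bit that is latched whenever an MSI arrives and that software can clear even while per-vector status is pending, and the per-vector bits are acked one at a time during the drain loop. The names `msi_model`, `ep_send_msi()`, `run_handler()` and `msi_is_stranded()` are invented for this illustration and do not exist in the driver; `late_vec` injects an MSI at the point the report describes, right after the drain loop has exited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two-stage latch (assumed behaviour, not from the TRM). */
struct msi_model {
	uint32_t imsi_status;	/* per-vector pending bits (MSI_STATUS) */
	bool int_status_msi;	/* latched summary bit seen by the GIC */
	unsigned int handled;	/* number of vectors dispatched so far */
};

enum clear_order { CLEAR_BEFORE_DRAIN, CLEAR_AFTER_DRAIN };

/* The endpoint sends an MSI: set the vector bit and latch the summary. */
static void ep_send_msi(struct msi_model *m, unsigned int vec)
{
	assert(vec < 32);
	m->imsi_status |= 1u << vec;
	m->int_status_msi = true;
}

/* Drain every pending vector, acking each one as it is handled. */
static void drain_pending(struct msi_model *m)
{
	while (m->imsi_status) {
		for (unsigned int vec = 0; vec < 32; vec++) {
			if (m->imsi_status & (1u << vec)) {
				m->imsi_status &= ~(1u << vec);
				m->handled++;
			}
		}
	}
}

/*
 * One pass of the chained handler. If late_vec >= 0, an MSI arrives
 * right after the drain loop has seen an empty status register.
 */
static void run_handler(struct msi_model *m, enum clear_order order,
			int late_vec)
{
	if (order == CLEAR_BEFORE_DRAIN)
		m->int_status_msi = false;

	drain_pending(m);

	if (late_vec >= 0)
		ep_send_msi(m, (unsigned int)late_vec);

	if (order == CLEAR_AFTER_DRAIN)
		m->int_status_msi = false;	/* wipes the new edge too */
}

/* A vector is pending but nothing will signal the GIC again. */
static bool msi_is_stranded(const struct msi_model *m)
{
	return m->imsi_status != 0 && !m->int_status_msi;
}
```

In this model, clearing after the drain leaves the late vector pending with the summary bit cleared, so the endpoint would wait indefinitely. Clearing before the drain keeps the summary bit set for the late MSI, and a second pass of the handler services it.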
diff --git a/drivers/pci/controller/pcie-mediatek.c b/drivers/pci/controller/pcie-mediatek.c
index 2f3f974977a3..705ea33758b1 100644
--- a/drivers/pci/controller/pcie-mediatek.c
+++ b/drivers/pci/controller/pcie-mediatek.c
@@ -624,12 +624,12 @@ static void mtk_pcie_intr_handler(struct irq_desc *desc)
 		if (status & MSI_STATUS){
 			unsigned long imsi_status;
 
+			/* Clear MSI interrupt status */
+			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
 			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
 				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM)
 					generic_handle_domain_irq(port->inner_domain, bit);
 			}
-			/* Clear MSI interrupt status */
-			writel(MSI_STATUS, port->base + PCIE_INT_STATUS);
 		}
 	}
As MSI is an edge-triggered interrupt, its interrupt status should be
cleared before dispatching to the handler of the device.

Signed-off-by: qizhong cheng <qizhong.cheng@mediatek.com>
---
 drivers/pci/controller/pcie-mediatek.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)