Message ID | 20221125-mtk-iommu-v1-0-bb5ecac97a28@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Series | iommu/mediatek: Fix crash on isr after kexec() |
On 2022-11-25 16:28, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be triggered
> before the domain is initialized, resulting in an invalid memory access
> error.
>
> Fix:
> [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> [ 0.501166] Call trace:
> [ 0.501174] report_iommu_fault+0x28/0xfc
> [ 0.501180] mtk_iommu_isr+0x10c/0x1c0

Hmm, shouldn't we clear any pending faults at probe in
mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might
still want to be robust against a spurious interrupt, but then it can
simply return without doing anything at all if the domain is NULL, since
we'll know that's the case.

Thanks,
Robin.

(It might be nice if request_irq() had a flag to say "if this IRQ looks
pending already just clear it" for drivers that know it could only be
spurious at that point; kexec seems to lead to this problem quite a lot...)

> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
> To: Yong Wu <yong.wu@mediatek.com>
> To: Joerg Roedel <joro@8bytes.org>
> To: Will Deacon <will@kernel.org>
> To: Robin Murphy <robin.murphy@arm.com>
> To: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: iommu@lists.linux.dev
> Cc: linux-mediatek@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>  		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
>  	}
>
> -	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
>  			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
>  		dev_err_ratelimited(
>  			bank->parent_dev,
>
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
>
> Best regards,
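
Editorial sketch of the review suggestion above: acknowledge anything left pending before the handler is installed, and have the handler bail out early when no domain exists yet. This is a userspace model, not driver code. Every `toy_*` name is hypothetical, and the choice of return value for the early exit is an assumption, since the review only says the handler can "return without doing anything".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real mtk_iommu types. */
struct toy_domain {
	unsigned int faults_reported;
};

struct toy_bank {
	bool irq_pending;        /* models a latched fault interrupt */
	struct toy_domain *dom;  /* NULL until a domain has been attached */
};

enum toy_irqreturn { TOY_IRQ_NONE, TOY_IRQ_HANDLED };

/* Probe-time step: drop whatever the previous kernel left pending. */
static void toy_hw_init(struct toy_bank *bank)
{
	assert(bank != NULL);
	bank->irq_pending = false;
}

/* Handler that tolerates firing before a domain exists. */
static enum toy_irqreturn toy_isr(struct toy_bank *bank)
{
	assert(bank != NULL);

	if (!bank->dom)
		return TOY_IRQ_NONE; /* assumption: treat as spurious, touch nothing */

	bank->dom->faults_reported++;
	bank->irq_pending = false;
	return TOY_IRQ_HANDLED;
}
```

Calling `toy_isr()` on a bank whose `dom` is still NULL returns without dereferencing anything; once a domain is attached, the same call counts the fault.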
Hi Robin

Thanks for your review!

On Fri, 25 Nov 2022 at 18:02, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2022-11-25 16:28, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be triggered
> > before the domain is initialized, resulting in an invalid memory access
> > error.
> >
> > Fix:
> > [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> > [ 0.501166] Call trace:
> > [ 0.501174] report_iommu_fault+0x28/0xfc
> > [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
>
> Hmm, shouldn't we clear any pending faults at probe in
> mtk_iommu_hw_init(), before the IRQ is requested? mtk_iommu_isr() might
> still want to be robust against a spurious interrupt, but then it can
> simply return without doing anything at all if the domain is NULL, since
> we'll know that's the case.
>
> Thanks,
> Robin.
>
> (It might be nice if request_irq() had a flag to say "if this IRQ looks
> pending already just clear it" for drivers that know it could only be
> spurious at that point; kexec seems to lead to this problem quite a lot...)

It is not only about the "last" IRQ before kexec. The peripherals under
the IOMMU might still be active and producing faults, and therefore IRQs.

I tried this:

@@ -886,6 +886,11 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban
 			 upper_32_bits(data->protect_base);
 	writel_relaxed(regval, bankx->base + REG_MMU_IVRP_PADDR);

+	/* Clear previous IRQs */
+	regval = readl_relaxed(bankx->base + REG_MMU_INT_CONTROL0);
+	regval |= F_INT_CLR_BIT;
+	writel_relaxed(regval, bankx->base + REG_MMU_INT_CONTROL0);
+
 	if (devm_request_irq(bankx->pdev, bankx->irq, mtk_iommu_isr, 0,
 			     dev_name(bankx->pdev), (void *)bankx)) {
 		writel_relaxed(0, bankx->base + REG_MMU_PT_BASE_ADDR);

And I still get the same crash.

> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> > To: Yong Wu <yong.wu@mediatek.com>
> > To: Joerg Roedel <joro@8bytes.org>
> > To: Will Deacon <will@kernel.org>
> > To: Robin Murphy <robin.murphy@arm.com>
> > To: Matthias Brugger <matthias.bgg@gmail.com>
> > Cc: iommu@lists.linux.dev
> > Cc: linux-mediatek@lists.infradead.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  drivers/iommu/mtk_iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> >  		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> >  	}
> >
> > -	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> >  			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> >  		dev_err_ratelimited(
> >  			bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,
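
Editorial sketch of why a one-time clear at probe may not be enough, following the explanation in the reply above: a peripheral that the previous kernel left running can raise a fresh fault after the clear and before a domain exists. This is a userspace model under simplifying assumptions. The `sim_*` names are hypothetical, and the rule "any DMA without a domain faults" is an illustration, not a statement about the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; not the real driver or register layout. */
struct sim_iommu {
	bool fault_pending;  /* latched fault interrupt */
	bool domain_ready;   /* true once a domain has been attached */
};

/* Models the "Clear previous IRQs" step tried at hw_init time. */
static void sim_clear_pending(struct sim_iommu *mmu)
{
	assert(mmu != NULL);
	mmu->fault_pending = false;
}

/* Models a still-active peripheral issuing DMA through the IOMMU. */
static void sim_device_dma(struct sim_iommu *mmu)
{
	assert(mmu != NULL);
	if (!mmu->domain_ready)
		mmu->fault_pending = true; /* assumption: untranslated access faults */
}

/* True when the handler would run while the domain is still missing. */
static bool sim_isr_sees_null_domain(const struct sim_iommu *mmu)
{
	assert(mmu != NULL);
	return mmu->fault_pending && !mmu->domain_ready;
}
```

In this model, clearing removes the stale fault, but a single `sim_device_dma()` call before the domain is ready puts the handler back in the NULL-domain situation, which matches the observation that the crash persisted.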
On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> If the system is rebooted via kexec(), the IRQ handler might be triggered
> before the domain is initialized, resulting in an invalid memory access
> error.
>
> Fix:
> [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> [ 0.501166] Call trace:
> [ 0.501174] report_iommu_fault+0x28/0xfc
> [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
>
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
> To: Yong Wu <yong.wu@mediatek.com>
> To: Joerg Roedel <joro@8bytes.org>
> To: Will Deacon <will@kernel.org>
> To: Robin Murphy <robin.murphy@arm.com>
> To: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: iommu@lists.linux.dev
> Cc: linux-mediatek@lists.infradead.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 2ab2ecfe01f8..17f6be5a5097 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>  		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
>  	}
>
> -	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,

Which SoC does this issue happen on? Does it happen with the upstream
kernel or with a downstream kernel?

Normally each port enables the IOMMU by default. Let's print the error
log even when "dom" is NULL, to check which port fails here, and then
analyse that port's behavior.

	if (!dom || report_iommu_fault(xx))
		dev_err_ratelimited(xx)

>  			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
>  		dev_err_ratelimited(
>  			bank->parent_dev,
>
> ---
> base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> change-id: 20221125-mtk-iommu-13023f971298
>
> Best regards,
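
Editorial sketch comparing the two guards discussed in this message. Both rely on C short-circuit evaluation so the report function is never called with a NULL domain. They differ in whether the error log is emitted in that case. This is a userspace model: `toy_report_fault()` and the `handler_ret` field are hypothetical, and the only behavior assumed of the real `report_iommu_fault()` is that a nonzero return leads to the log, as the condition in the patch implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical domain; handler_ret is the value the fault report returns. */
struct toy_domain {
	int handler_ret;
};

/* Stand-in for the fault report; the assert models the NULL dereference crash. */
static int toy_report_fault(const struct toy_domain *dom)
{
	assert(dom != NULL);
	return dom->handler_ret;
}

/* v1 patch: if (dom && report(...)) log; a NULL domain is skipped silently. */
static bool toy_logs_with_v1_guard(const struct toy_domain *dom)
{
	return dom && toy_report_fault(dom) != 0;
}

/* Suggested form: if (!dom || report(...)) log; a NULL domain still logs. */
static bool toy_logs_with_suggested_guard(const struct toy_domain *dom)
{
	return !dom || toy_report_fault(dom) != 0;
}
```

With a NULL domain, the first guard evaluates to false and the second to true, and neither reaches `toy_report_fault()`. With a valid domain, both guards agree.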
Hi Yong

On Mon, 28 Nov 2022 at 07:44, Yong Wu (吴勇) <Yong.Wu@mediatek.com> wrote:
>
> On Fri, 2022-11-25 at 17:28 +0100, Ricardo Ribalda wrote:
> > If the system is rebooted via kexec(), the IRQ handler might be triggered
> > before the domain is initialized, resulting in an invalid memory access
> > error.
> >
> > Fix:
> > [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
> > [ 0.501166] Call trace:
> > [ 0.501174] report_iommu_fault+0x28/0xfc
> > [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
> >
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> > To: Yong Wu <yong.wu@mediatek.com>
> > To: Joerg Roedel <joro@8bytes.org>
> > To: Will Deacon <will@kernel.org>
> > To: Robin Murphy <robin.murphy@arm.com>
> > To: Matthias Brugger <matthias.bgg@gmail.com>
> > Cc: iommu@lists.linux.dev
> > Cc: linux-mediatek@lists.infradead.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  drivers/iommu/mtk_iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 2ab2ecfe01f8..17f6be5a5097 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
> >  		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
> >  	}
> >
> > -	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
> > +	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
>
> Which SoC does this issue happen on? Does it happen with the upstream
> kernel or with a downstream kernel?

I am using chromeos-5.10 and chromeos-5.15 (which are pretty much
upstream). I have seen this issue at least with MT8195 and MT8183.

>
> Normally each port enables the IOMMU by default. Let's print the error
> log even when "dom" is NULL, to check which port fails here, and then
> analyse that port's behavior.
>
> 	if (!dom || report_iommu_fault(xx))
> 		dev_err_ratelimited(xx)

Sending a v2 with the change. Thanks!

> >  			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
> >  		dev_err_ratelimited(
> >  			bank->parent_dev,
> >
> > ---
> > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > change-id: 20221125-mtk-iommu-13023f971298
> >
> > Best regards,
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2ab2ecfe01f8..17f6be5a5097 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -454,7 +454,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 		fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 	}
 
-	if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
+	if (dom && report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova,
 			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
 		dev_err_ratelimited(
 			bank->parent_dev,
If the system is rebooted via kexec(), the IRQ handler might be triggered
before the domain is initialized, resulting in an invalid memory access
error.

Fix:
[ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070
[ 0.501166] Call trace:
[ 0.501174] report_iommu_fault+0x28/0xfc
[ 0.501180] mtk_iommu_isr+0x10c/0x1c0

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
To: Yong Wu <yong.wu@mediatek.com>
To: Joerg Roedel <joro@8bytes.org>
To: Will Deacon <will@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
To: Matthias Brugger <matthias.bgg@gmail.com>
Cc: iommu@lists.linux.dev
Cc: linux-mediatek@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221125-mtk-iommu-13023f971298

Best regards,
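
Editorial sketch of why a NULL domain pointer can show up in the log as a small nonzero address such as 0x70 rather than as address 0. Taking `&dom->domain` with `dom == NULL` yields the member offset, and a read of some field through that pointer adds a second offset. The thread does not state which field is read or what the real structure layouts are, so the `toy_*` structs and their padding sizes below are invented purely so that the arithmetic lands on 0x70.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy layouts; the padding sizes are made up for illustration only. */
struct toy_iommu_domain {
	char other_fields[0x20];
	void *handler;  /* assumption: some field read by the fault report */
};

struct toy_mtk_domain {
	char other_fields[0x50];
	struct toy_iommu_domain domain;
};

/* Documents the invented layout the arithmetic below depends on. */
static_assert(offsetof(struct toy_mtk_domain, domain) == 0x50,
	      "toy layout assumption");

/* Address that would be read if the outer pointer were NULL. */
static uintptr_t toy_null_deref_address(void)
{
	uintptr_t base = 0; /* dom == NULL */

	return base + offsetof(struct toy_mtk_domain, domain)
		    + offsetof(struct toy_iommu_domain, handler);
}
```

Under these made-up offsets the computed address is 0x70. In the real driver the two offsets depend on the kernel version and configuration.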