Message ID | 20230901203904.4073-1-nicolinc@nvidia.com (mailing list archive)
---|---
State | New, archived
Series | iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
I found this patch cannot be applied to v6.6-rc1 due to conflicts
with some new commits that were merged during the previous cycle.

I can redo a version rebased onto v6.6-rc1, yet the new version
won't apply to the earlier stable kernel trees. Is there a way to
get this fix into both mainline and the earlier trees?

Thanks
Nicolin

On Fri, Sep 01, 2023 at 01:39:04PM -0700, Nicolin Chen wrote:
> When running an SVA case, the following soft lockup is triggered:
> --------------------------------------------------------------------
> watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
> pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
> lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
> sp : ffff8000d83ef290
> x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
> x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
> x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
> x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
> x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
> x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
> x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
> Call trace:
>  arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
>  __arm_smmu_tlb_inv_range+0x118/0x254
>  arm_smmu_tlb_inv_range_asid+0x6c/0x130
>  arm_smmu_mm_invalidate_range+0xa0/0xa4
>  __mmu_notifier_invalidate_range_end+0x88/0x120
>  unmap_vmas+0x194/0x1e0
>  unmap_region+0xb4/0x144
>  do_mas_align_munmap+0x290/0x490
>  do_mas_munmap+0xbc/0x124
>  __vm_munmap+0xa8/0x19c
>  __arm64_sys_munmap+0x28/0x50
>  invoke_syscall+0x78/0x11c
>  el0_svc_common.constprop.0+0x58/0x1c0
>  do_el0_svc+0x34/0x60
>  el0_svc+0x2c/0xd4
>  el0t_64_sync_handler+0x114/0x140
>  el0t_64_sync+0x1a4/0x1a8
> --------------------------------------------------------------------
>
> The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable
> protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur
> on the SMMU side too, since arm_smmu_mm_invalidate_range() is typically
> called next to the MMU tlb flush function, e.g.
>     tlb_flush_mmu_tlbonly {
>         tlb_flush {
>             __flush_tlb_range {
>                 // check MAX_TLBI_OPS
>             }
>         }
>         mmu_notifier_invalidate_range {
>             arm_smmu_mm_invalidate_range {
>                 // does not check MAX_TLBI_OPS
>             }
>         }
>     }
>
> Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h: in an SVA
> case the SMMU uses the CPU page table, so it makes sense to align with
> the tlbflush code. Then, replace the per-page TLBI commands with a single
> per-asid TLBI command if the request size hits this threshold.
>
> Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
> Cc: stable@vger.kernel.org
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++++++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  9 +++++++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index a5a63b1c947e..7ec3f5219250 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
>       */
>      size = end - start;
>
> -    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> -        arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> -                        PAGE_SIZE, false, smmu_domain);
> +    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
> +        if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
> +            size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
> +            arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
> +        else
> +            arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> +                            PAGE_SIZE, false, smmu_domain);
> +    }
>      arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index dcab85698a4e..79a81eed1dcc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -326,6 +326,15 @@
>   */
>  #define CMDQ_BATCH_ENTRIES        BITS_PER_LONG
>
> +/*
> + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
> + * is used as a threshold to replace per-page TLBI commands to issue in the
> + * command queue with an address-space TLBI command, when SMMU w/o a range
> + * invalidation feature handles too many per-page TLBI commands, which will
> + * otherwise result in a soft lockup.
> + */
> +#define CMDQ_MAX_TLBI_OPS        (1 << (PAGE_SHIFT - 3))
> +
>  #define CMDQ_0_OP            GENMASK_ULL(7, 0)
>  #define CMDQ_0_SSV            (1UL << 11)
>
> --
> 2.42.0
On 9/15/23 3:43 PM, Nicolin Chen wrote:
> I found this patch cannot be applied to v6.6-rc1 due to conflicts
> with some new commits that were merged during the previous cycle.
>
> I can redo a version rebased onto v6.6-rc1, yet the new version
> won't apply to the earlier stable kernel trees. Is there a way to
> get this fix into both mainline and the earlier trees?

Normally, bug fixes should first be submitted to the mainline kernel
(also known as Linus's tree). If you use the "Fixes" and "Cc: stable"
tags, the patch will be automatically picked up for the appropriate
stable kernels.

If the patch does not apply to a stable kernel that you want it in,
you can then post a back-ported patch to the stable mailing list.

When doing so, it's better to include the following information:

- The mainline commit ID of the back-ported patch.
- The versions of the stable kernel(s) to which you want the
  back-ported patch to be applied.

Hope this helps.

Best regards,
baolu
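[Editor's note: for reference, a minimal sketch of the trailer block Baolu
describes, as it would sit at the end of a commit message. The "# 6.3.x"
stable-version annotation is illustrative, not taken from this thread:]

    Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
    Cc: stable@vger.kernel.org # 6.3.x
    Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>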
On Fri, Sep 01, 2023 at 01:39:04PM -0700, Nicolin Chen wrote:
> When running an SVA case, the following soft lockup is triggered:
> --------------------------------------------------------------------
> watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
> pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
> pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
> lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
> sp : ffff8000d83ef290
> x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
> x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
> x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
> x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
> x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
> x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
> x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
> Call trace:
>  arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
>  __arm_smmu_tlb_inv_range+0x118/0x254
>  arm_smmu_tlb_inv_range_asid+0x6c/0x130
>  arm_smmu_mm_invalidate_range+0xa0/0xa4
>  __mmu_notifier_invalidate_range_end+0x88/0x120
>  unmap_vmas+0x194/0x1e0
>  unmap_region+0xb4/0x144
>  do_mas_align_munmap+0x290/0x490
>  do_mas_munmap+0xbc/0x124
>  __vm_munmap+0xa8/0x19c
>  __arm64_sys_munmap+0x28/0x50
>  invoke_syscall+0x78/0x11c
>  el0_svc_common.constprop.0+0x58/0x1c0
>  do_el0_svc+0x34/0x60
>  el0_svc+0x2c/0xd4
>  el0t_64_sync_handler+0x114/0x140
>  el0t_64_sync+0x1a4/0x1a8
> --------------------------------------------------------------------
>
> The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable
> protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur
> on the SMMU side too, since arm_smmu_mm_invalidate_range() is typically
> called next to the MMU tlb flush function, e.g.
>     tlb_flush_mmu_tlbonly {
>         tlb_flush {
>             __flush_tlb_range {
>                 // check MAX_TLBI_OPS
>             }
>         }
>         mmu_notifier_invalidate_range {
>             arm_smmu_mm_invalidate_range {
>                 // does not check MAX_TLBI_OPS
>             }
>         }
>     }
>
> Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h: in an SVA
> case the SMMU uses the CPU page table, so it makes sense to align with
> the tlbflush code. Then, replace the per-page TLBI commands with a single
> per-asid TLBI command if the request size hits this threshold.
>
> Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
> Cc: stable@vger.kernel.org
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++++++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  9 +++++++++
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index a5a63b1c947e..7ec3f5219250 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
>       */
>      size = end - start;
>
> -    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> -        arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> -                        PAGE_SIZE, false, smmu_domain);
> +    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
> +        if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
> +            size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
> +            arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
> +        else
> +            arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> +                            PAGE_SIZE, false, smmu_domain);

cosmetic nit: Please use braces for the multi-line conditionals.

> +    }
>      arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index dcab85698a4e..79a81eed1dcc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -326,6 +326,15 @@
>   */
>  #define CMDQ_BATCH_ENTRIES        BITS_PER_LONG
>
> +/*
> + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
> + * is used as a threshold to replace per-page TLBI commands to issue in the
> + * command queue with an address-space TLBI command, when SMMU w/o a range
> + * invalidation feature handles too many per-page TLBI commands, which will
> + * otherwise result in a soft lockup.
> + */
> +#define CMDQ_MAX_TLBI_OPS        (1 << (PAGE_SHIFT - 3))

Maybe stick "SVA" in the name of this somewhere, since that's the reason
why looking at PAGE_SHIFT is relevant?

Will
On Mon, Sep 18, 2023 at 01:33:31PM +0800, Baolu Lu wrote:
> On 9/15/23 3:43 PM, Nicolin Chen wrote:
> > I found this patch cannot be applied to v6.6-rc1 due to conflicts
> > with some new commits that were merged during the previous cycle.
> >
> > I can redo a version rebased onto v6.6-rc1, yet the new version
> > won't apply to the earlier stable kernel trees. Is there a way to
> > get this fix into both mainline and the earlier trees?
>
> Normally, bug fixes should first be submitted to the mainline kernel
> (also known as Linus's tree). If you use the "Fixes" and "Cc: stable"
> tags, the patch will be automatically picked up for the appropriate
> stable kernels.
>
> If the patch does not apply to a stable kernel that you want it in,
> you can then post a back-ported patch to the stable mailing list.
>
> When doing so, it's better to include the following information:
>
> - The mainline commit ID of the back-ported patch.
> - The versions of the stable kernel(s) to which you want the
>   back-ported patch to be applied.
>
> Hope this helps.

Yup, please send a version against -rc1 so we can land that first.

Will
On Mon, Sep 18, 2023 at 10:23:42AM +0100, Will Deacon wrote:
> > If the patch does not apply to a stable kernel that you want it in,
> > you can then post a back-ported patch to the stable mailing list.
> >
> > When doing so, it's better to include the following information:
> >
> > - The mainline commit ID of the back-ported patch.
> > - The versions of the stable kernel(s) to which you want the
> >   back-ported patch to be applied.
> >
> > Hope this helps.

Yes! Thanks :)

> Yup, please send a version against -rc1 so we can land that first.

Yea, will send a new version.

Thank you
Nicolin
On Mon, Sep 18, 2023 at 10:22:36AM +0100, Will Deacon wrote:
> > @@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
> >       */
> >      size = end - start;
> >
> > -    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
> > -        arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> > -                        PAGE_SIZE, false, smmu_domain);
> > +    if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
> > +        if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
> > +            size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
> > +            arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
> > +        else
> > +            arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
> > +                            PAGE_SIZE, false, smmu_domain);
>
> cosmetic nit: Please use braces for the multi-line conditionals.

Ack.

> > +    }
> >      arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
> >  }
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > index dcab85698a4e..79a81eed1dcc 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > @@ -326,6 +326,15 @@
> >   */
> >  #define CMDQ_BATCH_ENTRIES        BITS_PER_LONG
> >
> > +/*
> > + * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
> > + * is used as a threshold to replace per-page TLBI commands to issue in the
> > + * command queue with an address-space TLBI command, when SMMU w/o a range
> > + * invalidation feature handles too many per-page TLBI commands, which will
> > + * otherwise result in a soft lockup.
> > + */
> > +#define CMDQ_MAX_TLBI_OPS        (1 << (PAGE_SHIFT - 3))
>
> Maybe stick "SVA" in the name of this somewhere, since that's the reason
> why looking at PAGE_SHIFT is relevant?

Hmm, that does make sense, yet it wouldn't apply to the non-SVA pathway,
which makes putting it in the common header meaningless. Perhaps I should
have just left it in the arm-smmu-v3-sva.c file...

Meanwhile, we'd need to figure out another definition for the non-SVA
pathway.

Thanks
Nicolin
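[Editor's note: to illustrate why PAGE_SHIFT is relevant here: with 8-byte
page-table entries, as on arm64, 1 << (PAGE_SHIFT - 3) is the number of
PTEs that fit in one page-table page, so the threshold tracks the
configured page size. A standalone sketch in plain userspace C — for
illustration only, not kernel code:]

    #include <stdio.h>

    /* Mirrors the patch's CMDQ_MAX_TLBI_OPS: 1 << (PAGE_SHIFT - 3),
     * i.e. the number of 8-byte PTEs in one page-table page. */
    static unsigned long max_tlbi_ops(unsigned int page_shift)
    {
        return 1UL << (page_shift - 3);
    }

    int main(void)
    {
        unsigned int shifts[] = { 12, 14, 16 }; /* 4K, 16K, 64K pages */

        for (int i = 0; i < 3; i++) {
            unsigned long ops = max_tlbi_ops(shifts[i]);
            unsigned long bytes = ops << shifts[i];

            /* the per-asid fallback kicks in at ops * PAGE_SIZE bytes */
            printf("PAGE_SIZE=%3luK: %5lu TLBI ops max (range >= %lu MiB)\n",
                   (1UL << shifts[i]) >> 10, ops, bytes >> 20);
        }
        return 0;
    }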
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..7ec3f5219250 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -201,9 +201,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 	 */
 	size = end - start;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
-		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
-					    PAGE_SIZE, false, smmu_domain);
+	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
+		if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+		    size >= CMDQ_MAX_TLBI_OPS * PAGE_SIZE)
+			arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+		else
+			arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
+						    PAGE_SIZE, false, smmu_domain);
+	}
 	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index dcab85698a4e..79a81eed1dcc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -326,6 +326,15 @@
  */
 #define CMDQ_BATCH_ENTRIES		BITS_PER_LONG
 
+/*
+ * Cloned from the MAX_TLBI_OPS in arch/arm64/include/asm/tlbflush.h, this
+ * is used as a threshold to replace per-page TLBI commands to issue in the
+ * command queue with an address-space TLBI command, when SMMU w/o a range
+ * invalidation feature handles too many per-page TLBI commands, which will
+ * otherwise result in a soft lockup.
+ */
+#define CMDQ_MAX_TLBI_OPS		(1 << (PAGE_SHIFT - 3))
+
 #define CMDQ_0_OP			GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV			(1UL << 11)
When running an SVA case, the following soft lockup is triggered:
--------------------------------------------------------------------
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
 arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
 __arm_smmu_tlb_inv_range+0x118/0x254
 arm_smmu_tlb_inv_range_asid+0x6c/0x130
 arm_smmu_mm_invalidate_range+0xa0/0xa4
 __mmu_notifier_invalidate_range_end+0x88/0x120
 unmap_vmas+0x194/0x1e0
 unmap_region+0xb4/0x144
 do_mas_align_munmap+0x290/0x490
 do_mas_munmap+0xbc/0x124
 __vm_munmap+0xa8/0x19c
 __arm64_sys_munmap+0x28/0x50
 invoke_syscall+0x78/0x11c
 el0_svc_common.constprop.0+0x58/0x1c0
 do_el0_svc+0x34/0x60
 el0_svc+0x2c/0xd4
 el0t_64_sync_handler+0x114/0x140
 el0t_64_sync+0x1a4/0x1a8
--------------------------------------------------------------------

The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable
protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur
on the SMMU side too, since arm_smmu_mm_invalidate_range() is typically
called next to the MMU tlb flush function, e.g.
    tlb_flush_mmu_tlbonly {
        tlb_flush {
            __flush_tlb_range {
                // check MAX_TLBI_OPS
            }
        }
        mmu_notifier_invalidate_range {
            arm_smmu_mm_invalidate_range {
                // does not check MAX_TLBI_OPS
            }
        }
    }

Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h: in an SVA
case the SMMU uses the CPU page table, so it makes sense to align with
the tlbflush code. Then, replace the per-page TLBI commands with a single
per-asid TLBI command if the request size hits this threshold.

Fixes: 51d113c3be09 ("iommu/arm-smmu-v3: Make BTM optional for SVA")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 11 ++++++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  9 +++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)
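[Editor's note: a minimal standalone model of the fallback described above.
The flush_page()/flush_asid() helpers are hypothetical stand-ins for the
real TLBI command paths, and 4K pages are assumed; this is a sketch of the
technique, not the driver code itself:]

    #include <stdio.h>

    #define PAGE_SHIFT   12
    #define PAGE_SIZE    (1UL << PAGE_SHIFT)
    #define MAX_TLBI_OPS (1UL << (PAGE_SHIFT - 3))  /* 512 on 4K pages */

    /* hypothetical stand-ins for issuing TLBI commands to the queue */
    static void flush_page(unsigned long va) { printf("TLBI VA   %#lx\n", va); }
    static void flush_asid(int asid)         { printf("TLBI ASID %d\n", asid); }

    /* Invalidate [start, end): per-page while the command count stays
     * bounded, otherwise fall back to one per-ASID invalidation. */
    static void inv_range(unsigned long start, unsigned long end, int asid)
    {
        if (end - start >= MAX_TLBI_OPS * PAGE_SIZE) {
            flush_asid(asid);               /* a single O(1) command */
            return;
        }
        for (unsigned long va = start; va < end; va += PAGE_SIZE)
            flush_page(va);                 /* fewer than 512 commands */
    }

    int main(void)
    {
        inv_range(0x400000, 0x408000, 1);   /* 32K: eight per-page TLBIs */
        inv_range(0x400000, 0x80400000, 1); /* 2G: one per-ASID TLBI */
        return 0;
    }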