Message ID | 20240513161343.1.I5db5530070a1335e6cc3c55e056c2a84b1031308@changeid (mailing list archive) |
---|---|
State | New, archived |
Series | iommu/arm-smmu: Don't disable next-page prefetcher on devices it works on |
Hi Doug,

On 2024-05-14 12:13 am, Douglas Anderson wrote:
> On sc7180 trogdor devices we get a scary warning at bootup:
>   arm-smmu 15000000.iommu:
>   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
>
> We spent some time trying to figure out how we were going to fix these
> errata and whether we needed to do a firmware update. Upon closer
> inspection, however, we realized that the errata don't apply to us.
> Specifically, the errata document says that for these errata:
> * Found in: r0p0
> * Fixed in: r2p2
>
> ...and trogdor devices appear to be running r2p4. That means that they
> are unaffected despite the scary warning.
>
> The issue is that the kernel unconditionally tries to disable the
> prefetcher even on unaffected devices and then warns when it's unable
> to.
>
> Let's change the kernel to only disable the prefetcher on affected
> devices, which will get rid of the scary warning on devices that are
> unaffected. As per the comment the prefetcher is
> "not-particularly-beneficial" but it shouldn't hurt to leave it on for
> devices where it doesn't cause problems.

Unfortunately by now there are also at least #562869 and #1047329, plus
a small possibility of further corners of systemic brokenness in the
prefetcher yet to be discovered (or at least characterised sufficiently
to be reported as an erratum). One could argue that we're not currently
meeting the conditions for #1047329 yet, but with the IOMMUFD APIs
finally falling into place, and potential pKVM use-cases on the horizon
too, there's a distinct chance that someone will be interested in
nesting support for SMMUv2 sooner or later.

Thanks,
Robin.

> Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> index 9dc772f2cbb2..d9b38b0db0d4 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> @@ -109,7 +109,7 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
>
>  int arm_mmu500_reset(struct arm_smmu_device *smmu)
>  {
> -	u32 reg, major;
> +	u32 reg, major, minor;
>  	int i;
>  	/*
>  	 * On MMU-500 r2p0 onwards we need to clear ACR.CACHE_LOCK before
> @@ -118,6 +118,7 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu)
>  	 */
>  	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7);
>  	major = FIELD_GET(ARM_SMMU_ID7_MAJOR, reg);
> +	minor = FIELD_GET(ARM_SMMU_ID7_MINOR, reg);
>  	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR);
>  	if (major >= 2)
>  		reg &= ~ARM_MMU500_ACR_CACHE_LOCK;
> @@ -131,14 +132,18 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu)
>  	/*
>  	 * Disable MMU-500's not-particularly-beneficial next-page
>  	 * prefetcher for the sake of errata #841119 and #826419.
> +	 * These errata only affect r0p0 through r2p1 (fixed in r2p2).
>  	 */
> -	for (i = 0; i < smmu->num_context_banks; ++i) {
> -		reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
> -		reg &= ~ARM_MMU500_ACTLR_CPRE;
> -		arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg);
> -		reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
> -		if (reg & ARM_MMU500_ACTLR_CPRE)
> -			dev_warn_once(smmu->dev, "Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK\n");
> +	if (major < 2 || (major == 2 && minor < 2)) {
> +		for (i = 0; i < smmu->num_context_banks; ++i) {
> +			reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
> +			reg &= ~ARM_MMU500_ACTLR_CPRE;
> +			arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg);
> +			reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
> +			if (reg & ARM_MMU500_ACTLR_CPRE)
> +				dev_warn_once(smmu->dev,
> +					      "Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK\n");
> +		}
>  	}
>
>  	return 0;
Hi Doug,

On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
> On sc7180 trogdor devices we get a scary warning at bootup:
>   arm-smmu 15000000.iommu:
>   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
>
> We spent some time trying to figure out how we were going to fix these
> errata and whether we needed to do a firmware update. Upon closer
> inspection, however, we realized that the errata don't apply to us.
> Specifically, the errata document says that for these errata:
> * Found in: r0p0
> * Fixed in: r2p2
>
> ...and trogdor devices appear to be running r2p4. That means that they
> are unaffected despite the scary warning.
>
> The issue is that the kernel unconditionally tries to disable the
> prefetcher even on unaffected devices and then warns when it's unable
> to.
>
> Let's change the kernel to only disable the prefetcher on affected
> devices, which will get rid of the scary warning on devices that are
> unaffected. As per the comment the prefetcher is
> "not-particularly-beneficial" but it shouldn't hurt to leave it on for
> devices where it doesn't cause problems.
>
> Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)

Just curious, but did you see any performance impact (good or bad) as a
result of this patch? The next-page prefetcher has always looked a little
naive to me and, with a tendency for tiny TLBs in some implementations,
there's a possibility it could do more harm than good.

Will
Hi,

On Fri, May 17, 2024 at 9:37 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Doug,
>
> On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
> > On sc7180 trogdor devices we get a scary warning at bootup:
> >   arm-smmu 15000000.iommu:
> >   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
> >
> > We spent some time trying to figure out how we were going to fix these
> > errata and whether we needed to do a firmware update. Upon closer
> > inspection, however, we realized that the errata don't apply to us.
> > Specifically, the errata document says that for these errata:
> > * Found in: r0p0
> > * Fixed in: r2p2
> >
> > ...and trogdor devices appear to be running r2p4. That means that they
> > are unaffected despite the scary warning.
> >
> > The issue is that the kernel unconditionally tries to disable the
> > prefetcher even on unaffected devices and then warns when it's unable
> > to.
> >
> > Let's change the kernel to only disable the prefetcher on affected
> > devices, which will get rid of the scary warning on devices that are
> > unaffected. As per the comment the prefetcher is
> > "not-particularly-beneficial" but it shouldn't hurt to leave it on for
> > devices where it doesn't cause problems.
> >
> > Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
> >  1 file changed, 13 insertions(+), 8 deletions(-)
>
>
> Just curious, but did you see any performance impact (good or bad) as a
> result of this patch? The next-page prefetcher has always looked a little
> naive to me and, with a tendency for tiny TLBs in some implementations,
> there's a possibility it could do more harm than good.

This patch actually makes no difference on trogdor today other than
getting rid of the scary warning. Specifically on trogdor the
ACR.CACHE_LOCK bit seems to be set so the kernel is unable to change
the setting anyway and has never been able to. We are working on
figuring out how to fix the firmware and then we have to get a
firmware spin before we can really see any changes. I'll keep an eye
out to see if performance numbers change when the firmware uprevs.

BTW: any idea how big of a deal these errata are? We're _just_
finishing a firmware uprev process and there is always pushback
against kicking off a new one unless the issue is important. Given
that we've been living with this issue since devices shipped I'm going
to assume we don't need to rush a firmware update, but if this is
really scary and needs to be addressed sooner we can figure that out.

-Doug
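
The behaviour Doug describes (the driver clears CPRE, reads the register back, and finds the bit still set because ACR.CACHE_LOCK never got cleared) can be modelled in plain user-space C. This is only a rough sketch under stated assumptions, not the driver code: `struct fake_smmu`, the `acr_writable` flag and all function names are hypothetical, the bit positions are assumed to match the kernel's `ARM_MMU500_ACR_CACHE_LOCK` and `ARM_MMU500_ACTLR_CPRE` definitions, and "ACR writes are silently ignored" is an assumed stand-in for whatever the firmware actually does on trogdor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define ACR_CACHE_LOCK	(1u << 26)
#define ACTLR_CPRE	(1u << 1)

/* Hypothetical software model of one MMU-500 with a single context bank. */
struct fake_smmu {
	uint32_t acr;		/* global auxiliary control register */
	bool acr_writable;	/* false: ACR writes are ignored (assumption) */
	uint32_t actlr;		/* the context bank's ACTLR */
};

void fake_write_acr(struct fake_smmu *s, uint32_t val)
{
	if (s->acr_writable)
		s->acr = val;
}

void fake_write_actlr(struct fake_smmu *s, uint32_t val)
{
	/* While CACHE_LOCK is set, CPRE silently keeps its old value. */
	if (s->acr & ACR_CACHE_LOCK)
		val = (val & ~ACTLR_CPRE) | (s->actlr & ACTLR_CPRE);
	s->actlr = val;
}

/*
 * Mirrors the shape of the reset path: try to drop the lock, clear CPRE,
 * then read back to see whether the write stuck. Returns true if the
 * prefetcher ended up disabled.
 */
bool try_disable_prefetcher(struct fake_smmu *s)
{
	fake_write_acr(s, s->acr & ~ACR_CACHE_LOCK);
	fake_write_actlr(s, s->actlr & ~ACTLR_CPRE);
	return !(s->actlr & ACTLR_CPRE);
}
```

In this model a device whose ACR cannot be changed always fails the read-back check, which is the situation where the real driver prints its warning, whether or not the errata apply to that revision.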
On 5/17/2024 10:49 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 17, 2024 at 9:37 AM Will Deacon <will@kernel.org> wrote:
>>
>> Hi Doug,
>>
>> On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
>>> On sc7180 trogdor devices we get a scary warning at bootup:
>>>   arm-smmu 15000000.iommu:
>>>   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
>>>
>>> We spent some time trying to figure out how we were going to fix these
>>> errata and whether we needed to do a firmware update. Upon closer
>>> inspection, however, we realized that the errata don't apply to us.
>>> Specifically, the errata document says that for these errata:
>>> * Found in: r0p0
>>> * Fixed in: r2p2
>>>
>>> ...and trogdor devices appear to be running r2p4. That means that they
>>> are unaffected despite the scary warning.
>>>
>>> The issue is that the kernel unconditionally tries to disable the
>>> prefetcher even on unaffected devices and then warns when it's unable
>>> to.
>>>
>>> Let's change the kernel to only disable the prefetcher on affected
>>> devices, which will get rid of the scary warning on devices that are
>>> unaffected. As per the comment the prefetcher is
>>> "not-particularly-beneficial" but it shouldn't hurt to leave it on for
>>> devices where it doesn't cause problems.
>>>
>>> Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>> ---
>>>
>>>  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
>>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>
>>
>> Just curious, but did you see any performance impact (good or bad) as a
>> result of this patch? The next-page prefetcher has always looked a little
>> naive to me and, with a tendency for tiny TLBs in some implementations,
>> there's a possibility it could do more harm than good.
>
> This patch actually makes no difference on trogdor today other than
> getting rid of the scary warning. Specifically on trogdor the
> ACR.CACHE_LOCK bit seems to be set so the kernel is unable to change
> the setting anyway and has never been able to. We are working on
> figuring out how to fix the firmware and then we have to get a
> firmware spin before we can really see any changes. I'll keep an eye
> out to see if performance numbers change when the firmware uprevs.
>
> BTW: any idea how big of a deal these errata are? We're _just_
> finishing a firmware uprev process and there is always pushback
> against kicking off a new one unless the issue is important. Given
> that we've been living with this issue since devices shipped I'm going
> to assume we don't need to rush a firmware update, but if this is
> really scary and needs to be addressed sooner we can figure that out.
>
> -Doug

Receiving the warning on pre-silicon platforms as well, despite being unaffected. If merged, it will help in reducing log clutter.
The patch applies cleanly on the tip of linux-next, tag: next-20240904.
On 9/4/2024 1:59 PM, Pankaj Patil wrote:
> On 5/17/2024 10:49 PM, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, May 17, 2024 at 9:37 AM Will Deacon <will@kernel.org> wrote:
>>>
>>> Hi Doug,
>>>
>>> On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
>>>> On sc7180 trogdor devices we get a scary warning at bootup:
>>>>   arm-smmu 15000000.iommu:
>>>>   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
>>>>
>>>> We spent some time trying to figure out how we were going to fix these
>>>> errata and whether we needed to do a firmware update. Upon closer
>>>> inspection, however, we realized that the errata don't apply to us.
>>>> Specifically, the errata document says that for these errata:
>>>> * Found in: r0p0
>>>> * Fixed in: r2p2
>>>>
>>>> ...and trogdor devices appear to be running r2p4. That means that they
>>>> are unaffected despite the scary warning.
>>>>
>>>> The issue is that the kernel unconditionally tries to disable the
>>>> prefetcher even on unaffected devices and then warns when it's unable
>>>> to.
>>>>
>>>> Let's change the kernel to only disable the prefetcher on affected
>>>> devices, which will get rid of the scary warning on devices that are
>>>> unaffected. As per the comment the prefetcher is
>>>> "not-particularly-beneficial" but it shouldn't hurt to leave it on for
>>>> devices where it doesn't cause problems.
>>>>
>>>> Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>>> ---
>>>>
>>>>  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
>>>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>>
>>>
>>> Just curious, but did you see any performance impact (good or bad) as a
>>> result of this patch? The next-page prefetcher has always looked a little
>>> naive to me and, with a tendency for tiny TLBs in some implementations,
>>> there's a possibility it could do more harm than good.
>>
>> This patch actually makes no difference on trogdor today other than
>> getting rid of the scary warning. Specifically on trogdor the
>> ACR.CACHE_LOCK bit seems to be set so the kernel is unable to change
>> the setting anyway and has never been able to. We are working on
>> figuring out how to fix the firmware and then we have to get a
>> firmware spin before we can really see any changes. I'll keep an eye
>> out to see if performance numbers change when the firmware uprevs.
>>
>> BTW: any idea how big of a deal these errata are? We're _just_
>> finishing a firmware uprev process and there is always pushback
>> against kicking off a new one unless the issue is important. Given
>> that we've been living with this issue since devices shipped I'm going
>> to assume we don't need to rush a firmware update, but if this is
>> really scary and needs to be addressed sooner we can figure that out.
>>
>> -Doug
>
> Receiving the warning on pre-silicon platforms as well, despite being unaffected. If merged, it will help in reducing log clutter.
> The patch applies cleanly on the tip of linux-next, tag: next-20240904.
>

Following up on the patch. Please let me know if any additional changes
are required.
On 07/10/2024 11:03 am, Pankaj Patil wrote:
> On 9/4/2024 1:59 PM, Pankaj Patil wrote:
>> On 5/17/2024 10:49 PM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Fri, May 17, 2024 at 9:37 AM Will Deacon <will@kernel.org> wrote:
>>>>
>>>> Hi Doug,
>>>>
>>>> On Mon, May 13, 2024 at 04:13:47PM -0700, Douglas Anderson wrote:
>>>>> On sc7180 trogdor devices we get a scary warning at bootup:
>>>>>   arm-smmu 15000000.iommu:
>>>>>   Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK
>>>>>
>>>>> We spent some time trying to figure out how we were going to fix these
>>>>> errata and whether we needed to do a firmware update. Upon closer
>>>>> inspection, however, we realized that the errata don't apply to us.
>>>>> Specifically, the errata document says that for these errata:
>>>>> * Found in: r0p0
>>>>> * Fixed in: r2p2
>>>>>
>>>>> ...and trogdor devices appear to be running r2p4. That means that they
>>>>> are unaffected despite the scary warning.
>>>>>
>>>>> The issue is that the kernel unconditionally tries to disable the
>>>>> prefetcher even on unaffected devices and then warns when it's unable
>>>>> to.
>>>>>
>>>>> Let's change the kernel to only disable the prefetcher on affected
>>>>> devices, which will get rid of the scary warning on devices that are
>>>>> unaffected. As per the comment the prefetcher is
>>>>> "not-particularly-beneficial" but it shouldn't hurt to leave it on for
>>>>> devices where it doesn't cause problems.
>>>>>
>>>>> Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
>>>>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>>>>> ---
>>>>>
>>>>>  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
>>>>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>>>
>>>>
>>>> Just curious, but did you see any performance impact (good or bad) as a
>>>> result of this patch? The next-page prefetcher has always looked a little
>>>> naive to me and, with a tendency for tiny TLBs in some implementations,
>>>> there's a possibility it could do more harm than good.
>>>
>>> This patch actually makes no difference on trogdor today other than
>>> getting rid of the scary warning. Specifically on trogdor the
>>> ACR.CACHE_LOCK bit seems to be set so the kernel is unable to change
>>> the setting anyway and has never been able to. We are working on
>>> figuring out how to fix the firmware and then we have to get a
>>> firmware spin before we can really see any changes. I'll keep an eye
>>> out to see if performance numbers change when the firmware uprevs.
>>>
>>> BTW: any idea how big of a deal these errata are? We're _just_
>>> finishing a firmware uprev process and there is always pushback
>>> against kicking off a new one unless the issue is important. Given
>>> that we've been living with this issue since devices shipped I'm going
>>> to assume we don't need to rush a firmware update, but if this is
>>> really scary and needs to be addressed sooner we can figure that out.
>>>
>>> -Doug
>>
>> Receiving the warning on pre-silicon platforms as well, despite being unaffected. If merged, it will help in reducing log clutter.
>> The patch applies cleanly on the tip of linux-next, tag: next-20240904.
>>
> Following up on the patch. Please let me know if any additional
> changes are required.

Surely at pre-silicon there's really very little excuse for not just
fixing the firmware?

Anyway, it remains the case that the real issue here is the message and
comment being misleadingly over-specific, and I already sent a patch to
address that[1].

Thanks,
Robin.

[1] https://lore.kernel.org/linux-iommu/7c426dc0-4fde-4d1e-bb91-538984bd8b59@arm.com/
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 9dc772f2cbb2..d9b38b0db0d4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -109,7 +109,7 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu)
 {
-	u32 reg, major;
+	u32 reg, major, minor;
 	int i;
 	/*
 	 * On MMU-500 r2p0 onwards we need to clear ACR.CACHE_LOCK before
@@ -118,6 +118,7 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu)
 	 */
 	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_ID7);
 	major = FIELD_GET(ARM_SMMU_ID7_MAJOR, reg);
+	minor = FIELD_GET(ARM_SMMU_ID7_MINOR, reg);
 	reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sACR);
 	if (major >= 2)
 		reg &= ~ARM_MMU500_ACR_CACHE_LOCK;
@@ -131,14 +132,18 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu)
 	/*
 	 * Disable MMU-500's not-particularly-beneficial next-page
 	 * prefetcher for the sake of errata #841119 and #826419.
+	 * These errata only affect r0p0 through r2p1 (fixed in r2p2).
 	 */
-	for (i = 0; i < smmu->num_context_banks; ++i) {
-		reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
-		reg &= ~ARM_MMU500_ACTLR_CPRE;
-		arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg);
-		reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
-		if (reg & ARM_MMU500_ACTLR_CPRE)
-			dev_warn_once(smmu->dev, "Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK\n");
+	if (major < 2 || (major == 2 && minor < 2)) {
+		for (i = 0; i < smmu->num_context_banks; ++i) {
+			reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
+			reg &= ~ARM_MMU500_ACTLR_CPRE;
+			arm_smmu_cb_write(smmu, i, ARM_SMMU_CB_ACTLR, reg);
+			reg = arm_smmu_cb_read(smmu, i, ARM_SMMU_CB_ACTLR);
+			if (reg & ARM_MMU500_ACTLR_CPRE)
+				dev_warn_once(smmu->dev,
+					      "Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK\n");
+		}
 	}
 
 	return 0;
On sc7180 trogdor devices we get a scary warning at bootup:
  arm-smmu 15000000.iommu:
  Failed to disable prefetcher [errata #841119 and #826419], check ACR.CACHE_LOCK

We spent some time trying to figure out how we were going to fix these
errata and whether we needed to do a firmware update. Upon closer
inspection, however, we realized that the errata don't apply to us.
Specifically, the errata document says that for these errata:
* Found in: r0p0
* Fixed in: r2p2

...and trogdor devices appear to be running r2p4. That means that they
are unaffected despite the scary warning.

The issue is that the kernel unconditionally tries to disable the
prefetcher even on unaffected devices and then warns when it's unable
to.

Let's change the kernel to only disable the prefetcher on affected
devices, which will get rid of the scary warning on devices that are
unaffected. As per the comment the prefetcher is
"not-particularly-beneficial" but it shouldn't hurt to leave it on for
devices where it doesn't cause problems.

Fixes: f87f6e5b4539 ("iommu/arm-smmu: Warn once when the perfetcher errata patch fails to apply")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
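
The revision test the commit message relies on ("found in r0p0, fixed in r2p2") can be pulled out into a small standalone predicate, which makes the boundary cases easy to check outside the kernel. This is a minimal sketch and not part of the patch: `struct mmu500_rev` and both function names are made up for illustration, and the ID7 layout (MAJOR in bits [7:4], MINOR in bits [3:0]) is an assumption about how `ARM_SMMU_ID7_MAJOR` and `ARM_SMMU_ID7_MINOR` are defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ID7 layout: MAJOR in bits [7:4], MINOR in bits [3:0]. */
#define ID7_MAJOR_SHIFT	4
#define ID7_FIELD_MASK	0xfu

/* Hypothetical holder for a decoded rXpY revision. */
struct mmu500_rev {
	unsigned int major;	/* the X in rXpY */
	unsigned int minor;	/* the Y in rXpY */
};

/* Split a raw ID7 value into its major and minor revision fields. */
struct mmu500_rev mmu500_decode_id7(uint32_t id7)
{
	struct mmu500_rev rev = {
		.major = (id7 >> ID7_MAJOR_SHIFT) & ID7_FIELD_MASK,
		.minor = id7 & ID7_FIELD_MASK,
	};

	return rev;
}

/*
 * True for r0p0 through r2p1, i.e. every revision older than r2p2,
 * which is the range the patch treats as affected by #841119 and
 * #826419. Same condition as in the diff:
 *	major < 2 || (major == 2 && minor < 2)
 */
bool mmu500_prefetch_errata_apply(struct mmu500_rev rev)
{
	return rev.major < 2 || (rev.major == 2 && rev.minor < 2);
}
```

Under these assumptions an r2p4 part such as the one reported for trogdor decodes from 0x24 and falls outside the affected range, while r2p1 (0x21) is the last revision inside it. Note that this covers only the two errata named in the warning; other prefetcher errata raised in review would need their own ranges.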