Message ID | 20210610214431.539029-1-robdclark@gmail.com (mailing list archive) |
---|---|
Series | iommu/arm-smmu: adreno-smmu page fault handling |
Hi,

I've had splash screen disabled on my RB3. However once I've enabled it,
I've got the attached crash during the boot on the msm/msm-next. It
looks like it is related to this particular set of changes.

On 11/06/2021 00:44, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> This picks up an earlier series[1] from Jordan, and adds additional
> support needed to generate GPU devcore dumps on iova faults.
>
> [...]
I suspect you are getting a dpu fault, and need:

https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/

I suppose Bjorn was expecting me to send that patch

BR,
-R

On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> Hi,
>
> I've had splash screen disabled on my RB3. However once I've enabled it,
> I've got the attached crash during the boot on the msm/msm-next. It
> looks like it is related to this particular set of changes.
>
> [...]
>
> --
> With best wishes
> Dmitry
On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote:

> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch
>

No, I left that discussion with the same understanding as you... But I
ended up side tracked by some other craziness.

Did you post this somewhere or would you still like me to test it and
spin a patch?

Regards,
Bjorn

> BR,
> -R
>
> [...]
On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline
again for me (along with some reverts for a separate ext4 shrinker
crash).
Tested-by: John Stultz <john.stultz@linaro.org>

thanks
-john
On Tue, Jul 6, 2021 at 10:12 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > I suspect you are getting a dpu fault, and need:
> >
> > https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
> >
> > I suppose Bjorn was expecting me to send that patch
>
> If it's helpful, I applied that and it got the db845c booting mainline
> again for me (along with some reverts for a separate ext4 shrinker
> crash).
> Tested-by: John Stultz <john.stultz@linaro.org>
>

Thanks, I'll send a patch shortly

BR,
-R
From: Rob Clark <robdclark@chromium.org>

This picks up an earlier series[1] from Jordan, and adds additional
support needed to generate GPU devcore dumps on iova faults. Original
description:

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on a page fault, such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect the case where
    GPU snapshotting needs to avoid crashdumper, and check the
    RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
    resume translation after it has had a chance to snapshot the GPU's
    state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark

[1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
    info
  drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: devcoredump iommu fault support

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
 drivers/gpu/drm/msm/msm_gem.h               |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
 drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
 drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
 drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
 drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
 drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
 include/linux/adreno-smmu-priv.h            |  38 ++++++-
 15 files changed, 367 insertions(+), 21 deletions(-)
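For readers who want the flow from the v3/v4 changelog entries spelled out, here is a rough userspace sketch of it. This is only a toy model, not the code from the series: struct fault_info, struct ctx_bank, struct gpu_model, context_fault(), gpu_fault_handler() and resume_translation() are all hypothetical names made up for illustration, and the "registers" are plain struct fields. It assumes the fault path stalls the context bank, hands the fault to a driver callback that copies out the registers and resumes translation, and then clears FSR regardless of who handled the fault.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the context-bank registers named in the cover letter. */
struct fault_info {
	uint64_t ttbr0;
	uint64_t far;
	uint32_t contextidr;
	uint32_t fsr;
	uint32_t fsynr0;
	uint32_t fsynr1;
};

/* Toy stand-in for one SMMU context bank; not the kernel's data structure. */
struct ctx_bank {
	struct fault_info regs;
	bool stall_enabled;	/* v4: stall translation on fault */
	bool stalled;
};

/* Toy stand-in for the GPU driver state that receives the fault. */
struct gpu_model {
	struct fault_info snapshot;
	bool have_snapshot;
};

typedef int (*fault_handler_t)(struct ctx_bank *cb, uint64_t iova, void *token);

/* Models the driver telling the SMMU to resume after a stall. */
static void resume_translation(struct ctx_bank *cb)
{
	cb->stalled = false;
}

/* Driver-side handler: capture registers while stalled, then resume. */
static int gpu_fault_handler(struct ctx_bank *cb, uint64_t iova, void *token)
{
	struct gpu_model *gpu = token;

	(void)iova;	/* already recorded in cb->regs.far */
	if (cb->stalled && !gpu->have_snapshot) {
		gpu->snapshot = cb->regs;
		gpu->have_snapshot = true;
	}
	resume_translation(cb);
	return 0;
}

/* IOMMU-side fault path: record the fault, offer it to the handler, clear FSR. */
static int context_fault(struct ctx_bank *cb, uint64_t iova, uint32_t fsr,
			 fault_handler_t handler, void *token)
{
	int ret = -ENOSYS;	/* "nobody handled it" when no handler is set */

	assert(cb != NULL);
	cb->regs.far = iova;
	cb->regs.fsr = fsr;
	cb->stalled = cb->stall_enabled;

	if (handler)
		ret = handler(cb, iova, token);

	/* v3: FSR is always cleared, even when the driver handles the resume. */
	cb->regs.fsr = 0;
	return ret;
}
```

The point of doing the copy inside the handler is that the snapshot is taken while the (modelled) bank is still stalled, so the captured FSR and FAR describe the faulting access rather than whatever happens after translation resumes.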