Message ID | 20220617115529.17530-1-nirmoy.das@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Fix a lockdep warning at error capture | expand |
Missed the fdo issue ref: Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5595 On 6/17/2022 1:55 PM, Nirmoy Das wrote: > For some platfroms we use stop_machine version of > gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a > concurrent GGTT access bug but this causes a circular locking > dependency warning: > > Possible unsafe locking scenario: > CPU0 CPU1 > ---- ---- > lock(&ggtt->error_mutex); > lock(dma_fence_map); > lock(&ggtt->error_mutex); > lock(cpu_hotplug_lock); > > Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries > directly at error capture which is concurrent GGTT access safe because > reset path make sure of that. > > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_gt_gmch.c | 2 ++ > drivers/gpu/drm/i915/gt/intel_gtt.h | 9 +++++++++ > drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++- > drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++++-- > 4 files changed, 21 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c > index 18e488672d1b..2260ed576776 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c > @@ -629,6 +629,8 @@ int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt) > if (intel_vm_no_concurrent_access_wa(i915)) { > ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL; > ggtt->vm.insert_page = bxt_vtd_ggtt_insert_page__BKL; > + ggtt->vm.raw_insert_page = gen8_ggtt_insert_page; > + ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries; > ggtt->vm.bind_async_flags = > I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; > } > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h > index a40d928b3888..f9a1f3d17272 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h > @@ -306,6 +306,15 @@ struct i915_address_space { > struct i915_vma_resource *vma_res, > enum i915_cache_level cache_level, > u32 flags); > + void (*raw_insert_page)(struct i915_address_space *vm, > + dma_addr_t addr, > + u64 offset, > + enum i915_cache_level cache_level, > + u32 flags); > + void (*raw_insert_entries)(struct i915_address_space *vm, > + struct i915_vma_resource *vma_res, > + enum i915_cache_level cache_level, > + u32 flags); > void (*cleanup)(struct i915_address_space *vm); > > void (*foreach)(struct i915_address_space *vm, > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c > index d2c5c9367cc4..c06e83872c34 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c > @@ -493,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw) > if (i915_gem_object_is_lmem(obj)) > pte_flags |= PTE_LM; > > - ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); > + if (ggtt->vm.raw_insert_entries) > + ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); > + else > + ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); > } > > static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw) > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c > index bff8a111424a..f9b1969ed7ed 100644 > --- a/drivers/gpu/drm/i915/i915_gpu_error.c > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c > @@ -1104,8 +1104,12 @@ i915_vma_coredump_create(const struct intel_gt *gt, > > for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { > mutex_lock(&ggtt->error_mutex); > - ggtt->vm.insert_page(&ggtt->vm, dma, slot, > - I915_CACHE_NONE, 0); > + if (ggtt->vm.raw_insert_page) > + ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot, > + I915_CACHE_NONE, 0); > + else > + ggtt->vm.insert_page(&ggtt->vm, dma, slot, > + I915_CACHE_NONE, 0); > mb(); > > s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
Commit message and code changes look good to me. Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> This is a question about the issue mentioned in this patch, not the patch. The tests (igt@gem_ctx_persistence@legacy-engines-hostile@render) mentioned in this issue ( https://gitlab.freedesktop.org/drm/intel/-/issues/5595 ) are dealing with a test that causes gpu reset / hang? Or did the reset happen during the test? br, G.G. On 6/17/22 4:21 PM, Das, Nirmoy wrote: > Missed the fdo issue ref: > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5595 > > On 6/17/2022 1:55 PM, Nirmoy Das wrote: >> For some platfroms we use stop_machine version of >> gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a >> concurrent GGTT access bug but this causes a circular locking >> dependency warning: >> >> Possible unsafe locking scenario: >> CPU0 CPU1 >> ---- ---- >> lock(&ggtt->error_mutex); >> lock(dma_fence_map); >> lock(&ggtt->error_mutex); >> lock(cpu_hotplug_lock); >> >> Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries >> directly at error capture which is concurrent GGTT access safe because >> reset path make sure of that. >> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> >> --- >> drivers/gpu/drm/i915/gt/intel_gt_gmch.c | 2 ++ >> drivers/gpu/drm/i915/gt/intel_gtt.h | 9 +++++++++ >> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++- >> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++++-- >> 4 files changed, 21 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >> b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >> index 18e488672d1b..2260ed576776 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >> +++ b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >> @@ -629,6 +629,8 @@ int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt) >> if (intel_vm_no_concurrent_access_wa(i915)) { >> ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL; >> ggtt->vm.insert_page = bxt_vtd_ggtt_insert_page__BKL; >> + ggtt->vm.raw_insert_page = gen8_ggtt_insert_page; >> + ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries; >> ggtt->vm.bind_async_flags = >> I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; >> } >> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h >> b/drivers/gpu/drm/i915/gt/intel_gtt.h >> index a40d928b3888..f9a1f3d17272 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h >> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h >> @@ -306,6 +306,15 @@ struct i915_address_space { >> struct i915_vma_resource *vma_res, >> enum i915_cache_level cache_level, >> u32 flags); >> + void (*raw_insert_page)(struct i915_address_space *vm, >> + dma_addr_t addr, >> + u64 offset, >> + enum i915_cache_level cache_level, >> + u32 flags); >> + void (*raw_insert_entries)(struct i915_address_space *vm, >> + struct i915_vma_resource *vma_res, >> + enum i915_cache_level cache_level, >> + u32 flags); >> void (*cleanup)(struct i915_address_space *vm); >> void (*foreach)(struct i915_address_space *vm, >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >> index d2c5c9367cc4..c06e83872c34 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >> @@ -493,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw >> *uc_fw) >> if (i915_gem_object_is_lmem(obj)) >> pte_flags |= PTE_LM; >> - ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, >> pte_flags); >> + if (ggtt->vm.raw_insert_entries) >> + ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, >> I915_CACHE_NONE, pte_flags); >> + else >> + ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, >> pte_flags); >> } >> static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw) >> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c >> b/drivers/gpu/drm/i915/i915_gpu_error.c >> index bff8a111424a..f9b1969ed7ed 100644 >> --- a/drivers/gpu/drm/i915/i915_gpu_error.c >> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c >> @@ -1104,8 +1104,12 @@ i915_vma_coredump_create(const struct intel_gt >> *gt, >> for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { >> mutex_lock(&ggtt->error_mutex); >> - ggtt->vm.insert_page(&ggtt->vm, dma, slot, >> - I915_CACHE_NONE, 0); >> + if (ggtt->vm.raw_insert_page) >> + ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot, >> + I915_CACHE_NONE, 0); >> + else >> + ggtt->vm.insert_page(&ggtt->vm, dma, slot, >> + I915_CACHE_NONE, 0); >> mb(); >> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
Hi G.G. On 6/23/2022 5:31 PM, Gwan-gyeong Mun wrote: > Commit message and code changes look good to me. > > Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> > > This is a question about the issue mentioned in this patch, not the > patch. The tests > (igt@gem_ctx_persistence@legacy-engines-hostile@render) mentioned in > this issue ( https://gitlab.freedesktop.org/drm/intel/-/issues/5595 ) > are dealing with a test that causes gpu reset / hang? Or did the reset > happen during the test? If the test resets the GPU then APL should hit this. From a quick look into the igt, the test seems to me can trigger reset. The issue also mentions other tests like igt@i915_hangman@engine-error-state-capture which might sounds more reasonable. Regards, Nirmoy > > br, > G.G. > > On 6/17/22 4:21 PM, Das, Nirmoy wrote: >> Missed the fdo issue ref: >> >> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5595 >> >> On 6/17/2022 1:55 PM, Nirmoy Das wrote: >>> For some platfroms we use stop_machine version of >>> gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a >>> concurrent GGTT access bug but this causes a circular locking >>> dependency warning: >>> >>> Possible unsafe locking scenario: >>> CPU0 CPU1 >>> ---- ---- >>> lock(&ggtt->error_mutex); >>> lock(dma_fence_map); >>> lock(&ggtt->error_mutex); >>> lock(cpu_hotplug_lock); >>> >>> Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries >>> directly at error capture which is concurrent GGTT access safe because >>> reset path make sure of that. >>> >>> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> >>> --- >>> drivers/gpu/drm/i915/gt/intel_gt_gmch.c | 2 ++ >>> drivers/gpu/drm/i915/gt/intel_gtt.h | 9 +++++++++ >>> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++- >>> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++++-- >>> 4 files changed, 21 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >>> b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >>> index 18e488672d1b..2260ed576776 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c >>> @@ -629,6 +629,8 @@ int intel_gt_gmch_gen8_probe(struct i915_ggtt >>> *ggtt) >>> if (intel_vm_no_concurrent_access_wa(i915)) { >>> ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL; >>> ggtt->vm.insert_page = bxt_vtd_ggtt_insert_page__BKL; >>> + ggtt->vm.raw_insert_page = gen8_ggtt_insert_page; >>> + ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries; >>> ggtt->vm.bind_async_flags = >>> I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; >>> } >>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h >>> b/drivers/gpu/drm/i915/gt/intel_gtt.h >>> index a40d928b3888..f9a1f3d17272 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h >>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h >>> @@ -306,6 +306,15 @@ struct i915_address_space { >>> struct i915_vma_resource *vma_res, >>> enum i915_cache_level cache_level, >>> u32 flags); >>> + void (*raw_insert_page)(struct i915_address_space *vm, >>> + dma_addr_t addr, >>> + u64 offset, >>> + enum i915_cache_level cache_level, >>> + u32 flags); >>> + void (*raw_insert_entries)(struct i915_address_space *vm, >>> + struct i915_vma_resource *vma_res, >>> + enum i915_cache_level cache_level, >>> + u32 flags); >>> void (*cleanup)(struct i915_address_space *vm); >>> void (*foreach)(struct i915_address_space *vm, >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >>> index d2c5c9367cc4..c06e83872c34 100644 >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c >>> @@ -493,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw >>> *uc_fw) >>> if (i915_gem_object_is_lmem(obj)) >>> pte_flags |= PTE_LM; >>> - ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, >>> pte_flags); >>> + if (ggtt->vm.raw_insert_entries) >>> + ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, >>> I915_CACHE_NONE, pte_flags); >>> + else >>> + ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, >>> pte_flags); >>> } >>> static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw) >>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c >>> b/drivers/gpu/drm/i915/i915_gpu_error.c >>> index bff8a111424a..f9b1969ed7ed 100644 >>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c >>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c >>> @@ -1104,8 +1104,12 @@ i915_vma_coredump_create(const struct >>> intel_gt *gt, >>> for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { >>> mutex_lock(&ggtt->error_mutex); >>> - ggtt->vm.insert_page(&ggtt->vm, dma, slot, >>> - I915_CACHE_NONE, 0); >>> + if (ggtt->vm.raw_insert_page) >>> + ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot, >>> + I915_CACHE_NONE, 0); >>> + else >>> + ggtt->vm.insert_page(&ggtt->vm, dma, slot, >>> + I915_CACHE_NONE, 0); >>> mb(); >>> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c index 18e488672d1b..2260ed576776 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c @@ -629,6 +629,8 @@ int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt) if (intel_vm_no_concurrent_access_wa(i915)) { ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL; ggtt->vm.insert_page = bxt_vtd_ggtt_insert_page__BKL; + ggtt->vm.raw_insert_page = gen8_ggtt_insert_page; + ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries; ggtt->vm.bind_async_flags = I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index a40d928b3888..f9a1f3d17272 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -306,6 +306,15 @@ struct i915_address_space { struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); + void (*raw_insert_page)(struct i915_address_space *vm, + dma_addr_t addr, + u64 offset, + enum i915_cache_level cache_level, + u32 flags); + void (*raw_insert_entries)(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + enum i915_cache_level cache_level, + u32 flags); void (*cleanup)(struct i915_address_space *vm); void (*foreach)(struct i915_address_space *vm, diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index d2c5c9367cc4..c06e83872c34 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -493,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw) if (i915_gem_object_is_lmem(obj)) pte_flags |= PTE_LM; - ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); + if (ggtt->vm.raw_insert_entries) + ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); + else + ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags); } static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index bff8a111424a..f9b1969ed7ed 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1104,8 +1104,12 @@ i915_vma_coredump_create(const struct intel_gt *gt, for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { mutex_lock(&ggtt->error_mutex); - ggtt->vm.insert_page(&ggtt->vm, dma, slot, - I915_CACHE_NONE, 0); + if (ggtt->vm.raw_insert_page) + ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot, + I915_CACHE_NONE, 0); + else + ggtt->vm.insert_page(&ggtt->vm, dma, slot, + I915_CACHE_NONE, 0); mb(); s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
For some platfroms we use stop_machine version of gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a concurrent GGTT access bug but this causes a circular locking dependency warning: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ggtt->error_mutex); lock(dma_fence_map); lock(&ggtt->error_mutex); lock(cpu_hotplug_lock); Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries directly at error capture which is concurrent GGTT access safe because reset path make sure of that. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> --- drivers/gpu/drm/i915/gt/intel_gt_gmch.c | 2 ++ drivers/gpu/drm/i915/gt/intel_gtt.h | 9 +++++++++ drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++- drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++++-- 4 files changed, 21 insertions(+), 3 deletions(-)