Message ID | 1539304063-9518-1-git-send-email-sunpeng.li@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm: Get ref on CRTC commit object when waiting for flip_done | expand |
On Thu, Oct 11, 2018 at 08:27:43PM -0400, sunpeng.li@amd.com wrote: > From: Leo Li <sunpeng.li@amd.com> > > This fixes a general protection fault, caused by accessing the contents > of a flip_done completion object that has already been freed. It occurs > due to the preemption of a non-blocking commit worker thread W by > another commit thread X. X continues to clear its atomic state at the > end, destroying the CRTC commit object that W still needs. Switching > back to W and accessing the commit objects then leads to bad results. > > Worker W becomes preemptable when waiting for flip_done to complete. At > this point, a frequently occurring commit thread X can take over. Here's > an example where W is a worker thread that flips on both CRTCs, and X > does a legacy cursor update on both CRTCs: > > ... > 1. W does flip work > 2. W runs commit_hw_done() > 3. W waits for flip_done on CRTC 1 > 4. > flip_done for CRTC 1 completes > 5. W finishes waiting for CRTC 1 > 6. W waits for flip_done on CRTC 2 > > 7. > Preempted by X > 8. > flip_done for CRTC 2 completes > 9. X atomic_check: hw_done and flip_done are complete on all CRTCs > 10. X updates cursor on both CRTCs > 11. X destroys atomic state > 12. X done > > 13. > Switch back to W > 14. W waits for flip_done on CRTC 2 > 15. W raises general protection fault > > The error looks like so: > > general protection fault: 0000 [#1] PREEMPT SMP PTI > **snip** > Call Trace: > lock_acquire+0xa2/0x1b0 > _raw_spin_lock_irq+0x39/0x70 > wait_for_completion_timeout+0x31/0x130 > drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper] > amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu] > commit_tail+0x3d/0x70 [drm_kms_helper] > process_one_work+0x212/0x650 > worker_thread+0x49/0x420 > kthread+0xfb/0x130 > ret_from_fork+0x3a/0x50 > Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O) > gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt > fb_sys_fops ttm(O) drm(O) > > Note that i915 has this issue masked, since hw_done is signaled after > waiting for flip_done. Doing so will block the cursor update from > happening until hw_done is signaled, preventing the cursor commit from > destroying the state. > > Signed-off-by: Leo Li <sunpeng.li@amd.com> Great analysis! Bugfix itself needs a bit more work though, see below. > --- > drivers/gpu/drm/drm_atomic_helper.c | 30 ++++++++++++++++++++++++++---- > 1 file changed, 26 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > index 80be74d..efdf043 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, > { > struct drm_crtc_state *new_crtc_state; > struct drm_crtc *crtc; > + struct drm_crtc_commit **commits; > int i; > + int num_crtc = dev->mode_config.num_crtc; > > + commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL); > + > + /* > + * Because we open ourselves to preemption by calling > + * wait_for_completion_timeout(), we need to get our own references to > + * the commit objects. This is so that a preempting commit won't free > + * them. > + */ The scheduler can preempt you at any point before here too, if you enable CONFIG_PREEMPT. It definitely can preempt/block in the kcalloc above, even if you have CONFIG_PREEMPT disabled. The scheduling point you identified below is simply the most likely. The underlying bug is that after drm_atomic_helper_commit_hw_done() we unblock the next commit worker (thread X in your example), and cannot assume anymore that the new state will stay around (since new state is released by the subsequent commit work run in thread X). Therefore your drm_crtc_commit_get here already can access freed memory. The correct fix here is to store the temporary reference before we call drm_atomic_helper_commit_hw_done(), and then use that in this function here. The problem is that this is somewhat tricky to pull off, since some drivers call wait_for_flip_done() before hw_done() (like i915). Here's what I'd do: - Add a new drm_crtc_commit pointer to struct __drm_crtcs_state. - Fill that out in drm_atomic_helper_setup_commit, and make sure you release the reference for it in drm_atomic_state_default_clear(). - Use that pointer instead in drm_atomic_helper_wait_for_flip_done() here. Cheers, Daniel > for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) { > - struct drm_crtc_commit *commit = new_crtc_state->commit; > + commits[i] = new_crtc_state->commit; > + if (commits[i]) > + drm_crtc_commit_get(commits[i]); > + } > + > + for (i = 0; i < num_crtc; i++) { > int ret; > > - if (!commit) > + if (!commits[i]) > continue; > > - ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ); > + ret = wait_for_completion_timeout(&commits[i]->flip_done, > + 10 * HZ); > if (ret == 0) > DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", > - crtc->base.id, crtc->name); > + commits[i]->crtc->base.id, > + commits[i]->crtc->name); > } > + > + for (i = 0; i < num_crtc; i++) > + if (commits[i]) > + drm_crtc_commit_put(commits[i]); > + kfree(commits); > } > EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); > > -- > 2.7.4 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On 2018-10-12 03:34 AM, Daniel Vetter wrote: > On Thu, Oct 11, 2018 at 08:27:43PM -0400, sunpeng.li@amd.com wrote: >> From: Leo Li <sunpeng.li@amd.com> >> >> This fixes a general protection fault, caused by accessing the contents >> of a flip_done completion object that has already been freed. It occurs >> due to the preemption of a non-blocking commit worker thread W by >> another commit thread X. X continues to clear its atomic state at the >> end, destroying the CRTC commit object that W still needs. Switching >> back to W and accessing the commit objects then leads to bad results. >> >> Worker W becomes preemptable when waiting for flip_done to complete. At >> this point, a frequently occurring commit thread X can take over. Here's >> an example where W is a worker thread that flips on both CRTCs, and X >> does a legacy cursor update on both CRTCs: >> >> ... >> 1. W does flip work >> 2. W runs commit_hw_done() >> 3. W waits for flip_done on CRTC 1 >> 4. > flip_done for CRTC 1 completes >> 5. W finishes waiting for CRTC 1 >> 6. W waits for flip_done on CRTC 2 >> >> 7. > Preempted by X >> 8. > flip_done for CRTC 2 completes >> 9. X atomic_check: hw_done and flip_done are complete on all CRTCs >> 10. X updates cursor on both CRTCs >> 11. X destroys atomic state >> 12. X done >> >> 13. > Switch back to W >> 14. W waits for flip_done on CRTC 2 >> 15. W raises general protection fault >> >> The error looks like so: >> >> general protection fault: 0000 [#1] PREEMPT SMP PTI >> **snip** >> Call Trace: >> lock_acquire+0xa2/0x1b0 >> _raw_spin_lock_irq+0x39/0x70 >> wait_for_completion_timeout+0x31/0x130 >> drm_atomic_helper_wait_for_flip_done+0x64/0x90 [drm_kms_helper] >> amdgpu_dm_atomic_commit_tail+0xcae/0xdd0 [amdgpu] >> commit_tail+0x3d/0x70 [drm_kms_helper] >> process_one_work+0x212/0x650 >> worker_thread+0x49/0x420 >> kthread+0xfb/0x130 >> ret_from_fork+0x3a/0x50 >> Modules linked in: x86_pkg_temp_thermal amdgpu(O) chash(O) >> gpu_sched(O) drm_kms_helper(O) syscopyarea sysfillrect sysimgblt >> fb_sys_fops ttm(O) drm(O) >> >> Note that i915 has this issue masked, since hw_done is signaled after >> waiting for flip_done. Doing so will block the cursor update from >> happening until hw_done is signaled, preventing the cursor commit from >> destroying the state. >> >> Signed-off-by: Leo Li <sunpeng.li@amd.com> > > Great analysis! > > Bugfix itself needs a bit more work though, see below. > >> --- >> drivers/gpu/drm/drm_atomic_helper.c | 30 ++++++++++++++++++++++++++---- >> 1 file changed, 26 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c >> index 80be74d..efdf043 100644 >> --- a/drivers/gpu/drm/drm_atomic_helper.c >> +++ b/drivers/gpu/drm/drm_atomic_helper.c >> @@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, >> { >> struct drm_crtc_state *new_crtc_state; >> struct drm_crtc *crtc; >> + struct drm_crtc_commit **commits; >> int i; >> + int num_crtc = dev->mode_config.num_crtc; >> >> + commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL); >> + >> + /* >> + * Because we open ourselves to preemption by calling >> + * wait_for_completion_timeout(), we need to get our own references to >> + * the commit objects. This is so that a preempting commit won't free >> + * them. >> + */ > > The scheduler can preempt you at any point before here too, if you enable > CONFIG_PREEMPT. It definitely can preempt/block in the kcalloc above, even > if you have CONFIG_PREEMPT disabled. The scheduling point you identified > below is simply the most likely. > Ah, didn't think of this. > The underlying bug is that after drm_atomic_helper_commit_hw_done() we > unblock the next commit worker (thread X in your example), and cannot > assume anymore that the new state will stay around (since new state is > released by the subsequent commit work run in thread X). > > Therefore your drm_crtc_commit_get here already can access freed memory. > The correct fix here is to store the temporary reference before we call > drm_atomic_helper_commit_hw_done(), and then use that in this function > here. The problem is that this is somewhat tricky to pull off, since some > drivers call wait_for_flip_done() before hw_done() (like i915). > > Here's what I'd do: > - Add a new drm_crtc_commit pointer to struct __drm_crtcs_state. > > - Fill that out in drm_atomic_helper_setup_commit, and make sure you > release the reference for it in drm_atomic_state_default_clear(). > > - Use that pointer instead in drm_atomic_helper_wait_for_flip_done() here. Thanks for the pointers, patch incoming :) Leo > > Cheers, Daniel >> for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) { >> - struct drm_crtc_commit *commit = new_crtc_state->commit; >> + commits[i] = new_crtc_state->commit; >> + if (commits[i]) >> + drm_crtc_commit_get(commits[i]); >> + } >> + >> + for (i = 0; i < num_crtc; i++) { >> int ret; >> >> - if (!commit) >> + if (!commits[i]) >> continue; >> >> - ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ); >> + ret = wait_for_completion_timeout(&commits[i]->flip_done, >> + 10 * HZ); >> if (ret == 0) >> DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", >> - crtc->base.id, crtc->name); >> + commits[i]->crtc->base.id, >> + commits[i]->crtc->name); >> } >> + >> + for (i = 0; i < num_crtc; i++) >> + if (commits[i]) >> + drm_crtc_commit_put(commits[i]); >> + kfree(commits); >> } >> EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done); >> >> -- >> 2.7.4 >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 80be74d..efdf043 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1410,20 +1410,42 @@ void drm_atomic_helper_wait_for_flip_done(struct drm_device *dev, { struct drm_crtc_state *new_crtc_state; struct drm_crtc *crtc; + struct drm_crtc_commit **commits; int i; + int num_crtc = dev->mode_config.num_crtc; + commits = kcalloc(num_crtc, sizeof(*commits), GFP_KERNEL); + + /* + * Because we open ourselves to preemption by calling + * wait_for_completion_timeout(), we need to get our own references to + * the commit objects. This is so that a preempting commit won't free + * them. + */ for_each_new_crtc_in_state(old_state, crtc, new_crtc_state, i) { - struct drm_crtc_commit *commit = new_crtc_state->commit; + commits[i] = new_crtc_state->commit; + if (commits[i]) + drm_crtc_commit_get(commits[i]); + } + + for (i = 0; i < num_crtc; i++) { int ret; - if (!commit) + if (!commits[i]) continue; - ret = wait_for_completion_timeout(&commit->flip_done, 10 * HZ); + ret = wait_for_completion_timeout(&commits[i]->flip_done, + 10 * HZ); if (ret == 0) DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", - crtc->base.id, crtc->name); + commits[i]->crtc->base.id, + commits[i]->crtc->name); } + + for (i = 0; i < num_crtc; i++) + if (commits[i]) + drm_crtc_commit_put(commits[i]); + kfree(commits); } EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);