Message ID | 7b9dbbbb1e6a3aa6d7a4d9367d44d18ddd947158.1725269643.git.tjakobi@math.uni-bielefeld.de |
---|---|
State | New, archived |
Series | drm/amd: fix VRR race condition during IRQ handling |
On Mon, 2024-09-02 at 11:40 +0200, tjakobi@math.uni-bielefeld.de wrote:
> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>
> dc_state_destruct() nulls the resource context of the DC state. The pipe
> context passed to dcn10_set_drr() is a member of this resource context.
>
> If dc_state_destruct() is called parallel to the IRQ processing (which
> calls dcn10_set_drr() at some point), we can end up using already nulled
> function callback fields of struct stream_resource.
>
> The logic in dcn10_set_drr() already tries to avoid this, by checking tg
> against NULL. But if the nulling happens exactly after the NULL check and
> before the next access, then we get a race.
>
> Avoid this by copying tg first to a local variable, and then use this
> variable for all the operations. This should work, as long as nobody
> frees the resource pool where the timing generators live.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142
> Fixes: 06ad7e164256 ("drm/amd/display: Destroy DC context while keeping DML and DML2")
> Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
> ---
>  .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> [...]

This fixes the panics with my RX 6800 XT on Sway with VRR enabled!

Tested-by: Sefa Eyeoglu <contact@scrumplex.net>
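The interleaving described in the commit message can be illustrated deterministically in userspace C. The sketch below is a single-threaded simulation, not driver code: the structures are hypothetical, heavily simplified stand-ins for the DC types, and the hypothetical `load_tg()` hook clears the field right after its first load to play the role of `dc_state_destruct()` running on another CPU. `set_drr_reload()` mimics the pre-fix shape, which reloads `stream_res.tg` for every access, while `set_drr_snapshot()` mimics the patched shape, which loads it once into a local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the DC structures. */
struct timing_generator;

struct tg_funcs {
	void (*set_drr)(struct timing_generator *tg);
};

struct timing_generator {
	const struct tg_funcs *funcs;
	int drr_calls; /* demo counter, not a real field */
};

struct stream_resource {
	struct timing_generator *tg;
};

/* Loads of res->tg since the last reset_race(). */
static int tg_loads;

/* Re-arm the simulation: restore the pointer and reset the load counter. */
static void reset_race(struct stream_resource *res,
		       struct timing_generator *tg)
{
	res->tg = tg;
	tg_loads = 0;
}

/* Every read of the shared field goes through this hook. Right after the
 * first load it clears the field, standing in for dc_state_destruct()
 * running on another CPU just after the NULL check. */
static struct timing_generator *load_tg(struct stream_resource *res)
{
	struct timing_generator *tg;

	assert(res != NULL);
	tg = res->tg;
	if (++tg_loads == 1)
		res->tg = NULL;
	return tg;
}

static void count_drr(struct timing_generator *tg)
{
	tg->drr_calls++;
}

static const struct tg_funcs demo_funcs = { .set_drr = count_drr };

/* Pre-fix shape: every access reloads res->tg. Returns -1 at the point
 * where the real code would dereference a NULL pointer. */
static int set_drr_reload(struct stream_resource *res)
{
	if (load_tg(res) != NULL) {
		struct timing_generator *tg = load_tg(res); /* reload for ->funcs */

		if (tg == NULL)
			return -1;
		if (tg->funcs && tg->funcs->set_drr)
			tg->funcs->set_drr(tg);
	}
	return 0;
}

/* Post-fix shape: load res->tg once, then use only the local copy. */
static int set_drr_snapshot(struct stream_resource *res)
{
	struct timing_generator *tg = load_tg(res);

	if (tg != NULL && tg->funcs && tg->funcs->set_drr)
		tg->funcs->set_drr(tg);
	return 0;
}
```

With the simulated race armed via `reset_race()`, the reload variant sees NULL on its second load (returning -1 where the real code would crash), while the snapshot variant still calls `set_drr` exactly once even though the shared field has been cleared. Two caveats, consistent with the commit message: the snapshot only guards against the pointer being cleared, not against the `timing_generator` object itself being freed; and in genuinely concurrent C a plain load is not formally guaranteed to be performed exactly once, so a stricter variant could read the field with the kernel's `READ_ONCE()` (or `atomic_load()` in C11 code).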
> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>
> [...]

This fixes full system freezes when taking screenshots at low framerates with VRR enabled on an RX 7900 XTX.

Tested-by: Raoul van Rüschen <raoul.van.rueschen@gmail.com>
On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>
> [...]

This fixes hard-to-trace panics with labwc VRR and Wayfire on an RX 6700 XT. I had to use netconsole to arrive at the original bug report.

Tested-by: Christopher Snowhill <chris@kode54.net>
On 2024-09-02 05:40, tjakobi@math.uni-bielefeld.de wrote:
> From: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>
> [...]
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3142
> Fixes: 06ad7e164256 ("drm/amd/display: Destroy DC context while keeping DML and DML2")
> Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>

Thanks for this fix. It also makes the code more readable.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>

Harry
```diff
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 3306684e805a..da8f2cb3c5db 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3223,15 +3223,19 @@ void dcn10_set_drr(struct pipe_ctx **pipe_ctx,
 	 * as well.
 	 */
 	for (i = 0; i < num_pipes; i++) {
-		if ((pipe_ctx[i]->stream_res.tg != NULL) && pipe_ctx[i]->stream_res.tg->funcs) {
-			if (pipe_ctx[i]->stream_res.tg->funcs->set_drr)
-				pipe_ctx[i]->stream_res.tg->funcs->set_drr(
-					pipe_ctx[i]->stream_res.tg, &params);
+		/* dc_state_destruct() might null the stream resources, so fetch tg
+		 * here first to avoid a race condition. The lifetime of the pointee
+		 * itself (the timing_generator object) is not a problem here.
+		 */
+		struct timing_generator *tg = pipe_ctx[i]->stream_res.tg;
+
+		if ((tg != NULL) && tg->funcs) {
+			if (tg->funcs->set_drr)
+				tg->funcs->set_drr(tg, &params);
 			if (adjust.v_total_max != 0 && adjust.v_total_min != 0)
-				if (pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control)
-					pipe_ctx[i]->stream_res.tg->funcs->set_static_screen_control(
-						pipe_ctx[i]->stream_res.tg,
-						event_triggers, num_frames);
+				if (tg->funcs->set_static_screen_control)
+					tg->funcs->set_static_screen_control(
+						tg, event_triggers, num_frames);
 		}
 	}
 }
```
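The patched loop can also be exercised outside the driver. The following is a minimal sketch under stated assumptions: `set_drr_for_pipes()` is a hypothetical reduction of `dcn10_set_drr()`, the types are simplified stand-ins that keep only the fields visible in the diff (plus hypothetical `drr_params` fields and demo counters), and `event_triggers`/`num_frames` are passed in as parameters because their real values are not shown in the hunk. It shows the fixed control flow: one load of `stream_res.tg` per pipe, a skip when `tg` or `tg->funcs` is NULL, optional callbacks, and static screen control only when both `v_total_min` and `v_total_max` are non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DC types. Only the fields
 * visible in the diff are mirrored, plus demo counters. */
struct drr_params {
	unsigned int vertical_total_min; /* hypothetical field names */
	unsigned int vertical_total_max;
};

struct timing_generator;

struct timing_generator_funcs {
	void (*set_drr)(struct timing_generator *tg,
			const struct drr_params *params);
	void (*set_static_screen_control)(struct timing_generator *tg,
					  unsigned int event_triggers,
					  unsigned int num_frames);
};

struct timing_generator {
	const struct timing_generator_funcs *funcs;
	int drr_calls; /* demo counters, not real fields */
	int ssc_calls;
};

struct pipe_ctx {
	struct {
		struct timing_generator *tg;
	} stream_res;
};

struct dc_crtc_timing_adjust {
	unsigned int v_total_min;
	unsigned int v_total_max;
};

/* Hypothetical reduction of the patched dcn10_set_drr() loop. */
static void set_drr_for_pipes(struct pipe_ctx **pipe_ctx, int num_pipes,
			      struct dc_crtc_timing_adjust adjust,
			      unsigned int event_triggers,
			      unsigned int num_frames)
{
	struct drr_params params = {
		.vertical_total_min = adjust.v_total_min,
		.vertical_total_max = adjust.v_total_max,
	};
	int i;

	assert(pipe_ctx != NULL || num_pipes == 0);

	for (i = 0; i < num_pipes; i++) {
		/* Load the shared pointer once; use only the local below. */
		struct timing_generator *tg = pipe_ctx[i]->stream_res.tg;

		if (tg == NULL || tg->funcs == NULL)
			continue;
		if (tg->funcs->set_drr)
			tg->funcs->set_drr(tg, &params);
		if (adjust.v_total_max != 0 && adjust.v_total_min != 0 &&
		    tg->funcs->set_static_screen_control)
			tg->funcs->set_static_screen_control(tg, event_triggers,
							     num_frames);
	}
}

/* Demo callbacks that only count invocations. */
static void demo_set_drr(struct timing_generator *tg,
			 const struct drr_params *params)
{
	(void)params;
	tg->drr_calls++;
}

static void demo_set_ssc(struct timing_generator *tg,
			 unsigned int event_triggers, unsigned int num_frames)
{
	(void)event_triggers;
	(void)num_frames;
	tg->ssc_calls++;
}

static const struct timing_generator_funcs demo_full = {
	demo_set_drr, demo_set_ssc
};
static const struct timing_generator_funcs demo_drr_only = {
	demo_set_drr, NULL
};
```

Called with a pipe array that mixes a fully populated timing generator, one without `set_static_screen_control`, one with NULL `funcs`, and a NULL `tg`, only the populated callbacks fire; an adjustment with zero totals still programs DRR but skips static screen control. Folding the nested `if`s into a `continue` plus one combined condition is a stylistic choice of this sketch, not what the patch itself does.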