Message ID | 20181221143324.27679-1-nicholas.kazlauskas@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm: Block fb changes for async plane updates | expand |
As far as we discussed this internally looks good to me, but obviously we need to wait for some feedback from non AMD people. Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Andrey On 12/21/2018 09:33 AM, Nicholas Kazlauskas wrote: > The behavior of drm_atomic_helper_cleanup_planes differs depending on > whether the commit was an asynchronous update or not. > > For a typical (non-async) atomic commit prepare_fb is called on the > new_plane_state and cleanup_fb is called on the old_plane_state. > > However, async commits are performed in place and don't swap the state > for the plane. The call to prepare_fb happens on the new_plane_state > and the call to cleanup_fb is also called on the new_plane_state in > this case (since the state hasn't swapped). > > This behavior can lead to use-after-free or unpin of an active fb. > > Consider the following sequence of events for interleaving fbs: > > - Async update, fb1 prepare, fb1 cleanup_fb > - Async update, fb2 prepare, fb2 cleanup_fb > - Non-async update, fb1 prepare, fb2 cleanup_fb > - Async update, fb2 cleanup_fb -> double cleanup, use-after-free > > This situation has been observed in practice for a double buffered > cursor when closing an X client. The non-async update occurs because > the new_plane_state->crtc != old_plane_state->crtc which forces the > non-async path to occur. > > The simplest fix for this is to block fb updates in > drm_atomic_helper_async_check. This guarantees that the framebuffer > will have previously been prepared and any subsequent async updates > will always call prepare and cleanup_fb like the non-async atomic > commit path would. > > Cc: Michel Dänzer <michel@daenzer.net> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Cc: Harry Wentland <harry.wentland@amd.com> > Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> > --- > drivers/gpu/drm/drm_atomic_helper.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > index 54e2ae614dcc..d2f80bf14f86 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev, > return -EINVAL; > > if (!new_plane_state->crtc || > - old_plane_state->crtc != new_plane_state->crtc) > + old_plane_state->crtc != new_plane_state->crtc || > + old_plane_state->fb != new_plane_state->fb) > return -EINVAL; > > funcs = plane->helper_private;
On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote: > The behavior of drm_atomic_helper_cleanup_planes differs depending on > whether the commit was an asynchronous update or not. > > For a typical (non-async) atomic commit prepare_fb is called on the > new_plane_state and cleanup_fb is called on the old_plane_state. > > However, async commits are performed in place and don't swap the state > for the plane. The call to prepare_fb happens on the new_plane_state > and the call to cleanup_fb is also called on the new_plane_state in > this case (since the state hasn't swapped). > > This behavior can lead to use-after-free or unpin of an active fb. > > Consider the following sequence of events for interleaving fbs: > > - Async update, fb1 prepare, fb1 cleanup_fb > - Async update, fb2 prepare, fb2 cleanup_fb > - Non-async update, fb1 prepare, fb2 cleanup_fb > - Async update, fb2 cleanup_fb -> double cleanup, use-after-free I think I see your bug, but I'm completely lost in your description above. I think this is ok as a short-term gap, but imo better if it's a separate if condition with a FIXME comment. Long-term we want to fix this, and I think simplest way to do that is if we expect drivers to store the old fb in the new_plane_state (and check that with a WARN_ON like the others). I think that should work. We probably also need some locking on top, to prevent races with the cleanup_fb calls done by non-blocking commits, to make sure those clean up the right fb. -Daniel > This situation has been observed in practice for a double buffered > cursor when closing an X client. The non-async update occurs because > the new_plane_state->crtc != old_plane_state->crtc which forces the > non-async path to occur. > > The simplest fix for this is to block fb updates in > drm_atomic_helper_async_check. This guarantees that the framebuffer > will have previously been prepared and any subsequent async updates > will always call prepare and cleanup_fb like the non-async atomic > commit path would. > > Cc: Michel Dänzer <michel@daenzer.net> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Cc: Harry Wentland <harry.wentland@amd.com> > Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> > --- > drivers/gpu/drm/drm_atomic_helper.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > index 54e2ae614dcc..d2f80bf14f86 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev, > return -EINVAL; > > if (!new_plane_state->crtc || > - old_plane_state->crtc != new_plane_state->crtc) > + old_plane_state->crtc != new_plane_state->crtc || > + old_plane_state->fb != new_plane_state->fb) > return -EINVAL; > > funcs = plane->helper_private; > -- > 2.17.1 >
On 12/24/18 7:15 AM, Daniel Vetter wrote: > On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote: >> The behavior of drm_atomic_helper_cleanup_planes differs depending on >> whether the commit was an asynchronous update or not. >> >> For a typical (non-async) atomic commit prepare_fb is called on the >> new_plane_state and cleanup_fb is called on the old_plane_state. >> >> However, async commits are performed in place and don't swap the state >> for the plane. The call to prepare_fb happens on the new_plane_state >> and the call to cleanup_fb is also called on the new_plane_state in >> this case (since the state hasn't swapped). >> >> This behavior can lead to use-after-free or unpin of an active fb. >> >> Consider the following sequence of events for interleaving fbs: >> >> - Async update, fb1 prepare, fb1 cleanup_fb >> - Async update, fb2 prepare, fb2 cleanup_fb >> - Non-async update, fb1 prepare, fb2 cleanup_fb >> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free > > I think I see your bug, but I'm completely lost in your description above. > > I think this is ok as a short-term gap, but imo better if it's a separate > if condition with a FIXME comment. > > Long-term we want to fix this, and I think simplest way to do that is if > we expect drivers to store the old fb in the new_plane_state (and check > that with a WARN_ON like the others). I think that should work. > > We probably also need some locking on top, to prevent races with the > cleanup_fb calls done by non-blocking commits, to make sure those clean up > the right fb. > -Daniel Hi Daniel, The description is supposed to be saying that the wrong fb is being cleaned up because in state updates to the plane don't swap the state pointer - which is required by the cleanup planes helper function. But putting that aside, I think what you suggest here would work best for "fixing" the problem. Storing the old fb pointer in the new state looks kind of odd, but as long as it's documented and there's the WARN_ON in the helper I think it's fine. I think I would prefer this over a solution that blocks fb changes in the helper altogether. I don't think any additional drm level locking is necessary here. The race with cleanup_fb calls is already something you have to worry about when dealing with async commits even if the fb hasn't changed. Most drivers have their own internal locks or ref-counting that can handle this. So to summarize, I think I can post a new patch series that addresses this problem by fixing existing async commit usage in vc4 and amdgpu for framebuffer changes. The last patch in the series could be the one that adds a WARN_ON in the drm_atomic_helper_async_commit helper along with a comment indicating that the old fb should be in the new plane state. Does this sound reasonable to you? Nicholas Kazlauskas > >> This situation has been observed in practice for a double buffered >> cursor when closing an X client. The non-async update occurs because >> the new_plane_state->crtc != old_plane_state->crtc which forces the >> non-async path to occur. >> >> The simplest fix for this is to block fb updates in >> drm_atomic_helper_async_check. This guarantees that the framebuffer >> will have previously been prepared and any subsequent async updates >> will always call prepare and cleanup_fb like the non-async atomic >> commit path would. >> >> Cc: Michel Dänzer <michel@daenzer.net> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> >> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Cc: Harry Wentland <harry.wentland@amd.com> >> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> >> --- >> drivers/gpu/drm/drm_atomic_helper.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c >> index 54e2ae614dcc..d2f80bf14f86 100644 >> --- a/drivers/gpu/drm/drm_atomic_helper.c >> +++ b/drivers/gpu/drm/drm_atomic_helper.c >> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev, >> return -EINVAL; >> >> if (!new_plane_state->crtc || >> - old_plane_state->crtc != new_plane_state->crtc) >> + old_plane_state->crtc != new_plane_state->crtc || >> + old_plane_state->fb != new_plane_state->fb) >> return -EINVAL; >> >> funcs = plane->helper_private; >> -- >> 2.17.1 >> >
On Wed, Jan 02, 2019 at 03:02:04PM +0000, Kazlauskas, Nicholas wrote: > On 12/24/18 7:15 AM, Daniel Vetter wrote: > > On Fri, Dec 21, 2018 at 09:33:24AM -0500, Nicholas Kazlauskas wrote: > >> The behavior of drm_atomic_helper_cleanup_planes differs depending on > >> whether the commit was an asynchronous update or not. > >> > >> For a typical (non-async) atomic commit prepare_fb is called on the > >> new_plane_state and cleanup_fb is called on the old_plane_state. > >> > >> However, async commits are performed in place and don't swap the state > >> for the plane. The call to prepare_fb happens on the new_plane_state > >> and the call to cleanup_fb is also called on the new_plane_state in > >> this case (since the state hasn't swapped). > >> > >> This behavior can lead to use-after-free or unpin of an active fb. > >> > >> Consider the following sequence of events for interleaving fbs: > >> > >> - Async update, fb1 prepare, fb1 cleanup_fb > >> - Async update, fb2 prepare, fb2 cleanup_fb > >> - Non-async update, fb1 prepare, fb2 cleanup_fb > >> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free > > > > I think I see your bug, but I'm completely lost in your description above. > > > > I think this is ok as a short-term gap, but imo better if it's a separate > > if condition with a FIXME comment. > > > > Long-term we want to fix this, and I think simplest way to do that is if > > we expect drivers to store the old fb in the new_plane_state (and check > > that with a WARN_ON like the others). I think that should work. > > > > We probably also need some locking on top, to prevent races with the > > cleanup_fb calls done by non-blocking commits, to make sure those clean up > > the right fb. > > -Daniel > > Hi Daniel, > > The description is supposed to be saying that the wrong fb is being > cleaned up because in state updates to the plane don't swap the state > pointer - which is required by the cleanup planes helper function. Yeah I inferred that, but the actual commit message isn't really clear on this, and the example you've listed only confused me. We need good commit messages, so that half a year down the road we still know what exactly we've been thinking :-) Especially in an area that's really tricky, like synchronization of the nonblocking atomic updates against normal updates (and each another), which has seen plenty of bugs in the past. > But putting that aside, I think what you suggest here would work best > for "fixing" the problem. Storing the old fb pointer in the new state > looks kind of odd, but as long as it's documented and there's the > WARN_ON in the helper I think it's fine. I think I would prefer this > over a solution that blocks fb changes in the helper altogether. > > I don't think any additional drm level locking is necessary here. The > race with cleanup_fb calls is already something you have to worry about > when dealing with async commits even if the fb hasn't changed. Most > drivers have their own internal locks or ref-counting that can handle this. > > So to summarize, I think I can post a new patch series that addresses > this problem by fixing existing async commit usage in vc4 and amdgpu for > framebuffer changes. The last patch in the series could be the one that > adds a WARN_ON in the drm_atomic_helper_async_commit helper along with a > comment indicating that the old fb should be in the new plane state. > > Does this sound reasonable to you? Let's please do the the minimal bugfixe first (probably cc: stable), and figure out how to properly fix things up long-term. -Daniel > > Nicholas Kazlauskas > > > > >> This situation has been observed in practice for a double buffered > >> cursor when closing an X client. The non-async update occurs because > >> the new_plane_state->crtc != old_plane_state->crtc which forces the > >> non-async path to occur. > >> > >> The simplest fix for this is to block fb updates in > >> drm_atomic_helper_async_check. This guarantees that the framebuffer > >> will have previously been prepared and any subsequent async updates > >> will always call prepare and cleanup_fb like the non-async atomic > >> commit path would. > >> > >> Cc: Michel Dänzer <michel@daenzer.net> > >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > >> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > >> Cc: Harry Wentland <harry.wentland@amd.com> > >> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> > >> --- > >> drivers/gpu/drm/drm_atomic_helper.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c > >> index 54e2ae614dcc..d2f80bf14f86 100644 > >> --- a/drivers/gpu/drm/drm_atomic_helper.c > >> +++ b/drivers/gpu/drm/drm_atomic_helper.c > >> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev, > >> return -EINVAL; > >> > >> if (!new_plane_state->crtc || > >> - old_plane_state->crtc != new_plane_state->crtc) > >> + old_plane_state->crtc != new_plane_state->crtc || > >> + old_plane_state->fb != new_plane_state->fb) > >> return -EINVAL; > >> > >> funcs = plane->helper_private; > >> -- > >> 2.17.1 > >> > > >
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 54e2ae614dcc..d2f80bf14f86 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev, return -EINVAL; if (!new_plane_state->crtc || - old_plane_state->crtc != new_plane_state->crtc) + old_plane_state->crtc != new_plane_state->crtc || + old_plane_state->fb != new_plane_state->fb) return -EINVAL; funcs = plane->helper_private;
The behavior of drm_atomic_helper_cleanup_planes differs depending on whether the commit was an asynchronous update or not. For a typical (non-async) atomic commit prepare_fb is called on the new_plane_state and cleanup_fb is called on the old_plane_state. However, async commits are performed in place and don't swap the state for the plane. The call to prepare_fb happens on the new_plane_state and the call to cleanup_fb is also called on the new_plane_state in this case (since the state hasn't swapped). This behavior can lead to use-after-free or unpin of an active fb. Consider the following sequence of events for interleaving fbs: - Async update, fb1 prepare, fb1 cleanup_fb - Async update, fb2 prepare, fb2 cleanup_fb - Non-async update, fb1 prepare, fb2 cleanup_fb - Async update, fb2 cleanup_fb -> double cleanup, use-after-free This situation has been observed in practice for a double buffered cursor when closing an X client. The non-async update occurs because the new_plane_state->crtc != old_plane_state->crtc which forces the non-async path to occur. The simplest fix for this is to block fb updates in drm_atomic_helper_async_check. This guarantees that the framebuffer will have previously been prepared and any subsequent async updates will always call prepare and cleanup_fb like the non-async atomic commit path would. Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> --- drivers/gpu/drm/drm_atomic_helper.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)