[v3,0/5] Fix some race conditions that exists between fbmem and sysfb

Message ID 20220420085303.100654-1-javierm@redhat.com (mailing list archive)

Javier Martinez Canillas April 20, 2022, 8:52 a.m. UTC
Hello,

The patches in this series are mostly changes suggested by Daniel Vetter
to fix some race conditions that exist between the fbdev core (fbmem)
and sysfb with regard to device registration and removal.

For example, it is currently possible for sysfb to register a platform
device after a real DRM driver was registered and requested to remove the
conflicting framebuffers.

A symptom of this issue was worked around by commit fb561bf9abde
("fbdev: Prevent probing generic drivers if a FB is already registered"),
but that's really a hack and should be reverted.

This series attempts to fix the issue properly and reverts the mentioned
hack. That will also unblock a pending patch that stops exposing the
num_registered_fb variable to drivers, since it is internal to the fbdev
core.

Patch #1 is just a trivial preparatory change.

Patch #2 adds the sysfb_disable() and sysfb_try_unregister() helpers for
fbmem to use.

Patch #3 changes how unregistering conflicting framebuffers is handled:
rather than having a variable to determine whether a lock should be taken,
it just drops the lock before unregistering the platform device.
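
As a rough illustration of the "drop the lock, then restart the loop"
idea described for patch #3, here is a minimal single-threaded userspace
sketch. It is not the code from the patch: struct fb_entry, reg_lock(),
reg_unlock(), unregister_device() and remove_conflicting() are all
hypothetical names, and the lock is modeled as a plain flag whose
assertions would fire if the unregister path were entered with the lock
still held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FB 4

/* Hypothetical stand-in for a registered framebuffer. */
struct fb_entry {
	bool conflicting;	/* overlaps the aperture being claimed */
	bool has_device;	/* backed by a device that must be unregistered */
};

static struct fb_entry *registered[MAX_FB];
static bool lock_held;

/* Toy non-recursive lock: taking it twice trips the assertion. */
static void reg_lock(void)   { assert(!lock_held); lock_held = true; }
static void reg_unlock(void) { assert(lock_held);  lock_held = false; }

/* Stand-in for device removal, which takes the lock on its own. */
static void unregister_device(struct fb_entry *fb)
{
	reg_lock();
	for (size_t i = 0; i < MAX_FB; i++)
		if (registered[i] == fb)
			registered[i] = NULL;
	reg_unlock();
}

/* Removes every conflicting entry and returns how many were removed. */
static int remove_conflicting(void)
{
	int removed = 0;

restart:
	reg_lock();
	for (size_t i = 0; i < MAX_FB; i++) {
		struct fb_entry *fb = registered[i];

		if (!fb || !fb->conflicting)
			continue;

		if (fb->has_device) {
			/*
			 * Drop the lock before unregistering. The array may
			 * have changed in the meantime, so scan it again
			 * from the start instead of trusting the index.
			 */
			reg_unlock();
			unregister_device(fb);
			removed++;
			goto restart;
		}

		registered[i] = NULL;
		removed++;
	}
	reg_unlock();

	return removed;
}
```

The restart costs a rescan per unregistered device, but it avoids
carrying a "lock already held" variable through the removal path.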

Patch #4 fixes the mentioned race conditions, and finally patch #5 is the
revert patch that Daniel posted before but dropped from his set.

The patches were tested on a rpi4 using different video configurations:
(simpledrm -> vc4 both builtin, only vc4 builtin, only simpledrm builtin
and simpledrm builtin with vc4 built as a module).

Best regards,
Javier

Changes in v3:
- Rebase on top of latest drm-misc-next branch.

Changes in v2:
- Rebase on top of latest drm-misc-next and fix conflicts (Daniel Vetter).
- Add kernel-doc comments and include in other_interfaces.rst (Daniel Vetter).
- Explain in the commit message that fbmem has to unregister the device
  as fallback if a driver registered the device itself (Daniel Vetter).
- Also explain that fallback in a comment in the code (Daniel Vetter).
- Don't encode in fbmem the assumption that sysfb will always register
  platform devices (Daniel Vetter).
- Add a FIXME comment about drivers registering devices (Daniel Vetter).
- Drop RFC prefix since patches were already reviewed by Daniel Vetter.
- Add Daniel's Reviewed-by tags to the patches.

Daniel Vetter (1):
  Revert "fbdev: Prevent probing generic drivers if a FB is already
    registered"

Javier Martinez Canillas (4):
  firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
  firmware: sysfb: Add helpers to unregister a pdev and disable
    registration
  fbdev: Restart conflicting fb removal loop when unregistering devices
  fbdev: Fix some race conditions between fbmem and sysfb

 .../driver-api/firmware/other_interfaces.rst  |  6 ++
 drivers/firmware/sysfb.c                      | 77 +++++++++++++++++--
 drivers/firmware/sysfb_simplefb.c             | 16 ++--
 drivers/video/fbdev/core/fbmem.c              | 62 ++++++++++++---
 drivers/video/fbdev/efifb.c                   | 11 ---
 drivers/video/fbdev/simplefb.c                | 11 ---
 include/linux/fb.h                            |  1 -
 include/linux/sysfb.h                         | 29 +++++--
 8 files changed, 158 insertions(+), 55 deletions(-)

Comments

Greg KH April 22, 2022, 3:17 p.m. UTC | #1
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Zimmermann April 25, 2022, 8:54 a.m. UTC | #2
Hi

Am 20.04.22 um 10:52 schrieb Javier Martinez Canillas:
> Hello,
> 
> The patches in this series are mostly changes suggested by Daniel Vetter
> to fix some race conditions that exists between the fbdev core (fbmem)
> and sysfb with regard to device registration and removal.
> 
> For example, it is currently possible for sysfb to register a platform
> device after a real DRM driver was registered and requested to remove the
> conflicting framebuffers.
> 
> A symptom of this issue, was worked around with by commit fb561bf9abde
> ("fbdev: Prevent probing generic drivers if a FB is already registered")
> but that's really a hack and should be reverted.

As I mentioned on IRC, I think this series should be merged for the 
reasons I give in the other comments.

> 
> This series attempt to fix it more properly and revert the mentioned hack.
> That will also unblock a pending patch to not make the num_registered_fb
> variable visible to drivers anymore, since that's internal to fbdev core.

Here's the problem as far as I understand it:

  1) build DRM/fbdev and sysfb code into the kernel
  2) during boot, load the DRM/fbdev modules and have them acquire I/O
     ranges
  3) afterwards load sysfb and have it register platform devices for the
     generic framebuffers
  4) these devices now conflict with the already-registered DRM/fbdev
     devices

If that is the problem here, let's simply set a sysfb_disable flag in
sysfb code when the first DRM/fbdev driver loads. With the flag set,
sysfb won't create any platform devices. We assume that there are now
DRM/fbdev drivers for the framebuffers and sysfb won't be needed.

We can set the flag internally from drm_aperture_detach_drivers() [1]
and do_remove_conflicting_framebuffers() [2].
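
To make the flag idea concrete, here is a small userspace sketch. It is
only an assumption of how such a flag could gate registration, not the
sysfb code: struct pdev, sysfb_register_sketch() and
sysfb_disable_sketch() are hypothetical names, and a real implementation
would need a lock around the flag and the registration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a platform device. */
struct pdev {
	const char *name;
};

static struct pdev generic_fb = { "simple-framebuffer" };
static struct pdev *registered_pdev;
static bool disabled;

/* Registers the generic framebuffer device unless sysfb was disabled. */
static struct pdev *sysfb_register_sketch(void)
{
	if (disabled)
		return NULL;

	registered_pdev = &generic_fb;
	return registered_pdev;
}

/* Called on behalf of a native DRM/fbdev driver; safe to call twice. */
static void sysfb_disable_sketch(void)
{
	disabled = true;
}
```

Once sysfb_disable_sketch() has run, a late sysfb_register_sketch() call
returns NULL, so no generic device can appear after a native driver has
taken over.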

Best regards
Thomas

[1] https://elixir.bootlin.com/linux/v5.17.4/source/drivers/gpu/drm/drm_aperture.c#L253
[2] https://elixir.bootlin.com/linux/v5.17.4/source/drivers/video/fbdev/core/fbmem.c#L1559

Thomas Zimmermann April 25, 2022, 9:15 a.m. UTC | #3
Hi

Am 25.04.22 um 10:54 schrieb Thomas Zimmermann:
> Hi
> 
> Am 20.04.22 um 10:52 schrieb Javier Martinez Canillas:
>> Hello,
>>
>> The patches in this series are mostly changes suggested by Daniel Vetter
>> to fix some race conditions that exists between the fbdev core (fbmem)
>> and sysfb with regard to device registration and removal.
>>
>> For example, it is currently possible for sysfb to register a platform
>> device after a real DRM driver was registered and requested to remove the
>> conflicting framebuffers.
>>
>> A symptom of this issue, was worked around with by commit fb561bf9abde
>> ("fbdev: Prevent probing generic drivers if a FB is already registered")
>> but that's really a hack and should be reverted.
> 
> As I mentioned on IRC, I think this series should be merged for the 
> reasons I give in the other comments.
> 
>>
>> This series attempt to fix it more properly and revert the mentioned 
>> hack.
>> That will also unblock a pending patch to not make the num_registered_fb
>> variable visible to drivers anymore, since that's internal to fbdev core.
> 
> Here's as far as I understand the problem:
> 
>   1) build DRM/fbdev and sysfb code into the kernel
>   2) during boot, load the DRM/fbdev modules and have them acquire I/O 
> ranges
>   3) afterwards load sysfb and have it register platform devices for the 
> generic framebuffers
>   4) these devices now conflict with the already-registered DRM/fbdev 
> devices
> 
> If that is the problem here, let's simply set a sysfb_disable flag in 
> sysfb code when the first DRM/fbdev driver first loads. With the flag 
> set, sysfb won't create any platform devices. We assume that there are 
> now DRM/fbdev drivers for the framebuffers and sysfb won't be needed.
> 
> We can set the flag internally from drm_aperture_detach_drivers() [1] 
> and do_remove_conflicting_framebuffers() [2].

Thinking about it further, it would be better to set such a flag after
successfully registering a DRM/fbdev device, so we know that there's at
least one working display in the system. We don't have to rely on
generic framebuffers after that.

Best regards
Thomas

Javier Martinez Canillas April 25, 2022, 9:49 a.m. UTC | #4
Hello Thomas,

Thanks for the feedback. It was very useful.

On 4/25/22 11:15, Thomas Zimmermann wrote:
> Hi
> 
> Am 25.04.22 um 10:54 schrieb Thomas Zimmermann:
>> Hi
>>
>> Am 20.04.22 um 10:52 schrieb Javier Martinez Canillas:
>>> Hello,
>>>
>>> The patches in this series are mostly changes suggested by Daniel Vetter
>>> to fix some race conditions that exists between the fbdev core (fbmem)
>>> and sysfb with regard to device registration and removal.
>>>
>>> For example, it is currently possible for sysfb to register a platform
>>> device after a real DRM driver was registered and requested to remove the
>>> conflicting framebuffers.
>>>
>>> A symptom of this issue, was worked around with by commit fb561bf9abde
>>> ("fbdev: Prevent probing generic drivers if a FB is already registered")
>>> but that's really a hack and should be reverted.
>>
>> As I mentioned on IRC, I think this series should be merged for the 
>> reasons I give in the other comments.
>>

You meant that it should *not* get merged, as we discussed over IRC.

>>>
>>> This series attempt to fix it more properly and revert the mentioned 
>>> hack.
>>> That will also unblock a pending patch to not make the num_registered_fb
>>> variable visible to drivers anymore, since that's internal to fbdev core.
>>
>> Here's as far as I understand the problem:
>>
>>   1) build DRM/fbdev and sysfb code into the kernel
>>   2) during boot, load the DRM/fbdev modules and have them acquire I/O 
>> ranges
>>   3) afterwards load sysfb and have it register platform devices for the 
>> generic framebuffers
>>   4) these devices now conflict with the already-registered DRM/fbdev 
>> devices
>>

That's correct, yes.

>> If that is the problem here, let's simply set a sysfb_disable flag in 
>> sysfb code when the first DRM/fbdev driver first loads. With the flag 
>> set, sysfb won't create any platform devices. We assume that there are 
>> now DRM/fbdev drivers for the framebuffers and sysfb won't be needed.
>>
>> We can set the flag internally from drm_aperture_detach_drivers() [1] 
>> and do_remove_conflicting_framebuffers() [2].
> 
> And further thinking about it, it would be better to set such a flag 
> after successfully registering a DRM/fbdev device.  So we know that 
> there's at least one working display in the system. We don't have to 
> rely on generic framebuffers after that.
>

Exactly, it should be done when the device is registered rather than when
the driver is registered or a call is made to remove the conflicting FB.

I'll rework this series with only the bits for sysfb_disable() and drop
the rest. We can go back to the discussion of the remaining parts later
if that makes sense (I still think that patch 3/5 is a better approach,
but let's defer that to a different series).
Daniel Vetter April 29, 2022, 7:47 a.m. UTC | #5
On Mon, Apr 25, 2022 at 11:49:13AM +0200, Javier Martinez Canillas wrote:
> Hello Thomas,
> 
> Thanks for the feedback. It was very useful.
> 
> On 4/25/22 11:15, Thomas Zimmermann wrote:
> > Hi
> > 
> > Am 25.04.22 um 10:54 schrieb Thomas Zimmermann:
> >> Hi
> >>
> >> Am 20.04.22 um 10:52 schrieb Javier Martinez Canillas:
> >>> Hello,
> >>>
> >>> The patches in this series are mostly changes suggested by Daniel Vetter
> >>> to fix some race conditions that exists between the fbdev core (fbmem)
> >>> and sysfb with regard to device registration and removal.
> >>>
> >>> For example, it is currently possible for sysfb to register a platform
> >>> device after a real DRM driver was registered and requested to remove the
> >>> conflicting framebuffers.
> >>>
> >>> A symptom of this issue, was worked around with by commit fb561bf9abde
> >>> ("fbdev: Prevent probing generic drivers if a FB is already registered")
> >>> but that's really a hack and should be reverted.
> >>
> >> As I mentioned on IRC, I think this series should be merged for the 
> >> reasons I give in the other comments.
> >>
> 
> You meant that should *not* get merged, as we discussed over IRC.
> 
> >>>
> >>> This series attempt to fix it more properly and revert the mentioned 
> >>> hack.
> >>> That will also unblock a pending patch to not make the num_registered_fb
> >>> variable visible to drivers anymore, since that's internal to fbdev core.
> >>
> >> Here's as far as I understand the problem:
> >>
> >>   1) build DRM/fbdev and sysfb code into the kernel
> >>   2) during boot, load the DRM/fbdev modules and have them acquire I/O 
> >> ranges
> >>   3) afterwards load sysfb and have it register platform devices for the 
> >> generic framebuffers
> >>   4) these devices now conflict with the already-registered DRM/fbdev 
> >> devices
> >>
> 
> That's correct, yes.
> 
> >> If that is the problem here, let's simply set a sysfb_disable flag in 
> >> sysfb code when the first DRM/fbdev driver first loads. With the flag 
> >> set, sysfb won't create any platform devices. We assume that there are 
> >> now DRM/fbdev drivers for the framebuffers and sysfb won't be needed.
> >>
> >> We can set the flag internally from drm_aperture_detach_drivers() [1] 
> >> and do_remove_conflicting_framebuffers() [2].
> > 
> > And further thinking about it, it would be better to set such a flag 
> > after successfully registering a DRM/fbdev device.  So we know that 
> > there's at least one working display in the system. We don't have to 
> > rely on generic framebuffers after that.
> >
> 
> Exactly, should be done when the device is registered rather than when
> the driver is registered or a call is made to remove the conflicting FB.
> 
> I'll rework this series with only the bits for sysfb_disable() and drop
> the rest. We can go back to the discussion of the remaining parts later
> if that makes sense (I still think that patch 3/5 is a better approach,
> but let's defer that for a different series).

We need to kill sysfb _before_ the driver loads, otherwise you can have
two drivers fighting each other. And yes, that means you might end up
with a black screen if the driver load goes wrong, but two drivers
fighting each other can also result in black screens. And the latter
isn't fixable any other way (in general at least) than by making sure
the fw stuff is gone before driver load starts in earnest.
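
A toy model of that ordering, with hypothetical names (struct display,
fw_fb_claim(), native_probe()) and no relation to the real probe paths:
the firmware framebuffer is removed and blocked first, and only then is
the hardware touched, so there is never a moment with two owners.

```c
#include <stdbool.h>

enum owner { OWNER_NONE, OWNER_FIRMWARE_FB, OWNER_NATIVE };

/* Hypothetical model of one display and who currently drives it. */
struct display {
	enum owner owner;
	bool fw_fb_allowed;	/* may a firmware framebuffer still bind? */
};

/* The generic firmware framebuffer tries to take the display. */
static bool fw_fb_claim(struct display *d)
{
	if (!d->fw_fb_allowed || d->owner != OWNER_NONE)
		return false;

	d->owner = OWNER_FIRMWARE_FB;
	return true;
}

/* Native driver probe; hw_init_ok simulates whether hardware init works. */
static bool native_probe(struct display *d, bool hw_init_ok)
{
	/* Step 1: remove the firmware framebuffer and block late arrivals. */
	d->fw_fb_allowed = false;
	if (d->owner == OWNER_FIRMWARE_FB)
		d->owner = OWNER_NONE;

	/* Step 2: only now start touching the hardware. */
	if (!hw_init_ok)
		return false;	/* black screen, but no two drivers fighting */

	d->owner = OWNER_NATIVE;
	return true;
}
```

A failed probe leaves the display without an owner, which is the black
screen trade-off mentioned above.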
-Daniel
Javier Martinez Canillas April 29, 2022, 8:06 a.m. UTC | #6
Hello Daniel,

On 4/29/22 09:47, Daniel Vetter wrote:

[snip]

>>
>> Exactly, should be done when the device is registered rather than when
>> the driver is registered or a call is made to remove the conflicting FB.
>>
>> I'll rework this series with only the bits for sysfb_disable() and drop
>> the rest. We can go back to the discussion of the remaining parts later
>> if that makes sense (I still think that patch 3/5 is a better approach,
>> but let's defer that for a different series).
> 
> We need to kill sysfb _before_ the driver loads, otherwise you can have
> two drivers fighting over each another. And yes that means you might end
> up with black screen if the driver load goes wrong, but the two drivers
> fighting over each another can also result in black screens. And the
> latter isn't fixable any other way (in general at least) than by making
> sure the fw stuff is gone before driver load starts in earnest.

Yes, you are correct. I didn't realize all the possible cases when I
agreed with Thomas about doing this, but I tried it and found that it's
not enough.

I have a full patch set now and will post it as an RFC so we can discuss
it further.

> -Daniel