Message ID | 20220504215722.56970-1-javierm@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fbdev: Fix use-after-free caused by wrong fb_info cleanup in drivers | expand |
Hi Am 04.05.22 um 23:57 schrieb Javier Martinez Canillas: > The driver is calling framebuffer_release() in its .remove callback, but > this will cause the struct fb_info to be freed too early. Since it could > be that a reference is still hold to it if user-space opened the fbdev. > > This would lead to a use-after-free error if the framebuffer device was > unregistered but later a user-space process tries to close the fbdev fd. > > The correct thing to do is to only unregister the framebuffer in the > driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy. > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Please see my question below. > --- > > drivers/video/fbdev/simplefb.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c > index 94fc9c6d0411..2c198561c338 100644 > --- a/drivers/video/fbdev/simplefb.c > +++ b/drivers/video/fbdev/simplefb.c > @@ -84,6 +84,10 @@ struct simplefb_par { > static void simplefb_clocks_destroy(struct simplefb_par *par); > static void simplefb_regulators_destroy(struct simplefb_par *par); > > +/* > + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end > + * of unregister_framebuffer() or fb_release(). Do any cleanup here. > + */ > static void simplefb_destroy(struct fb_info *info) > { > struct simplefb_par *par = info->par; > @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) > if (info->screen_base) > iounmap(info->screen_base); > > + framebuffer_release(info); > + > if (mem) > release_mem_region(mem->start, resource_size(mem)); The original problem with fbdev hot-unplug was that vmwgfx needed the framebuffer region to be released. If we release it only after userspace closed it's final file descriptor, vmwgfx could have already failed. I still don't fully get why this code apparently works or at least doesn't blow up occasionally. Any ideas? Best regards Thomas > } > @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev) > { > struct fb_info *info = platform_get_drvdata(pdev); > > + /* simplefb_destroy takes care of info cleanup */ > unregister_framebuffer(info); > - framebuffer_release(info); > > return 0; > }
Hello Thomas, On 5/5/22 09:29, Thomas Zimmermann wrote: [snip] >> static void simplefb_destroy(struct fb_info *info) >> { >> struct simplefb_par *par = info->par; >> @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) >> if (info->screen_base) >> iounmap(info->screen_base); >> >> + framebuffer_release(info); >> + >> if (mem) >> release_mem_region(mem->start, resource_size(mem)); > > The original problem with fbdev hot-unplug was that vmwgfx needed the > framebuffer region to be released. If we release it only after userspace > closed it's final file descriptor, vmwgfx could have already failed. > > I still don't fully get why this code apparently works or at least > doesn't blow up occasionally. Any ideas? > I believe that vmwgfx doesn't fail to probe (or any other DRM driver) only when there are not user-space processes with a fbdev node opened since otherwise as you said the memory wouldn't be released yet. unregister_framebuffer() is called from the driver's .remove handler and that decrement the fb_info refcount, so if reaches zero it will call to the fb fops .destroy() handler and release the I/O memory. In other words, in most cases (i.e: only fbcon bound to the fbdev) the driver's removal/ device unbind and the memory release will be at the same time.
Hi Am 05.05.22 um 09:38 schrieb Javier Martinez Canillas: > Hello Thomas, > > On 5/5/22 09:29, Thomas Zimmermann wrote: > > [snip] > >>> static void simplefb_destroy(struct fb_info *info) >>> { >>> struct simplefb_par *par = info->par; >>> @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) >>> if (info->screen_base) >>> iounmap(info->screen_base); >>> >>> + framebuffer_release(info); >>> + >>> if (mem) >>> release_mem_region(mem->start, resource_size(mem)); >> >> The original problem with fbdev hot-unplug was that vmwgfx needed the >> framebuffer region to be released. If we release it only after userspace >> closed it's final file descriptor, vmwgfx could have already failed. >> >> I still don't fully get why this code apparently works or at least >> doesn't blow up occasionally. Any ideas? >> > > I believe that vmwgfx doesn't fail to probe (or any other DRM driver) > only when there are not user-space processes with a fbdev node opened > since otherwise as you said the memory wouldn't be released yet. > > unregister_framebuffer() is called from the driver's .remove handler > and that decrement the fb_info refcount, so if reaches zero it will > call to the fb fops .destroy() handler and release the I/O memory. > > In other words, in most cases (i.e: only fbcon bound to the fbdev) > the driver's removal/ device unbind and the memory release will be > at the same time. > We're one the same page here, but it's still sort of a mystery to me why this works in practice. I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC this would fail if simplefb still owns the framebuffer region. Lots of systems run Plymouth during boot and this should result in failures occasionally. Still, we never heard about anything. Of course, it's always been broken (even long before real fbdev hotunplugging). Switching to simpledrm resolves the problem. Best regards Thomas [1] https://elixir.bootlin.com/linux/v5.17.5/source/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c#L727
Hello Thomas, On 5/5/22 10:05, Thomas Zimmermann wrote: [snip] >> >> In other words, in most cases (i.e: only fbcon bound to the fbdev) >> the driver's removal/ device unbind and the memory release will be >> at the same time. >> > > We're one the same page here, but it's still sort of a mystery to me why > this works in practice. > > I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC > this would fail if simplefb still owns the framebuffer region. Lots of > systems run Plymouth during boot and this should result in failures > occasionally. Still, we never heard about anything. > Yes, I think is because Plymouth IIUC waits for a /dev/dri/card? to be present and only uses a /dev/fb? as a fallback if a timeout expires. At least in Fedora (even before the efifb -> simpledrm change) it will use KMS/DRM since the DRM kernel module for the graphics device in the machine would be in the intird. So efifb was only used for fbcon and plymouth would only use DRM/KMS and not its fbdev backend. This seems to be sort of a corner case when you have {efi,simple}fb in the early boot but the real DRM module only in the rootfs after the initrd has done a pivot_root(2). > Of course, it's always been broken (even long before real fbdev > hotunplugging). Switching to simpledrm resolves the problem. > Indeed. My opinion after dealing with these fbdev problems is that we shouldn't try to fix all possible corner cases and just try to get rid of fbdev as soon as possible. -- Best regards, Javier Martinez Canillas Linux Engineering Red Hat
Hi Am 05.05.22 um 10:28 schrieb Javier Martinez Canillas: > Hello Thomas, > > On 5/5/22 10:05, Thomas Zimmermann wrote: > > [snip] > >>> >>> In other words, in most cases (i.e: only fbcon bound to the fbdev) >>> the driver's removal/ device unbind and the memory release will be >>> at the same time. >>> >> >> We're one the same page here, but it's still sort of a mystery to me why >> this works in practice. >> >> I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC >> this would fail if simplefb still owns the framebuffer region. Lots of >> systems run Plymouth during boot and this should result in failures >> occasionally. Still, we never heard about anything. >> > > Yes, I think is because Plymouth IIUC waits for a /dev/dri/card? to be > present and only uses a /dev/fb? as a fallback if a timeout expires. Oh, right! The infamous plymouth timeout. 'sleep(30)' is the swiss-army knife of concurrent programming. ;) But I'm not blaming anyone. There are situations where nothing else helps. Plymouth really can't do anything else here. We've received reports for gfx-handover bugs when the timeout expired and plymouth uses the fbdev. The system got stuck then because of fbdev IIRC. Best regards Thomas > > At least in Fedora (even before the efifb -> simpledrm change) it will > use KMS/DRM since the DRM kernel module for the graphics device in the > machine would be in the intird. > > So efifb was only used for fbcon and plymouth would only use DRM/KMS > and not its fbdev backend. > > This seems to be sort of a corner case when you have {efi,simple}fb > in the early boot but the real DRM module only in the rootfs after the > initrd has done a pivot_root(2). > >> Of course, it's always been broken (even long before real fbdev >> hotunplugging). Switching to simpledrm resolves the problem. >> > > Indeed. My opinion after dealing with these fbdev problems is that we > shouldn't try to fix all possible corner cases and just try to get rid > of fbdev as soon as possible. > -- > Best regards, > > Javier Martinez Canillas > Linux Engineering > Red Hat >
On Wed, May 04, 2022 at 11:57:22PM +0200, Javier Martinez Canillas wrote: > The driver is calling framebuffer_release() in its .remove callback, but > this will cause the struct fb_info to be freed too early. Since it could > be that a reference is still hold to it if user-space opened the fbdev. > > This would lead to a use-after-free error if the framebuffer device was > unregistered but later a user-space process tries to close the fbdev fd. > > The correct thing to do is to only unregister the framebuffer in the > driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy. > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> I think this should have a Fixes: line for the patch from Thomas which changed the remove_conflicting_fb code: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") I think we should also mention that strictly speaking the code flow is now wrong, because hw cleanup (like iounmap) should be done from ->remove while sw cleanup (like calling framebuffer_release()) is the only thing that should be done from ->fb_destroy. But the current code matches what was happening before 27599aacbaef so more minimal "fix" With those details added Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Same for the next patch. -Daniel > --- > > drivers/video/fbdev/simplefb.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c > index 94fc9c6d0411..2c198561c338 100644 > --- a/drivers/video/fbdev/simplefb.c > +++ b/drivers/video/fbdev/simplefb.c > @@ -84,6 +84,10 @@ struct simplefb_par { > static void simplefb_clocks_destroy(struct simplefb_par *par); > static void simplefb_regulators_destroy(struct simplefb_par *par); > > +/* > + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end > + * of unregister_framebuffer() or fb_release(). Do any cleanup here. > + */ > static void simplefb_destroy(struct fb_info *info) > { > struct simplefb_par *par = info->par; > @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) > if (info->screen_base) > iounmap(info->screen_base); > > + framebuffer_release(info); > + > if (mem) > release_mem_region(mem->start, resource_size(mem)); > } > @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev) > { > struct fb_info *info = platform_get_drvdata(pdev); > > + /* simplefb_destroy takes care of info cleanup */ > unregister_framebuffer(info); > - framebuffer_release(info); > > return 0; > } > -- > 2.35.1 >
Hello Daniel, On 5/5/22 14:52, Daniel Vetter wrote: > On Wed, May 04, 2022 at 11:57:22PM +0200, Javier Martinez Canillas wrote: >> The driver is calling framebuffer_release() in its .remove callback, but >> this will cause the struct fb_info to be freed too early. Since it could >> be that a reference is still hold to it if user-space opened the fbdev. >> >> This would lead to a use-after-free error if the framebuffer device was >> unregistered but later a user-space process tries to close the fbdev fd. >> >> The correct thing to do is to only unregister the framebuffer in the >> driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy. >> >> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> >> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > > I think this should have a Fixes: line for the patch from Thomas which > changed the remove_conflicting_fb code: > > 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") > Ok. > I think we should also mention that strictly speaking the code flow is now > wrong, because hw cleanup (like iounmap) should be done from ->remove > while sw cleanup (like calling framebuffer_release()) is the only thing > that should be done from ->fb_destroy. But the current code matches what > was happening before 27599aacbaef so more minimal "fix" > Yes, I tried to make it as small as possible. Thomas pointed out that vesafb has the same issue and I included in v2. I had move things around more there though. > With those details added Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Same for the next patch. Thanks. I'll post a v3 adding what you suggested but probably not today.
On Thu, May 05, 2022 at 09:29:40AM +0200, Thomas Zimmermann wrote: > Hi > > Am 04.05.22 um 23:57 schrieb Javier Martinez Canillas: > > The driver is calling framebuffer_release() in its .remove callback, but > > this will cause the struct fb_info to be freed too early. Since it could > > be that a reference is still hold to it if user-space opened the fbdev. > > > > This would lead to a use-after-free error if the framebuffer device was > > unregistered but later a user-space process tries to close the fbdev fd. > > > > The correct thing to do is to only unregister the framebuffer in the > > driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy. > > > > Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> > > Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> > > Please see my question below. > > > --- > > > > drivers/video/fbdev/simplefb.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c > > index 94fc9c6d0411..2c198561c338 100644 > > --- a/drivers/video/fbdev/simplefb.c > > +++ b/drivers/video/fbdev/simplefb.c > > @@ -84,6 +84,10 @@ struct simplefb_par { > > static void simplefb_clocks_destroy(struct simplefb_par *par); > > static void simplefb_regulators_destroy(struct simplefb_par *par); > > +/* > > + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end > > + * of unregister_framebuffer() or fb_release(). Do any cleanup here. > > + */ > > static void simplefb_destroy(struct fb_info *info) > > { > > struct simplefb_par *par = info->par; > > @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) > > if (info->screen_base) > > iounmap(info->screen_base); > > + framebuffer_release(info); > > + > > if (mem) > > release_mem_region(mem->start, resource_size(mem)); > > The original problem with fbdev hot-unplug was that vmwgfx needed the > framebuffer region to be released. If we release it only after userspace > closed it's final file descriptor, vmwgfx could have already failed. > > I still don't fully get why this code apparently works or at least doesn't > blow up occasionally. Any ideas? See my other reply, releasing hw stuff should be done from ->remove, not ->fb_destroy. Also note that none of the patches discussed moved around anything here, we simply leaked things a bit when only unregistering the fb and not going through the device->remove callback. I guess in practice it works because unregistering the device sends out a uevent, and userspace then closes all it's device fd as a reaction to that, and usually that's fast enough to not upset anyone? No idea tbh. -Daniel > Best regards > Thomas > > > } > > @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev) > > { > > struct fb_info *info = platform_get_drvdata(pdev); > > + /* simplefb_destroy takes care of info cleanup */ > > unregister_framebuffer(info); > > - framebuffer_release(info); > > return 0; > > } > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Ivo Totev
diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c index 94fc9c6d0411..2c198561c338 100644 --- a/drivers/video/fbdev/simplefb.c +++ b/drivers/video/fbdev/simplefb.c @@ -84,6 +84,10 @@ struct simplefb_par { static void simplefb_clocks_destroy(struct simplefb_par *par); static void simplefb_regulators_destroy(struct simplefb_par *par); +/* + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end + * of unregister_framebuffer() or fb_release(). Do any cleanup here. + */ static void simplefb_destroy(struct fb_info *info) { struct simplefb_par *par = info->par; @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info) if (info->screen_base) iounmap(info->screen_base); + framebuffer_release(info); + if (mem) release_mem_region(mem->start, resource_size(mem)); } @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev); + /* simplefb_destroy takes care of info cleanup */ unregister_framebuffer(info); - framebuffer_release(info); return 0; }
The driver is calling framebuffer_release() in its .remove callback, but this will cause the struct fb_info to be freed too early. Since it could be that a reference is still hold to it if user-space opened the fbdev. This would lead to a use-after-free error if the framebuffer device was unregistered but later a user-space process tries to close the fbdev fd. The correct thing to do is to only unregister the framebuffer in the driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> --- drivers/video/fbdev/simplefb.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)