diff mbox series

video: fbdev: don't remove firmware fb device if it is busy

Message ID 20220430025749.2320824-1-junxiao.chang@intel.com (mailing list archive)
State Superseded
Headers show
Series video: fbdev: don't remove firmware fb device if it is busy | expand

Commit Message

Chang, Junxiao April 30, 2022, 2:57 a.m. UTC
When firmware framebuffer is in use, don't remove its platform
device. Or else its fb_info buffer is released by platform remove
hook while device is still opened. This introduces use after free
issue.

A kernel panic example:
CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188
...
Call Trace:
 <TASK>
 _raw_spin_lock+0x2c/0x30
 __mutex_lock.constprop.0+0x175/0x4f0
 ? _raw_spin_unlock+0x15/0x30
 ? list_lru_add+0x124/0x160
 fb_release+0x1b/0x60
 __fput+0x89/0x240
 task_work_run+0x59/0x90
 do_exit+0x343/0xaf0
 do_group_exit+0x2d/0x90
 __x64_sys_exit_group+0x14/0x20
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Signed-off-by: Lili Li <lili.li@intel.com>
---
 drivers/video/fbdev/core/fbmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Thomas Zimmermann May 2, 2022, 7:24 a.m. UTC | #1
Hi

Am 30.04.22 um 04:57 schrieb Junxiao Chang:
> When firmware framebuffer is in use, don't remove its platform
> device. Or else its fb_info buffer is released by platform remove
> hook while device is still opened. This introduces use after free
> issue.

When exactly does this happen? Do you have a reliable way of reproducing it?

Best regards
Thomas

> 
> A kernel panic example:
> CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
> Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
> RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
> RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
> RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
> RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
> RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188
> ...
> Call Trace:
>   <TASK>
>   _raw_spin_lock+0x2c/0x30
>   __mutex_lock.constprop.0+0x175/0x4f0
>   ? _raw_spin_unlock+0x15/0x30
>   ? list_lru_add+0x124/0x160
>   fb_release+0x1b/0x60
>   __fput+0x89/0x240
>   task_work_run+0x59/0x90
>   do_exit+0x343/0xaf0
>   do_group_exit+0x2d/0x90
>   __x64_sys_exit_group+0x14/0x20
>   do_syscall_64+0x40/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
> Signed-off-by: Lili Li <lili.li@intel.com>
> ---
>   drivers/video/fbdev/core/fbmem.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index a6bb0e438216..ff9b9830b398 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1586,7 +1586,9 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>   				 * framebuffer as before without warning.
>   				 */
>   				do_unregister_framebuffer(registered_fb[i]);
> -			} else if (dev_is_platform(device)) {
> +			} else if (dev_is_platform(device) &&
> +					refcount_read(&registered_fb[i]->count) == 1) {
> +				/* Remove platform device if it is not in use. */
>   				registered_fb[i]->forced_out = true;
>   				platform_device_unregister(to_platform_device(device));
>   			} else {
Chang, Junxiao May 3, 2022, 12:38 a.m. UTC | #2
Hi Thomas,

We reproduced this issue with Yocto image + 5.18-rc3 kernel.
If you could try Yocto image and enable psplash, the problem could be reproduced almost every time if psplash is launched before Intel i915 registers 2nd framebuffer.

During Yocto booting up, it launches psplash which opens EFI firmware framebuffer and plays animation. When it exits, fb_release in kernel might access memory which has been released.

Overall process looks like:
1. EFI registers framebuffer in efifb_probe, framebuffer_alloc is called to allocate "struct fb_info" memory;
2. In userspace, psplash is started to play boot animation, it opens framebuffer driver. In fb_open function, it saves fb_info memory to file->private_data;
3. When Intel i915 driver is probed, it registers 2nd framebuffer, and will remove conflicting framebuffer;
4. In do_remove_conflicting_framebuffers, it calls "platform_device_unregister" to remove EFI platform framebuffer device;
5. In EFI framebuffer's efifb_remove function, it calls framebuffer_release to release "struct fb_info" memory which is still use and stored in file->private_data;
6. When psplash user space application exits, it calls fb_release function, and accesses to fb_info memory, but it has been released in step 5. Kernel panic happens.

Our patch is to check whether EFI framebuffer is opened at that time. If it is free(registered_fb[i]->count == 1), go ahead to remove EFI platform device. Or else, just unregister framebuffer to avoid kernel panic. 

Thanks,
Junxiao 

-----Original Message-----
From: Thomas Zimmermann <tzimmermann@suse.de> 
Sent: Monday, May 2, 2022 3:24 PM
To: Chang, Junxiao <junxiao.chang@intel.com>; linux-fbdev@vger.kernel.org
Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li, Lili <lili.li@intel.com>
Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if it is busy

Hi

Am 30.04.22 um 04:57 schrieb Junxiao Chang:
> When firmware framebuffer is in use, don't remove its platform device. 
> Or else its fb_info buffer is released by platform remove hook while 
> device is still opened. This introduces use after free issue.

When exactly does this happen? Do you have a reliable way of reproducing it?

Best regards
Thomas

> 
> A kernel panic example:
> CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
> Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
> RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
> RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
> RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
> RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
> RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188 ...
> Call Trace:
>   <TASK>
>   _raw_spin_lock+0x2c/0x30
>   __mutex_lock.constprop.0+0x175/0x4f0
>   ? _raw_spin_unlock+0x15/0x30
>   ? list_lru_add+0x124/0x160
>   fb_release+0x1b/0x60
>   __fput+0x89/0x240
>   task_work_run+0x59/0x90
>   do_exit+0x343/0xaf0
>   do_group_exit+0x2d/0x90
>   __x64_sys_exit_group+0x14/0x20
>   do_syscall_64+0x40/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced 
> removal")
> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
> Signed-off-by: Lili Li <lili.li@intel.com>
> ---
>   drivers/video/fbdev/core/fbmem.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c 
> b/drivers/video/fbdev/core/fbmem.c
> index a6bb0e438216..ff9b9830b398 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1586,7 +1586,9 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>   				 * framebuffer as before without warning.
>   				 */
>   				do_unregister_framebuffer(registered_fb[i]);
> -			} else if (dev_is_platform(device)) {
> +			} else if (dev_is_platform(device) &&
> +					refcount_read(&registered_fb[i]->count) == 1) {
> +				/* Remove platform device if it is not in use. */
>   				registered_fb[i]->forced_out = true;
>   				platform_device_unregister(to_platform_device(device));
>   			} else {

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Thomas Zimmermann May 3, 2022, 7:16 a.m. UTC | #3
(cc'ing Javier)

Hi

Am 03.05.22 um 02:38 schrieb Chang, Junxiao:
> Hi Thomas,
> 
> We reproduced this issue with Yocto image + 5.18-rc3 kernel.
> If you could try Yocto image and enable psplash, the problem could be reproduced almost every time if psplash is launched before Intel i915 registers 2nd framebuffer.
> 
> During Yocto booting up, it launches psplash which opens EFI firmware framebuffer and plays animation. When it exits, fb_release in kernel might access memory which has been released.
> 
> Overall process looks like:
> 1. EFI registers framebuffer in efifb_probe, framebuffer_alloc is called to allocate "struct fb_info" memory;
> 2. In userspace, psplash is started to play boot animation, it opens framebuffer driver. In fb_open function, it saves fb_info memory to file->private_data;
> 3. When Intel i915 driver is probed, it registers 2nd framebuffer, and will remove conflicting framebuffer;
> 4. In do_remove_conflicting_framebuffers, it calls "platform_device_unregister" to remove EFI platform framebuffer device;
> 5. In EFI framebuffer's efifb_remove function, it calls framebuffer_release to release "struct fb_info" memory which is still use and stored in file->private_data;
> 6. When psplash user space application exits, it calls fb_release function, and accesses to fb_info memory, but it has been released in step 5. Kernel panic happens.

Thanks for the information. We only had a similar report about RPi 
devices, but we never encountered this problem before.

> 
> Our patch is to check whether EFI framebuffer is opened at that time. If it is free(registered_fb[i]->count == 1), go ahead to remove EFI platform device. Or else, just unregister framebuffer to avoid kernel panic.

Javier posted a possible fix for the root cause of this problem, but 
we're still looking for testers. If you have the opportunity, could you 
  please test the patch at [1] and report back on the results.

Best regards
Thomas

[1] 
https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u

> 
> Thanks,
> Junxiao
> 
> -----Original Message-----
> From: Thomas Zimmermann <tzimmermann@suse.de>
> Sent: Monday, May 2, 2022 3:24 PM
> To: Chang, Junxiao <junxiao.chang@intel.com>; linux-fbdev@vger.kernel.org
> Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li, Lili <lili.li@intel.com>
> Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if it is busy
> 
> Hi
> 
> Am 30.04.22 um 04:57 schrieb Junxiao Chang:
>> When firmware framebuffer is in use, don't remove its platform device.
>> Or else its fb_info buffer is released by platform remove hook while
>> device is still opened. This introduces use after free issue.
> 
> When exactly does this happen? Do you have a reliable way of reproducing it?
> 
> Best regards
> Thomas
> 
>>
>> A kernel panic example:
>> CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
>> Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
>> RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
>> RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
>> RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
>> RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
>> RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188 ...
>> Call Trace:
>>    <TASK>
>>    _raw_spin_lock+0x2c/0x30
>>    __mutex_lock.constprop.0+0x175/0x4f0
>>    ? _raw_spin_unlock+0x15/0x30
>>    ? list_lru_add+0x124/0x160
>>    fb_release+0x1b/0x60
>>    __fput+0x89/0x240
>>    task_work_run+0x59/0x90
>>    do_exit+0x343/0xaf0
>>    do_group_exit+0x2d/0x90
>>    __x64_sys_exit_group+0x14/0x20
>>    do_syscall_64+0x40/0x90
>>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced
>> removal")
>> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
>> Signed-off-by: Lili Li <lili.li@intel.com>
>> ---
>>    drivers/video/fbdev/core/fbmem.c | 4 +++-
>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/video/fbdev/core/fbmem.c
>> b/drivers/video/fbdev/core/fbmem.c
>> index a6bb0e438216..ff9b9830b398 100644
>> --- a/drivers/video/fbdev/core/fbmem.c
>> +++ b/drivers/video/fbdev/core/fbmem.c
>> @@ -1586,7 +1586,9 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>>    				 * framebuffer as before without warning.
>>    				 */
>>    				do_unregister_framebuffer(registered_fb[i]);
>> -			} else if (dev_is_platform(device)) {
>> +			} else if (dev_is_platform(device) &&
>> +					refcount_read(&registered_fb[i]->count) == 1) {
>> +				/* Remove platform device if it is not in use. */
>>    				registered_fb[i]->forced_out = true;
>>    				platform_device_unregister(to_platform_device(device));
>>    			} else {
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev
Chang, Junxiao May 3, 2022, 12:29 p.m. UTC | #4
We tested Javier's fix, issue couldn't be reproduced anymore.
https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u

But this fix doesn't cover all FB driver's interface:

	.get_unmapped_area = get_fb_unmapped_area,
	.fsync =	fb_deferred_io_fsync,

file_fb_info(file) NULL checking should be added in these two interface functions(get_fb_unmapped_area and fb_deferred_io_fsync) as well?

Regards,
Junxiao

-----Original Message-----
From: Thomas Zimmermann <tzimmermann@suse.de> 
Sent: Tuesday, May 3, 2022 3:16 PM
To: Chang, Junxiao <junxiao.chang@intel.com>; linux-fbdev@vger.kernel.org
Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li, Lili <lili.li@intel.com>; Javier Martinez Canillas <javierm@redhat.com>
Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if it is busy

(cc'ing Javier)

Hi

Am 03.05.22 um 02:38 schrieb Chang, Junxiao:
> Hi Thomas,
> 
> We reproduced this issue with Yocto image + 5.18-rc3 kernel.
> If you could try Yocto image and enable psplash, the problem could be reproduced almost every time if psplash is launched before Intel i915 registers 2nd framebuffer.
> 
> During Yocto booting up, it launches psplash which opens EFI firmware framebuffer and plays animation. When it exits, fb_release in kernel might access memory which has been released.
> 
> Overall process looks like:
> 1. EFI registers framebuffer in efifb_probe, framebuffer_alloc is 
> called to allocate "struct fb_info" memory; 2. In userspace, psplash 
> is started to play boot animation, it opens framebuffer driver. In 
> fb_open function, it saves fb_info memory to file->private_data; 3. 
> When Intel i915 driver is probed, it registers 2nd framebuffer, and 
> will remove conflicting framebuffer; 4. In do_remove_conflicting_framebuffers, it calls "platform_device_unregister" to remove EFI platform framebuffer device; 5. In EFI framebuffer's efifb_remove function, it calls framebuffer_release to release "struct fb_info" memory which is still use and stored in file->private_data; 6. When psplash user space application exits, it calls fb_release function, and accesses to fb_info memory, but it has been released in step 5. Kernel panic happens.

Thanks for the information. We only had a similar report about RPi devices, but we never encountered this problem before.

> 
> Our patch is to check whether EFI framebuffer is opened at that time. If it is free(registered_fb[i]->count == 1), go ahead to remove EFI platform device. Or else, just unregister framebuffer to avoid kernel panic.

Javier posted a possible fix for the root cause of this problem, but we're still looking for testers. If you have the opportunity, could you
  please test the patch at [1] and report back on the results.

Best regards
Thomas

[1]
https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u

> 
> Thanks,
> Junxiao
> 
> -----Original Message-----
> From: Thomas Zimmermann <tzimmermann@suse.de>
> Sent: Monday, May 2, 2022 3:24 PM
> To: Chang, Junxiao <junxiao.chang@intel.com>; 
> linux-fbdev@vger.kernel.org
> Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li, 
> Lili <lili.li@intel.com>
> Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if 
> it is busy
> 
> Hi
> 
> Am 30.04.22 um 04:57 schrieb Junxiao Chang:
>> When firmware framebuffer is in use, don't remove its platform device.
>> Or else its fb_info buffer is released by platform remove hook while 
>> device is still opened. This introduces use after free issue.
> 
> When exactly does this happen? Do you have a reliable way of reproducing it?
> 
> Best regards
> Thomas
> 
>>
>> A kernel panic example:
>> CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
>> Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
>> RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
>> RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
>> RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
>> RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
>> RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188 ...
>> Call Trace:
>>    <TASK>
>>    _raw_spin_lock+0x2c/0x30
>>    __mutex_lock.constprop.0+0x175/0x4f0
>>    ? _raw_spin_unlock+0x15/0x30
>>    ? list_lru_add+0x124/0x160
>>    fb_release+0x1b/0x60
>>    __fput+0x89/0x240
>>    task_work_run+0x59/0x90
>>    do_exit+0x343/0xaf0
>>    do_group_exit+0x2d/0x90
>>    __x64_sys_exit_group+0x14/0x20
>>    do_syscall_64+0x40/0x90
>>    entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced
>> removal")
>> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
>> Signed-off-by: Lili Li <lili.li@intel.com>
>> ---
>>    drivers/video/fbdev/core/fbmem.c | 4 +++-
>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/video/fbdev/core/fbmem.c
>> b/drivers/video/fbdev/core/fbmem.c
>> index a6bb0e438216..ff9b9830b398 100644
>> --- a/drivers/video/fbdev/core/fbmem.c
>> +++ b/drivers/video/fbdev/core/fbmem.c
>> @@ -1586,7 +1586,9 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>>    				 * framebuffer as before without warning.
>>    				 */
>>    				do_unregister_framebuffer(registered_fb[i]);
>> -			} else if (dev_is_platform(device)) {
>> +			} else if (dev_is_platform(device) &&
>> +					refcount_read(&registered_fb[i]->count) == 1) {
>> +				/* Remove platform device if it is not in use. */
>>    				registered_fb[i]->forced_out = true;
>>    				platform_device_unregister(to_platform_device(device));
>>    			} else {
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
Thomas Zimmermann May 3, 2022, 1:05 p.m. UTC | #5
Hi

Am 03.05.22 um 14:29 schrieb Chang, Junxiao:
> We tested Javier's fix, issue couldn't be reproduced anymore.
> https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u

Thank you so much.

Best regards
Thomas

> 
> But this fix doesn't cover all FB driver's interface:
> 
> 	.get_unmapped_area = get_fb_unmapped_area,
> 	.fsync =	fb_deferred_io_fsync,
> 
> file_fb_info(file) NULL checking should be added in these two interface functions(get_fb_unmapped_area and fb_deferred_io_fsync) as well?
> 
> Regards,
> Junxiao
> 
> -----Original Message-----
> From: Thomas Zimmermann <tzimmermann@suse.de>
> Sent: Tuesday, May 3, 2022 3:16 PM
> To: Chang, Junxiao <junxiao.chang@intel.com>; linux-fbdev@vger.kernel.org
> Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li, Lili <lili.li@intel.com>; Javier Martinez Canillas <javierm@redhat.com>
> Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if it is busy
> 
> (cc'ing Javier)
> 
> Hi
> 
> Am 03.05.22 um 02:38 schrieb Chang, Junxiao:
>> Hi Thomas,
>>
>> We reproduced this issue with Yocto image + 5.18-rc3 kernel.
>> If you could try Yocto image and enable psplash, the problem could be reproduced almost every time if psplash is launched before Intel i915 registers 2nd framebuffer.
>>
>> During Yocto booting up, it launches psplash which opens EFI firmware framebuffer and plays animation. When it exits, fb_release in kernel might access memory which has been released.
>>
>> Overall process looks like:
>> 1. EFI registers framebuffer in efifb_probe, framebuffer_alloc is
>> called to allocate "struct fb_info" memory; 2. In userspace, psplash
>> is started to play boot animation, it opens framebuffer driver. In
>> fb_open function, it saves fb_info memory to file->private_data; 3.
>> When Intel i915 driver is probed, it registers 2nd framebuffer, and
>> will remove conflicting framebuffer; 4. In do_remove_conflicting_framebuffers, it calls "platform_device_unregister" to remove EFI platform framebuffer device; 5. In EFI framebuffer's efifb_remove function, it calls framebuffer_release to release "struct fb_info" memory which is still use and stored in file->private_data; 6. When psplash user space application exits, it calls fb_release function, and accesses to fb_info memory, but it has been released in step 5. Kernel panic happens.
> 
> Thanks for the information. We only had a similar report about RPi devices, but we never encountered this problem before.
> 
>>
>> Our patch is to check whether EFI framebuffer is opened at that time. If it is free(registered_fb[i]->count == 1), go ahead to remove EFI platform device. Or else, just unregister framebuffer to avoid kernel panic.
> 
> Javier posted a possible fix for the root cause of this problem, but we're still looking for testers. If you have the opportunity, could you
>    please test the patch at [1] and report back on the results.
> 
> Best regards
> Thomas
> 
> [1]
> https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u
> 
>>
>> Thanks,
>> Junxiao
>>
>> -----Original Message-----
>> From: Thomas Zimmermann <tzimmermann@suse.de>
>> Sent: Monday, May 2, 2022 3:24 PM
>> To: Chang, Junxiao <junxiao.chang@intel.com>;
>> linux-fbdev@vger.kernel.org
>> Cc: lethal@linux-sh.org; patchwork-bot@kernel.org; deller@gmx.de; Li,
>> Lili <lili.li@intel.com>
>> Subject: Re: [PATCH] video: fbdev: don't remove firmware fb device if
>> it is busy
>>
>> Hi
>>
>> Am 30.04.22 um 04:57 schrieb Junxiao Chang:
>>> When firmware framebuffer is in use, don't remove its platform device.
>>> Or else its fb_info buffer is released by platform remove hook while
>>> device is still opened. This introduces use after free issue.
>>
>> When exactly does this happen? Do you have a reliable way of reproducing it?
>>
>> Best regards
>> Thomas
>>
>>>
>>> A kernel panic example:
>>> CPU: 2 PID: 3425 Comm: psplash Tainted: G     U  W     5.18.0-rc3
>>> Hardware name: Intel Client Platform/ADP-S DDR5 UDIMM CRB
>>> RIP: 0010:native_queued_spin_lock_slowpath+0x1c7/0x210
>>> RSP: 0018:ffffb3a0c0c2fdb0 EFLAGS: 00010206
>>> RAX: 002dc074ff5c0988 RBX: ffff92e987a5d818 RCX: ffff92e989ba9f40
>>> RDX: 0000000000002067 RSI: ffffffff864344f1 RDI: ffffffff8644183c
>>> RBP: ffff92f10f4abd40 R08: 0000000000000001 R09: ffff92e986dc2188 ...
>>> Call Trace:
>>>     <TASK>
>>>     _raw_spin_lock+0x2c/0x30
>>>     __mutex_lock.constprop.0+0x175/0x4f0
>>>     ? _raw_spin_unlock+0x15/0x30
>>>     ? list_lru_add+0x124/0x160
>>>     fb_release+0x1b/0x60
>>>     __fput+0x89/0x240
>>>     task_work_run+0x59/0x90
>>>     do_exit+0x343/0xaf0
>>>     do_group_exit+0x2d/0x90
>>>     __x64_sys_exit_group+0x14/0x20
>>>     do_syscall_64+0x40/0x90
>>>     entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced
>>> removal")
>>> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
>>> Signed-off-by: Lili Li <lili.li@intel.com>
>>> ---
>>>     drivers/video/fbdev/core/fbmem.c | 4 +++-
>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/video/fbdev/core/fbmem.c
>>> b/drivers/video/fbdev/core/fbmem.c
>>> index a6bb0e438216..ff9b9830b398 100644
>>> --- a/drivers/video/fbdev/core/fbmem.c
>>> +++ b/drivers/video/fbdev/core/fbmem.c
>>> @@ -1586,7 +1586,9 @@ static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
>>>     				 * framebuffer as before without warning.
>>>     				 */
>>>     				do_unregister_framebuffer(registered_fb[i]);
>>> -			} else if (dev_is_platform(device)) {
>>> +			} else if (dev_is_platform(device) &&
>>> +					refcount_read(&registered_fb[i]->count) == 1) {
>>> +				/* Remove platform device if it is not in use. */
>>>     				registered_fb[i]->forced_out = true;
>>>     				platform_device_unregister(to_platform_device(device));
>>>     			} else {
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg)
>> Geschäftsführer: Ivo Totev
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev
Javier Martinez Canillas May 3, 2022, 3:15 p.m. UTC | #6
Hello Junxiao,

On 5/3/22 14:29, Chang, Junxiao wrote:
> We tested Javier's fix, issue couldn't be reproduced anymore.
> https://lore.kernel.org/dri-devel/20220502135014.377945-1-javierm@redhat.com/T/#u
> 

Thanks for testing!

> But this fix doesn't cover all FB driver's interface:
> 
> 	.get_unmapped_area = get_fb_unmapped_area,
> 	.fsync =	fb_deferred_io_fsync,
> 
> file_fb_info(file) NULL checking should be added in these two interface functions(get_fb_unmapped_area and fb_deferred_io_fsync) as well?
>

Yes, I was about to send a new version adding checks for this but I decided
not to. Because by looking at these file ops, I see get_fb_unmapped_area()
is only used when:

#if defined(HAVE_ARCH_FB_UNMAPPED_AREA) || \
	(defined(CONFIG_FB_PROVIDE_GET_FB_UNMAPPED_AREA) && \
	 !defined(CONFIG_MMU))
	.get_unmapped_area = get_fb_unmapped_area,
#endif

so is not really a common configuration and fb_deferred_io_fsync() is not
defined in the same compilation unit, which means that file_fb_info() would
have to be exported (and probably renamed to fb_file_fb_info or something),
which will make the fix more complex and harder to backport to stable.

The fb_release() handler bug in the other hand is quite easy to trigger,
so I'll just keep this fix with the minimum change.

I plan to fix the other two handlers though in a separate patch.
 -- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat
diff mbox series

Patch

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index a6bb0e438216..ff9b9830b398 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1586,7 +1586,9 @@  static void do_remove_conflicting_framebuffers(struct apertures_struct *a,
 				 * framebuffer as before without warning.
 				 */
 				do_unregister_framebuffer(registered_fb[i]);
-			} else if (dev_is_platform(device)) {
+			} else if (dev_is_platform(device) &&
+					refcount_read(&registered_fb[i]->count) == 1) {
+				/* Remove platform device if it is not in use. */
 				registered_fb[i]->forced_out = true;
 				platform_device_unregister(to_platform_device(device));
 			} else {