
[v4] rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev

Message ID 20211208125220.v4.1.Iaac908f3e3149a89190ce006ba166e2d3fd247a3@changeid (mailing list archive)
State Superseded

Commit Message

Matthias Kaehlcke Dec. 8, 2021, 8:52 p.m. UTC
From: Sujit Kautkar <sujitka@chromium.org>

From: Sujit Kautkar <sujitka@chromium.org>

struct rpmsg_ctrldev contains a struct cdev. The current code frees
the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
cdev is a managed object, therefore its release is not predictable
and the rpmsg_ctrldev could be freed before the cdev is entirely
released, as in the backtrace below.

[   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
[   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
[   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
[   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
[   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
[   93.730055] Workqueue: events kobject_delayed_cleanup
[   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
[   93.740216] pc : debug_print_object+0x13c/0x1b0
[   93.744890] lr : debug_print_object+0x13c/0x1b0
[   93.749555] sp : ffffffacf5bc7940
[   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
[   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
[   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
[   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
[   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
[   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
[   93.785814] x17: 0000000000000000 x16: dfffffd000000000
[   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
[   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
[   93.802244] x11: 0000000000000001 x10: 0000000000000000
[   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
[   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
[   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
[   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
[   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
[   93.835104] Call trace:
[   93.837644]  debug_print_object+0x13c/0x1b0
[   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
[   93.846987]  debug_check_no_obj_freed+0x18/0x20
[   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
[   93.856346]  kfree+0xfc/0x2f4
[   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
[   93.864445]  device_release+0x84/0x168
[   93.868310]  kobject_cleanup+0x12c/0x298
[   93.872356]  kobject_delayed_cleanup+0x10/0x18
[   93.876948]  process_one_work+0x578/0x92c
[   93.881086]  worker_thread+0x804/0xcf8
[   93.884963]  kthread+0x2a8/0x314
[   93.888303]  ret_from_fork+0x10/0x18

The cdev_device_add/del() API was created to address this issue
(see commit 233ed09d7fda). Use it instead of cdev_add/del().

Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---

Changes in v4:
- call cdev_device_del() from rpmsg_chrdev_remove() instead of
  rpmsg_ctrldev_release_device()
- updated subject (was: "rpmsg: glink: Update cdev add/del API in
  rpmsg_ctrldev_release_device()")
- updated commit message
- replaced backtrace in commit message with one that doesn't have
  a dump_backtrace() call

Changes in v3:
- Remove unnecessary error check as per Matthias's comment

Changes in v2:
- Fix typo in commit message

 drivers/rpmsg/rpmsg_char.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)
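
For readers unfamiliar with the lifetime issue described in the commit message, here is a minimal user-space sketch of it. This is not kernel code: struct obj is a hypothetical stand-in for a kobject, struct ctrldev_model stands in for struct rpmsg_ctrldev, and the extra reference on the cdev represents whatever keeps the cdev alive after the driver has let go (for example an open file). The sketch assumes that cdev_device_add() makes the device the parent of the cdev, so the device, and with it the containing allocation, is only released after the cdev is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject: refcount, optional parent, release hook. */
struct obj {
	int refs;
	struct obj *parent;
	void (*release)(struct obj *self);
};

/* Model of struct rpmsg_ctrldev: a device and a cdev in one allocation. */
struct ctrldev_model {
	struct obj dev;
	struct obj cdev;
	int released;		/* set where the real driver would kfree() */
	int released_early;	/* set if the cdev was still referenced then */
};

static void obj_put(struct obj *o)
{
	struct obj *parent = o->parent;

	if (--o->refs > 0)
		return;
	if (o->release)
		o->release(o);
	if (parent)
		obj_put(parent);	/* a released child drops its parent reference */
}

/* Models rpmsg_ctrldev_release_device(); records instead of freeing. */
static void ctrldev_release(struct obj *dev)
{
	struct ctrldev_model *c = (struct ctrldev_model *)
		((char *)dev - offsetof(struct ctrldev_model, dev));

	c->released = 1;
	c->released_early = c->cdev.refs > 0;
}

/*
 * Returns 1 if the container was released while the cdev was still
 * referenced, 0 if it was released only after the cdev, -1 on allocation
 * failure.
 */
int model_teardown(int chain_cdev_to_dev)
{
	struct ctrldev_model *c = calloc(1, sizeof(*c));
	int early;

	if (!c)
		return -1;

	c->dev.refs = 1;		/* driver's reference on the device */
	c->dev.release = ctrldev_release;
	c->cdev.refs = 1;		/* reference held while the cdev is added */
	if (chain_cdev_to_dev) {	/* what the sketch assumes cdev_device_add() does */
		c->cdev.parent = &c->dev;
		c->dev.refs++;
	}

	c->cdev.refs++;			/* something else still uses the cdev */

	obj_put(&c->cdev);		/* driver removes the cdev */
	obj_put(&c->dev);		/* driver drops its device reference */
	obj_put(&c->cdev);		/* the last cdev user goes away later */

	assert(c->released);		/* both variants release eventually */
	early = c->released_early;
	free(c);
	return early;
}
```

In this model, model_teardown(0) corresponds to the old cdev_add()/device_add() pairing and reports that the container was released while the cdev was still referenced; model_teardown(1) corresponds to the chained variant and reports that the container outlived the cdev. The release hook only records the event rather than freeing memory, so the demonstration itself stays well defined.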

Comments

Stephen Boyd Dec. 9, 2021, 1:05 a.m. UTC | #1
Quoting Matthias Kaehlcke (2021-12-08 12:52:28)
> From: Sujit Kautkar <sujitka@chromium.org>
>
> From: Sujit Kautkar <sujitka@chromium.org>

This is here twice. Remove one?

>
> struct rpmsg_ctrldev contains a struct cdev. The current code frees
> the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> cdev is a managed object, therefore its release is not predictable
> and the rpmsg_ctrldev could be freed before the cdev is entirely
> released, as in the backtrace below.
>
> [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> [   93.730055] Workqueue: events kobject_delayed_cleanup
> [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> [   93.740216] pc : debug_print_object+0x13c/0x1b0
> [   93.744890] lr : debug_print_object+0x13c/0x1b0
> [   93.749555] sp : ffffffacf5bc7940
> [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> [   93.835104] Call trace:
> [   93.837644]  debug_print_object+0x13c/0x1b0
> [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> [   93.856346]  kfree+0xfc/0x2f4
> [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> [   93.864445]  device_release+0x84/0x168
> [   93.868310]  kobject_cleanup+0x12c/0x298
> [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> [   93.876948]  process_one_work+0x578/0x92c
> [   93.881086]  worker_thread+0x804/0xcf8
> [   93.884963]  kthread+0x2a8/0x314
> [   93.888303]  ret_from_fork+0x10/0x18
>
> The cdev_device_add/del() API was created to address this issue
> (see commit 233ed09d7fda), use it instead of cdev add/del().
>
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>

Matthias Kaehlcke Dec. 9, 2021, 1:15 a.m. UTC | #2
On Wed, Dec 08, 2021 at 05:05:29PM -0800, Stephen Boyd wrote:
> Quoting Matthias Kaehlcke (2021-12-08 12:52:28)
> > From: Sujit Kautkar <sujitka@chromium.org>
> >
> > From: Sujit Kautkar <sujitka@chromium.org>
> 
> This is here twice. Remove one?

Ah, forgot that tools add that automatically and added a manual entry.

Ohad/Bjorn/Mathieu: assuming there are no other comments, do you want me
to resend this patch or can you remove the extra tag when applying it?

> > struct rpmsg_ctrldev contains a struct cdev. The current code frees
> > the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> > cdev is a managed object, therefore its release is not predictable
> > and the rpmsg_ctrldev could be freed before the cdev is entirely
> > released, as in the backtrace below.
> >
> > [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> > [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> > [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> > [   93.730055] Workqueue: events kobject_delayed_cleanup
> > [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> > [   93.740216] pc : debug_print_object+0x13c/0x1b0
> > [   93.744890] lr : debug_print_object+0x13c/0x1b0
> > [   93.749555] sp : ffffffacf5bc7940
> > [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> > [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> > [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> > [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> > [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> > [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> > [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> > [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> > [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> > [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> > [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> > [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> > [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> > [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> > [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> > [   93.835104] Call trace:
> > [   93.837644]  debug_print_object+0x13c/0x1b0
> > [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> > [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> > [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> > [   93.856346]  kfree+0xfc/0x2f4
> > [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> > [   93.864445]  device_release+0x84/0x168
> > [   93.868310]  kobject_cleanup+0x12c/0x298
> > [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> > [   93.876948]  process_one_work+0x578/0x92c
> > [   93.881086]  worker_thread+0x804/0xcf8
> > [   93.884963]  kthread+0x2a8/0x314
> > [   93.888303]  ret_from_fork+0x10/0x18
> >
> > The cdev_device_add/del() API was created to address this issue
> > (see commit 233ed09d7fda), use it instead of cdev add/del().
> >
> > Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> 
> Reviewed-by: Stephen Boyd <swboyd@chromium.org>

Thanks!

Mathieu Poirier Dec. 13, 2021, 5:32 p.m. UTC | #3
On Wed, Dec 08, 2021 at 12:52:28PM -0800, Matthias Kaehlcke wrote:
> From: Sujit Kautkar <sujitka@chromium.org>
> 
> From: Sujit Kautkar <sujitka@chromium.org>
> 
> struct rpmsg_ctrldev contains a struct cdev. The current code frees
> the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> cdev is a managed object, therefore its release is not predictable
> and the rpmsg_ctrldev could be freed before the cdev is entirely
> released, as in the backtrace below.
> 
> [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> [   93.730055] Workqueue: events kobject_delayed_cleanup
> [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> [   93.740216] pc : debug_print_object+0x13c/0x1b0
> [   93.744890] lr : debug_print_object+0x13c/0x1b0
> [   93.749555] sp : ffffffacf5bc7940
> [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> [   93.835104] Call trace:
> [   93.837644]  debug_print_object+0x13c/0x1b0
> [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> [   93.856346]  kfree+0xfc/0x2f4
> [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> [   93.864445]  device_release+0x84/0x168
> [   93.868310]  kobject_cleanup+0x12c/0x298
> [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> [   93.876948]  process_one_work+0x578/0x92c
> [   93.881086]  worker_thread+0x804/0xcf8
> [   93.884963]  kthread+0x2a8/0x314
> [   93.888303]  ret_from_fork+0x10/0x18
> 
> The cdev_device_add/del() API was created to address this issue
> (see commit 233ed09d7fda), use it instead of cdev add/del().
> 
> Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> 
> Changes in v4:
> - call cdev_device_del() from rpmsg_chrdev_remove() instead of
>   rpmsg_ctrldev_release_device()
> - updated subject (was: "rpmsg: glink: Update cdev add/del API in
>   rpmsg_ctrldev_release_device()")
> - updated commit message
> - replaced backtrace in commit message with one that doesn't have
>   a dump_backtrace() call
> 
> Changes in v3:
> - Remove unnecessary error check as per Matthias's comment
> 
> Changes in v2:
> - Fix typo in commit message
> 
>  drivers/rpmsg/rpmsg_char.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> index b5907b80727c..b1b75ef04560 100644
> --- a/drivers/rpmsg/rpmsg_char.c
> +++ b/drivers/rpmsg/rpmsg_char.c
> @@ -459,7 +459,6 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
>  
>  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
>  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> -	cdev_del(&ctrldev->cdev);
>  	kfree(ctrldev);
>  }
>  
> @@ -494,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
>  	dev->id = ret;
>  	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
>  
> -	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
> +	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
>  	if (ret)
>  		goto free_ctrl_ida;
>  
>  	/* We can now rely on the release function for cleanup */
>  	dev->release = rpmsg_ctrldev_release_device;
>  
> -	ret = device_add(dev);
> -	if (ret) {
> -		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
> -		put_device(dev);
> -	}
> -
>  	dev_set_drvdata(&rpdev->dev, ctrldev);
>  
>  	return ret;
> @@ -532,7 +525,7 @@ static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
>  	if (ret)
>  		dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
>  
> -	device_del(&ctrldev->dev);
> +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
>  	put_device(&ctrldev->dev);

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

I'll let Bjorn pick this one to make sure it doesn't break anything for current
users of the driver.

Thanks,
Mathieu

>  }
>  
> -- 
> 2.34.1.400.ga245620fadb-goog
>

Matthias Kaehlcke Jan. 7, 2022, 7:44 p.m. UTC | #4
On Mon, Dec 13, 2021 at 10:32:07AM -0700, Mathieu Poirier wrote:
> On Wed, Dec 08, 2021 at 12:52:28PM -0800, Matthias Kaehlcke wrote:
> > From: Sujit Kautkar <sujitka@chromium.org>
> > 
> > From: Sujit Kautkar <sujitka@chromium.org>
> > 
> > struct rpmsg_ctrldev contains a struct cdev. The current code frees
> > the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> > cdev is a managed object, therefore its release is not predictable
> > and the rpmsg_ctrldev could be freed before the cdev is entirely
> > released, as in the backtrace below.
> > 
> > [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> > [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> > [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> > [   93.730055] Workqueue: events kobject_delayed_cleanup
> > [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> > [   93.740216] pc : debug_print_object+0x13c/0x1b0
> > [   93.744890] lr : debug_print_object+0x13c/0x1b0
> > [   93.749555] sp : ffffffacf5bc7940
> > [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> > [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> > [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> > [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> > [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> > [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> > [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> > [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> > [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> > [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> > [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> > [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> > [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> > [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> > [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> > [   93.835104] Call trace:
> > [   93.837644]  debug_print_object+0x13c/0x1b0
> > [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> > [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> > [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> > [   93.856346]  kfree+0xfc/0x2f4
> > [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> > [   93.864445]  device_release+0x84/0x168
> > [   93.868310]  kobject_cleanup+0x12c/0x298
> > [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> > [   93.876948]  process_one_work+0x578/0x92c
> > [   93.881086]  worker_thread+0x804/0xcf8
> > [   93.884963]  kthread+0x2a8/0x314
> > [   93.888303]  ret_from_fork+0x10/0x18
> > 
> > The cdev_device_add/del() API was created to address this issue
> > (see commit 233ed09d7fda), use it instead of cdev add/del().
> > 
> > Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > 
> > Changes in v4:
> > - call cdev_device_del() from rpmsg_chrdev_remove() instead of
> >   rpmsg_ctrldev_release_device()
> > - updated subject (was: "rpmsg: glink: Update cdev add/del API in
> >   rpmsg_ctrldev_release_device()")
> > - updated commit message
> > - replaced backtrace in commit message with one that doesn't have
> >   a dump_backtrace() call
> > 
> > Changes in v3:
> > - Remove unnecessary error check as per Matthias's comment
> > 
> > Changes in v2:
> > - Fix typo in commit message
> > 
> >  drivers/rpmsg/rpmsg_char.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> > index b5907b80727c..b1b75ef04560 100644
> > --- a/drivers/rpmsg/rpmsg_char.c
> > +++ b/drivers/rpmsg/rpmsg_char.c
> > @@ -459,7 +459,6 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
> >  
> >  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
> >  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> > -	cdev_del(&ctrldev->cdev);
> >  	kfree(ctrldev);
> >  }
> >  
> > @@ -494,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
> >  	dev->id = ret;
> >  	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
> >  
> > -	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
> > +	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
> >  	if (ret)
> >  		goto free_ctrl_ida;
> >  
> >  	/* We can now rely on the release function for cleanup */
> >  	dev->release = rpmsg_ctrldev_release_device;
> >  
> > -	ret = device_add(dev);
> > -	if (ret) {
> > -		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
> > -		put_device(dev);
> > -	}
> > -
> >  	dev_set_drvdata(&rpdev->dev, ctrldev);
> >  
> >  	return ret;
> > @@ -532,7 +525,7 @@ static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
> >  	if (ret)
> >  		dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
> >  
> > -	device_del(&ctrldev->dev);
> > +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
> >  	put_device(&ctrldev->dev);
> 
> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> 
> I'll let Bjorn pick this one to make sure it doesn't break anything for current
> users of the driver.

Bjorn: can this land or is there any action pending on my side?

Bjorn Andersson Jan. 7, 2022, 11:57 p.m. UTC | #5
On Fri 07 Jan 11:44 PST 2022, Matthias Kaehlcke wrote:

> On Mon, Dec 13, 2021 at 10:32:07AM -0700, Mathieu Poirier wrote:
> > On Wed, Dec 08, 2021 at 12:52:28PM -0800, Matthias Kaehlcke wrote:
> > > From: Sujit Kautkar <sujitka@chromium.org>
> > > 
> > > From: Sujit Kautkar <sujitka@chromium.org>
> > > 
> > > struct rpmsg_ctrldev contains a struct cdev. The current code frees
> > > the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> > > cdev is a managed object, therefore its release is not predictable
> > > and the rpmsg_ctrldev could be freed before the cdev is entirely
> > > released, as in the backtrace below.
> > > 
> > > [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > > [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > > [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> > > [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> > > [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> > > [   93.730055] Workqueue: events kobject_delayed_cleanup
> > > [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> > > [   93.740216] pc : debug_print_object+0x13c/0x1b0
> > > [   93.744890] lr : debug_print_object+0x13c/0x1b0
> > > [   93.749555] sp : ffffffacf5bc7940
> > > [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> > > [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> > > [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> > > [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> > > [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> > > [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> > > [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> > > [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> > > [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> > > [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> > > [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> > > [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> > > [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> > > [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> > > [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> > > [   93.835104] Call trace:
> > > [   93.837644]  debug_print_object+0x13c/0x1b0
> > > [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> > > [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> > > [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> > > [   93.856346]  kfree+0xfc/0x2f4
> > > [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> > > [   93.864445]  device_release+0x84/0x168
> > > [   93.868310]  kobject_cleanup+0x12c/0x298
> > > [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> > > [   93.876948]  process_one_work+0x578/0x92c
> > > [   93.881086]  worker_thread+0x804/0xcf8
> > > [   93.884963]  kthread+0x2a8/0x314
> > > [   93.888303]  ret_from_fork+0x10/0x18
> > > 
> > > The cdev_device_add/del() API was created to address this issue
> > > (see commit 233ed09d7fda), use it instead of cdev add/del().
> > > 
> > > Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > > ---
> > > 
> > > Changes in v4:
> > > - call cdev_device_del() from rpmsg_chrdev_remove() instead of
> > >   rpmsg_ctrldev_release_device()
> > > - updated subject (was: "rpmsg: glink: Update cdev add/del API in
> > >   rpmsg_ctrldev_release_device()")
> > > - updated commit message
> > > - replaced backtrace in commit message with one that doesn't have
> > >   a dump_backtrace() call
> > > 
> > > Changes in v3:
> > > - Remove unnecessary error check as per Matthias's comment
> > > 
> > > Changes in v2:
> > > - Fix typo in commit message
> > > 
> > >  drivers/rpmsg/rpmsg_char.c | 11 ++---------
> > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> > > index b5907b80727c..b1b75ef04560 100644
> > > --- a/drivers/rpmsg/rpmsg_char.c
> > > +++ b/drivers/rpmsg/rpmsg_char.c
> > > @@ -459,7 +459,6 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
> > >  
> > >  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
> > >  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> > > -	cdev_del(&ctrldev->cdev);
> > >  	kfree(ctrldev);
> > >  }
> > >  
> > > @@ -494,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
> > >  	dev->id = ret;
> > >  	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
> > >  
> > > -	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
> > > +	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
> > >  	if (ret)
> > >  		goto free_ctrl_ida;
> > >  
> > >  	/* We can now rely on the release function for cleanup */
> > >  	dev->release = rpmsg_ctrldev_release_device;
> > >  
> > > -	ret = device_add(dev);
> > > -	if (ret) {
> > > -		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
> > > -		put_device(dev);
> > > -	}
> > > -
> > >  	dev_set_drvdata(&rpdev->dev, ctrldev);
> > >  
> > >  	return ret;
> > > @@ -532,7 +525,7 @@ static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
> > >  	if (ret)
> > >  		dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
> > >  
> > > -	device_del(&ctrldev->dev);
> > > +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
> > >  	put_device(&ctrldev->dev);
> > 
> > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > 
> > I'll let Bjorn pick this one to make sure it doesn't break anything for current
> > users of the driver.
> 
> Bjorn: can this land or is there any action pending on my side?

Thanks for the ping Mathieu, this looks good to me, so:

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

But it should come with:

Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface")

But afaict we have the exact same problem in rpmsg_eptdev_create(), is
there a reason why this isn't updated as well?

Regards,
Bjorn

Matthias Kaehlcke Jan. 8, 2022, 1:19 a.m. UTC | #6
On Fri, Jan 07, 2022 at 03:57:50PM -0800, Bjorn Andersson wrote:
> On Fri 07 Jan 11:44 PST 2022, Matthias Kaehlcke wrote:
> 
> > On Mon, Dec 13, 2021 at 10:32:07AM -0700, Mathieu Poirier wrote:
> > > On Wed, Dec 08, 2021 at 12:52:28PM -0800, Matthias Kaehlcke wrote:
> > > > From: Sujit Kautkar <sujitka@chromium.org>
> > > > 
> > > > From: Sujit Kautkar <sujitka@chromium.org>
> > > > 
> > > > struct rpmsg_ctrldev contains a struct cdev. The current code frees
> > > > the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the
> > > > cdev is a managed object, therefore its release is not predictable
> > > > and the rpmsg_ctrldev could be freed before the cdev is entirely
> > > > released, as in the backtrace below.
> > > > 
> > > > [   93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > > > [   93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > > > [   93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v
> > > > [   93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G    B             5.4.163-lockdep #26
> > > > [   93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> > > > [   93.730055] Workqueue: events kobject_delayed_cleanup
> > > > [   93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO)
> > > > [   93.740216] pc : debug_print_object+0x13c/0x1b0
> > > > [   93.744890] lr : debug_print_object+0x13c/0x1b0
> > > > [   93.749555] sp : ffffffacf5bc7940
> > > > [   93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000
> > > > [   93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000
> > > > [   93.763916] x25: ffffffd0734f856c x24: dfffffd000000000
> > > > [   93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0
> > > > [   93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0
> > > > [   93.780338] x19: ffffffd075199100 x18: 00000000000276e0
> > > > [   93.785814] x17: 0000000000000000 x16: dfffffd000000000
> > > > [   93.791291] x15: ffffffffffffffff x14: 6e6968207473696c
> > > > [   93.796768] x13: 0000000000000000 x12: ffffffd075e2b000
> > > > [   93.802244] x11: 0000000000000001 x10: 0000000000000000
> > > > [   93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900
> > > > [   93.813200] x7 : 0000000000000000 x6 : 0000000000000000
> > > > [   93.818676] x5 : 0000000000000080 x4 : 0000000000000000
> > > > [   93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001
> > > > [   93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061
> > > > [   93.835104] Call trace:
> > > > [   93.837644]  debug_print_object+0x13c/0x1b0
> > > > [   93.841963]  __debug_check_no_obj_freed+0x25c/0x3c0
> > > > [   93.846987]  debug_check_no_obj_freed+0x18/0x20
> > > > [   93.851669]  slab_free_freelist_hook+0xbc/0x1e4
> > > > [   93.856346]  kfree+0xfc/0x2f4
> > > > [   93.859416]  rpmsg_ctrldev_release_device+0x78/0xb8
> > > > [   93.864445]  device_release+0x84/0x168
> > > > [   93.868310]  kobject_cleanup+0x12c/0x298
> > > > [   93.872356]  kobject_delayed_cleanup+0x10/0x18
> > > > [   93.876948]  process_one_work+0x578/0x92c
> > > > [   93.881086]  worker_thread+0x804/0xcf8
> > > > [   93.884963]  kthread+0x2a8/0x314
> > > > [   93.888303]  ret_from_fork+0x10/0x18
> > > > 
> > > > The cdev_device_add/del() API was created to address this issue
> > > > (see commit 233ed09d7fda), use it instead of cdev add/del().
> > > > 
> > > > Signed-off-by: Sujit Kautkar <sujitka@chromium.org>
> > > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > > > ---
> > > > 
> > > > Changes in v4:
> > > > - call cdev_device_del() from rpmsg_chrdev_remove() instead of
> > > >   rpmsg_ctrldev_release_device()
> > > > - updated subject (was: "rpmsg: glink: Update cdev add/del API in
> > > >   rpmsg_ctrldev_release_device()")
> > > > - updated commit message
> > > > - replaced backtrace in commit message with one that doesn't have
> > > >   a dump_backtrace() call
> > > > 
> > > > Changes in v3:
> > > > - Remove unnecessary error check as per Matthias's comment
> > > > 
> > > > Changes in v2:
> > > > - Fix typo in commit message
> > > > 
> > > >  drivers/rpmsg/rpmsg_char.c | 11 ++---------
> > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> > > > index b5907b80727c..b1b75ef04560 100644
> > > > --- a/drivers/rpmsg/rpmsg_char.c
> > > > +++ b/drivers/rpmsg/rpmsg_char.c
> > > > @@ -459,7 +459,6 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
> > > >  
> > > >  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
> > > >  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> > > > -	cdev_del(&ctrldev->cdev);
> > > >  	kfree(ctrldev);
> > > >  }
> > > >  
> > > > @@ -494,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
> > > >  	dev->id = ret;
> > > >  	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
> > > >  
> > > > -	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
> > > > +	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
> > > >  	if (ret)
> > > >  		goto free_ctrl_ida;
> > > >  
> > > >  	/* We can now rely on the release function for cleanup */
> > > >  	dev->release = rpmsg_ctrldev_release_device;
> > > >  
> > > > -	ret = device_add(dev);
> > > > -	if (ret) {
> > > > -		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
> > > > -		put_device(dev);
> > > > -	}
> > > > -
> > > >  	dev_set_drvdata(&rpdev->dev, ctrldev);
> > > >  
> > > >  	return ret;
> > > > @@ -532,7 +525,7 @@ static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
> > > >  	if (ret)
> > > >  		dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
> > > >  
> > > > -	device_del(&ctrldev->dev);
> > > > +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
> > > >  	put_device(&ctrldev->dev);
> > > 
> > > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > 
> > > I'll let Bjorn pick this one to make sure it doesn't break anything for current
> > > users of the driver.
> > 
> > Bjorn: can this land or is there any action pending on my side?
> 
> Thanks for the ping Mathieu, this looks good to me, so:
> 
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> 
> But it should come with:
> 
> Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface")

Thanks, will add!

> But afaict we have the exact same problem in rpmsg_eptdev_create(), is
> there a reason why this isn't updated as well?

Because I didn't notice :)

I'll fix it too, not sure yet if in the same patch or a separate one.
Patch

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index b5907b80727c..b1b75ef04560 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -459,7 +459,6 @@  static void rpmsg_ctrldev_release_device(struct device *dev)
 
 	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
 	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
-	cdev_del(&ctrldev->cdev);
 	kfree(ctrldev);
 }
 
@@ -494,19 +493,13 @@  static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
 	dev->id = ret;
 	dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
 
-	ret = cdev_add(&ctrldev->cdev, dev->devt, 1);
+	ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev);
 	if (ret)
 		goto free_ctrl_ida;
 
 	/* We can now rely on the release function for cleanup */
 	dev->release = rpmsg_ctrldev_release_device;
 
-	ret = device_add(dev);
-	if (ret) {
-		dev_err(&rpdev->dev, "device_add failed: %d\n", ret);
-		put_device(dev);
-	}
-
 	dev_set_drvdata(&rpdev->dev, ctrldev);
 
 	return ret;
@@ -532,7 +525,7 @@  static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
 	if (ret)
 		dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
 
-	device_del(&ctrldev->dev);
+	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
 	put_device(&ctrldev->dev);
 }
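
For readers who want to see why pairing the cdev with its device closes the race, below is a minimal user-space sketch of the reference counting involved. It is not kernel code: struct kobj, kobj_put(), struct ctrldev, to_ctrldev() and run_model() are hypothetical stand-ins invented for this illustration. The only kernel behaviour it assumes is that cdev_device_add() makes the device the parent of the cdev's kobject, that adding the cdev takes a reference on that parent, and that the reference is dropped only when the cdev's kobject is finally released. The "late holder" models whatever keeps the cdev kobject alive past removal (in the backtrace above, the delayed kobject cleanup).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kobject: refcount, optional parent, release hook. */
struct kobj {
	int refs;
	struct kobj *parent;
	void (*release)(struct kobj *k);
};

static void kobj_put(struct kobj *k)
{
	if (k && --k->refs == 0)
		k->release(k);
}

/* Stand-in for struct rpmsg_ctrldev: device and cdev share one allocation. */
struct ctrldev {
	struct kobj dev;
	struct kobj cdev;
	bool freed;          /* set where the real release function would kfree() */
	bool use_after_free; /* set if the cdev is released after that point */
};

#define to_ctrldev(ptr, member) \
	((struct ctrldev *)((char *)(ptr) - offsetof(struct ctrldev, member)))

static void dev_release(struct kobj *k)
{
	to_ctrldev(k, dev)->freed = true;
}

static void cdev_release(struct kobj *k)
{
	struct ctrldev *c = to_ctrldev(k, cdev);

	if (c->freed)
		c->use_after_free = true; /* container is gone, cdev still in use */
	kobj_put(k->parent);              /* drop the reference taken at add time */
}

/* Runs one add/remove cycle and returns the final state of the container. */
static struct ctrldev run_model(bool parented)
{
	struct ctrldev c = {
		.dev  = { .refs = 1, .release = dev_release },
		.cdev = { .refs = 1, .release = cdev_release },
	};

	if (parented) {
		/* Models the parent link set up by cdev_device_add(). */
		c.cdev.parent = &c.dev;
		c.dev.refs++;
	}
	c.cdev.refs++;     /* late holder of the cdev kobject */

	kobj_put(&c.cdev); /* removal path drops the cdev's initial reference */
	kobj_put(&c.dev);  /* put_device() */
	kobj_put(&c.cdev); /* late holder finally lets go */
	return c;
}
```

Calling run_model(false) ends with use_after_free set, because the container is marked freed while the cdev kobject is still referenced. Calling run_model(true) ends with the container freed only after the cdev has been released. The model simplifies the ordering of the real driver (the old code called cdev_del() from the release function itself), so treat it as an illustration of the parent reference and not as a description of the exact failing path.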