Message ID | 1599741856-16239-3-git-send-email-mansur@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Series | Venus - Handle race conditions in concurrency |
Hi,

On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote:
> In concurrency usecase and reboot scenario we are
> observing race condition and seeing NULL pointer
> dereference crash. In shutdown path and system
> recovery path we are destroying the same mutex
> hence seeing crash.
>
> This case is handled by mutex protection and
> cancel delayed work items in work queue.
>
> Below is the call trace for the crash
> Call trace:
>  venus_remove+0xdc/0xec [venus_core]
>  venus_core_shutdown+0x1c/0x34 [venus_core]
>  platform_drv_shutdown+0x28/0x34
>  device_shutdown+0x154/0x1fc
>  kernel_restart_prepare+0x40/0x4c
>  kernel_restart+0x1c/0x64
>
> Call trace:
>  mutex_lock+0x34/0x60
>  venus_hfi_destroy+0x28/0x98 [venus_core]
>  hfi_destroy+0x1c/0x28 [venus_core]

I queued up [1] and after it this cannot happen anymore because
hfi_destroy() is not called by venus_sys_error_handler().

So I guess this patch is not needed anymore.

[1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html

>  venus_sys_error_handler+0x60/0x14c [venus_core]
>  process_one_work+0x210/0x3d0
>  worker_thread+0x248/0x3f4
>  kthread+0x11c/0x12c
>  ret_from_fork+0x10/0x18
>
> Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org>
> ---
>  drivers/media/platform/qcom/venus/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> index c5af428..69aa199 100644
> --- a/drivers/media/platform/qcom/venus/core.c
> +++ b/drivers/media/platform/qcom/venus/core.c
> @@ -323,6 +323,8 @@ static int venus_remove(struct platform_device *pdev)
>  	struct device *dev = core->dev;
>  	int ret;
>
> +	cancel_delayed_work_sync(&core->work);
> +
>  	ret = pm_runtime_get_sync(dev);
>  	WARN_ON(ret < 0);
>
> @@ -340,7 +342,9 @@ static int venus_remove(struct platform_device *pdev)
>  	if (pm_ops->core_put)
>  		pm_ops->core_put(dev);
>
> +	mutex_lock(&core->lock);
>  	hfi_destroy(core);
> +	mutex_unlock(&core->lock);
>
>  	icc_put(core->video_path);
>  	icc_put(core->cpucfg_path);
>
On 2020-09-11 15:52, Stanimir Varbanov wrote:
> Hi,
>
> On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote:
>> In concurrency usecase and reboot scenario we are
>> observing race condition and seeing NULL pointer
>> dereference crash. In shutdown path and system
>> recovery path we are destroying the same mutex
>> hence seeing crash.
>>
>> This case is handled by mutex protection and
>> cancel delayed work items in work queue.
>>
>> Below is the call trace for the crash
>> Call trace:
>>  venus_remove+0xdc/0xec [venus_core]
>>  venus_core_shutdown+0x1c/0x34 [venus_core]
>>  platform_drv_shutdown+0x28/0x34
>>  device_shutdown+0x154/0x1fc
>>  kernel_restart_prepare+0x40/0x4c
>>  kernel_restart+0x1c/0x64
>>
>> Call trace:
>>  mutex_lock+0x34/0x60
>>  venus_hfi_destroy+0x28/0x98 [venus_core]
>>  hfi_destroy+0x1c/0x28 [venus_core]
>
> I queued up [1] and after it this cannot happen anymore because
> hfi_destroy() is not called by venus_sys_error_handler().
>
> So I guess this patch is not needed anymore.
>
> [1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html
>

Yes, this patch is not needed any more.
rebased and posted new version
https://lore.kernel.org/patchwork/project/lkml/list/?series=463091

>>  venus_sys_error_handler+0x60/0x14c [venus_core]
>>  process_one_work+0x210/0x3d0
>>  worker_thread+0x248/0x3f4
>>  kthread+0x11c/0x12c
>>  ret_from_fork+0x10/0x18
>>
>> Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org>
>> ---
>>  drivers/media/platform/qcom/venus/core.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/media/platform/qcom/venus/core.c
>> b/drivers/media/platform/qcom/venus/core.c
>> index c5af428..69aa199 100644
>> --- a/drivers/media/platform/qcom/venus/core.c
>> +++ b/drivers/media/platform/qcom/venus/core.c
>> @@ -323,6 +323,8 @@ static int venus_remove(struct platform_device
>> *pdev)
>>  	struct device *dev = core->dev;
>>  	int ret;
>>
>> +	cancel_delayed_work_sync(&core->work);
>> +
>>  	ret = pm_runtime_get_sync(dev);
>>  	WARN_ON(ret < 0);
>>
>> @@ -340,7 +342,9 @@ static int venus_remove(struct platform_device
>> *pdev)
>>  	if (pm_ops->core_put)
>>  		pm_ops->core_put(dev);
>>
>> +	mutex_lock(&core->lock);
>>  	hfi_destroy(core);
>> +	mutex_unlock(&core->lock);
>>
>>  	icc_put(core->video_path);
>>  	icc_put(core->cpucfg_path);
>>
In concurrency usecase and reboot scenario we are
observing race condition and seeing NULL pointer
dereference crash. In shutdown path and system
recovery path we are destroying the same mutex
hence seeing crash.

This case is handled by mutex protection and
cancel delayed work items in work queue.

Below is the call trace for the crash
Call trace:
 venus_remove+0xdc/0xec [venus_core]
 venus_core_shutdown+0x1c/0x34 [venus_core]
 platform_drv_shutdown+0x28/0x34
 device_shutdown+0x154/0x1fc
 kernel_restart_prepare+0x40/0x4c
 kernel_restart+0x1c/0x64

Call trace:
 mutex_lock+0x34/0x60
 venus_hfi_destroy+0x28/0x98 [venus_core]
 hfi_destroy+0x1c/0x28 [venus_core]
 venus_sys_error_handler+0x60/0x14c [venus_core]
 process_one_work+0x210/0x3d0
 worker_thread+0x248/0x3f4
 kthread+0x11c/0x12c
 ret_from_fork+0x10/0x18

Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org>
---
 drivers/media/platform/qcom/venus/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index c5af428..69aa199 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -323,6 +323,8 @@ static int venus_remove(struct platform_device *pdev)
 	struct device *dev = core->dev;
 	int ret;
 
+	cancel_delayed_work_sync(&core->work);
+
 	ret = pm_runtime_get_sync(dev);
 	WARN_ON(ret < 0);
 
@@ -340,7 +342,9 @@ static int venus_remove(struct platform_device *pdev)
 	if (pm_ops->core_put)
 		pm_ops->core_put(dev);
 
+	mutex_lock(&core->lock);
 	hfi_destroy(core);
+	mutex_unlock(&core->lock);
 
 	icc_put(core->video_path);
 	icc_put(core->cpucfg_path);
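
The commit message describes two measures: cancel the pending work item before teardown, and hold the lock around the destroy call. The ordering can be illustrated with a minimal userspace sketch. This is an analogy under stated assumptions, not the driver code: a pthread stands in for the delayed work item, `pthread_join()` stands in for `cancel_delayed_work_sync()`, and every `demo_*` name is hypothetical.

```c
/* Userspace analogy only: pthreads stand in for the kernel workqueue
 * and mutex APIs. All demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_core {
	pthread_mutex_t lock;	/* plays the role of core->lock */
	pthread_t worker;	/* plays the role of the delayed work item */
	bool stop;		/* written and read only under lock while the worker runs */
	int *hfi_state;		/* plays the role of state freed by hfi_destroy() */
};

/* Recovery worker: touches hfi_state only while holding the lock. */
static void *demo_recovery_worker(void *arg)
{
	struct demo_core *core = arg;

	for (;;) {
		pthread_mutex_lock(&core->lock);
		if (core->stop) {
			pthread_mutex_unlock(&core->lock);
			break;
		}
		if (core->hfi_state)
			(*core->hfi_state)++;
		pthread_mutex_unlock(&core->lock);
		sched_yield();
	}
	return NULL;
}

static int demo_core_init(struct demo_core *core)
{
	core->stop = false;
	core->hfi_state = calloc(1, sizeof(*core->hfi_state));
	if (!core->hfi_state)
		return -1;
	if (pthread_mutex_init(&core->lock, NULL) != 0) {
		free(core->hfi_state);
		return -1;
	}
	if (pthread_create(&core->worker, NULL, demo_recovery_worker, core) != 0) {
		pthread_mutex_destroy(&core->lock);
		free(core->hfi_state);
		return -1;
	}
	return 0;
}

static int demo_core_remove(struct demo_core *core)
{
	/* Step 1: stop the worker and wait for it to finish, so nothing
	 * can run concurrently with the rest of the teardown. */
	pthread_mutex_lock(&core->lock);
	core->stop = true;
	pthread_mutex_unlock(&core->lock);
	if (pthread_join(core->worker, NULL) != 0)
		return -1;

	/* Step 2: release the shared state while holding the lock. */
	pthread_mutex_lock(&core->lock);
	assert(core->hfi_state != NULL);	/* guards against a double remove */
	free(core->hfi_state);
	core->hfi_state = NULL;
	pthread_mutex_unlock(&core->lock);

	/* Step 3: only now is it safe to destroy the mutex itself. */
	return pthread_mutex_destroy(&core->lock) == 0 ? 0 : -1;
}
```

Calling `demo_core_init()` followed by `demo_core_remove()` on a `struct demo_core` exercises the sequence. The point of the ordering is that the mutex is destroyed only after the one other thread that could take it has been joined, which is the property the two call traces in the commit message show being violated.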