Message ID | 1599741856-16239-2-git-send-email-mansur@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Venus - Handle race conditions in concurrency | expand |
On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote: > For core ops we are having only write protect but there > is no read protect, because of this in multthreading > and concurrency, one CPU core is reading without wait > which is causing the NULL pointer dereferece crash. > > one such scenario is as show below, where in one CPU > core, core->ops becoming NULL and in another CPU core > calling core->ops->session_init(). > > CPU: core-7: > Call trace: > hfi_session_init+0x180/0x1dc [venus_core] > vdec_queue_setup+0x9c/0x364 [venus_dec] > vb2_core_reqbufs+0x1e4/0x368 [videobuf2_common] > vb2_reqbufs+0x4c/0x64 [videobuf2_v4l2] > v4l2_m2m_reqbufs+0x50/0x84 [v4l2_mem2mem] > v4l2_m2m_ioctl_reqbufs+0x2c/0x38 [v4l2_mem2mem] > v4l_reqbufs+0x4c/0x5c > __video_do_ioctl+0x2b0/0x39c > > CPU: core-0: > Call trace: > venus_shutdown+0x98/0xfc [venus_core] > venus_sys_error_handler+0x64/0x148 [venus_core] > process_one_work+0x210/0x3d0 > worker_thread+0x248/0x3f4 > kthread+0x11c/0x12c > > Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org> > Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> > --- > Changes in V2: > - Addressed review comments by stan by validating on top > - of https://lore.kernel.org/patchwork/project/lkml/list/?series=455962 > > drivers/media/platform/qcom/venus/hfi.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c > index a59022a..3137071 100644 > --- a/drivers/media/platform/qcom/venus/hfi.c > +++ b/drivers/media/platform/qcom/venus/hfi.c > @@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create); > int hfi_session_init(struct venus_inst *inst, u32 pixfmt) > { > struct venus_core *core = inst->core; > - const struct hfi_ops *ops = core->ops; > + const struct hfi_ops *ops; > int ret; > If we are in system error recovery the session_init cannot pass successfully, so we exit early in the function. I'd suggest to make it: /* If core shutdown is in progress or we are in system error recovery, return an error */ mutex_lock(&core->lock); if (!core->ops || core->sys_error) { mutex_unclock(&core->lock); return -EIO; } mutex_unclock(&core->lock); > if (inst->state != INST_UNINIT) > @@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, u32 pixfmt) > inst->hfi_codec = to_codec_type(pixfmt); > reinit_completion(&inst->done); > > + mutex_lock(&core->lock); > + ops = core->ops; > ret = ops->session_init(inst, inst->session_type, inst->hfi_codec); > if (ret) > return ret; > > + mutex_unlock(&core->lock); > ret = wait_session_msg(inst); > if (ret) > return ret; >
On 2020-09-11 15:40, Stanimir Varbanov wrote: > On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote: >> For core ops we are having only write protect but there >> is no read protect, because of this in multthreading >> and concurrency, one CPU core is reading without wait >> which is causing the NULL pointer dereferece crash. >> >> one such scenario is as show below, where in one CPU >> core, core->ops becoming NULL and in another CPU core >> calling core->ops->session_init(). >> >> CPU: core-7: >> Call trace: >> hfi_session_init+0x180/0x1dc [venus_core] >> vdec_queue_setup+0x9c/0x364 [venus_dec] >> vb2_core_reqbufs+0x1e4/0x368 [videobuf2_common] >> vb2_reqbufs+0x4c/0x64 [videobuf2_v4l2] >> v4l2_m2m_reqbufs+0x50/0x84 [v4l2_mem2mem] >> v4l2_m2m_ioctl_reqbufs+0x2c/0x38 [v4l2_mem2mem] >> v4l_reqbufs+0x4c/0x5c >> __video_do_ioctl+0x2b0/0x39c >> >> CPU: core-0: >> Call trace: >> venus_shutdown+0x98/0xfc [venus_core] >> venus_sys_error_handler+0x64/0x148 [venus_core] >> process_one_work+0x210/0x3d0 >> worker_thread+0x248/0x3f4 >> kthread+0x11c/0x12c >> >> Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org> >> Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> >> --- >> Changes in V2: >> - Addressed review comments by stan by validating on top >> - of >> https://lore.kernel.org/patchwork/project/lkml/list/?series=455962 >> >> drivers/media/platform/qcom/venus/hfi.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/media/platform/qcom/venus/hfi.c >> b/drivers/media/platform/qcom/venus/hfi.c >> index a59022a..3137071 100644 >> --- a/drivers/media/platform/qcom/venus/hfi.c >> +++ b/drivers/media/platform/qcom/venus/hfi.c >> @@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create); >> int hfi_session_init(struct venus_inst *inst, u32 pixfmt) >> { >> struct venus_core *core = inst->core; >> - const struct hfi_ops *ops = core->ops; >> + const struct hfi_ops *ops; >> int ret; >> > > If we are in system error recovery the session_init cannot pass > successfully, so we exit early in the function. > > I'd suggest to make it: > > /* If core shutdown is in progress or we are in system error > recovery, > return an error */ > mutex_lock(&core->lock); > if (!core->ops || core->sys_error) { > mutex_unclock(&core->lock); > return -EIO; > } > mutex_unclock(&core->lock); > Tried above suggestion and ran the failed scenario, I didn't see any issue. Posted new version https://lore.kernel.org/patchwork/project/lkml/list/?series=463091 >> if (inst->state != INST_UNINIT) >> @@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, >> u32 pixfmt) >> inst->hfi_codec = to_codec_type(pixfmt); >> reinit_completion(&inst->done); >> >> + mutex_lock(&core->lock); >> + ops = core->ops; >> ret = ops->session_init(inst, inst->session_type, inst->hfi_codec); >> if (ret) >> return ret; >> >> + mutex_unlock(&core->lock); >> ret = wait_session_msg(inst); >> if (ret) >> return ret; >>
diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c index a59022a..3137071 100644 --- a/drivers/media/platform/qcom/venus/hfi.c +++ b/drivers/media/platform/qcom/venus/hfi.c @@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create); int hfi_session_init(struct venus_inst *inst, u32 pixfmt) { struct venus_core *core = inst->core; - const struct hfi_ops *ops = core->ops; + const struct hfi_ops *ops; int ret; if (inst->state != INST_UNINIT) @@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, u32 pixfmt) inst->hfi_codec = to_codec_type(pixfmt); reinit_completion(&inst->done); + mutex_lock(&core->lock); + ops = core->ops; ret = ops->session_init(inst, inst->session_type, inst->hfi_codec); if (ret) return ret; + mutex_unlock(&core->lock); ret = wait_session_msg(inst); if (ret) return ret;