Message ID | 9f481156-f220-4adf-b3d9-670871351e26@siemens.com (mailing list archive) |
---|---|
State | Accepted |
Series | remoteproc: k3-r5: Fix error handling when power-up failed |
On 19-08-2024 20:54, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> By simply bailing out, the driver was violating its rule and internal

Using device lifecycle managed functions to register the rproc
(devm_rproc_add()), bailing out with an error code will work.

> assumptions that either both or no rproc should be initialized. E.g.,
> this could cause the first core to be available but not the second one,
> leading to crashes on its shutdown later on while trying to dereference
> that second instance.
>
> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> index 39a47540c590..eb09d2e9b32a 100644
> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
>  			dev_err(dev,
>  				"Timed out waiting for %s core to power up!\n",
>  				rproc->name);
> -			return ret;
> +			goto err_powerup;
>  		}
>  	}
>
> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
>  		}
>  	}
>
> +err_powerup:
>  	rproc_del(rproc);

Please use devm_rproc_add() to avoid having to do rproc_del() manually here.

>  err_add:
>  	k3_r5_reserved_mem_exit(kproc);
On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>
> On 19-08-2024 20:54, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> By simply bailing out, the driver was violating its rule and internal
>
>
> Using device lifecycle managed functions to register the rproc
> (devm_rproc_add()), bailing out with an error code will work.
>
>> assumptions that either both or no rproc should be initialized. E.g.,
>> this could cause the first core to be available but not the second one,
>> leading to crashes on its shutdown later on while trying to dereference
>> that second instance.
>>
>> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
>> before powering up core1")
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>> index 39a47540c590..eb09d2e9b32a 100644
>> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
>> platform_device *pdev)
>>  			dev_err(dev,
>>  				"Timed out waiting for %s core to power up!\n",
>>  				rproc->name);
>> -			return ret;
>> +			goto err_powerup;
>>  		}
>>  	}
>> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
>> platform_device *pdev)
>>  		}
>>  	}
>> +err_powerup:
>>  	rproc_del(rproc);
>
>
> Please use devm_rproc_add() to avoid having to do rproc_del() manually
> here.

This is just the tip of the iceberg. The whole code needs to be
reworked accordingly so that we can drop these goto, not just this one.
Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
early in this regard, breaking current error handling additionally.

I'll stop my whac-a-mole. Someone needs to sit down and do that for the
complete code consistently. And test the error cases.

Jan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f3f11cfe890733373ddbb1ce8991ccd4ee5e79e1

>
>>  err_add:
>>  	k3_r5_reserved_mem_exit(kproc);
On 21-08-2024 23:40, Jan Kiszka wrote:
> On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>> On 19-08-2024 20:54, Jan Kiszka wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> By simply bailing out, the driver was violating its rule and internal
>>
>> Using device lifecycle managed functions to register the rproc
>> (devm_rproc_add()), bailing out with an error code will work.
>>
>>> assumptions that either both or no rproc should be initialized. E.g.,
>>> this could cause the first core to be available but not the second one,
>>> leading to crashes on its shutdown later on while trying to dereference
>>> that second instance.
>>>
>>> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
>>> before powering up core1")
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>> index 39a47540c590..eb09d2e9b32a 100644
>>> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
>>> platform_device *pdev)
>>>  			dev_err(dev,
>>>  				"Timed out waiting for %s core to power up!\n",
>>>  				rproc->name);
>>> -			return ret;
>>> +			goto err_powerup;
>>>  		}
>>>  	}
>>> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
>>> platform_device *pdev)
>>>  		}
>>>  	}
>>> +err_powerup:
>>>  	rproc_del(rproc);
>>
>> Please use devm_rproc_add() to avoid having to do rproc_del() manually
>> here.
> This is just the tip of the iceberg. The whole code needs to be
> reworked accordingly so that we can drop these goto, not just this one.

You are correct. Unfortunately, the organic growth of this driver has
resulted in a need to refactor. I plan on doing this and posting the
refactoring soon. This should be part of the overall refactoring as
suggested by Mathieu[2]. But for the immediate problem, your fix does
patch things up, hence:

Acked-by: Beleswar Padhi <b-padhi@ti.com>

[2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/

> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
> early in this regard, breaking current error handling additionally.

Curious, could you point out how the change in [1] breaks current error
handling?

>
> I'll stop my whac-a-mole. Someone needs to sit down and do that for the
> complete code consistently. And test the error cases.
>
> Jan
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f3f11cfe890733373ddbb1ce8991ccd4ee5e79e1
>
>>>  err_add:
>>>  	k3_r5_reserved_mem_exit(kproc);
On 22.08.24 07:22, Beleswar Prasad Padhi wrote:
>
> On 21-08-2024 23:40, Jan Kiszka wrote:
>> On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>>> On 19-08-2024 20:54, Jan Kiszka wrote:
>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>
>>>> By simply bailing out, the driver was violating its rule and internal
>>>
>>> Using device lifecycle managed functions to register the rproc
>>> (devm_rproc_add()), bailing out with an error code will work.
>>>
>>>> assumptions that either both or no rproc should be initialized. E.g.,
>>>> this could cause the first core to be available but not the second one,
>>>> leading to crashes on its shutdown later on while trying to dereference
>>>> that second instance.
>>>>
>>>> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
>>>> before powering up core1")
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>> index 39a47540c590..eb09d2e9b32a 100644
>>>> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>> platform_device *pdev)
>>>>  			dev_err(dev,
>>>>  				"Timed out waiting for %s core to power up!\n",
>>>>  				rproc->name);
>>>> -			return ret;
>>>> +			goto err_powerup;
>>>>  		}
>>>>  	}
>>>> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>> platform_device *pdev)
>>>>  		}
>>>>  	}
>>>> +err_powerup:
>>>>  	rproc_del(rproc);
>>>
>>> Please use devm_rproc_add() to avoid having to do rproc_del() manually
>>> here.
>> This is just the tip of the iceberg. The whole code needs to be
>> reworked accordingly so that we can drop these goto, not just this one.
>
>
> You are correct. Unfortunately, the organic growth of this driver has
> resulted in a need to refactor. I plan on doing this and posting the
> refactoring soon. This should be part of the overall refactoring as
> suggested by Mathieu[2]. But for the immediate problem, your fix does
> patch things up, hence:
>
> Acked-by: Beleswar Padhi <b-padhi@ti.com>
>
> [2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/
>
>> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
>> early in this regard, breaking current error handling additionally.
>
>
>
> Curious, could you point out how the change in [1] breaks current
> error handling?
>

Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via
return without that loop having been converted to support this.

Jan
On 22-08-2024 10:57, Jan Kiszka wrote:
> On 22.08.24 07:22, Beleswar Prasad Padhi wrote:
>> On 21-08-2024 23:40, Jan Kiszka wrote:
>>> On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>>>> On 19-08-2024 20:54, Jan Kiszka wrote:
>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>
>>>>> By simply bailing out, the driver was violating its rule and internal
>>>> Using device lifecycle managed functions to register the rproc
>>>> (devm_rproc_add()), bailing out with an error code will work.
>>>>
>>>>> assumptions that either both or no rproc should be initialized. E.g.,
>>>>> this could cause the first core to be available but not the second one,
>>>>> leading to crashes on its shutdown later on while trying to dereference
>>>>> that second instance.
>>>>>
>>>>> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
>>>>> before powering up core1")
>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>> ---
>>>>>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>> index 39a47540c590..eb09d2e9b32a 100644
>>>>> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>>> platform_device *pdev)
>>>>>  			dev_err(dev,
>>>>>  				"Timed out waiting for %s core to power up!\n",
>>>>>  				rproc->name);
>>>>> -			return ret;
>>>>> +			goto err_powerup;
>>>>>  		}
>>>>>  	}
>>>>> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>>> platform_device *pdev)
>>>>>  		}
>>>>>  	}
>>>>> +err_powerup:
>>>>>  	rproc_del(rproc);
>>>> Please use devm_rproc_add() to avoid having to do rproc_del() manually
>>>> here.
>>> This is just the tip of the iceberg. The whole code needs to be
>>> reworked accordingly so that we can drop these goto, not just this one.
>>
>> You are correct. Unfortunately, the organic growth of this driver has
>> resulted in a need to refactor. I plan on doing this and posting the
>> refactoring soon. This should be part of the overall refactoring as
>> suggested by Mathieu[2]. But for the immediate problem, your fix does
>> patch things up, hence:
>>
>> Acked-by: Beleswar Padhi <b-padhi@ti.com>
>>
>> [2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/
>>
>>> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
>>> early in this regard, breaking current error handling additionally.
>>
>>
>> Curious, could you point out how the change in [1] breaks current
>> error handling?
>>
> Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via
> return without that loop having been converted to support this.

The rproc has been allocated via devm_rproc_alloc[3] before the
return[4] at k3_r5_cluster_rproc_init. Thus, it is capable of freeing
the rproc just based on error codes. It was tested.

[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/remoteproc/ti_k3_r5_remoteproc.c#n1238
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/remoteproc/ti_k3_r5_remoteproc.c#n1259

>
> Jan
>
On 22.08.24 07:42, Beleswar Prasad Padhi wrote:
>
> On 22-08-2024 10:57, Jan Kiszka wrote:
>> On 22.08.24 07:22, Beleswar Prasad Padhi wrote:
>>> On 21-08-2024 23:40, Jan Kiszka wrote:
>>>> On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>>>>> On 19-08-2024 20:54, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>
>>>>>> By simply bailing out, the driver was violating its rule and internal
>>>>> Using device lifecycle managed functions to register the rproc
>>>>> (devm_rproc_add()), bailing out with an error code will work.
>>>>>
>>>>>> assumptions that either both or no rproc should be initialized. E.g.,
>>>>>> this could cause the first core to be available but not the second
>>>>>> one,
>>>>>> leading to crashes on its shutdown later on while trying to
>>>>>> dereference
>>>>>> that second instance.
>>>>>>
>>>>>> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
>>>>>> before powering up core1")
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> ---
>>>>>>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>>> b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>>> index 39a47540c590..eb09d2e9b32a 100644
>>>>>> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>>> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
>>>>>> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>>>> platform_device *pdev)
>>>>>>  			dev_err(dev,
>>>>>>  				"Timed out waiting for %s core to power up!\n",
>>>>>>  				rproc->name);
>>>>>> -			return ret;
>>>>>> +			goto err_powerup;
>>>>>>  		}
>>>>>>  	}
>>>>>> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
>>>>>> platform_device *pdev)
>>>>>>  		}
>>>>>>  	}
>>>>>> +err_powerup:
>>>>>>  	rproc_del(rproc);
>>>>> Please use devm_rproc_add() to avoid having to do rproc_del() manually
>>>>> here.
>>>> This is just the tip of the iceberg. The whole code needs to be
>>>> reworked accordingly so that we can drop these goto, not just this one.
>>>
>>> You are correct. Unfortunately, the organic growth of this driver has
>>> resulted in a need to refactor. I plan on doing this and posting the
>>> refactoring soon. This should be part of the overall refactoring as
>>> suggested by Mathieu[2]. But for the immediate problem, your fix does
>>> patch things up, hence:
>>>
>>> Acked-by: Beleswar Padhi <b-padhi@ti.com>
>>>
>>> [2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/
>>>
>>>> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
>>>> early in this regard, breaking current error handling additionally.
>>>
>>>
>>> Curious, could you point out how the change in [1] breaks current
>>> error handling?
>>>
>> Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via
>> return without that loop having been converted to support this.
>
>
> The rproc has been allocated via devm_rproc_alloc[3] before the

This is insufficient. Study again what the code currently does to roll
back. I'm not saying that this is the only way to do it, but you need
to change the code FIRST before introducing direct returns. And once you
can do that, you should obviously replace the existing gotos as well.

Jan

> return[4] at k3_r5_cluster_rproc_init. Thus, it is capable of freeing
> the rproc just based on error codes. It was tested.
> [3]:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/remoteproc/ti_k3_r5_remoteproc.c#n1238
> [4]:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/remoteproc/ti_k3_r5_remoteproc.c#n1259
>
>>
>> Jan
>>
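The objection above, that a direct return is only safe once every resource acquired so far is released automatically, can be illustrated with a minimal user-space sketch. Everything in it is an assumption made for illustration: `struct managed_dev`, `managed_add_action()`, `managed_release_all()` and `probe_partially_converted()` are hypothetical names, loosely modeled on the kernel's devres idea of recording release actions and running them in reverse order after a failed probe. It is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Hypothetical, simplified stand-in for a per-device list of release actions. */
struct managed_dev {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

/* Record a release action to run later (illustrative, not a kernel API). */
static int managed_add_action(struct managed_dev *dev, release_fn fn, void *data)
{
	assert(fn != NULL);
	if (dev->count == MAX_ACTIONS)
		return -1;
	dev->fn[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Run all recorded actions in reverse order of registration. */
static void managed_release_all(struct managed_dev *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->fn[dev->count](dev->data[dev->count]);
	}
}

static void clear_flag(void *data)
{
	*(int *)data = 0;
}

/*
 * Acquire two resources, then fail with a direct return. Only the first
 * resource is managed; the second still depends on manual cleanup that the
 * direct return skips.
 */
static int probe_partially_converted(struct managed_dev *dev,
				     int *managed_res, int *manual_res)
{
	*managed_res = 1;
	if (managed_add_action(dev, clear_flag, managed_res))
		return -1;

	*manual_res = 1;	/* no release action recorded for this one */

	return -1;		/* simulated failure, returned directly */
}
```

Calling `probe_partially_converted()` and then `managed_release_all()` clears the managed flag but leaves the manual one set, which is the kind of leftover state a partially converted error path can produce.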
On Thu, Aug 22, 2024 at 10:52:40AM +0530, Beleswar Prasad Padhi wrote:
>
> On 21-08-2024 23:40, Jan Kiszka wrote:
> > On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
> > > On 19-08-2024 20:54, Jan Kiszka wrote:
> > > > From: Jan Kiszka <jan.kiszka@siemens.com>
> > > >
> > > > By simply bailing out, the driver was violating its rule and internal
> > >
> > > Using device lifecycle managed functions to register the rproc
> > > (devm_rproc_add()), bailing out with an error code will work.
> > >
> > > > assumptions that either both or no rproc should be initialized. E.g.,
> > > > this could cause the first core to be available but not the second one,
> > > > leading to crashes on its shutdown later on while trying to dereference
> > > > that second instance.
> > > >
> > > > Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up
> > > > before powering up core1")
> > > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > > > ---
> > > >  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > > b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > > index 39a47540c590..eb09d2e9b32a 100644
> > > > --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > > +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> > > > @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct
> > > > platform_device *pdev)
> > > >  			dev_err(dev,
> > > >  				"Timed out waiting for %s core to power up!\n",
> > > >  				rproc->name);
> > > > -			return ret;
> > > > +			goto err_powerup;
> > > >  		}
> > > >  	}
> > > > @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct
> > > > platform_device *pdev)
> > > >  		}
> > > >  	}
> > > > +err_powerup:
> > > >  	rproc_del(rproc);
> > >
> > > Please use devm_rproc_add() to avoid having to do rproc_del() manually
> > > here.
> > This is just the tip of the iceberg. The whole code needs to be
> > reworked accordingly so that we can drop these goto, not just this one.
>
>
> You are correct. Unfortunately, the organic growth of this driver has
> resulted in a need to refactor. I plan on doing this and posting the
> refactoring soon. This should be part of the overall refactoring as
> suggested by Mathieu[2]. But for the immediate problem, your fix does patch
> things up, hence:
>
> Acked-by: Beleswar Padhi <b-padhi@ti.com>
>

I have applied this patch.

Thanks,
Mathieu

> [2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/
>
> > Just look at k3_r5_reserved_mem_init. Your change in [1] was also too
> > early in this regard, breaking current error handling additionally.
>
>
> Curious, could you point out how the change in [1] breaks current error
> handling?
>
> >
> > I'll stop my whac-a-mole. Someone needs to sit down and do that for the
> > complete code consistently. And test the error cases.
> >
> > Jan
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=f3f11cfe890733373ddbb1ce8991ccd4ee5e79e1
> >
> > > >  err_add:
> > > >  	k3_r5_reserved_mem_exit(kproc);
diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 39a47540c590..eb09d2e9b32a 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 			dev_err(dev,
 				"Timed out waiting for %s core to power up!\n",
 				rproc->name);
-			return ret;
+			goto err_powerup;
 		}
 	}
 
@@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 		}
 	}
 
+err_powerup:
 	rproc_del(rproc);
 err_add:
 	k3_r5_reserved_mem_exit(kproc);
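
The effect of switching from `return ret` to `goto err_powerup` can be shown with a small user-space model of the "either both cores or none" rule. This is a simplified sketch under stated assumptions, not the driver's code: `struct core`, `cluster_init()`, `core_unregister()`, `core_release_mem()` and `count_registered()` are hypothetical names, the power-up timeout is simulated via the `fail_at` parameter, and the unwind loop only approximates how the real function walks back over already initialized cores.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CORES 2

/* Hypothetical stand-in for one remote core; not the driver's real struct. */
struct core {
	bool mem_reserved;
	bool registered;
};

static void core_unregister(struct core *c)
{
	assert(c->registered);	/* only undo what was actually done */
	c->registered = false;
}

static void core_release_mem(struct core *c)
{
	c->mem_reserved = false;
}

/*
 * Bring up all cores or none. fail_at is the index of the core whose
 * power-up "times out" (-1 for no failure); use_goto selects the unwinding
 * path instead of the bare return.
 */
static int cluster_init(struct core cores[NUM_CORES], int fail_at, bool use_goto)
{
	int i, ret = 0;

	for (i = 0; i < NUM_CORES; i++) {
		cores[i].mem_reserved = true;
		cores[i].registered = true;

		if (i == fail_at) {
			ret = -110;	/* value of -ETIMEDOUT on Linux */
			if (!use_goto)
				return ret;	/* bare return: nothing is undone */
			goto err_powerup;
		}
	}
	return 0;

err_powerup:
	/* Unwind the failing core and every core set up before it. */
	for (; i >= 0; i--) {
		core_unregister(&cores[i]);
		core_release_mem(&cores[i]);
	}
	return ret;
}

/* Count how many cores are still registered after an init attempt. */
static int count_registered(const struct core cores[NUM_CORES])
{
	int i, n = 0;

	for (i = 0; i < NUM_CORES; i++)
		n += cores[i].registered ? 1 : 0;
	return n;
}
```

With `use_goto` set to false, a failure on the second core returns an error while both cores stay registered; with it set to true, the same failure leaves no core registered.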