ARM: OMAP2+: omap_device: add fail hook for runtime_pm when bad data is detected

Message ID 1386121153-32351-1-git-send-email-nm@ti.com (mailing list archive)
State New, archived

Commit Message

Nishanth Menon Dec. 4, 2013, 1:39 a.m. UTC
Due to the cross dependencies between hwmod for automanaged device
information for OMAP and dts node definitions, we can run into scenarios
where the dts node is defined, but its hwmod entry is yet to be
added. In these cases:
a) omap_device does not register a pm_domain (since it cannot find
   the hwmod entry).
b) The driver does not know about (a) and does a pm_runtime_get_sync,
   which never fails.
c) It then tries to do some operation on the device (such as reading the
   revision register as part of probe) without the clock or adequate OMAP
   generic PM operation having been performed to enable the module.

This causes a crash such as that reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=66441

When 'ti,hwmods' is provided in the dt node, it is expected that the device
will not function without OMAP's power automanagement. Hence, when
we hit a fail condition (due to hwmod entries not being present or another
similar scenario), fail at the pm_domain level due to lack of data and
provide enough information for it to be fixed. This also allows the driver
to take appropriate measures to prevent a crash.

Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
---
 arch/arm/mach-omap2/omap_device.c |   24 ++++++++++++++++++++++++
 arch/arm/mach-omap2/omap_device.h |    1 +
 2 files changed, 25 insertions(+)
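
The behaviour described in the commit message can be modelled in user space. The sketch below is only an illustration under stated assumptions: `struct device` and `struct dev_pm_domain` here are simplified stand-ins, and `model_pm_runtime_get_sync()` is a hypothetical toy, not the kernel's runtime PM core. It shows why a resume request cannot fail when no pm_domain is registered, and how a fail pm_domain turns the same request into -ENODEV.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct device;

struct dev_pm_domain {
	int (*runtime_resume)(struct device *dev);
};

struct device {
	const char *name;
	const struct dev_pm_domain *pm_domain;	/* NULL when none is registered */
};

/* Model of the fail hook: warn and refuse to resume. */
static int od_fail_runtime_resume(struct device *dev)
{
	fprintf(stderr, "%s: FIXME: missing hwmod/omap_dev info\n", dev->name);
	return -ENODEV;
}

static const struct dev_pm_domain fail_pm_domain = {
	.runtime_resume = od_fail_runtime_resume,
};

/*
 * Toy version of a resume request (assumption: only the pm_domain level is
 * modelled). With no pm_domain callback there is nothing that can fail, so
 * the request reports success.
 */
static int model_pm_runtime_get_sync(struct device *dev)
{
	if (dev->pm_domain && dev->pm_domain->runtime_resume)
		return dev->pm_domain->runtime_resume(dev);
	return 0;
}
```

In this model, a device with `pm_domain == NULL` gets 0 back from the resume request, while the same device pointed at `fail_pm_domain` gets -ENODEV plus a warning that names the device.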

Comments

Joel Fernandes Dec. 4, 2013, 8:08 a.m. UTC | #1
On 12/04/2013 07:09 AM, Nishanth Menon wrote:
> Due to the cross dependencies between hwmod for automanaged device
> information for OMAP and dts node definitions, we can run into scenarios
> where the dts node is defined, however it's hwmod entry is yet to be
> added. In these cases:
> a) omap_device does not register a pm_domain (since it cannot find
>    hwmod entry).
> b) driver does not know about (a), does a pm_runtime_get_sync which
>    never fails
> c) It then tries to do some operation on the device (such as read the
>   revision register (as part of probe) without clock or adequate OMAP
>   generic PM operation performed for enabling the module.
> 
> This causes a crash such as that reported in:
> https://bugzilla.kernel.org/show_bug.cgi?id=66441
> 
> When 'ti,hwmod' is provided in dt node, it is expected that the device
> will not function without the OMAP's power automanagement. Hence, when
> we hit a fail condition (due to hwmod entries not present or other
> similar scenario), fail at pm_domain level due to lack of data, provide
> enough information for it to be fixed, however, it allows for the driver
> to take appropriate measures to prevent crash.
> 
> Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/mach-omap2/omap_device.c |   24 ++++++++++++++++++++++++
>  arch/arm/mach-omap2/omap_device.h |    1 +
>  2 files changed, 25 insertions(+)
> 
> diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
> index 53f0735..e0a398c 100644
> --- a/arch/arm/mach-omap2/omap_device.c
> +++ b/arch/arm/mach-omap2/omap_device.c
> @@ -183,6 +183,10 @@ static int omap_device_build_from_dt(struct platform_device *pdev)
>  odbfd_exit1:
>  	kfree(hwmods);
>  odbfd_exit:
> +	/* if data/we are at fault.. load up a fail handler */
> +	if (ret)
> +		pdev->dev.pm_domain = &omap_device_fail_pm_domain;
> +
>  	return ret;
>  }
>  

Just wondering, can't we just print the warning here instead of registering new
pm_domain callbacks?

Concerned that all this LOC may end up being dead code when the "ti,hwmods"
property becomes obsolete anyway.

-Joel
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Dec. 4, 2013, 11:33 a.m. UTC | #2
On 12/04/2013 02:08 AM, Joel Fernandes wrote:
> On 12/04/2013 07:09 AM, Nishanth Menon wrote:
>> Due to the cross dependencies between hwmod for automanaged device
>> information for OMAP and dts node definitions, we can run into scenarios
>> where the dts node is defined, however it's hwmod entry is yet to be
>> added. In these cases:
>> a) omap_device does not register a pm_domain (since it cannot find
>>     hwmod entry).
>> b) driver does not know about (a), does a pm_runtime_get_sync which
>>     never fails
>> c) It then tries to do some operation on the device (such as read the
>>    revision register (as part of probe) without clock or adequate OMAP
>>    generic PM operation performed for enabling the module.
>>
>> This causes a crash such as that reported in:
>> https://bugzilla.kernel.org/show_bug.cgi?id=66441
>>
>> When 'ti,hwmod' is provided in dt node, it is expected that the device
>> will not function without the OMAP's power automanagement. Hence, when
>> we hit a fail condition (due to hwmod entries not present or other
>> similar scenario), fail at pm_domain level due to lack of data, provide
>> enough information for it to be fixed, however, it allows for the driver
>> to take appropriate measures to prevent crash.
>>
>> Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>> Signed-off-by: Nishanth Menon <nm@ti.com>
>> ---
>>   arch/arm/mach-omap2/omap_device.c |   24 ++++++++++++++++++++++++
>>   arch/arm/mach-omap2/omap_device.h |    1 +
>>   2 files changed, 25 insertions(+)
>>
>> diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
>> index 53f0735..e0a398c 100644
>> --- a/arch/arm/mach-omap2/omap_device.c
>> +++ b/arch/arm/mach-omap2/omap_device.c
>> @@ -183,6 +183,10 @@ static int omap_device_build_from_dt(struct platform_device *pdev)
>>   odbfd_exit1:
>>   	kfree(hwmods);
>>   odbfd_exit:
>> +	/* if data/we are at fault.. load up a fail handler */
>> +	if (ret)
>> +		pdev->dev.pm_domain = &omap_device_fail_pm_domain;
>> +
>>   	return ret;
>>   }
>>
>
> Just wondering, can't we just print the warning here instead of registering new
> pm_domain callbacks?
>

I suggest you might want to read the commit message again, but let's try
once again:

As you can see in the dmesg log
https://bugzilla.kernel.org/attachment.cgi?id=117311 pointed to in the bug
https://bugzilla.kernel.org/show_bug.cgi?id=66441,


you already have
"
[    0.176940] platform 4b501000.aes: Cannot lookup hwmod 'aes'
[    0.177215] platform 480a5000.des: Cannot lookup hwmod 'des'"

Now, printing that warning does not help, as I already explained in the 
commit log,
"
 >> b) driver does not know about (a), does a pm_runtime_get_sync which
 >>     never fails"

A device node stated it will have hwmod to adequately control it, but in
reality, as in this case, it does not. How does printing a warning alone
help a driver that is not aware of this? The driver's attempt at
pm_runtime_get_sync should fail, as that is what the "ti,hwmods" property
controls.


> Concerned that all this LOC may end up being dead code when the "ti,hwmods"
> property becomes obsolete anyway.

We detected a bug with 3.13-rc2; this is a fix for that kernel
(and probably a stable candidate too). The ti,hwmods property might
eventually become obsolete (and we are working towards that), but the
functionality that it provides today is necessary for the transition
from the mixed dt-hwmod world to a pure dt world. Remember, we are moving
from one data structure that describes hardware to another that
describes the same hardware in a different form. The kinds of bugs we see
now are expected to be fixed for the transition to be smooth for everyone.

Without adequate warnings, bugs like
https://bugzilla.kernel.org/show_bug.cgi?id=66441 will need pretty nasty
debugging.

I hope this helps convince you that the error code is worth the LoC.
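
To illustrate why an error code helps where a warning alone does not, here is a minimal user-space sketch of a probe routine. Everything in it is a hypothetical stand-in (`struct model_dev`, `model_get_sync()`, `model_probe()`); it is not code from any OMAP driver. It assumes the resume request fails with -ENODEV when hwmod data is missing, which is what the fail pm_domain provides.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct model_dev {
	bool has_hwmod_data;	/* false: dts node exists, hwmod entry missing */
	bool clocked;		/* set once runtime PM has enabled the module */
	bool touched_registers;	/* set when probe accesses the hardware */
};

/* Stand-in for the resume request once the fail pm_domain is installed. */
static int model_get_sync(struct model_dev *dev)
{
	if (!dev->has_hwmod_data)
		return -ENODEV;
	dev->clocked = true;
	return 0;
}

/* Probe that checks the runtime PM result before touching hardware. */
static int model_probe(struct model_dev *dev)
{
	int ret = model_get_sync(dev);

	if (ret < 0)
		return ret;	/* bail out: no register access on an unclocked module */

	dev->touched_registers = true;	/* e.g. read the revision register */
	return 0;
}
```

With the error propagated, the probe for a device lacking hwmod data returns before any register access, which is the step that crashed in the reported bug.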

--
Regards,
Nishanth Menon
Joel Fernandes Dec. 4, 2013, 12:44 p.m. UTC | #3
On 12/04/2013 05:03 PM, Nishanth Menon wrote:
> On 12/04/2013 02:08 AM, Joel Fernandes wrote:
>> On 12/04/2013 07:09 AM, Nishanth Menon wrote:
>>> Due to the cross dependencies between hwmod for automanaged device
>>> information for OMAP and dts node definitions, we can run into scenarios
>>> where the dts node is defined, however it's hwmod entry is yet to be
>>> added. In these cases:
>>> a) omap_device does not register a pm_domain (since it cannot find
>>>     hwmod entry).
>>> b) driver does not know about (a), does a pm_runtime_get_sync which
>>>     never fails
>>> c) It then tries to do some operation on the device (such as read the
>>>    revision register (as part of probe) without clock or adequate OMAP
>>>    generic PM operation performed for enabling the module.
>>>
>>> This causes a crash such as that reported in:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=66441
>>>
>>> When 'ti,hwmod' is provided in dt node, it is expected that the device
>>> will not function without the OMAP's power automanagement. Hence, when
>>> we hit a fail condition (due to hwmod entries not present or other
>>> similar scenario), fail at pm_domain level due to lack of data, provide
>>> enough information for it to be fixed, however, it allows for the driver
>>> to take appropriate measures to prevent crash.
>>>
>>> Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
>>> Signed-off-by: Nishanth Menon <nm@ti.com>
>>> ---
>>>   arch/arm/mach-omap2/omap_device.c |   24 ++++++++++++++++++++++++
>>>   arch/arm/mach-omap2/omap_device.h |    1 +
>>>   2 files changed, 25 insertions(+)
>>>
>>> diff --git a/arch/arm/mach-omap2/omap_device.c
>>> b/arch/arm/mach-omap2/omap_device.c
>>> index 53f0735..e0a398c 100644
>>> --- a/arch/arm/mach-omap2/omap_device.c
>>> +++ b/arch/arm/mach-omap2/omap_device.c
>>> @@ -183,6 +183,10 @@ static int omap_device_build_from_dt(struct
>>> platform_device *pdev)
>>>   odbfd_exit1:
>>>       kfree(hwmods);
>>>   odbfd_exit:
>>> +    /* if data/we are at fault.. load up a fail handler */
>>> +    if (ret)
>>> +        pdev->dev.pm_domain = &omap_device_fail_pm_domain;
>>> +
>>>       return ret;
>>>   }
>>>
>>
>> Just wondering, can't we just print the warning here instead of registering new
>> pm_domain callbacks?
>>
> 
> I suggest you might want to read the commit message again.. but lets try once
> again:

I know what your patch does and what the problem you're trying to solve is.. Was
just trying to see if there's a better way of doing what you're trying to do..

>>> b) driver does not know about (a), does a pm_runtime_get_sync which
>>>     never fails"
> 
> A device node stated it will have hwmod to adequately control it, but in
> reality, as in this case, it does not. how does printing a warning alone help
> the driver which is not aware of these? The driver's attempt at pm_runtime_sync
> should fail, as that is what "ti,hwmod" property controls.

Why not do the following?

Assign pm_domain as omap_device_pm_domain always regardless of error or not.

Then in _od_runtime_resume, check whether the od or hwmods exist. If not, print
the warning. That way you don't need to register additional special callbacks
just to print a warning, and it will probably be fewer LoC, fwiw.

That may be harder to do and may require additional checks in omap_device_enable
etc., not sure. In that case, your approach is certainly the next best way. Just
thought it's worth looking into :)
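
A rough user-space sketch of the alternative suggested above, for comparison only. The types and the `hwmods_cnt` field are hypothetical stand-ins, not the real omap_device layout, and the function is a model rather than the actual `_od_runtime_resume`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; field names are illustrative only. */
struct model_omap_device {
	int hwmods_cnt;
};

struct model_device {
	const char *name;
	struct model_omap_device *od;	/* NULL when the hwmod lookup failed */
};

/* One resume callback for every device: validate the data, then proceed. */
static int model_od_runtime_resume(struct model_device *dev)
{
	if (!dev->od || dev->od->hwmods_cnt == 0) {
		fprintf(stderr, "%s: missing hwmod/omap_dev info\n", dev->name);
		return -ENODEV;
	}
	/* The real callback would enable the module here. */
	return 0;
}
```

The trade-off is the one already noted: a single pm_domain avoids extra callbacks, but every path that dereferences the omap_device data would need a similar check.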

regards,

-Joel

Tony Lindgren Dec. 5, 2013, 7:03 p.m. UTC | #4
* Nishanth Menon <nm@ti.com> [131203 17:40]:
> Due to the cross dependencies between hwmod for automanaged device
> information for OMAP and dts node definitions, we can run into scenarios
> where the dts node is defined, however it's hwmod entry is yet to be
> added. In these cases:
> a) omap_device does not register a pm_domain (since it cannot find
>    hwmod entry).
> b) driver does not know about (a), does a pm_runtime_get_sync which
>    never fails
> c) It then tries to do some operation on the device (such as read the
>   revision register (as part of probe) without clock or adequate OMAP
>   generic PM operation performed for enabling the module.
> 
> This causes a crash such as that reported in:
> https://bugzilla.kernel.org/show_bug.cgi?id=66441
> 
> When 'ti,hwmod' is provided in dt node, it is expected that the device
> will not function without the OMAP's power automanagement. Hence, when
> we hit a fail condition (due to hwmod entries not present or other
> similar scenario), fail at pm_domain level due to lack of data, provide
> enough information for it to be fixed, however, it allows for the driver
> to take appropriate measures to prevent crash.

Kevin, any comments on this one?

Regards,

Tony
 
> Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/mach-omap2/omap_device.c |   24 ++++++++++++++++++++++++
>  arch/arm/mach-omap2/omap_device.h |    1 +
>  2 files changed, 25 insertions(+)
> 
> diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
> index 53f0735..e0a398c 100644
> --- a/arch/arm/mach-omap2/omap_device.c
> +++ b/arch/arm/mach-omap2/omap_device.c
> @@ -183,6 +183,10 @@ static int omap_device_build_from_dt(struct platform_device *pdev)
>  odbfd_exit1:
>  	kfree(hwmods);
>  odbfd_exit:
> +	/* if data/we are at fault.. load up a fail handler */
> +	if (ret)
> +		pdev->dev.pm_domain = &omap_device_fail_pm_domain;
> +
>  	return ret;
>  }
>  
> @@ -604,6 +608,19 @@ static int _od_runtime_resume(struct device *dev)
>  
>  	return pm_generic_runtime_resume(dev);
>  }
> +
> +static int _od_fail_runtime_suspend(struct device *dev)
> +{
> +	dev_warn(dev, "%s: FIXME: missing hwmod/omap_dev info\n", __func__);
> +	return -ENODEV;
> +}
> +
> +static int _od_fail_runtime_resume(struct device *dev)
> +{
> +	dev_warn(dev, "%s: FIXME: missing hwmod/omap_dev info\n", __func__);
> +	return -ENODEV;
> +}
> +
>  #endif
>  
>  #ifdef CONFIG_SUSPEND
> @@ -657,6 +674,13 @@ static int _od_resume_noirq(struct device *dev)
>  #define _od_resume_noirq NULL
>  #endif
>  
> +struct dev_pm_domain omap_device_fail_pm_domain = {
> +	.ops = {
> +		SET_RUNTIME_PM_OPS(_od_fail_runtime_suspend,
> +				   _od_fail_runtime_resume, NULL)
> +	}
> +};
> +
>  struct dev_pm_domain omap_device_pm_domain = {
>  	.ops = {
>  		SET_RUNTIME_PM_OPS(_od_runtime_suspend, _od_runtime_resume,
> diff --git a/arch/arm/mach-omap2/omap_device.h b/arch/arm/mach-omap2/omap_device.h
> index 17ca1ae..78c02b3 100644
> --- a/arch/arm/mach-omap2/omap_device.h
> +++ b/arch/arm/mach-omap2/omap_device.h
> @@ -29,6 +29,7 @@
>  #include "omap_hwmod.h"
>  
>  extern struct dev_pm_domain omap_device_pm_domain;
> +extern struct dev_pm_domain omap_device_fail_pm_domain;
>  
>  /* omap_device._state values */
>  #define OMAP_DEVICE_STATE_UNKNOWN	0
> -- 
> 1.7.9.5
> 
Kevin Hilman Dec. 9, 2013, 4:06 p.m. UTC | #5
Tony Lindgren <tony@atomide.com> writes:

> * Nishanth Menon <nm@ti.com> [131203 17:40]:
>> Due to the cross dependencies between hwmod for automanaged device
>> information for OMAP and dts node definitions, we can run into scenarios
>> where the dts node is defined, however it's hwmod entry is yet to be
>> added. In these cases:
>> a) omap_device does not register a pm_domain (since it cannot find
>>    hwmod entry).
>> b) driver does not know about (a), does a pm_runtime_get_sync which
>>    never fails
>> c) It then tries to do some operation on the device (such as read the
>>   revision register (as part of probe) without clock or adequate OMAP
>>   generic PM operation performed for enabling the module.
>> 
>> This causes a crash such as that reported in:
>> https://bugzilla.kernel.org/show_bug.cgi?id=66441
>> 
>> When 'ti,hwmod' is provided in dt node, it is expected that the device
>> will not function without the OMAP's power automanagement. Hence, when
>> we hit a fail condition (due to hwmod entries not present or other
>> similar scenario), fail at pm_domain level due to lack of data, provide
>> enough information for it to be fixed, however, it allows for the driver
>> to take appropriate measures to prevent crash.
>
> Kevin, any comments on this one?

Looks like a good approach to catch these corner cases.

Acked-by: Kevin Hilman <khilman@linaro.org>
Tony Lindgren Dec. 10, 2013, 5:30 p.m. UTC | #6
* Kevin Hilman <khilman@linaro.org> [131209 08:07]:
> Tony Lindgren <tony@atomide.com> writes:
> 
> > * Nishanth Menon <nm@ti.com> [131203 17:40]:
> >> Due to the cross dependencies between hwmod for automanaged device
> >> information for OMAP and dts node definitions, we can run into scenarios
> >> where the dts node is defined, however it's hwmod entry is yet to be
> >> added. In these cases:
> >> a) omap_device does not register a pm_domain (since it cannot find
> >>    hwmod entry).
> >> b) driver does not know about (a), does a pm_runtime_get_sync which
> >>    never fails
> >> c) It then tries to do some operation on the device (such as read the
> >>   revision register (as part of probe) without clock or adequate OMAP
> >>   generic PM operation performed for enabling the module.
> >> 
> >> This causes a crash such as that reported in:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=66441
> >> 
> >> When 'ti,hwmod' is provided in dt node, it is expected that the device
> >> will not function without the OMAP's power automanagement. Hence, when
> >> we hit a fail condition (due to hwmod entries not present or other
> >> similar scenario), fail at pm_domain level due to lack of data, provide
> >> enough information for it to be fixed, however, it allows for the driver
> >> to take appropriate measures to prevent crash.
> >
> > Kevin, any comments on this one?
> 
> Looks like a good approach to catch these corner cases.
> 
> Acked-by: Kevin Hilman <khilman@linaro.org>

Kevin, care to apply this directly?

Acked-by: Tony Lindgren <tony@atomide.com> 
Kevin Hilman Dec. 10, 2013, 5:41 p.m. UTC | #7
Tony Lindgren <tony@atomide.com> writes:

> * Kevin Hilman <khilman@linaro.org> [131209 08:07]:
>> Tony Lindgren <tony@atomide.com> writes:
>> 
>> > * Nishanth Menon <nm@ti.com> [131203 17:40]:
>> >> Due to the cross dependencies between hwmod for automanaged device
>> >> information for OMAP and dts node definitions, we can run into scenarios
>> >> where the dts node is defined, however it's hwmod entry is yet to be
>> >> added. In these cases:
>> >> a) omap_device does not register a pm_domain (since it cannot find
>> >>    hwmod entry).
>> >> b) driver does not know about (a), does a pm_runtime_get_sync which
>> >>    never fails
>> >> c) It then tries to do some operation on the device (such as read the
>> >>   revision register (as part of probe) without clock or adequate OMAP
>> >>   generic PM operation performed for enabling the module.
>> >> 
>> >> This causes a crash such as that reported in:
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=66441
>> >> 
>> >> When 'ti,hwmod' is provided in dt node, it is expected that the device
>> >> will not function without the OMAP's power automanagement. Hence, when
>> >> we hit a fail condition (due to hwmod entries not present or other
>> >> similar scenario), fail at pm_domain level due to lack of data, provide
>> >> enough information for it to be fixed, however, it allows for the driver
>> >> to take appropriate measures to prevent crash.
>> >
>> > Kevin, any comments on this one?
>> 
>> Looks like a good approach to catch these corner cases.
>> 
>> Acked-by: Kevin Hilman <khilman@linaro.org>
>
> Kevin, care to apply this directly?
>
> Acked-by: Tony Lindgren <tony@atomide.com> 

Applied.

Kevin
Patch

diff --git a/arch/arm/mach-omap2/omap_device.c b/arch/arm/mach-omap2/omap_device.c
index 53f0735..e0a398c 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -183,6 +183,10 @@  static int omap_device_build_from_dt(struct platform_device *pdev)
 odbfd_exit1:
 	kfree(hwmods);
 odbfd_exit:
+	/* if data/we are at fault.. load up a fail handler */
+	if (ret)
+		pdev->dev.pm_domain = &omap_device_fail_pm_domain;
+
 	return ret;
 }
 
@@ -604,6 +608,19 @@  static int _od_runtime_resume(struct device *dev)
 
 	return pm_generic_runtime_resume(dev);
 }
+
+static int _od_fail_runtime_suspend(struct device *dev)
+{
+	dev_warn(dev, "%s: FIXME: missing hwmod/omap_dev info\n", __func__);
+	return -ENODEV;
+}
+
+static int _od_fail_runtime_resume(struct device *dev)
+{
+	dev_warn(dev, "%s: FIXME: missing hwmod/omap_dev info\n", __func__);
+	return -ENODEV;
+}
+
 #endif
 
 #ifdef CONFIG_SUSPEND
@@ -657,6 +674,13 @@  static int _od_resume_noirq(struct device *dev)
 #define _od_resume_noirq NULL
 #endif
 
+struct dev_pm_domain omap_device_fail_pm_domain = {
+	.ops = {
+		SET_RUNTIME_PM_OPS(_od_fail_runtime_suspend,
+				   _od_fail_runtime_resume, NULL)
+	}
+};
+
 struct dev_pm_domain omap_device_pm_domain = {
 	.ops = {
 		SET_RUNTIME_PM_OPS(_od_runtime_suspend, _od_runtime_resume,
diff --git a/arch/arm/mach-omap2/omap_device.h b/arch/arm/mach-omap2/omap_device.h
index 17ca1ae..78c02b3 100644
--- a/arch/arm/mach-omap2/omap_device.h
+++ b/arch/arm/mach-omap2/omap_device.h
@@ -29,6 +29,7 @@ 
 #include "omap_hwmod.h"
 
 extern struct dev_pm_domain omap_device_pm_domain;
+extern struct dev_pm_domain omap_device_fail_pm_domain;
 
 /* omap_device._state values */
 #define OMAP_DEVICE_STATE_UNKNOWN	0