diff mbox

Boot hang regression 3.10.0-rc4 -> 3.10.0

Message ID 20130708133512.GD31221@arwen.pp.htv.fi (mailing list archive)
State New, archived
Headers show

Commit Message

Felipe Balbi July 8, 2013, 1:35 p.m. UTC
Hi,

On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
> >>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
> >>>>>> of -1 for the serial driver?
> >>>>>>
> >>>>>
> >>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
> >>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
> >>>
> >>> OK
> >>
> >> Issue 2: Causing boot to stop when serial driver is initialized.
> >> (After Issue 1 is fixed)
> >>
> >> I could narrow this down to the change done to return -EINVAL
> >> instead of 0 in serial_omap_get_context_loss_count() as part of
> >> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
> >> Fix device tree based PM runtime"
> >>
> >> What this change in turn seems to do is cause a
> >> serial_omap_restore_context() to get called as part of
> >> serial_omap_runtime_resume() which was not the case when
> >> serial_omap_get_context_loss_count() returned 0
> >>
> >> from serial_omap_runtime_resume():
> >> -----
> >>         int loss_cnt = serial_omap_get_context_loss_count(up);
> >>
> >>         if (loss_cnt < 0) {
> >>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
> >>                         loss_cnt);
> >>                 serial_omap_restore_context(up);
> >>         } else if (up->context_loss_cnt != loss_cnt) {
> >>                 serial_omap_restore_context(up);
> >>         }
> >> -----
> >>
> >> I am still working on why a serial_omap_restore_context() could
> >> have caused console to die. I will work with Sourav on this and
> >> post the fixes for both issue 1 and issue2 once its clear on whats
> >> really causing issue 2.
> > 
> > That's because we don't have the omap specific pdata callbacks for
> > context loss any longer. We may be able to detect when the context
> > was really lost in the serial driver, and only then call the
> > serial_omap_restore_context().
> 
> Right, but calling serial_omap_restore_context() even when the context
> is not lost, should not ideally cause an issue.

it does in one condition. If context hasn't been saved before. And that
can happen in the case of wrong pm runtime status for that device.

Imagine the device is marked as suspended even though it's fully enabled
(it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
your context structure is all zeroes (context has never been saved
before) then when you call pm_runtime_get_sync() on probe() your
->runtime_resume() will get called, which will restore context,
essentially undoing anything which was configured by u-boot.

Am I missing something ?

> >> Let me know if the fix I listed for Issue 1: makes sense.
> > 
> > Yes makes sense as a fix, but IMHO we should not need any workarounds
> > like that. Is the hwmod code idling the the uarts early? If so, then
> > it should only do that in a late_initcall if no drivers are registered.
> 
> hwmod as part of its setup (early) enables/resets and idles all modules.
> These flags are used to tell hwmod to avoid a reset and idle and leave the
> module enabled (in this case console uart)

then it needs to call pm_runtime_set_active() for those devices which
have that flag set, right ?

(completely untested, didn't even try to compile, just to illustrate)

Comments

Rajendra Nayak July 9, 2013, 5:33 a.m. UTC | #1
On Monday 08 July 2013 07:05 PM, Felipe Balbi wrote:
> Hi,
> 
> On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
>>>>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
>>>>>>>> of -1 for the serial driver?
>>>>>>>>
>>>>>>>
>>>>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
>>>>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
>>>>>
>>>>> OK
>>>>
>>>> Issue 2: Causing boot to stop when serial driver is initialized.
>>>> (After Issue 1 is fixed)
>>>>
>>>> I could narrow this down to the change done to return -EINVAL
>>>> instead of 0 in serial_omap_get_context_loss_count() as part of
>>>> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
>>>> Fix device tree based PM runtime"
>>>>
>>>> What this change in turn seems to do is cause a
>>>> serial_omap_restore_context() to get called as part of
>>>> serial_omap_runtime_resume() which was not the case when
>>>> serial_omap_get_context_loss_count() returned 0
>>>>
>>>> from serial_omap_runtime_resume():
>>>> -----
>>>>         int loss_cnt = serial_omap_get_context_loss_count(up);
>>>>
>>>>         if (loss_cnt < 0) {
>>>>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
>>>>                         loss_cnt);
>>>>                 serial_omap_restore_context(up);
>>>>         } else if (up->context_loss_cnt != loss_cnt) {
>>>>                 serial_omap_restore_context(up);
>>>>         }
>>>> -----
>>>>
>>>> I am still working on why a serial_omap_restore_context() could
>>>> have caused console to die. I will work with Sourav on this and
>>>> post the fixes for both issue 1 and issue2 once its clear on whats
>>>> really causing issue 2.
>>>
>>> That's because we don't have the omap specific pdata callbacks for
>>> context loss any longer. We may be able to detect when the context
>>> was really lost in the serial driver, and only then call the
>>> serial_omap_restore_context().
>>
>> Right, but calling serial_omap_restore_context() even when the context
>> is not lost, should not ideally cause an issue.
> 
> it does in one condition. If context hasn't been saved before. And that
> can happen in the case of wrong pm runtime status for that device.
> 
> Imagine the device is marked as suspended even though it's fully enabled
> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> your context structure is all zeroes (context has never been saved
> before) then when you call pm_runtime_get_sync() on probe() your
> ->runtime_resume() will get called, which will restore context,
> essentially undoing anything which was configured by u-boot.

This could be a problem for drivers which do a save context in ->runtime_suspend()
but from what I see with omap serial, there is no save context done as part of
->runtime_suspend.

> 
> Am I missing something ?
> 
>>>> Let me know if the fix I listed for Issue 1: makes sense.
>>>
>>> Yes makes sense as a fix, but IMHO we should not need any workarounds
>>> like that. Is the hwmod code idling the the uarts early? If so, then
>>> it should only do that in a late_initcall if no drivers are registered.
>>
>> hwmod as part of its setup (early) enables/resets and idles all modules.
>> These flags are used to tell hwmod to avoid a reset and idle and leave the
>> module enabled (in this case console uart)
> 
> then it needs to call pm_runtime_set_active() for those devices which
> have that flag set, right ?
> 
> (completely untested, didn't even try to compile, just to illustrate)
> 
> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> index 7341eff..d8dca68 100644
> --- a/arch/arm/mach-omap2/omap_hwmod.c
> +++ b/arch/arm/mach-omap2/omap_hwmod.c
> @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
>  	    (postsetup_state == _HWMOD_STATE_IDLE)) {
>  		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
>  		postsetup_state = _HWMOD_STATE_ENABLED;
> +
> +		/* tell pm_runtime this device is already active */
> +		pm_runtime_set_active(&oh->od->pdev->dev);
> +	} else {
> +		/* tell pm_runtime this device is trully suspended */
> +		pm_runtime_set_suspended(&oh->od->pdev->dev);
>  	}
>  
>  	if (postsetup_state == _HWMOD_STATE_IDLE)
>
Felipe Balbi July 9, 2013, 6:42 a.m. UTC | #2
Hi,

On Tue, Jul 09, 2013 at 11:03:54AM +0530, Rajendra Nayak wrote:
> On Monday 08 July 2013 07:05 PM, Felipe Balbi wrote:
> > Hi,
> > 
> > On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
> >>>>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
> >>>>>>>> of -1 for the serial driver?
> >>>>>>>>
> >>>>>>>
> >>>>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
> >>>>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
> >>>>>
> >>>>> OK
> >>>>
> >>>> Issue 2: Causing boot to stop when serial driver is initialized.
> >>>> (After Issue 1 is fixed)
> >>>>
> >>>> I could narrow this down to the change done to return -EINVAL
> >>>> instead of 0 in serial_omap_get_context_loss_count() as part of
> >>>> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
> >>>> Fix device tree based PM runtime"
> >>>>
> >>>> What this change in turn seems to do is cause a
> >>>> serial_omap_restore_context() to get called as part of
> >>>> serial_omap_runtime_resume() which was not the case when
> >>>> serial_omap_get_context_loss_count() returned 0
> >>>>
> >>>> from serial_omap_runtime_resume():
> >>>> -----
> >>>>         int loss_cnt = serial_omap_get_context_loss_count(up);
> >>>>
> >>>>         if (loss_cnt < 0) {
> >>>>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
> >>>>                         loss_cnt);
> >>>>                 serial_omap_restore_context(up);
> >>>>         } else if (up->context_loss_cnt != loss_cnt) {
> >>>>                 serial_omap_restore_context(up);
> >>>>         }
> >>>> -----
> >>>>
> >>>> I am still working on why a serial_omap_restore_context() could
> >>>> have caused console to die. I will work with Sourav on this and
> >>>> post the fixes for both issue 1 and issue2 once its clear on whats
> >>>> really causing issue 2.
> >>>
> >>> That's because we don't have the omap specific pdata callbacks for
> >>> context loss any longer. We may be able to detect when the context
> >>> was really lost in the serial driver, and only then call the
> >>> serial_omap_restore_context().
> >>
> >> Right, but calling serial_omap_restore_context() even when the context
> >> is not lost, should not ideally cause an issue.
> > 
> > it does in one condition. If context hasn't been saved before. And that
> > can happen in the case of wrong pm runtime status for that device.
> > 
> > Imagine the device is marked as suspended even though it's fully enabled
> > (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> > your context structure is all zeroes (context has never been saved
> > before) then when you call pm_runtime_get_sync() on probe() your
> > ->runtime_resume() will get called, which will restore context,
> > essentially undoing anything which was configured by u-boot.
> 
> This could be a problem for drivers which do a save context in ->runtime_suspend()
> but from what I see with omap serial, there is no save context done as part of
> ->runtime_suspend.

right, because context is "saved" in set_termios. probe() will get
called much before set_termios() has a chance to run, right ?

Same problem will trigger in that case.

I still think patch below is necessary

> > (completely untested, didn't even try to compile, just to illustrate)
> > 
> > diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> > index 7341eff..d8dca68 100644
> > --- a/arch/arm/mach-omap2/omap_hwmod.c
> > +++ b/arch/arm/mach-omap2/omap_hwmod.c
> > @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
> >  	    (postsetup_state == _HWMOD_STATE_IDLE)) {
> >  		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
> >  		postsetup_state = _HWMOD_STATE_ENABLED;
> > +
> > +		/* tell pm_runtime this device is already active */
> > +		pm_runtime_set_active(&oh->od->pdev->dev);
> > +	} else {
> > +		/* tell pm_runtime this device is trully suspended */
> > +		pm_runtime_set_suspended(&oh->od->pdev->dev);
> >  	}
> >  
> >  	if (postsetup_state == _HWMOD_STATE_IDLE)
Rajendra Nayak July 9, 2013, 7:19 a.m. UTC | #3
On Tuesday 09 July 2013 12:12 PM, Felipe Balbi wrote:
> Hi,
> 
> On Tue, Jul 09, 2013 at 11:03:54AM +0530, Rajendra Nayak wrote:
>> On Monday 08 July 2013 07:05 PM, Felipe Balbi wrote:
>>> Hi,
>>>
>>> On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
>>>>>>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
>>>>>>>>>> of -1 for the serial driver?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
>>>>>>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
>>>>>>>
>>>>>>> OK
>>>>>>
>>>>>> Issue 2: Causing boot to stop when serial driver is initialized.
>>>>>> (After Issue 1 is fixed)
>>>>>>
>>>>>> I could narrow this down to the change done to return -EINVAL
>>>>>> instead of 0 in serial_omap_get_context_loss_count() as part of
>>>>>> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
>>>>>> Fix device tree based PM runtime"
>>>>>>
>>>>>> What this change in turn seems to do is cause a
>>>>>> serial_omap_restore_context() to get called as part of
>>>>>> serial_omap_runtime_resume() which was not the case when
>>>>>> serial_omap_get_context_loss_count() returned 0
>>>>>>
>>>>>> from serial_omap_runtime_resume():
>>>>>> -----
>>>>>>         int loss_cnt = serial_omap_get_context_loss_count(up);
>>>>>>
>>>>>>         if (loss_cnt < 0) {
>>>>>>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
>>>>>>                         loss_cnt);
>>>>>>                 serial_omap_restore_context(up);
>>>>>>         } else if (up->context_loss_cnt != loss_cnt) {
>>>>>>                 serial_omap_restore_context(up);
>>>>>>         }
>>>>>> -----
>>>>>>
>>>>>> I am still working on why a serial_omap_restore_context() could
>>>>>> have caused console to die. I will work with Sourav on this and
>>>>>> post the fixes for both issue 1 and issue2 once its clear on whats
>>>>>> really causing issue 2.
>>>>>
>>>>> That's because we don't have the omap specific pdata callbacks for
>>>>> context loss any longer. We may be able to detect when the context
>>>>> was really lost in the serial driver, and only then call the
>>>>> serial_omap_restore_context().
>>>>
>>>> Right, but calling serial_omap_restore_context() even when the context
>>>> is not lost, should not ideally cause an issue.
>>>
>>> it does in one condition. If context hasn't been saved before. And that
>>> can happen in the case of wrong pm runtime status for that device.
>>>
>>> Imagine the device is marked as suspended even though it's fully enabled
>>> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
>>> your context structure is all zeroes (context has never been saved
>>> before) then when you call pm_runtime_get_sync() on probe() your
>>> ->runtime_resume() will get called, which will restore context,
>>> essentially undoing anything which was configured by u-boot.
>>
>> This could be a problem for drivers which do a save context in ->runtime_suspend()
>> but from what I see with omap serial, there is no save context done as part of
>> ->runtime_suspend.
> 
> right, because context is "saved" in set_termios. probe() will get
> called much before set_termios() has a chance to run, right ?
> 
> Same problem will trigger in that case.
> 
> I still think patch below is necessary

Right, I'll try some on those lines. Looks like a pm_runtime_set_active() is done
for the console in the non DT case in omap_serial_init_port(). It seems to be
missing in the DT case.

Although I feel this should fix the issue we have right now, I wonder if there could
ever be a case with uart being suspended and having to resume again before a
set_termios? What I mean to ask is, if the omap serial driver assuming a resume to
happen only post a set_termios is always valid.

> 
>>> (completely untested, didn't even try to compile, just to illustrate)
>>>
>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
>>> index 7341eff..d8dca68 100644
>>> --- a/arch/arm/mach-omap2/omap_hwmod.c
>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c
>>> @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
>>>  	    (postsetup_state == _HWMOD_STATE_IDLE)) {
>>>  		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
>>>  		postsetup_state = _HWMOD_STATE_ENABLED;
>>> +
>>> +		/* tell pm_runtime this device is already active */
>>> +		pm_runtime_set_active(&oh->od->pdev->dev);
>>> +	} else {
>>> +		/* tell pm_runtime this device is trully suspended */
>>> +		pm_runtime_set_suspended(&oh->od->pdev->dev);
>>>  	}
>>>  
>>>  	if (postsetup_state == _HWMOD_STATE_IDLE)
>
Felipe Balbi July 9, 2013, 7:40 a.m. UTC | #4
Hi,

On Tue, Jul 09, 2013 at 12:49:10PM +0530, Rajendra Nayak wrote:
> >>>> Right, but calling serial_omap_restore_context() even when the context
> >>>> is not lost, should not ideally cause an issue.
> >>>
> >>> it does in one condition. If context hasn't been saved before. And that
> >>> can happen in the case of wrong pm runtime status for that device.
> >>>
> >>> Imagine the device is marked as suspended even though it's fully enabled
> >>> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> >>> your context structure is all zeroes (context has never been saved
> >>> before) then when you call pm_runtime_get_sync() on probe() your
> >>> ->runtime_resume() will get called, which will restore context,
> >>> essentially undoing anything which was configured by u-boot.
> >>
> >> This could be a problem for drivers which do a save context in ->runtime_suspend()
> >> but from what I see with omap serial, there is no save context done as part of
> >> ->runtime_suspend.
> > 
> > right, because context is "saved" in set_termios. probe() will get
> > called much before set_termios() has a chance to run, right ?
> > 
> > Same problem will trigger in that case.
> > 
> > I still think patch below is necessary
> 
> Right, I'll try some on those lines. Looks like a
> pm_runtime_set_active() is done for the console in the non DT case in
> omap_serial_init_port(). It seems to be missing in the DT case.

yeah, based on conversation I had with Alan Stern a while back,
pm_runtime_set_*() is best called by the underlying 'bus' and I tend to
agree with him on that.

Our HWMOD being our 'bus' (not really, but still), is the entity which
knows if the device is powered or not and should make sure to call
pm_runtime_set_*() for all devices.

> Although I feel this should fix the issue we have right now, I wonder
> if there could ever be a case with uart being suspended and having to
> resume again before a set_termios? What I mean to ask is, if the omap
> serial driver assuming a resume to happen only post a set_termios is
> always valid.

Meaning, probe() -> suspend() -> resume() ? I don't think that would
ever happen. ->set_termios() would always get called and that's actually
what triggers the resume.
Grygorii Strashko July 9, 2013, 6:59 p.m. UTC | #5
Hi,

On 07/09/2013 09:42 AM, Felipe Balbi wrote:
> Hi,
>
> On Tue, Jul 09, 2013 at 11:03:54AM +0530, Rajendra Nayak wrote:
>> On Monday 08 July 2013 07:05 PM, Felipe Balbi wrote:
>>> Hi,
>>>
>>> On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
>>>>>>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
>>>>>>>>>> of -1 for the serial driver?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
>>>>>>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
>>>>>>>
>>>>>>> OK
>>>>>>
>>>>>> Issue 2: Causing boot to stop when serial driver is initialized.
>>>>>> (After Issue 1 is fixed)
>>>>>>
>>>>>> I could narrow this down to the change done to return -EINVAL
>>>>>> instead of 0 in serial_omap_get_context_loss_count() as part of
>>>>>> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
>>>>>> Fix device tree based PM runtime"
>>>>>>
>>>>>> What this change in turn seems to do is cause a
>>>>>> serial_omap_restore_context() to get called as part of
>>>>>> serial_omap_runtime_resume() which was not the case when
>>>>>> serial_omap_get_context_loss_count() returned 0
>>>>>>
>>>>>> from serial_omap_runtime_resume():
>>>>>> -----
>>>>>>          int loss_cnt = serial_omap_get_context_loss_count(up);
>>>>>>
>>>>>>          if (loss_cnt < 0) {
>>>>>>                  dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
>>>>>>                          loss_cnt);
>>>>>>                  serial_omap_restore_context(up);
>>>>>>          } else if (up->context_loss_cnt != loss_cnt) {
>>>>>>                  serial_omap_restore_context(up);
>>>>>>          }
>>>>>> -----
>>>>>>
>>>>>> I am still working on why a serial_omap_restore_context() could
>>>>>> have caused console to die. I will work with Sourav on this and
>>>>>> post the fixes for both issue 1 and issue2 once its clear on whats
>>>>>> really causing issue 2.
>>>>>
>>>>> That's because we don't have the omap specific pdata callbacks for
>>>>> context loss any longer. We may be able to detect when the context
>>>>> was really lost in the serial driver, and only then call the
>>>>> serial_omap_restore_context().
>>>>
>>>> Right, but calling serial_omap_restore_context() even when the context
>>>> is not lost, should not ideally cause an issue.
>>>
>>> it does in one condition. If context hasn't been saved before. And that
>>> can happen in the case of wrong pm runtime status for that device.
>>>
>>> Imagine the device is marked as suspended even though it's fully enabled
>>> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
>>> your context structure is all zeroes (context has never been saved
>>> before) then when you call pm_runtime_get_sync() on probe() your
>>> ->runtime_resume() will get called, which will restore context,
>>> essentially undoing anything which was configured by u-boot.
>>
>> This could be a problem for drivers which do a save context in ->runtime_suspend()
>> but from what I see with omap serial, there is no save context done as part of
>> ->runtime_suspend.
>
> right, because context is "saved" in set_termios. probe() will get
> called much before set_termios() has a chance to run, right ?
>
> Same problem will trigger in that case.
>
> I still think patch below is necessary
>
>>> (completely untested, didn't even try to compile, just to illustrate)
>>>
>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
>>> index 7341eff..d8dca68 100644
>>> --- a/arch/arm/mach-omap2/omap_hwmod.c
>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c
>>> @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
>>>   	    (postsetup_state == _HWMOD_STATE_IDLE)) {
>>>   		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
>>>   		postsetup_state = _HWMOD_STATE_ENABLED;
>>> +
>>> +		/* tell pm_runtime this device is already active */
>>> +		pm_runtime_set_active(&oh->od->pdev->dev);
>>> +	} else {
>>> +		/* tell pm_runtime this device is trully suspended */
>>> +		pm_runtime_set_suspended(&oh->od->pdev->dev);
>>>   	}
>>>
>>>   	if (postsetup_state == _HWMOD_STATE_IDLE)
>

This will not work - _setup_postsetup() is called from core_initcall 
level and OMAP devices have not been created at this moment yet 
(of_platform_populate() is called from
customize_machine->init_machine->omap_generic_init() at arch_initcall time)

More over, I don't recommend to depend on hwmod->od field - it's been 
created to support OPPs and it's obsolete now in case of DT use.

Seems, This issue need to be handled in driver for DT boot use case, 
possibly cmdline need to be parsed in the same way as it's done in
omap_serial_early_init().

Regards,
-grygorii
Felipe Balbi July 9, 2013, 7:41 p.m. UTC | #6
Hi,

On Tue, Jul 09, 2013 at 09:59:28PM +0300, Grygorii Strashko wrote:
> >>>Imagine the device is marked as suspended even though it's fully enabled
> >>>(it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> >>>your context structure is all zeroes (context has never been saved
> >>>before) then when you call pm_runtime_get_sync() on probe() your
> >>>->runtime_resume() will get called, which will restore context,
> >>>essentially undoing anything which was configured by u-boot.
> >>
> >>This could be a problem for drivers which do a save context in ->runtime_suspend()
> >>but from what I see with omap serial, there is no save context done as part of
> >>->runtime_suspend.
> >
> >right, because context is "saved" in set_termios. probe() will get
> >called much before set_termios() has a chance to run, right ?
> >
> >Same problem will trigger in that case.
> >
> >I still think patch below is necessary
> >
> >>>(completely untested, didn't even try to compile, just to illustrate)
> >>>
> >>>diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> >>>index 7341eff..d8dca68 100644
> >>>--- a/arch/arm/mach-omap2/omap_hwmod.c
> >>>+++ b/arch/arm/mach-omap2/omap_hwmod.c
> >>>@@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
> >>>  	    (postsetup_state == _HWMOD_STATE_IDLE)) {
> >>>  		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
> >>>  		postsetup_state = _HWMOD_STATE_ENABLED;
> >>>+
> >>>+		/* tell pm_runtime this device is already active */
> >>>+		pm_runtime_set_active(&oh->od->pdev->dev);
> >>>+	} else {
> >>>+		/* tell pm_runtime this device is trully suspended */
> >>>+		pm_runtime_set_suspended(&oh->od->pdev->dev);
> >>>  	}
> >>>
> >>>  	if (postsetup_state == _HWMOD_STATE_IDLE)
> >
> 
> This will not work - _setup_postsetup() is called from core_initcall
> level and OMAP devices have not been created at this moment yet
> (of_platform_populate() is called from
> customize_machine->init_machine->omap_generic_init() at arch_initcall time)

fair enough, but something *like* that needs to be done. If pm_runtime
doesn't know the state of the device by the time pm_runtime_enable() is
called, the wrong assumptions might be made and we will forever have
such problems as our ->runtime_resume() callback being called when it
shouldn't.

> More over, I don't recommend to depend on hwmod->od field - it's been
> created to support OPPs and it's obsolete now in case of DT use.

that's alright, but still we need something similar.

But in any case, if on DT boot that's not used, than *what* uses
HWMOD_INIT_NO_IDLE during DT boot ?

There's a single place in kernel source which checks if
HWMOD_INIT_NO_IDLE is set, and that's the place which I patched.

We, certainly, need a way to tell pm_runtime if the device is active or
suspended by the time we reach our probe() function. Either we assume
*all* devices are active and we blindly call pm_runtime_set_active() for
all devices, or we assume *all* devices are suspended as we call
pm_runtime_set_suspend() for all devices, or we figure out which ones
are active and which are not and call pm_runtime_set_{active,suspend}()
conditionally.

> Seems, This issue need to be handled in driver for DT boot use case,
> possibly cmdline need to be parsed in the same way as it's done in
> omap_serial_early_init().

so you want *every* single driver to parse their own cmdline ? How big
would the cmdline become ? This makes no sense.
Kevin Hilman July 10, 2013, 8:22 a.m. UTC | #7
Felipe Balbi <balbi@ti.com> writes:

> Hi,
>
> On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
>> >>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
>> >>>>>> of -1 for the serial driver?
>> >>>>>>
>> >>>>>
>> >>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
>> >>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
>> >>>
>> >>> OK
>> >>
>> >> Issue 2: Causing boot to stop when serial driver is initialized.
>> >> (After Issue 1 is fixed)
>> >>
>> >> I could narrow this down to the change done to return -EINVAL
>> >> instead of 0 in serial_omap_get_context_loss_count() as part of
>> >> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
>> >> Fix device tree based PM runtime"
>> >>
>> >> What this change in turn seems to do is cause a
>> >> serial_omap_restore_context() to get called as part of
>> >> serial_omap_runtime_resume() which was not the case when
>> >> serial_omap_get_context_loss_count() returned 0
>> >>
>> >> from serial_omap_runtime_resume():
>> >> -----
>> >>         int loss_cnt = serial_omap_get_context_loss_count(up);
>> >>
>> >>         if (loss_cnt < 0) {
>> >>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
>> >>                         loss_cnt);
>> >>                 serial_omap_restore_context(up);
>> >>         } else if (up->context_loss_cnt != loss_cnt) {
>> >>                 serial_omap_restore_context(up);
>> >>         }
>> >> -----
>> >>
>> >> I am still working on why a serial_omap_restore_context() could
>> >> have caused console to die. I will work with Sourav on this and
>> >> post the fixes for both issue 1 and issue2 once its clear on whats
>> >> really causing issue 2.
>> > 
>> > That's because we don't have the omap specific pdata callbacks for
>> > context loss any longer. We may be able to detect when the context
>> > was really lost in the serial driver, and only then call the
>> > serial_omap_restore_context().
>> 
>> Right, but calling serial_omap_restore_context() even when the context
>> is not lost, should not ideally cause an issue.
>
> it does in one condition. If context hasn't been saved before. And that
> can happen in the case of wrong pm runtime status for that device.
>
> Imagine the device is marked as suspended even though it's fully enabled
> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> your context structure is all zeroes (context has never been saved
> before) then when you call pm_runtime_get_sync() on probe() your
> ->runtime_resume() will get called, which will restore context,
> essentially undoing anything which was configured by u-boot.
>
> Am I missing something ?

You're right, the _set_active() is crucial in the case when we prevent
the console UART from idling during boot (though that shouldn't be
happening in mainline unless the fix for "Issue 1" is done.)

Kevin
Tony Lindgren July 10, 2013, 12:10 p.m. UTC | #8
* Kevin Hilman <khilman@linaro.org> [130710 01:29]:
> Felipe Balbi <balbi@ti.com> writes:
> 
> > Hi,
> >
> > On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
> >> >>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
> >> >>>>>> of -1 for the serial driver?
> >> >>>>>>
> >> >>>>>
> >> >>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
> >> >>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
> >> >>>
> >> >>> OK
> >> >>
> >> >> Issue 2: Causing boot to stop when serial driver is initialized.
> >> >> (After Issue 1 is fixed)
> >> >>
> >> >> I could narrow this down to the change done to return -EINVAL
> >> >> instead of 0 in serial_omap_get_context_loss_count() as part of
> >> >> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
> >> >> Fix device tree based PM runtime"
> >> >>
> >> >> What this change in turn seems to do is cause a
> >> >> serial_omap_restore_context() to get called as part of
> >> >> serial_omap_runtime_resume() which was not the case when
> >> >> serial_omap_get_context_loss_count() returned 0
> >> >>
> >> >> from serial_omap_runtime_resume():
> >> >> -----
> >> >>         int loss_cnt = serial_omap_get_context_loss_count(up);
> >> >>
> >> >>         if (loss_cnt < 0) {
> >> >>                 dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
> >> >>                         loss_cnt);
> >> >>                 serial_omap_restore_context(up);
> >> >>         } else if (up->context_loss_cnt != loss_cnt) {
> >> >>                 serial_omap_restore_context(up);
> >> >>         }
> >> >> -----
> >> >>
> >> >> I am still working on why a serial_omap_restore_context() could
> >> >> have caused console to die. I will work with Sourav on this and
> >> >> post the fixes for both issue 1 and issue2 once its clear on whats
> >> >> really causing issue 2.
> >> > 
> >> > That's because we don't have the omap specific pdata callbacks for
> >> > context loss any longer. We may be able to detect when the context
> >> > was really lost in the serial driver, and only then call the
> >> > serial_omap_restore_context().
> >> 
> >> Right, but calling serial_omap_restore_context() even when the context
> >> is not lost, should not ideally cause an issue.
> >
> > it does in one condition. If context hasn't been saved before. And that
> > can happen in the case of wrong pm runtime status for that device.
> >
> > Imagine the device is marked as suspended even though it's fully enabled
> > (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> > your context structure is all zeroes (context has never been saved
> > before) then when you call pm_runtime_get_sync() on probe() your
> > ->runtime_resume() will get called, which will restore context,
> > essentially undoing anything which was configured by u-boot.
> >
> > Am I missing something ?
> 
> You're right, the _set_active() is crucial in the case when we prevent
> the console UART from idling during boot (though that shouldn't be
> happening in mainline unless the fix for "Issue 1" is done.)

OK so how should we fix this?

If the problem is hwmod automatically resetting things early while
earlycon is running, then the reset should happen later.

If the problem is omap-serial.c idling things while earlycon is
running, then omap-serial.c should be fixed.

In any case we want mach-omap2/serial.c out of the way completely
for DT based booting.

Regards,

Tony
Grygorii Strashko July 10, 2013, 12:16 p.m. UTC | #9
On 07/09/2013 10:41 PM, Felipe Balbi wrote:
> Hi,
>
> On Tue, Jul 09, 2013 at 09:59:28PM +0300, Grygorii Strashko wrote:
>>>>> Imagine the device is marked as suspended even though it's fully enabled
>>>>> (it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
>>>>> your context structure is all zeroes (context has never been saved
>>>>> before) then when you call pm_runtime_get_sync() on probe() your
>>>>> ->runtime_resume() will get called, which will restore context,
>>>>> essentially undoing anything which was configured by u-boot.
>>>>
>>>> This could be a problem for drivers which do a save context in ->runtime_suspend()
>>>> but from what I see with omap serial, there is no save context done as part of
>>>> ->runtime_suspend.
>>>
>>> right, because context is "saved" in set_termios. probe() will get
>>> called much before set_termios() has a chance to run, right ?
>>>
>>> Same problem will trigger in that case.
>>>
>>> I still think patch below is necessary
>>>
>>>>> (completely untested, didn't even try to compile, just to illustrate)
>>>>>
>>>>> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
>>>>> index 7341eff..d8dca68 100644
>>>>> --- a/arch/arm/mach-omap2/omap_hwmod.c
>>>>> +++ b/arch/arm/mach-omap2/omap_hwmod.c
>>>>> @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
>>>>>   	    (postsetup_state == _HWMOD_STATE_IDLE)) {
>>>>>   		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
>>>>>   		postsetup_state = _HWMOD_STATE_ENABLED;
>>>>> +
>>>>> +		/* tell pm_runtime this device is already active */
>>>>> +		pm_runtime_set_active(&oh->od->pdev->dev);
>>>>> +	} else {
>>>>> +		/* tell pm_runtime this device is trully suspended */
>>>>> +		pm_runtime_set_suspended(&oh->od->pdev->dev);
>>>>>   	}
>>>>>
>>>>>   	if (postsetup_state == _HWMOD_STATE_IDLE)
>>>
>>
>> This will not work - _setup_postsetup() is called from core_initcall
>> level and OMAP devices have not been created at this moment yet
>> (of_platform_populate() is called from
>> customize_machine->init_machine->omap_generic_init() at arch_initcall time)
>
> fair enough, but something *like* that needs to be done. If pm_runtime
> doesn't know the state of the device by the time pm_runtime_enable() is
> called, the wrong assumptions might be made and we will forever have
> such problems as our ->runtime_resume() callback being called when it
> shouldn't.
>
>> More over, I don't recommend to depend on hwmod->od field - it's been
>> created to support OPPs and it's obsolete now in case of DT use.
>
> that's alright, but still we need something similar.
>
> But in any case, if on DT boot that's not used, than *what* uses
> HWMOD_INIT_NO_IDLE during DT boot ?
>
> There's a single place in kernel source which checks if
> HWMOD_INIT_NO_IDLE is set, and that's the place which I patched.
>
> We, certainly, need a way to tell pm_runtime if the device is active or
> suspended by the time we reach our probe() function. Either we assume
> *all* devices are active and we blindly call pm_runtime_set_active() for
> all devices, or we assume *all* devices are suspended as we call
> pm_runtime_set_suspend() for all devices, or we figure out which ones
> are active and which are not and call pm_runtime_set_{active,suspend}()
> conditionally.
>
>> Seems, This issue need to be handled in driver for DT boot use case,
>> possibly cmdline need to be parsed in the same way as it's done in
>> omap_serial_early_init().
>
> so you want *every* single driver to parse their own cmdline ? How big
> would the cmdline become ? This makes no sense.
>

Agree here. Seems there two similar issues:
- this one with omap_serial
- GPIO reset issue http://www.spinics.net/lists/linux-omap/msg84186.html
   (similar issue is observed on OMAP4460+TPS6236x)

And I think, that HWMOD_INIT_NO_IDLE and HWMOD_INIT_NO_RESET flags can't 
be treated as
SoC (IP) specific parameters any more - they should be board specific.

I think, some kind of board_layout {} DT definition need to be created 
to solve these issues (just an idea). For example:

Board DTS file:
board_layout {
  ti,hwmods-init-no-reset = "gpio1", "uart3";
  ti,hwmods-init-no-idle = "uart3";
}

Then above DT parameters can be used in omap_hwmod_setup_all()->_setup()
and omap_device_build_from_dt().
- omap_hwmod_setup_all()->_setup() will configure hwmod's flags
- omap_device_build_from_dt() will check for HWMOD_INIT_NO_IDLE and
   call pm_runtime_set_active() if needed (this is the valid place for that)

NOTE. "ti,hwmods-init-no-reset" and "ti,hwmods-init-no-idle" can't be 
defined per device, because HWMOD framework will work 
(init/reset/setup/idle) with IP even if corresponding IP is not defined 
in DT.

More over, board_layout{} section can be used to define some common for 
OMAP2+ boards properties or enable/disable PM features, like:
- Setup clksetup/clkshoutdown time for oscillator
- enable/disable OFF-mode

Regards,
-grygorii
Felipe Balbi July 10, 2013, 12:25 p.m. UTC | #10
Hi,

On Wed, Jul 10, 2013 at 03:16:54PM +0300, Grygorii Strashko wrote:
> >>>>>Imagine the device is marked as suspended even though it's fully enabled
> >>>>>(it hasn't been suspended by hwmod due to NO_IDLE flag). In that case
> >>>>>your context structure is all zeroes (context has never been saved
> >>>>>before) then when you call pm_runtime_get_sync() on probe() your
> >>>>>->runtime_resume() will get called, which will restore context,
> >>>>>essentially undoing anything which was configured by u-boot.
> >>>>
> >>>>This could be a problem for drivers which do a save context in ->runtime_suspend()
> >>>>but from what I see with omap serial, there is no save context done as part of
> >>>>->runtime_suspend.
> >>>
> >>>right, because context is "saved" in set_termios. probe() will get
> >>>called much before set_termios() has a chance to run, right ?
> >>>
> >>>Same problem will trigger in that case.
> >>>
> >>>I still think patch below is necessary
> >>>
> >>>>>(completely untested, didn't even try to compile, just to illustrate)
> >>>>>
> >>>>>diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> >>>>>index 7341eff..d8dca68 100644
> >>>>>--- a/arch/arm/mach-omap2/omap_hwmod.c
> >>>>>+++ b/arch/arm/mach-omap2/omap_hwmod.c
> >>>>>@@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
> >>>>>  	    (postsetup_state == _HWMOD_STATE_IDLE)) {
> >>>>>  		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
> >>>>>  		postsetup_state = _HWMOD_STATE_ENABLED;
> >>>>>+
> >>>>>+		/* tell pm_runtime this device is already active */
> >>>>>+		pm_runtime_set_active(&oh->od->pdev->dev);
> >>>>>+	} else {
> >>>>>+		/* tell pm_runtime this device is trully suspended */
> >>>>>+		pm_runtime_set_suspended(&oh->od->pdev->dev);
> >>>>>  	}
> >>>>>
> >>>>>  	if (postsetup_state == _HWMOD_STATE_IDLE)
> >>>
> >>
> >>This will not work - _setup_postsetup() is called from core_initcall
> >>level and OMAP devices have not been created at this moment yet
> >>(of_platform_populate() is called from
> >>customize_machine->init_machine->omap_generic_init() at arch_initcall time)
> >
> >fair enough, but something *like* that needs to be done. If pm_runtime
> >doesn't know the state of the device by the time pm_runtime_enable() is
> >called, the wrong assumptions might be made and we will forever have
> >such problems as our ->runtime_resume() callback being called when it
> >shouldn't.
> >
> >>More over, I don't recommend to depend on hwmod->od field - it's been
> >>created to support OPPs and it's obsolete now in case of DT use.
> >
> >that's alright, but still we need something similar.
> >
> >But in any case, if on DT boot that's not used, than *what* uses
> >HWMOD_INIT_NO_IDLE during DT boot ?
> >
> >There's a single place in kernel source which checks if
> >HWMOD_INIT_NO_IDLE is set, and that's the place which I patched.
> >
> >We, certainly, need a way to tell pm_runtime if the device is active or
> >suspended by the time we reach our probe() function. Either we assume
> >*all* devices are active and we blindly call pm_runtime_set_active() for
> >all devices, or we assume *all* devices are suspended as we call
> >pm_runtime_set_suspend() for all devices, or we figure out which ones
> >are active and which are not and call pm_runtime_set_{active,suspend}()
> >conditionally.
> >
> >>Seems, This issue need to be handled in driver for DT boot use case,
> >>possibly cmdline need to be parsed in the same way as it's done in
> >>omap_serial_early_init().
> >
> >so you want *every* single driver to parse their own cmdline ? How big
> >would the cmdline become ? This makes no sense.
> >
> 
> Agree here. Seems there two similar issues:
> - this one with omap_serial
> - GPIO reset issue http://www.spinics.net/lists/linux-omap/msg84186.html
>   (similar issue is observed on OMAP4460+TPS6236x)
> 
> And I think, that HWMOD_INIT_NO_IDLE and HWMOD_INIT_NO_RESET flags
> can't be treated as
> SoC (IP) specific parameters any more - they should be board specific.
> 
> I think, some kind of board_layout {} DT definition need to be
> created to solve these issues (just an idea). For example:
> 
> Board DTS file:
> board_layout {
>  ti,hwmods-init-no-reset = "gpio1", "uart3";
>  ti,hwmods-init-no-idle = "uart3";
> }

nah... if we go down the path of passing this info via DT, all you need
to pass is some information if the IP should be reset/idled or not
during initialization.
Tony Lindgren July 10, 2013, 12:27 p.m. UTC | #11
* Tony Lindgren <tony@atomide.com> [130710 05:17]:
> * Kevin Hilman <khilman@linaro.org> [130710 01:29]:
> 
> If the problem is omap-serial.c idling things while earlycon is
> running, then omap-serial.c should be fixed.

If this is the case, then delaying the idling in omap-serial.c
should take care of the issue as earlycon gets disabled when
the real console has been initialized.

Regards,

Tony
diff mbox

Patch

diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index 7341eff..d8dca68 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -2559,6 +2559,12 @@  static void __init _setup_postsetup(struct omap_hwmod *oh)
 	    (postsetup_state == _HWMOD_STATE_IDLE)) {
 		oh->_int_flags |= _HWMOD_SKIP_ENABLE;
 		postsetup_state = _HWMOD_STATE_ENABLED;
+
+		/* tell pm_runtime this device is already active */
+		pm_runtime_set_active(&oh->od->pdev->dev);
+	} else {
+		/* tell pm_runtime this device is trully suspended */
+		pm_runtime_set_suspended(&oh->od->pdev->dev);
 	}
 
 	if (postsetup_state == _HWMOD_STATE_IDLE)