diff mbox

Xen: Force non-irq keyhandler to be run in tasklet when receive a debugkey from serial port

Message ID 20161022112303.11451-1-tianyu.lan@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

lan,Tianyu Oct. 22, 2016, 11:23 a.m. UTC
__serial_rx() runs in either irq handler or timer handler and non-irq
keyhandler should not run in these contexts. So always force non-irq
keyhandler to run in tasklet when receive a debugkey from serial port

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
 xen/drivers/char/console.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Konrad Rzeszutek Wilk Oct. 24, 2016, 12:19 a.m. UTC | #1
On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
> __serial_rx() runs in either irq handler or timer handler and non-irq
> keyhandler should not run in these contexts. So always force non-irq
> keyhandler to run in tasklet when receive a debugkey from serial port

If the machine is hung with an IRQ handler being stuck, and 
one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
(as it is not an IRQ handler??


> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  xen/drivers/char/console.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
> index b0f74ce..184b523 100644
> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>  {
>      if ( xen_rx )
> -        return handle_keypress(c, regs, !in_irq());
> +        return handle_keypress(c, regs, true);
>  
>      /* Deliver input to guest buffer, unless it is already full. */
>      if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE )
> -- 
> 2.9.3
>
Jan Beulich Oct. 24, 2016, 10:53 a.m. UTC | #2
>>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
> __serial_rx() runs in either irq handler or timer handler and non-irq
> keyhandler should not run in these contexts. So always force non-irq
> keyhandler to run in tasklet when receive a debugkey from serial port
> 
> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> ---
>  xen/drivers/char/console.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
> index b0f74ce..184b523 100644
> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>  {
>      if ( xen_rx )
> -        return handle_keypress(c, regs, !in_irq());
> +        return handle_keypress(c, regs, true);

Together with one of your earlier patches having got reverted, I
think we need to take a step back here instead of going back to
what was requested to be changed from v2 of the original patch.
In particular I assume that the problem you're trying to address is
not limited to dump_timerq() - at least dump_runq() should be as
problematic on many-CPU systems.

I think (and I vaguely recall possibly having said so during earlier
review) that dump functions the output of which depends on CPU
count should get modeled after dump_registers(), and it might be
worth abstracting this in keyhandler.c. In any case quite likely the
other patch of yours (which the one here basically modifies) may
then also want to be reverted.

Jan
Wei Liu Oct. 24, 2016, 11:26 a.m. UTC | #3
On Mon, Oct 24, 2016 at 04:53:17AM -0600, Jan Beulich wrote:
> >>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
> > __serial_rx() runs in either irq handler or timer handler and non-irq
> > keyhandler should not run in these contexts. So always force non-irq
> > keyhandler to run in tasklet when receive a debugkey from serial port
> > 
> > Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > ---
> >  xen/drivers/char/console.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
> > index b0f74ce..184b523 100644
> > --- a/xen/drivers/char/console.c
> > +++ b/xen/drivers/char/console.c
> > @@ -347,7 +347,7 @@ static void switch_serial_input(void)
> >  static void __serial_rx(char c, struct cpu_user_regs *regs)
> >  {
> >      if ( xen_rx )
> > -        return handle_keypress(c, regs, !in_irq());
> > +        return handle_keypress(c, regs, true);
> 
> Together with one of your earlier patches having got reverted, I
> think we need to take a step back here instead of going back to
> what was requested to be changed from v2 of the original patch.
> In particular I assume that the problem you're trying to address is
> not limited to dump_timerq() - at least dump_runq() should be as
> problematic on many-CPU systems.
> 
> I think (and I vaguely recall possibly having said so during earlier
> review) that dump functions the output of which depends on CPU
> count should get modeled after dump_registers(), and it might be
> worth abstracting this in keyhandler.c. In any case quite likely the
> other patch of yours (which the one here basically modifies) may
> then also want to be reverted.
> 

Feel free to do so. We don't really need to rush this into 4.8.

Wei.

> Jan
>
lan,Tianyu Oct. 24, 2016, 1:29 p.m. UTC | #4
On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>> __serial_rx() runs in either irq handler or timer handler and non-irq
>> keyhandler should not run in these contexts. So always force non-irq
>> keyhandler to run in tasklet when receive a debugkey from serial port
>
> If the machine is hung with an IRQ handler being stuck, and
> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
> (as it is not an IRQ handler??

If serial port's interrupt still works in this case, the 'C'
keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
changed by my patches if includes this patch.


>
>
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  xen/drivers/char/console.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
>> index b0f74ce..184b523 100644
>> --- a/xen/drivers/char/console.c
>> +++ b/xen/drivers/char/console.c
>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>  {
>>      if ( xen_rx )
>> -        return handle_keypress(c, regs, !in_irq());
>> +        return handle_keypress(c, regs, true);
>>
>>      /* Deliver input to guest buffer, unless it is already full. */
>>      if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE )
>> --
>> 2.9.3
>>
Konrad Rzeszutek Wilk Oct. 24, 2016, 1:38 p.m. UTC | #5
On Mon, Oct 24, 2016 at 09:29:53PM +0800, Lan, Tianyu wrote:
> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
> > On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
> > > __serial_rx() runs in either irq handler or timer handler and non-irq
> > > keyhandler should not run in these contexts. So always force non-irq
> > > keyhandler to run in tasklet when receive a debugkey from serial port
> > 
> > If the machine is hung with an IRQ handler being stuck, and
> > one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
> > (as it is not an IRQ handler??
> 
> If serial port's interrupt still works in this case, the 'C'
> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
> changed by my patches if includes this patch.

Right, but the tasklet won't get to run at that point - as for example
the IRQ handler is stuck - so tasklets never get run? Or maybe
they do on another CPU?


> 
> 
> > 
> > 
> > > 
> > > Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > > ---
> > >  xen/drivers/char/console.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
> > > index b0f74ce..184b523 100644
> > > --- a/xen/drivers/char/console.c
> > > +++ b/xen/drivers/char/console.c
> > > @@ -347,7 +347,7 @@ static void switch_serial_input(void)
> > >  static void __serial_rx(char c, struct cpu_user_regs *regs)
> > >  {
> > >      if ( xen_rx )
> > > -        return handle_keypress(c, regs, !in_irq());
> > > +        return handle_keypress(c, regs, true);
> > > 
> > >      /* Deliver input to guest buffer, unless it is already full. */
> > >      if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE )
> > > --
> > > 2.9.3
> > >
Jan Beulich Oct. 24, 2016, 1:54 p.m. UTC | #6
>>> On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>> keyhandler should not run in these contexts. So always force non-irq
>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>
>> If the machine is hung with an IRQ handler being stuck, and
>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>> (as it is not an IRQ handler??
> 
> If serial port's interrupt still works in this case, the 'C'
> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
> changed by my patches if includes this patch.

As indicated already by Konrad's reply, this is not going to be
acceptable.

Jan
lan,Tianyu Oct. 24, 2016, 2:01 p.m. UTC | #7
On 10/24/2016 6:53 PM, Jan Beulich wrote:
>>>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
>> __serial_rx() runs in either irq handler or timer handler and non-irq
>> keyhandler should not run in these contexts. So always force non-irq
>> keyhandler to run in tasklet when receive a debugkey from serial port
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>> ---
>>  xen/drivers/char/console.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
>> index b0f74ce..184b523 100644
>> --- a/xen/drivers/char/console.c
>> +++ b/xen/drivers/char/console.c
>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>  {
>>      if ( xen_rx )
>> -        return handle_keypress(c, regs, !in_irq());
>> +        return handle_keypress(c, regs, true);
>
> Together with one of your earlier patches having got reverted, I
> think we need to take a step back here instead of going back to
> what was requested to be changed from v2 of the original patch.
> In particular I assume that the problem you're trying to address is
> not limited to dump_timerq() - at least dump_runq() should be as
> problematic on many-CPU systems.

I think the issue here is that my previous patch commit
610b4eda2c("keyhandler: rework process of nonirq keyhandler") makes
non-irq keyhandler run in irq context. This is caused by input param
"!in_irq()" which is false in irq context. handle_keypress() runs 
keyhandler synchronically. This patch fixes the issue.

>
> I think (and I vaguely recall possibly having said so during earlier
> review) that dump functions the output of which depends on CPU
> count should get modeled after dump_registers(), and it might be
> worth abstracting this in keyhandler.c.

Yes, but this sounds like a new feature or framework rework rather than 
a fix patch.


> In any case quite likely the
> other patch of yours (which the one here basically modifies) may
> then also want to be reverted.

I think patch "timer: process softirq during dumping timer"
does right thing. The issue is triggered by previous patch.
lan,Tianyu Oct. 24, 2016, 2:12 p.m. UTC | #8
On 10/24/2016 9:38 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 24, 2016 at 09:29:53PM +0800, Lan, Tianyu wrote:
>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>> keyhandler should not run in these contexts. So always force non-irq
>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>
>>> If the machine is hung with an IRQ handler being stuck, and
>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>> (as it is not an IRQ handler??
>>
>> If serial port's interrupt still works in this case, the 'C'
>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>> changed by my patches if includes this patch.
>
> Right, but the tasklet won't get to run at that point - as for example
> the IRQ handler is stuck - so tasklets never get run? Or maybe
> they do on another CPU?
>


If serial interrupt handler works, the cpu receiving serial port
interrupt should work normally. Tasklet_schedule() in the
handle_keypress() queues keyhandler tasklet to that cpu and tasklet also
should get to run at that point.
.
lan,Tianyu Oct. 24, 2016, 2:15 p.m. UTC | #9
On 10/24/2016 9:54 PM, Jan Beulich wrote:
>>>> On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>> keyhandler should not run in these contexts. So always force non-irq
>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>
>>> If the machine is hung with an IRQ handler being stuck, and
>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>> (as it is not an IRQ handler??
>>
>> If serial port's interrupt still works in this case, the 'C'
>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>> changed by my patches if includes this patch.

Sorry. A typo. I meant the behavior wasn't changed by my patches.

>
> As indicated already by Konrad's reply, this is not going to be
> acceptable.
>
Jan Beulich Oct. 24, 2016, 2:28 p.m. UTC | #10
>>> On 24.10.16 at 16:01, <tianyu.lan@intel.com> wrote:
> On 10/24/2016 6:53 PM, Jan Beulich wrote:
>>>>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>> keyhandler should not run in these contexts. So always force non-irq
>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>
>>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>>> ---
>>>  xen/drivers/char/console.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
>>> index b0f74ce..184b523 100644
>>> --- a/xen/drivers/char/console.c
>>> +++ b/xen/drivers/char/console.c
>>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>  {
>>>      if ( xen_rx )
>>> -        return handle_keypress(c, regs, !in_irq());
>>> +        return handle_keypress(c, regs, true);
>>
>> Together with one of your earlier patches having got reverted, I
>> think we need to take a step back here instead of going back to
>> what was requested to be changed from v2 of the original patch.
>> In particular I assume that the problem you're trying to address is
>> not limited to dump_timerq() - at least dump_runq() should be as
>> problematic on many-CPU systems.
> 
> I think the issue here is that my previous patch commit
> 610b4eda2c("keyhandler: rework process of nonirq keyhandler") makes
> non-irq keyhandler run in irq context. This is caused by input param
> "!in_irq()" which is false in irq context. handle_keypress() runs 
> keyhandler synchronically. This patch fixes the issue.

Not really - your earlier patch only moved the !in_irq() check, i.e.
things continued to run in the same context they always did
_except_ for the one special case you cared about. Plus your
other patch fixed the respective issue only for one individual
handler, instead of generally.

>> I think (and I vaguely recall possibly having said so during earlier
>> review) that dump functions the output of which depends on CPU
>> count should get modeled after dump_registers(), and it might be
>> worth abstracting this in keyhandler.c.
> 
> Yes, but this sounds like a new feature or framework rework rather than 
> a fix patch.

In a way, sure. It's a more extensive fix, which would avoid
someone else running into the same issue with another handler.

>> In any case quite likely the
>> other patch of yours (which the one here basically modifies) may
>> then also want to be reverted.
> 
> I think patch "timer: process softirq during dumping timer"
> does right thing. The issue is triggered by previous patch.

Well - the issue did not exist prior to both of your patches
going in, and I think it would have continued to exist if the
keyhandler rework patch alone had been reverted. (And I'm
afraid anyway that "previous" is ambiguous here, as the timer
handler change went in first.)

Jan
Jan Beulich Oct. 24, 2016, 2:29 p.m. UTC | #11
>>> On 24.10.16 at 16:12, <tianyu.lan@intel.com> wrote:

> 
> On 10/24/2016 9:38 PM, Konrad Rzeszutek Wilk wrote:
>> On Mon, Oct 24, 2016 at 09:29:53PM +0800, Lan, Tianyu wrote:
>>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>>> keyhandler should not run in these contexts. So always force non-irq
>>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>
>>>> If the machine is hung with an IRQ handler being stuck, and
>>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>>> (as it is not an IRQ handler??
>>>
>>> If serial port's interrupt still works in this case, the 'C'
>>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>>> changed by my patches if includes this patch.
>>
>> Right, but the tasklet won't get to run at that point - as for example
>> the IRQ handler is stuck - so tasklets never get run? Or maybe
>> they do on another CPU?
> 
> If serial interrupt handler works, the cpu receiving serial port
> interrupt should work normally. Tasklet_schedule() in the
> handle_keypress() queues keyhandler tasklet to that cpu and tasklet also
> should get to run at that point.

Unless something else in the system is in a state preventing that.

Jan
Jan Beulich Oct. 24, 2016, 2:31 p.m. UTC | #12
>>> On 24.10.16 at 16:15, <tianyu.lan@intel.com> wrote:

> 
> On 10/24/2016 9:54 PM, Jan Beulich wrote:
>>>>> On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
>>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>>> keyhandler should not run in these contexts. So always force non-irq
>>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>
>>>> If the machine is hung with an IRQ handler being stuck, and
>>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>>> (as it is not an IRQ handler??
>>>
>>> If serial port's interrupt still works in this case, the 'C'
>>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>>> changed by my patches if includes this patch.
> 
> Sorry. A typo. I meant the behavior wasn't changed by my patches.

How was it not? The softirq machinery didn't get invoked in that case
prior to your patch, afaict.

Jan
lan,Tianyu Oct. 24, 2016, 2:43 p.m. UTC | #13
On 10/24/2016 10:31 PM, Jan Beulich wrote:
>>>> On 24.10.16 at 16:15, <tianyu.lan@intel.com> wrote:
>
>>
>> On 10/24/2016 9:54 PM, Jan Beulich wrote:
>>>>>> On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
>>>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>>>> keyhandler should not run in these contexts. So always force non-irq
>>>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>>
>>>>> If the machine is hung with an IRQ handler being stuck, and
>>>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>>>> (as it is not an IRQ handler??
>>>>
>>>> If serial port's interrupt still works in this case, the 'C'
>>>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>>>> changed by my patches if includes this patch.
>>
>> Sorry. A typo. I meant the behavior wasn't changed by my patches.
>
> How was it not? The softirq machinery didn't get invoked in that case
> prior to your patch, afaict.
>

Which softirq? You mean addiing process_pending_softirqs() in the
dump_timerq()?
Konrad Rzeszutek Wilk Oct. 24, 2016, 2:56 p.m. UTC | #14
On Mon, Oct 24, 2016 at 10:43:57PM +0800, Lan, Tianyu wrote:
> 
> 
> On 10/24/2016 10:31 PM, Jan Beulich wrote:
> > > > > On 24.10.16 at 16:15, <tianyu.lan@intel.com> wrote:
> > 
> > > 
> > > On 10/24/2016 9:54 PM, Jan Beulich wrote:
> > > > > > > On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
> > > > > On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
> > > > > > On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
> > > > > > > __serial_rx() runs in either irq handler or timer handler and non-irq
> > > > > > > keyhandler should not run in these contexts. So always force non-irq
> > > > > > > keyhandler to run in tasklet when receive a debugkey from serial port
> > > > > > 
> > > > > > If the machine is hung with an IRQ handler being stuck, and
> > > > > > one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
> > > > > > (as it is not an IRQ handler??
> > > > > 
> > > > > If serial port's interrupt still works in this case, the 'C'
> > > > > keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
> > > > > changed by my patches if includes this patch.
> > > 
> > > Sorry. A typo. I meant the behavior wasn't changed by my patches.
> > 
> > How was it not? The softirq machinery didn't get invoked in that case
> > prior to your patch, afaict.
> > 
> 
> Which softirq? You mean addiing process_pending_softirqs() in the
> dump_timerq()?

The softirq code gets executed _after_ the IRQ handler (do_IRQ) is finished
doing whatever it needs to do.

Which means that if have say:

 1) an errant piece of code that is spinning forever in
    a loop (say in do_IRQ and then perhaps in __do_IRQ_guest?)

 2) The system admin pressed 'Ctrl-A' three times followed by 'C',
    and ends up in the serial code. We schedule the tasklet, which
    calls 'tasklet_schedule_on_cpu' - which means the tasklet will
    only run on _this_ CPU. And then return back to 1)
    code.

 3) The 1) code keeps spinning forever, and we never call do_softirq
    which means we never process the tasklet.

That is the concern I have. I really really need 'C' to work so that
when things go wrong we can analyze the crashdump and figure out what
went wrong. But with your patch I can't create the crash dump.

I may also be wrong - maybe this is not the case and the crash dump
handler runs just fine (which is just fine). In which case please please
explain that in the commit description.

Thanks!
Jan Beulich Oct. 24, 2016, 2:58 p.m. UTC | #15
>>> On 24.10.16 at 16:43, <tianyu.lan@intel.com> wrote:

> 
> On 10/24/2016 10:31 PM, Jan Beulich wrote:
>>>>> On 24.10.16 at 16:15, <tianyu.lan@intel.com> wrote:
>>
>>>
>>> On 10/24/2016 9:54 PM, Jan Beulich wrote:
>>>>>>> On 24.10.16 at 15:29, <tianyu.lan@intel.com> wrote:
>>>>> On 10/24/2016 8:19 AM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Sat, Oct 22, 2016 at 07:23:03PM +0800, Lan Tianyu wrote:
>>>>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>>>>> keyhandler should not run in these contexts. So always force non-irq
>>>>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>>>
>>>>>> If the machine is hung with an IRQ handler being stuck, and
>>>>>> one does 'Ctrl-Ax3` followed by 'C' .. which would not be invoked
>>>>>> (as it is not an IRQ handler??
>>>>>
>>>>> If serial port's interrupt still works in this case, the 'C'
>>>>> keyhandler kexec_crash() will be invoked in a tasklet. This behavior was
>>>>> changed by my patches if includes this patch.
>>>
>>> Sorry. A typo. I meant the behavior wasn't changed by my patches.
>>
>> How was it not? The softirq machinery didn't get invoked in that case
>> prior to your patch, afaict.
> 
> Which softirq? You mean addiing process_pending_softirqs() in the
> dump_timerq()?

I'm sorry - s/softirq/tasklet/.

Jan
lan,Tianyu Oct. 24, 2016, 3:03 p.m. UTC | #16
On 10/24/2016 10:28 PM, Jan Beulich wrote:
>>>> On 24.10.16 at 16:01, <tianyu.lan@intel.com> wrote:
>> On 10/24/2016 6:53 PM, Jan Beulich wrote:
>>>>>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>> keyhandler should not run in these contexts. So always force non-irq
>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>>>> ---
>>>>  xen/drivers/char/console.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
>>>> index b0f74ce..184b523 100644
>>>> --- a/xen/drivers/char/console.c
>>>> +++ b/xen/drivers/char/console.c
>>>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>>  {
>>>>      if ( xen_rx )
>>>> -        return handle_keypress(c, regs, !in_irq());
>>>> +        return handle_keypress(c, regs, true);
>>>
>>> Together with one of your earlier patches having got reverted, I
>>> think we need to take a step back here instead of going back to
>>> what was requested to be changed from v2 of the original patch.
>>> In particular I assume that the problem you're trying to address is
>>> not limited to dump_timerq() - at least dump_runq() should be as
>>> problematic on many-CPU systems.
>>
>> I think the issue here is that my previous patch commit
>> 610b4eda2c("keyhandler: rework process of nonirq keyhandler") makes
>> non-irq keyhandler run in irq context. This is caused by input param
>> "!in_irq()" which is false in irq context. handle_keypress() runs
>> keyhandler synchronically. This patch fixes the issue.
>
> Not really - your earlier patch only moved the !in_irq() check, i.e.
> things continued to run in the same context they always did
> _except_ for the one special case you cared about.

I supposed the special case you meant is to run keyhandler in timer 
handler. It's necessary to make any timer handler run in a short time 
otherwise it will trigger watchdog problem.


> Plus your
> other patch fixed the respective issue only for one individual
> handler, instead of generally.

So you think adding process_pending_softirqs() in the keyhandler isn't
general? But this is a common solution so far.


>
>>> I think (and I vaguely recall possibly having said so during earlier
>>> review) that dump functions the output of which depends on CPU
>>> count should get modeled after dump_registers(), and it might be
>>> worth abstracting this in keyhandler.c.
>>
>> Yes, but this sounds like a new feature or framework rework rather than
>> a fix patch.
>
> In a way, sure. It's a more extensive fix, which would avoid
> someone else running into the same issue with another handler.

This seems a big change and a lot of dump function needs to rework, right?



>
>>> In any case quite likely the
>>> other patch of yours (which the one here basically modifies) may
>>> then also want to be reverted.
>>
>> I think patch "timer: process softirq during dumping timer"
>> does right thing. The issue is triggered by previous patch.
>
> Well - the issue did not exist prior to both of your patches
> going in, and I think it would have continued to exist if the
> keyhandler rework patch alone had been reverted. (And I'm
> afraid anyway that "previous" is ambiguous here, as the timer
> handler change went in first.)
>
> Jan
>
Jan Beulich Oct. 24, 2016, 3:14 p.m. UTC | #17
>>> On 24.10.16 at 17:03, <tianyu.lan@intel.com> wrote:

> 
> On 10/24/2016 10:28 PM, Jan Beulich wrote:
>>>>> On 24.10.16 at 16:01, <tianyu.lan@intel.com> wrote:
>>> On 10/24/2016 6:53 PM, Jan Beulich wrote:
>>>>>>> On 22.10.16 at 13:23, <tianyu.lan@intel.com> wrote:
>>>>> __serial_rx() runs in either irq handler or timer handler and non-irq
>>>>> keyhandler should not run in these contexts. So always force non-irq
>>>>> keyhandler to run in tasklet when receive a debugkey from serial port
>>>>>
>>>>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>>>>> ---
>>>>>  xen/drivers/char/console.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
>>>>> index b0f74ce..184b523 100644
>>>>> --- a/xen/drivers/char/console.c
>>>>> +++ b/xen/drivers/char/console.c
>>>>> @@ -347,7 +347,7 @@ static void switch_serial_input(void)
>>>>>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>>>  {
>>>>>      if ( xen_rx )
>>>>> -        return handle_keypress(c, regs, !in_irq());
>>>>> +        return handle_keypress(c, regs, true);
>>>>
>>>> Together with one of your earlier patches having got reverted, I
>>>> think we need to take a step back here instead of going back to
>>>> what was requested to be changed from v2 of the original patch.
>>>> In particular I assume that the problem you're trying to address is
>>>> not limited to dump_timerq() - at least dump_runq() should be as
>>>> problematic on many-CPU systems.
>>>
>>> I think the issue here is that my previous patch commit
>>> 610b4eda2c("keyhandler: rework process of nonirq keyhandler") makes
>>> non-irq keyhandler run in irq context. This is caused by input param
>>> "!in_irq()" which is false in irq context. handle_keypress() runs
>>> keyhandler synchronically. This patch fixes the issue.
>>
>> Not really - your earlier patch only moved the !in_irq() check, i.e.
>> things continued to run in the same context they always did
>> _except_ for the one special case you cared about.
> 
> I supposed the special case you meant is to run keyhandler in timer 
> handler. It's necessary to make any timer handler run in a short time 
> otherwise it will trigger watchdog problem.
> 
> 
>> Plus your
>> other patch fixed the respective issue only for one individual
>> handler, instead of generally.
> 
> So you think adding process_pending_softirqs() in the keyhandler isn't
> general?

No - it fixes just the one handler where you do that addition. As
already said, you for instance leave dump_runq() unfixed, yet I
believe it has the same issue.

> But this is a common solution so far.

It's a solution which has been used in a few special cases; I
wouldn't call this a "common solution", and I know I had
expressed this already during review of your original series.

>>>> I think (and I vaguely recall possibly having said so during earlier
>>>> review) that dump functions the output of which depends on CPU
>>>> count should get modeled after dump_registers(), and it might be
>>>> worth abstracting this in keyhandler.c.
>>>
>>> Yes, but this sounds like a new feature or framework rework rather than
>>> a fix patch.
>>
>> In a way, sure. It's a more extensive fix, which would avoid
>> someone else running into the same issue with another handler.
> 
> This seems a big change and a lot of dump function needs to rework, right?

I'm not sure. In particular I didn't go check which dump functions
have their output really scale with CPU count.

Jan
diff mbox

Patch

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index b0f74ce..184b523 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -347,7 +347,7 @@  static void switch_serial_input(void)
 static void __serial_rx(char c, struct cpu_user_regs *regs)
 {
     if ( xen_rx )
-        return handle_keypress(c, regs, !in_irq());
+        return handle_keypress(c, regs, true);
 
     /* Deliver input to guest buffer, unless it is already full. */
     if ( (serial_rx_prod-serial_rx_cons) != SERIAL_RX_SIZE )