[06/15] xen: trace IRQ enabling/disabling

Message ID 149633845700.12814.7130992212550379105.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli June 1, 2017, 5:34 p.m. UTC
Trace when interrupts are disabled and (re)enabled.
Basically, we replace the IRQ disabling and enabling
functions with helpers that do the same, but also
output the proper trace record.

To put something in the record that will let us
identify _where_ in the code (i.e., in what function)
the IRQ manipulation is happening, use either:
 - current_text_addr(),
 - or __builtin_return_address(0).

In fact, depending on whether the disabling/enabling
happens in macros (like for local_irq_disable() and
local_irq_enable()) or in actual functions (like in
spin_lock_irq*()), it is either:
 - the actual content of the instruction pointer when
   IRQs are disabled/enabled,
 - or the return address of the utility function where
   IRQs are disabled/enabled,
that tells us which piece of code is actually asking
for the IRQ manipulation operation.
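
As a minimal sketch of the distinction (illustration only, not part
of the patch; my_irq_off() and my_lock_irq() are made-up names):

  /*
   * A macro expands at every call site, so the instruction pointer
   * itself identifies the caller:
   */
  #define my_irq_off()                                    \
  ({                                                      \
      _local_irq_disable();                               \
      trace_irq_disable((uint64_t) current_text_addr());  \
  })

  /*
   * A function's instruction pointer always points inside the
   * function body, so it is the return address that identifies
   * the caller instead:
   */
  static void my_lock_irq(spinlock_t *lock)
  {
      _local_irq_disable();
      trace_irq_disable((uint64_t) __builtin_return_address(0));
      _spin_lock(lock);
  }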

Gate this with its own Kconfig option, and keep it
disabled by default (i.e., don't build it unless
explicitly requested), as the impact on performance
may be non-negligible.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
---
 xen/Kconfig.debug                  |   11 ++++-
 xen/common/spinlock.c              |   16 +++++--
 xen/common/trace.c                 |   10 +++-
 xen/include/asm-arm/arm32/system.h |   12 +++++
 xen/include/asm-arm/arm64/system.h |   12 +++++
 xen/include/asm-x86/system.h       |   85 ++++++++++++++++++++++++++++++++++--
 xen/include/public/trace.h         |    2 +
 xen/include/xen/rwlock.h           |   33 +++++++++++---
 8 files changed, 161 insertions(+), 20 deletions(-)

Comments

Andrew Cooper June 1, 2017, 7:08 p.m. UTC | #1
On 01/06/17 18:34, Dario Faggioli wrote:
> Trace when interrupts are disabled and (re)enabled.
> Basically, we replace the IRQ disabling and enabling
> functions with helpers that do the same, but also
> output the proper trace record.
>
> To put something in the record that will let us
> identify _where_ in the code (i.e., in what function)
> the IRQ manipulation is happening, use either:
>  - current_text_addr(),
>  - or __builtin_return_address(0).
>
> In fact, depending on whether the disabling/enabling
> happens in macros (like for local_irq_disable() and
> local_irq_enable()) or in actual functions (like in
> spin_lock_irq*()), it is either:
>  - the actual content of the instruction pointer when
>    IRQs are disabled/enabled,
>  - or the return address of the utility function where
>    IRQs are disabled/enabled,
> that tells us which piece of code is actually asking
> for the IRQ manipulation operation.
>
> Gate this with its own Kconfig option, and keep it
> disabled by default (i.e., don't build it unless
> explicitly requested), as the impact on performance
> may be non-negligible.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

This reminds me that I need to dust off my series which fixes
local_irq_save() to not capture its variables by name.

I.e. it would be used as "flags = local_irq_save();" in the normal C way.

I'm fairly sure I can also improve the generated assembly.
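
Something like this (a sketch of the idea only, not the actual series):

  static always_inline unsigned long local_irq_save(void)
  {
      unsigned long flags;

      /* Save EFLAGS and clear IF in a single asm. */
      asm volatile ( "pushf" __OS " ; pop" __OS " %0 ; cli"
                     : "=g" (flags) : : "memory" );
      return flags;
  }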

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
> ---
>  xen/Kconfig.debug                  |   11 ++++-
>  xen/common/spinlock.c              |   16 +++++--
>  xen/common/trace.c                 |   10 +++-
>  xen/include/asm-arm/arm32/system.h |   12 +++++
>  xen/include/asm-arm/arm64/system.h |   12 +++++
>  xen/include/asm-x86/system.h       |   85 ++++++++++++++++++++++++++++++++++--
>  xen/include/public/trace.h         |    2 +
>  xen/include/xen/rwlock.h           |   33 +++++++++++---
>  8 files changed, 161 insertions(+), 20 deletions(-)
>
> diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
> index 374c1c0..81910c9 100644
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -98,7 +98,7 @@ config PERF_ARRAYS
>  	---help---
>  	  Enables software performance counter array histograms.
>  
> -config TRACING
> +menuconfig TRACING
>  	bool "Tracing"
>  	default y
>  	---help---
> @@ -106,6 +106,15 @@ config TRACING
>  	  in per-CPU ring buffers. The 'xentrace' tool can be used to read
>  	  the buffers and dump the content on the disk.
>  
> +config TRACE_IRQSOFF
> +	bool "Trace when IRQs are disabled and (re)enabled" if EXPERT = "y"
> +	default n
> +	depends on TRACING
> +	---help---
> +	  Makes it possible to generate events _every_ time IRQs are disabled
> +          and (re)enabled, with also an indication of where that happened.
> +          Note that this comes with high overead and produces huge amount of
> +          tracing data.

You've got mixed tabs/spaces here.

>  
>  config VERBOSE_DEBUG
>  	bool "Verbose debug messages"
> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> index 2a06406..33b903e 100644
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>  void _spin_lock_irq(spinlock_t *lock)
>  {
>      ASSERT(local_irq_is_enabled());
> -    local_irq_disable();
> +    _local_irq_disable();
> +    if ( unlikely(tb_init_done) )
> +        trace_irq_disable_ret();
>      _spin_lock(lock);
>  }

Is it sensible to do this way around?

By writing the trace record while interrupts are disabled, you do
prevent nesting in the general case (but not in NMIs/MCEs or the
irqsave() variants), but you increase the critical region while
interrupts are disabled.

Alternatively, writing the trace record outside of the critical region
would reduce the size of the region.  Interpretation logic already needs
to cope with nesting, so is one extra level of nesting in this corner
case a problem?
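
I.e. something like (a sketch):

  void _spin_lock_irq(spinlock_t *lock)
  {
      ASSERT(local_irq_is_enabled());
      if ( unlikely(tb_init_done) )
          trace_irq_disable_ret();   /* written with IRQs still enabled */
      _local_irq_disable();
      _spin_lock(lock);
  }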

Does the logic cope with the fact that interrupt gates automatically
disable interrupts?

~Andrew
Dario Faggioli June 1, 2017, 11:42 p.m. UTC | #2
On Thu, 2017-06-01 at 20:08 +0100, Andrew Cooper wrote:
> On 01/06/17 18:34, Dario Faggioli wrote:
> > diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> > index 2a06406..33b903e 100644
> > --- a/xen/common/spinlock.c
> > +++ b/xen/common/spinlock.c
> > @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
> >  void _spin_lock_irq(spinlock_t *lock)
> >  {
> >      ASSERT(local_irq_is_enabled());
> > -    local_irq_disable();
> > +    _local_irq_disable();
> > +    if ( unlikely(tb_init_done) )
> > +        trace_irq_disable_ret();
> >      _spin_lock(lock);
> >  }
> 
> Is it sensible to do this way around?
> 
Well, I guess either variant has its upsides and downsides, and it
looks to me that there is no way to measure this without interfering
with the behavior of the thing being measured ("Once upon a time,
there was a cat, who lived in a box. His name was Schrödinger..."
:-D :-D :-D)

> By writing the trace record while interrupts are disabled, you do
> prevent nesting in the general case (but not in NMIs/MCEs or the
> irqsave() variants), 
>
Forgive the ignorance, what's special about NMIs/MCAs that is relevant
for this? About _irqsave(), I'm also not sure what you mean, as
_irqsave() is already done differently than this.

> but you increase the critical region while
> interrupts are disabled.
> 
I know.

> Alternatively, writing the trace record outside of the critical
> region
> would reduce the size of the region.  Interpretation logic already
> needs
> to cope with nesting, so is one extra level of nesting in this corner
> case a problem?
> 
TBH, I was indeed trying to offer the component that will be looking
at and interpreting the data as clear a view as possible of when
IRQs are _actually_ disabled and enabled. As in, if nesting occurs,
only the events corresponding to:
 - the first local_irq_disable(),
 - the last local_irq_enable(),
are reported.

I.e., to not require that it (the interpretation logic) understands
nesting.

But more than this, what I was concerned about was the accuracy of the
reporting itself. In fact, if I do as you suggest, I can be interrupted
(as interrupts are still on) after having called trace_irq_disable().
That means the report will show a longer IRQ-disabled period, for
this instance, than the reality.

And the same is true on the enabling side, when I call
trace_irq_enable() _before_ re-enabling interrupts, which has the same
bad effect on the reported length of the IRQ-disabled section.

Maybe, considering that anything that interrupts me in these cases
will be other interrupts, which will most likely disable interrupts
themselves, I should not worry too much about this... What do you
think?

> Does the logic cope with the fact that interrupt gates automatically
> disable interrupts?
> 
Ah, right. No, it does not. I probably should mention this in the
changelog. Any ideas of how to deal with that? If yes, I'm more than
happy to fix this...

Thanks and Regards,
Dario
Julien Grall June 7, 2017, 11:16 a.m. UTC | #3
Hi Dario,

On 01/06/17 18:34, Dario Faggioli wrote:
> Trace when interrupts are disabled and (re)enabled.
> Basically, we replace the IRQ disabling and enabling
> functions with helpers that do the same, but also
> output the proper trace record.
>
> To put something in the record that will let us
> identify _where_ in the code (i.e., in what function)
> the IRQ manipulation is happening, use either:
>  - current_text_addr(),
>  - or __builtin_return_address(0).
>
> In fact, depending on whether the disabling/enabling
> happens in macros (like for local_irq_disable() and
> local_irq_enable()) or in actual functions (like in
> spin_lock_irq*()), it is either:
>  - the actual content of the instruction pointer when
>    IRQs are disabled/enabled,
>  - or the return address of the utility function where
>    IRQs are disabled/enabled,
> that tells us which piece of code is actually asking
> for the IRQ manipulation operation.
>
> Gate this with its own Kconfig option, and keep it
> disabled by default (i.e., don't build it unless
> explicitly requested), as the impact on performance
> may be non-negligible.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
> ---
>  xen/Kconfig.debug                  |   11 ++++-
>  xen/common/spinlock.c              |   16 +++++--
>  xen/common/trace.c                 |   10 +++-
>  xen/include/asm-arm/arm32/system.h |   12 +++++
>  xen/include/asm-arm/arm64/system.h |   12 +++++
>  xen/include/asm-x86/system.h       |   85 ++++++++++++++++++++++++++++++++++--
>  xen/include/public/trace.h         |    2 +
>  xen/include/xen/rwlock.h           |   33 +++++++++++---
>  8 files changed, 161 insertions(+), 20 deletions(-)
>
> diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
> index 374c1c0..81910c9 100644
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -98,7 +98,7 @@ config PERF_ARRAYS
>  	---help---
>  	  Enables software performance counter array histograms.
>
> -config TRACING
> +menuconfig TRACING
>  	bool "Tracing"
>  	default y
>  	---help---
> @@ -106,6 +106,15 @@ config TRACING
>  	  in per-CPU ring buffers. The 'xentrace' tool can be used to read
>  	  the buffers and dump the content on the disk.
>
> +config TRACE_IRQSOFF
> +	bool "Trace when IRQs are disabled and (re)enabled" if EXPERT = "y"
> +	default n
> +	depends on TRACING
> +	---help---
> +	  Makes it possible to generate events _every_ time IRQs are disabled
> +          and (re)enabled, with also an indication of where that happened.
> +          Note that this comes with high overead and produces huge amount of
> +          tracing data.
>
>  config VERBOSE_DEBUG
>  	bool "Verbose debug messages"
> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> index 2a06406..33b903e 100644
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>  void _spin_lock_irq(spinlock_t *lock)
>  {
>      ASSERT(local_irq_is_enabled());
> -    local_irq_disable();
> +    _local_irq_disable();
> +    if ( unlikely(tb_init_done) )

__trace_var() already contains an "if ( !tb_init_done )" check. It
sounds pointless to do it twice, and I also think the meaning of the
check is not obvious (i.e., what tb_init_done is).

Would it be possible to hide this check and avoid checking
tb_init_done twice?

> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
> index c617b40..20871ad 100644
> --- a/xen/include/asm-arm/arm32/system.h
> +++ b/xen/include/asm-arm/arm32/system.h
> @@ -4,6 +4,8 @@
>
>  #include <asm/arm32/cmpxchg.h>
>
> +#include <xen/trace.h>
> +
>  #define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
>  #define local_irq_enable()  asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
>
> @@ -41,6 +43,16 @@ static inline int local_irq_is_enabled(void)
>  #define local_abort_enable() __asm__("cpsie a  @ __sta\n" : : : "memory", "cc")
>  #define local_abort_disable() __asm__("cpsid a @ __sta\n" : : : "memory", "cc")
>
> +/* We do not support tracing (at all) yet */

I know that some bits are missing for ARM, but the patch #5 contradicts 
this comment as you have CONFIG_TRACE=y by default.

> +#define trace_irq_disable_ret()   do { } while ( 0 )
> +#define trace_irq_enable_ret()    do { } while ( 0 )
> +#define trace_irq_save_ret(_x)    do { } while ( 0 )
> +#define trace_irq_restore_ret(_x) do { } while ( 0 )
> +#define _local_irq_disable()      local_irq_disable()
> +#define _local_irq_enable()       local_irq_enable()
> +#define _local_irq_save(_x)       local_irq_save(_x)
> +#define _local_irq_restore(_x)    local_irq_restore(_x)
> +

This does not need to be duplicated in both asm-arm/arm{32,64}/system.h. 
You could directly implement them in asm-arm/system.h.

Also, I would prefer if you stay consistent with x86, i.e. the
non-underscore versions call the underscore versions and not the
other way around.

Cheers,
Dario Faggioli June 7, 2017, 3:22 p.m. UTC | #4
On Wed, 2017-06-07 at 12:16 +0100, Julien Grall wrote:
> Hi Dario,
> 
Hi,

> On 01/06/17 18:34, Dario Faggioli wrote:
> > diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> > index 2a06406..33b903e 100644
> > --- a/xen/common/spinlock.c
> > +++ b/xen/common/spinlock.c
> > @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
> >  void _spin_lock_irq(spinlock_t *lock)
> >  {
> >      ASSERT(local_irq_is_enabled());
> > -    local_irq_disable();
> > +    _local_irq_disable();
> > +    if ( unlikely(tb_init_done) )
> 
> __trace_var() already contains an "if ( !tb_init_done )" check. It
> sounds pointless to do it twice, and I also think the meaning of the
> check is not obvious (i.e., what tb_init_done is).
> 
> Would it be possible to hide this check and avoid checking
> tb_init_done twice?
> 
I totally agree. And in fact, in another patch, I remove the
tb_init_done check in __trace_var(). The reason behind this is that
all the callers of __trace_var() (the main one being trace_var())
already check for tb_init_done themselves.

In fact, the explicit check in the caller is the (only) basis on which
one decides to call either trace_var() or __trace_var().

This here is one of the call sites where I think the check is better
done in the caller.
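
(For reference, trace_var() is basically just:

  static inline void trace_var(u32 event, int cycles, int extra,
                               const void *extra_data)
  {
      if ( unlikely(tb_init_done) )
          __trace_var(event, cycles, extra, extra_data);
  }

so a caller that has already checked tb_init_done can go straight to
__trace_var() and skip the redundant check.)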

> > diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-
> > arm/arm32/system.h
> > index c617b40..20871ad 100644
> > --- a/xen/include/asm-arm/arm32/system.h
> > +++ b/xen/include/asm-arm/arm32/system.h
> > @@ -4,6 +4,8 @@
> > 
> >  #include <asm/arm32/cmpxchg.h>
> > 
> > +#include <xen/trace.h>
> > +
> >  #define local_irq_disable() asm volatile ( "cpsid i @
> > local_irq_disable\n" : : : "cc" )
> >  #define local_irq_enable()  asm volatile ( "cpsie i @
> > local_irq_enable\n" : : : "cc" )
> > 
> > @@ -41,6 +43,16 @@ static inline int local_irq_is_enabled(void)
> >  #define local_abort_enable() __asm__("cpsie a  @ __sta\n" : : :
> > "memory", "cc")
> >  #define local_abort_disable() __asm__("cpsid a @ __sta\n" : : :
> > "memory", "cc")
> > 
> > +/* We do not support tracing (at all) yet */
> 
> I know that some bits are missing for ARM, but the patch #5
> contradicts 
> this comment as you have CONFIG_TRACE=y by default.
> 
Not sure what you mean. Tracing is de facto on by default right now,
despite it not being implemented for ARM. So what I'm doing is
basically keeping things as they are.

If you think it should be off by default, well, let's talk about it...
but I'm not sure this is really what you are saying/asking.


> > +#define trace_irq_disable_ret()   do { } while ( 0 )
> > +#define trace_irq_enable_ret()    do { } while ( 0 )
> > +#define trace_irq_save_ret(_x)    do { } while ( 0 )
> > +#define trace_irq_restore_ret(_x) do { } while ( 0 )
> > +#define _local_irq_disable()      local_irq_disable()
> > +#define _local_irq_enable()       local_irq_enable()
> > +#define _local_irq_save(_x)       local_irq_save(_x)
> > +#define _local_irq_restore(_x)    local_irq_restore(_x)
> > +
> 
> This does not need to be duplicated in both asm-
> arm/arm{32,64}/system.h. 
> You could directly implement them in asm-arm/system.h.
> 
Ok, thanks.

> Also, I would prefer if you stay consistent with x86, i.e. the
> non-underscore versions call the underscore versions and not the
> other way around.
> 
Well, I know it is counterintuitive, but it's the easiest way of
getting over this, i.e., without the need for a lot of stubs, and
touching as little existing code as possible.

But I certainly can give it a try doing it as you say, and see how it
ends up looking.

Regards,
Dario
George Dunlap June 8, 2017, 3:51 p.m. UTC | #5
On 02/06/17 00:42, Dario Faggioli wrote:
> On Thu, 2017-06-01 at 20:08 +0100, Andrew Cooper wrote:
>> On 01/06/17 18:34, Dario Faggioli wrote:
>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>> index 2a06406..33b903e 100644
>>> --- a/xen/common/spinlock.c
>>> +++ b/xen/common/spinlock.c
>>> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>>>  void _spin_lock_irq(spinlock_t *lock)
>>>  {
>>>      ASSERT(local_irq_is_enabled());
>>> -    local_irq_disable();
>>> +    _local_irq_disable();
>>> +    if ( unlikely(tb_init_done) )
>>> +        trace_irq_disable_ret();
>>>      _spin_lock(lock);
>>>  }
>>
>> Is it sensible to do this way around?
>>
> Well, I guess either variant has its upsides and downsides, and it
> looks to me that there is no way to measure this without interfering
> with the behavior of the thing being measured ("Once upon a time,
> there was a cat, who lived in a box. His name was Schrödinger..."
> :-D :-D :-D)
> 
>> By writing the trace record while interrupts are disabled, you do
>> prevent nesting in the general case (but not in NMIs/MCEs or the
>> irqsave() variants), 
>>
> Forgive the ignorance, what's special about NMIs/MCAs that is relevant
> for this? About _irqsave(), I'm also not sure what you mean, as
> _irqsave() is already done differently than this.
> 
>> but you increase the critical region while
>> interrupts are disabled.
>>
> I know.
> 
>> Alternatively, writing the trace record outside of the critical
>> region
>> would reduce the size of the region.  Interpretation logic already
>> needs
>> to cope with nesting, so is one extra level of nesting in this corner
>> case a problem?
>>
> TBH, I was indeed trying to offer the component that will be looking
> at and interpreting the data as clear a view as possible of when
> IRQs are _actually_ disabled and enabled. As in, if nesting occurs,
> only the events corresponding to:
>  - the first local_irq_disable(),
>  - the last local_irq_enable(),
> are reported.
> 
> I.e., to not require that it (the interpretation logic) understands
> nesting.
> 
> But more than this, what I was concerned about was the accuracy of the
> reporting itself. In fact, if I do as you suggest, I can be interrupted
> (as interrupts are still on) after having called trace_irq_disable().
> That means the report will show a longer IRQ-disabled period, for
> this instance, than the reality.
> 
> And the same is true on the enabling side, when I call
> trace_irq_enable() _before_ re-enabling interrupts, which has the same
> bad effect on the reported length of the IRQ-disabled section.
> 
> Maybe, considering that anything that interrupts me in these cases
> will be other interrupts, which will most likely disable interrupts
> themselves, I should not worry too much about this... What do you
> think?

I think it would make the analysis easier to do it the way you've done
it.  It certainly does increase the length of the actual critical
section, but we're already assuming that all of this code is going to be
disabled unless people specifically want to do IRQ analysis, so I don't
think that should be an issue.

 -George
George Dunlap June 8, 2017, 4:01 p.m. UTC | #6
On 01/06/17 18:34, Dario Faggioli wrote:
> Trace when interrupts are disabled and (re)enabled.
> Basically, we replace the IRQ disabling and enabling
> functions with helpers that do the same, but also
> output the proper trace record.
> 
> To put something in the record that will let us
> identify _where_ in the code (i.e., in what function)
> the IRQ manipulation is happening, use either:
>  - current_text_addr(),
>  - or __builtin_return_address(0).
> 
> In fact, depending on whether the disabling/enabling
> happens in macros (like for local_irq_disable() and
> local_irq_enable()) or in actual functions (like in
> spin_lock_irq*()), it is either:
>  - the actual content of the instruction pointer when
>    IRQs are disabled/enabled,
>  - or the return address of the utility function where
>    IRQs are disabled/enabled,
> that tells us which piece of code is actually asking
> for the IRQ manipulation operation.
> 
> Gate this with its own Kconfig option, and keep it
> disabled by default (i.e., don't build it unless
> explicitly requested), as the impact on performance
> may be non-negligible.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
> ---
>  xen/Kconfig.debug                  |   11 ++++-
>  xen/common/spinlock.c              |   16 +++++--
>  xen/common/trace.c                 |   10 +++-
>  xen/include/asm-arm/arm32/system.h |   12 +++++
>  xen/include/asm-arm/arm64/system.h |   12 +++++
>  xen/include/asm-x86/system.h       |   85 ++++++++++++++++++++++++++++++++++--
>  xen/include/public/trace.h         |    2 +
>  xen/include/xen/rwlock.h           |   33 +++++++++++---
>  8 files changed, 161 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
> index 374c1c0..81910c9 100644
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -98,7 +98,7 @@ config PERF_ARRAYS
>  	---help---
>  	  Enables software performance counter array histograms.
>  
> -config TRACING
> +menuconfig TRACING
>  	bool "Tracing"
>  	default y
>  	---help---
> @@ -106,6 +106,15 @@ config TRACING
>  	  in per-CPU ring buffers. The 'xentrace' tool can be used to read
>  	  the buffers and dump the content on the disk.
>  
> +config TRACE_IRQSOFF
> +	bool "Trace when IRQs are disabled and (re)enabled" if EXPERT = "y"
> +	default n
> +	depends on TRACING
> +	---help---
> +	  Makes it possible to generate events _every_ time IRQs are disabled
> +          and (re)enabled, with also an indication of where that happened.
> +          Note that this comes with high overead and produces huge amount of
> +          tracing data.

*overhead (yours is missing an 'h')

I think I might emphasize that the overhead will be present even when
tracing is off.  What about something like this?

"Note that this comes with an overhead even when tracing is disabled;
and has a high overhead and produces a large amount of tracing data when
enabled."

With that change:

Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich June 8, 2017, 4:05 p.m. UTC | #7
>>> On 02.06.17 at 01:42, <dario.faggioli@citrix.com> wrote:
> On Thu, 2017-06-01 at 20:08 +0100, Andrew Cooper wrote:
>> By writing the trace record while interrupts are disabled, you do
>> prevent nesting in the general case (but not in NMIs/MCEs or the
>> irqsave() variants), 
>>
> Forgive the ignorance, what's special about NMIs/MCAs that is relevant
> for this?

NMI/#MC can nest nevertheless, so preventing nesting in the
common case doesn't mean you won't see nesting at all.

>> Does the logic cope with the fact that interrupt gates automatically
>> disable interrupts?
>> 
> Ah, right. No, it does not. I probably should mention this in the
> changelog. Any ideas of how to deal with that? If yes, I'm more than
> happy to fix this...

Well, respective entry points will need to update tracking state.
See how Linux has placed certain macros to that effect on
various entry paths.
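
Conceptually something along these lines (a sketch in C for brevity;
neither Linux's nor Xen's actual code), invoked before anything else
on the affected entry paths:

  void entry_irq_tracking_hook(void)
  {
      /*
       * The interrupt gate has already cleared EFLAGS.IF, so record
       * the IRQs-off transition no local_irq_disable() executed.
       */
      if ( unlikely(tb_init_done) )
          trace_irq_disable(TRACE_LOCAL_ADDR);
  }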

Jan
Dario Faggioli June 8, 2017, 4:11 p.m. UTC | #8
On Thu, 2017-06-08 at 17:01 +0100, George Dunlap wrote:
> On 01/06/17 18:34, Dario Faggioli wrote:
> > diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
> > index 374c1c0..81910c9 100644
> > --- a/xen/Kconfig.debug
> > +++ b/xen/Kconfig.debug
> > @@ -98,7 +98,7 @@ config PERF_ARRAYS
> >  	---help---
> >  	  Enables software performance counter array histograms.
> >  
> > -config TRACING
> > +menuconfig TRACING
> >  	bool "Tracing"
> >  	default y
> >  	---help---
> > @@ -106,6 +106,15 @@ config TRACING
> >  	  in per-CPU ring buffers. The 'xentrace' tool can be used
> > to read
> >  	  the buffers and dump the content on the disk.
> >  
> > +config TRACE_IRQSOFF
> > +	bool "Trace when IRQs are disabled and (re)enabled" if
> > EXPERT = "y"
> > +	default n
> > +	depends on TRACING
> > +	---help---
> > +	  Makes it possible to generate events _every_ time IRQs
> > are disabled
> > +          and (re)enabled, with also an indication of where that
> > happened.
> > +          Note that this comes with high overead and produces huge
> > amount of
> > +          tracing data.
> 
> I think I might emphasize that the overhead will be present even when
> tracing is off.  What about something like this?
> 
Yes, good point.

> "Note that this comes with an overhead even when tracing is disabled;
> and has a high overhead and produces a large amount of tracing data
> when
> enabled."
> 
I like it, I'll go for it.

> With that change:
> 
> Acked-by: George Dunlap <george.dunlap@citrix.com>
> 
Thanks and Regards,
Dario
Jan Beulich June 9, 2017, 10:41 a.m. UTC | #9
>>> On 01.06.17 at 19:34, <dario.faggioli@citrix.com> wrote:
> @@ -106,6 +106,15 @@ config TRACING
>  	  in per-CPU ring buffers. The 'xentrace' tool can be used to read
>  	  the buffers and dump the content on the disk.
>  
> +config TRACE_IRQSOFF
> +	bool "Trace when IRQs are disabled and (re)enabled" if EXPERT = "y"
> +	default n

I'm not aware of any behavioral difference between this and not
specifying a default at all, so I'd like to ask you to follow suit (see
surrounding code) and omit this line.

> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>  void _spin_lock_irq(spinlock_t *lock)
>  {
>      ASSERT(local_irq_is_enabled());
> -    local_irq_disable();
> +    _local_irq_disable();

Hmm, it looks like you're introducing new non-static,
underscore-prefixed symbols. Please don't; use e.g. an "arch_" prefix
or a "_notrace" suffix instead.

> --- a/xen/include/asm-x86/system.h
> +++ b/xen/include/asm-x86/system.h
> @@ -5,6 +5,8 @@
>  #include <xen/bitops.h>
>  #include <asm/processor.h>
>  
> +#include <xen/trace.h>

Why?

> @@ -214,6 +216,79 @@ static always_inline unsigned long __xadd(
>                         "ri" ( (x) & X86_EFLAGS_IF ) );           \
>  })
>  
> +#ifdef CONFIG_TRACE_IRQSOFF
> +
> +#define TRACE_LOCAL_ADDR ((uint64_t) current_text_addr())
> +#define TRACE_RET_ADDR   ((uint64_t) __builtin_return_address(0))
> +
> +#define trace_irq_disable(_a)                                    \
> +({                                                               \
> +    uint64_t addr = _a;                                          \
> +    __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(addr), &addr);     \
> +})
> +#define trace_irq_enable(_a)                                     \
> +({                                                               \
> +    uint64_t addr = _a;                                          \
> +    __trace_var(TRC_HW_IRQ_ENABLE, 1, sizeof(addr), &addr);      \
> +})
> +#define trace_irq_save(_x, _a)                                   \
> +({                                                               \
> +    uint64_t addr = _a;                                          \
> +    if ( _x & X86_EFLAGS_IF )                                    \
> +        __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(addr), &addr); \
> +})
> +#define trace_irq_restore(_x, _a)                                \
> +({                                                               \
> +    uint64_t addr = _a;                                          \
> +    if ( _x & X86_EFLAGS_IF )                                    \
> +        __trace_var(TRC_HW_IRQ_ENABLE, 1, sizeof(addr), &addr);  \
> +})
> +
> +#define trace_irq_disable_ret()   trace_irq_disable(TRACE_RET_ADDR)
> +#define trace_irq_enable_ret()    trace_irq_enable(TRACE_RET_ADDR)
> +#define trace_irq_save_ret(_x)    trace_irq_save(_x, TRACE_RET_ADDR)
> +#define trace_irq_restore_ret(_x) trace_irq_restore(_x, TRACE_RET_ADDR)
> +
> +#define local_irq_disable()                      \
> +({                                               \
> +    bool_t irqon = local_irq_is_enabled();       \
> +    _local_irq_disable();                        \
> +    if ( unlikely(tb_init_done && irqon) )       \
> +        trace_irq_disable(TRACE_LOCAL_ADDR);     \
> +})
> +
> +#define local_irq_enable()                       \
> +({                                               \
> +    if ( unlikely(tb_init_done) )                \
> +        trace_irq_enable(TRACE_LOCAL_ADDR);      \
> +    _local_irq_enable();                         \
> +})
> +
> +#define local_irq_save(_x)                       \
> +({                                               \
> +    local_save_flags(_x);                        \
> +    _local_irq_disable();                        \
> +    if ( unlikely(tb_init_done) )                \
> +        trace_irq_save(_x, TRACE_LOCAL_ADDR);    \
> +})
> +
> +#define local_irq_restore(_x)                    \
> +({                                               \
> +    if ( unlikely(tb_init_done) )                \
> +        trace_irq_restore(_x, TRACE_LOCAL_ADDR); \
> +    _local_irq_restore(_x);                      \
> +})
> +#else /* !TRACE_IRQSOFF */
> +#define trace_irq_disable_ret()   do { } while ( 0 )
> +#define trace_irq_enable_ret()    do { } while ( 0 )
> +#define trace_irq_save_ret(_x)    do { } while ( 0 )
> +#define trace_irq_restore_ret(_x) do { } while ( 0 )
> +#define local_irq_disable()       _local_irq_disable()
> +#define local_irq_enable()        _local_irq_enable()
> +#define local_irq_save(_x)        _local_irq_save(_x)
> +#define local_irq_restore(_x)     _local_irq_restore(_x)
> +#endif /* TRACE_IRQSOFF */

None of this looks to be arch-specific.

Jan
Julien Grall June 9, 2017, 10:51 a.m. UTC | #10
On 07/06/17 16:22, Dario Faggioli wrote:
> On Wed, 2017-06-07 at 12:16 +0100, Julien Grall wrote:
>> Hi Dario,
>>
> Hi,
>
>> On 01/06/17 18:34, Dario Faggioli wrote:
>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>> index 2a06406..33b903e 100644
>>> --- a/xen/common/spinlock.c
>>> +++ b/xen/common/spinlock.c
>>> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>>>  void _spin_lock_irq(spinlock_t *lock)
>>>  {
>>>      ASSERT(local_irq_is_enabled());
>>> -    local_irq_disable();
>>> +    _local_irq_disable();
>>> +    if ( unlikely(tb_init_done) )
>>
>> __trace_var() already contains an "if ( !tb_init_done )" check. It
>> sounds pointless to do it twice, and I also think the meaning of the
>> check is not obvious (i.e., what tb_init_done is).
>>
>> Would it be possible to hide this check and avoid checking
>> tb_init_done twice?
>>
> I totally agree. And in fact, in another patch, I remove the
> tb_init_done check in __trace_var(). The reason behind this is that
> all the callers of __trace_var() (the main one being trace_var())
> already check for tb_init_done themselves.
>
> In fact, the explicit check in the caller is the (only) basis on which
> one decides to call either trace_var() or __trace_var().
>
> This here is one of the call sites where I think the check is better
> done in the caller.

I admit, it is a bit confusing to require the caller to check
tb_init_done. It adds more code and is IMHO counterintuitive.

>
>>> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-
>>> arm/arm32/system.h
>>> index c617b40..20871ad 100644
>>> --- a/xen/include/asm-arm/arm32/system.h
>>> +++ b/xen/include/asm-arm/arm32/system.h
>>> @@ -4,6 +4,8 @@
>>>
>>>  #include <asm/arm32/cmpxchg.h>
>>>
>>> +#include <xen/trace.h>
>>> +
>>>  #define local_irq_disable() asm volatile ( "cpsid i @
>>> local_irq_disable\n" : : : "cc" )
>>>  #define local_irq_enable()  asm volatile ( "cpsie i @
>>> local_irq_enable\n" : : : "cc" )
>>>
>>> @@ -41,6 +43,16 @@ static inline int local_irq_is_enabled(void)
>>>  #define local_abort_enable() __asm__("cpsie a  @ __sta\n" : : :
>>> "memory", "cc")
>>>  #define local_abort_disable() __asm__("cpsid a @ __sta\n" : : :
>>> "memory", "cc")
>>>
>>> +/* We do not support tracing (at all) yet */
>>
>> I know that some bits are missing for ARM, but the patch #5
>> contradicts
>> this comment as you have CONFIG_TRACE=y by default.
>>
> No sure what you mean. Tracing is de facto on by default right now,
> despite it not being implemented for ARM. So what I'm doing is
> basically keeping things as they are.
>
> If you think it should be off by default, well, let's talk about it...
> but I'm not sure this is really what you are saying/asking.

I mean that CONFIG_TRACE=y by default gives the impression that tracing
is supported on ARM. This is not the case, hence your comment. So I
think it should be off by default until we get the support. BTW, it is
only a couple of simple patches, but they never got upstreamed :/

>>> +#define trace_irq_disable_ret()   do { } while ( 0 )
>>> +#define trace_irq_enable_ret()    do { } while ( 0 )
>>> +#define trace_irq_save_ret(_x)    do { } while ( 0 )
>>> +#define trace_irq_restore_ret(_x) do { } while ( 0 )
>>> +#define _local_irq_disable()      local_irq_disable()
>>> +#define _local_irq_enable()       local_irq_enable()
>>> +#define _local_irq_save(_x)       local_irq_save(_x)
>>> +#define _local_irq_restore(_x)    local_irq_restore(_x)
>>> +
>>
>> This does not need to be duplicated in both asm-
>> arm/arm{32,64}/system.h.
>> You could directly implement them in asm-arm/system.h.
>>
> Ok, thanks.
>
>> Also, I would prefer if you stay consistent with x86, i.e. the
>> non-underscore versions call the underscore versions and not the
>> other way around.
>>
> Well, I know it is counterintuitive, but it's the easiest way of
> getting over this, i.e., without the need of a lot of stubs, and
> touching as few existing code as possible.

Can you detail here? The only thing I was asking is to do:

#define _local_irq_disable() ......

#define local_irq_disable() _local_irq_disable()

The callers would still use local_irq_disable() and the code would be 
consistent with x86.

Cheers,
Julien Grall June 9, 2017, 10:53 a.m. UTC | #11
On 09/06/17 11:51, Julien Grall wrote:
>
>
> On 07/06/17 16:22, Dario Faggioli wrote:
>> On Wed, 2017-06-07 at 12:16 +0100, Julien Grall wrote:
>>> Hi Dario,
>>>
>> Hi,
>>
>>> On 01/06/17 18:34, Dario Faggioli wrote:
>>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>>> index 2a06406..33b903e 100644
>>>> --- a/xen/common/spinlock.c
>>>> +++ b/xen/common/spinlock.c
>>>> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>>>>  void _spin_lock_irq(spinlock_t *lock)
>>>>  {
>>>>      ASSERT(local_irq_is_enabled());
>>>> -    local_irq_disable();
>>>> +    _local_irq_disable();
>>>> +    if ( unlikely(tb_init_done) )
>>>
>>> __trace_var() already contains an "if ( !tb_init_done )" check. It
>>> sounds pointless to do it twice, and I also think the meaning of the
>>> check is not obvious (i.e., what tb_init_done is).
>>>
>>> Would it be possible to hide this check and avoid checking
>>> tb_init_done twice?
>>>
>> I totally agree. And in fact, in another patch, I remove the
>> tb_init_done check in __trace_var(). The reason behind this is that
>> all the callers of __trace_var() (the main one being trace_var())
>> already check for tb_init_done themselves.
>>
>> In fact, the explicit check in the caller is the (only) basis on which
>> one decides to call either trace_var() or __trace_var().
>>
>> This here is one of the call sites where I think the check is better
>> done in the caller.
>
> I admit, it is a bit confusing to require the caller to check
> tb_init_done. It adds more code and is IMHO counterintuitive.

BTW, I am not responsible for this code, so if the REST maintainers are
happy with that then fine.

>
>>
>>>> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-
>>>> arm/arm32/system.h
>>>> index c617b40..20871ad 100644
>>>> --- a/xen/include/asm-arm/arm32/system.h
>>>> +++ b/xen/include/asm-arm/arm32/system.h
>>>> @@ -4,6 +4,8 @@
>>>>
>>>>  #include <asm/arm32/cmpxchg.h>
>>>>
>>>> +#include <xen/trace.h>
>>>> +
>>>>  #define local_irq_disable() asm volatile ( "cpsid i @
>>>> local_irq_disable\n" : : : "cc" )
>>>>  #define local_irq_enable()  asm volatile ( "cpsie i @
>>>> local_irq_enable\n" : : : "cc" )
>>>>
>>>> @@ -41,6 +43,16 @@ static inline int local_irq_is_enabled(void)
>>>>  #define local_abort_enable() __asm__("cpsie a  @ __sta\n" : : :
>>>> "memory", "cc")
>>>>  #define local_abort_disable() __asm__("cpsid a @ __sta\n" : : :
>>>> "memory", "cc")
>>>>
>>>> +/* We do not support tracing (at all) yet */
>>>
>>> I know that some bits are missing for ARM, but the patch #5
>>> contradicts
>>> this comment as you have CONFIG_TRACE=y by default.
>>>
>> No sure what you mean. Tracing is de facto on by default right now,
>> despite it not being implemented for ARM. So what I'm doing is
>> basically keeping things as they are.
>>
>> If you think it should be off by default, well, let's talk about it...
>> but I'm not sure this is really what you are saying/asking.
>
> I mean that CONFIG_TRACE=y by default gives the impression that tracing
> is supported on ARM. This is not the case, hence your comment. So I
> think it should be off by default until we get the support. BTW, it is
> only a couple of simple patches, but they never got upstreamed :/
>
>>>> +#define trace_irq_disable_ret()   do { } while ( 0 )
>>>> +#define trace_irq_enable_ret()    do { } while ( 0 )
>>>> +#define trace_irq_save_ret(_x)    do { } while ( 0 )
>>>> +#define trace_irq_restore_ret(_x) do { } while ( 0 )
>>>> +#define _local_irq_disable()      local_irq_disable()
>>>> +#define _local_irq_enable()       local_irq_enable()
>>>> +#define _local_irq_save(_x)       local_irq_save(_x)
>>>> +#define _local_irq_restore(_x)    local_irq_restore(_x)
>>>> +
>>>
>>> This does not need to be duplicated in both asm-
>>> arm/arm{32,64}/system.h.
>>> You could directly implement them in asm-arm/system.h.
>>>
>> Ok, thanks.
>>
>>> Also, I would prefer if you stay consistent with x86, i.e. the
>>> non-underscore versions call the underscore versions and not the
>>> other way around.
>>>
>> Well, I know it is counterintuitive, but it's the easiest way of
>> getting over this, i.e., without the need of a lot of stubs, and
>> touching as few existing code as possible.
>
> Can you detail here? The only thing I was asking is to do:
>
> #define _local_irq_disable() ......
>
> #define local_irq_disable() _local_irq_disable()
>
> The callers would still use local_irq_disable() and the code would be
> consistent with x86.
>
> Cheers,
>
George Dunlap June 9, 2017, 10:55 a.m. UTC | #12
On 09/06/17 11:51, Julien Grall wrote:
> 
> 
> On 07/06/17 16:22, Dario Faggioli wrote:
>> On Wed, 2017-06-07 at 12:16 +0100, Julien Grall wrote:
>>> Hi Dario,
>>>
>> Hi,
>>
>>> On 01/06/17 18:34, Dario Faggioli wrote:
>>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>>> index 2a06406..33b903e 100644
>>>> --- a/xen/common/spinlock.c
>>>> +++ b/xen/common/spinlock.c
>>>> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>>>>  void _spin_lock_irq(spinlock_t *lock)
>>>>  {
>>>>      ASSERT(local_irq_is_enabled());
>>>> -    local_irq_disable();
>>>> +    _local_irq_disable();
>>>> +    if ( unlikely(tb_init_done) )
>>>
>>> __trace_var() already contains an "if ( !tb_init_done )" check. It
>>> sounds pointless to do it twice, and I also think the meaning of the
>>> check is not obvious (i.e., what tb_init_done is).
>>>
>>> Would it be possible to hide this check and avoid checking
>>> tb_init_done twice?
>>>
>> I totally agree. And in fact, in another patch, I remove the
>> tb_init_done check in __trace_var(). The reason behind this is that
>> all the callers of __trace_var() (the main one being trace_var())
>> already check for tb_init_done themselves.
>>
>> In fact, the explicit check in the caller is the (only) basis on which
>> one decides to call either trace_var() or __trace_var().
>>
>> This here is one of the call sites where I think the check is better
>> done in the caller.
> 
> I admit, it is a bit confusing to require the caller to check
> tb_init_done. It adds more code and is IMHO counterintuitive.

Because a number of the traces include a certain amount of "marshalling"
work ahead of time; there's no point in making space for a struct on the
stack, carefully reading and sorting all kinds of things to put into it,
only to have the data ignored because tracing isn't enabled (which is by
far the most common case).
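
I.e. the common pattern is (a sketch, with a made-up payload):

  if ( unlikely(tb_init_done) )
  {
      /* Gather and marshal the payload only when tracing is on. */
      struct {
          uint64_t addr;
          uint32_t cpu;
      } d;

      d.addr = (uint64_t) __builtin_return_address(0);
      d.cpu = smp_processor_id();
      __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(d), &d);
  }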

 -George
Julien Grall June 9, 2017, 11 a.m. UTC | #13
On 09/06/17 11:55, George Dunlap wrote:
> On 09/06/17 11:51, Julien Grall wrote:
>>
>>
>> On 07/06/17 16:22, Dario Faggioli wrote:
>>> On Wed, 2017-06-07 at 12:16 +0100, Julien Grall wrote:
>>>> Hi Dario,
>>>>
>>> Hi,
>>>
>>>> On 01/06/17 18:34, Dario Faggioli wrote:
>>>>> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
>>>>> index 2a06406..33b903e 100644
>>>>> --- a/xen/common/spinlock.c
>>>>> +++ b/xen/common/spinlock.c
>>>>> @@ -150,7 +150,9 @@ void _spin_lock(spinlock_t *lock)
>>>>>  void _spin_lock_irq(spinlock_t *lock)
>>>>>  {
>>>>>      ASSERT(local_irq_is_enabled());
>>>>> -    local_irq_disable();
>>>>> +    _local_irq_disable();
>>>>> +    if ( unlikely(tb_init_done) )
>>>>
>>>> __trace_var() already contains an "if ( !tb_init_done )" check. It
>>>> sounds pointless to do it twice, and I also think the meaning of the
>>>> check is not obvious (i.e., what tb_init_done is).
>>>>
>>>> Would it be possible to hide this check and avoid checking
>>>> tb_init_done twice?
>>>>
>>> I totally agree. And in fact, in another patch, I remove the
>>> tb_init_done check in __trace_var(). The reason behind this is that
>>> all the callers of __trace_var() (the main one being trace_var())
>>> already check for tb_init_done themselves.
>>>
>>> In fact, the explicit check in the caller is the (only) basis on which
>>> one decides to call either trace_var() or __trace_var().
>>>
>>> This here is one of the call sites where I think the check is better
>>> done in the caller.
>>
>> I admit, it is a bit confusing to require the caller to check
>> tb_init_done. It adds more code and is IMHO counterintuitive.
>
> Because a number of the traces include a certain amount of "marshalling"
> work ahead of time; there's no point in making space for a struct on the
> stack, carefully reading and sorting all kinds of things to put into it,
> only to have the data ignored because tracing isn't enabled (which is by
> far the most common case).

At the moment all the callers follow the pattern:

if ( unlikely(tb_init_done) )
    trace_irq_*

So why not hide that in the macros trace_irq_disable/trace_irq_enable...?
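
Something like this rough sketch (untested, reusing the helpers the
patch already introduces):

  #define trace_irq_disable_ret()                                    \
  ({                                                                 \
      if ( unlikely(tb_init_done) )                                  \
      {                                                              \
          uint64_t addr = (uint64_t) __builtin_return_address(0);    \
          __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(addr), &addr);   \
      }                                                              \
  })

so that the call sites could use a bare trace_irq_disable_ret().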

Cheers,
Patch

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 374c1c0..81910c9 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -98,7 +98,7 @@  config PERF_ARRAYS
 	---help---
 	  Enables software performance counter array histograms.
 
-config TRACING
+menuconfig TRACING
 	bool "Tracing"
 	default y
 	---help---
@@ -106,6 +106,15 @@  config TRACING
 	  in per-CPU ring buffers. The 'xentrace' tool can be used to read
 	  the buffers and dump the content on the disk.
 
+config TRACE_IRQSOFF
+	bool "Trace when IRQs are disabled and (re)enabled" if EXPERT = "y"
+	default n
+	depends on TRACING
+	---help---
+	  Makes it possible to generate events _every_ time IRQs are disabled
+          and (re)enabled, with also an indication of where that happened.
+          Note that this comes with high overead and produces huge amount of
+          tracing data.
 
 config VERBOSE_DEBUG
 	bool "Verbose debug messages"
diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..33b903e 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -150,7 +150,9 @@  void _spin_lock(spinlock_t *lock)
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
-    local_irq_disable();
+    _local_irq_disable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_disable_ret();
     _spin_lock(lock);
 }
 
@@ -158,7 +160,9 @@  unsigned long _spin_lock_irqsave(spinlock_t *lock)
 {
     unsigned long flags;
 
-    local_irq_save(flags);
+    _local_irq_save(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_save_ret(flags);
     _spin_lock(lock);
     return flags;
 }
@@ -175,13 +179,17 @@  void _spin_unlock(spinlock_t *lock)
 void _spin_unlock_irq(spinlock_t *lock)
 {
     _spin_unlock(lock);
-    local_irq_enable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_enable_ret();
+    _local_irq_enable();
 }
 
 void _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 {
     _spin_unlock(lock);
-    local_irq_restore(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_restore_ret(flags);
+    _local_irq_restore(flags);
 }
 
 int _spin_is_locked(spinlock_t *lock)
diff --git a/xen/common/trace.c b/xen/common/trace.c
index 2c18462..71202df 100644
--- a/xen/common/trace.c
+++ b/xen/common/trace.c
@@ -722,7 +722,12 @@  void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     /* Read tb_init_done /before/ t_bufs. */
     smp_rmb();
 
-    spin_lock_irqsave(&this_cpu(t_lock), flags);
+    /*
+     * spin_lock_irqsave() would call local_irq_save(), which (may)
+     * call __trace_var(). Open code it to avoid recursing.
+     */
+    _local_irq_save(flags);
+    spin_lock(&this_cpu(t_lock));
 
     buf = this_cpu(t_bufs);
 
@@ -809,7 +814,8 @@  void __trace_var(u32 event, bool_t cycles, unsigned int extra,
     __insert_record(buf, event, extra, cycles, rec_size, extra_data);
 
 unlock:
-    spin_unlock_irqrestore(&this_cpu(t_lock), flags);
+    spin_unlock(&this_cpu(t_lock));
+    _local_irq_restore(flags);
 
     /* Notify trace buffer consumer that we've crossed the high water mark. */
     if ( likely(buf!=NULL)
diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index c617b40..20871ad 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -4,6 +4,8 @@ 
 
 #include <asm/arm32/cmpxchg.h>
 
+#include <xen/trace.h>
+
 #define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
 #define local_irq_enable()  asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
 
@@ -41,6 +43,16 @@  static inline int local_irq_is_enabled(void)
 #define local_abort_enable() __asm__("cpsie a  @ __sta\n" : : : "memory", "cc")
 #define local_abort_disable() __asm__("cpsid a @ __sta\n" : : : "memory", "cc")
 
+/* We do not support tracing (at all) yet */
+#define trace_irq_disable_ret()   do { } while ( 0 )
+#define trace_irq_enable_ret()    do { } while ( 0 )
+#define trace_irq_save_ret(_x)    do { } while ( 0 )
+#define trace_irq_restore_ret(_x) do { } while ( 0 )
+#define _local_irq_disable()      local_irq_disable()
+#define _local_irq_enable()       local_irq_enable()
+#define _local_irq_save(_x)       local_irq_save(_x)
+#define _local_irq_restore(_x)    local_irq_restore(_x)
+
 static inline int local_fiq_is_enabled(void)
 {
     unsigned long flags;
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 2e2ee21..6603b0c 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -4,6 +4,8 @@ 
 
 #include <asm/arm64/cmpxchg.h>
 
+#include <xen/trace.h>
+
 /* Uses uimm4 as a bitmask to select the clearing of one or more of
  * the DAIF exception mask bits:
  * bit 3 selects the D mask,
@@ -44,6 +46,16 @@ 
         : "memory");                                             \
 })
 
+/* We do not support tracing (at all) yet */
+#define trace_irq_disable_ret()   do { } while ( 0 )
+#define trace_irq_enable_ret()    do { } while ( 0 )
+#define trace_irq_save_ret(_x)    do { } while ( 0 )
+#define trace_irq_restore_ret(_x) do { } while ( 0 )
+#define _local_irq_disable()      local_irq_disable()
+#define _local_irq_enable()       local_irq_enable()
+#define _local_irq_save(_x)       local_irq_save(_x)
+#define _local_irq_restore(_x)    local_irq_restore(_x)
+
 static inline int local_irq_is_enabled(void)
 {
     unsigned long flags;
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index eb498f5..0e7bf01 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -5,6 +5,8 @@ 
 #include <xen/bitops.h>
 #include <asm/processor.h>
 
+#include <xen/trace.h>
+
 #define read_sreg(name)                                         \
 ({  unsigned int __sel;                                         \
     asm volatile ( "mov %%" STR(name) ",%0" : "=r" (__sel) );   \
@@ -185,8 +187,8 @@  static always_inline unsigned long __xadd(
 #define set_mb(var, value) do { xchg(&var, value); } while (0)
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 
-#define local_irq_disable()     asm volatile ( "cli" : : : "memory" )
-#define local_irq_enable()      asm volatile ( "sti" : : : "memory" )
+#define _local_irq_disable()    asm volatile ( "cli" : : : "memory" )
+#define _local_irq_enable()     asm volatile ( "sti" : : : "memory" )
 
 /* used in the idle loop; sti takes one instruction cycle to complete */
 #define safe_halt()     asm volatile ( "sti; hlt" : : : "memory" )
@@ -198,12 +200,12 @@  static always_inline unsigned long __xadd(
     BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
     asm volatile ( "pushf" __OS " ; pop" __OS " %0" : "=g" (x)); \
 })
-#define local_irq_save(x)                                        \
+#define _local_irq_save(x)                                       \
 ({                                                               \
     local_save_flags(x);                                         \
-    local_irq_disable();                                         \
+    _local_irq_disable();                                        \
 })
-#define local_irq_restore(x)                                     \
+#define _local_irq_restore(x)                                    \
 ({                                                               \
     BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
     asm volatile ( "pushfq\n\t"                                  \
@@ -214,6 +216,79 @@  static always_inline unsigned long __xadd(
                        "ri" ( (x) & X86_EFLAGS_IF ) );           \
 })
 
+#ifdef CONFIG_TRACE_IRQSOFF
+
+#define TRACE_LOCAL_ADDR ((uint64_t) current_text_addr())
+#define TRACE_RET_ADDR   ((uint64_t) __builtin_return_address(0))
+
+#define trace_irq_disable(_a)                                    \
+({                                                               \
+    uint64_t addr = _a;                                          \
+    __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(addr), &addr);     \
+})
+#define trace_irq_enable(_a)                                     \
+({                                                               \
+    uint64_t addr = _a;                                          \
+    __trace_var(TRC_HW_IRQ_ENABLE, 1, sizeof(addr), &addr);      \
+})
+#define trace_irq_save(_x, _a)                                   \
+({                                                               \
+    uint64_t addr = _a;                                          \
+    if ( _x & X86_EFLAGS_IF )                                    \
+        __trace_var(TRC_HW_IRQ_DISABLE, 1, sizeof(addr), &addr); \
+})
+#define trace_irq_restore(_x, _a)                                \
+({                                                               \
+    uint64_t addr = _a;                                          \
+    if ( _x & X86_EFLAGS_IF )                                    \
+        __trace_var(TRC_HW_IRQ_ENABLE, 1, sizeof(addr), &addr);  \
+})
+
+#define trace_irq_disable_ret()   trace_irq_disable(TRACE_RET_ADDR)
+#define trace_irq_enable_ret()    trace_irq_enable(TRACE_RET_ADDR)
+#define trace_irq_save_ret(_x)    trace_irq_save(_x, TRACE_RET_ADDR)
+#define trace_irq_restore_ret(_x) trace_irq_restore(_x, TRACE_RET_ADDR)
+
+#define local_irq_disable()                      \
+({                                               \
+    bool_t irqon = local_irq_is_enabled();       \
+    _local_irq_disable();                        \
+    if ( unlikely(tb_init_done && irqon) )       \
+        trace_irq_disable(TRACE_LOCAL_ADDR);     \
+})
+
+#define local_irq_enable()                       \
+({                                               \
+    if ( unlikely(tb_init_done) )                \
+        trace_irq_enable(TRACE_LOCAL_ADDR);      \
+    _local_irq_enable();                         \
+})
+
+#define local_irq_save(_x)                       \
+({                                               \
+    local_save_flags(_x);                        \
+    _local_irq_disable();                        \
+    if ( unlikely(tb_init_done) )                \
+        trace_irq_save(_x, TRACE_LOCAL_ADDR);    \
+})
+
+#define local_irq_restore(_x)                    \
+({                                               \
+    if ( unlikely(tb_init_done) )                \
+        trace_irq_restore(_x, TRACE_LOCAL_ADDR); \
+    _local_irq_restore(_x);                      \
+})
+#else /* !TRACE_IRQSOFF */
+#define trace_irq_disable_ret()   do { } while ( 0 )
+#define trace_irq_enable_ret()    do { } while ( 0 )
+#define trace_irq_save_ret(_x)    do { } while ( 0 )
+#define trace_irq_restore_ret(_x) do { } while ( 0 )
+#define local_irq_disable()       _local_irq_disable()
+#define local_irq_enable()        _local_irq_enable()
+#define local_irq_save(_x)        _local_irq_save(_x)
+#define local_irq_restore(_x)     _local_irq_restore(_x)
+#endif /* TRACE_IRQSOFF */
+
 static inline int local_irq_is_enabled(void)
 {
     unsigned long flags;
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index f66a7af..1692a79 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -275,6 +275,8 @@ 
 #define TRC_HW_IRQ_ENTER              (TRC_HW_IRQ + 0xA)
 #define TRC_HW_IRQ_GUEST              (TRC_HW_IRQ + 0xB)
 #define TRC_HW_IRQ_EXIT               (TRC_HW_IRQ + 0xC)
+#define TRC_HW_IRQ_DISABLE            (TRC_HW_IRQ + 0xD)
+#define TRC_HW_IRQ_ENABLE             (TRC_HW_IRQ + 0xE)
 
 /*
  * Event Flags
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 35657c5..04f50e5 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -73,14 +73,19 @@  static inline void _read_lock(rwlock_t *lock)
 static inline void _read_lock_irq(rwlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
-    local_irq_disable();
+    _local_irq_disable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_disable_ret();
     _read_lock(lock);
 }
 
 static inline unsigned long _read_lock_irqsave(rwlock_t *lock)
 {
     unsigned long flags;
-    local_irq_save(flags);
+
+    _local_irq_save(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_save_ret(flags);
     _read_lock(lock);
     return flags;
 }
@@ -100,13 +105,17 @@  static inline void _read_unlock(rwlock_t *lock)
 static inline void _read_unlock_irq(rwlock_t *lock)
 {
     _read_unlock(lock);
-    local_irq_enable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_enable_ret();
+    _local_irq_enable();
 }
 
 static inline void _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
     _read_unlock(lock);
-    local_irq_restore(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_restore_ret(flags);
+    _local_irq_restore(flags);
 }
 
 static inline int _rw_is_locked(rwlock_t *lock)
@@ -130,7 +139,9 @@  static inline void _write_lock(rwlock_t *lock)
 static inline void _write_lock_irq(rwlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
-    local_irq_disable();
+    _local_irq_disable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_disable_ret();
     _write_lock(lock);
 }
 
@@ -138,7 +149,9 @@  static inline unsigned long _write_lock_irqsave(rwlock_t *lock)
 {
     unsigned long flags;
 
-    local_irq_save(flags);
+    _local_irq_save(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_save_ret(flags);
     _write_lock(lock);
     return flags;
 }
@@ -171,13 +184,17 @@  static inline void _write_unlock(rwlock_t *lock)
 static inline void _write_unlock_irq(rwlock_t *lock)
 {
     _write_unlock(lock);
-    local_irq_enable();
+    if ( unlikely(tb_init_done) )
+        trace_irq_enable_ret();
+    _local_irq_enable();
 }
 
 static inline void _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
     _write_unlock(lock);
-    local_irq_restore(flags);
+    if ( unlikely(tb_init_done) )
+        trace_irq_restore_ret(flags);
+    _local_irq_restore(flags);
 }
 
 static inline int _rw_is_write_locked(rwlock_t *lock)