diff mbox series

[v2,3/3] trace: Example of "centralized" recorder tracing

Message ID 20200626162706.3304357-4-dinechin@redhat.com (mailing list archive)
State New, archived
Headers show
Series trace: Add a trace backend for the recorder library | expand

Commit Message

Christophe de Dinechin June 26, 2020, 4:27 p.m. UTC
This is an example showing how the recorder can be used to have one
"topic" covering multiple entries. Here, the topic is "lock".

Here are a few use cases:

- Checking locks:
    RECORDER_TRACES=lock qemu
- Graphic visualization of locks:
    RECORDER_TRACES="lock=state,id" qemu &
    recorder_scope state
    <Hit the 't' key to toggle timing display>
    <Hit the 'c' key to dump the screen data as CSV>
    cat recorder_scope_data-1.csv

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
---
 util/qemu-thread-common.h | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Daniel P. Berrangé June 30, 2020, 12:41 p.m. UTC | #1
On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
> This is an example showing how the recorder can be used to have one
> "topic" covering multiple entries. Here, the topic is "lock".
> 
> Here are a few use cases:
> 
> - Checking locks:
>     RECORDER_TRACES=lock qemu
> - Graphic visualization of locks:
>     RECORDER_TRACES="lock=state,id" qemu &
>     recorder_scope state
>     <Hit the 't' key to toggle timing display>
>     <Hit the 'c' key to dump the screen data as CSV>
>     cat recorder_scope_data-1.csv
> 
> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
> ---
>  util/qemu-thread-common.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/util/qemu-thread-common.h b/util/qemu-thread-common.h
> index 2af6b12085..0de07a471f 100644
> --- a/util/qemu-thread-common.h
> +++ b/util/qemu-thread-common.h
> @@ -15,6 +15,9 @@
>  
>  #include "qemu/thread.h"
>  #include "trace.h"
> +#include "trace/recorder.h"
> +
> +RECORDER_DEFINE(lock, 16, "Lock state");
>  
>  static inline void qemu_mutex_post_init(QemuMutex *mutex)
>  {
> @@ -23,12 +26,14 @@ static inline void qemu_mutex_post_init(QemuMutex *mutex)
>      mutex->line = 0;
>  #endif
>      mutex->initialized = true;
> +    record(lock, "Init state %d for %p", -1, mutex);
>  }
>  
>  static inline void qemu_mutex_pre_lock(QemuMutex *mutex,
>                                         const char *file, int line)
>  {
>      trace_qemu_mutex_lock(mutex, file, line);
> +    record(lock, "Locking state %d for %p", 1, mutex);
>  }
>  
>  static inline void qemu_mutex_post_lock(QemuMutex *mutex,
> @@ -39,6 +44,7 @@ static inline void qemu_mutex_post_lock(QemuMutex *mutex,
>      mutex->line = line;
>  #endif
>      trace_qemu_mutex_locked(mutex, file, line);
> +    record(lock, "Locked state %d for %p", 2, mutex);
>  }
>  
>  static inline void qemu_mutex_pre_unlock(QemuMutex *mutex,
> @@ -49,6 +55,7 @@ static inline void qemu_mutex_pre_unlock(QemuMutex *mutex,
>      mutex->line = 0;
>  #endif
>      trace_qemu_mutex_unlock(mutex, file, line);
> +    record(lock, "Unkocked state %d for %p", 0, mutex);
>  }

IMHO the whole point of having the pluggable trace backend impls, is
precisely that we don't have to add multiple different calls in the
code. A single trace_qemu_mutex_unlock() is supposed to work with
any backend.

Regards,
Daniel
Stefan Hajnoczi June 30, 2020, 12:58 p.m. UTC | #2
On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
> This is an example showing how the recorder can be used to have one
> "topic" covering multiple entries. Here, the topic is "lock".
> 
> Here are a few use cases:
> 
> - Checking locks:
>     RECORDER_TRACES=lock qemu
> - Graphic visualization of locks:
>     RECORDER_TRACES="lock=state,id" qemu &
>     recorder_scope state
>     <Hit the 't' key to toggle timing display>
>     <Hit the 'c' key to dump the screen data as CSV>
>     cat recorder_scope_data-1.csv
> 
> Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
> ---
>  util/qemu-thread-common.h | 7 +++++++
>  1 file changed, 7 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Stefan Hajnoczi July 1, 2020, 4:09 p.m. UTC | #3
On Tue, Jun 30, 2020 at 01:41:36PM +0100, Daniel P. Berrangé wrote:
> On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
> IMHO the whole point of having the pluggable trace backend impls, is
> precisely that we don't have to add multiple different calls in the
> code. A single trace_qemu_mutex_unlock() is supposed to work with
> any backend.

I think an exception is okay when the other trace backends do not offer
equivalent functionality.

Who knows if anyone other than Christophe will use this functionality,
but it doesn't cost much to allow it.

Stefan
Daniel P. Berrangé July 1, 2020, 4:15 p.m. UTC | #4
On Wed, Jul 01, 2020 at 05:09:06PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 30, 2020 at 01:41:36PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
> > IMHO the whole point of having the pluggable trace backend impls, is
> > precisely that we don't have to add multiple different calls in the
> > code. A single trace_qemu_mutex_unlock() is supposed to work with
> > any backend.
> 
> I think an exception is okay when the other trace backends do not offer
> equivalent functionality.
> 
> Who knows if anyone other than Christophe will use this functionality,
> but it doesn't cost much to allow it.

This patch is just an example though, suggesting this kind of usage is
expected to done in other current trace probe locations. The trace wrapper
has most of the information required already including a format string,
so I'd think it could be wired up to the generator so we don't add extra
record() statements through the codebase. At most it should require an
extra annotation in the trace-events file to take the extra parameter
for grouping, and other trace backends can ignore that.

Regards,
Daniel
Stefan Hajnoczi July 2, 2020, 1:47 p.m. UTC | #5
On Wed, Jul 01, 2020 at 05:15:01PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 01, 2020 at 05:09:06PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 30, 2020 at 01:41:36PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
> > > IMHO the whole point of having the pluggable trace backend impls, is
> > > precisely that we don't have to add multiple different calls in the
> > > code. A single trace_qemu_mutex_unlock() is supposed to work with
> > > any backend.
> > 
> > I think an exception is okay when the other trace backends do not offer
> > equivalent functionality.
> > 
> > Who knows if anyone other than Christophe will use this functionality,
> > but it doesn't cost much to allow it.
> 
> This patch is just an example though, suggesting this kind of usage is
> expected to done in other current trace probe locations. The trace wrapper
> has most of the information required already including a format string,
> so I'd think it could be wired up to the generator so we don't add extra
> record() statements through the codebase. At most it should require an
> extra annotation in the trace-events file to take the extra parameter
> for grouping, and other trace backends can ignore that.

It's true, it may be possible to put this functionality in the
trace-events.

Christophe: how does this differ from regular trace events and what
extra information is needed?

Stefan
Christophe de Dinechin July 3, 2020, 10:12 a.m. UTC | #6
On 2020-07-02 at 15:47 CEST, Stefan Hajnoczi wrote...
> On Wed, Jul 01, 2020 at 05:15:01PM +0100, Daniel P. Berrangé wrote:
>> On Wed, Jul 01, 2020 at 05:09:06PM +0100, Stefan Hajnoczi wrote:
>> > On Tue, Jun 30, 2020 at 01:41:36PM +0100, Daniel P. Berrangé wrote:
>> > > On Fri, Jun 26, 2020 at 06:27:06PM +0200, Christophe de Dinechin wrote:
>> > > IMHO the whole point of having the pluggable trace backend impls, is
>> > > precisely that we don't have to add multiple different calls in the
>> > > code. A single trace_qemu_mutex_unlock() is supposed to work with
>> > > any backend.
>> >
>> > I think an exception is okay when the other trace backends do not offer
>> > equivalent functionality.
>> >
>> > Who knows if anyone other than Christophe will use this functionality,
>> > but it doesn't cost much to allow it.
>>
>> This patch is just an example though, suggesting this kind of usage is
>> expected to done in other current trace probe locations. The trace wrapper
>> has most of the information required already including a format string,
>> so I'd think it could be wired up to the generator so we don't add extra
>> record() statements through the codebase.

The primary purpose of the recorder is post-mortem dumps, flight recorder
style. Tracing is only a secondary benefit. Not sure if it's worth making a
distinction between events you want to record and those you want to trace.
(Example: You might want to record all command line options, but almost
never trace them)

> At most it should require an
>> extra annotation in the trace-events file to take the extra parameter
>> for grouping, and other trace backends can ignore that.
>
> It's true, it may be possible to put this functionality in the
> trace-events.

Let me think more about integrating these features with other trace
backends. See below for short-term impact.

> Christophe: how does this differ from regular trace events and what
> extra information is needed?

- Grouping, as indicated above, mostly useful in practice to make selection
  of tracing topics easy (e.g. "modules") but also for real-time graphing,
  because typically a state change occurs in different functions, which is
  why I used locking as an example.

- Self-documentation. Right now, the recorder back-end generates rather
  unhelpful help messages.

- Trace buffer size. This is important to make post-mortem dumps usable if
  you record infrequent events alongside much higher-rate ones. For example,
  you may want a large buffer to record info about command-line option
  processing, the default 8 is definitely too small.

- Support for %+s, which tells that a string is safe to print later (e.g. it
  is a compile-time constant, or never ever freed).

- Support for custom formats, e.g. I use %v in the XL compiler for LLVM
  value pointers. This is a bit more advanced, but it would be neat to be
  able to print out QOM objects using %q :-)

For the short term, what about providing trace-named wrappers around these
additional recorder features?

--
Cheers,
Christophe de Dinechin (IRC c3d)
Daniel P. Berrangé July 3, 2020, 1:08 p.m. UTC | #7
On Fri, Jul 03, 2020 at 12:12:14PM +0200, Christophe de Dinechin wrote:
> 
> On 2020-07-02 at 15:47 CEST, Stefan Hajnoczi wrote...
> > On Wed, Jul 01, 2020 at 05:15:01PM +0100, Daniel P. Berrangé wrote:
> > At most it should require an
> >> extra annotation in the trace-events file to take the extra parameter
> >> for grouping, and other trace backends can ignore that.
> >
> > It's true, it may be possible to put this functionality in the
> > trace-events.
> 
> Let me think more about integrating these features with other trace
> backends. See below for short-term impact.
> 
> > Christophe: how does this differ from regular trace events and what
> > extra information is needed?
> 
> - Grouping, as indicated above, mostly useful in practice to make selection
>   of tracing topics easy (e.g. "modules") but also for real-time graphing,
>   because typically a state change occurs in different functions, which is
>   why I used locking as an example.

In many cases we already have a form of grouping via the backdoor by
using a common string prefix on trace point names. eg

qio_task_new(void *task, void *source, void *func, void *opaque) "Task new task=%p source=%p func=%p opaque=%p"
qio_task_complete(void *task) "Task complete task=%p"
qio_task_thread_start(void *task, void *worker, void *opaque) "Task thread start task=%p worker=%p opaque=%p"
qio_task_thread_run(void *task) "Task thread run task=%p"
qio_task_thread_exit(void *task) "Task thread exit task=%p"
qio_task_thread_result(void *task) "Task thread result task=%p"
qio_task_thread_source_attach(void *task, void *source) "Task thread source attach task=%p source=%p"
qio_task_thread_source_cancel(void *task, void *source) "Task thread source cancel task=%p source=%p"

We could turn this into an explicit multi-level grouping concept,
using "." separator for each level of the group. eg

qio.task.new(void *task, void *source, void *func, void *opaque) "Task new task=%p source=%p func=%p opaque=%p"
qio.task.complete(void *task) "Task complete task=%p"
qio.task.thread_start(void *task, void *worker, void *opaque) "Task thread start task=%p worker=%p opaque=%p"
qio.task.thread_run(void *task) "Task thread run task=%p"
qio.task.thread_exit(void *task) "Task thread exit task=%p"
qio.task.thread_result(void *task) "Task thread result task=%p"
qio.task.thread_source_attach(void *task, void *source) "Task thread source attach task=%p source=%p"
qio.task.thread_source_cancel(void *task, void *source) "Task thread source cancel task=%p source=%p"


For backends that dont support groups, the "." can be turned back
into a "_", and everything is unchanged. For backends that do support
groups, we now have the info required.

This would be useful for the systemtap backend because that has an
explicit grouping concept already. ie we have probes definitions
like:

probe qemu.system.x86_64.qio_task_new = process("/usr/bin/qemu-system-x86_64").mark("qio_task_new")
{
  task = $arg1;
  source = $arg2;
  func = $arg3;
  opaque = $arg4;
}


which we can now turn into

probe qemu.system.x86_64.qio.task.new = process("/usr/bin/qemu-system-x86_64").mark("qio_task_new")
{
  task = $arg1;
  source = $arg2;
  func = $arg3;
  opaque = $arg4;
}


the generated C trace macro would still use "_" not "." of course
so  qio_task_new(src, func, opaque)

> - Self-documentation. Right now, the recorder back-end generates rather
>   unhelpful help messages.

Not sure I understand the problem here.

> - Trace buffer size. This is important to make post-mortem dumps usable if
>   you record infrequent events alongside much higher-rate ones. For example,
>   you may want a large buffer to record info about command-line option
>   processing, the default 8 is definitely too small.

Sure, that's a problem for all the trace backends. They all need to be
good at filtering / controlling what probes are fired to avoid signal
getting lost in noise.  The grouping feature would certainly help with
that making it easier to control what it turned on/off.

> - Support for %+s, which tells that a string is safe to print later (e.g. it
>   is a compile-time constant, or never ever freed).

Seems we should be able to support that in the trace-events format string.
Any other backend can turn "%+s" back into a normal "%s".

> - Support for custom formats, e.g. I use %v in the XL compiler for LLVM
>   value pointers. This is a bit more advanced, but it would be neat to be
>   able to print out QOM objects using %q :-)

Seems like a similar trick can be used here. The %v/%q could be turned
back into a plain %p for backends which don't have the pretty-printing
support.

Regards,
Daniel
Christophe de Dinechin July 3, 2020, 4:45 p.m. UTC | #8
On 2020-07-03 at 15:08 CEST, Daniel P. Berrangé wrote...
> On Fri, Jul 03, 2020 at 12:12:14PM +0200, Christophe de Dinechin wrote:
>>
>> On 2020-07-02 at 15:47 CEST, Stefan Hajnoczi wrote...
>> > On Wed, Jul 01, 2020 at 05:15:01PM +0100, Daniel P. Berrangé wrote:
>> > At most it should require an
>> >> extra annotation in the trace-events file to take the extra parameter
>> >> for grouping, and other trace backends can ignore that.
>> >
>> > It's true, it may be possible to put this functionality in the
>> > trace-events.
>>
>> Let me think more about integrating these features with other trace
>> backends. See below for short-term impact.
>>
>> > Christophe: how does this differ from regular trace events and what
>> > extra information is needed?
>>
>> - Grouping, as indicated above, mostly useful in practice to make selection
>>   of tracing topics easy (e.g. "modules") but also for real-time graphing,
>>   because typically a state change occurs in different functions, which is
>>   why I used locking as an example.
>
> In many cases we already have a form of grouping via the backdoor by
> using a common string prefix on trace point names. eg
>
> qio_task_new(void *task, void *source, void *func, void *opaque) "Task new task=%p source=%p func=%p opaque=%p"
> qio_task_complete(void *task) "Task complete task=%p"
> qio_task_thread_start(void *task, void *worker, void *opaque) "Task thread start task=%p worker=%p opaque=%p"
> qio_task_thread_run(void *task) "Task thread run task=%p"
> qio_task_thread_exit(void *task) "Task thread exit task=%p"
> qio_task_thread_result(void *task) "Task thread result task=%p"
> qio_task_thread_source_attach(void *task, void *source) "Task thread source attach task=%p source=%p"
> qio_task_thread_source_cancel(void *task, void *source) "Task thread source cancel task=%p source=%p"
>
> We could turn this into an explicit multi-level grouping concept,
> using "." separator for each level of the group. eg

Interesting idea.

>
> qio.task.new(void *task, void *source, void *func, void *opaque) "Task new task=%p source=%p func=%p opaque=%p"
> qio.task.complete(void *task) "Task complete task=%p"
> qio.task.thread_start(void *task, void *worker, void *opaque) "Task thread start task=%p worker=%p opaque=%p"
> qio.task.thread_run(void *task) "Task thread run task=%p"
> qio.task.thread_exit(void *task) "Task thread exit task=%p"
> qio.task.thread_result(void *task) "Task thread result task=%p"
> qio.task.thread_source_attach(void *task, void *source) "Task thread source attach task=%p source=%p"
> qio.task.thread_source_cancel(void *task, void *source) "Task thread source cancel task=%p source=%p"
>
>
> For backends that dont support groups, the "." can be turned back
> into a "_", and everything is unchanged. For backends that do support
> groups, we now have the info required.
>
> This would be useful for the systemtap backend because that has an
> explicit grouping concept already. ie we have probes definitions
> like:
>
> probe qemu.system.x86_64.qio_task_new = process("/usr/bin/qemu-system-x86_64").mark("qio_task_new")
> {
>   task = $arg1;
>   source = $arg2;
>   func = $arg3;
>   opaque = $arg4;
> }
>
>
> which we can now turn into
>
> probe qemu.system.x86_64.qio.task.new = process("/usr/bin/qemu-system-x86_64").mark("qio_task_new")
> {
>   task = $arg1;
>   source = $arg2;
>   func = $arg3;
>   opaque = $arg4;
> }
>
>
> the generated C trace macro would still use "_" not "." of course
> so  qio_task_new(src, func, opaque)
>
>> - Self-documentation. Right now, the recorder back-end generates rather
>>   unhelpful help messages.
>
> Not sure I understand the problem here.

RECORDER_DEFINE specifies a text that is listed when you pass "help" as a
trace name. There is no equivalent concept that I'm aware of in the existing
qemu trace infrastructure.

>
>> - Trace buffer size. This is important to make post-mortem dumps usable if
>>   you record infrequent events alongside much higher-rate ones. For example,
>>   you may want a large buffer to record info about command-line option
>>   processing, the default 8 is definitely too small.
>
> Sure, that's a problem for all the trace backends. They all need to be
> good at filtering / controlling what probes are fired to avoid signal
> getting lost in noise.  The grouping feature would certainly help with
> that making it easier to control what it turned on/off.

Well, for the recorder it's a slightly different problem. When you do a
recorder post-mortem dump, entries are sorted and printed at that time. THis
makes it possible to have a number of recent traces for high-frequency
events, as well as some much older traces for low-frequency events.

For example, you can have a buffer that records command-line options, and
another that records events that occur typically once per second, and yet
another that records high-frequency events. Then, when your program crashes
and you look at a dump, you see a context that includes both recent
high-frequency stuff as well as much older low-frequency events.

See https://gitlab.com/c3d/recorder for examples.


>
>> - Support for %+s, which tells that a string is safe to print later (e.g. it
>>   is a compile-time constant, or never ever freed).
>
> Seems we should be able to support that in the trace-events format string.
> Any other backend can turn "%+s" back into a normal "%s".

Yes, but it's much more complicated than this simple patch series ;-) As is,
the goal of this series is to not touch any other back-end.

>
>> - Support for custom formats, e.g. I use %v in the XL compiler for LLVM
>>   value pointers. This is a bit more advanced, but it would be neat to be
>>   able to print out QOM objects using %q :-)
>
> Seems like a similar trick can be used here. The %v/%q could be turned
> back into a plain %p for backends which don't have the pretty-printing
> support.

Sure, but this is also touching the rest of the tracing infrastructure,
which was a definite non-goal for this specific series.

>
> Regards,
> Daniel


--
Cheers,
Christophe de Dinechin (IRC c3d)
diff mbox series

Patch

diff --git a/util/qemu-thread-common.h b/util/qemu-thread-common.h
index 2af6b12085..0de07a471f 100644
--- a/util/qemu-thread-common.h
+++ b/util/qemu-thread-common.h
@@ -15,6 +15,9 @@ 
 
 #include "qemu/thread.h"
 #include "trace.h"
+#include "trace/recorder.h"
+
+RECORDER_DEFINE(lock, 16, "Lock state");
 
 static inline void qemu_mutex_post_init(QemuMutex *mutex)
 {
@@ -23,12 +26,14 @@  static inline void qemu_mutex_post_init(QemuMutex *mutex)
     mutex->line = 0;
 #endif
     mutex->initialized = true;
+    record(lock, "Init state %d for %p", -1, mutex);
 }
 
 static inline void qemu_mutex_pre_lock(QemuMutex *mutex,
                                        const char *file, int line)
 {
     trace_qemu_mutex_lock(mutex, file, line);
+    record(lock, "Locking state %d for %p", 1, mutex);
 }
 
 static inline void qemu_mutex_post_lock(QemuMutex *mutex,
@@ -39,6 +44,7 @@  static inline void qemu_mutex_post_lock(QemuMutex *mutex,
     mutex->line = line;
 #endif
     trace_qemu_mutex_locked(mutex, file, line);
+    record(lock, "Locked state %d for %p", 2, mutex);
 }
 
 static inline void qemu_mutex_pre_unlock(QemuMutex *mutex,
@@ -49,6 +55,7 @@  static inline void qemu_mutex_pre_unlock(QemuMutex *mutex,
     mutex->line = 0;
 #endif
     trace_qemu_mutex_unlock(mutex, file, line);
+    record(lock, "Unkocked state %d for %p", 0, mutex);
 }
 
 #endif