[v3,net-next,4/4] net: switchdev: Add tracepoints

Message ID	20240223114453.335809-5-tobias@waldekranz.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com [209.85.167.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 57FC27B3F9 for <netdev@vger.kernel.org>; Fri, 23 Feb 2024 11:46:54 +0000 (UTC) From: Tobias Waldekranz <tobias@waldekranz.com> To: davem@davemloft.net, kuba@kernel.org Cc: roopa@nvidia.com, razor@blackwall.org, bridge@lists.linux.dev, netdev@vger.kernel.org, jiri@resnulli.us, ivecera@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH v3 net-next 4/4] net: switchdev: Add tracepoints Date: Fri, 23 Feb 2024 12:44:53 +0100 Message-Id: <20240223114453.335809-5-tobias@waldekranz.com> In-Reply-To: <20240223114453.335809-1-tobias@waldekranz.com> References: <20240223114453.335809-1-tobias@waldekranz.com> Precedence: bulk MIME-Version: 1.0 Organization: Addiva Elektronik Content-Transfer-Encoding: 8bit
Series	net: switchdev: Tracepoints \| expand [v3,net-next,0/4] net: switchdev: Tracepoints [v3,net-next,1/4] net: switchdev: Wrap enums in mapper macros [v3,net-next,2/4] net: switchdev: Add helpers to display switchdev objects as strings [v3,net-next,3/4] net: switchdev: Relay all replay messages through a central function [v3,net-next,4/4] net: switchdev: Add tracepoints

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 942 this patch: 942
netdev/build_tools	success	Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers	warning	3 maintainers not CCed: pabeni@redhat.com mathieu.desnoyers@efficios.com edumazet@google.com
netdev/build_clang	success	Errors and warnings before: 958 this patch: 958
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 959 this patch: 959
netdev/checkpatch	warning	CHECK: Alignment should match open parenthesis CHECK: Lines should not end with a '(' WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-02-23--18-00 (tests: 1457)

Tobias Waldekranz Feb. 23, 2024, 11:44 a.m. UTC

Add a basic set of tracepoints:

- switchdev_defer: Fires whenever an operation is enqueued to the
  switchdev workqueue for deferred delivery.

- switchdev_call_{atomic,blocking}: Fires whenever a notification is
  sent to the corresponding switchdev notifier chain.

- switchdev_call_replay: Fires whenever a notification is sent to a
  specific driver's notifier block, in response to a replay request.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
 net/switchdev/switchdev.c        | 71 +++++++++++++++++++++++++-----
 2 files changed, 135 insertions(+), 10 deletions(-)
 create mode 100644 include/trace/events/switchdev.h

Steven Rostedt Feb. 23, 2024, 3:38 p.m. UTC | #1

On Fri, 23 Feb 2024 12:44:53 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> Add a basic set of tracepoints:
> 
> - switchdev_defer: Fires whenever an operation is enqueued to the
>   switchdev workqueue for deferred delivery.
> 
> - switchdev_call_{atomic,blocking}: Fires whenever a notification is
>   sent to the corresponding switchdev notifier chain.
> 
> - switchdev_call_replay: Fires whenever a notification is sent to a
>   specific driver's notifier block, in response to a replay request.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
>  net/switchdev/switchdev.c        | 71 +++++++++++++++++++++++++-----
>  2 files changed, 135 insertions(+), 10 deletions(-)
>  create mode 100644 include/trace/events/switchdev.h
> 
> diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h
> new file mode 100644
> index 000000000000..dcaf6870d017
> --- /dev/null
> +++ b/include/trace/events/switchdev.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM	switchdev
> +
> +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_SWITCHDEV_H
> +
> +#include <linux/tracepoint.h>
> +#include <net/switchdev.h>
> +
> +#define SWITCHDEV_TRACE_MSG_MAX 128

128 bytes is awfully big to waste on the ring buffer. What's the average
size of a string?

> +
> +DECLARE_EVENT_CLASS(switchdev_call,
> +	TP_PROTO(unsigned long val,
> +		 const struct switchdev_notifier_info *info,
> +		 int err),
> +
> +	TP_ARGS(val, info, err),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, val)
> +		__string(dev, info->dev ? netdev_name(info->dev) : "(null)")
> +		__field(const struct switchdev_notifier_info *, info)
> +		__field(int, err)
> +		__array(char, msg, SWITCHDEV_TRACE_MSG_MAX)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->val = val;
> +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
> +		__entry->info = info;
> +		__entry->err = err;
> +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);

Is it possible to just store the information in the trace event and then
call the above function in the read stage? There's helpers to pass strings
around (namely a struct trace_seq *p).

It would require a plugin for libtraceevent if you want to expose it to
user space tools for trace-cmd and perf though.

Another possibility is if this event will not race with other events on he
same CPU, you could create a per-cpu buffer, write into that, and then use
__string() and __assign_str() to save it, as traces happen with preemption
disabled.

-- Steve

> +	),
> +
> +	TP_printk("dev %s %s -> %d", __get_str(dev), __entry->msg, __entry->err)
> +);
> +
> +DEFINE_EVENT(switchdev_call, switchdev_defer,
> +	TP_PROTO(unsigned long val,
> +		 const struct switchdev_notifier_info *info,
> +		 int err),
> +
> +	TP_ARGS(val, info, err)
> +);
> +
> +DEFINE_EVENT(switchdev_call, switchdev_call_atomic,
> +	TP_PROTO(unsigned long val,
> +		 const struct switchdev_notifier_info *info,
> +		 int err),
> +
> +	TP_ARGS(val, info, err)
> +);
> +
> +DEFINE_EVENT(switchdev_call, switchdev_call_blocking,
> +	TP_PROTO(unsigned long val,
> +		 const struct switchdev_notifier_info *info,
> +		 int err),
> +
> +	TP_ARGS(val, info, err)
> +);
> +
> +DEFINE_EVENT(switchdev_call, switchdev_call_replay,
> +	TP_PROTO(unsigned long val,
> +		 const struct switchdev_notifier_info *info,
> +		 int err),
> +
> +	TP_ARGS(val, info, err)
> +);
> +
> +#endif /* _TRACE_SWITCHDEV_H */
> +

Tobias Waldekranz Feb. 26, 2024, 1:05 p.m. UTC | #2

On fre, feb 23, 2024 at 10:38, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 23 Feb 2024 12:44:53 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
>
>> Add a basic set of tracepoints:
>> 
>> - switchdev_defer: Fires whenever an operation is enqueued to the
>>   switchdev workqueue for deferred delivery.
>> 
>> - switchdev_call_{atomic,blocking}: Fires whenever a notification is
>>   sent to the corresponding switchdev notifier chain.
>> 
>> - switchdev_call_replay: Fires whenever a notification is sent to a
>>   specific driver's notifier block, in response to a replay request.
>> 
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> ---
>>  include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
>>  net/switchdev/switchdev.c        | 71 +++++++++++++++++++++++++-----
>>  2 files changed, 135 insertions(+), 10 deletions(-)
>>  create mode 100644 include/trace/events/switchdev.h
>> 
>> diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h
>> new file mode 100644
>> index 000000000000..dcaf6870d017
>> --- /dev/null
>> +++ b/include/trace/events/switchdev.h
>> @@ -0,0 +1,74 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM	switchdev
>> +
>> +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_SWITCHDEV_H
>> +
>> +#include <linux/tracepoint.h>
>> +#include <net/switchdev.h>
>> +
>> +#define SWITCHDEV_TRACE_MSG_MAX 128
>
> 128 bytes is awfully big to waste on the ring buffer. What's the average
> size of a string?

I would say the typical message is around 60-80 bytes. Some common examples:

    PORT_OBJ_ADD PORT_VLAN(flags 0x0 orig br0) vid 1 flags 0x27
    PORT_OBJ_DEL HOST_MDB(flags 0x0 orig br0) vid 100 addr 33:33:ff:ff:00:09

The worst case I can think of currently is 95 characters:

    VXLAN_FDB_ADD_TO_DEVICE vid 1000 addr de:ad:be:ef:00:01 added_by_user is_local locked offloaded

>> +
>> +DECLARE_EVENT_CLASS(switchdev_call,
>> +	TP_PROTO(unsigned long val,
>> +		 const struct switchdev_notifier_info *info,
>> +		 int err),
>> +
>> +	TP_ARGS(val, info, err),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(unsigned long, val)
>> +		__string(dev, info->dev ? netdev_name(info->dev) : "(null)")
>> +		__field(const struct switchdev_notifier_info *, info)
>> +		__field(int, err)
>> +		__array(char, msg, SWITCHDEV_TRACE_MSG_MAX)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->val = val;
>> +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
>> +		__entry->info = info;
>> +		__entry->err = err;
>> +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);
>
> Is it possible to just store the information in the trace event and then
> call the above function in the read stage? There's helpers to pass strings
> around (namely a struct trace_seq *p).

I'm a complete novice when it comes to tracepoint internals. Am I right
in assuming that TP_printk may execute at a much later time than
TP_fast_assign? E.g., the object referenced by __entry->info may very
well have been freed by that time? That at least seems to be what my
naive refactor to replace __entry->msg with __get_buf() suggests :)

If so, the layout of the switchdev_notifier_* structs makes it a bit
cumbersome to clone the notification in the assign phase, as the size of
a specific notification (e.g., switchdev_notifier_port_obj_info) is not
known by the common notification (switchdev_notifier_info). In the case
of switchdev objects, the problem repeats a second time. E.g., the size
of switchdev_obj_port_vlan is not known by switchdev_obj.

> It would require a plugin for libtraceevent if you want to expose it to
> user space tools for trace-cmd and perf though.
>
> Another possibility is if this event will not race with other events on he
> same CPU, you could create a per-cpu buffer, write into that, and then use
> __string() and __assign_str() to save it, as traces happen with preemption
> disabled.

But bottom halves are still enabled I suppose? Notifications can be
generated both from process context (e.g., users configuring the bridge
with iproute2), and from bridge packet processing (e.g., adding new
neighbors to the FDB). So I don't think that would work in this case.

> -- Steve

Thanks for taking the time!

>> +	),
>> +
>> +	TP_printk("dev %s %s -> %d", __get_str(dev), __entry->msg, __entry->err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_defer,
>> +	TP_PROTO(unsigned long val,
>> +		 const struct switchdev_notifier_info *info,
>> +		 int err),
>> +
>> +	TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_atomic,
>> +	TP_PROTO(unsigned long val,
>> +		 const struct switchdev_notifier_info *info,
>> +		 int err),
>> +
>> +	TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_blocking,
>> +	TP_PROTO(unsigned long val,
>> +		 const struct switchdev_notifier_info *info,
>> +		 int err),
>> +
>> +	TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_replay,
>> +	TP_PROTO(unsigned long val,
>> +		 const struct switchdev_notifier_info *info,
>> +		 int err),
>> +
>> +	TP_ARGS(val, info, err)
>> +);
>> +
>> +#endif /* _TRACE_SWITCHDEV_H */
>> +

Paolo Abeni Feb. 27, 2024, 10:04 a.m. UTC | #3

On Fri, 2024-02-23 at 10:38 -0500, Steven Rostedt wrote:
> On Fri, 23 Feb 2024 12:44:53 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
> 
> > Add a basic set of tracepoints:
> > 
> > - switchdev_defer: Fires whenever an operation is enqueued to the
> >   switchdev workqueue for deferred delivery.
> > 
> > - switchdev_call_{atomic,blocking}: Fires whenever a notification is
> >   sent to the corresponding switchdev notifier chain.
> > 
> > - switchdev_call_replay: Fires whenever a notification is sent to a
> >   specific driver's notifier block, in response to a replay request.
> > 
> > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> > ---
> >  include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
> >  net/switchdev/switchdev.c        | 71 +++++++++++++++++++++++++-----
> >  2 files changed, 135 insertions(+), 10 deletions(-)
> >  create mode 100644 include/trace/events/switchdev.h
> > 
> > diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h
> > new file mode 100644
> > index 000000000000..dcaf6870d017
> > --- /dev/null
> > +++ b/include/trace/events/switchdev.h
> > @@ -0,0 +1,74 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM	switchdev
> > +
> > +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_SWITCHDEV_H
> > +
> > +#include <linux/tracepoint.h>
> > +#include <net/switchdev.h>
> > +
> > +#define SWITCHDEV_TRACE_MSG_MAX 128
> 
> 128 bytes is awfully big to waste on the ring buffer. What's the average
> size of a string?
> 
> > +
> > +DECLARE_EVENT_CLASS(switchdev_call,
> > +	TP_PROTO(unsigned long val,
> > +		 const struct switchdev_notifier_info *info,
> > +		 int err),
> > +
> > +	TP_ARGS(val, info, err),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(unsigned long, val)
> > +		__string(dev, info->dev ? netdev_name(info->dev) : "(null)")
> > +		__field(const struct switchdev_notifier_info *, info)
> > +		__field(int, err)
> > +		__array(char, msg, SWITCHDEV_TRACE_MSG_MAX)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->val = val;
> > +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
> > +		__entry->info = info;
> > +		__entry->err = err;
> > +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);
> 
> Is it possible to just store the information in the trace event and then
> call the above function in the read stage?

I agree with Steven: it looks like that with the above code the
tracepoint itself will become measurably costily in terms of CPU
cycles: we want to avoid that.

Perhaps using different tracepoints with different notifier_block type
would help? so that each trace point could just copy a few specific
fields.

Cheers,

Paolo

Tobias Waldekranz Feb. 28, 2024, 10:47 a.m. UTC | #4

On tis, feb 27, 2024 at 11:04, Paolo Abeni <pabeni@redhat.com> wrote:
> On Fri, 2024-02-23 at 10:38 -0500, Steven Rostedt wrote:
>> On Fri, 23 Feb 2024 12:44:53 +0100
>> Tobias Waldekranz <tobias@waldekranz.com> wrote:
>> 
>> > Add a basic set of tracepoints:
>> > 
>> > - switchdev_defer: Fires whenever an operation is enqueued to the
>> >   switchdev workqueue for deferred delivery.
>> > 
>> > - switchdev_call_{atomic,blocking}: Fires whenever a notification is
>> >   sent to the corresponding switchdev notifier chain.
>> > 
>> > - switchdev_call_replay: Fires whenever a notification is sent to a
>> >   specific driver's notifier block, in response to a replay request.
>> > 
>> > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> > ---
>> >  include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
>> >  net/switchdev/switchdev.c        | 71 +++++++++++++++++++++++++-----
>> >  2 files changed, 135 insertions(+), 10 deletions(-)
>> >  create mode 100644 include/trace/events/switchdev.h
>> > 
>> > diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h
>> > new file mode 100644
>> > index 000000000000..dcaf6870d017
>> > --- /dev/null
>> > +++ b/include/trace/events/switchdev.h
>> > @@ -0,0 +1,74 @@
>> > +/* SPDX-License-Identifier: GPL-2.0 */
>> > +#undef TRACE_SYSTEM
>> > +#define TRACE_SYSTEM	switchdev
>> > +
>> > +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ)
>> > +#define _TRACE_SWITCHDEV_H
>> > +
>> > +#include <linux/tracepoint.h>
>> > +#include <net/switchdev.h>
>> > +
>> > +#define SWITCHDEV_TRACE_MSG_MAX 128
>> 
>> 128 bytes is awfully big to waste on the ring buffer. What's the average
>> size of a string?
>> 
>> > +
>> > +DECLARE_EVENT_CLASS(switchdev_call,
>> > +	TP_PROTO(unsigned long val,
>> > +		 const struct switchdev_notifier_info *info,
>> > +		 int err),
>> > +
>> > +	TP_ARGS(val, info, err),
>> > +
>> > +	TP_STRUCT__entry(
>> > +		__field(unsigned long, val)
>> > +		__string(dev, info->dev ? netdev_name(info->dev) : "(null)")
>> > +		__field(const struct switchdev_notifier_info *, info)
>> > +		__field(int, err)
>> > +		__array(char, msg, SWITCHDEV_TRACE_MSG_MAX)
>> > +	),
>> > +
>> > +	TP_fast_assign(
>> > +		__entry->val = val;
>> > +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
>> > +		__entry->info = info;
>> > +		__entry->err = err;
>> > +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);
>> 
>> Is it possible to just store the information in the trace event and then
>> call the above function in the read stage?
>
> I agree with Steven: it looks like that with the above code the
> tracepoint itself will become measurably costily in terms of CPU
> cycles: we want to avoid that.
>
> Perhaps using different tracepoints with different notifier_block type
> would help? so that each trace point could just copy a few specific
> fields.

This can be done, but you will end up having to duplicate the decoding
and formatting logic from switchdev-str.c, with the additional hurdle of
having to figure out the sizes of all referenced objects in order to
create flattened versions of every notification type.

What I like about the current approach is that when new notification and
object types are added, switchdev_notifier_str will automatically be
able to decode them and give you some rough idea of what is going on,
even if no new message specific decoding logic is added. It is also
reusable by drivers that might want to decode notifications or objects
in error messages.

Would some variant of (how I understand) Steven's suggestion to instead
store the formatted message in a dynamic array (__assign_str()), rather
than in the tracepoint entry, be acceptable?

Steven Rostedt Feb. 28, 2024, 2:56 p.m. UTC | #5

On Wed, 28 Feb 2024 11:47:24 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> >> > +	TP_fast_assign(
> >> > +		__entry->val = val;
> >> > +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
> >> > +		__entry->info = info;
> >> > +		__entry->err = err;
> >> > +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);  
> >> 
> >> Is it possible to just store the information in the trace event and then
> >> call the above function in the read stage?  
> >
> > I agree with Steven: it looks like that with the above code the
> > tracepoint itself will become measurably costily in terms of CPU
> > cycles: we want to avoid that.
> >
> > Perhaps using different tracepoints with different notifier_block type
> > would help? so that each trace point could just copy a few specific
> > fields.  
> 
> This can be done, but you will end up having to duplicate the decoding
> and formatting logic from switchdev-str.c, with the additional hurdle of
> having to figure out the sizes of all referenced objects in order to
> create flattened versions of every notification type.

Would it help if you could pass a trace_seq to it? The TP_printk() has a
"magical" trace_seq variable that trace events can use in the TP_printk()
called "p".

Look at:

  include/trace/events/libata.h:

const char *libata_trace_parse_status(struct trace_seq*, unsigned char);
#define __parse_status(s) libata_trace_parse_status(p, s)

Where we have:

const char *
libata_trace_parse_status(struct trace_seq *p, unsigned char status)
{
	const char *ret = trace_seq_buffer_ptr(p);

	trace_seq_printf(p, "{ ");
	if (status & ATA_BUSY)
		trace_seq_printf(p, "BUSY ");
	if (status & ATA_DRDY)
		trace_seq_printf(p, "DRDY ");
	if (status & ATA_DF)
		trace_seq_printf(p, "DF ");
	if (status & ATA_DSC)
		trace_seq_printf(p, "DSC ");
	if (status & ATA_DRQ)
		trace_seq_printf(p, "DRQ ");
	if (status & ATA_CORR)
		trace_seq_printf(p, "CORR ");
	if (status & ATA_SENSE)
		trace_seq_printf(p, "SENSE ");
	if (status & ATA_ERR)
		trace_seq_printf(p, "ERR ");
	trace_seq_putc(p, '}');
	trace_seq_putc(p, 0);

	return ret;
}

The "trace_seq p" is a pointer to trace_seq descriptor that can build
strings, and then you can use it to print a custom string in the trace
output.



> 
> What I like about the current approach is that when new notification and
> object types are added, switchdev_notifier_str will automatically be
> able to decode them and give you some rough idea of what is going on,
> even if no new message specific decoding logic is added. It is also
> reusable by drivers that might want to decode notifications or objects
> in error messages.
> 
> Would some variant of (how I understand) Steven's suggestion to instead
> store the formatted message in a dynamic array (__assign_str()), rather
> than in the tracepoint entry, be acceptable?

Matters if you could adapt using a trace_seq for the output. Or at least
use a seq_buf, as that's what is under the covers of trace_seq. If you
rather just use seq_buf, the above could pretty much be the same by passing
in: &p->seq.

-- Steve

Tobias Waldekranz March 4, 2024, 10:40 p.m. UTC | #6

On ons, feb 28, 2024 at 09:56, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 28 Feb 2024 11:47:24 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
>
>> >> > +	TP_fast_assign(
>> >> > +		__entry->val = val;
>> >> > +		__assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
>> >> > +		__entry->info = info;
>> >> > +		__entry->err = err;
>> >> > +		switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);  
>> >> 
>> >> Is it possible to just store the information in the trace event and then
>> >> call the above function in the read stage?  
>> >
>> > I agree with Steven: it looks like that with the above code the
>> > tracepoint itself will become measurably costily in terms of CPU
>> > cycles: we want to avoid that.
>> >
>> > Perhaps using different tracepoints with different notifier_block type
>> > would help? so that each trace point could just copy a few specific
>> > fields.  
>> 
>> This can be done, but you will end up having to duplicate the decoding
>> and formatting logic from switchdev-str.c, with the additional hurdle of
>> having to figure out the sizes of all referenced objects in order to
>> create flattened versions of every notification type.
>
> Would it help if you could pass a trace_seq to it? The TP_printk() has a
> "magical" trace_seq variable that trace events can use in the TP_printk()
> called "p".
>
> Look at:
>
>   include/trace/events/libata.h:
>
> const char *libata_trace_parse_status(struct trace_seq*, unsigned char);
> #define __parse_status(s) libata_trace_parse_status(p, s)
>
> Where we have:
>
> const char *
> libata_trace_parse_status(struct trace_seq *p, unsigned char status)
> {
> 	const char *ret = trace_seq_buffer_ptr(p);
>
> 	trace_seq_printf(p, "{ ");
> 	if (status & ATA_BUSY)
> 		trace_seq_printf(p, "BUSY ");
> 	if (status & ATA_DRDY)
> 		trace_seq_printf(p, "DRDY ");
> 	if (status & ATA_DF)
> 		trace_seq_printf(p, "DF ");
> 	if (status & ATA_DSC)
> 		trace_seq_printf(p, "DSC ");
> 	if (status & ATA_DRQ)
> 		trace_seq_printf(p, "DRQ ");
> 	if (status & ATA_CORR)
> 		trace_seq_printf(p, "CORR ");
> 	if (status & ATA_SENSE)
> 		trace_seq_printf(p, "SENSE ");
> 	if (status & ATA_ERR)
> 		trace_seq_printf(p, "ERR ");
> 	trace_seq_putc(p, '}');
> 	trace_seq_putc(p, 0);
>
> 	return ret;
> }
>
> The "trace_seq p" is a pointer to trace_seq descriptor that can build
> strings, and then you can use it to print a custom string in the trace
> output.

Yes I managed to decode the hidden variable :) I also found
trace_seq_acquire() (and its macro alter ego __get_buf()), which would
let me keep the generic stringer functions. So far, so good.

I think the foundational problem remains though: TP_printk() is not
executed until a user reads from the trace_pipe; at which point the
object referenced by __entry->info may already be dead and
buried. Right?

>> 
>> What I like about the current approach is that when new notification and
>> object types are added, switchdev_notifier_str will automatically be
>> able to decode them and give you some rough idea of what is going on,
>> even if no new message specific decoding logic is added. It is also
>> reusable by drivers that might want to decode notifications or objects
>> in error messages.
>> 
>> Would some variant of (how I understand) Steven's suggestion to instead
>> store the formatted message in a dynamic array (__assign_str()), rather
>> than in the tracepoint entry, be acceptable?
>
> Matters if you could adapt using a trace_seq for the output. Or at least
> use a seq_buf, as that's what is under the covers of trace_seq. If you
> rather just use seq_buf, the above could pretty much be the same by passing
> in: &p->seq.
>
> -- Steve

Steven Rostedt March 6, 2024, 3:15 p.m. UTC | #7

On Mon, 04 Mar 2024 23:40:49 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> On ons, feb 28, 2024 at 09:56, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Wed, 28 Feb 2024 11:47:24 +0100
> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >  
> > The "trace_seq p" is a pointer to trace_seq descriptor that can build
> > strings, and then you can use it to print a custom string in the trace
> > output.  
> 
> Yes I managed to decode the hidden variable :) I also found
> trace_seq_acquire() (and its macro alter ego __get_buf()), which would
> let me keep the generic stringer functions. So far, so good.
> 
> I think the foundational problem remains though: TP_printk() is not
> executed until a user reads from the trace_pipe; at which point the
> object referenced by __entry->info may already be dead and
> buried. Right?

Correct. You would need to load all the information into the event data
itself, at the time of the event is triggered, that is needed to determine
how to display it.

-- Steve

Tobias Waldekranz March 6, 2024, 8:02 p.m. UTC | #8

On ons, mar 06, 2024 at 10:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 04 Mar 2024 23:40:49 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
>
>> On ons, feb 28, 2024 at 09:56, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Wed, 28 Feb 2024 11:47:24 +0100
>> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
>> >  
>> > The "trace_seq p" is a pointer to trace_seq descriptor that can build
>> > strings, and then you can use it to print a custom string in the trace
>> > output.  
>> 
>> Yes I managed to decode the hidden variable :) I also found
>> trace_seq_acquire() (and its macro alter ego __get_buf()), which would
>> let me keep the generic stringer functions. So far, so good.
>> 
>> I think the foundational problem remains though: TP_printk() is not
>> executed until a user reads from the trace_pipe; at which point the
>> object referenced by __entry->info may already be dead and
>> buried. Right?
>
> Correct. You would need to load all the information into the event data
> itself, at the time of the event is triggered, that is needed to determine
> how to display it.

Given that that is quite gnarly to do for the events I'm trying to
trace, because of the complex object graph, would it be acceptable to
format the message in the assign phase and store it as dynamic data?
I.e., what (I think) you suggested at the end of your first response.

My thinking is:

- Managing a duplicate (flattened) object graph, exclusively for use by
  these tracepoints, increases the effort to keep the tracing in sync
  with new additions to switchdev; which I think will result in
  developers simply avoiding it altogether. In other words: I'd rather
  have somewhat inefficient but simple flashlight, rather than a very
  efficient one that no one knows how to change the batteries in.

- This is typically not a very hot path. Most events are triggered by
  user configuration. Otherwise when new neighbors are discovered.

- __entry->info is still there for use by raw tracepoint consumers from
  userspace.

Steven Rostedt March 6, 2024, 9:46 p.m. UTC | #9

On Wed, 06 Mar 2024 21:02:06 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> On ons, mar 06, 2024 at 10:15, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Mon, 04 Mar 2024 23:40:49 +0100
> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >  
> >> On ons, feb 28, 2024 at 09:56, Steven Rostedt <rostedt@goodmis.org> wrote:  
> >> > On Wed, 28 Feb 2024 11:47:24 +0100
> >> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >> >  
> >> > The "trace_seq p" is a pointer to trace_seq descriptor that can build
> >> > strings, and then you can use it to print a custom string in the trace
> >> > output.    
> >> 
> >> Yes I managed to decode the hidden variable :) I also found
> >> trace_seq_acquire() (and its macro alter ego __get_buf()), which would
> >> let me keep the generic stringer functions. So far, so good.
> >> 
> >> I think the foundational problem remains though: TP_printk() is not
> >> executed until a user reads from the trace_pipe; at which point the
> >> object referenced by __entry->info may already be dead and
> >> buried. Right?  
> >
> > Correct. You would need to load all the information into the event data
> > itself, at the time of the event is triggered, that is needed to determine
> > how to display it.  
> 
> Given that that is quite gnarly to do for the events I'm trying to
> trace, because of the complex object graph, would it be acceptable to
> format the message in the assign phase and store it as dynamic data?
> I.e., what (I think) you suggested at the end of your first response.

It's really up to what you want to do ;-)

> 
> My thinking is:
> 
> - Managing a duplicate (flattened) object graph, exclusively for use by
>   these tracepoints, increases the effort to keep the tracing in sync
>   with new additions to switchdev; which I think will result in
>   developers simply avoiding it altogether. In other words: I'd rather
>   have somewhat inefficient but simple flashlight, rather than a very
>   efficient one that no one knows how to change the batteries in.
> 
> - This is typically not a very hot path. Most events are triggered by
>   user configuration. Otherwise when new neighbors are discovered.
> 
> - __entry->info is still there for use by raw tracepoint consumers from
>   userspace.

How big is this info?

-- Steve

Tobias Waldekranz March 6, 2024, 10:45 p.m. UTC | #10

On ons, mar 06, 2024 at 16:46, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 06 Mar 2024 21:02:06 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
>
>> On ons, mar 06, 2024 at 10:15, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Mon, 04 Mar 2024 23:40:49 +0100
>> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
>> >  
>> >> On ons, feb 28, 2024 at 09:56, Steven Rostedt <rostedt@goodmis.org> wrote:  
>> >> > On Wed, 28 Feb 2024 11:47:24 +0100
>> >> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
>> >> >  
>> >> > The "trace_seq p" is a pointer to trace_seq descriptor that can build
>> >> > strings, and then you can use it to print a custom string in the trace
>> >> > output.    
>> >> 
>> >> Yes I managed to decode the hidden variable :) I also found
>> >> trace_seq_acquire() (and its macro alter ego __get_buf()), which would
>> >> let me keep the generic stringer functions. So far, so good.
>> >> 
>> >> I think the foundational problem remains though: TP_printk() is not
>> >> executed until a user reads from the trace_pipe; at which point the
>> >> object referenced by __entry->info may already be dead and
>> >> buried. Right?  
>> >
>> > Correct. You would need to load all the information into the event data
>> > itself, at the time of the event is triggered, that is needed to determine
>> > how to display it.  
>> 
>> Given that that is quite gnarly to do for the events I'm trying to
>> trace, because of the complex object graph, would it be acceptable to
>> format the message in the assign phase and store it as dynamic data?
>> I.e., what (I think) you suggested at the end of your first response.
>
> It's really up to what you want to do ;-)

Alright. I'll interpret that as "there's a >0% chance that I'll give you
an Acked-by on something like that" :)

>> 
>> My thinking is:
>> 
>> - Managing a duplicate (flattened) object graph, exclusively for use by
>>   these tracepoints, increases the effort to keep the tracing in sync
>>   with new additions to switchdev; which I think will result in
>>   developers simply avoiding it altogether. In other words: I'd rather
>>   have somewhat inefficient but simple flashlight, rather than a very
>>   efficient one that no one knows how to change the batteries in.
>> 
>> - This is typically not a very hot path. Most events are triggered by
>>   user configuration. Otherwise when new neighbors are discovered.
>> 
>> - __entry->info is still there for use by raw tracepoint consumers from
>>   userspace.
>
> How big is this info?

The common struct (switchdev_notifier_info) is 24B at the
moment. Depending on __entry->val, the size of the enclosing
notification (e.g. switchdev_notifier_port_obj_info) is between
40-64B. This pattern may then repeat again inside the concrete notifier,
where you have a pointer to a common object (e.g. switchdev_obj, 48B)
whose outer size (e.g. switchdev_obj_port_vlan, 56B) is determined by an
accompanying enum.

Steven Rostedt March 6, 2024, 11:33 p.m. UTC | #11

On Wed, 06 Mar 2024 23:45:57 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> > How big is this info?  
> 
> The common struct (switchdev_notifier_info) is 24B at the
> moment. Depending on __entry->val, the size of the enclosing
> notification (e.g. switchdev_notifier_port_obj_info) is between
> 40-64B. This pattern may then repeat again inside the concrete notifier,
> where you have a pointer to a common object (e.g. switchdev_obj, 48B)
> whose outer size (e.g. switchdev_obj_port_vlan, 56B) is determined by an
> accompanying enum.

As long as it's under 1K, you should be good.

-- Steve

[v3,net-next,4/4] net: switchdev: Add tracepoints

Checks

Commit Message

Comments

Patch