diff mbox series

[026/158] mm: emit tracepoint when RSS changes

Message ID 20191201015030.MR-ux4mV1%akpm@linux-foundation.org (mailing list archive)
State New, archived
Headers show
Series [001/158] scripts/spelling.txt: add more spellings to spelling.txt | expand

Commit Message

Andrew Morton Dec. 1, 2019, 1:50 a.m. UTC
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: mm: emit tracepoint when RSS changes

Useful to track how RSS is changing per TGID to detect spikes in RSS and
memory hogs.  Several Android teams have been using this patch in various
kernel trees for half a year now.  Many reported to me it is really useful
so I'm posting it upstream.

Initial patch developed by Tim Murray.  Changes I made from original
patch: o Prevent any additional space consumed by mm_struct.

Regarding the fact that the RSS may change too often thus flooding the
traces - note that, there is some "hysterisis" with this already.  That is
- We update the counter only if we receive 64 page faults due to
SPLIT_RSS_ACCOUNTING.  However, during zapping or copying of pte range,
the RSS is updated immediately which can become noisy/flooding.  In a
previous discussion, we agreed that BPF or ftrace can be used to rate
limit the signal if this becomes an issue.

Also note that I added wrappers to trace_rss_stat to prevent compiler
errors where linux/mm.h is included from tracing code, causing errors such
as:

  CC      kernel/trace/power-traces.o
In file included from ./include/trace/define_trace.h:102,
                 from ./include/trace/events/kmem.h:342,
                 from ./include/linux/mm.h:31,
                 from ./include/linux/ring_buffer.h:5,
                 from ./include/linux/trace_events.h:6,
                 from ./include/trace/events/power.h:12,
                 from kernel/trace/power-traces.c:15:
./include/trace/trace_events.h:113:22: error: field `ent' has incomplete type
   struct trace_entry ent;    \

Link: http://lore.kernel.org/r/20190903200905.198642-1-joel@joelfernandes.org
Link: http://lkml.kernel.org/r/20191001172817.234886-1-joel@joelfernandes.org
Co-developed-by: Tim Murray <timmurray@google.com>
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Carmen Jackson <carmenjackson@google.com>
Cc: Mayank Gupta <mayankgupta@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h          |   14 +++++++++++---
 include/trace/events/kmem.h |   21 +++++++++++++++++++++
 mm/memory.c                 |    6 ++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

Comments

Steven Rostedt Dec. 2, 2019, 5:14 p.m. UTC | #1
On Sat, 30 Nov 2019 17:50:30 -0800
akpm@linux-foundation.org wrote:

>  /* Optimized variant when page is already known not to be PageAnon */
> --- a/include/trace/events/kmem.h~mm-emit-tracepoint-when-rss-changes
> +++ a/include/trace/events/kmem.h
> @@ -316,6 +316,27 @@ TRACE_EVENT(mm_page_alloc_extfrag,
>  		__entry->change_ownership)
>  );
>  
> +TRACE_EVENT(rss_stat,
> +
> +	TP_PROTO(int member,
> +		long count),
> +
> +	TP_ARGS(member, count),
> +
> +	TP_STRUCT__entry(
> +		__field(int, member)
> +		__field(long, size)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->member = member;
> +		__entry->size = (count << PAGE_SHIFT);

It's best to put all calculations (including shifts) in the print part,
as that's the slow path. The TP_fast_assign() is done when the trace
point is triggered (during the execution of the code). It's best to
keep this in the slow path (TP_printk).

		__entry->count = count;

> +	),
> +
> +	TP_printk("member=%d size=%ldB",
> +		__entry->member,
> +		__entry->size)

		__entry->count << PAGE_SHIFT)

-- Steve


> +	);
>  #endif /* _TRACE_KMEM_H */
>  
>  /* This part must be outside protection */
> --- a/mm/memory.c~mm-emit-tracepoint-when-rss-changes
Joel Fernandes Dec. 2, 2019, 9:13 p.m. UTC | #2
On Mon, Dec 02, 2019 at 12:14:15PM -0500, Steven Rostedt wrote:
> On Sat, 30 Nov 2019 17:50:30 -0800
> akpm@linux-foundation.org wrote:
> 
> >  /* Optimized variant when page is already known not to be PageAnon */
> > --- a/include/trace/events/kmem.h~mm-emit-tracepoint-when-rss-changes
> > +++ a/include/trace/events/kmem.h
> > @@ -316,6 +316,27 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> >  		__entry->change_ownership)
> >  );
> >  
> > +TRACE_EVENT(rss_stat,
> > +
> > +	TP_PROTO(int member,
> > +		long count),
> > +
> > +	TP_ARGS(member, count),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(int, member)
> > +		__field(long, size)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->member = member;
> > +		__entry->size = (count << PAGE_SHIFT);
> 
> It's best to put all calculations (including shifts) in the print part,
> as that's the slow path. The TP_fast_assign() is done when the trace
> point is triggered (during the execution of the code). It's best to
> keep this in the slow path (TP_printk).
> 
> 		__entry->count = count;
> 
> > +	),
> > +
> > +	TP_printk("member=%d size=%ldB",
> > +		__entry->member,
> > +		__entry->size)
> 
> 		__entry->count << PAGE_SHIFT)

Ah. Android users now use a tool called perfetto which gather raw trace
(binary format). So such shifting will have to be done by userspace then if
we did it this way. And I'm afraid this patch has been in circulation for
quite some time now that may cause major pains in changing userspace tooling
now :-\

I would say lets leave it alone for this once! But that is a good idea.

thanks,

 - Joel
Steven Rostedt Dec. 2, 2019, 9:56 p.m. UTC | #3
On Mon, 2 Dec 2019 16:13:45 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Mon, Dec 02, 2019 at 12:14:15PM -0500, Steven Rostedt wrote:
> > On Sat, 30 Nov 2019 17:50:30 -0800
> > akpm@linux-foundation.org wrote:
> >   
> > >  /* Optimized variant when page is already known not to be PageAnon */
> > > --- a/include/trace/events/kmem.h~mm-emit-tracepoint-when-rss-changes
> > > +++ a/include/trace/events/kmem.h
> > > @@ -316,6 +316,27 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> > >  		__entry->change_ownership)
> > >  );
> > >  
> > > +TRACE_EVENT(rss_stat,
> > > +
> > > +	TP_PROTO(int member,
> > > +		long count),
> > > +
> > > +	TP_ARGS(member, count),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(int, member)
> > > +		__field(long, size)
> > > +	),
> > > +
> > > +	TP_fast_assign(
> > > +		__entry->member = member;
> > > +		__entry->size = (count << PAGE_SHIFT);  
> > 
> > It's best to put all calculations (including shifts) in the print part,
> > as that's the slow path. The TP_fast_assign() is done when the trace
> > point is triggered (during the execution of the code). It's best to
> > keep this in the slow path (TP_printk).
> > 
> > 		__entry->count = count;
> >   
> > > +	),
> > > +
> > > +	TP_printk("member=%d size=%ldB",
> > > +		__entry->member,
> > > +		__entry->size)  
> > 
> > 		__entry->count << PAGE_SHIFT)  
> 
> Ah. Android users now use a tool called perfetto which gather raw trace
> (binary format). So such shifting will have to be done by userspace then if
> we did it this way. And I'm afraid this patch has been in circulation for
> quite some time now that may cause major pains in changing userspace tooling
> now :-\
> 

I've been trying to get libtraceevent into a standard library, and
it's been ready to go. Can your tool please use that. It can handle this
without issue. Just feed it the format file (that's exported to
user space via the tracefs file system).

-- Steve
Joel Fernandes Dec. 2, 2019, 11:45 p.m. UTC | #4
On Mon, Dec 02, 2019 at 04:56:01PM -0500, Steven Rostedt wrote:
> On Mon, 2 Dec 2019 16:13:45 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Mon, Dec 02, 2019 at 12:14:15PM -0500, Steven Rostedt wrote:
> > > On Sat, 30 Nov 2019 17:50:30 -0800
> > > akpm@linux-foundation.org wrote:
> > >   
> > > >  /* Optimized variant when page is already known not to be PageAnon */
> > > > --- a/include/trace/events/kmem.h~mm-emit-tracepoint-when-rss-changes
> > > > +++ a/include/trace/events/kmem.h
> > > > @@ -316,6 +316,27 @@ TRACE_EVENT(mm_page_alloc_extfrag,
> > > >  		__entry->change_ownership)
> > > >  );
> > > >  
> > > > +TRACE_EVENT(rss_stat,
> > > > +
> > > > +	TP_PROTO(int member,
> > > > +		long count),
> > > > +
> > > > +	TP_ARGS(member, count),
> > > > +
> > > > +	TP_STRUCT__entry(
> > > > +		__field(int, member)
> > > > +		__field(long, size)
> > > > +	),
> > > > +
> > > > +	TP_fast_assign(
> > > > +		__entry->member = member;
> > > > +		__entry->size = (count << PAGE_SHIFT);  
> > > 
> > > It's best to put all calculations (including shifts) in the print part,
> > > as that's the slow path. The TP_fast_assign() is done when the trace
> > > point is triggered (during the execution of the code). It's best to
> > > keep this in the slow path (TP_printk).
> > > 
> > > 		__entry->count = count;
> > >   
> > > > +	),
> > > > +
> > > > +	TP_printk("member=%d size=%ldB",
> > > > +		__entry->member,
> > > > +		__entry->size)  
> > > 
> > > 		__entry->count << PAGE_SHIFT)  
> > 
> > Ah. Android users now use a tool called perfetto which gather raw trace
> > (binary format). So such shifting will have to be done by userspace then if
> > we did it this way. And I'm afraid this patch has been in circulation for
> > quite some time now that may cause major pains in changing userspace tooling
> > now :-\
> > 
> 
> I've been trying to get libtraceevent into a standard library, and
> it's been ready to go. Can your tool please use that. It can handle this
> without issue. Just feed it the format file (that's exported to
> user space via the tracefs file system).

I would love for that to happen but I don't develop Perfetto much. If I am
writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
lead developer Primiano -- I believe you have already met Primiano at a
conference before as he mentioned it to me that you guys met. I also believe
this topic of using a common library was discussed before, but something
about licensing came up.

thanks,

 - Joel
Steven Rostedt Dec. 2, 2019, 11:53 p.m. UTC | #5
On Mon, 2 Dec 2019 18:45:14 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> 
> I would love for that to happen but I don't develop Perfetto much. If I am
> writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> lead developer Primiano -- I believe you have already met Primiano at a
> conference before as he mentioned it to me that you guys met. I also believe
> this topic of using a common library was discussed before, but something
> about licensing came up.

libtraceevent is under LGPL, is that an issue?

-- Steve
Primiano Tucci Dec. 3, 2019, 2:48 a.m. UTC | #6
On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
On Mon, 2 Dec 2019 18:45:14 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

>
> I would love for that to happen but I don't develop Perfetto much. If I am
> writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> lead developer Primiano -- I believe you have already met Primiano at a
> conference before as he mentioned it to me that you guys met. I also believe
> this topic of using a common library was discussed before, but something
> about licensing came up.

Oh hello again!

> libtraceevent is under LGPL, is that an issue?

Unfortunately yes, it is :/
Our process for incorporating GPL or LGPL code makes Perfetto [1] (which is
Apache-2 licensed) problematic for us and recursively for other projects that
depend on us.

For context, Perfetto is a cross-platform tracing project based on shmem and
protobuf, shipped on production devices and used by other app-developer-facing
tools (e.g. [6, 7]). It deals with both:
(1) pure userspace-to-userspace tracing (on all major OSs).
(2) kernel tracing via ftrace/tracefs (only on Linux/Android).
https://docs.perfetto.dev/ explains it a bit more.

Today Perfetto is embedded and used both by Chrome [2] and Android platform [3].
For both projects, pulling LGPL-licensed code is cumbersome process-wise: It
would require us to put mechanism in place to guarantee that the relevant LGPL
dependencies don't get accidentally linked in any production binary but only
used for the standalone offline tools to analyze traces.
Such process is unfortunately very expensive to setup and maintain for us and
for the projects that depend on us.
I don't want to start an ideological battle about licensing. To be clear, I
don't have any issues with LGPL, nor I think there's anything inherently
wrong with it. Just, it makes things too complicated when a smaller sub-project
like ours is embedded in larger projects.

Anyhow, beyond licensing, the principle of grabbing the format files on-device
and bundling them as part of the trace is also problematic for us on Android for
technical reasons (mainly interoperability with other tools that depend on
Perfetto).

From my viewpoint, it would be great if Linux treated the individual fields of
ftrace events as an ABI, given that ftrace events are exposed in their binary
form to userspace through tracefs (which is something I'm extremely grateful
for).

We use ftrace_pipe_raw through Perfetto in Android since 2017 and it has been
working great with the exception of a few cases, mainly enums (see below).
I'd love if we could solve the enum problem in a way that didn't involve running
a C-preprocessor-alike runtime on the format files, regardless of licensing.

Unfortunately I don't have any docs that describe the Perfetto <> ftrace interop
in great details. I apologize for that. I'll fix it soon but in the meanwhile
I'll try to do my best to summarize this part of Perfetto here:

Our ftrace-interop code read()s the binary per_cpu/*/trace_pipe_raw, very
similarly in spirit to what trace_cmd does.
However, unlike trace_cmd, we convert the trace event stream into protobufs
(e.g., [4]), doing binary-to-binary conversions (ftrace raw pipe -> protobuf) at
runtime, asynchronously, on the device being traced.

The way we deal with format files in Perfetto is twofold:
1) We have an archive of known format files for the most common kernels we care
  about. From this archive we generate protobuf schema files like [4]
  at Perfetto-compile-time (which is != compile time of the target device's
  kernel).
  At runtime, on-device, we read the format files from tracefs and we merge
  the compile-time knowledge (about messages and field names) with the ABI
  described by tracefs' format files (message IDs, field types and offsets).
  We make the following assumption about the ABI of raw ftrace events:
  - We do *not* rely on message IDs being stable.
  - We do *not* rely on the field offset to be stable.
  - We do *not* rely on the size and length of int fields to be stable.
  - We do *not* assume the presence of any field.
  - We can detect if a field's type doesn't match anymore (e.g. a string became
    an int) and ignore the field in that case.
  - We only deal with fields whose name matches what known at compile-time.
  This allows us to turn the raw ftrace into a binary-stable protobuf (modulo
  some fields that might be missing) and allows us to play some other tricks
  to reduce the size of the trace (e.g. intern/dedupe thread names).

2) We have a generic schema [5] to transcode ftrace events that we didn't know
   at Perfetto-compile-time. This allows us to deal with both ftrace events
   introduced by future kernel versions or as a fallback for events in 1) where
   we detect ABI mismatches at runtime. The downside of this generic
   schema is that the cost of each event, in terms of trace-size, is
   significantly higher.

There is a point where both 1 and 2 become problematic for us, and this is enums
and, more in general, any ftrace field depends on macro expansions (which turns
out to be mainly enums, in practice).
For instance, the gfp_flags of ftrace events directly reflects the internal enum
values, which are not stable across kenrnel versions. We had to come up
with an internal map to catch up with the various kernel versions.
There are few other cases like gfp_flags but they are quite rare and we ended up
not needing those events, at least until now.

Beyond this, ingesting the raw trace events from ftrace raw pipes it has been
great for all other events without requiring any other parsing library
Super thanks for all the hard work on developing and maintaining ftrace.

Happy to discuss more on IRC, email or VC if you want to know more,
Primiano.

[1] https://docs.perfetto.dev
[2] https://cs.chromium.org/chromium/src/third_party/perfetto/?q=f:perfetto&sq=package:chromium&dr
[3] https://android.googlesource.com/platform/external/perfetto/
[4] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/sched.proto
[5] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/generic.proto
[6] https://github.com/google/gapid/
[7] https://developers.google.com/web/tools/chrome-devtools
Mathieu Desnoyers Dec. 3, 2019, 5:38 a.m. UTC | #7
----- On Dec 2, 2019, at 9:48 PM, Primiano Tucci primiano@google.com wrote:

> On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Dec 2019 18:45:14 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
>>
>> I would love for that to happen but I don't develop Perfetto much. If I am
>> writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
>> lead developer Primiano -- I believe you have already met Primiano at a
>> conference before as he mentioned it to me that you guys met. I also believe
>> this topic of using a common library was discussed before, but something
>> about licensing came up.
> 
> Oh hello again!
> 
>> libtraceevent is under LGPL, is that an issue?
> 
> Unfortunately yes, it is :/
> Our process for incorporating GPL or LGPL code makes Perfetto [1] (which is
> Apache-2 licensed) problematic for us and recursively for other projects that
> depend on us.
> 
> For context, Perfetto is a cross-platform tracing project based on shmem and
> protobuf, shipped on production devices and used by other app-developer-facing
> tools (e.g. [6, 7]). It deals with both:
> (1) pure userspace-to-userspace tracing (on all major OSs).
> (2) kernel tracing via ftrace/tracefs (only on Linux/Android).
> https://docs.perfetto.dev/ explains it a bit more.
> 
> Today Perfetto is embedded and used both by Chrome [2] and Android platform [3].
> For both projects, pulling LGPL-licensed code is cumbersome process-wise: It
> would require us to put mechanism in place to guarantee that the relevant LGPL
> dependencies don't get accidentally linked in any production binary but only
> used for the standalone offline tools to analyze traces.
> Such process is unfortunately very expensive to setup and maintain for us and
> for the projects that depend on us.
> I don't want to start an ideological battle about licensing. To be clear, I
> don't have any issues with LGPL, nor I think there's anything inherently
> wrong with it. Just, it makes things too complicated when a smaller sub-project
> like ours is embedded in larger projects.

Just to clarify: my understanding is that a constraint that requires no dynamic
linking of proprietary code on LGPL libraries does not seem to originate from
restrictions based on the wording of the LGPL text. So it appears to be self-imposed
either by your employer's (or your own) additional requirements, so not to bother
dealing with the legal side of compliance, am I correct ?

Thanks,

Mathieu
Primiano Tucci Dec. 3, 2019, 10:44 a.m. UTC | #8
On Tue, Dec 3, 2019 at 5:38 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Dec 2, 2019, at 9:48 PM, Primiano Tucci primiano@google.com wrote:
>
> > On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Mon, 2 Dec 2019 18:45:14 -0500
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> >>
> >> I would love for that to happen but I don't develop Perfetto much. If I am
> >> writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> >> lead developer Primiano -- I believe you have already met Primiano at a
> >> conference before as he mentioned it to me that you guys met. I also believe
> >> this topic of using a common library was discussed before, but something
> >> about licensing came up.
> >
> > Oh hello again!
> >
> >> libtraceevent is under LGPL, is that an issue?
> >
> > Unfortunately yes, it is :/
> > Our process for incorporating GPL or LGPL code makes Perfetto [1] (which is
> > Apache-2 licensed) problematic for us and recursively for other projects that
> > depend on us.
> >
> > For context, Perfetto is a cross-platform tracing project based on shmem and
> > protobuf, shipped on production devices and used by other app-developer-facing
> > tools (e.g. [6, 7]). It deals with both:
> > (1) pure userspace-to-userspace tracing (on all major OSs).
> > (2) kernel tracing via ftrace/tracefs (only on Linux/Android).
> > https://docs.perfetto.dev/ explains it a bit more.
> >
> > Today Perfetto is embedded and used both by Chrome [2] and Android platform [3].
> > For both projects, pulling LGPL-licensed code is cumbersome process-wise: It
> > would require us to put mechanism in place to guarantee that the relevant LGPL
> > dependencies don't get accidentally linked in any production binary but only
> > used for the standalone offline tools to analyze traces.
> > Such process is unfortunately very expensive to setup and maintain for us and
> > for the projects that depend on us.
> > I don't want to start an ideological battle about licensing. To be clear, I
> > don't have any issues with LGPL, nor I think there's anything inherently
> > wrong with it. Just, it makes things too complicated when a smaller sub-project
> > like ours is embedded in larger projects.
>
> Just to clarify: my understanding is that a constraint that requires no dynamic
> linking of proprietary code on LGPL libraries does not seem to originate from
> restrictions based on the wording of the LGPL text. So it appears to be self-imposed
> either by your employer's (or your own) additional requirements, so not to bother
> dealing with the legal side of compliance, am I correct ?

Essentially adding a LGPL dep for us is the moral equivalent of putting
a sticker on the project that says "This needs to be handled with care now".
Any new user of the project would need to go through the process of
putting build-time scripts/checks in place to ensure that the dynamic linking
requirements are not accidentally violated.
In large projects with hundreds of build targets (like the aforementioned
ones which embed Perfetto today), refactorings happens every single day.
The risk of somebody pulling unwanted deps by accident is non-zero.
Licensing is not something that one can risk to accidentally get wrong,
hence we are required to do our best to avoid those mistakes.

Furthermore, from a pure technical viewpoint, dynamic linking is a major pain
for many cross-platform projects because has different subtleties on each
platform. Most projects just ship big statically linked monoliths.
Having a LGPL dependency in Perfetto means telling them "if you want to use
this tracing project you need to change your build rules / packaging strategy
and start dealing with dynamic linking on four different platforms
(Linux, Android, Mac, Windows)". This would be a show-stopper for our
project.

Primiano


>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



--
Primiano Tucci
Software Engineer
Google UK Limited
Registered Office: Belgrave House, 76 Buckingham Palace Road, London SW1W 9TQ
Steven Rostedt Dec. 3, 2019, 2:53 p.m. UTC | #9
On Tue, 3 Dec 2019 02:48:37 +0000
Primiano Tucci <primiano@google.com> wrote:

> On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2 Dec 2019 18:45:14 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> >
> > I would love for that to happen but I don't develop Perfetto much. If I am
> > writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> > lead developer Primiano -- I believe you have already met Primiano at a
> > conference before as he mentioned it to me that you guys met. I also believe
> > this topic of using a common library was discussed before, but something
> > about licensing came up.  
> 
> Oh hello again!
> 
> > libtraceevent is under LGPL, is that an issue?  
> 
> Unfortunately yes, it is :/
> Our process for incorporating GPL or LGPL code makes Perfetto [1] (which is
> Apache-2 licensed) problematic for us and recursively for other projects that
> depend on us.

The reason for the LGPL was so that this code could be used by non GPL
code. Too bad the process is what limits this.

> 
> For context, Perfetto is a cross-platform tracing project based on shmem and
> protobuf, shipped on production devices and used by other app-developer-facing
> tools (e.g. [6, 7]). It deals with both:
> (1) pure userspace-to-userspace tracing (on all major OSs).
> (2) kernel tracing via ftrace/tracefs (only on Linux/Android).
> https://docs.perfetto.dev/ explains it a bit more.
> 

> 
> >From my viewpoint, it would be great if Linux treated the individual fields of  
> ftrace events as an ABI, given that ftrace events are exposed in their binary
> form to userspace through tracefs (which is something I'm extremely grateful
> for).

If this were to be the case, trace events would not even exist in the
kernel. The reason new trace events do not appear in the scheduler, and
the VFS layer refuses to add *any* trace events, is because they are
already a partial ABI. Making them an ABI would tightly couple the
implementation of Linux to the exported trace events. The reason I
created libtraceevent was to make it easier to modify trace events
without people worrying too much about the parsing of the events.

Just look at this thread. This started because I wanted to move the
shift operation from the fast path to the slow "print" path, and the
reason for not doing so is because we have a user space dependency on
it.

Really, the kernel developers are fearful about trace events causing
problems with future development. And just saying what you stated
re-enforces their beliefs, and expect to see less trace events being
exposed. This has come up at several kernel/maintainer summits, and
Linus finally has black listed the topic, stating that if it breaks
user space, it gets reverted, PERIOD!

  https://lwn.net/Articles/799262/

> 
> We use ftrace_pipe_raw through Perfetto in Android since 2017 and it has been
> working great with the exception of a few cases, mainly enums (see below).
> I'd love if we could solve the enum problem in a way that didn't involve running
> a C-preprocessor-alike runtime on the format files, regardless of licensing.
> 
> Unfortunately I don't have any docs that describe the Perfetto <> ftrace interop
> in great details. I apologize for that. I'll fix it soon but in the meanwhile
> I'll try to do my best to summarize this part of Perfetto here:
> 
> Our ftrace-interop code read()s the binary per_cpu/*/trace_pipe_raw, very
> similarly in spirit to what trace_cmd does.
> However, unlike trace_cmd, we convert the trace event stream into protobufs
> (e.g., [4]), doing binary-to-binary conversions (ftrace raw pipe -> protobuf) at
> runtime, asynchronously, on the device being traced.
> 
> The way we deal with format files in Perfetto is twofold:
> 1) We have an archive of known format files for the most common kernels we care
>   about. From this archive we generate protobuf schema files like [4]
>   at Perfetto-compile-time (which is != compile time of the target device's
>   kernel).
>   At runtime, on-device, we read the format files from tracefs and we merge
>   the compile-time knowledge (about messages and field names) with the ABI
>   described by tracefs' format files (message IDs, field types and offsets).
>   We make the following assumption about the ABI of raw ftrace events:
>   - We do *not* rely on message IDs being stable.
>   - We do *not* rely on the field offset to be stable.
>   - We do *not* rely on the size and length of int fields to be stable.
>   - We do *not* assume the presence of any field.
>   - We can detect if a field's type doesn't match anymore (e.g. a string became
>     an int) and ignore the field in that case.
>   - We only deal with fields whose name matches what known at compile-time.
>   This allows us to turn the raw ftrace into a binary-stable protobuf (modulo
>   some fields that might be missing) and allows us to play some other tricks
>   to reduce the size of the trace (e.g. intern/dedupe thread names).
> 
> 2) We have a generic schema [5] to transcode ftrace events that we didn't know
>    at Perfetto-compile-time. This allows us to deal with both ftrace events
>    introduced by future kernel versions or as a fallback for events in 1) where
>    we detect ABI mismatches at runtime. The downside of this generic
>    schema is that the cost of each event, in terms of trace-size, is
>    significantly higher.

Thanks for the detailed explanation.

Question, can the tool that reads the tracefs to feed the message ids,
fields, and sizes have LGPL in it? Then that tool itself could have
libtraceveent, and then you could create your own event format to feed
into the other compiled tool. This could solve a lot of issues.

> 
> There is a point where both 1 and 2 become problematic for us, and this is enums
> and, more in general, any ftrace field depends on macro expansions (which turns
> out to be mainly enums, in practice).
> For instance, the gfp_flags of ftrace events directly reflects the internal enum
> values, which are not stable across kenrnel versions. We had to come up
> with an internal map to catch up with the various kernel versions.
> There are few other cases like gfp_flags but they are quite rare and we ended up
> not needing those events, at least until now.

You are probably dealing with older kernels, as a bunch of newer
kernels have in it:

	TRACE_DEFINE_ENUM(<some-enum>);

Which will convert the enums into their actual values before exporting
it to tracefs. And when you have this combined with
CONFIG_TRACE_EVAL_MAP_FILE, you get a file in tracefs called eval_map
that shows the mapping of enums with their values (although, the enums
should have been converted in the tracefs format files).

BTW, the gfp flags look to be defines, and are converted in the format
files.

> 
> Beyond this, ingesting the raw trace events from ftrace raw pipes it has been
> great for all other events without requiring any other parsing library
> Super thanks for all the hard work on developing and maintaining ftrace.
> 
> Happy to discuss more on IRC, email or VC if you want to know more,
> Primiano.

Feel free to join us on #linux-rt on IRC OFTC if you are not already
there.

-- Steve

> 
> [1] https://docs.perfetto.dev
> [2] https://cs.chromium.org/chromium/src/third_party/perfetto/?q=f:perfetto&sq=package:chromium&dr
> [3] https://android.googlesource.com/platform/external/perfetto/
> [4] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/sched.proto
> [5] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/generic.proto
> [6] https://github.com/google/gapid/
> [7] https://developers.google.com/web/tools/chrome-devtools
Steven Rostedt Dec. 3, 2019, 3:58 p.m. UTC | #10
On Tue, 3 Dec 2019 10:44:36 +0000
Primiano Tucci <primiano@google.com> wrote:

> Furthermore, from a pure technical viewpoint, dynamic linking is a major pain
> for many cross-platform projects because has different subtleties on each
> platform. Most projects just ship big statically linked monoliths.
> Having a LGPL dependency in Perfetto means telling them "if you want to use
> this tracing project you need to change your build rules / packaging strategy
> and start dealing with dynamic linking on four different platforms
> (Linux, Android, Mac, Windows)". This would be a show-stopper for our
> project.
>

LGPL does not impose restrictions on having to be dynamically linked.
You can legally statically link a LGPL project as well. There's no
problem with that. You can do that with glibc too.

The one reason I can see that Google wouldn't want to include it, is
that you would also be required to maintain the library that you use.
That is, if someone wants the source code of the library you are using,
you must be able to provide it for them. That's the requirement that
LGPL imposes.

Now it is true that if someone wants to modify the LGPL library, enough
must be distributed to allow the user to do that. But as your code is
Apache, I'm guessing you can give people enough code to still do that.

From: https://copyleft.org/guide/comprehensive-gpl-guidech11.html#x14-10100010.1

"There are, however, subtle differences and additions. For example not
 only is CCS required (as would be with normal versions of GPL), but
 also the CCS provided must enable a developer to regenerate the
 modified version of the entire combined work, using with a modified
 version of the LGPL’d work (as a replacement for the version a
 distributor provided). For example, LGPL’d code is statically linked to
 a non-copyleft executable, the required source code must also include
 sufficient material to split the distributed executable and relink with
 a modified version of the library."

Thus, you are right that code licensed under LGPL is a "handle with
care". But there's a lot of code under LGPL, I'm surprised that there's
not better mechanisms to handle it.

One solution I'm looking at is creating flex/bison templates to do the
parsing. If we do this, those could be licensed under a different
license that would make it easier for others to include this code.

-- Steve
Joel Fernandes Dec. 4, 2019, 4:08 a.m. UTC | #11
On Tue, Dec 03, 2019 at 10:58:42AM -0500, Steven Rostedt wrote:
> On Tue, 3 Dec 2019 10:44:36 +0000
> Primiano Tucci <primiano@google.com> wrote:
> 
> > Furthermore, from a pure technical viewpoint, dynamic linking is a major pain
> > for many cross-platform projects because has different subtleties on each
> > platform. Most projects just ship big statically linked monoliths.
> > Having a LGPL dependency in Perfetto means telling them "if you want to use
> > this tracing project you need to change your build rules / packaging strategy
> > and start dealing with dynamic linking on four different platforms
> > (Linux, Android, Mac, Windows)". This would be a show-stopper for our
> > project.
> >
> 
> LGPL does not impose restrictions on having to be dynamically linked.
> You can legally statically link a LGPL project as well. There's no
> problem with that. You can do that with glibc too.

I did know static link is Ok, thanks for clarifying that.

> 
> The one reason I can see that Google wouldn't want to include it, is
> that you would also be required to maintain the library that you use.
> That is, if someone wants the source code of the library you are using,
> you must be able to provide it for them. That's the requirement that
> LGPL imposes.
> 
> Now it is true that if someone wants to modify the LGPL library, enough
> must be distributed to allow the user to do that. But as your code is
> Apache, I'm guessing you can give people enough code to still do that.
> 
> From: https://copyleft.org/guide/comprehensive-gpl-guidech11.html#x14-10100010.1
> 
> "There are, however, subtle differences and additions. For example not
>  only is CCS required (as would be with normal versions of GPL), but
>  also the CCS provided must enable a developer to regenerate the
>  modified version of the entire combined work, using with a modified
>  version of the LGPL’d work (as a replacement for the version a
>  distributor provided). For example, LGPL’d code is statically linked to
>  a non-copyleft executable, the required source code must also include
>  sufficient material to split the distributed executable and relink with
>  a modified version of the library."
> 
> Thus, you are right that code licensed under LGPL is a "handle with
> care". But there's a lot of code under LGPL, I'm surprised that there's
> not better mechanisms to handle it.

Actually, this is not the concern here. CCS should not be an issue for
Perfetto since (AIUI) it is fully open source. Or, am I missing your point?
For fully open source projects, I don't think Google should have a problem
with statically linking to LGPL.

The only issue that can arise is if an Apache 2.0 project like Perfetto uses
libtraceevent, but instead of statically linking to libtraceevent, it
actually derives from libtraceevent. I think only this case it will be
problematic, right? All other cases should be fine.

thanks,

 - Joel

> One solution I'm looking at is creating flex/bison templates to do the
> parsing. If we do this, those could be licensed under a different
> license that would make it easier for others to include this code.
> 
> -- Steve
diff mbox series

Patch

--- a/include/linux/mm.h~mm-emit-tracepoint-when-rss-changes
+++ a/include/linux/mm.h
@@ -1643,19 +1643,27 @@  static inline unsigned long get_mm_count
 	return (unsigned long)val;
 }
 
+void mm_trace_rss_stat(int member, long count);
+
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
-	atomic_long_add(value, &mm->rss_stat.count[member]);
+	long count = atomic_long_add_return(value, &mm->rss_stat.count[member]);
+
+	mm_trace_rss_stat(member, count);
 }
 
 static inline void inc_mm_counter(struct mm_struct *mm, int member)
 {
-	atomic_long_inc(&mm->rss_stat.count[member]);
+	long count = atomic_long_inc_return(&mm->rss_stat.count[member]);
+
+	mm_trace_rss_stat(member, count);
 }
 
 static inline void dec_mm_counter(struct mm_struct *mm, int member)
 {
-	atomic_long_dec(&mm->rss_stat.count[member]);
+	long count = atomic_long_dec_return(&mm->rss_stat.count[member]);
+
+	mm_trace_rss_stat(member, count);
 }
 
 /* Optimized variant when page is already known not to be PageAnon */
--- a/include/trace/events/kmem.h~mm-emit-tracepoint-when-rss-changes
+++ a/include/trace/events/kmem.h
@@ -316,6 +316,27 @@  TRACE_EVENT(mm_page_alloc_extfrag,
 		__entry->change_ownership)
 );
 
+TRACE_EVENT(rss_stat,
+
+	TP_PROTO(int member,
+		long count),
+
+	TP_ARGS(member, count),
+
+	TP_STRUCT__entry(
+		__field(int, member)
+		__field(long, size)
+	),
+
+	TP_fast_assign(
+		__entry->member = member;
+		__entry->size = (count << PAGE_SHIFT);
+	),
+
+	TP_printk("member=%d size=%ldB",
+		__entry->member,
+		__entry->size)
+	);
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
--- a/mm/memory.c~mm-emit-tracepoint-when-rss-changes
+++ a/mm/memory.c
@@ -72,6 +72,8 @@ 
 #include <linux/oom.h>
 #include <linux/numa.h>
 
+#include <trace/events/kmem.h>
+
 #include <asm/io.h>
 #include <asm/mmu_context.h>
 #include <asm/pgalloc.h>
@@ -152,6 +154,10 @@  static int __init init_zero_pfn(void)
 }
 core_initcall(init_zero_pfn);
 
+void mm_trace_rss_stat(int member, long count)
+{
+	trace_rss_stat(member, count);
+}
 
 #if defined(SPLIT_RSS_COUNTING)