diff mbox series

[v2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace

Message ID 1535535863-1818-1-git-send-email-robert.walker@arm.com (mailing list archive)
State New, archived
Headers show
Series [v2] perf: Support for Arm A32/T32 instruction sets in CoreSight trace | expand

Commit Message

Robert Walker Aug. 29, 2018, 9:44 a.m. UTC
This patch adds support for generating instruction samples from trace of
AArch32 programs using the A32 and T32 instruction sets.

T32 has variable 2 or 4 byte instruction size, so the conversion between
addresses and instruction counts requires extra information from the trace
decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
member has been added to the feature check for OpenCSD.

Signed-off-by: Robert Walker <robert.walker@arm.com>
---

v2: Minor fixes following review comments from Mathieu
    Rebased on v4.19-rc1

 tools/build/feature/test-libopencsd.c           |  7 +++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 27 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 10 ++++
 tools/perf/util/cs-etm.c                        | 71 +++++++++++--------------
 4 files changed, 75 insertions(+), 40 deletions(-)

Comments

Kim Phillips Aug. 29, 2018, 1:49 p.m. UTC | #1
On Wed, 29 Aug 2018 10:44:23 +0100
Robert Walker <robert.walker@arm.com> wrote:

> This patch adds support for generating instruction samples from trace of
> AArch32 programs using the A32 and T32 instruction sets.
> 
> T32 has variable 2 or 4 byte instruction size, so the conversion between
> addresses and instruction counts requires extra information from the trace
> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
> member has been added to the feature check for OpenCSD.
> 
> Signed-off-by: Robert Walker <robert.walker@arm.com>
> ---
...
> +++ b/tools/build/feature/test-libopencsd.c
> @@ -3,6 +3,13 @@
>  
>  int main(void)
>  {
> +	/*
> +	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
> +	 * OpenCSD 0.9

0.9 != 0.9.1 in the above commit text: which is it?

> +	 */
> +	ocsd_generic_trace_elem elem;
> +	(void)elem.num_instr_range;
> +

This breaks building against older versions of OpenCSD, right?

>  	(void)ocsd_get_version();

Why don't we maintain building against older versions of the library,
and use the version information to make the decision on whether to use
the new feature being introduced in this patch?

Thanks,

Kim
Robert Walker Aug. 29, 2018, 2:34 p.m. UTC | #2
Hi Kim,


On 29/08/18 14:49, Kim Phillips wrote:
> On Wed, 29 Aug 2018 10:44:23 +0100
> Robert Walker <robert.walker@arm.com> wrote:
>
>> This patch adds support for generating instruction samples from trace of
>> AArch32 programs using the A32 and T32 instruction sets.
>>
>> T32 has variable 2 or 4 byte instruction size, so the conversion between
>> addresses and instruction counts requires extra information from the trace
>> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
>> member has been added to the feature check for OpenCSD.
>>
>> Signed-off-by: Robert Walker <robert.walker@arm.com>
>> ---
> ...
>> +++ b/tools/build/feature/test-libopencsd.c
>> @@ -3,6 +3,13 @@
>>   
>>   int main(void)
>>   {
>> +	/*
>> +	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
>> +	 * OpenCSD 0.9
> 0.9 != 0.9.1 in the above commit text: which is it?
I'll change it to 0.9.1 if there's another version of the patch (it was 
introduced in 0.9, but 0.9.1 has a necessary bug fix)
>> +	 */
>> +	ocsd_generic_trace_elem elem;
>> +	(void)elem.num_instr_range;
>> +
> This breaks building against older versions of OpenCSD, right?
>
>>   	(void)ocsd_get_version();
> Why don't we maintain building against older versions of the library,
> and use the version information to make the decision on whether to use
> the new feature being introduced in this patch?
The intention is to fail the feature detection check if the older 
version is installed - perf will still compile, but without the 
CoreSight trace support.

OpenCSD is still in development, so new features like this are being 
added and it would add a lot of #ifdef mess to the perf code to continue 
to support any machines with the old library version installed - there 
will only be a handful of machines affected and it's trivial to upgrade 
them (the new Debian packages are available).  How long would we 
continue to support such an older version?  I also don't see any 
precedent for supporting multiple dependent library versions in perf.
>
> Thanks,
>
> Kim
Regards

Rob
Kim Phillips Aug. 29, 2018, 4:32 p.m. UTC | #3
On Wed, 29 Aug 2018 15:34:16 +0100
Robert Walker <robert.walker@arm.com> wrote:

> Hi Kim,

Hi Robert,

> On 29/08/18 14:49, Kim Phillips wrote:
> > On Wed, 29 Aug 2018 10:44:23 +0100
> > Robert Walker <robert.walker@arm.com> wrote:
> >
> >> This patch adds support for generating instruction samples from trace of
> >> AArch32 programs using the A32 and T32 instruction sets.
> >>
> >> T32 has variable 2 or 4 byte instruction size, so the conversion between
> >> addresses and instruction counts requires extra information from the trace
> >> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
> >> member has been added to the feature check for OpenCSD.
> >>
> >> Signed-off-by: Robert Walker <robert.walker@arm.com>
> >> ---
> > ...
> >> +++ b/tools/build/feature/test-libopencsd.c
> >> @@ -3,6 +3,13 @@
> >>   
> >>   int main(void)
> >>   {
> >> +	/*
> >> +	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
> >> +	 * OpenCSD 0.9
> > 0.9 != 0.9.1 in the above commit text: which is it?
> I'll change it to 0.9.1 if there's another version of the patch (it was 
> introduced in 0.9, but 0.9.1 has a necessary bug fix)
> >> +	 */
> >> +	ocsd_generic_trace_elem elem;
> >> +	(void)elem.num_instr_range;
> >> +
> > This breaks building against older versions of OpenCSD, right?
> >
> >>   	(void)ocsd_get_version();
> > Why don't we maintain building against older versions of the library,
> > and use the version information to make the decision on whether to use
> > the new feature being introduced in this patch?
> The intention is to fail the feature detection check if the older 
> version is installed - perf will still compile, but without the 
> CoreSight trace support.

It should still compile, and with CoreSight trace support, just
not support for A32/T32 instruction sets.  The user shouldn't be denied
CoreSight trace support if they don't care for A32/T32 ISA support.

> OpenCSD is still in development, so new features like this are being 
> added and it would add a lot of #ifdef mess to the perf code to continue 
> to support any machines with the old library version installed - there 

Even adding #ifdefs, that won't survive taking one perf executable
built on one machine and running it on another machine with a different
version of the OpenCSD library: it'll break inconspicuously, not
gracefully!

There needs to be a run-time means of working with older versions of
the library.

Consider checking the sizeof some of the structs?  IIRC, some of the
structs sizes changed in the library.  See e.g., the 'size' field of
perf_event_attr:

size
     The size of the perf_event_attr structure for forward/backward
     compatibility.  Set this using sizeof(struct perf_event_attr)
     to allow the kernel to see the struct size at the time
     of compilation.

or, likely better, the 'version' and 'compat_version' of the
perf_event_mmap_page structure:

           struct perf_event_mmap_page {
               __u32 version;        /* version number of this structure */
               __u32 compat_version; /* lowest version this is compat with */
	       ...

> will only be a handful of machines affected and it's trivial to upgrade 
> them (the new Debian packages are available).  

This is upstream linux, so I don't know how you know only a 'handful'
of machines affected, and I wouldn't assume everyone's using Debian.

For one, I'd hate to see a single user affected if it isn't necessary,
as is in this case - not everyone wants A32/T32 ISA support, and
library compatibility needn't be broken.

This 'screw compatibility' mentality needs to be dropped *now* if
CoreSight support is to have a successful future.

Otherwise, I suggest keeping this feature in downstream trees for the
'handful', until the library and perf code are rewritten in a state
where they properly interoperate, and do not break each other.

> How long would we 
> continue to support such an older version?

What do you mean such an older version?  The project's v0.9.0 commit
was on 20 June 2018, the one that's usable - v0.9.1 - has a July 27
2018 commit date!  One month is *not* *old*!

>  I also don't see any 
> precedent for supporting multiple dependent library versions in perf.

That's because perf doesn't have a precedent on depending on libraries
that flat-out break their own users compatibility across versions ;)

Thanks,

Kim
Robert Walker Aug. 31, 2018, 1:42 p.m. UTC | #4
Hi Kim,

Generally, I agree with you about breaking backward compatibility, but 
in this case I don't think there is an actual problem.  As I understand 
it, you're worried that perf will break for people who are using an 
older version (0.8.x) of the OpenCSD library for CoreSight trace decode 
and this patch updates the requirement to a newer version (0.9.x) to 
enable support for trace of 32-bit applications.

There are only a few (4/5?) targets around with working support for 
CoreSight trace (and of these only Juno is the only platform with a 
devicetree in the mainline kernel), so only a few users of perf for Arm 
trace decode - most of these are people working those directly involved 
with Arm & Linaro or will be reading the coresight mailing list.  Anyone 
working with OpenCSD will have got it from github and compiled it 
themselves, so they can update and build a new version.  It's only been 
packaged for debian so far and testing already has the 0.9.x version 
(the 0.8.x version was only in debian for 8 days before being replaced 
by 0.9.x).

It would be possible to add conditional compilation flags to support 
compiling with 0.8.x, but I feel this would add too much mess to the 
code and I'd need some help in figuring out perf's feature detection 
system to generate the flags.  Given the likely small number of people 
affected and the easy upgrade path, I don't think this is worth it.


On 29/08/18 17:32, Kim Phillips wrote:
> On Wed, 29 Aug 2018 15:34:16 +0100
> Robert Walker <robert.walker@arm.com> wrote:
>
>> Hi Kim,
> Hi Robert,
>
>> On 29/08/18 14:49, Kim Phillips wrote:
>>> On Wed, 29 Aug 2018 10:44:23 +0100
>>> Robert Walker <robert.walker@arm.com> wrote:
>>>
>>>> This patch adds support for generating instruction samples from trace of
>>>> AArch32 programs using the A32 and T32 instruction sets.
>>>>
>>>> T32 has variable 2 or 4 byte instruction size, so the conversion between
>>>> addresses and instruction counts requires extra information from the trace
>>>> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
>>>> member has been added to the feature check for OpenCSD.
>>>>
>>>> Signed-off-by: Robert Walker <robert.walker@arm.com>
>>>> ---
>>> ...
>>>> +++ b/tools/build/feature/test-libopencsd.c
>>>> @@ -3,6 +3,13 @@
>>>>    
>>>>    int main(void)
>>>>    {
>>>> +	/*
>>>> +	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
>>>> +	 * OpenCSD 0.9
>>> 0.9 != 0.9.1 in the above commit text: which is it?
>> I'll change it to 0.9.1 if there's another version of the patch (it was
>> introduced in 0.9, but 0.9.1 has a necessary bug fix)
>>>> +	 */
>>>> +	ocsd_generic_trace_elem elem;
>>>> +	(void)elem.num_instr_range;
>>>> +
>>> This breaks building against older versions of OpenCSD, right?
>>>
>>>>    	(void)ocsd_get_version();
>>> Why don't we maintain building against older versions of the library,
>>> and use the version information to make the decision on whether to use
>>> the new feature being introduced in this patch?
>> The intention is to fail the feature detection check if the older
>> version is installed - perf will still compile, but without the
>> CoreSight trace support.
> It should still compile, and with CoreSight trace support, just
> not support for A32/T32 instruction sets.  The user shouldn't be denied
> CoreSight trace support if they don't care for A32/T32 ISA support.
>
>> OpenCSD is still in development, so new features like this are being
>> added and it would add a lot of #ifdef mess to the perf code to continue
>> to support any machines with the old library version installed - there
> Even adding #ifdefs, that won't survive taking one perf executable
> built on one machine and running it on another machine with a different
> version of the OpenCSD library: it'll break inconspicuously, not
> gracefully!

perf has a lot of other shared library dependencies (ELF , unwind 
libraries etc), so moving builds between systems is already fragile.

> There needs to be a run-time means of working with older versions of
> the library.
>
> Consider checking the sizeof some of the structs?  IIRC, some of the
> structs sizes changed in the library.  See e.g., the 'size' field of
> perf_event_attr:
>
> size
>       The size of the perf_event_attr structure for forward/backward
>       compatibility.  Set this using sizeof(struct perf_event_attr)
>       to allow the kernel to see the struct size at the time
>       of compilation.
>
> or, likely better, the 'version' and 'compat_version' of the
> perf_event_mmap_page structure:
>
>             struct perf_event_mmap_page {
>                 __u32 version;        /* version number of this structure */
>                 __u32 compat_version; /* lowest version this is compat with */
> 	       ...
>
>> will only be a handful of machines affected and it's trivial to upgrade
>> them (the new Debian packages are available).
> This is upstream linux, so I don't know how you know only a 'handful'
> of machines affected, and I wouldn't assume everyone's using Debian.
>
> For one, I'd hate to see a single user affected if it isn't necessary,
> as is in this case - not everyone wants A32/T32 ISA support, and
> library compatibility needn't be broken.
>
> This 'screw compatibility' mentality needs to be dropped *now* if
> CoreSight support is to have a successful future.
>
> Otherwise, I suggest keeping this feature in downstream trees for the
> 'handful', until the library and perf code are rewritten in a state
> where they properly interoperate, and do not break each other.
>
>> How long would we
>> continue to support such an older version?
> What do you mean such an older version?  The project's v0.9.0 commit
> was on 20 June 2018, the one that's usable - v0.9.1 - has a July 27
> 2018 commit date!  One month is *not* *old*!
I mean the 0.8.x version as the old version.
>>    I also don't see any
>> precedent for supporting multiple dependent library versions in perf.
> That's because perf doesn't have a precedent on depending on libraries
> that flat-out break their own users compatibility across versions ;)
This patch picks up a new feature that's been added - I notice the 
feature detection checks for other libraries check a number of features 
and emit warnings about required versions.

> Thanks,
>
> Kim

Regards

Rob
Kim Phillips Aug. 31, 2018, 2:43 p.m. UTC | #5
On Fri, 31 Aug 2018 14:42:00 +0100
Robert Walker <robert.walker@arm.com> wrote:

> Generally, I agree with you about breaking backward compatibility, but 
> in this case I don't think there is an actual problem.  As I understand 

I consider it a serious problem.

> it, you're worried that perf will break for people who are using an 
> older version (0.8.x) of the OpenCSD library for CoreSight trace decode 
> and this patch updates the requirement to a newer version (0.9.x) to 
> enable support for trace of 32-bit applications.

My problem is: every time a new feature is added, it shouldn't
force base CoreSight decode functionality to require a new version of
the library.

My second problem is: this patch implements a build-time requirement,
which is insufficient for running on machines with incompatible
versions of the library.

> There are only a few (4/5?) targets around with working support for 
> CoreSight trace (and of these only Juno is the only platform with a 
> devicetree in the mainline kernel), so only a few users of perf for Arm 
> trace decode - most of these are people working those directly involved 
> with Arm & Linaro or will be reading the coresight mailing list.  Anyone

Great, then share this feature with them, but don't send a patch to
break upstream, which, in turn, goes back to many things downstream
(future distro releases on newer targets, etc.).

> working with OpenCSD will have got it from github and compiled it 
> themselves, so they can update and build a new version.  It's only been 

No.  Updating the library - no matter where one gets it from - shouldn't
have to be necessary to avoid perf build regressions.

> packaged for debian so far and testing already has the 0.9.x version 
> (the 0.8.x version was only in debian for 8 days before being replaced 
> by 0.9.x).

What Debian does is immaterial to upstream submissions.

How is Debian testing the library, btw?  Functionality that worked in
0.8 will fail in 0.9 AFAICT.

> It would be possible to add conditional compilation flags to support 
> compiling with 0.8.x, but I feel this would add too much mess to the 
> code and I'd need some help in figuring out perf's feature detection 
> system to generate the flags.

No, we need run-time compatibility.  Build-time compatibility does not
satisfy the run-time requirements in this case.

>  Given the likely small number of people 
> affected and the easy upgrade path, I don't think this is worth it.

This is an upstream submission, and I wouldn't like for a single person
to ever have to experience such silently deceitful bugs.

For now, share the new feature in a personal git tree, for those that
need it.

Meanwhile, the library needs to be fixed with a run-time version
compatibility API ASAP.

Thanks,

Kim

> On 29/08/18 17:32, Kim Phillips wrote:
> > On Wed, 29 Aug 2018 15:34:16 +0100
> > Robert Walker <robert.walker@arm.com> wrote:
> >
> >> Hi Kim,
> > Hi Robert,
> >
> >> On 29/08/18 14:49, Kim Phillips wrote:
> >>> On Wed, 29 Aug 2018 10:44:23 +0100
> >>> Robert Walker <robert.walker@arm.com> wrote:
> >>>
> >>>> This patch adds support for generating instruction samples from trace of
> >>>> AArch32 programs using the A32 and T32 instruction sets.
> >>>>
> >>>> T32 has variable 2 or 4 byte instruction size, so the conversion between
> >>>> addresses and instruction counts requires extra information from the trace
> >>>> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
> >>>> member has been added to the feature check for OpenCSD.
> >>>>
> >>>> Signed-off-by: Robert Walker <robert.walker@arm.com>
> >>>> ---
> >>> ...
> >>>> +++ b/tools/build/feature/test-libopencsd.c
> >>>> @@ -3,6 +3,13 @@
> >>>>    
> >>>>    int main(void)
> >>>>    {
> >>>> +	/*
> >>>> +	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
> >>>> +	 * OpenCSD 0.9
> >>> 0.9 != 0.9.1 in the above commit text: which is it?
> >> I'll change it to 0.9.1 if there's another version of the patch (it was
> >> introduced in 0.9, but 0.9.1 has a necessary bug fix)
> >>>> +	 */
> >>>> +	ocsd_generic_trace_elem elem;
> >>>> +	(void)elem.num_instr_range;
> >>>> +
> >>> This breaks building against older versions of OpenCSD, right?
> >>>
> >>>>    	(void)ocsd_get_version();
> >>> Why don't we maintain building against older versions of the library,
> >>> and use the version information to make the decision on whether to use
> >>> the new feature being introduced in this patch?
> >> The intention is to fail the feature detection check if the older
> >> version is installed - perf will still compile, but without the
> >> CoreSight trace support.
> > It should still compile, and with CoreSight trace support, just
> > not support for A32/T32 instruction sets.  The user shouldn't be denied
> > CoreSight trace support if they don't care for A32/T32 ISA support.
> >
> >> OpenCSD is still in development, so new features like this are being
> >> added and it would add a lot of #ifdef mess to the perf code to continue
> >> to support any machines with the old library version installed - there
> > Even adding #ifdefs, that won't survive taking one perf executable
> > built on one machine and running it on another machine with a different
> > version of the OpenCSD library: it'll break inconspicuously, not
> > gracefully!
> 
> perf has a lot of other shared library dependencies (ELF , unwind 
> libraries etc), so moving builds between systems is already fragile.
> 
> > There needs to be a run-time means of working with older versions of
> > the library.
> >
> > Consider checking the sizeof some of the structs?  IIRC, some of the
> > structs sizes changed in the library.  See e.g., the 'size' field of
> > perf_event_attr:
> >
> > size
> >       The size of the perf_event_attr structure for forward/backward
> >       compatibility.  Set this using sizeof(struct perf_event_attr)
> >       to allow the kernel to see the struct size at the time
> >       of compilation.
> >
> > or, likely better, the 'version' and 'compat_version' of the
> > perf_event_mmap_page structure:
> >
> >             struct perf_event_mmap_page {
> >                 __u32 version;        /* version number of this structure */
> >                 __u32 compat_version; /* lowest version this is compat with */
> > 	       ...
> >
> >> will only be a handful of machines affected and it's trivial to upgrade
> >> them (the new Debian packages are available).
> > This is upstream linux, so I don't know how you know only a 'handful'
> > of machines affected, and I wouldn't assume everyone's using Debian.
> >
> > For one, I'd hate to see a single user affected if it isn't necessary,
> > as is in this case - not everyone wants A32/T32 ISA support, and
> > library compatibility needn't be broken.
> >
> > This 'screw compatibility' mentality needs to be dropped *now* if
> > CoreSight support is to have a successful future.
> >
> > Otherwise, I suggest keeping this feature in downstream trees for the
> > 'handful', until the library and perf code are rewritten in a state
> > where they properly interoperate, and do not break each other.
> >
> >> How long would we
> >> continue to support such an older version?
> > What do you mean such an older version?  The project's v0.9.0 commit
> > was on 20 June 2018, the one that's usable - v0.9.1 - has a July 27
> > 2018 commit date!  One month is *not* *old*!
> I mean the 0.8.x version as the old version.
> >>    I also don't see any
> >> precedent for supporting multiple dependent library versions in perf.
> > That's because perf doesn't have a precedent on depending on libraries
> > that flat-out break their own users compatibility across versions ;)
> This patch picks up a new feature that's been added - I notice the 
> feature detection checks for other libraries check a number of features 
> and emit warnings about required versions.
> 
> > Thanks,
> >
> > Kim
> 
> Regards
> 
> Rob
> 
>
Mathieu Poirier Sept. 5, 2018, 6:16 p.m. UTC | #6
On Fri, 31 Aug 2018 at 08:43, Kim Phillips <kim.phillips@arm.com> wrote:
>
> On Fri, 31 Aug 2018 14:42:00 +0100
> Robert Walker <robert.walker@arm.com> wrote:
>
> > Generally, I agree with you about breaking backward compatibility, but
> > in this case I don't think there is an actual problem.  As I understand
>
> I consider it a serious problem.
>
> > it, you're worried that perf will break for people who are using an
> > older version (0.8.x) of the OpenCSD library for CoreSight trace decode
> > and this patch updates the requirement to a newer version (0.9.x) to
> > enable support for trace of 32-bit applications.
>
> My problem is: every time a new feature is added, it shouldn't
> force base CoreSight decode functionality to require a new version of
> the library.
>
> My second problem is: this patch implements a build-time requirement,
> which is insufficient for running on machines with incompatible
> versions of the library.

While it is not realistic to expect eternal backward compatibility for
core features, I think in this case we can do better.

Looking at the code in the patch it seems possible to implement the
new functionality in functions that can be enabled based on the
library version we compile against.  If the library version is not
high enough, the functions simply get stubbed out and the feature
isn't available.

A prerequisite step would be to move openCSD's
"decoder/include/ocsd_if_version.h" under
"decoder/include/opencsd/ocsd_if_version.h" in order to make symbols
OCSD_VER_{ MAJOR | MINOW | PATH } accessible but that's not a big
deal.

Please try this approach first - we can consider more drastic measures
if that doesn't work.

>
> > There are only a few (4/5?) targets around with working support for
> > CoreSight trace (and of these only Juno is the only platform with a
> > devicetree in the mainline kernel), so only a few users of perf for Arm
> > trace decode - most of these are people working those directly involved
> > with Arm & Linaro or will be reading the coresight mailing list.  Anyone
>
> Great, then share this feature with them, but don't send a patch to
> break upstream, which, in turn, goes back to many things downstream
> (future distro releases on newer targets, etc.).
>
> > working with OpenCSD will have got it from github and compiled it
> > themselves, so they can update and build a new version.  It's only been
>
> No.  Updating the library - no matter where one gets it from - shouldn't
> have to be necessary to avoid perf build regressions.
>
> > packaged for debian so far and testing already has the 0.9.x version
> > (the 0.8.x version was only in debian for 8 days before being replaced
> > by 0.9.x).
>
> What Debian does is immaterial to upstream submissions.
>
> How is Debian testing the library, btw?  Functionality that worked in
> 0.8 will fail in 0.9 AFAICT.
>
> > It would be possible to add conditional compilation flags to support
> > compiling with 0.8.x, but I feel this would add too much mess to the
> > code and I'd need some help in figuring out perf's feature detection
> > system to generate the flags.
>
> No, we need run-time compatibility.  Build-time compatibility does not
> satisfy the run-time requirements in this case.
>
> >  Given the likely small number of people
> > affected and the easy upgrade path, I don't think this is worth it.
>
> This is an upstream submission, and I wouldn't like for a single person
> to ever have to experience such silently deceitful bugs.
>
> For now, share the new feature in a personal git tree, for those that
> need it.
>
> Meanwhile, the library needs to be fixed with a run-time version
> compatibility API ASAP.
>
> Thanks,
>
> Kim
>
> > On 29/08/18 17:32, Kim Phillips wrote:
> > > On Wed, 29 Aug 2018 15:34:16 +0100
> > > Robert Walker <robert.walker@arm.com> wrote:
> > >
> > >> Hi Kim,
> > > Hi Robert,
> > >
> > >> On 29/08/18 14:49, Kim Phillips wrote:
> > >>> On Wed, 29 Aug 2018 10:44:23 +0100
> > >>> Robert Walker <robert.walker@arm.com> wrote:
> > >>>
> > >>>> This patch adds support for generating instruction samples from trace of
> > >>>> AArch32 programs using the A32 and T32 instruction sets.
> > >>>>
> > >>>> T32 has variable 2 or 4 byte instruction size, so the conversion between
> > >>>> addresses and instruction counts requires extra information from the trace
> > >>>> decoder, requiring version 0.9.1 of OpenCSD.  A check for the new struct
> > >>>> member has been added to the feature check for OpenCSD.
> > >>>>
> > >>>> Signed-off-by: Robert Walker <robert.walker@arm.com>
> > >>>> ---
> > >>> ...
> > >>>> +++ b/tools/build/feature/test-libopencsd.c
> > >>>> @@ -3,6 +3,13 @@
> > >>>>
> > >>>>    int main(void)
> > >>>>    {
> > >>>> +        /*
> > >>>> +         * Requires ocsd_generic_trace_elem.num_instr_range introduced in
> > >>>> +         * OpenCSD 0.9
> > >>> 0.9 != 0.9.1 in the above commit text: which is it?
> > >> I'll change it to 0.9.1 if there's another version of the patch (it was
> > >> introduced in 0.9, but 0.9.1 has a necessary bug fix)
> > >>>> +         */
> > >>>> +        ocsd_generic_trace_elem elem;
> > >>>> +        (void)elem.num_instr_range;
> > >>>> +
> > >>> This breaks building against older versions of OpenCSD, right?
> > >>>
> > >>>>          (void)ocsd_get_version();
> > >>> Why don't we maintain building against older versions of the library,
> > >>> and use the version information to make the decision on whether to use
> > >>> the new feature being introduced in this patch?
> > >> The intention is to fail the feature detection check if the older
> > >> version is installed - perf will still compile, but without the
> > >> CoreSight trace support.
> > > It should still compile, and with CoreSight trace support, just
> > > not support for A32/T32 instruction sets.  The user shouldn't be denied
> > > CoreSight trace support if they don't care for A32/T32 ISA support.
> > >
> > >> OpenCSD is still in development, so new features like this are being
> > >> added and it would add a lot of #ifdef mess to the perf code to continue
> > >> to support any machines with the old library version installed - there
> > > Even adding #ifdefs, that won't survive taking one perf executable
> > > built on one machine and running it on another machine with a different
> > > version of the OpenCSD library: it'll break inconspicuously, not
> > > gracefully!
> >
> > perf has a lot of other shared library dependencies (ELF , unwind
> > libraries etc), so moving builds between systems is already fragile.
> >
> > > There needs to be a run-time means of working with older versions of
> > > the library.
> > >
> > > Consider checking the sizeof some of the structs?  IIRC, some of the
> > > structs sizes changed in the library.  See e.g., the 'size' field of
> > > perf_event_attr:
> > >
> > > size
> > >       The size of the perf_event_attr structure for forward/backward
> > >       compatibility.  Set this using sizeof(struct perf_event_attr)
> > >       to allow the kernel to see the struct size at the time
> > >       of compilation.
> > >
> > > or, likely better, the 'version' and 'compat_version' of the
> > > perf_event_mmap_page structure:
> > >
> > >             struct perf_event_mmap_page {
> > >                 __u32 version;        /* version number of this structure */
> > >                 __u32 compat_version; /* lowest version this is compat with */
> > >            ...
> > >
> > >> will only be a handful of machines affected and it's trivial to upgrade
> > >> them (the new Debian packages are available).
> > > This is upstream linux, so I don't know how you know only a 'handful'
> > > of machines affected, and I wouldn't assume everyone's using Debian.
> > >
> > > For one, I'd hate to see a single user affected if it isn't necessary,
> > > as is in this case - not everyone wants A32/T32 ISA support, and
> > > library compatibility needn't be broken.
> > >
> > > This 'screw compatibility' mentality needs to be dropped *now* if
> > > CoreSight support is to have a successful future.
> > >
> > > Otherwise, I suggest keeping this feature in downstream trees for the
> > > 'handful', until the library and perf code are rewritten in a state
> > > where they properly interoperate, and do not break each other.
> > >
> > >> How long would we
> > >> continue to support such an older version?
> > > What do you mean such an older version?  The project's v0.9.0 commit
> > > was on 20 June 2018, the one that's usable - v0.9.1 - has a July 27
> > > 2018 commit date!  One month is *not* *old*!
> > I mean the 0.8.x version as the old version.
> > >>    I also don't see any
> > >> precedent for supporting multiple dependent library versions in perf.
> > > That's because perf doesn't have a precedent on depending on libraries
> > > that flat-out break their own users compatibility across versions ;)
> > This patch picks up a new feature that's been added - I notice the
> > feature detection checks for other libraries check a number of features
> > and emit warnings about required versions.
> >
> > > Thanks,
> > >
> > > Kim
> >
> > Regards
> >
> > Rob
> >
> >
diff mbox series

Patch

diff --git a/tools/build/feature/test-libopencsd.c b/tools/build/feature/test-libopencsd.c
index 5ff1246..d96b2df 100644
--- a/tools/build/feature/test-libopencsd.c
+++ b/tools/build/feature/test-libopencsd.c
@@ -3,6 +3,13 @@ 
 
 int main(void)
 {
+	/*
+	 * Requires ocsd_generic_trace_elem.num_instr_range introduced in
+	 * OpenCSD 0.9
+	 */
+	ocsd_generic_trace_elem elem;
+	(void)elem.num_instr_range;
+
 	(void)ocsd_get_version();
 	return 0;
 }
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 938def6..73d8384 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -263,9 +263,12 @@  static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->tail = 0;
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
+		decoder->packet_buffer[i].isa = CS_ETM_ISA_UNKNOWN;
 		decoder->packet_buffer[i].start_addr = CS_ETM_INVAL_ADDR;
 		decoder->packet_buffer[i].end_addr = CS_ETM_INVAL_ADDR;
+		decoder->packet_buffer[i].instr_count = 0;
 		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].last_instr_size = 0;
 		decoder->packet_buffer[i].exc = false;
 		decoder->packet_buffer[i].exc_ret = false;
 		decoder->packet_buffer[i].cpu = INT_MIN;
@@ -294,11 +297,13 @@  cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	decoder->packet_count++;
 
 	decoder->packet_buffer[et].sample_type = sample_type;
+	decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
 	decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
 	decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
+	decoder->packet_buffer[et].instr_count = 0;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -321,8 +326,28 @@  cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 
 	packet = &decoder->packet_buffer[decoder->tail];
 
+	switch (elem->isa) {
+	case ocsd_isa_aarch64:
+		packet->isa = CS_ETM_ISA_A64;
+		break;
+	case ocsd_isa_arm:
+		packet->isa = CS_ETM_ISA_A32;
+		break;
+	case ocsd_isa_thumb2:
+		packet->isa = CS_ETM_ISA_T32;
+		break;
+	case ocsd_isa_tee:
+	case ocsd_isa_jazelle:
+	case ocsd_isa_custom:
+	case ocsd_isa_unknown:
+	default:
+		packet->isa = CS_ETM_ISA_UNKNOWN;
+	}
+
 	packet->start_addr = elem->st_addr;
 	packet->end_addr = elem->en_addr;
+	packet->instr_count = elem->num_instr_range;
+
 	switch (elem->last_i_type) {
 	case OCSD_INSTR_BR:
 	case OCSD_INSTR_BR_INDIRECT:
@@ -336,6 +361,8 @@  cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 		break;
 	}
 
+	packet->last_instr_size = elem->last_instr_sz;
+
 	return ret;
 }
 
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 612b575..9351bd1 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -28,11 +28,21 @@  enum cs_etm_sample_type {
 	CS_ETM_TRACE_ON = 1 << 1,
 };
 
+enum cs_etm_isa {
+	CS_ETM_ISA_UNKNOWN,
+	CS_ETM_ISA_A64,
+	CS_ETM_ISA_A32,
+	CS_ETM_ISA_T32,
+};
+
 struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
+	enum cs_etm_isa isa;
 	u64 start_addr;
 	u64 end_addr;
+	u32 instr_count;
 	u8 last_instr_taken_branch;
+	u8 last_instr_size;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 2ae6402..fcaa73f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -31,14 +31,6 @@ 
 
 #define MAX_TIMESTAMP (~0ULL)
 
-/*
- * A64 instructions are always 4 bytes
- *
- * Only A64 is supported, so can use this constant for converting between
- * addresses and instruction counts, calculting offsets etc
- */
-#define A64_INSTR_SIZE 4
-
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -492,21 +484,16 @@  static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 	etmq->last_branch_rb->nr = 0;
 }
 
-static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
-{
-	/* Returns 0 for the CS_ETM_TRACE_ON packet */
-	if (packet->sample_type == CS_ETM_TRACE_ON)
-		return 0;
+static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
+					 u64 addr) {
+	u8 instrBytes[2];
 
-	/*
-	 * The packet records the execution range with an exclusive end address
-	 *
-	 * A64 instructions are constant size, so the last executed
-	 * instruction is A64_INSTR_SIZE before the end address
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
+	cs_etm__mem_access(etmq, addr, ARRAY_SIZE(instrBytes), instrBytes);
+	/* T32 instruction size is indicated by bits[15:11] of the first
+	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
+	 * denote a 32-bit instruction.
 	 */
-	return packet->end_addr - A64_INSTR_SIZE;
+	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
@@ -518,27 +505,32 @@  static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 	return packet->start_addr;
 }
 
-static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+static inline
+u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction count by dividing.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+	/* Returns 0 for the CS_ETM_TRACE_ON packet */
+	if (packet->sample_type == CS_ETM_TRACE_ON)
+		return 0;
+
+	return packet->end_addr - packet->last_instr_size;
 }
 
-static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
+				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	/*
-	 * Only A64 instructions are currently supported, so can get
-	 * instruction address by muliplying.
-	 * Will need to do instruction level decode for T32 instructions as
-	 * they can be variable size (not yet supported).
-	 */
-	return packet->start_addr + offset * A64_INSTR_SIZE;
+	if (packet->isa == CS_ETM_ISA_T32) {
+		u64 addr = packet->start_addr;
+
+		while (offset > 0) {
+			addr += cs_etm__t32_instr_size(etmq, addr);
+			offset--;
+		}
+		return addr;
+	}
+
+	/* Assume a 4 byte instruction size (A32/A64) */
+	return packet->start_addr + offset * 4;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
@@ -867,9 +859,8 @@  static int cs_etm__sample(struct cs_etm_queue *etmq)
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct cs_etm_packet *tmp;
 	int ret;
-	u64 instrs_executed;
+	u64 instrs_executed = etmq->packet->instr_count;
 
-	instrs_executed = cs_etm__instr_count(etmq->packet);
 	etmq->period_instructions += instrs_executed;
 
 	/*
@@ -899,7 +890,7 @@  static int cs_etm__sample(struct cs_etm_queue *etmq)
 		 * executed, but PC has not advanced to next instruction)
 		 */
 		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+		u64 addr = cs_etm__instr_addr(etmq, etmq->packet, offset);
 
 		ret = cs_etm__synth_instruction_sample(
 			etmq, addr, etm->instructions_sample_period);