[v1,3/3] perf auxtrace arm: Support compat_auxtrace_mmap__{read_head|write_tail}

Message ID 20210809112727.596876-4-leo.yan@linaro.org (mailing list archive)
State New, archived
Series perf: Support compat mode for AUX ring buffer

Commit Message

Leo Yan Aug. 9, 2021, 11:27 a.m. UTC
When the perf tool runs in compat mode on an Arm platform, the kernel is
in 64-bit mode and user space is in 32-bit mode; user space can use the
"ldrd" and "strd" instructions to access 64-bit values atomically.

This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
builds; they use "ldrd" and "strd" to access the AUX head and tail
atomically.  The file arch/arm/util/auxtrace.c is built for both arm
and arm64, but these two functions are not needed for arm64, so check
the compiler macro "__arm__" to include them only for arm builds.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
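
For readers unfamiliar with the idiom, here is a minimal sketch of the 64-bit load and store that the commit message describes. It is not the patch itself: the helper names read_u64_once() and write_u64_once() are hypothetical, and the non-Arm branch falls back to the compiler's __atomic builtins, an assumption made only so that the sketch also builds on other hosts. The alignment assertion reflects the expectation that a doubleword access is only atomic on a 64-bit aligned address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: load a 64-bit value with a single access. */
static inline uint64_t read_u64_once(uint64_t *p)
{
	/* Doubleword atomicity assumes a 64-bit aligned address. */
	assert(((uintptr_t)p & 7) == 0);
#if defined(__arm__)
	uint64_t result;

	/* %H0 names the second register of the pair that holds 'result'. */
	__asm__ __volatile__(
	"	ldrd    %0, %H0, [%1]"
	: "=&r" (result)
	: "r" (p), "Qo" (*p)
	);
	return result;
#else
	/* Assumption: compiler builtin used as a stand-in on non-Arm hosts. */
	return __atomic_load_n(p, __ATOMIC_RELAXED);
#endif
}

/* Hypothetical helper: store a 64-bit value with a single access. */
static inline void write_u64_once(uint64_t *p, uint64_t val)
{
	assert(((uintptr_t)p & 7) == 0);
#if defined(__arm__)
	__asm__ __volatile__(
	"	strd    %2, %H2, [%1]"
	: "=Qo" (*p)
	: "r" (p), "r" (val)
	);
#else
	/* Assumption: compiler builtin used as a stand-in on non-Arm hosts. */
	__atomic_store_n(p, val, __ATOMIC_RELAXED);
#endif
}
```

Neither helper adds a memory barrier; as in the patch, any ordering against surrounding accesses (the smp_mb() before the tail store) has to be supplied by the caller.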

Comments

James Clark Aug. 23, 2021, 12:23 p.m. UTC | #1
On 09/08/2021 12:27, Leo Yan wrote:
> When the perf tool runs in compat mode on an Arm platform, the kernel is
> in 64-bit mode and user space is in 32-bit mode; user space can use the
> "ldrd" and "strd" instructions to access 64-bit values atomically.
> 
> This patch adds compat_auxtrace_mmap__{read_head|write_tail} for arm
> builds; they use "ldrd" and "strd" to access the AUX head and tail
> atomically.  The file arch/arm/util/auxtrace.c is built for both arm
> and arm64, but these two functions are not needed for arm64, so check
> the compiler macro "__arm__" to include them only for arm builds.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> index b187bddbd01a..c7c7ec0812d5 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -107,3 +107,35 @@ struct auxtrace_record
>  	*err = 0;
>  	return NULL;
>  }
> +
> +#if defined(__arm__)
> +u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +	u64 result;
> +
> +	__asm__ __volatile__(
> +"	ldrd    %0, %H0, [%1]"
> +	: "=&r" (result)
> +	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
> +	);
> +
> +	return result;
> +}

Hi Leo,

I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h

For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
make it not so easy for Arm. Just wondering if you considered trying to include the
existing one? Or decided that it was easier to duplicate it?

Other than that, I have tested that the change works with a 32-bit build in both
snapshot and normal mode.

Reviewed-by: James Clark <james.clark@arm.com>
Tested-by: James Clark <james.clark@arm.com>
 
> +
> +int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
> +{
> +	struct perf_event_mmap_page *pc = mm->userpg;
> +
> +	/* Ensure all reads are done before we write the tail out */
> +	smp_mb();
> +
> +	__asm__ __volatile__(
> +"	strd    %2, %H2, [%1]"
> +	: "=Qo" (pc->aux_tail)
> +	: "r" (&pc->aux_tail), "r" (tail)
> +	);
> +
> +	return 0;
> +}
> +#endif
>
Leo Yan Aug. 23, 2021, 1:30 p.m. UTC | #2
Hi James,

On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:
> [...]
> 
> Hi Leo,
> 
> I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h

Exactly.

> For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> make it not so easy for Arm. Just wondering if you considered trying to include the
> existing one? Or decided that it was easier to duplicate it?

Good catch!

Your reminder made me realize that the atomic operations for arm/arm64
should be improved for user space programs.  So far, the perf tool
simply uses the compiler's atomic implementations (from
asm-generic/atomic-gcc.h) for arm/arm64; for a more reliable
implementation, I think user space programs should use the
architecture's atomic instructions.

So I think your question can be rephrased as: should we export the
arm/arm64 atomic operations to user space programs?  This looks like
challenging work to me; we would need to finish at least the items below:

- Support arm64 atomic operations and reuse the kernel's
  arch/arm64/include/asm/atomic.h;
- Support arm atomic operations and reuse the kernel's
  arch/arm/include/asm/atomic.h;
- For aarch32 builds, use configurations to distinguish the different
  cases, like LPAE, Armv7, and Armv6 variants (so far I have no idea
  how to distinguish these builds gracefully in the perf tool).

I am not sure whether there is any ongoing effort in this area; if
anyone is working on it (or has started related work before), then we
should definitely look into how we can reuse the arch's atomic headers.

Otherwise, I would prefer to first merge this patch with its dozen lines
of duplicated code; afterwards, we can send a separate patch set to
support arm/arm64 atomic operations in user space.

If any Arm/Arm64 maintainers could shed some light on this,
I think it would be very helpful.
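
As a rough illustration of what "the compiler's atomic implementations" can look like, the sketch below builds a tiny atomic counter on the GCC __sync builtins. It was written from scratch for this note rather than copied from asm-generic/atomic-gcc.h, so every demo_ name is hypothetical and the real header may differ in detail.

```c
#include <assert.h>

/* Hypothetical counter type, loosely modelled on a kernel-style atomic_t. */
typedef struct {
	int counter;
} demo_atomic_t;

/* Read the current value; the volatile cast forces one real load. */
static inline int demo_atomic_read(const demo_atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Atomically add one and return the new value. */
static inline int demo_atomic_inc_return(demo_atomic_t *v)
{
	return __sync_add_and_fetch(&v->counter, 1);
}

/* Store newval only if the counter equals oldval; return the previous value. */
static inline int demo_atomic_cmpxchg(demo_atomic_t *v, int oldval, int newval)
{
	return __sync_val_compare_and_swap(&v->counter, oldval, newval);
}

/* Usage example: exercises the helpers on a single thread. */
static inline void demo_self_check(void)
{
	demo_atomic_t v = { .counter = 0 };

	assert(demo_atomic_inc_return(&v) == 1);
	/* The swap succeeds because the counter is 1; the old value comes back. */
	assert(demo_atomic_cmpxchg(&v, 1, 5) == 1);
	assert(demo_atomic_read(&v) == 5);
}
```

The point of the builtins is that the compiler, not the program, picks the instruction sequence for the target, which is why nothing architecture-specific appears in the sketch.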

> Other than that, I have tested that the change works with a 32-bit build in both
> snapshot and normal mode.
> 
> Reviewed-by: James Clark <james.clark@arm.com>
> Tested-by: James Clark <james.clark@arm.com>

Thanks for testing and reviewing!

Leo
Russell King (Oracle) Aug. 23, 2021, 1:39 p.m. UTC | #3
On Mon, Aug 23, 2021 at 09:30:43PM +0800, Leo Yan wrote:
> Hi James,
> 
> On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:
> > [...]
> > 
> > Hi Leo,
> > 
> > I see that this is a duplicate of the atomic read in arch/arm/include/asm/atomic.h
> 
> Exactly.
> 
> > For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> > include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> > make it not so easy for Arm. Just wondering if you considered trying to include the
> > existing one? Or decided that it was easier to duplicate it?
> 
> Good catch!
> 
> Your reminder made me realize that the atomic operations for arm/arm64
> should be improved for user space programs.  So far, the perf tool
> simply uses the compiler's atomic implementations (from
> asm-generic/atomic-gcc.h) for arm/arm64; for a more reliable
> implementation, I think user space programs should use the
> architecture's atomic instructions.

No we should not. Sometimes, what's in the kernel is for the kernel's
use only, and not for userspace's use. That may be because what works
in kernel space does not work in userspace.

For example, the ARMv6+ atomic operations can be executed in userspace
_provided_ they are only used on memory which has an exclusive monitor.
They can't be used on anything that is not "normal memory". Prior to
ARMv6, the atomic operations rely on disabling interrupts. That
facility is simply not available to userspace, so these must not be
made available to userspace.

The same applies to bitops.

We've been here before in the past, when the kernel headers were not
separated from the user ABI headers, and people would write programs
that included e.g. bitops.h on x86 because they had optimised bitops
code. This made the userspace programs very non-portable - without
re-implementing userspace versions of this stuff in every userspace
program that did this stuff.

So no, having experienced the effects of this kind of thing in the
past, the kernel should _not_ export architecture specific code in
header files to userspace.

Also, it should be pointed out that by doing so, you create a licensing
issue. If the code is GPLv2, and you build your program such that it
incorporates GPLv2 code, then if the userspace program is not GPLv2
compliant, you have a licensing problem, and in effect the program
cannot be distributed.

Please do not go down this route.
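
One way a userspace tool could get this kind of ordering without pulling in kernel headers is the C11 <stdatomic.h> interface, which leaves instruction selection to the compiler. The sketch below only illustrates that idea and is not something proposed in this thread: struct demo_ring and its helpers are hypothetical, and whether a 64-bit _Atomic object is lock-free on a given 32-bit target depends on the toolchain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail fields of a shared ring buffer. */
struct demo_ring {
	_Atomic uint64_t head;	/* advanced by the producer */
	_Atomic uint64_t tail;	/* advanced by the consumer */
};

/* Acquire load: pairs with a release store of head by the producer. */
static inline uint64_t demo_read_head(struct demo_ring *r)
{
	return atomic_load_explicit(&r->head, memory_order_acquire);
}

/* Release store: the consumer's earlier reads are ordered before the tail moves. */
static inline void demo_write_tail(struct demo_ring *r, uint64_t tail)
{
	atomic_store_explicit(&r->tail, tail, memory_order_release);
}

/* Consume everything currently available and return the number of bytes. */
static inline uint64_t demo_consume_all(struct demo_ring *r)
{
	uint64_t head = demo_read_head(r);
	uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	assert(head >= tail);
	demo_write_tail(r, head);
	return head - tail;
}
```

The patch under review issues a full smp_mb() before the tail store; release ordering is used in this sketch as the closest C11 counterpart of "all reads are done before we write the tail out", which is a design assumption rather than a statement about what perf does.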
Leo Yan Aug. 23, 2021, 2:54 p.m. UTC | #4
Hi Russell,

On Mon, Aug 23, 2021 at 02:39:18PM +0100, Russell King (Oracle) wrote:
> On Mon, Aug 23, 2021 at 09:30:43PM +0800, Leo Yan wrote:
> > On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:

[...]

> > > For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> > > include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> > > make it not so easy for Arm. Just wondering if you considered trying to include the
> > > existing one? Or decided that it was easier to duplicate it?
> > 
> > Good catch!
> > 
> > Your reminder made me realize that the atomic operations for arm/arm64
> > should be improved for user space programs.  So far, the perf tool
> > simply uses the compiler's atomic implementations (from
> > asm-generic/atomic-gcc.h) for arm/arm64; for a more reliable
> > implementation, I think user space programs should use the
> > architecture's atomic instructions.
> 
> No we should not. Sometimes, what's in the kernel is for the kernel's
> use only, and not for userspace's use. That may be because what works
> in kernel space does not work in userspace.
> 
> For example, the ARMv6+ atomic operations can be executed in userspace
> _provided_ they are only used on memory which has an exclusive monitor.
> They can't be used on anything that is not "normal memory".

Okay, IIUC, the requirement for "normal memory" and an exclusive monitor
should also apply on aarch64 to the exclusive load/store instructions
(ldxr/stxr, the counterparts of ldrex/strex), the Load-Acquire and
Store-Release instructions, etc.  Otherwise, it depends heavily on
exclusive monitors outside the cache coherency domain (but this is
out of scope for the CPU).

The perf tool is very likely to map memory as "normal memory", but we
cannot say that is always true.

So I agree there is a risk in exporting the aarch32/aarch64 atomic
headers to user space.

> Prior to
> ARMv6, the atomic operations rely on disabling interrupts. That
> facility is simply not available to userspace, so these must not be
> made available to userspace.
> 
> The same applies to bitops.
> 
> We've been here before in the past, when the kernel headers were not
> separated from the user ABI headers, and people would write programs
> that included e.g. bitops.h on x86 because they had optimised bitops
> code. This made the userspace programs very non-portable - without
> re-implementing userspace versions of this stuff in every userspace
> program that did this stuff.
> 
> So no, having experienced the effects of this kind of thing in the
> past, the kernel should _not_ export architecture specific code in
> header files to userspace.
> 
> Also, it should be pointed out that by doing so, you create a licensing
> issue. If the code is GPLv2, and you build your program such that it
> incorporates GPLv2 code, then if the userspace program is not GPLv2
> compliant, you have a licensing problem, and in effect the program
> cannot be distributed.
> 
> Please do not go down this route.

Thanks a lot for the suggestion and quick response.

Leo
Patch

diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index b187bddbd01a..c7c7ec0812d5 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -107,3 +107,35 @@  struct auxtrace_record
 	*err = 0;
 	return NULL;
 }
+
+#if defined(__arm__)
+u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+	u64 result;
+
+	__asm__ __volatile__(
+"	ldrd    %0, %H0, [%1]"
+	: "=&r" (result)
+	: "r" (&pc->aux_head), "Qo" (pc->aux_head)
+	);
+
+	return result;
+}
+
+int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->userpg;
+
+	/* Ensure all reads are done before we write the tail out */
+	smp_mb();
+
+	__asm__ __volatile__(
+"	strd    %2, %H2, [%1]"
+	: "=Qo" (pc->aux_tail)
+	: "r" (&pc->aux_tail), "r" (tail)
+	);
+
+	return 0;
+}
+#endif