diff mbox series

drivers/perf: Enable PID_IN_CONTEXTIDR with SPE

Message ID 20201130162454.28255-1-james.clark@arm.com (mailing list archive)
State New, archived
Headers show
Series drivers/perf: Enable PID_IN_CONTEXTIDR with SPE | expand

Commit Message

James Clark Nov. 30, 2020, 4:24 p.m. UTC
Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
This flag is required to get PID data in the SPE trace. Without
it the perf tool will report 0 for PID which isn't very useful,
especially when doing system wide profiling or profiling
applications that fork.

There is a small performance overhead when enabling
PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by
default so the impact is minimised.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Al Grant <al.grant@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
---
 drivers/perf/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Comments

Will Deacon Nov. 30, 2020, 4:46 p.m. UTC | #1
On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> This flag is required to get PID data in the SPE trace. Without
> it the perf tool will report 0 for PID which isn't very useful,
> especially when doing system wide profiling or profiling
> applications that fork.

Can perf not figure out the pid some other way? (e.g. by tracing context
switches and correlating that with the SPE data?). Also, how does this
work with pid namespaces?

> There is a small performance overhead when enabling
> PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by
> default so the impact is minimised.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Al Grant <al.grant@arm.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  drivers/perf/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 130327ff0b0e..47ede46c3d57 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -125,6 +125,7 @@ config XGENE_PMU
>  config ARM_SPE_PMU
>  	tristate "Enable support for the ARMv8.2 Statistical Profiling Extension"
>  	depends on ARM64
> +	select PID_IN_CONTEXTIDR

Probably better to make PID_IN_CONTEXTIDR 'default y' if SPE is enabled,
rather than selecting it directly. That way, at least people can turn it
off if they don't want it.

Will
Leo Yan Dec. 1, 2020, 4:10 a.m. UTC | #2
Hi Will,

On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > This flag is required to get PID data in the SPE trace. Without
> > it the perf tool will report 0 for PID which isn't very useful,
> > especially when doing system wide profiling or profiling
> > applications that fork.
> 
> Can perf not figure out the pid some other way? (e.g. by tracing context
> switches and correlating that with the SPE data?).

For perf 'per-thread' mode, we can use context switch trace event as
assisted info to select thread context.  But for "system wide" mode and
"snapshot" mode in perf tool, since the trace data is continuous, I
think we cannot use context switch trace event to correlate the SPE
trace data.

> Also, how does this work with pid namespaces?

Here we are studying the implemetation of Intel-PT and Arm CoreSight.

The context ID is stored into the hardware trace data when record;
afterwards when perf tool decodes the trace data and detects the
packet for context ID, it will select the machine's thread context in
perf [1].  Since the perf tool gathers all the threads infomation in
perf data file, based on the context ID, it can find the corresponding
thread pointer with function machine__find_thread() [2].

Since your question is for "pid namespace", to be honest, I don't know
how perf tool to handle any confliction for differrent processes share
the same PID, and I am not sure if you are asking CGroup related stuff
or not.  If this cannot answer your question, please let me know.


Thanks,
Leo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n510
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm.c#n1076

> 
> > There is a small performance overhead when enabling
> > PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by
> > default so the impact is minimised.
> > 
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Al Grant <al.grant@arm.com>
> > Cc: Leo Yan <leo.yan@linaro.org>
> > Cc: John Garry <john.garry@huawei.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Signed-off-by: James Clark <james.clark@arm.com>
> > ---
> >  drivers/perf/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 130327ff0b0e..47ede46c3d57 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -125,6 +125,7 @@ config XGENE_PMU
> >  config ARM_SPE_PMU
> >  	tristate "Enable support for the ARMv8.2 Statistical Profiling Extension"
> >  	depends on ARM64
> > +	select PID_IN_CONTEXTIDR
> 
> Probably better to make PID_IN_CONTEXTIDR 'default y' if SPE is enabled,
> rather than selecting it directly. That way, at least people can turn it
> off if they don't want it.
> 
> Will
Will Deacon Dec. 1, 2020, 11:09 p.m. UTC | #3
On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote:
> On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > > This flag is required to get PID data in the SPE trace. Without
> > > it the perf tool will report 0 for PID which isn't very useful,
> > > especially when doing system wide profiling or profiling
> > > applications that fork.
> > 
> > Can perf not figure out the pid some other way? (e.g. by tracing context
> > switches and correlating that with the SPE data?).
> 
> For perf 'per-thread' mode, we can use context switch trace event as
> assisted info to select thread context.  But for "system wide" mode and
> "snapshot" mode in perf tool, since the trace data is continuous, I
> think we cannot use context switch trace event to correlate the SPE
> trace data.

Is there no way to correlate them with something like CNTVCT?

> > Also, how does this work with pid namespaces?
> 
> Here we are studying the implemetation of Intel-PT and Arm CoreSight.
> 
> The context ID is stored into the hardware trace data when record;
> afterwards when perf tool decodes the trace data and detects the
> packet for context ID, it will select the machine's thread context in
> perf [1].  Since the perf tool gathers all the threads infomation in
> perf data file, based on the context ID, it can find the corresponding
> thread pointer with function machine__find_thread() [2].
> 
> Since your question is for "pid namespace", to be honest, I don't know
> how perf tool to handle any confliction for differrent processes share
> the same PID, and I am not sure if you are asking CGroup related stuff
> or not.  If this cannot answer your question, please let me know.

My point was that the pid value written to CONTEXTIDR is a global pid
and does not take namespacing into account. If perf is run inside a pid
namespace, it will therefore not work.

Will
Leo Yan Dec. 3, 2020, 6:39 a.m. UTC | #4
Hi Will,

[ + Mathieu ]

On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote:
> > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
> > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > > > This flag is required to get PID data in the SPE trace. Without
> > > > it the perf tool will report 0 for PID which isn't very useful,
> > > > especially when doing system wide profiling or profiling
> > > > applications that fork.
> > > 
> > > Can perf not figure out the pid some other way? (e.g. by tracing context
> > > switches and correlating that with the SPE data?).
> > 
> > For perf 'per-thread' mode, we can use context switch trace event as
> > assisted info to select thread context.  But for "system wide" mode and
> > "snapshot" mode in perf tool, since the trace data is continuous, I
> > think we cannot use context switch trace event to correlate the SPE
> > trace data.
> 
> Is there no way to correlate them with something like CNTVCT?

Good point.  Yes, we can convert CNTVCT to system time; I read the
code in the perf's intel-pt.c and found the timestamp is used to
correlate the auxtrace heap.  I think it's better to dig more for
detailed implementation.

> > > Also, how does this work with pid namespaces?
> > 
> > Here we are studying the implemetation of Intel-PT and Arm CoreSight.
> > 
> > The context ID is stored into the hardware trace data when record;
> > afterwards when perf tool decodes the trace data and detects the
> > packet for context ID, it will select the machine's thread context in
> > perf [1].  Since the perf tool gathers all the threads infomation in
> > perf data file, based on the context ID, it can find the corresponding
> > thread pointer with function machine__find_thread() [2].
> > 
> > Since your question is for "pid namespace", to be honest, I don't know
> > how perf tool to handle any confliction for differrent processes share
> > the same PID, and I am not sure if you are asking CGroup related stuff
> > or not.  If this cannot answer your question, please let me know.
> 
> My point was that the pid value written to CONTEXTIDR is a global pid
> and does not take namespacing into account. If perf is run inside a pid
> namespace, it will therefore not work.

Understand now.

The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/
PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf.  So this
would be a safe way for perf tool running in pid namespace.

Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight
(IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch
timer counter, the trend is to unify this for using arch timer
counter).

I think James could continue to upstream a new patch by following your
suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature
for Arm SPE to record CONTEXTIDR in its packet.

Your questions inspired me, thanks!

Leo
Leo Yan Dec. 3, 2020, 6:41 a.m. UTC | #5
Looping in Mathieu at this time.

On Thu, Dec 03, 2020 at 02:39:41PM +0800, Leo Yan wrote:
> Hi Will,
> 
> [ + Mathieu ]
> 
> On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote:
> > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote:
> > > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> > > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > > > > This flag is required to get PID data in the SPE trace. Without
> > > > > it the perf tool will report 0 for PID which isn't very useful,
> > > > > especially when doing system wide profiling or profiling
> > > > > applications that fork.
> > > > 
> > > > Can perf not figure out the pid some other way? (e.g. by tracing context
> > > > switches and correlating that with the SPE data?).
> > > 
> > > For perf 'per-thread' mode, we can use context switch trace event as
> > > assisted info to select thread context.  But for "system wide" mode and
> > > "snapshot" mode in perf tool, since the trace data is continuous, I
> > > think we cannot use context switch trace event to correlate the SPE
> > > trace data.
> > 
> > Is there no way to correlate them with something like CNTVCT?
> 
> Good point.  Yes, we can convert CNTVCT to system time; I read the
> code in the perf's intel-pt.c and found the timestamp is used to
> correlate the auxtrace heap.  I think it's better to dig more for
> detailed implementation.
> 
> > > > Also, how does this work with pid namespaces?
> > > 
> > > Here we are studying the implemetation of Intel-PT and Arm CoreSight.
> > > 
> > > The context ID is stored into the hardware trace data when record;
> > > afterwards when perf tool decodes the trace data and detects the
> > > packet for context ID, it will select the machine's thread context in
> > > perf [1].  Since the perf tool gathers all the threads infomation in
> > > perf data file, based on the context ID, it can find the corresponding
> > > thread pointer with function machine__find_thread() [2].
> > > 
> > > Since your question is for "pid namespace", to be honest, I don't know
> > > how perf tool to handle any confliction for differrent processes share
> > > the same PID, and I am not sure if you are asking CGroup related stuff
> > > or not.  If this cannot answer your question, please let me know.
> > 
> > My point was that the pid value written to CONTEXTIDR is a global pid
> > and does not take namespacing into account. If perf is run inside a pid
> > namespace, it will therefore not work.
> 
> Understand now.
> 
> The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/
> PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf.  So this
> would be a safe way for perf tool running in pid namespace.
> 
> Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight
> (IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch
> timer counter, the trend is to unify this for using arch timer
> counter).
> 
> I think James could continue to upstream a new patch by following your
> suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature
> for Arm SPE to record CONTEXTIDR in its packet.
> 
> Your questions inspired me, thanks!
> 
> Leo
Mathieu Poirier Dec. 4, 2020, 12:07 a.m. UTC | #6
On Thu, Dec 03, 2020 at 02:39:41PM +0800, Leo Yan wrote:
> Hi Will,
> 
> [ + Mathieu ]
> 
> On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote:
> > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote:
> > > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
> > > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
> > > > > This flag is required to get PID data in the SPE trace. Without
> > > > > it the perf tool will report 0 for PID which isn't very useful,
> > > > > especially when doing system wide profiling or profiling
> > > > > applications that fork.
> > > > 
> > > > Can perf not figure out the pid some other way? (e.g. by tracing context
> > > > switches and correlating that with the SPE data?).
> > > 
> > > For perf 'per-thread' mode, we can use context switch trace event as
> > > assisted info to select thread context.  But for "system wide" mode and
> > > "snapshot" mode in perf tool, since the trace data is continuous, I
> > > think we cannot use context switch trace event to correlate the SPE
> > > trace data.
> > 
> > Is there no way to correlate them with something like CNTVCT?
> 
> Good point.  Yes, we can convert CNTVCT to system time; I read the
> code in the perf's intel-pt.c and found the timestamp is used to
> correlate the auxtrace heap.  I think it's better to dig more for
> detailed implementation.
> 
> > > > Also, how does this work with pid namespaces?
> > > 
> > > Here we are studying the implemetation of Intel-PT and Arm CoreSight.
> > > 
> > > The context ID is stored into the hardware trace data when record;
> > > afterwards when perf tool decodes the trace data and detects the
> > > packet for context ID, it will select the machine's thread context in
> > > perf [1].  Since the perf tool gathers all the threads infomation in
> > > perf data file, based on the context ID, it can find the corresponding
> > > thread pointer with function machine__find_thread() [2].
> > > 
> > > Since your question is for "pid namespace", to be honest, I don't know
> > > how perf tool to handle any confliction for differrent processes share
> > > the same PID, and I am not sure if you are asking CGroup related stuff
> > > or not.  If this cannot answer your question, please let me know.
> > 
> > My point was that the pid value written to CONTEXTIDR is a global pid
> > and does not take namespacing into account. If perf is run inside a pid
> > namespace, it will therefore not work.

Will is correct.  That is why coresight doesn't allow contextID tracing if the
request is made from inside a PID namespace.  But that is only from the sysfs
interface.  From perf the guard wasn't put in place, probably because CPU
wide trace scenarios were not implemented yet.   

> 
> Understand now.
> 
> The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/
> PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf.  So this
> would be a safe way for perf tool running in pid namespace.
> 

Remember that the events you are referring to above are all processed in the
perf tools where there is no way to go from pid to vpid.  And to preempt any
hope, there is no way to "switch" the pid for the vpid in the trace stream.
That is all done in hardware.

I think the easiest solution is to simply avoid using contextIDs when PID
namespaces are involved.

> Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight
> (IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch
> timer counter, the trend is to unify this for using arch timer
> counter).
> 
> I think James could continue to upstream a new patch by following your
> suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature
> for Arm SPE to record CONTEXTIDR in its packet.
> 
> Your questions inspired me, thanks!
> 
> Leo
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
James Clark Dec. 14, 2020, 8:51 a.m. UTC | #7
On 02/12/2020 01:09, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote:
>> On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote:
>>>> Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled.
>>>> This flag is required to get PID data in the SPE trace. Without
>>>> it the perf tool will report 0 for PID which isn't very useful,
>>>> especially when doing system wide profiling or profiling
>>>> applications that fork.
>>>
>>> Can perf not figure out the pid some other way? (e.g. by tracing context
>>> switches and correlating that with the SPE data?).
>>
>> For perf 'per-thread' mode, we can use context switch trace event as
>> assisted info to select thread context.  But for "system wide" mode and
>> "snapshot" mode in perf tool, since the trace data is continuous, I
>> think we cannot use context switch trace event to correlate the SPE
>> trace data.
> 
> Is there no way to correlate them with something like CNTVCT?
> 
>>> Also, how does this work with pid namespaces?
>>
>> Here we are studying the implemetation of Intel-PT and Arm CoreSight.
>>
>> The context ID is stored into the hardware trace data when record;
>> afterwards when perf tool decodes the trace data and detects the
>> packet for context ID, it will select the machine's thread context in
>> perf [1].  Since the perf tool gathers all the threads infomation in
>> perf data file, based on the context ID, it can find the corresponding
>> thread pointer with function machine__find_thread() [2].
>>
>> Since your question is for "pid namespace", to be honest, I don't know
>> how perf tool to handle any confliction for differrent processes share
>> the same PID, and I am not sure if you are asking CGroup related stuff
>> or not.  If this cannot answer your question, please let me know.
> 
> My point was that the pid value written to CONTEXTIDR is a global pid
> and does not take namespacing into account. If perf is run inside a pid
> namespace, it will therefore not work.

That's an interesting point, but I think we should improve this for the simple
use case without namespaces first just to improve the user experience, so I've
sent v2 of the patch with the change you suggested about using "default y".

One other thing that is an issue that I'd like to ask about is this line in
arm_spe_pmu.c:

	if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
		reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);

This means that the user has to be root to get the context saved with SPE.
Is this a necessary security feature? I thought that PIDs are viewable by
all users anyway? Do you think there is any way we could remove the perfmon_capable()
requirement?

James
diff mbox series

Patch

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 130327ff0b0e..47ede46c3d57 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -125,6 +125,7 @@  config XGENE_PMU
 config ARM_SPE_PMU
 	tristate "Enable support for the ARMv8.2 Statistical Profiling Extension"
 	depends on ARM64
+	select PID_IN_CONTEXTIDR
 	help
 	  Enable perf support for the ARMv8.2 Statistical Profiling
 	  Extension, which provides periodic sampling of operations in