Message ID | 20201130162454.28255-1-james.clark@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drivers/perf: Enable PID_IN_CONTEXTIDR with SPE | expand |
On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > This flag is required to get PID data in the SPE trace. Without > it the perf tool will report 0 for PID which isn't very useful, > especially when doing system wide profiling or profiling > applications that fork. Can perf not figure out the pid some other way? (e.g. by tracing context switches and correlating that with the SPE data?). Also, how does this work with pid namespaces? > There is a small performance overhead when enabling > PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by > default so the impact is minimised. > > Cc: Will Deacon <will@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Al Grant <al.grant@arm.com> > Cc: Leo Yan <leo.yan@linaro.org> > Cc: John Garry <john.garry@huawei.com> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Signed-off-by: James Clark <james.clark@arm.com> > --- > drivers/perf/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig > index 130327ff0b0e..47ede46c3d57 100644 > --- a/drivers/perf/Kconfig > +++ b/drivers/perf/Kconfig > @@ -125,6 +125,7 @@ config XGENE_PMU > config ARM_SPE_PMU > tristate "Enable support for the ARMv8.2 Statistical Profiling Extension" > depends on ARM64 > + select PID_IN_CONTEXTIDR Probably better to make PID_IN_CONTEXTIDR 'default y' if SPE is enabled, rather than selecting it directly. That way, at least people can turn it off if they don't want it. Will
Hi Will, On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > > This flag is required to get PID data in the SPE trace. Without > > it the perf tool will report 0 for PID which isn't very useful, > > especially when doing system wide profiling or profiling > > applications that fork. > > Can perf not figure out the pid some other way? (e.g. by tracing context > switches and correlating that with the SPE data?). For perf 'per-thread' mode, we can use context switch trace event as assisted info to select thread context. But for "system wide" mode and "snapshot" mode in perf tool, since the trace data is continuous, I think we cannot use context switch trace event to correlate the SPE trace data. > Also, how does this work with pid namespaces? Here we are studying the implemetation of Intel-PT and Arm CoreSight. The context ID is stored into the hardware trace data when record; afterwards when perf tool decodes the trace data and detects the packet for context ID, it will select the machine's thread context in perf [1]. Since the perf tool gathers all the threads infomation in perf data file, based on the context ID, it can find the corresponding thread pointer with function machine__find_thread() [2]. Since your question is for "pid namespace", to be honest, I don't know how perf tool to handle any confliction for differrent processes share the same PID, and I am not sure if you are asking CGroup related stuff or not. If this cannot answer your question, please let me know. Thanks, Leo [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c#n510 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/cs-etm.c#n1076 > > > There is a small performance overhead when enabling > > PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by > > default so the impact is minimised. > > > > Cc: Will Deacon <will@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Al Grant <al.grant@arm.com> > > Cc: Leo Yan <leo.yan@linaro.org> > > Cc: John Garry <john.garry@huawei.com> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > Signed-off-by: James Clark <james.clark@arm.com> > > --- > > drivers/perf/Kconfig | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig > > index 130327ff0b0e..47ede46c3d57 100644 > > --- a/drivers/perf/Kconfig > > +++ b/drivers/perf/Kconfig > > @@ -125,6 +125,7 @@ config XGENE_PMU > > config ARM_SPE_PMU > > tristate "Enable support for the ARMv8.2 Statistical Profiling Extension" > > depends on ARM64 > > + select PID_IN_CONTEXTIDR > > Probably better to make PID_IN_CONTEXTIDR 'default y' if SPE is enabled, > rather than selecting it directly. That way, at least people can turn it > off if they don't want it. > > Will
On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote: > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > > > This flag is required to get PID data in the SPE trace. Without > > > it the perf tool will report 0 for PID which isn't very useful, > > > especially when doing system wide profiling or profiling > > > applications that fork. > > > > Can perf not figure out the pid some other way? (e.g. by tracing context > > switches and correlating that with the SPE data?). > > For perf 'per-thread' mode, we can use context switch trace event as > assisted info to select thread context. But for "system wide" mode and > "snapshot" mode in perf tool, since the trace data is continuous, I > think we cannot use context switch trace event to correlate the SPE > trace data. Is there no way to correlate them with something like CNTVCT? > > Also, how does this work with pid namespaces? > > Here we are studying the implemetation of Intel-PT and Arm CoreSight. > > The context ID is stored into the hardware trace data when record; > afterwards when perf tool decodes the trace data and detects the > packet for context ID, it will select the machine's thread context in > perf [1]. Since the perf tool gathers all the threads infomation in > perf data file, based on the context ID, it can find the corresponding > thread pointer with function machine__find_thread() [2]. > > Since your question is for "pid namespace", to be honest, I don't know > how perf tool to handle any confliction for differrent processes share > the same PID, and I am not sure if you are asking CGroup related stuff > or not. If this cannot answer your question, please let me know. My point was that the pid value written to CONTEXTIDR is a global pid and does not take namespacing into account. If perf is run inside a pid namespace, it will therefore not work. Will
Hi Will, [ + Mathieu ] On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote: > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote: > > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: > > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > > > > This flag is required to get PID data in the SPE trace. Without > > > > it the perf tool will report 0 for PID which isn't very useful, > > > > especially when doing system wide profiling or profiling > > > > applications that fork. > > > > > > Can perf not figure out the pid some other way? (e.g. by tracing context > > > switches and correlating that with the SPE data?). > > > > For perf 'per-thread' mode, we can use context switch trace event as > > assisted info to select thread context. But for "system wide" mode and > > "snapshot" mode in perf tool, since the trace data is continuous, I > > think we cannot use context switch trace event to correlate the SPE > > trace data. > > Is there no way to correlate them with something like CNTVCT? Good point. Yes, we can convert CNTVCT to system time; I read the code in the perf's intel-pt.c and found the timestamp is used to correlate the auxtrace heap. I think it's better to dig more for detailed implementation. > > > Also, how does this work with pid namespaces? > > > > Here we are studying the implemetation of Intel-PT and Arm CoreSight. > > > > The context ID is stored into the hardware trace data when record; > > afterwards when perf tool decodes the trace data and detects the > > packet for context ID, it will select the machine's thread context in > > perf [1]. Since the perf tool gathers all the threads infomation in > > perf data file, based on the context ID, it can find the corresponding > > thread pointer with function machine__find_thread() [2]. > > > > Since your question is for "pid namespace", to be honest, I don't know > > how perf tool to handle any confliction for differrent processes share > > the same PID, and I am not sure if you are asking CGroup related stuff > > or not. If this cannot answer your question, please let me know. > > My point was that the pid value written to CONTEXTIDR is a global pid > and does not take namespacing into account. If perf is run inside a pid > namespace, it will therefore not work. Understand now. The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/ PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf. So this would be a safe way for perf tool running in pid namespace. Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight (IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch timer counter, the trend is to unify this for using arch timer counter). I think James could continue to upstream a new patch by following your suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature for Arm SPE to record CONTEXTIDR in its packet. Your questions inspired me, thanks! Leo
Looping in Mathieu at this time. On Thu, Dec 03, 2020 at 02:39:41PM +0800, Leo Yan wrote: > Hi Will, > > [ + Mathieu ] > > On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote: > > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote: > > > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: > > > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > > > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > > > > > This flag is required to get PID data in the SPE trace. Without > > > > > it the perf tool will report 0 for PID which isn't very useful, > > > > > especially when doing system wide profiling or profiling > > > > > applications that fork. > > > > > > > > Can perf not figure out the pid some other way? (e.g. by tracing context > > > > switches and correlating that with the SPE data?). > > > > > > For perf 'per-thread' mode, we can use context switch trace event as > > > assisted info to select thread context. But for "system wide" mode and > > > "snapshot" mode in perf tool, since the trace data is continuous, I > > > think we cannot use context switch trace event to correlate the SPE > > > trace data. > > > > Is there no way to correlate them with something like CNTVCT? > > Good point. Yes, we can convert CNTVCT to system time; I read the > code in the perf's intel-pt.c and found the timestamp is used to > correlate the auxtrace heap. I think it's better to dig more for > detailed implementation. > > > > > Also, how does this work with pid namespaces? > > > > > > Here we are studying the implemetation of Intel-PT and Arm CoreSight. > > > > > > The context ID is stored into the hardware trace data when record; > > > afterwards when perf tool decodes the trace data and detects the > > > packet for context ID, it will select the machine's thread context in > > > perf [1]. Since the perf tool gathers all the threads infomation in > > > perf data file, based on the context ID, it can find the corresponding > > > thread pointer with function machine__find_thread() [2]. > > > > > > Since your question is for "pid namespace", to be honest, I don't know > > > how perf tool to handle any confliction for differrent processes share > > > the same PID, and I am not sure if you are asking CGroup related stuff > > > or not. If this cannot answer your question, please let me know. > > > > My point was that the pid value written to CONTEXTIDR is a global pid > > and does not take namespacing into account. If perf is run inside a pid > > namespace, it will therefore not work. > > Understand now. > > The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/ > PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf. So this > would be a safe way for perf tool running in pid namespace. > > Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight > (IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch > timer counter, the trend is to unify this for using arch timer > counter). > > I think James could continue to upstream a new patch by following your > suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature > for Arm SPE to record CONTEXTIDR in its packet. > > Your questions inspired me, thanks! > > Leo
On Thu, Dec 03, 2020 at 02:39:41PM +0800, Leo Yan wrote: > Hi Will, > > [ + Mathieu ] > > On Tue, Dec 01, 2020 at 11:09:36PM +0000, Will Deacon wrote: > > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote: > > > On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: > > > > On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: > > > > > Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. > > > > > This flag is required to get PID data in the SPE trace. Without > > > > > it the perf tool will report 0 for PID which isn't very useful, > > > > > especially when doing system wide profiling or profiling > > > > > applications that fork. > > > > > > > > Can perf not figure out the pid some other way? (e.g. by tracing context > > > > switches and correlating that with the SPE data?). > > > > > > For perf 'per-thread' mode, we can use context switch trace event as > > > assisted info to select thread context. But for "system wide" mode and > > > "snapshot" mode in perf tool, since the trace data is continuous, I > > > think we cannot use context switch trace event to correlate the SPE > > > trace data. > > > > Is there no way to correlate them with something like CNTVCT? > > Good point. Yes, we can convert CNTVCT to system time; I read the > code in the perf's intel-pt.c and found the timestamp is used to > correlate the auxtrace heap. I think it's better to dig more for > detailed implementation. > > > > > Also, how does this work with pid namespaces? > > > > > > Here we are studying the implemetation of Intel-PT and Arm CoreSight. > > > > > > The context ID is stored into the hardware trace data when record; > > > afterwards when perf tool decodes the trace data and detects the > > > packet for context ID, it will select the machine's thread context in > > > perf [1]. Since the perf tool gathers all the threads infomation in > > > perf data file, based on the context ID, it can find the corresponding > > > thread pointer with function machine__find_thread() [2]. > > > > > > Since your question is for "pid namespace", to be honest, I don't know > > > how perf tool to handle any confliction for differrent processes share > > > the same PID, and I am not sure if you are asking CGroup related stuff > > > or not. If this cannot answer your question, please let me know. > > > > My point was that the pid value written to CONTEXTIDR is a global pid > > and does not take namespacing into account. If perf is run inside a pid > > namespace, it will therefore not work. Will is correct. That is why coresight doesn't allow contextID tracing if the request is made from inside a PID namespace. But that is only from the sysfs interface. From perf the guard wasn't put in place, probably because CPU wide trace scenarios were not implemented yet. > > Understand now. > > The perf events PERF_RECORD_ITRACE_START/PERF_RECORD_SWITCH/ > PERF_RECORD_SWITCH_CPU_WIDE can be used to set pid/tid in perf. So this > would be a safe way for perf tool running in pid namespace. > Remember that the events you are referring to above are all processed in the perf tools where there is no way to go from pid to vpid. And to preempt any hope, there is no way to "switch" the pid for the vpid in the trace stream. That is all done in hardware. I think the easiest solution is to simply avoid using contextIDs when PID namespaces are involved. > Loop in Mathieu, this is a common issue for both Arm SPE and CoreSight > (IIRC, though CoreSight's timestamp is not strictly attaching to Arm arch > timer counter, the trend is to unify this for using arch timer > counter). > > I think James could continue to upstream a new patch by following your > suggestion for enabling PID_IN_CONTEXTIDR, eventually, it's a feature > for Arm SPE to record CONTEXTIDR in its packet. > > Your questions inspired me, thanks! > > Leo > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 02/12/2020 01:09, Will Deacon wrote: > On Tue, Dec 01, 2020 at 12:10:40PM +0800, Leo Yan wrote: >> On Mon, Nov 30, 2020 at 04:46:51PM +0000, Will Deacon wrote: >>> On Mon, Nov 30, 2020 at 06:24:54PM +0200, James Clark wrote: >>>> Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. >>>> This flag is required to get PID data in the SPE trace. Without >>>> it the perf tool will report 0 for PID which isn't very useful, >>>> especially when doing system wide profiling or profiling >>>> applications that fork. >>> >>> Can perf not figure out the pid some other way? (e.g. by tracing context >>> switches and correlating that with the SPE data?). >> >> For perf 'per-thread' mode, we can use context switch trace event as >> assisted info to select thread context. But for "system wide" mode and >> "snapshot" mode in perf tool, since the trace data is continuous, I >> think we cannot use context switch trace event to correlate the SPE >> trace data. > > Is there no way to correlate them with something like CNTVCT? > >>> Also, how does this work with pid namespaces? >> >> Here we are studying the implemetation of Intel-PT and Arm CoreSight. >> >> The context ID is stored into the hardware trace data when record; >> afterwards when perf tool decodes the trace data and detects the >> packet for context ID, it will select the machine's thread context in >> perf [1]. Since the perf tool gathers all the threads infomation in >> perf data file, based on the context ID, it can find the corresponding >> thread pointer with function machine__find_thread() [2]. >> >> Since your question is for "pid namespace", to be honest, I don't know >> how perf tool to handle any confliction for differrent processes share >> the same PID, and I am not sure if you are asking CGroup related stuff >> or not. If this cannot answer your question, please let me know. > > My point was that the pid value written to CONTEXTIDR is a global pid > and does not take namespacing into account. If perf is run inside a pid > namespace, it will therefore not work. That's an interesting point, but I think we should improve this for the simple use case without namespaces first just to improve the user experience, so I've sent v2 of the patch with the change you suggested about using "default y". One other thing that is an issue that I'd like to ask about is this line in arm_spe_pmu.c: if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT); This means that the user has to be root to get the context saved with SPE. Is this a necessary security feature? I thought that PIDs are viewable by all users anyway? Do you think there is any way we could remove the perfmon_capable() requirement? James
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig index 130327ff0b0e..47ede46c3d57 100644 --- a/drivers/perf/Kconfig +++ b/drivers/perf/Kconfig @@ -125,6 +125,7 @@ config XGENE_PMU config ARM_SPE_PMU tristate "Enable support for the ARMv8.2 Statistical Profiling Extension" depends on ARM64 + select PID_IN_CONTEXTIDR help Enable perf support for the ARMv8.2 Statistical Profiling Extension, which provides periodic sampling of operations in
Enable PID_IN_CONTEXTIDR by default when Arm SPE is enabled. This flag is required to get PID data in the SPE trace. Without it the perf tool will report 0 for PID which isn't very useful, especially when doing system wide profiling or profiling applications that fork. There is a small performance overhead when enabling PID_IN_CONTEXTIDR, but SPE itself is optional and not enabled by default so the impact is minimised. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Al Grant <al.grant@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: John Garry <john.garry@huawei.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> --- drivers/perf/Kconfig | 1 + 1 file changed, 1 insertion(+)