Message ID | 20240827164417.3309560-2-leo.yan@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf arm-spe: Introduce metadata version 2 | expand |
On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote: > This commit adds a new entry 'lds' in the capacity folder. 'lds' stands > for "loaded data source". When its value is 1, it indicates the data > source implemented, and data source packets will be recorded in the > trace data. > > Signed-off-by: Leo Yan <leo.yan@arm.com> > --- > drivers/perf/arm_spe_pmu.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > index 9100d82bfabc..81c1e7627721 100644 > --- a/drivers/perf/arm_spe_pmu.c > +++ b/drivers/perf/arm_spe_pmu.c > @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action { > /* This sysfs gunk was really good fun to write. */ > enum arm_spe_pmu_capabilities { > SPE_PMU_CAP_ARCH_INST = 0, > + SPE_PMU_CAP_LDS, > SPE_PMU_CAP_ERND, > SPE_PMU_CAP_FEAT_MAX, > SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX, > @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities { > > static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = { > [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST, > + [SPE_PMU_CAP_LDS] = SPE_PMU_FEAT_LDS, > [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND, > }; > > @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, > > static struct attribute *arm_spe_pmu_cap_attr[] = { > SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), > + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), > SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), > SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), > SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL), What will userspace do with this? I don't think you can turn LDS on/off, so either you'll get the data source packet or you won't. Will
On 8/30/24 11:38, Will Deacon wrote: > On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote: >> This commit adds a new entry 'lds' in the capacity folder. 'lds' stands >> for "loaded data source". When its value is 1, it indicates the data >> source implemented, and data source packets will be recorded in the >> trace data. >> >> Signed-off-by: Leo Yan <leo.yan@arm.com> >> --- >> drivers/perf/arm_spe_pmu.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c >> index 9100d82bfabc..81c1e7627721 100644 >> --- a/drivers/perf/arm_spe_pmu.c >> +++ b/drivers/perf/arm_spe_pmu.c >> @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action { >> /* This sysfs gunk was really good fun to write. */ >> enum arm_spe_pmu_capabilities { >> SPE_PMU_CAP_ARCH_INST = 0, >> + SPE_PMU_CAP_LDS, >> SPE_PMU_CAP_ERND, >> SPE_PMU_CAP_FEAT_MAX, >> SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX, >> @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities { >> >> static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = { >> [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST, >> + [SPE_PMU_CAP_LDS] = SPE_PMU_FEAT_LDS, >> [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND, >> }; >> >> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, >> >> static struct attribute *arm_spe_pmu_cap_attr[] = { >> SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), >> + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), >> SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), >> SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), >> SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL), > > What will userspace do with this? I don't think you can turn LDS on/off, > so either you'll get the data source packet or you won't. Yes, LDS bit does not work as a switch. The tool in the userspace will record the LDS bit into the metadata. During decoding phase, it reads out the LDS from metadata. Based on it, the perf tool can know if the data source is supported or not, if yes then decode the data source packet. Another point is how to decide the data source packet format. Now we maintain a CPU list for tracking CPU variants which support data source trace. For long term, I would like the tool can based on hardware feature (e.g. a ID register in Arm SPE) to decide the data source format, so far it is absent. This is why LDS bit + CPU list is a more reliable way. See some discussion [1]. Thanks, Leo [1] https://lore.kernel.org/linux-perf-users/Zl9jLtiFagBcH7oH@J2N7QTR9R3/
On Fri, Aug 30, 2024 at 01:12:33PM +0100, Leo Yan wrote: > On 8/30/24 11:38, Will Deacon wrote: > > On Tue, Aug 27, 2024 at 05:44:09PM +0100, Leo Yan wrote: > > > This commit adds a new entry 'lds' in the capacity folder. 'lds' stands > > > for "loaded data source". When its value is 1, it indicates the data > > > source implemented, and data source packets will be recorded in the > > > trace data. > > > > > > Signed-off-by: Leo Yan <leo.yan@arm.com> > > > --- > > > drivers/perf/arm_spe_pmu.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c > > > index 9100d82bfabc..81c1e7627721 100644 > > > --- a/drivers/perf/arm_spe_pmu.c > > > +++ b/drivers/perf/arm_spe_pmu.c > > > @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action { > > > /* This sysfs gunk was really good fun to write. */ > > > enum arm_spe_pmu_capabilities { > > > SPE_PMU_CAP_ARCH_INST = 0, > > > + SPE_PMU_CAP_LDS, > > > SPE_PMU_CAP_ERND, > > > SPE_PMU_CAP_FEAT_MAX, > > > SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX, > > > @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities { > > > > > > static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = { > > > [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST, > > > + [SPE_PMU_CAP_LDS] = SPE_PMU_FEAT_LDS, > > > [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND, > > > }; > > > > > > @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, > > > > > > static struct attribute *arm_spe_pmu_cap_attr[] = { > > > SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), > > > + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), > > > SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), > > > SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), > > > SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL), > > > > What will userspace do with this? I don't think you can turn LDS on/off, > > so either you'll get the data source packet or you won't. > > Yes, LDS bit does not work as a switch. > > The tool in the userspace will record the LDS bit into the metadata. During > decoding phase, it reads out the LDS from metadata. Based on it, the perf > tool can know if the data source is supported or not, if yes then decode the > data source packet. Why not just decode a data source packet when you see it? i.e. assume LDS is always set. > Another point is how to decide the data source packet format. Now we maintain > a CPU list for tracking CPU variants which support data source trace. For long > term, I would like the tool can based on hardware feature (e.g. a ID register > in Arm SPE) to decide the data source format, so far it is absent. This is why > LDS bit + CPU list is a more reliable way. See some discussion [1]. Huh. Why would you have a CPU in the list if it _doesn't_ have LDS? If we have to resort to per-CPU decoding, then that's even more of a reason not to have the LDS cap imo. Will
On 8/30/2024 2:09 PM, Will Deacon wrote: [...] >>>> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, >>>> >>>> static struct attribute *arm_spe_pmu_cap_attr[] = { >>>> SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), >>>> + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), >>>> SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), >>>> SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), >>>> SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL), >>> >>> What will userspace do with this? I don't think you can turn LDS on/off, >>> so either you'll get the data source packet or you won't. >> >> Yes, LDS bit does not work as a switch. >> >> The tool in the userspace will record the LDS bit into the metadata. During >> decoding phase, it reads out the LDS from metadata. Based on it, the perf >> tool can know if the data source is supported or not, if yes then decode the >> data source packet. > > Why not just decode a data source packet when you see it? i.e. assume LDS > is always set. The current tool works this way to directly decode a data source packet. However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded data source is implementation dependent, the data source payload format also is implementation defined. We are halfway here in using the LDS bit to determine if the data source is implemented. However, we lack information on the data source format implementation. As a first step, we can use the LDS bit for sanity checking in the tool to detect any potential silicon implementation issues. Once we have an architectural definition for the data source format, we can extend the tool accordingly. >> Another point is how to decide the data source packet format. Now we maintain >> a CPU list for tracking CPU variants which support data source trace. For long >> term, I would like the tool can based on hardware feature (e.g. a ID register >> in Arm SPE) to decide the data source format, so far it is absent. This is why >> LDS bit + CPU list is a more reliable way. See some discussion [1]. > > Huh. Why would you have a CPU in the list if it _doesn't_ have LDS? Yeah, this is what we don't expect - we can verify the implementation based on LDS bit. E.g. if users ask data source related questions, we can use LDS bit (saved in the perf metadata) to confirm the feature has been implemented in a silicon. > If we have to resort to per-CPU decoding, then that's even more of a reason> not to have the LDS cap imo. This series converts the Arm SPE information into per-CPU metadata, including the LDS bit. Consequently, the decoding process retrieves CPU metadata for per-CPU decoding, making it easy to determine if a CPU supports the data source. We have platforms that not all CPUs support Arm SPE, for example, the CPU0 and CPU1 don't support Arm SPE, CPU2~CPU5 share a Arm SPE PMU event, CPU6~CPU7 share another Arm SPE PMU event. In this case, per CPU metadata can be easily for checking hardware capacity (include LDS bit) in the decoding. Thanks, Leo
On Sat, Aug 31, 2024 at 12:37:29PM +0100, Leo Yan wrote: > On 8/30/2024 2:09 PM, Will Deacon wrote: > > [...] > > >>>> @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, > >>>> > >>>> static struct attribute *arm_spe_pmu_cap_attr[] = { > >>>> SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), > >>>> + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), > >>>> SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), > >>>> SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), > >>>> SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL), > >>> > >>> What will userspace do with this? I don't think you can turn LDS on/off, > >>> so either you'll get the data source packet or you won't. > >> > >> Yes, LDS bit does not work as a switch. > >> > >> The tool in the userspace will record the LDS bit into the metadata. During > >> decoding phase, it reads out the LDS from metadata. Based on it, the perf > >> tool can know if the data source is supported or not, if yes then decode the > >> data source packet. > > > > Why not just decode a data source packet when you see it? i.e. assume LDS > > is always set. > > The current tool works this way to directly decode a data source packet. > > However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded > data source is implementation dependent, the data source payload format also > is implementation defined. > > We are halfway here in using the LDS bit to determine if the data source is > implemented. However, we lack information on the data source format > implementation. As a first step, we can use the LDS bit for sanity checking in > the tool to detect any potential silicon implementation issues. Once we have > an architectural definition for the data source format, we can extend the tool > accordingly. I don't think we shyould expose UAPI from the driver to detect potential hardware bugs. Let's add it when we know it's useful for something instead. > > >> Another point is how to decide the data source packet format. Now we maintain > >> a CPU list for tracking CPU variants which support data source trace. For long > >> term, I would like the tool can based on hardware feature (e.g. a ID register > >> in Arm SPE) to decide the data source format, so far it is absent. This is why > >> LDS bit + CPU list is a more reliable way. See some discussion [1]. > > > > Huh. Why would you have a CPU in the list if it _doesn't_ have LDS? > > Yeah, this is what we don't expect - we can verify the implementation based on > LDS bit. > > E.g. if users ask data source related questions, we can use LDS bit (saved in > the perf metadata) to confirm the feature has been implemented in a silicon. What exactly do you mean by this? As far as I can tell: - Data source packets are either present or absent depending on LDS - You need CPU-specific information to decode them it they are present So it's neither necessary nor sufficient to expose the LDS bit to userspace. Will
On 9/4/2024 1:35 PM, Will Deacon wrote: >>> Why not just decode a data source packet when you see it? i.e. assume LDS >>> is always set. >> >> The current tool works this way to directly decode a data source packet. >> >> However, as Arm ARM section D17.2.4 "Data Source packet" describes, the loaded >> data source is implementation dependent, the data source payload format also >> is implementation defined. >> >> We are halfway here in using the LDS bit to determine if the data source is >> implemented. However, we lack information on the data source format >> implementation. As a first step, we can use the LDS bit for sanity checking in >> the tool to detect any potential silicon implementation issues. Once we have >> an architectural definition for the data source format, we can extend the tool >> accordingly. > > I don't think we shyould expose UAPI from the driver to detect potential > hardware bugs. Let's add it when we know it's useful for something instead. I understand your concern. >>>> Another point is how to decide the data source packet format. Now we maintain >>>> a CPU list for tracking CPU variants which support data source trace. For long >>>> term, I would like the tool can based on hardware feature (e.g. a ID register >>>> in Arm SPE) to decide the data source format, so far it is absent. This is why >>>> LDS bit + CPU list is a more reliable way. See some discussion [1]. >>> >>> Huh. Why would you have a CPU in the list if it _doesn't_ have LDS? >> >> Yeah, this is what we don't expect - we can verify the implementation based on >> LDS bit. >> >> E.g. if users ask data source related questions, we can use LDS bit (saved in >> the perf metadata) to confirm the feature has been implemented in a silicon. > > What exactly do you mean by this? Sometimes, users might ask if the data source is supported in Arm SPE. With exposing the LDS bit, it would be helpful for us to check if the feature is supported. For example, when a user uses the perf tool to record Arm SPE data, the LDS bit is stored in the perf metadata, and we can check its value during post-analysis. > As far as I can tell: > > - Data source packets are either present or absent depending on LDS > - You need CPU-specific information to decode them it they are present > > So it's neither necessary nor sufficient to expose the LDS bit to > userspace. We can live without exposing LDS bit currently. I will drop this change in next spin. Just head up, later I think we might still need to expose capacity bits (or the PMSIDR_EL1 register) for new features. As you said, we can do it when it is necessary. Thanks for suggestion! Leo
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 9100d82bfabc..81c1e7627721 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -110,6 +110,7 @@ enum arm_spe_pmu_buf_fault_action { /* This sysfs gunk was really good fun to write. */ enum arm_spe_pmu_capabilities { SPE_PMU_CAP_ARCH_INST = 0, + SPE_PMU_CAP_LDS, SPE_PMU_CAP_ERND, SPE_PMU_CAP_FEAT_MAX, SPE_PMU_CAP_CNT_SZ = SPE_PMU_CAP_FEAT_MAX, @@ -118,6 +119,7 @@ enum arm_spe_pmu_capabilities { static int arm_spe_pmu_feat_caps[SPE_PMU_CAP_FEAT_MAX] = { [SPE_PMU_CAP_ARCH_INST] = SPE_PMU_FEAT_ARCH_INST, + [SPE_PMU_CAP_LDS] = SPE_PMU_FEAT_LDS, [SPE_PMU_CAP_ERND] = SPE_PMU_FEAT_ERND, }; @@ -160,6 +162,7 @@ static ssize_t arm_spe_pmu_cap_show(struct device *dev, static struct attribute *arm_spe_pmu_cap_attr[] = { SPE_CAP_EXT_ATTR_ENTRY(arch_inst, SPE_PMU_CAP_ARCH_INST), + SPE_CAP_EXT_ATTR_ENTRY(lds, SPE_PMU_CAP_LDS), SPE_CAP_EXT_ATTR_ENTRY(ernd, SPE_PMU_CAP_ERND), SPE_CAP_EXT_ATTR_ENTRY(count_size, SPE_PMU_CAP_CNT_SZ), SPE_CAP_EXT_ATTR_ENTRY(min_interval, SPE_PMU_CAP_MIN_IVAL),
This commit adds a new entry 'lds' in the capacity folder. 'lds' stands for "loaded data source". When its value is 1, it indicates the data source implemented, and data source packets will be recorded in the trace data. Signed-off-by: Leo Yan <leo.yan@arm.com> --- drivers/perf/arm_spe_pmu.c | 3 +++ 1 file changed, 3 insertions(+)