Message ID | 20210419195812.147710-1-terry.bowman@amd.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Len Brown |
Headers | show |
Series | [v2] tools/power turbostat: Fix RAPL summary collection on AMD processors | expand |
On Mon, 2021-04-19 at 14:58 -0500, Terry Bowman wrote:
@@ -3281,7 +3326,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg
continue;
ret = get_msr(cpu, offset, &msr_cur);
if (ret) {
- fprintf(outf, "Can not update msr(0x%x)\n", offset);
+ fprintf(outf, "Can not update msr(0x%lx)\n", offset);
This gives a warning when compiled on 32-bit, since turbostat is
compiled with -D_FILE_OFFSET_BITS=64:
turbostat.c: In function 'update_msr_sum':
turbostat.c:3329:42: warning: format '%lx' expects argument of type
'long unsigned int', but argument 3 has type 'off_t' {aka 'long long
int'} [-Wformat=]
3329 | fprintf(outf, "Can not update msr(0x%lx)\n", offset);
| ~~^ ~~~~~~
| | |
| | off_t {aka long long int}
| long unsigned int
| %llx
Easiest fix is probably to cast to (long long int) and use the %llx
format specifier. That should be valid with i686, x32, and amd64
userspace.
Everything else looks fine as far as I can tell, so feel free to add a
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
once you've fixed that warning.
On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote: > Turbostat fails to correctly collect and display RAPL summary information > on Family 17h and 19h AMD processors. Running turbostat on these processors > returns immediately. If turbostat is working correctly then RAPL summary > data is displayed until the user provided command completes. If a command > is not provided by the user then turbostat is designed to continuously > display RAPL information until interrupted. > > The issue is due to offset_to_idx() and idx_to_offset() missing support for > AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing > cases for AMD MSRs and idx_to_offset() does not include a path to return > AMD MSR(s) for any idx. > > The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). > These functions are split-out and renamed along architecture vendor lines > for supporting both AMD and Intel MSRs. > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > Signed-off-by: Terry Bowman <terry.bowman@amd.com> Thanks for fixing, Terry, and previously there was a patch for this from Bas Nieuwenhuizen: https://lkml.org/lkml/2021/3/12/682 and it is expected to have been merged in Len's branch already. thanks, Chenyu
On Tue, Apr 20, 2021 at 10:03:36AM +0800, Chen Yu wrote: > On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote: > > Turbostat fails to correctly collect and display RAPL summary information > > on Family 17h and 19h AMD processors. Running turbostat on these processors > > returns immediately. If turbostat is working correctly then RAPL summary > > data is displayed until the user provided command completes. If a command > > is not provided by the user then turbostat is designed to continuously > > display RAPL information until interrupted. > > > > The issue is due to offset_to_idx() and idx_to_offset() missing support for > > AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing > > cases for AMD MSRs and idx_to_offset() does not include a path to return > > AMD MSR(s) for any idx. > > > > The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). > > These functions are split-out and renamed along architecture vendor lines > > for supporting both AMD and Intel MSRs. > > > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Thanks for fixing, Terry, and previously there was a patch for this from Bas Nieuwenhuizen: > https://lkml.org/lkml/2021/3/12/682 > and it is expected to have been merged in Len's branch already. Expected? So is it or is it not? And can you folks agree on a patch already and give it to Artem for testing (CCed) because he's triggering it too: https://bugzilla.kernel.org/show_bug.cgi?id=212357 Thx.
On Tue, Apr 20, 2021 at 10:07:01AM +0200, Borislav Petkov wrote: > On Tue, Apr 20, 2021 at 10:03:36AM +0800, Chen Yu wrote: > > On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote: > > > Turbostat fails to correctly collect and display RAPL summary information > > > on Family 17h and 19h AMD processors. Running turbostat on these processors > > > returns immediately. If turbostat is working correctly then RAPL summary > > > data is displayed until the user provided command completes. If a command > > > is not provided by the user then turbostat is designed to continuously > > > display RAPL information until interrupted. > > > > > > The issue is due to offset_to_idx() and idx_to_offset() missing support for > > > AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing > > > cases for AMD MSRs and idx_to_offset() does not include a path to return > > > AMD MSR(s) for any idx. > > > > > > The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). > > > These functions are split-out and renamed along architecture vendor lines > > > for supporting both AMD and Intel MSRs. > > > > > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > > Thanks for fixing, Terry, and previously there was a patch for this from Bas Nieuwenhuizen: > > https://lkml.org/lkml/2021/3/12/682 > > and it is expected to have been merged in Len's branch already. > > Expected? > > So is it or is it not? > This patch was sent to Len and it is not in public repo yet. He is preparing for a new release of turbostat as merge window is approaching. > And can you folks agree on a patch already and give it to Artem for > testing (CCed) because he's triggering it too: > > https://bugzilla.kernel.org/show_bug.cgi?id=212357 > Okay. I would vote for the the patch from Bas as it was a combined work from two authors and tested by several AMD users. But let me paste it here too for Artem to see if this also works for him: From 00e0622b1b693a5c7dc343aeb3aa51614a9e125e Mon Sep 17 00:00:00 2001 From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Date: Fri, 12 Mar 2021 21:27:40 +0800 Subject: [PATCH] tools/power/turbostat: Fix turbostat for AMD Zen CPUs It was reported that on Zen+ system turbostat started exiting, which was tracked down to the MSR_PKG_ENERGY_STAT read failing because offset_to_idx wasn't returning a non-negative index. This patch combined the modification from Bingsong Si and Bas Nieuwenhuizen and addd the MSR to the index system as alternative for MSR_PKG_ENERGY_STATUS. Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") Reported-by: youling257 <youling257@gmail.com> Tested-by: youling257 <youling257@gmail.com> Tested-by: sibingsong <owen.si@ucloud.cn> Tested-by: Kurt Garloff <kurt@garloff.de> Co-developed-by: Bingsong Si <owen.si@ucloud.cn> Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- tools/power/x86/turbostat/turbostat.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index a7c4f0772e53..a7c965734fdf 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -297,7 +297,10 @@ int idx_to_offset(int idx) switch (idx) { case IDX_PKG_ENERGY: - offset = MSR_PKG_ENERGY_STATUS; + if (do_rapl & RAPL_AMD_F17H) + offset = MSR_PKG_ENERGY_STAT; + else + offset = MSR_PKG_ENERGY_STATUS; break; case IDX_DRAM_ENERGY: offset = MSR_DRAM_ENERGY_STATUS; @@ -326,6 +329,7 @@ int offset_to_idx(int offset) switch (offset) { case MSR_PKG_ENERGY_STATUS: + case MSR_PKG_ENERGY_STAT: idx = IDX_PKG_ENERGY; break; case MSR_DRAM_ENERGY_STATUS: @@ -353,7 +357,7 @@ int idx_valid(int idx) { switch (idx) { case IDX_PKG_ENERGY: - return do_rapl & RAPL_PKG; + return do_rapl & (RAPL_PKG | RAPL_AMD_F17H); case IDX_DRAM_ENERGY: return do_rapl & RAPL_DRAM; case IDX_PP0_ENERGY:
On Tue, 2021-04-20 at 21:15 +0800, Chen Yu wrote: > > Okay. I would vote for the the patch from Bas as it was a combined > work from two > authors and tested by several AMD users. But let me paste it here too > for Artem to > see if this also works for him: > > > From 00e0622b1b693a5c7dc343aeb3aa51614a9e125e Mon Sep 17 00:00:00 > 2001 > From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > Date: Fri, 12 Mar 2021 21:27:40 +0800 > Subject: [PATCH] tools/power/turbostat: Fix turbostat for AMD Zen > CPUs > > > @@ -297,7 +297,10 @@ int idx_to_offset(int idx) > > switch (idx) { > case IDX_PKG_ENERGY: > - offset = MSR_PKG_ENERGY_STATUS; > + if (do_rapl & RAPL_AMD_F17H) > + offset = MSR_PKG_ENERGY_STAT; > + else > + offset = MSR_PKG_ENERGY_STATUS; > break; > case IDX_DRAM_ENERGY: > offset = MSR_DRAM_ENERGY_STATUS; This patch has the same issue I noticed with the initial revision of Terry's patch - the idx_to_offset function returns type int (32-bit signed), but MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be interpreted as a negative number) The end result is, as far as I can tell, that it hits the if (offset < 0) check in update_msr_sum() resulting in the timer callback for updating the stat in the background when long durations are used to not happen. For short durations it still works fine since the background update isn't used.
On Mon, 2021-04-19 at 14:58 -0500, Terry Bowman wrote: > Turbostat fails to correctly collect and display RAPL summary > information > on Family 17h and 19h AMD processors. Running turbostat on these > processors > returns immediately. If turbostat is working correctly then RAPL > summary > data is displayed until the user provided command completes. If a > command > is not provided by the user then turbostat is designed to > continuously > display RAPL information until interrupted. > > The issue is due to offset_to_idx() and idx_to_offset() missing > support for > AMD MSR addresses/offsets. offset_to_idx()'s switch statement is > missing > cases for AMD MSRs and idx_to_offset() does not include a path to > return > AMD MSR(s) for any idx. Ah, when I was looking at Boris's patch, it lead me to one more spot that was missed with the off_t type - inside the function update_msr_sum(), the offset is also stored in a variable of type int, and as a result the background update for AMD's MSR when turbostat is used with long durations won't work. (It's skipped because of the offset < 0 check).
On 4/20/21 1:15 PM, Chen Yu wrote: > On Tue, Apr 20, 2021 at 10:07:01AM +0200, Borislav Petkov wrote: >> On Tue, Apr 20, 2021 at 10:03:36AM +0800, Chen Yu wrote: >>> On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote: >>>> Turbostat fails to correctly collect and display RAPL summary information >>>> on Family 17h and 19h AMD processors. Running turbostat on these processors >>>> returns immediately. If turbostat is working correctly then RAPL summary >>>> data is displayed until the user provided command completes. If a command >>>> is not provided by the user then turbostat is designed to continuously >>>> display RAPL information until interrupted. >>>> >>>> The issue is due to offset_to_idx() and idx_to_offset() missing support for >>>> AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing >>>> cases for AMD MSRs and idx_to_offset() does not include a path to return >>>> AMD MSR(s) for any idx. >>>> >>>> The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). >>>> These functions are split-out and renamed along architecture vendor lines >>>> for supporting both AMD and Intel MSRs. >>>> >>>> Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >>> Thanks for fixing, Terry, and previously there was a patch for this from Bas Nieuwenhuizen: >>> https://lkml.org/lkml/2021/3/12/682 >>> and it is expected to have been merged in Len's branch already. >> >> Expected? >> >> So is it or is it not? >> > This patch was sent to Len and it is not in public repo yet. He is preparing for a new > release of turbostat as merge window is approaching. >> And can you folks agree on a patch already and give it to Artem for >> testing (CCed) because he's triggering it too: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=212357 >> > Okay. I would vote for the the patch from Bas as it was a combined work from two > authors and tested by several AMD users. But let me paste it here too for Artem to > see if this also works for him: > > > From 00e0622b1b693a5c7dc343aeb3aa51614a9e125e Mon Sep 17 00:00:00 2001 > From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > Date: Fri, 12 Mar 2021 21:27:40 +0800 > Subject: [PATCH] tools/power/turbostat: Fix turbostat for AMD Zen CPUs > > It was reported that on Zen+ system turbostat started exiting, > which was tracked down to the MSR_PKG_ENERGY_STAT read failing because > offset_to_idx wasn't returning a non-negative index. > > This patch combined the modification from Bingsong Si and > Bas Nieuwenhuizen and addd the MSR to the index system as alternative for > MSR_PKG_ENERGY_STATUS. > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > Reported-by: youling257 <youling257@gmail.com> > Tested-by: youling257 <youling257@gmail.com> > Tested-by: sibingsong <owen.si@ucloud.cn> > Tested-by: Kurt Garloff <kurt@garloff.de> > Co-developed-by: Bingsong Si <owen.si@ucloud.cn> > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > --- > tools/power/x86/turbostat/turbostat.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c > index a7c4f0772e53..a7c965734fdf 100644 > --- a/tools/power/x86/turbostat/turbostat.c > +++ b/tools/power/x86/turbostat/turbostat.c > @@ -297,7 +297,10 @@ int idx_to_offset(int idx) > > switch (idx) { > case IDX_PKG_ENERGY: > - offset = MSR_PKG_ENERGY_STATUS; > + if (do_rapl & RAPL_AMD_F17H) > + offset = MSR_PKG_ENERGY_STAT; > + else > + offset = MSR_PKG_ENERGY_STATUS; > break; > case IDX_DRAM_ENERGY: > offset = MSR_DRAM_ENERGY_STATUS; > @@ -326,6 +329,7 @@ int offset_to_idx(int offset) > > switch (offset) { > case MSR_PKG_ENERGY_STATUS: > + case MSR_PKG_ENERGY_STAT: > idx = IDX_PKG_ENERGY; > break; > case MSR_DRAM_ENERGY_STATUS: > @@ -353,7 +357,7 @@ int idx_valid(int idx) > { > switch (idx) { > case IDX_PKG_ENERGY: > - return do_rapl & RAPL_PKG; > + return do_rapl & (RAPL_PKG | RAPL_AMD_F17H); > case IDX_DRAM_ENERGY: > return do_rapl & RAPL_DRAM; > case IDX_PP0_ENERGY: > The patch works for me. Tested-by: Artem S. Tashkinov <aros@gmx.com>
On Tue, Apr 20, 2021 at 09:28:06AM -0400, Calvin Walton wrote: > On Tue, 2021-04-20 at 21:15 +0800, Chen Yu wrote: > > > > Okay. I would vote for the the patch from Bas as it was a combined > > work from two > > authors and tested by several AMD users. But let me paste it here too > > for Artem to > > see if this also works for him: > > > > > > From 00e0622b1b693a5c7dc343aeb3aa51614a9e125e Mon Sep 17 00:00:00 > > 2001 > > From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > > Date: Fri, 12 Mar 2021 21:27:40 +0800 > > Subject: [PATCH] tools/power/turbostat: Fix turbostat for AMD Zen > > CPUs > > > > > > @@ -297,7 +297,10 @@ int idx_to_offset(int idx) > > > > switch (idx) { > > case IDX_PKG_ENERGY: > > - offset = MSR_PKG_ENERGY_STATUS; > > + if (do_rapl & RAPL_AMD_F17H) > > + offset = MSR_PKG_ENERGY_STAT; > > + else > > + offset = MSR_PKG_ENERGY_STATUS; > > break; > > case IDX_DRAM_ENERGY: > > offset = MSR_DRAM_ENERGY_STATUS; > > This patch has the same issue I noticed with the initial revision of > Terry's patch - the idx_to_offset function returns type int (32-bit > signed), but MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, > would be interpreted as a negative number) > > The end result is, as far as I can tell, that it hits the if (offset < > 0) check in update_msr_sum() resulting in the timer callback for > updating the stat in the background when long durations are used to not > happen. > > For short durations it still works fine since the background update > isn't used. > Ah, got it, nice catch. How about an incremental patch based on Bas' one to fix this 'overflow' issue? Would converting offset_to_idx(), idx_to_offset() and update_msr_sum() to use off_t instead of int be enough? Do you or Terry have interest to cook that patch? For Terry's version, I'm not sure if spliting the code into different CPU vendor would benefit in the future, except that we would have plenty of new MSRs to be introduced in the future. thanks, Chenyu > > -- > Calvin Walton <calvin.walton@kepstin.ca> >
On Tue, 2021-04-20 at 22:37 +0800, Chen Yu wrote: > On Tue, Apr 20, 2021 at 09:28:06AM -0400, Calvin Walton wrote: > > This patch has the same issue I noticed with the initial revision > > of > > Terry's patch - the idx_to_offset function returns type int (32-bit > > signed), but MSR_PKG_ENERGY_STAT is greater than INT_MAX (or > > rather, > > would be interpreted as a negative number) > > > > The end result is, as far as I can tell, that it hits the if > > (offset < > > 0) check in update_msr_sum() resulting in the timer callback for > > updating the stat in the background when long durations are used to > > not > > happen. > > > > For short durations it still works fine since the background update > > isn't used. > > > Ah, got it, nice catch. How about an incremental patch based on Bas' > one > to fix this 'overflow' issue? Would converting offset_to_idx(), > idx_to_offset() and > update_msr_sum() to use off_t instead of int be enough? Do you or > Terry have interest > to cook that patch? For Terry's version, I'm not sure if spliting > the code into different CPU vendor would benefit in the future, > except > that we would have plenty of new MSRs to be introduced in the future. Yes, I believe updating the offset_to_idx(), idx_to_offset(), and update_msr_sum() functions is sufficient. I can do the incremental patch for that this evening if nobody beats me to it :)
On Tue, Apr 20, 2021 at 09:15:41PM +0800, Chen Yu wrote:
> This patch was sent to Len and it is not in public repo yet.
If that thing were in a public repo, you'd have the advantage of
pointing people to it and they could be testing patches.
On Tue, Apr 20, 2021 at 10:42:09AM -0400, Calvin Walton wrote: > On Tue, 2021-04-20 at 22:37 +0800, Chen Yu wrote: > > On Tue, Apr 20, 2021 at 09:28:06AM -0400, Calvin Walton wrote: > > > This patch has the same issue I noticed with the initial revision > > > of > > > Terry's patch - the idx_to_offset function returns type int (32-bit > > > signed), but MSR_PKG_ENERGY_STAT is greater than INT_MAX (or > > > rather, > > > would be interpreted as a negative number) > > > > > > The end result is, as far as I can tell, that it hits the if > > > (offset < > > > 0) check in update_msr_sum() resulting in the timer callback for > > > updating the stat in the background when long durations are used to > > > not > > > happen. > > > > > > For short durations it still works fine since the background update > > > isn't used. > > > > > Ah, got it, nice catch. How about an incremental patch based on Bas' > > one > > to fix this 'overflow' issue? Would converting offset_to_idx(), > > idx_to_offset() and > > update_msr_sum() to use off_t instead of int be enough? Do you or > > Terry have interest > > to cook that patch? For Terry's version, I'm not sure if spliting > > the code into different CPU vendor would benefit in the future, > > except > > that we would have plenty of new MSRs to be introduced in the future. > > Yes, I believe updating the offset_to_idx(), idx_to_offset(), and > update_msr_sum() functions is sufficient. I can do the incremental > patch for that this evening if nobody beats me to it :) > Calvin, could you please take a look at the following version if it is suitible? From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 2001 From: Calvin Walton <calvin.walton@kepstin.ca> Date: Fri, 23 Apr 2021 17:32:13 +0800 Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue in index converting The idx_to_offset() function returns type int (32-bit signed), but MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be interpreted as a negative number). The end result is that it hits the if (offset < 0) check in update_msr_sum() resulting in the timer callback for updating the stat in the background when long durations are used to not happen. The similar issue exists in offset_to_idx() and update_msr_sum(). This patch fixes this issue by converting the 'int' type to 'off_t' accordingly. Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- tools/power/x86/turbostat/turbostat.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index a211264b57fd..77557122b292 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -296,9 +296,9 @@ struct msr_sum_array { /* The percpu MSR sum array.*/ struct msr_sum_array *per_cpu_msr_sum; -int idx_to_offset(int idx) +off_t idx_to_offset(int idx) { - int offset; + off_t offset; switch (idx) { case IDX_PKG_ENERGY: @@ -328,7 +328,7 @@ int idx_to_offset(int idx) return offset; } -int offset_to_idx(int offset) +int offset_to_idx(off_t offset) { int idx; @@ -3338,7 +3338,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) { unsigned long long msr_cur, msr_last; - int offset; + off_t offset; if (!idx_valid(i)) continue; @@ -3347,7 +3347,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg continue; ret = get_msr(cpu, offset, &msr_cur); if (ret) { - fprintf(outf, "Can not update msr(0x%x)\n", offset); + fprintf(outf, "Can not update msr(0x%llx)\n", (long long int)offset); continue; } thanks, Chenyu
On Tue, Apr 20, 2021 at 05:25:21PM +0200, Borislav Petkov wrote: > On Tue, Apr 20, 2021 at 09:15:41PM +0800, Chen Yu wrote: > > This patch was sent to Len and it is not in public repo yet. > > If that thing were in a public repo, you'd have the advantage of > pointing people to it and they could be testing patches. > Okay, I think we'll put it in public repo once verified. thanks, Chenyu > -- > Regards/Gruss, > Boris. > > SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
On Fri, Apr 23, 2021 at 08:16:07PM +0800, Chen Yu wrote: > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 2001 > From: Calvin Walton <calvin.walton@kepstin.ca> > Date: Fri, 23 Apr 2021 17:32:13 +0800 > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue in index > converting > > The idx_to_offset() function returns type int (32-bit signed), but > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > interpreted as a negative number). The end result is that it hits > the if (offset < 0) check in update_msr_sum() resulting in the timer > callback for updating the stat in the background when long durations > are used to not happen. The similar issue exists in offset_to_idx() > and update_msr_sum(). > > This patch fixes this issue by converting the 'int' type to 'off_t' > accordingly. > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > Signed-off-by: Chen Yu <yu.c.chen@intel.com> This patch's authorship is weird: it says From: Calvin but doesn't have his SOB here - only yours. > --- > tools/power/x86/turbostat/turbostat.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c > index a211264b57fd..77557122b292 100644 > --- a/tools/power/x86/turbostat/turbostat.c > +++ b/tools/power/x86/turbostat/turbostat.c > @@ -296,9 +296,9 @@ struct msr_sum_array { > /* The percpu MSR sum array.*/ > struct msr_sum_array *per_cpu_msr_sum; > > -int idx_to_offset(int idx) > +off_t idx_to_offset(int idx) And this is silly. MSRs are unsigned int. Fullstop. So that function should either return u32 or unsigned int or so. Thx.
Hi Boris, On Fri, Apr 23, 2021 at 02:19:34PM +0200, Borislav Petkov wrote: > On Fri, Apr 23, 2021 at 08:16:07PM +0800, Chen Yu wrote: > > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 2001 > > From: Calvin Walton <calvin.walton@kepstin.ca> > > Date: Fri, 23 Apr 2021 17:32:13 +0800 > > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue in index > > converting > > > > The idx_to_offset() function returns type int (32-bit signed), but > > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > > interpreted as a negative number). The end result is that it hits > > the if (offset < 0) check in update_msr_sum() resulting in the timer > > callback for updating the stat in the background when long durations > > are used to not happen. The similar issue exists in offset_to_idx() > > and update_msr_sum(). > > > > This patch fixes this issue by converting the 'int' type to 'off_t' > > accordingly. > > > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > > This patch's authorship is weird: it says From: Calvin but doesn't have > his SOB here - only yours. > I see, I'll add Calvin's SOB here. Previously I thought the 'From' field might be enough to indicate the Author, but it seems to not be the case. > > --- > > tools/power/x86/turbostat/turbostat.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c > > index a211264b57fd..77557122b292 100644 > > --- a/tools/power/x86/turbostat/turbostat.c > > +++ b/tools/power/x86/turbostat/turbostat.c > > @@ -296,9 +296,9 @@ struct msr_sum_array { > > /* The percpu MSR sum array.*/ > > struct msr_sum_array *per_cpu_msr_sum; > > > > -int idx_to_offset(int idx) > > +off_t idx_to_offset(int idx) > > And this is silly. MSRs are unsigned int. Fullstop. > > So that function should either return u32 or unsigned int or so. > Got it. The off_t was derived from old code in this file from get_msr() and alike, let me convert this return value to unsigned int. thanks, Chenyu > Thx. > > -- > Regards/Gruss, > Boris. > > SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
On Fri, 2021-04-23 at 14:19 +0200, Borislav Petkov wrote: > On Fri, Apr 23, 2021 at 08:16:07PM +0800, Chen Yu wrote: > > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 > > 2001 > > From: Calvin Walton <calvin.walton@kepstin.ca> > > Date: Fri, 23 Apr 2021 17:32:13 +0800 > > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow > > issue in index > > converting > > > > The idx_to_offset() function returns type int (32-bit signed), but > > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > > interpreted as a negative number). The end result is that it hits > > the if (offset < 0) check in update_msr_sum() resulting in the > > timer > > callback for updating the stat in the background when long > > durations > > are used to not happen. The similar issue exists in offset_to_idx() > > and update_msr_sum(). > > > > This patch fixes this issue by converting the 'int' type to 'off_t' > > accordingly. > > > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL > > display") > > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > > This patch's authorship is weird: it says From: Calvin but doesn't > have > his SOB here - only yours. I think this patch is adapted from one of my earlier submissions? I don't think I can really say that I wrote it, but I'll certainly review it. > > > --- > > tools/power/x86/turbostat/turbostat.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/tools/power/x86/turbostat/turbostat.c > > b/tools/power/x86/turbostat/turbostat.c > > index a211264b57fd..77557122b292 100644 > > --- a/tools/power/x86/turbostat/turbostat.c > > +++ b/tools/power/x86/turbostat/turbostat.c > > @@ -296,9 +296,9 @@ struct msr_sum_array { > > /* The percpu MSR sum array.*/ > > struct msr_sum_array *per_cpu_msr_sum; > > > > -int idx_to_offset(int idx) > > +off_t idx_to_offset(int idx) > > And this is silly. MSRs are unsigned int. Fullstop. > > So that function should either return u32 or unsigned int or so. So, there's two problems with that: 1. This function needs to be able to return an error value that cannot be confused with a valid MSR. This is currently done by returning a negative number. If an unsigned value is used, a different way of indicating errors needs to be written. 2. We are not using CPU instructions to access MSRs direction. Instead they are being read from /dev/msr. So the "offset" value is actually a seek into the /dev/msr file (using pread), and thus is of type off_t.
On Fri, 2021-04-23 at 20:16 +0800, Chen Yu wrote > > Calvin, could you please take a look at the following version if it > is suitible? I assume this patch is adapted from my earlier submission of the complete fix to be an incremental fix on top of the already applied patch? If so, you are OK to add Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca> > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 > 2001 > From: Calvin Walton <calvin.walton@kepstin.ca> > Date: Fri, 23 Apr 2021 17:32:13 +0800 > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue > in index > converting > > The idx_to_offset() function returns type int (32-bit signed), but > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > interpreted as a negative number). The end result is that it hits > the if (offset < 0) check in update_msr_sum() resulting in the timer > callback for updating the stat in the background when long durations > are used to not happen. The similar issue exists in offset_to_idx() > and update_msr_sum(). > > This patch fixes this issue by converting the 'int' type to 'off_t' > accordingly. This patch covers all of the places I know about which had type issues - it should be good. Thanks!
On Fri, Apr 23, 2021 at 10:04:10AM -0400, Calvin Walton wrote: > On Fri, 2021-04-23 at 20:16 +0800, Chen Yu wrote > > > > Calvin, could you please take a look at the following version if it > > is suitible? > > I assume this patch is adapted from my earlier submission of the > complete fix to be an incremental fix on top of the already applied > patch? If so, you are OK to add > > Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca> > Thanks Calvin. BTW, I did not receive your incremental patch on top of already applied one, so I post this patch in the name of yours. > > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 > > 2001 > > From: Calvin Walton <calvin.walton@kepstin.ca> > > Date: Fri, 23 Apr 2021 17:32:13 +0800 > > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue > > in index > > converting > > > > The idx_to_offset() function returns type int (32-bit signed), but > > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > > interpreted as a negative number). The end result is that it hits > > the if (offset < 0) check in update_msr_sum() resulting in the timer > > callback for updating the stat in the background when long durations > > are used to not happen. The similar issue exists in offset_to_idx() > > and update_msr_sum(). > > > > This patch fixes this issue by converting the 'int' type to 'off_t' > > accordingly. > > This patch covers all of the places I know about which had type issues > - it should be good. Thanks! > Okay, as per Boris' suggestion, this patch has changed the off_t to unsigned int, and the print format is kept the same as '%x' without casting, how about this: From d5923502d0010d5e4d722c2a01ee8bda4a13bf75 Mon Sep 17 00:00:00 2001 From: Calvin Walton <calvin.walton@kepstin.ca> Date: Fri, 23 Apr 2021 17:32:13 +0800 Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow issue in index converting The idx_to_offset() function returns type int (32-bit signed), but MSR_PKG_ENERGY_STAT is u32 and would be interpreted as a negative number. The end result is that it hits the if (offset < 0) check in update_msr_sum() which prevents the timer callback from updating the stat in the background when long durations are used. The similar issue exists in offset_to_idx() and update_msr_sum(). This patch fixes this issue by converting the 'int' to 'unsigned int' accordingly. Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- tools/power/x86/turbostat/turbostat.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index a211264b57fd..dcde41784059 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -296,9 +296,9 @@ struct msr_sum_array { /* The percpu MSR sum array.*/ struct msr_sum_array *per_cpu_msr_sum; -int idx_to_offset(int idx) +unsigned int idx_to_offset(int idx) { - int offset; + unsigned int offset; switch (idx) { case IDX_PKG_ENERGY: @@ -328,7 +328,7 @@ int idx_to_offset(int idx) return offset; } -int offset_to_idx(int offset) +int offset_to_idx(unsigned int offset) { int idx; @@ -3338,7 +3338,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) { unsigned long long msr_cur, msr_last; - int offset; + unsigned int offset; if (!idx_valid(i)) continue;
On Fri, Apr 23, 2021 at 10:00:14AM -0400, Calvin Walton wrote: > So, there's two problems with that: > 1. This function needs to be able to return an error value that cannot be > confused with a valid MSR. This is currently done by returning a > negative number. If an unsigned value is used, a different way of > indicating errors needs to be written. > 2. We are not using CPU instructions to access MSRs direction. Instead > they are being read from /dev/msr. So the "offset" value is actually a > seek into the /dev/msr file (using pread), and thus is of type off_t. Ah right, that's the /dev/msr thing. Then off_t is correct, forget what I said. Thx.
On Fri, Apr 23, 2021 at 09:34:30PM +0800, Chen Yu wrote: > I see, I'll add Calvin's SOB here. Previously I thought the 'From' field might > be enough to indicate the Author, but it seems to not be the case. The From: field is used by git to take the author but that's not the problem. You need the author her/himself to sign off on the work - no one else can do that. You can refresh on that here: Documentation/process/submitting-patches.rst > Got it. The off_t was derived from old code in this file from get_msr() and alike, > let me convert this return value to unsigned int. See my reply to Calvin - it needs to by off_t after all. Thx.
On Fri, 2021-04-23 at 22:27 +0800, Chen Yu wrote: > > This patch fixes this issue by converting the 'int' to 'unsigned int' > accordingly. NAK on this version of the patch. It needs to be off_t; see my conversation with Boris elsewhere in this thread.
On Fri, Apr 23, 2021 at 04:32:05PM +0200, Borislav Petkov wrote: > On Fri, Apr 23, 2021 at 09:34:30PM +0800, Chen Yu wrote: > > I see, I'll add Calvin's SOB here. Previously I thought the 'From' field might > > be enough to indicate the Author, but it seems to not be the case. > > The From: field is used by git to take the author but that's not the > problem. You need the author her/himself to sign off on the work - no > one else can do that. > Thanks for the explanation. > You can refresh on that here: > > Documentation/process/submitting-patches.rst > > > Got it. The off_t was derived from old code in this file from get_msr() and alike, > > let me convert this return value to unsigned int. > > See my reply to Calvin - it needs to by off_t after all. > Okay. Best, Chenyu > Thx. > > -- > Regards/Gruss, > Boris. > > SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
On Fri, Apr 23, 2021 at 10:00:14AM -0400, Calvin Walton wrote: > On Fri, 2021-04-23 at 14:19 +0200, Borislav Petkov wrote: > > On Fri, Apr 23, 2021 at 08:16:07PM +0800, Chen Yu wrote: > > > From b2e63fe4f02e17289414b4f61237da822df115fb Mon Sep 17 00:00:00 > > > 2001 > > > From: Calvin Walton <calvin.walton@kepstin.ca> > > > Date: Fri, 23 Apr 2021 17:32:13 +0800 > > > Subject: [PATCH 3/5] tools/power turbostat: Fix offset overflow > > > issue in index > > > converting > > > > > > The idx_to_offset() function returns type int (32-bit signed), but > > > MSR_PKG_ENERGY_STAT is greater than INT_MAX (or rather, would be > > > interpreted as a negative number). The end result is that it hits > > > the if (offset < 0) check in update_msr_sum() resulting in the > > > timer > > > callback for updating the stat in the background when long > > > durations > > > are used to not happen. The similar issue exists in offset_to_idx() > > > and update_msr_sum(). > > > > > > This patch fixes this issue by converting the 'int' type to 'off_t' > > > accordingly. > > > > > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL > > > display") > > > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > > > > This patch's authorship is weird: it says From: Calvin but doesn't > > have > > his SOB here - only yours. > > I think this patch is adapted from one of my earlier submissions? I > don't think I can really say that I wrote it, but I'll certainly review > it. > > > > > > --- > > > tools/power/x86/turbostat/turbostat.c | 10 +++++----- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/tools/power/x86/turbostat/turbostat.c > > > b/tools/power/x86/turbostat/turbostat.c > > > index a211264b57fd..77557122b292 100644 > > > --- a/tools/power/x86/turbostat/turbostat.c > > > +++ b/tools/power/x86/turbostat/turbostat.c > > > @@ -296,9 +296,9 @@ struct msr_sum_array { > > > /* The percpu MSR sum array.*/ > > > struct msr_sum_array *per_cpu_msr_sum; > > > > > > -int idx_to_offset(int idx) > > > +off_t idx_to_offset(int idx) > > > > And this is silly. MSRs are unsigned int. Fullstop. > > > > So that function should either return u32 or unsigned int or so. > > So, there's two problems with that: > 1. This function needs to be able to return an error value that cannot be > confused with a valid MSR. This is currently done by returning a > negative number. If an unsigned value is used, a different way of > indicating errors needs to be written. > 2. We are not using CPU instructions to access MSRs direction. Instead > they are being read from /dev/msr. So the "offset" value is actually a > seek into the /dev/msr file (using pread), and thus is of type off_t. > I see, I misunderstood it with kernel's rdmsr() and ... I'll use the original version. thanks, Chenyu > -- > Calvin Walton <calvin.walton@kepstin.ca> >
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index a7c4f0772e53..5aacdbd28fa8 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -291,9 +291,9 @@ struct msr_sum_array { /* The percpu MSR sum array.*/ struct msr_sum_array *per_cpu_msr_sum; -int idx_to_offset(int idx) +off_t idx_to_offset_intel(int idx) { - int offset; + off_t offset; switch (idx) { case IDX_PKG_ENERGY: @@ -320,7 +320,7 @@ int idx_to_offset(int idx) return offset; } -int offset_to_idx(int offset) +int offset_to_idx_intel(off_t offset) { int idx; @@ -349,7 +349,7 @@ int offset_to_idx(int offset) return idx; } -int idx_valid(int idx) +int idx_valid_intel(int idx) { switch (idx) { case IDX_PKG_ENERGY: @@ -368,6 +368,51 @@ int idx_valid(int idx) return 0; } } + +off_t (*idx_to_offset)(int idx) = idx_to_offset_intel; +int (*offset_to_idx)(off_t offset) = offset_to_idx_intel; +int (*idx_valid)(int idx) = idx_valid_intel; + +off_t idx_to_offset_amd(int idx) +{ + off_t offset; + + switch (idx) { + case IDX_PKG_ENERGY: + offset = MSR_PKG_ENERGY_STAT; + break; + default: + offset = -1; + } + + return offset; +} + +int offset_to_idx_amd(off_t offset) +{ + int idx; + + switch (offset) { + case MSR_PKG_ENERGY_STAT: + idx = IDX_PKG_ENERGY; + break; + default: + idx = -1; + } + + return idx; +} + +int idx_valid_amd(int idx) +{ + switch (idx) { + case IDX_PKG_ENERGY: + return do_rapl & RAPL_AMD_F17H; + default: + return 0; + } +} + struct sys_counters { unsigned int added_thread_counters; unsigned int added_core_counters; @@ -3272,7 +3317,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg for (i = IDX_PKG_ENERGY; i < IDX_COUNT; i++) { unsigned long long msr_cur, msr_last; - int offset; + off_t offset; if (!idx_valid(i)) continue; @@ -3281,7 +3326,7 @@ static int update_msr_sum(struct thread_data *t, struct core_data *c, struct pkg continue; ret = get_msr(cpu, offset, &msr_cur); if (ret) { - fprintf(outf, "Can not update msr(0x%x)\n", offset); + fprintf(outf, "Can not update msr(0x%lx)\n", offset); continue; } @@ -5348,6 +5393,12 @@ void process_cpuid() if (!quiet) decode_misc_feature_control(); + if (authentic_amd || hygon_genuine) { + idx_to_offset = idx_to_offset_amd; + offset_to_idx = offset_to_idx_amd; + idx_valid = idx_valid_amd; + } + return; }
Turbostat fails to correctly collect and display RAPL summary information on Family 17h and 19h AMD processors. Running turbostat on these processors returns immediately. If turbostat is working correctly then RAPL summary data is displayed until the user provided command completes. If a command is not provided by the user then turbostat is designed to continuously display RAPL information until interrupted. The issue is due to offset_to_idx() and idx_to_offset() missing support for AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing cases for AMD MSRs and idx_to_offset() does not include a path to return AMD MSR(s) for any idx. The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). These functions are split-out and renamed along architecture vendor lines for supporting both AMD and Intel MSRs. Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") Signed-off-by: Terry Bowman <terry.bowman@amd.com> --- Changes in V2: - Set patch title to v2. The first patch submission was mistakenly titled as v4 when it should have been v1. - Change offset variables from 'int' to 'off_t' type. Change is needed to prevent sign extension in code casting int->off_t. This is currently a problem with AMD MSRs using base of 0xC000_0000 - Update idx_valid_amd() capability masking to use RAPL_AMD_F17H tools/power/x86/turbostat/turbostat.c | 63 ++++++++++++++++++++++++--- 1 file changed, 57 insertions(+), 6 deletions(-)