Fix turbostat exiting with an error when run on AMD CPUs

Message ID: 88d11c19e662f67ae492eb4b93e12e1b24e68c1d.camel@kepstin.ca (mailing list archive)
State: Superseded, archived
Delegated to: Len Brown
Series: Fix turbostat exiting with an error when run on AMD CPUs

Commit Message

Calvin Walton April 15, 2021, 2:05 a.m. UTC
The current version of turbostat exits immediately upon entering the
main loop, with error code -13. This is a regression that was introduced
in these commits:

9972d5d84d76 tools/power turbostat: Enable accumulate RAPL display
87e15da95775 tools/power turbostat: Introduce functions to accumulate RAPL consumption

These commits introduced a method to accumulate MSR values over long sampling
durations.

The commits failed to account for the fact that AMD CPUs use a different
(but confusingly similarly named) MSR for reading the package energy.
I've added the AMD version of the MSR to the methods so that turbostat
can be run again.

(If you run on a system with mixed Intel and AMD CPUs, you might have
problems, but I have been assured that this isn't likely in practice.)

The MSR offsets in the conversion functions have been switched to use
type off_t, since the offsets of the AMD MSRs exceed the range of a
signed 32-bit int.

Note that since the framework introduced only handles per-cpu MSRs but not
per-core MSRs, AMD "Core" energy is not currently accumulated over long
sampling periods.

Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display")
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
---
 tools/power/x86/turbostat/turbostat.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
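
The MSR selection described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the turbostat code itself: the function names, the amd_f17h parameter (standing in for the do_rapl & RAPL_AMD_F17H test), and the IDX_PKG_ENERGY value are made up for the example, and the two register numbers are what I believe msr-index.h defines, so treat them as assumptions.

```c
#define _FILE_OFFSET_BITS 64	/* keep off_t 64-bit on 32-bit glibc builds too */
#include <assert.h>
#include <sys/types.h>

/* Assumed register numbers, believed to match msr-index.h. */
#define MSR_PKG_ENERGY_STATUS	0x00000611	/* Intel RAPL package energy */
#define MSR_PKG_ENERGY_STAT	0xc001029b	/* AMD Fam 17h package energy */

#define IDX_PKG_ENERGY	0	/* illustrative slot number */

/* amd_f17h stands in for the patch's (do_rapl & RAPL_AMD_F17H) test. */
off_t pkg_energy_offset(int amd_f17h)
{
	return amd_f17h ? MSR_PKG_ENERGY_STAT : MSR_PKG_ENERGY_STATUS;
}

/* Both vendors' registers map back to the same accumulator slot. */
int pkg_offset_to_idx(off_t offset)
{
	switch (offset) {
	case MSR_PKG_ENERGY_STAT:
	case MSR_PKG_ENERGY_STATUS:
		return IDX_PKG_ENERGY;
	default:
		return -1;	/* not a package energy MSR */
	}
}

/* Round trip: whichever MSR is selected, it resolves to IDX_PKG_ENERGY. */
void check_pkg_round_trip(void)
{
	assert(pkg_offset_to_idx(pkg_energy_offset(0)) == IDX_PKG_ENERGY);
	assert(pkg_offset_to_idx(pkg_energy_offset(1)) == IDX_PKG_ENERGY);
}
```

The point of the sketch is that the index-to-offset direction needs to know the vendor, while the offset-to-index direction can simply accept either register.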

Comments

Calvin Walton April 15, 2021, 2:08 a.m. UTC | #1
On Wed, 2021-04-14 at 22:05 -0400, Calvin Walton wrote:
> The current version of turbostat exits immediately upon entering the
> main loop, with error code -13. This is a regression that was
> introduced
> in these commits:
> 
> 9972d5d84d76 tools/power turbostat: Enable accumulate RAPL display
> 87e15da95775 tools/power turbostat: Introduce functions to accumulate
> RAPL consumption

Ah, I failed to check the mailing list before sending this patch! Terry
Bowman's fix here should probably be preferred:
https://patchwork.kernel.org/project/linux-pm/patch/20210331155807.3838-1-terry.bowman@amd.com/

My patch was simply the minimum necessary to get turbostat working
again.
Calvin Walton April 15, 2021, 2:25 a.m. UTC | #2
On Thu, 2021-04-15 at 10:26 +0800, Chen Yu wrote:
> Hi Calvin,
> On Wed, Apr 14, 2021 at 10:08:07PM -0400, Calvin Walton wrote:
> > On Wed, 2021-04-14 at 22:05 -0400, Calvin Walton wrote:
> > > The current version of turbostat exits immediately upon entering
> > > the
> > > main loop, with error code -13. This is a regression that was
> > > introduced
> > > in these commits:
> > > 
> > > 9972d5d84d76 tools/power turbostat: Enable accumulate RAPL
> > > display
> > > 87e15da95775 tools/power turbostat: Introduce functions to
> > > accumulate
> > > RAPL consumption
> > 
> > Ah, I failed to check the mailing list before sending this patch!
> > Terry
> > Bowman's fix here should probably be preferred:
> > https://patchwork.kernel.org/project/linux-pm/patch/20210331155807.3838-1-terry.bowman@amd.com/
> > 
> > My patch was simply the minimum necessary to get turbostat working
> > again.
> Thanks for reporting this. We had a fix for this previously at
> https://lkml.org/lkml/2021/3/12/682
> 
> I'll check with Len if this patch has been merged.

Thanks for checking.

I notice that the linked patch doesn't include the other part of the
fix - correcting the type used for the MSR offsets to off_t (the AMD
MSRs at 0xc0010299 exceed the range of a signed 32-bit int), so if that
patch is in the queue to be merged, I can submit the off_t patch
separately.
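
To illustrate the range issue: 0xc0010299 is 3221291673 in decimal, which is above INT32_MAX (2147483647), so it cannot be stored in a signed 32-bit int without changing value. A rough standalone sketch, where the macro and function names are hypothetical and only the register number comes from this thread:

```c
#define _FILE_OFFSET_BITS 64	/* keep off_t 64-bit on 32-bit glibc builds too */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

#define AMD_RAPL_MSR_BASE	0xc0010299u	/* value quoted in this thread */

/* Returns 1 if the MSR number is representable in a signed 32-bit int. */
int msr_fits_in_int32(uint32_t msr)
{
	return msr <= (uint32_t)INT32_MAX;
}

/* Widening to a 64-bit off_t keeps the register number unchanged. */
off_t msr_to_offset(uint32_t msr)
{
	return (off_t)msr;
}

/* The AMD register is out of int range but survives as an off_t. */
void check_amd_offset(void)
{
	assert(!msr_fits_in_int32(AMD_RAPL_MSR_BASE));
	assert(msr_to_offset(AMD_RAPL_MSR_BASE) == 3221291673LL);
}
```

The Intel-range registers (0x6xx) fit either way, which is presumably why the int type went unnoticed until AMD offsets were passed through the same functions.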
Chen Yu April 15, 2021, 2:26 a.m. UTC | #3
Hi Calvin,
On Wed, Apr 14, 2021 at 10:08:07PM -0400, Calvin Walton wrote:
> On Wed, 2021-04-14 at 22:05 -0400, Calvin Walton wrote:
> > The current version of turbostat exits immediately upon entering the
> > main loop, with error code -13. This is a regression that was
> > introduced
> > in these commits:
> > 
> > 9972d5d84d76 tools/power turbostat: Enable accumulate RAPL display
> > 87e15da95775 tools/power turbostat: Introduce functions to accumulate
> > RAPL consumption
> 
> Ah, I failed to check the mailing list before sending this patch! Terry
> Bowman's fix here should probably be preferred:
> https://patchwork.kernel.org/project/linux-pm/patch/20210331155807.3838-1-terry.bowman@amd.com/
> 
> My patch was simply the minimum necessary to get turbostat working
> again.
Thanks for reporting this. We had a fix for this previously at
https://lkml.org/lkml/2021/3/12/682

I'll check with Len if this patch has been merged.

thanks,
Chenyu
Patch

diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index a7c4f0772e53..576e03d373c4 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -291,13 +291,16 @@  struct msr_sum_array {
 /* The percpu MSR sum array.*/
 struct msr_sum_array *per_cpu_msr_sum;
 
-int idx_to_offset(int idx)
+off_t idx_to_offset(int idx)
 {
-	int offset;
+	off_t offset;
 
 	switch (idx) {
 	case IDX_PKG_ENERGY:
-		offset = MSR_PKG_ENERGY_STATUS;
+		if (do_rapl & RAPL_AMD_F17H)
+			offset = MSR_PKG_ENERGY_STAT;
+		else
+			offset = MSR_PKG_ENERGY_STATUS;
 		break;
 	case IDX_DRAM_ENERGY:
 		offset = MSR_DRAM_ENERGY_STATUS;
@@ -320,11 +323,12 @@  int idx_to_offset(int idx)
 	return offset;
 }
 
-int offset_to_idx(int offset)
+int offset_to_idx(off_t offset)
 {
 	int idx;
 
 	switch (offset) {
+	case MSR_PKG_ENERGY_STAT:
 	case MSR_PKG_ENERGY_STATUS:
 		idx = IDX_PKG_ENERGY;
 		break;
@@ -353,7 +357,7 @@  int idx_valid(int idx)
 {
 	switch (idx) {
 	case IDX_PKG_ENERGY:
-		return do_rapl & RAPL_PKG;
+		return do_rapl & (RAPL_PKG | RAPL_AMD_F17H);
 	case IDX_DRAM_ENERGY:
 		return do_rapl & RAPL_DRAM;
 	case IDX_PP0_ENERGY: