diff mbox

[v2,2/2] powercap/rapl: reduce ipi calls

Message ID 1452647483-14244-3-git-send-email-jacob.jun.pan@linux.intel.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Jacob Pan Jan. 13, 2016, 1:11 a.m. UTC
Reduce remote CPU calls for MSR access by combining read
modify write into one function. Also optimize search for active CPU on
package.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/powercap/intel_rapl.c | 77 +++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 35 deletions(-)

Comments

Thomas Gleixner Jan. 13, 2016, 9:47 a.m. UTC | #1
On Tue, 12 Jan 2016, Jacob Pan wrote:
> @@ -805,30 +809,18 @@ static int rapl_write_data_raw(struct rapl_domain *rd,
>  			enum rapl_primitives prim,
>  			unsigned long long value)
>  {
> -	u64 msr_val;
> -	u32 msr;
>  	struct rapl_primitive_info *rp = &rpi[prim];
>  	int cpu;
> +	u64 bits;
>  
>  	cpu = find_active_cpu_on_package(rd->package_id);
>  	if (cpu < 0)
>  		return cpu;
> -	msr = rd->msrs[rp->id];
> -	if (rdmsrl_safe_on_cpu(cpu, msr, &msr_val)) {
> -		dev_dbg(&rd->power_zone.dev,
> -			"failed to read msr 0x%x on cpu %d\n", msr, cpu);
> -		return -EIO;
> -	}
> -	value = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
> -	msr_val &= ~rp->mask;
> -	msr_val |= value << rp->shift;
> -	if (wrmsrl_safe_on_cpu(cpu, msr, msr_val)) {
> -		dev_dbg(&rd->power_zone.dev,
> -			"failed to write msr 0x%x on cpu %d\n", msr, cpu);
> -		return -EIO;
> -	}
>  
> -	return 0;
> +	bits = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
> +	bits |= bits << rp->shift;
> +
> +	return rmwmsrl_safe_on_cpu(cpu, rd->msrs[rp->id], rp->mask, bits);

So here you actually use that new (misnomed) function, but for 

> +static void power_limit_irq_save_cpu(void *info)

and

> +static void power_limit_irq_restore_cpu(void *info)

you use a bog standard smp function call. What's the benefit of adding that
rmw function over a bog standard smp function call if you can only use it for
one instance of the same pattern?

Boris asked you the same question here

      https://lkml.kernel.org/r/20151220152749.GA29805@pd.tnic

but you decided to ignore it.

Thanks,

	tglx




--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacob Pan Jan. 13, 2016, 4:21 p.m. UTC | #2
On Wed, 13 Jan 2016 10:47:26 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> So here you actually use that new (misnomed) function, but for 
> 
> > +static void power_limit_irq_save_cpu(void *info)  
> 
> and
> 
> > +static void power_limit_irq_restore_cpu(void *info)  
> 
> you use a bog standard smp function call. What's the benefit of
> adding that rmw function over a bog standard smp function call if you
> can only use it for one instance of the same pattern?
> 
> Boris asked you the same question here
> 
>       https://lkml.kernel.org/r/20151220152749.GA29805@pd.tnic
> 
> but you decided to ignore it.
+Borislav,

Thanks for bring this out. I didn't mean to ignore. I thought my point
was stated in the commit message there was no point of going back and
forth. Read-Modify-Write is quite common, not just for RAPL could be
used by future code. Sorry if I wasn't clear.

Jacob
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 4:36 p.m. UTC | #3
On Wed, Jan 13, 2016 at 08:21:13AM -0800, Jacob Pan wrote:
> Thanks for bring this out. I didn't mean to ignore. I thought my point
> was stated in the commit message there was no point of going back and
> forth. Read-Modify-Write is quite common, not just for RAPL could be
> used by future code. Sorry if I wasn't clear.

But it also shows that it doesn't suffice for all your needs. So why add
it?

You can much better define your own functions which do all the MSR
handling you require and call them with smp_call_function_*.
Jacob Pan Jan. 13, 2016, 5:51 p.m. UTC | #4
On Wed, 13 Jan 2016 17:36:10 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Wed, Jan 13, 2016 at 08:21:13AM -0800, Jacob Pan wrote:
> > Thanks for bring this out. I didn't mean to ignore. I thought my
> > point was stated in the commit message there was no point of going
> > back and forth. Read-Modify-Write is quite common, not just for
> > RAPL could be used by future code. Sorry if I wasn't clear.
> 
> But it also shows that it doesn't suffice for all your needs. So why
> add it?
> 
> You can much better define your own functions which do all the MSR
> handling you require and call them with smp_call_function_*.
> 
yeah, that is what I did in the original patch.
https://lkml.org/lkml/2015/12/7/1090

Then i was suggested to add a rmw msr api for the common good :), I
think it is a good idea since such operation is not limited to RAPL
driver. Other register access APIs such as regmap have more complete
selections.


Jacob
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 6:04 p.m. UTC | #5
On Wed, Jan 13, 2016 at 09:51:24AM -0800, Jacob Pan wrote:
> yeah, that is what I did in the original patch.
> https://lkml.org/lkml/2015/12/7/1090
> 
> Then i was suggested to add a rmw msr api for the common good :), I
> think it is a good idea since such operation is not limited to RAPL
> driver.

Others are...?
Jacob Pan Jan. 13, 2016, 6:21 p.m. UTC | #6
On Wed, 13 Jan 2016 19:04:12 +0100
Borislav Petkov <bp@alien8.de> wrote:

> > Then i was suggested to add a rmw msr api for the common good :), I
> > think it is a good idea since such operation is not limited to RAPL
> > driver.  
> 
> Others are...?

nothing for the _safe version right now. Quick scan through the code
I see the non safe version has a couple can be converted to rmw.
e.g.

static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
{
...

	rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
	if (newstate == DC_DISABLE) {
		pr_debug("CPU#%d disabling modulation\n", cpu);
		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l & ~(1<<4),
	h); } else {
		pr_debug("CPU#%d setting duty cycle to %d%%\n",
			cpu, ((125 * newstate) / 10));
		/* bits 63 - 5	: reserved
		 * bit  4	: enable/disable
		 * bits 3-1	: duty cycle
		 * bit  0	: reserved
		 */
		l = (l & ~14);
		l = l | (1<<4) | ((newstate & 0x7)<<1);
		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
	} 


static int sfi_cpufreq_target(struct cpufreq_policy *policy, unsigned
int index) {
...

	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
	lo = (lo & ~INTEL_PERF_CTL_MASK) |
		((u32) sfi_cpufreq_array[next_perf_state].ctrl_val &
		INTEL_PERF_CTL_MASK);
	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 7:16 p.m. UTC | #7
On Wed, Jan 13, 2016 at 10:21:38AM -0800, Jacob Pan wrote:
> static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
> {
> ...
> 
> 	rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
> 	if (newstate == DC_DISABLE) {
> 		pr_debug("CPU#%d disabling modulation\n", cpu);
> 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l & ~(1<<4),
> 	h); } else {
> 		pr_debug("CPU#%d setting duty cycle to %d%%\n",
> 			cpu, ((125 * newstate) / 10));
> 		/* bits 63 - 5	: reserved
> 		 * bit  4	: enable/disable
> 		 * bits 3-1	: duty cycle
> 		 * bit  0	: reserved
> 		 */
> 		l = (l & ~14);
> 		l = l | (1<<4) | ((newstate & 0x7)<<1);
> 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
> 	} 

This cannot be converted because you need to do the stuff between the
rdmsr_on_cpu() and wrmsr_on_cpu() calls.

> static int sfi_cpufreq_target(struct cpufreq_policy *policy, unsigned
> int index) {
> ...
> 
> 	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> 	lo = (lo & ~INTEL_PERF_CTL_MASK) |
> 		((u32) sfi_cpufreq_array[next_perf_state].ctrl_val &
> 		INTEL_PERF_CTL_MASK);
> 	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);

Ditto.

These two examples prove my point, actually.
Jacob Pan Jan. 13, 2016, 8:10 p.m. UTC | #8
On Wed, 13 Jan 2016 20:16:22 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Wed, Jan 13, 2016 at 10:21:38AM -0800, Jacob Pan wrote:
> > static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
> > {
> > ...
> > 
> > 	rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
> > 	if (newstate == DC_DISABLE) {
> > 		pr_debug("CPU#%d disabling modulation\n", cpu);
> > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l &
> > ~(1<<4), h); } else {
> > 		pr_debug("CPU#%d setting duty cycle to %d%%\n",
> > 			cpu, ((125 * newstate) / 10));
> > 		/* bits 63 - 5	: reserved
> > 		 * bit  4	: enable/disable
> > 		 * bits 3-1	: duty cycle
> > 		 * bit  0	: reserved
> > 		 */
> > 		l = (l & ~14);
> > 		l = l | (1<<4) | ((newstate & 0x7)<<1);
> > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
> > 	}   
> 
> This cannot be converted because you need to do the stuff between the
> rdmsr_on_cpu() and wrmsr_on_cpu() calls.
> 
it can be converted if move the below if statement outside read/write
pair. 
 	if (newstate == DC_DISABLE) {

> > static int sfi_cpufreq_target(struct cpufreq_policy *policy,
> > unsigned int index) {
> > ...
> > 
> > 	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> > 	lo = (lo & ~INTEL_PERF_CTL_MASK) |
> > 		((u32) sfi_cpufreq_array[next_perf_state].ctrl_val &
> > 		INTEL_PERF_CTL_MASK);
> > 	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);  
> 
> Ditto.
> 
> These two examples prove my point, actually.

same here, it is just clear mask and set mask, why not?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 9:26 p.m. UTC | #9
On Wed, Jan 13, 2016 at 12:10:03PM -0800, Jacob Pan wrote:
> > > 	rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
> > > 	if (newstate == DC_DISABLE) {
> > > 		pr_debug("CPU#%d disabling modulation\n", cpu);
> > > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l &
> > > ~(1<<4), h); } else {
> > > 		pr_debug("CPU#%d setting duty cycle to %d%%\n",
> > > 			cpu, ((125 * newstate) / 10));
> > > 		/* bits 63 - 5	: reserved
> > > 		 * bit  4	: enable/disable
> > > 		 * bits 3-1	: duty cycle
> > > 		 * bit  0	: reserved
> > > 		 */
> > > 		l = (l & ~14);
> > > 		l = l | (1<<4) | ((newstate & 0x7)<<1);
> > > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
> > > 	}   
> > 
> > This cannot be converted because you need to do the stuff between the
> > rdmsr_on_cpu() and wrmsr_on_cpu() calls.
> > 
> it can be converted if move the below if statement outside read/write
> pair. 
>  	if (newstate == DC_DISABLE) {

You mean something like this (I'm having hard time even figuring out what
goes where):

	if (newstate == DC_DISABLE) {
		pr_debug("CPU#%d disabling modulation\n", cpu);
		rmwmsrl_safe_on_cpu(cpu, MSR_IA32_THERM_CONTROL, (1 << 4), 0);
	} else {
		pr_debug("CPU#%d setting duty cycle to %d%%\n", cpu, ((125 * newstate) / 10));
		rmwmsrl_safe_on_cpu(cpu, MSR_IA32_THERM_CONTROL, 14, (1 << 4) |	((newstate & 0x7)<<1));
	}

Now this is *absolutely* unreadable and hard to use. The previous
version at least showed what happens to which bits. This call site will
make everyone go look at the definition of rmwmsrl_safe_on_cpu() and see
what those last two arguments do actually.

And, again, for the n-th time, this still doesn't work if you need to do
other stuff between the rdmsr and wrmsr. So your interface will cover
*some* cases but not all. So people should do rmwmsrl_safe_on_cpu() but
not always - only if they don't need to do stuff between the reads and
the writes.

Hmm, no thanks.

> > > static int sfi_cpufreq_target(struct cpufreq_policy *policy,
> > > unsigned int index) {
> > > ...
> > > 
> > > 	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> > > 	lo = (lo & ~INTEL_PERF_CTL_MASK) |
> > > 		((u32) sfi_cpufreq_array[next_perf_state].ctrl_val &
> > > 		INTEL_PERF_CTL_MASK);
> > > 	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);  
> > 
> > Ditto.
> > 
> > These two examples prove my point, actually.
> 
> same here, it is just clear mask and set mask, why not?

Like this?

	rmwmsrl_safe_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
			    INTEL_PERF_CTL_MASK,
			    (u32)sfi_cpufreq_array[next_perf_state].ctrl_val & INTEL_PERF_CTL_MASK);

Yikes!

So yes, it can work but it is ugly, hard to parse and use, not generic
enough, etc, etc.

So thanks, but no thanks.
Thomas Gleixner Jan. 13, 2016, 9:49 p.m. UTC | #10
On Wed, 13 Jan 2016, Jacob Pan wrote:
> On Wed, 13 Jan 2016 20:16:22 +0100
> Borislav Petkov <bp@alien8.de> wrote:
> 
> > On Wed, Jan 13, 2016 at 10:21:38AM -0800, Jacob Pan wrote:
> > > static int cpufreq_p4_setdc(unsigned int cpu, unsigned int newstate)
> > > {
> > > ...
> > > 
> > > 	rdmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, &l, &h);
> > > 	if (newstate == DC_DISABLE) {
> > > 		pr_debug("CPU#%d disabling modulation\n", cpu);
> > > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l &
> > > ~(1<<4), h); } else {
> > > 		pr_debug("CPU#%d setting duty cycle to %d%%\n",
> > > 			cpu, ((125 * newstate) / 10));
> > > 		/* bits 63 - 5	: reserved
> > > 		 * bit  4	: enable/disable
> > > 		 * bits 3-1	: duty cycle
> > > 		 * bit  0	: reserved
> > > 		 */
> > > 		l = (l & ~14);
> > > 		l = l | (1<<4) | ((newstate & 0x7)<<1);
> > > 		wrmsr_on_cpu(cpu, MSR_IA32_THERM_CONTROL, l, h);
> > > 	}   
> > 
> > This cannot be converted because you need to do the stuff between the
> > rdmsr_on_cpu() and wrmsr_on_cpu() calls.
> > 
> it can be converted if move the below if statement outside read/write
> pair. 
>  	if (newstate == DC_DISABLE) {
> 
> > > static int sfi_cpufreq_target(struct cpufreq_policy *policy,
> > > unsigned int index) {
> > > ...
> > > 
> > > 	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> > > 	lo = (lo & ~INTEL_PERF_CTL_MASK) |
> > > 		((u32) sfi_cpufreq_array[next_perf_state].ctrl_val &
> > > 		INTEL_PERF_CTL_MASK);
> > > 	wrmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, lo, hi);  
> > 
> > Ditto.
> > 
> > These two examples prove my point, actually.
> 
> same here, it is just clear mask and set mask, why not?

And what's the actual saving over a simple function which does that rdmsr,
modify, wrmsr thing and call it via smp_call_function like you did for 2 of 3
places in the rapl driver?

The amount of IPIs is the same. The amount of saved code is questionable. Lets
look at your usecase:

@@ -805,30 +809,18 @@ static int rapl_write_data_raw(struct rapl_domain *rd,
                        enum rapl_primitives prim,
                        unsigned long long value)
 {
-       u64 msr_val;
-       u32 msr;
        struct rapl_primitive_info *rp = &rpi[prim];
        int cpu;
+       u64 bits;
 
        cpu = find_active_cpu_on_package(rd->package_id);
        if (cpu < 0)
                return cpu;
-       msr = rd->msrs[rp->id];
-       if (rdmsrl_safe_on_cpu(cpu, msr, &msr_val)) {
-               dev_dbg(&rd->power_zone.dev,
-                       "failed to read msr 0x%x on cpu %d\n", msr, cpu);
-               return -EIO;
-       }
-       value = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
-       msr_val &= ~rp->mask;
-       msr_val |= value << rp->shift;
-       if (wrmsrl_safe_on_cpu(cpu, msr, msr_val)) {
-               dev_dbg(&rd->power_zone.dev,
-                       "failed to write msr 0x%x on cpu %d\n", msr, cpu);
-               return -EIO;
-       }
 
-       return 0;
+       bits = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
+       bits |= bits << rp->shift;
+
+       return rmwmsrl_safe_on_cpu(cpu, rd->msrs[rp->id], rp->mask, bits);
 }

So that has:	5 insertions and 17 deletions

And the library code adds 65 lines including an export. So the text size
balance of this is:

Mainline:
text	data	bss  dec     hex   filename
 5021	1008    0     6029   178d  ../build/arch/x86/lib/built-in.o
10870	1040	24   11934   2e9e  ../build/drivers/powercap/intel_rapl.o

Your patch (Just the above part which uses the lib stuff)
 5385	1008	0     6393   18f9  ../build/arch/x86/lib/built-in.o
10838	1040    24   11902   2e7e  ../build/drivers/powercap/intel_rapl.o
-----
+ 332

Now you have two more possible candidates, which require another 65 lines of
different library code and lets assume another 364 bytes of library code. So
lets further assume the above examples safe us like the rapl one 32 bytes
each, then the net damage is: 632 byte extra text size.

So what exactly is the point of this exercise?

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
srinivas pandruvada Jan. 13, 2016, 9:54 p.m. UTC | #11
On Wed, 2016-01-13 at 22:26 +0100, Borislav Petkov wrote:

[Cut]

> 
> 	rmwmsrl_safe_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
> 			    INTEL_PERF_CTL_MASK,
> 			    (u32)sfi_cpufreq_array[next_perf_state].ctr
> l_val & INTEL_PERF_CTL_MASK);
> 
> Yikes!
> 
> So yes, it can work but it is ugly, hard to parse and use, not
> generic
> enough, etc, etc.
> 
> So thanks, but no thanks.
> 
I agree, in some cases it will not make much sense to use read-
modify_write calls, the user may decide whether it makes sense or not.
But such interface is not new to Linux kernel:

regmap_update_bits(), which is referenced for 346 times.

Are you saying that any such calls are not useful?

Thanks,
Srinivas
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Jan. 13, 2016, 10:02 p.m. UTC | #12
On Wed, 13 Jan 2016, Srinivas Pandruvada wrote:
> On Wed, 2016-01-13 at 22:26 +0100, Borislav Petkov wrote:
> > 	rmwmsrl_safe_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
> > 			    INTEL_PERF_CTL_MASK,
> > 			    (u32)sfi_cpufreq_array[next_perf_state].ctr
> > l_val & INTEL_PERF_CTL_MASK);
> > 
> > Yikes!
> > 
> > So yes, it can work but it is ugly, hard to parse and use, not
> > generic
> > enough, etc, etc.
> > 
> > So thanks, but no thanks.
> > 
> I agree, in some cases it will not make much sense to use read-
> modify_write calls, the user may decide whether it makes sense or not.
> But such interface is not new to Linux kernel:
> 
> regmap_update_bits(), which is referenced for 346 times.
> 
> Are you saying that any such calls are not useful?

There are certainly cases when such calls are useful. And those cases are when
we have a sufficiently big occurence of similar code which is sufficiently
complex to justify the library code and the export.

Thanks,

	tglx
Jacob Pan Jan. 13, 2016, 10:11 p.m. UTC | #13
On Wed, 13 Jan 2016 23:02:33 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 13 Jan 2016, Srinivas Pandruvada wrote:
> > On Wed, 2016-01-13 at 22:26 +0100, Borislav Petkov wrote:
> > > 	rmwmsrl_safe_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
> > > 			    INTEL_PERF_CTL_MASK,
> > > 			    (u32)sfi_cpufreq_array[next_perf_state].ctr
> > > l_val & INTEL_PERF_CTL_MASK);
> > > 
> > > Yikes!
> > > 
> > > So yes, it can work but it is ugly, hard to parse and use, not
> > > generic
> > > enough, etc, etc.
> > > 
> > > So thanks, but no thanks.
> > > 
> > I agree, in some cases it will not make much sense to use read-
> > modify_write calls, the user may decide whether it makes sense or
> > not. But such interface is not new to Linux kernel:
> > 
> > regmap_update_bits(), which is referenced for 346 times.
> > 
> > Are you saying that any such calls are not useful?
> 
> There are certainly cases when such calls are useful. And those cases
> are when we have a sufficiently big occurence of similar code which
> is sufficiently complex to justify the library code and the export.
> 
The balance of pros and cons depends on the number of occurrence. The
lib call overhead is constant where saving from the callers are
multiplied. Anyway, I will go back to my original code until we have
enough callers to tip the balance.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 10:16 p.m. UTC | #14
On Wed, Jan 13, 2016 at 01:54:43PM -0800, Srinivas Pandruvada wrote:
> regmap_update_bits(), which is referenced for 346 times.

I'm at a loss as to why people keep talking about regmap.

$ git grep regmap_update_bits arch/x86/
$

> Are you saying that any such calls are not useful?

It seems I have to repeat myself ad absurdum so let me summarize: I
don't have anything against useful APIs. Those are not.
Jacob Pan Jan. 13, 2016, 10:20 p.m. UTC | #15
On Wed, 13 Jan 2016 22:26:02 +0100
Borislav Petkov <bp@alien8.de> wrote:

> You mean something like this (I'm having hard time even figuring out
> what goes where):
> 
> 	if (newstate == DC_DISABLE) {
> 		pr_debug("CPU#%d disabling modulation\n", cpu);
> 		rmwmsrl_safe_on_cpu(cpu, MSR_IA32_THERM_CONTROL, (1
> << 4), 0); } else {
> 		pr_debug("CPU#%d setting duty cycle to %d%%\n", cpu,
> ((125 * newstate) / 10)); rmwmsrl_safe_on_cpu(cpu,
> MSR_IA32_THERM_CONTROL, 14, (1 << 4) |	((newstate &
> 0x7)<<1)); }
> 
> Now this is *absolutely* unreadable and hard to use. The previous
> version at least showed what happens to which bits. This call site
> will make everyone go look at the definition of rmwmsrl_safe_on_cpu()
> and see what those last two arguments do actually.
> 
To me the caller code became more readable. I think you are referring
the function name being not readable, which is separate of this
conversion.

> And, again, for the n-th time, this still doesn't work if you need to
> do other stuff between the rdmsr and wrmsr. So your interface will
> cover *some* cases but not all. So people should do
> rmwmsrl_safe_on_cpu() but not always - only if they don't need to do
> stuff between the reads and the writes.
> 
I know, I never disagreed with that :) which is why I am not using it in
the other cases in RAPL driver.
> Hmm, no thanks.
> 
> > > > static int sfi_cpufreq_target(struct cpufreq_policy *policy,
> > > > unsigned int index) {
> > > > ...
> > > > 
> > > > 	rdmsr_on_cpu(policy->cpu, MSR_IA32_PERF_CTL, &lo, &hi);
> > > > 	lo = (lo & ~INTEL_PERF_CTL_MASK) |
> > > > 		((u32)
> > > > sfi_cpufreq_array[next_perf_state].ctrl_val &
> > > > INTEL_PERF_CTL_MASK); wrmsr_on_cpu(policy->cpu,
> > > > MSR_IA32_PERF_CTL, lo, hi);    
> > > 
> > > Ditto.
> > > 
> > > These two examples prove my point, actually.  
> > 
> > same here, it is just clear mask and set mask, why not?  
> 
> Like this?
> 
> 	rmwmsrl_safe_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
> 			    INTEL_PERF_CTL_MASK,
> 			    (u32)sfi_cpufreq_array[next_perf_state].ctrl_val
> & INTEL_PERF_CTL_MASK);
> 
> Yikes!
> 
> So yes, it can work but it is ugly, hard to parse and use, not generic
> enough, etc, etc.
I don't think the conversion adds extra ugliness. It actually makes it
more clear what is the mask to clear and what bits to set.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin Jan. 13, 2016, 10:23 p.m. UTC | #16
On 01/13/16 14:11, Jacob Pan wrote:
>
> The balance of pros and cons depends on the number of occurrence. The
> lib call overhead is constant where saving from the callers are
> multiplied. Anyway, I will go back to my original code until we have
> enough callers to tip the balance.
> 

The thing about premature librarization is that it can cause very
unnatural code to end up being written.  Until it is clear what the APIs
we actually need are, we shouldn't force them into a mold.  Of course,
it often makes sense to make these *local* APIs that, if useful, can get
globalized.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Jan. 13, 2016, 10:29 p.m. UTC | #17
On Wed, Jan 13, 2016 at 02:20:07PM -0800, Jacob Pan wrote:
> To me the caller code became more readable.

Not to me.

> I think you are referring the function name being not readable, which
> is separate of this conversion.

You think wrong.

I typed in the example and commented right under it. The old code is
much more understandable than the new code. The old code shows what bits
get set and cleared, the new code uses masks, sometimes hidden behind
defines, which people have to look up to understand what's going on. And
then look at the function definition to know which arg which is. And so
on and so on...

But I'm going to stop wasting my time with this now. I gave you enough
arguments against.
srinivas pandruvada Jan. 13, 2016, 10:39 p.m. UTC | #18
On Wed, 2016-01-13 at 23:16 +0100, Borislav Petkov wrote:
> On Wed, Jan 13, 2016 at 01:54:43PM -0800, Srinivas Pandruvada wrote:
> > regmap_update_bits(), which is referenced for 346 times.
> 
> I'm at a loss as to why people keep talking about regmap.

Thomas Gleixner, understood the context and he gave a satisfactory
reply.

Thanks,
Srinivas

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 48747c2..b34d6a6 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -263,7 +263,11 @@  static struct rapl_package *find_package_by_id(int id)
 /* caller to ensure CPU hotplug lock is held */
 static int find_active_cpu_on_package(int package_id)
 {
-	int i;
+	/* try to avoid remote cpu call, use raw since we are preemptible */
+	int i = raw_smp_processor_id();
+
+	if (topology_physical_package_id(i) == package_id)
+		return i;
 
 	for_each_online_cpu(i) {
 		if (topology_physical_package_id(i) == package_id)
@@ -805,30 +809,18 @@  static int rapl_write_data_raw(struct rapl_domain *rd,
 			enum rapl_primitives prim,
 			unsigned long long value)
 {
-	u64 msr_val;
-	u32 msr;
 	struct rapl_primitive_info *rp = &rpi[prim];
 	int cpu;
+	u64 bits;
 
 	cpu = find_active_cpu_on_package(rd->package_id);
 	if (cpu < 0)
 		return cpu;
-	msr = rd->msrs[rp->id];
-	if (rdmsrl_safe_on_cpu(cpu, msr, &msr_val)) {
-		dev_dbg(&rd->power_zone.dev,
-			"failed to read msr 0x%x on cpu %d\n", msr, cpu);
-		return -EIO;
-	}
-	value = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
-	msr_val &= ~rp->mask;
-	msr_val |= value << rp->shift;
-	if (wrmsrl_safe_on_cpu(cpu, msr, msr_val)) {
-		dev_dbg(&rd->power_zone.dev,
-			"failed to write msr 0x%x on cpu %d\n", msr, cpu);
-		return -EIO;
-	}
 
-	return 0;
+	bits = rapl_unit_xlate(rd, rd->package_id, rp->unit, value, 1);
+	bits |= bits << rp->shift;
+
+	return rmwmsrl_safe_on_cpu(cpu, rd->msrs[rp->id], rp->mask, bits);
 }
 
 /*
@@ -893,6 +885,21 @@  static int rapl_check_unit_atom(struct rapl_package *rp, int cpu)
 	return 0;
 }
 
+static void power_limit_irq_save_cpu(void *info)
+{
+	u32 l, h = 0;
+	struct rapl_package *rp = (struct rapl_package *)info;
+
+	/* save the state of PLN irq mask bit before disabling it */
+	rdmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
+	if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED)) {
+		rp->power_limit_irq = l & PACKAGE_THERM_INT_PLN_ENABLE;
+		rp->power_limit_irq |= PACKAGE_PLN_INT_SAVED;
+	}
+	l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
+	wrmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
+}
+
 
 /* REVISIT:
  * When package power limit is set artificially low by RAPL, LVT
@@ -906,7 +913,6 @@  static int rapl_check_unit_atom(struct rapl_package *rp, int cpu)
 
 static void package_power_limit_irq_save(int package_id)
 {
-	u32 l, h = 0;
 	int cpu;
 	struct rapl_package *rp;
 
@@ -920,20 +926,27 @@  static void package_power_limit_irq_save(int package_id)
 	cpu = find_active_cpu_on_package(package_id);
 	if (cpu < 0)
 		return;
-	/* save the state of PLN irq mask bit before disabling it */
-	rdmsr_safe_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
-	if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED)) {
-		rp->power_limit_irq = l & PACKAGE_THERM_INT_PLN_ENABLE;
-		rp->power_limit_irq |= PACKAGE_PLN_INT_SAVED;
-	}
-	l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
-	wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
+	smp_call_function_single(cpu, power_limit_irq_save_cpu, rp, 1);
+}
+
+static void power_limit_irq_restore_cpu(void *info)
+{
+	u32 l, h = 0;
+	struct rapl_package *rp = (struct rapl_package *)info;
+
+	rdmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
+
+	if (rp->power_limit_irq & PACKAGE_THERM_INT_PLN_ENABLE)
+		l |= PACKAGE_THERM_INT_PLN_ENABLE;
+	else
+		l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
+
+	wrmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
 }
 
 /* restore per package power limit interrupt enable state */
 static void package_power_limit_irq_restore(int package_id)
 {
-	u32 l, h;
 	int cpu;
 	struct rapl_package *rp;
 
@@ -951,14 +964,8 @@  static void package_power_limit_irq_restore(int package_id)
 	/* irq enable state not saved, nothing to restore */
 	if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED))
 		return;
-	rdmsr_safe_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
-
-	if (rp->power_limit_irq & PACKAGE_THERM_INT_PLN_ENABLE)
-		l |= PACKAGE_THERM_INT_PLN_ENABLE;
-	else
-		l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
 
-	wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
+	smp_call_function_single(cpu, power_limit_irq_restore_cpu, rp, 1);
 }
 
 static void set_floor_freq_default(struct rapl_domain *rd, bool mode)