[RFC,18/19] cpufreq: remove transition_lock

Message ID 1452533760-13787-19-git-send-email-juri.lelli@arm.com (mailing list archive)
State RFC

Commit Message

Juri Lelli Jan. 11, 2016, 5:35 p.m. UTC
From: Michael Turquette <mturquette@baylibre.com>

transition_lock was introduced to serialize cpufreq transition
notifiers. Instead of using a different lock for protecting concurrent
modifications of policy, it is better to require that callers of
transition notifiers implement appropriate locking (this is already the
case AFAICS). Removing transition_lock also simplifies the current locking
scheme.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
---
 drivers/cpufreq/cpufreq.c | 19 ++++++++++---------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 10 insertions(+), 10 deletions(-)

Comments

Viresh Kumar Jan. 12, 2016, 11:24 a.m. UTC | #1
On 11-01-16, 17:35, Juri Lelli wrote:
> From: Michael Turquette <mturquette@baylibre.com>
> 
> transition_lock was introduced to serialize cpufreq transition
> notifiers. Instead of using a different lock for protecting concurrent
> modifications of policy, it is better to require that callers of
> transition notifiers implement appropriate locking (this is already the
> case AFAICS). Removing transition_lock also simplifies current locking
> scheme.

So, are you saying that the reasoning mentioned in this patch is all
wrong?

commit 12478cf0c55e ("cpufreq: Make sure frequency transitions are
serialized")
Michael Turquette Jan. 13, 2016, 12:54 a.m. UTC | #2
Hi Viresh,

Quoting Viresh Kumar (2016-01-12 03:24:09)
> On 11-01-16, 17:35, Juri Lelli wrote:
> > From: Michael Turquette <mturquette@baylibre.com>
> > 
> > transition_lock was introduced to serialize cpufreq transition
> > notifiers. Instead of using a different lock for protecting concurrent
> > modifications of policy, it is better to require that callers of
> > transition notifiers implement appropriate locking (this is already the
> > case AFAICS). Removing transition_lock also simplifies current locking
> > scheme.
> 
> So, are you saying that the reasoning mentioned in this patch are all
> wrong?
> 
> commit 12478cf0c55e ("cpufreq: Make sure frequency transitions are
> serialized")

No, that's not what I'm saying. Quoting that patch:

"""
The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
spinlock is used to ensure that only one such "flow" is in flight at any given
time. Put together, this provides us all the necessary synchronization.
"""

So the transition_ongoing flag and wait-queue are all good. That stuff is
just great. This patch doesn't touch it.

What it does change is that it removes a superfluous spinlock that
should never have needed to exist in the first place.
cpufreq_freq_transition_begin is called directly by driver target
callbacks, and it is called by __cpufreq_driver_target.

__cpufreq_driver_target should be using a per-policy lock. Any other
behavior is just insane. I haven't gone through this thread to see if
that change has been made by Juri, but we need to get there either in
this series or the follow-up series that introduces some RCU locking.
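
Roughly, I mean something like this (just a sketch; policy->rwsem is the
existing per-policy lock, and this is more or less what the non-underscore
cpufreq_driver_target() wrapper already does internally):

	/* any path that ends up starting a transition holds the policy lock */
	down_write(&policy->rwsem);
	ret = __cpufreq_driver_target(policy, target_freq, relation);
	up_write(&policy->rwsem);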

Regards,
Mike

> 
> -- 
> viresh
Viresh Kumar Jan. 13, 2016, 6:31 a.m. UTC | #3
On 12-01-16, 16:54, Michael Turquette wrote:
> __cpufreq_driver_target should be using a per-policy lock.

It doesn't :)
Juri Lelli Jan. 14, 2016, 9:44 a.m. UTC | #4
Hi,

On 13/01/16 10:21, Michael Turquette wrote:
> Hi Viresh,
> 
> Quoting Viresh Kumar (2016-01-12 22:31:48)
> > On 12-01-16, 16:54, Michael Turquette wrote:
> > > __cpufreq_driver_target should be using a per-policy lock.
> > 
> > It doesn't :)
> 
> It should.
> 
> A less conceited response is that a per-policy lock should be held
> around calls to __cpufreq_driver_target. This can obviously be done by
> cpufreq_driver_target (no double underscore), but there are quite a few
> drivers that call __cpufreq_driver_target, and since we're touching the
> policy structure we need a lock around it.
> 

Agreed, we should enforce the rule that everything that touches the policy
structure has to lock it first.

> Juri's cover letter did not explicitly state my original, full intention
> for the patches I was working on. I'll spell that out below and
> hopefully we can gather consensus around it before moving forward. Juri,
> I'm implicitly assuming that you agree with the stuff below, but please
> correct me if I am wrong.

Right. I decided to post with this RFC only a subset of the patches we
came up with because I needed to build some more confidence with the
subsystem I was going to propose changes for. Review comments received
are helping me on that front. I didn't mention the next steps (RCU) at all
because I wanted to focus on understanding and documenting, and maybe
fixing where required, the current state of things, before we change it.

> The original idea for overhauling the locking
> in cpufreq is to use two locks:
> 
> 1) per-policy lock (my patches were using a mutex), which is the only
> lock that needs to be touched during a frequency transition. We do not
> want any lock contention between policies during freq transition. For
> read-side operation this locking scheme should utilize RCU so that the
> scheduler can safely access the values in struct cpufreq_policy within
> its schedule() context. [a note on RCU below]
> 
> 2) a single, framework-wide lock (my patches were using a mutex) that
> handles all of the other synchronization: governor events, driver events
> and anything else that does not happen on a per-policy basis. I don't
> think RCU is necessary for this. These operations are all slow-path ones
> so reducing the mess of 6-ish locks in cpufreq.c and friends down to a
> single mutex simplifies things greatly, eliminates the "drop the lock
> here for a few instructions" hacks and generally makes things more
> readable.
> 

This is basically what I also have on top of this series. I actually
went for RCU for 2) as well, but yes, that's maybe overkill.

A comment on 1) above, and something I got stuck on for some
time, is that, if we implement the RCU logic as it is meant to be used, I
think we can generate a lot of copy-update operations when changing
frequency (as the policy structure needs to be changed). Also, we might read
stale data. So, I'm not sure this will pay off. However, I tried to get
around this problem and I guess we will discuss whether 1) is doable in the
next RFC :-).
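
To make the concern concrete, a by-the-book RCU update of the policy would
mean something like the sketch below on every frequency change (ignoring for
a moment all the fields that cannot simply be copied, kobject, rwsem and so
on, which is part of why I doubt it pays off):

	/* illustrative only: copy-update of the policy per transition */
	new = kmemdup(old, sizeof(*old), GFP_KERNEL);
	new->cur = freqs->new;
	for_each_cpu(cpu, new->cpus)
		rcu_assign_pointer(per_cpu(cpufreq_cpu_data, cpu), new);
	synchronize_rcu();	/* wait for readers still using the old copy */
	kfree(old);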

> A quick note on RCU and the scheduler-driven DVFS stuff: RCU only helps
> us on read-side operations. For the purposes of sched-dvfs, this means
> that when we look at capacity utilization and want to normalize
> frequency based on that, we need to access the per-policy structure in a
> lockless way. RCU makes this possible.
> 
> RCU is absolutely not a magic bullet or elixir that lets us kick off
> DVFS transitions from the schedule() context. The frequency transitions
> are write-side operations, as we invariably touch struct cpufreq_policy.
> This means that the read-side stuff can live in the schedule() context,
> but write-side needs to be kicked out to a thread.
> 

Correct. We will still need the kthread machinery even after these
changes.

Thanks for clarifying things!

Best,

- Juri
Viresh Kumar Jan. 14, 2016, 10:32 a.m. UTC | #5
On 13-01-16, 10:21, Michael Turquette wrote:
> Quoting Viresh Kumar (2016-01-12 22:31:48)
> > On 12-01-16, 16:54, Michael Turquette wrote:
> > > __cpufreq_driver_target should be using a per-policy lock.
> > 
> > It doesn't :)
> 
> It should.

I thought we wanted the routine doing DVFS to not sleep as it will be
called from scheduler ?

Looks fine otherwise. But yeah, the series is still incomplete in the
sense that there is no lock today around __cpufreq_driver_target().
Juri Lelli Jan. 14, 2016, 1:52 p.m. UTC | #6
On 14/01/16 16:02, Viresh Kumar wrote:
> On 13-01-16, 10:21, Michael Turquette wrote:
> > Quoting Viresh Kumar (2016-01-12 22:31:48)
> > > On 12-01-16, 16:54, Michael Turquette wrote:
> > > > __cpufreq_driver_target should be using a per-policy lock.
> > > 
> > > It doesn't :)
> > 
> > It should.
> 
> I thought we wanted the routine doing DVFS to not sleep as it will be
> called from scheduler ?
> 
> Looks fine otherwise. But yeah, the series is still incomplete in the
> sense that there is no lock today around __cpufreq_driver_target().
> 

I was under the impression that exporting __cpufreq_driver_target()
outside cpufreq.c only works because its users implement their own
locking.

That's why I put the following comment in this patch.

 /*
  * Callers must ensure proper mutual exclusion on policy (for transition_
  * ongoing/transition_task handling). While holding policy->rwsem is
  * sufficient, other schemes might work as well (e.g., cpufreq_governor.c
  * holds timer_mutex while entering the path that generates transitions).
  */

From what I can see, ondemand and conservative (via the common governor code)
seem to use timer_mutex, while userspace uses userspace_mutex. Do they serve
different purposes? And otherwise, how do we currently serialize operations
on policy when __cpufreq_driver_target() is used directly?

Thanks,

- Juri
Viresh Kumar Jan. 18, 2016, 5:09 a.m. UTC | #7
On 14-01-16, 13:52, Juri Lelli wrote:
> I was under the impression that exporting __cpufreq_driver_target()
> outside cpufreq.c only works because its users implement their own
> locking.
> 
> That's why I put the following comment in this patch.
> 
>  /*
>   * Callers must ensure proper mutual exclusion on policy (for transition_
>   * ongoing/transition_task handling). While holding policy->rwsem is
>   * sufficient, other schemes might work as well (e.g., cpufreq_governor.c
>   * holds timer_mutex while entering the path that generates transitions).
>   */
> 
> From what I can see, ondemand and conservative (via the common governor code)
> seem to use timer_mutex, while userspace uses userspace_mutex. Do they serve
> different purposes? And otherwise, how do we currently serialize operations
> on policy when __cpufreq_driver_target() is used directly?

The patch I referred to earlier in the thread detailed a few of the
races we were worried about. The lock you just removed is responsible
for taking care of the races you are worried about now :)
Peter Zijlstra Jan. 19, 2016, 2 p.m. UTC | #8
On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> RCU is absolutely not a magic bullet or elixir that lets us kick off
> DVFS transitions from the schedule() context. The frequency transitions
> are write-side operations, as we invariably touch struct cpufreq_policy.
> This means that the read-side stuff can live in the schedule() context,
> but write-side needs to be kicked out to a thread.

Why? If the state is per-cpu and acquired by RCU, updates should be no
problem at all.

If you need inter-cpu state, then things get to be a little tricky
though, but you can actually nest a raw_spinlock_t in there if you
absolutely have to.


Juri Lelli Jan. 19, 2016, 2:42 p.m. UTC | #9
On 19/01/16 15:00, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > DVFS transitions from the schedule() context. The frequency transitions
> > are write-side operations, as we invariably touch struct cpufreq_policy.
> > This means that the read-side stuff can live in the schedule() context,
> > but write-side needs to be kicked out to a thread.
> 
> Why? If the state is per-cpu and acquired by RCU, updates should be no
> problem at all.
> 
> If you need inter-cpu state, then things get to be a little tricky
> though, but you can actually nest a raw_spinlock_t in there if you
> absolutely have to.
> 

We have at least two problems. The first one is that state is per frequency
domain (struct cpufreq_policy) and this usually spans more than one cpu.
The second one is that we might need to sleep while servicing the frequency
transition, both because the platform needs to sleep and because some paths
of the cpufreq core use sleeping locks (yes, that might be changed as well I
guess).  A solution based on spinlocks only might not be usable on
platforms that need to sleep, either.

Another thing that I was thinking of actually is that since struct
cpufreq_policy is updated a lot (more or less at every frequency
transition), is it actually suitable for RCU?

Best,

- Juri
Peter Zijlstra Jan. 19, 2016, 3:30 p.m. UTC | #10
On Tue, Jan 19, 2016 at 02:42:33PM +0000, Juri Lelli wrote:
> On 19/01/16 15:00, Peter Zijlstra wrote:
> > On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > > DVFS transitions from the schedule() context. The frequency transitions
> > > are write-side operations, as we invariably touch struct cpufreq_policy.
> > > This means that the read-side stuff can live in the schedule() context,
> > > but write-side needs to be kicked out to a thread.
> > 
> > Why? If the state is per-cpu and acquired by RCU, updates should be no
> > problem at all.
> > 
> > If you need inter-cpu state, then things get to be a little tricky
> > though, but you can actually nest a raw_spinlock_t in there if you
> > absolutely have to.
> > 
> 
> We have at least two problems. First one is that state is per frequency
> domain (struct cpufreq_policy) and this usually spans more than one cpu.
> Second one is that we might need to sleep while servicing the frequency
> transition, both because platform needs to sleep and because some paths
> of cpufreq core use sleeping locks (yes, that might be changed as well I
> guess).  A solution based on spinlocks only might not be usable on
> platforms that needs to sleep, also.

Sure, if you need to actually sleep to poke the hardware you've lost and
you do indeed need the kthread thingy.

> Another thing that I was thinking of actually is that since struct
> cpufreq_policy is updated a lot (more or less at every frequency
> transition), is it actually suitable for RCU?

That entirely depends on how 'hard' it is to 'replace/change' the
cpufreq policy.

Typically I envision that to be very hard and require mutexes and the
like, in which case RCU can provide a cheap lookup and existence.

So on 'sane' hardware with per logical cpu hints you can get away
without any locks.
Juri Lelli Jan. 19, 2016, 4:01 p.m. UTC | #11
On 19/01/16 16:30, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 02:42:33PM +0000, Juri Lelli wrote:
> > On 19/01/16 15:00, Peter Zijlstra wrote:
> > > On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > > > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > > > DVFS transitions from the schedule() context. The frequency transitions
> > > > are write-side operations, as we invariably touch struct cpufreq_policy.
> > > > This means that the read-side stuff can live in the schedule() context,
> > > > but write-side needs to be kicked out to a thread.
> > > 
> > > Why? If the state is per-cpu and acquired by RCU, updates should be no
> > > problem at all.
> > > 
> > > If you need inter-cpu state, then things get to be a little tricky
> > > though, but you can actually nest a raw_spinlock_t in there if you
> > > absolutely have to.
> > > 
> > 
> > We have at least two problems. First one is that state is per frequency
> > domain (struct cpufreq_policy) and this usually spans more than one cpu.
> > Second one is that we might need to sleep while servicing the frequency
> > transition, both because platform needs to sleep and because some paths
> > of cpufreq core use sleeping locks (yes, that might be changed as well I
> > guess).  A solution based on spinlocks only might not be usable on
> > platforms that needs to sleep, also.
> 
> Sure, if you need to actually sleep to poke the hardware you've lost and
> you do indeed need the kthread thingy.
> 

Yeah, cpufreq also relies on blocking notifiers (to name one thing). So,
it seems to me quite a few things need to be changed to make it fully
non-sleeping.

> > Another thing that I was thinking of actually is that since struct
> > cpufreq_policy is updated a lot (more or less at every frequency
> > transition), is it actually suitable for RCU?
> 
> That entirely depends on how 'hard' it is to 'replace/change' the
> cpufreq policy.
> 
> Typically I envision that to be very hard and require mutexes and the
> like, in which case RCU can provide a cheap lookup and existence.
> 

Right, read path is fast, but write path still requires some sort of
locking (malloc, copy and update). So, I'm wondering if this still pays
off for a structure that gets written a lot.

> So on 'sane' hardware with per logical cpu hints you can get away
> without any locks.
> 

But maybe you are saying that there are ways we can make that work :).

Thanks,

- Juri
Peter Zijlstra Jan. 19, 2016, 7:17 p.m. UTC | #12
On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> Right, read path is fast, but write path still requires some sort of
> locking (malloc, copy and update). So, I'm wondering if this still pays
> off for a structure that gets written a lot.

No, not at all.

struct cpufreq_driver *driver;

void sched_util_change(unsigned int util)
{
	struct my_per_cpu_data *foo;

	rcu_read_lock();
	foo = __this_cpu_ptr(rcu_dereference(driver)->data);
	if (foo) {
		if (abs(util - foo->last_util) > 10) {
			foo->last_util = util;
			foo->set_util(util);
		}
	}
	rcu_read_unlock();
}


struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
{
	struct cpufreq_driver *old_driver;

	mutex_lock(&cpufreq_driver_lock);
	old_driver = driver;
	rcu_assign_driver(driver, new_driver);
	if (old_driver)
		synchronize_rcu();
	mutex_unlock(&cpufreq_driver_lock);

	return old_driver;
}



Peter Zijlstra Jan. 19, 2016, 7:21 p.m. UTC | #13
On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > Right, read path is fast, but write path still requires some sort of
> > locking (malloc, copy and update). So, I'm wondering if this still pays
> > off for a structure that gets written a lot.
> 
> No, not at all.
> 
> struct cpufreq_driver *driver;
> 
> void sched_util_change(unsigned int util)
> {
> 	struct my_per_cpu_data *foo;
> 
> 	rcu_read_lock();

That should obviously be:

	d = rcu_dereference(driver);
	if (d) {
		foo = __this_cpu_ptr(d->data);

> 		if (abs(util - foo->last_util) > 10) {
> 			foo->last_util = util;
> 			foo->set_util(util);
> 		}
> 	}
> 	rcu_read_unlock();
> }
> 
> 
> struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> {
> 	struct cpufreq_driver *old_driver;
> 
> 	mutex_lock(&cpufreq_driver_lock);
> 	old_driver = driver;
> 	rcu_assign_driver(driver, new_driver);
> 	if (old_driver)
> 		synchronize_rcu();
> 	mutex_unlock(&cpufreq_driver_lock);
> 
> 	return old_driver;
> }
> 
> 
> 
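
Put together, the read side would then be something like this (still just a
sketch, and rcu_assign_driver() in the update path is presumably meant to be
rcu_assign_pointer()):

	void sched_util_change(unsigned int util)
	{
		struct cpufreq_driver *d;
		struct my_per_cpu_data *foo;

		rcu_read_lock();
		d = rcu_dereference(driver);
		if (d) {
			foo = __this_cpu_ptr(d->data);
			if (abs(util - foo->last_util) > 10) {
				foo->last_util = util;
				foo->set_util(util);
			}
		}
		rcu_read_unlock();
	}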
Rafael J. Wysocki Jan. 19, 2016, 9:52 p.m. UTC | #14
On Tuesday, January 19, 2016 08:21:11 PM Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > > Right, read path is fast, but write path still requires some sort of
> > > locking (malloc, copy and update). So, I'm wondering if this still pays
> > > off for a structure that gets written a lot.
> > 
> > No, not at all.
> > 

This is very similar to what I was thinking about, plus-minus a couple of
things.

> > struct cpufreq_driver *driver;
> > 
> > void sched_util_change(unsigned int util)
> > {
> > 	struct my_per_cpu_data *foo;
> > 
> > 	rcu_read_lock();
> 
> That should obviously be:
> 
> 	d = rcu_dereference(driver);
> 	if (d) {
> 		foo = __this_cpu_ptr(d->data);

If we do this, it would be convenient to define ->set_util() to take
foo as an arg too, in addition to util.

And is there any particular reason why d->data has to be per-cpu?

> 
> > 		if (abs(util - foo->last_util) > 10) {

Even if the utilization doesn't change, it still may be too high or too low,
so we may want to call foo->set_util() in that case too, at least once in a
while.

> > 			foo->last_util = util;
> > 			foo->set_util(util);
> > 		}
> > 	}
> > 	rcu_read_unlock();
> > }
> > 
> > 
> > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > {
> > 	struct cpufreq_driver *old_driver;
> > 
> > 	mutex_lock(&cpufreq_driver_lock);
> > 	old_driver = driver;
> > 	rcu_assign_driver(driver, new_driver);
> > 	if (old_driver)
> > 		synchronize_rcu();
> > 	mutex_unlock(&cpufreq_driver_lock);
> > 
> > 	return old_driver;
> > }

We never need to do this, because we never replace one driver with another in
one go.  We need to go from a valid driver pointer to NULL and the other way
around only.

This means there may be other pointers around that may be accessed safely
from foo->set_util() above if there's a rule that they must be set before
the driver pointer and the data structures they point to must stay around
until the synchronize_rcu() returns.
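
That is, roughly (a sketch; setup_other_pointers() and free_other_pointers()
are just placeholders for whatever the driver and the core allocate):

	setup_other_pointers();			/* everything ->set_util() may touch */
	rcu_assign_pointer(driver, new_driver);	/* driver pointer goes live last */
	...
	RCU_INIT_POINTER(driver, NULL);
	synchronize_rcu();			/* no readers of the old data are left */
	free_other_pointers();			/* only now tear the rest down */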

Thanks,
Rafael

Juri Lelli Jan. 20, 2016, 12:59 p.m. UTC | #15
On 19/01/16 20:21, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > > Right, read path is fast, but write path still requires some sort of
> > > locking (malloc, copy and update). So, I'm wondering if this still pays
> > > off for a structure that gets written a lot.
> > 
> > No, not at all.
> > 
> > struct cpufreq_driver *driver;
> > 
> > void sched_util_change(unsigned int util)
> > {
> > 	struct my_per_cpu_data *foo;
> > 
> > 	rcu_read_lock();
> 
> That should obviously be:
> 
> 	d = rcu_dereference(driver);
> 	if (d) {
> 		foo = __this_cpu_ptr(d->data);
> 
> > 		if (abs(util - foo->last_util) > 10) {
> > 			foo->last_util = util;
> > 			foo->set_util(util);
> > 		}
> > 	}
> > 	rcu_read_unlock();
> > }
> > 
> > 
> > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > {
> > 	struct cpufreq_driver *old_driver;
> > 
> > 	mutex_lock(&cpufreq_driver_lock);
> > 	old_driver = driver;
> > 	rcu_assign_driver(driver, new_driver);
> > 	if (old_driver)
> > 		synchronize_rcu();
> > 	mutex_unlock(&cpufreq_driver_lock);
> > 
> > 	return old_driver;
> > }
> > 
> > 
> > 
> 

Right, this addresses the driver side (modulo what Rafael pointed out
about setting the driver pointer to NULL and then pointing it at the new
driver); and for this part I think RCU works well. I'm not concerned
about the driver side :).

Now, assuming that we move cpufreq_cpu_data inside cpufreq_driver (IIUC
this is your d->data), we will have per_cpu pointers pointing to the
different policies. Inside these policy data structures we have
information regarding current frequency, maximum allowed frequency, cpus
covered by this policy, and a few more. IIUC this is your foo thing.
Since the structure pointed to by foo will be shared amongst several
cpus, we need some way to guarantee mutual exclusion and such. I think
we were thinking of using RCU for this bit as well, and that is what I'm
concerned about, as the current frequency will change at every frequency
transition.

Maybe you are also implying that we need to change cpufreq_cpu_data as
well. I need to think more about that.
Peter Zijlstra Jan. 20, 2016, 5:04 p.m. UTC | #16
On Tue, Jan 19, 2016 at 10:52:22PM +0100, Rafael J. Wysocki wrote:
> This is very similar to what I was thinking about, plus-minus a couple of
> things.
> 
> > > struct cpufreq_driver *driver;
> > > 
> > > void sched_util_change(unsigned int util)
> > > {
> > > 	struct my_per_cpu_data *foo;
> > > 
> > > 	rcu_read_lock();
> > 
> > That should obviously be:
> > 
> > 	d = rcu_dereference(driver);
> > 	if (d) {
> > 		foo = __this_cpu_ptr(d->data);
> 
> If we do this, it would be convenient to define ->set_util() to take
> foo as an arg too, in addition to util.
> 
> And is there any particular reason why d->data has to be per-cpu?

Seems sensible, at best it actually is per cpu data, at worst this per
cpu pointer points to the same data for multiple cpus (the freq domain).

> > 
> > > 		if (abs(util - foo->last_util) > 10) {
> 
> Even if the utilization doesn't change, it still may be too high or too low,
> so we may want to call foo->set_util() in that case too, at least once a
> while.
> 
> > > 			foo->last_util = util;

Ah, the whole point of this was that ^^^ store.

Modifying the data structure doesn't need a new alloc / copy etc.. We
only use RCU to guarantee the data exists, once we have the data, the
data itself can be modified however.

Here it's strictly per-cpu data, so modifying it can be unserialized
since CPUs themselves are sequentially consistent.

If you have a freq domain with multiple CPUs in, you'll have to go stick
a lock in.
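
Something like this (a sketch, building on the earlier example; the per-cpu
slot would then point at data shared by every CPU in the domain):

	struct my_domain_data {
		raw_spinlock_t lock;	/* serializes CPUs in the same freq domain */
		unsigned int last_util;
		void (*set_util)(unsigned int util);
	};

and the update in sched_util_change() becomes:

		raw_spin_lock(&foo->lock);
		if (abs(util - foo->last_util) > 10) {
			foo->last_util = util;
			foo->set_util(util);
		}
		raw_spin_unlock(&foo->lock);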

> > > 			foo->set_util(util);
> > > 		}
> > > 	}
> > > 	rcu_read_unlock();
> > > }
> > > 
> > > 
> > > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > > {
> > > 	struct cpufreq_driver *old_driver;
> > > 
> > > 	mutex_lock(&cpufreq_driver_lock);
> > > 	old_driver = driver;
> > > 	rcu_assign_driver(driver, new_driver);
> > > 	if (old_driver)
> > > 		synchronize_rcu();
> > > 	mutex_unlock(&cpufreq_driver_lock);
> > > 
> > > 	return old_driver;
> > > }
> 
> We never need to do this, because we never replace one driver with another in
> one go.  We need to go from a valid driver pointer to NULL and the other way
> around only.

The above can do those transitions :-)

> This means there may be other pointers around that may be accessed safely
> from foo->set_util() above if there's a rule that they must be set before
> the driver pointer and the data structures they point to must stay around
> until the synchronize_rcu() returns.

I would dangle _everything_ off the one driver pointer, that's much
easier.
Rafael J. Wysocki Jan. 20, 2016, 10:12 p.m. UTC | #17
On Wed, Jan 20, 2016 at 6:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 19, 2016 at 10:52:22PM +0100, Rafael J. Wysocki wrote:
>> This is very similar to what I was thinking about, plus-minus a couple of
>> things.
>>
>> > > struct cpufreq_driver *driver;
>> > >
>> > > void sched_util_change(unsigned int util)
>> > > {
>> > >   struct my_per_cpu_data *foo;
>> > >
>> > >   rcu_read_lock();
>> >
>> > That should obviously be:
>> >
>> >     d = rcu_dereference(driver);
>> >     if (d) {
>> >             foo = __this_cpu_ptr(d->data);
>>
>> If we do this, it would be convenient to define ->set_util() to take
>> foo as an arg too, in addition to util.
>>
>> And is there any particular reason why d->data has to be per-cpu?
>
> Seems sensible, at best it actually is per cpu data, at worst this per
> cpu pointer points to the same data for multiple cpus (the freq domain).
>
>> >
>> > >           if (abs(util - foo->last_util) > 10) {
>>
>> Even if the utilization doesn't change, it still may be too high or too low,
>> so we may want to call foo->set_util() in that case too, at least once a
>> while.
>>
>> > >                   foo->last_util = util;
>
> Ah, the whole point of this was that ^^^ store.

OK, I see.

> Modifying the data structure doesn't need a new alloc / copy etc.. We
> only use RCU to guarantee the data exists, once we have the data, the
> data itself can be modified however.
>
> Here its strictly per-cpu data, so modifying it can be unserialized
> since CPUs themselves are sequentially consistent.

Right.  So that's why you want it to be per-cpu really.

> If you have a freq domain with multiple CPUs in, you'll have to go stick
> a lock in.

Right.

>> > >                   foo->set_util(util);
>> > >           }
>> > >   }
>> > >   rcu_read_unlock();
>> > > }
>> > >
>> > >
>> > > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
>> > > {
>> > >   struct cpufreq_driver *old_driver;
>> > >
>> > >   mutex_lock(&cpufreq_driver_lock);
>> > >   old_driver = driver;
>> > >   rcu_assign_driver(driver, new_driver);
>> > >   if (old_driver)
>> > >           synchronize_rcu();
>> > >   mutex_unlock(&cpufreq_driver_lock);
>> > >
>> > >   return old_driver;
>> > > }
>>
>> We never need to do this, because we never replace one driver with another in
>> one go.  We need to go from a valid driver pointer to NULL and the other way
>> around only.
>
> The above can do those transitions :-)

Yes, it can, but the real thing will probably be more complicated than
the code above and then the difference may actually matter.

>> This means there may be other pointers around that may be accessed safely
>> from foo->set_util() above if there's a rule that they must be set before
>> the driver pointer and the data structures they point to must stay around
>> until the syncronize_rcu() returns.
>
> I would dangle _everything_ off the one driver pointer, that's much
> easier.

I'm not sure how much easier it is in practice.

Even if everything dangles out of the driver pointer, data structures
pointed to by those things need not be allocated all in one go by the
same entity.  Some of them are allocated by drivers, some of them by
the core, at different times.  The ordering between those allocations
and populating the pointers is what matters, not how all that is laid
out in memory.

Thanks,
Rafael
Peter Zijlstra Jan. 20, 2016, 10:38 p.m. UTC | #18
On Wed, Jan 20, 2016 at 11:12:45PM +0100, Rafael J. Wysocki wrote:
> > I would dangle _everything_ off the one driver pointer, that's much
> > easier.
> 
> I'm not sure how much easier it is in practice.
> 
> Even if everything dangles out of the driver pointer, data structures
> pointed to by those things need not be allocated all in one go by the
> same entity.  Some of them are allocated by drivers, some of them by
> the core, at different times.

Yes, I've noticed, some of that is really bonkers.

> The ordering between those allocations
> and populating the pointers is what matters, not how all that is laid
> out in memory.

I'm thinking getting that ordering right is easier/more natural if it's
all contained in one object. But this could be subjective.
Rafael J. Wysocki Jan. 20, 2016, 11:33 p.m. UTC | #19
On Wed, Jan 20, 2016 at 11:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jan 20, 2016 at 11:12:45PM +0100, Rafael J. Wysocki wrote:
>> > I would dangle _everything_ off the one driver pointer, that's much
>> > easier.
>>
>> I'm not sure how much easier it is in practice.
>>
>> Even if everything dangles out of the driver pointer, data structures
>> pointed to by those things need not be allocated all in one go by the
>> same entity.  Some of them are allocated by drivers, some of them by
>> the core, at different times.
>
> Yes, I've noticed, some of that is really bonkers.
>
>> The ordering between those allocations
>> and populating the pointers is what matters, not how all that is laid
>> out in memory.
>
> I'm thinking getting that ordering right is easier/more natural, if its
> all contained in one object. But this could be subjective.

I'm trying to look at this from the perspective of making changes.

It should be possible to change the order in which the data
structures are populated and the pointers set without changing their
existing memory layout, which may allow us to minimize the amount of
changes to cpufreq drivers for old hardware (which is generally
difficult to test), for example.

Also, this way each individual change may be more limited in scope and
therefore less error prone IMO.

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6c9bef7..78b1e2f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -421,10 +421,15 @@  static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
 }
 
+/*
+ * Callers must ensure proper mutual exclusion on policy (for transition_
+ * ongoing/transition_task handling). While holding policy->rwsem is
+ * sufficient, other scheme might work as well (e.g., cpufreq_governor.c
+ * holds timer_mutex while entering the path that generates transitions).
+ */
 void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs)
 {
-
 	/*
 	 * Catch double invocations of _begin() which lead to self-deadlock.
 	 * ASYNC_NOTIFICATION drivers are left out because the cpufreq core
@@ -439,22 +444,19 @@  void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 wait:
 	wait_event(policy->transition_wait, !policy->transition_ongoing);
 
-	spin_lock(&policy->transition_lock);
-
-	if (unlikely(policy->transition_ongoing)) {
-		spin_unlock(&policy->transition_lock);
+	if (unlikely(policy->transition_ongoing))
 		goto wait;
-	}
 
 	policy->transition_ongoing = true;
 	policy->transition_task = current;
 
-	spin_unlock(&policy->transition_lock);
-
 	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 }
 EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin);
 
+/*
+ * As above, callers must ensure proper mutual exclusion on policy.
+ */
 void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, int transition_failed)
 {
@@ -1057,7 +1059,6 @@  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	kobject_init(&policy->kobj, &ktype_cpufreq);
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
-	spin_lock_init(&policy->transition_lock);
 	init_waitqueue_head(&policy->transition_wait);
 	init_completion(&policy->kobj_unregister);
 	INIT_WORK(&policy->update, handle_update);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 79b87ce..6bbb88f 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -105,7 +105,6 @@  struct cpufreq_policy {
 
 	/* Synchronization for frequency transitions */
 	bool			transition_ongoing; /* Tracks transition status */
-	spinlock_t		transition_lock;
 	wait_queue_head_t	transition_wait;
 	struct task_struct	*transition_task; /* Task which is doing the transition */