Message ID | 1452533760-13787-19-git-send-email-juri.lelli@arm.com (mailing list archive) |
---|---|
State | RFC |
On 11-01-16, 17:35, Juri Lelli wrote:
> From: Michael Turquette <mturquette@baylibre.com>
>
> transition_lock was introduced to serialize cpufreq transition
> notifiers. Instead of using a different lock for protecting concurrent
> modifications of policy, it is better to require that callers of
> transition notifiers implement appropriate locking (this is already the
> case AFAICS). Removing transition_lock also simplifies current locking
> scheme.

So, are you saying that the reasoning mentioned in this patch are all
wrong?

commit 12478cf0c55e ("cpufreq: Make sure frequency transitions are
serialized")
Hi Viresh,

Quoting Viresh Kumar (2016-01-12 03:24:09)
> On 11-01-16, 17:35, Juri Lelli wrote:
> > From: Michael Turquette <mturquette@baylibre.com>
> >
> > transition_lock was introduced to serialize cpufreq transition
> > notifiers. Instead of using a different lock for protecting concurrent
> > modifications of policy, it is better to require that callers of
> > transition notifiers implement appropriate locking (this is already the
> > case AFAICS). Removing transition_lock also simplifies current locking
> > scheme.
>
> So, are you saying that the reasoning mentioned in this patch are all
> wrong?
>
> commit 12478cf0c55e ("cpufreq: Make sure frequency transitions are
> serialized")

No, that's not what I'm saying. Quoting that patch:

"""
The key challenge is to allow drivers to begin the transition from one
thread and end it in a completely different thread (this is to enable
drivers that do asynchronous POSTCHANGE notification from bottom-halves,
to also use the same interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock'
spinlock and a wait-queue are added per-policy. The flag and the
wait-queue are used in conjunction to create an "uninterrupted flow"
from _begin() to _end(). The spinlock is used to ensure that only one
such "flow" is in flight at any given time. Put together, this provides
us all the necessary synchronization.
"""

So the transition_ongoing flag and wait-queue are all good. That stuff is
just great. This patch doesn't touch it.

What it does change is that it removes a superfluous spinlock that should
never have needed to exist in the first place.

cpufreq_freq_transition_begin is called directly by driver target
callbacks, and it is called by __cpufreq_driver_target.
__cpufreq_driver_target should be using a per-policy lock. Any other
behavior is just insane.

I haven't gone through this thread to see if that change has been made by
Juri, but we need to get there either in this series or the follow-up
series that introduces some RCU locking.

Regards,
Mike

> --
> viresh
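As an illustration of the calling convention Mike describes, here is a
minimal sketch of a caller providing the per-policy mutual exclusion
itself; the wrapper name set_policy_freq() is made up for the example,
while policy->rwsem, __cpufreq_driver_target() and CPUFREQ_RELATION_L are
the existing cpufreq symbols:

static int set_policy_freq(struct cpufreq_policy *policy, unsigned int target_freq)
{
        int ret;

        /* Per-policy lock taken by the caller, not inside the transition helpers. */
        down_write(&policy->rwsem);
        ret = __cpufreq_driver_target(policy, target_freq, CPUFREQ_RELATION_L);
        up_write(&policy->rwsem);

        return ret;
}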
On 12-01-16, 16:54, Michael Turquette wrote:
> __cpufreq_driver_target should be using a per-policy lock.
It doesn't :)
Hi,

On 13/01/16 10:21, Michael Turquette wrote:
> Hi Viresh,
>
> Quoting Viresh Kumar (2016-01-12 22:31:48)
> > On 12-01-16, 16:54, Michael Turquette wrote:
> > > __cpufreq_driver_target should be using a per-policy lock.
> >
> > It doesn't :)
>
> It should.
>
> A less conceited response is that a per-policy lock should be held
> around calls to __cpufreq_driver_target. This can obviously be done by
> cpufreq_driver_target (no double underscore), but there are quite a few
> drivers that call __cpufreq_driver_target, and since we're touching the
> policy structure we need a lock around it.
>

Agreed, we should enforce the rule that everything that touches the
policy structure has to lock it first.

> Juri's cover letter did not explicitly state my original, full intention
> for the patches I was working on. I'll spell that out below and
> hopefully we can gather consensus around it before moving forward. Juri,
> I'm implicitly assuming that you agree with the stuff below, but please
> correct me if I am wrong.

Right. I decided to post only a subset of the patches we came up with in
this RFC, because I needed to build some more confidence with the
subsystem I was going to propose changes for. Review comments received
are helping me on that front. I didn't mention the next steps (RCU) at
all because I wanted to focus on understanding, documenting and, where
required, fixing the current state of things before we change it.

> The original idea for overhauling the locking in cpufreq is to use two
> locks:
>
> 1) per-policy lock (my patches were using a mutex), which is the only
> lock that needs to be touched during a frequency transition. We do not
> want any lock contention between policy's during freq transition. For
> read-side operation this locking scheme should utilize RCU so that the
> scheduler can safely access the values in struct cpufreq_policy within
> it's schedule() context. [a note on RCU below]
>
> 2) a single, framework-wide lock (my patches were using a mutex) that
> handles all of the other synchronization: governor events, driver events
> and anything else that does not happen on a per-policy basis. I don't
> think RCU is necessary for this. These operations are all slow-path ones
> so reducing the mess of 6-ish locks in cpufreq.c and friends down to a
> single mutex simplifies things greatly, eliminates the "drop the lock
> here for a few instructions" hacks and generally makes things more
> readable.
>

This is basically what I also have on top of this series. I actually went
for RCU for 2) as well, but yes, that's maybe overkill.

A comment on 1) above, and something I got stuck on for some time: if we
implement the RCU logic as it is supposed to be used, I think we can
generate a lot of copy-update operations when changing frequency (as the
policy structure needs to be changed). Also, we might read stale data.
So, I'm not sure this will pay off. However, I tried to get around this
problem and I guess we will discuss whether 1) is doable in the next
RFC :-).

> A quick note on RCU and the scheduler-driven DVFS stuff: RCU only helps
> us on read-side operations. For the purposes of sched-dvfs, this means
> that when we look at capacity utilization and want to normalize
> frequency based on that, we need to access the per-policy structure in a
> lockless way. RCU makes this possible.
>
> RCU is absolutely not a magic bullet or elixir that lets us kick off
> DVFS transitions from the schedule() context. The frequency transitions
> are write-side operations, as we invariably touch struct cpufreq_policy.
> This means that the read-side stuff can live in the schedule() context,
> but write-side needs to be kicked out to a thread.
>

Correct. We will still need the kthread machinery even after these
changes.

Thanks for clarifying things!

Best,

- Juri
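As a rough sketch of point 2) above, the single framework-wide lock for
slow-path work could look like the following; cpufreq_framework_lock and
the wrapper function are hypothetical names, only the
policy->governor->governor() callback reflects the cpufreq API of that
time:

static DEFINE_MUTEX(cpufreq_framework_lock);    /* hypothetical single slow-path lock */

static int cpufreq_governor_event(struct cpufreq_policy *policy, unsigned int event)
{
        int ret = -EINVAL;

        /*
         * Governor start/stop, driver events and other non-per-policy work
         * would all serialize here instead of on half a dozen separate locks.
         */
        mutex_lock(&cpufreq_framework_lock);
        if (policy->governor)
                ret = policy->governor->governor(policy, event);
        mutex_unlock(&cpufreq_framework_lock);

        return ret;
}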
On 13-01-16, 10:21, Michael Turquette wrote:
> Quoting Viresh Kumar (2016-01-12 22:31:48)
> > On 12-01-16, 16:54, Michael Turquette wrote:
> > > __cpufreq_driver_target should be using a per-policy lock.
> >
> > It doesn't :)
>
> It should.

I thought we wanted the routine doing DVFS to not sleep as it will be
called from scheduler ?

Looks fine otherwise. But yeah, the series is still incomplete in the
sense that there is no lock today around __cpufreq_driver_target().
On 14/01/16 16:02, Viresh Kumar wrote:
> On 13-01-16, 10:21, Michael Turquette wrote:
> > Quoting Viresh Kumar (2016-01-12 22:31:48)
> > > On 12-01-16, 16:54, Michael Turquette wrote:
> > > > __cpufreq_driver_target should be using a per-policy lock.
> > >
> > > It doesn't :)
> >
> > It should.
>
> I thought we wanted the routine doing DVFS to not sleep as it will be
> called from scheduler ?
>
> Looks fine otherwise. But yeah, the series is still incomplete in the
> sense that there is no lock today around __cpufreq_driver_target().
>

I was under the impression that __cpufreq_driver_target() is exported
outside cpufreq.c on the understanding that its users implement their
own locking.

That's why I put the following comment in this patch.

/*
 * Callers must ensure proper mutual exclusion on policy (for transition_
 * ongoing/transition_task handling). While holding policy->rwsem is
 * sufficient, other schemes might work as well (e.g., cpufreq_governor.c
 * holds timer_mutex while entering the path that generates transitions).
 */

From what I can see, ondemand and conservative (via cpufreq_governor.c)
seem to use timer_mutex; userspace uses userspace_mutex instead. Do they
serve different purposes? How do we currently serialize operations on
policy when using __cpufreq_driver_target() directly otherwise?

Thanks,

- Juri
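For reference, the userspace governor pattern Juri mentions looks roughly
like this (paraphrased from cpufreq_userspace.c of that era with checks
omitted, so details may differ):

static DEFINE_MUTEX(userspace_mutex);

static int cpufreq_set(struct cpufreq_policy *policy, unsigned int freq)
{
        int ret;

        /* userspace_mutex is what serializes this transition path. */
        mutex_lock(&userspace_mutex);
        ret = __cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
        mutex_unlock(&userspace_mutex);

        return ret;
}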
On 14-01-16, 13:52, Juri Lelli wrote:
> I was under the impression that __cpufreq_driver_target() is exported
> outside cpufreq.c on the understanding that its users implement their
> own locking.
>
> That's why I put the following comment in this patch.
>
> /*
>  * Callers must ensure proper mutual exclusion on policy (for transition_
>  * ongoing/transition_task handling). While holding policy->rwsem is
>  * sufficient, other schemes might work as well (e.g., cpufreq_governor.c
>  * holds timer_mutex while entering the path that generates transitions).
>  */
>
> From what I can see, ondemand and conservative (via cpufreq_governor.c)
> seem to use timer_mutex; userspace uses userspace_mutex instead. Do they
> serve different purposes? How do we currently serialize operations on
> policy when using __cpufreq_driver_target() directly otherwise?

The patch I referred to earlier in the thread detailed a few of the races
we were worried about. The lock you just removed is responsible for
taking care of the races you are worried about now :)
On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> RCU is absolutely not a magic bullet or elixir that lets us kick off
> DVFS transitions from the schedule() context. The frequency transitions
> are write-side operations, as we invariably touch struct cpufreq_policy.
> This means that the read-side stuff can live in the schedule() context,
> but write-side needs to be kicked out to a thread.

Why? If the state is per-cpu and acquired by RCU, updates should be no
problem at all.

If you need inter-cpu state, then things get to be a little tricky
though, but you can actually nest a raw_spinlock_t in there if you
absolutely have to.
On 19/01/16 15:00, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > DVFS transitions from the schedule() context. The frequency transitions
> > are write-side operations, as we invariably touch struct cpufreq_policy.
> > This means that the read-side stuff can live in the schedule() context,
> > but write-side needs to be kicked out to a thread.
>
> Why? If the state is per-cpu and acquired by RCU, updates should be no
> problem at all.
>
> If you need inter-cpu state, then things get to be a little tricky
> though, but you can actually nest a raw_spinlock_t in there if you
> absolutely have to.
>

We have at least two problems. First one is that state is per frequency
domain (struct cpufreq_policy) and this usually spans more than one cpu.
Second one is that we might need to sleep while servicing the frequency
transition, both because platform needs to sleep and because some paths
of cpufreq core use sleeping locks (yes, that might be changed as well I
guess). A solution based on spinlocks only might not be usable on
platforms that needs to sleep, also.

Another thing that I was thinking of actually is that since struct
cpufreq_policy is updated a lot (more or less at every frequency
transition), is it actually suitable for RCU?

Best,

- Juri
On Tue, Jan 19, 2016 at 02:42:33PM +0000, Juri Lelli wrote:
> On 19/01/16 15:00, Peter Zijlstra wrote:
> > On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > > DVFS transitions from the schedule() context. The frequency transitions
> > > are write-side operations, as we invariably touch struct cpufreq_policy.
> > > This means that the read-side stuff can live in the schedule() context,
> > > but write-side needs to be kicked out to a thread.
> >
> > Why? If the state is per-cpu and acquired by RCU, updates should be no
> > problem at all.
> >
> > If you need inter-cpu state, then things get to be a little tricky
> > though, but you can actually nest a raw_spinlock_t in there if you
> > absolutely have to.
> >
>
> We have at least two problems. First one is that state is per frequency
> domain (struct cpufreq_policy) and this usually spans more than one cpu.
> Second one is that we might need to sleep while servicing the frequency
> transition, both because platform needs to sleep and because some paths
> of cpufreq core use sleeping locks (yes, that might be changed as well I
> guess). A solution based on spinlocks only might not be usable on
> platforms that needs to sleep, also.

Sure, if you need to actually sleep to poke the hardware you've lost and
you do indeed need the kthread thingy.

> Another thing that I was thinking of actually is that since struct
> cpufreq_policy is updated a lot (more or less at every frequency
> transition), is it actually suitable for RCU?

That entirely depends on how 'hard' it is to 'replace/change' the
cpufreq policy.

Typically I envision that to be very hard and require mutexes and the
like, in which case RCU can provide a cheap lookup and existence.

So on 'sane' hardware with per logical cpu hints you can get away
without any locks.
On 19/01/16 16:30, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 02:42:33PM +0000, Juri Lelli wrote:
> > On 19/01/16 15:00, Peter Zijlstra wrote:
> > > On Wed, Jan 13, 2016 at 10:21:31AM -0800, Michael Turquette wrote:
> > > > RCU is absolutely not a magic bullet or elixir that lets us kick off
> > > > DVFS transitions from the schedule() context. The frequency transitions
> > > > are write-side operations, as we invariably touch struct cpufreq_policy.
> > > > This means that the read-side stuff can live in the schedule() context,
> > > > but write-side needs to be kicked out to a thread.
> > >
> > > Why? If the state is per-cpu and acquired by RCU, updates should be no
> > > problem at all.
> > >
> > > If you need inter-cpu state, then things get to be a little tricky
> > > though, but you can actually nest a raw_spinlock_t in there if you
> > > absolutely have to.
> > >
> >
> > We have at least two problems. First one is that state is per frequency
> > domain (struct cpufreq_policy) and this usually spans more than one cpu.
> > Second one is that we might need to sleep while servicing the frequency
> > transition, both because platform needs to sleep and because some paths
> > of cpufreq core use sleeping locks (yes, that might be changed as well I
> > guess). A solution based on spinlocks only might not be usable on
> > platforms that needs to sleep, also.
>
> Sure, if you need to actually sleep to poke the hardware you've lost and
> you do indeed need the kthread thingy.
>

Yeah, also cpufreq relies on blocking notifiers (to name one thing). So,
it seems to me quite some things need to be changed to make it fully
non-sleeping.

> > Another thing that I was thinking of actually is that since struct
> > cpufreq_policy is updated a lot (more or less at every frequency
> > transition), is it actually suitable for RCU?
>
> That entirely depends on how 'hard' it is to 'replace/change' the
> cpufreq policy.
>
> Typically I envision that to be very hard and require mutexes and the
> like, in which case RCU can provide a cheap lookup and existence.
>

Right, read path is fast, but write path still requires some sort of
locking (malloc, copy and update). So, I'm wondering if this still pays
off for a structure that gets written a lot.

> So on 'sane' hardware with per logical cpu hints you can get away
> without any locks.
>

But maybe you are saying that there are ways we can make that work :).

Thanks,

- Juri
On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> Right, read path is fast, but write path still requires some sort of
> locking (malloc, copy and update). So, I'm wondering if this still pays
> off for a structure that gets written a lot.

No, not at all.

struct cpufreq_driver *driver;

void sched_util_change(unsigned int util)
{
        struct my_per_cpu_data *foo;

        rcu_read_lock();
        foo = __this_cpu_ptr(rcu_dereference(driver)->data);
        if (foo) {
                if (abs(util - foo->last_util) > 10) {
                        foo->last_util = util;
                        foo->set_util(util);
                }
        }
        rcu_read_unlock();
}


struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
{
        struct cpufreq_driver *old_driver;

        mutex_lock(&cpufreq_driver_lock);
        old_driver = driver;
        rcu_assign_driver(driver, new_driver);
        if (old_driver)
                synchronize_rcu();
        mutex_unlock(&cpufreq_driver_lock);

        return old_driver;
}
On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > Right, read path is fast, but write path still requires some sort of
> > locking (malloc, copy and update). So, I'm wondering if this still pays
> > off for a structure that gets written a lot.
>
> No, not at all.
>
> struct cpufreq_driver *driver;
>
> void sched_util_change(unsigned int util)
> {
>         struct my_per_cpu_data *foo;
>
>         rcu_read_lock();

That should obviously be:

        d = rcu_dereference(driver);
        if (d) {
                foo = __this_cpu_ptr(d->data);

>                 if (abs(util - foo->last_util) > 10) {
>                         foo->last_util = util;
>                         foo->set_util(util);
>                 }
>         }
>         rcu_read_unlock();
> }
>
>
> struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> {
>         struct cpufreq_driver *old_driver;
>
>         mutex_lock(&cpufreq_driver_lock);
>         old_driver = driver;
>         rcu_assign_driver(driver, new_driver);
>         if (old_driver)
>                 synchronize_rcu();
>         mutex_unlock(&cpufreq_driver_lock);
>
>         return old_driver;
> }
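Folding the correction back into the sketch, the read side becomes the
following (driver, my_per_cpu_data and set_util are the illustrative names
from the sketch above, not real cpufreq symbols):

void sched_util_change(unsigned int util)
{
        struct cpufreq_driver *d;
        struct my_per_cpu_data *foo;

        rcu_read_lock();
        d = rcu_dereference(driver);            /* existence guaranteed by RCU */
        if (d) {
                foo = __this_cpu_ptr(d->data);  /* per-cpu (or per-domain) state */
                if (abs(util - foo->last_util) > 10) {
                        foo->last_util = util;  /* plain store, no copy/update */
                        foo->set_util(util);
                }
        }
        rcu_read_unlock();
}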
On Tuesday, January 19, 2016 08:21:11 PM Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > > Right, read path is fast, but write path still requires some sort of
> > > locking (malloc, copy and update). So, I'm wondering if this still pays
> > > off for a structure that gets written a lot.
> >
> > No, not at all.

This is very similar to what I was thinking about, plus-minus a couple of
things.

> > struct cpufreq_driver *driver;
> >
> > void sched_util_change(unsigned int util)
> > {
> >         struct my_per_cpu_data *foo;
> >
> >         rcu_read_lock();
>
> That should obviously be:
>
>         d = rcu_dereference(driver);
>         if (d) {
>                 foo = __this_cpu_ptr(d->data);

If we do this, it would be convenient to define ->set_util() to take
foo as an arg too, in addition to util.

And is there any particular reason why d->data has to be per-cpu?

> >                 if (abs(util - foo->last_util) > 10) {

Even if the utilization doesn't change, it still may be too high or too low,
so we may want to call foo->set_util() in that case too, at least once a
while.

> >                         foo->last_util = util;
> >                         foo->set_util(util);
> >                 }
> >         }
> >         rcu_read_unlock();
> > }
> >
> >
> > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > {
> >         struct cpufreq_driver *old_driver;
> >
> >         mutex_lock(&cpufreq_driver_lock);
> >         old_driver = driver;
> >         rcu_assign_driver(driver, new_driver);
> >         if (old_driver)
> >                 synchronize_rcu();
> >         mutex_unlock(&cpufreq_driver_lock);
> >
> >         return old_driver;
> > }

We never need to do this, because we never replace one driver with another in
one go. We need to go from a valid driver pointer to NULL and the other way
around only.

This means there may be other pointers around that may be accessed safely
from foo->set_util() above if there's a rule that they must be set before
the driver pointer and the data structures they point to must stay around
until the syncronize_rcu() returns.

Thanks,
Rafael
On 19/01/16 20:21, Peter Zijlstra wrote:
> On Tue, Jan 19, 2016 at 08:17:34PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 19, 2016 at 04:01:55PM +0000, Juri Lelli wrote:
> > > Right, read path is fast, but write path still requires some sort of
> > > locking (malloc, copy and update). So, I'm wondering if this still pays
> > > off for a structure that gets written a lot.
> >
> > No, not at all.
> >
> > struct cpufreq_driver *driver;
> >
> > void sched_util_change(unsigned int util)
> > {
> >         struct my_per_cpu_data *foo;
> >
> >         rcu_read_lock();
>
> That should obviously be:
>
>         d = rcu_dereference(driver);
>         if (d) {
>                 foo = __this_cpu_ptr(d->data);
>
> >                 if (abs(util - foo->last_util) > 10) {
> >                         foo->last_util = util;
> >                         foo->set_util(util);
> >                 }
> >         }
> >         rcu_read_unlock();
> > }
> >
> >
> > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > {
> >         struct cpufreq_driver *old_driver;
> >
> >         mutex_lock(&cpufreq_driver_lock);
> >         old_driver = driver;
> >         rcu_assign_driver(driver, new_driver);
> >         if (old_driver)
> >                 synchronize_rcu();
> >         mutex_unlock(&cpufreq_driver_lock);
> >
> >         return old_driver;
> > }
> >

Right, this addresses the driver side (modulo what Rafael pointed out
about setting the driver pointer to NULL and then pointing it to the new
driver); and for this part I think RCU works well. I'm not concerned
about the driver side :).

Now, assuming that we move cpufreq_cpu_data inside cpufreq_driver (IIUC
this is your d->data), we will have per_cpu pointers pointing to the
different policies. Inside these policy data structures we have
information regarding current frequency, maximum allowed frequency, cpus
covered by this policy, and a few more. IIUC this is your foo thing.

Since the structure pointed to by foo will be shared among several cpus,
we need some way to guarantee mutual exclusion and such. I think we were
thinking to use RCU for this bit as well, and that is what I'm concerned
about, as the current frequency will change at every frequency
transition.

Maybe you are also implying that we need to change cpufreq_cpu_data as
well. I need to think more about that.
On Tue, Jan 19, 2016 at 10:52:22PM +0100, Rafael J. Wysocki wrote:
> This is very similar to what I was thinking about, plus-minus a couple of
> things.
>
> > > struct cpufreq_driver *driver;
> > >
> > > void sched_util_change(unsigned int util)
> > > {
> > >         struct my_per_cpu_data *foo;
> > >
> > >         rcu_read_lock();
> >
> > That should obviously be:
> >
> >         d = rcu_dereference(driver);
> >         if (d) {
> >                 foo = __this_cpu_ptr(d->data);
>
> If we do this, it would be convenient to define ->set_util() to take
> foo as an arg too, in addition to util.
>
> And is there any particular reason why d->data has to be per-cpu?

Seems sensible, at best it actually is per cpu data, at worst this per
cpu pointer points to the same data for multiple cpus (the freq domain).

> > >                 if (abs(util - foo->last_util) > 10) {
>
> Even if the utilization doesn't change, it still may be too high or too low,
> so we may want to call foo->set_util() in that case too, at least once a
> while.
>
> > >                         foo->last_util = util;

Ah, the whole point of this was that ^^^ store.

Modifying the data structure doesn't need a new alloc / copy etc.. We
only use RCU to guarantee the data exists, once we have the data, the
data itself can be modified however.

Here its strictly per-cpu data, so modifying it can be unserialized
since CPUs themselves are sequentially consistent.

If you have a freq domain with multiple CPUs in, you'll have to go stick
a lock in.

> > >                         foo->set_util(util);
> > >                 }
> > >         }
> > >         rcu_read_unlock();
> > > }
> > >
> > >
> > > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
> > > {
> > >         struct cpufreq_driver *old_driver;
> > >
> > >         mutex_lock(&cpufreq_driver_lock);
> > >         old_driver = driver;
> > >         rcu_assign_driver(driver, new_driver);
> > >         if (old_driver)
> > >                 synchronize_rcu();
> > >         mutex_unlock(&cpufreq_driver_lock);
> > >
> > >         return old_driver;
> > > }
>
> We never need to do this, because we never replace one driver with another in
> one go. We need to go from a valid driver pointer to NULL and the other way
> around only.

The above can do those transitions :-)

> This means there may be other pointers around that may be accessed safely
> from foo->set_util() above if there's a rule that they must be set before
> the driver pointer and the data structures they point to must stay around
> until the syncronize_rcu() returns.

I would dangle _everything_ off the one driver pointer, that's much
easier.
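For the multi-CPU frequency domain case Peter mentions, "stick a lock in"
could look roughly like this, reusing the illustrative names from the
sketch above and adding a hypothetical raw_spinlock_t to the shared data:

struct my_per_cpu_data {
        raw_spinlock_t  lock;           /* serializes CPUs sharing the freq domain */
        unsigned int    last_util;
        void            (*set_util)(unsigned int util);
};

void sched_util_change(unsigned int util)
{
        struct cpufreq_driver *d;
        struct my_per_cpu_data *foo;

        rcu_read_lock();                        /* existence of driver and its data */
        d = rcu_dereference(driver);
        if (d) {
                foo = __this_cpu_ptr(d->data);  /* may point at shared per-domain data */
                raw_spin_lock(&foo->lock);      /* writers in the same domain serialize */
                if (abs(util - foo->last_util) > 10) {
                        foo->last_util = util;
                        foo->set_util(util);
                }
                raw_spin_unlock(&foo->lock);
        }
        rcu_read_unlock();
}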
On Wed, Jan 20, 2016 at 6:04 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 19, 2016 at 10:52:22PM +0100, Rafael J. Wysocki wrote:
>> This is very similar to what I was thinking about, plus-minus a couple of
>> things.
>>
>> > > struct cpufreq_driver *driver;
>> > >
>> > > void sched_util_change(unsigned int util)
>> > > {
>> > >         struct my_per_cpu_data *foo;
>> > >
>> > >         rcu_read_lock();
>> >
>> > That should obviously be:
>> >
>> >         d = rcu_dereference(driver);
>> >         if (d) {
>> >                 foo = __this_cpu_ptr(d->data);
>>
>> If we do this, it would be convenient to define ->set_util() to take
>> foo as an arg too, in addition to util.
>>
>> And is there any particular reason why d->data has to be per-cpu?
>
> Seems sensible, at best it actually is per cpu data, at worst this per
> cpu pointer points to the same data for multiple cpus (the freq domain).
>
>> > >                 if (abs(util - foo->last_util) > 10) {
>>
>> Even if the utilization doesn't change, it still may be too high or too low,
>> so we may want to call foo->set_util() in that case too, at least once a
>> while.
>>
>> > >                         foo->last_util = util;
>
> Ah, the whole point of this was that ^^^ store.

OK, I see.

> Modifying the data structure doesn't need a new alloc / copy etc.. We
> only use RCU to guarantee the data exists, once we have the data, the
> data itself can be modified however.
>
> Here its strictly per-cpu data, so modifying it can be unserialized
> since CPUs themselves are sequentially consistent.

Right. So that's why you want it to be per-cpu really.

> If you have a freq domain with multiple CPUs in, you'll have to go stick
> a lock in.

Right.

>> > >                         foo->set_util(util);
>> > >                 }
>> > >         }
>> > >         rcu_read_unlock();
>> > > }
>> > >
>> > >
>> > > struct cpufreq_driver *cpufreq_flip_driver(struct cpufreq_driver *new_driver)
>> > > {
>> > >         struct cpufreq_driver *old_driver;
>> > >
>> > >         mutex_lock(&cpufreq_driver_lock);
>> > >         old_driver = driver;
>> > >         rcu_assign_driver(driver, new_driver);
>> > >         if (old_driver)
>> > >                 synchronize_rcu();
>> > >         mutex_unlock(&cpufreq_driver_lock);
>> > >
>> > >         return old_driver;
>> > > }
>>
>> We never need to do this, because we never replace one driver with another in
>> one go. We need to go from a valid driver pointer to NULL and the other way
>> around only.
>
> The above can do those transitions :-)

Yes, it can, but the real thing will probably be more complicated than
the code above and then the difference may actually matter.

>> This means there may be other pointers around that may be accessed safely
>> from foo->set_util() above if there's a rule that they must be set before
>> the driver pointer and the data structures they point to must stay around
>> until the syncronize_rcu() returns.
>
> I would dangle _everything_ off the one driver pointer, that's much
> easier.

I'm not sure how much easier it is in practice.

Even if everything dangles out of the driver pointer, data structures
pointed to by those things need not be allocated all in one go by the
same entity. Some of them are allocated by drivers, some of them by
the core, at different times. The ordering between those allocations
and populating the pointers is what matters, not how all that is laid
out in memory.

Thanks,
Rafael
On Wed, Jan 20, 2016 at 11:12:45PM +0100, Rafael J. Wysocki wrote:
> > I would dangle _everything_ off the one driver pointer, that's much
> > easier.
>
> I'm not sure how much easier it is in practice.
>
> Even if everything dangles out of the driver pointer, data structures
> pointed to by those things need not be allocated all in one go by the
> same entity. Some of them are allocated by drivers, some of them by
> the core, at different times.

Yes, I've noticed, some of that is really bonkers.

> The ordering between those allocations
> and populating the pointers is what matters, not how all that is laid
> out in memory.

I'm thinking getting that ordering right is easier/more natural, if its
all contained in one object. But this could be subjective.
On Wed, Jan 20, 2016 at 11:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jan 20, 2016 at 11:12:45PM +0100, Rafael J. Wysocki wrote:
>> > I would dangle _everything_ off the one driver pointer, that's much
>> > easier.
>>
>> I'm not sure how much easier it is in practice.
>>
>> Even if everything dangles out of the driver pointer, data structures
>> pointed to by those things need not be allocated all in one go by the
>> same entity. Some of them are allocated by drivers, some of them by
>> the core, at different times.
>
> Yes, I've noticed, some of that is really bonkers.
>
>> The ordering between those allocations
>> and populating the pointers is what matters, not how all that is laid
>> out in memory.
>
> I'm thinking getting that ordering right is easier/more natural, if its
> all contained in one object. But this could be subjective.

I'm trying to look at this from the perspective of making changes.

It should be possible to change the ordering of how the data structures
are populated and pointers set without changing the existing memory
layout of them, which may allow us to minimize the amount of changes to
cpufreq drivers for old hardware (and therefore generally difficult to
test), for example.

Also, this way each individual change may be more limited in scope and
therefore less error prone IMO.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6c9bef7..78b1e2f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -421,10 +421,15 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 	cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
 }
 
+/*
+ * Callers must ensure proper mutual exclusion on policy (for transition_
+ * ongoing/transition_task handling). While holding policy->rwsem is
+ * sufficient, other scheme might work as well (e.g., cpufreq_governor.c
+ * holds timer_mutex while entering the path that generates transitions).
+ */
 void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs)
 {
-
 	/*
 	 * Catch double invocations of _begin() which lead to self-deadlock.
 	 * ASYNC_NOTIFICATION drivers are left out because the cpufreq core
@@ -439,22 +444,19 @@ void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 wait:
 	wait_event(policy->transition_wait, !policy->transition_ongoing);
 
-	spin_lock(&policy->transition_lock);
-
-	if (unlikely(policy->transition_ongoing)) {
-		spin_unlock(&policy->transition_lock);
+	if (unlikely(policy->transition_ongoing))
 		goto wait;
-	}
 
 	policy->transition_ongoing = true;
 	policy->transition_task = current;
 
-	spin_unlock(&policy->transition_lock);
-
 	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 }
 EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin);
 
+/*
+ * As above, callers must ensure proper mutual exclusion on policy.
+ */
 void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, int transition_failed)
 {
@@ -1057,7 +1059,6 @@ static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	kobject_init(&policy->kobj, &ktype_cpufreq);
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
-	spin_lock_init(&policy->transition_lock);
 	init_waitqueue_head(&policy->transition_wait);
 	init_completion(&policy->kobj_unregister);
 	INIT_WORK(&policy->update, handle_update);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 79b87ce..6bbb88f 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -105,7 +105,6 @@ struct cpufreq_policy {
 
 	/* Synchronization for frequency transitions */
 	bool			transition_ongoing; /* Tracks transition status */
-	spinlock_t		transition_lock;
 	wait_queue_head_t	transition_wait;
 	struct task_struct	*transition_task; /* Task which is doing the transition */
 