Message ID | 20130218123920.26245.56709.stgit@srivatsabhat.in.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > Some important design requirements and considerations: > ----------------------------------------------------- > > 1. Scalable synchronization at the reader-side, especially in the fast-path > > Any synchronization at the atomic hotplug readers side must be highly > scalable - avoid global single-holder locks/counters etc. Because, these > paths currently use the extremely fast preempt_disable(); our replacement > to preempt_disable() should not become ridiculously costly and also should > not serialize the readers among themselves needlessly. > > At a minimum, the new APIs must be extremely fast at the reader side > atleast in the fast-path, when no CPU offline writers are active. > > 2. preempt_disable() was recursive. The replacement should also be recursive. > > 3. No (new) lock-ordering restrictions > > preempt_disable() was super-flexible. It didn't impose any ordering > restrictions or rules for nesting. Our replacement should also be equally > flexible and usable. > > 4. No deadlock possibilities > > Regular per-cpu locking is not the way to go if we want to have relaxed > rules for lock-ordering. Because, we can end up in circular-locking > dependencies as explained in https://lkml.org/lkml/2012/12/6/290 > > So, avoid the usual per-cpu locking schemes (per-cpu locks/per-cpu atomic > counters with spin-on-contention etc) as much as possible, to avoid > numerous deadlock possibilities from creeping in. > > > Implementation of the design: > ---------------------------- > > We use per-CPU reader-writer locks for synchronization because: > > a. They are quite fast and scalable in the fast-path (when no writers are > active), since they use fast per-cpu counters in those paths. > > b. They are recursive at the reader side. > > c. They provide a good amount of safety against deadlocks; they don't > spring new deadlock possibilities on us from out of nowhere. As a > result, they have relaxed locking rules and are quite flexible, and > thus are best suited for replacing usages of preempt_disable() or > local_irq_disable() at the reader side. > > Together, these satisfy all the requirements mentioned above. Thanks for this detailed design explanation. > +/* > + * Invoked by atomic hotplug reader (a task which wants to prevent > + * CPU offline, but which can't afford to sleep), to prevent CPUs from > + * going offline. So, you can call this function from atomic contexts > + * (including interrupt handlers). > + * > + * Note: This does NOT prevent CPUs from coming online! It only prevents > + * CPUs from going offline. > + * > + * You can call this function recursively. > + * > + * Returns with preemption disabled (but interrupts remain as they are; > + * they are not disabled). > + */ > +void get_online_cpus_atomic(void) > +{ > + percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); > +} > +EXPORT_SYMBOL_GPL(get_online_cpus_atomic); > + > +void put_online_cpus_atomic(void) > +{ > + percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock); > +} > +EXPORT_SYMBOL_GPL(put_online_cpus_atomic); So, you made it clear why you want a recursive read side here. I am wondering though, if you could take care of recursive uses in get/put_online_cpus_atomic() instead of doing it as a property of your rwlock: get_online_cpus_atomic() { unsigned long flags; local_irq_save(flags); if (this_cpu_inc_return(hotplug_recusion_count) == 1) percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); local_irq_restore(flags); } Once again, the idea there is to avoid baking the reader side recursive properties into your rwlock itself, so that it won't be impossible to implement reader/writer fairness into your rwlock in the future (which may be not be very important for the hotplug use, but could be when other uses get introduced).
On 02/18/2013 09:53 PM, Michel Lespinasse wrote: > On Mon, Feb 18, 2013 at 8:39 PM, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> Some important design requirements and considerations: >> ----------------------------------------------------- [...] >> +/* >> + * Invoked by atomic hotplug reader (a task which wants to prevent >> + * CPU offline, but which can't afford to sleep), to prevent CPUs from >> + * going offline. So, you can call this function from atomic contexts >> + * (including interrupt handlers). >> + * >> + * Note: This does NOT prevent CPUs from coming online! It only prevents >> + * CPUs from going offline. >> + * >> + * You can call this function recursively. >> + * >> + * Returns with preemption disabled (but interrupts remain as they are; >> + * they are not disabled). >> + */ >> +void get_online_cpus_atomic(void) >> +{ >> + percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); >> +} >> +EXPORT_SYMBOL_GPL(get_online_cpus_atomic); >> + >> +void put_online_cpus_atomic(void) >> +{ >> + percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock); >> +} >> +EXPORT_SYMBOL_GPL(put_online_cpus_atomic); > > So, you made it clear why you want a recursive read side here. > > I am wondering though, if you could take care of recursive uses in > get/put_online_cpus_atomic() instead of doing it as a property of your > rwlock: > > get_online_cpus_atomic() > { > unsigned long flags; > local_irq_save(flags); > if (this_cpu_inc_return(hotplug_recusion_count) == 1) > percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); > local_irq_restore(flags); > } > > Once again, the idea there is to avoid baking the reader side > recursive properties into your rwlock itself, so that it won't be > impossible to implement reader/writer fairness into your rwlock in the > future (which may be not be very important for the hotplug use, but > could be when other uses get introduced). > Hmm, your proposal above looks good to me, at first glance. (Sorry, I had mistaken your earlier mails to mean that you were against recursive reader-side, while you actually meant that you didn't like implementing the recursive reader-side logic using the recursive property of rwlocks). While your idea above looks good, it might introduce more complexity in the unlock path, since this would allow nesting of heterogeneous readers (ie., if hotplug_recursion_count == 1, you don't know whether you need to simply decrement the counter or unlock the rwlock as well). But I'll give this some more thought to see if we can implement this without making it too complex. Thank you! Regards, Srivatsa S. Bhat
On Tue, Feb 19, 2013 at 12:43 AM, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > On 02/18/2013 09:53 PM, Michel Lespinasse wrote: >> I am wondering though, if you could take care of recursive uses in >> get/put_online_cpus_atomic() instead of doing it as a property of your >> rwlock: >> >> get_online_cpus_atomic() >> { >> unsigned long flags; >> local_irq_save(flags); >> if (this_cpu_inc_return(hotplug_recusion_count) == 1) >> percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); >> local_irq_restore(flags); >> } >> >> Once again, the idea there is to avoid baking the reader side >> recursive properties into your rwlock itself, so that it won't be >> impossible to implement reader/writer fairness into your rwlock in the >> future (which may be not be very important for the hotplug use, but >> could be when other uses get introduced). > > Hmm, your proposal above looks good to me, at first glance. > (Sorry, I had mistaken your earlier mails to mean that you were against > recursive reader-side, while you actually meant that you didn't like > implementing the recursive reader-side logic using the recursive property > of rwlocks). To be honest, I was replying as I went through the series, so I hadn't digested your hotplug use case yet :) But yes - I don't like having the rwlock itself be recursive, but I do understand that you have a legitimate requirement for get_online_cpus_atomic() to be recursive. This IMO points to the direction I suggested, of explicitly handling the recusion in get_online_cpus_atomic() so that the underlying rwlock doesn't have to support recursive reader side itself. (And this would work for the idea of making writers own the reader side as well - you can do it with the hotplug_recursion_count instead of with the underlying rwlock). > While your idea above looks good, it might introduce more complexity > in the unlock path, since this would allow nesting of heterogeneous readers > (ie., if hotplug_recursion_count == 1, you don't know whether you need to > simply decrement the counter or unlock the rwlock as well). Well, I think the idea doesn't make the underlying rwlock more complex, since you could in principle keep your existing percpu_read_lock_irqsafe implementation as is and just remove the recursive behavior from its documentation. Now ideally if we're adding a bit of complexity in get_online_cpus_atomic() it'd be nice if we could remove some from percpu_read_lock_irqsafe, but I haven't thought about that deeply either. I think you'd still want to have the equivalent of a percpu reader_refcnt, except it could now be a bool instead of an int, and percpu_read_lock_irqsafe would still set it to back to 0/false after acquiring the global read side if a writer is signaled. Basically your existing percpu_read_lock_irqsafe code should still work, and we could remove just the parts that were only there to deal with the recursive property.
On 02/18/2013 10:51 PM, Michel Lespinasse wrote: > On Tue, Feb 19, 2013 at 12:43 AM, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> On 02/18/2013 09:53 PM, Michel Lespinasse wrote: >>> I am wondering though, if you could take care of recursive uses in >>> get/put_online_cpus_atomic() instead of doing it as a property of your >>> rwlock: >>> >>> get_online_cpus_atomic() >>> { >>> unsigned long flags; >>> local_irq_save(flags); >>> if (this_cpu_inc_return(hotplug_recusion_count) == 1) >>> percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); >>> local_irq_restore(flags); >>> } >>> >>> Once again, the idea there is to avoid baking the reader side >>> recursive properties into your rwlock itself, so that it won't be >>> impossible to implement reader/writer fairness into your rwlock in the >>> future (which may be not be very important for the hotplug use, but >>> could be when other uses get introduced). >> >> Hmm, your proposal above looks good to me, at first glance. >> (Sorry, I had mistaken your earlier mails to mean that you were against >> recursive reader-side, while you actually meant that you didn't like >> implementing the recursive reader-side logic using the recursive property >> of rwlocks). > > To be honest, I was replying as I went through the series, so I hadn't > digested your hotplug use case yet :) > > But yes - I don't like having the rwlock itself be recursive, but I do > understand that you have a legitimate requirement for > get_online_cpus_atomic() to be recursive. This IMO points to the > direction I suggested, of explicitly handling the recusion in > get_online_cpus_atomic() so that the underlying rwlock doesn't have to > support recursive reader side itself. > > (And this would work for the idea of making writers own the reader > side as well - you can do it with the hotplug_recursion_count instead > of with the underlying rwlock). > >> While your idea above looks good, it might introduce more complexity >> in the unlock path, since this would allow nesting of heterogeneous readers >> (ie., if hotplug_recursion_count == 1, you don't know whether you need to >> simply decrement the counter or unlock the rwlock as well). > > Well, I think the idea doesn't make the underlying rwlock more > complex, since you could in principle keep your existing > percpu_read_lock_irqsafe implementation as is and just remove the > recursive behavior from its documentation. > > Now ideally if we're adding a bit of complexity in > get_online_cpus_atomic() it'd be nice if we could remove some from > percpu_read_lock_irqsafe, but I haven't thought about that deeply > either. I think you'd still want to have the equivalent of a percpu > reader_refcnt, except it could now be a bool instead of an int, and > percpu_read_lock_irqsafe would still set it to back to 0/false after > acquiring the global read side if a writer is signaled. Basically your > existing percpu_read_lock_irqsafe code should still work, and we could > remove just the parts that were only there to deal with the recursive > property. > But, the whole intention behind removing the parts depending on the recursive property of rwlocks would be to make it easier to make rwlocks fair (going forward) right? Then, that won't work for CPU hotplug, because, just like we have a legitimate reason to have recursive get_online_cpus_atomic(), we also have a legitimate reason to have unfairness in locking (i.e., for deadlock-safety). So we simply can't afford to make the locking fair - we'll end up in too many deadlock possibilities, as hinted in the changelog of patch 1. (Remember, we are replacing preempt_disable(), which had absolutely no special nesting rules or locking implications. That is why we are forced to provide maximum locking flexibility and safety against new/previously non-existent deadlocks, in the new synchronization scheme). So the only long-term solution I can think of is to decouple percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing our own unfair locking scheme inside. What do you think? Regards, Srivatsa S. Bhat
On Tue, Feb 19, 2013 at 2:50 AM, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > But, the whole intention behind removing the parts depending on the > recursive property of rwlocks would be to make it easier to make rwlocks > fair (going forward) right? Then, that won't work for CPU hotplug, because, > just like we have a legitimate reason to have recursive > get_online_cpus_atomic(), we also have a legitimate reason to have > unfairness in locking (i.e., for deadlock-safety). So we simply can't > afford to make the locking fair - we'll end up in too many deadlock > possibilities, as hinted in the changelog of patch 1. Grumpf - I hadn't realized that making the underlying rwlock fair would break your hotplug use case. But you are right, it would. Oh well :/ > So the only long-term solution I can think of is to decouple > percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing > our own unfair locking scheme inside. What do you think? I have no idea how hard would it be to change get_online_cpus_atomic() call sites so that the hotplug rwlock read side has a defined order vs other locks (thus making sure the situation you describe in patch 1 doesn't happen). I agree we shouldn't base our short term plans around that, but maybe that's doable in the long term ??? Otherwise, I think we should add some big-fat-warning that percpu rwlocks don't have reader/writer fairness, that the hotplug use case actually depends on the unfairness / would break if the rwlock was made fair, and that any new uses of percpu rwlocks should be very carefully considered because of the reader/writer fairness issues. Maybe even give percpu rwlocks a less generic sounding name, given how constrained they are by the hotplug use case.
On 02/19/2013 03:10 PM, Michel Lespinasse wrote: > On Tue, Feb 19, 2013 at 2:50 AM, Srivatsa S. Bhat > <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> But, the whole intention behind removing the parts depending on the >> recursive property of rwlocks would be to make it easier to make rwlocks >> fair (going forward) right? Then, that won't work for CPU hotplug, because, >> just like we have a legitimate reason to have recursive >> get_online_cpus_atomic(), we also have a legitimate reason to have >> unfairness in locking (i.e., for deadlock-safety). So we simply can't >> afford to make the locking fair - we'll end up in too many deadlock >> possibilities, as hinted in the changelog of patch 1. > > Grumpf - I hadn't realized that making the underlying rwlock fair > would break your hotplug use case. But you are right, it would. Oh > well :/ > Yeah :-/ >> So the only long-term solution I can think of is to decouple >> percpu-rwlocks and rwlock_t (like what Tejun suggested) by implementing >> our own unfair locking scheme inside. What do you think? > > I have no idea how hard would it be to change get_online_cpus_atomic() > call sites so that the hotplug rwlock read side has a defined order vs > other locks (thus making sure the situation you describe in patch 1 > doesn't happen). I agree we shouldn't base our short term plans around > that, but maybe that's doable in the long term ??? > I think it should be possible in the longer term. I'm expecting it to be *much much* harder to audit and convert (requiring a lot of subsystem knowledge of each subsystem that we are touching), than the simpler tree-wide conversion that I did in this patchset... but I don't think it is impossible. > Otherwise, I think we should add some big-fat-warning that percpu > rwlocks don't have reader/writer fairness, that the hotplug use case > actually depends on the unfairness / would break if the rwlock was > made fair, and that any new uses of percpu rwlocks should be very > carefully considered because of the reader/writer fairness issues. In fact, when I started out, I actually contained all the new locking code inside CPU hotplug itself, and didn't even expose it as a generic percpu rwlock in some of the previous versions of this patchset... :-) But now that we already have a generic locking scheme exposed, we could add a warning against using it without due consideration. > Maybe even give percpu rwlocks a less generic sounding name, given how > constrained they are by the hotplug use case. I wouldn't go that far... ;-) Unfairness is not a show-stopper right? IMHO, the warning/documentation should suffice for anybody wanting to try out this locking scheme for other use-cases. Regards, Srivatsa S. Bhat
> I wouldn't go that far... ;-) Unfairness is not a show-stopper right? > IMHO, the warning/documentation should suffice for anybody wanting to > try out this locking scheme for other use-cases. I presume that by 'fairness' you mean 'write preference'? I'd not sure how difficult it would be, but maybe have two functions for acquiring the lock for read, one blocks if there is a writer waiting, the other doesn't. That way you can change the individual call sites separately. The other place I can imagine a per-cpu rwlock being used is to allow a driver to disable 'sleep' or software controlled hardware removal while it performs a sequence of operations. David
On 02/19/2013 04:12 PM, David Laight wrote: >> I wouldn't go that far... ;-) Unfairness is not a show-stopper right? >> IMHO, the warning/documentation should suffice for anybody wanting to >> try out this locking scheme for other use-cases. > > I presume that by 'fairness' you mean 'write preference'? > Yep. > I'd not sure how difficult it would be, but maybe have two functions > for acquiring the lock for read, one blocks if there is a writer > waiting, the other doesn't. > > That way you can change the individual call sites separately. > Right, we could probably use that method to change the call sites in multiple stages, in the future. > The other place I can imagine a per-cpu rwlock being used > is to allow a driver to disable 'sleep' or software controlled > hardware removal while it performs a sequence of operations. > BTW, per-cpu rwlocks use spinlocks underneath, so they can be used only in atomic contexts (you can't sleep holding this lock). So that would probably make it less attractive or useless to "heavy-weight" usecases like the latter one you mentioned. They probably need to use per-cpu rw-semaphore or some such, which allows sleeping. I'm not very certain of the exact usecases you are talking about, but I just wanted to point out that percpu-rwlocks might not be applicable to many scenarios. ..(which might be a good thing, considering its unfair property today). Regards, Srivatsa S. Bhat
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 67874b8..cb6b94b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1616,6 +1616,7 @@ config NR_CPUS config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP && HOTPLUG + select PERCPU_RWLOCK help Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig index b6f3ad5..83d9882 100644 --- a/arch/blackfin/Kconfig +++ b/arch/blackfin/Kconfig @@ -261,6 +261,7 @@ config NR_CPUS config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP && HOTPLUG + select PERCPU_RWLOCK default y config BF_REV_MIN diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 3279646..c246772 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -378,6 +378,7 @@ config HOTPLUG_CPU bool "Support for hot-pluggable CPUs (EXPERIMENTAL)" depends on SMP && EXPERIMENTAL select HOTPLUG + select PERCPU_RWLOCK default n ---help--- Say Y here to experiment with turning CPUs off and on. CPUs diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 2ac626a..f97c479 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -956,6 +956,7 @@ config SYS_HAS_EARLY_PRINTK config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP && HOTPLUG && SYS_SUPPORTS_HOTPLUG_CPU + select PERCPU_RWLOCK help Say Y here to allow turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. diff --git a/arch/mn10300/Kconfig b/arch/mn10300/Kconfig index e70001c..a64e488 100644 --- a/arch/mn10300/Kconfig +++ b/arch/mn10300/Kconfig @@ -60,6 +60,7 @@ config ARCH_HAS_ILOG2_U32 config HOTPLUG_CPU def_bool n + select PERCPU_RWLOCK source "init/Kconfig" diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index b77feff..6f55cd4 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -226,6 +226,7 @@ config HOTPLUG_CPU bool default y if SMP select HOTPLUG + select PERCPU_RWLOCK config ARCH_SELECT_MEMORY_MODEL def_bool y diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 17903f1..56b1f15 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -336,6 +336,7 @@ config HOTPLUG_CPU bool "Support for enabling/disabling CPUs" depends on SMP && HOTPLUG && EXPERIMENTAL && (PPC_PSERIES || \ PPC_PMAC || PPC_POWERNV || (PPC_85xx && !PPC_E500MC)) + select PERCPU_RWLOCK ---help--- Say Y here to be able to disable and re-enable individual CPUs at runtime on SMP machines. diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index b5ea38c..a9aafb4 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -299,6 +299,7 @@ config HOTPLUG_CPU prompt "Support for hot-pluggable CPUs" depends on SMP select HOTPLUG + select PERCPU_RWLOCK help Say Y here to be able to turn CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu/cpu#. diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index babc2b8..8c92eef 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -765,6 +765,7 @@ config NR_CPUS config HOTPLUG_CPU bool "Support for hot-pluggable CPUs (EXPERIMENTAL)" depends on SMP && HOTPLUG && EXPERIMENTAL + select PERCPU_RWLOCK help Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index cb9c333..b22f29d 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -254,6 +254,7 @@ config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SPARC64 && SMP select HOTPLUG + select PERCPU_RWLOCK help Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu/cpu#. diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 225543b..1a6d50d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1689,6 +1689,7 @@ config PHYSICAL_ALIGN config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP && HOTPLUG + select PERCPU_RWLOCK ---help--- Say Y here to allow turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu. diff --git a/include/linux/cpu.h b/include/linux/cpu.h index ce7a074..cf24da1 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys; extern void get_online_cpus(void); extern void put_online_cpus(void); +extern void get_online_cpus_atomic(void); +extern void put_online_cpus_atomic(void); #define hotcpu_notifier(fn, pri) cpu_notifier(fn, pri) #define register_hotcpu_notifier(nb) register_cpu_notifier(nb) #define unregister_hotcpu_notifier(nb) unregister_cpu_notifier(nb) @@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void) #define get_online_cpus() do { } while (0) #define put_online_cpus() do { } while (0) +#define get_online_cpus_atomic() do { } while (0) +#define put_online_cpus_atomic() do { } while (0) #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 3046a50..58dd1df 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1,6 +1,18 @@ /* CPU control. * (C) 2001, 2002, 2003, 2004 Rusty Russell * + * Rework of the CPU hotplug offline mechanism to remove its dependence on + * the heavy-weight stop_machine() primitive, by Srivatsa S. Bhat and + * Paul E. McKenney. + * + * Copyright (C) IBM Corporation, 2012-2013 + * Authors: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> + * Paul E. McKenney <paulmck@linux.vnet.ibm.com> + * + * With lots of invaluable suggestions from: + * Oleg Nesterov <oleg@redhat.com> + * Tejun Heo <tj@kernel.org> + * * This code is licenced under the GPL. */ #include <linux/proc_fs.h> @@ -19,6 +31,7 @@ #include <linux/mutex.h> #include <linux/gfp.h> #include <linux/suspend.h> +#include <linux/percpu-rwlock.h> #include "smpboot.h" @@ -133,6 +146,38 @@ static void cpu_hotplug_done(void) mutex_unlock(&cpu_hotplug.lock); } +/* + * Per-CPU Reader-Writer lock to synchronize between atomic hotplug + * readers and the CPU offline hotplug writer. + */ +DEFINE_STATIC_PERCPU_RWLOCK(hotplug_pcpu_rwlock); + +/* + * Invoked by atomic hotplug reader (a task which wants to prevent + * CPU offline, but which can't afford to sleep), to prevent CPUs from + * going offline. So, you can call this function from atomic contexts + * (including interrupt handlers). + * + * Note: This does NOT prevent CPUs from coming online! It only prevents + * CPUs from going offline. + * + * You can call this function recursively. + * + * Returns with preemption disabled (but interrupts remain as they are; + * they are not disabled). + */ +void get_online_cpus_atomic(void) +{ + percpu_read_lock_irqsafe(&hotplug_pcpu_rwlock); +} +EXPORT_SYMBOL_GPL(get_online_cpus_atomic); + +void put_online_cpus_atomic(void) +{ + percpu_read_unlock_irqsafe(&hotplug_pcpu_rwlock); +} +EXPORT_SYMBOL_GPL(put_online_cpus_atomic); + #else /* #if CONFIG_HOTPLUG_CPU */ static void cpu_hotplug_begin(void) {} static void cpu_hotplug_done(void) {} @@ -246,15 +291,21 @@ struct take_cpu_down_param { static int __ref take_cpu_down(void *_param) { struct take_cpu_down_param *param = _param; + unsigned long flags; int err; + percpu_write_lock_irqsave(&hotplug_pcpu_rwlock, &flags); + /* Ensure this CPU doesn't handle any more interrupts. */ err = __cpu_disable(); if (err < 0) - return err; + goto out; cpu_notify(CPU_DYING | param->mod, param->hcpu); - return 0; + +out: + percpu_write_unlock_irqrestore(&hotplug_pcpu_rwlock, &flags); + return err; } /* Requires cpu_add_remove_lock to be held */