Message ID | 20190318131155.29450-2-jgross@suse.com (mailing list archive)
State      | Superseded
Series     | xen: simplify suspend/resume handling
On 18/03/2019 13:11, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/arch/x86/smpboot.c |  3 ---
>  xen/common/schedule.c  | 12 +++++-------
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 7d1226d7bc..b7a0a4a419 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -1221,9 +1221,6 @@ void __cpu_disable(void)
>      cpumask_clear_cpu(cpu, &cpu_online_map);
>      fixup_irqs(&cpu_online_map, 1);
>      fixup_eoi();
> -
> -    if ( cpu_disable_scheduler(cpu) )
> -        BUG();
>  }

It looks like ARM needs an equivalent adjustment.

>
>  void __cpu_die(unsigned int cpu)
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 60755a631e..665747f247 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -773,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
>  }
>
>  /*
> - * This function is used by cpu_hotplug code from stop_machine context
> + * This function is used by cpu_hotplug code via cpu notifier chain
>   * and from cpupools to switch schedulers on a cpu.
> + * Caller must get domlist_read_lock.

s/get/hold/ ?

With at least the ARM side adjusted, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

>   */
>  int cpu_disable_scheduler(unsigned int cpu)
>  {
> @@ -789,12 +790,6 @@ int cpu_disable_scheduler(unsigned int cpu)
>      if ( c == NULL )
>          return ret;
>
> -    /*
> -     * We'd need the domain RCU lock, but:
> -     * - when we are called from cpupool code, it's acquired there already;
> -     * - when we are called for CPU teardown, we're in stop-machine context,
> -     *   so that's not be a problem.
> -     */
>      for_each_domain_in_cpupool ( d, c )
>      {
>          for_each_vcpu ( d, v )
> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>          rc = cpu_schedule_up(cpu);
>          break;
>      case CPU_DEAD:
> +        rcu_read_lock(&domlist_read_lock);
> +        cpu_disable_scheduler(cpu);
> +        rcu_read_unlock(&domlist_read_lock);
>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>          /* Fallthrough */
>      case CPU_UP_CANCELED:
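For reference, the ARM-side adjustment Andrew asks for would plausibly be the
hunk below. This is a sketch only: it assumes xen/arch/arm/smpboot.c's
__cpu_disable() carries the same cpu_disable_scheduler()/BUG() pair as the
x86 version, and the context lines are illustrative rather than copied from
the tree.

    --- a/xen/arch/arm/smpboot.c
    +++ b/xen/arch/arm/smpboot.c
    @@ ... @@ void __cpu_disable(void)
         /* It is now safe to remove this processor from the online map */
         cpumask_clear_cpu(cpu, &cpu_online_map);
    -
    -    if ( cpu_disable_scheduler(cpu) )
    -        BUG();
         smp_mb();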
On 3/18/19 1:11 PM, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Scheduler change:

Acked-by: George Dunlap <george.dunlap@citrix.com>
>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>          rc = cpu_schedule_up(cpu);
>          break;
>      case CPU_DEAD:
> +        rcu_read_lock(&domlist_read_lock);
> +        cpu_disable_scheduler(cpu);
> +        rcu_read_unlock(&domlist_read_lock);
>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>          /* Fallthrough */
>      case CPU_UP_CANCELED:

cpu_disable_scheduler() has a return value (and hence means to
fail) - is ignoring this here really appropriate?

Also while indeed (as the description says) there's no need to
run the function on the CPU itself, it's not obvious to me that
it's safe to run it outside of stop_machine() context. Or to be
more precise, it's not clear to me that leaving stop_machine()
context with the adjustments not done yet is not going to
lead to problems (due to the gap between leaving that context
and acquiring the RCU lock). Could you clarify this in the
description, please (if it indeed is fine this way)?

Jan
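One way the first concern could be addressed, shown here as a hypothetical
sketch rather than anything from the posted patch, is to treat failure at
CPU_DEAD time as fatal, since the CPU is already offline and nothing can be
rolled back:

    case CPU_DEAD:
        rcu_read_lock(&domlist_read_lock);
        rc = cpu_disable_scheduler(cpu);   /* reuse the callback's local rc */
        rcu_read_unlock(&domlist_read_lock);
        BUG_ON(rc);                        /* assumed unrecoverable here */
        SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
        /* Fallthrough */

As the follow-up below shows, the author instead opts to handle the failing
cases inside cpu_disable_scheduler() itself.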
On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
>
So, what do you mean with "There is no need to call it on the cpu just
being disabled"?

Because we still (even after this patch, I mean) call
cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just that
right now we call it from __cpu_disable(), with the patch we call it
slightly later.

And another difference looks to me to be that right now we call
cpu_disable_scheduler() from stop-machine context, which I think is no
longer true with this patch. Perhaps the changelog could tell why that
is ok?

Regards,
Dario
On 27/03/2019 17:24, Dario Faggioli wrote:
> On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>> There is no need to call it on the cpu just being disabled, so use
>> the CPU_DEAD case of the cpu notifier chain.
>>
> So, what do you mean with "There is no need to call it on the cpu just
> being disabled"?
>
> Because we still (even after this patch, I mean) call
> cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just that
> right now we call it from __cpu_disable(), with the patch we call it
> slightly later.

The CPU_DEAD notifier chain is called on the CPU requesting the other
one to go down (so on the boot CPU in suspend case). So we call it _for_
all non-boot CPUs in the boot CPU.

> And another difference looks to me to be that right now we call
> cpu_disable_scheduler() from stop-machine context, which I think is no
> longer true with this patch. Perhaps the changelog could tell why that
> is ok?

Okay, I'll add something.


Juergen
On 27/03/2019 17:22, Jan Beulich wrote:
>>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
>> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>>          rc = cpu_schedule_up(cpu);
>>          break;
>>      case CPU_DEAD:
>> +        rcu_read_lock(&domlist_read_lock);
>> +        cpu_disable_scheduler(cpu);
>> +        rcu_read_unlock(&domlist_read_lock);
>>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>>          /* Fallthrough */
>>      case CPU_UP_CANCELED:
>
> cpu_disable_scheduler() has a return value (and hence means to
> fail) - is ignoring this here really appropriate?

You are right, I should handle those cases in cpu_disable_scheduler().

Without the complete series committed this will result in a case not
handled correctly: dom0 trying to pin a vcpu to a physical cpu other
than cpu 0 via SCHEDOP_pin_override and suspending in that state.
I'm not aware of any dom0 trying to do that.

> Also while indeed (as the description says) there's no need to
> run the function on the CPU itself, it's not obvious to me that
> it's safe to run it outside of stop_machine() context. Or to be
> more precise, it's not clear to me that leaving stop_machine()
> context with the adjustments not done yet is not going to
> lead to problems (due to the gap between leaving that context
> and acquiring the RCU lock). Could you clarify this in the
> description, please (if it indeed is fine this way)?

It is fine, as the chances are zero that any code will run on the cpu
just taken down and that cpu is not holding any locks we might need.
I'll add that to the commit message.


Juergen
On Wed, 2019-03-27 at 17:31 +0100, Juergen Gross wrote:
> On 27/03/2019 17:24, Dario Faggioli wrote:
> > On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
> > > cpu_disable_scheduler() is being called from __cpu_disable() today.
> > > There is no need to call it on the cpu just being disabled, so use
> > > the CPU_DEAD case of the cpu notifier chain.
> > >
> > So, what do you mean with "There is no need to call it on the cpu just
> > being disabled"?
> >
> > Because we still (even after this patch, I mean) call
> > cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just that
> > right now we call it from __cpu_disable(), with the patch we call it
> > slightly later.
>
> The CPU_DEAD notifier chain is called on the CPU requesting the other
> one to go down (so on the boot CPU in suspend case). So we call it _for_
> all non-boot CPUs in the boot CPU.
>
Mmm... ok, I see what you mean now.

I guess part of "the problem" is that "call func on cpu A" reads, at
least to me, as both 1) call func so that it acts on and change the
state of cpu A, and 2) call func in such a way that it executes on cpu
A.

But I'm no native speaker, so it may very well be that the confusion is
all and only mine.

Dario
On 27/03/2019 17:51, Dario Faggioli wrote:
> On Wed, 2019-03-27 at 17:31 +0100, Juergen Gross wrote:
>> On 27/03/2019 17:24, Dario Faggioli wrote:
>>> On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
>>>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>>>> There is no need to call it on the cpu just being disabled, so use
>>>> the CPU_DEAD case of the cpu notifier chain.
>>>>
>>> So, what do you mean with "There is no need to call it on the cpu just
>>> being disabled"?
>>>
>>> Because we still (even after this patch, I mean) call
>>> cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just that
>>> right now we call it from __cpu_disable(), with the patch we call it
>>> slightly later.
>>
>> The CPU_DEAD notifier chain is called on the CPU requesting the other
>> one to go down (so on the boot CPU in suspend case). So we call it _for_
>> all non-boot CPUs in the boot CPU.
>>
> Mmm... ok, I see what you mean now.
>
> I guess part of "the problem" is that "call func on cpu A" reads, at
> least to me, as both 1) call func so that it acts on and change the
> state of cpu A, and 2) call func in such a way that it executes on cpu
> A.

I'll rephrase to "execute func on cpu...".


Juergen
>>> On 27.03.19 at 17:45, <jgross@suse.com> wrote:
> On 27/03/2019 17:22, Jan Beulich wrote:
>> Also while indeed (as the description says) there's no need to
>> run the function on the CPU itself, it's not obvious to me that
>> it's safe to run it outside of stop_machine() context. Or to be
>> more precise, it's not clear to me that leaving stop_machine()
>> context with the adjustments not done yet is not going to
>> lead to problems (due to the gap between leaving that context
>> and acquiring the RCU lock). Could you clarify this in the
>> description, please (if it indeed is fine this way)?
>
> It is fine, as the chances are zero that any code will run on the cpu
> just taken down and that cpu is not holding any locks we might need.

Well, of course nothing's going to run on that CPU anymore.
But vCPU-s may still have associations with it, so what I'm
worried about is e.g. something finding v->processor pointing
at an offline CPU and getting confused. Another, more exotic
(or should I say contrived) scenario might be a soft-online
request coming very quickly after a prior soft-offline one, with
this function not having got around to run yet. Or basically
anything else that accesses the same state the function
means to update (or use).

Jan
On 27/03/2019 17:58, Jan Beulich wrote:
>>>> On 27.03.19 at 17:45, <jgross@suse.com> wrote:
>> On 27/03/2019 17:22, Jan Beulich wrote:
>>> Also while indeed (as the description says) there's no need to
>>> run the function on the CPU itself, it's not obvious to me that
>>> it's safe to run it outside of stop_machine() context. Or to be
>>> more precise, it's not clear to me that leaving stop_machine()
>>> context with the adjustments not done yet is not going to
>>> lead to problems (due to the gap between leaving that context
>>> and acquiring the RCU lock). Could you clarify this in the
>>> description, please (if it indeed is fine this way)?
>>
>> It is fine, as the chances are zero that any code will run on the cpu
>> just taken down and that cpu is not holding any locks we might need.
>
> Well, of course nothing's going to run on that CPU anymore.
> But vCPU-s may still have associations with it, so what I'm
> worried about is e.g. something finding v->processor pointing
> at an offline CPU and getting confused. Another, more exotic

v->processor is allowed to have a stale value as long as the vcpu isn't
running.

> (or should I say contrived) scenario might be a soft-online
> request coming very quickly after a prior soft-offline one, with
> this function not having got around to run yet. Or basically
> anything else that accesses the same state the function
> means to update (or use).

The CPU_DEAD notifier chain is activated before calling
cpu_hotplug_done(). I don't see how an online request could make it in
between.


Juergen
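The ordering Juergen refers to, as a simplified pseudocode sketch of the
generic hotplug path (assuming the then-current xen/common/cpu.c structure;
arguments and error handling elided):

    int cpu_down(unsigned int cpu)
    {
        cpu_hotplug_begin();                  /* serializes against cpu_up() */
        notifier_call_chain(CPU_DOWN_PREPARE, ...);
        stop_machine_run(take_cpu_down, NULL, cpu);  /* CPU_DYING runs here */
        __cpu_die(cpu);
        notifier_call_chain(CPU_DEAD, ...);   /* cpu_disable_scheduler() now */
        cpu_hotplug_done();                   /* only now may a new request begin */
        return 0;
    }

A soft-online request therefore cannot start before cpu_hotplug_done(), i.e.
not before all CPU_DEAD handlers have completed.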
On Wed, 2019-03-27 at 18:06 +0100, Juergen Gross wrote:
> On 27/03/2019 17:58, Jan Beulich wrote:
> >
> > Well, of course nothing's going to run on that CPU anymore.
> > But vCPU-s may still have associations with it, so what I'm
> > worried about is e.g. something finding v->processor pointing
> > at an offline CPU and getting confused. Another, more exotic
>
> v->processor is allowed to have a stale value as long as the vcpu
> isn't running.
>
Like Jan, I was also concerned about things no longer being done in
stop-machine context, and in fact I asked for similar reassurances.

Thinking more about this, I agree with Juergen that it should work.

In fact, it's important for v->processor to point to a CPU that has its
scheduling data structure allocated and valid. In the current approach,
right before suspend, that is only true for the BSP, and that's why we
move every vcpu there. But after this series, since we don't deallocate
such structs any longer, it should be ok that v->processor retains its
value.

Regards,
Dario
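Dario's invariant could be captured by a purely illustrative assertion
(hypothetical code, not from the series), at any point a non-running vcpu's
placement is consulted after resume:

    /* A stale v->processor is tolerable for a non-running vcpu, provided
     * the per-CPU scheduler data it points at is still allocated. */
    ASSERT(per_cpu(schedule_data, v->processor).sched_priv != NULL);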
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7d1226d7bc..b7a0a4a419 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1221,9 +1221,6 @@ void __cpu_disable(void)
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs(&cpu_online_map, 1);
     fixup_eoi();
-
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
 }
 
 void __cpu_die(unsigned int cpu)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 60755a631e..665747f247 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -773,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
 }
 
 /*
- * This function is used by cpu_hotplug code from stop_machine context
+ * This function is used by cpu_hotplug code via cpu notifier chain
  * and from cpupools to switch schedulers on a cpu.
+ * Caller must get domlist_read_lock.
  */
 int cpu_disable_scheduler(unsigned int cpu)
 {
@@ -789,12 +790,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     if ( c == NULL )
         return ret;
 
-    /*
-     * We'd need the domain RCU lock, but:
-     * - when we are called from cpupool code, it's acquired there already;
-     * - when we are called for CPU teardown, we're in stop-machine context,
-     *   so that's not be a problem.
-     */
     for_each_domain_in_cpupool ( d, c )
     {
         for_each_vcpu ( d, v )
@@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
         rc = cpu_schedule_up(cpu);
         break;
     case CPU_DEAD:
+        rcu_read_lock(&domlist_read_lock);
+        cpu_disable_scheduler(cpu);
+        rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
         /* Fallthrough */
     case CPU_UP_CANCELED:
cpu_disable_scheduler() is being called from __cpu_disable() today.
There is no need to call it on the cpu just being disabled, so use
the CPU_DEAD case of the cpu notifier chain.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/smpboot.c |  3 ---
 xen/common/schedule.c  | 12 +++++-------
 2 files changed, 5 insertions(+), 10 deletions(-)