Message ID | 20200228071922.3983-1-jgross@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] xen: make sure stop_machine_run() is always called in a tasklet | expand |
On 28.02.2020 08:19, Juergen Gross wrote: > With core scheduling active it is mandatory for stop_machine_run() to > be called in idle context only (so either during boot or in a tasklet), > as otherwise a scheduling deadlock would occur: stop_machine_run() > does a cpu rendezvous by activating a tasklet on all other cpus. In > case stop_machine_run() was not called in an idle vcpu it would block > scheduling the idle vcpu on its siblings with core scheduling being > active, resulting in a hang. > > Put a BUG_ON() into stop_machine_run() to test for being called in an > idle vcpu only and adapt the missing call site (ucode loading) to use a > tasklet for calling stop_machine_run(). > > Signed-off-by: Juergen Gross <jgross@suse.com> > --- > V2: > - rephrase commit message (Julien Grall) > --- > xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ > xen/common/stop_machine.c | 1 + > 2 files changed, 35 insertions(+), 20 deletions(-) There's no mention anywhere of a connection to your RCU series, nor to rcu_barrier(). Yet the change puts a new restriction also on its use, and iirc this was mentioned in prior discussion. Did I miss anything? > @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) > return ret; > } > > +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) > +{ > + int ret; > + struct ucode_buf *buffer; > + > + if ( len != (uint32_t)len ) > + return -E2BIG; > + > + if ( microcode_ops == NULL ) > + return -EINVAL; > + > + buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len); > + if ( !buffer ) > + return -ENOMEM; > + > + ret = copy_from_guest(buffer->buffer, buf, len); > + if ( ret ) > + { > + xfree(buffer); > + return -EFAULT; > + } > + buffer->len = len; > + > + return continue_hypercall_on_cpu(0, microcode_update_helper, buffer); > +} Andrew, just to clarify - were you okay with Jürgen's response regarding this re-introduction of continue_hypercall_on_cpu() here? Jan
On 28.02.20 09:27, Jan Beulich wrote: > On 28.02.2020 08:19, Juergen Gross wrote: >> With core scheduling active it is mandatory for stop_machine_run() to >> be called in idle context only (so either during boot or in a tasklet), >> as otherwise a scheduling deadlock would occur: stop_machine_run() >> does a cpu rendezvous by activating a tasklet on all other cpus. In >> case stop_machine_run() was not called in an idle vcpu it would block >> scheduling the idle vcpu on its siblings with core scheduling being >> active, resulting in a hang. >> >> Put a BUG_ON() into stop_machine_run() to test for being called in an >> idle vcpu only and adapt the missing call site (ucode loading) to use a >> tasklet for calling stop_machine_run(). >> >> Signed-off-by: Juergen Gross <jgross@suse.com> >> --- >> V2: >> - rephrase commit message (Julien Grall) >> --- >> xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ >> xen/common/stop_machine.c | 1 + >> 2 files changed, 35 insertions(+), 20 deletions(-) > > There's no mention anywhere of a connection to your RCU series, > nor to rcu_barrier(). Yet the change puts a new restriction also > on its use, and iirc this was mentioned in prior discussion. Did > I miss anything? Basically this patch makes the restriction explicit. Without the patch stop_machine_run() being called outside of a tasklet would just hang with core scheduling being active. Now it will catch those violations early even with core scheduling inactive. Note that currently there are no violations of this restriction anywhere in the hypervisor other than the one addressed by this patch. Juergen
On 28.02.2020 09:58, Jürgen Groß wrote: > On 28.02.20 09:27, Jan Beulich wrote: >> On 28.02.2020 08:19, Juergen Gross wrote: >>> With core scheduling active it is mandatory for stop_machine_run() to >>> be called in idle context only (so either during boot or in a tasklet), >>> as otherwise a scheduling deadlock would occur: stop_machine_run() >>> does a cpu rendezvous by activating a tasklet on all other cpus. In >>> case stop_machine_run() was not called in an idle vcpu it would block >>> scheduling the idle vcpu on its siblings with core scheduling being >>> active, resulting in a hang. >>> >>> Put a BUG_ON() into stop_machine_run() to test for being called in an >>> idle vcpu only and adapt the missing call site (ucode loading) to use a >>> tasklet for calling stop_machine_run(). >>> >>> Signed-off-by: Juergen Gross <jgross@suse.com> >>> --- >>> V2: >>> - rephrase commit message (Julien Grall) >>> --- >>> xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ >>> xen/common/stop_machine.c | 1 + >>> 2 files changed, 35 insertions(+), 20 deletions(-) >> >> There's no mention anywhere of a connection to your RCU series, >> nor to rcu_barrier(). Yet the change puts a new restriction also >> on its use, and iirc this was mentioned in prior discussion. Did >> I miss anything? > > Basically this patch makes the restriction explicit. Without the patch > stop_machine_run() being called outside of a tasklet would just hang > with core scheduling being active. Now it will catch those violations > early even with core scheduling inactive. > > Note that currently there are no violations of this restriction anywhere > in the hypervisor other than the one addressed by this patch. Well, there is a connection to core scheduling. Without it, i.e. prior to 4.13, there was no restriction on the placement of rcu_barrier() invocations. According to what you say above, the restriction was implicitly introduced with core scheduling. It should imo be made explicit by attaching a comment, which would (again imo) best be done here because now you also make this case crash without core scheduling in use. Jan
On 28.02.20 10:15, Jan Beulich wrote: > On 28.02.2020 09:58, Jürgen Groß wrote: >> On 28.02.20 09:27, Jan Beulich wrote: >>> On 28.02.2020 08:19, Juergen Gross wrote: >>>> With core scheduling active it is mandatory for stop_machine_run() to >>>> be called in idle context only (so either during boot or in a tasklet), >>>> as otherwise a scheduling deadlock would occur: stop_machine_run() >>>> does a cpu rendezvous by activating a tasklet on all other cpus. In >>>> case stop_machine_run() was not called in an idle vcpu it would block >>>> scheduling the idle vcpu on its siblings with core scheduling being >>>> active, resulting in a hang. >>>> >>>> Put a BUG_ON() into stop_machine_run() to test for being called in an >>>> idle vcpu only and adapt the missing call site (ucode loading) to use a >>>> tasklet for calling stop_machine_run(). >>>> >>>> Signed-off-by: Juergen Gross <jgross@suse.com> >>>> --- >>>> V2: >>>> - rephrase commit message (Julien Grall) >>>> --- >>>> xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ >>>> xen/common/stop_machine.c | 1 + >>>> 2 files changed, 35 insertions(+), 20 deletions(-) >>> >>> There's no mention anywhere of a connection to your RCU series, >>> nor to rcu_barrier(). Yet the change puts a new restriction also >>> on its use, and iirc this was mentioned in prior discussion. Did >>> I miss anything? >> >> Basically this patch makes the restriction explicit. Without the patch >> stop_machine_run() being called outside of a tasklet would just hang >> with core scheduling being active. Now it will catch those violations >> early even with core scheduling inactive. >> >> Note that currently there are no violations of this restriction anywhere >> in the hypervisor other than the one addressed by this patch. > > Well, there is a connection to core scheduling. Without it, i.e. > prior to 4.13, there was no restriction on the placement of > rcu_barrier() invocations. According to what you say above, the > restriction was implicitly introduced with core scheduling. It > should imo be made explicit by attaching a comment, which would > (again imo) best be done here because now you also make this > case crash without core scheduling in use. Okay, I'll add a comment to stop_machine_run() and to rcu_barrier(). The rcu_barrier() comment will be then removed by my rcu series again. Juergen
On 28.02.2020 10:18, Jürgen Groß wrote: > On 28.02.20 10:15, Jan Beulich wrote: >> On 28.02.2020 09:58, Jürgen Groß wrote: >>> On 28.02.20 09:27, Jan Beulich wrote: >>>> On 28.02.2020 08:19, Juergen Gross wrote: >>>>> With core scheduling active it is mandatory for stop_machine_run() to >>>>> be called in idle context only (so either during boot or in a tasklet), >>>>> as otherwise a scheduling deadlock would occur: stop_machine_run() >>>>> does a cpu rendezvous by activating a tasklet on all other cpus. In >>>>> case stop_machine_run() was not called in an idle vcpu it would block >>>>> scheduling the idle vcpu on its siblings with core scheduling being >>>>> active, resulting in a hang. >>>>> >>>>> Put a BUG_ON() into stop_machine_run() to test for being called in an >>>>> idle vcpu only and adapt the missing call site (ucode loading) to use a >>>>> tasklet for calling stop_machine_run(). >>>>> >>>>> Signed-off-by: Juergen Gross <jgross@suse.com> >>>>> --- >>>>> V2: >>>>> - rephrase commit message (Julien Grall) >>>>> --- >>>>> xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ >>>>> xen/common/stop_machine.c | 1 + >>>>> 2 files changed, 35 insertions(+), 20 deletions(-) >>>> >>>> There's no mention anywhere of a connection to your RCU series, >>>> nor to rcu_barrier(). Yet the change puts a new restriction also >>>> on its use, and iirc this was mentioned in prior discussion. Did >>>> I miss anything? >>> >>> Basically this patch makes the restriction explicit. Without the patch >>> stop_machine_run() being called outside of a tasklet would just hang >>> with core scheduling being active. Now it will catch those violations >>> early even with core scheduling inactive. >>> >>> Note that currently there are no violations of this restriction anywhere >>> in the hypervisor other than the one addressed by this patch. >> >> Well, there is a connection to core scheduling. Without it, i.e. >> prior to 4.13, there was no restriction on the placement of >> rcu_barrier() invocations. According to what you say above, the >> restriction was implicitly introduced with core scheduling. It >> should imo be made explicit by attaching a comment, which would >> (again imo) best be done here because now you also make this >> case crash without core scheduling in use. > > Okay, I'll add a comment to stop_machine_run() and to rcu_barrier(). The > rcu_barrier() comment will be then removed by my rcu series again. Right - the alternative then would be to call out a dependency of this patch on that series. Jan
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 35c1d36cdc..4cf4e66b18 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -562,30 +562,18 @@ static int do_microcode_update(void *patch) return ret; } -int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +struct ucode_buf { + unsigned int len; + char buffer[]; +}; + +static long microcode_update_helper(void *data) { int ret; - void *buffer; + struct ucode_buf *buffer = data; unsigned int cpu, updated; struct microcode_patch *patch; - if ( len != (uint32_t)len ) - return -E2BIG; - - if ( microcode_ops == NULL ) - return -EINVAL; - - buffer = xmalloc_bytes(len); - if ( !buffer ) - return -ENOMEM; - - ret = copy_from_guest(buffer, buf, len); - if ( ret ) - { - xfree(buffer); - return -EFAULT; - } - /* cpu_online_map must not change during update */ if ( !get_cpu_maps() ) { @@ -607,7 +595,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return -EPERM; } - patch = parse_blob(buffer, len); + patch = parse_blob(buffer->buffer, buffer->len); xfree(buffer); if ( IS_ERR(patch) ) { @@ -700,6 +688,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return ret; } +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +{ + int ret; + struct ucode_buf *buffer; + + if ( len != (uint32_t)len ) + return -E2BIG; + + if ( microcode_ops == NULL ) + return -EINVAL; + + buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len); + if ( !buffer ) + return -ENOMEM; + + ret = copy_from_guest(buffer->buffer, buf, len); + if ( ret ) + { + xfree(buffer); + return -EFAULT; + } + buffer->len = len; + + return continue_hypercall_on_cpu(0, microcode_update_helper, buffer); +} + static int __init microcode_init(void) { /* diff --git a/xen/common/stop_machine.c b/xen/common/stop_machine.c index 33d9602217..fe7f7d4447 100644 --- a/xen/common/stop_machine.c +++ b/xen/common/stop_machine.c @@ -74,6 +74,7 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) int ret; BUG_ON(!local_irq_is_enabled()); + BUG_ON(!is_idle_vcpu(current)); /* cpu_online_map must not change. */ if ( !get_cpu_maps() )
With core scheduling active it is mandatory for stop_machine_run() to be called in idle context only (so either during boot or in a tasklet), as otherwise a scheduling deadlock would occur: stop_machine_run() does a cpu rendezvous by activating a tasklet on all other cpus. In case stop_machine_run() was not called in an idle vcpu it would block scheduling the idle vcpu on its siblings with core scheduling being active, resulting in a hang. Put a BUG_ON() into stop_machine_run() to test for being called in an idle vcpu only and adapt the missing call site (ucode loading) to use a tasklet for calling stop_machine_run(). Signed-off-by: Juergen Gross <jgross@suse.com> --- V2: - rephrase commit message (Julien Grall) --- xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ xen/common/stop_machine.c | 1 + 2 files changed, 35 insertions(+), 20 deletions(-)