Message ID | 20200211093527.6811-1-jgross@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen: make sure stop_machine_run() is always called in a tasklet | expand |
Hi, On 11/02/2020 10:35, Juergen Gross wrote: > With core scheduling active it is mandatory for stop_machine_run() to > be called in a tasklet only, as otherwise a scheduling deadlock would > occur: stop_machine_run() does a cpu rendezvous by activating a tasklet > on all other cpus. In case stop_machine_run() was not called in an idle > vcpu it would block scheduling the idle vcpu on its siblings with core > scheduling being active, resulting in a hang. This suggests it is not safe to call stop_machine_run() outside a tasklet but still under "idle vCPU" context. However, alternative patching on Arm during boot will not be in a tasklet. Is it going to be safe? Cheers,
On 13.02.20 10:01, Julien Grall wrote: > Hi, > > On 11/02/2020 10:35, Juergen Gross wrote: >> With core scheduling active it is mandatory for stop_machine_run() to >> be called in a tasklet only, as otherwise a scheduling deadlock would >> occur: stop_machine_run() does a cpu rendezvous by activating a tasklet >> on all other cpus. In case stop_machine_run() was not called in an idle >> vcpu it would block scheduling the idle vcpu on its siblings with core >> scheduling being active, resulting in a hang. > > This suggests it is not safe to call stop_machine_run() outside a > tasklet but still under "idle vCPU" context. However, alternative > patching on Arm during boot will not be in a tasklet. Is it going to be > safe? Yes. I can rephrase that part to make it clear. Juergen
Hi, On 13/02/2020 11:01, Jürgen Groß wrote: > On 13.02.20 10:01, Julien Grall wrote: >> Hi, >> >> On 11/02/2020 10:35, Juergen Gross wrote: >>> With core scheduling active it is mandatory for stop_machine_run() to >>> be called in a tasklet only, as otherwise a scheduling deadlock would >>> occur: stop_machine_run() does a cpu rendezvous by activating a tasklet >>> on all other cpus. In case stop_machine_run() was not called in an idle >>> vcpu it would block scheduling the idle vcpu on its siblings with core >>> scheduling being active, resulting in a hang. >> >> This suggests it is not safe to call stop_machine_run() outside a >> tasklet but still under "idle vCPU" context. However, alternative >> patching on Arm during boot will not be in a tasklet. Is it going to >> be safe? > > Yes. > > I can rephrase that part to make it clear. That would nice. Thank you! Cheers,
On 11/02/2020 09:35, Juergen Gross wrote: > With core scheduling active it is mandatory for stop_machine_run() to > be called in a tasklet only, as otherwise a scheduling deadlock would > occur: stop_machine_run() does a cpu rendezvous by activating a tasklet > on all other cpus. In case stop_machine_run() was not called in an idle > vcpu it would block scheduling the idle vcpu on its siblings with core > scheduling being active, resulting in a hang. I suppose rcu_barrier() is fine due to process_pending_softirqs() being called inside? I'm a little concerned by imposing is_vcpu_idle() restriction in that case as rcu_barrier() could be technically called from a non-tasklet context. Igor
On 14.02.20 15:06, Igor Druzhinin wrote: > On 11/02/2020 09:35, Juergen Gross wrote: >> With core scheduling active it is mandatory for stop_machine_run() to >> be called in a tasklet only, as otherwise a scheduling deadlock would >> occur: stop_machine_run() does a cpu rendezvous by activating a tasklet >> on all other cpus. In case stop_machine_run() was not called in an idle >> vcpu it would block scheduling the idle vcpu on its siblings with core >> scheduling being active, resulting in a hang. > > I suppose rcu_barrier() is fine due to process_pending_softirqs() being > called inside? I'm a little concerned by imposing is_vcpu_idle() restriction > in that case as rcu_barrier() could be technically called from a non-tasklet > context. No, stop_machine_run() with core scheduling active can only work when called in an idle vcpu. OTOH it would be fairly easy to add another softirq for a similar purpose and have a sync_machine_run() using that instead of tasklets. This could be used for ucode loading, too. stop_machine_run() and sync_machine_run() could use a common main function. The patch should be rather simple. Thoughts? Juergen
On 14/02/2020 16:39, Jürgen Groß wrote: > On 14.02.20 15:06, Igor Druzhinin wrote: >> On 11/02/2020 09:35, Juergen Gross wrote: >>> With core scheduling active it is mandatory for stop_machine_run() to >>> be called in a tasklet only, as otherwise a scheduling deadlock would >>> occur: stop_machine_run() does a cpu rendezvous by activating a tasklet >>> on all other cpus. In case stop_machine_run() was not called in an idle >>> vcpu it would block scheduling the idle vcpu on its siblings with core >>> scheduling being active, resulting in a hang. >> >> I suppose rcu_barrier() is fine due to process_pending_softirqs() being >> called inside? I'm a little concerned by imposing is_vcpu_idle() restriction >> in that case as rcu_barrier() could be technically called from a non-tasklet >> context. > > No, stop_machine_run() with core scheduling active can only work when > called in an idle vcpu. > > OTOH it would be fairly easy to add another softirq for a similar > purpose and have a sync_machine_run() using that instead of tasklets. > This could be used for ucode loading, too. > > stop_machine_run() and sync_machine_run() could use a common main > function. The patch should be rather simple. > > Thoughts? I have a patch on the list (which I was planning to send a v2 for) that fixes another issue with rcu_barrier(): https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg02273.html As I understand it now that wouldn't work with core-scheduling. Do you think it's possible to synchronously wait for tasklets to finish in non-tasklet context (because that's what the purpose of rcu_barrier() is)? Igor
On 14.02.20 18:34, Igor Druzhinin wrote: > On 14/02/2020 16:39, Jürgen Groß wrote: >> On 14.02.20 15:06, Igor Druzhinin wrote: >>> On 11/02/2020 09:35, Juergen Gross wrote: >>>> With core scheduling active it is mandatory for stop_machine_run() to >>>> be called in a tasklet only, as otherwise a scheduling deadlock would >>>> occur: stop_machine_run() does a cpu rendezvous by activating a tasklet >>>> on all other cpus. In case stop_machine_run() was not called in an idle >>>> vcpu it would block scheduling the idle vcpu on its siblings with core >>>> scheduling being active, resulting in a hang. >>> >>> I suppose rcu_barrier() is fine due to process_pending_softirqs() being >>> called inside? I'm a little concerned by imposing is_vcpu_idle() restriction >>> in that case as rcu_barrier() could be technically called from a non-tasklet >>> context. >> >> No, stop_machine_run() with core scheduling active can only work when >> called in an idle vcpu. >> >> OTOH it would be fairly easy to add another softirq for a similar >> purpose and have a sync_machine_run() using that instead of tasklets. >> This could be used for ucode loading, too. >> >> stop_machine_run() and sync_machine_run() could use a common main >> function. The patch should be rather simple. >> >> Thoughts? > > I have a patch on the list (which I was planning to send a v2 for) that > fixes another issue with rcu_barrier(): > https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg02273.html > > As I understand it now that wouldn't work with core-scheduling. Do you think > it's possible to synchronously wait for tasklets to finish in non-tasklet > context (because that's what the purpose of rcu_barrier() is)? No, won't work, unless we add preemption (basically would need per-vcpu stacks instead of per-pcpu ones). What might work IMO would be to do rcu_process_callbacks() no longer during idle, but to have a specific softirq for that purpose. This would remove the need to involve scheduling for rcu_barrier(). A brief check of process_pending_softirqs() callers seems to allow that, but I'd like to have a second opinion from someone having more rcu knowledge than me. Single problematic users of process_pending_softirqs() could still be switched to a variant not allowing the new rcu softirq. Juergen
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index c0fb690f79..8e61769377 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -561,30 +561,18 @@ static int do_microcode_update(void *patch) return ret; } -int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +struct ucode_buf { + unsigned int len; + char buffer[]; +}; + +static long microcode_update_helper(void *data) { int ret; - void *buffer; + struct ucode_buf *buffer = data; unsigned int cpu, updated; struct microcode_patch *patch; - if ( len != (uint32_t)len ) - return -E2BIG; - - if ( microcode_ops == NULL ) - return -EINVAL; - - buffer = xmalloc_bytes(len); - if ( !buffer ) - return -ENOMEM; - - ret = copy_from_guest(buffer, buf, len); - if ( ret ) - { - xfree(buffer); - return -EFAULT; - } - /* cpu_online_map must not change during update */ if ( !get_cpu_maps() ) { @@ -606,7 +594,7 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return -EPERM; } - patch = parse_blob(buffer, len); + patch = parse_blob(buffer->buffer, buffer->len); xfree(buffer); if ( IS_ERR(patch) ) { @@ -699,6 +687,32 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) return ret; } +int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) +{ + int ret; + struct ucode_buf *buffer; + + if ( len != (uint32_t)len ) + return -E2BIG; + + if ( microcode_ops == NULL ) + return -EINVAL; + + buffer = xmalloc_flex_struct(struct ucode_buf, buffer, len); + if ( !buffer ) + return -ENOMEM; + + ret = copy_from_guest(buffer->buffer, buf, len); + if ( ret ) + { + xfree(buffer); + return -EFAULT; + } + buffer->len = len; + + return continue_hypercall_on_cpu(0, microcode_update_helper, buffer); +} + static int __init microcode_init(void) { /* diff --git a/xen/common/stop_machine.c b/xen/common/stop_machine.c index 33d9602217..fe7f7d4447 100644 --- a/xen/common/stop_machine.c +++ b/xen/common/stop_machine.c @@ -74,6 +74,7 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) int ret; BUG_ON(!local_irq_is_enabled()); + BUG_ON(!is_idle_vcpu(current)); /* cpu_online_map must not change. */ if ( !get_cpu_maps() )
With core scheduling active it is mandatory for stop_machine_run() to be called in a tasklet only, as otherwise a scheduling deadlock would occur: stop_machine_run() does a cpu rendezvous by activating a tasklet on all other cpus. In case stop_machine_run() was not called in an idle vcpu it would block scheduling the idle vcpu on its siblings with core scheduling being active, resulting in a hang. Put a BUG_ON() into stop_machine_run() to test for being called in an idle vcpu only and adapt the missing call site (ucode loading) to use a tasklet for calling stop_machine_run(). Signed-off-by: Juergen Gross <jgross@suse.com> --- xen/arch/x86/microcode.c | 54 +++++++++++++++++++++++++++++------------------ xen/common/stop_machine.c | 1 + 2 files changed, 35 insertions(+), 20 deletions(-)