Message ID | 20200218122114.17596-1-jgross@suse.com (mailing list archive) |
---|---|
Series | xen/rcu: let rcu work better with core scheduling |
On 18/02/2020 12:21, Juergen Gross wrote:
> Today the RCU handling in Xen is affecting scheduling in several ways.
> It is raising sched softirqs without any real need and it requires
> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>
> This small series repairs those issues.
>
> Additionally some ASSERT()s are added for verification of sane rcu
> handling. In order to avoid those triggering right away the obvious
> violations are fixed.
>

Initial test of the first 2 patches is promising. Will run more tests
over night to see how stable it is.

Igor

On 18/02/2020 13:15, Igor Druzhinin wrote:
> On 18/02/2020 12:21, Juergen Gross wrote:
>> Today the RCU handling in Xen is affecting scheduling in several ways.
>> It is raising sched softirqs without any real need and it requires
>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>
>> This small series repairs those issues.
>>
>> Additionally some ASSERT()s are added for verification of sane rcu
>> handling. In order to avoid those triggering right away the obvious
>> violations are fixed.
>>
>
> Initial test of the first 2 patches is promising. Will run more tests
> over night to see how stable it is.

I stress-tested it over night and it seems to work for our case.

Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Igor

On 18/02/2020 12:21, Juergen Gross wrote:
> Today the RCU handling in Xen is affecting scheduling in several ways.
> It is raising sched softirqs without any real need and it requires
> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>
> This small series repairs those issues.
>
> Additionally some ASSERT()s are added for verification of sane rcu
> handling. In order to avoid those triggering right away the obvious
> violations are fixed.

I've done more testing of this with [1] and, unfortunately, it quite easily
deadlocks while without this series it doesn't.

Steps to repro:
- apply [1]
- take a host with considerable CPU count (~64)
- run a loop: xen-hptool smt-disable; xen-hptool smt-enable

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html

Igor

On 22.02.20 03:29, Igor Druzhinin wrote:
> On 18/02/2020 12:21, Juergen Gross wrote:
>> Today the RCU handling in Xen is affecting scheduling in several ways.
>> It is raising sched softirqs without any real need and it requires
>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>
>> This small series repairs those issues.
>>
>> Additionally some ASSERT()s are added for verification of sane rcu
>> handling. In order to avoid those triggering right away the obvious
>> violations are fixed.
>
> I've done more testing of this with [1] and, unfortunately, it quite easily
> deadlocks while without this series it doesn't.
>
> Steps to repro:
> - apply [1]
> - take a host with considerable CPU count (~64)
> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html

Yeah, the reason for that is that rcu_barrier() is a nop in this
situation without my patch, as the then called stop_machine_run() in
rcu_barrier() will just return -EBUSY.

Juergen

Hi,

On 22/02/2020 06:05, Jürgen Groß wrote:
> On 22.02.20 03:29, Igor Druzhinin wrote:
>> On 18/02/2020 12:21, Juergen Gross wrote:
>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>> It is raising sched softirqs without any real need and it requires
>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>
>>> This small series repairs those issues.
>>>
>>> Additionally some ASSERT()s are added for verification of sane rcu
>>> handling. In order to avoid those triggering right away the obvious
>>> violations are fixed.
>>
>> I've done more testing of this with [1] and, unfortunately, it quite
>> easily
>> deadlocks while without this series it doesn't.
>>
>> Steps to repro:
>> - apply [1]
>> - take a host with considerable CPU count (~64)
>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>
>> [1]
>> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>>
>
> Yeah, the reason for that is that rcu_barrier() is a nop in this
> situation without my patch, as the then called stop_machine_run() in
> rcu_barrier() will just return -EBUSY.

I think rcu_barrier() being a NOP is also a problem, as it means you would
be able to continue before the in-flight callback has completed.

But I am not entirely sure why a deadlock would happen with your
suggestion. Could you give a bit more detail?

Cheers,

On 22.02.20 13:32, Julien Grall wrote:
> Hi,
>
> On 22/02/2020 06:05, Jürgen Groß wrote:
>> On 22.02.20 03:29, Igor Druzhinin wrote:
>>> On 18/02/2020 12:21, Juergen Gross wrote:
>>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>>> It is raising sched softirqs without any real need and it requires
>>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>>
>>>> This small series repairs those issues.
>>>>
>>>> Additionally some ASSERT()s are added for verification of sane rcu
>>>> handling. In order to avoid those triggering right away the obvious
>>>> violations are fixed.
>>>
>>> I've done more testing of this with [1] and, unfortunately, it quite
>>> easily
>>> deadlocks while without this series it doesn't.
>>>
>>> Steps to repro:
>>> - apply [1]
>>> - take a host with considerable CPU count (~64)
>>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>>
>>> [1]
>>> https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>>>
>>
>> Yeah, the reason for that is that rcu_barrier() is a nop in this
>> situation without my patch, as the then called stop_machine_run() in
>> rcu_barrier() will just return -EBUSY.
>
> I think rcu_barrier() being a NOP is also a problem, as it means you would
> be able to continue before the in-flight callback has completed.
>
> But I am not entirely sure why a deadlock would happen with your
> suggestion. Could you give a bit more detail?

get_cpu_maps() will return false as long as stop_machine_run() is holding
the lock, and rcu handling will loop until it gets the lock...

Juergen

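The retry pattern described in the reply above can be illustrated with a small user-space sketch. This is not Xen code: `maps_lock`, `try_get_maps()`, `put_maps()` and `spin_for_maps()` are hypothetical stand-ins for the CPU-maps lock, `get_cpu_maps()`, `put_cpu_maps()` and the RCU retry loop, and the spin is bounded here only so that the example terminates. It merely shows that a trylock loop makes no progress for as long as the holder keeps the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock guarding the CPU maps. */
static atomic_flag maps_lock = ATOMIC_FLAG_INIT;

/* Trylock: returns true only if the lock was free and is now ours. */
bool try_get_maps(void)
{
    return !atomic_flag_test_and_set(&maps_lock);
}

/* Release the lock taken by a successful try_get_maps(). */
void put_maps(void)
{
    atomic_flag_clear(&maps_lock);
}

/*
 * Retry the trylock up to max_spins times.
 * Returns the attempt number that succeeded, or -1 if every attempt failed.
 * An unbounded version of this loop never exits while the holder keeps
 * the lock.
 */
int spin_for_maps(int max_spins)
{
    assert(max_spins > 0);

    for (int attempt = 1; attempt <= max_spins; attempt++)
        if (try_get_maps())
            return attempt;

    return -1;
}
```

If the context holding the lock is itself waiting for the spinning context to finish something, neither side can make progress, which is the kind of hang the reply above describes.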
On 22/02/2020 06:05, Jürgen Groß wrote:
> On 22.02.20 03:29, Igor Druzhinin wrote:
>> On 18/02/2020 12:21, Juergen Gross wrote:
>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>> It is raising sched softirqs without any real need and it requires
>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>
>>> This small series repairs those issues.
>>>
>>> Additionally some ASSERT()s are added for verification of sane rcu
>>> handling. In order to avoid those triggering right away the obvious
>>> violations are fixed.
>>
>> I've done more testing of this with [1] and, unfortunately, it quite easily
>> deadlocks while without this series it doesn't.
>>
>> Steps to repro:
>> - apply [1]
>> - take a host with considerable CPU count (~64)
>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>
> Yeah, the reason for that is that rcu_barrier() is a nop in this
> situation without my patch, as the then called stop_machine_run() in
> rcu_barrier() will just return -EBUSY.

Are you sure that's the reason?
I always have the following stack on CPU0:

(XEN) [ 120.891143] *** Dumping CPU0 host state: ***
(XEN) [ 120.895909] ----[ Xen-4.13.0 x86_64 debug=y Not tainted ]----
(XEN) [ 120.902487] CPU: 0
(XEN) [ 120.905269] RIP: e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43
(XEN) [ 120.913415] RFLAGS: 0000000000000286 CONTEXT: hypervisor
(XEN) [ 120.919389] rax: 0000000000000000 rbx: ffff82d0805ddb78 rcx: 0000000000000001
(XEN) [ 120.927362] rdx: ffff82d0805cdb00 rsi: ffff82d0805c7cd8 rdi: 0000000000000007
(XEN) [ 120.935341] rbp: ffff8300920bfbc0 rsp: ffff8300920bfbb8 r8: 000000000000003b
(XEN) [ 120.943310] r9: 0444444444444432 r10: 3333333333333333 r11: 0000000000000001
(XEN) [ 120.951282] r12: ffff82d0805ddb78 r13: 0000000000000001 r14: ffff8300920bfc18
(XEN) [ 120.959251] r15: ffff82d0802af646 cr0: 000000008005003b cr4: 00000000003506e0
(XEN) [ 120.967223] cr3: 00000000920b0000 cr2: ffff88820dffe7f8
(XEN) [ 120.973125] fsb: 0000000000000000 gsb: ffff88821e3c0000 gss: 0000000000000000
(XEN) [ 120.981094] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) [ 120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43):
(XEN) [ 120.997037] 85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5
(XEN) [ 121.005442] Xen stack trace from rsp=ffff8300920bfbb8:
(XEN) [ 121.011080] ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845
(XEN) [ 121.019145] ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8
(XEN) [ 121.027200] 0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf
(XEN) [ 121.035255] fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 121.043320] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 121.051375] 000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80
(XEN) [ 121.059443] ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2
(XEN) [ 121.067498] ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00
(XEN) [ 121.075553] ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040
(XEN) [ 121.083621] ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff
(XEN) [ 121.091674] 0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040
(XEN) [ 121.099739] ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40
(XEN) [ 121.107797] ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000
(XEN) [ 121.115852] ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00
(XEN) [ 121.123917] fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003
(XEN) [ 121.131975] 0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0
(XEN) [ 121.140037] ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0
(XEN) [ 121.148093] ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30
(XEN) [ 121.156150] ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000
(XEN) [ 121.164216] 0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50
(XEN) [ 121.172271] Xen call trace:
(XEN) [ 121.175573] [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43
(XEN) [ 121.183024] [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde
(XEN) [ 121.189520] [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89
(XEN) [ 121.197403] [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64
(XEN) [ 121.204951] [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e
(XEN) [ 121.213271] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [ 121.220890] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [ 121.228086] [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d
(XEN) [ 121.236758] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [ 121.244378] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [ 121.251568] [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
(XEN) [ 121.257639] [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
(XEN) [ 121.264051] [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
(XEN) [ 121.272454] [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
(XEN) [ 121.281900] [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
(XEN) [ 121.289786] [<ffff82d080245803>] F do_tasklet+0x58/0x85
(XEN) [ 121.295771] [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb

So it's not in get_cpu_maps() loop. It seems to me it's not entering time sync for some
reason.

Igor

On 22.02.20 17:42, Igor Druzhinin wrote: > On 22/02/2020 06:05, Jürgen Groß wrote: >> On 22.02.20 03:29, Igor Druzhinin wrote: >>> On 18/02/2020 12:21, Juergen Gross wrote: >>>> Today the RCU handling in Xen is affecting scheduling in several ways. >>>> It is raising sched softirqs without any real need and it requires >>>> tasklets for rcu_barrier(), which interacts badly with core scheduling. >>>> >>>> This small series repairs those issues. >>>> >>>> Additionally some ASSERT()s are added for verification of sane rcu >>>> handling. In order to avoid those triggering right away the obvious >>>> violations are fixed. >>> >>> I've done more testing of this with [1] and, unfortunately, it quite easily >>> deadlocks while without this series it doesn't. >>> >>> Steps to repro: >>> - apply [1] >>> - take a host with considerable CPU count (~64) >>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable >>> >>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html >> >> Yeah, the reason for that is that rcu_barrier() is a nop in this >> situation without my patch, as the then called stop_machine_run() in >> rcu_barrier() will just return -EBUSY. > > Are you sure that's ther reason? 
I always have the following stack on CPU0: > > (XEN) [ 120.891143] *** Dumping CPU0 host state: *** > (XEN) [ 120.895909] ----[ Xen-4.13.0 x86_64 debug=y Not tainted ]---- > (XEN) [ 120.902487] CPU: 0 > (XEN) [ 120.905269] RIP: e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43 > (XEN) [ 120.913415] RFLAGS: 0000000000000286 CONTEXT: hypervisor > (XEN) [ 120.919389] rax: 0000000000000000 rbx: ffff82d0805ddb78 rcx: 0000000000000001 > (XEN) [ 120.927362] rdx: ffff82d0805cdb00 rsi: ffff82d0805c7cd8 rdi: 0000000000000007 > (XEN) [ 120.935341] rbp: ffff8300920bfbc0 rsp: ffff8300920bfbb8 r8: 000000000000003b > (XEN) [ 120.943310] r9: 0444444444444432 r10: 3333333333333333 r11: 0000000000000001 > (XEN) [ 120.951282] r12: ffff82d0805ddb78 r13: 0000000000000001 r14: ffff8300920bfc18 > (XEN) [ 120.959251] r15: ffff82d0802af646 cr0: 000000008005003b cr4: 00000000003506e0 > (XEN) [ 120.967223] cr3: 00000000920b0000 cr2: ffff88820dffe7f8 > (XEN) [ 120.973125] fsb: 0000000000000000 gsb: ffff88821e3c0000 gss: 0000000000000000 > (XEN) [ 120.981094] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) [ 120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43): > (XEN) [ 120.997037] 85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5 > (XEN) [ 121.005442] Xen stack trace from rsp=ffff8300920bfbb8: > (XEN) [ 121.011080] ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845 > (XEN) [ 121.019145] ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8 > (XEN) [ 121.027200] 0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf > (XEN) [ 121.035255] fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000 > (XEN) [ 121.043320] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > (XEN) [ 121.051375] 000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80 > (XEN) [ 121.059443] ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d 
ffff82d0802af4a2 > (XEN) [ 121.067498] ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00 > (XEN) [ 121.075553] ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040 > (XEN) [ 121.083621] ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff > (XEN) [ 121.091674] 0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040 > (XEN) [ 121.099739] ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40 > (XEN) [ 121.107797] ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000 > (XEN) [ 121.115852] ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00 > (XEN) [ 121.123917] fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003 > (XEN) [ 121.131975] 0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0 > (XEN) [ 121.140037] ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0 > (XEN) [ 121.148093] ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30 > (XEN) [ 121.156150] ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000 > (XEN) [ 121.164216] 0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50 > (XEN) [ 121.172271] Xen call trace: > (XEN) [ 121.175573] [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43 > (XEN) [ 121.183024] [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde > (XEN) [ 121.189520] [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89 > (XEN) [ 121.197403] [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64 > (XEN) [ 121.204951] [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e > (XEN) [ 121.213271] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90 > (XEN) [ 121.220890] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37 > (XEN) [ 121.228086] [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d > (XEN) [ 121.236758] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90 > (XEN) 
[ 121.244378] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
> (XEN) [ 121.251568] [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
> (XEN) [ 121.257639] [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
> (XEN) [ 121.264051] [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
> (XEN) [ 121.272454] [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
> (XEN) [ 121.281900] [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
> (XEN) [ 121.289786] [<ffff82d080245803>] F do_tasklet+0x58/0x85
> (XEN) [ 121.295771] [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb
>
> So it's not in get_cpu_maps() loop. It seems to me it's not entering time sync for some
> reason.

Interesting. Looking further into that.

At least time_calibration() is missing a call to get_cpu_maps().

Juergen

On 23/02/2020 14:14, Jürgen Groß wrote: > On 22.02.20 17:42, Igor Druzhinin wrote: >> (XEN) [ 120.891143] *** Dumping CPU0 host state: *** >> (XEN) [ 120.895909] ----[ Xen-4.13.0 x86_64 debug=y Not tainted ]---- >> (XEN) [ 120.902487] CPU: 0 >> (XEN) [ 120.905269] RIP: e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43 >> (XEN) [ 120.913415] RFLAGS: 0000000000000286 CONTEXT: hypervisor >> (XEN) [ 120.919389] rax: 0000000000000000 rbx: ffff82d0805ddb78 rcx: 0000000000000001 >> (XEN) [ 120.927362] rdx: ffff82d0805cdb00 rsi: ffff82d0805c7cd8 rdi: 0000000000000007 >> (XEN) [ 120.935341] rbp: ffff8300920bfbc0 rsp: ffff8300920bfbb8 r8: 000000000000003b >> (XEN) [ 120.943310] r9: 0444444444444432 r10: 3333333333333333 r11: 0000000000000001 >> (XEN) [ 120.951282] r12: ffff82d0805ddb78 r13: 0000000000000001 r14: ffff8300920bfc18 >> (XEN) [ 120.959251] r15: ffff82d0802af646 cr0: 000000008005003b cr4: 00000000003506e0 >> (XEN) [ 120.967223] cr3: 00000000920b0000 cr2: ffff88820dffe7f8 >> (XEN) [ 120.973125] fsb: 0000000000000000 gsb: ffff88821e3c0000 gss: 0000000000000000 >> (XEN) [ 120.981094] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> (XEN) [ 120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43): >> (XEN) [ 120.997037] 85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5 >> (XEN) [ 121.005442] Xen stack trace from rsp=ffff8300920bfbb8: >> (XEN) [ 121.011080] ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845 >> (XEN) [ 121.019145] ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8 >> (XEN) [ 121.027200] 0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf >> (XEN) [ 121.035255] fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000 >> (XEN) [ 121.043320] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> (XEN) [ 121.051375] 000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80 >> (XEN) [ 121.059443] 
ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2 >> (XEN) [ 121.067498] ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00 >> (XEN) [ 121.075553] ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040 >> (XEN) [ 121.083621] ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff >> (XEN) [ 121.091674] 0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040 >> (XEN) [ 121.099739] ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40 >> (XEN) [ 121.107797] ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000 >> (XEN) [ 121.115852] ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00 >> (XEN) [ 121.123917] fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003 >> (XEN) [ 121.131975] 0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0 >> (XEN) [ 121.140037] ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0 >> (XEN) [ 121.148093] ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30 >> (XEN) [ 121.156150] ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000 >> (XEN) [ 121.164216] 0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50 >> (XEN) [ 121.172271] Xen call trace: >> (XEN) [ 121.175573] [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43 >> (XEN) [ 121.183024] [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde >> (XEN) [ 121.189520] [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89 >> (XEN) [ 121.197403] [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64 >> (XEN) [ 121.204951] [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e >> (XEN) [ 121.213271] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90 >> (XEN) [ 121.220890] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37 >> (XEN) [ 121.228086] [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d >> (XEN) [ 
121.236758] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
>> (XEN) [ 121.244378] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
>> (XEN) [ 121.251568] [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
>> (XEN) [ 121.257639] [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
>> (XEN) [ 121.264051] [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
>> (XEN) [ 121.272454] [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
>> (XEN) [ 121.281900] [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
>> (XEN) [ 121.289786] [<ffff82d080245803>] F do_tasklet+0x58/0x85
>> (XEN) [ 121.295771] [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb
>>
>> So it's not in get_cpu_maps() loop. It seems to me it's not entering time sync for some
>> reason.
>
> Interesting. Looking further into that.
>
> At least time_calibration() is missing to call get_cpu_maps().

I debugged this issue and the following fixes it:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index ccf2ec6..36d98a4 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -153,6 +153,7 @@ static int rsinterval = 1000;
  * multiple times.
  */
 static atomic_t cpu_count = ATOMIC_INIT(0);
+static atomic_t done_count = ATOMIC_INIT(0);
 
 static void rcu_barrier_callback(struct rcu_head *head)
 {
@@ -175,6 +176,8 @@ static void rcu_barrier_action(void)
         process_pending_softirqs();
         cpu_relax();
     }
+
+    atomic_dec(&done_count);
 }
 
 void rcu_barrier(void)
@@ -194,10 +197,11 @@ void rcu_barrier(void)
     if ( !initial )
     {
         atomic_set(&cpu_count, num_online_cpus());
+        atomic_set(&done_count, num_online_cpus());
         cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
     }
 
-    while ( atomic_read(&cpu_count) )
+    while ( atomic_read(&done_count) )
     {
         process_pending_softirqs();
         cpu_relax();

Is there anything else that blocks v3 currently?

Igor

On 27.02.20 16:16, Igor Druzhinin wrote: > On 23/02/2020 14:14, Jürgen Groß wrote: >> On 22.02.20 17:42, Igor Druzhinin wrote: >>> (XEN) [ 120.891143] *** Dumping CPU0 host state: *** >>> (XEN) [ 120.895909] ----[ Xen-4.13.0 x86_64 debug=y Not tainted ]---- >>> (XEN) [ 120.902487] CPU: 0 >>> (XEN) [ 120.905269] RIP: e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43 >>> (XEN) [ 120.913415] RFLAGS: 0000000000000286 CONTEXT: hypervisor >>> (XEN) [ 120.919389] rax: 0000000000000000 rbx: ffff82d0805ddb78 rcx: 0000000000000001 >>> (XEN) [ 120.927362] rdx: ffff82d0805cdb00 rsi: ffff82d0805c7cd8 rdi: 0000000000000007 >>> (XEN) [ 120.935341] rbp: ffff8300920bfbc0 rsp: ffff8300920bfbb8 r8: 000000000000003b >>> (XEN) [ 120.943310] r9: 0444444444444432 r10: 3333333333333333 r11: 0000000000000001 >>> (XEN) [ 120.951282] r12: ffff82d0805ddb78 r13: 0000000000000001 r14: ffff8300920bfc18 >>> (XEN) [ 120.959251] r15: ffff82d0802af646 cr0: 000000008005003b cr4: 00000000003506e0 >>> (XEN) [ 120.967223] cr3: 00000000920b0000 cr2: ffff88820dffe7f8 >>> (XEN) [ 120.973125] fsb: 0000000000000000 gsb: ffff88821e3c0000 gss: 0000000000000000 >>> (XEN) [ 120.981094] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>> (XEN) [ 120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43): >>> (XEN) [ 120.997037] 85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5 >>> (XEN) [ 121.005442] Xen stack trace from rsp=ffff8300920bfbb8: >>> (XEN) [ 121.011080] ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845 >>> (XEN) [ 121.019145] ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8 >>> (XEN) [ 121.027200] 0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf >>> (XEN) [ 121.035255] fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000 >>> (XEN) [ 121.043320] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >>> (XEN) [ 121.051375] 000000000000003b 
0000001c25e54bf1 0000000000000000 ffff8300920bfc80 >>> (XEN) [ 121.059443] ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2 >>> (XEN) [ 121.067498] ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00 >>> (XEN) [ 121.075553] ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040 >>> (XEN) [ 121.083621] ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff >>> (XEN) [ 121.091674] 0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040 >>> (XEN) [ 121.099739] ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40 >>> (XEN) [ 121.107797] ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000 >>> (XEN) [ 121.115852] ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00 >>> (XEN) [ 121.123917] fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003 >>> (XEN) [ 121.131975] 0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0 >>> (XEN) [ 121.140037] ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0 >>> (XEN) [ 121.148093] ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30 >>> (XEN) [ 121.156150] ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000 >>> (XEN) [ 121.164216] 0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50 >>> (XEN) [ 121.172271] Xen call trace: >>> (XEN) [ 121.175573] [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43 >>> (XEN) [ 121.183024] [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde >>> (XEN) [ 121.189520] [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89 >>> (XEN) [ 121.197403] [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64 >>> (XEN) [ 121.204951] [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e >>> (XEN) [ 121.213271] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90 >>> (XEN) [ 121.220890] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37 >>> (XEN) [ 
121.228086] [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d >>> (XEN) [ 121.236758] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90 >>> (XEN) [ 121.244378] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37 >>> (XEN) [ 121.251568] [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e >>> (XEN) [ 121.257639] [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32 >>> (XEN) [ 121.264051] [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe >>> (XEN) [ 121.272454] [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8 >>> (XEN) [ 121.281900] [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4 >>> (XEN) [ 121.289786] [<ffff82d080245803>] F do_tasklet+0x58/0x85 >>> (XEN) [ 121.295771] [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb >>> >>> So it's not in get_cpu_maps() loop. It seems to me it's not entering time sync for some >>> reason. >> >> Interesting. Looking further into that. >> >> At least time_calibration() is missing to call get_cpu_maps(). > > I debugged this issue and the following fixes it: > > diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c > index ccf2ec6..36d98a4 100644 > --- a/xen/common/rcupdate.c > +++ b/xen/common/rcupdate.c > @@ -153,6 +153,7 @@ static int rsinterval = 1000; > * multiple times. 
>   */
>  static atomic_t cpu_count = ATOMIC_INIT(0);
> +static atomic_t done_count = ATOMIC_INIT(0);
>
>  static void rcu_barrier_callback(struct rcu_head *head)
>  {
> @@ -175,6 +176,8 @@ static void rcu_barrier_action(void)
>          process_pending_softirqs();
>          cpu_relax();
>      }
> +
> +    atomic_dec(&done_count);
>  }
>
>  void rcu_barrier(void)
> @@ -194,10 +197,11 @@ void rcu_barrier(void)
>      if ( !initial )
>      {
>          atomic_set(&cpu_count, num_online_cpus());
> +        atomic_set(&done_count, num_online_cpus());
>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>      }
>
> -    while ( atomic_read(&cpu_count) )
> +    while ( atomic_read(&done_count) )
>      {
>          process_pending_softirqs();
>          cpu_relax();
>
> Is there anything else that blocks v3 currently.

Thanks for the work!

I'll send V3 probably tomorrow.

Juergen

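The idea behind the quoted patch can be sketched in user space with C11 atomics and POSIX threads. This is an illustration under assumptions, not the Xen implementation: threads stand in for CPUs, `NR_WORKERS` stands in for `num_online_cpus()`, and decrementing `cpu_count` directly stands in for the RCU callback having run. One reading of the change is that the initiator should wait until every participant has left its wait loop (`done_count`) rather than only until all callbacks have run (`cpu_count`), so that a later barrier can re-arm `cpu_count` safely.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_WORKERS 4  /* hypothetical stand-in for num_online_cpus() */

static atomic_int cpu_count;   /* workers whose "callback" has not run yet */
static atomic_int done_count;  /* workers that have not left the wait loop */

static void *barrier_action(void *arg)
{
    (void)arg;

    /* Stand-in for the RCU callback running on this "CPU". */
    atomic_fetch_sub(&cpu_count, 1);

    /* Wait until every worker's callback has run. */
    while (atomic_load(&cpu_count) > 0)
        sched_yield();

    /* Signal that this worker no longer reads cpu_count. */
    atomic_fetch_sub(&done_count, 1);
    return NULL;
}

/* Runs one barrier round; returns 0 on success, -1 if a thread failed to start. */
int run_barrier(void)
{
    pthread_t tid[NR_WORKERS];
    int started = 0;

    atomic_store(&cpu_count, NR_WORKERS);
    atomic_store(&done_count, NR_WORKERS);

    while (started < NR_WORKERS &&
           pthread_create(&tid[started], NULL, barrier_action, NULL) == 0)
        started++;

    /* Account for workers that never started so the others can finish. */
    atomic_fetch_sub(&cpu_count, NR_WORKERS - started);
    atomic_fetch_sub(&done_count, NR_WORKERS - started);

    /* Wait on done_count: afterwards no worker is still inside its loop. */
    while (atomic_load(&done_count) > 0)
        sched_yield();

    assert(atomic_load(&cpu_count) == 0);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == NR_WORKERS ? 0 : -1;
}
```

Calling `run_barrier()` twice in a row exercises the re-arm case: the second call resets both counters only after every worker from the first round has decremented `done_count`.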
> */ > static atomic_t cpu_count = ATOMIC_INIT(0); > +static atomic_t done_count = ATOMIC_INIT(0); > > static void rcu_barrier_callback(struct rcu_head *head) > { > @@ -175,6 +176,8 @@ static void rcu_barrier_action(void) > process_pending_softirqs(); > cpu_relax(); > } > + > + atomic_dec(&done_count); > } > > void rcu_barrier(void) > @@ -194,10 +197,11 @@ void rcu_barrier(void) > if ( !initial ) > { > atomic_set(&cpu_count, num_online_cpus()); > + atomic_set(&done_count, num_online_cpus()); > cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ); > } > > - while ( atomic_read(&cpu_count) ) > + while ( atomic_read(&done_count) ) > { > process_pending_softirqs(); > cpu_relax(); I think you are just narrowing the window of the race: It is still possible to have two cpus entering rcu_barrier() and to make it into the if ( !initial ) clause. Instead of introducing another atomic I believe the following patch instead of yours should do it: diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c index e6add0b120..0d5469a326 100644 --- a/xen/common/rcupdate.c +++ b/xen/common/rcupdate.c @@ -180,23 +180,17 @@ static void rcu_barrier_action(void) void rcu_barrier(void) { - int initial = atomic_read(&cpu_count); - while ( !get_cpu_maps() ) { process_pending_softirqs(); - if ( initial && !atomic_read(&cpu_count) ) + if ( !atomic_read(&cpu_count) ) return; cpu_relax(); - initial = atomic_read(&cpu_count); } - if ( !initial ) - { - atomic_set(&cpu_count, num_online_cpus()); + if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 ) cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ); - } while ( atomic_read(&cpu_count) ) { Could you give that a try, please? Juergen
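Editorial illustration of the compare-and-swap idea in the patch above: only the caller that moves the counter from 0 to the number of CPUs becomes the initiator and raises the softirq. This is a minimal sketch outside Xen, not the Xen code itself. It assumes C11 atomics, where atomic_compare_exchange_strong() stands in for Xen's atomic_cmpxchg() (which the patch compares against the old value 0), and the names cpu_count_sketch, try_start_barrier() and barrier_cpu_done() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_count; 0 means "no barrier in flight". */
static atomic_int cpu_count_sketch;

/*
 * Try to become the initiator of a barrier covering nr_cpus CPUs.
 * Returns true only for the one caller that changes the counter from 0
 * to nr_cpus; any other caller finds a barrier already in progress.
 */
static bool try_start_barrier(int nr_cpus)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&cpu_count_sketch, &expected,
                                          nr_cpus);
}

/*
 * Called once per CPU when its barrier callback has run.
 * Returns the number of CPUs still outstanding.
 */
static int barrier_cpu_done(void)
{
    return atomic_fetch_sub(&cpu_count_sketch, 1) - 1;
}
```

With a plain atomic_set() two callers could both observe 0 and both initialise the counter; the compare-and-swap makes the check and the update a single step, so a second caller simply joins the barrier already in flight.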
On 28/02/2020 07:10, Jürgen Groß wrote:
>
> I think you are just narrowing the window of the race:
>
> It is still possible to have two cpus entering rcu_barrier() and to
> make it into the if ( !initial ) clause.
>
> Instead of introducing another atomic I believe the following patch
> instead of yours should do it:
>
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index e6add0b120..0d5469a326 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>
>  void rcu_barrier(void)
>  {
> -    int initial = atomic_read(&cpu_count);
> -
>      while ( !get_cpu_maps() )
>      {
>          process_pending_softirqs();
> -        if ( initial && !atomic_read(&cpu_count) )
> +        if ( !atomic_read(&cpu_count) )
>              return;
>
>          cpu_relax();
> -        initial = atomic_read(&cpu_count);
>      }
>
> -    if ( !initial )
> -    {
> -        atomic_set(&cpu_count, num_online_cpus());
> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
> -    }
>
>      while ( atomic_read(&cpu_count) )
>      {
>
> Could you give that a try, please?

With this patch I cannot disable SMT at all.

The problem that my diff solved was a race between 2 consecutive
rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
operation) where some CPUs didn't exit the cpu_count checking loop
completely but cpu_count is already reinitialized on CPU0 - this
results in some CPUs being stuck in the loop.

Igor
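Editorial illustration of the race described above and of the two-counter approach from the earlier done_count diff: one counter tracks callbacks that have not run yet, the other tracks CPUs that have not yet left the wait loop, and the initiator returns only when the second one reaches zero. This is a hedged sketch with C11 atomics, not the Xen implementation. The names pending, in_loop and the helper functions are hypothetical, and get_cpu_maps() and softirq processing are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* CPUs whose callback has not run yet (role of cpu_count in the thread). */
static atomic_int pending;
/* CPUs that have not left the wait loop yet (role of done_count). */
static atomic_int in_loop;

/* Initiator: arm both counters for a barrier covering nr_cpus CPUs. */
static void barrier_begin(int nr_cpus)
{
    atomic_store(&pending, nr_cpus);
    atomic_store(&in_loop, nr_cpus);
}

/* One CPU's barrier callback has run. */
static void callback_ran(void)
{
    atomic_fetch_sub(&pending, 1);
}

/* Wait-loop condition on each CPU: spin while callbacks are outstanding. */
static bool must_keep_spinning(void)
{
    return atomic_load(&pending) != 0;
}

/* One CPU has observed pending == 0 and left its wait loop. */
static void left_loop(void)
{
    atomic_fetch_sub(&in_loop, 1);
}

/*
 * The initiator may return, and so possibly start the next barrier,
 * only once every CPU has actually left the wait loop.
 */
static bool initiator_may_return(void)
{
    return atomic_load(&in_loop) == 0;
}
```

With a single counter the initiator could return as soon as pending hits zero and immediately re-arm it for the next rcu_barrier(), trapping any CPU that had not yet re-read the counter. Waiting on the second counter closes that window, because re-arming can only happen after all CPUs have left the loop.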
On 02.03.20 14:25, Igor Druzhinin wrote:
> On 28/02/2020 07:10, Jürgen Groß wrote:
>>
>> I think you are just narrowing the window of the race:
>>
>> It is still possible to have two cpus entering rcu_barrier() and to
>> make it into the if ( !initial ) clause.
>>
>> Instead of introducing another atomic I believe the following patch
>> instead of yours should do it:
>>
>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>> index e6add0b120..0d5469a326 100644
>> --- a/xen/common/rcupdate.c
>> +++ b/xen/common/rcupdate.c
>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>
>>  void rcu_barrier(void)
>>  {
>> -    int initial = atomic_read(&cpu_count);
>> -
>>      while ( !get_cpu_maps() )
>>      {
>>          process_pending_softirqs();
>> -        if ( initial && !atomic_read(&cpu_count) )
>> +        if ( !atomic_read(&cpu_count) )
>>              return;
>>
>>          cpu_relax();
>> -        initial = atomic_read(&cpu_count);
>>      }
>>
>> -    if ( !initial )
>> -    {
>> -        atomic_set(&cpu_count, num_online_cpus());
>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>> -    }
>>
>>      while ( atomic_read(&cpu_count) )
>>      {
>>
>> Could you give that a try, please?
>
> With this patch I cannot disable SMT at all.
>
> The problem that my diff solved was a race between 2 consecutive
> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
> operation) where some CPUs didn't exit the cpu_count checking loop
> completely but cpu_count is already reinitialized on CPU0 - this
> results in some CPUs being stuck in the loop.

Ah, okay, then I believe a combination of the two patches is needed.

Something like the attached version?

Juergen
On 02/03/2020 14:03, Jürgen Groß wrote:
> On 02.03.20 14:25, Igor Druzhinin wrote:
>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>
>>> I think you are just narrowing the window of the race:
>>>
>>> It is still possible to have two cpus entering rcu_barrier() and to
>>> make it into the if ( !initial ) clause.
>>>
>>> Instead of introducing another atomic I believe the following patch
>>> instead of yours should do it:
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index e6add0b120..0d5469a326 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>
>>>  void rcu_barrier(void)
>>>  {
>>> -    int initial = atomic_read(&cpu_count);
>>> -
>>>      while ( !get_cpu_maps() )
>>>      {
>>>          process_pending_softirqs();
>>> -        if ( initial && !atomic_read(&cpu_count) )
>>> +        if ( !atomic_read(&cpu_count) )
>>>              return;
>>>
>>>          cpu_relax();
>>> -        initial = atomic_read(&cpu_count);
>>>      }
>>>
>>> -    if ( !initial )
>>> -    {
>>> -        atomic_set(&cpu_count, num_online_cpus());
>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>> -    }
>>>
>>>      while ( atomic_read(&cpu_count) )
>>>      {
>>>
>>> Could you give that a try, please?
>>
>> With this patch I cannot disable SMT at all.
>>
>> The problem that my diff solved was a race between 2 consecutive
>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>> operation) where some CPUs didn't exit the cpu_count checking loop
>> completely but cpu_count is already reinitialized on CPU0 - this
>> results in some CPUs being stuck in the loop.
>
> Ah, okay, then I believe a combination of the two patches is needed.
>
> Something like the attached version?

I apologise - my previous test result was from a machine booted in core mode.
I'm now testing it properly and the original patch seems to do the trick, but
I still don't understand how you can avoid the race with only 1 counter -
it's always possible that CPU1 is still in the cpu_count checking loop (even if
cpu_count is currently 0) when cpu_count is reinitialized.

I'm looking at your current version now. Was the removal of get_cpu_maps()
and recursion protection intentional? I suspect it would only work on the
latest master so I need to keep those for 4.13 testing.

Igor
On 02.03.20 15:23, Igor Druzhinin wrote: > On 02/03/2020 14:03, Jürgen Groß wrote: >> On 02.03.20 14:25, Igor Druzhinin wrote: >>> On 28/02/2020 07:10, Jürgen Groß wrote: >>>> >>>> I think you are just narrowing the window of the race: >>>> >>>> It is still possible to have two cpus entering rcu_barrier() and to >>>> make it into the if ( !initial ) clause. >>>> >>>> Instead of introducing another atomic I believe the following patch >>>> instead of yours should do it: >>>> >>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c >>>> index e6add0b120..0d5469a326 100644 >>>> --- a/xen/common/rcupdate.c >>>> +++ b/xen/common/rcupdate.c >>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void) >>>> >>>> void rcu_barrier(void) >>>> { >>>> - int initial = atomic_read(&cpu_count); >>>> - >>>> while ( !get_cpu_maps() ) >>>> { >>>> process_pending_softirqs(); >>>> - if ( initial && !atomic_read(&cpu_count) ) >>>> + if ( !atomic_read(&cpu_count) ) >>>> return; >>>> >>>> cpu_relax(); >>>> - initial = atomic_read(&cpu_count); >>>> } >>>> >>>> - if ( !initial ) >>>> - { >>>> - atomic_set(&cpu_count, num_online_cpus()); >>>> + if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 ) >>>> cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ); >>>> - } >>>> >>>> while ( atomic_read(&cpu_count) ) >>>> { >>>> >>>> Could you give that a try, please? >>> >>> With this patch I cannot disable SMT at all. >>> >>> The problem that my diff solved was a race between 2 consecutive >>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off >>> operation) where some CPUs didn't exit the cpu_count checking loop >>> completely but cpu_count is already reinitialized on CPU0 - this >>> results in some CPUs being stuck in the loop. >> >> Ah, okay, then I believe a combination of the two patches is needed. >> >> Something like the attached version? > > I apologies - my previous test result was from machine booted in core mode. 
> I'm now testing it properly and the original patch seems to do the trick but
> I still don't understand how you can avoid the race with only 1 counter -
> it's always possible that CPU1 is still in cpu_count checking loop (even if
> cpu_count is currently 0) when cpu_count is reinitialized.

I guess this is very very unlikely.

> I'm looking at your current version now. Was the removal of get_cpu_maps()
> and recursion protection intentional? I suspect it would only work on the
> latest master so I need to keep those for 4.13 testing.

Oh, sorry, this seems to be an old version.

Here comes the correct one.

Juergen
On 02/03/2020 14:32, Jürgen Groß wrote: > On 02.03.20 15:23, Igor Druzhinin wrote: >> On 02/03/2020 14:03, Jürgen Groß wrote: >>> On 02.03.20 14:25, Igor Druzhinin wrote: >>>> On 28/02/2020 07:10, Jürgen Groß wrote: >>>>> >>>>> I think you are just narrowing the window of the race: >>>>> >>>>> It is still possible to have two cpus entering rcu_barrier() and to >>>>> make it into the if ( !initial ) clause. >>>>> >>>>> Instead of introducing another atomic I believe the following patch >>>>> instead of yours should do it: >>>>> >>>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c >>>>> index e6add0b120..0d5469a326 100644 >>>>> --- a/xen/common/rcupdate.c >>>>> +++ b/xen/common/rcupdate.c >>>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void) >>>>> >>>>> void rcu_barrier(void) >>>>> { >>>>> - int initial = atomic_read(&cpu_count); >>>>> - >>>>> while ( !get_cpu_maps() ) >>>>> { >>>>> process_pending_softirqs(); >>>>> - if ( initial && !atomic_read(&cpu_count) ) >>>>> + if ( !atomic_read(&cpu_count) ) >>>>> return; >>>>> >>>>> cpu_relax(); >>>>> - initial = atomic_read(&cpu_count); >>>>> } >>>>> >>>>> - if ( !initial ) >>>>> - { >>>>> - atomic_set(&cpu_count, num_online_cpus()); >>>>> + if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 ) >>>>> cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ); >>>>> - } >>>>> >>>>> while ( atomic_read(&cpu_count) ) >>>>> { >>>>> >>>>> Could you give that a try, please? >>>> >>>> With this patch I cannot disable SMT at all. >>>> >>>> The problem that my diff solved was a race between 2 consecutive >>>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off >>>> operation) where some CPUs didn't exit the cpu_count checking loop >>>> completely but cpu_count is already reinitialized on CPU0 - this >>>> results in some CPUs being stuck in the loop. >>> >>> Ah, okay, then I believe a combination of the two patches is needed. >>> >>> Something like the attached version? 
>>
>> I apologies - my previous test result was from machine booted in core mode.
>> I'm now testing it properly and the original patch seems to do the trick but
>> I still don't understand how you can avoid the race with only 1 counter -
>> it's always possible that CPU1 is still in cpu_count checking loop (even if
>> cpu_count is currently 0) when cpu_count is reinitialized.
>
> I guess this is very very unlikely.
>
>> I'm looking at your current version now. Was the removal of get_cpu_maps()
>> and recursion protection intentional? I suspect it would only work on the
>> latest master so I need to keep those for 4.13 testing.
>
> Oh, sorry, this seems to be an old version.
>
> Here comes the correct one.

I checked this version and I guess it should be fine for v3.
However, I wasn't able to check how well it would work in core mode, as
CPU hot off is generally broken there at the moment (at least it boots in
core mode with rcu_barrier called on CPU bring-up).

Igor