[v2,0/4] xen/rcu: let rcu work better with core scheduling

Message ID	20200218122114.17596-1-jgross@suse.com (mailing list archive)
Headers	Return-Path: <SRS0=Jx38=4G=lists.xenproject.org=xen-devel-bounces@kernel.org>
	DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8783D207FD
	From: Juergen Gross <jgross@suse.com>
	To: xen-devel@lists.xenproject.org
	Date: Tue, 18 Feb 2020 13:21:10 +0100
	Message-Id: <20200218122114.17596-1-jgross@suse.com>
	Subject: [Xen-devel] [PATCH v2 0/4] xen/rcu: let rcu work better with core scheduling
	Precedence: list
	Cc: Juergen Gross <jgross@suse.com>, Kevin Tian <kevin.tian@intel.com>, Stefano Stabellini <sstabellini@kernel.org>, Julien Grall <julien@xen.org>, Jun Nakajima <jun.nakajima@intel.com>, Wei Liu <wl@xen.org>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>, George Dunlap <George.Dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Jan Beulich <jbeulich@suse.com>, Roger Pau Monné <roger.pau@citrix.com>
	MIME-Version: 1.0
	Content-Type: text/plain; charset="utf-8"
	Content-Transfer-Encoding: base64
	Errors-To: xen-devel-bounces@lists.xenproject.org
	Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	xen/rcu: let rcu work better with core scheduling
	[v2,0/4] xen/rcu: let rcu work better with core scheduling
	[v2,1/4] xen/rcu: use rcu softirq for forcing quiescent state
	[v2,2/4] xen/rcu: don't use stop_machine_run() for rcu_barrier()
	[v2,3/4] xen: add process_pending_softirqs_norcu() for keyhandlers
	[v2,4/4] xen/rcu: add assertions to debug build

Jürgen Groß Feb. 18, 2020, 12:21 p.m. UTC

Today the RCU handling in Xen is affecting scheduling in several ways.
It is raising sched softirqs without any real need and it requires
tasklets for rcu_barrier(), which interacts badly with core scheduling.

This small series repairs those issues.

Additionally some ASSERT()s are added for verification of sane rcu
handling. In order to avoid those triggering right away the obvious
violations are fixed.

Changes in V2:
- use get_cpu_maps() in rcu_barrier() handling
- avoid recursion in rcu_barrier() handling
- new patches 3 and 4

Juergen Gross (4):
  xen/rcu: use rcu softirq for forcing quiescent state
  xen/rcu: don't use stop_machine_run() for rcu_barrier()
  xen: add process_pending_softirqs_norcu() for keyhandlers
  xen/rcu: add assertions to debug build

 xen/arch/x86/mm/p2m-ept.c                   |  2 +-
 xen/arch/x86/numa.c                         |  4 +-
 xen/common/keyhandler.c                     |  6 +-
 xen/common/multicall.c                      |  1 +
 xen/common/rcupdate.c                       | 96 +++++++++++++++++++++--------
 xen/common/softirq.c                        | 19 ++++--
 xen/common/wait.c                           |  1 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c         |  2 +-
 xen/drivers/vpci/msi.c                      |  4 +-
 xen/include/xen/rcupdate.h                  | 23 +++++--
 xen/include/xen/softirq.h                   |  2 +
 12 files changed, 118 insertions(+), 44 deletions(-)

Igor Druzhinin Feb. 18, 2020, 1:15 p.m. UTC | #1

On 18/02/2020 12:21, Juergen Gross wrote:
> Today the RCU handling in Xen is affecting scheduling in several ways.
> It is raising sched softirqs without any real need and it requires
> tasklets for rcu_barrier(), which interacts badly with core scheduling.
> 
> This small series repairs those issues.
> 
> Additionally some ASSERT()s are added for verification of sane rcu
> handling. In order to avoid those triggering right away the obvious
> violations are fixed.
> 

Initial testing of the first 2 patches is promising. I will run more tests
overnight to see how stable it is.

Igor

Igor Druzhinin Feb. 19, 2020, 4:48 p.m. UTC | #2

On 18/02/2020 13:15, Igor Druzhinin wrote:
> On 18/02/2020 12:21, Juergen Gross wrote:
>> Today the RCU handling in Xen is affecting scheduling in several ways.
>> It is raising sched softirqs without any real need and it requires
>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>
>> This small series repairs those issues.
>>
>> Additionally some ASSERT()s are added for verification of sane rcu
>> handling. In order to avoid those triggering right away the obvious
>> violations are fixed.
>>
> 
> Initial testing of the first 2 patches is promising. I will run more tests
> overnight to see how stable it is.

I stress-tested it overnight and it seems to work for our case.

Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Igor

Igor Druzhinin Feb. 22, 2020, 2:29 a.m. UTC | #3

On 18/02/2020 12:21, Juergen Gross wrote:
> Today the RCU handling in Xen is affecting scheduling in several ways.
> It is raising sched softirqs without any real need and it requires
> tasklets for rcu_barrier(), which interacts badly with core scheduling.
> 
> This small series repairs those issues.
> 
> Additionally some ASSERT()s are added for verification of sane rcu
> handling. In order to avoid those triggering right away the obvious
> violations are fixed.

I've done more testing of this with [1] and, unfortunately, it quite easily
deadlocks while without this series it doesn't.

Steps to repro:
- apply [1]
- take a host with considerable CPU count (~64)
- run a loop: xen-hptool smt-disable; xen-hptool smt-enable

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html

Igor

Jürgen Groß Feb. 22, 2020, 6:05 a.m. UTC | #4

On 22.02.20 03:29, Igor Druzhinin wrote:
> On 18/02/2020 12:21, Juergen Gross wrote:
>> Today the RCU handling in Xen is affecting scheduling in several ways.
>> It is raising sched softirqs without any real need and it requires
>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>
>> This small series repairs those issues.
>>
>> Additionally some ASSERT()s are added for verification of sane rcu
>> handling. In order to avoid those triggering right away the obvious
>> violations are fixed.
> 
> I've done more testing of this with [1] and, unfortunately, it quite easily
> deadlocks while without this series it doesn't.
> 
> Steps to repro:
> - apply [1]
> - take a host with considerable CPU count (~64)
> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html

Yeah, the reason for that is that rcu_barrier() is a nop in this
situation without my patch, as the then called stop_machine_run() in
rcu_barrier() will just return -EBUSY.

Juergen
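
The explanation above can be sketched as a toy user-space model in C. Every
model_ name is hypothetical and none of this is Xen code. The only facts
taken from the thread are that the pre-series rcu_barrier() goes through
stop_machine_run() and that the latter returns -EBUSY in this situation.
That -EBUSY is caused by the CPU maps being held, and that the failure is
not acted upon, are assumptions of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for stop_machine_run(): refuse to run fn and
 * return -EBUSY while the CPU maps are held (assumed to be the case
 * during the SMT disable/enable loop).
 */
static int model_stop_machine_run(bool cpu_maps_held,
                                  int (*fn)(void *), void *data)
{
    assert(fn != NULL);

    if ( cpu_maps_held )
        return -EBUSY;

    return fn(data);
}

/* Stand-in for the step that waits for in-flight RCU callbacks. */
static int model_wait_for_callbacks(void *data)
{
    *(bool *)data = true;
    return 0;
}

/*
 * Model of a barrier whose failure goes unnoticed: reports whether the
 * waiting step actually ran.
 */
static bool model_rcu_barrier_old(bool cpu_maps_held)
{
    bool waited = false;

    /* The -EBUSY result is dropped here, which makes the barrier a NOP. */
    (void)model_stop_machine_run(cpu_maps_held, model_wait_for_callbacks,
                                 &waited);

    return waited;
}
```

In this model, model_rcu_barrier_old(false) performs the wait, while
model_rcu_barrier_old(true) returns without waiting at all. That would be
consistent with the hang only showing up once the barrier really waits.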

Julien Grall Feb. 22, 2020, 12:32 p.m. UTC | #5

Hi,

On 22/02/2020 06:05, Jürgen Groß wrote:
> On 22.02.20 03:29, Igor Druzhinin wrote:
>> On 18/02/2020 12:21, Juergen Gross wrote:
>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>> It is raising sched softirqs without any real need and it requires
>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>
>>> This small series repairs those issues.
>>>
>>> Additionally some ASSERT()s are added for verification of sane rcu
>>> handling. In order to avoid those triggering right away the obvious
>>> violations are fixed.
>>
>> I've done more testing of this with [1] and, unfortunately, it quite easily
>> deadlocks while without this series it doesn't.
>>
>> Steps to repro:
>> - apply [1]
>> - take a host with considerable CPU count (~64)
>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>>
> 
> Yeah, the reason for that is that rcu_barrier() is a nop in this
> situation without my patch, as the then called stop_machine_run() in
> rcu_barrier() will just return -EBUSY.

I think rcu_barrier() being a NOP is also a problem, as it means you would
be able to continue before the in-flight callback has completed.

But I am not entirely sure why a deadlock would happen with your
suggestion. Could you explain in a bit more detail?

Cheers,

Jürgen Groß Feb. 22, 2020, 1:56 p.m. UTC | #6

On 22.02.20 13:32, Julien Grall wrote:
> Hi,
> 
> On 22/02/2020 06:05, Jürgen Groß wrote:
>> On 22.02.20 03:29, Igor Druzhinin wrote:
>>> On 18/02/2020 12:21, Juergen Gross wrote:
>>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>>> It is raising sched softirqs without any real need and it requires
>>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>>
>>>> This small series repairs those issues.
>>>>
>>>> Additionally some ASSERT()s are added for verification of sane rcu
>>>> handling. In order to avoid those triggering right away the obvious
>>>> violations are fixed.
>>>
>>> I've done more testing of this with [1] and, unfortunately, it quite easily
>>> deadlocks while without this series it doesn't.
>>>
>>> Steps to repro:
>>> - apply [1]
>>> - take a host with considerable CPU count (~64)
>>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>>>
>>
>> Yeah, the reason for that is that rcu_barrier() is a nop in this
>> situation without my patch, as the then called stop_machine_run() in
>> rcu_barrier() will just return -EBUSY.
> 
> I think rcu_barrier() being a NOP is also a problem, as it means you would
> be able to continue before the in-flight callback has completed.
> 
> But I am not entirely sure why a deadlock would happen with your
> suggestion. Could you explain in a bit more detail?

get_cpu_maps() will return false as long as stop_machine_run() is holding
the lock, and rcu handling will loop until it gets the lock...


Juergen
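
The suspected loop can be illustrated with a small single-threaded C model.
It is a sketch only: the model_ names are hypothetical, the lock is reduced
to a plain flag with trylock semantics, any recursion the real lock may
allow is ignored, and the retry loop is bounded so that the example
terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock protecting the CPU maps. */
static bool model_maps_locked;

/* Trylock semantics: never blocks, reports failure while the lock is held. */
static bool model_get_cpu_maps(void)
{
    if ( model_maps_locked )
        return false;

    model_maps_locked = true;
    return true;
}

static void model_put_cpu_maps(void)
{
    assert(model_maps_locked);
    model_maps_locked = false;
}

/*
 * Retry until the maps are obtained. Returns the number of attempts
 * needed, or -1 if the lock was still held after max_tries attempts.
 */
static int model_spin_for_cpu_maps(int max_tries)
{
    int i;

    for ( i = 1; i <= max_tries; i++ )
        if ( model_get_cpu_maps() )
            return i;

    return -1;
}
```

If the holder of the lock is itself waiting for the spinning side to make
progress, the -1 case corresponds to the loop never terminating. Whether
that is what actually happens on the test host is a separate question.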

Igor Druzhinin Feb. 22, 2020, 4:42 p.m. UTC | #7

On 22/02/2020 06:05, Jürgen Groß wrote:
> On 22.02.20 03:29, Igor Druzhinin wrote:
>> On 18/02/2020 12:21, Juergen Gross wrote:
>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>> It is raising sched softirqs without any real need and it requires
>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>
>>> This small series repairs those issues.
>>>
>>> Additionally some ASSERT()s are added for verification of sane rcu
>>> handling. In order to avoid those triggering right away the obvious
>>> violations are fixed.
>>
>> I've done more testing of this with [1] and, unfortunately, it quite easily
>> deadlocks while without this series it doesn't.
>>
>> Steps to repro:
>> - apply [1]
>> - take a host with considerable CPU count (~64)
>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
> 
> Yeah, the reason for that is that rcu_barrier() is a nop in this
> situation without my patch, as the then called stop_machine_run() in
> rcu_barrier() will just return -EBUSY.

Are you sure that's the reason? I always have the following stack on CPU0:

(XEN) [  120.891143] *** Dumping CPU0 host state: ***
(XEN) [  120.895909] ----[ Xen-4.13.0  x86_64  debug=y   Not tainted ]----
(XEN) [  120.902487] CPU:    0
(XEN) [  120.905269] RIP:    e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43
(XEN) [  120.913415] RFLAGS: 0000000000000286   CONTEXT: hypervisor
(XEN) [  120.919389] rax: 0000000000000000   rbx: ffff82d0805ddb78   rcx: 0000000000000001
(XEN) [  120.927362] rdx: ffff82d0805cdb00   rsi: ffff82d0805c7cd8   rdi: 0000000000000007
(XEN) [  120.935341] rbp: ffff8300920bfbc0   rsp: ffff8300920bfbb8   r8:  000000000000003b
(XEN) [  120.943310] r9:  0444444444444432   r10: 3333333333333333   r11: 0000000000000001
(XEN) [  120.951282] r12: ffff82d0805ddb78   r13: 0000000000000001   r14: ffff8300920bfc18
(XEN) [  120.959251] r15: ffff82d0802af646   cr0: 000000008005003b   cr4: 00000000003506e0
(XEN) [  120.967223] cr3: 00000000920b0000   cr2: ffff88820dffe7f8
(XEN) [  120.973125] fsb: 0000000000000000   gsb: ffff88821e3c0000   gss: 0000000000000000
(XEN) [  120.981094] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [  120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43):
(XEN) [  120.997037]  85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5
(XEN) [  121.005442] Xen stack trace from rsp=ffff8300920bfbb8:
(XEN) [  121.011080]    ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845
(XEN) [  121.019145]    ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8
(XEN) [  121.027200]    0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf
(XEN) [  121.035255]    fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000
(XEN) [  121.043320]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [  121.051375]    000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80
(XEN) [  121.059443]    ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2
(XEN) [  121.067498]    ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00
(XEN) [  121.075553]    ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040
(XEN) [  121.083621]    ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff
(XEN) [  121.091674]    0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040
(XEN) [  121.099739]    ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40
(XEN) [  121.107797]    ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000
(XEN) [  121.115852]    ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00
(XEN) [  121.123917]    fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003
(XEN) [  121.131975]    0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0
(XEN) [  121.140037]    ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0
(XEN) [  121.148093]    ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30
(XEN) [  121.156150]    ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000
(XEN) [  121.164216]    0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50
(XEN) [  121.172271] Xen call trace:
(XEN) [  121.175573]    [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43
(XEN) [  121.183024]    [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde
(XEN) [  121.189520]    [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89
(XEN) [  121.197403]    [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64
(XEN) [  121.204951]    [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e
(XEN) [  121.213271]    [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [  121.220890]    [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [  121.228086]    [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d
(XEN) [  121.236758]    [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [  121.244378]    [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [  121.251568]    [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
(XEN) [  121.257639]    [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
(XEN) [  121.264051]    [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
(XEN) [  121.272454]    [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
(XEN) [  121.281900]    [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
(XEN) [  121.289786]    [<ffff82d080245803>] F do_tasklet+0x58/0x85
(XEN) [  121.295771]    [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb

So it's not in the get_cpu_maps() loop. It seems to me it's not entering time sync for
some reason.

Igor

Jürgen Groß Feb. 23, 2020, 2:14 p.m. UTC | #8

On 22.02.20 17:42, Igor Druzhinin wrote:
> On 22/02/2020 06:05, Jürgen Groß wrote:
>> On 22.02.20 03:29, Igor Druzhinin wrote:
>>> On 18/02/2020 12:21, Juergen Gross wrote:
>>>> Today the RCU handling in Xen is affecting scheduling in several ways.
>>>> It is raising sched softirqs without any real need and it requires
>>>> tasklets for rcu_barrier(), which interacts badly with core scheduling.
>>>>
>>>> This small series repairs those issues.
>>>>
>>>> Additionally some ASSERT()s are added for verification of sane rcu
>>>> handling. In order to avoid those triggering right away the obvious
>>>> violations are fixed.
>>>
>>> I've done more testing of this with [1] and, unfortunately, it quite easily
>>> deadlocks while without this series it doesn't.
>>>
>>> Steps to repro:
>>> - apply [1]
>>> - take a host with considerable CPU count (~64)
>>> - run a loop: xen-hptool smt-disable; xen-hptool smt-enable
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
>>
>> Yeah, the reason for that is that rcu_barrier() is a nop in this
>> situation without my patch, as the then called stop_machine_run() in
>> rcu_barrier() will just return -EBUSY.
> 
> Are you sure that's the reason? I always have the following stack on CPU0:
> 
> [CPU0 host state dump snipped; identical to the one in #7]
> 
> So it's not in the get_cpu_maps() loop. It seems to me it's not entering time sync for
> some reason.

Interesting. Looking further into that.

At least time_calibration() is missing a call to get_cpu_maps().


Juergen

Igor Druzhinin Feb. 27, 2020, 3:16 p.m. UTC | #9

On 23/02/2020 14:14, Jürgen Groß wrote:
> On 22.02.20 17:42, Igor Druzhinin wrote:
>> [CPU0 host state dump snipped; identical to the one in #7]
>>
>> So it's not in the get_cpu_maps() loop. It seems to me it's not entering time sync for
>> some reason.
> 
> Interesting. Looking further into that.
> 
> At least time_calibration() is missing a call to get_cpu_maps().

I debugged this issue and the following fixes it:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index ccf2ec6..36d98a4 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -153,6 +153,7 @@ static int rsinterval = 1000;
  * multiple times.
  */
 static atomic_t cpu_count = ATOMIC_INIT(0);
+static atomic_t done_count = ATOMIC_INIT(0);
 
 static void rcu_barrier_callback(struct rcu_head *head)
 {
@@ -175,6 +176,8 @@ static void rcu_barrier_action(void)
         process_pending_softirqs();
         cpu_relax();
     }
+
+    atomic_dec(&done_count);
 }
 
 void rcu_barrier(void)
@@ -194,10 +197,11 @@ void rcu_barrier(void)
     if ( !initial )
     {
         atomic_set(&cpu_count, num_online_cpus());
+        atomic_set(&done_count, num_online_cpus());
         cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
     }
 
-    while ( atomic_read(&cpu_count) )
+    while ( atomic_read(&done_count) )
     {
         process_pending_softirqs();
         cpu_relax();
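
What the diff changes in the wait condition can be shown with a
deterministic, single-threaded C model. This is a sketch under several
assumptions: the model_ and MODEL_ names are hypothetical, the CPUs are
stepped round-robin instead of running concurrently, the initiating CPU is
modelled as an outside observer, and running the barrier callback is
reduced to decrementing cpu_count. It illustrates the two wait conditions
only and is not a diagnosis of the hang.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

enum model_cpu_state {
    MODEL_PENDING,   /* barrier callback has not run yet */
    MODEL_SPINNING,  /* inside the action, waiting for cpu_count == 0 */
    MODEL_LEFT,      /* has left the action and decremented done_count */
};

/*
 * Step all CPUs round-robin until the initiator's wait condition is met.
 * Returns how many CPUs are still inside the action at that point.
 */
static int model_barrier(bool wait_on_done_count)
{
    enum model_cpu_state state[MODEL_NR_CPUS];
    int cpu_count = MODEL_NR_CPUS;
    int done_count = MODEL_NR_CPUS;
    int cpu, still_inside = 0;
    bool waiting = true;

    for ( cpu = 0; cpu < MODEL_NR_CPUS; cpu++ )
        state[cpu] = MODEL_PENDING;

    while ( waiting )
    {
        for ( cpu = 0; cpu < MODEL_NR_CPUS && waiting; cpu++ )
        {
            if ( state[cpu] == MODEL_PENDING )
            {
                cpu_count--;              /* callback ran on this CPU */
                state[cpu] = MODEL_SPINNING;
            }
            else if ( state[cpu] == MODEL_SPINNING && cpu_count == 0 )
            {
                done_count--;             /* the line added by the diff */
                state[cpu] = MODEL_LEFT;
            }

            assert(cpu_count >= 0 && done_count >= 0);

            /* The initiator re-checks its condition after every step. */
            waiting = (wait_on_done_count ? done_count : cpu_count) != 0;
        }
    }

    for ( cpu = 0; cpu < MODEL_NR_CPUS; cpu++ )
        if ( state[cpu] != MODEL_LEFT )
            still_inside++;

    return still_inside;
}
```

With this schedule, waiting on cpu_count lets the initiator continue while
all modelled CPUs are still spinning inside the action, whereas waiting on
done_count only lets it continue once every CPU has left.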

Is there anything else that currently blocks v3?

Igor

Jürgen Groß Feb. 27, 2020, 3:21 p.m. UTC | #10

On 27.02.20 16:16, Igor Druzhinin wrote:
> On 23/02/2020 14:14, Jürgen Groß wrote:
>> On 22.02.20 17:42, Igor Druzhinin wrote:
>>> [CPU0 host state dump snipped; identical to the one in #7]
>>>
>>> So it's not in get_cpu_maps() loop. It seems to me it's not entering time sync for some
>>> reason.
>>
>> Interesting. Looking further into that.
>>
>> At least time_calibration() is missing a call to get_cpu_maps().
> 
> I debugged this issue and the following fixes it:
> 
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index ccf2ec6..36d98a4 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -153,6 +153,7 @@ static int rsinterval = 1000;
>    * multiple times.
>    */
>   static atomic_t cpu_count = ATOMIC_INIT(0);
> +static atomic_t done_count = ATOMIC_INIT(0);
>   
>   static void rcu_barrier_callback(struct rcu_head *head)
>   {
> @@ -175,6 +176,8 @@ static void rcu_barrier_action(void)
>           process_pending_softirqs();
>           cpu_relax();
>       }
> +
> +    atomic_dec(&done_count);
>   }
>   
>   void rcu_barrier(void)
> @@ -194,10 +197,11 @@ void rcu_barrier(void)
>       if ( !initial )
>       {
>           atomic_set(&cpu_count, num_online_cpus());
> +        atomic_set(&done_count, num_online_cpus());
>           cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>       }
>   
> -    while ( atomic_read(&cpu_count) )
> +    while ( atomic_read(&done_count) )
>       {
>           process_pending_softirqs();
>           cpu_relax();
> 
> Is there anything else that blocks v3 currently?

Thanks for the work!

I'll send V3 probably tomorrow.


Juergen

Jürgen Groß Feb. 28, 2020, 7:10 a.m. UTC | #11

On 27.02.20 16:16, Igor Druzhinin wrote:
> I debugged this issue and the following fixes it:
> 
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index ccf2ec6..36d98a4 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -153,6 +153,7 @@ static int rsinterval = 1000;
>    * multiple times.
>    */
>   static atomic_t cpu_count = ATOMIC_INIT(0);
> +static atomic_t done_count = ATOMIC_INIT(0);
>   
>   static void rcu_barrier_callback(struct rcu_head *head)
>   {
> @@ -175,6 +176,8 @@ static void rcu_barrier_action(void)
>           process_pending_softirqs();
>           cpu_relax();
>       }
> +
> +    atomic_dec(&done_count);
>   }
>   
>   void rcu_barrier(void)
> @@ -194,10 +197,11 @@ void rcu_barrier(void)
>       if ( !initial )
>       {
>           atomic_set(&cpu_count, num_online_cpus());
> +        atomic_set(&done_count, num_online_cpus());
>           cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>       }
>   
> -    while ( atomic_read(&cpu_count) )
> +    while ( atomic_read(&done_count) )
>       {
>           process_pending_softirqs();
>           cpu_relax();

I think you are just narrowing the window of the race:

It is still possible to have two cpus entering rcu_barrier() and to
make it into the if ( !initial ) clause.

Rather than introducing another atomic, I believe the following patch
(in place of yours) should do it:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index e6add0b120..0d5469a326 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -180,23 +180,17 @@ static void rcu_barrier_action(void)

  void rcu_barrier(void)
  {
-    int initial = atomic_read(&cpu_count);
-
      while ( !get_cpu_maps() )
      {
          process_pending_softirqs();
-        if ( initial && !atomic_read(&cpu_count) )
+        if ( !atomic_read(&cpu_count) )
              return;

          cpu_relax();
-        initial = atomic_read(&cpu_count);
      }

-    if ( !initial )
-    {
-        atomic_set(&cpu_count, num_online_cpus());
+    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
-    }

      while ( atomic_read(&cpu_count) )
      {

Could you give that a try, please?
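
For illustration only, the reason the compare-and-exchange closes that
window can be modelled in user space with C11 atomics. This is a sketch,
not the Xen code: try_start_barrier() is a made-up helper, and
atomic_compare_exchange_strong() stands in for Xen's atomic_cmpxchg().
Of any number of callers that find the counter at 0, exactly one sees
the exchange succeed, and only that one would go on to raise
RCU_SOFTIRQ:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpu_count in xen/common/rcupdate.c. */
static atomic_int cpu_count;

/*
 * Try to start a barrier round for ncpus CPUs.
 *
 * Returns true only for the single caller that moved the counter from
 * 0 to ncpus. Every other caller (a round is already in flight) gets
 * false and leaves the counter untouched.
 */
static bool try_start_barrier(int ncpus)
{
    int expected = 0;

    assert(ncpus > 0);

    /* Atomically: if cpu_count == 0, set it to ncpus and report success. */
    return atomic_compare_exchange_strong(&cpu_count, &expected, ncpus);
}
```

With a plain read followed by atomic_set(), two callers can both read 0
and both initialise the counter; with the exchange, the second caller
fails and simply waits for the round that is already running.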


Juergen

Igor Druzhinin March 2, 2020, 1:25 p.m. UTC | #12

On 28/02/2020 07:10, Jürgen Groß wrote:
> 
> I think you are just narrowing the window of the race:
> 
> It is still possible to have two cpus entering rcu_barrier() and to
> make it into the if ( !initial ) clause.
> 
> Rather than introducing another atomic, I believe the following patch
> (in place of yours) should do it:
> 
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index e6add0b120..0d5469a326 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
> 
>  void rcu_barrier(void)
>  {
> -    int initial = atomic_read(&cpu_count);
> -
>      while ( !get_cpu_maps() )
>      {
>          process_pending_softirqs();
> -        if ( initial && !atomic_read(&cpu_count) )
> +        if ( !atomic_read(&cpu_count) )
>              return;
> 
>          cpu_relax();
> -        initial = atomic_read(&cpu_count);
>      }
> 
> -    if ( !initial )
> -    {
> -        atomic_set(&cpu_count, num_online_cpus());
> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>          cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
> -    }
> 
>      while ( atomic_read(&cpu_count) )
>      {
> 
> Could you give that a try, please?

With this patch I cannot disable SMT at all.

The problem that my diff solved was a race between 2 consecutive
rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
operation) where some CPUs didn't exit the cpu_count checking loop
completely but cpu_count is already reinitialized on CPU0 - this
results in some CPUs being stuck in the loop.
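
As a rough user-space model of the two-counter handshake (a sketch under
assumptions, not the Xen code): threads stand in for CPUs, a hypothetical
"generation" counter stands in for raising RCU_SOFTIRQ, and the RCU
callback is reduced to a direct decrement of cpu_count. NCPUS,
barrier_action() and run_barriers() are made-up names. The point is that
the initiator waits on done_count, so it cannot reinitialise cpu_count
for the next round while another CPU is still spinning on the old one:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NCPUS 4                /* hypothetical CPU count for the model */

static atomic_int cpu_count;   /* CPUs that still have to check in */
static atomic_int done_count;  /* CPUs that have not left the wait loop */
static atomic_int generation;  /* round number; models the softirq */

/* Model of rcu_barrier_action(): check in, wait for all, then sign off. */
static void barrier_action(void)
{
    atomic_fetch_sub(&cpu_count, 1);
    while ( atomic_load(&cpu_count) )
        sched_yield();         /* Xen: process_pending_softirqs() etc. */
    atomic_fetch_sub(&done_count, 1);
}

/* A secondary CPU: run one barrier_action() per new generation. */
static void *worker(void *arg)
{
    int rounds = *(int *)arg;
    int seen = 0;

    while ( seen < rounds )
    {
        if ( atomic_load(&generation) != seen )
        {
            seen++;
            barrier_action();
        }
        else
            sched_yield();
    }
    return NULL;
}

/* CPU0: issue 'rounds' back-to-back barriers; returns rounds completed. */
static int run_barriers(int rounds)
{
    pthread_t tid[NCPUS - 1];
    int i, r;

    atomic_store(&generation, 0);
    for ( i = 0; i < NCPUS - 1; i++ )
        if ( pthread_create(&tid[i], NULL, worker, &rounds) )
            abort();

    for ( r = 0; r < rounds; r++ )
    {
        atomic_store(&cpu_count, NCPUS);
        atomic_store(&done_count, NCPUS);
        atomic_fetch_add(&generation, 1);   /* "raise the softirq" */
        barrier_action();                   /* CPU0 takes part as well */

        /* Wait for done_count, not cpu_count, before the next round. */
        while ( atomic_load(&done_count) )
            sched_yield();
    }

    for ( i = 0; i < NCPUS - 1; i++ )
        pthread_join(tid[i], NULL);

    return atomic_load(&generation);
}
```

Build with -pthread. If the final wait in run_barriers() polled cpu_count
instead, CPU0 could set cpu_count back to NCPUS before a worker had
observed the 0, and that worker would keep spinning in the old round.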

Igor

Jürgen Groß March 2, 2020, 2:03 p.m. UTC | #13

On 02.03.20 14:25, Igor Druzhinin wrote:
> On 28/02/2020 07:10, Jürgen Groß wrote:
>>
>> I think you are just narrowing the window of the race:
>>
>> It is still possible to have two cpus entering rcu_barrier() and to
>> make it into the if ( !initial ) clause.
>>
>> Rather than introducing another atomic, I believe the following patch
>> (in place of yours) should do it:
>>
>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>> index e6add0b120..0d5469a326 100644
>> --- a/xen/common/rcupdate.c
>> +++ b/xen/common/rcupdate.c
>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>
>>   void rcu_barrier(void)
>>   {
>> -    int initial = atomic_read(&cpu_count);
>> -
>>       while ( !get_cpu_maps() )
>>       {
>>           process_pending_softirqs();
>> -        if ( initial && !atomic_read(&cpu_count) )
>> +        if ( !atomic_read(&cpu_count) )
>>               return;
>>
>>           cpu_relax();
>> -        initial = atomic_read(&cpu_count);
>>       }
>>
>> -    if ( !initial )
>> -    {
>> -        atomic_set(&cpu_count, num_online_cpus());
>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>           cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>> -    }
>>
>>       while ( atomic_read(&cpu_count) )
>>       {
>>
>> Could you give that a try, please?
> 
> With this patch I cannot disable SMT at all.
> 
> The problem that my diff solved was a race between 2 consecutive
> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
> operation) where some CPUs didn't exit the cpu_count checking loop
> completely but cpu_count is already reinitialized on CPU0 - this
> results in some CPUs being stuck in the loop.

Ah, okay, then I believe a combination of the two patches is needed.

Something like the attached version?


Juergen

Igor Druzhinin March 2, 2020, 2:23 p.m. UTC | #14

On 02/03/2020 14:03, Jürgen Groß wrote:
> On 02.03.20 14:25, Igor Druzhinin wrote:
>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>
>>> I think you are just narrowing the window of the race:
>>>
>>> It is still possible to have two cpus entering rcu_barrier() and to
>>> make it into the if ( !initial ) clause.
>>>
>>> Rather than introducing another atomic, I believe the following patch
>>> (in place of yours) should do it:
>>>
>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>> index e6add0b120..0d5469a326 100644
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>
>>>   void rcu_barrier(void)
>>>   {
>>> -    int initial = atomic_read(&cpu_count);
>>> -
>>>       while ( !get_cpu_maps() )
>>>       {
>>>           process_pending_softirqs();
>>> -        if ( initial && !atomic_read(&cpu_count) )
>>> +        if ( !atomic_read(&cpu_count) )
>>>               return;
>>>
>>>           cpu_relax();
>>> -        initial = atomic_read(&cpu_count);
>>>       }
>>>
>>> -    if ( !initial )
>>> -    {
>>> -        atomic_set(&cpu_count, num_online_cpus());
>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>           cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>> -    }
>>>
>>>       while ( atomic_read(&cpu_count) )
>>>       {
>>>
>>> Could you give that a try, please?
>>
>> With this patch I cannot disable SMT at all.
>>
>> The problem that my diff solved was a race between 2 consecutive
>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>> operation) where some CPUs didn't exit the cpu_count checking loop
>> completely but cpu_count is already reinitialized on CPU0 - this
>> results in some CPUs being stuck in the loop.
> 
> Ah, okay, then I believe a combination of the two patches is needed.
> 
> Something like the attached version?

I apologise - my previous test result was from a machine booted in core mode.
I'm now testing it properly and the original patch seems to do the trick, but
I still don't understand how you can avoid the race with only 1 counter -
it's always possible that CPU1 is still in the cpu_count checking loop (even if
cpu_count is currently 0) when cpu_count is reinitialized.

I'm looking at your current version now. Was the removal of get_cpu_maps()
and recursion protection intentional? I suspect it would only work on the
latest master so I need to keep those for 4.13 testing.

Igor

Jürgen Groß March 2, 2020, 2:32 p.m. UTC | #15

On 02.03.20 15:23, Igor Druzhinin wrote:
> On 02/03/2020 14:03, Jürgen Groß wrote:
>> On 02.03.20 14:25, Igor Druzhinin wrote:
>>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>>
>>>> I think you are just narrowing the window of the race:
>>>>
>>>> It is still possible to have two cpus entering rcu_barrier() and to
>>>> make it into the if ( !initial ) clause.
>>>>
>>>> Rather than introducing another atomic, I believe the following patch
>>>> (in place of yours) should do it:
>>>>
>>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>>> index e6add0b120..0d5469a326 100644
>>>> --- a/xen/common/rcupdate.c
>>>> +++ b/xen/common/rcupdate.c
>>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>>
>>>>    void rcu_barrier(void)
>>>>    {
>>>> -    int initial = atomic_read(&cpu_count);
>>>> -
>>>>        while ( !get_cpu_maps() )
>>>>        {
>>>>            process_pending_softirqs();
>>>> -        if ( initial && !atomic_read(&cpu_count) )
>>>> +        if ( !atomic_read(&cpu_count) )
>>>>                return;
>>>>
>>>>            cpu_relax();
>>>> -        initial = atomic_read(&cpu_count);
>>>>        }
>>>>
>>>> -    if ( !initial )
>>>> -    {
>>>> -        atomic_set(&cpu_count, num_online_cpus());
>>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>>            cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>>> -    }
>>>>
>>>>        while ( atomic_read(&cpu_count) )
>>>>        {
>>>>
>>>> Could you give that a try, please?
>>>
>>> With this patch I cannot disable SMT at all.
>>>
>>> The problem that my diff solved was a race between 2 consecutive
>>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>>> operation) where some CPUs didn't exit the cpu_count checking loop
>>> completely but cpu_count is already reinitialized on CPU0 - this
>>> results in some CPUs being stuck in the loop.
>>
>> Ah, okay, then I believe a combination of the two patches is needed.
>>
>> Something like the attached version?
> 
> I apologise - my previous test result was from a machine booted in core mode.
> I'm now testing it properly and the original patch seems to do the trick, but
> I still don't understand how you can avoid the race with only 1 counter -
> it's always possible that CPU1 is still in the cpu_count checking loop (even if
> cpu_count is currently 0) when cpu_count is reinitialized.

I guess this is very very unlikely.

> I'm looking at your current version now. Was the removal of get_cpu_maps()
> and recursion protection intentional? I suspect it would only work on the
> latest master so I need to keep those for 4.13 testing.

Oh, sorry, this seems to be an old version.

Here comes the correct one.


Juergen

Igor Druzhinin March 2, 2020, 10:29 p.m. UTC | #16

On 02/03/2020 14:32, Jürgen Groß wrote:
> On 02.03.20 15:23, Igor Druzhinin wrote:
>> On 02/03/2020 14:03, Jürgen Groß wrote:
>>> On 02.03.20 14:25, Igor Druzhinin wrote:
>>>> On 28/02/2020 07:10, Jürgen Groß wrote:
>>>>>
>>>>> I think you are just narrowing the window of the race:
>>>>>
>>>>> It is still possible to have two cpus entering rcu_barrier() and to
>>>>> make it into the if ( !initial ) clause.
>>>>>
>>>>> Rather than introducing another atomic, I believe the following patch
>>>>> (in place of yours) should do it:
>>>>>
>>>>> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
>>>>> index e6add0b120..0d5469a326 100644
>>>>> --- a/xen/common/rcupdate.c
>>>>> +++ b/xen/common/rcupdate.c
>>>>> @@ -180,23 +180,17 @@ static void rcu_barrier_action(void)
>>>>>
>>>>>    void rcu_barrier(void)
>>>>>    {
>>>>> -    int initial = atomic_read(&cpu_count);
>>>>> -
>>>>>        while ( !get_cpu_maps() )
>>>>>        {
>>>>>            process_pending_softirqs();
>>>>> -        if ( initial && !atomic_read(&cpu_count) )
>>>>> +        if ( !atomic_read(&cpu_count) )
>>>>>                return;
>>>>>
>>>>>            cpu_relax();
>>>>> -        initial = atomic_read(&cpu_count);
>>>>>        }
>>>>>
>>>>> -    if ( !initial )
>>>>> -    {
>>>>> -        atomic_set(&cpu_count, num_online_cpus());
>>>>> +    if ( atomic_cmpxchg(&cpu_count, 0, num_online_cpus()) == 0 )
>>>>>            cpumask_raise_softirq(&cpu_online_map, RCU_SOFTIRQ);
>>>>> -    }
>>>>>
>>>>>        while ( atomic_read(&cpu_count) )
>>>>>        {
>>>>>
>>>>> Could you give that a try, please?
>>>>
>>>> With this patch I cannot disable SMT at all.
>>>>
>>>> The problem that my diff solved was a race between 2 consecutive
>>>> rcu_barrier operations on CPU0 (the pattern specific to SMT-on/off
>>>> operation) where some CPUs didn't exit the cpu_count checking loop
>>>> completely but cpu_count is already reinitialized on CPU0 - this
>>>> results in some CPUs being stuck in the loop.
>>>
>>> Ah, okay, then I believe a combination of the two patches is needed.
>>>
>>> Something like the attached version?
>>
>> I apologise - my previous test result was from a machine booted in core mode.
>> I'm now testing it properly and the original patch seems to do the trick, but
>> I still don't understand how you can avoid the race with only 1 counter -
>> it's always possible that CPU1 is still in the cpu_count checking loop (even if
>> cpu_count is currently 0) when cpu_count is reinitialized.
> 
> I guess this is very very unlikely.
> 
>> I'm looking at your current version now. Was the removal of get_cpu_maps()
>> and recursion protection intentional? I suspect it would only work on the
>> latest master so I need to keep those for 4.13 testing.
> 
> Oh, sorry, this seems to be an old version.
> 
> Here comes the correct one.

I checked this version and it should be fine for v3, I guess. However, I
wasn't able to check how well it works in core mode, as CPU hot-off is
generally broken there now (at least it boots in core mode with rcu_barrier
called on CPU bring-up).

Igor
