KVM: nVMX: Fix reload apic access page warning

Message ID 1473153633-4725-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li Sept. 6, 2016, 9:20 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>
WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8d0de7f9>] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 __warn+0xd1/0xf0
 warn_slowpath_fmt+0x4f/0x60
 ? prepare_to_swait+0x39/0xa0
 ? prepare_to_swait+0x39/0xa0
 __might_sleep+0x7e/0x80
 __gfn_to_pfn_memslot+0x156/0x480 [kvm]
 gfn_to_pfn+0x2a/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]

Comments

Wanpeng Li Sept. 16, 2016, 1:07 a.m. UTC | #1
Ping, :)
2016-09-06 17:20 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> [...]
Radim Krčmář Sept. 20, 2016, 1:24 p.m. UTC | #2
2016-09-06 17:20+0800, Wanpeng Li:
> [...]

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wanpeng Li Sept. 21, 2016, 2:33 a.m. UTC | #3
2016-09-20 21:24 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2016-09-06 17:20+0800, Wanpeng Li:
>> [...]
>
> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>

Thanks. :)

Regards,
Wanpeng Li
Radim Krčmář Sept. 22, 2016, 11:10 p.m. UTC | #4
2016-09-06 17:20+0800, Wanpeng Li:
> [...]

Applied to kvm/queue, thanks.
Patch

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffc062975c>] vcpu_load+0x1c/0x60 [kvm]

stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 lockdep_rcu_suspicious+0xe7/0x120
 gfn_to_memslot+0x12a/0x140 [kvm]
 gfn_to_pfn+0x12/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 ? __fget+0xfd/0x210
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget+0x11c/0x210
 ? __fget+0x5/0x210
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x81/0x220
 entry_SYSCALL64_slow_path+0x25/0x25

These warnings can be triggered by running kvm-unit-tests: ./x86-run x86/vmx.flat

The nested preemption timer is based on an hrtimer that is started on L2
entry, stopped on L2 exit and evaluated via the new check_nested_events
hook. When a vCPU needs to yield the pCPU, the current logic adds it to a
simple waitqueue (TASK_INTERRUPTIBLE) and does not hold the srcu read lock
while accessing memslots. Both conditions can occur in the nested preemption
timer evaluation path, which results in the warnings above.
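
The two conditions described above can be modelled in user space. This is
only a sketch: every model_* name is hypothetical and stands in for kernel
state, it is not the KVM code. It shows why a helper that may sleep and that
looks up memslots trips both checks when it is reached from the blocking path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task states named in the warning. */
enum model_task_state { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

/* Hypothetical per-vCPU model; none of these fields exist in KVM. */
struct model_vcpu {
    enum model_task_state state;
    bool srcu_read_held;   /* assumed: memslot lookups need this held */
    int sleep_warnings;    /* counts "blocking ops when !TASK_RUNNING" */
    int rcu_warnings;      /* counts "suspicious RCU usage" */
};

/* Models the might-sleep check: complain if the task is not running. */
static void model_might_sleep(struct model_vcpu *v)
{
    if (v->state != MODEL_TASK_RUNNING)
        v->sleep_warnings++;
}

/* Models a memslot lookup done without the read-side lock. */
static void model_memslot_lookup(struct model_vcpu *v)
{
    if (!v->srcu_read_held)
        v->rcu_warnings++;
}

/* Models the reload helper: it looks up a memslot and may sleep. */
static void model_reload_apic_access_page(struct model_vcpu *v)
{
    model_memslot_lookup(v);
    model_might_sleep(v);
}

/*
 * Models the blocking path from the traces: the task state is changed
 * first, then the nested-events check ends up in the reload helper.
 */
static void model_vcpu_block(struct model_vcpu *v)
{
    assert(v->state == MODEL_TASK_RUNNING);
    v->state = MODEL_TASK_INTERRUPTIBLE;   /* queued on the waitqueue */
    model_reload_apic_access_page(v);      /* direct reload: both checks fire */
    v->state = MODEL_TASK_RUNNING;
}
```

Calling model_vcpu_block() on a vCPU with srcu_read_held == false bumps both
counters once, which mirrors the pair of reports in this message.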

This patch fixes it by using a request bit to reload the APIC access page
asynchronously before vmentry, instead of reloading it directly during the
nested preemption timer evaluation. This is safe because vmcs01 is loaded
and a nested vmexit is in progress.
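
The deferral idea can be sketched in user space as follows. The model_* names
and the request number are hypothetical, and the real request handling in KVM
has more to it than this; the sketch only shows the shape of "record a bit
now, do the work later in a context where sleeping is allowed".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request number; not the real KVM_REQ_APIC_PAGE_RELOAD value. */
#define MODEL_REQ_APIC_PAGE_RELOAD 0u

/* Hypothetical per-vCPU model; none of these fields exist in KVM. */
struct model_vcpu {
    unsigned long requests;  /* pending request bits */
    bool can_sleep;          /* true only in a context where sleeping is fine */
    int reloads;             /* how many times the page was reloaded */
    int unsafe_reloads;      /* reloads done while sleeping was not allowed */
};

/* Record a request; cheap and safe in any context. */
static void model_make_request(unsigned int req, struct model_vcpu *v)
{
    v->requests |= 1UL << req;
}

/* Test and clear a pending request. */
static bool model_check_request(unsigned int req, struct model_vcpu *v)
{
    unsigned long bit = 1UL << req;

    if (!(v->requests & bit))
        return false;
    v->requests &= ~bit;
    return true;
}

/* Models the reload helper, which may sleep. */
static void model_reload_apic_access_page(struct model_vcpu *v)
{
    v->reloads++;
    if (!v->can_sleep)
        v->unsafe_reloads++;
}

/* Nested vmexit reached from the blocking path: only set the bit. */
static void model_nested_vmexit(struct model_vcpu *v)
{
    model_make_request(MODEL_REQ_APIC_PAGE_RELOAD, v);
}

/* Just before entering the guest: sleeping is allowed, so do the reload. */
static void model_before_vmentry(struct model_vcpu *v)
{
    v->can_sleep = true;
    if (model_check_request(MODEL_REQ_APIC_PAGE_RELOAD, v))
        model_reload_apic_access_page(v);
    assert(v->requests == 0);
}
```

In this model, model_nested_vmexit() followed by model_before_vmentry()
performs exactly one reload and never counts an unsafe one.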

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5cede40..ee059ce 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10793,7 +10793,7 @@  static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 * We are now running in L2, mmu_notifier will force to reload the
 	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
 	 */
-	kvm_vcpu_reload_apic_access_page(vcpu);
+	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
 	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just