KVM: x86: fix access memslots w/o hold srcu read lock

Message ID 54531E7B.1040006@intel.com (mailing list archive)
State New, archived

Commit Message

Tiejun Chen Oct. 31, 2014, 5:30 a.m. UTC
On 2014/10/31 12:33, Wanpeng Li wrote:
> The srcu read lock must be held while accessing memslots (e.g.
> when using gfn_to_* functions). However, commit c24ae0dcd3e8
> ("kvm: x86: Unpin and remove kvm_arch->apic_access_page") calls
> gfn_to_page() in kvm_vcpu_reload_apic_access_page() without holding it,
> which leads to a suspicious rcu_dereference_check() usage warning.
> This patch fixes it by holding the srcu read lock when calling
> gfn_to_page() in the kvm_vcpu_reload_apic_access_page() function.
>
>
> [ INFO: suspicious RCU usage. ]
> 3.18.0-rc2-test2+ #70 Not tainted
> -------------------------------
> include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by qemu-system-x86/2371:
>   #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]
>
> stack backtrace:
> CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
> Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>   0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
>   ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
>   ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
> Call Trace:
>   [<ffffffff816f514f>] dump_stack+0x4e/0x71
>   [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
>   [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
>   [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
>   [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
>   [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
>   [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
>   [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
>   [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
>   [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
>   [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
>   [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
>   [<ffffffff8122ee45>] ? __fget+0x5/0x250
>   [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
>   [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
>   [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
>
> Reported-by: Takashi Iwai <tiwai@suse.de>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
>   arch/x86/kvm/x86.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0033df3..2d97329 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6059,6 +6059,7 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>   void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>   {
>   	struct page *page = NULL;
> +	int idx;
>
>   	if (!irqchip_in_kernel(vcpu->kvm))
>   		return;
> @@ -6066,7 +6067,9 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>   	if (!kvm_x86_ops->set_apic_access_page_addr)
>   		return;
>
> +	idx = srcu_read_lock(&vcpu->kvm->srcu);

There's another scenario in which we already hold srcu before calling
kvm_vcpu_reload_apic_access_page():

__vcpu_run()
     |
     + vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
     + r = vcpu_enter_guest(vcpu);
         |
         + kvm_vcpu_reload_apic_access_page(vcpu);

So, according to the backtrace, I think we should fix it as follows:

kvm: x86: vmx: hold kvm->srcu while reloading the apic access page

kvm_vcpu_reload_apic_access_page() needs to access memslots via
gfn_to_page(), so it's necessary to hold kvm->srcu.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
  arch/x86/kvm/vmx.c | 3 +++
  1 file changed, 3 insertions(+)

Comments

Wanpeng Li Oct. 31, 2014, 6:11 a.m. UTC | #1
Hi Tiejun,
On Fri, Oct 31, 2014 at 01:30:35PM +0800, Chen, Tiejun wrote:
>On 2014/10/31 12:33, Wanpeng Li wrote:
>>The srcu read lock must be held while accessing memslots (e.g.
>>when using gfn_to_* functions). However, commit c24ae0dcd3e8
>>("kvm: x86: Unpin and remove kvm_arch->apic_access_page") calls
>>gfn_to_page() in kvm_vcpu_reload_apic_access_page() without holding it,
>>which leads to a suspicious rcu_dereference_check() usage warning.
>>This patch fixes it by holding the srcu read lock when calling
>>gfn_to_page() in the kvm_vcpu_reload_apic_access_page() function.
>>
>>
>>[ INFO: suspicious RCU usage. ]
>>3.18.0-rc2-test2+ #70 Not tainted
>>-------------------------------
>>include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!
>>
>>other info that might help us debug this:
>>
>>rcu_scheduler_active = 1, debug_locks = 0
>>1 lock held by qemu-system-x86/2371:
>>  #0:  (&vcpu->mutex){+.+...}, at: [<ffffffffa037d800>] vcpu_load+0x20/0xd0 [kvm]
>>
>>stack backtrace:
>>CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
>>Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
>>  0000000000000001 ffff880209983ca8 ffffffff816f514f 0000000000000000
>>  ffff8802099b8990 ffff880209983cd8 ffffffff810bd687 00000000000fee00
>>  ffff880208a2c000 ffff880208a10000 ffff88020ef50040 ffff880209983d08
>>Call Trace:
>>  [<ffffffff816f514f>] dump_stack+0x4e/0x71
>>  [<ffffffff810bd687>] lockdep_rcu_suspicious+0xe7/0x120
>>  [<ffffffffa037d055>] gfn_to_memslot+0xd5/0xe0 [kvm]
>>  [<ffffffffa03807d3>] __gfn_to_pfn+0x33/0x60 [kvm]
>>  [<ffffffffa0380885>] gfn_to_page+0x25/0x90 [kvm]
>>  [<ffffffffa038aeec>] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
>>  [<ffffffffa08f0a9c>] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
>>  [<ffffffffa039ab8e>] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
>>  [<ffffffffa039ac0c>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
>>  [<ffffffffa037f7e0>] kvm_vm_ioctl+0x1d0/0x780 [kvm]
>>  [<ffffffff810bc664>] ? __lock_is_held+0x54/0x80
>>  [<ffffffff812231f0>] do_vfs_ioctl+0x300/0x520
>>  [<ffffffff8122ee45>] ? __fget+0x5/0x250
>>  [<ffffffff8122f0fa>] ? __fget_light+0x2a/0xe0
>>  [<ffffffff81223491>] SyS_ioctl+0x81/0xa0
>>  [<ffffffff816fed6d>] system_call_fastpath+0x16/0x1b
>>
>>Reported-by: Takashi Iwai <tiwai@suse.de>
>>Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>---
>>  arch/x86/kvm/x86.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>index 0033df3..2d97329 100644
>>--- a/arch/x86/kvm/x86.c
>>+++ b/arch/x86/kvm/x86.c
>>@@ -6059,6 +6059,7 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
>>  void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>>  {
>>  	struct page *page = NULL;
>>+	int idx;
>>
>>  	if (!irqchip_in_kernel(vcpu->kvm))
>>  		return;
>>@@ -6066,7 +6067,9 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>>  	if (!kvm_x86_ops->set_apic_access_page_addr)
>>  		return;
>>
>>+	idx = srcu_read_lock(&vcpu->kvm->srcu);
>
>There's another scenario in which we already hold srcu before calling
>kvm_vcpu_reload_apic_access_page():
>
>__vcpu_run()
>    |
>    + vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>    + r = vcpu_enter_guest(vcpu);
>        |
>        + kvm_vcpu_reload_apic_access_page(vcpu);
>

You are right. Many thanks for pointing that out. After rechecking all the
call sites of kvm_vcpu_reload_apic_access_page(), only the vmx_vcpu_reset()
path needs to be fixed.

Regards,
Wanpeng Li

>So, according to the backtrace, I think we should fix it as follows:
>
>kvm: x86: vmx: hold kvm->srcu while reloading the apic access page
>
>kvm_vcpu_reload_apic_access_page() needs to access memslots via
>gfn_to_page(), so it's necessary to hold kvm->srcu.
>
>Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>---
> arch/x86/kvm/vmx.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>index b25a588..9fa1f46 100644
>--- a/arch/x86/kvm/vmx.c
>+++ b/arch/x86/kvm/vmx.c
>@@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
> {
>        struct vcpu_vmx *vmx = to_vmx(vcpu);
>        struct msr_data apic_base_msr;
>+       int idx;
>
>        vmx->rmode.vm86_active = 0;
>
>@@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>                vmcs_write32(TPR_THRESHOLD, 0);
>        }
>
>+       idx = srcu_read_lock(&vcpu->kvm->srcu);
>        kvm_vcpu_reload_apic_access_page(vcpu);
>+       srcu_read_unlock(&vcpu->kvm->srcu, idx);
>
>        if (vmx_vm_has_apicv(vcpu->kvm))
>                memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
>-- 
>1.9.1
>
>Thanks
>Tiejun
>>  	page = gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
>>+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>  	kvm_x86_ops->set_apic_access_page_addr(vcpu, page_to_phys(page));
>>
>>  	/*
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Oct. 31, 2014, 10:36 a.m. UTC | #2
On 31/10/2014 06:30, Chen, Tiejun wrote:
> 
> @@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>         struct msr_data apic_base_msr;
> +       int idx;
> 
>         vmx->rmode.vm86_active = 0;
> 
> @@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>                 vmcs_write32(TPR_THRESHOLD, 0);
>         }
> 
> +       idx = srcu_read_lock(&vcpu->kvm->srcu);
>         kvm_vcpu_reload_apic_access_page(vcpu);
> +       srcu_read_unlock(&vcpu->kvm->srcu, idx);
> 
>         if (vmx_vm_has_apicv(vcpu->kvm))
>                 memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));

Not enough; you can call vcpu_enter_guest -> kvm_apic_accept_events ->
kvm_vcpu_reset -> vmx_vcpu_reset while under the SRCU lock.  The right
place to add the lock is kvm_arch_vcpu_setup.

Thanks,

Paolo
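
The point about where to take the lock can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the model_* names are hypothetical, the "lock" is just a depth counter, and nothing here is kernel code. It models two entry points into the reset path, one that already runs inside a read-side section (the run loop) and one that does not (the setup path), and records whether the memslot access was covered. As far as I know, nested SRCU read-side sections are legal, so locking inside the reset function is redundant on the run path rather than unsafe; locking in the setup path covers the uncovered entry point without nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an SRCU domain: only a reader depth counter. */
struct model_srcu { int depth; };

/* What the memslot access observed, loosely like a lockdep check would. */
struct model_stats { int accesses; int unprotected; int max_depth; };

static struct model_srcu srcu;
static struct model_stats stats;

static int model_read_lock(struct model_srcu *s)
{
    s->depth++;
    return 0;               /* the real API returns an index; the model ignores it */
}

static void model_read_unlock(struct model_srcu *s, int idx)
{
    (void)idx;
    assert(s->depth > 0);   /* unlock without a matching lock is a bug */
    s->depth--;
}

/* Models the memslot access made while reloading the APIC access page. */
static void model_reload_apic_access_page(void)
{
    stats.accesses++;
    if (srcu.depth == 0)
        stats.unprotected++;            /* this is where the warning would fire */
    if (srcu.depth > stats.max_depth)
        stats.max_depth = srcu.depth;
}

/* Models the reset path; lock_in_reset selects the "lock around the call" variant. */
static void model_vcpu_reset(bool lock_in_reset)
{
    int idx = 0;

    if (lock_in_reset)
        idx = model_read_lock(&srcu);
    model_reload_apic_access_page();
    if (lock_in_reset)
        model_read_unlock(&srcu, idx);
}

/* Models the run loop: the read-side section is already open when reset runs. */
static void model_run_path(bool lock_in_reset)
{
    int idx = model_read_lock(&srcu);

    model_vcpu_reset(lock_in_reset);
    model_read_unlock(&srcu, idx);
}

/* Models the setup path: no lock is held unless lock_in_setup is true. */
static void model_setup_path(bool lock_in_setup, bool lock_in_reset)
{
    int idx = 0;

    if (lock_in_setup)
        idx = model_read_lock(&srcu);
    model_vcpu_reset(lock_in_reset);
    if (lock_in_setup)
        model_read_unlock(&srcu, idx);
}

/* Resets the model between scenarios. */
static void model_clear(void)
{
    srcu.depth = 0;
    stats = (struct model_stats){0};
}
```

Calling model_setup_path(false, false) leaves one unprotected access. Locking in the reset function protects both paths but nests on the run path (max_depth of 2), while locking in the setup path protects both with a depth of 1.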
Wanpeng Li Oct. 31, 2014, 11:10 a.m. UTC | #3
Hi Paolo,
On 14/10/31 6:36 PM, Paolo Bonzini wrote:
>
> On 31/10/2014 06:30, Chen, Tiejun wrote:
>> @@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>   {
>>          struct vcpu_vmx *vmx = to_vmx(vcpu);
>>          struct msr_data apic_base_msr;
>> +       int idx;
>>
>>          vmx->rmode.vm86_active = 0;
>>
>> @@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>                  vmcs_write32(TPR_THRESHOLD, 0);
>>          }
>>
>> +       idx = srcu_read_lock(&vcpu->kvm->srcu);
>>          kvm_vcpu_reload_apic_access_page(vcpu);
>> +       srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>
>>          if (vmx_vm_has_apicv(vcpu->kvm))
>>                  memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
> Not enough; you can call vcpu_enter_guest -> kvm_apic_accept_events ->
> kvm_vcpu_reset -> vmx_vcpu_reset while under the SRCU lock.  The right
> place to add the lock is kvm_arch_vcpu_setup.

Ah, ok, I will send a newer version tomorrow. ;-)

Regards,
Wanpeng Li

>
> Thanks,
>
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Wanpeng Li Nov. 1, 2014, 2:49 a.m. UTC | #4
Hi Paolo,
On 14/10/31 6:36 PM, Paolo Bonzini wrote:
>
> On 31/10/2014 06:30, Chen, Tiejun wrote:
>> @@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>   {
>>          struct vcpu_vmx *vmx = to_vmx(vcpu);
>>          struct msr_data apic_base_msr;
>> +       int idx;
>>
>>          vmx->rmode.vm86_active = 0;
>>
>> @@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
>>                  vmcs_write32(TPR_THRESHOLD, 0);
>>          }
>>
>> +       idx = srcu_read_lock(&vcpu->kvm->srcu);
>>          kvm_vcpu_reload_apic_access_page(vcpu);
>> +       srcu_read_unlock(&vcpu->kvm->srcu, idx);
>>
>>          if (vmx_vm_has_apicv(vcpu->kvm))
>>                  memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
> Not enough; you can call vcpu_enter_guest -> kvm_apic_accept_events ->
> kvm_vcpu_reset -> vmx_vcpu_reset while under the SRCU lock.  The right
> place to add the lock is kvm_arch_vcpu_setup.

This is also not enough. I still see the warning on the path below during
testing:

kvm_arch_vcpu_ioctl_run
-> kvm_apic_accept_events
-> kvm_vcpu_reset

I have just sent out version 3 and hope it takes care of all the
situations. ;-)

Regards,
Wanpeng Li

>
> Thanks,
>
> Paolo


Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b25a588..9fa1f46 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4442,6 +4442,7 @@  static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
  {
         struct vcpu_vmx *vmx = to_vmx(vcpu);
         struct msr_data apic_base_msr;
+       int idx;

         vmx->rmode.vm86_active = 0;

@@ -4509,7 +4510,9 @@  static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
                 vmcs_write32(TPR_THRESHOLD, 0);
         }

+       idx = srcu_read_lock(&vcpu->kvm->srcu);
         kvm_vcpu_reload_apic_access_page(vcpu);
+       srcu_read_unlock(&vcpu->kvm->srcu, idx);

         if (vmx_vm_has_apicv(vcpu->kvm))
                 memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
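
The hunk above stores the return value of srcu_read_lock() in idx and hands it back to srcu_read_unlock(). A rough way to see why the index matters is the single-threaded user-space model below. It is a deliberately simplified sketch and not the kernel implementation: the model_srcu type, its two counters, and the model_* helpers are all assumptions made for illustration. The idea it shows is that a reader must decrement the same counter it incremented, even if an updater has switched the active counter in between.

```c
#include <assert.h>

/* Hypothetical model of an SRCU domain: two reader counters plus a flag
 * selecting which counter new readers use. */
struct model_srcu {
    int readers[2];
    int active;         /* index handed out to new readers */
};

/* Reader entry: count ourselves on the active side and remember which side. */
static int model_srcu_read_lock(struct model_srcu *s)
{
    int idx = s->active;

    s->readers[idx]++;
    return idx;
}

/* Reader exit: must use the index returned by the matching lock call. */
static void model_srcu_read_unlock(struct model_srcu *s, int idx)
{
    assert(idx == 0 || idx == 1);
    assert(s->readers[idx] > 0);
    s->readers[idx]--;
}

/* Updater side: send new readers to the other counter and return the old
 * index, whose readers the updater would have to wait for. */
static int model_srcu_flip(struct model_srcu *s)
{
    int old = s->active;

    s->active = !old;
    return old;
}

/* Number of readers still inside a section that started on side idx. */
static int model_srcu_readers_pending(const struct model_srcu *s, int idx)
{
    return s->readers[idx];
}
```

In this model, a reader that locks before a flip gets index 0 and one that locks after it gets index 1. Unlocking with the saved index drains the old side, which is the condition the updater would wait for.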