From patchwork Tue Sep 6 09:20:33 2016
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 9316247
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Yunhong Jiang
Subject: [PATCH] KVM: nVMX: Fix reload apic access page warning
Date: Tue, 6 Sep 2016 17:20:33 +0800
Message-Id: <1473153633-4725-1-git-send-email-wanpeng.li@hotmail.com>

From: Wanpeng Li

WARNING: CPU: 1 PID: 4230 at kernel/sched/core.c:7564 __might_sleep+0x7e/0x80
do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_swait+0x39/0xa0
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 __warn+0xd1/0xf0
 warn_slowpath_fmt+0x4f/0x60
 ? prepare_to_swait+0x39/0xa0
 ? prepare_to_swait+0x39/0xa0
 __might_sleep+0x7e/0x80
 __gfn_to_pfn_memslot+0x156/0x480 [kvm]
 gfn_to_pfn+0x2a/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #47 Not tainted
-------------------------------
./include/linux/kvm_host.h:535 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/4230:
 #0: (&vcpu->mutex){+.+.+.}, at: [] vcpu_load+0x1c/0x60 [kvm]

stack backtrace:
CPU: 1 PID: 4230 Comm: qemu-system-x86 Not tainted 4.8.0-rc5+ #47
Call Trace:
 dump_stack+0x99/0xd0
 lockdep_rcu_suspicious+0xe7/0x120
 gfn_to_memslot+0x12a/0x140 [kvm]
 gfn_to_pfn+0x12/0x30 [kvm]
 gfn_to_page+0xe/0x20 [kvm]
 kvm_vcpu_reload_apic_access_page+0x32/0xa0 [kvm]
 nested_vmx_vmexit+0x765/0xca0 [kvm_intel]
 ? _raw_spin_unlock_irqrestore+0x36/0x80
 vmx_check_nested_events+0x49/0x1f0 [kvm_intel]
 kvm_arch_vcpu_runnable+0x2d/0xe0 [kvm]
 kvm_vcpu_check_block+0x12/0x60 [kvm]
 kvm_vcpu_block+0x94/0x4c0 [kvm]
 kvm_arch_vcpu_ioctl_run+0x619/0x1aa0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xdf1/0x1aa0 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 ? __fget+0xfd/0x210
 ? __lock_is_held+0x54/0x70
 do_vfs_ioctl+0x96/0x6a0
 ? __fget+0x11c/0x210
 ? __fget+0x5/0x210
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x81/0x220
 entry_SYSCALL64_slow_path+0x25/0x25

These warnings can be triggered by running kvm-unit-tests:
./x86-run x86/vmx.flat

The nested preemption timer is based on an hrtimer, which is started on
L2 entry, stopped on L2 exit, and evaluated via the new
check_nested_events hook. In the current logic, the vCPU is added to a
simple waitqueue (TASK_INTERRUPTIBLE) when it needs to yield the pCPU,
and memslots are accessed without holding the SRCU read lock. Both
conditions can occur in the nested preemption timer evaluation path,
which results in the warnings above.

This patch fixes it by using a request bit to reload the APIC access
page asynchronously before vmentry, which avoids reloading it directly
during nested preemption timer evaluation. This is safe because vmcs01
is already loaded and the vCPU is currently in a nested vmexit.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Yunhong Jiang
Signed-off-by: Wanpeng Li
Reviewed-by: Radim Krčmář
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5cede40..ee059ce 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10793,7 +10793,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 * We are now running in L2, mmu_notifier will force to reload the
 	 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1.
 	 */
-	kvm_vcpu_reload_apic_access_page(vcpu);
+	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);

 	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just