From patchwork Tue Jun 16 21:48:45 2020
X-Patchwork-Id: 11608711
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 1/3] kvm,x86: Force sync fault if previous attempts failed
Date: Tue, 16 Jun 2020 17:48:45 -0400
Message-Id: <20200616214847.24482-2-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

Page fault error handling in kvm is a little inconsistent when a page
fault reports an error. If we are handling the fault synchronously, we
capture the error (-EFAULT) returned by __gfn_to_pfn_memslot(), exit to
user space, and qemu reports "error: kvm run failed Bad address".

But if we are doing an async page fault, async_pf_execute() simply
ignores the error reported by get_user_pages_remote().
It is assumed that the page fault was resolved successfully, and either
a "page ready" event is injected into the guest or the guest is brought
out of the artificial halt state and run again. In both cases, when the
guest retries the instruction, it takes the same exit again because the
page fault did not succeed on the previous attempt, and this infinite
loop continues forever.

This patch tries to make the behavior consistent: instead of getting
into an infinite loop of retrying the page fault, exit to user space
and stop the VM if a page fault error happens.

This can be improved by injecting errors into the guest when that is
allowed. Later patches can inject an error when a process in the guest
triggered the page fault; in that case the guest process will receive
SIGBUS. Currently we don't have a way to inject errors when the guest
is in kernel mode. Once we have a race-free method to do so, we should
be able to inject errors there as well, and the guest can do
fixup_exception() if the caller set that up (virtio-fs).

When an async page fault encounters an error, save that gfn, and the
next time the guest retries, do a sync fault instead of an async fault,
so that if the error is hit again we exit to qemu and avoid the
infinite loop.

As of now only one error gfn is stored, which means it could be
overwritten before the next retry from the guest happens. But this is
just a hint: if we miss it, we will catch it some other time. If this
becomes an issue, we could later maintain an array of error gfns to
ease the problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/x86.c              | 19 +++++++++++++++++--
 include/linux/kvm_host.h        |  1 +
 virt/kvm/async_pf.c             |  8 ++++++--
 6 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938497a6ebd7..348a73106556 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -778,6 +778,7 @@ struct kvm_vcpu_arch {
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
+		gfn_t error_gfn;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0ad06bfe2c2c..6fa085ff07a3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -60,7 +60,7 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty, gpa_t new_eptp);
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 			  u64 fault_address, char *insn, int insn_len);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 15984dfde427..e207900ab303 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4078,7 +4078,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && kvm_can_do_async_pf(vcpu)) {
+	if (!prefault && kvm_can_do_async_pf(vcpu, cr2_or_gpa >> PAGE_SHIFT)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 697d1b273a2f..4c5205434b9c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10513,7 +10513,7 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	if (unlikely(!lapic_in_kernel(vcpu) ||
 		     kvm_event_needs_reinjection(vcpu) ||
@@ -10527,7 +10527,13 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 	 * If interrupts are off we cannot even use an artificial
 	 * halt state.
 	 */
-	return kvm_arch_interrupt_allowed(vcpu);
+	if (!kvm_arch_interrupt_allowed(vcpu))
+		return false;
+
+	if (vcpu->arch.apf.error_gfn == gfn)
+		return false;
+
+	return true;
 }
 
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
@@ -10583,6 +10589,15 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		kvm_apic_set_irq(vcpu, &irq, NULL);
 	}
 
+	if (work->error_code) {
+		/*
+		 * Page fault failed and we don't have a way to report
+		 * errors back to guest yet. So save this pfn and force
+		 * sync page fault next time.
+		 */
+		vcpu->arch.apf.error_gfn = work->cr2_or_gpa >> PAGE_SHIFT;
+	}
+
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 216cdb6581e2..b8558334b184 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -207,6 +207,7 @@ struct kvm_async_pf {
 	struct kvm_arch_async_pf arch;
 	bool wakeup_all;
 	bool notpresent_injected;
+	int error_code;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index a36828fbf40a..6b30374a4de1 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -52,6 +52,7 @@ static void async_pf_execute(struct work_struct *work)
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
 	bool first;
+	long ret;
 
 	might_sleep();
 
@@ -61,11 +62,14 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
 	down_read(&mm->mmap_sem);
-	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
-			&locked);
+	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+			&locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
 
+	if (ret < 0)
+		apf->error_code = ret;
+
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
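
[Editor's sketch] To make the retry flow this patch creates easier to
follow, here is a freestanding C mock of the logic. This is not kernel
code: struct vcpu, can_do_async_pf() and async_pf_complete() are
simplified stand-ins for the real KVM types and functions above; only
the decision structure matches the patch.

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned long gfn_t;

    struct vcpu {
            gfn_t error_gfn;        /* last gfn whose async fault failed */
    };

    /* Mirrors the new kvm_can_do_async_pf() check: refuse async PF for
     * a gfn whose previous async attempt failed. */
    static bool can_do_async_pf(struct vcpu *vcpu, gfn_t gfn)
    {
            return vcpu->error_gfn != gfn;
    }

    /* Mirrors kvm_arch_async_page_present(): on error, remember the gfn
     * instead of silently claiming "page ready". */
    static void async_pf_complete(struct vcpu *vcpu, gfn_t gfn, int error)
    {
            if (error)
                    vcpu->error_gfn = gfn;
    }

    int main(void)
    {
            struct vcpu vcpu = { .error_gfn = (gfn_t)-1 };
            gfn_t gfn = 0x1234;

            /* Attempt 1: fault goes async and fails (e.g. truncated file). */
            if (can_do_async_pf(&vcpu, gfn))
                    async_pf_complete(&vcpu, gfn, -14 /* -EFAULT */);

            /* Attempt 2: async is refused, so the sync path runs, sees
             * -EFAULT, and exits to user space instead of looping. */
            if (!can_do_async_pf(&vcpu, gfn))
                    printf("gfn 0x%lx: forcing sync fault, exit to qemu\n", gfn);
            return 0;
    }

The real code keys the same decision off vcpu->arch.apf.error_gfn in
kvm_can_do_async_pf(), as shown in the diff above.
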
From patchwork Tue Jun 16 21:48:46 2020
X-Patchwork-Id: 11608713
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 2/3] kvm: Add capability to be able to report async pf error to guest
Date: Tue, 16 Jun 2020 17:48:46 -0400
Message-Id: <20200616214847.24482-3-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

As of now the asynchronous page fault mechanism assumes the host will
always succeed in resolving the page fault. So there are only two
states: page not present and page ready.

If a page is backed by a file and that file has been truncated (as can
be the case with virtio-fs), then the page fault handler on the host
returns -EFAULT.

As of now the async page fault logic does not look at the error code
(-EFAULT) returned by get_user_pages_remote() and reports PAGE_READY to
the guest. The guest tries to access the page, the page fault happens
again, and kvm gets into an infinite loop. (Killing the host process
does get kvm out of this loop, though.)

This patch adds another state to the async page fault logic which
allows the host to return an error to the guest. Once the guest knows
that the async page fault can't be resolved, it can send SIGBUS to the
guest process (if guest user space was accessing the page in question).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 +++
 Documentation/virt/kvm/msr.rst       | 10 +++++---
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/asm/kvm_para.h      |  8 +++---
 arch/x86/include/uapi/asm/kvm_para.h | 10 ++++++--
 arch/x86/kernel/kvm.c                | 34 ++++++++++++++++++++-----
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/mmu/mmu.c               |  2 +-
 arch/x86/kvm/x86.c                   | 38 ++++++++++++++++++++--------
 9 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..a4497f3a6e7f 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT           14          guest checks this feature bit
                                                async pf acknowledgment msr
                                                0x4b564d07.
 
+KVM_FEATURE_ASYNC_PF_ERROR         15          paravirtualized async PF error
+                                               can be enabled by setting bit 4
+                                               when writing to msr 0x4b564d02
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT  24          host will warn if no guest-side
                                                per-cpu warps are expeced in
                                                kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..513eb8e0b0eb 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -202,19 +202,21 @@ data:
 
 		/* Used for 'page ready' events delivered via interrupt notification */
 		__u32 token;
-
-		__u8 pad[56];
+		__u32 ready_flags;
+		__u8 pad[52];
 		__u32 enabled;
 	};
 
-	Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
+	Bits 5 of the MSR is reserved and should be zero. Bit 0 is set to 1
 	when asynchronous page faults are enabled on the vcpu, 0 when disabled.
 	Bit 1 is 1 if asynchronous page faults can be injected when vcpu is in
 	cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
 	#PF vmexits. Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
 	present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
 	events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
-	CPUID.
+	CPUID. Bit 4 enables reporting of page fault errors if hypervisor
+	encounters errors while faulting in page. Bit 4 can only be set if
+	KVM_FEATURE_ASYNC_PF_ERROR is present in CPUID.
 
 	'Page not present' events are currently always delivered as synthetic
 	#PF exception.
 	During delivery of these events APF CR2 register contains
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 348a73106556..ff8dbc604dbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm_vcpu_arch {
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
 		gfn_t error_gfn;
+		bool send_pf_error;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
@@ -1680,6 +1681,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
 void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+void kvm_arch_async_page_fault_error(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index bbc43e5411d9..6c738e91ca2c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,8 +89,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
-void kvm_async_pf_task_wait_schedule(u32 token);
-void kvm_async_pf_task_wake(u32 token);
+void kvm_async_pf_task_wait_schedule(u32 token, bool user_mode);
+void kvm_async_pf_task_wake(u32 token, bool is_err);
 u32 kvm_read_and_reset_apf_flags(void);
 void kvm_disable_steal_time(void);
 bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token);
@@ -120,8 +120,8 @@ static inline void kvm_spinlock_init(void)
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 #else /* CONFIG_KVM_GUEST */
-#define kvm_async_pf_task_wait_schedule(T) do {} while(0)
-#define kvm_async_pf_task_wake(T) do {} while(0)
+#define kvm_async_pf_task_wait_schedule(T, U) do {} while(0)
+#define kvm_async_pf_task_wake(T, I) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..b43fd2ddc949 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_ASYNC_PF_ERROR	15
 
 #define KVM_HINTS_REALTIME      0
 
@@ -85,6 +86,7 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
+#define KVM_ASYNC_PF_SEND_ERROR			(1 << 4)
 
 /* MSR_KVM_ASYNC_PF_INT */
 #define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
@@ -119,14 +121,18 @@ struct kvm_mmu_op_release_pt {
 #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
 #define KVM_PV_REASON_PAGE_READY 2
 
+/* Bit flags for ready_flags field */
+#define KVM_PV_REASON_PAGE_ERROR 1
+
 struct kvm_vcpu_pv_apf_data {
 	/* Used for 'page not present' events delivered via #PF */
 	__u32 flags;
 
 	/* Used for 'page ready' events delivered via interrupt notification */
 	__u32 token;
-
-	__u8 pad[56];
+	__u32 ready_flags;
+	__u8 pad[52];
 	__u32 enabled;
 };
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ff95429d206b..2ee9c9d971ae 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ struct kvm_task_sleep_node {
 	struct swait_queue_head wq;
 	u32 token;
 	int cpu;
+	bool is_err;
 };
 
 static struct kvm_task_sleep_head {
@@ -97,7 +98,14 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
+static void handle_async_pf_error(bool user_mode)
+{
+	if (user_mode)
+		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+}
+
+static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n,
+				    bool user_mode)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -107,6 +115,8 @@ static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
 	e = _find_apf_task(b, token);
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
+		if (e->is_err)
+			handle_async_pf_error(user_mode);
 		hlist_del(&e->link);
 		raw_spin_unlock(&b->lock);
 		kfree(e);
@@ -128,14 +138,14 @@ static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
  * Invoked from the async pagefault handling code or from the VM exit page
  * fault handler. In both cases RCU is watching.
  */
-void kvm_async_pf_task_wait_schedule(u32 token)
+void kvm_async_pf_task_wait_schedule(u32 token, bool user_mode)
 {
 	struct kvm_task_sleep_node n;
 	DECLARE_SWAITQUEUE(wait);
 
 	lockdep_assert_irqs_disabled();
 
-	if (!kvm_async_pf_queue_task(token, &n))
+	if (!kvm_async_pf_queue_task(token, &n, user_mode))
 		return;
 
 	for (;;) {
@@ -148,6 +158,8 @@ void kvm_async_pf_task_wait_schedule(u32 token)
 		local_irq_disable();
 	}
 	finish_swait(&n.wq, &wait);
+	if (n.is_err)
+		handle_async_pf_error(user_mode);
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait_schedule);
 
@@ -177,7 +189,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token)
+void kvm_async_pf_task_wake(u32 token, bool is_err)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -208,9 +220,11 @@ void kvm_async_pf_task_wake(u32 token)
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->is_err = is_err;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else {
+		n->is_err = is_err;
 		apf_task_wake_one(n);
 	}
 	raw_spin_unlock(&b->lock);
@@ -256,14 +270,15 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 		panic("Host injected async #PF in kernel mode\n");
 
 	/* Page is swapped out by the host. */
-	kvm_async_pf_task_wait_schedule(token);
+	kvm_async_pf_task_wait_schedule(token, user_mode(regs));
 	return true;
 }
 NOKPROBE_SYMBOL(__kvm_handle_async_pf);
 
 __visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
 {
-	u32 token;
+	u32 token, ready_flags;
+	bool is_err;
 
 	entering_ack_irq();
 
@@ -271,7 +286,9 @@ __visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
 	if (__this_cpu_read(apf_reason.enabled)) {
 		token = __this_cpu_read(apf_reason.token);
-		kvm_async_pf_task_wake(token);
+		ready_flags = __this_cpu_read(apf_reason.ready_flags);
+		is_err = ready_flags & KVM_PV_REASON_PAGE_ERROR;
+		kvm_async_pf_task_wake(token, is_err);
 		__this_cpu_write(apf_reason.token, 0);
 		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
 	}
@@ -335,6 +352,9 @@ static void kvm_guest_cpu_init(void)
 
 		wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
 
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_ERROR))
+			pa |= KVM_ASYNC_PF_SEND_ERROR;
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		pr_info("KVM setup async PF for cpu %d\n", smp_processor_id());
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 253b8e875ccd..f811f36e0c24 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
-			     (1 << KVM_FEATURE_ASYNC_PF_INT);
+			     (1 << KVM_FEATURE_ASYNC_PF_INT) |
+			     (1 << KVM_FEATURE_ASYNC_PF_ERROR);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e207900ab303..634182bb07c7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4175,7 +4175,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
 		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait_schedule(fault_address);
+		kvm_async_pf_task_wait_schedule(fault_address, 0);
 		local_irq_enable();
 	} else {
 		WARN_ONCE(1, "Unexpected host async PF flags: %x\n", flags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c5205434b9c..c3b2097f2eaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2690,8 +2690,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 4:5 are reserved, Should be zero */
-	if (data & 0x30)
+	/* Bits 5 is reserved, Should be zero */
+	if (data & 0x20)
 		return 1;
 
 	vcpu->arch.apf.msr_en_val = data;
@@ -2703,11 +2703,12 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u64)))
+					sizeof(u64) + sizeof(u32)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
+	vcpu->arch.apf.send_pf_error = data & KVM_ASYNC_PF_SEND_ERROR;
 
 	kvm_async_pf_wakeup_all(vcpu);
 
@@ -10481,12 +10482,25 @@ static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 				      sizeof(reason));
 }
 
-static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token)
+static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token,
+				     bool is_err)
 {
 	unsigned int offset = offsetof(struct kvm_vcpu_pv_apf_data, token);
+	int ret;
+	u32 ready_flags = 0;
+
+	if (is_err && vcpu->arch.apf.send_pf_error)
+		ready_flags = KVM_PV_REASON_PAGE_ERROR;
+
+	ret = kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
+					    &token, offset, sizeof(token));
+	if (ret < 0)
+		return ret;
+
+	offset = offsetof(struct kvm_vcpu_pv_apf_data, ready_flags);
 	return kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
-				      &token, offset, sizeof(token));
+				      &ready_flags, offset,
+				      sizeof(ready_flags));
 }
 
 static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
@@ -10571,6 +10585,8 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
+	bool present_injected = false;
+
 	struct kvm_lapic_irq irq = {
 		.delivery_mode = APIC_DM_FIXED,
 		.vector = vcpu->arch.apf.vec
@@ -10584,16 +10600,18 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 	if ((work->wakeup_all || work->notpresent_injected) &&
 	    kvm_pv_async_pf_enabled(vcpu) &&
-	    !apf_put_user_ready(vcpu, work->arch.token)) {
+	    !apf_put_user_ready(vcpu, work->arch.token, work->error_code)) {
 		vcpu->arch.apf.pageready_pending = true;
 		kvm_apic_set_irq(vcpu, &irq, NULL);
+		present_injected = true;
 	}
 
-	if (work->error_code) {
+	if (work->error_code &&
+	    (!present_injected || !vcpu->arch.apf.send_pf_error)) {
 		/*
-		 * Page fault failed and we don't have a way to report
-		 * errors back to guest yet. So save this pfn and force
-		 * sync page fault next time.
+		 * Page fault failed. If we did not report error back
+		 * to guest, save this pfn and force sync page fault next
+		 * time.
 		 */
 		vcpu->arch.apf.error_gfn = work->cr2_or_gpa >> PAGE_SHIFT;
 	}
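
[Editor's sketch] Before the last patch, here is a freestanding C mock
of the guest-side handling this patch enables. This is not kernel code:
struct pv_apf_data mirrors the kvm_vcpu_pv_apf_data layout above, while
deliver_sigbus() and page_ready_interrupt() are simplified stand-ins
for send_sig_info(SIGBUS, SEND_SIG_PRIV, current) and
kvm_async_pf_intr().

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define KVM_PV_REASON_PAGE_ERROR 1     /* bit flag in ready_flags */

    struct pv_apf_data {                   /* mirrors kvm_vcpu_pv_apf_data */
            uint32_t flags;
            uint32_t token;                /* 'page ready' token */
            uint32_t ready_flags;          /* new field carrying the error bit */
            uint8_t  pad[52];
            uint32_t enabled;
    };

    /* Stand-in for send_sig_info(SIGBUS, SEND_SIG_PRIV, current). */
    static void deliver_sigbus(void)
    {
            puts("SIGBUS delivered to faulting guest task");
    }

    /* Mirrors kvm_async_pf_intr(): read token and flags, wake the
     * waiter, then ack (the real code writes MSR_KVM_ASYNC_PF_ACK). */
    static void page_ready_interrupt(struct pv_apf_data *apf, bool user_mode)
    {
            bool is_err = apf->ready_flags & KVM_PV_REASON_PAGE_ERROR;

            printf("token %u ready, error=%d\n",
                   (unsigned)apf->token, (int)is_err);
            if (is_err && user_mode)
                    deliver_sigbus();  /* kernel-mode faults have no fixup yet */
            apf->token = 0;
    }

    int main(void)
    {
            struct pv_apf_data apf = {
                    .token = 42,
                    .ready_flags = KVM_PV_REASON_PAGE_ERROR,
            };

            page_ready_interrupt(&apf, true);
            return 0;
    }

In the real series, the wakeup and the SIGBUS delivery are split across
kvm_async_pf_task_wake() and handle_async_pf_error(), since the waiter
may be asleep on a swait queue when the 'page ready' interrupt arrives.
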
From patchwork Tue Jun 16 21:48:47 2020
X-Patchwork-Id: 11608717
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 3/3] kvm, async_pf: Use FOLL_WRITE only for write faults
Date: Tue, 16 Jun 2020 17:48:47 -0400
Message-Id: <20200616214847.24482-4-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

async_pf_execute() calls get_user_pages_remote() with FOLL_WRITE
unconditionally, regardless of whether the fault was a read or a write.
This creates problems if the vma is mapped for reading and does not
have VM_WRITE set: check_vma_flags() fails in __get_user_pages() and
get_user_pages_remote() returns -EFAULT.

So far this was not an issue, as we did not look at the return code
from get_user_pages_remote(). But now we want to look at this error
code: if the file got truncated and the page can't be faulted in, we
want to inject an error into the guest and let the guest take
appropriate action (send SIGBUS to the guest process, do exception
table handling, or possibly die). Hence, we don't want a spurious
-EFAULT. Pass FOLL_WRITE only if it is a write fault.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 7 ++++---
 include/linux/kvm_host.h | 4 +++-
 virt/kvm/async_pf.c      | 9 +++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 634182bb07c7..15969c4c8925 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4046,7 +4046,7 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 }
 
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				   gfn_t gfn)
+				   gfn_t gfn, bool write)
 {
 	struct kvm_arch_async_pf arch;
 
@@ -4056,7 +4056,8 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
-				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
+				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch,
+				  write);
 }
 
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
@@ -4084,7 +4085,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			return true;
-		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
+		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn, write))
 			return true;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b8558334b184..a7c3999a7374 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -208,12 +208,14 @@ struct kvm_async_pf {
 	bool wakeup_all;
 	bool notpresent_injected;
 	int error_code;
+	bool write;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-		       unsigned long hva, struct kvm_arch_async_pf *arch);
+		       unsigned long hva, struct kvm_arch_async_pf *arch,
+		       bool write);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 6b30374a4de1..1d2dc267f9b8 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -53,6 +53,7 @@ static void async_pf_execute(struct work_struct *work)
 	int locked = 1;
 	bool first;
 	long ret;
+	unsigned int gup_flags = 0;
 
 	might_sleep();
 
@@ -61,8 +62,10 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
+	if (apf->write)
+		gup_flags = FOLL_WRITE;
 	down_read(&mm->mmap_sem);
-	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+	ret = get_user_pages_remote(NULL, mm, addr, 1, gup_flags, NULL, NULL,
 			&locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
@@ -161,7 +164,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 }
 
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-		       unsigned long hva, struct kvm_arch_async_pf *arch)
+		       unsigned long hva, struct kvm_arch_async_pf *arch,
+		       bool write)
 {
 	struct kvm_async_pf *work;
 
@@ -186,6 +190,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->addr = hva;
 	work->arch = *arch;
 	work->mm = current->mm;
+	work->write = write;
 	mmget(work->mm);
 	kvm_get_kvm(work->vcpu->kvm);
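
[Editor's sketch] For completeness, a freestanding C illustration of
the gup_flags choice patch 3 introduces. This is not kernel code; only
the FOLL_WRITE value matches linux/mm.h, and gup_flags_for_fault() is a
stand-in for the apf->write test inside async_pf_execute().

    #include <stdbool.h>
    #include <stdio.h>

    #define FOLL_WRITE 0x01                /* same value as in linux/mm.h */

    /* Request write access from get_user_pages_remote() only when the
     * guest fault was actually a write, so a read fault on a read-only
     * VMA no longer fails check_vma_flags() with a spurious -EFAULT. */
    static unsigned int gup_flags_for_fault(bool write)
    {
            return write ? FOLL_WRITE : 0;
    }

    int main(void)
    {
            printf("read fault  -> gup_flags 0x%x\n", gup_flags_for_fault(false));
            printf("write fault -> gup_flags 0x%x\n", gup_flags_for_fault(true));
            return 0;
    }

With this in place, an -EFAULT seen by the series' error path really
does mean the page could not be faulted in, not merely that a read-only
mapping was probed for write access.
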