From patchwork Tue Jun 16 21:48:45 2020
X-Patchwork-Id: 11608711
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 1/3] kvm,x86: Force sync fault if previous attempts failed
Date: Tue, 16 Jun 2020 17:48:45 -0400
Message-Id: <20200616214847.24482-2-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

Page fault error handling in kvm is a little inconsistent when a page
fault reports an error. If we are handling the fault synchronously, we
capture the error (-EFAULT) returned by __gfn_to_pfn_memslot(), exit to
user space, and qemu reports "error: kvm run failed Bad address".

But if we are doing an async page fault, async_pf_execute() simply
ignores the error reported by get_user_pages_remote().
It is assumed that the page fault was resolved successfully, and either
a "page ready" event is injected into the guest or the guest is brought
out of the artificial halt state and run again. In both cases, when the
guest retries the instruction, it takes the same exit again because the
page fault did not succeed on the previous attempt, and this infinite
loop continues forever.

This patch tries to make the behavior consistent: instead of getting
into an infinite loop of retrying the page fault, exit to user space
and stop the VM if a page fault error happens.

This can be improved by injecting errors into the guest when that is
allowed. Later patches can inject an error when a process in the guest
triggered the page fault; in that case the guest process will receive
SIGBUS. Currently we don't have a way to inject errors when the guest
is in kernel mode. Once we have a race-free method to do so, we should
be able to inject errors there as well, and the guest can do
fixup_exception() if the caller set that up (virtio-fs).

When an async page fault encounters an error, save that gfn, and the
next time the guest retries, do a sync fault instead of an async fault,
so that if the error is hit again we exit to qemu and avoid the
infinite loop.

As of now only one error gfn is stored, which means it could be
overwritten before the next retry from the guest happens. But this is
just a hint: if we miss it, we will catch it some other time. If this
becomes an issue, we could later maintain an array of error gfns to
ease the problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/x86.c              | 19 +++++++++++++++++--
 include/linux/kvm_host.h        |  1 +
 virt/kvm/async_pf.c             |  8 ++++++--
 6 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 938497a6ebd7..348a73106556 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -778,6 +778,7 @@ struct kvm_vcpu_arch {
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
+		gfn_t error_gfn;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0ad06bfe2c2c..6fa085ff07a3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -60,7 +60,7 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty, gpa_t new_eptp);
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 			  u64 fault_address, char *insn, int insn_len);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 15984dfde427..e207900ab303 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4078,7 +4078,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && kvm_can_do_async_pf(vcpu)) {
+	if (!prefault && kvm_can_do_async_pf(vcpu, cr2_or_gpa >> PAGE_SHIFT)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 697d1b273a2f..4c5205434b9c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10513,7 +10513,7 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	if (unlikely(!lapic_in_kernel(vcpu) ||
 		     kvm_event_needs_reinjection(vcpu) ||
@@ -10527,7 +10527,13 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 	 * If interrupts are off we cannot even use an artificial
 	 * halt state.
 	 */
-	return kvm_arch_interrupt_allowed(vcpu);
+	if (!kvm_arch_interrupt_allowed(vcpu))
+		return false;
+
+	if (vcpu->arch.apf.error_gfn == gfn)
+		return false;
+
+	return true;
 }
 
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
@@ -10583,6 +10589,15 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		kvm_apic_set_irq(vcpu, &irq, NULL);
 	}
 
+	if (work->error_code) {
+		/*
+		 * Page fault failed and we don't have a way to report
+		 * errors back to guest yet. So save this pfn and force
+		 * sync page fault next time.
+		 */
+		vcpu->arch.apf.error_gfn = work->cr2_or_gpa >> PAGE_SHIFT;
+	}
+
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 216cdb6581e2..b8558334b184 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -207,6 +207,7 @@ struct kvm_async_pf {
 	struct kvm_arch_async_pf arch;
 	bool wakeup_all;
 	bool notpresent_injected;
+	int error_code;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index a36828fbf40a..6b30374a4de1 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -52,6 +52,7 @@ static void async_pf_execute(struct work_struct *work)
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
 	bool first;
+	long ret;
 
 	might_sleep();
 
@@ -61,11 +62,14 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
 	down_read(&mm->mmap_sem);
-	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
-			&locked);
+	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+			&locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
 
+	if (ret < 0)
+		apf->error_code = ret;
+
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
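
[Editor's sketch] To make the retry flow this patch creates easier to
follow, here is a freestanding C mock of the logic. This is not kernel
code: struct vcpu, can_do_async_pf() and async_pf_complete() are
simplified stand-ins for the real KVM types and functions above; only
the decision structure matches the patch.

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned long gfn_t;

    struct vcpu {
            gfn_t error_gfn;        /* last gfn whose async fault failed */
    };

    /* Mirrors the new kvm_can_do_async_pf() check: refuse async PF for
     * a gfn whose previous async attempt failed. */
    static bool can_do_async_pf(struct vcpu *vcpu, gfn_t gfn)
    {
            return vcpu->error_gfn != gfn;
    }

    /* Mirrors kvm_arch_async_page_present(): on error, remember the gfn
     * instead of silently claiming "page ready". */
    static void async_pf_complete(struct vcpu *vcpu, gfn_t gfn, int error)
    {
            if (error)
                    vcpu->error_gfn = gfn;
    }

    int main(void)
    {
            struct vcpu vcpu = { .error_gfn = (gfn_t)-1 };
            gfn_t gfn = 0x1234;

            /* Attempt 1: fault goes async and fails (e.g. truncated file). */
            if (can_do_async_pf(&vcpu, gfn))
                    async_pf_complete(&vcpu, gfn, -14 /* -EFAULT */);

            /* Attempt 2: async is refused, so the sync path runs, sees
             * -EFAULT, and exits to user space instead of looping. */
            if (!can_do_async_pf(&vcpu, gfn))
                    printf("gfn 0x%lx: forcing sync fault, exit to qemu\n", gfn);
            return 0;
    }

The real code keys the same decision off vcpu->arch.apf.error_gfn in
kvm_can_do_async_pf(), as shown in the diff above.
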
From patchwork Tue Jun 16 21:48:46 2020
X-Patchwork-Id: 11608713
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 2/3] kvm: Add capability to be able to report async pf error to guest
Date: Tue, 16 Jun 2020 17:48:46 -0400
Message-Id: <20200616214847.24482-3-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

As of now the asynchronous page fault mechanism assumes the host will
always succeed in resolving the page fault. So there are only two
states: page not present and page ready.

If a page is backed by a file and that file has been truncated (as can
be the case with virtio-fs), then the page fault handler on the host
returns -EFAULT.

As of now the async page fault logic does not look at the error code
(-EFAULT) returned by get_user_pages_remote() and reports PAGE_READY to
the guest. The guest tries to access the page, the page fault happens
again, and kvm gets into an infinite loop. (Killing the host process
does get kvm out of this loop, though.)

This patch adds another state to the async page fault logic which
allows the host to return an error to the guest. Once the guest knows
that the async page fault can't be resolved, it can send SIGBUS to the
guest process (if guest user space was accessing the page in question).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 +++
 Documentation/virt/kvm/msr.rst       | 10 +++++---
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/asm/kvm_para.h      |  8 +++---
 arch/x86/include/uapi/asm/kvm_para.h | 10 ++++++--
 arch/x86/kernel/kvm.c                | 34 ++++++++++++++++++++-----
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/mmu/mmu.c               |  2 +-
 arch/x86/kvm/x86.c                   | 38 ++++++++++++++++++++--------
 9 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index a7dff9186bed..a4497f3a6e7f 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -92,6 +92,10 @@ KVM_FEATURE_ASYNC_PF_INT           14          guest checks this feature bit
                                                async pf acknowledgment msr
                                                0x4b564d07.
 
+KVM_FEATURE_ASYNC_PF_ERROR         15          paravirtualized async PF error
+                                               can be enabled by setting bit 4
+                                               when writing to msr 0x4b564d02
+
 KVM_FEATURE_CLOCSOURCE_STABLE_BIT  24          host will warn if no guest-side
                                                per-cpu warps are expeced in
                                                kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..513eb8e0b0eb 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -202,19 +202,21 @@ data:
 
 		/* Used for 'page ready' events delivered via interrupt notification */
 		__u32 token;
-
-		__u8 pad[56];
+		__u32 ready_flags;
+		__u8 pad[52];
 		__u32 enabled;
 	};
 
-	Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
+	Bits 5 of the MSR is reserved and should be zero. Bit 0 is set to 1
 	when asynchronous page faults are enabled on the vcpu, 0 when disabled.
 	Bit 1 is 1 if asynchronous page faults can be injected when vcpu is in
 	cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
 	#PF vmexits. Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
 	present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
 	events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
-	CPUID.
+	CPUID. Bit 4 enables reporting of page fault errors if hypervisor
+	encounters errors while faulting in page. Bit 4 can only be set if
+	KVM_FEATURE_ASYNC_PF_ERROR is present in CPUID.
 
 	'Page not present' events are currently always delivered as synthetic
 	#PF exception.
 	During delivery of these events APF CR2 register contains
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 348a73106556..ff8dbc604dbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm_vcpu_arch {
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
 		gfn_t error_gfn;
+		bool send_pf_error;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
@@ -1680,6 +1681,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
 void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+void kvm_arch_async_page_fault_error(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index bbc43e5411d9..6c738e91ca2c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,8 +89,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
-void kvm_async_pf_task_wait_schedule(u32 token);
-void kvm_async_pf_task_wake(u32 token);
+void kvm_async_pf_task_wait_schedule(u32 token, bool user_mode);
+void kvm_async_pf_task_wake(u32 token, bool is_err);
 u32 kvm_read_and_reset_apf_flags(void);
 void kvm_disable_steal_time(void);
 bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token);
@@ -120,8 +120,8 @@ static inline void kvm_spinlock_init(void)
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 #else /* CONFIG_KVM_GUEST */
-#define kvm_async_pf_task_wait_schedule(T) do {} while(0)
-#define kvm_async_pf_task_wake(T) do {} while(0)
+#define kvm_async_pf_task_wait_schedule(T, U) do {} while(0)
+#define kvm_async_pf_task_wake(T, I) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..b43fd2ddc949 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_ASYNC_PF_ERROR	15
 
 #define KVM_HINTS_REALTIME      0
 
@@ -85,6 +86,7 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
+#define KVM_ASYNC_PF_SEND_ERROR			(1 << 4)
 
 /* MSR_KVM_ASYNC_PF_INT */
 #define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
@@ -119,14 +121,18 @@ struct kvm_mmu_op_release_pt {
 #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
 #define KVM_PV_REASON_PAGE_READY 2
 
+/* Bit flags for ready_flags field */
+#define KVM_PV_REASON_PAGE_ERROR 1
+
 struct kvm_vcpu_pv_apf_data {
 	/* Used for 'page not present' events delivered via #PF */
 	__u32 flags;
 
 	/* Used for 'page ready' events delivered via interrupt notification */
 	__u32 token;
-
-	__u8 pad[56];
+	__u32 ready_flags;
+	__u8 pad[52];
 	__u32 enabled;
 };
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ff95429d206b..2ee9c9d971ae 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ struct kvm_task_sleep_node {
 	struct swait_queue_head wq;
 	u32 token;
 	int cpu;
+	bool is_err;
 };
 
 static struct kvm_task_sleep_head {
@@ -97,7 +98,14 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
+static void handle_async_pf_error(bool user_mode)
+{
+	if (user_mode)
+		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+}
+
+static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n,
+				    bool user_mode)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -107,6 +115,8 @@ static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
 	e = _find_apf_task(b, token);
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
+		if (e->is_err)
+			handle_async_pf_error(user_mode);
 		hlist_del(&e->link);
 		raw_spin_unlock(&b->lock);
 		kfree(e);
@@ -128,14 +138,14 @@ static bool kvm_async_pf_queue_task(u32 token, struct kvm_task_sleep_node *n)
  * Invoked from the async pagefault handling code or from the VM exit page
  * fault handler. In both cases RCU is watching.
  */
-void kvm_async_pf_task_wait_schedule(u32 token)
+void kvm_async_pf_task_wait_schedule(u32 token, bool user_mode)
 {
 	struct kvm_task_sleep_node n;
 	DECLARE_SWAITQUEUE(wait);
 
 	lockdep_assert_irqs_disabled();
 
-	if (!kvm_async_pf_queue_task(token, &n))
+	if (!kvm_async_pf_queue_task(token, &n, user_mode))
 		return;
 
 	for (;;) {
@@ -148,6 +158,8 @@ void kvm_async_pf_task_wait_schedule(u32 token)
 		local_irq_disable();
 	}
 	finish_swait(&n.wq, &wait);
+	if (n.is_err)
+		handle_async_pf_error(user_mode);
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait_schedule);
 
@@ -177,7 +189,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token)
+void kvm_async_pf_task_wake(u32 token, bool is_err)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -208,9 +220,11 @@ void kvm_async_pf_task_wake(u32 token)
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->is_err = is_err;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else {
+		n->is_err = is_err;
 		apf_task_wake_one(n);
 	}
 	raw_spin_unlock(&b->lock);
@@ -256,14 +270,15 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 		panic("Host injected async #PF in kernel mode\n");
 
 	/* Page is swapped out by the host. */
-	kvm_async_pf_task_wait_schedule(token);
+	kvm_async_pf_task_wait_schedule(token, user_mode(regs));
 	return true;
 }
 NOKPROBE_SYMBOL(__kvm_handle_async_pf);
 
 __visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
 {
-	u32 token;
+	u32 token, ready_flags;
+	bool is_err;
 
 	entering_ack_irq();
 
@@ -271,7 +286,9 @@ __visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
 	if (__this_cpu_read(apf_reason.enabled)) {
 		token = __this_cpu_read(apf_reason.token);
-		kvm_async_pf_task_wake(token);
+		ready_flags = __this_cpu_read(apf_reason.ready_flags);
+		is_err = ready_flags & KVM_PV_REASON_PAGE_ERROR;
+		kvm_async_pf_task_wake(token, is_err);
 		__this_cpu_write(apf_reason.token, 0);
 		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
 	}
@@ -335,6 +352,9 @@ static void kvm_guest_cpu_init(void)
 
 		wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
 
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_ERROR))
+			pa |= KVM_ASYNC_PF_SEND_ERROR;
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		pr_info("KVM setup async PF for cpu %d\n", smp_processor_id());
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 253b8e875ccd..f811f36e0c24 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
-			     (1 << KVM_FEATURE_ASYNC_PF_INT);
+			     (1 << KVM_FEATURE_ASYNC_PF_INT) |
+			     (1 << KVM_FEATURE_ASYNC_PF_ERROR);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e207900ab303..634182bb07c7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4175,7 +4175,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
 		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait_schedule(fault_address);
+		kvm_async_pf_task_wait_schedule(fault_address, 0);
 		local_irq_enable();
 	} else {
 		WARN_ONCE(1, "Unexpected host async PF flags: %x\n", flags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c5205434b9c..c3b2097f2eaa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2690,8 +2690,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 4:5 are reserved, Should be zero */
-	if (data & 0x30)
+	/* Bits 5 is reserved, Should be zero */
+	if (data & 0x20)
 		return 1;
 
 	vcpu->arch.apf.msr_en_val = data;
@@ -2703,11 +2703,12 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u64)))
+					sizeof(u64) + sizeof(u32)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
+	vcpu->arch.apf.send_pf_error = data & KVM_ASYNC_PF_SEND_ERROR;
 
 	kvm_async_pf_wakeup_all(vcpu);
 
@@ -10481,12 +10482,25 @@ static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 				      sizeof(reason));
 }
 
-static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token)
+static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token,
+				     bool is_err)
 {
 	unsigned int offset = offsetof(struct kvm_vcpu_pv_apf_data, token);
+	int ret;
+	u32 ready_flags = 0;
+
+	if (is_err && vcpu->arch.apf.send_pf_error)
+		ready_flags = KVM_PV_REASON_PAGE_ERROR;
+
+	ret = kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
+					    &token, offset, sizeof(token));
+	if (ret < 0)
+		return ret;
+
+	offset = offsetof(struct kvm_vcpu_pv_apf_data, ready_flags);
 	return kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
-				      &token, offset, sizeof(token));
+				      &ready_flags, offset,
+				      sizeof(ready_flags));
 }
 
 static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
@@ -10571,6 +10585,8 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
+	bool present_injected = false;
+
 	struct kvm_lapic_irq irq = {
 		.delivery_mode = APIC_DM_FIXED,
 		.vector = vcpu->arch.apf.vec
@@ -10584,16 +10600,18 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 	if ((work->wakeup_all || work->notpresent_injected) &&
 	    kvm_pv_async_pf_enabled(vcpu) &&
-	    !apf_put_user_ready(vcpu, work->arch.token)) {
+	    !apf_put_user_ready(vcpu, work->arch.token, work->error_code)) {
 		vcpu->arch.apf.pageready_pending = true;
 		kvm_apic_set_irq(vcpu, &irq, NULL);
+		present_injected = true;
 	}
 
-	if (work->error_code) {
+	if (work->error_code &&
+	    (!present_injected || !vcpu->arch.apf.send_pf_error)) {
 		/*
-		 * Page fault failed and we don't have a way to report
-		 * errors back to guest yet. So save this pfn and force
-		 * sync page fault next time.
+		 * Page fault failed. If we did not report error back
+		 * to guest, save this pfn and force sync page fault next
+		 * time.
 		 */
 		vcpu->arch.apf.error_gfn = work->cr2_or_gpa >> PAGE_SHIFT;
 	}
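
[Editor's sketch] Before the last patch, here is a freestanding C mock
of the guest-side handling this patch enables. This is not kernel code:
struct pv_apf_data mirrors the kvm_vcpu_pv_apf_data layout above, while
deliver_sigbus() and page_ready_interrupt() are simplified stand-ins
for send_sig_info(SIGBUS, SEND_SIG_PRIV, current) and
kvm_async_pf_intr().

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define KVM_PV_REASON_PAGE_ERROR 1     /* bit flag in ready_flags */

    struct pv_apf_data {                   /* mirrors kvm_vcpu_pv_apf_data */
            uint32_t flags;
            uint32_t token;                /* 'page ready' token */
            uint32_t ready_flags;          /* new field carrying the error bit */
            uint8_t  pad[52];
            uint32_t enabled;
    };

    /* Stand-in for send_sig_info(SIGBUS, SEND_SIG_PRIV, current). */
    static void deliver_sigbus(void)
    {
            puts("SIGBUS delivered to faulting guest task");
    }

    /* Mirrors kvm_async_pf_intr(): read token and flags, wake the
     * waiter, then ack (the real code writes MSR_KVM_ASYNC_PF_ACK). */
    static void page_ready_interrupt(struct pv_apf_data *apf, bool user_mode)
    {
            bool is_err = apf->ready_flags & KVM_PV_REASON_PAGE_ERROR;

            printf("token %u ready, error=%d\n",
                   (unsigned)apf->token, (int)is_err);
            if (is_err && user_mode)
                    deliver_sigbus();  /* kernel-mode faults have no fixup yet */
            apf->token = 0;
    }

    int main(void)
    {
            struct pv_apf_data apf = {
                    .token = 42,
                    .ready_flags = KVM_PV_REASON_PAGE_ERROR,
            };

            page_ready_interrupt(&apf, true);
            return 0;
    }

In the real series, the wakeup and the SIGBUS delivery are split across
kvm_async_pf_task_wake() and handle_async_pf_error(), since the waiter
may be asleep on a swait queue when the 'page ready' interrupt arrives.
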
From patchwork Tue Jun 16 21:48:47 2020
X-Patchwork-Id: 11608717
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
    dgilbert@redhat.com, vgoyal@redhat.com, vkuznets@redhat.com,
    pbonzini@redhat.com, wanpengli@tencent.com, sean.j.christopherson@intel.com
Subject: [PATCH 3/3] kvm, async_pf: Use FOLL_WRITE only for write faults
Date: Tue, 16 Jun 2020 17:48:47 -0400
Message-Id: <20200616214847.24482-4-vgoyal@redhat.com>
In-Reply-To: <20200616214847.24482-1-vgoyal@redhat.com>
References: <20200616214847.24482-1-vgoyal@redhat.com>

async_pf_execute() calls get_user_pages_remote() with FOLL_WRITE
unconditionally, regardless of whether the fault was a read or a write.
This creates problems if the vma is mapped for reading and does not
have VM_WRITE set: check_vma_flags() fails in __get_user_pages() and
get_user_pages_remote() returns -EFAULT.

So far this was not an issue, as we did not look at the return code
from get_user_pages_remote(). But now we want to look at this error
code: if the file got truncated and the page can't be faulted in, we
want to inject an error into the guest and let the guest take
appropriate action (send SIGBUS to the guest process, do exception
table handling, or possibly die). Hence, we don't want a spurious
-EFAULT. Pass FOLL_WRITE only if it is a write fault.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 7 ++++---
 include/linux/kvm_host.h | 4 +++-
 virt/kvm/async_pf.c      | 9 +++++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 634182bb07c7..15969c4c8925 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4046,7 +4046,7 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
 }
 
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				   gfn_t gfn)
+				   gfn_t gfn, bool write)
 {
 	struct kvm_arch_async_pf arch;
 
@@ -4056,7 +4056,8 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu);
 
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
-				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
+				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch,
+				  write);
 }
 
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
@@ -4084,7 +4085,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			return true;
-		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
+		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn, write))
 			return true;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b8558334b184..a7c3999a7374 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -208,12 +208,14 @@ struct kvm_async_pf {
 	bool wakeup_all;
 	bool notpresent_injected;
 	int error_code;
+	bool write;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-		       unsigned long hva, struct kvm_arch_async_pf *arch);
+		       unsigned long hva, struct kvm_arch_async_pf *arch,
+		       bool write);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 6b30374a4de1..1d2dc267f9b8 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -53,6 +53,7 @@ static void async_pf_execute(struct work_struct *work)
 	int locked = 1;
 	bool first;
 	long ret;
+	unsigned int gup_flags = 0;
 
 	might_sleep();
 
@@ -61,8 +62,10 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
+	if (apf->write)
+		gup_flags = FOLL_WRITE;
 	down_read(&mm->mmap_sem);
-	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+	ret = get_user_pages_remote(NULL, mm, addr, 1, gup_flags, NULL, NULL,
 			&locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
@@ -161,7 +164,8 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 }
 
 int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-		       unsigned long hva, struct kvm_arch_async_pf *arch)
+		       unsigned long hva, struct kvm_arch_async_pf *arch,
+		       bool write)
 {
 	struct kvm_async_pf *work;
 
@@ -186,6 +190,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->addr = hva;
 	work->arch = *arch;
 	work->mm = current->mm;
+	work->write = write;
 	mmget(work->mm);
 	kvm_get_kvm(work->vcpu->kvm);
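
[Editor's sketch] For completeness, a freestanding C illustration of
the gup_flags choice patch 3 introduces. This is not kernel code; only
the FOLL_WRITE value matches linux/mm.h, and gup_flags_for_fault() is a
stand-in for the apf->write test inside async_pf_execute().

    #include <stdbool.h>
    #include <stdio.h>

    #define FOLL_WRITE 0x01                /* same value as in linux/mm.h */

    /* Request write access from get_user_pages_remote() only when the
     * guest fault was actually a write, so a read fault on a read-only
     * VMA no longer fails check_vma_flags() with a spurious -EFAULT. */
    static unsigned int gup_flags_for_fault(bool write)
    {
            return write ? FOLL_WRITE : 0;
    }

    int main(void)
    {
            printf("read fault  -> gup_flags 0x%x\n", gup_flags_for_fault(false));
            printf("write fault -> gup_flags 0x%x\n", gup_flags_for_fault(true));
            return 0;
    }

With this in place, an -EFAULT seen by the series' error path really
does mean the page could not be faulted in, not merely that a read-only
mapping was probed for write access.
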