diff mbox series

[v2,18/23] kvm: x86: Get/set expanded xstate buffer

Message ID 20211217153003.1719189-19-jing2.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series AMX Support in KVM | expand

Commit Message

Jing Liu Dec. 17, 2021, 3:29 p.m. UTC
From: Guang Zeng <guang.zeng@intel.com>

When AMX is enabled it requires a larger xstate buffer than the
legacy hardcoded 4KB one. Exising kvm ioctls (KVM_[G|S]ET_XSAVE
under KVM_CAP_XSAVE) are not suitable for this purpose.

A new capability (KVM_CAP_XSAVE2) is introduced to mark an extended
kvm_xsave format for supporting >4KB fpstate. The expanded fpstate
size is returned to userspace when KVM_CAP_XSAVE2 is queried.

Introduce KVM_GET_XSAVE2 under this capability with the new format
for copying kernel fpstate to userspace.

Reuse KVM_SET_XSAVE for both old/new formats by reimplementing it to
do properly-sized memdup_user() based on guest fpstate.

Signed-off-by: Guang Zeng <guang.zeng@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
 arch/x86/include/uapi/asm/kvm.h | 12 ++++++++++-
 arch/x86/kvm/x86.c              | 37 ++++++++++++++++++++++++++++++++-
 include/uapi/linux/kvm.h        |  4 ++++
 3 files changed, 51 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini Dec. 20, 2021, 9:04 a.m. UTC | #1
On 12/17/21 16:29, Jing Liu wrote:
> +/* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
>   struct kvm_xsave {
> +	/*
> +	 * KVM_GET_XSAVE only uses the first 4096 bytes.
> +	 *
> +	 * KVM_GET_XSAVE2 must have the size match what is returned by
> +	 * KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
> +	 *
> +	 * KVM_SET_XSAVE uses the extra field if guest_fpu::fpstate::size
> +	 * exceeds 4096 bytes.

KVM_GET_XSAVE2 and KVM_SET_XSAVE respectively write and read as many 
bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), when 
invoked on the vm file descriptor.  Currently, 
KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a value that is 
greater than 4096 bytes if any dynamic features have been enabled with 
``arch_prctl()``; this however may change in the future.

The offsets of the state save areas in struct kvm_xsave follow the 
contents of CPUID leaf 0xD on the host.

Paolo
Wang, Wei W Dec. 21, 2021, 2:45 a.m. UTC | #2
On Monday, December 20, 2021 5:04 PM, Paolo Bonzini wrote:
> On 12/17/21 16:29, Jing Liu wrote:
> > +/* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
> >   struct kvm_xsave {
> > +	/*
> > +	 * KVM_GET_XSAVE only uses the first 4096 bytes.
> > +	 *
> > +	 * KVM_GET_XSAVE2 must have the size match what is returned by
> > +	 * KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
> > +	 *
> > +	 * KVM_SET_XSAVE uses the extra field if guest_fpu::fpstate::size
> > +	 * exceeds 4096 bytes.
> 
> KVM_GET_XSAVE2 and KVM_SET_XSAVE respectively write and read as many
> bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), when
> invoked on the vm file descriptor.  Currently,
> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a value that is
> greater than 4096 bytes if any dynamic features have been enabled with
> ``arch_prctl()``; this however may change in the future.

Would this make people think that KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) doesn’t
return the value (i.e. return 0) if it is smaller than 4096?
(i.e. KVM_GET_XSAVE2 doesn't work with size < 4096, which isn’t true)

I plan to just reword a bit:
Currently, KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a size value,
and the value is greater than 4096 bytes if any dynamic features have been enabled with
``arch_prctl()``. More types of values could be returned in the future.

Thanks,
Wei
Paolo Bonzini Dec. 21, 2021, 8:45 a.m. UTC | #3
On 12/21/21 03:45, Wang, Wei W wrote:
>> KVM_GET_XSAVE2 and KVM_SET_XSAVE respectively write and read as many
>> bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), when
>> invoked on the vm file descriptor.  Currently,
>> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a value that is
>> greater than 4096 bytes if any dynamic features have been enabled with
>> ``arch_prctl()``; this however may change in the future.
> Would this make people think that KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) doesn’t
> return the value (i.e. return 0) if it is smaller than 4096?
> (i.e. KVM_GET_XSAVE2 doesn't work with size < 4096, which isn’t true)
> 
> I plan to just reword a bit:
> Currently, KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a size value,
> and the value is greater than 4096 bytes if any dynamic features have been enabled with
> ``arch_prctl()``. More types of values could be returned in the future.

Next refinement:

The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will 
always be at least 4096.  Currently, it is only greater than 4096 if a 
dynamic feature has been enabled with ``arch_prctl()``, but this may 
change in the future.

(I'm not sure if the first sentence is true in the code, but if not it 
is a bug that has to be fixed :)).

Paolo
Wang, Wei W Dec. 21, 2021, 9:06 a.m. UTC | #4
On Tuesday, December 21, 2021 4:45 PM, Paolo Bonzini wrote:
> On 12/21/21 03:45, Wang, Wei W wrote:
> >> KVM_GET_XSAVE2 and KVM_SET_XSAVE respectively write and read as
> many
> >> bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2),
> when
> >> invoked on the vm file descriptor.  Currently,
> >> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a value that is
> >> greater than 4096 bytes if any dynamic features have been enabled
> >> with ``arch_prctl()``; this however may change in the future.
> > Would this make people think that
> KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)
> > doesn’t return the value (i.e. return 0) if it is smaller than 4096?
> > (i.e. KVM_GET_XSAVE2 doesn't work with size < 4096, which isn’t true)
> >
> > I plan to just reword a bit:
> > Currently, KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will only return a
> size
> > value, and the value is greater than 4096 bytes if any dynamic
> > features have been enabled with ``arch_prctl()``. More types of values could
> be returned in the future.
> 
> Next refinement:
> 
> The size value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will
> always be at least 4096.  Currently, it is only greater than 4096 if a dynamic
> feature has been enabled with ``arch_prctl()``, but this may change in the
> future.

> (I'm not sure if the first sentence is true in the code, but if not it is a bug that
> has to be fixed :)).

For the implementation, KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) always return kvm->vcpus[0]->arch.guest_fpu.uabi_size.
Do you want to change it to below?

If (kvm->vcpus[0]->arch.guest_fpu.uabi_size < 4096)
	return 0;
else
	return kvm->vcpus[0]->arch.guest_fpu.uabi_size;

If the size is less than 4096 (e.g. no dynamic xfeatures enabled),
userspace should use the old KVM_GET_XSAVE (instead of KVM_GET_XSAVE2)?
(KVM_GET_XSAVE2 supports to work with size less than 4096, so I think this isn't necessary)

Thanks,
Wei
Paolo Bonzini Dec. 21, 2021, 9:30 a.m. UTC | #5
On 12/21/21 10:06, Wang, Wei W wrote:
>> (I'm not sure if the first sentence is true in the code, but if not it is a bug that
>> has to be fixed :)).
> For the implementation, KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) always return kvm->vcpus[0]->arch.guest_fpu.uabi_size.
> Do you want to change it to below?
> 
> If (kvm->vcpus[0]->arch.guest_fpu.uabi_size < 4096)
> 	return 0;

return 4096;

since the minimum size of struct kvm_xsave2 (with no extra) is 4096.

Paolo

> else
> 	return kvm->vcpus[0]->arch.guest_fpu.uabi_size;
> 
> If the size is less than 4096 (e.g. no dynamic xfeatures enabled),
> userspace should use the old KVM_GET_XSAVE (instead of KVM_GET_XSAVE2)?
> (KVM_GET_XSAVE2 supports to work with size less than 4096, so I think this isn't necessary)
diff mbox series

Patch

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5a776a08f78c..240e17829e89 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -373,9 +373,19 @@  struct kvm_debugregs {
 	__u64 reserved[9];
 };
 
-/* for KVM_CAP_XSAVE */
+/* for KVM_CAP_XSAVE and KVM_CAP_XSAVE2 */
 struct kvm_xsave {
+	/*
+	 * KVM_GET_XSAVE only uses the first 4096 bytes.
+	 *
+	 * KVM_GET_XSAVE2 must have the size match what is returned by
+	 * KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
+	 *
+	 * KVM_SET_XSAVE uses the extra field if guest_fpu::fpstate::size
+	 * exceeds 4096 bytes.
+	 */
 	__u32 region[1024];
+	__u32 extra[0];
 };
 
 #define KVM_MAX_XCRS	16
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f8bacf18e6ed..796a9f2d1f23 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4296,6 +4296,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		else
 			r = 0;
 		break;
+	case KVM_CAP_XSAVE2:
+		r = kvm->vcpus[0]->arch.guest_fpu.uabi_size;
+		break;
 	default:
 		break;
 	}
@@ -4899,6 +4902,16 @@  static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
 				       vcpu->arch.pkru);
 }
 
+static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
+					  u8 *state, unsigned int size)
+{
+	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
+		return;
+
+	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu,
+				       state, size, vcpu->arch.pkru);
+}
+
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					struct kvm_xsave *guest_xsave)
 {
@@ -5366,7 +5379,9 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_SET_XSAVE: {
-		u.xsave = memdup_user(argp, sizeof(*u.xsave));
+		int size = vcpu->arch.guest_fpu.uabi_size;
+
+		u.xsave = memdup_user(argp, size);
 		if (IS_ERR(u.xsave)) {
 			r = PTR_ERR(u.xsave);
 			goto out_nofree;
@@ -5375,6 +5390,26 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave);
 		break;
 	}
+
+	case KVM_GET_XSAVE2: {
+		int size = vcpu->arch.guest_fpu.uabi_size;
+
+		u.xsave = kzalloc(size, GFP_KERNEL_ACCOUNT);
+		if (!u.xsave) {
+			r = -ENOMEM;
+			break;
+		}
+
+		kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+
+		if (copy_to_user(argp, u.xsave, size)) {
+			r = -EFAULT;
+			break;
+		}
+		r = 0;
+		break;
+	}
+
 	case KVM_GET_XCRS: {
 		u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL_ACCOUNT);
 		r = -ENOMEM;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..9d1c01669560 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1131,6 +1131,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
+#define KVM_CAP_XSAVE2 207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1610,6 +1611,9 @@  struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_XSAVE2 */
+#define KVM_GET_XSAVE2		  _IOR(KVMIO,  0xcf, struct kvm_xsave)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;