
[RFC,1/2] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support

Message ID 20240509075423.156858-1-weijiang.yang@intel.com (mailing list archive)
State New, archived
Series [RFC,1/2] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support

Commit Message

Yang, Weijiang May 9, 2024, 7:54 a.m. UTC
Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access HW MSRs or
KVM synthetic MSRs through them.

In the CET KVM series [*], KVM "steals" an MSR from the PV MSR space and
accesses it via the KVM_{G,S}ET_MSRS uAPIs, but this approach pollutes the
PV MSR space and hides the difference between synthetic MSRs and normal
HW-defined MSRs.

Instead, carve out a separate KVM-defined index space for synthetic MSRs.
These MSRs are not exposed to userspace via KVM_GET_MSR_INDEX_LIST;
userspace follows KVM's definitions and composes the uAPI parameters
itself. KVM synthetic MSR indices start from 0 and increase linearly. The
userspace caller should tag the MSR type correctly in order to access the
intended HW or synthetic MSR.
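For illustration, a minimal userspace sketch of how a caller might pack the RFC's struct kvm_x86_reg_id into the 64-bit kvm_one_reg.id field. It assumes a little-endian x86 host and copies the struct layout and the (1 << 2) / (1 << 3) type values from this patch under hypothetical rfc_-prefixed names, since the discussion below revises both.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local copies of this RFC's uAPI definitions. */
#define RFC_X86_REG_MSR			(1 << 2)
#define RFC_X86_REG_SYNTHETIC_MSR	(1 << 3)

struct rfc_x86_reg_id {
	uint32_t index;
	uint8_t  type;
	uint8_t  rsvd;
	uint16_t rsvd16;
};

/* Build the value userspace would place in kvm_one_reg.id. */
static uint64_t rfc_encode_reg_id(uint8_t type, uint32_t index)
{
	/* Designated initializer zeroes the reserved fields. */
	struct rfc_x86_reg_id id = { .index = index, .type = type };
	uint64_t raw;

	_Static_assert(sizeof(id) == sizeof(raw), "ID must be 64 bits");
	memcpy(&raw, &id, sizeof(raw));
	return raw;
}
```

On little-endian x86 the index lands in bits 31:0 and the type in bits 39:32, while the top byte stays zero; kvm_one_reg.addr would then point at a u64 buffer holding the value.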

[*]:
https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm.h | 10 ++++++
 arch/x86/kvm/x86.c              | 62 +++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

Comments

Sean Christopherson June 11, 2024, 1:04 a.m. UTC | #1
On Thu, May 09, 2024, Yang Weijiang wrote:
> @@ -5859,6 +5884,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static int kvm_translate_synthetic_msr(u32 *index)
> +{
> +	return 0;

This needs to be -EINVAL.

> +}
> +
>  long kvm_arch_vcpu_ioctl(struct file *filp,
>  			 unsigned int ioctl, unsigned long arg)
>  {
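To see why the stub must fail, consider a condensed, hypothetical model of the checks in the patch's KVM_{G,S}ET_ONE_REG handler. If the translator returns 0, a synthetic index passes through unchanged and is then accessed as the hardware MSR with the same number (synthetic index 0 would hit MSR 0x0). Returning -EINVAL rejects synthetic MSRs until KVM actually defines some. All names below are illustrative, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RFC_X86_REG_MSR			(1 << 2)
#define RFC_X86_REG_SYNTHETIC_MSR	(1 << 3)

/* No synthetic MSRs are defined yet, so every translation must fail. */
static int rfc_translate_synthetic_msr(uint32_t *index)
{
	(void)index;
	return -EINVAL;
}

/*
 * Mirrors the patch's ioctl checks: reserved fields must be clear, the
 * type must be known, and synthetic indices must be translated before
 * they are used as MSR numbers.
 */
static int rfc_resolve_msr(uint8_t type, uint8_t rsvd, uint16_t rsvd16,
			   uint32_t *index)
{
	if (rsvd || rsvd16)
		return -EINVAL;

	if (type != RFC_X86_REG_MSR && type != RFC_X86_REG_SYNTHETIC_MSR)
		return -EINVAL;

	if (type == RFC_X86_REG_SYNTHETIC_MSR)
		return rfc_translate_synthetic_msr(index);

	return 0;
}
```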
Yang, Weijiang June 11, 2024, 2:05 a.m. UTC | #2
On 6/11/2024 9:04 AM, Sean Christopherson wrote:
> On Thu, May 09, 2024, Yang Weijiang wrote:
>> @@ -5859,6 +5884,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>>   	}
>>   }
>>   
>> +static int kvm_translate_synthetic_msr(u32 *index)
>> +{
>> +	return 0;
> This needs to be -EINVAL.

OK, I'll change it, thanks!
Nikolas Wipper Sept. 11, 2024, 11:31 a.m. UTC | #3
On Thu May  9, 2024 at 09:54 AM UTC+0200, Yang Weijiang wrote:
> Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access HW MSRs or
> KVM synthetic MSRs through them.
> 
> In the CET KVM series [*], KVM "steals" an MSR from the PV MSR space and
> accesses it via the KVM_{G,S}ET_MSRS uAPIs, but this approach pollutes the
> PV MSR space and hides the difference between synthetic MSRs and normal
> HW-defined MSRs.
> 
> Instead, carve out a separate KVM-defined index space for synthetic MSRs.
> These MSRs are not exposed to userspace via KVM_GET_MSR_INDEX_LIST;
> userspace follows KVM's definitions and composes the uAPI parameters
> itself. KVM synthetic MSR indices start from 0 and increase linearly. The
> userspace caller should tag the MSR type correctly in order to access the
> intended HW or synthetic MSR.
> 
> [*]:
> https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

Having this API, and specifically having a definite kvm_one_reg structure 
for x86 registers, would be interesting for register pinning/intercepts.
With one_reg for x86 the API could be platform-agnostic and possibly even
replace MSR filters for x86. I do have a couple of questions about these
patches.

> ---
>  arch/x86/include/uapi/asm/kvm.h | 10 ++++++
>  arch/x86/kvm/x86.c              | 62 +++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index ef11aa4cab42..ca2a47a85fa1 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -410,6 +410,16 @@ struct kvm_xcrs {
>  	__u64 padding[16];
>  };
>  
> +#define KVM_X86_REG_MSR			(1 << 2)
> +#define KVM_X86_REG_SYNTHETIC_MSR	(1 << 3)

Why is this a bitfield? As opposed to just counting up?

#define KVM_X86_REG_MSR			2
#define KVM_X86_REG_SYNTHETIC_MSR	3

> +
> +struct kvm_x86_reg_id {
> +	__u32 index;
> +	__u8 type;
> +	__u8 rsvd;
> +	__u16 rsvd16;
> +};

This struct is opposite to what other architectures do, where they have
an architecture ID in the upper 32 bits, and the lower 32 bits actually
identify the register. This would probably make sense for x86 too, to
avoid conflicts with other IDs (I think MIPS core registers can have IDs
with the lower 32 bits all zero) so that the IDs are actually unique,
right?
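This point can be checked against the generic ONE_REG encoding in include/uapi/linux/kvm.h, where the top byte of an ID selects the architecture (KVM_REG_ARCH_MASK) and bits 55:52 encode the access size (KVM_REG_SIZE_MASK, KVM_REG_SIZE_SHIFT). The sketch copies those constants under hypothetical SKETCH_ names and decodes an ID built with the RFC layout (MSR 0x10, type 1 << 2, little-endian): the arch byte is zero rather than KVM_REG_X86, and a size field of 0 means a 1-byte register, not 8.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of generic ONE_REG constants from include/uapi/linux/kvm.h. */
#define SKETCH_REG_ARCH_MASK	0xff00000000000000ULL
#define SKETCH_REG_X86		0x2000000000000000ULL
#define SKETCH_REG_SIZE_SHIFT	52
#define SKETCH_REG_SIZE_MASK	0x00f0000000000000ULL

/* Same computation as KVM_REG_SIZE(): access size in bytes. */
static unsigned int sketch_reg_size_bytes(uint64_t id)
{
	return 1U << ((id & SKETCH_REG_SIZE_MASK) >> SKETCH_REG_SIZE_SHIFT);
}

/* Non-zero if the ID carries the x86 architecture tag. */
static int sketch_reg_is_x86(uint64_t id)
{
	return (id & SKETCH_REG_ARCH_MASK) == SKETCH_REG_X86;
}

/* RFC layout on little-endian x86: type in bits 39:32, index in 31:0. */
static const uint64_t rfc_msr_0x10_id = ((uint64_t)(1 << 2) << 32) | 0x10;
```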

Best,
Nikolas



Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Sean Christopherson Sept. 11, 2024, 2:36 p.m. UTC | #4
On Wed, Sep 11, 2024, Nikolas Wipper wrote:
> On Thu May  9, 2024 at 09:54 AM UTC+0200, Yang Weijiang wrote:
> > Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access HW MSRs or
> > KVM synthetic MSRs through them.
> > 
> > In the CET KVM series [*], KVM "steals" an MSR from the PV MSR space and
> > accesses it via the KVM_{G,S}ET_MSRS uAPIs, but this approach pollutes the
> > PV MSR space and hides the difference between synthetic MSRs and normal
> > HW-defined MSRs.
> > 
> > Instead, carve out a separate KVM-defined index space for synthetic MSRs.
> > These MSRs are not exposed to userspace via KVM_GET_MSR_INDEX_LIST;
> > userspace follows KVM's definitions and composes the uAPI parameters
> > itself. KVM synthetic MSR indices start from 0 and increase linearly. The
> > userspace caller should tag the MSR type correctly in order to access the
> > intended HW or synthetic MSR.
> > 
> > [*]:
> > https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com/
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> 
> Having this API, and specifically having a definite kvm_one_reg structure 
> for x86 registers, would be interesting for register pinning/intercepts.
> With one_reg for x86 the API could be platform-agnostic and possibly even
> replace MSR filters for x86.

I don't follow.  MSR filters let userspace intercept accesses for a variety of
reasons, these APIs simply provide a way to read/write a register value that is
stored in KVM.  I don't see how this could replace MSR filters.  

> I do have a couple of questions about these patches.
> 
> > ---
> >  arch/x86/include/uapi/asm/kvm.h | 10 ++++++
> >  arch/x86/kvm/x86.c              | 62 +++++++++++++++++++++++++++++++++
> >  2 files changed, 72 insertions(+)
> > 
> > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> > index ef11aa4cab42..ca2a47a85fa1 100644
> > --- a/arch/x86/include/uapi/asm/kvm.h
> > +++ b/arch/x86/include/uapi/asm/kvm.h
> > @@ -410,6 +410,16 @@ struct kvm_xcrs {
> >  	__u64 padding[16];
> >  };
> >  
> > +#define KVM_X86_REG_MSR			(1 << 2)
> > +#define KVM_X86_REG_SYNTHETIC_MSR	(1 << 3)
> 
> Why is this a bitfield? As opposed to just counting up?

Hmm, good question.  This came from my initial sketch [*], and it would seem that I
had something specific in mind since starting at (1 << 2) is oddly specific, but for
the life of me I can't remember what the plan was.  Best guess is that I was
leaving space for '0' and '1' to be regs and sregs?  But that still doesn't
explain/justify using a bitfield.

[*] https://lore.kernel.org/all/ZjLE7giCsEI4Sftp@google.com

> 
> #define KVM_X86_REG_MSR			2
> #define KVM_X86_REG_SYNTHETIC_MSR	3
> 
> > +
> > +struct kvm_x86_reg_id {
> > +	__u32 index;
> > +	__u8 type;
> > +	__u8 rsvd;
> > +	__u16 rsvd16;
> > +};
> 
> This struct is opposite to what other architectures do, where they have
> an architecture ID in the upper 32 bits, and the lower 32 bits actually
> identify the register. This would probably make sense for x86 too, to
> avoid conflicts with other IDs (I think MIPS core registers can have IDs
> with the lower 32 bits all zero) so that the IDs are actually unique,
> right?

It's not the opposite, it's just missing fields for the arch and the size.  Ugh,
the size is unaligned.  That's annoying.  Something like this?

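struct kvm_x86_reg_id {
	__u32 index;	/* bits 31:0 */
	__u8  type;	/* bits 39:32 */
	__u8  rsvd;	/* bits 47:40 */
	__u8  rsvd4:4;	/* bits 51:48 */
	__u8  size:4;	/* bits 55:52, i.e. KVM_REG_SIZE_SHIFT */
	__u8  x86;	/* bits 63:56, the KVM_REG_X86 arch byte */
};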

Though looking at this with fresh eyes, I don't think the above structure should
be exposed to userspace.  Userspace will only ever want to encode a register; the
exact register may not be hardcoded, but I would expect the type to always be
known ahead of time, if not outright hardcoded.  The struct is really only useful
for the kernel, e.g. to easily switch on the type, extract the index, etc.
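As a rough illustration of that kernel-side convenience, the sketch below overlays the revised layout on a raw 64-bit ID and switches on the type. It assumes GCC or Clang on little-endian x86, where bit-fields are allocated starting from the least significant bit, so rsvd4 covers bits 51:48 and size covers bits 55:52; bit-field layout is implementation-defined, which is one more reason to keep the struct out of the uAPI. The struct and function names are hypothetical, and the type value 2 follows the counting-up numbering suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-internal view of an x86 ONE_REG ID. */
struct sketch_x86_reg_id {
	uint32_t index;
	uint8_t  type;
	uint8_t  rsvd;
	uint8_t  rsvd4:4;
	uint8_t  size:4;
	uint8_t  x86;
};

#define SKETCH_X86_REG_TYPE_MSR	2
#define SKETCH_X86_ARCH_BYTE	0x20	/* KVM_REG_X86 >> 56 */

/* Returns the MSR index encoded in raw_id, or -EINVAL if it isn't a valid MSR ID. */
static int64_t sketch_decode_msr(uint64_t raw_id)
{
	struct sketch_x86_reg_id id;

	_Static_assert(sizeof(id) == sizeof(raw_id), "ID must be 64 bits");
	memcpy(&id, &raw_id, sizeof(id));

	if (id.x86 != SKETCH_X86_ARCH_BYTE || id.rsvd || id.rsvd4)
		return -EINVAL;

	switch (id.type) {
	case SKETCH_X86_REG_TYPE_MSR:
		/* KVM_REG_SIZE_U64 encodes as 3, i.e. 1 << 3 = 8 bytes. */
		if (id.size != 3)
			return -EINVAL;
		return id.index;
	default:
		return -EINVAL;
	}
}
```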

As annoying as it can be for a human to decipher the final value, the arm64/riscv
approach of providing builders is probably the way to go, though I think x86 can
be much simpler (less stuff to encode).

Oh!  Another thing I think we should do is make KVM_{G,S}ET_ONE_REG 64-bit only
so that we don't have to deal with 32-bit vs. 64-bit GPRs.  32-bit userspace
would need to manually encode the register id, but I have no problem making life
difficult for such setups.  Or KVM could reject the ioctl for .compat_ioctl(),
but that seems unnecessary.

E.g. since IIUC switch() and if() statements are off-limits in uapi headers...

#define KVM_X86_REG_TYPE_MSR			2ull
#define KVM_X86_REG_TYPE_SYNTHETIC_MSR		3ull

/* The type lives in bits 39:32, the access size in bits 55:52. */
#define KVM_X86_REG_TYPE_SIZE(type)						\
({										\
	__u64 type_size = (__u64)(type) << 32;					\
										\
	type_size |= (type) == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 :	\
		     (type) == KVM_X86_REG_TYPE_SYNTHETIC_MSR ? KVM_REG_SIZE_U64 : \
		     0;								\
	type_size;								\
})

#define KVM_X86_REG_ENCODE(type, index)				\
	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type) | (index))

#define KVM_X86_REG_MSR(index) KVM_X86_REG_ENCODE(KVM_X86_REG_TYPE_MSR, index)
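To sanity-check builder macros of this shape, the sketch below instantiates them in userspace with local copies of the needed linux/kvm.h constants, under hypothetical SKETCH_ names so they cannot clash with a real header. It relies on the GNU statement-expression extension, so it assumes GCC or Clang. The expected values follow from KVM_REG_X86 = 0x2000000000000000, KVM_REG_SIZE_U64 = 0x0030000000000000, and the type being shifted into bits 39:32.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of constants from include/uapi/linux/kvm.h. */
#define SKETCH_REG_X86		0x2000000000000000ULL
#define SKETCH_REG_SIZE_U64	0x0030000000000000ULL

#define SKETCH_X86_REG_TYPE_MSR			2ULL
#define SKETCH_X86_REG_TYPE_SYNTHETIC_MSR	3ULL

/* Type in bits 39:32, access size in bits 55:52 (GNU statement expression). */
#define SKETCH_X86_REG_TYPE_SIZE(type)						\
({										\
	uint64_t type_size = (uint64_t)(type) << 32;				\
										\
	type_size |= (type) == SKETCH_X86_REG_TYPE_MSR ? SKETCH_REG_SIZE_U64 :	\
		     (type) == SKETCH_X86_REG_TYPE_SYNTHETIC_MSR ?		\
				SKETCH_REG_SIZE_U64 : 0;			\
	type_size;								\
})

#define SKETCH_X86_REG_ENCODE(type, index)					\
	(SKETCH_REG_X86 | SKETCH_X86_REG_TYPE_SIZE(type) | (uint64_t)(index))

#define SKETCH_X86_REG_MSR(index)						\
	SKETCH_X86_REG_ENCODE(SKETCH_X86_REG_TYPE_MSR, index)

/* Recover the type byte from an encoded ID. */
static inline uint8_t sketch_x86_reg_type(uint64_t id)
{
	return (uint8_t)(id >> 32);
}
```

Because the type is shifted above the 32-bit index, a full MSR index can be ORed in without clobbering it, and the arch and size fields land where generic ONE_REG code expects them.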
Nikolas Wipper Sept. 11, 2024, 2:48 p.m. UTC | #5
On Wed Sep 11, 2024 at 04:36 PM UTC+0200, Sean Christopherson wrote:
> On Wed, Sep 11, 2024, Nikolas Wipper wrote:
>> Having this API, and specifically having a definite kvm_one_reg structure
>> for x86 registers, would be interesting for register pinning/intercepts.
>> With one_reg for x86 the API could be platform-agnostic and possibly even
>> replace MSR filters for x86.
> 
> I don't follow.  MSR filters let userspace intercept accesses for a variety of
> reasons, these APIs simply provide a way to read/write a register value that is
> stored in KVM.  I don't see how this could replace MSR filters.

Nope, that would be an entirely different API, but if it used ONE_REG IDs, it
could be unified to cover CRs and MSRs all in one.




Sean Christopherson Sept. 11, 2024, 2:59 p.m. UTC | #6
On Wed, Sep 11, 2024, Nikolas Wipper wrote:
> On Wed Sep 11, 2024 at 04:36 PM UTC+0200, Sean Christopherson wrote:
> > On Wed, Sep 11, 2024, Nikolas Wipper wrote:
> >> Having this API, and specifically having a definite kvm_one_reg structure
> >> for x86 registers, would be interesting for register pinning/intercepts.
> >> With one_reg for x86 the API could be platform-agnostic and possibly even
> >> replace MSR filters for x86.
> > 
> > I don't follow.  MSR filters let userspace intercept accesses for a variety of
> > reasons, these APIs simply provide a way to read/write a register value that is
> > stored in KVM.  I don't see how this could replace MSR filters.
> 
> Nope, that would be an entirely different API, but if it used ONE_REG IDs, it
> could be unified to cover CRs and MSRs all in one.

Oooh, gotcha.  Yeah, uniquely identifiable registers would allow for a generic
filtering API, though I'm not entirely sure that's actually a good idea in the
long run.  Most x86 registers can't be intercepted; having a generic filtering
API might incur an annoyingly high maintenance cost.  Hmm, though it should be
easy enough to explicitly allow only MSR and CR types, so if/when we get to the
point where CR pinning/filtering is desirable/ready, then a unified API probably
does make sense.
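If a unified filtering API were ever built on these IDs, the restriction described above could be a simple type check at registration time. The sketch is purely hypothetical: no such API exists in this thread, SKETCH_X86_REG_TYPE_CR is an invented placeholder value, and the ID layout (type in bits 39:32) is assumed from the builder sketch earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_X86_REG_TYPE_MSR	2
#define SKETCH_X86_REG_TYPE_CR	4	/* invented placeholder, not a real uAPI value */

/* Extract the type byte from a ONE_REG-style x86 register ID. */
static uint8_t sketch_reg_type(uint64_t id)
{
	return (uint8_t)(id >> 32);
}

/*
 * Hypothetical filter registration check: only register types that can
 * actually be intercepted (MSRs and, eventually, CRs) are accepted.
 */
static bool sketch_filter_type_allowed(uint64_t id)
{
	switch (sketch_reg_type(id)) {
	case SKETCH_X86_REG_TYPE_MSR:
	case SKETCH_X86_REG_TYPE_CR:
		return true;
	default:
		return false;
	}
}
```

Keeping the allow-list explicit means new register types stay unfilterable by default, which limits the maintenance cost mentioned above.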

Patch

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index ef11aa4cab42..ca2a47a85fa1 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -410,6 +410,16 @@  struct kvm_xcrs {
 	__u64 padding[16];
 };
 
+#define KVM_X86_REG_MSR			(1 << 2)
+#define KVM_X86_REG_SYNTHETIC_MSR	(1 << 3)
+
+struct kvm_x86_reg_id {
+	__u32 index;
+	__u8 type;
+	__u8 rsvd;
+	__u16 rsvd16;
+};
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af0..d0054c52f24b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2244,6 +2244,31 @@  static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
 
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+	u64 val;
+	int r;
+
+	r = do_get_msr(vcpu, msr, &val);
+	if (r)
+		return r;
+
+	if (put_user(val, value))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *value)
+{
+	u64 val;
+
+	if (get_user(val, value))
+		return -EFAULT;
+
+	return do_set_msr(vcpu, msr, &val);
+}
+
 #ifdef CONFIG_X86_64
 struct pvclock_clock {
 	int vclock_mode;
@@ -5859,6 +5884,11 @@  static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	}
 }
 
+static int kvm_translate_synthetic_msr(u32 *index)
+{
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -5976,6 +6006,38 @@  long kvm_arch_vcpu_ioctl(struct file *filp,
 		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	}
+	case KVM_GET_ONE_REG:
+	case KVM_SET_ONE_REG: {
+		struct kvm_x86_reg_id *id;
+		struct kvm_one_reg reg;
+		u64 __user *value;
+
+		r = -EFAULT;
+		if (copy_from_user(&reg, argp, sizeof(reg)))
+			break;
+
+		r = -EINVAL;
+		id = (struct kvm_x86_reg_id *)&reg.id;
+		if (id->rsvd || id->rsvd16)
+			break;
+
+		if (id->type != KVM_X86_REG_MSR &&
+		    id->type != KVM_X86_REG_SYNTHETIC_MSR)
+			break;
+
+		if (id->type == KVM_X86_REG_SYNTHETIC_MSR) {
+			r = kvm_translate_synthetic_msr(&id->index);
+			if (r)
+				break;
+		}
+
+		value = u64_to_user_ptr(reg.addr);
+		if (ioctl == KVM_GET_ONE_REG)
+			r = kvm_get_one_msr(vcpu, id->index, value);
+		else
+			r = kvm_set_one_msr(vcpu, id->index, value);
+		break;
+	}
 	case KVM_TPR_ACCESS_REPORTING: {
 		struct kvm_tpr_access_ctl tac;