diff mbox series

[01/18] KVM: x86: hyper-v: Introduce XMM output support

Message ID 20240609154945.55332-2-nsaenz@amazon.com (mailing list archive)
State New, archived
Series Introducing Core Building Blocks for Hyper-V VSM Emulation

Commit Message

Nicolas Saenz Julienne June 9, 2024, 3:49 p.m. UTC
Prepare infrastructure to be able to return data through the XMM
registers when Hyper-V hypercalls are issued in fast mode. The XMM
registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
restored on successful hypercall completion.

Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>

---

There was some discussion in the RFC about whether growing 'struct
kvm_hyperv_exit' is ABI breakage. IMO it isn't:
- There is padding in 'struct kvm_run' that ensures that a bigger
  'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
- Adding a new field at the bottom of the 'hcall' field within the
  'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
  the offsets within that struct either.
- Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
  its size isn't part of the uABI. It already grew when syndbg was
  introduced.
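The offset argument can be sanity-checked with `offsetof` on simplified
stand-in structs (these are illustrative, not the real UAPI definitions):
a field appended at the tail leaves every pre-existing offset untouched.

```c
#include <stddef.h>

/* Illustrative stand-ins for the 'hcall' member of 'struct kvm_hyperv_exit';
 * NOT the real UAPI structs. */
struct hcall_v1 {
	unsigned long long input;
	unsigned long long result;
	unsigned long long params[2];
};

struct hcall_v2 {	/* v1 plus a new field appended at the tail */
	unsigned long long input;
	unsigned long long result;
	unsigned long long params[2];
	struct { unsigned long long low, high; } xmm[6];
};

/* Offsets of the pre-existing members are unchanged by the appended field. */
int offsets_stable(void)
{
	return offsetof(struct hcall_v1, result) == offsetof(struct hcall_v2, result) &&
	       offsetof(struct hcall_v1, params) == offsetof(struct hcall_v2, params);
}
```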

 Documentation/virt/kvm/api.rst     | 19 ++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  2 +-
 arch/x86/kvm/hyperv.c              | 56 +++++++++++++++++++++++++++++-
 include/uapi/linux/kvm.h           |  6 ++++
 4 files changed, 81 insertions(+), 2 deletions(-)

Comments

Vitaly Kuznetsov July 8, 2024, 2:59 p.m. UTC | #1
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:

> Prepare infrastructure to be able to return data through the XMM
> registers when Hyper-V hypercalls are issues in fast mode. The XMM
> registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> restored on successful hypercall completion.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
>
> ---
>
> There was some discussion in the RFC about whether growing 'struct
> kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> - There is padding in 'struct kvm_run' that ensures that a bigger
>   'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> - Adding a new field at the bottom of the 'hcall' field within the
>   'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
>   the offsets within that struct either.
> - Ultimately, previous updates to 'struct kvm_hyperv_exit's hint that
>   its size isn't part of the uABI. It already grew when syndbg was
>   introduced.

Yes, but the SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't
see any immediate issues with the current approach, we may want to
introduce something like KVM_EXIT_HYPERV_HCALL_XMM: userspace must be
prepared to handle this new information anyway, and it is better to make
unprepared userspace fail with 'unknown exit' than to mishandle a
hypercall by ignoring the XMM portion of the data.
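The posture being argued for can be sketched from userspace's side. Note
that KVM_EXIT_HYPERV_HCALL_XMM does not exist upstream; its name and value
below are hypothetical.

```c
/* KVM_EXIT_HYPERV_HCALL is the real exit code from the patch;
 * KVM_EXIT_HYPERV_HCALL_XMM and its value are hypothetical. */
#define KVM_EXIT_HYPERV_HCALL      2
#define KVM_EXIT_HYPERV_HCALL_XMM  5	/* hypothetical */

/* Returns 0 when handled, -1 for an unknown exit: unprepared userspace
 * fails loudly instead of silently dropping the XMM data. */
int handle_hyperv_exit(int type)
{
	switch (type) {
	case KVM_EXIT_HYPERV_HCALL:
		return 0;	/* legacy path, no XMM data expected */
	default:
		return -1;	/* covers HCALL_XMM if userspace is unprepared */
	}
}
```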

>
>  Documentation/virt/kvm/api.rst     | 19 ++++++++++
>  arch/x86/include/asm/hyperv-tlfs.h |  2 +-
>  arch/x86/kvm/hyperv.c              | 56 +++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm.h           |  6 ++++
>  4 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a71d91978d9ef..17893b330b76f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  
>  TBD
> +
> +10. Hyper-V CPUIDs
> +==================
> +
> +This section only applies to x86.

We can probably use

:Architectures: x86

which we already use elsewhere in this file.

> +
> +New Hyper-V feature support is no longer being tracked through KVM
> +capabilities.  Userspace can check if a particular version of KVM supports a
> +feature using KMV_GET_SUPPORTED_HV_CPUID.  This section documents how Hyper-V
> +CPUIDs map to KVM functionality.
> +
> +10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
> +------------------------------------------
> +
> +:Location: CPUID.40000003H:EDX[bit 15]
> +
> +This CPUID indicates that KVM supports retuning data to the guest in response
> +to a hypercall using the XMM registers. It also extends ``struct
> +kvm_hyperv_exit`` to allow passing the XMM data from userspace.

It's always good to document things, thanks! I'm, however, wondering
what should we document as part of KVM API. In the file, we already
have:
- "4.118 KVM_GET_SUPPORTED_HV_CPUID"
- "struct kvm_hyperv_exit" description in "5. The kvm_run structure"

The latter should definitely get extended to cover XMM, and I guess the
former can accommodate the 'no longer being tracked' comment. With that,
maybe there's no need for a new section?

> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 3787d26810c1c..6a18c9f77d5fe 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -49,7 +49,7 @@
>  /* Support for physical CPU dynamic partitioning events is available*/
>  #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE	BIT(3)
>  /*
> - * Support for passing hypercall input parameter block via XMM
> + * Support for passing hypercall input and output parameter block via XMM
>   * registers is available
>   */
>  #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE		BIT(4)

This change to the comment is weird (or I may have forgotten something
important), could you please elaborate? Currently, we have:

/*
 * Support for passing hypercall input parameter block via XMM
 * registers is available
 */
#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE         BIT(4)
...
/*
 * Support for returning hypercall output block via XMM
 * registers is available
 */
#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE                BIT(15)

which seems to be correct. TLFS also defines

Bit 4: XmmRegistersForFastHypercallAvailable

in CPUID 0x40000009.EDX (Nested Hypervisor Feature Identification) which
probably covers both but we don't set this leaf in KVM currently ...

> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 8a47f8541eab7..42f44546fe79c 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
>  	u16 rep_idx;
>  	bool fast;
>  	bool rep;
> +	bool xmm_dirty;
>  	sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
>  
>  	/*
> @@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
>  	return ret;
>  }
>  
> +static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
> +{
> +	int reg;
> +
> +	kvm_fpu_get();
> +	for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
> +		const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
> +		_kvm_write_sse_reg(reg, &data);
> +	}
> +	kvm_fpu_put();
> +}
> +
> +static bool kvm_hv_is_xmm_output_hcall(u16 code)
> +{
> +	return false;
> +}
> +
> +static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> +	return !hv_vcpu->enforce_cpuid ||
> +	       hv_vcpu->cpuid_cache.features_edx &
> +		       HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> +}
> +
>  static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
> +	bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
> +	u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
> +	u64 result = vcpu->run->hyperv.u.hcall.result;
> +
> +	if (hv_result_success(result) && fast &&
> +	    kvm_hv_is_xmm_output_hcall(code)) {

Assuming hypercalls with XMM output are always 'fast', should we include
the 'fast' check in kvm_hv_is_xmm_output_hcall()?
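A sketch of what folding the check in could look like; the signature change
is the suggestion above, and the empty code table reflects that no hypercall
qualifies yet at this point in the series (the commented-out case is
hypothetical):

```c
#include <stdbool.h>

/* Sketch: fold the 'fast' requirement into the predicate itself.
 * Nothing qualifies yet in this patch; later patches in the series
 * would populate the switch with real hypercall codes. */
bool kvm_hv_is_xmm_output_hcall(unsigned short code, bool fast)
{
	if (!fast)
		return false;	/* XMM output only applies to fast hypercalls */

	switch (code) {
	/* e.g. case HVCALL_GET_VP_REGISTERS: return true; (hypothetical) */
	default:
		return false;
	}
}
```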

> +		if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> +			kvm_queue_exception(vcpu, UD_VECTOR);
> +			return 1;
> +		}
> +
> +		kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
> +	}
> +
> +	return kvm_hv_hypercall_complete(vcpu, result);
>  }
>  
>  static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> @@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  	hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
>  	hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
>  	hc.rep = !!(hc.rep_cnt || hc.rep_idx);
> +	hc.xmm_dirty = false;
>  
>  	trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
>  			       hc.rep_idx, hc.ingpa, hc.outgpa);
> @@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  		break;
>  	}
>  
> +	if (hv_result_success(ret) && hc.xmm_dirty) {
> +		if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> +			kvm_queue_exception(vcpu, UD_VECTOR);
> +			return 1;
> +		}
> +
> +		kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
> +	}
> +
>  hypercall_complete:
>  	return kvm_hv_hypercall_complete(vcpu, ret);
>  
> @@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  	vcpu->run->hyperv.u.hcall.input = hc.param;
>  	vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
>  	vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> +	if (hc.fast)
> +		memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
>  	vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
>  	return 0;
>  }
> @@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>  			ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
>  
>  			ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
> +			ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
>  			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
>  			ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d03842abae578..fbdee8d754595 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -90,6 +90,11 @@ struct kvm_pit_config {
>  
>  #define KVM_PIT_SPEAKER_DUMMY     1
>  
> +struct kvm_hyperv_xmm_reg {
> +	__u64 low;
> +	__u64 high;
> +};
> +
>  struct kvm_hyperv_exit {
>  #define KVM_EXIT_HYPERV_SYNIC          1
>  #define KVM_EXIT_HYPERV_HCALL          2
> @@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
>  			__u64 input;
>  			__u64 result;
>  			__u64 params[2];
> +			struct kvm_hyperv_xmm_reg xmm[6];

In theory, we have HV_HYPERCALL_MAX_XMM_REGISTERS in TLFS (which you
already use in the code). While I'm not sure it makes sense to make KVM
ABI dependent on TLFS changes (probably not), we may want to leave a
short comment explaining where '6' comes from.
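A sketch of what that comment could look like, using illustrative stand-in
types (the layout assumes the usual 8-byte __u64 members, so the array packs
with no padding):

```c
/* Sketch of the suggested comment: the UAPI hard-codes 6 but documents
 * that it mirrors HV_HYPERCALL_MAX_XMM_REGISTERS from the TLFS. */
struct kvm_hyperv_xmm_reg {
	unsigned long long low;
	unsigned long long high;
};

struct hcall_tail_sketch {
	/* 6 == HV_HYPERCALL_MAX_XMM_REGISTERS (Hyper-V TLFS); kept as a
	 * literal so the uABI doesn't track future TLFS changes */
	struct kvm_hyperv_xmm_reg xmm[6];
};
```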

>  		} hcall;
>  		struct {
>  			__u32 msr;
Nicolas Saenz Julienne July 17, 2024, 2:12 p.m. UTC | #2
Hi Vitaly,
Thanks for having a look at this.

On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Prepare infrastructure to be able to return data through the XMM
> > registers when Hyper-V hypercalls are issues in fast mode. The XMM
> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> > restored on successful hypercall completion.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >
> > ---
> >
> > There was some discussion in the RFC about whether growing 'struct
> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> > - There is padding in 'struct kvm_run' that ensures that a bigger
> >   'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> > - Adding a new field at the bottom of the 'hcall' field within the
> >   'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
> >   the offsets within that struct either.
> > - Ultimately, previous updates to 'struct kvm_hyperv_exit's hint that
> >   its size isn't part of the uABI. It already grew when syndbg was
> >   introduced.
>
> Yes but SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't see
> any immediate issues with the current approach, we may want to introduce
> something like KVM_EXIT_HYPERV_HCALL_XMM: the userspace must be prepared
> to handle this new information anyway and it is better to make
> unprepared userspace fail with 'unknown exit' then to mishandle a
> hypercall by ignoring XMM portion of the data.

OK, I'll go that way. Just wanted to get a better understanding of why
you felt it was necessary.

> >
> >  Documentation/virt/kvm/api.rst     | 19 ++++++++++
> >  arch/x86/include/asm/hyperv-tlfs.h |  2 +-
> >  arch/x86/kvm/hyperv.c              | 56 +++++++++++++++++++++++++++++-
> >  include/uapi/linux/kvm.h           |  6 ++++
> >  4 files changed, 81 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a71d91978d9ef..17893b330b76f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
> >  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> >  TBD
> > +
> > +10. Hyper-V CPUIDs
> > +==================
> > +
> > +This section only applies to x86.
>
> We can probably use
>
> :Architectures: x86
>
> which we already use.

Noted.

> > +
> > +New Hyper-V feature support is no longer being tracked through KVM
> > +capabilities.  Userspace can check if a particular version of KVM supports a
> > +feature using KMV_GET_SUPPORTED_HV_CPUID.  This section documents how Hyper-V
> > +CPUIDs map to KVM functionality.
> > +
> > +10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
> > +------------------------------------------
> > +
> > +:Location: CPUID.40000003H:EDX[bit 15]
> > +
> > +This CPUID indicates that KVM supports retuning data to the guest in response
> > +to a hypercall using the XMM registers. It also extends ``struct
> > +kvm_hyperv_exit`` to allow passing the XMM data from userspace.
>
> It's always good to document things, thanks! I'm, however, wondering
> what should we document as part of KVM API. In the file, we already
> have:
> - "4.118 KVM_GET_SUPPORTED_HV_CPUID"
> - "struct kvm_hyperv_exit" description in "5. The kvm_run structure"
>
> The later should definitely get extended to cover XMM and I guess the
> former can accomodate the 'no longer being tracked' comment. With that,
> maybe there's no need for a new section?

I'll try to fit it that way.

> > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > index 3787d26810c1c..6a18c9f77d5fe 100644
> > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > @@ -49,7 +49,7 @@
> >  /* Support for physical CPU dynamic partitioning events is available*/
> >  #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE    BIT(3)
> >  /*
> > - * Support for passing hypercall input parameter block via XMM
> > + * Support for passing hypercall input and output parameter block via XMM
> >   * registers is available
> >   */
> >  #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE         BIT(4)
>
> This change of the comment is weird (or I may have forgotten something
> important), could you please elaborate?. Currently, we have:
>
> /*
>  * Support for passing hypercall input parameter block via XMM
>  * registers is available
>  */
> #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE         BIT(4)
> ...
> /*
>  * Support for returning hypercall output block via XMM
>  * registers is available
>  */
> #define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE                BIT(15)
>
> which seems to be correct. TLFS also defines
>
> Bit 4: XmmRegistersForFastHypercallAvailable
>
> in CPUID 0x40000009.EDX (Nested Hypervisor Feature Identification) which
> probably covers both but we don't set this leaf in KVM currently ...

You're right, this comment update no longer applies. It applied in an
older version of the patch but slipped through the cracks as I rebased
it. Sorry.

> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 8a47f8541eab7..42f44546fe79c 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
> >       u16 rep_idx;
> >       bool fast;
> >       bool rep;
> > +     bool xmm_dirty;
> >       sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
> >
> >       /*
> > @@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
> >       return ret;
> >  }
> >
> > +static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
> > +{
> > +     int reg;
> > +
> > +     kvm_fpu_get();
> > +     for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
> > +             const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
> > +             _kvm_write_sse_reg(reg, &data);
> > +     }
> > +     kvm_fpu_put();
> > +}
> > +
> > +static bool kvm_hv_is_xmm_output_hcall(u16 code)
> > +{
> > +     return false;
> > +}
> > +
> > +static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
> > +{
> > +     struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > +
> > +     return !hv_vcpu->enforce_cpuid ||
> > +            hv_vcpu->cpuid_cache.features_edx &
> > +                    HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> > +}
> > +
> >  static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
> >  {
> > -     return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
> > +     bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
> > +     u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
> > +     u64 result = vcpu->run->hyperv.u.hcall.result;
> > +
> > +     if (hv_result_success(result) && fast &&
> > +         kvm_hv_is_xmm_output_hcall(code)) {
>
> Assuming hypercalls with XMM output are always 'fast', should we include
> 'fast' check in kvm_hv_is_xmm_output_hcall()?

Sounds good, yes.

> > +             if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> > +                     kvm_queue_exception(vcpu, UD_VECTOR);
> > +                     return 1;
> > +             }
> > +
> > +             kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
> > +     }
> > +
> > +     return kvm_hv_hypercall_complete(vcpu, result);
> >  }
> >
> >  static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> > @@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> >       hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
> >       hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
> >       hc.rep = !!(hc.rep_cnt || hc.rep_idx);
> > +     hc.xmm_dirty = false;
> >
> >       trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
> >                              hc.rep_idx, hc.ingpa, hc.outgpa);
> > @@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> >               break;
> >       }
> >
> > +     if (hv_result_success(ret) && hc.xmm_dirty) {
> > +             if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> > +                     kvm_queue_exception(vcpu, UD_VECTOR);
> > +                     return 1;
> > +             }
> > +
> > +             kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
> > +     }
> > +
> >  hypercall_complete:
> >       return kvm_hv_hypercall_complete(vcpu, ret);
> >
> > @@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> >       vcpu->run->hyperv.u.hcall.input = hc.param;
> >       vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> >       vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> > +     if (hc.fast)
> > +             memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
> >       vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
> >       return 0;
> >  }
> > @@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> >                       ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
> >
> >                       ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
> > +                     ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> >                       ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> >                       ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d03842abae578..fbdee8d754595 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -90,6 +90,11 @@ struct kvm_pit_config {
> >
> >  #define KVM_PIT_SPEAKER_DUMMY     1
> >
> > +struct kvm_hyperv_xmm_reg {
> > +     __u64 low;
> > +     __u64 high;
> > +};
> > +
> >  struct kvm_hyperv_exit {
> >  #define KVM_EXIT_HYPERV_SYNIC          1
> >  #define KVM_EXIT_HYPERV_HCALL          2
> > @@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
> >                       __u64 input;
> >                       __u64 result;
> >                       __u64 params[2];
> > +                     struct kvm_hyperv_xmm_reg xmm[6];
>
> In theory, we have HV_HYPERCALL_MAX_XMM_REGISTERS in TLFS (which you
> already use in the code). While I'm not sure it makes sense to make KVM
> ABI dependent on TLFS changes (probably not), we may want to leave a
> short comment explaining where '6' comes from.

Will do.

Thanks,
Nicolas
Vitaly Kuznetsov July 29, 2024, 1:53 p.m. UTC | #3
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:

> Hi Vitaly,
> Thanks for having a look at this.
>
> On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
>> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>>
>> > Prepare infrastructure to be able to return data through the XMM
>> > registers when Hyper-V hypercalls are issues in fast mode. The XMM
>> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
>> > restored on successful hypercall completion.
>> >
>> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
>> >
>> > ---
>> >
>> > There was some discussion in the RFC about whether growing 'struct
>> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
>> > - There is padding in 'struct kvm_run' that ensures that a bigger
>> >   'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
>> > - Adding a new field at the bottom of the 'hcall' field within the
>> >   'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
>> >   the offsets within that struct either.
>> > - Ultimately, previous updates to 'struct kvm_hyperv_exit's hint that
>> >   its size isn't part of the uABI. It already grew when syndbg was
>> >   introduced.
>>
>> Yes but SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't see
>> any immediate issues with the current approach, we may want to introduce
>> something like KVM_EXIT_HYPERV_HCALL_XMM: the userspace must be prepared
>> to handle this new information anyway and it is better to make
>> unprepared userspace fail with 'unknown exit' then to mishandle a
>> hypercall by ignoring XMM portion of the data.
>
> OK, I'll go that way. Just wanted to get a better understanding of why
> you felt it was necessary.
>

(sorry for delayed reply, I was on vacation)

I don't think it's an absolute must, but it appears to be a cleaner
approach to me.

Imagine there's some userspace that handles KVM_EXIT_HYPERV_HCALL today
and we want to add XMM handling there. How would we know whether the xmm
portion of the data is actually filled by KVM or not? With your patch, we can of
course check for HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE in
KVM_GET_SUPPORTED_HV_CPUID but this is not really straightforward, is
it? Checking the size is not good either. E.g. think about downstream
versions of KVM which may or may not have certain backports. In case we
(theoretically) do several additions to 'struct kvm_hyperv_exit', it
will quickly become a nightmare.

On the contrary, KVM_EXIT_HYPERV_HCALL_XMM (or just
KVM_EXIT_HYPERV_HCALL2) approach looks cleaner: once userspace sees it,
it knows that 'xmm' portion of the data can be relied upon.
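That contrast can be sketched as follows; the name and value of the new
exit code are hypothetical, only KVM_EXIT_HYPERV_HCALL is taken from the
patch.

```c
#include <stdbool.h>

/* With a dedicated exit code, validity of the xmm data is self-describing.
 * KVM_EXIT_HYPERV_HCALL is real; KVM_EXIT_HYPERV_HCALL2 is hypothetical. */
#define KVM_EXIT_HYPERV_HCALL   2
#define KVM_EXIT_HYPERV_HCALL2  5	/* hypothetical */

/* No CPUID lookup or fragile struct-size probing: the exit type alone
 * tells userspace whether the xmm fields carry valid data. */
bool xmm_data_valid(int exit_type)
{
	return exit_type == KVM_EXIT_HYPERV_HCALL2;
}
```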
Nicolas Saenz Julienne Aug. 5, 2024, 2:08 p.m. UTC | #4
On Mon Jul 29, 2024 at 1:53 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Hi Vitaly,
> > Thanks for having a look at this.
> >
> > On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
> >> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
> >>
> >> > Prepare infrastructure to be able to return data through the XMM
> >> > registers when Hyper-V hypercalls are issues in fast mode. The XMM
> >> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> >> > restored on successful hypercall completion.
> >> >
> >> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >> >
> >> > ---
> >> >
> >> > There was some discussion in the RFC about whether growing 'struct
> >> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> >> > - There is padding in 'struct kvm_run' that ensures that a bigger
> >> >   'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> >> > - Adding a new field at the bottom of the 'hcall' field within the
> >> >   'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
> >> >   the offsets within that struct either.
> >> > - Ultimately, previous updates to 'struct kvm_hyperv_exit's hint that
> >> >   its size isn't part of the uABI. It already grew when syndbg was
> >> >   introduced.
> >>
> >> Yes but SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't see
> >> any immediate issues with the current approach, we may want to introduce
> >> something like KVM_EXIT_HYPERV_HCALL_XMM: the userspace must be prepared
> >> to handle this new information anyway and it is better to make
> >> unprepared userspace fail with 'unknown exit' then to mishandle a
> >> hypercall by ignoring XMM portion of the data.
> >
> > OK, I'll go that way. Just wanted to get a better understanding of why
> > you felt it was necessary.
> >
>
> (sorry for delayed reply, I was on vacation)
>
> I don't think it's an absolute must but it appears as a cleaner approach
> to me.
>
> Imagine there's some userspace which handles KVM_EXIT_HYPERV_HCALL today
> and we want to add XMM handling there. How would we know if xmm portion
> of the data is actually filled by KVM or not? With your patch, we can of
> course check for HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE in
> KVM_GET_SUPPORTED_HV_CPUID but this is not really straightforward, is
> it? Checking the size is not good either. E.g. think about downstream
> versions of KVM which may or may not have certain backports. In case we
> (theoretically) do several additions to 'struct kvm_hyperv_exit', it
> will quickly become a nightmare.
>
> On the contrary, KVM_EXIT_HYPERV_HCALL_XMM (or just
> KVM_EXIT_HYPERV_HCALL2) approach looks cleaner: once userspace sees it,
> it knows that 'xmm' portion of the data can be relied upon.

Makes sense, thanks for the explanation.

Nicolas

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a71d91978d9ef..17893b330b76f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8893,3 +8893,22 @@  Ordering of KVM_GET_*/KVM_SET_* ioctls
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 TBD
+
+10. Hyper-V CPUIDs
+==================
+
+This section only applies to x86.
+
+New Hyper-V feature support is no longer being tracked through KVM
+capabilities.  Userspace can check if a particular version of KVM supports a
+feature using KVM_GET_SUPPORTED_HV_CPUID.  This section documents how Hyper-V
+CPUIDs map to KVM functionality.
+
+10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
+------------------------------------------
+
+:Location: CPUID.40000003H:EDX[bit 15]
+
+This CPUID indicates that KVM supports returning data to the guest in response
+to a hypercall using the XMM registers. It also extends ``struct
+kvm_hyperv_exit`` to allow passing the XMM data from userspace.
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3787d26810c1c..6a18c9f77d5fe 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -49,7 +49,7 @@ 
 /* Support for physical CPU dynamic partitioning events is available*/
 #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE	BIT(3)
 /*
- * Support for passing hypercall input parameter block via XMM
+ * Support for passing hypercall input and output parameter block via XMM
  * registers is available
  */
 #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE		BIT(4)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab7..42f44546fe79c 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1865,6 +1865,7 @@  struct kvm_hv_hcall {
 	u16 rep_idx;
 	bool fast;
 	bool rep;
+	bool xmm_dirty;
 	sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
 
 	/*
@@ -2396,9 +2397,49 @@  static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
 	return ret;
 }
 
+static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
+{
+	int reg;
+
+	kvm_fpu_get();
+	for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
+		const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
+		_kvm_write_sse_reg(reg, &data);
+	}
+	kvm_fpu_put();
+}
+
+static bool kvm_hv_is_xmm_output_hcall(u16 code)
+{
+	return false;
+}
+
+static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+	return !hv_vcpu->enforce_cpuid ||
+	       hv_vcpu->cpuid_cache.features_edx &
+		       HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
+}
+
 static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
 {
-	return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
+	bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
+	u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
+	u64 result = vcpu->run->hyperv.u.hcall.result;
+
+	if (hv_result_success(result) && fast &&
+	    kvm_hv_is_xmm_output_hcall(code)) {
+		if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+			kvm_queue_exception(vcpu, UD_VECTOR);
+			return 1;
+		}
+
+		kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
+	}
+
+	return kvm_hv_hypercall_complete(vcpu, result);
 }
 
 static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -2553,6 +2594,7 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
 	hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
 	hc.rep = !!(hc.rep_cnt || hc.rep_idx);
+	hc.xmm_dirty = false;
 
 	trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
 			       hc.rep_idx, hc.ingpa, hc.outgpa);
@@ -2673,6 +2715,15 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 		break;
 	}
 
+	if (hv_result_success(ret) && hc.xmm_dirty) {
+		if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+			kvm_queue_exception(vcpu, UD_VECTOR);
+			return 1;
+		}
+
+		kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
+	}
+
 hypercall_complete:
 	return kvm_hv_hypercall_complete(vcpu, ret);
 
@@ -2682,6 +2733,8 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	vcpu->run->hyperv.u.hcall.input = hc.param;
 	vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
 	vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+	if (hc.fast)
+		memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
 	vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
 	return 0;
 }
@@ -2830,6 +2883,7 @@  int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 			ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
 
 			ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
+			ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
 			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
 			ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d03842abae578..fbdee8d754595 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -90,6 +90,11 @@  struct kvm_pit_config {
 
 #define KVM_PIT_SPEAKER_DUMMY     1
 
+struct kvm_hyperv_xmm_reg {
+	__u64 low;
+	__u64 high;
+};
+
 struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
 #define KVM_EXIT_HYPERV_HCALL          2
@@ -108,6 +113,7 @@  struct kvm_hyperv_exit {
 			__u64 input;
 			__u64 result;
 			__u64 params[2];
+			struct kvm_hyperv_xmm_reg xmm[6];
 		} hcall;
 		struct {
 			__u32 msr;