[v2] x86/mem_sharing: support forks with active vPMU state

Message ID	a8a66208064c209e65c08380c59bc6aeff5f57f8.1658340502.git.tamas.lengyel@intel.com (mailing list archive)
State	New, archived
Headers
    Return-Path: <xen-devel-bounces@lists.xenproject.org>
    Errors-To: xen-devel-bounces@lists.xenproject.org
    Precedence: list
    Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
    From: Tamas K Lengyel <tamas.lengyel@intel.com>
    To: xen-devel@lists.xenproject.org
    Cc: Tamas K Lengyel <tamas.lengyel@intel.com>,
        Jan Beulich <jbeulich@suse.com>,
        Andrew Cooper <andrew.cooper3@citrix.com>,
        Roger Pau Monné <roger.pau@citrix.com>,
        Wei Liu <wl@xen.org>,
        Jun Nakajima <jun.nakajima@intel.com>,
        Kevin Tian <kevin.tian@intel.com>,
        Tamas K Lengyel <tamas@tklengyel.com>,
        George Dunlap <george.dunlap@citrix.com>
    Subject: [PATCH v2] x86/mem_sharing: support forks with active vPMU state
    Date: Wed, 20 Jul 2022 14:47:29 -0400
    Message-Id: <a8a66208064c209e65c08380c59bc6aeff5f57f8.1658340502.git.tamas.lengyel@intel.com>
    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit

Tamas K Lengyel July 20, 2022, 6:47 p.m. UTC

Currently the vPMU state from a parent isn't copied to VM forks. To enable the
vPMU state to be copied to a fork VM we export certain vPMU functions. First,
the vPMU context needs to be allocated for the fork if the parent has one. For
this we introduce vpmu->allocate_context; until now the allocation has only
happened when the guest enables the PMU on itself. Furthermore, we export
vpmu_save_force so that the PMU context can be saved on demand even if no
context switch has taken place on the parent's CPU yet. Additionally, we make sure
all relevant configuration MSRs are saved in the vPMU context so the copy is
complete and the fork starts with the same PMU config as the parent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
v2: make some things conditional on CONFIG_MEM_SHARING
    add stub function on AMD to vpmu_ops to simplify allocate context calls
---
 xen/arch/x86/cpu/vpmu.c         | 14 ++++++++++-
 xen/arch/x86/cpu/vpmu_amd.c     | 12 +++++++++
 xen/arch/x86/cpu/vpmu_intel.c   | 32 ++++++++++++++++++++----
 xen/arch/x86/include/asm/vpmu.h | 17 +++++++++++++
 xen/arch/x86/mm/mem_sharing.c   | 44 +++++++++++++++++++++++++++++++++
 5 files changed, 113 insertions(+), 6 deletions(-)

Jan Beulich July 21, 2022, 10:19 a.m. UTC | #1

On 20.07.2022 20:47, Tamas K Lengyel wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1653,6 +1653,46 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
>      hvm_set_nonreg_state(cd_vcpu, &nrs);
>  }
>  
> +static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> +{
> +    struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> +    struct vpmu_struct *cd_vpmu = vcpu_vpmu(cd_vcpu);

I would hope two of the four pointers could actually be constified.

> +    if ( !vpmu_are_all_set(d_vpmu, VPMU_INITIALIZED | VPMU_CONTEXT_ALLOCATED) )
> +        return 0;
> +    if ( vpmu_allocate_context(cd_vcpu) )
> +        return -ENOMEM;

The function supplies an error code - please use it rather than
assuming it's always going to be -ENOMEM. Alternatively make the
function return bool. (Ideally the hook functions themselves would
be well-formed in this regard, but I realize that the Intel one is
pre-existing in its present undesirable shape.)

> +    /*
> +     * The VPMU subsystem only saves the context when the CPU does a context
> +     * switch. Otherwise, the relevant MSRs are not saved on vmexit.
> +     * We force a save here in case the parent CPU context is still loaded.
> +     */
> +    if ( vpmu_is_set(d_vpmu, VPMU_CONTEXT_LOADED) )
> +    {
> +        int pcpu = smp_processor_id();

unsigned int please.

> +        if ( d_vpmu->last_pcpu != pcpu )
> +        {
> +            on_selected_cpus(cpumask_of(d_vpmu->last_pcpu),
> +                             vpmu_save_force, (void *)d_vcpu, 1);

No need for the cast afaict.

> +            vpmu_reset(d_vpmu, VPMU_CONTEXT_LOADED);
> +        } else

Nit: Style.

> +            vpmu_save(d_vcpu);
> +    }
> +
> +    if ( vpmu_is_set(d_vpmu, VPMU_RUNNING) )
> +        vpmu_set(cd_vpmu, VPMU_RUNNING);
> +
> +    /* Make sure context gets (re-)loaded when scheduled next */
> +    vpmu_reset(cd_vpmu, VPMU_CONTEXT_LOADED);
> +
> +    memcpy(cd_vpmu->context, d_vpmu->context, d_vpmu->context_size);
> +    memcpy(cd_vpmu->priv_context, d_vpmu->priv_context, d_vpmu->priv_context_size);

Nit: Long line.

Jan
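
The review point about using the supplied error code can be sketched outside of Xen as follows. All names here (demo_vpmu, demo_allocate_context, demo_copy_vpmu) are hypothetical stand-ins rather than the functions in the patch; the only assumption carried over is that the allocation hook returns 0 on success or a negative errno value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a vPMU: only the fields this sketch needs. */
struct demo_vpmu {
    void *context;
    size_t context_size;
};

/* Hypothetical allocation hook: returns 0 on success or a negative errno. */
static int demo_allocate_context(struct demo_vpmu *vpmu, size_t size)
{
    if ( size == 0 )
        return -EINVAL;

    vpmu->context = calloc(1, size);
    if ( !vpmu->context )
        return -ENOMEM;

    vpmu->context_size = size;
    return 0;
}

/* Pass on whatever the hook reports instead of assuming -ENOMEM. */
static int demo_copy_vpmu(const struct demo_vpmu *parent,
                          struct demo_vpmu *child)
{
    int rc = demo_allocate_context(child, parent->context_size);

    if ( rc )
        return rc;

    memcpy(child->context, parent->context, parent->context_size);
    return 0;
}
```

With this shape the caller sees -EINVAL for a zero-sized context and -ENOMEM only for a genuine allocation failure, so the two cases stay distinguishable.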

Andrew Cooper July 21, 2022, 12:03 p.m. UTC | #2

On 20/07/2022 19:47, Tamas K Lengyel wrote:
> diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
> index 9bacc02ec1..4c76e24551 100644
> --- a/xen/arch/x86/cpu/vpmu_amd.c
> +++ b/xen/arch/x86/cpu/vpmu_amd.c
> @@ -518,6 +518,14 @@ static int cf_check svm_vpmu_initialise(struct vcpu *v)
>      return 0;
>  }
>  
> +#ifdef CONFIG_MEM_SHARING
> +static int cf_check amd_allocate_context(struct vcpu *v)
> +{
> +    ASSERT_UNREACHABLE();

What makes this unreachable?

I know none of this is tested on AMD, but it is in principle reachable I
think.

I'd just leave this as return 0.  It will be slightly less rude to
whomever adds forking support on AMD.

> +    return 0;
> +}
> +#endif
> +
>  static const struct arch_vpmu_ops __initconst_cf_clobber amd_vpmu_ops = {
>      .initialise = svm_vpmu_initialise,
>      .do_wrmsr = amd_vpmu_do_wrmsr,
> @@ -527,6 +535,10 @@ static const struct arch_vpmu_ops __initconst_cf_clobber amd_vpmu_ops = {
>      .arch_vpmu_save = amd_vpmu_save,
>      .arch_vpmu_load = amd_vpmu_load,
>      .arch_vpmu_dump = amd_vpmu_dump,
> +
> +#ifdef CONFIG_MEM_SHARING
> +    .allocate_context = amd_allocate_context

Trailing comma please, and in the Intel structure.

> +#endif
>  };
>  
>  static const struct arch_vpmu_ops *__init common_init(void)
> diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
> index 8612f46973..01d4296485 100644
> --- a/xen/arch/x86/cpu/vpmu_intel.c
> +++ b/xen/arch/x86/cpu/vpmu_intel.c
> @@ -282,10 +282,17 @@ static inline void __core2_vpmu_save(struct vcpu *v)
>      for ( i = 0; i < fixed_pmc_cnt; i++ )
>          rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
>      for ( i = 0; i < arch_pmc_cnt; i++ )
> +    {
>          rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
> +        rdmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
> +    }
>  
>      if ( !is_hvm_vcpu(v) )
>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
> +    /* Save MSR to private context to make it fork-friendly */
> +    else if ( mem_sharing_enabled(v->domain) )
> +        vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
> +                           &core2_vpmu_cxt->global_ctrl);

/sigh.  So we're also not using the VMCS perf controls either.

That wants fixing too, but isn't a task for now.

Everything else LGTM.

~Andrew

Tamas K Lengyel July 21, 2022, 12:31 p.m. UTC | #3

On Thu, Jul 21, 2022 at 6:19 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.07.2022 20:47, Tamas K Lengyel wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1653,6 +1653,46 @@ static void copy_vcpu_nonreg_state(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> >      hvm_set_nonreg_state(cd_vcpu, &nrs);
> >  }
> >
> > +static int copy_vpmu(struct vcpu *d_vcpu, struct vcpu *cd_vcpu)
> > +{
> > +    struct vpmu_struct *d_vpmu = vcpu_vpmu(d_vcpu);
> > +    struct vpmu_struct *cd_vpmu = vcpu_vpmu(cd_vcpu);
>
> I would hope two of the four pointers could actually be constified.

I don't think so, we do modify both the vpmu and vcpu state as-needed
on both the parent and the child.

> > +    if ( !vpmu_are_all_set(d_vpmu, VPMU_INITIALIZED | VPMU_CONTEXT_ALLOCATED) )
> > +        return 0;
> > +    if ( vpmu_allocate_context(cd_vcpu) )
> > +        return -ENOMEM;
>
> The function supplies an error code - please use it rather than
> assuming it's always going to be -ENOMEM. Alternatively make the
> function return bool. (Ideally the hook functions themselves would
> be well-formed in this regard, but I realize that the Intel one is
> pre-existing in its present undesirable shape.)

Sure.

> > +    /*
> > +     * The VPMU subsystem only saves the context when the CPU does a context
> > +     * switch. Otherwise, the relevant MSRs are not saved on vmexit.
> > +     * We force a save here in case the parent CPU context is still loaded.
> > +     */
> > +    if ( vpmu_is_set(d_vpmu, VPMU_CONTEXT_LOADED) )
> > +    {
> > +        int pcpu = smp_processor_id();
>
> unsigned int please.
>
> > +        if ( d_vpmu->last_pcpu != pcpu )
> > +        {
> > +            on_selected_cpus(cpumask_of(d_vpmu->last_pcpu),
> > +                             vpmu_save_force, (void *)d_vcpu, 1);
>
> No need for the cast afaict.
>
> > +            vpmu_reset(d_vpmu, VPMU_CONTEXT_LOADED);
> > +        } else
>
> Nit: Style.

Sure, these were all pretty much copy-pasted but will fix them.

> > +            vpmu_save(d_vcpu);
> > +    }
> > +
> > +    if ( vpmu_is_set(d_vpmu, VPMU_RUNNING) )
> > +        vpmu_set(cd_vpmu, VPMU_RUNNING);
> > +
> > +    /* Make sure context gets (re-)loaded when scheduled next */
> > +    vpmu_reset(cd_vpmu, VPMU_CONTEXT_LOADED);
> > +
> > +    memcpy(cd_vpmu->context, d_vpmu->context, d_vpmu->context_size);
> > +    memcpy(cd_vpmu->priv_context, d_vpmu->priv_context, d_vpmu->priv_context_size);
>
> Nit: Long line.

Ack.

Thanks,
Tamas

Tamas K Lengyel July 21, 2022, 12:35 p.m. UTC | #4

On Thu, Jul 21, 2022 at 8:03 AM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>
> On 20/07/2022 19:47, Tamas K Lengyel wrote:
> > diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
> > index 9bacc02ec1..4c76e24551 100644
> > --- a/xen/arch/x86/cpu/vpmu_amd.c
> > +++ b/xen/arch/x86/cpu/vpmu_amd.c
> > @@ -518,6 +518,14 @@ static int cf_check svm_vpmu_initialise(struct vcpu *v)
> >      return 0;
> >  }
> >
> > +#ifdef CONFIG_MEM_SHARING
> > +static int cf_check amd_allocate_context(struct vcpu *v)
> > +{
> > +    ASSERT_UNREACHABLE();
>
> What makes this unreachable?
>
> I know none of this is tested on AMD, but it is in principle reachable I
> think.
>
> I'd just leave this as return 0.  It will be slightly less rude to
> whomever adds forking support on AMD.

The only caller is the vm fork route and vm forks are explicitly only
available on Intel (see mem_sharing_control). So this is unreachable
and IMHO should be noted as such.

>
> > +    return 0;
> > +}
> > +#endif
> > +
> >  static const struct arch_vpmu_ops __initconst_cf_clobber amd_vpmu_ops = {
> >      .initialise = svm_vpmu_initialise,
> >      .do_wrmsr = amd_vpmu_do_wrmsr,
> > @@ -527,6 +535,10 @@ static const struct arch_vpmu_ops __initconst_cf_clobber amd_vpmu_ops = {
> >      .arch_vpmu_save = amd_vpmu_save,
> >      .arch_vpmu_load = amd_vpmu_load,
> >      .arch_vpmu_dump = amd_vpmu_dump,
> > +
> > +#ifdef CONFIG_MEM_SHARING
> > +    .allocate_context = amd_allocate_context
>
> Trailing comma please, and in the Intel structure.

Ack

> > +#endif
> >  };
> >
> >  static const struct arch_vpmu_ops *__init common_init(void)
> > diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
> > index 8612f46973..01d4296485 100644
> > --- a/xen/arch/x86/cpu/vpmu_intel.c
> > +++ b/xen/arch/x86/cpu/vpmu_intel.c
> > @@ -282,10 +282,17 @@ static inline void __core2_vpmu_save(struct vcpu *v)
> >      for ( i = 0; i < fixed_pmc_cnt; i++ )
> >          rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
> >      for ( i = 0; i < arch_pmc_cnt; i++ )
> > +    {
> >          rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
> > +        rdmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
> > +    }
> >
> >      if ( !is_hvm_vcpu(v) )
> >          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
> > +    /* Save MSR to private context to make it fork-friendly */
> > +    else if ( mem_sharing_enabled(v->domain) )
> > +        vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
> > +                           &core2_vpmu_cxt->global_ctrl);
>
> /sigh.  So we're also not using the VMCS perf controls either.
>
> That wants fixing too, but isn't a task for now.

It does get saved and swapped on vmexit but we don't want to do this
vmx_read/vmx_write in the mem_sharing codebase. It's much cleaner if
this is saved into the vpmu context structure and reloaded from there,
so we can just do a memcpy in mem_sharing without having to know the
details.

> Everything else LGTM.

Cheers!
Tamas
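
The design argued for in this reply (save everything into the context structure so that the fork path is a plain memcpy) can be illustrated with a small standalone sketch. The types and functions below are hypothetical and simplified; they are not the Xen structures, and the counter count of 4 is an arbitrary assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context: everything a fork needs lives in one flat struct. */
struct demo_pmu_context {
    uint64_t global_ctrl;
    uint64_t counters[4];
};

/* Hypothetical stand-in for state that normally lives outside the context. */
struct demo_hw_state {
    uint64_t global_ctrl;
    uint64_t counters[4];
};

/* Save step: pull every value into the context, including global_ctrl. */
static void demo_save(struct demo_pmu_context *ctx,
                      const struct demo_hw_state *hw)
{
    ctx->global_ctrl = hw->global_ctrl;
    memcpy(ctx->counters, hw->counters, sizeof(ctx->counters));
}

/* Fork step: no knowledge of where the values came from is needed. */
static void demo_fork_copy(struct demo_pmu_context *child,
                           const struct demo_pmu_context *parent)
{
    memcpy(child, parent, sizeof(*child));
}
```

The trade-off is that the save path has to know about every register, while the fork path stays free of hardware-specific accessors.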

Andrew Cooper July 22, 2022, 10:54 a.m. UTC | #5

On 21/07/2022 13:35, Tamas K Lengyel wrote:
> On Thu, Jul 21, 2022 at 8:03 AM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>> On 20/07/2022 19:47, Tamas K Lengyel wrote:
>>> +#endif
>>>  };
>>>
>>>  static const struct arch_vpmu_ops *__init common_init(void)
>>> diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
>>> index 8612f46973..01d4296485 100644
>>> --- a/xen/arch/x86/cpu/vpmu_intel.c
>>> +++ b/xen/arch/x86/cpu/vpmu_intel.c
>>> @@ -282,10 +282,17 @@ static inline void __core2_vpmu_save(struct vcpu *v)
>>>      for ( i = 0; i < fixed_pmc_cnt; i++ )
>>>          rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
>>>      for ( i = 0; i < arch_pmc_cnt; i++ )
>>> +    {
>>>          rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
>>> +        rdmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
>>> +    }
>>>
>>>      if ( !is_hvm_vcpu(v) )
>>>          rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
>>> +    /* Save MSR to private context to make it fork-friendly */
>>> +    else if ( mem_sharing_enabled(v->domain) )
>>> +        vmx_read_guest_msr(v, MSR_CORE_PERF_GLOBAL_CTRL,
>>> +                           &core2_vpmu_cxt->global_ctrl);
>> /sigh.  So we're also not using the VMCS perf controls either.
>>
>> That wants fixing too, but isn't a task for now.
> It does get saved and swapped on vmexit but we don't want to do this
> vmx_read/vmx_write in the mem_sharing codebase. It's much cleaner if
> this is saved into the vpmu context structure and reloaded from there,
> so we can just do a memcpy in mem_sharing without having to know the
> details.

This is specifically why I introduced the {pv,hvm}_{get,set}_reg()
interfaces.

Lots of callers want to operate on a specific register, without wanting
to know if it's live in an MSR, or in the VMCB, VMCS, MSR load/save
list, or in a random structure in memory.

This is a perfect example that wants converting.  One patch to move
MSR_CORE_PERF_GLOBAL_CTRL into the get/set reg infrastructure (no
practical change), and then a second patch to make the VT-x
implementation conditional between the MSR load/save lists and the VMCS
host/guest controls depending on hardware support.

~Andrew
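
The idea behind a get/set register interface, as described above, is that callers name a register and the accessor decides where the value lives. The sketch below is a simplified illustration under that assumption only; demo_vcpu, demo_get_reg and demo_set_reg are hypothetical and do not reflect the actual {pv,hvm}_{get,set}_reg() signatures in Xen.

```c
#include <stdint.h>

/* Hypothetical register identifiers. */
enum demo_reg { DEMO_REG_GLOBAL_CTRL };

/* Hypothetical vCPU with two possible backing stores for one register. */
struct demo_vcpu {
    int has_vmcs_ctrl;       /* nonzero: hardware offers a dedicated field */
    uint64_t vmcs_field;     /* stand-in for a VMCS guest control field */
    uint64_t msr_list_slot;  /* stand-in for an MSR load/save list entry */
};

/* Callers ask for a register; the location is an implementation detail. */
static uint64_t demo_get_reg(const struct demo_vcpu *v, enum demo_reg reg)
{
    switch ( reg )
    {
    case DEMO_REG_GLOBAL_CTRL:
        return v->has_vmcs_ctrl ? v->vmcs_field : v->msr_list_slot;
    }

    return 0;
}

static void demo_set_reg(struct demo_vcpu *v, enum demo_reg reg, uint64_t val)
{
    switch ( reg )
    {
    case DEMO_REG_GLOBAL_CTRL:
        if ( v->has_vmcs_ctrl )
            v->vmcs_field = val;
        else
            v->msr_list_slot = val;
        break;
    }
}
```

A fork path written against such accessors would read the parent's value and write the child's without caring which backing store either vCPU uses.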
