diff mbox series

[4/6] nestedsvm: Disable TscRateMSR

Message ID 20240206012051.3564035-5-george.dunlap@cloud.com (mailing list archive)
State Superseded
Headers show
Series AMD Nested Virt Preparation | expand

Commit Message

George Dunlap Feb. 6, 2024, 1:20 a.m. UTC
The primary purpose of TSC scaling, from our perspective, is to
maintain the fiction of an "invariant TSC" across migrates between
platforms with different clock speeds.
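To illustrate the idea (this is a sketch, not code from the patch; the helper name is hypothetical): AMD's TscRateMSR holds an 8.32 fixed-point ratio, so the ratio needed to preserve a migrated guest's apparent TSC frequency is just the quotient of the two frequencies in that format.

```c
#include <stdint.h>

/* Hypothetical helper: the 8.32 fixed-point ratio that makes a guest
 * whose TSC ran at gtsc_khz on the source host see the same apparent
 * frequency on a destination host running at htsc_khz.
 * (1ULL << 32) is the 1:1 default ratio. */
static uint64_t tsc_ratio(uint32_t gtsc_khz, uint32_t htsc_khz)
{
    return ((uint64_t)gtsc_khz << 32) / htsc_khz;
}
```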

On AMD, the TscRateMSR CPUID bit is unconditionally enabled in the
"host cpuid", even if the hardware doesn't actually support it.
According to c/s fd14a1943c4 ("nestedsvm: Support TSC Rate MSR"),
testing showed that emulating TSC scaling in an L1 was more expensive
than emulating TSC scaling on an L0 (due to extra sets of vmexit /
vmenter).

However, the current implementation seems to be broken.

First of all, the final L2 scaling ratio should be a composition of
the L0 scaling ratio and the L1 scaling ratio; there's no indication
this is being done anywhere.
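For reference, the missing composition step could look something like the sketch below (a hypothetical helper, not part of this patch). Since both ratios are 8.32 fixed point, composing them is a fixed-point multiply; the 32-bit decomposition mirrors the one the removed scale_tsc() uses to avoid 128-bit arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: effective L2 scaling ratio as the 8.32
 * fixed-point product of the L0 and L1 ratios.  Split l1_ratio into
 * integer and fractional halves so every multiply fits in 64 bits. */
static uint64_t compose_tsc_ratio(uint64_t l0_ratio, uint64_t l1_ratio)
{
    uint64_t mult = l1_ratio >> 32;            /* integer part */
    uint64_t frac = l1_ratio & 0xffffffffULL;  /* fractional part */

    return l0_ratio * mult
         + (l0_ratio >> 32) * frac
         + (((l0_ratio & 0xffffffffULL) * frac) >> 32);
}
```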

Secondly, it's not clear that the L1 tsc scaling ratio actually
affects the L0 tsc scaling ratio.  The stored value (ns_tscratio) is
used to affect the tsc *offset*, but doesn't seem to actually be
factored into d->hvm.tsc_scaling_ratio.  (Which shouldn't be
per-domain anyway, but per-vcpu.)  Having the *offset* scaled
according to the nested scaling ratio, without the RDTSC result itself
also being scaled, is bound to produce inconsistent results.
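For illustration: with scaling active, hardware computes the guest-visible TSC by scaling the host TSC first and adding the offset afterwards. A sketch using the fixed-point multiply from the removed scale_tsc() shows why adjusting only the offset cannot reproduce that (guest_rdtsc() is a hypothetical name for what the hardware does, per the AMD manuals):

```c
#include <stdint.h>

/* The 8.32 fixed-point multiply from the removed scale_tsc(). */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t mult = ratio >> 32;
    uint64_t frac = ratio & 0xffffffffULL;

    return host_tsc * mult
         + (host_tsc >> 32) * frac
         + (((host_tsc & 0xffffffffULL) * frac) >> 32);
}

/* What hardware returns for a guest RDTSC with scaling active: the
 * host TSC is scaled, then the offset is added.  Scaling the offset
 * while leaving host_tsc unscaled gives a different result. */
static uint64_t guest_rdtsc(uint64_t host_tsc, uint64_t ratio,
                            uint64_t offset)
{
    return scale_tsc(host_tsc, ratio) + offset;
}
```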

For now, just disable the functionality entirely until we can
implement it properly:

- Don't set TSCRATEMSR in the host CPUID policy

- Remove MSR_AMD64_TSC_RATIO emulation handling, so that the guest
  gets a #GP if it tries to access the MSR (as it should when
  TSCRATEMSR is clear)

- Remove ns_tscratio from struct nestedhvm, and all code that touches
  it

Unfortunately this means ripping out the scaling calculation stuff as
well, since it's only used in the nested case; it's there in the git
tree if we need it for reference when we re-introduce it.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpu-policy.c                    |  3 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |  2 -
 xen/arch/x86/hvm/svm/svm.c                   | 57 --------------------
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |  5 --
 4 files changed, 1 insertion(+), 66 deletions(-)

Comments

Jan Beulich Feb. 19, 2024, 3:22 p.m. UTC | #1
On 06.02.2024 02:20, George Dunlap wrote:
> For now, just disable the functionality entirely until we can
> implement it properly:
> 
> - Don't set TSCRATEMSR in the host CPUID policy

This goes too far: This way you would (in principle) also affect guests
with nesting disabled. According to the earlier parts of the description
there's also no issue with it in that case. What you want to make sure
of is that in the HVM policy the bit isn't set.

While presently resolving to cpu_has_svm_feature(), I think
cpu_has_tsc_ratio really ought to resolve to the host policy field.
Of course then requiring the host policy to reflect reality rather than
having what is "always emulated". IOW ...

> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
>                                 (1u << SVM_FEATURE_PAUSEFILTER) |
>                                 (1u << SVM_FEATURE_DECODEASSISTS));
>          /* Enable features which are always emulated. */
> -        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
> -                               (1u << SVM_FEATURE_TSCRATEMSR));
> +        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);

... this likely wants replacing altogether by not overriding what we
found in hardware, which would apparently mean moving the two bit
masks to the earlier "clamping" expression.

But then of course Andrew may know of reasons why all of this is done
in calculate_host_policy() in the first place, rather than in HVM
policy calculation.

Jan
George Dunlap Feb. 21, 2024, 8:48 a.m. UTC | #2
On Mon, Feb 19, 2024 at 11:22 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.02.2024 02:20, George Dunlap wrote:
> > For now, just disable the functionality entirely until we can
> > implement it properly:
> >
> > - Don't set TSCRATEMSR in the host CPUID policy
>
> This goes too far: This way you would (in principle) also affect guests
> with nesting disabled. According to the earlier parts of the description
> there's also no issue with it in that case. What you want to make sure
> of is that in the HVM policy the bit isn't set.
>
> While presently resolving to cpu_has_svm_feature(), I think
> cpu_has_tsc_ratio really ought to resolve to the host policy field.
> Of course then requiring the host policy to reflect reality rather than
> having what is "always emulated". IOW ...
>
> > --- a/xen/arch/x86/cpu-policy.c
> > +++ b/xen/arch/x86/cpu-policy.c
> > @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
> >                                 (1u << SVM_FEATURE_PAUSEFILTER) |
> >                                 (1u << SVM_FEATURE_DECODEASSISTS));
> >          /* Enable features which are always emulated. */
> > -        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
> > -                               (1u << SVM_FEATURE_TSCRATEMSR));
> > +        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
>
> ... this likely wants replacing altogether by not overriding what we
> found in hardware, which would apparently mean moving the two bit
> masks to the earlier "clamping" expression.
>
> But then of course Andrew may know of reasons why all of this is done
> in calculate_host_policy() in the first place, rather than in HVM
> policy calculation.

It sounds like maybe you're confusing host_policy with
x86_capabilities?  From what I can tell:

*  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
resolves to cpu_has(&boot_cpu_data, ...), which is completely
independent of the cpu-policy.c:host_cpu_policy

* cpu-policy.c:host_cpu_policy only affects what is advertised to
guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
quick skim doesn't show any mechanism by which host_cpu_policy could
affect what features Xen itself decides to use.

Not sure exactly why the nested virt stuff is done at the
host_cpu_policy level rather than the hvm_cpu_policy level, but since
that's where it is, that's where we need to change it.

FWIW, as I said in response to your comment on 2/6, it would be nicer
if we moved svm_feature_flags into the "capabilities" section; but
that's a different set of work.

 -George
Jan Beulich Feb. 21, 2024, 10:52 a.m. UTC | #3
On 21.02.2024 09:48, George Dunlap wrote:
> On Mon, Feb 19, 2024 at 11:22 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.02.2024 02:20, George Dunlap wrote:
>>> For now, just disable the functionality entirely until we can
>>> implement it properly:
>>>
>>> - Don't set TSCRATEMSR in the host CPUID policy
>>
>> This goes too far: This way you would (in principle) also affect guests
>> with nesting disabled. According to the earlier parts of the description
>> there's also no issue with it in that case. What you want to make sure
>> of is that in the HVM policy the bit isn't set.
>>
>> While presently resolving to cpu_has_svm_feature(), I think
>> cpu_has_tsc_ratio really ought to resolve to the host policy field.
>> Of course then requiring the host policy to reflect reality rather than
>> having what is "always emulated". IOW ...
>>
>>> --- a/xen/arch/x86/cpu-policy.c
>>> +++ b/xen/arch/x86/cpu-policy.c
>>> @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
>>>                                 (1u << SVM_FEATURE_PAUSEFILTER) |
>>>                                 (1u << SVM_FEATURE_DECODEASSISTS));
>>>          /* Enable features which are always emulated. */
>>> -        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
>>> -                               (1u << SVM_FEATURE_TSCRATEMSR));
>>> +        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
>>
>> ... this likely wants replacing altogether by not overriding what we
>> found in hardware, which would apparently mean moving the two bit
>> masks to the earlier "clamping" expression.
>>
>> But then of course Andrew may know of reasons why all of this is done
>> in calculate_host_policy() in the first place, rather than in HVM
>> policy calculation.
> 
> It sounds like maybe you're confusing host_policy with
> x86_capabilities?  From what I can tell:
> 
> *  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> resolves to cpu_has(&boot_cpu_data, ...), which is completely
> independent of the cpu-policy.c:host_cpu_policy
> 
> * cpu-policy.c:host_cpu_policy only affects what is advertised to
> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
> quick skim doesn't show any mechanism by which host_cpu_policy could
> affect what features Xen itself decides to use.

I'm not mixing the two, no; the two are still insufficiently disentangled.
There's really no reason (long term) to have both host policy and
x86_capabilities. Therefore I'd prefer if new code (including a basically
fundamental re-write as is going to be needed for nested) to avoid
needlessly further extending x86_capabilities. Unless of course there's
something fundamentally wrong with eliminating the redundancy, which
likely Andrew would be in the best position to point out.

> Not sure exactly why the nested virt stuff is done at the
> host_cpu_policy level rather than the hvm_cpu_policy level, but since
> that's where it is, that's where we need to change it.
> 
> FWIW, as I said in response to your comment on 2/6, it would be nicer
> if we moved svm_feature_flags into the "capabilities" section; but
> that's a different set of work.

Can as well reply here then, rather than there: If the movement from
host to HVM policy was done first, the patch here could more sanely go
on top, and patch 2 could then also go on top, converting the separate
variable to host policy accesses, quite possibly introducing a similar
wrapper as you introduce there right now.

But no, I'm not meaning to make this a requirement; this would merely be
an imo better approach. My ack there stands, in case you want to keep
(and commit) the change as is.

Jan
George Dunlap Feb. 22, 2024, 9:30 a.m. UTC | #4
On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> But then of course Andrew may know of reasons why all of this is done
> >> in calculate_host_policy() in the first place, rather than in HVM
> >> policy calculation.
> >
> > It sounds like maybe you're confusing host_policy with
> > x86_capabilities?  From what I can tell:
> >
> > *  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> > resolves to cpu_has(&boot_cpu_data, ...), which is completely
> > independent of the cpu-policy.c:host_cpu_policy
> >
> > * cpu-policy.c:host_cpu_policy only affects what is advertised to
> > guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
> > quick skim doesn't show any mechanism by which host_cpu_policy could
> > affect what features Xen itself decides to use.
>
> I'm not mixing the two, no; the two are still insufficiently disentangled.
> There's really no reason (long term) to have both host policy and
> x86_capabilities. Therefore I'd prefer if new code (including a basically
> fundamental re-write as is going to be needed for nested) to avoid
> needlessly further extending x86_capabilities. Unless of course there's
> something fundamentally wrong with eliminating the redundancy, which
> likely Andrew would be in the best position to point out.

So I don't know the history of how things got to be the way they are,
nor really much about the code but what I've gathered from skimming
through while creating this patch series.  But from that impression,
the only issue I really see with the current code is the confusing
naming.  The cpufeature.h code has this nice infrastructure to allow
you to, for instance, enable or disable certain bits on the
command-line; and the interface for querying all the different bits of
functionality is all nicely put in one place.  Moving the
svm_feature_flags into x86_capabilities would immediately allow SVM to
take advantage of this infrastructure; it's not clear to me how this
would be "needless".

Furthermore, it looks to me like host_cpu_policy is used as a starting
point for generating pv_cpu_policy and hvm_cpu_policy, both of which
are only used for guest cpuid generation.  Given that the format of
those policies is fixed, and there's a lot of "copy this bit from the
host policy wholesale", it seems like no matter what, you'd want a
host_cpu_policy.

And in any case -- all that is kind of moot.  *Right now*,
host_cpu_policy is only used for guest cpuid policy creation; *right
now*, the nested virt features of AMD are handled in the
host_cpu_policy; *right now*, we're advertising to guests bits which
are not properly virtualized; *right now* these bits are actually set
unconditionally, regardless of whether they're even available on the
hardware; *right now*, Xen uses svm_feature_flags to determine its own
use of TscRateMSR; so *right now*, removing this bit from
host_cpu_policy won't prevent Xen from using TscRateMSR itself.

(Unless my understanding of the code is wrong, in which case I'd
appreciate a correction.)

If at some point in the future x86_capabilities and host_cpu_policy
were merged somehow, whoever did the merging would have to untangle
the twiddling of these bits anyway.  What I'm changing in this patch
wouldn't make that any harder.

> > Not sure exactly why the nested virt stuff is done at the
> > host_cpu_policy level rather than the hvm_cpu_policy level, but since
> > that's where it is, that's where we need to change it.
> >
> > FWIW, as I said in response to your comment on 2/6, it would be nicer
> > if we moved svm_feature_flags into the "capabilities" section; but
> > that's a different set of work.
>
> Can as well reply here then, rather than there: If the movement from
> host to HVM policy was done first, the patch here could more sanely go
> on top, and patch 2 could then also go on top, converting the separate
> variable to host policy accesses, quite possibly introducing a similar
> wrapper as you introduce there right now.
>
> But no, I'm not meaning to make this a requirement; this would merely be
> an imo better approach. My ack there stands, in case you want to keep
> (and commit) the change as is.

I mean, I don't mind doing a bit more prep work, if I know that's the
direction we want to go in.  "Actually, since you're doing a bit of
clean-up anyway -- right now host_cpu_policy is only used to calculate
guest policy, but we'd like to move over to being the Source of Truth
for the host instead of x86_capabilities.  While you're here, would
you mind moving the nested virt policy stuff into hvm_cpu_policy
instead?"

I certainly wouldn't want to move svm_feature_flags into
host_cpu_policy while it's still got random other "guest-only" policy
bits in it; and auditing it for such policy bits is out of the scope
of this work.

 -George
Jan Beulich Feb. 22, 2024, 9:50 a.m. UTC | #5
On 22.02.2024 10:30, George Dunlap wrote:
> On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> But then of course Andrew may know of reasons why all of this is done
>>>> in calculate_host_policy() in the first place, rather than in HVM
>>>> policy calculation.
>>>
>>> It sounds like maybe you're confusing host_policy with
>>> x86_capabilities?  From what I can tell:
>>>
>>> *  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
>>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
>>> independent of the cpu-policy.c:host_cpu_policy
>>>
>>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
>>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
>>> quick skim doesn't show any mechanism by which host_cpu_policy could
>>> affect what features Xen itself decides to use.
>>
>> I'm not mixing the two, no; the two are still insufficiently disentangled.
>> There's really no reason (long term) to have both host policy and
>> x86_capabilities. Therefore I'd prefer if new code (including a basically
>> fundamental re-write as is going to be needed for nested) to avoid
>> needlessly further extending x86_capabilities. Unless of course there's
>> something fundamentally wrong with eliminating the redundancy, which
>> likely Andrew would be in the best position to point out.
> 
> So I don't know the history of how things got to be the way they are,
> nor really much about the code but what I've gathered from skimming
> through while creating this patch series.  But from that impression,
> the only issue I really see with the current code is the confusing
> naming.  The cpufeature.h code has this nice infrastructure to allow
> you to, for instance, enable or disable certain bits on the
> command-line; and the interface for querying all the different bits of
> functionality is all nicely put in one place.  Moving the
> svm_feature_flags into x86_capabilities would immediately allow SVM to
> take advantage of this infrastructure; it's not clear to me how this
> would be "needless".
> 
> Furthermore, it looks to me like host_cpu_policy is used as a starting
> point for generating pv_cpu_policy and hvm_cpu_policy, both of which
> are only used for guest cpuid generation.  Given that the format of
> those policies is fixed, and there's a lot of "copy this bit from the
> host policy wholesale", it seems like no matter what, you'd want a
> host_cpu_policy.
> 
> And in any case -- all that is kind of moot.  *Right now*,
> host_cpu_policy is only used for guest cpuid policy creation; *right
> now*, the nested virt features of AMD are handled in the
> host_cpu_policy; *right now*, we're advertising to guests bits which
> are not properly virtualized; *right now* these bits are actually set
> unconditionally, regardless of whether they're even available on the
> hardware; *right now*, Xen uses svm_feature_flags to determine its own
> use of TscRateMSR; so *right now*, removing this bit from
> host_cpu_policy won't prevent Xen from using TscRateMSR itself.
> 
> (Unless my understanding of the code is wrong, in which case I'd
> appreciate a correction.)

There's nothing wrong afaics, just missing at least one aspect: Did you
see all the featureset <-> policy conversions in cpu-policy.c? That (to
me at least) clearly is a sign of unnecessary duplication of the same
data. This goes as far as seeding the host policy from the raw one, just
to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
a fair part of what was first taken from the raw policy. That's necessary
right now, because setup_{force,clear}_cpu_cap() act on
boot_cpu_data.x86_capability[], not the host policy.

As to the "needless" further up, it's only as far as moving those bits
into x86_capability[] would further duplicate information, rather than
(for that piece at least) putting them into the policies right away. But
yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
control those bits as well, then going the intermediate step would be
unavoidable at this point in time.

Jan
George Dunlap Feb. 22, 2024, 11 a.m. UTC | #6
On Thu, Feb 22, 2024 at 5:50 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 22.02.2024 10:30, George Dunlap wrote:
> > On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> But then of course Andrew may know of reasons why all of this is done
> >>>> in calculate_host_policy() in the first place, rather than in HVM
> >>>> policy calculation.
> >>>
> >>> It sounds like maybe you're confusing host_policy with
> >>> x86_capabilities?  From what I can tell:
> >>>
> >>> *  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> >>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
> >>> independent of the cpu-policy.c:host_cpu_policy
> >>>
> >>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
> >>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
> >>> quick skim doesn't show any mechanism by which host_cpu_policy could
> >>> affect what features Xen itself decides to use.
> >>
> >> I'm not mixing the two, no; the two are still insufficiently disentangled.
> >> There's really no reason (long term) to have both host policy and
> >> x86_capabilities. Therefore I'd prefer if new code (including a basically
> >> fundamental re-write as is going to be needed for nested) to avoid
> >> needlessly further extending x86_capabilities. Unless of course there's
> >> something fundamentally wrong with eliminating the redundancy, which
> >> likely Andrew would be in the best position to point out.
> >
> > So I don't know the history of how things got to be the way they are,
> > nor really much about the code but what I've gathered from skimming
> > through while creating this patch series.  But from that impression,
> > the only issue I really see with the current code is the confusing
> > naming.  The cpufeature.h code has this nice infrastructure to allow
> > you to, for instance, enable or disable certain bits on the
> > command-line; and the interface for querying all the different bits of
> > functionality is all nicely put in one place.  Moving the
> > svm_feature_flags into x86_capabilities would immediately allow SVM to
> > take advantage of this infrastructure; it's not clear to me how this
> > would be "needless".
> >
> > Furthermore, it looks to me like host_cpu_policy is used as a starting
> > point for generating pv_cpu_policy and hvm_cpu_policy, both of which
> > are only used for guest cpuid generation.  Given that the format of
> > those policies is fixed, and there's a lot of "copy this bit from the
> > host policy wholesale", it seems like no matter what, you'd want a
> > host_cpu_policy.
> >
> > And in any case -- all that is kind of moot.  *Right now*,
> > host_cpu_policy is only used for guest cpuid policy creation; *right
> > now*, the nested virt features of AMD are handled in the
> > host_cpu_policy; *right now*, we're advertising to guests bits which
> > are not properly virtualized; *right now* these bits are actually set
> > unconditionally, regardless of whether they're even available on the
> > hardware; *right now*, Xen uses svm_feature_flags to determine its own
> > use of TscRateMSR; so *right now*, removing this bit from
> > host_cpu_policy won't prevent Xen from using TscRateMSR itself.
> >
> > (Unless my understanding of the code is wrong, in which case I'd
> > appreciate a correction.)
>
> There's nothing wrong afaics, just missing at least one aspect: Did you
> see all the featureset <-> policy conversions in cpu-policy.c? That (to
> me at least) clearly is a sign of unnecessary duplication of the same
> data. This goes as far as seeding the host policy from the raw one, just
> to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
> a fair part of what was first taken from the raw policy. That's necessary
> right now, because setup_{force,clear}_cpu_cap() act on
> boot_cpu_data.x86_capability[], not the host policy.
>
> As to the "needless" further up, it's only as far as moving those bits
> into x86_capability[] would further duplicate information, rather than
> (for that piece at least) putting them into the policies right away. But
> yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
> control those bits as well, then going the intermediate step would be
> unavoidable at this point in time.

I'm still not sure of what needs to happen to move this forward.

As I said, I'm not opposed to doing some prep work; but I don't want
to randomly guess as to what kinds of clean-up needs to be done, only
to be told it was wrong (either by you when I post it or by Andy
sometime after it's checked in).

I could certainly move svm_feature_flags into host_cpu_policy, and
have cpu_svm_feature_* reference host_cpu_policy instead (after moving
the nested virt "guest policy" tweaks into hvm_cpu_policy); but as far
as I can tell, that would be the *very first* instance of Xen using
host_cpu_policy in that manner.  I'd like more clarity that this is
the long-term direction that things are going before then.

If you (plural) don't have time now to refresh your memory / make an
informed decision about what you want to happen, then please consider
just taking the patch as it is; it doesn't make future changes any
harder.

 -George
Jan Beulich Feb. 22, 2024, 11:26 a.m. UTC | #7
On 22.02.2024 12:00, George Dunlap wrote:
> On Thu, Feb 22, 2024 at 5:50 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 22.02.2024 10:30, George Dunlap wrote:
>>> On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> But then of course Andrew may know of reasons why all of this is done
>>>>>> in calculate_host_policy() in the first place, rather than in HVM
>>>>>> policy calculation.
>>>>>
>>>>> It sounds like maybe you're confusing host_policy with
>>>>> x86_capabilities?  From what I can tell:
>>>>>
>>>>> *  the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
>>>>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
>>>>> independent of the cpu-policy.c:host_cpu_policy
>>>>>
>>>>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
>>>>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably a
>>>>> quick skim doesn't show any mechanism by which host_cpu_policy could
>>>>> affect what features Xen itself decides to use.
>>>>
>>>> I'm not mixing the two, no; the two are still insufficiently disentangled.
>>>> There's really no reason (long term) to have both host policy and
>>>> x86_capabilities. Therefore I'd prefer if new code (including a basically
>>>> fundamental re-write as is going to be needed for nested) to avoid
>>>> needlessly further extending x86_capabilities. Unless of course there's
>>>> something fundamentally wrong with eliminating the redundancy, which
>>>> likely Andrew would be in the best position to point out.
>>>
>>> So I don't know the history of how things got to be the way they are,
>>> nor really much about the code but what I've gathered from skimming
>>> through while creating this patch series.  But from that impression,
>>> the only issue I really see with the current code is the confusing
>>> naming.  The cpufeature.h code has this nice infrastructure to allow
>>> you to, for instance, enable or disable certain bits on the
>>> command-line; and the interface for querying all the different bits of
>>> functionality is all nicely put in one place.  Moving the
>>> svm_feature_flags into x86_capabilities would immediately allow SVM to
>>> take advantage of this infrastructure; it's not clear to me how this
>>> would be "needless".
>>>
>>> Furthermore, it looks to me like host_cpu_policy is used as a starting
>>> point for generating pv_cpu_policy and hvm_cpu_policy, both of which
>>> are only used for guest cpuid generation.  Given that the format of
>>> those policies is fixed, and there's a lot of "copy this bit from the
>>> host policy wholesale", it seems like no matter what, you'd want a
>>> host_cpu_policy.
>>>
>>> And in any case -- all that is kind of moot.  *Right now*,
>>> host_cpu_policy is only used for guest cpuid policy creation; *right
>>> now*, the nested virt features of AMD are handled in the
>>> host_cpu_policy; *right now*, we're advertising to guests bits which
>>> are not properly virtualized; *right now* these bits are actually set
>>> unconditionally, regardless of whether they're even available on the
>>> hardware; *right now*, Xen uses svm_feature_flags to determine its own
>>> use of TscRateMSR; so *right now*, removing this bit from
>>> host_cpu_policy won't prevent Xen from using TscRateMSR itself.
>>>
>>> (Unless my understanding of the code is wrong, in which case I'd
>>> appreciate a correction.)
>>
>> There's nothing wrong afaics, just missing at least one aspect: Did you
>> see all the featureset <-> policy conversions in cpu-policy.c? That (to
>> me at least) clearly is a sign of unnecessary duplication of the same
>> data. This goes as far as seeding the host policy from the raw one, just
>> to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
>> a fair part of what was first taken from the raw policy. That's necessary
>> right now, because setup_{force,clear}_cpu_cap() act on
>> boot_cpu_data.x86_capability[], not the host policy.
>>
>> As to the "needless" further up, it's only as far as moving those bits
>> into x86_capability[] would further duplicate information, rather than
>> (for that piece at least) putting them into the policies right away. But
>> yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
>> control those bits as well, then going the intermediate step would be
>> unavoidable at this point in time.
> 
> I'm still not sure of what needs to happen to move this forward.
> 
> As I said, I'm not opposed to doing some prep work; but I don't want
> to randomly guess as to what kinds of clean-up needs to be done, only
> to be told it was wrong (either by you when I post it or by Andy
> sometime after it's checked in).
> 
> I could certainly move svm_feature_flags into host_cpu_policy, and
> have cpu_svm_feature_* reference host_cpu_policy instead (after moving
> the nested virt "guest policy" tweaks into hvm_cpu_policy); but as far
> as I can tell, that would be the *very first* instance of Xen using
> host_cpu_policy in that manner.  I'd like more clarity that this is
> the long-term direction that things are going before then.
> 
> If you (plural) don't have time now to refresh your memory / make an
> informed decision about what you want to happen, then please consider
> just taking the patch as it is; it doesn't make future changes any
> harder.

I'd be willing to ack the patch as is if I understood the piece of code
in calculate_host_policy() that is being changed here. My take is that
it wants adjusting in a prereq patch, by moving it wherever applicable.
Without indications to the contrary, it living there is plain wrong, and
the patch here touching that function isn't quite right either, even if
only as a result of the original issue.

If without at least a little bit of input from Andrew we can't move
forward, then that's the way it is - we need to wait. As you likely know
I have dozens of patches in that state ...

Jan
Patch

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 10079c26ae..d71abbc44a 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -407,8 +407,7 @@  static void __init calculate_host_policy(void)
                                (1u << SVM_FEATURE_PAUSEFILTER) |
                                (1u << SVM_FEATURE_DECODEASSISTS));
         /* Enable features which are always emulated. */
-        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                               (1u << SVM_FEATURE_TSCRATEMSR));
+        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
     }
 
     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ee9602f5c8..d02a59f184 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -146,8 +146,6 @@  int cf_check nsvm_vcpu_reset(struct vcpu *v)
     svm->ns_msr_hsavepa = INVALID_PADDR;
     svm->ns_ovvmcb_pa = INVALID_PADDR;
 
-    svm->ns_tscratio = DEFAULT_TSC_RATIO;
-
     svm->ns_cr_intercepts = 0;
     svm->ns_dr_intercepts = 0;
     svm->ns_exception_intercepts = 0;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b551eac807..34b9f603bc 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -777,43 +777,6 @@  static int cf_check svm_get_guest_pat(struct vcpu *v, u64 *gpat)
     return 1;
 }
 
-static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
-{
-    uint64_t mult, frac, scaled_host_tsc;
-
-    if ( ratio == DEFAULT_TSC_RATIO )
-        return host_tsc;
-
-    /*
-     * Suppose the most significant 32 bits of host_tsc and ratio are
-     * tsc_h and mult, and the least 32 bits of them are tsc_l and frac,
-     * then
-     *     host_tsc * ratio * 2^-32
-     *     = host_tsc * (mult * 2^32 + frac) * 2^-32
-     *     = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
-     *     = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
-     *
-     * Multiplications in the last two terms are between 32-bit integers,
-     * so both of them can fit in 64-bit integers.
-     *
-     * Because mult is usually less than 10 in practice, it's very rare
-     * that host_tsc * mult can overflow a 64-bit integer.
-     */
-    mult = ratio >> 32;
-    frac = ratio & ((1ULL << 32) - 1);
-    scaled_host_tsc  = host_tsc * mult;
-    scaled_host_tsc += (host_tsc >> 32) * frac;
-    scaled_host_tsc += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
-
-    return scaled_host_tsc;
-}
-
-static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
-    uint64_t ratio)
-{
-    return guest_tsc - scale_tsc(host_tsc, ratio);
-}
-
 static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
@@ -832,18 +795,8 @@  static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 
     if ( nestedhvm_vcpu_in_guestmode(v) )
     {
-        struct nestedsvm *svm = &vcpu_nestedsvm(v);
-
         n2_tsc_offset = vmcb_get_tsc_offset(n2vmcb) -
                         vmcb_get_tsc_offset(n1vmcb);
-        if ( svm->ns_tscratio != DEFAULT_TSC_RATIO )
-        {
-            uint64_t guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
-
-            n2_tsc_offset = svm_get_tsc_offset(guest_tsc,
-                                               guest_tsc + n2_tsc_offset,
-                                               svm->ns_tscratio);
-        }
         vmcb_set_tsc_offset(n1vmcb, offset);
     }
 
@@ -1921,10 +1874,6 @@  static int cf_check svm_msr_read_intercept(
         *msr_content = nsvm->ns_msr_hsavepa;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        *msr_content = nsvm->ns_tscratio;
-        break;
-
     case MSR_AMD_OSVW_ID_LENGTH:
     case MSR_AMD_OSVW_STATUS:
         if ( !d->arch.cpuid->extd.osvw )
@@ -2103,12 +2052,6 @@  static int cf_check svm_msr_write_intercept(
             goto gpf;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        if ( msr_content & TSC_RATIO_RSVD_BITS )
-            goto gpf;
-        nsvm->ns_tscratio = msr_content;
-        break;
-
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
     case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
         /*
diff --git a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
index 406fc082b1..45d658ad01 100644
--- a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
+++ b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
@@ -18,11 +18,6 @@  struct nestedsvm {
      */
     uint64_t ns_ovvmcb_pa;
 
-    /* virtual tscratio holding the value l1 guest writes to the
-     * MSR_AMD64_TSC_RATIO MSR.
-     */
-    uint64_t ns_tscratio;
-
     /* Cached real intercepts of the l2 guest */
     uint32_t ns_cr_intercepts;
     uint32_t ns_dr_intercepts;