Message ID | 20240206012051.3564035-5-george.dunlap@cloud.com |
---|---|
State | Superseded |
Series | AMD Nested Virt Preparation |
On 06.02.2024 02:20, George Dunlap wrote:
> For now, just disable the functionality entirely until we can
> implement it properly:
>
> - Don't set TSCRATEMSR in the host CPUID policy

This goes too far: this way you would (in principle) also affect guests
with nesting disabled. According to the earlier parts of the description
there's also no issue with it in that case. What you want to make sure
is that in the HVM policy the bit isn't set.

While presently resolving to cpu_has_svm_feature(), I think
cpu_has_tsc_ratio really ought to resolve to the host policy field. Of
course that would then require the host policy to reflect reality,
rather than having what is "always emulated". IOW ...

> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
>                             (1u << SVM_FEATURE_PAUSEFILTER) |
>                             (1u << SVM_FEATURE_DECODEASSISTS));
>      /* Enable features which are always emulated. */
> -    p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
> -                           (1u << SVM_FEATURE_TSCRATEMSR));
> +    p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);

... this likely wants replacing altogether by not overriding what we
found in hardware, which would apparently mean moving the two bit
masks to the earlier "clamping" expression.

But then of course Andrew may know of reasons why all of this is done
in calculate_host_policy() in the first place, rather than in HVM
policy calculation.

Jan
On Mon, Feb 19, 2024 at 11:22 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.02.2024 02:20, George Dunlap wrote:
> > For now, just disable the functionality entirely until we can
> > implement it properly:
> >
> > - Don't set TSCRATEMSR in the host CPUID policy
>
> This goes too far: this way you would (in principle) also affect guests
> with nesting disabled. According to the earlier parts of the description
> there's also no issue with it in that case. What you want to make sure
> is that in the HVM policy the bit isn't set.
>
> While presently resolving to cpu_has_svm_feature(), I think
> cpu_has_tsc_ratio really ought to resolve to the host policy field. Of
> course that would then require the host policy to reflect reality,
> rather than having what is "always emulated". IOW ...
>
> > --- a/xen/arch/x86/cpu-policy.c
> > +++ b/xen/arch/x86/cpu-policy.c
> > @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
> >                             (1u << SVM_FEATURE_PAUSEFILTER) |
> >                             (1u << SVM_FEATURE_DECODEASSISTS));
> >      /* Enable features which are always emulated. */
> > -    p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
> > -                           (1u << SVM_FEATURE_TSCRATEMSR));
> > +    p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
>
> ... this likely wants replacing altogether by not overriding what we
> found in hardware, which would apparently mean moving the two bit
> masks to the earlier "clamping" expression.
>
> But then of course Andrew may know of reasons why all of this is done
> in calculate_host_policy() in the first place, rather than in HVM
> policy calculation.

It sounds like maybe you're confusing host_policy with
x86_capabilities?  From what I can tell:

* the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
resolves to cpu_has(&boot_cpu_data, ...), which is completely
independent of the cpu-policy.c:host_cpu_policy

* cpu-policy.c:host_cpu_policy only affects what is advertised to
guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
quick skim doesn't show any mechanism by which host_cpu_policy could
affect what features Xen itself decides to use.

Not sure exactly why the nested virt stuff is done at the
host_cpu_policy level rather than the hvm_cpu_policy level, but since
that's where it is, that's where we need to change it.

FWIW, as I said in response to your comment on 2/6, it would be nicer
if we moved svm_feature_flags into the "capabilities" section; but
that's a different set of work.

 -George
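To make the distinction concrete, here is a minimal, self-contained
sketch of the two separate mechanisms described above. The structures,
bit numbers, and array sizes are simplified stand-ins for illustration
only; the real definitions live in
xen/arch/x86/include/asm/cpufeature.h and xen/arch/x86/cpu-policy.c.

#include <stdio.h>

#define X86_FEATURE_TSC_RATIO  4   /* hypothetical bit number */

/* Mechanism 1: boot_cpu_data.x86_capability[], which the cpu_has_*
 * macros read, and which Xen consults for its own feature use. */
struct cpuinfo_x86 {
    unsigned int x86_capability[2];
};
static struct cpuinfo_x86 boot_cpu_data;

#define cpu_has(c, bit) \
    (!!((c)->x86_capability[(bit) / 32] & (1u << ((bit) % 32))))
#define boot_cpu_has(bit)  cpu_has(&boot_cpu_data, bit)
#define cpu_has_tsc_ratio  boot_cpu_has(X86_FEATURE_TSC_RATIO)

/* Mechanism 2: a cpu-policy-style structure, used only as input for
 * building what is advertised to guests ({pv,hvm}_cpu_policy). */
struct cpu_policy {
    unsigned int svm_feat;   /* stand-in for p->extd.raw[0xa].d */
};
static struct cpu_policy host_cpu_policy, hvm_cpu_policy;

int main(void)
{
    /* Hardware reports the feature; Xen's own view records it. */
    boot_cpu_data.x86_capability[0] |= 1u << X86_FEATURE_TSC_RATIO;

    /* Clearing the bit in the guest-facing policy ... */
    host_cpu_policy.svm_feat &= ~(1u << X86_FEATURE_TSC_RATIO);
    hvm_cpu_policy = host_cpu_policy;

    /* ... leaves Xen's own cpu_has_* view untouched. */
    printf("Xen's view (cpu_has_tsc_ratio): %d\n", cpu_has_tsc_ratio);
    printf("guest-visible policy bit:       %d\n",
           !!(hvm_cpu_policy.svm_feat & (1u << X86_FEATURE_TSC_RATIO)));
    return 0;
}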
On 21.02.2024 09:48, George Dunlap wrote:
> On Mon, Feb 19, 2024 at 11:22 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.02.2024 02:20, George Dunlap wrote:
>>> For now, just disable the functionality entirely until we can
>>> implement it properly:
>>>
>>> - Don't set TSCRATEMSR in the host CPUID policy
>>
>> This goes too far: this way you would (in principle) also affect guests
>> with nesting disabled. According to the earlier parts of the description
>> there's also no issue with it in that case. What you want to make sure
>> is that in the HVM policy the bit isn't set.
>>
>> While presently resolving to cpu_has_svm_feature(), I think
>> cpu_has_tsc_ratio really ought to resolve to the host policy field. Of
>> course that would then require the host policy to reflect reality,
>> rather than having what is "always emulated". IOW ...
>>
>>> --- a/xen/arch/x86/cpu-policy.c
>>> +++ b/xen/arch/x86/cpu-policy.c
>>> @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
>>>                             (1u << SVM_FEATURE_PAUSEFILTER) |
>>>                             (1u << SVM_FEATURE_DECODEASSISTS));
>>>      /* Enable features which are always emulated. */
>>> -    p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
>>> -                           (1u << SVM_FEATURE_TSCRATEMSR));
>>> +    p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
>>
>> ... this likely wants replacing altogether by not overriding what we
>> found in hardware, which would apparently mean moving the two bit
>> masks to the earlier "clamping" expression.
>>
>> But then of course Andrew may know of reasons why all of this is done
>> in calculate_host_policy() in the first place, rather than in HVM
>> policy calculation.
>
> It sounds like maybe you're confusing host_policy with
> x86_capabilities?  From what I can tell:
>
> * the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> resolves to cpu_has(&boot_cpu_data, ...), which is completely
> independent of the cpu-policy.c:host_cpu_policy
>
> * cpu-policy.c:host_cpu_policy only affects what is advertised to
> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
> quick skim doesn't show any mechanism by which host_cpu_policy could
> affect what features Xen itself decides to use.

I'm not mixing the two, no; the two are still insufficiently disentangled.
There's really no reason (long term) to have both host policy and
x86_capabilities. Therefore I'd prefer that new code (including the
basically fundamental re-write that is going to be needed for nested)
avoid needlessly further extending x86_capabilities. Unless of course
there's something fundamentally wrong with eliminating the redundancy,
which likely Andrew would be in the best position to point out.

> Not sure exactly why the nested virt stuff is done at the
> host_cpu_policy level rather than the hvm_cpu_policy level, but since
> that's where it is, that's where we need to change it.
>
> FWIW, as I said in response to your comment on 2/6, it would be nicer
> if we moved svm_feature_flags into the "capabilities" section; but
> that's a different set of work.

I can as well reply here then, rather than there: if the movement from
host to HVM policy were done first, the patch here could more sanely go
on top, and patch 2 could then also go on top, converting the separate
variable to host policy accesses, quite possibly introducing a similar
wrapper to the one you introduce there right now.

But no, I'm not meaning to make this a requirement; this would merely be
an imo better approach. My ack there stands, in case you want to keep
(and commit) the change as is.

Jan
On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> But then of course Andrew may know of reasons why all of this is done
> >> in calculate_host_policy() in the first place, rather than in HVM
> >> policy calculation.
> >
> > It sounds like maybe you're confusing host_policy with
> > x86_capabilities?  From what I can tell:
> >
> > * the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> > resolves to cpu_has(&boot_cpu_data, ...), which is completely
> > independent of the cpu-policy.c:host_cpu_policy
> >
> > * cpu-policy.c:host_cpu_policy only affects what is advertised to
> > guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
> > quick skim doesn't show any mechanism by which host_cpu_policy could
> > affect what features Xen itself decides to use.
>
> I'm not mixing the two, no; the two are still insufficiently disentangled.
> There's really no reason (long term) to have both host policy and
> x86_capabilities. Therefore I'd prefer that new code (including the
> basically fundamental re-write that is going to be needed for nested)
> avoid needlessly further extending x86_capabilities. Unless of course
> there's something fundamentally wrong with eliminating the redundancy,
> which likely Andrew would be in the best position to point out.

So I don't know the history of how things got to be the way they are,
nor really much about the code beyond what I've gathered from skimming
through while creating this patch series.  But from that impression,
the only issue I really see with the current code is the confusing
naming.  The cpufeature.h code has this nice infrastructure that allows
you to, for instance, enable or disable certain bits on the
command-line; and the interface for querying all the different bits of
functionality is all nicely put in one place.  Moving
svm_feature_flags into x86_capabilities would immediately allow SVM to
take advantage of this infrastructure; it's not clear to me how this
would be "needless".

Furthermore, it looks to me like host_cpu_policy is used as a starting
point for generating pv_cpu_policy and hvm_cpu_policy, both of which
are only used for guest cpuid generation.  Given that the format of
those policies is fixed, and there's a lot of "copy this bit from the
host policy wholesale", it seems like no matter what, you'd want a
host_cpu_policy.

And in any case -- all that is kind of moot.  *Right now*,
host_cpu_policy is only used for guest cpuid policy creation; *right
now*, the nested virt features of AMD are handled in the
host_cpu_policy; *right now*, we're advertising to guests bits which
are not properly virtualized; *right now* these bits are actually set
unconditionally, regardless of whether they're even available on the
hardware; *right now*, Xen uses svm_feature_flags to determine its own
use of TscRateMSR; so *right now*, removing this bit from
host_cpu_policy won't prevent Xen from using TscRateMSR itself.

(Unless my understanding of the code is wrong, in which case I'd
appreciate a correction.)

> > Not sure exactly why the nested virt stuff is done at the
> > host_cpu_policy level rather than the hvm_cpu_policy level, but since
> > that's where it is, that's where we need to change it.
> >
> > FWIW, as I said in response to your comment on 2/6, it would be nicer
> > if we moved svm_feature_flags into the "capabilities" section; but
> > that's a different set of work.
>
> I can as well reply here then, rather than there: if the movement from
> host to HVM policy were done first, the patch here could more sanely go
> on top, and patch 2 could then also go on top, converting the separate
> variable to host policy accesses, quite possibly introducing a similar
> wrapper to the one you introduce there right now.
>
> But no, I'm not meaning to make this a requirement; this would merely be
> an imo better approach. My ack there stands, in case you want to keep
> (and commit) the change as is.

I mean, I don't mind doing a bit more prep work, if I know that's the
direction we want to go in.  "Actually, since you're doing a bit of
clean-up anyway -- right now host_cpu_policy is only used to calculate
guest policy, but we'd like to move over to it being the Source of
Truth for the host instead of x86_capabilities.  While you're here,
would you mind moving the nested virt policy stuff into hvm_cpu_policy
instead?"

I certainly wouldn't want to move svm_feature_flags into
host_cpu_policy while it's still got random other "guest-only" policy
bits in it; and auditing it for such policy bits is out of the scope
of this work.

 -George
On 22.02.2024 10:30, George Dunlap wrote:
> On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> But then of course Andrew may know of reasons why all of this is done
>>>> in calculate_host_policy() in the first place, rather than in HVM
>>>> policy calculation.
>>>
>>> It sounds like maybe you're confusing host_policy with
>>> x86_capabilities?  From what I can tell:
>>>
>>> * the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
>>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
>>> independent of the cpu-policy.c:host_cpu_policy
>>>
>>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
>>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
>>> quick skim doesn't show any mechanism by which host_cpu_policy could
>>> affect what features Xen itself decides to use.
>>
>> I'm not mixing the two, no; the two are still insufficiently disentangled.
>> There's really no reason (long term) to have both host policy and
>> x86_capabilities. Therefore I'd prefer that new code (including the
>> basically fundamental re-write that is going to be needed for nested)
>> avoid needlessly further extending x86_capabilities. Unless of course
>> there's something fundamentally wrong with eliminating the redundancy,
>> which likely Andrew would be in the best position to point out.
>
> So I don't know the history of how things got to be the way they are,
> nor really much about the code beyond what I've gathered from skimming
> through while creating this patch series.  But from that impression,
> the only issue I really see with the current code is the confusing
> naming.  The cpufeature.h code has this nice infrastructure that allows
> you to, for instance, enable or disable certain bits on the
> command-line; and the interface for querying all the different bits of
> functionality is all nicely put in one place.  Moving
> svm_feature_flags into x86_capabilities would immediately allow SVM to
> take advantage of this infrastructure; it's not clear to me how this
> would be "needless".
>
> Furthermore, it looks to me like host_cpu_policy is used as a starting
> point for generating pv_cpu_policy and hvm_cpu_policy, both of which
> are only used for guest cpuid generation.  Given that the format of
> those policies is fixed, and there's a lot of "copy this bit from the
> host policy wholesale", it seems like no matter what, you'd want a
> host_cpu_policy.
>
> And in any case -- all that is kind of moot.  *Right now*,
> host_cpu_policy is only used for guest cpuid policy creation; *right
> now*, the nested virt features of AMD are handled in the
> host_cpu_policy; *right now*, we're advertising to guests bits which
> are not properly virtualized; *right now* these bits are actually set
> unconditionally, regardless of whether they're even available on the
> hardware; *right now*, Xen uses svm_feature_flags to determine its own
> use of TscRateMSR; so *right now*, removing this bit from
> host_cpu_policy won't prevent Xen from using TscRateMSR itself.
>
> (Unless my understanding of the code is wrong, in which case I'd
> appreciate a correction.)

There's nothing wrong afaics, just missing at least one aspect: did you
see all the featureset <-> policy conversions in cpu-policy.c?  That (to
me at least) clearly is a sign of unnecessary duplication of the same
data.  This goes as far as seeding the host policy from the raw one, just
to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
a fair part of what was first taken from the raw policy.  That's necessary
right now, because setup_{force,clear}_cpu_cap() act on
boot_cpu_data.x86_capability[], not the host policy.

As to the "needless" further up: it's only that moving those bits into
x86_capability[] would further duplicate information, rather than (for
that piece at least) putting them into the policies right away.  But
yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
control those bits as well, then going the intermediate step would be
unavoidable at this point in time.

Jan
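A rough sketch of the seed-then-overwrite sequence being described,
using simplified stand-in types and helper names (the real code is
calculate_host_policy() in xen/arch/x86/cpu-policy.c, and the real
conversion helper is x86_cpu_featureset_to_policy()):

#include <stdio.h>
#include <string.h>

#define FEATURESET_WORDS 2

struct cpu_policy {
    unsigned int feat[FEATURESET_WORDS];   /* stand-in for leaf data */
};

static struct cpu_policy raw_cpu_policy, host_cpu_policy;

/* Stand-in for boot_cpu_data.x86_capability[], which is what
 * setup_{force,clear}_cpu_cap() actually modify. */
static unsigned int boot_caps[FEATURESET_WORDS];

/* Stand-in for x86_cpu_featureset_to_policy(). */
static void featureset_to_policy(const unsigned int *fs,
                                 struct cpu_policy *p)
{
    memcpy(p->feat, fs, sizeof(p->feat));
}

static void calculate_host_policy(void)
{
    /* 1. Seed the host policy from the raw (hardware) policy ... */
    host_cpu_policy = raw_cpu_policy;

    /*
     * 2. ... then immediately overwrite much of it from the
     * featureset, because command-line and errata adjustments made
     * via setup_{force,clear}_cpu_cap() only exist there.
     */
    featureset_to_policy(boot_caps, &host_cpu_policy);
}

int main(void)
{
    raw_cpu_policy.feat[0] = 0x13;   /* what hardware reports */
    boot_caps[0] = 0x03;             /* after a forced-off feature */

    calculate_host_policy();
    printf("host policy word 0: %#x\n", host_cpu_policy.feat[0]);
    return 0;
}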
On Thu, Feb 22, 2024 at 5:50 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 22.02.2024 10:30, George Dunlap wrote:
> > On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> But then of course Andrew may know of reasons why all of this is done
> >>>> in calculate_host_policy() in the first place, rather than in HVM
> >>>> policy calculation.
> >>>
> >>> It sounds like maybe you're confusing host_policy with
> >>> x86_capabilities?  From what I can tell:
> >>>
> >>> * the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
> >>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
> >>> independent of the cpu-policy.c:host_cpu_policy
> >>>
> >>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
> >>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
> >>> quick skim doesn't show any mechanism by which host_cpu_policy could
> >>> affect what features Xen itself decides to use.
> >>
> >> I'm not mixing the two, no; the two are still insufficiently disentangled.
> >> There's really no reason (long term) to have both host policy and
> >> x86_capabilities. Therefore I'd prefer that new code (including the
> >> basically fundamental re-write that is going to be needed for nested)
> >> avoid needlessly further extending x86_capabilities. Unless of course
> >> there's something fundamentally wrong with eliminating the redundancy,
> >> which likely Andrew would be in the best position to point out.
> >
> > So I don't know the history of how things got to be the way they are,
> > nor really much about the code beyond what I've gathered from skimming
> > through while creating this patch series.  But from that impression,
> > the only issue I really see with the current code is the confusing
> > naming.  The cpufeature.h code has this nice infrastructure that allows
> > you to, for instance, enable or disable certain bits on the
> > command-line; and the interface for querying all the different bits of
> > functionality is all nicely put in one place.  Moving
> > svm_feature_flags into x86_capabilities would immediately allow SVM to
> > take advantage of this infrastructure; it's not clear to me how this
> > would be "needless".
> >
> > Furthermore, it looks to me like host_cpu_policy is used as a starting
> > point for generating pv_cpu_policy and hvm_cpu_policy, both of which
> > are only used for guest cpuid generation.  Given that the format of
> > those policies is fixed, and there's a lot of "copy this bit from the
> > host policy wholesale", it seems like no matter what, you'd want a
> > host_cpu_policy.
> >
> > And in any case -- all that is kind of moot.  *Right now*,
> > host_cpu_policy is only used for guest cpuid policy creation; *right
> > now*, the nested virt features of AMD are handled in the
> > host_cpu_policy; *right now*, we're advertising to guests bits which
> > are not properly virtualized; *right now* these bits are actually set
> > unconditionally, regardless of whether they're even available on the
> > hardware; *right now*, Xen uses svm_feature_flags to determine its own
> > use of TscRateMSR; so *right now*, removing this bit from
> > host_cpu_policy won't prevent Xen from using TscRateMSR itself.
> >
> > (Unless my understanding of the code is wrong, in which case I'd
> > appreciate a correction.)
>
> There's nothing wrong afaics, just missing at least one aspect: did you
> see all the featureset <-> policy conversions in cpu-policy.c?  That (to
> me at least) clearly is a sign of unnecessary duplication of the same
> data.  This goes as far as seeding the host policy from the raw one, just
> to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
> a fair part of what was first taken from the raw policy.  That's necessary
> right now, because setup_{force,clear}_cpu_cap() act on
> boot_cpu_data.x86_capability[], not the host policy.
>
> As to the "needless" further up: it's only that moving those bits into
> x86_capability[] would further duplicate information, rather than (for
> that piece at least) putting them into the policies right away.  But
> yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
> control those bits as well, then going the intermediate step would be
> unavoidable at this point in time.

I'm still not sure what needs to happen to move this forward.

As I said, I'm not opposed to doing some prep work; but I don't want
to randomly guess at what kinds of clean-up need to be done, only to
be told it was wrong (either by you when I post it, or by Andy
sometime after it's checked in).

I could certainly move svm_feature_flags into host_cpu_policy, and
have cpu_svm_feature_* reference host_cpu_policy instead (after moving
the nested virt "guest policy" tweaks into hvm_cpu_policy); but as far
as I can tell, that would be the *very first* instance of Xen using
host_cpu_policy in that manner.  I'd like more clarity that this is
the long-term direction things are going before doing so.

If you (plural) don't have time now to refresh your memory / make an
informed decision about what you want to happen, then please consider
just taking the patch as it is; it doesn't make future changes any
harder.

 -George
On 22.02.2024 12:00, George Dunlap wrote:
> On Thu, Feb 22, 2024 at 5:50 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 22.02.2024 10:30, George Dunlap wrote:
>>> On Wed, Feb 21, 2024 at 6:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> But then of course Andrew may know of reasons why all of this is done
>>>>>> in calculate_host_policy() in the first place, rather than in HVM
>>>>>> policy calculation.
>>>>>
>>>>> It sounds like maybe you're confusing host_policy with
>>>>> x86_capabilities?  From what I can tell:
>>>>>
>>>>> * the "basic" cpu_has_X macros resolve to boot_cpu_has(), which
>>>>> resolves to cpu_has(&boot_cpu_data, ...), which is completely
>>>>> independent of the cpu-policy.c:host_cpu_policy
>>>>>
>>>>> * cpu-policy.c:host_cpu_policy only affects what is advertised to
>>>>> guests, via {pv,hvm}_cpu_policy and featureset bits.  Most notably, a
>>>>> quick skim doesn't show any mechanism by which host_cpu_policy could
>>>>> affect what features Xen itself decides to use.
>>>>
>>>> I'm not mixing the two, no; the two are still insufficiently disentangled.
>>>> There's really no reason (long term) to have both host policy and
>>>> x86_capabilities. Therefore I'd prefer that new code (including the
>>>> basically fundamental re-write that is going to be needed for nested)
>>>> avoid needlessly further extending x86_capabilities. Unless of course
>>>> there's something fundamentally wrong with eliminating the redundancy,
>>>> which likely Andrew would be in the best position to point out.
>>>
>>> So I don't know the history of how things got to be the way they are,
>>> nor really much about the code beyond what I've gathered from skimming
>>> through while creating this patch series.  But from that impression,
>>> the only issue I really see with the current code is the confusing
>>> naming.  The cpufeature.h code has this nice infrastructure that allows
>>> you to, for instance, enable or disable certain bits on the
>>> command-line; and the interface for querying all the different bits of
>>> functionality is all nicely put in one place.  Moving
>>> svm_feature_flags into x86_capabilities would immediately allow SVM to
>>> take advantage of this infrastructure; it's not clear to me how this
>>> would be "needless".
>>>
>>> Furthermore, it looks to me like host_cpu_policy is used as a starting
>>> point for generating pv_cpu_policy and hvm_cpu_policy, both of which
>>> are only used for guest cpuid generation.  Given that the format of
>>> those policies is fixed, and there's a lot of "copy this bit from the
>>> host policy wholesale", it seems like no matter what, you'd want a
>>> host_cpu_policy.
>>>
>>> And in any case -- all that is kind of moot.  *Right now*,
>>> host_cpu_policy is only used for guest cpuid policy creation; *right
>>> now*, the nested virt features of AMD are handled in the
>>> host_cpu_policy; *right now*, we're advertising to guests bits which
>>> are not properly virtualized; *right now* these bits are actually set
>>> unconditionally, regardless of whether they're even available on the
>>> hardware; *right now*, Xen uses svm_feature_flags to determine its own
>>> use of TscRateMSR; so *right now*, removing this bit from
>>> host_cpu_policy won't prevent Xen from using TscRateMSR itself.
>>>
>>> (Unless my understanding of the code is wrong, in which case I'd
>>> appreciate a correction.)
>>
>> There's nothing wrong afaics, just missing at least one aspect: did you
>> see all the featureset <-> policy conversions in cpu-policy.c?  That (to
>> me at least) clearly is a sign of unnecessary duplication of the same
>> data.  This goes as far as seeding the host policy from the raw one, just
>> to then immediately run x86_cpu_featureset_to_policy(), thus overwriting
>> a fair part of what was first taken from the raw policy.  That's necessary
>> right now, because setup_{force,clear}_cpu_cap() act on
>> boot_cpu_data.x86_capability[], not the host policy.
>>
>> As to the "needless" further up: it's only that moving those bits into
>> x86_capability[] would further duplicate information, rather than (for
>> that piece at least) putting them into the policies right away.  But
>> yes, if the goal is to have setup_{force,clear}_cpu_cap() be able to
>> control those bits as well, then going the intermediate step would be
>> unavoidable at this point in time.
>
> I'm still not sure what needs to happen to move this forward.
>
> As I said, I'm not opposed to doing some prep work; but I don't want
> to randomly guess at what kinds of clean-up need to be done, only to
> be told it was wrong (either by you when I post it, or by Andy
> sometime after it's checked in).
>
> I could certainly move svm_feature_flags into host_cpu_policy, and
> have cpu_svm_feature_* reference host_cpu_policy instead (after moving
> the nested virt "guest policy" tweaks into hvm_cpu_policy); but as far
> as I can tell, that would be the *very first* instance of Xen using
> host_cpu_policy in that manner.  I'd like more clarity that this is
> the long-term direction things are going before doing so.
>
> If you (plural) don't have time now to refresh your memory / make an
> informed decision about what you want to happen, then please consider
> just taking the patch as it is; it doesn't make future changes any
> harder.

I'd be willing to ack the patch as-is if I understood the piece of code
in calculate_host_policy() that is being changed here. My take is that
it wants adjusting in a prereq patch, by moving it wherever applicable.
Without indications to the contrary, it living there is plain wrong, and
the patch here touching that function isn't quite right either, even if
only as a result of the original issue.

If we can't move forward without at least a little bit of input from
Andrew, then that's the way it is - we need to wait. As you likely know,
I have dozens of patches in that state ...

Jan
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 10079c26ae..d71abbc44a 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -407,8 +407,7 @@ static void __init calculate_host_policy(void)
                            (1u << SVM_FEATURE_PAUSEFILTER) |
                            (1u << SVM_FEATURE_DECODEASSISTS));
     /* Enable features which are always emulated. */
-    p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                           (1u << SVM_FEATURE_TSCRATEMSR));
+    p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
 }
 
 /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ee9602f5c8..d02a59f184 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -146,8 +146,6 @@ int cf_check nsvm_vcpu_reset(struct vcpu *v)
     svm->ns_msr_hsavepa = INVALID_PADDR;
     svm->ns_ovvmcb_pa = INVALID_PADDR;
 
-    svm->ns_tscratio = DEFAULT_TSC_RATIO;
-
     svm->ns_cr_intercepts = 0;
     svm->ns_dr_intercepts = 0;
     svm->ns_exception_intercepts = 0;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b551eac807..34b9f603bc 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -777,43 +777,6 @@ static int cf_check svm_get_guest_pat(struct vcpu *v, u64 *gpat)
     return 1;
 }
 
-static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
-{
-    uint64_t mult, frac, scaled_host_tsc;
-
-    if ( ratio == DEFAULT_TSC_RATIO )
-        return host_tsc;
-
-    /*
-     * Suppose the most significant 32 bits of host_tsc and ratio are
-     * tsc_h and mult, and the least 32 bits of them are tsc_l and frac,
-     * then
-     *     host_tsc * ratio * 2^-32
-     *     = host_tsc * (mult * 2^32 + frac) * 2^-32
-     *     = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
-     *     = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
-     *
-     * Multiplications in the last two terms are between 32-bit integers,
-     * so both of them can fit in 64-bit integers.
-     *
-     * Because mult is usually less than 10 in practice, it's very rare
-     * that host_tsc * mult can overflow a 64-bit integer.
-     */
-    mult = ratio >> 32;
-    frac = ratio & ((1ULL << 32) - 1);
-    scaled_host_tsc  = host_tsc * mult;
-    scaled_host_tsc += (host_tsc >> 32) * frac;
-    scaled_host_tsc += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
-
-    return scaled_host_tsc;
-}
-
-static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
-                                   uint64_t ratio)
-{
-    return guest_tsc - scale_tsc(host_tsc, ratio);
-}
-
 static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
@@ -832,18 +795,8 @@ static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 
     if ( nestedhvm_vcpu_in_guestmode(v) )
     {
-        struct nestedsvm *svm = &vcpu_nestedsvm(v);
-
         n2_tsc_offset = vmcb_get_tsc_offset(n2vmcb) -
                         vmcb_get_tsc_offset(n1vmcb);
-        if ( svm->ns_tscratio != DEFAULT_TSC_RATIO )
-        {
-            uint64_t guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
-
-            n2_tsc_offset = svm_get_tsc_offset(guest_tsc,
-                                               guest_tsc + n2_tsc_offset,
-                                               svm->ns_tscratio);
-        }
         vmcb_set_tsc_offset(n1vmcb, offset);
     }
@@ -1921,10 +1874,6 @@ static int cf_check svm_msr_read_intercept(
         *msr_content = nsvm->ns_msr_hsavepa;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        *msr_content = nsvm->ns_tscratio;
-        break;
-
     case MSR_AMD_OSVW_ID_LENGTH:
     case MSR_AMD_OSVW_STATUS:
         if ( !d->arch.cpuid->extd.osvw )
@@ -2103,12 +2052,6 @@ static int cf_check svm_msr_write_intercept(
             goto gpf;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        if ( msr_content & TSC_RATIO_RSVD_BITS )
-            goto gpf;
-        nsvm->ns_tscratio = msr_content;
-        break;
-
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
     case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
         /*
diff --git a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
index 406fc082b1..45d658ad01 100644
--- a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
+++ b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
@@ -18,11 +18,6 @@ struct nestedsvm {
      */
     uint64_t ns_ovvmcb_pa;
 
-    /* virtual tscratio holding the value l1 guest writes to the
-     * MSR_AMD64_TSC_RATIO MSR.
-     */
-    uint64_t ns_tscratio;
-
     /* Cached real intercepts of the l2 guest */
     uint32_t ns_cr_intercepts;
     uint32_t ns_dr_intercepts;
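For reference, the 32.32 fixed-point computation removed above, reduced
to a standalone program and cross-checked against a plain 128-bit
multiply (illustrative only; unsigned __int128 assumes a GCC/Clang-style
64-bit compiler):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ratio is 32.32 fixed point: result = host_tsc * ratio / 2^32. */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t mult = ratio >> 32;            /* integer part */
    uint64_t frac = ratio & 0xffffffffu;    /* fractional part */
    uint64_t scaled = host_tsc * mult;

    /* The decomposition derived in the removed comment block above. */
    scaled += (host_tsc >> 32) * frac;
    scaled += ((host_tsc & 0xffffffffu) * frac) >> 32;
    return scaled;
}

int main(void)
{
    uint64_t tsc = 0x0123456789abcdefULL;
    uint64_t ratio = 3ULL << 31;            /* 1.5 in 32.32 format */

    assert(scale_tsc(tsc, ratio) ==
           (uint64_t)(((unsigned __int128)tsc * ratio) >> 32));
    printf("scaled tsc = %#llx\n",
           (unsigned long long)scale_tsc(tsc, ratio));
    return 0;
}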
The primary purpose of TSC scaling, from our perspective, is to
maintain the fiction of an "invariant TSC" across migrates between
platforms with different clock speeds.

On AMD, the TscRateMSR CPUID bit is unconditionally enabled in the
"host cpuid", even if the hardware doesn't actually support it.
According to c/s fd14a1943c4 ("nestedsvm: Support TSC Rate MSR"),
testing showed that emulating TSC scaling in an L1 was more expensive
than emulating TSC scaling on an L0 (due to extra sets of vmexit /
vmenter).

However, the current implementation seems to be broken.

First of all, the final L2 scaling ratio should be a composition of
the L0 scaling ratio and the L1 scaling ratio; there's no indication
this is being done anywhere.

Secondly, it's not clear that the L1 tsc scaling ratio actually
affects the L0 tsc scaling ratio.  The stored value (ns_tscratio) is
used to affect the tsc *offset*, but doesn't seem to actually be
factored into d->hvm.tsc_scaling_ratio.  (Which shouldn't be
per-domain anyway, but per-vcpu.)  Having the *offset* scaled
according to the nested scaling without the actual RDTSC itself also
being scaled has got to produce inconsistent results.

For now, just disable the functionality entirely until we can
implement it properly:

- Don't set TSCRATEMSR in the host CPUID policy

- Remove MSR_AMD64_TSC_RATIO emulation handling, so that the guest
  gets a #GP if it tries to access the MSR (as it should when
  TSCRATEMSR is clear)

- Remove ns_tscratio from struct nestedsvm, and all code that
  touches it

Unfortunately this means ripping out the scaling calculation stuff as
well, since it's only used in the nested case; it's there in the git
tree if we need it for reference when we re-introduce it.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
---
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/cpu-policy.c                    |  3 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |  2 -
 xen/arch/x86/hvm/svm/svm.c                   | 57 --------------------
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |  5 --
 4 files changed, 1 insertion(+), 66 deletions(-)
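For illustration, a sketch of the composition step the description says
is missing: with the 1.0 ratio represented as 1ULL << 32 (matching the
32.32 arithmetic in the removed scale_tsc()), the ratio L2 should run
at relative to the host is the product of the host->L1 and L1->L2
ratios, renormalized by one factor of 2^32. The helper name below is
hypothetical, not existing Xen code:

#include <stdint.h>
#include <stdio.h>

#define TSC_RATIO_1_0  (1ULL << 32)

/* Both inputs carry a 2^32 scale factor; the product carries 2^64,
 * so shift one factor of 2^32 back out. */
static uint64_t compose_tsc_ratio(uint64_t l0_ratio, uint64_t l1_ratio)
{
    return (uint64_t)(((unsigned __int128)l0_ratio * l1_ratio) >> 32);
}

int main(void)
{
    uint64_t l0 = 3 * TSC_RATIO_1_0 / 2;    /* host runs L1 at 1.5x */
    uint64_t l1 = TSC_RATIO_1_0 / 2;        /* L1 runs L2 at 0.5x */

    /* 1.5 * 0.5 == 0.75 */
    printf("composed = %#llx, expected %#llx\n",
           (unsigned long long)compose_tsc_ratio(l0, l1),
           (unsigned long long)(3 * TSC_RATIO_1_0 / 4));
    return 0;
}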