Message ID | 20191115105739.20333-1-george.dunlap@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] x86: Add hack to disable "Fake HT" mode | expand |
On 15.11.2019 11:57, George Dunlap wrote: > Open questions: > > - Is this the right place to put the `getenv` check? > > - Is there any way we can make migration work, at least in some cases? > > - Can we check for known-problematic models, and at least report a > more useful error? Checking for specific models should be straightforward, but I wonder how sensible it is to compile a likely ever growing list into here. As to the reporting of an error - you saying "more useful" suggests there is some error already being reported. But I don't see any here, nor does any come to mind. > --- a/tools/libxc/xc_cpuid_x86.c > +++ b/tools/libxc/xc_cpuid_x86.c > @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, > } > else > { > - /* > - * Topology for HVM guests is entirely controlled by Xen. For now, we > - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. > - */ > - p->basic.htt = true; > + p->basic.htt = false; > p->extd.cmp_legacy = false; > > - /* > - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. > - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid > - * overflow. > - */ > - if ( !(p->basic.lppp & 0x80) ) > - p->basic.lppp *= 2; > - I appreciate you wanting to put all adjustments in a central place, but at least it makes patch review more difficult. How about you latch !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of the function and then the above would become if ( !(p->basic.lppp & 0x80) ) p->basic.lppp <<= fakeht; and e.g. ... > switch ( p->x86_vendor ) > { > case X86_VENDOR_INTEL: > for ( i = 0; (p->cache.subleaf[i].type && > i < ARRAY_SIZE(p->cache.raw)); ++i ) > { > - p->cache.subleaf[i].cores_per_package = > - (p->cache.subleaf[i].cores_per_package << 1) | 1; ... this p->cache.subleaf[i].cores_per_package = (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; > + p->cache.subleaf[i].cores_per_package = 0; This doesn't look correct - you need to leave alone the field if the adjustment moved down is supposed to have any effect. This is an example of why the bigger code movement you do is risky. Jan
On 15.11.2019 11:57, George Dunlap wrote: > Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM > guests") attempted to "fake up" a topology which would induce guest > operating systems to not treat vcpus as sibling hyperthreads. This > involved (among other things) actually reporting hyperthreading as > available, but giving vcpus every other APICID. The resulting cpu > featureset is invalid, but most operating systems on most hardware > managed to cope with it. > > Unfortunately, Windows running on modern AMD hardware -- including > Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets > confused by the resulting contradictory feature bits and crashes > during installation. (Linux guests have so far continued to cope.) I do not understand a central point: No matter why and/or how a fake topology is presented by Xen, why did the older generation Ryzen 2xxx work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that causes the one to work and the other to fail? Regards Andreas
On 11/15/19 11:17 AM, Andreas Kinzler wrote: > On 15.11.2019 11:57, George Dunlap wrote: >> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM >> guests") attempted to "fake up" a topology which would induce guest >> operating systems to not treat vcpus as sibling hyperthreads. This >> involved (among other things) actually reporting hyperthreading as >> available, but giving vcpus every other APICID. The resulting cpu >> featureset is invalid, but most operating systems on most hardware >> managed to cope with it. >> >> Unfortunately, Windows running on modern AMD hardware -- including >> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets >> confused by the resulting contradictory feature bits and crashes >> during installation. (Linux guests have so far continued to cope.) > > I do not understand a central point: No matter why and/or how a fake > topology is presented by Xen, why did the older generation Ryzen 2xxx > work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that > causes the one to work and the other to fail? The CPU features that the guest sees are a mix of the real underlying features and changes made by Xen. Xen and/or the hardware will behave exactly the same in both cases; the only difference is what the guest operating system does. The guest operating system code which is trying to determine the topology sees an impossible franken-monster mix of bits which its authors never anticipated, and so did not test; and it goes off the rails somewhere and crashes. Without seeing the code I cannot tell you how or exactly why it fails on Ryzen 3xxx but not on Ryzen 2xxx. It could be something as simple as a missed case in a switch statement somewhere which never happens on real hardware. Ideally of course such code should never crash, and should probably be designed such that it muddles on somehow no matter what it gets. But I doubt we're going to get a huge amount of sympathy from MS if we request a patch to allow it to deal with incoherent topology; and in any case there are already lots of images out there already with the old code. (Caveat: This is all from my understanding of Andy's description, not from personal experience.) -George
On 15.11.2019 12:29, George Dunlap wrote: > On 11/15/19 11:17 AM, Andreas Kinzler wrote: >> I do not understand a central point: No matter why and/or how a fake >> topology is presented by Xen, why did the older generation Ryzen 2xxx >> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that >> causes the one to work and the other to fail? > The CPU features that the guest sees are a mix of the real underlying > features and changes made by Xen. Xen and/or the hardware will behave Why not analyze the bits in detail? I already posted the complete CPUID for 3700X (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html). @Steven: > If this is helpful, I can probably provide the same from: > * Ryzen 2700x > * Ryzen 3900x Can you post for 2700X? Then someone with detailed knowledge could compare the two? Regards Andreas
On 11/15/19 11:12 AM, Jan Beulich wrote: > On 15.11.2019 11:57, George Dunlap wrote: >> Open questions: >> >> - Is this the right place to put the `getenv` check? >> >> - Is there any way we can make migration work, at least in some cases? >> >> - Can we check for known-problematic models, and at least report a >> more useful error? > > Checking for specific models should be straightforward, but I wonder > how sensible it is to compile a likely ever growing list into here. The hope is that this would be a short-term fix, replaced by something more robust in 4.14. > > As to the reporting of an error - you saying "more useful" suggests > there is some error already being reported. But I don't see any here, > nor does any come to mind. At the moment, if someone buys a Ryzen 3xxx box, installs Xen, and tries to boot a Windows guest, Windows will simply crash (as I understand it). (That is, the issue is "reported" by a BSoD.) That person will then google around, and hopefully run across the various threads here, and see that she should add "export XEN_LIBXC_DISABLE_FAKEHT=1" when starting their Windows guests. Suppose instead that after buying a Ryzen 3xxx box, and installing Xen, when starting a guest, they got the following error message: "WARNING: This system may be affected by Xen bug $FOO. Please see $DOCUMENTATION and set XEN_LIBXC_DISABLE_FAKEHT appropriately." It would mean annoying people not running Windows unnecessarily; but it seems like a better experience than having a guest crash and searching for a solution. (And to be clear, the reason it's RFC is to ask what people think of this idea, rather than to argue that this is what we should do.) > >> --- a/tools/libxc/xc_cpuid_x86.c >> +++ b/tools/libxc/xc_cpuid_x86.c >> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >> } >> else >> { >> - /* >> - * Topology for HVM guests is entirely controlled by Xen. For now, we >> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. >> - */ >> - p->basic.htt = true; >> + p->basic.htt = false; >> p->extd.cmp_legacy = false; >> >> - /* >> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. >> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid >> - * overflow. >> - */ >> - if ( !(p->basic.lppp & 0x80) ) >> - p->basic.lppp *= 2; >> - > > I appreciate you wanting to put all adjustments in a central place, but > at least it makes patch review more difficult. How about you latch > !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of > the function and then the above would become > > if ( !(p->basic.lppp & 0x80) ) > p->basic.lppp <<= fakeht; > > and e.g. ... > >> switch ( p->x86_vendor ) >> { >> case X86_VENDOR_INTEL: >> for ( i = 0; (p->cache.subleaf[i].type && >> i < ARRAY_SIZE(p->cache.raw)); ++i ) >> { >> - p->cache.subleaf[i].cores_per_package = >> - (p->cache.subleaf[i].cores_per_package << 1) | 1; > > ... this > > p->cache.subleaf[i].cores_per_package = > (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; I'm afraid I think the code itself would then become more difficult to read; and it seems a bit strange to be architecting our code based on limitations of the diff algorithm and/or diff viewer used. I could break the patch down into two if you think that would make it easier to review. You might also try a diff viewer which shows both before and after; I use 'meld'. >> + p->cache.subleaf[i].cores_per_package = 0; > > This doesn't look correct - you need to leave alone the field if > the adjustment moved down is supposed to have any effect. This > is an example of why the bigger code movement you do is risky. Ah yes, I didn't notice the calculation was self-referential. Let me take a look. -George
On 11/15/19 11:39 AM, Andreas Kinzler wrote: > On 15.11.2019 12:29, George Dunlap wrote: >> On 11/15/19 11:17 AM, Andreas Kinzler wrote: >>> I do not understand a central point: No matter why and/or how a fake >>> topology is presented by Xen, why did the older generation Ryzen 2xxx >>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that >>> causes the one to work and the other to fail? >> The CPU features that the guest sees are a mix of the real underlying >> features and changes made by Xen. Xen and/or the hardware will behave > > Why not analyze the bits in detail? I already posted the complete CPUID > for 3700X > (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html). > > > @Steven: >> If this is helpful, I can probably provide the same from: >> * Ryzen 2700x >> * Ryzen 3900x > Can you post for 2700X? > > Then someone with detailed knowledge could compare the two? What would be the purpose? The code is going to look like this -- https://gitlab.com/xen-project/xen/blob/staging/xen/arch/x86/cpu/common.c -- an impenetrable maze of "switch" and "if" statements based on individual bits or features or models. *Somewhere* in Window's version of that code, there's a path which is triggered by Ryzen-3xxx-Xen-with-Fake-HT but not triggered by Ryzen-2xxx-with-Xen-Fake-HT, or Ryzen-3xxx-Xen-without-Fake-HT. And suppose we found exactly the bits which triggered it, what then? We could flip just that bit; but that would make the resulting CPUID even *more* weird, and likely to trigger something else in the future. The solution this patch proposes doesn't make the CPUID topology "normal", but it certainly makes it a lot less weird, which is a better place to be. -George
On 15.11.19 11:57, George Dunlap wrote: > Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM > guests") attempted to "fake up" a topology which would induce guest > operating systems to not treat vcpus as sibling hyperthreads. This > involved (among other things) actually reporting hyperthreading as > available, but giving vcpus every other APICID. The resulting cpu > featureset is invalid, but most operating systems on most hardware > managed to cope with it. > > Unfortunately, Windows running on modern AMD hardware -- including > Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets > confused by the resulting contradictory feature bits and crashes > during installation. (Linux guests have so far continued to cope.) > > A "proper" fix is complicated and it's too late to fix it either for > 4.13, or to backport to supported branches. As a short-term fix, > implement an option to disable this "Fake HT" mode. The resulting > topology reported will not be canonical, but experimentally continues > to work with Windows guests. > > However, disabling this "Fake HT" mode has not been widely tested, and > will almost certainly break migration if applied inconsistently. > > To minimize impact while allowing administrators to disable "Fake HT" > only on guests which are known not to work without it (i.e., Windows > guests) on affected hardware, add an environment variable which can be > set to disable the "Fake HT" mode on such hardware. Hmm, how is this going to work with libvirt? AFAIK libvirtd running as a single process is creating all guests. So with this approach you'd either not be able to use libvirtd, or you'd have to disable "Fake HT" for all guests, probably by modifying the libvirtd service definition. > Reported-by: Steven Haigh <netwiz@crc.id.au> > Reported-by: Andreas Kinzler <hfp@posteo.de> > Signed-off-by: George Dunlap <george.dunlap@citrix.com> > --- > This has been compile-tested only; I'm posting it early to get > feedback on the approach. > > TODO: Prevent such guests from being migrated > > Open questions: > > - Is this the right place to put the `getenv` check? > > - Is there any way we can make migration work, at least in some cases? > > - Can we check for known-problematic models, and at least report a > more useful error? Can't we just disable "Fake HT" automatically on those models? This would automagically make migration work, too. Juergen
On 15.11.2019 12:58, George Dunlap wrote: > On 11/15/19 11:12 AM, Jan Beulich wrote: >> On 15.11.2019 11:57, George Dunlap wrote: >>> --- a/tools/libxc/xc_cpuid_x86.c >>> +++ b/tools/libxc/xc_cpuid_x86.c >>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>> } >>> else >>> { >>> - /* >>> - * Topology for HVM guests is entirely controlled by Xen. For now, we >>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. >>> - */ >>> - p->basic.htt = true; >>> + p->basic.htt = false; >>> p->extd.cmp_legacy = false; >>> >>> - /* >>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. >>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid >>> - * overflow. >>> - */ >>> - if ( !(p->basic.lppp & 0x80) ) >>> - p->basic.lppp *= 2; >>> - >> >> I appreciate you wanting to put all adjustments in a central place, but >> at least it makes patch review more difficult. How about you latch >> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of >> the function and then the above would become >> >> if ( !(p->basic.lppp & 0x80) ) >> p->basic.lppp <<= fakeht; >> >> and e.g. ... >> >>> switch ( p->x86_vendor ) >>> { >>> case X86_VENDOR_INTEL: >>> for ( i = 0; (p->cache.subleaf[i].type && >>> i < ARRAY_SIZE(p->cache.raw)); ++i ) >>> { >>> - p->cache.subleaf[i].cores_per_package = >>> - (p->cache.subleaf[i].cores_per_package << 1) | 1; >> >> ... this >> >> p->cache.subleaf[i].cores_per_package = >> (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; > > I'm afraid I think the code itself would then become more difficult to > read; Slightly, but yes. > and it seems a bit strange to be architecting our code based on > limitations of the diff algorithm and/or diff viewer used. It's not entirely uncommon to (also) consider how the resulting diff would look like when putting together a change. And besides the review aspect, there's also the archeology one - "git blame" yields much more helpful results when code doesn't get moved around more than necessary. But yes, there's no very clear "this is the better option" here. I've taken another look at the code before your change though - everything is already nicely in one place with Andrew's most recent code reorg. So I'm now having an even harder time seeing why you want to move things around again. Jan
On 11/15/19 12:23 PM, Jürgen Groß wrote: > On 15.11.19 11:57, George Dunlap wrote: >> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM >> guests") attempted to "fake up" a topology which would induce guest >> operating systems to not treat vcpus as sibling hyperthreads. This >> involved (among other things) actually reporting hyperthreading as >> available, but giving vcpus every other APICID. The resulting cpu >> featureset is invalid, but most operating systems on most hardware >> managed to cope with it. >> >> Unfortunately, Windows running on modern AMD hardware -- including >> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets >> confused by the resulting contradictory feature bits and crashes >> during installation. (Linux guests have so far continued to cope.) >> >> A "proper" fix is complicated and it's too late to fix it either for >> 4.13, or to backport to supported branches. As a short-term fix, >> implement an option to disable this "Fake HT" mode. The resulting >> topology reported will not be canonical, but experimentally continues >> to work with Windows guests. >> >> However, disabling this "Fake HT" mode has not been widely tested, and >> will almost certainly break migration if applied inconsistently. >> >> To minimize impact while allowing administrators to disable "Fake HT" >> only on guests which are known not to work without it (i.e., Windows >> guests) on affected hardware, add an environment variable which can be >> set to disable the "Fake HT" mode on such hardware. > > Hmm, how is this going to work with libvirt? AFAIK libvirtd running as > a single process is creating all guests. So with this approach you'd > either not be able to use libvirtd, or you'd have to disable "Fake HT" > for all guests, probably by modifying the libvirtd service definition. If we went the current route, yes, that's what would need to be done. Anything else would require changing the interface, which would require changes to libvirt anyway. >> Reported-by: Steven Haigh <netwiz@crc.id.au> >> Reported-by: Andreas Kinzler <hfp@posteo.de> >> Signed-off-by: George Dunlap <george.dunlap@citrix.com> >> --- >> This has been compile-tested only; I'm posting it early to get >> feedback on the approach. >> >> TODO: Prevent such guests from being migrated >> >> Open questions: >> >> - Is this the right place to put the `getenv` check? >> >> - Is there any way we can make migration work, at least in some cases? >> >> - Can we check for known-problematic models, and at least report a >> more useful error? > > Can't we just disable "Fake HT" automatically on those models? This > would automagically make migration work, too. What if someone is using only Linux guests on a Ryzen 3xxx box with 4.12.1? If they naively update to 4.13 without going through a version that has this backport, they'll get the wrong CPUid on the receiving side and migration will fail. There was also concern about disabling the "Fake HT" across the board on the affected CPUs -- it hasn't been well-tested on Linux, and so it's not clear whether there will be issues or not. (If this was checked in back in June it would be a different matter.) -George
On 15.11.2019 13:10, George Dunlap wrote: > On 11/15/19 11:39 AM, Andreas Kinzler wrote: >> On 15.11.2019 12:29, George Dunlap wrote: >>> On 11/15/19 11:17 AM, Andreas Kinzler wrote: >>>> I do not understand a central point: No matter why and/or how a fake >>>> topology is presented by Xen, why did the older generation Ryzen 2xxx >>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that >>>> causes the one to work and the other to fail? >>> The CPU features that the guest sees are a mix of the real underlying >>> features and changes made by Xen. Xen and/or the hardware will behave >> Why not analyze the bits in detail? I already posted the complete CPUID >> for 3700X >> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html). >> Then someone with detailed knowledge could compare the two? > What would be the purpose? > The code is going to look like this -- > an impenetrable maze of "switch" and "if" statements based on > individual bits or features or models. *Somewhere* in Window's > versionof that code, there's a path which is triggered by As of this moment all of this is just an assumption - you might very well be right, but it could also be something totally different. What if the CPUID is nearly identical? This would lead to the conclusion that the problem has completely different root causes. Regards Andreas
On 15/11/2019 12:39, Jan Beulich wrote: > On 15.11.2019 12:58, George Dunlap wrote: >> On 11/15/19 11:12 AM, Jan Beulich wrote: >>> On 15.11.2019 11:57, George Dunlap wrote: >>>> --- a/tools/libxc/xc_cpuid_x86.c >>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>> } >>>> else >>>> { >>>> - /* >>>> - * Topology for HVM guests is entirely controlled by Xen. For now, we >>>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. >>>> - */ >>>> - p->basic.htt = true; >>>> + p->basic.htt = false; >>>> p->extd.cmp_legacy = false; >>>> >>>> - /* >>>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. >>>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid >>>> - * overflow. >>>> - */ >>>> - if ( !(p->basic.lppp & 0x80) ) >>>> - p->basic.lppp *= 2; >>>> - >>> I appreciate you wanting to put all adjustments in a central place, but >>> at least it makes patch review more difficult. How about you latch >>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of >>> the function and then the above would become >>> >>> if ( !(p->basic.lppp & 0x80) ) >>> p->basic.lppp <<= fakeht; >>> >>> and e.g. ... >>> >>>> switch ( p->x86_vendor ) >>>> { >>>> case X86_VENDOR_INTEL: >>>> for ( i = 0; (p->cache.subleaf[i].type && >>>> i < ARRAY_SIZE(p->cache.raw)); ++i ) >>>> { >>>> - p->cache.subleaf[i].cores_per_package = >>>> - (p->cache.subleaf[i].cores_per_package << 1) | 1; >>> ... this >>> >>> p->cache.subleaf[i].cores_per_package = >>> (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; >> I'm afraid I think the code itself would then become more difficult to >> read; > Slightly, but yes. > >> and it seems a bit strange to be architecting our code based on >> limitations of the diff algorithm and/or diff viewer used. > It's not entirely uncommon to (also) consider how the resulting > diff would look like when putting together a change. And besides > the review aspect, there's also the archeology one - "git blame" > yields much more helpful results when code doesn't get moved > around more than necessary. But yes, there's no very clear "this > is the better option" here. I've taken another look at the code > before your change though - everything is already nicely in one > place with Andrew's most recent code reorg. So I'm now having an > even harder time seeing why you want to move things around again. We don't. I've recommend twice now to have a single "else if" hunk which is nearly empty, and much more obviously a gross "make it work for 4.13" bodge. The only thing which is an open question (IMO) is if/how to trigger this quirk mode. There are no good options. We just need to agree on the least bad one. ~Andrew
On 15/11/2019 12:44, Andreas Kinzler wrote: > On 15.11.2019 13:10, George Dunlap wrote: >> On 11/15/19 11:39 AM, Andreas Kinzler wrote: >>> On 15.11.2019 12:29, George Dunlap wrote: >>>> On 11/15/19 11:17 AM, Andreas Kinzler wrote: >>>>> I do not understand a central point: No matter why and/or how a fake >>>>> topology is presented by Xen, why did the older generation Ryzen 2xxx >>>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen >>>>> that >>>>> causes the one to work and the other to fail? >>>> The CPU features that the guest sees are a mix of the real underlying >>>> features and changes made by Xen. Xen and/or the hardware will behave >>> Why not analyze the bits in detail? I already posted the complete CPUID >>> for 3700X >>> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html). >>> >>> Then someone with detailed knowledge could compare the two? >> What would be the purpose? >> The code is going to look like this -- >> an impenetrable maze of "switch" and "if" statements based on >> individual bits or features or models. *Somewhere* in Window's >> versionof that code, there's a path which is triggered by > > As of this moment all of this is just an assumption - you might very > well be right, but it could also be something totally different. What > if the CPUID is nearly identical? This would lead to the conclusion > that the problem has completely different root causes. The problem is that Xen's "messing around with topology" is, and has always been, incorrect. It does not match the rules given in the SDM/APM. None of this code is going to survive the rewrite making the presented topology actually follow the architectural rules. However, for 4.13, we are trying to find the least invasive hack possible to cause windows not to explode. ~Andrew
On 11/15/19 1:55 PM, Andrew Cooper wrote: > On 15/11/2019 12:39, Jan Beulich wrote: >> On 15.11.2019 12:58, George Dunlap wrote: >>> On 11/15/19 11:12 AM, Jan Beulich wrote: >>>> On 15.11.2019 11:57, George Dunlap wrote: >>>>> --- a/tools/libxc/xc_cpuid_x86.c >>>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>>> } >>>>> else >>>>> { >>>>> - /* >>>>> - * Topology for HVM guests is entirely controlled by Xen. For now, we >>>>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. >>>>> - */ >>>>> - p->basic.htt = true; >>>>> + p->basic.htt = false; >>>>> p->extd.cmp_legacy = false; >>>>> >>>>> - /* >>>>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. >>>>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid >>>>> - * overflow. >>>>> - */ >>>>> - if ( !(p->basic.lppp & 0x80) ) >>>>> - p->basic.lppp *= 2; >>>>> - >>>> I appreciate you wanting to put all adjustments in a central place, but >>>> at least it makes patch review more difficult. How about you latch >>>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of >>>> the function and then the above would become >>>> >>>> if ( !(p->basic.lppp & 0x80) ) >>>> p->basic.lppp <<= fakeht; >>>> >>>> and e.g. ... >>>> >>>>> switch ( p->x86_vendor ) >>>>> { >>>>> case X86_VENDOR_INTEL: >>>>> for ( i = 0; (p->cache.subleaf[i].type && >>>>> i < ARRAY_SIZE(p->cache.raw)); ++i ) >>>>> { >>>>> - p->cache.subleaf[i].cores_per_package = >>>>> - (p->cache.subleaf[i].cores_per_package << 1) | 1; >>>> ... this >>>> >>>> p->cache.subleaf[i].cores_per_package = >>>> (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; >>> I'm afraid I think the code itself would then become more difficult to >>> read; >> Slightly, but yes. >> >>> and it seems a bit strange to be architecting our code based on >>> limitations of the diff algorithm and/or diff viewer used. >> It's not entirely uncommon to (also) consider how the resulting >> diff would look like when putting together a change. And besides >> the review aspect, there's also the archeology one - "git blame" >> yields much more helpful results when code doesn't get moved >> around more than necessary. But yes, there's no very clear "this >> is the better option" here. I've taken another look at the code >> before your change though - everything is already nicely in one >> place with Andrew's most recent code reorg. So I'm now having an >> even harder time seeing why you want to move things around again. > > We don't. I've recommend twice now to have a single "else if" hunk > which is nearly empty, and much more obviously a gross "make it work for > 4.13" bodge. The results are a tiny bit better, but not much really (see attached). -George
On 11/15/19 2:04 PM, George Dunlap wrote: > On 11/15/19 1:55 PM, Andrew Cooper wrote: >> On 15/11/2019 12:39, Jan Beulich wrote: >>> On 15.11.2019 12:58, George Dunlap wrote: >>>> On 11/15/19 11:12 AM, Jan Beulich wrote: >>>>> On 15.11.2019 11:57, George Dunlap wrote: >>>>>> --- a/tools/libxc/xc_cpuid_x86.c >>>>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>>>> } >>>>>> else >>>>>> { >>>>>> - /* >>>>>> - * Topology for HVM guests is entirely controlled by Xen. For now, we >>>>>> - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. >>>>>> - */ >>>>>> - p->basic.htt = true; >>>>>> + p->basic.htt = false; >>>>>> p->extd.cmp_legacy = false; >>>>>> >>>>>> - /* >>>>>> - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. >>>>>> - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid >>>>>> - * overflow. >>>>>> - */ >>>>>> - if ( !(p->basic.lppp & 0x80) ) >>>>>> - p->basic.lppp *= 2; >>>>>> - >>>>> I appreciate you wanting to put all adjustments in a central place, but >>>>> at least it makes patch review more difficult. How about you latch >>>>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of >>>>> the function and then the above would become >>>>> >>>>> if ( !(p->basic.lppp & 0x80) ) >>>>> p->basic.lppp <<= fakeht; >>>>> >>>>> and e.g. ... >>>>> >>>>>> switch ( p->x86_vendor ) >>>>>> { >>>>>> case X86_VENDOR_INTEL: >>>>>> for ( i = 0; (p->cache.subleaf[i].type && >>>>>> i < ARRAY_SIZE(p->cache.raw)); ++i ) >>>>>> { >>>>>> - p->cache.subleaf[i].cores_per_package = >>>>>> - (p->cache.subleaf[i].cores_per_package << 1) | 1; >>>>> ... this >>>>> >>>>> p->cache.subleaf[i].cores_per_package = >>>>> (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht; >>>> I'm afraid I think the code itself would then become more difficult to >>>> read; >>> Slightly, but yes. >>> >>>> and it seems a bit strange to be architecting our code based on >>>> limitations of the diff algorithm and/or diff viewer used. >>> It's not entirely uncommon to (also) consider how the resulting >>> diff would look like when putting together a change. And besides >>> the review aspect, there's also the archeology one - "git blame" >>> yields much more helpful results when code doesn't get moved >>> around more than necessary. But yes, there's no very clear "this >>> is the better option" here. I've taken another look at the code >>> before your change though - everything is already nicely in one >>> place with Andrew's most recent code reorg. So I'm now having an >>> even harder time seeing why you want to move things around again. >> >> We don't. I've recommend twice now to have a single "else if" hunk >> which is nearly empty, and much more obviously a gross "make it work for >> 4.13" bodge. > > The results are a tiny bit better, but not much really (see attached). I mean, if we *really* wanted to optimize for diff readability, we could use `goto`s instead, but I thought that would be going a bit too far. -George
On 15/11/2019 14:04, George Dunlap wrote: >>> It's not entirely uncommon to (also) consider how the resulting >>> diff would look like when putting together a change. And besides >>> the review aspect, there's also the archeology one - "git blame" >>> yields much more helpful results when code doesn't get moved >>> around more than necessary. But yes, there's no very clear "this >>> is the better option" here. I've taken another look at the code >>> before your change though - everything is already nicely in one >>> place with Andrew's most recent code reorg. So I'm now having an >>> even harder time seeing why you want to move things around again. >> We don't. I've recommend twice now to have a single "else if" hunk >> which is nearly empty, and much more obviously a gross "make it work for >> 4.13" bodge. > The results are a tiny bit better, but not much really (see attached). What I meant was: > diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c > index 312c481f1e..bc088e45f0 100644 > --- a/tools/libxc/xc_cpuid_x86.c > +++ b/tools/libxc/xc_cpuid_x86.c > @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, > } else if ( getenv() ) { ... } > else > { With no delta to this block at all. ~Andrew
On 11/15/19 2:06 PM, Andrew Cooper wrote: > On 15/11/2019 14:04, George Dunlap wrote: >>>> It's not entirely uncommon to (also) consider how the resulting >>>> diff would look like when putting together a change. And besides >>>> the review aspect, there's also the archeology one - "git blame" >>>> yields much more helpful results when code doesn't get moved >>>> around more than necessary. But yes, there's no very clear "this >>>> is the better option" here. I've taken another look at the code >>>> before your change though - everything is already nicely in one >>>> place with Andrew's most recent code reorg. So I'm now having an >>>> even harder time seeing why you want to move things around again. >>> We don't. I've recommend twice now to have a single "else if" hunk >>> which is nearly empty, and much more obviously a gross "make it work for >>> 4.13" bodge. >> The results are a tiny bit better, but not much really (see attached). > > What I meant was: > >> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >> index 312c481f1e..bc088e45f0 100644 >> --- a/tools/libxc/xc_cpuid_x86.c >> +++ b/tools/libxc/xc_cpuid_x86.c >> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >> } > > else if ( getenv() ) > { > ... > } > >> else >> { > > With no delta to this block at all. Then we have to duplicate this code in both blocks: /* * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM / * XEN_DOMCTL_disable_migrate settings to be reflected correctly in * CPUID. Xen will discard these bits if configuration hasn't been * set for the domain. */ p->extd.itsc = true; p->basic.vmx = true; p->extd.svm = true; I won't object if that's what you guys really want. -George
On 15/11/2019 14:10, George Dunlap wrote: > On 11/15/19 2:06 PM, Andrew Cooper wrote: >> On 15/11/2019 14:04, George Dunlap wrote: >>>>> It's not entirely uncommon to (also) consider how the resulting >>>>> diff would look like when putting together a change. And besides >>>>> the review aspect, there's also the archeology one - "git blame" >>>>> yields much more helpful results when code doesn't get moved >>>>> around more than necessary. But yes, there's no very clear "this >>>>> is the better option" here. I've taken another look at the code >>>>> before your change though - everything is already nicely in one >>>>> place with Andrew's most recent code reorg. So I'm now having an >>>>> even harder time seeing why you want to move things around again. >>>> We don't. I've recommend twice now to have a single "else if" hunk >>>> which is nearly empty, and much more obviously a gross "make it work for >>>> 4.13" bodge. >>> The results are a tiny bit better, but not much really (see attached). >> What I meant was: >> >>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >>> index 312c481f1e..bc088e45f0 100644 >>> --- a/tools/libxc/xc_cpuid_x86.c >>> +++ b/tools/libxc/xc_cpuid_x86.c >>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>> } >> else if ( getenv() ) >> { >> ... >> } >> >>> else >>> { >> With no delta to this block at all. > Then we have to duplicate this code in both blocks: > > /* > * These settings are necessary to cause earlier > HVM_PARAM_NESTEDHVM / > * XEN_DOMCTL_disable_migrate settings to be reflected correctly in > * CPUID. Xen will discard these bits if configuration hasn't been > * set for the domain. > */ > p->extd.itsc = true; > p->basic.vmx = true; > p->extd.svm = true; > > I won't object if that's what you guys really want. I'd also be happy with a single "goto hvm_common;". These tweaks are already at the end of the block, so suitably positioned. None of this code is going to survive for long. ~Andrew
On 15.11.2019 15:10, George Dunlap wrote: > On 11/15/19 2:06 PM, Andrew Cooper wrote: >> On 15/11/2019 14:04, George Dunlap wrote: >>>>> It's not entirely uncommon to (also) consider how the resulting >>>>> diff would look like when putting together a change. And besides >>>>> the review aspect, there's also the archeology one - "git blame" >>>>> yields much more helpful results when code doesn't get moved >>>>> around more than necessary. But yes, there's no very clear "this >>>>> is the better option" here. I've taken another look at the code >>>>> before your change though - everything is already nicely in one >>>>> place with Andrew's most recent code reorg. So I'm now having an >>>>> even harder time seeing why you want to move things around again. >>>> We don't. I've recommend twice now to have a single "else if" hunk >>>> which is nearly empty, and much more obviously a gross "make it work for >>>> 4.13" bodge. >>> The results are a tiny bit better, but not much really (see attached). >> >> What I meant was: >> >>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >>> index 312c481f1e..bc088e45f0 100644 >>> --- a/tools/libxc/xc_cpuid_x86.c >>> +++ b/tools/libxc/xc_cpuid_x86.c >>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>> } >> >> else if ( getenv() ) >> { >> ... >> } >> >>> else >>> { >> >> With no delta to this block at all. > > Then we have to duplicate this code in both blocks: > > /* > * These settings are necessary to cause earlier > HVM_PARAM_NESTEDHVM / > * XEN_DOMCTL_disable_migrate settings to be reflected correctly in > * CPUID. Xen will discard these bits if configuration hasn't been > * set for the domain. > */ > p->extd.itsc = true; > p->basic.vmx = true; > p->extd.svm = true; > > I won't object if that's what you guys really want. Personally I think the duplication is less bad than the far heavier original code churn, but to be honest, especially with this intended to go away again soon anyway, I'd not be opposed at all to ... else if ( getenv() ) goto no_fake_ht; else { ... no_fake_ht: /* * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM / * XEN_DOMCTL_disable_migrate settings to be reflected correctly in * CPUID. Xen will discard these bits if configuration hasn't been * set for the domain. */ p->extd.itsc = true; p->basic.vmx = true; p->extd.svm = true; } (despite my general dislike of goto). Jan
On 11/15/19 2:18 PM, Jan Beulich wrote: > On 15.11.2019 15:10, George Dunlap wrote: >> On 11/15/19 2:06 PM, Andrew Cooper wrote: >>> On 15/11/2019 14:04, George Dunlap wrote: >>>>>> It's not entirely uncommon to (also) consider how the resulting >>>>>> diff would look like when putting together a change. And besides >>>>>> the review aspect, there's also the archeology one - "git blame" >>>>>> yields much more helpful results when code doesn't get moved >>>>>> around more than necessary. But yes, there's no very clear "this >>>>>> is the better option" here. I've taken another look at the code >>>>>> before your change though - everything is already nicely in one >>>>>> place with Andrew's most recent code reorg. So I'm now having an >>>>>> even harder time seeing why you want to move things around again. >>>>> We don't. I've recommend twice now to have a single "else if" hunk >>>>> which is nearly empty, and much more obviously a gross "make it work for >>>>> 4.13" bodge. >>>> The results are a tiny bit better, but not much really (see attached). >>> >>> What I meant was: >>> >>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >>>> index 312c481f1e..bc088e45f0 100644 >>>> --- a/tools/libxc/xc_cpuid_x86.c >>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>> } >>> >>> else if ( getenv() ) >>> { >>> ... >>> } >>> >>>> else >>>> { >>> >>> With no delta to this block at all. >> >> Then we have to duplicate this code in both blocks: >> >> /* >> * These settings are necessary to cause earlier >> HVM_PARAM_NESTEDHVM / >> * XEN_DOMCTL_disable_migrate settings to be reflected correctly in >> * CPUID. Xen will discard these bits if configuration hasn't been >> * set for the domain. >> */ >> p->extd.itsc = true; >> p->basic.vmx = true; >> p->extd.svm = true; >> >> I won't object if that's what you guys really want. > > Personally I think the duplication is less bad than the far > heavier original code churn, but to be honest, especially with > this intended to go away again soon anyway, I'd not be opposed > at all to > > ... > else if ( getenv() ) > goto no_fake_ht; This isn't correct, because you do need to clear htt and cmp_legacy, as well as zeroing out cores_per_package and threads_per_cache on Intel. (At least, that's what XenServer's patch does, and it's the best tested.) > else > { > ... > no_fake_ht: > /* > * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM / > * XEN_DOMCTL_disable_migrate settings to be reflected correctly in > * CPUID. Xen will discard these bits if configuration hasn't been > * set for the domain. > */ > p->extd.itsc = true; > p->basic.vmx = true; > p->extd.svm = true; > } > > (despite my general dislike of goto). Well I think gotos into other blocks is even worse. :-) I think the result is a lot nicer to review for sure. -George
Just regarding the use of a system environment variable to turn this feature / bugfix / hack on and off - this would probably break starting the VM via the xendomains script. If the VM definition is in /etc/xen/auto/, then there would be nothing to set the environment variable before the VM is launched - hence it would not be applied and a guest crash would occur... Depending on the VM's settings, this would either continue to start & crash - or just stop again until it could be started with the ENV variable. Steven Haigh
On 15.11.2019 15:29, George Dunlap wrote: > On 11/15/19 2:18 PM, Jan Beulich wrote: >> On 15.11.2019 15:10, George Dunlap wrote: >>> On 11/15/19 2:06 PM, Andrew Cooper wrote: >>>> On 15/11/2019 14:04, George Dunlap wrote: >>>>>>> It's not entirely uncommon to (also) consider how the resulting >>>>>>> diff would look like when putting together a change. And besides >>>>>>> the review aspect, there's also the archeology one - "git blame" >>>>>>> yields much more helpful results when code doesn't get moved >>>>>>> around more than necessary. But yes, there's no very clear "this >>>>>>> is the better option" here. I've taken another look at the code >>>>>>> before your change though - everything is already nicely in one >>>>>>> place with Andrew's most recent code reorg. So I'm now having an >>>>>>> even harder time seeing why you want to move things around again. >>>>>> We don't. I've recommend twice now to have a single "else if" hunk >>>>>> which is nearly empty, and much more obviously a gross "make it work for >>>>>> 4.13" bodge. >>>>> The results are a tiny bit better, but not much really (see attached). >>>> >>>> What I meant was: >>>> >>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >>>>> index 312c481f1e..bc088e45f0 100644 >>>>> --- a/tools/libxc/xc_cpuid_x86.c >>>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>>> } >>>> >>>> else if ( getenv() ) >>>> { >>>> ... >>>> } >>>> >>>>> else >>>>> { >>>> >>>> With no delta to this block at all. >>> >>> Then we have to duplicate this code in both blocks: >>> >>> /* >>> * These settings are necessary to cause earlier >>> HVM_PARAM_NESTEDHVM / >>> * XEN_DOMCTL_disable_migrate settings to be reflected correctly in >>> * CPUID. Xen will discard these bits if configuration hasn't been >>> * set for the domain. >>> */ >>> p->extd.itsc = true; >>> p->basic.vmx = true; >>> p->extd.svm = true; >>> >>> I won't object if that's what you guys really want. >> >> Personally I think the duplication is less bad than the far >> heavier original code churn, but to be honest, especially with >> this intended to go away again soon anyway, I'd not be opposed >> at all to >> >> ... >> else if ( getenv() ) >> goto no_fake_ht; > > This isn't correct, because you do need to clear htt and cmp_legacy, as > well as zeroing out cores_per_package and threads_per_cache on Intel. > (At least, that's what XenServer's patch does, and it's the best tested.) > >> else >> { >> ... >> no_fake_ht: >> /* >> * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM / >> * XEN_DOMCTL_disable_migrate settings to be reflected correctly in >> * CPUID. Xen will discard these bits if configuration hasn't been >> * set for the domain. >> */ >> p->extd.itsc = true; >> p->basic.vmx = true; >> p->extd.svm = true; >> } >> >> (despite my general dislike of goto). > > Well I think gotos into other blocks is even worse. :-) > > I think the result is a lot nicer to review for sure. Trying to comment despite this having been an attachment: >--- a/tools/libxc/xc_cpuid_x86.c >+++ b/tools/libxc/xc_cpuid_x86.c >@@ -579,6 +579,26 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, > } > else > { >+ if ( getenv("XEN_LIBXC_DISABLE_FAKEHT") ) { The brace wants to move onto its own line. >+ p->basic.htt = false; I think everything below here indeed simply undoes what said old commit did, but I can't match up this one. And together with the question of whether instead leaving it alone, cmp_legacy then would have the same question raised. Jan
On 11/15/19 2:42 PM, Jan Beulich wrote: > On 15.11.2019 15:29, George Dunlap wrote: >> On 11/15/19 2:18 PM, Jan Beulich wrote: >>> On 15.11.2019 15:10, George Dunlap wrote: >>>> On 11/15/19 2:06 PM, Andrew Cooper wrote: >>>>> On 15/11/2019 14:04, George Dunlap wrote: >>>>>>>> It's not entirely uncommon to (also) consider how the resulting >>>>>>>> diff would look like when putting together a change. And besides >>>>>>>> the review aspect, there's also the archeology one - "git blame" >>>>>>>> yields much more helpful results when code doesn't get moved >>>>>>>> around more than necessary. But yes, there's no very clear "this >>>>>>>> is the better option" here. I've taken another look at the code >>>>>>>> before your change though - everything is already nicely in one >>>>>>>> place with Andrew's most recent code reorg. So I'm now having an >>>>>>>> even harder time seeing why you want to move things around again. >>>>>>> We don't. I've recommend twice now to have a single "else if" hunk >>>>>>> which is nearly empty, and much more obviously a gross "make it work for >>>>>>> 4.13" bodge. >>>>>> The results are a tiny bit better, but not much really (see attached). >>>>> >>>>> What I meant was: >>>>> >>>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c >>>>>> index 312c481f1e..bc088e45f0 100644 >>>>>> --- a/tools/libxc/xc_cpuid_x86.c >>>>>> +++ b/tools/libxc/xc_cpuid_x86.c >>>>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >>>>>> } >>>>> >>>>> else if ( getenv() ) >>>>> { >>>>> ... >>>>> } >>>>> >>>>>> else >>>>>> { >>>>> >>>>> With no delta to this block at all. >>>> >>>> Then we have to duplicate this code in both blocks: >>>> >>>> /* >>>> * These settings are necessary to cause earlier >>>> HVM_PARAM_NESTEDHVM / >>>> * XEN_DOMCTL_disable_migrate settings to be reflected correctly in >>>> * CPUID. Xen will discard these bits if configuration hasn't been >>>> * set for the domain. >>>> */ >>>> p->extd.itsc = true; >>>> p->basic.vmx = true; >>>> p->extd.svm = true; >>>> >>>> I won't object if that's what you guys really want. >>> >>> Personally I think the duplication is less bad than the far >>> heavier original code churn, but to be honest, especially with >>> this intended to go away again soon anyway, I'd not be opposed >>> at all to >>> >>> ... >>> else if ( getenv() ) >>> goto no_fake_ht; >> >> This isn't correct, because you do need to clear htt and cmp_legacy, as >> well as zeroing out cores_per_package and threads_per_cache on Intel. >> (At least, that's what XenServer's patch does, and it's the best tested.) >> >>> else >>> { >>> ... >>> no_fake_ht: >>> /* >>> * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM / >>> * XEN_DOMCTL_disable_migrate settings to be reflected correctly in >>> * CPUID. Xen will discard these bits if configuration hasn't been >>> * set for the domain. >>> */ >>> p->extd.itsc = true; >>> p->basic.vmx = true; >>> p->extd.svm = true; >>> } >>> >>> (despite my general dislike of goto). >> >> Well I think gotos into other blocks is even worse. :-) >> >> I think the result is a lot nicer to review for sure. > > Trying to comment despite this having been an attachment: > >> --- a/tools/libxc/xc_cpuid_x86.c >> +++ b/tools/libxc/xc_cpuid_x86.c >> @@ -579,6 +579,26 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, >> } >> else >> { >> + if ( getenv("XEN_LIBXC_DISABLE_FAKEHT") ) { > > The brace wants to move onto its own line. Ack > >> + p->basic.htt = false; > > I think everything below here indeed simply undoes what said old > commit did, but I can't match up this one. And together with the > question of whether instead leaving it alone, cmp_legacy then > would have the same question raised. This is based on a XenServer patch which reverts the entire commit, and has been maintained in the patchqueue since the commit made it upstream, AFAICT. So I'll let someone from that team comment on the wherefores; but as I said, it's by far the best tested option we're going to get. -George
On 15/11/2019 14:55, George Dunlap wrote: >>> + p->basic.htt = false; >> I think everything below here indeed simply undoes what said old >> commit did, but I can't match up this one. And together with the >> question of whether instead leaving it alone, cmp_legacy then >> would have the same question raised. > This is based on a XenServer patch which reverts the entire commit, and > has been maintained in the patchqueue since the commit made it upstream, > AFAICT. So I'll let someone from that team comment on the wherefores; > but as I said, it's by far the best tested option we're going to get. Yes. It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained forwards until now) because it broke migration across that changeset. It is also this exact version of the revert which has been tested and confirmed to fix the Ryzen 3xxx fixes. A separate experiment tried playing with only the flags, via cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes. Given that both the before and after logic here is broken in the eyes of the APM, I'm not overly fussed about working about exactly how. ~Andrew
FTR, please avoid top-posting. :-) On 11/15/19 2:31 PM, Steven Haigh wrote: > Just regarding the use of a system environment variable to turn this > feature / bugfix / hack on and off - this would probably break starting > the VM via the xendomains script. > > If the VM definition is in /etc/xen/auto/, then there would be nothing > to set the environment variable before the VM is launched - hence it > would not be applied and a guest crash would occur... > > Depending on the VM's settings, this would either continue to start & > crash - or just stop again until it could be started with the ENV variable. Right. So a couple of options: 1. Users of xendomains could set the environment variable in their xendomains script 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored (for better or for worse); in the future, when the "fake ht" thing is replaced, we can either continue ignoring it, or give a useful error message saying how it should be changed. 2a. We could have the config option *replace* the environment variable; in which case we'd leave libvirt users high and dry 2b. We could have the config option cause xl to *set* the environment variable, which should continue to allow other toolstacks (even those not using libxl) to potentially work around the issue. Right now I'm leaning towards 2b, and having it be in a separate patch. -George
On 15.11.19 16:05, George Dunlap wrote: > FTR, please avoid top-posting. :-) > > On 11/15/19 2:31 PM, Steven Haigh wrote: >> Just regarding the use of a system environment variable to turn this >> feature / bugfix / hack on and off - this would probably break starting >> the VM via the xendomains script. >> >> If the VM definition is in /etc/xen/auto/, then there would be nothing >> to set the environment variable before the VM is launched - hence it >> would not be applied and a guest crash would occur... >> >> Depending on the VM's settings, this would either continue to start & >> crash - or just stop again until it could be started with the ENV variable. > > Right. So a couple of options: > > 1. Users of xendomains could set the environment variable in their > xendomains script > > 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored > (for better or for worse); in the future, when the "fake ht" thing is > replaced, we can either continue ignoring it, or give a useful error > message saying how it should be changed. > > 2a. We could have the config option *replace* the environment variable; > in which case we'd leave libvirt users high and dry > > 2b. We could have the config option cause xl to *set* the environment > variable, which should continue to allow other toolstacks (even those > not using libxl) to potentially work around the issue. > > Right now I'm leaning towards 2b, and having it be in a separate patch. In which case we should consider having a way to set arbitrary environment variables from the config file in order to avoid this kind of discussion in future similar cases. Juergen
On 11/15/19 3:10 PM, Jürgen Groß wrote: > On 15.11.19 16:05, George Dunlap wrote: >> FTR, please avoid top-posting. :-) >> >> On 11/15/19 2:31 PM, Steven Haigh wrote: >>> Just regarding the use of a system environment variable to turn this >>> feature / bugfix / hack on and off - this would probably break starting >>> the VM via the xendomains script. >>> >>> If the VM definition is in /etc/xen/auto/, then there would be nothing >>> to set the environment variable before the VM is launched - hence it >>> would not be applied and a guest crash would occur... >>> >>> Depending on the VM's settings, this would either continue to start & >>> crash - or just stop again until it could be started with the ENV >>> variable. >> >> Right. So a couple of options: >> >> 1. Users of xendomains could set the environment variable in their >> xendomains script >> >> 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored >> (for better or for worse); in the future, when the "fake ht" thing is >> replaced, we can either continue ignoring it, or give a useful error >> message saying how it should be changed. >> >> 2a. We could have the config option *replace* the environment variable; >> in which case we'd leave libvirt users high and dry >> >> 2b. We could have the config option cause xl to *set* the environment >> variable, which should continue to allow other toolstacks (even those >> not using libxl) to potentially work around the issue. >> >> Right now I'm leaning towards 2b, and having it be in a separate patch. > > In which case we should consider having a way to set arbitrary > environment variables from the config file in order to avoid this kind > of discussion in future similar cases. Right, I was thinking about a good / useful interface; e.g.: workarounds = [ AMD_RYZEN_TOPOLOGY_FIX=ignore ] And then have 'ignore' mean, "Ignore this if you haven't heard of it, or if the option has gone away; 'auto' mean, "Do the most reasonable thing", 'strict' to mean, "Fail domain build if this workaround is obsolete", or something like that. Then users can dial their own "just make it work" vs "tell me it's gone" as they like. -George
On 11/15/19 2:59 PM, Andrew Cooper wrote: > On 15/11/2019 14:55, George Dunlap wrote: >>>> + p->basic.htt = false; >>> I think everything below here indeed simply undoes what said old >>> commit did, but I can't match up this one. And together with the >>> question of whether instead leaving it alone, cmp_legacy then >>> would have the same question raised. >> This is based on a XenServer patch which reverts the entire commit, and >> has been maintained in the patchqueue since the commit made it upstream, >> AFAICT. So I'll let someone from that team comment on the wherefores; >> but as I said, it's by far the best tested option we're going to get. > > Yes. It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained > forwards until now) because it broke migration across that changeset. > > It is also this exact version of the revert which has been tested and > confirmed to fix the Ryzen 3xxx fixes. > > A separate experiment tried playing with only the flags, via > cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes. Is that because the "revert" still clears cmp_legacy, rather than setting it to 1? I think what Jan was getting at was that ca2eee92df44 *sets* htt and *clears* cmp_legacy, but previous to that commit, htt and cmp_legacy weren't changed, they were simply left alone. When reverting this patch, why do we not simply leave it alone, as was done before, rather than actively clearing them? I think it's a good question to ask, but unless there is a known issue with what the patch does, I think it's far less risky just to take the version which has been tested. -George
On 15.11.2019 16:05, George Dunlap wrote: > FTR, please avoid top-posting. :-) > > On 11/15/19 2:31 PM, Steven Haigh wrote: >> Just regarding the use of a system environment variable to turn this >> feature / bugfix / hack on and off - this would probably break starting >> the VM via the xendomains script. >> >> If the VM definition is in /etc/xen/auto/, then there would be nothing >> to set the environment variable before the VM is launched - hence it >> would not be applied and a guest crash would occur... >> >> Depending on the VM's settings, this would either continue to start & >> crash - or just stop again until it could be started with the ENV variable. > > Right. So a couple of options: > > 1. Users of xendomains could set the environment variable in their > xendomains script > > 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored > (for better or for worse); in the future, when the "fake ht" thing is > replaced, we can either continue ignoring it, or give a useful error > message saying how it should be changed. > > 2a. We could have the config option *replace* the environment variable; > in which case we'd leave libvirt users high and dry > > 2b. We could have the config option cause xl to *set* the environment > variable, which should continue to allow other toolstacks (even those > not using libxl) to potentially work around the issue. And how would any of these allow to deal with heterogeneous sets of guests? Jan
On 11/15/19 3:27 PM, Jan Beulich wrote: > On 15.11.2019 16:05, George Dunlap wrote: >> FTR, please avoid top-posting. :-) >> >> On 11/15/19 2:31 PM, Steven Haigh wrote: >>> Just regarding the use of a system environment variable to turn this >>> feature / bugfix / hack on and off - this would probably break starting >>> the VM via the xendomains script. >>> >>> If the VM definition is in /etc/xen/auto/, then there would be nothing >>> to set the environment variable before the VM is launched - hence it >>> would not be applied and a guest crash would occur... >>> >>> Depending on the VM's settings, this would either continue to start & >>> crash - or just stop again until it could be started with the ENV variable. >> >> Right. So a couple of options: >> >> 1. Users of xendomains could set the environment variable in their >> xendomains script >> >> 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored >> (for better or for worse); in the future, when the "fake ht" thing is >> replaced, we can either continue ignoring it, or give a useful error >> message saying how it should be changed. >> >> 2a. We could have the config option *replace* the environment variable; >> in which case we'd leave libvirt users high and dry >> >> 2b. We could have the config option cause xl to *set* the environment >> variable, which should continue to allow other toolstacks (even those >> not using libxl) to potentially work around the issue. > > And how would any of these allow to deal with heterogeneous sets of > guests? Are you perhaps confusing 'xl.cfg' (which is the per-domain configuration file) with 'xl.conf' (which is the system-wide configuration file)? #1 would obviously require arranging for *all* xendomain-enabled guests to be started with the config enabled. #2 would allow heterogeneous guests if the admin went through and added the workaround to the guests she wanted. -George
On 15.11.2019 16:23, George Dunlap wrote: > On 11/15/19 2:59 PM, Andrew Cooper wrote: >> On 15/11/2019 14:55, George Dunlap wrote: >>>>> + p->basic.htt = false; >>>> I think everything below here indeed simply undoes what said old >>>> commit did, but I can't match up this one. And together with the >>>> question of whether instead leaving it alone, cmp_legacy then >>>> would have the same question raised. >>> This is based on a XenServer patch which reverts the entire commit, and >>> has been maintained in the patchqueue since the commit made it upstream, >>> AFAICT. So I'll let someone from that team comment on the wherefores; >>> but as I said, it's by far the best tested option we're going to get. >> >> Yes. It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained >> forwards until now) because it broke migration across that changeset. >> >> It is also this exact version of the revert which has been tested and >> confirmed to fix the Ryzen 3xxx fixes. >> >> A separate experiment tried playing with only the flags, via >> cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes. > > Is that because the "revert" still clears cmp_legacy, rather than > setting it to 1? > > I think what Jan was getting at was that ca2eee92df44 *sets* htt and > *clears* cmp_legacy, but previous to that commit, htt and cmp_legacy > weren't changed, they were simply left alone. When reverting this > patch, why do we not simply leave it alone, as was done before, rather > than actively clearing them? Actually no, I wasn't looking properly - HTT used to be cleared as much as CMP_LEGACY before that change. Somehow I didn't spot the former when putting together my earlier reply (maybe I looked for HTT when its only HT there). So I'm sorry for the extra noise. Jan
On 15.11.2019 16:30, George Dunlap wrote: > On 11/15/19 3:27 PM, Jan Beulich wrote: >> On 15.11.2019 16:05, George Dunlap wrote: >>> FTR, please avoid top-posting. :-) >>> >>> On 11/15/19 2:31 PM, Steven Haigh wrote: >>>> Just regarding the use of a system environment variable to turn this >>>> feature / bugfix / hack on and off - this would probably break starting >>>> the VM via the xendomains script. >>>> >>>> If the VM definition is in /etc/xen/auto/, then there would be nothing >>>> to set the environment variable before the VM is launched - hence it >>>> would not be applied and a guest crash would occur... >>>> >>>> Depending on the VM's settings, this would either continue to start & >>>> crash - or just stop again until it could be started with the ENV variable. >>> >>> Right. So a couple of options: >>> >>> 1. Users of xendomains could set the environment variable in their >>> xendomains script >>> >>> 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored >>> (for better or for worse); in the future, when the "fake ht" thing is >>> replaced, we can either continue ignoring it, or give a useful error >>> message saying how it should be changed. >>> >>> 2a. We could have the config option *replace* the environment variable; >>> in which case we'd leave libvirt users high and dry >>> >>> 2b. We could have the config option cause xl to *set* the environment >>> variable, which should continue to allow other toolstacks (even those >>> not using libxl) to potentially work around the issue. >> >> And how would any of these allow to deal with heterogeneous sets of >> guests? > > Are you perhaps confusing 'xl.cfg' (which is the per-domain > configuration file) with 'xl.conf' (which is the system-wide > configuration file)? Oh, indeed I was. I'm not used to any suffixes on domain config files. I'm sorry. Jan
On 15/11/2019 15:23, George Dunlap wrote: > On 11/15/19 2:59 PM, Andrew Cooper wrote: >> On 15/11/2019 14:55, George Dunlap wrote: >>>>> + p->basic.htt = false; >>>> I think everything below here indeed simply undoes what said old >>>> commit did, but I can't match up this one. And together with the >>>> question of whether instead leaving it alone, cmp_legacy then >>>> would have the same question raised. >>> This is based on a XenServer patch which reverts the entire commit, and >>> has been maintained in the patchqueue since the commit made it upstream, >>> AFAICT. So I'll let someone from that team comment on the wherefores; >>> but as I said, it's by far the best tested option we're going to get. >> Yes. It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained >> forwards until now) because it broke migration across that changeset. >> >> It is also this exact version of the revert which has been tested and >> confirmed to fix the Ryzen 3xxx fixes. >> >> A separate experiment tried playing with only the flags, via >> cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes. > Is that because the "revert" still clears cmp_legacy, rather than > setting it to 1? > > I think what Jan was getting at was that ca2eee92df44 *sets* htt and > *clears* cmp_legacy, but previous to that commit, htt and cmp_legacy > weren't changed, they were simply left alone. When reverting this > patch, why do we not simply leave it alone, as was done before, rather > than actively clearing them? You also need to account for the accumulated bugfixes of the code since ca2eee92df44. > I think it's a good question to ask, but unless there is a known issue > with what the patch does, I think it's far less risky just to take the > version which has been tested. In short, yes I believe the behaviour is deliberate, although I don't have the bug tickets to hand to remember exactly what went wrong. The only other possibility (and perhaps is better, now that it is possible) is to foward those two bits from the host policy. They are set in the guest policy due to (still) not having a split between default and max (another issue in the queue to be fixed). ~Andrew
On 11/15/19 3:34 PM, Jan Beulich wrote: > On 15.11.2019 16:30, George Dunlap wrote: >> On 11/15/19 3:27 PM, Jan Beulich wrote: >>> On 15.11.2019 16:05, George Dunlap wrote: >>>> FTR, please avoid top-posting. :-) >>>> >>>> On 11/15/19 2:31 PM, Steven Haigh wrote: >>>>> Just regarding the use of a system environment variable to turn this >>>>> feature / bugfix / hack on and off - this would probably break starting >>>>> the VM via the xendomains script. >>>>> >>>>> If the VM definition is in /etc/xen/auto/, then there would be nothing >>>>> to set the environment variable before the VM is launched - hence it >>>>> would not be applied and a guest crash would occur... >>>>> >>>>> Depending on the VM's settings, this would either continue to start & >>>>> crash - or just stop again until it could be started with the ENV variable. >>>> >>>> Right. So a couple of options: >>>> >>>> 1. Users of xendomains could set the environment variable in their >>>> xendomains script >>>> >>>> 2. We could add a xl.cfg option. Unknown xl.cfg entries are ignored >>>> (for better or for worse); in the future, when the "fake ht" thing is >>>> replaced, we can either continue ignoring it, or give a useful error >>>> message saying how it should be changed. >>>> >>>> 2a. We could have the config option *replace* the environment variable; >>>> in which case we'd leave libvirt users high and dry >>>> >>>> 2b. We could have the config option cause xl to *set* the environment >>>> variable, which should continue to allow other toolstacks (even those >>>> not using libxl) to potentially work around the issue. >>> >>> And how would any of these allow to deal with heterogeneous sets of >>> guests? >> >> Are you perhaps confusing 'xl.cfg' (which is the per-domain >> configuration file) with 'xl.conf' (which is the system-wide >> configuration file)? > > Oh, indeed I was. I'm not used to any suffixes on domain config > files. I'm sorry. FYI I'm using the names of the respective man pages: `man xl.cfg` gives you the man page for the per-domain config, `man xl.conf` gives you the global config. It's far from obvious, but at least it's something. :-) -George
diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index 312c481f1e..70c85e1467 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, } else { - /* - * Topology for HVM guests is entirely controlled by Xen. For now, we - * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. - */ - p->basic.htt = true; + p->basic.htt = false; p->extd.cmp_legacy = false; - /* - * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. - * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid - * overflow. - */ - if ( !(p->basic.lppp & 0x80) ) - p->basic.lppp *= 2; - switch ( p->x86_vendor ) { case X86_VENDOR_INTEL: for ( i = 0; (p->cache.subleaf[i].type && i < ARRAY_SIZE(p->cache.raw)); ++i ) { - p->cache.subleaf[i].cores_per_package = - (p->cache.subleaf[i].cores_per_package << 1) | 1; + p->cache.subleaf[i].cores_per_package = 0; p->cache.subleaf[i].threads_per_cache = 0; } break; + } - case X86_VENDOR_AMD: - case X86_VENDOR_HYGON: + if ( !getenv("XEN_LIBXC_DISABLE_FAKEHT") ) { /* - * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize. - * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one). - * Update to reflect vLAPIC_ID = vCPU_ID * 2. But avoid - * - overflow, - * - going out of sync with leaf 1 EBX[23:16], - * - incrementing ApicIdCoreSize when it's zero (which changes the - * meaning of bits 7:0). + * Topology for HVM guests is entirely controlled by Xen. For now, we + * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT. */ - if ( p->extd.nc < 0x7f ) + p->basic.htt = true; + + /* + * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package. + * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid + * overflow. + */ + if ( !(p->basic.lppp & 0x80) ) + p->basic.lppp *= 2; + + switch ( p->x86_vendor ) { - if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size != 0xf ) - p->extd.apic_id_size++; + case X86_VENDOR_INTEL: + for ( i = 0; (p->cache.subleaf[i].type && + i < ARRAY_SIZE(p->cache.raw)); ++i ) + { + p->cache.subleaf[i].cores_per_package = + (p->cache.subleaf[i].cores_per_package << 1) | 1; + p->cache.subleaf[i].threads_per_cache = 0; + } + + case X86_VENDOR_AMD: + case X86_VENDOR_HYGON: + /* + * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize. + * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one). + * Update to reflect vLAPIC_ID = vCPU_ID * 2. But avoid + * - overflow, + * - going out of sync with leaf 1 EBX[23:16], + * - incrementing ApicIdCoreSize when it's zero (which changes the + * meaning of bits 7:0). + */ + if ( p->extd.nc < 0x7f ) + { + if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size != 0xf ) + p->extd.apic_id_size++; + + p->extd.nc = (p->extd.nc << 1) | 1; + } + break; - p->extd.nc = (p->extd.nc << 1) | 1; } - break; } /*
Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM guests") attempted to "fake up" a topology which would induce guest operating systems to not treat vcpus as sibling hyperthreads. This involved (among other things) actually reporting hyperthreading as available, but giving vcpus every other APICID. The resulting cpu featureset is invalid, but most operating systems on most hardware managed to cope with it. Unfortunately, Windows running on modern AMD hardware -- including Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets confused by the resulting contradictory feature bits and crashes during installation. (Linux guests have so far continued to cope.) A "proper" fix is complicated and it's too late to fix it either for 4.13, or to backport to supported branches. As a short-term fix, implement an option to disable this "Fake HT" mode. The resulting topology reported will not be canonical, but experimentally continues to work with Windows guests. However, disabling this "Fake HT" mode has not been widely tested, and will almost certainly break migration if applied inconsistently. To minimize impact while allowing administrators to disable "Fake HT" only on guests which are known not to work without it (i.e., Windows guests) on affected hardware, add an environment variable which can be set to disable the "Fake HT" mode on such hardware. Reported-by: Steven Haigh <netwiz@crc.id.au> Reported-by: Andreas Kinzler <hfp@posteo.de> Signed-off-by: George Dunlap <george.dunlap@citrix.com> --- This has been compile-tested only; I'm posting it early to get feedback on the approach. TODO: Prevent such guests from being migrated Open questions: - Is this the right place to put the `getenv` check? - Is there any way we can make migration work, at least in some cases? - Can we check for known-problematic models, and at least report a more useful error? CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Jan Beulich <jbeulich@suse.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Anthony Perard <anthony.perard@citrix.com> --- tools/libxc/xc_cpuid_x86.c | 74 +++++++++++++++++++++++--------------- 1 file changed, 45 insertions(+), 29 deletions(-)