diff mbox series

[RFC] x86: Add hack to disable "Fake HT" mode

Message ID	20191115105739.20333-1-george.dunlap@citrix.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=w8Ow=ZH=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A77D52072B Received-SPF: None (esa4.hc3370-68.iphmx.com: no sender authenticity information available from domain of george.dunlap@citrix.com) identity=pra; client-ip=162.221.158.21; receiver=esa4.hc3370-68.iphmx.com; envelope-from="George.Dunlap@citrix.com"; x-sender="george.dunlap@citrix.com"; x-conformance=sidf_compatible Received-SPF: Pass (esa4.hc3370-68.iphmx.com: domain of George.Dunlap@citrix.com designates 162.221.158.21 as permitted sender) identity=mailfrom; client-ip=162.221.158.21; receiver=esa4.hc3370-68.iphmx.com; envelope-from="George.Dunlap@citrix.com"; x-sender="George.Dunlap@citrix.com"; x-conformance=sidf_compatible; x-record-type="v=spf1"; x-record-text="v=spf1 ip4:209.167.231.154 ip4:178.63.86.133 ip4:195.66.111.40/30 ip4:85.115.9.32/28 ip4:199.102.83.4 ip4:192.28.146.160 ip4:192.28.146.107 ip4:216.52.6.88 ip4:216.52.6.188 ip4:162.221.158.21 ip4:162.221.156.83 ip4:168.245.78.127 ~all" Received-SPF: None (esa4.hc3370-68.iphmx.com: no sender authenticity information available from domain of postmaster@mail.citrix.com) identity=helo; client-ip=162.221.158.21; receiver=esa4.hc3370-68.iphmx.com; envelope-from="George.Dunlap@citrix.com"; x-sender="postmaster@mail.citrix.com"; x-conformance=sidf_compatible IronPort-SDR: YILEz5dv3xC/FnczimzS2zJsO6RowadxBC0IlMW+H3Gcepg2/fXUCDJQrfBT2Wg4oeUU0FWeYg anYcS+S9ga6GM3SOe7/GkEDRznOblSGFVUByp7OsAp/7WoDqBwcc+IhB0pb28KWE4ifffWrIey nbrVDiIeTlSe9CcimQEVJEtvtkR47VphJUUGKZYVGKGt4gPfdFq2lWkpkT2KRupHs39OokBMQe EKBiPJNXMh0rxTgxJ9RIPROBLSwe4TQxh3YaXc7ESqKOs+xqT5r6YWmn2gyB1PDIK/eEUbjQya q4g= From: George Dunlap <george.dunlap@citrix.com> To: <xen-devel@lists.xenproject.org> Date: Fri, 15 Nov 2019 10:57:39 +0000 Message-ID: <20191115105739.20333-1-george.dunlap@citrix.com> MIME-Version: 1.0 Subject: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode Precedence: list Cc: Steven Haigh <netwiz@crc.id.au>, Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, Andreas Kinzler <hfp@posteo.de>, Jan Beulich <jbeulich@suse.com>, Anthony Perard <anthony.perard@citrix.com>, Ian Jackson <ian.jackson@citrix.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	[RFC] x86: Add hack to disable "Fake HT" mode \| expand [RFC] x86: Add hack to disable "Fake HT" mode

Commit Message

George Dunlap Nov. 15, 2019, 10:57 a.m. UTC

Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
guests") attempted to "fake up" a topology which would induce guest
operating systems to not treat vcpus as sibling hyperthreads.  This
involved (among other things) actually reporting hyperthreading as
available, but giving vcpus every other APICID.  The resulting cpu
featureset is invalid, but most operating systems on most hardware
managed to cope with it.

Unfortunately, Windows running on modern AMD hardware -- including
Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
confused by the resulting contradictory feature bits and crashes
during installation.  (Linux guests have so far continued to cope.)

A "proper" fix is complicated and it's too late to fix it either for
4.13, or to backport to supported branches.  As a short-term fix,
implement an option to disable this "Fake HT" mode.  The resulting
topology reported will not be canonical, but experimentally continues
to work with Windows guests.

However, disabling this "Fake HT" mode has not been widely tested, and
will almost certainly break migration if applied inconsistently.

To minimize impact while allowing administrators to disable "Fake HT"
only on guests which are known not to work without it (i.e., Windows
guests) on affected hardware, add an environment variable which can be
set to disable the "Fake HT" mode on such hardware.

Reported-by: Steven Haigh <netwiz@crc.id.au>
Reported-by: Andreas Kinzler <hfp@posteo.de>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
This has been compile-tested only; I'm posting it early to get
feedback on the approach.

TODO: Prevent such guests from being migrated

Open questions:

- Is this the right place to put the `getenv` check?

- Is there any way we can make migration work, at least in some cases?

- Can we check for known-problematic models, and at least report a
  more useful error?

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
---
 tools/libxc/xc_cpuid_x86.c | 74 +++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 29 deletions(-)

Comments

Jan Beulich Nov. 15, 2019, 11:12 a.m. UTC | #1

On 15.11.2019 11:57, George Dunlap wrote:
> Open questions:
> 
> - Is this the right place to put the `getenv` check?
> 
> - Is there any way we can make migration work, at least in some cases?
> 
> - Can we check for known-problematic models, and at least report a
>   more useful error?

Checking for specific models should be straightforward, but I wonder
how sensible it is to compile a likely ever growing list into here.

As to the reporting of an error - you saying "more useful" suggests
there is some error already being reported. But I don't see any here,
nor does any come to mind.

> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>      }
>      else
>      {
> -        /*
> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
> -         */
> -        p->basic.htt = true;
> +        p->basic.htt = false;
>          p->extd.cmp_legacy = false;
>  
> -        /*
> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
> -         * overflow.
> -         */
> -        if ( !(p->basic.lppp & 0x80) )
> -            p->basic.lppp *= 2;
> -

I appreciate you wanting to put all adjustments in a central place, but
at least it makes patch review more difficult. How about you latch
!getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
the function and then the above would become

        if ( !(p->basic.lppp & 0x80) )
            p->basic.lppp <<= fakeht;

and e.g. ...

>          switch ( p->x86_vendor )
>          {
>          case X86_VENDOR_INTEL:
>              for ( i = 0; (p->cache.subleaf[i].type &&
>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>              {
> -                p->cache.subleaf[i].cores_per_package =
> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;

... this

                p->cache.subleaf[i].cores_per_package =
                    (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;

> +                p->cache.subleaf[i].cores_per_package = 0;

This doesn't look correct - you need to leave alone the field if
the adjustment moved down is supposed to have any effect. This
is an example of why the bigger code movement you do is risky.

Jan

Andreas Kinzler Nov. 15, 2019, 11:17 a.m. UTC | #2

On 15.11.2019 11:57, George Dunlap wrote:
> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
> guests") attempted to "fake up" a topology which would induce guest
> operating systems to not treat vcpus as sibling hyperthreads.  This
> involved (among other things) actually reporting hyperthreading as
> available, but giving vcpus every other APICID.  The resulting cpu
> featureset is invalid, but most operating systems on most hardware
> managed to cope with it.
> 
> Unfortunately, Windows running on modern AMD hardware -- including
> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
> confused by the resulting contradictory feature bits and crashes
> during installation.  (Linux guests have so far continued to cope.)

I do not understand a central point: No matter why and/or how a fake 
topology is presented by Xen, why did the older generation Ryzen 2xxx 
work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that 
causes the one to work and the other to fail?

Regards Andreas

George Dunlap Nov. 15, 2019, 11:29 a.m. UTC | #3

On 11/15/19 11:17 AM, Andreas Kinzler wrote:
> On 15.11.2019 11:57, George Dunlap wrote:
>> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
>> guests") attempted to "fake up" a topology which would induce guest
>> operating systems to not treat vcpus as sibling hyperthreads.  This
>> involved (among other things) actually reporting hyperthreading as
>> available, but giving vcpus every other APICID.  The resulting cpu
>> featureset is invalid, but most operating systems on most hardware
>> managed to cope with it.
>>
>> Unfortunately, Windows running on modern AMD hardware -- including
>> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
>> confused by the resulting contradictory feature bits and crashes
>> during installation.  (Linux guests have so far continued to cope.)
> 
> I do not understand a central point: No matter why and/or how a fake
> topology is presented by Xen, why did the older generation Ryzen 2xxx
> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
> causes the one to work and the other to fail?

The CPU features that the guest sees are a mix of the real underlying
features and changes made by Xen.  Xen and/or the hardware will behave
exactly the same in both cases; the only difference is what the guest
operating system does.  The guest operating system code which is trying
to determine the topology sees an impossible franken-monster mix of bits
which its authors never anticipated, and so did not test; and it goes
off the rails somewhere and crashes. Without seeing the code I cannot
tell you how or exactly why it fails on Ryzen 3xxx but not on Ryzen
2xxx.  It could be something as simple as a missed case in a switch
statement somewhere which never happens on real hardware.

Ideally of course such code should never crash, and should probably be
designed such that it muddles on somehow no matter what it gets.  But I
doubt we're going to get a huge amount of sympathy from MS if we request
a patch to allow it to deal with incoherent topology; and in any case
there are already lots of images out there already with the old code.

(Caveat: This is all from my understanding of Andy's description, not
from personal experience.)

 -George

Andreas Kinzler Nov. 15, 2019, 11:39 a.m. UTC | #4

On 15.11.2019 12:29, George Dunlap wrote:
> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>> I do not understand a central point: No matter why and/or how a fake
>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
>> causes the one to work and the other to fail?
> The CPU features that the guest sees are a mix of the real underlying
> features and changes made by Xen.  Xen and/or the hardware will behave

Why not analyze the bits in detail? I already posted the complete CPUID 
for 3700X 
(https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).

@Steven:
 > If this is helpful, I can probably provide the same from:
 >    * Ryzen 2700x
 >    * Ryzen 3900x
Can you post for 2700X?

Then someone with detailed knowledge could compare the two?

Regards Andreas

George Dunlap Nov. 15, 2019, 11:58 a.m. UTC | #5

On 11/15/19 11:12 AM, Jan Beulich wrote:
> On 15.11.2019 11:57, George Dunlap wrote:
>> Open questions:
>>
>> - Is this the right place to put the `getenv` check?
>>
>> - Is there any way we can make migration work, at least in some cases?
>>
>> - Can we check for known-problematic models, and at least report a
>>   more useful error?
> 
> Checking for specific models should be straightforward, but I wonder
> how sensible it is to compile a likely ever growing list into here.

The hope is that this would be a short-term fix, replaced by something
more robust in 4.14.

> 
> As to the reporting of an error - you saying "more useful" suggests
> there is some error already being reported. But I don't see any here,
> nor does any come to mind.

At the moment, if someone buys a Ryzen 3xxx box, installs Xen, and tries
to boot a Windows guest, Windows will simply crash (as I understand it).
 (That is, the issue is "reported" by a BSoD.)  That person will then
google around, and hopefully run across the various threads here, and
see that she should add "export XEN_LIBXC_DISABLE_FAKEHT=1" when
starting their Windows guests.

Suppose instead that after buying a Ryzen 3xxx box, and installing Xen,
when starting a guest, they got the following error message:

"WARNING: This system may be affected by Xen bug $FOO.  Please see
$DOCUMENTATION and set XEN_LIBXC_DISABLE_FAKEHT appropriately."

It would mean annoying people not running Windows unnecessarily; but it
seems like a better experience than having a guest crash and searching
for a solution.

(And to be clear, the reason it's RFC is to ask what people think of
this idea, rather than to argue that this is what we should do.)

> 
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>      }
>>      else
>>      {
>> -        /*
>> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>> -         */
>> -        p->basic.htt = true;
>> +        p->basic.htt = false;
>>          p->extd.cmp_legacy = false;
>>  
>> -        /*
>> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>> -         * overflow.
>> -         */
>> -        if ( !(p->basic.lppp & 0x80) )
>> -            p->basic.lppp *= 2;
>> -
> 
> I appreciate you wanting to put all adjustments in a central place, but
> at least it makes patch review more difficult. How about you latch
> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
> the function and then the above would become
> 
>         if ( !(p->basic.lppp & 0x80) )
>             p->basic.lppp <<= fakeht;
> 
> and e.g. ...
> 
>>          switch ( p->x86_vendor )
>>          {
>>          case X86_VENDOR_INTEL:
>>              for ( i = 0; (p->cache.subleaf[i].type &&
>>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>>              {
>> -                p->cache.subleaf[i].cores_per_package =
>> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
> 
> ... this
> 
>                 p->cache.subleaf[i].cores_per_package =
>                     (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;

I'm afraid I think the code itself would then become more difficult to
read; and it seems a bit strange to be architecting our code based on
limitations of the diff algorithm and/or diff viewer used.

I could break the patch down into two if you think that would make it
easier to review.  You might also try a diff viewer which shows both
before and after; I use 'meld'.

>> +                p->cache.subleaf[i].cores_per_package = 0;
> 
> This doesn't look correct - you need to leave alone the field if
> the adjustment moved down is supposed to have any effect. This
> is an example of why the bigger code movement you do is risky.

Ah yes, I didn't notice the calculation was self-referential.  Let me
take a look.

 -George

George Dunlap Nov. 15, 2019, 12:10 p.m. UTC | #6

On 11/15/19 11:39 AM, Andreas Kinzler wrote:
> On 15.11.2019 12:29, George Dunlap wrote:
>> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>>> I do not understand a central point: No matter why and/or how a fake
>>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
>>> causes the one to work and the other to fail?
>> The CPU features that the guest sees are a mix of the real underlying
>> features and changes made by Xen.  Xen and/or the hardware will behave
> 
> Why not analyze the bits in detail? I already posted the complete CPUID
> for 3700X
> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).
> 
> 
> @Steven:
>> If this is helpful, I can probably provide the same from:
>>    * Ryzen 2700x
>>    * Ryzen 3900x
> Can you post for 2700X?
> 
> Then someone with detailed knowledge could compare the two?

What would be the purpose?

The code is going to look like this --

https://gitlab.com/xen-project/xen/blob/staging/xen/arch/x86/cpu/common.c

-- an impenetrable maze of "switch" and "if" statements based on
individual bits or features or models.  *Somewhere* in Window's version
of that code, there's a path which is triggered by
Ryzen-3xxx-Xen-with-Fake-HT but not triggered by
Ryzen-2xxx-with-Xen-Fake-HT, or Ryzen-3xxx-Xen-without-Fake-HT.

And suppose we found exactly the bits which triggered it, what then?  We
could flip just that bit; but that would make the resulting CPUID even
*more* weird, and likely to trigger something else in the future.

The solution this patch proposes doesn't make the CPUID topology
"normal", but it certainly makes it a lot less weird, which is a better
place to be.

 -George

Jürgen Groß Nov. 15, 2019, 12:23 p.m. UTC | #7

On 15.11.19 11:57, George Dunlap wrote:
> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
> guests") attempted to "fake up" a topology which would induce guest
> operating systems to not treat vcpus as sibling hyperthreads.  This
> involved (among other things) actually reporting hyperthreading as
> available, but giving vcpus every other APICID.  The resulting cpu
> featureset is invalid, but most operating systems on most hardware
> managed to cope with it.
> 
> Unfortunately, Windows running on modern AMD hardware -- including
> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
> confused by the resulting contradictory feature bits and crashes
> during installation.  (Linux guests have so far continued to cope.)
> 
> A "proper" fix is complicated and it's too late to fix it either for
> 4.13, or to backport to supported branches.  As a short-term fix,
> implement an option to disable this "Fake HT" mode.  The resulting
> topology reported will not be canonical, but experimentally continues
> to work with Windows guests.
> 
> However, disabling this "Fake HT" mode has not been widely tested, and
> will almost certainly break migration if applied inconsistently.
> 
> To minimize impact while allowing administrators to disable "Fake HT"
> only on guests which are known not to work without it (i.e., Windows
> guests) on affected hardware, add an environment variable which can be
> set to disable the "Fake HT" mode on such hardware.

Hmm, how is this going to work with libvirt? AFAIK libvirtd running as
a single process is creating all guests. So with this approach you'd
either not be able to use libvirtd, or you'd have to disable "Fake HT"
for all guests, probably by modifying the libvirtd service definition.

> Reported-by: Steven Haigh <netwiz@crc.id.au>
> Reported-by: Andreas Kinzler <hfp@posteo.de>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> This has been compile-tested only; I'm posting it early to get
> feedback on the approach.
> 
> TODO: Prevent such guests from being migrated
> 
> Open questions:
> 
> - Is this the right place to put the `getenv` check?
> 
> - Is there any way we can make migration work, at least in some cases?
> 
> - Can we check for known-problematic models, and at least report a
>    more useful error?

Can't we just disable "Fake HT" automatically on those models? This
would automagically make migration work, too.


Juergen

Jan Beulich Nov. 15, 2019, 12:39 p.m. UTC | #8

On 15.11.2019 12:58, George Dunlap wrote:
> On 11/15/19 11:12 AM, Jan Beulich wrote:
>> On 15.11.2019 11:57, George Dunlap wrote:
>>> --- a/tools/libxc/xc_cpuid_x86.c
>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>      }
>>>      else
>>>      {
>>> -        /*
>>> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>>> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>>> -         */
>>> -        p->basic.htt = true;
>>> +        p->basic.htt = false;
>>>          p->extd.cmp_legacy = false;
>>>  
>>> -        /*
>>> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>>> -         * overflow.
>>> -         */
>>> -        if ( !(p->basic.lppp & 0x80) )
>>> -            p->basic.lppp *= 2;
>>> -
>>
>> I appreciate you wanting to put all adjustments in a central place, but
>> at least it makes patch review more difficult. How about you latch
>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
>> the function and then the above would become
>>
>>         if ( !(p->basic.lppp & 0x80) )
>>             p->basic.lppp <<= fakeht;
>>
>> and e.g. ...
>>
>>>          switch ( p->x86_vendor )
>>>          {
>>>          case X86_VENDOR_INTEL:
>>>              for ( i = 0; (p->cache.subleaf[i].type &&
>>>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>>>              {
>>> -                p->cache.subleaf[i].cores_per_package =
>>> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
>>
>> ... this
>>
>>                 p->cache.subleaf[i].cores_per_package =
>>                     (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;
> 
> I'm afraid I think the code itself would then become more difficult to
> read;

Slightly, but yes.

> and it seems a bit strange to be architecting our code based on
> limitations of the diff algorithm and/or diff viewer used.

It's not entirely uncommon to (also) consider how the resulting
diff would look like when putting together a change. And besides
the review aspect, there's also the archeology one - "git blame"
yields much more helpful results when code doesn't get moved
around more than necessary. But yes, there's no very clear "this
is the better option" here. I've taken another look at the code
before your change though - everything is already nicely in one
place with Andrew's most recent code reorg. So I'm now having an
even harder time seeing why you want to move things around again.

Jan

George Dunlap Nov. 15, 2019, 12:42 p.m. UTC | #9

On 11/15/19 12:23 PM, Jürgen Groß wrote:
> On 15.11.19 11:57, George Dunlap wrote:
>> Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
>> guests") attempted to "fake up" a topology which would induce guest
>> operating systems to not treat vcpus as sibling hyperthreads.  This
>> involved (among other things) actually reporting hyperthreading as
>> available, but giving vcpus every other APICID.  The resulting cpu
>> featureset is invalid, but most operating systems on most hardware
>> managed to cope with it.
>>
>> Unfortunately, Windows running on modern AMD hardware -- including
>> Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
>> confused by the resulting contradictory feature bits and crashes
>> during installation.  (Linux guests have so far continued to cope.)
>>
>> A "proper" fix is complicated and it's too late to fix it either for
>> 4.13, or to backport to supported branches.  As a short-term fix,
>> implement an option to disable this "Fake HT" mode.  The resulting
>> topology reported will not be canonical, but experimentally continues
>> to work with Windows guests.
>>
>> However, disabling this "Fake HT" mode has not been widely tested, and
>> will almost certainly break migration if applied inconsistently.
>>
>> To minimize impact while allowing administrators to disable "Fake HT"
>> only on guests which are known not to work without it (i.e., Windows
>> guests) on affected hardware, add an environment variable which can be
>> set to disable the "Fake HT" mode on such hardware.
> 
> Hmm, how is this going to work with libvirt? AFAIK libvirtd running as
> a single process is creating all guests. So with this approach you'd
> either not be able to use libvirtd, or you'd have to disable "Fake HT"
> for all guests, probably by modifying the libvirtd service definition.

If we went the current route, yes, that's what would need to be done.
Anything else would require changing the interface, which would require
changes to libvirt anyway.

>> Reported-by: Steven Haigh <netwiz@crc.id.au>
>> Reported-by: Andreas Kinzler <hfp@posteo.de>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> This has been compile-tested only; I'm posting it early to get
>> feedback on the approach.
>>
>> TODO: Prevent such guests from being migrated
>>
>> Open questions:
>>
>> - Is this the right place to put the `getenv` check?
>>
>> - Is there any way we can make migration work, at least in some cases?
>>
>> - Can we check for known-problematic models, and at least report a
>>    more useful error?
> 
> Can't we just disable "Fake HT" automatically on those models? This
> would automagically make migration work, too.

What if someone is using only Linux guests on a Ryzen 3xxx box with
4.12.1?  If they naively update to 4.13 without going through a version
that has this backport, they'll get the wrong CPUid on the receiving
side and migration will fail.

There was also concern about disabling the "Fake HT" across the board on
the affected CPUs -- it hasn't been well-tested on Linux, and so it's
not clear whether there will be issues or not.  (If this was checked in
back in June it would be a different matter.)

 -George

Andreas Kinzler Nov. 15, 2019, 12:44 p.m. UTC | #10

On 15.11.2019 13:10, George Dunlap wrote:
> On 11/15/19 11:39 AM, Andreas Kinzler wrote:
>> On 15.11.2019 12:29, George Dunlap wrote:
>>> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>>>> I do not understand a central point: No matter why and/or how a fake
>>>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen that
>>>> causes the one to work and the other to fail?
>>> The CPU features that the guest sees are a mix of the real underlying
>>> features and changes made by Xen.  Xen and/or the hardware will behave
>> Why not analyze the bits in detail? I already posted the complete CPUID
>> for 3700X
>> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).
>> Then someone with detailed knowledge could compare the two?
> What would be the purpose?
> The code is going to look like this --
> an impenetrable maze of "switch" and "if" statements based on
> individual bits or features or models.  *Somewhere* in Window's
> versionof that code, there's a path which is triggered by

As of this moment all of this is just an assumption - you might very 
well be right, but it could also be something totally different. What if 
the CPUID is nearly identical? This would lead to the conclusion that 
the problem has completely different root causes.

Regards Andreas

Andrew Cooper Nov. 15, 2019, 1:55 p.m. UTC | #11

On 15/11/2019 12:39, Jan Beulich wrote:
> On 15.11.2019 12:58, George Dunlap wrote:
>> On 11/15/19 11:12 AM, Jan Beulich wrote:
>>> On 15.11.2019 11:57, George Dunlap wrote:
>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>      }
>>>>      else
>>>>      {
>>>> -        /*
>>>> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>>>> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>>>> -         */
>>>> -        p->basic.htt = true;
>>>> +        p->basic.htt = false;
>>>>          p->extd.cmp_legacy = false;
>>>>  
>>>> -        /*
>>>> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>>>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>>>> -         * overflow.
>>>> -         */
>>>> -        if ( !(p->basic.lppp & 0x80) )
>>>> -            p->basic.lppp *= 2;
>>>> -
>>> I appreciate you wanting to put all adjustments in a central place, but
>>> at least it makes patch review more difficult. How about you latch
>>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
>>> the function and then the above would become
>>>
>>>         if ( !(p->basic.lppp & 0x80) )
>>>             p->basic.lppp <<= fakeht;
>>>
>>> and e.g. ...
>>>
>>>>          switch ( p->x86_vendor )
>>>>          {
>>>>          case X86_VENDOR_INTEL:
>>>>              for ( i = 0; (p->cache.subleaf[i].type &&
>>>>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>>>>              {
>>>> -                p->cache.subleaf[i].cores_per_package =
>>>> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
>>> ... this
>>>
>>>                 p->cache.subleaf[i].cores_per_package =
>>>                     (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;
>> I'm afraid I think the code itself would then become more difficult to
>> read;
> Slightly, but yes.
>
>> and it seems a bit strange to be architecting our code based on
>> limitations of the diff algorithm and/or diff viewer used.
> It's not entirely uncommon to (also) consider how the resulting
> diff would look like when putting together a change. And besides
> the review aspect, there's also the archeology one - "git blame"
> yields much more helpful results when code doesn't get moved
> around more than necessary. But yes, there's no very clear "this
> is the better option" here. I've taken another look at the code
> before your change though - everything is already nicely in one
> place with Andrew's most recent code reorg. So I'm now having an
> even harder time seeing why you want to move things around again.

We don't.  I've recommend twice now to have a single "else if" hunk
which is nearly empty, and much more obviously a gross "make it work for
4.13" bodge.

The only thing which is an open question (IMO) is if/how to trigger this
quirk mode.  There are no good options.  We just need to agree on the
least bad one.

~Andrew

Andrew Cooper Nov. 15, 2019, 2 p.m. UTC | #12

On 15/11/2019 12:44, Andreas Kinzler wrote:
> On 15.11.2019 13:10, George Dunlap wrote:
>> On 11/15/19 11:39 AM, Andreas Kinzler wrote:
>>> On 15.11.2019 12:29, George Dunlap wrote:
>>>> On 11/15/19 11:17 AM, Andreas Kinzler wrote:
>>>>> I do not understand a central point: No matter why and/or how a fake
>>>>> topology is presented by Xen, why did the older generation Ryzen 2xxx
>>>>> work and Ryzen 3xxx doesn't? What is the change in AMD(!) not Xen
>>>>> that
>>>>> causes the one to work and the other to fail?
>>>> The CPU features that the guest sees are a mix of the real underlying
>>>> features and changes made by Xen.  Xen and/or the hardware will behave
>>> Why not analyze the bits in detail? I already posted the complete CPUID
>>> for 3700X
>>> (https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg02189.html).
>>>
>>> Then someone with detailed knowledge could compare the two?
>> What would be the purpose?
>> The code is going to look like this --
>> an impenetrable maze of "switch" and "if" statements based on
>> individual bits or features or models.  *Somewhere* in Window's
>> versionof that code, there's a path which is triggered by
>
> As of this moment all of this is just an assumption - you might very
> well be right, but it could also be something totally different. What
> if the CPUID is nearly identical? This would lead to the conclusion
> that the problem has completely different root causes.

The problem is that Xen's "messing around with topology" is, and has
always been, incorrect.  It does not match the rules given in the SDM/APM.

None of this code is going to survive the rewrite making the presented
topology actually follow the architectural rules.

However, for 4.13, we are trying to find the least invasive hack
possible to cause windows not to explode.

~Andrew

George Dunlap Nov. 15, 2019, 2:04 p.m. UTC | #13

On 11/15/19 1:55 PM, Andrew Cooper wrote:
> On 15/11/2019 12:39, Jan Beulich wrote:
>> On 15.11.2019 12:58, George Dunlap wrote:
>>> On 11/15/19 11:12 AM, Jan Beulich wrote:
>>>> On 15.11.2019 11:57, George Dunlap wrote:
>>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>>      }
>>>>>      else
>>>>>      {
>>>>> -        /*
>>>>> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>>>>> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>>>>> -         */
>>>>> -        p->basic.htt = true;
>>>>> +        p->basic.htt = false;
>>>>>          p->extd.cmp_legacy = false;
>>>>>  
>>>>> -        /*
>>>>> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>>>>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>>>>> -         * overflow.
>>>>> -         */
>>>>> -        if ( !(p->basic.lppp & 0x80) )
>>>>> -            p->basic.lppp *= 2;
>>>>> -
>>>> I appreciate you wanting to put all adjustments in a central place, but
>>>> at least it makes patch review more difficult. How about you latch
>>>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
>>>> the function and then the above would become
>>>>
>>>>         if ( !(p->basic.lppp & 0x80) )
>>>>             p->basic.lppp <<= fakeht;
>>>>
>>>> and e.g. ...
>>>>
>>>>>          switch ( p->x86_vendor )
>>>>>          {
>>>>>          case X86_VENDOR_INTEL:
>>>>>              for ( i = 0; (p->cache.subleaf[i].type &&
>>>>>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>>>>>              {
>>>>> -                p->cache.subleaf[i].cores_per_package =
>>>>> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
>>>> ... this
>>>>
>>>>                 p->cache.subleaf[i].cores_per_package =
>>>>                     (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;
>>> I'm afraid I think the code itself would then become more difficult to
>>> read;
>> Slightly, but yes.
>>
>>> and it seems a bit strange to be architecting our code based on
>>> limitations of the diff algorithm and/or diff viewer used.
>> It's not entirely uncommon to (also) consider how the resulting
>> diff would look like when putting together a change. And besides
>> the review aspect, there's also the archeology one - "git blame"
>> yields much more helpful results when code doesn't get moved
>> around more than necessary. But yes, there's no very clear "this
>> is the better option" here. I've taken another look at the code
>> before your change though - everything is already nicely in one
>> place with Andrew's most recent code reorg. So I'm now having an
>> even harder time seeing why you want to move things around again.
> 
> We don't.  I've recommend twice now to have a single "else if" hunk
> which is nearly empty, and much more obviously a gross "make it work for
> 4.13" bodge.

The results are a tiny bit better, but not much really (see attached).

 -George

George Dunlap Nov. 15, 2019, 2:05 p.m. UTC | #14

On 11/15/19 2:04 PM, George Dunlap wrote:
> On 11/15/19 1:55 PM, Andrew Cooper wrote:
>> On 15/11/2019 12:39, Jan Beulich wrote:
>>> On 15.11.2019 12:58, George Dunlap wrote:
>>>> On 11/15/19 11:12 AM, Jan Beulich wrote:
>>>>> On 15.11.2019 11:57, George Dunlap wrote:
>>>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>>>> @@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>>>      }
>>>>>>      else
>>>>>>      {
>>>>>> -        /*
>>>>>> -         * Topology for HVM guests is entirely controlled by Xen.  For now, we
>>>>>> -         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
>>>>>> -         */
>>>>>> -        p->basic.htt = true;
>>>>>> +        p->basic.htt = false;
>>>>>>          p->extd.cmp_legacy = false;
>>>>>>  
>>>>>> -        /*
>>>>>> -         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
>>>>>> -         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
>>>>>> -         * overflow.
>>>>>> -         */
>>>>>> -        if ( !(p->basic.lppp & 0x80) )
>>>>>> -            p->basic.lppp *= 2;
>>>>>> -
>>>>> I appreciate you wanting to put all adjustments in a central place, but
>>>>> at least it makes patch review more difficult. How about you latch
>>>>> !getenv("XEN_LIBXC_DISABLE_FAKEHT") into a local boolean at the top of
>>>>> the function and then the above would become
>>>>>
>>>>>         if ( !(p->basic.lppp & 0x80) )
>>>>>             p->basic.lppp <<= fakeht;
>>>>>
>>>>> and e.g. ...
>>>>>
>>>>>>          switch ( p->x86_vendor )
>>>>>>          {
>>>>>>          case X86_VENDOR_INTEL:
>>>>>>              for ( i = 0; (p->cache.subleaf[i].type &&
>>>>>>                            i < ARRAY_SIZE(p->cache.raw)); ++i )
>>>>>>              {
>>>>>> -                p->cache.subleaf[i].cores_per_package =
>>>>>> -                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
>>>>> ... this
>>>>>
>>>>>                 p->cache.subleaf[i].cores_per_package =
>>>>>                     (p->cache.subleaf[i].cores_per_package << fakeht) | fakeht;
>>>> I'm afraid I think the code itself would then become more difficult to
>>>> read;
>>> Slightly, but yes.
>>>
>>>> and it seems a bit strange to be architecting our code based on
>>>> limitations of the diff algorithm and/or diff viewer used.
>>> It's not entirely uncommon to (also) consider how the resulting
>>> diff would look like when putting together a change. And besides
>>> the review aspect, there's also the archeology one - "git blame"
>>> yields much more helpful results when code doesn't get moved
>>> around more than necessary. But yes, there's no very clear "this
>>> is the better option" here. I've taken another look at the code
>>> before your change though - everything is already nicely in one
>>> place with Andrew's most recent code reorg. So I'm now having an
>>> even harder time seeing why you want to move things around again.
>>
>> We don't.  I've recommend twice now to have a single "else if" hunk
>> which is nearly empty, and much more obviously a gross "make it work for
>> 4.13" bodge.
> 
> The results are a tiny bit better, but not much really (see attached).

I mean, if we *really* wanted to optimize for diff readability, we could
use `goto`s instead, but I thought that would be going a bit too far.

 -George

Andrew Cooper Nov. 15, 2019, 2:06 p.m. UTC | #15

On 15/11/2019 14:04, George Dunlap wrote:
>>> It's not entirely uncommon to (also) consider how the resulting
>>> diff would look like when putting together a change. And besides
>>> the review aspect, there's also the archeology one - "git blame"
>>> yields much more helpful results when code doesn't get moved
>>> around more than necessary. But yes, there's no very clear "this
>>> is the better option" here. I've taken another look at the code
>>> before your change though - everything is already nicely in one
>>> place with Andrew's most recent code reorg. So I'm now having an
>>> even harder time seeing why you want to move things around again.
>> We don't.  I've recommend twice now to have a single "else if" hunk
>> which is nearly empty, and much more obviously a gross "make it work for
>> 4.13" bodge.
> The results are a tiny bit better, but not much really (see attached).

What I meant was:

> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index 312c481f1e..bc088e45f0 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>      }

else if ( getenv() )
{
    ...
}

>      else
>      {

With no delta to this block at all.

~Andrew

George Dunlap Nov. 15, 2019, 2:10 p.m. UTC | #16

On 11/15/19 2:06 PM, Andrew Cooper wrote:
> On 15/11/2019 14:04, George Dunlap wrote:
>>>> It's not entirely uncommon to (also) consider how the resulting
>>>> diff would look like when putting together a change. And besides
>>>> the review aspect, there's also the archeology one - "git blame"
>>>> yields much more helpful results when code doesn't get moved
>>>> around more than necessary. But yes, there's no very clear "this
>>>> is the better option" here. I've taken another look at the code
>>>> before your change though - everything is already nicely in one
>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>> even harder time seeing why you want to move things around again.
>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>> which is nearly empty, and much more obviously a gross "make it work for
>>> 4.13" bodge.
>> The results are a tiny bit better, but not much really (see attached).
> 
> What I meant was:
> 
>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>> index 312c481f1e..bc088e45f0 100644
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>      }
> 
> else if ( getenv() )
> {
>     ...
> }
> 
>>      else
>>      {
> 
> With no delta to this block at all.

Then we have to duplicate this code in both blocks:

        /*
         * These settings are necessary to cause earlier
HVM_PARAM_NESTEDHVM /
         * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
         * CPUID.  Xen will discard these bits if configuration hasn't been
         * set for the domain.
         */
        p->extd.itsc = true;
        p->basic.vmx = true;
        p->extd.svm = true;

I won't object if that's what you guys really want.

 -George

Andrew Cooper Nov. 15, 2019, 2:14 p.m. UTC | #17

On 15/11/2019 14:10, George Dunlap wrote:
> On 11/15/19 2:06 PM, Andrew Cooper wrote:
>> On 15/11/2019 14:04, George Dunlap wrote:
>>>>> It's not entirely uncommon to (also) consider how the resulting
>>>>> diff would look like when putting together a change. And besides
>>>>> the review aspect, there's also the archeology one - "git blame"
>>>>> yields much more helpful results when code doesn't get moved
>>>>> around more than necessary. But yes, there's no very clear "this
>>>>> is the better option" here. I've taken another look at the code
>>>>> before your change though - everything is already nicely in one
>>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>>> even harder time seeing why you want to move things around again.
>>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>>> which is nearly empty, and much more obviously a gross "make it work for
>>>> 4.13" bodge.
>>> The results are a tiny bit better, but not much really (see attached).
>> What I meant was:
>>
>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>> index 312c481f1e..bc088e45f0 100644
>>> --- a/tools/libxc/xc_cpuid_x86.c
>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>      }
>> else if ( getenv() )
>> {
>>     ...
>> }
>>
>>>      else
>>>      {
>> With no delta to this block at all.
> Then we have to duplicate this code in both blocks:
>
>         /*
>          * These settings are necessary to cause earlier
> HVM_PARAM_NESTEDHVM /
>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>          * CPUID.  Xen will discard these bits if configuration hasn't been
>          * set for the domain.
>          */
>         p->extd.itsc = true;
>         p->basic.vmx = true;
>         p->extd.svm = true;
>
> I won't object if that's what you guys really want.

I'd also be happy with a single "goto hvm_common;".  These tweaks are
already at the end of the block, so suitably positioned.

None of this code is going to survive for long.

~Andrew

Jan Beulich Nov. 15, 2019, 2:18 p.m. UTC | #18

On 15.11.2019 15:10, George Dunlap wrote:
> On 11/15/19 2:06 PM, Andrew Cooper wrote:
>> On 15/11/2019 14:04, George Dunlap wrote:
>>>>> It's not entirely uncommon to (also) consider how the resulting
>>>>> diff would look like when putting together a change. And besides
>>>>> the review aspect, there's also the archeology one - "git blame"
>>>>> yields much more helpful results when code doesn't get moved
>>>>> around more than necessary. But yes, there's no very clear "this
>>>>> is the better option" here. I've taken another look at the code
>>>>> before your change though - everything is already nicely in one
>>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>>> even harder time seeing why you want to move things around again.
>>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>>> which is nearly empty, and much more obviously a gross "make it work for
>>>> 4.13" bodge.
>>> The results are a tiny bit better, but not much really (see attached).
>>
>> What I meant was:
>>
>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>> index 312c481f1e..bc088e45f0 100644
>>> --- a/tools/libxc/xc_cpuid_x86.c
>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>      }
>>
>> else if ( getenv() )
>> {
>>     ...
>> }
>>
>>>      else
>>>      {
>>
>> With no delta to this block at all.
> 
> Then we have to duplicate this code in both blocks:
> 
>         /*
>          * These settings are necessary to cause earlier
> HVM_PARAM_NESTEDHVM /
>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>          * CPUID.  Xen will discard these bits if configuration hasn't been
>          * set for the domain.
>          */
>         p->extd.itsc = true;
>         p->basic.vmx = true;
>         p->extd.svm = true;
> 
> I won't object if that's what you guys really want.

Personally I think the duplication is less bad than the far
heavier original code churn, but to be honest, especially with
this intended to go away again soon anyway, I'd not be opposed
at all to

    ...
    else if ( getenv() )
        goto no_fake_ht;
    else
    {
    ...
 no_fake_ht:
        /*
         * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
         * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
         * CPUID.  Xen will discard these bits if configuration hasn't been
         * set for the domain.
         */
        p->extd.itsc = true;
        p->basic.vmx = true;
        p->extd.svm = true;
    }

(despite my general dislike of goto).

Jan

George Dunlap Nov. 15, 2019, 2:29 p.m. UTC | #19

On 11/15/19 2:18 PM, Jan Beulich wrote:
> On 15.11.2019 15:10, George Dunlap wrote:
>> On 11/15/19 2:06 PM, Andrew Cooper wrote:
>>> On 15/11/2019 14:04, George Dunlap wrote:
>>>>>> It's not entirely uncommon to (also) consider how the resulting
>>>>>> diff would look like when putting together a change. And besides
>>>>>> the review aspect, there's also the archeology one - "git blame"
>>>>>> yields much more helpful results when code doesn't get moved
>>>>>> around more than necessary. But yes, there's no very clear "this
>>>>>> is the better option" here. I've taken another look at the code
>>>>>> before your change though - everything is already nicely in one
>>>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>>>> even harder time seeing why you want to move things around again.
>>>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>>>> which is nearly empty, and much more obviously a gross "make it work for
>>>>> 4.13" bodge.
>>>> The results are a tiny bit better, but not much really (see attached).
>>>
>>> What I meant was:
>>>
>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>>> index 312c481f1e..bc088e45f0 100644
>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>      }
>>>
>>> else if ( getenv() )
>>> {
>>>     ...
>>> }
>>>
>>>>      else
>>>>      {
>>>
>>> With no delta to this block at all.
>>
>> Then we have to duplicate this code in both blocks:
>>
>>         /*
>>          * These settings are necessary to cause earlier
>> HVM_PARAM_NESTEDHVM /
>>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>>          * CPUID.  Xen will discard these bits if configuration hasn't been
>>          * set for the domain.
>>          */
>>         p->extd.itsc = true;
>>         p->basic.vmx = true;
>>         p->extd.svm = true;
>>
>> I won't object if that's what you guys really want.
> 
> Personally I think the duplication is less bad than the far
> heavier original code churn, but to be honest, especially with
> this intended to go away again soon anyway, I'd not be opposed
> at all to
> 
>     ...
>     else if ( getenv() )
>         goto no_fake_ht;

This isn't correct, because you do need to clear htt and cmp_legacy, as
well as zeroing out cores_per_package and threads_per_cache on Intel.
(At least, that's what XenServer's patch does, and it's the best tested.)

>     else
>     {
>     ...
>  no_fake_ht:
>         /*
>          * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>          * CPUID.  Xen will discard these bits if configuration hasn't been
>          * set for the domain.
>          */
>         p->extd.itsc = true;
>         p->basic.vmx = true;
>         p->extd.svm = true;
>     }
> 
> (despite my general dislike of goto).

Well I think gotos into other blocks is even worse. :-)

I think the result is a lot nicer to review for sure.

 -George

Steven Haigh Nov. 15, 2019, 2:31 p.m. UTC | #20

Just regarding the use of a system environment variable to turn this 
feature / bugfix / hack on and off - this would probably break starting 
the VM via the xendomains script.

If the VM definition is in /etc/xen/auto/, then there would be nothing 
to set the environment variable before the VM is launched - hence it 
would not be applied and a guest crash would occur...

Depending on the VM's settings, this would either continue to start & 
crash - or just stop again until it could be started with the ENV 
variable.
Steven Haigh

Jan Beulich Nov. 15, 2019, 2:42 p.m. UTC | #21

On 15.11.2019 15:29, George Dunlap wrote:
> On 11/15/19 2:18 PM, Jan Beulich wrote:
>> On 15.11.2019 15:10, George Dunlap wrote:
>>> On 11/15/19 2:06 PM, Andrew Cooper wrote:
>>>> On 15/11/2019 14:04, George Dunlap wrote:
>>>>>>> It's not entirely uncommon to (also) consider how the resulting
>>>>>>> diff would look like when putting together a change. And besides
>>>>>>> the review aspect, there's also the archeology one - "git blame"
>>>>>>> yields much more helpful results when code doesn't get moved
>>>>>>> around more than necessary. But yes, there's no very clear "this
>>>>>>> is the better option" here. I've taken another look at the code
>>>>>>> before your change though - everything is already nicely in one
>>>>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>>>>> even harder time seeing why you want to move things around again.
>>>>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>>>>> which is nearly empty, and much more obviously a gross "make it work for
>>>>>> 4.13" bodge.
>>>>> The results are a tiny bit better, but not much really (see attached).
>>>>
>>>> What I meant was:
>>>>
>>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>>>> index 312c481f1e..bc088e45f0 100644
>>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>>      }
>>>>
>>>> else if ( getenv() )
>>>> {
>>>>     ...
>>>> }
>>>>
>>>>>      else
>>>>>      {
>>>>
>>>> With no delta to this block at all.
>>>
>>> Then we have to duplicate this code in both blocks:
>>>
>>>         /*
>>>          * These settings are necessary to cause earlier
>>> HVM_PARAM_NESTEDHVM /
>>>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>>>          * CPUID.  Xen will discard these bits if configuration hasn't been
>>>          * set for the domain.
>>>          */
>>>         p->extd.itsc = true;
>>>         p->basic.vmx = true;
>>>         p->extd.svm = true;
>>>
>>> I won't object if that's what you guys really want.
>>
>> Personally I think the duplication is less bad than the far
>> heavier original code churn, but to be honest, especially with
>> this intended to go away again soon anyway, I'd not be opposed
>> at all to
>>
>>     ...
>>     else if ( getenv() )
>>         goto no_fake_ht;
> 
> This isn't correct, because you do need to clear htt and cmp_legacy, as
> well as zeroing out cores_per_package and threads_per_cache on Intel.
> (At least, that's what XenServer's patch does, and it's the best tested.)
> 
>>     else
>>     {
>>     ...
>>  no_fake_ht:
>>         /*
>>          * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
>>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>>          * CPUID.  Xen will discard these bits if configuration hasn't been
>>          * set for the domain.
>>          */
>>         p->extd.itsc = true;
>>         p->basic.vmx = true;
>>         p->extd.svm = true;
>>     }
>>
>> (despite my general dislike of goto).
> 
> Well I think gotos into other blocks is even worse. :-)
> 
> I think the result is a lot nicer to review for sure.

Trying to comment despite this having been an attachment:

>--- a/tools/libxc/xc_cpuid_x86.c
>+++ b/tools/libxc/xc_cpuid_x86.c
>@@ -579,6 +579,26 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>     }
>     else
>     {
>+        if ( getenv("XEN_LIBXC_DISABLE_FAKEHT") ) {

The brace wants to move onto its own line.

>+            p->basic.htt = false;

I think everything below here indeed simply undoes what said old
commit did, but I can't match up this one. And together with the
question of whether instead leaving it alone, cmp_legacy then
would have the same question raised.

Jan

George Dunlap Nov. 15, 2019, 2:55 p.m. UTC | #22

On 11/15/19 2:42 PM, Jan Beulich wrote:
> On 15.11.2019 15:29, George Dunlap wrote:
>> On 11/15/19 2:18 PM, Jan Beulich wrote:
>>> On 15.11.2019 15:10, George Dunlap wrote:
>>>> On 11/15/19 2:06 PM, Andrew Cooper wrote:
>>>>> On 15/11/2019 14:04, George Dunlap wrote:
>>>>>>>> It's not entirely uncommon to (also) consider how the resulting
>>>>>>>> diff would look like when putting together a change. And besides
>>>>>>>> the review aspect, there's also the archeology one - "git blame"
>>>>>>>> yields much more helpful results when code doesn't get moved
>>>>>>>> around more than necessary. But yes, there's no very clear "this
>>>>>>>> is the better option" here. I've taken another look at the code
>>>>>>>> before your change though - everything is already nicely in one
>>>>>>>> place with Andrew's most recent code reorg. So I'm now having an
>>>>>>>> even harder time seeing why you want to move things around again.
>>>>>>> We don't.  I've recommend twice now to have a single "else if" hunk
>>>>>>> which is nearly empty, and much more obviously a gross "make it work for
>>>>>>> 4.13" bodge.
>>>>>> The results are a tiny bit better, but not much really (see attached).
>>>>>
>>>>> What I meant was:
>>>>>
>>>>>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>>>>>> index 312c481f1e..bc088e45f0 100644
>>>>>> --- a/tools/libxc/xc_cpuid_x86.c
>>>>>> +++ b/tools/libxc/xc_cpuid_x86.c
>>>>>> @@ -579,52 +579,71 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>>>>>      }
>>>>>
>>>>> else if ( getenv() )
>>>>> {
>>>>>     ...
>>>>> }
>>>>>
>>>>>>      else
>>>>>>      {
>>>>>
>>>>> With no delta to this block at all.
>>>>
>>>> Then we have to duplicate this code in both blocks:
>>>>
>>>>         /*
>>>>          * These settings are necessary to cause earlier
>>>> HVM_PARAM_NESTEDHVM /
>>>>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>>>>          * CPUID.  Xen will discard these bits if configuration hasn't been
>>>>          * set for the domain.
>>>>          */
>>>>         p->extd.itsc = true;
>>>>         p->basic.vmx = true;
>>>>         p->extd.svm = true;
>>>>
>>>> I won't object if that's what you guys really want.
>>>
>>> Personally I think the duplication is less bad than the far
>>> heavier original code churn, but to be honest, especially with
>>> this intended to go away again soon anyway, I'd not be opposed
>>> at all to
>>>
>>>     ...
>>>     else if ( getenv() )
>>>         goto no_fake_ht;
>>
>> This isn't correct, because you do need to clear htt and cmp_legacy, as
>> well as zeroing out cores_per_package and threads_per_cache on Intel.
>> (At least, that's what XenServer's patch does, and it's the best tested.)
>>
>>>     else
>>>     {
>>>     ...
>>>  no_fake_ht:
>>>         /*
>>>          * These settings are necessary to cause earlier HVM_PARAM_NESTEDHVM /
>>>          * XEN_DOMCTL_disable_migrate settings to be reflected correctly in
>>>          * CPUID.  Xen will discard these bits if configuration hasn't been
>>>          * set for the domain.
>>>          */
>>>         p->extd.itsc = true;
>>>         p->basic.vmx = true;
>>>         p->extd.svm = true;
>>>     }
>>>
>>> (despite my general dislike of goto).
>>
>> Well I think gotos into other blocks is even worse. :-)
>>
>> I think the result is a lot nicer to review for sure.
> 
> Trying to comment despite this having been an attachment:
> 
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -579,6 +579,26 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
>>     }
>>     else
>>     {
>> +        if ( getenv("XEN_LIBXC_DISABLE_FAKEHT") ) {
> 
> The brace wants to move onto its own line.

Ack

> 
>> +            p->basic.htt = false;
> 
> I think everything below here indeed simply undoes what said old
> commit did, but I can't match up this one. And together with the
> question of whether instead leaving it alone, cmp_legacy then
> would have the same question raised.

This is based on a XenServer patch which reverts the entire commit, and
has been maintained in the patchqueue since the commit made it upstream,
AFAICT.  So I'll let someone from that team comment on the wherefores;
but as I said, it's by far the best tested option we're going to get.

 -George

Andrew Cooper Nov. 15, 2019, 2:59 p.m. UTC | #23

On 15/11/2019 14:55, George Dunlap wrote:
>>> +            p->basic.htt = false;
>> I think everything below here indeed simply undoes what said old
>> commit did, but I can't match up this one. And together with the
>> question of whether instead leaving it alone, cmp_legacy then
>> would have the same question raised.
> This is based on a XenServer patch which reverts the entire commit, and
> has been maintained in the patchqueue since the commit made it upstream,
> AFAICT.  So I'll let someone from that team comment on the wherefores;
> but as I said, it's by far the best tested option we're going to get.

Yes.  It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained
forwards until now) because it broke migration across that changeset.

It is also this exact version of the revert which has been tested and
confirmed to fix the Ryzen 3xxx fixes.

A separate experiment tried playing with only the flags, via
cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes.

Given that both the before and after logic here is broken in the eyes of
the APM, I'm not overly fussed about working about exactly how.

~Andrew

George Dunlap Nov. 15, 2019, 3:05 p.m. UTC | #24

FTR, please avoid top-posting. :-)

On 11/15/19 2:31 PM, Steven Haigh wrote:
> Just regarding the use of a system environment variable to turn this
> feature / bugfix / hack on and off - this would probably break starting
> the VM via the xendomains script.
> 
> If the VM definition is in /etc/xen/auto/, then there would be nothing
> to set the environment variable before the VM is launched - hence it
> would not be applied and a guest crash would occur...
> 
> Depending on the VM's settings, this would either continue to start &
> crash - or just stop again until it could be started with the ENV variable.

Right.  So a couple of options:

1. Users of xendomains could set the environment variable in their
xendomains script

2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
(for better or for worse); in the future, when the "fake ht" thing is
replaced, we can either continue ignoring it, or give a useful error
message saying how it should be changed.

2a.  We could have the config option *replace* the environment variable;
in which case we'd leave libvirt users high and dry

2b. We could have the config option cause xl to *set* the environment
variable, which should continue to allow other toolstacks (even those
not using libxl) to potentially work around the issue.

Right now I'm leaning towards 2b, and having it be in a separate patch.

 -George

Jürgen Groß Nov. 15, 2019, 3:10 p.m. UTC | #25

On 15.11.19 16:05, George Dunlap wrote:
> FTR, please avoid top-posting. :-)
> 
> On 11/15/19 2:31 PM, Steven Haigh wrote:
>> Just regarding the use of a system environment variable to turn this
>> feature / bugfix / hack on and off - this would probably break starting
>> the VM via the xendomains script.
>>
>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>> to set the environment variable before the VM is launched - hence it
>> would not be applied and a guest crash would occur...
>>
>> Depending on the VM's settings, this would either continue to start &
>> crash - or just stop again until it could be started with the ENV variable.
> 
> Right.  So a couple of options:
> 
> 1. Users of xendomains could set the environment variable in their
> xendomains script
> 
> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
> (for better or for worse); in the future, when the "fake ht" thing is
> replaced, we can either continue ignoring it, or give a useful error
> message saying how it should be changed.
> 
> 2a.  We could have the config option *replace* the environment variable;
> in which case we'd leave libvirt users high and dry
> 
> 2b. We could have the config option cause xl to *set* the environment
> variable, which should continue to allow other toolstacks (even those
> not using libxl) to potentially work around the issue.
> 
> Right now I'm leaning towards 2b, and having it be in a separate patch.

In which case we should consider having a way to set arbitrary
environment variables from the config file in order to avoid this kind
of discussion in future similar cases.


Juergen

George Dunlap Nov. 15, 2019, 3:15 p.m. UTC | #26

On 11/15/19 3:10 PM, Jürgen Groß wrote:
> On 15.11.19 16:05, George Dunlap wrote:
>> FTR, please avoid top-posting. :-)
>>
>> On 11/15/19 2:31 PM, Steven Haigh wrote:
>>> Just regarding the use of a system environment variable to turn this
>>> feature / bugfix / hack on and off - this would probably break starting
>>> the VM via the xendomains script.
>>>
>>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>>> to set the environment variable before the VM is launched - hence it
>>> would not be applied and a guest crash would occur...
>>>
>>> Depending on the VM's settings, this would either continue to start &
>>> crash - or just stop again until it could be started with the ENV
>>> variable.
>>
>> Right.  So a couple of options:
>>
>> 1. Users of xendomains could set the environment variable in their
>> xendomains script
>>
>> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
>> (for better or for worse); in the future, when the "fake ht" thing is
>> replaced, we can either continue ignoring it, or give a useful error
>> message saying how it should be changed.
>>
>> 2a.  We could have the config option *replace* the environment variable;
>> in which case we'd leave libvirt users high and dry
>>
>> 2b. We could have the config option cause xl to *set* the environment
>> variable, which should continue to allow other toolstacks (even those
>> not using libxl) to potentially work around the issue.
>>
>> Right now I'm leaning towards 2b, and having it be in a separate patch.
> 
> In which case we should consider having a way to set arbitrary
> environment variables from the config file in order to avoid this kind
> of discussion in future similar cases.

Right, I was thinking about a good / useful interface; e.g.:

workarounds = [
    AMD_RYZEN_TOPOLOGY_FIX=ignore
]

And then have 'ignore' mean, "Ignore this if you haven't heard of it, or
if the option has gone away; 'auto' mean, "Do the most reasonable
thing", 'strict' to mean, "Fail domain build if this workaround is
obsolete", or something like that.  Then users can dial their own "just
make it work" vs "tell me it's gone" as they like.

 -George

George Dunlap Nov. 15, 2019, 3:23 p.m. UTC | #27

On 11/15/19 2:59 PM, Andrew Cooper wrote:
> On 15/11/2019 14:55, George Dunlap wrote:
>>>> +            p->basic.htt = false;
>>> I think everything below here indeed simply undoes what said old
>>> commit did, but I can't match up this one. And together with the
>>> question of whether instead leaving it alone, cmp_legacy then
>>> would have the same question raised.
>> This is based on a XenServer patch which reverts the entire commit, and
>> has been maintained in the patchqueue since the commit made it upstream,
>> AFAICT.  So I'll let someone from that team comment on the wherefores;
>> but as I said, it's by far the best tested option we're going to get.
> 
> Yes.  It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained
> forwards until now) because it broke migration across that changeset.
> 
> It is also this exact version of the revert which has been tested and
> confirmed to fix the Ryzen 3xxx fixes.
> 
> A separate experiment tried playing with only the flags, via
> cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes.

Is that because the "revert"  still clears cmp_legacy, rather than
setting it to 1?

I think what Jan was getting at was that ca2eee92df44 *sets* htt and
*clears* cmp_legacy, but previous to that commit, htt and cmp_legacy
weren't changed, they were simply left alone.  When reverting this
patch, why do we not simply leave it alone, as was done before, rather
than actively clearing them?

I think it's a good question to ask, but unless there is a known issue
with what the patch does, I think it's far less risky just to take the
version which has been tested.

 -George

Jan Beulich Nov. 15, 2019, 3:27 p.m. UTC | #28

On 15.11.2019 16:05, George Dunlap wrote:
> FTR, please avoid top-posting. :-)
> 
> On 11/15/19 2:31 PM, Steven Haigh wrote:
>> Just regarding the use of a system environment variable to turn this
>> feature / bugfix / hack on and off - this would probably break starting
>> the VM via the xendomains script.
>>
>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>> to set the environment variable before the VM is launched - hence it
>> would not be applied and a guest crash would occur...
>>
>> Depending on the VM's settings, this would either continue to start &
>> crash - or just stop again until it could be started with the ENV variable.
> 
> Right.  So a couple of options:
> 
> 1. Users of xendomains could set the environment variable in their
> xendomains script
> 
> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
> (for better or for worse); in the future, when the "fake ht" thing is
> replaced, we can either continue ignoring it, or give a useful error
> message saying how it should be changed.
> 
> 2a.  We could have the config option *replace* the environment variable;
> in which case we'd leave libvirt users high and dry
> 
> 2b. We could have the config option cause xl to *set* the environment
> variable, which should continue to allow other toolstacks (even those
> not using libxl) to potentially work around the issue.

And how would any of these allow to deal with heterogeneous sets of
guests?

Jan

George Dunlap Nov. 15, 2019, 3:30 p.m. UTC | #29

On 11/15/19 3:27 PM, Jan Beulich wrote:
> On 15.11.2019 16:05, George Dunlap wrote:
>> FTR, please avoid top-posting. :-)
>>
>> On 11/15/19 2:31 PM, Steven Haigh wrote:
>>> Just regarding the use of a system environment variable to turn this
>>> feature / bugfix / hack on and off - this would probably break starting
>>> the VM via the xendomains script.
>>>
>>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>>> to set the environment variable before the VM is launched - hence it
>>> would not be applied and a guest crash would occur...
>>>
>>> Depending on the VM's settings, this would either continue to start &
>>> crash - or just stop again until it could be started with the ENV variable.
>>
>> Right.  So a couple of options:
>>
>> 1. Users of xendomains could set the environment variable in their
>> xendomains script
>>
>> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
>> (for better or for worse); in the future, when the "fake ht" thing is
>> replaced, we can either continue ignoring it, or give a useful error
>> message saying how it should be changed.
>>
>> 2a.  We could have the config option *replace* the environment variable;
>> in which case we'd leave libvirt users high and dry
>>
>> 2b. We could have the config option cause xl to *set* the environment
>> variable, which should continue to allow other toolstacks (even those
>> not using libxl) to potentially work around the issue.
> 
> And how would any of these allow to deal with heterogeneous sets of
> guests?

Are you perhaps confusing 'xl.cfg' (which is the per-domain
configuration file) with 'xl.conf' (which is the system-wide
configuration file)?

#1 would obviously require arranging for *all* xendomain-enabled guests
to be started with the config enabled.  #2 would allow heterogeneous
guests if the admin went through and added the workaround to the guests
she wanted.

 -George

Jan Beulich Nov. 15, 2019, 3:33 p.m. UTC | #30

On 15.11.2019 16:23, George Dunlap wrote:
> On 11/15/19 2:59 PM, Andrew Cooper wrote:
>> On 15/11/2019 14:55, George Dunlap wrote:
>>>>> +            p->basic.htt = false;
>>>> I think everything below here indeed simply undoes what said old
>>>> commit did, but I can't match up this one. And together with the
>>>> question of whether instead leaving it alone, cmp_legacy then
>>>> would have the same question raised.
>>> This is based on a XenServer patch which reverts the entire commit, and
>>> has been maintained in the patchqueue since the commit made it upstream,
>>> AFAICT.  So I'll let someone from that team comment on the wherefores;
>>> but as I said, it's by far the best tested option we're going to get.
>>
>> Yes.  It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained
>> forwards until now) because it broke migration across that changeset.
>>
>> It is also this exact version of the revert which has been tested and
>> confirmed to fix the Ryzen 3xxx fixes.
>>
>> A separate experiment tried playing with only the flags, via
>> cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes.
> 
> Is that because the "revert"  still clears cmp_legacy, rather than
> setting it to 1?
> 
> I think what Jan was getting at was that ca2eee92df44 *sets* htt and
> *clears* cmp_legacy, but previous to that commit, htt and cmp_legacy
> weren't changed, they were simply left alone.  When reverting this
> patch, why do we not simply leave it alone, as was done before, rather
> than actively clearing them?

Actually no, I wasn't looking properly - HTT used to be cleared as much
as CMP_LEGACY before that change. Somehow I didn't spot the former when
putting together my earlier reply (maybe I looked for HTT when its only
HT there). So I'm sorry for the extra noise.

Jan

Jan Beulich Nov. 15, 2019, 3:34 p.m. UTC | #31

On 15.11.2019 16:30, George Dunlap wrote:
> On 11/15/19 3:27 PM, Jan Beulich wrote:
>> On 15.11.2019 16:05, George Dunlap wrote:
>>> FTR, please avoid top-posting. :-)
>>>
>>> On 11/15/19 2:31 PM, Steven Haigh wrote:
>>>> Just regarding the use of a system environment variable to turn this
>>>> feature / bugfix / hack on and off - this would probably break starting
>>>> the VM via the xendomains script.
>>>>
>>>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>>>> to set the environment variable before the VM is launched - hence it
>>>> would not be applied and a guest crash would occur...
>>>>
>>>> Depending on the VM's settings, this would either continue to start &
>>>> crash - or just stop again until it could be started with the ENV variable.
>>>
>>> Right.  So a couple of options:
>>>
>>> 1. Users of xendomains could set the environment variable in their
>>> xendomains script
>>>
>>> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
>>> (for better or for worse); in the future, when the "fake ht" thing is
>>> replaced, we can either continue ignoring it, or give a useful error
>>> message saying how it should be changed.
>>>
>>> 2a.  We could have the config option *replace* the environment variable;
>>> in which case we'd leave libvirt users high and dry
>>>
>>> 2b. We could have the config option cause xl to *set* the environment
>>> variable, which should continue to allow other toolstacks (even those
>>> not using libxl) to potentially work around the issue.
>>
>> And how would any of these allow to deal with heterogeneous sets of
>> guests?
> 
> Are you perhaps confusing 'xl.cfg' (which is the per-domain
> configuration file) with 'xl.conf' (which is the system-wide
> configuration file)?

Oh, indeed I was. I'm not used to any suffixes on domain config
files. I'm sorry.

Jan

Andrew Cooper Nov. 15, 2019, 3:35 p.m. UTC | #32

On 15/11/2019 15:23, George Dunlap wrote:
> On 11/15/19 2:59 PM, Andrew Cooper wrote:
>> On 15/11/2019 14:55, George Dunlap wrote:
>>>>> +            p->basic.htt = false;
>>>> I think everything below here indeed simply undoes what said old
>>>> commit did, but I can't match up this one. And together with the
>>>> question of whether instead leaving it alone, cmp_legacy then
>>>> would have the same question raised.
>>> This is based on a XenServer patch which reverts the entire commit, and
>>> has been maintained in the patchqueue since the commit made it upstream,
>>> AFAICT.  So I'll let someone from that team comment on the wherefores;
>>> but as I said, it's by far the best tested option we're going to get.
>> Yes.  It is a revert of c/s ca2eee92df44 (Xen 3.4, and maintained
>> forwards until now) because it broke migration across that changeset.
>>
>> It is also this exact version of the revert which has been tested and
>> confirmed to fix the Ryzen 3xxx fixes.
>>
>> A separate experiment tried playing with only the flags, via
>> cpuid="host:htt=0,cmp_legacy=1" and this did not resolve the crashes.
> Is that because the "revert"  still clears cmp_legacy, rather than
> setting it to 1?
>
> I think what Jan was getting at was that ca2eee92df44 *sets* htt and
> *clears* cmp_legacy, but previous to that commit, htt and cmp_legacy
> weren't changed, they were simply left alone.  When reverting this
> patch, why do we not simply leave it alone, as was done before, rather
> than actively clearing them?

You also need to account for the accumulated bugfixes of the code since
ca2eee92df44.

> I think it's a good question to ask, but unless there is a known issue
> with what the patch does, I think it's far less risky just to take the
> version which has been tested.

In short, yes I believe the behaviour is deliberate, although I don't
have the bug tickets to hand to remember exactly what went wrong.

The only other possibility (and perhaps is better, now that it is
possible) is to foward those two bits from the host policy.  They are
set in the guest policy due to (still) not having a split between
default and max (another issue in the queue to be fixed).

~Andrew

George Dunlap Nov. 15, 2019, 3:37 p.m. UTC | #33

On 11/15/19 3:34 PM, Jan Beulich wrote:
> On 15.11.2019 16:30, George Dunlap wrote:
>> On 11/15/19 3:27 PM, Jan Beulich wrote:
>>> On 15.11.2019 16:05, George Dunlap wrote:
>>>> FTR, please avoid top-posting. :-)
>>>>
>>>> On 11/15/19 2:31 PM, Steven Haigh wrote:
>>>>> Just regarding the use of a system environment variable to turn this
>>>>> feature / bugfix / hack on and off - this would probably break starting
>>>>> the VM via the xendomains script.
>>>>>
>>>>> If the VM definition is in /etc/xen/auto/, then there would be nothing
>>>>> to set the environment variable before the VM is launched - hence it
>>>>> would not be applied and a guest crash would occur...
>>>>>
>>>>> Depending on the VM's settings, this would either continue to start &
>>>>> crash - or just stop again until it could be started with the ENV variable.
>>>>
>>>> Right.  So a couple of options:
>>>>
>>>> 1. Users of xendomains could set the environment variable in their
>>>> xendomains script
>>>>
>>>> 2. We could add a xl.cfg option.  Unknown xl.cfg entries are ignored
>>>> (for better or for worse); in the future, when the "fake ht" thing is
>>>> replaced, we can either continue ignoring it, or give a useful error
>>>> message saying how it should be changed.
>>>>
>>>> 2a.  We could have the config option *replace* the environment variable;
>>>> in which case we'd leave libvirt users high and dry
>>>>
>>>> 2b. We could have the config option cause xl to *set* the environment
>>>> variable, which should continue to allow other toolstacks (even those
>>>> not using libxl) to potentially work around the issue.
>>>
>>> And how would any of these allow to deal with heterogeneous sets of
>>> guests?
>>
>> Are you perhaps confusing 'xl.cfg' (which is the per-domain
>> configuration file) with 'xl.conf' (which is the system-wide
>> configuration file)?
> 
> Oh, indeed I was. I'm not used to any suffixes on domain config
> files. I'm sorry.

FYI I'm using the names of the respective man pages: `man xl.cfg` gives
you the man page for the per-domain config, `man xl.conf` gives you the
global config.

It's far from obvious, but at least it's something. :-)

 -George

diff mbox series

Patch

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 312c481f1e..70c85e1467 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -579,52 +579,68 @@  int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
     }
     else
     {
-        /*
-         * Topology for HVM guests is entirely controlled by Xen.  For now, we
-         * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
-         */
-        p->basic.htt = true;
+        p->basic.htt = false;
         p->extd.cmp_legacy = false;
 
-        /*
-         * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
-         * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
-         * overflow.
-         */
-        if ( !(p->basic.lppp & 0x80) )
-            p->basic.lppp *= 2;
-
         switch ( p->x86_vendor )
         {
         case X86_VENDOR_INTEL:
             for ( i = 0; (p->cache.subleaf[i].type &&
                           i < ARRAY_SIZE(p->cache.raw)); ++i )
             {
-                p->cache.subleaf[i].cores_per_package =
-                    (p->cache.subleaf[i].cores_per_package << 1) | 1;
+                p->cache.subleaf[i].cores_per_package = 0;
                 p->cache.subleaf[i].threads_per_cache = 0;
             }
             break;
+        }
 
-        case X86_VENDOR_AMD:
-        case X86_VENDOR_HYGON:
+        if ( !getenv("XEN_LIBXC_DISABLE_FAKEHT") ) {
             /*
-             * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
-             * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
-             * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
-             * - overflow,
-             * - going out of sync with leaf 1 EBX[23:16],
-             * - incrementing ApicIdCoreSize when it's zero (which changes the
-             *   meaning of bits 7:0).
+             * Topology for HVM guests is entirely controlled by Xen.  For now, we
+             * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
              */
-            if ( p->extd.nc < 0x7f )
+            p->basic.htt = true;
+
+            /*
+             * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
+             * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
+             * overflow.
+             */
+            if ( !(p->basic.lppp & 0x80) )
+                p->basic.lppp *= 2;
+
+            switch ( p->x86_vendor )
             {
-                if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size != 0xf )
-                    p->extd.apic_id_size++;
+            case X86_VENDOR_INTEL:
+                for ( i = 0; (p->cache.subleaf[i].type &&
+                              i < ARRAY_SIZE(p->cache.raw)); ++i )
+                {
+                    p->cache.subleaf[i].cores_per_package =
+                        (p->cache.subleaf[i].cores_per_package << 1) | 1;
+                    p->cache.subleaf[i].threads_per_cache = 0;
+                }
+
+            case X86_VENDOR_AMD:
+            case X86_VENDOR_HYGON:
+                /*
+                 * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
+                 * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
+                 * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
+                 * - overflow,
+                 * - going out of sync with leaf 1 EBX[23:16],
+                 * - incrementing ApicIdCoreSize when it's zero (which changes the
+                 *   meaning of bits 7:0).
+                 */
+                if ( p->extd.nc < 0x7f )
+                {
+                    if ( p->extd.apic_id_size != 0 && p->extd.apic_id_size != 0xf )
+                        p->extd.apic_id_size++;
+
+                    p->extd.nc = (p->extd.nc << 1) | 1;
+                }
+                break;
 
-                p->extd.nc = (p->extd.nc << 1) | 1;
             }
-            break;
         }
 
         /*