[0/2] expose host-phys-bits to guest

Message ID	20220831125059.170032-1-kraxel@redhat.com
From: Gerd Hoffmann <kraxel@redhat.com>
To: qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org, Marcelo Tosatti <mtosatti@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Eduardo Habkost <eduardo@habkost.net>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, "Michael S. Tsirkin" <mst@redhat.com>, Sergio Lopez <slp@redhat.com>, Gerd Hoffmann <kraxel@redhat.com>
Subject: [PATCH 0/2] expose host-phys-bits to guest
Date: Wed, 31 Aug 2022 14:50:57 +0200
Series: expose host-phys-bits to guest
  [0/2] expose host-phys-bits to guest
  [1/2,hack] reserve bit KVM_HINTS_HOST_PHYS_BITS
  [2/2,RfC] expose host-phys-bits to guest

Gerd Hoffmann Aug. 31, 2022, 12:50 p.m. UTC

When the guest (firmware specifically) knows how big
the address space actually is it can be used better.

Some more background:
  https://bugzilla.redhat.com/show_bug.cgi?id=2084533

This RfC series exposes the information via cpuid.

take care,
  Gerd

Gerd Hoffmann (2):
  [hack] reserve bit KVM_HINTS_HOST_PHYS_BITS
  [RfC] expose host-phys-bits to guest

 include/standard-headers/asm-x86/kvm_para.h | 3 ++-
 target/i386/cpu.h                           | 3 ---
 hw/i386/microvm.c                           | 6 +++++-
 target/i386/cpu.c                           | 3 +--
 target/i386/host-cpu.c                      | 4 +++-
 target/i386/kvm/kvm.c                       | 1 +
 6 files changed, 12 insertions(+), 8 deletions(-)

Xiaoyao Li Sept. 1, 2022, 6:07 a.m. UTC | #1

On 8/31/2022 8:50 PM, Gerd Hoffmann wrote:
> When the guest (firmware specifically) knows how big
> the address space actually is it can be used better.
> 
> Some more background:
>    https://bugzilla.redhat.com/show_bug.cgi?id=2084533

QEMU enables host-phys-bits for "-cpu host/max" in 
host_cpu_max_instance_init();

I think the problem is for all the named CPU models, that they don't have
phys_bits defined. Thus they all have "cpu->phys_bits == 0", which leads
to cpu->phys_bits = TCG_PHYS_ADDR_BITS (36 for 32-bit builds and 40 for
64-bit builds).
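
The fallback described above can be sketched roughly as follows. This is a simplified model, not QEMU's code: `struct cpu_cfg` and `effective_phys_bits()` are hypothetical names, and the constant simply reuses the 64-bit value quoted in this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default: the value quoted in the thread for 64-bit builds. */
#define TCG_PHYS_ADDR_BITS 40

/* Hypothetical, simplified stand-in for the relevant CPU properties. */
struct cpu_cfg {
    uint32_t phys_bits;   /* 0 means "not set by the CPU model" */
    bool host_phys_bits;  /* defaults to on for -cpu host/max */
};

/* Which phys-bits value the guest ends up seeing in this model. */
static uint32_t effective_phys_bits(const struct cpu_cfg *cfg,
                                    uint32_t host_bits)
{
    if (cfg->host_phys_bits) {
        return host_bits;           /* follow the host */
    }
    if (cfg->phys_bits == 0) {
        return TCG_PHYS_ADDR_BITS;  /* named model without phys_bits */
    }
    return cfg->phys_bits;          /* explicit phys-bits=<xx> */
}
```

In this model a named CPU model on a host with 39 physical address bits still reports 40 to the guest, i.e. more than the hardware can address.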

Anyway, IMO, guest including guest firmware, should always consult from 
CPUID leaf 0x80000008 for physical address length. It is the duty of 
userspace VMM, here QEMU, to ensure VM's host physical address length 
not exceeding host's. If userspace VMM cannot ensure this, guest is 
likely hitting problem.
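
For reference, reading the physical address length from CPUID leaf 0x80000008 might look like the sketch below. EAX bits 7:0 hold the physical address bits and bits 15:8 the linear address bits. The helper names are illustrative only, and `__get_cpuid()` from the GCC/Clang `<cpuid.h>` header is assumed to be available on x86 builds.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.80000008H:EAX[7:0] = physical address bits. */
static uint32_t cpuid_phys_bits(uint32_t eax)
{
    return eax & 0xff;
}

/* CPUID.80000008H:EAX[15:8] = linear address bits. */
static uint32_t cpuid_linear_bits(uint32_t eax)
{
    return (eax >> 8) & 0xff;
}

/* Returns the reported physical address bits, or 0 if the leaf is
 * unavailable (or the build target is not x86). */
static uint32_t read_phys_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
        return cpuid_phys_bits(eax);
    }
#endif
    return 0;
}
```

For example, an EAX value of 0x3027 decodes to 39 physical and 48 linear address bits.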

> This RfC series exposes the information via cpuid.
> 
> take care,
>    Gerd
> 
> Gerd Hoffmann (2):
>    [hack] reserve bit KVM_HINTS_HOST_PHYS_BITS
>    [RfC] expose host-phys-bits to guest
> 
>   include/standard-headers/asm-x86/kvm_para.h | 3 ++-
>   target/i386/cpu.h                           | 3 ---
>   hw/i386/microvm.c                           | 6 +++++-
>   target/i386/cpu.c                           | 3 +--
>   target/i386/host-cpu.c                      | 4 +++-
>   target/i386/kvm/kvm.c                       | 1 +
>   6 files changed, 12 insertions(+), 8 deletions(-)
>

Gerd Hoffmann Sept. 1, 2022, 1:58 p.m. UTC | #2

Hi,

> I think the problem is for all the named CPU models, that they don't have
> phys_bits defined. Thus they all have "cpu->phys_bits == 0", which leads to
> cpu->phys_bits = TCG_PHYS_ADDR_BITS (36 for 32-bit builds and 40 for 64-bit
> builds).

Exactly.  And if you run on hardware with phys-bits being 36 or 39
(common for intel desktop processors) things explode when the guest
tries to use the whole range.

> Anyway, IMO, guest including guest firmware, should always consult from
> CPUID leaf 0x80000008 for physical address length.

It simply can't for the reason outlined above.  Even if we fix qemu
today that doesn't solve the problem for the firmware because we want
backward compatibility with older qemu versions.  That's why I want the
extra bit which essentially says "CPUID leaf 0x80000008 actually works".
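
The firmware-side logic this implies could be sketched as below. The bit position and the conservative fallback value are assumptions made for illustration: the series only reserves a hint bit, and its actual value is not stated in this thread.

```c
#include <stdint.h>

/* Hypothetical bit position for the proposed hint, for illustration only. */
#define SKETCH_KVM_HINTS_HOST_PHYS_BITS (1u << 1)

/* Assumed conservative limit used when CPUID cannot be trusted. */
#define SKETCH_CONSERVATIVE_PHYS_BITS 36u

/* Decide how many physical address bits the firmware will use. */
static uint32_t firmware_phys_bits(uint32_t kvm_hints, uint32_t cpuid_bits)
{
    if (kvm_hints & SKETCH_KVM_HINTS_HOST_PHYS_BITS) {
        /* The host says CPUID leaf 0x80000008 actually works. */
        return cpuid_bits;
    }
    /* Old qemu: hint not set, so stay within the conservative range. */
    return cpuid_bits < SKETCH_CONSERVATIVE_PHYS_BITS
               ? cpuid_bits
               : SKETCH_CONSERVATIVE_PHYS_BITS;
}
```

On an older qemu the hint is never set, so the firmware behaves exactly as it does today.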

take care,
  Gerd

Xiaoyao Li Sept. 1, 2022, 2:36 p.m. UTC | #3

On 9/1/2022 9:58 PM, Gerd Hoffmann wrote:

>> Anyway, IMO, guest including guest firmware, should always consult from
>> CPUID leaf 0x80000008 for physical address length.
> 
> It simply can't for the reason outlined above.  Even if we fix qemu
> today that doesn't solve the problem for the firmware because we want
> backward compatibility with older qemu versions.  That's why I want the
> extra bit which essentially says "CPUID leaf 0x80000008 actually works".

I don't understand how it is backward compatible with older qemu versions. 
Old QEMU won't set the extra bit you introduced in this series, and all 
the guest created with old QEMU will become untrusted on CPUID leaf 
0x80000008 ?

> take care,
>    Gerd
>

Claudio Fontana Sept. 1, 2022, 2:55 p.m. UTC | #4

On 9/1/22 08:07, Xiaoyao Li wrote:
> On 8/31/2022 8:50 PM, Gerd Hoffmann wrote:
>> When the guest (firmware specifically) knows how big
>> the address space actually is it can be used better.
>>
>> Some more background:
>>    https://bugzilla.redhat.com/show_bug.cgi?id=2084533
> 
> QEMU enables host-phys-bits for "-cpu host/max" in 
> host_cpu_max_instance_init();

No, in host_cpu_max_instance_init the default for host-phys-bits is set to on.

You can still get the phys bits adjusted if you set the property to on manually for other cpu models.

> 
> I think the problem is for all the named CPU models, that they don't have
> phys_bits defined. Thus they all have "cpu->phys_bits == 0", which leads
> to cpu->phys_bits = TCG_PHYS_ADDR_BITS (36 for 32-bit builds and 40 for
> 64-bit builds).
> 
> Anyway, IMO, guest including guest firmware, should always consult from 
> CPUID leaf 0x80000008 for physical address length. It is the duty of 
> userspace VMM, here QEMU, to ensure VM's host physical address length 
> not exceeding host's. If userspace VMM cannot ensure this, guest is 
> likely hitting problem.
> 
>> This RfC series exposes the information via cpuid.
>>
>> take care,
>>    Gerd
>>
>> Gerd Hoffmann (2):
>>    [hack] reserve bit KVM_HINTS_HOST_PHYS_BITS
>>    [RfC] expose host-phys-bits to guest
>>
>>   include/standard-headers/asm-x86/kvm_para.h | 3 ++-
>>   target/i386/cpu.h                           | 3 ---
>>   hw/i386/microvm.c                           | 6 +++++-
>>   target/i386/cpu.c                           | 3 +--
>>   target/i386/host-cpu.c                      | 4 +++-
>>   target/i386/kvm/kvm.c                       | 1 +
>>   6 files changed, 12 insertions(+), 8 deletions(-)
>>
> 
>

Gerd Hoffmann Sept. 1, 2022, 4:17 p.m. UTC | #5

On Thu, Sep 01, 2022 at 10:36:19PM +0800, Xiaoyao Li wrote:
> On 9/1/2022 9:58 PM, Gerd Hoffmann wrote:
> 
> > > Anyway, IMO, guest including guest firmware, should always consult from
> > > CPUID leaf 0x80000008 for physical address length.
> > 
> > It simply can't for the reason outlined above.  Even if we fix qemu
> > today that doesn't solve the problem for the firmware because we want
> > backward compatibility with older qemu versions.  That's why I want the
> > extra bit which essentially says "CPUID leaf 0x80000008 actually works".
> 
> I don't understand how it is backward compatible with older qemu versions. Old
> QEMU won't set the extra bit you introduced in this series, and all the
> guest created with old QEMU will become untrusted on CPUID leaf 0x80000008 ?

Correct, on old qemu firmware will not trust CPUID leaf 0x80000008.
That is not worse than the situation we have today, currently the
firmware never trusts CPUID leaf 0x80000008.

So the patches will improve the situation for new qemu only, but I
don't see a way around that.

take care,
  Gerd

Xiaoyao Li Sept. 2, 2022, 12:10 a.m. UTC | #6

On 9/2/2022 12:17 AM, Gerd Hoffmann wrote:
> On Thu, Sep 01, 2022 at 10:36:19PM +0800, Xiaoyao Li wrote:
>> On 9/1/2022 9:58 PM, Gerd Hoffmann wrote:
>>
>>>> Anyway, IMO, guest including guest firmware, should always consult from
>>>> CPUID leaf 0x80000008 for physical address length.
>>>
>>> It simply can't for the reason outlined above.  Even if we fix qemu
>>> today that doesn't solve the problem for the firmware because we want
>>> backward compatibility with older qemu versions.  That's why I want the
>>> extra bit which essentially says "CPUID leaf 0x80000008 actually works".
>>
>> I don't understand how it is backward compatible with older qemu versions. Old
>> QEMU won't set the extra bit you introduced in this series, and all the
>> guest created with old QEMU will become untrusted on CPUID leaf 0x80000008 ?
> 
> Correct, on old qemu firmware will not trust CPUID leaf 0x80000008.
> That is not worse than the situation we have today, currently the
> firmware never trusts CPUID leaf 0x80000008.
> 
> So the patches will improve the situation for new qemu only, but I
> don't see a way around that.
> 

I see.

But IMHO, I don't think it's good that guest firmware workaround the 
issue on its own. Instead, it's better to just trust CPUID leaf 
0x80000008 and fail if the given physical address length cannot be 
virtualized/supported.

It's just the bug of VMM to virtualize the physical address length. The 
correct direction is to fix the bug, not the workaround to hide the bug.

Gerd Hoffmann Sept. 2, 2022, 6:07 a.m. UTC | #7

On Fri, Sep 02, 2022 at 08:10:00AM +0800, Xiaoyao Li wrote:
> On 9/2/2022 12:17 AM, Gerd Hoffmann wrote:
> > On Thu, Sep 01, 2022 at 10:36:19PM +0800, Xiaoyao Li wrote:
> > > On 9/1/2022 9:58 PM, Gerd Hoffmann wrote:
> > > 
> > > > > Anyway, IMO, guest including guest firmware, should always consult from
> > > > > CPUID leaf 0x80000008 for physical address length.
> > > > 
> > > > It simply can't for the reason outlined above.  Even if we fix qemu
> > > > today that doesn't solve the problem for the firmware because we want
> > > > backward compatibility with older qemu versions.  That's why I want the
> > > > extra bit which essentially says "CPUID leaf 0x80000008 actually works".
> > > 
> > > I don't understand how it is backward compatible with older qemu versions. Old
> > > QEMU won't set the extra bit you introduced in this series, and all the
> > > guest created with old QEMU will become untrusted on CPUID leaf 0x80000008 ?
> > 
> > Correct, on old qemu firmware will not trust CPUID leaf 0x80000008.
> > That is not worse than the situation we have today, currently the
> > firmware never trusts CPUID leaf 0x80000008.
> > 
> > So the patches will improve the situation for new qemu only, but I
> > don't see a way around that.
> > 
> 
> I see.
> 
> But IMHO, I don't think it's good that guest firmware workaround the issue
> on its own. Instead, it's better to just trust CPUID leaf 0x80000008 and
> fail if the given physical address length cannot be virtualized/supported.
> 
> It's just the bug of VMM to virtualize the physical address length. The
> correct direction is to fix the bug, not the workaround to hide the bug.

I'm starting to repeat myself. "just trust CPUID leaf 0x80000008"
doesn't work because you simply can't with current qemu versions.

I don't like the dance with the new bit very much either, but I don't
see a better way without massive fallout due to compatibility problems.
I'm open to suggestions though.

take care,
  Gerd

Michael S. Tsirkin Sept. 2, 2022, 6:35 a.m. UTC | #8

On Fri, Sep 02, 2022 at 08:07:20AM +0200, Gerd Hoffmann wrote:
> On Fri, Sep 02, 2022 at 08:10:00AM +0800, Xiaoyao Li wrote:
> > On 9/2/2022 12:17 AM, Gerd Hoffmann wrote:
> > > On Thu, Sep 01, 2022 at 10:36:19PM +0800, Xiaoyao Li wrote:
> > > > On 9/1/2022 9:58 PM, Gerd Hoffmann wrote:
> > > > 
> > > > > > Anyway, IMO, guest including guest firmware, should always consult from
> > > > > > CPUID leaf 0x80000008 for physical address length.
> > > > > 
> > > > > It simply can't for the reason outlined above.  Even if we fix qemu
> > > > > today that doesn't solve the problem for the firmware because we want
> > > > > backward compatibility with older qemu versions.  That's why I want the
> > > > > extra bit which essentially says "CPUID leaf 0x80000008 actually works".
> > > > 
> > > > I don't understand how it is backward compatible with older qemu versions. Old
> > > > QEMU won't set the extra bit you introduced in this series, and all the
> > > > guest created with old QEMU will become untrusted on CPUID leaf 0x80000008 ?
> > > 
> > > Correct, on old qemu firmware will not trust CPUID leaf 0x80000008.
> > > That is not worse than the situation we have today, currently the
> > > firmware never trusts CPUID leaf 0x80000008.
> > > 
> > > So the patches will improve the situation for new qemu only, but I
> > > don't see a way around that.
> > > 
> > 
> > I see.
> > 
> > But IMHO, I don't think it's good that guest firmware workaround the issue
> > on its own. Instead, it's better to just trust CPUID leaf 0x80000008 and
> > fail if the given physical address length cannot be virtualized/supported.
> > 
> > It's just the bug of VMM to virtualize the physical address length. The
> > correct direction is to fix the bug, not the workaround to hide the bug.
> 
> I'm starting to repeat myself. "just trust CPUID leaf 0x80000008"
> doesn't work because you simply can't with current qemu versions.
> 
> I don't like the dance with the new bit very much either, but I don't
> see a better way without massive fallout due to compatibility problems.
> I'm open to suggestions though.
> 
> take care,
>   Gerd

I feel there are three major sources of controversy here

0. the cover letter and subject don't do such a good job
   explaining that what we are doing is just telling guest
   CPUID is not broken. we are not exposing anything new
   and not exposing host capability to guest, for example,
   if cpuid phys address is smaller than host things also
   work fine.

1. really the naming.  We need to be more explicit that it's just a bugfix.

2. down the road we will want to switch the default when no PV. however,
   some hosts might still want conservative firmware for compatibility
   reasons, so I think we need a way to tell firmware
   "ignore phys address width in CPUID like you did in the past".
   let's add a flag for that?
   and if none are set firmware should print a warning, though I
   do not know how many people will see that. Maybe some ;)

along the lines of:

/*
 * Old KVM hosts often reported incorrect phys address width,
 * so firmware had to be very conservative in its use of physical
 * addresses. 
 * One of the two following flags should be set.
 * If none are set firmware is for now conservative, but that will
 * likely change in the future, hosts should not rely on that.
 */
/*
 * KVM with non-broken phys address width should set this flag;
 * firmware will then be allowed to use all phys address bits.
 */
#define KVM_BUG_PHYS_ADDRESS_WIDTH_NONBROKEN 1
/*
 * Force firmware to be very conservative in its use of physical
 * addresses, ignoring phys address width in CPUID.
 * Helpful for migration between hosts with different capabilities.
 */
#define KVM_BUG_PHYS_ADDRESS_WIDTH_BROKEN 2
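
One way firmware might act on these two flags is sketched below. The enum and function names are hypothetical, the flag values are treated as a bitmask, and letting the "broken" flag win when both are set is an assumption; the proposal only says that one of the two should be set.

```c
#include <stdint.h>

/* Flag values as proposed above; treating them as a bitmask is an assumption. */
#define KVM_BUG_PHYS_ADDRESS_WIDTH_NONBROKEN 1
#define KVM_BUG_PHYS_ADDRESS_WIDTH_BROKEN 2

/* Hypothetical firmware policy derived from the flags. */
enum phys_width_policy {
    PHYS_WIDTH_TRUST_CPUID,        /* use all phys address bits from CPUID */
    PHYS_WIDTH_CONSERVATIVE,       /* host asked for conservative behavior */
    PHYS_WIDTH_CONSERVATIVE_WARN,  /* neither flag set: conservative + warning */
};

static enum phys_width_policy phys_width_policy(uint32_t flags)
{
    /* Assumption: an explicit "broken" request wins if both are set. */
    if (flags & KVM_BUG_PHYS_ADDRESS_WIDTH_BROKEN) {
        return PHYS_WIDTH_CONSERVATIVE;
    }
    if (flags & KVM_BUG_PHYS_ADDRESS_WIDTH_NONBROKEN) {
        return PHYS_WIDTH_TRUST_CPUID;
    }
    /* Old host: stay conservative for now, but tell the user. */
    return PHYS_WIDTH_CONSERVATIVE_WARN;
}
```

Keeping the "neither flag set" case separate is what would later allow the default to change without affecting hosts that explicitly ask for conservative firmware.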

Gerd Hoffmann Sept. 2, 2022, 8:44 a.m. UTC | #9

Hi,
 
> I feel there are three major sources of controversy here
> 
> 0. the cover letter and subject don't do such a good job
>    explaining that what we are doing is just telling guest
>    CPUID is not broken. we are not exposing anything new
>    and not exposing host capability to guest, for example,
>    if cpuid phys address is smaller than host things also
>    work fine.
> 
> 1. really the naming.  We need to be more explicit that it's just a bugfix.

Yep, I'll go improve that for v2.

> 2. down the road we will want to switch the default when no PV. however,
>    some hosts might still want conservative firmware for compatibility
>    reasons, so I think we need a way to tell firmware
>    "ignore phys address width in CPUID like you did in the past".
>    let's add a flag for that?
>    and if none are set firmware should print a warning, though I
>    do not know how many people will see that. Maybe some ;)

> /*
>  * Force firmware to be very conservative in its use of physical
>  * addresses, ignoring phys address width in CPUID.
>  * Helpful for migration between hosts with different capabilities.
>  */
> #define KVM_BUG_PHYS_ADDRESS_WIDTH_BROKEN 2

I don't see a need for that.  Live migration compatibility can be
handled just fine today using
	'host-phys-bits=on,host-phys-bits-limit=<xx>'

Which is similar to 'phys-bits=<xx>'.

The important difference is that phys-bits allows pretty much anything
whereas host-phys-bits-limit applies sanity checks against the host
supported phys bits and throws error on invalid values.
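
As a rough illustration of why a limit tied to the host value is safer for migration than a free-form phys-bits, consider the simplified model below. The function name is hypothetical and the sketch only models capping at the limit; how qemu reports an invalid value is not modeled here.

```c
#include <stdint.h>

/* Simplified model of host-phys-bits=on,host-phys-bits-limit=<limit>.
 * A limit of 0 is taken to mean "no limit configured" (assumption). */
static uint32_t limited_phys_bits(uint32_t host_bits, uint32_t limit)
{
    if (limit != 0 && host_bits > limit) {
        return limit;      /* cap at the configured limit */
    }
    return host_bits;      /* never more than the host supports */
}
```

With a limit of 39, hosts with 39 and 46 physical address bits both present 39 bits to the guest in this model, so a guest can move between them, while a host with only 36 bits never reports more than 36.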

take care,
  Gerd

Michael S. Tsirkin Sept. 4, 2022, 8:37 p.m. UTC | #10

On Fri, Sep 02, 2022 at 10:44:20AM +0200, Gerd Hoffmann wrote:
>   Hi,
>  
> > I feel there are three major sources of controversy here
> > 
> > 0. the cover letter and subject don't do such a good job
> >    explaining that what we are doing is just telling guest
> >    CPUID is not broken. we are not exposing anything new
> >    and not exposing host capability to guest, for example,
> >    if cpuid phys address is smaller than host things also
> >    work fine.
> > 
> > 1. really the naming.  We need to be more explicit that it's just a bugfix.
> 
> Yep, I'll go improve that for v2.
> 
> > 2. down the road we will want to switch the default when no PV. however,
> >    some hosts might still want conservative firmware for compatibility
> >    reasons, so I think we need a way to tell firmware
> >    "ignore phys address width in CPUID like you did in the past".
> >    let's add a flag for that?
> >    and if none are set firmware should print a warning, though I
> >    do not know how many people will see that. Maybe some ;)
> 
> > /*
> >  * Force firmware to be very conservative in its use of physical
> >  * addresses, ignoring phys address width in CPUID.
> >  * Helpful for migration between hosts with different capabilities.
> >  */
> > #define KVM_BUG_PHYS_ADDRESS_WIDTH_BROKEN 2
> 
> I don't see a need for that.  Live migration compatibility can be
> handled just fine today using
> 	'host-phys-bits=on,host-phys-bits-limit=<xx>'
> 
> Which is similar to 'phys-bits=<xx>'.

yes but what if user did not configure anything?

the point of the above is so we can eventually, in X years, change the guests
to trust CPUID by default.

> The important difference is that phys-bits allows pretty much anything
> whereas host-phys-bits-limit applies sanity checks against the host
> supported phys bits and throws error on invalid values.
> 
> take care,
>   Gerd

Gerd Hoffmann Sept. 5, 2022, 7:39 a.m. UTC | #11

Hi,

> > I don't see a need for that.  Live migration compatibility can be
> > handled just fine today using
> > 	'host-phys-bits=on,host-phys-bits-limit=<xx>'
> > 
> > Which is similar to 'phys-bits=<xx>'.
> 
> yes but what if user did not configure anything?

I expect that'll be less and less common.  The phys-bits=40 default used
by qemu becomes increasingly problematic for large guests which need
more than that, and we see activity implementing support for that in
libvirt.

> the point of the above is so we can eventually, in X years, change the guests
> to trust CPUID by default.

Well, we can flip host-phys-bits to default to 'on' in qemu for new
machine types (or new cpu versions).  That'll have the very same effect.

take care,
  Gerd
