diff mbox series

kvm/x86: reserve bit KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID

Message ID 20220908114146.473630-1-kraxel@redhat.com (mailing list archive)
State New, archived
Headers show
Series kvm/x86: reserve bit KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID | expand

Commit Message

Gerd Hoffmann Sept. 8, 2022, 11:41 a.m. UTC
The KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID bit hints to the guest
that the size of the physical address space as advertised by CPUID
leaf 0x80000008 is actually valid and can be used.

Unfortunately this is not the case today with qemu.  Default behavior is
to advertise 40 address bits (which I think comes from the very first x64
opteron processors).  There are lots of intel desktop processors around
which support less than that (36 or 39 depending on age), and when trying
to use the full 40 bit address space on those things go south quickly.

This renders the physical address size information effectively useless
for guests.  This patch paves the way to fix that by adding a hint for
the guest so it knows whenever the physical address size is usable or
not.

The plan for qemu is to set the bit when the physical address size is
valid.  That is the case when qemu is started with the host-phys-bits=on
option set for the cpu.  Eventually qemu can also flip the default for
that option from off to on, unfortunately that isn't easy for backward
compatibility reasons.

The plan for the firmware is to check that bit and when it is set just
query and use the available physical address space.  When the bit is not
set be conservative and try not exceed 36 bits (aka 64G) address space.
The latter is what the firmware does today unconditionally.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Sept. 8, 2022, 2:52 p.m. UTC | #1
On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> The KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID bit hints to the guest
> that the size of the physical address space as advertised by CPUID
> leaf 0x80000008 is actually valid and can be used.
> 
> Unfortunately this is not the case today with qemu.  Default behavior is
> to advertise 40 address bits (which I think comes from the very first x64
> opteron processors).  There are lots of intel desktop processors around
> which support less than that (36 or 39 depending on age), and when trying
> to use the full 40 bit address space on those things go south quickly.
> 
> This renders the physical address size information effectively useless
> for guests.  This patch paves the way to fix that by adding a hint for
> the guest so it knows whenever the physical address size is usable or
> not.
> 
> The plan for qemu is to set the bit when the physical address size is
> valid.  That is the case when qemu is started with the host-phys-bits=on
> option set for the cpu.  Eventually qemu can also flip the default for
> that option from off to on, unfortunately that isn't easy for backward
> compatibility reasons.
> 
> The plan for the firmware is to check that bit and when it is set just
> query and use the available physical address space.  When the bit is not
> set be conservative and try not exceed 36 bits (aka 64G) address space.
> The latter is what the firmware does today unconditionally.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 6e64b27b2c1e..115bb34413cf 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -37,7 +37,8 @@
>  #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
>  #define KVM_FEATURE_MIGRATION_CONTROL	17
>  
> -#define KVM_HINTS_REALTIME      0
> +#define KVM_HINTS_REALTIME                      0
> +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1

Why does KVM need to get involved?  This is purely a userspace problem.  E.g. why
not use QEMU's fw_cfg to communicate this information to the guest?

Defining this flag arguably breaks backwards compatibility for VMMs that already
accurately advertise MAXPHYADDR.  The absence of the flag would imply that MAXPHYADDR
is invalid, which is not the case.
Gerd Hoffmann Sept. 9, 2022, 5:02 a.m. UTC | #2
On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
> On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> > The KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID bit hints to the guest
> > that the size of the physical address space as advertised by CPUID
> > leaf 0x80000008 is actually valid and can be used.
> > 
> > Unfortunately this is not the case today with qemu.  Default behavior is
> > to advertise 40 address bits (which I think comes from the very first x64
> > opteron processors).  There are lots of intel desktop processors around
> > which support less than that (36 or 39 depending on age), and when trying
> > to use the full 40 bit address space on those things go south quickly.
> > 
> > This renders the physical address size information effectively useless
> > for guests.  This patch paves the way to fix that by adding a hint for
> > the guest so it knows whenever the physical address size is usable or
> > not.
> > 
> > The plan for qemu is to set the bit when the physical address size is
> > valid.  That is the case when qemu is started with the host-phys-bits=on
> > option set for the cpu.  Eventually qemu can also flip the default for
> > that option from off to on, unfortunately that isn't easy for backward
> > compatibility reasons.
> > 
> > The plan for the firmware is to check that bit and when it is set just
> > query and use the available physical address space.  When the bit is not
> > set be conservative and try not exceed 36 bits (aka 64G) address space.
> > The latter is what the firmware does today unconditionally.
> > 
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> > ---
> >  arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 6e64b27b2c1e..115bb34413cf 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -37,7 +37,8 @@
> >  #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
> >  #define KVM_FEATURE_MIGRATION_CONTROL	17
> >  
> > -#define KVM_HINTS_REALTIME      0
> > +#define KVM_HINTS_REALTIME                      0
> > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
> 
> Why does KVM need to get involved?  This is purely a userspace problem.

It doesn't.  I only need reserve a hints bit, and the canonical source
for that happens to live in the kernel.  That's why this patch doesn't
touch any actual code ;)

> E.g. why not use QEMU's fw_cfg to communicate this information to the
> guest?

That is indeed the other obvious way to implement this.  Given this
information will be needed in code paths which already do CPUID queries
using CPUID to transport that information looked like the better option
to me.

> Defining this flag arguably breaks backwards compatibility for VMMs
> that already accurately advertise MAXPHYADDR.  The absence of the flag
> would imply that MAXPHYADDR is invalid, which is not the case.

That is true no matter how we try to transport that information from the
host to the guest (even with fw_cfg because other hypervisors start
using that interface too).

In practice it is not much of a problem though.  The firmware needs to
know the exact platform it runs on anyway to initialize everything
properly, so the logic can easily be restricted to qemu.

take care,
  Gerd
Vitaly Kuznetsov Sept. 9, 2022, 9:02 a.m. UTC | #3
Gerd Hoffmann <kraxel@redhat.com> writes:

> On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
>> On Thu, Sep 08, 2022, Gerd Hoffmann wrote:

...

>> >  arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> > index 6e64b27b2c1e..115bb34413cf 100644
>> > --- a/arch/x86/include/uapi/asm/kvm_para.h
>> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> > @@ -37,7 +37,8 @@
>> >  #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
>> >  #define KVM_FEATURE_MIGRATION_CONTROL	17
>> >  
>> > -#define KVM_HINTS_REALTIME      0
>> > +#define KVM_HINTS_REALTIME                      0
>> > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
>> 
>> Why does KVM need to get involved?  This is purely a userspace problem.
>
> It doesn't.  I only need reserve a hints bit, and the canonical source
> for that happens to live in the kernel.  That's why this patch doesn't
> touch any actual code ;)
>
>> E.g. why not use QEMU's fw_cfg to communicate this information to the
>> guest?
>
> That is indeed the other obvious way to implement this.  Given this
> information will be needed in code paths which already do CPUID queries
> using CPUID to transport that information looked like the better option
> to me.

While this certainly looks like an overkill here, we could probably add
new, VMM-spefific CPUID leaves to KVM, e.g.

0x4000000A: VMM signature
0x4000000B: VMM features
0x4000000C: VMM quirks
...

this way VMMs (like QEMU) could identify themselves and suggest VMM
specific things to guests without KVM's involvement. Just if 'fw_cfg' is
not enough)
Sean Christopherson Sept. 9, 2022, 2:53 p.m. UTC | #4
On Fri, Sep 09, 2022, Vitaly Kuznetsov wrote:
> Gerd Hoffmann <kraxel@redhat.com> writes:
> 
> > On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
> >> On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> 
> ...
> 
> >> >  arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> >> > index 6e64b27b2c1e..115bb34413cf 100644
> >> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> >> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> >> > @@ -37,7 +37,8 @@
> >> >  #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
> >> >  #define KVM_FEATURE_MIGRATION_CONTROL	17
> >> >  
> >> > -#define KVM_HINTS_REALTIME      0
> >> > +#define KVM_HINTS_REALTIME                      0
> >> > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
> >> 
> >> Why does KVM need to get involved?  This is purely a userspace problem.
> >
> > It doesn't.  I only need reserve a hints bit, and the canonical source
> > for that happens to live in the kernel.  That's why this patch doesn't
> > touch any actual code ;)
> >
> >> E.g. why not use QEMU's fw_cfg to communicate this information to the
> >> guest?
> >
> > That is indeed the other obvious way to implement this.  Given this
> > information will be needed in code paths which already do CPUID queries
> > using CPUID to transport that information looked like the better option
> > to me.
> 
> While this certainly looks like an overkill here, we could probably add
> new, VMM-spefific CPUID leaves to KVM, e.g.
> 
> 0x4000000A: VMM signature
> 0x4000000B: VMM features
> 0x4000000C: VMM quirks
> ...
> 
> this way VMMs (like QEMU) could identify themselves and suggest VMM
> specific things to guests without KVM's involvement. Just if 'fw_cfg' is
> not enough)

I don't think KVM needs to get involved in that either.  The de facto hypervisor
CPUID standard already allows for multiple hypervisors/VMMs to announce themselves
to the guest, e.g. QEMU could add itself as another VMM using 0x40000100 (shifted
as necessary to accomodate KVM+Hyper-V).
Vitaly Kuznetsov Sept. 13, 2022, 9:40 a.m. UTC | #5
Sean Christopherson <seanjc@google.com> writes:

> On Fri, Sep 09, 2022, Vitaly Kuznetsov wrote:

...

>> While this certainly looks like an overkill here, we could probably add
>> new, VMM-spefific CPUID leaves to KVM, e.g.
>> 
>> 0x4000000A: VMM signature
>> 0x4000000B: VMM features
>> 0x4000000C: VMM quirks
>> ...
>> 
>> this way VMMs (like QEMU) could identify themselves and suggest VMM
>> specific things to guests without KVM's involvement. Just if 'fw_cfg' is
>> not enough)
>
> I don't think KVM needs to get involved in that either.  The de facto hypervisor
> CPUID standard already allows for multiple hypervisors/VMMs to announce themselves
> to the guest, e.g. QEMU could add itself as another VMM using 0x40000100 (shifted
> as necessary to accomodate KVM+Hyper-V).

True, VMM can just use another hypervisor space (+0x100) but we can view
it from a slightly different angle: KVM itself is insufficient to run
VMs, there's always a VMM in the background, we may want to provide a
"standard" for its information meaning guests won't need to search for
VMMs signature[s] but can directly refer to "standardized" leaves. 

(All this is a purely theoretical discussion at this point, we need a
good reason to introduce this first).
Gerd Hoffmann Sept. 21, 2022, 1:42 p.m. UTC | #6
On Fri, Sep 09, 2022 at 07:02:24AM +0200, Gerd Hoffmann wrote:
> On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
> > On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> > > -#define KVM_HINTS_REALTIME      0
> > > +#define KVM_HINTS_REALTIME                      0
> > > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
> > 
> > Why does KVM need to get involved?  This is purely a userspace problem.
> 
> It doesn't.  I only need reserve a hints bit, and the canonical source
> for that happens to live in the kernel.  That's why this patch doesn't
> touch any actual code ;)
> 
> > E.g. why not use QEMU's fw_cfg to communicate this information to the
> > guest?
> 
> That is indeed the other obvious way to implement this.  Given this
> information will be needed in code paths which already do CPUID queries
> using CPUID to transport that information looked like the better option
> to me.

I'd like to move forward with this.

So, any comment further comments and opinions?
Is it ok to grab a hints bit given the explanation above?
Or should I go for the fw_cfg approach?

thanks & take care,
  Gerd
Sean Christopherson Sept. 21, 2022, 3 p.m. UTC | #7
On Wed, Sep 21, 2022, Gerd Hoffmann wrote:
> On Fri, Sep 09, 2022 at 07:02:24AM +0200, Gerd Hoffmann wrote:
> > On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
> > > On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> > > > -#define KVM_HINTS_REALTIME      0
> > > > +#define KVM_HINTS_REALTIME                      0
> > > > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
> > > 
> > > Why does KVM need to get involved?  This is purely a userspace problem.
> > 
> > It doesn't.  I only need reserve a hints bit, and the canonical source
> > for that happens to live in the kernel.  That's why this patch doesn't
> > touch any actual code ;)

The issue is that this "hint" effectively breaks other VMMs that already provide
an accurate guest.MAXPHYADDR.

> > > E.g. why not use QEMU's fw_cfg to communicate this information to the
> > > guest?
> > 
> > That is indeed the other obvious way to implement this.  Given this
> > information will be needed in code paths which already do CPUID queries
> > using CPUID to transport that information looked like the better option
> > to me.
> 
> I'd like to move forward with this.
> 
> So, any comment further comments and opinions?
> Is it ok to grab a hints bit given the explanation above?
> Or should I go for the fw_cfg approach?

My strong preference is the fw_cfg approach, or if the guest side really wants to
use CPUID, have QEMU define it's own CPUID signature and provide QEMU-specific
hints/quirks that way.
Jim Mattson Sept. 21, 2022, 4:32 p.m. UTC | #8
On Wed, Sep 21, 2022 at 8:00 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Sep 21, 2022, Gerd Hoffmann wrote:
> > On Fri, Sep 09, 2022 at 07:02:24AM +0200, Gerd Hoffmann wrote:
> > > On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
> > > > On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
> > > > > -#define KVM_HINTS_REALTIME      0
> > > > > +#define KVM_HINTS_REALTIME                      0
> > > > > +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
> > > >
> > > > Why does KVM need to get involved?  This is purely a userspace problem.
> > >
> > > It doesn't.  I only need reserve a hints bit, and the canonical source
> > > for that happens to live in the kernel.  That's why this patch doesn't
> > > touch any actual code ;)
>
> The issue is that this "hint" effectively breaks other VMMs that already provide
> an accurate guest.MAXPHYADDR.

Any VMM that doesn't provide an accurate guest.MAXPHYADDR is broken.
Why do we need a "hint" that the virtual processor works?

> > > > E.g. why not use QEMU's fw_cfg to communicate this information to the
> > > > guest?
> > >
> > > That is indeed the other obvious way to implement this.  Given this
> > > information will be needed in code paths which already do CPUID queries
> > > using CPUID to transport that information looked like the better option
> > > to me.
> >
> > I'd like to move forward with this.
> >
> > So, any comment further comments and opinions?
> > Is it ok to grab a hints bit given the explanation above?
> > Or should I go for the fw_cfg approach?
>
> My strong preference is the fw_cfg approach, or if the guest side really wants to
> use CPUID, have QEMU define it's own CPUID signature and provide QEMU-specific
> hints/quirks that way.
Xiaoyao Li Sept. 22, 2022, 5:40 a.m. UTC | #9
On 9/22/2022 12:32 AM, Jim Mattson wrote:
> On Wed, Sep 21, 2022 at 8:00 AM Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Wed, Sep 21, 2022, Gerd Hoffmann wrote:
>>> On Fri, Sep 09, 2022 at 07:02:24AM +0200, Gerd Hoffmann wrote:
>>>> On Thu, Sep 08, 2022 at 02:52:36PM +0000, Sean Christopherson wrote:
>>>>> On Thu, Sep 08, 2022, Gerd Hoffmann wrote:
>>>>>> -#define KVM_HINTS_REALTIME      0
>>>>>> +#define KVM_HINTS_REALTIME                      0
>>>>>> +#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
>>>>>
>>>>> Why does KVM need to get involved?  This is purely a userspace problem.
>>>>
>>>> It doesn't.  I only need reserve a hints bit, and the canonical source
>>>> for that happens to live in the kernel.  That's why this patch doesn't
>>>> touch any actual code ;)
>>
>> The issue is that this "hint" effectively breaks other VMMs that already provide
>> an accurate guest.MAXPHYADDR.
> 
> Any VMM that doesn't provide an accurate guest.MAXPHYADDR is broken.

I stand for it as well.

To me, it looks an old QEMU bug and firmware provided a workaround for 
the bug that doesn't trust guest's CPUID.0x80000008. IMHO, (guest) 
firmware should always trust CPUID and error out if MAXPHYADDR reported 
from CPUID is broken to force VMM fixing itself to provide correct CPUID 
info. But letting firmware drop the workaround surely breaks backwards 
compatibility.

> Why do we need a "hint" that the virtual processor works?
Paolo Bonzini Sept. 22, 2022, 12:24 p.m. UTC | #10
On 9/21/22 18:32, Jim Mattson wrote:
>> The issue is that this "hint" effectively breaks other VMMs that already provide
>> an accurate guest.MAXPHYADDR.
>
> Any VMM that doesn't provide an accurate guest.MAXPHYADDR is broken.
> Why do we need a "hint" that the virtual processor works?

I agree.  Since old (and current) versions of QEMU wouldn't get the bit 
anyway, just fix the next one.  I'll follow up on the QEMU mailing list.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..115bb34413cf 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -37,7 +37,8 @@ 
 #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
 #define KVM_FEATURE_MIGRATION_CONTROL	17
 
-#define KVM_HINTS_REALTIME      0
+#define KVM_HINTS_REALTIME                      0
+#define KVM_HINTS_PHYS_ADDRESS_SIZE_DATA_VALID  1
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.