diff mbox series

[RFC,1/3] xen: Introduce "gpaddr_bits" field to XEN_SYSCTL_physinfo

Message ID 1631034578-12598-2-git-send-email-olekstysh@gmail.com (mailing list archive)
State Superseded
Headers show
Series Add handling of extended regions (safe ranges) on Arm (Was "xen/memory: Introduce a hypercall to provide unallocated space") | expand

Commit Message

Oleksandr Tyshchenko Sept. 7, 2021, 5:09 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to pass info about maximum supported guest address
space size to the toolstack on Arm in order to properly
calculate the base and size of the extended region (safe range)
for the guest. The extended region is unused address space which
could be safely used by domain for foreign/grant mappings on Arm.
The extended region itself will be handled by the subsequents
patch.

Use p2m_ipa_bits variable on Arm, the x86 equivalent is
hap_paddr_bits.

As we change the size of structure bump the interface version.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes since RFC:
   - update patch subject/description
   - replace arch-specific sub-struct with common gpaddr_bits
     field and update code to reflect that
---
 tools/include/libxl.h            | 7 +++++++
 tools/libs/light/libxl.c         | 2 ++
 tools/libs/light/libxl_types.idl | 2 ++
 xen/arch/arm/sysctl.c            | 2 ++
 xen/arch/x86/sysctl.c            | 2 ++
 xen/include/public/sysctl.h      | 3 ++-
 6 files changed, 17 insertions(+), 1 deletion(-)

Comments

Andrew Cooper Sept. 7, 2021, 5:35 p.m. UTC | #1
On 07/09/2021 18:09, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> We need to pass info about maximum supported guest address
> space size to the toolstack on Arm in order to properly
> calculate the base and size of the extended region (safe range)
> for the guest. The extended region is unused address space which
> could be safely used by domain for foreign/grant mappings on Arm.
> The extended region itself will be handled by the subsequents
> patch.
>
> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
> hap_paddr_bits.
>
> As we change the size of structure bump the interface version.
>
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

So while I think this is a suitable way forward, you're painting
yourself into a corner WRT migration.

On x86, the correct value is d->arch.cpuid->extd.maxphysaddr and this
value is under toolstack control, not Xen control.  In particular, it
needs to be min(hostA, hostB) to make migration safe, and yes - this is
currently a hole in x86's migration logic that will cause large VMs to
explode.

The same will be true on ARM as/when you gain migration support.

I think this would be better as a domctl.  On ARM, it can reference
p2m_ipa_bits for now along with a /* TODO - make per-domain for
migration support */, while on x86 it can take the appropriate value
(which will soon actually be safe in migration scenarios).

However, that does change the ordering requirements in the toolstack -
this hypercall would need to be made after the domain is created, and
has been levelled, and before its main memory layout is decided.

Alternatively, the abstraction could be hidden in libxl itself in arch
specific code, with x86 referring to the local cpu policy (as libxl has
the copy it is/has worked on), and ARM making a hypercall.  This does
make the ordering more obvious.

(As a note on the x86 specifics of this patch, hap_paddr_bits is
actually unused in practice.  It was a proposal from AMD for the
hypervisor to fill in those details, which wasn't implemented by anyone,
not even Xen, because the important field to modify is maxphysaddr if
you don't want to rewrite every kernel to behave differently when
virtualised.)

~Andrew
Oleksandr Tyshchenko Sept. 7, 2021, 9:28 p.m. UTC | #2
On 07.09.21 20:35, Andrew Cooper wrote:

Hi Andrew

> On 07/09/2021 18:09, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> We need to pass info about maximum supported guest address
>> space size to the toolstack on Arm in order to properly
>> calculate the base and size of the extended region (safe range)
>> for the guest. The extended region is unused address space which
>> could be safely used by domain for foreign/grant mappings on Arm.
>> The extended region itself will be handled by the subsequents
>> patch.
>>
>> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>> hap_paddr_bits.
>>
>> As we change the size of structure bump the interface version.
>>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> So while I think this is a suitable way forward, you're painting
> yourself into a corner WRT migration.
>
> On x86, the correct value is d->arch.cpuid->extd.maxphysaddr and this
> value is under toolstack control, not Xen control.  In particular, it
> needs to be min(hostA, hostB) to make migration safe, and yes - this is
> currently a hole in x86's migration logic that will cause large VMs to
> explode.
>
> The same will be true on ARM as/when you gain migration support.

Oh, I admit, I didn't keep in mind migration support while creating this 
patch.


>
> I think this would be better as a domctl.  On ARM, it can reference
> p2m_ipa_bits for now along with a /* TODO - make per-domain for
> migration support */, while on x86 it can take the appropriate value
> (which will soon actually be safe in migration scenarios).

OK, could you please clarify, did you mean to introduce new domctl 
(something like get_maxphysaddr) or reuse some existing?

And ...

>
> However, that does change the ordering requirements in the toolstack -
> this hypercall would need to be made after the domain is created, and
> has been levelled, and before its main memory layout is decided.

... I am not sure I totally understand the ordering requirements in the 
toolstack.

On Arm this information (gpaddr_bits) should be obtained by the time we 
call libxl__arch_domain_finalise_hw_description()
where we actually calculate extended region and insert it in domain's 
device-tree. At that time, the domain memory is already populated, so 
it's layout is known.
Please see the last patch of this series, which demonstrates the usage:

https://lore.kernel.org/xen-devel/1631034578-12598-4-git-send-email-olekstysh@gmail.com/


> Alternatively, the abstraction could be hidden in libxl itself in arch
> specific code, with x86 referring to the local cpu policy (as libxl has
> the copy it is/has worked on), and ARM making a hypercall.  This does
> make the ordering more obvious.

May I please ask what would be the preferred option to move forward? I 
am fine with both proposed options, with a little preference for the 
former (common domctl, abstraction is hidden in Xen itself in arch 
specific code). It is highly appreciated if we could choose the option 
which would satisfy all parties, this would save me some time.

> (As a note on the x86 specifics of this patch, hap_paddr_bits is
> actually unused in practice.  It was a proposal from AMD for the
> hypervisor to fill in those details, which wasn't implemented by anyone,
> not even Xen, because the important field to modify is maxphysaddr if
> you don't want to rewrite every kernel to behave differently when
> virtualised.)

Thank you for the clarification.


>
> ~Andrew
>
Oleksandr Tyshchenko Sept. 23, 2021, 10:52 p.m. UTC | #3
Hi Andrew.


On 08.09.21 00:28, Oleksandr wrote:
>
> On 07.09.21 20:35, Andrew Cooper wrote:
>
> Hi Andrew
>
>> On 07/09/2021 18:09, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> We need to pass info about maximum supported guest address
>>> space size to the toolstack on Arm in order to properly
>>> calculate the base and size of the extended region (safe range)
>>> for the guest. The extended region is unused address space which
>>> could be safely used by domain for foreign/grant mappings on Arm.
>>> The extended region itself will be handled by the subsequents
>>> patch.
>>>
>>> Use p2m_ipa_bits variable on Arm, the x86 equivalent is
>>> hap_paddr_bits.
>>>
>>> As we change the size of structure bump the interface version.
>>>
>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> So while I think this is a suitable way forward, you're painting
>> yourself into a corner WRT migration.
>>
>> On x86, the correct value is d->arch.cpuid->extd.maxphysaddr and this
>> value is under toolstack control, not Xen control.  In particular, it
>> needs to be min(hostA, hostB) to make migration safe, and yes - this is
>> currently a hole in x86's migration logic that will cause large VMs to
>> explode.
>>
>> The same will be true on ARM as/when you gain migration support.
>
> Oh, I admit, I didn't keep in mind migration support while creating 
> this patch.
>
>
>>
>> I think this would be better as a domctl.  On ARM, it can reference
>> p2m_ipa_bits for now along with a /* TODO - make per-domain for
>> migration support */, while on x86 it can take the appropriate value
>> (which will soon actually be safe in migration scenarios).
>
> OK, could you please clarify, did you mean to introduce new domctl 
> (something like get_maxphysaddr) or reuse some existing?


May I please ask for clarification here ^ and ...


>
>
> And ...
>
>>
>> However, that does change the ordering requirements in the toolstack -
>> this hypercall would need to be made after the domain is created, and
>> has been levelled, and before its main memory layout is decided.
>
> ... I am not sure I totally understand the ordering requirements in 
> the toolstack.
>
> On Arm this information (gpaddr_bits) should be obtained by the time 
> we call libxl__arch_domain_finalise_hw_description()
> where we actually calculate extended region and insert it in domain's 
> device-tree. At that time, the domain memory is already populated, so 
> it's layout is known.
> Please see the last patch of this series, which demonstrates the usage:
>
> https://lore.kernel.org/xen-devel/1631034578-12598-4-git-send-email-olekstysh@gmail.com/ 
>
>
>
>> Alternatively, the abstraction could be hidden in libxl itself in arch
>> specific code, with x86 referring to the local cpu policy (as libxl has
>> the copy it is/has worked on), and ARM making a hypercall.  This does
>> make the ordering more obvious.
>
> May I please ask what would be the preferred option to move forward? I 
> am fine with both proposed options, with a little preference for the 
> former (common domctl, abstraction is hidden in Xen itself in arch 
> specific code). It is highly appreciated if we could choose the option 
> which would satisfy all parties, this would save me some time.


... here ^ ?


I have just pushed third version [1] of this series without these being 
addressed. I decided to push V3 to let the reviewers to focus on the 
main aspects, but the way how the information in question is passed to 
the toolstack also needs to be clarified and addressed.


[1] 
https://lore.kernel.org/xen-devel/1632437334-12015-1-git-send-email-olekstysh@gmail.com/


[snip]
diff mbox series

Patch

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index b9ba16d..da44944 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -855,6 +855,13 @@  typedef struct libxl__ctx libxl_ctx;
  */
 #define LIBXL_HAVE_PHYSINFO_MAX_POSSIBLE_MFN 1
 
+ /*
+  * LIBXL_HAVE_PHYSINFO_GPADDR_BITS
+  *
+  * If this is defined, libxl_physinfo has a "gpaddr_bits" field.
+  */
+ #define LIBXL_HAVE_PHYSINFO_GPADDR_BITS 1
+
 /*
  * LIBXL_HAVE_DOMINFO_OUTSTANDING_MEMKB 1
  *
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index 204eb0b..c86624f 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -405,6 +405,8 @@  int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
     physinfo->cap_vmtrace =
         !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_vmtrace);
 
+    physinfo->gpaddr_bits = xcphysinfo.gpaddr_bits;
+
     GC_FREE;
     return 0;
 }
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 3f9fff6..f7437e4 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -1061,6 +1061,8 @@  libxl_physinfo = Struct("physinfo", [
     ("cap_shadow", bool),
     ("cap_iommu_hap_pt_share", bool),
     ("cap_vmtrace", bool),
+
+    ("gpaddr_bits", uint32),
     ], dir=DIR_OUT)
 
 libxl_connectorinfo = Struct("connectorinfo", [
diff --git a/xen/arch/arm/sysctl.c b/xen/arch/arm/sysctl.c
index f87944e..91dca4f 100644
--- a/xen/arch/arm/sysctl.c
+++ b/xen/arch/arm/sysctl.c
@@ -15,6 +15,8 @@ 
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
 {
     pi->capabilities |= XEN_SYSCTL_PHYSCAP_hvm | XEN_SYSCTL_PHYSCAP_hap;
+
+    pi->gpaddr_bits = p2m_ipa_bits;
 }
 
 long arch_do_sysctl(struct xen_sysctl *sysctl,
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index aff52a1..7b14865 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -135,6 +135,8 @@  void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_hap;
     if ( IS_ENABLED(CONFIG_SHADOW_PAGING) )
         pi->capabilities |= XEN_SYSCTL_PHYSCAP_shadow;
+
+    pi->gpaddr_bits = hap_paddr_bits;
 }
 
 long arch_do_sysctl(
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 039ccf8..f53b42e 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -35,7 +35,7 @@ 
 #include "domctl.h"
 #include "physdev.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
 
 /*
  * Read console content from Xen buffer ring.
@@ -120,6 +120,7 @@  struct xen_sysctl_physinfo {
     uint64_aligned_t outstanding_pages;
     uint64_aligned_t max_mfn; /* Largest possible MFN on this host */
     uint32_t hw_cap[8];
+    uint32_t gpaddr_bits;
 };
 
 /*