
[v4,0/5] DOMCTL-based guest magic region allocation for 1:1 domUs

Message ID 20240409045357.236802-1-xin.wang2@amd.com

Message

Henry Wang April 9, 2024, 4:53 a.m. UTC
An error message can be seen from the init-dom0less application on
direct-mapped 1:1 domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

This is because populate_physmap() automatically assumes gfn == mfn
for direct-mapped domains. This cannot be true for the magic pages
that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
helper application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed to
retrieve a reserved page", can be seen, as the reserved memory list
is empty at that time.

This series tries to fix this issue using a DOMCTL-based approach,
because for 1:1 direct-mapped domUs we need to avoid the RAM regions
and inform the toolstack about the region found by the hypervisor for
mapping the magic pages. Patch 1 introduces a new DOMCTL to get the
guest memory map, currently only used for the magic page regions.
Patch 2 generalizes the extended region finding logic so that it can
be reused for other use cases, such as finding 1:1 domU magic regions.
Patch 3 uses the same approach as finding the extended regions to find
the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
hardcoding the base addresses of the guest magic region in the
init-dom0less application by consuming the newly introduced DOMCTL.
Patch 5 is a simple patch that cleans up some code duplication in xc.

Gitlab pipeline for this series:
https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1245192195

Henry Wang (5):
  xen/domctl, tools: Introduce a new domctl to get guest memory map
  xen/arm: Generalize the extended region finding logic
  xen/arm: Find unallocated spaces for magic pages of direct-mapped domU
  xen/memory, tools: Avoid hardcoding GUEST_MAGIC_BASE in init-dom0less
  tools/libs/ctrl: Simplify xc helpers related to populate_physmap()

 tools/helpers/init-dom0less.c            |  35 ++--
 tools/include/xenctrl.h                  |  11 ++
 tools/libs/ctrl/xc_domain.c              | 204 +++++++++++++----------
 xen/arch/arm/dom0less-build.c            |  11 ++
 xen/arch/arm/domain.c                    |  14 ++
 xen/arch/arm/domain_build.c              | 131 ++++++++++-----
 xen/arch/arm/domctl.c                    |  28 +++-
 xen/arch/arm/include/asm/domain.h        |   8 +
 xen/arch/arm/include/asm/domain_build.h  |   4 +
 xen/arch/arm/include/asm/setup.h         |   5 +
 xen/arch/arm/include/asm/static-memory.h |   7 +
 xen/arch/arm/static-memory.c             |  39 +++++
 xen/common/memory.c                      |  30 +++-
 xen/include/public/arch-arm.h            |   7 +
 xen/include/public/domctl.h              |  29 ++++
 xen/include/public/memory.h              |   3 +-
 16 files changed, 415 insertions(+), 151 deletions(-)

Comments

Daniel P. Smith April 18, 2024, 2:16 p.m. UTC | #1
On 4/9/24 00:53, Henry Wang wrote:
> An error message can be seen from the init-dom0less application on
> direct-mapped 1:1 domains:
> ```
> Allocating magic pages
> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> Error on alloc magic pages
> ```
> 
> This is because populate_physmap() automatically assumes gfn == mfn
> for direct mapped domains. This cannot be true for the magic pages
> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
> helper application executed in Dom0. For domains using statically
> allocated memory but not 1:1 direct-mapped, a similar error, "failed to
> retrieve a reserved page", can be seen, as the reserved memory list
> is empty at that time.
> 
> This series tries to fix this issue using a DOMCTL-based approach,
> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
> and inform the toolstack about the region found by hypervisor for
> mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
> guest memory map, currently only used for the magic page regions.
> Patch 2 generalized the extended region finding logic so that it can
> be reused for other use cases such as finding 1:1 domU magic regions.
> Patch 3 uses the same approach as finding the extended regions to find
> the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
> hardcoding all base addresses of guest magic region in the init-dom0less
> application by consuming the newly introduced DOMCTL. Patch 5 is a
> simple patch to do some code duplication clean-up in xc.

Hey Henry,

To help provide some perspective, these issues are not experienced with 
hyperlaunch. This is because we understood early on that you cannot move 
a lightweight version of the toolstack into hypervisor init and not 
provide a mechanism to communicate what it did to the runtime control 
plane. We evaluated the possible mechanisms, including introducing a new
hypercall op, and ultimately settled on using hypfs. The primary reason
is that this information is static data that, while informative later, is
only necessary for the control plane to understand the state of the 
system. As a result, hyperlaunch is able to allocate any and all special 
pages required as part of domain construction and communicate their 
addresses to the control plane. As for XSM, hypfs is already protected 
and at this time we do not see any domain builder information needing to 
be restricted separately from the data already present in hypfs.
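As a concrete illustration of that approach (the node paths and values below are hypothetical, not an existing hypfs layout), the domain builder could expose the allocated special-page addresses under a per-domain subtree that the control plane reads after boot:

```
/builder/1/xenstore-pfn  = 0x50000
/builder/1/console-pfn   = 0x50001
/builder/1/memaccess-pfn = 0x50002
```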

I would like to suggest that, instead of continuing down this path,
you consider adopting the hyperlaunch usage of hypfs, and then adjust
dom0less domain construction to allocate the special pages at
construction time. The original hyperlaunch series includes a patch
that provides the helper app for the xenstore announcement, and I can
provide you with updated versions if that would be helpful.

V/r,
Daniel P. Smith
Henry Wang April 19, 2024, 1:45 a.m. UTC | #2
Hi Daniel,

On 4/18/2024 10:16 PM, Daniel P. Smith wrote:
> On 4/9/24 00:53, Henry Wang wrote:
>> An error message can be seen from the init-dom0less application on
>> direct-mapped 1:1 domains:
>> ```
>> Allocating magic pages
>> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
>> Error on alloc magic pages
>> ```
>>
>> This is because populate_physmap() automatically assumes gfn == mfn
>> for direct mapped domains. This cannot be true for the magic pages
>> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
>> helper application executed in Dom0. For domains using statically
>> allocated memory but not 1:1 direct-mapped, a similar error, "failed to
>> retrieve a reserved page", can be seen, as the reserved memory list
>> is empty at that time.
>>
>> This series tries to fix this issue using a DOMCTL-based approach,
>> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
>> and inform the toolstack about the region found by hypervisor for
>> mapping the magic pages.
>
> Hey Henry,
>
> To help provide some perspective, these issues are not experienced 
> with hyperlaunch. This is because we understood early on that you 
> cannot move a lightweight version of the toolstack into hypervisor 
> init and not provide a mechanism to communicate what it did to the 
> runtime control plane. We evaluated the possible mechanisms, including 
> introducing a new hypercall op, and ultimately settled on using hypfs. 
> The primary reason is that this information is static data that, while 
> informative later, is only necessary for the control plane to 
> understand the state of the system. As a result, hyperlaunch is able 
> to allocate any and all special pages required as part of domain 
> construction and communicate their addresses to the control plane. As 
> for XSM, hypfs is already protected and at this time we do not see any 
> domain builder information needing to be restricted separately from 
> the data already present in hypfs.
>
> I would like to make the suggestion that instead of continuing down 
> this path, perhaps you might consider adopting the hyperlaunch usage 
> of hypfs. Then adjust dom0less domain construction to allocate the 
> special pages at construction time. 

Thank you for the suggestion. I think your proposal makes sense. However, 
I am not familiar with hypfs, so may I ask some questions first to 
confirm that I understand your proposal correctly: do you mean I should 
first find, allocate and create mappings for these special pages at the 
dom0less domU's construction time, then store the GPA in hypfs and 
extract it from the init-dom0less app later on? Should I use existing 
interfaces such as xenhypfs_{open,cat,ls}, etc., or do I need to add 
new hypercall ops?

> The original hyperlaunch series includes a patch that provides the 
> helper app for the xenstore announcement. And I can provide you with 
> updated versions if that would be helpful.

Thank you, yes, a pointer to the corresponding series and patch would 
definitely be helpful.

Kind regards,
Henry

>
> V/r,
> Daniel P. Smith
Stefano Stabellini April 25, 2024, 10:18 p.m. UTC | #3
On Thu, 18 Apr 2024, Daniel P. Smith wrote:
> On 4/9/24 00:53, Henry Wang wrote:
> > An error message can be seen from the init-dom0less application on
> > direct-mapped 1:1 domains:
> > ```
> > Allocating magic pages
> > memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
> > Error on alloc magic pages
> > ```
> > 
> > This is because populate_physmap() automatically assumes gfn == mfn
> > for direct mapped domains. This cannot be true for the magic pages
> > that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
> > helper application executed in Dom0. For domains using statically
> > allocated memory but not 1:1 direct-mapped, a similar error, "failed to
> > retrieve a reserved page", can be seen, as the reserved memory list
> > is empty at that time.
> > 
> > This series tries to fix this issue using a DOMCTL-based approach,
> > because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
> > and inform the toolstack about the region found by hypervisor for
> > mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
> > guest memory map, currently only used for the magic page regions.
> > Patch 2 generalized the extended region finding logic so that it can
> > be reused for other use cases such as finding 1:1 domU magic regions.
> > Patch 3 uses the same approach as finding the extended regions to find
> > the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
> > hardcoding all base addresses of guest magic region in the init-dom0less
> > application by consuming the newly introduced DOMCTL. Patch 5 is a
> > simple patch to do some code duplication clean-up in xc.
> 
> Hey Henry,
> 
> To help provide some perspective, these issues are not experienced with
> hyperlaunch. This is because we understood early on that you cannot move a
> lightweight version of the toolstack into hypervisor init and not provide a
> mechanism to communicate what it did to the runtime control plane. We
> evaluated the possible mechanisms, including introducing a new hypercall op,
> and ultimately settled on using hypfs. The primary reason is that this
> information is static data that, while informative later, is only necessary for the
> control plane to understand the state of the system. As a result, hyperlaunch
> is able to allocate any and all special pages required as part of domain
> construction and communicate their addresses to the control plane. As for XSM,
> hypfs is already protected and at this time we do not see any domain builder
> information needing to be restricted separately from the data already present
> in hypfs.
> 
> I would like to make the suggestion that instead of continuing down this path,
> perhaps you might consider adopting the hyperlaunch usage of hypfs. Then
> adjust dom0less domain construction to allocate the special pages at
> construction time. The original hyperlaunch series includes a patch that
> provides the helper app for the xenstore announcement. And I can provide you
> with updated versions if that would be helpful.

I also think that the new domctl is not needed and that the dom0less
domain builder should allocate the magic pages. On ARM, we already
allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set
HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend
that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN
(and others) correctly. If we do it that way it is simpler and
consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even
need hypfs. Currently we do not enable hypfs in our safety
certifiability configuration.
Henry Wang April 26, 2024, 5:22 a.m. UTC | #4
Hi Stefano, Daniel,

On 4/26/2024 6:18 AM, Stefano Stabellini wrote:
> On Thu, 18 Apr 2024, Daniel P. Smith wrote:
>> On 4/9/24 00:53, Henry Wang wrote:
>>> An error message can be seen from the init-dom0less application on
>>> direct-mapped 1:1 domains:
>>> ```
>>> Allocating magic pages
>>> memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
>>> Error on alloc magic pages
>>> ```
>>>
>>> This is because populate_physmap() automatically assumes gfn == mfn
>>> for direct mapped domains. This cannot be true for the magic pages
>>> that are allocated later for 1:1 Dom0less DomUs from the init-dom0less
>>> helper application executed in Dom0. For domains using statically
>>> allocated memory but not 1:1 direct-mapped, a similar error, "failed to
>>> retrieve a reserved page", can be seen, as the reserved memory list
>>> is empty at that time.
>>>
>>> This series tries to fix this issue using a DOMCTL-based approach,
>>> because for 1:1 direct-mapped domUs, we need to avoid the RAM regions
>>> and inform the toolstack about the region found by hypervisor for
>>> mapping the magic pages. Patch 1 introduced a new DOMCTL to get the
>>> guest memory map, currently only used for the magic page regions.
>>> Patch 2 generalized the extended region finding logic so that it can
>>> be reused for other use cases such as finding 1:1 domU magic regions.
>>> Patch 3 uses the same approach as finding the extended regions to find
>>> the guest magic page regions for direct-mapped DomUs. Patch 4 avoids
>>> hardcoding all base addresses of guest magic region in the init-dom0less
>>> application by consuming the newly introduced DOMCTL. Patch 5 is a
>>> simple patch to do some code duplication clean-up in xc.
>> Hey Henry,
>>
>> To help provide some perspective, these issues are not experienced with
>> hyperlaunch. This is because we understood early on that you cannot move a
>> lightweight version of the toolstack into hypervisor init and not provide a
>> mechanism to communicate what it did to the runtime control plane. We
>> evaluated the possible mechanisms, including introducing a new hypercall op,
>> and ultimately settled on using hypfs. The primary reason is that this
>> information is static data that, while informative later, is only necessary for the
>> control plane to understand the state of the system. As a result, hyperlaunch
>> is able to allocate any and all special pages required as part of domain
>> construction and communicate their addresses to the control plane. As for XSM,
>> hypfs is already protected and at this time we do not see any domain builder
>> information needing to be restricted separately from the data already present
>> in hypfs.
>>
>> I would like to make the suggestion that instead of continuing down this path,
>> perhaps you might consider adopting the hyperlaunch usage of hypfs. Then
>> adjust dom0less domain construction to allocate the special pages at
>> construction time. The original hyperlaunch series includes a patch that
>> provides the helper app for the xenstore announcement. And I can provide you
>> with updated versions if that would be helpful.
> I also think that the new domctl is not needed and that the dom0less
> domain builder should allocate the magic pages.

Yes this is indeed much better. Thanks Daniel for suggesting this.

> On ARM, we already
> allocate HVM_PARAM_CALLBACK_IRQ during dom0less domain build and set
> HVM_PARAM_STORE_PFN to ~0ULL. I think it would be only natural to extend
> that code to also allocate the magic pages and set HVM_PARAM_STORE_PFN
> (and others) correctly. If we do it that way it is simpler and
> consistent with the HVM_PARAM_CALLBACK_IRQ allocation, and we don't even
> need hypfs. Currently we do not enable hypfs in our safety
> certifiability configuration.

It is indeed very important to consider the safety certification (which 
I completely missed). Therefore I've sent an updated version based on 
HVMOP [1]. In the future we can switch to hypfs if needed.

[1] https://lore.kernel.org/xen-devel/20240426031455.579637-1-xin.wang2@amd.com/

Kind regards,
Henry