
[RFC,V3] xen/gnttab: Store frame GFN in struct page_info on Arm

Message ID 1632425551-18910-1-git-send-email-olekstysh@gmail.com (mailing list archive)

Commit Message

Oleksandr Tyshchenko Sept. 23, 2021, 7:32 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Rework the Arm implementation to store the grant table frame GFN
directly in struct page_info instead of keeping it in
standalone status/shared arrays.

To cover the 64-bit/40-bit IPA on Arm64/Arm32 we need space to
hold a 52-bit/28-bit + extra bit value respectively. In order
not to grow the size of struct page_info, borrow the required
number of bits from type_info's count portion, which the current
usage can tolerate (only 1 bit of it is used on Arm today).
Please note, to minimize code changes and avoid introducing
extra #ifdef-s to the header, we keep the same number of
bits on both subarches, although the count portion on Arm64
could be wider, so we waste some bits there.
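For reference, a condensed sketch of the resulting layout (taken from
the asm-arm/mm.h hunk below; PG_shift(n)/PG_mask() are the existing Xen
helpers that shift from the top of the word):

    /*
     * type_info layout on Arm64 (Arm32 analogous, BITS_PER_LONG == 32):
     *
     *   bit  63    : PGT_type_mask  (PGT_none / PGT_writable_page)
     *   bits 62:61 : PGT_count_mask (2-bit type count)
     *   bits 60:0  : frame GFN      (bits 28:0 on Arm32)
     */
    #define PGT_count_mask        PG_mask(3, 3)
    #define PGT_gfn_width         PG_shift(3)   /* 61 on Arm64, 29 on Arm32 */
    #define PGT_gfn_mask          ((1UL<<PGT_gfn_width)-1)
    #define PGT_INVALID_FRAME_GFN _gfn(PGT_gfn_mask) /* all-ones == no GFN stored */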

Introduce the corresponding PGT_* constructs and access macros.
Update the existing gnttab macros to deal with the GFN value in
its new location. Also update the use of the count portion on Arm
in share_xen_page_with_guest().

Update the P2M code to clear said GFN portion when putting
a reference on the grant table page in p2m_put_l3_page().
The added check is based on the assumption that a grant table page
is a xen_heap page whose entry has the p2m_ram_rw type, which
is correct. However, this check is not entirely precise and we
might end up clearing the GFN portion for other xen_heap pages
with the same p2m_type. But this action is considered
harmless, since only grant table pages really use that portion.

For everything to work correctly, introduce arch-specific
macros to be called from alloc_xenheap_pages()/free_xenheap_pages().
On Arm their purpose is to clear the GFN portion before use and
to make sure it has been cleared after use; on x86 they are
just stubs.
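A condensed sketch of those hooks (the real definitions live in the
asm-arm/mm.h and asm-x86/mm.h hunks below, one per arch header rather
than behind an #ifdef):

    #ifdef CONFIG_ARM
    /* Arm: reset the GFN portion before use, check it was reset after use. */
    #define arch_alloc_xenheap_page(p)   page_set_frame_gfn(p, INVALID_GFN)
    #define arch_free_xenheap_page(p) \
        BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))
    #else
    /* x86: no arch-specific actions are needed for xenheap pages. */
    #define arch_alloc_xenheap_page(p)   do {} while ( 0 )
    #define arch_free_xenheap_page(p)    do {} while ( 0 )
    #endif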

This patch is intended to fix a potential issue on Arm
which might happen when remapping a grant-table frame.
A guest (or the toolstack) will unmap the grant-table frame
using XENMEM_remove_from_physmap. This is a generic hypercall,
so on x86 we rely on the fact that the M2P entry will
be cleared on removal. For architectures without an M2P,
the GFN would still be present in the grant frame/status
array. So on the next call to map the page, we will end up
requesting the P2M to remove whatever mapping was at the given
GFN. This could well be another mapping.
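To illustrate the sequence (guest-side pseudocode; the HYPERVISOR_*
wrappers are Linux-style and gfn_A/gfn_B are placeholders):

    /* Map grant-table frame 0 at gfn_A. */
    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_grant_table,
        .idx   = 0,
        .gpfn  = gfn_A,
    };
    HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);

    /* Unmap it again -- a generic hypercall, the grant code is not told. */
    struct xen_remove_from_physmap xrfp = {
        .domid = DOMID_SELF,
        .gpfn  = gfn_A,
    };
    HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrfp);

    /* gfn_A may now be reused for an unrelated mapping... */

    /*
     * ... yet on Arm the stale gfn_A is still recorded in the status/shared
     * arrays, so remapping the frame at gfn_B makes Xen first remove
     * whatever mapping currently sits at gfn_A.
     */
    xatp.gpfn = gfn_B;
    HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);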

Besides that, this patch simplifies the arch code on Arm by
removing the arrays and corresponding management code, and
as a result the gnttab_init_arch/gnttab_destroy_arch helpers
and struct grant_table_arch become useless and can be
dropped globally.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
You can find the related discussions at:
https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/

! Please note, there is still an unresolved locking question here for which
I have failed to find a suitable solution. So, it is still an RFC !

According to the internal conversation:
The GFN field in struct page_info is now accessed from
gnttab_set_frame_gfn() in the grant table code and from page_set_frame_gfn()
in the P2M code (the former uses the latter).

We need to prevent concurrent access to this field. But we cannot grab
the grant lock from the P2M code because we would introduce a lock inversion:
page_set_frame_gfn() will be called from the P2M code with the p2m lock held
and would then acquire the grant table lock, while gnttab_map_frame() will do
the inverse.
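Schematically (hypothetical call chains, only to illustrate the
inversion; this is pseudocode, not real call sites):

    /* Path 1: P2M code, e.g. during XENMEM_remove_from_physmap. */
    p2m_write_lock(p2m);
        grant_write_lock(gt);        /* would be needed to protect the field */
        page_set_frame_gfn(pg, INVALID_GFN);
        grant_write_unlock(gt);
    p2m_write_unlock(p2m);

    /* Path 2: grant-table code, gnttab_map_frame(). */
    grant_write_lock(gt);
        p2m_write_lock(p2m);         /* taken later for the P2M update */
        ...
    /* Opposite acquisition order => AB-BA deadlock potential. */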

Changes RFC1 -> RFC2:
 - update patch description
 - add/update comments in code
 - clarify check in p2m_put_l3_page()
 - introduce arch_alloc_xenheap_page() and arch_free_xenheap_page()
   and drop page_arch_init()
 - add ASSERT to gnttab_shared_page() and gnttab_status_page()
 - rework changes to Arm's struct page_info: do not split type_info,
   allocate GFN portion by reducing count portion, create corresponding
   PGT_* construct, etc
 - update page_get_frame_gfn() and page_set_frame_gfn()
 - update the use of count portion on Arm
 - drop the leading underscore in the macro parameter names

Changes RFC2 -> RFC3:
 - update patch description
 - drop PGT_count_base and MASK_INSR() in share_xen_page_with_guest()
 - update alloc_xenheap_page() and free_xenheap_page() for SEPARATE_XENHEAP
   case (Arm32)
 - provide an extra bit for GFN portion, to get PGT_INVALID_FRAME_GFN
   one bit more than the maximum number of physical address bits on Arm32
---
 xen/arch/arm/mm.c                 |  8 ++++--
 xen/arch/arm/p2m.c                | 21 ++++++++++++---
 xen/common/grant_table.c          |  9 -------
 xen/common/page_alloc.c           | 20 +++++++++++++-
 xen/include/asm-arm/grant_table.h | 57 +++++++++++++++------------------------
 xen/include/asm-arm/mm.h          | 35 +++++++++++++++++++++---
 xen/include/asm-x86/grant_table.h |  5 ----
 xen/include/asm-x86/mm.h          |  4 +++
 8 files changed, 99 insertions(+), 60 deletions(-)

Comments

Andrew Cooper Sept. 23, 2021, 8:59 p.m. UTC | #1
On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> You can find the related discussions at:
> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
>
> ! Please note, there is still unresolved locking question here for which
> I failed to find a suitable solution. So, it is still an RFC !

Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
these plans are future work and don't solve the current problem.

Guests mapping Xen pages is backwards.  There are reasons why it was
used for x86 PV guests, but the entire interface should have been designed
differently for x86 HVM.

In particular, Xen should be mapping guest RAM, rather than the guest
manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
it has far, far lower overhead.


A much better design is one where the grant table looks like an MMIO
device.  The domain builder decides the ABI (v1 vs v2 - none of this
dynamic switch at runtime nonsense), and picks a block of guest physical
addresses, which are registered with Xen.  This forms the grant table,
status table (v2 only), and holes to map into.  Details can be conveyed
via an ACPI table or DT, as applicable.

Xen maps the RAM (which is now accounted to the guest, an improvement
over today) forming the grant and status tables, and grant map/unmap
hypercalls simplify massively to just {src domid, gref, flags} => slot,
which also solves the "is the same grant mapped elsewhere?" problem.
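Purely as a sketch, such a map operation could look like the following
(all names invented here; nothing like this exists in Xen today):

    struct gnttab_v2_map {
        /* IN */
        domid_t  src_domid;   /* granting domain */
        uint32_t gref;        /* grant reference within that domain */
        uint32_t flags;       /* e.g. read-only, device-DMA */
        /* OUT */
        uint32_t slot;        /* index into the pre-registered GFN window */
    };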

There is never a need for HVM guests to map the same grant twice (as it
controls the first stage tables and can create whatever alias it
desires), but there's also no need to allow the guest to pick where the
mapping occurs.  This vastly simplifies both the Xen and guest kernel
implementations.

~Andrew
Oleksandr Tyshchenko Sept. 23, 2021, 9:38 p.m. UTC | #2
On 23.09.21 23:59, Andrew Cooper wrote:

Hi Andrew.

> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> You can find the related discussions at:
>> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
>> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
>> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
>>
>> ! Please note, there is still unresolved locking question here for which
>> I failed to find a suitable solution. So, it is still an RFC !
> Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
> these plans are future work and don't solve the current problem.
>
> Guests mapping Xen pages is backwards.  There are reasons why it was
> used for x86 PV guests, but the entire interface should have been design
> differently for x86 HVM.
>
> In particular, Xen should be mapping guest RAM, rather than the guest
> manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
> its far far lower overhead.
>
>
> A much better design is one where the grant table looks like an MMIO
> device.  The domain builder decides the ABI (v1 vs v2 - none of this
> dynamic switch at runtime nonsense), and picks a block of guest physical
> addresses, which are registered with Xen.  This forms the grant table,
> status table (v2 only), and holes to map into.  Details can be conveyed
> via ACPI table or DT, as applicable
>
> Xen maps the RAM (which is now accounted to the guest, an improvement
> over today) forming the grant and status tables, and grant map/unmap
> hypercalls simplify massively to just {src domid, gref, flags} => slot,
> which also solves the "is the same grant mapped elsewhere?" problem.
>
> There is never a need for HVM guests to map the same grant twice (as it
> controls the first stage tables and can create whatever alias it
> desires), but there's also no need to allow the guest to pick where the
> mapping occurs.  This vastly simplifies both the Xen and guest kernel
> implementations.

That's interesting. Thank you for the detailed explanation. I think I got 
the high-level idea.


>
> ~Andrew
>
Roger Pau Monné Sept. 24, 2021, 8:41 a.m. UTC | #3
On Thu, Sep 23, 2021 at 09:59:26PM +0100, Andrew Cooper wrote:
> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
> > Suggested-by: Julien Grall <jgrall@amazon.com>
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > ---
> > You can find the related discussions at:
> > https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
> > https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
> > https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
> >
> > ! Please note, there is still unresolved locking question here for which
> > I failed to find a suitable solution. So, it is still an RFC !
> 
> Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
> these plans are future work and don't solve the current problem.
> 
> Guests mapping Xen pages is backwards.  There are reasons why it was
> used for x86 PV guests, but the entire interface should have been design
> differently for x86 HVM.
> 
> In particular, Xen should be mapping guest RAM, rather than the guest
> manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
> its far far lower overhead.
> 
> 
> A much better design is one where the grant table looks like an MMIO
> device.  The domain builder decides the ABI (v1 vs v2 - none of this
> dynamic switch at runtime nonsense), and picks a block of guest physical
> addresses, which are registered with Xen.  This forms the grant table,
> status table (v2 only), and holes to map into.

I think this could be problematic for an identity mapped Arm dom0, as
IIRC in that case grants are mapped so that gfn == mfn in order to
account for the lack of an IOMMU. You could use a bounce buffer, but
that would introduce a big performance penalty.

Another question is whether we want/need to keep such a mode going
forward.

Regards, Roger.
Julien Grall Sept. 24, 2021, 2:52 p.m. UTC | #4
Hi Roger,

On 24/09/2021 13:41, Roger Pau Monné wrote:
> On Thu, Sep 23, 2021 at 09:59:26PM +0100, Andrew Cooper wrote:
>> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>> You can find the related discussions at:
>>> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
>>> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
>>> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
>>>
>>> ! Please note, there is still unresolved locking question here for which
>>> I failed to find a suitable solution. So, it is still an RFC !
>>
>> Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
>> these plans are future work and don't solve the current problem.
>>
>> Guests mapping Xen pages is backwards.  There are reasons why it was
>> used for x86 PV guests, but the entire interface should have been design
>> differently for x86 HVM.
>>
>> In particular, Xen should be mapping guest RAM, rather than the guest
>> manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
>> its far far lower overhead.
>>
>>
>> A much better design is one where the grant table looks like an MMIO
>> device.  The domain builder decides the ABI (v1 vs v2 - none of this
>> dynamic switch at runtime nonsense), and picks a block of guest physical
>> addresses, which are registered with Xen.  This forms the grant table,
>> status table (v2 only), and holes to map into.
> 
> I think this could be problematic for identity mapped Arm dom0, as
> IIRC in that case grants are mapped so that gfn == mfn in order to
> account for the lack of an IOMMU. You could use a bounce buffer, but
> that would introduce a big performance penalty.

Or you could find a hole that is outside of the RAM regions. This is not 
trivial but not impossible (see [1]).

> 
> Other question is whether we want/need to keep such mode going
> forward.

I am assuming that by "such mode" you mean "identity mapped". If so, then I 
am afraid this is not going to disappear on Arm, at least. There are 
still many platforms out there without IOMMUs, or with devices which are not 
protected (the GPU is a common one).

Furthermore, Arm just sent a series to introduce identity mapping for 
domUs as well (see [2]).

[1] <1631034578-12598-1-git-send-email-olekstysh@gmail.com>
[2] <20210923031115.1429719-1-penny.zheng@arm.com>

Cheers,
Roger Pau Monné Sept. 24, 2021, 4:10 p.m. UTC | #5
On Fri, Sep 24, 2021 at 07:52:24PM +0500, Julien Grall wrote:
> Hi Roger,
> 
> On 24/09/2021 13:41, Roger Pau Monné wrote:
> > On Thu, Sep 23, 2021 at 09:59:26PM +0100, Andrew Cooper wrote:
> > > On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
> > > > Suggested-by: Julien Grall <jgrall@amazon.com>
> > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > ---
> > > > You can find the related discussions at:
> > > > https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
> > > > https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
> > > > https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
> > > > 
> > > > ! Please note, there is still unresolved locking question here for which
> > > > I failed to find a suitable solution. So, it is still an RFC !
> > > 
> > > Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
> > > these plans are future work and don't solve the current problem.
> > > 
> > > Guests mapping Xen pages is backwards.  There are reasons why it was
> > > used for x86 PV guests, but the entire interface should have been design
> > > differently for x86 HVM.
> > > 
> > > In particular, Xen should be mapping guest RAM, rather than the guest
> > > manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
> > > its far far lower overhead.
> > > 
> > > 
> > > A much better design is one where the grant table looks like an MMIO
> > > device.  The domain builder decides the ABI (v1 vs v2 - none of this
> > > dynamic switch at runtime nonsense), and picks a block of guest physical
> > > addresses, which are registered with Xen.  This forms the grant table,
> > > status table (v2 only), and holes to map into.
> > 
> > I think this could be problematic for identity mapped Arm dom0, as
> > IIRC in that case grants are mapped so that gfn == mfn in order to
> > account for the lack of an IOMMU. You could use a bounce buffer, but
> > that would introduce a big performance penalty.
> 
> Or you could find a hole that is outside of the RAM regions. This is not
> trivial but not impossible (see [1]).

I'm certainly not familiar with the Arm identity map.

If you map them at random areas (so no longer identity mapped), how do
you pass the addresses to the physical devices for DMA operations? I
assume there must be some kind of translation then that converts from
gfn to mfn in order to cope with the lack of an IOMMU, and because
dom0 doesn't know the mfn of the grant reference in order to map it at
the same gfn.

Thanks, Roger.
Julien Grall Sept. 25, 2021, 1:48 a.m. UTC | #6
Hi Roger,

On 24/09/2021 21:10, Roger Pau Monné wrote:
> On Fri, Sep 24, 2021 at 07:52:24PM +0500, Julien Grall wrote:
>> Hi Roger,
>>
>> On 24/09/2021 13:41, Roger Pau Monné wrote:
>>> On Thu, Sep 23, 2021 at 09:59:26PM +0100, Andrew Cooper wrote:
>>>> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
>>>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>> ---
>>>>> You can find the related discussions at:
>>>>> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
>>>>> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
>>>>> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
>>>>>
>>>>> ! Please note, there is still unresolved locking question here for which
>>>>> I failed to find a suitable solution. So, it is still an RFC !
>>>>
>>>> Just FYI, I thought I'd share some of the plans for ABI v2.  Obviously
>>>> these plans are future work and don't solve the current problem.
>>>>
>>>> Guests mapping Xen pages is backwards.  There are reasons why it was
>>>> used for x86 PV guests, but the entire interface should have been design
>>>> differently for x86 HVM.
>>>>
>>>> In particular, Xen should be mapping guest RAM, rather than the guest
>>>> manipulating the 2nd stage tables to map Xen RAM.  Amongst other things,
>>>> its far far lower overhead.
>>>>
>>>>
>>>> A much better design is one where the grant table looks like an MMIO
>>>> device.  The domain builder decides the ABI (v1 vs v2 - none of this
>>>> dynamic switch at runtime nonsense), and picks a block of guest physical
>>>> addresses, which are registered with Xen.  This forms the grant table,
>>>> status table (v2 only), and holes to map into.
>>>
>>> I think this could be problematic for identity mapped Arm dom0, as
>>> IIRC in that case grants are mapped so that gfn == mfn in order to
>>> account for the lack of an IOMMU. You could use a bounce buffer, but
>>> that would introduce a big performance penalty.
>>
>> Or you could find a hole that is outside of the RAM regions. This is not
>> trivial but not impossible (see [1]).
> 
> I certainly not familiar with the Arm identity map.
> 
> If you map them at random areas (so no longer identity mapped), how do
> you pass the addresses to the physical devices for DMA operations? I
> assume there must be some kind of translation then that converts from
> gfn to mfn in order to cope with the lack of an IOMMU, 

For grant mapping, the hypercall will return the machine address in 
dev_bus_addr. Dom0 will keep the dom0 GFN <-> MFN conversion for later 
use in the swiotlb.

For foreign mappings, AFAICT, we expect them to bounce every time. 
But DMA into a foreign mapping should be rarer.
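A rough sketch of that dom0-side bookkeeping (record_gfn_mfn_translation()
is a made-up helper and gfn/gref/src_domid are placeholders; the rest is
the existing grant-table ABI):

    struct gnttab_map_grant_ref op = {
        .host_addr = gfn << PAGE_SHIFT,  /* where dom0 wants the grant mapped */
        .flags     = GNTMAP_host_map,
        .ref       = gref,
        .dom       = src_domid,
    };
    HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1);

    /*
     * op.dev_bus_addr now holds the machine address backing the grant;
     * remember gfn <-> MFN so the swiotlb can translate for devices
     * that DMA without an IOMMU.
     */
    record_gfn_mfn_translation(gfn, op.dev_bus_addr >> PAGE_SHIFT);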

> and because
> dom0 doesn't know the mfn of the grant reference in order to map it at
> the same gfn.

IIRC, we tried an approach where the grant mapping would be direct 
mapped in dom0. However, this was an issue on arm32 because Debian was 
(is?) using short descriptor page tables. This didn't allow dom0 to 
cover all the mappings and therefore some mappings would not be accessible.
Oleksandr Tyshchenko Oct. 14, 2021, 4:22 p.m. UTC | #7
Hello, all.

The potential issue on Arm (which might happen when remapping a 
grant-table frame) is still present; it hasn't disappeared.
Some effort was put into fixing it with the current patch. Although I 
have addressed (I hope) all review comments received for this patch, I 
realize this patch (in its current form) cannot go in without resolving 
the locking issue I described in the post-commit message (we don't want 
to make things worse than the current state). I would appreciate any 
thoughts regarding that.


On 25.09.21 04:48, Julien Grall wrote:
> Hi Roger,
>
> On 24/09/2021 21:10, Roger Pau Monné wrote:
>> On Fri, Sep 24, 2021 at 07:52:24PM +0500, Julien Grall wrote:
>>> Hi Roger,
>>>
>>> On 24/09/2021 13:41, Roger Pau Monné wrote:
>>>> On Thu, Sep 23, 2021 at 09:59:26PM +0100, Andrew Cooper wrote:
>>>>> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
>>>>>> Suggested-by: Julien Grall <jgrall@amazon.com>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>> You can find the related discussions at:
>>>>>> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/ 
>>>>>>
>>>>>> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/ 
>>>>>>
>>>>>> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/ 
>>>>>>
>>>>>>
>>>>>> ! Please note, there is still unresolved locking question here 
>>>>>> for which
>>>>>> I failed to find a suitable solution. So, it is still an RFC !
>>>>>
>>>>> Just FYI, I thought I'd share some of the plans for ABI v2.  
>>>>> Obviously
>>>>> these plans are future work and don't solve the current problem.
>>>>>
>>>>> Guests mapping Xen pages is backwards.  There are reasons why it was
>>>>> used for x86 PV guests, but the entire interface should have been 
>>>>> design
>>>>> differently for x86 HVM.
>>>>>
>>>>> In particular, Xen should be mapping guest RAM, rather than the guest
>>>>> manipulating the 2nd stage tables to map Xen RAM.  Amongst other 
>>>>> things,
>>>>> its far far lower overhead.
>>>>>
>>>>>
>>>>> A much better design is one where the grant table looks like an MMIO
>>>>> device.  The domain builder decides the ABI (v1 vs v2 - none of this
>>>>> dynamic switch at runtime nonsense), and picks a block of guest 
>>>>> physical
>>>>> addresses, which are registered with Xen.  This forms the grant 
>>>>> table,
>>>>> status table (v2 only), and holes to map into.
>>>>
>>>> I think this could be problematic for identity mapped Arm dom0, as
>>>> IIRC in that case grants are mapped so that gfn == mfn in order to
>>>> account for the lack of an IOMMU. You could use a bounce buffer, but
>>>> that would introduce a big performance penalty.
>>>
>>> Or you could find a hole that is outside of the RAM regions. This is 
>>> not
>>> trivial but not impossible (see [1]).
>>
>> I certainly not familiar with the Arm identity map.
>>
>> If you map them at random areas (so no longer identity mapped), how do
>> you pass the addresses to the physical devices for DMA operations? I
>> assume there must be some kind of translation then that converts from
>> gfn to mfn in order to cope with the lack of an IOMMU, 
>
> For grant mapping, the hypercall will return the machine address in 
> dev_bus_addr. Dom0, will keep the conversion dom0 GFN <-> MFN for 
> later use in the swiotlb.
>
> For foreign mapping, AFAICT, we are expecting them to bounce 
> everytime. But DMA into a foreign mapping should be rarer.
>
>> and because
>> dom0 doesn't know the mfn of the grant reference in order to map it at
>> the same gfn.
>
> IIRC, we tried an approach where the grant mapping would be direct 
> mapped in dom0. However, this was an issue on arm32 because Debian was 
> (is?) using short descriptor page tables. This didn't allow dom0 to 
> cover all the mappings and therefore some mappings would not be 
> accessible.
>
Julien Grall Nov. 25, 2021, 7:04 p.m. UTC | #8
Hi Oleksandr,

Apologies for the late answer. I was waiting for XSA-387 to go out 
before commenting.

On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Rework Arm implementation to store grant table frame GFN
> in struct page_info directly instead of keeping it in
> standalone status/shared arrays.
> 
> To cover 64-bit/40-bit IPA on Arm64/Arm32 we need the space
> to hold 52-bit/28-bit + extra bit value respectively. In order
> to not grow the size of struct page_info borrow the required
> amount of bits from type_info's count portion which current
> context won't suffer (currently only 1 bit is used on Arm).
> Please note, to minimize code changes and avoid introducing
> an extra #ifdef-s to the header, we keep the same amount of
> bits on both subarches, although the count portion on Arm64
> could be wider, so we waste some bits here.
> 
> Introduce corresponding PGT_* constructs and access macros.
> Update existing gnttab macros to deal with GFN value according
> to new location. Also update the use of count portion on Arm
> in share_xen_page_with_guest().
> 
> Update the P2M code to clean said GFN portion when putting
> a reference on the grant table page in p2m_put_l3_page().
> The added check is based on the assumption that grant table page
> is the xen_heap page and its entry has p2m_ram_rw type, which
> is correct. However, this check is not entirely precise and we
> might end up clearing the GFN portion for other xen_heap pages
> with the same p2m_type. But, this action is considered as
> harmless, since only grant table pages really use that portion.
> 
> And for everything to work correctly introduce arch-specific
> macros to be called from alloc_xenheap_pages()/free_xenheap_pages()
> which purposes on Arm are to clear the portion before use and
> make sure the portion is cleared after use, on x86 these are
> just stubs.
> 
> This patch is intended to fix the potential issue on Arm
> which might happen when remapping grant-table frame.
> A guest (or the toolstack) will unmap the grant-table frame
> using XENMEM_remove_from_physmap. This is a generic hypercall,
> so on x86, we are relying on the fact the M2P entry will
> be cleared on removal. For architecture without the M2P,
> the GFN would still be present in the grant frame/status
> array. So on the next call to map the page, we will end up to
> request the P2M to remove whatever mapping was the given GFN.
> This could well be another mapping.
> 
> Besides that, this patch simplifies arch code on Arm by
> removing arrays and corresponding management code and
> as the result gnttab_init_arch/gnttab_destroy_arch helpers
> and struct grant_table_arch become useless and can be
> dropped globally.

Before dropping the arch-specific helpers, I would check with the RISC-V 
folks that they will not need them in the near future.

> Suggested-by: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> You can find the related discussions at:
> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/
> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/
> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/
> 
> ! Please note, there is still unresolved locking question here for which
> I failed to find a suitable solution. So, it is still an RFC !
> 
> According to the internal conversation:
> Now the GFN field in the struct page_info is accessed from
> gnttab_set_frame_gfn() in the grant table code and from page_set_frame_gfn()
> in the P2M code (the former uses the latter).
> 
> We need to prevent the concurrent access to this field. But, we cannot grab
> the grant lock from the P2M code because we will introduce a lock inversion.
> The page_set_frame_gfn() will be called from the P2M code with the p2m lock held
> and then acquire the grant table lock. The gnttab_map_frame() will do the inverse.

This is a tricky one. I think we will:

   1) Need to use the P2M lock to protect the access to the GFN in the 
struct page_info *.
   2) Defer the call to page_set_frame_gfn() from gnttab_map_frame() to 
xenmem_add_to_physmap_one()
   3) In xenmem_add_to_physmap_one() hold the P2M lock while checking 
the page was not already mapped (e.g. page_get_frame_gfn() == 
INVALID_GFN) and do the mapping. Call page_set_frame_gfn() on success.

This would still allow the guest to shoot itself in the foot (e.g. 
potentially removing the wrong mapping) if it attempts concurrent 
hypercalls, but I believe we would not introduce an issue like XSA-380.

In the end this would look quite similar to how x86 deals with the M2P 
update.
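A rough sketch of steps 2)-3) in the XENMAPSPACE_grant_table case of
xenmem_add_to_physmap_one() (untested; the error handling is invented
here):

    p2m_write_lock(p2m);

    if ( !gfn_eq(page_get_frame_gfn(page), INVALID_GFN) )
        rc = -EBUSY;                  /* frame is already mapped somewhere */
    else
    {
        rc = p2m_set_entry(p2m, gfn, 1, mfn, p2m_ram_rw, p2m->default_access);
        if ( !rc )
            page_set_frame_gfn(page, gfn);
    }

    p2m_write_unlock(p2m);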

For the record, I actually considered whether it is worth fully 
implementing an M2P on Arm. We technically have space in struct 
page_info for that. However, I don't see it being necessary elsewhere in 
Xen, so I would prefer to keep the space free for other purposes (or 
even be able to remove it).

@Stefano, what do you think?

> 
> Changes RFC1 -> RFC2:
>   - update patch description
>   - add/update comments in code
>   - clarify check in p2m_put_l3_page()
>   - introduce arch_alloc_xenheap_page() and arch_free_xenheap_page()
>     and drop page_arch_init()
>   - add ASSERT to gnttab_shared_page() and gnttab_status_page()
>   - rework changes to Arm's struct page_info: do not split type_info,
>     allocate GFN portion by reducing count portion, create corresponding
>     PGT_* construct, etc
>   - update page_get_frame_gfn() and page_set_frame_gfn()
>   - update the use of count portion on Arm
>   - drop the leading underscore in the macro parameter names
> 
> Changes RFC2 -> RFC3:
>   - update patch description
>   - drop PGT_count_base and MASK_INSR() in share_xen_page_with_guest()
>   - update alloc_xenheap_page() and free_xenheap_page() for SEPARATE_XENHEAP
>     case (Arm32)
>   - provide an extra bit for GFN portion, to get PGT_INVALID_FRAME_GFN
>     one bit more than the maximum number of physical address bits on Arm32
> ---
>   xen/arch/arm/mm.c                 |  8 ++++--
>   xen/arch/arm/p2m.c                | 21 ++++++++++++---
>   xen/common/grant_table.c          |  9 -------
>   xen/common/page_alloc.c           | 20 +++++++++++++-
>   xen/include/asm-arm/grant_table.h | 57 +++++++++++++++------------------------
>   xen/include/asm-arm/mm.h          | 35 +++++++++++++++++++++---
>   xen/include/asm-x86/grant_table.h |  5 ----
>   xen/include/asm-x86/mm.h          |  4 +++
>   8 files changed, 99 insertions(+), 60 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index eea926d..b1e42e5 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1376,14 +1376,18 @@ unsigned long domain_get_maximum_gpfn(struct domain *d)
>   void share_xen_page_with_guest(struct page_info *page, struct domain *d,
>                                  enum XENSHARE_flags flags)
>   {
> +    unsigned long type_info;
> +
>       if ( page_get_owner(page) == d )
>           return;
>   
>       spin_lock(&d->page_alloc_lock);
>   
>       /* The incremented type count pins as writable or read-only. */
> -    page->u.inuse.type_info =
> -        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
> +    type_info = page->u.inuse.type_info & ~(PGT_type_mask | PGT_count_mask);

The local variable can be avoided if you write:

page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
page->u.inuse.type_info |= ...;

> +    page->u.inuse.type_info = type_info |
> +        (flags == SHARE_ro ? PGT_none : PGT_writable_page) |
> +        MASK_INSR(1, PGT_count_mask);
>   
>       page_set_owner(page, d);
>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 8b20b43..7a8d92d 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -718,8 +718,10 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
>    * TODO: Handle superpages, for now we only take special references for leaf
>    * pages (specifically foreign ones, which can't be super mapped today).
>    */
> -static void p2m_put_l3_page(const lpae_t pte)
> +static void p2m_put_l3_page(struct p2m_domain *p2m, const lpae_t pte)

You pass 'p2m' but you don't seem to use it.

>   {
> +    mfn_t mfn = lpae_get_mfn(pte);
> +
>       ASSERT(p2m_is_valid(pte));
>   
>       /*
> @@ -731,11 +733,22 @@ static void p2m_put_l3_page(const lpae_t pte)
>        */
>       if ( p2m_is_foreign(pte.p2m.type) )
>       {
> -        mfn_t mfn = lpae_get_mfn(pte);
> -
>           ASSERT(mfn_valid(mfn));
>           put_page(mfn_to_page(mfn));
>       }
> +
> +#ifdef CONFIG_GRANT_TABLE
> +    /*
> +     * Check whether we deal with grant table page. As the grant table page
> +     * is xen_heap page and its entry has known p2m type, detect it and mark
> +     * the stored GFN as invalid. Although this check is not precise and we
> +     * might end up updating this for other xen_heap pages, this action is
> +     * harmless to these pages since only grant table pages have this field
> +     * in use. So, at worst, unnecessary action might be performed.
> +     */
> +    if ( (pte.p2m.type == p2m_ram_rw) && is_xen_heap_mfn(mfn) )

I would use p2m_is_ram() to cover read-only mappings. I think it would 
also be better to use an ``else if`` so it is clear that this doesn't 
cover foreign mappings (it is possible to map a xenheap page from another 
domain).
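i.e. something along these lines (a sketch of the suggested shape):

    if ( p2m_is_foreign(pte.p2m.type) )
    {
        ASSERT(mfn_valid(mfn));
        put_page(mfn_to_page(mfn));
    }
    #ifdef CONFIG_GRANT_TABLE
    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
        page_set_frame_gfn(mfn_to_page(mfn), INVALID_GFN);
    #endif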

> +        page_set_frame_gfn(mfn_to_page(mfn), INVALID_GFN);
> +#endif
>   }
>   
>   /* Free lpae sub-tree behind an entry */
> @@ -768,7 +781,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>           p2m->stats.mappings[level]--;
>           /* Nothing to do if the entry is a super-page. */
>           if ( level == 3 )
> -            p2m_put_l3_page(entry);
> +            p2m_put_l3_page(p2m, entry);
>           return;
>       }
>   
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index fe1fc11..08fc827 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -93,8 +93,6 @@ struct grant_table {
>   
>       /* Domain to which this struct grant_table belongs. */
>       const struct domain *domain;
> -
> -    struct grant_table_arch arch;
>   };
>   
>   unsigned int __read_mostly opt_max_grant_frames = 64;
> @@ -1981,14 +1979,9 @@ int grant_table_init(struct domain *d, int max_grant_frames,
>   
>       grant_write_lock(gt);
>   
> -    ret = gnttab_init_arch(gt);
> -    if ( ret )
> -        goto unlock;
> -
>       /* gnttab_grow_table() allocates a min number of frames, so 0 is okay. */
>       ret = gnttab_grow_table(d, 0);
>   
> - unlock:
>       grant_write_unlock(gt);
>   
>    out:
> @@ -3894,8 +3887,6 @@ grant_table_destroy(
>       if ( t == NULL )
>           return;
>   
> -    gnttab_destroy_arch(t);
> -
>       for ( i = 0; i < nr_grant_frames(t); i++ )
>           free_xenheap_page(t->shared_raw[i]);
>       xfree(t->shared_raw);
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 5801358..aafd847 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2161,6 +2161,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
>   void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>   {
>       struct page_info *pg;
> +    unsigned int i;
>   
>       ASSERT(!in_irq());
>   
> @@ -2169,6 +2170,9 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>       if ( unlikely(pg == NULL) )
>           return NULL;
>   
> +    for ( i = 0; i < (1u << order); i++ )
> +        arch_alloc_xenheap_page(&pg[i]);
> +
>       memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
>   
>       return page_to_virt(pg);
> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>   
>   void free_xenheap_pages(void *v, unsigned int order)
>   {
> +    struct page_info *pg;
> +    unsigned int i;
> +
>       ASSERT(!in_irq());
>   
>       if ( v == NULL )
>           return;
>   
> +    pg = virt_to_page(v);
> +
>       memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>   
> -    free_heap_pages(virt_to_page(v), order, false);
> +    for ( i = 0; i < (1u << order); i++ )
> +        arch_free_xenheap_page(&pg[i]);
> +
> +    free_heap_pages(pg, order, false);
>   }
>   
>   #else  /* !CONFIG_SEPARATE_XENHEAP */
> @@ -2220,7 +2232,10 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>           return NULL;
>   
>       for ( i = 0; i < (1u << order); i++ )
> +    {
>           pg[i].count_info |= PGC_xen_heap;
> +        arch_alloc_xenheap_page(&pg[i]);
> +    }
>   
>       return page_to_virt(pg);
>   }
> @@ -2238,7 +2253,10 @@ void free_xenheap_pages(void *v, unsigned int order)
>       pg = virt_to_page(v);
>   
>       for ( i = 0; i < (1u << order); i++ )
> +    {
>           pg[i].count_info &= ~PGC_xen_heap;
> +        arch_free_xenheap_page(&pg[i]);
> +    }
>   
>       free_heap_pages(pg, order, true);
>   }
> diff --git a/xen/include/asm-arm/grant_table.h b/xen/include/asm-arm/grant_table.h
> index 0ce77f9..479339d 100644
> --- a/xen/include/asm-arm/grant_table.h
> +++ b/xen/include/asm-arm/grant_table.h
> @@ -11,11 +11,6 @@
>   #define INITIAL_NR_GRANT_FRAMES 1U
>   #define GNTTAB_MAX_VERSION 1
>   
> -struct grant_table_arch {
> -    gfn_t *shared_gfn;
> -    gfn_t *status_gfn;
> -};
> -
>   static inline void gnttab_clear_flags(struct domain *d,
>                                         unsigned int mask, uint16_t *addr)
>   {
> @@ -46,35 +41,11 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>   #define gnttab_dom0_frames()                                             \
>       min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
>   
> -#define gnttab_init_arch(gt)                                             \
> -({                                                                       \
> -    unsigned int ngf_ = (gt)->max_grant_frames;                          \
> -    unsigned int nsf_ = grant_to_status_frames(ngf_);                    \
> -                                                                         \
> -    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);                  \
> -    (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);                  \
> -    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )                \
> -    {                                                                    \
> -        while ( ngf_-- )                                                 \
> -            (gt)->arch.shared_gfn[ngf_] = INVALID_GFN;                   \
> -        while ( nsf_-- )                                                 \
> -            (gt)->arch.status_gfn[nsf_] = INVALID_GFN;                   \
> -    }                                                                    \
> -    else                                                                 \
> -        gnttab_destroy_arch(gt);                                         \
> -    (gt)->arch.shared_gfn ? 0 : -ENOMEM;                                 \
> -})
> -
> -#define gnttab_destroy_arch(gt)                                          \
> -    do {                                                                 \
> -        XFREE((gt)->arch.shared_gfn);                                    \
> -        XFREE((gt)->arch.status_gfn);                                    \
> -    } while ( 0 )
> -
>   #define gnttab_set_frame_gfn(gt, st, idx, gfn)                           \
>       do {                                                                 \
> -        ((st) ? (gt)->arch.status_gfn : (gt)->arch.shared_gfn)[idx] =    \
> -            (gfn);                                                       \
> +        struct page_info *pg_ = (st) ? gnttab_status_page(gt, idx)       \
> +                                     : gnttab_shared_page(gt, idx);      \
> +        page_set_frame_gfn(pg_, gfn);                                    \
>       } while ( 0 )
>   
>   #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
> @@ -82,11 +53,25 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>           : gnttab_shared_gfn(NULL, gt, idx);                              \
>   })
>   
> -#define gnttab_shared_gfn(d, t, i)                                       \
> -    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
> +#define gnttab_shared_page(t, i) ({                                      \
> +    ASSERT((t)->shared_raw[i]);                                          \

The ASSERT() is unnecessary because virt_to_mfn() will panic() if the 
virtual address is not mapped.

> +    mfn_to_page(_mfn(__virt_to_mfn((t)->shared_raw[i])));                \

You can directly use virt_to_page(...) for xenheap pages, which also 
contains some ASSERT()s preventing NULL among other values.
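i.e. (sketch):

    #define gnttab_shared_page(t, i) virt_to_page((t)->shared_raw[i])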

> +})
> +
> +#define gnttab_status_page(t, i) ({                                      \
> +    ASSERT((t)->status[i]);                                              \
> +    mfn_to_page(_mfn(__virt_to_mfn((t)->status[i])));                    \

Same here.

> +})
>   
> -#define gnttab_status_gfn(d, t, i)                                       \
> -    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])
> +#define gnttab_shared_gfn(d, t, i) ({                                    \
> +    struct page_info *pg_ = gnttab_shared_page(t, i);                    \

NIT: I would drop the local variable.

> +    page_get_frame_gfn(pg_);                                             \
> +})
> +
> +#define gnttab_status_gfn(d, t, i) ({                                    \
> +    struct page_info *pg_ = gnttab_status_page(t, i);                    \

NIT: I would drop the local variable.

> +    page_get_frame_gfn(pg_);                                             \
> +})
>   
>   #define gnttab_need_iommu_mapping(d)                    \
>       (is_domain_direct_mapped(d) && is_iommu_enabled(d))
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 7b5e7b7..a00c5f5 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -98,9 +98,17 @@ struct page_info
>   #define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
>   #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
>   
> - /* Count of uses of this frame as its current type. */
> -#define PGT_count_width   PG_shift(2)
> -#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
> + /* 2-bit count of uses of this frame as its current type. */
> +#define PGT_count_mask    PG_mask(3, 3)
> +
> +/*
> + * Stored in bits [28:0] or [60:0] GFN if page is used for grant table frame.

I think this wording is conflicting with ...

> + * This only valid for the xenheap pages.

... this because xen heap pages are used in other situations. But I would 
prefer if the comment didn't mention grant-table frames. This would 
allow us to repurpose the field for other xenheap uses if needed.

Typo: This *is* only valid

> + */
> +#define PGT_gfn_width     PG_shift(3)
> +#define PGT_gfn_mask      ((1UL<<PGT_gfn_width)-1)
> +
> +#define PGT_INVALID_FRAME_GFN   _gfn(PGT_gfn_mask)
>   
>    /* Cleared when the owning guest 'frees' this page. */
>   #define _PGC_allocated    PG_shift(1)
> @@ -166,6 +174,27 @@ extern unsigned long xenheap_base_pdx;
>   
>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>   
> +#define page_get_frame_gfn(p) ({                                \

Above, you wrote:

"This is only valid for xenheap pages". So I would add an ASSERT() to 
confirm that.

> +    gfn_t gfn_ = _gfn((p)->u.inuse.type_info & PGT_gfn_mask);   \
> +    gfn_eq(gfn_, PGT_INVALID_FRAME_GFN) ? INVALID_GFN : gfn_;   \
> +})

Can the function be converted to a static inline?
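For instance (a sketch combining both suggestions, untested):

    static inline gfn_t page_get_frame_gfn(const struct page_info *p)
    {
        gfn_t gfn = _gfn(p->u.inuse.type_info & PGT_gfn_mask);

        ASSERT(is_xen_heap_page(p));

        return gfn_eq(gfn, PGT_INVALID_FRAME_GFN) ? INVALID_GFN : gfn;
    }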

> +
> +#define page_set_frame_gfn(p, gfn) ({                           \

Same questions as above to add an ASSERT() and convert to a static inline.

> +    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ?                     \
> +                 PGT_INVALID_FRAME_GFN : gfn;                   \
> +    (p)->u.inuse.type_info &= ~PGT_gfn_mask;                    \
> +    (p)->u.inuse.type_info |= gfn_x(gfn_);                      \
> +})
> +
> +/*
> + * As the struct page_info representing the xen_heap page can contain
> + * the grant table frame GFN on Arm we need to clear it beforehand and
> + * make sure it is not still set when freeing a page.
> + */
I would prefer if the comment didn't mention grant-table frames. This 
would allow us to repurpose the field for other xenheap uses if needed.

> +#define arch_alloc_xenheap_page(p)   page_set_frame_gfn(p, INVALID_GFN)
> +#define arch_free_xenheap_page(p) \
> +    BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))

I believe this BUG_ON() could be triggered if gnttab_map_frame() 
succeeds but then we fail to insert the entry in the P2M. This means we 
would need to revert changes done in gnttab_map_frame() in case of failure.

However, I am still a bit uneasy with the BUG_ON(). A domain will not 
necessarily remove the grant-table mapping from its P2M before shutting 
down. So you are relying on Xen to go through the P2M before we free the 
page.

This is the case today, but I am not sure we would want to rely on it 
because it will be hard to remember this requirement if we decide to 
optimize p2m_relinquish().

One possibility would be to add a comment in p2m_relinquish(). That's 
assuming there are no other places which could falsely hit the BUG().

> +
>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>   /* PDX of the first page in the frame table. */
>   extern unsigned long frametable_base_pdx;
> diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
> index 84e3296..0eb018f 100644
> --- a/xen/include/asm-x86/grant_table.h
> +++ b/xen/include/asm-x86/grant_table.h
> @@ -14,9 +14,6 @@
>   
>   #define INITIAL_NR_GRANT_FRAMES 1U
>   
> -struct grant_table_arch {
> -};
> -
>   static inline int create_grant_host_mapping(uint64_t addr, mfn_t frame,
>                                               unsigned int flags,
>                                               unsigned int cache_flags)
> @@ -35,8 +32,6 @@ static inline int replace_grant_host_mapping(uint64_t addr, mfn_t frame,
>       return replace_grant_pv_mapping(addr, frame, new_addr, flags);
>   }
>   
> -#define gnttab_init_arch(gt) 0
> -#define gnttab_destroy_arch(gt) do {} while ( 0 )
>   #define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 )
>   #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
>       mfn_t mfn_ = (st) ? gnttab_status_mfn(gt, idx)                       \
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index cb90527..04d8704 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -327,6 +327,10 @@ struct page_info
>   
>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>   
> +/* No arch-specific actions are needed for the xen_heap page */
> +#define arch_alloc_xenheap_page(p)   do {} while ( 0 )
> +#define arch_free_xenheap_page(p)    do {} while ( 0 )
> +
>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>   extern unsigned long max_page;
>   extern unsigned long total_pages;
>
Oleksandr Tyshchenko Nov. 26, 2021, 1:51 p.m. UTC | #9
On 25.11.21 21:04, Julien Grall wrote:
> Hi Oleksandr,
>
> Apologies for the late answer. I was waiting for XSA-387 to go out 
> before commenting.


Hi Julien,


I got it, no problem


>
>
> On 23/09/2021 20:32, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Rework Arm implementation to store grant table frame GFN
>> in struct page_info directly instead of keeping it in
>> standalone status/shared arrays.
>>
>> To cover 64-bit/40-bit IPA on Arm64/Arm32 we need the space
>> to hold 52-bit/28-bit + extra bit value respectively. In order
>> to not grow the size of struct page_info borrow the required
>> amount of bits from type_info's count portion which current
>> context won't suffer (currently only 1 bit is used on Arm).
>> Please note, to minimize code changes and avoid introducing
>> an extra #ifdef-s to the header, we keep the same amount of
>> bits on both subarches, although the count portion on Arm64
>> could be wider, so we waste some bits here.
>>
>> Introduce corresponding PGT_* constructs and access macros.
>> Update existing gnttab macros to deal with GFN value according
>> to new location. Also update the use of count portion on Arm
>> in share_xen_page_with_guest().
>>
>> Update the P2M code to clean said GFN portion when putting
>> a reference on the grant table page in p2m_put_l3_page().
>> The added check is based on the assumption that grant table page
>> is the xen_heap page and its entry has p2m_ram_rw type, which
>> is correct. However, this check is not entirely precise and we
>> might end up clearing the GFN portion for other xen_heap pages
>> with the same p2m_type. But, this action is considered as
>> harmless, since only grant table pages really use that portion.
>>
>> And for everything to work correctly introduce arch-specific
>> macros to be called from alloc_xenheap_pages()/free_xenheap_pages()
>> which purposes on Arm are to clear the portion before use and
>> make sure the portion is cleared after use, on x86 these are
>> just stubs.
>>
>> This patch is intended to fix the potential issue on Arm
>> which might happen when remapping grant-table frame.
>> A guest (or the toolstack) will unmap the grant-table frame
>> using XENMEM_remove_from_physmap. This is a generic hypercall,
>> so on x86, we are relying on the fact the M2P entry will
>> be cleared on removal. For architecture without the M2P,
>> the GFN would still be present in the grant frame/status
>> array. So on the next call to map the page, we will end up to
>> request the P2M to remove whatever mapping was the given GFN.
>> This could well be another mapping.
>>
>> Besides that, this patch simplifies arch code on Arm by
>> removing arrays and corresponding management code and
>> as the result gnttab_init_arch/gnttab_destroy_arch helpers
>> and struct grant_table_arch become useless and can be
>> dropped globally.
>
> Before dropping the arch specific helpers, I would check with the 
> RISC-v folks that they will not need it in the near future.

arch/riscv/configs/tiny64_defconfig says that CONFIG_GRANT_TABLE is not 
set, so to me it sounds unlikely for *the near* future. But, anyway, 
it would be better to clarify.


>
>
>> Suggested-by: Julien Grall <jgrall@amazon.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> You can find the related discussions at:
>> https://lore.kernel.org/xen-devel/93d0df14-2c8a-c2e3-8c51-54412190171c@xen.org/ 
>>
>> https://lore.kernel.org/xen-devel/1628890077-12545-1-git-send-email-olekstysh@gmail.com/ 
>>
>> https://lore.kernel.org/xen-devel/1631652245-30746-1-git-send-email-olekstysh@gmail.com/ 
>>
>>
>> ! Please note, there is still unresolved locking question here for which
>> I failed to find a suitable solution. So, it is still an RFC !
>>
>> According to the internal conversation:
>> Now the GFN field in the struct page_info is accessed from
>> gnttab_set_frame_gfn() in the grant table code and from 
>> page_set_frame_gfn()
>> in the P2M code (the former uses the latter).
>>
>> We need to prevent the concurrent access to this field. But, we 
>> cannot grab
>> the grant lock from the P2M code because we will introduce a lock 
>> inversion.
>> The page_set_frame_gfn() will be called from the P2M code with the 
>> p2m lock held
>> and then acquire the grant table lock. The gnttab_map_frame() will do 
>> the inverse.
>
> This is a tricky one. I think, we will:
>
>   1) Need to use the P2M lock to protect the access to the GFN in the 
> struct page_info *.
>   2) Defer the call to page_set_frame_gfn() from gnttab_map_frame() to 
> xenmem_add_to_physmap_one()
>   3) In xenmem_add_to_physmap_one() hold the P2M lock while checking 
> the page was not already mapped (e.g. page_get_frame_gfn() == 
> INVALID_GFN) and do the mapping. Call page_set_frame_gfn() on success.
>
> This would still allow the guest to shot itself in the foot (e.g. 
> potentially removing the wrong mapping) if it tries concurrent 
> hypercall but I believe we would not introduce issue like XSA-380.
>
> At the end this would look quite similar to how x86 deal with the M2P 
> update.

Thank you for the suggestion. I need to analyze the code to better 
understand your idea and the technical possibility of implementing it; 
I will come up with questions if any.


>
>
> For the record, I actually considered whether it is worth to fully 
> implement an M2P on Arm. We technically have space in the struct 
> page_info for that. However, I don't see it necessary in other place 
> of Xen, so I would prefer to keep the space free for other purpose (or 
> event be able to remove it).
>
> @Stefano, what do you think?
>
>>
>> Changes RFC1 -> RFC2:
>>   - update patch description
>>   - add/update comments in code
>>   - clarify check in p2m_put_l3_page()
>>   - introduce arch_alloc_xenheap_page() and arch_free_xenheap_page()
>>     and drop page_arch_init()
>>   - add ASSERT to gnttab_shared_page() and gnttab_status_page()
>>   - rework changes to Arm's struct page_info: do not split type_info,
>>     allocate GFN portion by reducing count portion, create corresponding
>>     PGT_* construct, etc
>>   - update page_get_frame_gfn() and page_set_frame_gfn()
>>   - update the use of count portion on Arm
>>   - drop the leading underscore in the macro parameter names
>>
>> Changes RFC2 -> RFC3:
>>   - update patch description
>>   - drop PGT_count_base and MASK_INSR() in share_xen_page_with_guest()
>>   - update alloc_xenheap_page() and free_xenheap_page() for 
>> SEPARATE_XENHEAP
>>     case (Arm32)
>>   - provide an extra bit for GFN portion, to get PGT_INVALID_FRAME_GFN
>>     one bit more than the maximum number of physical address bits on 
>> Arm32
>> ---
>>   xen/arch/arm/mm.c                 |  8 ++++--
>>   xen/arch/arm/p2m.c                | 21 ++++++++++++---
>>   xen/common/grant_table.c          |  9 -------
>>   xen/common/page_alloc.c           | 20 +++++++++++++-
>>   xen/include/asm-arm/grant_table.h | 57 
>> +++++++++++++++------------------------
>>   xen/include/asm-arm/mm.h          | 35 +++++++++++++++++++++---
>>   xen/include/asm-x86/grant_table.h |  5 ----
>>   xen/include/asm-x86/mm.h          |  4 +++
>>   8 files changed, 99 insertions(+), 60 deletions(-)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index eea926d..b1e42e5 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -1376,14 +1376,18 @@ unsigned long domain_get_maximum_gpfn(struct 
>> domain *d)
>>   void share_xen_page_with_guest(struct page_info *page, struct 
>> domain *d,
>>                                  enum XENSHARE_flags flags)
>>   {
>> +    unsigned long type_info;
>> +
>>       if ( page_get_owner(page) == d )
>>           return;
>>         spin_lock(&d->page_alloc_lock);
>>         /* The incremented type count pins as writable or read-only. */
>> -    page->u.inuse.type_info =
>> -        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
>> +    type_info = page->u.inuse.type_info & ~(PGT_type_mask | 
>> PGT_count_mask);
>
> The local variable can be avoided if you write:
>
> page->u.inuse.type_info &= ~(PGT_type_mask | PGT_count_mask);
> page->u.inuse.type_info |= ...;

agree, will do


>
>
>> +    page->u.inuse.type_info = type_info |
>> +        (flags == SHARE_ro ? PGT_none : PGT_writable_page) |
>> +        MASK_INSR(1, PGT_count_mask);
>>         page_set_owner(page, d);
>>       smp_wmb(); /* install valid domain ptr before updating refcnt. */
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 8b20b43..7a8d92d 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -718,8 +718,10 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
>>    * TODO: Handle superpages, for now we only take special references for leaf
>>    * pages (specifically foreign ones, which can't be super mapped today).
>>    */
>> -static void p2m_put_l3_page(const lpae_t pte)
>> +static void p2m_put_l3_page(struct p2m_domain *p2m, const lpae_t pte)
>
> You pass 'p2m' but you don't seem to use it.

indeed, p2m->domain was needed for the first version of this patch, but
with the current implementation it is not needed anymore, will drop.


>
>
>>   {
>> +    mfn_t mfn = lpae_get_mfn(pte);
>> +
>>       ASSERT(p2m_is_valid(pte));
>>
>>       /*
>> @@ -731,11 +733,22 @@ static void p2m_put_l3_page(const lpae_t pte)
>>        */
>>       if ( p2m_is_foreign(pte.p2m.type) )
>>       {
>> -        mfn_t mfn = lpae_get_mfn(pte);
>> -
>>           ASSERT(mfn_valid(mfn));
>>           put_page(mfn_to_page(mfn));
>>       }
>> +
>> +#ifdef CONFIG_GRANT_TABLE
>> +    /*
>> +     * Check whether we deal with grant table page. As the grant table page
>> +     * is xen_heap page and its entry has known p2m type, detect it and mark
>> +     * the stored GFN as invalid. Although this check is not precise and we
>> +     * might end up updating this for other xen_heap pages, this action is
>> +     * harmless to these pages since only grant table pages have this field
>> +     * in use. So, at worst, unnecessary action might be performed.
>> +     */
>> +    if ( (pte.p2m.type == p2m_ram_rw) && is_xen_heap_mfn(mfn) )
>
> I would use p2m_is_ram() to cover read-only mapping. I think it would 
> also be better to use an ``else if`` so it is clear that this doesn't 
> cover foreign mapping (it is possible to map xenheap page from another 
> domain).

ok, will use p2m_is_ram() and the ``else if`` construct, however I don't
entirely understand why we also want/need to include read-only pages (as
the type is set to p2m_ram_rw in xenmem_add_to_physmap_one() for the
XENMAPSPACE_grant_table case)?


>
>
>> +        page_set_frame_gfn(mfn_to_page(mfn), INVALID_GFN);
>> +#endif
>>   }
>>
>>   /* Free lpae sub-tree behind an entry */
>> @@ -768,7 +781,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>>           p2m->stats.mappings[level]--;
>>           /* Nothing to do if the entry is a super-page. */
>>           if ( level == 3 )
>> -            p2m_put_l3_page(entry);
>> +            p2m_put_l3_page(p2m, entry);
>>           return;
>>       }
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index fe1fc11..08fc827 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -93,8 +93,6 @@ struct grant_table {
>>
>>       /* Domain to which this struct grant_table belongs. */
>>       const struct domain *domain;
>> -
>> -    struct grant_table_arch arch;
>>   };
>>
>>   unsigned int __read_mostly opt_max_grant_frames = 64;
>> @@ -1981,14 +1979,9 @@ int grant_table_init(struct domain *d, int max_grant_frames,
>>
>>       grant_write_lock(gt);
>>
>> -    ret = gnttab_init_arch(gt);
>> -    if ( ret )
>> -        goto unlock;
>> -
>>       /* gnttab_grow_table() allocates a min number of frames, so 0 is okay. */
>>       ret = gnttab_grow_table(d, 0);
>>
>> - unlock:
>>       grant_write_unlock(gt);
>>
>>    out:
>> @@ -3894,8 +3887,6 @@ grant_table_destroy(
>>       if ( t == NULL )
>>           return;
>>
>> -    gnttab_destroy_arch(t);
>> -
>>       for ( i = 0; i < nr_grant_frames(t); i++ )
>>           free_xenheap_page(t->shared_raw[i]);
>>       xfree(t->shared_raw);
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 5801358..aafd847 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -2161,6 +2161,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
>>   void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>   {
>>       struct page_info *pg;
>> +    unsigned int i;
>>
>>       ASSERT(!in_irq());
>>
>> @@ -2169,6 +2170,9 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>       if ( unlikely(pg == NULL) )
>>           return NULL;
>>
>> +    for ( i = 0; i < (1u << order); i++ )
>> +        arch_alloc_xenheap_page(&pg[i]);
>> +
>>       memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
>>
>>       return page_to_virt(pg);
>> @@ -2177,14 +2181,22 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>
>>   void free_xenheap_pages(void *v, unsigned int order)
>>   {
>> +    struct page_info *pg;
>> +    unsigned int i;
>> +
>>       ASSERT(!in_irq());
>>
>>       if ( v == NULL )
>>           return;
>>
>> +    pg = virt_to_page(v);
>> +
>>       memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
>>
>> -    free_heap_pages(virt_to_page(v), order, false);
>> +    for ( i = 0; i < (1u << order); i++ )
>> +        arch_free_xenheap_page(&pg[i]);
>> +
>> +    free_heap_pages(pg, order, false);
>>   }
>>
>>   #else  /* !CONFIG_SEPARATE_XENHEAP */
>> @@ -2220,7 +2232,10 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>>           return NULL;
>>
>>       for ( i = 0; i < (1u << order); i++ )
>> +    {
>>           pg[i].count_info |= PGC_xen_heap;
>> +        arch_alloc_xenheap_page(&pg[i]);
>> +    }
>>
>>       return page_to_virt(pg);
>>   }
>> @@ -2238,7 +2253,10 @@ void free_xenheap_pages(void *v, unsigned int order)
>>       pg = virt_to_page(v);
>>
>>       for ( i = 0; i < (1u << order); i++ )
>> +    {
>>           pg[i].count_info &= ~PGC_xen_heap;
>> +        arch_free_xenheap_page(&pg[i]);
>> +    }
>>
>>       free_heap_pages(pg, order, true);
>>   }
>> diff --git a/xen/include/asm-arm/grant_table.h b/xen/include/asm-arm/grant_table.h
>> index 0ce77f9..479339d 100644
>> --- a/xen/include/asm-arm/grant_table.h
>> +++ b/xen/include/asm-arm/grant_table.h
>> @@ -11,11 +11,6 @@
>>   #define INITIAL_NR_GRANT_FRAMES 1U
>>   #define GNTTAB_MAX_VERSION 1
>>
>> -struct grant_table_arch {
>> -    gfn_t *shared_gfn;
>> -    gfn_t *status_gfn;
>> -};
>> -
>>   static inline void gnttab_clear_flags(struct domain *d,
>>                                         unsigned int mask, uint16_t *addr)
>>   {
>> @@ -46,35 +41,11 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>>   #define gnttab_dom0_frames()                                             \
>>       min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
>> -#define gnttab_init_arch(gt)                                             \
>> -({                                                                       \
>> -    unsigned int ngf_ = (gt)->max_grant_frames;                          \
>> -    unsigned int nsf_ = grant_to_status_frames(ngf_);                    \
>> -                                                                         \
>> -    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);                  \
>> -    (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);                  \
>> -    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )                \
>> -    {                                                                    \
>> -        while ( ngf_-- )                                                 \
>> -            (gt)->arch.shared_gfn[ngf_] = INVALID_GFN;                   \
>> -        while ( nsf_-- )                                                 \
>> -            (gt)->arch.status_gfn[nsf_] = INVALID_GFN;                   \
>> -    }                                                                    \
>> -    else                                                                 \
>> -        gnttab_destroy_arch(gt);                                         \
>> -    (gt)->arch.shared_gfn ? 0 : -ENOMEM;                                 \
>> -})
>> -
>> -#define gnttab_destroy_arch(gt)                                          \
>> -    do {                                                                 \
>> -        XFREE((gt)->arch.shared_gfn);                                    \
>> -        XFREE((gt)->arch.status_gfn);                                    \
>> -    } while ( 0 )
>> -
>>   #define gnttab_set_frame_gfn(gt, st, idx, gfn)                           \
>>       do {                                                                 \
>> -        ((st) ? (gt)->arch.status_gfn : (gt)->arch.shared_gfn)[idx] =    \
>> -            (gfn);                                                       \
>> +        struct page_info *pg_ = (st) ? gnttab_status_page(gt, idx)       \
>> +                                     : gnttab_shared_page(gt, idx);      \
>> +        page_set_frame_gfn(pg_, gfn);                                    \
>>       } while ( 0 )
>>
>>   #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
>> @@ -82,11 +53,25 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
>>           : gnttab_shared_gfn(NULL, gt, idx);                              \
>>   })
>>
>> -#define gnttab_shared_gfn(d, t, i)                                       \
>> -    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
>> +#define gnttab_shared_page(t, i) ({                                      \
>> +    ASSERT((t)->shared_raw[i]);                                          \
>
> The ASSERT() is unnecessary because virt_to_mfn() will panic() if the 
> virtual address is not mapped.

Right, ...


>
>
>> +    mfn_to_page(_mfn(__virt_to_mfn((t)->shared_raw[i])));                \
>
> You can directly use virt_to_page(...) for xenheap which also contains
> some ASSERT() preventing NULL among other value.

    ... indeed this would look much better


>
>
>> +})
>> +
>> +#define gnttab_status_page(t, i) ({                                      \
>> +    ASSERT((t)->status[i]);                                              \
>> +    mfn_to_page(_mfn(__virt_to_mfn((t)->status[i])));                    \
>
> Same here.

ok
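
... so, if I am not mistaken, both helpers would collapse to something like
(sketch only):

#define gnttab_shared_page(t, i) virt_to_page((t)->shared_raw[i])
#define gnttab_status_page(t, i) virt_to_page((t)->status[i])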


>
>> +})
>>
>> -#define gnttab_status_gfn(d, t, i)                                       \
>> -    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])
>> +#define gnttab_shared_gfn(d, t, i) ({                                    \
>> +    struct page_info *pg_ = gnttab_shared_page(t, i);                    \
>
> NIT: I would drop the local variable.

ok


>
>
>> +    page_get_frame_gfn(pg_);                                             \
>> +})
>> +
>> +#define gnttab_status_gfn(d, t, i) ({                                    \
>> +    struct page_info *pg_ = gnttab_status_page(t, i);                    \
>
> NIT: I would drop the local variable.

ok
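
... with the local variables dropped, presumably both just become (sketch):

#define gnttab_shared_gfn(d, t, i) page_get_frame_gfn(gnttab_shared_page(t, i))
#define gnttab_status_gfn(d, t, i) page_get_frame_gfn(gnttab_status_page(t, i))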


>
>> +    page_get_frame_gfn(pg_);                                             \
>> +})
>>
>>   #define gnttab_need_iommu_mapping(d)                    \
>>       (is_domain_direct_mapped(d) && is_iommu_enabled(d))
>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>> index 7b5e7b7..a00c5f5 100644
>> --- a/xen/include/asm-arm/mm.h
>> +++ b/xen/include/asm-arm/mm.h
>> @@ -98,9 +98,17 @@ struct page_info
>>   #define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
>>   #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
>>
>> - /* Count of uses of this frame as its current type. */
>> -#define PGT_count_width   PG_shift(2)
>> -#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
>> + /* 2-bit count of uses of this frame as its current type. */
>> +#define PGT_count_mask    PG_mask(3, 3)
>> +
>> +/*
>> + * Stored in bits [28:0] or [60:0] GFN if page is used for grant table frame.
>
> I think this wording is conflicting with ...
>
>> + * This only valid for the xenheap pages.
>
> ... this because xen heap pages are used in other situations. But I
> would prefer if the comment doesn't mention grant-table frame. This
> would allow us to repurpose the field for other xenheap pages if needed.
>
> Typo: This *is* only valid


ok, so how about simply describing its purpose as the xenheap GFN here and
further down this header?

For example,
Stored in bits [28:0] or [60:0] GFN if page is xenheap page.

BTW, shall I rename the access helpers page_set(get)_frame_gfn() as
well? For me the frame is associated with the grant table.
Something like: page_set(get)_xenheap_gfn() or even page_set(get)_gfn().


>
>
>> + */
>> +#define PGT_gfn_width     PG_shift(3)
>> +#define PGT_gfn_mask      ((1UL<<PGT_gfn_width)-1)
>> +
>> +#define PGT_INVALID_FRAME_GFN   _gfn(PGT_gfn_mask)
>>
>>    /* Cleared when the owning guest 'frees' this page. */
>>   #define _PGC_allocated    PG_shift(1)
>> @@ -166,6 +174,27 @@ extern unsigned long xenheap_base_pdx;
>>
>>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>>
>> +#define page_get_frame_gfn(p) ({                                \
>
> Above, you wrote:
>
> "This is only valid for xenheap pages". So I would add an ASSERT() to 
> confirm that.

Sounds reasonable, will add.


>
>
>> +    gfn_t gfn_ = _gfn((p)->u.inuse.type_info & PGT_gfn_mask);   \
>> +    gfn_eq(gfn_, PGT_INVALID_FRAME_GFN) ? INVALID_GFN : gfn_; \
>> +})
>
> Can the function be converted to a static inline?


I don't see why not at first glance, will convert.
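
For example, the getter might end up looking like this (just a sketch; using
is_xen_heap_page() as the ASSERT condition is my assumption of what the
check would be):

static inline gfn_t page_get_frame_gfn(const struct page_info *p)
{
    gfn_t gfn = _gfn(p->u.inuse.type_info & PGT_gfn_mask);

    ASSERT(is_xen_heap_page(p));

    return gfn_eq(gfn, PGT_INVALID_FRAME_GFN) ? INVALID_GFN : gfn;
}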


>
>
>> +
>> +#define page_set_frame_gfn(p, gfn) ({ \
>
> Same questions as above to add an ASSERT() and convert to a static inline.


ok
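
... and the setter correspondingly (same sketch and same assumption about
the ASSERT):

static inline void page_set_frame_gfn(struct page_info *p, gfn_t gfn)
{
    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ? PGT_INVALID_FRAME_GFN : gfn;

    ASSERT(is_xen_heap_page(p));

    p->u.inuse.type_info &= ~PGT_gfn_mask;
    p->u.inuse.type_info |= gfn_x(gfn_);
}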


>
>
>> +    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ?                     \
>> +                 PGT_INVALID_FRAME_GFN : gfn; \
>> +    (p)->u.inuse.type_info &= ~PGT_gfn_mask;                    \
>> +    (p)->u.inuse.type_info |= gfn_x(gfn_);                      \
>> +})
>> +
>> +/*
>> + * As the struct page_info representing the xen_heap page can contain
>> + * the grant table frame GFN on Arm we need to clear it beforehand and
>> + * make sure it is not still set when freeing a page.
>> + */
> I would prefer if the comment doesn't mention grant-table frame. This
> would allow us to repurpose the field for other xenheap pages if needed.

ok


>
>
>> +#define arch_alloc_xenheap_page(p) page_set_frame_gfn(p, INVALID_GFN)
>> +#define arch_free_xenheap_page(p) \
>> +    BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))
>
> I believe this BUG_ON() could be triggered if gnttab_map_frame() 
> succeeds but then we fail to insert the entry in the P2M. This means 
> we would need to revert changes done in gnttab_map_frame() in case of 
> failure.
>
> However, I am still a bit unease with the BUG_ON(). A domain will not 
> necessarily remove the grant-table mapping from its P2M before 
> shutting down. So you are relying on Xen to go through the P2M before 
> we free the page.
>
> This is the case today, but I am not sure we would want to rely on it 
> because it will be hard to remember this requirement if we decide to 
> optimize p2m_relinquish().
>
> One possibility would be to add a comment in p2m_relinquish(). That's 
> assuming there are no other places which could lead to false 
> positively hit the BUG().

These make me think that it would be better (safer and simpler) to just 
remove this BUG_ON() for now. Do you agree?


>
>
>> +
>>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>>   /* PDX of the first page in the frame table. */
>>   extern unsigned long frametable_base_pdx;
>> diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
>> index 84e3296..0eb018f 100644
>> --- a/xen/include/asm-x86/grant_table.h
>> +++ b/xen/include/asm-x86/grant_table.h
>> @@ -14,9 +14,6 @@
>>
>>   #define INITIAL_NR_GRANT_FRAMES 1U
>>
>> -struct grant_table_arch {
>> -};
>> -
>>   static inline int create_grant_host_mapping(uint64_t addr, mfn_t frame,
>>                                               unsigned int flags,
>>                                               unsigned int cache_flags)
>> @@ -35,8 +32,6 @@ static inline int replace_grant_host_mapping(uint64_t addr, mfn_t frame,
>>       return replace_grant_pv_mapping(addr, frame, new_addr, flags);
>>   }
>>
>> -#define gnttab_init_arch(gt) 0
>> -#define gnttab_destroy_arch(gt) do {} while ( 0 )
>>   #define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 )
>>   #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
>>       mfn_t mfn_ = (st) ? gnttab_status_mfn(gt, idx)                       \
>> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>> index cb90527..04d8704 100644
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -327,6 +327,10 @@ struct page_info
>>
>>   #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
>>
>> +/* No arch-specific actions are needed for the xen_heap page */
>> +#define arch_alloc_xenheap_page(p)   do {} while ( 0 )
>> +#define arch_free_xenheap_page(p)    do {} while ( 0 )
>> +
>>   #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
>>   extern unsigned long max_page;
>>   extern unsigned long total_pages;
>>
>
Oleksandr Tyshchenko Nov. 29, 2021, 3:58 p.m. UTC | #10
Hi Julien


[snip]


>
>>>
>>> ! Please note, there is still unresolved locking question here for 
>>> which
>>> I failed to find a suitable solution. So, it is still an RFC !
>>>
>>> According to the internal conversation:
>>> Now the GFN field in the struct page_info is accessed from
>>> gnttab_set_frame_gfn() in the grant table code and from 
>>> page_set_frame_gfn()
>>> in the P2M code (the former uses the latter).
>>>
>>> We need to prevent the concurrent access to this field. But, we 
>>> cannot grab
>>> the grant lock from the P2M code because we will introduce a lock 
>>> inversion.
>>> The page_set_frame_gfn() will be called from the P2M code with the 
>>> p2m lock held
>>> and then acquire the grant table lock. The gnttab_map_frame() will 
>>> do the inverse.
>>
>> This is a tricky one. I think, we will:
>>
>>   1) Need to use the P2M lock to protect the access to the GFN in the 
>> struct page_info *.
>>   2) Defer the call to page_set_frame_gfn() from gnttab_map_frame() 
>> to xenmem_add_to_physmap_one()
>>   3) In xenmem_add_to_physmap_one() hold the P2M lock while checking 
>> the page was not already mapped (e.g. page_get_frame_gfn() == 
>> INVALID_GFN) and do the mapping. Call page_set_frame_gfn() on success.
>>
>> This would still allow the guest to shot itself in the foot (e.g. 
>> potentially removing the wrong mapping) if it tries concurrent 
>> hypercall but I believe we would not introduce issue like XSA-380.
>>
>> At the end this would look quite similar to how x86 deal with the M2P 
>> update.
>
> Thank you for the suggestion, I need to analyze the code to better 
> understand your idea and technical possibility to implement it, I will 
> come up with questions if any.

I experimented a bit... Could you please clarify, is the code snippet
below close to what you meant?


diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b594db4..dba9258 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1488,8 +1488,27 @@ int xenmem_add_to_physmap_one(
          return -ENOSYS;
      }

-    /* Map at new location. */
-    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
+    if ( space != XENMAPSPACE_grant_table )
+        /* Map at new location. */
+        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
+    else
+    {
+#ifdef CONFIG_GRANT_TABLE
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        p2m_write_lock(p2m);
+        if ( gfn_eq(page_get_frame_gfn(page), INVALID_GFN) )
+        {
+            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
+            if ( !rc )
+                page_set_frame_gfn(page, gfn);
+        }
+        p2m_write_unlock(p2m);
+#else
+        ASSERT_UNREACHABLE();
+        rc = -EINVAL;
+#endif
+    }

      /*
       * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 59604b1..64e9e77 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4167,10 +4167,8 @@ int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
           * Make sure gnttab_unpopulate_status_frames() won't (successfully)
           * free the page until our caller has completed its operation.
           */
-        if ( get_page(mfn_to_page(*mfn), d) )
-            gnttab_set_frame_gfn(gt, status, idx, gfn);
-        else
-            rc = -EBUSY;
+        if ( !get_page(mfn_to_page(*mfn), d) )
+           rc = -EBUSY;
      }

      grant_write_unlock(gt);

If yes *and* I correctly understand the code, then it looks like
gnttab_set_frame_gfn becomes useless on Arm and can be dropped globally
(x86's variant is already dummy).


[snip]
Julien Grall Dec. 1, 2021, 4:22 p.m. UTC | #11
On 29/11/2021 15:58, Oleksandr wrote:
> 
> Hi Julien

Hi,

> 
> 
> [snip]
> 
> 
>>
>>>>
>>>> ! Please note, there is still unresolved locking question here for 
>>>> which
>>>> I failed to find a suitable solution. So, it is still an RFC !
>>>>
>>>> According to the internal conversation:
>>>> Now the GFN field in the struct page_info is accessed from
>>>> gnttab_set_frame_gfn() in the grant table code and from 
>>>> page_set_frame_gfn()
>>>> in the P2M code (the former uses the latter).
>>>>
>>>> We need to prevent the concurrent access to this field. But, we 
>>>> cannot grab
>>>> the grant lock from the P2M code because we will introduce a lock 
>>>> inversion.
>>>> The page_set_frame_gfn() will be called from the P2M code with the 
>>>> p2m lock held
>>>> and then acquire the grant table lock. The gnttab_map_frame() will 
>>>> do the inverse.
>>>
>>> This is a tricky one. I think, we will:
>>>
>>>   1) Need to use the P2M lock to protect the access to the GFN in the 
>>> struct page_info *.
>>>   2) Defer the call to page_set_frame_gfn() from gnttab_map_frame() 
>>> to xenmem_add_to_physmap_one()
>>>   3) In xenmem_add_to_physmap_one() hold the P2M lock while checking 
>>> the page was not already mapped (e.g. page_get_frame_gfn() == 
>>> INVALID_GFN) and do the mapping. Call page_set_frame_gfn() on success.
>>>
>>> This would still allow the guest to shot itself in the foot (e.g. 
>>> potentially removing the wrong mapping) if it tries concurrent 
>>> hypercall but I believe we would not introduce issue like XSA-380.
>>>
>>> At the end this would look quite similar to how x86 deal with the M2P 
>>> update.
>>
>> Thank you for the suggestion, I need to analyze the code to better 
>> understand your idea and technical possibility to implement it, I will 
>> come up with questions if any.
> 
> I experimented a bit... Could you please clarify, is the code snippet
> below close to what you meant?

It is close to what I had in mind. A few comments below.

> 
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index b594db4..dba9258 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1488,8 +1488,27 @@ int xenmem_add_to_physmap_one(
>           return -ENOSYS;
>       }
> 
> -    /* Map at new location. */
> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    if ( space != XENMAPSPACE_grant_table )

I would consider using this approach for any xenheap pages.

> +        /* Map at new location. */
> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
> +    else
> +    {
> +#ifdef CONFIG_GRANT_TABLE
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        p2m_write_lock(p2m);
> +        if ( gfn_eq(page_get_frame_gfn(page), INVALID_GFN) )

I think we want to return an error if page_get_frame_gfn() returns a
valid GFN.

> +        {
> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
> +            if ( !rc )
> +                page_set_frame_gfn(page, gfn);
> +        }
> +        p2m_write_unlock(p2m);
> +#else
> +        ASSERT_UNREACHABLE();
> +        rc = -EINVAL;
> +#endif
> +    }
> 
>       /*
>        * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 59604b1..64e9e77 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4167,10 +4167,8 @@ int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
>            * Make sure gnttab_unpopulate_status_frames() won't (successfully)
>            * free the page until our caller has completed its operation.
>            */
> -        if ( get_page(mfn_to_page(*mfn), d) )
> -            gnttab_set_frame_gfn(gt, status, idx, gfn);
> -        else
> -            rc = -EBUSY;
> +        if ( !get_page(mfn_to_page(*mfn), d) )
> +           rc = -EBUSY;
>       }
> 
>       grant_write_unlock(gt);
> 
> If yes *and* I correctly understand the code, then it looks like
> gnttab_set_frame_gfn becomes useless on Arm and can be dropped globally
> (x86's variant is already dummy).

It will soon no longer be a dummy version, see [1].

Cheers,

[1] 
https://lore.kernel.org/xen-devel/8b73ff7c-4dd6-ff2e-14b9-088fdce0beb9@suse.com/
Julien Grall Dec. 1, 2021, 4:32 p.m. UTC | #12
Hi Oleksandr,

On 26/11/2021 13:51, Oleksandr wrote:
> On 25.11.21 21:04, Julien Grall wrote:
>>>   {
>>> +    mfn_t mfn = lpae_get_mfn(pte);
>>> +
>>>       ASSERT(p2m_is_valid(pte));
>>>         /*
>>> @@ -731,11 +733,22 @@ static void p2m_put_l3_page(const lpae_t pte)
>>>        */
>>>       if ( p2m_is_foreign(pte.p2m.type) )
>>>       {
>>> -        mfn_t mfn = lpae_get_mfn(pte);
>>> -
>>>           ASSERT(mfn_valid(mfn));
>>>           put_page(mfn_to_page(mfn));
>>>       }
>>> +
>>> +#ifdef CONFIG_GRANT_TABLE
>>> +    /*
>>> +     * Check whether we deal with grant table page. As the grant 
>>> table page
>>> +     * is xen_heap page and its entry has known p2m type, detect it 
>>> and mark
>>> +     * the stored GFN as invalid. Although this check is not precise 
>>> and we
>>> +     * might end up updating this for other xen_heap pages, this 
>>> action is
>>> +     * harmless to these pages since only grant table pages have 
>>> this field
>>> +     * in use. So, at worst, unnecessary action might be performed.
>>> +     */
>>> +    if ( (pte.p2m.type == p2m_ram_rw) && is_xen_heap_mfn(mfn) )
>>
>> I would use p2m_is_ram() to cover read-only mapping. I think it would 
>> also be better to use an ``else if`` so it is clear that this doesn't 
>> cover foreign mapping (it is possible to map xenheap page from another 
>> domain).
> 
> ok, will use p2m_is_ram() and ``else if`` construct, however I don't 
> entirely understand why we also want/need to include read-only pages (as 
> type is set to p2m_ram_rw in xenmem_add_to_physmap_one() for case 
> XENMAPSPACE_grant_table)?

Most of this code is already ready to be used by other xenheap pages
(see the other part of the discussion). So I would like to use p2m_is_ram(),
as it reduces the risk of us forgetting that read-only pages are not handled.
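
So the check would presumably become something like (sketch):

    else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
        page_set_frame_gfn(mfn_to_page(mfn), INVALID_GFN);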
  [...]

>>> + page_get_frame_gfn(pg_); \
>>> +})
>>>     #define gnttab_need_iommu_mapping(d)                    \
>>>       (is_domain_direct_mapped(d) && is_iommu_enabled(d))
>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>> index 7b5e7b7..a00c5f5 100644
>>> --- a/xen/include/asm-arm/mm.h
>>> +++ b/xen/include/asm-arm/mm.h
>>> @@ -98,9 +98,17 @@ struct page_info
>>>   #define PGT_writable_page PG_mask(1, 1)  /* has writable 
>>> mappings?         */
>>>   #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 
>>> 63.                 */
>>>   - /* Count of uses of this frame as its current type. */
>>> -#define PGT_count_width   PG_shift(2)
>>> -#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
>>> + /* 2-bit count of uses of this frame as its current type. */
>>> +#define PGT_count_mask    PG_mask(3, 3)
>>> +
>>> +/*
>>> + * Stored in bits [28:0] or [60:0] GFN if page is used for grant 
>>> table frame.
>>
>> I think this wording is conflicting with ...
>>
>>> + * This only valid for the xenheap pages.
>>
>> ... this because xen heap pages are used in other situations. But I
>> would prefer if the comment doesn't mention grant-table frame. This
>> would allow us to repurpose the field for other xenheap pages if needed.
>>
>> Typo: This *is* only valid
> 
> 
> ok, so how about to simply mention it's purpose as xenheap GFN here and 
> down this header?
> 
> For example,
> Stored in bits [28:0] or [60:0] GFN if page is xenheap page.
> 
> BTW, shall I rename the access helpers page_set(get)_frame_gfn() as 
> well? For me the frame is associated with grant-table.
> Something to: page_set(get)_xenheap_gfn() or even page_set(get)_gfn().

Naming them page_{set,get}_xenheap_gfn() sounds like a good idea.
It would make the purpose of the two helpers clearer.

>>> +#define arch_alloc_xenheap_page(p) page_set_frame_gfn(p, INVALID_GFN)
>>> +#define arch_free_xenheap_page(p) \
>>> +    BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))
>>
>> I believe this BUG_ON() could be triggered if gnttab_map_frame() 
>> succeeds but then we fail to insert the entry in the P2M. This means 
>> we would need to revert changes done in gnttab_map_frame() in case of 
>> failure.
>>
>> However, I am still a bit unease with the BUG_ON(). A domain will not 
>> necessarily remove the grant-table mapping from its P2M before 
>> shutting down. So you are relying on Xen to go through the P2M before 
>> we free the page.
>>
>> This is the case today, but I am not sure we would want to rely on it 
>> because it will be hard to remember this requirement if we decide to 
>> optimize p2m_relinquish().
>>
>> One possibility would be to add a comment in p2m_relinquish(). That's 
>> assuming there are no other places which could lead to false 
>> positively hit the BUG().
> 
> These make me think that it would be better (safer and simpler) to just 
> remove this BUG_ON() for now. Do you agree?
I would drop the BUG_ON() here.

Cheers,
Oleksandr Tyshchenko Dec. 1, 2021, 4:49 p.m. UTC | #13
On 01.12.21 18:32, Julien Grall wrote:
> Hi Oleksandr,


Hi Julien


>
> On 26/11/2021 13:51, Oleksandr wrote:
>> On 25.11.21 21:04, Julien Grall wrote:
>>>>   {
>>>> +    mfn_t mfn = lpae_get_mfn(pte);
>>>> +
>>>>       ASSERT(p2m_is_valid(pte));
>>>>         /*
>>>> @@ -731,11 +733,22 @@ static void p2m_put_l3_page(const lpae_t pte)
>>>>        */
>>>>       if ( p2m_is_foreign(pte.p2m.type) )
>>>>       {
>>>> -        mfn_t mfn = lpae_get_mfn(pte);
>>>> -
>>>>           ASSERT(mfn_valid(mfn));
>>>>           put_page(mfn_to_page(mfn));
>>>>       }
>>>> +
>>>> +#ifdef CONFIG_GRANT_TABLE
>>>> +    /*
>>>> +     * Check whether we deal with grant table page. As the grant 
>>>> table page
>>>> +     * is xen_heap page and its entry has known p2m type, detect 
>>>> it and mark
>>>> +     * the stored GFN as invalid. Although this check is not 
>>>> precise and we
>>>> +     * might end up updating this for other xen_heap pages, this 
>>>> action is
>>>> +     * harmless to these pages since only grant table pages have 
>>>> this field
>>>> +     * in use. So, at worst, unnecessary action might be performed.
>>>> +     */
>>>> +    if ( (pte.p2m.type == p2m_ram_rw) && is_xen_heap_mfn(mfn) )
>>>
>>> I would use p2m_is_ram() to cover read-only mapping. I think it 
>>> would also be better to use an ``else if`` so it is clear that this 
>>> doesn't cover foreign mapping (it is possible to map xenheap page 
>>> from another domain).
>>
>> ok, will use p2m_is_ram() and ``else if`` construct, however I don't 
>> entirely understand why we also want/need to include read-only pages 
>> (as type is set to p2m_ram_rw in xenmem_add_to_physmap_one() for case 
>> XENMAPSPACE_grant_table)?
>
> Most of this code is already ready to be used by other xenheap pages
> (see the other part of the discussion). So I would like to use p2m_is_ram(),
> as it reduces the risk of us forgetting that read-only pages are not handled.

ok, got it, thank you for the explanation.


>
>  [...]
>
>>>> + page_get_frame_gfn(pg_); \
>>>> +})
>>>>     #define gnttab_need_iommu_mapping(d) \
>>>>       (is_domain_direct_mapped(d) && is_iommu_enabled(d))
>>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>>> index 7b5e7b7..a00c5f5 100644
>>>> --- a/xen/include/asm-arm/mm.h
>>>> +++ b/xen/include/asm-arm/mm.h
>>>> @@ -98,9 +98,17 @@ struct page_info
>>>>   #define PGT_writable_page PG_mask(1, 1)  /* has writable 
>>>> mappings?         */
>>>>   #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 
>>>> 63.                 */
>>>>   - /* Count of uses of this frame as its current type. */
>>>> -#define PGT_count_width   PG_shift(2)
>>>> -#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
>>>> + /* 2-bit count of uses of this frame as its current type. */
>>>> +#define PGT_count_mask    PG_mask(3, 3)
>>>> +
>>>> +/*
>>>> + * Stored in bits [28:0] or [60:0] GFN if page is used for grant 
>>>> table frame.
>>>
>>> I think this wording is conflicting with ...
>>>
>>>> + * This only valid for the xenheap pages.
>>>
>>> ... this because xen heap pages are used in other situations. But I
>>> would prefer if the comment doesn't mention grant-table frame. This
>>> would allow us to repurpose the field for other xenheap pages if needed.
>>>
>>> Typo: This *is* only valid
>>
>>
>> ok, so how about to simply mention it's purpose as xenheap GFN here 
>> and down this header?
>>
>> For example,
>> Stored in bits [28:0] or [60:0] GFN if page is xenheap page.
>>
>> BTW, shall I rename the access helpers page_set(get)_frame_gfn() as 
>> well? For me the frame is associated with grant-table.
>> Something to: page_set(get)_xenheap_gfn() or even page_set(get)_gfn().
>
> Naming them page_{set,get}_xenheap_gfn() sounds like a good idea.
> It would make the purpose of the two helpers clearer.

agree, will rename
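
... i.e. (naming sketch only, the bodies staying as discussed above):

static inline gfn_t page_get_xenheap_gfn(const struct page_info *p);
static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn);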


>
>
>>>> +#define arch_alloc_xenheap_page(p) page_set_frame_gfn(p, INVALID_GFN)
>>>> +#define arch_free_xenheap_page(p) \
>>>> +    BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))
>>>
>>> I believe this BUG_ON() could be triggered if gnttab_map_frame() 
>>> succeeds but then we fail to insert the entry in the P2M. This means 
>>> we would need to revert changes done in gnttab_map_frame() in case 
>>> of failure.
>>>
>>> However, I am still a bit unease with the BUG_ON(). A domain will 
>>> not necessarily remove the grant-table mapping from its P2M before 
>>> shutting down. So you are relying on Xen to go through the P2M 
>>> before we free the page.
>>>
>>> This is the case today, but I am not sure we would want to rely on 
>>> it because it will be hard to remember this requirement if we decide 
>>> to optimize p2m_relinquish().
>>>
>>> One possibility would be to add a comment in p2m_relinquish(). 
>>> That's assuming there are no other places which could lead to false 
>>> positively hit the BUG().
>>
>> These make me think that it would be better (safer and simpler) to 
>> just remove this BUG_ON() for now. Do you agree?
> I would drop the BUG_ON() here.

ok, will drop. arch_free_xenheap_page() will then become dummy on both Arm
and x86; the question is whether we should retain it, and all its call sites,
in the free_xenheap_pages() variants?


>
>
> Cheers,
>
Oleksandr Tyshchenko Dec. 1, 2021, 6 p.m. UTC | #14
On 01.12.21 18:22, Julien Grall wrote:
>
>
> On 29/11/2021 15:58, Oleksandr wrote:
>>
>> Hi Julien
>
> Hi,


Hi Julien


>
>>
>>
>> [snip]
>>
>>
>>>
>>>>>
>>>>> ! Please note, there is still unresolved locking question here for 
>>>>> which
>>>>> I failed to find a suitable solution. So, it is still an RFC !
>>>>>
>>>>> According to the internal conversation:
>>>>> Now the GFN field in the struct page_info is accessed from
>>>>> gnttab_set_frame_gfn() in the grant table code and from 
>>>>> page_set_frame_gfn()
>>>>> in the P2M code (the former uses the latter).
>>>>>
>>>>> We need to prevent the concurrent access to this field. But, we 
>>>>> cannot grab
>>>>> the grant lock from the P2M code because we will introduce a lock 
>>>>> inversion.
>>>>> The page_set_frame_gfn() will be called from the P2M code with the 
>>>>> p2m lock held
>>>>> and then acquire the grant table lock. The gnttab_map_frame() will 
>>>>> do the inverse.
>>>>
>>>> This is a tricky one. I think, we will:
>>>>
>>>>   1) Need to use the P2M lock to protect the access to the GFN in 
>>>> the struct page_info *.
>>>>   2) Defer the call to page_set_frame_gfn() from gnttab_map_frame() 
>>>> to xenmem_add_to_physmap_one()
>>>>   3) In xenmem_add_to_physmap_one() hold the P2M lock while 
>>>> checking the page was not already mapped (e.g. page_get_frame_gfn() 
>>>> == INVALID_GFN) and do the mapping. Call page_set_frame_gfn() on 
>>>> success.
>>>>
>>>> This would still allow the guest to shot itself in the foot (e.g. 
>>>> potentially removing the wrong mapping) if it tries concurrent 
>>>> hypercall but I believe we would not introduce issue like XSA-380.
>>>>
>>>> At the end this would look quite similar to how x86 deal with the 
>>>> M2P update.
>>>
>>> Thank you for the suggestion, I need to analyze the code to better 
>>> understand your idea and technical possibility to implement it, I 
>>> will come up with questions if any.
>>
>> I experimented a bit... Could you please clarify, is the code snippet
>> below close to what you meant?
>
> It is close to what I had in mind.

Great, thank you for clarifying.


> A few comments below.
>
>>
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index b594db4..dba9258 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -1488,8 +1488,27 @@ int xenmem_add_to_physmap_one(
>>           return -ENOSYS;
>>       }
>>
>> -    /* Map at new location. */
>> -    rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>> +    if ( space != XENMAPSPACE_grant_table )
>
> I would consider using this approach for any xenheap pages.


Well, at first glance, I think we need to:

1. Drop the #ifdef CONFIG_GRANT_TABLE further down this function and in
p2m_put_l3_page()

2. Replace "space != XENMAPSPACE_grant_table" with
"!is_xen_heap_mfn(mfn)" above.

Is my understanding correct?
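
If so, the hunk above would presumably start like this (sketch):

    if ( !is_xen_heap_mfn(mfn) )
        /* Map at new location. */
        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
    else
    {
        /* ... the locked page_get/set_frame_gfn() sequence from the snippet ... */
    }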


>
>
>> +        /* Map at new location. */
>> +        rc = guest_physmap_add_entry(d, gfn, mfn, 0, t);
>> +    else
>> +    {
>> +#ifdef CONFIG_GRANT_TABLE
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        p2m_write_lock(p2m);
>> +        if ( gfn_eq(page_get_frame_gfn(page), INVALID_GFN) )
>
> I think we want to return an error if page_get_frame_gfn() returns a
> valid GFN.

ok, I assume -EBUSY would be appropriate
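
... i.e. something like this (sketch):

        p2m_write_lock(p2m);
        if ( gfn_eq(page_get_frame_gfn(page), INVALID_GFN) )
        {
            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
            if ( !rc )
                page_set_frame_gfn(page, gfn);
        }
        else
            rc = -EBUSY;
        p2m_write_unlock(p2m);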


>
>
>> +        {
>> +            rc = p2m_set_entry(p2m, gfn, 1, mfn, t, p2m->default_access);
>> +            if ( !rc )
>> +                page_set_frame_gfn(page, gfn);
>> +        }
>> +        p2m_write_unlock(p2m);
>> +#else
>> +        ASSERT_UNREACHABLE();
>> +        rc = -EINVAL;
>> +#endif
>> +    }
>>
>>       /*
>>        * For XENMAPSPACE_gmfn_foreign if we failed to add the mapping, we need
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index 59604b1..64e9e77 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4167,10 +4167,8 @@ int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
>>            * Make sure gnttab_unpopulate_status_frames() won't (successfully)
>>            * free the page until our caller has completed its operation.
>>            */
>> -        if ( get_page(mfn_to_page(*mfn), d) )
>> -            gnttab_set_frame_gfn(gt, status, idx, gfn);
>> -        else
>> -            rc = -EBUSY;
>> +        if ( !get_page(mfn_to_page(*mfn), d) )
>> +           rc = -EBUSY;
>>       }
>>
>>       grant_write_unlock(gt);
>>
>> If yes *and* I correctly understand the code, then it looks like
>> gnttab_set_frame_gfn becomes useless on Arm and can be dropped
>> globally (x86's variant is already dummy).
>
> It will soon no longer be a dummy version, see [1].

Indeed, I just remembered this patch, ok.


>
> Cheers,
>
> [1] 
> https://lore.kernel.org/xen-devel/8b73ff7c-4dd6-ff2e-14b9-088fdce0beb9@suse.com/
>
Stefano Stabellini Dec. 9, 2021, 9:11 p.m. UTC | #15
On Thu, 25 Nov 2021, Julien Grall wrote:
> For the record, I actually considered whether it is worth to fully implement
> an M2P on Arm. We technically have space in the struct page_info for that.
> However, I don't see it necessary in other places of Xen, so I would prefer to
> keep the space free for other purposes (or even be able to remove it).
> 
> @Stefano, what do you think?

Sorry for the late reply, I am still catching up after my holidays.

I agree with you here.
diff mbox series

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index eea926d..b1e42e5 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1376,14 +1376,18 @@  unsigned long domain_get_maximum_gpfn(struct domain *d)
 void share_xen_page_with_guest(struct page_info *page, struct domain *d,
                                enum XENSHARE_flags flags)
 {
+    unsigned long type_info;
+
     if ( page_get_owner(page) == d )
         return;
 
     spin_lock(&d->page_alloc_lock);
 
     /* The incremented type count pins as writable or read-only. */
-    page->u.inuse.type_info =
-        (flags == SHARE_ro ? PGT_none : PGT_writable_page) | 1;
+    type_info = page->u.inuse.type_info & ~(PGT_type_mask | PGT_count_mask);
+    page->u.inuse.type_info = type_info |
+        (flags == SHARE_ro ? PGT_none : PGT_writable_page) |
+        MASK_INSR(1, PGT_count_mask);
 
     page_set_owner(page, d);
     smp_wmb(); /* install valid domain ptr before updating refcnt. */
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 8b20b43..7a8d92d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -718,8 +718,10 @@  static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
  * TODO: Handle superpages, for now we only take special references for leaf
  * pages (specifically foreign ones, which can't be super mapped today).
  */
-static void p2m_put_l3_page(const lpae_t pte)
+static void p2m_put_l3_page(struct p2m_domain *p2m, const lpae_t pte)
 {
+    mfn_t mfn = lpae_get_mfn(pte);
+
     ASSERT(p2m_is_valid(pte));
 
     /*
@@ -731,11 +733,22 @@  static void p2m_put_l3_page(const lpae_t pte)
      */
     if ( p2m_is_foreign(pte.p2m.type) )
     {
-        mfn_t mfn = lpae_get_mfn(pte);
-
         ASSERT(mfn_valid(mfn));
         put_page(mfn_to_page(mfn));
     }
+
+#ifdef CONFIG_GRANT_TABLE
+    /*
+     * Check whether we deal with grant table page. As the grant table page
+     * is xen_heap page and its entry has known p2m type, detect it and mark
+     * the stored GFN as invalid. Although this check is not precise and we
+     * might end up updating this for other xen_heap pages, this action is
+     * harmless to these pages since only grant table pages have this field
+     * in use. So, at worst, unnecessary action might be performed.
+     */
+    if ( (pte.p2m.type == p2m_ram_rw) && is_xen_heap_mfn(mfn) )
+        page_set_frame_gfn(mfn_to_page(mfn), INVALID_GFN);
+#endif
 }
 
 /* Free lpae sub-tree behind an entry */
@@ -768,7 +781,7 @@  static void p2m_free_entry(struct p2m_domain *p2m,
         p2m->stats.mappings[level]--;
         /* Nothing to do if the entry is a super-page. */
         if ( level == 3 )
-            p2m_put_l3_page(entry);
+            p2m_put_l3_page(p2m, entry);
         return;
     }
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index fe1fc11..08fc827 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -93,8 +93,6 @@  struct grant_table {
 
     /* Domain to which this struct grant_table belongs. */
     const struct domain *domain;
-
-    struct grant_table_arch arch;
 };
 
 unsigned int __read_mostly opt_max_grant_frames = 64;
@@ -1981,14 +1979,9 @@  int grant_table_init(struct domain *d, int max_grant_frames,
 
     grant_write_lock(gt);
 
-    ret = gnttab_init_arch(gt);
-    if ( ret )
-        goto unlock;
-
     /* gnttab_grow_table() allocates a min number of frames, so 0 is okay. */
     ret = gnttab_grow_table(d, 0);
 
- unlock:
     grant_write_unlock(gt);
 
  out:
@@ -3894,8 +3887,6 @@  grant_table_destroy(
     if ( t == NULL )
         return;
 
-    gnttab_destroy_arch(t);
-
     for ( i = 0; i < nr_grant_frames(t); i++ )
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5801358..aafd847 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2161,6 +2161,7 @@  void init_xenheap_pages(paddr_t ps, paddr_t pe)
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
+    unsigned int i;
 
     ASSERT(!in_irq());
 
@@ -2169,6 +2170,9 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( unlikely(pg == NULL) )
         return NULL;
 
+    for ( i = 0; i < (1u << order); i++ )
+        arch_alloc_xenheap_page(&pg[i]);
+
     memguard_unguard_range(page_to_virt(pg), 1 << (order + PAGE_SHIFT));
 
     return page_to_virt(pg);
@@ -2177,14 +2181,22 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+    struct page_info *pg;
+    unsigned int i;
+
     ASSERT(!in_irq());
 
     if ( v == NULL )
         return;
 
+    pg = virt_to_page(v);
+
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order, false);
+    for ( i = 0; i < (1u << order); i++ )
+        arch_free_xenheap_page(&pg[i]);
+
+    free_heap_pages(pg, order, false);
 }
 
 #else  /* !CONFIG_SEPARATE_XENHEAP */
@@ -2220,7 +2232,10 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
         return NULL;
 
     for ( i = 0; i < (1u << order); i++ )
+    {
         pg[i].count_info |= PGC_xen_heap;
+        arch_alloc_xenheap_page(&pg[i]);
+    }
 
     return page_to_virt(pg);
 }
@@ -2238,7 +2253,10 @@  void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
+    {
         pg[i].count_info &= ~PGC_xen_heap;
+        arch_free_xenheap_page(&pg[i]);
+    }
 
     free_heap_pages(pg, order, true);
 }
diff --git a/xen/include/asm-arm/grant_table.h b/xen/include/asm-arm/grant_table.h
index 0ce77f9..479339d 100644
--- a/xen/include/asm-arm/grant_table.h
+++ b/xen/include/asm-arm/grant_table.h
@@ -11,11 +11,6 @@ 
 #define INITIAL_NR_GRANT_FRAMES 1U
 #define GNTTAB_MAX_VERSION 1
 
-struct grant_table_arch {
-    gfn_t *shared_gfn;
-    gfn_t *status_gfn;
-};
-
 static inline void gnttab_clear_flags(struct domain *d,
                                       unsigned int mask, uint16_t *addr)
 {
@@ -46,35 +41,11 @@  int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
 #define gnttab_dom0_frames()                                             \
     min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
 
-#define gnttab_init_arch(gt)                                             \
-({                                                                       \
-    unsigned int ngf_ = (gt)->max_grant_frames;                          \
-    unsigned int nsf_ = grant_to_status_frames(ngf_);                    \
-                                                                         \
-    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);                  \
-    (gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);                  \
-    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )                \
-    {                                                                    \
-        while ( ngf_-- )                                                 \
-            (gt)->arch.shared_gfn[ngf_] = INVALID_GFN;                   \
-        while ( nsf_-- )                                                 \
-            (gt)->arch.status_gfn[nsf_] = INVALID_GFN;                   \
-    }                                                                    \
-    else                                                                 \
-        gnttab_destroy_arch(gt);                                         \
-    (gt)->arch.shared_gfn ? 0 : -ENOMEM;                                 \
-})
-
-#define gnttab_destroy_arch(gt)                                          \
-    do {                                                                 \
-        XFREE((gt)->arch.shared_gfn);                                    \
-        XFREE((gt)->arch.status_gfn);                                    \
-    } while ( 0 )
-
 #define gnttab_set_frame_gfn(gt, st, idx, gfn)                           \
     do {                                                                 \
-        ((st) ? (gt)->arch.status_gfn : (gt)->arch.shared_gfn)[idx] =    \
-            (gfn);                                                       \
+        struct page_info *pg_ = (st) ? gnttab_status_page(gt, idx)       \
+                                     : gnttab_shared_page(gt, idx);      \
+        page_set_frame_gfn(pg_, gfn);                                    \
     } while ( 0 )
 
 #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
@@ -82,11 +53,25 @@  int replace_grant_host_mapping(unsigned long gpaddr, mfn_t mfn,
         : gnttab_shared_gfn(NULL, gt, idx);                              \
 })
 
-#define gnttab_shared_gfn(d, t, i)                                       \
-    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
+#define gnttab_shared_page(t, i) ({                                      \
+    ASSERT((t)->shared_raw[i]);                                          \
+    mfn_to_page(_mfn(__virt_to_mfn((t)->shared_raw[i])));                \
+})
+
+#define gnttab_status_page(t, i) ({                                      \
+    ASSERT((t)->status[i]);                                              \
+    mfn_to_page(_mfn(__virt_to_mfn((t)->status[i])));                    \
+})
 
-#define gnttab_status_gfn(d, t, i)                                       \
-    (((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])
+#define gnttab_shared_gfn(d, t, i) ({                                    \
+    struct page_info *pg_ = gnttab_shared_page(t, i);                    \
+    page_get_frame_gfn(pg_);                                             \
+})
+
+#define gnttab_status_gfn(d, t, i) ({                                    \
+    struct page_info *pg_ = gnttab_status_page(t, i);                    \
+    page_get_frame_gfn(pg_);                                             \
+})
 
 #define gnttab_need_iommu_mapping(d)                    \
     (is_domain_direct_mapped(d) && is_iommu_enabled(d))
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 7b5e7b7..a00c5f5 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -98,9 +98,17 @@  struct page_info
 #define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
 #define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
 
- /* Count of uses of this frame as its current type. */
-#define PGT_count_width   PG_shift(2)
-#define PGT_count_mask    ((1UL<<PGT_count_width)-1)
+ /* 2-bit count of uses of this frame as its current type. */
+#define PGT_count_mask    PG_mask(3, 3)
+
+/*
+ * Stored in bits [28:0] or [60:0] GFN if page is used for grant table frame.
+ * This only valid for the xenheap pages.
+ */
+#define PGT_gfn_width     PG_shift(3)
+#define PGT_gfn_mask      ((1UL<<PGT_gfn_width)-1)
+
+#define PGT_INVALID_FRAME_GFN   _gfn(PGT_gfn_mask)
 
  /* Cleared when the owning guest 'frees' this page. */
 #define _PGC_allocated    PG_shift(1)
@@ -166,6 +174,27 @@  extern unsigned long xenheap_base_pdx;
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
+#define page_get_frame_gfn(p) ({                                \
+    gfn_t gfn_ = _gfn((p)->u.inuse.type_info & PGT_gfn_mask);   \
+    gfn_eq(gfn_, PGT_INVALID_FRAME_GFN) ? INVALID_GFN : gfn_;   \
+})
+
+#define page_set_frame_gfn(p, gfn) ({                           \
+    gfn_t gfn_ = gfn_eq(gfn, INVALID_GFN) ?                     \
+                 PGT_INVALID_FRAME_GFN : gfn;                   \
+    (p)->u.inuse.type_info &= ~PGT_gfn_mask;                    \
+    (p)->u.inuse.type_info |= gfn_x(gfn_);                      \
+})
+
+/*
+ * As the struct page_info representing the xen_heap page can contain
+ * the grant table frame GFN on Arm we need to clear it beforehand and
+ * make sure it is not still set when freeing a page.
+ */
+#define arch_alloc_xenheap_page(p)   page_set_frame_gfn(p, INVALID_GFN)
+#define arch_free_xenheap_page(p) \
+    BUG_ON(!gfn_eq(page_get_frame_gfn(p), INVALID_GFN))
+
 #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index 84e3296..0eb018f 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -14,9 +14,6 @@ 
 
 #define INITIAL_NR_GRANT_FRAMES 1U
 
-struct grant_table_arch {
-};
-
 static inline int create_grant_host_mapping(uint64_t addr, mfn_t frame,
                                             unsigned int flags,
                                             unsigned int cache_flags)
@@ -35,8 +32,6 @@  static inline int replace_grant_host_mapping(uint64_t addr, mfn_t frame,
     return replace_grant_pv_mapping(addr, frame, new_addr, flags);
 }
 
-#define gnttab_init_arch(gt) 0
-#define gnttab_destroy_arch(gt) do {} while ( 0 )
 #define gnttab_set_frame_gfn(gt, st, idx, gfn) do {} while ( 0 )
 #define gnttab_get_frame_gfn(gt, st, idx) ({                             \
     mfn_t mfn_ = (st) ? gnttab_status_mfn(gt, idx)                       \
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index cb90527..04d8704 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -327,6 +327,10 @@  struct page_info
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma))))
 
+/* No arch-specific actions are needed for the xen_heap page */
+#define arch_alloc_xenheap_page(p)   do {} while ( 0 )
+#define arch_free_xenheap_page(p)    do {} while ( 0 )
+
 #define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 extern unsigned long max_page;
 extern unsigned long total_pages;