Message ID | fd5d98198d9539b232a570a83e7a24be2407e739.1590750232.git.hongyxia@amazon.com (mailing list archive)
---|---
State | Superseded
Series | switch to domheap for Xen page tables
On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
>
> Rewrite those functions to use the new APIs. Modify its callers to unmap
> the pointer returned. Since alloc_xen_pagetable_new() is almost never
> useful unless accompanied by page clearing and a mapping, introduce a
> helper alloc_map_clear_xen_pt() for this sequence.
>
> Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn() to
> unmap the page, which requires domain_page.h header in vmap.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two further small adjustments:

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
>          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
>  }
>
> +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> +{
> +    mfn_t mfn = alloc_xen_pagetable_new();
> +    void *ret;
> +
> +    if ( mfn_eq(mfn, INVALID_MFN) )
> +        return NULL;
> +
> +    if ( pmfn )
> +        *pmfn = mfn;
> +    ret = map_domain_page(mfn);
> +    clear_page(ret);
> +
> +    return ret;
> +}
> +
>  static DEFINE_SPINLOCK(map_pgdir_lock);
>
> +/*
> + * For virt_to_xen_lXe() functions, they take a virtual address and return a
> + * pointer to Xen's LX entry. Caller needs to unmap the pointer.
> + */
>  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)

May I suggest s/virtual/linear/ to at least make the new comment
correct?

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> +
> +#define vmap_to_mfn(va) ({                                                  \
> +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));       \
> +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                       \
> +    unmap_domain_page(pl1e_);                                               \
> +    mfn_; })

Just like is already the case in domain_page_map_to_mfn() I think
you want to add "BUG_ON(!pl1e)" here to limit the impact of any
problem to DoS (rather than a possible privilege escalation).

Or actually, considering the only case where virt_to_xen_l1e()
would return NULL, returning INVALID_MFN here would likely be
even more robust. There looks to be just a single caller, which
would need adjusting to cope with an error coming back. In fact -
it already ASSERT()s, despite NULL right now never coming back
from vmap_to_page(). I think the loop there would better be

    for ( i = 0; i < pages; i++ )
    {
        struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);

        if ( page )
            page_list_add(page, &pg_list);
        else
            printk_once(...);
    }

Thoughts?

Jan
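(For illustration only, not something posted in this thread: a vmap_to_mfn()
variant along the lines Jan suggests, degrading to INVALID_MFN instead of
dereferencing a possibly NULL pointer, could look roughly like this.)

    #define vmap_to_mfn(va) ({                                                \
        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));     \
        mfn_t mfn_ = INVALID_MFN;                                             \
                                                                              \
        if ( pl1e_ )          /* no L1 entry: report a hole, don't crash */   \
        {                                                                     \
            mfn_ = l1e_get_mfn(*pl1e_);                                       \
            unmap_domain_page(pl1e_);                                         \
        }                                                                     \
        mfn_; })

(The single vmap_to_page() user would then need the NULL-tolerant loop Jan
sketches above.)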
On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
> On 29.05.2020 13:11, Hongyan Xia wrote:
> > From: Wei Liu <wei.liu2@citrix.com>
> >
> > Rewrite those functions to use the new APIs. Modify its callers to
> > unmap the pointer returned. Since alloc_xen_pagetable_new() is almost
> > never useful unless accompanied by page clearing and a mapping,
> > introduce a helper alloc_map_clear_xen_pt() for this sequence.
> >
> > Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn()
> > to unmap the page, which requires domain_page.h header in vmap.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two further small adjustments:
>
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
> >          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
> >  }
> >
> > +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> > +{
> > +    mfn_t mfn = alloc_xen_pagetable_new();
> > +    void *ret;
> > +
> > +    if ( mfn_eq(mfn, INVALID_MFN) )
> > +        return NULL;
> > +
> > +    if ( pmfn )
> > +        *pmfn = mfn;
> > +    ret = map_domain_page(mfn);
> > +    clear_page(ret);
> > +
> > +    return ret;
> > +}
> > +
> >  static DEFINE_SPINLOCK(map_pgdir_lock);
> >
> > +/*
> > + * For virt_to_xen_lXe() functions, they take a virtual address and
> > + * return a pointer to Xen's LX entry. Caller needs to unmap the
> > + * pointer.
> > + */
> >  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
>
> May I suggest s/virtual/linear/ to at least make the new comment
> correct?
>
> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
> >  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
> >  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
> >  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> > -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> > +
> > +#define vmap_to_mfn(va) ({                                              \
> > +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> > +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
> > +    unmap_domain_page(pl1e_);                                           \
> > +    mfn_; })
>
> Just like is already the case in domain_page_map_to_mfn() I think
> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
> problem to DoS (rather than a possible privilege escalation).
>
> Or actually, considering the only case where virt_to_xen_l1e()
> would return NULL, returning INVALID_MFN here would likely be
> even more robust. There looks to be just a single caller, which
> would need adjusting to cope with an error coming back. In fact -
> it already ASSERT()s, despite NULL right now never coming back
> from vmap_to_page(). I think the loop there would better be
>
>     for ( i = 0; i < pages; i++ )
>     {
>         struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
>
>         if ( page )
>             page_list_add(page, &pg_list);
>         else
>             printk_once(...);
>     }
>
> Thoughts?

To be honest, I think the current implementation of vmap_to_mfn() is
just incorrect. There is simply no guarantee that a vmap is mapped with
small pages, so IMO we just cannot do virt_to_xen_l1e() here. The
correct way is to have a generic page table walking function which
walks from the base and can stop at any level, and properly return code
to indicate level or any error.

I am inclined to BUG_ON() here, and upstream a proper fix later to
vmap_to_mfn() as an individual patch.

Am I missing anything here?

Hongyan
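(The generic lookup Hongyan describes is not part of this series. Purely as an
illustration, with a hypothetical name, a walk over the Xen page tables that
tolerates 1G/2M superpage mappings might look roughly like this, returning
INVALID_MFN when it hits a hole.)

    static mfn_t xen_virt_to_mfn(unsigned long va)
    {
        l4_pgentry_t l4e = idle_pg_table[l4_table_offset(va)];
        l3_pgentry_t *l3t, l3e;
        l2_pgentry_t *l2t, l2e;
        l1_pgentry_t *l1t, l1e;

        if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
            return INVALID_MFN;

        l3t = map_l3t_from_l4e(l4e);
        l3e = l3t[l3_table_offset(va)];
        unmap_domain_page(l3t);
        if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
            return INVALID_MFN;
        if ( l3e_get_flags(l3e) & _PAGE_PSE )      /* 1G mapping */
            return mfn_add(l3e_get_mfn(l3e),
                           (va & ((1UL << L3_PAGETABLE_SHIFT) - 1)) >> PAGE_SHIFT);

        l2t = map_l2t_from_l3e(l3e);
        l2e = l2t[l2_table_offset(va)];
        unmap_domain_page(l2t);
        if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
            return INVALID_MFN;
        if ( l2e_get_flags(l2e) & _PAGE_PSE )      /* 2M mapping */
            return mfn_add(l2e_get_mfn(l2e),
                           (va & ((1UL << L2_PAGETABLE_SHIFT) - 1)) >> PAGE_SHIFT);

        l1t = map_l1t_from_l2e(l2e);
        l1e = l1t[l1_table_offset(va)];
        unmap_domain_page(l1t);

        return (l1e_get_flags(l1e) & _PAGE_PRESENT) ? l1e_get_mfn(l1e)
                                                    : INVALID_MFN;
    }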
On 27.07.2020 11:09, Hongyan Xia wrote:
> On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
>> On 29.05.2020 13:11, Hongyan Xia wrote:
>>> --- a/xen/include/asm-x86/page.h
>>> +++ b/xen/include/asm-x86/page.h
>>> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>>>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>>>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>>>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
>>> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
>>> +
>>> +#define vmap_to_mfn(va) ({                                              \
>>> +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
>>> +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
>>> +    unmap_domain_page(pl1e_);                                           \
>>> +    mfn_; })
>>
>> Just like is already the case in domain_page_map_to_mfn() I think
>> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
>> problem to DoS (rather than a possible privilege escalation).
>>
>> Or actually, considering the only case where virt_to_xen_l1e()
>> would return NULL, returning INVALID_MFN here would likely be
>> even more robust. There looks to be just a single caller, which
>> would need adjusting to cope with an error coming back. In fact -
>> it already ASSERT()s, despite NULL right now never coming back
>> from vmap_to_page(). I think the loop there would better be
>>
>>     for ( i = 0; i < pages; i++ )
>>     {
>>         struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
>>
>>         if ( page )
>>             page_list_add(page, &pg_list);
>>         else
>>             printk_once(...);
>>     }
>>
>> Thoughts?
>
> To be honest, I think the current implementation of vmap_to_mfn() is
> just incorrect. There is simply no guarantee that a vmap is mapped with
> small pages, so IMO we just cannot do virt_to_xen_l1e() here. The
> correct way is to have a generic page table walking function which
> walks from the base and can stop at any level, and properly return code
> to indicate level or any error.
>
> I am inclined to BUG_ON() here, and upstream a proper fix later to
> vmap_to_mfn() as an individual patch.

Well, yes, in principle large pages can result from e.g. vmalloc()ing
a large enough area. However, rather than thinking of a generic
walking function as a solution, how about the simple one for the
immediate needs: Add MAP_SMALL_PAGES?

Also, as a general remark: When you disagree with review feedback, I
think it would be quite reasonable to wait with sending the next
version until the disagreement gets resolved, unless this is taking
unduly long delays.

Jan
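(Again purely illustrative, with a made-up helper name: MAP_SMALL_PAGES is an
existing x86 flag which map_pages_to_xen() honours by refusing superpage
mappings, so a vmap region created with it could safely keep the L1-based
vmap_to_mfn(). Wiring the flag through the vm_alloc()/vmalloc() paths is the
part this sketch leaves out.)

    void *vmap_small_pages(const mfn_t *mfn, unsigned int nr)
    {
        /* Same as vmap(), but force 4k mappings so an L1 entry always exists. */
        return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR | MAP_SMALL_PAGES,
                      VMAP_DEFAULT);
    }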
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index b03728e18e..dc8627c1b5 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -333,21 +333,24 @@ void unmap_domain_page_global(const void *ptr)
 mfn_t domain_page_map_to_mfn(const void *ptr)
 {
     unsigned long va = (unsigned long)ptr;
-    const l1_pgentry_t *pl1e;
+    l1_pgentry_t l1e;
 
     if ( va >= DIRECTMAP_VIRT_START )
         return _mfn(virt_to_mfn(ptr));
 
     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
     {
-        pl1e = virt_to_xen_l1e(va);
+        const l1_pgentry_t *pl1e = virt_to_xen_l1e(va);
+
         BUG_ON(!pl1e);
+        l1e = *pl1e;
+        unmap_domain_page(pl1e);
     }
     else
     {
         ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
-        pl1e = &__linear_l1_table[l1_linear_offset(va)];
+        l1e = __linear_l1_table[l1_linear_offset(va)];
     }
 
-    return l1e_get_mfn(*pl1e);
+    return l1e_get_mfn(l1e);
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 462682ba70..b67ecf4107 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
         free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
 }
 
+void *alloc_map_clear_xen_pt(mfn_t *pmfn)
+{
+    mfn_t mfn = alloc_xen_pagetable_new();
+    void *ret;
+
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return NULL;
+
+    if ( pmfn )
+        *pmfn = mfn;
+    ret = map_domain_page(mfn);
+    clear_page(ret);
+
+    return ret;
+}
+
 static DEFINE_SPINLOCK(map_pgdir_lock);
 
+/*
+ * For virt_to_xen_lXe() functions, they take a virtual address and return a
+ * pointer to Xen's LX entry. Caller needs to unmap the pointer.
+ */
 static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
 {
     l4_pgentry_t *pl4e;
@@ -4958,33 +4978,33 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
     if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l3_pgentry_t *l3t = alloc_xen_pagetable();
+        mfn_t l3mfn;
+        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn);
 
         if ( !l3t )
             return NULL;
-        clear_page(l3t);
+        UNMAP_DOMAIN_PAGE(l3t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
-            l4_pgentry_t l4e = l4e_from_paddr(__pa(l3t), __PAGE_HYPERVISOR);
+            l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
 
             l4e_write(pl4e, l4e);
             efi_update_l4_pgtable(l4_table_offset(v), l4e);
-            l3t = NULL;
+            l3mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l3t )
-            free_xen_pagetable(l3t);
+        free_xen_pagetable_new(l3mfn);
     }
 
-    return l4e_to_l3e(*pl4e) + l3_table_offset(v);
+    return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
 }
 
 static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
 {
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e, l3e;
 
     pl3e = virt_to_xen_l3e(v);
     if ( !pl3e )
@@ -4993,31 +5013,37 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l2_pgentry_t *l2t = alloc_xen_pagetable();
+        mfn_t l2mfn;
+        l2_pgentry_t *l2t = alloc_map_clear_xen_pt(&l2mfn);
 
         if ( !l2t )
+        {
+            unmap_domain_page(pl3e);
             return NULL;
-        clear_page(l2t);
+        }
+        UNMAP_DOMAIN_PAGE(l2t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
-            l2t = NULL;
+            l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+            l2mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l2t )
-            free_xen_pagetable(l2t);
+        free_xen_pagetable_new(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-    return l3e_to_l2e(*pl3e) + l2_table_offset(v);
+    l3e = *pl3e;
+    unmap_domain_page(pl3e);
+
+    return map_l2t_from_l3e(l3e) + l2_table_offset(v);
 }
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
 {
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e, l2e;
 
     pl2e = virt_to_xen_l2e(v);
     if ( !pl2e )
@@ -5026,26 +5052,32 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l1_pgentry_t *l1t = alloc_xen_pagetable();
+        mfn_t l1mfn;
+        l1_pgentry_t *l1t = alloc_map_clear_xen_pt(&l1mfn);
 
         if ( !l1t )
+        {
+            unmap_domain_page(pl2e);
             return NULL;
-        clear_page(l1t);
+        }
+        UNMAP_DOMAIN_PAGE(l1t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l2e_write(pl2e, l2e_from_paddr(__pa(l1t), __PAGE_HYPERVISOR));
-            l1t = NULL;
+            l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
+            l1mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l1t )
-            free_xen_pagetable(l1t);
+        free_xen_pagetable_new(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-    return l2e_to_l1e(*pl2e) + l1_table_offset(v);
+    l2e = *pl2e;
+    unmap_domain_page(pl2e);
+
+    return map_l1t_from_l2e(l2e) + l1_table_offset(v);
 }
 
 /* Convert to from superpage-mapping flags for map_pages_to_xen(). */
@@ -5068,8 +5100,8 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e, ol3e;
-    l2_pgentry_t *pl2e, ol2e;
+    l3_pgentry_t *pl3e = NULL, ol3e;
+    l2_pgentry_t *pl2e = NULL, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int i;
     int rc = -ENOMEM;
@@ -5090,6 +5122,10 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+        UNMAP_DOMAIN_PAGE(pl2e);
+
         pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
@@ -5258,6 +5294,8 @@ int map_pages_to_xen(
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
                     goto out;
+
+                UNMAP_DOMAIN_PAGE(pl1e);
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5434,6 +5472,8 @@ int map_pages_to_xen(
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
@@ -5457,7 +5497,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e = NULL;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int i;
@@ -5473,6 +5513,9 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+
         pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
@@ -5701,6 +5744,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl3e);
     return rc;
 }
 
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..9964ab2096 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -1,6 +1,7 @@
 #ifdef VMAP_VIRT_START
 #include <xen/bitmap.h>
 #include <xen/cache.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3d3f9d49ac..42d1a78731 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -587,6 +587,7 @@ void *alloc_xen_pagetable(void);
 void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
+void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
 
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 5acf3d3d5a..3854feb3ea 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
 #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
 #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
 #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
-#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
+
+#define vmap_to_mfn(va) ({                                                  \
+    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));       \
+    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                       \
+    unmap_domain_page(pl1e_);                                               \
+    mfn_; })
+
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 #endif /* !defined(__ASSEMBLY__) */
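(A distilled usage sketch of the new helper, with illustrative function and
variable names; the real callers are the virt_to_xen_lXe() hunks above:
allocate, map and zero in one call, then drop the mapping once the MFN has
been installed.)

    static int install_new_l3(l4_pgentry_t *pl4e)
    {
        mfn_t l3mfn;
        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn); /* alloc + map + zero */

        if ( !l3t )
            return -ENOMEM;

        UNMAP_DOMAIN_PAGE(l3t);          /* mapping not needed once the page is zeroed */
        l4e_write(pl4e, l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));

        return 0;
    }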