Message ID | fd5d98198d9539b232a570a83e7a24be2407e739.1590750232.git.hongyxia@amazon.com (mailing list archive)
---|---
State | Superseded
Series | switch to domheap for Xen page tables
On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
>
> Rewrite those functions to use the new APIs. Modify its callers to unmap
> the pointer returned. Since alloc_xen_pagetable_new() is almost never
> useful unless accompanied by page clearing and a mapping, introduce a
> helper alloc_map_clear_xen_pt() for this sequence.
>
> Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn() to
> unmap the page, which requires domain_page.h header in vmap.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two further small adjustments:

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
>          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
>  }
>
> +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> +{
> +    mfn_t mfn = alloc_xen_pagetable_new();
> +    void *ret;
> +
> +    if ( mfn_eq(mfn, INVALID_MFN) )
> +        return NULL;
> +
> +    if ( pmfn )
> +        *pmfn = mfn;
> +    ret = map_domain_page(mfn);
> +    clear_page(ret);
> +
> +    return ret;
> +}
> +
>  static DEFINE_SPINLOCK(map_pgdir_lock);
>
> +/*
> + * For virt_to_xen_lXe() functions, they take a virtual address and return a
> + * pointer to Xen's LX entry. Caller needs to unmap the pointer.
> + */
>  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)

May I suggest s/virtual/linear/ to at least make the new comment
correct?

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> +
> +#define vmap_to_mfn(va) ({                                                  \
> +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));       \
> +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                       \
> +    unmap_domain_page(pl1e_);                                               \
> +    mfn_; })

Just like is already the case in domain_page_map_to_mfn() I think
you want to add "BUG_ON(!pl1e)" here to limit the impact of any
problem to DoS (rather than a possible privilege escalation).

Or actually, considering the only case where virt_to_xen_l1e()
would return NULL, returning INVALID_MFN here would likely be
even more robust. There looks to be just a single caller, which
would need adjusting to cope with an error coming back. In fact -
it already ASSERT()s, despite NULL right now never coming back
from vmap_to_page(). I think the loop there would better be

    for ( i = 0; i < pages; i++ )
    {
        struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);

        if ( page )
            page_list_add(page, &pg_list);
        else
            printk_once(...);
    }

Thoughts?

Jan
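(For illustration only, not something posted in this thread: a vmap_to_mfn()
variant along the lines Jan suggests, degrading to INVALID_MFN instead of
dereferencing a possibly NULL pointer, could look roughly like this.)

    #define vmap_to_mfn(va) ({                                                \
        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));     \
        mfn_t mfn_ = INVALID_MFN;                                             \
                                                                              \
        if ( pl1e_ )          /* no L1 entry: report a hole, don't crash */   \
        {                                                                     \
            mfn_ = l1e_get_mfn(*pl1e_);                                       \
            unmap_domain_page(pl1e_);                                         \
        }                                                                     \
        mfn_; })

(The single vmap_to_page() user would then need the NULL-tolerant loop Jan
sketches above.)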
On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
> On 29.05.2020 13:11, Hongyan Xia wrote:
> > From: Wei Liu <wei.liu2@citrix.com>
> >
> > Rewrite those functions to use the new APIs. Modify its callers to
> > unmap the pointer returned. Since alloc_xen_pagetable_new() is almost
> > never useful unless accompanied by page clearing and a mapping,
> > introduce a helper alloc_map_clear_xen_pt() for this sequence.
> >
> > Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn()
> > to unmap the page, which requires domain_page.h header in vmap.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two further small adjustments:
>
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
> >          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
> >  }
> >
> > +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> > +{
> > +    mfn_t mfn = alloc_xen_pagetable_new();
> > +    void *ret;
> > +
> > +    if ( mfn_eq(mfn, INVALID_MFN) )
> > +        return NULL;
> > +
> > +    if ( pmfn )
> > +        *pmfn = mfn;
> > +    ret = map_domain_page(mfn);
> > +    clear_page(ret);
> > +
> > +    return ret;
> > +}
> > +
> >  static DEFINE_SPINLOCK(map_pgdir_lock);
> >
> > +/*
> > + * For virt_to_xen_lXe() functions, they take a virtual address and
> > + * return a pointer to Xen's LX entry. Caller needs to unmap the
> > + * pointer.
> > + */
> >  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
>
> May I suggest s/virtual/linear/ to at least make the new comment
> correct?
>
> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
> >  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
> >  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
> >  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> > -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> > +
> > +#define vmap_to_mfn(va) ({                                              \
> > +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> > +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
> > +    unmap_domain_page(pl1e_);                                           \
> > +    mfn_; })
>
> Just like is already the case in domain_page_map_to_mfn() I think
> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
> problem to DoS (rather than a possible privilege escalation).
>
> Or actually, considering the only case where virt_to_xen_l1e()
> would return NULL, returning INVALID_MFN here would likely be
> even more robust. There looks to be just a single caller, which
> would need adjusting to cope with an error coming back. In fact -
> it already ASSERT()s, despite NULL right now never coming back
> from vmap_to_page(). I think the loop there would better be
>
>     for ( i = 0; i < pages; i++ )
>     {
>         struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
>
>         if ( page )
>             page_list_add(page, &pg_list);
>         else
>             printk_once(...);
>     }
>
> Thoughts?

To be honest, I think the current implementation of vmap_to_mfn() is
just incorrect. There is simply no guarantee that a vmap is mapped with
small pages, so IMO we just cannot do virt_to_xen_l1e() here. The
correct way is to have a generic page table walking function which
walks from the base and can stop at any level, and properly return code
to indicate level or any error.

I am inclined to BUG_ON() here, and upstream a proper fix later to
vmap_to_mfn() as an individual patch.

Am I missing anything here?

Hongyan
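(The generic lookup Hongyan describes is not part of this series. Purely as an
illustration, with a hypothetical name, a walk over the Xen page tables that
tolerates 1G/2M superpage mappings might look roughly like this, returning
INVALID_MFN when it hits a hole.)

    static mfn_t xen_virt_to_mfn(unsigned long va)
    {
        l4_pgentry_t l4e = idle_pg_table[l4_table_offset(va)];
        l3_pgentry_t *l3t, l3e;
        l2_pgentry_t *l2t, l2e;
        l1_pgentry_t *l1t, l1e;

        if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
            return INVALID_MFN;

        l3t = map_l3t_from_l4e(l4e);
        l3e = l3t[l3_table_offset(va)];
        unmap_domain_page(l3t);
        if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
            return INVALID_MFN;
        if ( l3e_get_flags(l3e) & _PAGE_PSE )      /* 1G mapping */
            return mfn_add(l3e_get_mfn(l3e),
                           (va & ((1UL << L3_PAGETABLE_SHIFT) - 1)) >> PAGE_SHIFT);

        l2t = map_l2t_from_l3e(l3e);
        l2e = l2t[l2_table_offset(va)];
        unmap_domain_page(l2t);
        if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
            return INVALID_MFN;
        if ( l2e_get_flags(l2e) & _PAGE_PSE )      /* 2M mapping */
            return mfn_add(l2e_get_mfn(l2e),
                           (va & ((1UL << L2_PAGETABLE_SHIFT) - 1)) >> PAGE_SHIFT);

        l1t = map_l1t_from_l2e(l2e);
        l1e = l1t[l1_table_offset(va)];
        unmap_domain_page(l1t);

        return (l1e_get_flags(l1e) & _PAGE_PRESENT) ? l1e_get_mfn(l1e)
                                                    : INVALID_MFN;
    }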
On 27.07.2020 11:09, Hongyan Xia wrote:
> On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
>> On 29.05.2020 13:11, Hongyan Xia wrote:
>>> --- a/xen/include/asm-x86/page.h
>>> +++ b/xen/include/asm-x86/page.h
>>> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>>>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>>>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>>>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
>>> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
>>> +
>>> +#define vmap_to_mfn(va) ({                                              \
>>> +    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
>>> +    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
>>> +    unmap_domain_page(pl1e_);                                           \
>>> +    mfn_; })
>>
>> Just like is already the case in domain_page_map_to_mfn() I think
>> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
>> problem to DoS (rather than a possible privilege escalation).
>>
>> Or actually, considering the only case where virt_to_xen_l1e()
>> would return NULL, returning INVALID_MFN here would likely be
>> even more robust. There looks to be just a single caller, which
>> would need adjusting to cope with an error coming back. In fact -
>> it already ASSERT()s, despite NULL right now never coming back
>> from vmap_to_page(). I think the loop there would better be
>>
>>     for ( i = 0; i < pages; i++ )
>>     {
>>         struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
>>
>>         if ( page )
>>             page_list_add(page, &pg_list);
>>         else
>>             printk_once(...);
>>     }
>>
>> Thoughts?
>
> To be honest, I think the current implementation of vmap_to_mfn() is
> just incorrect. There is simply no guarantee that a vmap is mapped with
> small pages, so IMO we just cannot do virt_to_xen_l1e() here. The
> correct way is to have a generic page table walking function which
> walks from the base and can stop at any level, and properly return code
> to indicate level or any error.
>
> I am inclined to BUG_ON() here, and upstream a proper fix later to
> vmap_to_mfn() as an individual patch.

Well, yes, in principle large pages can result from e.g. vmalloc()ing
a large enough area. However, rather than thinking of a generic
walking function as a solution, how about the simple one for the
immediate needs: Add MAP_SMALL_PAGES?

Also, as a general remark: When you disagree with review feedback, I
think it would be quite reasonable to wait with sending the next
version until the disagreement gets resolved, unless this is taking
unduly long delays.

Jan
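(Again purely illustrative, with a made-up helper name: MAP_SMALL_PAGES is an
existing x86 flag which map_pages_to_xen() honours by refusing superpage
mappings, so a vmap region created with it could safely keep the L1-based
vmap_to_mfn(). Wiring the flag through the vm_alloc()/vmalloc() paths is the
part this sketch leaves out.)

    void *vmap_small_pages(const mfn_t *mfn, unsigned int nr)
    {
        /* Same as vmap(), but force 4k mappings so an L1 entry always exists. */
        return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR | MAP_SMALL_PAGES,
                      VMAP_DEFAULT);
    }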
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index b03728e18e..dc8627c1b5 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -333,21 +333,24 @@ void unmap_domain_page_global(const void *ptr)
 mfn_t domain_page_map_to_mfn(const void *ptr)
 {
     unsigned long va = (unsigned long)ptr;
-    const l1_pgentry_t *pl1e;
+    l1_pgentry_t l1e;
 
     if ( va >= DIRECTMAP_VIRT_START )
         return _mfn(virt_to_mfn(ptr));
 
     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
     {
-        pl1e = virt_to_xen_l1e(va);
+        const l1_pgentry_t *pl1e = virt_to_xen_l1e(va);
+
         BUG_ON(!pl1e);
+        l1e = *pl1e;
+        unmap_domain_page(pl1e);
     }
     else
     {
         ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
-        pl1e = &__linear_l1_table[l1_linear_offset(va)];
+        l1e = __linear_l1_table[l1_linear_offset(va)];
     }
 
-    return l1e_get_mfn(*pl1e);
+    return l1e_get_mfn(l1e);
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 462682ba70..b67ecf4107 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
         free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
 }
 
+void *alloc_map_clear_xen_pt(mfn_t *pmfn)
+{
+    mfn_t mfn = alloc_xen_pagetable_new();
+    void *ret;
+
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return NULL;
+
+    if ( pmfn )
+        *pmfn = mfn;
+    ret = map_domain_page(mfn);
+    clear_page(ret);
+
+    return ret;
+}
+
 static DEFINE_SPINLOCK(map_pgdir_lock);
 
+/*
+ * For virt_to_xen_lXe() functions, they take a virtual address and return a
+ * pointer to Xen's LX entry. Caller needs to unmap the pointer.
+ */
 static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
 {
     l4_pgentry_t *pl4e;
@@ -4958,33 +4978,33 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
     if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l3_pgentry_t *l3t = alloc_xen_pagetable();
+        mfn_t l3mfn;
+        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn);
 
         if ( !l3t )
             return NULL;
-        clear_page(l3t);
+        UNMAP_DOMAIN_PAGE(l3t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
-            l4_pgentry_t l4e = l4e_from_paddr(__pa(l3t), __PAGE_HYPERVISOR);
+            l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
 
             l4e_write(pl4e, l4e);
             efi_update_l4_pgtable(l4_table_offset(v), l4e);
-            l3t = NULL;
+            l3mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l3t )
-            free_xen_pagetable(l3t);
+        free_xen_pagetable_new(l3mfn);
     }
 
-    return l4e_to_l3e(*pl4e) + l3_table_offset(v);
+    return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
 }
 
 static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
 {
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e, l3e;
 
     pl3e = virt_to_xen_l3e(v);
     if ( !pl3e )
@@ -4993,31 +5013,37 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l2_pgentry_t *l2t = alloc_xen_pagetable();
+        mfn_t l2mfn;
+        l2_pgentry_t *l2t = alloc_map_clear_xen_pt(&l2mfn);
 
         if ( !l2t )
+        {
+            unmap_domain_page(pl3e);
             return NULL;
-        clear_page(l2t);
+        }
+        UNMAP_DOMAIN_PAGE(l2t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
-            l2t = NULL;
+            l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+            l2mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l2t )
-            free_xen_pagetable(l2t);
+        free_xen_pagetable_new(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-    return l3e_to_l2e(*pl3e) + l2_table_offset(v);
+    l3e = *pl3e;
+    unmap_domain_page(pl3e);
+
+    return map_l2t_from_l3e(l3e) + l2_table_offset(v);
 }
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
 {
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e, l2e;
 
     pl2e = virt_to_xen_l2e(v);
     if ( !pl2e )
@@ -5026,26 +5052,32 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l1_pgentry_t *l1t = alloc_xen_pagetable();
+        mfn_t l1mfn;
+        l1_pgentry_t *l1t = alloc_map_clear_xen_pt(&l1mfn);
 
         if ( !l1t )
+        {
+            unmap_domain_page(pl2e);
             return NULL;
-        clear_page(l1t);
+        }
+        UNMAP_DOMAIN_PAGE(l1t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l2e_write(pl2e, l2e_from_paddr(__pa(l1t), __PAGE_HYPERVISOR));
-            l1t = NULL;
+            l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
+            l1mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l1t )
-            free_xen_pagetable(l1t);
+        free_xen_pagetable_new(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-    return l2e_to_l1e(*pl2e) + l1_table_offset(v);
+    l2e = *pl2e;
+    unmap_domain_page(pl2e);
+
+    return map_l1t_from_l2e(l2e) + l1_table_offset(v);
 }
 
 /* Convert to from superpage-mapping flags for map_pages_to_xen(). */
@@ -5068,8 +5100,8 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e, ol3e;
-    l2_pgentry_t *pl2e, ol2e;
+    l3_pgentry_t *pl3e = NULL, ol3e;
+    l2_pgentry_t *pl2e = NULL, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int i;
     int rc = -ENOMEM;
@@ -5090,6 +5122,10 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+        UNMAP_DOMAIN_PAGE(pl2e);
+
         pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
@@ -5258,6 +5294,8 @@ int map_pages_to_xen(
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
                     goto out;
+
+                UNMAP_DOMAIN_PAGE(pl1e);
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5434,6 +5472,8 @@ int map_pages_to_xen(
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
@@ -5457,7 +5497,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e = NULL;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int i;
@@ -5473,6 +5513,9 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+
         pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
@@ -5701,6 +5744,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl3e);
     return rc;
 }
 
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..9964ab2096 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -1,6 +1,7 @@
 #ifdef VMAP_VIRT_START
 #include <xen/bitmap.h>
 #include <xen/cache.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3d3f9d49ac..42d1a78731 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -587,6 +587,7 @@ void *alloc_xen_pagetable(void);
 void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
+void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
 
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 5acf3d3d5a..3854feb3ea 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
 #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
 #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
 #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
-#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
+
+#define vmap_to_mfn(va) ({                                                  \
+    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));       \
+    mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                       \
+    unmap_domain_page(pl1e_);                                               \
+    mfn_; })
+
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 #endif /* !defined(__ASSEMBLY__) */
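(A distilled usage sketch of the new helper, with illustrative function and
variable names; the real callers are the virt_to_xen_lXe() hunks above:
allocate, map and zero in one call, then drop the mapping once the MFN has
been installed.)

    static int install_new_l3(l4_pgentry_t *pl4e)
    {
        mfn_t l3mfn;
        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn); /* alloc + map + zero */

        if ( !l3t )
            return -ENOMEM;

        UNMAP_DOMAIN_PAGE(l3t);          /* mapping not needed once the page is zeroed */
        l4e_write(pl4e, l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));

        return 0;
    }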