Message ID | 20221216114853.8227-9-julien@xen.org (mailing list archive) |
---|---|
State | New, archived |
Series | Remove the directmap |
On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> Building a PV dom0 is allocating from the domheap but uses it like the
> xenheap. This is clearly wrong. Fix.

"Clearly wrong" would mean there's a bug here, at least under certain
conditions. But there isn't: Even on huge systems, due to running on
idle page tables, all memory is mapped at present.

> @@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
>          v->arch.pv.event_callback_cs = FLAT_COMPAT_KERNEL_CS;
>      }
>
> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
> +do {                                                    \
> +    UNMAP_DOMAIN_PAGE(virt_var);                        \

Not much point using the macro when ...

> +    mfn_var = maddr_to_mfn(maddr);                      \
> +    maddr += PAGE_SIZE;                                 \
> +    virt_var = map_domain_page(mfn_var);                \

... the variable gets reset again to non-NULL unconditionally right
away.

> +} while ( false )

This being a local macro and all use sites passing mpt_alloc as the
last argument, I think that parameter wants dropping, which would
improve readability.

> @@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
>              if ( !l3e_get_intpte(*l3tab) )
>              {
>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> -                clear_page(l2tab);
> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> +                clear_page(l2start);
> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);

The l2start you map on the last iteration here can be re-used ...

> @@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
>          unmap_domain_page(l2t);
>      }

... in the code the tail of which is visible here, eliminating a
redundant map/unmap pair.

> @@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
>       * !CONFIG_VIDEO case so the logic here can be simplified.
>       */
>      if ( pv_shim )
> +    {
> +        l4start = map_domain_page(l4start_mfn);
>          pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>                            vphysmap_start, si);
> +        UNMAP_DOMAIN_PAGE(l4start);
> +    }

The, at the first glance, redundant re-mapping of the L4 table here could
do with explaining in the description. However, I further wonder in how
far in shim mode eliminating the direct map is actually useful. Which is
to say that I question the need for this change in the first place. Or
wait - isn't this (unlike the rest of this patch) actually a bug fix? At
this point we're on the domain's page tables, which may not cover the
page the L4 is allocated at (if a truly huge shim was configured). So I
guess the change is needed but wants breaking out, allowing to at least
consider whether to backport it.

Jan
Hi Jan,

I have been looking at this series recently and tried my best to address
your comments. I'll shortly get to the other patches too.

On 22/12/2022 11:48, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> Building a PV dom0 is allocating from the domheap but uses it like the
>> xenheap. This is clearly wrong. Fix.
>
> "Clearly wrong" would mean there's a bug here, at least under certain
> conditions. But there isn't: Even on huge systems, due to running on
> idle page tables, all memory is mapped at present.

I agree with you, I'll rephrase the commit message.

>
>> @@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
>>          v->arch.pv.event_callback_cs = FLAT_COMPAT_KERNEL_CS;
>>      }
>>
>> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
>> +do {                                                    \
>> +    UNMAP_DOMAIN_PAGE(virt_var);                        \
>
> Not much point using the macro when ...
>
>> +    mfn_var = maddr_to_mfn(maddr);                      \
>> +    maddr += PAGE_SIZE;                                 \
>> +    virt_var = map_domain_page(mfn_var);                \
>
> ... the variable gets reset again to non-NULL unconditionally right
> away.

Sure, I'll change that.

>
>> +} while ( false )
>
> This being a local macro and all use sites passing mpt_alloc as the
> last argument, I think that parameter wants dropping, which would
> improve readability.

I have to disagree. It wouldn't improve readability; it would only make
things more obscure. I'll keep the macro as is.

>
>> @@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
>>              if ( !l3e_get_intpte(*l3tab) )
>>              {
>>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
>> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> -                clear_page(l2tab);
>> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
>> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
>> +                clear_page(l2start);
>> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
>
> The l2start you map on the last iteration here can be re-used ...
>
>> @@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
>>          unmap_domain_page(l2t);
>>      }
>
> ... in the code the tail of which is visible here, eliminating a
> redundant map/unmap pair.

Good catch, I'll remove the redundant pair.

>
>> @@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
>>       * !CONFIG_VIDEO case so the logic here can be simplified.
>>       */
>>      if ( pv_shim )
>> +    {
>> +        l4start = map_domain_page(l4start_mfn);
>>          pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>>                            vphysmap_start, si);
>> +        UNMAP_DOMAIN_PAGE(l4start);
>> +    }
>
> The, at the first glance, redundant re-mapping of the L4 table here could
> do with explaining in the description. However, I further wonder in how
> far in shim mode eliminating the direct map is actually useful. Which is
> to say that I question the need for this change in the first place. Or
> wait - isn't this (unlike the rest of this patch) actually a bug fix? At
> this point we're on the domain's page tables, which may not cover the
> page the L4 is allocated at (if a truly huge shim was configured). So I
> guess the change is needed but wants breaking out, allowing to at least
> consider whether to backport it.
>

I will create a separate patch for this change.

> Jan
>
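For illustration only, here is a minimal user-space sketch of the two macro comments raised in the review: using a plain unmap (since the pointer is reassigned right away) and dropping the explicit cursor parameter. It is not what the series adopts; the reply above keeps the `maddr` parameter. The stubs `map_domain_page()`, `unmap_domain_page()` and `maddr_to_mfn()`, the `mfn_t` struct, the `live_mappings` counter and `demo_unmap_map_and_advance()` are hypothetical stand-ins written for this example and are not Xen's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical user-space stand-ins; these are NOT Xen's real definitions. */
typedef struct { uint64_t m; } mfn_t;

static int live_mappings;                 /* mappings currently held */
static unsigned char fake_page[PAGE_SIZE];

static mfn_t maddr_to_mfn(uint64_t maddr)
{
    mfn_t mfn = { maddr / PAGE_SIZE };
    return mfn;
}

static void *map_domain_page(mfn_t mfn)
{
    (void)mfn;                            /* the stub ignores the frame number */
    live_mappings++;
    return fake_page;
}

static void unmap_domain_page(const void *va)
{
    if ( va )                             /* unmapping NULL is a no-op here */
        live_mappings--;
}

/*
 * Variant sketched from the review comments: a plain unmap, because
 * virt_var is reassigned right away, and no maddr parameter, because the
 * macro uses the caller's local variable mpt_alloc directly.
 */
#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var) \
do {                                             \
    unmap_domain_page(virt_var);                 \
    mfn_var = maddr_to_mfn(mpt_alloc);           \
    mpt_alloc += PAGE_SIZE;                      \
    virt_var = map_domain_page(mfn_var);         \
} while ( 0 )

/* Maps three "tables" in a row; returns the mappings still live at the end. */
int demo_unmap_map_and_advance(uint64_t *final_cursor, uint64_t *last_mfn)
{
    uint64_t mpt_alloc = 0x100000;        /* allocation cursor (assumed value) */
    mfn_t l1start_mfn = { 0 };
    void *l1start = NULL;
    int i;

    for ( i = 0; i < 3; i++ )
    {
        UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start);
        assert(live_mappings == 1);       /* never more than one mapping held */
    }

    /* Final cleanup after the loop; this is where resetting to NULL matters. */
    unmap_domain_page(l1start);
    l1start = NULL;

    *final_cursor = mpt_alloc;
    *last_mfn = l1start_mfn.m;
    return live_mappings;
}
```

Calling `demo_unmap_map_and_advance()` with two `uint64_t` out-parameters should leave no live mappings and a cursor advanced by three pages. The objection raised in the reply still applies to this form: the two-argument macro silently depends on a local variable named `mpt_alloc` being in scope.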
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index c837b2d96f89..cd60f259d1b7 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -383,6 +383,10 @@ int __init dom0_construct_pv(struct domain *d,
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    mfn_t l4start_mfn = INVALID_MFN;
+    mfn_t l3start_mfn = INVALID_MFN;
+    mfn_t l2start_mfn = INVALID_MFN;
+    mfn_t l1start_mfn = INVALID_MFN;
 
     /*
      * This fully describes the memory layout of the initial domain. All
@@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
         v->arch.pv.event_callback_cs = FLAT_COMPAT_KERNEL_CS;
     }
 
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {                                                    \
+    UNMAP_DOMAIN_PAGE(virt_var);                        \
+    mfn_var = maddr_to_mfn(maddr);                      \
+    maddr += PAGE_SIZE;                                 \
+    virt_var = map_domain_page(mfn_var);                \
+} while ( false )
+
     if ( !compat )
     {
         maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-        l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+        UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
+        l4tab = l4start;
         clear_page(l4tab);
-        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-                          d, INVALID_MFN, true);
-        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
+        init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
+        v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
     }
     else
     {
         /* Monitor table already created by switch_compat(). */
-        l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
+        l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
+        l4start = l4tab = map_domain_page(l4start_mfn);
         /* See public/xen.h on why the following is needed. */
         maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
         l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
     }
 
     l4tab += l4_table_offset(v_start);
@@ -736,14 +750,16 @@ int __init dom0_construct_pv(struct domain *d,
         if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
         {
             maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
-            l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+            UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
+            l1tab = l1start;
             clear_page(l1tab);
             if ( count == 0 )
                 l1tab += l1_table_offset(v_start);
             if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
             {
                 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-                l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+                l2tab = l2start;
                 clear_page(l2tab);
                 if ( count == 0 )
                     l2tab += l2_table_offset(v_start);
@@ -753,19 +769,19 @@ int __init dom0_construct_pv(struct domain *d,
                     {
                         maddr_to_page(mpt_alloc)->u.inuse.type_info =
                             PGT_l3_page_table;
-                        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+                        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
                     }
                     l3tab = l3start;
                     clear_page(l3tab);
                     if ( count == 0 )
                         l3tab += l3_table_offset(v_start);
-                    *l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
+                    *l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
                     l4tab++;
                 }
-                *l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
+                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
                 l3tab++;
             }
-            *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
+            *l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
             l2tab++;
         }
         if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
@@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
             if ( !l3e_get_intpte(*l3tab) )
             {
                 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
-                clear_page(l2tab);
-                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
+                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+                clear_page(l2start);
+                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
             }
             if ( i == 3 )
                 l3e_get_page(*l3tab)->u.inuse.type_info |= PGT_pae_xen_l2;
@@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
         unmap_domain_page(l2t);
     }
 
+#undef UNMAP_MAP_AND_ADVANCE
+
+    UNMAP_DOMAIN_PAGE(l1start);
+    UNMAP_DOMAIN_PAGE(l2start);
+    UNMAP_DOMAIN_PAGE(l3start);
+
     /* Pages that are part of page tables must be read only. */
     mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);
 
+    UNMAP_DOMAIN_PAGE(l4start);
+
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
         shared_info(d, vcpu_info[i].evtchn_upcall_mask) = 1;
@@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
      * !CONFIG_VIDEO case so the logic here can be simplified.
      */
     if ( pv_shim )
+    {
+        l4start = map_domain_page(l4start_mfn);
         pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
                           vphysmap_start, si);
+        UNMAP_DOMAIN_PAGE(l4start);
+    }
 
 #ifdef CONFIG_COMPAT
     if ( compat )