Message ID | f49a4a372a9f82e217fa56ba0dc3068deff32ef5.1702607884.git.sanastasio@raptorengineering.com
---|---
State | New, archived
Series | Early Boot Allocation on Power
On 15.12.2023 03:44, Shawn Anastasio wrote:
> In the initial mm-radix implementation, the in-memory partition and
> process tables required to configure the MMU were allocated statically
> since the boot allocator was not yet available.
>
> Now that it is, allocate these tables at runtime and bump the size of
> the Process Table to its maximum supported value (on POWER9). Also bump
> the number of static LVL2/3 PD frames to tolerate cases where the boot
> allocator returns an address outside of the range of the LVL2 frame used
> for Xen.

Don't you need to bump to 4, in case PATB and PRTB end up sufficiently
far apart? Or even further, considering that you may need distinct L2
and L3 for each of Xen, PATB, and PRTB?

However, with there being memory to allocate now, is there a reason you
still reserve (perhaps more than necessary) static memory for the
page tables, rather than allocating those dynamically as well?

> @@ -105,80 +157,43 @@ static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
>          die();
>      }
>
> +    /* Identity map Xen itself */
>      for ( page_addr = map_start; page_addr < map_end; page_addr += PAGE_SIZE )
>      {
> -        struct lvl2_pd *lvl2;
> -        struct lvl3_pd *lvl3;
> -        struct lvl4_pt *lvl4;
> -        pde_t *pde;
> -        pte_t *pte;
> -
> -        /* Allocate LVL 2 PD if necessary */
> -        pde = pt_entry(lvl1, page_addr);
> -        if ( !pde_is_valid(*pde) )
> -        {
> -            lvl2 = lvl2_pd_pool_alloc();
> -            *pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
> -                                XEN_PT_ENTRIES_LOG2_LVL_2);
> -        }
> -        else
> -            lvl2 = __va(pde_to_paddr(*pde));
> +        unsigned long flags;
>
> -        /* Allocate LVL 3 PD if necessary */
> -        pde = pt_entry(lvl2, page_addr);
> -        if ( !pde_is_valid(*pde) )
> +        if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
>          {
> -            lvl3 = lvl3_pd_pool_alloc();
> -            *pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
> -                                XEN_PT_ENTRIES_LOG2_LVL_3);
> +            radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
> +            flags = PTE_XEN_RX;
>          }
> -        else
> -            lvl3 = __va(pde_to_paddr(*pde));
> -
> -        /* Allocate LVL 4 PT if necessary */
> -        pde = pt_entry(lvl3, page_addr);
> -        if ( !pde_is_valid(*pde) )
> -        {
> -            lvl4 = lvl4_pt_pool_alloc();
> -            *pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
> -                                XEN_PT_ENTRIES_LOG2_LVL_4);
> -        }
> -        else
> -            lvl4 = __va(pde_to_paddr(*pde));
> -
> -        /* Finally, create PTE in LVL 4 PT */
> -        pte = pt_entry(lvl4, page_addr);
> -        if ( !pte_is_valid(*pte) )
> +        else if ( is_kernel_rodata(page_addr) )
>          {
> -            unsigned long paddr = (page_addr - map_start) + phys_base;
> -            unsigned long flags;
> -
> -            radix_dprintk("%016lx being mapped to %016lx\n", paddr, page_addr);
> -            if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
> -            {
> -                radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
> -                flags = PTE_XEN_RX;
> -            }
> -            else if ( is_kernel_rodata(page_addr) )
> -            {
> -                radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
> -                flags = PTE_XEN_RO;
> -            }
> -            else
> -            {
> -                radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
> -                flags = PTE_XEN_RW;
> -            }
> -
> -            *pte = paddr_to_pte(paddr, flags);
> -            radix_dprintk("%016lx is the result of PTE map\n",
> -                          paddr_to_pte(paddr, flags).pte);
> +            radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
> +            flags = PTE_XEN_RO;
>          }
>          else
>          {
> -            early_printk("BUG: Tried to create PTE for already-mapped page!");
> -            die();
> +            radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
> +            flags = PTE_XEN_RW;
>          }
> +
> +        map_page_initial(lvl1, page_addr, (page_addr - map_start) + phys_base, flags);
> +    }
> +
> +    /* Map runtime-allocated PATB, PRTB */
> +    for ( page_addr = (uint64_t)initial_patb;
> +          page_addr < (uint64_t)initial_patb + PATB_SIZE;

While technically not an issue, casting pointers to fixed width types is
generally bogus. page_addr itself would likely better also be of a
different type (unsigned long, uintptr_t, or vaddr_t; the latter only if
that's meant to represent hypervisor virtual addresses, not guest ones).

> +          page_addr += PAGE_SIZE )
> +    {
> +        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
> +    }
> +
> +    for ( page_addr = (uint64_t)initial_prtb;
> +          page_addr < (uint64_t)initial_prtb + PRTB_SIZE;
> +          page_addr += PAGE_SIZE )
> +    {
> +        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
>      }

Just as a remark (you're the maintainer) - in cases like these we generally
prefer to omit the braces.

> @@ -210,6 +225,16 @@ void __init setup_initial_pagetables(void)
>  {
>      struct lvl1_pd *root = lvl1_pd_pool_alloc();
>      unsigned long lpcr;
> +    mfn_t patb_mfn, prtb_mfn;
> +
> +    /* Allocate mfns for in-memory tables using the boot allocator */
> +    prtb_mfn = alloc_boot_pages(PRTB_SIZE / PAGE_SIZE,
> +                                max(1, PRTB_SIZE_LOG2 - PAGE_SHIFT));
> +    patb_mfn = alloc_boot_pages(PATB_SIZE / PAGE_SIZE,
> +                                max(1, PATB_SIZE_LOG2 - PAGE_SHIFT));
> +
> +    initial_patb = __va(mfn_to_maddr(patb_mfn));
> +    initial_prtb = __va(mfn_to_maddr(prtb_mfn));

Overall, what's the plan wrt directmap: Are you meaning to not have one
covering all memory? If you do, I wonder if you wouldn't be better off
mapping memory as you pass it to the boot allocator, such that you
won't need to map things piecemeal like you're doing here.

Jan
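As a concrete illustration of the type remark above, the PATB/PRTB loops
could iterate over a vaddr_t and avoid the fixed-width casts entirely. The
helper below is only a sketch (its name and shape are hypothetical, not
taken from the series):

static void __init map_table_initial(struct lvl1_pd *lvl1, void *table,
                                     size_t size)
{
    /* Convert the pointer once, to the type map_page_initial() expects. */
    vaddr_t start = (vaddr_t)table;
    vaddr_t page_addr;

    for ( page_addr = start; page_addr < start + size; page_addr += PAGE_SIZE )
        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
}

Both tables would then be mapped via map_table_initial(lvl1, initial_patb,
PATB_SIZE) and map_table_initial(lvl1, initial_prtb, PRTB_SIZE).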
Hi Jan,

I'm revisiting this series in preparation for a v3 and have some
follow-up comments.

On 12/21/23 1:06 AM, Jan Beulich wrote:
> On 15.12.2023 03:44, Shawn Anastasio wrote:
>> In the initial mm-radix implementation, the in-memory partition and
>> process tables required to configure the MMU were allocated statically
>> since the boot allocator was not yet available.
>>
>> Now that it is, allocate these tables at runtime and bump the size of
>> the Process Table to its maximum supported value (on POWER9). Also bump
>> the number of static LVL2/3 PD frames to tolerate cases where the boot
>> allocator returns an address outside of the range of the LVL2 frame used
>> for Xen.
>
> Don't you need to bump to 4, in case PATB and PRTB end up sufficiently
> far apart? Or even further, considering that you may need distinct L2
> and L3 for each of Xen, PATB, and PRTB?
>

In practice this won't happen since the boot allocator will start from
the top of free address space and move downwards, while the PowerNV
firmware allocates all of its memory towards the bottom of the physical
address space. I see your point though, and will likely end up scrapping
the static page table allocation entirely (see below).

> However, with there being memory to allocate now, is there a reason you
> still reserve (perhaps more than necessary) static memory for the
> page tables, rather than allocating those dynamically as well?
>

Initially I had avoided doing this to simplify the identity mapping of
the paging structures themselves, since dynamically allocating them
would (obviously) mean that their location wouldn't be known ahead of
time and that would complicate the process of mapping them.

On second thought though, this can be trivially solved by keeping track
of the maximum and minimum addresses returned by all calls to
alloc_boot_pages() in this early paging code and then simply identity
mapping that entire region, and I think that this is worth doing to
avoid these wasteful static allocations.
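A minimal sketch of that approach (all names below are hypothetical and
not taken from the series) would be to funnel the early allocations
through a wrapper that records the range handed out:

static paddr_t __initdata boot_alloc_min = ~(paddr_t)0;
static paddr_t __initdata boot_alloc_max;

static mfn_t __init alloc_boot_pages_tracked(unsigned long nr_pfns,
                                             unsigned long pfn_align)
{
    mfn_t mfn = alloc_boot_pages(nr_pfns, pfn_align);
    paddr_t start = mfn_to_maddr(mfn);
    paddr_t end = start + (nr_pfns << PAGE_SHIFT);

    /* Remember the lowest/highest frames the early paging code received. */
    boot_alloc_min = min(boot_alloc_min, start);
    boot_alloc_max = max(boot_alloc_max, end);

    return mfn;
}

Once the root page table exists, a single loop from boot_alloc_min to
boot_alloc_max (in PAGE_SIZE steps) could then identity map everything the
early paging code allocated, instead of mapping each table separately.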
>> @@ -105,80 +157,43 @@ static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
>>          die();
>>      }
>>
>> +    /* Identity map Xen itself */
>>      for ( page_addr = map_start; page_addr < map_end; page_addr += PAGE_SIZE )
>>      {
>> -        struct lvl2_pd *lvl2;
>> -        struct lvl3_pd *lvl3;
>> -        struct lvl4_pt *lvl4;
>> -        pde_t *pde;
>> -        pte_t *pte;
>> -
>> -        /* Allocate LVL 2 PD if necessary */
>> -        pde = pt_entry(lvl1, page_addr);
>> -        if ( !pde_is_valid(*pde) )
>> -        {
>> -            lvl2 = lvl2_pd_pool_alloc();
>> -            *pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
>> -                                XEN_PT_ENTRIES_LOG2_LVL_2);
>> -        }
>> -        else
>> -            lvl2 = __va(pde_to_paddr(*pde));
>> +        unsigned long flags;
>>
>> -        /* Allocate LVL 3 PD if necessary */
>> -        pde = pt_entry(lvl2, page_addr);
>> -        if ( !pde_is_valid(*pde) )
>> +        if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
>>          {
>> -            lvl3 = lvl3_pd_pool_alloc();
>> -            *pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
>> -                                XEN_PT_ENTRIES_LOG2_LVL_3);
>> +            radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
>> +            flags = PTE_XEN_RX;
>>          }
>> -        else
>> -            lvl3 = __va(pde_to_paddr(*pde));
>> -
>> -        /* Allocate LVL 4 PT if necessary */
>> -        pde = pt_entry(lvl3, page_addr);
>> -        if ( !pde_is_valid(*pde) )
>> -        {
>> -            lvl4 = lvl4_pt_pool_alloc();
>> -            *pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
>> -                                XEN_PT_ENTRIES_LOG2_LVL_4);
>> -        }
>> -        else
>> -            lvl4 = __va(pde_to_paddr(*pde));
>> -
>> -        /* Finally, create PTE in LVL 4 PT */
>> -        pte = pt_entry(lvl4, page_addr);
>> -        if ( !pte_is_valid(*pte) )
>> +        else if ( is_kernel_rodata(page_addr) )
>>          {
>> -            unsigned long paddr = (page_addr - map_start) + phys_base;
>> -            unsigned long flags;
>> -
>> -            radix_dprintk("%016lx being mapped to %016lx\n", paddr, page_addr);
>> -            if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
>> -            {
>> -                radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
>> -                flags = PTE_XEN_RX;
>> -            }
>> -            else if ( is_kernel_rodata(page_addr) )
>> -            {
>> -                radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
>> -                flags = PTE_XEN_RO;
>> -            }
>> -            else
>> -            {
>> -                radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
>> -                flags = PTE_XEN_RW;
>> -            }
>> -
>> -            *pte = paddr_to_pte(paddr, flags);
>> -            radix_dprintk("%016lx is the result of PTE map\n",
>> -                          paddr_to_pte(paddr, flags).pte);
>> +            radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
>> +            flags = PTE_XEN_RO;
>>          }
>>          else
>>          {
>> -            early_printk("BUG: Tried to create PTE for already-mapped page!");
>> -            die();
>> +            radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
>> +            flags = PTE_XEN_RW;
>>          }
>> +
>> +        map_page_initial(lvl1, page_addr, (page_addr - map_start) + phys_base, flags);
>> +    }
>> +
>> +    /* Map runtime-allocated PATB, PRTB */
>> +    for ( page_addr = (uint64_t)initial_patb;
>> +          page_addr < (uint64_t)initial_patb + PATB_SIZE;
>
> While technically not an issue, casting pointers to fixed width types is
> generally bogus. page_addr itself would likely better also be of a
> different type (unsigned long, uintptr_t, or vaddr_t; the latter only if
> that's meant to represent hypervisor virtual addresses, not guest ones).
>

Noted. I'll go with one of the other types you suggested.
>> +          page_addr += PAGE_SIZE )
>> +    {
>> +        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
>> +    }
>> +
>> +    for ( page_addr = (uint64_t)initial_prtb;
>> +          page_addr < (uint64_t)initial_prtb + PRTB_SIZE;
>> +          page_addr += PAGE_SIZE )
>> +    {
>> +        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
>>      }
>
> Just as a remark (you're the maintainer) - in cases like these we generally
> prefer to omit the braces.
>

Noted. In this case I feel pretty strongly that the braces aid
readability, so I'm inclined to keep them.

>> @@ -210,6 +225,16 @@ void __init setup_initial_pagetables(void)
>>  {
>>      struct lvl1_pd *root = lvl1_pd_pool_alloc();
>>      unsigned long lpcr;
>> +    mfn_t patb_mfn, prtb_mfn;
>> +
>> +    /* Allocate mfns for in-memory tables using the boot allocator */
>> +    prtb_mfn = alloc_boot_pages(PRTB_SIZE / PAGE_SIZE,
>> +                                max(1, PRTB_SIZE_LOG2 - PAGE_SHIFT));
>> +    patb_mfn = alloc_boot_pages(PATB_SIZE / PAGE_SIZE,
>> +                                max(1, PATB_SIZE_LOG2 - PAGE_SHIFT));
>> +
>> +    initial_patb = __va(mfn_to_maddr(patb_mfn));
>> +    initial_prtb = __va(mfn_to_maddr(prtb_mfn));
>
> Overall, what's the plan wrt directmap: Are you meaning to not have one
> covering all memory? If you do, I wonder if you wouldn't be better off
> mapping memory as you pass it to the boot allocator, such that you
> won't need to map things piecemeal like you're doing here.
>

This isn't a design decision that I've made yet, but I'm inclined to
think that having a directmap would be reasonable. Mapping memory as it
is passed to the boot allocator seems reasonable in theory, but unless
I'm missing something, doing so would create a chicken-and-egg problem,
since the act of mapping pages requires allocating page frames for the
paging structures, and per your prior suggestion I'm planning to do that
using the boot allocator itself.

This is probably something I'd want to revisit once the PPC mm code is
more fleshed out.

> Jan

Thanks,
Shawn
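To make the ordering question concrete, a "map as you hand memory to the
boot allocator" wrapper might look like the sketch below (hypothetical
helper, not part of this series). It also shows where the circular
dependency bites: the map_page_initial() calls themselves consume
page-table frames, which, if those frames were to come from the boot
allocator, may not be mapped or even allocated yet, so some statically
reserved or pre-carved-out frames are still needed to break the cycle.

static void __init give_and_map_boot_pages(struct lvl1_pd *lvl1,
                                           paddr_t ps, paddr_t pe)
{
    paddr_t addr;

    /* Hand the range to the boot allocator... */
    init_boot_pages(ps, pe);

    /* ...and add it to the directmap right away. */
    for ( addr = ps; addr < pe; addr += PAGE_SIZE )
        map_page_initial(lvl1, (vaddr_t)__va(addr), addr, PTE_XEN_RW);
}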
diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index de181cf6f1..e5604d8fb3 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -22,7 +22,7 @@ void enable_mmu(void);
 #endif
 
 #define INITIAL_LVL1_PD_COUNT 1
-#define INITIAL_LVL2_LVL3_PD_COUNT 2
+#define INITIAL_LVL2_LVL3_PD_COUNT 3
 #define INITIAL_LVL4_PT_COUNT 256
 
 static size_t __initdata initial_lvl1_pd_pool_used;
@@ -34,17 +34,13 @@ static struct lvl2_pd initial_lvl2_lvl3_pd_pool[INITIAL_LVL2_LVL3_PD_COUNT];
 static size_t __initdata initial_lvl4_pt_pool_used;
 static struct lvl4_pt initial_lvl4_pt_pool[INITIAL_LVL4_PT_COUNT];
 
-/* Only reserve minimum Partition and Process tables */
 #define PATB_SIZE_LOG2 16 /* Only supported partition table size on POWER9 */
 #define PATB_SIZE (1UL << PATB_SIZE_LOG2)
-#define PRTB_SIZE_LOG2 12
+#define PRTB_SIZE_LOG2 24 /* Maximum process table size on POWER9 */
 #define PRTB_SIZE (1UL << PRTB_SIZE_LOG2)
 
-static struct patb_entry
-    __aligned(PATB_SIZE) initial_patb[PATB_SIZE / sizeof(struct patb_entry)];
-
-static struct prtb_entry
-    __aligned(PRTB_SIZE) initial_prtb[PRTB_SIZE / sizeof(struct prtb_entry)];
+static struct patb_entry *initial_patb;
+static struct prtb_entry *initial_prtb;
 
 static __init struct lvl1_pd *lvl1_pd_pool_alloc(void)
 {
@@ -86,6 +82,62 @@ static __init struct lvl4_pt *lvl4_pt_pool_alloc(void)
     return &initial_lvl4_pt_pool[initial_lvl4_pt_pool_used++];
 }
 
+static void map_page_initial(struct lvl1_pd *lvl1, vaddr_t virt, paddr_t phys,
+                             unsigned long flags)
+{
+    struct lvl2_pd *lvl2;
+    struct lvl3_pd *lvl3;
+    struct lvl4_pt *lvl4;
+    pde_t *pde;
+    pte_t *pte;
+
+    /* Allocate LVL 2 PD if necessary */
+    pde = pt_entry(lvl1, virt);
+    if ( !pde_is_valid(*pde) )
+    {
+        lvl2 = lvl2_pd_pool_alloc();
+        *pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
+                            XEN_PT_ENTRIES_LOG2_LVL_2);
+    }
+    else
+        lvl2 = __va(pde_to_paddr(*pde));
+
+    /* Allocate LVL 3 PD if necessary */
+    pde = pt_entry(lvl2, virt);
+    if ( !pde_is_valid(*pde) )
+    {
+        lvl3 = lvl3_pd_pool_alloc();
+        *pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
+                            XEN_PT_ENTRIES_LOG2_LVL_3);
+    }
+    else
+        lvl3 = __va(pde_to_paddr(*pde));
+
+    /* Allocate LVL 4 PT if necessary */
+    pde = pt_entry(lvl3, virt);
+    if ( !pde_is_valid(*pde) )
+    {
+        lvl4 = lvl4_pt_pool_alloc();
+        *pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
+                            XEN_PT_ENTRIES_LOG2_LVL_4);
+    }
+    else
+        lvl4 = __va(pde_to_paddr(*pde));
+
+    /* Finally, create PTE in LVL 4 PT */
+    pte = pt_entry(lvl4, virt);
+    if ( !pte_is_valid(*pte) )
+    {
+        radix_dprintk("%016lx being mapped to %016lx\n", phys, virt);
+        *pte = paddr_to_pte(phys, flags);
+    }
+    else
+    {
+        early_printk("BUG: Tried to create PTE for already-mapped page!");
+        die();
+    }
+}
+
 static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
                                          vaddr_t map_start,
                                          vaddr_t map_end,
@@ -105,80 +157,43 @@ static void __init setup_initial_mapping(struct lvl1_pd *lvl1,
         die();
     }
 
+    /* Identity map Xen itself */
     for ( page_addr = map_start; page_addr < map_end; page_addr += PAGE_SIZE )
     {
-        struct lvl2_pd *lvl2;
-        struct lvl3_pd *lvl3;
-        struct lvl4_pt *lvl4;
-        pde_t *pde;
-        pte_t *pte;
-
-        /* Allocate LVL 2 PD if necessary */
-        pde = pt_entry(lvl1, page_addr);
-        if ( !pde_is_valid(*pde) )
-        {
-            lvl2 = lvl2_pd_pool_alloc();
-            *pde = paddr_to_pde(__pa(lvl2), PDE_VALID,
-                                XEN_PT_ENTRIES_LOG2_LVL_2);
-        }
-        else
-            lvl2 = __va(pde_to_paddr(*pde));
+        unsigned long flags;
 
-        /* Allocate LVL 3 PD if necessary */
-        pde = pt_entry(lvl2, page_addr);
-        if ( !pde_is_valid(*pde) )
+        if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
         {
-            lvl3 = lvl3_pd_pool_alloc();
-            *pde = paddr_to_pde(__pa(lvl3), PDE_VALID,
-                                XEN_PT_ENTRIES_LOG2_LVL_3);
+            radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
+            flags = PTE_XEN_RX;
         }
-        else
-            lvl3 = __va(pde_to_paddr(*pde));
-
-        /* Allocate LVL 4 PT if necessary */
-        pde = pt_entry(lvl3, page_addr);
-        if ( !pde_is_valid(*pde) )
-        {
-            lvl4 = lvl4_pt_pool_alloc();
-            *pde = paddr_to_pde(__pa(lvl4), PDE_VALID,
-                                XEN_PT_ENTRIES_LOG2_LVL_4);
-        }
-        else
-            lvl4 = __va(pde_to_paddr(*pde));
-
-        /* Finally, create PTE in LVL 4 PT */
-        pte = pt_entry(lvl4, page_addr);
-        if ( !pte_is_valid(*pte) )
+        else if ( is_kernel_rodata(page_addr) )
        {
-            unsigned long paddr = (page_addr - map_start) + phys_base;
-            unsigned long flags;
-
-            radix_dprintk("%016lx being mapped to %016lx\n", paddr, page_addr);
-            if ( is_kernel_text(page_addr) || is_kernel_inittext(page_addr) )
-            {
-                radix_dprintk("%016lx being marked as TEXT (RX)\n", page_addr);
-                flags = PTE_XEN_RX;
-            }
-            else if ( is_kernel_rodata(page_addr) )
-            {
-                radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
-                flags = PTE_XEN_RO;
-            }
-            else
-            {
-                radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
-                flags = PTE_XEN_RW;
-            }
-
-            *pte = paddr_to_pte(paddr, flags);
-            radix_dprintk("%016lx is the result of PTE map\n",
-                          paddr_to_pte(paddr, flags).pte);
+            radix_dprintk("%016lx being marked as RODATA (RO)\n", page_addr);
+            flags = PTE_XEN_RO;
         }
         else
         {
-            early_printk("BUG: Tried to create PTE for already-mapped page!");
-            die();
+            radix_dprintk("%016lx being marked as DEFAULT (RW)\n", page_addr);
+            flags = PTE_XEN_RW;
         }
+
+        map_page_initial(lvl1, page_addr, (page_addr - map_start) + phys_base, flags);
+    }
+
+    /* Map runtime-allocated PATB, PRTB */
+    for ( page_addr = (uint64_t)initial_patb;
+          page_addr < (uint64_t)initial_patb + PATB_SIZE;
+          page_addr += PAGE_SIZE )
+    {
+        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
+    }
+
+    for ( page_addr = (uint64_t)initial_prtb;
+          page_addr < (uint64_t)initial_prtb + PRTB_SIZE;
+          page_addr += PAGE_SIZE )
+    {
+        map_page_initial(lvl1, page_addr, __pa(page_addr), PTE_XEN_RW);
     }
 }
 
@@ -210,6 +225,16 @@ void __init setup_initial_pagetables(void)
 {
     struct lvl1_pd *root = lvl1_pd_pool_alloc();
     unsigned long lpcr;
+    mfn_t patb_mfn, prtb_mfn;
+
+    /* Allocate mfns for in-memory tables using the boot allocator */
+    prtb_mfn = alloc_boot_pages(PRTB_SIZE / PAGE_SIZE,
+                                max(1, PRTB_SIZE_LOG2 - PAGE_SHIFT));
+    patb_mfn = alloc_boot_pages(PATB_SIZE / PAGE_SIZE,
+                                max(1, PATB_SIZE_LOG2 - PAGE_SHIFT));
+
+    initial_patb = __va(mfn_to_maddr(patb_mfn));
+    initial_prtb = __va(mfn_to_maddr(prtb_mfn));
 
     setup_initial_mapping(root, (vaddr_t)_start, (vaddr_t)_end, __pa(_start));
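For readers skimming the diff: the new map_page_initial() helper factors
out the four-level walk that the old per-page loop open-coded. It takes
the root lvl1_pd, a virtual address, a physical address and PTE flags,
allocates any missing intermediate PD/PT frames from the static pools,
installs a single PTE, and dies if the page is already mapped. A
hypothetical extra caller (not part of the patch) would look like:

static void __init identity_map_one(struct lvl1_pd *root, paddr_t phys)
{
    /* Map one page at its directmap address, read/write. */
    map_page_initial(root, (vaddr_t)__va(phys), phys, PTE_XEN_RW);
}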
In the initial mm-radix implementation, the in-memory partition and
process tables required to configure the MMU were allocated statically
since the boot allocator was not yet available.

Now that it is, allocate these tables at runtime and bump the size of
the Process Table to its maximum supported value (on POWER9). Also bump
the number of static LVL2/3 PD frames to tolerate cases where the boot
allocator returns an address outside of the range of the LVL2 frame used
for Xen.

Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com>
---
Changes in v2:
  - Bump LVL2/3 PD count to 3 to avoid running out in case the boot
    allocator returns a suitably high address.

 xen/arch/ppc/mm-radix.c | 169 +++++++++++++++++++++++-----------------
 1 file changed, 97 insertions(+), 72 deletions(-)

--
2.30.2