Message ID | 20220609083039.76667-2-julien@xen.org (mailing list archive)
---|---
State | Superseded
Series | xen/mm: Optimize init_heap_pages()
On 09.06.2022 10:30, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> At the moment, init_heap_pages() will call free_heap_pages() page
> by page. To reduce the time to initialize the heap, we will want
> to provide multiple pages at the same time.
>
> init_heap_pages() is now split in two parts:
>   - init_heap_pages(): will break down the range in multiple sets
>     of contiguous pages. For now, the criterion is that the pages
>     should belong to the same NUMA node.
>   - init_contig_pages(): will initialize a set of contiguous pages.
>     For now the pages are still passed one by one to free_heap_pages().

Hmm, the common use of "contiguous" is to describe address correlation.
Therefore I'm afraid I'd like to see "contig" avoided when you mean
"same node". Perhaps init_node_pages()?

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>  }
>  
>  /*
> - * Hand the specified arbitrary page range to the specified heap zone
> - * checking the node_id of the previous page. If they differ and the
> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
> - * not freeing it to the buddy allocator.
> + * init_contig_heap_pages() is intended to only take pages from the same
> + * NUMA node.
>   */
> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
> +{
> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
> +}

If such a helper is indeed needed, then I think it absolutely wants
pg to be pointer-to-const. And imo it would also help readability if
the extra pair of parentheses around the nested function calls was
omitted. Given the naming concern, though, I wonder whether this
wouldn't better be open-coded in the one place it is used. Actually
naming is quite odd here beyond what I'd said further up: "Is this
page contiguous?" Such a question requires two pages, i.e. "Are these
two pages contiguous?" What you want to know is "Is this page on the
given node?"

> +/*
> + * This function should only be called with valid pages from the same NUMA
> + * node.
> + *
> + * Callers should use is_contig_page() first to check if all the pages
> + * in a range are contiguous.
> + */
> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,

const again?

> +                                   bool need_scrub)
> +{
> +    unsigned long s, e;
> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
> +
> +    s = mfn_x(page_to_mfn(pg));
> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
> +    if ( unlikely(!avail[nid]) )
> +    {
> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&

IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?

> +                        (find_first_set_bit(e) <= find_first_set_bit(s));
> +        unsigned long n;
> +
> +        n = init_node_heap(nid, s, nr_pages, &use_tail);
> +        BUG_ON(n > nr_pages);
> +        if ( use_tail )
> +            e -= n;
> +        else
> +            s += n;
> +    }
> +
> +    while ( s < e )
> +    {
> +        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
> +        s += 1UL;

Nit (I realize the next patch will replace this anyway): Just ++s? Or
at least a plain 1 without UL suffix?

> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>      spin_unlock(&heap_lock);
>  
>      if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
> -        idle_scrub = true;
> +        need_scrub = true;
>  
> -    for ( i = 0; i < nr_pages; i++ )
> +    for ( i = 0; i < nr_pages; )
>      {
> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
> +        unsigned long left = nr_pages - i;
> +        unsigned long contig_pages;
>  
> -        if ( unlikely(!avail[nid]) )
> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>          {
> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
> -            unsigned long n;
> -
> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
> -                               &use_tail);
> -            BUG_ON(i + n > nr_pages);
> -            if ( n && !use_tail )
> -            {
> -                i += n - 1;
> -                continue;
> -            }
> -            if ( i + n == nr_pages )
> +            if ( !is_contig_page(pg + contig_pages, nid) )
>                  break;
> -            nr_pages -= n;
>          }

Isn't doing this page by page in a loop quite inefficient? Can't you
simply obtain the end of the node's range covering the first page, and
then adjust "left" accordingly? I even wonder whether the admittedly
lax original check's assumption couldn't be leveraged alternatively,
by effectively bisecting to the end address on the node of interest
(where the assumption is that nodes aren't interleaved - see Wei's
NUMA series dealing with precisely that situation).

Jan
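For illustration, the alternative Jan sketches in his last question could look
roughly as follows. This is only a sketch under assumptions: node_batch_size()
is a made-up name, it relies on Xen's node_start_pfn()/node_spanned_pages()
accessors, and it is only valid if nodes are not interleaved and no individual
page ever needs to be excluded. The replies below explain why the series keeps
the per-page walk instead.

    /*
     * Hypothetical helper (not part of the patch): clamp the batch at the
     * end of the first page's node instead of probing every page.
     */
    static unsigned long node_batch_size(const struct page_info *pg,
                                         unsigned long left)
    {
        unsigned int nid = phys_to_nid(page_to_maddr(pg));
        unsigned long node_end = node_start_pfn(nid) + node_spanned_pages(nid);

        return min(left, node_end - mfn_x(page_to_mfn(pg)));
    }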
Hi Jan,

On 09/06/2022 13:09, Jan Beulich wrote:
> On 09.06.2022 10:30, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> At the moment, init_heap_pages() will call free_heap_pages() page
>> by page. To reduce the time to initialize the heap, we will want
>> to provide multiple pages at the same time.
>>
>> init_heap_pages() is now split in two parts:
>>   - init_heap_pages(): will break down the range in multiple sets
>>     of contiguous pages. For now, the criterion is that the pages
>>     should belong to the same NUMA node.
>>   - init_contig_pages(): will initialize a set of contiguous pages.
>>     For now the pages are still passed one by one to free_heap_pages().
> 
> Hmm, the common use of "contiguous" is to describe address correlation.
> Therefore I'm afraid I'd like to see "contig" avoided when you mean
> "same node". Perhaps init_node_pages()?

After the next patch, it will not only be the same node: it will also be
the same zone, at least. Also, in the future, I would like to re-submit
David Woodhouse's patch to exclude broken pages (see [1]).

Therefore, I think the name init_node_pages() would not be suitable.
Please suggest a different name.

> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>   }
>>   
>>   /*
>> - * Hand the specified arbitrary page range to the specified heap zone
>> - * checking the node_id of the previous page. If they differ and the
>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>> - * not freeing it to the buddy allocator.
>> + * init_contig_heap_pages() is intended to only take pages from the same
>> + * NUMA node.
>>    */
>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>> +{
>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>> +}
> 
> If such a helper is indeed needed, then I think it absolutely wants
> pg to be pointer-to-const. And imo it would also help readability if
> the extra pair of parentheses around the nested function calls was
> omitted. Given the naming concern, though, I wonder whether this
> wouldn't better be open-coded in the one place it is used. Actually
> naming is quite odd here beyond what I'd said further up: "Is this
> page contiguous?" Such a question requires two pages, i.e. "Are these
> two pages contiguous?" What you want to know is "Is this page on the
> given node?"

There will be more checks in the future (see the next patch). I created
a helper because it reduces the size of the loop in init_heap_pages(). I
would be OK to fold it if you strongly prefer that.

> 
>> +/*
>> + * This function should only be called with valid pages from the same NUMA
>> + * node.
>> + *
>> + * Callers should use is_contig_page() first to check if all the pages
>> + * in a range are contiguous.
>> + */
>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
> 
> const again?

I will have a look.

> 
>> +                                   bool need_scrub)
>> +{
>> +    unsigned long s, e;
>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>> +
>> +    s = mfn_x(page_to_mfn(pg));
>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>> +    if ( unlikely(!avail[nid]) )
>> +    {
>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
> 
> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?

This is existing code and it is quite complex, so I would prefer to
avoid simplifying and moving the code in the same patch. I would be
happy to write a follow-up patch to switch to IS_ALIGNED().

> 
>> +                        (find_first_set_bit(e) <= find_first_set_bit(s));
>> +        unsigned long n;
>> +
>> +        n = init_node_heap(nid, s, nr_pages, &use_tail);
>> +        BUG_ON(n > nr_pages);
>> +        if ( use_tail )
>> +            e -= n;
>> +        else
>> +            s += n;
>> +    }
>> +
>> +    while ( s < e )
>> +    {
>> +        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
>> +        s += 1UL;
> 
> Nit (I realize the next patch will replace this anyway): Just ++s? Or
> at least a plain 1 without UL suffix?

I will switch to s++.

> 
>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>       spin_unlock(&heap_lock);
>>   
>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>> -        idle_scrub = true;
>> +        need_scrub = true;
>>   
>> -    for ( i = 0; i < nr_pages; i++ )
>> +    for ( i = 0; i < nr_pages; )
>>       {
>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>> +        unsigned long left = nr_pages - i;
>> +        unsigned long contig_pages;
>>   
>> -        if ( unlikely(!avail[nid]) )
>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>           {
>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>> -            unsigned long n;
>> -
>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>> -                               &use_tail);
>> -            BUG_ON(i + n > nr_pages);
>> -            if ( n && !use_tail )
>> -            {
>> -                i += n - 1;
>> -                continue;
>> -            }
>> -            if ( i + n == nr_pages )
>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>                   break;
>> -            nr_pages -= n;
>>           }
> 
> Isn't doing this page by page in a loop quite inefficient? Can't you
> simply obtain the end of the node's range covering the first page, and
> then adjust "left" accordingly?

The page-by-page walk is necessary because we may need to exclude some
pages (see [1]) or the range may cross multiple zones (see [2]).

> I even wonder whether the admittedly
> lax original check's assumption couldn't be leveraged alternatively,
> by effectively bisecting to the end address on the node of interest
> (where the assumption is that nodes aren't interleaved - see Wei's
> NUMA series dealing with precisely that situation).

See above. We went this way because there are some pages to be excluded.
The immediate use case is broken pages, but in the future we may need to
also exclude pages that contain guest content after Live-Update.

I also plan to get rid of the loop in free_heap_pages() to mark each
page free. This would mean that pages would only be accessed once in
init_heap_pages() (I am still cleaning up that patch and it is a much
more controversial change).

Cheers,

[1]
https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@infradead.org/

> 
> Jan
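For reference, the const-qualified form Jan asked for, with the extra pair of
parentheses dropped, might look like the sketch below. This is only an
illustration against the code in this patch; the helper's name is still the
one under discussion in the thread, not a final choice.

    /* Is this page on the given NUMA node? */
    static bool is_contig_page(const struct page_info *pg, unsigned int nid)
    {
        return nid == phys_to_nid(page_to_maddr(pg));
    }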
On 09.06.2022 14:33, Julien Grall wrote:
> On 09/06/2022 13:09, Jan Beulich wrote:
>> On 09.06.2022 10:30, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>> by page. To reduce the time to initialize the heap, we will want
>>> to provide multiple pages at the same time.
>>>
>>> init_heap_pages() is now split in two parts:
>>>   - init_heap_pages(): will break down the range in multiple sets
>>>     of contiguous pages. For now, the criterion is that the pages
>>>     should belong to the same NUMA node.
>>>   - init_contig_pages(): will initialize a set of contiguous pages.
>>>     For now the pages are still passed one by one to free_heap_pages().
>>
>> Hmm, the common use of "contiguous" is to describe address correlation.
>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>> "same node". Perhaps init_node_pages()?
> 
> After the next patch, it will not only be the same node: it will also be
> the same zone, at least. Also, in the future, I would like to re-submit
> David Woodhouse's patch to exclude broken pages (see [1]).
> 
> Therefore, I think the name init_node_pages() would not be suitable.
> Please suggest a different name.

_init_heap_pages() then, as a helper of init_heap_pages()?

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>>   }
>>>   
>>>   /*
>>> - * Hand the specified arbitrary page range to the specified heap zone
>>> - * checking the node_id of the previous page. If they differ and the
>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>> - * not freeing it to the buddy allocator.
>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>> + * NUMA node.
>>>    */
>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>> +{
>>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>>> +}
>>
>> If such a helper is indeed needed, then I think it absolutely wants
>> pg to be pointer-to-const. And imo it would also help readability if
>> the extra pair of parentheses around the nested function calls was
>> omitted. Given the naming concern, though, I wonder whether this
>> wouldn't better be open-coded in the one place it is used. Actually
>> naming is quite odd here beyond what I'd said further up: "Is this
>> page contiguous?" Such a question requires two pages, i.e. "Are these
>> two pages contiguous?" What you want to know is "Is this page on the
>> given node?"
> 
> There will be more checks in the future (see the next patch). I created
> a helper because it reduces the size of the loop in init_heap_pages(). I
> would be OK to fold it if you strongly prefer that.

I don't "strongly" prefer that; I'd also be okay with a suitably named
helper. Just that I can't seem to be able to come up with any good name.

>>> +/*
>>> + * This function should only be called with valid pages from the same NUMA
>>> + * node.
>>> + *
>>> + * Callers should use is_contig_page() first to check if all the pages
>>> + * in a range are contiguous.
>>> + */
>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>> +                                   bool need_scrub)
>>> +{
>>> +    unsigned long s, e;
>>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +
>>> +    s = mfn_x(page_to_mfn(pg));
>>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> +    if ( unlikely(!avail[nid]) )
>>> +    {
>>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>
>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
> 
> This is existing code and it is quite complex, so I would prefer to
> avoid simplifying and moving the code in the same patch. I would be
> happy to write a follow-up patch to switch to IS_ALIGNED().

I do realize it's code you move, but I can accept your desire to merely
move the code without any cleanup. Personally I think that rather than a
follow-up patch (which doesn't help the reviewing of this one) such an
adjustment would better be a prereq one.

>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>>       spin_unlock(&heap_lock);
>>>   
>>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>> -        idle_scrub = true;
>>> +        need_scrub = true;
>>>   
>>> -    for ( i = 0; i < nr_pages; i++ )
>>> +    for ( i = 0; i < nr_pages; )
>>>       {
>>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +        unsigned long left = nr_pages - i;
>>> +        unsigned long contig_pages;
>>>   
>>> -        if ( unlikely(!avail[nid]) )
>>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>>           {
>>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>>> -            unsigned long n;
>>> -
>>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>> -                               &use_tail);
>>> -            BUG_ON(i + n > nr_pages);
>>> -            if ( n && !use_tail )
>>> -            {
>>> -                i += n - 1;
>>> -                continue;
>>> -            }
>>> -            if ( i + n == nr_pages )
>>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>>                   break;
>>> -            nr_pages -= n;
>>>           }
>>
>> Isn't doing this page by page in a loop quite inefficient? Can't you
>> simply obtain the end of the node's range covering the first page, and
>> then adjust "left" accordingly?
> 
> The page-by-page walk is necessary because we may need to exclude some
> pages (see [1]) or the range may cross multiple zones (see [2]).

If you want/need to do this for "future" reasons (aiui [1] was never
committed, and you forgot to supply [2], but yes, splitting at zone
boundaries is of course necessary), then I think this wants making quite
clear in the description.

Jan

>> I even wonder whether the admittedly
>> lax original check's assumption couldn't be leveraged alternatively,
>> by effectively bisecting to the end address on the node of interest
>> (where the assumption is that nodes aren't interleaved - see Wei's
>> NUMA series dealing with precisely that situation).
> See above. We went this way because there are some pages to be excluded.
> The immediate use case is broken pages, but in the future we may need to
> also exclude pages that contain guest content after Live-Update.
> 
> I also plan to get rid of the loop in free_heap_pages() to mark each
> page free. This would mean that pages would only be accessed once in
> init_heap_pages() (I am still cleaning up that patch and it is a much
> more controversial change).
> 
> Cheers,
> 
> [1]
> https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@infradead.org/
> 
>>
>> Jan
> 
Hi, On 09/06/2022 14:12, Jan Beulich wrote: > On 09.06.2022 14:33, Julien Grall wrote: >> On 09/06/2022 13:09, Jan Beulich wrote: >>> On 09.06.2022 10:30, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> At the moment, init_heap_pages() will call free_heap_pages() page >>>> by page. To reduce the time to initialize the heap, we will want >>>> to provide multiple pages at the same time. >>>> >>>> init_heap_pages() is now split in two parts: >>>> - init_heap_pages(): will break down the range in multiple set >>>> of contiguous pages. For now, the criteria is the pages should >>>> belong to the same NUMA node. >>>> - init_contig_pages(): will initialize a set of contiguous pages. >>>> For now the pages are still passed one by one to free_heap_pages(). >>> >>> Hmm, the common use of "contiguous" is to describe address correlation. >>> Therefore I'm afraid I'd like to see "contig" avoided when you mean >>> "same node". Perhaps init_node_pages()? >> >> After the next patch, it will not only be the same node, It will also be >> the same zone at least. Also, in the future, I would like to >> re-submitting David Woodhouse patch to exclude broken pages (see [1]). >> >> Therefore, I think the name init_node_pages() would not be suitable. >> Please suggest a different name. > > _init_heap_pages() then, as a helper of init_heap_pages()? I am fine with your proposed named. That said... > >>>> --- a/xen/common/page_alloc.c >>>> +++ b/xen/common/page_alloc.c >>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status) >>>> } >>>> >>>> /* >>>> - * Hand the specified arbitrary page range to the specified heap zone >>>> - * checking the node_id of the previous page. If they differ and the >>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by >>>> - * not freeing it to the buddy allocator. >>>> + * init_contig_heap_pages() is intended to only take pages from the same >>>> + * NUMA node. >>>> */ >>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid) >>>> +{ >>>> + return (nid == (phys_to_nid(page_to_maddr(pg)))); >>>> +} >>> >>> If such a helper is indeed needed, then I think it absolutely wants >>> pg to be pointer-to-const. And imo it would also help readability if >>> the extra pair of parentheses around the nested function calls was >>> omitted. Given the naming concern, though, I wonder whether this >>> wouldn't better be open-coded in the one place it is used. Actually >>> naming is quite odd here beyond what I'd said further up: "Is this >>> page contiguous?" Such a question requires two pages, i.e. "Are these >>> two pages contiguous?" What you want to know is "Is this page on the >>> given node?" >> >> There will be more check in the future (see next patch). I created an >> helper because it reduces the size of the loop init_heap_pages(). I >> would be OK to fold it if you strongly prefer that. > > I don't "strongly" prefer that; I'd also be okay with a suitably named > helper. Just that I can't seem to be able to come up with any good name. ... I am not sure what could be a suitable name for this helper. I will have a look how bad the fold version look like. > >>>> +/* >>>> + * This function should only be called with valid pages from the same NUMA >>>> + * node. >>>> + * >>>> + * Callers should use is_contig_page() first to check if all the pages >>>> + * in a range are contiguous. 
>>>> + */ >>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages, >>>> + bool need_scrub) >>>> +{ >>>> + unsigned long s, e; >>>> + unsigned int nid = phys_to_nid(page_to_maddr(pg)); >>>> + >>>> + s = mfn_x(page_to_mfn(pg)); >>>> + e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1)); >>>> + if ( unlikely(!avail[nid]) ) >>>> + { >>>> + bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) && >>> >>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant? >> >> This is existing code and it is quite complex. So I would prefer if we >> avoid to simplify and move the code in the same patch. I would be happy >> to write a follow-up patch to switch to IS_ALIGNED(). > > I do realize it's code you move, but I can accept your desire to merely > move the code without any cleanup. Personally I think that rather than a > follow-up patch (which doesn't help the reviewing of this one) such an > adjustment would better be a prereq one. I will look for a prereq. > >>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages( >>>> spin_unlock(&heap_lock); >>>> >>>> if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE ) >>>> - idle_scrub = true; >>>> + need_scrub = true; >>>> >>>> - for ( i = 0; i < nr_pages; i++ ) >>>> + for ( i = 0; i < nr_pages; ) >>>> { >>>> - unsigned int nid = phys_to_nid(page_to_maddr(pg+i)); >>>> + unsigned int nid = phys_to_nid(page_to_maddr(pg)); >>>> + unsigned long left = nr_pages - i; >>>> + unsigned long contig_pages; >>>> >>>> - if ( unlikely(!avail[nid]) ) >>>> + for ( contig_pages = 1; contig_pages < left; contig_pages++ ) >>>> { >>>> - unsigned long s = mfn_x(page_to_mfn(pg + i)); >>>> - unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1)); >>>> - bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) && >>>> - !(s & ((1UL << MAX_ORDER) - 1)) && >>>> - (find_first_set_bit(e) <= find_first_set_bit(s)); >>>> - unsigned long n; >>>> - >>>> - n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i, >>>> - &use_tail); >>>> - BUG_ON(i + n > nr_pages); >>>> - if ( n && !use_tail ) >>>> - { >>>> - i += n - 1; >>>> - continue; >>>> - } >>>> - if ( i + n == nr_pages ) >>>> + if ( !is_contig_page(pg + contig_pages, nid) ) >>>> break; >>>> - nr_pages -= n; >>>> } >>> >>> Isn't doing this page by page in a loop quite inefficient? Can't you >>> simply obtain the end of the node's range covering the first page, and >>> then adjust "left" accordingly? >> >> The page by page is necessary because we may need to exclude some pages >> (see [1]) or the range may cross multiple-zone (see [2]). > > If you want/need to do this for "future" reasons (aiui [1] was never > committed You are correct. I would like to revive it at some point. , and you forgot to supply [2], but yes, splitting at zone > boundaries is of course necessary) Sorry. I was meant to write patch #2: 20220609083039.76667-3-julien@xen.org > , then I think this wants making quite > clear in the description. Good point. I will update the commit message. Cheers,
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 3e6504283f1e..a1938df1406c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
 }
 
 /*
- * Hand the specified arbitrary page range to the specified heap zone
- * checking the node_id of the previous page. If they differ and the
- * latter is not on a MAX_ORDER boundary, then we reserve the page by
- * not freeing it to the buddy allocator.
+ * init_contig_heap_pages() is intended to only take pages from the same
+ * NUMA node.
  */
+static bool is_contig_page(struct page_info *pg, unsigned int nid)
+{
+    return (nid == (phys_to_nid(page_to_maddr(pg))));
+}
+
+/*
+ * This function should only be called with valid pages from the same NUMA
+ * node.
+ *
+ * Callers should use is_contig_page() first to check if all the pages
+ * in a range are contiguous.
+ */
+static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
+                                   bool need_scrub)
+{
+    unsigned long s, e;
+    unsigned int nid = phys_to_nid(page_to_maddr(pg));
+
+    s = mfn_x(page_to_mfn(pg));
+    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
+    if ( unlikely(!avail[nid]) )
+    {
+        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
+                        (find_first_set_bit(e) <= find_first_set_bit(s));
+        unsigned long n;
+
+        n = init_node_heap(nid, s, nr_pages, &use_tail);
+        BUG_ON(n > nr_pages);
+        if ( use_tail )
+            e -= n;
+        else
+            s += n;
+    }
+
+    while ( s < e )
+    {
+        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
+        s += 1UL;
+    }
+}
+
 static void init_heap_pages(
     struct page_info *pg, unsigned long nr_pages)
 {
     unsigned long i;
-    bool idle_scrub = false;
+    bool need_scrub = scrub_debug;
 
     /*
      * Keep MFN 0 away from the buddy allocator to avoid crossing zone
@@ -1812,35 +1851,24 @@ static void init_heap_pages(
     spin_unlock(&heap_lock);
 
     if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
-        idle_scrub = true;
+        need_scrub = true;
 
-    for ( i = 0; i < nr_pages; i++ )
+    for ( i = 0; i < nr_pages; )
     {
-        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
+        unsigned int nid = phys_to_nid(page_to_maddr(pg));
+        unsigned long left = nr_pages - i;
+        unsigned long contig_pages;
 
-        if ( unlikely(!avail[nid]) )
+        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
         {
-            unsigned long s = mfn_x(page_to_mfn(pg + i));
-            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
-            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
-                            !(s & ((1UL << MAX_ORDER) - 1)) &&
-                            (find_first_set_bit(e) <= find_first_set_bit(s));
-            unsigned long n;
-
-            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
-                               &use_tail);
-            BUG_ON(i + n > nr_pages);
-            if ( n && !use_tail )
-            {
-                i += n - 1;
-                continue;
-            }
-            if ( i + n == nr_pages )
+            if ( !is_contig_page(pg + contig_pages, nid) )
                 break;
-            nr_pages -= n;
         }
 
-        free_heap_pages(pg + i, 0, scrub_debug || idle_scrub);
+        init_contig_heap_pages(pg, contig_pages, need_scrub);
+
+        pg += contig_pages;
+        i += contig_pages;
     }
 }
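Pulling the review outcomes together, the helper might end up looking like the
sketch below in a later revision: renamed to _init_heap_pages() as discussed
in the thread, with a const parameter, IS_ALIGNED() (as a prereq change), and
s++. This is an assumption about a possible future version, not the patch as
posted.

    /*
     * Sketch of a possible next revision (assumed, not posted): all pages
     * must be valid and on the same NUMA node.
     */
    static void _init_heap_pages(const struct page_info *pg,
                                 unsigned long nr_pages,
                                 bool need_scrub)
    {
        unsigned long s = mfn_x(page_to_mfn(pg));
        unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
        unsigned int nid = phys_to_nid(page_to_maddr(pg));

        if ( unlikely(!avail[nid]) )
        {
            bool use_tail = IS_ALIGNED(s, 1UL << MAX_ORDER) &&
                            (find_first_set_bit(e) <= find_first_set_bit(s));
            unsigned long n = init_node_heap(nid, s, nr_pages, &use_tail);

            BUG_ON(n > nr_pages);
            if ( use_tail )
                e -= n;
            else
                s += n;
        }

        while ( s < e )
            free_heap_pages(mfn_to_page(_mfn(s++)), 0, need_scrub);
    }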