Message ID | 20220609083039.76667-3-julien@xen.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen/mm: Optimize init_heap_pages() | expand |
On 09.06.2022 10:30, Julien Grall wrote: > From: Hongyan Xia <hongyxia@amazon.com> > > The idea is to split the range into multiple aligned power-of-2 regions > which only needs to call free_heap_pages() once each. We check the least > significant set bit of the start address and use its bit index as the > order of this increment. This makes sure that each increment is both > power-of-2 and properly aligned, which can be safely passed to > free_heap_pages(). Of course, the order also needs to be sanity checked > against the upper bound and MAX_ORDER. > > Testing on a nested environment on c5.metal with various amount > of RAM. Time for end_boot_allocator() to complete: > Before After > - 90GB: 1426 ms 166 ms > - 8GB: 124 ms 12 ms > - 4GB: 60 ms 6 ms > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com> > Signed-off-by: Julien Grall <jgrall@amazon.com> > --- > xen/common/page_alloc.c | 39 +++++++++++++++++++++++++++++++++------ > 1 file changed, 33 insertions(+), 6 deletions(-) > > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c > index a1938df1406c..bf852cfc11ea 100644 > --- a/xen/common/page_alloc.c > +++ b/xen/common/page_alloc.c > @@ -1779,16 +1779,28 @@ int query_page_offline(mfn_t mfn, uint32_t *status) > > /* > * init_contig_heap_pages() is intended to only take pages from the same > - * NUMA node. > + * NUMA node and zone. > + * > + * For the latter, it is always true for !CONFIG_SEPARATE_XENHEAP since > + * free_heap_pages() can only take power-of-two ranges which never cross > + * zone boundaries. But for separate xenheap which is manually defined, > + * it is possible for a power-of-two range to cross zones, so we need to > + * check that as well. > */ > -static bool is_contig_page(struct page_info *pg, unsigned int nid) > +static bool is_contig_page(struct page_info *pg, unsigned int nid, > + unsigned int zone) > { > +#ifdef CONFIG_SEPARATE_XENHEAP > + if ( zone != page_to_zone(pg) ) > + return false; > +#endif > + > return (nid == (phys_to_nid(page_to_maddr(pg)))); > } > > /* > * This function should only be called with valid pages from the same NUMA > - * node. > + * node and the same zone. > * > * Callers should use is_contig_page() first to check if all the pages > * in a range are contiguous. > @@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages, > > while ( s < e ) > { > - free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub); > - s += 1UL; > + /* > + * For s == 0, we simply use the largest increment by checking the > + * index of the MSB set. For s != 0, we also need to ensure that the "The MSB" reads as it it was not in line with the code; at least I would, short of it being spelled out, assume it can only be the page's address which is meant (as is the case for LSB below). But it's the MSB of the range's size. > + * chunk is properly sized to end at power-of-two alignment. We do this > + * by checking the LSB set and use its index as the increment. Both > + * cases need to be guarded by MAX_ORDER. > + * > + * Note that the value of ffsl() and flsl() starts from 1 so we need > + * to decrement it by 1. > + */ > + int inc_order = min(MAX_ORDER, flsl(e - s) - 1); > + > + if ( s ) > + inc_order = min(inc_order, ffsl(s) - 1); > + free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub); > + s += (1UL << inc_order); > } > } > > @@ -1856,12 +1882,13 @@ static void init_heap_pages( > for ( i = 0; i < nr_pages; ) > { > unsigned int nid = phys_to_nid(page_to_maddr(pg)); > + unsigned int zone = page_to_zone(pg); > unsigned long left = nr_pages - i; > unsigned long contig_pages; > > for ( contig_pages = 1; contig_pages < left; contig_pages++ ) > { > - if ( !is_contig_page(pg + contig_pages, nid) ) > + if ( !is_contig_page(pg + contig_pages, nid, zone) ) > break; > } Coming back to your reply to my comment on patch 1: I think this addition of the node check is actually an argument for inlining the function's body here (and then dropping the function). That way the separate-Xen-heap aspect is visible at the place where it matters, rather than requiring an indirection via looking at the helper function (and leaving a dead parameter in the opposite case). But as said - I'm not going to insist as long as the helper function has a suitable name (representing what it does and not misguiding anyone with the common "contig"-means-addresses implication). Jan
Hi Julien, > On 9 Jun 2022, at 09:30, Julien Grall <julien@xen.org> wrote: > > From: Hongyan Xia <hongyxia@amazon.com> > > The idea is to split the range into multiple aligned power-of-2 regions > which only needs to call free_heap_pages() once each. We check the least > significant set bit of the start address and use its bit index as the > order of this increment. This makes sure that each increment is both > power-of-2 and properly aligned, which can be safely passed to > free_heap_pages(). Of course, the order also needs to be sanity checked > against the upper bound and MAX_ORDER. > > Testing on a nested environment on c5.metal with various amount > of RAM. Time for end_boot_allocator() to complete: > Before After > - 90GB: 1426 ms 166 ms > - 8GB: 124 ms 12 ms > - 4GB: 60 ms 6 ms On a arm64 Neoverse N1 system with 32GB of Ram I have: - 1180 ms before - 63 ms after and my internal tests are passing on arm64. Great optimisation :-) (I will do a full review of code the in a second step). > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com> > Signed-off-by: Julien Grall <jgrall@amazon.com> Cheers Bertrand
On 28/06/2022 15:40, Bertrand Marquis wrote: > Hi Julien, Hi Bertrand, >> On 9 Jun 2022, at 09:30, Julien Grall <julien@xen.org> wrote: >> >> From: Hongyan Xia <hongyxia@amazon.com> >> >> The idea is to split the range into multiple aligned power-of-2 regions >> which only needs to call free_heap_pages() once each. We check the least >> significant set bit of the start address and use its bit index as the >> order of this increment. This makes sure that each increment is both >> power-of-2 and properly aligned, which can be safely passed to >> free_heap_pages(). Of course, the order also needs to be sanity checked >> against the upper bound and MAX_ORDER. >> >> Testing on a nested environment on c5.metal with various amount >> of RAM. Time for end_boot_allocator() to complete: >> Before After >> - 90GB: 1426 ms 166 ms >> - 8GB: 124 ms 12 ms >> - 4GB: 60 ms 6 ms > > > On a arm64 Neoverse N1 system with 32GB of Ram I have: > - 1180 ms before > - 63 ms after > > and my internal tests are passing on arm64. Thanks for the testing! The number are a lot better than I was actually expecting on arm64. > > Great optimisation :-) You will have to thanks Hongyan. He came up with the idea :). > > (I will do a full review of code the in a second step). I am planning to send a new version in the next few days. So you may want to wait before reviewing the series. Cheers,
Hi Jan, On 09/06/2022 14:22, Jan Beulich wrote: >> /* >> * This function should only be called with valid pages from the same NUMA >> - * node. >> + * node and the same zone. >> * >> * Callers should use is_contig_page() first to check if all the pages >> * in a range are contiguous. >> @@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages, >> >> while ( s < e ) >> { >> - free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub); >> - s += 1UL; >> + /* >> + * For s == 0, we simply use the largest increment by checking the >> + * index of the MSB set. For s != 0, we also need to ensure that the > > "The MSB" reads as it it was not in line with the code; at least I would, > short of it being spelled out, assume it can only be the page's address > which is meant (as is the case for LSB below). But it's the MSB of the > range's size. Indeed. I have reworded the comment to: /* * For s == 0, we simply use the largest increment by checking the * MSB of the region size. For s != 0, we also need to ensure that the * chunk is properly sized to end at power-of-two alignment. We do this * by checking the LSB of the start address and use its index as * the increment. Both cases need to be guarded by MAX_ORDER. * * Note that the value of ffsl() and flsl() starts from 1 so we need * to decrement it by 1. */ > >> + * chunk is properly sized to end at power-of-two alignment. We do this >> + * by checking the LSB set and use its index as the increment. Both >> + * cases need to be guarded by MAX_ORDER. >> + * >> + * Note that the value of ffsl() and flsl() starts from 1 so we need >> + * to decrement it by 1. >> + */ >> + int inc_order = min(MAX_ORDER, flsl(e - s) - 1); >> + >> + if ( s ) >> + inc_order = min(inc_order, ffsl(s) - 1); >> + free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub); >> + s += (1UL << inc_order); >> } >> } >> >> @@ -1856,12 +1882,13 @@ static void init_heap_pages( >> for ( i = 0; i < nr_pages; ) >> { >> unsigned int nid = phys_to_nid(page_to_maddr(pg)); >> + unsigned int zone = page_to_zone(pg); >> unsigned long left = nr_pages - i; >> unsigned long contig_pages; >> >> for ( contig_pages = 1; contig_pages < left; contig_pages++ ) >> { >> - if ( !is_contig_page(pg + contig_pages, nid) ) >> + if ( !is_contig_page(pg + contig_pages, nid, zone) ) >> break; >> } > > Coming back to your reply to my comment on patch 1: I think this > addition of the node check is actually an argument for inlining the > function's body here (and then dropping the function). That way the > separate-Xen-heap aspect is visible at the place where it matters, > rather than requiring an indirection via looking at the helper > function (and leaving a dead parameter in the opposite case). But as > said - I'm not going to insist as long as the helper function has a > suitable name (representing what it does and not misguiding anyone > with the common "contig"-means-addresses implication). I have followed your suggestion in patch #1: * is_contig_page() is now inlined * init_contig_heap_pages() was renamed to _init_heap_pages() Cheers,
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index a1938df1406c..bf852cfc11ea 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -1779,16 +1779,28 @@ int query_page_offline(mfn_t mfn, uint32_t *status) /* * init_contig_heap_pages() is intended to only take pages from the same - * NUMA node. + * NUMA node and zone. + * + * For the latter, it is always true for !CONFIG_SEPARATE_XENHEAP since + * free_heap_pages() can only take power-of-two ranges which never cross + * zone boundaries. But for separate xenheap which is manually defined, + * it is possible for a power-of-two range to cross zones, so we need to + * check that as well. */ -static bool is_contig_page(struct page_info *pg, unsigned int nid) +static bool is_contig_page(struct page_info *pg, unsigned int nid, + unsigned int zone) { +#ifdef CONFIG_SEPARATE_XENHEAP + if ( zone != page_to_zone(pg) ) + return false; +#endif + return (nid == (phys_to_nid(page_to_maddr(pg)))); } /* * This function should only be called with valid pages from the same NUMA - * node. + * node and the same zone. * * Callers should use is_contig_page() first to check if all the pages * in a range are contiguous. @@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages, while ( s < e ) { - free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub); - s += 1UL; + /* + * For s == 0, we simply use the largest increment by checking the + * index of the MSB set. For s != 0, we also need to ensure that the + * chunk is properly sized to end at power-of-two alignment. We do this + * by checking the LSB set and use its index as the increment. Both + * cases need to be guarded by MAX_ORDER. + * + * Note that the value of ffsl() and flsl() starts from 1 so we need + * to decrement it by 1. + */ + int inc_order = min(MAX_ORDER, flsl(e - s) - 1); + + if ( s ) + inc_order = min(inc_order, ffsl(s) - 1); + free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub); + s += (1UL << inc_order); } } @@ -1856,12 +1882,13 @@ static void init_heap_pages( for ( i = 0; i < nr_pages; ) { unsigned int nid = phys_to_nid(page_to_maddr(pg)); + unsigned int zone = page_to_zone(pg); unsigned long left = nr_pages - i; unsigned long contig_pages; for ( contig_pages = 1; contig_pages < left; contig_pages++ ) { - if ( !is_contig_page(pg + contig_pages, nid) ) + if ( !is_contig_page(pg + contig_pages, nid, zone) ) break; }