Message ID | 20240702020931.7061-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm/page_alloc: remove prefetchw() on freeing page to buddy system |
On Tue, 2 Jul 2024 02:09:31 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> The prefetchw() is introduced from an ancient patch[1].
>
> The change log says:
>
> The basic idea is to free higher order pages instead of going
> through every single one. Also, some unnecessary atomic operations
> are done away with and replaced with non-atomic equivalents, and
> prefetching is done where it helps the most. For a more in-depth
> discusion of this patch, please see the linux-ia64 archives (topic
> is "free bootmem feedback patch").
>
> So there are several changes improve the bootmem freeing, in which the
> most basic idea is freeing higher order pages. And as Matthew says,
> "Itanium CPUs of this era had no prefetchers."
>
> I did 10 round bootup tests before and after this change, the data
> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> faster than before.

I don't think I've ever seen prefetch make a damn bit of difference.

> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> CC: David Hildenbrand <david@redhat.com>
>
> ---
> The patch is based on mm-stable with David's change.

Oh help. David makes many changes. Please identify patches with much
care. Fully quoting the email title works, as does a link.
On 02.07.24 04:09, Wei Yang wrote:
> The prefetchw() is introduced from an ancient patch[1].
> 
> The change log says:
> 
> The basic idea is to free higher order pages instead of going
> through every single one. Also, some unnecessary atomic operations
> are done away with and replaced with non-atomic equivalents, and
> prefetching is done where it helps the most. For a more in-depth
> discusion of this patch, please see the linux-ia64 archives (topic
> is "free bootmem feedback patch").
> 
> So there are several changes improve the bootmem freeing, in which the
> most basic idea is freeing higher order pages. And as Matthew says,
> "Itanium CPUs of this era had no prefetchers."
> 
> I did 10 round bootup tests before and after this change, the data
> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> faster than before.

I suspect this is noise, though.

> 
> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> CC: David Hildenbrand <david@redhat.com>
> 
> ---
> The patch is based on mm-stable with David's change.
> ---
>   mm/page_alloc.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 116ee33fd1ce..c46aedfc9a12 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>           */
>          if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>              unlikely(context == MEMINIT_HOTPLUG)) {
> -                prefetchw(p);
> -                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> -                        prefetchw(p + 1);
> +                for (loop = 0; loop < nr_pages; loop++, p++) {
>                          VM_WARN_ON_ONCE(PageReserved(p));
>                          __ClearPageOffline(p);
>                          set_page_count(p, 0);
>                  }

Something like:

for (;;) {
        ...
        if (++loop >= nr_pages)
                break;
        p++;
}

Might generate slightly better code, because we know that we execute the loop
body at least once. We use that in set_ptes(), for example.

> -                VM_WARN_ON_ONCE(PageReserved(p));
> -                __ClearPageOffline(p);
> -                set_page_count(p, 0);
>                  /*
>                   * Freeing the page with debug_pagealloc enabled will try to
> @@ -1255,14 +1250,10 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>                  debug_pagealloc_map_pages(page, nr_pages);
>                  adjust_managed_page_count(page, nr_pages);
>          } else {
> -                prefetchw(p);
> -                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> -                        prefetchw(p + 1);
> +                for (loop = 0; loop < nr_pages; loop++, p++) {
>                          __ClearPageReserved(p);
>                          set_page_count(p, 0);
>                  }
> -                __ClearPageReserved(p);
> -                set_page_count(p, 0);
>                  /* memblock adjusts totalram_pages() manually. */
>                  atomic_long_add(nr_pages, &page_zone(page)->managed_pages);

Much better

Reviewed-by: David Hildenbrand <david@redhat.com>
On Mon, Jul 01, 2024 at 11:22:24PM -0700, Andrew Morton wrote:
>On Tue, 2 Jul 2024 02:09:31 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> The prefetchw() is introduced from an ancient patch[1].
>>
>> The change log says:
>>
>> The basic idea is to free higher order pages instead of going
>> through every single one. Also, some unnecessary atomic operations
>> are done away with and replaced with non-atomic equivalents, and
>> prefetching is done where it helps the most. For a more in-depth
>> discusion of this patch, please see the linux-ia64 archives (topic
>> is "free bootmem feedback patch").
>>
>> So there are several changes improve the bootmem freeing, in which the
>> most basic idea is freeing higher order pages. And as Matthew says,
>> "Itanium CPUs of this era had no prefetchers."
>>
>> I did 10 round bootup tests before and after this change, the data
>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>> faster than before.
>
>I don't think I've ever seen prefetch make a damn bit of difference.
>
>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> CC: David Hildenbrand <david@redhat.com>
>>
>> ---
>> The patch is based on mm-stable with David's change.
>
>Oh help. David makes many changes. Please identify patches with much
>care. Fully quoting the email title works, as does a link.
>

The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04

mm: pass meminit_context to __free_pages_core()

The link is, if I am correct.

https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/

BTW, how we track the mail link? Just search in lore.kernel.org?
On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
>On 02.07.24 04:09, Wei Yang wrote:
>> The prefetchw() is introduced from an ancient patch[1].
>>
>> The change log says:
>>
>> The basic idea is to free higher order pages instead of going
>> through every single one. Also, some unnecessary atomic operations
>> are done away with and replaced with non-atomic equivalents, and
>> prefetching is done where it helps the most. For a more in-depth
>> discusion of this patch, please see the linux-ia64 archives (topic
>> is "free bootmem feedback patch").
>>
>> So there are several changes improve the bootmem freeing, in which the
>> most basic idea is freeing higher order pages. And as Matthew says,
>> "Itanium CPUs of this era had no prefetchers."
>>
>> I did 10 round bootup tests before and after this change, the data
>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>> faster than before.
>
>I suspect this is noise, though.
>
>>
>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> CC: David Hildenbrand <david@redhat.com>
>>
>> ---
>> The patch is based on mm-stable with David's change.
>> ---
>>  mm/page_alloc.c | 13 ++-----------
>>  1 file changed, 2 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 116ee33fd1ce..c46aedfc9a12 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>           */
>>          if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>>              unlikely(context == MEMINIT_HOTPLUG)) {
>> -                prefetchw(p);
>> -                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>> -                        prefetchw(p + 1);
>> +                for (loop = 0; loop < nr_pages; loop++, p++) {
>>                          VM_WARN_ON_ONCE(PageReserved(p));
>>                          __ClearPageOffline(p);
>>                          set_page_count(p, 0);
>>                  }
>
>Something like:
>
>for (;;) {
>        ...
>        if (++loop >= nr_pages)
>                break;
>        p++;
>}
>

So you prefer to have another version with this format? Sth like this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c46aedfc9a12..5235015eba3d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1224,7 +1224,7 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
 {
         unsigned int nr_pages = 1 << order;
         struct page *p = page;
-        unsigned int loop;
+        unsigned int loop = 0;

         /*
          * When initializing the memmap, __init_single_page() sets the refcount
@@ -1236,10 +1236,13 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
          */
         if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
             unlikely(context == MEMINIT_HOTPLUG)) {
-                for (loop = 0; loop < nr_pages; loop++, p++) {
+                for (;;) {
                         VM_WARN_ON_ONCE(PageReserved(p));
                         __ClearPageOffline(p);
                         set_page_count(p, 0);
+                        if (++loop >= nr_pages)
+                                break;
+                        p++;
                 }

                 /*
@@ -1250,9 +1253,12 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
                 debug_pagealloc_map_pages(page, nr_pages);
                 adjust_managed_page_count(page, nr_pages);
         } else {
-                for (loop = 0; loop < nr_pages; loop++, p++) {
+                for (;;) {
                         __ClearPageReserved(p);
                         set_page_count(p, 0);
+                        if (++loop >= nr_pages)
+                                break;
+                        p++;
                 }

                 /* memblock adjusts totalram_pages() manually. */

>
>Might generate slightly better code, because we know that we execute the loop
>body at least once. We use that in set_ptes(), for example.
>
>> -                VM_WARN_ON_ONCE(PageReserved(p));
>> -                __ClearPageOffline(p);
>> -                set_page_count(p, 0);
>>                  /*
>>                   * Freeing the page with debug_pagealloc enabled will try to
>> @@ -1255,14 +1250,10 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>                  debug_pagealloc_map_pages(page, nr_pages);
>>                  adjust_managed_page_count(page, nr_pages);
>>          } else {
>> -                prefetchw(p);
>> -                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>> -                        prefetchw(p + 1);
>> +                for (loop = 0; loop < nr_pages; loop++, p++) {
>>                          __ClearPageReserved(p);
>>                          set_page_count(p, 0);
>>                  }
>> -                __ClearPageReserved(p);
>> -                set_page_count(p, 0);
>>                  /* memblock adjusts totalram_pages() manually. */
>>                  atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
>
>Much better
>
>Reviewed-by: David Hildenbrand <david@redhat.com>
>
>--
>Cheers,
>
>David / dhildenb
On Wed, 3 Jul 2024 00:01:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> >> CC: David Hildenbrand <david@redhat.com>
> >>
> >> ---
> >> The patch is based on mm-stable with David's change.
> >
> >Oh help. David makes many changes. Please identify patches with much
> >care. Fully quoting the email title works, as does a link.
> >
> 
> The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04
> 
> mm: pass meminit_context to __free_pages_core()

OK, that's in mm-stable.

> The link is, if I am correct.
> https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/
> 
> BTW, how we track the mail link? Just search in lore.kernel.org?

I'm not sure what you mean? I extract the message-id from the email
and concat it with "https://lkml.kernel.org/r".
On Tue, Jul 02, 2024 at 05:49:39PM -0700, Andrew Morton wrote:
>On Wed, 3 Jul 2024 00:01:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> >> CC: David Hildenbrand <david@redhat.com>
>> >>
>> >> ---
>> >> The patch is based on mm-stable with David's change.
>> >
>> >Oh help. David makes many changes. Please identify patches with much
>> >care. Fully quoting the email title works, as does a link.
>> >
>>
>> The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04
>>
>> mm: pass meminit_context to __free_pages_core()
>
>OK, that's in mm-stable.
>
>> The link is, if I am correct.
>> https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/
>>
>> BTW, how we track the mail link? Just search in lore.kernel.org?
>
>I'm not sure what you mean? I extract the message-id from the email
>and concat it with "https://lkml.kernel.org/r".

Thanks, I got something new :-)
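As an aside for readers following the link discussion above: the recipe Andrew describes is mechanical (take the Message-ID header value, drop the angle brackets, prepend the redirector base). A minimal, hypothetical C sketch of it is below; the helper name and buffer handling are illustrative assumptions, not code from the thread or from any kernel tool.

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a raw "Message-ID: <...>" header value into
 * an https://lkml.kernel.org/r/ link by stripping the angle brackets.
 */
static void msgid_to_url(const char *msgid, char *url, size_t len)
{
        size_t n = strlen(msgid);

        if (n >= 2 && msgid[0] == '<' && msgid[n - 1] == '>')
                snprintf(url, len, "https://lkml.kernel.org/r/%.*s",
                         (int)(n - 2), msgid + 1);
        else
                snprintf(url, len, "https://lkml.kernel.org/r/%s", msgid);
}

int main(void)
{
        char url[256];

        msgid_to_url("<20240607090939.89524-2-david@redhat.com>", url, sizeof(url));
        /* prints https://lkml.kernel.org/r/20240607090939.89524-2-david@redhat.com */
        printf("%s\n", url);
        return 0;
}

The lkml.kernel.org/r/ address is simply a redirector to the lore.kernel.org archive, which is why either form of link identifies the same mail.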
On 03.07.24 02:12, Wei Yang wrote:
> On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
>> On 02.07.24 04:09, Wei Yang wrote:
>>> The prefetchw() is introduced from an ancient patch[1].
>>>
>>> The change log says:
>>>
>>> The basic idea is to free higher order pages instead of going
>>> through every single one. Also, some unnecessary atomic operations
>>> are done away with and replaced with non-atomic equivalents, and
>>> prefetching is done where it helps the most. For a more in-depth
>>> discusion of this patch, please see the linux-ia64 archives (topic
>>> is "free bootmem feedback patch").
>>>
>>> So there are several changes improve the bootmem freeing, in which the
>>> most basic idea is freeing higher order pages. And as Matthew says,
>>> "Itanium CPUs of this era had no prefetchers."
>>>
>>> I did 10 round bootup tests before and after this change, the data
>>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>>> faster than before.
>>
>> I suspect this is noise, though.
>>
>>>
>>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>>> CC: David Hildenbrand <david@redhat.com>
>>>
>>> ---
>>> The patch is based on mm-stable with David's change.
>>> ---
>>>  mm/page_alloc.c | 13 ++-----------
>>>  1 file changed, 2 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 116ee33fd1ce..c46aedfc9a12 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>>           */
>>>          if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>>>              unlikely(context == MEMINIT_HOTPLUG)) {
>>> -                prefetchw(p);
>>> -                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>>> -                        prefetchw(p + 1);
>>> +                for (loop = 0; loop < nr_pages; loop++, p++) {
>>>                          VM_WARN_ON_ONCE(PageReserved(p));
>>>                          __ClearPageOffline(p);
>>>                          set_page_count(p, 0);
>>>                  }
>>
>> Something like:
>>
>> for (;;) {
>>         ...
>>         if (++loop >= nr_pages)
>>                 break;
>>         p++;
>> }
>>
> 
> So you prefer to have another version with this format? Sth like this?

Whatever you prefer, just pointing it out that we can do slightly better :)
On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
> > I did 10 round bootup tests before and after this change, the data
> > doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> > the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> > faster than before.
> 
> I suspect this is noise, though.

I think it's real, though small. Each prefetchw() is an instruction,
and if we can avoid issuing an instruction, we should.

> Something like:
> 
> for (;;) {
>         ...
>         if (++loop >= nr_pages)
>                 break;
>         p++;
> }
> 
> 
> Might generate slightly better code, because we know that we execute the
> loop body at least once. We use that in set_ptes(), for example.

I don't think it's worth doing. Keep the loop simple and obvious.
set_ptes() is different because we actually expect to execute the loop
exactly once (ie most folios are small). So two compares per call to
set_ptes() instead of one makes a difference. Here, we're expecting
to execute this loop, what, a million times? Doing a million-and-one
compares instead of a million makes no observable difference.

I would like to see v2 of this patch dropped, please Andrew.
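For reference, the two loop shapes being debated are shown in the standalone, hypothetical sketch below (generic stand-in code, not the page_alloc loops themselves): a top-tested for loop evaluates its condition n + 1 times for n iterations, while the bottom-tested form evaluates it n times but is only valid when the body is known to run at least once.

#include <stddef.h>

/* Illustrative stand-in for the per-page work. */
static void touch(int *p)
{
        *p = 0;
}

/*
 * Top-tested: the condition is checked n + 1 times (once per iteration
 * plus the final failing check). Simple and obviously correct for any n.
 */
static void loop_top_tested(int *p, size_t n)
{
        size_t i;

        for (i = 0; i < n; i++, p++)
                touch(p);
}

/*
 * Bottom-tested: the condition is checked only n times, saving one
 * compare, but it is correct only when n is known to be at least 1.
 */
static void loop_bottom_tested(int *p, size_t n)
{
        size_t i = 0;

        for (;;) {
                touch(p);
                if (++i >= n)
                        break;
                p++;
        }
}

int main(void)
{
        int a[8], b[8];

        loop_top_tested(a, 8);
        loop_bottom_tested(b, 8);
        return a[7] + b[7];     /* both arrays are fully zeroed */
}

The per-iteration work is identical in both forms; the disagreement above is only about whether the single saved compare justifies the less obvious loop.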
On Thu, 4 Jul 2024 04:30:50 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> > Something like:
> > 
> > for (;;) {
> >         ...
> >         if (++loop >= nr_pages)
> >                 break;
> >         p++;
> > }
> > 
> > 
> > Might generate slightly better code, because we know that we execute the
> > loop body at least once. We use that in set_ptes(), for example.
> 
> I don't think it's worth doing. Keep the loop simple and obvious.
> set_ptes() is different because we actually expect to execute the loop
> exactly once (ie most folios are small). So two compares per call to
> set_ptes() instead of one makes a difference. Here, we're expecting
> to execute this loop, what, a million times? Doing a million-and-one
> compares instead of a million makes no observable difference.
> 
> I would like to see v2 of this patch dropped, please Andrew.

thud.

Are you supportive of v1?

--- a/mm/page_alloc.c~mm-page_alloc-remove-prefetchw-on-freeing-page-to-buddy-system
+++ a/mm/page_alloc.c
@@ -1236,16 +1236,11 @@ void __free_pages_core(struct page *page
          */
         if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
             unlikely(context == MEMINIT_HOTPLUG)) {
-                prefetchw(p);
-                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-                        prefetchw(p + 1);
+                for (loop = 0; loop < nr_pages; loop++, p++) {
                         VM_WARN_ON_ONCE(PageReserved(p));
                         __ClearPageOffline(p);
                         set_page_count(p, 0);
                 }
-                VM_WARN_ON_ONCE(PageReserved(p));
-                __ClearPageOffline(p);
-                set_page_count(p, 0);

                 /*
                  * Freeing the page with debug_pagealloc enabled will try to
@@ -1255,14 +1250,10 @@ void __free_pages_core(struct page *page
                 debug_pagealloc_map_pages(page, nr_pages);
                 adjust_managed_page_count(page, nr_pages);
         } else {
-                prefetchw(p);
-                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-                        prefetchw(p + 1);
+                for (loop = 0; loop < nr_pages; loop++, p++) {
                         __ClearPageReserved(p);
                         set_page_count(p, 0);
                 }
-                __ClearPageReserved(p);
-                set_page_count(p, 0);

                 /* memblock adjusts totalram_pages() manually. */
                 atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
On Wed, Jul 03, 2024 at 08:37:02PM -0700, Andrew Morton wrote:
> Are you supportive of v1?
Oh yes. I think it has my suggested-by on it, so I didn't see the
need to add an R-b.
On Thu, 4 Jul 2024 04:39:42 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> On Wed, Jul 03, 2024 at 08:37:02PM -0700, Andrew Morton wrote:
> > Are you supportive of v1?
> 
> Oh yes. I think it has my suggested-by on it, so I didn't see the
> need to add an R-b.

I'm a sucker for r-bys. Added, thanks.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 116ee33fd1ce..c46aedfc9a12 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
          */
         if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
             unlikely(context == MEMINIT_HOTPLUG)) {
-                prefetchw(p);
-                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-                        prefetchw(p + 1);
+                for (loop = 0; loop < nr_pages; loop++, p++) {
                         VM_WARN_ON_ONCE(PageReserved(p));
                         __ClearPageOffline(p);
                         set_page_count(p, 0);
                 }
-                VM_WARN_ON_ONCE(PageReserved(p));
-                __ClearPageOffline(p);
-                set_page_count(p, 0);

                 /*
                  * Freeing the page with debug_pagealloc enabled will try to
@@ -1255,14 +1250,10 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
                 debug_pagealloc_map_pages(page, nr_pages);
                 adjust_managed_page_count(page, nr_pages);
         } else {
-                prefetchw(p);
-                for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-                        prefetchw(p + 1);
+                for (loop = 0; loop < nr_pages; loop++, p++) {
                         __ClearPageReserved(p);
                         set_page_count(p, 0);
                 }
-                __ClearPageReserved(p);
-                set_page_count(p, 0);

                 /* memblock adjusts totalram_pages() manually. */
                 atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
The prefetchw() was introduced by an ancient patch[1].

The change log says:

The basic idea is to free higher order pages instead of going
through every single one. Also, some unnecessary atomic operations
are done away with and replaced with non-atomic equivalents, and
prefetching is done where it helps the most. For a more in-depth
discusion of this patch, please see the linux-ia64 archives (topic
is "free bootmem feedback patch").

So there are several changes that improve the bootmem freeing, of which
the most basic idea is freeing higher order pages. And as Matthew says,
"Itanium CPUs of this era had no prefetchers."

I did 10 rounds of bootup tests before and after this change, and the
data doesn't show that prefetchw() helps speed up bootmem freeing. The
sum of the 10 rounds of bootmem freeing time after the prefetchw()
removal is even 5.2% faster than before.

[1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
CC: David Hildenbrand <david@redhat.com>

---
The patch is based on mm-stable with David's change.
---
 mm/page_alloc.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)
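For context on the hint being removed: prefetchw() asks the CPU to pull a cache line into the cache with intent to write. The kernel's definition is per-architecture (e.g. the PREFETCHW instruction on x86); the sketch below is only a rough user-space analogue for illustration, built on the GCC/Clang builtin, not the kernel's code.

/*
 * Rough analogue of prefetchw() for illustration only. The second
 * argument of __builtin_prefetch() selects write intent (1) rather
 * than read (0); it compiles to the architecture's prefetch hint,
 * or to nothing where no such instruction exists.
 */
static inline void prefetchw_like(const void *p)
{
        __builtin_prefetch(p, 1);
}

The removed loops issued this hint for p + 1 on every iteration, i.e. one extra instruction per page, which the measurements in the change log did not show paying off.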