Message ID | 351df0af-78f2-c20-2a6d-e5f978e5ca1@google.com (mailing list archive) |
---|---|
State | New |
Series | [mmotm] mm: delete __ClearPageWaiters() |
On Wed, Mar 2, 2022 at 6:56 PM Hugh Dickins <hughd@google.com> wrote:
>
> The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
> vmscan.c's free_unref_page_list() callers rely on that not to generate
> bad_page() alerts. So __page_cache_release() and release_pages() (and
> the presumably copy-and-pasted put_zone_device_private_or_public_page())
> are redundant and misleading to make a special point of clearing it (as
> the "__" implies, it could only safely be used on the freeing path).
>
> Delete __ClearPageWaiters(). Remark on this in one of the "possible"
> comments in wake_up_page_bit(), and delete the superfluous comments.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> We've used this since 2018, and I see Yu Zhao posted similar in 2020:
> https://lore.kernel.org/linux-mm/20200818184704.3625199-3-yuzhao@google.com/
> I couldn't join in at that time, but think its reception was over-cautious.

Indeed.

Tested-by: Yu Zhao <yuzhao@google.com>
On 03.03.22 02:56, Hugh Dickins wrote:
> The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
> vmscan.c's free_unref_page_list() callers rely on that not to generate
> bad_page() alerts. So __page_cache_release() and release_pages() (and
> the presumably copy-and-pasted put_zone_device_private_or_public_page())
> are redundant and misleading to make a special point of clearing it (as
> the "__" implies, it could only safely be used on the freeing path).
>
> Delete __ClearPageWaiters(). Remark on this in one of the "possible"
> comments in wake_up_page_bit(), and delete the superfluous comments.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> We've used this since 2018, and I see Yu Zhao posted similar in 2020:
> https://lore.kernel.org/linux-mm/20200818184704.3625199-3-yuzhao@google.com/
> I couldn't join in at that time, but think its reception was over-cautious.
>
>  include/linux/page-flags.h |  2 +-
>  mm/filemap.c               | 22 +++++++---------------
>  mm/memremap.c              |  2 --
>  mm/swap.c                  |  4 ----
>  4 files changed, 8 insertions(+), 22 deletions(-)
>
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -481,7 +481,7 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
>  	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)
>
>  __PAGEFLAG(Locked, locked, PF_NO_TAIL)
> -PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
> +PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
>  PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL)
>  PAGEFLAG(Referenced, referenced, PF_HEAD)
>  	TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1179,24 +1179,16 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
>  	}
>
>  	/*
> -	 * It is possible for other pages to have collided on the waitqueue
> -	 * hash, so in that case check for a page match. That prevents a long-
> -	 * term waiter
> +	 * It's possible to miss clearing waiters here, when we woke our page
> +	 * waiters, but the hashed waitqueue has waiters for other pages on it.
>  	 *
> -	 * It is still possible to miss a case here, when we woke page waiters
> -	 * and removed them from the waitqueue, but there are still other
> -	 * page waiters.
> +	 * That's okay, it's a rare case. The next waker will clear it. Or,
> +	 * it might be left set until the page is freed: when it's masked off
> +	 * with others in PAGE_FLAGS_CHECK_AT_PREP, by free_pages_prepare().
>  	 */

Does that also apply to ZONE_DEVICE pages via free_zone_device_page()?
On Thu, 3 Mar 2022, David Hildenbrand wrote:
> On 03.03.22 02:56, Hugh Dickins wrote:
> > The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
> > vmscan.c's free_unref_page_list() callers rely on that not to generate
> > bad_page() alerts. So __page_cache_release() and release_pages() (and
> > the presumably copy-and-pasted put_zone_device_private_or_public_page())

Hah, I'm showing my age there, or the patch's age: it's been rebranded
frequently since then, with linux-next calling it free_zone_device_page(),
as you kindly point out. How long before it's free_zone_device_folio()?

> > are redundant and misleading to make a special point of clearing it (as
> > the "__" implies, it could only safely be used on the freeing path).
> >
> > Delete __ClearPageWaiters(). Remark on this in one of the "possible"
> > comments in wake_up_page_bit(), and delete the superfluous comments.
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > We've used this since 2018, and I see Yu Zhao posted similar in 2020:
> > https://lore.kernel.org/linux-mm/20200818184704.3625199-3-yuzhao@google.com/
> > I couldn't join in at that time, but think its reception was over-cautious.
> >
> >  include/linux/page-flags.h |  2 +-
> >  mm/filemap.c               | 22 +++++++---------------
> >  mm/memremap.c              |  2 --
> >  mm/swap.c                  |  4 ----
> >  4 files changed, 8 insertions(+), 22 deletions(-)
> >
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -1179,24 +1179,16 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
> >  	}
> >
> >  	/*
> > -	 * It is possible for other pages to have collided on the waitqueue
> > -	 * hash, so in that case check for a page match. That prevents a long-
> > -	 * term waiter
> > +	 * It's possible to miss clearing waiters here, when we woke our page
> > +	 * waiters, but the hashed waitqueue has waiters for other pages on it.
> >  	 *
> > -	 * It is still possible to miss a case here, when we woke page waiters
> > -	 * and removed them from the waitqueue, but there are still other
> > -	 * page waiters.
> > +	 * That's okay, it's a rare case. The next waker will clear it. Or,
> > +	 * it might be left set until the page is freed: when it's masked off
> > +	 * with others in PAGE_FLAGS_CHECK_AT_PREP, by free_pages_prepare().
> >  	 */
>
> Does that also apply to ZONE_DEVICE pages via free_zone_device_page()?

I'm sure you could tell me a lot more about ZONE_DEVICE pages than I
could ever tell you. But, if they don't ever reach the main page freer,
then they're in the same category as other pages not freed until reboot:
any clearing of left-behind PG_waiters will be done by the next waker,
not by reaching free_pages_prepare(). Does that really require special
mention of ZONE_DEVICE pages here? Would I do better just to remove
the comment on PAGE_FLAGS_CHECK_AT_PREP being one of the clearers?

(I had to do a bit of research before answering: temporarily confused
about the role of PG_waiters, I worried that removing copy-and-pasted
__ClearPageWaiters from free_zone_device_page() might risk gradually
clogging up the hash queues with spuriously waited pages; no, nonsense,
it's just a matter of how efficient the next folio_unlock() will be.)

Hugh
On Thu, Mar 03, 2022 at 02:28:46PM -0800, Hugh Dickins wrote:
> On Thu, 3 Mar 2022, David Hildenbrand wrote:
> > On 03.03.22 02:56, Hugh Dickins wrote:
> > > The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
> > > vmscan.c's free_unref_page_list() callers rely on that not to generate
> > > bad_page() alerts. So __page_cache_release() and release_pages() (and
> > > the presumably copy-and-pasted put_zone_device_private_or_public_page())
>
> Hah, I'm showing my age there, or the patch's age: it's been rebranded
> frequently since then, with linux-next calling it free_zone_device_page(),
> as you kindly point out. How long before it's free_zone_device_folio()?

Probably not a serious question, but within the next year, I expect.
I have a prototype patch to do the entire page freeing path, but it
wasn't a priority for this merge window.
On 03.03.22 23:28, Hugh Dickins wrote:
> On Thu, 3 Mar 2022, David Hildenbrand wrote:
>> On 03.03.22 02:56, Hugh Dickins wrote:
>>> The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
>>> vmscan.c's free_unref_page_list() callers rely on that not to generate
>>> bad_page() alerts. So __page_cache_release() and release_pages() (and
>>> the presumably copy-and-pasted put_zone_device_private_or_public_page())
>
> Hah, I'm showing my age there, or the patch's age: it's been rebranded
> frequently since then, with linux-next calling it free_zone_device_page(),
> as you kindly point out. How long before it's free_zone_device_folio()?

:)

>
>>> are redundant and misleading to make a special point of clearing it (as
>>> the "__" implies, it could only safely be used on the freeing path).
>>>
>>> Delete __ClearPageWaiters(). Remark on this in one of the "possible"
>>> comments in wake_up_page_bit(), and delete the superfluous comments.
>>>
>>> Signed-off-by: Hugh Dickins <hughd@google.com>
>>> ---
>>> We've used this since 2018, and I see Yu Zhao posted similar in 2020:
>>> https://lore.kernel.org/linux-mm/20200818184704.3625199-3-yuzhao@google.com/
>>> I couldn't join in at that time, but think its reception was over-cautious.
>>>
>>>  include/linux/page-flags.h |  2 +-
>>>  mm/filemap.c               | 22 +++++++---------------
>>>  mm/memremap.c              |  2 --
>>>  mm/swap.c                  |  4 ----
>>>  4 files changed, 8 insertions(+), 22 deletions(-)
>>>
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -1179,24 +1179,16 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
>>>  	}
>>>
>>>  	/*
>>> -	 * It is possible for other pages to have collided on the waitqueue
>>> -	 * hash, so in that case check for a page match. That prevents a long-
>>> -	 * term waiter
>>> +	 * It's possible to miss clearing waiters here, when we woke our page
>>> +	 * waiters, but the hashed waitqueue has waiters for other pages on it.
>>>  	 *
>>> -	 * It is still possible to miss a case here, when we woke page waiters
>>> -	 * and removed them from the waitqueue, but there are still other
>>> -	 * page waiters.
>>> +	 * That's okay, it's a rare case. The next waker will clear it. Or,
>>> +	 * it might be left set until the page is freed: when it's masked off
>>> +	 * with others in PAGE_FLAGS_CHECK_AT_PREP, by free_pages_prepare().
>>>  	 */
>>
>> Does that also apply to ZONE_DEVICE pages via free_zone_device_page()?
>
> I'm sure you could tell me a lot more about ZONE_DEVICE pages than I
> could ever tell you. But, if they don't ever reach the main page freer,
> then they're in the same category as other pages not freed until reboot:
> any clearing of left-behind PG_waiters will be done by the next waker,
> not by reaching free_pages_prepare(). Does that really require special
> mention of ZONE_DEVICE pages here? Would I do better just to remove
> the comment on PAGE_FLAGS_CHECK_AT_PREP being one of the clearers?

In this context we can consider ZONE_DEVICE pages just like any other
pages that, although getting freed, are not returned to the buddy, but
instead are returned to another pool. So PAGE_FLAGS_CHECK_AT_PREP won't
apply and free_pages_prepare() won't apply.

Another example would be hugetlb pages, that are returned to the hugetlb
pool, but not back to the buddy unless the huge page pool is shrunk.

So I feel like the underlying principle here is: we don't *care* if
PG_waiter is cleared when a page gets freed, because it will simply get
cleared by the next waker if it sticks around.

Then, I agree, we can just drop the comment regarding
PAGE_FLAGS_CHECK_AT_PREP and instead have something like

"
That's okay, it's a rare case and the next waker will just clear it.
Note that, depending on the page pool (buddy, ZONE_DEVICE, hugetlb), we
might clear the flag while freeing the page, however, this is not
required for correctness.
"
On Fri, 4 Mar 2022, David Hildenbrand wrote:
>
> In this context we can consider ZONE_DEVICE pages just like any other
> pages that, although getting freed, are not returned to the buddy, but
> instead are returned to another pool. So PAGE_FLAGS_CHECK_AT_PREP won't
> apply and free_pages_prepare() won't apply.
>
> Another example would be hugetlb pages, that are returned to the hugetlb
> pool, but not back to the buddy unless the huge page pool is shrunk.
>
>
> So I feel like the underlying principle here is: we don't *care* if
> PG_waiter is cleared when a page gets freed, because it will simply get
> cleared by the next waker if it sticks around.

I think we were focused on different issues here. I was focused on how
it was redundant for those places to clear the bit, because it was going
to get cleared anyway just after (in the buddy case). Whereas you are
focused on how it doesn't matter at all whether it gets cleared when
freeing. Both valid points.

>
> Then, I agree, we can just drop the comment regarding
> PAGE_FLAGS_CHECK_AT_PREP and instead have something like

Okay, the reference to PAGE_FLAGS_CHECK_AT_PREP in the commit message
is good enough for me, no need to make a point of it in the code comment.

>
>
> "
> That's okay, it's a rare case and the next waker will just clear it.
> Note that, depending on the page pool (buddy, ZONE_DEVICE, hugetlb), we
> might clear the flag while freeing the page, however, this is not
> required for correctness.
> "

Okay, v2 coming up: I've taken largely your wording (but not exactly).

Thanks,
Hugh
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -481,7 +481,7 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
 	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)

 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
-PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
+PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
 PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
 	TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1179,24 +1179,16 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	}

 	/*
-	 * It is possible for other pages to have collided on the waitqueue
-	 * hash, so in that case check for a page match. That prevents a long-
-	 * term waiter
+	 * It's possible to miss clearing waiters here, when we woke our page
+	 * waiters, but the hashed waitqueue has waiters for other pages on it.
 	 *
-	 * It is still possible to miss a case here, when we woke page waiters
-	 * and removed them from the waitqueue, but there are still other
-	 * page waiters.
+	 * That's okay, it's a rare case. The next waker will clear it. Or,
+	 * it might be left set until the page is freed: when it's masked off
+	 * with others in PAGE_FLAGS_CHECK_AT_PREP, by free_pages_prepare().
 	 */
-	if (!waitqueue_active(q) || !key.page_match) {
+	if (!waitqueue_active(q) || !key.page_match)
 		folio_clear_waiters(folio);
-		/*
-		 * It's possible to miss clearing Waiters here, when we woke
-		 * our page waiters, but the hashed waitqueue has waiters for
-		 * other pages on it.
-		 *
-		 * That's okay, it's a rare case. The next waker will clear it.
-		 */
-	}
+
 	spin_unlock_irqrestore(&q->lock, flags);
 }

--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -487,8 +487,6 @@ void free_zone_device_page(struct page *page)
 	if (WARN_ON_ONCE(!page->pgmap->ops || !page->pgmap->ops->page_free))
 		return;

-	__ClearPageWaiters(page);
-
 	mem_cgroup_uncharge(page_folio(page));

 	/*
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -97,7 +97,6 @@ static void __page_cache_release(struct page *page)
 		mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
 	}
-	__ClearPageWaiters(page);
 }

 static void __put_single_page(struct page *page)
@@ -152,7 +151,6 @@ void put_pages_list(struct list_head *pages)
 			continue;
 		}
 		/* Cannot be PageLRU because it's passed to us using the lru */
-		__ClearPageWaiters(page);
 	}

 	free_unref_page_list(pages);
@@ -966,8 +964,6 @@ void release_pages(struct page **pages, int nr)
 			count_vm_event(UNEVICTABLE_PGCLEARED);
 		}

-		__ClearPageWaiters(page);
-
 		list_add(&page->lru, &pages_to_free);
 	}
 	if (lruvec)
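The condition kept by the mm/filemap.c hunk can be tried out in a small userspace model. What follows is only a sketch, not kernel code: `sim_page`, `sim_waitqueue`, `sim_wait_on()` and `sim_wake()` are hypothetical names invented for this illustration, and a fixed array stands in for one hashed wait queue. It reproduces the "missed clear" case described in the comment: two pages collide on the same queue, the first wake leaves the waiters flag set, and the next waker clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_WAITERS 8

/* Hypothetical stand-in for a page: only the waiters flag is modelled. */
struct sim_page {
	bool waiters;
};

/* One hashed wait queue; pages that collide on the hash share it. */
struct sim_waitqueue {
	struct sim_page *entries[SIM_MAX_WAITERS];
	size_t n;
};

/* Queue a waiter for @page and set its flag; returns false if the queue is full. */
static bool sim_wait_on(struct sim_waitqueue *q, struct sim_page *page)
{
	assert(q && page);
	if (q->n == SIM_MAX_WAITERS)
		return false;
	q->entries[q->n++] = page;
	page->waiters = true;
	return true;
}

/* Wake every waiter queued for @page, then decide whether to clear the flag. */
static void sim_wake(struct sim_waitqueue *q, struct sim_page *page)
{
	bool page_match = false;
	size_t kept = 0;

	assert(q && page);
	for (size_t i = 0; i < q->n; i++) {
		if (q->entries[i] == page)
			page_match = true;	/* woken, so dropped from the queue */
		else
			q->entries[kept++] = q->entries[i];
	}
	q->n = kept;

	/* Same shape as: !waitqueue_active(q) || !key.page_match */
	if (q->n == 0 || !page_match)
		page->waiters = false;
}
```

With pages A and B both waiting on one queue, the first `sim_wake()` on A leaves A's flag set because B is still queued and a match was seen; a second `sim_wake()` on A finds no match and clears it, which is the "next waker will clear it" behaviour the comment relies on.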
The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and
vmscan.c's free_unref_page_list() callers rely on that not to generate
bad_page() alerts. So __page_cache_release() and release_pages() (and
the presumably copy-and-pasted put_zone_device_private_or_public_page())
are redundant and misleading to make a special point of clearing it (as
the "__" implies, it could only safely be used on the freeing path).

Delete __ClearPageWaiters(). Remark on this in one of the "possible"
comments in wake_up_page_bit(), and delete the superfluous comments.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
We've used this since 2018, and I see Yu Zhao posted similar in 2020:
https://lore.kernel.org/linux-mm/20200818184704.3625199-3-yuzhao@google.com/
I couldn't join in at that time, but think its reception was over-cautious.

 include/linux/page-flags.h |  2 +-
 mm/filemap.c               | 22 +++++++---------------
 mm/memremap.c              |  2 --
 mm/swap.c                  |  4 ----
 4 files changed, 8 insertions(+), 22 deletions(-)
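As a rough illustration of why the explicit clear is redundant on the buddy freeing path, the sketch below models the two masks in userspace C. The bit positions, the `demo_` and `DEMO_` names and the contents of both masks are simplified assumptions made for this example, not the kernel's definitions. The only property carried over from the thread is that the waiters bit is absent from the check-at-free mask yet is masked off with the others when the page is prepared for freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define DEMO_PG_LOCKED	(1UL << 0)
#define DEMO_PG_WAITERS	(1UL << 1)
#define DEMO_PG_DIRTY	(1UL << 2)
#define DEMO_PG_LRU	(1UL << 3)

/* Flags that must already be clear when freeing; the waiters bit is not here. */
#define DEMO_CHECK_AT_FREE	(DEMO_PG_LOCKED | DEMO_PG_DIRTY | DEMO_PG_LRU)
/* Flags masked off while preparing the page; the waiters bit is included. */
#define DEMO_CHECK_AT_PREP	(DEMO_CHECK_AT_FREE | DEMO_PG_WAITERS)

struct demo_page {
	unsigned long flags;
};

/*
 * Simplified model of the freeing path: returns false where the real
 * code would report a bad page, otherwise masks the flags and returns true.
 */
static bool demo_free_pages_prepare(struct demo_page *page)
{
	assert(page != NULL);
	if (page->flags & DEMO_CHECK_AT_FREE)
		return false;
	page->flags &= ~DEMO_CHECK_AT_PREP;
	return true;
}
```

In this model a page freed with only the waiters bit set passes the check and comes out with no flags set, so clearing the bit just beforehand changes nothing; a page freed with the locked bit still set is rejected.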