Message ID | 20220527080451.48549-1-linmiaohe@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/vmscan: don't try to reclaim freed folios | expand |
On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote: > If folios were freed from under us, there's no need to reclaim them. Skip > these folios to save lots of cpu cycles and avoid possible unnecessary > disk IO. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/vmscan.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index f7d9a683e3a7..646dd1efad32 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list, > folio = lru_to_folio(page_list); > list_del(&folio->lru); > > + nr_pages = folio_nr_pages(folio); > + if (folio_ref_count(folio) == 1) { > + /* folio was freed from under us. So we are done. */ > + WARN_ON(!folio_put_testzero(folio)); What? No. This can absolutely happen. We have a refcount on the folio, which means that any other thread can temporarily raise the refcount, so this WARN_ON can trigger. Also, we don't hold the folio locked, or an extra reference, so nr_pages is unstable because it can be split. > + goto free_it; > + } > + > if (!folio_trylock(folio)) > goto keep; > > VM_BUG_ON_FOLIO(folio_test_active(folio), folio); > > - nr_pages = folio_nr_pages(folio); > > /* Account the number of base pages */ > sc->nr_scanned += nr_pages; > -- > 2.23.0 > >
On 2022/5/27 23:02, Matthew Wilcox wrote: > On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote: >> If folios were freed from under us, there's no need to reclaim them. Skip >> these folios to save lots of cpu cycles and avoid possible unnecessary >> disk IO. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/vmscan.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index f7d9a683e3a7..646dd1efad32 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list, >> folio = lru_to_folio(page_list); >> list_del(&folio->lru); >> >> + nr_pages = folio_nr_pages(folio); >> + if (folio_ref_count(folio) == 1) { >> + /* folio was freed from under us. So we are done. */ >> + WARN_ON(!folio_put_testzero(folio)); > > What? No. This can absolutely happen. We have a refcount on the folio, > which means that any other thread can temporarily raise the refcount, IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or under any use. So there should be no way that any other thread can temporarily raise the refcount when folio_ref_count == 1. Or am I miss something? > so this WARN_ON can trigger. Also, we don't hold the folio locked, > or an extra reference, so nr_pages is unstable because it can be split. Yes, you're right. When folio_ref_count != 1, nr_pages is unstable. Will fix it if v2 is possible. :) Thanks a lot for review and comment! > >> + goto free_it; >> + } >> + >> if (!folio_trylock(folio)) >> goto keep; >> >> VM_BUG_ON_FOLIO(folio_test_active(folio), folio); >> >> - nr_pages = folio_nr_pages(folio); >> >> /* Account the number of base pages */ >> sc->nr_scanned += nr_pages; >> -- >> 2.23.0 >> >> > > . >
On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote: > On 2022/5/27 23:02, Matthew Wilcox wrote: > > What? No. This can absolutely happen. We have a refcount on the folio, > > which means that any other thread can temporarily raise the refcount, > > IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or > under any use. So there should be no way that any other thread can temporarily raise the refcount when > folio_ref_count == 1. Or am I miss something? Take a look at something like GUP (fast). If this page _was_ mapped to userspace, something like this can happen: Thread A Thread B load PTE unmap page refcount goes to 1 vmscan sees the page try_get_ref refcount is now 2. WARN_ON. Thread A will see that the PTE has changed and will now drop its reference, but Thread B already spat out the WARN. A similar thing can happen with the page cache. If this is a worthwhile optimisation (does it happen often that we find a refcount == 1 page?), we could do something like ... if (folio_ref_freeze(folio, 1)) { nr_pages = folio_nr_pages(folio); goto free_it; } ... or ... if (folio_ref_count(folio) == 1 && folio_ref_freeze(folio, 1)) { ... if we want to test-and-test-and-clear But this function is far too complicated already. I really want to see numbers that proves the extra complexity is worth it.
On 2022/5/28 11:13, Matthew Wilcox wrote: > On Sat, May 28, 2022 at 10:52:11AM +0800, Miaohe Lin wrote: >> On 2022/5/27 23:02, Matthew Wilcox wrote: >>> What? No. This can absolutely happen. We have a refcount on the folio, >>> which means that any other thread can temporarily raise the refcount, >> >> IIUC, the folio is only in the isolated page_list now and it's not in the page cache, swap cache, pagetable or >> under any use. So there should be no way that any other thread can temporarily raise the refcount when >> folio_ref_count == 1. Or am I miss something? > > Take a look at something like GUP (fast). If this page _was_ mapped to > userspace, something like this can happen: > > Thread A Thread B > load PTE > unmap page > refcount goes to 1 > vmscan sees the page > try_get_ref > refcount is now 2. WARN_ON. > > Thread A will see that the PTE has changed and will now drop its > reference, but Thread B already spat out the WARN. > > A similar thing can happen with the page cache. Oh, I see. Many thanks for your patient explanation! :) > > If this is a worthwhile optimisation (does it happen often that we find > a refcount == 1 page?), we could do something like ... No, It should be rare. > > if (folio_ref_freeze(folio, 1)) { > nr_pages = folio_nr_pages(folio); > goto free_it; > } > > ... or ... > > if (folio_ref_count(folio) == 1 && > folio_ref_freeze(folio, 1)) { > > ... if we want to test-and-test-and-clear These proposed code changes look good to me. > > But this function is far too complicated already. I really want to > see numbers that proves the extra complexity is worth it. This optimization can save lots of cpu cycles and avoid possible disk I/O in that specified case. But that is a somewhat rare case. So there's no numbers that proves the extra complexity is worth it. Should I drop this patch or proceed with the proposed code changes above in next version? :) Many thanks! > > > . >
On 2022/5/27 23:02, Matthew Wilcox wrote: > On Fri, May 27, 2022 at 04:04:51PM +0800, Miaohe Lin wrote: >> If folios were freed from under us, there's no need to reclaim them. Skip >> these folios to save lots of cpu cycles and avoid possible unnecessary >> disk IO. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/vmscan.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index f7d9a683e3a7..646dd1efad32 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list, >> folio = lru_to_folio(page_list); >> list_del(&folio->lru); >> >> + nr_pages = folio_nr_pages(folio); >> + if (folio_ref_count(folio) == 1) { >> + /* folio was freed from under us. So we are done. */ >> + WARN_ON(!folio_put_testzero(folio)); > > What? No. This can absolutely happen. We have a refcount on the folio, > which means that any other thread can temporarily raise the refcount, > so this WARN_ON can trigger. Also, we don't hold the folio locked, > or an extra reference, so nr_pages is unstable because it can be split. When I reread the code, I found caller holds an extra reference to the folio when calling isolate_lru_pages(), so folio can't be split and thus nr_pages should be stable indeed? Or am I miss something again? Thanks! > >> + goto free_it; >> + } >> + >> if (!folio_trylock(folio)) >> goto keep; >> >> VM_BUG_ON_FOLIO(folio_test_active(folio), folio); >> >> - nr_pages = folio_nr_pages(folio); >> >> /* Account the number of base pages */ >> sc->nr_scanned += nr_pages; >> -- >> 2.23.0 >> >> > > . >
diff --git a/mm/vmscan.c b/mm/vmscan.c index f7d9a683e3a7..646dd1efad32 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1556,12 +1556,18 @@ static unsigned int shrink_page_list(struct list_head *page_list, folio = lru_to_folio(page_list); list_del(&folio->lru); + nr_pages = folio_nr_pages(folio); + if (folio_ref_count(folio) == 1) { + /* folio was freed from under us. So we are done. */ + WARN_ON(!folio_put_testzero(folio)); + goto free_it; + } + if (!folio_trylock(folio)) goto keep; VM_BUG_ON_FOLIO(folio_test_active(folio), folio); - nr_pages = folio_nr_pages(folio); /* Account the number of base pages */ sc->nr_scanned += nr_pages;
If folios were freed from under us, there's no need to reclaim them. Skip these folios to save lots of cpu cycles and avoid possible unnecessary disk IO. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/vmscan.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)