Message ID | 20230417075643.3287513-3-fengwei.yin@intel.com |
---|---|
State | New |
Series | Reduce lock contention related with large folio |
On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
> Currently, large folios are not batch-added to the lru list, which
> causes high lru lock contention after enabling large folios for
> anonymous mapping.

Obviously, I think we should be doing a batched add, but I don't think
this is right.

> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>  {
>         pvec->pages[pvec->nr++] = page;
> -       return pagevec_space(pvec);
> +       pvec->pages_nr += compound_nr(page);
> +
> +       if (pvec->pages_nr > PAGEVEC_SIZE)
> +               return 0;
> +       else
> +               return pagevec_space(pvec);

I assume your thinking here is that we should limit the number of pages
in the batches, but I think we should allow the number of folios to reach
PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
reduce the contention on the LRU lock even further.
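One reading of this suggestion is that folio_batch_add() stays as it is
upstream and only the !folio_test_large() check in
folio_batch_add_and_move() is dropped, so a batch drains once it holds
PAGEVEC_SIZE folios regardless of how many base pages those folios cover.
A rough sketch of that reading (illustration only, not from any posted
patch):

static inline unsigned folio_batch_add(struct folio_batch *fbatch,
                struct folio *folio)
{
        /*
         * Unchanged from upstream in this reading: the batch counts
         * folios, not base pages, so it only fills after PAGEVEC_SIZE
         * folios have been queued.
         */
        fbatch->folios[fbatch->nr++] = folio;
        return fbatch_space(fbatch);
}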
On 4/17/23 20:25, Matthew Wilcox wrote:
> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>> Currently, large folios are not batch-added to the lru list, which
>> causes high lru lock contention after enabling large folios for
>> anonymous mapping.
>
> Obviously, I think we should be doing a batched add, but I don't think
> this is right.
>
>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>  {
>>         pvec->pages[pvec->nr++] = page;
>> -       return pagevec_space(pvec);
>> +       pvec->pages_nr += compound_nr(page);
>> +
>> +       if (pvec->pages_nr > PAGEVEC_SIZE)
>> +               return 0;
>> +       else
>> +               return pagevec_space(pvec);
>
> I assume your thinking here is that we should limit the number of pages
> in the batches, but I think we should allow the number of folios to reach
> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
> reduce the contention on the LRU lock even further.

Yes. The first thought in my mind was to limit the number of folios as well.

But the concern is that large folios span a wide range of sizes. In the
extreme case, if all the batched large folios are 2M in size, then with
PAGEVEC_SIZE of 15 one batch could hold 30M of memory, which could be too
large for some usages.


Regards
Yin, Fengwei
Add Ying, who found that large folios are not batch-added to the lru.

On 4/18/23 09:57, Yin Fengwei wrote:
>
>
> On 4/17/23 20:25, Matthew Wilcox wrote:
>> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>>> Currently, large folios are not batch-added to the lru list, which
>>> causes high lru lock contention after enabling large folios for
>>> anonymous mapping.
>>
>> Obviously, I think we should be doing a batched add, but I don't think
>> this is right.
>>
>>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>>  {
>>>         pvec->pages[pvec->nr++] = page;
>>> -       return pagevec_space(pvec);
>>> +       pvec->pages_nr += compound_nr(page);
>>> +
>>> +       if (pvec->pages_nr > PAGEVEC_SIZE)
>>> +               return 0;
>>> +       else
>>> +               return pagevec_space(pvec);
>>
>> I assume your thinking here is that we should limit the number of pages
>> in the batches, but I think we should allow the number of folios to reach
>> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
>> reduce the contention on the LRU lock even further.
>
> Yes. The first thought in my mind was to limit the number of folios as well.
>
> But the concern is that large folios span a wide range of sizes. In the
> extreme case, if all the batched large folios are 2M in size, then with
> PAGEVEC_SIZE of 15 one batch could hold 30M of memory, which could be too
> large for some usages.
>
>
> Regards
> Yin, Fengwei
Yin Fengwei <fengwei.yin@intel.com> writes:

> Add Ying, who found that large folios are not batch-added to the lru.

Thanks!

> On 4/18/23 09:57, Yin Fengwei wrote:
>>
>>
>> On 4/17/23 20:25, Matthew Wilcox wrote:
>>> On Mon, Apr 17, 2023 at 03:56:43PM +0800, Yin Fengwei wrote:
>>>> Currently, large folios are not batch-added to the lru list, which
>>>> causes high lru lock contention after enabling large folios for
>>>> anonymous mapping.
>>>
>>> Obviously, I think we should be doing a batched add, but I don't think
>>> this is right.
>>>
>>>> @@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
>>>>  static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
>>>>  {
>>>>         pvec->pages[pvec->nr++] = page;
>>>> -       return pagevec_space(pvec);
>>>> +       pvec->pages_nr += compound_nr(page);
>>>> +
>>>> +       if (pvec->pages_nr > PAGEVEC_SIZE)

nr_pages seems better to me.

>>>> +               return 0;
>>>> +       else
>>>> +               return pagevec_space(pvec);
>>>
>>> I assume your thinking here is that we should limit the number of pages
>>> in the batches, but I think we should allow the number of folios to reach
>>> PAGEVEC_SIZE before we drain the batch onto the LRU list.  That will
>>> reduce the contention on the LRU lock even further.
>>
>> Yes. The first thought in my mind was to limit the number of folios as well.
>>
>> But the concern is that large folios span a wide range of sizes. In the
>> extreme case, if all the batched large folios are 2M in size, then with
>> PAGEVEC_SIZE of 15 one batch could hold 30M of memory, which could be too
>> large for some usages.

Yes. I think these are valid concerns. One possibility for balancing
performance against lru cache size is to keep nr_pages of the per-CPU lru
cache below PAGEVEC_SIZE * N, where N is determined according to the
intended base order of large folios. For example, N can be 4 if we use 2
as the intended base order.

Just my 2 cents.

Best Regards,
Huang, Ying
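A rough sketch of Ying's PAGEVEC_SIZE * N idea, using the nr_pages name
suggested above; the factor name and its value are hypothetical, picked
only to illustrate the bound for an intended base order of 2 (a real patch
would choose N from measurements):

/* Hypothetical illustration of the PAGEVEC_SIZE * N bound, not a posted patch. */
#define LRU_CACHE_ORDER_FACTOR  4       /* assumed: intended base order 2, i.e. 4 pages per folio */

static inline unsigned folio_batch_add(struct folio_batch *fbatch,
                struct folio *folio)
{
        fbatch->folios[fbatch->nr++] = folio;
        fbatch->nr_pages += folio_nr_pages(folio);

        /* Drain once the per-CPU batch covers more than PAGEVEC_SIZE * N base pages. */
        if (fbatch->nr_pages > PAGEVEC_SIZE * LRU_CACHE_ORDER_FACTOR)
                return 0;

        return fbatch_space(fbatch);
}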
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index f582f7213ea52..d719f7ad5a567 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -10,6 +10,7 @@
 #define _LINUX_PAGEVEC_H
 
 #include <linux/xarray.h>
+#include <linux/mm.h>
 
 /* 15 pointers + header align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE 15
@@ -22,6 +23,7 @@ struct address_space;
 struct pagevec {
         unsigned char nr;
         bool percpu_pvec_drained;
+        unsigned short pages_nr;
         struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -30,6 +32,7 @@ void __pagevec_release(struct pagevec *pvec);
 static inline void pagevec_init(struct pagevec *pvec)
 {
         pvec->nr = 0;
+        pvec->pages_nr = 0;
         pvec->percpu_pvec_drained = false;
 }
 
@@ -54,7 +57,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
 static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
 {
         pvec->pages[pvec->nr++] = page;
-        return pagevec_space(pvec);
+        pvec->pages_nr += compound_nr(page);
+
+        if (pvec->pages_nr > PAGEVEC_SIZE)
+                return 0;
+        else
+                return pagevec_space(pvec);
 }
 
 static inline void pagevec_release(struct pagevec *pvec)
@@ -75,6 +83,7 @@ static inline void pagevec_release(struct pagevec *pvec)
 struct folio_batch {
         unsigned char nr;
         bool percpu_pvec_drained;
+        unsigned short pages_nr;
         struct folio *folios[PAGEVEC_SIZE];
 };
 
@@ -92,6 +101,7 @@ static_assert(offsetof(struct pagevec, pages) ==
 static inline void folio_batch_init(struct folio_batch *fbatch)
 {
         fbatch->nr = 0;
+        fbatch->pages_nr = 0;
         fbatch->percpu_pvec_drained = false;
 }
 
@@ -124,7 +134,12 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
                 struct folio *folio)
 {
         fbatch->folios[fbatch->nr++] = folio;
-        return fbatch_space(fbatch);
+        fbatch->pages_nr += folio_nr_pages(folio);
+
+        if (fbatch->pages_nr > PAGEVEC_SIZE)
+                return 0;
+        else
+                return fbatch_space(fbatch);
 }
 
 static inline void folio_batch_release(struct folio_batch *fbatch)
diff --git a/mm/swap.c b/mm/swap.c
index 423199ee8478c..59e3f1e3701c3 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -228,8 +228,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
                 struct folio *folio, move_fn_t move_fn)
 {
-        if (folio_batch_add(fbatch, folio) && !folio_test_large(folio) &&
-            !lru_cache_disabled())
+        if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
                 return;
         folio_batch_move_lru(fbatch, move_fn);
 }
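For context, the mm/swap.c hunk above sits on the per-CPU LRU add path;
roughly (a simplified paraphrase of folio_add_lru() from that era, with
debug checks elided), large folios now stay in the per-CPU batch instead
of forcing an immediate drain:

void folio_add_lru(struct folio *folio)
{
        struct folio_batch *fbatch;

        folio_get(folio);
        local_lock(&cpu_fbatches.lock);
        fbatch = this_cpu_ptr(&cpu_fbatches.lru_add);
        /* With !folio_test_large() gone, large folios are batched as well. */
        folio_batch_add_and_move(fbatch, folio, lru_add_fn);
        local_unlock(&cpu_fbatches.lock);
}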
Currently, large folios are not batch-added to the lru list, which causes
high lru lock contention after enabling large folios for anonymous mapping.

Running page_fault1 of will-it-scale + order 2 folio with 96 processes on
Ice Lake 48C/96T, the lru lock contention could be around 65%:

-   65.38%     0.17%  page_fault1_pro  [kernel.kallsyms]  [k] folio_lruvec_lock_irqsave
   - 65.21% folio_lruvec_lock_irqsave

With this patch, the lru lock contention dropped to 45% with the same testing:

-   44.93%     0.17%  page_fault1_pro  [kernel.kallsyms]  [k] folio_lruvec_lock_irqsave
   + 44.75% folio_lruvec_lock_irqsave

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 include/linux/pagevec.h | 19 +++++++++++++++++--
 mm/swap.c               |  3 +--
 2 files changed, 18 insertions(+), 4 deletions(-)