Message ID | 20230714161733.4144503-1-ryan.roberts@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | variable-order, large folios for anonymous memory | expand |
On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts <ryan.roberts@arm.com> wrote: > > In preparation for FLEXIBLE_THP support, improve > folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be > passed to it. In this case, all contained pages are accounted using the > order-0 folio (or base page) scheme. > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > Reviewed-by: Yu Zhao <yuzhao@google.com> > Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> This patch doesn't depend on the rest of the series and therefore can be merged separately in case the rest needs more discussion. Ryan, please feel free to post other code paths (those from v1) you've optimized for large anon folios at any time, since each code path can be reviewed and merged individually as well.
On 14/07/2023 17:52, Yu Zhao wrote: > On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts <ryan.roberts@arm.com> wrote: >> >> In preparation for FLEXIBLE_THP support, improve >> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be >> passed to it. In this case, all contained pages are accounted using the >> order-0 folio (or base page) scheme. >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >> Reviewed-by: Yu Zhao <yuzhao@google.com> >> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> > > This patch doesn't depend on the rest of the series and therefore can > be merged separately in case the rest needs more discussion. > > Ryan, please feel free to post other code paths (those from v1) you've > optimized for large anon folios at any time, since each code path can > be reviewed and merged individually as well. Will do. Hoping to get the "batch zap" series out on Monday. Others need a bit more time to bake as they will need to be aligned to the changes from the review of this series first.
On 14.07.23 18:17, Ryan Roberts wrote: > In preparation for FLEXIBLE_THP support, improve > folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be > passed to it. In this case, all contained pages are accounted using the > order-0 folio (or base page) scheme. > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > Reviewed-by: Yu Zhao <yuzhao@google.com> > Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> > --- > mm/rmap.c | 28 +++++++++++++++++++++------- > 1 file changed, 21 insertions(+), 7 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 0c0d8857dfce..f293d072368a 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, > * This means the inc-and-test can be bypassed. > * The folio does not have to be locked. > * > - * If the folio is large, it is accounted as a THP. As the folio > + * If the folio is pmd-mappable, it is accounted as a THP. As the folio > * is new, it's assumed to be mapped exclusively by a single process. > */ > void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, > unsigned long address) > { > - int nr; > + int nr = folio_nr_pages(folio); > > - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); > + VM_BUG_ON_VMA(address < vma->vm_start || > + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); > __folio_set_swapbacked(folio); > > - if (likely(!folio_test_pmd_mappable(folio))) { > + if (!folio_test_large(folio)) { Why remove the "likely" here? The patch itself does not change anything about that condition. > /* increment count (starts at -1) */ > atomic_set(&folio->_mapcount, 0); > - nr = 1; > + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); > + } else if (!folio_test_pmd_mappable(folio)) { > + int i; > + > + for (i = 0; i < nr; i++) { > + struct page *page = folio_page(folio, i); > + > + /* increment count (starts at -1) */ > + atomic_set(&page->_mapcount, 0); > + __page_set_anon_rmap(folio, page, vma, > + address + (i << PAGE_SHIFT), 1); > + } > + > + /* increment count (starts at 0) */ That comment is a bit misleading. We're not talking about a mapcount as in the other cases here. > + atomic_set(&folio->_nr_pages_mapped, nr); > } else { > /* increment count (starts at -1) */ > atomic_set(&folio->_entire_mapcount, 0); > atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); > - nr = folio_nr_pages(folio); > + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); > __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); > } > Apart from that, LGTM.
On 17/07/2023 14:00, David Hildenbrand wrote: > On 14.07.23 18:17, Ryan Roberts wrote: >> In preparation for FLEXIBLE_THP support, improve >> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be >> passed to it. In this case, all contained pages are accounted using the >> order-0 folio (or base page) scheme. >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >> Reviewed-by: Yu Zhao <yuzhao@google.com> >> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> >> --- >> mm/rmap.c | 28 +++++++++++++++++++++------- >> 1 file changed, 21 insertions(+), 7 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 0c0d8857dfce..f293d072368a 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct >> vm_area_struct *vma, >> * This means the inc-and-test can be bypassed. >> * The folio does not have to be locked. >> * >> - * If the folio is large, it is accounted as a THP. As the folio >> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio >> * is new, it's assumed to be mapped exclusively by a single process. >> */ >> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >> unsigned long address) >> { >> - int nr; >> + int nr = folio_nr_pages(folio); >> >> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); >> + VM_BUG_ON_VMA(address < vma->vm_start || >> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); >> __folio_set_swapbacked(folio); >> >> - if (likely(!folio_test_pmd_mappable(folio))) { >> + if (!folio_test_large(folio)) { > > Why remove the "likely" here? The patch itself does not change anything about > that condition. Good question; I'm not sure why. Will have to put it down to bad copy/paste fixup. Will put it back in the next version. > >> /* increment count (starts at -1) */ >> atomic_set(&folio->_mapcount, 0); >> - nr = 1; >> + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); >> + } else if (!folio_test_pmd_mappable(folio)) { >> + int i; >> + >> + for (i = 0; i < nr; i++) { >> + struct page *page = folio_page(folio, i); >> + >> + /* increment count (starts at -1) */ >> + atomic_set(&page->_mapcount, 0); >> + __page_set_anon_rmap(folio, page, vma, >> + address + (i << PAGE_SHIFT), 1); >> + } >> + >> + /* increment count (starts at 0) */ > > That comment is a bit misleading. We're not talking about a mapcount as in the > other cases here. Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like _mapcount. The comment was intended to be in the style used in other similar places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it to the number of pages in the folio" or remove it entirely? What do you prefer? > >> + atomic_set(&folio->_nr_pages_mapped, nr); >> } else { >> /* increment count (starts at -1) */ >> atomic_set(&folio->_entire_mapcount, 0); >> atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); >> - nr = folio_nr_pages(folio); >> + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); >> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); >> } >> > > Apart from that, LGTM. >
On 17.07.23 15:13, Ryan Roberts wrote: > On 17/07/2023 14:00, David Hildenbrand wrote: >> On 14.07.23 18:17, Ryan Roberts wrote: >>> In preparation for FLEXIBLE_THP support, improve >>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be >>> passed to it. In this case, all contained pages are accounted using the >>> order-0 folio (or base page) scheme. >>> >>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>> Reviewed-by: Yu Zhao <yuzhao@google.com> >>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> >>> --- >>> mm/rmap.c | 28 +++++++++++++++++++++------- >>> 1 file changed, 21 insertions(+), 7 deletions(-) >>> >>> diff --git a/mm/rmap.c b/mm/rmap.c >>> index 0c0d8857dfce..f293d072368a 100644 >>> --- a/mm/rmap.c >>> +++ b/mm/rmap.c >>> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct >>> vm_area_struct *vma, >>> * This means the inc-and-test can be bypassed. >>> * The folio does not have to be locked. >>> * >>> - * If the folio is large, it is accounted as a THP. As the folio >>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio >>> * is new, it's assumed to be mapped exclusively by a single process. >>> */ >>> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, >>> unsigned long address) >>> { >>> - int nr; >>> + int nr = folio_nr_pages(folio); >>> >>> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); >>> + VM_BUG_ON_VMA(address < vma->vm_start || >>> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); >>> __folio_set_swapbacked(folio); >>> >>> - if (likely(!folio_test_pmd_mappable(folio))) { >>> + if (!folio_test_large(folio)) { >> >> Why remove the "likely" here? The patch itself does not change anything about >> that condition. > > Good question; I'm not sure why. Will have to put it down to bad copy/paste > fixup. Will put it back in the next version. > >> >>> /* increment count (starts at -1) */ >>> atomic_set(&folio->_mapcount, 0); >>> - nr = 1; >>> + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); >>> + } else if (!folio_test_pmd_mappable(folio)) { >>> + int i; >>> + >>> + for (i = 0; i < nr; i++) { >>> + struct page *page = folio_page(folio, i); >>> + >>> + /* increment count (starts at -1) */ >>> + atomic_set(&page->_mapcount, 0); >>> + __page_set_anon_rmap(folio, page, vma, >>> + address + (i << PAGE_SHIFT), 1); >>> + } >>> + >>> + /* increment count (starts at 0) */ >> >> That comment is a bit misleading. We're not talking about a mapcount as in the >> other cases here. > > Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like > _mapcount. The comment was intended to be in the style used in other similar > places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it > to the number of pages in the folio" or remove it entirely? What do you prefer? > We only have to comment what's weird, not what's normal. IOW, we also didn't have such a comment in the existing code when doing atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); What might make sense here is a simple "All pages of the folio are PTE-mapped."
On 17/07/2023 14:19, David Hildenbrand wrote: > On 17.07.23 15:13, Ryan Roberts wrote: >> On 17/07/2023 14:00, David Hildenbrand wrote: >>> On 14.07.23 18:17, Ryan Roberts wrote: >>>> In preparation for FLEXIBLE_THP support, improve >>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be >>>> passed to it. In this case, all contained pages are accounted using the >>>> order-0 folio (or base page) scheme. >>>> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>> Reviewed-by: Yu Zhao <yuzhao@google.com> >>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> >>>> --- >>>> mm/rmap.c | 28 +++++++++++++++++++++------- >>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>> index 0c0d8857dfce..f293d072368a 100644 >>>> --- a/mm/rmap.c >>>> +++ b/mm/rmap.c >>>> @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct >>>> vm_area_struct *vma, >>>> * This means the inc-and-test can be bypassed. >>>> * The folio does not have to be locked. >>>> * >>>> - * If the folio is large, it is accounted as a THP. As the folio >>>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio >>>> * is new, it's assumed to be mapped exclusively by a single process. >>>> */ >>>> void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct >>>> *vma, >>>> unsigned long address) >>>> { >>>> - int nr; >>>> + int nr = folio_nr_pages(folio); >>>> >>>> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); >>>> + VM_BUG_ON_VMA(address < vma->vm_start || >>>> + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); >>>> __folio_set_swapbacked(folio); >>>> >>>> - if (likely(!folio_test_pmd_mappable(folio))) { >>>> + if (!folio_test_large(folio)) { >>> >>> Why remove the "likely" here? The patch itself does not change anything about >>> that condition. >> >> Good question; I'm not sure why. Will have to put it down to bad copy/paste >> fixup. Will put it back in the next version. >> >>> >>>> /* increment count (starts at -1) */ >>>> atomic_set(&folio->_mapcount, 0); >>>> - nr = 1; >>>> + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); >>>> + } else if (!folio_test_pmd_mappable(folio)) { >>>> + int i; >>>> + >>>> + for (i = 0; i < nr; i++) { >>>> + struct page *page = folio_page(folio, i); >>>> + >>>> + /* increment count (starts at -1) */ >>>> + atomic_set(&page->_mapcount, 0); >>>> + __page_set_anon_rmap(folio, page, vma, >>>> + address + (i << PAGE_SHIFT), 1); >>>> + } >>>> + >>>> + /* increment count (starts at 0) */ >>> >>> That comment is a bit misleading. We're not talking about a mapcount as in the >>> other cases here. >> >> Correct, I'm talking about _nr_pages_mapped, which starts 0, not -1 like >> _mapcount. The comment was intended to be in the style used in other similar >> places in rmap.c. I could change it to: "_nr_pages_mapped is 0-based, so set it >> to the number of pages in the folio" or remove it entirely? What do you prefer? >> > > We only have to comment what's weird, not what's normal. > > IOW, we also didn't have such a comment in the existing code when doing > atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); > > > What might make sense here is a simple > > "All pages of the folio are PTE-mapped." > ACK - thanks.
diff --git a/mm/rmap.c b/mm/rmap.c index 0c0d8857dfce..f293d072368a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1278,31 +1278,45 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma, * This means the inc-and-test can be bypassed. * The folio does not have to be locked. * - * If the folio is large, it is accounted as a THP. As the folio + * If the folio is pmd-mappable, it is accounted as a THP. As the folio * is new, it's assumed to be mapped exclusively by a single process. */ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, unsigned long address) { - int nr; + int nr = folio_nr_pages(folio); - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma); + VM_BUG_ON_VMA(address < vma->vm_start || + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); __folio_set_swapbacked(folio); - if (likely(!folio_test_pmd_mappable(folio))) { + if (!folio_test_large(folio)) { /* increment count (starts at -1) */ atomic_set(&folio->_mapcount, 0); - nr = 1; + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); + } else if (!folio_test_pmd_mappable(folio)) { + int i; + + for (i = 0; i < nr; i++) { + struct page *page = folio_page(folio, i); + + /* increment count (starts at -1) */ + atomic_set(&page->_mapcount, 0); + __page_set_anon_rmap(folio, page, vma, + address + (i << PAGE_SHIFT), 1); + } + + /* increment count (starts at 0) */ + atomic_set(&folio->_nr_pages_mapped, nr); } else { /* increment count (starts at -1) */ atomic_set(&folio->_entire_mapcount, 0); atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED); - nr = folio_nr_pages(folio); + __page_set_anon_rmap(folio, &folio->page, vma, address, 1); __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr); } __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); - __page_set_anon_rmap(folio, &folio->page, vma, address, 1); } /**