
[RFC,1/1] mm: mark folio accessed in minor fault

Message ID 20231220102948.1963798-1-zhaoyang.huang@unisoc.com (mailing list archive)
State New

Commit Message

zhaoyang.huang Dec. 20, 2023, 10:29 a.m. UTC
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

An inactive mapped folio is promoted to active only when it is
scanned in shrink_inactive_list, while a vfs folio is promoted
immediately when it is accessed. This has two effects:

1. NR_ACTIVE_FILE is not as accurate as expected.
2. Low reclaim efficiency caused by dummy inactive folios which should
   have been kept as early as shrink_active_list.

I would like to suggest marking the folio accessed on a minor fault to
solve this.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/filemap.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Matthew Wilcox (Oracle) Dec. 20, 2023, 2:14 p.m. UTC | #1
On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> An inactive mapped folio is promoted to active only when it is
> scanned in shrink_inactive_list, while a vfs folio is promoted
> immediately when it is accessed. This has two effects:
>
> 1. NR_ACTIVE_FILE is not as accurate as expected.
> 2. Low reclaim efficiency caused by dummy inactive folios which should
>    have been kept as early as shrink_active_list.
>
> I would like to suggest marking the folio accessed on a minor fault to
> solve this.

This isn't going to be as effective as you imagine.  Almost all file
faults are handled through filemap_map_pages().  So I must ask, what
testing have you done with this patch?

And while you're gathering data, what effect would this patch have on your
workloads?

diff --git a/mm/filemap.c b/mm/filemap.c
index 2e6b1daac6cd..8cecf82dcc5a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3418,6 +3418,7 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
 		max_idx = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
 		if (xas->xa_index >= max_idx)
 			goto unlock;
+		folio_mark_accessed(folio);
 		return folio;
 unlock:
 		folio_unlock(folio);
Zhaoyang Huang Dec. 21, 2023, 1:58 a.m. UTC | #2
On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> This isn't going to be as effective as you imagine.  Almost all file
> faults are handled through filemap_map_pages().  So I must ask, what
> testing have you done with this patch?
>
> And while you're gathering data, what effect would this patch have on your
> workloads?
Thanks for the heads-up; I am out of date on the readahead mechanism. My
goal is to have mapped file pages behave like other pages, which can be
promoted immediately when they are accessed. I will update the patch
and provide benchmark data in a new patch set.

Matthew Wilcox (Oracle) Dec. 21, 2023, 4:09 a.m. UTC | #3
On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> Thanks for the heads-up; I am out of date on the readahead mechanism. My goal

It's not a terribly new mechanism ... filemap_map_pages() was added nine
years ago in 2014 by commit f1820361f83d

> is to have mapped file pages behave like other pages, which can be
> promoted immediately when they are accessed. I will update the patch
> and provide benchmark data in a new patch set.

Understood.  I don't know the history of this, so I'm not sure if the
decision to not mark folios as accessed here was intentional or not.
I suspect it's entirely unintentional.

By the way, rather than inserting an explicit call to folio_mark_accessed()
in filemap_fault(), change the filemap_get_folio() call to
__filemap_get_folio() and add FGP_ACCESSED to the fgp flags.
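
For readers following along: FGP_ACCESSED makes the lookup call folio_mark_accessed(), whose comment in mm/swap.c documents three transitions (inactive,unreferenced -> inactive,referenced; inactive,referenced -> active,unreferenced; active,unreferenced -> active,referenced). The following is a toy userspace sketch of just those transitions, not kernel code. The names `toy_flags` and `toy_mark_accessed` are hypothetical, and unevictable folios, LRU batching and workingset handling are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two folio flags discussed in this thread. */
struct toy_flags {
	bool referenced;	/* models PG_referenced */
	bool active;		/* models PG_active (folio on the active list) */
};

/* Toy model of the transitions documented above folio_mark_accessed(). */
static void toy_mark_accessed(struct toy_flags *f)
{
	if (!f->referenced) {
		/* x,unreferenced -> x,referenced */
		f->referenced = true;
	} else if (!f->active) {
		/* inactive,referenced -> active,unreferenced */
		f->active = true;
		f->referenced = false;
	}
	/* active,referenced: already in the final state, nothing to do */
}
```

In this model two calls are enough to move a fresh folio to the active state, which is why calling it from the fault path promotes mapped folios without waiting for a reclaim scan.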
Yu Zhao Dec. 21, 2023, 4:53 a.m. UTC | #4
On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> Understood.  I don't know the history of this, so I'm not sure if the
> decision to not mark folios as accessed here was intentional or not.
> I suspect it's entirely unintentional.

It's intentional. For the active/inactive LRU, all folios start
inactive. The first scan of a folio transfers the A-bit (if it's set
during the initial fault) to PG_referenced; the second scan of this
folio, if the A-bit is set again, moves it to the active list. This
way single-use folios, i.e., folios mapped for file streaming, can be
reclaimed quickly, since they are "demoted" rather than "promoted" on
the second scan. This RFC would regress memory streaming workloads.
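
The two-scan behaviour described above can be modelled in a few lines of userspace C. This is a toy sketch under simplifying assumptions, not the kernel's reclaim code: the names `toy_folio`, `toy_access` and `toy_scan_inactive` are hypothetical, a single boolean stands in for the PTE accessed bit, and details such as VM_EXEC handling and multiple mappings are ignored.

```c
#include <stdbool.h>

/* Toy model of one mapped file folio on the active/inactive LRU. */
struct toy_folio {
	bool a_bit;		/* models the PTE accessed bit (A-bit) */
	bool referenced;	/* models PG_referenced */
	bool active;		/* moved to the active list */
	bool reclaimed;		/* dropped by reclaim */
};

/* An access through the mapping sets the A-bit; no folio flag changes. */
static void toy_access(struct toy_folio *f)
{
	f->a_bit = true;
}

/* One inactive-list scan of this folio. */
static void toy_scan_inactive(struct toy_folio *f)
{
	bool young = f->a_bit;

	if (f->active || f->reclaimed)
		return;

	f->a_bit = false;		/* the scan tests and clears the A-bit */

	if (!young)
		f->reclaimed = true;	/* not touched since last scan: reclaim */
	else if (f->referenced)
		f->active = true;	/* A-bit seen on two scans: promote */
	else
		f->referenced = true;	/* first sighting: stay inactive */
}
```

A folio accessed once survives the first scan with only the referenced flag set and is reclaimed on the second; a folio accessed again between the two scans is promoted instead.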
Zhaoyang Huang Dec. 21, 2023, 6:28 a.m. UTC | #5
On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> It's intentional. For the active/inactive LRU, all folios start
> inactive. The first scan of a folio transfers the A-bit (if it's set
> during the initial fault) to PG_referenced; the second scan of this
> folio, if the A-bit is set again, moves it to the active list. This
> way single-use folios, i.e., folios mapped for file streaming, can be
> reclaimed quickly, since they are "demoted" rather than "promoted" on
> the second scan. This RFC would regress memory streaming workloads.
Thanks. Please correct me if I am wrong. IMO, there will be no
minor-fault for single-use folios which means RFC could behave the
same as mainline does now? I think it doesn't make sense to have
multiple-mapped pages filled in page_list to shrink_page_list since we
can distinguish them in advance.
Yu Zhao Dec. 21, 2023, 6:32 a.m. UTC | #6
On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> Thanks. Please correct me if I am wrong. IMO, there will be no
> minor-fault for single-use folios

Why not? What prevents a specific *access pattern* from triggering minor faults?

> which means RFC could behave the
> same as mainline does now? I think it doesn't make sense to have
> multiple-mapped pages filled in page_list to shrink_page_list since we
> can distinguish them in advance.
Zhaoyang Huang Dec. 22, 2023, 5:53 a.m. UTC | #7
On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > Thanks. Please correct me if I am wrong. IMO, there will be no
> > minor-fault for single-use folios
>
> Why not? What prevents a specific *access pattern* from triggering minor faults?
Please find the following chart of the mapped page state transitions.
We can see that:
1. RFC behaves the same as mainline in (1) and (2).
2. VM_EXEC mapped pages are activated earlier than in mainline, which
   helps improve scan efficiency in (3) and (4).
3. Non-VM_EXEC mapped pages are dropped during the 3rd scan, as vfs
   pages are.

(1)
                   1st access  shrink_active_list  1st scan(shrink_folio_list)  2nd scan(shrink_folio_list')
mainline           INA/UNR     NA                  INA/REF                      DROP
RFC                INA/UNR     NA                  INA/REF                      DROP

(2)
                   1st access  2nd access  shrink_active_list  1st scan(shrink_folio_list)
mainline           INA/UNR     INA/UNR     NA                  ACT/REF
RFC                INA/UNR     INA/REF     NA                  ACT/REF

(3)
                   1st access  shrink_active_list  1st scan(shrink_folio_list)  2nd access  2nd scan(shrink_active_list)  3rd scan(shrink_folio_list)
mainline           INA/UNR     NA                  INA/REF                      INA/REF     NA                            ACT/REF
RFC (VM_EXEC)      INA/UNR     NA                  INA/REF                      ACT/REF     ACT/REF                       NA
RFC (non VM_EXEC)  INA/UNR     NA                  INA/REF                      ACT/REF     INA/REF                       DROP

(4)
                   1st access  2nd access  3rd access  shrink_active_list  shrink_folio_list
mainline           INA/UNR     INA/UNR     INA/UNR     NA                  ACT/REF
RFC (VM_EXEC)      INA/UNR     INA/REF     ACT/REF     ACT/REF             NA
RFC (non VM_EXEC)  INA/UNR     INA/REF     ACT/REF     ACT/REF             NA
>
> > which means RFC could behave the
> > same as mainline does now? I think it doesn't make sense to have
> > multiple-mapped pages filled in page_list to shrink_page_list since we
> > can distinguish them in advance.
Yu Zhao Dec. 22, 2023, 6:14 a.m. UTC | #8
On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > minor-fault for single-use folios
> >
> > Why not? What prevents a specific *access pattern* from triggering minor faults?
> Please find the following chart of the mapped page state
> transitions.

I'm not sure what you are asking me to look at -- is the following
trying to illustrate something related to my question above?

zhaoyang.huang Dec. 22, 2023, 6:28 a.m. UTC | #9
On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > minor-fault for single-use folios
> >
> > Why not? What prevents a specific *access pattern* from triggering minor faults?
> Please find the following chart of the mapped page state
> transitions.

> I'm not sure what you are asking me to look at -- is the following
> trying to illustrate something related to my question above?

Sorry for the broken table formatting; resending it. I am trying to show how the RFC behaves across a page's state transitions.

1. RFC behaves the same as mainline in (1) and (2).
2. VM_EXEC mapped pages are activated earlier than in mainline, which helps improve scan efficiency in (3) and (4).
3. Non-VM_EXEC mapped pages are dropped during the 3rd scan, as vfs pages are.

(1)
                   1st access  shrink_active_list  1st scan(shrink_folio_list)  2nd scan(shrink_folio_list')
mainline           INA/UNR     NA                  INA/REF                      DROP
RFC                INA/UNR     NA                  INA/REF                      DROP

(2)
                   1st access  2nd access  shrink_active_list  1st scan(shrink_folio_list)
mainline           INA/UNR     INA/UNR     NA                  ACT/REF
RFC                INA/UNR     INA/REF     NA                  ACT/REF

(3)
                   1st access  1st scan(shrink_folio_list)  2nd access  2nd scan(shrink_active_list)  3rd scan(shrink_folio_list)
mainline           INA/UNR     INA/REF                      INA/REF     NA                            ACT/REF
RFC (VM_EXEC)      INA/UNR     INA/REF                      ACT/REF     ACT/REF                       NA
RFC (non VM_EXEC)  INA/UNR     INA/REF                      ACT/REF     INA/REF                       DROP

(4)
                   1st access  2nd access  3rd access  1st scan(shrink_active_list)  2nd scan(shrink_folio_list)
mainline           INA/UNR     INA/UNR     INA/UNR     NA                            ACT/REF
RFC (VM_EXEC)      INA/UNR     INA/REF     ACT/REF     ACT/REF                       NA
RFC (non VM_EXEC)  INA/UNR     INA/REF     ACT/REF     ACT/REF                       NA

> We can find that:
> 1. RFC behaves the same as the mainline in (1)(2)
> 2. VM_EXEC mapped pages are activated earlier than mainline which help
> improve scan efficiency in (3)(4)
> 3. none VM_EXEC mapped pages are dropped as vfs pages do during 3rd scan.
>
> (1)
>            1st access   shrink_active_list   1st scan(shrink_folio_list)   2nd scan(shrink_folio_list')
> mainline   INA/UNR      NA                   INA/REF                       DROP
> RFC        INA/UNR      NA                   INA/REF                       DROP
>
> (2)
>            1st access   2nd access   shrink_active_list   1st scan(shrink_folio_list)
> mainline   INA/UNR      INA/UNR      NA                   ACT/REF
> RFC        INA/UNR      INA/REF      NA                   ACT/REF
>
> (3)
>                     1st access   shrink_active_list   1st scan(shrink_folio_list)   2nd access   2nd scan(shrink_active_list)   3rd scan(shrink_folio_list)
> mainline            INA/UNR      NA                   INA/REF                       INA/REF      NA                             ACT/REF
> RFC (VM_EXEC)       INA/UNR      NA                   INA/REF                       ACT/REF      ACT/REF                        NA
> RFC (non VM_EXEC)   INA/UNR      NA                   INA/REF                       ACT/REF      INA/REF                        DROP
>
> (4)
>                     1st access   2nd access   3rd access   shrink_active_list   shrink_folio_list
> mainline            INA/UNR      INA/UNR      INA/UNR      NA                   ACT/REF
> RFC (VM_EXEC)       INA/UNR      INA/REF      ACT/REF      ACT/REF              NA
> RFC (non VM_EXEC)   INA/UNR      INA/REF      ACT/REF      ACT/REF              NA
> >
> > > which means RFC could behave the
> > > same as mainline does now? I think it doesn't make sense to have
> > > multiple-mapped pages filled in page_list to shrink_page_list since we
> > > can distinguish them in advance.
Yu Zhao Dec. 22, 2023, 6:45 a.m. UTC | #10
On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
<zhaoyang.huang@unisoc.com> wrote:
>
>
> On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > > >
> > > > > > > > > Inactive mapped folio will be promoted to active only when it is
> > > > > > > > > scanned in shrink_inactive_list, while the vfs folio will do this
> > > > > > > > > immidiatly when it is accessed. These will introduce two affections:
> > > > > > > > >
> > > > > > > > > 1. NR_ACTIVE_FILE is not accurate as expected.
> > > > > > > > > 2. Low reclaiming efficiency caused by dummy nactive folio which should
> > > > > > > > >    be kept as earlier as shrink_active_list.
> > > > > > > > >
> > > > > > > > > I would like to suggest mark the folio be accessed in minor fault to
> > > > > > > > > solve this situation.
> > > > > > > >
> > > > > > > > This isn't going to be as effective as you imagine.  Almost all file
> > > > > > > > faults are handled through filemap_map_pages().  So I must ask, what
> > > > > > > > testing have you done with this patch?
> > > > > > > >
> > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > workloads?
> > > > > > > Thanks for heads-up, I am out of date for readahead mechanism. My goal
> > > > > >
> > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > years ago in 2014 by commit f1820361f83d
> > > > > >
> > > > > > > is to have mapped file pages behave like other pages which could be
> > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > and provide benchmark data in new patch set.
> > > > > >
> > > > > > Understood.  I don't know the history of this, so I'm not sure if the
> > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > I suspect it's entirely unintentional.
> > > > >
> > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > minor-fault for single-use folios
> > >
> > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > Please find the following chart for mapped page state machine
> > transfication.
>
> > I'm not sure what you are asking me to look at -- is the following
> > trying to illustrate something related to my question above?
>
> sorry for my fault on table generation, resend it, I am trying to present how RFC performs in a page's stat transfer
>
> 1. RFC behaves the same as the mainline in (1)(2)
> 2. VM_EXEC mapped pages are activated earlier than mainline which help improve scan efficiency in (3)(4)
> 3. none VM_EXEC mapped pages are dropped as vfs pages do during 3rd scan.
>
> (1)
>                                   1st access         shrink_active_list              1st scan(shink_folio_list)       2nd scan(shrink_folio_list')
> mainline                     INA/UNR                        NA                          INA/REF                               DROP
> RFC                           INA/UNR                        NA                           INA/REF                              DROP

I don't think this is the case -- with this RFC, *readahead* folios,
which are added into pagecache as INA/UNR, become PG_referenced upon
the initial fault (first access), i.e., INA/REF. The first scan will
actually activate them, i.e., they become ACT/UNR, because they have
both PG_referenced and the A-bit.

So it doesn't behave the same way the mainline does for the first case
you listed. (I didn't look at the rest of the cases.)
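
To make the two sequences concrete, here is a minimal user-space sketch
of the transitions described above. It is a simplified model, not kernel
code: struct model_folio and the model_* helpers are hypothetical names,
a single mapping with a single A-bit is assumed, and VM_EXEC and dirty
handling are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space model; every name here is hypothetical, not kernel API. */
enum scan_result { SCAN_KEEP, SCAN_ACTIVATE, SCAN_RECLAIM };

struct model_folio {
	bool active;		/* on the active list (ACT) rather than inactive (INA) */
	bool pg_referenced;	/* models PG_referenced (REF/UNR) */
	bool a_bit;		/* models the accessed bit of the single PTE */
};

/* Models marking a folio accessed: the first call sets REF, the next one activates. */
static void model_mark_accessed(struct model_folio *f)
{
	if (!f->pg_referenced) {
		f->pg_referenced = true;
	} else if (!f->active) {
		f->active = true;
		f->pg_referenced = false;
	}
}

/* A fault sets the A-bit; with the RFC it additionally marks the folio accessed. */
static void model_fault(struct model_folio *f, bool rfc)
{
	f->a_bit = true;
	if (rfc)
		model_mark_accessed(f);
}

/* One pass of the inactive-list scan over this folio. */
static enum scan_result model_scan_inactive(struct model_folio *f)
{
	bool young = f->a_bit;
	bool was_referenced = f->pg_referenced;

	f->a_bit = false;
	f->pg_referenced = false;

	if (!young)
		return SCAN_RECLAIM;	/* not touched since the last scan */

	f->pg_referenced = true;	/* transfer the A-bit to PG_referenced */
	if (was_referenced) {
		f->active = true;	/* referenced twice: promote */
		return SCAN_ACTIVATE;
	}
	return SCAN_KEEP;		/* first reference: give it one more round */
}

/* Walks a single-use readahead folio through both variants. */
void model_demo(void)
{
	struct model_folio mainline = { false, false, false };
	struct model_folio rfc = { false, false, false };

	/* mainline: kept on the first scan, reclaimed on the second */
	model_fault(&mainline, false);
	assert(model_scan_inactive(&mainline) == SCAN_KEEP);
	assert(model_scan_inactive(&mainline) == SCAN_RECLAIM);

	/* RFC: PG_referenced and the A-bit are both set, so the first scan activates */
	model_fault(&rfc, true);
	assert(model_scan_inactive(&rfc) == SCAN_ACTIVATE);
}
```

In this model the single-use folio is reclaimed on the second scan
without the RFC, and activated on the first scan with it.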
Zhaoyang Huang Dec. 22, 2023, 8:41 a.m. UTC | #11
On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
> <zhaoyang.huang@unisoc.com> wrote:
> >
> >
> > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > >
> > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > > > >
> > > > > > > > > > Inactive mapped folio will be promoted to active only when it is
> > > > > > > > > > scanned in shrink_inactive_list, while the vfs folio will do this
> > > > > > > > > > immidiatly when it is accessed. These will introduce two affections:
> > > > > > > > > >
> > > > > > > > > > 1. NR_ACTIVE_FILE is not accurate as expected.
> > > > > > > > > > 2. Low reclaiming efficiency caused by dummy nactive folio which should
> > > > > > > > > >    be kept as earlier as shrink_active_list.
> > > > > > > > > >
> > > > > > > > > > I would like to suggest mark the folio be accessed in minor fault to
> > > > > > > > > > solve this situation.
> > > > > > > > >
> > > > > > > > > This isn't going to be as effective as you imagine.  Almost all file
> > > > > > > > > faults are handled through filemap_map_pages().  So I must ask, what
> > > > > > > > > testing have you done with this patch?
> > > > > > > > >
> > > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > > workloads?
> > > > > > > > Thanks for heads-up, I am out of date for readahead mechanism. My goal
> > > > > > >
> > > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > > years ago in 2014 by commit f1820361f83d
> > > > > > >
> > > > > > > > is to have mapped file pages behave like other pages which could be
> > > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > > and provide benchmark data in new patch set.
> > > > > > >
> > > > > > > Understood.  I don't know the history of this, so I'm not sure if the
> > > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > > I suspect it's entirely unintentional.
> > > > > >
> > > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > > minor-fault for single-use folios
> > > >
> > > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > > Please find the following chart for mapped page state machine
> > > transfication.
> >
> > > I'm not sure what you are asking me to look at -- is the following
> > > trying to illustrate something related to my question above?
> >
> > sorry for my fault on table generation, resend it, I am trying to present how RFC performs in a page's stat transfer
> >
> > 1. RFC behaves the same as the mainline in (1)(2)
> > 2. VM_EXEC mapped pages are activated earlier than mainline which help improve scan efficiency in (3)(4)
> > 3. none VM_EXEC mapped pages are dropped as vfs pages do during 3rd scan.
> >
> > (1)
> >                                   1st access         shrink_active_list              1st scan(shink_folio_list)       2nd scan(shrink_folio_list')
> > mainline                     INA/UNR                        NA                          INA/REF                               DROP
> > RFC                           INA/UNR                        NA                           INA/REF                              DROP
>
> I don't think this is the case -- with this RFC, *readahead* folios,
> which are added into pagecache as INA/UNR, become PG_referenced upon
> the initial fault (first access), i.e., INA/REF. The first scan will
> actually activate them, i.e., they become ACT/UNR, because they have
> both PG_referenced and the A-bit.
No, sorry for the confusion. This RFC actually targets minor faults on
pages that have already been faulted in (i.e. with a PTE set up). As
for readahead pages, could we handle them by adding one criterion as
below, which unifies the treatment of all kinds of mapped pages in the
RFC?

@@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
         */
        folio = filemap_get_folio(mapping, index);
        if (likely(!IS_ERR(folio))) {
+               /*
+                * try to promote inactive folio here when it is accessed
+                * as minor fault
+                */
+               if(folio_mapcount(folio))
+                       folio_mark_accessed(folio);
                /*
                 * We found the page, so try async readahead before waiting for
                 * the lock.

>
> So it doesn't behave the same way the mainline does for the first case
> you listed. (I didn't look at the rest of the cases.)
zhaoyang.huang Dec. 22, 2023, 9:41 a.m. UTC | #12
On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
> <zhaoyang.huang@unisoc.com> wrote:
> >
> >
> > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > >
> > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > > > >
> > > > > > > > > > Inactive mapped folio will be promoted to active only when it is
> > > > > > > > > > scanned in shrink_inactive_list, while the vfs folio will do this
> > > > > > > > > > immidiatly when it is accessed. These will introduce two affections:
> > > > > > > > > >
> > > > > > > > > > 1. NR_ACTIVE_FILE is not accurate as expected.
> > > > > > > > > > 2. Low reclaiming efficiency caused by dummy nactive folio which should
> > > > > > > > > >    be kept as earlier as shrink_active_list.
> > > > > > > > > >
> > > > > > > > > > I would like to suggest mark the folio be accessed in minor fault to
> > > > > > > > > > solve this situation.
> > > > > > > > >
> > > > > > > > > This isn't going to be as effective as you imagine.  Almost all file
> > > > > > > > > faults are handled through filemap_map_pages().  So I must ask, what
> > > > > > > > > testing have you done with this patch?
> > > > > > > > >
> > > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > > workloads?
> > > > > > > > Thanks for heads-up, I am out of date for readahead mechanism. My goal
> > > > > > >
> > > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > > years ago in 2014 by commit f1820361f83d
> > > > > > >
> > > > > > > > is to have mapped file pages behave like other pages which could be
> > > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > > and provide benchmark data in new patch set.
> > > > > > >
> > > > > > > Understood.  I don't know the history of this, so I'm not sure if the
> > > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > > I suspect it's entirely unintentional.
> > > > > >
> > > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > > minor-fault for single-use folios
> > > >
> > > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > > Please find the following chart for mapped page state machine
> > > transfication.
> >
> > > I'm not sure what you are asking me to look at -- is the following
> > > trying to illustrate something related to my question above?
> >
> > sorry for my fault on table generation, resend it, I am trying to present how RFC performs in a page's stat transfer
> >
> > 1. RFC behaves the same as the mainline in (1)(2)
> > 2. VM_EXEC mapped pages are activated earlier than mainline which help improve scan efficiency in (3)(4)
> > 3. none VM_EXEC mapped pages are dropped as vfs pages do during 3rd scan.
> >
> > (1)
> >                                   1st access         shrink_active_list              1st scan(shink_folio_list)       2nd scan(shrink_folio_list')
> > mainline                     INA/UNR                        NA                          INA/REF                               DROP
> > RFC                           INA/UNR                        NA                           INA/REF                              DROP
>
> > I don't think this is the case -- with this RFC, *readahead* folios,
> > which are added into pagecache as INA/UNR, become PG_referenced upon
> > the initial fault (first access), i.e., INA/REF. The first scan will
> > actually activate them, i.e., they become ACT/UNR, because they have
> > both PG_referenced and the A-bit.
> No,Sorry for the confusion. This RFC actually aims at minor fault of
> the faulted pages(with one pte setup). In terms of the readahead
> pages, can we solve it by add one criteria as bellow, which unifies
> all kinds of mapped pages in RFC.
>
> @@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>         */
>        folio = filemap_get_folio(mapping, index);
>        if (likely(!IS_ERR(folio))) {
> +               /*
> +                * try to promote inactive folio here when it is accessed
> +                * as minor fault
> +                */
> +               if(folio_mapcount(folio))
> +                       folio_mark_accessed(folio);
>                /*
>                 * We found the page, so try async readahead before waiting for
>                 * the lock.
>
Please find below the state machine table for the updated RFC, where the RFC either behaves the same as mainline or improves scan efficiency by promoting the page in shrink_active_list.

(1)
           1st access   shrink_active_list   1st scan(shrink_folio_list)   2nd scan(shrink_folio_list')
mainline   INA/UNR      NA                   INA/REF                       DROP
RFC        INA/UNR      NA                   INA/REF                       DROP
RA         INA/UNR      NA                   INA/REF                       DROP

(2)
           1st access   2nd access   shrink_active_list   1st scan(shrink_folio_list)
mainline   INA/UNR      INA/UNR      NA                   ACT/REF
RFC        INA/UNR      INA/REF      NA                   ACT/REF
RA         INA/UNR      INA/REF      NA                   ACT/REF

(3)
                    1st access   1st scan(shrink_folio_list)   2nd access   2nd scan(shrink_active_list)   3rd scan(shrink_folio_list)
mainline            INA/UNR      INA/REF                       INA/REF      NA                             ACT/REF
RFC (VM_EXEC)       INA/UNR      INA/REF                       ACT/REF      ACT/REF                        NA
RFC (non VM_EXEC)   INA/UNR      INA/REF                       ACT/REF      INA/REF                        DROP
RA                  INA/UNR      INA/REF                       INA/REF      NA                             ACT/REF

(4)
                    1st access   2nd access   3rd access   1st scan(shrink_active_list)   2nd scan(shrink_folio_list)
mainline            INA/UNR      INA/UNR      INA/UNR      NA                             ACT/REF
RFC (VM_EXEC)       INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
RFC (non VM_EXEC)   INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
RA                  INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
> >
> > So it doesn't behave the same way the mainline does for the first case
> > you listed. (I didn't look at the rest of the cases.)
Yu Zhao Dec. 23, 2023, 2:41 a.m. UTC | #13
On Fri, Dec 22, 2023 at 2:41 AM 黄朝阳 (Zhaoyang Huang)
<zhaoyang.huang@unisoc.com> wrote:
>
>
>
> On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
> > <zhaoyang.huang@unisoc.com> wrote:
> > >
> > >
> > > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > > > > >
> > > > > > > > > > > Inactive mapped folio will be promoted to active only when it is
> > > > > > > > > > > scanned in shrink_inactive_list, while the vfs folio will do this
> > > > > > > > > > > immidiatly when it is accessed. These will introduce two affections:
> > > > > > > > > > >
> > > > > > > > > > > 1. NR_ACTIVE_FILE is not accurate as expected.
> > > > > > > > > > > 2. Low reclaiming efficiency caused by dummy nactive folio which should
> > > > > > > > > > >    be kept as earlier as shrink_active_list.
> > > > > > > > > > >
> > > > > > > > > > > I would like to suggest mark the folio be accessed in minor fault to
> > > > > > > > > > > solve this situation.
> > > > > > > > > >
> > > > > > > > > > This isn't going to be as effective as you imagine.  Almost all file
> > > > > > > > > > faults are handled through filemap_map_pages().  So I must ask, what
> > > > > > > > > > testing have you done with this patch?
> > > > > > > > > >
> > > > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > > > workloads?
> > > > > > > > > Thanks for heads-up, I am out of date for readahead mechanism. My goal
> > > > > > > >
> > > > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > > > years ago in 2014 by commit f1820361f83d
> > > > > > > >
> > > > > > > > > is to have mapped file pages behave like other pages which could be
> > > > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > > > and provide benchmark data in new patch set.
> > > > > > > >
> > > > > > > > Understood.  I don't know the history of this, so I'm not sure if the
> > > > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > > > I suspect it's entirely unintentional.
> > > > > > >
> > > > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > > > minor-fault for single-use folios
> > > > >
> > > > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > > > Please find the following chart for mapped page state machine
> > > > transfication.
> > >
> > > > I'm not sure what you are asking me to look at -- is the following
> > > > trying to illustrate something related to my question above?
> > >
> > > sorry for my fault on table generation, resend it, I am trying to present how RFC performs in a page's stat transfer
> > >
> > > 1. RFC behaves the same as the mainline in (1)(2)
> > > 2. VM_EXEC mapped pages are activated earlier than mainline which help improve scan efficiency in (3)(4)
> > > 3. none VM_EXEC mapped pages are dropped as vfs pages do during 3rd scan.
> > >
> > > (1)
> > >                                   1st access         shrink_active_list              1st scan(shink_folio_list)       2nd scan(shrink_folio_list')
> > > mainline                     INA/UNR                        NA                          INA/REF                               DROP
> > > RFC                           INA/UNR                        NA                           INA/REF                              DROP
> >
> > > I don't think this is the case -- with this RFC, *readahead* folios,
> > > which are added into pagecache as INA/UNR, become PG_referenced upon
> > > the initial fault (first access), i.e., INA/REF. The first scan will
> > > actually activate them, i.e., they become ACT/UNR, because they have
> > > both PG_referenced and the A-bit.
> > No,Sorry for the confusion. This RFC actually aims at minor fault of
> > the faulted pages(with one pte setup). In terms of the readahead
> > pages, can we solve it by add one criteria as bellow, which unifies
> > all kinds of mapped pages in RFC.

Again this is still wrong -- how do you know the other process mapping
this folio isn't also streaming the file?

It'd be best to take a step back and think through my original
question: what prevents a specific *access pattern* from triggering
minor faults? The simple answer is that you can't.
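
One way to see this from user space is a single streaming pass over a
file whose pages are already in the page cache. The following is a
minimal sketch, assuming Linux and a writable /tmp;
stream_cached_file_once() is a hypothetical helper written for this
illustration. Every page is touched exactly once, and the faults taken
during the pass are counted through getrusage().

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: write npages pages to a temporary file (so that they
 * sit in the page cache), map the file and read each page exactly once.
 * Returns the number of pages read, or -1 on error; *minor_faults receives
 * the number of minor faults taken during the single pass.
 */
long stream_cached_file_once(long npages, long *minor_faults)
{
	char path[] = "/tmp/stream-XXXXXX";
	long page_size = sysconf(_SC_PAGESIZE);
	struct rusage before, after;
	unsigned char *map;
	size_t len;
	char *buf;
	long touched = 0;
	int fd;

	if (npages <= 0 || page_size <= 0)
		return -1;
	len = (size_t)npages * (size_t)page_size;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file disappears once fd is closed */

	buf = malloc((size_t)page_size);
	if (!buf) {
		close(fd);
		return -1;
	}
	memset(buf, 1, (size_t)page_size);	/* each page starts with byte 1 */
	for (long i = 0; i < npages; i++) {
		if (write(fd, buf, (size_t)page_size) != page_size) {
			free(buf);
			close(fd);
			return -1;
		}
	}
	free(buf);

	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	getrusage(RUSAGE_SELF, &before);
	for (long i = 0; i < npages; i++)	/* one read per page, never revisited */
		touched += map[(size_t)i * (size_t)page_size];
	getrusage(RUSAGE_SELF, &after);

	*minor_faults = after.ru_minflt - before.ru_minflt;
	munmap(map, len);
	return touched;
}
```

The exact count depends on fault-around and on the filesystem behind
/tmp, but on a typical system it is non-zero even though no page is
ever used twice.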

> > @@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >         */
> >        folio = filemap_get_folio(mapping, index);
> >        if (likely(!IS_ERR(folio))) {
> > +               /*
> > +                * try to promote inactive folio here when it is accessed
> > +                * as minor fault
> > +                */
> > +               if(folio_mapcount(folio))
> > +                       folio_mark_accessed(folio);
> >                /*
> >                 * We found the page, so try async readahead before waiting for
> >                 * the lock.
> >
> Please find bellow for the stat machine table of updated RFC, where RFC behaves same or enhances the scan efficiency by promoting the page in shrink_active_list.
>
> (1)
>                                   1st access         shrink_active_list              1st scan(shink_folio_list)       2nd scan(shrink_folio_list')
> mainline                     INA/UNR                        NA                          INA/REF                               DROP
> RFC                           INA/UNR                        NA                          INA/REF                              DROP
> RA                              INA/UNR                        NA                          INA/REF                              DROP
>
> (2)
>                                   1st access                 2nd access               shrink_active_list          1st scan(shink_folio_list)
> mainline                     INA/UNR                     INA/UNR                       NA                                 ACT/REF
> RFC                           INA/UNR                     INA/REF                       NA                                 ACT/REF
> RA                              INA/UNR                     INA/REF                       NA                                 ACT/REF
>
> (3)
>                                   1st access        1st scan(shink_folio_list)       2nd access      2nd scan(shrink_active_list)     3rd scan(shrink_folio_list)
> mainline                     INA/UNR                    INA/REF                           INA/REF                NA                                     ACT/REF
> RFC                           INA/UNR                    INA/REF                           ACT/REF                ACT/REF                           NA
> (VM_EXEC)
> RFC                           INA/UNR                    INA/REF                           ACT/REF                INA/REF                            DROP
> (non VM_EXEC)
> RA                              INA/UNR                    INA/REF                           INA/REF                NA                                     ACT/REF
>
> (4)
>                                   1st access                   2nd access                   3rd access          1st scan(shrink_active_list)   2nd scan(shink_folio_list)
> mainline                     INA/UNR                       INA/UNR                       INA/UNR                    NA                                 ACT/REF
> RFC                           INA/UNR                       INA/REF                         ACT/REF                  ACT/REF                       NA
> (VM_EXEC)
> RFC                           INA/UNR                       INA/REF                        ACT/REF                   ACT/REF                       NA
> (Non VM_EXEC)
> RA                              INA/UNR                       INA/REF                        ACT/REF                   ACT/REF                       NA
> > >
> > > So it doesn't behave the same way the mainline does for the first case
> > > you listed. (I didn't look at the rest of the cases.)
Zhaoyang Huang Jan. 2, 2024, 5:36 a.m. UTC | #14
I updated the patch on v5.15 as below [1]. It calls
mark_page_accessed() on mapped pages in filemap_map_pages() and in the
minor-fault path of filemap_fault(), so that non-single-use pages are
promoted earlier than the first scan promotes them today (pages
brought in by a major fault are left alone). The patch was verified on
an Android system with 2GB of RAM using a test script that loops the
following 4 steps [2] 30 times and counts the trace events
mm_filemap_add_to_page_cache and mm_filemap_delete_from_page_cache,
which can be taken as a measure of page cache retention (or, equally,
of thrashing). According to the test results [3], the RFC saves 10% of
the add_to events in each phase and gives a better APP start time (5%
lower).

[1]
diff --git a/mm/filemap.c b/mm/filemap.c
index 279380c..308e415 100644
+++ b/mm/filemap.c

@@ -3124,6 +3125,7 @@
  filemap_invalidate_lock_shared(mapping);
  mapping_locked = true;
  }
+ mark_page_accessed(page);
  } else {
  /* No page in the page cache at all */
  count_vm_event(PGMAJFAULT);
@@ -3147,7 +3149,7 @@
  goto out_retry;
  filemap_invalidate_unlock_shared(mapping);
  return VM_FAULT_OOM;
  }
 }

  if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
@@ -3388,8 +3390,10 @@

  /* We're about to handle the fault */
  if (vmf->address == addr)
   ret = VM_FAULT_NOPAGE;

+ if (page_mapcount(page))
+ mark_page_accessed(page);
  do_set_pte(vmf, page, addr);
  /* no need to invalidate: a not-present page won't be cached */
  update_mmu_cache(vma, addr, vmf->pte);

[2]
1. start an APP
2. malloc and mlock 512MB pages
3. restart the APP
4. kill the APP
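
Step 2 of the loop above can be approximated by a small C helper. This is only a sketch under assumptions: the actual test script is not shown in this thread, and the names alloc_and_pin() and release_pinned() are hypothetical. The idea is to allocate a buffer, touch it so that it is really backed by memory, and lock it so that it cannot be reclaimed, which puts pressure on the page cache.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate `bytes` of anonymous memory, write to
 * every byte so the pages are populated, then pin them with mlock().
 * Returns the buffer on success, or NULL if allocation or locking fails.
 */
void *alloc_and_pin(size_t bytes)
{
    void *buf = malloc(bytes);

    if (!buf)
        return NULL;

    /* Touch the whole range so it is backed by real pages. */
    memset(buf, 0xA5, bytes);

    /* Pin the range; this can fail if RLIMIT_MEMLOCK is too low. */
    if (mlock(buf, bytes) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: undo alloc_and_pin(). */
void release_pinned(void *buf, size_t bytes)
{
    munlock(buf, bytes);
    free(buf);
}
```

For the 512MB case this would be called as alloc_and_pin(512UL << 20). The call may return NULL when the requested size exceeds RLIMIT_MEMLOCK and the caller lacks CAP_IPC_LOCK.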

[3]
                                   v5.15                       RFC
                                   add_to        delete_from   add_to        delete_from
1. start an APP                    41290         88279         32235         79305
2. malloc and mlock 512MB pages    342103        374396        304310        339048
3. restart the APP                 46552         162456        42279         176368
4. kill the APP
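
The relative change in add_to events can be recomputed from the table in [3]. A minimal sketch in C follows; reduction_pct(), print_add_to_reductions() and struct phase_counts are hypothetical names, and the counts are copied from the table.

```c
#include <stdio.h>

/* add_to counts per phase, copied from the results table [3]. */
struct phase_counts {
    const char *phase;
    long base_add_to;   /* add_to events without the RFC */
    long rfc_add_to;    /* add_to events with the RFC */
};

static const struct phase_counts phases[] = {
    { "1. start an APP",                 41290,  32235  },
    { "2. malloc and mlock 512MB pages", 342103, 304310 },
    { "3. restart the APP",              46552,  42279  },
};

/* Relative reduction, in percent, from a baseline count to a patched count. */
double reduction_pct(long baseline, long patched)
{
    return 100.0 * (double)(baseline - patched) / (double)baseline;
}

/* Print one line per phase with the add_to reduction. */
void print_add_to_reductions(void)
{
    for (size_t i = 0; i < sizeof(phases) / sizeof(phases[0]); i++)
        printf("%-32s %5.1f%%\n", phases[i].phase,
               reduction_pct(phases[i].base_add_to, phases[i].rfc_add_to));
}
```

With the counts above this gives reductions of roughly 22%, 11% and 9% for steps 1, 2 and 3 respectively.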

On Sat, Dec 23, 2023 at 10:41 AM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, Dec 22, 2023 at 2:41 AM 黄朝阳 (Zhaoyang Huang)
> <zhaoyang.huang@unisoc.com> wrote:
> >
> >
> >
> > On Fri, Dec 22, 2023 at 2:45 PM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Thu, Dec 21, 2023 at 11:29 PM 黄朝阳 (Zhaoyang Huang)
> > > <zhaoyang.huang@unisoc.com> wrote:
> > > >
> > > >
> > > > On Thu, Dec 21, 2023 at 10:53 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > >
> > > > > On Thu, Dec 21, 2023 at 2:33 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 11:28 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 12:53 PM Yu Zhao <yuzhao@google.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 9:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Dec 21, 2023 at 09:58:25AM +0800, Zhaoyang Huang wrote:
> > > > > > > > > > On Wed, Dec 20, 2023 at 10:14 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 20, 2023 at 06:29:48PM +0800, zhaoyang.huang wrote:
> > > > > > > > > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > > > > > > > > >
> > > > > > > > > > > > > Inactive mapped folio will be promoted to active only when it is
> > > > > > > > > > > > > scanned in shrink_inactive_list, while the vfs folio will do this
> > > > > > > > > > > > > immediately when it is accessed. This has two effects:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. NR_ACTIVE_FILE is not as accurate as expected.
> > > > > > > > > > > > > 2. Low reclaiming efficiency caused by dummy inactive folio which should
> > > > > > > > > > > > >    be kept as earlier as shrink_active_list.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would like to suggest marking the folio accessed in minor fault to
> > > > > > > > > > > > > solve this situation.
> > > > > > > > > > >
> > > > > > > > > > > This isn't going to be as effective as you imagine.  Almost all file
> > > > > > > > > > > faults are handled through filemap_map_pages().  So I must ask, what
> > > > > > > > > > > testing have you done with this patch?
> > > > > > > > > > >
> > > > > > > > > > > And while you're gathering data, what effect would this patch have on your
> > > > > > > > > > > workloads?
> > > > > > > > > > Thanks for heads-up, I am out of date for readahead mechanism. My goal
> > > > > > > > >
> > > > > > > > > It's not a terribly new mechanism ... filemap_map_pages() was added nine
> > > > > > > > > years ago in 2014 by commit f1820361f83d
> > > > > > > > >
> > > > > > > > > > is to have mapped file pages behave like other pages which could be
> > > > > > > > > > promoted immediately when they are accessed. I will update the patch
> > > > > > > > > > and provide benchmark data in new patch set.
> > > > > > > > >
> > > > > > > > > Understood.  I don't know the history of this, so I'm not sure if the
> > > > > > > > > decision to not mark folios as accessed here was intentional or not.
> > > > > > > > > I suspect it's entirely unintentional.
> > > > > > > >
> > > > > > > > It's intentional. For the active/inactive LRU, all folios start
> > > > > > > > inactive. The first scan of a folio transfers the A-bit (if it's set
> > > > > > > > during the initial fault) to PG_referenced; the second scan of this
> > > > > > > > folio, if the A-bit is set again, moves it to the active list. This
> > > > > > > > way single-use folios, i.e., folios mapped for file streaming, can be
> > > > > > > > reclaimed quickly, since they are "demoted" rather than "promoted" on
> > > > > > > > the second scan. This RFC would regress memory streaming workloads.
> > > > > > > Thanks. Please correct me if I am wrong. IMO, there will be no
> > > > > > > minor-fault for single-use folios
> > > > > >
> > > > > > Why not? What prevents a specific *access pattern* from triggering minor faults?
> > > > > > Please find the following chart of the mapped page state machine
> > > > > > transitions.
> > > >
> > > > > I'm not sure what you are asking me to look at -- is the following
> > > > > trying to illustrate something related to my question above?
> > > >
> > > > Sorry for my mistake in generating the table; resending it. I am trying to show how the RFC behaves across a page's state transitions.
> > > >
> > > > 1. RFC behaves the same as the mainline in (1) and (2).
> > > > 2. VM_EXEC mapped pages are activated earlier than in mainline, which helps improve scan efficiency in (3) and (4).
> > > > 3. Non-VM_EXEC mapped pages are dropped during the 3rd scan, as vfs pages are.
> > > >
> > > > (1)
> > > >             1st access   shrink_active_list   1st scan(shrink_folio_list)   2nd scan(shrink_folio_list)
> > > > mainline    INA/UNR      NA                   INA/REF                       DROP
> > > > RFC         INA/UNR      NA                   INA/REF                       DROP
> > >
> > > > I don't think this is the case -- with this RFC, *readahead* folios,
> > > > which are added into pagecache as INA/UNR, become PG_referenced upon
> > > > the initial fault (first access), i.e., INA/REF. The first scan will
> > > > actually activate them, i.e., they become ACT/UNR, because they have
> > > > both PG_referenced and the A-bit.
> > > No, sorry for the confusion. This RFC actually targets minor faults on
> > > already-faulted pages (with one pte set up). As for the readahead
> > > pages, can we solve that by adding one criterion as below, which unifies
> > > all kinds of mapped pages in the RFC?
>
> Again this is still wrong -- how do you know the other process mapping
> this folio isn't also streaming the file?
>
> It'd be best to take a step back and think through my original
> question: what prevents a specific *access pattern* from triggering
> minor faults? The simple answer is that you can't.
I agree with that, which leaves me more puzzled. The RFC's goal is that
the more minor faults a page takes, the sooner it gets promoted, as vfs
pages are.

It's intentional. For the active/inactive LRU, all folios start
inactive. The first scan of a folio transfers the A-bit (if it's set
during the initial fault) to PG_referenced;
[RFC behaves the same as above]
the second scan of this folio, if the A-bit is set again, moves it to
the active list.
[The RFC does NOT go against this; it just lets minor faults promote the page in advance]
This way single-use folios, i.e., folios mapped for file streaming,
can be reclaimed quickly, since they are "demoted" rather than
"promoted" on the second scan. This RFC would regress memory streaming
workloads.
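
To make the two promotion paths easier to compare, here is a toy userspace model in C. It is not kernel code and rests on stated assumptions: struct folio_model and the model_*() functions are hypothetical, each fault is treated as a new mapping, VM_EXEC handling is left out, and the scan rule is a simplification of the two-scan behaviour described above. In this model activation through the mark-accessed path clears the referenced flag, so the state reads ACT/UNR where the tables in this thread write ACT/REF.

```c
#include <stdbool.h>

enum scan_result { SCAN_KEEP, SCAN_ACTIVATE, SCAN_RECLAIM };

/* Toy stand-in for a file folio on the active/inactive LRU. */
struct folio_model {
    bool active;      /* on the active list (ACT) rather than inactive (INA) */
    bool referenced;  /* stands in for PG_referenced (REF vs UNR) */
    int  young_ptes;  /* PTEs whose accessed bit (A-bit) is currently set */
    int  mapcount;    /* number of mappings set up so far */
};

/* Two-step promotion: first call sets REF, second call activates. */
static void model_mark_accessed(struct folio_model *f)
{
    if (!f->referenced) {
        f->referenced = true;
    } else if (!f->active) {
        f->active = true;
        f->referenced = false;
    }
}

/*
 * One page fault that maps the folio. With rfc == true, a fault on an
 * already-mapped folio (a minor fault in this model) also marks it accessed.
 */
void model_fault(struct folio_model *f, bool rfc)
{
    if (rfc && f->mapcount > 0)
        model_mark_accessed(f);
    f->mapcount++;
    f->young_ptes++;   /* the new PTE starts with its A-bit set */
}

/*
 * One scan of an inactive folio: collect and clear the A-bits, then
 * keep, activate or reclaim based on them and on the referenced flag.
 */
enum scan_result model_scan_inactive(struct folio_model *f)
{
    int young = f->young_ptes;
    bool was_referenced = f->referenced;

    f->young_ptes = 0;
    if (young == 0) {
        f->referenced = false;
        return SCAN_RECLAIM;
    }
    f->referenced = true;
    if (was_referenced || young > 1) {
        f->active = true;
        return SCAN_ACTIVATE;
    }
    return SCAN_KEEP;
}
```

In this model a folio faulted once is kept on the first scan and reclaimed on the second, with or without the RFC. A folio faulted three times with no scan in between is still inactive under the mainline rule but already active under the RFC rule, which is the kind of access pattern the streaming concern above is about.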


>
> > > @@ -3273,6 +3273,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> > >         */
> > >        folio = filemap_get_folio(mapping, index);
> > >        if (likely(!IS_ERR(folio))) {
> > > +               /*
> > > +                * try to promote inactive folio here when it is accessed
> > > +                * as minor fault
> > > +                */
> > > +               if(folio_mapcount(folio))
> > > +                       folio_mark_accessed(folio);
> > >                /*
> > >                 * We found the page, so try async readahead before waiting for
> > >                 * the lock.
> > >
> > Please find below the state machine table for the updated RFC, where the RFC either behaves the same as mainline or improves scan efficiency by promoting the page in shrink_active_list.
> >
> > (1)
> >             1st access   shrink_active_list   1st scan(shrink_folio_list)   2nd scan(shrink_folio_list)
> > mainline    INA/UNR      NA                   INA/REF                       DROP
> > RFC         INA/UNR      NA                   INA/REF                       DROP
> > RA          INA/UNR      NA                   INA/REF                       DROP
> >
> > (2)
> >             1st access   2nd access   shrink_active_list   1st scan(shrink_folio_list)
> > mainline    INA/UNR      INA/UNR      NA                   ACT/REF
> > RFC         INA/UNR      INA/REF      NA                   ACT/REF
> > RA          INA/UNR      INA/REF      NA                   ACT/REF
> >
> > (3)
> >                     1st access   1st scan(shrink_folio_list)   2nd access   2nd scan(shrink_active_list)   3rd scan(shrink_folio_list)
> > mainline            INA/UNR      INA/REF                       INA/REF      NA                             ACT/REF
> > RFC (VM_EXEC)       INA/UNR      INA/REF                       ACT/REF      ACT/REF                        NA
> > RFC (non VM_EXEC)   INA/UNR      INA/REF                       ACT/REF      INA/REF                        DROP
> > RA                  INA/UNR      INA/REF                       INA/REF      NA                             ACT/REF
> >
> > (4)
> >                     1st access   2nd access   3rd access   1st scan(shrink_active_list)   2nd scan(shrink_folio_list)
> > mainline            INA/UNR      INA/UNR      INA/UNR      NA                             ACT/REF
> > RFC (VM_EXEC)       INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
> > RFC (non VM_EXEC)   INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
> > RA                  INA/UNR      INA/REF      ACT/REF      ACT/REF                        NA
> > > >
> > > > So it doesn't behave the same way the mainline does for the first case
> > > > you listed. (I didn't look at the rest of the cases.)
Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index f0a15ce1bd1b..b1ee6ce716c2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3273,6 +3273,11 @@  vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	folio = filemap_get_folio(mapping, index);
 	if (likely(!IS_ERR(folio))) {
+		/*
+		 * try to promote inactive folio here when it is accessed
+		 * as minor fault
+		 */
+		folio_mark_accessed(folio);
 		/*
 		 * We found the page, so try async readahead before waiting for
 		 * the lock.