[RFC,1/1] mm: Skip folio with private data during isolation

Message ID	20240826085056.895865-1-zhaoyang.huang@unisoc.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: "zhaoyang.huang" <zhaoyang.huang@unisoc.com> To: Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Matthew Wilcox <willy@infradead.org>, Yu Zhao <yuzhao@google.com>, Baolin Wang <baolin.wang@linux.alibaba.com>, <linux-mm@kvack.org>, Zhaoyang Huang <huangzhaoyang@gmail.com>, <steve.kang@unisoc.com> Subject: [RFC PATCH 1/1] mm: Skip folio with private data during isolation Date: Mon, 26 Aug 2024 16:50:56 +0800 Message-ID: <20240826085056.895865-1-zhaoyang.huang@unisoc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	[RFC,1/1] mm: Skip folio with private data during isolation \| expand [RFC,1/1] mm: Skip folio with private data during isolation

zhaoyang.huang Aug. 26, 2024, 8:50 a.m. UTC

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Since clean target folio with private data will be given up finally in
__remove_mapping as it has extra refcnt, it is better to skip it during
isolation to save the slot for more qualified folio. Current one could
be the candidate for next round of scanning after the private data gone.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/vmscan.c | 2 ++
 1 file changed, 2 insertions(+)

David Hildenbrand Aug. 26, 2024, 10:36 a.m. UTC | #1

On 26.08.24 10:50, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Since clean target folio with private data will be given up finally in
> __remove_mapping as it has extra refcnt, it is better to skip it during
> isolation to save the slot for more qualified folio. Current one could
> be the candidate for next round of scanning after the private data gone.
> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>   mm/vmscan.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cfa839284b92..755bf3a387f3 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1685,6 +1685,8 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>   		 */
>   		scan += nr_pages;
>   
> +		if (folio_test_private(folio) && !folio_test_dirty(folio))
> +			goto move;
>   		if (!folio_test_lru(folio))
>   			goto move;
>   		if (!sc->may_unmap && folio_mapped(folio))

An earlier filemap_release_folio() would have failed if the private data 
(buffers) cannot get freed, and we went into the activate_locked path.


if (folio_needs_release(folio)) {
	if (!filemap_release_folio(folio, sc->gfp_mask)
		goto activate_locked;
...

if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
	...
} else if (!mapping || !__remove_mapping(mapping, folio, true,
}

At least on the shrink_folio_list() path, I'm not sure the code you are 
adding could even trigger. We should not reach __remove_mapping() with 
folio_test_private().

Zhaoyang Huang Aug. 26, 2024, 11:10 a.m. UTC | #2

On Mon, Aug 26, 2024 at 6:36 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 26.08.24 10:50, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Since clean target folio with private data will be given up finally in
> > __remove_mapping as it has extra refcnt, it is better to skip it during
> > isolation to save the slot for more qualified folio. Current one could
> > be the candidate for next round of scanning after the private data gone.
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >   mm/vmscan.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index cfa839284b92..755bf3a387f3 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1685,6 +1685,8 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >                */
> >               scan += nr_pages;
> >
> > +             if (folio_test_private(folio) && !folio_test_dirty(folio))
> > +                     goto move;
> >               if (!folio_test_lru(folio))
> >                       goto move;
> >               if (!sc->may_unmap && folio_mapped(folio))
>
> An earlier filemap_release_folio() would have failed if the private data
> (buffers) cannot get freed, and we went into the activate_locked path.
>
>
> if (folio_needs_release(folio)) {
>         if (!filemap_release_folio(folio, sc->gfp_mask)
>                 goto activate_locked;
> ...
>
> if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
>         ...
> } else if (!mapping || !__remove_mapping(mapping, folio, true,
> }
>
> At least on the shrink_folio_list() path, I'm not sure the code you are
> adding could even trigger. We should not reach __remove_mapping() with
> folio_test_private().
Thanks for heads up. You are right, the bh is judged if existing
before __remove_mapping. ASAIU, the metadata associated with the bh
has risk to be freed such as journal data etc or it introduces extra
IO. Actually, this patch is inspired by a practical problem we just
run across which the bh remains on LRU for a long time since it is
attached to a journal_head that can not be freed by jbd2.
>
> --
> Cheers,
>
> David / dhildenb
>

David Hildenbrand Aug. 26, 2024, 1:01 p.m. UTC | #3

On 26.08.24 13:10, Zhaoyang Huang wrote:
> On Mon, Aug 26, 2024 at 6:36 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 26.08.24 10:50, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> Since clean target folio with private data will be given up finally in
>>> __remove_mapping as it has extra refcnt, it is better to skip it during
>>> isolation to save the slot for more qualified folio. Current one could
>>> be the candidate for next round of scanning after the private data gone.
>>>
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>> ---
>>>    mm/vmscan.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index cfa839284b92..755bf3a387f3 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1685,6 +1685,8 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
>>>                 */
>>>                scan += nr_pages;
>>>
>>> +             if (folio_test_private(folio) && !folio_test_dirty(folio))
>>> +                     goto move;
>>>                if (!folio_test_lru(folio))
>>>                        goto move;
>>>                if (!sc->may_unmap && folio_mapped(folio))
>>
>> An earlier filemap_release_folio() would have failed if the private data
>> (buffers) cannot get freed, and we went into the activate_locked path.
>>
>>
>> if (folio_needs_release(folio)) {
>>          if (!filemap_release_folio(folio, sc->gfp_mask)
>>                  goto activate_locked;
>> ...
>>
>> if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
>>          ...
>> } else if (!mapping || !__remove_mapping(mapping, folio, true,
>> }
>>
>> At least on the shrink_folio_list() path, I'm not sure the code you are
>> adding could even trigger. We should not reach __remove_mapping() with
>> folio_test_private().
> Thanks for heads up. You are right, the bh is judged if existing
> before __remove_mapping. ASAIU, the metadata associated with the bh
> has risk to be freed such as journal data etc or it introduces extra
> IO. Actually, this patch is inspired by a practical problem we just
> run across which the bh remains on LRU for a long time since it is
> attached to a journal_head that can not be freed by jbd2.

Okay, but I assume (as stated) this patch does not have an affect on 
that, or am I missing something?

I assume you have some way to test before/after, if you run into similar 
problems in practice.

Zhaoyang Huang Aug. 27, 2024, 2:45 a.m. UTC | #4

On Mon, Aug 26, 2024 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 26.08.24 13:10, Zhaoyang Huang wrote:
> > On Mon, Aug 26, 2024 at 6:36 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 26.08.24 10:50, zhaoyang.huang wrote:
> >>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>>
> >>> Since clean target folio with private data will be given up finally in
> >>> __remove_mapping as it has extra refcnt, it is better to skip it during
> >>> isolation to save the slot for more qualified folio. Current one could
> >>> be the candidate for next round of scanning after the private data gone.
> >>>
> >>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >>> ---
> >>>    mm/vmscan.c | 2 ++
> >>>    1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index cfa839284b92..755bf3a387f3 100644
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -1685,6 +1685,8 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >>>                 */
> >>>                scan += nr_pages;
> >>>
> >>> +             if (folio_test_private(folio) && !folio_test_dirty(folio))
> >>> +                     goto move;
> >>>                if (!folio_test_lru(folio))
> >>>                        goto move;
> >>>                if (!sc->may_unmap && folio_mapped(folio))
> >>
> >> An earlier filemap_release_folio() would have failed if the private data
> >> (buffers) cannot get freed, and we went into the activate_locked path.
> >>
> >>
> >> if (folio_needs_release(folio)) {
> >>          if (!filemap_release_folio(folio, sc->gfp_mask)
> >>                  goto activate_locked;
> >> ...
> >>
> >> if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
> >>          ...
> >> } else if (!mapping || !__remove_mapping(mapping, folio, true,
> >> }
> >>
> >> At least on the shrink_folio_list() path, I'm not sure the code you are
> >> adding could even trigger. We should not reach __remove_mapping() with
> >> folio_test_private().
> > Thanks for heads up. You are right, the bh is judged if existing
> > before __remove_mapping. ASAIU, the metadata associated with the bh
> > has risk to be freed such as journal data etc or it introduces extra
> > IO. Actually, this patch is inspired by a practical problem we just
> > run across which the bh remains on LRU for a long time since it is
> > attached to a journal_head that can not be freed by jbd2.
>
> Okay, but I assume (as stated) this patch does not have an affect on
> that, or am I missing something?
No, this is actually a migration failure issue[1] related to cma_alloc
where the bh keeps busy as the journal's transaction can't be
launched[2][3][4][5][6]. I am just inspired by this issue to check if
there is anything to do in reclaiming. By counting from a ramdump,
there are 300MB "lru, private" pages found in a 6GB RAM system which
could lower the reclaiming efficiency if the same scenario as above
happens.

[1]
crash_arm64_v8.0.4++> kmem -p|grep
ffffff808f0aa150(sb->s_bdev->bd_inode->i_mapping)
fffffffe01a51c00  e9470000 ffffff808f0aa150        3  2 8000000008020
lru,private //within CMA area
fffffffe03d189c0 174627000 ffffff808f0aa150        4  2
2004000000008020 lru,private
fffffffe03d88e00 176238000 ffffff808f0aa150      3f9  2
2008000000008020 lru,private
fffffffe03d88e40 176239000 ffffff808f0aa150        6  2
2008000000008020 lru,private
fffffffe03d88e80 17623a000 ffffff808f0aa150        5  2
2008000000008020 lru,private
fffffffe03d88ec0 17623b000 ffffff808f0aa150        1  2
2008000000008020 lru,private
fffffffe03d88f00 17623c000 ffffff808f0aa150        0  2
2008000000008020 lru,private
fffffffe040e6540 183995000 ffffff808f0aa150      3f4  2
2004000000008020 lru,private

[2] page -> buffer_head
crash_arm64_v8.0.4++> struct page.private fffffffe01a51c00 -x
      private = 0xffffff802fca0c00

[3] buffer_head -> journal_head
crash_arm64_v8.0.4++> struct buffer_head.b_private 0xffffff802fca0c00
  b_private = 0xffffff8041338e10,

[4] journal_head -> b_cp_transaction
crash_arm64_v8.0.4++> struct journal_head.b_cp_transaction 0xffffff8041338e10 -x
  b_cp_transaction = 0xffffff80410f1900,

[5] transaction_t -> journal
crash_arm64_v8.0.4++> struct transaction_t.t_journal 0xffffff80410f1900 -x
  t_journal = 0xffffff80e70f3000,

[6] j_free & j_max_transaction_buffers (j_free >
j_max_transaction_buffers, transaction would NOT be launched)
crash_arm64_v8.0.4++> struct
journal_t.j_free,j_max_transaction_buffers 0xffffff80e70f3000 -x
  j_free = 0x3f1,
  j_max_transaction_buffers = 0x100,

>
> I assume you have some way to test before/after, if you run into similar
> problems in practice.
We solve the migration issue by moving the affected file to a
none-journal disk. Whereas, I think the buffered folio issue(carrying
meta data) could be taken into consideration to do something.
>
> --
> Cheers,
>
> David / dhildenb
>

Matthew Wilcox Aug. 27, 2024, 2:49 a.m. UTC | #5

On Tue, Aug 27, 2024 at 10:45:11AM +0800, Zhaoyang Huang wrote:
> On Mon, Aug 26, 2024 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
> > On 26.08.24 13:10, Zhaoyang Huang wrote:
> > > On Mon, Aug 26, 2024 at 6:36 PM David Hildenbrand <david@redhat.com> wrote:
> > >> An earlier filemap_release_folio() would have failed if the private data
> > >> (buffers) cannot get freed, and we went into the activate_locked path.
> > >>
> > >> if (folio_needs_release(folio)) {
> > >>          if (!filemap_release_folio(folio, sc->gfp_mask)
> > >>                  goto activate_locked;
> > >> ...

> No, this is actually a migration failure issue[1] related to cma_alloc
> where the bh keeps busy as the journal's transaction can't be
> launched[2][3][4][5][6]. I am just inspired by this issue to check if
> there is anything to do in reclaiming. By counting from a ramdump,
> there are 300MB "lru, private" pages found in a 6GB RAM system which
> could lower the reclaiming efficiency if the same scenario as above
> happens.

As David said, if we encounter a filesystem folio with private data,
we ask the filesystem to strip the private data off.  Usually this
succeeds, because most folios aren't part of the journal for very long.

Zhaoyang Huang Aug. 27, 2024, 2:53 a.m. UTC | #6

On Tue, Aug 27, 2024 at 10:49 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Aug 27, 2024 at 10:45:11AM +0800, Zhaoyang Huang wrote:
> > On Mon, Aug 26, 2024 at 9:01 PM David Hildenbrand <david@redhat.com> wrote:
> > > On 26.08.24 13:10, Zhaoyang Huang wrote:
> > > > On Mon, Aug 26, 2024 at 6:36 PM David Hildenbrand <david@redhat.com> wrote:
> > > >> An earlier filemap_release_folio() would have failed if the private data
> > > >> (buffers) cannot get freed, and we went into the activate_locked path.
> > > >>
> > > >> if (folio_needs_release(folio)) {
> > > >>          if (!filemap_release_folio(folio, sc->gfp_mask)
> > > >>                  goto activate_locked;
> > > >> ...
>
> > No, this is actually a migration failure issue[1] related to cma_alloc
> > where the bh keeps busy as the journal's transaction can't be
> > launched[2][3][4][5][6]. I am just inspired by this issue to check if
> > there is anything to do in reclaiming. By counting from a ramdump,
> > there are 300MB "lru, private" pages found in a 6GB RAM system which
> > could lower the reclaiming efficiency if the same scenario as above
> > happens.
>
> As David said, if we encounter a filesystem folio with private data,
> we ask the filesystem to strip the private data off.  Usually this
> succeeds, because most folios aren't part of the journal for very long.
Got it, thanks for clarifying.

[RFC,1/1] mm: Skip folio with private data during isolation

Commit Message

Comments

Patch