diff mbox series

[RFC,3/3] mm: remove folio_test_anon(folio)==false path in __folio_add_anon_rmap()

Message ID 20240613000721.23093-4-21cnbao@gmail.com (mailing list archive)
State New
Headers show
Series mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap() | expand

Commit Message

Barry Song June 13, 2024, 12:07 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

The folio_test_anon(folio)==false case within do_swap_page() has been
relocated to folio_add_new_anon_rmap(). Additionally, two other callers
consistently pass anonymous folios.

stack 1:
remove_migration_pmd
   -> folio_add_anon_rmap_pmd
     -> __folio_add_anon_rmap

stack 2:
__split_huge_pmd_locked
   -> folio_add_anon_rmap_ptes
      -> __folio_add_anon_rmap

__folio_add_anon_rmap() only needs to handle the cases
folio_test_anon(folio)==true now.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/rmap.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

Comments

Barry Song June 13, 2024, 8:46 a.m. UTC | #1
On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
>
> From: Barry Song <v-songbaohua@oppo.com>
>
> The folio_test_anon(folio)==false case within do_swap_page() has been
> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> consistently pass anonymous folios.
>
> stack 1:
> remove_migration_pmd
>    -> folio_add_anon_rmap_pmd
>      -> __folio_add_anon_rmap
>
> stack 2:
> __split_huge_pmd_locked
>    -> folio_add_anon_rmap_ptes
>       -> __folio_add_anon_rmap
>
> __folio_add_anon_rmap() only needs to handle the cases
> folio_test_anon(folio)==true now.

My team reported a case where swapoff() is calling
folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
with one new anon  (!folio_test_anon(folio)).

I will double check all folio_add_anon_rmap_pte() cases.

>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/rmap.c | 17 +++--------------
>  1 file changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e612d999811a..e84c706c8241 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1299,21 +1299,10 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>
>         nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped);
>
> -       if (unlikely(!folio_test_anon(folio))) {
> -               VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> -               /*
> -                * For a PTE-mapped large folio, we only know that the single
> -                * PTE is exclusive. Further, __folio_set_anon() might not get
> -                * folio->index right when not given the address of the head
> -                * page.
> -                */
> -               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> -                                level != RMAP_LEVEL_PMD, folio);
> -               __folio_set_anon(folio, vma, address,
> -                                !!(flags & RMAP_EXCLUSIVE));
> -       } else if (likely(!folio_test_ksm(folio))) {
> +       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> +       if (likely(!folio_test_ksm(folio)))
>                 __page_check_anon_rmap(folio, page, vma, address);
> -       }
>
>         __folio_mod_stat(folio, nr, nr_pmdmapped);
>
> --
> 2.34.1
>
David Hildenbrand June 13, 2024, 8:49 a.m. UTC | #2
On 13.06.24 10:46, Barry Song wrote:
> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> The folio_test_anon(folio)==false case within do_swap_page() has been
>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
>> consistently pass anonymous folios.
>>
>> stack 1:
>> remove_migration_pmd
>>     -> folio_add_anon_rmap_pmd
>>       -> __folio_add_anon_rmap
>>
>> stack 2:
>> __split_huge_pmd_locked
>>     -> folio_add_anon_rmap_ptes
>>        -> __folio_add_anon_rmap
>>
>> __folio_add_anon_rmap() only needs to handle the cases
>> folio_test_anon(folio)==true now.
> 
> My team reported a case where swapoff() is calling
> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> with one new anon  (!folio_test_anon(folio)).
> 
> I will double check all folio_add_anon_rmap_pte() cases.

Right, swapoff() path is a bit special (e.g., we don't do any kind of 
batching on the swapoff() path).

But we should never get a new large anon folio there, or could that now 
happen?
Barry Song June 13, 2024, 9:06 a.m. UTC | #3
On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 10:46, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> From: Barry Song <v-songbaohua@oppo.com>
> >>
> >> The folio_test_anon(folio)==false case within do_swap_page() has been
> >> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> >> consistently pass anonymous folios.
> >>
> >> stack 1:
> >> remove_migration_pmd
> >>     -> folio_add_anon_rmap_pmd
> >>       -> __folio_add_anon_rmap
> >>
> >> stack 2:
> >> __split_huge_pmd_locked
> >>     -> folio_add_anon_rmap_ptes
> >>        -> __folio_add_anon_rmap
> >>
> >> __folio_add_anon_rmap() only needs to handle the cases
> >> folio_test_anon(folio)==true now.
> >
> > My team reported a case where swapoff() is calling
> > folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> > with one new anon  (!folio_test_anon(folio)).
> >
> > I will double check all folio_add_anon_rmap_pte() cases.
>
> Right, swapoff() path is a bit special (e.g., we don't do any kind of
> batching on the swapoff() path).
>
> But we should never get a new large anon folio there, or could that now
> happen?

My team encountered the following issue while testing this RFC
series on real hardware. Let me take a look to identify the
problem.

[  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
[  261.214213][T11285] memcg:ffffff8003678000
[  261.214217][T11285] aops:swap_aops
[  261.214233][T11285] flags:
0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
[  261.214241][T11285] page_type: 0x0()
[  261.214246][T11285] raw: 2000000000081009 0000000000000000
dead000000000122 0000000000000000
[  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
0000000400000000 ffffff8003678000
[  261.214254][T11285] page dumped because:
VM_WARN_ON_FOLIO(!folio_test_anon(folio))
[  261.214257][T11285] page_owner tracks the page as allocated
[  261.214260][T11285] page last allocated via order 0, migratetype
Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
[  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
[  261.214284][T11285]  prep_new_page+0x28/0x13c
[  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
[  261.214298][T11285]  __alloc_pages+0x15c/0x330
[  261.214304][T11285]  __folio_alloc+0x1c/0x4c
[  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
[  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
[  261.214326][T11285]  swapin_readahead+0x64/0x448
[  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
[  261.214337][T11285]  invoke_syscall+0x58/0x114
[  261.214353][T11285]  el0_svc_common+0xac/0xe0
[  261.214360][T11285]  do_el0_svc+0x1c/0x28
[  261.214366][T11285]  el0_svc+0x38/0x68
[  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
[  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
[  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
[  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
[  261.214395][T11285]  free_unref_page_list+0x84/0x378
[  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
[  261.214409][T11285]  evict_folios+0x1458/0x19b4
[  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
[  261.214422][T11285]  shrink_one+0xa8/0x234
[  261.214427][T11285]  shrink_node+0xb38/0xde0
[  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
[  261.214437][T11285]  kswapd+0x290/0x4e4
[  261.214442][T11285]  kthread+0x114/0x1bc
[  261.214459][T11285]  ret_from_fork+0x10/0x20
[  261.214477][T11285] ------------[ cut here ]------------
[  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
folio_add_anon_rmap_ptes+0x2b4/0x330

[  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
-SSBS BTYPE=--)
[  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
[  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
[  261.215433][T11285] sp : ffffffc0a7dbbbf0
[  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
x27: ffffff80db480000
[  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
x24: 0000007b44941000
[  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
x21: fffffffe04cc2980
[  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
x18: ffffffed011dae80
[  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
x15: 0000000000000004
[  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
x12: 0000000000000003
[  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
: 0d46c0889b468e00
[  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
: 322e31363220205b
[  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
: 0000000000000000
[  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
: ffffff8068c15c80
[  261.215501][T11285] Call trace:
[  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
[  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
[  261.215516][T11285]  invoke_syscall+0x58/0x114
[  261.215526][T11285]  el0_svc_common+0xac/0xe0
[  261.215532][T11285]  do_el0_svc+0x1c/0x28
[  261.215539][T11285]  el0_svc+0x38/0x68
[  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
[  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
[  261.215552][T11285] ---[ end trace 0000000000000000 ]---


>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
David Hildenbrand June 13, 2024, 9:12 a.m. UTC | #4
On 13.06.24 11:06, Barry Song wrote:
> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.06.24 10:46, Barry Song wrote:
>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>
>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>
>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
>>>> consistently pass anonymous folios.
>>>>
>>>> stack 1:
>>>> remove_migration_pmd
>>>>      -> folio_add_anon_rmap_pmd
>>>>        -> __folio_add_anon_rmap
>>>>
>>>> stack 2:
>>>> __split_huge_pmd_locked
>>>>      -> folio_add_anon_rmap_ptes
>>>>         -> __folio_add_anon_rmap
>>>>
>>>> __folio_add_anon_rmap() only needs to handle the cases
>>>> folio_test_anon(folio)==true now.
>>>
>>> My team reported a case where swapoff() is calling
>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
>>> with one new anon  (!folio_test_anon(folio)).
>>>
>>> I will double check all folio_add_anon_rmap_pte() cases.
>>
>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
>> batching on the swapoff() path).
>>
>> But we should never get a new large anon folio there, or could that now
>> happen?
> 
> My team encountered the following issue while testing this RFC
> series on real hardware. Let me take a look to identify the
> problem.
> 
> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
> [  261.214213][T11285] memcg:ffffff8003678000
> [  261.214217][T11285] aops:swap_aops
> [  261.214233][T11285] flags:
> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
> [  261.214241][T11285] page_type: 0x0()
> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
> dead000000000122 0000000000000000
> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
> 0000000400000000 ffffff8003678000
> [  261.214254][T11285] page dumped because:
> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
> [  261.214257][T11285] page_owner tracks the page as allocated
> [  261.214260][T11285] page last allocated via order 0, migratetype
> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
> [  261.214284][T11285]  prep_new_page+0x28/0x13c
> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
> [  261.214326][T11285]  swapin_readahead+0x64/0x448
> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
> [  261.214337][T11285]  invoke_syscall+0x58/0x114
> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
> [  261.214366][T11285]  el0_svc+0x38/0x68
> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
> [  261.214422][T11285]  shrink_one+0xa8/0x234
> [  261.214427][T11285]  shrink_node+0xb38/0xde0
> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
> [  261.214437][T11285]  kswapd+0x290/0x4e4
> [  261.214442][T11285]  kthread+0x114/0x1bc
> [  261.214459][T11285]  ret_from_fork+0x10/0x20
> [  261.214477][T11285] ------------[ cut here ]------------
> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
> folio_add_anon_rmap_ptes+0x2b4/0x330
> 
> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
> -SSBS BTYPE=--)
> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
> x27: ffffff80db480000
> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
> x24: 0000007b44941000
> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
> x21: fffffffe04cc2980
> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
> x18: ffffffed011dae80
> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
> x15: 0000000000000004
> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
> x12: 0000000000000003
> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
> : 0d46c0889b468e00
> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
> : 322e31363220205b
> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
> : 0000000000000000
> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
> : ffffff8068c15c80
> [  261.215501][T11285] Call trace:
> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
> [  261.215516][T11285]  invoke_syscall+0x58/0x114
> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
> [  261.215539][T11285]  el0_svc+0x38/0x68
> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---

Ah, yes. in unuse_pte(), you'll have to do the right thing if 
!folio_test(anon) before doing the folio_add_anon_rmap_pte().

You might have a fresh anon folio in the swapcache that was never mapped 
(hopefully order-0, otherwise we'd likely be in trouble).
Barry Song June 14, 2024, 7:56 a.m. UTC | #5
On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 13.06.24 11:06, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.06.24 10:46, Barry Song wrote:
> >>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>
> >>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>
> >>>> The folio_test_anon(folio)==false case within do_swap_page() has been
> >>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> >>>> consistently pass anonymous folios.
> >>>>
> >>>> stack 1:
> >>>> remove_migration_pmd
> >>>>      -> folio_add_anon_rmap_pmd
> >>>>        -> __folio_add_anon_rmap
> >>>>
> >>>> stack 2:
> >>>> __split_huge_pmd_locked
> >>>>      -> folio_add_anon_rmap_ptes
> >>>>         -> __folio_add_anon_rmap
> >>>>
> >>>> __folio_add_anon_rmap() only needs to handle the cases
> >>>> folio_test_anon(folio)==true now.
> >>>
> >>> My team reported a case where swapoff() is calling
> >>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> >>> with one new anon  (!folio_test_anon(folio)).
> >>>
> >>> I will double check all folio_add_anon_rmap_pte() cases.
> >>
> >> Right, swapoff() path is a bit special (e.g., we don't do any kind of
> >> batching on the swapoff() path).
> >>
> >> But we should never get a new large anon folio there, or could that now
> >> happen?
> >
> > My team encountered the following issue while testing this RFC
> > series on real hardware. Let me take a look to identify the
> > problem.
> >
> > [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
> > mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
> > [  261.214213][T11285] memcg:ffffff8003678000
> > [  261.214217][T11285] aops:swap_aops
> > [  261.214233][T11285] flags:
> > 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
> > [  261.214241][T11285] page_type: 0x0()
> > [  261.214246][T11285] raw: 2000000000081009 0000000000000000
> > dead000000000122 0000000000000000
> > [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
> > 0000000400000000 ffffff8003678000
> > [  261.214254][T11285] page dumped because:
> > VM_WARN_ON_FOLIO(!folio_test_anon(folio))
> > [  261.214257][T11285] page_owner tracks the page as allocated
> > [  261.214260][T11285] page last allocated via order 0, migratetype
> > Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
> > 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
> > [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
> > [  261.214284][T11285]  prep_new_page+0x28/0x13c
> > [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
> > [  261.214298][T11285]  __alloc_pages+0x15c/0x330
> > [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
> > [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
> > [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
> > [  261.214326][T11285]  swapin_readahead+0x64/0x448
> > [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
> > [  261.214337][T11285]  invoke_syscall+0x58/0x114
> > [  261.214353][T11285]  el0_svc_common+0xac/0xe0
> > [  261.214360][T11285]  do_el0_svc+0x1c/0x28
> > [  261.214366][T11285]  el0_svc+0x38/0x68
> > [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
> > [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
> > [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
> > [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
> > [  261.214395][T11285]  free_unref_page_list+0x84/0x378
> > [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
> > [  261.214409][T11285]  evict_folios+0x1458/0x19b4
> > [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
> > [  261.214422][T11285]  shrink_one+0xa8/0x234
> > [  261.214427][T11285]  shrink_node+0xb38/0xde0
> > [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
> > [  261.214437][T11285]  kswapd+0x290/0x4e4
> > [  261.214442][T11285]  kthread+0x114/0x1bc
> > [  261.214459][T11285]  ret_from_fork+0x10/0x20
> > [  261.214477][T11285] ------------[ cut here ]------------
> > [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
> > folio_add_anon_rmap_ptes+0x2b4/0x330
> >
> > [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
> > -SSBS BTYPE=--)
> > [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
> > [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
> > [  261.215433][T11285] sp : ffffffc0a7dbbbf0
> > [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
> > x27: ffffff80db480000
> > [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
> > x24: 0000007b44941000
> > [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
> > x21: fffffffe04cc2980
> > [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
> > x18: ffffffed011dae80
> > [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
> > x15: 0000000000000004
> > [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
> > x12: 0000000000000003
> > [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
> > : 0d46c0889b468e00
> > [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
> > : 322e31363220205b
> > [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
> > : 0000000000000000
> > [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
> > : ffffff8068c15c80
> > [  261.215501][T11285] Call trace:
> > [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
> > [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
> > [  261.215516][T11285]  invoke_syscall+0x58/0x114
> > [  261.215526][T11285]  el0_svc_common+0xac/0xe0
> > [  261.215532][T11285]  do_el0_svc+0x1c/0x28
> > [  261.215539][T11285]  el0_svc+0x38/0x68
> > [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
> > [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
> > [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
>
> Ah, yes. in unuse_pte(), you'll have to do the right thing if
> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
>
> You might have a fresh anon folio in the swapcache that was never mapped
> (hopefully order-0, otherwise we'd likely be in trouble).

Yes. It is order-0

[  261.214260][T11285] page last allocated via order 0, migratetype

Otherwise, we would have encountered this VM_WARN_ON_FOLIO?

__folio_add_anon_rmap()
{
...
VM_WARN_ON_FOLIO(folio_test_large(folio) &&
     level != RMAP_LEVEL_PMD, folio);
...
}

Given that nobody has ever reported this warning, I assume all callers
using folio_add_anon_rmap_pte(s) are right now safe to move to ?

if (!folio_test_anon(folio))
               folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
else
               folio_add_anon_rmap_pte(s)

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
David Hildenbrand June 14, 2024, 8:51 a.m. UTC | #6
On 14.06.24 09:56, Barry Song wrote:
> On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 13.06.24 11:06, Barry Song wrote:
>>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 13.06.24 10:46, Barry Song wrote:
>>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>>>
>>>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>>>
>>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
>>>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
>>>>>> consistently pass anonymous folios.
>>>>>>
>>>>>> stack 1:
>>>>>> remove_migration_pmd
>>>>>>       -> folio_add_anon_rmap_pmd
>>>>>>         -> __folio_add_anon_rmap
>>>>>>
>>>>>> stack 2:
>>>>>> __split_huge_pmd_locked
>>>>>>       -> folio_add_anon_rmap_ptes
>>>>>>          -> __folio_add_anon_rmap
>>>>>>
>>>>>> __folio_add_anon_rmap() only needs to handle the cases
>>>>>> folio_test_anon(folio)==true now.
>>>>>
>>>>> My team reported a case where swapoff() is calling
>>>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
>>>>> with one new anon  (!folio_test_anon(folio)).
>>>>>
>>>>> I will double check all folio_add_anon_rmap_pte() cases.
>>>>
>>>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
>>>> batching on the swapoff() path).
>>>>
>>>> But we should never get a new large anon folio there, or could that now
>>>> happen?
>>>
>>> My team encountered the following issue while testing this RFC
>>> series on real hardware. Let me take a look to identify the
>>> problem.
>>>
>>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
>>> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
>>> [  261.214213][T11285] memcg:ffffff8003678000
>>> [  261.214217][T11285] aops:swap_aops
>>> [  261.214233][T11285] flags:
>>> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
>>> [  261.214241][T11285] page_type: 0x0()
>>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
>>> dead000000000122 0000000000000000
>>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
>>> 0000000400000000 ffffff8003678000
>>> [  261.214254][T11285] page dumped because:
>>> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
>>> [  261.214257][T11285] page_owner tracks the page as allocated
>>> [  261.214260][T11285] page last allocated via order 0, migratetype
>>> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
>>> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
>>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
>>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
>>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
>>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
>>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
>>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
>>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
>>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
>>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
>>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
>>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
>>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
>>> [  261.214366][T11285]  el0_svc+0x38/0x68
>>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
>>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
>>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
>>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
>>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
>>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
>>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
>>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
>>> [  261.214422][T11285]  shrink_one+0xa8/0x234
>>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
>>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
>>> [  261.214437][T11285]  kswapd+0x290/0x4e4
>>> [  261.214442][T11285]  kthread+0x114/0x1bc
>>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
>>> [  261.214477][T11285] ------------[ cut here ]------------
>>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
>>> folio_add_anon_rmap_ptes+0x2b4/0x330
>>>
>>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
>>> -SSBS BTYPE=--)
>>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
>>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
>>> x27: ffffff80db480000
>>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
>>> x24: 0000007b44941000
>>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
>>> x21: fffffffe04cc2980
>>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
>>> x18: ffffffed011dae80
>>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
>>> x15: 0000000000000004
>>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
>>> x12: 0000000000000003
>>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
>>> : 0d46c0889b468e00
>>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
>>> : 322e31363220205b
>>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
>>> : 0000000000000000
>>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
>>> : ffffff8068c15c80
>>> [  261.215501][T11285] Call trace:
>>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
>>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
>>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
>>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
>>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
>>> [  261.215539][T11285]  el0_svc+0x38/0x68
>>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
>>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
>>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
>>
>> Ah, yes. in unuse_pte(), you'll have to do the right thing if
>> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
>>
>> You might have a fresh anon folio in the swapcache that was never mapped
>> (hopefully order-0, otherwise we'd likely be in trouble).
> 
> Yes. It is order-0
> 
> [  261.214260][T11285] page last allocated via order 0, migratetype
> 
> Otherwise, we would have encountered this VM_WARN_ON_FOLIO?
> 
> __folio_add_anon_rmap()
> {
> ...
> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>       level != RMAP_LEVEL_PMD, folio);
> ...
> }
> 
> Given that nobody has ever reported this warning, I assume all callers
> using folio_add_anon_rmap_pte(s) are right now safe to move to ?

Yes, and we should likely add a VM_WARN_ON_ONCE() here that we have a 
small folio if !anon. If that ever changes, we can assess the situation.

Only swap created "new" anon folios without properly calling the right 
function so far, all other code handles that correctly.
Barry Song June 14, 2024, 8:56 a.m. UTC | #7
On Fri, Jun 14, 2024 at 8:51 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.06.24 09:56, Barry Song wrote:
> > On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 13.06.24 11:06, Barry Song wrote:
> >>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 13.06.24 10:46, Barry Song wrote:
> >>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>>>
> >>>>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>>>
> >>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
> >>>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> >>>>>> consistently pass anonymous folios.
> >>>>>>
> >>>>>> stack 1:
> >>>>>> remove_migration_pmd
> >>>>>>       -> folio_add_anon_rmap_pmd
> >>>>>>         -> __folio_add_anon_rmap
> >>>>>>
> >>>>>> stack 2:
> >>>>>> __split_huge_pmd_locked
> >>>>>>       -> folio_add_anon_rmap_ptes
> >>>>>>          -> __folio_add_anon_rmap
> >>>>>>
> >>>>>> __folio_add_anon_rmap() only needs to handle the cases
> >>>>>> folio_test_anon(folio)==true now.
> >>>>>
> >>>>> My team reported a case where swapoff() is calling
> >>>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> >>>>> with one new anon  (!folio_test_anon(folio)).
> >>>>>
> >>>>> I will double check all folio_add_anon_rmap_pte() cases.
> >>>>
> >>>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
> >>>> batching on the swapoff() path).
> >>>>
> >>>> But we should never get a new large anon folio there, or could that now
> >>>> happen?
> >>>
> >>> My team encountered the following issue while testing this RFC
> >>> series on real hardware. Let me take a look to identify the
> >>> problem.
> >>>
> >>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
> >>> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
> >>> [  261.214213][T11285] memcg:ffffff8003678000
> >>> [  261.214217][T11285] aops:swap_aops
> >>> [  261.214233][T11285] flags:
> >>> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
> >>> [  261.214241][T11285] page_type: 0x0()
> >>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
> >>> dead000000000122 0000000000000000
> >>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
> >>> 0000000400000000 ffffff8003678000
> >>> [  261.214254][T11285] page dumped because:
> >>> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
> >>> [  261.214257][T11285] page_owner tracks the page as allocated
> >>> [  261.214260][T11285] page last allocated via order 0, migratetype
> >>> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
> >>> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
> >>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
> >>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
> >>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
> >>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
> >>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
> >>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
> >>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
> >>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
> >>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
> >>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
> >>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
> >>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
> >>> [  261.214366][T11285]  el0_svc+0x38/0x68
> >>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
> >>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
> >>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
> >>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
> >>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
> >>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
> >>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
> >>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
> >>> [  261.214422][T11285]  shrink_one+0xa8/0x234
> >>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
> >>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
> >>> [  261.214437][T11285]  kswapd+0x290/0x4e4
> >>> [  261.214442][T11285]  kthread+0x114/0x1bc
> >>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
> >>> [  261.214477][T11285] ------------[ cut here ]------------
> >>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
> >>> folio_add_anon_rmap_ptes+0x2b4/0x330
> >>>
> >>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
> >>> -SSBS BTYPE=--)
> >>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
> >>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
> >>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
> >>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
> >>> x27: ffffff80db480000
> >>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
> >>> x24: 0000007b44941000
> >>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
> >>> x21: fffffffe04cc2980
> >>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
> >>> x18: ffffffed011dae80
> >>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
> >>> x15: 0000000000000004
> >>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
> >>> x12: 0000000000000003
> >>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
> >>> : 0d46c0889b468e00
> >>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
> >>> : 322e31363220205b
> >>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
> >>> : 0000000000000000
> >>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
> >>> : ffffff8068c15c80
> >>> [  261.215501][T11285] Call trace:
> >>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
> >>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
> >>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
> >>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
> >>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
> >>> [  261.215539][T11285]  el0_svc+0x38/0x68
> >>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
> >>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
> >>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
> >>
> >> Ah, yes. in unuse_pte(), you'll have to do the right thing if
> >> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
> >>
> >> You might have a fresh anon folio in the swapcache that was never mapped
> >> (hopefully order-0, otherwise we'd likely be in trouble).
> >
> > Yes. It is order-0
> >
> > [  261.214260][T11285] page last allocated via order 0, migratetype
> >
> > Otherwise, we would have encountered this VM_WARN_ON_FOLIO?
> >
> > __folio_add_anon_rmap()
> > {
> > ...
> > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >       level != RMAP_LEVEL_PMD, folio);
> > ...
> > }
> >
> > Given that nobody has ever reported this warning, I assume all callers
> > using folio_add_anon_rmap_pte(s) are right now safe to move to ?
>
> Yes, and we should likely add a VM_WARN_ON_ONCE() here that we have a
> small folio if !anon. If that ever changes, we can assess the situation.
>

this patch actually has a WARN_ON for all !anon, so it extends the WARN
to small.

-       if (unlikely(!folio_test_anon(folio))) {
-               VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
-               /*
-                * For a PTE-mapped large folio, we only know that the single
-                * PTE is exclusive. Further, __folio_set_anon() might not get
-                * folio->index right when not given the address of the head
-                * page.
-                */
-               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-                                level != RMAP_LEVEL_PMD, folio);
-               __folio_set_anon(folio, vma, address,
-                                !!(flags & RMAP_EXCLUSIVE));
-       } else if (likely(!folio_test_ksm(folio))) {
+       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+

> Only swap created "new" anon folios without properly calling the right
> function so far, all other code handles that correctly.
>
> --
> Cheers,
>
> David / dhildenb
>
Barry Song June 14, 2024, 8:58 a.m. UTC | #8
On Fri, Jun 14, 2024 at 8:56 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Jun 14, 2024 at 8:51 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 14.06.24 09:56, Barry Song wrote:
> > > On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
> > >>
> > >> On 13.06.24 11:06, Barry Song wrote:
> > >>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
> > >>>>
> > >>>> On 13.06.24 10:46, Barry Song wrote:
> > >>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
> > >>>>>>
> > >>>>>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>>>>
> > >>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
> > >>>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> > >>>>>> consistently pass anonymous folios.
> > >>>>>>
> > >>>>>> stack 1:
> > >>>>>> remove_migration_pmd
> > >>>>>>       -> folio_add_anon_rmap_pmd
> > >>>>>>         -> __folio_add_anon_rmap
> > >>>>>>
> > >>>>>> stack 2:
> > >>>>>> __split_huge_pmd_locked
> > >>>>>>       -> folio_add_anon_rmap_ptes
> > >>>>>>          -> __folio_add_anon_rmap
> > >>>>>>
> > >>>>>> __folio_add_anon_rmap() only needs to handle the cases
> > >>>>>> folio_test_anon(folio)==true now.
> > >>>>>
> > >>>>> My team reported a case where swapoff() is calling
> > >>>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> > >>>>> with one new anon  (!folio_test_anon(folio)).
> > >>>>>
> > >>>>> I will double check all folio_add_anon_rmap_pte() cases.
> > >>>>
> > >>>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
> > >>>> batching on the swapoff() path).
> > >>>>
> > >>>> But we should never get a new large anon folio there, or could that now
> > >>>> happen?
> > >>>
> > >>> My team encountered the following issue while testing this RFC
> > >>> series on real hardware. Let me take a look to identify the
> > >>> problem.
> > >>>
> > >>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
> > >>> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
> > >>> [  261.214213][T11285] memcg:ffffff8003678000
> > >>> [  261.214217][T11285] aops:swap_aops
> > >>> [  261.214233][T11285] flags:
> > >>> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
> > >>> [  261.214241][T11285] page_type: 0x0()
> > >>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
> > >>> dead000000000122 0000000000000000
> > >>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
> > >>> 0000000400000000 ffffff8003678000
> > >>> [  261.214254][T11285] page dumped because:
> > >>> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
> > >>> [  261.214257][T11285] page_owner tracks the page as allocated
> > >>> [  261.214260][T11285] page last allocated via order 0, migratetype
> > >>> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
> > >>> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
> > >>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
> > >>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
> > >>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
> > >>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
> > >>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
> > >>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
> > >>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
> > >>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
> > >>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
> > >>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
> > >>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
> > >>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
> > >>> [  261.214366][T11285]  el0_svc+0x38/0x68
> > >>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
> > >>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
> > >>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
> > >>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
> > >>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
> > >>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
> > >>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
> > >>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
> > >>> [  261.214422][T11285]  shrink_one+0xa8/0x234
> > >>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
> > >>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
> > >>> [  261.214437][T11285]  kswapd+0x290/0x4e4
> > >>> [  261.214442][T11285]  kthread+0x114/0x1bc
> > >>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
> > >>> [  261.214477][T11285] ------------[ cut here ]------------
> > >>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
> > >>> folio_add_anon_rmap_ptes+0x2b4/0x330
> > >>>
> > >>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
> > >>> -SSBS BTYPE=--)
> > >>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
> > >>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
> > >>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
> > >>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
> > >>> x27: ffffff80db480000
> > >>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
> > >>> x24: 0000007b44941000
> > >>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
> > >>> x21: fffffffe04cc2980
> > >>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
> > >>> x18: ffffffed011dae80
> > >>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
> > >>> x15: 0000000000000004
> > >>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
> > >>> x12: 0000000000000003
> > >>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
> > >>> : 0d46c0889b468e00
> > >>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
> > >>> : 322e31363220205b
> > >>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
> > >>> : 0000000000000000
> > >>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
> > >>> : ffffff8068c15c80
> > >>> [  261.215501][T11285] Call trace:
> > >>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
> > >>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
> > >>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
> > >>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
> > >>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
> > >>> [  261.215539][T11285]  el0_svc+0x38/0x68
> > >>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
> > >>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
> > >>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
> > >>
> > >> Ah, yes. in unuse_pte(), you'll have to do the right thing if
> > >> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
> > >>
> > >> You might have a fresh anon folio in the swapcache that was never mapped
> > >> (hopefully order-0, otherwise we'd likely be in trouble).
> > >
> > > Yes. It is order-0
> > >
> > > [  261.214260][T11285] page last allocated via order 0, migratetype
> > >
> > > Otherwise, we would have encountered this VM_WARN_ON_FOLIO?
> > >
> > > __folio_add_anon_rmap()
> > > {
> > > ...
> > > VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> > >       level != RMAP_LEVEL_PMD, folio);
> > > ...
> > > }
> > >
> > > Given that nobody has ever reported this warning, I assume all callers
> > > using folio_add_anon_rmap_pte(s) are right now safe to move to ?
> >
> > Yes, and we should likely add a VM_WARN_ON_ONCE() here that we have a
> > small folio if !anon. If that ever changes, we can assess the situation.
> >
>
> this patch actually has a WARN_ON for all !anon, so it extends the WARN
> to small.
>
> -       if (unlikely(!folio_test_anon(folio))) {
> -               VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> -               /*
> -                * For a PTE-mapped large folio, we only know that the single
> -                * PTE is exclusive. Further, __folio_set_anon() might not get
> -                * folio->index right when not given the address of the head
> -                * page.
> -                */
> -               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> -                                level != RMAP_LEVEL_PMD, folio);
> -               __folio_set_anon(folio, vma, address,
> -                                !!(flags & RMAP_EXCLUSIVE));
> -       } else if (likely(!folio_test_ksm(folio))) {
> +       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
>
> > Only swap created "new" anon folios without properly calling the right
> > function so far, all other code handles that correctly.

as I assume patch2/3 should have moved all !anon to
folio_add_new_anon_rmap()

diff --git a/mm/ksm.c b/mm/ksm.c
index d2641bc2efc9..c913ba8c37eb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1420,7 +1420,14 @@ static int replace_page(struct vm_area_struct
*vma, struct page *page,
         */
        if (!is_zero_pfn(page_to_pfn(kpage))) {
                folio_get(kfolio);
-               folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
+               /*
+                * We currently ensure that new folios cannot be partially
+                * exclusive: they are either fully exclusive or fully shared.
+                */
+               if (!folio_test_anon(kfolio))
+                       folio_add_new_anon_rmap(kfolio, vma, addr, RMAP_NONE);
+               else
+                       folio_add_anon_rmap_pte(kfolio, kpage, vma,
addr, RMAP_NONE);
                newpte = mk_pte(kpage, vma->vm_page_prot);
        } else {
                /*
diff --git a/mm/memory.c b/mm/memory.c
index 1f24ecdafe05..16797cc22058 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4339,6 +4339,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
        if (unlikely(folio != swapcache && swapcache)) {
                folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
                folio_add_lru_vma(folio, vma);
+       } else if (!folio_test_anon(folio)) {
+               /*
+                * We currently ensure that new folios cannot be partially
+                * exclusive: they are either fully exclusive or fully shared.
+                */
+               folio_add_new_anon_rmap(folio, vma, address, rmap_flags);
        } else {
                folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
                                        rmap_flags);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ae1d2700f6a3..ac53e46de48a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1908,8 +1908,14 @@ static int unuse_pte(struct vm_area_struct
*vma, pmd_t *pmd,
                VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
                if (pte_swp_exclusive(old_pte))
                        rmap_flags |= RMAP_EXCLUSIVE;
-
-               folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
+               /*
+                * We currently ensure that new folios cannot be partially
+                * exclusive: they are either fully exclusive or fully shared.
+                */
+               if (!folio_test_anon(folio))
+                       folio_add_new_anon_rmap(folio, vma, addr, rmap_flags);
+               else
+                       folio_add_anon_rmap_pte(folio, page, vma,
addr, rmap_flags);
        } else { /* ksm created a completely new copy */
                folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
                folio_add_lru_vma(folio, vma);


> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
David Hildenbrand June 14, 2024, 9:04 a.m. UTC | #9
On 14.06.24 10:58, Barry Song wrote:
> On Fri, Jun 14, 2024 at 8:56 PM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Fri, Jun 14, 2024 at 8:51 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 14.06.24 09:56, Barry Song wrote:
>>>> On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 13.06.24 11:06, Barry Song wrote:
>>>>>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>>>
>>>>>>> On 13.06.24 10:46, Barry Song wrote:
>>>>>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>>>>>>
>>>>>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
>>>>>>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
>>>>>>>>> consistently pass anonymous folios.
>>>>>>>>>
>>>>>>>>> stack 1:
>>>>>>>>> remove_migration_pmd
>>>>>>>>>        -> folio_add_anon_rmap_pmd
>>>>>>>>>          -> __folio_add_anon_rmap
>>>>>>>>>
>>>>>>>>> stack 2:
>>>>>>>>> __split_huge_pmd_locked
>>>>>>>>>        -> folio_add_anon_rmap_ptes
>>>>>>>>>           -> __folio_add_anon_rmap
>>>>>>>>>
>>>>>>>>> __folio_add_anon_rmap() only needs to handle the cases
>>>>>>>>> folio_test_anon(folio)==true now.
>>>>>>>>
>>>>>>>> My team reported a case where swapoff() is calling
>>>>>>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
>>>>>>>> with one new anon  (!folio_test_anon(folio)).
>>>>>>>>
>>>>>>>> I will double check all folio_add_anon_rmap_pte() cases.
>>>>>>>
>>>>>>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
>>>>>>> batching on the swapoff() path).
>>>>>>>
>>>>>>> But we should never get a new large anon folio there, or could that now
>>>>>>> happen?
>>>>>>
>>>>>> My team encountered the following issue while testing this RFC
>>>>>> series on real hardware. Let me take a look to identify the
>>>>>> problem.
>>>>>>
>>>>>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
>>>>>> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
>>>>>> [  261.214213][T11285] memcg:ffffff8003678000
>>>>>> [  261.214217][T11285] aops:swap_aops
>>>>>> [  261.214233][T11285] flags:
>>>>>> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
>>>>>> [  261.214241][T11285] page_type: 0x0()
>>>>>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
>>>>>> dead000000000122 0000000000000000
>>>>>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
>>>>>> 0000000400000000 ffffff8003678000
>>>>>> [  261.214254][T11285] page dumped because:
>>>>>> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
>>>>>> [  261.214257][T11285] page_owner tracks the page as allocated
>>>>>> [  261.214260][T11285] page last allocated via order 0, migratetype
>>>>>> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
>>>>>> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
>>>>>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
>>>>>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
>>>>>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
>>>>>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
>>>>>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
>>>>>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
>>>>>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
>>>>>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
>>>>>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
>>>>>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
>>>>>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
>>>>>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
>>>>>> [  261.214366][T11285]  el0_svc+0x38/0x68
>>>>>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
>>>>>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
>>>>>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
>>>>>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
>>>>>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
>>>>>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
>>>>>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
>>>>>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
>>>>>> [  261.214422][T11285]  shrink_one+0xa8/0x234
>>>>>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
>>>>>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
>>>>>> [  261.214437][T11285]  kswapd+0x290/0x4e4
>>>>>> [  261.214442][T11285]  kthread+0x114/0x1bc
>>>>>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
>>>>>> [  261.214477][T11285] ------------[ cut here ]------------
>>>>>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
>>>>>> folio_add_anon_rmap_ptes+0x2b4/0x330
>>>>>>
>>>>>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
>>>>>> -SSBS BTYPE=--)
>>>>>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
>>>>>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
>>>>>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
>>>>>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
>>>>>> x27: ffffff80db480000
>>>>>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
>>>>>> x24: 0000007b44941000
>>>>>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
>>>>>> x21: fffffffe04cc2980
>>>>>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
>>>>>> x18: ffffffed011dae80
>>>>>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
>>>>>> x15: 0000000000000004
>>>>>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
>>>>>> x12: 0000000000000003
>>>>>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
>>>>>> : 0d46c0889b468e00
>>>>>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
>>>>>> : 322e31363220205b
>>>>>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
>>>>>> : 0000000000000000
>>>>>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
>>>>>> : ffffff8068c15c80
>>>>>> [  261.215501][T11285] Call trace:
>>>>>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
>>>>>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
>>>>>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
>>>>>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
>>>>>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
>>>>>> [  261.215539][T11285]  el0_svc+0x38/0x68
>>>>>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
>>>>>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
>>>>>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
>>>>>
>>>>> Ah, yes. in unuse_pte(), you'll have to do the right thing if
>>>>> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
>>>>>
>>>>> You might have a fresh anon folio in the swapcache that was never mapped
>>>>> (hopefully order-0, otherwise we'd likely be in trouble).
>>>>
>>>> Yes. It is order-0
>>>>
>>>> [  261.214260][T11285] page last allocated via order 0, migratetype
>>>>
>>>> Otherwise, we would have encountered this VM_WARN_ON_FOLIO?
>>>>
>>>> __folio_add_anon_rmap()
>>>> {
>>>> ...
>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>>        level != RMAP_LEVEL_PMD, folio);
>>>> ...
>>>> }
>>>>
>>>> Given that nobody has ever reported this warning, I assume all callers
>>>> using folio_add_anon_rmap_pte(s) are right now safe to move to ?
>>>
>>> Yes, and we should likely add a VM_WARN_ON_ONCE() here that we have a
>>> small folio if !anon. If that ever changes, we can assess the situation.
>>>
>>
>> this patch actually has a WARN_ON for all !anon, so it extends the WARN
>> to small.

Not what I mean, see below:

>>
>> -       if (unlikely(!folio_test_anon(folio))) {
>> -               VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
>> -               /*
>> -                * For a PTE-mapped large folio, we only know that the single
>> -                * PTE is exclusive. Further, __folio_set_anon() might not get
>> -                * folio->index right when not given the address of the head
>> -                * page.
>> -                */
>> -               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>> -                                level != RMAP_LEVEL_PMD, folio);
>> -               __folio_set_anon(folio, vma, address,
>> -                                !!(flags & RMAP_EXCLUSIVE));
>> -       } else if (likely(!folio_test_ksm(folio))) {
>> +       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>> +
>>
>>> Only swap created "new" anon folios without properly calling the right
>>> function so far, all other code handles that correctly.
> 
> as I assume patch2/3 should have moved all !anon to
> folio_add_new_anon_rmap()
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d2641bc2efc9..c913ba8c37eb 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1420,7 +1420,14 @@ static int replace_page(struct vm_area_struct
> *vma, struct page *page,
>           */
>          if (!is_zero_pfn(page_to_pfn(kpage))) {
>                  folio_get(kfolio);
> -               folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
> +               /*
> +                * We currently ensure that new folios cannot be partially
> +                * exclusive: they are either fully exclusive or fully shared.
> +                */
> +               if (!folio_test_anon(kfolio))
> +                       folio_add_new_anon_rmap(kfolio, vma, addr, RMAP_NONE);
> +               else
> +                       folio_add_anon_rmap_pte(kfolio, kpage, vma, > addr, RMAP_NONE);

I don't think that is required? We are only working with anon folios. Or 
were you able to trigger this? (which would be weird)

[...]

> -               folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
> +               /*
> +                * We currently ensure that new folios cannot be partially
> +                * exclusive: they are either fully exclusive or fully shared.
> +                */
> +               if (!folio_test_anon(folio))

Here I suggest changing the comment to (and adding the VM_WARN_ON_ONCE):

/*
  * We currently only expect small !anon folios, for which we now that
  * they are either fully exclusive or fully shared. If we ever get large
  * folios here, we have to be careful.
  */
if (!folio_test_anon(folio) {
	VM_WARN_ON_ONCE(folio_test_large(folio));
	folio_add_new_anon_rmap(folio, vma, addr, rmap_flags);
} else {
...
Barry Song June 14, 2024, 9:33 a.m. UTC | #10
On Fri, Jun 14, 2024 at 9:04 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.06.24 10:58, Barry Song wrote:
> > On Fri, Jun 14, 2024 at 8:56 PM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> On Fri, Jun 14, 2024 at 8:51 PM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 14.06.24 09:56, Barry Song wrote:
> >>>> On Thu, Jun 13, 2024 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>>
> >>>>> On 13.06.24 11:06, Barry Song wrote:
> >>>>>> On Thu, Jun 13, 2024 at 8:49 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>>>>
> >>>>>>> On 13.06.24 10:46, Barry Song wrote:
> >>>>>>>> On Thu, Jun 13, 2024 at 12:08 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> From: Barry Song <v-songbaohua@oppo.com>
> >>>>>>>>>
> >>>>>>>>> The folio_test_anon(folio)==false case within do_swap_page() has been
> >>>>>>>>> relocated to folio_add_new_anon_rmap(). Additionally, two other callers
> >>>>>>>>> consistently pass anonymous folios.
> >>>>>>>>>
> >>>>>>>>> stack 1:
> >>>>>>>>> remove_migration_pmd
> >>>>>>>>>        -> folio_add_anon_rmap_pmd
> >>>>>>>>>          -> __folio_add_anon_rmap
> >>>>>>>>>
> >>>>>>>>> stack 2:
> >>>>>>>>> __split_huge_pmd_locked
> >>>>>>>>>        -> folio_add_anon_rmap_ptes
> >>>>>>>>>           -> __folio_add_anon_rmap
> >>>>>>>>>
> >>>>>>>>> __folio_add_anon_rmap() only needs to handle the cases
> >>>>>>>>> folio_test_anon(folio)==true now.
> >>>>>>>>
> >>>>>>>> My team reported a case where swapoff() is calling
> >>>>>>>> folio_add_anon_rmap_pte *not* folio_add_anon_rmap_ptes
> >>>>>>>> with one new anon  (!folio_test_anon(folio)).
> >>>>>>>>
> >>>>>>>> I will double check all folio_add_anon_rmap_pte() cases.
> >>>>>>>
> >>>>>>> Right, swapoff() path is a bit special (e.g., we don't do any kind of
> >>>>>>> batching on the swapoff() path).
> >>>>>>>
> >>>>>>> But we should never get a new large anon folio there, or could that now
> >>>>>>> happen?
> >>>>>>
> >>>>>> My team encountered the following issue while testing this RFC
> >>>>>> series on real hardware. Let me take a look to identify the
> >>>>>> problem.
> >>>>>>
> >>>>>> [  261.214195][T11285] page:000000004cdd779e refcount:4 mapcount:1
> >>>>>> mapping:00000000ba142c22 index:0x1 pfn:0x1b30a6
> >>>>>> [  261.214213][T11285] memcg:ffffff8003678000
> >>>>>> [  261.214217][T11285] aops:swap_aops
> >>>>>> [  261.214233][T11285] flags:
> >>>>>> 0x2000000000081009(locked|uptodate|owner_priv_1|swapbacked|zone=1|kasantag=0x0)
> >>>>>> [  261.214241][T11285] page_type: 0x0()
> >>>>>> [  261.214246][T11285] raw: 2000000000081009 0000000000000000
> >>>>>> dead000000000122 0000000000000000
> >>>>>> [  261.214251][T11285] raw: 0000000000000001 00000000000d84b3
> >>>>>> 0000000400000000 ffffff8003678000
> >>>>>> [  261.214254][T11285] page dumped because:
> >>>>>> VM_WARN_ON_FOLIO(!folio_test_anon(folio))
> >>>>>> [  261.214257][T11285] page_owner tracks the page as allocated
> >>>>>> [  261.214260][T11285] page last allocated via order 0, migratetype
> >>>>>> Movable, gfp_mask 0x2140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), pid
> >>>>>> 11285, tgid 11285 (swapoff), ts 261214177545, free_ts 261151875699
> >>>>>> [  261.214268][T11285]  post_alloc_hook+0x1b8/0x1c0
> >>>>>> [  261.214284][T11285]  prep_new_page+0x28/0x13c
> >>>>>> [  261.214291][T11285]  get_page_from_freelist+0x198c/0x1aa4
> >>>>>> [  261.214298][T11285]  __alloc_pages+0x15c/0x330
> >>>>>> [  261.214304][T11285]  __folio_alloc+0x1c/0x4c
> >>>>>> [  261.214310][T11285]  __read_swap_cache_async+0xd8/0x48c
> >>>>>> [  261.214320][T11285]  swap_cluster_readahead+0x158/0x324
> >>>>>> [  261.214326][T11285]  swapin_readahead+0x64/0x448
> >>>>>> [  261.214331][T11285]  __arm64_sys_swapoff+0x6ec/0x14b0
> >>>>>> [  261.214337][T11285]  invoke_syscall+0x58/0x114
> >>>>>> [  261.214353][T11285]  el0_svc_common+0xac/0xe0
> >>>>>> [  261.214360][T11285]  do_el0_svc+0x1c/0x28
> >>>>>> [  261.214366][T11285]  el0_svc+0x38/0x68
> >>>>>> [  261.214372][T11285]  el0t_64_sync_handler+0x68/0xbc
> >>>>>> [  261.214376][T11285]  el0t_64_sync+0x1a8/0x1ac
> >>>>>> [  261.214381][T11285] page last free pid 90 tgid 90 stack trace:
> >>>>>> [  261.214386][T11285]  free_unref_page_prepare+0x338/0x374
> >>>>>> [  261.214395][T11285]  free_unref_page_list+0x84/0x378
> >>>>>> [  261.214400][T11285]  shrink_folio_list+0x1234/0x13e4
> >>>>>> [  261.214409][T11285]  evict_folios+0x1458/0x19b4
> >>>>>> [  261.214417][T11285]  try_to_shrink_lruvec+0x1c8/0x264
> >>>>>> [  261.214422][T11285]  shrink_one+0xa8/0x234
> >>>>>> [  261.214427][T11285]  shrink_node+0xb38/0xde0
> >>>>>> [  261.214432][T11285]  balance_pgdat+0x7a4/0xdb4
> >>>>>> [  261.214437][T11285]  kswapd+0x290/0x4e4
> >>>>>> [  261.214442][T11285]  kthread+0x114/0x1bc
> >>>>>> [  261.214459][T11285]  ret_from_fork+0x10/0x20
> >>>>>> [  261.214477][T11285] ------------[ cut here ]------------
> >>>>>> [  261.214480][T11285] WARNING: CPU: 3 PID: 11285 at mm/rmap.c:1305
> >>>>>> folio_add_anon_rmap_ptes+0x2b4/0x330
> >>>>>>
> >>>>>> [  261.215403][T11285] pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT
> >>>>>> -SSBS BTYPE=--)
> >>>>>> [  261.215423][T11285] pc : folio_add_anon_rmap_ptes+0x2b4/0x330
> >>>>>> [  261.215428][T11285] lr : folio_add_anon_rmap_ptes+0x2b4/0x330
> >>>>>> [  261.215433][T11285] sp : ffffffc0a7dbbbf0
> >>>>>> [  261.215437][T11285] x29: ffffffc0a7dbbbf0 x28: ffffff8040860a08
> >>>>>> x27: ffffff80db480000
> >>>>>> [  261.215445][T11285] x26: fffffffe04cc2980 x25: ffffff808f77f120
> >>>>>> x24: 0000007b44941000
> >>>>>> [  261.215452][T11285] x23: 0000000000000001 x22: 0000000000000001
> >>>>>> x21: fffffffe04cc2980
> >>>>>> [  261.215459][T11285] x20: ffffff80db480000 x19: fffffffe04cc2980
> >>>>>> x18: ffffffed011dae80
> >>>>>> [  261.215465][T11285] x17: 0000000000000001 x16: ffffffffffffffff
> >>>>>> x15: 0000000000000004
> >>>>>> [  261.215471][T11285] x14: ffffff82fb73fac0 x13: 0000000000000003
> >>>>>> x12: 0000000000000003
> >>>>>> [  261.215476][T11285] x11: 00000000fffeffff x10: c0000000fffeffff x9
> >>>>>> : 0d46c0889b468e00
> >>>>>> [  261.215483][T11285] x8 : 0d46c0889b468e00 x7 : 545b5d3935343431 x6
> >>>>>> : 322e31363220205b
> >>>>>> [  261.215489][T11285] x5 : ffffffed013de407 x4 : ffffffed00698967 x3
> >>>>>> : 0000000000000000
> >>>>>> [  261.215495][T11285] x2 : 0000000000000000 x1 : ffffffc0a7dbb8c0 x0
> >>>>>> : ffffff8068c15c80
> >>>>>> [  261.215501][T11285] Call trace:
> >>>>>> [  261.215504][T11285]  folio_add_anon_rmap_ptes+0x2b4/0x330
> >>>>>> [  261.215509][T11285]  __arm64_sys_swapoff+0x8cc/0x14b0
> >>>>>> [  261.215516][T11285]  invoke_syscall+0x58/0x114
> >>>>>> [  261.215526][T11285]  el0_svc_common+0xac/0xe0
> >>>>>> [  261.215532][T11285]  do_el0_svc+0x1c/0x28
> >>>>>> [  261.215539][T11285]  el0_svc+0x38/0x68
> >>>>>> [  261.215544][T11285]  el0t_64_sync_handler+0x68/0xbc
> >>>>>> [  261.215548][T11285]  el0t_64_sync+0x1a8/0x1ac
> >>>>>> [  261.215552][T11285] ---[ end trace 0000000000000000 ]---
> >>>>>
> >>>>> Ah, yes. in unuse_pte(), you'll have to do the right thing if
> >>>>> !folio_test(anon) before doing the folio_add_anon_rmap_pte().
> >>>>>
> >>>>> You might have a fresh anon folio in the swapcache that was never mapped
> >>>>> (hopefully order-0, otherwise we'd likely be in trouble).
> >>>>
> >>>> Yes. It is order-0
> >>>>
> >>>> [  261.214260][T11285] page last allocated via order 0, migratetype
> >>>>
> >>>> Otherwise, we would have encountered this VM_WARN_ON_FOLIO?
> >>>>
> >>>> __folio_add_anon_rmap()
> >>>> {
> >>>> ...
> >>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >>>>        level != RMAP_LEVEL_PMD, folio);
> >>>> ...
> >>>> }
> >>>>
> >>>> Given that nobody has ever reported this warning, I assume all callers
> >>>> using folio_add_anon_rmap_pte(s) are right now safe to move to ?
> >>>
> >>> Yes, and we should likely add a VM_WARN_ON_ONCE() here that we have a
> >>> small folio if !anon. If that ever changes, we can assess the situation.
> >>>
> >>
> >> this patch actually has a WARN_ON for all !anon, so it extends the WARN
> >> to small.
>
> Not what I mean, see below:
>
> >>
> >> -       if (unlikely(!folio_test_anon(folio))) {
> >> -               VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> >> -               /*
> >> -                * For a PTE-mapped large folio, we only know that the single
> >> -                * PTE is exclusive. Further, __folio_set_anon() might not get
> >> -                * folio->index right when not given the address of the head
> >> -                * page.
> >> -                */
> >> -               VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> >> -                                level != RMAP_LEVEL_PMD, folio);
> >> -               __folio_set_anon(folio, vma, address,
> >> -                                !!(flags & RMAP_EXCLUSIVE));
> >> -       } else if (likely(!folio_test_ksm(folio))) {
> >> +       VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> >> +
> >>
> >>> Only swap created "new" anon folios without properly calling the right
> >>> function so far, all other code handles that correctly.
> >
> > as I assume patch2/3 should have moved all !anon to
> > folio_add_new_anon_rmap()
> >
> > diff --git a/mm/ksm.c b/mm/ksm.c
> > index d2641bc2efc9..c913ba8c37eb 100644
> > --- a/mm/ksm.c
> > +++ b/mm/ksm.c
> > @@ -1420,7 +1420,14 @@ static int replace_page(struct vm_area_struct
> > *vma, struct page *page,
> >           */
> >          if (!is_zero_pfn(page_to_pfn(kpage))) {
> >                  folio_get(kfolio);
> > -               folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
> > +               /*
> > +                * We currently ensure that new folios cannot be partially
> > +                * exclusive: they are either fully exclusive or fully shared.
> > +                */
> > +               if (!folio_test_anon(kfolio))
> > +                       folio_add_new_anon_rmap(kfolio, vma, addr, RMAP_NONE);
> > +               else
> > +                       folio_add_anon_rmap_pte(kfolio, kpage, vma, > addr, RMAP_NONE);
>
> I don't think that is required? We are only working with anon folios. Or
> were you able to trigger this? (which would be weird)

I didn't trigger this. but I am not sure if kfifo is always anon based on
the code context.

for page,  it is 100% anon(otherwise "goto out"), but I am not quite
sure about kpage
by the code context.

static int try_to_merge_one_page(struct vm_area_struct *vma,
                                 struct page *page, struct page *kpage)
{
        pte_t orig_pte = __pte(0);
        int err = -EFAULT;

        if (page == kpage)                      /* ksm page forked */
                return 0;

        if (!PageAnon(page))
                goto out;
        ....
}

Then I saw this

static int replace_page(struct vm_area_struct *vma, struct page *page,
                        struct page *kpage, pte_t orig_pte)
{
        ...
        VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
        VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
                        kfolio);
}

If kfolio is always anon, we should have used
VM_BUG_ON_FOLIO(PageAnonExclusive(kpage), folio)
just like
VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
without "folio_test_anon(kfolio)".

So I lost my way.

>
> [...]
>
> > -               folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
> > +               /*
> > +                * We currently ensure that new folios cannot be partially
> > +                * exclusive: they are either fully exclusive or fully shared.
> > +                */
> > +               if (!folio_test_anon(folio))
>
> Here I suggest changing the comment to (and adding the VM_WARN_ON_ONCE):
>
> /*
>   * We currently only expect small !anon folios, for which we now that
>   * they are either fully exclusive or fully shared. If we ever get large
>   * folios here, we have to be careful.
>   */
> if (!folio_test_anon(folio) {
>         VM_WARN_ON_ONCE(folio_test_large(folio));
>         folio_add_new_anon_rmap(folio, vma, addr, rmap_flags);
> } else {
> ...

looks good to me.

>
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
David Hildenbrand June 14, 2024, 11:10 a.m. UTC | #11
>> I don't think that is required? We are only working with anon folios. Or
>> were you able to trigger this? (which would be weird)
> 
> I didn't trigger this. but I am not sure if kfifo is always anon based on
> the code context.
> 
> for page,  it is 100% anon(otherwise "goto out"), but I am not quite
> sure about kpage
> by the code context.
> 
> static int try_to_merge_one_page(struct vm_area_struct *vma,
>                                   struct page *page, struct page *kpage)
> {
>          pte_t orig_pte = __pte(0);
>          int err = -EFAULT;
> 
>          if (page == kpage)                      /* ksm page forked */
>                  return 0;
> 
>          if (!PageAnon(page))
>                  goto out;
>          ....
> }
> 
> Then I saw this
> 
> static int replace_page(struct vm_area_struct *vma, struct page *page,
>                          struct page *kpage, pte_t orig_pte)
> {
>          ...
>          VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
>          VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
>                          kfolio);
> }
> 
> If kfolio is always anon, we should have used
> VM_BUG_ON_FOLIO(PageAnonExclusive(kpage), folio)
> just like
> VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
> without "folio_test_anon(kfolio)".
> 
> So I lost my way.

try_to_merge_one_page() is either called with a KSM page (anon) from 
try_to_merge_with_ksm_page() or with the shared zeropage (!anon and must 
never become anon) from cmp_and_merge_page().

So in replace_page(), we either have an ksm/anon page or the shared 
zeropage.

We never updated the documentation of replace_page() to spell out that 
"kpage" can also be the shared zeropage.

Note how replace_page() calls folio_add_anon_rmap_pte() not for the 
shared zeropage.

If we would have to craft a new anon page things would be going terribly 
wrong.

So not, this (!anon -> anon) must not happen and if it were to happen, 
it would be a real bug and your check in  folio_add_anon_rmap_pte() 
would catch it.
Barry Song June 14, 2024, 10:24 p.m. UTC | #12
On Fri, Jun 14, 2024 at 11:10 PM David Hildenbrand <david@redhat.com> wrote:
>
> >> I don't think that is required? We are only working with anon folios. Or
> >> were you able to trigger this? (which would be weird)
> >
> > I didn't trigger this. but I am not sure if kfifo is always anon based on
> > the code context.
> >
> > for page,  it is 100% anon(otherwise "goto out"), but I am not quite
> > sure about kpage
> > by the code context.
> >
> > static int try_to_merge_one_page(struct vm_area_struct *vma,
> >                                   struct page *page, struct page *kpage)
> > {
> >          pte_t orig_pte = __pte(0);
> >          int err = -EFAULT;
> >
> >          if (page == kpage)                      /* ksm page forked */
> >                  return 0;
> >
> >          if (!PageAnon(page))
> >                  goto out;
> >          ....
> > }
> >
> > Then I saw this
> >
> > static int replace_page(struct vm_area_struct *vma, struct page *page,
> >                          struct page *kpage, pte_t orig_pte)
> > {
> >          ...
> >          VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
> >          VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
> >                          kfolio);
> > }
> >
> > If kfolio is always anon, we should have used
> > VM_BUG_ON_FOLIO(PageAnonExclusive(kpage), folio)
> > just like
> > VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
> > without "folio_test_anon(kfolio)".
> >
> > So I lost my way.
>
> try_to_merge_one_page() is either called with a KSM page (anon) from
> try_to_merge_with_ksm_page() or with the shared zeropage (!anon and must
> never become anon) from cmp_and_merge_page().
>
> So in replace_page(), we either have an ksm/anon page or the shared
> zeropage.
>
> We never updated the documentation of replace_page() to spell out that
> "kpage" can also be the shared zeropage.
>
> Note how replace_page() calls folio_add_anon_rmap_pte() not for the
> shared zeropage.
>
> If we would have to craft a new anon page things would be going terribly
> wrong.
>
> So not, this (!anon -> anon) must not happen and if it were to happen,
> it would be a real bug and your check in  folio_add_anon_rmap_pte()
> would catch it.

Thanks very much for the explanation. I wonder if the below can help
improve the doc. If yes, I may add a separate patch for it in v2.

diff --git a/mm/ksm.c b/mm/ksm.c
index d2641bc2efc9..56b10265e617 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1411,14 +1411,13 @@ static int replace_page(struct vm_area_struct
*vma, struct page *page,
                goto out_mn;
        }
        VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
-       VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
-                       kfolio);
-
        /*
         * No need to check ksm_use_zero_pages here: we can only have a
         * zero_page here if ksm_use_zero_pages was enabled already.
         */
        if (!is_zero_pfn(page_to_pfn(kpage))) {
+               VM_BUG_ON_FOLIO(!folio_test_anon(kfolio) ||
PageAnonExclusive(kpage),
+                       kfolio);
                folio_get(kfolio);
                folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
                newpte = mk_pte(kpage, vma->vm_page_prot);

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
diff mbox series

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index e612d999811a..e84c706c8241 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1299,21 +1299,10 @@  static __always_inline void __folio_add_anon_rmap(struct folio *folio,
 
 	nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped);
 
-	if (unlikely(!folio_test_anon(folio))) {
-		VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
-		/*
-		 * For a PTE-mapped large folio, we only know that the single
-		 * PTE is exclusive. Further, __folio_set_anon() might not get
-		 * folio->index right when not given the address of the head
-		 * page.
-		 */
-		VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-				 level != RMAP_LEVEL_PMD, folio);
-		__folio_set_anon(folio, vma, address,
-				 !!(flags & RMAP_EXCLUSIVE));
-	} else if (likely(!folio_test_ksm(folio))) {
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+	if (likely(!folio_test_ksm(folio)))
 		__page_check_anon_rmap(folio, page, vma, address);
-	}
 
 	__folio_mod_stat(folio, nr, nr_pmdmapped);