diff mbox series

[RFC,for,stable,5.15,and,5.10] mm/memory: only copy anonymous pages during fork()

Message ID 20241113160103.48943-2-vbabka@suse.cz (mailing list archive)
State New
Headers show
Series [RFC,for,stable,5.15,and,5.10] mm/memory: only copy anonymous pages during fork() | expand

Commit Message

Vlastimil Babka Nov. 13, 2024, 4:01 p.m. UTC
When a combination of unfortunate factors occur, we might BUG in fork():

dup_mmap()
  copy_page_range()
    copy_***_range()
      copy_present_pte()
        copy_present_page()
          page_add_new_anon_rmap()
            __page_set_anon_rmap()
              BUG_ON(!anon_vma);

The factors are:

- source vma is VM_MIXEDMAP otherwise copy_page_range() would bail out
  when !src_vma->anon_vma
  - I think this was due to gpfs, but can happen in-tree as well
- is_cow_mapping() is true because VM_MAYWRITE (even though the vma
  was a read-only mapping of a .so file)
- MMF_HAS_PINNED is true, thus some actual pinning has happened
- page_maybe_dma_pinned() is true as a false positive, because mapcount
  and thus refcount is >1024

That makes us reach page_needs_cow_for_dma() in copy_present_page() and
evaluate it as true and attempt to CoW a file page and hit the BUG_ON()
because we never had a reason to instantiate anon_vma for the source
vma.

AFAICS this was fixed inadvertedly in 5.19 by commit fb3d824d1a46
("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and
page_try_dup_anon_rmap()") or another commit in that series. What caught
my attention is this part of the changelog:

    We really only care about pins on anonymous pages, because they are prone
    to getting replaced in the COW handler once mapped R/O.  For !anon pages
    in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about
    that, at least not that I could come up with an example.

And as part of that commit, an PageAnon() test is added in
copy_present_pte().

But the code is already refactored a lot, so this is an attempt at a
minimal fix for LTS kernels by placing the PageAnon() check to
copy_present_page().

Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
Cc: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Hi, we've seen this in our 5.14 based kernel and it involved the out of
tree gpfs module, but I believe the same thing can happen in LTS's 5.10
and 5.15 without out of tree modules as well. So I'd like your opinion
on this fix before I propose it to stable as a non-standard
version-specific fix (I don't think we'd want to backport fb3d824d1a46
with prerequisities). Thanks.

 mm/memory.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

David Hildenbrand Nov. 13, 2024, 4:09 p.m. UTC | #1
On 13.11.24 17:01, Vlastimil Babka wrote:
> When a combination of unfortunate factors occur, we might BUG in fork():
> 
> dup_mmap()
>    copy_page_range()
>      copy_***_range()
>        copy_present_pte()
>          copy_present_page()
>            page_add_new_anon_rmap()
>              __page_set_anon_rmap()
>                BUG_ON(!anon_vma);
> 
> The factors are:
> 
> - source vma is VM_MIXEDMAP otherwise copy_page_range() would bail out
>    when !src_vma->anon_vma
>    - I think this was due to gpfs, but can happen in-tree as well
> - is_cow_mapping() is true because VM_MAYWRITE (even though the vma
>    was a read-only mapping of a .so file)
> - MMF_HAS_PINNED is true, thus some actual pinning has happened
> - page_maybe_dma_pinned() is true as a false positive, because mapcount
>    and thus refcount is >1024
> 
> That makes us reach page_needs_cow_for_dma() in copy_present_page() and
> evaluate it as true and attempt to CoW a file page and hit the BUG_ON()
> because we never had a reason to instantiate anon_vma for the source
> vma.
> 
> AFAICS this was fixed inadvertedly in 5.19 by commit fb3d824d1a46
> ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and
> page_try_dup_anon_rmap()") or another commit in that series. What caught
> my attention is this part of the changelog:
> 
>      We really only care about pins on anonymous pages, because they are prone
>      to getting replaced in the COW handler once mapped R/O.  For !anon pages
>      in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about
>      that, at least not that I could come up with an example.
> 
> And as part of that commit, an PageAnon() test is added in
> copy_present_pte().
> 
> But the code is already refactored a lot, so this is an attempt at a
> minimal fix for LTS kernels by placing the PageAnon() check to
> copy_present_page().
> 
> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> Cc: Peter Xu <peterx@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Hi, we've seen this in our 5.14 based kernel and it involved the out of
> tree gpfs module, but I believe the same thing can happen in LTS's 5.10
> and 5.15 without out of tree modules as well. So I'd like your opinion
> on this fix before I propose it to stable as a non-standard
> version-specific fix (I don't think we'd want to backport fb3d824d1a46
> with prerequisities). Thanks.

I recall seeing+discussing this exact patch already a couple years ago :D

Ah, here is the 5.15 version

https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com

And the 5.10 version

https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/


... I could have sworn they got applied.

... and in linux-5.10.y I see

commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32
Author: Yuanzheng Song <songyuanzheng@huawei.com>
Date:   Fri Oct 28 03:07:05 2022 +0000

     mm/memory: add non-anonymous page check in the copy_present_page()

     The vma->anon_vma of the child process may be NULL because
     the entire vma does not contain anonymous pages. In this
     case, a BUG will occur when the copy_present_page() passes
     a copy of a non-anonymous page of that vma to the
     page_add_new_anon_rmap() to set up new anonymous rmap.



Maybe you missed that the PageAnon() check is simply a couple of lines 
further down in there?
Vlastimil Babka Nov. 13, 2024, 4:18 p.m. UTC | #2
On 11/13/24 17:09, David Hildenbrand wrote:
>> ---
>> Hi, we've seen this in our 5.14 based kernel and it involved the out of
>> tree gpfs module, but I believe the same thing can happen in LTS's 5.10
>> and 5.15 without out of tree modules as well. So I'd like your opinion
>> on this fix before I propose it to stable as a non-standard
>> version-specific fix (I don't think we'd want to backport fb3d824d1a46
>> with prerequisities). Thanks.
> 
> I recall seeing+discussing this exact patch already a couple years ago :D
> 
> Ah, here is the 5.15 version
> 
> https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com
> 
> And the 5.10 version
> 
> https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/
> 
> 
> ... I could have sworn they got applied.
> 
> ... and in linux-5.10.y I see
> 
> commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32
> Author: Yuanzheng Song <songyuanzheng@huawei.com>
> Date:   Fri Oct 28 03:07:05 2022 +0000
> 
>      mm/memory: add non-anonymous page check in the copy_present_page()
> 
>      The vma->anon_vma of the child process may be NULL because
>      the entire vma does not contain anonymous pages. In this
>      case, a BUG will occur when the copy_present_page() passes
>      a copy of a non-anonymous page of that vma to the
>      page_add_new_anon_rmap() to set up new anonymous rmap.
> 
> 
> 
> Maybe you missed that the PageAnon() check is simply a couple of lines 
> further down in there?

No I was only looking at the 5.15 branch so far, and seems it was never
applied there, unlike 5.10. Weird. But thanks for the heads up.
David Hildenbrand Nov. 13, 2024, 4:25 p.m. UTC | #3
On 13.11.24 17:18, Vlastimil Babka wrote:
> On 11/13/24 17:09, David Hildenbrand wrote:
>>> ---
>>> Hi, we've seen this in our 5.14 based kernel and it involved the out of
>>> tree gpfs module, but I believe the same thing can happen in LTS's 5.10
>>> and 5.15 without out of tree modules as well. So I'd like your opinion
>>> on this fix before I propose it to stable as a non-standard
>>> version-specific fix (I don't think we'd want to backport fb3d824d1a46
>>> with prerequisities). Thanks.
>>
>> I recall seeing+discussing this exact patch already a couple years ago :D
>>
>> Ah, here is the 5.15 version
>>
>> https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com
>>
>> And the 5.10 version
>>
>> https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/
>>
>>
>> ... I could have sworn they got applied.
>>
>> ... and in linux-5.10.y I see
>>
>> commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32
>> Author: Yuanzheng Song <songyuanzheng@huawei.com>
>> Date:   Fri Oct 28 03:07:05 2022 +0000
>>
>>       mm/memory: add non-anonymous page check in the copy_present_page()
>>
>>       The vma->anon_vma of the child process may be NULL because
>>       the entire vma does not contain anonymous pages. In this
>>       case, a BUG will occur when the copy_present_page() passes
>>       a copy of a non-anonymous page of that vma to the
>>       page_add_new_anon_rmap() to set up new anonymous rmap.
>>
>>
>>
>> Maybe you missed that the PageAnon() check is simply a couple of lines
>> further down in there?
> 
> No I was only looking at the 5.15 branch so far, and seems it was never
> applied there, unlike 5.10. Weird. But thanks for the heads up.

Indeed, looks like it never got applied to 5.15 ...
Vlastimil Babka Nov. 13, 2024, 4:26 p.m. UTC | #4
On 11/13/24 17:25, David Hildenbrand wrote:
> On 13.11.24 17:18, Vlastimil Babka wrote:
>> On 11/13/24 17:09, David Hildenbrand wrote:
>>>> ---
>>>> Hi, we've seen this in our 5.14 based kernel and it involved the out of
>>>> tree gpfs module, but I believe the same thing can happen in LTS's 5.10
>>>> and 5.15 without out of tree modules as well. So I'd like your opinion
>>>> on this fix before I propose it to stable as a non-standard
>>>> version-specific fix (I don't think we'd want to backport fb3d824d1a46
>>>> with prerequisities). Thanks.
>>>
>>> I recall seeing+discussing this exact patch already a couple years ago :D
>>>
>>> Ah, here is the 5.15 version
>>>
>>> https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@huawei.com
>>>
>>> And the 5.10 version
>>>
>>> https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@huawei.com/
>>>
>>>
>>> ... I could have sworn they got applied.
>>>
>>> ... and in linux-5.10.y I see
>>>
>>> commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32
>>> Author: Yuanzheng Song <songyuanzheng@huawei.com>
>>> Date:   Fri Oct 28 03:07:05 2022 +0000
>>>
>>>       mm/memory: add non-anonymous page check in the copy_present_page()
>>>
>>>       The vma->anon_vma of the child process may be NULL because
>>>       the entire vma does not contain anonymous pages. In this
>>>       case, a BUG will occur when the copy_present_page() passes
>>>       a copy of a non-anonymous page of that vma to the
>>>       page_add_new_anon_rmap() to set up new anonymous rmap.
>>>
>>>
>>>
>>> Maybe you missed that the PageAnon() check is simply a couple of lines
>>> further down in there?
>> 
>> No I was only looking at the 5.15 branch so far, and seems it was never
>> applied there, unlike 5.10. Weird. But thanks for the heads up.
> 
> Indeed, looks like it never got applied to 5.15 ...

The author forgot to actually include stable@
will forward it
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 6d058973a97e..73871bac0e4c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -887,6 +887,10 @@  copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 {
 	struct page *new_page;
 
+	/* We only care about pins on anonymous pages */
+	if (!PageAnon(page))
+		return 1;
+
 	/*
 	 * What we want to do is to check whether this page may
 	 * have been pinned by the parent process.  If so,